1
Sappington A, Mohanty V. Probabilistic Genotype-Phenotype Maps Reveal Mutational Robustness of RNA Folding, Spin Glasses, and Quantum Circuits. ARXIV 2024:arXiv:2301.01847v2. [PMID: 36713233 PMCID: PMC9882568] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/31/2023]
Abstract
Recent studies of genotype-phenotype (GP) maps have reported universally enhanced phenotypic robustness to genotype mutations, a feature essential to evolution. Virtually all of these studies make a simplifying assumption that each genotype, represented as a sequence, maps deterministically to a single phenotype, such as a discrete structure. Here, we introduce probabilistic genotype-phenotype (PrGP) maps, where each genotype maps to a vector of phenotype probabilities, as a more realistic and universal language for investigating robustness in a variety of physical, biological, and computational systems. We study three model systems to show that PrGP maps offer a generalized framework which can handle uncertainty emerging from various physical sources: (1) thermal fluctuation in RNA folding, (2) external field disorder in spin glass ground state finding, and (3) superposition and entanglement in quantum circuits, which are realized experimentally on IBM quantum computers. In all three cases, we observe a novel biphasic robustness scaling which is enhanced relative to random expectation for more frequent phenotypes and approaches random expectation for less frequent phenotypes. We derive an analytical theory for the behavior of PrGP robustness, and we demonstrate that the theory is highly predictive of empirical robustness.
Affiliation(s)
- Anna Sappington
  - Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Harvard-MIT Health Sciences and Technology, Harvard Medical School, Boston, MA 02115 and Massachusetts Institute of Technology, Cambridge, MA 02139
- Vaibhav Mohanty
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138
  - Harvard-MIT Health Sciences and Technology, Harvard Medical School, Boston, MA 02115 and Massachusetts Institute of Technology, Cambridge, MA 02139
2
Zhang H, Quadeer AA, McKay MR. Direct-acting antiviral resistance of Hepatitis C virus is promoted by epistasis. Nat Commun 2023; 14:7457. [PMID: 37978179 PMCID: PMC10656532 DOI: 10.1038/s41467-023-42550-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023]
Abstract
Direct-acting antiviral agents (DAAs) provide efficacious therapeutic treatments for chronic Hepatitis C virus (HCV) infection. However, emergence of drug resistance mutations (DRMs) can greatly affect treatment outcomes and impede virological cure. While multiple DRMs have been observed for all currently used DAAs, the evolutionary determinants of such mutations are not currently well understood. Here, by considering DAAs targeting the nonstructural 3 (NS3) protein of HCV, we present results suggesting that epistasis plays an important role in the evolution of DRMs. Employing a sequence-based fitness landscape model whose predictions correlate highly with experimental data, we identify specific DRMs that are associated with strong epistatic interactions, and these are found to be enriched in multiple NS3-specific DAAs. Evolutionary modelling further supports that the identified DRMs involve compensatory mutational interactions that facilitate relatively easy escape from drug-induced selection pressures. Our results indicate that accounting for epistasis is important for designing future HCV NS3-targeting DAAs.
Affiliation(s)
- Hang Zhang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Ahmed Abdul Quadeer
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Matthew R McKay
  - Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC, Australia
  - Department of Microbiology and Immunology, University of Melbourne, at The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia
3
D’Orso I, Forst CV. Mathematical Models of HIV-1 Dynamics, Transcription, and Latency. Viruses 2023; 15:2119. [PMID: 37896896 PMCID: PMC10612035 DOI: 10.3390/v15102119] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 10/10/2023] [Accepted: 10/18/2023] [Indexed: 10/29/2023]
Abstract
HIV-1 latency is a major barrier to curing infections with antiretroviral therapy and, consequently, to eliminating the disease globally. The establishment, maintenance, and potential clearance of latent infection are complex dynamic processes and can be best described with the help of mathematical models followed by experimental validation. Here, we review the use of viral dynamics models for HIV-1, with a focus on applications to the latent reservoir. Such models have been used to explain the multi-phasic decay of viral load during antiretroviral therapy, the early seeding of the latent reservoir during acute infection and the limited inflow during treatment, the dynamics of viral blips, and the phenomenon of post-treatment control. Finally, we discuss how mathematical models have been used to predict the efficacy of potential HIV-1 cure strategies, such as latency-reversing agents, early treatment initiation, or gene therapies, and to provide guidance for designing trials of these novel interventions.
Affiliation(s)
- Iván D’Orso
  - Department of Microbiology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Christian V. Forst
  - Department of Genetics and Genomic Sciences, Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
4
Li M, Oliveira Passos D, Shan Z, Smith SJ, Sun Q, Biswas A, Choudhuri I, Strutzenberg TS, Haldane A, Deng N, Li Z, Zhao XZ, Briganti L, Kvaratskhelia M, Burke TR, Levy RM, Hughes SH, Craigie R, Lyumkis D. Mechanisms of HIV-1 integrase resistance to dolutegravir and potent inhibition of drug-resistant variants. SCIENCE ADVANCES 2023; 9:eadg5953. [PMID: 37478179 DOI: 10.1126/sciadv.adg5953] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Received: 01/09/2023] [Accepted: 06/16/2023] [Indexed: 07/23/2023]
Abstract
HIV-1 infection depends on the integration of viral DNA into host chromatin. Integration is mediated by the viral enzyme integrase and is blocked by integrase strand transfer inhibitors (INSTIs), first-line antiretroviral therapeutics widely used in the clinic. Resistance to even the best INSTIs is a problem, and the mechanisms of resistance are poorly understood. Here, we analyze combinations of the mutations E138K, G140A/S, and Q148H/K/R, which confer resistance to INSTIs. The investigational drug 4d more effectively inhibited the mutants compared with the approved drug Dolutegravir (DTG). We present 11 new cryo-EM structures of drug-resistant HIV-1 intasomes bound to DTG or 4d, with better than 3-Å resolution. These structures, complemented with free energy simulations, virology, and enzymology, explain the mechanisms of DTG resistance involving E138K + G140A/S + Q148H/K/R and show why 4d maintains potency better than DTG. These data establish a foundation for further development of INSTIs that potently inhibit resistant forms of integrase.
Affiliation(s)
- Min Li
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Zelin Shan
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Steven J Smith
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Qinfang Sun
  - Center for Biophysics and Computational Biology, and Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Avik Biswas
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Indrani Choudhuri
  - Center for Biophysics and Computational Biology, and Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Allan Haldane
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Nanjie Deng
  - Department of Chemistry and Physical Sciences, Pace University, New York, NY, 10038, USA
- Zhaoyang Li
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Xue Zhi Zhao
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Lorenzo Briganti
  - Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mamuka Kvaratskhelia
  - Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Terrence R Burke
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Ronald M Levy
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Stephen H Hughes
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Robert Craigie
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Dmitry Lyumkis
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
  - Graduate School of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
5
Mauri E, Cocco S, Monasson R. Mutational Paths with Sequence-Based Models of Proteins: From Sampling to Mean-Field Characterization. PHYSICAL REVIEW LETTERS 2023; 130:158402. [PMID: 37115874 DOI: 10.1103/physrevlett.130.158402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Accepted: 03/16/2023] [Indexed: 06/19/2023]
Abstract
Identifying and characterizing mutational paths is an important issue in evolutionary biology, with potential applications to bioengineering. We here propose an algorithm to sample mutational paths, which we benchmark on exactly solvable models of proteins in silico, and apply to data-driven models of natural proteins learned from sequence data with restricted Boltzmann machines. We then use mean-field theory to characterize paths for different mutational dynamics of interest, and to extend Kimura's estimate of evolutionary distances to sequence-based epistatic models of selection.
Affiliation(s)
- Eugenio Mauri
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 and PSL Research, Sorbonne Université, 24 rue Lhomond, 75231 Paris cedex 05, France
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 and PSL Research, Sorbonne Université, 24 rue Lhomond, 75231 Paris cedex 05, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR 8023 and PSL Research, Sorbonne Université, 24 rue Lhomond, 75231 Paris cedex 05, France
6
Dichio V, Zeng HL, Aurell E. Statistical genetics in and out of quasi-linkage equilibrium. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2023; 86:052601. [PMID: 36944245 DOI: 10.1088/1361-6633/acc5fa] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2022] [Accepted: 03/21/2023] [Indexed: 06/18/2023]
Abstract
This review is about statistical genetics, an interdisciplinary topic between statistical physics and population biology. The focus is on the phase of quasi-linkage equilibrium (QLE). Our goals here are to clarify under which conditions the QLE phase can be expected to hold in population biology and how the stability of the QLE phase is lost. The QLE state, which has many similarities to a thermal equilibrium state in statistical mechanics, was discovered by M Kimura for a two-locus two-allele model, and was extended and generalized to the global genome scale by Neher & Shraiman (2011). What we will refer to as the Kimura-Neher-Shraiman theory describes a population evolving due to mutations, recombination, natural selection and possibly genetic drift. A QLE phase exists at sufficiently high recombination rate (r) and/or mutation rates µ with respect to selection strength. We show how in QLE it is possible to infer the epistatic parameters of the fitness function from the knowledge of the (dynamical) distribution of genotypes in a population. We further consider the breakdown of the QLE regime for high enough selection strength. We review recent results for the selection-mutation and selection-recombination dynamics. Finally, we identify and characterize a new phase which we call the non-random coexistence where variability persists in the population without either fixating or disappearing.
Affiliation(s)
- Vito Dichio
  - Sorbonne Université, Paris Brain Institute-ICM, CNRS, Inria, Inserm, AP-HP, Hôpital de la Pitié Salpêtrière, F-75013 Paris, France
- Hong-Li Zeng
  - School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, People's Republic of China
- Erik Aurell
  - Department of Computational Science and Technology, KTH-Royal Institute of Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
7
Mohanty V, Louis AA. Robustness and stability of spin-glass ground states to perturbed interactions. Phys Rev E 2023; 107:014126. [PMID: 36797942 DOI: 10.1103/physreve.107.014126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2020] [Accepted: 12/16/2022] [Indexed: 06/18/2023]
Abstract
Across many problems in science and engineering, it is important to consider how much the output of a given system changes due to perturbations of the input. Here, we investigate the glassy phase of ±J spin glasses at zero temperature by calculating the robustness of the ground states to flips in the sign of single interactions. For random graphs and the Sherrington-Kirkpatrick model, we find relatively large sets of bond configurations that generate the same ground state. These sets can themselves be analyzed as subgraphs of the interaction domain, and we compute many of their topological properties. In particular, we find that the robustness, equivalent to the average degree, of these subgraphs is much higher than one would expect from a random model. Most notably, it scales in the same logarithmic way with the size of the subgraph as has been found in genotype-phenotype maps for RNA secondary structure folding, protein quaternary structure, gene regulatory networks, as well as for models for genetic programming. The similarity between these disparate systems suggests that this scaling may have a more universal origin.
Affiliation(s)
- Vaibhav Mohanty
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, OX1 3NP, United Kingdom
  - MD-PhD Program and Program in Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts 02125, USA and Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, OX1 3NP, United Kingdom
8
Choudhuri I, Biswas A, Haldane A, Levy RM. Contingency and Entrenchment of Drug-Resistance Mutations in HIV Viral Proteins. J Phys Chem B 2022; 126:10622-10636. [PMID: 36493468 PMCID: PMC9841799 DOI: 10.1021/acs.jpcb.2c06123] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/13/2022]
Abstract
The ability of HIV-1 to rapidly mutate leads to antiretroviral therapy (ART) failure among infected patients. Drug-resistance mutations (DRMs), which cause a fitness penalty to intrinsic viral fitness, are compensated by accessory mutations with favorable epistatic interactions which cause an evolutionary trapping effect, but the kinetics of this overall process has not been well characterized. Here, using a Potts Hamiltonian model describing epistasis combined with kinetic Monte Carlo simulations of evolutionary trajectories, we explore how epistasis modulates the evolutionary dynamics of HIV DRMs. We show how the occurrence of a drug-resistance mutation is contingent on favorable epistatic interactions with many other residues of the sequence background and that subsequent mutations entrench DRMs. We measure the time-autocorrelation of fluctuations in the likelihood of DRMs due to epistatic coupling with the sequence background, which reveals the presence of two evolutionary processes controlling DRM kinetics with two distinct time scales. Further analysis of waiting times for the evolutionary trapping effect to reverse reveals that the sequences which entrench (trap) a DRM are responsible for the slower time scale. We also quantify the overall strength of epistatic effects on the evolutionary kinetics for different mutations and show these are much larger for DRM positions than polymorphic positions, and we also show that trapping of a DRM is often caused by the collective effect of many accessory mutations, rather than a few strongly coupled ones, suggesting the importance of multiresidue sequence variations in HIV evolution. The analysis presented here provides a framework to explore the kinetic pathways through which viral proteins like HIV evolve under drug-selection pressure.
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States; Department of Physics, Temple University, Philadelphia, Pennsylvania 19122-6008, United States
- Ronald M. Levy
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States; Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
9
Zeng HL, Liu Y, Dichio V, Aurell E. Temporal epistasis inference from more than 3 500 000 SARS-CoV-2 genomic sequences. Phys Rev E 2022; 106:044409. [PMID: 36397507 DOI: 10.1103/physreve.106.044409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
We use direct coupling analysis (DCA) to determine epistatic interactions between loci of variability of the SARS-CoV-2 virus, segmenting genomes by month of sampling. We use full-length, high-quality genomes from the GISAID repository up to October 2021 for a total of over 3 500 000 genomes. We find that DCA terms are more stable over time than correlations but nevertheless change over time as mutations disappear from the global population or reach fixation. Correlations are enriched for phylogenetic effects, and in particular statistical dependencies at short genomic distances, while DCA brings out links at longer genomic distance. We discuss the validity of a DCA analysis under these conditions in terms of a transient quasilinkage equilibrium state. We identify putative epistatic interaction mutations involving loci in spike.
Affiliation(s)
- Hong-Li Zeng
  - School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, China
- Yue Liu
  - School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, China
- Vito Dichio
  - Inria Paris, Aramis Project Team, Paris 75013, France
  - Institut du Cerveau, ICM, Inserm U 1127, CNRS UMR 7225, Sorbonne Université, Paris, France
- Erik Aurell
  - Department of Computational Science and Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
10
Patel R, Carnevale V, Kumar S. Epistasis Creates Invariant Sites and Modulates the Rate of Molecular Evolution. Mol Biol Evol 2022; 39:msac106. [PMID: 35575390 PMCID: PMC9156017 DOI: 10.1093/molbev/msac106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022]
Abstract
Invariant sites are a common feature of amino acid sequence evolution. The presence of invariant sites is frequently attributed to the need to preserve function through site-specific conservation of amino acid residues. Amino acid substitution models without a provision for invariant sites often fit the data significantly worse than those that allow for an excess of invariant sites beyond those predicted by models that only incorporate rate variation among sites (e.g., a Gamma distribution). An alternative is epistasis between sites to preserve residue interactions that can create invariant sites. Through computer-simulated sequence evolution, we evaluated the relative effects of site-specific preferences and site-site couplings in the generation of invariant sites and the modulation of the rate of molecular evolution. In an analysis of ten major families of protein domains with diverse sequence and functional properties, we find that the negative selection imposed by epistasis creates many more invariant sites than site-specific residue preferences alone. Further, epistasis plays an increasingly larger role in creating invariant sites over longer evolutionary periods. Epistasis also dictates rates of domain evolution over time by exerting significant additional purifying selection to preserve site couplings. These patterns illuminate the mechanistic role of epistasis in the processes underlying observed site invariance and evolutionary rates.
Affiliation(s)
- Ravi Patel
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Vincenzo Carnevale
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
- Sudhir Kumar
  - Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA 19122, USA
  - Department of Biology, Temple University, Philadelphia, PA 19122, USA
  - Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
11
Biswas A, Haldane A, Levy RM. Limits to detecting epistasis in the fitness landscape of HIV. PLoS One 2022; 17:e0262314. [PMID: 35041711 PMCID: PMC8765623 DOI: 10.1371/journal.pone.0262314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Accepted: 12/20/2021] [Indexed: 02/05/2023]
Abstract
The rapid evolution of HIV is constrained by interactions between mutations which affect viral fitness. In this work, we explore the role of epistasis in determining the mutational fitness landscape of HIV for multiple drug target proteins, including Protease, Reverse Transcriptase, and Integrase. Epistatic interactions between residues modulate the mutation patterns involved in drug resistance, with unambiguous signatures of epistasis best seen in the comparison of the Potts model predicted and experimental HIV sequence “prevalences” expressed as higher-order marginals (beyond triplets) of the sequence probability distribution. In contrast, experimental measures of fitness such as viral replicative capacities generally probe fitness effects of point mutations in a single background, providing weak evidence for epistasis in viral systems. The detectable effects of epistasis are obscured by higher evolutionary conservation at sites. While double mutant cycles, in principle, provide one of the best ways to probe epistatic interactions experimentally without reference to a particular background, we show that the analysis is complicated by the small dynamic range of measurements. Overall, we show that global pairwise interaction Potts models are necessary for predicting the mutational landscape of viral proteins.
Affiliation(s)
- Avik Biswas
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Allan Haldane
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Ronald M. Levy
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
  - Department of Chemistry, Temple University, Philadelphia, PA, United States of America
12
Doelger J, Kardar M, Chakraborty AK. Inferring the intrinsic mutational fitness landscape of influenzalike evolving antigens from temporally ordered sequence data. Phys Rev E 2022; 105:024401. [PMID: 35291059 DOI: 10.1103/physreve.105.024401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 01/19/2022] [Indexed: 06/14/2023]
Abstract
There still are no effective long-term protective vaccines against viruses that continuously evolve under immune pressure such as seasonal influenza, which has caused, and can cause, devastating epidemics in the human population. To find such a broadly protective immunization strategy, it is useful to know how easily the virus can escape via mutation from specific antibody responses. This information is encoded in the fitness landscape of the viral proteins (i.e., knowledge of the viral fitness as a function of sequence). Here we present a computational method to infer the intrinsic mutational fitness landscape of influenzalike evolving antigens from yearly sequence data. We test inference performance with computer-generated sequence data that are based on stochastic simulations mimicking basic features of immune-driven viral evolution. Although the numerically simulated model does create a phylogeny based on the allowed mutations, the inference scheme does not use this information. This provides a contrast to other methods that rely on reconstruction of phylogenetic trees. Our method just needs a sufficient number of samples over multiple years. With our method, we are able to infer single as well as pairwise mutational fitness effects from the simulated sequence time series for short antigenic proteins. Our fitness inference approach may have potential future use for the design of immunization protocols by identifying intrinsically vulnerable immune target combinations on antigens that evolve under immune-driven selection. In the future, this approach may be applied to influenza and other novel viruses such as SARS-CoV-2, which evolves and, like influenza, might continue to escape the natural and vaccine-mediated immune pressures.
Affiliation(s)
- Julia Doelger
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Mehran Kardar
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Arup K Chakraborty
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; and Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
13
Shen Y, Olson ER, Van Deelen TR. Spatially explicit modeling of community occupancy using Markov Random Field models with imperfect observation: Mesocarnivores in Apostle Islands National Lakeshore. Ecol Modell 2021. [DOI: 10.1016/j.ecolmodel.2021.109712] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
14
Adenovirus-vectored vaccine containing multidimensionally conserved parts of the HIV proteome is immunogenic in rhesus macaques. Proc Natl Acad Sci U S A 2021; 118:2022496118. [PMID: 33514660 DOI: 10.1073/pnas.2022496118] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 11/18/2022]
Abstract
An effective vaccine that can protect against HIV infection does not exist. A major reason why a vaccine is not available is the high mutability of the virus, which enables it to evolve mutations that can evade human immune responses. This challenge is exacerbated by the ability of the virus to evolve compensatory mutations that can partially restore the fitness cost of immune-evading mutations. Based on the fitness landscapes of HIV proteins that account for the effects of coupled mutations, we designed a single long peptide immunogen comprising parts of the HIV proteome wherein mutations are likely to be deleterious regardless of the sequence of the rest of the viral protein. This immunogen was then stably expressed in adenovirus vectors that are currently in clinical development. Macaques immunized with these vaccine constructs exhibited T-cell responses that were comparable in magnitude to animals immunized with adenovirus vectors with whole HIV protein inserts. Moreover, the T-cell responses in immunized macaques strongly targeted regions contained in our immunogen. These results suggest that further studies aimed toward using our vaccine construct for HIV prophylaxis and cure are warranted.
|
15
|
Gao A, Chen Z, Amitai A, Doelger J, Mallajosyula V, Sundquist E, Pereyra Segal F, Carrington M, Davis MM, Streeck H, Chakraborty AK, Julg B. Learning from HIV-1 to predict the immunogenicity of T cell epitopes in SARS-CoV-2. iScience 2021; 24:102311. [PMID: 33748696 PMCID: PMC7956900 DOI: 10.1016/j.isci.2021.102311] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 01/12/2021] [Revised: 02/22/2021] [Accepted: 03/10/2021] [Indexed: 12/18/2022] Open
Abstract
We describe a physics-based learning model for predicting the immunogenicity of cytotoxic T lymphocyte (CTL) epitopes derived from diverse pathogens including SARS-CoV-2. The model was trained and optimized on the relative immunodominance of CTL epitopes in human immunodeficiency virus infection. Its accuracy was tested against experimental data from patients with COVID-19. Our model predicts that only some SARS-CoV-2 epitopes predicted to bind to HLA molecules are immunogenic. The immunogenic CTL epitopes across all SARS-CoV-2 proteins are predicted to provide broad population coverage, but those from the SARS-CoV-2 spike protein alone are unlikely to do so. Our model also predicts that several immunogenic SARS-CoV-2 CTL epitopes are identical to seasonal coronaviruses circulating in the population and such cross-reactive CD8+ T cells can indeed be detected in prepandemic blood donors, suggesting that some level of CTL immunity against COVID-19 may be present in some individuals before SARS-CoV-2 infection.
Affiliation(s)
- Ang Gao
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Zhilin Chen
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, 400 Technology Sq., Cambridge, MA 02139, USA
| | - Assaf Amitai
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
| | - Julia Doelger
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
| | - Vamsee Mallajosyula
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Emily Sundquist
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, 400 Technology Sq., Cambridge, MA 02139, USA
| | | | - Mary Carrington
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, 400 Technology Sq., Cambridge, MA 02139, USA
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Mark M. Davis
- Institute for Immunity, Transplantation, and Infection, Stanford University School of Medicine, Stanford, CA 94305, USA
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA 94305, USA
- Howard Hughes Medical Institute, Stanford University School of Medicine, Stanford, CA 94305, USA
| | - Hendrik Streeck
- Institut für Virologie, Universitätsklinikum Bonn, 53127 Bonn, Germany
| | - Arup K. Chakraborty
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, 400 Technology Sq., Cambridge, MA 02139, USA
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Boris Julg
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, 400 Technology Sq., Cambridge, MA 02139, USA
|
16
|
Ferguson AL, Ranganathan R. 100th Anniversary of Macromolecular Science Viewpoint: Data-Driven Protein Design. ACS Macro Lett 2021; 10:327-340. [PMID: 35549066 DOI: 10.1021/acsmacrolett.0c00885] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 12/13/2022]
Abstract
The design of synthetic proteins with the desired function is a long-standing goal in biomolecular science, with broad applications in biochemical engineering, agriculture, medicine, and public health. Rational de novo design and experimental directed evolution have achieved remarkable successes but are challenged by the requirement to find functional "needles" in the vast "haystack" of protein sequence space. Data-driven models for fitness landscapes provide a predictive map between protein sequence and function and can prospectively identify functional candidates for experimental testing to greatly improve the efficiency of this search. This Viewpoint reviews the applications of machine learning and, in particular, deep learning as part of data-driven protein engineering platforms. We highlight recent successes, review promising computational methodologies, and provide an outlook on future challenges and opportunities. The article is written for a broad audience comprising both polymer and protein scientists and computer and data scientists interested in an up-to-date review of recent innovations and opportunities in this rapidly evolving field.
Affiliation(s)
- Andrew L. Ferguson
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
| | - Rama Ranganathan
- Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Center for Physics of Evolving Systems, University of Chicago, Chicago, Illinois 60637, United States
- Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
|
17
|
Conti S, Kaczorowski KJ, Song G, Porter K, Andrabi R, Burton DR, Chakraborty AK, Karplus M. Design of immunogens to elicit broadly neutralizing antibodies against HIV targeting the CD4 binding site. Proc Natl Acad Sci U S A 2021; 118:e2018338118. [PMID: 33637649 PMCID: PMC7936365 DOI: 10.1073/pnas.2018338118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 01/12/2023] Open
Abstract
A vaccine which is effective against the HIV virus is considered to be the best solution to the ongoing global HIV/AIDS epidemic. In the past thirty years, numerous attempts to develop an effective vaccine have been made with little or no success, due, in large part, to the high mutability of the virus. More recent studies showed that a vaccine able to elicit broadly neutralizing antibodies (bnAbs), that is, antibodies that can neutralize a high fraction of global virus variants, has promise to protect against HIV. Such a vaccine has been proposed to involve at least three separate stages: First, activate the appropriate precursor B cells; second, shepherd affinity maturation along pathways toward bnAbs; and, third, polish the Ab response to bind with high affinity to diverse HIV envelopes (Env). This final stage may require immunization with a mixture of Envs. In this paper, we set up a framework based on theory and modeling to design optimal panels of antigens to use in such a mixture. The designed antigens are characterized experimentally and are shown to be stable and to be recognized by known HIV antibodies.
Affiliation(s)
- Simone Conti
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138
| | - Kevin J Kaczorowski
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Ge Song
- Scripps Consortium for HIV/AIDS Vaccine Development, The Scripps Research Institute, La Jolla, CA 92037
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037
| | - Katelyn Porter
- Scripps Consortium for HIV/AIDS Vaccine Development, The Scripps Research Institute, La Jolla, CA 92037
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037
| | - Raiees Andrabi
- Scripps Consortium for HIV/AIDS Vaccine Development, The Scripps Research Institute, La Jolla, CA 92037
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037
| | - Dennis R Burton
- Scripps Consortium for HIV/AIDS Vaccine Development, The Scripps Research Institute, La Jolla, CA 92037
- IAVI Neutralizing Antibody Center, The Scripps Research Institute, La Jolla, CA 92037
- Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA 92037
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02139
| | - Arup K Chakraborty
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02139
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
| | - Martin Karplus
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138
- Laboratoire de Chimie Biophysique, Institut de Science et d'Ingénierie Supramoléculaires, Université de Strasbourg, 67000 Strasbourg, France
|
18
|
Puller V, Sagulenko P, Neher RA. Efficient inference, potential, and limitations of site-specific substitution models. Virus Evol 2020; 6:veaa066. [PMID: 33343922 PMCID: PMC7733610 DOI: 10.1093/ve/veaa066] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
Abstract
Natural selection imposes a complex filter on which variants persist in a population, resulting in evolutionary patterns that vary greatly along the genome. Some sites evolve close to neutrally, while others are highly conserved, allow only specific states, or only change in concert with other sites. On the one hand, such constraints on sequence evolution can be used to infer biological function; on the other hand, they need to be accounted for in phylogenetic reconstruction. Phylogenetic models often account for this complexity by partitioning sites into a small number of discrete classes with different rates and/or state preferences. Appropriate model complexity is typically determined by model selection procedures. Here, we present an efficient algorithm to estimate more complex models that allow for different preferences at every site and explore the accuracy at which such models can be estimated from simulated data. Our iterative approximate maximum likelihood scheme uses information in the data efficiently and accurately estimates site-specific preferences from large data sets with moderately diverged sequences and known topology. However, the joint estimation of site-specific rates, site-specific preferences, and phylogenetic branch length can suffer from identifiability problems, while ignoring variation in preferences across sites results in branch length underestimates. Site-specific preferences estimated from large HIV pol alignments show qualitative concordance with intra-host estimates of fitness costs. Analysis of these substitution models suggests near saturation of divergence after a few hundred years. Such saturation can explain the inability to infer deep divergence times of HIV and SIVs using molecular clock approaches and time-dependent rate estimates.
Affiliation(s)
- Vadim Puller
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 61, Basel, Switzerland
| | - Pavel Sagulenko
- Max Planck Institute for Developmental Biology, Max-Planck-Ring 5, 72076 Tübingen, Germany
| | - Richard A Neher
- Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Klingelbergstrasse 61, Basel, Switzerland
|
19
|
Zeng HL, Dichio V, Rodríguez Horta E, Thorell K, Aurell E. Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes. Proc Natl Acad Sci U S A 2020; 117:31519-31526. [PMID: 33203681 PMCID: PMC7733830 DOI: 10.1073/pnas.2012331117] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Indexed: 12/19/2022] Open
Abstract
Genome-wide epistasis analysis is a powerful tool to infer gene interactions, which can guide drug and vaccine development and lead to deeper understanding of microbial pathogenesis. We have considered all complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes deposited in the Global Initiative on Sharing All Influenza Data (GISAID) repository until four different cutoff dates, and used direct coupling analysis together with an assumption of quasi-linkage equilibrium to infer epistatic contributions to fitness from polymorphic loci. We find eight interactions, of which three are between pairs where one locus lies in gene ORF3a, both loci holding nonsynonymous mutations. We also find interactions between two loci in gene nsp13, both holding nonsynonymous mutations, and four interactions involving one locus holding a synonymous mutation. Altogether, we infer interactions between loci in viral genes ORF3a and nsp2, nsp12, and nsp6, between ORF8 and nsp4, and between loci in genes nsp2, nsp13, and nsp14. The paper opens the prospect to use prominent epistatically linked pairs as a starting point to search for combinatorial weaknesses of recombinant viral pathogens.
Affiliation(s)
- Hong-Li Zeng
- New Energy Technology Engineering Laboratory of Jiangsu Province, School of Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Nordic Institute for Theoretical Physics, Royal Institute of Technology and Stockholm University, 10691 Stockholm, Sweden
| | - Vito Dichio
- Nordic Institute for Theoretical Physics, Royal Institute of Technology and Stockholm University, 10691 Stockholm, Sweden
- Department of Physics, University of Trieste, 34151 Trieste, Italy
- Department of Computational Science and Technology, AlbaNova University Center, 10691 Stockholm, Sweden
| | - Edwin Rodríguez Horta
- Group of Complex Systems and Statistical Physics, Department of Theoretical Physics, Physics Faculty, University of Havana, 10400 Havana, Cuba
| | - Kaisa Thorell
- Department of Infectious Diseases, Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 40530 Gothenburg, Sweden
- Center for Translational Microbiome Research, Department of Microbiology, Cell and Tumor Biology, Karolinska Institutet, 17177 Stockholm, Sweden
| | - Erik Aurell
- Department of Computational Science and Technology, AlbaNova University Center, 10691 Stockholm, Sweden
|
20
|
Gao A, Chen Z, Segal FP, Carrington M, Streeck H, Chakraborty AK, Julg B. Predicting the Immunogenicity of T cell epitopes: From HIV to SARS-CoV-2. bioRxiv 2020:2020.05.14.095885. [PMID: 32511339 PMCID: PMC7241102 DOI: 10.1101/2020.05.14.095885] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 02/07/2023]
Abstract
We describe a physics-based learning model for predicting the immunogenicity of Cytotoxic T Lymphocyte (CTL) epitopes derived from diverse pathogens, given a Human Leukocyte Antigen (HLA) genotype. The model was trained and tested on experimental data on the relative immunodominance of CTL epitopes in Human Immunodeficiency Virus infection. The method is more accurate than publicly available models. Our model predicts that only a fraction of SARS-CoV-2 epitopes that have been predicted to bind to HLA molecules is immunogenic. The immunogenic CTL epitopes across all SARS-CoV-2 proteins are predicted to provide broad population coverage, but the immunogenic epitopes in the SARS-CoV-2 spike protein alone are unlikely to do so. Our model predicts that several immunogenic SARS-CoV-2 CTL epitopes are identical to those contained in low-pathogenicity coronaviruses circulating in the population. Thus, we suggest that some level of CTL immunity against COVID-19 may be present in some individuals prior to SARS-CoV-2 infection.
Affiliation(s)
- Ang Gao
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Zhilin Chen
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, Cambridge, MA 02139, USA
| | | | - Mary Carrington
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, Cambridge, MA 02139, USA
- Basic Science Program, Frederick National Laboratory for Cancer Research, Frederick, MD 21702, USA
| | - Hendrik Streeck
- Institut für Virologie, Universitätsklinikum Bonn, 53127 Bonn, Germany
| | - Arup K. Chakraborty
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, Cambridge, MA 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
| | - Boris Julg
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology, and Harvard, Cambridge, MA 02139, USA
|
21
|
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/27/2022] Open
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design. We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
|
22
|
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape. Nat Commun 2020; 11:377. [PMID: 31953427 PMCID: PMC6969152 DOI: 10.1038/s41467-019-14174-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/25/2018] [Accepted: 12/16/2019] [Indexed: 01/08/2023] Open
Abstract
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations. Poliovirus has a higher mutation rate than HIV, yet has been almost eradicated by vaccination while an effective vaccine against HIV does not exist. Here, the authors develop a fitness model for poliovirus viral protein 1 to show that it is subject to stringent evolutionary constraints that limit its ability to avoid vaccine-induced immune responses.
|
23
|
Biswas A, Haldane A, Arnold E, Levy RM. Epistasis and entrenchment of drug resistance in HIV-1 subtype B. eLife 2019; 8:e50524. [PMID: 31591964 PMCID: PMC6783267 DOI: 10.7554/elife.50524] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/25/2019] [Accepted: 09/09/2019] [Indexed: 12/17/2022] Open
Abstract
The development of drug resistance in HIV is the result of primary mutations whose effects on viral fitness depend on the entire genetic background, a phenomenon called 'epistasis'. Based on protein sequences derived from drug-experienced patients in the Stanford HIV database, we use a co-evolutionary (Potts) Hamiltonian model to provide direct confirmation of epistasis involving many simultaneous mutations. Building on earlier work, we show that primary mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type background, and provide the first confirmation of entrenchment for all three drug-target proteins: protease, reverse transcriptase, and integrase; a comparative analysis reveals that NNRTI-induced mutations behave differently from the others. We further show that the likelihood of resistance mutations can vary widely in patient populations, and from the population average compared to specific molecular clones.
Affiliation(s)
- Avik Biswas
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
- Department of Physics, Temple University, Philadelphia, United States
| | - Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
- Department of Physics, Temple University, Philadelphia, United States
| | - Eddy Arnold
- Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, United States
- Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, United States
| | - Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
- Department of Physics, Temple University, Philadelphia, United States
- Department of Chemistry, Temple University, Philadelphia, United States
|
24
|
Henes M, Kosovrasti K, Lockbaum GJ, Leidner F, Nachum GS, Nalivaika EA, Bolon DN, Yilmaz NK, Schiffer CA, Whitfield TW. Molecular Determinants of Epistasis in HIV-1 Protease: Elucidating the Interdependence of L89V and L90M Mutations in Resistance. Biochemistry 2019; 58:3711-3726. [PMID: 31386353 PMCID: PMC6941756 DOI: 10.1021/acs.biochem.9b00446] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/07/2023]
Abstract
Protease inhibitors have the highest potency among antiviral therapies against HIV-1 infections, yet the virus can evolve resistance. Darunavir (DRV), currently the most potent Food and Drug Administration-approved protease inhibitor, retains potency against single-site mutations. However, complex combinations of mutations can confer resistance to DRV. While the interdependence between mutations within HIV-1 protease is key for inhibitor potency, the molecular mechanisms that underlie this control remain largely unknown. In this study, we investigated the interdependence between the L89V and L90M mutations and their effects on DRV binding. These two mutations have been reported to be positively correlated with one another in HIV-1 patient-derived protease isolates, with the presence of one mutation making the probability of the occurrence of the second mutation more likely. The focus of our investigation is a patient-derived isolate, with 24 mutations that we call "KY"; this variant includes the L89V and L90M mutations. Three additional KY variants with back-mutations, KY(V89L), KY(M90L), and the KY(V89L/M90L) double mutation, were used to experimentally assess the individual and combined effects of these mutations on DRV inhibition and substrate processing. The enzymatic assays revealed that the KY(V89L) variant, with methionine at residue 90, is highly resistant, but its catalytic function is compromised. When a leucine to valine mutation at residue 89 is present simultaneously with the L90M mutation, a rescue of catalytic efficiency is observed. Molecular dynamics simulations of these DRV-bound protease variants reveal how the L90M mutation induces structural changes throughout the enzyme that undermine the binding interactions.
Affiliation(s)
- Mina Henes
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Klajdi Kosovrasti
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Gordon J. Lockbaum
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Florian Leidner
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Gily S. Nachum
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Ellen A. Nalivaika
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Daniel N.A. Bolon
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Nese Kurt Yilmaz
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
| | - Celia A. Schiffer
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA. Corresponding authors: Celia A. Schiffer (phone: +1 508 856 8008) and Troy W. Whitfield (phone: +1 508 856 4401)
| | - Troy W. Whitfield
- Department of Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA; Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA. Corresponding authors: Celia A. Schiffer (phone: +1 508 856 8008) and Troy W. Whitfield (phone: +1 508 856 4401)
|
25
|
Barton JP, Rajkoomar E, Mann JK, Murakowski DK, Toyoda M, Mahiti M, Mwimanzi P, Ueno T, Chakraborty AK, Ndung'u T. Modelling and in vitro testing of the HIV-1 Nef fitness landscape. Virus Evol 2019; 5:vez029. [PMID: 31392033 PMCID: PMC6680064 DOI: 10.1093/ve/vez029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/21/2022] Open
Abstract
An effective vaccine is urgently required to curb the HIV-1 epidemic. We have previously described an approach to model the fitness landscape of several HIV-1 proteins, and have validated the results against experimental and clinical data. The fitness landscape may be used to identify mutation patterns harmful to virus viability, and consequently inform the design of immunogens that can target such regions for immunological control. Here we apply such an analysis and complementary experiments to HIV-1 Nef, a multifunctional protein which plays a key role in HIV-1 pathogenesis. We measured Nef-driven replication capacities as well as Nef-mediated CD4 and HLA-I down-modulation capacities of thirty-two different Nef mutants, and tested model predictions against these results. Furthermore, we evaluated the models using 448 patient-derived Nef sequences for which several Nef activities were previously measured. Model predictions correlated significantly with Nef-driven replication and CD4 down-modulation capacities, but not HLA-I down-modulation capacities, of the various Nef mutants. Similarly, in our analysis of patient-derived Nef sequences, CD4 down-modulation capacity correlated the most significantly with model predictions, suggesting that of the tested Nef functions, this is the most important in vivo. Overall, our results highlight how the fitness landscape inferred from patient-derived sequences captures, at least in part, the in vivo functional effects of mutations to Nef. However, the correlation between predictions of the fitness landscape and measured parameters of Nef function is not as accurate as the correlation observed in past studies for other proteins. This may be because of the additional complexity associated with inferring the cost of mutations on the diverse functions of Nef.
Affiliation(s)
- John P Barton
- Departments of Chemical Engineering, Physics, and Chemistry, Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Erasha Rajkoomar
- HIV Pathogenesis Programme, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
- Jaclyn K Mann
- HIV Pathogenesis Programme, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
- Dariusz K Murakowski
- Departments of Chemical Engineering, Physics, and Chemistry, Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mako Toyoda
- Center for AIDS Research, Kumamoto University, Kumamoto, Japan
- Takamasa Ueno
- Center for AIDS Research, Kumamoto University, Kumamoto, Japan; International Research Center for Medical Sciences (IRCMS), Kumamoto University, Kumamoto, Japan
- Arup K Chakraborty
- Departments of Chemical Engineering, Physics, and Chemistry, Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Thumbi Ndung'u
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Boston, MA, USA; HIV Pathogenesis Programme, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa; Africa Health Research Institute, Durban, South Africa; Max Planck Institute for Infection Biology, Chariteplatz, D-10117 Berlin, Germany
|
26
|
Boucher JI, Whitfield TW, Dauphin A, Nachum G, Hollins C, Zeldovich KB, Swanstrom R, Schiffer CA, Luban J, Bolon DNA. Constrained Mutational Sampling of Amino Acids in HIV-1 Protease Evolution. Mol Biol Evol 2019; 36:798-810. [PMID: 30721995 DOI: 10.1093/molbev/msz022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022]
Abstract
The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impact of these factors on protein sequence evolution is interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available data set of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple-base mutations in HIV-1 protease is strongly influenced by mutational sampling.
Affiliation(s)
- Jeffrey I Boucher
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Troy W Whitfield
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA; Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA
- Ann Dauphin
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Gily Nachum
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Carl Hollins
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Konstantin B Zeldovich
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Ronald Swanstrom
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC
- Celia A Schiffer
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Jeremy Luban
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA; Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Daniel N A Bolon
- Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
|
27
|
Haldane A, Flynn WF, He P, Levy RM. Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs. Biophys J 2018; 114:21-31. [PMID: 29320688 DOI: 10.1016/j.bpj.2017.10.028] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/20/2017] [Revised: 09/11/2017] [Accepted: 10/17/2017] [Indexed: 01/25/2023]
Abstract
The protein kinase catalytic domain is one of the most abundant domains across all branches of life. Although kinases share a common core function of phosphoryl-transfer, they also have wide functional diversity and play varied roles in cell signaling networks, and for this reason are implicated in a number of human diseases. This functional diversity is primarily achieved through sequence variation, and uncovering the sequence-function relationships for the kinase family is a major challenge. In this study we use a statistical inference technique inspired by statistical physics, which builds a coevolutionary "Potts" Hamiltonian model of sequence variation in a protein family. We show how this model has sufficient power to predict the probability of specific subsequences in the highly diverged kinase family, which we verify by comparing the model's predictions with experimental observations in the Uniprot database. We show that the pairwise (residue-residue) interaction terms of the statistical model are necessary and sufficient to capture higher-than-pairwise mutation patterns of natural kinase sequences. We observe that previously identified functional sets of residues have much stronger correlated interaction scores than are typical.
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Peng He
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
|
28
|
Identifying immunologically-vulnerable regions of the HCV E2 glycoprotein and broadly neutralizing antibodies that target them. Nat Commun 2019; 10:2073. [PMID: 31061402 PMCID: PMC6502829 DOI: 10.1038/s41467-019-09819-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/12/2018] [Accepted: 04/02/2019] [Indexed: 02/06/2023]
Abstract
Isolation of broadly neutralizing human monoclonal antibodies (HmAbs) targeting the E2 glycoprotein of Hepatitis C virus (HCV) has sparked hope for effective vaccine development. Nonetheless, escape mutations have been reported. Ideally, a potent vaccine should elicit HmAbs that target regions of E2 that are most difficult to escape. Here, aimed at addressing this challenge, we develop a predictive in-silico evolutionary model for E2 that identifies one such region, a specific antigenic domain, making it an attractive target for a robust antibody response. Specific broadly neutralizing HmAbs that appear difficult to escape from are also identified. By providing a framework for identifying vulnerable regions of E2 and for assessing the potency of specific antibodies, our results can aid the rational design of an effective prophylactic HCV vaccine. A good vaccine should direct the immune response to virus regions that are most difficult to escape. Here, Quadeer et al. develop a predictive in-silico evolutionary model for HCV E2 which identifies one such antigenic region and identifies multiple broadly neutralizing human antibodies that appear difficult to escape from.
|
29
|
Dingens AS, Arenz D, Weight H, Overbaugh J, Bloom JD. An Antigenic Atlas of HIV-1 Escape from Broadly Neutralizing Antibodies Distinguishes Functional and Structural Epitopes. Immunity 2019; 50:520-532.e3. [PMID: 30709739 PMCID: PMC6435357 DOI: 10.1016/j.immuni.2018.12.017] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 08/31/2018] [Revised: 11/16/2018] [Accepted: 12/14/2018] [Indexed: 11/18/2022]
Abstract
Anti-HIV broadly neutralizing antibodies (bnAbs) have revealed vaccine targets on the virus's envelope (Env) protein and are themselves promising immunotherapies. The efficacy of bnAb-based therapies and vaccines depends in part on how readily the virus can escape neutralization. Although structural studies can define contacts between bnAbs and Env, only functional studies can define mutations that confer escape. Here, we mapped how all possible single amino acid mutations in Env affect neutralization of HIV by nine bnAbs targeting five epitopes. For most bnAbs, mutations at only a small fraction of structurally defined contact sites mediated escape, and most escape occurred at sites near, but not in direct contact with, the antibody. The Env mutations selected by two pooled bnAbs were similar to those expected from the combination of the bnAbs' independent action. Overall, our mutation-level antigenic atlas provides a comprehensive dataset for understanding viral immune escape and refining therapies and vaccines.
Affiliation(s)
- Adam S Dingens
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Molecular & Cellular Biology PhD Program, University of Washington, Seattle, WA 98195, USA; Division of Human Biology and Epidemiology Program, Seattle, WA 98109, USA
- Dana Arenz
- Division of Human Biology and Epidemiology Program, Seattle, WA 98109, USA
- Haidyn Weight
- Division of Human Biology and Epidemiology Program, Seattle, WA 98109, USA
- Julie Overbaugh
- Division of Human Biology and Epidemiology Program, Seattle, WA 98109, USA
- Jesse D Bloom
- Basic Sciences Division and Computational Biology Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; Howard Hughes Medical Institute, Seattle, WA 98109, USA
|
30
|
Gao CY, Cecconi F, Vulpiani A, Zhou HJ, Aurell E. DCA for genome-wide epistasis analysis: the statistical genetics perspective. Phys Biol 2019; 16:026002. [PMID: 30605896 DOI: 10.1088/1478-3975/aafbe0] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Direct coupling analysis (DCA) is a now widely used method to leverage statistical information from many similar biological systems to draw meaningful conclusions on each system separately. DCA has been applied with great success to sequences of homologous proteins, and also more recently to whole-genome population-wide sequencing data. We here argue that the use of DCA on the genome scale is contingent on fundamental issues of population genetics. DCA can be expected to yield meaningful results when a population is in the quasi-linkage equilibrium (QLE) phase studied by Kimura and others, but not, for instance, in a phase of clonal competition. We discuss how the exponential (Potts model) distributions emerge in QLE, and compare couplings to correlations obtained in a study of about 3000 genomes of the human pathogen Streptococcus pneumoniae.
Affiliation(s)
- Chen-Yi Gao
- Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China. School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
|
31
|
Hart GR, Ferguson AL. Computational design of hepatitis C virus immunogens from host-pathogen dynamics over empirical viral fitness landscapes. Phys Biol 2018; 16:016004. [PMID: 30484433 DOI: 10.1088/1478-3975/aaeec0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
Hepatitis C virus (HCV) afflicts 170 million people and kills 700 000 annually. Vaccination offers the most realistic and cost effective hope of controlling this epidemic, but despite 25 years of research, no vaccine is available. A major obstacle is HCV's extreme genetic variability and rapid mutational escape from immune pressure. Coupling maximum entropy inference with population dynamics simulations, we have employed a computational approach to translate HCV sequence databases into empirical landscapes of viral fitness and simulate the intrahost evolution of the viral quasispecies over these landscapes. We explicitly model the coupled host-pathogen dynamics by combining agent-based models of viral mutation with stochastically-integrated coupled ordinary differential equations for the host immune response. We validate our model in predicting the mutational evolution of the HCV RNA-dependent RNA polymerase (protein NS5B) within seven individuals for whom longitudinal sequencing data is available. We then use our approach to perform exhaustive in silico evaluation of putative immunogen candidates to rationally design tailored vaccines to simultaneously cripple viral fitness and block mutational escape within two selected individuals. By systematically identifying a small number of promising vaccine candidates, our empirical fitness landscapes and host-pathogen dynamics simulator can guide and accelerate experimental vaccine design efforts.
Affiliation(s)
- Gregory R Hart
- Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, IL 61801, United States of America. Present address: Department of Therapeutic Radiology, Yale University, 202 LLCI, 15 York Street, New Haven, CT 06510, United States of America
|
32
|
Nelson ED, Grishin NV. Inference of epistatic effects in a key mitochondrial protein. Phys Rev E 2018; 97:062404. [PMID: 30011480 DOI: 10.1103/physreve.97.062404] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/02/2017] [Indexed: 12/17/2022]
Abstract
We use Potts model inference to predict pair epistatic effects in a key mitochondrial protein, cytochrome c oxidase subunit 2, for ray-finned fishes. We examine the effect of phylogenetic correlations on our predictions using a simple exact fitness model, and we find that, although epistatic effects are underpredicted, they maintain a roughly linear relationship to their true (model) values. After accounting for this correction, epistatic effects in the protein are still relatively weak, leading to fitness valleys of depth 2Ns≃-5 in compensatory double mutants. Interestingly, positive epistasis is more pronounced than negative epistasis, and the strongest positive effects capture nearly all sites subject to positive selection in fishes, similar to virus proteins evolving under selection pressure in the context of drug therapy.
Affiliation(s)
- Erik D Nelson
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 6001 Forest Park Blvd., Room ND10.124, Dallas, Texas 75235-9050, USA
|
33
|
Co-evolution networks of HIV/HCV are modular with direct association to structure and function. PLoS Comput Biol 2018; 14:e1006409. [PMID: 30192744 PMCID: PMC6145588 DOI: 10.1371/journal.pcbi.1006409] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Revised: 09/19/2018] [Accepted: 07/31/2018] [Indexed: 01/09/2023]
Abstract
Mutational correlation patterns found in population-level sequence data for the Human Immunodeficiency Virus (HIV) and the Hepatitis C Virus (HCV) have been demonstrated to be informative of viral fitness. Such patterns can be seen as footprints of the intrinsic functional constraints placed on viral evolution under diverse selective pressures. Here, considering multiple HIV and HCV proteins, we demonstrate that these mutational correlations encode a modular co-evolutionary structure that is tightly linked to the structural and functional properties of the respective proteins. Specifically, by introducing a robust statistical method based on sparse principal component analysis, we identify near-disjoint sets of collectively-correlated residues (sectors) having mostly a one-to-one association to largely distinct structural or functional domains. This suggests that the distinct phenotypic properties of HIV/HCV proteins often give rise to quasi-independent modes of evolution, with each mode involving a sparse and localized network of mutational interactions. Moreover, individual inferred sectors of HIV are shown to carry immunological significance, providing insight for guiding targeted vaccine strategies.
|
34
|
Anishchenko I, Kundrotas PJ, Vakser IA. Contact Potential for Structure Prediction of Proteins and Protein Complexes from Potts Model. Biophys J 2018; 115:809-821. [PMID: 30122295 DOI: 10.1016/j.bpj.2018.07.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/07/2018] [Revised: 07/16/2018] [Accepted: 07/31/2018] [Indexed: 12/18/2022]
Abstract
The energy function is the key component of protein modeling methodology. This work presents a semianalytical approach to the development of contact potentials for protein structure modeling. Residue-residue and atom-atom contact energies were derived by maximizing the probability of observing native sequences in a nonredundant set of protein structures. The optimization task was formulated as an inverse statistical mechanics problem applied to the Potts model. Its solution by pseudolikelihood maximization provides consistent estimates of coupling constants at atomic and residue levels. The best performance was achieved when interacting atoms were grouped according to their physicochemical properties. For individual protein structures, the performance of the contact potentials in distinguishing near-native structures from the decoys is similar to the top-performing scoring functions. The potentials also yielded significant improvement in the protein docking success rates. The potentials recapitulated experimentally determined protein stability changes upon point mutations and protein-protein binding affinities. The approach offers a different perspective on knowledge-based potentials and may serve as the basis for their further development.
Affiliation(s)
- Ivan Anishchenko
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
- Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
- Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
|
35
|
De Martino A, De Martino D. An introduction to the maximum entropy approach and its application to inference problems in biology. Heliyon 2018; 4:e00596. [PMID: 29862358 PMCID: PMC5968179 DOI: 10.1016/j.heliyon.2018.e00596] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 01/31/2018] [Revised: 03/31/2018] [Accepted: 04/03/2018] [Indexed: 11/15/2022]
Abstract
A cornerstone of statistical inference, the maximum entropy framework is being increasingly applied to construct descriptive and predictive models of biological systems, especially complex biological networks, from large experimental data sets. Both its broad applicability and the success it obtained in different contexts hinge upon its conceptual simplicity and mathematical soundness. Here we try to concisely review the basic elements of the maximum entropy principle, starting from the notion of 'entropy', and describe its usefulness for the analysis of biological systems. As examples, we focus specifically on the problem of reconstructing gene interaction networks from expression data and on recent work attempting to expand our system-level understanding of bacterial metabolism. Finally, we highlight some extensions and potential limitations of the maximum entropy approach, and point to more recent developments that are likely to play a key role in the upcoming challenges of extracting structures and information from increasingly rich, high-throughput biological data.
Affiliation(s)
- Andrea De Martino
- Soft & Living Matter Lab, Institute of Nanotechnology (NANOTEC), Consiglio Nazionale delle Ricerche, Rome, Italy
- Italian Institute for Genomic Medicine (IIGM), Turin, Italy
|
36
|
Schubert B, Schärfe C, Dönnes P, Hopf T, Marks D, Kohlbacher O. Population-specific design of de-immunized protein biotherapeutics. PLoS Comput Biol 2018; 14:e1005983. [PMID: 29499035 PMCID: PMC5851651 DOI: 10.1371/journal.pcbi.1005983] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/31/2017] [Revised: 03/14/2018] [Accepted: 01/15/2018] [Indexed: 11/19/2022]
Abstract
Immunogenicity is a major problem during the development of biotherapeutics since it can lead to rapid clearance of the drug and adverse reactions. The challenge for biotherapeutic design is therefore to identify mutants of the protein sequence that minimize immunogenicity in a target population whilst retaining pharmaceutical activity and protein function. Current approaches are moderately successful in designing sequences with reduced immunogenicity, but do not account for the varying frequencies of different human leucocyte antigen alleles in a specific population and in addition, since many designs are non-functional, require costly experimental post-screening. Here, we report a new method for de-immunization design using multi-objective combinatorial optimization. The method simultaneously optimizes the likelihood of a functional protein sequence at the same time as minimizing its immunogenicity tailored to a target population. We bypass the need for three-dimensional protein structure or molecular simulations to identify functional designs by automatically generating sequences using probabilistic models that have been used previously for mutation effect prediction and structure prediction. As proof-of-principle we designed sequences of the C2 domain of Factor VIII and tested them experimentally, resulting in a good correlation with the predicted immunogenicity of our model.
Affiliation(s)
- Benjamin Schubert
- Center for Bioinformatics, University of Tübingen, Tübingen, Germany
- Applied Bioinformatics, Dept. of Computer Science, Tübingen, Germany
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Charlotta Schärfe
- Center for Bioinformatics, University of Tübingen, Tübingen, Germany
- Applied Bioinformatics, Dept. of Computer Science, Tübingen, Germany
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Pierre Dönnes
- Center for Bioinformatics, University of Tübingen, Tübingen, Germany
- SciCross AB, Skövde, Sweden
- Thomas Hopf
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Debora Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America
- Oliver Kohlbacher
- Center for Bioinformatics, University of Tübingen, Tübingen, Germany
- Applied Bioinformatics, Dept. of Computer Science, Tübingen, Germany
- Quantitative Biology Center, Tübingen, Germany
- Faculty of Medicine, University of Tübingen, Tübingen, Germany
- Biomolecular Interactions, Max Planck Institute for Developmental Biology, Tübingen, Germany
|
37
|
Louie RHY, Kaczorowski KJ, Barton JP, Chakraborty AK, McKay MR. Fitness landscape of the human immunodeficiency virus envelope protein that is targeted by antibodies. Proc Natl Acad Sci U S A 2018; 115:E564-E573. [PMID: 29311326 PMCID: PMC5789945 DOI: 10.1073/pnas.1717765115] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Indexed: 01/16/2023]
Abstract
HIV is a highly mutable virus, and over 30 years after its discovery, a vaccine or cure is still not available. The isolation of broadly neutralizing antibodies (bnAbs) from HIV-infected patients has led to renewed hope for a prophylactic vaccine capable of combating the scourge of HIV. A major challenge is the design of immunogens and vaccination protocols that can elicit bnAbs that target regions of the virus's spike proteins where the likelihood of mutational escape is low due to the high fitness cost of mutations. Related challenges include the choice of combinations of bnAbs for therapy. An accurate representation of viral fitness as a function of its protein sequences (a fitness landscape), with explicit accounting of the effects of coupling between mutations, could help address these challenges. We describe a computational approach that has allowed us to infer a fitness landscape for gp160, the HIV polyprotein that comprises the viral spike that is targeted by antibodies. We validate the inferred landscape through comparisons with experimental fitness measurements, and various other metrics. We show that an effective antibody that prevents immune escape must selectively bind to high escape cost residues that are surrounded by those where mutations incur a low fitness cost, motivating future applications of our landscape for immunogen design.
Affiliation(s)
- Raymond H Y Louie
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Institute for Advanced Study, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Kevin J Kaczorowski
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- John P Barton
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139
- Arup K Chakraborty
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
- Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
- Matthew R McKay
- Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
|
38
|
Lee AA, Brenner MP, Colwell LJ. Optimal Design of Experiments by Combining Coarse and Fine Measurements. Phys Rev Lett 2017; 119:208101. [PMID: 29219382 DOI: 10.1103/physrevlett.119.208101] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/31/2017] [Indexed: 06/07/2023]
Abstract
In many contexts, it is extremely costly to perform enough high-quality experimental measurements to accurately parametrize a predictive quantitative model. However, it is often much easier to carry out large numbers of experiments that indicate whether each sample is above or below a given threshold. Can many such categorical or "coarse" measurements be combined with a much smaller number of high-resolution or "fine" measurements to yield accurate models? Here, we demonstrate an intuitive strategy, inspired by statistical physics, wherein the coarse measurements are used to identify the salient features of the data, while the fine measurements determine the relative importance of these features. A linear model is inferred from the fine measurements, augmented by a quadratic term that captures the correlation structure of the coarse data. We illustrate our strategy by considering the problems of predicting the antimalarial potency and aqueous solubility of small organic molecules from their 2D molecular structure.
Affiliation(s)
- Alpha A Lee
- Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, United Kingdom and School of Engineering and Applied Sciences and Kavli Institute of Bionano Science and Technology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Michael P Brenner
- School of Engineering and Applied Sciences and Kavli Institute of Bionano Science and Technology, Harvard University, Cambridge, Massachusetts 02138, USA
| | - Lucy J Colwell
- Department of Chemistry, University of Cambridge, CB2 1EW Cambridge, United Kingdom
|
39
|
Flynn WF, Haldane A, Torbett BE, Levy RM. Inference of Epistatic Effects Leading to Entrenchment and Drug Resistance in HIV-1 Protease. Mol Biol Evol 2017; 34:1291-1306. [PMID: 28369521 PMCID: PMC5435099 DOI: 10.1093/molbev/msx095] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022] Open
Abstract
Understanding the complex mutation patterns that give rise to drug resistant viral strains provides a foundation for developing more effective treatment strategies for HIV/AIDS. Multiple sequence alignments of drug-experienced HIV-1 protease sequences contain networks of many pair correlations which can be used to build a (Potts) Hamiltonian model of these mutation patterns. Using this Hamiltonian model, we translate HIV-1 protease sequence covariation data into quantitative predictions for the probability of observing specific mutation patterns which are in agreement with the observed sequence statistics. We find that the statistical energies of the Potts model are correlated with the fitness of individual proteins containing therapy-associated mutations as estimated by in vitro measurements of protein stability and viral infectivity. We show that the penalty for acquiring primary resistance mutations depends on the epistatic interactions with the sequence background. Primary mutations which lead to drug resistance can become highly advantageous (or entrenched) by the complex mutation patterns which arise in response to drug therapy despite being destabilizing in the wildtype background. Anticipating epistatic effects is important for the design of future protease inhibitor therapies.
Affiliation(s)
- William F. Flynn
- Department of Physics and Astronomy, Rutgers University, New Brunswick, NJ
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
| | - Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Department of Chemistry, Temple University, Philadelphia, PA
| | - Bruce E. Torbett
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
| | - Ronald M. Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Department of Chemistry, Temple University, Philadelphia, PA
|
40
|
Chakraborty AK, Barton JP. Rational design of vaccine targets and strategies for HIV: a crossroad of statistical physics, biology, and medicine. Rep Prog Phys 2017; 80:032601. [PMID: 28059778 DOI: 10.1088/1361-6633/aa574a] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 06/06/2023]
Abstract
Vaccination has saved more lives than any other medical procedure. Pathogens have now evolved that have not succumbed to vaccination using the empirical paradigms pioneered by Pasteur and Jenner. Vaccine design strategies that are based on a mechanistic understanding of the pertinent immunology and virology are required to confront and eliminate these scourges. In this perspective, we describe just a few examples of work aimed to achieve this goal by bringing together approaches from statistical physics with biology and clinical research.
Affiliation(s)
- Arup K Chakraborty
- Departments of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America. Departments of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America. Departments of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America. Departments of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America. Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America. Ragon Institute of MIT, MGH, & Harvard, Cambridge, MA 02139, United States of America
|
41
|
Zanini F, Puller V, Brodin J, Albert J, Neher RA. In vivo mutation rates and the landscape of fitness costs of HIV-1. Virus Evol 2017; 3:vex003. [PMID: 28458914 PMCID: PMC5399928 DOI: 10.1093/ve/vex003] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Indexed: 01/06/2023] Open
Abstract
Mutation rates and fitness costs of deleterious mutations are difficult to measure in vivo but essential for a quantitative understanding of evolution. Using whole genome deep sequencing data from longitudinal samples during untreated HIV-1 infection, we estimated mutation rates and fitness costs in HIV-1 from the dynamics of genetic variation. At approximately neutral sites, mutations accumulate with a rate of $1.2 \times 10^{-5}$ per site per day, in agreement with the rate measured in cell cultures. We estimated the rate from G to A to be the largest, followed by the other transitions C to T, T to C, and A to G, while transversions are less frequent. At other sites, mutations tend to reduce virus replication. We estimated the fitness cost of mutations at every site in the HIV-1 genome using a model of mutation selection balance. About half of all non-synonymous mutations have large fitness costs (>10 percent), while most synonymous mutations have costs <1 percent. The cost of synonymous mutations is especially low in most of pol where we could not detect measurable costs for the majority of synonymous mutations. In contrast, we find high costs for synonymous mutations in important RNA structures and regulatory regions. The intra-patient fitness cost estimates are consistent across multiple patients, indicating that the deleterious part of the fitness landscape is universal and explains a large fraction of global HIV-1 group M diversity.
Affiliation(s)
- Fabio Zanini
- Max Planck Institute for Developmental Biology, Tübingen 72076, Germany
- Department of Bioengineering, Stanford University, Stanford, CA 94305, USA
| | - Vadim Puller
- Max Planck Institute for Developmental Biology, Tübingen 72076, Germany
| | - Johanna Brodin
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, SE-171 76 Stockholm, Sweden
| | - Jan Albert
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institute, SE-171 76 Stockholm, Sweden
- Department of Clinical Microbiology, Karolinska Institute, SE-171 76, Stockholm, Sweden
| | - Richard A. Neher
- Max Planck Institute for Developmental Biology, Tübingen 72076, Germany
|
42
|
Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016; 43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]
Abstract
Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function maintained through evolution. We review recent work with Potts models to predict protein structure and sequence-dependent conformational free energy landscapes, to survey protein fitness landscapes and to explore the effects of epistasis on fitness. We also comment on the numerical methods used to infer these models for each application.
Affiliation(s)
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States.
| | - Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
| | - William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
|
43
|
Dettmer SL, Nguyen HC, Berg J. Network inference in the nonequilibrium steady state. Phys Rev E 2016; 94:052116. [PMID: 27967084 DOI: 10.1103/physreve.94.052116] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/25/2016] [Indexed: 06/06/2023]
Abstract
Nonequilibrium systems lack an explicit characterization of their steady state like the Boltzmann distribution for equilibrium systems. This has drastic consequences for the inference of the parameters of a model when its dynamics lacks detailed balance. Such nonequilibrium systems occur naturally in applications like neural networks and gene regulatory networks. Here, we focus on the paradigmatic asymmetric Ising model and show that we can learn its parameters from independent samples of the nonequilibrium steady state. We present both an exact inference algorithm and a computationally more efficient, approximate algorithm for weak interactions based on a systematic expansion around mean-field theory. Obtaining expressions for magnetizations and two- and three-point spin correlations, we establish that these observables are sufficient to infer the model parameters. Further, we discuss the symmetries characterizing the different orders of the expansion around the mean field and show how different types of dynamics can be distinguished on the basis of samples from the nonequilibrium steady state.
Affiliation(s)
- Simon L Dettmer
- Institute for Theoretical Physics, University of Cologne, Zülpicher Straße 77, 50937 Cologne, Germany
| | - H Chau Nguyen
- Max-Planck-Institut für Physik komplexer Systeme, Nöthnitzer Str. 38, 01187 Dresden, Germany
| | - Johannes Berg
- Institute for Theoretical Physics, University of Cologne, Zülpicher Straße 77, 50937 Cologne, Germany
|
44
|
Barton JP, De Leonardis E, Coucke A, Cocco S. ACE: adaptive cluster expansion for maximum entropy graphical model inference. Bioinformatics 2016; 32:3089-3097. [PMID: 27329863 DOI: 10.1093/bioinformatics/btw328] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 03/25/2016] [Accepted: 05/18/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Graphical models are often employed to interpret patterns of correlations observed in data through a network of interactions between the variables. Recently, Ising/Potts models, also known as Markov random fields, have been productively applied to diverse problems in biology, including the prediction of structural contacts from protein sequence data and the description of neural activity patterns. However, inference of such models is a challenging computational problem that cannot be solved exactly. Here, we describe the adaptive cluster expansion (ACE) method to quickly and accurately infer Ising or Potts models based on correlation data. ACE avoids overfitting by constructing a sparse network of interactions sufficient to reproduce the observed correlation data within the statistical error expected due to finite sampling. When convergence of the ACE algorithm is slow, we combine it with a Boltzmann Machine Learning algorithm (BML). We illustrate this method on a variety of biological and artificial datasets and compare it to state-of-the-art approximate methods such as Gaussian and pseudo-likelihood inference. RESULTS We show that ACE accurately reproduces the true parameters of the underlying model when they are known, and yields accurate statistical descriptions of both biological and artificial data. Models inferred by ACE more accurately describe the statistics of the data, including both the constrained low-order correlations and unconstrained higher-order correlations, compared to those obtained by faster Gaussian and pseudo-likelihood methods. These alternative approaches can recover the structure of the interaction network but typically not the correct strength of interactions, resulting in less accurate generative models. 
AVAILABILITY AND IMPLEMENTATION The ACE source code, user manual and tutorials with the example data and filtered correlations described herein are freely available on GitHub at https://github.com/johnbarton/ACE CONTACTS: jpbarton@mit.edu, cocco@lps.ens.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- J P Barton
- Departments of Chemical Engineering and Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139, USA
| | - E De Leonardis
- Laboratoire de Physique Statistique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France; Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Université, Paris, France
| | - A Coucke
- Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Université, Paris, France; Laboratoire de Physique Théorique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France
| | - S Cocco
- Laboratoire de Physique Statistique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France
|
45
|
Barton JP, Goonetilleke N, Butler TC, Walker BD, McMichael AJ, Chakraborty AK. Relative rate and location of intra-host HIV evolution to evade cellular immunity are predictable. Nat Commun 2016; 7:11660. [PMID: 27212475 PMCID: PMC4879252 DOI: 10.1038/ncomms11660] [Citation(s) in RCA: 76] [Impact Index Per Article: 9.5] [Received: 11/02/2015] [Accepted: 04/18/2016] [Indexed: 12/05/2022] Open
Abstract
Human immunodeficiency virus (HIV) evolves within infected persons to escape being destroyed by the host immune system, thereby preventing effective immune control of infection. Here, we combine methods from evolutionary dynamics and statistical physics to simulate in vivo HIV sequence evolution, predicting the relative rate of escape and the location of escape mutations in response to T-cell-mediated immune pressure in a cohort of 17 persons with acute HIV infection. Predicted and clinically observed times to escape immune responses agree well, and we show that the mutational pathways to escape depend on the viral sequence background due to epistatic interactions. The ability to predict escape pathways and the duration over which control is maintained by specific immune responses open the door to rational design of immunotherapeutic strategies that might enable long-term control of HIV infection. Our approach enables intra-host evolution of a human pathogen to be predicted in a probabilistic framework.
Affiliation(s)
- John P. Barton
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Nilu Goonetilleke
- Department of Microbiology, Immunology and Medicine, University of North Carolina, Chapel Hill, North Carolina 27599, USA
- Nuffield Department of Medicine, University of Oxford, Old Road Campus, Headington, Oxford OX3 7FZ, UK
| | - Thomas C. Butler
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Bruce D. Walker
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts 02139, USA
- Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA
| | - Andrew J. McMichael
- Nuffield Department of Medicine, University of Oxford, Old Road Campus, Headington, Oxford OX3 7FZ, UK
| | - Arup K. Chakraborty
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts 02139, USA
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
46
|
Butler TC, Barton JP, Kardar M, Chakraborty AK. Identification of drug resistance mutations in HIV from constraints on natural evolution. Phys Rev E 2016; 93:022412. [PMID: 26986367 DOI: 10.1103/physreve.93.022412] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 08/07/2015] [Indexed: 11/07/2022]
Abstract
Human immunodeficiency virus (HIV) evolves with extraordinary rapidity. However, its evolution is constrained by interactions between mutations in its fitness landscape. Here we show that an Ising model describing these interactions, inferred from sequence data obtained prior to the use of antiretroviral drugs, can be used to identify clinically significant sites of resistance mutations. Successful predictions of the resistance sites indicate progress in the development of successful models of real viral evolution at the single residue level and suggest that our approach may be applied to help design new therapies that are less prone to failure even where resistance data are not yet available.
Affiliation(s)
- Thomas C Butler
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA
| | - John P Barton
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02139, USA
| | - Mehran Kardar
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
| | - Arup K Chakraborty
- Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Cambridge, Massachusetts 02139, USA; Departments of Chemistry and Biological Engineering, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
47
|
Abstract
Hepatitis C virus (HCV) afflicts 170 million people worldwide, 2%-3% of the global population, and kills 350 000 each year. Prophylactic vaccination offers the most realistic and cost effective hope of controlling this epidemic in the developing world where expensive drug therapies are not available. Despite 20 years of research, the high mutability of the virus and lack of knowledge of what constitutes effective immune responses have impeded development of an effective vaccine. Coupling data mining of sequence databases with spin glass models from statistical physics, we have developed a computational approach to translate clinical sequence databases into empirical fitness landscapes quantifying the replicative capacity of the virus as a function of its amino acid sequence. These landscapes explicitly connect viral genotype to phenotypic fitness, and reveal vulnerable immunological targets within the viral proteome that can be exploited to rationally design vaccine immunogens. We have recovered the empirical fitness landscape for the HCV RNA-dependent RNA polymerase (protein NS5B) responsible for viral genome replication, and validated the predictions of our model by demonstrating excellent accord with experimental measurements and clinical observations. We have used our landscapes to perform exhaustive in silico screening of 16.8 million T-cell immunogen candidates to identify 86 optimal formulations. By reducing the search space of immunogen candidates by over five orders of magnitude, our approach can offer valuable savings in time, expense, and labor for experimental vaccine development and accelerate the search for a HCV vaccine. ABBREVIATIONS: HCV, hepatitis C virus; HLA, human leukocyte antigen; CTL, cytotoxic T lymphocyte; NS5B, nonstructural protein 5B; MSA, multiple sequence alignment; PEG-IFN, pegylated interferon.
Affiliation(s)
- Gregory R Hart
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
|
48
|
Figliuzzi M, Jacquier H, Schug A, Tenaillon O, Weigt M. Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1. Mol Biol Evol 2015; 33:268-80. [PMID: 26446903 PMCID: PMC4693977 DOI: 10.1093/molbev/msv211] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Indexed: 11/21/2022] Open
Abstract
The quantitative characterization of mutational landscapes is a task of outstanding importance in evolutionary and medical biology: It is, for example, of central importance for our understanding of the phenotypic effect of mutations related to disease and antibiotic drug resistance. Here we develop a novel inference scheme for mutational landscapes, which is based on the statistical analysis of large alignments of homologs of the protein of interest. Our method is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear. Compared with recent large-scale mutagenesis data of the beta-lactamase TEM-1, a protein providing resistance against beta-lactam antibiotics, our method leads to an increase of about 40% in explicative power as compared with approaches neglecting epistasis. We find that the informative sequence context extends to residues at native distances of about 20 Å from the mutated site, reaching thus far beyond residues in direct physical contact.
Affiliation(s)
- Matteo Figliuzzi
- UPMC, Institut de Calcul et de la Simulation, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
| | - Hervé Jacquier
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France; Service de Bactériologie-Virologie, Groupe Hospitalier Lariboisière-Fernand Widal, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute for Technology, Eggenstein-Leopoldshafen, Germany
| | - Oliver Tenaillon
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France
| | - Martin Weigt
- Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
|
49
|
Dixit PD, Jain A, Stock G, Dill KA. Inferring Transition Rates of Networks from Populations in Continuous-Time Markov Processes. J Chem Theory Comput 2015; 11:5464-72. [PMID: 26574334 DOI: 10.1021/acs.jctc.5b00537] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.7] [Indexed: 01/15/2023]
Abstract
We are interested in inferring rate processes on networks. In particular, given a network's topology, the stationary populations on its nodes, and a few global dynamical observables, can we infer all the transition rates between nodes? We draw inferences using the principle of maximum caliber (maximum path entropy). We have previously derived results for discrete-time Markov processes. Here, we treat continuous-time processes, such as dynamics among metastable states of proteins. The present work leads to a particularly important analytical result: namely, that when the network is constrained only by a mean jump rate, the rate matrix is given by a square-root dependence of the rate, $k_{ab} \propto (\pi_b/\pi_a)^{1/2}$, on $\pi_a$ and $\pi_b$, the stationary-state populations at nodes a and b. This leads to a fast way to estimate all of the microscopic rates in the system. As an illustration, we show that the method accurately predicts the nonequilibrium transition rates in an in silico gene expression network and transition probabilities among the metastable states of a small peptide at equilibrium. We note also that the method makes sensible predictions for so-called extra-thermodynamic relationships, such as those of Bronsted, Hammond, and others.
Affiliation(s)
- Purushottam D Dixit
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
| | - Abhinav Jain
- Institute of Physics and Freiburg Institute for Advanced Studies (FRIAS), Albert Ludwigs University, 79104 Freiburg, Germany
| | - Gerhard Stock
- Institute of Physics and Freiburg Institute for Advanced Studies (FRIAS), Albert Ludwigs University, 79104 Freiburg, Germany
| | - Ken A Dill
- Laufer Center for Quantitative Biology, Department of Chemistry, and Department of Physics and Astronomy, Stony Brook University, Stony Brook, New York 11790, United States
|
50
|
Dixit PD. Stationary properties of maximum-entropy random walks. Phys Rev E Stat Nonlin Soft Matter Phys 2015; 92:042149. [PMID: 26565210 DOI: 10.1103/physreve.92.042149] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/22/2015] [Indexed: 06/05/2023]
Abstract
Maximum-entropy (ME) inference of state probabilities using state-dependent constraints is popular in the study of complex systems. In stochastic systems, how state space topology and path-dependent constraints affect ME-inferred state probabilities remains unknown. To that end, we derive the transition probabilities and the stationary distribution of a maximum path entropy Markov process subject to state- and path-dependent constraints. A main finding is that the stationary distribution over states differs significantly from the Boltzmann distribution and reflects a competition between path multiplicity and imposed constraints. We illustrate our results with particle diffusion on a two-dimensional landscape. Connections with the path integral approach to diffusion are discussed.
Affiliation(s)
- Purushottam D Dixit
- Department of Systems Biology, Columbia University, New York, New York 10032, United States
|