1
Martin J, Lequerica Mateos M, Onuchic JN, Coluzza I, Morcos F. Machine learning in biological physics: From biomolecular prediction to design. Proc Natl Acad Sci U S A 2024; 121:e2311807121. [PMID: 38913893 PMCID: PMC11228481 DOI: 10.1073/pnas.2311807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024] Open
Abstract
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from the physical modeling of neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically, through a shared energy representation. We summarize recent efforts to establish these connections and provide examples of how each of these formulations, integrating physical modeling and machine learning, has been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins, leading to improved evolutionary modeling; and, finally, how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is "learnable" and propose its future use in the generation of unique sequences that can fold into a target structure.
Affiliation(s)
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Marcos Lequerica Mateos
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
- José N. Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Department of Physics and Astronomy, Rice University, Houston, TX 77005
  - Department of Chemistry, Rice University, Houston, TX 77005
  - Department of BioSciences, Rice University, Houston, TX 77005
- Ivan Coluzza
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
  - Basque Foundation for Science, Ikerbasque, Bilbao 48940, Spain
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
  - Department of Bioengineering, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
2
Biswas A, Choudhuri I, Arnold E, Lyumkis D, Haldane A, Levy RM. Kinetic coevolutionary models predict the temporal emergence of HIV-1 resistance mutations under drug selection pressure. Proc Natl Acad Sci U S A 2024; 121:e2316662121. [PMID: 38557187 PMCID: PMC11009627 DOI: 10.1073/pnas.2316662121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 02/23/2024] [Indexed: 04/04/2024] Open
Abstract
Drug resistance in HIV type 1 (HIV-1) is a pervasive problem that affects the lives of millions of people worldwide. Although records of drug-resistant mutations (DRMs) have been extensively tabulated within public repositories, our understanding of the evolutionary kinetics of DRMs and how they evolve together remains limited. Epistasis, the interaction between a DRM and other residues in HIV-1 protein sequences, is key to the temporal evolution of drug resistance. We use a Potts sequence-covariation statistical-energy model of HIV-1 protein fitness under drug selection pressure, which captures epistatic interactions between all positions, combined with kinetic Monte-Carlo simulations of sequence evolutionary trajectories, to explore the acquisition of DRMs as they arise in an ensemble of drug-naive patient protein sequences. We follow the time course of 52 DRMs in the enzymes protease, RT, and integrase, the primary targets of antiretroviral therapy. The rates at which DRMs emerge are highly correlated with their observed acquisition rates reported in the literature when drug pressure is applied. This result highlights the central role of epistasis in determining the kinetics governing DRM emergence. Whereas rapidly acquired DRMs begin to accumulate as soon as drug pressure is applied, slowly acquired DRMs are contingent on accessory mutations that appear only after prolonged drug pressure. We provide a foundation for using computational methods to determine the temporal evolution of drug resistance using Potts statistical potentials, which can be used to gain mechanistic insights into drug resistance pathways in HIV-1 and other infectious agents.
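The combination of a Potts statistical energy with Monte Carlo sampling of sequence trajectories described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the sign convention (lower energy is more favorable), the Metropolis acceptance rule, the function names, and the toy field and coupling values are all assumptions chosen only to show how an epistatic coupling can make a mutation that is costly alone favorable in the right background.

```python
import math
import random


def potts_energy(seq, fields, couplings):
    """Statistical energy E(S) = -sum_i h_i(S_i) - sum_{i<j} J_ij(S_i, S_j)."""
    energy = 0.0
    for i in range(len(seq)):
        energy -= fields[i][seq[i]]
        for j in range(i + 1, len(seq)):
            energy -= couplings[(i, j)][seq[i]][seq[j]]
    return energy


def metropolis_step(seq, fields, couplings, n_states, rng, beta=1.0):
    """Propose one point mutation and accept it with the Metropolis rule."""
    pos = rng.randrange(len(seq))
    proposal = list(seq)
    proposal[pos] = rng.randrange(n_states)
    delta = (potts_energy(proposal, fields, couplings)
             - potts_energy(seq, fields, couplings))
    if delta <= 0 or rng.random() < math.exp(-beta * delta):
        return proposal
    return list(seq)


# Hypothetical toy model: 3 positions, 2 states (0 = wild type, 1 = mutant).
N_STATES = 2
# Mutants at positions 0 and 1 are each costly on their own (h = -1)...
fields = [[0.0, -1.0], [0.0, -1.0], [0.0, 0.0]]
# ...but favorable when they occur together (J = +3), i.e. epistasis.
couplings = {
    (0, 1): [[0.0, 0.0], [0.0, 3.0]],
    (0, 2): [[0.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 0.0], [0.0, 0.0]],
}

rng = random.Random(0)  # fixed seed so the trajectory is reproducible
seq = [0, 0, 0]
trajectory = [seq]
for _ in range(200):
    seq = metropolis_step(seq, fields, couplings, N_STATES, rng)
    trajectory.append(seq)
```

In this toy landscape the single mutant `[1, 0, 0]` has a higher energy than the wild type, while the double mutant `[1, 1, 0]` has a lower one, which is the kind of background dependence the abstract refers to as contingency on accessory mutations.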
Affiliation(s)
- Avik Biswas
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA 92037
  - Department of Physics, University of California San Diego, La Jolla, CA 92093
- Indrani Choudhuri
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Department of Chemistry, Temple University, Philadelphia, PA 19122
- Eddy Arnold
  - Department of Chemistry and Chemical Biology, Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ 08854
- Dmitry Lyumkis
  - Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA 92037
  - Graduate School of Biological Sciences, Department of Molecular Biology, University of California San Diego, La Jolla, CA 92093
- Allan Haldane
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Department of Physics, Temple University, Philadelphia, PA 19122
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Department of Chemistry, Temple University, Philadelphia, PA 19122
3
Pucci F, Zerihun MB, Rooman M, Schug A. pycofitness-Evaluating the fitness landscape of RNA and protein sequences. Bioinformatics 2024; 40:btae074. [PMID: 38335928 PMCID: PMC10881095 DOI: 10.1093/bioinformatics/btae074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2023] [Revised: 01/25/2024] [Accepted: 02/06/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION: The accurate prediction of how mutations change biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solve this issue.
RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability.
AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness.
Affiliation(s)
- Fabrizio Pucci
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Mehari B Zerihun
  - John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
- Marianne Rooman
  - Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Alexander Schug
  - John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
  - Department of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
4
Zhang H, Bull RA, Quadeer AA, McKay MR. HCV E1 influences the fitness landscape of E2 and may enhance escape from E2-specific antibodies. Virus Evol 2023; 9:vead068. [PMID: 38107333 PMCID: PMC10722114 DOI: 10.1093/ve/vead068] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 09/27/2023] [Accepted: 11/16/2023] [Indexed: 12/19/2023] Open
Abstract
The Hepatitis C virus (HCV) envelope glycoprotein E1 forms a non-covalent heterodimer with E2, the main target of neutralizing antibodies. How E1-E2 interactions influence viral fitness and contribute to resistance to E2-specific antibodies remains largely unknown. We investigate this problem using a combination of fitness landscape and evolutionary modeling. Our analysis indicates that E1 and E2 proteins collectively mediate viral fitness and suggests that fitness-compensating E1 mutations may accelerate escape from E2-targeting antibodies. Our analysis also identifies a set of E2-specific human monoclonal antibodies that are predicted to be especially resilient to escape via genetic variation in both E1 and E2, providing directions for robust HCV vaccine development.
Affiliation(s)
- Hang Zhang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Rowena A Bull
  - School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW 2052, Australia
  - The Kirby Institute for Infection and Immunity, Sydney, NSW 2052, Australia
- Ahmed Abdul Quadeer
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
  - Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC 3010, Australia
- Matthew R McKay
  - Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC 3010, Australia
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
5
Zhang H, Quadeer AA, McKay MR. Direct-acting antiviral resistance of Hepatitis C virus is promoted by epistasis. Nat Commun 2023; 14:7457. [PMID: 37978179 PMCID: PMC10656532 DOI: 10.1038/s41467-023-42550-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023] Open
Abstract
Direct-acting antiviral agents (DAAs) provide efficacious therapeutic treatments for chronic Hepatitis C virus (HCV) infection. However, emergence of drug resistance mutations (DRMs) can greatly affect treatment outcomes and impede virological cure. While multiple DRMs have been observed for all currently used DAAs, the evolutionary determinants of such mutations are not currently well understood. Here, by considering DAAs targeting the nonstructural 3 (NS3) protein of HCV, we present results suggesting that epistasis plays an important role in the evolution of DRMs. Employing a sequence-based fitness landscape model whose predictions correlate highly with experimental data, we identify specific DRMs that are associated with strong epistatic interactions, and these are found to be enriched in multiple NS3-specific DAAs. Evolutionary modelling further supports that the identified DRMs involve compensatory mutational interactions that facilitate relatively easy escape from drug-induced selection pressures. Our results indicate that accounting for epistasis is important for designing future HCV NS3-targeting DAAs.
Affiliation(s)
- Hang Zhang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Ahmed Abdul Quadeer
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Matthew R McKay
  - Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC, Australia
  - Department of Microbiology and Immunology, University of Melbourne, at The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia
6
Li M, Oliveira Passos D, Shan Z, Smith SJ, Sun Q, Biswas A, Choudhuri I, Strutzenberg TS, Haldane A, Deng N, Li Z, Zhao XZ, Briganti L, Kvaratskhelia M, Burke TR, Levy RM, Hughes SH, Craigie R, Lyumkis D. Mechanisms of HIV-1 integrase resistance to dolutegravir and potent inhibition of drug-resistant variants. Sci Adv 2023; 9:eadg5953. [PMID: 37478179 DOI: 10.1126/sciadv.adg5953] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 01/09/2023] [Accepted: 06/16/2023] [Indexed: 07/23/2023]
Abstract
HIV-1 infection depends on the integration of viral DNA into host chromatin. Integration is mediated by the viral enzyme integrase and is blocked by integrase strand transfer inhibitors (INSTIs), first-line antiretroviral therapeutics widely used in the clinic. Resistance to even the best INSTIs is a problem, and the mechanisms of resistance are poorly understood. Here, we analyze combinations of the mutations E138K, G140A/S, and Q148H/K/R, which confer resistance to INSTIs. The investigational drug 4d inhibited the mutants more effectively than the approved drug dolutegravir (DTG). We present 11 new cryo-EM structures of drug-resistant HIV-1 intasomes bound to DTG or 4d, with better than 3-Å resolution. These structures, complemented with free energy simulations, virology, and enzymology, explain the mechanisms of DTG resistance involving E138K + G140A/S + Q148H/K/R and show why 4d maintains potency better than DTG. These data establish a foundation for further development of INSTIs that potently inhibit resistant forms of integrase.
Affiliation(s)
- Min Li
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Zelin Shan
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
- Steven J Smith
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Qinfang Sun
  - Center for Biophysics and Computational Biology, and Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Avik Biswas
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Indrani Choudhuri
  - Center for Biophysics and Computational Biology, and Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Allan Haldane
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Nanjie Deng
  - Department of Chemistry and Physical Sciences, Pace University, New York, NY 10038, USA
- Zhaoyang Li
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Xue Zhi Zhao
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Lorenzo Briganti
  - Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mamuka Kvaratskhelia
  - Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Terrence R Burke
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Ronald M Levy
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Stephen H Hughes
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD 21702, USA
- Robert Craigie
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD 20892, USA
- Dmitry Lyumkis
  - The Salk Institute for Biological Studies, La Jolla, CA 92037, USA
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, USA
  - Graduate School of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
7
Pirkl M, Büch J, Devaux C, Böhm M, Sönnerborg A, Incardona F, Abecasis A, Vandamme AM, Zazzi M, Kaiser R, Lengauer T, The EuResist Network Study Group. Analysis of mutational history of multidrug-resistant genotypes with a mutagenetic tree model. J Med Virol 2023; 95:e28389. [PMID: 36484375 DOI: 10.1002/jmv.28389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Revised: 11/24/2022] [Accepted: 12/01/2022] [Indexed: 12/14/2022]
Abstract
Human immunodeficiency virus (HIV) can develop resistance to all antiretroviral drugs. Multidrug resistance is a rare event in modern HIV treatment but can be life-threatening, particularly in patients with very long therapy histories and in areas with limited access to novel drugs. To understand the evolution of multidrug resistance, we analyzed the EuResist database to uncover the accumulation of mutations over time. We hypothesize that resistance mutations are not acquired simultaneously and randomly across viral genotypes but rather tend to follow a predetermined order. The knowledge of this order might help to elucidate potential mechanisms of multidrug resistance. Our evolutionary model shows an almost monotonic increase of resistance with each acquired mutation, including less well-known nucleoside reverse transcriptase (RT) inhibitor-related mutations like K223Q, L228H, and Q242H. Mutations within the integrase (IN) (T97A, E138A/K, G140S, Q148H, N155H) indicate a high probability of multidrug resistance. Hence, these IN mutations also tend to be observed together with mutations in the protease (PR) and RT. We followed up with an analysis of the mutation-specific error rates of our model given the data. We identified several mutations with unusual rates (PR: M41L, L33F, IN: G140S). This could imply the existence of previously unknown virus variants in the viral quasispecies. In conclusion, our bioinformatics model supports the analysis and understanding of multidrug resistance.
Affiliation(s)
- Martin Pirkl
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Joachim Büch
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Carole Devaux
  - Department of Infection and Immunity, Luxembourg Institute of Health, Strassen, Luxembourg
- Michael Böhm
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Anders Sönnerborg
  - Department of Laboratory Medicine, Division of Clinical Microbiology, Karolinska Institute, Solna, Sweden
- Ana Abecasis
  - Center for Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal
- Anne-Mieke Vandamme
  - Center for Global Health and Tropical Medicine, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa, Lisbon, Portugal; Department of Microbiology, Immunology and Transplantation, Clinical and Epidemiological Virology, Institute for the Future, Rega Institute for Medical Research, KU Leuven, Leuven, Belgium
- Maurizio Zazzi
  - Department of Medical Biotechnologies, University of Siena, Siena, Italy
- Rolf Kaiser
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
- Thomas Lengauer
  - Institute of Virology, University Hospital Cologne, University of Cologne, Cologne, Germany
8
Choudhuri I, Biswas A, Haldane A, Levy RM. Contingency and Entrenchment of Drug-Resistance Mutations in HIV Viral Proteins. J Phys Chem B 2022; 126:10622-10636. [PMID: 36493468 PMCID: PMC9841799 DOI: 10.1021/acs.jpcb.2c06123] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
The ability of HIV-1 to rapidly mutate leads to antiretroviral therapy (ART) failure among infected patients. Drug-resistance mutations (DRMs), which cause a fitness penalty to intrinsic viral fitness, are compensated by accessory mutations with favorable epistatic interactions which cause an evolutionary trapping effect, but the kinetics of this overall process has not been well characterized. Here, using a Potts Hamiltonian model describing epistasis combined with kinetic Monte Carlo simulations of evolutionary trajectories, we explore how epistasis modulates the evolutionary dynamics of HIV DRMs. We show how the occurrence of a drug-resistance mutation is contingent on favorable epistatic interactions with many other residues of the sequence background and that subsequent mutations entrench DRMs. We measure the time-autocorrelation of fluctuations in the likelihood of DRMs due to epistatic coupling with the sequence background, which reveals the presence of two evolutionary processes controlling DRM kinetics with two distinct time scales. Further analysis of waiting times for the evolutionary trapping effect to reverse reveals that the sequences which entrench (trap) a DRM are responsible for the slower time scale. We also quantify the overall strength of epistatic effects on the evolutionary kinetics for different mutations and show these are much larger for DRM positions than polymorphic positions, and we also show that trapping of a DRM is often caused by the collective effect of many accessory mutations, rather than a few strongly coupled ones, suggesting the importance of multiresidue sequence variations in HIV evolution. The analysis presented here provides a framework to explore the kinetic pathways through which viral proteins like HIV evolve under drug-selection pressure.
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States; Department of Physics, Temple University, Philadelphia, Pennsylvania 19122-6008, United States
- Ronald M. Levy
  - Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122, United States; Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania 19122, United States
9
Harman JL, Reardon PN, Costello SM, Warren GD, Phillips SR, Connor PJ, Marqusee S, Harms MJ. Evolution avoids a pathological stabilizing interaction in the immune protein S100A9. Proc Natl Acad Sci U S A 2022; 119:e2208029119. [PMID: 36194634 PMCID: PMC9565474 DOI: 10.1073/pnas.2208029119] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 05/09/2022] [Accepted: 09/07/2022] [Indexed: 01/03/2023] Open
Abstract
Stability constrains evolution. While much is known about constraints on destabilizing mutations, less is known about the constraints on stabilizing mutations. We recently identified a mutation in the innate immune protein S100A9 that provides insight into such constraints. When introduced into human S100A9, M63F simultaneously increases the stability of the protein and disrupts its natural ability to activate Toll-like receptor 4. Using chemical denaturation, we found that M63F stabilizes a calcium-bound conformation of hS100A9. We then used NMR to solve the structure of the mutant protein, revealing that the mutation distorts the hydrophobic binding surface of hS100A9, explaining its deleterious effect on function. Hydrogen-deuterium exchange (HDX) experiments revealed stabilization of the region around M63F in the structure, notably Phe37. In the structure of the M63F mutant, the Phe37 and Phe63 sidechains are in contact, plausibly forming an edge-face π-stack. Mutating Phe37 to Leu abolished the stabilizing effect of M63F as probed by both chemical denaturation and HDX. It also restored the biological activity of S100A9 disrupted by M63F. These findings reveal that Phe63 creates a molecular staple with Phe37 that stabilizes a nonfunctional conformation of the protein, thus disrupting function. Using a bioinformatic analysis, we found that S100A9 proteins from different organisms rarely have Phe at both positions 37 and 63, suggesting that avoiding a pathological stabilizing interaction indeed constrains S100A9 evolution. This work highlights an important evolutionary constraint on stabilizing mutations, namely, that they must avoid inappropriately stabilizing nonfunctional protein conformations.
Affiliation(s)
- Joseph L Harman
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403
  - Institute of Molecular Biology, University of Oregon, Eugene, OR 97403
- Patrick N Reardon
  - College of Science, NMR Facility, Oregon State University, Corvallis, OR 97331
- Shawn M Costello
  - Biophysics Graduate Program, University of California, Berkeley, Berkeley, CA 94720
- Gus D Warren
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403
  - Institute of Molecular Biology, University of Oregon, Eugene, OR 97403
- Sophia R Phillips
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403
  - Institute of Molecular Biology, University of Oregon, Eugene, OR 97403
- Patrick J Connor
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403
  - Institute of Molecular Biology, University of Oregon, Eugene, OR 97403
- Susan Marqusee
  - Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA 94720
  - California Institute for Quantitative Biosciences, University of California, Berkeley, Berkeley, CA 94720
- Michael J Harms
  - Department of Chemistry and Biochemistry, University of Oregon, Eugene, OR 97403
  - Institute of Molecular Biology, University of Oregon, Eugene, OR 97403
10
Ravishankar K, Jiang X, Leddin EM, Morcos F, Cisneros GA. Computational compensatory mutation discovery approach: Predicting a PARP1 variant rescue mutation. Biophys J 2022; 121:3663-3673. [PMID: 35642254 PMCID: PMC9617126 DOI: 10.1016/j.bpj.2022.05.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/20/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022] Open
Abstract
The prediction of protein mutations that affect function may be exploited for multiple uses. In the context of disease variants, the prediction of compensatory mutations that reestablish functional phenotypes could aid in the development of genetic therapies. In this work, we present an integrated approach that combines coevolutionary analysis and molecular dynamics (MD) simulations to discover functional compensatory mutations. This approach is employed to investigate possible rescue mutations of a poly(ADP-ribose) polymerase 1 (PARP1) variant, PARP1 V762A, associated with lung cancer and follicular lymphoma. MD simulations show PARP1 V762A exhibits noticeable changes in structural and dynamical behavior compared with wild-type (WT) PARP1. Our integrated approach predicts A755E as a possible compensatory mutation based on coevolutionary information, and molecular simulations indicate that the PARP1 A755E/V762A double mutant exhibits similar structural and dynamical behavior to WT PARP1. Our methodology can be broadly applied to a large number of systems where single-nucleotide polymorphisms have been identified as connected to disease and can shed light on the biophysical effects of such changes as well as provide a way to discover potential mutants that could restore WT-like functionality. This can, in turn, be further utilized in the design of molecular therapeutics that aim to mimic such a compensatory effect.
Affiliation(s)
- Xianli Jiang
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Emmett M Leddin
  - Department of Chemistry, University of North Texas, Denton, Texas
- Faruck Morcos
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas
- G Andrés Cisneros
  - Department of Chemistry, University of North Texas, Denton, Texas; Department of Physics, The University of Texas at Dallas, Richardson, Texas; Department of Chemistry, The University of Texas at Dallas, Richardson, Texas
11
Wang S, Sotcheff SL, Gallardo CM, Jaworski E, Torbett B, Routh A. Covariation of viral recombination with single nucleotide variants during virus evolution revealed by CoVaMa. Nucleic Acids Res 2022; 50:e41. [PMID: 35018461 PMCID: PMC9023271 DOI: 10.1093/nar/gkab1259] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/14/2021] [Revised: 11/29/2021] [Accepted: 12/09/2021] [Indexed: 11/17/2022] Open
Abstract
Adaptation of viruses to their environments occurs through the acquisition of both novel single-nucleotide variants (SNV) and recombination events including insertions, deletions, and duplications. The co-occurrence of SNVs in individual viral genomes during their evolution has been well-described. However, unlike covariation of SNVs, studying the correlation between recombination events with each other or with SNVs has been hampered by their inherent genetic complexity and a lack of bioinformatic tools. Here, we expanded our previously reported CoVaMa pipeline (v0.1) to measure linkage disequilibrium between recombination events and SNVs within both short-read and long-read sequencing datasets. We demonstrate this approach using long-read nanopore sequencing data acquired from Flock House virus (FHV) serially passaged in vitro. We found SNVs that were either correlated or anti-correlated with large genomic deletions generated by nonhomologous recombination that give rise to Defective-RNAs. We also analyzed NGS data from longitudinal HIV samples derived from a patient undergoing antiretroviral therapy who proceeded to virological failure. We found correlations between insertions in the p6Gag and mutations in Gag cleavage sites. This report confirms previous findings and provides insights on novel associations between SNVs and specific recombination events within the viral genome and their role in viral evolution.
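The idea of measuring linkage disequilibrium between a recombination event and an SNV can be sketched with the standard two-locus statistics D and r². The snippet below is only an illustration under stated assumptions, not the CoVaMa implementation: the function name and the input format (one pair of booleans per read spanning both sites) are hypothetical.

```python
def linkage_disequilibrium(reads):
    """Return (D, r_squared) for two binary events observed on the same reads.

    `reads` is a list of (has_snv, has_deletion) boolean pairs, one pair per
    read that covers both sites (an assumed input format for illustration).
    """
    n = len(reads)
    if n == 0:
        raise ValueError("at least one read is required")
    p_a = sum(1 for a, _ in reads if a) / n          # frequency of the SNV
    p_b = sum(1 for _, b in reads if b) / n          # frequency of the deletion
    p_ab = sum(1 for a, b in reads if a and b) / n   # frequency of both together
    d = p_ab - p_a * p_b                             # D > 0: correlated, D < 0: anti-correlated
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r_squared = d * d / denom if denom > 0 else 0.0  # undefined if either event is fixed
    return d, r_squared


# Hypothetical reads: the SNV and the deletion always co-occur...
correlated = [(True, True)] * 5 + [(False, False)] * 5
# ...or never occur on the same read.
anti_correlated = [(True, False)] * 5 + [(False, True)] * 5

d_pos, r2_pos = linkage_disequilibrium(correlated)
d_neg, r2_neg = linkage_disequilibrium(anti_correlated)
```

The sign of D distinguishes the correlated from the anti-correlated case, while r² reports the strength of the association on a 0 to 1 scale regardless of direction.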
Affiliation(s)
- Shiyi Wang
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
- Stephanea L Sotcheff
  - Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
- Christian M Gallardo
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
- Elizabeth Jaworski
  - Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
- Bruce E Torbett
  - Department of Immunology and Microbiology, The Scripps Research Institute, La Jolla, CA, USA
  - Center for Immunity and Immunotherapies, Seattle Children's Research Institute, Seattle, WA, USA
  - Department of Laboratory Medicine and Pathology, University of Washington, Seattle, WA, USA
  - Department of Pediatrics, University of Washington School of Medicine, Seattle, WA, USA
- Andrew L Routh
  - Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX, USA
  - Institute for Human Infections and Immunity, University of Texas Medical Branch, Galveston, TX, USA
  - Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX, USA
|
12
|
Biswas A, Haldane A, Levy RM. Limits to detecting epistasis in the fitness landscape of HIV. PLoS One 2022; 17:e0262314. [PMID: 35041711 PMCID: PMC8765623 DOI: 10.1371/journal.pone.0262314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/15/2021] [Accepted: 12/20/2021] [Indexed: 02/05/2023] Open
Abstract
The rapid evolution of HIV is constrained by interactions between mutations which affect viral fitness. In this work, we explore the role of epistasis in determining the mutational fitness landscape of HIV for multiple drug target proteins, including Protease, Reverse Transcriptase, and Integrase. Epistatic interactions between residues modulate the mutation patterns involved in drug resistance, with unambiguous signatures of epistasis best seen in the comparison of the Potts model predicted and experimental HIV sequence “prevalences” expressed as higher-order marginals (beyond triplets) of the sequence probability distribution. In contrast, experimental measures of fitness such as viral replicative capacities generally probe fitness effects of point mutations in a single background, providing weak evidence for epistasis in viral systems. The detectable effects of epistasis are obscured by higher evolutionary conservation at sites. While double mutant cycles, in principle, provide one of the best ways to probe epistatic interactions experimentally without reference to a particular background, we show that the analysis is complicated by the small dynamic range of measurements. Overall, we show that global pairwise interaction Potts models are necessary for predicting the mutational landscape of viral proteins.
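The double mutant cycle referred to in this abstract can be written as a short calculation. The sketch below assumes fitness is expressed on a log scale, so that independent mutations add, and uses the conventional definition of pairwise epistasis as the deviation of the double mutant from additivity. The function name and the numerical values are hypothetical and are not taken from the paper.

```python
def double_mutant_epistasis(f_wt, f_a, f_b, f_ab):
    """Pairwise epistasis from a double mutant cycle (log-fitness scale).

    Returns the observed effect of the double mutant minus the sum of the
    two single-mutant effects; zero means the mutations act additively.
    """
    effect_a = f_a - f_wt
    effect_b = f_b - f_wt
    effect_ab = f_ab - f_wt
    return effect_ab - (effect_a + effect_b)


# Hypothetical log-fitness values, for illustration only.
eps = double_mutant_epistasis(f_wt=0.0, f_a=-1.0, f_b=-0.5, f_ab=-0.75)
print(eps)  # positive: the double mutant is fitter than additivity predicts
```

Because the result is a difference of four measurements, its uncertainty combines the noise of all four, which is one way to see why a small dynamic range makes the analysis difficult.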
Affiliation(s)
- Avik Biswas
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Allan Haldane
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Ronald M. Levy
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
  - Department of Chemistry, Temple University, Philadelphia, PA, United States of America
|
13
|
Youssef N, Susko E, Roger AJ, Bielawski JP. Evolution of amino acid propensities under stability-mediated epistasis. Mol Biol Evol 2022; 39:6522130. [PMID: 35134997 PMCID: PMC8896634 DOI: 10.1093/molbev/msac030] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
Site-specific amino acid preferences are influenced by the genetic background of the protein. The preferences for resident amino acids are expected to, on average, increase over time because of replacements at other sites, a nonadaptive phenomenon referred to as the 'evolutionary Stokes shift'. Alternatively, decreases in resident amino acid propensity have recently been viewed as evidence of adaptations to external environmental changes. Using population genetics theory and thermodynamic stability constraints, we show that nonadaptive evolution can lead to both positive and negative shifts in propensities following the fixation of an amino acid, emphasizing that the detection of negative shifts is not conclusive evidence of adaptation. Considering shifts in propensities over windows between substitutions at a focal site, we find that following ≈ 50% of substitutions the propensity for the new resident amino acid decreases over time, and both positive and negative shifts were comparable in magnitude. Preferences were often conserved via a significant negative autocorrelation in propensity changes: increases in propensities were often followed by decreases, and vice versa. Lastly, we explore the underlying mechanisms that lead propensities to fluctuate. We observe that stabilizing replacements increase the mutational tolerance at a site and in doing so decrease the propensity for the resident amino acid. In contrast, destabilizing substitutions result in more rugged fitness landscapes that tend to favor the resident amino acid. In summary, our results characterize propensity trajectories under nonadaptive stability-constrained evolution against which evidence of adaptations should be calibrated.
Affiliation(s)
- Noor Youssef
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Edward Susko
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, NS, Canada
- Andrew J Roger
  - Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, Canada
- Joseph P Bielawski
  - Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
  - Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
|
14
|
Co-evolution of drug resistance and broadened substrate recognition in HIV protease variants isolated from an Escherichia coli genetic selection system. Biochem J 2022; 479:479-501. [PMID: 35089310 DOI: 10.1042/bcj20210767] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2021] [Revised: 01/07/2022] [Accepted: 01/28/2022] [Indexed: 11/17/2022]
Abstract
A genetic selection system for activity of HIV protease is described that is based on a synthetic substrate constructed as a modified AraC regulatory protein that, when cleaved, stimulates L-arabinose metabolism in an Escherichia coli araC strain. Growth stimulation on selective plates was shown to depend on active HIV protease and the scissile bond in the substrate. In addition, the growth of cells correlated well with the established cleavage efficiency of the sites in the viral polyprotein, Gag, when these sites were individually introduced into the synthetic substrate of the selection system. Plasmids encoding protease variants selected based on stimulation of cell growth in the presence of saquinavir or cleavage of a site not cleaved by wild-type protease were indistinguishable with respect to both phenotypes. Also, both groups of selected plasmids encoded side chain substitutions known from clinical isolates or displayed different side chain substitutions but at identical positions. One highly frequent side chain substitution, E34V, not regarded as a major drug resistance substitution, was found in variants obtained under both selective conditions and is suggested to improve protease processing of the synthetic substrate. This substitution is away from the substrate-binding cavity and, together with other substitutions in the selected reading frames, supports the previous suggestion of a substrate-binding site extended from the active site binding pocket itself.
|
15
|
Evolutionary modeling reveals enhanced mutational flexibility of HCV subtype 1b compared with 1a. iScience 2022; 25:103569. [PMID: 34988406 PMCID: PMC8704487 DOI: 10.1016/j.isci.2021.103569] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2021] [Revised: 11/19/2021] [Accepted: 12/02/2021] [Indexed: 11/24/2022] Open
Abstract
Hepatitis C virus (HCV) is a leading cause of liver-associated disease and liver cancer. Of the major HCV subtypes, patients infected with subtype 1b have been associated with having a higher risk of developing chronic infection and hepatocellular carcinoma. However, underlying reasons for this increased disease severity remain unknown. Here, we provide an evolutionary rationale, based on a comparative study of fitness landscape and in-host evolutionary models of the E2 glycoprotein of HCV subtypes 1a and 1b. Our analysis demonstrates that a higher chronicity rate of 1b may be attributed to lower fitness constraints, enabling 1b viruses to more easily escape antibody responses. More generally, our results suggest that differences in evolutionary constraints between HCV subtypes may be an important factor in mediating distinct disease outcomes. Our analysis also identifies antibodies that appear escape-resistant against both subtypes 1a and 1b, providing directions for designing HCV vaccines having cross-subtype protection.
- Comparative analysis of the fitness landscapes of HCV subtypes 1a and 1b
- Subtype 1b evolution is subject to less constraints than 1a
- Subtype 1b appears to evade antibodies more easily compared with 1a
- Antibodies are identified that are difficult to escape for both subtypes 1a and 1b
|
16
|
Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021; 89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]
Abstract
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
Affiliation(s)
- Elodie Laine
  - Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Stephan Eismann
  - Department of Computer Science and Applied Physics, Stanford University, Stanford, California, USA
- Arne Elofsson
  - Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- Sergei Grudinin
  - Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
|
17
|
Narayanan KK, Procko E. Deep Mutational Scanning of Viral Glycoproteins and Their Host Receptors. Front Mol Biosci 2021; 8:636660. [PMID: 33898517 PMCID: PMC8062978 DOI: 10.3389/fmolb.2021.636660] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 12/01/2020] [Accepted: 03/18/2021] [Indexed: 11/17/2022] Open
Abstract
Deep mutational scanning or deep mutagenesis is a powerful tool for understanding the sequence diversity available to viruses for adaptation in a laboratory setting. It generally involves tracking an in vitro selection of protein sequence variants with deep sequencing to map mutational effects based on changes in sequence abundance. Coupled with any of a number of selection strategies, deep mutagenesis can explore the mutational diversity available to viral glycoproteins, which mediate critical roles in cell entry and are exposed to the humoral arm of the host immune response. Mutational landscapes of viral glycoproteins for host cell attachment and membrane fusion reveal extensive epistasis and potential escape mutations to neutralizing antibodies or other therapeutics, as well as aiding in the design of optimized immunogens for eliciting broadly protective immunity. While less explored, deep mutational scans of host receptors further assist in understanding virus-host protein interactions. Critical residues on the host receptors for engaging with viral spikes are readily identified and may help with structural modeling. Furthermore, mutations may be found for engineering soluble decoy receptors as neutralizing agents that specifically bind viral targets with tight affinity and limited potential for viral escape. By untangling the complexities of how sequence contributes to viral glycoprotein and host receptor interactions, deep mutational scanning is impacting ideas and strategies at multiple levels for combatting circulating and emergent virus strains.
Affiliation(s)
- Erik Procko
  - Department of Biochemistry and Cancer Center at Illinois, University of Illinois, Urbana, IL, United States
|
18
|
Haldane A, Levy RM. Mi3-GPU: MCMC-based Inverse Ising Inference on GPUs for protein covariation analysis. Comput Phys Commun 2021; 260:107312. [PMID: 33716309 PMCID: PMC7944406 DOI: 10.1016/j.cpc.2020.107312] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Indexed: 06/12/2023]
Abstract
Inverse Ising inference is a method for inferring the coupling parameters of a Potts/Ising model based on observed site-covariation, which has found important applications in protein physics for detecting interactions between residues in protein families. We introduce Mi3-GPU ("mee-three", for MCMC Inverse Ising Inference) software for solving the inverse Ising problem for protein-sequence datasets with few analytic approximations, by parallel Markov-Chain Monte-Carlo sampling on GPUs. We also provide tools for analysis and preparation of protein-family Multiple Sequence Alignments (MSAs) to account for finite-sampling issues, which are a major source of error or bias in inverse Ising inference. Our method is "generative" in the sense that the inferred model can be used to generate synthetic MSAs whose mutational statistics (marginals) can be verified to match the dataset MSA statistics up to the limits imposed by the effects of finite sampling. Our GPU implementation enables the construction of models which reproduce the covariation patterns of the observed MSA with a precision that is not possible with more approximate methods. The main components of our method are a GPU-optimized algorithm to greatly accelerate MCMC sampling, combined with a multi-step Quasi-Newton parameter-update scheme using a "Zwanzig reweighting" technique. We demonstrate the ability of this software to produce generative models on typical protein family datasets for sequence lengths L ~ 300 with 21 residue types with tens of millions of inferred parameters in short running times.
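The "tens of millions of inferred parameters" quoted for L ~ 300 and 21 residue types follows from counting the fields and pairwise couplings of a Potts model. The sketch below does that count and evaluates a Potts energy on a toy model. It is a minimal sketch and not Mi3-GPU code: the function names, the data layout, and the sign convention E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j) are assumptions chosen for illustration, and the count is made before any gauge fixing.

```python
def potts_parameter_count(length, q):
    """Number of fields and pairwise couplings before gauge fixing."""
    n_fields = length * q
    n_couplings = (length * (length - 1) // 2) * q * q
    return n_fields, n_couplings


def potts_energy(seq, h, J):
    """Potts energy of an integer-encoded sequence.

    h[i][a] is the field for residue type a at site i; J[(i, j)][a][b] is the
    coupling for types a, b at sites i < j (hypothetical layout).
    """
    n = len(seq)
    energy = -sum(h[i][seq[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            energy -= J[(i, j)][seq[i]][seq[j]]
    return energy


fields, couplings = potts_parameter_count(300, 21)
print(fields, couplings)  # 6300 fields and 19778850 couplings

# Toy model with 3 sites and 2 residue types (made-up values).
zero = [[0.0, 0.0], [0.0, 0.0]]
h = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.0]]
J = {(0, 1): [[0.0, 0.0], [2.0, 0.0]], (0, 2): zero, (1, 2): zero}
toy_energy = potts_energy([1, 0, 0], h, J)
print(toy_energy)  # -(1.0 + 0.5 + 0.0) - 2.0 = -3.5
```

The couplings dominate the count at roughly 2 x 10^7, which is consistent with the scale stated in the abstract.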
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
  - Center for Biophysics and Computational Biology and Department of Chemistry, Temple University, Philadelphia, Pennsylvania 19122
|
19
|
Neverov AD, Popova AV, Fedonin GG, Cheremukhin EA, Klink GV, Bazykin GA. Episodic evolution of coadapted sets of amino acid sites in mitochondrial proteins. PLoS Genet 2021; 17:e1008711. [PMID: 33493156 PMCID: PMC7861529 DOI: 10.1371/journal.pgen.1008711] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/03/2020] [Revised: 02/04/2021] [Accepted: 12/07/2020] [Indexed: 11/19/2022] Open
Abstract
The rate of evolution differs between protein sites and changes with time. However, the link between these two phenomena remains poorly understood. Here, we design a phylogenetic approach for distinguishing pairs of amino acid sites that evolve concordantly, i.e., such that substitutions at one site trigger subsequent substitutions at the other; and also pairs of sites that evolve discordantly, so that substitutions at one site impede subsequent substitutions at the other. We distinguish groups of amino acid sites that undergo coordinated evolution and evolve discordantly from other such groups. In mitochondrion-encoded proteins of metazoans and fungi, we show that concordantly evolving sites are clustered in protein structures. By analysing the phylogenetic patterns of substitutions at concordantly and discordantly evolving site pairs, we find that concordant evolution has two distinct causes: epistatic interactions between amino acid substitutions and episodes of selection independently affecting substitutions at different sites. The rate of substitutions at concordantly evolving groups of protein sites changes in the course of evolution, indicating episodes of selection limited to some of the lineages. The phylogenetic positions of these changes are consistent between proteins, suggesting common selective forces underlying them.

The mode and rate of evolution of a protein site depends on the effect of its mutations on protein fitness. The fitness effect of a mutation itself can change in the course of evolution for at least two reasons. First, it can be modulated by substitutions occurring at other sites, a phenomenon called epistasis. Second, changes in selection can be non-epistatic, affecting sites independently of one another. Here, we analyse substitutions accumulated by the evolving lineages of the five proteins encoded by the mitochondrial genomes of thousands of species of metazoans and fungi. We show that substitutions at different amino acid sites occur in a coordinated fashion, and this coordination is caused both by epistasis and by episodes of selection affecting groups of sites. We partition each protein into several groups of concordantly evolving sites such that evolution of sites from different groups is discordant, and show that the proteins encoded by the mitochondrial genome consist of coevolving structural blocks. Some of these blocks have a clear functional specialization, e.g. are associated with interfaces between proteins composing respiratory complexes. Together, our results reveal a previously unrecognized complexity in the causes of variation in evolutionary rates between protein sites.
Affiliation(s)
- Alexey D. Neverov
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Anfisa V. Popova
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
- Gennady G. Fedonin
  - Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, Russia
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Moscow Institute of Physics and Technology, Dolgoprudny, Moscow region, Russia
- Galya V. Klink
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
- Georgii A. Bazykin
  - Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, Russia
  - Skolkovo Institute of Science and Technology, Skolkovo, Russia
|
20
|
Zhang TH, Dai L, Barton JP, Du Y, Tan Y, Pang W, Chakraborty AK, Lloyd-Smith JO, Sun R. Predominance of positive epistasis among drug resistance-associated mutations in HIV-1 protease. PLoS Genet 2020; 16:e1009009. [PMID: 33085662 PMCID: PMC7605711 DOI: 10.1371/journal.pgen.1009009] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 11/13/2019] [Revised: 11/02/2020] [Accepted: 07/24/2020] [Indexed: 12/12/2022] Open
Abstract
Drug-resistant mutations often have deleterious impacts on replication fitness, posing a fitness cost that can only be overcome by compensatory mutations. However, the role of fitness cost in the evolution of drug resistance has often been overlooked in clinical studies or in vitro selection experiments, as these observations only capture the outcome of drug selection. In this study, we systematically profile the fitness landscape of resistance-associated sites in HIV-1 protease using deep mutational scanning. We construct a mutant library covering combinations of mutations at 11 sites in HIV-1 protease, all of which are associated with resistance to protease inhibitors in the clinic. Using deep sequencing, we quantify the fitness of thousands of HIV-1 protease mutants after multiple cycles of replication in human T cells. Although the majority of resistance-associated mutations have deleterious effects on viral replication, we find that epistasis among resistance-associated mutations is predominantly positive. Furthermore, our fitness data are consistent with genetic interactions inferred directly from HIV sequence data of patients. Fitness valleys formed by strong positive epistasis reduce the likelihood of reversal of drug resistance mutations. Overall, our results support the view that strong compensatory effects are involved in the emergence of clinically observed resistance mutations and provide insights to understanding fitness barriers in the evolution and reversion of drug resistance.
Affiliation(s)
- Tian-hao Zhang
  - Molecular Biology Institute, University of California, Los Angeles, CA 90095, USA
- Lei Dai
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- John P. Barton
  - Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- Yushen Du
  - School of Medicine, ZheJiang University, Hangzhou, 210000, China
  - Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
- Yuxiang Tan
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Wenwen Pang
  - Department of Public Health Laboratory Science, West China School of Public Health, Sichuan University, Chengdu 610041, China
- Arup K. Chakraborty
  - Institute for Medical Engineering and Science, Departments of Chemical Engineering, Physics, & Chemistry, Massachusetts Institute of Technology, MA 21309, USA
  - Ragon Institute of MGH, MIT, & Harvard, Cambridge, MA 21309, USA
- James O. Lloyd-Smith
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
- Ren Sun
  - Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
|
21
|
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/27/2022] Open
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design.

We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
|
22
|
Evolution Rapidly Optimizes Stability and Aggregation in Lattice Proteins Despite Pervasive Landscape Valleys and Mazes. Genetics 2020; 214:1047-1057. [PMID: 32107278 DOI: 10.1534/genetics.120.302815] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2019] [Accepted: 02/18/2020] [Indexed: 11/18/2022] Open
Abstract
The "fitness" landscapes of genetic sequences are characterized by high dimensionality and "ruggedness" due to sign epistasis. Ascending from low to high fitness on such landscapes can be difficult because adaptive trajectories get stuck at low-fitness local peaks. Compounding matters, recent theoretical arguments have proposed that extremely long, winding adaptive paths may be required to reach even local peaks: a "maze-like" landscape topography. The extent to which peaks and mazes shape the mode and tempo of evolution is poorly understood, due to empirical limitations and the abstractness of many landscape models. We explore the prevalence, scale, and evolutionary consequences of landscape mazes in a biophysically grounded computational model of protein evolution that captures the "frustration" between "stability" and aggregation propensity. Our stability-aggregation landscape exhibits extensive sign epistasis and local peaks galore. Although this frequently obstructs adaptive ascent to high fitness and virtually eliminates reproducibility of evolutionary outcomes, many adaptive paths do successfully complete the ascent from low to high fitness, with hydrophobicity a critical mediator of success. These successful paths exhibit maze-like properties on a global landscape scale, in which taking an indirect path helps to avoid low-fitness local peaks. This delicate balance of "hard but possible" adaptation could occur more broadly in other biological settings where competing interactions and frustration are important.
|
23
|
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape. Nat Commun 2020; 11:377. [PMID: 31953427 PMCID: PMC6969152 DOI: 10.1038/s41467-019-14174-2] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 03/25/2018] [Accepted: 12/16/2019] [Indexed: 01/08/2023] Open
Abstract
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations.

Poliovirus has a higher mutation rate than HIV, yet has been almost eradicated by vaccination while an effective vaccine against HIV does not exist. Here, the authors develop a fitness model for poliovirus viral protein 1 to show that it is subject to stringent evolutionary constraints that limit its ability to avoid vaccine-induced immune responses.
|
24
|
Ding X, Zou Z, Brooks CL III. Deciphering protein evolution and fitness landscapes with latent space models. Nat Commun 2019; 10:5644. [PMID: 31822668 PMCID: PMC6904478 DOI: 10.1038/s41467-019-13633-0] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 01/15/2019] [Accepted: 11/12/2019] [Indexed: 12/03/2022] Open
Abstract
Protein sequences contain rich information about protein evolution, fitness landscapes, and stability. Here we investigate how latent space models trained using variational auto-encoders can infer these properties from sequences. Using both simulated and real sequences, we show that the low dimensional latent space representation of sequences, calculated using the encoder model, captures both evolutionary and ancestral relationships between sequences. Together with experimental fitness data and Gaussian process regression, the latent space representation also enables learning the protein fitness landscape in a continuous low dimensional space. Moreover, the model is also useful in predicting protein mutational stability landscapes and quantifying the importance of stability in shaping protein evolution. Overall, we illustrate that the latent space models learned using variational auto-encoders provide a mechanism for exploration of the rich data contained in protein sequences regarding evolution, fitness and stability and hence are well-suited to help guide protein engineering efforts.
Affiliation(s)
- Xinqiang Ding
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
- Zhengting Zou
  - Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, 48109, USA
- Charles L Brooks III
  - Department of Computational Medicine & Bioinformatics, University of Michigan, Ann Arbor, MI, 48109, USA
  - Department of Chemistry, University of Michigan, Ann Arbor, MI, 48109, USA
  - Biophysics Program, University of Michigan, Ann Arbor, MI, 48109, USA
|
25
|
Biswas A, Haldane A, Arnold E, Levy RM. Epistasis and entrenchment of drug resistance in HIV-1 subtype B. eLife 2019; 8:e50524. [PMID: 31591964 PMCID: PMC6783267 DOI: 10.7554/elife.50524] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/25/2019] [Accepted: 09/09/2019] [Indexed: 12/17/2022] Open
Abstract
The development of drug resistance in HIV is the result of primary mutations whose effects on viral fitness depend on the entire genetic background, a phenomenon called 'epistasis'. Based on protein sequences derived from drug-experienced patients in the Stanford HIV database, we use a co-evolutionary (Potts) Hamiltonian model to provide direct confirmation of epistasis involving many simultaneous mutations. Building on earlier work, we show that primary mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type background, and provide the first confirmation of entrenchment for all three drug-target proteins: protease, reverse transcriptase, and integrase; a comparative analysis reveals that NNRTI-induced mutations behave differently from the others. We further show that the likelihood of resistance mutations can vary widely in patient populations, and from the population average compared to specific molecular clones.
Affiliation(s)
- Avik Biswas
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
- Eddy Arnold
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, United States
  - Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, United States
- Ronald M Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
  - Department of Chemistry, Temple University, Philadelphia, United States
|
26
|
Henes M, Kosovrasti K, Lockbaum GJ, Leidner F, Nachum GS, Nalivaika EA, Bolon DN, Yilmaz NK, Schiffer CA, Whitfield TW. Molecular Determinants of Epistasis in HIV-1 Protease: Elucidating the Interdependence of L89V and L90M Mutations in Resistance. Biochemistry 2019; 58:3711-3726. [PMID: 31386353 PMCID: PMC6941756 DOI: 10.1021/acs.biochem.9b00446] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 01/07/2023]
Abstract
Protease inhibitors have the highest potency among antiviral therapies against HIV-1 infections, yet the virus can evolve resistance. Darunavir (DRV), currently the most potent Food and Drug Administration-approved protease inhibitor, retains potency against single-site mutations. However, complex combinations of mutations can confer resistance to DRV. While the interdependence between mutations within HIV-1 protease is key for inhibitor potency, the molecular mechanisms that underlie this control remain largely unknown. In this study, we investigated the interdependence between the L89V and L90M mutations and their effects on DRV binding. These two mutations have been reported to be positively correlated with one another in HIV-1 patient-derived protease isolates, with the presence of one mutation making the probability of the occurrence of the second mutation more likely. The focus of our investigation is a patient-derived isolate, with 24 mutations that we call "KY"; this variant includes the L89V and L90M mutations. Three additional KY variants with back-mutations, KY(V89L), KY(M90L), and the KY(V89L/M90L) double mutation, were used to experimentally assess the individual and combined effects of these mutations on DRV inhibition and substrate processing. The enzymatic assays revealed that the KY(V89L) variant, with methionine at residue 90, is highly resistant, but its catalytic function is compromised. When a leucine to valine mutation at residue 89 is present simultaneously with the L90M mutation, a rescue of catalytic efficiency is observed. Molecular dynamics simulations of these DRV-bound protease variants reveal how the L90M mutation induces structural changes throughout the enzyme that undermine the binding interactions.
Affiliation(s)
- Mina Henes
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Klajdi Kosovrasti
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Gordon J. Lockbaum
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Florian Leidner
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Gily S. Nachum
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Ellen A. Nalivaika
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Daniel N.A. Bolon
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Nese Kurt Yilmaz
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Celia A. Schiffer
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
  - Corresponding author; phone: +1 508 856 8008
- Troy W. Whitfield
  - Department of Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
  - Corresponding author; phone: +1 508 856 4401
|
27
|
Laine E, Karami Y, Carbone A. GEMME: a simple and fast global epistatic model predicting mutational effects. Mol Biol Evol 2019; 36:2604-2619. [PMID: 31406981 PMCID: PMC6805226 DOI: 10.1093/molbev/msz179] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 02/14/2019] [Revised: 06/03/2019] [Accepted: 08/02/2019] [Indexed: 12/15/2022] Open
Abstract
The systematic and accurate description of protein mutational landscapes is a question of utmost importance in biology, bioengineering, and medicine. Recent progress has been achieved by leveraging on the increasing wealth of genomic data and by modeling intersite dependencies within biological sequences. However, state-of-the-art methods remain time consuming. Here, we present Global Epistatic Model for predicting Mutational Effects (GEMME) (www.lcqb.upmc.fr/GEMME), an original and fast method that predicts mutational outcomes by explicitly modeling the evolutionary history of natural sequences. This allows accounting for all positions in a sequence when estimating the effect of a given mutation. GEMME uses only a few biologically meaningful and interpretable parameters. Assessed against 50 high- and low-throughput mutational experiments, it overall performs similarly or better than existing methods. It accurately predicts the mutational landscapes of a wide range of protein families, including viral ones and, more generally, of much conserved families. Given an input alignment, it generates the full mutational landscape of a protein in a matter of minutes. It is freely available as a package and a webserver at www.lcqb.upmc.fr/GEMME/.
Affiliation(s)
- Elodie Laine
  - Sorbonne Université, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Yasaman Karami
  - Sorbonne Université, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
  - Sorbonne Université, UPMC-Univ P6, Institut du Calcul et de la Simulation
- Alessandra Carbone
  - Sorbonne Université, UPMC University Paris 06, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
  - Institut Universitaire de France
|
28
|
Boucher JI, Whitfield TW, Dauphin A, Nachum G, Hollins C, Zeldovich KB, Swanstrom R, Schiffer CA, Luban J, Bolon DNA. Constrained Mutational Sampling of Amino Acids in HIV-1 Protease Evolution. Mol Biol Evol 2019; 36:798-810. [PMID: 30721995 DOI: 10.1093/molbev/msz022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/23/2022] Open
Abstract
The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impact of these factors on protein sequence evolution is interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available data set of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple-base mutations in HIV-1 protease is strongly influenced by mutational sampling.
Affiliation(s)
- Jeffrey I Boucher
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Troy W Whitfield
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA
- Ann Dauphin
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Gily Nachum
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Carl Hollins
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Konstantin B Zeldovich
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Ronald Swanstrom
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC
- Celia A Schiffer
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Jeremy Luban
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Daniel N A Bolon
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
|
29
|
Haldane A, Flynn WF, He P, Levy RM. Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs. Biophys J 2018; 114:21-31. [PMID: 29320688 DOI: 10.1016/j.bpj.2017.10.028] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/20/2017] [Revised: 09/11/2017] [Accepted: 10/17/2017] [Indexed: 01/25/2023] Open
Abstract
The protein kinase catalytic domain is one of the most abundant domains across all branches of life. Although kinases share a common core function of phosphoryl-transfer, they also have wide functional diversity and play varied roles in cell signaling networks, and for this reason are implicated in a number of human diseases. This functional diversity is primarily achieved through sequence variation, and uncovering the sequence-function relationships for the kinase family is a major challenge. In this study we use a statistical inference technique inspired by statistical physics, which builds a coevolutionary "Potts" Hamiltonian model of sequence variation in a protein family. We show how this model has sufficient power to predict the probability of specific subsequences in the highly diverged kinase family, which we verify by comparing the model's predictions with experimental observations in the Uniprot database. We show that the pairwise (residue-residue) interaction terms of the statistical model are necessary and sufficient to capture higher-than-pairwise mutation patterns of natural kinase sequences. We observe that previously identified functional sets of residues have much stronger correlated interaction scores than are typical.
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- William F Flynn
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
  - Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Peng He
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- Ronald M Levy
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
|
30
|
Identifying immunologically-vulnerable regions of the HCV E2 glycoprotein and broadly neutralizing antibodies that target them. Nat Commun 2019; 10:2073. [PMID: 31061402 PMCID: PMC6502829 DOI: 10.1038/s41467-019-09819-1] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 04/12/2018] [Accepted: 04/02/2019] [Indexed: 02/06/2023] Open
Abstract
Isolation of broadly neutralizing human monoclonal antibodies (HmAbs) targeting the E2 glycoprotein of Hepatitis C virus (HCV) has sparked hope for effective vaccine development. Nonetheless, escape mutations have been reported. Ideally, a potent vaccine should elicit HmAbs that target regions of E2 that are most difficult to escape. Here, aimed at addressing this challenge, we develop a predictive in-silico evolutionary model for E2 that identifies one such region, a specific antigenic domain, making it an attractive target for a robust antibody response. Specific broadly neutralizing HmAbs that appear difficult to escape from are also identified. By providing a framework for identifying vulnerable regions of E2 and for assessing the potency of specific antibodies, our results can aid the rational design of an effective prophylactic HCV vaccine. A good vaccine should direct the immune response to virus regions that are most difficult to escape. Here, Quadeer et al. develop a predictive in-silico evolutionary model for HCV E2 which identifies one such antigenic region and identifies multiple broadly neutralizing human antibodies that appear difficult to escape from.
|
31
|
Bandera A, Gori A, Clerici M, Sironi M. Phylogenies in ART: HIV reservoirs, HIV latency and drug resistance. Curr Opin Pharmacol 2019; 48:24-32. [PMID: 31029861 DOI: 10.1016/j.coph.2019.03.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/28/2019] [Revised: 03/07/2019] [Accepted: 03/12/2019] [Indexed: 11/17/2022]
Abstract
Combination antiretroviral therapy (ART) has significantly reduced the morbidity and mortality resulting from HIV infection. ART is, however, unable to eradicate HIV, which persists latently in several cell types and tissues. Phylogenetic analyses suggested that the proliferation of cells infected before ART initiation is mainly responsible for residual viremia, although controversy still exists. Conversely, it is widely accepted that drug resistance mutations (DRMs) do not appear during ART in patients with suppressed viral loads. Studies based on sequence clustering have in fact indicated that, at least in developed countries, HIV-infected ART-naive patients are the major source of drug-resistant viruses. Analysis of longitudinally sampled sequences have also shown that DRMs have variable fitness costs, which are strongly influenced by the viral genetic background.
Affiliation(s)
- Alessandra Bandera
  - Infectious Diseases Unit, Department of Internal Medicine, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, 20090 Milan, Italy
  - Department of Pathophysiology and Transplantation, School of Medicine and Surgery, University of Milan, 20090 Milan, Italy
- Andrea Gori
  - Infectious Diseases Unit, Department of Internal Medicine, Fondazione IRCCS Ca' Granda Ospedale Maggiore Policlinico, 20090 Milan, Italy
  - Department of Pathophysiology and Transplantation, School of Medicine and Surgery, University of Milan, 20090 Milan, Italy
- Mario Clerici
  - Department of Pathophysiology and Transplantation, School of Medicine and Surgery, University of Milan, 20090 Milan, Italy
  - IRCCS Fondazione Don Carlo Gnocchi, 20148 Milan, Italy
- Manuela Sironi
  - Bioinformatics, Scientific Institute, IRCCS E. MEDEA, 23842 Bosisio Parini, Lecco, Italy
|
32
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode for physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation, and design of protein-nucleic acid recognition as well as the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug-design to vaccine development.
|
33
|
Peng F, Widmann S, Wünsche A, Duan K, Donovan KA, Dobson RCJ, Lenski RE, Cooper TF. Effects of Beneficial Mutations in pykF Gene Vary over Time and across Replicate Populations in a Long-Term Experiment with Bacteria. Mol Biol Evol 2018; 35:202-210. [PMID: 29069429 PMCID: PMC5850340 DOI: 10.1093/molbev/msx279] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 12/25/2022] Open
Abstract
The fitness effects of mutations can depend on the genetic backgrounds in which they occur and thereby influence future opportunities for evolving populations. In particular, mutations that fix in a population might change the selective benefit of subsequent mutations, giving rise to historical contingency. We examine these effects by focusing on mutations in a key metabolic gene, pykF, that arose independently early in the history of 12 Escherichia coli populations during a long-term evolution experiment. Eight different evolved nonsynonymous mutations conferred similar fitness benefits of ∼10% when transferred into the ancestor, and these benefits were greater than the one conferred by a deletion mutation. In contrast, the same mutations had highly variable fitness effects, ranging from ∼0% to 25%, in evolved clones isolated from the populations at 20,000 generations. Two mutations that were moved into these evolved clones conferred similar fitness effects in a given clone, but different effects between the clones, indicating epistatic interactions between the evolved pykF alleles and the other mutations that had accumulated in each evolved clone. We also measured the fitness effects of six evolved pykF alleles in the same populations in which they had fixed, but at seven time points between 0 and 50,000 generations. Variation in fitness effects was high at intermediate time points, and declined to a low level at 50,000 generations, when the mean fitness effect was lowest. Our results demonstrate the importance of genetic context in determining the fitness effects of different beneficial mutations even within the same gene.
Affiliation(s)
- Fen Peng
  - Department of Biology and Biochemistry, University of Houston, Houston, TX
- Scott Widmann
  - Department of Biology and Biochemistry, University of Houston, Houston, TX
- Andrea Wünsche
  - Department of Biology and Biochemistry, University of Houston, Houston, TX
- Kristina Duan
  - Department of Biology and Biochemistry, University of Houston, Houston, TX
- Katherine A Donovan
  - Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Renwick C J Dobson
  - Biomolecular Interaction Centre and School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, Australia
- Richard E Lenski
  - Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI
- Tim F Cooper
  - Department of Biology and Biochemistry, University of Houston, Houston, TX
|
34
|
Haldane A, Levy RM. Influence of multiple-sequence-alignment depth on Potts statistical models of protein covariation. Phys Rev E 2019; 99:032405. [PMID: 30999494 PMCID: PMC6508952 DOI: 10.1103/physreve.99.032405] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/19/2018] [Indexed: 02/02/2023]
Abstract
Potts statistical models have become a popular and promising way to analyze mutational covariation in protein multiple sequence alignments (MSAs) in order to understand protein structure, function, and fitness. But the statistical limitations of these models, which can have millions of parameters and are fit to MSAs of only thousands or hundreds of effective sequences using a procedure known as inverse Ising inference, are incompletely understood. In this work we predict how model quality degrades as a function of the number of sequences N, sequence length L, amino-acid alphabet size q, and the degree of conservation of the MSA, in different applications of the Potts models: in "fitness" predictions of individual protein sequences, in predictions of the effects of single-point mutations, in "double mutant cycle" predictions of epistasis, and in 3D contact prediction in protein structure. We show how as MSA depth N decreases an "overfitting" effect occurs such that sequences in the training MSA have overestimated fitness, and we predict the magnitude of this effect and discuss how regularization can help correct for it, using a regularization procedure motivated by statistical analysis of the effects of finite sampling. We find that as N decreases the quality of point-mutation effect predictions degrade least, fitness and epistasis predictions degrade more rapidly, and contact predictions are most affected. However, overfitting becomes negligible for MSA depths of more than a few thousand effective sequences, as often used in practice, and regularization becomes less necessary. We discuss the implications of these results for users of Potts covariation analysis.
Affiliation(s)
- Allan Haldane
  - Center for Biophysics and Computational Biology, Department of Physics, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
|
35
|
Hart GR, Ferguson AL. Computational design of hepatitis C virus immunogens from host-pathogen dynamics over empirical viral fitness landscapes. Phys Biol 2018; 16:016004. [PMID: 30484433 DOI: 10.1088/1478-3975/aaeec0] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/06/2023]
Abstract
Hepatitis C virus (HCV) afflicts 170 million people and kills 700 000 annually. Vaccination offers the most realistic and cost effective hope of controlling this epidemic, but despite 25 years of research, no vaccine is available. A major obstacle is HCV's extreme genetic variability and rapid mutational escape from immune pressure. Coupling maximum entropy inference with population dynamics simulations, we have employed a computational approach to translate HCV sequence databases into empirical landscapes of viral fitness and simulate the intrahost evolution of the viral quasispecies over these landscapes. We explicitly model the coupled host-pathogen dynamics by combining agent-based models of viral mutation with stochastically-integrated coupled ordinary differential equations for the host immune response. We validate our model in predicting the mutational evolution of the HCV RNA-dependent RNA polymerase (protein NS5B) within seven individuals for whom longitudinal sequencing data is available. We then use our approach to perform exhaustive in silico evaluation of putative immunogen candidates to rationally design tailored vaccines to simultaneously cripple viral fitness and block mutational escape within two selected individuals. By systematically identifying a small number of promising vaccine candidates, our empirical fitness landscapes and host-pathogen dynamics simulator can guide and accelerate experimental vaccine design efforts.
Affiliation(s)
- Gregory R Hart
  - Department of Physics, University of Illinois at Urbana-Champaign, 1110 West Green Street, Urbana, IL 61801, United States of America
  - Present address: Department of Therapeutic Radiology, Yale University, 202 LLCI, 15 York Street, New Haven, CT 96510, United States of America
|
36
|
Anishchenko I, Kundrotas PJ, Vakser IA. Contact Potential for Structure Prediction of Proteins and Protein Complexes from Potts Model. Biophys J 2018; 115:809-821. [PMID: 30122295 DOI: 10.1016/j.bpj.2018.07.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/07/2018] [Revised: 07/16/2018] [Accepted: 07/31/2018] [Indexed: 12/18/2022] Open
Abstract
The energy function is the key component of protein modeling methodology. This work presents a semianalytical approach to the development of contact potentials for protein structure modeling. Residue-residue and atom-atom contact energies were derived by maximizing the probability of observing native sequences in a nonredundant set of protein structures. The optimization task was formulated as an inverse statistical mechanics problem applied to the Potts model. Its solution by pseudolikelihood maximization provides consistent estimates of coupling constants at atomic and residue levels. The best performance was achieved when interacting atoms were grouped according to their physicochemical properties. For individual protein structures, the performance of the contact potentials in distinguishing near-native structures from the decoys is similar to the top-performing scoring functions. The potentials also yielded significant improvement in the protein docking success rates. The potentials recapitulated experimentally determined protein stability changes upon point mutations and protein-protein binding affinities. The approach offers a different perspective on knowledge-based potentials and may serve as the basis for their further development.
Affiliation(s)
- Ivan Anishchenko
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas.
- Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas.
- Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas.
|
37
|
HIV-1 Protease Evolvability Is Affected by Synonymous Nucleotide Recoding. J Virol 2018; 92:JVI.00777-18. [PMID: 29875244 DOI: 10.1128/jvi.00777-18] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 05/04/2018] [Accepted: 05/29/2018] [Indexed: 12/11/2022] Open
Abstract
One unexplored aspect of HIV-1 genetic architecture is how codon choice influences population diversity and evolvability. Here we compared the levels of development of HIV-1 resistance to protease inhibitors (PIs) between wild-type (WT) virus and a synthetic virus (MAX) carrying a codon-pair-reengineered protease sequence including 38 (13%) synonymous mutations. The WT and MAX viruses showed indistinguishable replication in MT-4 cells or peripheral blood mononuclear cells (PBMCs). Both viruses were subjected to serial passages in MT-4 cells, with selective pressure from the PIs atazanavir (ATV) and darunavir (DRV). After 32 successive passages, both the WT and MAX viruses developed phenotypic resistance to PIs (50% inhibitory concentrations [IC50s] of 14.6 ± 5.3 and 21.2 ± 9 nM, respectively, for ATV and 5.9 ± 1.0 and 9.3 ± 1.9, respectively, for DRV). Ultradeep sequence clonal analysis revealed that both viruses harbored previously described mutations conferring resistance to ATV and DRV. However, the WT and MAX virus proteases showed different resistance variant repertoires, with the G16E and V77I substitutions observed only in the WT and the L33F, S37P, G48L, Q58E/K, and L89I substitutions detected only in the MAX virus. Remarkably, the G48L and L89I substitutions are rarely found in vivo in PI-treated patients. The MAX virus showed significantly higher nucleotide and amino acid diversity of the propagated viruses with and without PIs (P < 0.0001), suggesting a higher selective pressure for change in this recoded virus. Our results indicate that the HIV-1 protease position in sequence space delineates the evolution of its mutant spectrum. Nevertheless, the investigated synonymously recoded variant showed mutational robustness and evolvability similar to those of the WT virus.
IMPORTANCE: Large-scale synonymous recoding of virus genomes is a new tool for exploring various aspects of virus biology. Synonymous virus genome recoding can be used to investigate how a virus's position in sequence space defines its mutant spectrum, evolutionary trajectory, and pathogenesis. In this study, we evaluated how synonymous recoding of the human immunodeficiency virus type 1 (HIV-1) protease affects the development of protease inhibitor (PI) resistance. HIV-1 protease is a main target of current antiretroviral therapies. Our present results demonstrate that the wild-type (WT) virus and a virus with recoded protease exhibited different patterns of resistance mutations after PI treatment. Nevertheless, the developed PI resistance phenotypes were indistinguishable between the recoded virus and the WT virus, suggesting that the HIV-1 strain with synonymously recoded protease and the WT virus are equally robust and evolvable.
|
38
|
Biomolecular coevolution and its applications: Going from structure prediction toward signaling, epistasis, and function. Biochem Soc Trans 2017; 45:1253-1261. [DOI: 10.1042/bst20170063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/23/2017] [Revised: 08/30/2017] [Accepted: 09/04/2017] [Indexed: 01/01/2023]
Abstract
Evolution leads to considerable changes in the sequence of biomolecules, while their overall structure and function remain quite conserved. The wealth of genomic sequences that modern sequencing techniques provide, the ‘Biological Big Data’, allows us to investigate biomolecular evolution in unprecedented detail. Sophisticated statistical models can infer residue pair mutations resulting from spatial proximity. The introduction of predicted spatial adjacencies as constraints in biomolecular structure prediction workflows has transformed the field of protein and RNA structure prediction toward accuracies approaching the experimental resolution limit. Going beyond structure prediction, the same mathematical framework allows mimicking evolutionary fitness landscapes to infer signaling interactions, epistasis, or mutational landscapes.
|
39
|
Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016; 43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]
Abstract
Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function maintained through evolution. We review recent work with Potts models to predict protein structure and sequence-dependent conformational free energy landscapes, to survey protein fitness landscapes and to explore the effects of epistasis on fitness. We also comment on the numerical methods used to infer these models for each application.
Affiliation(s)
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States.
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States.
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States.
|