1
Shimagaki KS, Barton JP. Efficient epistasis inference via higher-order covariance matrix factorization. bioRxiv: The Preprint Server for Biology 2024:2024.10.14.618287. [PMID: 39464126 PMCID: PMC11507688 DOI: 10.1101/2024.10.14.618287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/29/2024]
Abstract
Epistasis can profoundly influence evolutionary dynamics. Temporal genetic data, consisting of sequences sampled repeatedly from a population over time, provides a unique resource to understand how epistasis shapes evolution. However, detecting epistatic interactions from sequence data is technically challenging. Existing methods for identifying epistasis are computationally demanding, limiting their applicability to real-world data. Here, we present a novel computational method for inferring epistasis that significantly reduces computational costs without sacrificing accuracy. We validated our approach in simulations and applied it to study HIV-1 evolution over multiple years in a data set of 16 individuals. There we observed a strong excess of negative epistatic interactions between beneficial mutations, especially mutations involved in immune escape. Our method is general and could be used to characterize epistasis in other large data sets.
Affiliation(s)
- Kai S. Shimagaki
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, USA
  - Department of Physics and Astronomy, University of Pittsburgh, USA
- John P. Barton
  - Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, USA
  - Department of Physics and Astronomy, University of Pittsburgh, USA
2
Biswas A, Choudhuri I, Arnold E, Lyumkis D, Haldane A, Levy RM. Kinetic coevolutionary models predict the temporal emergence of HIV-1 resistance mutations under drug selection pressure. Proc Natl Acad Sci U S A 2024; 121:e2316662121. [PMID: 38557187 PMCID: PMC11009627 DOI: 10.1073/pnas.2316662121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2023] [Accepted: 02/23/2024] [Indexed: 04/04/2024] Open
Abstract
Drug resistance in HIV type 1 (HIV-1) is a pervasive problem that affects the lives of millions of people worldwide. Although records of drug-resistant mutations (DRMs) have been extensively tabulated within public repositories, our understanding of the evolutionary kinetics of DRMs and how they evolve together remains limited. Epistasis, the interaction between a DRM and other residues in HIV-1 protein sequences, is key to the temporal evolution of drug resistance. We use a Potts sequence-covariation statistical-energy model of HIV-1 protein fitness under drug selection pressure, which captures epistatic interactions between all positions, combined with kinetic Monte-Carlo simulations of sequence evolutionary trajectories, to explore the acquisition of DRMs as they arise in an ensemble of drug-naive patient protein sequences. We follow the time course of 52 DRMs in the enzymes protease, RT, and integrase, the primary targets of antiretroviral therapy. The rates at which DRMs emerge are highly correlated with their observed acquisition rates reported in the literature when drug pressure is applied. This result highlights the central role of epistasis in determining the kinetics governing DRM emergence. Whereas rapidly acquired DRMs begin to accumulate as soon as drug pressure is applied, slowly acquired DRMs are contingent on accessory mutations that appear only after prolonged drug pressure. We provide a foundation for using computational methods to determine the temporal evolution of drug resistance using Potts statistical potentials, which can be used to gain mechanistic insights into drug resistance pathways in HIV-1 and other infectious agents.
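As a rough illustration of the kind of Potts statistical-energy model this abstract refers to, the Python sketch below evaluates E(S) = -sum_i h_i(S_i) - sum_{i<j} J_ij(S_i, S_j) for an integer-encoded sequence. The function name `potts_energy`, the array layout of `h` and `J`, the sign convention, and the toy parameter values are all assumptions made for illustration; none of them is taken from the paper.

```python
import numpy as np


def potts_energy(seq, h, J):
    """Potts statistical energy of an integer-encoded sequence.

    Assumed layout (hypothetical, for illustration only):
      h has shape (L, q): one field per position and residue state.
      J has shape (L, L, q, q): one coupling per position pair and state pair;
      only entries with i < j are read.
    """
    seq = np.asarray(seq)
    L = len(seq)
    # Single-site field contribution.
    fields = h[np.arange(L), seq].sum()
    # Pairwise coupling contribution over all pairs i < j.
    couplings = sum(
        J[i, j, seq[i], seq[j]] for i in range(L) for j in range(i + 1, L)
    )
    return -(fields + couplings)


# Toy parameters: 3 positions, 2 states (0 = consensus, 1 = mutant).
L_toy, q_toy = 3, 2
h_toy = np.zeros((L_toy, q_toy))
h_toy[:, 1] = 0.5                 # each mutation alone contributes 0.5
J_toy = np.zeros((L_toy, L_toy, q_toy, q_toy))
J_toy[0, 1, 1, 1] = 1.0           # mutations at positions 0 and 1 are coupled

double_mutant_energy = potts_energy([1, 1, 0], h_toy, J_toy)
```

Under this sign convention a lower energy corresponds to a more probable sequence; the coupling term is what lets the effect of one mutation depend on the residues present elsewhere in the sequence.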
Affiliation(s)
- Avik Biswas
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA 92037
  - Department of Physics, University of California San Diego, La Jolla, CA 92093
- Indrani Choudhuri
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Department of Chemistry, Temple University, Philadelphia, PA 19122
- Eddy Arnold
  - Department of Chemistry and Chemical Biology, Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, NJ 08854
- Dmitry Lyumkis
  - Laboratory of Genetics, The Salk Institute for Biological Studies, La Jolla, CA 92037
  - Graduate School of Biological Sciences, Department of Molecular Biology, University of California San Diego, La Jolla, CA 92093
- Allan Haldane
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Department of Physics, Temple University, Philadelphia, PA 19122
- Ronald M. Levy
  - Center for Biophysics and Computational Biology, College of Science and Technology, Temple University, Philadelphia, PA 19122
  - Department of Chemistry, Temple University, Philadelphia, PA 19122
3
Zhang H, Bull RA, Quadeer AA, McKay MR. HCV E1 influences the fitness landscape of E2 and may enhance escape from E2-specific antibodies. Virus Evol 2023; 9:vead068. [PMID: 38107333 PMCID: PMC10722114 DOI: 10.1093/ve/vead068] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/05/2023] [Revised: 09/27/2023] [Accepted: 11/16/2023] [Indexed: 12/19/2023] Open
Abstract
The Hepatitis C virus (HCV) envelope glycoprotein E1 forms a non-covalent heterodimer with E2, the main target of neutralizing antibodies. How E1-E2 interactions influence viral fitness and contribute to resistance to E2-specific antibodies remains largely unknown. We investigate this problem using a combination of fitness landscape and evolutionary modeling. Our analysis indicates that E1 and E2 proteins collectively mediate viral fitness and suggests that fitness-compensating E1 mutations may accelerate escape from E2-targeting antibodies. Our analysis also identifies a set of E2-specific human monoclonal antibodies that are predicted to be especially resilient to escape via genetic variation in both E1 and E2, providing directions for robust HCV vaccine development.
Affiliation(s)
- Hang Zhang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China
- Rowena A Bull
  - School of Biomedical Sciences, Faculty of Medicine and Health, University of New South Wales, Sydney, NSW 2052, Australia
  - The Kirby Institute for Infection and Immunity, Sydney, NSW 2052, Australia
- Ahmed Abdul Quadeer
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong, SAR, China
  - Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC 3010, Australia
- Matthew R McKay
  - Department of Electrical and Electronic Engineering, University of Melbourne, Parkville, VIC 3010, Australia
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, University of Melbourne, Melbourne, VIC 3000, Australia
4
Zhang H, Quadeer AA, McKay MR. Direct-acting antiviral resistance of Hepatitis C virus is promoted by epistasis. Nat Commun 2023; 14:7457. [PMID: 37978179 PMCID: PMC10656532 DOI: 10.1038/s41467-023-42550-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023] Open
Abstract
Direct-acting antiviral agents (DAAs) provide efficacious therapeutic treatments for chronic Hepatitis C virus (HCV) infection. However, emergence of drug resistance mutations (DRMs) can greatly affect treatment outcomes and impede virological cure. While multiple DRMs have been observed for all currently used DAAs, the evolutionary determinants of such mutations are not currently well understood. Here, by considering DAAs targeting the nonstructural 3 (NS3) protein of HCV, we present results suggesting that epistasis plays an important role in the evolution of DRMs. Employing a sequence-based fitness landscape model whose predictions correlate highly with experimental data, we identify specific DRMs that are associated with strong epistatic interactions, and these are found to be enriched in multiple NS3-specific DAAs. Evolutionary modelling further supports that the identified DRMs involve compensatory mutational interactions that facilitate relatively easy escape from drug-induced selection pressures. Our results indicate that accounting for epistasis is important for designing future HCV NS3-targeting DAAs.
Affiliation(s)
- Hang Zhang
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Ahmed Abdul Quadeer
  - Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Matthew R McKay
  - Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC, Australia
  - Department of Microbiology and Immunology, University of Melbourne, at The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia
5
Li M, Oliveira Passos D, Shan Z, Smith SJ, Sun Q, Biswas A, Choudhuri I, Strutzenberg TS, Haldane A, Deng N, Li Z, Zhao XZ, Briganti L, Kvaratskhelia M, Burke TR, Levy RM, Hughes SH, Craigie R, Lyumkis D. Mechanisms of HIV-1 integrase resistance to dolutegravir and potent inhibition of drug-resistant variants. Science Advances 2023; 9:eadg5953. [PMID: 37478179 PMCID: PMC11803526 DOI: 10.1126/sciadv.adg5953] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 01/09/2023] [Accepted: 06/16/2023] [Indexed: 07/23/2023]
Abstract
HIV-1 infection depends on the integration of viral DNA into host chromatin. Integration is mediated by the viral enzyme integrase and is blocked by integrase strand transfer inhibitors (INSTIs), first-line antiretroviral therapeutics widely used in the clinic. Resistance to even the best INSTIs is a problem, and the mechanisms of resistance are poorly understood. Here, we analyze combinations of the mutations E138K, G140A/S, and Q148H/K/R, which confer resistance to INSTIs. The investigational drug 4d more effectively inhibited the mutants compared with the approved drug Dolutegravir (DTG). We present 11 new cryo-EM structures of drug-resistant HIV-1 intasomes bound to DTG or 4d, with better than 3-Å resolution. These structures, complemented with free energy simulations, virology, and enzymology, explain the mechanisms of DTG resistance involving E138K + G140A/S + Q148H/K/R and show why 4d maintains potency better than DTG. These data establish a foundation for further development of INSTIs that potently inhibit resistant forms in integrase.
Affiliation(s)
- Min Li
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Zelin Shan
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Steven J. Smith
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Qinfang Sun
  - Center for Biophysics and Computational Biology, and Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Avik Biswas
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Indrani Choudhuri
  - Center for Biophysics and Computational Biology, and Department of Chemistry, Temple University, Philadelphia, PA 19122, USA
- Allan Haldane
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Nanjie Deng
  - Department of Chemistry and Physical Sciences, Pace University, New York, NY, 10038, USA
- Zhaoyang Li
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Xue Zhi Zhao
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Lorenzo Briganti
  - Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Mamuka Kvaratskhelia
  - Division of Infectious Diseases, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Terrence R. Burke
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Ronald M. Levy
  - Center for Biophysics and Computational Biology and Department of Physics, Temple University, Philadelphia, PA 19122, USA
- Stephen H. Hughes
  - Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, 21702, USA
- Robert Craigie
  - National Institute of Diabetes and Digestive Diseases, National Institutes of Health, Bethesda, MD, 20892, USA
- Dmitry Lyumkis
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, 92037, USA
  - Graduate School of Biological Sciences, Section of Molecular Biology, University of California San Diego, La Jolla, CA 92093, USA
6
Mohanty V, Louis AA. Robustness and stability of spin-glass ground states to perturbed interactions. Phys Rev E 2023; 107:014126. [PMID: 36797942 DOI: 10.1103/physreve.107.014126] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 12/10/2020] [Accepted: 12/16/2022] [Indexed: 06/18/2023]
Abstract
Across many problems in science and engineering, it is important to consider how much the output of a given system changes due to perturbations of the input. Here, we investigate the glassy phase of ±J spin glasses at zero temperature by calculating the robustness of the ground states to flips in the sign of single interactions. For random graphs and the Sherrington-Kirkpatrick model, we find relatively large sets of bond configurations that generate the same ground state. These sets can themselves be analyzed as subgraphs of the interaction domain, and we compute many of their topological properties. In particular, we find that the robustness, equivalent to the average degree, of these subgraphs is much higher than one would expect from a random model. Most notably, it scales in the same logarithmic way with the size of the subgraph as has been found in genotype-phenotype maps for RNA secondary structure folding, protein quaternary structure, gene regulatory networks, as well as for models for genetic programming. The similarity between these disparate systems suggests that this scaling may have a more universal origin.
Affiliation(s)
- Vaibhav Mohanty
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, OX1 3NP, United Kingdom
  - MD-PhD Program and Program in Health Sciences and Technology, Harvard Medical School, Boston, Massachusetts 02125, USA and Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Ard A Louis
  - Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Oxford, OX1 3NP, United Kingdom
7
Zeng HL, Liu Y, Dichio V, Aurell E. Temporal epistasis inference from more than 3 500 000 SARS-CoV-2 genomic sequences. Phys Rev E 2022; 106:044409. [PMID: 36397507 DOI: 10.1103/physreve.106.044409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2022] [Accepted: 09/19/2022] [Indexed: 06/16/2023]
Abstract
We use direct coupling analysis (DCA) to determine epistatic interactions between loci of variability of the SARS-CoV-2 virus, segmenting genomes by month of sampling. We use full-length, high-quality genomes from the GISAID repository up to October 2021 for a total of over 3 500 000 genomes. We find that DCA terms are more stable over time than correlations but nevertheless change over time as mutations disappear from the global population or reach fixation. Correlations are enriched for phylogenetic effects, and in particular statistical dependencies at short genomic distances, while DCA brings out links at longer genomic distance. We discuss the validity of a DCA analysis under these conditions in terms of a transient quasilinkage equilibrium state. We identify putative epistatic interaction mutations involving loci in spike.
Affiliation(s)
- Hong-Li Zeng
  - School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, China
- Yue Liu
  - School of Science, Nanjing University of Posts and Telecommunications, New Energy Technology Engineering Laboratory of Jiangsu Province, Nanjing 210023, China
- Vito Dichio
  - Inria Paris, Aramis Project Team, Paris 75013, France
  - Institut du Cerveau, ICM, Inserm U 1127, CNRS UMR 7225, Sorbonne Université, Paris, France
- Erik Aurell
  - Department of Computational Science and Technology, AlbaNova University Center, SE-106 91 Stockholm, Sweden
8
Doelger J, Kardar M, Chakraborty AK. Inferring the intrinsic mutational fitness landscape of influenzalike evolving antigens from temporally ordered sequence data. Phys Rev E 2022; 105:024401. [PMID: 35291059 DOI: 10.1103/physreve.105.024401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/28/2021] [Accepted: 01/19/2022] [Indexed: 06/14/2023]
Abstract
There still are no effective long-term protective vaccines against viruses that continuously evolve under immune pressure such as seasonal influenza, which has caused, and can cause, devastating epidemics in the human population. To find such a broadly protective immunization strategy, it is useful to know how easily the virus can escape via mutation from specific antibody responses. This information is encoded in the fitness landscape of the viral proteins (i.e., knowledge of the viral fitness as a function of sequence). Here we present a computational method to infer the intrinsic mutational fitness landscape of influenzalike evolving antigens from yearly sequence data. We test inference performance with computer-generated sequence data that are based on stochastic simulations mimicking basic features of immune-driven viral evolution. Although the numerically simulated model does create a phylogeny based on the allowed mutations, the inference scheme does not use this information. This provides a contrast to other methods that rely on reconstruction of phylogenetic trees. Our method just needs a sufficient number of samples over multiple years. With our method, we are able to infer single as well as pairwise mutational fitness effects from the simulated sequence time series for short antigenic proteins. Our fitness inference approach may have potential future use for the design of immunization protocols by identifying intrinsically vulnerable immune target combinations on antigens that evolve under immune-driven selection. In the future, this approach may be applied to influenza and other novel viruses such as SARS-CoV-2, which evolves and, like influenza, might continue to escape the natural and vaccine-mediated immune pressures.
Affiliation(s)
- Julia Doelger
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Mehran Kardar
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
- Arup K Chakraborty
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA; and Ragon Institute of MGH, MIT, and Harvard, Cambridge, Massachusetts 02139, USA
9
Biswas A, Haldane A, Levy RM. Limits to detecting epistasis in the fitness landscape of HIV. PLoS One 2022; 17:e0262314. [PMID: 35041711 PMCID: PMC8765623 DOI: 10.1371/journal.pone.0262314] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/15/2021] [Accepted: 12/20/2021] [Indexed: 02/05/2023] Open
Abstract
The rapid evolution of HIV is constrained by interactions between mutations which affect viral fitness. In this work, we explore the role of epistasis in determining the mutational fitness landscape of HIV for multiple drug target proteins, including Protease, Reverse Transcriptase, and Integrase. Epistatic interactions between residues modulate the mutation patterns involved in drug resistance, with unambiguous signatures of epistasis best seen in the comparison of the Potts model predicted and experimental HIV sequence "prevalences" expressed as higher-order marginals (beyond triplets) of the sequence probability distribution. In contrast, experimental measures of fitness such as viral replicative capacities generally probe fitness effects of point mutations in a single background, providing weak evidence for epistasis in viral systems. The detectable effects of epistasis are obscured by higher evolutionary conservation at sites. While double mutant cycles, in principle, provide one of the best ways to probe epistatic interactions experimentally without reference to a particular background, we show that the analysis is complicated by the small dynamic range of measurements. Overall, we show that global pairwise interaction Potts models are necessary for predicting the mutational landscape of viral proteins.
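The double mutant cycle mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes the common log-scale definition of pairwise epistasis, in which fitness effects that combine multiplicatively give zero epistasis; the function name `pairwise_epistasis` and the fitness values are hypothetical and are not drawn from the paper.

```python
import math


def pairwise_epistasis(f_wt, f_a, f_b, f_ab):
    """Log-scale epistasis from a double mutant cycle.

    f_wt, f_a, f_b, f_ab are positive fitness values for the wild type,
    the two single mutants, and the double mutant (illustrative inputs).
    Zero means the two effects combine multiplicatively; a positive value
    means the double mutant is fitter than that expectation.
    """
    return (math.log(f_ab) - math.log(f_a)) - (math.log(f_b) - math.log(f_wt))


# Hypothetical measurements: each single mutation halves fitness.
eps_independent = pairwise_epistasis(1.0, 0.5, 0.5, 0.25)  # no interaction
eps_compensating = pairwise_epistasis(1.0, 0.5, 0.5, 0.5)  # second mutation is free
```

When the four measured fitness values span only a narrow range, measurement noise in any one of them can be comparable to the epistasis value itself, which is one way to read the abstract's remark about the small dynamic range of measurements.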
Affiliation(s)
- Avik Biswas
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Allan Haldane
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
- Ronald M. Levy
  - Department of Physics, Temple University, Philadelphia, PA, United States of America
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA, United States of America
  - Department of Chemistry, Temple University, Philadelphia, PA, United States of America
10
Adenovirus-vectored vaccine containing multidimensionally conserved parts of the HIV proteome is immunogenic in rhesus macaques. Proc Natl Acad Sci U S A 2021; 118:2022496118. [PMID: 33514660 DOI: 10.1073/pnas.2022496118] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022] Open
Abstract
An effective vaccine that can protect against HIV infection does not exist. A major reason why a vaccine is not available is the high mutability of the virus, which enables it to evolve mutations that can evade human immune responses. This challenge is exacerbated by the ability of the virus to evolve compensatory mutations that can partially restore the fitness cost of immune-evading mutations. Based on the fitness landscapes of HIV proteins that account for the effects of coupled mutations, we designed a single long peptide immunogen comprising parts of the HIV proteome wherein mutations are likely to be deleterious regardless of the sequence of the rest of the viral protein. This immunogen was then stably expressed in adenovirus vectors that are currently in clinical development. Macaques immunized with these vaccine constructs exhibited T-cell responses that were comparable in magnitude to animals immunized with adenovirus vectors with whole HIV protein inserts. Moreover, the T-cell responses in immunized macaques strongly targeted regions contained in our immunogen. These results suggest that further studies aimed toward using our vaccine construct for HIV prophylaxis and cure are warranted.
11
Ferguson AL, Ranganathan R. 100th Anniversary of Macromolecular Science Viewpoint: Data-Driven Protein Design. ACS Macro Lett 2021; 10:327-340. [PMID: 35549066 DOI: 10.1021/acsmacrolett.0c00885] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 12/13/2022]
Abstract
The design of synthetic proteins with the desired function is a long-standing goal in biomolecular science, with broad applications in biochemical engineering, agriculture, medicine, and public health. Rational de novo design and experimental directed evolution have achieved remarkable successes but are challenged by the requirement to find functional "needles" in the vast "haystack" of protein sequence space. Data-driven models for fitness landscapes provide a predictive map between protein sequence and function and can prospectively identify functional candidates for experimental testing to greatly improve the efficiency of this search. This Viewpoint reviews the applications of machine learning and, in particular, deep learning as part of data-driven protein engineering platforms. We highlight recent successes, review promising computational methodologies, and provide an outlook on future challenges and opportunities. The article is written for a broad audience comprising both polymer and protein scientists and computer and data scientists interested in an up-to-date review of recent innovations and opportunities in this rapidly evolving field.
Affiliation(s)
- Andrew L. Ferguson
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
- Rama Ranganathan
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois 60637, United States
  - Center for Physics of Evolving Systems, University of Chicago, Chicago, Illinois 60637, United States
  - Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois 60637, United States
12
Zhang TH, Dai L, Barton JP, Du Y, Tan Y, Pang W, Chakraborty AK, Lloyd-Smith JO, Sun R. Predominance of positive epistasis among drug resistance-associated mutations in HIV-1 protease. PLoS Genet 2020; 16:e1009009. [PMID: 33085662 PMCID: PMC7605711 DOI: 10.1371/journal.pgen.1009009] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/13/2019] [Revised: 11/02/2020] [Accepted: 07/24/2020] [Indexed: 12/12/2022] Open
Abstract
Drug-resistant mutations often have deleterious impacts on replication fitness, posing a fitness cost that can only be overcome by compensatory mutations. However, the role of fitness cost in the evolution of drug resistance has often been overlooked in clinical studies or in vitro selection experiments, as these observations only capture the outcome of drug selection. In this study, we systematically profile the fitness landscape of resistance-associated sites in HIV-1 protease using deep mutational scanning. We construct a mutant library covering combinations of mutations at 11 sites in HIV-1 protease, all of which are associated with resistance to protease inhibitors in clinic. Using deep sequencing, we quantify the fitness of thousands of HIV-1 protease mutants after multiple cycles of replication in human T cells. Although the majority of resistance-associated mutations have deleterious effects on viral replication, we find that epistasis among resistance-associated mutations is predominantly positive. Furthermore, our fitness data are consistent with genetic interactions inferred directly from HIV sequence data of patients. Fitness valleys formed by strong positive epistasis reduce the likelihood of reversal of drug resistance mutations. Overall, our results support the view that strong compensatory effects are involved in the emergence of clinically observed resistance mutations and provide insights to understanding fitness barriers in the evolution and reversion of drug resistance.
Affiliation(s)
- Tian-hao Zhang
  - Molecular Biology Institute, University of California, Los Angeles, CA 90095, USA
- Lei Dai
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- John P. Barton
  - Department of Physics and Astronomy, University of California, Riverside, CA 92521, USA
- Yushen Du
  - School of Medicine, ZheJiang University, Hangzhou, 210000, China
  - Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
- Yuxiang Tan
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
- Wenwen Pang
  - Department of Public Health Laboratory Science, West China School of Public Health, Sichuan University, Chengdu 610041, China
- Arup K. Chakraborty
  - Institute for Medical Engineering and Science, Departments of Chemical Engineering, Physics, & Chemistry, Massachusetts Institute of Technology, MA 21309, USA
  - Ragon Institute of MGH, MIT, & Harvard, Cambridge, MA 21309, USA
- James O. Lloyd-Smith
  - Department of Ecology and Evolutionary Biology, University of California, Los Angeles, CA 90095, USA
- Ren Sun
  - Molecular and Medical Pharmacology, University of California, Los Angeles, CA 90095, USA
13
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Indexed: 12/27/2022] Open
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design. We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. 
Perspectives on the implementation of our model in the context of the molecular clock are discussed.
|
14
|
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape. Nat Commun 2020; 11:377. [PMID: 31953427 PMCID: PMC6969152 DOI: 10.1038/s41467-019-14174-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 03/25/2018] [Accepted: 12/16/2019] [Indexed: 01/08/2023]
Abstract
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations. Poliovirus has a higher mutation rate than HIV, yet has been almost eradicated by vaccination while an effective vaccine against HIV does not exist. Here, the authors develop a fitness model for poliovirus viral protein 1 to show that it is subject to stringent evolutionary constraints that limit its ability to avoid vaccine-induced immune responses.
|
15
|
Liu J, Pei J, Lai L. A combined computational and experimental strategy identifies mutations conferring resistance to drugs targeting the BCR-ABL fusion protein. Commun Biol 2020; 3:18. [PMID: 31925328 PMCID: PMC6952392 DOI: 10.1038/s42003-019-0743-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/08/2019] [Accepted: 12/17/2019] [Indexed: 12/25/2022]
Abstract
Drug resistance is of increasing concern, especially during the treatment of infectious diseases and cancer. To accelerate the drug discovery process in combating issues of drug resistance, here we developed a computational and experimental strategy to predict drug resistance mutations. Using BCR-ABL as a case study, we successfully recaptured the clinically observed mutations that confer resistance to imatinib, nilotinib, dasatinib, bosutinib, and ponatinib. We then experimentally tested the predicted mutants in vitro. We found that although all mutants showed weakened binding strength as expected, the binding constants alone were not a good indicator of drug resistance. Instead, the half-maximal inhibitory concentration (IC50) was shown to be a good indicator of the incidence of the predicted mutations, together with change in catalytic efficacy. Our suggested strategy for predicting drug-resistance mutations includes the computational prediction and in vitro selection of mutants with increased IC50 values beyond the drug safety window.
Affiliation(s)
- Jinxin Liu
  - The PTN Graduate Program, College of Life Sciences, Peking University, Beijing, 100871, P. R. China
- Jianfeng Pei
  - Center for Quantitative Biology, AAIS, Peking University, Beijing, 100871, P. R. China
- Luhua Lai
  - Center for Quantitative Biology, AAIS, Peking University, Beijing, 100871, P. R. China
  - BNLMS, Peking-Tsinghua Center for Life Sciences at College of Chemistry and Molecular Engineering, Peking University, Beijing, 100871, P. R. China
|
16
|
Biswas A, Haldane A, Arnold E, Levy RM. Epistasis and entrenchment of drug resistance in HIV-1 subtype B. eLife 2019; 8:e50524. [PMID: 31591964 PMCID: PMC6783267 DOI: 10.7554/elife.50524] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/25/2019] [Accepted: 09/09/2019] [Indexed: 12/17/2022]
Abstract
The development of drug resistance in HIV is the result of primary mutations whose effects on viral fitness depend on the entire genetic background, a phenomenon called 'epistasis'. Based on protein sequences derived from drug-experienced patients in the Stanford HIV database, we use a co-evolutionary (Potts) Hamiltonian model to provide direct confirmation of epistasis involving many simultaneous mutations. Building on earlier work, we show that primary mutations leading to drug resistance can become highly favored (or entrenched) by the complex mutation patterns arising in response to drug therapy despite being disfavored in the wild-type background, and provide the first confirmation of entrenchment for all three drug-target proteins: protease, reverse transcriptase, and integrase; a comparative analysis reveals that NNRTI-induced mutations behave differently from the others. We further show that the likelihood of resistance mutations can vary widely in patient populations, and from the population average compared to specific molecular clones.
Affiliation(s)
- Avik Biswas
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
- Allan Haldane
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
- Eddy Arnold
  - Center for Advanced Biotechnology and Medicine, Rutgers University, Piscataway, United States
  - Department of Chemistry and Chemical Biology, Rutgers University, Piscataway, United States
- Ronald M Levy
  - Center for Biophysics and Computational Biology, Temple University, Philadelphia, United States
  - Department of Physics, Temple University, Philadelphia, United States
  - Department of Chemistry, Temple University, Philadelphia, United States
|
17
|
Henes M, Kosovrasti K, Lockbaum GJ, Leidner F, Nachum GS, Nalivaika EA, Bolon DN, Yilmaz NK, Schiffer CA, Whitfield TW. Molecular Determinants of Epistasis in HIV-1 Protease: Elucidating the Interdependence of L89V and L90M Mutations in Resistance. Biochemistry 2019; 58:3711-3726. [PMID: 31386353 PMCID: PMC6941756 DOI: 10.1021/acs.biochem.9b00446] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 01/07/2023]
Abstract
Protease inhibitors have the highest potency among antiviral therapies against HIV-1 infections, yet the virus can evolve resistance. Darunavir (DRV), currently the most potent Food and Drug Administration-approved protease inhibitor, retains potency against single-site mutations. However, complex combinations of mutations can confer resistance to DRV. While the interdependence between mutations within HIV-1 protease is key for inhibitor potency, the molecular mechanisms that underlie this control remain largely unknown. In this study, we investigated the interdependence between the L89V and L90M mutations and their effects on DRV binding. These two mutations have been reported to be positively correlated with one another in HIV-1 patient-derived protease isolates, with the presence of one mutation making the probability of the occurrence of the second mutation more likely. The focus of our investigation is a patient-derived isolate, with 24 mutations that we call "KY"; this variant includes the L89V and L90M mutations. Three additional KY variants with back-mutations, KY(V89L), KY(M90L), and the KY(V89L/M90L) double mutation, were used to experimentally assess the individual and combined effects of these mutations on DRV inhibition and substrate processing. The enzymatic assays revealed that the KY(V89L) variant, with methionine at residue 90, is highly resistant, but its catalytic function is compromised. When a leucine to valine mutation at residue 89 is present simultaneously with the L90M mutation, a rescue of catalytic efficiency is observed. Molecular dynamics simulations of these DRV-bound protease variants reveal how the L90M mutation induces structural changes throughout the enzyme that undermine the binding interactions.
Affiliation(s)
- Mina Henes
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Klajdi Kosovrasti
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Gordon J. Lockbaum
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Florian Leidner
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Gily S. Nachum
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Ellen A. Nalivaika
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Daniel N.A. Bolon
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Nese Kurt Yilmaz
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Celia A. Schiffer
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
- Troy W. Whitfield
  - Department of Medicine, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, Massachusetts 01605, USA
|
18
|
Barton JP, Rajkoomar E, Mann JK, Murakowski DK, Toyoda M, Mahiti M, Mwimanzi P, Ueno T, Chakraborty AK, Ndung'u T. Modelling and in vitro testing of the HIV-1 Nef fitness landscape. Virus Evol 2019; 5:vez029. [PMID: 31392033 PMCID: PMC6680064 DOI: 10.1093/ve/vez029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/21/2022]
Abstract
An effective vaccine is urgently required to curb the HIV-1 epidemic. We have previously described an approach to model the fitness landscape of several HIV-1 proteins, and have validated the results against experimental and clinical data. The fitness landscape may be used to identify mutation patterns harmful to virus viability, and consequently inform the design of immunogens that can target such regions for immunological control. Here we apply such an analysis and complementary experiments to HIV-1 Nef, a multifunctional protein which plays a key role in HIV-1 pathogenesis. We measured Nef-driven replication capacities as well as Nef-mediated CD4 and HLA-I down-modulation capacities of thirty-two different Nef mutants, and tested model predictions against these results. Furthermore, we evaluated the models using 448 patient-derived Nef sequences for which several Nef activities were previously measured. Model predictions correlated significantly with Nef-driven replication and CD4 down-modulation capacities, but not HLA-I down-modulation capacities, of the various Nef mutants. Similarly, in our analysis of patient-derived Nef sequences, CD4 down-modulation capacity correlated the most significantly with model predictions, suggesting that of the tested Nef functions, this is the most important in vivo. Overall, our results highlight how the fitness landscape inferred from patient-derived sequences captures, at least in part, the in vivo functional effects of mutations to Nef. However, the correlation between predictions of the fitness landscape and measured parameters of Nef function is not as accurate as the correlation observed in past studies for other proteins. This may be because of the additional complexity associated with inferring the cost of mutations on the diverse functions of Nef.
Affiliation(s)
- John P Barton
  - Departments of Chemical Engineering, Physics, and Chemistry, Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Erasha Rajkoomar
  - HIV Pathogenesis Programme, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
- Jaclyn K Mann
  - HIV Pathogenesis Programme, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
- Dariusz K Murakowski
  - Departments of Chemical Engineering, Physics, and Chemistry, Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
- Mako Toyoda
  - Center for AIDS Research, Kumamoto University, Kumamoto, Japan
- Takamasa Ueno
  - Center for AIDS Research, Kumamoto University, Kumamoto, Japan
  - International Research Center for Medical Sciences (IRCMS), Kumamoto University, Kumamoto, Japan
- Arup K Chakraborty
  - Departments of Chemical Engineering, Physics, and Chemistry, Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
- Thumbi Ndung'u
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard University, Boston, MA, USA
  - HIV Pathogenesis Programme, Doris Duke Medical Research Institute, Nelson R. Mandela School of Medicine, University of KwaZulu-Natal, Durban, South Africa
  - Africa Health Research Institute, Durban, South Africa
  - Max Planck Institute for Infection Biology, Chariteplatz, D-10117 Berlin, Germany
|
19
|
Boucher JI, Whitfield TW, Dauphin A, Nachum G, Hollins C, Zeldovich KB, Swanstrom R, Schiffer CA, Luban J, Bolon DNA. Constrained Mutational Sampling of Amino Acids in HIV-1 Protease Evolution. Mol Biol Evol 2019; 36:798-810. [PMID: 30721995 DOI: 10.1093/molbev/msz022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/23/2022]
Abstract
The evolution of HIV-1 protein sequences should be governed by a combination of factors including nucleotide mutational probabilities, the genetic code, and fitness. The impact of these factors on protein sequence evolution is interdependent, making it challenging to infer the individual contribution of each factor from phylogenetic analyses alone. We investigated the protein sequence evolution of HIV-1 by determining an experimental fitness landscape of all individual amino acid changes in protease. We compared our experimental results to the frequency of protease variants in a publicly available data set of 32,163 sequenced isolates from drug-naïve individuals. The most common amino acids in sequenced isolates supported robust experimental fitness, indicating that the experimental fitness landscape captured key features of selection acting on protease during viral infections of hosts. Amino acid changes requiring multiple mutations from the likely ancestor were slightly less likely to support robust experimental fitness than single mutations, consistent with the genetic code favoring chemically conservative amino acid changes. Amino acids that were common in sequenced isolates were predominantly accessible by single mutations from the likely protease ancestor. Multiple mutations commonly observed in isolates were accessible by mutational walks with highly fit single mutation intermediates. Our results indicate that the prevalence of multiple-base mutations in HIV-1 protease is strongly influenced by mutational sampling.
Affiliation(s)
- Jeffrey I Boucher
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Troy W Whitfield
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA
- Ann Dauphin
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Gily Nachum
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Carl Hollins
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Konstantin B Zeldovich
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Ronald Swanstrom
  - Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC
- Celia A Schiffer
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
- Jeremy Luban
  - Department of Medicine, University of Massachusetts Medical School, Worcester, MA
  - Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA
- Daniel N A Bolon
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA
|
20
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation and design of protein-nucleic acid recognition, as well as the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug design to vaccine development.
|
21
|
Gao CY, Cecconi F, Vulpiani A, Zhou HJ, Aurell E. DCA for genome-wide epistasis analysis: the statistical genetics perspective. Phys Biol 2019; 16:026002. [PMID: 30605896 DOI: 10.1088/1478-3975/aafbe0] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
Direct coupling analysis (DCA) is a now widely used method to leverage statistical information from many similar biological systems to draw meaningful conclusions on each system separately. DCA has been applied with great success to sequences of homologous proteins, and also more recently to whole-genome population-wide sequencing data. We here argue that the use of DCA on the genome scale is contingent on fundamental issues of population genetics. DCA can be expected to yield meaningful results when a population is in the quasi-linkage equilibrium (QLE) phase studied by Kimura and others, but not, for instance, in a phase of clonal competition. We discuss how the exponential (Potts model) distributions emerge in QLE, and compare couplings to correlations obtained in a study of about 3000 genomes of the human pathogen Streptococcus pneumoniae.
Affiliation(s)
- Chen-Yi Gao
  - Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China
  - School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China
|
22
|
Data-driven supervised learning of a viral protease specificity landscape from deep sequencing and molecular simulations. Proc Natl Acad Sci U S A 2018; 116:168-176. [PMID: 30587591 DOI: 10.1073/pnas.1805256116] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Indexed: 12/28/2022]
Abstract
Biophysical interactions between proteins and peptides are key determinants of molecular recognition specificity landscapes. However, an understanding of how molecular structure and residue-level energetics at protein-peptide interfaces shape these landscapes remains elusive. We combine information from yeast-based library screening, next-generation sequencing, and structure-based modeling in a supervised machine learning approach to report the comprehensive sequence-energetics-function mapping of the specificity landscape of the hepatitis C virus (HCV) NS3/4A protease, whose function (site-specific cleavages of the viral polyprotein) is a key determinant of viral fitness. We screened a library of substrates in which five residue positions were randomized and measured cleavability of ∼30,000 substrates (∼1% of the library) using yeast display and fluorescence-activated cell sorting followed by deep sequencing. Structure-based models of a subset of experimentally derived sequences were used in a supervised learning procedure to train a support vector machine to predict the cleavability of 3.2 million substrate variants by the HCV protease. The resulting landscape allows identification of previously unidentified HCV protease substrates, and graph-theoretic analyses reveal extensive clustering of cleavable and uncleavable motifs in sequence space. Specificity landscapes of known drug-resistant variants are similarly clustered. The described approach should enable the elucidation and redesign of specificity landscapes of a wide variety of proteases, including human-origin enzymes. Our results also suggest a possible role for residue-level energetics in shaping plateau-like functional landscapes predicted from viral quasispecies theory.
|
23
|
Louie RHY, Kaczorowski KJ, Barton JP, Chakraborty AK, McKay MR. Fitness landscape of the human immunodeficiency virus envelope protein that is targeted by antibodies. Proc Natl Acad Sci U S A 2018; 115:E564-E573. [PMID: 29311326 PMCID: PMC5789945 DOI: 10.1073/pnas.1717765115] [Citation(s) in RCA: 74] [Impact Index Per Article: 10.6] [Indexed: 01/16/2023]
Abstract
HIV is a highly mutable virus, and over 30 years after its discovery, a vaccine or cure is still not available. The isolation of broadly neutralizing antibodies (bnAbs) from HIV-infected patients has led to renewed hope for a prophylactic vaccine capable of combating the scourge of HIV. A major challenge is the design of immunogens and vaccination protocols that can elicit bnAbs that target regions of the virus's spike proteins where the likelihood of mutational escape is low due to the high fitness cost of mutations. Related challenges include the choice of combinations of bnAbs for therapy. An accurate representation of viral fitness as a function of its protein sequences (a fitness landscape), with explicit accounting of the effects of coupling between mutations, could help address these challenges. We describe a computational approach that has allowed us to infer a fitness landscape for gp160, the HIV polyprotein that comprises the viral spike that is targeted by antibodies. We validate the inferred landscape through comparisons with experimental fitness measurements, and various other metrics. We show that an effective antibody that prevents immune escape must selectively bind to high escape cost residues that are surrounded by those where mutations incur a low fitness cost, motivating future applications of our landscape for immunogen design.
Affiliation(s)
- Raymond H Y Louie
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Institute for Advanced Study, Hong Kong University of Science and Technology, Kowloon, Hong Kong
- Kevin J Kaczorowski
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
- John P Barton
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139
- Arup K Chakraborty
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
  - Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139
  - Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139
- Matthew R McKay
  - Department of Electronic and Computer Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Kowloon, Hong Kong
|
24
|
Flynn WF, Haldane A, Torbett BE, Levy RM. Inference of Epistatic Effects Leading to Entrenchment and Drug Resistance in HIV-1 Protease. Mol Biol Evol 2017; 34:1291-1306. [PMID: 28369521 PMCID: PMC5435099 DOI: 10.1093/molbev/msx095] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 12/27/2022]
Abstract
Understanding the complex mutation patterns that give rise to drug resistant viral strains provides a foundation for developing more effective treatment strategies for HIV/AIDS. Multiple sequence alignments of drug-experienced HIV-1 protease sequences contain networks of many pair correlations which can be used to build a (Potts) Hamiltonian model of these mutation patterns. Using this Hamiltonian model, we translate HIV-1 protease sequence covariation data into quantitative predictions for the probability of observing specific mutation patterns which are in agreement with the observed sequence statistics. We find that the statistical energies of the Potts model are correlated with the fitness of individual proteins containing therapy-associated mutations as estimated by in vitro measurements of protein stability and viral infectivity. We show that the penalty for acquiring primary resistance mutations depends on the epistatic interactions with the sequence background. Primary mutations which lead to drug resistance can become highly advantageous (or entrenched) by the complex mutation patterns which arise in response to drug therapy despite being destabilizing in the wildtype background. Anticipating epistatic effects is important for the design of future protease inhibitor therapies.
Collapse
Affiliation(s)
- William F. Flynn
- Department of Physics and Astronomy, Rutgers University, New Brunswick, NJ
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Department of Chemistry, Temple University, Philadelphia, PA
- Bruce E. Torbett
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA
- Ronald M. Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA
- Department of Chemistry, Temple University, Philadelphia, PA
25
Chakraborty AK, Barton JP. Rational design of vaccine targets and strategies for HIV: a crossroad of statistical physics, biology, and medicine. Rep Prog Phys 2017; 80:032601. [PMID: 28059778 DOI: 10.1088/1361-6633/aa574a] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 06/06/2023]
Abstract
Vaccination has saved more lives than any other medical procedure. Pathogens have now evolved that have not succumbed to vaccination using the empirical paradigms pioneered by Pasteur and Jenner. Vaccine design strategies that are based on a mechanistic understanding of the pertinent immunology and virology are required to confront and eliminate these scourges. In this perspective, we describe just a few examples of work aimed to achieve this goal by bringing together approaches from statistical physics with biology and clinical research.
Affiliation(s)
- Arup K Chakraborty
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Institute for Medical Engineering & Science, Massachusetts Institute of Technology, Cambridge, MA 02139, United States of America
- Ragon Institute of MIT, MGH, & Harvard, Cambridge, MA 02139, United States of America
26
Abstract
This is an exciting time for immunology because the future promises to be replete with exciting new discoveries that can be translated to improve health and treat disease in novel ways. Immunologists are attempting to answer increasingly complex questions concerning phenomena that range from the genetic, molecular, and cellular scales to that of organs, whole animals or humans, and populations of humans and pathogens. An important goal is to understand how the many different components involved interact with each other within and across these scales for immune responses to emerge, and how aberrant regulation of these processes causes disease. To aid this quest, large amounts of data can be collected using high-throughput instrumentation. The nonlinear, cooperative, and stochastic character of the interactions between components of the immune system as well as the overwhelming amounts of data can make it difficult to intuit patterns in the data or a mechanistic understanding of the phenomena being studied. Computational models are increasingly important in confronting and overcoming these challenges. I first describe an iterative paradigm of research that integrates laboratory experiments, clinical data, computational inference, and mechanistic computational models. I then illustrate this paradigm with a few examples from the recent literature that make vivid the power of bringing together diverse types of computational models with experimental and clinical studies to fruitfully interrogate the immune system.
Affiliation(s)
- Arup K Chakraborty
- Institute for Medical Engineering and Science, Departments of Chemical Engineering, Physics, Chemistry, and Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139
- Ragon Institute of MGH, MIT and Harvard, Cambridge, Massachusetts 02139
27
Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016; 43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]
Abstract
Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function maintained through evolution. We review recent work with Potts models to predict protein structure and sequence-dependent conformational free energy landscapes, to survey protein fitness landscapes and to explore the effects of epistasis on fitness. We also comment on the numerical methods used to infer these models for each application.
Affiliation(s)
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States