1
Christensen PM, Martin J, Uppuluri A, Joyce LR, Wei Y, Guan Z, Morcos F, Palmer KL. Lipid discovery enabled by sequence statistics and machine learning. eLife 2024; 13:RP94929. [PMID: 39656516 PMCID: PMC11630815 DOI: 10.7554/elife.94929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/12/2024] Open
Abstract
Bacterial membranes are complex and dynamic, arising from an array of evolutionary pressures. One enzyme that alters membrane compositions through covalent lipid modification is MprF. We recently identified that Streptococcus agalactiae MprF synthesizes lysyl-phosphatidylglycerol (Lys-PG) from anionic PG, and a novel cationic lipid, lysyl-glucosyl-diacylglycerol (Lys-Glc-DAG), from neutral glycolipid Glc-DAG. This unexpected result prompted us to investigate whether Lys-Glc-DAG occurs in other MprF-containing bacteria, and whether other novel MprF products exist. Here, we studied protein sequence features determining MprF substrate specificity. First, pairwise analyses identified several streptococcal MprFs synthesizing Lys-Glc-DAG. Second, a restricted Boltzmann machine-guided approach led us to discover an entirely new substrate for MprF in Enterococcus, diglucosyl-diacylglycerol (Glc2-DAG), and an expanded set of organisms that modify glycolipid substrates using MprF. Overall, we combined the wealth of available sequence data with machine learning to model evolutionary constraints on MprF sequences across the bacterial domain, thereby identifying a novel cationic lipid.
Affiliation(s)
- Priya M Christensen
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, United States
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, United States
- Aparna Uppuluri
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, United States
- Luke R Joyce
  - Department of Immunology and Microbiology, University of Colorado Anschutz Medical Campus, Aurora, United States
- Yahan Wei
  - School of Podiatric Medicine, University of Texas Rio Grande Valley, Harlingen, United States
- Ziqiang Guan
  - Department of Biochemistry, Duke University Medical Center, Durham, United States
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, United States
  - Department of Bioengineering, University of Texas at Dallas, Richardson, United States
  - Center for Systems Biology, University of Texas at Dallas, Richardson, United States
- Kelli L Palmer
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, United States
2
Cao W, Huang C, Zhou X, Zhou S, Deng Y. Engineering two-component systems for advanced biosensing: From architecture to applications in biotechnology. Biotechnol Adv 2024; 75:108404. [PMID: 39002783 DOI: 10.1016/j.biotechadv.2024.108404] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 06/05/2024] [Accepted: 07/07/2024] [Indexed: 07/15/2024]
Abstract
Two-component systems (TCSs) are prevalent signaling pathways in bacteria. These systems mediate phosphotransfer between histidine kinase and a response regulator, facilitating responses to diverse physical, chemical, and biological stimuli. Advancements in synthetic and structural biology have repurposed TCSs for applications in monitoring heavy metals, disease-associated biomarkers, and the production of bioproducts. However, the utility of many TCS biosensors is hindered by undesired performance due to the lack of effective engineering methods. Here, we briefly discuss the architectures and regulatory mechanisms of TCSs. We also summarize the recent advancements in TCS engineering by experimental or computational-based methods to fine-tune the biosensor functional parameters, such as response curve and specificity. Engineered TCSs have great potential in the medical, environmental, and biorefinery fields, demonstrating a crucial role in a wide area of biotechnology.
Affiliation(s)
- Wenyan Cao
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Chao Huang
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Xuan Zhou
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Shenghu Zhou
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
- Yu Deng
  - School of Biotechnology and Key Laboratory of Industrial Biotechnology of Ministry of Education, Jiangnan University, Wuxi 214122, China
3
Christensen PM, Martin J, Uppuluri A, Joyce LR, Wei Y, Guan Z, Morcos F, Palmer KL. Lipid discovery enabled by sequence statistics and machine learning. bioRxiv 2024:2023.10.12.562061. [PMID: 37873101 PMCID: PMC10592805 DOI: 10.1101/2023.10.12.562061] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023]
Abstract
Bacterial membranes are complex and dynamic, arising from an array of evolutionary pressures. One enzyme that alters membrane compositions through covalent lipid modification is MprF. We recently identified that Streptococcus agalactiae MprF synthesizes lysyl-phosphatidylglycerol (Lys-PG) from anionic PG, and a novel cationic lipid, lysyl-glucosyl-diacylglycerol (Lys-Glc-DAG), from neutral glycolipid Glc-DAG. This unexpected result prompted us to investigate whether Lys-Glc-DAG occurs in other MprF-containing bacteria, and whether other novel MprF products exist. Here, we studied protein sequence features determining MprF substrate specificity. First, pairwise analyses identified several streptococcal MprFs synthesizing Lys-Glc-DAG. Second, a restricted Boltzmann machine-guided approach led us to discover an entirely new substrate for MprF in Enterococcus, diglucosyl-diacylglycerol (Glc2-DAG), and an expanded set of organisms that modify glycolipid substrates using MprF. Overall, we combined the wealth of available sequence data with machine learning to model evolutionary constraints on MprF sequences across the bacterial domain, thereby identifying a novel cationic lipid.
4
Kinshuk S, Li L, Meckes B, Chan CTY. Sequence-Based Protein Design: A Review of Using Statistical Models to Characterize Coevolutionary Traits for Developing Hybrid Proteins as Genetic Sensors. Int J Mol Sci 2024; 25:8320. [PMID: 39125888 PMCID: PMC11312098 DOI: 10.3390/ijms25158320] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Revised: 07/23/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Statistical analyses of homologous protein sequences can identify amino acid residue positions that co-evolve to generate family members with different properties. Based on the hypothesis that the coevolution of residue positions is necessary for maintaining protein structure, coevolutionary traits revealed by statistical models provide insight into residue-residue interactions that are important for understanding protein mechanisms at the molecular level. With the rapid expansion of genome sequencing databases that facilitate statistical analyses, this sequence-based approach has been used to study a broad range of protein families. An emerging application of this approach is to design hybrid transcriptional regulators as modular genetic sensors for novel wiring between input signals and genetic elements to control outputs. Among many allosterically regulated regulator families, the members contain structurally conserved and functionally independent protein domains, including a DNA-binding module (DBM) for interacting with a specific genetic element and a ligand-binding module (LBM) for sensing an input signal. By hybridizing a DBM and an LBM from two different family members, a hybrid regulator can be created with a new combination of signal-detection and DNA-recognition properties not present in natural systems. In this review, we present recent advances in the development of hybrid regulators and their applications in cellular engineering, especially focusing on the use of statistical analyses for characterizing DBM-LBM interactions and hybrid regulator design. Based on these studies, we then discuss the current limitations and potential directions for enhancing the impact of this sequence-based design approach.
Affiliation(s)
- Sahaj Kinshuk
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Lin Li
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
- Brian Meckes
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
- Clement T. Y. Chan
  - Department of Biomedical Engineering, College of Engineering, University of North Texas, 3940 N Elm Street, Denton, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, 1155 Union Circle #305220, Denton, TX 76203, USA
5
Martin J, Lequerica Mateos M, Onuchic JN, Coluzza I, Morcos F. Machine learning in biological physics: From biomolecular prediction to design. Proc Natl Acad Sci U S A 2024; 121:e2311807121. [PMID: 38913893 PMCID: PMC11228481 DOI: 10.1073/pnas.2311807121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/26/2024] Open
Abstract
Machine learning has been proposed as an alternative to theoretical modeling when dealing with complex problems in biological physics. However, in this perspective, we argue that a more successful approach is a proper combination of these two methodologies. We discuss how ideas coming from physical modeling neuronal processing led to early formulations of computational neural networks, e.g., Hopfield networks. We then show how modern learning approaches like Potts models, Boltzmann machines, and the transformer architecture are related to each other, specifically, through a shared energy representation. We summarize recent efforts to establish these connections and provide examples on how each of these formulations integrating physical modeling and machine learning have been successful in tackling recent problems in biomolecular structure, dynamics, function, evolution, and design. Instances include protein structure prediction; improvement in computational complexity and accuracy of molecular dynamics simulations; better inference of the effects of mutations in proteins leading to improved evolutionary modeling and finally how machine learning is revolutionizing protein engineering and design. Going beyond naturally existing protein sequences, a connection to protein design is discussed where synthetic sequences are able to fold to naturally occurring motifs driven by a model rooted in physical principles. We show that this model is "learnable" and propose its future use in the generation of unique sequences that can fold into a target structure.
Affiliation(s)
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Marcos Lequerica Mateos
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
- José N. Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Department of Physics and Astronomy, Rice University, Houston, TX 77005
  - Department of Chemistry, Rice University, Houston, TX 77005
  - Department of BioSciences, Rice University, Houston, TX 77005
- Ivan Coluzza
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Universidad del País Vasco/Euskal Herriko Unibertsitatea Science Park, Leioa 48940, Spain
  - Basque Foundation for Science, Ikerbasque, Bilbao 48940, Spain
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
  - Department of Bioengineering, Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
6
Fram B, Su Y, Truebridge I, Riesselman AJ, Ingraham JB, Passera A, Napier E, Thadani NN, Lim S, Roberts K, Kaur G, Stiffler MA, Marks DS, Bahl CD, Khan AR, Sander C, Gauthier NP. Simultaneous enhancement of multiple functional properties using evolution-informed protein design. Nat Commun 2024; 15:5141. [PMID: 38902262 PMCID: PMC11190266 DOI: 10.1038/s41467-024-49119-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Accepted: 05/24/2024] [Indexed: 06/22/2024] Open
Abstract
A major challenge in protein design is to augment existing functional proteins with multiple property enhancements. Altering several properties likely necessitates numerous primary sequence changes, and novel methods are needed to accurately predict combinations of mutations that maintain or enhance function. Models of sequence co-variation (e.g., EVcouplings), which leverage extensive information about various protein properties and activities from homologous protein sequences, have proven effective for many applications including structure determination and mutation effect prediction. We apply EVcouplings to computationally design variants of the model protein TEM-1 β-lactamase. Nearly all the 14 experimentally characterized designs were functional, including one with 84 mutations from the nearest natural homolog. The designs also had large increases in thermostability, increased activity on multiple substrates, and nearly identical structure to the wild type enzyme. This study highlights the efficacy of evolutionary models in guiding large sequence alterations to generate functional diversity for protein design applications.
Affiliation(s)
- Benjamin Fram
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Yang Su
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Ian Truebridge
  - Institute for Protein Innovation, Boston, MA, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - AI Proteins, Boston, MA, USA
- Adam J Riesselman
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Program in Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- John B Ingraham
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Alessandro Passera
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Research Institute of Molecular Pathology (IMP), Vienna BioCenter (VBC), Campus-Vienna-Biocenter 1, 1030, Vienna, Austria
- Eve Napier
  - School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
- Nicole N Thadani
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Apriori Bio, Cambridge, MA, USA
- Samuel Lim
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Kristen Roberts
  - Selux Diagnostics Inc., 56 Roland Street, Charlestown, MA, USA
- Gurleen Kaur
  - Selux Diagnostics Inc., 56 Roland Street, Charlestown, MA, USA
- Michael A Stiffler
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Dyno Therapeutics, 343 Arsenal Street, Watertown, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Christopher D Bahl
  - Institute for Protein Innovation, Boston, MA, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, MA, USA
  - AI Proteins, Boston, MA, USA
- Amir R Khan
  - School of Biochemistry and Immunology, Trinity College Dublin, Dublin 2, Ireland
  - Division of Newborn Medicine, Boston Children's Hospital, Boston, MA, USA
- Chris Sander
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Nicholas P Gauthier
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
7
Chan CTY, Kennedy V, Kinshuk S. A domain swapping strategy to create modular transcriptional regulators for novel topology in genetic network. Biotechnol Adv 2024; 72:108345. [PMID: 38513775 PMCID: PMC11135624 DOI: 10.1016/j.biotechadv.2024.108345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2023] [Revised: 02/23/2024] [Accepted: 03/18/2024] [Indexed: 03/23/2024]
Abstract
Transcriptional regulators generate connections between biological signals and genetic outputs. They are used robustly for sensing input signals in building genetic circuits. However, each regulator can only generate a fixed connection, which generates constraints in linking multiple signals for more complex processes. Recent studies discovered that a domain swapping strategy can be applied to various regulator families to create modular regulators for new signal-output connections, significantly broadening possibilities in circuit design. Here we review the development of this emerging strategy, the use of resulting modular regulators for creating novel genetic response behaviors, and current limitations and solutions for further advancing the design of modular regulators.
Affiliation(s)
- Clement T Y Chan
  - Department of Biomedical Engineering, University of North Texas, TX 76207, USA
  - BioDiscovery Institute, University of North Texas, TX 76207, USA
- Vincenzo Kennedy
  - Department of Biomedical Engineering, University of North Texas, TX 76207, USA
- Sahaj Kinshuk
  - Department of Biomedical Engineering, University of North Texas, TX 76207, USA
8
Nartey C, Koo HJ, Laurendon C, Shaik HZ, O’Maille P, Noel JP, Morcos F. Coevolutionary Information Captures Catalytic Functions and Reveals Divergent Roles of Terpene Synthase Interdomain Connections. Biochemistry 2024; 63:355-366. [PMID: 38206111 PMCID: PMC10851433 DOI: 10.1021/acs.biochem.3c00578] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2023] [Revised: 12/22/2023] [Accepted: 12/27/2023] [Indexed: 01/12/2024]
Abstract
Inferring the historical and biophysical causes of diversity within protein families is a complex puzzle. A key to unraveling this problem is characterizing the rugged topography of sequence-function adaptive landscapes. Using biochemical data from a 2^9 = 512 combinatorial library of tobacco 5-epi-aristolochene synthase (TEAS) mutants engineered to make the native major product of Egyptian henbane premnaspirodiene synthase (HPS) and a complementary 512 mutant HPS library, we address the question of how product specificity is controlled. These data sets reveal that HPS is far more robust and resistant to mutations than TEAS, where most mutants are promiscuous. We also combine experimental data with a sequence Potts Hamiltonian model and direct coupling analysis to quantify mutant fitness. Our results demonstrate that the Hamiltonian captures variation in product outputs across both libraries, clusters native family members based on their substrate specificities, and exposes the divergent catalytic roles of couplings between the catalytic and noncatalytic domains of TEAS versus HPS. Specifically, we found that the role of the interdomain connectivities in specifying product output is more important in TEAS than connectivities within the catalytic domain. Despite being 75% identical, this property is not shared by HPS, where connectivities within the catalytic domain are more important for specificity. By solving the X-ray crystal structure of HPS, we assessed structural bases for their interdomain network differences. Last, we calculate the product profile Shannon entropies of the two libraries, which showcases that site-site connectivities also play divergent roles in catalytic accuracy.
Affiliation(s)
- Charisse M. Nartey
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Hyun Jo Koo
  - Howard Hughes Medical Institute, The Salk Institute for Biological Studies, Jack H. Skirball Center for Chemical Biology and Proteomics, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Caroline Laurendon
  - John Innes Centre, Department of Metabolic Biology, Norwich Research Park, Norwich NR4 7UH, U.K.
- Hana Z. Shaik
  - Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas 75080, United States
- Paul O’Maille
  - John Innes Centre, Institute of Food Research, Food & Health Programme, Norwich Research Park, Norwich NR4 7UA, U.K.
- Joseph P. Noel
  - Howard Hughes Medical Institute, The Salk Institute for Biological Studies, Jack H. Skirball Center for Chemical Biology and Proteomics, 10010 North Torrey Pines Road, La Jolla, California 92037, United States
- Faruck Morcos
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas 75080, United States
  - Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas 75080, United States
  - Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas 75080, United States
9
Alvarez S, Nartey CM, Mercado N, de la Paz JA, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. Proc Natl Acad Sci U S A 2024; 121:e2308895121. [PMID: 38285950 PMCID: PMC10861889 DOI: 10.1073/pnas.2308895121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 12/19/2023] [Indexed: 01/31/2024] Open
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called sequence evolution with epistatic contributions (SEEC). Utilizing the Hamiltonian of the joint probability of sequences in the family as fitness metric, we sampled and experimentally tested for in vivo β-lactamase activity in Escherichia coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their wild-type predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes, and facilitate vaccine development.
Affiliation(s)
- Sophia Alvarez
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Charisse M. Nartey
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Nicholas Mercado
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Tea Huseinbegovic
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
10
Tee WV, Berezovsky IN. Allosteric drugs: New principles and design approaches. Curr Opin Struct Biol 2024; 84:102758. [PMID: 38171188 DOI: 10.1016/j.sbi.2023.102758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024]
Abstract
Focusing on an important biomedical implication of allostery - design of allosteric drugs, we describe characteristics of allosteric sites, effectors, and their modes of actions distinguishing them from the orthosteric counterparts and calling for new principles and protocols in the quests for allosteric drugs. We show the importance of considering both binding affinity and allosteric signaling in establishing the structure-activity relationships (SARs) toward design of allosteric effectors, arguing that pairs of allosteric sites and their effector ligands - the site-effector pairs - should be generated and adjusted simultaneously in the framework of what we call directed design protocol. Key ideas and approaches for designing allosteric effectors including reverse perturbation, targeted and agnostic analysis are also discussed here. Several promising computational approaches are highlighted, along with the need for and potential advantages of utilizing generative models to facilitate discovery/design of new allosteric drugs.
Affiliation(s)
- Wei-Ven Tee
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671
- Igor N Berezovsky
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, Singapore 138671
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
11
Lu J, Rahman MI, Kazan IC, Halloran NR, Bobkov AA, Ozkan SB, Ghirlanda G. Engineering gain-of-function mutants of a WW domain by dynamics and structural analysis. Protein Sci 2023; 32:e4759. [PMID: 37574787 PMCID: PMC10464296 DOI: 10.1002/pro.4759] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 07/17/2023] [Accepted: 08/10/2023] [Indexed: 08/15/2023]
Abstract
Proteins gain optimal fitness such as foldability and function through evolutionary selection. However, classical studies have found that evolutionarily designed protein sequences alone cannot guarantee foldability, or at least not without considering local contacts associated with the initial folding steps. We previously showed that foldability and function can be restored by removing frustration in the folding energy landscape of a model WW domain protein, CC16, which was designed based on Statistical Coupling Analysis (SCA). Substitutions ensuring the formation of five local contacts identified as "on-path" were selected using the closest homolog native folded sequence, N21. Surprisingly, the resulting sequence, CC16-N21, bound to Group I peptides, while N21 did not. Here, we identified single-point mutations that enable N21 to bind a Group I peptide ligand through structure and dynamic-based computational design. Comparison of the docked position of the CC16-N21/ligand complex with the N21 structure showed that residues at positions 9 and 19 are important for peptide binding, whereas the dynamic profiles identified position 10 as allosterically coupled to the binding site and exhibiting different dynamics between N21 and CC16-N21. We found that swapping these positions in N21 with matched residues from CC16-N21 recovers nature-like binding affinity to N21. This study validates the use of dynamic profiles as guiding principles for affecting the binding affinity of small proteins.
Affiliation(s)
- Jin Lu
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, USA
- I. Can Kazan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, USA
- Andrey A. Bobkov
  - Conrad Prebys Center for Chemical Genomics, Sanford Burnham Prebys Medical Discovery Institute, California, USA
- S. Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona, USA
12
Alvarez S, Nartey CM, Mercado N, de la Paz A, Huseinbegovic T, Morcos F. In vivo functional phenotypes from a computational epistatic model of evolution. bioRxiv 2023:2023.05.24.542176. [PMID: 37292895 PMCID: PMC10245989 DOI: 10.1101/2023.05.24.542176] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/10/2023]
Abstract
Computational models of evolution are valuable for understanding the dynamics of sequence variation, to infer phylogenetic relationships or potential evolutionary pathways and for biomedical and industrial applications. Despite these benefits, few have validated their propensities to generate outputs with in vivo functionality, which would enhance their value as accurate and interpretable evolutionary algorithms. We demonstrate the power of epistasis inferred from natural protein families to evolve sequence variants in an algorithm we developed called Sequence Evolution with Epistatic Contributions. Utilizing the Hamiltonian of the joint probability of sequences in the family as fitness metric, we sampled and experimentally tested for in vivo β-lactamase activity in E. coli TEM-1 variants. These evolved proteins can have dozens of mutations dispersed across the structure while preserving sites essential for both catalysis and interactions. Remarkably, these variants retain family-like functionality while being more active than their WT predecessor. We found that depending on the inference method used to generate the epistatic constraints, different parameters simulate diverse selection strengths. Under weaker selection, local Hamiltonian fluctuations reliably predict relative changes to variant fitness, recapitulating neutral evolution. SEEC has the potential to explore the dynamics of neofunctionalization, characterize viral fitness landscapes and facilitate vaccine development.
Affiliation(s)
- Sophia Alvarez
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Charisse M. Nartey
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Nicholas Mercado
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Alberto de la Paz
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
- Tea Huseinbegovic
  - School of Natural Sciences and Mathematics, University of Texas at Dallas, Richardson, TX 75080, USA
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080, USA
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080, USA
|
13
|
Ziegler C, Martin J, Sinner C, Morcos F. Latent generative landscapes as maps of functional diversity in protein sequence space. Nat Commun 2023; 14:2222. [PMID: 37076519 PMCID: PMC10113739 DOI: 10.1038/s41467-023-37958-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/19/2022] [Accepted: 04/05/2023] [Indexed: 04/21/2023] Open
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities; when applied to protein data, they classify sequences by phylogeny and generate de novo sequences which preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here, we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings, functional and fitness properties of several systems including Globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
Affiliation(s)
- Cheyenne Ziegler
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Claude Sinner
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA
|
14
|
Glasgow A, Hobbs HT, Perry ZR, Wells ML, Marqusee S, Kortemme T. Ligand-specific changes in conformational flexibility mediate long-range allostery in the lac repressor. Nat Commun 2023; 14:1179. [PMID: 36859492 PMCID: PMC9977783 DOI: 10.1038/s41467-023-36798-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/20/2022] [Accepted: 02/17/2023] [Indexed: 03/03/2023] Open
Abstract
Biological regulation ubiquitously depends on protein allostery, but the regulatory mechanisms are incompletely understood, especially in proteins that undergo ligand-induced allostery with few structural changes. Here we used hydrogen-deuterium exchange with mass spectrometry (HDX/MS) to map allosteric effects in a paradigm ligand-responsive transcription factor, the lac repressor (LacI), in different functional states (apo, or bound to inducer, anti-inducer, and/or DNA). Although X-ray crystal structures of the LacI core domain in these states are nearly indistinguishable, HDX/MS experiments reveal widespread differences in flexibility. We integrate these results with modeling of protein-ligand-solvent interactions to propose a revised model for allostery in LacI, where ligand binding allosterically shifts the conformational ensemble as a result of distinct changes in the rigidity of secondary structures in the different states. Our model provides a mechanistic basis for the altered function of distal mutations. More generally, our approach provides a platform for characterizing and engineering protein allostery.
Affiliation(s)
- Anum Glasgow
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94158, USA
  - Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, 10032, USA
- Helen T Hobbs
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
- Zion R Perry
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, 06511, USA
- Malcolm L Wells
  - Department of Physics, Columbia University, New York, NY, 10032, USA
- Susan Marqusee
  - Department of Chemistry, University of California, Berkeley, Berkeley, CA, 94720, USA
  - Department of Molecular & Cell Biology, University of California, Berkeley, Berkeley, CA, 94720, USA
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, 94158, USA
|
15
|
Ravishankar K, Jiang X, Leddin EM, Morcos F, Cisneros GA. Computational compensatory mutation discovery approach: Predicting a PARP1 variant rescue mutation. Biophys J 2022; 121:3663-3673. [PMID: 35642254 PMCID: PMC9617126 DOI: 10.1016/j.bpj.2022.05.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/20/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022] Open
Abstract
The prediction of protein mutations that affect function may be exploited for multiple uses. In the context of disease variants, the prediction of compensatory mutations that reestablish functional phenotypes could aid in the development of genetic therapies. In this work, we present an integrated approach that combines coevolutionary analysis and molecular dynamics (MD) simulations to discover functional compensatory mutations. This approach is employed to investigate possible rescue mutations of a poly(ADP-ribose) polymerase 1 (PARP1) variant, PARP1 V762A, associated with lung cancer and follicular lymphoma. MD simulations show PARP1 V762A exhibits noticeable changes in structural and dynamical behavior compared with wild-type (WT) PARP1. Our integrated approach predicts A755E as a possible compensatory mutation based on coevolutionary information, and molecular simulations indicate that the PARP1 A755E/V762A double mutant exhibits similar structural and dynamical behavior to WT PARP1. Our methodology can be broadly applied to a large number of systems where single-nucleotide polymorphisms have been identified as connected to disease and can shed light on the biophysical effects of such changes as well as provide a way to discover potential mutants that could restore WT-like functionality. This can, in turn, be further utilized in the design of molecular therapeutics that aim to mimic such compensatory effect.
Affiliation(s)
- Xianli Jiang
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Emmett M Leddin
  - Department of Chemistry, University of North Texas, Denton, Texas
- Faruck Morcos
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas
  - Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas
  - Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas
- G Andrés Cisneros
  - Department of Chemistry, University of North Texas, Denton, Texas
  - Department of Physics, The University of Texas at Dallas, Richardson, Texas
  - Department of Chemistry, The University of Texas at Dallas, Richardson, Texas
|
16
|
Abstract
Repeat proteins are made with tandem copies of similar amino acid stretches that fold into elongated architectures. These proteins constitute excellent model systems to investigate how evolution relates to structure, folding, and function. Here, we propose a scheme to map evolutionary information at the sequence level to a coarse-grained model for repeat-protein folding and use it to investigate the folding of thousands of repeat proteins. We model the energetics by a combination of an inverse Potts-model scheme with an explicit mechanistic model of duplications and deletions of repeats to calculate the evolutionary parameters of the system at the single-residue level. These parameters are used to inform an Ising-like model that allows for the generation of folding curves, apparent domain emergence, and occupation of intermediate states that are highly compatible with experimental data in specific case studies. We analyzed the folding of thousands of natural Ankyrin repeat proteins and found that a multiplicity of folding mechanisms are possible. Fully cooperative all-or-none transitions are obtained for arrays with enough sequence-similar elements and strong interactions between them, while noncooperative element-by-element intermittent folding arose if the elements are dissimilar and the interactions between them are energetically weak. Additionally, we characterized nucleation-propagation and multidomain folding mechanisms. We show that the global stability and cooperativity of the repeating arrays can be predicted from simple sequence scores.
|
17
|
Vigué L, Croce G, Petitjean M, Ruppé E, Tenaillon O, Weigt M. Deciphering polymorphism in 61,157 Escherichia coli genomes via epistatic sequence landscapes. Nat Commun 2022; 13:4030. [PMID: 35821377 PMCID: PMC9276797 DOI: 10.1038/s41467-022-31643-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/19/2021] [Accepted: 06/27/2022] [Indexed: 12/05/2022] Open
Abstract
Characterizing the effect of mutations is key to understand the evolution of protein sequences and to separate neutral amino-acid changes from deleterious ones. Epistatic interactions between residues can lead to a context dependence of mutation effects. Context dependence constrains the amino-acid changes that can contribute to polymorphism in the short term, and the ones that can accumulate between species in the long term. We use computational approaches to accurately predict the polymorphisms segregating in a panel of 61,157 Escherichia coli genomes from the analysis of distant homologues. By comparing a context-aware Direct-Coupling Analysis modelling to a non-epistatic approach, we show that the genetic context strongly constrains the tolerable amino acids in 30% to 50% of amino-acid sites. The study of more distant species suggests the gradual build-up of genetic context over long evolutionary timescales by the accumulation of small epistatic contributions.
Affiliation(s)
- Lucile Vigué
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Giancarlo Croce
  - Department of Oncology, Ludwig Institute for Cancer Research Lausanne, University of Lausanne, Lausanne, Switzerland
  - Swiss Institute of Bioinformatics-SIB, Lausanne, Switzerland
- Marie Petitjean
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Etienne Ruppé
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
  - Laboratoire de Bactériologie, Hôpital Bichat, APHP, Paris, France
- Olivier Tenaillon
  - Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018, Paris, France
- Martin Weigt
  - Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology-LCQB, Paris, France
|
18
|
Ding D, Green AG, Wang B, Lite TLV, Weinstein EN, Marks DS, Laub MT. Co-evolution of interacting proteins through non-contacting and non-specific mutations. Nat Ecol Evol 2022; 6:590-603. [PMID: 35361892 PMCID: PMC9090974 DOI: 10.1038/s41559-022-01688-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/25/2021] [Accepted: 01/31/2022] [Indexed: 01/08/2023]
Abstract
Proteins often accumulate neutral mutations that do not affect current functions but can profoundly influence future mutational possibilities and functions. Understanding such hidden potential has major implications for protein design and evolutionary forecasting but has been limited by a lack of systematic efforts to identify potentiating mutations. Here, through the comprehensive analysis of a bacterial toxin-antitoxin system, we identified all possible single substitutions in the toxin that enable it to tolerate otherwise interface-disrupting mutations in its antitoxin. Strikingly, the majority of enabling mutations in the toxin do not contact and promote tolerance non-specifically to many different antitoxin mutations, despite covariation in homologues occurring primarily between specific pairs of contacting residues across the interface. In addition, the enabling mutations we identified expand future mutational paths that both maintain old toxin-antitoxin interactions and form new ones. These non-specific mutations are missed by widely used covariation and machine learning methods. Identifying such enabling mutations will be critical for ensuring continued binding of therapeutically relevant proteins, such as antibodies, aimed at evolving targets.
Affiliation(s)
- David Ding
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Anna G Green
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
- Boyuan Wang
  - Department of Pharmacology, UT Southwestern Medical Center, Dallas, TX, USA
- Thuy-Lan Vo Lite
  - Harvard-MIT Division of Health Sciences and Technology, Harvard Medical School, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Michael T Laub
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Howard Hughes Medical Institute, Massachusetts Institute of Technology, Cambridge, MA, USA
|
19
|
Morcos F. Characterizing the landscape of evolvability. Nat Ecol Evol 2022; 6:500-501. [PMID: 35361891 PMCID: PMC9150722 DOI: 10.1038/s41559-022-01731-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
A framework to experimentally traverse the large space of functionally neutral variants in a toxin–antitoxin protein complex reveals insights on evolvability and entrenchment of molecular interactions.
Affiliation(s)
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA
|
20
|
Chi H, Zhou Q, Tutol JN, Phelps SM, Lee J, Kapadia P, Morcos F, Dodani SC. Coupling a Live Cell Directed Evolution Assay with Coevolutionary Landscapes to Engineer an Improved Fluorescent Rhodopsin Chloride Sensor. ACS Synth Biol 2022; 11:1627-1638. [PMID: 35389621 PMCID: PMC9184236 DOI: 10.1021/acssynbio.2c00033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Our understanding of chloride in biology has been accelerated through the application of fluorescent protein-based sensors in living cells. These sensors can be generated and diversified to have a range of properties using laboratory-guided evolution. Recently, we established that the fluorescent proton-pumping rhodopsin wtGR from Gloeobacter violaceus can be converted into a fluorescent sensor for chloride. To unlock this non-natural function, a single point mutation at the Schiff counterion position (D121V) was introduced into wtGR fused to cyan fluorescent protein (CFP) resulting in GR1-CFP. Here, we have integrated coevolutionary analysis with directed evolution to understand how the rhodopsin sequence space can be explored and engineered to improve this starting point. We first show how evolutionary couplings are predictive of functional sites in the rhodopsin family and how a fitness metric based on a sequence can be used to quantify the known proton-pumping activities of GR-CFP variants. Then, we couple this ability to predict potential functional outcomes with a screening and selection assay in live Escherichia coli to reduce the mutational search space of five residues along the proton-pumping pathway in GR1-CFP. This iterative selection process results in GR2-CFP with four additional mutations: E132K, A84K, T125C, and V245I. Finally, bulk and single fluorescence measurements in live E. coli reveal that GR2-CFP is a reversible, ratiometric fluorescent sensor for extracellular chloride with an improved dynamic range. We anticipate that our framework will be applicable to other systems, providing a more efficient methodology to engineer fluorescent protein-based sensors with desired properties.
|