1
Colom MS, Vučinić J, Adolf‐Bryfogle J, Bowman JW, Verel S, Moczygemba I, Schiex T, Simoncini D, Bahl CD. Complete combinatorial mutational enumeration of a protein functional site enables sequence-landscape mapping and identifies highly-mutated variants that retain activity. Protein Sci 2024; 33:e5109. [PMID: 38989563 PMCID: PMC11237556 DOI: 10.1002/pro.5109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 05/20/2024] [Accepted: 06/25/2024] [Indexed: 07/12/2024]
Abstract
Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride toward achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.
Affiliation(s)
- Mireia Solà Colom
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
- Jelena Vučinić
  - Université Fédérale de Toulouse, IRIT UMR 5505, ANITI, Université Toulouse Capitole, Toulouse, France
- Jared Adolf‐Bryfogle
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- James W. Bowman
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
- Isabelle Moczygemba
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
- Thomas Schiex
  - MIAT, Université Fédérale de Toulouse, ANITI, INRAE UR 875, Toulouse, France
- David Simoncini
  - Université Fédérale de Toulouse, IRIT UMR 5505, ANITI, Université Toulouse Capitole, Toulouse, France
- Christopher D. Bahl
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
2
Guerin N, Childs H, Zhou P, Donald BR. DexDesign: A new OSPREY-based algorithm for designing de novo D-peptide inhibitors. bioRxiv 2024:2024.02.12.579944. [PMID: 38405797 PMCID: PMC10888900 DOI: 10.1101/2024.02.12.579944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/27/2024]
Abstract
With over 270 unique occurrences in the human genome, peptide-recognizing PDZ domains play a central role in modulating polarization, signaling, and trafficking pathways. Mutations in PDZ domains lead to diseases such as cancer and cystic fibrosis, making PDZ domains attractive targets for therapeutic intervention. D-peptide inhibitors offer unique advantages as therapeutics, including increased metabolic stability and low immunogenicity. Here, we introduce DexDesign, a novel OSPREY-based algorithm for computationally designing de novo D-peptide inhibitors. DexDesign leverages three novel techniques that are broadly applicable to computational protein design: the Minimum Flexible Set, K*-based Mutational Scan, and Inverse Alanine Scan, which enable exponential reductions in the size of the peptide sequence search space. We apply these techniques and DexDesign to generate novel D-peptide inhibitors of two biomedically important PDZ domain targets: CAL and MAST2. We introduce a new framework for analyzing de novo peptides (evaluation along a replication/restitution axis) and apply it to the DexDesign-generated D-peptides. Notably, the peptides we generated are predicted to bind their targets tighter than their targets' endogenous ligands, validating the peptides' potential as lead therapeutic candidates. We provide an implementation of DexDesign in the free and open source computational protein design software OSPREY.
3
Rennella E, Sahtoe DD, Baker D, Kay LE. Exploiting conformational dynamics to modulate the function of designed proteins. Proc Natl Acad Sci U S A 2023; 120:e2303149120. [PMID: 37094170 PMCID: PMC10161014 DOI: 10.1073/pnas.2303149120] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 02/23/2023] [Accepted: 03/22/2023] [Indexed: 04/26/2023] Open
Abstract
With the recent success in calculating protein structures from amino acid sequences using artificial intelligence-based algorithms, an important next step is to decipher how dynamics is encoded by the primary protein sequence so as to better predict function. Such dynamics information is critical for protein design, where strategies could then focus not only on sequences that fold into particular structures that perform a given task, but would also include low-lying excited protein states that could influence the function of the designed protein. Herein, we illustrate the importance of dynamics in modulating the function of C34, a designed α/β protein that captures β-strands of target ligands and is a member of a family of proteins designed to sequester β-strands and β hairpins of aggregation-prone molecules that lead to a variety of pathologies. Using a strategy to "see" regions of apo C34 that are invisible to NMR spectroscopy as a result of pervasive conformational exchange, as well as a mutagenesis approach whereby C34 molecules are stabilized into a single conformer, we determine the structures of the predominant conformations that are sampled by C34 and show that these attenuate the affinity for cognate peptide. Subsequently, the observed motion is exploited to develop an allosterically regulated peptide binder whose binding affinity can be controlled through the addition of a second molecule. Our study emphasizes the unique role that NMR can play in directing the design process and in the construction of new molecules with more complex functionality.
Affiliation(s)
- Enrico Rennella
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
- Danny D. Sahtoe
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
  - HHMI, University of Washington, Seattle, WA 98195
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98195
  - Institute for Protein Design, University of Washington, Seattle, WA 98195
  - HHMI, University of Washington, Seattle, WA 98195
- Lewis E. Kay
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada
  - Department of Chemistry, University of Toronto, Toronto, ON M5S 3H6, Canada
  - Program in Molecular Medicine, The Hospital for Sick Children Research Institute, Toronto, ON M5G 0A4, Canada
4
Chen D, Chen Z, He Z, Gao J, Su Z. Learning heuristics for weighted CSPs through deep reinforcement learning. Appl Intell 2022. [DOI: 10.1007/s10489-022-03992-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
5
Bouchiba Y, Esque J, Cottret L, Maréchaux M, Gaston M, Gasciolli V, Keller J, Nouwen N, Gully D, Arrighi J, Gough C, Lefebvre B, Barbe S, Bono J. An integrated approach reveals how lipo‐chitooligosaccharides interact with the lysin motif receptor‐like kinase MtLYR3. Protein Sci 2022; 31:e4327. [PMID: 35634776 PMCID: PMC9115844 DOI: 10.1002/pro.4327] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/10/2022] [Revised: 04/19/2022] [Accepted: 04/20/2022] [Indexed: 11/22/2022]
Abstract
N‐acetylglucosamine containing compounds acting as pathogenic or symbiotic signals are perceived by plant‐specific Lysin Motif Receptor‐Like Kinases (LysM‐RLKs). The molecular mechanisms of this perception are not fully understood, notably those of lipo‐chitooligosaccharides (LCOs) produced during root endosymbioses with nitrogen‐fixing bacteria or arbuscular mycorrhizal fungi. In Medicago truncatula, we previously identified the LysM‐RLK LYR3 (MtLYR3) as a specific LCO‐binding protein. We also showed that the absence of LCO binding to LYR3 of the non‐mycorrhizal Lupinus angustifolius, (LanLYR3), was related to LysM3, which differs from that of MtLYR3 by several amino acids and, particularly, by a critical tyrosine residue absent in LanLYR3. Here, we aimed to define the LCO binding site of MtLYR3 by using molecular modelling and simulation approaches, combined with site‐directed mutagenesis and LCO binding experiments. 3D models of MtLYR3 and LanLYR3 ectodomains were built, and homology modelling and molecular dynamics (MD) simulations were performed. Molecular docking and MD simulation on the LysM3 identified potential key residues for LCO binding. We highlighted by steered MD simulations that in addition to the critical tyrosine, two other residues were important for LCO binding in MtLYR3. Substitution of these residues in LanLYR3‐LysM3 by those of MtLYR3‐LysM3 allowed the recovery of high‐affinity LCO binding in experimental radioligand‐binding assays. An analysis of selective constraints revealed that the critical tyrosine has experienced positive selection pressure and is absent in some LYR3 proteins. These findings now pave the way to uncover the functional significance of this specific evolutionary pattern.
Affiliation(s)
- Younes Bouchiba
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse, France
- Jérémy Esque
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse, France
- Ludovic Cottret
  - LIPME, Université de Toulouse, INRAE, CNRS, Castanet‐Tolosan, France
- Maude Maréchaux
  - LIPME, Université de Toulouse, INRAE, CNRS, Castanet‐Tolosan, France
- Mégane Gaston
  - LIPME, Université de Toulouse, INRAE, CNRS, Castanet‐Tolosan, France
- Jean Keller
  - Laboratoire de Recherche en Sciences Végétales, Université de Toulouse, CNRS, UPS, Castanet‐Tolosan, France
- Nico Nouwen
  - IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRAE/UM/CIRAD, Montpellier, France
- Djamel Gully
  - IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRAE/UM/CIRAD, Montpellier, France
- Jean‐François Arrighi
  - IRD, Laboratoire des Symbioses Tropicales et Méditerranéennes (LSTM), UMR IRD/SupAgro/INRAE/UM/CIRAD, Montpellier, France
- Clare Gough
  - LIPME, Université de Toulouse, INRAE, CNRS, Castanet‐Tolosan, France
- Benoit Lefebvre
  - LIPME, Université de Toulouse, INRAE, CNRS, Castanet‐Tolosan, France
- Sophie Barbe
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, Toulouse, France
6
Marchand B, Ponty Y, Bulteau L. Tree diet: reducing the treewidth to unlock FPT algorithms in RNA bioinformatics. Algorithms Mol Biol 2022; 17:8. [PMID: 35366923 PMCID: PMC8976393 DOI: 10.1186/s13015-022-00213-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/15/2021] [Accepted: 03/01/2022] [Indexed: 11/25/2022] Open
Abstract
Hard graph problems are ubiquitous in Bioinformatics, inspiring the design of specialized Fixed-Parameter Tractable algorithms, many of which rely on a combination of tree-decomposition and dynamic programming. The time/space complexities of such approaches hinge critically on low values for the treewidth $tw$ of the input graph. In order to extend their scope of applicability, we introduce the Tree-Diet problem, i.e. the removal of a minimal set of edges such that a given tree-decomposition can be slimmed down to a prescribed treewidth $tw'$. Our rationale is that the time gained thanks to a smaller treewidth in a parameterized algorithm compensates the extra post-processing needed to take deleted edges into account. Our core result is an FPT dynamic programming algorithm for Tree-Diet, using $2^{O(tw)}n$ time and space. We complement this result with parameterized complexity lower-bounds for stronger variants (e.g., NP-hardness when $tw'$ or $tw-tw'$ is constant). We propose a prototype implementation for our approach which we apply on difficult instances of selected RNA-based problems: RNA design, sequence-structure alignment, and search of pseudoknotted RNAs in genomes, revealing very encouraging results. This work paves the way for a wider adoption of tree-decomposition-based algorithms in Bioinformatics.
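To make the trade-off in the abstract above concrete, here is a minimal Python sketch that only evaluates one candidate slimming of a tree decomposition: it reports the width (largest bag size minus one) and the edges that no bag realises any more, which are the edges that would have to be deleted. The function names `width` and `lost_edges` and the toy graph are illustrative assumptions. This is not the authors' FPT dynamic programming algorithm and does not search for an optimal slimming.

```python
def width(bags):
    """Width of a tree decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags) - 1


def lost_edges(edges, bags):
    """Edges with no bag containing both endpoints (they would have to be deleted)."""
    return [(u, v) for u, v in edges
            if not any(u in bag and v in bag for bag in bags)]


# Hypothetical toy graph: two triangles sharing the edge b-c.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("b", "d")]

# Two adjacent bags; the original decomposition has width 2 and covers every edge.
original = [{"a", "b", "c"}, {"b", "c", "d"}]
# Candidate slimming: one vertex dropped from each bag, giving width 1.
slimmed = [{"a", "b"}, {"c", "d"}]

print(width(original), lost_edges(edges, original))
print(width(slimmed), lost_edges(edges, slimmed))
```

In this toy case each bag of size two can realise at most one edge, so any slimming of these two bags to width 1 loses at least three of the five edges; the candidate shown reaches that bound.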
7
Bouchiba Y, Ruffini M, Schiex T, Barbe S. Computational Design of Miniprotein Binders. Methods Mol Biol 2022; 2405:361-382. [PMID: 35298822 DOI: 10.1007/978-1-0716-1855-4_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Miniprotein binders hold a great interest as a class of drugs that bridges the gap between monoclonal antibodies and small molecule drugs. Like monoclonal antibodies, they can be designed to bind to therapeutic targets with high affinity, but they are more stable and easier to produce and to administer. In this chapter, we present a structure-based computational generic approach for miniprotein inhibitor design. Specifically, we describe step-by-step the implementation of the approach for the design of miniprotein binders against the SARS-CoV-2 coronavirus, using available structural data on the SARS-CoV-2 spike receptor binding domain (RBD) in interaction with its native target, the human receptor ACE2. Structural data being increasingly accessible around many protein-protein interaction systems, this method might be applied to the design of miniprotein binders against numerous therapeutic targets. The computational pipeline exploits provable and deterministic artificial intelligence-based protein design methods, with some recent additions in terms of binding energy estimation, multistate design and diverse library generation.
Affiliation(s)
- Younes Bouchiba
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
- Manon Ruffini
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Thomas Schiex
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Sophie Barbe
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
8
Yagi S, Padhi AK, Vucinic J, Barbe S, Schiex T, Nakagawa R, Simoncini D, Zhang KYJ, Tagami S. Seven Amino Acid Types Suffice to Create the Core Fold of RNA Polymerase. J Am Chem Soc 2021; 143:15998-16006. [PMID: 34559526 DOI: 10.1021/jacs.1c05367] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
Abstract
The extant complex proteins must have evolved from ancient short and simple ancestors. The double-ψ β-barrel (DPBB) is one of the oldest protein folds and conserved in various fundamental enzymes, such as the core domain of RNA polymerase. Here, by reverse engineering a modern DPBB domain, we reconstructed its plausible evolutionary pathway started by "interlacing homodimerization" of a half-size peptide, followed by gene duplication and fusion. Furthermore, by simplifying the amino acid repertoire of the peptide, we successfully created the DPBB fold with only seven amino acid types (Ala, Asp, Glu, Gly, Lys, Arg, and Val), which can be coded by only GNN and ARR (R = A or G) codons in the modern translation system. Thus, the DPBB fold could have been materialized by the early translation system and genetic code.
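The codon claim in the abstract above can be checked directly against the standard genetic code. The sketch below is an illustration, not part of the cited work: it builds the standard RNA codon table, expands the degenerate patterns GNN and ARR (IUPAC codes: N = any base, R = A or G), and collects the amino acids they encode. The names `CODON_TABLE` and `expand` are hypothetical helpers.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in one-letter amino acid codes, ordered UUU, UUC, UUA, ... GGG
# ('*' marks a stop codon).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# IUPAC degenerate bases used in the abstract.
IUPAC = {"N": "UCAG", "R": "AG"}


def expand(pattern):
    """List all concrete codons matched by a degenerate pattern such as 'GNN'."""
    return ["".join(c) for c in product(*(IUPAC.get(ch, ch) for ch in pattern))]


encoded = {CODON_TABLE[codon] for pattern in ("GNN", "ARR") for codon in expand(pattern)}
print(sorted(encoded))
```

The resulting set corresponds to Ala (A), Asp (D), Glu (E), Gly (G), Lys (K), Arg (R), and Val (V), the seven amino acid types listed in the abstract.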
Affiliation(s)
- Sota Yagi
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Aditya K Padhi
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Jelena Vucinic
  - Université Fédérale de Toulouse, ANITI, INRAE-UR 875, 31000 Toulouse, France
  - TBI, Université Fédérale de Toulouse, CNRS, INRAE, INSA, ANITI, 31000 Toulouse, France
  - Université Fédérale de Toulouse, ANITI, IRIT-UMR 5505, 31000 Toulouse, France
- Sophie Barbe
  - TBI, Université Fédérale de Toulouse, CNRS, INRAE, INSA, ANITI, 31000 Toulouse, France
- Thomas Schiex
  - Université Fédérale de Toulouse, ANITI, INRAE-UR 875, 31000 Toulouse, France
- Reiko Nakagawa
  - RIKEN Center for Biosystems Dynamics Research, 2-2-3 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 650-0047, Japan
- David Simoncini
  - Université Fédérale de Toulouse, ANITI, IRIT-UMR 5505, 31000 Toulouse, France
- Kam Y J Zhang
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Shunsuke Tagami
  - RIKEN Center for Biosystems Dynamics Research, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
9
Nazet J, Lang E, Merkl R. Rosetta:MSF:NN: Boosting performance of multi-state computational protein design with a neural network. PLoS One 2021; 16:e0256691. [PMID: 34437621 PMCID: PMC8389498 DOI: 10.1371/journal.pone.0256691] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/23/2020] [Accepted: 08/12/2021] [Indexed: 12/05/2022] Open
Abstract
Rational protein design aims at the targeted modification of existing proteins. To reach this goal, software suites like Rosetta propose sequences to introduce the desired properties. Challenging design problems necessitate the representation of a protein by means of a structural ensemble. Thus, Rosetta multi-state design (MSD) protocols have been developed wherein each state represents one protein conformation. Computational demands of MSD protocols are high, because for each of the candidate sequences a costly three-dimensional (3D) model has to be created and assessed for all states. Each of these scores contributes one data point to a complex, design-specific energy landscape. As neural networks (NN) proved well-suited to learn such solution spaces, we integrated one into the framework Rosetta:MSF instead of the so far used genetic algorithm with the aim to reduce computational costs. As its predecessor, Rosetta:MSF:NN administers a set of candidate sequences and their scores and scans sequence space iteratively. During each iteration, the union of all candidate sequences and their Rosetta scores are used to re-train NNs that possess a design-specific architecture. The enormous speed of the NNs allows an extensive assessment of alternative sequences, which are ranked on the scores predicted by the NN. Costly 3D models are computed only for a small fraction of best-scoring sequences; these and the corresponding 3D-based scores replace half of the candidate sequences during each iteration. The analysis of two sets of candidate sequences generated for a specific design problem by means of a genetic algorithm confirmed that the NN predicted 3D-based scores quite well; the Pearson correlation coefficient was at least 0.95. Applying Rosetta:MSF:NN:enzdes to a benchmark consisting of 16 ligand-binding problems showed that this protocol converges ten-times faster than the genetic algorithm and finds sequences with comparable scores.
Affiliation(s)
- Julian Nazet
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Elmar Lang
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
10
Michael E, Simonson T. How much can physics do for protein design? Curr Opin Struct Biol 2021; 72:46-54. [PMID: 34461593 DOI: 10.1016/j.sbi.2021.07.011] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/12/2021] [Revised: 07/22/2021] [Accepted: 07/25/2021] [Indexed: 01/03/2023]
Abstract
Physics and physical chemistry are an important thread in computational protein design, complementary to knowledge-based tools. They provide molecular mechanics scoring functions that need little or no ad hoc parameter readjustment, methods to thoroughly sample equilibrium ensembles, and different levels of approximation for conformational flexibility. They led recently to the successful redesign of a small protein using a physics-based folded state energy. Adaptive Monte Carlo or molecular dynamics schemes were discovered where protein variants are populated as per their ligand-binding free energy or catalytic efficiency. Molecular dynamics have been used for backbone flexibility. Implicit solvent models have been refined, polarizable force fields applied, and many physical insights obtained.
Affiliation(s)
- Eleni Michael
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biologie Structurale de la Cellule (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
11
Woolfson DN. A Brief History of De Novo Protein Design: Minimal, Rational, and Computational. J Mol Biol 2021; 433:167160. [PMID: 34298061 DOI: 10.1016/j.jmb.2021.167160] [Citation(s) in RCA: 69] [Impact Index Per Article: 23.0] [Received: 06/06/2021] [Revised: 07/07/2021] [Accepted: 07/12/2021] [Indexed: 12/26/2022]
Abstract
Protein design has come of age, but how will it mature? In the 1980s and the 1990s, the primary motivation for de novo protein design was to test our understanding of the informational aspect of the protein-folding problem; i.e., how does protein sequence determine protein structure and function? This necessitated minimal and rational design approaches whereby the placement of each residue in a design was reasoned using chemical principles and/or biochemical knowledge. At that time, though with some notable exceptions, the use of computers to aid design was not widespread. Over the past two decades, the tables have turned and computational protein design is firmly established. Here, I illustrate this progress through a timeline of de novo protein structures that have been solved to atomic resolution and deposited in the Protein Data Bank. From this, it is clear that the impact of rational and computational design has been considerable: More-complex and more-sophisticated designs are being targeted with many being resolved to atomic resolution. Furthermore, our ability to generate and manipulate synthetic proteins has advanced to a point where they are providing realistic alternatives to natural protein functions for applications both in vitro and in cells. Also, and increasingly, computational protein design is becoming accessible to non-specialists. This all begs the questions: Is there still a place for minimal and rational design approaches? And, what challenges lie ahead for the burgeoning field of de novo protein design as a whole?
Affiliation(s)
- Derek N Woolfson
  - School of Chemistry, University of Bristol, Cantock's Close, Bristol BS8 1TS, UK
  - School of Biochemistry, University of Bristol, Biomedical Sciences Building, University Walk, Bristol BS8 1TD, UK
  - Bristol BioDesign Institute, University of Bristol, Life Sciences Building, Tyndall Avenue, Bristol BS8 1TQ, UK
12
Beuvin F, de Givry S, Schiex T, Verel S, Simoncini D. Iterated local search with partition crossover for computational protein design. Proteins 2021; 89:1522-1529. [PMID: 34228826 DOI: 10.1002/prot.26174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Accepted: 05/25/2021] [Indexed: 11/06/2022]
Abstract
Structure-based computational protein design (CPD) refers to the problem of finding a sequence of amino acids which folds into a specific desired protein structure, and possibly fulfills some targeted biochemical properties. Recent studies point out the particularly rugged CPD energy landscape, suggesting that local search optimization methods should be designed and tuned to easily escape local minima attraction basins. In this article, we analyze the performance and search dynamics of an iterated local search (ILS) algorithm enhanced with partition crossover. Our algorithm, PILS, quickly finds local minima and escapes their basins of attraction by solution perturbation. Additionally, the partition crossover operator exploits the structure of the residue interaction graph in order to efficiently mix solutions and find new unexplored basins. Our results on a benchmark of 30 proteins of various topology and size show that PILS consistently finds lower energy solutions compared to Rosetta fixbb and a classic ILS, and that the corresponding sequences are mostly closer to the native.
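As a rough illustration of the search scheme named in the abstract above, the following Python sketch implements a classic iterated local search (descend to a local minimum, perturb, descend again, keep the result if it is no worse) on a toy pairwise energy. It deliberately omits the partition crossover operator, so it is not PILS. The energy model, the random instance, and all function and parameter names are assumptions made for illustration only.

```python
import random


def energy(seq, unary, pair):
    """Total energy: one term per position plus one per interacting pair."""
    total = sum(unary[i][a] for i, a in enumerate(seq))
    total += sum(table[seq[i]][seq[j]] for (i, j), table in pair.items())
    return total


def local_search(seq, unary, pair, n_states):
    """First-improvement descent over single-position substitutions."""
    seq = list(seq)
    best = energy(seq, unary, pair)
    improved = True
    while improved:
        improved = False
        for i in range(len(seq)):
            for a in range(n_states):
                old = seq[i]
                if a == old:
                    continue
                seq[i] = a
                e = energy(seq, unary, pair)
                if e < best:
                    best, improved = e, True  # keep the improving substitution
                else:
                    seq[i] = old  # undo the move
    return seq, best


def iterated_local_search(unary, pair, n_states, n_iter=50, n_perturb=2, seed=0):
    """Classic ILS: perturb the current local minimum, descend, accept if not worse."""
    rng = random.Random(seed)
    n = len(unary)
    start = [rng.randrange(n_states) for _ in range(n)]
    cur, cur_e = local_search(start, unary, pair, n_states)
    for _ in range(n_iter):
        cand = list(cur)
        for i in rng.sample(range(n), n_perturb):  # random kick to leave the basin
            cand[i] = rng.randrange(n_states)
        cand, cand_e = local_search(cand, unary, pair, n_states)
        if cand_e <= cur_e:
            cur, cur_e = cand, cand_e
    return cur, cur_e


def random_instance(n, n_states, seed=1):
    """Hypothetical toy instance: random energies on a chain of n positions."""
    rng = random.Random(seed)
    unary = [[rng.uniform(-1, 1) for _ in range(n_states)] for _ in range(n)]
    pair = {
        (i, i + 1): [[rng.uniform(-1, 1) for _ in range(n_states)]
                     for _ in range(n_states)]
        for i in range(n - 1)
    }
    return unary, pair


unary, pair = random_instance(8, 3)
seq, e = iterated_local_search(unary, pair, 3)
print(seq, round(e, 3))
```

In a real design problem the states would be amino acid and rotamer choices and the pair terms would follow the residue interaction graph, which is the structure that the partition crossover described in the abstract exploits.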
Affiliation(s)
- François Beuvin
  - IRIT UMR 5505-CNRS, Université de Toulouse I Capitole, Toulouse, France
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
- Simon de Givry
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
  - MIAT, Université de Toulouse, INRAE, UR 875, Toulouse, France
- Thomas Schiex
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
  - MIAT, Université de Toulouse, INRAE, UR 875, Toulouse, France
- David Simoncini
  - IRIT UMR 5505-CNRS, Université de Toulouse I Capitole, Toulouse, France
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
13
Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022] Open
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
Affiliation(s)
- Younes Bouchiba
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Juan Cortés
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Thomas Schiex
  - Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France