1
Childs H, Guerin N, Zhou P, Donald BR. Protocol for Designing De Novo Noncanonical Peptide Binders in OSPREY. J Comput Biol 2024; 31:965-974. [PMID: 39364612 PMCID: PMC11698684 DOI: 10.1089/cmb.2024.0669] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2024]
Abstract
D-peptides, the mirror image of canonical L-peptides, offer numerous biological advantages that make them effective therapeutics. This article details how to use DexDesign, the newest OSPREY-based algorithm, for designing these D-peptides de novo. OSPREY physics-based models precisely mimic energy-equivariant reflection operations, enabling the generation of D-peptide scaffolds from L-peptide templates. Due to the scarcity of D-peptide:L-protein structural data, DexDesign calls a geometric hashing algorithm, Method of Accelerated Search for Tertiary Ensemble Representatives, as a subroutine to produce a synthetic structural dataset. DexDesign enables mixed-chirality designs with a new user interface and also reduces the conformation and sequence search space using three new design techniques: Minimum Flexible Set, Inverse Alanine Scanning, and K*-based Mutational Scanning.
Affiliation(s)
- Henry Childs
  - Department of Chemistry, Duke University, Durham, North Carolina, USA
- Nathan Guerin
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
- Pei Zhou
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina, USA
- Bruce R. Donald
  - Department of Chemistry, Duke University, Durham, North Carolina, USA
  - Department of Computer Science, Duke University, Durham, North Carolina, USA
  - Department of Biochemistry, Duke University School of Medicine, Durham, North Carolina, USA
  - Department of Mathematics, Duke University, Durham, North Carolina, USA
2
Colom MS, Vučinić J, Adolf‐Bryfogle J, Bowman JW, Verel S, Moczygemba I, Schiex T, Simoncini D, Bahl CD. Complete combinatorial mutational enumeration of a protein functional site enables sequence-landscape mapping and identifies highly-mutated variants that retain activity. Protein Sci 2024; 33:e5109. [PMID: 38989563 PMCID: PMC11237556 DOI: 10.1002/pro.5109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 05/20/2024] [Accepted: 06/25/2024] [Indexed: 07/12/2024]
Abstract
Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride toward achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.
Affiliation(s)
- Mireia Solà Colom
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
- Jelena Vučinić
  - Université Fédérale de Toulouse, IRIT UMR 5505, ANITI, Université Toulouse Capitole, Toulouse, France
- Jared Adolf‐Bryfogle
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
- James W. Bowman
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
- Isabelle Moczygemba
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
- Thomas Schiex
  - MIAT, Université Fédérale de Toulouse, ANITI, INRAE UR 875, Toulouse, France
- David Simoncini
  - Université Fédérale de Toulouse, IRIT UMR 5505, ANITI, Université Toulouse Capitole, Toulouse, France
- Christopher D. Bahl
  - Institute for Protein Innovation, Boston, Massachusetts, USA
  - Division of Hematology/Oncology, Boston Children's Hospital, Harvard Medical School, Boston, Massachusetts, USA
  - Present address: AI Proteins, Boston, Massachusetts, USA
3
Colom MS, Vucinic J, Adolf-Bryfogle J, Bowman JW, Verel S, Moczygemba I, Schiex T, Simoncini D, Bahl CD. Complete Combinatorial Mutational Enumeration of a protein functional site enables sequence-landscape mapping and identifies highly-mutated variants that retain activity. Research Square 2023:rs.3.rs-2248327. [PMID: 36482980 PMCID: PMC9727770 DOI: 10.21203/rs.3.rs-2248327/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/17/2023]
Abstract
Understanding how proteins evolve under selective pressure is a longstanding challenge. The immensity of the search space has limited efforts to systematically evaluate the impact of multiple simultaneous mutations, so mutations have typically been assessed individually. However, epistasis, or the way in which mutations interact, prevents accurate prediction of combinatorial mutations based on measurements of individual mutations. Here, we use artificial intelligence to define the entire functional sequence landscape of a protein binding site in silico, and we call this approach Complete Combinatorial Mutational Enumeration (CCME). By leveraging CCME, we are able to construct a comprehensive map of the evolutionary connectivity within this functional sequence landscape. As a proof of concept, we applied CCME to the ACE2 binding site of the SARS-CoV-2 spike protein receptor binding domain. We selected representative variants from across the functional sequence landscape for testing in the laboratory. We identified variants that retained functionality to bind ACE2 despite changing over 40% of evaluated residue positions, and the variants now escape binding and neutralization by monoclonal antibodies. This work represents a crucial initial stride towards achieving precise predictions of pathogen evolution, opening avenues for proactive mitigation.
Affiliation(s)
- Mireia Solà Colom
  - Institute for Protein Innovation; Boston, Massachusetts, 02115, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
  - Current address: AI Proteins; Boston, Massachusetts, 02215, USA
- Jelena Vucinic
  - Université Fédérale de Toulouse; ANITI, IRIT-CNRS UMR 5505, Université Toulouse Capitole, 31000 Toulouse, France
- Jared Adolf-Bryfogle
  - Institute for Protein Innovation; Boston, Massachusetts, 02115, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
- James W. Bowman
  - Institute for Protein Innovation; Boston, Massachusetts, 02115, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
  - Current address: AI Proteins; Boston, Massachusetts, 02215, USA
- Sébastien Verel
  - Université Littoral Côte d’Opale; UR 4491, LISIC, F-62100 Calais, France
- Isabelle Moczygemba
  - Institute for Protein Innovation; Boston, Massachusetts, 02115, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
  - Current address: AI Proteins; Boston, Massachusetts, 02215, USA
- Thomas Schiex
  - Université Fédérale de Toulouse; ANITI, INRAE-UR 875, 31000 Toulouse, France
- David Simoncini
  - Université Fédérale de Toulouse; ANITI, IRIT-CNRS UMR 5505, Université Toulouse Capitole, 31000 Toulouse, France
- Christopher D. Bahl
  - Institute for Protein Innovation; Boston, Massachusetts, 02115, USA
  - Division of Hematology/Oncology, Boston Children’s Hospital, Harvard Medical School; Boston, Massachusetts, 02115, USA
  - Current address: AI Proteins; Boston, Massachusetts, 02215, USA
4
Talluri S. Algorithms for protein design. Advances in Protein Chemistry and Structural Biology 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
  - Department of Biotechnology, GITAM, Visakhapatnam, India
5
Bouchiba Y, Ruffini M, Schiex T, Barbe S. Computational Design of Miniprotein Binders. Methods Mol Biol 2022; 2405:361-382. [PMID: 35298822 DOI: 10.1007/978-1-0716-1855-4_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Miniprotein binders hold a great interest as a class of drugs that bridges the gap between monoclonal antibodies and small molecule drugs. Like monoclonal antibodies, they can be designed to bind to therapeutic targets with high affinity, but they are more stable and easier to produce and to administer. In this chapter, we present a structure-based computational generic approach for miniprotein inhibitor design. Specifically, we describe step-by-step the implementation of the approach for the design of miniprotein binders against the SARS-CoV-2 coronavirus, using available structural data on the SARS-CoV-2 spike receptor binding domain (RBD) in interaction with its native target, the human receptor ACE2. Structural data being increasingly accessible around many protein-protein interaction systems, this method might be applied to the design of miniprotein binders against numerous therapeutic targets. The computational pipeline exploits provable and deterministic artificial intelligence-based protein design methods, with some recent additions in terms of binding energy estimation, multistate design and diverse library generation.
Affiliation(s)
- Younes Bouchiba
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
- Manon Ruffini
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Thomas Schiex
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, Toulouse, France
- Sophie Barbe
  - TBI, Université de Toulouse, CNRS, INRAE, INSA, ANITI, Toulouse, France
6
Defresne M, Barbe S, Schiex T. Protein Design with Deep Learning. Int J Mol Sci 2021; 22:11741. [PMID: 34769173 PMCID: PMC8584038 DOI: 10.3390/ijms222111741] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/30/2021] [Revised: 10/23/2021] [Accepted: 10/26/2021] [Indexed: 12/21/2022]
Abstract
Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.
Affiliation(s)
- Marianne Defresne
  - Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, Université de Toulouse, CNRS, INRAE, INSA, ANITI, 31077 Toulouse, France
- Thomas Schiex
  - Université Fédérale de Toulouse, ANITI, INRAE, UR 875, 31326 Toulouse, France
7
Beuvin F, de Givry S, Schiex T, Verel S, Simoncini D. Iterated local search with partition crossover for computational protein design. Proteins 2021; 89:1522-1529. [PMID: 34228826 DOI: 10.1002/prot.26174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2020] [Accepted: 05/25/2021] [Indexed: 11/06/2022]
Abstract
Structure-based computational protein design (CPD) refers to the problem of finding a sequence of amino acids which folds into a specific desired protein structure, and possibly fulfills some targeted biochemical properties. Recent studies point out the particularly rugged CPD energy landscape, suggesting that local search optimization methods should be designed and tuned to easily escape local minima attraction basins. In this article, we analyze the performance and search dynamics of an iterated local search (ILS) algorithm enhanced with partition crossover. Our algorithm, PILS, quickly finds local minima and escapes their basins of attraction by solution perturbation. Additionally, the partition crossover operator exploits the structure of the residue interaction graph in order to efficiently mix solutions and find new unexplored basins. Our results on a benchmark of 30 proteins of various topology and size show that PILS consistently finds lower energy solutions compared to Rosetta fixbb and a classic ILS, and that the corresponding sequences are mostly closer to the native.
Affiliation(s)
- François Beuvin
  - IRIT UMR 5505-CNRS, Université de Toulouse I Capitole, Toulouse, France
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
- Simon de Givry
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
  - MIAT, Université de Toulouse, INRAE, UR 875, Toulouse, France
- Thomas Schiex
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
  - MIAT, Université de Toulouse, INRAE, UR 875, Toulouse, France
- David Simoncini
  - IRIT UMR 5505-CNRS, Université de Toulouse I Capitole, Toulouse, France
  - Artificial and Natural Intelligence Toulouse Institute, ANITI, Toulouse, France
8
Bouchiba Y, Cortés J, Schiex T, Barbe S. Molecular flexibility in computational protein design: an algorithmic perspective. Protein Eng Des Sel 2021; 34:6271252. [PMID: 33959778 DOI: 10.1093/protein/gzab011] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 12/21/2020] [Revised: 03/12/2021] [Accepted: 03/29/2021] [Indexed: 12/19/2022]
Abstract
Computational protein design (CPD) is a powerful technique for engineering new proteins, with both great fundamental implications and diverse practical interests. However, the approximations usually made for computational efficiency, using a single fixed backbone and a discrete set of side chain rotamers, tend to produce rigid and hyper-stable folds that may lack functionality. These approximations contrast with the demonstrated importance of molecular flexibility and motions in a wide range of protein functions. The integration of backbone flexibility and multiple conformational states in CPD, in order to relieve the inaccuracies resulting from these simplifications and to improve design reliability, is attracting increased attention. However, the greatly increased search space that needs to be explored in these extensions defines extremely challenging computational problems. In this review, we outline the principles of CPD and discuss recent efforts in algorithmic developments for incorporating molecular flexibility in the design process.
Affiliation(s)
- Younes Bouchiba
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Juan Cortés
  - Laboratoire d'Analyse et d'Architecture des Systèmes, LAAS CNRS, Université de Toulouse, CNRS, Toulouse 31400, France
- Thomas Schiex
  - Université de Toulouse, ANITI, INRAE, UR MIAT, F-31320, Castanet-Tolosan, France
- Sophie Barbe
  - Toulouse Biotechnology Institute, TBI, CNRS, INRAE, INSA, ANITI, Toulouse 31400, France
9
Cheng S, Ma L, Lu H, Lei X, Shi Y. Evolutionary computation for solving search-based data analytics problems. Artif Intell Rev 2020. [DOI: 10.1007/s10462-020-09882-x] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Indexed: 12/16/2022]
10
Surpeta B, Sequeiros-Borja CE, Brezovsky J. Dynamics, a Powerful Component of Current and Future in Silico Approaches for Protein Design and Engineering. Int J Mol Sci 2020; 21:E2713. [PMID: 32295283 PMCID: PMC7215530 DOI: 10.3390/ijms21082713] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 03/18/2020] [Revised: 04/10/2020] [Accepted: 04/12/2020] [Indexed: 12/13/2022]
Abstract
Computational prediction has become an indispensable aid in the processes of engineering and designing proteins for various biotechnological applications. With the tremendous progress in more powerful computer hardware and more efficient algorithms, some in silico tools and methods have started to apply a more realistic description of proteins as conformational ensembles, making protein dynamics an integral part of their prediction workflows. To help protein engineers harness the benefits of considering dynamics in their designs, we surveyed new tools developed for analyses of conformational ensembles in order to select engineering hotspots and design mutations. Next, we discussed the collective evolution towards more flexible protein design methods, including ensemble-based approaches, knowledge-assisted methods, and provable algorithms. Finally, we highlighted apparent challenges that current approaches are facing and provided our perspectives on their further development.
Affiliation(s)
- Bartłomiej Surpeta
  - Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland
  - International Institute of Molecular and Cell Biology in Warsaw, Ks Trojdena 4, 02-109 Warsaw, Poland
- Carlos Eduardo Sequeiros-Borja
  - Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland
  - International Institute of Molecular and Cell Biology in Warsaw, Ks Trojdena 4, 02-109 Warsaw, Poland
- Jan Brezovsky
  - Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland
  - International Institute of Molecular and Cell Biology in Warsaw, Ks Trojdena 4, 02-109 Warsaw, Poland
11
Tan Y, Shi Y, Tuba M. Swarm Intelligence in Data Science: Applications, Opportunities and Challenges. Lecture Notes in Computer Science 2020. [PMCID: PMC7354777 DOI: 10.1007/978-3-030-53956-6_1] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
The Swarm Intelligence (SI) algorithms have been proved to be a comprehensive method to solve complex optimization problems by simulating the emergence behaviors of biological swarms. Nowadays, data science is getting more and more attention, which needs quick management and analysis of massive data. Most traditional methods can only be applied to continuous and differentiable functions. As a set of population-based approaches, it is proven by some recent research works that the SI algorithms have great potential for relevant tasks in this field. In order to gather better insight into the utilization of these methods in data science and to provide a further reference for future researches, this paper focuses on the relationship between data science and swarm intelligence. After introducing the mainstream swarm intelligence algorithms and their common characteristics, both the theoretical and real-world applications in the literature which utilize the swarm intelligence to the related domains of data analytics are reviewed. Based on the summary of the existing works, this paper also analyzes the opportunities and challenges in this field, which attempts to shed some light on designing more effective algorithms to solve the problems in data science for real-world applications.
Affiliation(s)
- Ying Tan
  - Peking University, Beijing, China
- Yuhui Shi
  - Southern University of Science and Technology, Shenzhen, China