1
Lutz ID, Wang S, Norn C, Courbet A, Borst AJ, Zhao YT, Dosey A, Cao L, Xu J, Leaf EM, Treichel C, Litvicov P, Li Z, Goodson AD, Rivera-Sánchez P, Bratovianu AM, Baek M, King NP, Ruohola-Baker H, Baker D. Top-down design of protein architectures with reinforcement learning. Science 2023; 380:266-273. [PMID: 37079676 DOI: 10.1126/science.adf6591] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 11/05/2022] [Accepted: 03/21/2023] [Indexed: 04/22/2023]
Abstract
As a result of evolutionary selection, the subunits of naturally occurring protein assemblies often fit together with substantial shape complementarity to generate architectures optimal for function in a manner not achievable by current design approaches. We describe a "top-down" reinforcement learning-based design approach that solves this problem using Monte Carlo tree search to sample protein conformers in the context of an overall architecture and specified functional constraints. Cryo-electron microscopy structures of the designed disk-shaped nanopores and ultracompact icosahedra are very close to the computational models. The icosahedra enable very-high-density display of immunogens and signaling molecules, which potentiates vaccine response and angiogenesis induction. Our approach enables the top-down design of complex protein nanomaterials with desired system properties and demonstrates the power of reinforcement learning in protein design.
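To make the search strategy named in this abstract more concrete, here is a minimal sketch of Monte Carlo tree search (UCT variant) on a toy problem. It is not the authors' implementation: instead of protein conformers, the "moves" are integers whose sum should approach a target, and every name (`Node`, `MOVES`, `DEPTH`, `TARGET`, `reward`, `mcts`) is a hypothetical choice made for illustration.

```python
import math
import random

MOVES = (-1, 0, 1)   # toy stand-in for candidate building blocks (assumption)
DEPTH = 6            # number of choices per complete solution (assumption)
TARGET = 4           # toy design goal: the sum of the chosen moves (assumption)


class Node:
    def __init__(self, state, parent=None):
        self.state = state        # tuple of moves chosen so far
        self.parent = parent
        self.children = {}        # move -> Node
        self.visits = 0
        self.total_reward = 0.0


def reward(state):
    """Score a complete state in [0, 1]; 1 means the target sum is hit."""
    return 1.0 - abs(sum(state) - TARGET) / (DEPTH + TARGET)


def ucb_child(node, c=1.4):
    """Pick the child maximizing mean reward plus an exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.total_reward / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while len(node.state) < DEPTH and len(node.children) == len(MOVES):
            node = ucb_child(node)
        # Expansion: add one untried move, unless the node is terminal.
        if len(node.state) < DEPTH:
            move = rng.choice([m for m in MOVES if m not in node.children])
            child = Node(node.state + (move,), parent=node)
            node.children[move] = child
            node = child
        # Rollout: complete the state with random moves and score it.
        state = node.state
        while len(state) < DEPTH:
            state += (rng.choice(MOVES),)
        r = reward(state)
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += r
            node = node.parent
    # Read out the most-visited path as the preferred sequence of moves.
    path, node = [], root
    while node.children:
        move, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        path.append(move)
    return tuple(path)
```

In a design setting the reward would presumably encode the architectural and functional constraints, and the moves would be structural building blocks; the toy reward here only illustrates the select, expand, rollout, backpropagate cycle.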
Affiliation(s)
- Isaac D Lutz
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Shunzhi Wang
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Christoffer Norn
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - BioInnovation Institute, DK2200 Copenhagen N, Denmark
- Alexis Courbet
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA, USA
- Andrew J Borst
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Yan Ting Zhao
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
  - Oral Health Sciences, University of Washington, Seattle, WA, USA
- Annie Dosey
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Longxing Cao
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang, China
- Jinwei Xu
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Elizabeth M Leaf
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Catherine Treichel
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Patrisia Litvicov
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
- Zhe Li
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Alexander D Goodson
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Minkyung Baek
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - School of Biological Sciences, Seoul National University, Seoul, Republic of Korea
- Neil P King
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
- Hannele Ruohola-Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
  - Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
  - Oral Health Sciences, University of Washington, Seattle, WA, USA
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA, USA
  - Institute for Protein Design, University of Washington, Seattle, WA, USA
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
2
Evolutionary Conserved Short Linear Motifs Provide Insights into the Cellular Response to Stress. Antioxidants (Basel) 2022; 12:antiox12010096. [PMID: 36670957 PMCID: PMC9854524 DOI: 10.3390/antiox12010096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 11/22/2022] [Accepted: 12/22/2022] [Indexed: 01/03/2023] Open
Abstract
Short linear motifs (SLiMs) are evolutionarily conserved functional modules of proteins composed of 3 to 10 residues and involved in multiple cellular functions. Here, we performed a search for SLiMs that show sequence similarity to two segments of alpha-fetoprotein (AFP), a major mammalian embryonic and cancer-associated protein. Biological activities of the peptides, LDSYQCT (AFP14-20) and EMTPVNPGV (GIP-9), have been previously confirmed under in vitro and in vivo conditions. In our study, we retrieved a vast array of proteins that contain SLiMs of interest from both prokaryotic and eukaryotic species, including viruses, bacteria, archaea, invertebrates, and vertebrates. Comprehensive Gene Ontology enrichment analysis showed that proteins from multiple functional classes, including enzymes, transcription factors, ribosomal proteins, and proteins involved in signaling, cell cycle, and quality control, were implicated in cellular adaptation to environmental stress conditions. These include response to oxidative and metabolic stress, hypoxia, DNA and RNA damage, protein degradation, as well as antimicrobial, antiviral, and immune response. Thus, our data enabled insights into the common functions of SLiMs that are evolutionarily conserved across all taxonomic categories. These SLiMs can serve as important players in cellular adaptation to stress, which is crucial for cell functioning.
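A sliding-window scan is one simple way to picture a search for motif-like segments. The sketch below is illustrative only and is not the authors' pipeline: the identity threshold, the function name `find_motif_hits`, and the toy sequences are assumptions; only the query peptide LDSYQCT comes from the abstract.

```python
def find_motif_hits(sequence, motif, min_identity=0.7):
    """Return (start, window, identity) for windows that resemble the motif.

    Identity is the fraction of positions with the same residue; the 0.7
    default is an arbitrary illustrative threshold.
    """
    hits = []
    k = len(motif)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        matches = sum(a == b for a, b in zip(window, motif))
        identity = matches / k
        if identity >= min_identity:
            hits.append((start, window, identity))
    return hits


AFP_MOTIF = "LDSYQCT"  # AFP14-20 heptapeptide named in the abstract

# Hypothetical toy sequences, not real database entries.
toy_proteins = {
    "toy_exact": "MKALDSYQCTGG",
    "toy_near": "AALDSFQCTKK",
    "toy_none": "GGGGGGGGGGGG",
}

for name, seq in toy_proteins.items():
    print(name, find_motif_hits(seq, AFP_MOTIF))
```

A real search would more likely use a substitution matrix or a pattern language rather than plain identity, and would run over a sequence database instead of a small dictionary.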
3
Short Linear Motifs in Colorectal Cancer Interactome and Tumorigenesis. Cells 2022; 11:cells11233739. [PMID: 36496998 PMCID: PMC9737320 DOI: 10.3390/cells11233739] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2022] [Revised: 11/16/2022] [Accepted: 11/21/2022] [Indexed: 11/25/2022] Open
Abstract
Colorectal tumorigenesis is driven by alterations in genes and proteins responsible for cancer initiation, progression, and invasion. This multistage process is based on a dense network of protein-protein interactions (PPIs) that become dysregulated as a result of changes in various cell signaling effectors. PPIs in signaling and regulatory networks are known to be mediated by short linear motifs (SLiMs), which are conserved contiguous regions of 3-10 amino acids within interacting protein domains. SLiMs are the minimum sequences required for modulating cellular PPI networks. Thus, several in silico approaches have been developed to predict and analyze SLiM-mediated PPIs. In this review, we focus on emerging evidence supporting a crucial role for SLiMs in driver pathways that are disrupted in colorectal cancer (CRC) tumorigenesis and related PPI network alterations. As a result, SLiMs, along with short peptides, are attracting the interest of researchers to devise small molecules amenable to be used as novel anti-CRC targeted therapies. Overall, the characterization of SLiMs mediating crucial PPIs in CRC may foster the development of more specific combined pharmacological approaches.
4
Swanson S, Sivaraman V, Grigoryan G, Keating AE. Tertiary motifs as building blocks for the design of protein-binding peptides. Protein Sci 2022; 31:e4322. [PMID: 35634780 PMCID: PMC9088223 DOI: 10.1002/pro.4322] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/28/2022] [Revised: 04/12/2022] [Accepted: 04/14/2022] [Indexed: 11/07/2022]
Abstract
Despite advances in protein engineering, the de novo design of small proteins or peptides that bind to a desired target remains a difficult task. Most computational methods search for binder structures in a library of candidate scaffolds, which can lead to designs with poor target complementarity and low success rates. Instead of choosing from pre-defined scaffolds, we propose that custom peptide structures can be constructed to complement a target surface. Our method mines tertiary motifs (TERMs) from known structures to identify surface-complementing fragments or "seeds." We combine seeds that satisfy geometric overlap criteria to generate peptide backbones and score the backbones to identify the most likely binding structures. We found that TERM-based seeds can describe known binding structures with high resolution: the vast majority of peptide binders from 486 peptide-protein complexes can be covered by seeds generated from single-chain structures. Furthermore, we demonstrate that known peptide structures can be reconstructed with high accuracy from peptide-covering seeds. As a proof of concept, we used our method to design 100 peptide binders of TRAF6, seven of which were predicted by Rosetta to form higher-quality interfaces than a native binder. The designed peptides interact with distinct sites on TRAF6, including the native peptide-binding site. These results demonstrate that known peptide-binding structures can be constructed from TERMs in single-chain structures and suggest that TERM information can be applied to efficiently design novel target-complementing binders.
Affiliation(s)
- Sebastian Swanson
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Venkatesh Sivaraman
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
- Gevorg Grigoryan
  - Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
- Amy E. Keating
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
  - Koch Center for Integrative Cancer Research, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
5
Sologova SS, Zavadskiy SP, Mokhosoev IM, Moldogazieva NT. Short Linear Motifs Orchestrate Functioning of Human Proteins during Embryonic Development, Redox Regulation, and Cancer. Metabolites 2022; 12:metabo12050464. [PMID: 35629968 PMCID: PMC9144484 DOI: 10.3390/metabo12050464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2022] [Revised: 05/18/2022] [Accepted: 05/19/2022] [Indexed: 11/16/2022] Open
Abstract
Short linear motifs (SLiMs) are evolutionarily conserved functional modules of proteins that represent amino acid stretches composed of 3 to 10 residues. The biological activities of two short peptide segments of human alpha-fetoprotein (AFP), a major embryo-specific and cancer-related protein, have been confirmed experimentally. These are a heptapeptide segment, LDSYQCT, in domain I, designated AFP14–20, and a nonapeptide segment, EMTPVNPGV, in domain III, designated GIP-9. In our work, we searched the UniProtKB database for human proteins that contain SLiMs with sequence similarity to both segments of human AFP and undertook gene ontology (GO)-based functional categorization of the retrieved proteins. Gene set enrichment analysis included GO terms for biological process, molecular function, metabolic pathway, KEGG pathway, and protein–protein interaction (PPI) categories. We identified the SLiMs of interest in a variety of non-homologous proteins involved in multiple cellular processes underlying embryonic development, cancer progression, and, unexpectedly, the regulation of redox homeostasis. These included transcription factors, cell adhesion proteins, ubiquitin-activating and conjugating enzymes, cell signaling proteins, and oxidoreductase enzymes. They function by regulating cell proliferation and differentiation, cell cycle, DNA replication/repair/recombination, metabolism, immune/inflammatory response, and apoptosis. In addition to the retrieved genes, new interacting genes were identified. Our data support the hypothesis that conserved SLiMs are incorporated into non-homologous proteins to serve as functional blocks for their orchestrated functioning.
Affiliation(s)
- Susanna S. Sologova
  - Nelyubin Institute of Pharmacy, Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
- Sergey P. Zavadskiy
  - Nelyubin Institute of Pharmacy, Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
- Innokenty M. Mokhosoev
  - Department of Biochemistry and Molecular Biology, Pirogov Russian National Research Medical University, 117997 Moscow, Russia
- Nurbubu T. Moldogazieva
  - Nelyubin Institute of Pharmacy, Sechenov First Moscow State Medical University (Sechenov University), 119991 Moscow, Russia
6
Feng Q, Hou M, Liu J, Zhao K, Zhang G. Construct a variable-length fragment library for de novo protein structure prediction. Brief Bioinform 2022; 23:6547572. [PMID: 35284936 DOI: 10.1093/bib/bbac086] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/03/2022] [Revised: 02/10/2022] [Accepted: 02/20/2022] [Indexed: 11/12/2022] Open
Abstract
Although remarkable achievements, such as AlphaFold2, have been made in end-to-end structure prediction, fragment libraries remain essential for de novo protein structure prediction, which can help explore and understand the protein-folding mechanism. In this work, we developed a variable-length fragment library (VFlib). In VFlib, a master structure database was first constructed from the Protein Data Bank through sequence clustering. The hidden Markov model (HMM) profile of each protein in the master structure database was generated by HHsuite, and the secondary structure of each protein was calculated by DSSP. For the query sequence, the HMM profile was first constructed. Then, variable-length fragments were retrieved from the master structure database through dynamic variable-length profile-profile comparison. A complete method for chopping the query HMM profile during this process was proposed to obtain fragments with increased diversity. Finally, secondary structure information was used to further screen the retrieved fragments to generate the final fragment library for the specific query sequence. The experimental results obtained with a set of 120 nonredundant proteins show that the global precision and coverage of the fragment library generated by VFlib were 55.04% and 94.95%, respectively, at an RMSD cutoff of 1.5 Å. Compared with the benchmark method NNMake, the global precision of our fragment library increased by 62.89% with equivalent coverage. Furthermore, the fragments generated by VFlib and NNMake were used to predict structure models through fragment assembly. Controlled experimental results demonstrate that the average TM-score of VFlib was 16.00% higher than that of NNMake.
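The precision and coverage figures quoted above can be illustrated with a small calculation. The sketch below assumes commonly used definitions, which the abstract itself does not spell out: precision is the fraction of library fragments whose RMSD to the native structure is below the cutoff, and coverage is the fraction of query positions spanned by at least one such fragment. The function name and the toy fragment list are hypothetical.

```python
def library_precision_coverage(fragments, query_length, rmsd_cutoff=1.5):
    """Compute (precision, coverage) for a fragment library.

    fragments: list of (start, length, rmsd_to_native) tuples, with
    0-based start positions on the query sequence (assumed layout).
    """
    good = [f for f in fragments if f[2] < rmsd_cutoff]
    precision = len(good) / len(fragments) if fragments else 0.0

    covered = set()
    for start, length, _ in good:
        covered.update(range(start, min(start + length, query_length)))
    coverage = len(covered) / query_length if query_length else 0.0
    return precision, coverage


# Toy library for a 10-residue query; RMSD values are made up.
toy_fragments = [
    (0, 4, 0.8),   # good, covers positions 0-3
    (2, 5, 1.2),   # good, covers positions 2-6
    (6, 3, 2.4),   # above the cutoff, ignored for coverage
    (8, 2, 1.0),   # good, covers positions 8-9
]
precision, coverage = library_precision_coverage(toy_fragments, query_length=10)
print(f"precision={precision:.2f} coverage={coverage:.2f}")
```

In this toy case three of four fragments pass the cutoff and position 7 is left uncovered, so the two measures can move independently, which is why the abstract reports both.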
Affiliation(s)
- Qiongqiong Feng
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Minghua Hou
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Jun Liu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Kailong Zhao
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Guijun Zhang
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
7
Zhou J, Panaitiu AE, Grigoryan G. A general-purpose protein design framework based on mining sequence-structure relationships in known protein structures. Proc Natl Acad Sci U S A 2020; 117:1059-1068. [PMID: 31892539 PMCID: PMC6969538 DOI: 10.1073/pnas.1908723117] [Citation(s) in RCA: 61] [Impact Index Per Article: 15.3] [Indexed: 12/23/2022] Open
Abstract
Current state-of-the-art approaches to computational protein design (CPD) aim to capture the determinants of structure from physical principles. While this has led to many successful designs, it does have strong limitations associated with inaccuracies in physical modeling, such that a reliable general solution to CPD has yet to be found. Here, we propose a design framework based on identifying and applying patterns of sequence-structure compatibility found in known proteins, rather than approximating them from models of interatomic interactions. We carry out extensive computational analyses and an experimental validation for our method. Our results strongly argue that the Protein Data Bank is now sufficiently large to enable proteins to be designed by using only examples of structural motifs from unrelated proteins. Because our method is likely to have orthogonal strengths relative to existing techniques, it could represent an important step toward removing remaining barriers to robust CPD.
Affiliation(s)
- Jianfu Zhou
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755
- Gevorg Grigoryan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755
8
Blanco JD, Radusky L, Climente-González H, Serrano L. FoldX accurate structural protein-DNA binding prediction using PADA1 (Protein Assisted DNA Assembly 1). Nucleic Acids Res 2018; 46:3852-3863. [PMID: 29608705 PMCID: PMC5934639 DOI: 10.1093/nar/gky228] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/18/2018] [Accepted: 03/20/2018] [Indexed: 12/20/2022] Open
Abstract
The speed at which new genomes are being sequenced highlights the need for genome-wide methods capable of predicting protein–DNA interactions. Here, we present PADA1, a generic algorithm that accurately models structural complexes and predicts the DNA-binding regions of resolved protein structures. PADA1 relies on a library of protein and double-stranded DNA fragment pairs obtained from a training set of 2103 DNA–protein complexes. It includes a fast statistical force field, computed from atom-atom distances, to evaluate and filter the 3D docking models. Using published benchmark validation sets and 212 DNA–protein structures published after 2016, we predicted the DNA-binding regions with an RMSD of <1.8 Å per residue in >95% of the cases. We show that the quality of the docked templates is compatible with the FoldX protein design tool suite, which identifies the crystallized DNA molecule sequence as the most energetically favorable in 80% of the cases. We highlighted the biological potential of PADA1 by reconstituting DNA and protein conformational changes upon protein mutagenesis of a meganuclease and its variants, and by predicting DNA-binding regions and nucleotide sequences in proteins crystallized without DNA. These results open up new perspectives for the engineering of DNA–protein interfaces.
Affiliation(s)
- Javier Delgado Blanco
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Leandro Radusky
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Héctor Climente-González
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
- Luis Serrano
  - Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, Dr. Aiguader 88, 08003 Barcelona, Spain
  - Universitat Pompeu Fabra (UPF), Barcelona, Spain
  - Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluis Companys 23, 08010 Barcelona, Spain
9
Trevizani R, Custódio FL. Supersecondary Structures and Fragment Libraries. Methods Mol Biol 2019; 1958:283-295. [PMID: 30945224 DOI: 10.1007/978-1-4939-9161-7_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
The use of smotifs and fragment libraries has proven useful to both simplify and increase the quality of protein models. Here, we present Profrager, a tool that automatically generates putative structural fragments to reproduce local motifs of proteins given a target sequence. Profrager is highly customizable, allowing the user to select the number of fragments per library and the ranking method. It can generate fragments of all sizes, and it was recently modified to offer the option of outputting only smotifs.
10
Simoncini D, Zhang KYJ, Schiex T, Barbe S. A structural homology approach for computational protein design with flexible backbone. Bioinformatics 2018; 35:2418-2426. [DOI: 10.1093/bioinformatics/bty975] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 04/24/2018] [Revised: 11/01/2018] [Accepted: 11/28/2018] [Indexed: 01/09/2023] Open
Abstract
Motivation
Structure-based computational protein design (CPD) plays a critical role in advancing the field of protein engineering. Using an all-atom energy function, CPD tries to identify amino acid sequences that fold into a target structure and ultimately perform a desired function. Energy functions remain imperfect, however, and injecting relevant information from known structures into the design process should lead to improved designs.
Results
We introduce Shades, a data-driven CPD method that exploits local structural environments in known protein structures together with energy to guide sequence design, while sampling side-chain and backbone conformations to accommodate mutations. Shades (Structural Homology Algorithm for protein DESign) is based on customized libraries of non-contiguous in-contact amino acid residue motifs. We have tested Shades on a public benchmark of 40 proteins selected from different protein families. When homologous proteins were excluded, Shades achieved a protein sequence recovery of 30% and a protein sequence similarity of 46% on average, compared with the PFAM protein family of the target protein. When homologous structures were added, the wild-type sequence recovery rate reached 93%.
Availability and implementation
Shades source code is available at https://bitbucket.org/satsumaimo/shades as a patch for Rosetta 3.8 with a curated protein structure database and ITEM library creation software.
Supplementary information
Supplementary data are available at Bioinformatics online.
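The sequence recovery and sequence similarity metrics reported in the Results can be pictured with a simplified pairwise calculation. This is a sketch under stated assumptions, not the Shades evaluation protocol: the abstract compares designs with a PFAM family, whereas the code compares one designed sequence with one reference, and the residue grouping in `SIMILAR_GROUPS` is an illustrative choice rather than the similarity measure used in the paper.

```python
# Illustrative grouping of the 20 amino acids by rough physicochemical class
# (an assumption made for this sketch).
SIMILAR_GROUPS = ["AVLIM", "FWY", "ST", "NQ", "DE", "KRH", "C", "G", "P"]
GROUP_OF = {aa: i for i, group in enumerate(SIMILAR_GROUPS) for aa in group}


def _check_lengths(designed, reference):
    if len(designed) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")


def sequence_recovery(designed, reference):
    """Fraction of positions where the designed residue equals the reference."""
    _check_lengths(designed, reference)
    identical = sum(a == b for a, b in zip(designed, reference))
    return identical / len(reference)


def sequence_similarity(designed, reference):
    """Fraction of positions where both residues fall in the same group."""
    _check_lengths(designed, reference)
    similar = sum(GROUP_OF[a] == GROUP_OF[b] for a, b in zip(designed, reference))
    return similar / len(reference)


# Hypothetical toy sequences.
designed, reference = "MKVLDE", "MKILDD"
print(sequence_recovery(designed, reference))
print(sequence_similarity(designed, reference))
```

Similarity is always at least as high as recovery under this definition, since identical residues are trivially in the same group.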
Affiliation(s)
- David Simoncini
  - Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, F Toulouse cedex 04, France
  - Institut de recherche en informatique de Toulouse, IRIT, UMR 5505-CNRS, Université de Toulouse, Cedex 9, France
- Kam Y J Zhang
  - Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa, Japan
- Thomas Schiex
  - Institut de recherche en informatique de Toulouse, UMR 5505-CNRS, Université de Toulouse, Cedex 9, France
- Sophie Barbe
  - Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés, LISBP, Université de Toulouse, CNRS, INRA, INSA, F Toulouse cedex 04, France
11
Abstract
During the last two decades, the pharmaceutical industry has progressed from detecting small molecules to designing biologic-based therapeutics. Amino acid-based drugs are a group of biologic-based therapeutics that can effectively combat diseases caused by drug resistance or molecular deficiency. Computational techniques play a key role in designing and developing amino acid-based therapeutics such as proteins, peptides, and peptidomimetics. This review discusses the various elements of the computational design of amino acid-based therapeutics. Protein design seeks to identify the properties of amino acid sequences that fold to predetermined structures with desirable structural and functional characteristics. Peptide drugs occupy a middle space between proteins and small molecules, and it is hoped that they can target "undruggable" intracellular protein-protein interactions. Peptidomimetics, compounds that mimic the biologic characteristics of peptides, present refined pharmacokinetic properties compared with the original peptides. Here, the techniques developed to characterize the amino acid sequences consistent with a specific structure, and thereby allow protein design, are discussed. Moreover, the key principles and recent advances in recently introduced computational techniques for rational peptide design are highlighted. The most advanced computational techniques developed to design novel peptidomimetics are also summarized.
Affiliation(s)
- Tayebeh Farhadi
  - Chronic Respiratory Diseases Research Center (CRDRC), National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Seyed MohammadReza Hashemian
  - Chronic Respiratory Diseases Research Center (CRDRC), National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
  - Clinical Tuberculosis and Epidemiology Research Center, National Research Institute of Tuberculosis and Lung Disease, Shahid Beheshti University of Medical Sciences, Tehran, Iran
12
Mackenzie CO, Grigoryan G. Protein structural motifs in prediction and design. Curr Opin Struct Biol 2017; 44:161-167. [PMID: 28460216 PMCID: PMC5513761 DOI: 10.1016/j.sbi.2017.03.012] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/08/2016] [Revised: 03/18/2017] [Accepted: 03/28/2017] [Indexed: 01/11/2023]
Abstract
The Protein Data Bank (PDB) has been an integral resource for shaping our fundamental understanding of protein structure and for the advancement of such applications as protein design and structure prediction. Over the years, information from the PDB has been used to generate models ranging from specific structural mechanisms to general statistical potentials. With accumulating structural data, it has become possible to mine for more complete and complex structural observations, deducing more accurate generalizations. Motif libraries, which capture recurring structural features along with their sequence preferences, have exposed modularity in the structural universe and found successful application in various problems of structural biology. Here we summarize recent achievements in this arena, focusing on subdomain level structural patterns and their applications to protein design and structure prediction, and suggest promising future directions as the structural database continues to grow.
Affiliation(s)
- Craig O Mackenzie
  - Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, United States
- Gevorg Grigoryan
  - Institute for Quantitative Biomedical Sciences, Dartmouth College, Hanover, NH 03755, United States
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States
13
Critical Features of Fragment Libraries for Protein Structure Prediction. PLoS One 2017; 12:e0170131. [PMID: 28085928 PMCID: PMC5235372 DOI: 10.1371/journal.pone.0170131] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/29/2016] [Accepted: 12/29/2016] [Indexed: 11/19/2022] Open
Abstract
The use of fragment libraries is a popular approach among protein structure prediction methods and has proven to substantially improve the quality of predicted structures. However, some vital aspects of a fragment library that influence the accuracy of modeling a native structure remain to be determined. This study investigates some of these features. In particular, we analyze the effect of using secondary structure prediction to guide fragment selection, the effect of different fragment sizes, and the effect of structural clustering of fragments within libraries. To have a clearer view of how these factors affect protein structure prediction, we isolated the process of model building by fragment assembly from some common limitations associated with prediction methods, e.g., imprecise energy functions and optimization algorithms, by employing an exact structure-based objective function under a greedy algorithm. Our results indicate that shorter fragments reproduce the native structure more accurately than longer ones. Libraries composed of multiple fragment lengths generate even better structures, with longer fragments proving more useful at the beginning of the simulations. The use of many different fragment sizes shows little improvement when compared with predictions carried out with libraries that comprise only three different fragment sizes. Models obtained from libraries built using only sequence similarity are, on average, better than those built with a secondary structure prediction bias. However, we found that the use of secondary structure prediction allows greater reduction of the search space, which is invaluable for prediction methods. The results of this study can serve as critical guidelines for the use of fragment libraries in protein structure prediction.
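The idea of fragment assembly under an exact structure-based objective and a greedy algorithm can be sketched on a toy model. The code below is an illustration under loud assumptions, not the study's protocol: each residue is reduced to a single torsion-like angle, the objective is the mean angular deviation from the known native angles, and a fragment insertion is accepted only if it lowers that deviation. All names and the toy data are hypothetical.

```python
def angular_error(model, native):
    """Mean absolute angular difference in degrees, wrapping at 360."""
    diffs = [abs((m - n + 180) % 360 - 180) for m, n in zip(model, native)]
    return sum(diffs) / len(diffs)


def greedy_assembly(native, library, start_angle=180.0):
    """Greedy fragment insertion against a known native structure.

    library: list of (position, angles) fragments; each fragment is assumed
    to fit inside the chain (position + len(angles) <= len(native)).
    """
    model = [start_angle] * len(native)   # extended-chain-like starting model
    best = angular_error(model, native)
    improved = True
    while improved:                       # repeat passes until nothing helps
        improved = False
        for pos, angles in library:
            trial = model[:pos] + list(angles) + model[pos + len(angles):]
            err = angular_error(trial, native)
            if err < best:                # accept only strict improvements
                model, best, improved = trial, err, True
    return model, best


# Toy native "structure" and fragment library with mixed fragment lengths.
native = [-60, -60, -60, -120, 130, -120]
library = [
    (0, [-65, -60, -55]),
    (3, [-120, 135, -115]),
    (0, [-120, 130, -120]),
    (2, [-60, -120]),
]
model, error = greedy_assembly(native, library)
print(model, round(error, 2))
```

Because the objective uses the native structure directly, any remaining error reflects the fragment library rather than an imperfect energy function, which is the separation of concerns the abstract describes.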
|
14
|
Abstract
Computational protein design (CPD) has established itself as a leading field in basic and applied science with a strong coupling between the two. Proteins are computationally designed from the level of amino acids to the level of a functional protein complex. Design targets range from increased thermo- (or other) stability to specific requested reactions such as protein-protein binding, enzymatic reactions, or nanotechnology applications. The design scheme may encompass small regions of the proteins or the entire protein. In either case, the design may aim at the side-chains or at the full backbone conformation. Herein, the main framework for the process is outlined highlighting key elements in the CPD iterative cycle. These include the very definition of CPD, the diverse goals of CPD, components of the CPD protocol, methods for searching sequence and structure space, scoring functions, and augmenting the CPD with other optimization tools. Taken together, this chapter aims to introduce the framework of CPD.
Affiliation(s)
- Ilan Samish
- Department of Plants and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel.
- Department of Biotechnology Engineering, Braude Academic College of Engineering, Karmiel, Israel.
- Amai Proteins Ltd., Ashdod, Israel.
|
15
|
Abstract
Computational protein design (CPD), a still-evolving field, includes computer-aided engineering for partial or full de novo designs of proteins of interest. Designs are defined by a requested structure, function, or working environment. This chapter describes the birth and maturation of the field by presenting 101 CPD examples in chronological order, emphasizing achievements and pending challenges. Integrating these aspects presents the plethora of CPD approaches with the hope of providing a "CPD 101". These reflect on the broader structural bioinformatics and computational biophysics field and include: (1) integration of knowledge-based and energy-based methods, (2) hierarchical designated approach towards local, regional, and global motifs and the integration of high- and low-resolution design schemes that fit each such region, (3) systematic differential approaches towards different protein regions, (4) identification of key hot-spot residues and the relative effect of remote regions, (5) assessment of shape-complementarity, electrostatics and solvation effects, (6) integration of thermal plasticity and functional dynamics, (7) negative design, (8) systematic integration of experimental approaches, (9) objective cross-assessment of methods, and (10) successful ranking of potential designs. Future challenges also include dissemination of CPD software for general use by life-sciences researchers and an emphasis on success within an in vivo milieu. CPD increases our understanding of protein structure and function and the relationships between the two, along with the application of such know-how for the benefit of mankind. Applied aspects range from biological drugs, via healthier and tastier food products, to nanotechnology and environmentally friendly enzymes replacing toxic chemicals utilized in industry.
|
16
|
Abstract
Here, we systematically decompose the known protein structural universe into its basic elements, which we dub tertiary structural motifs (TERMs). A TERM is a compact backbone fragment that captures the secondary, tertiary, and quaternary environments around a given residue, comprising one or more disjoint segments (three on average). We seek the set of universal TERMs that capture all structure in the Protein Data Bank (PDB), finding remarkable degeneracy. Only ∼600 TERMs are sufficient to describe 50% of the PDB at sub-Angstrom resolution. However, rarer geometries also exist, and the overall structural coverage grows logarithmically with the number of TERMs. We go on to show that universal TERMs provide an effective mapping between sequence and structure. We demonstrate that TERM-based statistics alone are sufficient to recapitulate close-to-native sequences given either NMR or X-ray backbones. Furthermore, sequence variability predicted from TERM data agrees closely with evolutionary variation. Finally, locations of TERMs in protein chains can be predicted from sequence alone based on sequence signatures emergent from TERM instances in the PDB. For multisegment motifs, this method identifies spatially adjacent fragments that are not contiguous in sequence, a major bottleneck in structure prediction. Although all TERMs recur in diverse proteins, some appear specialized for certain functions, such as interface formation, metal coordination, or even water binding. Structural biology has benefited greatly from previously observed degeneracies in structure. The decomposition of the known structural universe into a finite set of compact TERMs offers exciting opportunities toward better understanding, design, and prediction of protein structure.
|
17
|
Nastri F, Chino M, Maglio O, Bhagi-Damodaran A, Lu Y, Lombardi A. Design and engineering of artificial oxygen-activating metalloenzymes. Chem Soc Rev 2016; 45:5020-54. [PMID: 27341693 PMCID: PMC5021598 DOI: 10.1039/c5cs00923e] [Citation(s) in RCA: 133] [Impact Index Per Article: 16.6] [Indexed: 12/18/2022]
Abstract
Many efforts are being made in the design and engineering of metalloenzymes with catalytic properties fulfilling the needs of practical applications. Progress in this field has recently been accelerated by advances in computational, molecular and structural biology. This review article focuses on the recent examples of oxygen-activating metalloenzymes, developed through the strategies of de novo design, miniaturization processes and protein redesign. Considerable progress in these diverse design approaches has produced many metal-containing biocatalysts able to adopt the functions of native enzymes or even novel functions beyond those found in Nature.
Affiliation(s)
- Flavia Nastri
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- Marco Chino
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- Ornella Maglio
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
- IBB, CNR, Via Mezzocannone 16, 80134 Naples, Italy
- Ambika Bhagi-Damodaran
- Department of Chemistry, University of Illinois at Urbana-Champaign, A322 CLSL, 600 South Mathews Avenue, Urbana, IL 61801
- Yi Lu
- Department of Chemistry, University of Illinois at Urbana-Champaign, A322 CLSL, 600 South Mathews Avenue, Urbana, IL 61801
- Angela Lombardi
- Department of Chemical Sciences, University of Naples “Federico II”, Via Cintia, 80126 Naples, Italy
|
18
|
Watanabe H, Honda S. Adaptive Assembly: Maximizing the Potential of a Given Functional Peptide with a Tailor-Made Protein Scaffold. Chem Biol 2015; 22:1165-73. [PMID: 26299673 DOI: 10.1016/j.chembiol.2015.07.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/19/2015] [Revised: 06/18/2015] [Accepted: 07/05/2015] [Indexed: 01/28/2023]
Abstract
Protein engineering that exploits known functional peptides holds great promise for generating novel functional proteins. Here we propose a combinatorial approach, termed adaptive assembly, which provides a tailor-made protein scaffold for a given functional peptide. A combinatorial library was designed to create a tailor-made scaffold, which was generated from β hairpins derived from a 10-residue minimal protein "chignolin" and randomized amino acid sequences. We applied adaptive assembly to a peptide with low affinity for the Fc region of human immunoglobulin G, generating a 54-residue protein AF.p17 with a 40,600-fold enhanced affinity. The crystal structure of AF.p17 complexed with the Fc region revealed that the scaffold fixed the active conformation with a unique structure composed of a short α helix, β hairpins, and a loop-like structure. Adaptive assembly can take full advantage of known peptides as assets for generating novel functional proteins.
Affiliation(s)
- Hideki Watanabe
- Biomedical Research Institute, the National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1, Higashi, Tsukuba, Ibaraki 305-8566, Japan
- Shinya Honda
- Biomedical Research Institute, the National Institute of Advanced Industrial Science and Technology (AIST), 1-1-1, Higashi, Tsukuba, Ibaraki 305-8566, Japan
|
19
|
Zhou J, Grigoryan G. Rapid search for tertiary fragments reveals protein sequence-structure relationships. Protein Sci 2014; 24:508-24. [PMID: 25420575 DOI: 10.1002/pro.2610] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Received: 09/25/2014] [Accepted: 11/21/2014] [Indexed: 12/31/2022]
Abstract
Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner; so we go on to illustrate the utility of MASTER to protein structural biology. We demonstrate its capacity to rapidly establish structure-sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure.
Affiliation(s)
- Jianfu Zhou
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, 03755
|
20
|
Rysavy SJ, Beck DAC, Daggett V. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction. Protein Sci 2014; 23:1584-95. [PMID: 25142412 DOI: 10.1002/pro.2537] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 03/21/2014] [Revised: 07/30/2014] [Accepted: 08/17/2014] [Indexed: 12/26/2022]
Abstract
Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as those contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ∼25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments.
Affiliation(s)
- Steven J Rysavy
- Division of Biomedical and Health Informatics, University of Washington, Seattle, Washington
|
21
|
Abstract
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular, protein loop classifications and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop's end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop; in other words, the separation between the anchors of a loop does not increase with an increase in the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein, we suggest that its distribution can be described by a random fluctuation model based on the Maxwell-Boltzmann distribution. It is believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
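The two quantities named in this abstract, loop span and normalised span (loop stretch), can be sketched in a few lines. The abstract does not state how the span is normalised, so this sketch assumes division by a fully extended chain length of about 3.8 Å per consecutive Cα pair. That constant and the function names `loop_span` and `loop_stretch` are illustrative assumptions, not the authors' definitions.

```python
import math

# Assumed spacing between consecutive C-alpha atoms, in angstroms.
CA_CA_DISTANCE = 3.8

def loop_span(ca_coords):
    """Distance between the first and last C-alpha atom of a loop."""
    if len(ca_coords) < 2:
        raise ValueError("a loop needs at least two residues")
    return math.dist(ca_coords[0], ca_coords[-1])

def loop_stretch(ca_coords):
    """Span divided by the assumed maximum span of a fully extended loop."""
    max_span = CA_CA_DISTANCE * (len(ca_coords) - 1)
    return loop_span(ca_coords) / max_span

# A fully extended four-residue loop and a contracted one (toy coordinates).
extended = [(3.8 * i, 0.0, 0.0) for i in range(4)]
contracted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(loop_stretch(extended), loop_stretch(contracted))
```

Under this normalisation a value near 1 corresponds to a stretched loop and a value near 0 to a highly contracted one, the category the abstract reports as harder to predict.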
Affiliation(s)
- Yoonjoo Choi
- Department of Computer Science, Dartmouth College, Hanover, NH, USA
|
22
|
Abstract
The observation of a limited secondary-structural alphabet in native proteins, with significant sequence preferences, has profoundly influenced the fields of protein design and structure prediction (Simons, Kooperberg, Huang, & Baker, 1997; Verschueren et al., 2011). In the era of structural genomics, as the size of the structural dataset continues to grow rapidly, it is becoming possible to extend this analysis to tertiary structural motifs and their sequences. For a hypothetical tertiary motif, the rate of its utilization in natural proteins may be used to assess its designability, that is, the ease with which the motif can be realized with natural amino acids. This requires a structural similarity search methodology which, rather than looking for global topological agreement (more appropriate for categorization of full proteins or domains), identifies detailed geometric matches. In this chapter, we introduce such a method, called MaDCaT, and demonstrate its use by assessing the designability landscapes of two tertiary structural motifs. We also show that such analysis can establish structure/sequence links by providing the sequence constraints necessary to encode designable motifs. As a logical extension of their secondary-structure counterparts, tertiary structural preferences will likely prove extremely useful in de novo protein design and structure prediction.
Affiliation(s)
- Jian Zhang
- Department of Computer Science, Dartmouth College, Fax: 603-646-1672, 6211 Sudikoff Lab, Room 210, Hanover, NH 03755-3510, USA
- Gevorg Grigoryan
- Adjunct Professor of Biology, Dartmouth College, Phone: 603-646-3173, Fax: 603-646-1672, 6211 Sudikoff Lab, Room 113, Hanover, NH 03755-3510, USA
|
23
|
Steiner K, Schwab H. Recent advances in rational approaches for enzyme engineering. Comput Struct Biotechnol J 2012; 2:e201209010. [PMID: 24688651 PMCID: PMC3962183 DOI: 10.5936/csbj.201209010] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 08/01/2012] [Revised: 10/16/2012] [Accepted: 10/18/2012] [Indexed: 11/29/2022]
Abstract
Enzymes are an attractive alternative in the asymmetric syntheses of chiral building blocks. To meet the requirements of industrial biotechnology and to introduce new functionalities, the enzymes need to be optimized by protein engineering. This article specifically reviews rational approaches for enzyme engineering and de novo enzyme design involving structure-based approaches developed in recent years for improvement of the enzymes’ performance, broadened substrate range, and creation of novel functionalities to obtain products with high added value for industrial applications.
Affiliation(s)
- Kerstin Steiner
- ACIB GmbH, (Austrian Centre of Industrial Biotechnology), c/o TU Graz, 8010 Graz, Austria
- Helmut Schwab
- ACIB GmbH, (Austrian Centre of Industrial Biotechnology), c/o TU Graz, 8010 Graz, Austria; Institute of Molecular Biotechnology, TU Graz, 8010 Graz, Austria
|
24
|
Kulp DW, Subramaniam S, Donald JE, Hannigan BT, Mueller BK, Grigoryan G, Senes A. Structural informatics, modeling, and design with an open-source Molecular Software Library (MSL). J Comput Chem 2012; 33:1645-61. [PMID: 22565567 PMCID: PMC3432414 DOI: 10.1002/jcc.22968] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 12/19/2011] [Revised: 02/16/2012] [Accepted: 03/02/2012] [Indexed: 01/22/2023]
Abstract
We present the Molecular Software Library (MSL), a C++ library for molecular modeling. MSL is a set of tools that supports a large variety of algorithms for the design, modeling, and analysis of macromolecules. Among the main features supported by the library are methods for applying geometric transformations and alignments, the implementation of a rich set of energy functions, side chain optimization, backbone manipulation, calculation of solvent accessible surface area, and other tools. MSL has a number of unique features, such as the ability of storing alternative atomic coordinates (for modeling) and multiple amino acid identities at the same backbone position (for design). It has a straightforward mechanism for extending its energy functions and can work with any type of molecules. Although the code base is large, MSL was created with ease of developing in mind. It allows the rapid implementation of simple tasks while fully supporting the creation of complex applications. Some of the potentialities of the software are demonstrated here with examples that show how to program complex and essential modeling tasks with few lines of code. MSL is an ongoing and evolving project, with new features and improvements being introduced regularly, but it is mature and suitable for production and has been used in numerous protein modeling and design projects. MSL is open-source software, freely downloadable at http://msl-libraries.org. We propose it as a common platform for the development of new molecular algorithms and to promote the distribution, sharing, and reutilization of computational methods.
Affiliation(s)
- Brett T. Hannigan
- U. of Pennsylvania, Genomics and Computational Biology Graduate Group
|