1
Xu G, Luo Z, Yan Y, Wang Q, Ma J. OPUS-Rota5: A highly accurate protein side-chain modeling method with 3D-Unet and RotaFormer. Structure 2024; 32:1001-1010.e2. [PMID: 38657613 DOI: 10.1016/j.str.2024.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/06/2024] [Accepted: 03/28/2024] [Indexed: 04/26/2024]
Abstract
Accurate protein side-chain modeling is crucial for protein folding and design. This is particularly true for molecular docking as ligands primarily interact with side chains. In this study, we introduce a two-stage side-chain modeling approach called OPUS-Rota5. It leverages a modified 3D-Unet to capture the local environmental features, including ligand information of each residue, and then employs the RotaFormer module to aggregate various types of features. Evaluation on three test sets, including recently released targets from CAMEO and CASP15, shows that OPUS-Rota5 significantly outperforms some other leading side-chain modeling methods. We also employ OPUS-Rota5 to refine the side chains of 25 G protein-coupled receptor targets predicted by AlphaFold2 and achieve a significantly improved success rate in a subsequent "back" docking of their natural ligands. Therefore, OPUS-Rota5 is a useful and effective tool for molecular docking, particularly for targets with relatively accurate predicted backbones but not side chains such as high-homology targets.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Zhenwei Luo
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Yaming Yan
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
- Qinghua Wang
- Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
2
Gogal RA, Nessler AJ, Thiel AC, Bernabe HV, Corrigan Grove RA, Cousineau LM, Litman JM, Miller JM, Qi G, Speranza MJ, Tollefson MR, Fenn TD, Michaelson JJ, Okada O, Piquemal JP, Ponder JW, Shen J, Smith RJH, Yang W, Ren P, Schnieders MJ. Force Field X: A computational microscope to study genetic variation and organic crystals using theory and experiment. J Chem Phys 2024; 161:012501. [PMID: 38958156 PMCID: PMC11223778 DOI: 10.1063/5.0214652] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Accepted: 06/17/2024] [Indexed: 07/04/2024] Open
Abstract
Force Field X (FFX) is an open-source software package for atomic resolution modeling of genetic variants and organic crystals that leverages advanced potential energy functions and experimental data. FFX currently consists of nine modular packages with novel algorithms that include global optimization via a many-body expansion, acid-base chemistry using polarizable constant-pH molecular dynamics, estimation of free energy differences, generalized Kirkwood implicit solvent models, and many more. Applications of FFX focus on the use and development of a crystal structure prediction pipeline, biomolecular structure refinement against experimental datasets, and estimation of the thermodynamic effects of genetic variants on both proteins and nucleic acids. The use of Parallel Java and OpenMM combines to offer shared memory, message passing, and graphics processing unit parallelization for high performance simulations. Overall, the FFX platform serves as a computational microscope to study systems ranging from organic crystals to solvated biomolecular systems.
Affiliation(s)
- Rose A. Gogal
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Aaron J. Nessler
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Andrew C. Thiel
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Hernan V. Bernabe
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Rae A. Corrigan Grove
- Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico 87545, USA
- Leah M. Cousineau
- Department of Biochemistry and Molecular Biology, University of Iowa, Iowa City, Iowa 52242, USA
- Jacob M. Litman
- Department of Biochemistry and Molecular Biology, University of Iowa, Iowa City, Iowa 52242, USA
- Jacob M. Miller
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Guowei Qi
- Department of Biochemistry and Molecular Biology, University of Iowa, Iowa City, Iowa 52242, USA
- Matthew J. Speranza
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Mallory R. Tollefson
- Roy J. Carver Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa 52242, USA
- Timothy D. Fenn
- Analytical Development, LEXEO Therapeutics, New York, New York 10010, USA
- Jacob J. Michaelson
- Department of Psychiatry, University of Iowa Hospitals and Clinics, Iowa City, Iowa 52242, USA
- Okimasa Okada
- Sohyaku Innovative Research Division, Mitsubishi Tanabe Pharma Corporation, 1000 Kamoshida-cho, Aoba-ku, Yokohama, Kanagawa 227-0033, Japan
- Jay W. Ponder
- Department of Chemistry, Washington University in St. Louis, St. Louis, Missouri 63130, USA
- Jana Shen
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201, USA
- Richard J. H. Smith
- Molecular Otolaryngology and Renal Research Laboratories, Department of Otolaryngology, University of Iowa Hospitals and Clinics, Iowa City, Iowa 52242, USA
- Pengyu Ren
- Department of Biomedical Engineering, University of Texas, Austin, Texas 78712, USA
3
Randolph NZ, Kuhlman B. Invariant point message passing for protein side chain packing. Proteins 2024. [PMID: 38790143 DOI: 10.1002/prot.26705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Revised: 04/19/2024] [Accepted: 05/13/2024] [Indexed: 05/26/2024]
Abstract
Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high-confidence and low-energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein-protein interactions, and protein-ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low-energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state-of-the-art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ-angle distribution predictions and geometry-aware invariant point message passing (IPMP). On a test set of ∼1400 high-quality protein chains, PIPPack is highly competitive with other state-of-the-art PSCP methods in rotamer recovery and per-residue RMSD but is significantly faster.
Affiliation(s)
- Nicholas Z Randolph
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
4
Randolph NZ, Kuhlman B. Invariant point message passing for protein side chain packing. bioRxiv 2023:2023.08.03.551328. [PMID: 38187664 PMCID: PMC10769188 DOI: 10.1101/2023.08.03.551328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Protein side chain packing (PSCP) is a fundamental problem in the field of protein engineering, as high-confidence and low-energy conformations of amino acid side chains are crucial for understanding (and designing) protein folding, protein-protein interactions, and protein-ligand interactions. Traditional PSCP methods (such as the Rosetta Packer) often rely on a library of discrete side chain conformations, or rotamers, and a forcefield to guide the structure to low-energy conformations. Recently, deep learning (DL) based methods (such as DLPacker, AttnPacker, and DiffPack) have demonstrated state-of-the-art predictions and speed in the PSCP task. Building off the success of geometric graph neural networks for protein modeling, we present the Protein Invariant Point Packer (PIPPack) which effectively processes local structural and sequence information to produce realistic, idealized side chain coordinates using χ-angle distribution predictions and geometry-aware invariant point message passing (IPMP). On a test set of ~1,400 high-quality protein chains, PIPPack is highly competitive with other state-of-the-art PSCP methods in rotamer recovery and per-residue RMSD but is significantly faster.
Affiliation(s)
- Nicholas Z Randolph
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Brian Kuhlman
- Department of Bioinformatics and Computational Biology, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
- Department of Biochemistry and Biophysics, University of North Carolina School of Medicine, Chapel Hill, North Carolina, USA
5
Yan J, Li S, Zhang Y, Hao A, Zhao Q. ZetaDesign: an end-to-end deep learning method for protein sequence design and side-chain packing. Brief Bioinform 2023; 24:bbad257. [PMID: 37429578 DOI: 10.1093/bib/bbad257] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 03/28/2023] [Revised: 06/05/2023] [Accepted: 06/21/2023] [Indexed: 07/12/2023] Open
Abstract
Computational protein design has emerged in the last few years as one of the most powerful tools for protein design and repacking tasks. In practice, these two tasks are strongly related but often treated separately. Moreover, state-of-the-art deep-learning-based methods cannot provide interpretability from an energy perspective, which affects the accuracy of the design. Here we propose a new systematic approach, comprising both a posterior probability part and a joint probability part, to address these two tasks together. This approach takes the physicochemical properties of amino acids into consideration and uses the joint probability model to ensure convergence between structure and amino acid type. Our results demonstrate that this method can generate feasible, high-confidence sequences with low-energy side-chain conformations. The designed sequences can fold into target structures with high confidence and maintain relatively stable biochemical properties. The side-chain conformations have significantly lower energies without resorting to a rotamer library or performing expensive conformational searches. Overall, we propose an end-to-end method that combines the advantages of both deep learning and energy-based methods. The design results of this model demonstrate high efficiency and precision, as well as a low energy state and good interpretability.
Affiliation(s)
- Junyu Yan
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Shuai Li
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Ying Zhang
- The Key Laboratory of Cell Proliferation and Regulation Biology, Ministry of Education, College of Life Sciences, Beijing Normal University, Beijing, China
- Aimin Hao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
- Qinping Zhao
- State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing, China
6
Grybauskas A, Gražulis S. Building protein structure-specific rotamer libraries. Bioinformatics 2023; 39:btad429. [PMID: 37439702 PMCID: PMC10359632 DOI: 10.1093/bioinformatics/btad429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2022] [Revised: 06/19/2023] [Indexed: 07/14/2023] Open
Abstract
MOTIVATION: Identifying the probable positions of protein side chains is one of the protein modelling steps that can improve the prediction of protein-ligand and protein-protein interactions. Most strategies for predicting side-chain conformations use predetermined dihedral angle lists, also called rotamer libraries, that are usually generated from a subset of high-quality protein structures. Although these methods are fast to apply, they tend to average out geometries instead of taking into account the surrounding atoms and molecules, and they ignore structures not included in the selected subset. Such simplifications can result in inaccuracies when predicting possible side-chain atom positions. RESULTS: We propose an approach that takes into account both of these circumstances by scanning through sterically accessible side-chain conformations and generating dihedral angle libraries specific to the target proteins. The method avoids the drawback of missing conformations for unusual or rare protein structures and successfully suggests potential rotamers with average RMSD closer to the experimentally determined side-chain atom positions than other widely used rotamer libraries. AVAILABILITY AND IMPLEMENTATION: The technique is implemented in the open-source software package rotag, available on GitHub at https://www.github.com/agrybauskas/rotag under the GNU Lesser General Public License.
Affiliation(s)
- Algirdas Grybauskas
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, 7 Saulėtekio Ave, Vilnius, LT-10257, Lithuania
- Saulius Gražulis
- Sector of Crystallography and Cheminformatics, Institute of Biotechnology, Life Sciences Center, Vilnius University, 7 Saulėtekio Ave, Vilnius, LT-10257, Lithuania
7
Amarasinghe PR, Allison L, Stuckey PJ, Garcia de la Banda M, Lesk AM, Konagurthu AS. Getting 'ϕψχal' with proteins: minimum message length inference of joint distributions of backbone and sidechain dihedral angles. Bioinformatics 2023; 39:i357-i367. [PMID: 37387189 DOI: 10.1093/bioinformatics/btad251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023] Open
Abstract
The tendency of an amino acid to adopt certain configurations in folded proteins is treated here as a statistical estimation problem. We model the joint distribution of the observed mainchain and sidechain dihedral angles (〈ϕ,ψ,χ1,χ2,…〉) of any amino acid by a mixture of a product of von Mises probability distributions. This mixture model maps any vector of dihedral angles to a point on a multi-dimensional torus. The continuous space it uses to specify the dihedral angles provides an alternative to the commonly used rotamer libraries. These rotamer libraries discretize the space of dihedral angles into coarse angular bins, and cluster combinations of sidechain dihedral angles (〈χ1,χ2,…〉) as a function of backbone 〈ϕ,ψ〉 conformations. A 'good' model is one that is both concise and explains (compresses) observed data. Competing models can be compared directly, and in particular our model is shown to outperform the Dunbrack rotamer library in terms of model complexity (by three orders of magnitude) and fidelity (on average 20% more compression) when losslessly explaining the observed dihedral angle data across experimental resolutions of structures. Our method is unsupervised (with parameters estimated automatically) and uses information theory to determine the optimal complexity of the statistical model, thus avoiding under/over-fitting, a common pitfall in model selection problems. Our models are computationally inexpensive to sample from and are geared to support a number of downstream studies, ranging from experimental structure refinement to de novo protein design and protein structure prediction. We call our collection of mixture models PhiSiCal (ϕψχal). AVAILABILITY AND IMPLEMENTATION: PhiSiCal mixture models and programs to sample from them are available for download at http://lcb.infotech.monash.edu.au/phisical.
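As a sketch of the model class named in this abstract, a mixture of products of von Mises distributions over a vector of d dihedral angles θ = (ϕ, ψ, χ1, …) can be written in the standard form below. The symbols K (number of components), w_k (mixture weights), μ_kj (mean angles) and κ_kj (concentrations) are notation assumed here for illustration; the paper's own notation and parameterization may differ.

```latex
p(\theta) \;=\; \sum_{k=1}^{K} w_k \prod_{j=1}^{d}
  \frac{\exp\!\bigl(\kappa_{kj}\cos(\theta_j - \mu_{kj})\bigr)}{2\pi\, I_0(\kappa_{kj})},
\qquad w_k \ge 0,\quad \sum_{k=1}^{K} w_k = 1
```

Here I_0 is the modified Bessel function of the first kind of order zero, which normalizes each von Mises factor over one full turn of the angle θ_j.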
Affiliation(s)
- Piyumi R Amarasinghe
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Lloyd Allison
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Peter J Stuckey
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- OPTIMA ARC Industrial Training and Transformation Centre, Carlton, VIC 3053, Australia
- Maria Garcia de la Banda
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- OPTIMA ARC Industrial Training and Transformation Centre, Carlton, VIC 3053, Australia
- Arthur M Lesk
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, United States
- Arun S Konagurthu
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
8
Ding W, Nakai K, Gong H. Protein design via deep learning. Brief Bioinform 2022; 23:bbac102. [PMID: 35348602 PMCID: PMC9116377 DOI: 10.1093/bib/bbac102] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 12/16/2021] [Revised: 02/26/2022] [Accepted: 03/01/2022] [Indexed: 12/11/2022] Open
Abstract
Proteins with desired functions and properties are important in fields like nanotechnology and biomedicine. De novo protein design enables the production of previously unseen proteins from the ground up and is regarded as key to handling real social challenges. The recent introduction of deep learning into design methods has had a transformative influence and is expected to represent a promising and exciting future direction. In this review, we survey the major aspects of current advances in deep-learning-based design procedures and illustrate their novelty in comparison with conventional knowledge-based approaches through notable cases. We not only describe deep learning developments in structure-based protein design and direct sequence design, but also highlight recent applications of deep reinforcement learning in protein design. Future perspectives on design goals, challenges and opportunities are also comprehensively discussed.
Affiliation(s)
- Wenze Ding
- School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
- School of Future Technology, Nanjing University of Information Science and Technology, Nanjing 210044, China
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
- Kenta Nakai
- Institute of Medical Science, the University of Tokyo, Tokyo 1088639, Japan
- Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing 100084, China
- Beijing Advanced Innovation Center for Structural Biology, Tsinghua University, Beijing 100084, China
9
Talluri S. Algorithms for protein design. Adv Protein Chem Struct Biol 2022; 130:1-38. [PMID: 35534105 DOI: 10.1016/bs.apcsb.2022.01.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Computational Protein Design has the potential to contribute to major advances in enzyme technology, vaccine design, receptor-ligand engineering, biomaterials, nanosensors, and synthetic biology. Although Protein Design is a challenging problem, proteins can be designed by experts in Protein Design, as well as by non-experts whose primary interests are in the applications of Protein Design. The increased accessibility of Protein Design technology is attributable to the accumulated knowledge and experience with Protein Design as well as to the availability of software and online resources. The objective of this review is to serve as a guide to the relevant literature with a focus on the novel methods and algorithms that have been developed or applied for Protein Design, and to assist in the selection of algorithms for Protein Design. Novel algorithms and models that have been introduced to utilize the enormous amount of experimental data and novel computational hardware have the potential for producing substantial increases in the accuracy, reliability and range of applications of designed proteins.
Affiliation(s)
- Sekhar Talluri
- Department of Biotechnology, GITAM, Visakhapatnam, India
10
ElGamacy M. Accelerating therapeutic protein design. Adv Protein Chem Struct Biol 2022; 130:85-118. [PMID: 35534117 DOI: 10.1016/bs.apcsb.2022.01.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Protein structures provide defined microenvironments that can support complex pharmacological functions otherwise unachievable by small molecules. The advent of therapeutic proteins has thus greatly broadened the range of manageable disorders. Leveraging the knowledge and recent advances in de novo protein design methods has the prospect of revolutionizing how protein drugs are discovered and developed. This review lays out the main challenges facing therapeutic protein discovery and development, and how present and future advances in protein design can accelerate protein drug pipelines.
Affiliation(s)
- Mohammad ElGamacy
- University Hospital Tübingen, Division of Translational Oncology, Tübingen, Germany; Max Planck Institute for Biology, Tübingen, Germany
11
Malik A, Banerjee A, Pal A, Mitra P. A sequence space search engine for computational protein design to modulate molecular functionality. J Biomol Struct Dyn 2022; 41:2937-2946. [PMID: 35220920 DOI: 10.1080/07391102.2022.2042386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
De novo protein design explores the untapped sequence space that is otherwise less explored during the evolutionary process. This necessitates an efficient sequence space search engine for effective convergence in computational protein design. We propose a greedy simulated annealing-based Monte Carlo parallel search algorithm for better sequence-structure compatibility probing in protein design. The guidance provided by the evolutionary profile, the greedy approach, and the cooling schedule adopted in the Monte Carlo simulation ensure sufficient exploration and exploitation of the search space, leading to faster convergence. On evaluating the proposed algorithm, we find that a dataset of 76 target scaffolds reports an average root-mean-square deviation (RMSD) of 1.07 Å and an average TM-score of 0.93 with the modeled designed protein sequences. The high sequence recapitulation of 48.7% (59.4%) observed in the design sequences for all (hydrophobic) solvent-inaccessible residues again establishes the quality of the proposed algorithm. A high (93.4%) intra-group recapitulation of hydrophobic residues in the solvent-inaccessible region indicates that the proposed protein design algorithm preserves the core residues in the protein and provides alternative residue combinations in the solvent-accessible regions of the target protein. Furthermore, a COFACTOR-based protein functional analysis shows that the design sequences exhibit altered molecular functionality and introduce new molecular functions compared to the target scaffolds. Communicated by Ramaswamy H. Sarma.
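To make the general idea of simulated-annealing Monte Carlo search over sequence space concrete, here is a minimal, generic Python sketch. It is not the authors' algorithm: it omits the evolutionary profile guidance, the greedy component and the parallelism described in the abstract, and the names `toy_energy` and `anneal`, the geometric cooling schedule, and the mismatch-count score are all hypothetical stand-ins chosen for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_energy(seq, target):
    """Hypothetical score: number of positions differing from a reference string.

    A real design run would use a sequence-structure compatibility energy instead.
    """
    return sum(a != b for a, b in zip(seq, target))


def anneal(target, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Plain Metropolis simulated annealing over single-residue mutations."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in target]
    energy = toy_energy(seq, target)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (an assumed schedule).
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = toy_energy(seq, target)
        delta = new_energy - energy
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject the move and restore the residue
    return "".join(best_seq), best_energy
```

Calling `anneal("MKTAYIAKQR", steps=2000, seed=1)` returns the lowest-scoring sequence visited and its score. High temperatures early in the run allow uphill moves (exploration), while low temperatures late in the run make the search nearly greedy (exploitation).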
Affiliation(s)
- Ayush Malik
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Anupam Banerjee
- School of Medical Science and Technology, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Abantika Pal
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
- Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
12
Gisdon FJ, Kynast JP, Ayyildiz M, Hine AV, Plückthun A, Höcker B. Modular peptide binders - development of a predictive technology as alternative for reagent antibodies. Biol Chem 2022; 403:535-543. [PMID: 35089661 DOI: 10.1515/hsz-2021-0384] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/30/2021] [Accepted: 01/11/2022] [Indexed: 11/15/2022]
Abstract
Current biomedical research and diagnostics critically depend on detection agents for specific recognition and quantification of protein molecules. Monoclonal antibodies have been used for this purpose over decades and facilitated numerous biological and biomedical investigations. Recently, however, it has become apparent that many commercial reagent antibodies lack specificity or do not recognize their target at all. Thus, synthetic alternatives are needed whose complex designs are facilitated by multidisciplinary approaches incorporating experimental protein engineering with computational modeling. Here, we review the status of such an engineering endeavor based on the modular armadillo repeat protein scaffold and discuss challenges in its implementation.
Affiliation(s)
- Florian J Gisdon
- Department of Biochemistry, University of Bayreuth, D-95447 Bayreuth, Germany
- Josef P Kynast
- Department of Biochemistry, University of Bayreuth, D-95447 Bayreuth, Germany
- Merve Ayyildiz
- Department of Biochemistry, University of Bayreuth, D-95447 Bayreuth, Germany
- Anna V Hine
- College of Health and Life Sciences, Aston University, Birmingham B4 7ET, UK
- Andreas Plückthun
- Department of Biochemistry, University of Zurich, CH-8057 Zürich, Switzerland
- Birte Höcker
- Department of Biochemistry, University of Bayreuth, D-95447 Bayreuth, Germany
13
Xiao X, Sarma S, Menegatti S, Crook N, Magness ST, Hall CK. In Silico Identification and Experimental Validation of Peptide-Based Inhibitors Targeting Clostridium difficile Toxin A. ACS Chem Biol 2022; 17:118-128. [PMID: 34965093 DOI: 10.1021/acschembio.1c00743] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/05/2023]
Abstract
Clostridium difficile infection is mediated by two major exotoxins: toxins A (TcdA) and B (TcdB). Inhibiting the biocatalytic activities of these toxins with targeted peptide-based drugs can reduce the risk of C. difficile infection. In this work, we used a computational strategy that integrates a peptide binding design (PepBD) algorithm and explicit-solvent atomistic molecular dynamics simulation to determine promising toxin A-targeting peptides that can recognize and bind to the catalytic site of the TcdA glucosyltransferase domain (GTD). Our simulation results revealed that two out of three in silico discovered peptides, viz. the neutralizing peptides A (NPA) and B (NPB), exhibit lower binding free energies when bound to the TcdA GTD than the phage-display discovered peptide, viz. the reference peptide (RP). These peptides may serve as potential inhibitors against C. difficile infection. The efficacy of the peptides RP, NPA, and NPB to neutralize the cytopathic effects of TcdA was tested in vitro in human jejunum cells. Both phage-display peptide RP and in silico peptide NPA were found to exhibit strong toxin-neutralizing properties, thereby preventing the TcdA toxicity. However, the in silico peptide NPB demonstrates a relatively low efficacy against TcdA.
Collapse
Affiliation(s)
- Xingqing Xiao
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Sudeep Sarma
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Stefano Menegatti
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
- Biomanufacturing Training and Education Center (BTEC), North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Nathan Crook
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Scott T Magness
- Department of Medicine, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27514, United States
| | - Carol K Hall
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina 27695, United States
| |
|
14
|
Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
15
|
Zhu J, Avakyan N, Kakkis AA, Hoffnagle AM, Han K, Li Y, Zhang Z, Choi TS, Na Y, Yu CJ, Tezcan FA. Protein Assembly by Design. Chem Rev 2021; 121:13701-13796. [PMID: 34405992 PMCID: PMC9148388 DOI: 10.1021/acs.chemrev.1c00308] [Citation(s) in RCA: 107] [Impact Index Per Article: 35.7] [Indexed: 12/11/2022]
Abstract
Proteins are nature's primary building blocks for the construction of sophisticated molecular machines and dynamic materials, ranging from protein complexes such as photosystem II and nitrogenase that drive biogeochemical cycles to cytoskeletal assemblies and muscle fibers for motion. Such natural systems have inspired extensive efforts in the rational design of artificial protein assemblies in the last two decades. As molecular building blocks, proteins are highly complex, in terms of both their three-dimensional structures and chemical compositions. To enable control over the self-assembly of such complex molecules, scientists have devised many creative strategies by combining tools and principles of experimental and computational biophysics, supramolecular chemistry, inorganic chemistry, materials science, and polymer chemistry, among others. Owing to these innovative strategies, what started as a purely structure-building exercise two decades ago has, in short order, led to artificial protein assemblies with unprecedented structures and functions and protein-based materials with unusual properties. Our goal in this review is to give an overview of this exciting and highly interdisciplinary area of research, first outlining the design strategies and tools that have been devised for controlling protein self-assembly, then describing the diverse structures of artificial protein assemblies, and finally highlighting the emergent properties and functions of these assemblies.
Affiliation(s)
| | | | - Albert A. Kakkis
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Alexander M. Hoffnagle
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Kenneth Han
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Yiying Li
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Zhiyin Zhang
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Tae Su Choi
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Youjeong Na
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - Chung-Jui Yu
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| | - F. Akif Tezcan
- Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093-0340, United States
| |
|
16
|
Hassan M, Coutsias EA. Kinematic Reconstruction of Cyclic Peptides and Protein Backbones from Partial Data. J Chem Inf Model 2021; 61:4975-5000. [PMID: 34570494 PMCID: PMC10129052 DOI: 10.1021/acs.jcim.1c00453] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
We present an algorithm, QBKR (Quaternary Backbone Kinematic Reconstruction), a fast analytical method for an all-atom backbone reconstruction of proteins and linear or cyclic peptide chains from Cα coordinate traces. Unlike previous analytical methods for deriving all-atom representations from coarse-grained models that rely on canonical geometry with planar peptides in the trans conformation, our de novo kinematic model incorporates noncanonical, cis-trans, geometry naturally. Perturbations to this geometry can be effected with ease in our formulation, for example, to account for a continuous change from cis to trans geometry. A simple optimization of a spring-based objective function is employed for Cα-Cα distance variations that extend beyond the cis-trans limit. The kinematic construction produces a linked chain of peptide units, Cα-C-N-Cα, hinged at the Cα atoms spanning all possible planar and nonplanar peptide conformations. We have combined our method with a ring closure algorithm for the case of ring peptides and missing loops in a protein structure. Here, the reconstruction proceeding from both the N and C termini of the protein backbone (or in both directions from a starting position for rings) requires freedom in the position of one Cα atom (a capstone) to achieve a successful loop or ring closure. A salient feature of our reconstruction method is the ability to enrich conformational ensembles to produce alternative feasible conformations in which H-bond forming C-O or N-H pairs in the backbone can reverse orientations, thus addressing a well-known shortcoming in Cα-based RMSD structure comparison, wherein very close structures may lead to significantly different overall H-bond behavior. We apply the fixed Cα-based design to the reverse reconstruction from noisy Cryo-EM data, a posteriori to the optimization. Our method can be applied to speed up the process of an all-atom description from voluminous experimental data or subpar electron density maps.
Affiliation(s)
- Mosavverul Hassan
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York 11794, United States
| | - Evangelos A Coutsias
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, New York 11794, United States; Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York 11794-5252, United States
| |
|
17
|
Saikia B, Gogoi CR, Rahman A, Baruah A. Identification of an optimal foldability criterion to design misfolding resistant protein. J Chem Phys 2021; 155:144102. [PMID: 34654294 DOI: 10.1063/5.0057533] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/28/2022] Open
Abstract
Proteins achieve their functional, active, and operative three-dimensional native structures by overcoming the possibility of being trapped in non-native energy minima present in the energy landscape. The enormous and intricate interactions that play an important role in protein folding also determine the stability of the proteins. The large number of stabilizing/destabilizing interactions makes proteins only marginally stable compared with other competing structures. Therefore, they may become trapped in non-native conformations and thus misfold. These misfolded proteins lead to several debilitating diseases. This work performs a comparative study of some existing foldability criteria in the computational design of misfolding-resistant protein sequences based on self-consistent mean field theory. The foldability criteria selected for this study, Ef, Δ, and Φ, are commonly used in protein design procedures; they are compared here to determine the most efficient foldability criterion for the design of misfolding-resistant proteins. The results suggest that the foldability criterion Δ is significantly better in designing a funnel energy landscape stabilizing the target state. The results also suggest that inclusion of negative design features is important for designing misfolding-resistant proteins, but more information about the non-native conformations in terms of Φ leads to worse results compared to even simple positive design. The sequences designed using Δ show better resistance to misfolding in the Monte Carlo simulations performed in the study.
Affiliation(s)
- Bondeepa Saikia
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| | - Chimi Rekha Gogoi
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| | - Aziza Rahman
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| | - Anupaul Baruah
- Department of Chemistry, Dibrugarh University, Dibrugarh 786004, India
| |
|
18
|
Pal A, Mulumudy R, Mitra P. Modularity-based parallel protein design algorithm with an implementation using shared memory programming. Proteins 2021; 90:658-669. [PMID: 34651333 DOI: 10.1002/prot.26263] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/17/2021] [Revised: 09/23/2021] [Accepted: 10/01/2021] [Indexed: 01/08/2023]
Abstract
Given a target protein structure, the prime objective of protein design is to find amino acid sequences that will fold into the given three-dimensional structure. The protein design problem belongs to the non-deterministic polynomial-time-hard (NP-hard) class, as the sequence search space increases exponentially with protein length. To ensure better search space exploration and faster convergence, we propose a protein modularity-based parallel protein design algorithm. The modular architecture of the protein structure is exploited by considering an intermediate structural organization between secondary structure and domain, defined as the protein unit (PU). Here, we have incorporated a divide-and-conquer approach in which a protein is split into PUs and each PU region is explored in parallel. Further analysis shows that our shared memory implementation of the modularity-based parallel sequence search leads to better search space exploration than traditional full protein design. Sequence-based analysis of the designed sequences shows an average of 39.7% sequence similarity on the benchmark data set. Structure-based comparison of the modeled structures of the designed proteins with the target structures gave an average root-mean-square deviation of 1.17 Å and an average template modeling score of 0.89. The selected modeled structures of the designed protein sequences were validated using 100 ns molecular dynamics simulations, in which 80% of the proteins showed better or similar stability relative to the respective target proteins. Our study suggests that the modularity-based protein design algorithm can be extended to protein interaction design as well.
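The root-mean-square deviation reported in this abstract can be illustrated with a minimal sketch. It assumes the two structures are already superimposed and that atoms are matched one-to-one (for example, Cα atoms); the function name `rmsd` and the coordinate layout are illustrative and are not taken from the paper, whose own superposition and modeling pipeline is not described here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched lists of (x, y, z) points.

    Assumption: the structures are already optimally superimposed; this
    sketch does not perform any alignment.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")

    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Two atoms, each displaced by exactly 1 Å along z.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(target, model))
```

In practice an optimal superposition (for instance with the Kabsch algorithm) would normally precede this calculation; that step is omitted here to keep the sketch short.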
Affiliation(s)
- Abantika Pal
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
| | - Rohith Mulumudy
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
| | - Pralay Mitra
- Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, West Bengal, India
| |
|
19
|
Rabinovitch E, Mihara K, Sananes A, Zaretsky M, Heyne M, Shifman J, Aharoni A, Hollenberg MD, Papo N. A KLK4 proteinase substrate capture approach to antagonize PAR1. Sci Rep 2021; 11:16170. [PMID: 34373558 PMCID: PMC8352894 DOI: 10.1038/s41598-021-95666-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2021] [Accepted: 07/29/2021] [Indexed: 11/08/2022] Open
Abstract
Proteinase-activated receptor-1 (PAR1), triggered by thrombin and other serine proteinases such as tissue kallikrein-4 (KLK4), is a key driver of inflammation, tumor invasiveness and tumor metastasis. The PAR1 transmembrane G-protein-coupled receptor therefore represents an attractive target for therapeutic inhibitors. We thus used a computational design to develop a new PAR1 antagonist, namely, a catalytically inactive human KLK4 that acts as a proteinase substrate-capture reagent, preventing receptor cleavage (and hence activation) by binding to and occluding the extracellular R41-S42 canonical PAR1 proteolytic activation site. On the basis of in silico site-saturation mutagenesis, we then generated KLK4S207A,L185D, a first-of-a-kind 'decoy' PAR1 inhibitor, by introducing the S207A and L185D mutations into wild-type KLK4, which strongly binds to PAR1. KLK4S207A,L185D markedly inhibited PAR1 cleavage and PAR1-mediated MAPK/ERK activation, as well as the migration and invasiveness of melanoma cells. This 'substrate-capturing' KLK4 variant, engineered to bind to PAR1, illustrates proof of principle for the utility of a KLK4 'proteinase substrate capture' approach to regulate proteinase-mediated PAR1 signaling.
Affiliation(s)
- Eitan Rabinovitch
- Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, P.O.B. 653, 84105, Beer-Sheva, Israel
| | - Koishiro Mihara
- Department of Physiology and Pharmacology, Cumming School of Medicine, University of Calgary, Calgary, Canada
| | - Amiram Sananes
- Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, P.O.B. 653, 84105, Beer-Sheva, Israel
| | - Marianna Zaretsky
- Department of Life Sciences, National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Michael Heyne
- Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, P.O.B. 653, 84105, Beer-Sheva, Israel
- Department of Biological Chemistry, The Hebrew University of Jerusalem, Givat Ram Campus, 91906, Jerusalem, Israel
| | - Julia Shifman
- Department of Biological Chemistry, The Hebrew University of Jerusalem, Givat Ram Campus, 91906, Jerusalem, Israel
| | - Amir Aharoni
- Department of Life Sciences, National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, Beer-Sheva, Israel
| | - Morley D Hollenberg
- Department of Physiology and Pharmacology, Cumming School of Medicine, University of Calgary, Calgary, Canada
| | - Niv Papo
- Avram and Stella Goldstein-Goren Department of Biotechnology Engineering, National Institute of Biotechnology in the Negev, Ben-Gurion University of the Negev, P.O.B. 653, 84105, Beer-Sheva, Israel.
| |
|
20
|
Pujari N, Saundh SL, Acquah FA, Mooers BHM, Ferré-D’Amaré AR, Leung AKW. Engineering Crystal Packing in RNA Structures I: Past and Future Strategies for Engineering RNA Packing in Crystals. Crystals 2021; 11:952. [PMID: 34745656 PMCID: PMC8570644 DOI: 10.3390/cryst11080952] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 11/16/2022]
Abstract
X-ray crystallography remains a powerful method to gain atomistic insights into the catalytic and regulatory functions of RNA molecules. However, the technique requires the preparation of diffraction-quality crystals. This is often a resource- and time-consuming venture because RNA crystallization is hindered by the conformational heterogeneity of RNA, as well as the limited opportunities for stereospecific intermolecular interactions between RNA molecules. The limited success at crystallization explains in part the smaller number of RNA-only structures in the Protein Data Bank. Several approaches have been developed to aid the formation of well-ordered RNA crystals. The majority of these are construct-engineering techniques that aim to introduce crystal contacts to favor the formation of well-diffracting crystals. A typical example is the insertion of tetraloop-tetraloop receptor pairs into non-essential RNA segments to promote intermolecular association. Other methods of promoting crystallization involve chaperones and crystallization-friendly molecules that increase RNA stability and improve crystal packing. In this review, we discuss the various techniques that have been successfully used to facilitate crystal packing of RNA molecules, recent advances in construct engineering, and directions for future research in this vital aspect of RNA crystallography.
Affiliation(s)
- Narsimha Pujari
- Department of Veterinary Biomedical Sciences, University of Saskatchewan, Saskatoon, SK S7N 5B4, Canada
| | - Stephanie L. Saundh
- Department of Veterinary Biomedical Sciences, University of Saskatchewan, Saskatoon, SK S7N 5B4, Canada
| | - Francis A. Acquah
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
| | - Blaine H. M. Mooers
- Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
- Stephenson Cancer Center, University of Oklahoma Health Sciences Center, Oklahoma City, OK 73104, USA
| | - Adrian R. Ferré-D’Amaré
- Biochemistry and Biophysics Center, National Heart, Lung and Blood Institute, Bethesda, MD 20892, USA
| | - Adelaine Kwun-Wai Leung
- Department of Veterinary Biomedical Sciences, University of Saskatchewan, Saskatoon, SK S7N 5B4, Canada
| |
|
21
|
Pereira JM, Vieira M, Santos SM. Step-by-step design of proteins for small molecule interaction: A review on recent milestones. Protein Sci 2021; 30:1502-1520. [PMID: 33934427 PMCID: PMC8284594 DOI: 10.1002/pro.4098] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/25/2021] [Revised: 04/21/2021] [Accepted: 04/23/2021] [Indexed: 01/01/2023]
Abstract
Protein design is the field of synthetic biology that aims at developing de novo custom-made proteins and peptides for specific applications. Although this is an ambitious goal, recent computational advances in both hardware and software technologies have paved the way to high-throughput screening and detailed design of novel folds and improved functionalities. Modern advances in the field of protein design for small molecule targeting are described in this review, organized in a step-by-step fashion: from the conception of a new or upgraded active binding site, to scaffold design, sequence optimization, and experimental expression of the custom protein. In each step, contemporary examples are described, and state-of-the-art software is briefly explored.
Affiliation(s)
- José M. Pereira
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
| | - Maria Vieira
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
| | - Sérgio M. Santos
- CICECO & Departamento de Química, Universidade de Aveiro, Aveiro, Portugal
| |
|
22
|
Roda S, Robles-Martín A, Xiang R, Kazemi M, Guallar V. Structural-Based Modeling in Protein Engineering. A Must Do. J Phys Chem B 2021; 125:6491-6500. [PMID: 34106727 DOI: 10.1021/acs.jpcb.1c02545] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 12/18/2022]
Abstract
Biotechnological solutions will be a key aspect in our immediate future society, where optimized enzymatic processes through enzyme engineering might be an important solution for waste transformation, clean energy production, biodegradable materials, and green chemistry, for example. Here we advocate the importance of structural-based bioinformatics and molecular modeling tools in such developments. We summarize our recent experiences indicating a great prediction/success ratio, and we suggest that an early in silico phase should be performed in enzyme engineering studies. Moreover, we demonstrate the potential of a new technique combining Rosetta and PELE, which could provide a faster and more automated procedure, an essential aspect for a broader use.
Affiliation(s)
- Sergi Roda
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | | | - Ruite Xiang
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | - Masoud Kazemi
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain
| | - Victor Guallar
- Barcelona Supercomputing Center (BSC), Barcelona 08034, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona 08010, Spain
| |
|
23
|
Banerjee A, Pal K, Mitra P. An Evolutionary Profile Guided Greedy Parallel Replica-Exchange Monte Carlo Search Algorithm for Rapid Convergence in Protein Design. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 18:489-499. [PMID: 31329126 DOI: 10.1109/tcbb.2019.2928809] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]
Abstract
Protein design, also known as the inverse protein folding problem, is the identification of a protein sequence that folds into a target protein structure. Protein design has been proven to be an NP-hard problem. While researchers are working on designing heuristics with an emphasis on new scoring functions, we propose a replica-exchange Monte Carlo (REMC) search algorithm that ensures faster convergence using a greedy strategy. Using biological insights, we construct an evolutionary profile to encode the amino acid variability in different positions of the target protein from its structural homologs. The evolutionary profile guides the REMC search, and the greedy approach confirms appreciable exploration and exploitation of the sequence-structure fitness surface. We allow termination of a simulation trajectory once stagnation is detected. A series of sequence- and structure-level validations establish the quality of our designs. On a benchmark dataset, our algorithm reports an average root-mean-square deviation of 1.21 Å between the target and the designed proteins when modeled with existing protein folding software. In addition, our algorithm achieves a 6.16-fold overall speedup. In molecular dynamics simulations, we observe that four of the five selected designed proteins show better or comparable stability relative to the corresponding target proteins.
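For readers unfamiliar with replica exchange, the exchange step it is built on can be sketched as follows. This is the standard Metropolis swap criterion between two replicas at different temperatures, not the authors' profile-guided greedy variant, whose details are not given in the abstract; the function name `swap_probability`, its parameters, and the unit Boltzmann constant are illustrative assumptions.

```python
import math


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard replica-exchange acceptance probability for swapping the
    configurations of replica i (at temp_i) and replica j (at temp_j).

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (k_b * T).
    Assumption: energies and k_b are expressed in consistent units.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0.0 else math.exp(delta)


# The cold replica (T=1) holds the higher-energy state: the swap is always accepted.
print(swap_probability(-5.0, -10.0, temp_i=1.0, temp_j=2.0))
# The cold replica already holds the lower-energy state: the swap is accepted only rarely.
print(swap_probability(-10.0, -5.0, temp_i=1.0, temp_j=2.0))
```

A sampler would draw a uniform random number and perform the swap when it falls below this probability, so that low-energy sequences tend to migrate toward the coldest replica.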
|
24
|
Pan X, Kortemme T. Recent advances in de novo protein design: Principles, methods, and applications. J Biol Chem 2021; 296:100558. [PMID: 33744284 PMCID: PMC8065224 DOI: 10.1016/j.jbc.2021.100558] [Citation(s) in RCA: 93] [Impact Index Per Article: 31.0] [Received: 01/13/2021] [Revised: 03/12/2021] [Accepted: 03/16/2021] [Indexed: 02/06/2023] Open
Abstract
Computational de novo protein design is increasingly applied to address a number of key challenges in biomedicine and biological engineering. Successes in expanding applications are driven by advances in design principles and methods over several decades. Here, we review recent innovations in major aspects of de novo protein design and describe how these advances were informed by principles of protein architecture and interactions derived from the wealth of structures in the Protein Data Bank. We describe developments in de novo generation of designable backbone structures, optimization of sequences, design scoring functions, and the design of function. The advances not only highlight design goals reachable now but also point to the challenges and opportunities for the future of the field.
Affiliation(s)
- Xingjie Pan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA.
| | - Tanja Kortemme
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, USA; UC Berkeley - UCSF Graduate Program in Bioengineering, University of California San Francisco, San Francisco, California, USA; Quantitative Biosciences Institute (QBI), University of California San Francisco, San Francisco, California, USA.
| |
|
25
|
Xu G, Wang Q, Ma J. OPUS-Rota3: Improving Protein Side-Chain Modeling by Deep Neural Networks and Ensemble Methods. J Chem Inf Model 2020; 60:6691-6697. [PMID: 33211480 DOI: 10.1021/acs.jcim.0c00951] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
Side-chain modeling is critical for protein structure prediction since the uniqueness of the protein structure is largely determined by its side-chain packing conformation. In this paper, differing from most approaches that rely on rotamer library sampling, we first propose a novel side-chain rotamer prediction method based on deep neural networks, named OPUS-RotaNN. Then, on the basis of our previous work OPUS-Rota2, we propose an open-source side-chain modeling framework, OPUS-Rota3, which integrates the results of different methods into its rotamer library as the sampling candidates. By including OPUS-RotaNN into OPUS-Rota3, we conduct our experiments on three native backbone test sets and one non-native backbone test set. On the native backbone test set, CAMEO-Hard61 for example, OPUS-Rota3 successfully predicts 51.14% of all side-chain dihedral angles with a tolerance criterion of 20° and outperforms OSCAR-star (50.87%), SCWRL4 (50.40%), and FASPR (49.85%). On the non-native backbone test set DB379-ITASSER, the accuracy of OPUS-Rota3 is 52.49%, better than OSCAR-star (48.95%), FASPR (48.69%), and SCWRL4 (48.29%). All the source code, including the training code and the data we used, is available at https://github.com/thuxugang/opus_rota3.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
| | - Qinghua Wang
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
| | - Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States; Department of Bioengineering, Rice University, Houston, Texas 77005, United States
| |
|
26
|
Huang X, Pearce R, Zhang Y. FASPR: an open-source tool for fast and accurate protein side-chain packing. Bioinformatics 2020; 36:3758-3765. [PMID: 32259206 DOI: 10.1093/bioinformatics/btaa234] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 01/11/2020] [Revised: 03/30/2020] [Accepted: 04/01/2020] [Indexed: 01/04/2023] Open
Abstract
MOTIVATION: Protein structure and function are essentially determined by how the side-chain atoms interact with each other. Thus, accurate protein side-chain packing (PSCP) is a critical step toward protein structure prediction and protein design. Despite the importance of the problem, however, the accuracy and speed of current PSCP programs are still not satisfactory. RESULTS: We present FASPR for fast and accurate PSCP by using an optimized scoring function in combination with a deterministic searching algorithm. The performance of FASPR was compared with four state-of-the-art PSCP methods (CISRR, RASP, SCATD and SCWRL4) on both native and non-native protein backbones. For the assessment on native backbones, FASPR achieved a good performance by correctly predicting 69.1% of all the side-chain dihedral angles using a stringent tolerance criterion of 20°, comparing favorably with SCWRL4, CISRR, RASP and SCATD, which successfully predicted 68.8%, 68.6%, 67.8% and 61.7%, respectively. Additionally, FASPR achieved the highest speed, packing the 379 test protein structures in only 34.3 s, which was significantly faster than the control methods. For the assessment on non-native backbones, FASPR showed an equivalent or better performance on I-TASSER predicted backbones and the backbones perturbed from experimental structures. Detailed analyses showed that the major advantage of FASPR lies in the optimal combination of the dead-end elimination and tree decomposition with a well optimized scoring function, which makes FASPR of practical use for both protein structure modeling and protein design studies. AVAILABILITY AND IMPLEMENTATION: The web server, source code and datasets are freely available at https://zhanglab.ccmb.med.umich.edu/FASPR and https://github.com/tommyhuangthu/FASPR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
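The accuracy figures quoted in this abstract are fractions of side-chain dihedral (χ) angles predicted within a 20° tolerance of the native value. A minimal sketch of such a metric follows, assuming angles are given in degrees and compared on a 360° circle; the function names and the inclusive `<=` comparison are illustrative assumptions, and the sketch ignores the 180° symmetry of some terminal χ angles (for example in Asp, Phe and Tyr) that a real benchmark would likely handle.

```python
def angular_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff)


def chi_accuracy(predicted, native, tolerance=20.0):
    """Fraction of predicted dihedral angles within `tolerance` degrees of native.

    `predicted` and `native` are matched lists of chi angles in degrees.
    """
    if len(predicted) != len(native):
        raise ValueError("angle lists must have the same length")
    if not predicted:
        raise ValueError("angle lists must not be empty")

    hits = sum(
        1 for p, n in zip(predicted, native)
        if angular_difference(p, n) <= tolerance
    )
    return hits / len(predicted)


# 175° vs -175° differ by only 10° once periodicity is taken into account.
predicted = [175.0, -60.0, 60.0]
native = [-175.0, -65.0, 120.0]
print(chi_accuracy(predicted, native))
```

The wrap-around in `angular_difference` matters: a naive subtraction would count 175° versus -175° as a 350° error rather than a 10° one.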
Affiliation(s)
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
|
27
|
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. Patterns (New York, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
|
28
|
Vrancken JPM, Tame JRH, Voet ARD. Development and applications of artificial symmetrical proteins. Comput Struct Biotechnol J 2020; 18:3959-3968. [PMID: 33335692 PMCID: PMC7734218 DOI: 10.1016/j.csbj.2020.10.040] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 08/24/2020] [Revised: 10/27/2020] [Accepted: 10/31/2020] [Indexed: 12/28/2022]
Abstract
Since the determination of the first molecular models of proteins there has been interest in creating proteins artificially, but such methods have only become widely successful in the last decade. Gradual improvements over a long period of time have now yielded numerous examples of non-natural proteins, many of which are built from repeated elements. In this review we discuss the design of such symmetrical proteins and their various applications in chemistry and medicine.
Affiliation(s)
- Jeroen P M Vrancken
- Laboratory of Biomolecular Modelling and Design, Department of Chemistry, KU Leuven, Celestijnenlaan 200G, 3001 Leuven, Belgium
| | - Jeremy R H Tame
- Graduate School of Medical Life Science, Yokohama City University, 1-7-29 Suehiro, Yokohama, Kanagawa 230-0045, Japan
| | - Arnout R D Voet
- Laboratory of Biomolecular Modelling and Design, Department of Chemistry, KU Leuven, Celestijnenlaan 200G, 3001 Leuven, Belgium
| |
|
29
|
Kabra R, Singh S. Evolutionary artificial intelligence based peptide discoveries for effective Covid-19 therapeutics. Biochim Biophys Acta Mol Basis Dis 2020; 1867:165978. [PMID: 32980462 PMCID: PMC7832815 DOI: 10.1016/j.bbadis.2020.165978] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 07/09/2020] [Revised: 09/10/2020] [Accepted: 09/21/2020] [Indexed: 12/20/2022]
Abstract
An epidemic caused by COVID-19 in China turned into a pandemic within a short time, affecting countries worldwide. Researchers and companies around the world are working on all possible strategies to develop a curative or preventive treatment, including vaccine development, drug repurposing, plasma therapy, and drug discovery based on artificial intelligence. Therapeutic approaches based on computational biology and machine-learning algorithms receive special consideration because they could provide a fast and accurate outcome in the present scenario. As an effort towards developing possible therapeutics for COVID-19, we have used machine-learning algorithms to generate alignment kernels from diverse viral sequences of Covid-19 reported from India, China, Italy and the USA. Using these diverse sequences, we identified conserved motifs and subsequently designed a peptide library against them. Of these, 4 peptides showed strong binding affinity for the main protease of SARS-CoV-2 (Mpro) and maintained their stability and specificity under physiological conditions, as observed through MD simulations. Our data suggest that these evolutionary peptides against COVID-19, if found effective, may provide cross-protection against diverse Covid-19 variants.
Affiliation(s)
- Ritika Kabra
- National Centre for Cell Science, NCCS Complex, Ganeshkhind, SP Pune University Campus, Pune 411007, India
| | - Shailza Singh
- National Centre for Cell Science, NCCS Complex, Ganeshkhind, SP Pune University Campus, Pune 411007, India.
| |
|
30
|
Surpeta B, Sequeiros-Borja CE, Brezovsky J. Dynamics, a Powerful Component of Current and Future in Silico Approaches for Protein Design and Engineering. Int J Mol Sci 2020; 21:E2713. [PMID: 32295283 PMCID: PMC7215530 DOI: 10.3390/ijms21082713] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/18/2020] [Revised: 04/10/2020] [Accepted: 04/12/2020] [Indexed: 12/13/2022]
Abstract
Computational prediction has become an indispensable aid in the processes of engineering and designing proteins for various biotechnological applications. With the tremendous progress in more powerful computer hardware and more efficient algorithms, some in silico tools and methods have started to apply a more realistic description of proteins as conformational ensembles, making protein dynamics an integral part of their prediction workflows. To help protein engineers harness the benefits of considering dynamics in their designs, we surveyed new tools developed for analyses of conformational ensembles in order to select engineering hotspots and design mutations. Next, we discussed the collective evolution towards more flexible protein design methods, including ensemble-based approaches, knowledge-assisted methods, and provable algorithms. Finally, we highlighted apparent challenges that current approaches are facing and provided our perspectives on their further development.
Affiliation(s)
- Bartłomiej Surpeta
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland
- International Institute of Molecular and Cell Biology in Warsaw, Ks Trojdena 4, 02-109 Warsaw, Poland
| | - Carlos Eduardo Sequeiros-Borja
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland
- International Institute of Molecular and Cell Biology in Warsaw, Ks Trojdena 4, 02-109 Warsaw, Poland
| | - Jan Brezovsky
- Laboratory of Biomolecular Interactions and Transport, Department of Gene Expression, Institute of Molecular Biology and Biotechnology, Faculty of Biology, Adam Mickiewicz University, Uniwersytetu Poznanskiego 6, 61-614 Poznan, Poland
- International Institute of Molecular and Cell Biology in Warsaw, Ks Trojdena 4, 02-109 Warsaw, Poland
| |
|
31
|
Abstract
Proteins are molecular machines whose function depends on their ability to achieve complex folds with precisely defined structural and dynamic properties. The rational design of proteins from first-principles, or de novo, was once considered to be impossible, but today proteins with a variety of folds and functions have been realized. We review the evolution of the field from its earliest days, placing particular emphasis on how this endeavor has illuminated our understanding of the principles underlying the folding and function of natural proteins, and is informing the design of macromolecules with unprecedented structures and properties. An initial set of milestones in de novo protein design focused on the construction of sequences that folded in water and membranes to adopt folded conformations. The first proteins were designed from first-principles using very simple physical models. As computers became more powerful, the use of the rotamer approximation allowed one to discover amino acid sequences that stabilize the desired fold. As the crystallographic database of protein structures expanded in subsequent years, it became possible to construct proteins by assembling short backbone fragments that frequently recur in Nature. The second set of milestones in de novo design involves the discovery of complex functions. Proteins have been designed to bind a variety of metals, porphyrins, and other cofactors. The design of proteins that catalyze hydrolysis and oxygen-dependent reactions has progressed significantly. However, de novo design of catalysts for energetically demanding reactions, or even of proteins that bind with high affinity and specificity to highly functionalized complex polar molecules, remains an important challenge that is now being met. Finally, protein design has contributed significantly to our understanding of membrane protein folding and transport of ions across membranes. The design of membrane proteins, or more generally of biomimetic polymers that function in mixed or non-aqueous environments, is now becoming increasingly possible.
|
32
|
Huang X, Pearce R, Zhang Y. Toward the Accuracy and Speed of Protein Side-Chain Packing: A Systematic Study on Rotamer Libraries. J Chem Inf Model 2019; 60:410-420. [PMID: 31851497 DOI: 10.1021/acs.jcim.9b00812] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/29/2022]
Abstract
Protein rotamers refer to the conformational isomers taken by the side-chains of amino acids to accommodate specific structural folding environments. Since accurate modeling of atomic interactions is difficult, rotamer information collected from experimentally solved protein structures is often used to guide side-chain packing in protein folding and sequence design studies. Many rotamer libraries have been built in the literature but there is little quantitative guidance on which libraries should be chosen for different structural modeling studies. Here, we performed a comparative study of six widely used rotamer libraries and systematically examined their suitability for protein folding and sequence design in four aspects: (1) side-chain match accuracy, (2) side-chain conformation prediction, (3) de novo protein sequence design, and (4) computational time cost. We demonstrated that, compared to the backbone-dependent rotamer libraries (BBDRLs), the backbone-independent rotamer libraries (BBIRLs) generated conformations that more closely matched the native conformations due to the larger number of rotamers in the local rotamer search spaces. However, more practically, using an optimized physical energy function incorporated into a simulated annealing Monte Carlo searching scheme, we showed that utilization of the BBDRLs could result in higher accuracies in side-chain prediction and higher sequence recapitulation rates in protein design experiments. Detailed data analyses showed that the major advantage of BBDRLs lies in the energy term derived from the rotamer probabilities that are associated with the individual backbone torsion angle subspaces. This term is important for distinguishing between amino acid identities as well as the rotamer conformations of an amino acid. Meanwhile, the backbone torsion angle subspace-specific rotamer search drastically speeds up the searching time, despite the significantly larger number of total rotamers in the BBDRLs. These results should provide important guidance for the development and selection of rotamer libraries for practical protein design and structure prediction studies.
|
33
|
Chowdhury R, Maranas CD. From directed evolution to computational enzyme engineering—A review. AIChE J 2019. [DOI: 10.1002/aic.16847] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Indexed: 12/13/2022]
Affiliation(s)
- Ratul Chowdhury
- Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania
| | - Costas D. Maranas
- Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania
| |
|
34
|
Hallen MA, Donald BR. Protein Design by Provable Algorithms. Communications of the ACM 2019; 62:76-84. [PMID: 31607753 PMCID: PMC6788629 DOI: 10.1145/3338124] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 06/10/2023]
Abstract
Protein design algorithms can leverage provable guarantees of accuracy to provide new insights and unique optimized molecules.
Affiliation(s)
- Mark A. Hallen
- Research assistant professor at the Toyota Technological Institute at Chicago, IL, USA
| | - Bruce R. Donald
- James B. Duke Professor of Computer Science at Duke University, as well as a professor of chemistry and biochemistry in the Duke University Medical Center, Durham, NC, USA
| |
|
35
|
Cheung NJ, Yu W. Sibe: a computation tool to apply protein sequence statistics to predict folding and design in silico. BMC Bioinformatics 2019; 20:455. [PMID: 31492097 PMCID: PMC6728967 DOI: 10.1186/s12859-019-2984-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/17/2019] [Accepted: 07/03/2019] [Indexed: 02/01/2023]
Abstract
BACKGROUND Evolutionary information contained in the amino acid sequences of proteins specifies the biological function and fold, but exactly what information contained in the protein sequence drives both of these processes? Considerable progress has been made to answer this fundamental question, but it remains challenging to explore the potential space of cooperative interactions between amino acids. Statistical analysis plays a significant role in studying such interactions and its use has expanded in recent years to studies ranging from coevolution-guided rational protein design to protein folding in silico. RESULTS Here we describe a computational tool named Sibe for use in studies of protein sequence, folding, and design using evolutionary coupling between amino acids as a driving factor. In this study, Sibe is used to identify positionally conserved couplings between pairs of amino acids and to aid rational protein design. In this process, pairwise couplings are filtered according to the relative entropy computed from the positional conservations and grouped into several 'blocks', which could contribute to driving protein folding and design. A human β2-adrenergic receptor (β2AR) was used to demonstrate that those 'blocks' contribute to the rational design by specifying functional residues. Sibe also provides folding modules based on both the positionally conserved couplings and well-established statistical potentials for simulating protein folding in silico and predicting tertiary structure. Our results show that statistical inferences of basic evolutionary principles, such as conservations and coupled mutations, can be used to rapidly design a diverse set of proteins and study protein folding. CONCLUSIONS The developed software Sibe provides a computational tool for systematic analysis from a protein's primary structure to its tertiary structure using the evolutionary couplings as a driving factor. Sibe, written in C++, accounts for compatibility with the 'big data' era in biological science. It primarily focuses on protein sequence analysis, but it can also be extended to other modeling and predictions of experimental measurements.
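The relative entropy of positional conservation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Sibe's implementation (Sibe is written in C++ and its exact scoring is not given here); the function names, the uniform background distribution, the pseudocount, and the threshold are all assumptions chosen for illustration. The sketch scores each alignment column by the Kullback-Leibler divergence of its residue frequencies from a background and keeps the columns above a cutoff.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_relative_entropy(column, background=None, pseudocount=1.0):
    """Relative entropy (bits) of one alignment column against a background.

    Assumption: a uniform background and an additive pseudocount; a real
    tool may use empirical background frequencies and sequence weighting.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    # Count only standard residues; gaps and unknown symbols are ignored.
    counts = Counter(c for c in column if c in background)
    total = sum(counts.values()) + pseudocount * len(background)
    score = 0.0
    for aa, q in background.items():
        p = (counts[aa] + pseudocount) / total
        score += p * math.log2(p / q)
    return score


def conserved_positions(alignment, threshold):
    """Indices of columns whose relative entropy reaches the threshold."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return [i for i, col in enumerate(columns)
            if column_relative_entropy(col) >= threshold]


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACF", "AGK"]  # hypothetical sequences
    print(conserved_positions(toy_alignment, threshold=0.18))
```

In a coupling-filtering workflow of the kind the abstract describes, such per-position scores could be used to keep only couplings between sufficiently conserved positions; how Sibe combines them into 'blocks' is not specified in this listing.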
Affiliation(s)
- Ngaam J. Cheung
- Department of Brain and Cognitive Science, DGIST, Daegu, 42988 South Korea
- Cavendish Laboratory, Department of Physics, University of Cambridge, Cambridge, CB3 0HA UK
| | - Wookyung Yu
- Department of Brain and Cognitive Science, DGIST, Daegu, 42988 South Korea
- Core Protein Resources Center, DGIST, Daegu, 42988 South Korea
| |
|
36
|
Tollefson MR, Litman JM, Qi G, O'Connell CE, Wipfler MJ, Marini RJ, Bernabe HV, Tollefson WTA, Braun TA, Casavant TL, Smith RJH, Schnieders MJ. Structural Insights into Hearing Loss Genetics from Polarizable Protein Repacking. Biophys J 2019; 117:602-612. [PMID: 31327459 DOI: 10.1016/j.bpj.2019.06.030] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 02/20/2019] [Revised: 06/10/2019] [Accepted: 06/25/2019] [Indexed: 12/21/2022]
Abstract
Hearing loss is associated with ∼8100 mutations in 152 genes, and within the coding regions of these genes are over 60,000 missense variants. The majority of these variants are classified as "variants of uncertain significance" to reflect our inability to ascribe a phenotypic effect to the observed amino acid change. A promising source of pathogenicity information is biophysical simulation, although input protein structures often contain defects because of limitations in experimental data and/or only distant homology to a template. Here, we combine the polarizable atomic multipole optimized energetics for biomolecular applications (AMOEBA) force field, many-body optimization theory, and graphics processing unit acceleration to repack all deafness-associated proteins and thereby improve the average structure MolProbity score from 2.2 to 1.0. We then used these optimized wild-type models to create over 60,000 structures for missense variants in the Deafness Variation Database, which are being incorporated into the Deafness Variation Database to inform deafness pathogenicity prediction. Finally, this work demonstrates that advanced polarizable atomic multipole force fields are efficient enough to repack the entire human proteome.
Affiliation(s)
- Mallory R Tollefson
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa; Molecular Otolaryngology & Renal Research Laboratories, Department of Otolaryngology-Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa
| | - Jacob M Litman
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Guowei Qi
- Department of Biochemistry, University of Iowa, Iowa City, Iowa
| | - Claire E O'Connell
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | - Matthew J Wipfler
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | - Robert J Marini
- Molecular Otolaryngology & Renal Research Laboratories, Department of Otolaryngology-Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa
| | - Hernan V Bernabe
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa; Molecular Otolaryngology & Renal Research Laboratories, Department of Otolaryngology-Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa
| | | | - Terry A Braun
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | - Thomas L Casavant
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa
| | - Richard J H Smith
- Molecular Otolaryngology & Renal Research Laboratories, Department of Otolaryngology-Head and Neck Surgery, University of Iowa Hospitals and Clinics, Iowa City, Iowa.
| | - Michael J Schnieders
- Department of Biomedical Engineering, University of Iowa, Iowa City, Iowa; Department of Biochemistry, University of Iowa, Iowa City, Iowa.
| |
|
37
|
Tabasinezhad M, Talebkhan Y, Wenzel W, Rahimi H, Omidinia E, Mahboudi F. Trends in therapeutic antibody affinity maturation: From in-vitro towards next-generation sequencing approaches. Immunol Lett 2019; 212:106-113. [PMID: 31247224 DOI: 10.1016/j.imlet.2019.06.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 05/19/2019] [Revised: 06/08/2019] [Accepted: 06/24/2019] [Indexed: 12/12/2022]
Abstract
Current advances in antibody engineering are driving the strongest growth area in the development of biotherapeutic agents. Affinity improvement, which is important for the biological activity and clinical efficacy of therapeutic antibodies, remains a challenging task. In the human body, during the course of an immune response, affinity maturation increases antibody activity through several rounds of somatic hypermutation and clonal selection in the germinal center. The final outputs are antibodies with higher affinity and specificity for a particular antigen. In the realm of biotechnology, exploring mutations that improve antibody affinity while preserving its specificity and stability is an extremely time-consuming and laborious process. Recent advances in computational algorithms and DNA sequencing technologies help researchers to redesign antibody structure to achieve desired properties such as improved binding affinity. In this review, we briefly describe the principle of affinity maturation and the corresponding in vitro techniques. We also recapitulate the most recent advancements in the field of antibody affinity maturation, including computational approaches and next-generation sequencing (NGS).
Affiliation(s)
- Maryam Tabasinezhad
- Biotechnology Research Centre, Pasteur Institute of Iran, Tehran, Iran; Institute of Nanotechnology, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Yeganeh Talebkhan
- Biotechnology Research Centre, Pasteur Institute of Iran, Tehran, Iran
| | - Wolfgang Wenzel
- Institute of Nanotechnology, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Hamzeh Rahimi
- Molecular Medicine Department, Pasteur Institute of Iran, Tehran, Iran
| | - Eskandar Omidinia
- Genetics & Metabolism Research Centre, Pasteur Institute of Iran, Tehran, Iran.
| | | |
|
38
|
DockRMSD: an open-source tool for atom mapping and RMSD calculation of symmetric molecules through graph isomorphism. J Cheminform 2019; 11:40. [PMID: 31175455 PMCID: PMC6556049 DOI: 10.1186/s13321-019-0362-7] [Citation(s) in RCA: 158] [Impact Index Per Article: 31.6] [Received: 02/21/2019] [Accepted: 05/30/2019] [Indexed: 11/29/2022]
Abstract
Comparison of ligand poses generated by protein–ligand docking programs has often been carried out with the assumption of direct atomic correspondence between ligand structures. However, this correspondence is not necessarily chemically relevant for symmetric molecules and can lead to an artificial inflation of ligand pose distance metrics, particularly those that depend on receptor superposition (rather than ligand superposition), such as docking root mean square deviation (RMSD). Several of the commonly-used RMSD calculation algorithms that correct for molecular symmetry do not take into account the bonding structure of molecules and can therefore result in non-physical atomic mapping. Here, we present DockRMSD, a docking pose distance calculator that converts the symmetry correction to a graph isomorphism searching problem, in which the optimal atomic mapping and RMSD calculation are performed by an exhaustive and fast matching search of all isomorphisms of the ligand structure graph. We show through evaluation of docking poses generated by AutoDock Vina on the CSAR Hi-Q set that DockRMSD is capable of deterministically identifying the minimum symmetry-corrected RMSD and is able to do so without significant loss of computational efficiency compared to other methods. The open-source DockRMSD program can be conveniently integrated with various docking pipelines to assist with accurate atomic mapping and RMSD calculations, which can therefore help improve docking performance, especially for ligand molecules with complicated structural symmetry.
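The idea of treating symmetry correction as a graph isomorphism search can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the DockRMSD program: the function name, the input format (element list, bond index pairs, coordinate tuples), and the simple backtracking with a running-sum prune are all hypothetical choices. It enumerates every atom mapping that preserves element type and bonding, computes the RMSD without superposing the ligands (both poses are assumed to be in the same receptor frame), and returns the minimum.

```python
import math


def symmetry_corrected_rmsd(elements, bonds, coords_a, coords_b):
    """Minimum RMSD between two poses of one ligand over all atom mappings
    that preserve element types and the bond graph.

    elements: element symbol per atom; bonds: (i, j) index pairs;
    coords_a, coords_b: (x, y, z) tuples per atom, same receptor frame.
    """
    n = len(elements)
    adj = [set() for _ in range(n)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)

    best = [math.inf]       # smallest sum of squared distances found so far
    mapping = [-1] * n      # mapping[i]: atom of pose B assigned to atom i of pose A
    used = [False] * n

    def extend(i, sq_sum):
        if sq_sum >= best[0]:   # prune: the partial sum can only grow
            return
        if i == n:
            best[0] = sq_sum
            return
        for j in range(n):
            if used[j] or elements[j] != elements[i]:
                continue
            if len(adj[j]) != len(adj[i]):
                continue
            # Bonds (and non-bonds) to all previously mapped atoms must match.
            if any((k in adj[i]) != (mapping[k] in adj[j]) for k in range(i)):
                continue
            used[j] = True
            mapping[i] = j
            extend(i + 1, sq_sum + math.dist(coords_a[i], coords_b[j]) ** 2)
            used[j] = False
            mapping[i] = -1

    extend(0, 0.0)
    return math.sqrt(best[0] / n)


if __name__ == "__main__":
    # Toy linear O=C=O: pose B lists the two oxygens in swapped order.
    elems = ["O", "C", "O"]
    bnds = [(0, 1), (1, 2)]
    pose_a = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    pose_b = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)]
    print(symmetry_corrected_rmsd(elems, bnds, pose_a, pose_b))
```

In the toy example a naive index-by-index RMSD would be inflated by the swapped oxygens, while the symmetry-aware search finds the chemically equivalent mapping. The exhaustive backtracking here scales poorly for large, highly symmetric ligands; it is meant only to make the mapping problem concrete.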
|
39
|
Leem J, Deane CM. High-Throughput Antibody Structure Modeling and Design Using ABodyBuilder. Methods in Molecular Biology (Clifton, N.J.) 2019; 1851:367-380. [PMID: 30298409 DOI: 10.1007/978-1-4939-8736-8_21] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/26/2022]
Abstract
Antibodies are proteins of the adaptive immune system; they can be designed to bind almost any molecule, and are increasingly being used as biotherapeutics. Experimental antibody design is an expensive and time-consuming process, and computational antibody design methods can now be used to help develop new therapeutics and diagnostics. Within the design pipeline, accurate antibody structure modeling is essential, as it provides the basis for antibody-antigen docking, binding affinity prediction, and estimating thermal stability. Ideally, models should be rapidly generated, allowing the exploration of the breadth of antibody space. This allows methods to replicate the natural processes of antibody diversification (e.g., V(D)J recombination and somatic hypermutation), and cope with large volumes of data that are typical of next-generation sequencing datasets. Here we describe ABodyBuilder and PEARS, algorithms that build and mutate antibody model structures. These methods take ~30 s to generate a model antibody structure.
Affiliation(s)
- Jinwoo Leem
- Department of Statistics, University of Oxford, Oxford, UK
| | | |
|
40
|
Jumper JM, Faruk NF, Freed KF, Sosnick TR. Accurate calculation of side chain packing and free energy with applications to protein molecular dynamics. PLoS Comput Biol 2018; 14:e1006342. [PMID: 30589846 PMCID: PMC6307715 DOI: 10.1371/journal.pcbi.1006342] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/01/2017] [Accepted: 06/21/2018] [Indexed: 12/02/2022]
Abstract
To address the large gap between time scales that can be easily reached by molecular simulations and those required to understand protein dynamics, we present a rapid self-consistent approximation of the side chain free energy at every integration step. In analogy with the adiabatic Born-Oppenheimer approximation for electronic structure, the protein backbone dynamics are simulated as proceeding according to the dictates of the free energy of an instantaneously-equilibrated side chain potential. The side chain free energy is computed on the fly, allowing the protein backbone dynamics to traverse a greatly smoothed energetic landscape. This computation results in extremely rapid equilibration and sampling of the Boltzmann distribution. Our method, termed Upside, employs a reduced model involving the three backbone atoms, along with the carbonyl oxygen and amide proton, and a single (oriented) side chain bead having multiple locations reflecting the conformational diversity of the side chain's rotameric states. We also introduce a novel, maximum-likelihood method to parameterize the side chain interactions using protein structures. We demonstrate state-of-the-art accuracy for predicting χ1 rotamer states while consuming only milliseconds of CPU time. Our method enables rapidly equilibrating coarse-grained simulations that can nonetheless contain significant molecular detail. We also show that the resulting free energies of the side chains are sufficiently accurate for de novo folding of some proteins.
Affiliation(s)
- John M. Jumper
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
- Department of Chemistry, and The James Franck Institute, University of Chicago, Chicago, Illinois, United States of America
| | - Nabil F. Faruk
- Graduate Program in Biophysical Sciences, University of Chicago, Chicago, Illinois, United States of America
| | - Karl F. Freed
- Department of Chemistry, and The James Franck Institute, University of Chicago, Chicago, Illinois, United States of America
| | - Tobin R. Sosnick
- Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, Illinois, United States of America
- Institute for Biophysical Dynamics, University of Chicago, Chicago, Illinois, United States of America
| |
|
41
|
Hallen MA, Martin JW, Ojewole A, Jou JD, Lowegard AU, Frenkel MS, Gainza P, Nisonoff HM, Mukund A, Wang S, Holt GT, Zhou D, Dowd E, Donald BR. OSPREY 3.0: Open-source protein redesign for you, with powerful new features. J Comput Chem 2018; 39:2494-2507. [PMID: 30368845 PMCID: PMC6391056 DOI: 10.1002/jcc.25522] [Citation(s) in RCA: 49] [Impact Index Per Article: 8.2] [Received: 04/23/2018] [Accepted: 06/14/2018] [Indexed: 12/14/2022]
Abstract
We present OSPREY 3.0, a new and greatly improved release of the OSPREY protein design software. OSPREY 3.0 features a convenient new Python interface, which greatly improves its ease of use. It is over two orders of magnitude faster than previous versions of OSPREY when running the same algorithms on the same hardware. Moreover, OSPREY 3.0 includes several new algorithms, which introduce substantial speedups as well as improved biophysical modeling. It also includes GPU support, which provides an additional speedup of over an order of magnitude. Like previous versions of OSPREY, OSPREY 3.0 offers a unique package of advantages over other design software, including provable design algorithms that account for continuous flexibility during design and model conformational entropy. Finally, we show here empirically that OSPREY 3.0 accurately predicts the effect of mutations on protein-protein binding. OSPREY 3.0 is available at http://www.cs.duke.edu/donaldlab/osprey.php as free and open-source software. © 2018 Wiley Periodicals, Inc.
Affiliation(s)
- Mark A. Hallen
- Department of Computer Science, Duke University, Durham, NC 27708
- Toyota Technological Institute at Chicago, Chicago, IL 60637
| | | | - Adegoke Ojewole
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Jonathan D. Jou
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Anna U. Lowegard
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Marcel S. Frenkel
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710
| | - Pablo Gainza
- Department of Computer Science, Duke University, Durham, NC 27708
| | | | - Aditya Mukund
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Siyu Wang
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - Graham T. Holt
- Program in Computational Biology and Bioinformatics, Duke University Medical Center, Durham, NC 27710
| | - David Zhou
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Elizabeth Dowd
- Department of Computer Science, Duke University, Durham, NC 27708
| | - Bruce R. Donald
- Department of Computer Science, Duke University, Durham, NC 27708
- Department of Chemistry, Duke University, Durham, NC 27708
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710
| |
|
42
|
Hallen MA. PLUG (Pruning of Local Unrealistic Geometries) removes restrictions on biophysical modeling for protein design. Proteins 2018; 87:62-73. [PMID: 30378699 DOI: 10.1002/prot.25623] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/12/2018] [Revised: 10/10/2018] [Accepted: 10/16/2018] [Indexed: 12/29/2022]
Abstract
Protein design algorithms must search an enormous conformational space to identify favorable conformations. As a result, those that perform this search with guarantees of accuracy generally start with a conformational pruning step, such as dead-end elimination (DEE). However, the mathematical assumptions of DEE-based pruning algorithms have up to now severely restricted the biophysical model that can feasibly be used in protein design. To lift these restrictions, I propose to prune local unrealistic geometries (PLUG) using a linear programming-based method. PLUG's biophysical model consists only of well-known lower bounds on interatomic distances. PLUG is intended as preprocessing for energy-based protein design calculations, whose biophysical model need not support DEE pruning. Based on 96 test cases, PLUG is at least as effective at pruning as DEE for larger protein designs, the type that most require pruning. When combined with the LUTE protein design algorithm, PLUG greatly facilitates designs that account for continuous entropy, large multistate designs with continuous flexibility, and designs with extensive continuous backbone flexibility and advanced nonpairwise energy functions. Many of these designs are tractable only with PLUG, either for empirical reasons (LUTE's machine learning step achieves an accurate fit only after PLUG pruning), or for theoretical reasons (many energy functions are fundamentally incompatible with DEE).
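For orientation, the DEE pruning step that the abstract contrasts PLUG with can be sketched as follows. This is a minimal illustration of the classic DEE criterion on a rigid-rotamer, pairwise-decomposable energy model, not PLUG's linear programming method and not code from any of the cited packages. The data layout (`self_e`, `pair`) and the function names are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical toy layout: self_e[i][r] is the one-body energy of rotamer r at
# position i; pair[(i, j)][r][s] is the pairwise energy, stored only for i < j.
SelfEnergies = List[List[float]]
PairEnergies = Dict[Tuple[int, int], List[List[float]]]


def pair_energy(pair: PairEnergies, i: int, r: int, j: int, s: int) -> float:
    """Look up a pairwise energy regardless of the order of i and j."""
    return pair[(i, j)][r][s] if i < j else pair[(j, i)][s][r]


def dee_prune(self_e: SelfEnergies, pair: PairEnergies) -> List[List[int]]:
    """Return the rotamer indices that survive DEE at each position."""
    n = len(self_e)
    alive = [list(range(len(e))) for e in self_e]
    changed = True
    while changed:  # repeat until a full sweep prunes nothing
        changed = False
        for i in range(n):
            for r in list(alive[i]):
                # Best case for r: each other position picks its most favourable partner.
                best_r = self_e[i][r] + sum(
                    min(pair_energy(pair, i, r, j, s) for s in alive[j])
                    for j in range(n) if j != i
                )
                for t in alive[i]:
                    if t == r:
                        continue
                    # Worst case for t: each other position picks its least favourable partner.
                    worst_t = self_e[i][t] + sum(
                        max(pair_energy(pair, i, t, j, s) for s in alive[j])
                        for j in range(n) if j != i
                    )
                    if best_r > worst_t:
                        # r can never be part of the global minimum, so drop it.
                        alive[i].remove(r)
                        changed = True
                        break
    return alive
```

The criterion relies on the energy being a sum of one-body and pairwise terms, which is the kind of modeling assumption the abstract describes as restrictive.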
Affiliation(s)
- Mark A Hallen
- Toyota Technological Institute at Chicago, Chicago, Illinois
| |
|
43
|
Hooper WF, Walcott BD, Wang X, Bystroff C. Fast design of arbitrary length loops in proteins using InteractiveRosetta. BMC Bioinformatics 2018; 19:337. [PMID: 30249181 PMCID: PMC6154894 DOI: 10.1186/s12859-018-2345-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/17/2018] [Accepted: 08/29/2018] [Indexed: 11/10/2022] Open
Abstract
Background: With increasing interest in ab initio protein design, there is a desire to be able to fully explore the design space of insertions and deletions. Nature inserts and deletes residues to optimize energy and function, but allowing variable length indels in the context of an interactive protein design session presents challenges with regard to speed and accuracy. Results: Here we present a new module (INDEL) for InteractiveRosetta which allows the user to specify a range of lengths for a desired indel, and which returns a set of low energy backbones in a matter of seconds. To make the loop search fast, loop anchor points are geometrically hashed using Cα-Cα and Cβ-Cβ distances, and the hash is mapped to start and end points in a pre-compiled random access file of non-redundant, protein backbone coordinates. Loops with superposable anchors are filtered for collisions and returned to InteractiveRosetta as poly-alanine for display and selective incorporation into the design template. Sidechains can then be added using RosettaDesign tools. Conclusions: INDEL was able to find viable loops in 100% of 500 attempts for all lengths from 3 to 20 residues. INDEL has been applied to the task of designing a domain-swapping loop for T7-endonuclease I, changing its specificity from Holliday junctions to paranemic crossover (PX) DNA. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2345-5) contains supplementary material, which is available to authorized users.
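The anchor-hashing idea described in the Results can be illustrated with a small sketch. It assumes that a key is formed by binning the Cα-Cα and Cβ-Cβ anchor distances; the bin width, the record layout, and the names `anchor_key` and `build_index` are hypothetical and are not taken from INDEL.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
LoopRecord = Tuple[str, Point, Point, Point, Point]  # id, CA start/end, CB start/end
BIN_WIDTH = 0.5  # angstroms; an assumed value for illustration only


def anchor_key(ca_start: Point, ca_end: Point, cb_start: Point, cb_end: Point,
               bin_width: float = BIN_WIDTH) -> Tuple[int, int]:
    """Quantize the CA-CA and CB-CB anchor distances into an integer key."""
    d_ca = math.dist(ca_start, ca_end)
    d_cb = math.dist(cb_start, cb_end)
    return (int(d_ca // bin_width), int(d_cb // bin_width))


def build_index(loops: Iterable[LoopRecord]) -> Dict[Tuple[int, int], List[str]]:
    """Map each quantized anchor geometry to the loops that share it."""
    index: Dict[Tuple[int, int], List[str]] = defaultdict(list)
    for loop_id, ca_s, ca_e, cb_s, cb_e in loops:
        index[anchor_key(ca_s, ca_e, cb_s, cb_e)].append(loop_id)
    return index


# Usage: index a tiny made-up library, then look up a query gap by its anchors.
library = [
    ("loopA", (0.0, 0.0, 0.0), (10.2, 0.0, 0.0), (0.0, 1.0, 0.0), (11.1, 1.0, 0.0)),
    ("loopB", (0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 1.0, 0.0), (6.0, 1.0, 0.0)),
]
index = build_index(library)
query = anchor_key((0.0, 0.0, 0.0), (10.3, 0.0, 0.0), (0.0, 1.0, 0.0), (11.2, 1.0, 0.0))
candidates = index.get(query, [])
```

A lookup like this is constant time per query, which is what makes a search over many loop lengths fast. A practical version would also probe neighbouring bins, since two nearly identical distances can fall on either side of a bin edge.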
Affiliation(s)
- William F Hooper
- Emmes Corporation, Rockville, Washington, MD, USA; Department of Biology, Rensselaer Polytechnic Institute, Troy, NY, USA
| | | | - Xing Wang
- Department of Chemistry and Chemical Biology, Rensselaer Polytechnic Institute, Troy, NY, USA
| | - Christopher Bystroff
- Department of Biology, Rensselaer Polytechnic Institute, Troy, NY, USA; Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
| |
|
44
|
Dauzhenka T, Kundrotas PJ, Vakser IA. Computational Feasibility of an Exhaustive Search of Side-Chain Conformations in Protein-Protein Docking. J Comput Chem 2018; 39:2012-2021. [PMID: 30226647 DOI: 10.1002/jcc.25381] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/29/2017] [Revised: 03/24/2018] [Accepted: 05/26/2018] [Indexed: 11/07/2022]
Abstract
Protein-protein docking procedures typically perform a global scan of the proteins' relative positions, followed by local refinement of the putative matches. Because of the size of the search space, the global scan is usually implemented as a rigid-body search, using computationally inexpensive intermolecular energy approximations. An adequate refinement has to take into account structural flexibility. Since the refinement performs a conformational search of the interacting proteins, it is extremely computationally challenging, given the enormous number of internal degrees of freedom. Different approaches limit the search space by restricting the search to the side chains, rotameric states, coarse-grained structure representation, principal normal modes, and so on. Still, even with the approximations, the refinement presents an extreme computational challenge due to the very large number of remaining degrees of freedom. Given the complexity of the search space, the advantage of the exhaustive search is obvious. The obstacle to such a search is computational feasibility. However, the growing computational power of modern computers, especially due to the increasing utilization of Graphics Processing Units (GPUs) with large numbers of specialized computing cores, extends the range of applicability of brute-force search methods. This proof-of-concept study demonstrates the computational feasibility of an exhaustive search of side-chain conformations in protein docking. The procedure, implemented on the GPU architecture, was used to generate the optimal conformations in a large representative set of protein-protein complexes. © 2018 Wiley Periodicals, Inc.
Affiliation(s)
- Taras Dauzhenka
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66047
| | - Petras J Kundrotas
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66047
| | - Ilya A Vakser
- Center for Computational Biology, The University of Kansas, Lawrence, Kansas, 66047; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas, 66047
| |
|
45
|
Abstract
Motivation: Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design. Results: We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity. Availability and implementation: https://shen-lab.github.io/software/iCFN. Supplementary information: Supplementary data are available at Bioinformatics online.
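As a rough illustration of depth-first branch-and-bound on a single cost function network, the sketch below solves a tiny WCSP exactly. It is not the iCFN algorithm: it handles one substate only, uses a simple textbook-style lower bound rather than the bounds introduced in the paper, and assumes non-negative binary costs. The names `solve_cfn`, `unary`, and `binary` are hypothetical.

```python
from typing import Dict, List, Tuple

Unary = List[List[float]]                        # unary[i][a]: cost of value a for variable i
Binary = Dict[Tuple[int, int], List[List[float]]]  # binary[(i, j)][a][b], keys with i < j


def solve_cfn(unary: Unary, binary: Binary) -> Tuple[float, List[int]]:
    """Find a minimum-cost assignment, assuming non-negative binary costs."""
    n = len(unary)
    best_cost = float("inf")
    best_assign: List[int] = []

    def lower_bound(assign: List[int]) -> float:
        """Exact cost of the assigned prefix plus an optimistic cost for the rest."""
        k = len(assign)
        cost = sum(unary[i][assign[i]] for i in range(k))
        cost += sum(binary[(i, j)][assign[i]][assign[j]]
                    for (i, j) in binary if j < k)
        # Each unassigned variable pays at least its cheapest unary cost plus its
        # cost against the assigned prefix; pairs of unassigned variables are
        # bounded below by zero because binary costs are non-negative.
        for j in range(k, n):
            cost += min(
                unary[j][b] + sum(binary[(i, j)][assign[i]][b]
                                  for i in range(k) if (i, j) in binary)
                for b in range(len(unary[j]))
            )
        return cost

    def search(assign: List[int]) -> None:
        nonlocal best_cost, best_assign
        bound = lower_bound(assign)
        if bound >= best_cost:
            return  # this subtree cannot beat the incumbent, so prune it
        if len(assign) == n:
            best_cost, best_assign = bound, list(assign)  # bound is exact at a leaf
            return
        for value in range(len(unary[len(assign)])):
            assign.append(value)
            search(assign)
            assign.pop()

    search([])
    return best_cost, best_assign
```

In a design setting the variables would be residue positions and the values rotamers or amino acid types. The multistate case couples several such networks, which this sketch does not attempt.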
Affiliation(s)
- Mostafa Karimi
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
| | - Yang Shen
- Department of Electrical and Computer Engineering and TEES-AgriLife Center for Bioinformatics and Genomic Systems Engineering, Texas A&M University, College Station, USA
| |
|
46
|
Hogues H, Gaudreault F, Corbeil CR, Deprez C, Sulea T, Purisima EO. ProPOSE: Direct Exhaustive Protein-Protein Docking with Side Chain Flexibility. J Chem Theory Comput 2018; 14:4938-4947. [PMID: 30107730 DOI: 10.1021/acs.jctc.8b00225] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/22/2022]
Abstract
Despite decades of development, protein-protein docking remains a largely unsolved problem. The main difficulties are the immense space spanned by the translational and rotational degrees of freedom and the prediction of the conformational changes of proteins upon binding. FFT is generally the preferred method to exhaustively explore the translation-rotation space at a fine grid resolution, albeit with the trade-off of approximating force fields with correlation functions. This work presents a direct search alternative that samples the states in Cartesian space at the same resolution and computational cost as standard FFT methods. Operating in real space allows the use of standard force field functional forms used in typical non-FFT methods as well as the implementation of strategies for focused exploration of conformational flexibility. Currently, a few misplaced side chains can cause docking programs to fail. This work specifically addresses the problem of side chain rearrangements upon complex formation. Based on the observation that most side chains retain their unbound conformation upon binding, each rigidly docked pose is initially scored ignoring up to a limited number of side chain overlaps which are resolved in subsequent repacking and minimization steps. On test systems where side chains are altered and backbones held in their bound state, this implementation provides significantly better native pose recovery and higher quality (lower RMSD) predictions when compared with five of the most popular docking programs. The method is implemented in the software program ProPOSE (Protein Pose Optimization by Systematic Enumeration).
Affiliation(s)
- Hervé Hogues
- Human Health Therapeutics, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
| | - Francis Gaudreault
- Human Health Therapeutics, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
| | - Christopher R Corbeil
- Human Health Therapeutics, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
| | - Christophe Deprez
- Human Health Therapeutics, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
| | - Traian Sulea
- Human Health Therapeutics, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
| | - Enrico O Purisima
- Human Health Therapeutics, National Research Council Canada, 6100 Royalmount Avenue, Montreal, Quebec H4P 2R2, Canada
| |
|
47
|
Colbes J, Corona RI, Lezcano C, Rodríguez D, Brizuela CA. Protein side-chain packing problem: is there still room for improvement? Brief Bioinform 2018; 18:1033-1043. [PMID: 27567382 DOI: 10.1093/bib/bbw079] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/23/2016] [Indexed: 11/12/2022] Open
Abstract
The protein side-chain packing problem (PSCPP) is an important subproblem of both protein structure prediction and protein design. During the past two decades, a large number of methods have been proposed to tackle this problem. These methods consist of three main components: a rotamer library, a scoring function and a search strategy. The average overall accuracy level obtained by these methods is approximately 87%. Whether a better accuracy level could be achieved remains to be answered. To address this question, we calculated the maximum accuracy level attainable using a simple rotamer library, independently of the energy function or the search method. Using 2883 different structures from the Protein Data Bank, we compared this accuracy level with the accuracy level of five state-of-the-art methods. These comparisons indicated that, for buried residues in the protein, we are already close to the best possible accuracy results. In addition, for exposed residues, we found that a significant gap exists between the possible improvement and the maximum accuracy level achievable with current methods. After determining that an improvement is possible, the next step is to understand what limitations are preventing us from obtaining such an improvement. Previous works on protein structure prediction and protein design have shown that scoring function inaccuracies may represent the main obstacle to achieving better results for these problems. To show that the same is true for the PSCPP, we evaluated the quality of two scoring functions used by some state-of-the-art algorithms. Our results indicate that neither of these scoring functions can guide the search method correctly, thereby reinforcing the idea that efforts to solve the PSCPP must also focus on developing better scoring functions.
|
48
|
Shekhovtsov A, Swoboda P, Savchynskyy B. Maximum Persistency via Iterative Relaxed Inference in Graphical Models. IEEE Transactions on Pattern Analysis and Machine Intelligence 2018; 40:1668-1682. [PMID: 28742030 DOI: 10.1109/tpami.2017.2730884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
We consider the NP-hard problem of MAP-inference for undirected discrete graphical models. We propose a polynomial time and practically efficient algorithm for finding a part of its optimal solution. Specifically, our algorithm marks some labels of the considered graphical model either as (i) optimal, meaning that they belong to all optimal solutions of the inference problem; (ii) non-optimal if they provably do not belong to any solution. With access to an exact solver of a linear programming relaxation to the MAP-inference problem, our algorithm marks the maximal possible (in a specified sense) number of labels. We also present a version of the algorithm, which has access to a suboptimal dual solver only and still can ensure the (non-)optimality for the marked labels, although the overall number of the marked labels may decrease. We propose an efficient implementation, which runs in time comparable to a single run of a suboptimal dual solver. Our method is well-scalable and shows state-of-the-art results on computational benchmarks from machine learning and computer vision.
|
49
|
Abstract
During the last two decades, the pharmaceutical industry has progressed from detecting small molecules to designing biologic-based therapeutics. Amino acid-based drugs are a group of biologic-based therapeutics that can effectively combat the diseases caused by drug resistance or molecular deficiency. Computational techniques play a key role in designing and developing amino acid-based therapeutics such as proteins, peptides and peptidomimetics. This study discusses the various elements of the computational design of amino acid-based therapeutics. Protein design seeks to identify the properties of amino acid sequences that fold to predetermined structures with desirable structural and functional characteristics. Peptide drugs occupy a middle space between proteins and small molecules and it is hoped that they can target "undruggable" intracellular protein-protein interactions. Peptidomimetics, the compounds that mimic the biologic characteristics of peptides, present refined pharmacokinetic properties compared to the original peptides. Here, the elaborate techniques developed to characterize the amino acid sequences consistent with a specific structure and to allow protein design are discussed. Moreover, the key principles and recent advances in recently introduced computational techniques for rational peptide design are spotlighted. The most advanced computational techniques developed to design novel peptidomimetics are also summarized.
Affiliation(s)
- Tayebeh Farhadi
- Chronic Respiratory Diseases Research Center (CRDRC), National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Seyed MohammadReza Hashemian
- Chronic Respiratory Diseases Research Center (CRDRC), National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Clinical Tuberculosis and Epidemiology Research Center, National Research Institute of Tuberculosis and Lung Disease, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| |
|
50
|
Setiawan D, Brender J, Zhang Y. Recent advances in automated protein design and its future challenges. Expert Opin Drug Discov 2018; 13:587-604. [PMID: 29695210 DOI: 10.1080/17460441.2018.1465922] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/07/2023]
Abstract
Introduction: Protein function is determined by protein structure which is in turn determined by the corresponding protein sequence. If the rules that cause a protein to adopt a particular structure are understood, it should be possible to refine or even redefine the function of a protein by working backwards from the desired structure to the sequence. Automated protein design attempts to calculate the effects of mutations computationally with the goal of more radical or complex transformations than are accessible by experimental techniques. Areas covered: The authors give a brief overview of the recent methodological advances in computer-aided protein design, showing how methodological choices affect final design and how automated protein design can be used to address problems considered beyond traditional protein engineering, including the creation of novel protein scaffolds for drug development. Also, the authors address specifically the future challenges in the development of automated protein design. Expert opinion: Automated protein design holds potential as a protein engineering technique, particularly in cases where screening by combinatorial mutagenesis is problematic. Considering solubility and immunogenicity issues, automated protein design is initially more likely to make an impact as a research tool for exploring basic biology in drug discovery than in the design of protein biologics.
Affiliation(s)
- Dani Setiawan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
| | - Jeffrey Brender
- Radiation Biology Branch, Center for Cancer Research, National Cancer Institute - NIH, Bethesda, MD, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI, USA
| |
|