1
Soleymani F, Paquet E, Viktor H, Michalowski W, Spinello D. Protein-protein interaction prediction with deep learning: A comprehensive review. Comput Struct Biotechnol J 2022; 20:5316-5341. [PMID: 36212542 PMCID: PMC9520216 DOI: 10.1016/j.csbj.2022.08.070] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 06/22/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/15/2022]
Abstract
Most proteins perform their biological function by interacting with themselves or other molecules. Thus, one may obtain biological insights into protein functions, disease prevalence, and therapy development by identifying protein-protein interactions (PPI). However, finding the interacting and non-interacting protein pairs through experimental approaches is labour-intensive and time-consuming, owing to the variety of proteins. Hence, protein-protein interaction and protein-ligand binding problems have drawn attention in the fields of bioinformatics and computer-aided drug discovery. Deep learning methods paved the way for scientists to predict the 3-D structure of proteins from genomes, predict the functions and attributes of a protein, and modify and design new proteins to provide desired functions. This review focuses on recent deep learning methods applied to problems including predicting protein functions, protein-protein interaction and their sites, protein-ligand binding, and protein design.
Affiliation(s)
- Farzan Soleymani
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada
- Eric Paquet
  - National Research Council, 1200 Montreal Road, Ottawa, ON K1A 0R6, Canada
- Herna Viktor
  - School of Electrical Engineering and Computer Science, University of Ottawa, ON, Canada
- Davide Spinello
  - Department of Mechanical Engineering, University of Ottawa, Ottawa, ON, Canada

2
Mansour B, Bayoumi WA, El-Sayed MA, Abouzeid LA, Massoud MAM. In vitro cytotoxicity and docking study of novel symmetric and asymmetric dihydropyridines and pyridines as EGFR tyrosine kinase inhibitors. Chem Biol Drug Des 2022; 100:121-135. [PMID: 35501997 DOI: 10.1111/cbdd.14058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2021] [Revised: 12/28/2021] [Accepted: 04/10/2022] [Indexed: 12/24/2022]
Abstract
Quinolines have a substantial effect as anticancer agents, and 1,4-DHPs have also demonstrated anticancer efficacy in several studies. New hybrid models of symmetric and asymmetric 1,4-DHPs and pyridines linked at C3 of 2-chloroquinoline were designed and synthesized as a new anticancer scaffold. The Hantzsch 1,4-DHP method was adopted for chemical synthesis. An MTT assay was performed to evaluate cytotoxicity, and an EGFR tyrosine kinase assay, measured by ELISA, was performed to investigate binding to our selected compounds. The IC50 values, expressed in µM, revealed that compounds 4a,b and 5i,k showed better results against the four tested cell lines than the reference drug 5-fluorouracil (5-Fu). Compound 5k displayed the most potent cytotoxic activity, with IC50 values in the low µM range (12.03 ± 1.51 to 20.09 ± 2.16 µM), compared with the 5-Fu IC50 range (40.74 ± 2.46 to 63.81 ± 2.69 µM). The incorporation of 2-chloroquinoline at C3 to C4 of 1,4-DHP could be proposed as an anticancer scaffold rather than its analogous pyridines. Ester fragments connected to the 1,4-DHP ring as a lipophilic part are essential for anticancer activity. The chirality at C4 improved the anticancer activity. The hydrogen and halogen bonds facilitated the protein-ligand binding mode and affinity.
Affiliation(s)
- Basem Mansour
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Delta University for Science and Technology, Mansoura, Egypt
- Waleed A Bayoumi
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Delta University for Science and Technology, Mansoura, Egypt
  - Department of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura, Egypt
- Magda A El-Sayed
  - Department of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura, Egypt
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Horus University, New Damietta, Egypt
- Laila A Abouzeid
  - Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Delta University for Science and Technology, Mansoura, Egypt
  - Department of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura, Egypt
- Mohammed A M Massoud
  - Department of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura, Egypt

3
San Fabián J, Ema I, Omar S, García de la Vega JM. Toward a Computational NMR Procedure for Modeling Dipeptide Side-Chain Conformation. J Chem Inf Model 2021; 61:6012-6023. [PMID: 34762416 PMCID: PMC8715507 DOI: 10.1021/acs.jcim.1c00773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Theoretical relationships between the vicinal spin–spin coupling constants (SSCCs) and the χ1 torsion angles have been studied to predict the conformations of protein side chains. An efficient computational procedure is developed to obtain the conformation of dipeptides through theoretical and experimental SSCCs, Karplus equations, and quantum chemistry methods, and it is applied to three aliphatic hydrophobic residues (Val, Leu, and Ile). Three models are proposed: unimodal-static, trimodal-static-stepped, and trimodal-static-trigonal, where the most important factors are incorporated (coupled nuclei, nature and orientation of the substituents, and local geometric properties). Our results are validated by comparison with NMR and X-ray empirical data described in the literature, obtaining successful results on the 29 residues considered. Using our trimodal residue treatment, it is possible to detect and resolve residues with a simple conformation and those with two or three staggered conformers. In four residues, a deeper analysis explains that they do not have a unique conformation and that the population of each conformation plays an important role.
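As a rough illustration of how a Karplus relation can be combined with conformer populations, the sketch below evaluates the generic form J(θ) = A cos²θ + B cos θ + C and averages it over three staggered χ1 rotamers. The coefficients, the rotamer angles, and the `phase_deg` offset are illustrative assumptions and are not the parameters or models fitted in the paper.

```python
import math

# Illustrative Karplus coefficients in Hz (assumed, NOT the paper's fitted values).
A, B, C = 9.5, -1.6, 1.8

def karplus(theta_deg, a=A, b=B, c=C):
    """Generic Karplus relation: J = a*cos^2(theta) + b*cos(theta) + c."""
    t = math.radians(theta_deg)
    return a * math.cos(t) ** 2 + b * math.cos(t) + c

# Staggered chi1 rotamers in degrees (a common convention, assumed here).
STAGGERED = (-60.0, 60.0, 180.0)

def averaged_coupling(populations, phase_deg=0.0):
    """Population-weighted coupling over the three staggered conformers.

    phase_deg is a hypothetical offset relating chi1 to the coupled dihedral.
    """
    if abs(sum(populations) - 1.0) > 1e-9:
        raise ValueError("populations must sum to 1")
    return sum(p * karplus(chi + phase_deg)
               for p, chi in zip(populations, STAGGERED))

single = averaged_coupling((0.0, 0.0, 1.0))  # one populated conformer
mixed = averaged_coupling((0.2, 0.1, 0.7))   # three populated conformers
```

Comparing `single` and `mixed` against a measured coupling is one simple way to see why a residue with several populated conformers cannot be described by a single χ1 value.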
Affiliation(s)
- Jesús San Fabián
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Ignacio Ema
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain
- Salama Omar
  - Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid, 28049 Madrid, Spain

4
Binder design for targeting SARS-CoV-2 spike protein: An in silico perspective. GENE REPORTS 2021; 26:101452. [PMID: 34849425 PMCID: PMC8616691 DOI: 10.1016/j.genrep.2021.101452] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/06/2021] [Revised: 11/16/2021] [Accepted: 11/18/2021] [Indexed: 11/22/2022]
Abstract
Introduction: The COVID-19 pandemic is now affecting all people around the world and getting worse. New antiviral medications are desperately needed other than the few approved medications that have shown no promising efficacy so far. Methods: Here we report three blocking binders for targeting SARS-CoV-2 spike protein to block the interaction between the spike protein on the SARS-CoV-2 and the angiotensin-converting enzyme 2 (ACE2) receptors, responsible for viral homing into the alveolar epithelium type II cells (AECII). Results: The design process is based on the collected natural scaffolds and using Rosetta interface for designing the binders. Conclusion: Based on the structural analysis, three binders were selected, and the results showed that they might be promising as new therapeutic targets for blocking COVID-19.

5
Abstract
The purpose of this quick guide is to help new modelers who have little or no background in comparative modeling yet are keen to produce high-resolution protein 3D structures for their study by following systematic good modeling practices, using affordable personal computers or online computational resources. Through the available experimental 3D-structure repositories, the modeler should be able to access and use the atomic coordinates for building homology models. We also aim to provide the modeler with a rationale behind making a simple list of atomic coordinates suitable for computational analysis abiding to principles of physics (e.g., molecular mechanics). Keeping that objective in mind, these quick tips cover the process of homology modeling and some postmodeling computations such as molecular docking and molecular dynamics (MD). A brief section was left for modeling nonprotein molecules, and a short case study of homology modeling is discussed.
Affiliation(s)
- Yazan Haddad
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
- Vojtech Adam
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic
- Zbynek Heger
  - Department of Chemistry and Biochemistry, Mendel University in Brno, Brno, Czech Republic
  - Central European Institute of Technology, Brno University of Technology, Brno, Czech Republic

6
Qi Y, Zhang JZH. DenseCPD: Improving the Accuracy of Neural-Network-Based Computational Protein Sequence Design with DenseNet. J Chem Inf Model 2020; 60:1245-1252. [DOI: 10.1021/acs.jcim.0c00043] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Indexed: 01/03/2023]
Affiliation(s)
- Yifei Qi
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
- John Z. H. Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China
  - Department of Chemistry, New York University, New York, New York 10003, United States

7
Chen S, Sun Z, Lin L, Liu Z, Liu X, Chong Y, Lu Y, Zhao H, Yang Y. To Improve Protein Sequence Profile Prediction through Image Captioning on Pairwise Residue Distance Map. J Chem Inf Model 2019; 60:391-399. [PMID: 31800243 DOI: 10.1021/acs.jcim.9b00438] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 11/30/2022]
Abstract
Protein sequence profile prediction aims to generate multiple sequences from structural information to advance protein design. Protein sequence profiles can be computationally predicted by energy-based or fragment-based methods. By integrating these methods with neural networks, our previous method, SPIN2, achieved a sequence recovery rate of 34%. However, SPIN2 employed only one-dimensional (1D) structural properties, which are not sufficient to represent three-dimensional (3D) structures. In this study, we represented 3D structures by 2D maps of pairwise residue distances and developed a new method (SPROF) to predict protein sequence profiles based on an image captioning learning framework. To the best of our knowledge, this is the first method to employ a 2D distance map for predicting protein properties. SPROF achieved 39.8% in sequence recovery of residues on the independent test set, representing a 5.2% improvement over SPIN2. We also found that the sequence recovery increased with the number of neighboring residues in 3D structural space, indicating that our method can effectively learn long-range information from the 2D distance map. Thus, such a network architecture using a 2D distance map is expected to be useful for other 3D structure-based applications, such as binding site prediction, protein function prediction, and protein interaction prediction. The online server and the source code are available at http://biomed.nscc-gz.cn and https://github.com/biomed-AI/SPROF , respectively.
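To make the 2D representation concrete, the following sketch builds a pairwise residue distance map from one coordinate per residue and counts spatial neighbours per residue. It is only an illustration: the toy coordinates, the choice of a single representative atom per residue, and the 6 Å cutoff are assumptions, and this is not the SPROF feature pipeline.

```python
import math

def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    dmap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])
            dmap[i][j] = dmap[j][i] = d
    return dmap

def neighbor_counts(dmap, cutoff):
    """Count, for each residue, the other residues closer than the cutoff."""
    return [sum(1 for j, d in enumerate(row) if j != i and d < cutoff)
            for i, row in enumerate(dmap)]

# Toy coordinates in angstroms (made up for illustration).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (0.0, 3.8, 0.0)]
dmap = distance_map(coords)
counts = neighbor_counts(dmap, cutoff=6.0)  # cutoff value is an assumption
```

The neighbour count is the kind of quantity one could bin residues by when checking whether recovery improves for more densely surrounded positions.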
Affiliation(s)
- Sheng Chen
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Zhe Sun
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Lihua Lin
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Zifeng Liu
  - Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
- Xun Liu
  - Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
- Yutian Chong
  - Third Affiliated Hospital of Sun Yat-sen University, Guangzhou 510000, China
- Yutong Lu
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
  - School of Data and Computer Science, Sun Yat-sen University, Guangzhou 510000, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (Sun Yat-sen University) of the Ministry of Education, Guangzhou 510000, China

8
Xiong P, Hu X, Huang B, Zhang J, Chen Q, Liu H. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 2019; 36:136-144. [DOI: 10.1093/bioinformatics/btz515] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/23/2018] [Revised: 05/29/2019] [Accepted: 06/21/2019] [Indexed: 11/13/2022]
Abstract
Motivation
The ABACUS (a backbone-based amino acid usage survey) method uses unique statistical energy functions to carry out protein sequence design. Although some of its results have been experimentally verified, its accuracy remains improvable because several important components of the method have not been specifically optimized for sequence design or in contexts of other parts of the method. The computational efficiency also needs to be improved to support interactive online applications or the consideration of a large number of alternative backbone structures.
Results
We derived a model to measure solvent accessibility with larger mutual information with residue types than previous models, optimized a set of rotamers which can approximate the sidechain atomic positions more accurately, and devised an empirical function to treat inter-atomic packing with parameters fitted to native structures and optimized in consistence with the rotamer set. Energy calculations have been accelerated by interpolation between pre-determined representative points in high-dimensional structural feature spaces. Sidechain repacking tests showed that ABACUS2 can accurately reproduce the conformation of native sidechains. In sequence design tests, the native residue type recovery rate reached 37.7%, exceeding the value of 32.7% for ABACUS1. Applying ABACUS2 to designed sequences on three native backbones produced proteins shown to be well-folded by experiments.
Availability and implementation
The ABACUS2 sequence design server can be visited at http://biocomp.ustc.edu.cn/servers/abacus-design.php.
Supplementary information
Supplementary data are available at Bioinformatics online.
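The criterion of "mutual information with residue types" mentioned in the Results can be illustrated with a small calculation. The sketch below estimates mutual information, in bits, between a discretised solvent-accessibility label and the residue type from paired samples; the two-bin labels and the toy samples are assumptions for illustration and do not reproduce the ABACUS2 accessibility model.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Mutual information in bits between two discrete variables,
    estimated from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Toy samples: (accessibility bin, residue type); values are made up.
informative = [("buried", "LEU"), ("buried", "LEU"),
               ("exposed", "LYS"), ("exposed", "LYS")]
uninformative = [("buried", "LEU"), ("buried", "LYS"),
                 ("exposed", "LEU"), ("exposed", "LYS")]

mi_high = mutual_information(informative)
mi_low = mutual_information(uninformative)
```

A descriptor whose bins separate residue types more sharply yields a larger value, which is the sense in which one accessibility measure can be preferred over another.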
Affiliation(s)
- Peng Xiong
  - School of Life Sciences, Hefei, Anhui 230026, China
- Xiuhong Hu
  - School of Life Sciences, Hefei, Anhui 230026, China
- Bin Huang
  - School of Life Sciences, Hefei, Anhui 230026, China
- Jiahai Zhang
  - School of Life Sciences, Hefei, Anhui 230026, China
- Quan Chen
  - School of Life Sciences, Hefei, Anhui 230026, China
- Haiyan Liu
  - School of Life Sciences, Hefei, Anhui 230026, China
  - Hefei National Laboratory for Physical Sciences at the Microscale, Hefei, Anhui 230026, China
  - School of Data Science, University of Sciences and Technology of China, Hefei, Anhui 230026, China

9
Enhanced catalytic activities and modified substrate preferences for taxoid 10β-O-acetyl transferase mutants by engineering catalytic histidine residues. Biotechnol Lett 2018; 40:1245-1251. [PMID: 29869304 DOI: 10.1007/s10529-018-2573-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 02/25/2018] [Accepted: 05/18/2018] [Indexed: 12/14/2022]
Abstract
OBJECTIVES: Taxoid 10β-O-acetyl transferase (DBAT) was redesigned to enhance its catalytic activity and substrate preference for baccatin III and taxol biosynthesis. RESULTS: Residues H162, D166 and R363 were determined as potential sites within the catalytic pocket of DBAT for molecular docking and site-directed mutagenesis to modify the activity of DBAT. Enzymatic activity assays revealed that the kcat/KM values of mutants H162A/R363H, D166H, R363H and D166H/R363H acting on 10-deacetylbaccatin III were about 3, 15, 26 and 60 times higher than that of wild-type DBAT, respectively. Substrate preference assays indicated that these mutants (H162A/R363H, D166H, R363H, D166H/R363H) could transfer acetyl groups from unnatural acetyl donors (e.g. vinyl acetate, sec-butyl acetate, isobutyl acetate, amyl acetate and isoamyl acetate) to 10-deacetylbaccatin III. CONCLUSION: Taxoid 10β-O-acetyl transferase mutants with redesigned active sites displayed increased catalytic activities and modified substrate preferences, indicating their possible application in the enzymatic synthesis of baccatin III and taxol.

10
Gaines JC, Acebes S, Virrueta A, Butler M, Regan L, O'Hern CS. Comparing side chain packing in soluble proteins, protein-protein interfaces, and transmembrane proteins. Proteins 2018; 86:581-591. [PMID: 29427530 PMCID: PMC5912992 DOI: 10.1002/prot.25479] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 12/13/2017] [Revised: 01/23/2018] [Accepted: 02/06/2018] [Indexed: 12/26/2022]
Abstract
We compare side chain prediction and packing of core and non-core regions of soluble proteins, protein-protein interfaces, and transmembrane proteins. We first identified or created comparable databases of high-resolution crystal structures of these 3 protein classes. We show that the solvent-inaccessible cores of the 3 classes of proteins are equally densely packed. As a result, the side chains of core residues at protein-protein interfaces and in the membrane-exposed regions of transmembrane proteins can be predicted by the hard-sphere plus stereochemical constraint model with the same high prediction accuracies (>90%) as core residues in soluble proteins. We also find that for all 3 classes of proteins, as one moves away from the solvent-inaccessible core, the packing fraction decreases as the solvent accessibility increases. However, the side chain predictability remains high (80% within 30°) up to a relative solvent accessibility, rSASA≲0.3, for all 3 protein classes. Our results show that ≈40% of the interface regions in protein complexes are "core", that is, densely packed with side chain conformations that can be accurately predicted using the hard-sphere model. We propose packing fraction as a metric that can be used to distinguish real protein-protein interactions from designed, non-binding, decoys. Our results also show that cores of membrane proteins are the same as cores of soluble proteins. Thus, the computational methods we are developing for the analysis of the effect of hydrophobic core mutations in soluble proteins will be equally applicable to analyses of mutations in membrane proteins.
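A figure such as "80% within 30°" is a fraction of side-chain dihedral angles predicted to within a tolerance of the observed value. The sketch below shows one way such a fraction might be computed with periodic angle handling; the inclusive tolerance, the per-angle (rather than per-residue) counting, and the toy angles are assumptions, not the authors' exact protocol.

```python
def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)

def fraction_within(predicted, observed, tolerance=30.0):
    """Fraction of predicted dihedrals within the tolerance of the observed ones."""
    if len(predicted) != len(observed):
        raise ValueError("angle lists must have equal length")
    hits = sum(1 for p, o in zip(predicted, observed)
               if angular_difference(p, o) <= tolerance)
    return hits / len(observed)

# Toy chi angles in degrees (made up for illustration).
predicted = [-65.0, 178.0, 60.0, -60.0]
observed = [-60.0, -175.0, 180.0, -50.0]
accuracy = fraction_within(predicted, observed)
```

The wrap-around in `angular_difference` matters for angles near ±180°, where 178° and −175° differ by only 7°.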
Affiliation(s)
- J C Gaines
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, 06520
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, Connecticut, 06520
- S Acebes
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, 06520
- A Virrueta
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, Connecticut, 06520
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, 06520
- M Butler
  - Department of Physics and Astronomy, University of Southern California, Los Angeles, California, 90007
- L Regan
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, Connecticut, 06520
  - Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut, 06520
  - Department of Chemistry, Yale University, New Haven, Connecticut, 06520
- C S O'Hern
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, 06520
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, Connecticut, 06520
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, 06520
  - Department of Physics, Yale University, New Haven, Connecticut, 06520
  - Department of Applied Physics, Yale University, New Haven, Connecticut, 06520

11
Wang J, Cao H, Zhang JZH, Qi Y. Computational Protein Design with Deep Learning Neural Networks. Sci Rep 2018; 8:6349. [PMID: 29679026 PMCID: PMC5910428 DOI: 10.1038/s41598-018-24760-x] [Citation(s) in RCA: 75] [Impact Index Per Article: 12.5] [Received: 10/12/2017] [Accepted: 04/10/2018] [Indexed: 12/19/2022]
Abstract
Computational protein design has a wide variety of applications. Despite its remarkable success, designing a protein for a given structure and function is still a challenging task. On the other hand, the number of solved protein structures is rapidly increasing while the number of unique protein folds has reached a steady number, suggesting that more structural information is being accumulated on each fold. Deep learning neural networks are a powerful method for learning from such large data sets and have shown superior performance in many machine learning fields. In this study, we applied the deep learning neural network approach to computational protein design to predict the probability of each of the 20 natural amino acids at each residue position in a protein. A large set of protein structures was collected and a multi-layer neural network was constructed. A number of structural properties were extracted as input features, and the best network achieved an accuracy of 38.3%. Using the network output as residue type restraints improves the average sequence identity in designing three natural proteins using Rosetta. Moreover, the predictions from our network show ~3% higher sequence identity than a previous method. Results from this study may benefit further development of computational protein design methods.
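For readers unfamiliar with how per-residue probabilities turn into a sequence identity figure, the sketch below takes one probability row per position, picks the most probable amino acid, and compares the result with a native sequence. The amino-acid ordering, the `one_hot_like` helper, and the toy probabilities are assumptions made for illustration; the network itself is not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # ordering is an assumption

def most_probable_sequence(prob_rows):
    """Pick the highest-probability amino acid at each position."""
    return "".join(AMINO_ACIDS[max(range(len(row)), key=row.__getitem__)]
                   for row in prob_rows)

def sequence_identity(a, b):
    """Fraction of aligned positions with identical residues."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def one_hot_like(aa, p=0.6):
    """Hypothetical helper: probability p on one residue, rest spread evenly."""
    rest = (1.0 - p) / (len(AMINO_ACIDS) - 1)
    return [p if c == aa else rest for c in AMINO_ACIDS]

native = "ACDK"
rows = [one_hot_like(aa) for aa in "ACEK"]  # toy network output
predicted = most_probable_sequence(rows)
identity = sequence_identity(predicted, native)
```

In a design setting the full probability rows, rather than only the top choice, could serve as soft residue-type restraints, which is closer to how the abstract describes their use with Rosetta.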
Affiliation(s)
- Jingxue Wang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- Huali Cao
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
- John Z H Zhang
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China
  - Department of Chemistry, New York University, NY, NY, 10003, USA
  - Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi, 030006, China
- Yifei Qi
  - Shanghai Engineering Research Center of Molecular Therapeutics and New Drug Development, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai, 200062, China
  - NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai, 200062, China

12
Colbes J, Aguila SA, Brizuela CA. Scoring of Side-Chain Packings: An Analysis of Weight Factors and Molecular Dynamics Structures. J Chem Inf Model 2018; 58:443-452. [PMID: 29368924 DOI: 10.1021/acs.jcim.7b00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The protein side-chain packing problem (PSCPP) is a central task in computational protein design. The problem is usually modeled as a combinatorial optimization problem, which consists of searching for a set of rotamers, from a given rotamer library, that minimizes a scoring function (SF). The SF is a weighted sum of terms that can be decomposed into physics-based and knowledge-based terms. Although there are many methods to obtain approximate solutions for this problem, all of them have similar performance, and there has not been a significant improvement in recent years. Studies on protein structure prediction and protein design revealed the limitations of current SFs in achieving further improvements for these two problems. Along the same lines, a recent work reported a similar result for the PSCPP. In this work, we ask whether this negative result regarding further improvements in performance is due to (i) an incorrect weighting of the SF terms or (ii) the constrained conformation resulting from the protein crystallization process. To analyze these questions, we (i) modeled the PSCPP as a bi-objective combinatorial optimization problem, optimizing, at the same time, the two most important terms of two SFs of state-of-the-art algorithms, and (ii) performed a preprocessing relaxation of the crystal structure through molecular dynamics to simulate the protein in the solvent and evaluated the performance of these two state-of-the-art SFs under these conditions. Our results indicate that (i) no matter what combination of weight factors we use, the current SFs will not lead to better performance and (ii) the evaluated SFs will not be able to improve performance on relaxed structures. Furthermore, the experiments revealed that the SFs and the methods are biased toward crystallized structures.
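The relationship between a weighted-sum scoring function and the bi-objective view can be seen on a toy instance. The sketch below enumerates every rotamer assignment for two positions, minimizes a weighted sum of two terms, and separately lists the non-dominated (Pareto-optimal) assignments. All numbers, the two-term split, and the single pairwise penalty are invented for illustration; real scoring functions and rotamer libraries are far larger and are not modeled here.

```python
from itertools import product

# Per-rotamer (physics_term, knowledge_term) values; made-up numbers.
SELF_TERMS = [
    [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)],  # position 0: three rotamers
    [(0.5, 3.0), (3.0, 0.5)],              # position 1: two rotamers
]
# Hypothetical pairwise physics penalty for one clashing rotamer pair.
PAIR_PENALTY = {(0, 0): 2.0}

def all_assignments():
    """Every combination of one rotamer index per position."""
    return list(product(*(range(len(rots)) for rots in SELF_TERMS)))

def terms(assignment):
    """Return the (physics, knowledge) totals for one assignment."""
    physics = sum(SELF_TERMS[i][r][0] for i, r in enumerate(assignment))
    knowledge = sum(SELF_TERMS[i][r][1] for i, r in enumerate(assignment))
    physics += PAIR_PENALTY.get(assignment, 0.0)
    return physics, knowledge

def best_assignment(w_physics, w_knowledge):
    """Exhaustively minimize the weighted sum of the two terms."""
    def score(a):
        p, k = terms(a)
        return w_physics * p + w_knowledge * k
    return min(all_assignments(), key=score)

def pareto_front():
    """Assignments for which no other assignment is at least as good in both terms and better in one."""
    scored = {a: terms(a) for a in all_assignments()}
    def dominated(a):
        p, k = scored[a]
        return any(p2 <= p and k2 <= k and (p2 < p or k2 < k)
                   for p2, k2 in scored.values())
    return sorted(a for a in scored if not dominated(a))

front = pareto_front()
choice_a = best_assignment(1.0, 0.5)
choice_b = best_assignment(0.2, 1.0)
```

Changing the weights only moves the optimum along the Pareto front, so if no point on the front matches the native packing, no reweighting of these two terms can recover it. This is the intuition behind examining the bi-objective problem directly.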
Affiliation(s)
- Jose Colbes
  - Computer Science Department, CICESE Research Center, 22860 Ensenada, Mexico
- Sergio A Aguila
  - Centro de Nanociencias y Nanotecnologia, Universidad Nacional Autonoma de Mexico, Km. 107 Carretera Tijuana-Ensenada, Ensenada, Baja California, Mexico, C.P. 22860
- Carlos A Brizuela
  - Computer Science Department, CICESE Research Center, 22860 Ensenada, Mexico

13
Chu H, Liu H. TetraBASE: A Side Chain-Independent Statistical Energy for Designing Realistically Packed Protein Backbones. J Chem Inf Model 2018; 58:430-442. [PMID: 29314837 DOI: 10.1021/acs.jcim.7b00677] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/30/2022]
Abstract
Constructing backbone structures of high designability is a primary aspect of computational protein design. We report here a side chain-independent statistical energy that aims at realistic modeling of through-space packing of polypeptide backbones. To mitigate the lack of explicit amino acid side chains, the model treats the interbackbone site packing as being dependent on peptide local conformation. In addition, new variables suitable for statistical analysis, one for relative orientation and another for distance, have been introduced to represent the intersite geometry based on the asymmetrical tetrahedron organization of distinct chemical groups surrounding the Cα-carbon atoms. The resulting tetrahedron-based backbone statistical energy (tetraBASE) model has been used to optimize the tertiary organizations of secondary structure elements (SSEs) of designated types with Monte Carlo simulated annealing, starting from artificial initial configurations. The tetraBASE minimum energy structures can reproduce SSE packing frequently observed in native proteins with atomic root-mean-square deviations of 1-2 Å. The model has also been tested by examining the stability of native SSE arrangements under tetraBASE. The results suggest that the tetraBASE model can be used to effectively represent interbackbone packing when designing backbone structures without explicitly knowing side chain types.
Affiliation(s)
- Huanyu Chu
  - School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui China
  - Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui China
- Haiyan Liu
  - School of Life Sciences, University of Science and Technology of China, 230027 Hefei, Anhui China
  - Hefei National Laboratory for Physical Sciences at the Microscales, 230027 Hefei, Anhui China
  - Collaborative Innovation Center of Chemistry for Life Sciences, 230027 Hefei, Anhui China

14
Gaillard T, Simonson T. Full Protein Sequence Redesign with an MMGBSA Energy Function. J Chem Theory Comput 2017; 13:4932-4943. [DOI: 10.1021/acs.jctc.7b00202] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Affiliation(s)
- Thomas Gaillard
  - Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
- Thomas Simonson
  - Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France

15
Sun MGF, Kim PM. Data driven flexible backbone protein design. PLoS Comput Biol 2017; 13:e1005722. [PMID: 28837553 PMCID: PMC5587332 DOI: 10.1371/journal.pcbi.1005722] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 04/18/2017] [Revised: 09/06/2017] [Accepted: 08/11/2017] [Indexed: 11/18/2022]
Abstract
Protein design remains an important problem in computational structural biology. Current computational protein design methods largely use physics-based methods, which make use of information from a single protein structure. This is despite the fact that multiple structures of many protein folds are now readily available in the PDB. While ensemble protein design methods can use multiple protein structures, they treat each structure independently. Here, we introduce a flexible backbone strategy, FlexiBaL-GP, which learns global protein backbone movements directly from multiple protein structures. FlexiBaL-GP uses the machine learning method of Gaussian Process Latent Variable Models to learn a lower dimensional representation of the protein coordinates that best represent backbone movements. These learned backbone movements are used to explore alternative protein backbones, while engineering a protein within a parallel tempered MCMC framework. Using the human ubiquitin-USP21 complex as a model we demonstrate that our design strategy outperforms current strategies for the interface design task of identifying tight binding ubiquitin variants for USP21.
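The "parallel tempered MCMC framework" mentioned above relies on occasionally exchanging configurations between replicas run at different temperatures. The sketch below shows the standard Metropolis criterion for such an exchange, min(1, exp[(β_i − β_j)(E_i − E_j)]) with β = 1/(k_B T). It is a generic textbook form under assumed reduced units (k_B = 1); the energy function, temperature ladder, and move set used by FlexiBaL-GP are not reproduced.

```python
import math
import random

def swap_probability(e_i, e_j, t_i, t_j, k_b=1.0):
    """Acceptance probability for exchanging replicas i and j.

    Standard replica-exchange criterion: min(1, exp((beta_i - beta_j) * (E_i - E_j))).
    """
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the exchange is accepted for this random draw."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)

# Cold replica (T=1) holds the higher energy: the swap is always accepted.
p_favourable = swap_probability(e_i=5.0, e_j=1.0, t_i=1.0, t_j=2.0)
# Cold replica already holds the lower energy: the swap is accepted only sometimes.
p_unfavourable = swap_probability(e_i=1.0, e_j=5.0, t_i=1.0, t_j=2.0)
```

Exchanges of this kind let low-energy configurations discovered at high temperature migrate to the cold replica, which helps the sampler escape local minima.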
Affiliation(s)
- Mark G. F. Sun
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
- Philip M. Kim
  - Department of Computer Science, University of Toronto, Toronto, Canada
  - Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Banting and Best Department of Medical Research, University of Toronto, Toronto, Canada
|
16
|
Gaines JC, Clark AH, Regan L, O'Hern CS. Packing in protein cores. J Phys Condens Matter 2017; 29:293001. [PMID: 28557791 DOI: 10.1088/1361-648x/aa75c2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 06/07/2023]
Abstract
Proteins are biological polymers that underlie all cellular functions. The first high-resolution protein structures were determined by x-ray crystallography in the 1960s. Since then, there has been continued interest in understanding and predicting protein structure and stability. It is well-established that a large contribution to protein stability originates from the sequestration from solvent of hydrophobic residues in the protein core. How are such hydrophobic residues arranged in the core; how can one best model the packing of these residues, and are residues loosely packed with multiple allowed side chain conformations or densely packed with a single allowed side chain conformation? Here we show that to properly model the packing of residues in protein cores it is essential that amino acids are represented by appropriately calibrated atom sizes, and that hydrogen atoms are explicitly included. We show that protein cores possess a packing fraction of [Formula: see text], which is significantly less than the typically quoted value of 0.74 obtained using the extended atom representation. We also compare the results for the packing of amino acids in protein cores to results obtained for jammed packings from discrete element simulations of spheres, elongated particles, and composite particles with bumpy surfaces. We show that amino acids in protein cores pack as densely as disordered jammed packings of particles with similar values for the aspect ratio and bumpiness as found for amino acids. Knowing the structural properties of protein cores is of both fundamental and practical importance. Practically, it enables the assessment of changes in the structure and stability of proteins arising from amino acid mutations (such as those identified as a result of the massive human genome sequencing efforts) and the design of new folded, stable proteins and protein-protein interactions with tunable specificity and affinity.
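The packing fraction discussed in this abstract is the fraction of a region's volume occupied by atoms. A minimal sketch of one way to estimate it is given below, assuming atoms are modelled as spheres and the region is an axis-aligned box. The Monte Carlo estimator and the name `packing_fraction` are illustrative choices, not the authors' method. Sampling points, rather than summing sphere volumes, counts overlapping (bonded) spheres only once.

```python
import math
import random

def packing_fraction(centers, radii, box, n_samples=20000, seed=0):
    # Estimate the fraction of the box volume covered by the union of spheres.
    # box is ((x0, x1), (y0, y1), (z0, z1)); centers are (x, y, z) tuples.
    rng = random.Random(seed)
    (x0, x1), (y0, y1), (z0, z1) = box
    inside = 0
    for _ in range(n_samples):
        p = (rng.uniform(x0, x1), rng.uniform(y0, y1), rng.uniform(z0, z1))
        # A point counts once even if it lies inside several overlapping spheres.
        if any(math.dist(p, c) <= r for c, r in zip(centers, radii)):
            inside += 1
    return inside / n_samples
```

For a single sphere of radius 0.5 centred in a unit box the estimate should approach π/6, the packing fraction of a simple cubic arrangement.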
Affiliation(s)
- J C Gaines
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, United States of America
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, CT 06520, United States of America
|
17
|
Gaines JC, Virrueta A, Buch DA, Fleishman SJ, O'Hern CS, Regan L. Collective repacking reveals that the structures of protein cores are uniquely specified by steric repulsive interactions. Protein Eng Des Sel 2017; 30:387-394. [PMID: 28201818 PMCID: PMC7263838 DOI: 10.1093/protein/gzx011] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 01/10/2017] [Accepted: 01/26/2017] [Indexed: 11/12/2022]
Abstract
Protein core repacking is a standard test of protein modeling software. A recent study of six different modeling software packages showed that they are more successful at predicting side chain conformations of core compared to surface residues. All the modeling software tested have multicomponent energy functions, typically including contributions from solvation, electrostatics, hydrogen bonding and Lennard–Jones interactions in addition to statistical terms based on observed protein structures. We investigated to what extent a simplified energy function that includes only stereochemical constraints and repulsive hard-sphere interactions can correctly repack protein cores. For single residue and collective repacking, the hard-sphere model accurately recapitulates the observed side chain conformations for Ile, Leu, Phe, Thr, Trp, Tyr and Val. This result shows that there are no alternative, sterically allowed side chain conformations of core residues. Analysis of the same set of protein cores using the Rosetta software suite revealed that the hard-sphere model and Rosetta perform equally well on Ile, Leu, Phe, Thr and Val; the hard-sphere model performs better on Trp and Tyr and Rosetta performs better on Ser. We conclude that the high prediction accuracy in protein cores obtained by protein modeling software and our simplified hard-sphere approach reflects the high density of protein cores and dominance of steric repulsion.
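The idea of repacking with only repulsive steric interactions can be sketched as follows. This is an illustration under stated assumptions, not the energy function used in the paper: the quadratic overlap penalty, the representation of atoms as `((x, y, z), radius)` pairs, and the names `repulsive_energy` and `best_conformation` are all hypothetical.

```python
import math

def repulsive_energy(atoms_a, atoms_b):
    # Sum a penalty over every pair of atoms whose spheres overlap.
    # The quadratic form is an assumed stand-in for a purely repulsive potential.
    total = 0.0
    for pos_a, rad_a in atoms_a:
        for pos_b, rad_b in atoms_b:
            overlap = (rad_a + rad_b) - math.dist(pos_a, pos_b)
            if overlap > 0:
                total += overlap ** 2
    return total

def best_conformation(candidates, environment):
    # Return the index of the candidate side chain conformation with the
    # lowest steric clash against the fixed surrounding atoms.
    return min(range(len(candidates)),
               key=lambda i: repulsive_energy(candidates[i], environment))
```

In a dense core, most candidate conformations clash with their neighbours, so a selection of this kind tends to leave few sterically allowed options.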
Affiliation(s)
- J C Gaines
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, CT 06520, USA
- A Virrueta
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, CT 06520, USA
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, CT 06520, USA
- D A Buch
  - C. Eugene Bennett Department of Chemistry, 217 Clark Hall, West Virginia University, Morgantown, WV 26506, USA
- S J Fleishman
  - Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 76100, Israel
- C S O'Hern
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, CT 06520, USA
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, CT 06520, USA
  - Department of Physics, Yale University, New Haven, CT 06520, USA
  - Department of Applied Physics, Yale University, New Haven, CT 06520, USA
- L Regan
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
  - Integrated Graduate Program in Physical and Engineering Biology (IGPPEB), Yale University, New Haven, CT 06520, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA
  - Department of Chemistry, Yale University, New Haven, CT 06520, USA
|
18
|
Löffler P, Schmitz S, Hupfeld E, Sterner R, Merkl R. Rosetta:MSF: a modular framework for multi-state computational protein design. PLoS Comput Biol 2017; 13:e1005600. [PMID: 28604768 PMCID: PMC5484525 DOI: 10.1371/journal.pcbi.1005600] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 03/01/2017] [Revised: 06/26/2017] [Accepted: 05/27/2017] [Indexed: 12/20/2022]
Abstract
Computational protein design (CPD) is a powerful technique to engineer existing proteins or to design novel ones that display desired properties. Rosetta is a software suite including algorithms for computational modeling and analysis of protein structures and offers many elaborate protocols created to solve highly specific tasks of protein engineering. Most of Rosetta’s protocols optimize sequences based on a single conformation (i.e. design state). However, challenging CPD objectives like multi-specificity design or the concurrent consideration of positive and negative design goals demand the simultaneous assessment of multiple states. This is why we have developed the multi-state framework MSF that facilitates the implementation of Rosetta’s single-state protocols in a multi-state environment and made available two frequently used protocols. Utilizing MSF, we demonstrated for one of these protocols that multi-state design yields a 15% higher performance than single-state design on a ligand-binding benchmark consisting of structural conformations. With this protocol, we designed de novo nine retro-aldolases on a conformational ensemble deduced from a (βα)8-barrel protein. All variants displayed measurable catalytic activity, testifying to a high success rate for this concept of multi-state enzyme design.
Protein engineering, i.e. the targeted modification or design of proteins, has tremendous potential for medical and industrial applications. One generally applicable strategy for protein engineering is rational protein design: based on detailed knowledge of structure and function, computer programs like Rosetta propose the sequence of a protein possessing the desired properties. So far, most computer protocols have used rigid structures for design, which is a simplification because a protein’s structure is more accurately specified by a conformational ensemble. We have now implemented a framework for computational protein design that allows certain design protocols of Rosetta to make use of multiple design states like structural ensembles. An in silico assessment simulating ligand-binding design showed that this new approach generates more reliably native-like sequences than a single-state approach. As a proof-of-concept, we introduced de novo retro-aldolase activity into a scaffold protein and characterized nine variants experimentally, all of which were catalytically active.
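The contrast between single-state and multi-state assessment can be shown with a small generic sketch. It does not use or reproduce the Rosetta or MSF API: the `score` callback, the default averaging rule, and the function names are assumptions made for illustration, with lower scores taken to be better.

```python
def multi_state_fitness(sequence, states, score, aggregate=None):
    # Score one candidate sequence against every design state, then combine.
    # By default the per-state scores are averaged (an assumed choice);
    # pass aggregate=max to judge a sequence by its worst state instead.
    per_state = [score(sequence, state) for state in states]
    if aggregate is None:
        return sum(per_state) / len(per_state)
    return aggregate(per_state)

def pick_best(sequences, states, score, aggregate=None):
    # Select the sequence with the lowest combined score across all states.
    return min(sequences,
               key=lambda seq: multi_state_fitness(seq, states, score, aggregate))
```

A single-state protocol corresponds to passing a one-element `states` list; supplying a conformational ensemble makes the selection depend on how a sequence performs across all members.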
Affiliation(s)
- Patrick Löffler
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Samuel Schmitz
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Enrico Hupfeld
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Reinhard Sterner
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
- Rainer Merkl
  - Institute of Biophysics and Physical Biochemistry, University of Regensburg, Regensburg, Germany
|