1
Rosignoli S, Lustrino E, Di Silverio I, Paiardini A. Making Use of Averaging Methods in MODELLER for Protein Structure Prediction. Int J Mol Sci 2024; 25:1731. [PMID: 38339009 PMCID: PMC10855553 DOI: 10.3390/ijms25031731] [Received: 12/23/2023] [Revised: 01/23/2024] [Accepted: 01/29/2024]
Abstract
Recent advances in protein structure prediction, driven by AlphaFold 2 and machine learning, are proficient at predicting static structures but struggle to capture the dynamic features that are essential for understanding biological function. In this context, homology-based modeling emerges as a cost-effective and computationally efficient alternative. The MODELLER algorithm (version 10.5, accessed on 30 November 2023) can be harnessed for this purpose because it computes intermediate models during simulated annealing, enabling the exploration of attainable configurational states and energies while minimizing its objective function. Few attempts have been made to date to improve the models generated by this algorithm, and, in particular, no literature describes an averaging procedure involving the intermediate models of the MODELLER algorithm. In this study, we examined MODELLER's output for 225 target-template pairs and extracted the best representatives among the intermediate models. By applying an averaging procedure to the intermediate structures selected on the basis of statistical potentials, we aimed to determine: (1) whether averaging improves the quality of structural models during the building phase; (2) whether ranking by statistical potentials reliably selects the best models, leading to improved final model quality; (3) whether using a single template versus multiple templates affects the averaging approach; and (4) whether the "ensemble" nature of the MODELLER building phase can be harnessed to capture low-energy conformations when modeling holo structures. Our findings indicate that, although improvements typically amount to only a few decimal points in the model evaluation metric, a notable fraction of configurations exhibit slightly higher similarity to the native structure than the final model proposed by MODELLER.
The averaging-building procedure proves particularly beneficial (1) in regions of low sequence identity between the target and template(s), the most challenging aspect of homology modeling, and (2) in generating holo protein conformations, an area in which MODELLER and related tools usually fall short of the expected performance.
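The core idea of the abstract above (rank intermediate models by a statistical potential, keep the best ones, average them) can be illustrated with a minimal sketch. It assumes that the intermediate models have already been superposed onto a common frame, that they list atoms in the same order, and that each model carries one statistical-potential score for which lower is better (for example, a DOPE-like energy). The function name `average_top_models`, the choice of `k`, and the toy coordinates are hypothetical; this is not the authors' implementation and does not use MODELLER's API.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def average_top_models(
    models: Sequence[Sequence[Coord]], scores: Sequence[float], k: int
) -> List[Coord]:
    """Average atom coordinates of the k best-scoring models (hypothetical helper).

    Assumes all models are pre-superposed, list atoms in the same order,
    and that a lower score means a better model.
    """
    if len(models) != len(scores):
        raise ValueError("one score per model is required")
    if not 1 <= k <= len(models):
        raise ValueError("k must be between 1 and the number of models")
    n_atoms = len(models[0])
    if any(len(m) != n_atoms for m in models):
        raise ValueError("all models must have the same number of atoms")

    # Rank model indices by score (ascending = better) and keep the k best.
    best = sorted(range(len(models)), key=lambda i: scores[i])[:k]

    averaged: List[Coord] = []
    for atom in range(n_atoms):
        # Arithmetic mean of x, y and z over the selected models.
        xs = [models[i][atom][0] for i in best]
        ys = [models[i][atom][1] for i in best]
        zs = [models[i][atom][2] for i in best]
        averaged.append((sum(xs) / k, sum(ys) / k, sum(zs) / k))
    return averaged


# Toy example: three two-atom "intermediate models" with hypothetical scores.
models = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.2, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0)],  # poorly scored outlier, excluded when k=2
]
scores = [-100.0, -90.0, -10.0]
print(average_top_models(models, scores, k=2))
```

Plain coordinate averaging can distort covalent geometry, which is presumably why any real pipeline would follow it with some refinement; that step is outside this sketch.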
Affiliation(s)
- Alessandro Paiardini
- Department of Biochemical Sciences, Sapienza University of Rome, 00185 Rome, Italy; (S.R.); (E.L.); (I.D.S.)
2
Teruel N, Borges VM, Najmanovich R. Surfaces: a software to quantify and visualize interactions within and between proteins and ligands. Bioinformatics 2023; 39:btad608. [PMID: 37788107 PMCID: PMC10568369 DOI: 10.1093/bioinformatics/btad608] [Received: 04/26/2023] [Revised: 08/23/2023] [Accepted: 09/29/2023]
Abstract
SUMMARY Computational methods for quantifying and visualizing the relative contribution of molecular interactions to the stability of biomolecular structures and complexes are fundamental to understanding, modulating and engineering biological processes. Here, we present Surfaces, an easy-to-use, fast and customizable software for the quantification and visualization of molecular interactions based on the calculation of surface areas in contact. Surfaces calculations show equivalent or better correlations with experimental data than computationally expensive methods based on molecular dynamics. AVAILABILITY AND IMPLEMENTATION All scripts are available at https://github.com/NRGLab/Surfaces. The Surfaces documentation is available at https://surfaces-tutorial.readthedocs.io/en/latest/index.html.
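A contact-area-based score of the kind described above can be sketched as a weighted sum: each atom-atom contact contributes its contact area multiplied by a coefficient for the pair of atom types involved, and contributions are summed per residue pair. This is a minimal sketch under stated assumptions: the contact areas are taken as already computed (Surfaces computes them itself; this code does not), and the atom-type labels, the coefficient table `COEFFS`, and the function `residue_pair_scores` are hypothetical placeholders, not the actual Surfaces parameters or API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical pairwise coefficients keyed by an unordered atom-type pair.
# Negative = favourable, positive = unfavourable; values are placeholders.
COEFFS: Dict[frozenset, float] = {
    frozenset({"donor", "acceptor"}): -1.0,
    frozenset({"hydrophobic"}): -0.5,  # hydrophobic-hydrophobic contact
    frozenset({"donor"}): 0.3,  # donor-donor contact
    frozenset({"hydrophobic", "acceptor"}): 0.2,
}

# One contact: (residue A, atom type A, residue B, atom type B, area in Å^2).
Contact = Tuple[str, str, str, str, float]


def residue_pair_scores(contacts: Iterable[Contact]) -> Dict[Tuple[str, str], float]:
    """Sum area * coefficient over all contacts for each residue pair."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for res_a, type_a, res_b, type_b, area in contacts:
        # Atom-type pairs missing from the table contribute nothing.
        coeff = COEFFS.get(frozenset({type_a, type_b}), 0.0)
        totals[(res_a, res_b)] += area * coeff
    return dict(totals)


# Hypothetical contacts between residues of two chains.
contacts = [
    ("ASP12", "acceptor", "LYS45", "donor", 4.0),
    ("ASP12", "hydrophobic", "LYS45", "hydrophobic", 2.0),
    ("LEU30", "hydrophobic", "PHE50", "acceptor", 1.5),
]
print(residue_pair_scores(contacts))
```

Using `frozenset` keys makes the lookup symmetric, so a donor-acceptor contact and an acceptor-donor contact share one coefficient.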
Affiliation(s)
- Natália Teruel
- Department of Pharmacology and Physiology, Faculty of Medicine, Université de Montréal, Montreal H3T 1J4, Canada
- Vinicius Magalhães Borges
- Department of Biomedical Sciences, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, USA
- Rafael Najmanovich
- Department of Pharmacology and Physiology, Faculty of Medicine, Université de Montréal, Montreal H3T 1J4, Canada
3
Ahmadi N, Aghasadeghi M, Hamidi-Fard M, Motevalli F, Bahramali G. Reverse Vaccinology and Immunoinformatic Approach for Designing a Bivalent Vaccine Candidate Against Hepatitis A and Hepatitis B Viruses. Mol Biotechnol 2023:10.1007/s12033-023-00867-z. [PMID: 37715882 DOI: 10.1007/s12033-023-00867-z] [Received: 02/22/2023] [Accepted: 08/21/2023]
Abstract
Hepatitis A and B are two major viral infections that still dramatically affect public health worldwide. Hepatitis A Virus (HAV) is the main cause of acute hepatitis, whereas Hepatitis B Virus (HBV) causes the chronic form of the disease, which can progress to cirrhosis or liver failure. Vaccination has long been considered the most effective preventive measure against these pathogens. Here, we performed an immunoinformatic analysis of HAV Viral Protein 1 (VP1), the major capsid protein, to identify the most conserved immunogenic truncated protein for fusion to the HBV surface antigen (HBsAg) via an AAY linker, yielding a bivalent vaccine against HAV and HBV. Various computational approaches were employed to predict highly conserved regions and the most immunogenic B-cell and T-cell epitopes of the HAV-VP1 capsid protein in both humans and BALB/c mice. The predicted fusion protein was further analyzed with respect to its primary and secondary structures, followed by homology validation. Afterward, the three-dimensional structure of the vaccine construct was docked with toll-like receptors (TLRs) 2, 4 and 7. According to the bioinformatics tools, the region spanning amino acids 99-259 of VP1 was selected for its high immunogenicity and conserved epitopes. T-cell epitope prediction showed that this region contains 32 antigenic peptides for human leukocyte antigen (HLA) class I and 20 for HLA class II, which are almost fully conserved in the Iranian population. The vaccine design includes 5 linear and 4 conformational B-cell lymphocyte (BCL) epitopes to induce humoral immune responses. The designed VP1-AAY-HBsAg fusion protein has the potential to be constructed and expressed as a bivalent vaccine candidate, especially for the Iranian population.
These findings led us to conclude that the designed vaccine candidate offers a potential route toward an exploratory vaccine against Hepatitis A and Hepatitis B Viruses, with high confidence for the identified strains.
Affiliation(s)
- Neda Ahmadi
- Department of Microbiology, Faculty of Biological Sciences, North Tehran Branch, Islamic Azad University, Tehran, Iran
- Mohammadreza Aghasadeghi
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, Tehran, 13165, Iran
- Viral Vaccine Research Center, Pasteur Institute of Iran, Tehran, Iran
- Mojtaba Hamidi-Fard
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, Tehran, 13165, Iran
- Viral Vaccine Research Center, Pasteur Institute of Iran, Tehran, Iran
- Fatemeh Motevalli
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, Tehran, 13165, Iran
- Golnaz Bahramali
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, Tehran, 13165, Iran
- Viral Vaccine Research Center, Pasteur Institute of Iran, Tehran, Iran
4
Jung Y, Geng C, Bonvin AMJJ, Xue LC, Honavar VG. MetaScore: A Novel Machine-Learning-Based Approach to Improve Traditional Scoring Functions for Scoring Protein-Protein Docking Conformations. Biomolecules 2023; 13:121. [PMID: 36671507 PMCID: PMC9855734 DOI: 10.3390/biom13010121] [Received: 12/01/2022] [Revised: 12/22/2022] [Accepted: 12/26/2022]
Abstract
Protein-protein interactions play a ubiquitous role in biological function. Knowledge of the three-dimensional (3D) structures of the complexes they form is essential for understanding the structural basis of those interactions and how they orchestrate key cellular processes. Computational docking has become an indispensable alternative to expensive and time-consuming experimental approaches for determining the 3D structures of protein complexes. Despite recent progress, the ability to identify near-native models from the large set of conformations sampled by docking (the so-called scoring problem) still has considerable room for improvement. We present MetaScore, a new machine-learning-based approach to improve the scoring of docked conformations. MetaScore uses a random forest (RF) classifier trained to distinguish near-native from non-native conformations using their protein-protein interfacial features. The features include physicochemical properties, energy terms, interaction-propensity-based features, geometric properties, interface topology features, evolutionary conservation, and scores produced by traditional scoring functions (SFs). MetaScore scores docked conformations by simply averaging the score produced by the RF classifier with that produced by any traditional SF. We demonstrate that (i) MetaScore consistently outperforms each of the nine traditional SFs included in this work in terms of success rate and hit rate evaluated over conformations ranked among the top 10; and (ii) an ensemble method, MetaScore-Ensemble, which combines 10 variants of MetaScore obtained by pairing the RF score with each of the traditional SFs, outperforms each of the MetaScore variants. We conclude that the performance of traditional SFs can be improved by using machine learning to judiciously leverage protein-protein interfacial features and by using ensemble methods to combine multiple scoring functions.
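The averaging step described above can be sketched in a few lines. Because an RF output (a probability, higher is better) and a traditional SF (often an energy, lower is better) live on different scales, this sketch assumes both are first min-max normalized to [0, 1], with the SF inverted so that higher always means "more near-native". The abstract does not state which normalization MetaScore actually uses, so that choice, the helper names `minmax`, `meta_scores` and `top_n`, and the toy values are all assumptions.

```python
from typing import List, Sequence


def minmax(values: Sequence[float], higher_is_better: bool = True) -> List[float]:
    """Rescale values to [0, 1]; flip them if lower raw values are better."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # no spread: treat all conformations as equal
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if higher_is_better else [1.0 - s for s in scaled]


def meta_scores(rf_probs: Sequence[float], sf_energies: Sequence[float]) -> List[float]:
    """Average the normalized RF score with the normalized traditional SF score."""
    rf = minmax(rf_probs, higher_is_better=True)
    sf = minmax(sf_energies, higher_is_better=False)  # assumption: lower energy = better
    return [(a + b) / 2.0 for a, b in zip(rf, sf)]


def top_n(scores: Sequence[float], n: int = 10) -> List[int]:
    """Indices of the n highest-scoring conformations, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]


# Four hypothetical docked conformations.
rf_probs = [0.9, 0.2, 0.6, 0.1]
sf_energies = [-50.0, -45.0, -60.0, -10.0]
combined = meta_scores(rf_probs, sf_energies)
print(top_n(combined, n=2))
```

Ranking with `top_n(..., n=10)` mirrors the top-10 evaluation window mentioned in the abstract; a rank-based combination would be an equally plausible, scale-free alternative.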
Affiliation(s)
- Yong Jung
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Cunliang Geng
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Alexandre M. J. J. Bonvin
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Li C. Xue
- Bijvoet Centre for Biomolecular Research, Faculty of Science—Chemistry, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Center for Molecular and Biomolecular Informatics, Radboudumc, Greet Grooteplein 26-28, 6525 GA Nijmegen, The Netherlands
- Vasant G. Honavar
- Bioinformatics & Genomics Graduate Program, Pennsylvania State University, University Park, PA 16802, USA
- Artificial Intelligence Research Laboratory, Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Clinical and Translational Sciences Institute, Pennsylvania State University, University Park, PA 16802, USA
- College of Information Sciences & Technology, Pennsylvania State University, University Park, PA 16802, USA
- Institute for Computational and Data Sciences, Pennsylvania State University, University Park, PA 16802, USA
- Center for Big Data Analytics and Discovery Informatics, Pennsylvania State University, University Park, PA 16823, USA
5
An X, Zhang W, Rong C, Liu S. Understanding Ramachandran plot for dipeptide: A density functional theory and information-theoretic approach study. J CHIN CHEM SOC-TAIP 2022. [DOI: 10.1002/jccs.202200444]
Affiliation(s)
- Xiaoyan An
- Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research (Ministry of Education of China), Hunan Normal University, Changsha, Hunan, People's Republic of China
- Wenbiao Zhang
- Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research (Ministry of Education of China), Hunan Normal University, Changsha, Hunan, People's Republic of China
- Chunying Rong
- Key Laboratory of Chemical Biology and Traditional Chinese Medicine Research (Ministry of Education of China), Hunan Normal University, Changsha, Hunan, People's Republic of China
- Shubin Liu
- Research Computing Center, University of North Carolina, Chapel Hill, North Carolina, USA
- Department of Chemistry, University of North Carolina, Chapel Hill, North Carolina, USA
6
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x]
7
Ochoa R, Lunardelli VAS, Rosa DS, Laio A, Cossio P. Multiple-Allele MHC Class II Epitope Engineering by a Molecular Dynamics-Based Evolution Protocol. Front Immunol 2022; 13:862851. [PMID: 35572587 PMCID: PMC9094701 DOI: 10.3389/fimmu.2022.862851] [Received: 01/26/2022] [Accepted: 03/28/2022]
Abstract
Epitopes that bind simultaneously to all human alleles of Major Histocompatibility Complex class II (MHC II) are considered one of the key factors for the development of improved vaccines and cancer immunotherapies. To engineer MHC II multiple-allele binders, we developed a protocol called PanMHC-PARCE, based on the unsupervised optimization of the epitope sequence by single-point mutations, parallel explicit-solvent molecular dynamics simulations and scoring of the MHC II-epitope complexes. The key idea is accepting mutations that not only improve the affinity but also reduce the affinity gap between the alleles. We applied this methodology to enhance a Plasmodium vivax epitope for multiple-allele binding. In vitro rate-binding assays showed that four engineered peptides were able to bind with improved affinity toward multiple human MHC II alleles. Moreover, we demonstrated that mice immunized with the peptides exhibited interferon-gamma cellular immune response. Overall, the method enables the engineering of peptides with improved binding properties that can be used for the generation of new immunotherapies.
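The key idea stated in the abstract above, accepting a mutation only if it improves affinity and also narrows the affinity gap between alleles, can be written as a small acceptance test. This sketch assumes one predicted binding score per allele with lower meaning stronger binding, takes "affinity" as the mean score over alleles and the "gap" as the max-minus-min spread. These definitions, the helper names `affinity_and_gap` and `accept_mutation`, and the scores are illustrative assumptions; PanMHC-PARCE may define these quantities differently.

```python
from statistics import mean
from typing import Dict, Tuple


def affinity_and_gap(scores: Dict[str, float]) -> Tuple[float, float]:
    """Mean score over alleles (affinity proxy) and max-min spread (gap)."""
    values = list(scores.values())
    return mean(values), max(values) - min(values)


def accept_mutation(old: Dict[str, float], new: Dict[str, float]) -> bool:
    """Accept only if mean binding improves AND the allele gap shrinks.

    Scores are hypothetical per-allele predicted binding scores; lower = stronger.
    """
    if old.keys() != new.keys():
        raise ValueError("both score sets must cover the same alleles")
    old_aff, old_gap = affinity_and_gap(old)
    new_aff, new_gap = affinity_and_gap(new)
    return new_aff < old_aff and new_gap < old_gap


# Hypothetical scores for three MHC II alleles before and after a point mutation.
parent = {"DRB1*01:01": -30.0, "DRB1*04:01": -20.0, "DRB1*15:01": -25.0}
mutant = {"DRB1*01:01": -31.0, "DRB1*04:01": -27.0, "DRB1*15:01": -28.0}
print(accept_mutation(parent, mutant))
```

A mutant that lowers the mean score but widens the spread (for example, one allele improving strongly while another stays weak) is rejected, which is the behaviour the "reduce the affinity gap" criterion is meant to enforce.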
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia
- Daniela Santoro Rosa
- Department of Microbiology, Immunology and Parasitology, Federal University of Sao Paulo, Sao Paulo, Brazil
- Institute for Investigation in Immunology (iii), Instituto Nacional de Ciência e Tecnologia (INCT), Sao Paulo, Brazil
- Alessandro Laio
- Physics Area, International School for Advanced Studies (SISSA), Trieste, Italy
- Condensed Matter and Statistical Physics Section, International Centre for Theoretical Physics (ICTP), Trieste, Italy
- Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Center for Computational Mathematics, Flatiron Institute, New York, NY, United States
- Center for Computational Biology, Flatiron Institute, New York, NY, United States
8
Akhter N, Kabir KL, Chennupati G, Vangara R, Alexandrov BS, Djidjev H, Shehu A. Improved Protein Decoy Selection via Non-Negative Matrix Factorization. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:1670-1682. [PMID: 33400654 DOI: 10.1109/tcbb.2020.3049088]
Abstract
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
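The abstract above does not describe how the factorization is turned into a decoy selection, so this sketch only illustrates the underlying technique: non-negative matrix factorization of a non-negative matrix V ≈ WH using the Lee-Seung multiplicative update rules, which keep every entry non-negative and are known not to increase the squared reconstruction error. What V would contain for decoys (here, hypothetically, rows = decoys and columns = non-negative structural features) and any selection rule built on W or H are assumptions left open; the helper names are illustrative.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of a (n x p) and b (p x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def frob_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix]:
    """Factorize non-negative V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H); eps guards against division by zero.
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical matrix: 4 decoys x 3 non-negative features.
V = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0], [0.2, 0.1, 0.9]]
W, H = nmf(V, rank=2)
print(round(frob_error(V, W, H), 4))
```

In a decoy setting, rows of W could be read as each decoy's loading on a small number of latent "structural modes", but how the published method uses such loadings is not stated in the abstract.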
9
Wang WF, Xie XY, Huang Y, Li YK, Liu H, Chen XL, Wang HL. Identification of a Novel Antimicrobial Peptide From the Ancient Marine Arthropod Chinese Horseshoe Crab, Tachypleus tridentatus. Front Immunol 2022; 13:794779. [PMID: 35401525 PMCID: PMC8984021 DOI: 10.3389/fimmu.2022.794779] [Received: 10/14/2021] [Accepted: 02/24/2022]
Abstract
Humoral immunity is the first line of defense in the invertebrate immune system, and antimicrobial peptides play an important role in this biological process. A novel antimicrobial peptide, termed Tatritin, was identified via transcriptome analysis and characterized in the hemolymph of the Chinese horseshoe crab, Tachypleus tridentatus, after infection with Gram-negative bacteria. Tatritin was significantly induced by bacterial infection in hemolymph and gill. The Tatritin preprotein consists of a signal peptide (21 aa) and a cysteine-rich mature peptide (47 aa). The putative mature peptide was 5.6 kDa with a theoretical isoelectric point (pI) of 9.99, and showed an α-helix structure in the N-terminal region and an anti-parallel β-sheet structure in the cysteine-stabilized C-terminal region. The chemically synthesized Tatritin peptide exhibited broad-spectrum antimicrobial activity against Gram-negative and Gram-positive bacteria and fungi. Furthermore, Tatritin may recognize and inhibit pathogenic microorganisms by directly binding to LPS, DNA, and chitin. In addition, administration of Tatritin reduced the mortality of zebrafish after bacterial infection. Owing to its broad-spectrum antimicrobial activity in vivo and in vitro, and to the sensitivity of drug-resistant bacterial strains to it, the Tatritin peptide can be used as a new type of drug for treating infections or as an immune enhancer in animals.
Affiliation(s)
- Wei-Feng Wang
- Key Lab of Freshwater Animal Breeding, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Fisheries, Huazhong Agricultural University, Wuhan, China
- Xiao-Yong Xie
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture, South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, China
- Yan Huang
- Key Lab of Freshwater Animal Breeding, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Fisheries, Huazhong Agricultural University, Wuhan, China
- Yin-Kang Li
- Key Laboratory of South China Sea Fishery Resources Exploitation & Utilization, Ministry of Agriculture, South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, China
- Hong Liu
- Key Lab of Freshwater Animal Breeding, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Fisheries, Huazhong Agricultural University, Wuhan, China
- Xiu-Li Chen
- Guangxi Key Laboratory of Aquatic Genetic Breeding and Healthy Aquaculture, Guangxi Academy of Fishery Sciences, Nanning, China
- Huan-Ling Wang
- Key Lab of Freshwater Animal Breeding, Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction, Ministry of Education, College of Fisheries, Huazhong Agricultural University, Wuhan, China
10
Radusky LG, Serrano L. pyFoldX: enabling biomolecular analysis and engineering along structural ensembles. Bioinformatics 2022; 38:2353-2355. [PMID: 35176149 PMCID: PMC9004634 DOI: 10.1093/bioinformatics/btac072] [Received: 09/21/2021] [Revised: 12/19/2021] [Accepted: 02/09/2022]
Abstract
SUMMARY Recent years have seen an increase in the number of available structures, not only for new proteins but also for the same protein crystallized with different molecules and proteins. While protein design software has proven successful in designing and modifying proteins, such tools can also be overly sensitive to small conformational differences between structures of the same protein. To cope with this, we introduce pyFoldX, a Python library that enables the integrative analysis of structures of the same protein using FoldX, an established force field and modelling software. The library offers new functionalities for handling different structures of the same protein, an improved molecular parametrization module and easy integration with the data analysis ecosystem of the Python programming language. AVAILABILITY AND IMPLEMENTATION pyFoldX relies on the FoldX software for energy calculations and modelling; FoldX can be downloaded upon registration at http://foldxsuite.crg.eu/, and its licence is free of charge for academics. The pyFoldX library is open-source. Full details on installation, tutorials covering the library functionality and the scripts used to generate the data and figures presented in this paper are available at https://github.com/leandroradusky/pyFoldX. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leandro G Radusky
- Centre for Genomic Regulation (CRG), The Barcelona Institute for Science and Technology, 08003 Barcelona, Spain
11
Yamamori Y, Tomii K. Application of Homology Modeling by Enhanced Profile-Profile Alignment and Flexible-Fitting Simulation to Cryo-EM Based Structure Determination. Int J Mol Sci 2022; 23:1977. [PMID: 35216093 PMCID: PMC8879198 DOI: 10.3390/ijms23041977] [Received: 12/17/2021] [Revised: 02/07/2022] [Accepted: 02/09/2022]
Abstract
Application of cryo-electron microscopy (cryo-EM) is crucially important for ascertaining the atomic structure of large biomolecules such as ribosomes and protein complexes in membranes. Advances in cryo-EM technology and software have made it possible to obtain data at near-atomic resolution, but the method still often yields density maps of only up to medium resolution, either partially or entirely. Bridging the gap between the density map and the atomic model is therefore necessary. Herein, we propose a methodology for constructing atomic structure models based on cryo-EM maps of low-to-medium resolution. The method combines sensitive and accurate homology modeling, using our profile–profile alignment method, with flexible fitting using molecular dynamics simulation. In this study, benchmark applications were used to evaluate the model construction of human two-pore channel 2 (a CASP13 target whose structure was determined using cryo-EM data) and of the overall structure of the Enterococcus hirae V-ATPase complex.
Collapse
Affiliation(s)
- Yu Yamamori
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- Kentaro Tomii
- Artificial Intelligence Research Center (AIRC), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
- AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL), National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
12
Ochoa R, Soler MA, Gladich I, Battisti A, Minovski N, Rodriguez A, Fortuna S, Cossio P, Laio A. Computational Evolution Protocol for Peptide Design. Methods Mol Biol 2022; 2405:335-359. [PMID: 35298821 DOI: 10.1007/978-1-0716-1855-4_16]
Abstract
Computational peptide design is useful for therapeutics, diagnostics, and vaccine development. To select the most promising peptide candidates, the key is to describe the peptide-target interactions accurately at the molecular level. Here we review a computational peptide design protocol whose key feature is the use of all-atom explicit-solvent molecular dynamics to describe the different peptide-target complexes explored during the optimization. We describe the milestones behind the development of this protocol, which is now implemented in an open-source code called PARCE. We provide a basic tutorial on running the code for an antibody fragment design example. Finally, we describe three additional applications of the method to design peptides for different targets, illustrating the broad scope of the proposed approach.
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin, Colombia
- Ivan Gladich
- Qatar Environment and Energy Research Institute, Hamad Bin Khalifa University, Doha, Qatar
- SISSA, Trieste, Italy
- Nikola Minovski
- Department of Chemical and Pharmaceutical Sciences, University of Trieste, Trieste, Italy
- Theory Department, Laboratory for Cheminformatics, National Institute of Chemistry, Ljubljana, Slovenia
- Alex Rodriguez
- The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
- Sara Fortuna
- Italian Institute of Technology (IIT), Genova, Italy
- Department of Chemical and Pharmaceutical Sciences, University of Trieste, Trieste, Italy
- Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, Medellin, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Alessandro Laio
- The Abdus Salam International Centre for Theoretical Physics, Trieste, Italy
- SISSA, Trieste, Italy
13
Abstract
The biological significance of proteins has drawn the scientific community to explore their characteristics. Such studies shed light on the interaction patterns and functions of proteins in living organisms. Because reliable experimental techniques are difficult to apply in practice, computational methods have been introduced for interaction prediction. Automated methods have reduced these difficulties but cannot yet replace experimental studies, as the field is still evolving. Interaction prediction is a critical problem that demands highly accurate results, yet none of the existing methods offers reliable performance on par with experimental results. This article assesses existing computational docking algorithms, their challenges, and their future scope. Blind docking techniques are quite helpful when no information other than the individual structures is available. As more and more complex structures are added to different databases, information-driven approaches can be a good alternative. Artificial intelligence, which now dominates many major fields, is expected to take over this domain very soon.
14
Redesigning an antibody H3 loop by virtual screening of a small library of human germline-derived sequences. Sci Rep 2021; 11:21362. [PMID: 34725391 PMCID: PMC8560851 DOI: 10.1038/s41598-021-00669-w] [Received: 07/15/2021] [Accepted: 10/05/2021]
Abstract
The design of superior biologic therapeutics, including antibodies and engineered proteins, involves optimizing their specific ability to bind to disease-related molecular targets. Previously, we developed and applied the Assisted Design of Antibody and Protein Therapeutics (ADAPT) platform for virtual affinity maturation of antibodies (Vivcharuk et al. in PLoS One 12(7):e0181490, 10.1371/journal.pone.0181490, 2017). However, ADAPT is limited to point mutations of hot-spot residues in existing CDR loops. In this study, we explore the possibility of wholesale replacement of the entire H3 loop, with no requirement to maintain the parental loop length. This complements other published studies that sample replacements for the CDR loops L1, L2, L3, H1 and H2. Given the immense sequence space theoretically available to H3, we focused on the virtual grafting of over 5000 human germline-derived H3 sequences from the IMGT/LIGM database, which increases the diversity of the sequence space compared with using crystallized H3 loop sequences. H3 loop conformations are generated and scored to identify optimized H3 sequences. Experimental testing of high-ranking H3 sequences grafted into the framework of the bH1 antibody against human VEGF-A led to the discovery of multiple hits, some with affinities similar to or better than that of the parental antibody. In over 75% of the tested designs, the redesigned H3 loop contributed favorably to overall binding affinity. The hits also demonstrated good developability attributes, such as high thermal stability and no aggregation. Crystal structures of selected redesigned H3 variants were solved; although some deviations from the predicted structures were observed in the more solvent-accessible regions of the H3 loop, these did not significantly affect the predicted affinity scores.
15
Jeon S, Blazyte A, Yoon C, Ryu H, Jeon Y, Bhak Y, Bolser D, Manica A, Shin ES, Cho YS, Kim BC, Ryoo N, Choi H, Bhak J. Regional TMPRSS2 V197M Allele Frequencies Are Correlated with COVID-19 Case Fatality Rates. Mol Cells 2021; 44:680-687. [PMID: 34588322 PMCID: PMC8490206 DOI: 10.14348/molcells.2021.2249] [Received: 12/18/2020] [Revised: 06/14/2021] [Accepted: 07/10/2021]
Abstract
Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has a higher case fatality rate in European countries than in others, especially East Asian ones. One potential explanation for this regional difference is variation in viral infection efficiency. Here, we analyzed the allele frequencies of a nonsynonymous variant, rs12329760 (V197M), in the TMPRSS2 gene, which encodes a key enzyme essential for viral infection, and found a significant association between the COVID-19 case fatality rate and V197M allele frequencies, using over 200,000 present-day and ancient genomic samples. East Asian countries have higher V197M allele frequencies than other regions, including European countries, which correlates with their lower case fatality rates. Structural and energy calculation analysis of the V197M amino acid change showed that it destabilizes the TMPRSS2 protein, possibly impairing its processing of ACE2 and the viral spike protein.
Affiliation(s)
- Sungwon Jeon
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Asta Blazyte
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Changhan Yoon
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Hyojung Ryu
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Yeonsu Jeon
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Youngjune Bhak
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Andrea Manica
- Department of Zoology, University of Cambridge, Cambridge CB2 3EJ, UK
- Eun-Seok Shin
- Division of Cardiology, Department of Internal Medicine, Ulsan Medical Center, Ulsan 44686, Korea
- Personal Genomics Institute (PGI), Genome Research Foundation (GRF), Cheongju 28160, Korea
- Namhee Ryoo
- Department of Laboratory Medicine, Keimyung University School of Medicine, Daegu 42601, Korea
- Hansol Choi
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Jong Bhak
- Korean Genomics Center (KOGIC), Ulsan National Institute of Science and Technology (UNIST), Ulsan 44919, Korea
- Department of Biomedical Engineering, College of Information and Biotechnology, UNIST, Ulsan 44919, Korea
- Geromics, Ltd., Cambridge CB1 3NF, UK
- Personal Genomics Institute (PGI), Genome Research Foundation (GRF), Cheongju 28160, Korea
- Clinomics, Inc., Ulsan 44919, Korea
16
Liu X, Luo Y, Li P, Song S, Peng J. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput Biol 2021; 17:e1009284. [PMID: 34347784 PMCID: PMC8366979 DOI: 10.1371/journal.pcbi.1009284] [Received: 11/05/2020] [Revised: 08/16/2021] [Accepted: 07/17/2021]
Abstract
Modeling the impact of amino acid mutations on protein-protein interaction plays a crucial role in protein engineering and drug design. In this study, we develop GeoPPI, a novel structure-based deep-learning framework to predict changes in binding affinity upon mutation. Based on the three-dimensional structure of a protein, GeoPPI first learns a geometric representation that encodes topology features of the protein structure via a self-supervised learning scheme. These representations are then used as features for training gradient-boosting trees to predict changes in protein-protein binding affinity upon mutation. We find that GeoPPI is able to learn meaningful features that characterize interactions between atoms in protein structures. In addition, through extensive experiments, we show that GeoPPI achieves new state-of-the-art performance in predicting binding affinity changes upon both single- and multi-point mutations on six benchmark datasets. Moreover, we show that GeoPPI can accurately estimate differences in binding affinity between a few recently identified SARS-CoV-2 antibodies and the receptor-binding domain (RBD) of the S protein. These results demonstrate the potential of GeoPPI as a powerful and useful computational tool in protein design and engineering. Our code and datasets are available at: https://github.com/Liuxg16/GeoPPI.
Estimating the binding affinities of protein-protein interactions (PPIs) is crucial to understanding protein function and designing new functional proteins. Because experimental measurement in wet labs is labor-intensive and time-consuming, fast and accurate in silico approaches have received much attention. Although considerable efforts have been made in this direction, predicting the effects of mutations on protein-protein binding affinity is still a challenging research problem.
In this work, we introduce GeoPPI, a novel computational approach that uses deep geometric representations of protein complexes to predict the effects of mutations on binding affinity. The geometric representations are first learned via a self-supervised learning scheme and then integrated with gradient-boosting trees to accomplish the prediction. We find that the learned representations encode meaningful patterns underlying the interactions between atoms in protein structures. Also, extensive tests on major benchmark datasets show that GeoPPI is an important improvement over existing methods in predicting the effects of mutations on binding affinity.
Affiliation(s)
- Xianggen Liu
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
- Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China
- Yunan Luo
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Pengyong Li
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China
- Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China
- Sen Song
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China
- Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China
- Corresponding author.
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Corresponding author.
17
Pearce R, Zhang Y. Toward the solution of the protein structure prediction problem. J Biol Chem 2021; 297:100870. [PMID: 34119522 PMCID: PMC8254035 DOI: 10.1016/j.jbc.2021.100870] [Received: 04/20/2021] [Revised: 06/07/2021] [Accepted: 06/09/2021]
Abstract
Since Anfinsen demonstrated in 1973 that the information encoded in a protein's amino acid sequence determines its structure, solving the protein structure prediction problem has been the Holy Grail of structural biology. The goal of protein structure prediction approaches is to use computational modeling to determine the spatial location of every atom in a protein molecule, starting from only its amino acid sequence. Depending on whether homologous structures can be found in the Protein Data Bank (PDB), structure prediction methods have historically been categorized as template-based modeling (TBM) or template-free modeling (FM) approaches. Until recently, TBM was the most reliable approach to predicting protein structures, and in the absence of reliable templates, modeling accuracy declines sharply. Nevertheless, the results of the most recent community-wide Critical Assessment of Protein Structure Prediction experiment (CASP14) have demonstrated that the protein structure prediction problem can be largely solved through end-to-end deep machine learning techniques, with correct folds built for nearly all single-domain proteins without using PDB templates. Critically, model quality showed little correlation with the quality of available template structures or with the number of sequence homologs detected for a given target protein. Thus, deep-learning techniques have essentially broken through the 50-year-old modeling border between TBM and FM approaches and have made the success of high-resolution structure prediction significantly less dependent on template availability in the PDB library.
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, USA.
18
Bordbar A, Amanlou M, Pooshang Bagheri K, Ready PD, Ebrahimi S, Shahbaz Mohammadi H, Ghafari SM, Parvizi P. Cloning, high-level gene expression and bioinformatics analysis of SP15 and LeIF from Leishmania major and Iranian Phlebotomus papatasi saliva as single and novel fusion proteins: a potential vaccine candidate against leishmaniasis. Trans R Soc Trop Med Hyg 2021; 115:699-713. [PMID: 33155034 DOI: 10.1093/trstmh/traa119] [Received: 04/09/2020] [Revised: 08/09/2020] [Accepted: 10/16/2020]
Abstract
BACKGROUND Early exacerbation of cutaneous leishmaniasis is mainly influenced by both salivary and Leishmania parasite components. Little is known about combining immunogenic proteins of sandfly saliva (SP15) and of Leishmania parasites (LeIF) into a single prophylactic vaccine, namely SaLeish. In addition, no data are available on the species-specific sequence of SP15 isolated from the Iranian Phlebotomus papatasi. METHODS Integrated bioinformatics and genetic engineering methods were employed to design, optimize and obtain a vector-parasite-based vaccine formulation in a whole-length fusion form of LeIF-SP15 against leishmaniasis. Holistic gene optimization was initially performed using bioinformatics analyses to obtain a high yield of pure 'whole-SaLeish' expression. Genomic and salivary gland RNAs of wild-caught P. papatasi were extracted, and their complementary DNA was amplified and cloned into the pJET vector. RESULTS The new chimeric whole-SaLeish protein and randomly selected transcripts of native PpIRSP15 (GenBank accession nos. MT025054 and MN938854, MN938855 and MN938856) were successfully expressed, purified and validated by immunoblotting. Furthermore, despite the single amino acid polymorphisms of PpIRSP15 found at positions Y23 and E73 within the population of wild Iranian sandflies, the antigenicity and conservancy of PpIRSP15 epitopes for T-cell activation remained constant. CONCLUSIONS The SaLeish vaccine strategy takes advantage of a plethora of vector-parasite immunogenic proteins with potential protective efficacy to stimulate both innate and specific cellular immune responses against Leishmania parasites.
Affiliation(s)
- Ali Bordbar
- Molecular Systematics Laboratory, Parasitology Department, Pasteur Institute of Iran, 69 Pasteur Ave., Tehran, Iran
- Venom and Biotherapeutics Molecules Laboratory, Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Massoud Amanlou
- Department of Medicinal Chemistry, Faculty of Pharmacy and Drug Design and Development Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Kamran Pooshang Bagheri
- Venom and Biotherapeutics Molecules Laboratory, Biotechnology Department, Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Paul Donald Ready
- Department of Disease Control, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Sahar Ebrahimi
- Molecular Systematics Laboratory, Parasitology Department, Pasteur Institute of Iran, 69 Pasteur Ave., Tehran, Iran
- Hamid Shahbaz Mohammadi
- Department of Biochemistry, Genetics and Metabolism Research Group, Pasteur Institute of Iran, Tehran, Iran
- Seyedeh Maryam Ghafari
- Molecular Systematics Laboratory, Parasitology Department, Pasteur Institute of Iran, 69 Pasteur Ave., Tehran, Iran
- Parviz Parvizi
- Molecular Systematics Laboratory, Parasitology Department, Pasteur Institute of Iran, 69 Pasteur Ave., Tehran, Iran
19
Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Sci Rep 2021; 11:10943. [PMID: 34035363 PMCID: PMC8149836 DOI: 10.1038/s41598-021-90303-6] [Received: 02/09/2021] [Accepted: 05/10/2021]
Abstract
Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment we integrated several new inter-residue distance features with existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model predictors of estimating model accuracy (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve average losses of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model predictor of estimating model accuracy (EMA), is ranked within the top 10 among all single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully exploit its potential.
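The GDT-TS loss used above to rank model-selection performance can be illustrated with a short sketch. It assumes the usual CASP definition (the GDT-TS of the best model in a target's pool minus the GDT-TS of the model the EMA method ranks first, averaged over targets); the function names and toy scores are hypothetical, and GDT-TS values are taken as precomputed on a 0 to 1 scale rather than computed from structures.

```python
from typing import Dict, List, Tuple

# Hypothetical helpers; GDT-TS values are assumed to be precomputed (0-1 scale).


def gdt_ts_loss(true_gdt: Dict[str, float], predicted: Dict[str, float]) -> float:
    """Per-target loss: best achievable GDT-TS minus the GDT-TS of the
    model that the accuracy estimator ranks first."""
    top_pick = max(predicted, key=predicted.get)  # model the EMA method would select
    return max(true_gdt.values()) - true_gdt[top_pick]


def average_loss(targets: List[Tuple[Dict[str, float], Dict[str, float]]]) -> float:
    """Mean per-target loss over all targets (lower is better, 0 is perfect)."""
    losses = [gdt_ts_loss(true_gdt, predicted) for true_gdt, predicted in targets]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy data: (true GDT-TS per model, predicted quality score per model).
    target_a = ({"m1": 0.80, "m2": 0.70, "m3": 0.65}, {"m1": 0.5, "m2": 0.9, "m3": 0.1})
    target_b = ({"m1": 0.55, "m2": 0.60}, {"m1": 0.2, "m2": 0.3})
    print(round(average_loss([target_a, target_b]), 3))  # 0.05
```

Under this assumed definition, an estimator that always ranks the truly best model first scores 0, so a reported average loss of 0.073 would mean the first-ranked models were, on average, 0.073 GDT-TS below the best model available for each target.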
20
Heo L, Park S, Seok C. GalaxyWater-wKGB: Prediction of Water Positions on Protein Structure Using wKGB Statistical Potential. J Chem Inf Model 2021; 61:2283-2293. [PMID: 33938216 DOI: 10.1021/acs.jcim.0c01434]
Abstract
Proteins fold and function in water, and protein-water interactions play important roles in protein structure and function. In computational studies on protein structure and interaction, the effect of water is considered either implicitly or explicitly. Implicit water models are frequently used in protein structure prediction and docking because they are computationally much more efficient than explicit water models, which are often employed in molecular dynamics (MD) simulations. However, implicit water models that treat water as a continuous solvent medium cannot account for specific atomistic protein-water interactions that are critical for structure formation and interactions with other molecules. Various methods for predicting water molecules that form specific atomistic interactions with proteins have been developed. Methods involving MD simulations or integral equation theory tend to produce more accurate results, at a higher computational cost, than simple geometry- or energy-based methods. Here, we present GalaxyWater-wKGB, a novel method for predicting water positions on a protein surface that is based on a statistical potential: a water knowledge-based potential based on the generalized Born model (wKGB). This method is accurate and rapid because, owing to the effective statistical treatment employed to derive the potential, it does not require conformational sampling or iterative computation. The statistical potential describes specific protein atom-water interactions more accurately than conventional potentials by considering the dependence on the degree of solvent accessibility of protein atoms as well as on protein atom-water distances and orientations. The introduction of solvent accessibility allows effective consideration of competing nonspecific protein-water and intraprotein interactions.
When tested on high-resolution protein crystal structures, this method recovered similar or larger fractions of crystallographic water molecules 180 times faster than the sophisticated integral equation theory method 3D-RISM. A web service for this water prediction method is freely available at http://galaxy.seoklab.org/wkgb.
Affiliation(s)
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Sangwoo Park
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
21
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021]
Abstract
We compare ten structural representations, either atomistic or coarse-grained, and build a distance-dependent statistical potential of mean force (PMF) for each of them. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through their Cα atoms in a Cβ-only representation yields higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice not to use all atoms is barely supported by data in the literature and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in either accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated whether the findings generalize to formalisms other than the widely used “potential of mean force” (PMF) method. We observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off when using a “total information gain” scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
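Distance-dependent statistical potentials of the kind compared above are commonly derived by inverse Boltzmann inversion, E(r) = -kT ln(f_obs(r) / f_ref(r)), where f_obs and f_ref are the observed and reference-state frequencies of a pair distance falling in bin r. The sketch below illustrates that general recipe for a single pair type in a Cβ-based representation, with glycine represented by its Cα. It assumes pre-binned distance histograms, a pseudocount for empty bins, 1 Å bins, and kT in arbitrary units. It is not the authors' implementation, and all function names are hypothetical.

```python
import math
from typing import List


def representative_atom(residue_name: str) -> str:
    """Cβ-based representation: glycine lacks a Cβ, so its Cα is used instead."""
    return "CA" if residue_name.upper() == "GLY" else "CB"


def distance_bin(distance: float, bin_width: float = 1.0, n_bins: int = 15) -> int:
    """Map a distance (Å) to a histogram bin, clamping long distances into the last bin."""
    return min(int(distance / bin_width), n_bins - 1)


def inverse_boltzmann_pmf(obs_counts: List[int], ref_counts: List[int],
                          kT: float = 1.0, pseudocount: float = 1.0) -> List[float]:
    """E(r) = -kT * ln(f_obs(r) / f_ref(r)) for each distance bin r."""
    if len(obs_counts) != len(ref_counts):
        raise ValueError("observed and reference histograms must use the same bins")
    n_bins = len(obs_counts)
    obs_total = sum(obs_counts) + pseudocount * n_bins
    ref_total = sum(ref_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(obs_counts, ref_counts):
        f_obs = (obs + pseudocount) / obs_total  # observed frequency for this pair type
        f_ref = (ref + pseudocount) / ref_total  # reference-state frequency
        energies.append(-kT * math.log(f_obs / f_ref))
    return energies


def score_structure(distances: List[float], pmf: List[float], bin_width: float = 1.0) -> float:
    """Sum the PMF terms over all pair distances of one model (lower = more native-like)."""
    return sum(pmf[distance_bin(d, bin_width, len(pmf))] for d in distances)


if __name__ == "__main__":
    pmf = inverse_boltzmann_pmf(obs_counts=[2, 10, 4], ref_counts=[5, 5, 5])
    print([round(e, 2) for e in pmf])
    print(representative_atom("GLY"), representative_atom("LEU"))  # CA CB
```

Bins where a distance is observed more often than in the reference state receive negative (favorable) energies. Training the reference state on all atom types, as recommended in the abstract, would correspond to building `ref_counts` from the pooled pair distances of every atom type.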
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
- Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
22
Ochoa R, Laskowski RA, Thornton JM, Cossio P. Impact of Structural Observables From Simulations to Predict the Effect of Single-Point Mutations in MHC Class II Peptide Binders. Front Mol Biosci 2021; 8:636562. [PMID: 34222328 PMCID: PMC8253603 DOI: 10.3389/fmolb.2021.636562] [Received: 12/01/2020] [Accepted: 02/15/2021]
Abstract
The prediction of peptide binders to Major Histocompatibility Complex (MHC) class II receptors is of great interest for studying autoimmune diseases and for vaccine development. Most approaches predict the affinities using sequence-based models trained on experimental data and multiple alignments from known peptide substrates. However, detecting activity differences caused by single-point mutations is a challenging task. In this work, we used interactions calculated from simulations to build scoring matrices for quickly estimating binding differences caused by single-point mutations. We modelled a set of 837 peptides bound to an MHC class II allele, and optimized the sampling of the conformations using the Rosetta backrub method by comparing the results to molecular dynamics simulations. From the dynamic trajectories of each complex, we averaged and compared structural observables for each amino acid at each position of the 9-mer peptide core region. With this information, we generated scoring matrices to predict the sign of the binding differences. We then compared the performance of the best scoring matrix to different computational methodologies that range in computational cost. Overall, the performance in predicting the activity differences caused by single-point mutated peptides was lower than 60% for all the methods. However, combining the developed scoring matrix with existing methods increased performance, up to 86% with a scoring method that uses molecular dynamics.
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Roman A Laskowski
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, United Kingdom
- Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia UdeA, Medellin, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
23
Dixit H, Kumar C S, Chaudhary R, Thaker D, Gadewal N, Dasgupta D. Role of Phosphorylation and Hyperphosphorylation of Tau in Its Interaction with βα Dimeric Tubulin Studied from a Bioinformatics Perspective. Avicenna J Med Biotechnol 2021; 13:24-34. [PMID: 33680370 PMCID: PMC7903436 DOI: 10.18502/ajmb.v13i1.4579]
Abstract
Background: Tau is a disordered microtubule-associated protein (MAP) that preferentially binds and stabilizes microtubules. Phosphorylation of tau enhances the tau-tubulin interaction, whereas hyperphosphorylation causes tau to detach from tubulin. The reasons for this destabilization and detachment, and the roles of the β subunit (of the microtubule) and the projection domain (of tau) in microtubule stability, remain elusive to date. A complete 3D structural investigation of the tau protein is therefore needed to address these questions, as the existing crystal structures are fragmentary and quite limited. Methods: In this study, the modelled human tau protein was subjected to phosphorylation and hyperphosphorylation, and the resulting forms were docked with microtubules (βα inter-dimer subunits) and vinblastine. Results: Phosphorylated tau protein interacts with both α and β subunits, but binds more strongly to α than to β subunits. Regarding the β subunit, the proline-rich loop and the projection domain actively participated in tau binding. Interestingly, hyperphosphorylation of tau increases MAP domain flexibility, which ultimately results in tau detachment, the main reason behind tangle formation in Alzheimer’s disease. Conclusion: This study, the first of its kind, emphasizes the role of the projection domain and the proline-rich region of the β subunit in stabilizing the tau-tubulin interaction, as well as the effect of hyperphosphorylation on protein-protein and protein-drug binding.
Affiliation(s)
- Hrushikesh Dixit
- Faculty of Biotechnology and Bioinformatics, D.Y. Patil Deemed to be University, CBD Belapur, Navi Mumbai, India
- Selvaa Kumar C
- Faculty of Biotechnology and Bioinformatics, D.Y. Patil Deemed to be University, CBD Belapur, Navi Mumbai, India
- Ruchi Chaudhary
- Faculty of Biotechnology and Bioinformatics, D.Y. Patil Deemed to be University, CBD Belapur, Navi Mumbai, India
- Divya Thaker
- Faculty of Biotechnology and Bioinformatics, D.Y. Patil Deemed to be University, CBD Belapur, Navi Mumbai, India
- Nikhil Gadewal
- Advanced Centre for Treatment, Research and Education in Cancer (ACTREC), Kharghar, Navi Mumbai, India
- Debjani Dasgupta
- Faculty of Biotechnology and Bioinformatics, D.Y. Patil Deemed to be University, CBD Belapur, Navi Mumbai, India
24
Sadat SM, Aghadadeghi MR, Yousefi M, Khodaei A, Sadat Larijani M, Bahramali G. Bioinformatics Analysis of SARS-CoV-2 to Approach an Effective Vaccine Candidate Against COVID-19. Mol Biotechnol 2021; 63:389-409. [PMID: 33625681 PMCID: PMC7902242 DOI: 10.1007/s12033-021-00303-0] [Accepted: 01/21/2021]
Abstract
The emerging Coronavirus Disease 2019 (COVID-19) pandemic has posed a serious threat to public health worldwide, demanding the urgent provision of a vaccine. Because SARS-CoV-2 is an RNA virus, its high mutation rate poses difficulties for vaccine design. Bioinformatics tools have been widely used to take advantage of conserved regions as well as immunogenicity. In this study, we aimed to evaluate the conservancy and immunogenicity of SARS-CoV-2 proteins using immunoinformatics in order to design a preventive vaccine candidate. Spike, Membrane and Nucleocapsid amino acid sequences were obtained, and four possible fusion proteins were assessed and compared in terms of structural features, immunogenicity, and population coverage. MHC-I and MHC-II T-cell epitopes, as well as linear and conformational B-cell epitopes, were evaluated. Among the predicted models, the truncated form of Spike fused with the M and N proteins via an AAY linker has a high rate of MHC-I and MHC-II epitopes with high antigenicity, and acceptable population coverage of 82.95% in Iran and 92.51% in Europe. This in silico study proposes truncated SARS-CoV-2 Spike-M-N as a potential preventive vaccine candidate for further in vivo evaluation.
Affiliation(s)
- Seyed Mehdi Sadat
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, 13165, Tehran, Iran
- Mohammad Reza Aghadadeghi
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, 13165, Tehran, Iran.
- Masoume Yousefi
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, 13165, Tehran, Iran
- Arezoo Khodaei
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, 13165, Tehran, Iran
- Mona Sadat Larijani
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, 13165, Tehran, Iran
- Golnaz Bahramali
- Department of Hepatitis and AIDS and Blood Borne Diseases, Pasteur Institute of Iran, No: 69, Pasteur Ave, 13165, Tehran, Iran.
25
Guest JD, Vreven T, Zhou J, Moal I, Jeliazkov JR, Gray JJ, Weng Z, Pierce BG. An expanded benchmark for antibody-antigen docking and affinity prediction reveals insights into antibody recognition determinants. Structure 2021; 29:606-621.e5. [PMID: 33539768 DOI: 10.1016/j.str.2021.01.005] [Received: 03/08/2020] [Revised: 11/15/2020] [Accepted: 01/11/2021]
Abstract
Accurate predictive modeling of antibody-antigen complex structures and structure-based antibody design remain major challenges in computational biology, with implications for biotherapeutics, immunity, and vaccines. Through a systematic search for high-resolution structures of antibody-antigen complexes and unbound antibody and antigen structures, in conjunction with identification of experimentally determined binding affinities, we have assembled a non-redundant set of test cases for antibody-antigen docking and affinity prediction. This benchmark more than doubles the number of antibody-antigen complexes and corresponding affinities available in our previous benchmarks, providing an unprecedented view of the determinants of antibody recognition and insights into molecular flexibility. Initial assessments of docking and affinity prediction tools highlight the challenges posed by this diverse set of cases, which includes camelid nanobodies, therapeutic monoclonal antibodies, and broadly neutralizing antibodies targeting viral glycoproteins. This dataset will enable development of advanced predictive modeling and design methods for this therapeutically relevant class of protein-protein interactions.
Affiliation(s)
- Johnathan D Guest
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA; Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
- Thom Vreven
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Jing Zhou
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Iain Moal
- Computational Sciences, GlaxoSmithKline Research and Development, Stevenage SG1 2NY, UK
- Jeliazko R Jeliazkov
- Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA; Program in Molecular Biophysics, Johns Hopkins University, Baltimore, MD 21218, USA
- Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Brian G Pierce
- University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD 20850, USA; Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA

26
Hernandez R, Facelli JC. Understanding protein structural changes for oncogenic missense variants. Heliyon 2021; 7:e06013. [PMID: 33553733 PMCID: PMC7846930 DOI: 10.1016/j.heliyon.2021.e06013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/14/2020] [Revised: 08/20/2020] [Accepted: 01/15/2021] [Indexed: 12/31/2022]
Abstract
Understanding and predicting changes in protein structure and function upon mutation, and their relationship to human health, is a critical element in translating the genomic revolution into actionable interventions. It is therefore pertinent to explore how mutations result in structural changes leading to pathogenic proteins, but because of the protein structural knowledge gap, experimental approaches are lacking. Protein structure prediction methods, such as I-TASSER, have made it possible to predict the structure of a given amino acid sequence, opening a new way to explore protein structure changes upon mutation when experimental information is not available. Using known mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC) and ClinVar databases, we compare predicted structure-derived properties of wild-type (WT) and mutated proteins and find differences between the local and global 3D protein structures of the WT and the mutants. The studies in this relatively small sample reveal that the structural changes are quite diverse.
Affiliation(s)
- Rolando Hernandez
- Department of Biomedical Informatics and Center for Clinical and Translational Science, The University of Utah, Salt Lake City, Utah, USA
- Julio C. Facelli
- Department of Biomedical Informatics and Center for Clinical and Translational Science, The University of Utah, Salt Lake City, Utah, USA

27
Akhter N, Chennupati G, Djidjev H, Shehu A. Decoy selection for protein structure prediction via extreme gradient boosting and ranking. BMC Bioinformatics 2020; 21:189. [PMID: 33297949 PMCID: PMC7724862 DOI: 10.1186/s12859-020-3523-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 04/15/2020] [Accepted: 04/29/2020] [Indexed: 11/10/2022]
Abstract
Background: Identifying one or more biologically active/native decoys among millions of non-native decoys is one of the major challenges in computational structural biology. The extreme imbalance between positive and negative samples (native and non-native decoys) in a decoy set makes the problem even more complicated. Consensus methods show varied success in handling the challenge of decoy selection, despite issues associated with clustering large decoy sets and decoy sets that do not show much structural similarity. Recent investigations into energy landscape-based decoy selection approaches show promise. However, lack of generalization over varied test cases remains a bottleneck for these methods.
Results: We propose a novel decoy selection method, ML-Select, a machine learning framework that exploits the energy landscape associated with the structure space probed through template-free decoy generation. The proposed method outperforms both clustering and energy ranking-based methods while consistently offering better performance on varied test cases. Moreover, ML-Select shows promising results even for decoy sets consisting mostly of low-quality decoys.
Conclusions: ML-Select is a useful method for decoy selection. This work motivates further research into more effective ways of adopting machine learning frameworks to achieve robust performance for decoy selection in template-free protein structure prediction.
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA
- Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Bikini Atoll Rd., Los Alamos, 87545, USA
- Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Bikini Atoll Rd., Los Alamos, 87545, USA
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, 22030, VA, USA; Department of Bioengineering, George Mason University, Fairfax, 22030, VA, USA; School of Systems Biology, George Mason University, Manassas, 20110, VA, USA

28
Bhattacharya S, Sah PP, Banerjee A, Ray S. Structural impact due to PPQEE deletion in multiple cancer associated protein - Integrin αV: An In silico exploration. Biosystems 2020; 198:104216. [DOI: 10.1016/j.biosystems.2020.104216] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/11/2020] [Revised: 07/22/2020] [Accepted: 07/27/2020] [Indexed: 12/12/2022]

29
Amalgamation of 3D structure and sequence information for protein-protein interaction prediction. Sci Rep 2020; 10:19171. [PMID: 33154416 PMCID: PMC7645622 DOI: 10.1038/s41598-020-75467-x] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/18/2020] [Accepted: 09/17/2020] [Indexed: 11/08/2022]
Abstract
Protein is the primary building block of living organisms. Proteins interact with other proteins and are thereby involved in various biological processes. Protein-protein interactions (PPIs) help in understanding the functionality of proteins, the causes and progression of diseases, and the design of new drugs. However, there is a vast gap between the number of available protein sequences and the number of identified protein-protein interactions. To bridge this gap, researchers have proposed several computational methods to reveal the interactions between proteins. These methods depend solely on sequence-based information about proteins. With advances in technology, other types of protein information, such as 3D structures, have become available. Deep learning techniques are now applied successfully in various domains, including bioinformatics. The current work therefore combines different modalities, namely 3D structures and sequence-based information of proteins, with deep learning algorithms to predict PPIs. The proposed approach is divided into several phases. We first obtain several representations of each protein from its 3D coordinate information and three amino acid attributes: hydropathy index, isoelectric point, and charge. Amino acids are the building blocks of proteins. A pre-trained ResNet50 model, a convolutional neural network architecture, is used to extract features from these representations. Autocovariance and conjoint triad, two widely used sequence-based encodings, are used here as a second modality. A stacked autoencoder is used to obtain a compact form of the sequence-based information. Finally, the features obtained from the different modalities are concatenated in pairs and fed into a classifier to predict labels for protein pairs.
We have experimented on the human PPI dataset and the Saccharomyces cerevisiae PPI dataset and compared our results with state-of-the-art deep-learning-based classifiers. The results achieved by the proposed method are superior to those obtained by the existing methods. Extensive experiments on different datasets indicate that our approach of learning and combining features from two different modalities is useful for PPI prediction.
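The conjoint triad encoding named in this abstract can be illustrated with a short sketch. It is not the authors' code: it assumes the seven-class amino-acid grouping commonly used for this descriptor and normalizes each triad count as (f - min) / max; the function name `conjoint_triad` is hypothetical.

```python
from itertools import product

# Assumed seven-class grouping of the 20 standard amino acids
# (commonly used for conjoint triad encoding; based on dipoles and side-chain volumes).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CLASS_OF = {aa: i for i, group in enumerate(CLASSES) for aa in group}

# All 7**3 = 343 ordered class triads, in a fixed order.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(len(CLASSES)), repeat=3))}


def conjoint_triad(sequence: str) -> list:
    """Return a 343-dimensional conjoint triad vector (hypothetical helper).

    Every window of three consecutive residues is mapped to its class triad
    and counted; counts are then normalized as (f - min) / max.
    Windows containing a non-standard residue are skipped.
    """
    counts = [0] * len(TRIAD_INDEX)
    classes = [CLASS_OF.get(aa) for aa in sequence.upper()]
    for a, b, c in zip(classes, classes[1:], classes[2:]):
        if a is None or b is None or c is None:
            continue
        counts[TRIAD_INDEX[(a, b, c)]] += 1
    hi, lo = max(counts), min(counts)
    if hi == 0:
        return [0.0] * len(counts)  # too short, or no valid windows
    return [(f - lo) / hi for f in counts]


if __name__ == "__main__":
    vec = conjoint_triad("MKTAYIAKQR")
    print(len(vec), sum(vec))
```

Because every count lies between the minimum and the maximum, the normalized values stay in [0, 1] regardless of sequence length, which makes vectors from proteins of different sizes directly comparable.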

30
Chen X, Song S, Ji J, Tang Z, Todo Y. Incorporating a multiobjective knowledge-based energy function into differential evolution for protein structure prediction. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.06.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/17/2023]

31
Chang RL, Stanley JA, Robinson MC, Sher JW, Li Z, Chan YA, Omdahl AR, Wattiez R, Godzik A, Matallana-Surget S. Protein structure, amino acid composition and sequence determine proteome vulnerability to oxidation-induced damage. EMBO J 2020; 39:e104523. [PMID: 33073387 PMCID: PMC7705453 DOI: 10.15252/embj.2020104523] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 03/09/2020] [Revised: 09/16/2020] [Accepted: 09/22/2020] [Indexed: 02/05/2023]
Abstract
Oxidative stress alters cell viability, from microorganism irradiation sensitivity to human aging and neurodegeneration. The deleterious effects of protein carbonylation by reactive oxygen species (ROS) make understanding the molecular properties that determine ROS susceptibility essential. The radiation‐resistant bacterium Deinococcus radiodurans accumulates less carbonylation than sensitive organisms, making it a key model for deciphering the properties governing oxidative stress resistance. We integrated shotgun redox proteomics, structural systems biology, and machine learning to resolve the properties determining protein damage by γ‐irradiation in Escherichia coli and D. radiodurans at multiple scales. Local accessibility, charge, and lysine enrichment accurately predict ROS susceptibility. Lysine, methionine, and cysteine usage also contribute to the ROS resistance of the D. radiodurans proteome. Our model predicts that proteome maintenance machinery and proteins protecting against ROS are more resistant in D. radiodurans. Our findings substantiate that protein‐intrinsic protection impacts oxidative stress resistance, identifying causal molecular properties.
Affiliation(s)
- Roger L Chang
- Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- Julian A Stanley
- Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA
- Matthew C Robinson
- Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA
- Joel W Sher
- Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA
- Zhanwen Li
- Division of Biomedical Sciences, University of California Riverside School of Medicine, Riverside, CA, USA
- Yujia A Chan
- Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA; Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- Ashton R Omdahl
- Department of Systems Biology, Blavatnik Institute at Harvard Medical School, Boston, MA, USA
- Ruddy Wattiez
- Department of Proteomics and Microbiology, Research Institute for Biosciences, University of Mons, Mons, Belgium
- Adam Godzik
- Division of Biomedical Sciences, University of California Riverside School of Medicine, Riverside, CA, USA
- Sabine Matallana-Surget
- Division of Biological and Environmental Sciences, Faculty of Natural Sciences, University of Stirling, Stirling, UK

32
Kulandaisamy A, Zaucha J, Frishman D, Gromiha MM. MPTherm-pred: Analysis and Prediction of Thermal Stability Changes upon Mutations in Transmembrane Proteins. J Mol Biol 2020; 433:166646. [PMID: 32920050 DOI: 10.1016/j.jmb.2020.09.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 08/15/2020] [Revised: 09/04/2020] [Accepted: 09/04/2020] [Indexed: 01/06/2023]
Abstract
The stability of membrane proteins differs from that of globular proteins due to the presence of nonpolar membrane-spanning regions. Using a dataset of 929 membrane protein mutations whose effects on thermal stability (ΔTm) were experimentally determined, we found that the average ΔTm values due to 190 stabilizing and 232 destabilizing mutations occurring in membrane-spanning regions are 2.43(3.1) °C and -5.48(5.5) °C, respectively. The corresponding ΔTm values for mutations occurring in solvent-exposed regions are 2.56(2.82) °C and -6.8(7.2) °C. We have systematically analyzed the factors influencing the stability of mutants and observed that changes in hydrophobicity, the number of contacts between Cα atoms, and the frequency of aliphatic residues are important determinants of the stability change induced by mutations occurring in membrane-spanning regions. We have developed structure- and sequence-based machine learning predictors of ΔTm due to mutations specifically for membrane proteins. They showed a correlation and mean absolute error (MAE) of 0.72 and 2.85 °C, respectively, between experimental and predicted ΔTm for mutations in membrane-spanning regions on 10-fold group-wise cross-validation. The average correlation and MAE for mutations in aqueous regions are 0.73 and 3.7 °C, respectively. These MAE values are about 50% lower than the standard deviations of the ΔTm values about their means. The reliability of the method was affirmed on a test set of mutations occurring in evolutionarily independent protein sequences. The developed MPTherm-pred server for predicting thermal stability changes upon mutations in membrane proteins is available at https://web.iitm.ac.in/bioinfo2/mpthermpred/. Our results provide insights into factors influencing the stability of membrane proteins and can aid in designing mutants that are more resistant to thermal stress.
Affiliation(s)
- A Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Jan Zaucha
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Dmitrij Frishman
- Department of Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany; Department of Bioinformatics, Peter the Great St. Petersburg Polytechnic University, St. Petersburg, Russian Federation
- M Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India

33
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022]
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e., PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy since the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. As a proof of concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve on the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
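For reference, the inverse-Boltzmann formalism that this abstract argues against is usually written, for a distance-dependent pair potential, as below. Here P^obs_ab(r) is the observed frequency of atom types a and b at distance r in known structures, P^ref_ab(r) is the frequency expected under a chosen reference state, k_B is the Boltzmann constant, T is a temperature, a_i is the type of atom i, and a model's score is summed over its atom pairs at distances r_ij. This is only the conventional statistical-PMF form, not the information-gain score the authors propose.

```latex
E_{ab}(r) = -k_B T \ln \frac{P^{\text{obs}}_{ab}(r)}{P^{\text{ref}}_{ab}(r)},
\qquad
E_{\text{model}} = \sum_{i<j} E_{a_i a_j}(r_{ij})
```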
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France; Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France; Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Pierre Tufféry
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France

34
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. Comparing the energy terms obtained from our encoding with those from Amber, we find energy differences within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy.
Through inclusion of best-decoy-to-decoy comparisons in building our RF models, we were able to generate models that outperformed the scoring functions tested herein in both accuracy and best decoy detection, again showing the performance and flexibility of our RF models in tackling this problem. Finally, the importance of the RF algorithm and of the force field parameters was also tested, and the comparison results suggest that both are important, with the ML scoring function achieving its best performance only when they are combined. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States

35
Tanemura KA, Pei J, Merz KM. Refinement of pairwise potentials via logistic regression to score protein-protein interactions. Proteins 2020; 88:1559-1568. [PMID: 32729132 DOI: 10.1002/prot.25973] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/25/2020] [Revised: 05/17/2020] [Accepted: 06/14/2020] [Indexed: 12/20/2022]
Abstract
Protein-protein interactions (PPIs) are ubiquitous and of great functional importance in biological systems. Hence, accurate prediction of PPIs by protein-protein docking and scoring tools is highly desirable in order to characterize their structure and biological function. Ab initio docking protocols consist of two stages: sampling docking poses to produce at least one near-native structure, and then evaluating the large number of candidate structures by scoring. Concurrent development in both sampling and scoring is crucial for the deployment of protein-protein docking software. In the present work, we apply a machine learning model to pairwise potentials to improve the detection of native protein quaternary structures among decoys. A decoy set was featurized using the Knowledge and Empirical Combined Scoring Algorithm 2 (KECSA2) pairwise potential. The highly unbalanced decoy set was then balanced using a comparison concept between native and decoy structures. The resulting comparison descriptors were used to train a logistic regression (LR) classifier. The LR model yielded the best performance for native detection among decoys compared with conventional scoring functions, while performing less well at detecting low root mean square deviation decoy structures. Its deployment on an independent benchmark set confirms that the scoring function performs competitively relative to other scoring functions. The scripts used are available at https://github.com/TanemuraKiyoto/PPI-native-detection-via-LR.
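One way to read the "comparison concept" used to balance the decoy set is sketched below. This is an assumption for illustration, not the authors' scripts: each native/decoy pair becomes two samples, the feature difference native minus decoy (label 1) and decoy minus native (label 0), so the classes are exactly balanced however many decoys there are. The helper names and the toy two-dimensional feature vectors are hypothetical stand-ins for KECSA2-derived descriptors, and the classifier is a plain batch-gradient-descent logistic regression.

```python
import math


def comparison_pairs(native, decoys):
    """Turn one native and many decoys into a class-balanced set of difference vectors."""
    X, y = [], []
    for decoy in decoys:
        diff = [n - d for n, d in zip(native, decoy)]
        X.append(diff)
        y.append(1)  # ordered pair (native, decoy)
        X.append([-v for v in diff])
        y.append(0)  # ordered pair (decoy, native)
    return X, y


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights by batch gradient descent.

    No bias term: the samples come in antisymmetric pairs (x, 1) and (-x, 0),
    so an optimal intercept is zero.
    """
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))  # predicted P(label = 1)
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def score(w, features):
    """Higher score = more native-like under the learned weights."""
    return sum(wj * xj for wj, xj in zip(w, features))


if __name__ == "__main__":
    native = [1.0, 0.2]                            # toy feature vector
    decoys = [[0.2, 0.9], [0.4, 0.5], [0.1, 0.1]]  # toy decoy features
    X, y = comparison_pairs(native, decoys)
    w = train_logistic(X, y)
    ranked = sorted([native] + decoys, key=lambda f: score(w, f), reverse=True)
    print(ranked[0] == native)
```

Because the model is trained on differences, the learned weights can score each structure independently; ranking by that score is what native detection among decoys amounts to in this reading.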
Affiliation(s)
- Kiyoto A Tanemura
- Department of Chemistry, Michigan State University, East Lansing, Michigan, USA
- Jun Pei
- Department of Chemistry, Michigan State University, East Lansing, Michigan, USA
- Kenneth M Merz
- Department of Chemistry, Michigan State University, East Lansing, Michigan, USA

36
Bi J, Chen S, Zhao X, Nie Y, Xu Y. Computation-aided engineering of starch-debranching pullulanase from Bacillus thermoleovorans for enhanced thermostability. Appl Microbiol Biotechnol 2020; 104:7551-7562. [PMID: 32632476 DOI: 10.1007/s00253-020-10764-z] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/16/2020] [Revised: 06/17/2020] [Accepted: 06/30/2020] [Indexed: 12/26/2022]
Abstract
Pullulanases are widely used in food, medicine, and other industries because they specifically hydrolyze α-1,6-glycosidic linkages in starch and oligosaccharides. In addition, high-temperature thermostable pullulanase has multiple advantages, including decreasing saccharification solution viscosity, accompanied by enhanced mass transfer, and reducing microbial contamination in starch hydrolysis. However, thermophilic pullulanase availability remains limited, and most available enzymes do not meet starch-manufacturing requirements due to weak thermostability. Here, we developed a computation-aided strategy to engineer the thermophilic pullulanase from Bacillus thermoleovorans. First, three computational design predictors (FoldX, I-Mutant 3.0, and dDFIRE) were combined to predict stability changes introduced by mutations. After excluding conserved and catalytic sites, 17 mutants were identified. After further experimental verification, we confirmed six positive mutants. Among them, the G692M mutant had the highest thermostability improvement, with a 3.8 °C increase in Tm and a 2.1-fold longer half-life than the wild type at 70 °C. We then characterized the mechanisms underlying the increased thermostability, such as rigidity enhancement, closer conformation, and strengthened motion correlation, using root mean square fluctuation (RMSF), principal component analysis (PCA), dynamic cross-correlation map (DCCM), and free energy landscape (FEL) analysis.
KEY POINTS:
• A computation-aided strategy was developed to engineer pullulanase thermostability.
• Seventeen mutants were identified by combining three computational design predictors.
• The G692M mutant was obtained with increased Tm and half-life at 70 °C.
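RMSF, the first of the analyses listed in this abstract, measures how far each atom fluctuates about its average position over a trajectory: for atom i over T frames, RMSF_i = sqrt((1/T) * sum over frames t of |x_i(t) - <x_i>|^2). The pure-Python sketch below assumes the frames have already been superposed onto a common reference (the usual preprocessing step) and are stored as lists of (x, y, z) tuples; the function name and the toy trajectory are hypothetical, and in practice a trajectory-analysis package would normally be used.

```python
import math


def rmsf(frames):
    """Per-atom root mean square fluctuation.

    frames: list of T frames, each a list of N (x, y, z) coordinates,
    assumed already aligned to a common reference structure.
    Returns N RMSF values in the same length unit as the coordinates.
    """
    n_frames = len(frames)
    values = []
    for i in range(len(frames[0])):
        # Average position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation of atom i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


if __name__ == "__main__":
    # Toy two-frame trajectory: atom 0 is fixed, atom 1 moves +/-1 along x.
    frames = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
    ]
    print(rmsf(frames))  # [0.0, 1.0]
```

Lower values mark less flexible positions, which is how RMSF comparisons between a mutant and the wild type are typically read as changes in rigidity.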
Affiliation(s)
- Jiahua Bi
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- Shuhui Chen
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- Xianghan Zhao
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China
- Yao Nie
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China; Suqian Industrial Technology Research Institute of Jiangnan University, Suqian, 223814, China
- Yan Xu
- School of Biotechnology and Key Laboratory of Industrial Biotechnology, Ministry of Education, Jiangnan University, Wuxi, 214122, China; State Key Laboratory of Food Science and Technology, Jiangnan University, Wuxi, 214122, China

37
Prediction of Protein Tertiary Structure via Regularized Template Classification Techniques. Molecules 2020; 25:molecules25112467. [PMID: 32466409 PMCID: PMC7321371 DOI: 10.3390/molecules25112467] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/23/2020] [Revised: 05/21/2020] [Accepted: 05/22/2020] [Indexed: 11/24/2022]
Abstract
We discuss the use of regularized linear discriminant analysis (LDA) as a model reduction technique combined with particle swarm optimization (PSO) in protein tertiary structure prediction, followed by structure refinement based on singular value decomposition (SVD) and PSO. The algorithm presented in this paper belongs to the category of template-based modeling. The algorithm performs a preselection of protein templates before constructing a lower-dimensional subspace via a regularized LDA. The protein coordinates in the reduced space are sampled using a highly explorative optimization algorithm, regressive–regressive PSO (RR-PSO). The obtained structure is then projected onto a reduced space via singular value decomposition and further optimized via RR-PSO to carry out a structure refinement. The final structures are similar to those predicted by the best structure prediction tools, such as the Rosetta and Zhang servers. The main advantage of our methodology is that it alleviates the ill-posed character of protein structure prediction problems related to high-dimensional optimization. It is also capable of sampling a wide range of conformational space due to the application of a regularized linear discriminant analysis, which allows us to expand the differences over a reduced basis set.

38
Chen J, Siu SWI. Machine Learning Approaches for Quality Assessment of Protein Structures. Biomolecules 2020; 10:biom10040626. [PMID: 32316682 PMCID: PMC7226485 DOI: 10.3390/biom10040626] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022]
Abstract
Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determination of protein structure is prohibitively costly and time-consuming, and computational prediction of protein structures has not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, ML-based EMA methods have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by their employed ML approach (support vector machines, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodological viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML/DL techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.

39
Bordbar A, Bagheri KP, Ebrahimi S, Parvizi P. Bioinformatics analyses of immunogenic T-cell epitopes of LeIF and PpSP15 proteins from Leishmania major and sand fly saliva used as model antigens for the design of a multi-epitope vaccine to control leishmaniasis. Infect Genet Evol 2020; 80:104189. [PMID: 31931259 DOI: 10.1016/j.meegid.2020.104189] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 08/22/2019] [Revised: 01/05/2020] [Accepted: 01/08/2020] [Indexed: 11/17/2022]
Abstract
Leishmaniasis is caused by protozoan parasites belonging to 20 Leishmania species. This infectious disease is transmitted by the bites of infected phlebotomine sand flies and is widespread in 97 countries throughout the world. No preventive or effective vaccine has yet been developed. In this study, diverse computational methods were integrated to calculate the evolutionary divergence, immunogenicity, IFN-γ production, epitope conservancy, and population coverage of fusion-protein models of LeIF-SP15, named SaLeish. The immunogenicity of LeIF from Leishmania species and SP15 from sand fly saliva had not previously been investigated in silico in fusion form. A complete set of high-affinity 9-mer MHC class I and 15-mer MHC class II peptides was identified among the antigenic epitopes of SaLeish that could induce specific CD8+ and CD4+ T-cell responses in BALB/c mice and humans. Our preferred approach was to identify a truncated fragment of SaLeish, rather than the full-length protein, capable of triggering a specific immune response. Phylogenetic analysis showed that the LeIF protein is under balancing selection and is conserved among Leishmania species. The selected SaLeish model contained 19 and 35 antigenic peptides for MHC class I and II, respectively, with strong binding affinity to highly frequent HLA-I and HLA-II alleles. Analysis of class I CTL epitopes showed that the promiscuous peptides KSLKADIRK, MSCIPHCKY, LQAGVIVAV, and YQYYGFVAM have greater affinity for HLA-A*01:01, HLA-A*02 (03, 06), HLA-A*30:02, HLA-B*40:01, and HLA-B*52:01 molecules. Population coverage in the range of 78-85% confirmed SaLeish-Model4 as an appropriate vaccine candidate for the Persian, South Asian, European, and North American populations. In addition, the predicted MHC class II antigenic epitopes AKPEIRTFSNVLIKY, TRVQDDLRKLQAGVI, and VALFSATMPEEVLEL showed a strong ability to induce IFN-γ production toward Th1-biased DTH responses.
The findings of this investigation warrant future experimental assessment of SaLeish as a promising prophylactic vaccine capable of enhancing both innate and specific cellular immune responses.
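Population-coverage figures such as 78-85% are typically estimated from HLA allele frequencies. A minimal sketch, assuming Hardy-Weinberg equilibrium at each locus (an individual is uncovered only if neither of their two allele copies is bound by an epitope) and, as a further simplification, independence between loci. The allele frequencies are hypothetical placeholders, not real population data, and this is not claimed to be the exact procedure of any particular tool.

```python
def locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at one HLA locus.

    Assumes Hardy-Weinberg equilibrium: with cumulative allele frequency f,
    the chance that neither of an individual's two copies is covered is (1 - f)**2.
    """
    f = sum(allele_freqs)
    if not 0.0 <= f <= 1.0:
        raise ValueError("cumulative allele frequency must lie in [0, 1]")
    return 1.0 - (1.0 - f) ** 2

def combined_coverage(per_locus):
    """Combine loci assuming independence (a simplifying assumption)."""
    uncovered = 1.0
    for c in per_locus:
        uncovered *= 1.0 - c
    return 1.0 - uncovered

# Hypothetical frequencies of alleles bound by at least one epitope (not real data).
hla_a = locus_coverage([0.20, 0.10])   # f = 0.30 -> 1 - 0.7**2 = 0.51
hla_b = locus_coverage([0.05, 0.05])   # f = 0.10 -> 1 - 0.9**2 = 0.19
print(round(combined_coverage([hla_a, hla_b]), 4))  # 0.6031
```

Squaring (1 - f) reflects the two allele copies per individual, which is why coverage rises faster than the raw allele frequency.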
Affiliation(s)
- Ali Bordbar
- Molecular Systematics Laboratory, Parasitology Department, Microbiology Research Center, Pasteur Institute of Iran, Tehran, Iran.
- Kamran Pooshang Bagheri
- Venom and Biotherapeutics Molecules Lab., Biotechnology Dept., Biotechnology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Sahar Ebrahimi
- Molecular Systematics Laboratory, Parasitology Department, Microbiology Research Center, Pasteur Institute of Iran, Tehran, Iran
- Parviz Parvizi
- Molecular Systematics Laboratory, Parasitology Department, Microbiology Research Center, Pasteur Institute of Iran, Tehran, Iran.
40
Residual Participation and Thermodynamic Stability Due to Molecular Interactions in IL11, IL11Rα and Gp130 from Homo sapiens: An In Silico Outlook for IL11 as a Therapeutic Remedy. Int J Pept Res Ther 2019. [DOI: 10.1007/s10989-019-09996-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
41
Cai Y, Li X, Sun Z, Lu Y, Zhao H, Hanson J, Paliwal K, Litfin T, Zhou Y, Yang Y. SPOT-Fold: Fragment-Free Protein Structure Prediction Guided by Predicted Backbone Structure and Contact Map. J Comput Chem 2019; 41:745-750. [PMID: 31845383 DOI: 10.1002/jcc.26132] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/22/2019] [Revised: 10/07/2019] [Accepted: 12/01/2019] [Indexed: 02/01/2023]
Abstract
Protein structure determination has been one of the most challenging problems in molecular biology for the past 60 years. Here we present an ab initio protein tertiary-structure prediction method assisted by contact maps predicted by SPOT-Contact and dihedral angles predicted by SPIDER 3. These predicted properties are then fed to the Crystallography and NMR System (CNS) for restrained structure modeling. The resulting structures are first evaluated by the potential energy calculated by CNS and then by the dDFIRE energy function for model selection. The method, called SPOT-Fold, was tested on 241 CASP targets of 67 to 670 amino acid residues and on 60 randomly selected globular proteins of fewer than 100 amino acids. Its accuracy is comparable to that of other contact-map-based modeling techniques. © 2019 Wiley Periodicals, Inc.
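The abstract describes a two-stage selection: CNS potential energy first, then dDFIRE. It does not say how the two stages are combined, so the sketch below assumes a simple filter-then-rank scheme: keep the lowest-energy fraction of models by CNS energy (the `keep_fraction` parameter is hypothetical) and choose the survivor with the lowest dDFIRE score. Energies are plain numbers supplied by the caller; no CNS or dDFIRE program is invoked.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    cns_energy: float  # CNS potential energy (lower is better)
    dfire: float       # dDFIRE statistical energy (lower is better)

def select_model(models, keep_fraction=0.2):
    """Filter by CNS energy, then pick the survivor with the lowest dDFIRE score."""
    if not models:
        raise ValueError("no models to select from")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    by_energy = sorted(models, key=lambda m: m.cns_energy)
    n_keep = max(1, int(len(by_energy) * keep_fraction))
    return min(by_energy[:n_keep], key=lambda m: m.dfire)

models = [
    Model("m1", -1200.0, -310.0),
    Model("m2", -1500.0, -290.0),
    Model("m3", -1450.0, -330.0),
    Model("m4", -900.0, -400.0),  # best dDFIRE, but removed by the CNS filter
]
print(select_model(models, keep_fraction=0.5).name)  # m3
```

The filter step keeps physically implausible models (high CNS energy) out of the statistical-potential ranking; with `keep_fraction=1.0` the selection reduces to dDFIRE alone.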
Affiliation(s)
- Yufeng Cai
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Xiongjun Li
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Zhe Sun
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Yutong Lu
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510000, China
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland, 4122, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland, 4122, Australia
- Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, 4222, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, 4222, Australia
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
42
Zhang T, Hu G, Yang Y, Wang J, Zhou Y. All-Atom Knowledge-Based Potential for RNA Structure Discrimination Based on the Distance-Scaled Finite Ideal-Gas Reference State. J Comput Biol 2019; 27:856-867. [PMID: 31638408 DOI: 10.1089/cmb.2019.0251] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/18/2022]
Abstract
Noncoding RNAs are increasingly found to play a wide variety of roles in living organisms. Yet their functional mechanisms are poorly understood because their structures are difficult to determine experimentally. As a result, developing more effective computational techniques to predict RNA structures has become an increasingly urgent task. One key challenge in RNA structure prediction is the lack of an accurate free energy function to guide RNA folding and to discriminate native and near-native structures from decoy conformations. In this study, we developed an all-atom, distance-dependent, knowledge-based energy function for RNA built on a reference state (the distance-scaled finite ideal-gas reference state, DFIRE) that has proven successful for protein structure discrimination. Using four separate benchmarks, including RNA-Puzzles, we found that this DFIRE-based RNA statistical energy function discriminates native and near-native structures from decoys with performance comparable to or better than that of several existing scoring functions. The energy function is expected to be useful for improving the detection of near-native RNA structures.
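The DFIRE reference state scales the expected number of atom pairs at distance r as r^α relative to a cutoff bin. The sketch below follows the functional form published for the original protein DFIRE potential, ū(i,j,r) = -ηRT ln[N_obs(i,j,r) / ((r/r_cut)^α (Δr/Δr_cut) N_obs(i,j,r_cut))], for a single atom-type pair. The default α = 1.61 is the protein value; the parameters used for the RNA potential in this paper may differ, and the prefactor ηRT is folded into a hypothetical `scale` argument.

```python
import math

def dfire_potential(bins, alpha=1.61, scale=1.0):
    """DFIRE-style distance-dependent energies for one atom-type pair.

    bins: (r, dr, n_obs) tuples in increasing r, where r is the bin centre,
    dr the bin width and n_obs the observed pair count. The LAST bin is taken
    as the cutoff bin (r_cut, dr_cut, N_obs(r_cut)) that defines the reference.
    scale stands in for eta*R*T; alpha=1.61 is the published protein value.
    Returns one energy per bin, or None for bins with no observations.
    """
    r_cut, dr_cut, n_cut = bins[-1]
    if n_cut <= 0:
        raise ValueError("cutoff bin must have a positive count")
    energies = []
    for r, dr, n_obs in bins:
        if n_obs == 0:
            energies.append(None)  # log(0) undefined; the caller decides how to cap it
            continue
        # Expected count under the distance-scaled finite ideal-gas reference state.
        expected = (r / r_cut) ** alpha * (dr / dr_cut) * n_cut
        # scale * ln(expected / n_obs) equals -scale * ln(n_obs / expected).
        energies.append(scale * math.log(expected / n_obs))
    return energies

# Hypothetical counts for one atom-type pair (not real statistics).
bins = [(3.75, 0.5, 40), (6.25, 0.5, 0), (10.25, 0.5, 180), (14.25, 0.5, 300)]
for (r, _, _), e in zip(bins, dfire_potential(bins)):
    print(r, e)
```

A structure's total DFIRE-style energy is then the sum of these bin energies over all atom pairs closer than r_cut, looked up by atom-type pair and distance bin; pairs at or beyond r_cut contribute zero.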
Affiliation(s)
- Tongchuan Zhang
- Institute for Glycomics, School of Informatics and Communication Technology, Griffith University, Southport, Australia
- Guodong Hu
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, Guangzhou, China
- Jihua Wang
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
- Yaoqi Zhou
- Institute for Glycomics, School of Informatics and Communication Technology, Griffith University, Southport, Australia
- Shandong Provincial Key Laboratory of Biophysics, Institute of Biophysics, Dezhou University, Dezhou, China
43
Akhter N, Chennupati G, Kabir KL, Djidjev H, Shehu A. Unsupervised and Supervised Learning over the Energy Landscape for Protein Decoy Selection. Biomolecules 2019; 9:E607. [PMID: 31615116 PMCID: PMC6843838 DOI: 10.3390/biom9100607] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Revised: 10/03/2019] [Accepted: 10/04/2019] [Indexed: 11/17/2022]
Abstract
The energy landscape that organizes the microstates of a molecular system and governs the underlying molecular dynamics exposes the relationship between molecular form (structure), changes to form, and biological activity or function in the cell. However, several challenges stand in the way of leveraging energy landscapes to relate structure and structural dynamics to function. Energy landscapes are high-dimensional, multi-modal, and often overly rugged. Deep wells or basins in them do not always correspond to stable structural states; instead, they may result from inherent inaccuracies in semi-empirical molecular energy functions. Because of these challenges, energetics is typically ignored in computational approaches to long-standing central questions in computational biology, such as protein decoy selection. In decoy selection, the goal is to identify, among a possibly large number of computationally generated three-dimensional structures of a protein, those that are biologically active (native). In recent work, we have refocused our attention on the protein energy landscape and its role in advancing decoy selection. Here, we summarize some of our successes so far in this direction via unsupervised learning. More importantly, we further argue that the energy landscape holds valuable information that can advance the state of protein decoy selection through novel machine learning methods that leverage supervised learning. Our focus in this article is on decoy selection as a rigorous, quantitative evaluation of how leveraging protein energy landscapes advances an important problem in protein modeling. However, the ideas and concepts presented here are generally useful for studies that aim to relate molecular structure and structural dynamics to function.
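One generic way to bring the energy landscape into unsupervised decoy selection is to group decoys into basins and rank basins rather than single decoys. The sketch below illustrates that idea and is not the authors' published algorithm: it assumes a precomputed pairwise distance matrix (for example RMSD) and a hypothetical neighbor `cutoff`, lets each decoy point to its lowest-energy neighbor, follows these pointers down to local minima, and ranks the resulting basins by mean energy.

```python
def find_basins(energies, dist, cutoff):
    """Group decoys into basins by steepest descent on a neighbor graph.

    energies: energy of each decoy; dist: symmetric matrix of pairwise
    distances (e.g. RMSD); decoys closer than `cutoff` are neighbors.
    Returns {index of basin minimum: [member indices]}.
    """
    n = len(energies)
    down = []
    for i in range(n):
        # Lowest-energy decoy in the neighborhood (ties broken by index).
        nbrs = [j for j in range(n) if j == i or dist[i][j] < cutoff]
        down.append(min(nbrs, key=lambda j: (energies[j], j)))
    basins = {}
    for i in range(n):
        k = i
        while down[k] != k:  # each step strictly lowers (energy, index), so this ends
            k = down[k]
        basins.setdefault(k, []).append(i)
    return basins

def rank_basins(basins, energies):
    """Sort basins by mean member energy (lowest first); larger basins win ties."""
    def key(item):
        members = item[1]
        return (sum(energies[m] for m in members) / len(members), -len(members))
    return sorted(basins.items(), key=key)

# Toy data: two well-separated groups of decoys.
energies = [-10.0, -9.0, -3.0, -8.0]
dist = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1.5], [5, 5, 1.5, 0]]
basins = find_basins(energies, dist, cutoff=2.0)
print(basins)                            # {0: [0, 1], 3: [2, 3]}
print(rank_basins(basins, energies)[0])  # (0, [0, 1])
```

Ranking basins instead of individual decoys makes the choice less sensitive to a single spuriously deep well, which is one of the landscape problems the abstract describes.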
Affiliation(s)
- Nasrin Akhter
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Gopinath Chennupati
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Kazi Lutful Kabir
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Hristo Djidjev
- Information Sciences (CCS-3) Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.
- Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA.
- Center for Adaptive Human-Machine Partnership, George Mason University, Fairfax, VA 22030, USA.
- Department of Bioengineering, George Mason University, Fairfax, VA 22030, USA.
- School of Systems Biology, George Mason University, Fairfax, VA 22030, USA.
44
Larijani MS, Sadat SM, Bolhassani A, Pouriayevali MH, Bahramali G, Ramezani A. In Silico Design and Immunologic Evaluation of HIV-1 p24-Nef Fusion Protein to Approach a Therapeutic Vaccine Candidate. Curr HIV Res 2019; 16:322-337. [PMID: 30605062 PMCID: PMC6446525 DOI: 10.2174/1570162x17666190102151717] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/26/2018] [Revised: 12/04/2018] [Accepted: 12/27/2018] [Indexed: 01/24/2023]
Abstract
Background: HIV infection and acquired immune deficiency syndrome (HIV/AIDS) have been a major global health concern for over 38 years. No safe and effective preventive or therapeutic vaccine has been developed, although many products have been investigated. Computational methods have facilitated vaccine development in recent decades. Among HIV-1 proteins, p24 and Nef are two suitable targets for provoking a cellular immune response. However, the fusion form of these two proteins has not yet been analyzed in silico.
Objective: This study aimed to evaluate possible fusion forms of p24 and Nef in order to obtain a potential therapeutic subunit vaccine against HIV-1.
Method: Various computational approaches were applied to predict the most effective fusion form of p24-Nef, considering the cytotoxic T lymphocyte (CTL) response, immunogenicity, conservation, and population coverage. Binding to major histocompatibility complex (MHC) molecules was also assessed in both humans and BALB/c mice.
Results: After analyzing six possible fusion forms joined by an AAY linker, we identified the most practical form as p24 residues 80-231 and Nef residues 120-150 (numbered according to the HXB2 reference sequence), selected on the basis of peptide affinity for MHC molecules and location in regions conserved among virus clades. The selected fusion protein contains seventeen MHC I antigenic epitopes, of which KRWIILGLN, YKRWIILGL, DIAGTTSTL, and FPDWQNYTP are fully conserved among the virus clades. Furthermore, the analyzed class I CTL epitopes showed greater binding affinity for HLA-B*57:01, HLA-B*51:01, and HLA-B*27:02 molecules. Population coverage of more than 70% in the Persian population supports this truncated form as an appropriate candidate against HIV-1.
Conclusion: The predicted truncated fusion protein, p24-AAY-Nef, with a high number of T-cell epitopes that are highly conserved among different clades, provides a helpful model for developing a therapeutic vaccine candidate against HIV-1.
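Assembling the selected construct (p24 residues 80-231, an AAY linker, Nef residues 120-150, HXB2 numbering) is string slicing, but the residue ranges are 1-based and inclusive, which is an easy place for off-by-one errors. The sketch below uses a hypothetical helper `build_fusion` and toy sequences; it does not reproduce the HXB2 p24 or Nef sequences.

```python
def build_fusion(seq_a, range_a, seq_b, range_b, linker="AAY"):
    """Join two 1-based, inclusive residue ranges with a peptide linker."""
    def cut(seq, start, end):
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"range {start}-{end} outside sequence of length {len(seq)}")
        return seq[start - 1:end]  # 1-based inclusive -> Python half-open slice
    return cut(seq_a, *range_a) + linker + cut(seq_b, *range_b)

# Toy sequences (not HIV-1 data); real use would pass p24 with (80, 231)
# and Nef with (120, 150).
print(build_fusion("MKTAYIAK", (2, 4), "GSHMLE", (3, 6)))  # KTAAAYHMLE
```

With the ranges above, the construct would contain 152 p24 residues, 3 linker residues and 31 Nef residues; checking the resulting length is a cheap guard against slicing mistakes.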
Affiliation(s)
- Mona Sadat Larijani
- Hepatitis, AIDS and Bloodborne Diseases Department, Pasteur Institute of Iran, Tehran, Iran
- Seyed Mehdi Sadat
- Hepatitis, AIDS and Bloodborne Diseases Department, Pasteur Institute of Iran, Tehran, Iran
- Azam Bolhassani
- Hepatitis, AIDS and Bloodborne Diseases Department, Pasteur Institute of Iran, Tehran, Iran
- Mohammad Hassan Pouriayevali
- Department of Arboviruses and Viral Hemorrhagic Fevers (National Ref Lab), Pasteur Institute of Iran (IPI) Tehran, Iran
- Golnaz Bahramali
- Hepatitis, AIDS and Bloodborne Diseases Department, Pasteur Institute of Iran, Tehran, Iran
- Amitis Ramezani
- Hepatitis, AIDS and Bloodborne Diseases Department, Pasteur Institute of Iran, Tehran, Iran
45
Ochoa R, Laio A, Cossio P. Predicting the Affinity of Peptides to Major Histocompatibility Complex Class II by Scoring Molecular Dynamics Simulations. J Chem Inf Model 2019; 59:3464-3473. [PMID: 31290667 DOI: 10.1021/acs.jcim.9b00403] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 12/31/2022]
Abstract
Predicting the binding affinity of peptides able to interact with major histocompatibility complex (MHC) molecules is a priority for researchers working on the identification of novel vaccine candidates. Most available approaches are based on analyzing the sequences of peptides of known experimental affinity. However, for MHC class II receptors these approaches are not very accurate because of the intrinsic flexibility of the complex. To overcome these limitations, we propose to estimate the binding affinity of peptides bound to an MHC class II molecule by averaging the scores of configurations from finite-temperature molecular dynamics simulations. The score is computed with 18 different scoring functions, and we explored the optimal way of combining them. To test the predictions, we considered eight peptides of known binding affinity. We found that six scoring functions correlate with the experimental ranking of the peptides significantly better than the others. We then assessed a set of techniques for combining the scoring functions by linear and logistic regression. We obtained a maximum accuracy of 82% for the predicted sign of the binding affinity using logistic regression with optimized weights. These results are potentially useful for improving the reliability of in silico protocols to design high-affinity binding peptides for MHC class II receptors.
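The protocol's two steps can be sketched compactly: average each scoring function over MD frames, then feed the averaged scores to a logistic-regression model that predicts the sign of the binding affinity. In the sketch below the weights and bias are hypothetical placeholders rather than fitted values (fitting would normally be done with a library such as scikit-learn), and mapping class +1 to a favorable binder is an assumption.

```python
import math

def average_scores(frames):
    """Average each scoring function over MD frames.

    frames: one score vector per frame, with one entry per scoring function.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_sign(features, weights, bias):
    """Predict +1 (assumed: favorable binder) or -1 with a logistic model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else -1

def accuracy(predicted, observed):
    """Fraction of peptides whose predicted sign matches the observed sign."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Placeholder data: two peptides, two scoring functions, two MD frames each.
peptides = {
    "pep_a": [[-1.0, -2.0], [-3.0, -4.0]],
    "pep_b": [[1.0, 0.5], [3.0, 1.5]],
}
weights, bias = [-1.0, -0.5], 0.0  # hypothetical, not fitted values
preds = [predict_sign(average_scores(f), weights, bias) for f in peptides.values()]
print(preds, accuracy(preds, [1, -1]))  # [1, -1] 1.0
```

Averaging before classification is what lets the model account for the flexibility of the complex: each peptide is represented by its ensemble-mean score rather than by a single snapshot.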
Affiliation(s)
- Rodrigo Ochoa
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, 050010 Medellin, Colombia
- Alessandro Laio
- International School for Advanced Studies (SISSA), Via Bonomea 265, 34136 Trieste, Italy
- The Abdus Salam International Centre for Theoretical Physics (ICTP), Strada Costiera 11, 34151 Trieste, Italy
- Pilar Cossio
- Biophysics of Tropical Diseases, Max Planck Tandem Group, University of Antioquia, 050010 Medellin, Colombia
- Department of Theoretical Biophysics, Max Planck Institute of Biophysics, 60438 Frankfurt am Main, Germany
46
Wang L, Wang HF, Liu SR, Yan X, Song KJ. Predicting Protein-Protein Interactions from Matrix-Based Protein Sequence Using Convolution Neural Network and Feature-Selective Rotation Forest. Sci Rep 2019; 9:9848. [PMID: 31285519 PMCID: PMC6614364 DOI: 10.1038/s41598-019-46369-4] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Received: 03/05/2019] [Accepted: 06/10/2019] [Indexed: 01/09/2023]
Abstract
Proteins are essential components of living organisms. The prediction of protein-protein interactions (PPIs) has important implications for understanding biological processes, preventing diseases, and developing new drugs. Although high-throughput technologies make it possible to identify PPIs in large-scale biological experiments, constraints of time, cost, false-positive rate, and other factors restrict the extensive use of experimental methods. There is therefore an urgent need for computational methods that complement experiments by predicting PPIs rapidly and accurately. In this paper, we propose a novel approach, CNN-FSRF, for predicting PPIs from protein sequence by combining a deep-learning Convolution Neural Network (CNN) with a Feature-Selective Rotation Forest (FSRF). The method first converts the protein sequence into a Position-Specific Scoring Matrix (PSSM) containing evolutionary information, then uses the CNN to extract deep hidden features of the protein objectively and efficiently, and finally removes redundant noise with FSRF to produce the prediction. On the Yeast and Helicobacter pylori PPI datasets, CNN-FSRF achieved prediction accuracies of 97.75% and 88.96%, respectively. To further evaluate its performance, we compared CNN-FSRF with SVM and other existing methods, and we also verified its performance on independent datasets. These results indicate that CNN-FSRF can serve as a useful complement to biological experiments for identifying protein interactions.
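Accuracy figures such as 97.75% and 88.96% come from a confusion matrix computed on held-out protein pairs. A minimal sketch of the standard binary-classification metrics that PPI studies often report alongside accuracy (sensitivity, precision, Matthews correlation coefficient), using made-up counts rather than results from this paper:

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "mcc": mcc}

# Made-up counts for 200 held-out pairs (100 interacting, 100 non-interacting).
m = binary_metrics(tp=90, fp=5, tn=95, fn=10)
print(m["accuracy"], m["sensitivity"])  # 0.925 0.9
```

MCC is useful when positive and negative pairs are imbalanced, where accuracy alone can look high for a classifier that mostly predicts the majority class.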
Affiliation(s)
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, P.R. China.
- Hai-Feng Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- San-Rong Liu
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China
- Xin Yan
- School of Foreign Languages, Zaozhuang University, Zaozhuang, Shandong, 277100, P.R. China.
- Ke-Jian Song
- School of Information Engineering, JiangXi University of Science and Technology, Ganzhou, Jiangxi, 341000, P.R. China
47
Wang E, Sun H, Wang J, Wang Z, Liu H, Zhang JZH, Hou T. End-Point Binding Free Energy Calculation with MM/PBSA and MM/GBSA: Strategies and Applications in Drug Design. Chem Rev 2019; 119:9478-9508. [DOI: 10.1021/acs.chemrev.9b00055] [Citation(s) in RCA: 578] [Impact Index Per Article: 115.6] [Indexed: 12/19/2022]
Affiliation(s)
- Ercheng Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Huiyong Sun
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Junmei Wang
- Department of Pharmaceutical Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhe Wang
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Hui Liu
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
- John Z. H. Zhang
- Shanghai Engineering Research Center of Molecular Therapeutics & New Drug Development, Shanghai Key Laboratory of Green Chemistry & Chemical Process, School of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China
- NYU-ECNU Center for Computational Chemistry, NYU Shanghai, Shanghai 200122, China
- Department of Chemistry, New York University, New York, New York 10003, United States
- Collaborative Innovation Center of Extreme Optics, Shanxi University, Taiyuan, Shanxi 030006, China
- Tingjun Hou
- Hangzhou Institute of Innovative Medicine, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, China
48
Heo L, Arbour CF, Feig M. Driven to near-experimental accuracy by refinement via molecular dynamics simulations. Proteins 2019; 87:1263-1275. [PMID: 31197841 DOI: 10.1002/prot.25759] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 04/19/2019] [Revised: 06/01/2019] [Accepted: 06/07/2019] [Indexed: 12/17/2022]
Abstract
Protein model refinement has been an essential part of successful protein structure prediction. Refinement methods based on molecular dynamics simulations have shown consistent improvement of protein models. The extent of refinement had been improving for a few years after the idea of ensemble averaging of sampled conformations emerged. There was little progress in CASP12 because conformational sampling was not sufficiently diverse owing to harmonic restraints. During CASP13, a new refinement method was tested that achieved significant improvements over CASP12. The new method was intended to address previous bottlenecks in the refinement problem by introducing new features: flat-bottom harmonic restraints replaced harmonic restraints, sampling was performed iteratively, and a new scoring function and new selection criteria were used. The new protocol expanded conformational sampling at reduced computational cost. In addition to overall improvements, some models were refined significantly, to near-experimental accuracy.
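The difference between the two restraint types, and the ensemble-averaging step, can be sketched in a few lines. In this sketch the tolerance `tol` and force constant `k` are hypothetical, the energy convention k·d² (rather than ½k·d²) is a choice, and frames are assumed to be already superposed onto a common reference; averaging unaligned coordinates would be meaningless.

```python
def harmonic(d, k=1.0):
    """Harmonic restraint: penalizes any displacement d from the reference."""
    return k * d * d

def flat_bottom(d, tol=1.0, k=1.0):
    """Flat-bottom harmonic restraint: zero for |d| <= tol, harmonic beyond it."""
    excess = abs(d) - tol
    return k * excess * excess if excess > 0 else 0.0

def average_structure(frames):
    """Average Cartesian coordinates over frames that are already superposed.

    frames: list of conformations, each a list of (x, y, z) atom positions.
    """
    n = len(frames)
    n_atoms = len(frames[0])
    return [tuple(sum(f[a][c] for f in frames) / n for c in range(3))
            for a in range(n_atoms)]

print(harmonic(0.5), flat_bottom(0.5), flat_bottom(1.5))  # 0.25 0.0 0.25
frames = [[(0, 0, 0), (1, 1, 1)], [(2, 0, 0), (1, 3, 1)]]
print(average_structure(frames))  # [(1.0, 0.0, 0.0), (1.0, 2.0, 1.0)]
```

Under a harmonic restraint every small displacement costs energy, which tends to keep sampling close to the starting model; the flat-bottom form leaves a free window of width `tol`, allowing more diverse sampling while still preventing large drifts.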
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
- Collin F Arbour
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan
49
Baek M, Park T, Heo L, Park C, Seok C. GalaxyHomomer: a web server for protein homo-oligomer structure prediction from a monomer sequence or structure. Nucleic Acids Res 2019; 45:W320-W324. [PMID: 28387820 PMCID: PMC5570155 DOI: 10.1093/nar/gkx246] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 02/04/2017] [Accepted: 04/05/2017] [Indexed: 11/18/2022]
Abstract
Homo-oligomerization of proteins is abundant in nature and is often intimately related to the physiological functions of proteins, such as metabolism, signal transduction, or immunity. Information on homo-oligomer structure is therefore important for obtaining a molecular-level understanding of protein functions and their regulation. Currently available web servers predict protein homo-oligomer structures either by template-based modeling, using homo-oligomer templates selected from the protein structure database, or by ab initio docking of monomer structures that were resolved experimentally or predicted computationally. The GalaxyHomomer server, freely accessible at http://galaxy.seoklab.org/homomer, carries out template-based modeling, ab initio docking, or both, depending on the availability of suitable oligomer templates. It also incorporates recently developed model refinement methods that can consistently improve model quality. Moreover, the server provides additional options that the user can choose depending on the available information about the monomer structure, oligomeric state, and locations of unreliable or flexible loops or termini. The server performed better than or comparably to other available methods when tested on benchmark sets and in a recent blind CASP experiment.
Affiliation(s)
- Minkyung Baek
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Taeyong Park
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
- Chiwook Park
- Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, IN 47907, USA
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 151-747, Korea
50
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022]
Abstract
The refinement of predicted 3D protein models is crucial for bringing them closer to experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: sampling and scoring. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, because force-field inaccuracies can drive structures away from the native basin. Therefore, different restraint strategies have been applied to avoid deviations away from the native structure; for example, accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols using physics-based force fields and smart restraints have made significant progress toward more consistent refinement of 3D models. In the scoring stage, energy functions and Model Quality Assessment Programs (MQAPs) are used to discriminate near-native conformations from non-native ones. Nevertheless, the differences among the 3D models generated in refinement pipelines are often very small, which makes model discrimination and selection problematic. For this reason, identifying the most native-like conformations remains a major challenge.
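Whether refinement helped is usually judged against the native structure: the model picked in the scoring stage should be closer to the native than the starting model. The sketch below uses plain coordinate RMSD and assumes all structures are already superposed on the native (real pipelines superpose first and often use GDT-HA or similar measures as well); the helper names are hypothetical. The usage example shows how a score that ranks candidates poorly can select a model that is worse than the start.

```python
import math

def rmsd(a, b):
    """Coordinate RMSD between two equal-length, already-superposed atom lists."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and the same length")
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))

def refinement_gain(start, candidates, scores, native):
    """RMSD improvement of the best-scored candidate over the starting model.

    scores: one score per candidate, lower = predicted better.
    A positive result means the selected model moved closer to the native.
    """
    best = min(range(len(candidates)), key=lambda i: scores[i])
    return rmsd(start, native) - rmsd(candidates[best], native)

native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
start = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]        # RMSD 1.0
candidates = [
    [(0.0, 0.0, 0.5), (1.0, 0.0, 0.5)],           # RMSD 0.5 (improved)
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],           # RMSD 2.0 (worse)
]
print(refinement_gain(start, candidates, [-6.0, -5.0], native))  # 0.5
print(refinement_gain(start, candidates, [-5.0, -6.0], native))  # -1.0
```

The sampling stage may well produce an improved model; whether the pipeline reports an improvement depends entirely on the scoring stage picking it.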