1
Xu G, Luo Z, Yan Y, Wang Q, Ma J. OPUS-Rota5: A highly accurate protein side-chain modeling method with 3D-Unet and RotaFormer. Structure 2024; 32:1001-1010.e2. [PMID: 38657613 DOI: 10.1016/j.str.2024.03.015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/06/2024] [Accepted: 03/28/2024] [Indexed: 04/26/2024]
Abstract
Accurate protein side-chain modeling is crucial for protein folding and design. This is particularly true for molecular docking as ligands primarily interact with side chains. In this study, we introduce a two-stage side-chain modeling approach called OPUS-Rota5. It leverages a modified 3D-Unet to capture the local environmental features, including ligand information of each residue, and then employs the RotaFormer module to aggregate various types of features. Evaluation on three test sets, including recently released targets from CAMEO and CASP15, shows that OPUS-Rota5 significantly outperforms some other leading side-chain modeling methods. We also employ OPUS-Rota5 to refine the side chains of 25 G protein-coupled receptor targets predicted by AlphaFold2 and achieve a significantly improved success rate in a subsequent "back" docking of their natural ligands. Therefore, OPUS-Rota5 is a useful and effective tool for molecular docking, particularly for targets with relatively accurate predicted backbones but not side chains such as high-homology targets.
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Zhenwei Luo
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Yaming Yan
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
- Qinghua Wang
  - Center for Biomolecular Innovation, Harcam Biomedicines, Shanghai 200131, China
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
2
Rozano L, Hane JK, Mancera RL. The Molecular Docking of MAX Fungal Effectors with Plant HMA Domain-Binding Proteins. Int J Mol Sci 2023; 24:15239. [PMID: 37894919 PMCID: PMC10607590 DOI: 10.3390/ijms242015239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 10/29/2023] Open
Abstract
Fungal effector proteins are important in mediating disease infections in agriculturally important crops. These secreted small proteins are known to interact with their respective host receptor binding partners in the host, either inside the cells or in the apoplastic space, depending on the localisation of the effector proteins. Consequently, it is important to understand the interactions between fungal effector proteins and their target host receptor binding partners, particularly since this can be used for the selection of potential plant resistance or susceptibility-related proteins that can be applied to the breeding of new cultivars with disease resistance. In this study, molecular docking simulations were used to characterise protein-protein interactions between effector and plant receptors. Benchmarking was undertaken using available experimental structures of effector-host receptor complexes to optimise simulation parameters, which were then used to predict the structures and mediating interactions of effector proteins with host receptor binding partners that have not yet been characterised experimentally. Rigid docking was applied for both the so-called bound and unbound docking of MAX effectors with plant HMA domain protein partners. All bound complexes used for benchmarking were correctly predicted, with 84% being ranked as the top docking pose using the ZDOCK scoring function. In the case of unbound complexes, a minimum of 95% of known residues were predicted to be part of the interacting interface on the host receptor binding partner, and at least 87% of known residues were predicted to be part of the interacting interface on the effector protein. Hydrophobic interactions were found to dominate the formation of effector-plant protein complexes. 
An optimised set of docking parameters based on the use of ZDOCK and ZRANK scoring functions was established to enable the prediction of near-native docking poses involving different binding interfaces on plant HMA domain proteins. Whilst this study was limited by the availability of the experimentally determined complexed structures of effectors and host receptor binding partners, we demonstrated the potential of molecular docking simulations to predict the likely interactions between effectors and their respective host receptor binding partners. This computational approach may accelerate the process of the discovery of putative interacting plant partners of effector proteins and contribute to effector-assisted marker discovery, thereby supporting the breeding of disease-resistant crops.
Affiliation(s)
- Lina Rozano
  - Curtin Medical School, Curtin Health Innovation Research Institute, GPO Box U1987, Perth, WA 6845, Australia
  - Curtin Institute for Data Science, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
- James K. Hane
  - Curtin Institute for Data Science, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
  - Centre for Crop and Disease Management, School of Molecular and Life Sciences, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
- Ricardo L. Mancera
  - Curtin Medical School, Curtin Health Innovation Research Institute, GPO Box U1987, Perth, WA 6845, Australia
  - Curtin Institute for Data Science, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
3
Mufassirin MMM, Newton MAH, Sattar A. Artificial intelligence for template-free protein structure prediction: a comprehensive review. Artif Intell Rev 2022. [DOI: 10.1007/s10462-022-10350-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022]
4
Liu J, Zhang C, Lai L. GeoPacker: A novel deep learning framework for protein side-chain modeling. Protein Sci 2022; 31:e4484. [PMID: 36309961 PMCID: PMC9667900 DOI: 10.1002/pro.4484] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/11/2022] [Revised: 10/23/2022] [Accepted: 10/26/2022] [Indexed: 12/13/2022]
Abstract
Atomic interactions play essential roles in protein folding, structure stabilization, and function performance. Recent advances in deep learning-based methods have achieved impressive success not only in protein structure prediction, but also in protein sequence design. However, highly efficient and accurate protein side-chain prediction methods that can give detailed atomic interactions are still lacking. In the present study, we developed a deep learning-based method, GeoPacker, that uses geometric deep learning coupled with ResNet for protein side-chain modeling. GeoPacker explicitly represents atomic interactions with rotational and translational invariance for information extraction of relative locations. GeoPacker outperformed the state-of-the-art energy function-based methods in side-chain structure prediction accuracy and runs about 10 and 700 times faster than the deep learning-based methods DLPacker and OPUS-Rota4, respectively, with comparable prediction accuracy. The performance of GeoPacker does not depend on the secondary structures that the residues belong to. GeoPacker gives highly accurate predictions for buried residues in the protein core as well as protein-protein interface, making it a useful tool for protein structure modeling, protein, and interaction design.
Affiliation(s)
- Jiale Liu
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Changsheng Zhang
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Luhua Lai
  - Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
  - BNLMS, College of Chemistry and Molecular Engineering, Peking University, Beijing, China
  - Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
5
Structure Prediction, Evaluation, and Validation of GPR18 Lipid Receptor Using Free Programs. Int J Mol Sci 2022; 23:ijms23147917. [PMID: 35887268 PMCID: PMC9319093 DOI: 10.3390/ijms23147917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Revised: 07/04/2022] [Accepted: 07/08/2022] [Indexed: 11/30/2022] Open
Abstract
The GPR18 receptor, often referred to as the N-arachidonylglycine receptor, although assigned (along with GPR55 and GPR119) to the new class A GPCR subfamily of lipid receptors, officially still has the status of a class A GPCR orphan. While its signaling pathways and biological significance have not yet been fully elucidated, increasing evidence points to the therapeutic potential of GPR18 in relation to immune, neurodegenerative, and cancer processes, to name a few. Therefore, it is necessary to understand the interactions of potential ligands with the receptor and the influence of particular structural elements on their activity. Thus, given the lack of an experimentally solved structure, the goal of the present study was to obtain a homology model of the GPR18 receptor in the inactive state, meeting all requirements in terms of protein structure quality and recognition of active ligands. To increase the reliability and precision of the predictions, different contemporary protein structure prediction methods and software were used and compared herein. To test the usability of the resulting models, we optimized and compared the selected structures and then assessed their ability to recognize known, active ligands. The stability of the predicted poses was then evaluated by means of molecular dynamics simulations. However, most of the best-ranking contemporary CADD software and platforms require rather expensive licenses for full usability. To overcome this practical obstacle, the overarching goal of these studies was to test whether it is possible to perform thorough CADD experiments with high scientific confidence while using only license-free/academic software and online platforms. The obtained results indicate that a wide range of freely available software and/or academic licenses allow us to carry out meaningful molecular modelling/docking studies.
6
Multi-task learning to leverage partially annotated data for PPI interface prediction. Sci Rep 2022; 12:10487. [PMID: 35729253 PMCID: PMC9213449 DOI: 10.1038/s41598-022-13951-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2022] [Accepted: 05/31/2022] [Indexed: 11/29/2022] Open
Abstract
Protein-protein interactions (PPI) are crucial for protein functioning; nevertheless, predicting residues in PPI interfaces from the protein sequence remains a challenging problem. In addition, structure-based functional annotations, such as the PPI interface annotations, are scarce: residue-based PPI interface annotations are available for only about one-third of all protein structures. If we want to use a deep learning strategy, we have to overcome the problem of limited data availability. Here we use a multi-task learning strategy that can handle missing data. We start with the multi-task model architecture and adapt it to carefully handle missing data in the cost function. As related learning tasks we include prediction of secondary structure, solvent accessibility, and buried residue. Our results show that the multi-task learning strategy significantly outperforms single-task approaches. Moreover, only the multi-task strategy is able to effectively learn over a dataset extended with structural feature data, without additional PPI annotations. The multi-task setup becomes even more important if the fraction of PPI annotations becomes very small: the multi-task learner trained on only one-eighth of the PPI annotations—with data extension—reaches the same performance as the single-task learner on all PPI annotations. Thus, we show that the multi-task learning strategy can be beneficial for a small training dataset where the protein’s functional properties of interest are only partially annotated.
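The idea of handling missing annotations in the cost function can be illustrated with a short sketch. This is not the authors' implementation: the function names, the use of `None` to mark a missing label, binary cross-entropy as the per-task loss, and the optional task weights are all assumptions made for illustration.

```python
import math


def masked_bce(preds, labels):
    """Mean binary cross-entropy over labelled positions only.

    A label of None marks a residue without an annotation (assumed
    convention); such positions contribute nothing to the loss.
    """
    eps = 1e-7
    terms = []
    for p, y in zip(preds, labels):
        if y is None:
            continue  # skip residues with no annotation for this task
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        terms.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))
    # A task with no labels at all contributes zero loss.
    return sum(terms) / len(terms) if terms else 0.0


def multitask_loss(task_preds, task_labels, weights=None):
    """Weighted sum of masked per-task losses (hypothetical helper)."""
    total = 0.0
    for task, preds in task_preds.items():
        w = 1.0 if weights is None else weights.get(task, 1.0)
        total += w * masked_bce(preds, task_labels[task])
    return total
```

With this masking, a protein that carries structural labels but no PPI interface labels still contributes a training signal through the related tasks, which is the situation the abstract describes.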
7
Akhter N, Kabir KL, Chennupati G, Vangara R, Alexandrov BS, Djidjev H, Shehu A. Improved Protein Decoy Selection via Non-Negative Matrix Factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:1670-1682. [PMID: 33400654 DOI: 10.1109/tcbb.2020.3049088] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/12/2023]
Abstract
A central challenge in protein modeling research and protein structure prediction in particular is known as decoy selection. The problem refers to selecting biologically-active/native tertiary structures among a multitude of physically-realistic structures generated by template-free protein structure prediction methods. Research on decoy selection is active. Clustering-based methods are popular, but they fail to identify good/near-native decoys on datasets where near-native decoys are severely under-sampled by a protein structure prediction method. Reasonable progress is reported by methods that additionally take into account the internal energy of a structure and employ it to identify basins in the energy landscape organizing the multitude of decoys. These methods, however, incur significant time costs for extracting basins from the landscape. In this paper, we propose a novel decoy selection method based on non-negative matrix factorization. We demonstrate that our method outperforms energy landscape-based methods. In particular, the proposed method addresses both the time cost issue and the challenge of identifying good decoys in a sparse dataset, successfully recognizing near-native decoys for both easy and hard protein targets.
8
Xu G, Wang Q, Ma J. OPUS-Rota4: a gradient-based protein side-chain modeling framework assisted by deep learning-based predictors. Brief Bioinform 2022; 23:bbab529. [PMID: 34905769 PMCID: PMC8769891 DOI: 10.1093/bib/bbab529] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/12/2021] [Revised: 10/11/2021] [Accepted: 11/15/2021] [Indexed: 11/13/2022] Open
Abstract
Accurate protein side-chain modeling is crucial for protein folding and protein design. In the past decades, many successful methods have been proposed to address this issue. However, most of them depend on the discrete samples from the rotamer library, which may limit their accuracy and usage. In this study, we report an open-source toolkit for protein side-chain modeling, named OPUS-Rota4. It consists of three modules: OPUS-RotaNN2, which predicts protein side-chain dihedral angles; OPUS-RotaCM, which measures the distance and orientation information between the side chains of different residue pairs; and OPUS-Fold2, which applies the constraints derived from the first two modules to guide side-chain modeling. OPUS-Rota4 adopts the dihedral angles predicted by OPUS-RotaNN2 as its initial states, and uses OPUS-Fold2 to refine the side-chain conformation with the side-chain contact map constraints derived from OPUS-RotaCM. Therefore, we convert the side-chain modeling problem into a side-chain contact map prediction problem. OPUS-Fold2 is written in Python and TensorFlow2.4, which makes it user-friendly to include other differentiable energy terms. OPUS-Rota4 also provides a platform in which the side-chain conformation can be dynamically adjusted under the influence of other processes. We apply OPUS-Rota4 on 15 FM predictions submitted by AlphaFold2 on CASP14; the results show that the side chains modeled by OPUS-Rota4 are closer to their native counterparts than those predicted by AlphaFold2 (e.g. the residue-wise RMSD for all residues and core residues are 0.588 and 0.472 for AlphaFold2, and 0.535 and 0.407 for OPUS-Rota4).
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China
  - Shanghai AI Laboratory, Shanghai 200030, China
9
Narykov O, Johnson NT, Korkin D. Predicting protein interaction network perturbation by alternative splicing with semi-supervised learning. Cell Rep 2021; 37:110045. [PMID: 34818539 DOI: 10.1016/j.celrep.2021.110045] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 07/01/2020] [Revised: 07/21/2021] [Accepted: 11/02/2021] [Indexed: 10/19/2022] Open
Abstract
Alternative splicing introduces an additional layer of protein diversity and complexity in regulating cellular functions that can be specific to the tissue and cell type, physiological state of a cell, or disease phenotype. Recent high-throughput experimental studies have illuminated the functional role of splicing events through rewiring protein-protein interactions; however, the extent to which the macromolecular interactions are affected by alternative splicing has yet to be fully understood. In silico methods provide a fast and cheap alternative to interrogating functional characteristics of thousands of alternatively spliced isoforms. Here, we develop an accurate feature-based machine learning approach that predicts whether a protein-protein interaction carried out by a reference isoform is perturbed by an alternatively spliced isoform. Our method, called the alternatively spliced interactions prediction (ALT-IN) tool, is compared with the state-of-the-art PPI prediction tools and shows superior performance, achieving 0.92 in precision and recall values.
Affiliation(s)
- Oleksandr Narykov
  - Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
- Nathan T Johnson
  - Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA; Harvard Program in Therapeutic Sciences, Harvard Medical School, and Breast Tumor Immunology Laboratory, Dana-Farber Cancer Institute, Boston, MA, USA
- Dmitry Korkin
  - Department of Computer Science, and Bioinformatics and Computational Biology Program, Worcester Polytechnic Institute, Worcester, MA, USA
10
Johansson-Åkhe I, Mirabello C, Wallner B. InterPepRank: Assessment of Docked Peptide Conformations by a Deep Graph Network. Frontiers in Bioinformatics 2021; 1:763102. [PMID: 36303778 PMCID: PMC9581042 DOI: 10.3389/fbinf.2021.763102] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/23/2021] [Accepted: 10/05/2021] [Indexed: 11/13/2022] Open
Abstract
Peptide-protein interactions between a smaller or disordered peptide stretch and a folded receptor make up a large part of all protein-protein interactions. A common approach for modeling such interactions is to exhaustively sample the conformational space by fast-Fourier-transform docking, and then refine a top percentage of decoys. Commonly, methods capable of ranking the decoys for selection fast enough for larger scale studies rely on first-principle energy terms such as electrostatics, Van der Waals forces, or on pre-calculated statistical potentials. We present InterPepRank for peptide-protein complex scoring and ranking. InterPepRank is a machine learning-based method which encodes the structure of the complex as a graph; with physical pairwise interactions as edges and evolutionary and sequence features as nodes. The graph network is trained to predict the LRMSD of decoys by using edge-conditioned graph convolutions on a large set of peptide-protein complex decoys. InterPepRank is tested on a massive independent test set with no targets sharing CATH annotation nor 30% sequence identity with any target in training or validation data. On this set, InterPepRank has a median AUC of 0.86 for finding coarse peptide-protein complexes with LRMSD < 4Å. This is an improvement compared to other state-of-the-art ranking methods that have a median AUC between 0.65 and 0.79. When included as a selection-method for selecting decoys for refinement in a previously established peptide docking pipeline, InterPepRank improves the number of medium and high quality models produced by 80% and 40%, respectively. The InterPepRank program as well as all scripts for reproducing and retraining it are available from: http://wallnerlab.org/InterPepRank.
11
Xu G, Wang Q, Ma J. OPUS-X: an open-source toolkit for protein torsion angles, secondary structure, solvent accessibility, contact map predictions and 3D folding. Bioinformatics 2021; 38:108-114. [PMID: 34478500 PMCID: PMC8696105 DOI: 10.1093/bioinformatics/btab633] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/09/2021] [Revised: 07/09/2021] [Accepted: 09/01/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION The development of an open-source platform to predict protein 1D features and 3D structure is an important task. In this paper, we report an open-source toolkit for protein 3D structure modeling, named OPUS-X. It contains three modules: OPUS-TASS2, which predicts protein torsion angles, secondary structure and solvent accessibility; OPUS-Contact, which measures the distance and orientation information between different residue pairs; and OPUS-Fold2, which uses the constraints derived from the first two modules to guide folding. RESULTS OPUS-TASS2 is an upgraded version of our previous method OPUS-TASS. OPUS-TASS2 integrates protein global structure information and significantly outperforms OPUS-TASS. OPUS-Contact combines multiple raw co-evolutionary features with protein 1D features predicted by OPUS-TASS2, and delivers better results than the open-source state-of-the-art method trRosetta. OPUS-Fold2 is a complementary version of our previous method OPUS-Fold. OPUS-Fold2 is a gradient-based protein folding framework based on differentiable energy terms, as opposed to OPUS-Fold, which is a sampling-based method used to deal with non-differentiable terms. OPUS-Fold2 exhibits comparable performance to the Rosetta folding protocol in trRosetta when using identical inputs. OPUS-Fold2 is written in Python and TensorFlow2.4, which makes it user-friendly for any source-code-level modification. AVAILABILITY AND IMPLEMENTATION The code and pre-trained models of OPUS-X can be downloaded from https://github.com/OPUS-MaLab/opus_x. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Zhangjiang Fudan International Innovation Center, Fudan University, Shanghai 201210, China; Shanghai AI Laboratory, Shanghai 200030, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA
12
Liu J, Wu T, Guo Z, Hou J, Cheng J. Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins 2021; 90:58-72. [PMID: 34291486 PMCID: PMC8671168 DOI: 10.1002/prot.26186] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/23/2021] [Revised: 06/21/2021] [Accepted: 07/12/2021] [Indexed: 12/15/2022]
Abstract
Substantial progress in protein structure prediction has been made by utilizing deep learning and residue-residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning-based protein inter-residue distance predictor to improve template-free (ab initio) tertiary structure prediction, (b) an enhanced template-based tertiary structure prediction method, and (c) distance-based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, the MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and third out of 136 predictors in inter-domain structure prediction. The results demonstrate that template-free modeling based on deep learning and residue-residue distance prediction can predict the correct topology for almost all template-based modeling targets and a majority of hard targets (template-free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, template-free modeling performs better than template-based modeling not only on hard targets but also on targets that have homologous templates. The performance of template-free modeling largely depends on the accuracy of distance prediction, which is closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.
Affiliation(s)
- Jian Liu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Tianqi Wu
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Zhiye Guo
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
- Jie Hou
  - Department of Computer Science, Saint Louis University, St. Louis, Missouri, USA
- Jianlin Cheng
  - Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri, USA
13
Shuvo MH, Gulfam M, Bhattacharya D. DeepRefiner: high-accuracy protein structure refinement by deep network calibration. Nucleic Acids Res 2021; 49:W147-W152. [PMID: 33999209 PMCID: PMC8262753 DOI: 10.1093/nar/gkab361] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/08/2021] [Revised: 04/18/2021] [Accepted: 04/23/2021] [Indexed: 12/20/2022] Open
Abstract
The DeepRefiner webserver, freely available at http://watson.cse.eng.auburn.edu/DeepRefiner/, is an interactive and fully configurable online system for high-accuracy protein structure refinement. Fuelled by deep learning, DeepRefiner offers the ability to leverage cutting-edge deep neural network architectures which can be calibrated for on-demand selection of adventurous or conservative refinement modes targeted at degree or consistency of refinement. The method has been extensively tested in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) experiments under the group name 'Bhattacharya-Server' and was officially ranked as the No. 2 refinement server in CASP13 (second only to 'Seok-server' and outperforming all other refinement servers) and No. 2 refinement server in CASP14 (second only to 'FEIG-S' and outperforming all other refinement servers including 'Seok-server'). The DeepRefiner web interface offers a number of convenient features, including (i) fully customizable refinement job submission and validation; (ii) automated job status update, tracking, and notifications; (iii) interactive and interpretable web-based results retrieval with quantitative and visual analysis; and (iv) extensive help information on job submission and results interpretation via web-based tutorial and help tooltips.
Affiliation(s)
- Md Hossain Shuvo
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Muhammad Gulfam
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL 36849, USA
  - Department of Biological Sciences, Auburn University, Auburn, AL 36849, USA
14
Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Sci Rep 2021; 11:10943. [PMID: 34035363 PMCID: PMC8149836 DOI: 10.1038/s41598-021-90303-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/09/2021] [Accepted: 05/10/2021] [Indexed: 11/28/2022] Open
Abstract
Inter-residue contact prediction and deep learning showed promise for improving the estimation of protein model accuracy (EMA) in the 13th Critical Assessment of Protein Structure Prediction (CASP13). To further leverage the improved inter-residue distance predictions to enhance EMA, during the 2020 CASP14 experiment, we integrated several new inter-residue distance features with the existing model quality assessment features in several deep learning methods to predict the quality of protein structural models. According to the evaluation of performance in selecting the best model from the models of CASP14 targets, our three multi-model predictors of estimating model accuracy (MULTICOM-CONSTRUCT, MULTICOM-AI, and MULTICOM-CLUSTER) achieve an average loss of 0.073, 0.079, and 0.081, respectively, in terms of the global distance test score (GDT-TS). The three methods are ranked first, second, and third out of all 68 CASP14 predictors. MULTICOM-DEEP, the single-model predictor of estimating model accuracy (EMA), is ranked within the top 10 among all the single-model EMA methods according to GDT-TS score loss. The results demonstrate that inter-residue distance features are valuable inputs for deep learning to predict the quality of protein structural models. However, larger training datasets and better ways of leveraging inter-residue distance information are needed to fully explore its potential.
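The GDT-TS loss used to rank EMA predictors can be sketched as follows. The sketch assumes the common definition of the loss for one target: the true GDT-TS of the best model in the pool minus the true GDT-TS of the model that the predictor ranks first. The function name, the dictionary layout, and the example scores (on a 0 to 1 scale) are hypothetical and do not come from the paper.

```python
def selection_loss(true_scores, predicted_scores):
    """GDT-TS loss for one target.

    true_scores: model name -> true GDT-TS against the native structure
    predicted_scores: model name -> quality score assigned by the predictor
    """
    best_true = max(true_scores.values())
    # The model the predictor would select is its top-ranked one.
    picked = max(predicted_scores, key=predicted_scores.get)
    return best_true - true_scores[picked]


# Hypothetical pool of three models for a single target.
true_scores = {"m1": 0.80, "m2": 0.72, "m3": 0.55}
predicted_scores = {"m1": 0.60, "m2": 0.65, "m3": 0.40}
loss = selection_loss(true_scores, predicted_scores)
```

A loss of zero means the predictor picked the best available model; averaging this value over all targets would give a per-predictor figure comparable in kind to those quoted in the abstract.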
|
15
|
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/25/2022] Open
Abstract
We compare ten structural representations, either atomistic or coarse-grained. Thus, ten distance-dependent statistical potentials of mean force (PMF) were built. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through Cα, in a Cβ-only representation, yields a higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature, and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in either accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated the generalization of the findings to formalisms other than the widely used “potential of mean force” (PMF) method. Thus, we observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off when using a “total information gain” scoring function. For statistical PMFs, using MARTINI backbone and side-chain beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
| |
|
16
|
Xu G, Wang Q, Ma J. OPUS-Rota3: Improving Protein Side-Chain Modeling by Deep Neural Networks and Ensemble Methods. J Chem Inf Model 2020; 60:6691-6697. [PMID: 33211480 DOI: 10.1021/acs.jcim.0c00951] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/30/2022]
Abstract
Side-chain modeling is critical for protein structure prediction since the uniqueness of the protein structure is largely determined by its side-chain packing conformation. In this paper, differing from most approaches that rely on rotamer library sampling, we first propose a novel side-chain rotamer prediction method based on deep neural networks, named OPUS-RotaNN. Then, on the basis of our previous work OPUS-Rota2, we propose an open-source side-chain modeling framework, OPUS-Rota3, which integrates the results of different methods into its rotamer library as the sampling candidates. By including OPUS-RotaNN into OPUS-Rota3, we conduct our experiments on three native backbone test sets and one non-native backbone test set. On the native backbone test set, CAMEO-Hard61 for example, OPUS-Rota3 successfully predicts 51.14% of all side-chain dihedral angles with a tolerance criterion of 20° and outperforms OSCAR-star (50.87%), SCWRL4 (50.40%), and FASPR (49.85%). On the non-native backbone test set DB379-ITASSER, the accuracy of OPUS-Rota3 is 52.49%, better than OSCAR-star (48.95%), FASPR (48.69%), and SCWRL4 (48.29%). All the source codes including the training codes and the data we used are available at https://github.com/thuxugang/opus_rota3.
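The accuracy figures quoted here are percentages of side-chain dihedral angles predicted within a 20° tolerance. A minimal Python sketch of such a metric is given below. It assumes a simple per-angle count with periodic wrap-around; details such as per-residue counting or the treatment of symmetric side chains are not stated in the abstract and are not modeled. All names and values are illustrative.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def dihedral_accuracy(predicted, native, tolerance=20.0):
    """Percentage of predicted dihedral angles within `tolerance` degrees of native."""
    if not predicted or len(predicted) != len(native):
        raise ValueError("need two non-empty angle lists of equal length")
    hits = sum(angular_diff(p, n) <= tolerance for p, n in zip(predicted, native))
    return 100.0 * hits / len(predicted)

# Hypothetical chi angles in degrees; -175 vs 178 differ by only 7 degrees once wrapped.
predicted = [60.0, -175.0, 100.0, -60.0]
native = [65.0, 178.0, 60.0, -85.0]
print(dihedral_accuracy(predicted, native))
```

The wrap-around step matters because dihedral angles are periodic: without it, values near ±180° would be counted as large errors.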
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
| | - Qinghua Wang
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
| | - Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States; Department of Bioengineering, Rice University, Houston, Texas 77005, United States
| |
|
17
|
Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020; 29:1931-1944. [PMID: 32710566 PMCID: PMC7454528 DOI: 10.1002/pro.3914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]
Abstract
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Affiliation(s)
- Alex T. Grigas
- Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
| | - Zhe Mei
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Department of Chemistry, Yale University, New Haven, Connecticut, USA
| | - John D. Treado
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
| | - Zachary A. Levine
- Department of Pathology, Yale University, New Haven, Connecticut, USA
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
| | - Lynne Regan
- Institute of Quantitative Biology, Biochemistry and Biotechnology, Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
| | - Corey S. O'Hern
- Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
- Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
- Department of Physics, Yale University, New Haven, Connecticut, USA
- Department of Applied Physics, Yale University, New Haven, Connecticut, USA
| |
|
18
|
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022] Open
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e. PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy since the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. To prove the concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
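For orientation, the statistical PMF formalism that this abstract contrasts with its information-gain score is conventionally written through the inverse Boltzmann relation. A common textbook form is shown below; the notation is an assumption made here for illustration and is not taken from the article, whose own information-gain equations are not reproduced.

```latex
E_{ij}(r) \;=\; -\,k_{B}T \,\ln\!\left( \frac{p^{\mathrm{obs}}_{ij}(r)}{p^{\mathrm{ref}}(r)} \right)
```

Here $p^{\mathrm{obs}}_{ij}(r)$ is the frequency with which atom types $i$ and $j$ are observed at distance $r$ in known structures, $p^{\mathrm{ref}}(r)$ is the corresponding frequency in a reference state, and $k_{B}T$ sets the energy scale. The score of a model is typically the sum of $E_{ij}(r)$ over its atom pairs.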
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France; Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France; Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
| | - Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
| | - Pierre Tufféry
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
| | - Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
| |
|
19
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. From the comparison results between energy terms obtained from our encoding and Amber, we find energy difference within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy.
Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and force field parameters were also tested and the comparison results suggest that both the RF algorithm and force field potentials are important with the ML scoring function achieving its best performance only by combining them together. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
| |
|
20
|
Xu G, Wang Q, Ma J. OPUS-TASS: a protein backbone torsion angles and secondary structure predictor based on ensemble neural networks. Bioinformatics 2020; 36:5021-5026. [DOI: 10.1093/bioinformatics/btaa629] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 05/01/2020] [Revised: 06/25/2020] [Accepted: 07/10/2020] [Indexed: 11/13/2022] Open
Abstract
Motivation
Predictions of protein backbone torsion angles (ϕ and ψ) and secondary structure from sequence are crucial subproblems in protein structure prediction. With the development of deep learning approaches, their accuracies have been significantly improved. To capture the long-range interactions, most studies integrate bidirectional recurrent neural networks into their models. In this study, we introduce and modify a recently proposed architecture named Transformer to capture the interactions between the two residues theoretically with arbitrary distance. Moreover, we take advantage of multitask learning to improve the generalization of neural network by introducing related tasks into the training process. Similar to many previous studies, OPUS-TASS uses an ensemble of models and achieves better results.
Results
OPUS-TASS uses the same training and validation sets as SPOT-1D. We compare the performance of OPUS-TASS and SPOT-1D on TEST2016 (1213 proteins) and TEST2018 (250 proteins) proposed in the SPOT-1D paper, CASP12 (55 proteins), CASP13 (32 proteins) and CASP-FM (56 proteins) proposed in the SAINT paper, and a recently released PDB structure collection from CAMEO (93 proteins) named as CAMEO93. On these six test sets, OPUS-TASS achieves consistent improvements in both backbone torsion angles prediction and secondary structure prediction. On CAMEO93, SPOT-1D achieves the mean absolute errors of 16.89 and 23.02 for ϕ and ψ predictions, respectively, and the accuracies for 3- and 8-state secondary structure predictions are 87.72 and 77.15%, respectively. In comparison, OPUS-TASS achieves 16.56 and 22.56 for ϕ and ψ predictions, and 89.06 and 78.87% for 3- and 8-state secondary structure predictions, respectively. In particular, after using our torsion angles refinement method OPUS-Refine as the post-processing procedure for OPUS-TASS, the mean absolute errors for final ϕ and ψ predictions are further decreased to 16.28 and 21.98, respectively.
Availability and implementation
The training and the inference codes of OPUS-TASS and its data are available at https://github.com/thuxugang/opus_tass.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
| | - Qinghua Wang
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
| | - Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Department of Bioengineering, Rice University, Houston, TX 77030, USA
| |
|
21
|
Dobrzanska DA, Lamaudière MTF, Rollason J, Acton L, Duncan M, Compton S, Simms J, Weedall GD, Morozov IY. Preventive antibiotic treatment of calves: emergence of dysbiosis causing propagation of obese state-associated and mobile multidrug resistance-carrying bacteria. Microb Biotechnol 2020; 13:669-682. [PMID: 31663669 PMCID: PMC7111097 DOI: 10.1111/1751-7915.13496] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/07/2019] [Revised: 08/27/2019] [Accepted: 10/01/2019] [Indexed: 01/10/2023] Open
Abstract
In agriculture, antibiotics are used for the treatment and prevention of livestock disease. Antibiotics perturb the bacterial gut composition but the extent of these changes and potential consequences for animal and human health is still debated. Six calves were housed in a controlled environment. Three animals received an injection of the antibiotic florfenicol (Nuflor), and three received no treatment. Faecal samples were collected at 0, 3 and 7 days, and bacterial communities were profiled to assess the impact of a therapy on the gut microbiota. Phylogenetic analysis (16S-rDNA) established that at day 7, antibiotic-treated microbiota showed a 10-fold increase in facultative anaerobic Escherichia spp, a signature of imbalanced microbiota, dysbiosis. The antibiotic resistome showed a high background of antibiotic resistance genes, which did not significantly change in response to florfenicol. However, the maintenance of Escherichia coli plasmid-encoded quinolone, oqxB and propagation of mcr-2, and colistin resistance genes were observed and confirmed by Sanger sequencing. The microbiota of treated animals was enriched with energy harvesting bacteria, common to obese microbial communities. We propose that antibiotic treatment of healthy animals leads to unbalanced, disease- and obese-related microbiota that promotes growth of E. coli carrying resistance genes on mobile elements, potentially increasing the risk of transmission of antibiotic resistant bacteria to humans.
Affiliation(s)
| | | | | | - Lauren Acton
- School of Life Sciences, Coventry University, Coventry, UK
| | - Michael Duncan
- Centre for Sport, Exercise and Life Sciences, Coventry University, Coventry, UK
| | - Sharon Compton
- Moreton Morrell College Farm, The Warwickshire College, Warwickshire, CV35 9BL, UK
| | - John Simms
- School of Life Sciences, Coventry University, Coventry, UK
| | - Gareth D. Weedall
- School of Natural Sciences and Psychology, Liverpool John Moores University, Liverpool, UK
| | - Igor Y. Morozov
- Centre for Sport, Exercise and Life Sciences, Coventry University, Coventry, UK
| |
|
22
|
Zhai C, Li T, Shi H, Yeo J. Discovery and design of soft polymeric bio-inspired materials with multiscale simulations and artificial intelligence. J Mater Chem B 2020; 8:6562-6587. [DOI: 10.1039/d0tb00896f] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Indexed: 12/20/2022]
Abstract
Establishing the “Materials 4.0” paradigm requires intimate knowledge of the virtual space in materials design.
Affiliation(s)
- Chenxi Zhai
- J2 Lab for Engineering Living Materials, Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, USA
| | - Tianjiao Li
- J2 Lab for Engineering Living Materials, Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, USA
| | - Haoyuan Shi
- J2 Lab for Engineering Living Materials, Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, USA
| | - Jingjie Yeo
- J2 Lab for Engineering Living Materials, Sibley School of Mechanical and Aerospace Engineering, Cornell University, Ithaca, USA
| |
|
23
|
Christoffer C, Terashi G, Shin WH, Aderinwale T, Maddhuri Venkata Subramaniya SR, Peterson L, Verburgt J, Kihara D. Performance and enhancement of the LZerD protein assembly pipeline in CAPRI 38-46. Proteins 2019; 88:948-961. [PMID: 31697428 DOI: 10.1002/prot.25850] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/29/2019] [Revised: 10/07/2019] [Accepted: 11/03/2019] [Indexed: 01/17/2023]
Abstract
We report the performance of the protein docking prediction pipeline of our group and the results for Critical Assessment of Prediction of Interactions (CAPRI) rounds 38-46. The pipeline integrates programs developed in our group as well as other existing scoring functions. The core of the pipeline is the LZerD protein-protein docking algorithm. If templates of the target complex are not found in PDB, the first step of our docking prediction pipeline is to run LZerD for a query protein pair. Meanwhile, in the case of human group prediction, we survey the literature to find information that can guide the modeling, such as protein-protein interface information. In addition to any literature information and binding residue prediction, generated docking decoys were selected by a rank aggregation of statistical scoring functions. The top 10 decoys were relaxed by a short molecular dynamics simulation before submission to remove atom clashes and improve side-chain conformations. In these CAPRI rounds, our group, particularly the LZerD server, showed robust performance. On the other hand, there are failed cases where some other groups were successful. To understand weaknesses of our pipeline, we analyzed sources of errors for failed targets. Since we noted that structure refinement is a step that needs improvement, we newly performed a comparative study of several refinement approaches. Finally, we show several examples that illustrate successful and unsuccessful cases by our group.
Affiliation(s)
| | - Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Woong-Hee Shin
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana; Department of Chemistry Education, Sunchon National University, Suncheon, Jeollanam-do, Republic of Korea
| | - Tunde Aderinwale
- Department of Computer Science, Purdue University, West Lafayette, Indiana
| | | | - Lenna Peterson
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Jacob Verburgt
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana
| | - Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, Indiana; Department of Biological Sciences, Purdue University, West Lafayette, Indiana; Purdue University Center for Cancer Research, Purdue University, West Lafayette, Indiana; Department of Pediatrics, University of Cincinnati, Cincinnati, Ohio
| |
|
24
|
Mirzaie M. Identification of native protein structures captured by principal interactions. BMC Bioinformatics 2019; 20:604. [PMID: 31752663 PMCID: PMC6873546 DOI: 10.1186/s12859-019-3186-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2019] [Accepted: 11/01/2019] [Indexed: 11/20/2022] Open
Abstract
Background: Evaluation of protein structure is based on a trustworthy potential function. The total potential of a protein structure is approximated as the summation of all pair-wise interaction potentials. Knowledge-based potentials (KBP) are one type of potential function derived from known experimentally determined protein structures. Although several KBP functions with different methods have been introduced, the key interactions that capture the total potential have not been studied yet. Results: In this study, we seek the interaction types that preserve as much of the total potential as possible. We employ a procedure based on principal component analysis (PCA) to extract the significant and key interactions in native protein structures. We call these interactions principal interactions and show that the results of the model that considers only these interactions are very close to those of the full interaction model that considers all interactions in protein fold recognition. In fact, the principal interactions maintain the discriminative power of the full interaction model. This method was evaluated on 3 KBPs with different contact definitions and thresholds of distance and revealed that their corresponding principal interactions are very similar and have a lot in common. Additionally, the principal interactions consisted of 20% of the full interactions on average, and they are between residues that are considered important in protein folding. Conclusions: This work shows that not all interaction types are equally important in discrimination of native structure. The results of the reduced model based on principal interactions, which were very close to those of the full interaction model, suggest that a new strategy is needed to capture the role of the remaining interactions (non-principal interactions) to improve the power of knowledge-based potential functions.
Affiliation(s)
- Mehdi Mirzaie
- Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O.Box: 14115-134, Tehran, Iran.
| |
|
25
|
Siebenmorgen T, Zacharias M. Computational prediction of protein–protein binding affinities. Wiley Interdisciplinary Reviews: Computational Molecular Science 2019. [DOI: 10.1002/wcms.1448] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Indexed: 01/12/2023]
Affiliation(s)
- Till Siebenmorgen
- Physics Department T38, Technical University of Munich, Garching, Germany
| | - Martin Zacharias
- Physics Department T38, Technical University of Munich, Garching, Germany
| |
|
26
|
Sato R, Ishida T. Protein model accuracy estimation based on local structure quality assessment using 3D convolutional neural network. PLoS One 2019; 14:e0221347. [PMID: 31487288 PMCID: PMC6728020 DOI: 10.1371/journal.pone.0221347] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/12/2019] [Accepted: 08/05/2019] [Indexed: 11/23/2022] Open
Abstract
In protein tertiary structure prediction, model quality assessment programs (MQAPs) are often used to select the final structural models from a pool of candidate models generated by multiple templates and prediction methods. The 3-dimensional convolutional neural network (3DCNN) is an expansion of the 2DCNN and has been applied in several fields, including object recognition. The 3DCNN is also used for MQA tasks, but the performance is low due to several technical limitations related to protein tertiary structures, such as orientation alignment. We proposed a novel single-model MQA method based on local structure quality evaluation using a deep neural network containing 3DCNN layers. The proposed method first assesses the quality of local structures for each residue and then evaluates the quality of whole structures by integrating estimated local qualities. We analyzed the model using the CASP11, CASP12, and 3D-Robot datasets and compared the performance of the model with that of the previous 3DCNN method based on whole protein structures. The proposed method showed a significant improvement compared to the previous 3DCNN method for multiple evaluation measures. We also compared the proposed method to other state-of-the-art methods. Our method showed better performance than the previous 3DCNN-based method and comparable accuracy as the current best single-model methods; particularly, in CASP11 stage2, our method showed a Pearson coefficient of 0.486, which was better than those of the best single-model methods (0.366–0.405). A standalone version of the proposed method and data files are available at https://github.com/ishidalab-titech/3DCNN_MQA.
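The abstract describes two steps that are easy to illustrate: combining per-residue local quality estimates into one global score, and measuring agreement between predicted and true global scores with the Pearson coefficient. The Python sketch below assumes the simplest possible aggregation, an unweighted mean; the abstract only says local qualities are "integrated", so the actual method may differ. All names and numbers are illustrative.

```python
from math import sqrt
from statistics import mean

def global_score(local_scores):
    """Aggregate per-residue quality estimates (assumed: unweighted mean)."""
    return mean(local_scores)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-residue predictions for three models of one target,
# and hypothetical true global quality values for the same models.
local_predictions = [[0.9, 0.8, 0.7], [0.5, 0.6, 0.4], [0.2, 0.3, 0.1]]
true_quality = [0.75, 0.55, 0.30]
predicted_quality = [global_score(scores) for scores in local_predictions]
print(round(pearson(predicted_quality, true_quality), 3))
```

A coefficient near 1 means the predicted ranking of models tracks their true quality closely.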
Affiliation(s)
- Rin Sato
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
| | - Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
| |
|
27
|
Xu G, Ma T, Du J, Wang Q, Ma J. OPUS-Rota2: An Improved Fast and Accurate Side-Chain Modeling Method. J Chem Theory Comput 2019; 15:5154-5160. [PMID: 31412199 DOI: 10.1021/acs.jctc.9b00309] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/20/2022]
Abstract
Side-chain modeling plays a critical role in protein structure prediction. However, in many current methods, balancing the speed and accuracy is still challenging. In this paper, on the basis of our previous work OPUS-Rota (Protein Sci. 2008, 17, 1576-1585), we introduce a new side-chain modeling method, OPUS-Rota2, which is tested on both a 65-protein test set (DB65) in the OPUS-Rota paper and a 379-protein test set (DB379) in the SCWRL4 paper. If the main chain is native, OPUS-Rota2 is more accurate than OPUS-Rota, SCWRL4, and OSCAR-star but slightly less accurate than OSCAR-o. Also, if the main chain is non-native, OPUS-Rota2 is more accurate than any other method. Moreover, OPUS-Rota2 is significantly faster than any other method, in particular, 2 orders of magnitude faster than OSCAR-o. Thus, the combination of higher accuracy and speed of OPUS-Rota2 in modeling side chains on both the native and non-native main chains makes OPUS-Rota2 a very useful tool in protein structure modeling.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
| | | | - Junqing Du
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
| | - Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
| | - Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; School of Life Sciences, Tsinghua University, Beijing 100084, China; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States; School of Life Sciences, Fudan University, Shanghai 200433, China
|
28
|
Cheng J, Choe MH, Elofsson A, Han KS, Hou J, Maghrabi AHA, McGuffin LJ, Menéndez-Hurtado D, Olechnovič K, Schwede T, Studer G, Uziela K, Venclovas Č, Wallner B. Estimation of model accuracy in CASP13. Proteins 2019; 87:1361-1377. [PMID: 31265154 DOI: 10.1002/prot.25767] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 03/30/2019] [Revised: 06/04/2019] [Accepted: 06/15/2019] [Indexed: 12/28/2022]
Abstract
Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue-residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus-based methods.
Affiliation(s)
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Myong-Ho Choe
- Department of Life Science, University of Science, Pyongyang, DPR Korea
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Kun-Sop Han
- Department of Life Science, University of Science, Pyongyang, DPR Korea
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Ali H A Maghrabi
- School of Biological Sciences, University of Reading, Reading, UK
| | - Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
| | - David Menéndez-Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Torsten Schwede
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
| | - Gabriel Studer
- Biozentrum, University of Basel, Basel, Switzerland; SIB Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Basel, Switzerland
| | - Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Stockholm, Sweden
| | - Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
| | - Björn Wallner
- Department of Physics, Chemistry, and Biology, Bioinformatics Division, Linköping University, Linköping, Sweden
|
29
|
Yu Z, Yao Y, Deng H, Yi M. ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment. BMC Bioinformatics 2019; 20:299. [PMID: 31159742 PMCID: PMC6547486 DOI: 10.1186/s12859-019-2898-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 11/28/2018] [Accepted: 05/13/2019] [Indexed: 01/05/2023] Open
Abstract
Background
Knowledge-based statistical potentials have been widely used in protein structure modeling and model quality assessment. They are commonly evaluated based on their abilities in native recognition as well as decoy discrimination. However, these two aspects are found to be mutually exclusive in many statistical potentials.
Results
We developed an atomic ANgle- and DIStance-dependent (ANDIS) statistical potential for protein structure quality assessment with the distance cutoff being a tunable parameter. When the distance cutoff is ≤9.0 Å, “effective atomic interaction” is employed to enhance the ability of native recognition. For a distance cutoff of ≥10 Å, the distance-dependent atom-pair potential with random-walk reference state is combined to strengthen the ability of decoy discrimination. Benchmark tests on 632 structural decoy sets from diverse sources demonstrate that ANDIS outperforms other state-of-the-art potentials in both native recognition and decoy discrimination.
Conclusions
The distance cutoff is a crucial parameter for distance-dependent statistical potentials. A lower distance cutoff is better for native recognition, while a higher one is favorable for decoy discrimination. The ANDIS potential is freely available as a standalone application at http://qbp.hzau.edu.cn/ANDIS/.
Electronic supplementary material
The online version of this article (10.1186/s12859-019-2898-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zhongwang Yu
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
| | - Yuangen Yao
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
| | - Haiyou Deng
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Ming Yi
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
|
30
|
Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022] Open
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. In the scoring stage, energy functions and Model Quality Assessment Programs (MQAPs) are used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.
|
31
|
Wang X, Huang SY. Integrating Bonded and Nonbonded Potentials in the Knowledge-Based Scoring Function for Protein Structure Prediction. J Chem Inf Model 2019; 59:3080-3090. [DOI: 10.1021/acs.jcim.9b00057] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/06/2023]
Affiliation(s)
- Xinxiang Wang
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
|
32
|
Hou J, Wu T, Cao R, Cheng J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins 2019; 87:1165-1178. [PMID: 30985027 PMCID: PMC6800999 DOI: 10.1002/prot.25697] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Received: 02/16/2019] [Revised: 04/04/2019] [Accepted: 04/12/2019] [Indexed: 12/28/2022]
Abstract
Predicting residue-residue distance relationships (e.g., contacts) has become the key direction to advance protein structure prediction since the 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in the 2012 CASP10 experiment. During the 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance-driven template-free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template-free and template-based structure modeling in CASP13. Deep convolutional neural networks can utilize global information in pairwise residue-residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template-based modeling targets. Deep learning also successfully integrated one-dimensional structural features, two-dimensional contact information, and three-dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of the MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning hold the key to solving the protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.
Affiliation(s)
- Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
| | - Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, Washington
| | - Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, Missouri
|
33
|
Xu G, Ma T, Wang Q, Ma J. OPUS-SSF: A side-chain-inclusive scoring function for ranking protein structural models. Protein Sci 2019; 28:1157-1162. [PMID: 30919509 DOI: 10.1002/pro.3608] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/26/2018] [Revised: 03/21/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
We introduce a side-chain-inclusive scoring function, named OPUS-SSF, for ranking protein structural models. The method builds a scoring function based on the native distributions of the coordinate components of certain anchoring points in a local molecular system for peptide segments of 5, 7, 9, and 11 residues in length. Differing from our previous OPUS-CSF [Xu et al., Protein Sci. 2018; 27: 286-292], which exclusively uses main chain information, OPUS-SSF employs anchoring points on side chains so that the effect of side chains is taken into account. The performance of OPUS-SSF was tested on 15 decoy sets containing a total of 603 proteins, and 571 of them had their native structures recognized from their decoys. Similar to OPUS-CSF, OPUS-SSF does not employ the Boltzmann formula in constructing scoring functions. The results indicate that OPUS-SSF has achieved a significant improvement in decoy recognition and should be a very useful tool for protein structural prediction and modeling.
Affiliation(s)
- Gang Xu
- School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
| | - Tianqi Ma
- Applied Physics Program, Rice University, Houston, Texas 77005; Department of Bioengineering, Rice University, Houston, Texas 77005
| | - Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
| | - Jianpeng Ma
- School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China; Applied Physics Program, Rice University, Houston, Texas 77005; Department of Bioengineering, Rice University, Houston, Texas 77005; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
|
34
|
Li J, Fu A, Zhang L. An Overview of Scoring Functions Used for Protein-Ligand Interactions in Molecular Docking. Interdiscip Sci 2019; 11:320-328. [PMID: 30877639 DOI: 10.1007/s12539-019-00327-w] [Citation(s) in RCA: 190] [Impact Index Per Article: 38.0] [Received: 11/04/2018] [Revised: 02/06/2019] [Accepted: 03/06/2019] [Indexed: 12/17/2022]
Abstract
Molecular docking is becoming a key tool in drug discovery and molecular modeling applications. The reliability of molecular docking depends on the accuracy of the adopted scoring function, which can guide and determine the ligand poses when thousands of possible ligand poses are generated. The scoring function can be used to determine the binding mode and site of a ligand, predict binding affinity, and identify potential drug leads for a given protein target. Despite intensive research over the years, accurate and rapid prediction of protein-ligand interactions is still a challenge in molecular docking. For this reason, this study reviews four basic types of scoring functions (physics-based, empirical, knowledge-based, and machine learning-based), following an up-to-date classification scheme. We discuss not only the foundations of the four types of scoring functions, their suitable application areas, and their shortcomings, but also challenges and potential future study directions.
Affiliation(s)
- Jin Li
- College of Computer and Information Science, Southwest University, Chongqing, 400715, China; School of Medical Information and Engineering, Southwest Medical University, Luzhou, 646000, China
| | - Ailing Fu
- College of Pharmaceutical Sciences, Southwest University, Chongqing, 400715, China
| | - Le Zhang
- College of Computer and Information Science, Southwest University, Chongqing, 400715, China; College of Computer Science, Sichuan University, Chengdu, 610065, China; Medical Big Data Center, Sichuan University, Chengdu, 610065, China; Zdmedical, Information Polytron Technologies Inc Chongqing, Chongqing, 401320, China
|
35
|
Wang CK, Craik DJ. Toward Structure Determination of Disulfide-Rich Peptides Using Chemical Shift-Based Methods. J Phys Chem B 2019; 123:1903-1912. [PMID: 30730741 DOI: 10.1021/acs.jpcb.8b10649] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Abstract
Disulfide-rich peptides are a class of molecules for which NMR spectroscopy has been the primary tool for structural characterization. Here, we explore whether the process can be achieved by using structural information encoded in chemical shifts. We examine (i) a representative set of five cyclic disulfide-rich peptides that have high-resolution NMR and X-ray structures and (ii) a larger set of 100 disulfide-rich peptides from the PDB. Accuracy of the calculated structures was dependent on the methods used for searching through conformational space and for identifying native conformations. Although Hα chemical shifts could be predicted reasonably well using SHIFTX, agreement between predicted and experimental chemical shifts was sufficient for identifying native conformations for only some peptides in the representative set. Combining chemical shift data with the secondary structure information and potential energy calculations improved the ability to identify native conformations. Additional use of sparse distance restraints or homology information to restrict the search space also improved the resolution of the calculated structures. This study demonstrates that abbreviated methods have potential for elucidation of peptide structures to high resolution and further optimization of these methods, e.g., improvement in chemical shift prediction accuracy, will likely help transition these methods into the mainstream of disulfide-rich peptide structural biology.
Affiliation(s)
- Conan K Wang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
| | - David J Craik
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4072, Australia
|
36
|
Pei J, Zheng Z, Merz KM. Random Forest Refinement of the KECSA2 Knowledge-Based Scoring Function for Protein Decoy Detection. J Chem Inf Model 2019; 59:1919-1929. [DOI: 10.1021/acs.jcim.8b00734] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Jun Pei
- Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
| | - Zheng Zheng
- Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
| | - Kenneth M. Merz
- Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States
|
37
|
López-Blanco JR, Chacón P. KORP: knowledge-based 6D potential for fast protein and loop modeling. Bioinformatics 2019; 35:3013-3019. [DOI: 10.1093/bioinformatics/btz026] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 08/03/2018] [Revised: 01/03/2019] [Accepted: 01/08/2019] [Indexed: 12/18/2022] Open
Abstract
Motivation
Knowledge-based statistical potentials constitute a simpler and easier alternative to physics-based potentials in many applications, including folding, docking and protein modeling. Here, to improve the effectiveness of the current approximations, we attempt to capture the six-dimensional nature of residue–residue interactions from known protein structures using a simple backbone-based representation.
Results
We have developed KORP, a knowledge-based pairwise potential for proteins that depends on the relative position and orientation between residues. Using a minimalist representation of only three backbone atoms per residue, KORP utilizes a six-dimensional joint probability distribution to outperform state-of-the-art statistical potentials for native structure recognition and best model selection in recent critical assessment of protein structure prediction and loop-modeling benchmarks. Compared with the existing methods, our side-chain independent potential has a lower complexity and better efficiency. The superior accuracy and robustness of KORP represent a promising advance for protein modeling and refinement applications that require a fast but highly discriminative energy function.
Availability and implementation
http://chaconlab.org/modeling/korp.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- José Ramón López-Blanco
- Department of Biological Chemical Physics, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid, Spain
| | - Pablo Chacón
- Department of Biological Chemical Physics, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid, Spain
|
38
|
Colbes J, Corona RI, Lezcano C, Rodríguez D, Brizuela CA. Protein side-chain packing problem: is there still room for improvement? Brief Bioinform 2018; 18:1033-1043. [PMID: 27567382 DOI: 10.1093/bib/bbw079] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 05/23/2016] [Indexed: 11/12/2022] Open
Abstract
The protein side-chain packing problem (PSCPP) is an important subproblem of both protein structure prediction and protein design. During the past two decades, a large number of methods have been proposed to tackle this problem. These methods consist of three main components: a rotamer library, a scoring function and a search strategy. The average overall accuracy level obtained by these methods is approximately 87%. Whether a better accuracy level could be achieved remains to be answered. To address this question, we calculated the maximum accuracy level attainable using a simple rotamer library, independently of the energy function or the search method. Using 2883 different structures from the Protein Data Bank, we compared this accuracy level with the accuracy level of five state-of-the-art methods. These comparisons indicated that, for buried residues in the protein, we are already close to the best possible accuracy results. In addition, for exposed residues, we found that a significant gap exists between the possible improvement and the maximum accuracy level achievable with current methods. After determining that an improvement is possible, the next step is to understand what limitations are preventing us from obtaining such an improvement. Previous works on protein structure prediction and protein design have shown that scoring function inaccuracies may represent the main obstacle to achieving better results for these problems. To show that the same is true for the PSCPP, we evaluated the quality of two scoring functions used by some state-of-the-art algorithms. Our results indicate that neither of these scoring functions can guide the search method correctly, thereby reinforcing the idea that efforts to solve the PSCPP must also focus on developing better scoring functions.
|
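The "accuracy level" discussed in the abstract above is usually reported as the fraction of residues whose side-chain dihedral (chi) angles fall within a tolerance of the native values. The sketch below shows one such measure. The 20° tolerance, the function names, and the example angles are assumptions for illustration, not values taken from the paper, and symmetric side chains (where a 180° flip is equivalent) are not handled.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0..180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def residue_correct(pred_chis, native_chis, tol=20.0):
    """A residue counts as correct when every chi angle is within tol degrees."""
    return all(angle_diff(p, n) <= tol for p, n in zip(pred_chis, native_chis))


def packing_accuracy(predicted, native, tol=20.0):
    """Fraction of residues whose chi angles all match the native ones."""
    if len(predicted) != len(native) or not native:
        raise ValueError("need two non-empty residue lists of equal length")
    hits = sum(residue_correct(p, n, tol) for p, n in zip(predicted, native))
    return hits / len(native)


# Hypothetical chi angles (degrees) for three residues.
predicted = [[-60.0, 175.0], [65.0], [-170.0, 60.0]]
native = [[-65.0, -178.0], [180.0], [175.0, 70.0]]

accuracy = packing_accuracy(predicted, native)  # 2 of 3 residues are correct
```

The periodic difference matters: 175° and -178° are only 7° apart, so the first residue is counted as correct even though the raw values differ by 353.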
39
|
Zang T, Ma T, Wang Q, Ma J. Improving low-accuracy protein structures using enhanced sampling techniques. J Chem Phys 2018; 149:072319. [PMID: 30134714 PMCID: PMC5995690 DOI: 10.1063/1.5027243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/28/2018] [Accepted: 05/23/2018] [Indexed: 11/14/2022] Open
Abstract
In this paper, we report results of using enhanced sampling and blind selection techniques for high-accuracy protein structural refinement. By combining a parallel continuous simulated tempering (PCST) method, previously developed by Zang et al. [J. Chem. Phys. 141, 044113 (2014)], and the structure based model (SBM) as restraints, we refined 23 targets (18 from the refinement category of CASP10 and 5 from that of CASP12). We also designed a novel model selection method to blindly select high-quality models from very long simulation trajectories. The combined use of PCST-SBM with the blind selection method yielded final models that are better than the initial models. For the Top-1 group, 7 out of 23 targets had better models (greater global distance test total scores) than those of the Critical Assessment of Structure Prediction (CASP) participants. For the Top-5 group, 10 out of 23 were better. Our results justify the crucial position of enhanced sampling in protein structure prediction and refinement and demonstrate that a considerable improvement of low-accuracy structures is achievable with current force fields.
Affiliation(s)
- Tianwu Zang
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005, USA
| | - Tianqi Ma
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas 77005, USA
| | - Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, USA
| | - Jianpeng Ma
- Author to whom correspondence should be addressed. Telephone: 713-798-8187. Fax: 713-796-9438
|
40
|
Anishchenko I, Kundrotas PJ, Vakser IA. Contact Potential for Structure Prediction of Proteins and Protein Complexes from Potts Model. Biophys J 2018; 115:809-821. [PMID: 30122295 DOI: 10.1016/j.bpj.2018.07.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/07/2018] [Revised: 07/16/2018] [Accepted: 07/31/2018] [Indexed: 12/18/2022] Open
Abstract
The energy function is the key component of protein modeling methodology. This work presents a semianalytical approach to the development of contact potentials for protein structure modeling. Residue-residue and atom-atom contact energies were derived by maximizing the probability of observing native sequences in a nonredundant set of protein structures. The optimization task was formulated as an inverse statistical mechanics problem applied to the Potts model. Its solution by pseudolikelihood maximization provides consistent estimates of coupling constants at atomic and residue levels. The best performance was achieved when interacting atoms were grouped according to their physicochemical properties. For individual protein structures, the performance of the contact potentials in distinguishing near-native structures from the decoys is similar to the top-performing scoring functions. The potentials also yielded significant improvement in the protein docking success rates. The potentials recapitulated experimentally determined protein stability changes upon point mutations and protein-protein binding affinities. The approach offers a different perspective on knowledge-based potentials and may serve as the basis for their further development.
Affiliation(s)
- Ivan Anishchenko
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
| | - Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas.
| | - Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas.
|
41
|
Kryshtafovych A, Adams PD, Lawson CL, Chiu W. Evaluation system and web infrastructure for the second cryo-EM model challenge. J Struct Biol 2018; 204:96-108. [PMID: 30017700 DOI: 10.1016/j.jsb.2018.07.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/21/2018] [Revised: 07/06/2018] [Accepted: 07/10/2018] [Indexed: 01/01/2023]
Abstract
An evaluation system and a web infrastructure were developed for the second cryo-EM model challenge. The evaluation system includes tools to validate stereo-chemical plausibility of submitted models, check their fit to the corresponding density maps, estimate their overall and per-residue accuracy, and assess their similarity to reference cryo-EM or X-ray structures as well as other models submitted in this challenge. The web infrastructure provides a convenient interface for analyzing models at different levels of detail. It includes interactively sortable tables of evaluation scores for different subsets of models and different sublevels of structure organization, and a suite of visualization tools facilitating model analysis. The results are publicly accessible at http://model-compare.emdatabank.org.
Affiliation(s)
- Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA.
| | - Paul D Adams
- Molecular Biophysics & Integrated Bioimaging, LBNL, CA 94720, USA; Department of Bioengineering, University of California Berkeley, CA 94720, USA
| | - Catherine L Lawson
- Institute for Quantitative Biomedicine and Research Collaboratory for Structural Bioinformatics, Rutgers, The State University of New Jersey, 174 Frelinghuysen Road, Piscataway, NJ 08854, USA
| | - Wah Chiu
- Departments of Bioengineering and Microbiology & Immunology, Stanford University, Stanford, CA 94305-5447, USA; Division of CryoEM and Bioimaging, SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
|
42
|
Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018; 13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022] Open
Abstract
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts—precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such “informative” contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
Affiliation(s)
- Jack Holland
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Qinxin Pan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Gevorg Grigoryan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States of America
|
43
|
Manavalan B, Lee J. SVMQA: support-vector-machine-based protein single-model quality assessment. Bioinformatics 2018; 33:2496-2503. [PMID: 28419290 DOI: 10.1093/bioinformatics/btx222] [Citation(s) in RCA: 130] [Impact Index Per Article: 21.7] [Received: 12/15/2016] [Accepted: 04/12/2017] [Indexed: 01/03/2023] Open
Abstract
Motivation: The accurate ranking of predicted structural models and the selection of the best model from a given candidate pool remain open problems in structural bioinformatics. The quality assessment (QA) methods used to address these problems can be grouped into two categories: consensus methods and single-model methods. Consensus methods generally perform better and attain higher correlation between predicted and true quality measures. However, these methods frequently fail to generate proper quality scores for native-like structures that are distinct from the rest of the pool. Single-model methods do not suffer from this drawback and are better suited for real-life applications where many models from various sources may not be readily available. Results: In this study, we developed a support-vector-machine-based single-model global quality assessment (SVMQA) method. For a given protein model, the SVMQA method predicts the TM-score and GDT_TS score based on a feature vector containing statistical potential energy terms and consistency-based terms between the actual structural features (extracted from the three-dimensional coordinates) and predicted values (from the primary sequence). We trained SVMQA using CASP8, CASP9 and CASP10 targets and determined the machine parameters by 10-fold cross-validation. We evaluated the performance of SVMQA on various benchmarking datasets. Results show that SVMQA outperformed the existing best single-model QA methods both in ranking provided protein models and in selecting the best model from the pool. According to the CASP12 assessment, SVMQA was the best method in selecting good-quality models from decoys in terms of GDTloss. Availability and implementation: SVMQA can be freely downloaded from http://lee.kias.re.kr/SVMQA/SVMQA_eval.tar.gz. Contact: jlee@kias.re.kr. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Balachandran Manavalan
  - Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
- Jooyoung Lee
  - Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
|
44
|
Colbes J, Aguila SA, Brizuela CA. Scoring of Side-Chain Packings: An Analysis of Weight Factors and Molecular Dynamics Structures. J Chem Inf Model 2018; 58:443-452. [PMID: 29368924 DOI: 10.1021/acs.jcim.7b00679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
The protein side-chain packing problem (PSCPP) is a central task in computational protein design. The problem is usually modeled as a combinatorial optimization problem, which consists of searching for a set of rotamers, from a given rotamer library, that minimizes a scoring function (SF). The SF is a weighted sum of terms, which can be divided into physics-based and knowledge-based terms. Although there are many methods to obtain approximate solutions for this problem, all of them perform similarly and there has been no significant improvement in recent years. Studies on protein structure prediction and protein design revealed the limitations of current SFs in achieving further improvements for these two problems. Along the same lines, a recent work reported a similar result for the PSCPP. In this work, we ask whether this negative result regarding further improvements in performance is due to (i) an incorrect weighting of the SF terms or (ii) the constrained conformation resulting from the protein crystallization process. To analyze these questions, we (i) modeled the PSCPP as a bi-objective combinatorial optimization problem, optimizing simultaneously the two most important terms of two SFs of state-of-the-art algorithms, and (ii) relaxed the crystal structure through molecular dynamics as a preprocessing step to simulate the protein in solvent, and evaluated the performance of these two state-of-the-art SFs under these conditions. Our results indicate that (i) no matter what combination of weight factors we use, the current SFs will not lead to better performance and (ii) the evaluated SFs will not be able to improve performance on relaxed structures. Furthermore, the experiments revealed that the SFs and the methods are biased toward crystallized structures.
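The relationship between a weighted-sum scoring function and a bi-objective formulation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' method: the five candidate packings and their two term values are invented, and `dominates`, `pareto_front` and `weighted_best` are hypothetical helper names. Lower values are treated as better for both terms.

```python
def dominates(p, q):
    """True if p is no worse than q in both terms and differs from q."""
    return p[0] <= q[0] and p[1] <= q[1] and p != q


def pareto_front(points):
    """Candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def weighted_best(points, w):
    """Minimizer of the weighted sum w * term_1 + (1 - w) * term_2."""
    return min(points, key=lambda p: w * p[0] + (1.0 - w) * p[1])


# Hypothetical (term_1, term_2) values for five candidate side-chain packings.
candidates = [(1.0, 9.0), (2.0, 4.0), (4.0, 3.5), (6.0, 1.0), (5.0, 5.0)]

front = pareto_front(candidates)
picks = {w: weighted_best(candidates, w) for w in (0.1, 0.3, 0.5, 0.7, 0.9)}

print("Pareto front:", front)
for w, p in picks.items():
    print(f"w = {w}: {p}")
```

With strictly positive weights, every weighted-sum optimum lies on the Pareto front, so scanning weights explores only part of what the bi-objective view exposes. In this toy set, (4.0, 3.5) is on the front but lies above the segment joining (2.0, 4.0) and (6.0, 1.0), so no weighted sum selects it.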
Affiliation(s)
- Jose Colbes
  - Computer Science Department, CICESE Research Center, 22860 Ensenada, Mexico
- Sergio A Aguila
  - Centro de Nanociencias y Nanotecnologia, Universidad Nacional Autonoma de Mexico, Km. 107 Carretera Tijuana-Ensenada, Ensenada, Baja California, Mexico, C.P. 22860
- Carlos A Brizuela
  - Computer Science Department, CICESE Research Center, 22860 Ensenada, Mexico
|
45
|
Wang X, Zhang D, Huang SY. New Knowledge-Based Scoring Function with Inclusion of Backbone Conformational Entropies from Protein Structures. J Chem Inf Model 2018; 58:724-732. [DOI: 10.1021/acs.jcim.7b00601] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Affiliation(s)
- Xinxiang Wang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Di Zhang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
  - School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
|
46
|
Mirzaie M. Hydrophobic residues can identify native protein structures. Proteins 2018; 86:467-474. [DOI: 10.1002/prot.25466] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/26/2017] [Revised: 12/28/2017] [Accepted: 01/23/2018] [Indexed: 11/06/2022]
Affiliation(s)
- Mehdi Mirzaie
  - Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal Ale Ahmad Highway, Tehran, Iran
  - School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
|
47
|
Peterson LX, Togawa Y, Esquivel-Rodriguez J, Terashi G, Christoffer C, Roy A, Shin WH, Kihara D. Modeling the assembly order of multimeric heteroprotein complexes. PLoS Comput Biol 2018; 14:e1005937. [PMID: 29329283 PMCID: PMC5785014 DOI: 10.1371/journal.pcbi.1005937] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 06/01/2017] [Revised: 01/25/2018] [Accepted: 12/19/2017] [Indexed: 12/31/2022] Open
Abstract
Protein-protein interactions are the cornerstone of numerous biological processes. Although an increasing number of protein complex structures have been determined using experimental methods, relatively few studies have been performed to determine the assembly order of complexes. In addition to the insights into the molecular mechanisms of biological function provided by the structure of a complex, knowing the assembly order is important for understanding the process of complex formation. Assembly order is also practically useful for constructing subcomplexes as a step toward solving the entire complex experimentally, designing artificial protein complexes, and developing drugs that interrupt a critical step in the complex assembly. There are several experimental methods for determining the assembly order of complexes; however, these techniques are resource-intensive. Here, we present a computational method that predicts the assembly order of protein complexes by building the complex structure. The method, named Path-LZerD, uses a multimeric protein docking algorithm that assembles a protein complex structure from individual subunit structures and predicts assembly order by observing the simulated assembly process of the complex. Benchmarked on a dataset of complexes with experimental evidence of assembly order, Path-LZerD was successful in predicting the assembly pathway for the majority of the cases. Moreover, when compared with a simple approach that infers the assembly path from the buried surface area of subunits in the native complex, Path-LZerD has the strong advantage that it can be used for cases where the complex structure is not known. The path prediction accuracy decreased when starting from unbound monomers, particularly for larger complexes of five or more subunits, for which only a part of the assembly path was correctly identified. As the first method of its kind, Path-LZerD opens a new area of computational protein structure modeling and will be an indispensable approach for studying protein complexes.
Affiliation(s)
- Lenna X. Peterson
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Yoichiro Togawa
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Juan Esquivel-Rodriguez
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
- Genki Terashi
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Charles Christoffer
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
- Amitava Roy
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
  - Department of Medicinal Chemistry and Molecular Pharmacology, Purdue University, West Lafayette, Indiana, United States of America
  - Bioinformatics and Computational Biosciences Branch, Rocky Mountain Laboratories, NIAID, National Institutes of Health, Hamilton, Montana, United States of America
- Woong-Hee Shin
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, Indiana, United States of America
  - Department of Computer Science, Purdue University, West Lafayette, Indiana, United States of America
|
48
|
Yao Y, Gui R, Liu Q, Yi M, Deng H. Diverse effects of distance cutoff and residue interval on the performance of distance-dependent atom-pair potential in protein structure prediction. BMC Bioinformatics 2017; 18:542. [PMID: 29221443 PMCID: PMC5723101 DOI: 10.1186/s12859-017-1983-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/10/2017] [Accepted: 12/04/2017] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: As one of the most successful knowledge-based energy functions, the distance-dependent atom-pair potential is widely used in all aspects of protein structure prediction, including conformational search, model refinement, and model assessment. During the last two decades, great efforts have been made to improve the reference state of the potential, while other factors that also strongly affect the performance of the potential have received less attention. RESULTS: Based on different distance cutoffs (from 5 to 22 Å) and residue intervals (from 0 to 15) as well as six different reference states, we constructed a series of distance-dependent atom-pair potentials and tested them on several groups of structural decoy sets collected from diverse sources. A comprehensive investigation was performed to clarify the effects of distance cutoff and residue interval on the potential's performance. Our results provide a new perspective as well as practical guidance for optimizing distance-dependent statistical potentials. CONCLUSIONS: The optimal distance cutoff and residue interval depend strongly on the reference state that the potential is based on, the measures of the potential's performance, and the decoy sets that the potential is applied to. The performance of a distance-dependent statistical potential can be significantly improved when the best statistical parameters for the specific application environment are adopted.
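To make the two parameters in this abstract concrete, the following is a minimal sketch of how a distance-dependent atom-pair potential might be derived from pair counts, with the distance cutoff and sequence separation exposed as arguments. Everything here is an assumption for illustration: the function name `build_potential`, the input layout, the reading of "residue interval" as a minimum sequence separation `min_sep`, and the simple averaging-style reference state (expected counts proportional to the overall distance distribution). The paper compares six reference states, none of which is specified in the abstract.

```python
import math
from collections import Counter


def build_potential(structures, cutoff=15.0, bin_width=1.0, min_sep=1):
    """Return {(type_a, type_b, bin): energy} from observed pair counts.

    Each structure is a list of (atom_type, residue_index, (x, y, z)).
    Pairs at or beyond `cutoff` angstroms, or closer than `min_sep`
    residues in sequence, are not counted.
    """
    observed = Counter()
    for atoms in structures:
        for x in range(len(atoms)):
            type_a, res_a, xyz_a = atoms[x]
            for y in range(x + 1, len(atoms)):
                type_b, res_b, xyz_b = atoms[y]
                if abs(res_a - res_b) < min_sep:
                    continue
                r = math.dist(xyz_a, xyz_b)
                if r >= cutoff:
                    continue
                pair = tuple(sorted((type_a, type_b)))
                observed[pair + (int(r // bin_width),)] += 1

    # Marginal counts used by the assumed reference state.
    pair_total, bin_total = Counter(), Counter()
    for (a, b, k), count in observed.items():
        pair_total[(a, b)] += count
        bin_total[k] += count
    total = sum(bin_total.values())

    energy = {}
    for (a, b, k), count in observed.items():
        expected = pair_total[(a, b)] * bin_total[k] / total
        energy[(a, b, k)] = -math.log(count / expected)  # in units of kT
    return energy


# Toy structure (invented): three atoms on a line, residues 0, 5 and 10.
toy = [("A", 0, (0.0, 0.0, 0.0)),
       ("A", 5, (4.5, 0.0, 0.0)),
       ("B", 10, (9.7, 0.0, 0.0))]

energy = build_potential([toy])                  # all three pairs counted
short = build_potential([toy], cutoff=6.0)       # drops the 9.7 angstrom pair
far = build_potential([toy], min_sep=6)          # keeps only residues 0 and 10
print(energy)
```

Changing `cutoff` or `min_sep` changes which pairs enter the counts and therefore the expected counts for every bin, which is one simple way to see why the best setting can depend on the reference state.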
Affiliation(s)
- Yuangen Yao
  - Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
- Rong Gui
  - Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
- Quan Liu
  - Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
- Ming Yi
  - Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
- Haiyou Deng
  - Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
  - Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070 China
|
49
|
Coudrat T, Simms J, Christopoulos A, Wootten D, Sexton PM. Improving virtual screening of G protein-coupled receptors via ligand-directed modeling. PLoS Comput Biol 2017; 13:e1005819. [PMID: 29131821 PMCID: PMC5708846 DOI: 10.1371/journal.pcbi.1005819] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/12/2017] [Revised: 11/30/2017] [Accepted: 10/12/2017] [Indexed: 11/22/2022] Open
Abstract
G protein-coupled receptors (GPCRs) play crucial roles in cell physiology and pathophysiology. There is increasing interest in using structural information for virtual screening (VS) of libraries and for structure-based drug design to identify novel agonist or antagonist leads. However, the sparse availability of experimentally determined GPCR/ligand complex structures with diverse ligands impedes the application of structure-based drug design (SBDD) programs directed at identifying new molecules with a select pharmacology. In this study, we apply ligand-directed modeling (LDM) to available GPCR X-ray structures to improve VS performance and selectivity towards molecules of a specific pharmacological profile. The described method refines a GPCR binding pocket conformation using a single known ligand for that GPCR. The LDM method is a computationally efficient, iterative workflow consisting of protein sampling and ligand docking. We developed an extensive benchmark comparing LDM-refined binding pockets to GPCR X-ray crystal structures across seven different GPCRs bound to a range of ligands of different chemotypes and pharmacological profiles. LDM-refined models showed improvement in VS performance over the original X-ray crystal structures in 21 out of 24 cases. In all cases, the LDM-refined models had superior performance in enriching for the chemotype of the refinement ligand. This likely contributes to the LDM success in all cases of inhibitor-bound to agonist-bound binding pocket refinement, a key task for GPCR SBDD programs. Indeed, agonist ligands are required for a plethora of GPCRs for therapeutic intervention; however, GPCR X-ray structures are mostly restricted to their inactive inhibitor-bound state.
G protein-coupled receptors (GPCRs) are a major target for drug discovery. These receptors are highly dynamic membrane proteins, and have had limited tractability with the biophysical screens that are widely adopted for globular protein targets. Thus, structure-based virtual screening (SBVS) holds great promise as a complement to physical screening for rational design of novel drugs. Indeed, the increasing number of atomic-detail GPCR X-ray crystal structures has coincided with an increase in prospective SBVS studies that have identified novel compounds. However, experimentally solved GPCR structures do not meet the full demand for SBVS, as the GPCR structural landscape is incomplete, lacking both in coverage of available GPCRs and in diversity of receptor conformations and of the chemistry of co-crystalised ligands. Here we present a novel computational GPCR binding pocket refinement method that can generate predictive GPCR/ligand complexes with improved SBVS performance. This ligand-directed modeling workflow uses parallel processing and efficient algorithms to search the GPCR/ligand conformational space faster and more efficiently than molecular dynamics, the widely used protein refinement method. In this study, the resulting models are evaluated both structurally and in retrospective SBVS. We demonstrate improved performance of refined models over their starting structures in the majority of our test cases.
Affiliation(s)
- Thomas Coudrat
  - Drug Discovery Biology and Department of Pharmacology, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- John Simms
  - School of Life and Health Sciences, Aston University, Birmingham, United Kingdom
- Arthur Christopoulos
  - Drug Discovery Biology and Department of Pharmacology, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Denise Wootten
  - Drug Discovery Biology and Department of Pharmacology, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
- Patrick M. Sexton
  - Drug Discovery Biology and Department of Pharmacology, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria, Australia
|
50
|
Xu G, Ma T, Zang T, Wang Q, Ma J. OPUS-CSF: A C-atom-based scoring function for ranking protein structural models. Protein Sci 2017; 27:286-292. [PMID: 29047165 PMCID: PMC5734313 DOI: 10.1002/pro.3327] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/12/2017] [Revised: 10/14/2017] [Accepted: 10/16/2017] [Indexed: 12/12/2022]
Abstract
We report a C‐atom‐based scoring function, named OPUS‐CSF, for ranking protein structural models. Rather than using the traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (derived from the entire PDB) of coordinate components of mainchain C (carbonyl) atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. When tested on decoy recognition, OPUS‐CSF recognized at most 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly outperforming other popular all‐atom empirical potentials. The average correlation coefficient with TM‐score was also comparable with those of other potentials. OPUS‐CSF is a highly coarse‐grained scoring function that requires only partial mainchain information as input and is very fast. Thus, it is suitable for applications at an early stage of structure building.
Affiliation(s)
- Gang Xu
  - School of Life Sciences, Tsinghua University, Beijing, China
- Tianqi Ma
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
- Tianwu Zang
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
- Qinghua Wang
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
- Jianpeng Ma
  - School of Life Sciences, Tsinghua University, Beijing, China
  - Applied Physics Program, Rice University, Houston, Texas
  - Department of Bioengineering, Rice University, Houston, Texas
  - Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
|