1
Rozano L, Hane JK, Mancera RL. The Molecular Docking of MAX Fungal Effectors with Plant HMA Domain-Binding Proteins. Int J Mol Sci 2023; 24:15239. [PMID: 37894919 PMCID: PMC10607590 DOI: 10.3390/ijms242015239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 10/11/2023] [Accepted: 10/13/2023] [Indexed: 10/29/2023]
Abstract
Fungal effector proteins play important roles in mediating disease in agriculturally important crops. These small secreted proteins are known to interact with their respective receptor binding partners in the host, either inside host cells or in the apoplastic space, depending on the localisation of the effector proteins. Consequently, it is important to understand the interactions between fungal effector proteins and their target host receptor binding partners, particularly since this knowledge can be used to select potential plant resistance- or susceptibility-related proteins for breeding new cultivars with disease resistance. In this study, molecular docking simulations were used to characterise protein-protein interactions between effectors and plant receptors. Benchmarking was undertaken using available experimental structures of effector-host receptor complexes to optimise simulation parameters, which were then used to predict the structures and mediating interactions of effector proteins with host receptor binding partners that have not yet been characterised experimentally. Rigid docking was applied to both the so-called bound and unbound docking of MAX effectors with plant HMA domain protein partners. All bound complexes used for benchmarking were correctly predicted, with 84% ranked as the top docking pose by the ZDOCK scoring function. For unbound complexes, at least 95% of known interface residues on the host receptor binding partner and at least 87% of known interface residues on the effector protein were predicted to be part of the interacting interface. Hydrophobic interactions were found to dominate the formation of effector-plant protein complexes.
An optimised set of docking parameters based on the ZDOCK and ZRANK scoring functions was established to enable the prediction of near-native docking poses involving different binding interfaces on plant HMA domain proteins. Whilst this study was limited by the availability of experimentally determined structures of effector-host receptor complexes, we demonstrated the potential of molecular docking simulations to predict the likely interactions between effectors and their respective host receptor binding partners. This computational approach may accelerate the discovery of putative plant interacting partners of effector proteins and contribute to effector-assisted marker discovery, thereby supporting the breeding of disease-resistant crops.
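The interface-recovery figures above (at least 95% and 87% of known interface residues) can be computed from a docked pose along the lines sketched below. This is a minimal sketch, not the authors' pipeline: the 5 Å any-atom cutoff, the dict-of-coordinates layout, and the function names `interface_residues` and `interface_recall` are illustrative assumptions.

```python
import math
from typing import Dict, List, Set, Tuple

# Hypothetical layout: residue ID -> list of (x, y, z) atom coordinates in Å.
Chain = Dict[str, List[Tuple[float, float, float]]]


def interface_residues(chain: Chain, partner: Chain, cutoff: float = 5.0) -> Set[str]:
    """Residues of `chain` with any atom within `cutoff` Å of any atom of `partner`."""
    partner_atoms = [atom for atoms in partner.values() for atom in atoms]
    return {
        res_id
        for res_id, atoms in chain.items()
        if any(math.dist(a, b) <= cutoff for a in atoms for b in partner_atoms)
    }


def interface_recall(known: Set[str], predicted: Set[str]) -> float:
    """Fraction of experimentally known interface residues recovered by a docking pose."""
    if not known:
        raise ValueError("known interface is empty")
    return len(known & predicted) / len(known)


# Toy pose (hypothetical): HMA residue H1 lies 3 Å from the effector, H2 lies 20 Å away.
hma = {"H1": [(0.0, 0.0, 0.0)], "H2": [(20.0, 0.0, 0.0)]}
effector = {"E1": [(3.0, 0.0, 0.0)]}
predicted = interface_residues(hma, effector)
print(predicted, interface_recall({"H1", "H2"}, predicted))
```

Computing the interface on each partner separately mirrors the abstract's separate percentages for the host receptor and the effector.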
Affiliation(s)
- Lina Rozano
- Curtin Medical School, Curtin Health Innovation Research Institute, GPO Box U1987, Perth, WA 6845, Australia
- Curtin Institute for Data Science, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
- James K. Hane
- Curtin Institute for Data Science, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
- Centre for Crop and Disease Management, School of Molecular and Life Sciences, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
- Ricardo L. Mancera
- Curtin Medical School, Curtin Health Innovation Research Institute, GPO Box U1987, Perth, WA 6845, Australia
- Curtin Institute for Data Science, Curtin University, GPO Box U1987, Perth, WA 6845, Australia
2
Shome S, Jia K, Sivasankar S, Jernigan RL. Characterizing interactions in E-cadherin assemblages. Biophys J 2023; 122:3069-3077. [PMID: 37345249 PMCID: PMC10432173 DOI: 10.1016/j.bpj.2023.06.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/21/2022] [Revised: 09/26/2022] [Accepted: 06/14/2023] [Indexed: 06/23/2023]
Abstract
Cadherin intermolecular interactions are critical for cell-cell adhesion and play essential roles in tissue formation and the maintenance of tissue structures. In this study, we focus on E-cadherin, a classical cadherin that connects epithelial cells, to understand how E-cadherin molecules interact in cis and trans conformations, that is, when attached to the same cell or to opposing cells. We employ coevolutionary sequence analysis and molecular dynamics simulations to confirm previously known interaction sites and to identify new ones. The sequence coevolution analysis yields a surprising result: there are no strongly favored intermolecular interaction sites, which is unusual and suggests that many interaction sites may be possible, with none strongly preferred over the others. Using molecular dynamics, we test the persistence of these interactions and how they facilitate adhesion. We build several types of cadherin assemblages, with different numbers and combinations of cis and trans interfaces, to understand how these conformations facilitate adhesion. Our results suggest that, in addition to the established interaction sites on the EC1 and EC2 domains, an additional plausible cis interface exists at the EC3-EC5 domains. Furthermore, we identify specific mutations at cis/trans binding sites that impair adhesion within E-cadherin assemblages.
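One common way to quantify the persistence of an interaction in a molecular dynamics trajectory is its contact occupancy: the fraction of frames in which two residues stay within a distance cutoff. The sketch below assumes this definition; the 8 Å cutoff between representative atoms, the residue labels `A:10` and `B:42`, and the helper `contact_occupancy` are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: one dict per MD frame, mapping residue ID -> (x, y, z)
# of a representative atom (e.g. C-alpha), in Å.
Frame = Dict[str, Tuple[float, float, float]]


def contact_occupancy(frames: List[Frame], res_a: str, res_b: str, cutoff: float = 8.0) -> float:
    """Fraction of frames in which res_a and res_b are within `cutoff` Å of each other."""
    if not frames:
        raise ValueError("no frames supplied")
    in_contact = sum(1 for f in frames if math.dist(f[res_a], f[res_b]) <= cutoff)
    return in_contact / len(frames)


# Three toy frames for a hypothetical trans contact between two protomers.
frames = [
    {"A:10": (0.0, 0.0, 0.0), "B:42": (6.0, 0.0, 0.0)},
    {"A:10": (0.0, 0.0, 0.0), "B:42": (10.0, 0.0, 0.0)},
    {"A:10": (0.0, 0.0, 0.0), "B:42": (7.5, 0.0, 0.0)},
]
print(contact_occupancy(frames, "A:10", "B:42"))
```

An occupancy close to 1 over a long trajectory would indicate a persistent contact; the threshold for "persistent" is itself a modelling choice.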
Affiliation(s)
- Sayane Shome
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
- Kejue Jia
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
- Sanjeevi Sivasankar
- Department of Biomedical Engineering, University of California, Davis, Davis, California
- Robert L Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, Iowa
3
Jernigan RL, Khade P, Kumar A, Kloczkowski A. Using Surface Hydrophobicity Together with Empirical Potentials to Identify Protein-Protein Binding Sites: Application to the Interactions of E-cadherins. Methods Mol Biol 2022; 2340:41-50. [PMID: 35167069 PMCID: PMC9131873 DOI: 10.1007/978-1-0716-1546-1_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Studying the interactions within protein structures can reveal details of how proteins of various types interact and aggregate. Empirical contact potentials have proven extremely important for evaluating individual modeled protein structures, but have found few applications to protein-protein interactions. In part, this is due to a lack of properly formulated potentials with an appropriate reference state. Since comparisons are made between different bound structures, the reference state should take other contacts into account. Therefore, a preferred reference state should be defined with respect to a given residue type interacting with an average residue, rather than with solvent as is typically done when deriving statistical contact potentials. Here, a two-stage procedure for generating and evaluating interacting protein pairs is described, and an example of E-cadherin interactions is shown.
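The idea of an "average residue" reference state can be illustrated with a quasi-chemical contact potential in which the expected frequency of an i-j contact assumes random mixing of contact partners, so solvent never enters the reference. This is a generic sketch under that assumption, not the two-stage procedure of the chapter; `contact_potential` and the toy residue types `L` and `K` are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def contact_potential(contacts: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Statistical contact energies (in kT) relative to an 'average residue' reference.

    The expected pair frequency assumes random mixing of contact partners,
    so no solvent term enters the reference state.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total_pairs = sum(pair_counts.values())

    # Each contact has two ends; count how many ends each residue type occupies.
    end_counts: Counter = Counter()
    for (a, b), n in pair_counts.items():
        end_counts[a] += n
        end_counts[b] += n
    total_ends = 2 * total_pairs

    energies = {}
    for (a, b), n in pair_counts.items():
        x_a = end_counts[a] / total_ends
        x_b = end_counts[b] / total_ends
        expected = x_a * x_b * (1 if a == b else 2)  # unordered pair
        observed = n / total_pairs
        energies[(a, b)] = -math.log(observed / expected)  # inverse Boltzmann, kT units
    return energies


random_mix = contact_potential([("L", "L"), ("L", "K"), ("K", "L"), ("K", "K")])
print(random_mix)
```

When partners mix randomly (one L-L, two K-L and one K-K contact) every energy is zero; negative values mark pairs enriched relative to the average-residue expectation and positive values mark depleted pairs.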
Affiliation(s)
- Robert L Jernigan
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Pranav Khade
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Ambuj Kumar
- Roy J. Carver Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA, USA
- Andrzej Kloczkowski
- Battelle Center for Mathematical Medicine, Nationwide Children's Hospital, Columbus, OH, USA
- Department of Pediatrics, The Ohio State University College of Medicine, Columbus, OH, USA
4
Barradas-Bautista D, Cao Z, Vangone A, Oliva R, Cavallo L. A random forest classifier for protein-protein docking models. Bioinform Adv 2021; 2:vbab042. [PMID: 36699405 PMCID: PMC9710594 DOI: 10.1093/bioadv/vbab042] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Revised: 11/11/2021] [Accepted: 12/06/2021] [Indexed: 01/28/2023]
Abstract
Herein, we present the results of a machine learning approach we developed to single out correct 3D docking models of protein-protein complexes obtained by popular docking software. To this aim, we generated 3 × 10⁴ docking models for each of the 230 complexes in the protein-protein benchmark, version 5, using three different docking programs (HADDOCK, FTDock and ZDOCK), for a cumulative set of ≈ 7 × 10⁶ docking models. Three different machine learning approaches (Random Forest, Support Vector Machine and Perceptron) were used to train classifiers with 158 different scoring functions (features). The Random Forest algorithm outperformed the other two algorithms and was selected for further optimization. Applying a feature selection algorithm and optimizing the random forest hyperparameters allowed us to train and validate a random forest classifier, named COnservation Driven Expert System (CoDES). Testing of CoDES on independent datasets, as well as its comparative performance against machine learning methods recently developed in the field for scoring docking decoys, confirms its state-of-the-art ability to discriminate correct from incorrect decoys, both in terms of global parameters and in terms of decoys ranked at the top positions. Supplementary information: Supplementary data are available at Bioinformatics Advances online. Software and data availability statement: The docking models are available at https://doi.org/10.5281/zenodo.4012018. The programs underlying this article will be shared on request to the corresponding authors.
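The abstract judges classifiers partly by how many correct decoys appear at the top of the ranking. A top-N success rate is one standard way to express this; the sketch below assumes that metric and a hypothetical layout of per-complex (score, is_correct) pairs, and does not reproduce CoDES itself.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: complex ID -> list of (classifier_score, is_correct), one per decoy.
Decoys = Dict[str, List[Tuple[float, bool]]]


def top_n_success_rate(results: Decoys, n: int = 10) -> float:
    """Fraction of complexes with at least one correct decoy among the n best-scored decoys."""
    if not results:
        raise ValueError("no complexes supplied")
    hits = 0
    for decoys in results.values():
        ranked = sorted(decoys, key=lambda d: d[0], reverse=True)  # higher score = more likely correct
        if any(is_correct for _, is_correct in ranked[:n]):
            hits += 1
    return hits / len(results)


results = {
    "c1": [(0.9, True), (0.1, False)],
    "c2": [(0.8, False), (0.7, False), (0.2, True)],
}
print(top_n_success_rate(results, n=1), top_n_success_rate(results, n=3))
```

Reporting the rate for several values of N shows how quickly a scoring function brings a correct model into view.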
Affiliation(s)
- Didier Barradas-Bautista
- KAUST Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia (corresponding author)
- Zhen Cao
- KAUST Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia
- Anna Vangone
- Pharma Research and Early Development, Therapeutic Modalities, Roche Innovation Center Munich Large Molecule Research, 82377 Penzberg, Germany
- Romina Oliva
- Department of Sciences and Technologies, University Parthenope of Naples, Centro Direzionale Isola C4, I-80143 Naples, Italy (corresponding author)
- Luigi Cavallo
- KAUST Catalysis Center, Physical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), 23955-6900 Thuwal, Saudi Arabia (corresponding author)
5
Hong Z, Liu J, Chen Y. An interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction. Biophys Chem 2021; 278:106666. [PMID: 34418678 DOI: 10.1016/j.bpc.2021.106666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/25/2021] [Revised: 08/09/2021] [Accepted: 08/09/2021] [Indexed: 12/29/2022]
Abstract
Protein-protein interactions play important roles in biological processes. A finer-grained analysis, at the residue and atom levels, can improve our understanding of the mechanisms of inter-protein interaction and support drug design. Ongoing work in the field includes developing efficient computational methods that reduce trial and error and assist experimental researchers in determining complex structures. Trimeric protein interfaces, especially homotrimeric ones, have rarely been studied. In this paper, we propose an interpretable machine learning method for predicting homo-trimeric protein interface residue pairs. Structural, sequence, and physicochemical information are integrated as input features for model training. A graph model is used to represent intra-protein spatial information. Matrix factorization captures interactions between different features. A kernel function is designed to automatically acquire neighborhood information for the target residue pairs. The model achieves an accuracy of 54.5% on an independent test set. Sequence and structure alignments demonstrate the model's ability to learn on its own. Our model reflects the biological relationship between sequence and structure, and could help reduce trial and error in protein complex determination, protein-protein docking, and related fields. SIGNIFICANCE: Protein complex structures are important for understanding protein function and for the design of functional proteins. As data have accumulated, several computational tools have been developed for protein complex residue contact prediction, which is one of the most important steps in complex structure prediction. However, for homo-trimeric proteins, sequence-based deep learning predictors are infeasible for homologous sequences, and their black-box nature prevents us from understanding each step of the computation.
We therefore propose an interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction, and the predictor shows good performance. Our work provides a computational aid for determining homo-trimeric protein interface residue pairs, which will be further verified by wet-lab experiments, and supports downstream work such as protein-protein docking, protein complex structure prediction and drug design.
Affiliation(s)
- Zhonghua Hong
- Jiaxing Hospital of Traditional Chinese Medicine, Jiaxing University, Jiaxing 314001, PR China
- Jiale Liu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China
- Yinggao Chen
- Shantou Central Hospital, Shantou 515041, PR China
6
Fogalli GB, Line SRP. Estimating the Influence of Physicochemical and Biochemical Property Indexes on Selection for Amino Acids Usage in Eukaryotic Cells. J Mol Evol 2021; 89:257-268. [PMID: 33760966 DOI: 10.1007/s00239-021-10003-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/29/2020] [Accepted: 03/10/2021] [Indexed: 11/26/2022]
Abstract
Proteins can evolve by accumulating changes in their amino acid sequences. These changes are mainly caused by missense mutations in their DNA coding sequences. Mutations with neutral or positive effects on fitness can be maintained, while deleterious mutations tend to be eliminated by natural selection. Amino acid changes are influenced by the biophysical, chemical, and biological properties of amino acids, and many of these properties can influence the function and expression of proteins. Amino acid properties can be expressed as numerical indexes, which can help predict functional and structural aspects of proteins and allow statistical inferences of selection pressure on amino acid usage. The accuracy of these analyses may be compromised by the existence of several numerical indexes that measure the same amino acid property, and by the lack of objective parameters for determining the most accurate and biologically relevant index. In the present study, the gradient consistency test was used to estimate the magnitude of directional selection imparted by amino acid biochemical and biophysical properties on protein evolution.
Affiliation(s)
- Giovani B Fogalli
- Department of Biosciences, Piracicaba Dental School, University of Campinas, Campinas, Brazil
- Sergio R P Line
- Department of Biosciences, Piracicaba Dental School, University of Campinas, Campinas, Brazil
7
Lou H, Cukier RI. A maximum entropy principle approach to a joint probability model for sequences with known neighbor and next neighbor pair probabilities. Chem Phys 2020. [DOI: 10.1016/j.chemphys.2020.110872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
8
Tywoniuk B, Yuan Y, McCartan S, Szydłowska BM, Tofoleanu F, Brooks BR, Buchete NV. Amyloid Fibril Design: Limiting Structural Polymorphism in Alzheimer's Aβ Protofilaments. J Phys Chem B 2018; 122:11535-11545. [PMID: 30335383 DOI: 10.1021/acs.jpcb.8b07423] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/28/2022]
Abstract
Nanoscale fibrils formed by amyloid peptides have a polymorphic character, adopting several types of molecular structures under similar growth conditions. As shown by experimental (e.g., solid-state NMR) and computational studies, amyloid fibril polymorphism hinders both the molecular-level structural characterization of Alzheimer's Aβ amyloid protofilaments and fibrils and possible applications (e.g., development of drugs or biomarkers) that rely on similar, controlled molecular arrangements of the Aβ peptides in amyloid fibril structures. We have explored the use of several contact potentials for the efficient identification of minimal sequence mutations that could enhance the stability of specific fibril structures while simultaneously destabilizing competing topologies, thus controlling the amount of structural polymorphism in a rational way. We found that different types of contact potentials, while only partially accurate on their own, lead to similar results when ranking the compatibility of wild-type (WT) and mutated amyloid sequences with different fibril morphologies. This approach allows exhaustive screening and assessment of possible mutations and the identification of minimal consensus mutations that could stabilize fibrils with the desired topology at the expense of other topology types, a prediction that is further validated using atomistic molecular dynamics with explicit water molecules. We apply this two-step multiscale (i.e., residue- and atomistic-level) approach to predict and validate mutations that could bias either parallel or antiparallel packing in the core Alzheimer's Aβ9-40 amyloid fibril models based on solid-state NMR experiments.
Besides shedding new light on the molecular origins of structural polymorphism in WT Aβ fibrils, our study could also lead to efficient tools for assisting future experimental approaches for amyloid fibril determination, and for the development of biomarkers or drugs aimed at interfering with the stability of amyloid fibrils, as well as for the future design of amyloid fibrils with a controlled (e.g., reduced) level of structural polymorphism.
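The residue-level step described above (ranking sequence compatibility with competing fibril topologies using a contact potential) can be illustrated with a toy threading score. Everything here is an assumption for illustration: the two-letter H/P alphabet, the potential values, the contact maps, and the selectivity gap E(target) - E(competitor) used to rank point mutants. The atomistic MD validation step is not represented.

```python
from typing import Dict, List, Sequence, Tuple

Potential = Dict[Tuple[str, str], float]  # sorted residue pair -> energy (arbitrary units)


def threading_energy(seq: str, contact_map: Sequence[Tuple[int, int]], potential: Potential) -> float:
    """Sum of pairwise contact energies for `seq` placed on a fixed fibril contact map."""
    return sum(potential[tuple(sorted((seq[i], seq[j])))] for i, j in contact_map)


def rank_single_mutants(
    seq: str,
    target_map: Sequence[Tuple[int, int]],
    competitor_map: Sequence[Tuple[int, int]],
    potential: Potential,
    alphabet: str,
) -> List[Tuple[float, str]]:
    """Rank point mutants by E(target) - E(competitor); more negative favors the target topology."""
    ranked = []
    for pos, wt in enumerate(seq):
        for aa in alphabet:
            if aa == wt:
                continue
            mutant = seq[:pos] + aa + seq[pos + 1:]
            gap = threading_energy(mutant, target_map, potential) - threading_energy(
                mutant, competitor_map, potential
            )
            ranked.append((gap, f"{wt}{pos + 1}{aa}"))
    return sorted(ranked)


# Hypothetical HP-model potential: only hydrophobic-hydrophobic contacts are favorable.
potential = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.0}
ranked = rank_single_mutants("PPH", [(0, 2)], [(1, 2)], potential, "HP")
print(ranked)
```

In this toy case, mutating position 1 from P to H is the only single mutant that stabilizes the target contact map relative to the competing one; a real screen would use a residue-level potential over all 20 amino acids and fibril-derived contact maps.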
Affiliation(s)
- Bartłomiej Tywoniuk
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland
- Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
- Ye Yuan
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland
- Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
- Sarah McCartan
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland
- Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
- Beata Maria Szydłowska
- Applied Physical Chemistry, Ruprecht-Karls University Heidelberg, Heidelberg 69120, Germany
- Institute of Physics, EIT 2, Universität der Bundeswehr München, Werner-Heisenberg-Weg 39, 85577 Neubiberg, Germany
- Florentina Tofoleanu
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Department of Chemistry, Yale University, New Haven, Connecticut 06520, United States
- Bernard R Brooks
- Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, Maryland 20892, United States
- Nicolae-Viorel Buchete
- School of Physics, University College Dublin, Dublin D04 V1W8, Ireland
- Institute for Discovery, University College Dublin, Dublin D04 V1W8, Ireland
9
Lu B, Li C, Chen Q, Song J. ProBAPred: Inferring protein–protein binding affinity by incorporating protein sequence and structural features. J Bioinform Comput Biol 2018; 16:1850011. [PMID: 29954286 DOI: 10.1142/s0219720018500117] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/24/2022]
Abstract
Protein-protein binding interaction is the most prevalent biological activity that mediates a great variety of biological processes. The increasing availability of experimental data on protein–protein interactions allows the systematic construction of protein–protein interaction networks, significantly contributing to a better understanding of protein functions and their roles in cellular pathways and human diseases. Compared with the well-established classification of protein–protein interactions (PPIs), limited work has been conducted on estimating protein–protein binding free energy, which can provide informative real-valued regression models for characterizing protein–protein binding affinity. In this study, we propose a novel ensemble computational framework, termed ProBAPred (Protein–protein Binding Affinity Predictor), for the quantitative estimation of protein–protein binding affinity. A large number of sequence and structural features, including physical–chemical properties, binding energy and conformation annotations, were collected and calculated from currently available protein binding complex datasets and the literature. Feature selection based on the WEKA package was performed to identify and characterize the most informative feature subsets. Experiments on the independent test set showed that our ensemble method achieved the lowest mean absolute error (MAE; 1.657 kcal/mol) and the second highest correlation coefficient compared with existing methods. The datasets and source code of ProBAPred, and the supplementary materials for this study, can be downloaded at http://lightning.med.monash.edu/probapred/ for academic use. We anticipate that the developed ProBAPred regression models can facilitate computational characterization and experimental studies of protein–protein binding affinity.
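The two headline metrics, mean absolute error in kcal/mol and a correlation coefficient, can be computed as below. This sketch assumes the correlation reported is Pearson's r, which the abstract does not state, and uses made-up binding free energies purely for illustration.

```python
import math
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute deviation, in the units of the inputs (here kcal/mol)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical experimental vs predicted binding free energies (kcal/mol).
dg_exp = [-10.0, -8.0, -12.0]
dg_pred = [-9.0, -8.5, -11.0]
print(mean_absolute_error(dg_exp, dg_pred), pearson_r(dg_exp, dg_pred))
```

MAE rewards getting absolute values right, whereas correlation only rewards getting the ordering and trend right, which is why a method can rank first on one and second on the other.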
Affiliation(s)
- Bangli Lu
- School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China
- Chen Li
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Qingfeng Chen
- School of Computer, Electronic and Information, and State Key Laboratory for Conservation and Utilization of Subtropical Agro-Bioresources, Guangxi University, 100 Daxue Road, 530004 Nanning, P. R. China
- Jiangning Song
- Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, VIC 3800, Australia
- Monash Centre for Data Science, Faculty of Information Technology, Monash University, VIC 3800, Australia
- ARC Centre of Excellence for Advanced Molecular Imaging, Monash University, VIC 3800, Australia
10
Anishchenko I, Kundrotas PJ, Vakser IA. Contact Potential for Structure Prediction of Proteins and Protein Complexes from Potts Model. Biophys J 2018; 115:809-821. [PMID: 30122295 DOI: 10.1016/j.bpj.2018.07.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 03/07/2018] [Revised: 07/16/2018] [Accepted: 07/31/2018] [Indexed: 12/18/2022]
Abstract
The energy function is the key component of protein modeling methodology. This work presents a semianalytical approach to the development of contact potentials for protein structure modeling. Residue-residue and atom-atom contact energies were derived by maximizing the probability of observing native sequences in a nonredundant set of protein structures. The optimization task was formulated as an inverse statistical mechanics problem applied to the Potts model. Its solution by pseudolikelihood maximization provides consistent estimates of coupling constants at atomic and residue levels. The best performance was achieved when interacting atoms were grouped according to their physicochemical properties. For individual protein structures, the performance of the contact potentials in distinguishing near-native structures from the decoys is similar to the top-performing scoring functions. The potentials also yielded significant improvement in the protein docking success rates. The potentials recapitulated experimentally determined protein stability changes upon point mutations and protein-protein binding affinities. The approach offers a different perspective on knowledge-based potentials and may serve as the basis for their further development.
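To make the Potts-model formulation concrete: with P(s) proportional to exp(-E(s)) and E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), pseudolikelihood maximization replaces the intractable full likelihood with a product of single-site conditional probabilities. The sketch below only evaluates the negative log-pseudolikelihood for given fields and couplings; it does not fit them, and the parameter layout and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical parameter layout (missing entries count as 0):
#   fields[i][a]              field h_i(a) for state a at position i
#   couplings[(i, j)][(a, b)] coupling J_ij(a, b) for positions i < j
Fields = List[Dict[str, float]]
Couplings = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def coupling(couplings: Couplings, i: int, j: int, a: str, b: str) -> float:
    """J_ij(a, b), looked up symmetrically so callers need not order i and j."""
    if i < j:
        return couplings.get((i, j), {}).get((a, b), 0.0)
    return couplings.get((j, i), {}).get((b, a), 0.0)


def local_field(seq: str, fields: Fields, couplings: Couplings, i: int, a: str) -> float:
    """Negative local energy of state a at site i, with all other sites fixed to seq."""
    return fields[i].get(a, 0.0) + sum(
        coupling(couplings, i, j, a, seq[j]) for j in range(len(seq)) if j != i
    )


def neg_log_pseudolikelihood(seq: str, fields: Fields, couplings: Couplings, alphabet: str) -> float:
    """-sum_i log P(s_i | s_rest) under the Potts model defined above."""
    total = 0.0
    for i in range(len(seq)):
        log_z = math.log(sum(math.exp(local_field(seq, fields, couplings, i, a)) for a in alphabet))
        total -= local_field(seq, fields, couplings, i, seq[i]) - log_z
    return total


zero = neg_log_pseudolikelihood("ABA", [{}, {}, {}], {}, "AB")
coupled = neg_log_pseudolikelihood("ABA", [{}, {}, {}], {(0, 1): {("A", "B"): 2.0}}, "AB")
print(zero, coupled)
```

With all parameters zero every site contributes log 2; a coupling that rewards the observed A-B pair lowers the objective, which is the direction an optimizer would push when fitting to native sequences.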
Affiliation(s)
- Ivan Anishchenko
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
- Petras J Kundrotas
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
- Ilya A Vakser
- Computational Biology Program and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
11
Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018; 13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022]
Abstract
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts—precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such “informative” contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
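The two ideas in this abstract, that positive predictive value depends on the distance cutoff used to define a contact and that co-evolution scores can be combined with contact energies, can be sketched as follows. The linear combination with a single `weight` is a hypothetical stand-in: the abstract does not specify how the authors combine DCA or MetaPSICOV scores with the contact potential, and all numbers are made up.

```python
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]


def ppv(predicted: Sequence[Pair], distances: Dict[Pair, float], cutoff: float) -> float:
    """Positive predictive value: fraction of predicted pairs within `cutoff` Å in the structure."""
    if not predicted:
        raise ValueError("no predicted pairs")
    return sum(1 for p in predicted if distances[p] <= cutoff) / len(predicted)


def rerank(
    pairs: Sequence[Pair],
    coevolution: Dict[Pair, float],
    contact_energy: Dict[Pair, float],
    weight: float = 1.0,
) -> List[Pair]:
    """Hypothetical combined score: strong co-variation and favorable (negative) energy rank first."""
    return sorted(pairs, key=lambda p: coevolution[p] - weight * contact_energy[p], reverse=True)


distances = {(1, 5): 6.0, (2, 9): 9.5, (3, 12): 12.0}
predicted = [(1, 5), (2, 9), (3, 12)]
print(ppv(predicted, distances, 8.0), ppv(predicted, distances, 10.0))

coevolution = {(1, 5): 0.5, (2, 9): 0.6}
energy = {(1, 5): -1.0, (2, 9): 0.5}
print(rerank([(2, 9), (1, 5)], coevolution, energy))
```

The same predictions score 1/3 under an 8 Å definition and 2/3 under a 10 Å one, and adding the energy term moves the pair with the favorable residue combination to the top even though its co-variation score is slightly lower.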
Affiliation(s)
- Jack Holland
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Qinxin Pan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States of America
12
Mirzaie M. Hydrophobic residues can identify native protein structures. Proteins 2018; 86:467-474. [DOI: 10.1002/prot.25466] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/26/2017] [Revised: 12/28/2017] [Accepted: 01/23/2018] [Indexed: 11/06/2022]
Affiliation(s)
- Mehdi Mirzaie
- Department of Applied Mathematics, Faculty of Mathematical Sciences; Tarbiat Modares University, Jalal Ale Ahmad Highway; Tehran Iran
- School of Biological Sciences; Institute for Research in Fundamental Sciences (IPM); Tehran Iran
13
Barradas-Bautista D, Moal IH, Fernández-Recio J. A systematic analysis of scoring functions in rigid-body protein docking: The delicate balance between the predictive rate improvement and the risk of overtraining. Proteins 2017; 85:1287-1297. [DOI: 10.1002/prot.25289] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 12/05/2016] [Revised: 03/08/2017] [Accepted: 03/20/2017] [Indexed: 12/24/2022]
Affiliation(s)
- Didier Barradas-Bautista
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology; Barcelona 08034 Spain
- Iain H. Moal
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology; Barcelona 08034 Spain
- European Molecular Biology Laboratory; European Bioinformatics Institute, Wellcome Trust Genome Campus; Hinxton Cambridge CB10 1SD United Kingdom
- Juan Fernández-Recio
- Life Sciences Department, Barcelona Supercomputing Center (BSC), Joint BSC-CRG-IRB Research Program in Computational Biology; Barcelona 08034 Spain
14
Knowledge-based entropies improve the identification of native protein structures. Proc Natl Acad Sci U S A 2017; 114:2928-2933. [PMID: 28265078 DOI: 10.1073/pnas.1613331114] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 12/19/2022]
Abstract
Evaluating protein structures requires reliable free energies with good estimates of both potential energies and entropies. Although there are many demonstrated successes from using knowledge-based potential energies, computing entropies of proteins has lagged far behind. Here we take an entirely different approach and evaluate knowledge-based conformational entropies of proteins based on the observed frequencies of contact changes between amino acids in a set of 167 diverse proteins, each of which has two alternative structures. The results show that charged and polar interactions break more often than hydrophobic pairs. This pattern correlates strongly with the average solvent exposure of amino acids in globular proteins, as well as with polarity indices and the sizes of the amino acids. Knowledge-based entropies are derived by using the inverse Boltzmann relationship, in a manner analogous to the way that knowledge-based potentials have been extracted. Including these new knowledge-based entropies almost doubles the performance of knowledge-based potentials in selecting the native protein structures from decoy sets. Beyond the overall energy-entropy compensation, a similar compensation is seen for individual pairs of interacting amino acids. The entropies in this report have immediate applications for 3D structure prediction, protein model assessment, and protein engineering and design.
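For reference, the generic inverse Boltzmann form used to extract knowledge-based potentials converts an observed frequency into an energy relative to a reference frequency, and the free energy combines that energy with an entropy term. The abstract states that the entropies are obtained analogously from the frequencies of contact changes; the exact normalization and reference state used for the entropy are not given here, so only the generic relations are shown.

```latex
\Delta G_{ij} = \Delta E_{ij} - T\,\Delta S_{ij},
\qquad
\Delta E_{ij} = -k_B T \,\ln\frac{f_{ij}^{\mathrm{obs}}}{f_{ij}^{\mathrm{ref}}}
```

Here f_ij^obs is the observed frequency of contacts between residue types i and j, f_ij^ref is the frequency expected in the chosen reference state, k_B is the Boltzmann constant and T the temperature.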
15
Tang K, Wong SWK, Liu JS, Zhang J, Liang J. Conformational sampling and structure prediction of multiple interacting loops in soluble and β-barrel membrane proteins using multi-loop distance-guided chain-growth Monte Carlo method. Bioinformatics 2015; 31:2646-52. [PMID: 25861965 DOI: 10.1093/bioinformatics/btv198] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 11/11/2014] [Accepted: 04/03/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Loops in proteins are often involved in biochemical functions. Their irregularity and flexibility make experimental structure determination and computational modeling challenging. Most current loop modeling methods focus on modeling single loops. In protein structure prediction, multiple loops often need to be modeled simultaneously. As interactions among loops in spatial proximity can be rather complex, sampling the conformations of multiple interacting loops is a challenging task. RESULTS In this study, we report a new method called multi-loop Distance-guided Sequential chain-Growth Monte Carlo (M-DiSGro) for prediction of the conformations of multiple interacting loops in proteins. Our method achieves an average RMSD of 1.93 Å for lowest energy conformations of 36 pairs of interacting protein loops with the total length ranging from 12 to 24 residues. We further constructed a data set containing proteins with 2, 3 and 4 interacting loops. For the most challenging target proteins with four loops, the average RMSD of the lowest energy conformations is 2.35 Å. Our method is also tested for predicting multiple loops in β-barrel membrane proteins. For outer-membrane protein G, the lowest energy conformation has an RMSD of 2.62 Å for the three extracellular interacting loops with a total length of 34 residues (12, 12 and 10 residues in each loop). AVAILABILITY AND IMPLEMENTATION The software is freely available at: tanto.bioe.uic.edu/m-DiSGro. CONTACT jinfeng@stat.fsu.edu or jliang@uic.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
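The accuracies above are reported as RMSD values in Å. As a minimal sketch of how such a value is computed, the Python function below assumes that predicted and native loop coordinates correspond atom by atom and already share a frame (for example, because the rest of the protein was superimposed beforehand). Which atoms are compared is not stated in the abstract, and the function name `rmsd` is illustrative.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(model: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally long coordinate lists.

    Assumes both sets are already in the same frame, so no superposition
    is performed here.
    """
    if len(model) != len(reference) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared x, y, z differences over all atom pairs.
    sq = sum((a - b) ** 2 for p, q in zip(model, reference) for a, b in zip(p, q))
    return math.sqrt(sq / len(model))
```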
Affiliation(s)
- Ke Tang
- Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL
- Samuel W K Wong
- Department of Statistics, University of Florida, Gainesville, FL
- Jun S Liu
- Department of Statistics, Harvard University, Science Center, Cambridge, MA
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Jie Liang
- Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL
16
Thompson JJ, Tabatabaei Ghomi H, Lill MA. Application of information theory to a three-body coarse-grained representation of proteins in the PDB: insights into the structural and evolutionary roles of residues in protein structure. Proteins 2014; 82:3450-65. [PMID: 25269778 DOI: 10.1002/prot.24698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/23/2014] [Revised: 09/09/2014] [Accepted: 09/19/2014] [Indexed: 01/03/2023]
Abstract
Knowledge-based methods for analyzing protein structures, such as statistical potentials, primarily consider the distances between pairs of bodies (atoms or groups of atoms). Considerations of several bodies simultaneously are generally used to characterize bonded structural elements or those in close contact with each other, but historically do not consider atoms that are not in direct contact with each other. In this report, we introduce an information-theoretic method for detecting and quantifying distance-dependent through-space multibody relationships between the sidechains of three residues. The technique introduced is capable of producing convergent and consistent results when applied to a sufficiently large database of randomly chosen, experimentally solved protein structures. The results of our study can be shown to reproduce established physico-chemical properties of residues as well as more recently discovered properties and interactions. These results offer insight into the numerous roles that residues play in protein structure, as well as relationships between residue function, protein structure, and evolution. The techniques and insights presented in this work should be useful in the future development of novel knowledge-based tools for the evaluation of protein structure.
Affiliation(s)
- Jared J Thompson
- Department of Medicinal Chemistry and Molecular Pharmacology, College of Pharmacy, Purdue University, West Lafayette, Indiana
17
Moal IH, Jiménez-García B, Fernández-Recio J. CCharPPI web server: computational characterization of protein-protein interactions from structure. Bioinformatics 2014; 31:123-5. [PMID: 25183488 DOI: 10.1093/bioinformatics/btu594] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Indexed: 02/03/2023]
Abstract
SUMMARY The atomic structures of protein-protein interactions are central to understanding their role in biological systems, and a wide variety of biophysical functions and potentials have been developed for their characterization and the construction of predictive models. These tools are scattered across a multitude of stand-alone programs, and are often available only as model parameters requiring reimplementation. This acts as a significant barrier to their widespread adoption. CCharPPI integrates many of these tools into a single web server. It calculates up to 108 parameters, including models of electrostatics, desolvation and hydrogen bonding, as well as interface packing and complementarity scores, empirical potentials at various resolutions, docking potentials and composite scoring functions. AVAILABILITY AND IMPLEMENTATION The server does not require registration by the user and is freely available for non-commercial academic use at http://life.bsc.es/pid/ccharppi.
Affiliation(s)
- Iain H Moal
- Joint BSC-IRB Research Programme in Computational Biology, Department of Life Sciences, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Brian Jiménez-García
- Joint BSC-IRB Research Programme in Computational Biology, Department of Life Sciences, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Juan Fernández-Recio
- Joint BSC-IRB Research Programme in Computational Biology, Department of Life Sciences, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
18
Mirzaie M, Sadeghi M. Delaunay-based nonlocal interactions are sufficient and accurate in protein fold recognition. Proteins 2013; 82:415-23. [DOI: 10.1002/prot.24407] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/15/2013] [Revised: 08/12/2013] [Accepted: 08/21/2013] [Indexed: 01/05/2023]
Affiliation(s)
- Mehdi Mirzaie
- Department of Basic Sciences, Faculty of Paramedical Sciences; Shahid Beheshti University of Medical Sciences; Tehran Iran
- Department of Bioinformatics; School of Computer Science, Institute for Research in Fundamental Sciences (IPM); Tehran Iran
- Mehdi Sadeghi
- Department of Bioinformatics, National Institute of Genetic Engineering and Biotechnology; Tehran Iran
19
Moal IH, Torchala M, Bates PA, Fernández-Recio J. The scoring of poses in protein-protein docking: current capabilities and future directions. BMC Bioinformatics 2013; 14:286. [PMID: 24079540 PMCID: PMC3850738 DOI: 10.1186/1471-2105-14-286] [Citation(s) in RCA: 76] [Impact Index Per Article: 6.9] [Received: 06/18/2013] [Accepted: 09/25/2013] [Indexed: 12/16/2022]
Abstract
BACKGROUND Protein-protein docking, which aims to predict the structure of a protein-protein complex from its unbound components, remains an unresolved challenge in structural bioinformatics. An important step is the ranking of docked poses using a scoring function, for which many methods have been developed. There is a need to explore the differences and commonalities of these methods with each other, as well as with functions developed in the fields of molecular dynamics and homology modelling. RESULTS We present an evaluation of 115 scoring functions on an unbound docking decoy benchmark covering 118 complexes for which a near-native solution can be found, yielding top 10 success rates of up to 58%. Hierarchical clustering is performed, so as to group together functions which identify near-natives in similar subsets of complexes. Three set theoretic approaches are used to identify pairs of scoring functions capable of correctly scoring different complexes. This shows that functions in different clusters capture different aspects of binding and are likely to work together synergistically. CONCLUSIONS All functions designed specifically for docking perform well, indicating that functions are transferable between sampling methods. We also identify promising methods from the field of homology modelling. Further, differential success rates by docking difficulty and solution quality suggest a need for flexibility-dependent scoring. Investigating pairs of scoring functions, the set theoretic measures identify known scoring strategies as well as a number of novel approaches, indicating promising augmentations of traditional scoring methods. Such augmentation and parameter combination strategies are discussed in the context of the learning-to-rank paradigm.
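A "top 10 success rate" is usually read as the fraction of benchmark complexes for which at least one near-native pose is ranked among a scoring function's ten best. The sketch below assumes that reading. The data layout (one ordered list of near-native flags per complex) and the function name are hypothetical, and the near-native criterion itself is left to the caller because the abstract does not define it.

```python
from typing import Dict, List


def top_n_success_rate(ranked_hits: Dict[str, List[bool]], n: int = 10) -> float:
    """Fraction of complexes with at least one near-native pose in the top n.

    ranked_hits maps a complex ID to one boolean per docked pose, ordered
    best score first; True marks a near-native pose.
    """
    if not ranked_hits:
        return 0.0
    successes = sum(any(hits[:n]) for hits in ranked_hits.values())
    return successes / len(ranked_hits)
```

Re-sorting each complex's poses by a different scoring function and recomputing this rate is one simple way to compare functions on the same decoy set.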
Affiliation(s)
- Iain H Moal
- Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
- Mieczyslaw Torchala
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London WC2A 3LY, UK
- Paul A Bates
- Biomolecular Modelling Laboratory, Cancer Research UK London Research Institute, London WC2A 3LY, UK
- Juan Fernández-Recio
- Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, Barcelona 08034, Spain
20
Moal IH, Fernandez-Recio J. Intermolecular Contact Potentials for Protein-Protein Interactions Extracted from Binding Free Energy Changes upon Mutation. J Chem Theory Comput 2013; 9:3715-27. [PMID: 26584123 DOI: 10.1021/ct400295z] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 01/31/2023]
Abstract
Understanding and predicting the energetics of protein-protein interactions is fundamental to the structural modeling of protein complexes. Binding free energy can be approximated as a sum of pairwise atomic or residue contact energies, which are commonly inferred from contact frequencies observed in experimental protein structures. However, such statistically inferred potentials require certain assumptions and approximations. Here, we explore the possibility of deriving atomic and residue contact potentials directly from experimental binding free energy changes following mutation and present a number of such potentials. The first set of potentials is obtained by unweighted least-squares fitting and bootstrap aggregating. The second set is calculated using a weighting scheme optimized against absolute binding affinity data, so as to account for the over-representation of certain complexes, residues, and families of interactions. The congruence of the potentials with known physical chemistry is investigated. The potentials are further validated by ranking and clustering protein-protein docking poses.
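If binding free energy is a sum of contact energies, then a mutation's ΔΔG is approximately the sum over contact types k of ε_k Δn_k, where Δn_k is the change in the number of contacts of type k. That is a linear least-squares problem in the unknown ε_k. The pure-Python sketch below fits it without weights and bags the fit over bootstrap resamples. The feature definition, the function names, and the number of resamples are assumptions for illustration, and the paper's second, affinity-weighted scheme is not reproduced.

```python
import random
from typing import List, Sequence


def solve_normal_equations(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Unweighted least squares via the normal equations (X^T X) w = X^T y."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * t for row, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("singular system: contact types are not identifiable")
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def bagged_contact_energies(X: Sequence[Sequence[float]], y: Sequence[float],
                            n_boot: int = 50, seed: int = 0) -> List[float]:
    """Average least-squares fits over bootstrap resamples of the mutations.

    X[i][k]: change in contacts of type k for mutation i; y[i]: its ddG.
    """
    rng = random.Random(seed)
    n, k = len(X), len(X[0])
    totals = [0.0] * k
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        w = solve_normal_equations([X[i] for i in idx], [y[i] for i in idx])
        totals = [t + wi for t, wi in zip(totals, w)]
    return [t / n_boot for t in totals]
```

Solving the normal equations directly is adequate for a handful of contact types. For large or ill-conditioned problems, a QR- or SVD-based solver would be the safer choice.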
Affiliation(s)
- Iain H Moal
- Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Juan Fernandez-Recio
- Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
21
Kapoor A, Travesset A. Folding and stability of helical bundle proteins from coarse-grained models. Proteins 2013; 81:1200-11. [DOI: 10.1002/prot.24269] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 10/11/2012] [Revised: 01/13/2013] [Accepted: 01/29/2013] [Indexed: 11/10/2022]
Affiliation(s)
- Abhijeet Kapoor
- Department of Physics and Astronomy; Iowa State University; Ames; Iowa 50011
- Alex Travesset
- Department of Physics and Astronomy; Iowa State University; Ames; Iowa 50011
22
Kauffman C, Karypis G. Coarse- and fine-grained models for proteins: Evaluation by decoy discrimination. Proteins 2013. [DOI: 10.1002/prot.24222] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/03/2023]
Affiliation(s)
- Chris Kauffman
- Department of Computer Science, George Mason University, Fairfax, Virginia 22030, USA.
23
Pandey RB, Farmer BL. Random coil to globular thermal response of a protein (H3.1) with three knowledge-based coarse-grained potentials. PLoS One 2012; 7:e49352. [PMID: 23166645 PMCID: PMC3498164 DOI: 10.1371/journal.pone.0049352] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Received: 07/06/2012] [Accepted: 10/10/2012] [Indexed: 11/19/2022]
Abstract
The effect of temperature on the conformation of a histone (H3.1) is studied by a coarse-grained Monte Carlo simulation based on three knowledge-based contact potentials (MJ, BT, BFKV). Despite unique energy and mobility profiles of its residues, the histone H3.1 undergoes a systematic (possibly continuous) structural transition from a random coil to a globular conformation on reducing the temperature. The range over which such a systematic response in variation of the radius of gyration (R(g)) with the temperature (T) occurs, however, depends on the potential, i.e. ΔT(MJ) ≈ 0.013-0.020, ΔT(BT) ≈ 0.018-0.026, and ΔT(BFKV) ≈ 0.006-0.013 (in reduced units). Unlike MJ and BT potentials, results from the BFKV potential show an anomaly where the magnitude of R(g) decreases on raising the temperature in a range ΔT(A) ≈ 0.015-0.018 before reaching its steady-state random coil configuration. Scaling of the structure factor, S(q) ∝ q^(-1/ν), with the wave vector, q=2π/λ, and the wavelength, λ, reveals a systematic change in the effective dimension (D(e)∼1/ν) of the histone with all potentials (MJ, BT, BFKV): D(e)∼3 in the globular structure and D(e)∼2 for the random coil. Reproducibility of the general yet unique (monotonic) structural transition of the protein H3.1 with the temperature (in contrast to the non-monotonic structural response of a similar but different protein, H2AX) with three interaction sets shows that the knowledge-based contact potential is a viable tool to investigate the structural response of proteins. Caution should be exercised in quantitative comparisons due to differences in transition regimes with these interactions.
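Because S(q) ∝ q^(-1/ν), the slope of ln S against ln q is -1/ν, so the effective dimension D(e) ∼ 1/ν is the negative of that slope. The sketch below estimates it with an ordinary least-squares line through the log-log points. It assumes the caller passes only the q range where the power law holds, and the function name is illustrative, not taken from the paper.

```python
import math
from typing import Sequence


def effective_dimension(q: Sequence[float], s: Sequence[float]) -> float:
    """Estimate D_e ~ 1/nu from S(q) ∝ q^(-1/nu) by a log-log least-squares fit.

    The fitted slope of ln S versus ln q equals -1/nu, so D_e = -slope.
    Inputs must be positive and restricted to the scaling regime.
    """
    xs = [math.log(v) for v in q]
    ys = [math.log(v) for v in s]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope
```

Under this reading, a result near 3 corresponds to the compact globular state and a result near 2 to the random coil, as the abstract reports.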
Affiliation(s)
- Ras B Pandey
- Department of Physics and Astronomy, University of Southern Mississippi, Hattiesburg, Mississippi, USA.
24
Mirzaie M, Sadeghi M. Distance-dependent atomic knowledge-based force in protein fold recognition. Proteins 2012; 80:683-90. [DOI: 10.1002/prot.24011] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/13/2011] [Revised: 11/15/2011] [Accepted: 12/06/2011] [Indexed: 11/08/2022]
25
Dotu I, Cebrián M, Van Hentenryck P, Clote P. On lattice protein structure prediction revisited. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1620-1632. [PMID: 21358007 DOI: 10.1109/tcbb.2011.41] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 05/30/2023]
Abstract
Protein structure prediction is regarded as a highly challenging problem both for the biology and for the computational communities. In recent years, many approaches have been developed, moving to increasingly complex lattice models and off-lattice models. This paper presents a Large Neighborhood Search (LNS) to find the native state for the Hydrophobic-Polar (HP) model on the Face-Centered Cubic (FCC) lattice or, in other words, a self-avoiding walk on the FCC lattice having a maximum number of H-H contacts. The algorithm starts with a tabu-search algorithm, whose solution is then improved by a combination of constraint programming and LNS. The flexible framework of this hybrid algorithm allows an adaptation to the Miyazawa-Jernigan contact potential, in place of the HP model, thus suggesting its potential for tertiary structure prediction. Benchmarking statistics are given for our method against the hydrophobic core threading program HPstruct, an exact method which can be viewed as complementary to our method.
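The objective in this abstract is the number of H-H contacts of a self-avoiding walk on the FCC lattice. The sketch below evaluates that objective for a given conformation and does not attempt the tabu search, constraint programming, or LNS used in the paper. It represents FCC sites as integer points whose 12 nearest-neighbour vectors are the permutations of (±1, ±1, 0). That is a standard embedding, but the function and constant names are illustrative.

```python
from itertools import product
from typing import Sequence, Tuple

Point = Tuple[int, int, int]

# The 12 FCC nearest-neighbour vectors: exactly two non-zero entries, each ±1.
FCC_NEIGHBOURS = {v for v in product((-1, 0, 1), repeat=3)
                  if sum(abs(c) for c in v) == 2}


def hh_contacts(sequence: str, walk: Sequence[Point]) -> int:
    """Number of non-consecutive H-H nearest-neighbour pairs in an FCC walk.

    Raises ValueError if the walk is not a self-avoiding walk on the lattice.
    """
    if len(sequence) != len(walk):
        raise ValueError("sequence and walk must have the same length")
    if len(set(walk)) != len(walk):
        raise ValueError("walk is not self-avoiding")

    def diff(a: Point, b: Point) -> Point:
        return tuple(x - y for x, y in zip(a, b))

    for a, b in zip(walk, walk[1:]):
        if diff(b, a) not in FCC_NEIGHBOURS:
            raise ValueError("consecutive residues are not FCC neighbours")
    contacts = 0
    for i in range(len(walk)):
        for j in range(i + 2, len(walk)):  # skip chain-bonded neighbours
            if sequence[i] == sequence[j] == "H" and diff(walk[j], walk[i]) in FCC_NEIGHBOURS:
                contacts += 1
    return contacts
```

Unlike the cubic lattice, the FCC lattice contains triangles, so residues i and i+2 can be in contact. That is why the inner loop starts at i + 2.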
Affiliation(s)
- Ivan Dotu
- Biology Department, Boston College, Higgins 355, 140 Commonwealth Avenue, Chestnut Hill, MA 02467, USA.
26
Free energies for coarse-grained proteins by integrating multibody statistical contact potentials with entropies from elastic network models. ACTA ACUST UNITED AC 2011; 12:137-47. [PMID: 21674234 DOI: 10.1007/s10969-011-9113-3] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/18/2010] [Accepted: 05/26/2011] [Indexed: 01/02/2023]
Abstract
We propose a novel method for calculating the free energy of coarse-grained protein models by combining our newly developed multibody potentials with entropies computed from elastic network models of proteins. Multibody potentials have been of much interest recently because they take into account three-dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Combining four-body non-sequential, four-body sequential and pairwise short-range potentials with optimized weights for each term, our coarse-grained potential improved recognition of native structure among misfolded decoys, outperforming all other contact potentials for CASP8 decoy sets and achieving performance comparable to the fully atomic empirical DFIRE potentials. By combining statistical contact potentials with entropies from elastic network models of the same structures, we can compute free energy changes and improve coarse-grained modeling of protein structure and dynamics. The consideration of protein flexibility and dynamics should improve protein structure prediction and refinement of computational models. This work is the first to combine coarse-grained multibody potentials with an entropic model that takes into account contributions of the entire structure, investigating native-like decoy selection.
27
Jha AN, Vishveshwara S, Banavar JR. Amino acid interaction preferences in helical membrane proteins. Protein Eng Des Sel 2011; 24:579-88. [PMID: 21666247 DOI: 10.1093/protein/gzr022] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Abstract
Membrane proteins are involved in a number of important biological functions. Yet their structure and folding remain poorly understood. Because their external environment is drastically different from that of globular proteins, the intra-protein interactions in membrane proteins are also expected to be different. Hence, statistical potentials representing the features of inter-residue interactions based exclusively on the structures of membrane proteins are much needed. Currently, a reasonable number of structures are available, making it possible to undertake such an analysis on membrane proteins. In this study we have examined the inter-residue interaction propensities of amino acids in the membrane-spanning regions of alpha-helical membrane (HM) proteins. Recently we have shown that valuable information can be obtained on globular proteins by evaluating the pairwise interactions of amino acids after classifying them into different structural environments, based on factors such as the secondary structure or the number of contacts that a residue can make. Here we have explored possible ways of classifying the intra-protein environment of HM proteins and have developed scoring functions based on different classification schemes. On evaluating these schemes, we find that the one which classifies amino acids into different intra-contact environments is the most promising. Based on this classification scheme, we also redefine the hydrophobicity scale of amino acids in HM proteins.
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
28
Gniewek P, Leelananda SP, Kolinski A, Jernigan RL, Kloczkowski A. Multibody coarse-grained potentials for native structure recognition and quality assessment of protein models. Proteins 2011; 79:1923-9. [PMID: 21560165 DOI: 10.1002/prot.23015] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.8] [Received: 09/27/2010] [Revised: 01/07/2011] [Accepted: 01/28/2011] [Indexed: 01/02/2023]
Abstract
Multibody potentials have been of much interest recently because they take into account three-dimensional interactions related to residue packing and capture the cooperativity of these interactions in protein structures. Our goal was to combine long-range multibody potentials and short-range potentials to improve recognition of native structure among misfolded decoys. We optimized the weights for four-body nonsequential, four-body sequential, and short-range potentials to obtain optimal model ranking results for threading and have compared these data against results obtained with other potentials (26 different coarse-grained potentials from the Potentials 'R'Us web server have been used). Our optimized multibody potentials outperform all other contact potentials in the recognition of the native structure among decoys, both for models from homology template-based modeling and from template-free modeling in CASP8 decoy sets. We have compared the results obtained with these optimized coarse-grained potentials, where each residue is represented by a single point, with results obtained by using the DFIRE potential, which takes into account atomic-level information of proteins. We found that for all proteins larger than 80 amino acids our optimized coarse-grained potentials yield results comparable to those obtained with the atomic DFIRE potential.
Affiliation(s)
- Pawel Gniewek
- Faculty of Chemistry, University of Warsaw, Warsaw, Poland
29
The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 2011; 188:479-88. [PMID: 21467571 DOI: 10.1534/genetics.111.128025] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.7] [Indexed: 11/18/2022]
Abstract
Recent work with Saccharomyces cerevisiae shows a linear relationship between the evolutionary rate of sites and the relative solvent accessibility (RSA) of the corresponding residues in the folded protein. Here, we aim to develop a mathematical model that can reproduce this linear relationship. We first demonstrate that two models that both seem reasonable choices (a simple model in which selection strength correlates with RSA and a more complex model based on RSA-dependent amino acid distributions) fail to reproduce the observed relationship. We then develop a model on the basis of observed site-specific amino acid distributions and show that this model behaves appropriately. We conclude that evolutionary rates are directly linked to the distribution of amino acids at individual sites. Because of this link, any future insight into the biophysical mechanisms that determine amino acid distributions will improve our understanding of evolutionary rates.
30
Mittal A, Jayaram B. Backbones of Folded Proteins Reveal Novel Invariant Amino Acid Neighborhoods. J Biomol Struct Dyn 2011; 28:443-54. [DOI: 10.1080/073911011010524954] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023]
31
Abstract
We extend PRIME, an intermediate-resolution protein model previously used in simulations of the aggregation of polyalanine and polyglutamine, to the description of the geometry and energetics of peptides containing all 20 amino acid residues. The 20 amino acid side chains are classified into 14 groups according to their hydrophobicity, polarity, size, charge, and potential for side chain hydrogen bonding. The parameters for extended PRIME, called PRIME 20, include hydrogen-bonding energies, side chain interaction range and energy, and excluded volume. The parameters are obtained by applying a perceptron-learning algorithm and a modified stochastic learning algorithm that optimizes the energy gap between 711 known native states from the PDB and decoy structures generated by gapless threading. The number of independent pair interaction parameters is chosen to be small enough to be physically meaningful yet large enough to give reasonably accurate results in discriminating decoys from native structures. The most physically meaningful results are obtained with 19 energy parameters.
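The abstract mentions a perceptron-learning step that optimizes the energy gap between native states and decoys. As a textbook illustration, and not PRIME20's actual parametrization or the authors' modified stochastic algorithm, the sketch below assumes that energy is linear in the parameters, E = Σ_k w_k n_k, where n_k counts interactions of type k. It nudges w until every decoy lies at least `margin` above its native state. All names and default values are hypothetical.

```python
from typing import List, Sequence, Tuple

Counts = Sequence[float]


def perceptron_energy_gap(pairs: Sequence[Tuple[Counts, Counts]], n_params: int,
                          margin: float = 1.0, lr: float = 0.1,
                          max_epochs: int = 1000) -> List[float]:
    """Learn w so that E(decoy) - E(native) >= margin for every pair.

    Energies are assumed linear in the parameters: E = sum_k w_k * n_k.
    Each pair is (native_counts, decoy_counts).
    """
    w = [0.0] * n_params
    for _ in range(max_epochs):
        violated = False
        for native, decoy in pairs:
            d = [b - a for a, b in zip(native, decoy)]   # n_decoy - n_native
            gap = sum(wi * di for wi, di in zip(w, d))   # E_decoy - E_native
            if gap < margin:
                # Perceptron step: move w along d to raise this pair's gap.
                w = [wi + lr * di for wi, di in zip(w, d)]
                violated = True
        if not violated:
            break
    return w
```

If no weight vector separates natives from decoys, the loop stops after `max_epochs` without satisfying every pair. The abstract's trade-off in the number of independent parameters is partly about keeping such a fit physically meaningful.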
Affiliation(s)
- Mookyung Cheon
- Department of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, North Carolina, USA
32
Jha AN, Vishveshwara S, Banavar JR. Amino acid interaction preferences in proteins. Protein Sci 2010; 19:603-16. [PMID: 20073083 DOI: 10.1002/pro.339] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Indexed: 11/08/2022]
Abstract
Understanding the key factors that influence the interaction preferences of amino acids in the folding of proteins has remained a challenge. Here we present a knowledge-based approach for determining the effective interactions between amino acids based on amino acid type, their secondary structure, and the contact-based environment in which they find themselves in the native state structure, as measured by their number of neighbors. We find that the optimal information is approximately encoded in a 60 x 60 matrix describing the 20 types of amino acids in three distinct secondary structures (helix, beta strand, and loop). We apply a clustering scheme to understand the similarity between these interactions and to elucidate a nonredundant set. We demonstrate that the inferred energy parameters can be used for assessing the fit of a given sequence into a putative native state structure.
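A 60 x 60 matrix over 20 amino acids in 3 secondary-structure states can be indexed by flattening each (amino acid, state) pair into one of 60 rows or columns. The sketch below shows one such layout and uses it to score a sequence threaded onto a structure. The layout, the one-letter state codes, the function names, and the matrix values are assumptions. The neighbour-count environment described in the abstract is not modelled, and the list of residue contacts is taken as given.

```python
from typing import Iterable, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SS_STATES = ("H", "E", "L")  # helix, beta strand, loop


def state_index(aa: str, ss: str) -> int:
    """Map (amino acid, secondary structure) to a row/column of a 60 x 60 matrix."""
    return SS_STATES.index(ss) * len(AMINO_ACIDS) + AMINO_ACIDS.index(aa)


def contact_energy(matrix: Sequence[Sequence[float]],
                   contacts: Iterable[Tuple[int, int]],
                   sequence: str, ss_string: str) -> float:
    """Sum matrix entries over residue contacts (i, j) of a threaded sequence.

    matrix: 60 x 60 energies (ideally symmetric, since (i, j) order is used as-is).
    """
    total = 0.0
    for i, j in contacts:
        a = state_index(sequence[i], ss_string[i])
        b = state_index(sequence[j], ss_string[j])
        total += matrix[a][b]
    return total
```

Lower totals would indicate a better sequence-structure fit under the usual convention that favourable interactions carry negative energies.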
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
33
Rykunov D, Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010; 11:128. [PMID: 20226048 PMCID: PMC2853469 DOI: 10.1186/1471-2105-11-128] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 10/06/2009] [Accepted: 03/12/2010] [Indexed: 11/30/2022]
Abstract
Background Scoring functions, such as molecular mechanics force fields and statistical potentials, are fundamentally important tools in protein structure modeling and quality assessment. Results The performances of a number of publicly available scoring functions are compared with statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility and torsion angles, and accounting for secondary structure preferences and side chain orientation. Partially based on these observations, we present a novel residue-based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. The atom- and residue-level statistical potentials proposed in this work, and Linux executables to calculate the energy of a given protein, can be downloaded from http://www.fiserlab.org/potentials. Conclusions Among the most influential terms, we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanical potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring infeasibly long energy relaxation before energy scores started to correlate with model quality.
Affiliation(s)
- Dmitry Rykunov
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA
34
Feng Y, Kloczkowski A, Jernigan RL. Potentials 'R' Us web-server for protein energy estimations with coarse-grained knowledge-based potentials. BMC Bioinformatics 2010; 11:92. [PMID: 20163737 PMCID: PMC3098114 DOI: 10.1186/1471-2105-11-92] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.8] [Received: 07/31/2009] [Accepted: 02/17/2010] [Indexed: 11/13/2022]
Abstract
Background Knowledge-based potentials have been widely used in the last 20 years for fold recognition, protein structure prediction from amino acid sequence, ligand binding, protein design, and many other purposes. However, these potentials are generally not readily accessible online. Results Our new knowledge-based potential server makes many of these potentials available for easily and automatically computing the energies of supplied protein structures or models. Our web server for protein energy estimation uses four-body potentials, short-range potentials, and 23 different two-body potentials. Users can select potentials according to their needs and preferences. Files containing the coordinates of protein atoms in the PDB format can be uploaded as input. The results will be returned to the user's email address. Conclusions Our Potentials 'R'Us server is an easily accessible, freely available tool with a web interface that collects all existing and future protein coarse-grained potentials and computes energies of multiple structural models.
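To make "computing the energy of a PDB file with a two-body potential" concrete, the sketch below reads C-alpha coordinates from fixed-column PDB ATOM records and sums pair energies over residues closer than a distance cutoff. It does not describe how the Potentials 'R'Us server works internally. The 6.5 Å cutoff, the minimum sequence separation of 3, the function names, and the dictionary form of the potential are all illustrative assumptions. The sketch also assumes a single chain without gaps.

```python
import math
from typing import Dict, List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def read_ca_atoms(pdb_text: str) -> List[Atom]:
    """Return (residue name, (x, y, z)) for each C-alpha ATOM record.

    Uses the fixed PDB columns (1-based, inclusive): atom name 13-16,
    residue name 18-20, x 31-38, y 39-46, z 47-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[17:20].strip(), xyz))
    return atoms


def two_body_energy(atoms: List[Atom], potential: Dict[Tuple[str, str], float],
                    cutoff: float = 6.5, min_separation: int = 3) -> float:
    """Sum pair energies over C-alpha pairs closer than `cutoff` angstroms.

    Pairs closer than `min_separation` in sequence are skipped, and residue
    pairs missing from `potential` contribute 0.
    """
    energy = 0.0
    for i in range(len(atoms)):
        for j in range(i + min_separation, len(atoms)):
            (ri, pi), (rj, pj) = atoms[i], atoms[j]
            if math.dist(pi, pj) < cutoff:
                # Accept the pair in either order in the lookup table.
                key = (ri, rj) if (ri, rj) in potential else (rj, ri)
                energy += potential.get(key, 0.0)
    return energy
```

`math.dist` requires Python 3.8 or later. Swapping in a different two-body table only changes the `potential` argument, which mirrors the server's idea of letting users choose among many potentials.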
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011-0320, USA
35
Jamroz M, Kolinski A. Modeling of loops in proteins: a multi-method approach. BMC Struct Biol 2010; 10:5. [PMID: 20149252 PMCID: PMC2837870 DOI: 10.1186/1472-6807-10-5] [Received: 10/08/2009] [Accepted: 02/11/2010]
Abstract
Background Template-target sequence alignment and loop modeling are key components of protein comparative modeling. Short loops can be predicted with high accuracy using structural fragments from other, not necessarily homologous, proteins, or by various minimization methods. For longer loops, multiscale approaches employing coarse-grained de novo modeling techniques should be more effective. Results For a representative set of protein structures of various structural classes, test predictions of loop regions were performed using MODELLER, ROSETTA, and the CABS coarse-grained de novo modeling tool. Loops of various lengths, from 4 to 25 residues, were modeled assuming an ideal target-template alignment of the remaining portions of the protein. Classical modeling with MODELLER was usually better for short loops, while coarse-grained de novo modeling was more effective for longer loops. Even very long missing fragments in protein structures could be effectively modeled. The resolution of such models is usually in the range of 2-6 Å, which could be sufficient for guiding protein engineering. Further improvement of modeling accuracy could be achieved by combining different methods. In particular, we used the 10 top-ranked models from sets of 500 models generated by MODELLER as multiple templates for CABS modeling. On average, the resulting molecular models were better than the models from individual methods. Conclusions The accuracy of protein modeling, as demonstrated for the problem of loop modeling, could be improved by combining different modeling techniques.
Affiliation(s)
- Michal Jamroz
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
36
Pokarowski P, Kloczkowski A, Nowakowski S, Pokarowska M, Jernigan RL, Kolinski A. Ideal amino acid exchange forms for approximating substitution matrices. Proteins 2007; 69:379-93. [PMID: 17623859 DOI: 10.1002/prot.21509]
Abstract
We have analyzed 29 published substitution matrices (SMs) and, for comparison, five statistical protein contact potentials (CPs). We find that popular, 'classical' SMs obtained mainly from sequence alignments of globular proteins are mostly correlated with one another at 0.9 or higher. BLOSUM62 is the central element of this group. A second group includes SMs derived from alignments of remote homologs or transmembrane proteins. These matrices correlate better with classical SMs (0.8) than among themselves (0.7). A third group consists of intermediate links between SMs and CPs: matrices and potentials that exhibit mutual correlations of at least 0.8. Next, we show that SMs can be approximated with a correlation of 0.9 by expressions $c_0 + x_i x_j + y_i y_j + z_i z_j$, $1 \le i, j \le 20$, where $c_0$ is a constant and the vectors $(x_i)$, $(y_i)$, $(z_i)$ correlate highly with the hydrophobicity, molecular volume and coil preferences of amino acids, respectively. The present paper continues our earlier work (Pokarowski et al., Proteins 2005;59:49-57), where similar approximations were used to derive ideal amino acid interaction forms from CPs. Both approximations help explain general trends in amino acid similarity and can help improve multiple sequence alignment using the fast Fourier transform (MAFFT), fast threading, or other methods based on alignments of physicochemical profiles of protein sequences. Using this approximation in sequence alignments instead of a classical SM yields results that differ by less than 5%. Intermediate links between SMs and CPs, new formulas for approximating these matrices, and the highly significant dependence of classical SMs on coil preferences are new findings.
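The approximating form $c_0 + x_i x_j + y_i y_j + z_i z_j$ is easy to evaluate once the three per-residue vectors are known. The Python sketch below builds such a matrix and measures its Pearson correlation with a target matrix over the unique residue pairs. How the vectors themselves are fitted is not shown, and the helper names are illustrative, not taken from the paper.

```python
import math

def approx_matrix(c0, x, y, z):
    # S_ij ~ c0 + x_i x_j + y_i y_j + z_i z_j for all residue types i, j.
    n = len(x)
    return [[c0 + x[i] * x[j] + y[i] * y[j] + z[i] * z[j] for j in range(n)]
            for i in range(n)]

def upper_triangle(m):
    # A symmetric matrix: use each unordered pair (diagonal included) once.
    return [m[i][j] for i in range(len(m)) for j in range(i, len(m))]

def pearson(u, v):
    # Pearson correlation coefficient of two equal-length sequences.
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)
```

For a real SM the vectors would have 20 components, and `pearson(upper_triangle(sm), upper_triangle(approx_matrix(...)))` would give the kind of correlation figure quoted in the abstract.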
Affiliation(s)
- Piotr Pokarowski
- Institute of Informatics, Faculty of Mathematics, Informatics and Mechanics, Warsaw University, 02-097 Warsaw, Poland.
37
Kloczkowski A, Jernigan RL, Wu Z, Song G, Yang L, Kolinski A, Pokarowski P. Distance matrix-based approach to protein structure prediction. J Struct Funct Genomics 2009; 10:67-81. [PMID: 19224393 DOI: 10.1007/s10969-009-9062-2] [Received: 09/23/2008] [Accepted: 02/01/2009]
Abstract
Much structural information is encoded in the internal distances; a distance matrix-based approach can be used to predict protein structure and dynamics, and for structural refinement. Our approach is based on the square distance matrix $D = [r_{ij}^2]$ containing all square distances between residues in proteins. This distance matrix contains more information than the contact matrix $C$, whose elements are either 0 or 1 depending on whether the distance $r_{ij}$ is greater or less than a cutoff value $r_{\mathrm{cutoff}}$. We have performed spectral decomposition of the distance matrices, $D = \sum_k \lambda_k v_k v_k^T$, in terms of eigenvalues $\lambda_k$ and the corresponding eigenvectors $v_k$, and found that it contains at most five nonzero terms. A dominant eigenvector is proportional to $r^2$, the square distance of points from the center of mass, with the next three being the principal components of the system of points. By predicting $r^2$ from the sequence we can approximate the distance matrix of a protein with an expected RMSD value of about 7.3 Å, and by combining it with the prediction of the first principal component we can improve this approximation to 4.0 Å. We can also explain the role of hydrophobic interactions in protein structure, because $r$ is highly correlated with the hydrophobic profile of the sequence. Moreover, $r$ is highly correlated with several sequence profiles that are useful in protein structure prediction, such as contact number, residue-wise contact order (RWCO) and mean square fluctuations (i.e. crystallographic temperature factors). We have also shown that the next three components are related to the spatial directionality of the secondary structure elements, and that they may also be predicted from the sequence, improving overall structure prediction.
We have also shown that the large number of available HIV-1 protease structures provides a remarkable sampling of conformations, which can be viewed as direct structural information about the dynamics. After structure matching, we apply principal component analysis (PCA) to obtain the important apparent motions for both bound and unbound structures. There are significant similarities between the first few key motions and the first few low-frequency normal modes calculated from a static representative structure with an elastic network model (ENM) that is based on the contact matrix C (related to D), strongly suggesting that the variations among the observed structures and the corresponding conformational changes are facilitated by the low-frequency, global motions intrinsic to the structure. Similarities are also found when the approach is applied to an NMR ensemble, as well as to atomic molecular dynamics (MD) trajectories. Thus, a sufficiently large number of experimental structures can directly provide important information about protein dynamics, but ENM can also provide a similar sampling of conformations. Finally, we use distance constraints from databases of known protein structures for structure refinement. We use the distributions of distances of various types in known protein structures to obtain the most probable ranges or the mean-force potentials for the distances. We then impose these constraints on structures to be refined or include the mean-force potentials directly in the energy minimization so that more plausible structural models can be built. We used this approach successfully in the 2006 CASPR structure refinement (http://predictioncenter.org/caspR).
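The statement that the spectral decomposition of $D$ has at most five nonzero terms follows from $r_{ij}^2 = |\mathbf{r}_i|^2 + |\mathbf{r}_j|^2 - 2\,\mathbf{r}_i \cdot \mathbf{r}_j$: every column of $D$ lies in the span of the all-ones vector, the vector of squared norms and the three coordinate vectors. A small Python check is sketched below; it estimates numerical rank by Gaussian elimination with a relative tolerance, which is an illustrative choice rather than anything taken from the paper.

```python
import random

def squared_distance_matrix(points):
    # D_ij = |r_i - r_j|^2 for every pair of points.
    return [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
            for p in points]

def numerical_rank(matrix, rel_tol=1e-9):
    # Gaussian elimination with partial pivoting; a pivot smaller than
    # rel_tol * (largest |entry|) is treated as zero.
    m = [row[:] for row in matrix]
    rows, cols = len(m), len(m[0])
    scale = max(abs(v) for row in m for v in row) or 1.0
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(m[r][c]))
        if abs(m[pivot][c]) <= rel_tol * scale:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, rows):
            f = m[r][c] / m[rank][c]
            for k in range(c, cols):
                m[r][k] -= f * m[rank][k]
        rank += 1
    return rank

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [tuple(rng.uniform(0, 10) for _ in range(3)) for _ in range(12)]
    print(numerical_rank(squared_distance_matrix(pts)))  # generic 3D points
```

For points confined to a plane, only four spanning vectors remain (ones, squared norms and two coordinates), so the rank drops to 4.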
Affiliation(s)
- Andrzej Kloczkowski
- Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, 112 Office and Lab Bldg, Ames, IA 50011-3020, USA.
38
Solis AD, Rackovsky S. Information and discrimination in pairwise contact potentials. Proteins 2008; 71:1071-87. [PMID: 18004788 DOI: 10.1002/prot.21733]
Abstract
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information and, if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., the choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence on their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have a high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials.
Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
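The Z-score referred to above is conventionally computed by comparing the native energy with the distribution of energies obtained by threading the sequence onto decoys. A minimal Python version is sketched below; it assumes that lower energies are better and uses the population standard deviation, a convention that varies between studies. The function name is illustrative.

```python
import statistics

def threading_z_score(native_energy, decoy_energies):
    """Z = (E_native - mean(E_decoy)) / std(E_decoy).

    More negative values mean the native sequence-structure pair stands out
    more clearly from the decoy ensemble (assuming lower energy is better).
    """
    mean = statistics.fmean(decoy_energies)
    std = statistics.pstdev(decoy_energies)
    return (native_energy - mean) / std
```

The cost that the abstract highlights lies in producing `decoy_energies`, which requires threading against many decoys; the formula itself is trivial once those energies exist.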
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA
39
Buchete NV, Straub JE, Thirumalai D. Dissecting contact potentials for proteins: relative contributions of individual amino acids. Proteins 2008; 70:119-30. [PMID: 17640067 DOI: 10.1002/prot.21538]
Abstract
Knowledge-based contact potentials are routinely used in fold recognition, binding of peptides to proteins, structure prediction, and coarse-grained models to probe protein folding kinetics. The dominant physical forces embodied in the contact potentials are revealed by eigenvalue analysis of the matrices, whose elements describe the strengths of interaction between amino acid side chains. We propose a general method to rank quantitatively the importance of various inter-residue interactions represented in the currently popular pair contact potentials. Eigenvalue analysis and correlation diagrams are used to rank the inter-residue pair interactions with respect to the magnitude of their relative contributions to the contact potentials. The amino acid ranking is shown to be consistent with a mean field approximation that is used to reconstruct the original contact potentials from the most relevant amino acids for several contact potentials. By providing a general, relative ranking score for amino acids, this method permits a detailed, quantitative comparison of various contact interaction schemes. For most contact potentials, between 7 and 9 amino acids of varying chemical character are needed to accurately reconstruct the full matrix. By correlating the identified important amino acid residues in contact potentials and analysis of about 7800 structural domains in the CATH database we predict that it is important to model accurately interactions between small hydrophobic residues. In addition, only potentials that take interactions involving the protein backbone into account can predict dense packing in protein structures.
Affiliation(s)
- N-V Buchete
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892-0520, USA.
40
Feng Y, Kloczkowski A, Jernigan RL. Four-body contact potentials derived from two protein datasets to discriminate native structures from decoys. Proteins 2007; 68:57-66. [PMID: 17393455 DOI: 10.1002/prot.21362]
Abstract
Two-body inter-residue contact potentials for proteins have often been extracted and extensively used for threading. Here, we have developed a new scheme to derive four-body contact potentials as a way to consider protein interactions in a more cooperative model. We use several datasets of protein native structures to demonstrate that around 500 chains are sufficient to provide a good estimate of these four-body contact potentials, as shown by convergent threading results. We also deliberately chose two sets of protein native structures differing in resolution, one with all chains at a resolution better than 1.5 Å and the other with 94.2% of the structures at a resolution worse than 1.5 Å, to investigate whether potentials from well-refined protein datasets perform better in threading. However, potentials from well-refined proteins did not give statistically significantly better threading results. Our four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. Compared with another set of four-body contact potentials derived by using a Delaunay tessellation algorithm, our four-body contact potentials appear to offer a better characterization of the interactions between backbones and side chains and provide better threading results, somewhat complementary to those found using other potentials.
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-0320, USA
41
Kawashima S, Pokarowski P, Pokarowska M, Kolinski A, Katayama T, Kanehisa M. AAindex: amino acid index database, progress report 2008. Nucleic Acids Res 2008; 36:D202-5. [PMID: 17998252 PMCID: PMC2238890 DOI: 10.1093/nar/gkm998]
Abstract
AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. We have added a collection of protein contact potentials to the AAindex as a new section. Accordingly AAindex consists of three sections now: AAindex1 for the amino acid index of 20 numerical values, AAindex2 for the amino acid substitution matrix and AAindex3 for the statistical protein contact potentials. All data are derived from published literature. The database can be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.jp/dbget-bin/www_bfind?aaindex) or downloaded by anonymous FTP (ftp://ftp.genome.jp/pub/db/community/aaindex/).
Affiliation(s)
- Shuichi Kawashima
- Laboratory of Genome Database, Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokane-dai Minato-ku Tokyo 108-8639, Japan.
42
Schlessinger A, Punta M, Rost B. Natively unstructured regions in proteins identified from contact predictions. Bioinformatics 2007; 23:2376-84. [PMID: 17709338 DOI: 10.1093/bioinformatics/btm349]
Abstract
MOTIVATION Natively unstructured (also dubbed intrinsically disordered) regions in proteins lack a defined 3D structure under physiological conditions and often adopt regular structures under particular conditions. Proteins with such regions are overly abundant in eukaryotes, they may increase functional complexity of organisms and they usually evade structure determination in the unbound form. Low propensity for the formation of internal residue contacts has been previously used to predict natively unstructured regions. RESULTS We combined PROFcon predictions for protein-specific contacts with a generic pairwise potential to predict unstructured regions. This novel method, Ucon, outperformed the best available methods in predicting proteins with long unstructured regions. Furthermore, Ucon correctly identified cases missed by other methods. By computing the difference between predictions based on specific contacts (approach introduced here) and those based on generic potentials (realized in other methods), we might identify unstructured regions that are involved in protein-protein binding. We discussed one example to illustrate this ambitious aim. Overall, Ucon added quality and an orthogonal aspect that may help in the experimental study of unstructured regions in network hubs. AVAILABILITY http://www.predictprotein.org/submit_ucon.html. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Avner Schlessinger
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY, USA.
43
Pokarowski P, Droste K, Kolinski A. A minimal proteinlike lattice model: an alpha-helix motif. J Chem Phys 2005; 122:214915. [PMID: 15974798 DOI: 10.1063/1.1924601]
Abstract
A simple protein model of a four-helix bundle motif on a face-centered cubic lattice has been studied. Total energy of a conformation includes attractive interactions between hydrophobic residues, repulsive interactions between hydrophobic and polar residues, and a potential that favors helical turns. Using replica exchange Monte Carlo simulations we have estimated a set of parameters for which the native structure is a global minimum of conformational energy. Then we have shown that all the above types of interactions are necessary to guarantee the cooperativity of folding transition and to satisfy the thermodynamic hypothesis.
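Replica exchange Monte Carlo runs copies of the system at different temperatures and periodically attempts to swap their conformations. The Python sketch below shows the standard Metropolis swap criterion, accepting with probability $\min\{1, \exp[(\beta_i - \beta_j)(E_i - E_j)]\}$ where $\beta = 1/k_BT$. It is the textbook rule, not code from the cited study, and the function name is illustrative.

```python
import math
import random

def accept_swap(beta_i, energy_i, beta_j, energy_j, rng=random):
    """Metropolis criterion for exchanging the conformations held by two
    replicas at inverse temperatures beta_i and beta_j."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    if delta >= 0:
        # The colder replica holds the higher-energy conformation: always swap.
        return True
    return rng.random() < math.exp(delta)
```

Passing a seeded `random.Random` instance as `rng` makes a simulation reproducible.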
Affiliation(s)
- Piotr Pokarowski
- Institute of Applied Mathematics and Mechanics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland.
44
Kozakov D, Brenke R, Comeau SR, Vajda S. PIPER: an FFT-based protein docking program with pairwise potentials. Proteins 2006; 65:392-406. [PMID: 16933295 DOI: 10.1002/prot.21117]
Abstract
The Fast Fourier Transform (FFT) correlation approach to protein-protein docking can evaluate the energies of billions of docked conformations on a grid if the energy is described in the form of a correlation function. Here, this restriction is removed, and the approach is efficiently used with pairwise interaction potentials that substantially improve the docking results. The basic idea is approximating the interaction matrix by its eigenvectors corresponding to the few dominant eigenvalues, resulting in an energy expression written as the sum of a few correlation functions, and solving the problem by repeated FFT calculations. In addition to describing how the method is implemented, we present a novel class of structure-based pairwise intermolecular potentials. The DARS (Decoys As the Reference State) potentials are extracted from structures of protein-protein complexes and use large sets of docked conformations as decoys to derive atom pair distributions in the reference state. The current version of the DARS potential works well for enzyme-inhibitor complexes. With the new FFT-based program, DARS provides much better docking results than the earlier approaches, in many cases generating 50% more near-native docked conformations. Although the potential is far from optimal for antibody-antigen pairs, the results are still slightly better than those given by an earlier FFT method. The docking program PIPER is freely available for noncommercial applications.
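The eigenvector trick described above can be illustrated in one dimension. Writing the symmetric interaction matrix as $\varepsilon_{ab} = \sum_p \lambda_p u_{pa} u_{pb}$ turns the pairwise energy into a sum of correlation functions, one per eigenvector, each of which an FFT could evaluate for all translations at once. The Python sketch below is a simplified illustration under stated assumptions, not PIPER's implementation: it uses a periodic 1D grid, power iteration with deflation to obtain the dominant eigenpairs, and direct summation in place of the FFT. All function names are hypothetical.

```python
import random

def top_eigenpairs(matrix, count, iters=2000, seed=0):
    """Dominant eigenpairs of a symmetric matrix via power iteration with
    deflation (largest |eigenvalue| first). Assumes distinct |eigenvalues|."""
    rng = random.Random(seed)
    a = [row[:] for row in matrix]
    n = len(a)
    pairs = []
    for _ in range(count):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the (signed) eigenvalue for unit v.
        av = [sum(a[i][k] * v[k] for k in range(n)) for i in range(n)]
        lam = sum(v[i] * av[i] for i in range(n))
        pairs.append((lam, v))
        # Deflate: remove the component just found.
        for i in range(n):
            for k in range(n):
                a[i][k] -= lam * v[i] * v[k]
    return pairs

def direct_energy(eps, receptor, ligand, shift):
    """E = sum_x sum_ab eps[a][b] * R_a(x) * L_b(x + shift), periodic grid.
    receptor[a][x], ligand[b][x]: occupancy of atom type a/b at grid point x."""
    n_types, size = len(eps), len(receptor[0])
    return sum(eps[a][b] * receptor[a][x] * ligand[b][(x + shift) % size]
               for a in range(n_types) for b in range(n_types)
               for x in range(size))

def decomposed_energy(pairs, receptor, ligand, shift):
    """Same energy as sum_p lambda_p * corr(r_p, l_p); each correlation is the
    kind of term an FFT would compute for every shift simultaneously."""
    size = len(receptor[0])
    total = 0.0
    for lam, u in pairs:
        r = [sum(u[a] * receptor[a][x] for a in range(len(u))) for x in range(size)]
        l = [sum(u[b] * ligand[b][x] for b in range(len(u))) for x in range(size)]
        total += lam * sum(r[x] * l[(x + shift) % size] for x in range(size))
    return total
```

Keeping all eigenpairs reproduces the direct energy exactly (up to rounding); keeping only the few largest-magnitude ones gives the approximation that makes repeated FFT evaluation affordable.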
Affiliation(s)
- Dima Kozakov
- Department of Biomedical Engineering, Boston University, Boston, Massachusetts 02215, USA
45
Zhang J, Liu JS. On side-chain conformational entropy of proteins. PLoS Comput Biol 2006; 2:e168. [PMID: 17154716 PMCID: PMC1676032 DOI: 10.1371/journal.pcbi.0020168] [Received: 07/31/2006] [Accepted: 10/26/2006]
Abstract
The role of side-chain entropy (SCE) in protein folding has long been speculated about but is still not fully understood. Using a newly developed Monte Carlo method, we conducted a systematic investigation of how the SCE relates to the size of the protein and how it differs among a protein's X-ray, NMR, and decoy structures. We estimated the SCE for a set of 675 nonhomologous proteins and observed a significant SCE for both exposed and buried residues in all of these proteins; the contribution of buried residues approaches 40% of the overall SCE. Furthermore, the SCE can be quite different for structures with similar compactness or even similar conformations. As a striking example, we found that proteins' X-ray structures appear to pack more "cleverly" than their NMR or decoy counterparts, in the sense of retaining higher SCE while achieving comparable compactness, which suggests that the SCE plays an important role in favouring native protein structures. By including an SCE term in a simple free energy function, we can significantly improve the discrimination of native protein structures from decoys.
Affiliation(s)
- Jinfeng Zhang
- Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
- Jun S Liu
- Department of Statistics, Harvard University, Cambridge, Massachusetts, United States of America
46
Parthiban V, Gromiha MM, Hoppe C, Schomburg D. Structural analysis and prediction of protein mutant stability using distance and torsion potentials: Role of secondary structure and solvent accessibility. Proteins 2007; 66:41-52. [PMID: 17068801 DOI: 10.1002/prot.21115]
Abstract
Analyzing the factors behind protein stability is a key research topic in molecular biology, and has direct implications for protein structure prediction and protein-protein interactions. We have analyzed protein stability upon point mutations using a distance-dependent pair potential, representing mainly through-space interactions, and a torsion angle potential, representing mainly neighboring effects, as a basic statistical mechanical setup for the analysis. The synergetic effect of accessible surface area and secondary structure preferences was used as a classifier for the potentials. In addition, short-, medium-, and long-range interactions of the protein environment were also analyzed. Two datasets of point mutations were used to compare theoretically predicted stabilizing energy values with experimental $\Delta\Delta G$ and $\Delta\Delta G_{\mathrm{H_2O}}$ values from thermal and chemical denaturation experiments. These include 1538 and 1603 mutations, respectively, and contain 101 proteins that share a wide range of sequence identity. The resulting force fields were carefully evaluated with different statistical tests. Results show a maximum correlation of 0.87 with a standard error of 0.71 kcal/mol between predicted and measured $\Delta\Delta G$ values and a prediction accuracy of 85.3% (stabilizing or destabilizing) for all mutations together. A correlation of 0.77 (more than 80% prediction accuracy, standard error 0.95 kcal/mol) was obtained for the test datasets of both split-sample validation and fivefold cross-validation, and a correlation of 0.70 (77.4% prediction accuracy, standard error 1.17 kcal/mol) was obtained in the jackknife test. The same model was applied to mutations with $\Delta\Delta G_{\mathrm{H_2O}}$ data, giving a correlation of 0.78 (standard error 0.96 kcal/mol) with a prediction efficiency of 84.65%. This model can be used for future prediction of protein structural stability together with various experimental techniques.
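Figures such as the correlation and two-class prediction accuracy quoted above can be computed along the lines of the following Python sketch. It assumes that agreement in sign between predicted and measured $\Delta\Delta G$ is what counts as a correct stabilizing/destabilizing call (sign conventions differ between datasets), and it does not reproduce the authors' standard-error calculation. The function name is illustrative.

```python
import math

def evaluate_ddg(predicted, measured):
    """Return (Pearson r, sign accuracy) for predicted vs measured ddG values.

    A mutation counts as correctly classified when prediction and experiment
    fall on the same side of zero (assumed stabilizing/destabilizing split).
    """
    n = len(predicted)
    mp, mm = sum(predicted) / n, sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sm = math.sqrt(sum((m - mm) ** 2 for m in measured))
    agree = sum((p < 0) == (m < 0) for p, m in zip(predicted, measured))
    return cov / (sp * sm), agree / n
```

Cross-validation and jackknife figures would come from calling the same function on held-out predictions only.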
Affiliation(s)
- Vijaya Parthiban
- Cologne University Bioinformatics Center, International Max Planck Research School, Cologne, Germany
47
Zhang J, Lin M, Chen R, Liang J, Liu JS. Monte Carlo sampling of near-native structures of proteins with applications. Proteins 2007; 66:61-8. [PMID: 17039507 DOI: 10.1002/prot.21203]
Abstract
Since a protein's dynamic fluctuation inside cells affects the protein's biological properties, we present a novel method to study the ensemble of near-native structures (NNS) of proteins, namely, the conformations that are very similar to the experimentally determined native structure. We show that this method enables us to (i) quantify the difficulty of predicting a protein's structure, (ii) choose appropriate simplified representations of protein structures, and (iii) assess the effectiveness of knowledge-based potential functions. We found that well-designed simple representations of protein structures are likely as accurate as those more complex ones for certain potential functions. We also found that the widely used contact potential functions stabilize NNS poorly, whereas potential functions incorporating local structure information significantly increase the stability of NNS.
Affiliation(s)
- Jinfeng Zhang
- Department of Statistics, Harvard University, Cambridge, Massachusetts, USA
48
de Sancho D, Rey A. Assessment of protein folding potentials with an evolutionary method. J Chem Phys 2006; 125:014904. [PMID: 16863330 DOI: 10.1063/1.2210931]
Abstract
Many different protein folding potentials have been developed over the last decades, based on knowledge of experimentally determined protein structures. Decoy-based techniques are frequently used to assess these force fields, but other methods can probe different aspects of the performance of the interaction schemes and thus help in their evaluation. Here, we propose an evolutionary strategy to efficiently assess folding potentials. We apply it to three potentials with different characteristics taken from the literature. A search for minimum-energy protein topologies, treated as arrangements of rigid protein fragments, is performed. Applied to a set of helix bundle proteins, the method reveals the different behavior of the studied potentials, providing a reasonably fast tool to evaluate their advantages and limitations.
Affiliation(s)
- David de Sancho
- Departamento de Química Física I, Facultad de Ciencias Químicas, Universidad Complutense, E-28040 Madrid, Spain
49
Koliński A, Bujnicki JM. Generalized protein structure prediction based on combination of fold-recognition with de novo folding and evaluation of models. Proteins 2005; 61 Suppl 7:84-90. [PMID: 16187348 DOI: 10.1002/prot.20723]
Abstract
To predict the tertiary structure of full-length sequences of all targets in CASP6, regardless of their potential category (from easy comparative modeling to fold recognition to apparent new folds), we used a novel combination of two very different approaches developed independently in our laboratories, both of which ranked quite well in different categories in CASP5. First, the GeneSilico metaserver was used to identify domains, predict secondary structure, and generate fold recognition (FR) alignments, which were converted to full-atom models using the "FRankenstein's Monster" approach for comparative modeling (CM) by recombination of protein fragments. Additional models generated "de novo" by fully automated servers were obtained from the CASP website. All these models were evaluated by VERIFY3D, and residues with scores better than 0.2 were used as a source of spatial restraints. Second, a new implementation of the lattice-based protein modeling tool CABS was used to carry out folding guided by the above-mentioned restraints with the Replica Exchange Monte Carlo sampling technique. Decoys generated in the course of simulation were subjected to average-linkage hierarchical clustering. For a representative decoy from each cluster, a full-atom model was rebuilt. Finally, five models were selected for submission based on a combination of various criteria, including the size, density, and average energy of the corresponding cluster, and the visual evaluation of the full-atom structures and their relationship to the original templates. The combination of FRankenstein and CABS was one of the best-performing algorithms over all categories in CASP6 (it is important to note that our human intervention was very limited, and all steps in our method can be easily automated). We were able to generate a number of very good models, especially in the Comparative Modeling and New Folds categories.
Frequently, the best models were closer to the native structure than any of the templates used. The main problem we encountered was in the ranking of the final models (the only step of significant human intervention), due to the insufficient computational power, which precluded the possibility of full-atom refinement and energy-based evaluation.
50
Radja NH, Farzami RR, Ejtehadi MR. Conservation of statistical results under the reduction of pair-contact interactions to solvation interactions. Phys Rev E Stat Nonlin Soft Matter Phys 2005; 72:061915. [PMID: 16485982 DOI: 10.1103/physreve.72.061915] [Received: 07/04/2005]
Abstract
We show that the hydrophobicity of sequences is the leading term in Miyazawa-Jernigan interactions. Because these hydrophobicity terms are the source of the additive (solvation) terms in pair-contact interactions, they were used to reduce the number of energy parameters, yielding a clear vector representation of the energy. The reduced (additive) potential performs considerably well in predicting the statistical properties of arbitrary structures. The designabilities of the structures evaluated with the two models are highly correlated. If geometrically nondegenerate vectors (structures) are taken as proteinlike structures, the additive model is a powerful tool for protein design. Moreover, a crossing point in the log-linear diagram of designability ranking shows that about 1/e of the structures have designabilities above the average, independent of the model used.
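One simple way to extract additive, solvation-like terms $h_i$ from a symmetric pair-contact matrix $e_{ij}$ is a least-squares fit of $e_{ij} \approx h_i + h_j$. Setting the derivative of $\sum_{i,j}(e_{ij} - h_i - h_j)^2$ to zero gives $n h_i + \sum_j h_j = \sum_j e_{ij}$, whose solution is $h_i = \bar{e}_{i\cdot} - \bar{e}/2$ (row mean minus half the grand mean). The Python sketch below implements that closed form; it is an illustrative assumption, and the paper's own reduction procedure may differ.

```python
def fit_additive(e):
    """Least-squares fit of e_ij ~ h_i + h_j for a symmetric n x n matrix.
    Closed form from the normal equations: h_i = row_mean_i - grand_mean / 2."""
    n = len(e)
    grand = sum(sum(row) for row in e) / (n * n)
    return [sum(row) / n - grand / 2 for row in e]

def max_residual(e, h):
    # Largest deviation of the additive model from the original matrix.
    n = len(e)
    return max(abs(e[i][j] - h[i] - h[j]) for i in range(n) for j in range(n))
```

Applied to a 20 x 20 Miyazawa-Jernigan matrix, the fitted `h` would play the role of a per-residue hydrophobicity scale, and `max_residual` indicates how much of the matrix the additive model leaves unexplained.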
Affiliation(s)
- N Hamedani Radja
- Department of Physics, Sharif University of Technology, P.O. Box 11365-9161, Tehran, Iran.