1
|
Andreani J, Faure G, Guerois R. InterEvScore: a novel coarse-grained interface scoring function using a multi-body statistical potential coupled to evolution. ACTA ACUST UNITED AC 2013; 29:1742-9. [PMID: 23652426 DOI: 10.1093/bioinformatics/btt260] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
MOTIVATION Structural prediction of protein interactions currently remains a challenging but fundamental goal. In particular, progress in scoring functions is critical for the efficient discrimination of near-native interfaces among large sets of decoys. Many functions have been developed using knowledge-based potentials, but few make use of multi-body interactions or evolutionary information, although multi-residue interactions are crucial for protein-protein binding and protein interfaces undergo significant selection pressure to maintain their interactions. RESULTS This article presents InterEvScore, a novel scoring function using a coarse-grained statistical potential including two- and three-body interactions, which provides each residue with the opportunity to contribute in its most favorable local structural environment. Combination of this potential with evolutionary information considerably improves scoring results on the 54 test cases from the widely used protein docking benchmark for which evolutionary information can be collected. We analyze how our way to include evolutionary information gradually increases the discriminative power of InterEvScore. Comparison with several previously published scoring functions (ZDOCK, ZRANK and SPIDER) shows the significant progress brought by InterEvScore. AVAILABILITY http://biodev.cea.fr/interevol/interevscore CONTACT guerois@cea.fr SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jessica Andreani
- CEA, iBiTecS, Service de Bioenergetique Biologie Structurale et Mecanismes SB2SM, Laboratoire de Biologie Structurale et Radiobiologie LBSR, F-91191 Gif sur Yvette, France
| | | | | |
Collapse
|
2
|
Johansson KE, Hamelryck T. A simple probabilistic model of multibody interactions in proteins. Proteins 2013; 81:1340-50. [DOI: 10.1002/prot.24277] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/13/2012] [Revised: 01/31/2013] [Accepted: 02/18/2013] [Indexed: 11/10/2022]
Affiliation(s)
- Kristoffer Enøe Johansson
- Section for Biomolecular Sciences; Department of Biology, University of Copenhagen; Ole Maal⊘es Vej 5, DK-2200 Copenhagen N Denmark
| | - Thomas Hamelryck
- Section for Computational and RNA biology; Department of Biology, University of Copenhagen; Room 1.2.22, Ole Maal⊘es Vej 5 DK-2200 Copenhagen N Denmark
| |
Collapse
|
3
|
Afzal AJ, Srour A, Goil A, Vasudaven S, Liu T, Samudrala R, Dogra N, Kohli P, Malakar A, Lightfoot DA. Homo-dimerization and ligand binding by the leucine-rich repeat domain at RHG1/RFS2 underlying resistance to two soybean pathogens. BMC PLANT BIOLOGY 2013; 13:43. [PMID: 23497186 PMCID: PMC3626623 DOI: 10.1186/1471-2229-13-43] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 01/03/2012] [Accepted: 02/05/2013] [Indexed: 05/26/2023]
Abstract
BACKGROUND The protein encoded by GmRLK18-1 (Glyma_18_02680 on chromosome 18) was a receptor like kinase (RLK) encoded within the soybean (Glycine max L. Merr.) Rhg1/Rfs2 locus. The locus underlies resistance to the soybean cyst nematode (SCN) Heterodera glycines (I.) and causal agent of sudden death syndrome (SDS) Fusarium virguliforme (Aoki). Previously the leucine rich repeat (LRR) domain was expressed in Escherichia coli. RESULTS The aims here were to evaluate the LRRs ability to; homo-dimerize; bind larger proteins; and bind to small peptides. Western analysis suggested homo-dimers could form after protein extraction from roots. The purified LRR domain, from residue 131-485, was seen to form a mixture of monomers and homo-dimers in vitro. Cross-linking experiments in vitro showed the H274N region was close (<11.1 A) to the highly conserved cysteine residue C196 on the second homo-dimer subunit. Binding constants of 20-142 nM for peptides found in plant and nematode secretions were found. Effects on plant phenotypes including wilting, stem bending and resistance to infection by SCN were observed when roots were treated with 50 pM of the peptides. Far-Western analyses followed by MS showed methionine synthase and cyclophilin bound strongly to the LRR domain. A second LRR from GmRLK08-1 (Glyma_08_g11350) did not show these strong interactions. CONCLUSIONS The LRR domain of the GmRLK18-1 protein formed both a monomer and a homo-dimer. The LRR domain bound avidly to 4 different CLE peptides, a cyclophilin and a methionine synthase. The CLE peptides GmTGIF, GmCLE34, GmCLE3 and HgCLE were previously reported to be involved in root growth inhibition but here GmTGIF and HgCLE were shown to alter stem morphology and resistance to SCN. One of several models from homology and ab-initio modeling was partially validated by cross-linking. The effect of the 3 amino acid replacements present among RLK allotypes, A87V, Q115K and H274N were predicted to alter domain stability and function. Therefore, the LRR domain of GmRLK18-1 might underlie both root development and disease resistance in soybean and provide an avenue to develop new variants and ligands that might promote reduced losses to SCN.
Collapse
Affiliation(s)
- Ahmed J Afzal
- Department of Molecular Biology, Microbiology and Biochemistry and Center for Excellence the Illinois Soybean Center, Southern Illinois University at Carbondale, IL 62901, USA.
| | | | | | | | | | | | | | | | | | | |
Collapse
|
4
|
Lu WW, Huang RB, Wei YT, Meng JZ, Du LQ, Du QS. Statistical energy potential: reduced representation of Dehouck–Gilis–Rooman function by selecting against decoy datasets. Amino Acids 2012; 42:2353-61. [DOI: 10.1007/s00726-011-0977-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2010] [Accepted: 07/06/2011] [Indexed: 11/24/2022]
|
5
|
Park MS, Gao C, Stern HA. Estimating binding affinities by docking/scoring methods using variable protonation states. Proteins 2011; 79:304-14. [PMID: 21058298 DOI: 10.1002/prot.22883] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022]
Abstract
To investigate the effects of multiple protonation states on protein-ligand recognition, we generated alternative protonation states for selected titratable groups of ligands and receptors. The selection of states was based on the predicted pK(a) of the unbound receptor and ligand and the proximity of titratable groups of the receptor to the binding site. Various ligand tautomer states were also considered. An independent docking calculation was run for each state. Several protocols were examined: using an ensemble of all generated states of ligand and receptor, using only the most probable state of the unbound ligand/receptor, and using only the state giving the most favorable docking score. The accuracies of these approaches were compared, using a set of 176 protein-ligand complexes (15 receptors) for which crystal structures and measured binding affinities are available. The best agreement with experiment was obtained when ligand poses from experimental crystal structures were used. For 9 of 15 receptors, using an ensemble of all generated protonation states of the ligand and receptor gave the best correlation between calculated and measured affinities.
Collapse
Affiliation(s)
- Min-Sun Park
- Department of Biochemistry and Biophysics, University of Rochester, Rochester, New York 14642, USA
| | | | | |
Collapse
|
6
|
Rykunov D, Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010; 11:128. [PMID: 20226048 PMCID: PMC2853469 DOI: 10.1186/1471-2105-11-128] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2009] [Accepted: 03/12/2010] [Indexed: 11/30/2022] Open
Abstract
Background Scoring functions, such as molecular mechanic forcefields and statistical potentials are fundamentally important tools in protein structure modeling and quality assessment. Results The performances of a number of publicly available scoring functions are compared with a statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility, on torsion angles, accounting for secondary structure preferences and side chain orientation. Partially based on the observations made, we present a novel residue based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. Atom- and residue-level statistical potentials and Linux executables to calculate the energy of a given protein proposed in this work can be downloaded from http://www.fiserlab.org/potentials. Conclusions Among the most influential terms we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanical potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring unfeasible long energy relaxation before energy scores started to correlate with model quality.
Collapse
Affiliation(s)
- Dmitry Rykunov
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave,, Bronx, NY 10461, USA
| | | |
Collapse
|
7
|
Bahadur RP, Chakrabarti P. Discriminating the native structure from decoys using scoring functions based on the residue packing in globular proteins. BMC STRUCTURAL BIOLOGY 2009; 9:76. [PMID: 20038291 PMCID: PMC2809062 DOI: 10.1186/1472-6807-9-76] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/11/2009] [Accepted: 12/28/2009] [Indexed: 11/14/2022]
Abstract
BACKGROUND Setting the rules for the identification of a stable conformation of a protein is of utmost importance for the efficient generation of structures in computer simulation. For structure prediction, a considerable number of possible models are generated from which the best model has to be selected. RESULTS Two scoring functions, Rs and Rp, based on the consideration of packing of residues, which indicate if the conformation of an amino acid sequence is native-like, are presented. These are defined using the solvent accessible surface area (ASA) and the partner number (PN) (other residues that are within 4.5 A) of a particular residue. The two functions evaluate the deviation from the average packing properties (ASA or PN) of all residues in a polypeptide chain corresponding to a model of its three-dimensional structure. While simple in concept and computationally less intensive, both the functions are at least as efficient as any other energy functions in discriminating the native structure from decoys in a large number of standard decoy sets, as well as on models submitted for the targets of CASP7. Rs appears to be slightly more effective than Rp, as determined by the number of times the native structure possesses the minimum value for the function and its separation from the average value for the decoys. CONCLUSION Two parameters, Rs and Rp, are discussed that can very efficiently recognize the native fold for a sequence from an ensemble of decoy structures. Unlike many other algorithms that rely on the use of composite scoring function, these are based on a single parameter, viz., the accessible surface area (or the number of residues in contact), but still able to capture the essential attribute of the native fold.
Collapse
Affiliation(s)
- Ranjit Prasad Bahadur
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
- Current address: Department of Biotechnology, Indian Institute of Technology, Kharagpur 721302, West Bengal, India
| | - Pinak Chakrabarti
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
| |
Collapse
|
8
|
Májek P, Elber R. A coarse-grained potential for fold recognition and molecular dynamics simulations of proteins. Proteins 2009; 76:822-36. [PMID: 19291741 DOI: 10.1002/prot.22388] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
A coarse-grained potential for protein simulations and fold ranking is presented. The potential is based on a two-point model of individual amino acids and a specific implementation of hydrogen bonding. Parameters are determined for distance dependent pair interactions, pseudo bonds, angles, and torsions. A scaling factor for a hydrogen bonding term is also determined. Iterative sampling for 4867 proteins reproduces distributions of internal coordinates and distances observed in the Protein Data Bank. The adjustment of the potential and resampling are in the spirit of the generalized ensemble approach. No native structure information (e.g., secondary structure) is used in the calculation of the potential or in the simulation of a particular protein. The potential is subject to two tests as follows: (i) simulations of 956 globular proteins in the neighborhood of their native folds (these proteins were not used in the training set) and (ii) discrimination between native and decoy structures for 2470 proteins with 305,000 decoys and the "Decoys 'R' Us" dataset. In the first test, 58% of tested proteins stay within 5 A from the native fold in Molecular Dynamics simulations of more than 20 nanoseconds using the new potential. The potential is also useful in differentiating between correct and approximate folds providing significant signal for structure prediction algorithms. Sampling with the potential consistently regenerates the distribution of distances and internal coordinates it learned. Nevertheless, during Molecular Dynamics simulations structures are found that reproduce the learned distributions but are far from the native fold.
Collapse
Affiliation(s)
- Peter Májek
- Department of Computer Science, Upson Hall 4130, Cornell University, Ithaca, New York 14853-7501, USA
| | | |
Collapse
|
9
|
Gu J, Li H, Jiang H, Wang X. Optimizing energy potential for protein fold recognition with parametric evaluation function. J Comput Biol 2009; 16:427-42. [PMID: 19254182 DOI: 10.1089/cmb.2008.0128] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
In this paper, a new optimization method is proposed to determine a simplified energy potential for protein fold recognition, which consists of the residue-residue contact, hydrophobicity, and pseudodihedral potentials. With a parametric evaluation function method, the Z-scores of all the proteins in a training set are optimized simultaneously to obtain the best parameter set of the potential. For this multi-objective and multi-constraint problem, the new optimization scheme is very effective. The derived potential is then tested on two high-quality decoy sets and compared with other classical fold recognition potentials. With the simplified energy potential, we achieve a high level of discrimination capability between correct and incorrect folds.
Collapse
Affiliation(s)
- Junfeng Gu
- Department of Engineering Mechanics, State Key Laboratory of Structural Analysis for Industrial Equipment, Dalian University of Technology, Dalian, China
| | | | | | | |
Collapse
|
10
|
Gu J, Li H, Jiang H, Wang X. A simple Calpha-SC potential with higher accuracy for protein fold recognition. Biochem Biophys Res Commun 2009; 379:610-5. [PMID: 19121621 DOI: 10.1016/j.bbrc.2008.12.131] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2008] [Accepted: 12/20/2008] [Indexed: 11/18/2022]
Abstract
In this paper, an improved C(alpha)-SC energy potential designed for protein fold recognition was reported. It consists of three extremely simple interaction terms which are supposed to be the dominant interactions in protein folding: residue-residue contact, hydrophobicity and pseudodihedral potentials. The potential function only contains 210 contacts, one hydrophobic and one torsion parameters, which have been optimized using an interior point algorithm of linear programming. Tests of the derived potential function on commonly used decoy sets illustrate that it outperforms most of the existing coarse-grained potentials in terms of its capabilities in recognizing native structures and consistency in achieving high Z-scores across decoy sets, and it has almost equivalent performance to the potentials which considered complex intra-molecular interactions. The results show that our scoring function is a generally prospective potential for protein structure prediction and modeling with regard to its recognition and computation efficacy.
Collapse
Affiliation(s)
- Junfeng Gu
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, Dalian University of Technology, Dalian 116024, China
| | | | | | | |
Collapse
|
11
|
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
Collapse
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
| | | | | | | |
Collapse
|
12
|
Rajgaria R, McAllister SR, Floudas CA. Distance dependent centroid to centroid force fields using high resolution decoys. Proteins 2008; 70:950-70. [PMID: 17847088 DOI: 10.1002/prot.21561] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Simplified force fields play an important role in protein structure prediction and de novo protein design by requiring less computational effort than detailed atomistic potentials. A side chain centroid based, distance dependent pairwise interaction potential has been developed. A linear programming based formulation was used in which non-native "decoy" conformers are forced to take a higher energy compared with the corresponding native structure. This model was trained on an enhanced and diverse protein set. High quality decoy structures were generated for approximately 1400 nonhomologous proteins using torsion angle dynamics along with restricted variations of the hydrophobic cores of the native structure. The resulting decoy set was used to train the model yielding two different side chain centroid based force fields that differ in the way distance dependence has been used to calculate energy parameters. These force fields were tested on an independent set of 148 test proteins with 500 decoy structures for each protein. The side chain centroid force fields were successful in correctly identifying approximately 86% native structures. The Z-scores produced by the proposed centroid-centroid distance dependent force fields improved compared with other distance dependent C(alpha)-C(alpha) or side chain based force fields.
Collapse
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
| | | | | |
Collapse
|
13
|
Cheng J, Pei J, Lai L. A free-rotating and self-avoiding chain model for deriving statistical potentials based on protein structures. Biophys J 2007; 92:3868-77. [PMID: 17351015 PMCID: PMC1868969 DOI: 10.1529/biophysj.106.102152] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Statistical potentials have been widely used in protein studies despite the much-debated theoretical basis. In this work, we have applied two physical reference states for deriving the statistical potentials based on protein structure features to achieve zero interaction and orthogonalization. The free-rotating chain-based potential applies a local free-rotating chain reference state, which could theoretically be described by the Gaussian distribution. The self-avoiding chain-based potential applies a reference state derived from a database of artificial self-avoiding backbones generated by Monte Carlo simulation. These physical reference states are independent of known protein structures and are based solely on the analytical formulation or simulation method. The new potentials performed better and yielded higher Z-scores and success rates compared to other statistical potentials. The end-to-end distance distribution produced by the self-avoiding chain model was similar to the distance distribution of protein atoms in structure database. This fact may partly explain the basis of the reference states that depend on the atom pair frequency observed in the protein database. The current study showed that a more physical reference model improved the performance of statistical potentials in protein fold recognition, which could also be extended to other types of applications.
Collapse
Affiliation(s)
- Ji Cheng
- State Key Laboratory for Structural Chemistry of Stable and Unstable Species, College of Chemistry and Molecular Engineering, and Center for Theoretical Biology, Peking University, Beijing, China
| | | | | |
Collapse
|