1
|
Elfiky AA, Ismail A. Molecular dynamics and docking reveal the potency of novel GTP derivatives against RNA dependent RNA polymerase of genotype 4a HCV. Life Sci 2019; 238:116958. [PMID: 31628915 DOI: 10.1016/j.lfs.2019.116958] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2019] [Revised: 10/08/2019] [Accepted: 10/11/2019] [Indexed: 01/17/2023]
Abstract
AIM To work on Hepatitis C Virus (HCV), one of the major causes of liver cirrhosis and hepatocellular carcinoma, polymerase of genotype 4a that have no solved structures deposited in the protein data bank (PDB) yet. Understanding the dynamics and testing some novel inhibitors are also covered. MATERIALS AND METHODS Molecular Dynamics Simulation (MDS) is performed for a period of 1 μs on comparatively modeled then validated NS5b of subtype 4a. Following MDS analysis, molecular docking is performed to test the inhibitory performance of eight novels suggested guanosine derivatives using 181 different conformations of the protein model gathered during the MDS run after the equilibration period. KEY FINDINGS The results yield that the eight modified, at position 2', GTP derivatives (fluorine, Hydroxyl, and sulphonyl oxydanyl) have binding energies comparable to the parent molecule, GTP. Besides, the eight suggested compounds have lower binding energies (and hence better in binding) compared to sofosbuvir (a drug approved by FDA in 2013 against HCV) and ribavirin (a wide range acting antiviral drug used before against HCV). SIGNIFICANCE Combined molecular dynamics and molecular docking are able to test the hypothesis of HCV polymerase dynamics doesn't affect the nucleotides (or nucleotide inhibitors) binding to its active site. Despite the reported highly dynamic subtype 4a of HCV; all the nucleotide inhibitors under the study are able to, tightly, bind to NS5b of genotype 4a. This behavior is reported before for the Zika virus polymerase, as well.
Collapse
Affiliation(s)
- Abdo A Elfiky
- Biophysics Department, Faculty of Science, Cairo University, Giza, Egypt.
| | - Alaa Ismail
- Biophysics Department, Faculty of Science, Cairo University, Giza, Egypt
| |
Collapse
|
2
|
Computational and experimental approaches to reveal the effects of single nucleotide polymorphisms with respect to disease diagnostics. Int J Mol Sci 2014; 15:9670-717. [PMID: 24886813 PMCID: PMC4100115 DOI: 10.3390/ijms15069670] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2014] [Revised: 05/15/2014] [Accepted: 05/16/2014] [Indexed: 12/25/2022] Open
Abstract
DNA mutations are the cause of many human diseases and they are the reason for natural differences among individuals by affecting the structure, function, interactions, and other properties of DNA and expressed proteins. The ability to predict whether a given mutation is disease-causing or harmless is of great importance for the early detection of patients with a high risk of developing a particular disease and would pave the way for personalized medicine and diagnostics. Here we review existing methods and techniques to study and predict the effects of DNA mutations from three different perspectives: in silico, in vitro and in vivo. It is emphasized that the problem is complicated and successful detection of a pathogenic mutation frequently requires a combination of several methods and a knowledge of the biological phenomena associated with the corresponding macromolecules.
Collapse
|
3
|
Heinke F, Schildbach S, Stockmann D, Labudde D. eProS--a database and toolbox for investigating protein sequence-structure-function relationships through energy profiles. Nucleic Acids Res 2012; 41:D320-6. [PMID: 23161695 PMCID: PMC3531212 DOI: 10.1093/nar/gks1079] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Gaining information about structural and functional features of newly identified proteins is often a difficult task. This information is crucial for understanding sequence–structure–function relationships of target proteins and, thus, essential in comprehending the mechanisms and dynamics of the molecular systems of interest. Using protein energy profiles is a novel approach that can contribute in addressing such problems. An energy profile corresponds to the sequence of energy values that are derived from a coarse-grained energy model. Energy profiles can be computed from protein structures or predicted from sequences. As shown, correspondences and dissimilarities in energy profiles can be applied for investigations of protein mechanics and dynamics. We developed eProS (energy profile suite, freely available at http://bioservices.hs-mittweida.de/Epros/), a database that provides ∼76 000 pre-calculated energy profiles as well as a toolbox for addressing numerous problems of structure biology. Energy profiles can be browsed, visualized, calculated from an uploaded structure or predicted from sequence. Furthermore, it is possible to align energy profiles of interest or compare them with all entries in the eProS database to identify significantly similar energy profiles and, thus, possibly relevant structural and functional relationships. Additionally, annotations and cross-links from numerous sources provide a broad view of potential biological correspondences.
Collapse
Affiliation(s)
- Florian Heinke
- Department of Mathematics, University of Applied Sciences Mittweida, Mittweida, Saxony, Technikumplatz 17, D-09648, Germany.
| | | | | | | |
Collapse
|
4
|
Zhou H, Skolnick J. GOAP: a generalized orientation-dependent, all-atom statistical potential for protein structure prediction. Biophys J 2012; 101:2043-52. [PMID: 22004759 DOI: 10.1016/j.bpj.2011.09.012] [Citation(s) in RCA: 201] [Impact Index Per Article: 16.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/15/2011] [Revised: 09/07/2011] [Accepted: 09/09/2011] [Indexed: 12/18/2022] Open
Abstract
An accurate scoring function is a key component for successful protein structure prediction. To address this important unsolved problem, we develop a generalized orientation and distance-dependent all-atom statistical potential. The new statistical potential, generalized orientation-dependent all-atom potential (GOAP), depends on the relative orientation of the planes associated with each heavy atom in interacting pairs. GOAP is a generalization of previous orientation-dependent potentials that consider only representative atoms or blocks of side-chain or polar atoms. GOAP is decomposed into distance- and angle-dependent contributions. The DFIRE distance-scaled finite ideal gas reference state is employed for the distance-dependent component of GOAP. GOAP was tested on 11 commonly used decoy sets containing 278 targets, and recognized 226 native structures as best from the decoys, whereas DFIRE recognized 127 targets. The major improvement comes from decoy sets that have homology-modeled structures that are close to native (all within ∼4.0 Å) or from the ROSETTA ab initio decoy set. For these two kinds of decoys, orientation-independent DFIRE or only side-chain orientation-dependent RWplus performed poorly. Although the OPUS-PSP block-based orientation-dependent, side-chain atom contact potential performs much better (recognizing 196 targets) than DFIRE, RWplus, and dDFIRE, it is still ∼15% worse than GOAP. Thus, GOAP is a promising advance in knowledge-based, all-atom statistical potentials. GOAP is available for download at http://cssb.biology.gatech.edu/GOAP.
Collapse
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
| | | |
Collapse
|
5
|
Li H, Zhou Y. FOLD HELICAL PROTEINS BY ENERGY MINIMIZATION IN DIHEDRAL SPACE AND A DFIRE-BASED STATISTICAL ENERGY FUNCTION. J Bioinform Comput Biol 2011; 3:1151-70. [PMID: 16278952 DOI: 10.1142/s0219720005001430] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2004] [Revised: 04/12/2005] [Accepted: 04/21/2005] [Indexed: 11/18/2022]
Abstract
Statistical energy functions are discrete (or stepwise) energy functions that lack van der Waals repulsion. As a result, they are often applied directly to a given structure (native or decoy) without further energy minimization being performed to the structure. However, the full benefit (or hidden defect) of an energy function cannot be revealed without energy minimization. This paper tests a recently developed, all-atom statistical energy function by energy minimization with a fixed secondary helical structure in dihedral space. This is accomplished by combining the statistical energy function based on a distance-scaled finite ideal-gas reference (DFIRE) state with a simple repulsive interaction and an improper torsion energy function. The energy function was used to minimize 2000 random initial structures of 41 small and medium-sized helical proteins in a dihedral space with a fixed helical region. Results indicate that near-native structures for most studied proteins can be obtained by minimization alone. The average minimum root-mean-squared distance (rmsd) from the native structure for all 41 proteins is 4.1 Å. The energy function (together with a simple clustering of similar structures) also makes a reasonable selection of near-native structures from minimized structures. The average rmsd value and the average rank for the best structure in the top five is 6.8 Å and 2.4, respectively. The accuracy of the structures sampled and the structure selections can be improved significantly with the removal of flexible terminal regions in rmsd calculations and in minimization and with the increase in the number of minimizations. The minimized structures form an excellent decoy set for testing other energy functions because most structures are well-packed with minimum hard-core overlaps with correct hydrophobic/hydrophilic partitioning. They are available online at .
Collapse
Affiliation(s)
- Hongzhi Li
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, New York 14214, USA.
| | | |
Collapse
|
6
|
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/13/2023]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
Collapse
Affiliation(s)
- Hao Fan
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
| | | | | | | | | | | |
Collapse
|
7
|
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. Lacking progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Collapse
Affiliation(s)
- Yaoqi Zhou
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
| | - Yong Duan
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
| | - Yuedong Yang
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
| | - Eshel Faraggi
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
| | - Hongxing Lei
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
| |
Collapse
|
8
|
Abstract
Knowledge-based approaches frequently employ empirical relations to determine effective potentials for coarse-grained protein models directly from protein databank structures. Although these approaches have enjoyed considerable success and widespread popularity in computational protein science, their fundamental basis has been widely questioned. It is well established that conventional knowledge-based approaches do not correctly treat many-body correlations between amino acids. Moreover, the physical significance of potentials determined by using structural statistics from different proteins has remained obscure. In the present work, we address both of these concerns by introducing and demonstrating a theory for calculating transferable potentials directly from a databank of protein structures. This approach assumes that the databank structures correspond to representative configurations sampled from equilibrium solution ensembles for different proteins. Given this assumption, this physics-based theory exactly treats many-body structural correlations and directly determines the transferable potentials that provide a variationally optimized approximation to the free energy landscape for each protein. We illustrate this approach by first constructing a databank of protein structures using a model potential and then quantitatively recovering this potential from the structure databank. The proposed framework will clarify the assumptions and physical significance of knowledge-based potentials, allow for their systematic improvement, and provide new insight into many-body correlations and cooperativity in folded proteins.
Collapse
|
9
|
Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010; 78:2338-48. [PMID: 20544969 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful, computational designs in the last decade resulted from improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low complexity regions of designed sequences. These two terms together with optimized reference states for amino-acid residues were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, makes a 12% increase (from 34 to 46%) in fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces 8% (from 22% to 14%) to the number of designed sequences that are not homologous to any known protein sequences according to psi-blast. More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
Collapse
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA
| | | | | | | |
Collapse
|
10
|
Solis AD, Rackovsky SR. Information-theoretic analysis of the reference state in contact potentials used for protein structure prediction. Proteins 2010; 78:1382-97. [PMID: 20034109 DOI: 10.1002/prot.22652] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Using information-theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information-based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi-chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information-theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. Results also suggest to use only sequences of similar composition in deriving contact potentials, to tailor the contact potential specifically for a test sequence.
Collapse
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA.
| | | |
Collapse
|
11
|
Abstract
The three-dimensional structure of a protein is organized around the packing of its secondary structure elements. Although much is known about the packing geometry observed between alpha-helices and between beta-sheets, there has been little progress on characterizing helix-sheet interactions. We present an analysis of the conformation of alphabeta(2) motifs in proteins, corresponding to all occurrences of helices in contact with two strands that are hydrogen bonded. The geometry of the alphabeta(2) motif is characterized by the azimuthal angle theta between the helix axis and an average vector representing the two strands, the elevation angle psi between the helix axis and the plane containing the two strands, and the distance D between the helix and the strands. We observe that the helix tends to align to the two strands, with a preference for an antiparallel orientation if the two strands are parallel; this preference is diminished for other topologies of the beta-sheet. Side-chain packing at the interface between the helix and the strands is mostly hydrophobic, with a preference for aliphatic amino acids in the strand and aromatic amino acids in the helix. From the knowledge of the geometry and amino acid propensities of alphabeta(2) motifs in proteins, we have derived different statistical potentials that are shown to be efficient in picking native-like conformations among a set of non-native conformations in well-known decoy datasets. The information on the geometry of alphabeta(2) motifs as well as the related statistical potentials have applications in the field of protein structure prediction.
Collapse
Affiliation(s)
- Chengcheng Hu
- Department of Computer Science University of California, Davis, CA 95616
| | - Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
| |
Collapse
|
12
|
Abstract
Empirical or knowledge-based potentials have many applications in structural biology such as the prediction of protein structure, protein-protein, and protein-ligand interactions and in the evaluation of stability for mutant proteins, the assessment of errors in experimentally solved structures, and the design of new proteins. Here, we describe a simple procedure to derive and use pairwise distance-dependent potentials that rely on the definition of effective atomic interactions, which attempt to capture interactions that are more likely to be physically relevant. Based on a difficult benchmark test composed of proteins with different secondary structure composition and representing many different folds, we show that the use of effective atomic interactions significantly improves the performance of potentials at discriminating between native and near-native conformations. We also found that, in agreement with previous reports, the potentials derived from the observed effective atomic interactions in native protein structures contain a larger amount of mutual information. A detailed analysis of the effective energy functions shows that atom connectivity effects, which mostly arise when deriving the potential by the incorporation of those indirect atomic interactions occurring beyond the first atomic shell, are clearly filtered out. The shape of the energy functions for direct atomic interactions representing hydrogen bonding and disulfide and salt bridges formation is almost unaffected when effective interactions are taken into account. On the contrary, the shape of the energy functions for indirect atom interactions (i.e., those describing the interaction between two atoms bound to a direct interacting pair) is clearly different when effective interactions are considered. Effective energy functions for indirect interacting atom pairs are not influenced by the shape or the energy minimum observed for the corresponding direct interacting atom pair. Our results suggest that the dependency between the signals in different energy functions is a key aspect that need to be addressed when empirical energy functions are derived and used, and also highlight the importance of additivity assumptions in the use of potential energy functions.
Collapse
Affiliation(s)
- Evandro Ferrada
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
| | | |
Collapse
|
13
|
DeVane R, Shinoda W, Moore PB, Klein ML. A Transferable Coarse Grain Non-bonded Interaction Model For Amino Acids. J Chem Theory Comput 2009; 5:2115-2124. [PMID: 20161179 PMCID: PMC2725330 DOI: 10.1021/ct800441u] [Citation(s) in RCA: 100] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]
Abstract
The large quantity of protein sequences being generated from genomic data has greatly outpaced the throughput of experimental protein structure determining methods and consequently brought urgency to the need for accurate protein structure prediction tools. Reduced resolution, or coarse grained (CG) models, have become a mainstay in computational protein structure prediction perfoming among the best tools available. The quest for high quality generalized CG models presents an extremely challenging yet popular endeavor. To this point, a CG based interaction potential is presented here for the naturally occurring amino acids. In the present approach, three to four heavy atoms and associated hydrogens are condensed into a single CG site. The parameterization of the site-site interaction potential relies on experimental data thus providing a novel approach that is neither based on all-atom (AA) simulations nor experimental protein structural data. Specifically, intermolecular potentials, which are based on Lennard-Jones (LJ) style functional forms, are parameterized using thermodynamic data including surface tension and density. Using this approach, an amino acid potential dataset has been developed for use in modeling peptides and proteins. The potential is evaluated here by comparing the solvent accessible surface area (SASA) to AA representations and ranking of protein decoy data sets provided by Decoys 'R' Us. The model is shown to perform very well compared to other existing prediction models for these properties.
Collapse
Affiliation(s)
- Russell DeVane
- Center for Molecular Modeling and Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104-6323
| | - Wataru Shinoda
- Research Institute for Computational Sciences (RICS), National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba Central 2, Umezono 1-1-1, Tsukuba, Ibaraki 305-8568, Japan
| | - Preston B. Moore
- Department of Chemistry and Biochemistry, University of the Sciences in Philadelphia, 600 s. 43rd Street, Philadelphia, PA 19104
| | - Michael L. Klein
- Center for Molecular Modeling and Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104-6323
| |
Collapse
|
14
|
Xue B, Faraggi E, Zhou Y. Predicting residue-residue contact maps by a two-layer, integrated neural-network method. Proteins 2009; 76:176-83. [PMID: 19137600 DOI: 10.1002/prot.22329] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
A neural network method (SPINE-2D) is introduced to provide a sequence-based prediction of residue-residue contact maps. This method is built on the success of SPINE in predicting secondary structure, residue solvent accessibility, and backbone torsion angles via large-scale training with overfit protection and a two-layer neural network. SPINE-2D achieved a 10-fold cross-validated accuracy of 47% (+/-2%) for top L/5 predicted contacts between two residues with sequence separation of six or more and an accuracy of 24 +/- 1% for nonlocal contacts with sequence separation of 24 residues or more. The accuracies of 23% and 26% for nonlocal contact predictions are achieved for two independent datasets of 500 proteins and 82 CASP 7 targets, respectively. A comparison with other methods indicates that SPINE-2D is among the most accurate methods for contact-map prediction. SPINE-2D is available as a webserver at http://sparks.informatics.iupui.edu.
Collapse
Affiliation(s)
- Bin Xue
- Indiana University School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
| | | | | |
Collapse
|
15
|
Padrón-García JA, Alonso-Tarajano M, Alonso-Becerra E, Winterburn TJ, Ruiz Y, Kay J, Berry C. Quantitative structure activity relationship of IA3-like peptides as aspartic proteinase inhibitors. Proteins 2009; 75:859-69. [DOI: 10.1002/prot.22295] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
|
16
|
Solis AD, Rackovsky S. Information and discrimination in pairwise contact potentials. Proteins 2008; 71:1071-87. [PMID: 18004788 DOI: 10.1002/prot.21733] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/05/2022]
Abstract
We examine the information-theoretic characteristics of statistical potentials that describe pairwise long-range contacts between amino acid residues in proteins. In our work, we seek to map out an efficient information-based strategy to detect and optimally utilize the structural information latent in empirical data, to make contact potentials, and other statistically derived folding potentials, more effective tools in protein structure prediction. Foremost, we establish fundamental connections between basic information-theoretic quantities (including the ubiquitous Z-score) and contact "energies" or scores used routinely in protein structure prediction, and demonstrate that the informatic quantity that mediates fold discrimination is the total divergence. We find that pairwise contacts between residues bear a moderate amount of fold information, and if optimized, can assist in the discrimination of native conformations from large ensembles of native-like decoys. Using an extensive battery of threading tests, we demonstrate that parameters that affect the information content of contact potentials (e.g., choice of atoms to define residue location and the cut-off distance between pairs) have a significant influence in their performance in fold recognition. We conclude that potentials that have been optimized for mutual information and that have high number of score events per sequence-structure alignment are superior in identifying the correct fold. We derive the quantity "information product" that embodies these two critical factors. We demonstrate that the information product, which does not require explicit threading to compute, is as effective as the Z-score, which requires expensive decoy threading to evaluate. This new objective function may be able to speed up the multidimensional parameter search for better statistical potentials. Lastly, by demonstrating the functional equivalence of quasi-chemically approximated "energies" to fundamental informatic quantities, we make statistical potentials less dependent on theoretically tenuous biophysical formalisms and more amenable to direct bioinformatic optimization.
Collapse
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA
| | | |
Collapse
|
17
|
A historical perspective of template-based protein structure prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
This chapter presents a broad and a historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
Collapse
|
18
|
Marabotti A, Colonna G, Facchiano A. New computational strategy to analyze the interactions of ERalpha and ERbeta with different ERE sequences. J Comput Chem 2007; 28:1031-41. [PMID: 17269124 DOI: 10.1002/jcc.20582] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
The importance of computational methods for the simulation and analysis of biological systems has increased during the last years. In particular, methods to predict binding energies are developing not only with the aim of ranking the affinities between two or more complexes, but also to quantify the contribution of different types of interaction. In this work, we present the application of HINT, a non Newtonian force field, to rank the affinities of complexes formed by estrogen receptors (ER) alpha and beta and different estrogen responsive elements (ERE) near the estrogen-regulated genes. We used the crystallographic coordinates of the DNA binding domain of ERalpha complexed to a consensus ERE as a starting point to simulate several complexes in which some nucleotides in the ERE sequence were mutated. Moreover, we used homology modeling methods to create the structure of the complexes between the DNA binding domain of ERbeta (for which no experimental structures are currently available) and the same ERE sequences. Our results show that HINT is able to rank the affinities of ERalpha and ERbeta for different ERE sequences, and to correctly identify the positions on the DNA sequence that are most important for binding affinity. Moreover, the HINT output gives us the opportunity to identify and quantify the role played by each single atom of amino acids and nucleotides in the binding event, as well as to predict the effect on the binding affinity for other nucleotide mutations.
Collapse
Affiliation(s)
- Anna Marabotti
- Laboratory of Bioinformatics and Computational Biology, Institute of Food Science, National Research Council, Avellino, Italy.
| | | | | |
Collapse
|
19
|
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1778] [Impact Index Per Article: 104.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using the probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
Collapse
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
| | | |
Collapse
|
20
|
|
21
|
Robertson TA, Varani G. An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins 2006; 66:359-74. [PMID: 17078093 DOI: 10.1002/prot.21162] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We have developed an all-atom statistical potential function for the prediction of protein-DNA interactions from their structures, and show that this method outperforms similar, lower-resolution statistical potentials in a series of decoy discrimination experiments. The all-atom formalism appears to capture details of atomic interactions that are missed by the lower-resolution methods, with the majority of the discriminatory power arising from its description of short-range atomic contacts. We show that, on average, the method is able to identify 90% of near-native docking decoys within the best-scoring 10% of structures in a given decoy set, and it compares favorably with an optimized physical potential function in a test of structure-based identification of DNA binding-sequences. These results demonstrate that all-atom statistical functions specific to protein-DNA interactions can achieve great discriminatory power despite the limited size of the structural database. They also suggest that the statistical scores may soon be able to achieve accuracy on par with more complex, physical potential functions.
Collapse
Affiliation(s)
- Timothy A Robertson
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
| | | |
Collapse
|
22
|
Eramian D, Shen MY, Devos D, Melo F, Sali A, Marti-Renom MA. A composite score for predicting errors in protein structure models. Protein Sci 2006; 15:1653-66. [PMID: 16751606 PMCID: PMC2242555 DOI: 10.1110/ps.062095806] [Citation(s) in RCA: 114] [Impact Index Per Article: 6.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
Reliable prediction of model accuracy is an important unsolved problem in protein structure modeling. To address this problem, we studied 24 individual assessment scores, including physics-based energy functions, statistical potentials, and machine learning-based scoring functions. Individual scores were also used to construct approximately 85,000 composite scoring functions using support vector machine (SVM) regression. The scores were tested for their abilities to identify the most native-like models from a set of 6000 comparative models of 20 representative protein structures. Each of the 20 targets was modeled using a template of <30% sequence identity, corresponding to challenging comparative modeling cases. The best SVM score outperformed all individual scores by decreasing the average RMSD difference between the model identified as the best of the set and the model with the lowest RMSD (DeltaRMSD) from 0.63 A to 0.45 A, while having a higher Pearson correlation coefficient to RMSD (r=0.87) than any other tested score. The most accurate score is based on a combination of the DOPE non-hydrogen atom statistical potential; surface, contact, and combined statistical potentials from MODPIPE; and two PSIPRED/DSSP scores. It was implemented in the SVMod program, which can now be applied to select the final model in various modeling problems, including fold assignment, target-template alignment, and loop modeling.
Collapse
Affiliation(s)
- David Eramian
- Graduate Group in Biophysics, Department of Biopharmaceutical Sciences, University of California at San Francisco 94158, USA
| | | | | | | | | | | |
Collapse
|
23
|
Abstract
We propose a novel and flexible derivation scheme of statistical, database-derived, potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decomposition of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addition, this derivation scheme appears as quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
Collapse
Affiliation(s)
- Y Dehouck
- Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles, 1050 Brussels, Belgium.
| | | | | |
Collapse
|
24
|
Skolnick J. In quest of an empirical potential for protein structure prediction. Curr Opin Struct Biol 2006; 16:166-71. [PMID: 16524716 DOI: 10.1016/j.sbi.2006.02.004] [Citation(s) in RCA: 112] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2005] [Revised: 02/10/2006] [Accepted: 02/23/2006] [Indexed: 11/19/2022]
Abstract
Key to successful protein structure prediction is a potential that recognizes the native state from misfolded structures. Recent advances in empirical potentials based on known protein structures include improved reference states for assessing random interactions, sidechain-orientation-dependent pair potentials, potentials for describing secondary or supersecondary structural preferences and, most importantly, optimization protocols that sculpt the energy landscape to enhance the correlation between native-like features and the energy. Improved clustering algorithms that select native-like structures on the basis of cluster density also resulted in greater prediction accuracy. For template-based modeling, these advances allowed improvement in predicted structures relative to their initial template alignments over a wide range of target-template homology. This represents significant progress and suggests applications to proteome-scale structure prediction.
Collapse
Affiliation(s)
- Jeffrey Skolnick
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington Street, Buffalo, NY 14203, USA.
| |
Collapse
|
25
|
Urbanc B, Borreguero JM, Cruz L, Stanley HE. Ab initio discrete molecular dynamics approach to protein folding and aggregation. Methods Enzymol 2006; 412:314-38. [PMID: 17046666 DOI: 10.1016/s0076-6879(06)12019-4] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Abstract
Understanding the toxicity of amyloidogenic protein aggregates and designing therapeutic approaches require the knowledge of their structure at atomic resolution. Although solid-state NMR, X-ray diffraction, and other experimental techniques are capable of discerning the protein fibrillar structure, determining the structures of early aggregates, called oligomers, is a challenging experimental task. Computational studies by all-atom molecular dynamics, which provides a complete description of a protein in the solvent, are typically limited to study folding of smaller protein or aggregation of a small number of short protein fragments. We review an efficient ab initio computer simulation approach to protein folding and aggregation using discrete molecular dynamics (DMD) in combination with several coarse-grained protein models and implicit solvent. This approach involves different complexity levels in both the protein model and the interparticle interactions. Starting from the simplest protein model with minimal interactions, and gradually increasing its complexity, while guided by in vitro findings, we can systematically select the key features of the protein model and interactions that drive protein folding and aggregation. Because the method used in this DMD approach does not require any knowledge of the native or any other state of the protein, it can be applied to study degenerative disorders associated with protein misfolding and aberrant protein aggregation. The choice of the coarse-grained model depends on the complexity of the protein and specific questions to be addressed, which are mostly suggested by in vitro findings. Thus, we illustrate our approach on amyloid beta-protein (Abeta) associated with Alzheimer's disease (AD). Despite the simplifications introduced in the DMD approach, the predicted Abeta conformations are in agreement with existing experimental data. The in silico findings also provide further insights into the structure and dynamics of Abeta folding and oligomer formation that are amenable to in vitro testing.
Collapse
Affiliation(s)
- Brigita Urbanc
- Department of Physics, Center for Polymer Studies, Boston University, Massachusetts 02215, USA
| | | | | | | |
Collapse
|