1
|
Chen MC, Li Y, Zhu YH, Ge F, Yu DJ. SSCpred: Single-Sequence-Based Protein Contact Prediction Using Deep Fully Convolutional Network. J Chem Inf Model 2020; 60:3295-3303. [DOI: 10.1021/acs.jcim.9b01207] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Ming-Cai Chen
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Yang Li
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
- Department of Computational Medicine and Bioinformatics, University of Michigan, Washtenaw 100, Ann Arbor, Michigan 48109-2218, United States
| | - Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Xiaolingwei 200, Nanjing 210094, P. R. China
| |
Collapse
|
2
|
Wozniak PP, Pelc J, Skrzypecki M, Vriend G, Kotulska M. Bio-knowledge-based filters improve residue-residue contact prediction accuracy. Bioinformatics 2019; 34:3675-3683. [PMID: 29850768 DOI: 10.1093/bioinformatics/bty416] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2017] [Accepted: 05/19/2018] [Indexed: 11/13/2022] Open
Abstract
Motivation Residue-residue contact prediction through direct coupling analysis has reached impressive accuracy, but yet higher accuracy will be needed to allow for routine modelling of protein structures. One way to improve the prediction accuracy is to filter predicted contacts using knowledge about the particular protein of interest or knowledge about protein structures in general. Results We focus on the latter and discuss a set of filters that can be used to remove false positive contact predictions. Each filter depends on one or a few cut-off parameters for which the filter performance was investigated. Combining all filters while using default parameters resulted for a test set of 851 protein domains in the removal of 29% of the predictions of which 92% were indeed false positives. Availability and implementation All data and scripts are available at http://comprec-lin.iiar.pwr.edu.pl/FPfilter/. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - J Pelc
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - M Skrzypecki
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| |
Collapse
|
3
|
Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018; 86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Following up on the encouraging results of residue-residue contact prediction in the CASP11 experiment, we present the analysis of predictions submitted for CASP12. The submissions include predictions of 34 groups for 38 domains classified as free modeling targets which are not accessible to homology-based modeling due to a lack of structural templates. CASP11 saw a rise of coevolution-based methods outperforming other approaches. The improvement of these methods coupled to machine learning and sequence database growth are most likely the main driver for a significant improvement in average precision from 27% in CASP11 to 47% in CASP12. In more than half of the targets, especially those with many homologous sequences accessible, precisions above 90% were achieved with the best predictors reaching a precision of 100% in some cases. We furthermore tested the impact of using these contacts as restraints in ab initio modeling of 14 single-domain free modeling targets using Rosetta. Adding contacts to the Rosetta calculations resulted in improvements of up to 26% in GDT_TS within the top five structures.
Collapse
Affiliation(s)
- Joerg Schaarschmidt
- Faculty of Science ‐ ChemistryComputational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht UniversityUtrechtThe Netherlands
| | | | | | - Alexandre M.J.J. Bonvin
- Faculty of Science ‐ ChemistryComputational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht UniversityUtrechtThe Netherlands
| |
Collapse
|
4
|
Wozniak PP, Konopka BM, Xu J, Vriend G, Kotulska M. Forecasting residue-residue contact prediction accuracy. Bioinformatics 2017; 33:3405-3414. [PMID: 29036497 DOI: 10.1093/bioinformatics/btx416] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2017] [Accepted: 06/22/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). These methods are on average ∼40% correct for the 100 strongest predicted contacts in each protein. The end-user who works on a single protein of interest will not know if predictions are either much more or much less correct than 40%, which is especially a problem if contacts are predicted to steer experimental research on that protein. Results We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can be also applied to the meta-methods, which was tested on RaptorX. Availability and implementation All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/. Contact malgorzata.kotulska@pwr.edu.pl. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - B M Konopka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - J Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, GA 6525, Nijmegen, The Netherlands
| | - M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| |
Collapse
|
5
|
Wozniak PP, Vriend G, Kotulska M. Correlated mutations select misfolded from properly folded proteins. Bioinformatics 2017; 33:1497-1504. [PMID: 28203707 DOI: 10.1093/bioinformatics/btx013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2016] [Accepted: 01/11/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- P P Wozniak
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - M Kotulska
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
| |
Collapse
|
6
|
Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. Evaluation of residue-residue contact prediction in CASP10. Proteins 2013; 82 Suppl 2:138-53. [PMID: 23760879 DOI: 10.1002/prot.24340] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/18/2013] [Revised: 05/14/2013] [Accepted: 05/21/2013] [Indexed: 12/13/2022]
Abstract
We present the results of the assessment of the intramolecular residue-residue contact predictions from 26 prediction groups participating in the 10th round of the CASP experiment. The most recently developed direct coupling analysis methods did not take part in the experiment likely because they require a very deep sequence alignment not available for any of the 114 CASP10 targets. The performance of contact prediction methods was evaluated with the measures used in previous CASPs (i.e., prediction accuracy and the difference between the distribution of the predicted contacts and that of all pairs of residues in the target protein), as well as new measures, such as the Matthews correlation coefficient, the area under the precision-recall curve and the ranks of the first correctly and incorrectly predicted contact. We also evaluated the ability to detect interdomain contacts and tested whether the difficulty of predicting contacts depends upon the protein length and the depth of the family sequence alignment. The analyses were carried out on the target domains for which structural homologs did not exist or were difficult to identify. The evaluation was performed for all types of contacts (short, medium, and long-range), with emphasis placed on long-range contacts, i.e. those involving residues separated by at least 24 residues along the sequence. The assessment suggests that the best CASP10 contact prediction methods perform at approximately the same level, and comparably to those participating in CASP9.
Collapse
|
7
|
Shibberu Y, Holder A. A spectral approach to protein structure alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:867-875. [PMID: 21301031 DOI: 10.1109/tcbb.2011.24] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
A new intrinsic geometry based on a spectral analysis is used to motivate methods for aligning protein folds. The geometry is induced by the fact that a distance matrix can be scaled so that its eigenvalues are positive. We provide a mathematically rigorous development of the intrinsic geometry underlying our spectral approach and use it to motivate two alignment algorithms. The first uses eigenvalues alone and dynamic programming to quickly compute a fold alignment. Family identification results are reported for the Skolnick40 and Proteus300 data sets. The second algorithm extends our spectral method by iterating between our intrinsic geometry and the 3D geometry of a fold to make high-quality alignments. Results and comparisons are reported for several difficult fold alignments. The second algorithm's ability to correctly identify fold families in the Skolnick40 and Proteus300 data sets is also established.
Collapse
Affiliation(s)
- Yosi Shibberu
- Department of Mathematics, Rose-Hulman Institute of Technology, 5500 Wabash Avenue, Terre Haute, IN 47803, USA.
| | | |
Collapse
|
8
|
Duarte JM, Sathyapriya R, Stehr H, Filippis I, Lappe M. Optimal contact definition for reconstruction of contact maps. BMC Bioinformatics 2010; 11:283. [PMID: 20507547 PMCID: PMC3583236 DOI: 10.1186/1471-2105-11-283] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2009] [Accepted: 05/27/2010] [Indexed: 11/23/2022] Open
Abstract
Background Contact maps have been extensively used as a simplified representation of protein structures. They capture most important features of a protein's fold, being preferred by a number of researchers for the description and study of protein structures. Inspired by the model's simplicity many groups have dedicated a considerable amount of effort towards contact prediction as a proxy for protein structure prediction. However a contact map's biological interest is subject to the availability of reliable methods for the 3-dimensional reconstruction of the structure. Results We use an implementation of the well-known distance geometry protocol to build realistic protein 3-dimensional models from contact maps, performing an extensive exploration of many of the parameters involved in the reconstruction process. We try to address the questions: a) to what accuracy does a contact map represent its corresponding 3D structure, b) what is the best contact map representation with regard to reconstructability and c) what is the effect of partial or inaccurate contact information on the 3D structure recovery. Our results suggest that contact maps derived from the application of a distance cutoff of 9 to 11Å around the Cβ atoms constitute the most accurate representation of the 3D structure. The reconstruction process does not provide a single solution to the problem but rather an ensemble of conformations that are within 2Å RMSD of the crystal structure and with lower values for the pairwise average ensemble RMSD. Interestingly it is still possible to recover a structure with partial contact information, although wrong contacts can lead to dramatic loss in reconstruction fidelity. Conclusions Thus contact maps represent a valid approximation to the structures with an accuracy comparable to that of experimental methods. The optimal contact definitions constitute key guidelines for methods based on contact maps such as structure prediction through contacts and structural alignments based on maximum contact map overlap.
Collapse
Affiliation(s)
- Jose M Duarte
- Max Planck Institute for Molecular Genetics, Ihnestr, Berlin, Germany.
| | | | | | | | | |
Collapse
|
9
|
|
10
|
Faísca PFN, Plaxco KW. Cooperativity and the origins of rapid, single-exponential kinetics in protein folding. Protein Sci 2006; 15:1608-18. [PMID: 16815915 PMCID: PMC2242573 DOI: 10.1110/ps.062180806] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/24/2022]
Abstract
The folding of naturally occurring, single-domain proteins is usually well described as a simple, single-exponential process lacking significant trapped states. Here we further explore the hypothesis that the smooth energy landscape this implies, and the rapid kinetics it engenders, arises due to the extraordinary thermodynamic cooperativity of protein folding. Studying Miyazawa-Jernigan lattice polymers, we find that, even under conditions where the folding energy landscape is relatively optimized (designed sequences folding at their temperature of maximum folding rate), the folding of protein-like heteropolymers is accelerated when their thermodynamic cooperativity is enhanced by enhancing the nonadditivity of their energy potentials. At lower temperatures, where kinetic traps presumably play a more significant role in defining folding rates, we observe still greater cooperativity-induced acceleration. Consistent with these observations, we find that the folding kinetics of our computational models more closely approximates single-exponential behavior as their cooperativity approaches optimal levels. These observations suggest that the rapid folding of naturally occurring proteins is, in part, a consequence of their remarkably cooperative folding.
Collapse
Affiliation(s)
- Patrícia F N Faísca
- Centro de Física Teórica e Computacional da Universidade de Lisboa, Portugal
| | | |
Collapse
|
11
|
Ishida T, Nakamura S, Shimizu K. Potential for assessing quality of protein structure based on contact number prediction. Proteins 2006; 64:940-7. [PMID: 16788993 DOI: 10.1002/prot.21047] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
We developed a novel knowledge-based residue environment potential for assessing the quality of protein structures in protein structure prediction. The potential uses the contact number of residues in a protein structure and the absolute contact number of residues predicted from its amino acid sequence using a new prediction method based on a support vector regression (SVR). The contact number of an amino acid residue in a protein structure is defined by the number of residues around a given residue. First, the contact number of each residue is predicted using SVR from an amino acid sequence of a target protein. Then, the potential of the protein structure is calculated from the probability distribution of the native contact numbers corresponding to the predicted ones. The performance of this potential is compared with other score functions using decoy structures to identify both native structure from other structures and near-native structures from nonnative structures. This potential improves not only the ability to identify native structures from other structures but also the ability to discriminate near-native structures from nonnative structures.
Collapse
Affiliation(s)
- Takashi Ishida
- Department of Biotechnology, Graduate School of Agricultural and Life Sciences, The University of Tokyo, Tokyo, Japan.
| | | | | |
Collapse
|
12
|
Faísca PFN, Telo da Gama MM, Nunes A. The Gō model revisited: Native structure and the geometric coupling between local and long-range contacts. Proteins 2005; 60:712-22. [PMID: 16021621 DOI: 10.1002/prot.20521] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Monte Carlo simulations show that long-range interactions play a major role in determining the folding rates of 48-mer three-dimensional lattice polymers modeled by the Gō potential. For three target structures with different native geometries we found a sharp increase in the folding time when the relative contribution of the long-range interactions to the native state's energy is decreased from approximately 50% towards zero. However, the dispersion of the simulated folding times is strongly dependent on native geometry and Gō polymers folding to one of the target structures exhibits folding times spanning three orders of magnitude. We have also found that, depending on the target geometry, a strong geometric coupling may exist between local and long-range contacts, which means that, when this coupling exists, the formation of long-range contacts is forced by the previous formation of local contacts. The absence of a strong geometric coupling results in a kinetics that is more sensitive to the interaction energy parameters; in this case, the formation of local contacts is not capable of promoting the establishment of long-range ones when the latter are strongly penalized energetically and this results in longer folding times.
Collapse
Affiliation(s)
- Patrícia F N Faísca
- Centro de Física Teórica e Computacional da Universidade de Lisboa, Lisboa Codex, Portugal.
| | | | | |
Collapse
|
13
|
Kinjo AR, Horimoto K, Nishikawa K. Predicting absolute contact numbers of native protein structure from amino acid sequence. Proteins 2004; 58:158-65. [PMID: 15523668 DOI: 10.1002/prot.20300] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
14
|
Faisca PFN, Telo Da Gama MM, Ball RC. Folding and form: insights from lattice simulations. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2004; 69:051917. [PMID: 15244857 DOI: 10.1103/physreve.69.051917] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/16/2003] [Revised: 02/05/2004] [Indexed: 05/24/2023]
Abstract
Monte Carlo simulations of a Miyazawa-Jernigan lattice-polymer model indicate that, depending on the native structure's geometry, the model exhibits two broad classes of folding mechanisms for two-state folders. Folding to native structures of low contact order is driven by backbone distance and is characterized by a progressive accumulation of structure towards the native fold. By contrast, folding to high contact order targets is dominated by intermediate stage contacts not present in the native fold, yielding a more cooperative folding process.
Collapse
Affiliation(s)
- P F N Faisca
- CFTC, Avenida Prof. Gama Pinto 2, 1649-003 Lisboa Codex, Portugal.
| | | | | |
Collapse
|
15
|
Kabakçioglu A, Kanter I, Vendruscolo M, Domany E. Statistical properties of contact vectors. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2002; 65:041904. [PMID: 12005870 DOI: 10.1103/physreve.65.041904] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/01/2001] [Indexed: 05/23/2023]
Abstract
We study the statistical properties of contact vectors, a construct to characterize a protein's structure. The contact vector of an N-residue protein is a list of N integers n(i), representing the number of residues in contact with residue i. We study analytically (at mean-field level) and numerically the amount of structural information contained in a contact vector. Analytical calculations reveal that a large variance in the contact numbers reduces the degeneracy of the mapping between contact vectors and structures. Exact enumeration for lengths up to N=16 on the three-dimensional cubic lattice indicates that the growth rate of number of contact vectors as a function of N is only 3% less than that for contact maps. In particular, for compact structures we present numerical evidence that, practically, each contact vector corresponds to only a handful of structures. We discuss how this information can be used for better structure prediction.
Collapse
Affiliation(s)
- A Kabakçioglu
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot 76100, Israel
| | | | | | | |
Collapse
|
16
|
|
17
|
Chelvanayagam G, Knecht L, Jenny T, Benner SA, Gonnet GH. A combinatorial distance-constraint approach to predicting protein tertiary models from known secondary structure. FOLDING & DESIGN 1998; 3:149-60. [PMID: 9562545 DOI: 10.1016/s1359-0278(98)00023-6] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
BACKGROUND Distance geometry methods allow protein structures to be constructed using a large number of distance constraints, which can be elucidated by experimental techniques such as NMR. New methods for gleaning tertiary structural information from multiple sequence alignments make it possible for distance constraints to be predicted from sequence information alone. The basic distance geometry method can thus be applied using these empirically derived distance constraints. Such an approach, which incorporates a novel combinatoric procedure, is reported here. RESULTS Given the correct sheet topology and disulfide formations, the fully automated procedure is generally able to construct native-like Calpha models for eight small beta-protein structures. When the sheet topology was unknown but disulfide connectivities were included, all sheet topologies were explored by the combinatorial procedure. Using a simple geometric evaluation scheme, models with the correct sheet topology were ranked first in four of the eight example cases, second in three examples and third in one example. If neither the sheet topology nor the disulfide connectivities were given a priori, all combinations of sheet topologies and disulfides were explored by the combinatorial procedure. The evaluation scheme ranked the correct topology within the top five folds for half the example cases. CONCLUSIONS The combinatorial procedure is a useful technique for identifying a limited number of low-resolution candidate folds for small, disulfide-rich, beta-protein structures. Better results are obtained, however, if correct disulfide connectivities are known in advance. Combinatorial distance constraints can be applied whenever there are a sufficiently small number of finite connectivities.
Collapse
Affiliation(s)
- G Chelvanayagam
- Computational Chemistry Group, Universitätstrasse 16, ETH Zentrum, Zürich, CH 8092, Switzerland.
| | | | | | | | | |
Collapse
|
18
|
Abstract
We introduce an energy function for contact maps of proteins. In addition to the standard term, that takes into account pair-wise interactions between amino acids, our potential contains a new hydrophobic energy term. Parameters of the energy function were obtained from a statistical analysis of the contact maps of known structures. The quality of our energy function was tested extensively in a variety of ways. In particular, fold recognition experiments revealed that for a fixed sequence the native map is identified correctly in an overwhelming majority of the cases tested. We succeeded in identifying the structure of some proteins that are known to pose difficulties for such tests (BPTI, spectrin, and cro-protein). In addition, many known pairs of homologous structures were correctly identified, even when the two sequences had relatively low sequence homology. We also introduced a dynamic Monte Carlo procedure in the space of contact maps, taking topological and polymeric constraints into account by restrictive dynamic rules. Various aspects of protein dynamics, including high-temperature melting and refolding, were simulated. Perspectives of application of the energy function and the method for structure checking and fold prediction are discussed.
Collapse
Affiliation(s)
- L Mirny
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel.
| | | |
Collapse
|
19
|
Lund O, Hansen J, Brunak S, Bohr J. Relationship between protein structure and geometrical constraints. Protein Sci 1996; 5:2217-25. [PMID: 8931140 PMCID: PMC2143282 DOI: 10.1002/pro.5560051108] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/03/2023]
Abstract
We evaluate to what extent the structure of proteins can be deduced from incomplete knowledge of disulfide bridges, surface assignments, secondary structure assignments, and additional distance constraints. A cost function taking such constraints into account was used to obtain protein structures using a simple minimization algorithm. For small proteins, the approximate structure could be obtained using one additional distance constraint for each amino acid in the protein. We also studied the effect of using predicted secondary structure and surface assignments. The constraints used in this approach typically may be obtained from low-resolution experimental data. When using a cost function based on distances, half of the resulting structures will be mirrored, because the resulting structure and its mirror image will have the same cost. The secondary structure assignments were therefore divided into chirality constraints and distance constraints. Here we report that the problem of mirrored structures, in some cases, can be solved by using a chirality term in the cost function.
Collapse
Affiliation(s)
- O Lund
- Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.
| | | | | | | |
Collapse
|
20
|
Kikuchi T. Inter-C? atomic potentials derived from the statistics of average interresidue distances in proteins: Application to bovine pancreatic trypsin inhibitor. J Comput Chem 1996. [DOI: 10.1002/(sici)1096-987x(19960130)17:2<226::aid-jcc9>3.0.co;2-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
|
21
|
Abstract
A protein sequence with at lease 40% identity to a known structure can now be modelled automatically, with an accuracy approaching that o fa low-resolution X-ray structure or a medium-resolution nuclear magnetic resonance structure. In general, these models have goods stereochemistry and an overall structural accuracy that is as high as the similarity between the template and the actual structure being predicted. As a result, the number of sequences that can be modelled is an order of magnitude larger then the number of experimentally determined protein structures. In addition, evaluation techniques are available that can estimated errors in different regions of the model. Thus, the number of applications where homology modelling is proving useful is growing rapidly.
Collapse
Affiliation(s)
- A Sali
- The Rockefeller University, New York, USA
| |
Collapse
|
22
|
Eisenhaber F, Persson B, Argos P. Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit Rev Biochem Mol Biol 1995; 30:1-94. [PMID: 7587278 DOI: 10.3109/10409239509085139] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Abstract
This review attempts a critical stock-taking of the current state of the science aimed at predicting structural features of proteins from their amino acid sequences. At the primary structure level, methods are considered for detection of remotely related sequences and for recognizing amino acid patterns to predict posttranslational modifications and binding sites. The techniques involving secondary structural features include prediction of secondary structure, membrane-spanning regions, and secondary structural class. At the tertiary structural level, methods for threading a sequence into a mainchain fold, homology modeling and assigning sequences to protein families with similar folds are discussed. A literature analysis suggests that, to date, threading techniques are not able to show their superiority over sequence pattern recognition methods. Recent progress in the state of ab initio structure calculation is reviewed in detail. The analysis shows that many structural features can be predicted from the amino acid sequence much better than just a few years ago and with attendant utility in experimental research. Best prediction can be achieved for new protein sequences that can be assigned to well-studied protein families. For single sequences without homologues, the folding problem has not yet been solved.
Collapse
Affiliation(s)
- F Eisenhaber
- Institut für Biochemie der Charité, Medizinische Fakultät, Humboldt-Universität zu Berlin, Fed. Rep. Germany
| | | | | |
Collapse
|
23
|
Abstract
Although the 'structure from sequence' prediction problem remains fundamentally unsolved, new and promising methods in one, two and three dimensions have reopened the field. Significantly improved one-dimensional prediction of secondary structure from multiple sequence alignments is now in routine use. In the two-dimensional approach, inter-residue contacts can be detected by analysis of correlated mutations, albeit with low accuracy. Finally, three-dimensional methods, in which pseudopotentials or information values are derived from the databases, are proving their value for distinguishing between correct and incorrect models.
Collapse
Affiliation(s)
- B Rost
- European Molecular Biology Laboratory, Heidelberg, Germany
| | | |
Collapse
|
24
|
Abstract
Through the comprehensive analysis of protein sequence and structural data, relationships can be established that suggest, with varying degrees of success, structural models for a protein for which only the sequence is known. The certainty with which a model can be proposed depends on the degree of similarity between the sequence of unknown structure and the sequence of a protein of known structure. Methods are being developed to detect remote similarities between sequences or structures, and to predict protein structure based on such small levels of similarity.
Collapse
Affiliation(s)
- W R Taylor
- Laboratory of Mathematical Biology, National Institute for Medical Research, London, UK
| |
Collapse
|
25
|
Abstract
The maintenance of protein function and structure constrains the evolution of amino acid sequences. This fact can be exploited to interpret correlated mutations observed in a sequence family as an indication of probable physical contact in three dimensions. Here we present a simple and general method to analyze correlations in mutational behavior between different positions in a multiple sequence alignment. We then use these correlations to predict contact maps for each of 11 protein families and compare the result with the contacts determined by crystallography. For the most strongly correlated residue pairs predicted to be in contact, the prediction accuracy ranges from 37 to 68% and the improvement ratio relative to a random prediction from 1.4 to 5.1. Predicted contact maps can be used as input for the calculation of protein tertiary structure, either from sequence information alone or in combination with experimental information.
Collapse
Affiliation(s)
- U Göbel
- Protein Design Group, European Molecular Biology Laboratory, Heidelberg, Germany
| | | | | | | |
Collapse
|
26
|
Abstract
Knowledge, both from the three-dimensional structures of homologous proteins and from the general analysis of protein structure, is of value in modeling a protein of known sequence but unknown structure. While many models are still constructed at least in part by manual methods on graphics devices, automated procedures have come into greater use. These procedures include those that assemble fragments of structure from other known structures and those that derive coordinates for the model from the satisfaction of restraints placed on atomic positions.
Collapse
Affiliation(s)
- M S Johnson
- Imperial Cancer Research Fund, Department of Crystallography, Birkbeck College, London
| | | | | | | |
Collapse
|