1
Pereira J, Simpkin AJ, Hartmann MD, Rigden DJ, Keegan RM, Lupas AN. High-accuracy protein structure prediction in CASP14. Proteins 2021; 89:1687-1699. [PMID: 34218458 DOI: 10.1002/prot.26171] [Citation(s) in RCA: 161] [Impact Index Per Article: 53.7] [Received: 05/05/2021] [Revised: 06/16/2021] [Accepted: 06/23/2021] [Indexed: 12/25/2022]
Abstract
The application of state-of-the-art deep-learning approaches to the protein modeling problem has expanded the "high-accuracy" category in CASP14 to encompass all targets. Building on the metrics used for high-accuracy assessment in previous CASPs, we evaluated the performance of all groups that submitted models for at least 10 targets across all difficulty classes, and judged the usefulness of those produced by AlphaFold2 (AF2) as molecular replacement search models with AMPLE. Driven by the qualitative diversity of the targets submitted to CASP, we also introduce DipDiff as a new measure for the improvement in backbone geometry provided by a model versus available templates. Although a large leap in high-accuracy is seen due to AF2, the second-best method in CASP14 out-performed the best in CASP13, illustrating the role of community-based benchmarking in the development and evolution of the protein structure prediction field.
Affiliation(s)
- Joana Pereira
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Adam J Simpkin
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Marcus D Hartmann
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Daniel J Rigden
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M Keegan
  - Department of Scientific Computing, Science and Technologies Facilities Council, UK Research and Innovation, Didcot, Oxfordshire, UK
- Andrei N Lupas
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
2
Baldassarre F, Menéndez Hurtado D, Elofsson A, Azizpour H. GraphQA: protein model quality assessment using graph convolutional networks. Bioinformatics 2021; 37:360-366. [PMID: 32780838 PMCID: PMC8058777 DOI: 10.1093/bioinformatics/btaa714] [Citation(s) in RCA: 37] [Impact Index Per Article: 12.3] [Received: 01/18/2020] [Revised: 07/03/2020] [Accepted: 08/05/2020] [Indexed: 11/25/2022] Open
Abstract
Motivation: Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein’s structure can be time-consuming, prohibitively expensive and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance and computational efficiency.
Results: GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated.
Availability and implementation: PyTorch implementation, datasets, experiments and link to an evaluation server are available through this GitHub repository: github.com/baldassarreFe/graphqa.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Federico Baldassarre
  - Division of Robotics, Perception and Learning (RPL), KTH – Royal Institute of Technology, 10044 Stockholm, Sweden
- David Menéndez Hurtado
  - Department of Intelligent Systems, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
  - Department of Biochemistry and Biophysics, School of Electrical Engineering and Computer Science (EECS), Stockholm University, 10691 Stockholm, Sweden
- Arne Elofsson
  - Department of Intelligent Systems, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden
  - Department of Biochemistry and Biophysics, School of Electrical Engineering and Computer Science (EECS), Stockholm University, 10691 Stockholm, Sweden
- Hossein Azizpour
  - Division of Robotics, Perception and Learning (RPL), KTH – Royal Institute of Technology, 10044 Stockholm, Sweden
  - To whom correspondence should be addressed.
3
Abad-Zapatero C. Ligand efficiency indices for effective drug discovery: a unifying vector formulation. Expert Opin Drug Discov 2021; 16:763-775. [PMID: 33522838 DOI: 10.1080/17460441.2021.1884065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
INTRODUCTION: The area of ligand efficiency indices (LEIs) in drug discovery has developed significantly since the initial publications nearly 20 years ago. A large number of different LEIs have been defined and applied with certain degrees of success and acceptance in the community. An overall view emphasizing more the common elements than the differences is needed.
AREAS COVERED: In this review, the author accentuates the numerical and algebraic relationships among the different LEIs and proposes the notion of 'ligand efficiency index' (LEI) as a vector variable comprising two interrelated components that provide 'direction' and 'distance' along the drug discovery process. The same concept had been suggested before relating to the graphical representation of the content of Structure-Activity Databases (SAR-Databases).
EXPERT OPINION: The extension of the concept of ligand efficiency from a scalar to a vector will help to unify the different formulations by emphasizing the relationship among the different variables. It should also provide an algebraically robust framework to critically assess the value of LEIs, and to incorporate them routinely in various workflows and protocols. Only cautious and rigorous testing by the community could provide a definitive proof of their possible value as reliable optimization variables in drug discovery.
Affiliation(s)
- Celerino Abad-Zapatero
  - Department of Pharmaceutical Sciences, Institute of Tuberculosis Research, Center for Biomolecular Sciences, University of Illinois at Chicago, Chicago, Illinois
4
Abstract
Codon usage depends on mutation bias, tRNA-mediated selection, and the need for high efficiency and accuracy in translation. One codon in a synonymous codon family is often strongly over-used, especially in highly expressed genes, which often leads to a high dN/dS ratio because dS is very small. Many different codon usage indices have been proposed to measure codon usage and codon adaptation. Sense codons can be misread by release factors, and stop codons by tRNAs, which also contributes to codon usage in rare cases. This chapter outlines the conceptual framework of codon evolution, illustrates codon-specific and gene-specific codon usage indices, and presents their applications. A new index of codon adaptation that accounts for background mutation bias (Index of Translation Elongation) is presented and contrasted with the codon adaptation index (CAI), which does not consider background mutation bias. Both indices are used to re-analyze data from a recent paper claiming that translation elongation efficiency matters little in protein production. The reanalysis disproves the claim.
5
A Critical Note on Symmetry Contact Artifacts and the Evaluation of the Quality of Homology Models. Symmetry (Basel) 2018. [DOI: 10.3390/sym10010025] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022] Open
6
Huwe PJ, Xu Q, Shapovalov MV, Modi V, Andrake MD, Dunbrack RL. Biological function derived from predicted structures in CASP11. Proteins 2016; 84 Suppl 1:370-91. [PMID: 27181425 DOI: 10.1002/prot.24997] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/11/2015] [Revised: 01/10/2016] [Accepted: 01/18/2016] [Indexed: 12/26/2022]
Abstract
In CASP11, the organizers sought to bring the biological inferences from predicted structures to the fore. To accomplish this, we assessed the models for their ability to perform quantifiable tasks related to biological function. First, for 10 targets that were probable homodimers, we measured the accuracy of docking the models into homodimers as a function of GDT-TS of the monomers, which produced characteristic L-shaped plots. At low GDT-TS, none of the models could be docked correctly as homodimers. Above GDT-TS of ∼60%, some models formed correct homodimers in one of the largest docked clusters, while many other models at the same values of GDT-TS did not. Docking was more successful when many of the templates shared the same homodimer. Second, we docked a ligand from an experimental structure into each of the models of one of the targets. Docking to the models with two different programs produced poor ligand RMSDs with the experimental structure. Measures that evaluated similarity of contacts were reasonable for some of the models, although there was not a significant correlation with model accuracy. Finally, we assessed whether models would be useful in predicting the phenotypes of missense mutations in three human targets by comparing features calculated from the models with those calculated from the experimental structures. The models were successful in reproducing accessible surface areas but there was little correlation of model accuracy with calculation of FoldX evaluation of the change in free energy between the wild-type and the mutant.
Affiliation(s)
- Peter J Huwe
  - Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Qifang Xu
  - Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Vivek Modi
  - Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
- Mark D Andrake
  - Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
7
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI. Proteins 2016; 84 Suppl 1:4-14. [PMID: 27171127 DOI: 10.1002/prot.25064] [Citation(s) in RCA: 148] [Impact Index Per Article: 18.5] [Received: 02/29/2016] [Revised: 04/29/2016] [Accepted: 05/08/2016] [Indexed: 12/15/2022]
Abstract
Modeling of protein structure from amino acid sequence now plays a major role in structural biology. Here we report new developments and progress from the CASP11 community experiment, assessing the state of the art in structure modeling. Notable points include the following: (1) New methods for predicting three dimensional contacts resulted in a few spectacular template free models in this CASP, whereas models based on sequence homology to proteins with experimental structure continue to be the most accurate. (2) Refinement of initial protein models, primarily using molecular dynamics related approaches, has now advanced to the point where the best methods can consistently (though slightly) improve nearly all models. (3) The use of relatively sparse NMR constraints dramatically improves the accuracy of models, and another type of sparse data, chemical crosslinking, introduced in this CASP, also shows promise for producing better models. (4) A new emphasis on modeling protein complexes, in collaboration with CAPRI, has produced interesting results, but also shows the need for more focus on this area. (5) Methods for estimating the accuracy of models have advanced to the point where they are of considerable practical use. (6) A first assessment demonstrates that models can sometimes successfully address biological questions that motivate experimental structure determination. (7) There is continuing progress in accuracy of modeling regions of structure not directly available by comparative modeling, while there is marginal or no progress in some other areas.
Affiliation(s)
- John Moult
  - Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, 20850
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, Davis, California, 95616
- Torsten Schwede
  - Biozentrum & SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Anna Tramontano
  - Department of Physics and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, Rome, Italy
8
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
9
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round X. Proteins 2014; 82 Suppl 2:1-6. [PMID: 24344053 PMCID: PMC4394854 DOI: 10.1002/prot.24452] [Citation(s) in RCA: 312] [Impact Index Per Article: 31.2] [Received: 10/16/2013] [Accepted: 10/21/2013] [Indexed: 12/28/2022]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Affiliation(s)
- John Moult
  - Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland 20850
- Torsten Schwede
  - University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Tramontano
  - Department of Physics and Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza University of Rome, 00185 Rome, Italy
10
Manning T, Sleator RD, Walsh P. Biologically inspired intelligent decision making: a commentary on the use of artificial neural networks in bioinformatics. Bioengineered 2013; 5:80-95. [PMID: 24335433 PMCID: PMC4049912 DOI: 10.4161/bioe.26997] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023] Open
Abstract
Artificial neural networks (ANNs) are a class of powerful machine learning models for classification and function approximation which have analogs in nature. An ANN learns to map stimuli to responses through repeated evaluation of exemplars of the mapping. This learning approach results in networks which are recognized for their noise tolerance and ability to generalize meaningful responses for novel stimuli. It is these properties of ANNs which make them appealing for applications to bioinformatics problems where interpretation of data may not always be obvious, and where the domain knowledge required for deductive techniques is incomplete or can cause a combinatorial explosion of rules. In this paper, we provide an introduction to artificial neural network theory and review some interesting recent applications to bioinformatics problems.
Affiliation(s)
- Timmy Manning
  - Department of Computer Science; Cork Institute of Technology; Cork, Ireland
- Roy D Sleator
  - Department of Biological Sciences; Cork Institute of Technology; Cork, Ireland
- Paul Walsh
  - NSilico Ltd; Rubicon Innovation Centre; Cork, Ireland
11
Krupa P, Sieradzan AK, Rackovsky S, Baranowski M, Ołdziej S, Scheraga HA, Liwo A, Czaplewski C. Improvement of the treatment of loop structures in the UNRES force field by inclusion of coupling between backbone- and side-chain-local conformational states. J Chem Theory Comput 2013; 9. [PMID: 24273465 DOI: 10.1021/ct4004977] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
The UNited RESidue (UNRES) coarse-grained model of polypeptide chains, developed in our laboratory, enables us to carry out millisecond-scale molecular-dynamics simulations of large proteins effectively. It performs well in ab initio predictions of protein structure, as demonstrated in the last Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP10). However, the resolution of the simulated structure is too coarse, especially in loop regions, which results from insufficient specificity of the model of local interactions. To improve the representation of local interactions, in this work we introduced new side-chain-backbone correlation potentials, derived from a statistical analysis of loop regions of 4585 proteins. To obtain sufficient statistics, we reduced the set of amino-acid-residue types to five groups, derived in our earlier work on structurally optimized reduced alphabets, based on a statistical analysis of the properties of amino-acid structures. The new correlation potentials are expressed as one-dimensional Fourier series in the virtual-bond-dihedral angles involving side-chain centroids. The weight of these new terms was determined by a trial-and-error method, in which Multiplexed Replica Exchange Molecular Dynamics (MREMD) simulations were run on selected test proteins. The best average root-mean-square deviations (RMSDs) of the calculated structures from the experimental structures below the folding-transition temperatures were obtained with the weight of the new side-chain-backbone correlation potentials equal to 0.57. The resulting conformational ensembles were analyzed in detail by using the Weighted Histogram Analysis Method (WHAM) and Ward's minimum-variance clustering. 
This analysis showed that the RMSDs from the experimental structures dropped by 0.5 Å on average, compared to simulations without the new terms, and the deviation of individual residues in the loop region of the computed structures from their counterparts in the experimental structures (after optimum superposition of the calculated and experimental structure) decreased by up to 8 Å. Consequently, the new terms improve the representation of local structure.
Affiliation(s)
- Paweł Krupa
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A.
- Adam K Sieradzan
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
- S Rackovsky
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A.
  - Dept. of Pharmacology and Systems Therapeutics, The Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, U.S.A.
- Maciej Baranowski
  - Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Kładki 24, 80-922 Gdańsk, Poland
- Stanisław Ołdziej
  - Intercollegiate Faculty of Biotechnology, University of Gdańsk and Medical University of Gdańsk, Kładki 24, 80-922 Gdańsk, Poland
- Harold A Scheraga
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A.
- Adam Liwo
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
- Cezary Czaplewski
  - Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-952 Gdańsk, Poland
12
Soong TT, Hwang MJ, Chen CM. Discovery of Recurrent Structural Motifs for Approximating Three-Dimensional Protein Structures. J CHIN CHEM SOC-TAIP 2013. [DOI: 10.1002/jccs.200400164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
13
Dhingra P, Jayaram B. A homology/ab initio hybrid algorithm for sampling near-native protein conformations. J Comput Chem 2013; 34:1925-36. [PMID: 23728619 DOI: 10.1002/jcc.23339] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/05/2012] [Revised: 03/09/2013] [Accepted: 04/21/2013] [Indexed: 12/19/2022]
Abstract
One of the major challenges for protein tertiary structure prediction strategies is the quality of conformational sampling algorithms, which can effectively and readily search the protein fold space to generate near-native conformations. In an effort to advance the field by making the best use of available homology as well as fold recognition approaches along with ab initio folding methods, we have developed Bhageerath-H Strgen, a homology/ab initio hybrid algorithm for protein conformational sampling. The methodology is tested on the benchmark CASP9 dataset of 116 targets. In 93% of the cases, a structure with TM-score ≥ 0.5 is generated in the pool of decoys. Further, the performance of Bhageerath-H Strgen was seen to be efficient in comparison with different decoy generation methods. The algorithm is web enabled as Bhageerath-H Strgen web tool which is made freely accessible for protein decoy generation (http://www.scfbio-iitd.res.in/software/Bhageerath-HStrgen1.jsp).
Affiliation(s)
- Priyanka Dhingra
  - Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
14
Chitale M, Khan IK, Kihara D. In-depth performance evaluation of PFP and ESG sequence-based function prediction methods in CAFA 2011 experiment. BMC Bioinformatics 2013; 14 Suppl 3:S2. [PMID: 23514353 PMCID: PMC3584938 DOI: 10.1186/1471-2105-14-s3-s2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND: Many Automatic Function Prediction (AFP) methods were developed to cope with an increasing growth of the number of gene sequences that are available from high throughput sequencing experiments. To support the development of AFP methods, it is essential to have community wide experiments for evaluating performance of existing AFP methods. Critical Assessment of Function Annotation (CAFA) is one such community experiment. The meeting of CAFA was held as a Special Interest Group (SIG) meeting at the Intelligent Systems in Molecular Biology (ISMB) conference in 2011. Here, we perform a detailed analysis of two sequence-based function prediction methods, PFP and ESG, which were developed in our lab, using the predictions submitted to CAFA.
RESULTS: We evaluate PFP and ESG using four different measures in comparison with BLAST, Prior, and GOtcha. In addition to the predictions submitted to CAFA, we further investigate the performance of a different scoring function to rank order predictions by PFP, as well as PFP/ESG predictions enriched with Priors, which simply add frequently occurring Gene Ontology terms as a part of the predictions. Prediction accuracies of each method were also evaluated separately for different functional categories. Successful and unsuccessful predictions by PFP and ESG are also discussed in comparison with BLAST.
CONCLUSION: The in-depth analysis discussed here will complement the overall assessment by the CAFA organizers. Since PFP and ESG are based on sequence database search results, our analyses are not only useful for PFP and ESG users but will also shed light on the relationship of the sequence similarity space and functions that can be inferred from the sequences.
Affiliation(s)
- Meghana Chitale
  - Department of Computer Science, Purdue University, 305 N. University Street, West Lafayette, Indiana 47907, USA
15
Bhattacharya D, Cheng J. 3Drefine: consistent protein structure refinement by optimizing hydrogen bonding network and atomic-level energy minimization. Proteins 2013; 81:119-31. [PMID: 22927229 PMCID: PMC3634918 DOI: 10.1002/prot.24167] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Received: 05/02/2012] [Revised: 07/26/2012] [Accepted: 08/17/2012] [Indexed: 12/27/2022]
Abstract
One of the major limitations of computational protein structure prediction is the deviation of predicted models from their experimentally derived true, native structures. The limitations often hinder the possibility of applying computational protein structure prediction methods in biochemical assignment and drug design that are very sensitive to structural details. Refinement of these low-resolution predicted models to high-resolution structures close to the native state, however, has proven to be extremely challenging. Thus, protein structure refinement remains a largely unsolved problem. Critical assessment of techniques for protein structure prediction (CASP) specifically indicated that most predictors participating in the refinement category still did not consistently improve model quality. Here, we propose a two-step refinement protocol, called 3Drefine, to consistently bring the initial model closer to the native structure. The first step is based on optimization of the hydrogen bonding (HB) network and the second step applies atomic-level energy minimization on the optimized model using a composite physics and knowledge-based force field. The approach has been evaluated on the CASP benchmark data and it exhibits consistent improvement over the initial structure in both global and local structural quality measures. The 3Drefine method is also computationally inexpensive, consuming only a few minutes of CPU time to refine a protein of typical length (300 residues). The 3Drefine web server is freely available at http://sysbio.rnet.missouri.edu/3Drefine/.
Affiliation(s)
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
  - Informatics Institute, University of Missouri, Columbia, MO 65211, USA
  - Bond Life Science Center, University of Missouri, Columbia, MO 65211, USA
16
Corbeil CR, Williams CI, Labute P. Variability in docking success rates due to dataset preparation. J Comput Aided Mol Des 2012; 26:775-86. [PMID: 22566074 PMCID: PMC3397132 DOI: 10.1007/s10822-012-9570-1] [Citation(s) in RCA: 278] [Impact Index Per Article: 23.2] [Received: 12/23/2011] [Accepted: 04/03/2012] [Indexed: 01/22/2023]
Abstract
The results of cognate docking with the prepared Astex dataset provided by the organizers of the "Docking and Scoring: A Review of Docking Programs" session at the 241st ACS national meeting are presented. The MOE software with the newly developed GBVI/WSA dG scoring function is used throughout the study. For 80% of the Astex targets, the MOE docker produces a top-scoring pose within 2 Å of the X-ray structure. For 91% of the targets, a pose within 2 Å of the X-ray structure is produced in the top 30 poses. Docking failures, defined as cases where the top-scoring pose is more than 2 Å from the experimental structure, are shown to be largely due to the absence of bound waters in the source dataset, highlighting the need to include this and other crucial information in future standardized sets. Docking success is shown to depend heavily on data preparation. A "dataset preparation" error of 0.5 kcal/mol is shown to cause fluctuations of over 20% in docking success rates.
Affiliation(s)
- Christopher R Corbeil
  - Chemical Computing Group, Suite 910, 1010 Sherbrooke Street West, Montreal, QC, H3A 2R7, Canada
17
Moult J, Fidelis K, Kryshtafovych A, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round IX. Proteins 2011; 79 Suppl 10:1-5. [PMID: 21997831 DOI: 10.1002/prot.23200] [Citation(s) in RCA: 177] [Impact Index Per Article: 13.6] [Received: 09/07/2011] [Accepted: 09/12/2011] [Indexed: 12/16/2022]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the ninth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Methods for modeling protein structure continue to advance, although at a more modest pace than in the early CASP experiments. CASP developments of note are indications of improvement in model accuracy for some classes of target, an improved ability to choose the most accurate of a set of generated models, and evidence of improvement in accuracy for short "new fold" models. In addition, a new analysis of regions of models not derivable from the most obvious template structure has revealed better performance than expected.
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, MD 20850, USA.
|
18
|
Cheon S, Liang F. Folding small proteins via annealing stochastic approximation Monte Carlo. Biosystems 2011; 105:243-9. [DOI: 10.1016/j.biosystems.2011.05.015] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/10/2010] [Revised: 05/22/2011] [Accepted: 05/26/2011] [Indexed: 11/26/2022]
|
19
|
Esque J, Oguey C, de Brevern AG. Comparative Analysis of Threshold and Tessellation Methods for Determining Protein Contacts. J Chem Inf Model 2011; 51:493-507. [DOI: 10.1021/ci100195t] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
Affiliation(s)
- Jeremy Esque
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin, 95302 Cergy-Pontoise, France
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot, Paris 7, INTS, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
- Christophe Oguey
- LPTM, CNRS UMR 8089, Université de Cergy Pontoise, 2 av. Adolphe Chauvin, 95302 Cergy-Pontoise, France
- Alexandre G. de Brevern
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot, Paris 7, INTS, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France
|
20
|
Joo H, Qu X, Swanson R, McCallum CM, Tsai J. Fine grained sampling of residue characteristics using molecular dynamics simulation. Comput Biol Chem 2010; 34:172-83. [PMID: 20621565 DOI: 10.1016/j.compbiolchem.2010.06.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/23/2010] [Revised: 06/11/2010] [Accepted: 06/11/2010] [Indexed: 11/19/2022]
Abstract
In a fine-grained computational analysis of protein structure, we investigated the relationships between a residue's backbone conformations and its side-chain packing as well as conformations. To produce continuous distributions in high resolution, we ran molecular dynamics simulations over a set of protein folds (dynameome). In effect, the dynameome dataset samples not only the states well represented in the PDB but also the known states that are not well represented in the structural database. In our analysis, we characterized the mutual influence among the backbone phi,psi angles with the first side-chain torsion angles (chi(1)) and the volumes occupied by the side-chains. The dependencies of these relationships on side-chain environment and amino acids are further explored. We found that residue volumes exhibit dependency on backbone secondary structure conformation: side-chains pack more densely in extended beta-sheet than in alpha-helical structures. As expected, residue volumes on the protein surface were larger than those in the interior. The first side-chain torsion angles are found to be dependent on the backbone conformations in agreement with previous studies, but the dynameome dataset provides higher resolution of rotamer preferences based on the backbone conformation. All three gauche(-), gauche(+), and trans rotamers show different patterns of phi,psi dependency, and variations in chi(1) value are skewed from their canonical values to relieve the steric strains. By demonstrating the utility of dynameomic modeling on the native state ensemble, this study reveals details of the interplay among backbone conformations, residue volumes and side-chain conformations.
Affiliation(s)
- Hyun Joo
- Chemistry Department, University of the Pacific, 3601 Pacific Avenue, Stockton, CA 95211, United States.
|
21
|
Tramontano A. Comparative modelling techniques: where are we? Comp Funct Genomics 2010; 4:402-5. [PMID: 18629085 PMCID: PMC2447371 DOI: 10.1002/cfg.306] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 06/02/2003] [Revised: 06/02/2003] [Accepted: 06/03/2003] [Indexed: 11/21/2022] Open
Abstract
The enormous increase in data availability brought about by genomic projects is paralleled by an equally unprecedented increase in the expectations for new medical, pharmacological, environmental and biotechnological discoveries. Whether or not we will be able to meet (at least partially) these expectations will depend on how well we will be able to interpret the data and translate the mono-dimensional information encrypted in genomes into a detailed understanding of its biological meaning at the phenotypic level. The process is far from being trivial, and the obstacles along the road are formidable: even the problem of identifying coding regions in eukaryotic genomes is not completely solved. Far more complex is identification of the function of the encoded proteins, and this will probably represent the most challenging problem for the next generations of scientists.
Affiliation(s)
- Anna Tramontano
- Department of Biochemical Sciences A. Rossi Fanelli, University of Rome La Sapienza, Rome 00185, Italy.
|
22
|
Kanou K, Hirata T, Iwadate M, Terashi G, Umeyama H, Takeda-Shitaka M. HUMAN FAMSD-BASE: high quality protein structure model database for the human genome using the FAMSD homology modeling method. Chem Pharm Bull (Tokyo) 2010; 58:66-75. [PMID: 20045969 DOI: 10.1248/cpb.58.66] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
Almost all proteins express their biological functions through the structural conformation of their specific amino acid sequences. Therefore, acquiring the three-dimensional structures of proteins is very important to elucidate the role of a particular protein. We previously built a protein structure model database called RIKEN FAMSBASE (http://famshelp.gsc.riken.jp/famsbase/). The RIKEN FAMSBASE is a genome-wide protein structure model database that contains a large number of protein models from many organisms. The HUMAN FAMSBASE, which is one part of the RIKEN FAMSBASE, contains many protein models for human genes, which are significant in the pharmaceutical and medicinal fields. We have now updated the human protein modeling database, which consists of 242918 models constructed for 20743 human protein sequences with an improved modeling method called Full Automatic protein Modeling System Developed (FAMSD). The results of our benchmark test of the FAMSD method indicated that it has an excellent capability to pack amino acid side-chains with correct torsion angles in addition to the main-chain, while avoiding the formation of atom-atom collisions that are not found in experimental structures. This new protein structure model database for human genes, which is named HUMAN FAMSD-BASE, is open to the public as a component part of the RIKEN FAMSBASE at http://mammalia.gsc.riken.jp/human_famsd/. A significant improvement of the HUMAN FAMSD-BASE in comparison with the preceding HUMAN FAMSBASE was verified in the benchmark test of this paper. The HUMAN FAMSD-BASE will have an important impact on the progress of biological science.
|
23
|
Kanou K, Hirata T, Terashi G, Umeyama H, Takeda-Shitaka M. New protein structure model evaluation methods that include a side-chain consensus score for the protein modeling. Chem Pharm Bull (Tokyo) 2010; 58:180-90. [PMID: 20118576 DOI: 10.1248/cpb.58.180] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/22/2022]
Abstract
Selecting the best quality model from a set of predicted structures is one of the most important aspects of protein structure prediction. We have developed model quality assessment programs that select high quality models which account for both the Calpha backbone and side-chain atom positions. The new methods are based on the consensus method with consideration of the side-chain environment of a protein structure and the secondary structure agreement. This Side-chain Environment Consensus (SEC) method is compared with the conventional consensus method, 3D-Jury (Ginalski K. et al., Bioinformatics, 19, 1015-1018 (2003)), which takes into account only the Calpha backbone atoms of the protein model. As a result, it was found that the SEC method selects the models with more accurate positioning of the side-chain atoms than the 3D-Jury method. When the SEC method was used in combination with the 3D-Jury method (3DJ+SEC), models were selected with improved quality both in the Calpha backbone and side-chain atom positions. Moreover, the CIRCLE (CCL) method (Terashi G. et al., Proteins, 69 (Suppl. 8), 98-107 (2007)) based on the 3D-1D profile score has been shown to select the best possible models that are the closest to the native structures from candidate models. Accordingly, the 3DJ+SEC+CCL method, in which CIRCLE is used after reducing the number of candidates by the 3DJ+SEC consensus method, was found to be very effective in selecting high quality models. Thus, the best method (the 3DJ+SEC+CCL method) includes the consensus approaches of the Calpha backbone and the side-chains, the secondary structure agreement and the 3D-1D profile score which corresponds to the free energy-like score in the residues of the protein model. In short, new algorithms are introduced in protein structure evaluation methods that are based on a side-chain consensus score. Additionally, in order to apply the 3DJ+SEC+CCL method and indicate the usefulness of this method, a model of human Cabin1, a protein associated with p53 function and cancer, is created using various internet modeling and alignment servers.
Affiliation(s)
- Kazuhiko Kanou
- School of Pharmacy, Kitasato University, 5-9-1 Shirokane, Minato-ku, Tokyo 108-8641, Japan
|
24
|
Gao X, Xu J, Li SC, Li M. Predicting local quality of a sequence-structure alignment. J Bioinform Comput Biol 2009; 7:789-810. [PMID: 19785046 DOI: 10.1142/s0219720009004345] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 01/16/2009] [Revised: 04/06/2009] [Accepted: 04/07/2009] [Indexed: 11/18/2022]
Abstract
Although protein structure prediction has made great progress in recent years, a protein model derived from automated prediction methods is subject to various errors. As methods for structure prediction develop, a continuing problem is how to evaluate the quality of a protein model, especially to identify some well-predicted regions of the model, so that the structural biology community can benefit from the automated structure prediction. It is also important to identify badly-predicted regions in a model so that some refinement measurements can be applied to it. We present two complementary techniques, FragQA and PosQA, to accurately predict local quality of a sequence-structure (i.e. sequence-template) alignment generated by comparative modeling (i.e. homology modeling and threading). FragQA and PosQA predict local quality from two different perspectives. Different from existing methods, FragQA directly predicts cRMSD between a continuously aligned fragment determined by an alignment and the corresponding fragment in the native structure, while PosQA predicts the quality of an individual aligned position. Both FragQA and PosQA use an SVM (Support Vector Machine) regression method to perform prediction using similar information extracted from a single given alignment. Experimental results demonstrate that FragQA performs well on predicting local fragment quality, and PosQA outperforms two top-notch methods, ProQres and ProQprof. Our results indicate that (1) local quality can be predicted well; (2) local sequence evolutionary information (i.e. sequence similarity) is the major factor in predicting local quality; and (3) structural information such as solvent accessibility and secondary structure helps to improve the prediction performance.
Affiliation(s)
- Xin Gao
- David R. Cheriton School of Computer Science, University of Waterloo, 200 University Avenue West, Waterloo, Ontario, N2L 3G1, Canada.
|
25
|
Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A. Critical assessment of methods of protein structure prediction-Round VIII. Proteins 2009; 77 Suppl 9:1-4. [DOI: 10.1002/prot.22589] [Citation(s) in RCA: 156] [Impact Index Per Article: 10.4] [Indexed: 11/06/2022]
|
26
|
Fuchs A, Kirschner A, Frishman D. Prediction of helix-helix contacts and interacting helices in polytopic membrane proteins using neural networks. Proteins 2009; 74:857-71. [PMID: 18704938 DOI: 10.1002/prot.22194] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Indexed: 11/07/2022]
Abstract
Despite rapidly increasing numbers of available 3D structures, membrane proteins still account for less than 1% of all structures in the Protein Data Bank. Recent high-resolution structures indicate a clearly broader structural diversity of membrane proteins than initially anticipated, motivating the development of reliable structure prediction methods specifically tailored for this class of molecules. One important prediction target capturing all major aspects of a protein's 3D structure is its contact map. Our analysis shows that computational methods trained to predict residue contacts in globular proteins perform poorly when applied to membrane proteins. We have recently published a method to identify interacting alpha-helices in membrane proteins based on the analysis of coevolving residues in predicted transmembrane regions. Here, we present a substantially improved algorithm for the same problem, which uses a newly developed neural network approach to predict helix-helix contacts. In addition to the input features commonly used for contact prediction of soluble proteins, such as windowed residue profiles and residue distance in the sequence, our network also incorporates features that apply to membrane proteins only, such as residue position within the transmembrane segment and its orientation toward the lipophilic environment. The obtained neural network can predict contacts between residues in transmembrane segments with nearly 26% accuracy. It is therefore the first published contact predictor developed specifically for membrane proteins performing with equal accuracy to state-of-the-art contact predictors available for soluble proteins. The predicted helix-helix contacts were employed in a second step to identify interacting helices. For our dataset consisting of 62 membrane proteins of solved structure, we gained an accuracy of 78.1%. Because the reliable prediction of helix interaction patterns is an important step in the classification and prediction of membrane protein folds, our method will be a helpful tool in compiling a structural census of membrane proteins.
Affiliation(s)
- Angelika Fuchs
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85354 Freising, Germany
|
27
|
Swanson R, Vannucci M, Tsai JW. Information theory provides a comprehensive framework for the evaluation of protein structure predictions. Proteins 2009; 74:701-11. [PMID: 18704942 DOI: 10.1002/prot.22186] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Protein structure prediction has a number of important ad hoc similarity measures for evaluating predictions, but would benefit from a measure that is able to provide a common framework for a broad range of comparisons. Here we show that a mutual information-like measure can provide a comprehensive framework for evaluating protein structure prediction of all types. We discuss the concept of information, its application to secondary structure, and the obstacle to applying it to 3D structure. On the basis of the insights from the secondary structure case, we present an approach to work around the 3D difficulties, and develop a method to measure the mutual information provided by a 3D structure prediction. We integrate the evaluation of all types of protein structure prediction into a single framework, and compare the amount of information provided by various prediction methods, including secondary structure prediction. Within this broadened framework, the idea that structure is better preserved than sequence during evolution is evaluated quantitatively for the globin family. A nearly perfect sequence match in the globin family corresponds to about 300 bits of information, whereas a nearly perfect structural match for the same two proteins corresponds to about 2500 bits of information, where bits of information describes the probability of obtaining a match of similar closeness by chance. Mutual information provides both a theoretical basis for evaluating structure similarity and an explanatory surround for existing similarity measures.
Affiliation(s)
- Rosemarie Swanson
- Department of Biochemistry and Biophysics, Texas A&M University, 2128 TAMU, College Station, Texas 77843, USA.
|
28
|
Brunette TJ, Brock O. Guiding conformation space search with an all-atom energy potential. Proteins 2008; 73:958-72. [PMID: 18536015 DOI: 10.1002/prot.22123] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. To alleviate this problem, we present model-based search, a novel conformation space search method. Model-based search uses highly accurate information obtained during search to build an approximate, partial model of the energy landscape. Model-based search aggregates information in the model as it progresses, and in turn uses this information to guide exploration toward regions most likely to contain a near-optimal minimum. We validate our method by predicting the structure of 32 proteins, ranging in length from 49 to 213 amino acids. Our results demonstrate that model-based search is more effective at finding low-energy conformations in high-dimensional conformation spaces than existing search methods. The reduction in energy translates into structure predictions of increased accuracy.
Affiliation(s)
- T J Brunette
- Robotics and Biology Laboratory, Department of Computer Science, University of Massachusetts Amherst, Amherst, Massachusetts 01003-9264, USA
|
29
|
Swanson R, Kagiampakis I, Tsai JW. An information measure of the quality of protein secondary structure prediction. J Comput Biol 2008; 15:65-79. [PMID: 18199024 DOI: 10.1089/cmb.2007.0199] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022] Open
Abstract
We describe an information-theory-based measure of the quality of secondary structure prediction (RELINFO). RELINFO has a simple yet intuitive interpretation: it represents the factor by which secondary structure choice at a residue has been restricted by a prediction scheme. As an alternative interpretation of secondary structure prediction, RELINFO complements currently used methods by providing an information-based view as to why a prediction succeeds and fails. To demonstrate this score's capabilities, we applied RELINFO to an analysis of a large set of secondary structure predictions obtained from the first five rounds of the Critical Assessment of Structure Prediction (CASP) experiment. RELINFO is compared with two other common measures: percent correct (Q3) and secondary structure overlap (SOV). While the correlation between Q3 and RELINFO is approximately 0.85, RELINFO avoids certain disadvantages of Q3, including overestimating the quality of a prediction. The correlation between SOV and RELINFO is approximately 0.75. The valuable SOV measure unfortunately suffers from a saturation problem, and perhaps has unfairly given the general impression that secondary structure prediction has reached its limit since SOV hasn't improved much over the recent rounds of CASP. Although not a replacement for SOV, RELINFO has greater dispersion. Over the five rounds of CASP assessed here, RELINFO shows that prediction targets have been more difficult in successive CASP experiments, yet prediction quality has continued to improve measurably over each round. In terms of information, the secondary structure prediction quality has almost doubled from CASP1 to CASP5. Therefore, as a different perspective of accuracy, RELINFO can help to improve prediction of protein secondary structure by providing a measure of difficulty as well as final quality of a prediction.
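For orientation, the percent-correct measure (Q3) that RELINFO is compared against can be sketched in a few lines. This is only an illustration of the conventional per-residue definition over three states (H, E, C); the function name and the example strings are assumptions. RELINFO itself is not reproduced here because the abstract does not give its formula.

```python
def q3(predicted, observed):
    """Percentage of residues whose three-state label (H, E or C) is
    predicted correctly; both arguments are equal-length strings."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have the same length")
    if not observed:
        raise ValueError("empty sequences")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical eight-residue example: one residue (position 6) is mislabelled.
score = q3("HHHEECCC", "HHHEEECC")
```

Because Q3 counts residues independently, it does not distinguish a prediction that breaks a segment from one that merely shifts its end, which is part of the motivation for segment-based and information-based alternatives.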
Affiliation(s)
- Rosemarie Swanson
- Department of Biochemistry and Biophysics, Texas A&M University, Texas Agricultural Experiment Station, College Station, Texas 77843-2128, USA.
|
30
|
Katzman S, Barrett C, Thiltgen G, Karchin R, Karplus K. PREDICT-2ND: a tool for generalized protein local structure prediction. Bioinformatics 2008; 24:2453-9. [PMID: 18757875 DOI: 10.1093/bioinformatics/btn438] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION Predictions of protein local structure, derived from sequence alignment information alone, provide visualization tools for biologists to evaluate the importance of amino acid residue positions of interest in the absence of X-ray crystal/NMR structures or homology models. They are also useful as inputs to sequence analysis and modeling tools, such as hidden Markov models (HMMs), which can be used to search for homology in databases of known protein structure. In addition, local structure predictions can be used as a component of cost functions in genetic algorithms that predict protein tertiary structure. We have developed a program (predict-2nd) that trains multilayer neural networks and have applied it to numerous local structure alphabets, tuning network parameters such as the number of layers, the number of units in each layer and the window sizes of each layer. We have had the most success with four-layer networks, with gradually increasing window sizes at each layer. RESULTS Because the four-layer neural nets occasionally get trapped in poor local optima, our training protocol now uses many different random starts, with short training runs, followed by more training on the best performing networks from the short runs. One recent addition to the program is the option to add a guide sequence to the profile inputs, increasing the number of inputs per position by 20. We find that use of a guide sequence provides a small but consistent improvement in the predictions for several different local-structure alphabets. AVAILABILITY Local structure prediction with the methods described here is available for use online at http://www.soe.ucsc.edu/compbio/SAM_T08/T08-query.html. The source code and example networks for PREDICT-2ND are available at http://www.soe.ucsc.edu/~karplus/predict-2nd/ A required C++ library is available at http://www.soe.ucsc.edu/~karplus/ultimate/
Affiliation(s)
- Sol Katzman
- Department of Biomolecular Engineering, University of California, Santa Cruz, CA 95064, USA
|
31
|
Ngan SC, Hung LH, Liu T, Samudrala R. Scoring functions for de novo protein structure prediction revisited. Methods Mol Biol 2008; 413:243-81. [PMID: 18075169 DOI: 10.1007/978-1-59745-574-9_10] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 04/08/2023]
Abstract
De novo protein structure prediction methods attempt to predict tertiary structures from sequences based on general principles that govern protein folding energetics and/or statistical tendencies of conformational features that native structures acquire, without the use of explicit templates. A general paradigm for de novo prediction involves sampling the conformational space, guided by scoring functions and other sequence-dependent biases, such that a large set of candidate ("decoy") structures are generated, and then selecting native-like conformations from those decoys using scoring functions as well as conformer clustering. High-resolution refinement is sometimes used as a final step to fine-tune native-like structures. There are two major classes of scoring functions. Physics-based functions are based on mathematical models describing aspects of the known physics of molecular interaction. Knowledge-based functions are formed with statistical models capturing aspects of the properties of native protein conformations. We discuss the implementation and use of some of the scoring functions from these two classes for de novo structure prediction in this chapter.
Affiliation(s)
- Shing-Chung Ngan
- Department of Microbiology, University of Washington School of Medicine, Seattle, WA, USA
|
32
|
Martin J, de Brevern AG, Camproux AC. In silico local structure approach: a case study on outer membrane proteins. Proteins 2008; 71:92-109. [PMID: 17932925 DOI: 10.1002/prot.21659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
The detection of Outer Membrane Proteins (OMP) in whole genomes is a topical question, and their sequence characteristics have thus been intensively studied. This class of protein displays a common beta-barrel architecture, formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel OMP local structure investigation, based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail, in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than in globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captured sequential information. This can be integrated in a scoring function for structural model ranking with very promising results.
Affiliation(s)
- Juliette Martin
- INSERM UMR-S 726/Université Denis Diderot Paris 7, Equipe de Bioinformatique Génomique et Moléculaire, F-75005 Paris
|
33
|
Moult J, Fidelis K, Kryshtafovych A, Rost B, Hubbard T, Tramontano A. Critical assessment of methods of protein structure prediction-Round VII. Proteins 2008; 69 Suppl 8:3-9. [PMID: 17918729 PMCID: PMC2653632 DOI: 10.1002/prot.21767] [Citation(s) in RCA: 177] [Impact Index Per Article: 11.1] [Indexed: 11/30/2022]
Abstract
This paper is an introduction to the supplemental issue of the journal PROTEINS, dedicated to the seventh CASP experiment to assess the state of the art in protein structure prediction. The paper describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. Highlights are improvements in model accuracy relative to that obtainable from knowledge of a single best template structure; convergence of the accuracy of models produced by automatic servers toward that produced by human modeling teams; the emergence of methods for predicting the quality of models; and rapidly increasing practical applications of the methods.
Affiliation(s)
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.
|
34
|
Dong Q, Wang X, Lin L, Wang Y. Analysis and prediction of protein local structure based on structure alphabets. Proteins 2008; 72:163-72. [DOI: 10.1002/prot.21904] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
|
35
|
|
36
|
|
37
|
|
38
|
Moult J, Fidelis K, Rost B, Hubbard T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round 6. Proteins 2006; 61 Suppl 7:3-7. [PMID: 16187341 DOI: 10.1002/prot.20716] [Citation(s) in RCA: 136] [Impact Index Per Article: 7.6] [Indexed: 11/09/2022]
Abstract
This article is an introduction to the special issue of the journal Proteins, dedicated to the sixth CASP experiment to assess the state of the art in protein structure prediction. The article describes the conduct of the experiment and the categories of prediction included, and outlines the evaluation and assessment procedures. A brief summary of progress over the decade of CASP experiments is also provided.
Affiliation(s)
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.
|
39
|
Ngan SC, Inouye MT, Samudrala R. A knowledge-based scoring function based on residue triplets for protein structure prediction. Protein Eng Des Sel 2006; 19:187-93. [PMID: 16533801 PMCID: PMC5441915 DOI: 10.1093/protein/gzj018] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/23/2005] [Revised: 12/30/2005] [Accepted: 01/09/2006] [Indexed: 11/29/2022] Open
Abstract
One of the general paradigms for ab initio protein structure prediction involves sampling the conformational space such that a large set of decoy (candidate) structures is generated and then selecting native-like conformations from those decoys using various scoring functions. In this study, based on a physical/geometric approach first suggested by Banavar and colleagues, we formulate a knowledge-based scoring function, which uses the radii of curvature formed among triplets of residues in a protein conformation. By analyzing its performance on various decoy sets, we determine a good set of parameters (the distance cutoff and the number of distance bins) to use for configuring such a function. Furthermore, we investigate the effect of using various approaches for compiling the prior distribution on the performance of the knowledge-based function. Possible extensions to the current form of the residue triplet scoring function are discussed.
Affiliation(s)
- Shing-Chung Ngan
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Michael T. Inouye
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
- Ram Samudrala
- Computational Genomics Group, Department of Microbiology, University of Washington School of Medicine, Seattle, WA 98195, USA
|
40
|
Zhang Y, Skolnick J. Scoring function for automated assessment of protein structure template quality. Proteins 2004; 57:702-10. [PMID: 15476259 DOI: 10.1002/prot.20264] [Citation(s) in RCA: 1291] [Impact Index Per Article: 71.7] [Indexed: 11/06/2022]
Abstract
We have developed a new scoring function, the template modeling score (TM-score), to assess the quality of protein structure templates and predicted full-length models by extending the approaches used in Global Distance Test (GDT) and MaxSub. First, a protein size-dependent scale is exploited to eliminate the inherent protein size dependence of the previous scores and appropriately account for random protein structure pairs. Second, rather than setting specific distance cutoffs and calculating only the fractions with errors below the cutoff, all residue pairs in alignment/modeling are evaluated in the proposed score. For comparison of various scoring functions, we have constructed a large-scale benchmark set of structure templates for 1489 small to medium size proteins using the threading program PROSPECTOR_3 and built the full-length models using MODELLER and TASSER. The TM-score of the initial threading alignments, compared to the GDT and MaxSub scoring functions, shows a much stronger correlation to the quality of the final full-length models. The TM-score is further exploited as an assessment of all 'new fold' targets in the recent CASP5 experiment and shows a close coincidence with the results of human-expert visual assessment. These data suggest that the TM-score is a useful complement to the fully automated assessment of protein structure predictions. The executable program of TM-score is freely downloadable at http://bioinformatics.buffalo.edu/TM-score.
Affiliation(s)
- Yang Zhang
- Center of Excellence in Bioinformatics, University at Buffalo, Buffalo, New York 14203, USA
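The size-dependent scale mentioned in the abstract above can be illustrated with a short sketch. It assumes the commonly cited form of the score, TM = (1/L) Σ 1/(1 + (d_i/d0)^2) with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å, which is recalled from the published definition rather than stated in the abstract. The function names and the 0.5 Å floor for short chains are illustrative assumptions. The sketch scores one fixed superposition only; it does not perform the search over superpositions that the full TM-score requires.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in angstroms (assumed formula)."""
    # Assumption: for very short chains the cube-root expression is not
    # meaningful, so fall back to a 0.5 A floor.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score_fixed_superposition(distances, target_length: int) -> float:
    """Score one given superposition.

    distances: distance in angstroms for every aligned residue pair.
    target_length: number of residues in the target (native) structure.
    Residues of the target that are not aligned contribute nothing to the
    sum but still count in the normalisation.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

Because every aligned pair contributes a value between 0 and 1 and the sum is divided by the target length, the score stays in the range (0, 1] regardless of protein size, which is the property the abstract emphasises over fixed distance cutoffs.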
|
41
|
Reddy CS, Vijayasarathy K, Srinivas E, Sastry GM, Sastry GN. Homology modeling of membrane proteins: A critical assessment. Comput Biol Chem 2006; 30:120-6. [PMID: 16540373 DOI: 10.1016/j.compbiolchem.2005.12.002] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 08/27/2005] [Revised: 11/10/2005] [Accepted: 12/14/2005] [Indexed: 11/22/2022]
Abstract
Evaluation and validation of homology modeling protocols are indispensable for membrane proteins as experimental determination of their three-dimensional structure is an arduous task. The predictive ability of Modeller, MOE, InsightII-Homology and Swiss-PdbViewer (SPV), combined with sequence alignments from CLUSTALW, BLAST and 3D-JIGSAW, has been assessed. The sequence identity of the target and template was chosen to be in the range of 25-35%. Validation protocols to assess the structure, fold and stereochemical quality are employed by comparing with experimental structures. Two different ranking schemes are suggested to evaluate the performance of each methodology based on the validation scores. While unambiguous preference for any given procedure did not surface, statistically Modeller and the sequence alignment technique, 3D-JIGSAW, gave the best results amongst the chosen protocols. The present study helps in selecting the right protocols when modeling membrane proteins, which form a major class of drug targets.
Affiliation(s)
- Ch Surendhar Reddy
- Molecular Modelling Group, Organic Chemical Sciences, Indian Institute of Chemical Technology, Tarnaka, Hyderabad 500007, India
|
42
|
Arunachalam J, Kanagasabai V, Gautham N. Protein structure prediction using mutually orthogonal Latin squares and a genetic algorithm. Biochem Biophys Res Commun 2006; 342:424-33. [PMID: 16487483 DOI: 10.1016/j.bbrc.2006.01.162] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 01/23/2006] [Accepted: 01/31/2006] [Indexed: 11/29/2022]
Abstract
We combine a new, extremely fast technique to generate a library of low energy structures of an oligopeptide (by using mutually orthogonal Latin squares to sample its conformational space) with a genetic algorithm to predict protein structures. The protein sequence is divided into oligopeptides, and a structure library is generated for each. These libraries are used in a newly defined mutation operator that, together with variation, crossover, and diversity operators, is used in a modified genetic algorithm to make the prediction. Application to five small proteins has yielded near native structures.
Affiliation(s)
- J Arunachalam
- Department of Crystallography and Biophysics, University of Madras, Chennai 600025, India
|
43
|
Ka C, Le Gac G, Dupradeau FY, Rochette J, Férec C. The Q283P amino-acid change in HFE leads to structural and functional consequences similar to those described for the mutated 282Y HFE protein. Hum Genet 2005; 117:467-75. [PMID: 15965644 DOI: 10.1007/s00439-005-1307-y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 02/07/2005] [Accepted: 03/12/2005] [Indexed: 12/23/2022]
Abstract
In Caucasians, 4-35% of hemochromatosis patients carry at least one chromosome without a common HFE mutation (i.e. C282Y, H63D and S65C). Several studies have now shown that iron overload phenotypes in such patients can be associated with uncommon HFE mutations. We previously supported implication of the C282Y/Q283P compound heterozygous genotype in hemochromatosis phenotypes and, based on molecular dynamics simulations, proposed that the Q283P substitution prevents normal folding of the HFE alpha3-domain. In the current work, we have used HeLa cells carrying wild-type or Q283P-mutant HFE cDNA under the control of a tetracycline-sensitive promoter to functionally characterise the Q283P mutation. Experiments using cells over-expressing wild-type HFE confirm the existence of beta2-microglobulin (beta2m)/HFE and HFE/transferrin receptor 1 (TfR1) interactions, as well as the capacity of HFE to reduce transferrin-mediated iron uptake. In contrast, neither beta2m/HFE nor HFE/TfR1 complex formation was detected in cells over-expressing the mutated form of HFE. Moreover, the 283P HFE protein was found to have a very limited effect on the major cellular iron uptake pathway. Combined, our results indicate that the Q283P mutation leads to structural and functional consequences similar to those described for the main hereditary hemochromatosis mutation. As a consequence, our study has implications for the screening of hemochromatosis patients that have one or two copies of HFE which lack the main mutations. It also highlights that protein structure prediction methods could be more generally used to better interpret relationships between rare genotypes and molecular diagnosis of a human inherited disorder.
|
44
|
Sharp JS, Guo JT, Uchiki T, Xu Y, Dealwis C, Hettich RL. Photochemical surface mapping of C14S-Sml1p for constrained computational modeling of protein structure. Anal Biochem 2005; 340:201-12. [PMID: 15840492 DOI: 10.1016/j.ab.2005.02.005] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 09/14/2004] [Indexed: 11/29/2022]
Abstract
Photochemically generated hydroxyl radicals were used to map solvent-exposed regions in the C14S mutant of the protein Sml1p, a regulator of the ribonucleotide reductase enzyme Rnr1p in Saccharomyces cerevisiae. By using high-performance mass spectrometry to characterize the oxidized peptides created by the hydroxyl radical reactions, amino acid solvent-accessibility data for native and denatured C14S Sml1p that revealed a solvent-excluding tertiary structure in the native state were obtained. The data on solvent accessibilities of various amino acids within the protein were then utilized to evaluate the de novo computational models generated by the HMMSTR/Rosetta server. The top five models initially generated by the server all disagreed with both published nuclear magnetic resonance (NMR) data and the solvent-accessibility data obtained in this study. A structural model adjusted to fit the previously reported NMR data satisfied most of the solvent-accessibility constraints. Through minor adjustment of the rotamers of two amino acid side chains for this latter structure, a model that not only provided a lower energy conformation but also completely satisfied previously reported data from NMR and tryptophan fluorescence measurements, in addition to the solvent-accessibility data presented here, was generated.
Affiliation(s)
- Joshua S Sharp
- Graduate School of Genome Science and Technology, The University of Tennessee and Oak Ridge National Laboratory, 1060 Commerce Park, Oak Ridge, TN 37830-8026, USA
|
45
|
Liu HL, Hsu JP. Recent developments in structural proteomics for protein structure determination. Proteomics 2005; 5:2056-68. [PMID: 15846841 DOI: 10.1002/pmic.200401104] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.7] [Indexed: 11/10/2022]
Abstract
The major challenges in structural proteomics include identifying all the proteins on the genome-wide scale, determining their structure-function relationships, and outlining the precise three-dimensional structures of the proteins. Protein structures are typically determined by experimental approaches such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. However, the knowledge of three-dimensional space by these techniques is still limited. Thus, computational methods such as comparative and de novo approaches and molecular dynamics simulations are intensively used as alternative tools to predict the three-dimensional structures and dynamic behavior of proteins. This review summarizes recent developments in structural proteomics for protein structure determination, including instrumental methods such as X-ray crystallography and NMR spectroscopy, and computational methods such as comparative and de novo structure prediction and molecular dynamics simulations.
Affiliation(s)
- Hsuan-Liang Liu
- Department of Chemical Engineering, National Taipei University of Technology, Taiwan.
|
46
|
Nair R, Rost B. Mimicking Cellular Sorting Improves Prediction of Subcellular Localization. J Mol Biol 2005; 348:85-100. [PMID: 15808855 DOI: 10.1016/j.jmb.2005.02.025] [Citation(s) in RCA: 237] [Impact Index Per Article: 12.5] [Received: 12/27/2004] [Revised: 02/08/2005] [Accepted: 02/09/2005] [Indexed: 11/24/2022]
Abstract
Predicting the native subcellular compartment of a protein is an important step toward elucidating its function. Here we introduce LOCtree, a hierarchical system combining support vector machines (SVMs) and other prediction methods. LOCtree predicts the subcellular compartment of a protein by mimicking the mechanism of cellular sorting and exploiting a variety of sequence and predicted structural features in its input. Currently LOCtree does not predict localization for membrane proteins, since the compositional properties of membrane proteins significantly differ from those of non-membrane proteins. While any information about function can be used by the system, we present estimates of performance that are valid when only the amino acid sequence of a protein is known. When evaluated on a non-redundant test set, LOCtree achieved sustained levels of 74% accuracy for non-plant eukaryotes, 70% for plants, and 84% for prokaryotes. We rigorously benchmarked LOCtree in comparison to the best alternative methods for localization prediction. LOCtree outperformed all other methods in nearly all benchmarks. Localization assignments using LOCtree agreed quite well with data from recent large-scale experiments. Our preliminary analysis of a few entirely sequenced organisms, namely human (Homo sapiens), yeast (Saccharomyces cerevisiae), and weed (Arabidopsis thaliana) suggested that over 35% of all non-membrane proteins are nuclear, about 20% are retained in the cytosol, and that every fifth protein in the weed resides in the chloroplast.
Affiliation(s)
- Rajesh Nair
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
|
47
|
Abstract
Energy functions are crucial ingredients of protein tertiary structure prediction methods. Assessing the quality of energy functions is therefore of prime importance. It requires the elaboration of a standard evaluation scheme, whose key elements are: (i) sets that contain the native and several non-native structures of proteins (decoys) in order to test whether the energy functions display the expected quality features and (ii) measures to evaluate the reliability of energy functions. We present here a survey of the recent advances in these two related fields. In a first part, we analyze and review the large number of decoy sets that are available on the web, and we summarize the characteristics of a challenging decoy set. We then discuss how to define the quality of energy functions and review the measures related to it.
Affiliation(s)
- D Gilis
- Center of Applied Molecular Engineering, Institute of Chemistry and Biochemistry, University of Salzburg, Jakob-Haringer-Straße 3, A-5020 Salzburg, Austria.
|
48
|
Lee J, Kim SY, Lee J. Protein structure prediction based on fragment assembly and parameter optimization. Biophys Chem 2005; 115:209-14. [PMID: 15752606 DOI: 10.1016/j.bpc.2004.12.046] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.8] [Received: 06/11/2004] [Revised: 11/09/2004] [Accepted: 12/10/2004] [Indexed: 11/28/2022]
Abstract
We propose a novel method for ab initio prediction of protein tertiary structures based on fragment assembly and global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. Then, in order to enhance the performance of the prediction method, we optimize the linear parameters of the energy function, so that the native-like conformations become energetically more favorable than the non-native ones for proteins with known structures. We test the feasibility of the parameter optimization procedure by applying it to the training set consisting of three proteins: the 10-55 residue fragment of staphylococcal protein A (PDB ID 1bdd), a designed protein betanova, and 1fsd.
Affiliation(s)
- Julian Lee
- Department of Bioinformatics and Life Science, Computer Aided Molecular Design Research Center, Bioinformatics and Molecular Design Technology Innovation Center, Soongsil University, Seoul 156-743, South Korea.
|
49
|
Lee J, Kim SY, Joo K, Kim I, Lee J. Prediction of protein tertiary structure using PROFESY, a novel method based on fragment assembly and conformational space annealing. Proteins 2004; 56:704-14. [PMID: 15281124 DOI: 10.1002/prot.20150] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.3] [Indexed: 11/08/2022]
Abstract
A novel method for ab initio prediction of protein tertiary structures, PROFESY (PROFile Enumerating SYstem), is proposed. This method utilizes the secondary structure prediction information of a query sequence and the fragment assembly procedure based on global optimization. Fifteen-residue-long fragment libraries are constructed using the secondary structure prediction method PREDICT, and fragments in these libraries are assembled to generate full-length chains of a query protein. Tertiary structures of 50 to 100 conformations are obtained by minimizing an energy function for proteins, using the conformational space annealing method that enables one to sample diverse low-lying local minima of the energy. We apply PROFESY for benchmark tests to proteins with known structures to demonstrate its feasibility. In addition, we participated in CASP5 and applied PROFESY to four new-fold targets for blind prediction. The results are quite promising, despite the fact that PROFESY was in its early stages of development. In particular, PROFESY successfully provided us the best model-one structure for the target T0161.
Affiliation(s)
- Julian Lee
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
|
50
|
Caprara A, Carr R, Istrail S, Lancia G, Walenz B. 1001 optimal PDB structure alignments: integer programming methods for finding the maximum contact map overlap. J Comput Biol 2004; 11:27-52. [PMID: 15072687 DOI: 10.1089/106652704773416876] [Citation(s) in RCA: 125] [Impact Index Per Article: 6.3] [Indexed: 11/12/2022]
Abstract
Protein structure comparison is a fundamental problem for structural genomics, with applications to drug design, fold prediction, protein clustering, and evolutionary studies. Despite its importance, there are very few rigorous methods and widely accepted similarity measures known for this problem. In this paper we describe the last few years of developments on the study of an emerging measure, the contact map overlap (CMO), for protein structure comparison. A contact map is a list of pairs of residues which lie in three-dimensional proximity in the protein's native fold. Although this measure is in principle computationally hard to optimize, we show how it can in fact be computed with great accuracy for related proteins by integer linear programming techniques. These methods have the advantage of providing certificates of near-optimality by means of upper bounds to the optimal alignment value. We also illustrate effective heuristics, such as local search and genetic algorithms. We were able to obtain for the first time optimal alignments for large similar proteins (about 1,000 residues and 2,000 contacts) and used the CMO measure to cluster proteins in families. The clusters obtained were compared to SCOP classification in order to validate the measure. Extensive computational experiments showed that alignments which are off by at most 10% from the optimal value can be computed in a short time. Further experiments showed how this measure reacts to the choice of the threshold defining a contact and how to choose this threshold in a sensible way.
Affiliation(s)
- Alberto Caprara
- D.E.I.S., Università di Bologna, Viale Risorgimento, 2 40136 Bologna, Italy
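The contact map and overlap count described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names, the use of one coordinate per residue, and the dictionary form of the alignment are all hypothetical choices, not the authors' implementation. The sketch only scores a given alignment; finding the alignment that maximises the overlap is the hard optimisation problem the paper addresses with integer linear programming, and it is not attempted here.

```python
import math
from itertools import combinations


def contact_map(coords, threshold):
    """Return the set of residue index pairs (i, j), i < j, whose
    representative atoms lie within `threshold` of each other.

    coords: one (x, y, z) tuple per residue (assumed representation).
    """
    return {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if math.dist(coords[i], coords[j]) <= threshold
    }


def contact_overlap(contacts_a, contacts_b, alignment):
    """Count contacts of protein A that the alignment maps onto contacts
    of protein B.

    alignment: dict mapping a residue index in A to a residue index in B;
    unaligned residues are simply absent from the dict.
    """
    shared = 0
    for i, j in contacts_a:
        if i in alignment and j in alignment:
            p, q = sorted((alignment[i], alignment[j]))
            if (p, q) in contacts_b:
                shared += 1
    return shared
```

As the abstract notes, the result depends on the threshold that defines a contact, so any comparison of overlap values should use the same threshold for both structures.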
|