1
Bartuzi D, Kaczor AA, Matosiuk D. Illuminating the "Twilight Zone": Advances in Difficult Protein Modeling. Methods Mol Biol 2023; 2627:25-40. [PMID: 36959440 DOI: 10.1007/978-1-0716-2974-1_2]
Abstract
Homology modeling was long considered the method of choice for tertiary protein structure prediction. However, it provided models of acceptable quality only when templates with appreciable sequence identity to the target could be found. The threshold was long assumed to be around 20-30%. Below this level, the observed sequence identity came dangerously close to values that can arise by chance when aligning random, unrelated sequences. In such cases, other approaches, including ab initio folding simulations or fragment assembly, were usually employed. The most recent editions of CASP and CAMEO, the community-wide assessments of modeling methods, have brought some surprising outcomes, showing that far more clues can be inferred from protein sequence analysis than previously thought. In this chapter, we focus on recent advances in difficult protein modeling that push the threshold deep into the "twilight zone", with particular attention to improvements in applications of machine learning and model evaluation.
Affiliation(s)
- Damian Bartuzi
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland.
- Agnieszka A Kaczor
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
- University of Eastern Finland, School of Pharmacy, Kuopio, Finland
- Dariusz Matosiuk
- Department of Synthesis and Chemical Technology of Pharmaceutical Substances with Computer Modelling Laboratory, Medical University of Lublin, Lublin, Poland
2
Monroe L, Kihara D. Using steered molecular dynamic tension for assessing quality of computational protein structure models. J Comput Chem 2022; 43:1140-1150. [PMID: 35475517 DOI: 10.1002/jcc.26876] [Received: 10/21/2021] [Revised: 02/16/2022] [Accepted: 04/15/2022]
Abstract
The native structures of proteins, with the notable exception of intrinsically disordered proteins, generally adopt their most stable conformation under physiological conditions, maintaining the structural framework needed for their biological function to be carried out properly. Experimentally, the stability of a protein can be measured by several means, among which the pulling experiment using the atomic force microscope (AFM) stands out as a unique method. AFM directly measures the resistance to unfolding, which can be quantified from the observed force-extension profile. It has been shown that key features observed in an AFM pulling experiment can be well reproduced by computational molecular dynamics simulations. Here, we applied computational pulling to estimate the accuracy of computational protein structure models under the hypothesis that structural stability would correlate positively with the accuracy of a model, i.e., its closeness to the native structure. We used a total of 4929 structure models for 24 target proteins from the Critical Assessment of Techniques for Protein Structure Prediction (CASP) and investigated whether the magnitude of the break force, that is, the force required to rearrange the model's structure, taken from the force profile, was sufficient information for selecting near-native models. We found that near-native models can be successfully selected by examining their break forces, suggesting that a high break force indeed indicates high model stability. On the other hand, some near-native models had relatively low peak forces. The mechanisms of the stability exhibited by the break forces were explored and discussed.
Affiliation(s)
- Lyman Monroe
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Daisuke Kihara
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana, USA
- Department of Computer Science, Purdue University, West Lafayette, Indiana, USA
- Purdue Center for Cancer Research, Purdue University, West Lafayette, Indiana, USA
3
Role of solvent accessibility for aggregation-prone patches in protein folding. Sci Rep 2018; 8:12896. [PMID: 30150761 PMCID: PMC6110721 DOI: 10.1038/s41598-018-31289-6] [Received: 02/05/2018] [Accepted: 08/15/2018]
Abstract
The arrangement of amino acids in a protein sequence encodes its native fold. However, the same arrangement in aggregation-prone regions may cause misfolding as a result of local environmental stress. Under normal physiological conditions, such regions congregate in the protein's interior to avoid aggregation and attain the native fold. We have used the solvent accessibility of aggregation patches (SAAPp) to determine the packing of aggregation-prone residues. Our results showed that SAAPp has low values for native crystal structures, consistent with protein folding acting as a mechanism to minimize the solvent accessibility of aggregation-prone residues. SAAPp also shows an average correlation of 0.76 with the global distance test (GDT) score on CASP12 template-based protein models. Using SAAPp scores and five structural features, a random forest machine learning quality assessment tool, SAAP-QA, showed an average GDT loss of 2.32 between the model predicted to be best and the actual best model (by GDT score) on independent CASP test data, and discriminated native-like folds with an AUC of 0.94. Overall, the Pearson correlation coefficient (PCC) between true and predicted GDT scores on independent CASP data was 0.86, while on the external CAMEO dataset, comprising high-quality protein structures, PCC and average GDT loss were 0.71 and 4.46, respectively. SAAP-QA can be used to detect the quality of models and iteratively improve them to native or near-native structures.
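The two headline metrics above, GDT loss and the Pearson correlation coefficient (PCC) between predicted and true GDT scores, can be computed as in the minimal Python sketch below. It assumes GDT loss means the true GDT of the best model in a target's pool minus the true GDT of the model the predictor ranks first; the function names and toy scores are hypothetical and are not taken from SAAP-QA.

```python
import math
from typing import Sequence


def gdt_loss(true_gdt: Sequence[float], predicted: Sequence[float]) -> float:
    """True GDT of the best model in the pool minus true GDT of the model ranked first by the predictor."""
    top_pick = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true_gdt) - true_gdt[top_pick]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (PCC) between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Hypothetical pool of five models for a single target (GDT on a 0-100 scale).
    true_scores = [62.0, 48.5, 71.3, 55.0, 30.2]
    predicted_scores = [60.1, 50.0, 68.9, 69.5, 28.0]
    print(f"GDT loss: {gdt_loss(true_scores, predicted_scores):.2f}")
    print(f"PCC: {pearson(true_scores, predicted_scores):.3f}")
```

Averaging `gdt_loss` over targets yields an "average GDT loss" of the kind quoted above. Whether PCC is computed per target or over all pooled models changes the result, and the abstract does not say which convention was used.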
4
Cao R, Adhikari B, Bhattacharya D, Sun M, Hou J, Cheng J. QAcon: single model quality assessment using protein structural and contact information with machine learning techniques. Bioinformatics 2017; 33:586-588. [PMID: 28035027 DOI: 10.1093/bioinformatics/btw694] [Received: 08/07/2016] [Accepted: 11/01/2016]
Abstract
Motivation Protein model quality assessment (QA) plays a very important role in protein structure prediction. QA methods can be divided into two groups: single-model and consensus methods. Consensus QA methods may fail when a large portion of the model pool consists of low-quality models. Results In this paper, we develop a novel single-model quality assessment method, QAcon, utilizing structural features, physicochemical properties, and residue contact predictions. We apply residue-residue contact information predicted by two protein contact prediction methods, PSICOV and DNcon, to generate a new score as a feature for quality assessment. This novel feature and 11 other features are used as input to train a two-layer neural network on CASP9 datasets to predict the quality of a single protein model. We blindly benchmarked QAcon on the CASP11 dataset as the MULTICOM-CLUSTER server. Based on the evaluation, our method ranks as one of the top single-model QA methods. The good performance of the features based on contact prediction illustrates the value of using contact information in protein quality assessment. Availability and Implementation The web server and the source code of QAcon are freely available at: http://cactus.rnet.missouri.edu/QAcon. Contact: chengji@missouri.edu. Supplementary information Supplementary data are available at Bioinformatics online.
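QAcon turns predicted residue-residue contacts into a model-quality feature, but the abstract does not give the formula. A common form of such a feature, assumed here, is the fraction of predicted long-range contacts that the model actually satisfies. The Python sketch below illustrates that idea only; the function name, the 8 Å C-alpha cutoff, and the 6-residue minimum sequence separation are illustrative assumptions, not QAcon's published settings.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def contact_satisfaction(
    ca_coords: List[Coord],
    predicted_contacts: List[Tuple[int, int]],
    cutoff: float = 8.0,
    min_separation: int = 6,
) -> float:
    """Fraction of predicted contacts (i, j) whose C-alpha atoms are closer than `cutoff` Å in the model.

    Pairs fewer than `min_separation` residues apart in sequence are skipped,
    since they are close in almost any model and carry little information.
    """
    considered = 0
    satisfied = 0
    for i, j in predicted_contacts:
        if abs(i - j) < min_separation:
            continue
        considered += 1
        if math.dist(ca_coords[i], ca_coords[j]) < cutoff:
            satisfied += 1
    return satisfied / considered if considered else 0.0


if __name__ == "__main__":
    # Toy hairpin-like chain: residues 0-9 run along x, residues 10-19 run back 5 Å away.
    coords = [(3.8 * k, 0.0, 0.0) for k in range(10)]
    coords += [(3.8 * (19 - k), 5.0, 0.0) for k in range(10, 20)]
    contacts = [(0, 19), (2, 17), (0, 10), (1, 3)]  # (1, 3) is too local and is ignored
    print(f"satisfied fraction: {contact_satisfaction(coords, contacts):.3f}")
```

In practice the predicted contacts would come from a predictor such as PSICOV or DNcon, usually keeping only the top-ranked pairs; many pipelines also measure Cβ rather than Cα distances.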
Affiliation(s)
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, WA 98447, USA
- Badri Adhikari
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Debswapna Bhattacharya
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS 67260-0083, USA
- Miao Sun
- Department of Electrical and Computer Engineering
- Jie Hou
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO 65211, USA
5
Prediction of Local Quality of Protein Structure Models Considering Spatial Neighbors in Graphical Models. Sci Rep 2017; 7:40629. [PMID: 28074879 PMCID: PMC5225430 DOI: 10.1038/srep40629] [Received: 09/07/2016] [Accepted: 12/08/2016]
Abstract
Protein tertiary structure prediction methods have matured in recent years. However, some proteins defy accurate prediction due to factors such as inadequate template structures. While existing model quality assessment methods predict global model quality relatively well, there is substantial room for improvement in local quality assessment, i.e., assessment of the error at each residue position in a model. Local quality is very important information for practical applications of structure models, such as interpreting or designing site-directed mutagenesis of proteins. We have developed a novel local quality assessment method for protein tertiary structure models. The method, named Graph-based Model Quality assessment method (GMQ), explicitly considers the predicted quality of spatially neighboring residues using a graph representation of a query protein structure model. GMQ uses a conditional random field as the core of its algorithm and performs a binary prediction of the quality of each residue in a model, indicating whether a residue position is likely to be within an error cutoff or not. The accuracy of GMQ was improved by considering larger graphs that include quality information from more surrounding residues. Moreover, we found that using different edge weights in graphs reflecting different secondary structures further improves the accuracy. GMQ showed competitive performance on a benchmark for quality assessment of structure models from the Critical Assessment of Techniques for Protein Structure Prediction (CASP).
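GMQ treats each residue as a node, links spatially neighboring residues, and uses a conditional random field (CRF) to predict a binary per-residue label. The Python sketch below covers only the two inputs such a model needs, the neighbor graph and the binary labels; the CRF itself is not shown. The function names, the 8 Å neighbor radius, and the 3 Å error cutoff are placeholder assumptions, not GMQ's published parameters.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def neighbor_graph(ca_coords: Sequence[Coord], radius: float = 8.0) -> Dict[int, Set[int]]:
    """Undirected graph linking residues whose C-alpha atoms lie within `radius` Å of each other."""
    graph: Dict[int, Set[int]] = {i: set() for i in range(len(ca_coords))}
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= radius:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def binary_labels(residue_errors: Sequence[float], cutoff: float = 3.0) -> List[int]:
    """Label 1 if a residue's deviation from the native is within `cutoff` Å, else 0."""
    return [1 if err <= cutoff else 0 for err in residue_errors]


if __name__ == "__main__":
    coords = [(3.8 * k, 0.0, 0.0) for k in range(4)]  # short extended segment
    print(neighbor_graph(coords))  # nodes and edges a CRF would be defined over
    print(binary_labels([0.5, 3.0, 4.2, 1.1]))
```

A pairwise CRF would then score each joint labeling using per-residue features on the nodes and label agreement on the edges; GMQ's secondary-structure-dependent edge weights would presumably attach to these edges.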
6
Cao R, Bhattacharya D, Hou J, Cheng J. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 2016; 17:495. [PMID: 27919220 PMCID: PMC5139030 DOI: 10.1186/s12859-016-1405-y] [Received: 08/11/2016] [Accepted: 12/01/2016]
Abstract
BACKGROUND Protein quality assessment (QA), useful for ranking and selecting protein models, has long been viewed as one of the major challenges for protein tertiary structure prediction. In particular, estimating the quality of a single protein model, which is important for selecting a few good models out of a large model pool consisting mostly of low-quality models, is still a largely unsolved problem. RESULTS We introduce a novel single-model quality assessment method, DeepQA, based on a deep belief network that utilizes a number of selected features describing the quality of a model from different perspectives, such as energy, physicochemical characteristics, and structural information. The deep belief network is trained on several large datasets consisting of models from the Critical Assessment of Protein Structure Prediction (CASP) experiments, several publicly available datasets, and models generated by our in-house ab initio method. Our experiments demonstrate that the deep belief network performs better than Support Vector Machines and Neural Networks on the protein model quality assessment problem, and our method DeepQA achieves state-of-the-art performance on the CASP11 dataset. It also outperformed two well-established methods in selecting good outlier models from a large set of models of mostly low quality generated by ab initio modeling methods. CONCLUSION DeepQA is a useful deep learning tool for protein single-model quality assessment and protein structure prediction. The source code, executable, documentation and training/test datasets of DeepQA for Linux are freely available to non-commercial users at http://cactus.rnet.missouri.edu/DeepQA/ .
Affiliation(s)
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA, 98447, USA
- Debswapna Bhattacharya
- Department of Electrical Engineering and Computer Science, Wichita State University, Wichita, KS, 67260, USA
- Jie Hou
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
7
Protein single-model quality assessment by feature-based probability density functions. Sci Rep 2016; 6:23990. [PMID: 27041353 PMCID: PMC4819172 DOI: 10.1038/srep23990] [Received: 12/03/2015] [Accepted: 03/17/2016]
Abstract
Protein quality assessment (QA) has played an important role in protein structure prediction. We developed a novel single-model quality assessment method, Qprob. Qprob calculates the absolute error of each protein feature value against the true quality scores (i.e., GDT-TS scores) of protein structural models, and uses these errors to estimate a probability density distribution for quality assessment. Qprob was blindly tested in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) as the MULTICOM-NOVEL server. The official CASP results show that Qprob ranks as one of the top single-model QA methods. In addition, Qprob contributes to our protein tertiary structure predictor MULTICOM, which was officially ranked 3rd out of 143 predictors. This performance shows that Qprob is effective at assessing the quality of models of hard targets. These results demonstrate that this new method based on probability density distributions is effective for protein single-model quality assessment and is useful for protein structure prediction. The webserver of Qprob is available at: http://calla.rnet.missouri.edu/qprob/. The software is freely available from the Qprob web server.
8
Uziela K, Wallner B. ProQ2: estimation of model accuracy implemented in Rosetta. Bioinformatics 2016; 32:1411-3. [PMID: 26733453 PMCID: PMC4848402 DOI: 10.1093/bioinformatics/btv767] [Received: 09/24/2015] [Accepted: 12/23/2015]
Abstract
Motivation: Model quality assessment programs are used to predict the quality of modeled protein structures. They can be divided into two groups depending on the information they use: ensemble methods, which use a consensus of many alternative models, and methods that use only a single model to make their prediction. The consensus methods excel at achieving high correlations between predicted and true quality measures. However, they frequently fail to pick out the best possible model, and they cannot be used to generate and score new structures. Single-model methods, on the other hand, do not have these inherent shortcomings and can be used both to sample new structures and to improve existing consensus methods. Results: Here, we present an implementation of the ProQ2 program to estimate both local and global model accuracy as part of the Rosetta modeling suite. The current implementation not only makes it possible to run large batch runs locally, but also opens up a whole new arena for conformational sampling using machine-learned scoring functions and for incorporating model accuracy estimation into various existing modeling schemes. ProQ2 participated in CASP11, and results from CASP11 are used to benchmark the current implementation. Based on results from CASP11 and CAMEO-QE, a continuous benchmark of quality estimation methods, it is clear that ProQ2 is the single-model method that performs best in both local and global model accuracy. Availability and implementation: https://github.com/bjornwallner/ProQ_scripts Contact: bjornw@ifm.liu.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Karolis Uziela
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83, Linköping, Sweden and Swedish e-Science Research Center, Linköping, Sweden
9
Dyrka W, Kurczyńska M, Konopka BM, Kotulska M. Fast assessment of structural models of ion channels based on their predicted current-voltage characteristics. Proteins 2015; 84:217-31. [PMID: 26650347 DOI: 10.1002/prot.24967] [Received: 06/03/2015] [Revised: 11/19/2015] [Accepted: 11/29/2015]
Abstract
Computational prediction of protein structures is a difficult task, which involves fast and accurate evaluation of candidate model structures. We propose to enhance single-model quality assessment with a functionality evaluation phase for proteins whose quantitative functional characteristics are known. In particular, this idea can be applied to the evaluation of structural models of ion channels, whose main function, conducting ions, can be quantitatively measured with the patch-clamp technique, which provides the current-voltage characteristics. The study was performed on a set of KcsA channel models obtained from complete and incomplete contact maps. A fast continuous electrodiffusion model was used to calculate the current-voltage characteristics of the structural models. We found that the computed charge selectivity and total current were sensitive to the structural and electrostatic quality of the models. In practical terms, we show that evaluating predicted conductance values is an appropriate method to eliminate models with an occluded pore or with multiple erroneously created pores. Moreover, filtering models on the basis of their predicted charge selectivity results in a substantial enrichment of the candidate set in highly accurate models. Tests on three other ion channels indicate that, in addition to being a proof of concept, our function-oriented single-model quality assessment method can be directly applied to the evaluation of structural models of some classes of protein channels. Finally, our work raises the important question of whether a computational validation of functionality should be included in the evaluation of structural models whenever possible.
Affiliation(s)
- Witold Dyrka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
- Monika Kurczyńska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
- Bogumił M Konopka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
- Małgorzata Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Technology, Wybrzeze Wyspianskiego 27, Wroclaw, 50-370, Poland
10
Nguyen SP, Shang Y, Xu D. DL-PRO: A Novel Deep Learning Method for Protein Model Quality Assessment. Proceedings of ... International Joint Conference on Neural Networks 2014; 2014:2071-2078. [PMID: 25392745 PMCID: PMC4226404 DOI: 10.1109/ijcnn.2014.6889891]
Abstract
Computational protein structure prediction is very important for many applications in bioinformatics. In the process of predicting protein structures, it is essential to accurately assess the quality of generated models. Although many single-model quality assessment (QA) methods have been developed, their accuracy is not high enough for most real applications. In this paper, a new approach based on C-α atom distance matrices and machine learning methods is proposed for single-model QA and the identification of native-like models. Unlike existing energy/scoring functions and consensus approaches, this new approach is purely geometry based. Furthermore, a novel algorithm based on deep learning techniques, called DL-Pro, is proposed. For a protein model, DL-Pro uses its distance matrix, which contains the pairwise distances between residues' C-α atoms in the model (sometimes also called a contact map), as an orientation-independent representation. From training examples of distance matrices corresponding to good and bad models, DL-Pro learns a stacked autoencoder network as a classifier. In experiments on selected targets from the Critical Assessment of Structure Prediction (CASP) competition, DL-Pro obtained promising results, outperforming state-of-the-art energy/scoring functions, including OPUS-CA, DOPE, DFIRE, and RW.
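The claim that a C-α distance matrix is orientation-independent is easy to check: rigidly rotating a model changes its coordinates but not any pairwise distance. The Python sketch below, using hypothetical toy coordinates, builds the matrix, verifies the invariance, and derives a binary contact map by thresholding, since the abstract uses the two terms interchangeably. The 8 Å contact cutoff is an assumption, and DL-Pro's stacked autoencoder is not reproduced.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def distance_matrix(ca_coords: List[Coord]) -> List[List[float]]:
    """Symmetric matrix of pairwise C-alpha distances in Å (zeros on the diagonal)."""
    n = len(ca_coords)
    return [[math.dist(ca_coords[i], ca_coords[j]) for j in range(n)] for i in range(n)]


def rotate_z(coords: List[Coord], angle: float) -> List[Coord]:
    """Rigidly rotate coordinates about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


def contact_map(dmat: List[List[float]], cutoff: float = 8.0) -> List[List[int]]:
    """Binary contact map: 1 where the distance is below `cutoff` Å, else 0."""
    return [[1 if d < cutoff else 0 for d in row] for row in dmat]


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (2.0, 5.5, 1.0)]
    before = distance_matrix(model)
    after = distance_matrix(rotate_z(model, 1.1))
    # Every pairwise distance survives the rotation (up to floating-point rounding).
    invariant = all(
        math.isclose(a, b, abs_tol=1e-9)
        for row_a, row_b in zip(before, after)
        for a, b in zip(row_a, row_b)
    )
    print("orientation-independent:", invariant)
    print(contact_map(before))
```

Because the matrix is symmetric, a learning model could equally be fed only its upper triangle; the full matrix is kept here for clarity.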
Affiliation(s)
- Son P. Nguyen
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Yi Shang
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Dong Xu
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Christopher S. Bond Life Science Center, University of Missouri at Columbia
11
Chen Y, Shang Y, Xu D. Multi-Dimensional Scaling and MODELLER-Based Evolutionary Algorithms for Protein Model Refinement. Proceedings of the ... Congress on Evolutionary Computation 2014; 2014:1038-1045. [PMID: 25844403 PMCID: PMC4380876 DOI: 10.1109/cec.2014.6900443]
Abstract
Protein structure prediction, i.e., computationally predicting the three-dimensional structure of a protein from its primary sequence, is one of the most important and challenging problems in bioinformatics. Model refinement is a key step in the prediction process, in which improved structures are constructed from a pool of initially generated models. Since the refinement category was added to the biennial Critical Assessment of Structure Prediction (CASP) in 2008, CASP results have shown that consistently improving model quality is a challenge for existing model refinement methods. This paper presents three evolutionary algorithms for protein model refinement, in which multidimensional scaling (MDS), the MODELLER software, and a hybrid of both are used as crossover operators, respectively. The MDS-based method takes a purely geometrical approach and generates a child model by combining the contact maps of multiple parents. The MODELLER-based method takes a statistical and energy minimization approach, and uses the remodeling module in the MODELLER program to generate new models from multiple parents. The hybrid method first generates models using the MDS-based method and then runs them through the MODELLER-based method, aiming to combine the strengths of both. Promising results have been obtained in experiments using CASP datasets. The MDS-based method improved on the best of a pool of predicted models in terms of the global distance test score (GDT-TS) in 9 out of 16 test targets.
Affiliation(s)
- Yan Chen
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Yi Shang
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Dong Xu
- Department of Computer Science, University of Missouri, Columbia, MO 65211, USA
- Christopher S. Bond Life Science Center, University of Missouri
12
Cao R, Wang Z, Cheng J. Designing and evaluating the MULTICOM protein local and global model quality prediction methods in the CASP10 experiment. BMC Struct Biol 2014; 14:13. [PMID: 24731387 PMCID: PMC3996498 DOI: 10.1186/1472-6807-14-13] [Received: 10/18/2013] [Accepted: 04/01/2014]
Abstract
BACKGROUND Protein model quality assessment is an essential component of generating and using protein structural models. During the Tenth Critical Assessment of Techniques for Protein Structure Prediction (CASP10), we developed and tested four automated methods (MULTICOM-REFINE, MULTICOM-CLUSTER, MULTICOM-NOVEL, and MULTICOM-CONSTRUCT) that predicted both local and global quality of protein structural models. RESULTS MULTICOM-REFINE was a clustering approach that used the average pairwise structural similarity between models to measure the global quality and the average Euclidean distance between a model and several top ranked models to measure the local quality. MULTICOM-CLUSTER and MULTICOM-NOVEL were two new support vector machine-based methods of predicting both the local and global quality of a single protein model. MULTICOM-CONSTRUCT was a new weighted pairwise model comparison (clustering) method that used the weighted average similarity between models in a pool to measure the global model quality. Our experiments showed that the pairwise model assessment methods worked better when a large portion of models in the pool were of good quality, whereas single-model quality assessment methods performed better on some hard targets when only a small portion of models in the pool were of reasonable quality. CONCLUSIONS Since digging out a few good models from a large pool of low-quality models is a major challenge in protein structure prediction, single model quality assessment methods appear to be poised to make important contributions to protein structure modeling. The other interesting finding was that single-model quality assessment scores could be used to weight the models by the consensus pairwise model comparison method to improve its accuracy.
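The consensus idea behind MULTICOM-REFINE and MULTICOM-CONSTRUCT, and the finding that single-model scores can serve as weights, can be sketched as follows: each model's global quality is its mean structural similarity to every other model in the pool, optionally weighted per model. To stay short, this Python sketch uses a simplified GDT-TS that assumes all models are already superposed in a common frame; real GDT-TS searches over superpositions, and the MULTICOM methods may use different similarity measures and weighting schemes. All names here are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]
Model = List[Coord]


def gdt_ts_presuperposed(a: Model, b: Model) -> float:
    """Simplified GDT-TS (0-1) for two models of the same sequence already in a common frame.

    Averages, over cutoffs of 1, 2, 4 and 8 Å, the fraction of residues within the cutoff.
    Real GDT-TS also searches for the best superposition; this sketch skips that step.
    """
    dists = [math.dist(p, q) for p, q in zip(a, b)]
    n = len(dists)
    return sum(sum(d <= c for d in dists) / n for c in (1.0, 2.0, 4.0, 8.0)) / 4.0


def consensus_scores(models: Sequence[Model], weights: Optional[Sequence[float]] = None) -> List[float]:
    """Global quality of each model as its (weighted) mean similarity to all other models.

    weights=None gives every model equal say (plain average pairwise similarity);
    passing single-model QA scores lets presumably better models count more.
    """
    if weights is None:
        weights = [1.0] * len(models)
    scores = []
    for i, mi in enumerate(models):
        num = den = 0.0
        for j, mj in enumerate(models):
            if i == j:
                continue
            num += weights[j] * gdt_ts_presuperposed(mi, mj)
            den += weights[j]
        scores.append(num / den if den else 0.0)
    return scores


if __name__ == "__main__":
    ref = [(3.8 * k, 0.0, 0.0) for k in range(4)]
    near = [(x, 0.5, z) for x, _, z in ref]  # every residue 0.5 Å from ref
    far = [(x, 5.0, z) for x, _, z in ref]   # every residue 5 Å from ref
    pool = [ref, near, far]
    print(consensus_scores(pool))                   # equal weights
    print(consensus_scores(pool, [1.0, 1.0, 0.0]))  # far model down-weighted
```

Down-weighting a model judged poor by a single-model method raises the consensus scores of the models that agree with each other, which is one way such weighting can keep a crowd of low-quality models from dominating the ranking.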
Affiliation(s)
- Jianlin Cheng
- Computer Science Department, University of Missouri, Columbia, Missouri 65211, USA.
13
Kryshtafovych A, Barbato A, Fidelis K, Monastyrskyy B, Schwede T, Tramontano A. Assessment of the assessment: evaluation of the model quality estimates in CASP10. Proteins 2014; 82 Suppl 2:112-26. [PMID: 23780644 PMCID: PMC4406045 DOI: 10.1002/prot.24347] [Received: 03/02/2013] [Revised: 05/31/2013] [Accepted: 06/06/2013]
Abstract
The article presents an assessment of the ability of the 37 model quality assessment (MQA) methods participating in CASP10 to provide an a priori estimation of the quality of structural models, and of the 67 tertiary structure prediction groups to provide confidence estimates for their predicted coordinates. The assessment of MQA predictors is based on the methods used in previous CASPs, such as the correlation between the predicted and observed quality of the models (at both the global and local levels), the accuracy of methods in distinguishing between good and bad models as well as good and bad regions within them, and the ability to identify the best models in the decoy sets. Several numerical evaluations were used in our analysis for the first time, such as the comparison of global and local quality predictors with reference (baseline) predictors and a ROC analysis of the predictors' ability to differentiate between well and poorly modeled regions. To evaluate the reliability of self-assessment of coordinate errors, we used the correlation between the predicted and observed deviations of the coordinates and a ROC analysis of correctly identified errors in the models. A modified two-stage procedure for testing MQA methods in CASP10, whereby a small number of models spanning the whole range of model accuracy was released first, followed by a larger number of models of more uniform quality, allowed a more thorough analysis of the strengths and weaknesses of different types of methods. Clustering methods were shown to have an advantage over the single- and quasi-single-model methods on the larger datasets. At the same time, the evaluation revealed that the size of the dataset has a smaller influence on the global quality assessment scores (for both clustering and nonclustering methods) than its diversity.
Narrowing the quality range of the assessed models caused a significant decrease in ranking accuracy for global quality predictors but essentially did not change the results for local predictors. Self-assessment error estimates submitted by the majority of groups were poor overall, with two research groups showing significantly better results than the rest.
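The ROC analysis mentioned above asks how well a predictor's per-residue error estimates separate well-modeled from poorly modeled residues. The area under the ROC curve equals the probability that a randomly chosen positive outranks a randomly chosen negative, with ties counting one half. The Python sketch below computes it that way; the function name, the 3.8 Å threshold used to label a residue as poorly modeled, and the toy error values are illustrative assumptions, not the CASP10 protocol.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    labels: 1 for positives (e.g. poorly modeled residues), 0 for negatives.
    scores: predictor output, higher meaning "more likely positive".
    A tie between a positive and a negative counts as half a correct ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    predicted_error = [0.8, 1.2, 5.1, 0.5, 3.9, 2.2]  # predicted per-residue deviation (Å)
    observed_error = [0.6, 1.5, 6.0, 0.4, 2.9, 4.2]   # observed deviation from the native (Å)
    poorly_modeled = [1 if d > 3.8 else 0 for d in observed_error]
    print(f"AUC: {roc_auc(predicted_error, poorly_modeled):.3f}")
```

The pairwise loop is O(n·m); for whole-target residue sets, a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` gives the same value faster.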
Affiliation(s)
- Alessandro Barbato
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
- Krzysztof Fidelis
- Genome Center, University of California, Davis, 95616 California, USA
- Torsten Schwede
- Biozentrum, University of Basel, 4056 Basel, Switzerland
- SIB Swiss Institute of Bioinformatics, 4056 Basel, Switzerland
- Anna Tramontano
- Department of Physics, Sapienza University of Rome, 00185 Rome, Italy
14
Roy A, Perez A, Dill KA, Maccallum JL. Computing the relative stabilities and the per-residue components in protein conformational changes. Structure 2014; 22:168-75. [PMID: 24316402 PMCID: PMC3905753 DOI: 10.1016/j.str.2013.10.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 07/26/2013] [Revised: 10/18/2013] [Accepted: 10/21/2013] [Indexed: 11/19/2022]
Abstract
Protein molecules often undergo conformational changes. In order to gain insights into the forces that drive such changes, it would be useful to have a method that computes the per-residue contributions to the conversion free energy. Here, we describe the "confine-convert-release" (CCR) method, which is applicable to large conformational changes. We show that CCR correctly predicts the stable states of several "chameleon" sequences that have previously been challenging for molecular simulations. CCR can often discriminate better from worse predictions of native protein models in critical assessment of protein structure prediction (CASP). We show how the total conversion free energies can be parsed into per-residue free-energy components. Such parsing gives insights into which amino acids are most responsible for given transformations. For example, here we are able to "reverse-engineer" the known design principles of the chameleon proteins. This opens up the possibility for systematic improvements in structure-prediction scoring functions, in the design of protein conformational switches, and in interpreting protein mechanisms at the amino-acid level.
Affiliation(s)
- Arijit Roy
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
- Alberto Perez
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
- Ken A Dill
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA; Department of Physics, Stony Brook University, Stony Brook, NY 11794, USA; Department of Chemistry, Stony Brook University, Stony Brook, NY 11794, USA.
- Justin L Maccallum
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, USA
15
Terashi G, Nakamura Y, Shimoyama H, Takeda-Shitaka M. Quality Assessment Methods for 3D Protein Structure Models Based on a Residue–Residue Distance Matrix Prediction. Chem Pharm Bull (Tokyo) 2014; 62:744-53. [PMID: 25087626 DOI: 10.1248/cpb.c13-00973] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
16
Roche DB, Buenavista MT, McGuffin LJ. Assessing the quality of modelled 3D protein structures using the ModFOLD server. Methods Mol Biol 2014; 1137:83-103. [PMID: 24573476 DOI: 10.1007/978-1-4939-0366-5_7] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 06/03/2023]
Abstract
Model quality assessment programs (MQAPs) aim to assess the quality of modelled 3D protein structures. The provision of quality scores describing both global and local (per-residue) accuracy is extremely important, as without quality scores we are unable to determine the usefulness of a 3D model for further computational and experimental wet lab studies. Here, we briefly discuss protein tertiary structure prediction, along with the biennial Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition and its key role in driving the field of protein model quality assessment methods (MQAPs). We also briefly discuss the top MQAPs from previous CASP competitions. Additionally, we describe our downloadable and webserver-based model quality assessment methods: ModFOLD3, ModFOLDclust, ModFOLDclustQ, ModFOLDclust2, and IntFOLD-QA. We provide a practical step-by-step guide to using our downloadable and webserver-based tools and include examples of their application for improving tertiary structure prediction, ligand binding site residue prediction, and oligomer predictions.
Affiliation(s)
- Daniel Barry Roche
- Genoscope, Institut de Génomique, Commissariat à l'Energie Atomique et aux Energies Alternatives, Evry, France
17
He Z, Alazmi M, Zhang J, Xu D. Protein structural model selection by combining consensus and single scoring methods. PLoS One 2013; 8:e74006. [PMID: 24023923 PMCID: PMC3759460 DOI: 10.1371/journal.pone.0074006] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 05/05/2013] [Accepted: 07/26/2013] [Indexed: 01/28/2023]
Abstract
Quality assessment (QA) for predicted protein structural models is an important and challenging research problem in protein structure prediction. Consensus Global Distance Test (CGDT) methods assess each decoy (predicted structural model) based on its structural similarity to all others in a decoy set and have been shown to work well when good decoys are in a majority cluster. Scoring functions evaluate each single decoy based on its structural properties. Both approaches have their merits and limitations. In this paper, we present a novel method called PWCom, which uses two neural networks in sequence to combine CGDT and single-model scoring methods such as RW, DDFire and OPUS-Ca. Specifically, for every pair of decoys, the difference of the corresponding feature vectors is input to the first neural network, which predicts whether the two decoys differ significantly in their GDT scores to the native structure. If so, the second neural network decides which of the two is closer to the native structure. The quality score for each decoy in the pool is the number of pairwise comparisons it wins. Test results on three benchmark datasets from different model generation methods showed that PWCom significantly improves over consensus GDT and single scoring methods. The QA server (MUFOLD-Server) applying this method in the CASP10 QA category ranked second in terms of Pearson and Spearman correlation performance.
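The two ingredients PWCom combines, a consensus score and a ranking by pairwise wins, can be illustrated with a minimal sketch. This is not the PWCom implementation. The similarity matrix is a hypothetical stand-in for GDT scores computed elsewhere (structural superposition is omitted), and toy_compare is a hypothetical replacement for the two neural networks: it declares a pair significantly different only when consensus scores differ by an assumed margin, then picks the decoy with the higher consensus score.

```python
from typing import Callable, List, Optional, Sequence


def consensus_scores(sim: Sequence[Sequence[float]]) -> List[float]:
    """Consensus score of each decoy: its mean similarity to every other
    decoy in the pool. sim[i][j] is a precomputed pairwise similarity
    (e.g. a GDT-style score); superposition is assumed done elsewhere."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)]


def pairwise_win_scores(n: int, compare: Callable[[int, int], Optional[int]]) -> List[int]:
    """Score each decoy by the number of pairwise comparisons it wins.
    compare(i, j) returns the winner's index, or None when the two decoys
    are judged not significantly different (nobody scores a win)."""
    wins = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            winner = compare(i, j)
            if winner is not None:
                wins[winner] += 1
    return wins


# Hypothetical similarity matrix for three decoys (symmetric, 1.0 on the diagonal).
sim = [
    [1.0, 0.8, 0.3],
    [0.8, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
cons = consensus_scores(sim)  # decoy 1 is most central; decoy 2 is the outlier


def toy_compare(i: int, j: int, margin: float = 0.1) -> Optional[int]:
    """Hypothetical stand-in for PWCom's two networks: a pair counts as
    'significantly different' only if consensus scores differ by >= margin,
    and the decoy with the higher consensus score wins."""
    diff = cons[i] - cons[j]
    if abs(diff) < margin:
        return None
    return i if diff > 0 else j


wins = pairwise_win_scores(len(sim), toy_compare)
```

With the assumed 0.1 margin, decoys 0 and 1 are too close to separate, so each ends with a single win (over decoy 2), and the outlier decoy 2 ends with none.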
Affiliation(s)
- Zhiquan He
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
- Meshari Alazmi
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
- Jingfen Zhang
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
- Dong Xu
- Department of Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Missouri, United States of America
18
Skwark MJ, Elofsson A. PconsD: ultra rapid, accurate model quality assessment for protein structure prediction. Bioinformatics 2013; 29:1817-8. [PMID: 23677942 DOI: 10.1093/bioinformatics/btt272] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Indexed: 11/13/2022]
Abstract
Summary: Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is little difference between many methods when it comes to ranking models and selecting the best model. When many models are compared, the computational cost of the model comparison can become significant. Here, we present PconsD, a fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy. Availability: The source code for PconsD is freely available at http://d.pcons.net/. Supplementary benchmarking data are also available there. Contact: arne@bioinfo.se Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Marcin J Skwark
- Department of Biochemistry and Biophysics, Science for Life Laboratory, Swedish E-Science Research Center, Stockholm University, Box 1031, 17121 Solna, Sweden
19
Li J, Deng X, Eickholt J, Cheng J. Designing and benchmarking the MULTICOM protein structure prediction system. BMC Struct Biol 2013; 13:2. [PMID: 23442819 PMCID: PMC3599124 DOI: 10.1186/1472-6807-13-2] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 10/16/2012] [Accepted: 02/21/2013] [Indexed: 11/19/2022]
Abstract
Background: Predicting protein structure from sequence is one of the most significant and challenging problems in bioinformatics. Numerous bioinformatics techniques and tools have been developed to tackle almost every aspect of protein structure prediction ranging from structural feature prediction, template identification and query-template alignment to structure sampling, model quality assessment, and model refinement. How to synergistically select, integrate and improve the strengths of the complementary techniques at each prediction stage and build a high-performance system is becoming a critical issue for constructing a successful, competitive protein structure predictor. Results: Over the past several years, we have constructed a standalone protein structure prediction system MULTICOM that combines multiple sources of information and complementary methods at all five stages of the protein structure prediction process including template identification, template combination, model generation, model assessment, and model refinement. The system was blindly tested during the ninth Critical Assessment of Techniques for Protein Structure Prediction (CASP9) in 2010 and yielded very good performance. In addition to studying the overall performance on the CASP9 benchmark, we thoroughly investigated the performance and contributions of each component at each stage of prediction. Conclusions: Our comprehensive and comparative study not only provides useful and practical insights about how to select, improve, and integrate complementary methods to build a cutting-edge protein structure prediction system but also identifies a few new sources of information that may help improve the design of a protein structure prediction system. Several components used in the MULTICOM system are available at: http://sysbio.rnet.missouri.edu/multicom_toolbox/.
Affiliation(s)
- Jilong Li
- Computer Science Department, University of Missouri, Columbia, MO, USA
20
Terashi G, Oosawa M, Nakamura Y, Kanou K, Takeda-Shitaka M. United3D: A Protein Model Quality Assessment Program That Uses Two Consensus Based Methods. Chem Pharm Bull (Tokyo) 2012; 60:1359-65. [DOI: 10.1248/cpb.c12-00287] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]