Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Rykunov D, Fiser A. Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins 2007;67:559-68. [PMID: 17335003 DOI: 10.1002/prot.21279] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

For:	Rykunov D, Fiser A. Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins 2007;67:559-68. [PMID: 17335003 DOI: 10.1002/prot.21279] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]

Number

Cited by Other Article(s)

Bernard C, Postic G, Ghannay S, Tahi F. RNAdvisor: a comprehensive benchmarking tool for the measure and prediction of RNA structural model quality. Brief Bioinform 2024;25:bbae064. [PMID: 38436560 PMCID: PMC10939302 DOI: 10.1093/bib/bbae064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2023] [Revised: 01/30/2024] [Accepted: 02/02/2024] [Indexed: 03/05/2024] Open

Conti S, Ovchinnikov V, Karplus M. ppdx: Automated modeling of protein-protein interaction descriptors for use with machine learning. J Comput Chem 2022;43:1747-1757. [PMID: 35930347 DOI: 10.1002/jcc.26974] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2022] [Revised: 07/01/2022] [Accepted: 07/13/2022] [Indexed: 11/07/2022]

On the Rapid Calculation of Binding Affinities for Antigen and Antibody Design and Affinity Maturation Simulations. Antibodies (Basel) 2022;11:antib11030051. [PMID: 35997345 PMCID: PMC9397028 DOI: 10.3390/antib11030051] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2022] [Revised: 07/23/2022] [Accepted: 08/01/2022] [Indexed: 02/05/2023] Open

Abstract The accurate and efficient calculation of protein-protein binding affinities is an essential component in antibody and antigen design and optimization, and in computer modeling of antibody affinity maturation. Such calculations remain challenging despite advances in computer hardware and algorithms, primarily because proteins are flexible molecules, and thus, require explicit or implicit incorporation of multiple conformational states into the computational procedure. The astronomical size of the amino acid sequence space further compounds the challenge by requiring predictions to be computed within a short time so that many sequence variants can be tested. In this study, we compare three classes of methods for antibody/antigen (Ab/Ag) binding affinity calculations: (i) a method that relies on the physical separation of the Ab/Ag complex in equilibrium molecular dynamics (MD) simulations, (ii) a collection of 18 scoring functions that act on an ensemble of structures created using homology modeling software, and (iii) methods based on the molecular mechanics-generalized Born surface area (MM-GBSA) energy decomposition, in which the individual contributions of the energy terms are scaled to optimize agreement with the experiment. When applied to a set of 49 antibody mutations in two Ab/HIV gp120 complexes, all of the methods are found to have modest accuracy, with the highest Pearson correlations reaching about 0.6. In particular, the most computationally intensive method, i.e., MD simulation, did not outperform several scoring functions. The optimized energy decomposition methods provided marginally higher accuracy, but at the expense of requiring experimental data for parametrization. Within each method class, we examined the effect of the number of independent computational replicates, i.e., modeled structures or reinitialized MD simulations, on the prediction accuracy. We suggest using about ten modeled structures for scoring methods, and about five simulation replicates for MD simulations as a rule of thumb for obtaining reasonable convergence. We anticipate that our study will be a useful resource for practitioners working to incorporate binding affinity calculations within their protein design and optimization process. Collapse

Ye L, Wu P, Peng Z, Gao J, Liu J, Yang J. Improved estimation of model quality using predicted inter-residue distance. Bioinformatics 2021;37:3752-3759. [PMID: 34473228 DOI: 10.1093/bioinformatics/btab632] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2021] [Revised: 08/27/2021] [Accepted: 08/31/2021] [Indexed: 11/13/2022] Open

Abstract

MOTIVATION

Protein model quality assessment (QA) is an essential component in protein structure prediction, which aims to estimate the quality of a structure model and/or select the most accurate model out from a pool of structure models, without knowing the native structure. QA remains a challenging task in protein structure prediction.

RESULTS

Based on the inter-residue distance predicted by the recent deep learning-based structure prediction algorithm trRosetta, we developed QDistance, a new approach to the estimation of both global and local qualities. QDistance works for both single-model and multi-models inputs. We designed several distance-based features to assess the agreement between the predicted and model-derived inter-residue distances. Together with a few widely used features, they are fed into a simple yet powerful linear regression model to infer the global QA scores. The local QA scores for each structure model are predicted based on a comparative analysis with a set of selected reference models. For multi-models input, the reference models are selected from the input based on the predicted global QA scores. For single-model input, the reference models are predicted by trRosetta. With the informative distance-based features, QDistance can predict the global quality with satisfactory accuracy. Benchmark tests on the CASP13 and the CAMEO structure models suggested that QDistance was competitive other methods. Blind tests in the CASP14 experiments showed that QDistance was robust and ranked among the top predictors. Especially, QDistance was the top 3 local QA method and made the most accurate local QA prediction for unreliable local region. Analysis showed that this superior performance can be attributed to the inclusion of the predicted inter-residue distance.

AVAILABILITY AND IMPLEMENTATION

http://yanglab.nankai.edu.cn/QDistance.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Collapse

Liu J, Wu T, Guo Z, Hou J, Cheng J. Improving protein tertiary structure prediction by deep learning and distance prediction in CASP14. Proteins 2021;90:58-72. [PMID: 34291486 PMCID: PMC8671168 DOI: 10.1002/prot.26186] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2021] [Revised: 06/21/2021] [Accepted: 07/12/2021] [Indexed: 12/15/2022]

Abstract

Substantial progresses in protein structure prediction have been made by utilizing deep‐learning and residue‐residue distance prediction since CASP13. Inspired by the advances, we improve our CASP14 MULTICOM protein structure prediction system by incorporating three new components: (a) a new deep learning‐based protein inter‐residue distance predictor to improve template‐free (ab initio) tertiary structure prediction, (b) an enhanced template‐based tertiary structure prediction method, and (c) distance‐based model quality assessment methods empowered by deep learning. In the 2020 CASP14 experiment, MULTICOM predictor was ranked seventh out of 146 predictors in tertiary structure prediction and ranked third out of 136 predictors in inter‐domain structure prediction. The results demonstrate that the template‐free modeling based on deep learning and residue‐residue distance prediction can predict the correct topology for almost all template‐based modeling targets and a majority of hard targets (template‐free targets or targets whose templates cannot be recognized), which is a significant improvement over the CASP13 MULTICOM predictor. Moreover, the template‐free modeling performs better than the template‐based modeling on not only hard targets but also the targets that have homologous templates. The performance of the template‐free modeling largely depends on the accuracy of distance prediction closely related to the quality of multiple sequence alignments. The structural model quality assessment works well on targets for which enough good models can be predicted, but it may perform poorly when only a few good models are predicted for a hard target and the distribution of model quality scores is highly skewed. MULTICOM is available at https://github.com/jianlin-cheng/MULTICOM_Human_CASP14/tree/CASP14_DeepRank3 and https://github.com/multicom-toolbox/multicom/tree/multicom_v2.0.

Collapse

Protein model accuracy estimation empowered by deep learning and inter-residue distance prediction in CASP14. Sci Rep 2021;11:10943. [PMID: 34035363 PMCID: PMC8149836 DOI: 10.1038/s41598-021-90303-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2021] [Accepted: 05/10/2021] [Indexed: 11/28/2022] Open

Cheng J, Choe MH, Elofsson A, Han KS, Hou J, Maghrabi AHA, McGuffin LJ, Menéndez-Hurtado D, Olechnovič K, Schwede T, Studer G, Uziela K, Venclovas Č, Wallner B. Estimation of model accuracy in CASP13. Proteins 2019;87:1361-1377. [PMID: 31265154 DOI: 10.1002/prot.25767] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2019] [Revised: 06/04/2019] [Accepted: 06/15/2019] [Indexed: 12/28/2022]

Tan YL, Feng CJ, Jin L, Shi YZ, Zhang W, Tan ZJ. What is the best reference state for building statistical potentials in RNA 3D structure evaluation? RNA (NEW YORK, N.Y.) 2019;25:793-812. [PMID: 30996105 PMCID: PMC6573789 DOI: 10.1261/rna.069872.118] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 12/08/2018] [Accepted: 04/06/2019] [Indexed: 05/14/2023]

Hou J, Wu T, Cao R, Cheng J. Protein tertiary structure modeling driven by deep learning and contact distance prediction in CASP13. Proteins 2019;87:1165-1178. [PMID: 30985027 PMCID: PMC6800999 DOI: 10.1002/prot.25697] [Citation(s) in RCA: 99] [Impact Index Per Article: 19.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2019] [Revised: 04/04/2019] [Accepted: 04/12/2019] [Indexed: 12/28/2022]

Abstract

Predicting residue‐residue distance relationships (eg, contacts) has become the key direction to advance protein structure prediction since 2014 CASP11 experiment, while deep learning has revolutionized the technology for contact and distance distribution prediction since its debut in 2012 CASP10 experiment. During 2018 CASP13 experiment, we enhanced our MULTICOM protein structure prediction system with three major components: contact distance prediction based on deep convolutional neural networks, distance‐driven template‐free (ab initio) modeling, and protein model ranking empowered by deep learning and contact prediction. Our experiment demonstrates that contact distance prediction and deep learning methods are the key reasons that MULTICOM was ranked 3rd out of all 98 predictors in both template‐free and template‐based structure modeling in CASP13. Deep convolutional neural network can utilize global information in pairwise residue‐residue features such as coevolution scores to substantially improve contact distance prediction, which played a decisive role in correctly folding some free modeling and hard template‐based modeling targets. Deep learning also successfully integrated one‐dimensional structural features, two‐dimensional contact information, and three‐dimensional structural quality scores to improve protein model quality assessment, where the contact prediction was demonstrated to consistently enhance ranking of protein models for the first time. The success of MULTICOM system clearly shows that protein contact distance prediction and model selection driven by deep learning holds the key of solving protein structure prediction problem. However, there are still challenges in accurately predicting protein contact distance when there are few homologous sequences, folding proteins from noisy contact distances, and ranking models of hard targets.

Collapse

Conti S, Karplus M. Estimation of the breadth of CD4bs targeting HIV antibodies by molecular modeling and machine learning. PLoS Comput Biol 2019;15:e1006954. [PMID: 30970017 PMCID: PMC6457539 DOI: 10.1371/journal.pcbi.1006954] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2018] [Accepted: 03/18/2019] [Indexed: 11/21/2022] Open

Abstract

HIV is a highly mutable virus for which all attempts to develop a vaccine have been unsuccessful. Nevertheless, few long-infected patients develop antibodies, called broadly neutralizing antibodies (bnAbs), that have a high breadth and can neutralize multiple variants of the virus. This suggests that a universal HIV vaccine should be possible. A measure of the efficacy of a HIV vaccine is the neutralization breadth of the antibodies it generates. The breadth is defined as the fraction of viruses in the Seaman panel that are neutralized by the antibody. Experimentally the neutralization ability is measured as the half maximal inhibitory concentration of the antibody (IC50). To avoid such time-consuming experimental measurements, we developed a computational approach to estimate the IC50 and use it to determine the antibody breadth. Given that no direct method exists for calculating IC50 values, we resort to a combination of atomistic modeling and machine learning. For each antibody/virus complex, an all-atoms model is built using the amino acid sequence and a known structure of a related complex. Then a series of descriptors are derived from the atomistic models, and these are used to train a Multi-Layer Perceptron (an Artificial Neural Network) to predict the value of the IC50 (by regression), or if the antibody binds or not to the virus (by classification). The neural networks are trained by use of experimental IC50 values collected in the CATNAP database. The computed breadths obtained by regression and classification are reported and the importance of having some related information in the data set for obtaining accurate predictions is analyzed. This approach is expected to prove useful for the design of HIV bnAbs, where the computation of the potency must be accompanied by a computation of the breadth, and for evaluating the efficiency of potential vaccination schemes developed through modeling and simulation.

Collapse

Role of solvent accessibility for aggregation-prone patches in protein folding. Sci Rep 2018;8:12896. [PMID: 30150761 PMCID: PMC6110721 DOI: 10.1038/s41598-018-31289-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2018] [Accepted: 08/15/2018] [Indexed: 11/21/2022] Open

Anishchenko I, Kundrotas PJ, Vakser IA. Contact Potential for Structure Prediction of Proteins and Protein Complexes from Potts Model. Biophys J 2018;115:809-821. [PMID: 30122295 DOI: 10.1016/j.bpj.2018.07.035] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2018] [Revised: 07/16/2018] [Accepted: 07/31/2018] [Indexed: 12/18/2022] Open

Yao Y, Gui R, Liu Q, Yi M, Deng H. Diverse effects of distance cutoff and residue interval on the performance of distance-dependent atom-pair potential in protein structure prediction. BMC Bioinformatics 2017;18:542. [PMID: 29221443 PMCID: PMC5723101 DOI: 10.1186/s12859-017-1983-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/10/2017] [Accepted: 12/04/2017] [Indexed: 12/27/2022] Open

Cao R, Adhikari B, Bhattacharya D, Sun M, Hou J, Cheng J. QAcon: single model quality assessment using protein structural and contact information with machine learning techniques. Bioinformatics 2017;33:586-588. [PMID: 28035027 DOI: 10.1093/bioinformatics/btw694] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/07/2016] [Accepted: 11/01/2016] [Indexed: 11/14/2022] Open

All-Atom Four-Body Knowledge-Based Statistical Potentials to Distinguish Native Protein Structures from Nonnative Folds. BIOMED RESEARCH INTERNATIONAL 2017;2017:5760612. [PMID: 29119109 PMCID: PMC5651141 DOI: 10.1155/2017/5760612] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/27/2017] [Revised: 08/13/2017] [Accepted: 08/23/2017] [Indexed: 02/05/2023]

Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem 2017;292:14349-14361. [PMID: 28710274 DOI: 10.1074/jbc.m117.784165] [Citation(s) in RCA: 77] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2017] [Revised: 07/11/2017] [Indexed: 01/18/2023] Open

Abstract

Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnology. Increasing protein stability is an especially challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 experimental mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodynamic stability, ThreeFoil. Experimental characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional ties, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein solubility, the most common cause of protein design failure. Examination of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of solubility.

Collapse

Cao R, Bhattacharya D, Hou J, Cheng J. DeepQA: improving the estimation of single protein model quality with deep belief networks. BMC Bioinformatics 2016;17:495. [PMID: 27919220 PMCID: PMC5139030 DOI: 10.1186/s12859-016-1405-y] [Citation(s) in RCA: 112] [Impact Index Per Article: 14.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/11/2016] [Accepted: 12/01/2016] [Indexed: 01/02/2023] Open

Dybas JM, Fiser A. Development of a motif-based topology-independent structure comparison method to identify evolutionarily related folds. Proteins 2016;84:1859-1874. [PMID: 27671894 PMCID: PMC5118133 DOI: 10.1002/prot.25169] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2016] [Revised: 08/17/2016] [Accepted: 08/25/2016] [Indexed: 11/09/2022]

Protein single-model quality assessment by feature-based probability density functions. Sci Rep 2016;6:23990. [PMID: 27041353 PMCID: PMC4819172 DOI: 10.1038/srep23990] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2015] [Accepted: 03/17/2016] [Indexed: 11/11/2022] Open

Cao R, Bhattacharya D, Adhikari B, Li J, Cheng J. Large-scale model quality assessment for improving protein tertiary structure prediction. Bioinformatics 2015;31:i116-23. [PMID: 26072473 PMCID: PMC4553833 DOI: 10.1093/bioinformatics/btv235] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open

Affiliation(s)

Renzhi Cao Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
Debswapna Bhattacharya Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
Badri Adhikari Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
Jilong Li Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA
Jianlin Cheng Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA Computer Science Department, University of Missouri, Columbia, Missouri, 65211, USA, Informatics Institute, University of Missouri, Columbia, Missouri, 65211, USA and C. Bond Life Science Center, University of Missouri, Columbia, Missouri, 65211, USA

Collapse

A large-scale conformation sampling and evaluation server for protein tertiary structure prediction and its assessment in CASP11. BMC Bioinformatics 2015;16:337. [PMID: 26493701 PMCID: PMC4619059 DOI: 10.1186/s12859-015-0775-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2015] [Accepted: 10/14/2015] [Indexed: 11/10/2022] Open

Abstract

Background

With more and more protein sequences produced in the genomic era, predicting protein structures from sequences becomes very important for elucidating the molecular details and functions of these proteins for biomedical research. Traditional template-based protein structure prediction methods tend to focus on identifying the best templates, generating the best alignments, and applying the best energy function to rank models, which often cannot achieve the best performance because of the difficulty of obtaining best templates, alignments, and models.

Methods

We developed a large-scale conformation sampling and evaluation method and its servers to improve the reliability and robustness of protein structure prediction. In the first step, our method used a variety of alignment methods to sample relevant and complementary templates and to generate alternative and diverse target-template alignments, used a template and alignment combination protocol to combine alignments, and used template-based and template-free modeling methods to generate a pool of conformations for a target protein. In the second step, it used a large number of protein model quality assessment methods to evaluate and rank the models in the protein model pool, in conjunction with an exception handling strategy to deal with any additional failure in model ranking.

Results

The method was implemented as two protein structure prediction servers: MULTICOM-CONSTRUCT and MULTICOM-CLUSTER that participated in the 11th Critical Assessment of Techniques for Protein Structure Prediction (CASP11) in 2014. The two servers were ranked among the best 10 server predictors.

Conclusions

The good performance of our servers in CASP11 demonstrates the effectiveness and robustness of the large-scale conformation sampling and evaluation. The MULTICOM server is available at: http://sysbio.rnet.missouri.edu/multicom_cluster/.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0775-x) contains supplementary material, which is available to authorized users.

Collapse

Deng H, Jia Y, Zhang Y. 3DRobot: automated generation of diverse and well-packed protein structure decoys. Bioinformatics 2015;32:378-87. [PMID: 26471454 DOI: 10.1093/bioinformatics/btv601] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/28/2015] [Accepted: 10/10/2015] [Indexed: 11/12/2022] Open

Cao R, Bhattacharya D, Adhikari B, Li J, Cheng J. Massive integration of diverse protein quality assessment methods to improve template based modeling in CASP11. Proteins 2015;84 Suppl 1:247-59. [PMID: 26369671 DOI: 10.1002/prot.24924] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2015] [Revised: 08/21/2015] [Accepted: 09/10/2015] [Indexed: 12/28/2022]

Vallat B, Madrid-Aliste C, Fiser A. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures. PLoS Comput Biol 2015;11:e1004419. [PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2015] [Accepted: 06/30/2015] [Indexed: 12/25/2022] Open

Abstract

Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly require template-free methods. However, template-free modeling methods are much less reliable and are usually applicable for smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state of the art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.

Each protein folds into a unique three-dimensional structure that enables it to carry out its biological function. Knowledge of the atomic details of protein structures is therefore a key to understanding their function. Advances in high throughput experimental technologies have lead to an exponential increase in the availability of known protein sequences. Although strong progress has been made in experimental protein structure determination, it remains a fact that more than 99% of structural information is provided by computational modeling methods. We describe here a novel structure prediction method, SmotifTF, which uses a unique library of known protein fragments to assemble the three-dimensional structure of a sequence. The fragment library has saturated over time and therefore provides a complete set of building blocks required for model building. The method performs competitively compared to existing methods of structure prediction.

Collapse

Xu Y, Zhou X, Huang M. StaRProtein, a web server for prediction of the stability of repeat proteins. PLoS One 2015;10:e0119417. [PMID: 25807112 PMCID: PMC4373711 DOI: 10.1371/journal.pone.0119417] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2014] [Accepted: 01/13/2015] [Indexed: 11/25/2022] Open

Wang J, Zhao Y, Zhu C, Xiao Y. 3dRNAscore: a distance and torsion angle dependent evaluation function of 3D RNA structures. Nucleic Acids Res 2015;43:e63. [PMID: 25712091 PMCID: PMC4446410 DOI: 10.1093/nar/gkv141] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2014] [Accepted: 02/06/2015] [Indexed: 01/02/2023] Open

Bolt K, Bergman A. Systems Biology of Aging. LONGEVITY GENES 2015;847:163-78. [DOI: 10.1007/978-1-4939-2404-2_8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/11/2023]

Jayaram B, Dhingra P, Mishra A, Kaushik R, Mukherjee G, Singh A, Shekhar S. Bhageerath-H: a homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins. BMC Bioinformatics 2014;15 Suppl 16:S7. [PMID: 25521245 PMCID: PMC4290660 DOI: 10.1186/1471-2105-15-s16-s7] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/06/2023] Open

Thompson JJ, Tabatabaei Ghomi H, Lill MA. Application of information theory to a three-body coarse-grained representation of proteins in the PDB: insights into the structural and evolutionary roles of residues in protein structure. Proteins 2014;82:3450-65. [PMID: 25269778 DOI: 10.1002/prot.24698] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/23/2014] [Revised: 09/09/2014] [Accepted: 09/19/2014] [Indexed: 01/03/2023]

Ghomi HT, Thompson JJ, Lill MA. Are distance-dependent statistical potentials considering three interacting bodies superior to two-body statistical potentials for protein structure prediction? J Bioinform Comput Biol 2014;12:1450022. [PMID: 25212727 DOI: 10.1142/s021972001450022x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023]

Abstract

Distance-based statistical potentials have long been used to model condensed matter systems, e.g. as scoring functions in differentiating native-like protein structures from decoys. These scoring functions are based on the assumption that the total free energy of the protein can be calculated as the sum of pairwise free energy contributions derived from a statistical analysis of pair-distribution functions. However, this fundamental assumption has been challenged theoretically. In fact the free energy of a system with N particles is only exactly related to the N-body distribution function. Based on this argument coarse-grained multi-body statistical potentials have been developed to capture higher-order interactions. Having a coarse representation of the protein and using geometric contacts instead of pairwise interaction distances renders these models insufficient in modeling details of multi-body effects. In this study, we investigated if extending distance-dependent pairwise atomistic statistical potentials to corresponding interaction functions that are conditional on a third interacting body, defined as quasi-three-body statistical potentials, could model details of three-body interactions. We also tested if this approach could improve the predictive capabilities of statistical scoring functions for protein structure prediction. We analyzed the statistical dependency between two simultaneous pairwise interactions and showed that there is surprisingly little if any dependency of a third interacting site on pairwise atomistic statistical potentials. Also the protein structure prediction performance of these quasi-three-body potentials is comparable with their corresponding two-body counterparts. The scoring functions developed in this study showed better or comparable performances compared to some widely used scoring functions for protein structure prediction.

Collapse

Menon V, Vallat BK, Dybas JM, Fiser A. Modeling proteins using a super-secondary structure library and NMR chemical shift information. Structure 2013;21:891-9. [PMID: 23685209 DOI: 10.1016/j.str.2013.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2012] [Revised: 04/02/2013] [Accepted: 04/13/2013] [Indexed: 11/29/2022]

Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2013;1834:1520-31. [PMID: 23665455 DOI: 10.1016/j.bbapap.2013.04.023] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/15/2013] [Revised: 04/12/2013] [Accepted: 04/15/2013] [Indexed: 12/15/2022]

Pujato M, MacCarthy T, Fiser A, Bergman A. The underlying molecular and network level mechanisms in the evolution of robustness in gene regulatory networks. PLoS Comput Biol 2013;9:e1002865. [PMID: 23300434 PMCID: PMC3536627 DOI: 10.1371/journal.pcbi.1002865] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2012] [Accepted: 11/13/2012] [Indexed: 11/18/2022] Open

Runthala A. Protein structure prediction: challenging targets for CASP10. J Biomol Struct Dyn 2012;30:607-15. [PMID: 22731875 DOI: 10.1080/07391102.2012.687526] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/28/2022]

Deng H, Jia Y, Wei Y, Zhang Y. What is the best reference state for designing statistical atomic potentials in protein structure prediction? Proteins 2012;80:2311-22. [PMID: 22623012 DOI: 10.1002/prot.24121] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2012] [Revised: 04/30/2012] [Accepted: 05/21/2012] [Indexed: 01/01/2023]

Ceres N, Lavery R. Coarse-grain Protein Models. INNOVATIONS IN BIOMOLECULAR MODELING AND SIMULATIONS 2012. [DOI: 10.1039/9781849735049-00219] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/30/2023]

Shirota M, Ishida T, Kinoshita K. Absolute quality evaluation of protein model structures using statistical potentials with respect to the native and reference states. Proteins 2011;79:1550-63. [PMID: 21365682 DOI: 10.1002/prot.22982] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2010] [Revised: 11/19/2010] [Accepted: 12/19/2010] [Indexed: 11/06/2022]

Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One 2010;5:e13714. [PMID: 21103041 PMCID: PMC2978081 DOI: 10.1371/journal.pone.0013714] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2010] [Accepted: 10/04/2010] [Indexed: 11/26/2022] Open

Abstract

Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.

Collapse

Sikic K, Tomic S, Carugo O. Systematic comparison of crystal and NMR protein structures deposited in the protein data bank. Open Biochem J 2010;4:83-95. [PMID: 21293729 PMCID: PMC3032220 DOI: 10.2174/1874091x01004010083] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2010] [Revised: 05/20/2010] [Accepted: 06/14/2010] [Indexed: 11/22/2022] Open

Solis AD, Rackovsky SR. Information-theoretic analysis of the reference state in contact potentials used for protein structure prediction. Proteins 2010;78:1382-97. [PMID: 20034109 DOI: 10.1002/prot.22652] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]

Rykunov D, Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010;11:128. [PMID: 20226048 PMCID: PMC2853469 DOI: 10.1186/1471-2105-11-128] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2009] [Accepted: 03/12/2010] [Indexed: 11/30/2022] Open

Fiser A. Template-based protein structure modeling. Methods Mol Biol 2010;673:73-94. [PMID: 20835794 DOI: 10.1007/978-1-60761-842-3_6] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/11/2022]

Shirota M, Ishida T, Kinoshita K. Analyses on hydrophobicity and attractiveness of all-atom distance-dependent potentials. Protein Sci 2009;18:1906-15. [PMID: 19588493 DOI: 10.1002/pro.201] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]

Aloy P, Oliva B. Splitting statistical potentials into meaningful scoring functions: testing the prediction of near-native structures from decoy conformations. BMC STRUCTURAL BIOLOGY 2009;9:71. [PMID: 19917096 PMCID: PMC2783033 DOI: 10.1186/1472-6807-9-71] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/24/2009] [Accepted: 11/16/2009] [Indexed: 11/20/2022]

Cohen M, Potapov V, Schreiber G. Four distances between pairs of amino acids provide a precise description of their interaction. PLoS Comput Biol 2009;5:e1000470. [PMID: 19680437 PMCID: PMC2715887 DOI: 10.1371/journal.pcbi.1000470] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2009] [Accepted: 07/15/2009] [Indexed: 11/18/2022] Open

Improved scoring function for comparative modeling using the M4T method. ACTA ACUST UNITED AC 2008;10:95-9. [PMID: 18985440 DOI: 10.1007/s10969-008-9044-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2008] [Accepted: 10/16/2008] [Indexed: 10/21/2022]

Shirota M, Ishida T, Kinoshita K. Effects of surface-to-volume ratio of proteins on hydrophilic residues: decrease in occurrence and increase in buried fraction. Protein Sci 2008;17:1596-602. [PMID: 18556475 DOI: 10.1110/ps.035592.108] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]