1
|
Konecki DM, Hamrick S, Wang C, Agosto MA, Wensel TG, Lichtarge O. CovET: A covariation-evolutionary trace method that identifies protein structure-function modules. J Biol Chem 2023; 299:104896. [PMID: 37290531 PMCID: PMC10338321 DOI: 10.1016/j.jbc.2023.104896] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2023] [Revised: 06/01/2023] [Accepted: 06/02/2023] [Indexed: 06/10/2023] Open
Abstract
Measuring the relative effect that any two sequence positions have on each other may improve protein design or help better interpret coding variants. Current approaches use statistics and machine learning but rarely consider phylogenetic divergences which, as shown by Evolutionary Trace studies, provide insight into the functional impact of sequence perturbations. Here, we reframe covariation analyses in the Evolutionary Trace framework to measure the relative tolerance to perturbation of each residue pair during evolution. This approach (CovET) systematically accounts for phylogenetic divergences: at each divergence event, we penalize covariation patterns that belie evolutionary coupling. We find that while CovET approximates the performance of existing methods to predict individual structural contacts, it performs significantly better at finding structural clusters of coupled residues and ligand binding sites. For example, CovET found more functionally critical residues when we examined the RNA recognition motif and WW domains. It correlates better with large-scale epistasis screen data. In the dopamine D2 receptor, top CovET residue pairs recovered accurately the allosteric activation pathway characterized for Class A G protein-coupled receptors. These data suggest that CovET ranks highest the sequence position pairs that play critical functional roles through epistatic and allosteric interactions in evolutionarily relevant structure-function motifs. CovET complements current methods and may shed light on fundamental molecular mechanisms of protein structure and function.
Collapse
Affiliation(s)
- Daniel M Konecki
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Spencer Hamrick
- Chemical, Physical, and Structural Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Chen Wang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA
| | - Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Theodore G Wensel
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA
| | - Olivier Lichtarge
- Quantitative and Computational Biosciences Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Cancer and Cell Biology Graduate Program, Baylor College of Medicine, Houston, Texas, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, Texas, USA.
| |
Collapse
|
2
|
Improved Protein Real-Valued Distance Prediction Using Deep Residual Dense Network (DRDN). Protein J 2022; 41:468-476. [PMID: 36008645 DOI: 10.1007/s10930-022-10067-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/15/2022] [Indexed: 10/15/2022]
Abstract
Three-dimensional protein structure prediction is one of the major challenges in bioinformatics. According to recent research findings, real-valued distance prediction plays a vital role in determining the unique three-dimensional protein structure. This paper proposes a novel methodology involving a deep residual dense network (DRDN) for predicting protein real-valued distance. The features extracted from the given query protein sequence and its corresponding homologous sequences are used for training the model. Multi-aligned homologous sequences for each query protein sequence are retrieved from five different databases using DeepMSA, HHblits, and HITS_PR_HHblits methods. The proposed method yielded outcomes of 3.89, 0.23, 0.45, and 0.63, respectively, corresponding to the evaluation metrics such as Absolute Error, Relative Error, High-accuracy Pairwise Distance Test (PDA), and Pairwise Distance Test (PDT). Further, the contact map is computed based on CASP criteria by converting the predicted real-valued distance, and it is evaluated using the precision metric. It is observed that precision of long-range top L/5 contact prediction on the CASP13 dataset by the proposed method, RaptorX, Zhang, trRosetta, JinboXu & JinLu, and Deepdist are 0.834, 0.657, 0.70, 0.785, 0.786, and 0.812, respectively. Also, Top-L/5 contact prediction on the CASP14 dataset evaluated using average precision resulted in 0.847, 0.707, 0.752, 0.783, 0.792, 0.817, and 0.825 respectively, corresponding to the proposed method, Zhang, RaptorX, trRosetta, Deepdist, JinboXu & JinLu, and Alphafold2.
Collapse
|
3
|
Peng CX, Zhou XG, Zhang GJ. De novo Protein Structure Prediction by Coupling Contact With Distance Profile. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:395-406. [PMID: 32750861 DOI: 10.1109/tcbb.2020.3000758] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
De novo protein structure prediction is a challenging problem that requires both an accurate energy function and an efficient conformation sampling method. In this study, a de novo structure prediction method, named CoDiFold, is proposed. In CoDiFold, contacts and distance profiles are organically combined into the Rosetta low-resolution energy function to improve the accuracy of energy function. As a result, the correlation between energy and root mean square deviation (RMSD) is improved. In addition, a population-based multi-mutation strategy is designed to balance the exploration and exploitation of conformation space sampling. The average RMSD of the models generated by the proposed protocol is decreased by 49.24 and 45.21 percent in the test set with 43 proteins compared with those of Rosetta and QUARK de novo protocols, respectively. The results also demonstrate that the structures predicted by proposed CoDiFold are comparable to the state-of-the-art methods for the 10 FM targets of CASP13. The source code and executable versions are freely available at http://github.com/iobio-zjut/CoDiFold.
Collapse
|
4
|
Su H, Wang W, Du Z, Peng Z, Gao S, Cheng M, Yang J. Improved Protein Structure Prediction Using a New Multi-Scale Network and Homologous Templates. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2021; 8:e2102592. [PMID: 34719864 PMCID: PMC8693034 DOI: 10.1002/advs.202102592] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/19/2021] [Revised: 09/12/2021] [Indexed: 06/04/2023]
Abstract
The accuracy of de novo protein structure prediction has been improved considerably in recent years, mostly due to the introduction of deep learning techniques. In this work, trRosettaX, an improved version of trRosetta for protein structure prediction is presented. The major improvement over trRosetta consists of two folds. The first is the application of a new multi-scale network, i.e., Res2Net, for improved prediction of inter-residue geometries, including distance and orientations. The second is an attention-based module to exploit multiple homologous templates to increase the accuracy further. Compared with trRosetta, trRosettaX improves the contact precision by 6% and 8% on the free modeling targets of CASP13 and CASP14, respectively. A preliminary version of trRosettaX is ranked as one of the top server groups in CASP14's blind test. Additional benchmark test on 161 targets from CAMEO (between Jun and Sep 2020) shows that trRosettaX achieves an average TM-score ≈0.8, outperforming the top groups in CAMEO. These data suggest the effectiveness of using the multi-scale network and the benefit of incorporating homologous templates into the network. The trRosettaX algorithm is incorporated into the trRosetta server since Nov 2020. The web server, the training and inference codes are available at: https://yanglab.nankai.edu.cn/trRosetta/.
Collapse
Affiliation(s)
- Hong Su
- School of Mathematical SciencesNankai UniversityTianjin300071China
| | - Wenkai Wang
- School of Mathematical SciencesNankai UniversityTianjin300071China
| | - Zongyang Du
- School of Mathematical SciencesNankai UniversityTianjin300071China
| | - Zhenling Peng
- Research Center for Mathematics and Interdisciplinary SciencesShandong UniversityQingdao266237China
| | - Shang‐Hua Gao
- College of Computer ScienceNankai UniversityTianjin300071China
| | - Ming‐Ming Cheng
- College of Computer ScienceNankai UniversityTianjin300071China
| | - Jianyi Yang
- Research Center for Mathematics and Interdisciplinary SciencesShandong UniversityQingdao266237China
| |
Collapse
|
5
|
Ruiz-Serra V, Pontes C, Milanetti E, Kryshtafovych A, Lepore R, Valencia A. Assessing the accuracy of contact and distance predictions in CASP14. Proteins 2021; 89:1888-1900. [PMID: 34595772 DOI: 10.1002/prot.26248] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2021] [Revised: 09/06/2021] [Accepted: 09/21/2021] [Indexed: 12/26/2022]
Abstract
We present the results of the assessment of the intramolecular residue-residue contact and distance predictions from groups participating in the 14th round of the CASP experiment. The performance of contact prediction methods was evaluated with the measures used in previous CASPs, while distance predictions were assessed based on a new protocol, which considers individual distance pairs as well as the whole predicted distance matrix, using a graph-based framework. The results of the evaluation indicate that predictions by the tFold framework, TripletRes and DeepPotential were the most accurate in both categories. With regards to progress in method performance, the results of the assessment in contact prediction did not reveal any discernible difference when compared to CASP13. Arguably, this could be due to CASP14 FM targets being more challenging than ever before.
Collapse
Affiliation(s)
| | - Camila Pontes
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
| | - Edoardo Milanetti
- Department of Physics, Sapienza Università di Roma, Rome, Italy.,Center for Life Nano- & Neuro-Science, Fondazione Istituto Italiano di Tecnologia (IIT), Rome, Italy
| | | | | | - Alfonso Valencia
- Barcelona Supercomputing Center (BSC), Barcelona, Spain.,ICREA, Pg. Lluís Companys, Barcelona, Spain
| |
Collapse
|
6
|
Hong Z, Liu J, Chen Y. An interpretable machine learning method for homo-trimeric protein interface residue-residue interaction prediction. Biophys Chem 2021; 278:106666. [PMID: 34418678 DOI: 10.1016/j.bpc.2021.106666] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2021] [Revised: 08/09/2021] [Accepted: 08/09/2021] [Indexed: 12/29/2022]
Abstract
Protein-protein interaction plays an important role in life activities. A more fine-grained analysis, such as residues and atoms level, will better benefit us to understand the mechanism for inter-protein interaction and drug design. The development of efficient computational methods to reduce trials and errors, as well as assisting experimental researchers to determine the complex structure are some of the ongoing studies in the field. The research of trimer protein interface, especially homotrimer, has been rarely studied. In this paper, we proposed an interpretable machine learning method for homo-trimeric protein interface residue pairs prediction. The structure, sequence, and physicochemical information are intergraded as feature input fed to model for training. Graph model is utilized to present spatial information for intra-protein. Matrix factorization captures the different features' interactions. Kernel function is designed to auto-acquire the adjacent information of our target residue pairs. The accuracy rate achieves 54.5% in an independent test set. Sequence and structure alignment exhibit the ability of model self-study. Our model indicates the biological significance between sequence and structure, and could be auxiliary for reducing trials and errors in the fields of protein complex determination and protein-protein docking, etc. SIGNIFICANCE: Protein complex structures are significant for understanding protein function and promising functional protein design. With data increasing, some computational tools have been developed for protein complex residue contact prediction, which is one of the most significant steps for complex structure prediction. But for homo-trimeric protein, the sequence-based deep learning predictors are infeasible for homologous sequences, and the algorithm black box prevents us from understanding of each step operation. In this way, we propose an interpreting machine learning method for homo-trimeric protein interface residue-residue interaction prediction, and the predictor shows a good performance. Our work provides a computational auxiliary way for determining the homo-trimeric proteins interface residue pairs which will be further verified by wet experiments, and and gives a hand for the downstream works, such as protein-protein docking, protein complex structure prediction and drug design.
Collapse
Affiliation(s)
- Zhonghua Hong
- Jiaxing Hospital of Traditional Chinese Medicine, Jiaxing University, Jiaxing 314001, PR China.
| | - Jiale Liu
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, PR China
| | - Yinggao Chen
- Shantou Central Hospital, Shantou 515041, PR China.
| |
Collapse
|
7
|
Zheng W, Zhang C, Li Y, Pearce R, Bell EW, Zhang Y. Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations. CELL REPORTS METHODS 2021; 1:100014. [PMID: 34355210 PMCID: PMC8336924 DOI: 10.1016/j.crmeth.2021.100014] [Citation(s) in RCA: 261] [Impact Index Per Article: 87.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2021] [Revised: 04/22/2021] [Accepted: 05/03/2021] [Indexed: 12/23/2022]
Abstract
Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, to integrate interresidue contact maps from deep neural-network learning with the cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice the number of non-homologous proteins than the I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the quick mutation rate of the virus and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
Collapse
Affiliation(s)
- Wei Zheng
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Chengxin Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Eric W. Bell
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
8
|
Wu T, Guo Z, Hou J, Cheng J. DeepDist: real-value inter-residue distance prediction with deep residual convolutional network. BMC Bioinformatics 2021; 22:30. [PMID: 33494711 PMCID: PMC7831258 DOI: 10.1186/s12859-021-03960-9] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Driven by deep learning, inter-residue contact/distance prediction has been significantly improved and substantially enhanced ab initio protein structure prediction. Currently, most of the distance prediction methods classify inter-residue distances into multiple distance intervals instead of directly predicting real-value distances. The output of the former has to be converted into real-value distances to be used in tertiary structure prediction. RESULTS To explore the potentials of predicting real-value inter-residue distances, we develop a multi-task deep learning distance predictor (DeepDist) based on new residual convolutional network architectures to simultaneously predict real-value inter-residue distances and classify them into multiple distance intervals. Tested on 43 CASP13 hard domains, DeepDist achieves comparable performance in real-value distance prediction and multi-class distance prediction. The average mean square error (MSE) of DeepDist's real-value distance prediction is 0.896 Å2 when filtering out the predicted distance ≥ 16 Å, which is lower than 1.003 Å2 of DeepDist's multi-class distance prediction. When distance predictions are converted into contact predictions at 8 Å threshold (the standard threshold in the field), the precision of top L/5 and L/2 contact predictions of DeepDist's multi-class distance prediction is 79.3% and 66.1%, respectively, higher than 78.6% and 64.5% of its real-value distance prediction and the best results in the CASP13 experiment. CONCLUSIONS DeepDist can predict inter-residue distances well and improve binary contact prediction over the existing state-of-the-art methods. Moreover, the predicted real-value distances can be directly used to reconstruct protein tertiary structures better than multi-class distance predictions due to the lower MSE. Finally, we demonstrate that predicting the real-value distance map and multi-class distance map at the same time performs better than predicting real-value distances alone.
Collapse
Affiliation(s)
- Tianqi Wu
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Zhiye Guo
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA
| | - Jie Hou
- Department of Computer Science, Saint Louis University, St. Louis, MO, 63103, USA
| | - Jianlin Cheng
- Electrical Engineering and Computer Science Department, University of Missouri, Columbia, MO, 65211, USA.
| |
Collapse
|
9
|
Gao W, Mahajan SP, Sulam J, Gray JJ. Deep Learning in Protein Structural Modeling and Design. PATTERNS (NEW YORK, N.Y.) 2020; 1:100142. [PMID: 33336200 PMCID: PMC7733882 DOI: 10.1016/j.patter.2020.100142] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
Abstract
Deep learning is catalyzing a scientific revolution fueled by big data, accessible toolkits, and powerful computational resources, impacting many fields, including protein structural modeling. Protein structural modeling, such as predicting structure from amino acid sequence and evolutionary information, designing proteins toward desirable functionality, or predicting properties or behavior of a protein, is critical to understand and engineer biological systems at the molecular level. In this review, we summarize the recent advances in applying deep learning techniques to tackle problems in protein structural modeling and design. We dissect the emerging approaches using deep learning techniques for protein structural modeling and discuss advances and challenges that must be addressed. We argue for the central importance of structure, following the "sequence → structure → function" paradigm. This review is directed to help both computational biologists to gain familiarity with the deep learning methods applied in protein modeling, and computer scientists to gain perspective on the biologically meaningful problems that may benefit from deep learning techniques.
Collapse
Affiliation(s)
- Wenhao Gao
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeremias Sulam
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
| |
Collapse
|
10
|
Chasing coevolutionary signals in intrinsically disordered proteins complexes. Sci Rep 2020; 10:17962. [PMID: 33087759 PMCID: PMC7578644 DOI: 10.1038/s41598-020-74791-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/14/2020] [Accepted: 08/27/2020] [Indexed: 11/30/2022] Open
Abstract
Intrinsically disordered proteins/regions (IDPs/IDRs) are crucial components of the cell, they are highly abundant and participate ubiquitously in a wide range of biological functions, such as regulatory processes and cell signaling. Many of their important functions rely on protein interactions, by which they trigger or modulate different pathways. Sequence covariation, a powerful tool for protein contact prediction, has been applied successfully to predict protein structure and to identify protein–protein interactions mostly of globular proteins. IDPs/IDRs also mediate a plethora of protein–protein interactions, highlighting the importance of addressing sequence covariation-based inter-protein contact prediction of this class of proteins. Despite their importance, a systematic approach to analyze the covariation phenomena of intrinsically disordered proteins and their complexes is still missing. Here we carry out a comprehensive critical assessment of coevolution-based contact prediction in IDP/IDR complexes and detail the challenges and possible limitations that emerge from their analysis. We found that the coevolutionary signal is faint in most of the complexes of disordered proteins but positively correlates with the interface size and binding affinity between partners. In addition, we discuss the state-of-art methodology by biological interpretation of the results, formulate evaluation guidelines and suggest future directions of development to the field.
Collapse
|
11
|
Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/08/2023]
Abstract
Ab initio protein tertiary structure prediction is one of the long-standing problems in structural bioinformatics. With the help of residue-residue contact and secondary structure prediction information, the accuracy of ab initio structure prediction can be enhanced. In this study, an improved differential evolution with secondary structure and residue-residue contact information referred to as SCDE is proposed for protein structure prediction. In SCDE, two score models based on secondary structure and contact information are proposed, and two selection strategies, namely, secondary structure-based selection strategy and contact-based selection strategy, are designed to guide conformation space search. A probability distribution function is designed to balance these two selection strategies. Experimental results on a benchmark dataset with 28 proteins and four free model targets in CASP12 demonstrate that the proposed SCDE is effective and efficient.
Collapse
|
12
|
Bhattacharya S, Bhattacharya D. Evaluating the significance of contact maps in low-homology protein modeling using contact-assisted threading. Sci Rep 2020; 10:2908. [PMID: 32076047 PMCID: PMC7031282 DOI: 10.1038/s41598-020-59834-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2019] [Accepted: 02/04/2020] [Indexed: 12/02/2022] Open
Abstract
The development of improved threading algorithms for remote homology modeling is a critical step forward in template-based protein structure prediction. We have recently demonstrated the utility of contact information to boost protein threading by developing a new contact-assisted threading method. However, the nature and extent to which the quality of a predicted contact map impacts the performance of contact-assisted threading remains elusive. Here, we systematically analyze and explore this interdependence by employing our newly-developed contact-assisted threading method over a large-scale benchmark dataset using predicted contact maps from four complementary methods including direct coupling analysis (mfDCA), sparse inverse covariance estimation (PSICOV), classical neural network-based meta approach (MetaPSICOV), and state-of-the-art ultra-deep learning model (RaptorX). Experimental results demonstrate that contact-assisted threading using high-quality contacts having the Matthews Correlation Coefficient (MCC) ≥ 0.5 improves threading performance in nearly 30% cases, while low-quality contacts with MCC <0.35 degrades the performance for 50% cases. This holds true even in CASP13 dataset, where threading using high-quality contacts (MCC ≥ 0.5) significantly improves the performance of 22 instances out of 29. Collectively, our study uncovers the mutual association between the quality of predicted contacts and its possible utility in boosting threading performance for improving low-homology protein modeling.
Collapse
Affiliation(s)
- Sutanu Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA
| | - Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, 36849, USA.
- Department of Biological Sciences, Auburn University, Auburn, AL, 36849, USA.
| |
Collapse
|
13
|
Bittrich S, Schroeder M, Labudde D. StructureDistiller: Structural relevance scoring identifies the most informative entries of a contact map. Sci Rep 2019; 9:18517. [PMID: 31811259 PMCID: PMC6898053 DOI: 10.1038/s41598-019-55047-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2019] [Accepted: 11/21/2019] [Indexed: 12/17/2022] Open
Abstract
Protein folding and structure prediction are two sides of the same coin. Contact maps and the related techniques of constraint-based structure reconstruction can be considered as unifying aspects of both processes. We present the Structural Relevance (SR) score which quantifies the information content of individual contacts and residues in the context of the whole native structure. The physical process of protein folding is commonly characterized with spatial and temporal resolution: some residues are Early Folding while others are Highly Stable with respect to unfolding events. We employ the proposed SR score to demonstrate that folding initiation and structure stabilization are subprocesses realized by distinct sets of residues. The example of cytochrome c is used to demonstrate how StructureDistiller identifies the most important contacts needed for correct protein folding. This shows that entries of a contact map are not equally relevant for structural integrity. The proposed StructureDistiller algorithm identifies contacts with the highest information content; these entries convey unique constraints not captured by other contacts. Identification of the most informative contacts effectively doubles resilience toward contacts which are not observed in the native contact map. Furthermore, this knowledge increases reconstruction fidelity on sparse contact maps significantly by 0.4 Å.
Collapse
Affiliation(s)
- Sebastian Bittrich
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany. .,Biotechnology Center (BIOTEC), TU Dresden, Dresden, 01307, Germany. .,Research Collaboratory for Structural Bioinformatics Protein Data Bank, University of California, San Diego, La Jolla, CA, 92093, USA.
| | | | - Dirk Labudde
- University of Applied Sciences Mittweida, Mittweida, 09648, Germany
| |
Collapse
|
14
|
Shrestha R, Fajardo E, Gil N, Fidelis K, Kryshtafovych A, Monastyrskyy B, Fiser A. Assessing the accuracy of contact predictions in CASP13. Proteins 2019; 87:1058-1068. [PMID: 31587357 PMCID: PMC6851495 DOI: 10.1002/prot.25819] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 09/17/2019] [Accepted: 09/17/2019] [Indexed: 01/07/2023]
Abstract
The accuracy of sequence-based tertiary contact predictions was assessed in a blind prediction experiment at the CASP13 meeting. After 4 years of significant improvements in prediction accuracy, another dramatic advance has taken place since CASP12 was held 2 years ago. The precision of predicting the top L/5 contacts in the free modeling category, where L is the corresponding length of the protein in residues, has exceeded 70%. As a comparison, the best-performing group at CASP12 with a 47% precision would have finished below the top 1/3 of the CASP13 groups. Extensively trained deep neural network approaches dominate the top performing algorithms, which appear to efficiently integrate information on coevolving residues and interacting fragments or possibly utilize memories of sequence similarities and sometimes can deliver accurate results even in the absence of virtually any target specific evolutionary information. If the current performance is evaluated by F-score on L contacts, it stands around 24% right now, which, despite the tremendous impact and advance in improving its utility for structure modeling, also suggests that there is much room left for further improvement.
Collapse
Affiliation(s)
- Rojan Shrestha
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Eduardo Fajardo
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Nelson Gil
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| | - Krzysztof Fidelis
- Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
| | - Andriy Kryshtafovych
- Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
| | - Bohdan Monastyrskyy
- Genome Center, University of California, Davis, 451 Health Sciences Dr., Davis CA 95616-8816, USA
| | - Andras Fiser
- Department of Systems and Computational Biology, and Department of Biochemistry, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
| |
Collapse
|
15
|
Sala D, Huang YJ, Cole CA, Snyder DA, Liu G, Ishida Y, Swapna GVT, Brock KP, Sander C, Fidelis K, Kryshtafovych A, Inouye M, Tejero R, Valafar H, Rosato A, Montelione GT. Protein structure prediction assisted with sparse NMR data in CASP13. Proteins 2019; 87:1315-1332. [PMID: 31603581 DOI: 10.1002/prot.25837] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2019] [Revised: 09/26/2019] [Accepted: 09/27/2019] [Indexed: 01/05/2023]
Abstract
CASP13 has investigated the impact of sparse NMR data on the accuracy of protein structure prediction. NOESY and 15 N-1 H residual dipolar coupling data, typical of that obtained for 15 N,13 C-enriched, perdeuterated proteins up to about 40 kDa, were simulated for 11 CASP13 targets ranging in size from 80 to 326 residues. For several targets, two prediction groups generated models that are more accurate than those produced using baseline methods. Real NMR data collected for a de novo designed protein were also provided to predictors, including one data set in which only backbone resonance assignments were available. Some NMR-assisted prediction groups also did very well with these data. CASP13 also assessed whether incorporation of sparse NMR data improves the accuracy of protein structure prediction relative to nonassisted regular methods. In most cases, incorporation of sparse, noisy NMR data results in models with higher accuracy. The best NMR-assisted models were also compared with the best regular predictions of any CASP13 group for the same target. For six of 13 targets, the most accurate model provided by any NMR-assisted prediction group was more accurate than the most accurate model provided by any regular prediction group; however, for the remaining seven targets, one or more regular prediction method provided a more accurate model than even the best NMR-assisted model. These results suggest a novel approach for protein structure determination, in which advanced prediction methods are first used to generate structural models, and sparse NMR data is then used to validate and/or refine these models.
Collapse
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center, University of Florence, Sesto Fiorentino, Italy.,Department of Chemistry, University of Florence, Sesto Fiorentino, Italy
| | - Yuanpeng Janet Huang
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey.,Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York
| | - Casey A Cole
- Department of Computer Science & Engineering, University of South Carolina, Columbia, South Carolina
| | - David A Snyder
- Department of Chemistry, College of Science and Health, William Paterson University, Wayne, New Jersey
| | - Gaohua Liu
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey.,Nexomics Biosciences, Bordentown, New Jersey
| | - Yojiro Ishida
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey.,Department of Biochemistry and Molecular Biology, The Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - G V T Swapna
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Kelly P Brock
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts
| | - Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, Massachusetts.,cBio Center, Dana-Farber Cancer Institute, Boston, Massachusetts
| | | | | | - Masayori Inouye
- Department of Biochemistry and Molecular Biology, The Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Roberto Tejero
- Departamento de Quimica Fisica, Universidad de Valencia, Valencia, Spain
| | - Homayoun Valafar
- Department of Computer Science & Engineering, University of South Carolina, Columbia, South Carolina
| | - Antonio Rosato
- Magnetic Resonance Center, University of Florence, Sesto Fiorentino, Italy.,Department of Chemistry, University of Florence, Sesto Fiorentino, Italy
| | - Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, and Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey.,Department of Chemistry and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York.,Department of Biochemistry and Molecular Biology, The Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| |
Collapse
|
16
|
Wu T, Hou J, Adhikari B, Cheng J. Analysis of several key factors influencing deep learning-based inter-residue contact prediction. Bioinformatics 2019; 36:1091-1098. [PMID: 31504181 PMCID: PMC7703788 DOI: 10.1093/bioinformatics/btz679] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/17/2019] [Revised: 08/02/2019] [Accepted: 08/29/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Deep learning has become the dominant technology for protein contact prediction. However, the factors that affect the performance of deep learning in contact prediction have not been systematically investigated. RESULTS We analyzed the results of our three deep learning-based contact prediction methods (MULTICOM-CLUSTER, MULTICOM-CONSTRUCT and MULTICOM-NOVEL) in the CASP13 experiment and identified several key factors [i.e. deep learning technique, multiple sequence alignment (MSA), distance distribution prediction and domain-based contact integration] that influenced the contact prediction accuracy. We compared our convolutional neural network (CNN)-based contact prediction methods with three coevolution-based methods on 75 CASP13 targets consisting of 108 domains. We demonstrated that the CNN-based multi-distance approach was able to leverage global coevolutionary coupling patterns comprised of multiple correlated contacts for more accurate contact prediction than the local coevolution-based methods, leading to a substantial increase of precision by 19.2 percentage points. We also tested different alignment methods and domain-based contact prediction with the deep learning contact predictors. The comparison of the three methods showed deeper sequence alignments and the integration of domain-based contact prediction with the full-length contact prediction improved the performance of contact prediction. Moreover, we demonstrated that the domain-based contact prediction based on a novel ab initio approach of parsing domains from MSAs alone without using known protein structures was a simple, fast approach to improve contact prediction. Finally, we showed that predicting the distribution of inter-residue distances in multiple distance intervals could capture more structural information and improve binary contact prediction. AVAILABILITY AND IMPLEMENTATION https://github.com/multicom-toolbox/DNCON2/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Tianqi Wu
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Jie Hou
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO 65211, USA
| | - Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri, St. Louis, MO 63121, USA
| | | |
Collapse
|
17
|
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Background:Protein inter-residue contacts prediction play an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction.Objective:We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods.Results:Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, templatebased methods and 3D model-based methods. In this paper, firstly we describe the common definition of protein inter-residue contacts and show the typical application of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories for protein interresidue contacts prediction: correlated mutations methods, machine-learning methods and fusion methods. Besides, we analyze the constraints for each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss performances of these methods in detail.Conclusion:Correlated mutations methods achieve better performances for long-range contacts, while the machine-learning method performs well for short-range contacts. Fusion methods could take advantage of the machine-learning and correlated mutations methods. Employing more effective fusion strategy could be helpful to further improve the performances of fusion methods.
Collapse
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
| | - Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
| | - Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
| |
Collapse
|
18
|
Adhikari B, Hou J, Cheng J. DNCON2: improved protein contact prediction using two-level deep convolutional neural networks. Bioinformatics 2019; 34:1466-1472. [PMID: 29228185 PMCID: PMC5925776 DOI: 10.1093/bioinformatics/btx781] [Citation(s) in RCA: 105] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2017] [Accepted: 12/07/2017] [Indexed: 12/14/2022] Open
Abstract
Motivation Significant improvements in the prediction of protein residue–residue contacts are observed in the recent years. These contacts, predicted using a variety of coevolution-based and machine learning methods, are the key contributors to the recent progress in ab initio protein structure prediction, as demonstrated in the recent CASP experiments. Continuing the development of new methods to reliably predict contact maps is essential to further improve ab initio structure prediction. Results In this paper we discuss DNCON2, an improved protein contact map predictor based on two-level deep convolutional neural networks. It consists of six convolutional neural networks—the first five predict contacts at 6, 7.5, 8, 8.5 and 10 Å distance thresholds, and the last one uses these five predictions as additional features to predict final contact maps. On the free-modeling datasets in CASP10, 11 and 12 experiments, DNCON2 achieves mean precisions of 35, 50 and 53.4%, respectively, higher than 30.6% by MetaPSICOV on CASP10 dataset, 34% by MetaPSICOV on CASP11 dataset and 46.3% by Raptor-X on CASP12 dataset, when top L/5 long-range contacts are evaluated. We attribute the improved performance of DNCON2 to the inclusion of short- and medium-range contacts into training, two-level approach to prediction, use of the state-of-the-art optimization and activation functions, and a novel deep learning architecture that allows each filter in a convolutional layer to access all the input features of a protein of arbitrary length. Availability and implementation The web server of DNCON2 is at http://sysbio.rnet.missouri.edu/dncon2/ where training and testing datasets as well as the predictions for CASP10, 11 and 12 free-modeling datasets can also be downloaded. Its source code is available at https://github.com/multicom-toolbox/DNCON2/. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Badri Adhikari
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
| | - Jie Hou
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA
| | - Jianlin Cheng
- Department of Mathematics and Computer Science, University of Missouri-St. Louis, St. Louis, MO 63121, USA.,Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
19
|
Wuyun Q, Zheng W, Peng Z, Yang J. A large-scale comparative assessment of methods for residue-residue contact prediction. Brief Bioinform 2019; 19:219-230. [PMID: 27802931 DOI: 10.1093/bib/bbw106] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/19/2016] [Indexed: 11/14/2022] Open
Abstract
Sequence-based prediction of residue-residue contact in proteins becomes increasingly more important for improving protein structure prediction in the big data era. In this study, we performed a large-scale comparative assessment of 15 locally installed contact predictors. To assess these methods, we collected a big data set consisting of 680 nonredundant proteins covering different structural classes and target difficulties. We investigated a wide range of factors that may influence the precision of contact prediction, including target difficulty, structural class, the alignment depth and distribution of contact pairs in a protein structure. We found that: (1) the machine learning-based methods outperform the direct-coupling-based methods for short-range contact prediction, while the latter are significantly better for long-range contact prediction. The consensus-based methods, which combine machine learning and direct-coupling methods, perform the best. (2) The target difficulty does not have clear influence on the machine learning-based methods, while it does affect the direct-coupling and consensus-based methods significantly. (3) The alignment depth has relatively weak effect on the machine learning-based methods. However, for the direct-coupling-based methods and consensus-based methods, the predicted contacts for targets with deeper alignment tend to be more accurate. (4) All methods perform relatively better on β and α + β proteins than on α proteins. (5) Residues buried in the core of protein structure are more prone to be in contact than residues on the surface (22 versus 6%). We believe these are useful results for guiding future development of new approach to contact prediction.
Collapse
Affiliation(s)
- Qiqige Wuyun
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Wei Zheng
- School of Mathematical Sciences, Nankai University, Tianjin, China
| | - Zhenling Peng
- Center for Applied Mathematics, Tianjin University, Tianjin, China
| | - Jianyi Yang
- School of Mathematical Sciences, Nankai University, Tianjin, China
| |
Collapse
|
20
|
Kasahara K, Minami S, Aizawa Y. Characteristics of interactions at protein segments without non-local intramolecular contacts in the Protein Data Bank. PLoS One 2018; 13:e0205052. [PMID: 30537764 PMCID: PMC6289587 DOI: 10.1371/journal.pone.0205052] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/12/2018] [Accepted: 11/23/2018] [Indexed: 12/20/2022] Open
Abstract
The principle of three-dimensional protein structure formation is a long-standing conundrum in structural biology. A globular domain of a soluble protein is formed by a network of atomic contacts among amino acid residues, but regions without intramolecular non-local contacts are often observed in the protein structure, especially in loop, linker, and peripheral segments with secondary structures. Although these regions can play key roles for protein function as interfaces for intermolecular interactions, their nature remains unclear. Here, we termed protein segments without non-local contacts as floating segments and sought them in tens of thousands of entries in the Protein Data Bank. As a result, we found that 0.72% of residues are in floating segments. Regarding secondary structural elements, coil structures are enriched in floating segments, especially for long segments. Interactions with polypeptides and polynucleotides, but not chemical compounds, are enriched in floating segments. The amino acid preferences of floating segments are similar to those of surface residues, with exceptions; the small side chain amino acids, Gly and Ala, are preferred, and some charged side chains, Arg and His, are disfavored for floating segments compared to surface residues. Our comprehensive characterization of floating segments may provide insights into understanding protein sequence-structure-function relationships.
Collapse
Affiliation(s)
- Kota Kasahara
- College of Life Sciences, Ritsumeikan University, Noji-higashi, Kusatsu, Shiga, Japan
| | - Shintaro Minami
- Exploratory Research Center on Life and Living Systems, National Institutes for Natural Sciences, Myodaiji, Okazaki, Aichi, Japan
| | - Yasunori Aizawa
- School of Life Science and Technology, Tokyo Institute of Technology, Nagatsuda-cho, Midori-ku, Yokohama, Kanagawa, Japan
| |
Collapse
|
21
|
Neuwald AF, Altschul SF. Statistical investigations of protein residue direct couplings. PLoS Comput Biol 2018; 14:e1006237. [PMID: 30596639 PMCID: PMC6329532 DOI: 10.1371/journal.pcbi.1006237] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2018] [Revised: 01/11/2019] [Accepted: 11/23/2018] [Indexed: 12/12/2022] Open
Abstract
Protein Direct Coupling Analysis (DCA), which predicts residue-residue contacts based on covarying positions within a multiple sequence alignment, has been remarkably effective. This suggests that there is more to learn from sequence correlations than is generally assumed, and calls for deeper investigations into DCA and perhaps into other types of correlations. Here we describe an approach that enables such investigations by measuring, as an estimated p-value, the statistical significance of the association between residue-residue covariance and structural interactions, either internal or homodimeric. Its application to thirty protein superfamilies confirms that direct coupling (DC) scores correlate with 3D pairwise contacts with very high significance. This method also permits quantitative assessment of the relative performance of alternative DCA methods, and of the degree to which they detect direct versus indirect couplings. We illustrate its use to assess, for a given protein, the biological relevance of alternative conformational states, to investigate the possible mechanistic implications of differences between these states, and to characterize subtle aspects of direct couplings. Our analysis indicates that direct pairwise correlations may be largely distinct from correlated patterns associated with functional specialization, and that the joint analysis of both types of correlations can yield greater power. Data, programs, and source code are freely available at http://evaldca.igs.umaryland.edu.
Collapse
Affiliation(s)
- Andrew F. Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, Baltimore, Maryland, United States of America
| | - Stephen F. Altschul
- Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
| |
Collapse
|
22
|
Wu H, Cao C, Xia X, Lu Q. Unified Deep Learning Architecture for Modeling Biology Sequence. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1445-1452. [PMID: 28991751 DOI: 10.1109/tcbb.2017.2760832] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
Prediction of the spatial structure or function of biological macromolecules based on their sequences remains an important challenge in bioinformatics. When modeling biological sequences using traditional sequencing models, long-range interaction, complicated and variable output of labeled structures, and variable length of biological sequences usually lead to different solutions on a case-by-case basis. This study proposed a unified deep learning architecture based on long short-term memory or a gated recurrent unit to capture long-range interactions. The architecture designs the optional reshape operator to adapt to the diversity of the output labels and implements a training algorithm to support the training of sequence models capable of processing variable-length sequences. The merging and pooling operators enhances the ability of capturing short-range interactions between basic units of biological sequences. The proposed deep-learning architecture and its training algorithm might be capable of solving currently variable biological sequence-modeling problems under a unified framework. We validated the model on one of the most difficult biological sequence-modeling problems, protein residue interaction prediction. The results indicate that the accuracy of obtaining the residue interactions of the model exceeded popular approaches by 10 percent on multiple widely-used benchmarks.
Collapse
|
23
|
Deng H, Jia Y, Zhang Y. Protein structure prediction. INTERNATIONAL JOURNAL OF MODERN PHYSICS. B 2018; 32:1840009. [PMID: 30853739 PMCID: PMC6407873 DOI: 10.1142/s021797921840009x] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
Predicting 3D structure of protein from its amino acid sequence is one of the most important unsolved problems in biophysics and computational biology. This paper attempts to give a comprehensive introduction of the most recent effort and progress on protein structure prediction. Following the general flowchart of structure prediction, related concepts and methods are presented and discussed. Moreover, brief introductions are made to several widely-used prediction methods and the community-wide critical assessment of protein structure prediction (CASP) experiments.
Collapse
Affiliation(s)
- Haiyou Deng
- College of Science, Huazhong Agricultural University, Wuhan 4R0070, P. R. China
| | - Ya Jia
- College of Physical Science and Technology, Central China Normal University, Wuhan 430079, P. R. China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 45108, USA
| |
Collapse
|
24
|
He B, Mortuza SM, Wang Y, Shen HB, Zhang Y. NeBcon: protein contact map prediction using neural network training coupled with naïve Bayes classifiers. Bioinformatics 2018; 33:2296-2306. [PMID: 28369334 DOI: 10.1093/bioinformatics/btx164] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2016] [Accepted: 03/21/2017] [Indexed: 12/12/2022] Open
Abstract
Motivation Recent CASP experiments have witnessed exciting progress on folding large-size non-humongous proteins with the assistance of co-evolution based contact predictions. The success is however anecdotal due to the requirement of the contact prediction methods for the high volume of sequence homologs that are not available to most of the non-humongous protein targets. Development of efficient methods that can generate balanced and reliable contact maps for different type of protein targets is essential to enhance the success rate of the ab initio protein structure prediction. Results We developed a new pipeline, NeBcon, which uses the naïve Bayes classifier (NBC) theorem to combine eight state of the art contact methods that are built from co-evolution and machine learning approaches. The posterior probabilities of the NBC model are then trained with intrinsic structural features through neural network learning for the final contact map prediction. NeBcon was tested on 98 non-redundant proteins, which improves the accuracy of the best co-evolution based meta-server predictor by 22%; the magnitude of the improvement increases to 45% for the hard targets that lack sequence and structural homologs in the databases. Detailed data analysis showed that the major contribution to the improvement is due to the optimized NBC combination of the complementary information from both co-evolution and machine learning predictions. The neural network training also helps to improve the coupling of the NBC posterior probability and the intrinsic structural features, which were found particularly important for the proteins that do not have sufficient number of homologous sequences to derive reliable co-evolution profiles. Availiablity and Implementation On-line server and standalone package of the program are available at http://zhanglab.ccmb.med.umich.edu/NeBcon/ . Contact zhng@umich.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Baoji He
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.,School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.,Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - S M Mortuza
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yanting Wang
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China.,School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
| | - Hong-Bin Shen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA.,Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
| |
Collapse
|
25
|
Schaarschmidt J, Monastyrskyy B, Kryshtafovych A, Bonvin AM. Assessment of contact predictions in CASP12: Co-evolution and deep learning coming of age. Proteins 2018; 86 Suppl 1:51-66. [PMID: 29071738 PMCID: PMC5820169 DOI: 10.1002/prot.25407] [Citation(s) in RCA: 126] [Impact Index Per Article: 21.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2017] [Revised: 10/06/2017] [Accepted: 10/24/2017] [Indexed: 12/20/2022]
Abstract
Following up on the encouraging results of residue-residue contact prediction in the CASP11 experiment, we present the analysis of predictions submitted for CASP12. The submissions include predictions of 34 groups for 38 domains classified as free modeling targets which are not accessible to homology-based modeling due to a lack of structural templates. CASP11 saw a rise of coevolution-based methods outperforming other approaches. The improvement of these methods coupled to machine learning and sequence database growth are most likely the main driver for a significant improvement in average precision from 27% in CASP11 to 47% in CASP12. In more than half of the targets, especially those with many homologous sequences accessible, precisions above 90% were achieved with the best predictors reaching a precision of 100% in some cases. We furthermore tested the impact of using these contacts as restraints in ab initio modeling of 14 single-domain free modeling targets using Rosetta. Adding contacts to the Rosetta calculations resulted in improvements of up to 26% in GDT_TS within the top five structures.
Collapse
Affiliation(s)
- Joerg Schaarschmidt
- Faculty of Science ‐ ChemistryComputational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht UniversityUtrechtThe Netherlands
| | | | | | - Alexandre M.J.J. Bonvin
- Faculty of Science ‐ ChemistryComputational Structural Biology Group, Bijvoet Center for Biomolecular Research, Utrecht UniversityUtrechtThe Netherlands
| |
Collapse
|
26
|
Mandalaparthy V, Sanaboyana VR, Rafalia H, Gosavi S. Exploring the effects of sparse restraints on protein structure prediction. Proteins 2017; 86:248-262. [DOI: 10.1002/prot.25438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2017] [Revised: 11/20/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023]
Affiliation(s)
- Varun Mandalaparthy
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
| | - Venkata Ramana Sanaboyana
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
| | - Hitesh Rafalia
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
- Manipal University, Madhav Nagar; Manipal 576104 India
| | - Shachi Gosavi
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
| |
Collapse
|
27
|
Conserved salt-bridge competition triggered by phosphorylation regulates the protein interactome. Proc Natl Acad Sci U S A 2017; 114:13453-13458. [PMID: 29208709 DOI: 10.1073/pnas.1711543114] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022] Open
Abstract
Phosphorylation is a major regulator of protein interactions; however, the mechanisms by which regulation occurs are not well understood. Here we identify a salt-bridge competition or "theft" mechanism that enables a phospho-triggered swap of protein partners by Raf Kinase Inhibitory Protein (RKIP). RKIP transitions from inhibiting Raf-1 to inhibiting G-protein-coupled receptor kinase 2 upon phosphorylation, thereby bridging MAP kinase and G-Protein-Coupled Receptor signaling. NMR and crystallography indicate that a phosphoserine, but not a phosphomimetic, competes for a lysine from a preexisting salt bridge, initiating a partial unfolding event and promoting new protein interactions. Structural elements underlying the theft occurred early in evolution and are found in 10% of homo-oligomers and 30% of hetero-oligomers including Bax, Troponin C, and Early Endosome Antigen 1. In contrast to a direct recognition of phosphorylated residues by binding partners, the salt-bridge theft mechanism represents a facile strategy for promoting or disrupting protein interactions using solvent-accessible residues, and it can provide additional specificity at protein interfaces through local unfolding or conformational change.
Collapse
|
28
|
Wozniak PP, Konopka BM, Xu J, Vriend G, Kotulska M. Forecasting residue-residue contact prediction accuracy. Bioinformatics 2017; 33:3405-3414. [PMID: 29036497 DOI: 10.1093/bioinformatics/btx416] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2017] [Accepted: 06/22/2017] [Indexed: 11/14/2022] Open
Abstract
Motivation Apart from meta-predictors, most of today's methods for residue-residue contact prediction are based entirely on Direct Coupling Analysis (DCA) of correlated mutations in multiple sequence alignments (MSAs). These methods are on average ∼40% correct for the 100 strongest predicted contacts in each protein. The end-user who works on a single protein of interest will not know if predictions are either much more or much less correct than 40%, which is especially a problem if contacts are predicted to steer experimental research on that protein. Results We designed a regression model that forecasts the accuracy of residue-residue contact prediction for individual proteins with an average error of 7 percentage points. Contacts were predicted with two DCA methods (gplmDCA and PSICOV). The models were built on parameters that describe the MSA, the predicted secondary structure, the predicted solvent accessibility and the contact prediction scores for the target protein. Results show that our models can be also applied to the meta-methods, which was tested on RaptorX. Availability and implementation All data and scripts are available from http://comprec-lin.iiar.pwr.edu.pl/dcaQ/. Contact malgorzata.kotulska@pwr.edu.pl. Supplementary information Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- P P Wozniak
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - B M Konopka
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| | - J Xu
- Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, GA 6525, Nijmegen, The Netherlands
| | - M Kotulska
- Department of Biomedical Engineering, Faculty of Fundamental Problems of Technology, Wroclaw University of Science and Technology, Wroclaw, Poland
| |
Collapse
|
29
|
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022] Open
Abstract
Background In structural biology area, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that the predicted residue-residue contacts could effectively constrain the conformational search space, which is significant for de novo protein structure prediction. In the last few decades, related researchers have developed various methods to predict residue-residue contacts, especially, significant performance has been achieved by using fusion methods in recent years. In this work, a novel fusion method based on rank strategy has been proposed to predict contacts. Unlike the traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses the learning-to-rank algorithm to predict contact probability of each residue pair. Results First, we perform two benchmark tests for the proposed fusion method (RRCRank) on CASP11 dataset and CASP12 dataset respectively. The test results show that the RRCRank method outperforms other well-developed methods, especially for medium and short range contacts. Second, in order to verify the superiority of ranking strategy, we predict contacts by using the traditional regression and classification strategies based on the same features as ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for three contact types, in particular for long range contacts. Third, the proposed RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that the RRCRank could achieve comparable prediction precisions and is better than three methods in most assessment metrics. Conclusions The learning-to-rank algorithm is introduced to develop a novel rank-based method for the residue-residue contact prediction of proteins, which achieves state-of-the-art performance based on the extensive assessment. Electronic supplementary material The online version of this article (10.1186/s12859-017-1811-9) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| | - Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China.
| | - Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
| |
Collapse
|
30
|
Xiong D, Mao W, Gong H. Predicting the helix-helix interactions from correlated residue mutations. Proteins 2017; 85:2162-2169. [PMID: 28833538 DOI: 10.1002/prot.25370] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2017] [Revised: 08/03/2017] [Accepted: 08/13/2017] [Indexed: 12/30/2022]
Abstract
Helix-helix interactions are crucial in the structure assembly, stability and function of helix-rich proteins including many membrane proteins. In spite of remarkable progresses over the past decades, the accuracy of predicting protein structures from their amino acid sequences is still far from satisfaction. In this work, we focused on a simpler problem, the prediction of helix-helix interactions, the results of which could facilitate practical protein structure prediction by constraining the sampling space. Specifically, we started from the noisy 2D residue contact maps derived from correlated residue mutations, and utilized ridge detection to identify the characteristic residue contact patterns for helix-helix interactions. The ridge information as well as a few additional features were then fed into a machine learning model HHConPred to predict interactions between helix pairs. In an independent test, our method achieved an F-measure of ∼60% for predicting helix-helix interactions. Moreover, although the model was trained mainly using soluble proteins, it could be extended to membrane proteins with at least comparable performance relatively to previous approaches that were generated purely using membrane proteins. All data and source codes are available at http://166.111.152.91/Downloads.html or https://github.com/dpxiong/HHConPred.
Collapse
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| | - Wenzhi Mao
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| |
Collapse
|
31
|
Lam SD, Das S, Sillitoe I, Orengo C. An overview of comparative modelling and resources dedicated to large-scale modelling of genome sequences. Acta Crystallogr D Struct Biol 2017; 73:628-640. [PMID: 28777078 PMCID: PMC5571743 DOI: 10.1107/s2059798317008920] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2016] [Accepted: 06/14/2017] [Indexed: 12/02/2022] Open
Abstract
Computational modelling of proteins has been a major catalyst in structural biology. Bioinformatics groups have exploited the repositories of known structures to predict high-quality structural models with high efficiency at low cost. This article provides an overview of comparative modelling, reviews recent developments and describes resources dedicated to large-scale comparative modelling of genome sequences. The value of subclustering protein domain superfamilies to guide the template-selection process is investigated. Some recent cases in which structural modelling has aided experimental work to determine very large macromolecular complexes are also cited.
Collapse
Affiliation(s)
- Su Datt Lam
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
- School of Biosciences and Biotechnology, Faculty of Science and Technology, University Kebangsaan Malaysia, 43600 Bangi, Selangor, Malaysia
| | - Sayoni Das
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| | - Christine Orengo
- Institute of Structural and Molecular Biology, UCL, Darwin Building, Gower Street, London WC1E 6BT, England
| |
Collapse
|
32
|
Teixeira PL, Mendenhall JL, Heinze S, Weiner B, Skwark MJ, Meiler J. Membrane protein contact and structure prediction using co-evolution in conjunction with machine learning. PLoS One 2017; 12:e0177866. [PMID: 28542325 PMCID: PMC5443516 DOI: 10.1371/journal.pone.0177866] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2016] [Accepted: 05/04/2017] [Indexed: 11/18/2022] Open
Abstract
De novo membrane protein structure prediction is limited to small proteins due to the conformational search space quickly expanding with length. Long-range contacts (24+ amino acid separation)-residue positions distant in sequence, but in close proximity in the structure, are arguably the most effective way to restrict this conformational space. Inverse methods for co-evolutionary analysis predict a global set of position-pair couplings that best explain the observed amino acid co-occurrences, thus distinguishing between evolutionarily explained co-variances and these arising from spurious transitive effects. Here, we show that applying machine learning approaches and custom descriptors improves evolutionary contact prediction accuracy, resulting in improvement of average precision by 6 percentage points for the top 1L non-local contacts. Further, we demonstrate that predicted contacts improve protein folding with BCL::Fold. The mean RMSD100 metric for the top 10 models folded was reduced by an average of 2 Å for a benchmark of 25 membrane proteins.
Collapse
Affiliation(s)
- Pedro L. Teixeira
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
| | - Jeff L. Mendenhall
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Sten Heinze
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Brian Weiner
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Marcin J. Skwark
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
| | - Jens Meiler
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Chemistry, Center for Structural Biology, Vanderbilt University, Nashville Tennessee, United States of America
- * E-mail:
| |
Collapse
|
33
|
Xiong D, Zeng J, Gong H. A deep learning framework for improving long-range residue–residue contact prediction using a hierarchical strategy. Bioinformatics 2017; 33:2675-2683. [DOI: 10.1093/bioinformatics/btx296] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2016] [Accepted: 05/02/2017] [Indexed: 12/31/2022] Open
Affiliation(s)
- Dapeng Xiong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| | - Jianyang Zeng
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Haipeng Gong
- MOE Key Laboratory of Bioinformatics, School of Life Sciences, Tsinghua University, Beijing, China
- Beijing Innovation Center of Structural Biology, Tsinghua University, Beijing, China
| |
Collapse
|
34
|
Sabzekar M, Naghibzadeh M, Sadri J. Efficient dynamic programming algorithm with prior knowledge for protein β-strand alignment. J Theor Biol 2017; 417:43-50. [PMID: 28108305 DOI: 10.1016/j.jtbi.2017.01.018] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2016] [Revised: 11/11/2016] [Accepted: 01/12/2017] [Indexed: 11/30/2022]
Abstract
One of the main tasks towards the prediction of protein β-sheet structure is to predict the native alignment of β-strands. The alignment of two β-strands defines similar regions that may reflect functional, structural, or evolutionary relationships between them. Therefore, any improvement in β-strands alignment not only reduces the computational search space but also improves β-sheet structure prediction accuracy. To define the alignment scores, previous studies utilized predicted residue-residue contacts (contact maps). However, there are two serious problems using them. First, the precision of contact map prediction techniques, especially for long-range contacts (i.e., β-residues), is still not satisfactory. Second, the residue-residue contact predictors usually utilize general properties of amino acids and disregard the structural features of β-residues. In this paper, we consider β-structure information, which is estimated from protein β-sheet data sets, as alignment scores. However, the predicted contact maps are used as a prior knowledge about residues. They are used for strengthening or weakening the alignment scores in our algorithm. Thus, we can utilize both β-residues and β-structure information in alignment of β-strands. The structure of dynamic programming of the alignment algorithm is changed in order to work with our prior knowledge. Moreover, the Four Russians method is applied to the proposed alignment algorithm in order to reduce the time complexity of the problem. For evaluating the proposed method, we applied it to the state-of-the-art β-sheet structure prediction methods. The experimental results on the BetaSheet916 data set showed significant improvements in the execution time, the accuracy of β-strands' alignment and consequently β-sheet structure prediction accuracy. The results are available at http://conceptsgate.com/BetaSheet.
Collapse
Affiliation(s)
- Mostafa Sabzekar
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran
| | - Mahmoud Naghibzadeh
- Department of Computer Engineering, Ferdowsi University of Mashhad, Mashhad, Iran.
| | - Javad Sadri
- Department of Computer Science & Software Engineering, Concordia University, Canada
| |
Collapse
|
35
|
Wozniak PP, Vriend G, Kotulska M. Correlated mutations select misfolded from properly folded proteins. Bioinformatics 2017; 33:1497-1504. [PMID: 28203707 DOI: 10.1093/bioinformatics/btx013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/12/2016] [Accepted: 01/11/2017] [Indexed: 11/14/2022] Open
Affiliation(s)
- P P Wozniak
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
| | - G Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Centre, Nijmegen, The Netherlands
| | - M Kotulska
- Faculty of Fundamental Problems of Technology, Department of Biomedical Engineering, Wrocław University of Science and Technology, Wrocław, Poland
| |
Collapse
|
36
|
Adhikari B, Nowotny J, Bhattacharya D, Hou J, Cheng J. ConEVA: a toolbox for comprehensive assessment of protein contacts. BMC Bioinformatics 2016; 17:517. [PMID: 27923350 PMCID: PMC5142288 DOI: 10.1186/s12859-016-1404-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2016] [Accepted: 12/01/2016] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND In recent years, successful contact prediction methods and contact-guided ab initio protein structure prediction methods have highlighted the importance of incorporating contact information into protein structure prediction methods. It is also observed that for almost all globular proteins, the quality of contact prediction dictates the accuracy of structure prediction. Hence, like many existing evaluation measures for evaluating 3D protein models, various measures are currently used to evaluate predicted contacts, with the most popular ones being precision, coverage and distance distribution score (Xd). RESULTS We have built a web application and a downloadable tool, ConEVA, for comprehensive assessment and detailed comparison of predicted contacts. Besides implementing existing measures for contact evaluation we have implemented new and useful methods of contact visualization using chord diagrams and comparison using Jaccard similarity computations. For a set (or sets) of predicted contacts, the web application runs even when a native structure is not available, visualizing the contact coverage and similarity between predicted contacts. We applied the tool on various contact prediction data sets and present our findings and insights we obtained from the evaluation of effective contact assessments. ConEVA is publicly available at http://cactus.rnet.missouri.edu/coneva/ . CONCLUSION ConEVA is useful for a range of contact related analysis and evaluations including predicted contact comparison, investigation of individual protein folding using predicted contacts, and analysis of contacts in a structure of interest.
Collapse
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
| | - Jackson Nowotny
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
| | | | - Jie Hou
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
| | - Jianlin Cheng
- Department of Computer Science, University of Missouri, Columbia, MO 65211 USA
- Informatics Institute, University of Missouri, Columbia, MO 65211 USA
- C. Bond Life Science Center, University of Missouri, Columbia, MO 65211 USA
| |
Collapse
|
37
|
Assessing Predicted Contacts for Building Protein Three-Dimensional Models. Methods Mol Biol 2016. [PMID: 27787823 DOI: 10.1007/978-1-4939-6406-2_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
Abstract
Recent successes of contact-guided protein structure prediction methods have revived interest in solving the long-standing problem of ab initio protein structure prediction. With homology modeling failing for many protein sequences that do not have templates, contact-guided structure prediction has shown promise, and consequently, contact prediction has gained a lot of interest recently. Although a few dozen contact prediction tools are already currently available as web servers and downloadables, not enough research has been done towards using existing measures like precision and recall to evaluate these contacts with the goal of building three-dimensional models. Moreover, when we do not have a native structure for a set of predicted contacts, the only analysis we can perform is a simple contact map visualization of the predicted contacts. A wider and more rigorous assessment of the predicted contacts is needed, in order to build tertiary structure models. This chapter discusses instructions and protocols for using tools and applying techniques in order to assess predicted contacts for building three-dimensional models.
Collapse
|
38
|
Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. New encouraging developments in contact prediction: Assessment of the CASP11 results. Proteins 2016; 84 Suppl 1:131-44. [PMID: 26474083 PMCID: PMC4834069 DOI: 10.1002/prot.24943] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2015] [Revised: 09/15/2015] [Accepted: 10/11/2015] [Indexed: 12/27/2022]
Abstract
This article provides a report on the state-of-the-art in the prediction of intra-molecular residue-residue contacts in proteins based on the assessment of the predictions submitted to the CASP11 experiment. The assessment emphasis is placed on the accuracy in predicting long-range contacts. Twenty-nine groups participated in contact prediction in CASP11. At least eight of them used the recently developed evolutionary coupling techniques, with the top group (CONSIP2) reaching precision of 27% on target proteins that could not be modeled by homology. This result indicates a breakthrough in the development of methods based on the correlated mutation approach. Successful prediction of contacts was shown to be practically helpful in modeling three-dimensional structures; in particular target T0806 was modeled exceedingly well with accuracy not yet seen for ab initio targets of this size (>250 residues). Proteins 2016; 84(Suppl 1):131-144. © 2015 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
| | - Daniel D'Andrea
- Department of Physics, Sapienza-University of Rome, Rome, 00185, Italy
| | | | - Anna Tramontano
- Department of Physics, Sapienza-University of Rome, Rome, 00185, Italy
- Istituto Pasteur-Fondazione Cenci Bolognetti-University of Rome, Rome, 00185, Italy
| | | |
Collapse
|
39
|
Li Q, Dahl DB, Vannucci M, Joo H, Tsai JW. KScons: a Bayesian approach for protein residue contact prediction using the knob-socket model of protein tertiary structure. Bioinformatics 2016; 32:3774-3781. [PMID: 27559156 DOI: 10.1093/bioinformatics/btw553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2016] [Revised: 07/15/2016] [Accepted: 08/18/2016] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION By simplifying the many-bodied complexity of residue packing into patterns of simple pairwise secondary structure interactions between a single knob residue with a three-residue socket, the knob-socket construct allows a more direct incorporation of structural information into the prediction of residue contacts. By modeling the preferences between the amino acid composition of a socket and knob, we undertake an investigation of the knob-socket construct's ability to improve the prediction of residue contacts. The statistical model considers three priors and two posterior estimations to better understand how the input data affects predictions. This produces six implementations of KScons that are tested on three sets: PSICOV, CASP10 and CASP11. We compare against the current leading contact prediction methods. RESULTS The results demonstrate the usefulness as well as the limits of knob-socket based structural modeling of protein contacts. The construct is able to extract good predictions from known structural homologs, while its performance degrades when no homologs exist. Among our six implementations, KScons MST-MP (which uses the multiple structure alignment prior and marginal posterior incorporating structural homolog information) performs the best in all three prediction sets. An analysis of recall and precision finds that KScons MST-MP improves accuracy not only by improving identification of true positives, but also by decreasing the number of false positives. Over the CASP10 and CASP11 sets, KScons MST-MP performs better than the leading methods using only evolutionary coupling data, but not quite as well as the supervised learning methods of MetaPSICOV and CoinDCA-NN that incorporate a large set of structural features. CONTACT qiwei.li@rice.eduSupplementary information: Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Qiwei Li
- Department of Statistics, Rice University, Houston, TX, USA
| | - David B Dahl
- Department of Statistics, Brigham Young University, Provo, UT, USA
| | | | - Hyun Joo
- Department of Chemistry, University of the Pacific, Stockton, CA, USA
| | - Jerry W Tsai
- Department of Chemistry, University of the Pacific, Stockton, CA, USA
| |
Collapse
|
40
|
Huwe PJ, Xu Q, Shapovalov MV, Modi V, Andrake MD, Dunbrack RL. Biological function derived from predicted structures in CASP11. Proteins 2016; 84 Suppl 1:370-91. [PMID: 27181425 DOI: 10.1002/prot.24997] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2015] [Revised: 01/10/2016] [Accepted: 01/18/2016] [Indexed: 12/26/2022]
Abstract
In CASP11, the organizers sought to bring the biological inferences from predicted structures to the fore. To accomplish this, we assessed the models for their ability to perform quantifiable tasks related to biological function. First, for 10 targets that were probable homodimers, we measured the accuracy of docking the models into homodimers as a function of GDT-TS of the monomers, which produced characteristic L-shaped plots. At low GDT-TS, none of the models could be docked correctly as homodimers. Above GDT-TS of ∼60%, some models formed correct homodimers in one of the largest docked clusters, while many other models at the same values of GDT-TS did not. Docking was more successful when many of the templates shared the same homodimer. Second, we docked a ligand from an experimental structure into each of the models of one of the targets. Docking to the models with two different programs produced poor ligand RMSDs with the experimental structure. Measures that evaluated similarity of contacts were reasonable for some of the models, although there was not a significant correlation with model accuracy. Finally, we assessed whether models would be useful in predicting the phenotypes of missense mutations in three human targets by comparing features calculated from the models with those calculated from the experimental structures. The models were successful in reproducing accessible surface areas but there was little correlation of model accuracy with calculation of FoldX evaluation of the change in free energy between the wild-type and the mutant. Proteins 2016; 84(Suppl 1):370-391. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Peter J Huwe
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | - Qifang Xu
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | | | - Vivek Modi
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | - Mark D Andrake
- Fox Chase Cancer Center, Philadelphia, Pennsylvania, 19111
| | | |
Collapse
|
41
|
Bhattacharya D, Cao R, Cheng J. UniCon3D: de novo protein structure prediction using united-residue conformational search via stepwise, probabilistic sampling. Bioinformatics 2016; 32:2791-9. [PMID: 27259540 PMCID: PMC5018369 DOI: 10.1093/bioinformatics/btw316] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2016] [Accepted: 05/15/2016] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Recent experimental studies have suggested that proteins fold via stepwise assembly of structural units named 'foldons' through the process of sequential stabilization. Alongside, latest developments on computational side based on probabilistic modeling have shown promising direction to perform de novo protein conformational sampling from continuous space. However, existing computational approaches for de novo protein structure prediction often randomly sample protein conformational space as opposed to experimentally suggested stepwise sampling. RESULTS Here, we develop a novel generative, probabilistic model that simultaneously captures local structural preferences of backbone and side chain conformational space of polypeptide chains in a united-residue representation and performs experimentally motivated conditional conformational sampling via stepwise synthesis and assembly of foldon units that minimizes a composite physics and knowledge-based energy function for de novo protein structure prediction. The proposed method, UniCon3D, has been found to (i) sample lower energy conformations with higher accuracy than traditional random sampling in a small benchmark of 6 proteins; (ii) perform comparably with the top five automated methods on 30 difficult target domains from the 11th Critical Assessment of Protein Structure Prediction (CASP) experiment and on 15 difficult target domains from the 10th CASP experiment; and (iii) outperform two state-of-the-art approaches and a baseline counterpart of UniCon3D that performs traditional random sampling for protein modeling aided by predicted residue-residue contacts on 45 targets from the 10th edition of CASP. AVAILABILITY AND IMPLEMENTATION Source code, executable versions, manuals and example data of UniCon3D for Linux and OSX are freely available to non-commercial users at http://sysbio.rnet.missouri.edu/UniCon3D/ CONTACT: chengji@missouri.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | | | - Jianlin Cheng
- Department of Computer Science Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| |
Collapse
|
42
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction: Progress and new directions in round XI. Proteins 2016; 84 Suppl 1:4-14. [PMID: 27171127 DOI: 10.1002/prot.25064] [Citation(s) in RCA: 149] [Impact Index Per Article: 18.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2016] [Revised: 04/29/2016] [Accepted: 05/08/2016] [Indexed: 12/15/2022]
Abstract
Modeling of protein structure from amino acid sequence now plays a major role in structural biology. Here we report new developments and progress from the CASP11 community experiment, assessing the state of the art in structure modeling. Notable points include the following: (1) New methods for predicting three dimensional contacts resulted in a few spectacular template free models in this CASP, whereas models based on sequence homology to proteins with experimental structure continue to be the most accurate. (2) Refinement of initial protein models, primarily using molecular dynamics related approaches, has now advanced to the point where the best methods can consistently (though slightly) improve nearly all models. (3) The use of relatively sparse NMR constraints dramatically improves the accuracy of models, and another type of sparse data, chemical crosslinking, introduced in this CASP, also shows promise for producing better models. (4) A new emphasis on modeling protein complexes, in collaboration with CAPRI, has produced interesting results, but also shows the need for more focus on this area. (5) Methods for estimating the accuracy of models have advanced to the point where they are of considerable practical use. (6) A first assessment demonstrates that models can sometimes successfully address biological questions that motivate experimental structure determination. (7) There is continuing progress in accuracy of modeling regions of structure not directly available by comparative modeling, while there is marginal or no progress in some other areas. Proteins 2016; 84(Suppl 1):4-14. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- John Moult
- Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland, 20850.
| | - Krzysztof Fidelis
- Genome Center, University of California, Davis, Davis, California, 95616
| | | | - Torsten Schwede
- Biozentrum & SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
| | - Anna Tramontano
- Department of Physics and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, Rome, Italy
| |
Collapse
|
43
|
Yang J, Jin QY, Zhang B, Shen HB. R2C: improving ab initio residue contact map prediction using dynamic fusion strategy and Gaussian noise filter. ACTA ACUST UNITED AC 2016; 32:2435-43. [PMID: 27153618 DOI: 10.1093/bioinformatics/btw181] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2015] [Accepted: 04/03/2016] [Indexed: 11/12/2022]
Abstract
MOTIVATION Inter-residue contacts in proteins dictate the topology of protein structures. They are crucial for protein folding and structural stability. Accurate prediction of residue contacts especially for long-range contacts is important to the quality of ab inito structure modeling since they can enforce strong restraints to structure assembly. RESULTS In this paper, we present a new Residue-Residue Contact predictor called R2C that combines machine learning-based and correlated mutation analysis-based methods, together with a two-dimensional Gaussian noise filter to enhance the long-range residue contact prediction. Our results show that the outputs from the machine learning-based method are concentrated with better performance on short-range contacts; while for correlated mutation analysis-based approach, the predictions are widespread with higher accuracy on long-range contacts. An effective query-driven dynamic fusion strategy proposed here takes full advantages of the two different methods, resulting in an impressive overall accuracy improvement. We also show that the contact map directly from the prediction model contains the interesting Gaussian noise, which has not been discovered before. Different from recent studies that tried to further enhance the quality of contact map by removing its transitive noise, we designed a new two-dimensional Gaussian noise filter, which was especially helpful for reinforcing the long-range residue contact prediction. Tested on recent CASP10/11 datasets, the overall top L/5 accuracy of our final R2C predictor is 17.6%/15.5% higher than the pure machine learning-based method and 7.8%/8.3% higher than the correlated mutation analysis-based approach for the long-range residue contact prediction. AVAILABILITY AND IMPLEMENTATION http://www.csbio.sjtu.edu.cn/bioinf/R2C/Contact:hbshen@sjtu.edu.cn SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jing Yang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Qi-Yu Jin
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Biao Zhang
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| |
Collapse
|
44
|
Kinch LN, Li W, Monastyrskyy B, Kryshtafovych A, Grishin NV. Assessment of CASP11 contact-assisted predictions. Proteins 2016; 84 Suppl 1:164-80. [PMID: 26889875 DOI: 10.1002/prot.25020] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2015] [Revised: 02/05/2016] [Accepted: 02/14/2016] [Indexed: 12/26/2022]
Abstract
We present an overview of contact-assisted predictions in the eleventh round of critical assessment of protein structure prediction (CASP11), which included four categories: predicted contacts (Tp), correct contacts (Tc), simulated sparse NMR contacts (Ts), and cross-linking contacts (Tx). Comparison of assisted to unassisted model quality highlighted a relatively poor overall performance in CASP11 using predicted Tp and crosslinked Tx contact information. However, average model quality significantly improved in the correct Tc and simulated NMR Ts categories for most targets, where maximum improvement of unassisted models reached an impressive 70 GDT_TS. Comparison of the performance in the correct Tc category to CASP10 suggested the improvement in CASP11 model quality originated from an increased number of provided contacts per target. Group rankings based on a combination of scores used in the CASP11 free modeling (FM) assessment for each category highlight four top-performing groups, with three from the Lee lab and one from the Baker lab. We used the overall performance of these groups in each category to develop hypotheses for their relative outperformance in the correct Tc and simulated NMR Ts categories, which stemmed from the fraction of correct contacts provided (correct Tc category) and a reduced fraction of correct contacts offset by an increased coverage of the correct contacts (simulated NMR Ts category). Proteins 2016; 84(Suppl 1):164-180. © 2016 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390-9050.
| | - Wenlin Li
- Department of Biophysics, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390-9050.,Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390-9050
| | | | | | - Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390-9050.,Department of Biophysics, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390-9050.,Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas, 75390-9050
| |
Collapse
|
45
|
Jeong CS, Kim D. Structure-based Markov random field model for representing evolutionary constraints on functional sites. BMC Bioinformatics 2016; 17:99. [PMID: 26911566 PMCID: PMC4765150 DOI: 10.1186/s12859-016-0948-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/31/2015] [Accepted: 02/15/2016] [Indexed: 11/10/2022] Open
Abstract
Background Elucidating the cooperative mechanism of interconnected residues is an important component toward understanding the biological function of a protein. Coevolution analysis has been developed to model the coevolutionary information reflecting structural and functional constraints. Recently, several methods have been developed based on a probabilistic graphical model called the Markov random field (MRF), which have led to significant improvements for coevolution analysis; however, thus far, the performance of these models has mainly been assessed by focusing on the aspect of protein structure. Results In this study, we built an MRF model whose graphical topology is determined by the residue proximity in the protein structure, and derived a novel positional coevolution estimate utilizing the node weight of the MRF model. This structure-based MRF method was evaluated for three data sets, each of which annotates catalytic site, allosteric site, and comprehensively determined functional site information. We demonstrate that the structure-based MRF architecture can encode the evolutionary information associated with biological function. Furthermore, we show that the node weight can more accurately represent positional coevolution information compared to the edge weight. Lastly, we demonstrate that the structure-based MRF model can be reliably built with only a few aligned sequences in linear time. Conclusions The results show that adoption of a structure-based architecture could be an acceptable approximation for coevolution modeling with efficient computation complexity.
Collapse
Affiliation(s)
- Chan-Seok Jeong
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea
| | - Dongsup Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), 291 Daehak-ro, Yuseong-gu, Daejeon, 34141, Republic of Korea.
| |
Collapse
|
46
|
Zhang H, Huang Q, Bei Z, Wei Y, Floudas CA. COMSAT: Residue contact prediction of transmembrane proteins based on support vector machines and mixed integer linear programming. Proteins 2016; 84:332-48. [PMID: 26756402 DOI: 10.1002/prot.24979] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2015] [Revised: 11/19/2015] [Accepted: 12/10/2015] [Indexed: 12/28/2022]
Abstract
In this article, we present COMSAT, a hybrid framework for residue contact prediction of transmembrane (TM) proteins, integrating a support vector machine (SVM) method and a mixed integer linear programming (MILP) method. COMSAT consists of two modules: COMSAT_SVM which is trained mainly on position-specific scoring matrix features, and COMSAT_MILP which is an ab initio method based on optimization models. Contacts predicted by the SVM model are ranked by SVM confidence scores, and a threshold is trained to improve the reliability of the predicted contacts. For TM proteins with no contacts above the threshold, COMSAT_MILP is used. The proposed hybrid contact prediction scheme was tested on two independent TM protein sets based on the contact definition of 14 Å between Cα-Cα atoms. First, using a rigorous leave-one-protein-out cross validation on the training set of 90 TM proteins, an accuracy of 66.8%, a coverage of 12.3%, a specificity of 99.3% and a Matthews' correlation coefficient (MCC) of 0.184 were obtained for residue pairs that are at least six amino acids apart. Second, when tested on a test set of 87 TM proteins, the proposed method showed a prediction accuracy of 64.5%, a coverage of 5.3%, a specificity of 99.4% and a MCC of 0.106. COMSAT shows satisfactory results when compared with 12 other state-of-the-art predictors, and is more robust in terms of prediction accuracy as the length and complexity of TM protein increase. COMSAT is freely accessible at http://hpcc.siat.ac.cn/COMSAT/.
Collapse
Affiliation(s)
- Huiling Zhang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Qingsheng Huang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Zhendong Bei
- Center for Cloud Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Yanjie Wei
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Christodoulos A Floudas
- Department of Chemical Engineering, Texas A&M University, College Station, Texas, 77843.,Texas A&M Energy Institute, Texas A&M University, College Station, Texas, 77843
| |
Collapse
|
47
|
Abstract
In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, application of deep learning algorithms and availability of large protein sequence databases, combined with improvement in methods that derive contacts from multiple sequence alignments, have shown a huge increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch.In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated.
Collapse
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA
| | - Jianlin Cheng
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA.
| |
Collapse
|
48
|
Mabrouk M, Werner T, Schneider M, Putz I, Brock O. Analysis of free modeling predictions by RBO aleph in CASP11. Proteins 2015; 84 Suppl 1:87-104. [PMID: 26492194 DOI: 10.1002/prot.24950] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2015] [Revised: 09/28/2015] [Accepted: 10/19/2015] [Indexed: 12/15/2022]
Abstract
The CASP experiment is a biannual benchmark for assessing protein structure prediction methods. In CASP11, RBO Aleph ranked as one of the top-performing automated servers in the free modeling category. This category consists of targets for which structural templates are not easily retrievable. We analyze the performance of RBO Aleph and show that its success in CASP was a result of its ab initio structure prediction protocol. A detailed analysis of this protocol demonstrates that two components unique to our method greatly contributed to prediction quality: residue-residue contact prediction by EPC-map and contact-guided conformational space search by model-based search (MBS). Interestingly, our analysis also points to a possible fundamental problem in evaluating the performance of protein structure prediction methods: Improvements in components of the method do not necessarily lead to improvements of the entire method. This points to the fact that these components interact in ways that are poorly understood. This problem, if indeed true, represents a significant obstacle to community-wide progress. Proteins 2016; 84(Suppl 1):87-104. © 2015 Wiley Periodicals, Inc.
Collapse
Affiliation(s)
- Mahmoud Mabrouk
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
| | - Tim Werner
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
| | - Michael Schneider
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
| | - Ines Putz
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany
| | - Oliver Brock
- Department of Electrical Engineering and Computer Science, Robotics and Biology Laboratory, Technische Universität Berlin, Berlin, 10587, Germany.
| |
Collapse
|
49
|
Heinze S, Putnam DK, Fischer AW, Kohlmann T, Weiner BE, Meiler J. CASP10-BCL::Fold efficiently samples topologies of large proteins. Proteins 2015; 83:547-63. [PMID: 25581562 DOI: 10.1002/prot.24733] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/13/2014] [Revised: 10/15/2014] [Accepted: 11/03/2014] [Indexed: 12/26/2022]
Abstract
During CASP10 in summer 2012, we tested BCL::Fold for prediction of free modeling (FM) and template-based modeling (TBM) targets. BCL::Fold assembles the tertiary structure of a protein from predicted secondary structure elements (SSEs) omitting more flexible loop regions early on. This approach enables the sampling of conformational space for larger proteins with more complex topologies. In preparation of CASP11, we analyzed the quality of CASP10 models throughout the prediction pipeline to understand BCL::Fold's ability to sample the native topology, identify native-like models by scoring and/or clustering approaches, and our ability to add loop regions and side chains to initial SSE-only models. The standout observation is that BCL::Fold sampled topologies with a GDT_TS score > 33% for 12 of 18 and with a topology score > 0.8 for 11 of 18 test cases de novo. Despite the sampling success of BCL::Fold, significant challenges still exist in clustering and loop generation stages of the pipeline. The clustering approach employed for model selection often failed to identify the most native-like assembly of SSEs for further refinement and submission. It was also observed that for some β-strand proteins model refinement failed as β-strands were not properly aligned to form hydrogen bonds removing otherwise accurate models from the pool. Further, BCL::Fold samples frequently non-natural topologies that require loop regions to pass through the center of the protein.
Collapse
Affiliation(s)
- Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, Tennessee, 37240
| | | | | | | | | | | |
Collapse
|
50
|
Xiao F, Shen HB. Prediction Enhancement of Residue Real-Value Relative Accessible Surface Area in Transmembrane Helical Proteins by Solving the Output Preference Problem of Machine Learning-Based Predictors. J Chem Inf Model 2015; 55:2464-74. [DOI: 10.1021/acs.jcim.5b00246] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Affiliation(s)
- Feng Xiao
- Institute
of Image Processing
and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory
of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Hong-Bin Shen
- Institute
of Image Processing
and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory
of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| |
Collapse
|