1
|
Reza MS, Zhang H, Hossain MT, Jin L, Feng S, Wei Y. COMTOP: Protein Residue-Residue Contact Prediction through Mixed Integer Linear Optimization. MEMBRANES 2021; 11:membranes11070503. [PMID: 34209399 PMCID: PMC8305966 DOI: 10.3390/membranes11070503] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/25/2021] [Revised: 06/24/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]
Abstract
Protein contact prediction helps reconstruct the tertiary structure that largely determines a protein’s function; contact prediction from sequence is therefore an important problem. Recent progress on this problem has been encouraging, but many existing methods still have low prediction accuracy. In this paper, we present a new mixed integer linear programming (MILP)-based consensus method: a Consensus scheme based On a Mixed integer linear opTimization method for prOtein contact Prediction (COMTOP). The MILP-based consensus method combines the strengths of seven selected protein contact prediction methods (CCMpred, EVfold, DeepCov, NNcon, PconsC4, plmDCA, and PSICOV) by optimizing the number of correctly predicted contacts, and thereby achieves better prediction accuracy. The proposed hybrid protein residue–residue contact prediction scheme was tested on four independent test sets. For 239 highly non-redundant proteins, the method showed a prediction accuracy of 59.68%, 70.79%, 78.86%, 89.04%, 94.51%, and 97.35% for top-5L, top-3L, top-2L, top-L, top-L/2, and top-L/5 contacts, respectively. When tested on the CASP13 and CASP14 test sets, the proposed method obtained accuracies of 75.91% and 77.49% for top-L/5 predictions, respectively. COMTOP was further tested on 57 non-redundant α-helical transmembrane proteins and achieved prediction accuracies of 64.34% and 73.91% for top-L/2 and top-L/5 predictions, respectively. For all test datasets, the improvement of COMTOP in accuracy over the seven individual methods grew as the number of predicted contacts increased. For example, the improvement was much larger for large numbers of predicted contacts (such as top-5L and top-3L) than for small numbers (such as top-L/2 and top-L/5). The results and analysis demonstrate that COMTOP can significantly improve on the performance of the individual methods and is more robust across different types of test sets. COMTOP also showed better or comparable predictions when compared with the state-of-the-art predictors.
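The top-5L to top-L/5 figures quoted above are ranked-list metrics. The sketch below shows one common way to compute such a value, assuming "accuracy" means the fraction of the highest-scoring predicted pairs that are true contacts. The function name, the `min_sep` default, and the toy data are hypothetical and are not taken from COMTOP.

```python
def top_k_precision(scores, native_contacts, seq_len, fraction, min_sep=6):
    """Fraction of the top `fraction * seq_len` predicted pairs that are native.

    scores: dict mapping residue pairs (i, j), i < j, to a predicted score.
    native_contacts: set of (i, j) pairs that are true contacts.
    fraction: 5 for top-5L, 1 for top-L, 0.2 for top-L/5, and so on.
    min_sep: minimum sequence separation (hypothetical default, an assumption).
    """
    # Keep only pairs that are far enough apart in sequence.
    candidates = [(score, pair) for pair, score in scores.items()
                  if pair[1] - pair[0] >= min_sep]
    # Rank by predicted score, highest first.
    candidates.sort(key=lambda item: item[0], reverse=True)
    k = max(1, int(seq_len * fraction))
    top = candidates[:k]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in native_contacts)
    return hits / len(top)


# Toy example: 10 residues, four scored pairs, two of them native.
scores = {(0, 7): 0.9, (1, 8): 0.8, (2, 9): 0.4, (0, 9): 0.1}
native = {(0, 7), (2, 9)}
precision_top_1 = top_k_precision(scores, native, seq_len=10, fraction=0.1)
precision_top_2 = top_k_precision(scores, native, seq_len=10, fraction=0.2)
```

In this toy case the single best pair is native, while only one of the two best pairs is, so the value drops as more pairs are included.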
Affiliation(s)
- Md. Selim Reza
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Huiling Zhang
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Md. Tofazzal Hossain
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Langxi Jin
- Department of Computer Science and Technology, School of Computer Science and Technology, Harbin University of Science and Technology, 52 Xuefu Road, Nangang District, Harbin 150080, China;
- Shengzhong Feng
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Yanjie Wei
- School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; (M.S.R.); (H.Z.); (M.T.H.)
- Centre for High Performance Computing, Joint Engineering Research Center for Health Big Data Intelligent Analysis Technology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China;
- Correspondence:
|
2
|
CONAN: A Tool to Decode Dynamical Information from Molecular Interaction Maps. Biophys J 2019; 114:1267-1273. [PMID: 29590584 PMCID: PMC5883949 DOI: 10.1016/j.bpj.2018.01.033] [Citation(s) in RCA: 60] [Impact Index Per Article: 12.0] [Received: 11/20/2017] [Revised: 12/19/2017] [Accepted: 01/22/2018] [Indexed: 02/07/2023] Open
Abstract
The analysis of contacts is a powerful tool to understand biomolecular function in a series of contexts, from the investigation of dynamical behavior at equilibrium to the study of nonequilibrium dynamics in which the system moves between multiple states. We thus propose a tool called CONtact ANalysis (CONAN) that, from molecular dynamics (MD) trajectories, analyzes interresidue contacts, creates videos of time-resolved contact maps, and performs correlation, principal component, and cluster analysis, revealing how specific contacts relate to functionally relevant states sampled by MD. We present how CONAN can identify features describing the dynamics of ubiquitin both at equilibrium and during mechanical unfolding. Additionally, we show the analysis of MD trajectories of an α-synuclein mutant peptide that undergoes an α-β conformational transition that can be easily monitored using CONAN, which identifies the multiple states that the peptide explores along its conformational dynamics. The high versatility and ease of use of the software make CONAN a tool that can significantly facilitate the understanding of the complex dynamical behavior of proteins or other biomolecules. CONAN and its documentation are freely available for download on GitHub.
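As a rough illustration of what a time-resolved contact analysis involves, the sketch below computes a contact map per frame and the fraction of frames in which each residue pair is in contact. It is not CONAN's implementation or interface. The one-point-per-residue representation, the cutoff value, and all names are assumptions made for the example.

```python
import math


def contact_map(coords, cutoff):
    """Return the set of residue pairs (i, j), i < j, closer than `cutoff`."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def contact_frequencies(frames, cutoff):
    """Fraction of frames in which each residue pair is in contact."""
    counts = {}
    for coords in frames:
        for pair in contact_map(coords, cutoff):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: n / len(frames) for pair, n in counts.items()}


# Two toy frames of a three-residue chain (one point per residue).
frames = [
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 3.0, 0.0)],
]
freq = contact_frequencies(frames, cutoff=5.0)
```

Pairs with a frequency strictly between 0 and 1, such as (0, 2) here, are the ones that form or break along the trajectory and are natural inputs for correlation or cluster analysis.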
|
3
|
Abstract
In the field of computational structural proteomics, contact predictions have shown new prospects of solving the longstanding problem of ab initio protein structure prediction. In the last few years, the application of deep learning algorithms and the availability of large protein sequence databases, combined with improvements in methods that derive contacts from multiple sequence alignments, have produced a large increase in the precision of contact prediction. In addition, these predicted contacts have also been used to build three-dimensional models from scratch. In this chapter, we briefly discuss many elements of protein residue-residue contacts and the methods available for prediction, focusing on a state-of-the-art contact prediction tool, DNcon. Illustrating with a case study, we describe how DNcon can be used to make ab initio contact predictions for a given protein sequence and discuss how the predicted contacts may be analyzed and evaluated.
Affiliation(s)
- Badri Adhikari
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Computer Science, University of Missouri, 201 Engineering Building West, Columbia, MO, 65211, USA.
|
4
|
Pietal MJ, Bujnicki JM, Kozlowski LP. GDFuzz3D: a method for protein 3D structure reconstruction from contact maps, based on a non-Euclidean distance function. Bioinformatics 2015; 31:3499-505. [PMID: 26130575 DOI: 10.1093/bioinformatics/btv390] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 10/12/2014] [Accepted: 06/23/2015] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION To date, only a few distinct successful approaches have been introduced to reconstruct a protein 3D structure from a map of contacts between its amino acid residues (a 2D contact map). Current algorithms can infer structures from information-rich contact maps that contain a limited fraction of erroneous predictions. However, it is difficult to reconstruct 3D structures from predicted contact maps that usually contain a high fraction of false contacts. RESULTS We describe a new, multi-step protocol that predicts protein 3D structures from the predicted contact maps. The method is based on a novel distance function acting on a fuzzy residue proximity graph, which predicts a 2D distance map from a 2D predicted contact map. The application of a Multi-Dimensional Scaling algorithm transforms that predicted 2D distance map into a coarse 3D model, which is further refined by typical modeling programs into an all-atom representation. We tested our approach on contact maps predicted de novo by MULTICOM, the top contact map predictor according to CASP10. We show that our method outperforms FT-COMAR, the state-of-the-art method for 3D structure reconstruction from 2D maps. For all predicted 2D contact maps of relatively low sensitivity (60-84%), GDFuzz3D generates more accurate 3D models, with the average improvement of 4.87 Å in terms of RMSD. AVAILABILITY AND IMPLEMENTATION GDFuzz3D server and standalone version are freely available at http://iimcb.genesilico.pl/gdserver/GDFuzz3D/. CONTACT iamb@genesilico.pl SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
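The idea of deriving a distance map from a contact map by working on a residue proximity graph can be illustrated with plain shortest-path hop counts. This is a deliberately simplified stand-in: the fuzzy, non-Euclidean distance function of GDFuzz3D is not described in the abstract and is not reproduced here. The function name and the toy contact set are hypothetical.

```python
from collections import deque


def hop_distances(n_residues, contacts):
    """All-pairs shortest path length (in hops) on a contact graph.

    Unreachable pairs keep the value None.
    """
    neighbours = {i: set() for i in range(n_residues)}
    for i, j in contacts:
        neighbours[i].add(j)
        neighbours[j].add(i)
    dist = [[None] * n_residues for _ in range(n_residues)]
    for start in range(n_residues):
        # Breadth-first search from each residue.
        dist[start][start] = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if dist[start][nxt] is None:
                    dist[start][nxt] = dist[start][node] + 1
                    queue.append(nxt)
    return dist


# Five residues: chain neighbours plus one long-range contact (0, 4).
contacts = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
d = hop_distances(5, contacts)
```

A matrix like `d` could then be handed to a multidimensional scaling routine to obtain coarse coordinates, which is the role the abstract assigns to the predicted 2D distance map.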
Affiliation(s)
- Michal J Pietal
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland, Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, Warsaw, Poland and
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland, Bioinformatics Laboratory, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University, Poznan, Poland
- Lukasz P Kozlowski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland
|
5
|
Ding W, Xie J, Dai D, Zhang H, Xie H, Zhang W. CNNcon: improved protein contact maps prediction using cascaded neural networks. PLoS One 2013; 8:e61533. [PMID: 23626696 PMCID: PMC3634008 DOI: 10.1371/journal.pone.0061533] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/17/2012] [Accepted: 03/11/2013] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Despite continuing progress in X-ray crystallography and high-field NMR spectroscopy for determination of three-dimensional protein structures, the number of unsolved and newly discovered sequences grows much faster than that of determined structures. Protein modeling methods may bridge this huge sequence-structure gap as computational science develops. A grand challenge is to predict a protein's three-dimensional structure from its primary structure (residue sequence) alone. Predicting residue contact maps is a crucial and promising intermediate step towards final three-dimensional structure prediction. Better predictions of local and non-local contacts between residues can transform protein sequence alignment into structure alignment, which can in turn greatly improve template-based three-dimensional protein structure predictors. METHODS CNNcon, an improved contact map predictor based on multiple neural networks, using six sub-networks and one final cascade network, was developed in this paper. Both the sub-networks and the final cascade network were trained and tested with their corresponding data sets. For testing, the target protein was first encoded and then input to its corresponding sub-networks for prediction. The intermediate results were then input to the cascade network to produce the final prediction. RESULTS CNNcon accurately predicts 58.86% of contacts on average at a distance cutoff of 8 Å for proteins with lengths ranging from 51 to 450. The comparison results show that the present method performs better than the compared state-of-the-art predictors. In particular, the prediction accuracy remains steady as protein sequence length increases. This indicates that CNNcon overcomes the thin density problem, with which other current predictors have trouble. This advantage makes the method valuable for long proteins, for which effective prediction could become possible with CNNcon.
Affiliation(s)
- Wang Ding
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Jiang Xie
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
- Department of Mathematics, University of California Irvine, Irvine, California, United States of America
- Dongbo Dai
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Huiran Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Hao Xie
- College of Stomatology, Wuhan University, Wuhan, People’s Republic of China
- Wu Zhang
- School of Computer Engineering and Science, Shanghai University, Shanghai, People’s Republic of China
- Institute of Systems Biology, Shanghai University, Shanghai, People’s Republic of China
- * E-mail:
|
6
|
Fačkovec B, Vondrášek J. Optimal definition of inter-residual contact in globular proteins based on pairwise interaction energy calculations, its robustness, and applications. J Phys Chem B 2012; 116:12651-60. [PMID: 22988914 DOI: 10.1021/jp303088n] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Although a contact is an essential measurement for the topology as well as strength of non-covalent interactions in biomolecules and their complexes, there is no general agreement in the definition of this feature. Most of the definitions work with simple geometric criteria which do not fully reflect the energy content or ability of the biomolecular building blocks to arrange their environment. We offer a reasonable solution to this problem by distinguishing between "productive" and "non-productive" contacts based on their interaction energy strength and properties. We have proposed a method which converts the protein topology into a contact map that represents interactions with statistically significant high interaction energies. We do not prove that these contacts are exclusively stabilizing, but they represent a gateway to thermodynamically important rather than geometry-based contacts. The process is based on protein fragmentation and calculation of interaction energies using the OPLS force field and relies on pairwise additivity of amino acid interactions. Our approach integrates the treatment of different types of interactions, avoiding the problems resulting from different contributions to the overall stability and the different effect of the environment. The first applications on a set of homologous proteins have shown the usefulness of this classification for a sound estimate of protein stability.
Affiliation(s)
- Boris Fačkovec
- Institute of Organic Chemistry and Biochemistry, Academy of Sciences of the Czech Republic, Flemingovo nam. 2, 166 10 Prague 6, Czech Republic
|
7
|
|
8
|
Vehlow C, Stehr H, Winkelmann M, Duarte JM, Petzold L, Dinse J, Lappe M. CMView: Interactive contact map visualization and analysis. Bioinformatics 2011; 27:1573-4. [DOI: 10.1093/bioinformatics/btr163] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Indexed: 11/14/2022] Open
|
9
|
Blurring contact maps of thousands of proteins: what we can learn by reconstructing 3D structure. BioData Min 2011; 4:1. [PMID: 21232136 PMCID: PMC3033854 DOI: 10.1186/1756-0381-4-1] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.6] [Received: 08/26/2010] [Accepted: 01/13/2011] [Indexed: 11/17/2022] Open
Abstract
Background The present knowledge of protein structures at atomic level derives from some 60,000 molecules. Yet the exponentially growing set of hypothetical protein sequences comprises some 10 million chains, which makes protein structure prediction one of the challenging goals of bioinformatics. In this context, the protein representation with contact maps is an intermediate step of fold recognition and constitutes the input of contact map predictors. However, contact map representations require fast and reliable methods to reconstruct the specific folding of the protein backbone. Methods In this paper, by adopting a GRID technology, our algorithm for 3D reconstruction, FT-COMAR, is benchmarked on a huge set of non-redundant proteins (1716), taking random noise into consideration, which makes our computation the largest ever performed for the task at hand. Results We observe the effects of introducing random noise on 3D reconstruction and derive some considerations useful for future implementations. The size of the protein set also allows statistical considerations after grouping by SCOP structural class. Conclusions Altogether, our data indicate that the quality of 3D reconstruction is unaffected by deleting up to an average of 75% of the real contacts, while only a few percent of randomly generated contacts in place of non-contacts are sufficient to hamper 3D reconstruction.
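The two kinds of noise discussed in the conclusions (deleting real contacts, and replacing non-contacts with random false contacts) can be sketched as follows. This is an illustrative assumption about how such blurring might be done, not the benchmark's actual procedure. The function name, parameters, and the fixed seed are hypothetical.

```python
import random


def blur_contact_map(contacts, n_residues, delete_fraction, add_fraction, seed=0):
    """Delete a fraction of true contacts and add random false contacts.

    delete_fraction: share of the real contacts to remove.
    add_fraction: share of the non-contact pairs to turn into false contacts.
    """
    rng = random.Random(seed)  # fixed seed so the toy example is repeatable
    all_pairs = [(i, j) for i in range(n_residues)
                 for j in range(i + 1, n_residues)]
    non_contacts = [pair for pair in all_pairs if pair not in contacts]
    n_keep = round(len(contacts) * (1 - delete_fraction))
    n_add = round(len(non_contacts) * add_fraction)
    kept = rng.sample(sorted(contacts), n_keep)
    added = rng.sample(non_contacts, n_add)
    return set(kept) | set(added)


# Six residues, four real contacts; drop 75% of them and corrupt ~10% of the rest.
contacts = {(0, 1), (1, 2), (2, 3), (0, 5)}
noisy = blur_contact_map(contacts, 6, delete_fraction=0.75, add_fraction=0.1)
```

Feeding maps like `noisy` to a reconstruction routine at increasing noise levels is the kind of experiment that separates the two effects reported above.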
|
10
|
Nasrallah CA, Mathews DH, Huelsenbeck JP. Quantifying the impact of dependent evolution among sites in phylogenetic inference. Syst Biol 2010; 60:60-73. [PMID: 21081481 PMCID: PMC2997629 DOI: 10.1093/sysbio/syq074] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022] Open
Abstract
Nearly all commonly used methods of phylogenetic inference assume that characters in an alignment evolve independently of one another. This assumption is attractive for simplicity and computational tractability but is not biologically reasonable for RNAs and proteins that have secondary and tertiary structures. Here, we simulate RNA and protein-coding DNA sequence data under a general model of dependence in order to assess the robustness of traditional methods of phylogenetic inference to violation of the assumption of independence among sites. We find that the accuracy of independence-assuming methods is reduced by the dependence among sites; for proteins this reduction is relatively mild, but for RNA this reduction may be substantial. We introduce the concept of effective sequence length and its utility for considering information content in phylogenetics.
Affiliation(s)
- Chris A Nasrallah
- Department of Integrative Biology, University of California, Berkeley, 3060 Valley Life Sciences Building #3140, Berkeley, CA 94720-3140, USA.
|
11
|
Gillespie J, Mayne M, Jiang M. RNA folding on the 3D triangular lattice. BMC Bioinformatics 2009; 10:369. [PMID: 19891777 PMCID: PMC2780420 DOI: 10.1186/1471-2105-10-369] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 06/27/2009] [Accepted: 11/05/2009] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Difficult problems in structural bioinformatics are often studied in simple exact models to gain insights and to derive general principles. Protein folding, for example, has long been studied in the lattice model. Recently, researchers have also begun to apply the lattice model to the study of RNA folding. RESULTS We present a novel method for predicting RNA secondary structures with pseudoknots: first simulate the folding dynamics of the RNA sequence on the 3D triangular lattice, then extract and select a set of disjoint base pairs from the best lattice conformation found by the folding simulation. Experiments on sequences from PseudoBase show that our prediction method outperforms the HotKnot algorithm of Ren, Rastegari, Condon and Hoos, a leading method for RNA pseudoknot prediction. Our method for RNA secondary structure prediction can be adapted into an efficient reconstruction method that, given an RNA sequence and an associated secondary structure, finds a conformation of the sequence on the 3D triangular lattice that realizes the base pairs in the secondary structure. We implemented a suite of computer programs for the simulation and visualization of RNA folding on the 3D triangular lattice. These programs come with detailed documentation and are accessible from the companion website of this paper at http://www.cs.usu.edu/~mjiang/rna/DeltaIS/. CONCLUSION Folding simulation on the 3D triangular lattice is an effective method for RNA secondary structure prediction and lattice conformation reconstruction. The visualization software for the lattice conformations of RNA structures is a valuable tool for the study of RNA folding and a useful pedagogic device.
Affiliation(s)
- Joel Gillespie
- Department of Computer Science, Utah State University, Logan, Utah 84322-4205, USA.
|
12
|
Vassura M, Margara L, Di Lena P, Medri F, Fariselli P, Casadio R. Reconstruction of 3D structures from protein contact maps. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2008; 5:357-367. [PMID: 18670040 DOI: 10.1109/tcbb.2008.27] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 05/26/2023]
Abstract
The prediction of the protein tertiary structure solely from its residue sequence (the so-called Protein Folding Problem) is one of the most challenging problems in Structural Bioinformatics. We focus on the protein residue contact map. When this map is given, it is possible to reconstruct the 3D structure of the protein backbone. The general problem of recovering a set of 3D coordinates consistent with a given contact map is known as the unit-disk-graph realization problem and has recently been proven to be NP-hard. In this paper we describe a heuristic method (COMAR) that is able to reconstruct, at an unprecedented rate (3-15 seconds), a 3D model that exactly matches the target contact map of a protein. Working with a non-redundant set of 1760 proteins, we find that the scoring efficiency of finding a 3D model very close to the protein's native structure depends on the threshold value adopted to compute the protein residue contact map. Contact maps whose threshold values range from 10 to 18 Angstroms allow reconstructing 3D models that are very similar to the protein's native structure.
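Although finding coordinates that realize a contact map is hard, checking whether a given set of coordinates matches a map is straightforward. The sketch below shows that check, assuming one point per residue and a contact defined as a distance at or below the threshold. It is not the COMAR heuristic, and the names and toy coordinates are hypothetical.

```python
import math


def matches_contact_map(coords, contacts, threshold):
    """True if `coords` reproduces exactly the contacts in `contacts`.

    coords: list of 3D points, one per residue.
    contacts: set of residue pairs (i, j) with i < j.
    threshold: distance at or below which two residues count as in contact.
    """
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            close = math.dist(coords[i], coords[j]) <= threshold
            # Every pair must agree: close pairs in the map, distant pairs not.
            if close != ((i, j) in contacts):
                return False
    return True


# Three residues on a line, 10 Angstroms apart, with a 12 Angstrom threshold.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
consistent = matches_contact_map(coords, {(0, 1), (1, 2)}, threshold=12.0)
inconsistent = matches_contact_map(coords, {(0, 1), (1, 2), (0, 2)}, threshold=12.0)
```

A reconstruction heuristic can use a test of this kind as its stopping criterion, since an exact match to the target map is what the abstract reports.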
Affiliation(s)
- Marco Vassura
- Department of Computer Science, University of Bologna, Via Mura Anteo Zamboni 7, 40127 Bologna, Italy.
|
13
|
Dunn S, Wahl L, Gloor G. Mutual information without the influence of phylogeny or entropy dramatically improves residue contact prediction. Bioinformatics 2007; 24:333-40. [DOI: 10.1093/bioinformatics/btm604] [Citation(s) in RCA: 363] [Impact Index Per Article: 21.4] [Indexed: 11/13/2022] Open
|
14
|
Zhang J, Kou SC, Liu JS. Biopolymer structure simulation and optimization via fragment regrowth Monte Carlo. J Chem Phys 2007; 126:225101. [PMID: 17581081 DOI: 10.1063/1.2736681] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.5] [Indexed: 11/14/2022] Open
Abstract
An efficient exploration of the configuration space of a biopolymer is essential for its structure modeling and prediction. In this study, the authors propose a new Monte Carlo method, fragment regrowth via energy-guided sequential sampling (FRESS), which incorporates the idea of multigrid Monte Carlo into the framework of configurational-bias Monte Carlo and is suitable for chain polymer simulations. As a by-product, the authors also found a novel extension of the Metropolis Monte Carlo framework applicable to all Monte Carlo computations. They tested FRESS on hydrophobic-hydrophilic (HP) protein folding models in both two and three dimensions. For the benchmark sequences, FRESS not only found all the minimum energies obtained by previous studies with substantially less computation time but also found new lower energies for all the three-dimensional HP models with sequence length longer than 80 residues.
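The quantity minimized in HP lattice benchmarks of this kind is commonly defined as minus the number of H-H pairs that sit on adjacent lattice sites without being neighbours along the chain. The sketch below evaluates that energy for a given conformation, assuming this standard convention. It does not implement FRESS itself, and the function name and example are hypothetical.

```python
def hp_energy(sequence, positions):
    """Energy of an HP-model conformation on a square or cubic lattice.

    sequence: string of 'H' and 'P' characters.
    positions: list of integer lattice coordinates, one tuple per residue.
    Each non-bonded H-H pair on adjacent lattice sites contributes -1.
    """
    if len(set(positions)) != len(positions):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        # Start at i + 2 to skip residues bonded along the chain.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                step = sum(abs(a - b) for a, b in zip(positions[i], positions[j]))
                if step == 1:
                    energy -= 1
    return energy


# Four residues folded into a unit square on the 2D lattice.
folded = hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)])
# The same sequence as a straight chain has no H-H contact.
extended = hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)])
```

The same function works in three dimensions if the position tuples have three components, since adjacency is tested with the Manhattan distance.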
Affiliation(s)
- Jinfeng Zhang
- Department of Statistics, Harvard University, Science Center, Cambridge, Massachusetts 02138, USA
|
15
|
Vassura M, Margara L, Medri F, di Lena P, Fariselli P, Casadio R. Reconstruction of 3D Structures from Protein Contact Maps. BIOINFORMATICS RESEARCH AND APPLICATIONS 2007. [DOI: 10.1007/978-3-540-72031-7_53] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
|
16
|
Fault Tolerance for Large Scale Protein 3D Reconstruction from Contact Maps. LECTURE NOTES IN COMPUTER SCIENCE 2007. [DOI: 10.1007/978-3-540-74126-8_4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/28/2023]
|
17
|
Lise S, Walker-Taylor A, Jones DT. Docking protein domains in contact space. BMC Bioinformatics 2006; 7:310. [PMID: 16790041 PMCID: PMC1559650 DOI: 10.1186/1471-2105-7-310] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/20/2006] [Accepted: 06/21/2006] [Indexed: 11/10/2022] Open
Abstract
Background Many biological processes involve the physical interaction between protein domains. Understanding these functional associations requires knowledge of the molecular structure. Experimental investigations though present considerable difficulties and there is therefore a need for accurate and reliable computational methods. In this paper we present a novel method that seeks to dock protein domains using a contact map representation. Rather than providing a full three dimensional model of the complex, the method predicts contacting residues across the interface. We use a scoring function that combines structural, physicochemical and evolutionary information, where each potential residue contact is assigned a value according to the scoring function and the hypothesis is that the real configuration of contacts is the one that maximizes the score. The search is performed with a simulated annealing algorithm directly in contact space. Results We have tested the method on interacting domain pairs that are part of the same protein (intra-molecular domains). We show that it correctly predicts some contacts and that predicted residues tend to be significantly closer to each other than other pairs of residues in the same domains. Moreover we find that predicted contacts can often discriminate the best model (or the native structure, if present) among a set of optimal solutions generated by a standard docking procedure. Conclusion Contact docking appears feasible and able to complement other computational methods for the prediction of protein-protein interactions. With respect to more standard docking algorithms it might be more suitable to handle protein conformational changes and to predict complexes starting from protein models.
Affiliation(s)
- Stefano Lise
- Department of Biochemistry and Molecular Biology, University College London, UK
- David T Jones
- Department of Biochemistry and Molecular Biology, University College London, UK
- Department of Computer Science, University College London, UK
|
18
|
Taylor TJ, Vaisman II. Graph theoretic properties of networks formed by the Delaunay tessellation of protein structures. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2006; 73:041925. [PMID: 16711854 DOI: 10.1103/physreve.73.041925] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 10/31/2005] [Indexed: 05/09/2023]
Abstract
The Delaunay tessellation of several sets of real and simplified model protein structures has been used to explore graph theoretic properties of residue contact networks. The system of contacts defined by residues joined by edges in the Delaunay simplices can be thought of as a graph or network and analyzed using techniques from elementary graph theory and the theory of complex networks. Such analysis indicates that protein contact networks have small world character, but technically are not small world networks. This approach also indicates that networks formed by native structures and by most misfolded decoys can be differentiated by their respective graph properties. The characteristic features of residue contact networks can be used for the detection of structural elements in proteins, such as the ubiquitous closed loops consisting of 22-32 consecutive residues, where terminal residues are Delaunay neighbors.
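One of the standard graph statistics used to judge small-world character is the clustering coefficient. The sketch below computes the mean local clustering coefficient of a contact graph given as an edge list. It assumes the edges have already been obtained (for example from a tessellation, which is not performed here), and the function name and toy graph are hypothetical.

```python
def clustering_coefficient(n_nodes, edges):
    """Mean local clustering coefficient of an undirected graph.

    For each node, the local coefficient is the fraction of pairs of its
    neighbours that are themselves connected. Nodes with fewer than two
    neighbours contribute 0 to the mean.
    """
    neighbours = {i: set() for i in range(n_nodes)}
    for i, j in edges:
        neighbours[i].add(j)
        neighbours[j].add(i)
    total = 0.0
    for nbrs in neighbours.values():
        k = len(nbrs)
        if k < 2:
            continue
        # Count edges among the neighbours, each unordered pair once.
        links = sum(1 for a in nbrs for b in nbrs
                    if a < b and b in neighbours[a])
        total += 2 * links / (k * (k - 1))
    return total / n_nodes


# A triangle (0, 1, 2) with one pendant residue (3) attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
c = clustering_coefficient(4, edges)
```

Comparing this value and the mean shortest path length against those of a random graph with the same number of nodes and edges is the usual way to assess small-world behaviour.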
Affiliation(s)
- Todd J Taylor
- Laboratory for Structural Bioinformatics, School of Computational Sciences, George Mason University, 10900 University Boulevard MSN5B3, Manassas, VA 20110, USA
|
19
|
Berrera M, Molinari H, Fogolari F. Amino acid empirical contact energy definitions for fold recognition in the space of contact maps. BMC Bioinformatics 2003; 4:8. [PMID: 12689348 PMCID: PMC153506 DOI: 10.1186/1471-2105-4-8] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.6] [Received: 01/20/2003] [Accepted: 02/28/2003] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Contradicting evidence has been presented in the literature concerning the effectiveness of empirical contact energies for fold recognition. Empirical contact energies are calculated on the basis of information available from selected protein structures, with respect to a defined reference state, according to the quasi-chemical approximation. Protein-solvent interactions are estimated from residue solvent accessibility. RESULTS In the approach presented here, contact energies are derived from the potential of mean force theory, several definitions of contact are examined and their performance in fold recognition is evaluated on sets of decoy structures. The best definition of contact is tested, on a more realistic scenario, on all predictions including sidechains accepted in the CASP4 experiment. In 30 out of 35 cases the native structure is correctly recognized and best predictions are usually found among the 10 lowest energy predictions. CONCLUSION The definition of contact based on van der Waals radii of alpha carbon and side chain heavy atoms is seen to perform better than other definitions involving only alpha carbons, only beta carbons, all heavy atoms or only backbone atoms. An important prerequisite for the applicability of the approach is that the protein structure under study should not exhibit anomalous solvent accessibility, compared to soluble proteins whose structure is deposited in the Protein Data Bank. The combined evaluation of a solvent accessibility parameter and contact energy allows for an effective gross screening of predictive models.
Affiliation(s)
- Marco Berrera
- International School for Advanced Studies Via Beirut 4, 34014 Trieste, Italy
- Henriette Molinari
- Dipartimento Scientifico e Tecnologico, Universita' di Verona, Strada Le Grazie 15, 37134 Verona, Italy
- Federico Fogolari
- Dipartimento Scientifico e Tecnologico, Universita' di Verona, Strada Le Grazie 15, 37134 Verona, Italy
|
20
|
Bastolla U, Farwer J, Knapp EW, Vendruscolo M. How to guarantee optimal stability for most representative structures in the Protein Data Bank. Proteins 2001; 44:79-96. [PMID: 11391771 DOI: 10.1002/prot.1075] [Citation(s) in RCA: 101] [Impact Index Per Article: 4.4] [Indexed: 11/10/2022]
Abstract
We recently proposed an optimization method to derive energy parameters for simplified models of protein folding. The method is based on the maximization of the thermodynamic average of the overlap between protein native structures and a Boltzmann ensemble of alternative structures. Such a condition enforces protein models whose ground states are most similar to the corresponding native states. We present here an extensive testing of the method for a simple residue-residue contact energy function and for alternative structures generated by threading. The optimized energy function guarantees high stability and a well-correlated energy landscape to most representative structures in the PDB database. Failures in the recognition of the native structure can be attributed to the neglect of interactions between different chains in oligomeric proteins or with cofactors. When these are taken into account, only very few X-ray structures are not recognized. Most of them are short inhibitors or fragments and one is a structure that presents serious inconsistencies. Finally, we discuss the reasons that make NMR structures more difficult to recognize. Copyright 2001 Wiley-Liss, Inc.
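The overlap between a native structure and an alternative structure can be illustrated at the level of contact maps. The sketch below counts shared contacts and normalizes by the larger of the two contact counts. That normalization is an assumption made for illustration, since the abstract does not specify it, and the function name and toy maps are hypothetical.

```python
def contact_overlap(contacts_a, contacts_b):
    """Overlap between two contact maps given as sets of residue pairs.

    Returns the number of shared contacts divided by the larger contact
    count (an assumed normalization), so identical maps give 1.0.
    """
    largest = max(len(contacts_a), len(contacts_b))
    if largest == 0:
        return 1.0  # two empty maps are treated as identical
    return len(contacts_a & contacts_b) / largest


# Native map with three contacts, alternative structure sharing one of them.
native = {(0, 3), (1, 4), (2, 5)}
alternative = {(0, 3), (1, 5)}
q = contact_overlap(native, alternative)
```

Averaging a quantity like `q` over an ensemble of alternative structures, weighted by their Boltzmann factors, gives the kind of thermodynamic average that the optimization described above maximizes.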
Affiliation(s)
- U Bastolla
- Free University of Berlin, Department of Biology, Chemistry and Pharmacy, Institute of Chemistry, Berlin, Germany.
|