1
Achimba F, Faezov B, Cohen B, Dunbrack R, Holford M. Targeting Dysregulated Ion Channels in Liver Tumors with Venom Peptides. Mol Cancer Ther 2024; 23:139-147. [PMID: 38015557 PMCID: PMC10831335 DOI: 10.1158/1535-7163.mct-23-0256] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 10/04/2023] [Accepted: 11/14/2023] [Indexed: 11/29/2023]
Abstract
The regulation of cellular processes by ion channels has become central to the study of cancer mechanisms. Designing molecules that can modify ion channels specific to tumor cells is a promising area of targeted drug delivery and therapy. Despite their potential in drug discovery, venom peptides, a group of natural products, have largely remained understudied and under-characterized. In general, venom peptides display high specificity and selectivity for their target ion channels. Therefore, they may represent an effective strategy for selectively targeting the dysregulation of ion channels in tumor cells. This review examines existing venom peptide therapies for different cancer types and focuses on the application of snail venom peptides in hepatocellular carcinoma (HCC), the most common form of primary liver cancer worldwide. We provide insights into the mode of action of venom peptides that have been shown to target tumors. We also explore the benefit of using new computational methods like de novo protein structure prediction to screen venom peptides and identify potential druggable candidates. Finally, we summarize the role of cell culture, animal, and organoid models in developing effective therapies against HCC and highlight the need for creating models that represent the most disproportionately affected ethnicities in HCC.
Affiliation(s)
- Favour Achimba
- The PhD Program in Biochemistry, Graduate Center, City University of New York, New York, New York
- Hunter College, City University of New York, New York, New York
- Bulat Faezov
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania
- Institute of Fundamental Medicine and Biology, Kazan Federal University, Kazan, Russian Federation
- Brandon Cohen
- Hunter College, City University of New York, New York, New York
- Roland Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania
- Mandë Holford
- The PhD Program in Biochemistry, Graduate Center, City University of New York, New York, New York
- Hunter College, City University of New York, New York, New York
- The PhD Program in Chemistry, Graduate Center of the City University of New York, New York, New York
- The PhD Program in Biology, Graduate Center of the City University of New York, New York, New York
- Department of Invertebrate Zoology, The American Museum of Natural History, New York, New York
- Department of Biochemistry, Weill Cornell Medicine, New York, New York
2
Modi V, Xu Q, Adhikari S, Dunbrack RL. Assessment of template-based modeling of protein structure in CASP11. Proteins 2016; 84 Suppl 1:200-20. [PMID: 27081927 DOI: 10.1002/prot.25049] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 01/24/2016] [Revised: 04/04/2016] [Accepted: 04/11/2016] [Indexed: 12/27/2022]
Abstract
We present the assessment of predictions submitted in the template-based modeling (TBM) category of CASP11 (Critical Assessment of Protein Structure Prediction). Model quality was judged on the basis of global and local measures of accuracy on all atoms including side chains. The top groups on 39 human-server targets based on model 1 predictions were LEER, Zhang, LEE, MULTICOM, and Zhang-Server. The top groups on 81 targets by server groups based on model 1 predictions were Zhang-Server, nns, BAKER-ROSETTASERVER, QUARK, and myprotein-me. In CASP11, the best models for most targets were equal to or better than the best template available in the Protein Data Bank, even for targets with poor templates. The overall performance in CASP11 is similar to the performance of predictors in CASP10 with slightly better performance on the hardest targets. For most targets, assessment measures exhibited bimodal probability density distributions. Multi-dimensional scaling of an RMSD matrix for each target typically revealed a single cluster with models similar to the target structure, with a mode in the GDT-TS density between 40 and 90, and a wide distribution of models highly divergent from each other and from the experimental structure, with density mode at a GDT-TS value of ∼20. The models in this peak in the density were either compact models with entirely the wrong fold, or highly non-compact models. The results argue for a density-driven approach in future CASP TBM assessments that accounts for the bimodal nature of these distributions instead of Z scores, which assume a unimodal, Gaussian distribution.
Affiliation(s)
- Vivek Modi
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
- Qifang Xu
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
- Sam Adhikari
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
- Roland L Dunbrack
- Fox Chase Cancer Center, Institute for Cancer Research, Philadelphia, Pennsylvania, 19111
3
Wabik J, Kurcinski M, Kolinski A. Coarse-Grained Modeling of Peptide Docking Associated with Large Conformation Transitions of the Binding Protein: Troponin I Fragment-Troponin C System. Molecules 2015; 20:10763-80. [PMID: 26111167 PMCID: PMC6272278 DOI: 10.3390/molecules200610763] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/31/2015] [Revised: 05/14/2015] [Accepted: 05/21/2015] [Indexed: 11/25/2022]
Abstract
Most of the current docking procedures are focused on fine conformational adjustments of assembled complexes and fail to reproduce large-scale protein motion. In this paper, we test a new modeling approach developed to address this problem. CABS-dock is a versatile and efficient tool for modeling the structure, dynamics and interactions of protein complexes. The docking protocol employs a coarse-grained representation of proteins, a simplified model of interactions and advanced protocols for conformational sampling. CABS-dock is one of the very few tools that allow unrestrained docking with large conformational freedom of the receptor. In an example application we modeled the process of complex assembly between two proteins: Troponin C (TnC) and the N-terminal helix of Troponin I (TnI N-helix), which occurs in vivo during muscle contraction. Docking simulations illustrated how the TnC molecule undergoes significant conformational transition on complex formation, a phenomenon that can be modeled only when protein flexibility is properly accounted for. This way our procedure opens up a new possibility for studying mechanisms of protein complex assembly, which may be a supporting tool for rational drug design.
Affiliation(s)
- Jacek Wabik
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
- Mateusz Kurcinski
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
- Andrzej Kolinski
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland.
4
Ligand heterogeneity of the cysteine protease binding protein family in the parasitic protist Entamoeba histolytica. Int J Parasitol 2014; 44:625-35. [DOI: 10.1016/j.ijpara.2014.04.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 02/27/2014] [Revised: 04/11/2014] [Accepted: 04/15/2014] [Indexed: 01/08/2023]
5
Feng Y, Lin H, Luo L. Prediction of protein secondary structure using feature selection and analysis approach. Acta Biotheor 2014; 62:1-14. [PMID: 24052343 DOI: 10.1007/s10441-013-9203-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/29/2012] [Accepted: 08/24/2013] [Indexed: 01/09/2023]
Abstract
The prediction of the secondary structure of a protein from its amino acid sequence is an important step towards the prediction of its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is about 80% currently, which is still far from satisfactory. In this study, we proposed a novel method that uses binomial distribution to optimize tetrapeptide structural words and increment of diversity with quadratic discriminant to perform prediction for protein three-state secondary structure. A benchmark dataset including 2,640 proteins with sequence identity of less than 25% was used to train and test the proposed method. The results indicate that overall accuracy of 87.8% was achieved in secondary structure prediction by using ten-fold cross-validation. Moreover, the accuracy of predicted secondary structures ranges from 84 to 89% at the level of residue. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words which affect the accuracy of predicted secondary structures.
6
Kaushik S, Mutt E, Chellappan A, Sankaran S, Srinivasan N, Sowdhamini R. Improved detection of remote homologues using cascade PSI-BLAST: influence of neighbouring protein families on sequence coverage. PLoS One 2013; 8:e56449. [PMID: 23437136 PMCID: PMC3577913 DOI: 10.1371/journal.pone.0056449] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 07/16/2012] [Accepted: 01/13/2013] [Indexed: 12/31/2022]
Abstract
Background: Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed manner of identifying remote homologues effectively. In this study, examination of serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families was carried out using plant serine proteases as queries from two genomes, A. thaliana and O. sativa, and 13 other families of unrelated folds, to identify the distant homologues which could not be obtained using PSI-BLAST.
Methodology/Principal Findings: We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied on an enriched sequence database of homologous domains and we obtained overall average coverage of 88% at family level and 77% at superfamily or fold level, along with specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families.
Conclusions/Significance: Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as query and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
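The coverage, specificity and Matthews correlation coefficient quoted in this abstract are standard confusion-matrix statistics. A minimal sketch of how such values are computed is given below; the function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


def coverage(tp: int, fn: int) -> float:
    """Fraction of true family members that were detected (sensitivity)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-members that were correctly rejected."""
    return tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the calculation.
    tp, tn, fp, fn = 88, 100, 0, 12
    print(f"coverage    = {coverage(tp, fn):.2f}")
    print(f"specificity = {specificity(tn, fp):.2f}")
    print(f"MCC         = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

Because the coefficient uses all four cells of the confusion matrix, it stays informative when true members are far rarer than non-members, which is the usual situation in remote homologue searches.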
Affiliation(s)
- Swati Kaushik
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
- Eshita Mutt
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
- Ajithavalli Chellappan
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
- School of Chemical and Biotechnology, Shanmugha Arts, Science, Technology & Research Academy, Thanjavur, Tamil Nadu, India
- Sandhya Sankaran
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Bangalore, India
- Narayanaswamy Srinivasan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Bangalore, India
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore, Karnataka, India
7
González J, Gálvez A, Morales L, Barreto GE, Capani F, Sierra O, Torres Y. Integrative Approach for Computationally Inferring Interactions between the Alpha and Beta Subunits of the Calcium-Activated Potassium Channel (BK): A Docking Study. Bioinform Biol Insights 2013; 7:73-82. [PMID: 23492851 PMCID: PMC3588595 DOI: 10.4137/bbi.s10077] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
Three-dimensional models of the alpha- and beta-1 subunits of the calcium-activated potassium channel (BK) were predicted by threading modeling. A recursive approach comprising sequence alignment and model building based on three templates was used to build these models, with the refinement of non-conserved regions carried out using threading techniques. The complex formed by the subunits was studied by means of docking techniques, using 3D models of the two subunits, and an approach based on rigid-body structures. Structural effects of the complex were analyzed with respect to hydrogen-bond interactions and binding-energy calculations. Potential interaction sites of the complex were determined by referencing a study of the difference accessible surface area (DASA) of the protein subunits in the complex.
Affiliation(s)
- Janneth González
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
- Angela Gálvez
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
- Ludis Morales
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
- George E. Barreto
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
- Francisco Capani
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
- Omar Sierra
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
- Yolima Torres
- Department of Nutrition and Biochemistry, Faculty of Sciences, Javeriana University, Bogotá DC, Colombia
8
3D profile-based approach to proteome-wide discovery of novel human chemokines. PLoS One 2012; 7:e36151. [PMID: 22586462 PMCID: PMC3346806 DOI: 10.1371/journal.pone.0036151] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/01/2012] [Accepted: 03/27/2012] [Indexed: 12/29/2022]
Abstract
Chemokines are small secreted proteins with important roles in immune responses. They consist of a conserved three-dimensional (3D) structure, the so-called IL8-like chemokine fold, which is supported by disulfide bridges characteristic of this protein family. Sequence- and profile-based computational methods have been proficient in discovering novel chemokines by making use of their sequence-conserved cysteine patterns. However, it has been recently shown that some chemokines escaped annotation by these methods due to low sequence similarity to known chemokines and to different arrangement of cysteines in sequence and in 3D. Innovative methods overcoming the limitations of current techniques may allow the discovery of new remote homologs in the still functionally uncharacterized fraction of the human genome. We report a novel computational approach for proteome-wide identification of remote homologs of the chemokine family that uses fold recognition techniques in combination with a scaffold-based automatic mapping of disulfide bonds to define a 3D profile of the chemokine protein family. By applying our methodology to all currently uncharacterized human protein sequences, we have discovered two novel proteins that, without having significant sequence similarity to known chemokines or characteristic cysteine patterns, show strong structural resemblance to known anti-HIV chemokines. Detailed computational analysis and experimental structural investigations based on mass spectrometry and circular dichroism support our structural predictions and highlight several other chemokine-like features. The results obtained support their functional annotation as putative novel chemokines and encourage further experimental characterization. The identification of remote homologs of human chemokines may provide new insights into the molecular mechanisms causing pathologies such as cancer or AIDS, and may contribute to the development of novel treatments. Besides, the genome-wide applicability of our methodology based on 3D protein family profiles may open up new possibilities for improving and accelerating protein function annotation processes.
9
Poleksic A, Fienup M, Danzer JF, Debe DA. A different look at the quality of modeled three-dimensional protein structures. J Bioinform Comput Biol 2011; 6:335-45. [DOI: 10.1142/s0219720008003424] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/18/2007] [Revised: 11/14/2007] [Accepted: 12/05/2007] [Indexed: 11/18/2022]
Abstract
Measuring the accuracy of protein three-dimensional structures is one of the most important problems in protein structure prediction. For structure-based drug design, the accuracy of the binding site is far more important than the accuracy of any other region of the protein. We have developed an automated method for assessing the quality of a protein model by focusing on the set of residues in the small molecule binding site. Small molecule binding sites typically involve multiple regions of the protein coming together in space, and their accuracy has been observed to be sensitive to even small alignment errors. In addition, ligand binding sites contain the critical information required for drug design, making their accuracy particularly important. We analyzed the accuracy of the binding sites on two sets of protein models: the predictions submitted by the top-performing CASP7 groups, and the models generated by four widely used homology modeling packages. The results of our CASP7 analysis significantly differ from the previous findings, implying that the binding site measure does not correlate with the traditional model quality measures used in the structure prediction benchmarks. For the modeling programs, the resolution of binding sites is extremely sensitive to the degree of sequence homology between the query and the template, even when the most accurate alignments are used in the homology modeling process.
Affiliation(s)
- Aleksandar Poleksic
- Computer Science Department, University of Northern Iowa, Cedar Falls, IA 50614, USA
- Mark Fienup
- Computer Science Department, University of Northern Iowa, Cedar Falls, IA 50614, USA
- Joseph F. Danzer
- Eidogen-Sertanty Inc., 9381 Judicial Dr., San Diego, CA 92121, USA
- Derek A. Debe
- Global Pharmaceutical Research and Development, Abbott Laboratories, Abbott Park, IL 60064, USA
10
Wei Y, Thompson J, Floudas CA. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proc Math Phys Eng Sci 2011. [DOI: 10.1098/rspa.2011.0514] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
Most of the protein structure prediction methods use a multi-step process, which often includes secondary structure prediction, contact prediction, fragment generation, clustering, etc. For many years, secondary structure prediction has been the workhorse for numerous methods aimed at predicting protein structure and function. This paper presents a new mixed integer linear optimization (MILP)-based consensus method: a Consensus scheme based On a mixed integer liNear optimization method for seCOndary stRucture preDiction (CONCORD). Based on seven secondary structure prediction methods, SSpro, DSC, PROF, PROFphd, PSIPRED, Predator and GorIV, the MILP-based consensus method combines the strengths of different methods, maximizes the number of correctly predicted amino acids and achieves a better prediction accuracy. The method is shown to perform well compared with the seven individual methods when tested on the PDBselect25 training protein set using sixfold cross validation. It also performs well compared with another set of 10 online secondary structure prediction servers (including several recent ones) when tested on the CASP9 targets (http://predictioncenter.org/casp9/). The average Q3 prediction accuracy is 83.04 per cent for the sixfold cross validation of the PDBselect25 set and 82.3 per cent for the CASP9 targets. We have developed a MILP-based consensus method for protein secondary structure prediction. A web server, CONCORD, is available to the scientific community at http://helios.princeton.edu/CONCORD.
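The Q3 accuracy quoted in this abstract is the percentage of residues whose three-state label (helix, strand or coil) is predicted correctly. The sketch below shows that metric only; it does not reproduce the MILP consensus itself, and the function name and the H/E/C string encoding are assumptions made for illustration.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label matches.

    Both arguments are equal-length strings over an assumed alphabet:
    H (helix), E (strand), C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


if __name__ == "__main__":
    # Toy eight-residue example: seven of eight labels agree.
    print(q3_accuracy("HHHEECCC", "HHHEEECC"))  # prints 87.5
```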
Affiliation(s)
- Y. Wei
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- J. Thompson
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
11
Wei Y, Floudas CA. Enhanced Inter-helical Residue Contact Prediction in Transmembrane Proteins. Chem Eng Sci 2011; 66:4356-4369. [PMID: 21892227 PMCID: PMC3164537 DOI: 10.1016/j.ces.2011.04.033] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 12/31/2022]
Abstract
In this paper, we build on recent work by McAllister and Floudas, who developed a mathematical optimization model to predict the contacts in transmembrane alpha-helical proteins from a limited protein data set [1]. We have enhanced this method by 1) building a more comprehensive data set for transmembrane alpha-helical proteins, which is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction, 2) enhancing the mathematical model via modifications of several important physical constraints and 3) applying a new blind contact prediction scheme on different protein sets proposed from analyzing the contact prediction on 65 proteins from Fuchs et al. [2]. The blind contact prediction scheme has been tested on two different membrane protein sets. First, it is applied to five carefully selected proteins from the training set. The contact prediction of these five proteins uses probability sets built by excluding the target protein from the training set, and an average accuracy of 56% was obtained. Second, it is applied to six independent membrane proteins with complicated topologies, and the prediction accuracies are 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit [3]) and it is shown that it exhibits better prediction accuracy.
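The reported 60.7% average can be checked directly from the six per-protein accuracies listed in the abstract. The short sketch below does only that arithmetic; the variable names are illustrative.

```python
from statistics import mean

# Per-protein contact prediction accuracies (%) as listed in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}

# Unweighted mean over the six independent membrane proteins.
average = mean(accuracies.values())

if __name__ == "__main__":
    print(f"{average:.1f}%")  # prints 60.7%
```

The mean is unweighted, so the single low value for 3KCUA pulls the average well below the median of the six results.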
Affiliation(s)
- Y. Wei
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
12
Zhou H, Skolnick J. Improving threading algorithms for remote homology modeling by combining fragment and template comparisons. Proteins 2010; 78:2041-8. [PMID: 20455261 DOI: 10.1002/prot.22717] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 01/01/2023]
Abstract
In this work, we develop a method called fragment comparison and template comparison (FTCOM) for assessing the global quality of protein structural models for targets of medium and hard difficulty (remote homology) produced by structure prediction approaches such as threading or ab initio structure prediction. FTCOM requires the Cα coordinates of full-length models and assesses model quality based on fragment comparison and a score derived from comparison of the model to top threading templates. On a set of 361 medium/hard targets, FTCOM was assessed for its ability to improve on the results from the SP³, SPARKS, PROSPECTOR_3, and PRO-SP³-TASSER threading algorithms. The average TM-score improves by 5-10% for the first selected model by the new method over models obtained by the original selection procedure in the respective threading methods. Moreover, the number of foldable targets (TM-score ≥ 0.4) increases by at least 7.6% (for SP³) and by up to 54% (for SPARKS). Thus, FTCOM is a promising approach to template selection.
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
13
Wu S, Zhang Y. Recognizing protein substructure similarity using segmental threading. Structure 2010; 18:858-67. [PMID: 20637422 DOI: 10.1016/j.str.2010.04.007] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Received: 02/09/2010] [Revised: 04/02/2010] [Accepted: 04/03/2010] [Indexed: 11/15/2022]
Abstract
Protein template identification is essential to protein structure and function predictions. However, conventional whole-chain threading approaches often fail to recognize conserved substructure motifs when the target and templates do not share the same fold. We developed a new approach, SEGMER, for identifying protein substructure similarities by segmental threading. The target sequence is split into segments of two to four consecutive or nonconsecutive secondary structural elements, which are then threaded through PDB to identify appropriate substructure motifs. SEGMER is tested on 144 nonredundant hard proteins. When combined with whole-chain threading, the TM-score of alignments and accuracy of spatial restraints of SEGMER increase by 16% and 25%, respectively, compared with that by the whole-chain threading methods only. When tested on 12 free modeling targets from CASP8, SEGMER increases the TM-score and contact accuracy by 28% and 48%, respectively. This significant improvement should have important impact on protein structure modeling and functional inference.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA
14
Krivov GG, Shapovalov MV, Dunbrack RL. Improved prediction of protein side-chain conformations with SCWRL4. Proteins 2010; 77:778-95. [PMID: 19603484 DOI: 10.1002/prot.22488] [Citation(s) in RCA: 984] [Impact Index Per Article: 70.3] [Indexed: 11/06/2022]
Abstract
Determination of side-chain conformations is an important step in protein structure prediction and protein design. Many such methods have been presented, although only a small number are in widespread use. SCWRL is one such method, and the SCWRL3 program (2003) has remained popular because of its speed, accuracy, and ease-of-use for the purpose of homology modeling. However, higher accuracy at comparable speed is desirable. This has been achieved in a new program SCWRL4 through: (1) a new backbone-dependent rotamer library based on kernel density estimates; (2) averaging over samples of conformations about the positions in the rotamer library; (3) a fast anisotropic hydrogen bonding function; (4) a short-range, soft van der Waals atom-atom interaction potential; (5) fast collision detection using k-discrete oriented polytopes; (6) a tree decomposition algorithm to solve the combinatorial problem; and (7) optimization of all parameters by determining the interaction graph within the crystal environment using symmetry operators of the crystallographic space group. Accuracies as a function of electron density of the side chains demonstrate that side chains with higher electron density are easier to predict than those with low electron density and presumed conformational disorder. For a testing set of 379 proteins, 86% of χ1 angles and 75% of χ1+2 angles are predicted correctly within 40° of the X-ray positions. Among side chains with higher electron density (25-100th percentile), these numbers rise to 89 and 80%. The new program maintains its simple command-line interface, designed for homology modeling, and is now available as a dynamic-linked library for incorporation into other software programs.
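The accuracy criterion in this abstract counts a dihedral angle as correct when it lies within 40 degrees of the crystallographic value. A minimal sketch of that test is shown below. It handles the 360-degree periodicity of angles, treats the boundary as inclusive, and ignores symmetric side-chain flips; those choices, along with the function names and example angles, are assumptions rather than details from the paper.

```python
from typing import Sequence


def angular_difference(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def fraction_correct(
    predicted: Sequence[float],
    observed: Sequence[float],
    tolerance_deg: float = 40.0,
) -> float:
    """Fraction of predicted dihedrals within tolerance of the observed ones."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        raise ValueError("at least one angle is required")
    hits = sum(
        angular_difference(p, o) <= tolerance_deg
        for p, o in zip(predicted, observed)
    )
    return hits / len(predicted)


if __name__ == "__main__":
    # Hypothetical chi1 angles: the second pair differs by only 10 degrees
    # once wrap-around at +/-180 is taken into account.
    predicted_chi1 = [-60.0, 180.0, 60.0]
    observed_chi1 = [-65.0, -170.0, 175.0]
    print(fraction_correct(predicted_chi1, observed_chi1))
```

A χ1+2 figure would additionally require both the χ1 and the χ2 angle of a residue to pass this test.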
Affiliation(s)
- Georgii G Krivov
- Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, Pennsylvania 19111, USA
15
Cozzetto D, Kryshtafovych A, Fidelis K, Moult J, Rost B, Tramontano A. Evaluation of template-based models in CASP8 with standard measures. Proteins 2010; 77 Suppl 9:18-28. [PMID: 19731382 DOI: 10.1002/prot.22561] [Citation(s) in RCA: 108] [Impact Index Per Article: 7.7] [Indexed: 12/27/2022]
Abstract
The strategy for evaluating template-based models submitted to CASP has continuously evolved from CASP1 to CASP5, leading to a standard procedure that has been used in all subsequent editions. The established approach includes methods for calculating the quality of each individual model, for assigning scores based on the distribution of the results for each target and for computing the statistical significance of the differences in scores between prediction methods. These data are made available to the assessor of the template-based modeling category, who uses them as a starting point for further evaluations and analyses. This article describes the detailed workflow of the procedure, provides justifications for a number of choices that are customarily made for CASP data evaluation, and reports the results of the analysis of template-based predictions at CASP8.
Affiliation(s)
- Domenico Cozzetto
- Department of Biochemical Sciences, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland, Rockville, Maryland 20850
- Burkhard Rost
- Department of Biochemistry and Molecular Biophysics, Columbia University, Northeast Structural Genomics Consortium (NESG) and New York Consortium on Membrane Proteins (NYCOMPS), Columbia University, New York, New York 10032
- Anna Tramontano
- Department of Biochemical Sciences, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy; Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza-University of Rome, P. le A. Moro, 5, 00185 Rome, Italy
|
16
|
Keedy DA, Williams CJ, Headd JJ, Arendall WB, Chen VB, Kapral GJ, Gillespie RA, Block JN, Zemla A, Richardson DC, Richardson JS. The other 90% of the protein: assessment beyond the Calphas for CASP8 template-based and high-accuracy models. Proteins 2010; 77 Suppl 9:29-49. [PMID: 19731372 DOI: 10.1002/prot.22551] [Citation(s) in RCA: 71] [Impact Index Per Article: 5.1] [Indexed: 12/27/2022]
Abstract
For template-based modeling in the CASP8 Critical Assessment of Techniques for Protein Structure Prediction, this work develops and applies six new full-model metrics. They are designed to complement and add value to the traditional template-based assessment by the global distance test (GDT) and related scores (based on multiple superpositions of Calpha atoms between target structure and predictions labeled "Model 1"). The new metrics evaluate each predictor group on each target, using all atoms of their best model with above-average GDT. Two metrics evaluate how "protein-like" the predicted model is: the MolProbity score used for validating experimental structures, and a mainchain reality score using all-atom steric clashes, bond length and angle outliers, and backbone dihedrals. Four other new metrics evaluate match of model to target for mainchain and sidechain hydrogen bonds, sidechain end positioning, and sidechain rotamers. Group-average Z-score across the six full-model measures is averaged with group-average GDT Z-score to produce the overall ranking for full-model, high-accuracy performance. Separate assessments are reported for specific aspects of predictor-group performance, such as robustness of approximately correct template or fold identification, and self-scoring ability at identifying the best of their models. Fold identification is distinct from but correlated with group-average GDT Z-score if target difficulty is taken into account, whereas self-scoring is done best by servers and is uncorrelated with GDT performance. Outstanding individual models on specific targets are identified and discussed. Predictor groups excelled at different aspects, highlighting the diversity of current methodologies. However, good full-model scores correlate robustly with high Calpha accuracy.
Affiliation(s)
- Daniel A Keedy
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina 27710, USA
|
17
|
Hvidsten TR, Kryshtafovych A, Fidelis K. Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins 2009; 75:870-84. [PMID: 19025980 DOI: 10.1002/prot.22296] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to sequence is a major bottleneck compared with single fragment methods, the advantage of the current approach is that correct alignments imply correct long range distance constraints. 
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
|
18
|
Tkaczuk KL. Trm13p, the tRNA:Xm4 modification enzyme from Saccharomyces cerevisiae is a member of the Rossmann-fold MTase superfamily: prediction of structure and active site. J Mol Model 2009; 16:599-606. [PMID: 19697067 DOI: 10.1007/s00894-009-0570-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/31/2009] [Accepted: 07/28/2009] [Indexed: 01/09/2023]
Abstract
2'-O-ribose methylation is one of the most common posttranscriptional modifications in RNA. Methylations at different positions are introduced by enzymes from at least two unrelated superfamilies. Recently, a new family of eukaryotic RNA methyltransferases (MTases) has been identified, and its representative from yeast (Yol125w, renamed as Trm13p) has been shown to 2'-O-methylate position 4 of tRNA. Trm13 is conserved in Eukaryota, but exhibits no sequence similarity to other known MTases. Here, I present the results of bioinformatics analysis which suggest that Trm13 is a strongly diverged member of the Rossmann-fold MTase (RFM) superfamily, and therefore is evolutionarily related to 2'-O-MTases such as Trm7 and fibrillarin. However, the character of conserved residues in the predicted active site of the Trm13 family suggests it may use a different mechanism of ribose methylation than its relatives. A molecular model of the Trm13p structure has been constructed and evaluated for potential accuracy using model quality assessment methods. The predicted structure will facilitate experimental analyses of the Trm13p mechanism of action.
|
19
|
Sadreyev RI, Shi S, Baker D, Grishin NV. Structure similarity measure with penalty for close non-equivalent residues. Bioinformatics 2009; 25:1259-63. [PMID: 19321733 PMCID: PMC2677741 DOI: 10.1093/bioinformatics/btp148] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/17/2022]
Abstract
Motivation: Recent improvement in homology-based structure modeling emphasizes the importance of sensitive evaluation measures that help identify and correct modest distortions in models compared with the target structures. Global Distance Test Total Score (GDT_TS), otherwise a very powerful and effective measure for model evaluation, is still insensitive to and can even reward such distortions, as observed for remote homology modeling in the latest CASP8 (Critical Assessment of Techniques for Protein Structure Prediction). Results: We develop a new measure that balances GDT_TS reward for the closeness of equivalent model and target residues (‘attraction’ term) with the penalty for the closeness of non-equivalent residues (‘repulsion’ term). Compared with GDT_TS, the resulting score, TR (total score with repulsion), is much more sensitive to structure compression both in real remote homologs and in CASP models. TR is correlated with, yet different from, other measures of structure similarity. The largest difference from GDT_TS is observed in models of mid-range quality based on remote homology modeling. Availability: The script for TR calculation is included in Supplementary Material. TR scores for all server models in CASP8 are available at http://prodata.swmed.edu/CASP8. Contact: grishin@chop.swmed.edu. Supplementary information: All scripts and numerical data are available for download at ftp://iole.swmed.edu/pub/tr_score/
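For orientation, the "attraction" part that TR builds on, GDT_TS, can be sketched in a few lines of Python. This is a simplified illustration rather than the authors' script: it assumes the per-residue Cα distances come from a single, already fixed superposition, whereas real GDT implementations search many superpositions to maximize each fraction. The function name and arguments are hypothetical. The repulsion term of TR is not sketched because its exact form is not given in the abstract.

```python
def gdt_ts(distances, target_length=None, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances: Calpha-Calpha distances (in angstroms) between equivalent
    model and target residues under one fixed superposition (assumption).
    target_length: number of residues in the target; defaults to the number
    of distances supplied.
    """
    n = target_length if target_length is not None else len(distances)
    if n <= 0:
        raise ValueError("target length must be positive")
    # Fraction of target residues placed within each distance cutoff.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)
```

Because only equivalent residue pairs enter the score, a compressed model that pulls every residue toward the center is not penalized here, which is the insensitivity the abstract describes.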
Affiliation(s)
- Ruslan I Sadreyev
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
|
20
|
Orlowski J, Mebrhatu MT, Michiels CW, Bujnicki JM, Aertsen A. Mutational analysis and a structural model of methyl-directed restriction enzyme Mrr. Biochem Biophys Res Commun 2008; 377:862-6. [DOI: 10.1016/j.bbrc.2008.10.064] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 10/14/2008] [Accepted: 10/15/2008] [Indexed: 11/29/2022]
|
21
|
Wu S, Zhang Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 2008; 72:547-56. [PMID: 18247410 DOI: 10.1002/prot.21945] [Citation(s) in RCA: 310] [Impact Index Per Article: 19.4] [Indexed: 11/09/2022]
Abstract
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 × 10^-13, which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
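The TM-score threshold of 0.5 used in this abstract refers to the commonly cited definition: the mean over target residues of 1 / (1 + (d_i / d0)^2), with d0 = 1.24 (L - 15)^(1/3) - 1.8 for a target of length L. The Python sketch below evaluates that formula for distances from one fixed superposition. It is an illustration only, not the MUSTER or TM-score program: the real score is maximized over superpositions, and the lower bound of 0.5 Å on d0 for short chains is an assumption made here to keep the formula well defined.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (angstroms) of the TM-score."""
    if target_length <= 15:
        # Assumed floor for very short chains, where the cube root is undefined.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length):
    """TM-score for one fixed superposition (no search over superpositions).

    distances: distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the target; unaligned residues
    contribute zero, so the sum is divided by the full target length.
    """
    if target_length <= 0:
        raise ValueError("target length must be positive")
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length
```

Normalizing by the target length rather than the alignment length is what lets a single threshold such as 0.5 be compared across proteins of different sizes.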
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, Kansas 66047, USA
|
22
|
SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences. BMC Bioinformatics 2008; 9:226. [PMID: 18452616 PMCID: PMC2391167 DOI: 10.1186/1471-2105-9-226] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.4] [Received: 10/31/2007] [Accepted: 05/01/2008] [Indexed: 11/16/2022]
Abstract
Background: Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results: SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion: The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.
|
23
|
Poleksic A, Fienup M. Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms. Bioinformatics 2008; 24:1145-53. [PMID: 18337259 DOI: 10.1093/bioinformatics/btn097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Motivation: Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods are often able to detect extremely weak relationships between protein sequences, as evidenced by the large-scale benchmarking experiments such as CASP and LiveBench. Results: We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, which is defined as the average number of different residue types observed at the profile's positions. We also demonstrate that improvements in the sensitivity of a profile-profile method can be made by incorporating a profile-dependent scoring scheme, such as position-specific background frequencies. The techniques presented in this article are implemented in an alignment algorithm UNI-FOLD. When tested against other well-established methods for fold recognition, UNI-FOLD shows increased sensitivity and specificity in detecting remote relationships between protein sequences. Availability: UNI-FOLD web server can be accessed at http://blackhawk.cs.uni.edu
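The profile size defined in this abstract (the average number of different residue types observed per profile position) can be computed directly from a multiple sequence alignment. The Python sketch below assumes the profile is represented as a list of equal-length aligned sequences with `-` or `.` as gap characters. That representation and the function name are assumptions for illustration, and UNI-FOLD may derive the observed residue types differently (for example from thresholded frequencies).

```python
def profile_size(aligned_sequences, gap_chars="-."):
    """Average number of distinct residue types per alignment column.

    aligned_sequences: equal-length strings from a multiple sequence
    alignment (assumed input format); gap characters are ignored.
    """
    if not aligned_sequences or not aligned_sequences[0]:
        raise ValueError("alignment must contain at least one non-empty sequence")
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    distinct_counts = []
    for column in zip(*aligned_sequences):
        # Set of residue types seen in this column, gaps excluded.
        residues = {c.upper() for c in column if c not in gap_chars}
        distinct_counts.append(len(residues))
    return sum(distinct_counts) / len(distinct_counts)
```

Under this definition a profile built from a single sequence has size 1.0, and the value grows as more divergent homologs are added to the alignment.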
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.
|
24
|
Cozzetto D, Kryshtafovych A, Ceriani M, Tramontano A. Assessment of predictions in the model quality assessment category. Proteins 2008; 69 Suppl 8:175-83. [PMID: 17680695 DOI: 10.1002/prot.21669] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.9] [Indexed: 01/23/2023]
Abstract
The article presents our evaluation of the predictions submitted to the model quality assessment (QA) category in CASP7. In this newly introduced category, predictors were asked to provide quality estimates for protein structure models. The QA category uses the automatically produced models that are traditionally distributed to CASP participants as input for predictions. Predictors were asked to provide an index of the quality of these individual models (QM1) as well as an index for the expected correctness of each of their residues (QM2). We computed the correlation between the observed and predicted quality of the models and of the individual residues achieved by the participating groups and evaluated the statistical significance of the differences. We also compared the results with those obtained by a "naïve predictor" that assigns a quality score related to how close the model is to the structure of the most similar protein of known structure. The aims of a method for assessing the overall quality of a model can be twofold: selecting the best (or one of the best) model(s) among a set of plausible choices, or assigning a nonrelative quality value to an individual model. The applications of the two strategies are different, albeit equally important. Our assessment of the QA category demonstrates that methods for addressing the first task effectively do exist, while there is room for improvement as far as the second aspect is concerned. Notwithstanding the limited number of groups submitting predictions for residue-level accuracy, our data demonstrate that a respectable accuracy in this task can be achieved by methods relying on the comparison of different models for the same target.
Affiliation(s)
- Domenico Cozzetto
- Department of Biochemical Sciences, University of Rome La Sapienza, P. le A. Moro, 00185 Rome, Italy
|
25
|
Kopp J, Bordoli L, Battey JND, Kiefer F, Schwede T. Assessment of CASP7 predictions for template-based modeling targets. Proteins 2008; 69 Suppl 8:38-56. [PMID: 17894352 DOI: 10.1002/prot.21753] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.4] [Indexed: 11/11/2022]
Abstract
This manuscript presents the assessment of the template-based modeling category of the seventh Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The accuracy of predicted protein models for 108 target domains was assessed based on a detailed comparison between the experimental and predicted structures. The assessment was performed using numerical measures for backbone and structural alignment accuracy, and by scoring correctly modeled hydrogen bond interactions in the predictions. Based on these criteria, our statistical analysis identified a number of groups whose predictions were on average significantly more accurate. Furthermore, the predictions for six target proteins were evaluated for the accuracy of their modeled cofactor binding sites. We also assessed the ability of predictors to improve over the best available single template structure, which showed that the best groups produced models closer to the target structure than the best single template for a significant number of targets. In addition, we assessed the accuracy of the error estimates (local confidence values) assigned to predictions on a per residue basis. Finally, we discuss some general conclusions about the state of the art of template-based modeling methods and their usefulness for practical applications.
Affiliation(s)
- Jürgen Kopp
- Biozentrum, University of Basel, Switzerland
|
26
|
Yang J. Comprehensive description of protein structures using protein folding shape code. Proteins 2008; 71:1497-518. [DOI: 10.1002/prot.21932] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/09/2022]
|
27
|
A historical perspective of template-based protein structure prediction. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
This chapter presents a broad and a historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
|
28
|
|
29
|
Kosinski J, Kubareva E, Bujnicki JM. A model of restriction endonuclease MvaI in complex with DNA: a template for interpretation of experimental data and a guide for specificity engineering. Proteins 2007; 68:324-36. [PMID: 17407166 DOI: 10.1002/prot.21460] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 01/07/2023]
Abstract
R.MvaI is a Type II restriction enzyme (REase), which specifically recognizes the pentanucleotide DNA sequence 5'-CCWGG-3' (W indicates A or T). It belongs to a family of enzymes, which recognize related sequences, including 5'-CCSGG-3' (S indicates G or C) in the case of R.BcnI, or 5'-CCNGG-3' (where N indicates any nucleoside) in the case of R.ScrFI. REases from this family hydrolyze the phosphodiester bond in the DNA between the 2nd and 3rd base in both strands, thereby generating a double strand break with 5'-protruding single nucleotides. So far, no crystal structures of REases with similar cleavage patterns have been solved. Characterization of sequence-structure-function relationships in this family would facilitate understanding of evolution of sequence specificity among REases and could aid in engineering of enzymes with new specificities. However, sequences of R.MvaI or its homologs show no significant similarity to any proteins with known structures, thus precluding straightforward comparative modeling. We used a fold recognition approach to identify a remote relationship between R.MvaI and the structure of DNA repair enzyme MutH, which belongs to the PD-(D/E)XK superfamily together with many other REases. We constructed a homology model of R.MvaI and used it to predict functionally important amino acid residues and the mode of interaction with the DNA. In particular, we predict that only one active site of R.MvaI interacts with the DNA target at a time, and the cleavage of both strands (5'-CCAGG-3' and 5'-CCTGG-3') is achieved by two independent catalytic events. The model is in good agreement with the available experimental data and will serve as a template for further analyses of R.MvaI, R.BcnI, R.ScrFI and other related enzymes.
Affiliation(s)
- Jan Kosinski
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
|
30
|
Yan A, Kloczkowski A, Hofmann H, Jernigan RL. Prediction of side chain orientations in proteins by statistical machine learning methods. J Biomol Struct Dyn 2007; 25:275-88. [PMID: 17937489 DOI: 10.1080/07391102.2007.10507176] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
Abstract
We develop ways to predict the side chain orientations of residues within a protein structure by using several different statistical machine learning methods. Here the side chain orientation of a given residue i is measured by the angle Ω(i) between the vector pointing from the center of the protein structure to the Cα atom of residue i and the vector pointing from that Cα atom to the center of its side chain atoms. To predict the Ω(i) angles, we construct statistical models by using several different methods such as general linear regression, a regression tree and bagging, a neural network, and a support vector machine. The root mean square errors for the different models range only from 36.67 to 37.60 degrees and the correlation coefficients are all between 30% and 34%. The performances of different models in the test set are, thus, quite similar, and show the relative predictive power of these models to be significant in comparison with random side chain orientations.
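The orientation angle defined in this abstract is a plain angle between two vectors, which the following Python sketch computes from 3D coordinates. The function names are hypothetical, and how the authors define the "center of the protein structure" is not stated in the abstract, so the center is passed in explicitly; the side chain center is taken here as the unweighted centroid of the supplied atoms, which is also an assumption.

```python
import math


def centroid(points):
    """Unweighted centroid of a non-empty list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(3))


def omega_angle(protein_center, ca, side_chain_atoms):
    """Angle in degrees between (protein center -> Calpha) and
    (Calpha -> side chain centroid).

    Near 0 means the side chain points away from the protein center;
    near 180 means it points toward the center.
    """
    sc_center = centroid(side_chain_atoms)
    v1 = tuple(ca[k] - protein_center[k] for k in range(3))
    v2 = tuple(sc_center[k] - ca[k] for k in range(3))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("angle is undefined for a zero-length vector")
    cosine = sum(a * b for a, b in zip(v1, v2)) / (norm1 * norm2)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))
```

For a sense of scale, the reported root mean square errors of about 37 degrees are measured on this 0 to 180 degree range.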
Affiliation(s)
- Aimin Yan
- Laurence H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, Iowa, USA
|
31
|
Chen K, Kurgan L. PFRES: protein fold classification by using evolutionary information and predicted secondary structure. Bioinformatics 2007; 23:2843-50. [DOI: 10.1093/bioinformatics/btm475] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.4] [Indexed: 11/14/2022]
|
32
|
Maravić Vlahovicek G, Cubrilo S, Tkaczuk KL, Bujnicki JM. Modeling and experimental analyses reveal a two-domain structure and amino acids important for the activity of aminoglycoside resistance methyltransferase Sgm. Biochimica et Biophysica Acta - Proteins and Proteomics 2007; 1784:582-90. [PMID: 18343347 DOI: 10.1016/j.bbapap.2007.09.009] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Received: 07/31/2007] [Revised: 09/18/2007] [Accepted: 09/19/2007] [Indexed: 12/19/2022]
Abstract
Methyltransferases that carry out posttranscriptional N7-methylation of G1405 in 16S rRNA confer bacterial resistance to aminoglycoside antibiotics, including kanamycin and gentamicin. Genes encoding enzymes from this family (hereafter referred to as Arm, for aminoglycoside resistance methyltransferases) have been recently found to spread by horizontal gene transfer between various human pathogens. The knowledge of the Arm protein structure would lay the groundwork for the development of potential resistance inhibitors, which could be used to restore the potential of aminoglycosides to act against the resistant pathogens. We analyzed the sequence-function relationships of Sgm MTase, a member of the Arm family, by limited proteolysis and site-directed and random mutagenesis. We also modeled the structure of Sgm using bioinformatics techniques and used the model to provide a structural context for experimental results. We found that Sgm comprises two domains and we characterized a number of functionally compromised point mutants with substitutions of invariant or conserved residues. Our study provides a low-resolution (residue-level) model of sequence-structure-function relationships in the Arm family of enzymes and reveals the cofactor-binding and substrate-binding sites. These functional regions will be prime targets for further experimental and theoretical studies aimed at defining the reaction mechanism of m7 G1405 methylation, increasing the resolution of the model and developing Arm-specific inhibitors.
Affiliation(s)
- Gordana Maravić Vlahovicek
- Department of Biochemistry and Molecular Biology, Faculty of Pharmacy and Biochemistry, University of Zagreb, Ante Kovacića 1, 10000 Zagreb, Croatia.
|
33
|
Abstract
This review presents the advances in protein structure prediction from the computational methods perspective. The approaches are classified into four major categories: comparative modeling, fold recognition, first principles methods that employ database information, and first principles methods without database information. Important advances along with current limitations and challenges are presented.
Affiliation(s)
- C A Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
|
34
|
Structural and evolutionary bioinformatics of the SPOUT superfamily of methyltransferases. BMC Bioinformatics 2007; 8:73. [PMID: 17338813 PMCID: PMC1829167 DOI: 10.1186/1471-2105-8-73] [Citation(s) in RCA: 128] [Impact Index Per Article: 7.5] [Received: 11/20/2006] [Accepted: 03/05/2007] [Indexed: 11/29/2022]
Abstract
Background: SPOUT methyltransferases (MTases) are a large class of S-adenosyl-L-methionine-dependent enzymes that exhibit an unusual alpha/beta fold with a very deep topological knot. In 2001, when no crystal structures were available for any of these proteins, Anantharaman, Koonin, and Aravind identified homology between SpoU and TrmD MTases and defined the SPOUT superfamily. Since then, multiple crystal structures of knotted MTases have been solved and numerous new homologous sequences appeared in the databases. However, no comprehensive comparative analysis of these proteins has been carried out to classify them based on structural and evolutionary criteria and to guide functional predictions. Results: We carried out extensive searches of databases of protein structures and sequences to collect all members of previously identified SPOUT MTases, and to identify previously unknown homologs. Based on sequence clustering, characterization of domain architecture, structure predictions and sequence/structure comparisons, we re-defined families within the SPOUT superfamily and predicted putative active sites and biochemical functions for the so far uncharacterized members. We have also delineated the common core of SPOUT MTases and inferred a multiple sequence alignment for the conserved knot region, from which we calculated the phylogenetic tree of the superfamily. We have also studied phylogenetic distribution of different families, and used this information to infer the evolutionary history of the SPOUT superfamily. Conclusion: We present the first phylogenetic tree of the SPOUT superfamily since it was defined, together with a new scheme for its classification, and discussion about conservation of sequence and structure in different families, and their functional implications. We identified four protein families as new members of the SPOUT superfamily. 
Three of these families are functionally uncharacterized (COG1772, COG1901, and COG4080), and one (COG1756 represented by Nep1p) has been already implicated in RNA metabolism, but its biochemical function has been unknown. Based on the inference of orthologous and paralogous relationships between all SPOUT families we propose that the Last Universal Common Ancestor (LUCA) of all extant organisms contained at least three SPOUT members, ancestors of contemporary RNA MTases that carry out m1G, m3U, and 2'O-ribose methylation, respectively. In this work we also speculate on the origin of the knot and propose possible 'unknotted' ancestors. The results of our analysis provide a comprehensive 'roadmap' for experimental characterization of SPOUT MTases and interpretation of functional studies in the light of sequence-structure relationships.
|
35
|
Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN, Obradovic Z, Dunker AK. DisProt: the Database of Disordered Proteins. Nucleic Acids Res 2006; 35:D786-93. [PMID: 17145717 PMCID: PMC1751543 DOI: 10.1093/nar/gkl893] [Citation(s) in RCA: 616] [Impact Index Per Article: 34.2] [Indexed: 11/13/2022]
Abstract
The Database of Protein Disorder (DisProt) links structure and function information for intrinsically disordered proteins (IDPs). Intrinsically disordered proteins do not form a fixed three-dimensional structure under physiological conditions, either in their entireties or in segments or regions. We define an IDP as a protein that contains at least one experimentally determined disordered region. Although they lack fixed structure, IDPs and disordered regions carry out important biological functions and are typically involved in regulation, signaling and control. Such functions can involve high-specificity low-affinity interactions, the multiple binding of one protein to many partners and the multiple binding of many proteins to one partner. These three features are all enabled and enhanced by protein intrinsic disorder. One of the major hindrances in the study of IDPs has been the lack of organized information. DisProt was developed to enable IDP research by collecting and organizing knowledge regarding the experimental characterization and the functional associations of IDPs. In addition to being a unique source of biological information, DisProt opens doors for a plethora of bioinformatics studies. DisProt is openly available at .
Affiliation(s)
- Megan Sickmeier
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Justin A. Hamilton
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Tanguy LeGall
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Vladimir Vacic
- Computer Science and Engineering, University of California Riverside, Riverside, CA 92521, USA
- Marc S. Cortese
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Agnes Tantos
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
- Beata Szabo
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
- Peter Tompa
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest, Hungary
- Jake Chen
- School of Informatics, Indiana University, Indianapolis, IN 46202, USA
- Vladimir N. Uversky
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Institute for Biological Instrumentation, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia
- Zoran Obradovic
- Center for Information Science and Technology, Temple University, Philadelphia, PA 19122, USA
- A. Keith Dunker
- Department of Biochemistry and Molecular Biology, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- School of Informatics, Indiana University, Indianapolis, IN 46202, USA
- To whom correspondence should be addressed. Tel: +1 317 278 9650; Fax: +1 317 278 9217;
|
36
|
Zhi D, Krishna SS, Cao H, Pevzner P, Godzik A. Representing and comparing protein structures as paths in three-dimensional space. BMC Bioinformatics 2006; 7:460. [PMID: 17052359 PMCID: PMC1626488 DOI: 10.1186/1471-2105-7-460] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/18/2006] [Accepted: 10/20/2006] [Indexed: 11/10/2022]
Abstract
BACKGROUND Most existing formulations of protein structure comparison are based on detailed atomic level descriptions of protein structures and bypass potential insights that arise from a higher-level abstraction. RESULTS We propose a structure comparison approach based on a simplified representation of a protein that describes its three-dimensional path by local curvature along the generalized backbone of the polypeptide. We have implemented a dynamic programming procedure that aligns the curvatures of proteins by optimizing a defined sum turning angle deviation measure. CONCLUSION Although our procedure does not directly optimize global structural similarity as measured by RMSD, our benchmarking results indicate that it recovers surprisingly well the structural similarity defined by structure classification databases and traditional structure alignment programs. In addition, our program can recognize similarities between structures with extensive conformational changes that are beyond the ability of traditional structure alignment programs. We demonstrate applications of our procedure in several contexts of structure comparison. An implementation of our procedure, CURVE, is available as a public webserver.
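The idea of aligning proteins by local curvature can be illustrated with a minimal sketch. It is not the CURVE implementation: the abstract does not specify the turning-angle deviation measure, so the code below assumes the simplest choice (sum of absolute differences between turning angles, with a linear gap penalty), and the function names `turning_angles` and `align_cost` are hypothetical.

```python
import math

def turning_angles(coords):
    """Turning angle (radians) at each interior point of a 3D path.

    Assumes consecutive points are distinct; a zero-length segment
    would cause a division by zero.
    """
    angles = []
    for a, b, c in zip(coords, coords[1:], coords[2:]):
        u = [b[i] - a[i] for i in range(3)]  # incoming segment
        v = [c[i] - b[i] for i in range(3)]  # outgoing segment
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
        cos_t = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
        angles.append(math.acos(cos_t))
    return angles

def align_cost(p, q, gap=1.0):
    """Minimum summed |angle difference| over global alignments of p and q.

    Assumed cost model (not taken from the paper): aligned positions
    cost |p[i] - q[j]|, and each gap position costs `gap`.
    """
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + abs(p[i - 1] - q[j - 1]),  # align i with j
                dp[i - 1][j] + gap,                            # gap in q
                dp[i][j - 1] + gap,                            # gap in p
            )
    return dp[n][m]

straight = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
zigzag = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (2, 1, 0)]
cost = align_cost(turning_angles(straight), turning_angles(zigzag))
```

Because turning angles depend only on the directions of successive segments, this representation is unchanged by translating or rotating the structure, which is why no superposition step appears in the sketch.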
Affiliation(s)
- Degui Zhi
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720-3102, USA
- S Sri Krishna
- Joint Center for Structural Genomics, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Haibo Cao
- Bioinformatics Program, Infectious and Inflammation Disease Center, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Pavel Pevzner
- Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114, USA
- Adam Godzik
- Joint Center for Structural Genomics, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
- Bioinformatics Program, Infectious and Inflammation Disease Center, Burnham Institute for Medical Research, La Jolla, CA 92037, USA
|
37
|
Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res 2006; 34:e112. [PMID: 16971460 PMCID: PMC1635247 DOI: 10.1093/nar/gkl480] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022]
Abstract
The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.
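The ensemble-generation step can be sketched as follows. This is a toy illustration under stated assumptions, not K*Sync itself: the scoring function here combines only two hypothetical features (residue identity and secondary-structure agreement), the weight grid and gap penalty are arbitrary, and the names `align` and `alignment_ensemble` are invented for the example.

```python
from itertools import product

def align(seq_a, seq_b, ss_a, ss_b, w_seq, w_ss, gap=-1.0):
    """Global alignment maximizing a weighted sum of two per-position features.

    ss_a and ss_b are secondary-structure strings of the same length as
    seq_a and seq_b. Returns the aligned index pairs as a tuple.
    """
    n, m = len(seq_a), len(seq_b)

    def s(i, j):
        # Hypothetical features: residue identity and secondary-structure match.
        return w_seq * (seq_a[i] == seq_b[j]) + w_ss * (ss_a[i] == ss_b[j])

    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + s(i - 1, j - 1),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back one optimal path, preferring matches over gaps on ties.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + s(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return tuple(reversed(pairs))

def alignment_ensemble(seq_a, seq_b, ss_a, ss_b, weights=(0.0, 0.5, 1.0, 2.0)):
    """Map each distinct alignment to the weight settings that produced it."""
    ensemble = {}
    for w_seq, w_ss in product(weights, repeat=2):
        aln = align(seq_a, seq_b, ss_a, ss_b, w_seq, w_ss)
        ensemble.setdefault(aln, []).append((w_seq, w_ss))
    return ensemble

ens = alignment_ensemble("ACDEFG", "ACEFGH", "HHHEEE", "HHEEEC")
```

Each key of `ens` is an alignment that is optimal under at least one weight setting; the number of weight settings mapped to a key gives a crude consensus signal of the kind the abstract mentions as one model-selection approach.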
Affiliation(s)
- Dylan Chivian
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- To whom correspondence should be addressed at Department of Biochemistry and HHMI, University of Washington, Box 357350, Seattle, WA 98195, USA. Tel: +1 206 543 1295; Fax: +1 206 685 1792;
|
38
|
Debe DA, Danzer JF, Goddard WA, Poleksic A. STRUCTFAST: Protein sequence remote homology detection and alignment using novel dynamic programming and profile-profile scoring. Proteins 2006; 64:960-7. [PMID: 16786595 DOI: 10.1002/prot.21049] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
STRUCTFAST is a novel profile-profile alignment algorithm capable of detecting weak similarities between protein sequences. The increased sensitivity and accuracy of the STRUCTFAST method are achieved through several unique features. First, the algorithm utilizes a novel dynamic programming engine capable of incorporating important information from a structural family directly into the alignment process. Second, the algorithm employs a rigorous analytical formula for profile-profile scoring to overcome the limitations of ad hoc scoring functions that require adjustable parameter training. Third, the algorithm employs Convergent Island Statistics (CIS) to compute the statistical significance of alignment scores independently for each pair of sequences. STRUCTFAST routinely produces alignments that meet or exceed the quality obtained by an expert human homology modeler, as evidenced by its performance in the latest CAFASP4 and CASP6 blind prediction benchmark experiments.
Affiliation(s)
- Derek A Debe
- Eidogen-Sertanty Inc., San Diego, California 92121, USA
|
39
|
Dunbrack RL. Sequence comparison and protein structure prediction. Curr Opin Struct Biol 2006; 16:374-84. [PMID: 16713709 DOI: 10.1016/j.sbi.2006.05.006] [Citation(s) in RCA: 119] [Impact Index Per Article: 6.6] [Received: 02/23/2006] [Revised: 03/22/2006] [Accepted: 05/08/2006] [Indexed: 10/24/2022]
Abstract
Sequence comparison is a major step in the prediction of protein structure from existing templates in the Protein Data Bank. The identification of potentially remote homologues to be used as templates for modeling target sequences of unknown structure and their accurate alignment remain challenges, despite many years of study. The most recent advances have been in combining as many sources of information as possible, including amino acid variation in the form of profiles or hidden Markov models for both the target and template families; known and predicted secondary structures of the template and target, respectively; the combination of structure alignment for distant homologues and sequence alignment for close homologues to build better profiles; and the anchoring of certain regions of the alignment based on existing biological data. Newer technologies have been applied to the problem, including the use of support vector machines to tackle the fold classification problem for a target sequence and the alignment of hidden Markov models. Finally, using the consensus of many fold recognition methods, whether based on profile-profile alignments, threading or other approaches, continues to be one of the most successful strategies for both recognition and alignment of remote homologues. Although there is still room for improvement in identification and alignment methods, additional progress may come from model building and refinement methods that can compensate for large structural changes between remotely related targets and templates, as well as for regions of misalignment.
Affiliation(s)
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA.
|