301
Nnakwe CC, Altaf M, Côté J, Kron SJ. Dissection of Rad9 BRCT domain function in the mitotic checkpoint response to telomere uncapping. DNA Repair (Amst) 2009; 8:1452-61. [PMID: 19880356 DOI: 10.1016/j.dnarep.2009.09.010] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/29/2009] [Revised: 08/27/2009] [Accepted: 09/21/2009] [Indexed: 11/29/2022]
Abstract
In Saccharomyces cerevisiae, destabilizing telomeres, via inactivation of telomeric repeat binding factor Cdc13, induces a cell cycle checkpoint that arrests cells at the metaphase to anaphase transition--much like the response to an unrepaired DNA double strand break (DSB). Throughout the cell cycle, the multi-domain adaptor protein Rad9 is required for the activation of checkpoint effector kinase Rad53 in response to DSBs and is similarly necessary for checkpoint signaling in response to telomere uncapping. Rad53 activation in G1 and S phase depends on Rad9 association with modified chromatin adjacent to DSBs, which is mediated by Tudor domains binding histone H3 di-methylated at K79 and BRCT domains to histone H2A phosphorylated at S129. Nonetheless, Rad9 Tudor or BRCT mutants can initiate a checkpoint response to DNA damage in nocodazole-treated cells. Mutations affecting di-methylation of H3 K79, or its recognition by Rad9 enhance 5' strand resection upon telomere uncapping, and potentially implicate Rad9 chromatin binding in the checkpoint response to telomere uncapping. Indeed, we report that Rad9 binds to sub-telomeric chromatin, upon telomere uncapping, up to 10 kb from the telomere. Rad9 binding occurred within 30 min after inactivating Cdc13, preceding Rad53 phosphorylation. In turn, Rad9 Tudor and BRCT domain mutations blocked chromatin binding and led to attenuated checkpoint signaling as evidenced by decreased Rad53 phosphorylation and impaired cell cycle arrest. Our work identifies a role for Rad9 chromatin association, during mitosis, in the DNA damage checkpoint response to telomere uncapping, suggesting that chromatin binding may be an initiating event for checkpoints throughout the cell cycle.
Affiliation(s)
- Chinonye C Nnakwe
- Department of Pathology, The University of Chicago, Chicago, IL 60637, USA
302
303
Botelho HM, Leal SS, Veith A, Prosinecki V, Bauer C, Fröhlich R, Kletzin A, Gomes CM. Role of a novel disulfide bridge within the all-beta fold of soluble Rieske proteins. J Biol Inorg Chem 2009; 15:271-81. [PMID: 19862563 DOI: 10.1007/s00775-009-0596-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 07/20/2009] [Accepted: 10/04/2009] [Indexed: 11/25/2022]
Abstract
Rieske proteins and Rieske ferredoxins are present in the three domains of life and are involved in a variety of cellular processes. Despite their functional diversity, these small Fe-S proteins contain a highly conserved all-beta fold, which harbors a [2Fe-2S] Rieske center. We have identified a novel subtype of Rieske ferredoxins present in hyperthermophilic archaea, in which a two-cysteine conserved SKTPCX(2-3)C motif is found at the C-terminus. We establish that in the Acidianus ambivalens representative, Rieske ferredoxin 2 (RFd2), these cysteines form a novel disulfide bond within the Rieske fold, which can be selectively broken under mild reducing conditions insufficient to reduce the [2Fe-2S] cluster or affect the secondary structure of the protein, as shown by visible circular dichroism, absorption, and attenuated total reflection Fourier transform IR spectroscopies. RFd2 presents all the EPR, visible absorption, and visible circular dichroism spectroscopic features of the [2Fe-2S] Rieske center. The cluster has a redox potential of +48 mV (25 °C and pH 7) and a pKa of 10.1 ± 0.2. These shift to +77 mV and 8.9 ± 0.3, respectively, upon reduction of the disulfide. RFd2 has a melting temperature near the boiling point of water (Tm = 99 °C, pH 7.0), but it becomes destabilized upon disulfide reduction (ΔTm = -9 °C, ΔCm = -0.7 M guanidinium hydrochloride). This example illustrates how the incorporation of an additional structural element such as a disulfide bond in a highly conserved fold such as that of the Rieske domain may fine-tune the protein for a particular function or for increased stability.
Affiliation(s)
- Hugo M Botelho
- Instituto de Tecnologia Química e Biológica, Universidade Nova de Lisboa, Oeiras, Portugal
304
Glekas GD, Foster RM, Cates JR, Estrella JA, Wawrzyniak MJ, Rao CV, Ordal GW. A PAS domain binds asparagine in the chemotaxis receptor McpB in Bacillus subtilis. J Biol Chem 2009; 285:1870-8. [PMID: 19864420 DOI: 10.1074/jbc.m109.072108] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Indexed: 11/06/2022] Open
Abstract
During chemotaxis toward asparagine by Bacillus subtilis, the ligand is thought to bind to the chemoreceptor McpB on the exterior of the cell and induce a conformational change. This change affects the degree of phosphorylation of the CheA kinase bound to the cytoplasmic region of the receptor. Until recently, the sensing domains of the B. subtilis receptors were thought to be structurally similar to the well studied Escherichia coli four-helical bundle. However, sequence analysis has shown the sensing domains of receptors from these two organisms to be vastly different. Homology modeling of the sensing domain of the B. subtilis asparagine receptor McpB revealed two tandem PAS domains. McpB mutants having alanine substitutions in key arginine and tyrosine residues of the upper PAS domain but not in any residues of the lower PAS domain exhibited a chemotactic defect in both swarm plates and capillary assays. Thus, binding does not appear to occur across any dimeric surface but within a monomer. A modified capillary assay designed to determine the concentration of attractant where chemotaxis is most sensitive showed that when Arg-111, Tyr-121, or Tyr-133 is mutated to an alanine, much more asparagine is required to obtain an active chemoreceptor. Isothermal titration calorimetry experiments on the purified sensing domain showed a KD for asparagine of 14 μM, with the three mutations leading to less efficient binding. Taken together, these results reveal not only a novel chemoreceptor sensing domain architecture but also, possibly, a different mechanism for chemoreceptor activation.
Affiliation(s)
- George D Glekas
- Department of Biochemistry, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
305
Helles G, Fonseca R. Predicting dihedral angle probability distributions for protein coil residues from primary sequence using neural networks. BMC Bioinformatics 2009; 10:338. [PMID: 19835576 PMCID: PMC2771020 DOI: 10.1186/1471-2105-10-338] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/06/2009] [Accepted: 10/16/2009] [Indexed: 11/10/2022] Open
Abstract
Background: Predicting the three-dimensional structure of a protein from its amino acid sequence is currently one of the most challenging problems in bioinformatics. The internal structure of helices and sheets is highly recurrent and helps reduce the search space significantly. However, random coil segments make up nearly 40% of proteins and they do not have any apparent recurrent patterns, which limits the overall accuracy of protein structure prediction methods. Fortunately, previous work has indicated that coil segments are in fact not completely random in structure and flanking residues do seem to have a significant influence on the dihedral angles adopted by the individual amino acids in coil segments. In this work we attempt to predict a probability distribution of these dihedral angles based on the flanking residues. While attempts to predict dihedral angles of coil segments have been made previously, none have, to our knowledge, presented comparable results for the probability distribution of dihedral angles.
Results: In this paper we develop an artificial neural network that uses an input-window of amino acids to predict a dihedral angle probability distribution for the middle residue in the input-window. The trained neural network shows a significant improvement (4-68%) in predicting the most probable bin (covering a 30° × 30° area of the dihedral angle space) for all amino acids in the data set compared to baseline statistics. An accuracy comparable to that of secondary structure prediction (≈ 80%) is achieved by observing the 20 bins with highest output values.
Conclusion: Many different protein structure prediction methods exist and each uses different tools and auxiliary predictions to help determine the native structure. In this work the sequence is used to predict local context dependent dihedral angle propensities in coil-regions. This predicted distribution can potentially improve tertiary structure prediction methods that are based on sampling the backbone dihedral angles of individual amino acids. The predicted distribution may also help predict local structure fragments used in fragment assembly methods.
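The binning and the "20 bins with highest output values" criterion described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract gives the 30° × 30° bin size, but the row-major bin numbering, the angle range of [-180°, 180°), and the function names `dihedral_bin`, `top_k_hit` and `top_k_accuracy` are hypothetical and not taken from the paper.

```python
BIN_SIZE = 30  # degrees; the bin width stated in the abstract
BINS_PER_AXIS = 360 // BIN_SIZE  # 12 bins along phi and 12 along psi (144 total)


def dihedral_bin(phi, psi):
    """Map a (phi, psi) pair in degrees to a flat bin index in [0, 143].

    Assumption: angles are wrapped into [-180, 180) and bins are numbered
    row-major (phi selects the row, psi the column).
    """
    phi_wrapped = (phi + 180.0) % 360.0
    psi_wrapped = (psi + 180.0) % 360.0
    row = int(phi_wrapped // BIN_SIZE)
    col = int(psi_wrapped // BIN_SIZE)
    return row * BINS_PER_AXIS + col


def top_k_hit(scores, true_bin, k):
    """Return True if true_bin is among the k highest-scoring bins."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_bin in ranked[:k]


def top_k_accuracy(predictions, true_bins, k):
    """Fraction of residues whose observed bin is among the top k predicted bins."""
    hits = sum(top_k_hit(s, b, k) for s, b in zip(predictions, true_bins))
    return hits / len(true_bins)
```

With k = 1 this measures how often the most probable bin is correct, and with k = 20 it corresponds to the looser criterion mentioned in the Results paragraph.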
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Department of Computer Science, Universitetsparken 1, 2100 Copenhagen, Denmark.
306
Bultrini E, Brick K, Mukherjee S, Zhang Y, Silvestrini F, Alano P, Pizzi E. Revisiting the Plasmodium falciparum RIFIN family: from comparative genomics to 3D-model prediction. BMC Genomics 2009; 10:445. [PMID: 19769795 PMCID: PMC2756283 DOI: 10.1186/1471-2164-10-445] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 03/02/2009] [Accepted: 09/21/2009] [Indexed: 11/24/2022] Open
Abstract
Background: Subtelomeric RIFIN genes constitute the most abundant multigene family in Plasmodium falciparum. RIFIN products are targets for the human immune response and contribute to the antigenic variability of the parasite. They are transmembrane proteins grouped into two sub-families (RIF_A and RIF_B). Although recent data show that RIF_A and RIF_B have different sub-cellular localisations and possibly different functions, the same structural organisation has been proposed for members of the two sub-families. Despite recent advances, our knowledge of the regulation of RIFIN gene expression is still poor and the biological role of the protein products remains obscure.
Results: Comparative studies on RIFINs in three clones of P. falciparum (3D7, HB3 and Dd2) by multidimensional scaling (MDS) showed that gene sequences evolve differently in the 5' upstream, coding, and 3' downstream regions, and suggested a possible role of highly conserved 3' downstream sequences. Despite the expected polymorphism, we found that the overall structure of RIFIN repertoires is conserved among clones, suggesting a balance between genetic drift and homogenisation mechanisms which guarantees emergence of novel variants but preserves the functionality of genes. Protein sequences from a bona fide set of 3D7 RIFINs were submitted to predictors of secondary structure elements. In contrast with the previously proposed structural organisation, no signal peptide and only one transmembrane helix were predicted for the majority of RIF_As. Finally, we developed a strategy to obtain a reliable 3D-model for RIF_As. We generated 265 possible structures from 53 non-redundant sequences, from which clustering and quality assessments selected two models as the most representative for putative RIFIN protein structures.
Conclusion: First, comparative analyses of RIFIN repertoires in different clones of P. falciparum provide insights into the evolutionary mechanisms shaping the multigene family. Secondly, we found that members of the two sub-families RIF_As and RIF_Bs have different structural organization in accordance with recent experimental results. Finally, representative models for RIF_As have an "Armadillo-like" fold which is known to promote protein-protein interactions in diverse contexts.
Affiliation(s)
- Emanuele Bultrini
- Dipartimento di Malattie Infettive, Parassitarie ed Immunomediate, Istituto Superiore di Sanità, Viale Regina Elena, 299, 00161 Roma, Italy.
307
Li Y, Zhang Y. REMO: A new protocol to refine full atomic protein models from C-alpha traces by optimizing hydrogen-bonding networks. Proteins 2009; 76:665-76. [PMID: 19274737 PMCID: PMC2771173 DOI: 10.1002/prot.22380] [Citation(s) in RCA: 99] [Impact Index Per Article: 6.6] [Indexed: 11/07/2022]
Abstract
Protein structure prediction approaches usually perform modeling simulations based on a reduced representation of protein structures. For biological applications, constructing full atomic models from the reduced structure decoys is an important step. Most current full atomic model reconstruction procedures have defects: they either fail to completely remove the steric clashes among backbone atoms or generate final atomic models with worse topological similarity to the native structures than the reduced models. In this work, we develop a new protocol, called REMO, to generate full atomic protein models by optimizing the hydrogen-bonding network with basic fragments matched from a newly constructed backbone isomer library of solved protein structures. The algorithm is benchmarked on 230 nonhomologous proteins with reduced structure decoys generated by I-TASSER simulations. The results show that REMO has a significant ability to remove steric clashes while retaining the good topology of the reduced model. The hydrogen-bonding network of the final models is dramatically improved during the procedure. The REMO algorithm was used in the recent CASP8 experiment, which demonstrated significant improvements of the I-TASSER models in both atomic-level structural refinement and hydrogen-bonding network construction.
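To make the notion of steric clashes in a C-alpha trace concrete, the following is a minimal sketch of a clash counter. It is not the REMO procedure: the 3.6 Å cutoff, the rule of skipping sequence neighbours, and the function name `count_clashes` are all assumptions chosen for illustration, since the abstract does not state how clashes are defined.

```python
import math


def count_clashes(coords, cutoff=3.6, min_separation=2):
    """Count residue pairs whose C-alpha atoms are closer than `cutoff` angstroms.

    coords: list of (x, y, z) tuples, one per residue, in chain order.
    cutoff: hypothetical clash threshold; consecutive C-alpha atoms sit
        roughly 3.8 angstroms apart, so anything much closer is suspicious.
    min_separation: pairs closer than this in sequence are skipped, so
        directly bonded neighbours are never counted.
    """
    clashes = 0
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                clashes += 1
    return clashes
```

A refinement protocol could report this count before and after rebuilding to check that clashes were reduced; a complete check would also cover the other backbone atoms, which this sketch ignores.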
Affiliation(s)
- Yunqi Li
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
308
Goldman AD, Leigh JA, Samudrala R. Comprehensive computational analysis of Hmd enzymes and paralogs in methanogenic Archaea. BMC Evol Biol 2009; 9:199. [PMID: 19671178 PMCID: PMC2739858 DOI: 10.1186/1471-2148-9-199] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 11/24/2008] [Accepted: 08/11/2009] [Indexed: 11/29/2022] Open
Abstract
Background: Methanogenesis is the sole means of energy production in methanogenic Archaea. H2-forming methylenetetrahydromethanopterin dehydrogenase (Hmd) catalyzes a step in the hydrogenotrophic methanogenesis pathway in class I methanogens. At least one hmd paralog has been identified in nine of the eleven complete genome sequences of class I hydrogenotrophic methanogens. The products of these paralog genes have thus far eluded any detailed functional characterization.
Results: Here we present a thorough computational analysis of Hmd enzymes and paralogs that includes state of the art phylogenetic inference, structure prediction, and functional site prediction techniques. We determine that the Hmd enzymes are phylogenetically distinct from Hmd paralogs but share a common overall structure. We predict that the active site of the Hmd enzyme is conserved as a functional site in Hmd paralogs and use this observation to propose possible molecular functions of the paralog that are consistent with previous experimental evidence. We also identify an uncharacterized site in the N-terminal domains of both proteins that is predicted by our methods to directly impart function.
Conclusion: This study contributes to our understanding of the evolutionary history, structural conservation, and functional roles of the Hmd enzymes and paralogs. The results of our phylogenetic and structural analysis constitute datasets that will aid in the future study of the Hmd protein family. Our functional site predictions generate several testable hypotheses that will guide further experimental characterization of the Hmd paralog. This work also represents a novel approach to protein function prediction in which multiple computational methods are integrated to achieve a detailed characterization of proteins that are not well understood.
Affiliation(s)
- Aaron D Goldman
- Department of Microbiology, University of Washington, Seattle, WA, USA.
309
310
On the intracellular trafficking of mouse S5 ribosomal protein from cytoplasm to nucleoli. J Mol Biol 2009; 392:1192-204. [PMID: 19631221 DOI: 10.1016/j.jmb.2009.07.049] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 03/05/2009] [Revised: 07/07/2009] [Accepted: 07/16/2009] [Indexed: 11/21/2022]
Abstract
The non-ribosomal functions of mammalian ribosomal proteins have recently attracted worldwide attention. The mouse ribosomal protein S5 (rpS5) derived from ribosomal material is an assembled non-phosphorylated protein. The free form of rpS5 protein, however, undergoes phosphorylation. In this study, we have (a) investigated the potential role of phosphorylation in rpS5 protein transport into the nucleus and then into nucleoli and (b) determined which of the domains of rpS5 are involved in this intracellular trafficking. In vitro PCR mutagenesis of mouse rpS5 cDNA, complemented by subsequent cloning and expression of rpS5 truncated recombinant forms, produced in fusion with green fluorescent protein, permitted the investigation of rpS5 intracellular trafficking in HeLa cells using confocal microscopy complemented by Western blot analysis. Our results indicate the following: (a) rpS5 protein enters the nucleus via the region 38-50 aa that forms a random coil as revealed by molecular dynamics simulation. (b) Immunoprecipitation of rpS5 with casein kinase II and immobilized metal affinity chromatography analysis complemented by in vitro kinase assay revealed that phosphorylation of rpS5 seems to be indispensable for its transport from nucleus to nucleoli; upon entering the nucleus, Thr-133 phosphorylation triggers Ser-24 phosphorylation by casein kinase II, thus promoting entrance of rpS5 into the nucleoli. Another important role of the rpS5 N-terminal region is proposed to be the regulation of the protein's cellular level. The repeated co-appearance of a satellite C-terminal band below the entire rpS5 at the late stationary phase, and not at the early logarithmic phase, of cell growth suggests a specific degradation that probably balances the unassembled ribosomal protein molecules with those that are efficiently assembled into ribosomal subunits. Overall, these data provide new insights into the structural and functional domains within the rpS5 molecule that contribute to its cellular functions.
311
Veith A, Klingl A, Zolghadr B, Lauber K, Mentele R, Lottspeich F, Rachel R, Albers SV, Kletzin A. Acidianus, Sulfolobus and Metallosphaera surface layers: structure, composition and gene expression. Mol Microbiol 2009; 73:58-72. [DOI: 10.1111/j.1365-2958.2009.06746.x] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Indexed: 11/29/2022]
312
New vistas in GPCR 3D structure prediction. J Mol Model 2009; 16:183-91. [PMID: 19551412 DOI: 10.1007/s00894-009-0533-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 01/14/2009] [Accepted: 05/06/2009] [Indexed: 10/20/2022]
Abstract
Human G-protein coupled receptors (hGPCRs) comprise the most prominent family of validated drug targets. More than 50% of approved drugs exert their therapeutic effects by targeting this family. Accurate models would greatly facilitate the process of drug discovery and development. However, 3-D structure prediction of GPCRs remains a challenge due to the limited availability of resolved structures. The X-ray structures have been solved for only four such proteins. The identity between hGPCRs and the potential templates is mostly less than 30%, well below the level at which sequence alignment can be done regularly. In this study, we analyze a large database of human G-protein coupled receptors that are members of family A in order to optimize usage of the available crystal structures for molecular modeling of hGPCRs. On the basis of our findings in this study, we propose to regard specific parts from the trans-membrane domains of the reference receptor helices as appropriate templates for constructing models of other GPCRs, while other residues require other techniques for their remodeling and refinement. The proposed hypothesis in the current study has been tested by modeling human beta2-adrenergic receptor based on crystal structures of bovine rhodopsin (1F88) and human A2A adenosine receptor (3EML). The results have shown some improvement in the quality of the predicted models compared to Modeller software.
313
Weraarpachai W, Antonicka H, Sasarman F, Seeger J, Schrank B, Kolesar JE, Lochmüller H, Chevrette M, Kaufman BA, Horvath R, Shoubridge EA. Mutation in TACO1, encoding a translational activator of COX I, results in cytochrome c oxidase deficiency and late-onset Leigh syndrome. Nat Genet 2009; 41:833-7. [DOI: 10.1038/ng.390] [Citation(s) in RCA: 229] [Impact Index Per Article: 15.3] [Received: 02/11/2009] [Accepted: 04/27/2009] [Indexed: 12/15/2022]
314
Zhou H, Skolnick J. Protein structure prediction by pro-Sp3-TASSER. Biophys J 2009; 96:2119-27. [PMID: 19289038 DOI: 10.1016/j.bpj.2008.12.3898] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 09/12/2008] [Revised: 11/12/2008] [Accepted: 12/03/2008] [Indexed: 12/29/2022] Open
Abstract
An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP(3). Top templates identified by each scoring function are combined to derive contact and distance restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distance restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those having templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models with a TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively. A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.
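The abstract uses a TM-score above 0.4 as its foldability criterion but does not define the measure. As a reminder, the standard definition sums 1 / (1 + (d_i / d0)^2) over aligned residue pairs and divides by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8 Å. The sketch below evaluates that formula for one given superposition only. The full TM-score takes the maximum over superpositions, which is not attempted here, and the function name `tm_score` and its input convention are assumptions made for illustration.

```python
def tm_score(distances, target_length):
    """Evaluate the TM-score formula for a single, fixed superposition.

    distances: distances in angstroms between aligned residue pairs
        (unaligned target residues are simply absent and contribute zero).
    target_length: number of residues in the target (native) structure.
    """
    if target_length < 19:
        # For very short chains d0 becomes non-positive; real implementations
        # apply special handling that this sketch does not reproduce.
        raise ValueError("target_length must be at least 19 for this sketch")
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


def is_foldable(score, threshold=0.4):
    """Foldability criterion quoted in the abstract: TM-score to native > 0.4."""
    return score > threshold
```

Because the sum is normalised by the target length, a perfect model scores 1.0 and a model covering only half the target with perfect accuracy scores 0.5.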
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
315
Benkert P, Schwede T, Tosatto SC. QMEANclust: estimation of protein model quality by combining a composite scoring function with structural density information. BMC Struct Biol 2009; 9:35. [PMID: 19457232 PMCID: PMC2709111 DOI: 10.1186/1472-6807-9-35] [Citation(s) in RCA: 112] [Impact Index Per Article: 7.5] [Received: 10/21/2008] [Accepted: 05/20/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The selection of the most accurate protein model from a set of alternatives is a crucial step in protein structure prediction both in template-based and ab initio approaches. Scoring functions have been developed which can either return a quality estimate for a single model or derive a score from the information contained in the ensemble of models for a given sequence. Local structural features occurring more frequently in the ensemble have a greater probability of being correct. Within the context of the CASP experiment, these so-called consensus methods have been shown to perform considerably better in selecting good candidate models, but tend to fail if the best models are far from the dominant structural cluster. In this paper we show that model selection can be improved if both approaches are combined by pre-filtering the models used during the calculation of the structural consensus.
RESULTS: Our recently published QMEAN composite scoring function has been improved by including an all-atom interaction potential term. The preliminary model ranking based on the new QMEAN score is used to select a subset of reliable models against which the structural consensus score is calculated. This scoring function, called QMEANclust, achieves a correlation coefficient of predicted quality score and GDT_TS of 0.9 averaged over the 98 CASP7 targets and performs significantly better in selecting good models from the ensemble of server models than any other group participating in the quality estimation category of CASP7. Both scoring functions are also benchmarked on the MOULDER test set consisting of 20 target proteins each with 300 alternative models generated by MODELLER. QMEAN outperforms all other tested scoring functions operating on individual models, while the consensus method QMEANclust only works properly on decoy sets containing a certain fraction of near-native conformations. We also present a local version of QMEAN for the per-residue estimation of model quality (QMEANlocal) and compare it to a new local consensus-based approach.
CONCLUSION: Improved model selection is obtained by using a composite scoring function operating on single models in order to enrich higher quality models which are subsequently used to calculate the structural consensus. The performance of consensus-based methods such as QMEANclust highly depends on the composition and quality of the model ensemble to be analysed. Therefore, performance estimates for consensus methods based on large meta-datasets (e.g. CASP) might overrate their applicability in more realistic modelling situations with smaller sets of models based on individual methods.
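The pre-filtering idea described in this abstract (rank models by a single-model score, keep a reliable subset, then score every model by its similarity to that subset) can be sketched as below. This is a simplified illustration, not the QMEANclust implementation: the pairwise similarity matrix is assumed to be precomputed (for example from pairwise structural comparison), the mean is unweighted, and the name `consensus_scores` and the `keep_fraction` parameter are hypothetical.

```python
import math


def consensus_scores(single_model_scores, similarity, keep_fraction=0.5):
    """Score each model by its mean similarity to a pre-filtered reference subset.

    single_model_scores: one quality estimate per model (higher is better),
        standing in for a single-model score such as QMEAN.
    similarity: square matrix (list of lists); similarity[i][j] is the
        structural similarity between models i and j.
    keep_fraction: hypothetical fraction of top-ranked models kept as the
        reference subset; at least two models are always kept.
    """
    n = len(single_model_scores)
    if n < 2:
        raise ValueError("need at least two models")
    n_keep = min(n, max(2, math.ceil(n * keep_fraction)))
    # Rank models by the single-model score, best first.
    ranked = sorted(range(n), key=lambda i: single_model_scores[i], reverse=True)
    reference = ranked[:n_keep]
    scores = []
    for i in range(n):
        # Exclude self-comparison so reference models gain no unfair advantage.
        others = [j for j in reference if j != i]
        scores.append(sum(similarity[i][j] for j in others) / len(others))
    return scores
```

Setting `keep_fraction=1.0` reduces this to a plain consensus over the whole ensemble, which is the case the abstract reports as failing when the best models lie far from the dominant structural cluster.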
Affiliation(s)
- Pascal Benkert
- Swiss Institute of Bioinformatics, Biozentrum, University of Basel, Klingelbergstrasse 50/70, 4056 Basel, Switzerland.
316
Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation. Nucleic Acids Res 2009; 37:W510-4. [PMID: 19429685 DOI: 10.1093/nar/gkp322] [Citation(s) in RCA: 593] [Impact Index Per Article: 39.5] [Indexed: 11/12/2022] Open
Abstract
Model quality estimation is an essential component of protein structure prediction, since ultimately the accuracy of a model determines its usefulness for specific applications. Usually, in the course of protein structure prediction a set of alternative models is produced, from which subsequently the most accurate model has to be selected. The QMEAN server provides access to two scoring functions successfully tested at the eighth round of the community-wide blind test experiment CASP. The user can choose between the composite scoring function QMEAN, which derives a quality estimate on the basis of the geometrical analysis of single models, and the clustering-based scoring function QMEANclust which calculates a global and local quality estimate based on a weighted all-against-all comparison of the models from the ensemble provided by the user. The web server performs a ranking of the input models and highlights potentially problematic regions for each model. The QMEAN server is available at http://swissmodel.expasy.org/qmean.
317
Anderson DM, Beres BJ, Wilson-Rawls J, Rawls A. The homeobox gene Mohawk represses transcription by recruiting the sin3A/HDAC co-repressor complex. Dev Dyn 2009; 238:572-80. [PMID: 19235719 DOI: 10.1002/dvdy.21873] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022] Open
Abstract
Mohawk is an atypical homeobox gene expressed in embryonic progenitor cells of skeletal muscle, tendon, and cartilage. We demonstrate that Mohawk functions as a transcriptional repressor capable of blocking the myogenic conversion of 10T1/2 fibroblasts. The repressor activity is located in three small, evolutionarily conserved domains (MRD1-3) in the carboxy-terminal half of the protein. Point mutation analysis revealed six residues in MRD1 are sufficient for repressor function. The carboxy-terminal half of Mohawk is able to recruit components of the Sin3A/HDAC co-repressor complex (Sin3A, Hdac1, and Sap18) and a subset of Polymerase II general transcription factors (Tbp, TFIIA1 and TFIIB). Furthermore, Sap18, a protein that bridges the Sin3A/HDAC complex to DNA-bound transcription factors, is co-immunoprecipitated by MRD1. These data predict that Mohawk can repress transcription through recruitment of the Sin3A/HDAC co-repressor complex, and as a result, repress target genes required for the differentiation of cells to the myogenic lineage.
Affiliation(s)
- Douglas M Anderson
- School of Life Sciences, Center for Evolutionary Functional Genomics, Arizona State University, Tempe, Arizona 85287-4501, USA
318
Zhang Y. Protein structure prediction: when is it useful? Curr Opin Struct Biol 2009; 19:145-55. [PMID: 19327982 PMCID: PMC2673339 DOI: 10.1016/j.sbi.2009.02.005] [Citation(s) in RCA: 193] [Impact Index Per Article: 12.9] [Received: 11/14/2008] [Revised: 02/18/2009] [Accepted: 02/19/2009] [Indexed: 10/21/2022]
Abstract
Computationally predicted three-dimensional structures of protein molecules have demonstrated their usefulness in many areas of biomedicine, ranging from approximate family assignments to precise drug screening. For nearly 40 years, however, the accuracy of the predicted models has been dictated by the availability of close structural templates. Progress has recently been achieved in refining low-resolution models closer to the native ones; this has been made possible by combining knowledge-based information from multiple sources of structural templates as well as by improving the energy funnel of physics-based force fields. Unfortunately, there has been no essential progress in the development of techniques for detecting remotely homologous templates and for predicting novel protein structures.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, USA.
319
Skolnick J, Brylinski M. FINDSITE: a combined evolution/structure-based approach to protein function prediction. Brief Bioinform 2009; 10:378-91. [PMID: 19324930 DOI: 10.1093/bib/bbp017] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022]
Abstract
A key challenge of the post-genomic era is the identification of the function(s) of all the molecules in a given organism. Here, we review the status of sequence and structure-based approaches to protein function inference and ligand screening that can provide functional insights for a significant fraction of the approximately 50% of ORFs of unassigned function in an average proteome. We then describe FINDSITE, a recently developed algorithm for ligand binding site prediction, ligand screening and molecular function prediction, which is based on binding site conservation across evolutionarily distant proteins identified by threading. Importantly, FINDSITE gives comparable results whether high-resolution experimental structures or predicted protein models are used.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology 250 14th St NW, Atlanta, GA 30318, USA.
320
Miklós I, Novák Á, Satija R, Lyngsø R, Hein J. Stochastic models of sequence evolution including insertion-deletion events. Stat Methods Med Res 2009; 18:453-85. [DOI: 10.1177/0962280208099500] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/15/2022]
Abstract
Comparison of sequences that have descended from a common ancestor based on an explicit stochastic model of substitutions, insertions and deletions has risen to prominence in the last decade. Making statements about the positions of insertions-deletions (abbr. indels) is central in sequence and genome analysis and is called alignment. This statistical approach is harder, conceptually and computationally, than competing approaches based on choosing an alignment according to some optimality criteria. But it has major practical advantages in terms of testing evolutionary hypotheses and parameter estimation. Basic dynamic approaches can allow the analysis of up to 4-5 sequences. MCMC techniques can bring this to about 10-15 sequences. Beyond this, different or heuristic approaches must be used. Besides the computational challenges, increasing realism in the underlying models is presently being addressed. A recent development that has been especially fruitful is combining statistical alignment with the problem of sequence annotation, making statements about the function of each nucleotide/amino acid. So far gene finding, protein secondary structure prediction and regulatory signal detection have been tackled within this framework. Much progress can be reported, but clearly major challenges remain if this approach is to be central in the analyses of large incoming sequence data sets.
Affiliation(s)
- István Miklós
- Bioinformatics Group, Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences, 1053 Budapest, Reáltanoda u. 13-15, Hungary; Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK; Data Mining and Search Research Group, Computer and Automation Institute, Hungarian Academy of Sciences, 1111 Budapest, Lágymányosi u. 11., Hungary
- Ádám Novák
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
- Rahul Satija
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
- Rune Lyngsø
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
- Jotun Hein
- Bioinformatics Group, Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK
321
Sathyanarayana BK, Hahn Y, Patankar MS, Pastan I, Lee B. Mesothelin, Stereocilin, and Otoancorin are predicted to have superhelical structures with ARM-type repeats. BMC Struct Biol 2009; 9:1. [PMID: 19128473 PMCID: PMC2628672 DOI: 10.1186/1472-6807-9-1] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 07/31/2008] [Accepted: 01/07/2009] [Indexed: 11/25/2022]
Abstract
Background: Mesothelin is a 40 kDa protein present on the surface of normal mesothelial cells and overexpressed in many human tumours, including mesothelioma and ovarian and pancreatic adenocarcinoma. It forms a strong and specific complex with MUC16, which is also highly expressed on the surface of mesothelioma and ovarian cancer cells. This binding has been suggested to be the basis of ovarian cancer metastasis. Knowledge of the structure of this protein will be useful, for example, in building a structural model of the MUC16-mesothelin complex. Mesothelin is produced as a precursor, which is cleaved by furin to produce the N-terminal half, which is called the megakaryocyte potentiating factor (MPF), and the C-terminal half, which is mesothelin. Little is known about the function of mesothelin and there is no information on its possible three-dimensional structure. Mesothelin has been reported to be homologous to the deafness-related inner ear proteins otoancorin and stereocilin, for neither of which the three-dimensional structure is known.
Results: The BLAST and PSI-BLAST searches confirmed that mesothelin and mesothelin precursor proteins are remotely homologous to stereocilin and otoancorin and more closely homologous to the hypothetical protein MPFL (MPF-like). Secondary structure prediction servers predicted a predominantly helical structure for both mesothelin and mesothelin precursor proteins and also for stereocilin and otoancorin. Three-dimensional structure prediction servers INHUB and I-TASSER produced structural models for mesothelin, which consisted of superhelical structures with ARM-type repeats in conformity with the secondary structure predictions. Similar ARM-type superhelical repeat structures were predicted by 3D-PSSM server for mesothelin precursor and for stereocilin and otoancorin proteins.
Conclusion: The mesothelin superfamily of proteins, which includes mesothelin, mesothelin precursor, megakaryocyte potentiating factor, MPFL, stereocilin and otoancorin, are predicted to have superhelical structures with ARM-type repeats. We suggest that all of these function as superhelical lectins to bind the carbohydrate moieties of extracellular glycoproteins.
Affiliation(s)
- Bangalore K Sathyanarayana
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, NIH, Bethesda, Maryland 20892-4264, USA.
322
323
A Probabilistic Graphical Model for Ab Initio Folding. Research in Computational Molecular Biology: ... Annual International Conference, RECOMB ...: Proceedings. RECOMB (Conference: 2005- ) 2009; 5541:59-73. [PMID: 23459639 PMCID: PMC3583211 DOI: 10.1007/978-3-642-02008-7_5] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/26/2023]
Abstract
Despite significant progress in recent years, ab initio folding is still one of the most challenging problems in structural biology. This paper presents a probabilistic graphical model for ab initio folding, which employs Conditional Random Fields (CRFs) and directional statistics to model the relationship between the primary sequence of a protein and its three-dimensional structure. Different from the widely-used fragment assembly method and the lattice model for protein folding, our graphical model can explore protein conformations in a continuous space according to their probability. The probability of a protein conformation reflects its stability and is estimated from PSI-BLAST sequence profile and predicted secondary structure. Experimental results indicate that this new method compares favorably with the fragment assembly method and the lattice model.
324
Peng J, Xu J. Boosting Protein Threading Accuracy. Research in Computational Molecular Biology: ... Annual International Conference, RECOMB ...: Proceedings. RECOMB (Conference: 2005- ) 2009; 5541:31-45. [PMID: 22506254 DOI: 10.1007/978-3-642-02008-7_3] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Indexed: 12/12/2022]
Abstract
Protein threading is one of the most successful protein structure prediction methods. Most protein threading methods use a scoring function linearly combining sequence and structure features to measure the quality of a sequence-template alignment so that a dynamic programming algorithm can be used to optimize the scoring function. However, a linear scoring function cannot fully exploit interdependency among features and thus limits alignment accuracy. This paper presents a nonlinear scoring function for protein threading, which not only can model interactions among different protein features, but also can be efficiently optimized using a dynamic programming algorithm. We achieve this by modeling the threading problem using a probabilistic graphical model, Conditional Random Fields (CRF), and training the model using the gradient tree boosting algorithm. The resultant model is a nonlinear scoring function consisting of a collection of regression trees. Each regression tree models a type of nonlinear relationship among sequence and structure features. Experimental results indicate that this new threading model can effectively leverage weak biological signals and improve both alignment accuracy and fold recognition rate greatly.
325
El-Kased RF, Koy C, Deierling T, Lorenz P, Qian Z, Li Y, Thiesen HJ, Glocker MO. Mass spectrometric and peptide chip epitope mapping of rheumatoid arthritis autoantigen RA33. Eur J Mass Spectrom (Chichester) 2009; 15:747-759. [PMID: 19940341 DOI: 10.1255/ejms.1040] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 05/28/2023]
Abstract
The protein termed RA33 was determined to be one major autoantigen in rheumatoid arthritis (RA) patients and antiRA33 auto-antibodies were found to appear shortly after onset of RA. They are often detectable before a final diagnosis can be made in the clinic. The aim of our study is to characterise the epitope of a monoclonal antiRA33 antibody on recombinant RA33 using mass spectrometric epitope mapping. Recombinant RA33 has been subjected to BrCN cleavage and fragments were separated by sodium dodecyl sulphate polyacrylamide gel electrophoresis (SDS-PAGE). Subsequent in-gel proteolytic digestion and mass spectrometric analysis determined the partial sequences in the protein bands. Western blotting of SDS-PAGE-separated protein fragments revealed immuno-positive, i.e. epitope-containing bands. BrCN-derived RA33 fragments were also separated by high-performance liquid chromatography (HPLC) and immuno-reactivity of peptides was measured by dot-blot analysis with the individual HPLC fractions after partial amino acid sequences were determined. The epitope region identified herewith was compared to data from peptide chip analysis with 15-meric synthetic peptides attached to a glass surface. Results from all three analyses consistently showed that the epitope of the monoclonal antiRA33 antibody is located in the aa79-84 region on recombinant RA33; the epitope sequence is MAARPHSIDGRVVEP. Sequence comparisons of the 15 best scoring peptides from the peptide chip analysis revealed that the epitope can be separated into two adjacent binding parts. The N-terminal binding parts comprise the amino acid residues "DGR", resembling the general physico-chemical properties "acidic/polar-small-basic". The C-terminal binding parts contain the amino acid residues "VVE", with the motif "hydrophobic-gap-acidic".
The matching epitope region that emerged from our analysis on both the full-length protein and the 15-meric surface bound peptides suggests that peptide chips are indeed suitable tools for screening patterns of autoantibodies in patients suffering from autoimmune diseases.
Affiliation(s)
- R F El-Kased
- Proteome Center Rostock, University of Rostock, Schillingallee 69, 18057 Rostock, Germany
326
Lee J, Joo K, Kim SY, Lee J. Re-examination of structure optimization of off-lattice protein AB models by conformational space annealing. J Comput Chem 2008; 29:2479-84. [PMID: 18470971 DOI: 10.1002/jcc.20995] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/05/2022]
Abstract
The global structural optimization is carried out for off-lattice protein AB models in two and three dimensions by conformational space annealing. The models consist of hydrophobic and hydrophilic monomers in Fibonacci sequences. To accelerate the convergence, we have introduced a shift operator in the internal coordinate system, and effectively reduced the search space by forming a quotient space. With this, we significantly improve our previous results on AB models, and provide new low energy conformations. This work provides insights on exploring complicated energy landscapes by exploiting the advantages and limitations of CSA.
Affiliation(s)
- Jinwoo Lee
- Department of Mathematics, Kwangwoon University, 26 Kwangoon Street, Nowon-Gu, Seoul 139-701 Korea.
327
Randall A, Baldi P. SELECTpro: effective protein model selection using a structure-based energy function resistant to BLUNDERs. BMC Struct Biol 2008; 8:52. [PMID: 19055744 PMCID: PMC2667183 DOI: 10.1186/1472-6807-8-52] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Received: 06/26/2008] [Accepted: 12/03/2008] [Indexed: 11/10/2022]
Abstract
Background: Protein tertiary structure prediction is a fundamental problem in computational biology and identifying the most native-like model from a set of predicted models is a key sub-problem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models.
Results: Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, β-strand pairing, and side-chain hydrogen bonding. SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets and achieved top results. The average difference in GDT-TS between models ranked first by SELECTpro and the most native-like model was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets, and less than 10% for 66 targets. SELECTpro also ranked the single most native-like first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models with a better ranking than the most native-like model, the BLUNDER metric is introduced to overcome these limitations. SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12500 to 20000 models for each protein, where it outperforms the benchmarked method (I-TASSER).
Conclusion: SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a stand alone application at: . SELECTpro is also available as a public server at the same site.
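The GDT-TS difference reported in this abstract (the gap between the model ranked first by the selection method and the most native-like model in the set) can be computed as in the sketch below. The function names and the `(energy, gdt_ts)` input layout are assumptions for illustration, and lower energy is assumed to mean a better predicted model; this is not SELECTpro code.

```python
def selection_loss(models):
    """Return the GDT-TS gap between the best model and the selected model.

    models: non-empty list of (predicted_energy, gdt_ts) tuples (hypothetical
    layout). The selected model is the one with the lowest predicted energy.
    """
    if not models:
        raise ValueError("at least one model is required")
    selected = min(models, key=lambda m: m[0])   # model ranked first
    best_gdt = max(m[1] for m in models)         # most native-like model
    return best_gdt - selected[1]


def relative_loss_percent(models):
    """Express the gap as a percentage of the best model's GDT-TS."""
    best_gdt = max(m[1] for m in models)
    return 100.0 * selection_loss(models) / best_gdt
```

A loss of zero means the method picked the most native-like model; averaging the loss over targets gives a figure comparable in kind to the 5.07 quoted above.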
Affiliation(s)
- Arlo Randall
- School of Information and Computer Sciences, University of California, Irvine, CA 92697, USA.
328
Momen-Roknabadi A, Sadeghi M, Pezeshk H, Marashi SA. Impact of residue accessible surface area on the prediction of protein secondary structures. BMC Bioinformatics 2008; 9:357. [PMID: 18759992 PMCID: PMC2553345 DOI: 10.1186/1471-2105-9-357] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 12/09/2007] [Accepted: 08/31/2008] [Indexed: 12/02/2022]
Abstract
Background: The problem of accurate prediction of protein secondary structure continues to be one of the challenging problems in Bioinformatics. It has been previously suggested that amino acid relative solvent accessibility (RSA) might be an effective factor for increasing the accuracy of protein secondary structure prediction. Previous studies have either used a single constant threshold to classify residues into discrete classes (buried vs. exposed), or used the real-value predicted RSAs in their prediction method.
Results: We studied the effect of applying different RSA threshold types (namely, fixed thresholds vs. residue-dependent thresholds) on a variety of secondary structure prediction methods. Using DSSP-assigned RSA values, we found that improvement in the accuracy of prediction strictly depends on the selected threshold(s). Furthermore, we showed that choosing a single threshold for all amino acids is not the best possible parameter. We therefore used residue-dependent thresholds, and most residues showed improvement in prediction. Next, we considered predicted RSA values, since in the real-world problem the protein sequence is the only available information. We first predicted the RSA classes with the RVP-net program and then used these data in our method. Using this approach, improvement in prediction was also obtained.
Conclusion: The success of applying the RSA information to different secondary structure prediction methods suggests that prediction accuracy can be improved independent of prediction approaches. Thus, solvent accessibility can be considered a rich source of information to help the improvement of these methods.
Affiliation(s)
- Amir Momen-Roknabadi
- Department of Biotechnology, College of Science, University of Tehran, Tehran, Iran.
329
Wu S, Zhang Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 2008; 72:547-56. [PMID: 18247410 DOI: 10.1002/prot.21945] [Citation(s) in RCA: 276] [Impact Index Per Article: 17.3] [Indexed: 11/09/2022]
Abstract
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Dependent on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 × 10^-13, which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, Kansas 66047, USA
330
Benchmarking of TASSER_2.0: an improved protein structure prediction algorithm with more accurate predicted contact restraints. Biophys J 2008; 95:1956-64. [PMID: 18487301 DOI: 10.1529/biophysj.108.129759] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
To improve tertiary structure predictions of more difficult targets, the next generation of TASSER, TASSER_2.0, has been developed. TASSER_2.0 incorporates more accurate side-chain contact restraint predictions from a new approach, the composite-sequence method, based on consensus restraints generated by an improved threading algorithm, PROSPECTOR_3.5, which uses computationally evolved and wild-type template sequences as input. TASSER_2.0 was tested on a large-scale, benchmark set of 2591 nonhomologous, single domain proteins ≤200 residues that cover the Protein Data Bank at 35% pairwise sequence identity. Compared with the average fraction of accurately predicted side-chain contacts of 0.37 using PROSPECTOR_3.5 with wild-type template sequences, the average accuracy of the composite-sequence method increases to 0.60. The resulting TASSER_2.0 models are closer to their native structures, with an average root mean-square deviation of 4.99 Å compared to the 5.31 Å result of TASSER. Defining a successful prediction as a model with a root mean-square deviation to native <6.5 Å, the success rate of TASSER_2.0 (TASSER) for Medium targets (targets with good templates/poor alignments) is 74.3% (64.7%) and 40.8% (35.5%) for the Hard targets (incorrect templates/alignments). For Easy targets (good templates/alignments), the success rate slightly increases from 86.3% to 88.4%.
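The success criterion used in this abstract (a model counts as successful when its root mean-square deviation to the native structure is below 6.5 Å) can be expressed as a short sketch. The function names are hypothetical, and the RMSD helper assumes the two coordinate sets are already optimally superimposed and listed in the same atom order; it performs no superposition itself.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean-square deviation between two equally sized coordinate lists.

    Each entry is an (x, y, z) tuple in angstroms. The structures are assumed
    to be superimposed already; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def success_rate(rmsd_values, cutoff=6.5):
    """Fraction of targets whose model RMSD is strictly below the cutoff."""
    if not rmsd_values:
        raise ValueError("at least one RMSD value is required")
    return sum(1 for value in rmsd_values if value < cutoff) / len(rmsd_values)
```

Because the criterion is a strict inequality, a model at exactly 6.5 Å is not counted as a success in this sketch.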
331
Zhang Y. Progress and challenges in protein structure prediction. Curr Opin Struct Biol 2008; 18:342-8. [PMID: 18436442 DOI: 10.1016/j.sbi.2008.02.004] [Citation(s) in RCA: 304] [Impact Index Per Article: 19.0] [Received: 12/04/2007] [Accepted: 02/14/2008] [Indexed: 10/22/2022]
Abstract
Depending on whether similar structures are found in the PDB library, the protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool to detect the structural analogs, the advancements in methodology development have come to a steady state. Encouraging progress is observed in structure refinement which aims at drawing template structures closer to the native; this has been mainly driven by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed in folding small proteins to atomic resolutions. However, predicting structures for proteins larger than 150 residues still remains a challenge, with bottlenecks from both force field and conformational search.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Biosciences, University of Kansas, 2030 Becker Drive, Lawrence, KS 66047, United States.
332
Helles G. A comparative study of the reported performance of ab initio protein structure prediction algorithms. J R Soc Interface 2008; 5:387-96. [PMID: 18077243 DOI: 10.1098/rsif.2007.1278] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.9] [Indexed: 11/12/2022]
Abstract
Protein structure prediction is one of the major challenges in bioinformatics today. Throughout the past five decades, many different algorithmic approaches have been attempted, and although progress has been made the problem remains unsolvable even for many small proteins. While the general objective is to predict the three-dimensional structure from primary sequence, our current knowledge and computational power are simply insufficient to solve a problem of such high complexity. Some prediction algorithms do, however, appear to perform better than others, although it is not always obvious which ones they are and it is perhaps even less obvious why that is. In this review, the reported performance results from 18 different recently published prediction algorithms are compared. Furthermore, the general algorithmic settings most likely responsible for the difference in the reported performance are identified, and the specific settings of each of the 18 prediction algorithms are also compared. The average normalized r.m.s.d. scores reported range from 11.17 to 3.48. With a performance measure including both r.m.s.d. scores and CPU time, the currently best-performing prediction algorithm is identified to be the I-TASSER algorithm. Two of the algorithmic settings--protein representation and fragment assembly--were found to have definite positive influence on the running time and the predicted structures, respectively. There thus appears to be a clear benefit from incorporating this knowledge in the design of new prediction algorithms.
Affiliation(s)
- Glennie Helles
- University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark.
333
Cheng J. A multi-template combination algorithm for protein comparative modeling. BMC Struct Biol 2008; 8:18. [PMID: 18366648 PMCID: PMC2311309 DOI: 10.1186/1472-6807-8-18] [Citation(s) in RCA: 74] [Impact Index Per Article: 4.6] [Received: 01/08/2008] [Accepted: 03/17/2008] [Indexed: 11/26/2022]
Abstract
Background: Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms of selecting and combining multiple templates are available.
Results: Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10^-4). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure.
Conclusion: We have developed a novel multi-template algorithm to improve protein comparative modeling.
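The first selection rule described in this abstract (keep whole alignments whose significance score lies within a threshold of the top alignment) can be sketched as follows. The function name, the `(template_id, score)` layout, and the convention that a higher score means a more significant alignment are assumptions for illustration; the fragment-taking step for less similar templates is not modeled here.

```python
def select_templates(hits, threshold):
    """Return template ids whose score is within `threshold` of the top hit.

    hits: list of (template_id, score) tuples (hypothetical layout), where a
    higher score is assumed to mean a more significant alignment.
    """
    if not hits:
        return []
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    top_score = ranked[0][1]
    # Keep every template whose score gap to the top hit is small enough.
    return [tid for tid, score in ranked if top_score - score <= threshold]
```

With a threshold of zero the sketch reduces to the single-top-template baseline that the abstract compares against.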
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO 65211-2060, USA.
334
Miklós I, Novák A, Dombai B, Hein J. How reliably can we predict the reliability of protein structure predictions? BMC Bioinformatics 2008; 9:137. [PMID: 18315874 PMCID: PMC2324098 DOI: 10.1186/1471-2105-9-137] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 12/06/2007] [Accepted: 03/03/2008] [Indexed: 11/10/2022]
Abstract
Background: Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences.
Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment, and above that, they can give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation is different for different protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is the closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function has been obtained in the case of predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between Cα amino acids in superimposed tertiary structures.
Conclusion: Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level.
Affiliation(s)
- István Miklós
- Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG Oxford, UK.
|
335
|
Wu S, Zhang Y. A comprehensive assessment of sequence-based and template-based methods for protein contact prediction. Bioinformatics 2008; 24:924-31. [PMID: 18296462 DOI: 10.1093/bioinformatics/btn069] [Citation(s) in RCA: 151] [Impact Index Per Article: 9.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION Pair-wise residue-residue contacts in proteins can be predicted from both threading templates and sequence-based machine learning. However, most structure modeling approaches use only the template-based contact predictions to guide the simulations, partly because sequence-based contact predictions are usually considered less accurate than those from threading. With the rapid progress in sequence databases and machine-learning techniques, a detailed and comprehensive assessment of the contact-prediction methods under different template conditions is needed. RESULTS We develop two methods for protein-contact prediction: SVM-SEQ is a sequence-based machine learning approach which trains a variety of sequence-derived features on contact maps; SVM-LOMETS collects consensus contact predictions from multiple threading templates. We test both methods on the same set of 554 proteins, which are categorized into 'Easy', 'Medium', 'Hard' and 'Very Hard' targets based on the evolutionary and structural distance between templates and targets. For the Easy and Medium targets, SVM-LOMETS clearly outperforms SVM-SEQ; but for the Hard and Very Hard targets, the accuracy of the SVM-SEQ predictions is higher than that of SVM-LOMETS by 12-25%. If we combine the SVM-SEQ and SVM-LOMETS predictions, the total number of correctly predicted contacts in the Hard proteins increases by more than 60% (or 70% for long-range contacts with a sequence separation ≥24), compared with SVM-LOMETS alone. The advantage of SVM-SEQ is also shown in the CASP7 free modeling targets, where SVM-SEQ is around four times more accurate than SVM-LOMETS in long-range contact prediction. These data demonstrate that state-of-the-art sequence-based contact prediction has reached a level which may be helpful in assisting tertiary structure modeling for targets which do not have close structure templates. The maximum yield should be obtained by the combination of both sequence- and template-based predictions.
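To illustrate the idea of combining two contact predictors and restricting attention to long-range contacts (sequence separation ≥24), here is a small sketch. The abstract does not specify how the SVM-SEQ and SVM-LOMETS predictions are merged, so a plain set union is assumed; the function names and the residue pairs in the usage example are hypothetical.

```python
from typing import Set, Tuple

Contact = Tuple[int, int]  # residue index pair (i, j)


def combine_contacts(seq_based: Set[Contact], template_based: Set[Contact]) -> Set[Contact]:
    """Assumed combination rule: union of the two prediction sets."""
    return seq_based | template_based


def long_range(contacts: Set[Contact], min_separation: int = 24) -> Set[Contact]:
    """Keep contacts whose sequence separation |i - j| is at least min_separation."""
    return {(i, j) for i, j in contacts if abs(i - j) >= min_separation}


def accuracy(predicted: Set[Contact], native: Set[Contact]) -> float:
    """Fraction of predicted contacts that also occur in the native contact set."""
    if not predicted:
        return 0.0
    return len(predicted & native) / len(predicted)


if __name__ == "__main__":
    seq_pred = {(1, 30), (2, 5)}          # toy sequence-based predictions
    tmpl_pred = {(2, 5), (10, 40)}        # toy template-based predictions
    merged = combine_contacts(seq_pred, tmpl_pred)
    print(sorted(long_range(merged)))
```

A union can only increase the number of correctly predicted contacts relative to either predictor alone; whether it also improves accuracy depends on how many false contacts each source contributes.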
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|
336
|
Abstract
We developed and tested the I-TASSER protein structure prediction algorithm in the CASP7 experiment, where targets are first threaded through the PDB library and continuous fragments in the threading alignments are exploited to assemble the global structure. The final models are obtained by progressive refinement starting from the structure clusters of the last round. For a majority of the targets in the template-based modeling (TBM) category, the templates are drawn closer to the native structure by more than 1 Å within the aligned regions. For the free-modeling (FM) targets, I-TASSER builds the correct topology for 7/19 cases with sequences up to 155 residues long. For the first time, the automated server prediction generates models as good as those of human experts in all the categories, which shows the robustness of the method and its potential for genome-wide structure prediction. Despite this success, the accuracy of I-TASSER modeling is still dominated by the similarity of the template and target structures, with a strong correlation coefficient (approximately 0.9) between the root-mean-squared deviation (RMSD) to native of the templates and that of the final models. In particular, there is no high-resolution model below 2 Å for the FM targets. These problems highlight the issues that need to be addressed in the next generation of atomic-level I-TASSER development, especially for FM target modeling.
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics, Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas 66047, USA.
|
337
|
Zhang Y. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 2008; 9:40. [PMID: 18215316 PMCID: PMC2245901 DOI: 10.1186/1471-2105-9-40] [Citation(s) in RCA: 3835] [Impact Index Per Article: 239.7] [Received: 09/19/2007] [Accepted: 01/23/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state of the art of the field; I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions. RESULTS An on-line version of I-TASSER has been developed at the KU Center for Bioinformatics, which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models, with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD. CONCLUSION The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions, where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. 
The I-TASSER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/I-TASSER.
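The C-score cutoff reported in this abstract can be read as a simple binary classifier. The sketch below is a hedged illustration rather than the server's implementation: the function names are hypothetical, and the ground-truth labels (`is_correct`) are supplied by the caller because the abstract does not state how "correct topology" is defined. It applies the cutoff > -1.5 and computes false positive and false negative rates on a labeled benchmark.

```python
from typing import Sequence, Tuple


def predicts_correct_topology(c_score: float, cutoff: float = -1.5) -> bool:
    """A model is predicted to have correct topology when C-score > cutoff."""
    return c_score > cutoff


def error_rates(
    c_scores: Sequence[float],
    is_correct: Sequence[bool],
    cutoff: float = -1.5,
) -> Tuple[float, float]:
    """Return (false positive rate, false negative rate) for the given cutoff.

    is_correct holds caller-supplied ground-truth labels; how a benchmark
    decides that a model has the correct topology is outside this sketch.
    """
    fp = fn = positives = negatives = 0
    for score, truth in zip(c_scores, is_correct):
        predicted = predicts_correct_topology(score, cutoff)
        if truth:
            positives += 1
            fn += 0 if predicted else 1
        else:
            negatives += 1
            fp += 1 if predicted else 0
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr
```

Sweeping `cutoff` over a range of values with this function would show the trade-off between the two error rates on any labeled set of models.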
Affiliation(s)
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA.
|
338
|
Abstract
BACKGROUND The prediction of protein structure can be facilitated by the use of constraints based on a knowledge of functional sites. Without this information it is still possible to predict which residues are likely to be part of a functional site and this information can be used to select model structures from a variety of alternatives that would correspond to a functional protein. RESULTS Using a large collection of protein-like decoy models, a score was devised that selected those with predicted functional site residues that formed a cluster. When tested on a variety of small alpha/beta/alpha type proteins, including enzymes and non-enzymes, those that corresponded to the native fold were ranked highly. This performance held also for a selection of larger alpha/beta/alpha proteins that played no part in the development of the method. CONCLUSION The use of predicted site positions provides a useful filter to discriminate native-like protein models from non-native models. The method can be applied to any collection of models and should provide a useful aid to all modelling methods from ab initio to homology based approaches.
Affiliation(s)
- Vijayalakshmi Chelliah
- Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
- William R Taylor
- Division of Mathematical Biology, The National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK
|
339
|
Qiu J, Sheffler W, Baker D, Noble WS. Ranking predicted protein structures with support vector regression. Proteins 2007; 71:1175-82. [PMID: 18004754 DOI: 10.1002/prot.21809] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Indexed: 11/05/2022]
Affiliation(s)
- Jian Qiu
- Department of Genome Sciences, University of Washington, Seattle, Washington, USA
|
340
|
Kaján L, Rychlewski L. Evaluation of 3D-Jury on CASP7 models. BMC Bioinformatics 2007; 8:304. [PMID: 17711571 PMCID: PMC2040163 DOI: 10.1186/1471-2105-8-304] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 04/19/2007] [Accepted: 08/21/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND 3D-Jury, the structure prediction consensus method publicly available in the Meta Server http://meta.bioinfo.pl/, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. RESULTS The performance of 3D-Jury was analysed in three respects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. Second, 3D-Jury was found to improve upon the competing servers' choice of the best structure model in most cases. Third, the value of the 3D-Jury score as a generic reliability measure was examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. CONCLUSION The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, which is especially important for models that were not assigned one by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models with the new instant scoring feature http://meta.bioinfo.pl/compare_your_model_example.pl available in the Meta Server.
Affiliation(s)
- László Kaján
- BioInfoBank Institute, ul. Limanowskiego 24 A, 60-744 Poznań, Poland
- Leszek Rychlewski
- BioInfoBank Institute, ul. Limanowskiego 24 A, 60-744 Poznań, Poland
- Bioinformatics Unit, Department of Physics, Adam Mickiewicz University, ul. Umultowska 85, 61-614 Poznań, Poland
|
341
|
Abstract
We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP(3), original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted alpha-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP(3), TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino acids long from CASP7. Chunk-TASSER is approximately 11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome-scale protein structure prediction.
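The two summary statistics quoted in this abstract (the average TM-score of the best of the top five models, and the percentage of targets whose best model reaches TM-score ≥0.4) can be computed as in the sketch below. This is an illustrative helper with hypothetical names, assuming each target supplies a ranked list of model TM-scores; it is not the evaluation code used in the study.

```python
from typing import List, Sequence, Tuple


def best_of_top_five(tm_scores: Sequence[float]) -> float:
    """Best TM-score among the first five ranked models of one target."""
    return max(tm_scores[:5])


def summarize(
    per_target_scores: List[Sequence[float]],
    threshold: float = 0.4,
) -> Tuple[float, float]:
    """Return (mean best-of-top-five TM-score, percent of targets >= threshold)."""
    best = [best_of_top_five(scores) for scores in per_target_scores]
    mean_best = sum(best) / len(best)
    percent = 100.0 * sum(1 for b in best if b >= threshold) / len(best)
    return mean_best, percent
```

For example, the overall figure of 28.94% reported for chunk-TASSER corresponds to 123 of the 425 targets having a best-of-top-five TM-score of at least 0.4.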
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
|
342
|
Abstract
We developed LOMETS, a local threading meta-server, for quick and automated predictions of protein tertiary structures and spatial constraints. Nine state-of-the-art threading programs are installed and run in a local computer cluster, which ensures quicker generation of initial threading alignments than traditional remote-server-based meta-servers. Consensus models are generated from the top predictions of the component threading servers and are at least 7% more accurate than the best individual servers based on TM-score, at a t-test significance level of 0.1%. Moreover, side-chain and C-alpha (Cα) contacts of 42 and 61% accuracy, respectively, as well as long- and short-range distance maps, are automatically constructed from the threading alignments. These data can easily be used as constraints to guide ab initio procedures such as TASSER for further protein tertiary structure modeling. The LOMETS server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/LOMETS.
Affiliation(s)
- Yang Zhang
- *To whom correspondence should be addressed. Tel: +1 785 864 1948; Fax: +1 785 864 5558;
|