101
|
Jeong J, Berman P, Przytycka TM. Improving strand pairing prediction through exploring folding cooperativity. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2008; 5:484-491. [PMID: 18989036 PMCID: PMC2597093 DOI: 10.1109/tcbb.2008.88] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
The topology of beta-sheets is defined by the pattern of hydrogen-bonded strand pairing. Therefore, predicting hydrogen-bonded strand partners is a fundamental step towards predicting beta-sheet topology. At the same time, finding the correct partners is very difficult because of the long-range interactions involved in strand pairing. Additionally, the patterns of amino acids involved in beta-sheet formation are very general and therefore difficult to use for computational recognition of specific contacts between strands. In this work, we report a new strand pairing algorithm. To address the above-mentioned difficulties, our algorithm attempts to mimic elements of the folding process. Namely, in addition to ensuring that the predicted hydrogen-bonded strand pairs satisfy basic global consistency constraints, it takes into account hypothetical folding pathways. Consistent with this view, introducing hydrogen bonds between a pair of strands changes the probabilities of forming hydrogen bonds between other pairs of strands. We demonstrate that this approach provides an improvement over previously proposed algorithms. We also compare the performance of this method to that of a global optimization algorithm that poses the problem as an integer linear programming optimization problem and solves it using the ILOG CPLEX package.
Affiliation(s)
- Jieun Jeong
- Department of Computer Science and Engineering, Pennsylvania State University, University Park, PA 16802, USA
|
102
|
Wu S, Zhang Y. MUSTER: Improving protein sequence profile-profile alignments by using multiple sources of structure information. Proteins 2008; 72:547-56. [PMID: 18247410 DOI: 10.1002/prot.21945] [Citation(s) in RCA: 310] [Impact Index Per Article: 19.4] [Indexed: 11/09/2022]
Abstract
We develop a new threading algorithm MUSTER by extending the previous sequence profile-profile alignment method, PPA. It combines various sequence and structure information into single-body terms which can be conveniently used in dynamic programming search: (1) sequence profiles; (2) secondary structures; (3) structure fragment profiles; (4) solvent accessibility; (5) dihedral torsion angles; (6) hydrophobic scoring matrix. The balance of the weighting parameters is optimized by a grading search based on the average TM-score of 111 training proteins, which shows a better performance than using the conventional optimization methods based on the PROSUP database. The algorithm is tested on 500 nonhomologous proteins independent of the training sets. After removing the homologous templates with a sequence identity to the target >30%, in 224 cases, the first template alignment has the correct topology with a TM-score >0.5. Even with a more stringent cutoff by removing the templates with a sequence identity >20% or detectable by PSI-BLAST with an E-value <0.05, MUSTER is able to identify correct folds in 137 cases with the first model of TM-score >0.5. Depending on the homology cutoffs, the average TM-score of the first threading alignments by MUSTER is 5.1-6.3% higher than that by PPA. This improvement is statistically significant by the Wilcoxon signed rank test with a P-value < 1.0 × 10^-13, which demonstrates the effect of additional structural information on the protein fold recognition. The MUSTER server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/MUSTER.
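The idea of folding several per-position terms into one score that dynamic programming can consume may be sketched as follows. This is a minimal illustration, not MUSTER itself: the function names, the two toy terms (residue identity and secondary-structure match), the weights, and the linear gap penalty are all assumptions made for the example.

```python
from typing import Callable, List, Sequence

Term = Callable[[int, int], float]  # score for aligning query position i with template position j


def combined_score(terms: Sequence[Term], weights: Sequence[float]) -> Term:
    """Fold several single-body terms into one weighted per-position score."""
    def score(i: int, j: int) -> float:
        return sum(w * t(i, j) for w, t in zip(weights, terms))
    return score


def global_alignment_score(n: int, m: int, score: Term, gap: float = -1.0) -> float:
    """Needleman-Wunsch score with a linear gap penalty (a simplification)."""
    dp: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(i - 1, j - 1),  # align i with j
                dp[i - 1][j] + gap,                      # gap in template
                dp[i][j - 1] + gap,                      # gap in query
            )
    return dp[n][m]


# Hypothetical toy data: sequences and secondary-structure strings.
query_seq, templ_seq = "ACD", "ACD"
query_ss, templ_ss = "HHE", "HHE"

seq_term: Term = lambda i, j: 1.0 if query_seq[i] == templ_seq[j] else -1.0
ss_term: Term = lambda i, j: 1.0 if query_ss[i] == templ_ss[j] else 0.0

score = combined_score([seq_term, ss_term], weights=[1.0, 0.5])
total = global_alignment_score(len(query_seq), len(templ_seq), score)
print(total)  # 4.5
```

Because every term depends only on one query position and one template position, adding a further information source changes the scoring function but leaves the dynamic programming recurrence untouched.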
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, Kansas 66047, USA
|
103
|
Abstract
The long-standing problem of constructing protein structure alignments is of central importance in computational biology. The main goal is to provide an alignment of residue correspondences, in order to identify homologous residues across chains. A critical next step of this is the alignment of protein complexes and their interfaces. Here, we introduce the program CMAPi, a two-dimensional dynamic programming algorithm that, given a pair of protein complexes, optimally aligns the contact maps of their interfaces: it produces polynomial-time near-optimal alignments in the case of multiple complexes. We demonstrate the efficacy of our algorithm on complexes from PPI families listed in the SCOPPI database and from highly divergent cytokine families. In comparison to existing techniques, CMAPi generates more accurate alignments of interacting residues within families of interacting proteins, especially for sequences with low similarity. While previous methods that use an all-atom based representation of the interface have been successful, CMAPi's use of a contact map representation allows it to be more tolerant to conformational changes and thus to align more of the interaction surface. These improved interface alignments should enhance homology modeling and threading methods for predicting PPIs by providing a basis for generating template profiles for sequence-structure alignment.
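The contact map representation of an interface can be illustrated with a short sketch. It only builds the map; it does not implement the CMAPi alignment. The one-point-per-residue coordinates, the 8 Å cutoff, and the function name are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def interface_contact_map(chain_a: Sequence[Point],
                          chain_b: Sequence[Point],
                          cutoff: float = 8.0) -> List[List[bool]]:
    """Entry [i][j] is True when residue i of chain A lies within
    `cutoff` angstroms of residue j of chain B (assumed cutoff)."""
    return [[math.dist(a, b) <= cutoff for b in chain_b] for a in chain_a]


# Hypothetical coordinates, one representative atom per residue.
chain_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 5.0), (10.0, 0.0, 20.0)]

contacts = interface_contact_map(chain_a, chain_b)
print(contacts)  # [[True, False], [False, False]]
```

A map of this kind records only which residue pairs are close, so moderate conformational changes that preserve the contacts leave it unchanged.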
Affiliation(s)
- Vinay Pulim
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, USA
|
104
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Current Protocols in Protein Science 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 754] [Impact Index Per Article: 47.1] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
105
|
Ellrott K, Guo JT, Olman V, Xu Y. Improving the performance of protein threading using insertion/deletion frequency arrays. J Bioinform Comput Biol 2008; 6:585-602. [PMID: 18574864 DOI: 10.1142/s0219720008003552] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/01/2007] [Revised: 12/01/2007] [Accepted: 01/03/2008] [Indexed: 11/18/2022]
Abstract
As a protein evolves, not every part of the amino acid sequence has an equal probability of being deleted or of allowing insertions, because not every amino acid plays an equally important role in maintaining the protein structure. However, the most prevalent models in fold recognition methods treat every amino acid deletion and insertion as equally probable events. We have analyzed the alignment patterns for homologous and analogous sequences to determine patterns of insertion and deletion, and used that information to determine the statistics of insertions and deletions for different amino acids of a target sequence. We define these patterns as insertion/deletion (indel) frequency arrays (IFAs). By applying IFAs to the protein threading problem, we have been able to improve the alignment accuracy, especially for proteins with low sequence identity. We have also demonstrated that the application of this information can lead to an improvement in fold recognition.
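The abstract does not say how the frequency arrays are computed, so the following is only one plausible sketch of a position-specific deletion frequency. It assumes a gapped multiple alignment in which `-` marks a gap, counts for each target residue the fraction of homologs that have a gap in that column, and ignores insertions relative to the target; the function name and input layout are hypothetical.

```python
from typing import List, Sequence


def deletion_frequencies(target_row: str, homolog_rows: Sequence[str]) -> List[float]:
    """For each residue of the target, the fraction of aligned homologs
    that carry a gap ('-') in the same alignment column."""
    freqs: List[float] = []
    for col, residue in enumerate(target_row):
        if residue == "-":
            # Column is an insertion relative to the target; skipped in this sketch.
            continue
        deleted = sum(1 for row in homolog_rows if row[col] == "-")
        freqs.append(deleted / len(homolog_rows))
    return freqs


# Hypothetical alignment: target on top, two homologs below.
target = "AC-DE"
homologs = ["A--DE", "ACGDE"]

ifa = deletion_frequencies(target, homologs)
print(ifa)  # [0.0, 0.5, 0.0, 0.0]
```

An array like this could then scale the gap penalty at each target position in a threading alignment, so that gaps are cheaper where homologs are frequently deleted.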
Affiliation(s)
- Kyle Ellrott
- Department of Biochemistry and Molecular Biology, The University of Georgia, Athens, GA 30602, USA.
|
106
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Current Protocols in Bioinformatics 2008; Chapter 5:Unit 5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1766] [Impact Index Per Article: 110.4] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
- Ben Webb
- University of California at San Francisco, San Francisco, California
- M S Madhusudhan
- University of California at San Francisco, San Francisco, California
- David Eramian
- University of California at San Francisco, San Francisco, California
- Min-Yi Shen
- University of California at San Francisco, San Francisco, California
- Ursula Pieper
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
107
|
Cheng J. A multi-template combination algorithm for protein comparative modeling. BMC STRUCTURAL BIOLOGY 2008; 8:18. [PMID: 18366648 PMCID: PMC2311309 DOI: 10.1186/1472-6807-8-18] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Received: 01/08/2008] [Accepted: 03/17/2008] [Indexed: 11/26/2022]
Abstract
BACKGROUND Multiple protein templates are commonly used in manual protein structure prediction. However, few automated algorithms for selecting and combining multiple templates are available. RESULTS Here we develop an effective multi-template combination algorithm for protein comparative modeling. The algorithm selects templates according to the similarity significance of the alignments between template and target proteins. It combines the whole template-target alignments whose similarity significance score is close to that of the top template-target alignment within a threshold, whereas it only takes alignment fragments from a less similar template-target alignment that align with a sizable uncovered region of the target. We compare the algorithm with the traditional method of using a single top template on the 45 comparative modeling targets (i.e. easy template-based modeling targets) used in the seventh edition of Critical Assessment of Techniques for Protein Structure Prediction (CASP7). The multi-template combination algorithm improves the GDT-TS scores of predicted models by 6.8% on average. The statistical analysis shows that the improvement is significant (p-value < 10^-4). Compared with the ideal approach that always uses the best template, the multi-template approach yields only slightly better performance. During the CASP7 experiment, the preliminary implementation of the multi-template combination algorithm (FOLDpro) was ranked second among 67 servers in the category of high-accuracy structure prediction in terms of GDT-TS measure. CONCLUSION We have developed a novel multi-template algorithm to improve protein comparative modeling.
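The selection rule described in the RESULTS can be sketched roughly as follows. This is not the published implementation: it assumes that a higher score means a more significant alignment, that each hit covers one contiguous range of target positions, and that "sizable" means at least `min_uncovered` new positions. All names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

# (template name, significance score, first and last target position covered)
Hit = Tuple[str, float, int, int]


def select_templates(hits: Sequence[Hit],
                     threshold: float,
                     min_uncovered: int) -> Dict[str, List[int]]:
    """Map each chosen template to the target positions taken from it."""
    ranked = sorted(hits, key=lambda h: h[1], reverse=True)
    top_score = ranked[0][1]
    covered: Set[int] = set()
    chosen: Dict[str, List[int]] = {}
    for name, score, start, end in ranked:
        region = set(range(start, end + 1))
        if top_score - score <= threshold:
            used = region                  # close to the top hit: keep the whole alignment
        else:
            used = region - covered        # less similar: keep only the uncovered part
            if len(used) < min_uncovered:
                continue                   # too little new coverage to be worth adding
        chosen[name] = sorted(used)
        covered |= used
    return chosen


hits = [("T1", 100.0, 1, 50), ("T2", 95.0, 5, 60),
        ("T3", 40.0, 40, 90), ("T4", 30.0, 55, 62)]
selection = select_templates(hits, threshold=10.0, min_uncovered=20)
print({name: (pos[0], pos[-1]) for name, pos in selection.items()})
# {'T1': (1, 50), 'T2': (5, 60), 'T3': (61, 90)}
```

In this toy run T1 and T2 are used whole, T3 contributes only the fragment covering positions 61 to 90, and T4 is dropped because it adds no uncovered positions.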
Affiliation(s)
- Jianlin Cheng
- Department of Computer Science, Informatics Institute, University of Missouri, Columbia, MO 65211-2060, USA.
|
108
|
Poleksic A, Fienup M. Optimizing the size of the sequence profiles to increase the accuracy of protein sequence alignments generated by profile-profile algorithms. Bioinformatics 2008; 24:1145-53. [PMID: 18337259 DOI: 10.1093/bioinformatics/btn097] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION Profile-based protein homology detection algorithms are valuable tools in genome annotation and protein classification. By utilizing information present in the sequences of homologous proteins, profile-based methods are often able to detect extremely weak relationships between protein sequences, as evidenced by the large-scale benchmarking experiments such as CASP and LiveBench. RESULTS We study the relationship between the sensitivity of a profile-profile method and the size of the sequence profile, which is defined as the average number of different residue types observed at the profile's positions. We also demonstrate that improvements in the sensitivity of a profile-profile method can be made by incorporating a profile-dependent scoring scheme, such as position-specific background frequencies. The techniques presented in this article are implemented in an alignment algorithm UNI-FOLD. When tested against other well-established methods for fold recognition, UNI-FOLD shows increased sensitivity and specificity in detecting remote relationships between protein sequences. AVAILABILITY UNI-FOLD web server can be accessed at http://blackhawk.cs.uni.edu
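The profile size defined in the RESULTS, the average number of different residue types observed at the profile's positions, can be computed directly from a multiple alignment. The sketch below assumes equal-length aligned rows with `-` as the gap character and does not count gaps as a residue type; the function name is hypothetical.

```python
from typing import Sequence


def profile_size(aligned_rows: Sequence[str]) -> float:
    """Average number of distinct residue types per alignment column."""
    columns = list(zip(*aligned_rows))
    distinct = [len({r for r in col if r != "-"}) for col in columns]
    return sum(distinct) / len(distinct)


rows = ["ACD", "ACE", "GCD"]  # hypothetical three-sequence alignment
size = profile_size(rows)
print(round(size, 3))  # 1.667
```

Here the three columns contain 2, 1, and 2 residue types, giving a profile size of 5/3.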
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, Cedar Falls, IA 50614, USA.
|
109
|
Battey JND, Kopp J, Bordoli L, Read RJ, Clarke ND, Schwede T. Automated server predictions in CASP7. Proteins 2008; 69 Suppl 8:68-82. [PMID: 17894354 DOI: 10.1002/prot.21761] [Citation(s) in RCA: 94] [Impact Index Per Article: 5.9] [Indexed: 11/12/2022]
Abstract
With each round of CASP (Critical Assessment of Techniques for Protein Structure Prediction), automated prediction servers have played an increasingly important role. Today, most protein structure prediction approaches in some way depend on automated methods for fold recognition or model building. The accuracy of server predictions has significantly increased over the last years, and, in CASP7, we observed a continuation of this trend. In the template-based modeling category, the best prediction server was ranked third overall, i.e. it outperformed all but two of the human participating groups. This server also ranked among the very best predictors in the free modeling category, being clearly beaten by only one human group. In the high accuracy (HA) subset of TBM, two of the top five groups were servers. This article summarizes the contribution of automated structure prediction servers in the CASP7 experiment, with emphasis on 3D structure prediction, as well as information on their prediction scope and public availability.
|
110
|
A historical perspective of template-based protein structure prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
This chapter presents a broad, historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
|
111
|
|
112
|
Xu J, Jiao F, Yu L. Protein structure prediction using threading. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:91-121. [PMID: 18075163 DOI: 10.1007/978-1-59745-574-9_4] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022]
Abstract
This chapter discusses the protocol for computational protein structure prediction by protein threading. First, we present a general procedure and summarize some typical ideas for each step of protein threading. Then, we describe the design and implementation of RAPTOR, a protein structure prediction program based on threading. The major focuses are three key components of RAPTOR: a linear programming approach to protein threading, two machine learning approaches (SVM and Gradient Boosting) to fold recognition, and evaluation of the statistical significance of the prediction results. The first part of this chapter is a brief review of protein threading, and the second part contains original research results. Some key ideas and results have been previously published.
Affiliation(s)
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago, IL, USA
|
113
|
Pulim V, Bienkowska J, Berger B. LTHREADER: prediction of extracellular ligand-receptor interactions in cytokines using localized threading. Protein Sci 2007; 17:279-92. [PMID: 18096641 DOI: 10.1110/ps.073178108] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 10/22/2022]
Abstract
Identification of extracellular ligand-receptor interactions is important for drug design and the treatment of diseases. Difficulties in detecting these interactions using high-throughput experimental techniques motivate the development of computational prediction methods. We propose a novel threading algorithm, LTHREADER, which generates accurate local sequence-structure interface alignments and integrates various statistical scores and experimental binding data to predict interactions within ligand-receptor families. LTHREADER uses a profile of secondary structure and solvent accessibility predictions with residue contact maps to guide and constrain alignments. Using a decision tree classifier and low-throughput experimental data for training, it combines information inferred from statistical interaction potentials, energy functions, correlated mutations, and conserved residue pairs to predict interactions. We apply our method to cytokines, which play a central role in the development of many diseases including cancer and inflammatory and autoimmune disorders. We tested our approach on two representative families from different structural classes (all-alpha and all-beta proteins) of cytokines. In comparison with the state-of-the-art threader RAPTOR, LTHREADER generates on average 20% more accurate alignments of interacting residues. Furthermore, in cross-validation tests, LTHREADER correctly predicts experimentally confirmed interactions for a common binding mode within the 4-helical long-chain cytokine family with 75% sensitivity and 86% specificity with 40% gain in sensitivity compared to RAPTOR. For the TNF-like family our method achieves 70% sensitivity with 55% specificity with 70% gain in sensitivity. LTHREADER combines information from multiple complex templates when such data are available. When only one solved structure is available, a localized PSI-BLAST approach also outperforms standard threading methods with 25%-50% improvements in sensitivity.
Affiliation(s)
- Vinay Pulim
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139 USA
|
114
|
Akhavan A, Crivelli SN, Singh M, Lingappa VR, Muschler JL. SEA domain proteolysis determines the functional composition of dystroglycan. FASEB J 2007; 22:612-21. [PMID: 17905726 DOI: 10.1096/fj.07-8354com] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 11/11/2022]
Abstract
Post-translational modifications of the extracellular matrix receptor dystroglycan (DG) determine its functional state, and defects in these modifications are linked to muscular dystrophies and cancers. A prominent feature of DG biosynthesis is a precursor cleavage that segregates the ligand-binding and transmembrane domains into the noncovalently attached alpha- and beta-subunits. We investigate here the structural determinants and functional significance of this cleavage. We show that cleavage of DG elicits a conspicuous change in its ligand-binding activity. Mutations that obstruct this cleavage result in increased capacity to bind laminin, in part, due to enhanced glycosylation of alpha-DG. Reconstitution of DG cleavage in a cell-free expression system demonstrates that cleavage takes place in the endoplasmic reticulum, providing a suitable regulatory point for later processing events. Sequence and mutational analyses reveal that the cleavage occurs within a full SEA (sea urchin, enterokinase, agrin) module with traits matching those ascribed to autoproteolysis. Thus, cleavage of DG constitutes a control point for the modulation of its ligand-binding properties, with therapeutic implications for muscular dystrophies. We provide a structural model for the cleavage domain that is validated by experimental analysis and discuss this cleavage in the context of mucin protein and SEA domain evolution.
Affiliation(s)
- Armin Akhavan
- California Pacific Medical Center Research Institute, 475 Brannan St., Ste. 220, San Francisco, CA 94107, USA
|
115
|
Abstract
This review presents the advances in protein structure prediction from the computational methods perspective. The approaches are classified into four major categories: comparative modeling, fold recognition, first principles methods that employ database information, and first principles methods without database information. Important advances along with current limitations and challenges are presented.
Affiliation(s)
- C A Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
|
116
|
Kaján L, Rychlewski L. Evaluation of 3D-Jury on CASP7 models. BMC Bioinformatics 2007; 8:304. [PMID: 17711571 PMCID: PMC2040163 DOI: 10.1186/1471-2105-8-304] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.8] [Received: 04/19/2007] [Accepted: 08/21/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND 3D-Jury, the structure prediction consensus method publicly available in the Meta Server http://meta.bioinfo.pl/, was evaluated using models gathered in the 7th round of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7). 3D-Jury is an automated expert process that generates protein structure meta-predictions from sets of models obtained from partner servers. RESULTS The performance of 3D-Jury was analysed for three aspects. First, we examined the correlation between the 3D-Jury score and a model quality measure: the number of correctly predicted residues. The 3D-Jury score was shown to correlate significantly with the number of correctly predicted residues; the correlation is good enough to be used for prediction. 3D-Jury was also found to improve upon the competing servers' choice of the best structure model in most cases. The value of the 3D-Jury score as a generic reliability measure was also examined. We found that the 3D-Jury score separates bad models from good models better than the reliability score of the original server in 27 cases and falls short of it in only 5 cases out of a total of 38. We report the release of a new Meta Server feature: instant 3D-Jury scoring of uploaded user models. CONCLUSION The 3D-Jury score continues to be a good indicator of structural model quality. It also provides a generic reliability score, especially important for models that were not assigned such by the original server. Individual structure modellers can also benefit from the 3D-Jury scoring system by testing their models in the new instant scoring feature http://meta.bioinfo.pl/compare_your_model_example.pl available in the Meta Server.
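The abstract does not give the 3D-Jury scoring formula, so the sketch below shows only the general idea of consensus scoring that meta-predictors rely on: a model that resembles many other models receives a high score. The similarity matrix, the averaging rule, and the function name are assumptions for illustration and should not be read as the actual 3D-Jury procedure.

```python
from typing import List, Sequence


def consensus_scores(similarity: Sequence[Sequence[float]]) -> List[float]:
    """Score each model by its mean similarity to all other models."""
    n = len(similarity)
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


# Hypothetical pairwise similarities between three server models.
similarity = [
    [0.0, 80.0, 70.0],
    [80.0, 0.0, 20.0],
    [70.0, 20.0, 0.0],
]

scores = consensus_scores(similarity)
best = max(range(len(scores)), key=scores.__getitem__)
print(scores, best)  # [75.0, 50.0, 45.0] 0
```

A score of this kind is independent of each server's own confidence value, which is why it can serve as a generic reliability measure across servers.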
Affiliation(s)
- László Kaján
- BioInfoBank Institute, ul. Limanowskiego 24 A, 60-744 Poznań, Poland
- Leszek Rychlewski
- BioInfoBank Institute, ul. Limanowskiego 24 A, 60-744 Poznań, Poland
- Bioinformatics Unit, Department of Physics, Adam Mickiewicz University, ul. Umultowska 85, 61-614 Poznań, Poland
|
117
|
Mirkovic N, Li Z, Parnassa A, Murray D. Strategies for high-throughput comparative modeling: applications to leverage analysis in structural genomics and protein family organization. Proteins 2007; 66:766-77. [PMID: 17154423 DOI: 10.1002/prot.21191] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/09/2022]
Abstract
The technological breakthroughs in structural genomics were designed to facilitate the solution of a sufficient number of structures, so that as many protein sequences as possible can be structurally characterized with the aid of comparative modeling. The leverage of a solved structure is the number and quality of the models that can be produced using the structure as a template for modeling and may be viewed as the "currency" with which the success of a structural genomics endeavor can be measured. Moreover, the models obtained in this way should be valuable to all biologists. To this end, at the Northeast Structural Genomics Consortium (NESG), a modular computational pipeline for automated high-throughput leverage analysis was devised and used to assess the leverage of the 186 unique NESG structures solved during the first phase of the Protein Structure Initiative (January 2000 to July 2005). Here, the results of this analysis are presented. The number of sequences in the nonredundant protein sequence database covered by quality models produced by the pipeline is approximately 39,000, so that the average leverage is approximately 210 models per structure. Interestingly, only 7900 of these models fulfill the stringent modeling criterion of being at least 30% sequence-identical to the corresponding NESG structures. This study shows how high-throughput modeling increases the efficiency of structure determination efforts by providing enhanced coverage of protein structure space. In addition, the approach is useful in refining the boundaries of structural domains within larger protein sequences, subclassifying sequence diverse protein families, and defining structure-based strategies specific to a particular family.
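The average leverage quoted in the abstract follows from the reported counts, as a quick check shows. The variable names are illustrative, and the inputs are the approximate figures given above, so the results are approximate as well.

```python
def average_leverage(num_models: int, num_structures: int) -> float:
    """Models produced per solved structure."""
    return num_models / num_structures


models = 39_000      # sequences covered by quality models (approximate)
structures = 186     # unique NESG structures
stringent = 7_900    # models at least 30% sequence-identical to the template

leverage = average_leverage(models, structures)
stringent_share = 100 * stringent / models

print(round(leverage))            # 210
print(round(stringent_share, 1))  # 20.3
```

So roughly one model in five meets the stringent 30% identity criterion, and the remainder come from more distant template relationships.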
Affiliation(s)
- Nebojsa Mirkovic
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, New York 10021, USA
|
118
|
Abstract
We developed LOMETS, a local threading meta-server, for quick and automated predictions of protein tertiary structures and spatial constraints. Nine state-of-the-art threading programs are installed and run on a local computer cluster, which ensures the quick generation of initial threading alignments compared with traditional remote-server-based meta-servers. Consensus models, generated from the top predictions of the component threading servers, are at least 7% more accurate than the best individual servers based on TM-score at a t-test significance level of 0.1%. Moreover, side-chain and C-alpha (Cα) contacts of 42% and 61% accuracy, respectively, as well as long- and short-range distance maps, are automatically constructed from the threading alignments. These data can be easily used as constraints to guide ab initio procedures such as TASSER for further protein tertiary structure modeling. The LOMETS server is freely available to the academic community at http://zhang.bioinformatics.ku.edu/LOMETS.
Affiliation(s)
- Yang Zhang
- *To whom correspondence should be addressed. Tel: +1 785 864 1948; Fax: +1 785 864 5558;
|
119
|
Palmer CA, Hollis DM, Watts RA, Houck LD, McCall MA, Gregg RG, Feldhoff PW, Feldhoff RC, Arnold SJ. Plethodontid modulating factor, a hypervariable salamander courtship pheromone in the three-finger protein superfamily. FEBS J 2007; 274:2300-10. [PMID: 17419731 DOI: 10.1111/j.1742-4658.2007.05766.x] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Abstract
The soluble members of the three-finger protein superfamily all share a relatively simple 'three-finger' structure, yet perform radically different functions. Plethodontid modulating factor (PMF), a pheromone protein produced by the lungless salamander, Plethodon shermani, is a new and unusual member of this group. It affects female receptivity when delivered to the female's nares during courtship. As with other plethodontid pheromone genes, PMF is hyperexpressed in a specialized male mental (chin) gland. Unlike other plethodontid pheromone genes, however, PMF is also expressed at low levels in the skin, liver, intestine and kidneys of both sexes. The PMF sequences obtained from all tissue types were highly variable, with 103 unique haplotypes identified which averaged 35% sequence dissimilarity (range 1-60%) at the protein level. Despite this variation, however, all PMF sequences contained a conserved approximately 20-amino-acid secretion signal sequence and a pattern of eight cysteines that is also found in cytotoxins and short neurotoxins from snake venoms, as well as xenoxins from Xenopus. Although they share a common cysteine pattern, PMF isoforms differ from other three-finger proteins in: (a) amino-acid composition outside of the conserved motif; (b) length of the three distinguishing 'fingers'; (c) net charge at neutral pH. Whereas most three-finger proteins have a net positive charge at pH 7.0, PMF has a high net negative charge at neutral pH (pI range of most PMFs 3.5-4.0). Sequence comparisons suggest that PMF belongs to a distinct multigene subfamily within the three-finger protein superfamily.
Affiliation(s)
- Catherine A Palmer
- Department of Zoology, Oregon State University, Corvallis, OR 97331-2914, USA.
|
120
|
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.
|
121
|
Abstract
BACKGROUND Structure matching plays an important part in understanding the functional role of biological structures. Bioinformatics assists in this effort by reformulating this process into a problem of finding a maximum common subgraph between graphical representations of these structures. Among the many different variants of the maximum common subgraph problem, the maximum common induced subgraph of two graphs is of special interest. RESULTS Based on current research in the area of parameterized computation, we derive a new lower bound for the exact algorithms of the maximum common induced subgraph of two graphs which is the best currently known. Then we investigate the upper bound and design techniques for approaching this problem, specifically, reducing it to one of finding a maximum clique in the product graph of the two given graphs. Considering the upper bound result, the derived lower bound result is asymptotically tight. CONCLUSION Parameterized computation is a viable approach with great potential for investigating many applications within bioinformatics, such as the maximum common subgraph problem studied in this paper. With an improved hardness result and the proposed approaches in this paper, future research can be focused on further exploration of efficient approaches for different variants of this problem within the constraints imposed by real applications.
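The reduction mentioned in the abstract can be illustrated on small graphs. The sketch below assumes that "product graph" means the standard modular product, and it uses a plain Bron-Kerbosch clique search; the function names are hypothetical and this is not the authors' implementation. The search is exponential in the worst case, so it is only suitable for small inputs.

```python
from itertools import combinations


def modular_product(nodes1, edges1, nodes2, edges2):
    """Build the modular product of two undirected graphs.

    Vertices are pairs (u, v). Two pairs (u1, v1) and (u2, v2) are adjacent
    when u1 != u2, v1 != v2, and u1-u2 is an edge exactly when v1-v2 is.
    """
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    vertices = [(u, v) for u in nodes1 for v in nodes2]
    adj = {p: set() for p in vertices}
    for p, q in combinations(vertices, 2):
        (u1, v1), (u2, v2) = p, q
        if u1 == u2 or v1 == v2:
            continue
        if (frozenset((u1, u2)) in e1) == (frozenset((v1, v2)) in e2):
            adj[p].add(q)
            adj[q].add(p)
    return adj


def max_clique(adj):
    """Exhaustive Bron-Kerbosch search with pivoting."""
    best = []

    def expand(r, p, x):
        nonlocal best
        if not p and not x:
            if len(r) > len(best):
                best = list(r)
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r + [v], p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand([], set(adj), set())
    return best


def max_common_induced_subgraph(nodes1, edges1, nodes2, edges2):
    """Return a largest vertex mapping as a list of (u, v) pairs."""
    return max_clique(modular_product(nodes1, edges1, nodes2, edges2))
```

For example, the path a-b-c and a triangle share a common induced subgraph of only two vertices, because the non-adjacent pair (a, c) has no counterpart in the triangle.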
Affiliation(s)
- Xiuzhen Huang
- Department of Computer Science, Arkansas State University, State University, Arkansas 72467, USA
- Jing Lai
- Department of Applied Science, University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA
- Steven F Jennings
- Department of Information Science, University of Arkansas at Little Rock, Little Rock, Arkansas 72204, USA
|
122
|
Abstract
Recently, we developed a pairwise structural alignment algorithm using realistic structural and environmental information (SAUCE). In this paper, we first present an automatic hierarchical fold classification based on SAUCE alignments. This classification enables us to build a fold tree containing different levels of multiple structural profiles. We then describe a tree-based fold search algorithm. We applied this method to a group of structures with sequence identity below 35% and ran a series of leave-one-out tests. These tests are approximately comparable to fold recognition tests at the superfamily level. The results show that fold recognition via a fold tree can be faster and better at detecting distant homologues than classic fold recognition methods.
Affiliation(s)
- Yu Chen
- Bioinformatics Program, University of Michigan, Ann Arbor, Michigan 48109-1065, USA
|
123
|
Song Y, Liu C, Huang X, Malmberg RL, Xu Y, Cai L. Efficient parameterized algorithms for biopolymer structure-sequence alignment. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2006; 3:423-32. [PMID: 17085850 DOI: 10.1109/tcbb.2006.52] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 05/12/2023]
Abstract
Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity, but also spatially conserved conformations caused by residue interactions and, consequently, is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where, in general, the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity $O(k^t N^2)$ for the structure of N residues and its tree decomposition of width t. Parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. This paper demonstrates a successful application of the algorithm to RNA structure search used for noncoding RNA identification. An application to protein threading is also discussed.
Affiliation(s)
- Yinglei Song
- Department of Computer Science, University of Georgia, Athens, GA 30602, USA.
|
124
|
Ma B, Wu L, Zhang K. Improving the sensitivity and specificity of protein homology search by incorporating predicted secondary structures. J Bioinform Comput Biol 2006; 4:709-20. [PMID: 16960971 DOI: 10.1142/s0219720006002119] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/30/2005] [Revised: 12/30/2005] [Accepted: 12/30/2005] [Indexed: 11/18/2022]
Abstract
In this paper, we improve homology search performance by combining predicted protein secondary structures with protein sequences. Previous research suggested that a straightforward combination of predicted secondary structures did not improve homology search performance, mostly because of errors in the structure prediction. We solved this problem by taking into account the confidence scores output by the prediction programs.
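The abstract does not say how the confidence scores enter the alignment score. One plausible way, shown here purely as an illustration, is to scale the secondary-structure term for an aligned residue pair by the product of the two prediction confidences, so that unreliable predictions contribute little. The function names, the weighting scheme, and the default parameters are all assumptions, not the authors' method.

```python
def ss_term(ss_a, conf_a, ss_b, conf_b, bonus=1.0, penalty=1.0):
    """Secondary-structure contribution for one aligned residue pair.

    ss_a, ss_b: predicted states, e.g. 'H', 'E' or 'C'.
    conf_a, conf_b: prediction confidences scaled to [0, 1] (assumed scale).
    Agreement is rewarded and disagreement penalized, both scaled by the
    product of the confidences.
    """
    weight = conf_a * conf_b
    return weight * (bonus if ss_a == ss_b else -penalty)


def combined_score(subst_score, ss_a, conf_a, ss_b, conf_b, ss_weight=0.5):
    """Substitution score plus a confidence-weighted secondary-structure term."""
    return subst_score + ss_weight * ss_term(ss_a, conf_a, ss_b, conf_b)
```

With this scheme, a confident agreement raises the pair score, a confident disagreement lowers it, and a prediction with zero confidence leaves the sequence-only score unchanged.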
Affiliation(s)
- Bin Ma
- Computer Science Department, University of Western Ontario, London, ON N6A 5B7, Canada.
|
125
|
McDonnell AV, Menke M, Palmer N, King J, Cowen L, Berger B. Fold recognition and accurate sequence-structure alignment of sequences directing beta-sheet proteins. Proteins 2006; 63:976-85. [PMID: 16547930 DOI: 10.1002/prot.20942] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/11/2022]
Abstract
The ability to predict structure from sequence is particularly important for toxins, virulence factors, allergens, cytokines, and other proteins of public health importance. Many such functions are represented in the parallel beta-helix and beta-trefoil families. A method using pairwise beta-strand interaction probabilities coupled with evolutionary information represented by sequence profiles is developed to tackle these problems for the beta-helix and beta-trefoil folds. The algorithm BetaWrapPro employs a "wrapping" component that may capture folding processes with an initiation stage followed by processive interaction of the sequence with the already-formed motifs. BetaWrapPro outperforms all previous motif recognition programs for these folds, recognizing the beta-helix with 100% sensitivity and 99.7% specificity and the beta-trefoil with 100% sensitivity and 92.5% specificity, in cross-validation on a database of all nonredundant known positive and negative examples of these fold classes in the PDB. It additionally aligns 88% of residues for the beta-helices and 86% for the beta-trefoils accurately (within four residues of the exact position) to the structural template, which is then used with the side-chain packing program SCWRL to produce 3D structure predictions. One striking result has been the prediction of an unexpected parallel beta-helix structure for a pollen allergen, and its recent confirmation through solution of its structure. A Web server running BetaWrapPro is available and outputs putative PDB-style coordinates for sequences predicted to form the target folds.
Affiliation(s)
- Andrew V McDonnell
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
126
|
Zhou H, Zhou Y. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins 2006; 58:321-8. [PMID: 15523666 PMCID: PMC1408319 DOI: 10.1002/prot.20308] [Citation(s) in RCA: 195] [Impact Index Per Article: 10.8] [Indexed: 11/09/2022]
Abstract
Recognizing structural similarity without significant sequence identity has proved to be a challenging task. Sequence-based and structure-based methods as well as their combinations have been developed. Here, we propose a fold-recognition method that incorporates structural information without the need for sequence-to-structure threading. This is accomplished by generating sequence profiles from protein structural fragments. The structure-derived sequence profiles allow a simple integration with evolution-derived sequence profiles and secondary-structural information for an optimized alignment by efficient dynamic programming. The resulting method (called SP³) is found to make a statistically significant improvement in both sensitivity of fold recognition and accuracy of alignment over the method based on evolution-derived sequence profiles alone (SP) and the method based on evolution-derived sequence profile and secondary structure profile (SP²). SP³ was tested in the SALIGN benchmark for alignment accuracy and in the Lindahl, PROSPECTOR 3.0, and LiveBench 8.0 benchmarks for remote-homology detection and model accuracy. SP³ is found to be the most sensitive and accurate single-method server in all benchmarks tested where other methods are available for comparison (although its results are statistically indistinguishable from the next best in some cases, and the comparison is subject to the limitation of the time-dependent sequence and/or structural libraries used by different methods). In LiveBench 8.0, its accuracy rivals some of the consensus methods such as ShotGun-INBGU, Pmodeller3, Pcons4, and ROBETTA. The SP³ fold-recognition server is available at http://theory.med.buffalo.edu.
Affiliation(s)
- Yaoqi Zhou
- *Correspondence to: Dr. Yaoqi Zhou, Howard Hughes Medical Institute, Center for Single Molecule Biophysics and Department of Physiology & Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214. E-mail:
|
127
|
Marsh L. Evolution of Structural Shape in Bacterial Globin-Related Proteins. J Mol Evol 2006; 62:575-87. [PMID: 16612536 DOI: 10.1007/s00239-005-0025-3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 01/29/2005] [Accepted: 12/31/2005] [Indexed: 10/24/2022]
Abstract
The globin family of proteins has a characteristic structural pattern of helix interactions that nonetheless exhibits some variation. A simplified model for globin structural evolution was developed in which protein shape evolved by random change of contacts between helices. A conserved globin domain of 15 bacterial proteins representing four structural families was studied. Using a parsimony approach, ancestral structural states could be reconstructed. The distribution of the number of contact changes per site for a fixed-topology tree fit a gamma distribution. Homoplasy was high, with multiple changes per site and no support for an invariant class of residue-residue contacts. Contacts changed more slowly than sequence. A phylogenetic reconstruction using a distance measure based on the proportion of shared contacts was generally consistent with a sequence-based phylogeny but not highly resolved. Contact pattern convergence between members of different globin family proteins could not be detected. Simulation studies indicated the convergence test was sensitive enough to have detected convergence involving only 10% of the contacts, suggesting a limit on the extent of selection for a specific contact pattern. Contact site methods may provide additional approaches to study the relationship between protein structure and sequence evolution.
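A distance based on the proportion of shared contacts can be sketched as follows. The abstract does not give the normalization, so this example assumes contacts are unordered pairs of aligned site indices and normalizes by the number of contacts present in either structure (a Jaccard-style distance); the function name is hypothetical.

```python
def contact_distance(contacts_a, contacts_b):
    """Distance = 1 - (shared contacts / contacts present in either structure).

    Each contact is a pair of aligned site indices; (i, j) and (j, i) are
    treated as the same contact.
    """
    a = {frozenset(c) for c in contacts_a}
    b = {frozenset(c) for c in contacts_b}
    union = a | b
    if not union:
        return 0.0  # two structures without contacts are treated as identical
    return 1.0 - len(a & b) / len(union)
```

Two structures with identical contact sets are at distance 0 and two with no contacts in common are at distance 1, so a matrix of these values could be passed to a standard distance-based tree-building method.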
Affiliation(s)
- Lorraine Marsh
- Department of Biology, Long Island University, 1 University Plaza, Brooklyn, NY 11201, USA.
|
128
|
Fischer D. Servers for protein structure prediction. Curr Opin Struct Biol 2006; 16:178-82. [PMID: 16546376 DOI: 10.1016/j.sbi.2006.03.004] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Received: 01/23/2006] [Revised: 02/14/2006] [Accepted: 03/07/2006] [Indexed: 11/18/2022]
Abstract
The 1990s cultivated a generation of protein structure human predictors. As a result of structural genomics and genome sequencing projects, and significant improvements in the performance of protein structure prediction methods, a generation of automated servers has evolved in the past few years. Servers for close and distant homology modeling are now routinely used by many biologists, and have already been applied to the experimental structure determination process itself, and to the interpretation and annotation of genome sequences. Because dozens of servers are currently available, it is hard for a biologist to know which server(s) to use; however, the state of the art of these methods is now assessed through the LiveBench and CAFASP experiments. Meta-servers--servers that use the results of other autonomous servers to produce a consensus prediction--have proven to be the best performers, and are already challenging all but a handful of expert human predictors. The difference in performance of the top ten autonomous (non-meta) servers is small and hard to assess using relatively small test sets. Recent experiments suggest that servers will soon free humans from most of the burden of protein structure prediction.
Affiliation(s)
- Daniel Fischer
- Buffalo Center of Excellence in Bioinformatics, and Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA.
|
129
|
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
|
130
|
Huang YM, Bystroff C. Improved pairwise alignments of proteins in the Twilight Zone using local structure predictions. Bioinformatics 2005; 22:413-22. [PMID: 16352653 DOI: 10.1093/bioinformatics/bti828] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/13/2022]
Abstract
MOTIVATION In recent years, advances have been made in the ability of computational methods to discriminate between homologous and non-homologous proteins in the 'twilight zone' of sequence similarity, where the percent sequence identity is a poor indicator of homology. To make these predictions more valuable to the protein modeler, they must be accompanied by accurate alignments. Pairwise sequence alignments are inferences of orthologous relationships between sequence positions. Evolutionary distance is traditionally modeled using global amino acid substitution matrices. But real differences in the likelihood of substitutions may exist for different structural contexts within proteins, since structural context contributes to the selective pressure. RESULTS HMMSUM (HMMSTR-based substitution matrices) is a new model for structural context-based amino acid substitution probabilities consisting of a set of 281 matrices, each for a different sequence-structure context. HMMSUM does not require the structure of the protein to be known. Instead, predictions of local structure are made using HMMSTR, a hidden Markov model for local structure. Alignments using the HMMSUM matrices compare favorably to alignments carried out using the BLOSUM matrices or structure-based substitution matrices SDM and HSDM when validated against remote homolog alignments from BAliBASE. HMMSUM has been implemented using local Dynamic Programming and with the Bayesian Adaptive alignment method.
Affiliation(s)
- Yao-Ming Huang
- Center for Bioinformatics, Department of Biology, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
|
131
|
Wallner B, Elofsson A. Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics 2005; 21:4248-54. [PMID: 16204344 DOI: 10.1093/bioinformatics/bti702] [Citation(s) in RCA: 91] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022]
Abstract
MOTIVATION The success of the consensus approach to the protein structure prediction problem has led to development of several different consensus methods. Most of them only rely on a structural comparison of a number of different models. However, there are other types of information that might be useful such as the score from the server and structural evaluation. RESULTS Pcons5 is a new and improved version of the consensus predictor Pcons. Pcons5 integrates information from three different sources: the consensus analysis, structural evaluation and the score from the fold recognition servers. We show that Pcons5 is better than the previous version of Pcons and that it performs better than using only the consensus analysis. In addition, we also present a version of Pmodeller based on Pcons5, which performs significantly better than Pcons5. AVAILABILITY Pcons5 is the first Pcons version available as a standalone program from http://www.sbc.su.se/~bjorn/Pcons5. It should be easy to implement in local meta-servers.
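The idea of combining a consensus analysis with two further scores can be sketched as below. This is a simplified illustration, not the Pcons5 scoring function: the consensus score of a model is taken as its mean structural similarity to all other models, and the three sources are combined with a weighted sum whose weights are arbitrary placeholders. All names are hypothetical.

```python
def consensus_scores(similarity):
    """Mean similarity of each model to every other model.

    similarity[i][j] is a structural similarity in [0, 1] between models i and j.
    """
    n = len(similarity)
    if n < 2:
        return [0.0] * n
    return [
        sum(similarity[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]


def rank_models(similarity, quality, server_score, weights=(1.0, 0.5, 0.5)):
    """Rank model indices by a weighted sum of the three information sources."""
    w_cons, w_qual, w_serv = weights  # placeholder weights, not fitted values
    cons = consensus_scores(similarity)
    total = [
        w_cons * c + w_qual * q + w_serv * s
        for c, q, s in zip(cons, quality, server_score)
    ]
    return sorted(range(len(total)), key=lambda i: total[i], reverse=True)
```

A model that resembles many other models gets a high consensus score, while the structural evaluation and server score terms can break ties between models with similar consensus support.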
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, Stockholm University SE-106 91 Stockholm, Sweden.
|
132
|
Xu J. Fold recognition by predicted alignment accuracy. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2005; 2:157-65. [PMID: 17044180 DOI: 10.1109/tcbb.2005.24] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 05/12/2023]
Abstract
One of the key components in protein structure prediction by the protein threading technique is to choose the best overall template for a given target sequence after all the optimal sequence-template alignments are generated. The chosen template should have the best alignment with the target sequence since the three-dimensional structure of the target sequence is built on the sequence-template alignment. The traditional method for template selection is the Z-score, which uses a statistical test to rank all the sequence-template alignments and then chooses the first-ranked template for the sequence. However, the calculation of the Z-score is time-consuming and not suitable for genome-scale structure prediction. Z-scores are also hard to interpret when the threading scoring function is the weighted sum of several energy items of different physical meanings. This paper presents a Support Vector Machine (SVM) regression approach to directly predict the alignment accuracy of a sequence-template alignment, which is used to rank all the templates for a specific target sequence. Experimental results on a large-scale benchmark demonstrate that SVM regression performs much better than the composition-corrected Z-score method. SVM regression also runs much faster than the Z-score method.
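The cost of the Z-score comes from the background distribution it needs. A common way to obtain it, assumed here, is to re-score shuffled versions of the target sequence against the same template; the sketch below uses hypothetical names and takes the scoring function `score_fn` as a parameter, since the abstract does not specify one.

```python
import random
import statistics


def z_score(raw_score, background_scores):
    """Standardize a raw score against a background score distribution."""
    mean = statistics.mean(background_scores)
    std = statistics.pstdev(background_scores)
    if std == 0:
        raise ValueError("background scores have zero variance")
    return (raw_score - mean) / std


def threading_z_score(sequence, template, score_fn, n_shuffles=100, seed=0):
    """Z-score of one sequence-template pair from shuffled-sequence scores.

    score_fn(sequence, template) is assumed to return the optimal alignment
    score; it is called n_shuffles + 1 times for every template.
    """
    rng = random.Random(seed)
    raw = score_fn(sequence, template)
    residues = list(sequence)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        background.append(score_fn("".join(residues), template))
    return z_score(raw, background)
```

Each template requires `n_shuffles` additional alignments, which is the overhead that a direct regression on alignment features avoids.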
Affiliation(s)
- Jinbo Xu
- School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1.
|
133
|
Efficient Parameterized Algorithm for Biopolymer Structure-Sequence Alignment. LECTURE NOTES IN COMPUTER SCIENCE 2005. [PMCID: PMC7121179 DOI: 10.1007/11557067_31] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Computational alignment of a biopolymer sequence (e.g., an RNA or a protein) to a structure is an effective approach to predict and search for the structure of new sequences. To identify the structure of remote homologs, the structure-sequence alignment has to consider not only sequence similarity but also spatially conserved conformations caused by residue interactions, and consequently is computationally intractable. It is difficult to cope with the inefficiency without compromising alignment accuracy, especially for structure search in genomes or large databases. This paper introduces a novel method and a parameterized algorithm for structure-sequence alignment. Both the structure and the sequence are represented as graphs, where in general the graph for a biopolymer structure has a naturally small tree width. The algorithm constructs an optimal alignment by finding in the sequence graph the maximum valued subgraph isomorphic to the structure graph. It has the computational time complexity $O(k^t N^2)$ for the structure of N residues and its tree decomposition of width t. The parameter k, small in nature, is determined by a statistical cutoff for the correspondence between the structure and the sequence. The paper demonstrates a successful application of the algorithm to developing a fast program for RNA structural homology search.
|
134
|
Floudas CA. Research challenges, opportunities and synergism in systems engineering and computational biology. AIChE J 2005. [DOI: 10.1002/aic.10620] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
135
|
Rapid Protein Side-Chain Packing via Tree Decomposition. LECTURE NOTES IN COMPUTER SCIENCE 2005. [DOI: 10.1007/11415770_32] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.0] [Indexed: 01/26/2023]
|
136
|
Xu J, Jiao F, Berger B. A tree-decomposition approach to protein structure prediction. PROCEEDINGS. IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE 2005:247-56. [PMID: 16447982 DOI: 10.1109/csb.2005.9] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 05/06/2023]
Abstract
This paper proposes a tree decomposition of protein structures, which can be used to efficiently solve two key subproblems of protein structure prediction: protein threading for backbone prediction and protein side-chain prediction. To develop a unified tree-decomposition based approach to these two subproblems, we model them as a geometric neighborhood graph labeling problem. Theoretically, we can have a low-degree polynomial time algorithm to decompose a geometric neighborhood graph $G = (V, E)$ into components with size $O(|V|^{2/3} \log |V|)$. The computational complexity of the tree-decomposition based graph labeling algorithms is $O(|V| \Delta^{tw+1})$, where $\Delta$ is the average number of possible labels for each vertex and $tw$ $(= O(|V|^{2/3} \log |V|))$ is the tree width of G. Empirically, tw is very small and the tree-decomposition method can solve these two problems very efficiently. This paper also compares the computational efficiency of the tree-decomposition approach with the linear programming approach to these two problems and identifies the condition under which the tree-decomposition approach is more efficient than the linear programming approach. Experimental results indicate that the tree-decomposition approach is more efficient most of the time.
Affiliation(s)
- Jinbo Xu
- Department of Mathematics and CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
|
137
|
Zhou H, Zhou Y. Single-body residue-level knowledge-based energy score combined with sequence-profile and secondary structure information for fold recognition. Proteins 2004; 55:1005-13. [PMID: 15146497 DOI: 10.1002/prot.20007] [Citation(s) in RCA: 163] [Impact Index Per Article: 8.2] [Indexed: 11/11/2022]
Abstract
An elaborate knowledge-based energy function is designed for fold recognition. It is a residue-level single-body potential, so that a highly efficient dynamic programming method can be used for alignment optimization. It contains a backbone torsion term, a buried surface term, and a contact-energy term. The energy score combined with sequence profile and secondary structure information leads to an algorithm called SPARKS (Sequence, secondary structure Profiles and Residue-level Knowledge-based energy Score) for fold recognition. Compared with the popular PSI-BLAST, SPARKS is 21% more accurate in sequence-sequence alignment in the ProSup benchmark and 10%, 25%, and 20% more sensitive in detecting family, superfamily, and fold similarities in the Lindahl benchmark, respectively. Moreover, it is one of the best methods for sensitivity (the number of correctly recognized proteins), alignment accuracy (based on the MaxSub score), and specificity (the average number of correctly recognized proteins whose scores are higher than the first false positives) in LiveBench 7 among more than twenty servers of non-consensus methods. The simple algorithm used in SPARKS has the potential for further improvement. This highly efficient method can be used for fold recognition on genomic scales. A web server is established for academic users at http://theory.med.buffalo.edu.
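The reason a single-body score permits dynamic programming is that the score for placing query residue i at template position j depends only on the pair (i, j), not on where other residues are placed. A minimal sketch of such an alignment follows; it assumes a precomputed match matrix (which could hold any weighted sum of energy, profile and secondary-structure terms) and a linear gap penalty. Both the function name and the gap model are illustrative assumptions, not the SPARKS implementation.

```python
def align_score(score, gap=-1.0):
    """Optimal global alignment score for a precomputed match matrix.

    score[i][j] is the score of placing query residue i at template
    position j. Every gap position costs `gap` (linear gap penalty).
    Runs in O(n * m) time and O(m) memory.
    """
    n = len(score)
    m = len(score[0]) if n else 0
    prev = [j * gap for j in range(m + 1)]  # row for zero query residues
    for i in range(1, n + 1):
        cur = [i * gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + score[i - 1][j - 1],  # align residue i to position j
                prev[j] + gap,                      # query residue i unaligned
                cur[j - 1] + gap,                   # template position j unaligned
            )
        prev = cur
    return prev[m]
```

A pairwise contact potential would break this recurrence, because the score of one placement would then depend on the placement of other residues.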
Affiliation(s)
- Hongyi Zhou
- Howard Hughes Medical Institute Center for Single Molecule Biophysics, Department of Physiology & Biophysics, State University of New York at Buffalo, New York 14214, USA
|
138
|
Friedberg I, Jaroszewski L, Ye Y, Godzik A. The interplay of fold recognition and experimental structure determination in structural genomics. Curr Opin Struct Biol 2004; 14:307-12. [PMID: 15193310 DOI: 10.1016/j.sbi.2004.04.005] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
Achieving the goals of structural genomics initiatives depends on the outcomes of two groups of factors: the number and distribution of experimentally determined protein structures, and our ability to assign novel proteins to known structures (fold recognition) and use them to build models (modeling). The quality of the tools used for fold recognition defines the scope of experimental effort - the more distant the templates that can be recognized, the smaller the number of proteins that have to be solved. Recent improvements in fold recognition may have suggested that the goals of structural genomics initiatives are getting closer. However, problems that surfaced during the first few years of active work have put many of the early estimates in doubt and new ones are still slow in coming.
Affiliation(s)
- Iddo Friedberg
- The Burnham Institute, 10901 North Torrey Pines Road, La Jolla, California 92037, USA
|
139
|
Abstract
We have developed a new algorithm based on the mathematical theory of linear programming (LP) and implemented it in our program RAPTOR. Our new approach provides an elegant formulation of the protein-threading problem, overcomes the intractability of protein threading in practice, and allows us to use existing powerful linear programming software to obtain optimal protein threading solutions. CASP5 and CAFASP3 gave us the first chance to test RAPTOR in an unbiased way. RAPTOR was ranked as the top individual (automatic) server for fold recognition by the CAFASP3 organizers. In this short article, we describe RAPTOR's LP formulation, assess RAPTOR's performance in CAFASP3/CASP5, explain why it has superseded other existing automatic individual methods, and point out its strengths, limitations, extensions, and prospects for improvement.
Affiliation(s)
- Jinbo Xu
- Department of Computer Science, University of Waterloo, Waterloo, Canada. j3xu,
|
140
|
|