1
Simpkin AJ, Mesdaghi S, Sánchez Rodríguez F, Elliott L, Murphy DL, Kryshtafovych A, Keegan RM, Rigden DJ. Tertiary structure assessment at CASP15. Proteins 2023; 91:1616-1635. [PMID: 37746927 PMCID: PMC10792517 DOI: 10.1002/prot.26593] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 05/24/2023] [Revised: 08/25/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
The results of tertiary structure assessment at CASP15 are reported. For the first time, recognizing the outstanding performance of AlphaFold 2 (AF2) at CASP14, all single-chain predictions were assessed together, irrespective of whether a template was available. At CASP15, there was no single stand-out group, with most of the best-scoring groups (led by PEZYFoldings, UM-TBM, and Yang Server) employing AF2 in one way or another. Many top groups paid special attention to generating deep Multiple Sequence Alignments (MSAs) and testing variant MSAs, thereby allowing them to successfully address some of the hardest targets. Such difficult targets, as well as lacking templates, were typically proteins with few homologues. Local divergence between prediction and target correlated with localization at crystal lattice or chain interfaces, and with regions exhibiting high B-factors in crystal structure targets, and should not necessarily be considered as representing error in the prediction. However, analysis of exposed and buried side chain accuracy showed room for improvement even in the latter. Nevertheless, a majority of groups produced high-quality predictions for most targets, which are valuable for experimental structure determination, functional analysis, and many other tasks across biology. These include those applying methods similar to those used to generate major resources such as the AlphaFold Protein Structure Database and the ESM Metagenomic Atlas: the confidence estimates of the former were also notably accurate.
Affiliation(s)
- Adam J. Simpkin
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Filomeno Sánchez Rodríguez
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Oxfordshire, UK
- Department of Chemistry, York Structural Biology Laboratory, University of York, York, UK
- Luc Elliott
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- David L. Murphy
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
- Ronan M. Keegan
- UKRI‐STFC, Rutherford Appleton Laboratory, Research Complex at Harwell, Didcot, UK
- Daniel J. Rigden
- Department of Biochemistry, Cell and Systems Biology, Institute of Structural, Molecular and Integrative Biology, University of Liverpool, Liverpool, UK
2
Parmar M, Thumar R, Sheth J, Patel D. Designing multi-epitope based peptide vaccine targeting spike protein SARS-CoV-2 B1.1.529 (Omicron) variant using computational approaches. Struct Chem 2022; 33:2243-2260. [PMID: 36160688 PMCID: PMC9485025 DOI: 10.1007/s11224-022-02027-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Accepted: 08/02/2022] [Indexed: 10/26/2022]
Abstract
Millions of people have been infected since the SARS-CoV-2 outbreak in 2019. The high human-to-human transmission rate has warranted a need for a vaccine to protect people. Although some vaccines are in use, owing to the high mutation rate across the multiple SARS-CoV-2 variants, the current vaccines may not be sufficient to immunize people against new variant threats. One of the emerging variants of concern is B1.1.529 (Omicron), which carries ~30 mutations in the Spike protein (S) of SARS-CoV-2 and is predicted to evade antibody recognition even in vaccinated people. We used a structure-based approach and an epitope prediction server to develop a Multi-Epitope based Subunit Vaccine (MESV) involving the SARS-CoV-2 B1.1.529 variant spike glycoprotein. The predicted epitope with better antigenicity and non-toxicity was used for designing and predicting vaccine construct features and structure models. In addition, in silico cloning of the MESV construct in the pET28a expression vector predicted the construct to be highly translational. The proposed MESV vaccine construct was also subjected to immune simulation prediction and was found to be highly antigenic and to elicit a cell-mediated immune response. Therefore, the MESV proposed in the present study has the potential to be evaluated further for vaccine production against the newly identified B1.1.529 (Omicron) variant of concern. Supplementary Information: The online version contains supplementary material available at 10.1007/s11224-022-02027-6.
Affiliation(s)
- Meet Parmar
- Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Ritik Thumar
- Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Jigar Sheth
- Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Dhaval Patel
- Department of Biotechnology and Bioengineering, Institute of Advanced Research, Koba Institutional Area, Gandhinagar-382426, Gujarat, India
- Gujarat Biotechnology University, Gujarat International Finance Tec-City, Gandhinagar-382355, Gujarat, India
3
Karanji AK, Khakinejad M, Kondalaji SG, Majuta SN, Attanayake K, Valentine SJ. Comparison of Peptide Ion Conformers Arising from Non-Helical and Helical Peptides Using Ion Mobility Spectrometry and Gas-Phase Hydrogen/Deuterium Exchange. Journal of the American Society for Mass Spectrometry 2018; 29:2402-2412. [PMID: 30324261 PMCID: PMC6553874 DOI: 10.1007/s13361-018-2053-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/07/2018] [Revised: 07/17/2018] [Accepted: 08/03/2018] [Indexed: 05/06/2023]
Abstract
The dominant gas-phase conformer of [M+3H]3+ ions of the model peptide acetyl-PSSSSKSSSSKSSSSKSSSSK has been examined with ion mobility spectrometry (IMS), gas-phase hydrogen/deuterium exchange (HDX), and mass spectrometry (MS) techniques. The [M+3H]3+ peptide ions are observed predominantly as a relatively compact conformer type. Upon subjecting these ions to electron transfer dissociation (ETD), the level of protection for each amino acid residue in the peptide sequence is assessed. The overall per-residue deuterium uptake is observed to be relatively more efficient for the neutral residues than for the model peptide acetyl-PAAAAKAAAAKAAAAKAAAAK. In comparison, the N-terminal and C-terminal regions of the serine peptide show greater relative protection compared with interior residues. Molecular dynamics (MD) simulations have been used to generate candidate structures for collision cross section and HDX reactivity matching. Hydrogen accessibility scoring (HAS) for select structural candidates from MD simulations has been used to suggest conformer types that could contribute to the observed HDX patterns. The results are discussed with respect to recent studies employing extensive MD simulations of gas-phase structure establishment of a peptide system.
Affiliation(s)
- Ahmad Kiani Karanji
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
- Mahdiar Khakinejad
- Department of Biophysics, Johns Hopkins University, Baltimore, MD, 21218, USA
- Sandra N Majuta
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
- Kushani Attanayake
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
- Stephen J Valentine
- Department of Chemistry, West Virginia University, Morgantown, WV, 26506, USA
4
Leman JK, Ulmschneider MB, Gray JJ. Computational modeling of membrane proteins. Proteins 2015; 83:1-24. [PMID: 25355688 PMCID: PMC4270820 DOI: 10.1002/prot.24703] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 06/17/2014] [Revised: 10/01/2014] [Accepted: 10/18/2014] [Indexed: 02/06/2023]
Abstract
The determination of membrane protein (MP) structures has always trailed that of soluble proteins due to difficulties in their overexpression, reconstitution into membrane mimetics, and subsequent structure determination. The percentage of MP structures in the protein databank (PDB) has been at a constant 1-2% for the last decade. In contrast, over half of all drugs target MPs, only highlighting how little we understand about drug-specific effects in the human body. To reduce this gap, researchers have attempted to predict structural features of MPs even before the first structure was experimentally elucidated. In this review, we present current computational methods to predict MP structure, starting with secondary structure prediction, prediction of trans-membrane spans, and topology. Even though these methods generate reliable predictions, challenges such as predicting kinks or precise beginnings and ends of secondary structure elements are still waiting to be addressed. We describe recent developments in the prediction of 3D structures of both α-helical MPs as well as β-barrels using comparative modeling techniques, de novo methods, and molecular dynamics (MD) simulations. The increase of MP structures has (1) facilitated comparative modeling due to availability of more and better templates, and (2) improved the statistics for knowledge-based scoring functions. Moreover, de novo methods have benefited from the use of correlated mutations as restraints. Finally, we outline current advances that will likely shape the field in the forthcoming decade.
Affiliation(s)
- Julia Koehler Leman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Martin B. Ulmschneider
- Department of Materials Science and Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J. Gray
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
5
Afzal AJ, Srour A, Goil A, Vasudaven S, Liu T, Samudrala R, Dogra N, Kohli P, Malakar A, Lightfoot DA. Homo-dimerization and ligand binding by the leucine-rich repeat domain at RHG1/RFS2 underlying resistance to two soybean pathogens. BMC Plant Biology 2013; 13:43. [PMID: 23497186 PMCID: PMC3626623 DOI: 10.1186/1471-2229-13-43] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 01/03/2012] [Accepted: 02/05/2013] [Indexed: 05/26/2023]
Abstract
BACKGROUND The protein encoded by GmRLK18-1 (Glyma_18_02680 on chromosome 18) was a receptor-like kinase (RLK) encoded within the soybean (Glycine max L. Merr.) Rhg1/Rfs2 locus. The locus underlies resistance to the soybean cyst nematode (SCN) Heterodera glycines (I.) and to the causal agent of sudden death syndrome (SDS), Fusarium virguliforme (Aoki). Previously the leucine-rich repeat (LRR) domain was expressed in Escherichia coli. RESULTS The aims here were to evaluate the ability of the LRR domain to homo-dimerize, to bind larger proteins, and to bind small peptides. Western analysis suggested homo-dimers could form after protein extraction from roots. The purified LRR domain, from residue 131-485, was seen to form a mixture of monomers and homo-dimers in vitro. Cross-linking experiments in vitro showed the H274N region was close (<11.1 Å) to the highly conserved cysteine residue C196 on the second homo-dimer subunit. Binding constants of 20-142 nM were found for peptides present in plant and nematode secretions. Effects on plant phenotypes including wilting, stem bending and resistance to infection by SCN were observed when roots were treated with 50 pM of the peptides. Far-Western analyses followed by MS showed methionine synthase and cyclophilin bound strongly to the LRR domain. A second LRR from GmRLK08-1 (Glyma_08_g11350) did not show these strong interactions. CONCLUSIONS The LRR domain of the GmRLK18-1 protein formed both a monomer and a homo-dimer. The LRR domain bound avidly to 4 different CLE peptides, a cyclophilin and a methionine synthase. The CLE peptides GmTGIF, GmCLE34, GmCLE3 and HgCLE were previously reported to be involved in root growth inhibition but here GmTGIF and HgCLE were shown to alter stem morphology and resistance to SCN. One of several models from homology and ab-initio modeling was partially validated by cross-linking. The 3 amino acid replacements present among RLK allotypes, A87V, Q115K and H274N, were predicted to alter domain stability and function. Therefore, the LRR domain of GmRLK18-1 might underlie both root development and disease resistance in soybean and provide an avenue to develop new variants and ligands that might promote reduced losses to SCN.
Affiliation(s)
- Ahmed J Afzal
- Department of Molecular Biology, Microbiology and Biochemistry and Center for Excellence the Illinois Soybean Center, Southern Illinois University at Carbondale, IL 62901, USA.
6
Combining segmental semi-Markov models with neural networks for protein secondary structure prediction. Neurocomputing 2009. [DOI: 10.1016/j.neucom.2009.04.017] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022]
7
Schön JC, Jansen M. Determination, prediction, and understanding of structures, using the energy landscapes of chemical systems – Part II. Zeitschrift für Kristallographie 2009. [DOI: 10.1524/zkri.216.7.361.20362] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 11/24/2022]
Abstract
In the past decade, new theoretical approaches have been developed to determine, predict and understand the structure of chemical compounds. The central element of these methods has been the investigation of the energy landscape of chemical systems. Applications range from extended crystalline and amorphous compounds over clusters and molecular crystals to proteins. In this review, we are going to give an introduction to energy landscapes and methods for their investigation, together with a number of examples. These include structure prediction of extended and molecular crystals, structure prediction and folding of proteins, structure analysis of zeolites, and structure determination of crystals from powder diffraction data.
8
LaPointe SM, Farrag S, Bohórquez HJ, Boyd RJ. QTAIM Study of an α-Helix Hydrogen Bond Network. J Phys Chem B 2009; 113:10957-64. [DOI: 10.1021/jp903635h] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.0] [Indexed: 11/29/2022]
Affiliation(s)
- Shenna M. LaPointe
- Department of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
- Sarah Farrag
- Department of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
- Hugo J. Bohórquez
- Department of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
- Russell J. Boyd
- Department of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3
10
Wilson CL, Boardman PE, Doig AJ, Hubbard SJ. Improved prediction for N-termini of alpha-helices using empirical information. Proteins 2005; 57:322-30. [PMID: 15340919 DOI: 10.1002/prot.20218] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 11/11/2022]
Abstract
The prediction of the secondary structure of proteins from their amino acid sequences remains a key component of many approaches to the protein folding problem. The most abundant form of regular secondary structure in proteins is the alpha-helix, in which specific residue preferences exist at the N-terminal locations. Propensities derived from these observed amino acid frequencies in the Protein Data Bank (PDB) database correlate well with experimental free energies measured for residues at different N-terminal positions in alanine-based peptides. We report a novel method to exploit this data to improve protein secondary structure prediction through identification of the correct N-terminal sequences in alpha-helices, based on existing popular methods for secondary structure prediction. With this algorithm, the number of correctly predicted alpha-helix start positions was improved from 30% to 38%, while the overall prediction accuracy (Q3) remained the same, using cross-validated testing. Although the algorithm was developed and tested on multiple sequence alignment-based secondary structure predictions, it was also able to improve the predictions of start locations by methods that use single sequences to make their predictions. Furthermore, the residue frequencies at N-terminal positions of the improved predictions better reflect those seen at the N-terminal positions of alpha-helices in proteins. This has implications for areas such as comparative modeling, where a more accurate prediction of the N-terminal regions of alpha-helices should benefit attempts to model adjacent loop regions. The algorithm is available as a Web tool, located at http://rocky.bms.umist.ac.uk/elephant.
Affiliation(s)
- Claire L Wilson
- Department of Biomolecular Sciences, University of Manchester Institute of Science and Technology, Manchester, United Kingdom
11
Eyrich VA, Przybylski D, Koh IYY, Grana O, Pazos F, Valencia A, Rost B. CAFASP3 in the spotlight of EVA. Proteins 2004; 53 Suppl 6:548-60. [PMID: 14579345 DOI: 10.1002/prot.10534] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
We have analysed fold recognition, secondary structure and contact prediction servers from CAFASP3. This assessment was carried out in the framework of the fully automated, web-based evaluation server EVA. Detailed results are available at http://cubic.bioc.columbia.edu/eva/cafasp3/. We observed that the sequence-unique targets from CAFASP3/CASP5 were not fully representative for evaluating performance. For all three categories, we showed how careless ranking might be misleading. We compared methods from all categories to experts in secondary structure and contact prediction and homology modellers to fold recognisers. While the secondary structure experts clearly outperformed all others, the contact experts appeared to outperform only novel fold methods. Automatic evaluation servers are good at getting statistics right and at using these to discard misleading ranking schemes. We challenge that to let machines rule where they are best might be the best way for the community to enjoy the tremendous benefit of CASP as a unique opportunity for brainstorming.
Affiliation(s)
- Volker A Eyrich
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
12
Aloy P, Stark A, Hadley C, Russell RB. Predictions without templates: New folds, secondary structure, and contacts in CASP5. Proteins 2003; 53 Suppl 6:436-56. [PMID: 14579333 DOI: 10.1002/prot.10546] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.1] [Indexed: 11/11/2022]
Abstract
We present the assessment of CASP5 predictions in the new fold category. For coordinate predictions, we considered five targets with new folds and eight lying on the fold recognition borderline. We performed detailed visual and numerical comparisons between predicted and experimental structures to assess prediction accuracy. The two procedures largely agreed, but the visual inspection identified instances where metrics, such as GDT_TS, ranked what we considered incorrect predictions highly. We found the quality of the best predictions to be very good: for nearly every target at least one group predicted a structure close to the correct one. However, selection of the best of five models is still problematic. The group of David Baker once again proved to be best overall, with many individual highlights. However, high quality and consistency were also seen from others, suggesting that the community is moving toward general procedures to predict accurate structures for proteins showing no resemblance to anything seen before. Predictions for secondary structure showed at best limited progress since CASP4. The number of targets is probably too small to spot differences in performance between methods, suggesting that such predictions might be better evaluated with schemes involving more proteins. For contact predictions, accuracies are still low, although there were several instances of accurate and useful contacts predicted de novo, and new approaches hint at future progress.
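The GDT_TS metric mentioned in this abstract can be illustrated with a short sketch. In its usual definition, GDT_TS averages the percentage of residues whose Cα atoms lie within 1, 2, 4 and 8 Å of their positions in the experimental structure. The code below is a simplification, not the assessors' implementation: it assumes the model has already been superimposed on the target once, whereas the full procedure searches over many superpositions to maximize the count at each cutoff. The function name `gdt_ts_fixed_superposition` and the toy coordinates are hypothetical.

```python
import math


def gdt_ts_fixed_superposition(model, target, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT_TS for two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumes the two structures are already superimposed (a simplifying assumption).
    """
    if len(model) != len(target) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Per-residue distance between equivalent C-alpha atoms.
    distances = [math.dist(m, t) for m, t in zip(model, target)]
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    # GDT_TS is the mean of those fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues displaced by 0.5, 1.5, 3.0 and 10.0 angstroms.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.5, 0.0), (1.0, 1.5, 0.0), (2.0, 3.0, 0.0), (3.0, 10.0, 0.0)]
print(f"{gdt_ts_fixed_superposition(model, target):.2f}")  # prints 56.25
```

Because the score is built from distance thresholds rather than an overall fit, a model can score well while containing a topological error in a small region, which is consistent with the abstract's remark that visual inspection sometimes disagreed with the numerical ranking.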
13
Kho R, Baker BL, Newman JV, Jack RM, Sem DS, Villar HO, Hansen MR. A path from primary protein sequence to ligand recognition. Proteins 2003; 50:589-99. [PMID: 12577265 DOI: 10.1002/prot.10316] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
A novel method to organize protein structural information based solely on sequence is presented. The method clusters proteins into families that correlate with the three-dimensional protein structure and the conformation of the bound ligands. This procedure was applied to nicotinamide adenine dinucleotide [NAD(P)]-utilizing enzymes to identify a total of 94 sequence families, 53 of which are structurally characterized. Each of the structurally characterized proteins within a sequence family correlates to a single protein fold and to a common bound conformation of NAD(P). A wide range of structural folds is identified that recognize NAD(P), including Rossmann folds and beta/alpha barrels. The defined sequence families can be used to identify the type and prevalence of NAD(P)-utilizing enzymes in the proteomes of sequenced organisms. The proteome of Mycobacterium tuberculosis was mined to generate a proteome-wide profile of NAD(P)-utilizing enzymes coded by this organism. This enzyme family comprises approximately 6% of the open reading frames, with the largest subgroup being the Rossmann fold, short-chain dehydrogenases. The preponderance of short-chain dehydrogenases correlates strongly with the phenotype of M. tuberculosis, which is characterized as having one of the most complex prokaryotic cell walls.
Affiliation(s)
- Richard Kho
- Triad Therapeutics, Inc., San Diego, CA 92121, USA
14
Das R, Junker J, Greenbaum D, Gerstein MB. Global perspectives on proteins: comparing genomes in terms of folds, pathways and beyond. The Pharmacogenomics Journal 2002; 1:115-25. [PMID: 11911438 DOI: 10.1038/sj.tpj.6500021] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/09/2022]
Abstract
The sequencing of complete genomes provides us with a global view of all the proteins in an organism. Proteomic analysis can be done on a purely sequence-based level, with a focus on finding homologues and grouping them into families and clusters of orthologs. However, incorporating protein structure into this analysis provides valuable simplification; it allows one to collect together very distantly related sequences, thus condensing the proteome into a minimal number of 'parts.' We describe issues related to surveying proteomes in terms of structural parts, including methods for fold assignment and formats for comparisons (e.g., top-10 lists and whole-genome trees), and show how biases in the databases and in sampling can affect these surveys. We illustrate our main points through a case study on the unique protein properties evident in many thermophile genomes (e.g., more salt bridges). Finally, we discuss metabolic pathways as an even greater simplification of genomes. In comparison to folds, these allow the organization of many more genes into coherent systems, yet can nevertheless be understood in many of the same terms.
Affiliation(s)
- R Das
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT 06520, USA
15
Abstract
EVA is a web-based server that evaluates automatic structure prediction servers continuously and objectively. Since June 2000, EVA collected more than 20,000 secondary structure predictions. The EVA sets sufficed to conclude that the field of secondary structure prediction has advanced again. Accuracy increased substantially in the 1990s through using evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than 4% to its current height around 76% of all residues predicted correctly in one of the three states: helix, strand, or other. The best current methods solved most of the problems raised at earlier CASP meetings: All good methods now get segments right and perform well on strands. Is the recent increase in accuracy significant enough to make predictions even more useful? We believe the answer is affirmative. What is the limit of prediction accuracy? We shall see. All data are available through the EVA web site at [cubic.bioc.columbia.edu/eva/]. The raw data for the results presented are available at [eva]/sec/bup_common/2001_02_22/.
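The accuracy figure quoted in this abstract ("residues predicted correctly in one of the three states: helix, strand, or other") is commonly called Q3. A minimal sketch of that calculation follows, assuming predicted and observed states are given as equal-length strings over the letters H (helix), E (strand) and C (other). The letter coding, the function name `q3_accuracy` and the example strings are illustrative assumptions, not taken from EVA.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose three-state label matches the observed label."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Toy example: 12 residues, 2 of which are predicted in the wrong state.
predicted = "HHHHCCEEEECC"
observed = "HHHCCCEEEECH"
print(f"Q3 = {q3_accuracy(predicted, observed):.2%}")  # prints Q3 = 83.33%
```

Q3 is a per-residue measure, so it does not by itself say whether whole segments are placed correctly, which is why the abstract discusses segment quality separately.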
Affiliation(s)
- B Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA.
16
Samudrala R, Levitt M. A comprehensive analysis of 40 blind protein structure predictions. BMC Structural Biology 2002; 2:3. [PMID: 12150712 PMCID: PMC122083 DOI: 10.1186/1472-6807-2-3] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Received: 04/09/2002] [Accepted: 08/01/2002] [Indexed: 11/21/2022]
Abstract
BACKGROUND We thoroughly analyse the results of 40 blind predictions for which an experimental answer was made available at the fourth meeting on the critical assessment of protein structure methods (CASP4). Using our comparative modelling and fold recognition methodologies, we made 29 predictions for targets that had sequence identities ranging from 50% to 10% to the nearest related protein with known structure. Using our ab initio methodologies, we made eleven predictions for targets that had no detectable sequence relationships. RESULTS For 23 of these proteins, we produced models ranging from 1.0 to 6.0 Å root mean square deviation (RMSD) for the Cα atoms between the model and the corresponding experimental structure for all or large parts of the protein, with model accuracies scaling fairly linearly with respect to sequence identity (i.e., the higher the sequence identity, the better the prediction). We produced nine models with accuracies ranging from 4.0 to 6.0 Å Cα RMSD for 60-100 residue proteins (or large fragments of a protein), with a prediction accuracy of 4.0 Å Cα RMSD for residues 1-80 for T110/rbfa. CONCLUSIONS The areas of protein structure prediction that work well, and areas that need improvement, are discernible by examining how our methods have performed over the past four CASP experiments. These results have implications for modelling the structure of all tractable proteins encoded by the genome of an organism.
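The Cα RMSD values quoted in this abstract follow the standard root-mean-square formula over equivalent atoms. The sketch below shows only that arithmetic and assumes the model has already been optimally superimposed on the experimental structure (for example by a Kabsch-style fit, which is not shown). The function name `ca_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_rmsd(model, target):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed; no fitting is done here.
    """
    if len(model) != len(target) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Squared distance for each pair of equivalent C-alpha atoms.
    squared = [sum((a - b) ** 2 for a, b in zip(m, t)) for m, t in zip(model, target)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example: two atoms displaced by 1.0 and 3.0 angstroms.
target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0)]
print(f"{ca_rmsd(model, target):.3f}")  # prints 2.236
```

Because deviations are squared before averaging, a few badly placed residues dominate the value, which is one reason results are often reported for "all or large parts of the protein" rather than for the full chain only.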
Affiliation(s)
- Ram Samudrala
- Department of Microbiology, University of Washington, School of Medicine, Seattle, WA 98195, USA
- Michael Levitt
- Department of Structural Biology, Stanford University, School of Medicine, Stanford, CA 94305, USA
17
Abstract
Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.
Affiliation(s)
- Dariusz Przybylski
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.
18
Glick M, Rayan A, Goldblum A. A stochastic algorithm for global optimization and for best populations: a test case of side chains in proteins. Proc Natl Acad Sci U S A 2002; 99:703-8. [PMID: 11792838 PMCID: PMC117369 DOI: 10.1073/pnas.022418199] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022]
Abstract
The problem of global optimization is pivotal in a variety of scientific fields. Here, we present a robust stochastic search method that is able to find the global minimum for a given cost function, as well as, in most cases, any number of best solutions for very large combinatorial "explosive" systems. The algorithm iteratively eliminates variable values that contribute consistently to the highest end of a cost function's spectrum of values for the full system. Values that have not been eliminated are retained for a full, exhaustive search, allowing the creation of an ordered population of best solutions, which includes the global minimum. We demonstrate the ability of the algorithm to explore the conformational space of side chains in eight proteins, with 54 to 263 residues, to reproduce a population of their low-energy conformations. The 1,000 lowest-energy solutions are identical in the stochastic (with two different seed numbers) and full, exhaustive searches for six of eight proteins. The others retain the lowest 141 and 213 (of 1,000) conformations, depending on the seed number, and the maximal difference between stochastic and exhaustive searches is only about 0.15 kcal/mol. The energy gap between the lowest and highest of the 1,000 low-energy conformers in eight proteins is between 0.55 and 3.64 kcal/mol. This algorithm offers real opportunities for solving problems of high complexity in structural biology and in other fields of science and technology.
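The two-stage idea described in this abstract (stochastically eliminate values that keep showing up in high-cost configurations, then search the remainder exhaustively) can be sketched in a toy form. The code below is an illustration under stated assumptions, not the authors' algorithm: the sampling scheme, the enrichment criterion, and every name and parameter (`eliminate_then_enumerate`, `n_samples`, `worst_fraction`, `max_exhaustive`) are hypothetical choices made for this example.

```python
import itertools
import math
import random


def eliminate_then_enumerate(domains, cost, n_samples=500, worst_fraction=0.1,
                             max_exhaustive=2000, n_best=3, seed=0):
    """Toy elimination search over discrete variables.

    domains: one iterable of candidate values per variable.
    cost: function mapping a tuple of chosen values to a number (lower is better).
    """
    rng = random.Random(seed)
    domains = [list(d) for d in domains]

    # Stage 1: shrink the search space until exhaustive enumeration is affordable.
    while math.prod(len(d) for d in domains) > max_exhaustive:
        samples = [tuple(rng.choice(d) for d in domains) for _ in range(n_samples)]
        samples.sort(key=cost, reverse=True)  # highest cost first
        worst = samples[:max(1, int(worst_fraction * n_samples))]

        # Find the (variable, value) pair most over-represented in the worst samples.
        target, top_ratio = None, -1.0
        for i, values in enumerate(domains):
            if len(values) < 2:
                continue  # never empty a variable's domain
            for v in values:
                overall = sum(s[i] == v for s in samples) / len(samples)
                if overall == 0:
                    continue
                in_worst = sum(s[i] == v for s in worst) / len(worst)
                if in_worst / overall > top_ratio:
                    target, top_ratio = (i, v), in_worst / overall
        if target is None:
            break  # nothing left that can be eliminated
        domains[target[0]].remove(target[1])

    # Stage 2: exhaustive search over the surviving values, ordered by cost.
    ranked = sorted(itertools.product(*domains), key=cost)
    return ranked[:n_best]


# Toy problem: five variables with values 0..5; the cost is simply their sum.
best = eliminate_then_enumerate([range(6)] * 5, sum)
print(best)
```

The exhaustive second stage is what yields an ordered population of low-cost solutions rather than a single answer. In this toy version nothing guarantees that the global minimum survives elimination, so the thresholds would need tuning for any real cost function.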
Affiliation(s)
- Meir Glick
- Department of Medicinal Chemistry and the David R. Bloom Center for Pharmacy, School of Pharmacy, Hebrew University of Jerusalem, Jerusalem 91120, Israel
|
19
|
|
20
|
Moult J, Fidelis K, Zemla A, Hubbard T. Critical assessment of methods of protein structure prediction (CASP): Round IV. Proteins 2002. [DOI: 10.1002/prot.10054] [Citation(s) in RCA: 122] [Impact Index Per Article: 5.5] [Indexed: 11/10/2022]
|
21
|
Bonneau R, Baker D. Ab initio protein structure prediction: progress and prospects. Annual Review of Biophysics and Biomolecular Structure 2001; 30:173-89. [PMID: 11340057 DOI: 10.1146/annurev.biophys.30.1.173] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.8] [Indexed: 11/09/2022]
Abstract
Considerable recent progress has been made in the field of ab initio protein structure prediction, as witnessed by the third Critical Assessment of Structure Prediction (CASP3). In spite of this progress, much work remains, for the field has yet to produce consistently reliable ab initio structure prediction protocols. In this work, we review the features of current ab initio protocols in an attempt to highlight the foundations of recent progress in the field and suggest promising directions for future work.
Affiliation(s)
- R Bonneau
- Department of Biochemistry, University of Washington, Seattle, Washington, Box 357350, 98195, USA.
|
22
|
Abstract
Methods predicting protein secondary structure improved substantially in the 1990s through the use of evolutionary information taken from the divergence of proteins in the same structural family. Recently, the evolutionary information resulting from improved searches and larger databases has again boosted prediction accuracy by more than four percentage points to its current height of around 76% of all residues predicted correctly in one of the three states, helix, strand, and other. The past year also brought successful new concepts to the field. These new methods may be particularly interesting in light of the improvements achieved through simple combining of existing methods. Divergent evolutionary profiles contain enough information not only to substantially improve prediction accuracy, but also to correctly predict long stretches of identical residues observed in alternative secondary structure states depending on nonlocal conditions. An example is a method automatically identifying structural switches and thus finding a remarkable connection between predicted secondary structure and aspects of function. Secondary structure predictions are increasingly becoming the work horse for numerous methods aimed at predicting protein structure and function. Is the recent increase in accuracy significant enough to make predictions even more useful? Because the recent improvement yields a better prediction of segments, and in particular of beta strands, I believe the answer is affirmative. What is the limit of prediction accuracy? We shall see.
Affiliation(s)
- B Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 630 West 168th Street, New York, New York 10032, USA
|
23
|
Dunbrack RL, Gerloff DL, Bower M, Chen X, Lichtarge O, Cohen FE. Meeting review: the Second meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP2), Asilomar, California, December 13-16, 1996. Folding & Design 2001; 2:R27-42. [PMID: 9135979 DOI: 10.1016/s1359-0278(97)00011-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.6] [Indexed: 02/04/2023]
Abstract
In most fields of scientific endeavor, the outcomes of important experiments are not always known before the experiments are performed. But in protein structure prediction, algorithms are usually developed and tested in situations where the answers are known. In December 1996, the Second Meeting on the Critical Assessment of Techniques for Protein Structure Prediction (CASP2) was held in Asilomar, California to rectify this situation: protein sequences were provided in advance for which the experimental structure had not yet been published. Over 70 research groups provided bona fide predictions on 42 targets in four categories: comparative or 'homology' modeling, fold recognition or 'threading', ab initio structure predictions, and docking predictions. Since the previous CASP meeting in 1994, the role of fold recognition in structure prediction has increased enormously with the largest number of groups participating in this category. In this review, we highlight some of the important developments and give at least a qualitative sense of what kind of methods produced some of the better predictions.
Affiliation(s)
- R L Dunbrack
- Department of Cellular and Molecular Pharmacology, University of California at San Francisco 94143-0450, USA
|
24
|
Abstract
By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive C(alpha) ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.
Affiliation(s)
- A G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris 7, Paris, France.
|
25
|
Lackner P, Koppensteiner WA, Sippl MJ, Domingues FS. ProSup: a refined tool for protein structure alignment. Protein Engineering 2000; 13:745-52. [PMID: 11161105 DOI: 10.1093/protein/13.11.745] [Citation(s) in RCA: 90] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022]
Abstract
We investigated and optimized a method for structure comparison which is based on rigid body superimposition. The method maximizes the number of structurally equivalent residues while keeping the root mean square deviation constant. The resulting number of equivalent residues then provides an adequate similarity measure, which is easy to interpret. We demonstrate that the approach is able to detect remote structural similarity. We show that the number of equivalent residues is a suitable measure for ranking database searches and that the results are in good agreement with expert knowledge protein structure classification. Structure comparison frequently has multiple solutions. The approach that we use provides a range of alternative alignments rather than a single solution. We discuss the nature of alternative solutions on several examples.
Affiliation(s)
- P Lackner
- Center for Applied Molecular Engineering, Institute for Chemistry and Biochemistry, University of Salzburg, Jakob-Haringer Str. 3, A-5020 Salzburg, Austria
|
26
|
King RD, Ouali M, Strong AT, Aly A, Elmaghraby A, Kantardzic M, Page D. Is it better to combine predictions? Protein Engineering 2000; 13:15-9. [PMID: 10679525 DOI: 10.1093/protein/13.1.15] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
Abstract
We have compared the accuracy of the individual protein secondary structure prediction methods: PHD, DSC, NNSSP and Predator against the accuracy obtained by combining the predictions of the methods. A range of ways of combining predictions were tested: voting, biased voting, linear discrimination, neural networks and decision trees. The combined methods that involve 'learning' (the non-voting methods) were trained using a set of 496 non-homologous domains; this dataset was biased as some of the secondary structure prediction methods had used them for training. We used two independent test sets to compare predictions: the first consisted of 17 non-homologous domains from CASP3 (Third Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction); the second set consisted of 405 domains that were selected in the same way as the training set, and were non-homologous to each other and the training set. On both test datasets the most accurate individual method was NNSSP, then PHD, DSC and the least accurate was Predator; however, it was not possible to conclusively show a significant difference between the individual methods. Comparing the accuracy of the single methods with that obtained by combining predictions it was found that it was better to use a combination of predictions. On both test datasets it was possible to obtain an approximately 3% improvement in accuracy by combining predictions. In most cases the combined methods were statistically significantly better (at P = 0.05 on the CASP3 test set, and P = 0.01 on the EBI test set). On the CASP3 test dataset there was no significant difference in accuracy between any of the combined methods of prediction: on the EBI test dataset, linear discrimination and neural networks significantly outperformed voting techniques. We conclude that it is better to combine predictions.
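Of the combination schemes named in the abstract above, plain voting is the simplest to illustrate. The following is a minimal sketch only, not the authors' implementation: the function names, the three-state alphabet (H, E, C), the tie-break rule, and the toy strings are all assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine equal-length three-state strings (H, E, C) by per-residue plurality."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    combined = []
    for column in zip(*predictions):
        counts = Counter(column)
        best = max(counts.values())
        # Tie-break (an assumption): keep the state from the earliest-listed method.
        combined.append(next(state for state in column if counts[state] == best))
    return "".join(combined)


def q3(predicted, observed):
    """Fraction of residues whose predicted state matches the observed state."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Toy example with invented strings, one per hypothetical method.
consensus = majority_vote(["HHEC", "HHCC", "HECC"])
```

Ordering the input list by the individual methods' accuracy makes the tie-break favour the method expected to be most reliable; the learned combiners mentioned in the abstract (linear discrimination, neural networks, decision trees) are not sketched here.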
Affiliation(s)
- R D King
- Department of Computer Science, University of Wales, Aberystwyth Penglais, Aberystwyth, Ceredigion, SY23 3DB, Wales, UK
|
27
|
Abstract
The current state of the art in modeling protein structure has been assessed, based on the results of the CASP (Critical Assessment of protein Structure Prediction) experiments. In comparative modeling, improvements have been made in sequence alignment, sidechain orientation and loop building. Refinement of the models remains a serious challenge. Improved sequence profile methods have had a large impact in fold recognition. Although there has been some progress in alignment quality, this factor still limits model usefulness. In ab initio structure prediction, there has been notable progress in building approximately correct structures of 40-60 residue-long protein fragments. There is still a long way to go before the general ab initio prediction problem is solved. Overall, the field is maturing into a practical technology, able to deliver useful models for a large number of sequences.
Affiliation(s)
- J Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, MD 20850, USA.
|
28
|
Contreras CF, Canales MA, Alvarez A, De Ferrari GV, Inestrosa NC. Molecular modeling of the amyloid-beta-peptide using the homology to a fragment of triosephosphate isomerase that forms amyloid in vitro. Protein Engineering 1999; 12:959-66. [PMID: 10585501 DOI: 10.1093/protein/12.11.959] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
The main component of the amyloid senile plaques found in Alzheimer's brain is the amyloid-beta-peptide (A beta), a proteolytic product of a membrane precursor protein. Previous structural studies have found different conformations for the A beta peptide depending on the solvent and pH used. In general, they have suggested an alpha-helix conformation at the N-terminal domain and a beta-sheet conformation for the C-terminal domain. The structure of the complete A beta peptide (residues 1-40) solved by NMR has revealed that only helical structure is present in A beta. However, this result cannot explain the large beta-sheet A beta aggregates known to form amyloid under physiological conditions. Therefore, we investigated the structure of A beta by molecular modeling based on extensive homology using the Smith and Waterman algorithm implemented in the MPsrch program (Blitz server). The results showed a mean value of 23% identity with selected sequences. Since these values do not allow a clear homology to be established with a reference structure in order to perform molecular modeling studies, we searched for detailed homology. A 28% identity with an alpha/beta segment of a triosephosphate isomerase (TIM) from Culex tarralis with an unsolved three-dimensional structure was obtained. Then, multiple sequence alignment was performed considering A beta, TIM from C. tarralis and another five TIM sequences with known three-dimensional structures. We found a TIM segment with secondary structure elements in agreement with previous experimental data for A beta. Moreover, when a synthetic peptide from this TIM segment was studied in vitro, it was able to aggregate and to form amyloid fibrils, as established by Congo red binding and electron microscopy. The A beta model obtained was optimized by molecular dynamics considering ionizable side chains in order to simulate A beta in a neutral pH environment. We report here the structural implications of this study.
Affiliation(s)
- C F Contreras
- Laboratorio de Biofísica Molecular, Facultad de Ciencias Biológicas, Universidad de Concepción, Concepción and Departamento de Biología Celular y Molecular, Facultad de Ciencias Biológicas, Pontificia Universidad Católica
|
29
|
Zemla A, Venclovas C, Fidelis K, Rost B. A modified definition of Sov, a segment-based measure for protein secondary structure prediction assessment. Proteins 1999; 34:220-3. [PMID: 10022357 DOI: 10.1002/(sici)1097-0134(19990201)34:2<220::aid-prot7>3.0.co;2-k] [Citation(s) in RCA: 221] [Impact Index Per Article: 8.8] [Indexed: 11/10/2022]
Abstract
We present a measure for the evaluation of secondary structure prediction methods that is based on secondary structure segments rather than individual residues. The algorithm is an extension of the segment overlap measure Sov, originally defined by Rost et al. (J Mol Biol 1994;235:13-26). The new definition of Sov corrects the normalization procedure and improves Sov's ability to discriminate between similar and dissimilar segment distributions. The method has been comprehensively tested during the second Critical Assessment of Techniques for Protein Structure Prediction (CASP2). Here, we describe the underlying concepts, modifications to the original definition, and their significance.
Affiliation(s)
- A Zemla
- Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94551, USA
|
30
|
Morea V, Leplae R, Tramontano A. Protein structure prediction and design. Biotechnology Annual Review 1999; 4:177-214. [PMID: 9890141 DOI: 10.1016/s1387-2656(08)70070-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
Abstract
Proteins have a unique native conformation, which can be proven in many instances to be determined by the amino acid sequence alone. The folding problem, that is the understanding of how the amino acid sequence directs folding, is still unsolved, despite more than 30 years of effort. However, many new methods have appeared in the past few years. This chapter describes the different principles underlying them and tries to give an overview of their successes and pitfalls.
Affiliation(s)
- V Morea
- IRBM P. Angeletti, Pomezia, Rome, Italy
|
31
|
|
32
|
|
33
|
Gerstein M, Hegyi H. Comparing genomes in terms of protein structure: surveys of a finite parts list. FEMS Microbiol Rev 1998; 22:277-304. [PMID: 10357579 DOI: 10.1111/j.1574-6976.1998.tb00371.x] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.6] [Indexed: 11/29/2022] Open
Abstract
We give an overview of the emerging field of structural genomics, describing how genomes can be compared in terms of protein structure. As the number of genes in a genome and the total number of protein folds are both quite limited, these comparisons take the form of surveys of a finite parts list, similar in respects to demographic censuses. Fold surveys have many similarities with other whole-genome characterizations, e.g., analyses of motifs or pathways. However, structure has a number of aspects that make it particularly suitable for comparing genomes, namely the way it allows for the precise definition of a basic protein module and the fact that it has a better defined relationship to sequence similarity than does protein function. An essential requirement for a structure survey is a library of folds, which groups the known structures into 'fold families.' This library can be built up automatically using a structure comparison program, and we described how important objective statistical measures are for assessing similarities within the library and between the library and genome sequences. After building the library, one can use it to count the number of folds in genomes, expressing the results in the form of Venn diagrams and 'top-10' statistics for shared and common folds. Depending on the counting methodology employed, these statistics can reflect different aspects of the genome, such as the amount of internal duplication or gene expression. Previous analyses have shown that the common folds shared between very different microorganisms, i.e., in different kingdoms, have a remarkably similar structure, being comprised of repeated strand-helix-strand super-secondary structure units. A major difficulty with this sort of 'fold-counting' is that only a small subset of the structures in a complete genome are currently known and this subset is prone to sampling bias. 
One way of overcoming biases is through structure prediction, which can be applied uniformly and comprehensively to a whole genome. Various investigators have, in fact, already applied many of the existing techniques for predicting secondary structure and transmembrane (TM) helices to the recently sequenced genomes. The results have been consistent: microbial genomes have similar fractions of strands and helices even though they have significantly different amino acid composition. The fraction of membrane proteins with a given number of TM helices falls off rapidly with more TM elements, approximately according to a Zipf law. This latter finding indicates that there is no preference for the highly studied 7-TM proteins in microbial genomes. Continuously updated tables and further information pertinent to this review are available over the web at http://bioinfo.mbb.yale.edu/genome.
Affiliation(s)
- M Gerstein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520, USA.
|
34
|
Padilla-Zúñiga AJ, Rojo-Domínguez A. Non-homology knowledge-based prediction of the papain prosegment folding pattern: a description of plausible folding and activation mechanisms. Folding & Design 1998; 3:271-84. [PMID: 9710573 DOI: 10.1016/s1359-0278(98)00038-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/08/2023]
Abstract
BACKGROUND A detailed knowledge of three-dimensional conformations is necessary in order to understand the close relationship between protein structure and function. Among current methodologies, homology modeling is an important tool for obtaining reliable geometries and it provides a direct alternative to X-ray or NMR techniques. In contrast, predictive methods with no three-dimensional template (non-homology) still require further validation and systematization. RESULTS Here, we present a non-homology knowledge-based strategy for the structural prediction of the proregion of a cysteine proteinase zymogen. This method analyzes individual sequences and multiple alignments of homologous sequences, making use of different published algorithms and incorporating all available structure-related information to obtain improved predictions. Our strategy yielded acceptable secondary structure and general three-dimensional assignments when compared with crystallographic data from homologous proteins. CONCLUSIONS We discuss our successes and failures as a contribution to non-homology prediction development. In addition, based on the information analyzed and generated in this work, we propose plausible folding and activation mechanisms for thiol-proteinase precursors that attempt to shed light on the molecular basis of prosegment functions.
Affiliation(s)
- A J Padilla-Zúñiga
- Departamento de Química, Universidad Autónoma Metropolitana-Iztapalapa, México, D.F., México.
|
35
|
Abstract
The average globular protein contains 30% alpha-helix, the most common type of secondary structure. Some amino acids occur more frequently in alpha-helices than others; this tendency is known as helix propensity. Here we derive a helix propensity scale for solvent-exposed residues in the middle positions of alpha-helices. The scale is based on measurements of helix propensity in 11 systems, including both proteins and peptides. Alanine has the highest helix propensity, and, excluding proline, glycine has the lowest, approximately 1 kcal/mol less favorable than alanine. Based on our analysis, the helix propensities of the amino acids are as follows (kcal/mol): Ala = 0, Leu = 0.21, Arg = 0.21, Met = 0.24, Lys = 0.26, Gln = 0.39, Glu = 0.40, Ile = 0.41, Trp = 0.49, Ser = 0.50, Tyr = 0.53, Phe = 0.54, Val = 0.61, His = 0.61, Asn = 0.65, Thr = 0.66, Cys = 0.68, Asp = 0.69, and Gly = 1.
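The scale listed in the abstract above can be held in a small lookup table. The sketch below is illustrative only: the values are copied from the abstract, but the function names are hypothetical, and summing per-residue values over a segment is an assumption of additivity that the abstract does not state. The scale itself is described only for solvent-exposed residues in the middle positions of alpha-helices.

```python
# Helix propensities in kcal/mol relative to Ala, as listed in the abstract.
# Proline has no value in the abstract, so it is deliberately absent here.
HELIX_PROPENSITY = {
    "Ala": 0.00, "Leu": 0.21, "Arg": 0.21, "Met": 0.24, "Lys": 0.26,
    "Gln": 0.39, "Glu": 0.40, "Ile": 0.41, "Trp": 0.49, "Ser": 0.50,
    "Tyr": 0.53, "Phe": 0.54, "Val": 0.61, "His": 0.61, "Asn": 0.65,
    "Thr": 0.66, "Cys": 0.68, "Asp": 0.69, "Gly": 1.00,
}


def rank_by_propensity():
    """Residues ordered from most to least helix-favouring (lowest value first)."""
    return sorted(HELIX_PROPENSITY, key=HELIX_PROPENSITY.get)


def summed_propensity(residues):
    """Sum the listed values over a segment (additivity is an assumption).

    Raises KeyError for residues without a listed value, such as Pro.
    """
    return sum(HELIX_PROPENSITY[r] for r in residues)


ranking = rank_by_propensity()
segment_cost = summed_propensity(["Ala", "Leu", "Gly"])
```

Leaving proline out, rather than assigning it a guessed value, makes any lookup for it fail loudly.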
Affiliation(s)
- C N Pace
- Department of Medical Biochemistry and Genetics, Texas A&M University, College Station, Texas 77843-1114, USA.
|
36
|
Sunyaev SR, Eisenhaber F, Argos P, Kuznetsov EN, Tumanyan VG. Are knowledge-based potentials derived from protein structure sets discriminative with respect to amino acid types? Proteins 1998. [DOI: 10.1002/(sici)1097-0134(19980515)31:3<225::aid-prot1>3.0.co;2-i] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
|
37
|
Searls DB. Grand challenges in computational biology. Computational Methods in Molecular Biology 1998. [DOI: 10.1016/s0167-7306(08)60458-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/10/2023]
|
38
|
Benner SA, Cannarozzi G, Gerloff D, Turcotte M, Chelvanayagam G. Bona Fide Predictions of Protein Secondary Structure Using Transparent Analyses of Multiple Sequence Alignments. Chem Rev 1997; 97:2725-2844. [PMID: 11851479 DOI: 10.1021/cr940469a] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Affiliation(s)
- Steven A. Benner
- Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
|
39
|
Dandekar T, König R. Computational methods for the prediction of protein folds. Biochimica et Biophysica Acta 1997; 1343:1-15. [PMID: 9428653 DOI: 10.1016/s0167-4838(97)00132-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/05/2023]
|
40
|
|
41
|
Abstract
The accuracy of secondary structure prediction methods has been improved significantly by the use of aligned protein sequences. The PHD method and the NNSSP method reach 71 to 72% of sustained overall three-state accuracy when multiple sequence alignments are used with neural networks and nearest-neighbor algorithms, respectively. We introduce a variant of the nearest-neighbor approach that can achieve similar accuracy using a single sequence as the query input. We compute the 50 best non-intersecting local alignments of the query sequence with each sequence from a set of proteins with known 3D structures. Each position of the query sequence is aligned with the database amino acids in alpha-helical, beta-strand or coil states. The prediction type of secondary structure is selected as the type of aligned position with the maximal total score. On the dataset of 124 non-membrane non-homologous proteins, used earlier as a benchmark for secondary structure predictions, our method reaches an overall three-state accuracy of 71.2%. The performance accuracy is verified by an additional test on 461 non-homologous proteins giving an accuracy of 71.0%. The main strength of the method is the high level of prediction accuracy for proteins without any known homolog. Using multiple sequence alignments as input the method has a prediction accuracy of 73.5%. Prediction of secondary structure by the SSPAL method is available via Baylor College of Medicine World Wide Web server.
Affiliation(s)
- A A Salamov
- Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA
|
42
|
Bower MJ, Cohen FE, Dunbrack RL. Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. J Mol Biol 1997; 267:1268-82. [PMID: 9150411 DOI: 10.1006/jmbi.1997.0926] [Citation(s) in RCA: 425] [Impact Index Per Article: 15.7] [Indexed: 02/04/2023]
Abstract
Modeling by homology is the most accurate computational method for translating an amino acid sequence into a protein structure. Homology modeling can be divided into two sub-problems, placing the polypeptide backbone and adding side-chains. We present a method for rapidly predicting the conformations of protein side-chains, starting from main-chain coordinates alone. The method involves using fewer than ten rotamers per residue from a backbone-dependent rotamer library and a search to remove steric conflicts. The method is initially tested on 299 high resolution crystal structures by rebuilding side-chains onto the experimentally determined backbone structures. A total of 77% of chi1 and 66% of chi(1 + 2) dihedral angles are predicted within 40 degrees of their crystal structure values. We then tested the method on the entire database of known structures in the Protein Data Bank. The predictive accuracy of the algorithm was strongly correlated with the resolution of the structures. In an effort to simulate a realistic homology modeling problem, 9424 homology models were created using three different modeling strategies. For prediction purposes, pairs of structures were identified which shared between 30% and 90% sequence identity. One strategy results in 82% of chi1 and 72% of chi(1 + 2) dihedral angles predicted within 40 degrees of the target crystal structure values, suggesting that movements of the backbone associated with this degree of sequence identity are not large enough to disrupt the predictive ability of our method for non-native backbones. These results compared favorably with existing methods over a comprehensive data set.
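The accuracy figures in the abstract above are fractions of dihedral angles predicted within 40 degrees of the crystal structure value. A minimal sketch of such a score follows. It is an assumption-laden illustration, not the authors' code: the function names, the inclusive threshold, and the toy angles are invented, and chi(1 + 2) is taken here to mean that both chi1 and chi2 must fall within tolerance.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (periodic in 360)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def fraction_chi1_correct(predicted, reference, tol=40.0):
    """Fraction of chi1 angles within `tol` degrees of the reference values."""
    hits = sum(angle_diff(p, r) <= tol for p, r in zip(predicted, reference))
    return hits / len(reference)


def fraction_chi12_correct(predicted, reference, tol=40.0):
    """Fraction of residues with both chi1 and chi2 within `tol` degrees.

    Each item is a (chi1, chi2) tuple; treating chi(1 + 2) as "both correct"
    is an assumption about the metric's definition.
    """
    hits = sum(
        angle_diff(p1, r1) <= tol and angle_diff(p2, r2) <= tol
        for (p1, p2), (r1, r2) in zip(predicted, reference)
    )
    return hits / len(reference)


# Toy angles (invented): -170 and 170 degrees differ by only 20 degrees.
chi1_score = fraction_chi1_correct([60.0, -170.0, 180.0], [65.0, 170.0, -60.0])
```

Handling the 360-degree periodicity matters because rotamers near 180 degrees are often reported with either sign.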
Affiliation(s)
- M J Bower
- Department of Pharmaceutical Chemistry, University of California San Francisco, 94143-0450, USA
|
43
|
Gerloff DL, Cohen FE, Korostensky C, Turcotte M, Gonnet GH, Benner SA. A predicted consensus structure for the N-Terminal fragment of the heat shock protein HSP90 family. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(199703)27:3<450::aid-prot12>3.0.co;2-k] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
|
44
|
Abstract
Although we are still a long way from being able to predict the details of protein structure from the underlying chemistry, slow but steady progress is being made at modeling structural features by recognizing the patterns that connect sequence to structure.
Affiliation(s)
- D Shortle
- Department of Biological Chemistry, The Johns Hopkins University School of Medicine, 725 North Wolfe Street, Baltimore, Maryland 21205, USA
|
45
|
Abstract
The computational techniques of sorting out protein folds (these techniques include dynamic programming, self-consistent field theory, etc.) have already ceased to be the bottleneck of predictions. The main problem is that all the methods of recognition and prediction of protein structure can actually use only some part of the interactions operating in the chain, and that even their energies are not known precisely. This is the principal source of errors now. The errors can be reduced by employment of many distant homologues, but this opens a possibility to predict a generalized folding pattern rather than a particular fold with all its details.
Affiliation(s)
- A V Finkelstein
- Institute of Protein Research, Russian Academy of Sciences, 142292 Pushchino, Moscow Region, Russia.
|
46
|
Zemla A, Venclovas Č, Reinhardt A, Fidelis K, Hubbard TJ. Numerical criteria for the evaluation of ab initio predictions of protein structure. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(1997)1+<140::aid-prot19>3.0.co;2-o] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.7] [Indexed: 11/12/2022]
|
47
|
Benner SA, Geroff DL, Rozzell JD. Protein Structure Prediction. Science 1996. [DOI: 10.1126/science.274.5292.1448.b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Affiliation(s)
- Steven A. Benner
- Department of Chemistry, Swiss Federal Institute of Technology, CH-8092 Zürich, Switzerland, and University of Florida, Gainesville, FL 32611, USA
- Dietlind L. Geroff
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143, USA
- J. David Rozzell
- President, Sulfonics Inc., South Orange Grove Boulevard, Pasadena, CA 91105, USA
|
48
|
Benner SA, Geroff DL, Rozzell JD. Protein Structure Prediction. Science 1996. [DOI: 10.1126/science.274.5292.1448-b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2022]
Affiliation(s)
- Steven A. Benner
- Department of Chemistry, Swiss Federal Institute of Technology, CH-8092 Zürich, Switzerland, and University of Florida, Gainesville, FL 32611, USA
- Dietlind L. Geroff
- Department of Pharmaceutical Chemistry, University of California, San Francisco, CA 94143, USA
- J. David Rozzell
- President, Sulfonics Inc., South Orange Grove Boulevard, Pasadena, CA 91105, USA
|
49
|
Corey MJ, Corey E. On the failure of de novo-designed peptides as biocatalysts. Proc Natl Acad Sci U S A 1996; 93:11428-34. [PMID: 8876152 PMCID: PMC38074 DOI: 10.1073/pnas.93.21.11428] [Citation(s) in RCA: 54] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Abstract
While the elegance and efficiency of enzymatic catalysis have long tempted chemists and biochemists with reductionist leanings to try to mimic the functions of natural enzymes in much smaller peptides, such efforts have only rarely produced catalysts with biologically interesting properties. However, the advent of genetic engineering and hybridoma technology and the discovery of catalytic RNA have led to new and very promising alternative means of biocatalyst development. Synthetic chemists have also had some success in creating nonpeptide catalysts with certain enzyme-like characteristics, although their rates and specificities are generally much poorer than those exhibited by the best novel biocatalysts based on natural structures. A comparison of the various approaches from theoretical and practical viewpoints is presented. It is suggested that, given our current level of understanding, the most fruitful methods may incorporate both iterative selection strategies and rationally chosen small perturbations, superimposed on frameworks designed by nature.
Affiliation(s)
- M J Corey
- Urology Department, University of Washington School of Medicine, Seattle 98195, USA
|
50
|
Abstract
The capabilities of current protein structure prediction methods have been assessed from the outcome of a set of blind tests. In comparative modeling, many of the numerical methods did not perform as well as expected, although the resulting structures are still of great practical use. The new methods of fold identification ('threading') were partially successful, and show considerable promise for the future. Except for secondary structure data, results from traditional ab initio methods were poor. A second blind prediction experiment is underway, and progress in all areas is expected.
Affiliation(s)
- J Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville 20850, USA.
|