1
Pan Q, Portelli S, Nguyen TB, Ascher DB. Characterization on the oncogenic effect of the missense mutations of p53 via machine learning. Brief Bioinform 2023; 25:bbad428. [PMID: 38018912 PMCID: PMC10685404 DOI: 10.1093/bib/bbad428] [Received: 06/14/2023] [Revised: 10/13/2023] [Accepted: 11/05/2023] [Indexed: 11/30/2023]
Abstract
Dysfunctions caused by missense mutations in the tumour suppressor p53 have been extensively shown to be a leading driver of many cancers. Unfortunately, it is time-consuming and labour-intensive to experimentally elucidate the effects of all possible missense variants. Recent works presented a comprehensive dataset and machine learning model to predict the functional outcome of mutations in p53. Despite the well-established dataset and precise predictions, this tool relied on a complicated model and offered limited predictions for p53 mutations. In this work, we first used computational biophysical tools to investigate the functional consequences of missense mutations in p53, revealing a bias of deleterious mutations towards destabilizing effects. Combining these insights with experimental assays, we present two interpretable machine learning models leveraging both experimental assays and in silico biophysical measurements to accurately predict the functional consequences on p53, and validate their robustness on clinical data. Our final model, based on nine features, obtained predictive performance comparable to the state-of-the-art p53-specific method and outperformed other generalized, widely used predictors. Interpreting our models revealed that information on residue p53 activity, polar atom distances and changes in p53 stability was instrumental in the decisions, consistent with the bias observed in the properties of deleterious mutations. Our predictions have been computed for all possible missense mutations in p53, offering clinical diagnostic utility, which is crucial for patient monitoring and the development of personalized cancer treatment.
Affiliation(s)
- Qisheng Pan
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Stephanie Portelli
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- Thanh Binh Nguyen
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia
- David B Ascher
- School of Chemistry and Molecular Bioscience, University of Queensland, Brisbane Queensland 4072, Australia
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne Victoria 3004, Australia

2
Nacar C. Propensities of Some Amino Acid Pairings in α-Helices Vary with Length. Protein J 2022; 41:551-562. [PMID: 36169766 DOI: 10.1007/s10930-022-10076-3] [Accepted: 09/15/2022] [Indexed: 11/29/2022]
Abstract
The results of secondary structure prediction methods are widely used in applications in biotechnology and bioinformatics. However, the accuracy of these methods could still be improved, up to a limit of 92%. One approach to achieve this goal is to harvest information from the primary structure of the peptide. This study aims to contribute to this goal by investigating how the propensities of amino acid pairings for α-helices in globular proteins vary with helix length. (n):(n + 4) residue pairings were determined using a comprehensive peptide data set according to the backbone hydrogen bond criterion, which holds that the backbone hydrogen bond is the dominant driving force of protein folding. Helix length is limited to 13 to 26 residues. The findings show that the propensities of ALA:GLY and GLY:GLU pairings for α-helices in globular proteins increase with helix length, whereas those of ALA:ALA and ALA:VAL decrease. While the frequencies of ILE:ALA, LEU:ALA, LEU:GLN, LEU:GLU, LEU:LEU, MET:ILE and VAL:LEU pairings remain roughly constant with length, 25 residue pairings show varying propensities within narrow ranges of helix length. The remaining pairings have no prominent propensity for α-helices.
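In an α-helix the backbone C=O of residue n accepts a hydrogen bond from the N-H of residue n + 4, so a helix of length L contains L - 4 such (n):(n + 4) pairings. A minimal sketch of how these pairings might be tallied per helix length follows; it assumes helices have already been extracted as one-letter sequences, and the function names are hypothetical. It reports raw counts and relative frequencies only, because the abstract does not give the propensity formula used in the study.

```python
from collections import Counter, defaultdict

# One-letter to three-letter codes, matching the ALA:GLY notation above.
THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def count_pairings(helices, min_len=13, max_len=26):
    """Count (n):(n + 4) residue pairings, grouped by helix length.

    helices: iterable of one-letter helix sequences (hypothetical input format).
    Returns {helix_length: Counter({"ALA:GLY": count, ...})}.
    """
    by_length = defaultdict(Counter)
    for seq in helices:
        length = len(seq)
        if not min_len <= length <= max_len:
            continue  # keep only the 13-26 residue window used in the study
        for n in range(length - 4):
            by_length[length][f"{THREE[seq[n]]}:{THREE[seq[n + 4]]}"] += 1
    return by_length


def relative_frequencies(counter):
    """Turn the pairing counts for one helix length into fractions."""
    total = sum(counter.values())
    return {pair: c / total for pair, c in counter.items()}


counts = count_pairings(["AEELLKKAAELLKRA"])  # one toy 15-residue helix
print(counts[15]["LEU:ALA"])  # 3
```

Grouping by exact helix length is what allows trends such as "ALA:GLY increases with length" to be read off by comparing the frequency tables across lengths.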
Affiliation(s)
- Cevdet Nacar
- Department of Biophysics, School of Medicine, Marmara University, Istanbul, Turkey.

3
Heinzinger M, Elnaggar A, Wang Y, Dallago C, Nechaev D, Matthes F, Rost B. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 2019; 20:723. [PMID: 31847804 PMCID: PMC6918593 DOI: 10.1186/s12859-019-3220-8] [Received: 05/03/2019] [Accepted: 11/13/2019] [Indexed: 12/15/2022]
Abstract
BACKGROUND Predicting protein function and structure from sequence is one important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both these problems are addressed by the new methodology introduced here. RESULTS We introduced a novel way to represent protein sequences as continuous vectors (embeddings) by using the language model ELMo taken from natural language processing. By modeling protein sequences, ELMo effectively captured the biophysical properties of the language of life from unlabeled big data (UniRef50). We refer to these new embeddings as SeqVec (Sequence-to-Vector) and demonstrate their effectiveness by training simple neural networks for two different tasks. At the per-residue level, secondary structure (Q3 = 79% ± 1, Q8 = 68% ± 1) and regions with intrinsic disorder (MCC = 0.59 ± 0.03) were predicted significantly better than through one-hot encoding or through Word2vec-like approaches. At the per-protein level, subcellular localization was predicted in ten classes (Q10 = 68% ± 1) and membrane-bound proteins were distinguished from water-soluble ones (Q2 = 87% ± 1). Although SeqVec embeddings generated the best predictions from single sequences, no solution improved over the best existing method using evolutionary information. Nevertheless, our approach improved over some popular methods using evolutionary information and for some proteins even beat the best. Thus, the embeddings prove to condense the underlying principles of protein sequences. Overall, the important novelty is speed: where the lightning-fast HHblits needed on average about two minutes to generate the evolutionary information for a target protein, SeqVec created embeddings on average in 0.03 s.
As this speed-up is independent of the size of growing sequence databases, SeqVec provides a highly scalable approach for the analysis of big data in proteomics, e.g. microbiome or metaproteome analysis. CONCLUSION Transfer-learning succeeded in extracting information from unlabeled sequence databases relevant for various protein prediction tasks. SeqVec modeled the language of life, namely the principles underlying protein sequences, better than any features suggested by textbooks and prediction methods. The exception is evolutionary information; however, that information is not available at the level of a single sequence.
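The disorder result above is reported as a Matthews correlation coefficient (MCC), which summarizes a two-state per-residue prediction from its confusion-matrix counts as MCC = (TP·TN - FP·FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The sketch below computes it for toy label strings; the "D" (disordered) / "O" (ordered) encoding is an assumption for illustration, not SeqVec's output format.

```python
import math


def mcc(observed, predicted, positive="D"):
    """Matthews correlation coefficient for a two-state per-residue prediction.

    observed / predicted: equal-length strings; `positive` marks the disordered
    class (the "D"/"O" labels are an assumption for this example).
    """
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    pairs = list(zip(observed, predicted))
    tp = sum(o == positive and p == positive for o, p in pairs)
    tn = sum(o != positive and p != positive for o, p in pairs)
    fp = sum(o != positive and p == positive for o, p in pairs)
    fn = sum(o == positive and p != positive for o, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


print(mcc("DDDOOOOOOO", "DDOOOOOOOD"))  # 11/21, about 0.524
```

MCC is preferred over plain accuracy here because disordered residues are usually a minority class, and MCC stays near 0 for a predictor that simply labels everything as ordered.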
Affiliation(s)
- Michael Heinzinger
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany.
- Ahmed Elnaggar
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Yu Wang
- Leibniz Supercomputing Centre, Boltzmannstr. 1, 85748, Garching/Munich, Germany
- Christian Dallago
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Dmitrii Nechaev
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- TUM Graduate School, Center of Doctoral Studies in Informatics and its Applications (CeDoSIA), Boltzmannstr. 11, 85748, Garching, Germany
- Florian Matthes
- TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany
- Burkhard Rost
- Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany
- Institute for Advanced Study (TUM-IAS), Lichtenbergstr. 2a, 85748, Garching/Munich, Germany
- TUM School of Life Sciences Weihenstephan (WZW), Alte Akademie 8, Freising, Germany
- Department of Biochemistry and Molecular Biophysics & New York Consortium on Membrane Protein Structure (NYCOMPS), Columbia University, 701 West, 168th Street, New York, NY, 10032, USA

4
Depth dependent amino acid substitution matrices and their use in predicting deleterious mutations. Progress in Biophysics and Molecular Biology 2017; 128:14-23. [DOI: 10.1016/j.pbiomolbio.2017.02.004] [Received: 09/13/2016] [Revised: 01/06/2017] [Accepted: 02/07/2017] [Indexed: 12/31/2022]

5
Capturing native/native like structures with a physico-chemical metric (pcSM) in protein folding. Biochimica et Biophysica Acta - Proteins and Proteomics 2013; 1834:1520-31. [PMID: 23665455 DOI: 10.1016/j.bbapap.2013.04.023] [Received: 02/15/2013] [Revised: 04/12/2013] [Accepted: 04/15/2013] [Indexed: 12/15/2022]
Abstract
Specification of the three-dimensional structure of a protein from its amino acid sequence, also called a "Grand Challenge" problem, has eluded a solution for over six decades. A modestly successful strategy has evolved over the last couple of decades based on the development of scoring functions (e.g. mimicking free energy) that can capture native or native-like structures from an ensemble of decoys generated as plausible candidates for the native structure. A scoring function must discriminate the native from unfolded/misfolded structures fast enough, and requires validation on large data set(s) to generate sufficient confidence in the score. Here we develop a scoring function called pcSM that detects the true native structure in the top 5 with 93% accuracy from an ensemble of candidate structures. If we eliminate the native from the ensemble of decoys, pcSM is able to capture a near-native structure (RMSD ≤ 5 Å) in the top 10 with 86% accuracy. The parameters considered in pcSM are a C-alpha Euclidean metric, secondary structural propensity, surface areas and an intramolecular energy function. pcSM has been tested on 415 systems consisting of 142,698 decoys (public and CASP; the largest reported hitherto in the literature). The average rank for the native is 2.38, a significant improvement over that existing in the literature. In-silico protein structure prediction requires robust scoring technique(s), and pcSM is easily amenable to integration into a successful protein structure prediction strategy. The tool is freely available at http://www.scfbio-iitd.res.in/software/pcsm.jsp.
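RMSD thresholds such as the 5 Å near-native cut-off above are usually computed between corresponding atoms (often C-alpha) after optimal rigid superposition. One standard way to find that superposition is the Kabsch algorithm; the NumPy sketch below assumes two equally ordered (N, 3) coordinate arrays and illustrates the metric only. It is not part of pcSM.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation: centre both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last axis if needed so R is a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Toy check: a "decoy" that is just the native rotated and translated.
rng = np.random.default_rng(0)
native = rng.normal(scale=10.0, size=(20, 3))
theta = np.pi / 6
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
decoy = native @ Rz.T + np.array([5.0, -3.0, 2.0])
print(kabsch_rmsd(native, decoy))  # ~0, rigid motions do not change RMSD
```

The reflection check matters: without it, a mirror-image decoy could be superimposed perfectly, even though it is not a physically reachable conformation.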

6
Glembo TJ, Farrell DW, Gerek ZN, Thorpe MF, Ozkan SB. Collective dynamics differentiates functional divergence in protein evolution. PLoS Comput Biol 2012; 8:e1002428. [PMID: 22479170 PMCID: PMC3315450 DOI: 10.1371/journal.pcbi.1002428] [Received: 10/11/2011] [Accepted: 01/30/2012] [Indexed: 12/29/2022]
Abstract
Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are clustered together and distant from those that have functionally diverged. Dynamic analysis also enables the mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of the permissive mutations necessary to recover ancestral function. Proteins are remarkable machines of living systems that show diverse biochemical functions. Biochemical diversity has grown over time via molecular evolution. In order to understand how diversity arose, it is fundamental to understand how the earliest proteins evolved and served as templates for the present diverse proteome.
The one sequence - one structure - one function paradigm is being extended to a new view: an ensemble of different conformations in equilibrium can evolve new function, and the analysis of inherent structural dynamics is crucial to give a more complete understanding of protein evolution. Therefore, we aim to bring structural dynamics into protein evolution through our zipping and assembly method with FRODA (ZAMF). We apply ZAMF to simultaneously obtain structures and structural dynamics of three ancestral sequences of steroid receptor proteins. By comparative dynamics analysis among the three ancestral steroid hormone receptors, (i) we show that changes in the structural dynamics indicate functional divergence and (ii) we identify all functionally critical and most of the permissive mutations necessary to evolve new function. Overall, these findings suggest that conformational dynamics may play an important role where new functions evolve through novel molecular interactions.
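Essential dynamics, mentioned above as the route to the most collective motion, is generally a principal component analysis of atomic positional fluctuations: the covariance matrix of the coordinates over an ensemble is diagonalized, and the eigenvectors with the largest eigenvalues are taken as the dominant collective modes. A minimal NumPy sketch follows, assuming the frames are already superimposed on a common reference; it is not ZAMF or FRODA code, and the toy trajectory is invented for illustration.

```python
import numpy as np


def essential_modes(frames, n_modes=3):
    """Principal components of atomic fluctuations ("essential dynamics").

    frames: (T, N, 3) coordinates, assumed already superimposed on a reference.
    Returns the n_modes largest eigenvalues and their (3N,) eigenvectors.
    """
    X = np.asarray(frames, dtype=float)
    n_frames = X.shape[0]
    X = X.reshape(n_frames, -1)          # one row of 3N coordinates per frame
    X = X - X.mean(axis=0)               # fluctuations about the mean structure
    cov = X.T @ X / (n_frames - 1)       # (3N, 3N) covariance matrix
    evals, evecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:n_modes]
    return evals[order], evecs[:, order]


# Toy trajectory: 4 atoms oscillating together along x, plus small noise.
rng = np.random.default_rng(1)
base = rng.normal(size=(4, 3))
amplitude = 2.0 * np.sin(np.arange(50))
frames = base + amplitude[:, None, None] * np.array([1.0, 0.0, 0.0])
frames = frames + rng.normal(scale=0.01, size=frames.shape)
values, modes = essential_modes(frames)
print(np.round(np.abs(modes[:, 0]).reshape(4, 3), 2))  # x column near 0.5
```

Comparing which low-dimensional subspace these top modes span is one way to ask whether two proteins "share the same dynamic subspace".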
Affiliation(s)
- Tyler J. Glembo
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
- Daniel W. Farrell
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
- Z. Nevin Gerek
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
- M. F. Thorpe
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
- S. Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America

7
Li D, Li T, Cong P, Xiong W, Sun J. A novel structural position-specific scoring matrix for the prediction of protein secondary structures. Bioinformatics 2011; 28:32-9. [DOI: 10.1093/bioinformatics/btr611] [Indexed: 11/14/2022]

8
Schröder A, Eichner J, Supper J, Eichner J, Wanke D, Henneges C, Zell A. Predicting DNA-binding specificities of eukaryotic transcription factors. PLoS One 2010; 5:e13876. [PMID: 21152420 PMCID: PMC2994704 DOI: 10.1371/journal.pone.0013876] [Received: 06/11/2010] [Accepted: 10/14/2010] [Indexed: 11/18/2022]
Abstract
Today, annotated amino acid sequences of more and more transcription factors (TFs) are readily available. Quantitative information about their DNA-binding specificities, however, is hard to obtain. Position frequency matrices (PFMs), the most widely used models to represent binding specificities, are experimentally characterized only for a small fraction of all TFs. Even for some of the most intensively studied eukaryotic organisms (i.e., human, rat and mouse), only roughly one-sixth of all proteins with an annotated DNA-binding domain have been characterized experimentally. Here, we present a new method based on support vector regression for predicting quantitative DNA-binding specificities of TFs in different eukaryotic species. This approach estimates a quantitative measure for the PFM similarity of two proteins, based on various features derived from their protein sequences. The method is trained and tested on a dataset containing 1,239 TFs with known DNA-binding specificity, and used to predict specific DNA target motifs for 645 TFs with high accuracy.
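A position frequency matrix simply counts how often each base occurs at each position of a set of aligned binding sites. The sketch below builds one from toy sites and compares two PFMs with a deliberately simple, hypothetical similarity (one minus the mean total-variation distance between aligned columns). It is not the support-vector-regression similarity learned in the study, and it assumes equally long, pre-aligned motifs.

```python
from collections import Counter

BASES = "ACGT"


def pfm(sites):
    """Position frequency matrix: per position, counts of A, C, G, T."""
    if not sites or len({len(s) for s in sites}) != 1:
        raise ValueError("need one or more binding sites of identical length")
    matrix = []
    for i in range(len(sites[0])):
        column = Counter(site[i] for site in sites)
        matrix.append([column[base] for base in BASES])  # missing bases count 0
    return matrix


def to_probabilities(matrix):
    """Normalize each PFM column to base probabilities."""
    return [[count / sum(column) for count in column] for column in matrix]


def column_similarity(pfm_a, pfm_b):
    """Hypothetical similarity in [0, 1]: 1 - mean total-variation distance
    between aligned columns. Assumes equally long, pre-aligned motifs."""
    pa, pb = to_probabilities(pfm_a), to_probabilities(pfm_b)
    if len(pa) != len(pb):
        raise ValueError("motifs must have the same length")
    distances = [0.5 * sum(abs(x - y) for x, y in zip(ca, cb))
                 for ca, cb in zip(pa, pb)]
    return 1.0 - sum(distances) / len(distances)


sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]  # toy aligned sites
matrix = pfm(sites)
print(matrix[3])  # [0, 2, 2, 0]: two C and two G at the fourth position
print(column_similarity(matrix, pfm(["TGACTCA"])))
```

Real motif-comparison tools also handle offsets and reverse complements; this sketch skips both to keep the column-by-column idea visible.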
Affiliation(s)
- Adrian Schröder
- Center for Bioinformatics Tübingen (ZBIT), University of Tübingen, Tübingen, Germany.

9
Probing protein fold space with a simplified model. J Mol Biol 2007; 375:920-33. [PMID: 18054792 DOI: 10.1016/j.jmb.2007.10.087] [Received: 08/15/2007] [Revised: 10/15/2007] [Accepted: 10/31/2007] [Indexed: 11/24/2022]
Abstract
We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (33-300 residues), amino acid compositions and numbers of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point-per-residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of parallel tempering and equi-energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (on average 4.77 Å for all-alpha, 2.93 Å for all-beta, 3.09 Å for alpha/beta and 4.89 Å for alpha+beta, and within 6 Å for 71.41%, 92.85%, 94.29% and 64.28% of the all-alpha, all-beta, alpha/beta and alpha+beta classes, respectively). Denatured structures also occur, and these have interesting structural properties that shed light on the different landscape characteristics of alpha and beta folds. We find that alpha/beta proteins with alternating alpha and beta segments (such as the beta-barrel) are more stable than proteins in other fold classes.
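Parallel tempering runs several replicas at different inverse temperatures β and periodically proposes to exchange neighbouring replicas, accepting with probability min(1, exp((β_i - β_j)(E_i - E_j))) so that every replica still samples its own Boltzmann distribution. The sketch below applies this rule to a hypothetical one-dimensional double-well energy with plain Metropolis moves; it omits the equi-energy sampler and the three-point protein model described above.

```python
import math
import random


def swap_probability(beta_i, beta_j, energy_i, energy_j):
    """Acceptance probability for exchanging the states of two replicas."""
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def metropolis_step(x, beta, energy, rng, step=0.5):
    """One Metropolis move of a 1D coordinate at inverse temperature beta."""
    x_new = x + rng.uniform(-step, step)
    d_e = energy(x_new) - energy(x)
    if d_e <= 0 or rng.random() < math.exp(-beta * d_e):
        return x_new
    return x


def parallel_tempering(energy, betas, n_sweeps=1000, seed=0):
    """Return the final state held by each replica, in the order of betas."""
    rng = random.Random(seed)
    states = [0.0] * len(betas)
    for _ in range(n_sweeps):
        # Local moves: each replica samples at its own temperature.
        states = [metropolis_step(x, b, energy, rng) for x, b in zip(states, betas)]
        # Exchange moves between neighbouring temperatures.
        for k in range(len(betas) - 1):
            p = swap_probability(betas[k], betas[k + 1],
                                 energy(states[k]), energy(states[k + 1]))
            if rng.random() < p:
                states[k], states[k + 1] = states[k + 1], states[k]
    return states


def double_well(x):
    """Hypothetical toy landscape with minima at x = -1 and x = +1."""
    return (x * x - 1.0) ** 2


final = parallel_tempering(double_well, betas=[10.0, 3.0, 1.0, 0.3])
print([round(x, 2) for x in final])
```

The hot replicas cross the barrier at x = 0 easily, and accepted swaps pass those crossings down to the cold replica, which on its own would rarely leave its starting well.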

10
Xiao X, Shao S, Ding Y, Huang Z, Chen X, Chou KC. An application of gene comparative image for predicting the effect on replication ratio by HBV virus gene missense mutation. J Theor Biol 2005; 235:555-65. [PMID: 15935173 DOI: 10.1016/j.jtbi.2005.02.008] [Received: 10/01/2004] [Revised: 12/13/2004] [Accepted: 02/09/2005] [Indexed: 11/26/2022]
Abstract
Hepatitis B viruses (HBVs) mutate rapidly and at a high rate during replication; some of these mutations significantly affect the efficiency of viral replication by enhancing or depressing it, while others have no influence at all. Gene expression is closely correlated with gene sequence. With the rapid increase in the number of newly found sequences entering data banks, it is highly desirable to develop an automated method for simulating gene regulatory function. The establishment of such a predictor will no doubt expedite the process of prioritizing genes and proteins identified by genomics efforts as potential molecular targets for drug design. Based on the power of cellular automata (CA) in treating complex systems with simple rules, a novel method for representing HBV genes as images has been introduced. The results show that the images thus obtained can very efficiently simulate the effects of gene missense mutations on virus replication. It is anticipated that CA may also serve as a useful vehicle for many other studies on complicated biological systems.
Affiliation(s)
- Xuan Xiao
- Bio-Informatics Research Center, Donghua University, Shanghai 200051, China

11
Huang JT, Wang MT. Secondary structural wobble: the limits of protein prediction accuracy. Biochem Biophys Res Commun 2002; 294:621-5. [PMID: 12056813 DOI: 10.1016/s0006-291x(02)00545-4] [Indexed: 10/27/2022]
Abstract
At present, accuracies of secondary structure prediction scarcely go beyond 70-75%. Secondary structural comparison is carried out among sequence-identified proteins. The results show that natural wobble between different secondary structural types is possible in homologous families, and the best prediction accuracy will rarely be 100%. Besides shortcomings of the prediction approaches, secondary structural wobble is found to be responsible for nearly all secondary structural prediction limits. On average, only 73.2% of amino acid residues are conserved in secondary structural type. The wobble allows alpha-class/coil and beta-class/coil transitions but not direct alpha-class/beta-class transitions. Propensity values representing the statistical occurrence of the 20 amino acid residues in secondary structural wobbles are given.
Affiliation(s)
- Ji-Tao Huang
- Department of Biochemistry, Tianjin Institute of Technology, Tianjin 300191, China.

12
Abstract
Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influences of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.
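Q3 is the percentage of residues whose predicted state (helix, strand or other) matches the observed one. When observed states come from DSSP's eight classes they must first be reduced to three; the mapping below (H, G, I to helix; E, B to strand; the rest to other) is one common convention and an assumption here, since the abstract does not say which reduction was used.

```python
# One common 8-to-3 reduction of DSSP states (an assumption; conventions vary).
DSSP_TO_3 = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}


def reduce_to_three(dssp):
    """Map a DSSP string onto helix (H), strand (E) and other (C)."""
    return "".join(DSSP_TO_3.get(state, "C") for state in dssp)


def q3(observed, predicted):
    """Percentage of residues predicted in the correct one of three states."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("need two non-empty strings of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * correct / len(observed)


observed = reduce_to_three("CHHHHGGTTEEEEBSC")  # toy DSSP assignment
predicted = "CHHHHHCCCEEEECCC"                   # toy prediction
print(observed, q3(observed, predicted))  # CHHHHHHCCEEEEECC 87.5
```

Because the reduction choice changes which residues count as helix or strand, Q3 values are only directly comparable when computed with the same mapping.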
Affiliation(s)
- Dariusz Przybylski
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York, USA.

13
Shi J, Blundell TL, Mizuguchi K. FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J Mol Biol 2001; 310:243-57. [PMID: 11419950 DOI: 10.1006/jmbi.2001.4762] [Indexed: 11/22/2022]
Abstract
FUGUE, a program for recognizing distant homologues by sequence-structure comparison (http://www-cryst.bioc.cam.ac.uk/fugue/), has three key features. (1) Improved environment-specific substitution tables. Substitutions of an amino acid in a protein structure are constrained by its local structural environment, which can be defined in terms of secondary structure, solvent accessibility, and hydrogen bonding status. The environment-specific substitution tables have been derived from structural alignments in the HOMSTRAD database (http://www-cryst.bioc.cam.ac.uk/homstrad/). (2) Automatic selection of alignment algorithm with detailed structure-dependent gap penalties. FUGUE uses the global-local algorithm to align a sequence-structure pair when they greatly differ in length and uses the global algorithm in other cases. The gap penalty at each position of the structure is determined according to its solvent accessibility, its position relative to the secondary structure elements (SSEs) and the conservation of the SSEs. (3) Combined information from both multiple sequences and multiple structures. FUGUE is designed to align multiple sequences against multiple structures to enrich the conservation/variation information. We demonstrate that the combination of these three key features implemented in FUGUE improves both homology recognition performance and alignment accuracy.
Affiliation(s)
- J Shi
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Old Addenbrookes Site, Cambridge, CB2 1GA, UK

14
Lecompte O, Thompson JD, Plewniak F, Thierry J, Poch O. Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene 2001; 270:17-30. [PMID: 11403999 DOI: 10.1016/s0378-1119(01)00461-9] [Indexed: 11/18/2022]
Abstract
Multiple alignment, since its introduction in the early seventies, has become a cornerstone of modern molecular biology. It has traditionally been used to deduce structure / function by homology, to detect conserved motifs and in phylogenetic studies. There has recently been some renewed interest in the development of multiple alignment techniques, with current opinion moving away from a single all-encompassing algorithm to iterative and / or co-operative strategies. The exploitation of multiple alignments in genome annotation projects represents a qualitative leap in the functional analysis process, opening the way to the study of the co-evolution of validated sets of proteins and to reliable phylogenomic analysis. However, the alignment of the highly complex proteins detected by today's advanced database search methods is a daunting task. In addition, with the explosion of the sequence databases and with the establishment of numerous specialized biological databases, multiple alignment programs must evolve if they are to successfully rise to the new challenges of the post-genomic era. The way forward is clearly an integrated system bringing together sequence data, knowledge-based systems and prediction methods with their inherent unreliability. The incorporation of such heterogeneous, often non-consistent, data will require major changes to the fundamental alignment algorithms used to date. Such an integrated multiple alignment system will provide an ideal workbench for the validation, propagation and presentation of this information in a format that is concise, clear and intuitive.
Affiliation(s)
- O Lecompte
- Laboratoire de Biologie et Génomique Structurales, Institut de Génétique et de Biologie Moléculaire et Cellulaire (CNRS/INSERM/ULP), BP 163, 67404 Cedex, Illkirch, France

15
Jennings AJ, Edge CM, Sternberg MJ. An approach to improving multiple alignments of protein sequences using predicted secondary structure. Protein Engineering 2001; 14:227-31. [PMID: 11391014 DOI: 10.1093/protein/14.4.227] [Indexed: 11/14/2022]
Abstract
The object of this work was to improve multiple sequence alignments using public-domain software and methods as far as possible. A method is described where the secondary structure of proteins is predicted and this information, coupled with a simplified description of the amino acids, is used to produce multiple sequence alignments. This method improved the accuracy of the resulting alignments by between 5 and 14% when compared with full sequence profile alignments (as scored against structural alignments). These improved alignments were used to predict the secondary structure of the sequences they contain. The resultant predictions were more accurate than those produced from less optimal alignments. An improvement of 6% for a three-state (helix, sheet and coil) prediction was observed between the best alignment from the method presented here and the alignment obtained using sequence only. The method makes use of public domain software, and all the associated files required to repeat the work are available from the primary author.
Affiliation(s)
- A J Jennings
- Discovery Chemistry, SmithKline Beecham Pharmaceuticals, New Frontiers Science Park, Third Avenue, Harlow, Essex, UK

16
Abstract
The effect of training a neural network secondary structure prediction algorithm with different types of multiple sequence alignment profiles derived from the same sequences is shown to provide a range of accuracy from 70.5% to 76.4%. The best accuracy of 76.4% (standard deviation 8.4%) is 3.1% (Q(3)) and 4.4% (SOV2) better than the PHD algorithm run on the same set of 406 sequence-non-redundant proteins that were not used to train either method. Residues predicted by the new method with a confidence value of 5 or greater have an average Q(3) accuracy of 84% and cover 68% of the residues. Relative solvent accessibility based on a two-state model, for 25, 5, and 0% accessibility, is predicted at 76.2, 79.8, and 86.6% accuracy, respectively. The source of the improvements obtained from training with different representations of the same alignment data is described in detail. The new Jnet prediction method resulting from this study is available in the Jpred secondary structure prediction server, and as a stand-alone computer program from: http://barton.ebi.ac.uk/. Proteins 2000;40:502-511.
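The confidence result above describes a trade-off: restricting evaluation to residues predicted with high confidence raises accuracy but lowers coverage. The sketch below computes both for a chosen threshold; the integer 0-9 confidence scale and the toy inputs are assumptions for illustration, not actual Jnet output.

```python
def accuracy_at_confidence(observed, predicted, confidence, threshold=5):
    """Return (Q3 %, coverage %) over residues with confidence >= threshold."""
    if not observed or not (len(observed) == len(predicted) == len(confidence)):
        raise ValueError("inputs must be non-empty and of equal length")
    kept = [(o, p) for o, p, c in zip(observed, predicted, confidence) if c >= threshold]
    coverage = 100.0 * len(kept) / len(observed)
    if not kept:
        return float("nan"), coverage  # accuracy is undefined with no residues
    accuracy = 100.0 * sum(o == p for o, p in kept) / len(kept)
    return accuracy, coverage


observed = "CHHHHHHCCEEEEECC"   # toy three-state observation
predicted = "CHHHHHCCCEEEECCC"  # toy prediction with two errors
confidence = [9, 8, 9, 9, 9, 6, 2, 4, 7, 8, 9, 9, 6, 3, 5, 7]  # hypothetical 0-9 scores
print(accuracy_at_confidence(observed, predicted, confidence))  # (100.0, 81.25)
print(accuracy_at_confidence(observed, predicted, confidence, threshold=0))  # (87.5, 100.0)
```

Sweeping the threshold from 0 to 9 traces the full accuracy/coverage curve; the single pair "84% at 68% coverage" is one point on such a curve.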
Affiliation(s)
- J A Cuff
- Laboratory of Molecular Biophysics, Oxford, United Kingdom
17
Affiliation(s)
- S Henikoff
- Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109-1024, USA
18
Zhang CT, Zhang R. A graphic approach to evaluate algorithms of secondary structure prediction. J Biomol Struct Dyn 2000; 17:829-42. [PMID: 10798528 DOI: 10.1080/07391102.2000.10506572] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 10/28/2022]
Abstract
Algorithms for secondary structure prediction have been under development for nearly 30 years. However, the problem of how to appropriately evaluate and compare algorithms has not yet been completely solved. Here, a graphic method for evaluating secondary structure prediction algorithms is proposed. Traditionally, the performance of an algorithm is evaluated by a single number, i.e., an accuracy under one of various definitions. Instead of a number, we use a graph to evaluate an algorithm completely, in which mapping points are distributed in a three-dimensional space; each point represents the predicted secondary structure of one protein. Because the distribution of mapping points in 3D space generally contains more information than a number or a set of numbers, algorithms are expected to be evaluated and compared more objectively by the proposed graphic method. Based on the point distribution, six evaluation parameters are proposed that describe the overall performance of the algorithm evaluated. Furthermore, the graphic method is simple and intuitive. As an example application, two advanced algorithms, the PHD and NNpredict methods, are evaluated and compared. It is shown that there is still much room for improvement in both algorithms. In particular, for both algorithms, the prediction accuracy for alpha-helices in proteins with high alpha-helix content, and for beta-strands in proteins with high beta-strand content, should be greatly improved.
Affiliation(s)
- C T Zhang
- Department of Physics, Tianjin University, China.
19
Mugilan SA, Veluraja K. Generation of deviation parameters for amino acid singlets, doublets and triplets from three-dimentional structures of proteins and its implications for secondary structure prediction from amino acid sequences. J Biosci 2000; 25:81-91. [PMID: 10824202 DOI: 10.1007/bf02985185] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 10/21/2022]
Abstract
We present a new method, secondary structure prediction by deviation parameter (SSPDP), for predicting the secondary structure of proteins from amino acid sequence. Deviation parameters (DP) for amino acid singlets, doublets and triplets were computed with respect to the secondary structural elements of proteins, based on the secondary structure assigned by DSSP (Dictionary of Secondary Structure of Proteins) for 408 selected non-homologous proteins. Amino acid triplets that are not found in the selected dataset are assigned a DP value of zero with respect to the secondary structural elements. In total, 15,432 parameters were generated out of a possible 25,260. The deviation parameter set is complete for amino acid singlets and doublets, and partially complete for amino acid triplets. These parameters were used to predict secondary structural elements from amino acid sequence. The secondary structure predicted by our method (SSPDP) was compared with that of a single-sequence method (NNPREDICT) and a multiple-sequence method (PHD). The average prediction accuracy for alpha-helix by the SSPDP, NNPREDICT and PHD methods was 57%, 44% and 69%, respectively, for the proteins in the selected dataset. For beta-strand, the prediction accuracy was 69%, 21% and 53%, respectively. This clearly indicates that secondary structure prediction by our method is as good as the PHD method and much better than the NNPREDICT method.
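The figure of 25,260 possible parameters is reproduced if each of the 20 + 20² + 20³ = 8,420 singlets, doublets and triplets carries one parameter per secondary structural element, assuming three elements (helix, strand, coil). Under that reading, which is our assumption rather than something the abstract states, the reported total of 15,432 also implies how many distinct triplets were observed in the dataset:

```python
N_AA = 20  # standard amino acids
N_SS = 3   # assumed secondary structural elements: helix, strand, coil

singlets, doublets, triplets = N_AA, N_AA ** 2, N_AA ** 3
possible = N_SS * (singlets + doublets + triplets)  # 3 * 8,420

generated = 15_432  # total reported in the abstract
# Singlet and doublet tables are complete, so the remainder are triplet parameters.
triplet_params = generated - N_SS * (singlets + doublets)
# Assumption: every observed triplet carries one parameter per structural element.
observed_triplets = triplet_params // N_SS

print(possible, triplet_params, observed_triplets)  # 25260 14172 4724
```

That the triplet remainder divides evenly by three is consistent with, though not proof of, this interpretation.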
Affiliation(s)
- S A Mugilan
- Department of Physics, Manonmaniam Sundaranar University, Tirunelveli 627 012, Tamil Nadu, India
20
Abstract
Elucidation of interrelationships among sequence, structure, function, and evolution (FESS relationships) of a family of genes or gene products is a central theme of modern molecular biology. Multiple sequence alignment has proven to be a powerful tool in many fields of study, such as phylogenetic reconstruction, identification of functionally important regions, and prediction of the higher-order structures of proteins and RNAs. However, it is far from trivial to automatically construct a multiple alignment from a set of related sequences. A variety of methods for solving this computationally difficult problem are reviewed. Several important applications of multiple alignment for elucidating FESS relationships are also discussed. For a long period, progressive methods were the only practical means of solving a multiple alignment problem of appreciable size. This situation is now changing with the development of new techniques, including several classes of iterative methods. Today's progress in multiple sequence alignment methods has been made through the multidisciplinary endeavors of mathematicians, computer scientists, and biologists in various fields, biophysicists in particular. The ideas also originate from various backgrounds: pure algorithmics, statistics, thermodynamics, and others. The outcomes are now enjoyed by researchers in many fields of biological sciences. In the near future, generalized multiple alignment may play a central role in studies of FESS relationships. The organized mixture of knowledge from multiple fields will ferment to develop fruitful results that would be hard to obtain within each area alone. I hope this review provides a useful resource for the future development of theory and practice in this rapidly expanding area of bioinformatics.
Affiliation(s)
- O Gotoh
- Saitama Cancer Center Research Institute, Japan
21
Heringa J. Two strategies for sequence comparison: profile-preprocessed and secondary structure-induced multiple alignment. Comput Chem 1999; 23:341-64. [PMID: 10404624 DOI: 10.1016/s0097-8485(99)00012-1] [Citation(s) in RCA: 112] [Impact Index Per Article: 4.5] [Indexed: 11/23/2022]
Abstract
Multiple sequence alignment remains one of the most powerful tools for assessing sequence relatedness and for identifying structurally and functionally important protein regions. In this work, two new techniques are introduced to increase the sensitivity of dynamic programming and to enable checks for alignment consistency: profile-preprocessed and secondary structure-induced alignment. Both strategies are based on the hierarchical dynamic programming technique and can be applied separately or in combination. Alignments resulting from these strategies are compared with those from the multiple alignment methods CLUSTALX and MULTAL for distant sequence sets of the flavodoxin and cupredoxin protein families.
Affiliation(s)
- J Heringa
- Division of Mathematical Biology, National Institute for Medical Research (NIMR), Mill Hill, London, UK.
22
Jermutus L, Guez V, Bedouelle H. Disordered C-terminal domain of tyrosyl-tRNA synthetase: secondary structure prediction. Biochimie 1999; 81:235-44. [PMID: 10385005 DOI: 10.1016/s0300-9084(99)80057-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/29/2022]
Abstract
The C-terminal domain (residues 320-419) of tyrosyl-tRNA synthetase (TyrRS) from Bacillus stearothermophilus is disordered in the crystal structure and involved in the binding of the anticodon arm of tRNA(Tyr). The sequences of 11 TyrRSs of prokaryotic or mitochondrial origins were aligned and the alignment showed the existence of conserved residues in the sequences of the C-terminal domains. A consensus could be deduced from the application of five programs of secondary structure prediction to the 11 sequences of the query set. These results suggested that the sequences of the C-terminal domains determined a precise and conserved secondary structure. They predicted that the C-terminal domain would have a mixed fold (alpha/beta or alpha+beta), with the alpha-helices in the first half of the sequence and the beta-strands mainly in its second half. Several programs of fold recognition from sequence alone, by threading onto known structures, were applied but none of them identified a type of fold that would be common to the different sequences of the query set. Therefore, the fold of the C-terminal, anticodon binding domain might be novel.
Affiliation(s)
- L Jermutus
- Groupe d'Ingénierie des Protéines (CNRS URA 1129), Unité de Biochimie Cellulaire, Institut Pasteur, Paris, France
23
Liu Z, Song D, Kramer A, Martin AC, Dandekar T, Schneider-Mergener J, Bautz EK, Dübel S. Fine mapping of the antigen-antibody interaction of scFv215, a recombinant antibody inhibiting RNA polymerase II from Drosophila melanogaster. J Mol Recognit 1999; 12:103-11. [PMID: 10398401 DOI: 10.1002/(sici)1099-1352(199903/04)12:2<103::aid-jmr447>3.0.co;2-b] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
A bacterially expressed single-chain antibody (scFv215) directed against the largest subunit of Drosophila RNA polymerase II was analysed. Structure and function of the antigen binding site in scFv215 were probed by chain shuffling and by site-specific mutagenesis. The entire variable region of either the heavy or the light chain was replaced by an unrelated heavy or light chain. Both replacements resulted in a total loss of binding activity, suggesting that both chains contribute to the antigen binding site. The functional contribution of each complementarity determining region (CDR) was investigated by site-specific mutagenesis of each CDR separately. Mutations in two of the CDRs, CDR1 of the light chain and CDR2 of the heavy chain, reduced the binding activity significantly. Each of the amino acids in these two CDRs was replaced individually by alanine (alanine walking). Seven amino acid substitutions in the two CDRs were found to reduce the binding activity by more than 50%. The data support a computer model of scFv215 that fits an epitope model based on a mutational analysis of the epitope, suggesting an alpha-helical structure for the main contact area.
Affiliation(s)
- Z Liu
- Universität Heidelberg, Molekulare Genetik, Im Neuenheimer Feld 230, 69120 Heidelberg, Germany
24
Baxevanis AD, Landsman D. Predictive methods using protein sequences. Methods Biochem Anal 1998; 39:246-67. [PMID: 9707934 DOI: 10.1002/9780470110607.ch11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Affiliation(s)
- A D Baxevanis
- Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA
25
Padilla-Zúñiga AJ, Rojo-Domínguez A. Non-homology knowledge-based prediction of the papain prosegment folding pattern: a description of plausible folding and activation mechanisms. Fold Des 1998; 3:271-84. [PMID: 9710573 DOI: 10.1016/s1359-0278(98)00038-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/08/2023]
Abstract
BACKGROUND A detailed knowledge of three-dimensional conformations is necessary in order to understand the close relationship between protein structure and function. Among current methodologies, homology modeling is an important tool for obtaining reliable geometries and it provides a direct alternative to X-ray or NMR techniques. In contrast, predictive methods with no three-dimensional template (non-homology) still require further validation and systematization. RESULTS Here, we present a non-homology knowledge-based strategy for the structural prediction of the proregion of a cysteine proteinase zymogen. This method analyzes individual sequences and multiple alignments of homologous sequences, making use of different published algorithms and incorporating all available structure-related information to obtain improved predictions. Our strategy yielded acceptable secondary structure and general three-dimensional assignments when compared with crystallographic data from homologous proteins. CONCLUSIONS We discuss our successes and failures as a contribution to non-homology prediction development. In addition, based on the information analyzed and generated in this work, we propose plausible folding and activation mechanisms for thiol-proteinase precursors that attempt to shed light on the molecular basis of prosegment functions.
Affiliation(s)
- A J Padilla-Zúñiga
- Departamento de Química, Universidad Autónoma Metropolitana-Iztapalapa, México, D.F., México.
26
27
Kozmin SG, Schaaper RM, Shcherbakova PV, Kulikov VN, Noskov VN, Guetsova ML, Alenin VV, Rogozin IB, Makarova KS, Pavlov YI. Multiple antimutagenesis mechanisms affect mutagenic activity and specificity of the base analog 6-N-hydroxylaminopurine in bacteria and yeast. Mutat Res 1998; 402:41-50. [PMID: 9675240 DOI: 10.1016/s0027-5107(97)00280-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 02/08/2023]
Abstract
The base analog 6-N-hydroxylaminopurine (HAP) is a potent mutagen in a variety of prokaryotic and eukaryotic organisms. In this review, we discuss recent studies of HAP mutagenic activity, genetic control and specificity in bacteria and yeast, with an emphasis on the mechanisms that protect living cells from the mutagenic and toxic effects of this base analog.
Affiliation(s)
- S G Kozmin
- Department of Genetics, Sankt-Petersburg University, Sankt-Petersburg, 199034, Russian Federation
28
Macheroux P, Hill S, Austin S, Eydmann T, Jones T, Kim SO, Poole R, Dixon R. Electron donation to the flavoprotein NifL, a redox-sensing transcriptional regulator. Biochem J 1998; 332 (Pt 2):413-9. [PMID: 9601070 PMCID: PMC1219496 DOI: 10.1042/bj3320413] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.1] [Indexed: 02/07/2023]
Abstract
Transcriptional control of the nitrogen fixation (nif) genes in response to oxygen in Azotobacter vinelandii is mediated by nitrogen fixation regulatory protein L (NifL), a regulatory flavoprotein that modulates the activity of the transcriptional activator nitrogen fixation regulatory protein A (NifA). CD spectra of purified NifL indicate that FAD is bound to NifL in an asymmetric environment and that the protein is predominantly alpha-helical. The redox potential of NifL is -226 mV at pH 8, as determined by the enzymic reduction of NifL by xanthine oxidase/xanthine in the presence of appropriate mediators. The reduction of NifL by xanthine oxidase prevented NifL from acting as an inhibitor of NifA. In the absence of electron mediators, NifL could also be reduced by Escherichia coli flavohaemoprotein (Hmp) with NADH as reductant. Hmp contains a globin-like domain with haem B as prosthetic group and an FAD-containing oxidoreductase module. The carboxyferrohaem form of Hmp was competent to reduce NifL, suggesting that electron donation to NifL originates from the flavin in Hmp rather than from direct electron transfer from the haem. Spinach ferredoxin:NAD(P) oxidoreductase, which adopts a fold similar to the FAD- and NAD-binding domains of Hmp, also reduced NifL with NADH as reductant. Re-oxidation of NifL occurs rapidly in the presence of air, raising the possibility that NifL might sense intracellular oxygen. We propose a physiological redox cycle in which the oxidation of NifL by oxygen, and hence the activation of its inhibitory properties, occurs rapidly, in contrast with the switch from the active to the reduced form of NifL, which occurs more slowly.
Affiliation(s)
- P Macheroux
- Nitrogen Fixation Laboratory, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, Norfolk, UK
29
Li-Chan EC. Methods to monitor process-induced changes in food proteins. An overview. Adv Exp Med Biol 1998; 434:5-23. [PMID: 9598186 DOI: 10.1007/978-1-4899-1925-0_2] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 02/07/2023]
Abstract
Proteins in food systems may undergo various changes in their structural properties as a consequence of processing. Whether these changes are beneficial or detrimental in terms of the nutritional, biological or functional properties of the processed system, it is important to apply analytical methods which can monitor the course of protein structural changes, in order to elucidate the underlying mechanism behind the results of different processes. Proteins are usually found in high concentrations in foods; furthermore, these proteins frequently may either initially be part of a solid food or may become insoluble due to processing. As a result, many of the traditional biochemical methods for analysis of protein structural properties in dilute solution cannot be applied directly to study food proteins. This chapter gives an overview of some potential methods which may be used to monitor the changes in quaternary, tertiary, secondary and primary structure of proteins in food systems.
Affiliation(s)
- E C Li-Chan
- Department of Food Science, University of British Columbia, Vancouver, Canada
30
Peelman F, Vinaimont N, Verhee A, Vanloo B, Verschelde JL, Labeur C, Seguret-Mace S, Duverger N, Hutchinson G, Vandekerckhove J, Tavernier J, Rosseneu M. A proposed architecture for lecithin cholesterol acyl transferase (LCAT): identification of the catalytic triad and molecular modeling. Protein Sci 1998; 7:587-99. [PMID: 9541390 PMCID: PMC2143955 DOI: 10.1002/pro.5560070307] [Citation(s) in RCA: 81] [Impact Index Per Article: 3.1] [Indexed: 11/07/2022]
Abstract
The enzyme lecithin cholesterol acyl transferase (LCAT) shares the Ser/Asp-Glu/His triad with lipases, esterases and proteases, but the low level of sequence homology between LCAT and these enzymes has not yet allowed the LCAT fold to be identified. We therefore relied on structural homology calculations using threading methods, based on alignment of the sequence against a library of solved three-dimensional protein structures, to predict the LCAT fold. We propose that LCAT, like lipases, belongs to the alpha/beta hydrolase fold family, and that the central domain of LCAT consists of seven conserved parallel beta-strands connected by four alpha-helices and separated by loops. We used the conserved features of this protein fold to predict functional domains in LCAT, and carried out site-directed mutagenesis to localize the active site residues. The wild-type enzyme and mutants were expressed in Cos-1 cells. LCAT mass was measured by ELISA, and enzymatic activity was measured on recombinant HDL, on LDL and on a monomeric substrate. We identified D345 and H377 as the catalytic residues of LCAT, together with F103 and L182 as the oxyanion hole residues. By analogy with lipases, we further propose that a potential "lid" domain at residues 50-74 of LCAT might be involved in the enzyme-substrate interaction. Molecular modeling of human LCAT was carried out using human pancreatic and Candida antarctica lipases as templates. The three-dimensional model proposed here is compatible with the positions of natural mutants for either LCAT deficiency or Fish-eye disease. Moreover, it enables prediction of the LCAT domains involved in the interaction with the phospholipid and cholesterol substrates.
Affiliation(s)
- F Peelman
- Department of Biochemistry, University of Gent, Belgium
31
Perozich J, Hempel J, Morris SM. Roles of conserved residues in the arginase family. Biochim Biophys Acta 1998; 1382:23-37. [PMID: 9507056 DOI: 10.1016/s0167-4838(97)00131-3] [Citation(s) in RCA: 72] [Impact Index Per Article: 2.8] [Indexed: 02/06/2023]
Abstract
Arginases and related enzymes metabolize arginine or similar nitrogen-containing compounds to urea or formamide. In the present report a sequence alignment of 31 members of this family was generated. The alignment, together with the crystal structure of rat liver arginase, allowed the assignment of possible functional or structural roles to 32 conserved residues and conservative substitutions. Two of these residues were previously identified as functionally essential by analysis of inherited defects in the type I arginase gene. Nearly half of the conserved residues are either glycines or prolines located at critical bends in the protein structure. Most metal-coordinating residues, including one histidine and four aspartic acid residues, are strictly conserved. Two additional histidines involved in metal-binding and catalysis are conserved in all arginases and in almost all other family members. Two positions with invariant similarities may serve as indirect metal ligands. Evolutionary relationships within this family were also suggested. Vertebrate type I and II arginases appear to have developed independently from an early gene duplication event. A ureohydrolase sequence from Caenorhabditis elegans is more closely related to other arginases than previously appreciated, while unclassified enzymes from Methanococcus jannaschii and Methanothermus fervidus appear more similar to arginase-related enzymes. In addition, enzymes from Arabidopsis thaliana and Synechocystis, previously identified as arginases, more closely resemble arginase-related enzymes than currently known arginases.
Affiliation(s)
- J Perozich
- Department of Molecular Genetics and Biochemistry, School of Medicine, University of Pittsburgh, PA 15261, USA.
32
Benner SA, Cannarozzi G, Gerloff D, Turcotte M, Chelvanayagam G. Bona Fide Predictions of Protein Secondary Structure Using Transparent Analyses of Multiple Sequence Alignments. Chem Rev 1997; 97:2725-2844. [PMID: 11851479 DOI: 10.1021/cr940469a] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Affiliation(s)
- Steven A. Benner
- Department of Chemistry, University of Florida, Gainesville, Florida 32611-7200
33
Malik HS, Eickbush TH, Goldfarb DS. Evolutionary specialization of the nuclear targeting apparatus. Proc Natl Acad Sci U S A 1997; 94:13738-42. [PMID: 9391096 PMCID: PMC28376 DOI: 10.1073/pnas.94.25.13738] [Citation(s) in RCA: 94] [Impact Index Per Article: 3.5] [Received: 08/01/1997] [Accepted: 10/02/1997] [Indexed: 02/05/2023]
Abstract
The alpha- and beta-karyopherins (Kaps), also called importins, mediate the nuclear transport of proteins. All alpha-Kaps contain a central domain composed of eight tandemly arranged armadillo-like (Arm) repeats of approximately 40 amino acids each. The number and order of these repeats have not changed since the common origin of fungi, plants, and mammals. Phylogenetic analysis suggests that the various alpha-Kaps fall into two groups, alpha1 and alpha2. Whereas animals encode both types, the yeast genome encodes only an alpha1-Kap. The beta-Kaps are characterized by 14-15 tandemly arranged HEAT motifs. We show that the Arm repeats of alpha-Kaps and the HEAT motifs of beta-Kaps are similar, suggesting that the alpha-Kaps and beta-Kaps (and, for that matter, all Arm and HEAT repeat-containing proteins) are members of the same protein superfamily. Phylogenetic analysis indicates that there are at least three major groups of beta-Kaps, consistent with their proposed cargo specificities. We present a model in which an alpha-independent beta-Kap progenitor gave rise to the alpha-dependent beta-Kaps and the alpha-Kaps.
Affiliation(s)
- H S Malik
- Department of Biology, University of Rochester, Rochester, NY 14627, USA
34
Seidel G, Adermann K, Schindler T, Ejchart A, Jaenicke R, Forssmann WG, Rösch P. Solution structure of porcine delta sleep-inducing peptide immunoreactive peptide A homolog of the shortsighted gene product. J Biol Chem 1997; 272:30918-27. [PMID: 9388238 DOI: 10.1074/jbc.272.49.30918] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 02/05/2023]
Abstract
The 77-residue delta sleep-inducing peptide immunoreactive peptide (DIP) is a close homolog of the Drosophila melanogaster shortsighted gene product. Porcine DIP (pDIP) and a peptide containing a leucine zipper-related partial sequence of pDIP, pDIP(9-46), were synthesized and studied by circular dichroism and nuclear magnetic resonance spectroscopy in combination with molecular dynamics calculations. Ultracentrifugation, size exclusion chromatography, and model calculations indicated that pDIP forms a dimer. This was confirmed by the observation of concentration-dependent thermal folding-unfolding transitions. From CD spectroscopy and the thermal folding-unfolding transitions of pDIP(9-46), it was concluded that dimerization of pDIP results from interaction between helical structures localized in the leucine zipper motif. The three-dimensional structure of the protein was determined with a modified simulated annealing protocol, using experimental data derived from nuclear magnetic resonance spectra and a modeling approach based on an established strategy for coiled-coil structures. The left-handed superhelical structure of the leucine zipper-type sequence resulting from the modeling approach is in agreement with known leucine zipper structures. In addition to the hydrophobic interactions between the amino acids at heptad positions a and d, the structure of pDIP is stabilized by the formation of interhelical i to i' + 5 salt bridges. This result was confirmed by the pH dependence of the thermal folding transitions. In addition to the amphipathic helix of the leucine zipper, a second helix is formed in the NH2-terminal part of pDIP. This helix exhibits more 3(10)-helix character and is less stable than the leucine zipper helix. For the COOH-terminal region of pDIP, no elements of regular secondary structure were observed.
Affiliation(s)
- G Seidel
- Lehrstuhl für Biopolymere, Universität Bayreuth, Universitätsstrasse 30, D-95447 Bayreuth, Germany
35
Thompson MJ, Goldstein RA. Predicting protein secondary structure with probabilistic schemata of evolutionarily derived information. Protein Sci 1997; 6:1963-75. [PMID: 9300496 PMCID: PMC2143796 DOI: 10.1002/pro.5560060917] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.6] [Indexed: 02/05/2023]
Abstract
We demonstrate the applicability of our previously developed Bayesian probabilistic approach for predicting residue solvent accessibility to the problem of predicting secondary structure. Using only single-sequence data, this method achieves a three-state accuracy of 67% over a database of 473 non-homologous proteins. This approach is more amenable to inspection and less likely to overlearn specifics of a dataset than "black box" methods such as neural networks. It is also conceptually simpler and less computationally costly. We also introduce a novel method for representing and incorporating multiple-sequence alignment information within the prediction algorithm, achieving 72% accuracy over a dataset of 304 non-homologous proteins. This is accomplished by creating a statistical model of the evolutionarily derived correlations between patterns of amino acid substitution and local protein structure. This model consists of parameter vectors, termed "substitution schemata," which probabilistically encode the structure-based heterogeneity in the distributions of amino acid substitutions found in alignments of homologous proteins. The model is optimized for structure prediction by maximizing the mutual information between the set of schemata and the database of secondary structures. Unlike "expert heuristic" methods, this approach has been demonstrated to work well over large datasets. Unlike the opaque neural network algorithms, this approach is physicochemically intelligible. Moreover, the model optimization procedure, the formalism for predicting one-dimensional structural features and our previously developed method for tertiary structure recognition all share a common Bayesian probabilistic basis. This consistency starkly contrasts with the hybrid and ad hoc nature of methods that have dominated this field in recent years.
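The abstract states that the schemata are optimized by maximizing the mutual information between the set of schemata and the database of secondary structures. The Python sketch below shows only that objective. It estimates I(S; X) in bits with plug-in frequencies from hypothetical hard, per-residue schema assignments. The actual schemata are probabilistic parameter vectors, and the authors' optimization procedure is not reproduced here.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(schemata: Sequence[Hashable],
                       structures: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(S; X), in bits, between paired per-residue
    labels: the schema assigned to a residue and its observed structure."""
    if len(schemata) != len(structures):
        raise ValueError("label sequences must have equal length")
    n = len(schemata)
    if n == 0:
        raise ValueError("need at least one residue")
    joint = Counter(zip(schemata, structures))
    count_s = Counter(schemata)
    count_x = Counter(structures)
    mi = 0.0
    for (s, x), count in joint.items():
        p_joint = count / n
        # Accumulate p(s,x) * log2[ p(s,x) / (p(s) * p(x)) ]
        mi += p_joint * math.log2(p_joint / ((count_s[s] / n) * (count_x[x] / n)))
    return mi


# Hypothetical toy data: schema labels that perfectly separate helix from strand.
print(mutual_information(["s1", "s1", "s2", "s2"], ["H", "H", "E", "E"]))  # 1.0
```

A schema set that carries no structural information scores 0 bits, while one that fully determines a two-way helix/strand split scores 1 bit. Maximizing this quantity therefore favours schemata whose substitution patterns are predictive of local structure.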
Affiliation(s)
- M J Thompson
- Biophysics Research Division, University of Michigan, Ann Arbor 48109-1055, USA
36
Hipp WM, Pott AS, Thum-Schmitz N, Faath I, Dahl C, Trüper HG. Towards the phylogeny of APS reductases and sirohaem sulfite reductases in sulfate-reducing and sulfur-oxidizing prokaryotes. Microbiology (Reading) 1997; 143 (Pt 9):2891-2902. [PMID: 9308173 DOI: 10.1099/00221287-143-9-2891] [Citation(s) in RCA: 123] [Impact Index Per Article: 4.6] [Indexed: 02/05/2023]
Abstract
The genes for adenosine-5'-phosphosulfate (APS) reductase, aprBA, and sirohaem sulfite reductase, dsrAB, from the sulfur-oxidizing phototrophic bacterium Chromatium vinosum strain D (DSMZ 180(T)) were cloned and sequenced. Statistically significant sequence similarities and similar physicochemical properties suggest that the aprBA and dsrAB gene products from Chr. vinosum are true homologues of their counterparts from the sulfate-reducing chemotrophic archaeon Archaeoglobus fulgidus and the sulfate-reducing chemotrophic bacterium Desulfovibrio vulgaris. Evidence for the proposed duplication of a common ancestor of the dsrAB genes is provided. Phylogenetic analyses revealed a greater evolutionary distance between the enzymes from Chr. vinosum and D. vulgaris than between those from A. fulgidus and D. vulgaris. The data reported in this study are most consistent with the concept of common ancestral protogenotic genes both for dissimilatory sirohaem sulfite reductases and for APS reductases. The aprA gene was demonstrated to be a suitable DNA probe for the identification of apr genes from organisms of different phylogenetic positions. PCR primers and conditions for the amplification of apr homologous regions are described.
Affiliation(s)
- Wolfgang M Hipp
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 168, 53115 Bonn, Germany
- Andrea S Pott
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 168, 53115 Bonn, Germany
- Natalie Thum-Schmitz
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 168, 53115 Bonn, Germany
- Ilka Faath
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 168, 53115 Bonn, Germany
- Christiane Dahl
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 168, 53115 Bonn, Germany
- Hans G Trüper
- Institut für Mikrobiologie & Biotechnologie, Rheinische Friedrich-Wilhelms-Universität Bonn, Meckenheimer Allee 168, 53115 Bonn, Germany
37
Rink R, Fennema M, Smids M, Dehmel U, Janssen DB. Primary structure and catalytic mechanism of the epoxide hydrolase from Agrobacterium radiobacter AD1. J Biol Chem 1997; 272:14650-7. [PMID: 9169427 DOI: 10.1074/jbc.272.23.14650] [Citation(s) in RCA: 135] [Impact Index Per Article: 5.0] [Indexed: 02/04/2023]
Abstract
The epoxide hydrolase gene from Agrobacterium radiobacter AD1, a bacterium that is able to grow on epichlorohydrin as the sole carbon source, was cloned by means of the polymerase chain reaction with two degenerate primers based on the N-terminal and C-terminal sequences of the enzyme. The epoxide hydrolase gene coded for a protein of 294 amino acids with a molecular mass of 34 kDa. An identical epoxide hydrolase gene was cloned from chromosomal DNA of the closely related strain A. radiobacter CFZ11. The recombinant epoxide hydrolase was expressed up to 40% of the total cellular protein content in Escherichia coli BL21(DE3), and the purified enzyme had a kcat of 21 s⁻¹ with epichlorohydrin. Amino acid sequence similarity of the epoxide hydrolase with eukaryotic epoxide hydrolases, haloalkane dehalogenase from Xanthobacter autotrophicus GJ10, and bromoperoxidase A2 from Streptomyces aureofaciens indicated that it belonged to the alpha/beta-hydrolase fold family. This conclusion was supported by secondary structure predictions and analysis of the secondary structure with circular dichroism spectroscopy. The catalytic triad residues of epoxide hydrolase are proposed to be Asp107, His275, and Asp246. Replacement of these residues with Ala/Glu, Arg/Gln, and Ala, respectively, resulted in a dramatic loss of activity toward epichlorohydrin. The reaction mechanism of epoxide hydrolase proceeds via a covalently bound ester intermediate, as was shown by single-turnover experiments with the His275→Arg mutant of epoxide hydrolase, in which the ester intermediate could be trapped.
Affiliation(s)
- R Rink
- Department of Biochemistry, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG, Groningen, The Netherlands
38
Abstract
The accuracy of secondary structure prediction methods has been improved significantly by the use of aligned protein sequences. The PHD method and the NNSSP method reach 71 to 72% sustained overall three-state accuracy when multiple sequence alignments are combined with neural networks and nearest-neighbor algorithms, respectively. We introduce a variant of the nearest-neighbor approach that can achieve similar accuracy using a single sequence as the query input. We compute the 50 best non-intersecting local alignments of the query sequence with each sequence from a set of proteins with known 3D structures. Each position of the query sequence is thereby aligned with database amino acids in alpha-helical, beta-strand or coil states. The predicted secondary structure at each position is the state whose aligned positions have the maximal total score. On the dataset of 124 non-membrane non-homologous proteins, used earlier as a benchmark for secondary structure predictions, our method reaches an overall three-state accuracy of 71.2%. The performance accuracy is verified by an additional test on 461 non-homologous proteins, giving an accuracy of 71.0%. The main strength of the method is the high level of prediction accuracy for proteins without any known homolog. Using multiple sequence alignments as input, the method has a prediction accuracy of 73.5%. Prediction of secondary structure by the SSPAL method is available via the Baylor College of Medicine World Wide Web server.
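The per-position voting step described above can be sketched as follows. This is an illustrative Python sketch, not the SSPAL implementation. Three choices are assumptions: the hit format `(query_position, database_state, alignment_score)`, defaulting to coil at positions no alignment covers, and breaking ties in the order H, E, C. A `q3` helper computes the three-state accuracy quoted in the abstract. Both function names are hypothetical.

```python
from collections import defaultdict

STATES = ("H", "E", "C")  # alpha-helix, beta-strand, coil; order also breaks ties


def vote_secondary_structure(query_length, aligned_hits):
    """Predict one state per query position by summing alignment scores per state.

    aligned_hits: iterable of (query_position, database_state, alignment_score)
    tuples (a hypothetical input format), one per aligned database residue.
    """
    totals = [defaultdict(float) for _ in range(query_length)]
    for pos, state, score in aligned_hits:
        if state not in STATES:
            raise ValueError(f"unknown state {state!r}")
        totals[pos][state] += score
    prediction = []
    for per_state in totals:
        if not per_state:
            prediction.append("C")  # assumption: uncovered positions default to coil
        else:
            # max() returns the first maximal item, so ties resolve in STATES order
            prediction.append(max(STATES, key=lambda s: per_state.get(s, 0.0)))
    return "".join(prediction)


def q3(predicted, observed):
    """Three-state accuracy: fraction of positions whose predicted state matches."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)
```

For example, hits `[(0, "H", 5.0), (0, "C", 2.0), (1, "E", 3.0), (1, "H", 1.0), (2, "C", 4.0)]` on a four-residue query give `"HECC"`. Against an observed `"HEEC"` that prediction scores a Q3 of 0.75.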
Affiliation(s)
- A A Salamov
- Department of Cell Biology, Baylor College of Medicine, Houston, TX 77030, USA
39
Abstract
In this study we present an accurate secondary structure prediction procedure that uses a query sequence and its related sequences. The most novel aspect of our approach is its reliance on local pairwise alignment of the sequence to be predicted with each related sequence rather than on a multiple alignment. The residue-by-residue accuracy of the method is 75% in three structural states after jack-knife tests. The gain in prediction accuracy over existing techniques, which reach at best 72%, is achieved by secondary structure propensities based on both local and long-range effects, by the use of similar-sequence information in the form of carefully selected pairwise alignment fragments, and by reliance on a large collection of known protein primary structures. The method is especially appropriate for large-scale sequence analysis efforts such as genome characterization, where precise and significant multiple sequence alignments are not available or achievable.
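Local pairwise alignment, on which this method relies, is classically computed with the Smith-Waterman dynamic-programming recurrence. The sketch below is a generic textbook version and not the authors' procedure. It assumes a simple match/mismatch score and a linear gap penalty in place of the substitution matrices and gap models such methods actually use, and the default parameter values are arbitrary.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """
    def sub(x, y):
        return match if x == y else mismatch

    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(
                0,  # local alignment: a negative-scoring prefix is dropped
                h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                h[i - 1][j] + gap,
                h[i][j - 1] + gap,
            )
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j

    # Trace back from the best-scoring cell until a zero cell is reached.
    out_a, out_b = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Unlike a global alignment, the zero floor lets a shared fragment such as `ACGT` inside `XXACGTYY` and `ZACGTZ` be reported on its own, without penalties from the unrelated flanks.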
Affiliation(s)
- D Frishman
- European Molecular Biology Laboratory, Heidelberg, Germany
40
Osuna J, Soberón X, Morett E. A proposed architecture for the central domain of the bacterial enhancer-binding proteins based on secondary structure prediction and fold recognition. Protein Sci 1997; 6:543-55. [PMID: 9070437 PMCID: PMC2143673 DOI: 10.1002/pro.5560060304] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.0] [Indexed: 02/03/2023]
Abstract
The expression of genes transcribed by the RNA polymerase with the alternative sigma factor sigma 54 (E sigma 54) is absolutely dependent on activator proteins that bind to enhancer-like sites located far upstream from the promoter. These unique prokaryotic proteins, known as enhancer-binding proteins (EBP), mediate open promoter complex formation in a reaction dependent on NTP hydrolysis. The best characterized proteins of this family of regulators are NtrC and NifA, which activate genes required for ammonia assimilation and nitrogen fixation, respectively. In a recent IRBM course ("Frontiers of protein structure prediction," IRBM, Pomezia, Italy, 1995; see web site http://www.mrc-cpe.cam.uk/irbm-course95/), one of us (J.O.) participated in the elaboration of the proposal that the Central domain of the EBPs might adopt the classical mononucleotide-binding fold. This suggestion was based on the results of a new protein fold recognition algorithm (Map) and on the mapping of correlated mutations calculated for the sequence family onto the same mononucleotide-binding fold topology. In this work, we present new data that support the previous conclusion. The results from a number of different secondary structure prediction programs suggest that the Central domain could adopt an alpha/beta topology. The fold recognition programs ProFIT 0.9, 3D PROFILE combined with secondary structure prediction, and 123D suggest a mononucleotide-binding fold topology for the Central domain amino acid sequence. Finally, and most importantly, three of five reported residue alterations that impair the Central domain ATPase activity of the E sigma 54 activators map to polypeptide regions that might play roles equivalent to those of the regions involved in nucleotide binding in the mononucleotide-binding proteins. Furthermore, the known residue substitutions that alter the function of the E sigma 54 activators while leaving the Central domain ATPase activity intact map to a region proposed to play a role equivalent to that of the effector region of the GTPase superfamily.
Affiliation(s)
- J Osuna
- Departamento de Reconocimiento Molecular Bioestructura, Universidad Nacional Autónoma de México, México.
41
Pedersen JT, Moult J. Ab initio protein folding simulations with genetic algorithms: Simulations on the complete sequence of small proteins. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(1997)1+<179::aid-prot23>3.0.co;2-k] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
42
Frishman D, Argos P. The future of protein secondary structure prediction accuracy. FOLDING & DESIGN 1997; 2:159-62. [PMID: 9218953 DOI: 10.1016/s1359-0278(97)00022-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 02/04/2023]
Abstract
BACKGROUND The accuracy of secondary structure prediction for a protein from knowledge of its sequence has been significantly improved by about 7% to the 70-75% range by inclusion of information residing in sequences similar to the query sequence. The scientific literature has been inconsistent, if not negative, regarding chances for further improvement from the vast knowledge to be provided by genome sequencing efforts. RESULTS By applying a prediction technique that is particularly sensitive to added sequence information to a standard set of query sequences with related primary structures taken from chronologically successive releases of the SWISS-PROT database, it is shown that prediction accuracy can be expected to reach 80-85% with a large 10-fold increase in present sequence knowledge. CONCLUSIONS Even with present prediction approaches, improvement in prediction accuracy can still be expected, albeit limited to no more than 10%.
Affiliation(s)
- D Frishman
- Martinsried Institute for Protein Sequences, Max-Planck-Institute for Biochemistry, Germany.
43
Abstract
N-myristoylation is an acylation process absolutely specific to an N-terminal glycine in proteins. This maturation process concerns about a hundred proteins in lower and higher eukaryotes involved in oncogenesis, in secondary cellular signalling, in infectivity of retroviruses and, marginally, of other virus types. The cytosolic enzyme responsible for this activity, N-myristoyltransferase (NMT), studied since 1987, has been purified from different sources. However, studies of the specificities of the various NMTs have not progressed in detail except for those on the yeast cytosolic enzyme. Still to be explained are differences in species specificity and between various putative isoenzymes, as well as whether the data obtained from the yeast enzyme can be transposed to other NMTs. The present review discusses data on the various addressing processes subsequent to myristoylation, a patchwork of pathways that suggests myristoylation is only the first step of the mechanisms by which a protein associates with the membrane. Concerning the enzyme itself, there is evidence that NMT is also present in the endoplasmic reticulum and that its substrate specificity differs from that of the cytosolic enzyme(s). These differences have major implications for their differential inhibition and for their respective roles in several pathologies. For instance, the NMTs from mammals are clearly different from those found in several microorganisms, which raises the question of whether NMT may be a new target for fungicides. Finally, since myristoylation has a central role in virus maturation and oncogenesis, specific NMT inhibitors might lead to potent antiviral and anticancer agents.
Affiliation(s)
- J A Boutin
- Département de Chimie des Peptides, Institut de Recherches Servier 11, Suresnes, France
44
Chebrou H, Bigey F, Arnaud A, Galzy P. Study of the amidase signature group. BIOCHIMICA ET BIOPHYSICA ACTA 1996; 1298:285-93. [PMID: 8980653 DOI: 10.1016/s0167-4838(96)00145-8] [Citation(s) in RCA: 105] [Impact Index Per Article: 3.8] [Indexed: 02/03/2023]
Abstract
Computer methods for database search, multiple alignment and cluster analysis indicated significant homology between amino-acid sequences of 21 amidases or amidohydrolases (EC 3.5). All of them were found to be involved in the reduction of organic nitrogen compounds and ammonia production. A conserved motif was found which may be important in amide binding and in catalytic mechanisms. Homology studies between these amidases and some ureases, nitrilases and acyl-transferases or enzymes with unknown functions provided new insight into the evolution of these proteins. Dissemination of these genes seemed to be facilitated by transfer of genetic elements such as transposons and plasmids.
Affiliation(s)
- H Chebrou
- Chaire de Microbiologie Industrielle et de Génétique des Micro-organismes, E.N.S.A.-I.N.R.A., Montpellier, France
45
King RD, Sternberg MJ. Identification and application of the concepts important for accurate and reliable protein secondary structure prediction. Protein Sci 1996; 5:2298-310. [PMID: 8931148 PMCID: PMC2143286 DOI: 10.1002/pro.5560051116] [Citation(s) in RCA: 338] [Impact Index Per Article: 12.1] [Indexed: 02/03/2023]
Abstract
A protein secondary structure prediction method from multiply aligned homologous sequences is presented with an overall per-residue three-state accuracy of 70.1%. There are two aims: to obtain high accuracy by identification of a set of concepts important for prediction followed by use of linear statistics; and to provide insight into the folding process. The important concepts in secondary structure prediction are identified as: residue conformational propensities, sequence edge effects, moments of hydrophobicity, position of insertions and deletions in aligned homologous sequences, moments of conservation, auto-correlation, residue ratios, secondary structure feedback effects, and filtering. Explicit use of edge effects, moments of conservation, and auto-correlation is new to this paper. The relative importance of the concepts used in prediction was analyzed by stepwise addition of information and examination of weights in the discrimination function. The simple and explicit structure of the prediction allows the method to be reimplemented easily. The accuracy of a prediction is predictable a priori. This permits evaluation of the utility of the prediction: 10% of the chains predicted were identified correctly as having a mean accuracy of > 80%. Existing high-accuracy prediction methods are "black-box" predictors based on complex nonlinear statistics (e.g., neural networks in PHD: Rost & Sander, 1993a). For medium- to short-length chains (≥ 90 and < 170 residues), the prediction method is significantly more accurate (P < 0.01) than the PHD algorithm (probably the most commonly used algorithm). In combination with PHD, an algorithm is formed that is significantly more accurate than either method, with an estimated overall three-state accuracy of 72.4%, the highest accuracy reported for any prediction method.
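One of the concepts listed above, the moment of hydrophobicity, measures how amphipathic a segment is when its residues are placed at a fixed angular step around an axis. The sketch below computes an Eisenberg-style hydrophobic moment. The abstract does not specify the scale or exact formulation King and Sternberg used, so the Kyte-Doolittle hydropathy scale and the default 100° helical step are assumptions, and `hydrophobic_moment` is a hypothetical name.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Magnitude of the vector sum of per-residue hydrophobicities.

    Residue n is placed at angle n * angle_deg. 100 degrees matches an ideal
    alpha-helix (3.6 residues per turn); 180 degrees models a strictly
    alternating strand.
    """
    delta = math.radians(angle_deg)
    x = sum(scale[aa] * math.cos(n * delta) for n, aa in enumerate(segment))
    y = sum(scale[aa] * math.sin(n * delta) for n, aa in enumerate(segment))
    return math.hypot(x, y)
```

With a 180° step, `"IR"` gives a large moment (9.0) because the hydrophobic and charged residues sit on opposite faces, while `"AA"` gives zero. Dividing by segment length yields a mean moment that is easier to compare across windows of different sizes.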
Affiliation(s)
- R D King
- Biomolecular Modelling Laboratory, Imperial Cancer Research Fund, London, United Kingdom
46
Abstract
Every sequence comparison method requires a set of scores. For aligning protein sequences, substitution scores are based on models of amino acid conservation and properties, and matrices of these scores have substantially improved in recent years. Position-specific scoring matrices provide representations of sequence families that are capable of detecting subtle similarities. Comprehensive evaluations can effectively guide the choice of scores for sequence alignment and searching applications, including those that aid in the prediction of protein structures.
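A position-specific scoring matrix of the kind mentioned above can be sketched as per-column log-odds scores. This minimal version assumes three simplifications: an ungapped alignment of equal-length sequences over the 20 standard amino acids, a uniform background distribution, and a flat additive pseudocount. Production tools use more elaborate sequence-weighting and pseudocount schemes. The names `build_pssm` and `score` are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0, background=None):
    """Log-odds (bits) PSSM from ungapped, equal-length aligned sequences."""
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same length")
    if background is None:
        # Assumption: uniform background frequencies.
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = {aa: 0 for aa in AMINO_ACIDS}
        for seq in alignment:
            counts[seq[col]] += 1  # gaps or non-standard residues raise KeyError
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background[aa])
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, seq):
    """Sum of column log-odds scores for a sequence of the profile's length."""
    return sum(column[aa] for column, aa in zip(pssm, seq))
```

Summing the column scores for a candidate sequence, as `score` does, gives its log-odds of arising from the family profile rather than from the background. Positive values therefore suggest family membership.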
Affiliation(s)
- S Henikoff
- Howard Hughes Medical Institute, Basic Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA 98104, USA.
47
Beissinger M, Paulus C, Bayer P, Wolf H, Rösch P, Wagner R. Sequence-specific resonance assignments of the 1H-NMR spectra and structural characterization in solution of the HIV-1 transframe protein p6. EUROPEAN JOURNAL OF BIOCHEMISTRY 1996; 237:383-92. [PMID: 8647076 DOI: 10.1111/j.1432-1033.1996.0383k.x] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.0] [Indexed: 02/01/2023]
Abstract
The frameshift protein p6* encoded directly upstream of the protease in the human immunodeficiency virus type 1 (HIV-1) pol reading frame is thought to be a natural inhibitor of protease activation and to play a role in the polyprotein processing of Gag and Gag-Pol precursors. To allow structural characterization of the p6* transframe protein, the p6* coding region was cloned into the vector pGEX-KG and expressed in Escherichia coli as a fusion protein with glutathione S-transferase (GST) under the control of the tac promoter. Thrombin cleavage of the construct resulted in a 70-amino-acid polypeptide, which is extended by two additional residues at the N-terminus compared with the natural p6* sequence. The native purification procedure, including an affinity and a size-exclusion chromatography step, yielded sufficient amounts of highly pure protein suitable for NMR spectroscopy. Fluorescence, circular dichroism and 1H-NMR spectroscopy were applied to characterize the structure of the protein. Two-dimensional NMR spectra provided essentially complete sequence-specific resonance assignments at pH 5.9. Although there is evidence for a helix-forming tendency in the N-terminus of the protein, the experiments indicate that p6* has no overall stable secondary or tertiary structure, with the single tryptophan exposed in aqueous solution. However, the results reported herein open the way to further characterizing the interaction of p6* with the HIV-1 protease in structural and functional in vitro studies.
Affiliation(s)
- M Beissinger
- Lehrstuhl für Biopolymere, Universität Bayreuth, Germany
48
Abstract
The smooth progression of the eukaryotic cell cycle relies on the periodic activation of members of a family of cell cycle kinases by regulatory proteins called cyclins. Outside of the cell cycle, cyclin homologs play important roles in regulating the assembly of transcription complexes; distant structural relatives of the conserved cyclin core or "box" can also function as general transcription factors (like TFIIB) or survive embedded in the chain of the tumor suppressor retinoblastoma protein. The present work attempts to predict the canonical secondary, supersecondary, and tertiary fold of the minimal cyclin box domain using a combination of techniques that exploit the evolutionary information captured in a multiple alignment of homologous sequences. A tandem set of closely packed helical modules is predicted to form the cyclin box domain.
Affiliation(s)
- J F Bazan
- Protein Machine Group, Department of Molecular Biology, DNAX Research Institute, Palo Alto, California 94304-1104, USA