1
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes - the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Callout Biotech, Albuquerque, NM, 87112, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
2
Wardah W, Khan M, Sharma A, Rashid MA. Protein secondary structure prediction using neural networks and deep learning: A review. Comput Biol Chem 2019; 81:1-8. [DOI: 10.1016/j.compbiolchem.2019.107093] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 07/12/2018] [Revised: 12/28/2018] [Accepted: 07/10/2019] [Indexed: 02/02/2023]
3
An alternative approach to protein folding. Biomed Res Int 2013; 2013:583045. [PMID: 24078920 PMCID: PMC3775432 DOI: 10.1155/2013/583045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2013] [Revised: 06/20/2013] [Accepted: 07/31/2013] [Indexed: 11/26/2022]
Abstract
A diffusion theory-based, all-physical ab initio protein folding simulation is described and applied. The model is based upon the drift and diffusion of protein substructures relative to one another in the multiple energy fields present. Without templates or statistical inputs, the simulations were run at physiologic and ambient temperatures (including pH). Around 100 protein secondary structures were surveyed, and twenty tertiary structures were determined. Greater than 70% of the secondary core structures with over 80% alpha helices were correctly identified on proteins ranging from 30 to 200 amino acids in length. The drift-diffusion model predicted tertiary structures with RMSD values in the 3–5 Angstrom range for proteins ranging from 30 to 150 amino acids. These predictions are among the best for an all ab initio protein simulation. Simulations could be run entirely on a desktop computer in minutes; however, more accurate tertiary structures were obtained using molecular dynamics energy relaxation. The drift-diffusion model generated realistic energy versus time traces. Rapid formation of secondary structures, followed by slow compaction towards lower-energy tertiary structures, occurred after an initial incubation period, in agreement with observations.
4
Fernandez-Fuentes N, Fiser A. A modular perspective of protein structures: application to fragment based loop modeling. Methods Mol Biol 2013; 932:141-58. [PMID: 22987351 PMCID: PMC3635063 DOI: 10.1007/978-1-62703-065-6_9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/30/2023]
Abstract
Proteins can be decomposed into supersecondary structure modules. We used a generic definition of supersecondary structure elements, so-called Smotifs, which are composed of two flanking regular secondary structures connected by a loop, to explore the evolution and current variety of structure building blocks. Here, we discuss recent observations about the saturation of Smotif geometries in protein structures and how it opens new avenues in protein structure modeling and design. As a first application of these observations we describe our loop conformation modeling algorithm, ArchPred, which takes advantage of Smotif classification. In this application, instead of focusing on specific loop properties, the method narrows down possible template conformations in other, often non-homologous structures by identifying the most likely supersecondary structure environment that cradles the loop. Beyond identifying the correct starting supersecondary structure geometry, it takes into account the fit of anchor residues, steric clashes, the match of predicted and observed dihedral angle preferences, and local sequence signal.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, University of Leeds, St. James's University Hospital, Leeds LS9 7TF, UK
- Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, 1301 Morris Park Ave, Bronx, NY 10461, USA
5
Li D, Li T, Cong P, Xiong W, Sun J. A novel structural position-specific scoring matrix for the prediction of protein secondary structures. Bioinformatics 2011; 28:32-9. [DOI: 10.1093/bioinformatics/btr611] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
6
Fernandez-Fuentes N, Dybas JM, Fiser A. Structural characteristics of novel protein folds. PLoS Comput Biol 2010; 6:e1000750. [PMID: 20421995 PMCID: PMC2858679 DOI: 10.1371/journal.pcbi.1000750] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 06/16/2009] [Accepted: 03/19/2010] [Indexed: 11/29/2022] Open
Abstract
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrences of an exhaustively classified library of supersecondary structural elements (Smotifs), in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the top 10% of most frequent ones have a higher fraction of internal contacts, while some of the most rare motifs are larger, and contain a longer loop region. Structural genomics efforts aim at exploring the repertoire of three-dimensional structures of protein molecules. While genome scale sequencing projects have already provided us with all the genes of many organisms, it is the three dimensional shape of gene encoded proteins that defines all the interactions among these components. Understanding the versatility and, ultimately, the role of all possible molecular shapes in the cell is a necessary step toward understanding how organisms function. In this work we explored the rules that identify certain shapes as novel compared to all already known structures. 
The findings of this work provide possible insights into the rules that can be used in future works to identify or design new molecular shapes or to relate folds with each other in a quantitative manner.
Affiliation(s)
- Narcis Fernandez-Fuentes
- University of Leeds, Leeds Institute of Molecular Medicine Section of Experimental Therapeutics, St. James's University Hospital, Leeds, United Kingdom
- Joseph M. Dybas
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
7
Sivan S, Filo O, Siegelmann H. Application of expert networks for predicting proteins secondary structure. ACTA ACUST UNITED AC 2007; 24:237-43. [PMID: 17236807 DOI: 10.1016/j.bioeng.2006.12.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 11/05/2006] [Revised: 12/05/2006] [Accepted: 12/06/2006] [Indexed: 02/02/2023]
Abstract
The present study utilizes expert neural networks for the prediction of protein secondary structure. We use three independent networks, one for each structure (alpha, beta and coil), as the first-level processing unit; the decision on the chosen structure for each residue is carried out by a second-level, post-processing unit, which utilizes the Chou and Fasman frequency values Falpha and Fbeta in order to strengthen and/or deplete the probability of the specific structure under investigation. The highest prediction accuracy obtained was 76%. Our method requires primitive computational means and a relatively small training set, while still being comparable to previous work. It is not meant to be an alternative to the determination of secondary structure by means of free energy minimization, integration of dynamic equations of motion or crystallography, which are expensive, time-consuming and complicated, but to provide additional constraints, which might be considered and incorporated into larger computing setups in order to reduce the initial search space for the above methods.
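The two-level scheme in this abstract can be pictured with a small sketch. It assumes the first-level networks return one score per class for a residue, and that the second-level unit scales the alpha and beta scores by per-residue propensities before renormalising. The multiplicative rule, the function name `second_level`, and the numbers in `PROPENSITY` are illustrative assumptions; the abstract does not say how the Chou and Fasman values are actually combined with the network outputs.

```python
# Hypothetical (F_alpha, F_beta) propensities per residue. These numbers are
# placeholders chosen for illustration, not values taken from the cited paper.
PROPENSITY = {
    "A": (1.42, 0.83),
    "E": (1.51, 0.37),
    "V": (1.06, 1.70),
    "G": (0.57, 0.75),
}


def second_level(residue, net_scores):
    """Adjust first-level scores with residue propensities and pick a class.

    net_scores maps 'alpha', 'beta' and 'coil' to non-negative scores that
    stand in for the outputs of the three first-level networks.
    """
    f_alpha, f_beta = PROPENSITY.get(residue, (1.0, 1.0))
    adjusted = {
        "alpha": net_scores["alpha"] * f_alpha,  # strengthened or depleted
        "beta": net_scores["beta"] * f_beta,     # strengthened or depleted
        "coil": net_scores["coil"],              # left unchanged (assumption)
    }
    total = sum(adjusted.values())
    probs = {label: value / total for label, value in adjusted.items()}
    return max(probs, key=probs.get), probs


scores = {"alpha": 0.40, "beta": 0.35, "coil": 0.25}
print(second_level("V", scores)[0])  # beta: the propensity overturns the raw ranking
print(second_level("E", scores)[0])  # alpha
```

With identical network scores, the valine residue is reassigned to beta while the glutamate residue stays alpha, which is the kind of strengthening and depletion the abstract describes.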
Affiliation(s)
- Sarit Sivan
- Department of Biomedical Engineering, Technion, Israel Institute of Technology, IIT, Haifa 32000, Israel.
8
Brylinski M, Konieczny L, Czerwonko P, Jurkowski W, Roterman I. Early-stage folding in proteins (in silico) sequence-to-structure relation. J Biomed Biotechnol 2006; 2005:65-79. [PMID: 16046811 PMCID: PMC1184056 DOI: 10.1155/jbb.2005.65] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
A sequence-to-structure library has been created based on the complete PDB database. The tetrapeptide was selected as a unit representing a well-defined structural motif. Seven structural forms were introduced for structure classification. The early-stage folding conformations were used as the objects for structure analysis and classification. The degree of determinability was estimated for the sequence-to-structure and structure-to-sequence relations. Probability calculus and informational entropy were applied for quantitative estimation of the mutual relation between them. The structural motifs representing different forms of loops and bends were found to favor particular sequences in structure-to-sequence analysis.
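As a rough illustration of how informational entropy can quantify the sequence-to-structure relation, the sketch below computes the Shannon entropy of the structural-form distribution observed for each tetrapeptide. The function name, the input layout, and the toy observations are assumptions made for this example; the abstract does not give the authors' exact estimator. An entropy of 0 bits means a tetrapeptide was always seen in one structural form, and higher values mean the sequence determines the structure less.

```python
from collections import Counter, defaultdict
from math import log2


def structure_entropy(observations):
    """Shannon entropy (bits) of structural forms seen for each tetrapeptide.

    observations is an iterable of (tetrapeptide, structural_form) pairs,
    for example as they might be extracted from a structure database.
    """
    counts = defaultdict(Counter)
    for tetrapeptide, form in observations:
        counts[tetrapeptide][form] += 1

    entropy = {}
    for tetrapeptide, form_counts in counts.items():
        total = sum(form_counts.values())
        bits = 0.0
        for n in form_counts.values():
            p = n / total
            bits -= p * log2(p)
        entropy[tetrapeptide] = bits
    return entropy


# Toy data: the letters A-G stand in for seven structural forms (hypothetical).
toy = [("AAAA", "A"), ("AAAA", "A"), ("GPGS", "A"), ("GPGS", "B")]
print(structure_entropy(toy))  # {'AAAA': 0.0, 'GPGS': 1.0}
```

Swapping the roles of the two fields in each pair gives the structure-to-sequence direction with the same function.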
Affiliation(s)
- Michał Brylinski
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Kopernika 17, 31-501, Poland
- Leszek Konieczny
- Institute of Biochemistry, Medical Faculty, Jagiellonian University, Kopernika 7, 31-501 Cracow, Poland
- Patryk Czerwonko
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Kopernika 17, 31-501, Poland
- Wiktor Jurkowski
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Kopernika 17, 31-501, Poland
- Irena Roterman (corresponding author)
- Department of Bioinformatics and Telemedicine, Medical College, Jagiellonian University, Kopernika 17, 31-501, Poland
9
Pal L, Dasgupta B, Chakrabarti P. 3(10)-Helix adjoining alpha-helix and beta-strand: sequence and structural features and their conservation. Biopolymers 2005; 78:147-62. [PMID: 15759287 DOI: 10.1002/bip.20266] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Does the amino acid use at the terminal positions of an alpha-helix become altered depending on the context, more specifically when there is an adjoining 3(10)-helix, and can a single helical cylinder encompass the resultant composite helix? An analysis of 138 and 107 cases of 3(10)-alpha and alpha-3(10) composite helices, respectively, found in known protein structures indicates that the secondary structural element occurring first imposes its characteristics on the sequence of the structural element coming next. Thus, when preceded by a 3(10)-helix, the preference of proline to occur at the N1 position of an alpha-helix is shifted to the N2 position, a typical characteristic of the C-terminal capping of the 3(10)-helix. When an alpha- or a 3(10)-helix leads into a helix of the other type, there is a bend at the junction, especially for the 3(10)-alpha composite, with the two junction residues facing inward and buried within the structure. Thus a single helical cylinder may not properly represent a composite helix, the bend providing a means for the tertiary structure to assume a globular shape, very much akin to what a proline-induced kink does to an alpha-helix. The tertiary structural context in which beta-3(10) and 3(10)-beta composites occur can be different, causing the angle between the secondary structural elements in the two cases to be different. Composites of 3(10)-helices and beta-strands are much more conserved among members in families of homologous structures than those between two types of helices; in many of the former instances, the 3(10)-helix constitutes the loops in beta-hairpin or beta-beta-corner motifs. The overall fold of the chain may be more conserved than the actual identity of the secondary structure elements in a composite.
Affiliation(s)
- Lipika Pal
- Bioinformatics Centre, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
10
Dupuis F, Sadoc JF, Mornon JP. Protein secondary structure assignment through Voronoï tessellation. Proteins 2004; 55:519-28. [PMID: 15103616 DOI: 10.1002/prot.10566] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022]
Abstract
We present a new automatic algorithm, named VoTAP (Voronoï Tessellation Assignment Procedure), which assigns secondary structures of a polypeptide chain using the list of alpha-carbon coordinates. This program uses three-dimensional Voronoï tessellation. This geometrical tool associates with each amino acid a Voronoï polyhedron, the faces of which unambiguously define contacts between residues. Thanks to the face area, for the contacts close together along the primary structure (low-order contacts) a distinction is made between strong and normal ones. This new definition yields new contact matrices, which are analyzed and used to assign secondary structures. This assignment is performed in two stages. The first one uses contacts between residues close together along the primary structure and is based on data collected on a bank of 282 well-refined nonredundant structures. In this bank, associations were made between the prints defined by these low-order contacts and the assignments performed by different automatic methods. The second step focuses on the strand assignment and uses contacts between distant residues. Comparisons with several other automatic assignment methods are presented, and the influence of resolution on the assignment is investigated.
Affiliation(s)
- Franck Dupuis
- Laboratoire de Minéralogie Cristallographie Paris, CNRS UMR 7590, Universités Paris 6 et 7, Paris, France
11
Chang DK, Cheng SF, Yang SH. A helix initiation motif, XLLRA, is stabilized by hydrogen bond, hydrophobic and van der Waals interactions. Biochim Biophys Acta 2000; 1478:39-50. [PMID: 10719173 DOI: 10.1016/s0167-4838(99)00286-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/23/2022]
Abstract
Five partially overlapping synthetic peptides containing the N-terminal portion of the leucine zipper (LZ)-like domain of human immunodeficiency virus envelope glycoprotein gp41 were used to deduce the helix initiation site. Circular dichroism (CD) data suggested a strong helix-inducing motif, LLRA. The coupling constant and nuclear Overhauser effect (NOE) results obtained from nuclear magnetic resonance experiments in 20% trifluoroethanol aqueous solution at 280 K for the four decapeptides under study suggested that the motif XLLRA, where X is a group or an amino acid residue capable of forming a hydrogen bond to arginine, constitutes a helix nucleation core. A similar conclusion was reached for a pentadecapeptide in water, suggesting that the result depended on neither chain length nor the helix-promoting medium. Detailed analysis of NOE and CD data from the four decapeptides indicated that the acetyl group and asparagine had a strong tendency to be helix N-capping, in confirmation of previous studies. Molecular modeling using restraints derived from NOE data showed that van der Waals, hydrophobic interactions and hydrogen bonds contribute synergistically to the stability of the core structure. The concept of a nucleation core consisting of a few amino acids may be generally applied in protein design and folding studies.
Affiliation(s)
- D K Chang
- Institute of Chemistry, Academia Sinica, Taipei, Taiwan.
12
Abstract
We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for alpha-helices, beta-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
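To make the contrast with sliding-window predictors concrete, here is a minimal sketch of segment-level dynamic programming. It finds the single highest-scoring segmentation, which corresponds to the "optimal segmentation" mentioned in the abstract and not to the marginal posterior predictor that the authors report as more accurate. The function names, the two-state alphabet (`H` for helix, `C` for coil), the per-residue log-scores, and the minimum segment lengths are all invented for illustration; the paper's segment models (capping signals, side chain correlations, length distributions) are not reproduced here.

```python
import math


def best_segmentation(seq, emit, min_len, max_len=30):
    """Highest-scoring segmentation of seq as a list of (start, end, state).

    emit[state][residue] is a log-score for a residue inside a segment of
    that state. Adjacent segments must have different states, so the model
    is Markovian in segments, not in residues.
    """
    n = len(seq)
    best = {(0, None): 0.0}  # (end position, state of last segment) -> score
    back = {}                # back-pointers to the previous segment boundary
    for j in range(1, n + 1):
        for s in emit:
            for length in range(min_len[s], min(max_len, j) + 1):
                i = j - length
                seg = sum(emit[s][r] for r in seq[i:j])
                for p in (None, *emit):
                    if p == s or (i, p) not in best:
                        continue
                    cand = best[(i, p)] + seg
                    if cand > best.get((j, s), -math.inf):
                        best[(j, s)] = cand
                        back[(j, s)] = (i, p)
    finals = [s for s in emit if (n, s) in best]
    if not finals:
        return []
    s = max(finals, key=lambda t: best[(n, t)])
    segments, j = [], n
    while j > 0:
        i, p = back[(j, s)]
        segments.append((i, j, s))
        j, s = i, p
    return segments[::-1]


def toy_scores(favoured, alphabet="AEGLPS"):
    """Hypothetical log-scores: -0.5 for favoured residues, -3.0 otherwise."""
    return {r: (-0.5 if r in favoured else -3.0) for r in alphabet}


EMIT = {"H": toy_scores("AEL"), "C": toy_scores("GPS")}
MIN_LEN = {"H": 4, "C": 1}

print(best_segmentation("AALEEGPGSAALLE", EMIT, MIN_LEN))
# [(0, 5, 'H'), (5, 9, 'C'), (9, 14, 'H')]
```

Replacing the `max` over candidates with a log-sum, and adding a matching backward pass, would give per-position marginal probabilities of the kind the abstract describes; that extension is left out to keep the sketch short.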
Affiliation(s)
- S C Schmidler
- Section on Medical Informatics, Stanford University School of Medicine, CA 94305, USA.
13
Han KF, Bystroff C, Baker D. Three-dimensional structures and contexts associated with recurrent amino acid sequence patterns. Protein Sci 1997; 6:1587-90. [PMID: 9232660 PMCID: PMC2143736 DOI: 10.1002/pro.5560060723] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.1] [Indexed: 02/04/2023]
Abstract
We have used cluster analysis to identify recurring sequence patterns that transcend protein family boundaries. A subset of these patterns occurs predominantly in a single type of local structure in proteins. Here we characterize the three-dimensional structures and contexts in which these sequence patterns occur, with particular attention to the interactions responsible for their structural selectivity.
Affiliation(s)
- K F Han
- Graduate Group in Biophysics, University of California San Francisco School of Medicine 94143-0448, USA
14
Fetrow JS, Palumbo MJ, Berg G. Patterns, structures, and amino acid frequencies in structural building blocks, a protein secondary structure classification scheme. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(199702)27:2<249::aid-prot11>3.0.co;2-m] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
15
Lin TH, Peng WJ, Lin JJ. A contact scoring matrix for qualitative prediction of change in folding of alpha-helices in globular proteins caused by a mutation. Biochim Biophys Acta 1997; 1337:17-26. [PMID: 9003433 DOI: 10.1016/s0167-4838(96)00144-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/03/2023]
Abstract
The atomic pairs in contact for atoms from pairs of amino-acid residues on pairs of helices in a protein database consisting of 48 proteins of known tertiary structure from the Brookhaven Protein Data Bank are searched and counted to construct a primary scoring system. Each score in the primary scoring system is weighted further with the possibility of occurrence of each residue pair in the protein database to give a final scoring matrix. Scores for predicting change in folding of alpha-helices in a mutant protein are calculated by assuming that every pair of helices in the protein can closely interact with each other. It is shown that the change in folding of alpha-helices in several mutant proteins is reflected in both the change of the contact scores and the calculated helix geometry.
Affiliation(s)
- T H Lin
- Department of Life Science, National Tsing Hua University, Taiwan, ROC.
16
Abstract
All of molecular recognition, from the binding of substrates by enzymes, information transfer in replicating and processing the genetic information to the folding of proteins, is dominated by non-covalent interactions. Perhaps the most difficult challenge is understanding protein folding because each group in the molecule has to recognize with which ones it has to pair. Protein engineering is providing an experimental entry to determine the magnitude, nature and importance of the various levels of recognition in protein folding. In addition to providing the energetics of specific interactions, fundamental information has been given on the energetics of burial of hydrophobic and hydrophilic solvent-accessible surface areas and their specific roles in stabilizing protein cores and helices.
17
Han KF, Baker D. Global properties of the mapping between local amino acid sequence and local structure in proteins. Proc Natl Acad Sci U S A 1996; 93:5814-8. [PMID: 8650175 PMCID: PMC39144 DOI: 10.1073/pnas.93.12.5814] [Citation(s) in RCA: 93] [Impact Index Per Article: 3.3] [Indexed: 02/01/2023] Open
Abstract
Local protein structure prediction efforts have consistently failed to exceed approximately 70% accuracy. We characterize the degeneracy of the mapping from local sequence to local structure responsible for this failure by investigating the extent to which similar sequence segments found in different proteins adopt similar three-dimensional structures. Sequence segments 3-15 residues in length from 154 different protein families are partitioned into neighborhoods containing segments with similar sequences using cluster analysis. The consistency of the sequence-to-structure mapping is assessed by comparing the local structures adopted by sequence segments in the same neighborhood in proteins of known structure. In the 154 families, 45% and 28% of the positions occur in neighborhoods in which one and two local structures predominate, respectively. The sequence patterns that characterize the neighborhoods in the first class probably include virtually all of the short sequence motifs in proteins that consistently occur in a particular local structure. These patterns, many of which occur in transitions between secondary structural elements, are an interesting combination of previously studied and novel motifs. The identification of sequence patterns that consistently occur in one or a small number of local structures in proteins should contribute to the prediction of protein structure from sequence.
Affiliation(s)
- K F Han
- Graduate Group in Biophysics, University of California, San Francisco, 94143, USA
18
Abstract
In recent years, much effort has been put into the development of new methodologies and algorithms for the prediction of protein secondary and tertiary structures from (sequence) data; this is reviewed in detail. New approaches for these predictions such as neural network methods, genetic algorithms, machine learning, and graph theoretical methods are discussed. Secondary structure prediction algorithms were improved mostly by considering families of related proteins; however, for the reliable tertiary structure modeling of proteins, knowledge-based techniques are still preferred. Methods and examples with more or less successful results are described. Also, programs and parameterizations for energy minimisations, molecular dynamics, and electrostatic interactions have been improved, especially with respect to their former limits of applicability. Other topics discussed in this review include the use of traditional and on-line databases, the docking problem and surface properties of biomolecules, packing of protein cores, de novo design and protein engineering, prediction of membrane protein structures, the verification and reliability of model structures, and progress made with currently available software and computer hardware. In summary, the prediction of the structure, function, and other properties of a protein is still possible only within limits, but these limits continue to be moved.
Affiliation(s)
- G Böhm
- Institut für Biotechnologie, Martin-Luther-Universität Halle-Wittenberg, Germany
19
Lemer CM, Rooman MJ, Wodak SJ. Protein structure prediction by threading methods: evaluation of current techniques. Proteins 1995; 23:337-55. [PMID: 8710827 DOI: 10.1002/prot.340230308] [Citation(s) in RCA: 163] [Impact Index Per Article: 5.6] [Indexed: 02/01/2023]
Abstract
This paper evaluates the results of a protein structure prediction contest. The predictions were made using threading procedures, which employ techniques for aligning sequences with 3D structures to select the correct fold of a given sequence from a set of alternatives. Nine different teams submitted 86 predictions, on a total of 21 target proteins with little or no sequence homology to proteins of known structure. The 3D structures of these proteins were newly determined by experimental methods, but not yet published or otherwise available to the predictors. The predictions, made from the amino acid sequence alone, thus represent a genuine test of the current performance of threading methods. Only a subset of all the predictions is evaluated here. It corresponds to the 44 predictions submitted for the 11 target proteins seen to adopt known folds. The predictions for the remaining 10 proteins were not analyzed, although weak similarities with known folds may also exist in these proteins. We find that threading methods are capable of identifying the correct fold in many cases, but not reliably enough as yet. Every team predicts correctly a different set of targets, with virtually all targets predicted correctly by at least one team. Also, common folds such as TIM barrels are recognized more readily than folds with only a few known examples. However, quite surprisingly, the quality of the sequence-structure alignments, corresponding to correctly recognized folds, is generally very poor, as judged by comparison with the corresponding 3D structure alignments. Thus, threading can presently not be relied upon to derive a detailed 3D model from the amino acid sequence. This raises a very intriguing question: how is fold recognition achieved? Our analysis suggests that it may be achieved because threading procedures maximize hydrophobic interactions in the protein core, and are reasonably good at recognizing local secondary structure.
Affiliation(s)
- C M Lemer
- Unité de Conformation de Macromolécules Biologiques, Brussels, Belgium
|
20
|
Mocz G. Fuzzy cluster analysis of simple physicochemical properties of amino acids for recognizing secondary structure in proteins. Protein Sci 1995; 4:1178-87. [PMID: 7549882 PMCID: PMC2143138 DOI: 10.1002/pro.5560040616] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023]
Abstract
Fuzzy cluster analysis has been applied to the 20 amino acids by using 65 physicochemical properties as a basis for classification. The clustering products, the fuzzy sets (i.e., classical sets with associated membership functions), have provided a new measure of amino acid similarities for use in protein folding studies. This work demonstrates that fuzzy sets of simple molecular attributes, when assigned to amino acid residues in a protein's sequence, can predict the secondary structure of the sequence with reasonable accuracy. An approach is presented for discriminating standard folding states, using near-optimum information splitting in half-overlapping segments of the sequence of assigned membership functions. The method is applied to a nonredundant set of 252 proteins and yields approximately 73% matching for correctly predicted and correctly rejected residues with approximately 60% overall success rate for the correctly recognized ones in three folding states: alpha-helix, beta-strand, and coil. The most useful attributes for discriminating these states appear to be related to size, polarity, and thermodynamic factors. Van der Waals volume, apparent average thickness of surrounding molecular free volume, and a measure of dimensionless surface electron density can explain approximately 95% of prediction results. Hydrogen bonding and hydrophobicity indices do not yet enable clear clustering and prediction.
Affiliation(s)
- G Mocz
- Pacific Biomedical Research Center, University of Hawaii, Honolulu 96822, USA
|
21
|
Chandonia JM, Karplus M. Neural networks for secondary structure and structural class predictions. Protein Sci 1995; 4:275-85. [PMID: 7757016 PMCID: PMC2143056 DOI: 10.1002/pro.5560040214] [Citation(s) in RCA: 82] [Impact Index Per Article: 2.8] [Indexed: 01/27/2023]
Abstract
A pair of neural network-based algorithms is presented for predicting the tertiary structural class and the secondary structure of proteins. Each algorithm realizes improvements in accuracy based on information provided by the other. Structural class prediction of proteins nonhomologous to any in the training set is improved significantly, from 62.3% to 73.9%, and secondary structure prediction accuracy improves slightly, from 62.26% to 62.64%. A number of aspects of neural network optimization and testing are examined. They include network overtraining and an output filter based on a rolling average. Secondary structure prediction results vary greatly depending on the particular proteins chosen for the training and test sets; consequently, an appropriate measure of accuracy reflects the more unbiased approach of "jackknife" cross-validation (testing each protein in the data-base individually).
Affiliation(s)
- J M Chandonia
- Biophysics Program, Harvard University, Cambridge, Massachusetts 02138, USA
|
22
|
Eisenhaber F, Persson B, Argos P. Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit Rev Biochem Mol Biol 1995; 30:1-94. [PMID: 7587278 DOI: 10.3109/10409239509085139] [Citation(s) in RCA: 96] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023]
Abstract
This review attempts a critical stock-taking of the current state of the science aimed at predicting structural features of proteins from their amino acid sequences. At the primary structure level, methods are considered for detection of remotely related sequences and for recognizing amino acid patterns to predict posttranslational modifications and binding sites. The techniques involving secondary structural features include prediction of secondary structure, membrane-spanning regions, and secondary structural class. At the tertiary structural level, methods for threading a sequence into a mainchain fold, homology modeling and assigning sequences to protein families with similar folds are discussed. A literature analysis suggests that, to date, threading techniques are not able to show their superiority over sequence pattern recognition methods. Recent progress in the state of ab initio structure calculation is reviewed in detail. The analysis shows that many structural features can be predicted from the amino acid sequence much better than just a few years ago and with attendant utility in experimental research. Best prediction can be achieved for new protein sequences that can be assigned to well-studied protein families. For single sequences without homologues, the folding problem has not yet been solved.
Affiliation(s)
- F Eisenhaber
- Institut für Biochemie der Charité, Medizinische Fakultät, Humboldt-Universität zu Berlin, Fed. Rep. Germany
|
23
|
Seale JW, Srinivasan R, Rose GD. Sequence determinants of the capping box, a stabilizing motif at the N-termini of alpha-helices. Protein Sci 1994; 3:1741-5. [PMID: 7849592 PMCID: PMC2142610 DOI: 10.1002/pro.5560031014] [Citation(s) in RCA: 91] [Impact Index Per Article: 3.0] [Indexed: 01/27/2023]
Abstract
The capping box, a recurrent hydrogen bonded motif at the N-termini of alpha-helices, caps 2 of the initial 4 backbone amide hydrogen donors of the helix (Harper ET, Rose GD, 1993, Biochemistry 32:7605-7609). In detail, the side chain of the first helical residue forms a hydrogen bond with the backbone of the fourth helical residue and, reciprocally, the side chain of the fourth residue forms a hydrogen bond with the backbone of the first residue. We now enlarge the earlier definition of this motif to include an accompanying hydrophobic interaction between residues that bracket the capping box sequence on either side. The expanded box motif--in which 2 hydrogen bonds and a hydrophobic interaction are localized within 6 consecutive residues--resembles a glycine-based capping motif found at helix C-termini (Aurora R, Srinivasan R, Rose GD, 1994, Science 264:1126-1130).
Affiliation(s)
- J W Seale
- Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, St. Louis, Missouri 63110
|
24
|
Abstract
Through the comprehensive analysis of protein sequence and structural data, relationships can be established that suggest, with varying degrees of success, structural models for a protein for which only the sequence is known. The certainty with which a model can be proposed depends on the degree of similarity between the sequence of unknown structure and the sequence of a protein of known structure. Methods are being developed to detect remote similarities between sequences or structures, and to predict protein structure based on such small levels of similarity.
Affiliation(s)
- W R Taylor
- Laboratory of Mathematical Biology, National Institute for Medical Research, London, UK
|
25
|
Abstract
Secondary structure prediction recently has surpassed the 70% level of average accuracy, evaluated on the single residue states helix, strand and loop (Q3). But the ultimate goal is reliable prediction of tertiary (three-dimensional, 3D) structure, not 100% single residue accuracy for secondary structure. A comparison of pairs of structurally homologous proteins with divergent sequences reveals that considerable variation in the position and length of secondary structure segments can be accommodated within the same 3D fold. It is therefore sufficient to predict the approximate location of helix, strand, turn and loop segments, provided they are compatible with the formation of 3D structure. Accordingly, we define here a measure of segment overlap (Sov) that is somewhat insensitive to small variations in secondary structure assignments. The new segment overlap measure ranges from an ignorance level of 37% (random protein pairs) via a current level of 72% for a prediction method based on sequence profile input to neural networks (PHD) to an average 90% level for homologous protein pairs. We conclude that the highest scores one can reasonably expect for secondary structure prediction are a single residue accuracy of Q3 > 85% and a fractional segment overlap of Sov > 90%.
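The single-residue accuracy Q3 mentioned in this abstract is commonly computed as the fraction of residues whose predicted state matches the observed one. The sketch below illustrates that per-residue measure only; it does not implement the segment overlap measure (Sov) that the abstract introduces. The function name `q3_accuracy`, the one-letter state codes (H, E, C), and the example strings are illustrative assumptions, not taken from the paper.

```python
def q3_accuracy(observed: str, predicted: str) -> float:
    """Return the fraction of residues whose predicted state equals the observed state.

    Both arguments are strings of one-letter state codes, assumed here to be
    H (helix), E (strand), and C (loop/coil).
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return matches / len(observed)


# Hypothetical 13-residue example: the prediction shortens one helix and one strand.
observed = "CHHHHHCCEEEEC"
predicted = "CHHHHCCCEEECC"
print(f"Q3 = {q3_accuracy(observed, predicted):.1%}")
```

In this made-up example only two segment ends are misplaced, yet each shifted boundary costs a full residue in Q3. That sensitivity to small boundary variations is the motivation the abstract gives for a segment-based measure.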
|
26
|
Jin L, Cohen FE, Wells JA. Structure from function: screening structural models with functional data. Proc Natl Acad Sci U S A 1994; 91:113-7. [PMID: 7506411 PMCID: PMC42896 DOI: 10.1073/pnas.91.1.113] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023]
Abstract
Structural constraints derived from different antibody epitopes on human growth hormone (hGH) were used to screen three-dimensional models of hGH that were generated by computer algorithms. Previously, alanine-scanning mutagenesis defined the residues that modulate binding to 21 different monoclonal antibodies to hGH. These functional epitopes were composed of 4-14 side chains whose alpha-carbons clustered within 4-23 Å. Distance and topographic constraints for these functional epitopes were virtually the same as constraints derived from known x-ray structures of protein-antigen complexes. The constraints were used to evaluate about 1400 models of hGH that were computer-generated by a secondary-structure prediction and packing algorithm. On average each functional epitope reduced the number of models in the pool by a factor of 2, so that 8 monoclonal antibodies could reduce the number of possible models to < 10. The average root-mean-square deviation of alpha-carbon coordinates between the x-ray structure and either the pool of starting models or final models ranged from 13 to 16 Å or 4 to 7 Å, respectively, depending on the pool of starting models and the level of constraints imposed. All of the final models had the correct folding topography, and the best model was within 3.8 Å root-mean-square deviation of the x-ray coordinates. This model was as close as it could have been because the models were built by using ideal helices and those in the x-ray structure are not. Our studies suggest that epitope mapping data can effectively screen structural models and, when coupled to predictive algorithms, can help to generate low-resolution models of a protein.
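The pool-reduction figure quoted in this abstract can be checked with a short calculation. It assumes, as a simplification, that each of the 8 epitopes halves the pool independently of the others, starting from the stated pool of about 1400 models:

```latex
\[
  \frac{1400}{2^{8}} = \frac{1400}{256} \approx 5.5 < 10
\]
```

Under that assumption roughly five or six models remain, consistent with the abstract's statement that 8 monoclonal antibodies could reduce the pool to fewer than 10.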
Affiliation(s)
- L Jin
- Department of Protein Engineering, Genentech, Inc., South San Francisco, CA 94080
|
27
|
Cohen BI, Presnell SR, Cohen FE. Origins of structural diversity within sequentially identical hexapeptides. Protein Sci 1993; 2:2134-45. [PMID: 8298461 PMCID: PMC2142335 DOI: 10.1002/pro.5560021213] [Citation(s) in RCA: 77] [Impact Index Per Article: 2.5] [Indexed: 01/29/2023]
Abstract
Efforts to predict protein secondary structure have been hampered by the apparent structural plasticity of local amino acid sequences. Kabsch and Sander (1984, Proc. Natl. Acad. Sci. USA 81, 1075-1078) articulated this problem by demonstrating that identical pentapeptide sequences can adopt distinct structures in different proteins. With the increased size of the protein structure database and the availability of new methods to characterize structural environments, we revisit this observation of structural plasticity. Within a set of proteins with less than 50% sequence identity, 59 pairs of identical hexapeptide sequences were identified. These local structures were compared and their surrounding structural environments examined. Within a protein structural class (alpha/alpha, beta/beta, alpha/beta, alpha + beta), the structural similarity of sequentially identical hexapeptides usually is preserved. This study finds eight pairs of identical hexapeptide sequences that adopt beta-strand structure in one protein and alpha-helical structure in the other. In none of the eight cases do the members of these sequence pairs come from proteins within the same folding class. These results have implications for class-dependent secondary structure prediction algorithms.
Affiliation(s)
- B I Cohen
- Department of Pharmaceutical Chemistry, University of California at San Francisco 94143-0446
|
28
|
Boissel J, Lee W, Presnell S, Cohen F, Bunn H. Erythropoietin structure-function relationships. Mutant proteins that test a model of tertiary structure. J Biol Chem 1993. [DOI: 10.1016/s0021-9258(18)82348-1] [Citation(s) in RCA: 94] [Impact Index Per Article: 3.0] [Indexed: 11/30/2022]
|
29
|
Abstract
The pathway of folding of a protein will be completely solved when the structures and energetics of the initial unfolded states, all folding intermediates, all transition states and the final folded state, have been determined. The ultimate goal is to analyse, at the detail of individual residues, the non-covalent interactions that are primarily responsible for dictating secondary and tertiary structure. Until recently, the tools for tackling such a daunting task were quite inadequate, but recent developments in NMR and protein engineering have made it possible to determine crucial features in the folding process. It now seems feasible that sufficient experimental detail will be obtained to provide general principles that govern protein folding and provide the basis for its rigorous theoretical analysis. This lecture will outline the progress and prospects in attainment of the goals as applied to the small ribonuclease, barnase.
Affiliation(s)
- A R Fersht
- Cambridge Centre for Protein Engineering, UK
|
30
|
Abstract
Prediction of protein secondary structure is an old problem and progress has been slow. Recently, spectacular success has been claimed in the blind prediction of the catalytic subunit of the cAMP-dependent protein kinase. When predictions in this and other test cases are assessed critically, some claims of prediction success turn out to be exaggerated, but a kernel of real progress remains: protein structure prediction can be improved substantially when a family of related sequences is available. Enough so that molecular biologists equipped with a new amino acid sequence and a multiple sequence alignment in hand may be tempted to test the new prediction methods.
Affiliation(s)
- B Rost
- EMBL, Heidelberg, Germany
|
31
|
Fersht AR, Serrano L. Principles of protein stability derived from protein engineering experiments. Curr Opin Struct Biol 1993. [DOI: 10.1016/0959-440x(93)90205-y] [Citation(s) in RCA: 221] [Impact Index Per Article: 7.1] [Indexed: 11/17/2022]
|
32
|
Abstract
Ten years of protein engineering have seen the synthesis of novel therapeutic agents and the analysis of the structure, activity, specificity, stability and folding pathways of proteins. It is hoped that protein engineering will eventually lead to the design of novel catalytic sites on either novel or existing proteins.
Affiliation(s)
- A Fersht
- Cambridge University Chemical Laboratory, UK
|