1
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes - the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Affiliation(s)
- Kelsey Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Callout Biotech, Albuquerque, NM, 87112, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
2
Extension of the classical classification of β-turns. Sci Rep 2016; 6:33191. [PMID: 27627963 PMCID: PMC5024104 DOI: 10.1038/srep33191] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 04/13/2016] [Accepted: 08/22/2016] [Indexed: 11/29/2022] Open
Abstract
The functional properties of a protein primarily depend on its three-dimensional (3D) structure. These properties have classically been assigned, visualized and analysed on the basis of protein secondary structures. The β-turn is the third most important secondary structure after helices and β-strands. β-turns have been classified according to the values of the dihedral angles φ and ψ of the central residue. Conventionally, eight different types of β-turns have been defined, whereas those that cannot be defined are classified as type IV β-turns. This classification remains the most widely used. Nonetheless, the miscellaneous type IV β-turns represent 1/3rd of β-turn residues. An unsupervised specific clustering approach was designed to search for recurrent new turns in the type IV category. The classical rules of β-turn type assignment were central to the approach. The four most frequently occurring clusters defined the new β-turn types. Unexpectedly, these types, designated IV1, IV2, IV3 and IV4, represent half of the type IV β-turns and occur more frequently than many of the previously established types. These types show convincing particularities, in terms of both structures and sequences that allow for the classical β-turn classification to be extended for the first time in 25 years.
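The assignment rule summarized in this abstract (turn type decided by the φ/ψ angles of the two central residues, with anything unmatched falling into type IV) can be illustrated with a minimal Python sketch. The ideal angle values and the tolerance rule (all four angles within 30°, one of them allowed up to 45°) are assumptions taken from the classical β-turn literature, not from the abstract. The cis-proline type VI classes are left out for brevity, and the new IV1 to IV4 clusters are not included because the abstract does not give their angles.

```python
# Idealized (phi(i+1), psi(i+1), phi(i+2), psi(i+2)) values in degrees.
# ASSUMPTION: classical literature values; they are not stated in the abstract.
IDEAL_TURNS = {
    "I": (-60.0, -30.0, -90.0, 0.0),
    "I'": (60.0, 30.0, 90.0, 0.0),
    "II": (-60.0, 120.0, 80.0, 0.0),
    "II'": (60.0, -120.0, -80.0, 0.0),
    "VIII": (-60.0, -30.0, -120.0, 120.0),
}


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def classify_turn(phi1, psi1, phi2, psi2, tol=30.0, loose_tol=45.0):
    """Return the first matching classical turn type, or "IV" if none match."""
    observed = (phi1, psi1, phi2, psi2)
    for name, ideal in IDEAL_TURNS.items():
        devs = [angle_diff(o, i) for o, i in zip(observed, ideal)]
        # All angles within loose_tol, and at most one beyond the strict tol.
        if all(d <= loose_tol for d in devs) and sum(d > tol for d in devs) <= 1:
            return name
    return "IV"  # miscellaneous category


if __name__ == "__main__":
    print(classify_turn(-65.0, 125.0, 85.0, -5.0))
```

Extending the table with further entries would be the natural way to add new types once their representative angles are known.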
3
Ho HK, Zhang L, Ramamohanarao K, Martin S. A survey of machine learning methods for secondary and supersecondary protein structure prediction. Methods Mol Biol 2013; 932:87-106. [PMID: 22987348 DOI: 10.1007/978-1-62703-065-6_6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/01/2023]
Abstract
In this chapter we provide a survey of protein secondary and supersecondary structure prediction using methods from machine learning. Our focus is on machine learning methods applicable to β-hairpin and β-sheet prediction, but we also discuss methods for more general supersecondary structure prediction. We provide background on the secondary and supersecondary structures that we discuss, the features used to describe them, and the basic theory behind the machine learning methods used. We survey the machine learning methods available for secondary and supersecondary structure prediction and compare them where possible.
Affiliation(s)
- Hui Kian Ho
- Department of Computer Science and Software Engineering, University of Melbourne, National ICT Australia, Parkville, VIC, Australia
4
Fernandez-Fuentes N, Fiser A. A modular perspective of protein structures: application to fragment based loop modeling. Methods Mol Biol 2013; 932:141-58. [PMID: 22987351 PMCID: PMC3635063 DOI: 10.1007/978-1-62703-065-6_9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/30/2023]
Abstract
Proteins can be decomposed into supersecondary structure modules. We used a generic definition of supersecondary structure elements, so-called Smotifs, which are composed of two flanking regular secondary structures connected by a loop, to explore the evolution and current variety of structure building blocks. Here, we discuss recent observations about the saturation of Smotif geometries in protein structures and how it opens new avenues in protein structure modeling and design. As a first application of these observations we describe our loop conformation modeling algorithm, ArchPred that takes advantage of Smotifs classification. In this application, instead of focusing on specific loop properties the method narrows down possible template conformations in other, often not homologous structures, by identifying the most likely supersecondary structure environment that cradles the loop. Beyond identifying the correct starting supersecondary structure geometry, it takes into account information of fit of anchor residues, sterical clashes, match of predicted and observed dihedral angle preferences, and local sequence signal.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, University of Leeds, St. James's University Hospital, Leeds LS9 7TF, UK
- Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, 1301 Morris Park Ave, Bronx, NY 10461, USA
5
Hu C, Koehl P, Max N. PackHelix: a tool for helix-sheet packing during protein structure prediction. Proteins 2011; 79:2828-43. [PMID: 21905109 PMCID: PMC3172692 DOI: 10.1002/prot.23108] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/02/2011] [Revised: 04/18/2011] [Accepted: 05/13/2011] [Indexed: 11/09/2022]
Abstract
The three-dimensional structure of a protein is organized around the packing of its secondary structure elements. Predicting the topology and constructing the geometry of structural motifs involving α-helices and/or β-strands are therefore key steps for accurate prediction of protein structure. While many efforts have focused on how to pack helices and on how to sample exhaustively the topologies and geometries of multiple strands forming a β-sheet in a protein, there has been little progress on generating native-like packings of helices on sheets. We describe a method that can generate the packing of multiple helices on a given β-sheet for αβα sandwich type protein folds. This method mines the results of a statistical analysis of the conformations of αβ(2) motifs in protein structures to provide input values for the geometric attributes of the packing of a helix on a sheet. It then proceeds with a geometric builder that generates multiple arrangements of the helices on the sheet of interest by sampling through these values and performing consistency checks that guarantee proper loop geometry between the helices and the strands, minimal number of collisions between the helices, and proper formation of a hydrophobic core. The method is implemented as a module of ProteinShop. Our results show that it produces structures that are within 4-6 Å RMSD of the native one, regardless of the number of helices that need to be packed, though this number may increase if the protein has several helices between two consecutive strands in the sequence that pack on the sheet formed by these two strands.
Affiliation(s)
- Chengcheng Hu
- Department of Computer Science, University of California, Davis, CA 95616
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
- Nelson Max
- Department of Computer Science, University of California, Davis, CA 95616
6
Abstract
The three-dimensional structure of a protein is organized around the packing of its secondary structure elements. Although much is known about the packing geometry observed between α-helices and between β-sheets, there has been little progress on characterizing helix-sheet interactions. We present an analysis of the conformation of αβ(2) motifs in proteins, corresponding to all occurrences of helices in contact with two strands that are hydrogen bonded. The geometry of the αβ(2) motif is characterized by the azimuthal angle θ between the helix axis and an average vector representing the two strands, the elevation angle ψ between the helix axis and the plane containing the two strands, and the distance D between the helix and the strands. We observe that the helix tends to align to the two strands, with a preference for an antiparallel orientation if the two strands are parallel; this preference is diminished for other topologies of the β-sheet. Side-chain packing at the interface between the helix and the strands is mostly hydrophobic, with a preference for aliphatic amino acids in the strand and aromatic amino acids in the helix. From the knowledge of the geometry and amino acid propensities of αβ(2) motifs in proteins, we have derived different statistical potentials that are shown to be efficient in picking native-like conformations among a set of non-native conformations in well-known decoy datasets. The information on the geometry of αβ(2) motifs as well as the related statistical potentials have applications in the field of protein structure prediction.
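The three geometric descriptors named in this abstract (an azimuthal angle, an elevation angle and a distance D) can be computed from axis endpoints with a few lines of vector algebra. The sketch below is one possible reading of those definitions, not the authors' implementation. It assumes that each helix or strand is approximated by a straight segment, that the average strand vector is oriented along the first strand, that the sheet plane is spanned by that average vector and the vector joining the strand midpoints, that the azimuth is measured after projecting the helix axis into that plane, and that D is the distance from the helix midpoint to the plane. The function name `helix_sheet_geometry` is hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, k):
    return tuple(x * k for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    return _scale(a, 1.0 / math.sqrt(_dot(a, a)))


def _mid(a, b):
    return _scale(_add(a, b), 0.5)


def _clamp(x):
    return max(-1.0, min(1.0, x))


def helix_sheet_geometry(helix, strand1, strand2):
    """Each argument is a (start, end) pair of 3D points.

    Returns (azimuth_deg, elevation_deg, distance) under the assumptions
    stated in the text above; elevation is reported unsigned.
    """
    h = _unit(_sub(helix[1], helix[0]))
    u1 = _unit(_sub(strand1[1], strand1[0]))
    u2 = _unit(_sub(strand2[1], strand2[0]))
    if _dot(u1, u2) < 0:  # antiparallel pair: orient strand 2 along strand 1
        u2 = _scale(u2, -1.0)
    avg = _unit(_add(u1, u2))
    c1, c2 = _mid(*strand1), _mid(*strand2)
    normal = _unit(_cross(avg, _sub(c2, c1)))  # normal of the strand plane
    elevation = math.degrees(math.asin(_clamp(abs(_dot(h, normal)))))
    in_plane = _sub(h, _scale(normal, _dot(h, normal)))
    if math.sqrt(_dot(in_plane, in_plane)) < 1e-9:
        azimuth = float("nan")  # helix perpendicular to the sheet
    else:
        azimuth = math.degrees(math.acos(_clamp(_dot(_unit(in_plane), avg))))
    distance = abs(_dot(_sub(_mid(*helix), _mid(c1, c2)), normal))
    return azimuth, elevation, distance


if __name__ == "__main__":
    print(helix_sheet_geometry(((0, 0, 10), (10, 0, 10)),
                               ((0, 0, 0), (10, 0, 0)),
                               ((0, 5, 0), (10, 5, 0))))
```

With these conventions, an azimuth near 180° corresponds to a helix running antiparallel to the first strand.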
Affiliation(s)
- Chengcheng Hu
- Department of Computer Science, University of California, Davis, CA 95616
- Patrice Koehl
- Department of Computer Science and Genome Center, University of California, Davis, CA 95616
7
Fernandez-Fuentes N, Dybas JM, Fiser A. Structural characteristics of novel protein folds. PLoS Comput Biol 2010; 6:e1000750. [PMID: 20421995 PMCID: PMC2858679 DOI: 10.1371/journal.pcbi.1000750] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 06/16/2009] [Accepted: 03/19/2010] [Indexed: 11/29/2022] Open
Abstract
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrences of an exhaustively classified library of supersecondary structural elements (Smotifs), in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the top 10% of most frequent ones have a higher fraction of internal contacts, while some of the most rare motifs are larger, and contain a longer loop region. Structural genomics efforts aim at exploring the repertoire of three-dimensional structures of protein molecules. While genome scale sequencing projects have already provided us with all the genes of many organisms, it is the three dimensional shape of gene encoded proteins that defines all the interactions among these components. Understanding the versatility and, ultimately, the role of all possible molecular shapes in the cell is a necessary step toward understanding how organisms function. In this work we explored the rules that identify certain shapes as novel compared to all already known structures. 
The findings of this work provide possible insights into the rules that can be used in future works to identify or design new molecular shapes or to relate folds with each other in a quantitative manner.
Affiliation(s)
- Narcis Fernandez-Fuentes
- University of Leeds, Leeds Institute of Molecular Medicine Section of Experimental Therapeutics, St. James's University Hospital, Leeds, United Kingdom
- Joseph M. Dybas
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
8
Tyagi M, Bornot A, Offmann B, de Brevern AG. Analysis of loop boundaries using different local structure assignment methods. Protein Sci 2009; 18:1869-81. [PMID: 19606500 DOI: 10.1002/pro.198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, the secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, from both structure and sequence points of view, how the divergence between different SSAMs affects boundary definitions of loops connecting regular secondary structures. The analysis of SSAMs underlines that no clear consensus between the different SSAMs can be easily found. Because the latter greatly influence the loop boundary definitions, important variations are indeed observed, that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions is more stable than expected, and classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field as (i) various databanks have been used, leading to similar results without implication of protein redundancy, and (ii) it is the first time various SSAMs have been used. This work hence gives new insights into the difficult question of assignment of repetitive structures and addresses the issue of loop boundary definition. Although SSAMs give very different local structure assignments, capping sequence patterns remain efficiently stable.
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
9
Shi S, Chitturi B, Grishin NV. ProSMoS server: a pattern-based search using interaction matrix representation of protein structures. Nucleic Acids Res 2009; 37:W526-31. [PMID: 19420061 PMCID: PMC2703969 DOI: 10.1093/nar/gkp316] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022] Open
Abstract
Assessing structural similarity and defining common regions through comparison of protein spatial structures is an important task in functional and evolutionary studies of proteins. There are many servers that compare structures and define sub-structures in common between proteins through superposition and closeness of either coordinates or contacts. However, a natural way to analyze a structure for experts working on structure classification is to look for specific three-dimensional (3D) motifs and patterns instead of finding common features in two proteins. Such motifs can be described by the architecture and topology of major secondary structural elements (SSEs) without consideration of subtle differences in 3D coordinates. Despite the importance of motif-based structure searches, currently there is a shortage of servers to perform this task. Widely known TOPS does not fully address this problem, as it finds only topological match but does not take into account other important spatial properties, such as interactions and chirality. Here, we implemented our approach to protein structure pattern search (ProSMoS) as a web-server. ProSMoS converts 3D structure into an interaction matrix representation including the SSE types, handednesses of connections between SSEs, coordinates of SSE starts and ends, types of interactions between SSEs and beta-sheet definitions. For a user-defined structure pattern, ProSMoS lists all structures from a database that contain this pattern. ProSMoS server will be of interest to structural biologists who would like to analyze very general and distant structural similarities. The ProSMoS web server is available at: http://prodata.swmed.edu/ProSMoS/.
Affiliation(s)
- Shuoyong Shi
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
10
Paluszewski M, Winter P. Protein Decoy Generation Using Branch and Bound with Efficient Bounding. Lecture Notes in Computer Science 2008. [DOI: 10.1007/978-3-540-87361-7_32] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
11
Kurgan L, Kedarisetti KD. Sequence representation and prediction of protein secondary structure for structural motifs in twilight zone proteins. Protein J 2007; 25:463-74. [PMID: 17115254 DOI: 10.1007/s10930-006-9029-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
Characterizing and classifying regularities in protein structure is an important element in uncovering the mechanisms that regulate protein structure, function and evolution. Recent research concentrates on analysis of structural motifs that can be used to describe larger, fold-sized structures based on homologous primary sequences. At the same time, accuracy of secondary protein structure prediction based on multiple sequence alignment drops significantly when low homology (twilight zone) sequences are considered. To this end, this paper addresses a problem of providing an alternative sequences representation that would improve ability to distinguish secondary structures for the twilight zone sequences without using alignment. We consider a novel classification problem, in which, structural motifs, referred to as structural fragments (SFs) are defined as uniform strand, helix and coil fragments. Classification of SFs allows to design novel sequence representations, and to investigate which other factors and prediction algorithms may result in the improved discrimination. Comprehensive experimental results show that statistically significant improvement in classification accuracy can be achieved by: (1) improving sequence representations, and (2) removing possible noise on the terminal residues in the SFs. Combining these two approaches reduces the error rate on average by 15% when compared to classification using standard representation and noisy information on the terminal residues, bringing the classification accuracy to over 70%. Finally, we show that certain prediction algorithms, such as neural networks and boosted decision trees, are superior to other algorithms.
Affiliation(s)
- Lukasz Kurgan
- Electrical and Computer Engineering Department, University of Alberta, Edmonton, Alberta, Canada, T6G 2V4.
12
Shi S, Zhong Y, Majumdar I, Sri Krishna S, Grishin NV. Searching for three-dimensional secondary structural patterns in proteins with ProSMoS. Bioinformatics 2007; 23:1331-8. [PMID: 17384423 DOI: 10.1093/bioinformatics/btm121] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Many evolutionarily distant, but functionally meaningful links between proteins come to light through comparison of spatial structures. Most programs that assess structural similarity compare two proteins to each other and find regions in common between them. Structural classification experts look for a particular structural motif instead. Programs base similarity scores on superposition or closeness of either Cartesian coordinates or inter-residue contacts. Experts pay more attention to the general orientation of the main chain and mutual spatial arrangement of secondary structural elements. There is a need for a computational tool to find proteins with the same secondary structures, topological connections and spatial architecture, regardless of subtle differences in 3D coordinates. RESULTS: We developed ProSMoS, a Protein Structure Motif Search program that emulates an expert. Starting from a spatial structure, the program uses previously delineated secondary structural elements. A meta-matrix of interactions between the elements (parallel or antiparallel) minding handedness of connections (left or right) and other features (e.g. element lengths and hydrogen bonds) is constructed prior to or during the searches. All structures are reduced to such meta-matrices that contain just enough information to define a protein fold, but this definition remains very general and deviations in 3D coordinates are tolerated. User supplies a meta-matrix for a structural motif of interest, and ProSMoS finds all proteins in the protein data bank (PDB) that match the meta-matrix. ProSMoS performance is compared to other programs and is illustrated on a β-Grasp motif. A brief analysis of all β-Grasp-containing proteins is presented. Program availability: ProSMoS is freely available for non-commercial use from ftp://iole.swmed.edu/pub/ProSMoS.
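The meta-matrix idea in this abstract (reduce each structure to its secondary structural elements plus a table of pairwise interactions, then look for a user-supplied pattern) can be illustrated with a small brute-force matcher. This is a toy sketch and not the ProSMoS algorithm or its file format. The function `find_motif`, the one-letter element codes (`E` strand, `H` helix) and the `P`/`A` labels for parallel and antiparallel pairing are all hypothetical, and handedness, element lengths and hydrogen bonds are ignored.

```python
from itertools import combinations


def find_motif(sse_types, interactions, pattern_types, pattern_interactions):
    """Find sequence-ordered subsets of elements that match a pattern.

    sse_types: string of element codes in chain order, e.g. "EHEE".
    interactions: dict mapping (i, j) with i < j to an interaction label.
    pattern_types / pattern_interactions: same encoding for the query motif.
    Returns a list of index tuples into sse_types.
    """
    def lookup(table, i, j):
        return table.get((min(i, j), max(i, j)))

    hits = []
    for combo in combinations(range(len(sse_types)), len(pattern_types)):
        # Element types must agree position by position.
        if any(sse_types[s] != p for s, p in zip(combo, pattern_types)):
            continue
        # Every interaction required by the pattern must be present.
        if all(lookup(interactions, combo[a], combo[b]) == kind
               for (a, b), kind in pattern_interactions.items()):
            hits.append(combo)
    return hits


if __name__ == "__main__":
    structure = "EHEE"
    contacts = {(0, 2): "P", (2, 3): "A"}
    # Query: two strands paired antiparallel (a hairpin-like pattern).
    print(find_motif(structure, contacts, "EE", {(0, 1): "A"}))
```

Because the pattern only constrains the listed pairs, extra elements and contacts in the structure are tolerated, which mirrors the abstract's point that deviations outside the motif definition should not prevent a match.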
Affiliation(s)
- Shuoyong Shi
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390-9050, USA
13
Abstract
Structure comparisons of all representative proteins have been done. Employing the relative root mean square deviation (RMSD) from native enables the assessment of the statistical significance of structure alignments of different lengths in terms of a Z-score. Two conclusions emerge: first, proteins with their native fold can be distinguished by their Z-score. Second and somewhat surprising, all small proteins up to 100 residues in length have significant structure alignments to other proteins in a different secondary structure and fold class; i.e. 24.0% of them have 60% coverage by a template protein with a RMSD below 3.5 Å and 6.0% have 70% coverage. If the restriction that we align proteins only having different secondary structure types is removed, then in a representative benchmark set of proteins of 200 residues or smaller, 93% can be aligned to a single template structure (with average sequence identity of 9.8%), with a RMSD less than 4 Å, and 79% average coverage. In this sense, the current Protein Data Bank (PDB) is almost a covering set of small protein structures. The length of the aligned region (relative to the whole protein length) does not differ among the top hit proteins, indicating that protein structure space is highly dense. For larger proteins, non-related proteins can cover a significant portion of the structure. Moreover, these top hit proteins are aligned to different parts of the target protein, so that almost the entire molecule can be covered when combined. The number of proteins required to cover a target protein is very small, e.g. the top ten hit proteins can give 90% coverage below a RMSD of 3.5 Å for proteins up to 320 residues long. These results give a new view of the nature of protein structure space, and its implications for protein structure prediction are discussed.
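Two quantities recur in this abstract: the RMSD of an alignment and the fraction of the target covered, either by one template or by several templates combined. A minimal sketch of both is given below. It assumes the aligned coordinates are already superposed and paired one to one, and that each template alignment is summarized as the set of zero-based target residue indices it covers. The function names are hypothetical, and the paper's relative RMSD and Z-score are not reproduced here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between paired, pre-superposed 3D points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum((xa - xb) ** 2
                for a, b in zip(coords_a, coords_b)
                for xa, xb in zip(a, b))
    return math.sqrt(total / len(coords_a))


def combined_coverage(target_length, aligned_residue_sets):
    """Fraction of target residues covered by at least one template."""
    covered = set()
    for residues in aligned_residue_sets:
        covered.update(r for r in residues if 0 <= r < target_length)
    return len(covered) / target_length


if __name__ == "__main__":
    print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 0, 3)]))
    print(combined_coverage(10, [range(0, 4), range(3, 7)]))
```

Taking the union of residue sets is what allows templates aligned to different parts of the target to add up to near-complete coverage, as described for the larger proteins.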
Affiliation(s)
- Daisuke Kihara
- Center of Excellence in Bioinformatics, University at Buffalo, 901 Washington St, Suite 300, Buffalo, NY 14203, USA
14
Singh SK, Babu MM, Balaram P. Registering alpha-helices and beta-strands using backbone C-H...O interactions. Proteins 2003; 51:167-71. [PMID: 12660986 DOI: 10.1002/prot.10245] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Indexed: 11/07/2022]
Abstract
The possible occurrence of a novel helix terminating structural motif in proteins involving a stabilizing short C-H...O interaction has been examined using a dataset of 634 non-homologous protein structures (≤2.0 Å). The search for this motif was prompted by the crystallographic characterization of a novel structural feature in crystals of a synthetic decapeptide in which extension of a Schellman motif led to the formation of a C-H...O hydrogen bond between the T-4 CαH and the T+1 C=O groups, where T is the helix terminator adopting a left handed (αL) conformation. More than 100 such motifs with backbone conformation superposing well with the peptide examples were identified. In several examples, formation of this motif led to an approximately antiparallel arrangement of a helical segment with an extended β-strand. Careful examination of these examples suggested the possibility of registering antiparallel arrangement of helices and strands by means of backbone C-H...O interactions with a regular periodicity. Model building resulted in the generation of idealized αβ and βα motifs, which can then be generalized to higher-order repetitive structures. Inspection of the antiparallel αβ motif revealed a significant propensity for Ser, Glu, and Gln residues at the T-4 position resulting in further stabilization using an O...H-N side-chain-backbone hydrogen bond. Modeling studies revealed ready accommodation of serine residues along the helix face that contacts the strand. The theoretically generated folds correspond to "open" polypeptide structures.
Affiliation(s)
- S Kumar Singh
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
15
Iurcu-Mustata G, Van Belle D, Wintjens R, Prévost M, Rooman M. Role of salt bridges in homeodomains investigated by structural analyses and molecular dynamics simulations. Biopolymers 2001; 59:145-59. [PMID: 11391564 DOI: 10.1002/1097-0282(200109)59:3<145::aid-bip1014>3.0.co;2-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
Homeodomains are a class of helix-turn-helix DNA-binding protein motifs that play an important role in the control of cellular development in eukaryotes. They fold in a three α-helix structural module, where the third helix is the recognition helix that fits into the major groove of DNA. Structural analysis of the members of the homeodomain family led to the identification of interactions likely to stabilize the protein domains. Linking the helices pairwise, three salt bridges were found to be well preserved within the family. Also well conserved were two cation-π interactions between aromatic and positively charged side chains. To analyze the structural role of the salt bridges, molecular dynamics simulations (MD) were carried out on the wild-type homeodomain from the Drosophila paired protein (1fjl) and on three mutants, which lack one or two salt bridges and mimic natural mutations in other homeodomains. Analysis of the trajectories revealed only small structural rearrangements of the three helices in all MD simulations, thereby suggesting that the salt bridges have no essential stabilizing role at room temperature, but rather might be important for improving thermostability. The latter hypothesis is supported by a good correlation between the melting midpoint temperatures of several homeodomains and the number of salt bridges and cation-π interactions that connect secondary structures.
Affiliation(s)
- G Iurcu-Mustata
- Department of Biology and Biochemistry, University of Houston, Houston, TX 77204-5513, USA
16
Abstract
The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-α protein, and the fragment 1-35 of ubiquitin, an α/β protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol, which is efficient to improve the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as a starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.
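The two-stage protocol described in this abstract (a first Monte Carlo simulated annealing run with a local potential only, whose result seeds a second run with local plus distance terms) can be sketched generically. Everything concrete below is an assumption for illustration: the state is a short list of toy angles, `local_energy` and `distance_energy` are made-up stand-ins for the database-derived potentials, and the geometric cooling schedule and Metropolis acceptance rule are standard choices rather than the authors' settings. The sketch returns the best state seen in each stage rather than the final conformation.

```python
import math
import random


def anneal(energy, state, steps, t_start, t_end, rng, step_size=5.0):
    """Metropolis simulated annealing; returns the best state seen and its energy."""
    cur, cur_e = list(state), energy(state)
    best, best_e = list(cur), cur_e
    for k in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        trial = list(cur)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step_size, step_size)
        e = energy(trial)
        if e <= cur_e or rng.random() < math.exp(-(e - cur_e) / t):
            cur, cur_e = trial, e
            if e < best_e:
                best, best_e = list(trial), e
    return best, best_e


def local_energy(angles, preferred=(-60.0, -45.0, -60.0)):
    """Toy local term: squared deviation from hypothetical preferred angles."""
    return sum((a - p) ** 2 for a, p in zip(angles, preferred))


def distance_energy(angles):
    """Toy non-local term coupling the first and last positions."""
    return (angles[0] - angles[-1]) ** 2


def combined_energy(angles):
    return local_energy(angles) + distance_energy(angles)


def two_stage(start, rng, steps=2000):
    """Stage 1 uses the local term only; stage 2 restarts from its result."""
    s1, e1 = anneal(local_energy, start, steps, 100.0, 0.1, rng)
    s2, e2 = anneal(combined_energy, s1, steps, 100.0, 0.1, rng)
    return s1, e1, s2, e2


if __name__ == "__main__":
    print(two_stage([0.0, 0.0, 0.0], random.Random(0)))
```

Seeding the second run with the first run's result is the point of the protocol: the cheaper local term does the coarse search before the combined energy is applied.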
Affiliation(s)
- D Gilis
- Ingénierie Biomoléculaire, Université Libre de Bruxelles, Bruxelles, Belgium.
17
Abstract
By using an unsupervised cluster analyzer, we have identified a local structural alphabet composed of 16 folding patterns of five consecutive Cα ("protein blocks"). The dependence that exists between successive blocks is explicitly taken into account. A Bayesian approach based on the relation protein block-amino acid propensity is used for prediction and leads to a success rate close to 35%. Sharing sequence windows associated with certain blocks into "sequence families" improves the prediction accuracy by 6%. This prediction accuracy exceeds 75% when keeping the first four predicted protein blocks at each site of the protein. In addition, two different strategies are proposed: the first one defines the number of protein blocks in each site needed for respecting a user-fixed prediction accuracy, and alternatively, the second one defines the different protein sites to be predicted with a user-fixed number of blocks and a chosen accuracy. This last strategy applied to the ubiquitin conjugating enzyme (alpha/beta protein) shows that 91% of the sites may be predicted with a prediction accuracy larger than 77% considering only three blocks per site. The prediction strategies proposed improve our knowledge about sequence-structure dependence and should be very useful in ab initio protein modelling.
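The idea of ranking protein blocks from amino acid propensities and keeping the first few candidates per site can be illustrated with a small sketch. This is not the published method: the function name `rank_blocks`, the dictionary layout, the floor probability, and the assumption that positions in the sequence window contribute independently (a naive Bayes simplification) are all hypothetical choices made for illustration.

```python
import math

def rank_blocks(window, priors, likelihoods, top_k=4):
    """Rank candidate blocks for one sequence window by log posterior score.

    window:      string of amino acids centred on the site of interest
    priors:      {block: P(block)}
    likelihoods: {block: [ {aa: P(aa | block) at that window position}, ... ]}
    Assumption: window positions are treated as independent given the block.
    """
    scores = {}
    for block, prior in priors.items():
        log_score = math.log(prior)
        for pos, aa in enumerate(window):
            # Small floor (hypothetical value) avoids log(0) for unseen residues
            p = likelihoods[block][pos].get(aa, 1e-6)
            log_score += math.log(p)
        scores[block] = log_score
    # Highest posterior first; keep only the first top_k blocks
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k]
```

Counting a site as correct when the true block appears anywhere in the returned list corresponds to the "first four predicted protein blocks" evaluation mentioned in the abstract; `top_k=3` would mirror the three-blocks-per-site strategy.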
Affiliation(s)
- A G de Brevern
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U436, Université Paris 7, Paris, France.
18
Tsai CJ, Maizel JV, Nussinov R. Distinguishing between sequential and nonsequentially folded proteins: implications for folding and misfolding. Protein Sci 1999; 8:1591-604. [PMID: 10452603 PMCID: PMC2144423 DOI: 10.1110/ps.8.8.1591] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.8] [Indexed: 10/21/2022]
Abstract
We describe here an algorithm for distinguishing sequential from nonsequentially folding proteins. Several experiments have recently suggested that most of the proteins that are synthesized in the eukaryotic cell may fold sequentially. This proposed folding mechanism in vivo is particularly advantageous to the organism. In the absence of chaperones, the probability that a sequentially folding protein will misfold is reduced significantly. The problem we address here is devising a procedure that would differentiate between the two types of folding patterns. Footprints of sequential folding may be found in structures where consecutive fragments of the chain interact with each other. In such cases, the folding complexity may be viewed as being lower. On the other hand, higher folding complexity suggests that at least a portion of the polypeptide backbone folds back upon itself to form three-dimensional (3D) interactions with noncontiguous portion(s) of the chain. Hence, we look at the mechanism of folding of the molecule via analysis of its complexity, that is, through the 3D interactions formed by contiguous segments on the polypeptide chain. To computationally splice the structure into consecutively interacting fragments, we either cut it into compact hydrophobic folding units or into a set of hypothetical, transient, highly populated, contiguous fragments ("building blocks" of the structure). In sequential folding, successive building blocks interact with each other from the amino to the carboxy terminus of the polypeptide chain. Consequently, the results of the parsing differentiate between sequentially vs. nonsequentially folded chains. The automated assessment of the folding complexity provides insight into both the likelihood of misfolding and the kinetic folding rate of the given protein. In terms of the funnel free energy landscape theory, a protein that truly follows the mechanism of sequential folding, in principle, encounters smoother free energy barriers. A simple sequentially folded protein should, therefore, be less error prone and fold faster than a protein with a complex folding pattern.
Affiliation(s)
- C J Tsai
- Laboratory of Experimental and Computational Biology, NCI-FCRDC, Frederick, Maryland 21702, USA
19
Reddy BV, Nagarajaram HA, Blundell TL. Analysis of interactive packing of secondary structural elements in alpha/beta units in proteins. Protein Sci 1999; 8:573-86. [PMID: 10091660 PMCID: PMC2144285 DOI: 10.1110/ps.8.3.573] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]
Abstract
An alpha-helix and a beta-strand are said to be interactively packed if at least one residue in each of the secondary structural elements loses 10% of its solvent accessible contact area on association with the other secondary structural element. An analysis of all such 5,975 nonidentical alpha/beta units in protein structures, defined at ≤ 2.5 Å resolution, shows that the interaxial distance between the alpha-helix and the beta-strand is linearly correlated with the residue-dependent function, log[(V/nda)/n-int], where V is the volume of amino acid residues in the packing interface, nda is the normalized difference in solvent accessible contact area of the residues in packed and unpacked secondary structural elements, and n-int is the number of residues in the packing interface. The beta-sheet unit (beta u), defined as a pair of adjacent parallel or antiparallel hydrogen-bonded beta-strands, packing with an alpha-helix shows a better correlation between the interaxial distance and log(V/nda) for the residues in the packing interface. This packing relationship is shown to be useful in the prediction of interaxial distances in alpha/beta units using the interacting residue information of equivalent alpha/beta units of homologous proteins. It is, therefore, of value in comparative modeling of protein structures.
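The packing criterion stated at the start of this abstract can be expressed as a short check. This is a minimal sketch under stated assumptions: the function name `is_interactively_packed` and the input format (per-residue pairs of accessible area in the isolated and in the associated element) are hypothetical, "loses 10%" is read here as "loses at least 10%", and the areas are assumed to have been computed beforehand with a separate accessibility program.

```python
def is_interactively_packed(helix_areas, strand_areas, threshold=0.10):
    """Return True if both elements contain a residue that loses >= threshold
    of its solvent accessible contact area on association.

    helix_areas, strand_areas: lists of (isolated_area, packed_area) per residue.
    """
    def any_residue_buried(areas):
        for isolated, packed in areas:
            # Skip residues with no accessible area in the isolated element
            if isolated > 0 and (isolated - packed) / isolated >= threshold:
                return True
        return False

    # The criterion must hold for the helix AND for the strand
    return any_residue_buried(helix_areas) and any_residue_buried(strand_areas)
```

Requiring the loss on both sides excludes pairs where only one element is shielded, which matches the wording "at least one residue in each of the secondary structural elements".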
Affiliation(s)
- B V Reddy
- Department of Biochemistry, University of Cambridge, United Kingdom