1
|
Caetano-Anollés G, Aziz MF, Mughal F, Caetano-Anollés D. Tracing protein and proteome history with chronologies and networks: folding recapitulates evolution. Expert Rev Proteomics 2021; 18:863-880. [PMID: 34628994 DOI: 10.1080/14789450.2021.1992277] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/24/2023]
Abstract
INTRODUCTION: While the origin and evolution of proteins remain mysterious, advances in evolutionary genomics and systems biology are facilitating the historical exploration of the structure, function and organization of proteins and proteomes. Molecular chronologies are series of time events describing the history of biological systems and subsystems and the rise of biological innovations. Together with time-varying networks, these chronologies provide a window into the past. AREAS COVERED: Here, we review molecular chronologies and networks built with modern methods of phylogeny reconstruction. We discuss how chronologies of structural domain families uncover the explosive emergence of metabolism, the late rise of translation, the co-evolution of ribosomal proteins and rRNA, and the late development of the ribosomal exit tunnel; events that coincided with a tendency to shorten folding time. Evolving networks described the early emergence of domains and a late 'big bang' of domain combinations. EXPERT OPINION: Two processes, folding and recruitment, appear central to the evolutionary progression. The former increases protein persistence. The latter fosters diversity. Chronologically, protein evolution mirrors folding by combining supersecondary structures into domains, developing translation machinery to facilitate folding speed and stability, and enhancing structural complexity by establishing long-distance interactions in novel structural and architectural designs.
Affiliation(s)
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- C. R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- M Fayez Aziz
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Fizza Mughal
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, USA
- Derek Caetano-Anollés
- Data Science Platform, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
|
2
|
Ghosh G, Panicker L. Protein-nanoparticle interactions and a new insight. Soft Matter 2021; 17:3855-3875. [PMID: 33885450 DOI: 10.1039/d0sm02050h] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Indexed: 05/03/2023]
Abstract
The study of protein-nanoparticle interactions provides knowledge about the bio-reactivity of nanoparticles and creates a database of nanoparticles for applications in nanomedicine, nanodiagnosis, and nanotherapy. A problem arises when nanoparticles come in contact with physiological fluids such as plasma or serum, wherein they interact with proteins (or other biomolecules). This interaction, mostly electrostatic, leads to a coating of proteins on the nanoparticle surface called the 'corona'. These proteins are usually partially unfolded. The protein corona can deter nanoparticles from their targeted functionalities, such as drug/DNA delivery at the site and fluorescence tagging of diseased tissues. The protein corona also has many repercussions on cellular intake, inflammation, accumulation, degradation, and clearance of the nanoparticles from the body, depending on the exposed part of the proteins. Hence, the protein-nanoparticle interaction and the configuration of the bound proteins on the nanosurface need thorough investigation and understanding. Several techniques, such as DLS and zeta potential measurement, UV-vis spectroscopy, fluorescence spectroscopy, circular dichroism, FTIR, and DSC, provide valuable information in the study of protein-nanoparticle interactions. Theoretical simulations provide additional understanding. Despite many research publications, a fundamental question remains unresolved: can we aim for the application of functional nanoparticles in medicine? A new insight offered in this article suggests a reasonable answer to this crucial question.
Affiliation(s)
- Goutam Ghosh
- UGC-DAE Consortium for Scientific Research, Mumbai Centre, Mumbai 400 085, India.
|
3
|
V K MA, Chandrasekaran VM, Pandurangan S. Protein Domain Level Cancer Drug Targets in the Network of MAPK Pathways. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:2057-2065. [PMID: 29993692 DOI: 10.1109/tcbb.2018.2829507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
Proteins in the MAPK pathways are considered potential drug targets for cancer treatment. The pathways, together with their cross-talk, can be viewed as a network of MAPK pathways. Targeted domains that cause side effects act as proxies for drug targets because of their structural similarity and the frequent reuse of their variants. We propose identifying non-repeatable protein domains as drug targets to disrupt signal transduction, rather than targeting the whole protein. A network-based approach is used to understand the contribution of 52 domains in non-hub, non-essential, and intra-pathway cancerous nodes and to identify potential drug target domains. Thirty-four distinct domains in the cancerous proteins play vital roles in making cancer a complex disease and pose challenges for identifying potential drug targets. The distribution of domain families follows a power law in the network. Single promiscuous domains contribute to the formation of hubs such as Pkinase, Pkinase Tyr, and Ras. Hub nodes are positively correlated with domain coverage, and targeting them would disrupt the functional properties of the proteins. EIF 4EBP, alpha Kinase, Sel1, ROKNT, and KH 1 are the domains identified as potential targets for disrupting the signaling mechanism involved in cancer.
|
4
|
Basile W, Salvatore M, Bassot C, Elofsson A. Why do eukaryotic proteins contain more intrinsically disordered regions? PLoS Comput Biol 2019; 15:e1007186. [PMID: 31329574 PMCID: PMC6675126 DOI: 10.1371/journal.pcbi.1007186] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 12/28/2018] [Revised: 08/01/2019] [Accepted: 06/14/2019] [Indexed: 12/12/2022]
Abstract
Intrinsic disorder is more abundant in eukaryotic than prokaryotic proteins. Methods predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must exist an underlying difference in the sequences between eukaryotic and prokaryotic proteins causing the (predicted) difference in intrinsic disorder. By comparing proteins from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered, 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers, and we found that the changes in abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine, while prokaryotic proteins have 6.5%; eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine, compared with 4.0% proline and ≈ 7.5% isoleucine in prokaryotes. All three differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and types of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered, or do the differences cause the increased disorder? Intrinsic disorder is essential for various functions in eukaryotic cells and is a signature of eukaryotic proteins.
Here, we try to understand the origin of the difference in disorder between eukaryotic and prokaryotic proteins. We show that eukaryotic proteins contain more extended linker regions and that these linker regions are significantly more disordered. Further, we show, for the first time, that the difference in disorder originates from a systematic difference in amino acid frequencies between eukaryotic and prokaryotic proteins. Three amino acids contribute to the difference in disorder: serine and proline are more abundant in eukaryotic linkers, while isoleucine is less frequent. These shifts in frequencies are observed in all phyla, protein families, structural regions and types of protein but are most pronounced in disordered and linker regions. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. In any case, the widespread nature of the shifts in abundance indicates that the differences are ancient and caused by some not yet fully understood selective difference acting on eukaryotic and prokaryotic proteins.
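The composition comparison described in this abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' pipeline: the function names `aa_frequencies` and `frequency_shift` are hypothetical, the toy sequences are made up, and a real analysis would first have to extract the linker regions between Pfam domains.

```python
from collections import Counter


def aa_frequencies(sequences):
    """Return the fraction of each residue over a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: n / total for aa, n in counts.items()}


def frequency_shift(group_a, group_b, residues="SPI"):
    """Difference in residue fractions (group_a minus group_b).

    The default residues are serine (S), proline (P) and isoleucine (I),
    the three amino acids singled out in the abstract.
    """
    freq_a = aa_frequencies(group_a)
    freq_b = aa_frequencies(group_b)
    return {aa: freq_a.get(aa, 0.0) - freq_b.get(aa, 0.0) for aa in residues}


# Toy linker sequences, invented for illustration only.
eukaryotic_linkers = ["SSPI"]
prokaryotic_linkers = ["SPII"]
shift = frequency_shift(eukaryotic_linkers, prokaryotic_linkers)
```

A positive value in `shift` means the residue is more frequent in the first group; with real linker sets, the abstract would lead one to expect positive values for S and P and a negative value for I.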
Affiliation(s)
- Walter Basile
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Marco Salvatore
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Claudio Bassot
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, Sweden
- Department of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Swedish e-Science Research Center (SeRC), Stockholm, Sweden
|
5
|
A global map of the protein shape universe. PLoS Comput Biol 2019; 15:e1006969. [PMID: 30978181 PMCID: PMC6481876 DOI: 10.1371/journal.pcbi.1006969] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 10/15/2018] [Revised: 04/24/2019] [Accepted: 03/20/2019] [Indexed: 11/19/2022]
Abstract
Proteins are involved in almost all functions in a living cell, and functions of proteins are realized by their tertiary structures. Obtaining a global perspective of the variety and distribution of protein structures lays a foundation for our understanding of the building principle of protein structures. In light of the rapid accumulation of low-resolution structure data from electron tomography and cryo-electron microscopy, here we map and classify three-dimensional (3D) surface shapes of proteins into a similarity space. Surface shapes of proteins were represented with 3D Zernike descriptors, mathematical moment-based invariants, which have previously been demonstrated to be effective for biomolecular structure similarity search. In addition to single chains of proteins, we have also analyzed the shape space occupied by protein complexes. From the mapping, we have obtained various new insights into the relationship between shapes, main-chain folds, and complex formation. The unique view obtained from shape mapping opens up new ways to understand design principles, functions, and evolution of proteins.
Proteins are the major molecules involved in almost all cellular processes. In this work, we present a novel mapping of protein shapes that represents the variety and the similarities of 3D shapes of proteins and their assemblies. This mapping provides various novel insights into protein shapes, including determinant factors of protein 3D shapes, which enhance our understanding of the design principles of protein shapes. The mapping will also be a valuable resource for artificial protein design as well as a reference for classifying medium- to low-resolution protein structure images determined by cryo-electron microscopy and tomography.
|
6
|
Gorman SD, Sahu D, O'Rourke KF, Boehr DD. Assigning methyl resonances for protein solution-state NMR studies. Methods 2018; 148:88-99. [PMID: 29958930 DOI: 10.1016/j.ymeth.2018.06.010] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 04/25/2018] [Revised: 06/16/2018] [Accepted: 06/18/2018] [Indexed: 10/28/2022]
Abstract
Solution-state NMR is an important tool for studying protein structure and function. The ability to probe methyl groups has substantially expanded the scope of proteins accessible by NMR spectroscopy, including facilitating study of proteins and complexes greater than 100 kDa in size. While the toolset for studying protein structure and dynamics by NMR continues to grow, a major rate-limiting step in these studies is the initial resonance assignments, especially for larger (>50 kDa) proteins. In this practical review, we present strategies to efficiently isotopically label proteins, delineate NMR pulse sequences that can be used to determine methyl resonance assignments in the presence and absence of backbone assignments, and outline computational methods for NMR data analysis. We use our experiences from assigning methyl resonances for the aromatic biosynthetic enzymes tryptophan synthase and chorismate mutase to provide advice for all stages of experimental set-up and data analysis.
Affiliation(s)
- Scott D Gorman
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- Debashish Sahu
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- Kathleen F O'Rourke
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
- David D Boehr
- Department of Chemistry, The Pennsylvania State University, University Park, PA 16802, USA
|
7
|
Diversity in αβ and βα Loop Connections in TIM Barrel Proteins: Implications for Stability and Design of the Fold. Interdiscip Sci 2017; 10:805-812. [DOI: 10.1007/s12539-017-0250-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 08/16/2016] [Revised: 06/16/2017] [Accepted: 07/01/2017] [Indexed: 11/25/2022]
|
8
|
Unfolding and inactivation of proteins by counterions in protein-nanoparticles interaction. Colloids Surf B Biointerfaces 2016; 145:194-200. [DOI: 10.1016/j.colsurfb.2016.04.053] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/05/2016] [Revised: 04/15/2016] [Accepted: 04/30/2016] [Indexed: 12/13/2022]
|
9
|
r-scan statistics of a Poisson process with events transformed by duplications, deletions, and displacements. Adv Appl Probab 2016. [DOI: 10.1017/s0001867800002056] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
A stochastic model of a dynamic marker array in which markers could disappear, duplicate, and move relative to their original positions is constructed to reflect on the nature of long DNA sequences. The sequence changes of deletions, duplications, and displacements follow the stochastic rules: (i) the original distribution of the marker array $\{\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots\}$ is a Poisson process on the real line; (ii) each marker is replicated $l$ times; replication or loss of marker points occurs independently; (iii) each replicated point is independently and randomly displaced by an amount $Y$ relative to its original position, with the $Y$ displacements sampled from a continuous density $g(y)$. Limiting distributions for the maximal and minimal statistics of the $r$-scan lengths (collection of distances between $r + 1$ successive markers) for the $l$-shift model are derived with the aid of the Chen-Stein method and properties of Poisson processes.
|
10
|
Chen C, Karlin S. r-scan statistics of a Poisson process with events transformed by duplications, deletions, and displacements. Adv Appl Probab 2016. [DOI: 10.1239/aap/1189518639] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
A stochastic model of a dynamic marker array in which markers could disappear, duplicate, and move relative to their original positions is constructed to reflect on the nature of long DNA sequences. The sequence changes of deletions, duplications, and displacements follow the stochastic rules: (i) the original distribution of the marker array $\{\ldots, X_{-2}, X_{-1}, X_0, X_1, X_2, \ldots\}$ is a Poisson process on the real line; (ii) each marker is replicated $l$ times; replication or loss of marker points occurs independently; (iii) each replicated point is independently and randomly displaced by an amount $Y$ relative to its original position, with the $Y$ displacements sampled from a continuous density $g(y)$. Limiting distributions for the maximal and minimal statistics of the $r$-scan lengths (collection of distances between $r + 1$ successive markers) for the $l$-shift model are derived with the aid of the Chen-Stein method and properties of Poisson processes.
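The quantities named in this abstract can be made concrete with a small simulation. The sketch below is illustrative only and rests on stated assumptions: the Poisson process is restricted to a finite interval, every original marker is replaced by a fixed number of displaced copies (marker loss is not modelled), and the displacement density $g(y)$ is supplied by the caller. The function names are hypothetical and nothing here reproduces the paper's limiting distributions.

```python
import random


def r_scan_lengths(positions, r):
    """Distances spanned by r + 1 successive markers (the r-scan lengths)."""
    xs = sorted(positions)
    return [xs[i + r] - xs[i] for i in range(len(xs) - r)]


def simulate_l_shift(rate, length, copies, displace, rng):
    """Simulate a simplified l-shift marker array on [0, length).

    Assumptions (not taken from the paper): original markers follow a
    homogeneous Poisson process with the given rate, each one is replaced
    by `copies` points, and each copy is shifted by displace(rng).
    """
    markers = []
    x = rng.expovariate(rate)  # exponential gaps give a Poisson process
    while x < length:
        markers.extend(x + displace(rng) for _ in range(copies))
        x += rng.expovariate(rate)
    return sorted(markers)


rng = random.Random(0)
array = simulate_l_shift(
    rate=1.0,
    length=50.0,
    copies=2,
    displace=lambda g: g.gauss(0.0, 0.1),  # assumed Gaussian g(y)
    rng=rng,
)
scans = r_scan_lengths(array, r=3)
extremes = (min(scans), max(scans)) if scans else None
```

The minimum and maximum of `scans` are the statistics whose limiting behaviour the abstract refers to; small minima point to clusters of markers and large maxima to sparse stretches.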
|
11
|
Das BB, Park SH, Opella SJ. Membrane protein structure from rotational diffusion. Biochimica et Biophysica Acta 2015; 1848:229-45. [PMID: 24747039 PMCID: PMC4201901 DOI: 10.1016/j.bbamem.2014.04.002] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 02/05/2014] [Accepted: 04/02/2014] [Indexed: 02/02/2023]
Abstract
The motional averaging of powder pattern line shapes is one of the most fundamental aspects of solid-state NMR. Since membrane proteins in liquid crystalline phospholipid bilayers undergo fast rotational diffusion, all of the signals reflect the angles of the principal axes of their dipole-dipole and chemical shift tensors with respect to the axis defined by the bilayer normal. The frequency span and sign of the axially symmetric powder patterns that result from motional averaging about a common axis provide sufficient structural restraints for the calculation of the three-dimensional structure of a membrane protein in a phospholipid bilayer environment. The method is referred to as rotationally aligned (RA) solid-state NMR and is demonstrated with results on full-length, unmodified membrane proteins with one, two, and seven trans-membrane helices. RA solid-state NMR is complementary to other solid-state NMR methods, in particular oriented sample (OS) solid-state NMR of stationary, aligned samples. Structural distortions of membrane proteins from the truncation of terminal residues and other sequence modifications, and from the use of detergent micelles instead of phospholipid bilayers, have also been demonstrated. Thus, it is highly advantageous to determine the structures of unmodified membrane proteins in liquid crystalline phospholipid bilayers under physiological conditions. RA solid-state NMR provides a general method for obtaining accurate and precise structures of membrane proteins under near-native conditions.
Affiliation(s)
- Bibhuti B Das
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0307 USA
- Sang Ho Park
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0307 USA
- Stanley J Opella
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093-0307 USA
|
12
|
Frueh DP. Practical aspects of NMR signal assignment in larger and challenging proteins. Progress in Nuclear Magnetic Resonance Spectroscopy 2014; 78:47-75. [PMID: 24534088 PMCID: PMC3951217 DOI: 10.1016/j.pnmrs.2013.12.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 09/09/2013] [Revised: 12/05/2013] [Accepted: 12/06/2013] [Indexed: 05/03/2023]
Abstract
NMR has matured into a technique routinely employed for studying proteins in near physiological conditions. However, applications to larger proteins are impeded by the complexity of the various correlation maps necessary to assign NMR signals. This article reviews the data analysis techniques traditionally employed for resonance assignment and describes alternative protocols necessary for overcoming challenges in large protein spectra. In particular, simultaneous analysis of multiple spectra may help overcome ambiguities or may reveal correlations in an indirect manner. Similarly, visualization of orthogonal planes in a multidimensional spectrum can provide alternative assignment procedures. We describe examples of such strategies for assignment of backbone, methyl, and nOe resonances. We describe experimental aspects of data acquisition for the related experiments and provide guidelines for preliminary studies. Focus is placed on large folded monomeric proteins and examples are provided for 37, 48, 53, and 81 kDa proteins.
Affiliation(s)
- Dominique P Frueh
- Johns Hopkins University School of Medicine, Biophysics and Biophysical Chemistry, 725 N. Wolfe Street, 701 Hunterian, Baltimore, MD 21205-2105, United States.
|
13
|
Kannan L, Li H, Rubinstein B, Mushegian A. Models of gene gain and gene loss for probabilistic reconstruction of gene content in the last universal common ancestor of life. Biol Direct 2013; 8:32. [PMID: 24354654 PMCID: PMC3892064 DOI: 10.1186/1745-6150-8-32] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 07/19/2013] [Accepted: 12/04/2013] [Indexed: 12/21/2022]
Abstract
Background: The problem of probabilistic inference of gene content in the last common ancestor of several extant species with completely sequenced genomes is: for each gene that is conserved in all or some of the genomes, assign the probability that its ancestral gene was present in the genome of their last common ancestor. Results: We have developed a family of models of gene gain and gene loss in evolution, and applied a maximum-likelihood approach that uses a phylogenetic tree of prokaryotes and the record of orthologous relationships between their genes to infer the gene content of LUCA, the Last Universal Common Ancestor of all currently living cellular organisms. The crucial parameter, the ratio of gene losses to gene gains, was estimated from the data and was higher in models that take account of the number of in-paralogs in genomes than in models that treat gene presences and absences as a binary trait. Conclusion: While the numbers of genes that are placed confidently into LUCA are similar in the ML methods and in previously published methods that use various parsimony-based approaches, the identities of the genes themselves are different. Most of the models of either kind treat the genes found in many existing genomes in a similar way, assigning to them high probabilities of being ancestral (“high ancestrality”). The ML models are more likely than others to assign high ancestrality to the genes that are relatively rare in the present-day genomes. Reviewers: This article was reviewed by Martijn A Huynen, Toni Gabaldón and Fyodor Kondrashov.
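To make the idea of a binary gain/loss model concrete, here is a generic two-state continuous-time Markov chain for gene presence along a single branch. It is a textbook construction offered as a sketch, not one of the models developed in the article; the function name `presence_transition` and the rate values are hypothetical.

```python
import math


def presence_transition(gain, loss, t):
    """Transition probabilities for gene absence (0) / presence (1).

    Generic two-state Markov chain with a constant gain rate (0 -> 1) and
    loss rate (1 -> 0) over a branch of length t. Both rates are assumed
    positive. Returns a dict keyed by (start_state, end_state).
    """
    total = gain + loss
    decay = math.exp(-total * t)
    stationary_present = gain / total
    p01 = stationary_present * (1.0 - decay)
    p11 = stationary_present + (1.0 - stationary_present) * decay
    return {
        (0, 0): 1.0 - p01,
        (0, 1): p01,
        (1, 0): 1.0 - p11,
        (1, 1): p11,
    }


# Illustrative rates only: losses more frequent than gains.
probs = presence_transition(gain=0.2, loss=0.5, t=1.0)
```

In a likelihood calculation, matrices like this one would be combined across the branches of a phylogenetic tree; the loss-to-gain ratio that the abstract calls the crucial parameter corresponds here to `loss / gain`.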
Affiliation(s)
- Arcady Mushegian
- Stowers Institute for Medical Research, Kansas City, Missouri 64110, USA
|
14
|
Szczesny P, Mykowiecka A, Pawłowski K, Grynberg M. Distinct protein classes in human red cell proteome revealed by similarity of phylogenetic profiles. PLoS One 2013; 8:e54471. [PMID: 23349899 PMCID: PMC3549994 DOI: 10.1371/journal.pone.0054471] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/20/2012] [Accepted: 12/12/2012] [Indexed: 01/16/2023]
Abstract
The minimal set of proteins necessary to maintain a vertebrate cell forms an interesting core of cellular machinery. The known proteome of the human red blood cell consists of about 1400 proteins. We treated this protein complement of one of the simplest human cells as a model and asked questions about its function and origins. The proteome was mapped onto phylogenetic profiles, i.e. vectors of species possessing homologues of human proteins. A novel clustering approach was devised, utilising similarity in the phylogenetic spread of homologues as a distance measure. The clustering based on phylogenetic profiles yielded several distinct protein classes differing in phylogenetic taxonomic spread, presumed evolutionary history and functional properties. Notably, small clusters of proteins common to vertebrates or to Metazoa and other multicellular eukaryotes involve biological functions specific to multicellular organisms, such as apoptosis or cell-cell signaling, respectively. Also, a eukaryote-specific cluster is identified, featuring GTP-ase signalling and ubiquitination. Another cluster, made up of proteins found in most organisms, including bacteria and archaea, involves basic molecular functions such as oxidation-reduction and glycolysis. Approximately one third of erythrocyte proteins do not fall in any of the clusters, reflecting the complexity of protein evolution in comparison to our simple model. In essence, the clustering obtained divides the proteome into old and new parts, the former originating from bacterial ancestors, the latter from inventions within multicellular eukaryotes. Thus, the model human cell proteome appears to be made up of protein sets distinct in their history and biological roles. The current work shows that the phylogenetic profile concept allows protein clustering in a way relevant both to biological function and evolutionary history.
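A phylogenetic profile can be treated as the set of species in which a protein has a homologue, and profiles can then be grouped by how much they overlap. The sketch below uses Jaccard distance and a simple threshold-based single-linkage grouping as stand-ins; the abstract does not specify the authors' distance measure or clustering algorithm, so both choices, the function names, and the toy profiles are assumptions made for illustration.

```python
def jaccard_distance(a, b):
    """1 - |a & b| / |a | b| for two sets of species names."""
    union = len(a | b)
    return 0.0 if union == 0 else 1.0 - len(a & b) / union


def group_profiles(profiles, threshold):
    """Merge proteins whose profiles lie within `threshold` of each other.

    Single-linkage grouping: two proteins share a cluster if a chain of
    pairwise distances <= threshold connects them.
    """
    names = list(profiles)
    cluster_of = {name: i for i, name in enumerate(names)}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard_distance(profiles[a], profiles[b]) <= threshold:
                old, new = cluster_of[b], cluster_of[a]
                if old != new:
                    # Relabel every member of b's cluster into a's cluster.
                    for n in names:
                        if cluster_of[n] == old:
                            cluster_of[n] = new
    clusters = {}
    for name in names:
        clusters.setdefault(cluster_of[name], set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Invented profiles: protein name -> species with a detected homologue.
toy_profiles = {
    "P1": {"human", "mouse"},
    "P2": {"human", "mouse", "zebrafish"},
    "P3": {"human", "yeast", "E. coli"},
}
groups = group_profiles(toy_profiles, threshold=0.5)
```

With these toy inputs, P1 and P2 fall into one vertebrate-restricted group while P3, with its broad taxonomic spread, stays on its own, which mirrors the old-versus-new split described in the abstract.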
Affiliation(s)
- Paweł Szczesny
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Department of Plant Molecular Biology, Institute of Experimental Plant Biology, University of Warsaw, Warsaw, Poland
- Krzysztof Pawłowski
- Faculty of Agriculture and Biology, Warsaw University of Life Sciences, Warsaw, Poland
- Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- * E-mail: (MG); (KP)
- Marcin Grynberg
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- * E-mail: (MG); (KP)
|
15
|
Barbas A, Popescu A, Frazão C, Arraiano CM, Fialho AM. Rossmann-fold motifs can confer multiple functions to metabolic enzymes: RNA binding and ribonuclease activity of a UDP-glucose dehydrogenase. Biochem Biophys Res Commun 2012; 430:218-24. [PMID: 23137539 DOI: 10.1016/j.bbrc.2012.10.091] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 10/17/2012] [Accepted: 10/25/2012] [Indexed: 12/25/2022]
Abstract
Metabolic enzymes are usually characterized as having one specific function, and this is the case for UDP-glucose dehydrogenase, which catalyzes the twofold NAD(+)-dependent oxidation of UDP-glucose into UDP-glucuronic acid. We have determined that this enzyme is also capable of participating in other cellular processes. Here, we report that the bacterial UDP-glucose dehydrogenase (UgdG) from Sphingomonas elodea ATCC 31461, which provides UDP-glucuronic acid for the synthesis of the exopolysaccharide gellan, is not only able to bind RNA but also acts as a ribonuclease. The ribonucleolytic activity occurs independently of the presence of NAD(+), and the RNA binding site does not coincide with the NAD(+) binding region. We have also characterized the kinetics of the interaction between UgdG and RNA. Moreover, computer analysis reveals that the N- and C-terminal domains of UgdG share structural features with ancient mitochondrial ribonucleases named MAR. MARs are present in lower eukaryotic microorganisms, have a Rossmannoid fold and belong to the isochorismatase superfamily. This observation reinforces that the Rossmann structural motifs found in NAD(+)-dependent dehydrogenases can have a dual function, working as a nucleotide cofactor binding domain and as a ribonuclease.
Affiliation(s)
- Ana Barbas
- Instituto de Tecnologia Química e Biológica/Universidade Nova de Lisboa, Oeiras, Portugal
|
16
|
Velyvis A, Ruschak AM, Kay LE. An economical method for production of (2)H, (13)CH3-threonine for solution NMR studies of large protein complexes: application to the 670 kDa proteasome. PLoS One 2012; 7:e43725. [PMID: 22984438 PMCID: PMC3439479 DOI: 10.1371/journal.pone.0043725] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Received: 06/13/2012] [Accepted: 07/24/2012] [Indexed: 11/26/2022]
Abstract
NMR studies of very high molecular weight protein complexes have been greatly facilitated through the development of labeling strategies whereby 13CH3 methyl groups are introduced into highly deuterated proteins. Robust and cost-effective labeling methods are well established for all methyl-containing amino acids with the exception of Thr. Here we describe an inexpensive biosynthetic strategy for the production of L-[α-2H; β-2H; γ-13C]-Thr that can then be directly added during protein expression to produce highly deuterated proteins with Thr methyl group probes of structure and dynamics. These reporters are particularly valuable because, unlike other methyl-containing amino acids, Thr residues are localized predominantly to the surfaces of proteins, have unique hydrogen bonding capabilities, have a higher propensity to be found at protein-nucleic acid interfaces and can play important roles in signaling pathways through phosphorylation. The utility of the labeling methodology is demonstrated with an application to the 670 kDa proteasome core particle, where high quality Thr 13C,1H correlation spectra are obtained that could not be generated from samples prepared with commercially available U-[13C,1H]-Thr.
Affiliation(s)
- Algirdas Velyvis
- Departments of Molecular Genetics, Biochemistry, and Chemistry, University of Toronto, Toronto, Ontario, Canada
- * E-mail: (AV); (LEK)
- Amy M. Ruschak
- Departments of Molecular Genetics, Biochemistry, and Chemistry, University of Toronto, Toronto, Ontario, Canada
- Lewis E. Kay
- Departments of Molecular Genetics, Biochemistry, and Chemistry, University of Toronto, Toronto, Ontario, Canada
- Program in Molecular Structure and Function, Hospital for Sick Children, Toronto, Ontario, Canada
- * E-mail: (AV); (LEK)
|
17
|
The ecology of bacterial genes and the survival of the new. INTERNATIONAL JOURNAL OF EVOLUTIONARY BIOLOGY 2012; 2012:394026. [PMID: 22900231 PMCID: PMC3415099 DOI: 10.1155/2012/394026] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 04/21/2012] [Accepted: 06/26/2012] [Indexed: 11/18/2022]
Abstract
Much of the observed variation among closely related bacterial genomes is attributable to gains and losses of genes that are acquired horizontally as well as to gene duplications and larger amplifications. The genomic flexibility that results from these mechanisms certainly contributes to the ability of bacteria to survive and adapt in varying environmental challenges. However, the duplicability and transferability of individual genes imply that natural selection should operate, not only at the organismal level, but also at the level of the gene. Genes can be considered semiautonomous entities that possess specific functional niches and evolutionary dynamics. The evolution of bacterial genes should respond both to selective pressures that favor competition, mostly among orthologs or paralogs that may occupy the same functional niches, and cooperation, with the majority of other genes coexisting in a given genome. The relative importance of either type of selection is likely to vary among different types of genes, based on the functional niches they cover and on the tightness of their association with specific organismal lineages. The frequent availability of new functional niches caused by environmental changes and biotic evolution should enable the constant diversification of gene families and the survival of new lineages of genes.
18
Hansen AL, Lundström P, Velyvis A, Kay LE. Quantifying millisecond exchange dynamics in proteins by CPMG relaxation dispersion NMR using side-chain 1H probes. J Am Chem Soc 2012; 134:3178-89. [PMID: 22300166 DOI: 10.1021/ja210711v] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.3] [Indexed: 11/29/2022]
Abstract
A Carr-Purcell-Meiboom-Gill relaxation dispersion experiment is presented for quantifying millisecond time-scale chemical exchange at side-chain (1)H positions in proteins. Such experiments are not possible in a fully protonated molecule because of magnetization evolution from homonuclear scalar couplings that interferes with the extraction of accurate transverse relaxation rates. It is shown, however, that by using a labeling strategy whereby proteins are produced using {(13)C,(1)H}-glucose and D(2)O a significant number of 'isolated' side-chain (1)H spins are generated, eliminating such effects. It thus becomes possible to record (1)H dispersion profiles at the β positions of Asx, Cys, Ser, His, Phe, Tyr, and Trp as well as the γ positions of Glx, in addition to the methyl side-chain moieties. This brings the total of amino acid side-chain positions that can be simultaneously probed using a single (1)H dispersion experiment to 16. The utility of the approach is demonstrated with an application to the four-helix bundle colicin E7 immunity protein, Im7, which folds via a partially structured low populated intermediate that interconverts with the folded, ground state on the millisecond time-scale. The extracted (1)H chemical shift differences at side-chain positions provide valuable restraints in structural studies of invisible, excited states, complementing backbone chemical shifts that are available from existing relaxation dispersion experiments.
Affiliation(s)
- Alexandar L Hansen
- Department of Molecular Genetics, The University of Toronto, Toronto, Ontario M5S 1A8, Canada
19
Kinjo AR, Nakamura H. Functional structural motifs for protein-ligand, protein-protein, and protein-nucleic acid interactions and their connection to supersecondary structures. Methods Mol Biol 2012; 932:295-315. [PMID: 22987360 DOI: 10.1007/978-1-62703-065-6_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/21/2023]
Abstract
Protein functions are mediated by interactions between proteins and other molecules. One useful approach to analyze protein functions is to compare and classify the structures of interaction interfaces of proteins. Here, we describe the procedures for compiling a database of interface structures and efficiently comparing the interface structures. To do so requires a good understanding of the data structures of the Protein Data Bank (PDB). Therefore, we also provide a detailed account of the PDB exchange dictionary necessary for extracting data that are relevant for analyzing interaction interfaces and secondary structures. We identify recurring structural motifs by classifying similar interface structures, and we define a coarse-grained representation of supersecondary structures (SSS) which represents a sequence of two or three secondary structure elements including their relative orientations as a string of four to seven letters. By examining the correspondence between structural motifs and SSS strings, we show that no SSS string has particularly high propensity to be found at interaction interfaces in general, indicating that any SSS can be used as a binding interface. When individual structural motifs are examined, there are some SSS strings that have high propensity for particular groups of structural motifs. In addition, it is shown that while the SSS strings found in particular structural motifs for nonpolymer and protein interfaces are as abundant as in other structural motifs that belong to the same subunit, structural motifs for nucleic acid interfaces exhibit somewhat stronger preference for SSS strings. In regard to protein folds, many motif-specific SSS strings were found across many folds, suggesting that SSS may be a useful description to investigate the universality of ligand binding modes.
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, Osaka, Japan
20
Hansen AL, Kay LE. Quantifying millisecond time-scale exchange in proteins by CPMG relaxation dispersion NMR spectroscopy of side-chain carbonyl groups. JOURNAL OF BIOMOLECULAR NMR 2011; 50:347-55. [PMID: 21681650 DOI: 10.1007/s10858-011-9520-6] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 04/20/2011] [Accepted: 05/19/2011] [Indexed: 05/21/2023]
Abstract
A new pulse sequence is presented for the measurement of relaxation dispersion profiles quantifying millisecond time-scale exchange dynamics of side-chain carbonyl groups in uniformly (13)C labeled proteins. The methodology has been tested using the 87-residue colicin E7 immunity protein, Im7, which is known to fold via a partially structured low populated intermediate that interconverts with the folded, ground state on the millisecond time-scale. Comparison of exchange parameters extracted for this folding 'reaction' using the present methodology with those obtained from more 'traditional' (15)N and backbone carbonyl probes establishes the utility of the approach. The extracted excited state side-chain carbonyl chemical shifts indicate that the Asx/Glx side-chains are predominantly unstructured in the Im7 folding intermediate. However, several crucial salt-bridges that exist in the native structure appear to be already formed in the excited state, either in part or in full. This information, in concert with that obtained from existing backbone and side-chain methyl relaxation dispersion experiments, will ultimately facilitate a detailed description of the structure of the Im7 folding intermediate.
Affiliation(s)
- Alexandar L Hansen
- Departments of Molecular Genetics, Biochemistry and Chemistry, The University of Toronto, Toronto, ON, M5S 1A8, Canada
21
Tarrío R, Ayala FJ, Rodríguez-Trelles F. The Vein Patterning 1 (VEP1) gene family laterally spread through an ecological network. PLoS One 2011; 6:e22279. [PMID: 21818306 PMCID: PMC3144213 DOI: 10.1371/journal.pone.0022279] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 03/04/2011] [Accepted: 06/18/2011] [Indexed: 11/23/2022] Open
Abstract
Lateral gene transfer (LGT) is a major evolutionary mechanism in prokaryotes. Knowledge about LGT in eukaryotes, particularly multicellular ones, has only recently started to accumulate. A widespread assumption sees the gene as the unit of LGT, largely because little is yet known about how LGT chances are affected by structural/functional features at the subgenic level. Here we trace the evolutionary trajectory of VEin Patterning 1, a novel gene family known to be essential for plant development and defense. At the subgenic level VEP1 encodes a dinucleotide-binding Rossmann-fold domain, in common with members of the short-chain dehydrogenase/reductase (SDR) protein family. We found: i) VEP1 likely originated in an aerobic, mesophilic and chemoorganotrophic α-proteobacterium, and was laterally propagated through nets of ecological interactions, including multiple LGTs between phylogenetically distant green plant/fungi-associated bacteria, and five independent LGTs to eukaryotes. Of these five transfers to eukaryotes, three are ancient LGTs, implicating an ancestral fungus, the last common ancestor of land plants and an ancestral trebouxiophyte green alga, and two are recent LGTs to modern embryophytes. ii) VEP1's rampant LGT behavior was enabled by the robustness and broad utility of the dinucleotide-binding Rossmann-fold, which provided a platform for the evolution of two unprecedented departures from the canonical SDR catalytic triad. iii) The fate of VEP1 in eukaryotes has been different in different lineages, being ubiquitous and highly conserved in land plants, whereas fungi underwent multiple losses. iv) VEP1-harboring bacteria include non-phytopathogenic and phytopathogenic symbionts which are non-randomly distributed with respect to the type of harbored VEP1 gene.
Our findings suggest that VEP1 may have been instrumental for the evolutionary transition of green plants to land, and point to an LGT-mediated ‘Trojan Horse’ mechanism for the evolution of bacterial pathogenesis against plants. VEP1 may serve as a tool for revealing microbial interactions in plant/fungi-associated environments.
Affiliation(s)
- Rosa Tarrío
- Universidad de Santiago de Compostela, CIBERER, Genome Medicine Group, Santiago de Compostela, Spain
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco J. Ayala
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- Francisco Rodríguez-Trelles
- Grup de Biologia Evolutiva, Departament de Genètica i de Microbiologia, Universitat Autònoma de Barcelona, Barcelona, Spain
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
- * E-mail:
22
Staritzbichler R, Anselmi C, Forrest LR, Faraldo-Gómez JD. GRIFFIN: A versatile methodology for optimization of protein-lipid interfaces for membrane protein simulations. J Chem Theory Comput 2011; 7:1167-1176. [PMID: 24707227 PMCID: PMC3972769 DOI: 10.1021/ct100576m] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.2] [Indexed: 11/28/2022]
Abstract
As new atomic structures of membrane proteins are resolved, they reveal increasingly complex transmembrane topologies, and highly irregular surfaces with crevices and pores. In many cases, specific interactions formed with the lipid membrane are functionally crucial, as is the overall lipid composition. Compounded with increasing protein size, these characteristics pose a challenge for the construction of simulation models of membrane proteins in lipid environments; clearly, whether these models are sufficiently realistic bears upon the reliability of simulation-based studies of these systems. Here, we introduce GRIFFIN, which uses a versatile framework to automate and improve a widely used membrane-embedding protocol. Initially, GRIFFIN carves out lipid and water molecules from a volume equivalent to that of the protein, so as to conserve the system density. In the subsequent optimization phase GRIFFIN adds an implicit grid-based protein force-field to a molecular dynamics simulation of the pre-carved membrane. In this force-field, atoms inside the implicit protein volume experience an outward force that will expel them from that volume, whereas those outside are subject to electrostatic and van der Waals interactions with the implicit protein. At each step of the simulation, these forces are updated by GRIFFIN and combined with the intermolecular forces of the explicit lipid-water system. This procedure enables the construction of realistic and reproducible starting configurations of the protein-membrane interface within a reasonable timeframe and with minimal intervention. GRIFFIN is a standalone tool designed to work alongside any existing molecular dynamics package, such as NAMD or GROMACS.
Affiliation(s)
- René Staritzbichler
- Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Claudio Anselmi
- Theoretical Molecular Biophysics Group, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- Lucy R. Forrest
- Computational Structural Biology Group, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
- José D. Faraldo-Gómez
- Theoretical Molecular Biophysics Group, Max Planck Institute of Biophysics, Frankfurt am Main, Germany
23
Nolan MJ, Hofmann A, Jex AR, Gasser RB. A theoretical study to establish the relationship between the three-dimensional structure of triose-phosphate isomerase of Giardia duodenalis and point mutations in the respective gene. Mol Cell Probes 2010; 24:281-5. [DOI: 10.1016/j.mcp.2010.06.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/16/2010] [Revised: 06/01/2010] [Accepted: 06/03/2010] [Indexed: 11/15/2022]
24
Neumann S, Fuchs A, Mulkidjanian A, Frishman D. Current status of membrane protein structure classification. Proteins 2010; 78:1760-73. [PMID: 20186977 DOI: 10.1002/prot.22692] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022]
Abstract
For over two decades, continuous efforts to organize the jungle of available protein structures have been underway. Although a number of discrepancies between different classification approaches for soluble proteins have been reported, the classification of membrane proteins has so far not been comparatively studied because of the limited amount of available structural data. Here, we present an analysis of alpha-helical membrane protein classification in the SCOP and CATH databases. In the current set of 63 alpha-helical membrane protein chains having between 1 and 13 transmembrane helices, we observed a number of differently classified proteins both regarding their domain and fold assignment. The majority of all discrepancies affect single transmembrane helix, two helix hairpin, and four helix bundle domains, while domains with more than five helices are mostly classified consistently between SCOP and CATH. It thus appears that the structural constraints imposed by the lipid bilayer complicate the classification of membrane proteins with only a few membrane-spanning regions. This problem seems to be specific for membrane proteins as soluble four helix bundles, not restrained by the membrane, are more consistently classified by SCOP and CATH. Our findings indicate that the structural space of small membrane helix bundles is highly continuous such that even minor differences in individual classification procedures may lead to a significantly different classification. Membrane proteins with few helices and limited structural diversity only seem to be reasonably classifiable if the definition of a fold is adapted to include more fine-grained structural features such as helix-helix interactions and reentrant regions.
Affiliation(s)
- Sindy Neumann
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, D-85354 Freising, Germany
25
Wu CH, Das BB, Opella SJ. (1)H-(13)C Hetero-nuclear dipole-dipole couplings of methyl groups in stationary and magic angle spinning solid-state NMR experiments of peptides and proteins. JOURNAL OF MAGNETIC RESONANCE (SAN DIEGO, CALIF. : 1997) 2010; 202:127-34. [PMID: 19896874 PMCID: PMC2888030 DOI: 10.1016/j.jmr.2009.10.007] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 09/17/2009] [Revised: 10/16/2009] [Accepted: 10/16/2009] [Indexed: 05/16/2023]
Abstract
(13)C NMR of isotopically labeled methyl groups has the potential to combine spectroscopic simplicity with ease of labeling for protein NMR studies. However, in most high resolution separated local field experiments, such as polarization inversion spin exchange at the magic angle (PISEMA), that are used to measure (1)H-(13)C hetero-nuclear dipolar couplings, the four-spin system of the methyl group presents complications. In this study, the properties of the (1)H-(13)C hetero-nuclear dipolar interactions of (13)C-labeled methyl groups are revealed through solid-state NMR experiments on a range of samples, including single crystals, stationary powders, and magic angle spinning of powders, of (13)C(3) labeled alanine alone and incorporated into a protein. The spectral simplifications resulting from proton detected local field (PDLF) experiments are shown to enhance resolution and simplify the interpretation of results on single crystals, magnetically aligned samples, and powders. The complementarity of stationary sample and magic angle spinning (MAS) measurements of dipolar couplings is demonstrated by applying polarization inversion spin exchange at the magic angle and magic angle spinning (PISEMAMAS) to unoriented samples.
26
βα-hairpin clamps brace βαβ modules and can make substantive contributions to the stability of TIM barrel proteins. PLoS One 2009; 4:e7179. [PMID: 19787060 PMCID: PMC2747017 DOI: 10.1371/journal.pone.0007179] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 08/05/2009] [Accepted: 08/30/2009] [Indexed: 11/24/2022] Open
Abstract
Non-local hydrogen bonding interactions between main chain amide hydrogen atoms and polar side chain acceptors that bracket consecutive βα or αβ elements of secondary structure in αTS from E. coli, a TIM barrel protein, have previously been found to contribute 4–6 kcal mol⁻¹ to the stability of the native conformation. Experimental analysis of similar βα-hairpin clamps in a homologous pair of TIM barrel proteins of low sequence identity, IGPS from S. solfataricus and E. coli, reveals that this dramatic enhancement of stability is not unique to αTS. A survey of 71 TIM barrel proteins demonstrates a 4-fold symmetry for the placement of βα-hairpin clamps, bracing the fundamental βαβ building block and defining its register in the (βα)8 motif. The preferred sequences and locations of βα-hairpin clamps will enhance structure prediction algorithms and provide a strategy for engineering stability in TIM barrel proteins.
27
Valas RE, Yang S, Bourne PE. Nothing about protein structure classification makes sense except in the light of evolution. Curr Opin Struct Biol 2009; 19:329-34. [PMID: 19394812 DOI: 10.1016/j.sbi.2009.03.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 12/16/2008] [Revised: 02/19/2009] [Accepted: 03/16/2009] [Indexed: 12/27/2022]
Abstract
In this, the 200th anniversary of Charles Darwin's birth and the 150th anniversary of the publication of the Origin of Species, it is fitting to revisit the classification of protein structures from an evolutionary perspective. Existing classifications use homologous sequence relationships, but knowing that structure is much more conserved than sequence creates an iterative loop from which structures can be further classified beyond that of the domain, thereby teasing out distant evolutionary relationships. The desired classification scheme is then one in which a fold is merely semantics and structure can be classified as either ancestral or derived.
Affiliation(s)
- Ruben E Valas
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093-0743, USA
28
Pascual-García A, Abia D, Ortiz ÁR, Bastolla U. Cross-over between discrete and continuous protein structure space: insights into automatic classification and networks of protein structures. PLoS Comput Biol 2009; 5:e1000331. [PMID: 19325884 PMCID: PMC2654728 DOI: 10.1371/journal.pcbi.1000331] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 08/20/2008] [Accepted: 02/11/2009] [Indexed: 11/19/2022] Open
Abstract
Structural classifications of proteins assume the existence of the fold, which is an intrinsic equivalence class of protein domains. Here, we test in which conditions such an equivalence class is compatible with objective similarity measures. We base our analysis on the transitive property of the equivalence relationship, requiring that similarity of A with B and B with C implies that A and C are also similar. Divergent gene evolution leads us to expect that the transitive property should approximately hold. However, if protein domains are a combination of recurrent short polypeptide fragments, as proposed by several authors, then similarity of partial fragments may violate the transitive property, favouring the continuous view of the protein structure space. We propose a measure to quantify the violations of the transitive property when a clustering algorithm joins elements into clusters, and we find out that such violations present a well defined and detectable cross-over point, from an approximately transitive regime at high structure similarity to a regime with large transitivity violations and large differences in length at low similarity. We argue that protein structure space is discrete and hierarchic classification is justified up to this cross-over point, whereas at lower similarities the structure space is continuous and it should be represented as a network. We have tested the qualitative behaviour of this measure, varying all the choices involved in the automatic classification procedure, i.e., domain decomposition, alignment algorithm, similarity score, and clustering algorithm, and we have found out that this behaviour is quite robust. The final classification depends on the chosen algorithms. We used the values of the clustering coefficient and the transitivity violations to select the optimal choices among those that we tested. Interestingly, this criterion also favours the agreement between automatic and expert classifications. 
As a domain set, we have selected a consensus set of 2,890 domains decomposed very similarly in SCOP and CATH. As an alignment algorithm, we used a global version of MAMMOTH developed in our group, which is both rapid and accurate. As a similarity measure, we used the size-normalized contact overlap, and as a clustering algorithm, we used average linkage. The resulting automatic classification at the cross-over point was more consistent than expert ones with respect to the structure similarity measure, with 86% of the clusters corresponding to subsets of either SCOP or CATH superfamilies and fewer than 5% containing domains in distinct folds according to both SCOP and CATH. Almost 15% of SCOP superfamilies and 10% of CATH superfamilies were split, consistent with the notion of fold change in protein evolution. These results were qualitatively robust for all choices that we tested, although we did not try to use alignment algorithms developed by other groups. Folds defined in SCOP and CATH would be completely joined in the regime of large transitivity violations where clustering is more arbitrary. Consistently, the agreement between SCOP and CATH at fold level was lower than their agreement with the automatic classification obtained using as a clustering algorithm, respectively, average linkage (for SCOP) or single linkage (for CATH). The networks representing significant evolutionary and structural relationships between clusters beyond the cross-over point may allow us to perform evolutionary, structural, or functional analyses beyond the limits of classification schemes. These networks and the underlying clusters are available at http://ub.cbm.uam.es/research/ProtNet.php.
Making order of the fast-growing information on proteins is essential for gaining evolutionary and functional knowledge.
The most successful approaches to this task are based on classifications of protein structures, such as SCOP and CATH, which assume a discrete view of the protein structure space as a collection of separated equivalence classes (folds). However, several authors proposed that protein domains should be regarded as assemblies of polypeptide fragments, which implies that the protein–structure space is continuous. Here, we assess these views of domain space through the concept of transitivity; i.e., we test whether structure similarity of A with B and B with C implies that A and C are similar, as required for consistent classification. We find that the domain space is approximately transitive and discrete at high similarity and continuous at low similarity, where transitivity is severely violated. Comparing our classification at the cross-over similarity with CATH and SCOP, we find that they join proteins at low similarity where classification is inconsistent. Part of this discrepancy is due to structural divergence of homologous domains, which are forced to be in a single cluster in CATH and SCOP. Structural and evolutionary relationships between consistent clusters are represented as a network in our approach, going beyond current protein classification schemes. We conjecture that our results are related to a change of evolutionary regime, from uniparental divergent evolution for highly related domains to assembly of large fragments for which the classical tree representation is unsuitable.
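The transitivity test described in this abstract can be illustrated with a small Python sketch. It is not the measure published by the authors: the function name `transitivity_violation_rate`, the toy similarity matrix, and the two thresholds are hypothetical, chosen only to show how a triple A, B, C can be similar pairwise through B while A and C are not similar.

```python
from itertools import combinations


def transitivity_violation_rate(sim, threshold):
    """Fraction of linked triples that break transitivity at a threshold.

    sim is a symmetric matrix (list of lists) of pairwise similarities.
    A triple is "linked" when at least two of its three pairs are similar;
    it is a violation when exactly two are similar, because one member is
    then similar to both others while those two are not similar to each other.
    This is an illustrative definition, not the published measure.
    """
    linked = 0
    violated = 0
    for a, b, c in combinations(range(len(sim)), 3):
        similar_pairs = sum(
            (
                sim[a][b] >= threshold,
                sim[b][c] >= threshold,
                sim[a][c] >= threshold,
            )
        )
        if similar_pairs >= 2:
            linked += 1
            if similar_pairs == 2:
                violated += 1
    return violated / linked if linked else 0.0


# Hypothetical similarity scores for four domains (not data from the paper).
toy_sim = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.85, 0.5],
    [0.8, 0.85, 1.0, 0.6],
    [0.2, 0.5, 0.6, 1.0],
]

high = transitivity_violation_rate(toy_sim, 0.7)
low = transitivity_violation_rate(toy_sim, 0.45)
print(high, low)
```

In this toy case the strict threshold leaves one fully consistent triple, while the loose threshold links the fourth domain to two others without linking it to the first, which is the kind of inconsistency the abstract associates with low similarity.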
Affiliation(s)
- David Abia
- Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- Ángel R. Ortiz
- Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- Ugo Bastolla
- Centro de Biología Molecular ‘Severo Ochoa’ (CSIC-UAM), Cantoblanco, Madrid, Spain
- * E-mail:
29
Abstract
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex interactions mediated by the proteostasis and proteolytic machineries of the cell. The survey of architectures in the living world that was fuelled by recent structural genomic initiatives has been summarized in protein classification schemes, and the overall structure of fold space explored with novel bioinformatic approaches. However, metrics of general structural comparison have not yet unified architectural complexity using the 'shared and derived' tenet of evolutionary analysis. In contrast, a shift of focus from molecules to proteomes and a census of protein structure in fully sequenced genomes were able to uncover global evolutionary patterns in the structure of proteins. Timelines of discovery of architectures and functions unfolded episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. They revealed a biologically complex ancestral proteome and the early origin of the archaeal lineage. Studies also identified an origin of the protein world in enzymes of nucleotide metabolism harbouring the P-loop-containing triphosphate hydrolase fold and the explosive discovery of metabolic functions that recapitulated well-defined prebiotic shells and involved the recruitment of structures and functions. These observations have important implications for origins of modern biochemistry and diversification of life.
30
Abstract
Submolecular details of Azotobacter vinelandii apoflavodoxin (apoFD) (un)folding are revealed by time-resolved fluorescence anisotropy using wild-type protein and variants lacking one or two of apoFD's three tryptophans. ApoFD equilibrium (un)folding by guanidine hydrochloride follows a three-state model: native <--> unfolded <--> intermediate. In native protein, W128 is a sink for Förster resonance energy transfer (FRET). Consequently, unidirectional FRET with a 50-ps transfer correlation time occurs from W167 to W128. FRET from W74 to W167 is much slower (6.9 ns). In the intermediate, W128 and W167 have native-like geometry because the 50-ps transfer time is observed. However, non-native structure exists between W74 and W167 because instead of 6.9 ns the transfer correlation time is 2.0 ns. In unfolded apoFD this 2.0-ns transfer correlation time is also detected. This decrease in transfer correlation time is a result of W74 and W167 becoming solvent accessible and randomly oriented toward one another. Apparently W74 and W167 are near-natively separated in the folding intermediate and in unfolded apoFD. Both tryptophans may actually be slightly closer in space than in the native state, even though apoFD's radius increases substantially upon unfolding. In unfolded apoFD the 50-ps transfer time observed for native and intermediate folding states becomes 200 ps as W128 and W167 are marginally further separated than in the native state. Apparently, apoFD's unfolded state is not a featureless statistical coil but contains well-defined substructures. The approach presented is a powerful tool to study protein folding.
|
31
|
Shih SCC, Stoica I, Goto NK. Investigation of the utility of selective methyl protonation for determination of membrane protein structures. J Biomol NMR 2008; 42:49-58. [PMID: 18762867 DOI: 10.1007/s10858-008-9263-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/14/2008] [Revised: 07/08/2008] [Accepted: 07/21/2008] [Indexed: 05/26/2023]
Abstract
Polytopic alpha-helical membrane proteins present one of the final frontiers for protein structural biology, with significant challenges causing severe under-representation in the protein structure databank. However, with the advent of hardware and methodology geared to the study of large molecular weight complexes, solution NMR is being increasingly considered as a tool for structural studies of these types of membrane proteins. One method that has the potential to facilitate these studies utilizes uniformly deuterated samples with protons reintroduced at one or two methyl groups of leucine, valine and isoleucine. In this work we demonstrate that in spite of the increased proportion of these amino acids in membrane proteins, the quality of structures that can be obtained from this strategy is similar to that obtained for all alpha-helical water soluble proteins. This is partly attributed to the observation that NOEs between residues within the transmembrane helix did not have an impact on structure quality. Instead the most important factors controlling structure accuracy were the strength of dihedral angle restraints imposed and the number of unique inter-helical pairs of residues constrained by NOEs. Overall these results suggest that the most accurate structures will arise from accurate identification of helical segments and utilization of inter-helical distance restraints from various sources to maximize the distribution of long-range restraints.
Affiliation(s)
- Steve C C Shih
- Department of Chemistry, University of Ottawa, 10 Marie Curie, Ottawa, ON, Canada, K1N 6N5
|
32
|
Hartling J, Kim J. Mutational robustness and geometrical form in protein structures. J Exp Zool B Mol Dev Evol 2008; 310:216-26. [PMID: 17973270 DOI: 10.1002/jez.b.21203] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
Theoretical studies of RNA and lattice protein models suggest that mutationally robust, or so-called designable, phenotypes tend to have special geometric features such as being more compact and more geometrically regular. Such geometrical forms have also been linked to speed of folding and stability properties that may also assist in promoting mutational robustness. Here we test these theoretical predictions on a non-redundant collection of 2,660 experimentally determined structures from the PDB (Protein Data Bank) and CATH (Class Architecture Topology Homologous superfamily) database. We first developed an index summarizing the geometrical regularity of the structures and then used this index to show that the statistical pattern of empirical data is consistent with the theoretical predictions relating geometry to mutational robustness. Mutationally robust proteins tend to be more symmetric and compact. However, the relationship between compactness and robustness cannot be explained simply by the geometrical packing of individual amino acids in proteins; rather, it is the property of the whole system that is related to the statistical characteristics of the folding landscape. Finally, we hypothesize that a triplet relationship between mutational robustness, stability and form is a general property of objects that optimize real-valued relationships between sequences and discrete structures.
Affiliation(s)
- Julia Hartling
- Department of Ecology and Evolutionary Biology, Yale University, New Haven, Connecticut, USA
|
33
|
Mechanisms for stabilisation and the maintenance of solubility in proteins from thermophiles. BMC Struct Biol 2007; 7:18. [PMID: 17394655 PMCID: PMC1851960 DOI: 10.1186/1472-6807-7-18] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Received: 11/17/2006] [Accepted: 03/29/2007] [Indexed: 01/26/2023]
Abstract
Background The database of protein structures contains representatives from organisms with a range of growth temperatures. Various properties have been studied in a search for the molecular basis of protein adaptation to higher growth temperature. Charged groups have emerged as key distinguishing factors for proteins from thermophiles and mesophiles. Results A dataset of 291 thermophile-derived protein structures is compared with mesophile proteins. Calculations of electrostatic interactions support the importance of charges, but indicate that increases in charge contribution to folded state stabilisation do not generally correlate with the numbers of charged groups. Relative propensities of charged groups vary, such as the substitution of glutamic for aspartic acid sidechains. Calculations suggest an energetic basis, with less dehydration for longer sidechains. Most other properties studied show weak or insignificant separation of proteins from moderate thermophiles or hyperthermophiles and mesophiles, including an estimate of the difference in sidechain rotameric entropy upon protein folding. An exception is increased burial of alanine and proline residues and decreased burial of phenylalanine, methionine, tyrosine and tryptophan in hyperthermophile proteins compared to those from mesophiles. Conclusion Since an increase in the number of charged groups for hyperthermophile proteins is separable from charged group contribution to folded state stability, we hypothesise that charged group propensity is important in the context of protein solubility and the prevention of aggregation. Accordingly we find some separation between mesophile and hyperthermophile proteins when looking at the largest surface patch that does not contain a charged sidechain. 
With regard to our observation that aromatic sidechains are less buried in hyperthermophile proteins, further analysis indicates that the placement of some of these groups may facilitate the reduction of folding fluctuations in proteins of the higher growth temperature organisms.
|
34
|
Baker ML, Ju T, Chiu W. Identification of secondary structure elements in intermediate-resolution density maps. Structure 2007; 15:7-19. [PMID: 17223528 PMCID: PMC1810566 DOI: 10.1016/j.str.2006.11.008] [Citation(s) in RCA: 134] [Impact Index Per Article: 7.9] [Received: 08/03/2006] [Revised: 11/10/2006] [Accepted: 11/18/2006] [Indexed: 11/25/2022]
Abstract
An increasing number of structural studies of large macromolecular complexes, both in X-ray crystallography and cryo-electron microscopy, have resulted in intermediate-resolution (5-10 Å) density maps. Despite being limited in resolution, significant structural and functional information may be extractable from these maps. To aid in the analysis and annotation of these complexes, we have developed SSEhunter, a tool for the quantitative detection of alpha helices and beta sheets. Based on density skeletonization, local geometry calculations, and a template-based search, SSEhunter has been tested and validated on a variety of simulated and authentic subnanometer-resolution density maps. The result is a robust, user-friendly approach that allows users to quickly visualize, assess, and annotate intermediate-resolution density maps. Beyond secondary structure element identification, the skeletonization algorithm in SSEhunter provides secondary structure topology, which is potentially useful in leading to structural models of individual molecular components directly from the density.
Affiliation(s)
- Matthew L. Baker
- National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030
- Tao Ju
- Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO 63130
- Wah Chiu
- National Center for Macromolecular Imaging, Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030
- *Corresponding author. Phone: 713-798-6985, Fax: 713-798-8682
|
35
|
Abstract
We develop models of the divergent evolution of genomes; the elementary object of sequence dynamics is the protein structural domain. To identify patterns of organization that reflect mechanisms of evolution, we consider the individual genomes of many procaryote species, studying the arrangement of protein structural domains in the space of all polypeptide structures. We view the network of structural similarities as a graph, called the organismal Protein Domain Universe Graph (oPDUG); vertices represent types of structural domains and edges represent strong structural similarity. As observed before, each oPDUG is a highly nonrandom graph, as evidenced in the vertex degree distribution, which resembles a Pareto law (which has a power-law asymptotic). To explain this and other peculiar properties of the oPDUGs, we construct an evolving-graph model for the long-timescale evolutionary dynamics of oPDUGs, containing only divergent mechanisms of domain discovery. The model generates degree distributions (resembling Pareto laws) and clustering-coefficient distributions that are characteristic of the oPDUGs. In the infinite-graph limit, we analytically compute the exponent for specific biological parameters, as well as the complete phase diagram of the model, finding two distinct regimes of domain innovation dynamics. Thus, divergent evolutionary dynamics quantitatively explains the nonrandom organization of oPDUGs.
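The two graph statistics named in this abstract, the vertex degree distribution and the clustering coefficient, can be sketched in a few lines. The edge list below is a toy example with made-up vertex labels (d1 to d5), not data from the study, and the helper names are hypothetical; real oPDUG edges would come from a structural-similarity threshold that is not specified here.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_neighbours(edges):
    """Build an undirected adjacency map, ignoring self-loops."""
    neighbours = defaultdict(set)
    for u, v in edges:
        if u != v:
            neighbours[u].add(v)
            neighbours[v].add(u)
    return neighbours


def degree_distribution(neighbours):
    """Map each degree k to the number of vertices that have degree k."""
    return Counter(len(nbrs) for nbrs in neighbours.values())


def clustering_coefficient(neighbours, vertex):
    """Fraction of neighbour pairs of `vertex` that are themselves linked."""
    nbrs = neighbours[vertex]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in neighbours[a])
    return 2.0 * links / (k * (k - 1))


# Toy graph: vertices stand for domain types, edges for strong similarity.
toy_edges = [("d1", "d2"), ("d1", "d3"), ("d2", "d3"), ("d1", "d4"), ("d5", "d4")]
toy_neighbours = build_neighbours(toy_edges)
toy_degrees = degree_distribution(toy_neighbours)
```

On a real graph, plotting the degree counts on log-log axes is the usual first check for a power-law tail.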
Affiliation(s)
- C Brian Roland
- Chemical Physics Program, Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, USA
|
36
|
Schmidt T, Frishman D. PROMPT: a protein mapping and comparison tool. BMC Bioinformatics 2006; 7:331. [PMID: 16817977 PMCID: PMC1569443 DOI: 10.1186/1471-2105-7-331] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 04/28/2006] [Accepted: 07/04/2006] [Indexed: 11/12/2022]
Abstract
Background Comparison of large protein datasets has become a standard task in bioinformatics. Typically researchers wish to know whether one group of proteins is significantly enriched in certain annotation attributes or sequence properties compared to another group, and whether this enrichment is statistically significant. In order to conduct such comparisons it is often required to integrate molecular sequence data and experimental information from disparate incompatible sources. While many specialized programs exist for comparisons of this kind in individual problem domains, such as expression data analysis, no generic software solution capable of addressing a wide spectrum of routine tasks in comparative proteomics is currently available. Results PROMPT is a comprehensive bioinformatics software environment which enables the user to compare arbitrary protein sequence sets, revealing statistically significant differences in their annotation features. It allows automatic retrieval and integration of data from a multitude of molecular biological databases as well as from a custom XML format. Similarity-based mapping of sequence IDs makes it possible to link experimental information obtained from different sources despite discrepancies in gene identifiers and minor sequence variation. PROMPT provides a full set of statistical procedures to address the following four use cases: i) comparison of the frequencies of categorical annotations between two sets, ii) enrichment of nominal features in one set with respect to another one, iii) comparison of numeric distributions, and iv) correlation of numeric variables. Analysis results can be visualized in the form of plots and spreadsheets and exported in various formats, including Microsoft Excel. Conclusion PROMPT is a versatile, platform-independent, easily expandable, stand-alone application designed to be a practical workhorse in analysing and mining protein sequences and associated annotation. 
The availability of the Java Application Programming Interface and scripting capabilities on one hand, and the intuitive Graphical User Interface with context-sensitive help system on the other, make it equally accessible to professional bioinformaticians and biologically-oriented users. PROMPT is freely available for academic users from .
Affiliation(s)
- Thorsten Schmidt
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
- Dmitrij Frishman
- Department of Genome Oriented Bioinformatics, Technische Universität München, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany
|
37
|
Rastogi S, Reuter N, Liberles DA. Evaluation of models for the evolution of protein sequences and functions under structural constraint. Biophys Chem 2006; 124:134-44. [PMID: 16837122 DOI: 10.1016/j.bpc.2006.06.008] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 05/18/2006] [Revised: 06/13/2006] [Accepted: 06/14/2006] [Indexed: 12/01/2022]
Abstract
In the field of evolutionary structural genomics, methods are needed to evaluate why genomes evolved to contain the fold distributions that are observed. In order to study the effects of population dynamics in the evolved genomes we need fast and accurate evolutionary models that can analyze the effects of selection, drift and fixation of a protein sequence in a population and that are grounded in physical parameters governing the folding and binding properties of the sequence. In this study, various knowledge-based, force field, and statistical methods for protein folding have been evaluated with four different folds (SH2 domains, SH3 domains, Globin-like, and Flavodoxin-like) to assess the speed and accuracy of the energy functions. Similarly, knowledge-based and force field methods have been used to predict ligand binding specificity in SH2 domains. To demonstrate the applicability of these methods, the dynamics of evolution of new binding capabilities by an SH2 domain is demonstrated.
Affiliation(s)
- Shruti Rastogi
- Department of Molecular Biology, University of Wyoming, Laramie, WY 82071, USA
|
38
|
Marsden RL, Lee D, Maibaum M, Yeats C, Orengo CA. Comprehensive genome analysis of 203 genomes provides structural genomics with new insights into protein family space. Nucleic Acids Res 2006; 34:1066-80. [PMID: 16481312 PMCID: PMC1373602 DOI: 10.1093/nar/gkj494] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Indexed: 11/22/2022]
Abstract
We present an analysis of 203 completed genomes in the Gene3D resource (including 17 eukaryotes), which demonstrates that the number of protein families is continually expanding over time and that singleton-sequences appear to be an intrinsic part of the genomes. A significant proportion of the proteomes can be assigned to fewer than 6000 well-characterized domain families with the remaining domain-like regions belonging to a much larger number of small uncharacterized families that are largely species specific. Our comprehensive domain annotation of 203 genomes enables us to provide more accurate estimates of the number of multi-domain proteins found in the three kingdoms of life than previous calculations. We find that 67% of eukaryotic sequences are multi-domain compared with 56% of sequences in prokaryotes. By measuring the domain coverage of genome sequences, we show that the structural genomics initiatives should aim to provide structures for less than a thousand structurally uncharacterized Pfam families to achieve reasonable structural annotation of the genomes. However, in large families, additional structures should be determined as these would reveal more about the evolution of the family and enable a greater understanding of how function evolves.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
39
|
Brajusković G. Genomics. Vojnosanit Pregl 2006; 63:604-10. [PMID: 16796028 DOI: 10.2298/vsp0606604b] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
Affiliation(s)
- Goran Brajusković
- Vojnomedicinska akademija, Centar za patologiju i sudsku medicinu, Institut za patologiju, Beograd, Srbija i Crna Gora.
|
40
|
Abstract
Originally the term 'protein module' was coined to distinguish mobile domains that frequently occur as building blocks of diverse multidomain proteins from 'static' domains that usually exist only as stand-alone units of single-domain proteins. Despite the widespread use of the term 'mobile domain', the distinction between static and mobile domains is rather vague as it is not easy to quantify the mobility of domains. In the present work we show that the most appropriate measure of the mobility of domains is the number of types of local environments in which a given domain is present. Ranking of domains with respect to this parameter in different evolutionary lineages highlighted marked differences in the propensity of domains to form multidomain proteins. Our analyses have also shown that there is a correlation between domain size and domain mobility: smaller domains are more likely to be used in the construction of multidomain proteins, whereas larger domains are more likely to be static, stand-alone domains. It is also shown that shuffling of a limited set of modules was facilitated by intronic recombination in the metazoan lineage and this has contributed significantly to the emergence of novel complex multidomain proteins, novel functions and increased organismic complexity of metazoa.
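The mobility measure described in this abstract, the number of distinct local environments in which a domain occurs, can be sketched as follows. This is a minimal illustration under stated assumptions: a "local environment" is taken here to be the pair of immediate left and right neighbours within a protein's domain architecture, with hypothetical N-term and C-term markers at the ends. The architectures are toy examples, and the function name is not from the original work.

```python
from collections import defaultdict


def domain_mobility(architectures):
    """Count distinct (left neighbour, right neighbour) contexts per domain.

    `architectures` is an iterable of domain sequences, one per protein,
    ordered from N-terminus to C-terminus.
    """
    environments = defaultdict(set)
    for arch in architectures:
        # Assumed terminal markers so that end positions also have a context.
        padded = ["N-term", *arch, "C-term"]
        for i in range(1, len(padded) - 1):
            environments[padded[i]].add((padded[i - 1], padded[i + 1]))
    return {domain: len(contexts) for domain, contexts in environments.items()}


# Toy architectures (illustrative only, not taken from any proteome).
toy_proteins = [
    ("SH3", "SH2", "Kinase"),
    ("SH2", "SH3"),
    ("SH3", "SH3", "PH"),
    ("Globin",),
]
mobility = domain_mobility(toy_proteins)
ranking = sorted(mobility.items(), key=lambda item: item[1], reverse=True)
```

In this toy set SH3 appears in four different contexts while the stand-alone Globin appears in one, which mirrors the mobile versus static distinction drawn in the abstract.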
Affiliation(s)
- Hedvig Tordai
- Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, Budapest
|
41
|
Kozbial PZ, Mushegian AR. Natural history of S-adenosylmethionine-binding proteins. BMC Struct Biol 2005; 5:19. [PMID: 16225687 PMCID: PMC1282579 DOI: 10.1186/1472-6807-5-19] [Citation(s) in RCA: 209] [Impact Index Per Article: 11.0] [Received: 07/21/2005] [Accepted: 10/14/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND S-adenosylmethionine is a source of diverse chemical groups used in biosynthesis and modification of virtually every class of biomolecules. The most notable reaction requiring S-adenosylmethionine, transfer of methyl group, is performed by a large class of enzymes, S-adenosylmethionine-dependent methyltransferases, which have been the focus of considerable structure-function studies. Evolutionary trajectories of these enzymes, and especially of other classes of S-adenosylmethionine-binding proteins, nevertheless, remain poorly understood. We addressed this issue by computational comparison of sequences and structures of various S-adenosylmethionine-binding proteins. RESULTS Two widespread folds, Rossmann fold and TIM barrel, have been repeatedly used in evolution for diverse types of S-adenosylmethionine conversion. There were also cases of recruitment of other relatively common folds for S-adenosylmethionine binding. Several classes of proteins have unique unrelated folds, specialized for just one type of chemistry and unified by the theme of internal domain duplications. In several cases, functional divergence is evident, when evolutionarily related enzymes have changed the mode of binding and the type of chemical transformation of S-adenosylmethionine. There are also instances of functional convergence, when biochemically similar processes are performed by drastically different classes of S-adenosylmethionine-binding proteins. Comparison of remote sequence similarities and analysis of phyletic patterns suggests that the last universal common ancestor of cellular life had between 10 and 20 S-adenosylmethionine-binding proteins from at least 5 fold classes, providing for S-adenosylmethionine formation, polyamine biosynthesis, and methylation of several substrates, including nucleic acids and peptide chain release factor. 
CONCLUSION We have observed several novel relationships between families that were not known to be related before, and defined 15 large superfamilies of SAM-binding proteins, at least 5 of which may have been represented in the last common ancestor.
Affiliation(s)
- Piotr Z Kozbial
- Stowers Institute for Medical Research, 1000 E. 50th St., Kansas City, MO 64110, USA
- Arcady R Mushegian
- Stowers Institute for Medical Research, 1000 E. 50th St., Kansas City, MO 64110, USA
- Department of Microbiology, Molecular Genetics, and Immunology, University of Kansas Medical Center, Kansas City, Kansas 66160, USA
|
42
|
Arai M, Fukushi T, Satake M, Shimizu T. A proteome-wide analysis of domain architectures of prokaryotic single-spanning transmembrane proteins. Comput Biol Chem 2005; 29:379-87. [PMID: 16213795 DOI: 10.1016/j.compbiolchem.2005.08.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2005] [Accepted: 08/08/2005] [Indexed: 11/24/2022]
Abstract
We performed a proteome-wide survey of the domain architectures in single-spanning transmembrane (TM) proteins (single-spannings) from 87 sequenced prokaryotic (bacterial and archaeal) genomes by assigning Pfam domains to their N-tail and C-tail loops. Out of 14,625 single-spannings, 3,516 sequences have at least one domain assigned, no domains were assigned to 7,850, and the remaining 3,259 have less reliable assignments. Among the domain-assigned sequences, 3,116 have at most two domains and the other 400 have more than two. The assigned domains are distributed over 651 Pfam families, which account for 11.4% of all Pfam-A families. Most of the 651 families originate from soluble proteins; only 21 families are unique to TM proteins. The occurrence frequency of the individual domain families follows a power law: 264 families occur only once, 106 just twice, and only 39 families appear more than 30 times. The great majority of the sequences having one or two domains are of the type II topology, with the domains located on the C-tail loop. In contrast, the N-tail loop of the same topology seldom carries domains. Importantly, the assigned domains are always found on tail loops longer than 60 residues, even for small domains of fewer than 30 residues. As many as 5,800 sequences still have no assigned domains despite having at least one long tail, and these tails are expected to conceal no fewer than 1,000 as-yet-unknown domain families. We also investigated the domain arrangement preference and the domain family combination patterns in 'singlets' (single-spannings with one assigned domain) and 'doublets' (with two domains).
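The counts quoted in this abstract can be cross-checked with simple arithmetic. The sketch below only restates the published figures; the variable names are illustrative, and the Pfam-A size it derives is an implied estimate from the rounded 11.4% figure, not a number given in the abstract.

```python
# Counts as quoted in the abstract.
total_single_spanning = 14_625
with_domain = 3_516
without_domain = 7_850
less_reliable = 3_259
at_most_two_domains = 3_116
more_than_two_domains = 400
families_hit = 651
fraction_of_pfam_a = 0.114  # "11.4% of the total Pfam-A families"

# The three assignment classes should partition the full set.
assert with_domain + without_domain + less_reliable == total_single_spanning
# The domain-count classes should partition the domain-assigned set.
assert at_most_two_domains + more_than_two_domains == with_domain

# Share of single-spanning proteins with at least one assigned domain.
share_with_domain = with_domain / total_single_spanning

# Pfam-A size implied by the rounded percentage (an estimate only).
implied_pfam_a_size = round(families_hit / fraction_of_pfam_a)

print(f"domain-assigned share: {share_with_domain:.1%}")
print(f"implied Pfam-A families: about {implied_pfam_a_size}")
```

Both partitions are internally consistent, and roughly a quarter of the single-spanning proteins carry at least one confidently assigned domain.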
Affiliation(s)
- Masafumi Arai
- Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, Japan
|
43
|
Doolittle RF. Evolutionary aspects of whole-genome biology. Curr Opin Struct Biol 2005; 15:248-53. [PMID: 15963888 DOI: 10.1016/j.sbi.2005.04.001] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Received: 02/08/2005] [Revised: 02/08/2005] [Accepted: 04/12/2005] [Indexed: 11/28/2022]
Abstract
A decade of access to whole-genome sequences has been increasingly revealing about the informational network relating all living organisms. Although at one point there was concern that extensive horizontal gene transfer might hopelessly muddle phylogenies, it has not proved a severe hindrance. The melding of sequence and structural information is being used to great advantage, and the prospect exists that some of the earliest aspects of life on Earth can be reconstructed, including the invention of biosynthetic and metabolic pathways. Still, some fundamental phylogenetic problems remain, including determining the root--if there is one--of the historical relationship between Archaea, Bacteria and Eukarya.
Affiliation(s)
- Russell F Doolittle
- Department of Chemistry & Biochemistry, University of California San Diego, La Jolla, CA 92093-0314, USA.
|
44
|
Coinçon M, Heitz A, Chiche L, Derreumaux P. The βαβαβ elementary supersecondary structure of the Rossmann fold from porcine lactate dehydrogenase exhibits characteristics of a molten globule. Proteins 2005; 60:740-5. [PMID: 16001419 DOI: 10.1002/prot.20507] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Protein classifications show that the Rossmann fold, which consists of two βαβαβ motifs (BABAB) related by a rough twofold axis, is the most populated αβ fold, and that the βαβ submotif (BAB) is a widespread elementary structural arrangement. Herein, we report MD simulations, circular dichroism and NMR analyses on BAB and BABAB from porcine lactate dehydrogenase to evaluate their intrinsic stability. Our results demonstrate that BAB is not stable in solution and is not a folding nucleus. We also find that BABAB, despite its appearance of a functional and structural unit, is not an independent and thermodynamically stable folding unit. Rather, we show that BABAB retains most native secondary structure but very little tertiary structure, thus displaying characteristics of a molten globule.
Affiliation(s)
- Mathieu Coinçon
- Information Génomique et Structurale, CNRS UPR 2589, Marseille Cedex, France
|
45
|
Todd AE, Marsden RL, Thornton JM, Orengo CA. Progress of Structural Genomics Initiatives: An Analysis of Solved Target Structures. J Mol Biol 2005; 348:1235-60. [PMID: 15854658 DOI: 10.1016/j.jmb.2005.03.037] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.4] [Received: 12/23/2004] [Revised: 02/28/2005] [Accepted: 03/15/2005] [Indexed: 11/27/2022]
Abstract
The explosion in gene sequence data and technological breakthroughs in protein structure determination inspired the launch of structural genomics (SG) initiatives. An often stated goal of structural genomics is the high-throughput structural characterisation of all protein sequence families, with the long-term hope of significantly impacting on the life sciences, biotechnology and drug discovery. Here, we present a comprehensive analysis of solved SG targets to assess progress of these initiatives. Eleven consortia have contributed 316 non-redundant entries and 323 protein chains to the Protein Data Bank (PDB), and 459 and 393 domains to the CATH and SCOP structure classifications, respectively. The quality and size of these proteins are comparable to those solved in traditional structural biology and, despite huge scope for duplicated efforts, only 14% of targets have a close homologue (≥30% sequence identity) solved by another consortium. Analysis of CATH and SCOP revealed the significant contribution that structural genomics is making to the coverage of superfamilies and folds. A total of 67% of SG domains in CATH are unique, lacking an already characterised close homologue in the PDB, whereas only 21% of non-SG domains are unique. For 29% of domains, structure determination revealed a remote evolutionary relationship not apparent from sequence, and 19% and 11% contributed new superfamilies and folds. The secondary structure class, fold and superfamily distributions of this dataset reflect those of the genomes. The domains fall into 172 different folds and 259 superfamilies in CATH but the distribution is highly skewed. The most populous of these are those that recur most frequently in the genomes. Whilst 11% of superfamilies are bacteria-specific, most are common to all three superkingdoms of life and together the 316 PDB entries have provided new and reliable homology models for 9287 non-redundant gene sequences in 206 completely sequenced genomes. 
From the perspective of this analysis, it appears that structural genomics is on track to be a success, and it is hoped that this work will inform future directions of the field.
Affiliation(s)
- Annabel E Todd
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London, WC1E 6BT, UK.
|
46
|
Caetano-Anollés G, Caetano-Anollés D. Universal Sharing Patterns in Proteomes and Evolution of Protein Fold Architecture and Life. J Mol Evol 2005; 60:484-98. [PMID: 15883883 DOI: 10.1007/s00239-004-0221-6] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Received: 06/16/2004] [Accepted: 10/11/2004] [Indexed: 11/30/2022]
Abstract
Protein evolution is imprinted in both the sequence and the structure of evolutionary building blocks known as protein domains. These domains share a common ancestry and can be unified into a comparatively small set of folding architectures, the protein folds. We have traced the distribution of protein folds between and within proteomes belonging to Eukarya, Archaea, and Bacteria along the branches of a universal phylogeny of protein architecture. This tree was reconstructed from global fold-usage statistics derived from a structural census of proteomes. We found that folds shared by the three organismal domains were placed almost exclusively at the base of the rooted tree and that there were marked heterogeneities in fold distribution and clear evolutionary patterns related to protein architecture and organismal diversification. These include a relative timing for the emergence of prokaryotes, congruent episodes of architectural loss and diversification in Archaea and Bacteria, and a late and quite massive rise of architectural novelties in Eukarya perhaps linked to multicellularity.
Affiliation(s)
- Gustavo Caetano-Anollés
- Department of Crop Sciences, University of Illinois, 332 NSRC, 1101 West Peabody Drive, Urbana, IL, 61801, USA.
|
47
|
Ekman D, Björklund AK, Frey-Skött J, Elofsson A. Multi-domain Proteins in the Three Kingdoms of Life: Orphan Domains and Other Unassigned Regions. J Mol Biol 2005; 348:231-43. [PMID: 15808866 DOI: 10.1016/j.jmb.2005.02.007] [Citation(s) in RCA: 165] [Impact Index Per Article: 8.7] [Received: 09/27/2004] [Revised: 01/31/2005] [Accepted: 02/02/2005] [Indexed: 11/17/2022]
Abstract
Comparative studies of the proteomes from different organisms have provided valuable information about protein domain distribution in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of the proteomes could be matched to a domain. Here, we have extended these studies by including less well-defined domain definitions, Pfam-B and clustered domains, MAS, in addition to Pfam-A and SCOP domains. It was found that a significant fraction of these domain families are homologous to Pfam-A or SCOP domains. Further, we show that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions we have re-estimated the number of multi-domain proteins in different organisms and found that several methods all predict that eukaryotes have approximately 65% multi-domain proteins, while the prokaryotes consist of approximately 40% multi-domain proteins. However, these numbers are strongly dependent on the exact choice of cut-off for domains in unassigned regions. In conclusion, all eukaryotes have similar fractions of multi-domain proteins and disorder, whereas a high fraction of repeating domain is distinguished only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts while the other two features are important for intracellular functions.
Affiliation(s)
- Diana Ekman
- Stockholm Bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
|
48
|
Bollen YJM, van Mierlo CPM. Protein topology affects the appearance of intermediates during the folding of proteins with a flavodoxin-like fold. Biophys Chem 2004; 114:181-9. [PMID: 15829351 DOI: 10.1016/j.bpc.2004.12.005] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Received: 06/03/2004] [Revised: 10/25/2004] [Accepted: 12/08/2004] [Indexed: 11/23/2022]
Abstract
The topology of a native protein influences the rate with which it is formed, but does topology affect the appearance of folding intermediates and their specific role in kinetic folding as well? This question is addressed by comparing the folding data recently obtained on apoflavodoxin from Azotobacter vinelandii with those available on all three other alpha-beta parallel proteins the kinetic folding mechanism of which has been studied, i.e. Anabaena apoflavodoxin, Fusarium solani pisi cutinase and CheY. Two kinetic folding intermediates, one on-pathway and the other off-pathway, seem to be present during the folding of proteins with an alpha-beta parallel, also called flavodoxin-like, topology. The on-pathway intermediate lies on a direct route from the unfolded to the native state of the protein involved. The off-pathway intermediate needs to unfold to allow the production of native protein. Available simulation data of the folding of CheY show the involvement of two intermediates with characteristics that resemble those of the two intermediates experimentally observed. Apparently, protein topology governs the appearance and kinetic roles of protein folding intermediates during the folding of proteins that have a flavodoxin-like fold.
Affiliation(s)
- Yves J M Bollen
- Department of Agrotechnology and Food Sciences, Laboratory of Biochemistry, Wageningen University, The Netherlands
|
49
|
Bollen YJM, Sánchez IE, van Mierlo CPM. Formation of on- and off-pathway intermediates in the folding kinetics of Azotobacter vinelandii apoflavodoxin. Biochemistry 2004; 43:10475-89. [PMID: 15301546 DOI: 10.1021/bi049545m] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.9] [Indexed: 11/30/2022]
Abstract
The folding kinetics of the 179-residue Azotobacter vinelandii apoflavodoxin, which has an alpha-beta parallel topology, have been followed by stopped-flow experiments monitored by fluorescence intensity and anisotropy. Single-jump and interrupted refolding experiments show that the refolding kinetics involve four processes yielding native molecules. Interrupted unfolding experiments show that the two slowest folding processes are due to Xaa-Pro peptide bond isomerization in unfolded apoflavodoxin. The denaturant dependence of the folding kinetics is complex. Under strongly unfolding conditions (>2.5 M GuHCl), single exponential kinetics are observed. The slope of the chevron plot changes between 3 and 5 M denaturant, and no additional unfolding process is observed. This reveals the presence of two consecutive transition states on a linear pathway that surround a high-energy on-pathway intermediate. Under refolding conditions, two processes are observed for the folding of apoflavodoxin molecules with native Xaa-Pro peptide bond conformations, which implies the population of an intermediate. The slowest of these two processes becomes faster with increasing denaturant concentration, meaning that an unfolding step is rate-limiting for folding of the majority of apoflavodoxin molecules. It is shown that the intermediate that populates during refolding is off-pathway. The experimental data obtained on apoflavodoxin folding are consistent with the linear folding mechanism I(off) <==> U <==> I(on) <==> N, the off-pathway intermediate being the molten globule one that also populates during equilibrium denaturation of apoflavodoxin. The presence of such on-pathway and off-pathway intermediates in the folding kinetics of alpha-beta parallel proteins is apparently governed by protein topology.
Affiliation(s)
- Yves J M Bollen
- Department of Agrotechnology and Food Sciences, Laboratory of Biochemistry, Wageningen University, Dreijenlaan 3, NL-6703 HA Wageningen, The Netherlands
|
50
|
Aroul-Selvam R, Hubbard T, Sasidharan R. Domain insertions in protein structures. J Mol Biol 2004; 338:633-41. [PMID: 15099733 PMCID: PMC2665287 DOI: 10.1016/j.jmb.2004.03.039] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.7] [Received: 12/15/2003] [Revised: 03/07/2004] [Accepted: 03/10/2004] [Indexed: 10/26/2022]
Abstract
Domains are the structural, functional or evolutionary units of proteins. Proteins can comprise a single domain or a combination of domains. In multi-domain proteins, the domains almost always occur end-to-end, i.e., one domain follows the C-terminal end of another domain. However, there are exceptions to this common pattern, where multi-domain proteins are formed by insertion of one domain (insert) into another domain (parent). Here, we provide a quantitative description of known insertions in the Protein Data Bank (PDB). We found that 9% of domain combinations observed in non-redundant PDB are insertions. Although 90% of all insertions involve only one insert, proteins can clearly have multiple (nested, two-domain and three-domain) inserts. We also observed correlations between the structure and function of a domain and its tendency to be found as a parent or an insert. There is a bias in insert position towards the C terminus of parents. We observed that the atomic distance between the N and C terminus of an insert is significantly smaller when compared to the N-to-C distance in a parent context or a single domain context. Insertions are found always to occur in loop regions of parent domains. Our observations regarding the relationship between domain insertions and the structure, function and evolution of proteins have implications for protein engineering.
Affiliation(s)
- R. Aroul-Selvam
- The Wellcome Trust Sanger Institute, Genome Campus Hinxton, Cambridge CB10 1SA UK
- Tim Hubbard
- The Wellcome Trust Sanger Institute, Genome Campus Hinxton, Cambridge CB10 1SA UK
- Rajkumar Sasidharan
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
|