1
Kimmel J, Schmitt M, Sinner A, Jansen PWTC, Mainye S, Ramón-Zamorano G, Toenhake CG, Wichers-Misterek JS, Cronshagen J, Sabitzki R, Mesén-Ramírez P, Behrens HM, Bártfai R, Spielmann T. Gene-by-gene screen of the unknown proteins encoded on Plasmodium falciparum chromosome 3. Cell Syst 2023; 14:9-23.e7. [PMID: 36657393 DOI: 10.1016/j.cels.2022.12.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Revised: 10/07/2022] [Accepted: 12/08/2022] [Indexed: 01/19/2023]
Abstract
Taxon-specific proteins are key determinants defining the biology of all organisms and represent prime drug targets in pathogens. However, lacking comparability with proteins in other lineages makes them particularly difficult to study. In malaria parasites, this is exacerbated by technical limitations. Here, we analyzed the cellular location, essentiality, function, and, in selected cases, interactome of all unknown non-secretory proteins encoded on an entire P. falciparum chromosome. The nucleus was the most common localization, indicating that it is a hotspot of parasite-specific biology. More in-depth functional studies with four proteins revealed essential roles in DNA replication and mitosis. The mitosis proteins defined a possible orphan complex and a highly diverged complex needed for spindle-kinetochore connection. Structure-function comparisons indicated that the taxon-specific proteins evolved by different mechanisms. This work demonstrates the feasibility of gene-by-gene screens to elucidate the biology of malaria parasites and reveal critical parasite-specific processes of interest as drug targets.
Affiliation(s)
- Jessica Kimmel
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Marius Schmitt
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Alexej Sinner
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Sheila Mainye
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Gala Ramón-Zamorano
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Christa Geeke Toenhake
  - Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, the Netherlands
- Jakob Cronshagen
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Ricarda Sabitzki
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Paolo Mesén-Ramírez
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Hannah Michaela Behrens
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
- Richárd Bártfai
  - Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Radboud University, 6525 GA Nijmegen, the Netherlands
- Tobias Spielmann
  - Bernhard Nocht Institute for Tropical Medicine, Bernhard Nocht Str. 74, 20359 Hamburg, Germany
2
Bhat AS, Kinch LN, Grishin NV. β-Strand-mediated interactions of protein domains. Proteins 2020; 88:1513-1527. [PMID: 32543729 PMCID: PMC8018532 DOI: 10.1002/prot.25970] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/22/2019] [Revised: 03/10/2020] [Accepted: 06/06/2020] [Indexed: 01/14/2023]
Abstract
Protein domains exist by themselves or in combination with other domains to form complex multidomain proteins. Defining domain boundaries in proteins is essential for understanding their evolution and function but is not trivial. More specifically, partitioning domains that interact by forming a single β-sheet is known to be particularly troublesome for automatic structure-based domain decomposition pipelines. Here, we study edge-to-edge β-strand interactions between domains in a protein chain, to help define the boundaries for some more difficult cases where a single β-sheet spanning over two domains gives an appearance of one. We give a number of examples where β-strands belonging to a single β-sheet do not belong to a single domain and highlight the difficulties of automatic domain parsers on these examples. This work can be used as a baseline for defining domain boundaries in homologous proteins or proteins with similar domain interactions in the future.
Affiliation(s)
- Archana S. Bhat
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
- Lisa N. Kinch
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
- Nick V. Grishin
  - Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
  - Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050
3
An in silico structural and physicochemical characterization of TonB-dependent copper receptor in A. baumannii. Microb Pathog 2018. [DOI: 10.1016/j.micpath.2018.03.009] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
4
Dos Santos Vasconcelos CR, de Lima Campos T, Rezende AM. Building protein-protein interaction networks for Leishmania species through protein structural information. BMC Bioinformatics 2018; 19:85. [PMID: 29510668 PMCID: PMC5840830 DOI: 10.1186/s12859-018-2105-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/10/2017] [Accepted: 03/01/2018] [Indexed: 12/21/2022] Open
Abstract
Background Systematic analysis of a parasite interactome is a key approach to understanding different biological processes. It makes it possible to elucidate disease mechanisms, to predict protein functions and to select promising targets for drug development. Currently, several approaches for protein interaction prediction for non-model species incorporate only small fractions of the entire proteomes and their interactions. Based on this perspective, this study presents an integration of computational methodologies, protein network predictions and comparative analysis of the protozoan species Leishmania braziliensis and Leishmania infantum. These parasites cause Leishmaniasis, a neglected disease distributed worldwide, with limited treatment options using currently available drugs. Results The predicted interactions were obtained from a meta-approach, applying rigid body docking tests and template-based docking on protein structures predicted by different comparative modeling techniques. In addition, we trained a machine-learning algorithm (Gradient Boosting) using docking information performed on a curated set of positive and negative protein interaction data. Our final model obtained an AUC = 0.88, with recall = 0.69, specificity = 0.88 and precision = 0.83. Using this approach, it was possible to confidently predict 681 protein structures and 6198 protein interactions for L. braziliensis, and 708 protein structures and 7391 protein interactions for L. infantum. The predicted networks were integrated with protein interaction data already available, analyzed using several topological features and used to classify proteins as essential for network stability. Conclusions The present study demonstrates the importance of integrating different interaction prediction methodologies to increase the coverage of protein interactions for the studied protocols; it also makes available protein structures and interactions not previously reported.
Electronic supplementary material The online version of this article (10.1186/s12859-018-2105-6) contains supplementary material, which is available to authorized users.
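For readers less familiar with the evaluation metrics quoted in the abstract above (recall, specificity, precision), the following is a minimal sketch of how they are derived from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not the study's data, and the class and function names are assumptions rather than anything taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary interaction classifier (hypothetical values)."""
    tp: int  # interacting pairs predicted as interacting
    fp: int  # non-interacting pairs predicted as interacting
    tn: int  # non-interacting pairs predicted as non-interacting
    fn: int  # interacting pairs predicted as non-interacting


def recall(c: ConfusionCounts) -> float:
    """Fraction of true interactions that the classifier recovers."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of true non-interactions that the classifier rejects."""
    return c.tn / (c.tn + c.fp)


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted interactions that are real."""
    return c.tp / (c.tp + c.fp)


# Illustrative counts only; NOT taken from the cited study.
counts = ConfusionCounts(tp=80, fp=20, tn=180, fn=20)

if __name__ == "__main__":
    print(f"recall={recall(counts):.2f}")
    print(f"specificity={specificity(counts):.2f}")
    print(f"precision={precision(counts):.2f}")
```

Note that precision, unlike recall and specificity, depends on the ratio of positive to negative examples in the evaluation set, so it should be read alongside the composition of the curated benchmark.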
Affiliation(s)
- Crhisllane Rafaele Dos Santos Vasconcelos
  - Microbiology Department of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Genetics Department of Universidade Federal de Pernambuco, Recife, PE, Brazil
- Túlio de Lima Campos
  - Microbiology Department of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Bioinformatics Plataform of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
- Antonio Mauro Rezende
  - Microbiology Department of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Bioinformatics Plataform of Instituto Aggeu Magalhães - FIOCRUZ, Recife, PE, Brazil
  - Genetics Department of Universidade Federal de Pernambuco, Recife, PE, Brazil
5
Abstract
The significant expansion in protein sequence and structure data that we are now witnessing brings with it a pressing need to bring order to the protein world. Such order enables us to gain insights into the evolution of proteins, their function and the extent to which the functional repertoire can vary across the three kingdoms of life. This has led to the creation of a wide range of protein family classifications that aim to group proteins based upon their evolutionary relationships. In this chapter we discuss the approaches and methods that are frequently used in the classification of proteins, with a specific emphasis on the classification of protein domains. The construction of both domain sequence and domain structure databases is considered and we show how the use of domain family annotations to assign structural and functional information is enhancing our understanding of genomes.
6
Abstract
Symmetry is a common feature among natural systems, including protein structures. A strong propensity toward symmetric architectures has long been recognized for water-soluble proteins, and this propensity has been rationalized from an evolutionary standpoint. Proteins residing in cellular membranes, however, have traditionally been less amenable to structural studies, and thus the prevalence and significance of symmetry in this important class of molecules is not as well understood. In the past two decades, researchers have made great strides in this area, and these advances have provided exciting insights into the range of architectures adopted by membrane proteins. These structural studies have revealed a similarly strong bias toward symmetric arrangements, which were often unexpected and which occurred despite the restrictions imposed by the membrane environment on the possible symmetry groups. Moreover, membrane proteins disproportionately contain internal structural repeats resulting from duplication and fusion of smaller segments. This article discusses the types and origins of symmetry in membrane proteins and the implications of symmetry for protein function.
Affiliation(s)
- Lucy R Forrest
  - Computational Structural Biology Group, Porter Neuroscience Center, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, Maryland 20852
7
Fox NK, Brenner SE, Chandonia JM. The value of protein structure classification information-Surveying the scientific literature. Proteins 2015; 83:2025-38. [PMID: 26313554 PMCID: PMC4609302 DOI: 10.1002/prot.24915] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 05/08/2015] [Revised: 08/06/2015] [Accepted: 08/18/2015] [Indexed: 11/08/2022]
Abstract
The Structural Classification of Proteins (SCOP) and Class, Architecture, Topology, Homology (CATH) databases have been valuable resources for protein structure classification for over 20 years. Development of SCOP (version 1) concluded in June 2009 with SCOP 1.75. The SCOPe (SCOP-extended) database offers continued development of the classic SCOP hierarchy, adding over 33,000 structures. We have attempted to assess the impact of these two decade old resources and guide future development. To this end, we surveyed recent articles to learn how structure classification data are used. Of 571 articles published in 2012-2013 that cite SCOP, 439 actually use data from the resource. We found that the type of use was fairly evenly distributed among four top categories: A) study protein structure or evolution (27% of articles), B) train and/or benchmark algorithms (28% of articles), C) augment non-SCOP datasets with SCOP classification (21% of articles), and D) examine the classification of one protein/a small set of proteins (22% of articles). Most articles described computational research, although 11% described purely experimental research, and a further 9% included both. We examined how CATH and SCOP were used in 158 articles that cited both databases: while some studies used only one dataset, the majority used data from both resources. Protein structure classification remains highly relevant for a diverse range of problems and settings.
Affiliation(s)
- Naomi K Fox
  - Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
- Steven E Brenner
  - Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
  - Department of Plant and Microbial Biology, University of California, Berkeley, California, 94720
- John-Marc Chandonia
  - Lawrence Berkeley National Laboratory, Physical Biosciences Division, Berkeley, California, 94720
8
Li L, Wurtele ES. The QQS orphan gene of Arabidopsis modulates carbon and nitrogen allocation in soybean. Plant Biotechnology Journal 2015; 13:177-87. [PMID: 25146936 PMCID: PMC4345402 DOI: 10.1111/pbi.12238] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 03/20/2014] [Revised: 06/30/2014] [Accepted: 07/03/2014] [Indexed: 05/19/2023]
Abstract
In the genome of each species, as many as 8% of the genes are uniquely present in that species. Little is known about the functional significance of these so-called species-specific or orphan genes. The Arabidopsis thaliana gene Qua-Quine Starch (QQS) is species-specific. Here, we show that altering QQS expression in Arabidopsis affects carbon partitioning to both starch and protein. We hypothesized that QQS may be conserved in a feature other than primary sequence, and as such could function to impact composition in another species. To test the potential of QQS in affecting composition in an ectopic species, we introduced QQS into soybean. Soybean T1 lines expressing QQS have up to 80% decreased leaf starch and up to 60% increased leaf protein; T4 generation seeds from field-grown plants contain up to 13% less oil, while protein is increased by up to 18%. These data broaden the concept of QQS as a modulator of carbon and nitrogen allocation, and demonstrate that this species-specific gene can affect the seed composition of an agronomic species thought to have diverged from Arabidopsis 100 million years ago.
Affiliation(s)
- Ling Li
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
- Eve Syrkin Wurtele
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA, USA
9
Rakshambikai R, Manoharan M, Gnanavel M, Srinivasan N. Typical and atypical domain combinations in human protein kinases: functions, disease causing mutations and conservation in other primates. RSC Adv 2015. [DOI: 10.1039/c4ra11685b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022] Open
Abstract
A twist in the evolution of human kinases resulting in kinases with hybrid and rogue properties.
Affiliation(s)
- Malini Manoharan
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
- Mutharasu Gnanavel
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India
10
Light S, Basile W, Elofsson A. Orphans and new gene origination, a structural and evolutionary perspective. Curr Opin Struct Biol 2014; 26:73-83. [DOI: 10.1016/j.sbi.2014.05.006] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 01/20/2014] [Revised: 05/07/2014] [Accepted: 05/16/2014] [Indexed: 12/28/2022]
11
Bhaskara RM, Mehrotra P, Rakshambikai R, Gnanavel M, Martin J, Srinivasan N. The relationship between classification of multi-domain proteins using an alignment-free approach and their functions: a case study with immunoglobulins. Molecular BioSystems 2014; 10:1082-93. [DOI: 10.1039/c3mb70443b] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/02/2023]
12
Claridge JK, Schnell JR. Bacterial production and solution NMR studies of a viral membrane ion channel. Methods Mol Biol 2012; 831:165-79. [PMID: 22167674 DOI: 10.1007/978-1-61779-480-3_10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/12/2022]
Abstract
Advances in solution nuclear magnetic resonance (NMR) methodology that enable studies of very large proteins have also paved the way for studies of membrane proteins that behave like large proteins due to the added weight of surfactants. Solution NMR has been used to determine the high-resolution structures of several small, membrane proteins dissolved in detergent micelles and small bicelles. However, the usual difficulties with membrane proteins in producing, purifying, and stabilizing the proteins away from native membranes remain, requiring intensive screening efforts. Low levels of heterologous expression can be the most detrimental aspect to studying membrane proteins. This is exacerbated for NMR studies because of the costs of isotopically enriched media. Thus, solution NMR studies have tended to focus on relatively small, membrane proteins that can be expressed into inclusion bodies and refolded. Here, we describe the methods used to produce, purify, and refold the proton channel M2 into detergent micelles, and the procedures used to determine chemical shift assignments and the atomic level structure of the closed form of the homotetrameric channel.
13
Stella R, Cifani P, Peggion C, Hansson K, Lazzari C, Bendz M, Levander F, Sorgato MC, Bertoli A, James P. Relative Quantification of Membrane Proteins in Wild-Type and Prion Protein (PrP)-Knockout Cerebellar Granule Neurons. J Proteome Res 2011; 11:523-36. [DOI: 10.1021/pr200759m] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 12/15/2022]
Affiliation(s)
- Roberto Stella
  - Department of Biological Chemistry, University of Padova, Italy
- Paolo Cifani
  - Department of Immunotechnology and CREATE Health, Lund University, Sweden
- Karin Hansson
  - Department of Immunotechnology and CREATE Health, Lund University, Sweden
- Maria Bendz
  - Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden
- Fredrik Levander
  - Department of Immunotechnology and CREATE Health, Lund University, Sweden
- Peter James
  - Department of Immunotechnology and CREATE Health, Lund University, Sweden
14
Abstract
Gene evolution has long been thought to be primarily driven by duplication and rearrangement mechanisms. However, every evolutionary lineage harbours orphan genes that lack homologues in other lineages and whose evolutionary origin is only poorly understood. Orphan genes might arise from duplication and rearrangement processes followed by fast divergence; however, de novo evolution out of non-coding genomic regions is emerging as an important additional mechanism. This process appears to provide raw material continuously for the evolution of new gene functions, which can become relevant for lineage-specific adaptations.
15
Kim Y, Babnigg G, Jedrzejczak R, Eschenfeldt WH, Li H, Maltseva N, Hatzos-Skintges C, Gu M, Makowska-Grzyska M, Wu R, An H, Chhor G, Joachimiak A. High-throughput protein purification and quality assessment for crystallization. Methods 2011; 55:12-28. [PMID: 21907284 DOI: 10.1016/j.ymeth.2011.07.010] [Citation(s) in RCA: 118] [Impact Index Per Article: 8.4] [Received: 02/28/2011] [Revised: 07/14/2011] [Accepted: 07/14/2011] [Indexed: 12/31/2022] Open
Abstract
The ultimate goal of structural biology is to understand the structural basis of proteins in cellular processes. In structural biology, the most critical issue is the availability of high-quality samples. "Structural biology-grade" proteins must be generated in the quantity and quality suitable for structure determination using X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. The purification procedures must reproducibly yield homogeneous proteins or their derivatives containing marker atom(s) in milligram quantities. The choice of protein purification and handling procedures plays a critical role in obtaining high-quality protein samples. With structural genomics emphasizing a genome-based approach in understanding protein structure and function, a number of unique structures covering most of the protein folding space have been determined and new technologies with high efficiency have been developed. At the Midwest Center for Structural Genomics (MCSG), we have developed semi-automated protocols for high-throughput parallel protein expression and purification. A protein, expressed as a fusion with a cleavable affinity tag, is purified in two consecutive immobilized metal affinity chromatography (IMAC) steps: (i) the first step is an IMAC coupled with buffer exchange or size exclusion chromatography (IMAC-I), followed by the cleavage of the affinity tag using the highly specific Tobacco Etch Virus (TEV) protease; (ii) the second step is IMAC and buffer exchange (IMAC-II) to remove the cleaved tag and tagged TEV protease. These protocols have been implemented on multidimensional chromatography workstations and, as we have shown, many proteins can be successfully produced at large scale.
All methods and protocols used for purification, some developed by MCSG, others adopted and integrated into the MCSG purification pipeline and more recently the Center for Structural Genomics of Infectious Diseases (CSGID) purification pipeline, are discussed in this chapter.
Affiliation(s)
- Youngchang Kim
  - Midwest Center for Structural Genomics, Biosciences Division, Argonne National Laboratory, Argonne, IL 60439, USA
16
Lee D, de Beer TAP, Laskowski RA, Thornton JM, Orengo CA. 1,000 structures and more from the MCSG. BMC Structural Biology 2011; 11:2. [PMID: 21219649 PMCID: PMC3024214 DOI: 10.1186/1472-6807-11-2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 09/16/2010] [Accepted: 01/10/2011] [Indexed: 11/10/2022]
Abstract
Background The Midwest Center for Structural Genomics (MCSG) is one of the large-scale centres of the Protein Structure Initiative (PSI). During the first two phases of the PSI the MCSG has solved over a thousand protein structures. A criticism of structural genomics is that target selection strategies mean that some structures are solved without having a known function and thus are of little biomedical significance. Structures of unknown function have stimulated the development of methods for function prediction from structure. Results We show that the MCSG has met the stated goals of the PSI and use online resources and readily available function prediction methods to provide functional annotations for more than 90% of the MCSG structures. The structure-to-function prediction method ProFunc provides likely functions for many of the MCSG structures that cannot be annotated by sequence-based methods. Conclusions Although the focus of the PSI was structural coverage, many of the structures solved by the MCSG can also be associated with functional classes and biological roles of possible biomedical value.
Affiliation(s)
- David Lee
  - Department of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
17
Capriles PVSZ, Guimarães ACR, Otto TD, Miranda AB, Dardenne LE, Degrave WM. Structural modelling and comparative analysis of homologous, analogous and specific proteins from Trypanosoma cruzi versus Homo sapiens: putative drug targets for chagas' disease treatment. BMC Genomics 2010; 11:610. [PMID: 21034488 PMCID: PMC3091751 DOI: 10.1186/1471-2164-11-610] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 05/05/2010] [Accepted: 10/29/2010] [Indexed: 11/25/2022] Open
Abstract
Background Trypanosoma cruzi is the etiological agent of Chagas' disease, an endemic infection that causes thousands of deaths every year in Latin America. Therapeutic options remain inefficient, demanding the search for new drugs and/or new molecular targets. Such efforts can focus on proteins that are specific to the parasite, but analogous enzymes and enzymes with a three-dimensional (3D) structure sufficiently different from the corresponding host proteins may represent equally interesting targets. In order to find these targets we used the workflows MHOLline and AnEnΠ obtaining 3D models from homologous, analogous and specific proteins of Trypanosoma cruzi versus Homo sapiens. Results We applied genome wide comparative modelling techniques to obtain 3D models for 3,286 predicted proteins of T. cruzi. In combination with comparative genome analysis to Homo sapiens, we were able to identify a subset of 397 enzyme sequences, of which 356 are homologous, 3 analogous and 38 specific to the parasite. Conclusions In this work, we present a set of 397 enzyme models of T. cruzi that can constitute potential structure-based drug targets to be investigated for the development of new strategies to fight Chagas' disease. The strategies presented here support the concept of structural analysis in conjunction with protein functional analysis as an interesting computational methodology to detect potential targets for structure-based rational drug design. For example, 2,4-dienoyl-CoA reductase (EC 1.3.1.34) and triacylglycerol lipase (EC 3.1.1.3), classified as analogous proteins in relation to H. sapiens enzymes, were identified as new potential molecular targets.
Affiliation(s)
- Priscila V S Z Capriles
  - Grupo de Modelagem Molecular de Sistemas Biológicos, Laboratório Nacional de Computação Científica, LNCC/MCT, Petrópolis, CEP 25651-075, Brazil
18
Chubb D, Jefferys BR, Sternberg MJE, Kelley LA. Sequencing delivers diminishing returns for homology detection: implications for mapping the protein universe. Bioinformatics 2010; 26:2664-71. [PMID: 20843957 DOI: 10.1093/bioinformatics/btq527] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/12/2022]
Abstract
MOTIVATION Databases of sequenced genomes are widely used to characterize the structure, function and evolutionary relationships of proteins. The ability to discern such relationships is widely expected to grow as sequencing projects provide novel information, bridging gaps in our map of the protein universe. RESULTS We have plotted our progress in protein sequencing over the last two decades and found that the rate of novel sequence discovery is in a sustained period of decline. Consequently, PSI-BLAST, the most widely used method to detect remote evolutionary relationships, which relies upon the accumulation of novel sequence data, is now showing a plateau in performance. We interpret this trend as signalling our approach to a representative map of the protein universe and discuss its implications.
Affiliation(s)
- Daniel Chubb
  - Department of Life Science, Imperial College London, London, UK
19
Triviño JC, Pazos F. Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds. BMC Systems Biology 2010; 4:46. [PMID: 20406431 PMCID: PMC2883543 DOI: 10.1186/1752-0509-4-46] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/27/2009] [Accepted: 04/20/2010] [Indexed: 12/02/2022]
Abstract
Background Global studies of the protein repertories of organisms are providing important information on the characteristics of the protein space. Many of these studies entail classification of the protein repertory on the basis of structure and/or sequence similarities. The situation is different for metabolism. Because there is no good way of measuring similarities between chemical reactions, there is a barrier to the development of global classifications of "metabolic space" and subsequent studies comparable to those done for protein sequences and structures. Results In this work, we propose a vectorial representation of chemical reactions, which allows them to be compared and classified. In this representation, chemical compounds, reactions and pathways may be represented in the same vectorial space. We show that the representation of chemical compounds reflects their physicochemical properties and can be used for predictive purposes. We use the vectorial representations of reactions to perform a global classification of the reactome of the model organism E. coli. Conclusions We show that this unsupervised clustering results in groups of enzymes more coherent in biological terms than equivalent groupings obtained from the EC hierarchy. This hierarchical clustering produces an optimal set of 21 groups which we analyzed for their biological meaning.
Affiliation(s)
- Juan C Triviño
  - Computational Systems Biology Group, National Centre for Biotechnology (CNB-CSIC), C/Darwin, 3, Cantoblanco, 28049 Madrid, Spain
20
Abstract
Many protein classification systems capture homologous relationships by grouping domains into families and superfamilies on the basis of sequence similarity. Superfamilies with similar 3D structures are further grouped into folds. In the absence of discernable sequence similarity, these structural similarities were long thought to have originated independently, by convergent evolution. However, the growth of databases and advances in sequence comparison methods have led to the discovery of many distant evolutionary relationships that transcend the boundaries of superfamilies and folds. To investigate the contributions of convergent versus divergent evolution in the origin of protein folds, we clustered representative domains of known structure by their sequence similarity, treating them as point masses in a virtual 2D space which attract or repel each other depending on their pairwise sequence similarities. As expected, families in the same superfamily form tight clusters. But often, superfamilies of the same fold are linked with each other, suggesting that the entire fold evolved from an ancient prototype. Strikingly, some links connect superfamilies with different folds. They arise from modular peptide fragments of between 20 and 40 residues that co-occur in the connected folds in disparate structural contexts. These may be descendants of an ancestral pool of peptide modules that evolved as cofactors in the RNA world and from which the first folded proteins arose by amplification and recombination. Our galaxy of folds summarizes, in a single image, most known and many yet undescribed homologous relationships between protein superfamilies, providing new insights into the evolution of protein domains.
Affiliation(s)
- Vikram Alva
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen 72076, Germany
21
Cuff A, Redfern OC, Greene L, Sillitoe I, Lewis T, Dibley M, Reid A, Pearl F, Dallman T, Todd A, Garratt R, Thornton J, Orengo C. The CATH hierarchy revisited-structural divergence in domain superfamilies and the continuity of fold space. Structure 2010; 17:1051-62. [PMID: 19679085 PMCID: PMC2741583 DOI: 10.1016/j.str.2009.06.015] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.3] [Received: 07/21/2008] [Revised: 06/24/2009] [Accepted: 06/25/2009] [Indexed: 11/29/2022]
Abstract
This paper explores the structural continuum in CATH and the extent to which superfamilies adopt distinct folds. Although most superfamilies are structurally conserved, in some of the most highly populated superfamilies (4% of all superfamilies) there is considerable structural divergence. While relatives share a similar fold in the evolutionary conserved core, diverse elaborations to this core can result in significant differences in the global structures. Applying similar protocols to examine the extent to which structural overlaps occur between different fold groups, it appears this effect is confined to just a few architectures and is largely due to small, recurring super-secondary motifs (e.g., αβ-motifs, α-hairpins). Although 24% of superfamilies overlap with superfamilies having different folds, only 14% of nonredundant structures in CATH are involved in overlaps. Nevertheless, the existence of these overlaps suggests that, in some regions of structure space, the fold universe should be seen as more continuous.
Affiliation(s)
- Alison Cuff
- Institute of Structural and Molecular Biology, University College London, London, UK.
22
Bendz M, Möller MC, Arrigoni G, Wåhlander Å, Stella R, Cappadona S, Levander F, Hederstedt L, James P. Quantification of Membrane Proteins Using Nonspecific Protease Digestions. J Proteome Res 2009; 8:5666-73. [DOI: 10.1021/pr900741t] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/16/2022]
Affiliation(s)
- Maria Bendz
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Mirja Carlsson Möller
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Giorgio Arrigoni
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Åsa Wåhlander
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Roberto Stella
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Salvatore Cappadona
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Fredrik Levander
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Lars Hederstedt
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
- Peter James
- Protein Technology, Department of Immunotechnology, CREATE Health, Lund University, Sweden, Centre for Biomembrane Research, Department of Biochemistry and Biophysics, Stockholm University, Sweden, Department of Cell & Organism Biology, Lund University, Sweden, Department of Biological Chemistry, University of Padova, Italy, and Department of Bioengineering, IIT Unit, Politecnico di Milano, Italy
23
Dessailly BH, Nair R, Jaroszewski L, Fajardo JE, Kouranov A, Lee D, Fiser A, Godzik A, Rost B, Orengo C. PSI-2: structural genomics to cover protein domain family space. Structure 2009; 17:869-81. [PMID: 19523904 DOI: 10.1016/j.str.2009.03.015] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.6] [Received: 11/26/2008] [Revised: 03/18/2009] [Accepted: 03/22/2009] [Indexed: 11/25/2022]
Abstract
One major objective of structural genomics efforts, including the NIH-funded Protein Structure Initiative (PSI), has been to increase the structural coverage of protein sequence space. Here, we present the target selection strategy used during the second phase of PSI (PSI-2). This strategy, jointly devised by the bioinformatics groups associated with the PSI-2 large-scale production centers, targets representatives from large, structurally uncharacterized protein domain families, and from structurally uncharacterized subfamilies in very large and diverse families with incomplete structural coverage. These very large families are extremely diverse both structurally and functionally, and are highly overrepresented in known proteomes. On the basis of several metrics, we then discuss to what extent PSI-2, during its first 3 years, has increased the structural coverage of genomes, and contributed structural and functional novelty. Together, the results presented here suggest that PSI-2 is successfully meeting its objectives and provides useful insights into structural and functional space.
Affiliation(s)
- Benoît H Dessailly
- Department of Structural and Molecular Biology, University College of London, London WC1E6BT, UK.
24
Redfern OC, Dessailly BH, Dallman TJ, Sillitoe I, Orengo CA. FLORA: a novel method to predict protein function from structure in diverse superfamilies. PLoS Comput Biol 2009; 5:e1000485. [PMID: 19714201 PMCID: PMC2721411 DOI: 10.1371/journal.pcbi.1000485] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.3] [Received: 11/27/2008] [Accepted: 07/23/2009] [Indexed: 11/18/2022] Open
Abstract
Predicting protein function from structure remains an active area of interest, particularly for the structural genomics initiatives where a substantial number of structures are initially solved with little or no functional characterisation. Although global structure comparison methods can be used to transfer functional annotations, the relationship between fold and function is complex, particularly in functionally diverse superfamilies that have evolved through different secondary structure embellishments to a common structural core. The majority of prediction algorithms employ local templates built on known or predicted functional residues. Here, we present a novel method (FLORA) that automatically generates structural motifs associated with different functional sub-families (FSGs) within functionally diverse domain superfamilies. Templates are created purely on the basis of their specificity for a given FSG, and the method makes no prior prediction of functional sites, nor assumes specific physico-chemical properties of residues. FLORA is able to accurately discriminate between homologous domains with different functions and substantially outperforms (a 2–3 fold increase in coverage at low error rates) popular structure comparison methods and a leading function prediction method. We benchmark FLORA on a large data set of enzyme superfamilies from all three major protein classes (α, β, αβ) and demonstrate the functional relevance of the motifs it identifies. We also provide novel predictions of enzymatic activity for a large number of structures solved by the Protein Structure Initiative. Overall, we show that FLORA is able to effectively detect functionally similar protein domain structures by purely using patterns of structural conservation of all residues.
Understanding how the three-dimensional (3D) molecular structure of proteins influences their function can provide insights into the workings of biological systems. Structural Genomics Initiatives have been set up to investigate these structures on a large scale and make the data available to the wider biological research community. However, in a significant number of cases, there is little known about the functions of the structures that are solved. To address this, computational methods can be used as a predictive tool to guide future experimental investigations. One such approach is to exploit global structural comparison to assign the protein in question to an evolutionary family, which has already been functionally characterised. However, this is problematic in some large evolutionary families, which contain a number of different functional sub-families. We have developed a new method (FLORA) which is able to calculate 3D “motifs” which are specific to each of these sub-families. Any new protein structure can then be compared against these motifs to make a more accurate prediction of its function. Our paper shows that FLORA substantially outperforms other standard approaches for predicting function from structure. We use our method to make confident functional predictions for a set of proteins solved by the structural genomics projects, which could not have been assigned reliably by global structure comparison.
Affiliation(s)
- Oliver C. Redfern
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Benoît H. Dessailly
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Timothy J. Dallman
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Ian Sillitoe
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
- Christine A. Orengo
- Research Department of Structural and Molecular Biology, University College London, London, United Kingdom
25
Naumoff DG, Carreras M. PSI protein classifier: A new program automating PSI-BLAST search results. Mol Biol 2009. [DOI: 10.1134/s0026893309040189] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
26
Lu G, Wang Z, Jones AM, Moriyama EN. 7TMRmine: a Web server for hierarchical mining of 7TMR proteins. BMC Genomics 2009; 10:275. [PMID: 19538753 PMCID: PMC2718930 DOI: 10.1186/1471-2164-10-275] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 01/08/2009] [Accepted: 06/19/2009] [Indexed: 11/12/2022] Open
Abstract
BACKGROUND: Seven-transmembrane region-containing receptors (7TMRs) play central roles in eukaryotic signal transduction. Due to their biomedical importance, thorough mining of 7TMRs from diverse genomes has been an active target of bioinformatics and pharmacogenomics research. The need for new and accurate 7TMR/GPCR prediction tools is paramount with the accelerated rate of acquisition of diverse sequence information. Currently available and often used protein classification methods (e.g., profile hidden Markov Models) are highly accurate for identifying their membership information among already known 7TMR subfamilies. However, these alignment-based methods are less effective for identifying remote similarities, e.g., identifying proteins from highly divergent or possibly new 7TMR families. In this regard, more sensitive (e.g., alignment-free) methods are needed to complement the existing protein classification methods. A better strategy would be to combine different classifiers, from more specific to more sensitive methods, to identify a broader spectrum of 7TMR protein candidates. DESCRIPTION: We developed a Web server, 7TMRmine, by integrating alignment-free and alignment-based classifiers specifically trained to identify candidate 7TMR proteins as well as transmembrane (TM) prediction methods. This new tool enables researchers to easily assess the distribution of GPCR functionality in diverse genomes or individual newly-discovered proteins. 7TMRmine is easily customized and facilitates exploratory analysis of diverse genomes. Users can integrate various alignment-based, alignment-free, and TM-prediction methods in any combination and in any hierarchical order. Sixteen classifiers (including two TM-prediction methods) are available on the 7TMRmine Web server. Not only can the 7TMRmine tool be used for 7TMR mining, but also for general TM-protein analysis. Users can submit protein sequences for analysis, or explore pre-analyzed results for multiple genomes. The server currently includes prediction results and the summary statistics for 68 genomes. CONCLUSION: 7TMRmine facilitates the discovery of 7TMR proteins. By combining prediction results from different classifiers in a multi-level filtering process, prioritized sets of 7TMR candidates can be obtained for further investigation. 7TMRmine can be also used as a general TM-protein classifier. Comparisons of TM and 7TMR protein distributions among 68 genomes revealed interesting differences in evolution of these protein families among major eukaryotic phyla.
Affiliation(s)
- Guoqing Lu
- Department of Computer Science, University of Nebraska at Omaha, Omaha, NE 68182, USA
- Department of Biology, University of Nebraska at Omaha, Omaha, NE 68182, USA
- Zhifang Wang
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588-0660, USA
- Alan M Jones
- Department of Biology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Etsuko N Moriyama
- School of Biological Sciences, University of Nebraska-Lincoln, Lincoln, NE 68588-0118, USA
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588-0118, USA
27
Abstract
The large-scale structural biology projects that target human proteins focus predominantly on the catalytic domains of potential therapeutic targets and the domains of human proteins that mediate protein-protein and protein-small-molecule interactions. Their main scientific objective is to elucidate the molecular basis for specificity and selectivity of function within large protein families of therapeutic interest, such as kinases, phosphatases, and proteins involved in epigenetic regulation. Half of the unique human protein structures determined in the past three years derive from these initiatives.
Affiliation(s)
- Aled Edwards
- Banting and Best Department of Medical Research, University of Toronto, Ontario M5G 1L6, Canada
28
Bhasi A, Philip P, Manikandan V, Senapathy P. ExDom: an integrated database for comparative analysis of the exon-intron structures of protein domains in eukaryotes. Nucleic Acids Res 2009; 37:D703-11. [PMID: 18984624 PMCID: PMC2686582 DOI: 10.1093/nar/gkn746] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 08/15/2008] [Revised: 10/02/2008] [Accepted: 10/03/2008] [Indexed: 11/27/2022] Open
Abstract
We have developed ExDom, a unique database for the comparative analysis of the exon-intron structures of 96 680 protein domains from seven eukaryotic organisms (Homo sapiens, Mus musculus, Bos taurus, Rattus norvegicus, Danio rerio, Gallus gallus and Arabidopsis thaliana). ExDom provides integrated access to exon-domain data through a sophisticated web interface which has the following analytical capabilities: (i) intergenomic and intragenomic comparative analysis of exon-intron structure of domains; (ii) color-coded graphical display of the domain architecture of proteins correlated with their corresponding exon-intron structures; (iii) graphical analysis of multiple sequence alignments of amino acid and coding nucleotide sequences of homologous protein domains from seven organisms; (iv) comparative graphical display of exon distributions within the tertiary structures of protein domains; and (v) visualization of exon-intron structures of alternative transcripts of a gene correlated to variations in the domain architecture of corresponding protein isoforms. These novel analytical features are highly suited for detailed investigations on the exon-intron structure of domains and make ExDom a powerful tool for exploring several key questions concerning the function, origin and evolution of genes and proteins. ExDom database is freely accessible at: http://66.170.16.154/ExDom/.
Affiliation(s)
- Ashwini Bhasi
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
- Philge Philip
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
- Vinu Manikandan
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
- Periannan Senapathy
- Department of Human Genetics, Genome International Corp, 8000 Excelsior Drive, Madison, WI 53717, USA and Department of Bioinformatics, International Center for Advanced Genomics and Proteomics, 83, 1st Cross Street, Nehru Nagar, Chennai 600096, India
29
Redfern OC, Dessailly B, Orengo CA. Exploring the structure and function paradigm. Curr Opin Struct Biol 2008; 18:394-402. [PMID: 18554899 DOI: 10.1016/j.sbi.2008.05.007] [Citation(s) in RCA: 90] [Impact Index Per Article: 5.3] [Received: 03/18/2008] [Revised: 04/16/2008] [Accepted: 05/07/2008] [Indexed: 11/29/2022]
Abstract
Advances in protein structure determination, led by the structural genomics initiatives have increased the proportion of novel folds deposited in the Protein Data Bank. However, these structures are often not accompanied by functional annotations with experimental confirmation. In this review, we reassess the meaning of structural novelty and examine its relevance to the complexity of the structure-function paradigm. Recent advances in the prediction of protein function from structure are discussed, as well as new sequence-based methods for partitioning large, diverse superfamilies into biologically meaningful clusters. Obtaining structural data for these functionally coherent groups of proteins will allow us to better understand the relationship between structure and function.
Affiliation(s)
- Oliver C Redfern
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, United Kingdom
30
Abstract
The success of the whole genome sequencing projects brought considerable credence to the belief that high-throughput approaches, rather than traditional hypothesis-driven research, would be essential to structurally and functionally annotate the rapid growth in available sequence data within a reasonable time frame. Such observations supported the emerging field of structural genomics, which is now faced with the task of providing a library of protein structures that represent the biological diversity of the protein universe. To run efficiently, structural genomics projects aim to define a set of targets that maximize the potential of each structure discovery whether it represents a novel structure, novel function, or missing evolutionary link. However, not all protein sequences make suitable structural genomics targets: It takes considerably more effort to determine the structure of a protein than the sequence of its gene because of the increased complexity of the methods involved and also because the behavior of targeted proteins can be extremely variable at the different stages in the structural genomics "pipeline." Therefore, structural genomics target selection must identify and prioritize the most suitable candidate proteins for structure determination, avoiding "problematic" proteins while also ensuring the ultimate goals of the project are followed.
31
Towards completion of the Earth's proteome. EMBO Rep 2008; 8:1135-41. [PMID: 18059312 DOI: 10.1038/sj.embor.7401117] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 07/09/2007] [Accepted: 10/15/2007] [Indexed: 11/08/2022] Open
Abstract
New protein sequences are deposited in databases at an accelerating pace; however, many of these are homologous to known proteins and could be considered redundant. If all historical releases of the protein database are analysed using the original sequence-clustering procedure described here, the fraction of newly sequenced proteins that are redundant is increasing. We interpret this as an indication that the sequencing of the Earth's proteome--the complete set of proteins on Earth--is approaching completion. We estimate the approximate size of the Earth's proteome to be 5 million sequences, most of which will be identified during the next 5 years. As the Earth's proteome nears completion, cluster analysis of the protein database will become essential to identify under-explored taxa to which future sequencing efforts should be directed and to focus research on protein families without experimental characterization.
32
Martin J, de Brevern AG, Camproux AC. In silico local structure approach: a case study on outer membrane proteins. Proteins 2008; 71:92-109. [PMID: 17932925 DOI: 10.1002/prot.21659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
The detection of Outer Membrane Proteins (OMP) in whole genomes is an actual question, their sequence characteristics have thus been intensively studied. This class of protein displays a common beta-barrel architecture, formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel OMP local structure investigation, based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in details, in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than in globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captured sequential information. This can be integrated in a scoring function for structural model ranking with very promising results.
Affiliation(s)
- Juliette Martin
- INSERM UMR-S 726/Université Denis Diderot Paris 7, Equipe de Bioinformatique Génomique et Moléculaire, F-75005 Paris
33
34
Carter P, Lee D, Orengo C. Chapter 1. Target selection in structural genomics projects to increase knowledge of protein structure and function space. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2008; 75:1-52. [PMID: 20731988 DOI: 10.1016/s0065-3233(07)75001-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/19/2022]
Abstract
Structural genomics aims to solve the three-dimensional structures of proteins at a rapid rate and in a cost-effective manner, with the hope of significantly impacting on the life sciences, biotechnology, and drug discovery in the long-term. Structural genomics initiatives started in Japan in 1997 with the advent of the Protein Folds Project. Since then many new initiatives have begun worldwide, with diverse aims motivating the selection of proteins for structure determination. In this chapter, we consider the biological goals of high-throughput structural biology, while focusing on the Protein Structure Initiative in the United States. This is the most productive of the structural genomics initiatives, having solved 3,363 new structures between September 2000 and October 2008.
Affiliation(s)
- Phil Carter
- Department of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
35
Jansson M, Wårell K, Levander F, James P. Membrane Protein Identification: N-Terminal Labeling of Nontryptic Membrane Protein Peptides Facilitates Database Searching. J Proteome Res 2007; 7:659-65. [DOI: 10.1021/pr070545t] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.0] [Indexed: 11/30/2022]
Affiliation(s)
- Maria Jansson
- Department of Protein Technology, BMC D13, Lund University, Lund SE-221 84, Sweden
- Kristofer Wårell
- Department of Protein Technology, BMC D13, Lund University, Lund SE-221 84, Sweden
- Fredrik Levander
- Department of Protein Technology, BMC D13, Lund University, Lund SE-221 84, Sweden
- Peter James
- Department of Protein Technology, BMC D13, Lund University, Lund SE-221 84, Sweden
36
Grabowski M, Joachimiak A, Otwinowski Z, Minor W. Structural genomics: keeping up with expanding knowledge of the protein universe. Curr Opin Struct Biol 2007; 17:347-53. [PMID: 17587562 PMCID: PMC2885969 DOI: 10.1016/j.sbi.2007.06.003] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 02/12/2007] [Revised: 04/11/2007] [Accepted: 06/13/2007] [Indexed: 11/15/2022]
Abstract
Structural characterization of the protein universe is the main mission of Structural Genomics (SG) programs. However, progress in gene sequencing technology, set in motion in the 1990s, has resulted in rapid expansion of protein sequence space--a twelvefold increase in the past seven years. For the SG field, this creates new challenges and necessitates a re-assessment of its strategies. Nevertheless, despite the growth of sequence space, at present nearly half of the content of the Swiss-Prot database and over 40% of Pfam protein families can be structurally modeled based on structures determined so far, with SG projects making an increasingly significant contribution. The SG contribution of new Pfam structures nearly doubled from 27.2% in 2003 to 51.6% in 2006.
Affiliation(s)
- Marek Grabowski
- Department of Molecular Physiology and Biological Physics, University of Virginia, Charlottesville, VA 22903, USA
37
Jenney FE, Adams MWW. The impact of extremophiles on structural genomics (and vice versa). Extremophiles 2007; 12:39-50. [PMID: 17563834 DOI: 10.1007/s00792-007-0087-9] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 12/22/2006] [Accepted: 04/19/2007] [Indexed: 11/24/2022]
Abstract
The advent of the complete genome sequences of various organisms in the mid-1990s raised the issue of how one could determine the function of hypothetical proteins. While insight might be obtained from a 3D structure, the chances of being able to predict such a structure is limited for the deduced amino acid sequence of any uncharacterized gene. A template for modeling is required, but there was only a low probability of finding a protein closely-related in sequence with an available structure. Thus, in the late 1990s, an international effort known as structural genomics (SG) was initiated, its primary goal to "fill sequence-structure space" by determining the 3D structures of representatives of all known protein families. This was to be achieved mainly by X-ray crystallography and it was estimated that at least 5,000 new structures would be required. While the proteins (genes) for SG have subsequently been derived from hundreds of different organisms, extremophiles and particularly thermophiles have been specifically targeted due to the increased stability and ease of handling of their proteins, relative to those from mesophiles. This review summarizes the significant impact that extremophiles and proteins derived from them have had on SG projects worldwide. To what extent SG has influenced the field of extremophile research is also discussed.
Affiliation(s)
- Francis E Jenney
- Department of Biochemistry and Molecular Biology, University of Georgia, Davison Life Sciences Complex, Green Street, Athens, GA 30602-7229, USA
|
38
|
Raes J, Harrington ED, Singh AH, Bork P. Protein function space: viewing the limits or limited by our view? Curr Opin Struct Biol 2007; 17:362-9. [PMID: 17574832 DOI: 10.1016/j.sbi.2007.05.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.5] [Received: 04/25/2007] [Revised: 04/25/2007] [Accepted: 05/31/2007] [Indexed: 12/13/2022]
Abstract
Given that the number of protein functions on earth is finite, the rapid expansion of biological knowledge and the concomitant exponential increase in the number of protein sequences should, at some point, enable the estimation of the limits of protein function space. The functional coverage of protein sequences can be investigated using computational methods, especially given the massive amount of data being generated by large-scale environmental sequencing (metagenomics). In completely sequenced genomes, the fraction of proteins to which at least some functional features can be assigned has recently risen to as much as approximately 85%. Although this fraction is more uncertain in metagenomics surveys, because of environmental complexities and differences in analysis protocols, our global knowledge of protein functions still appears to be considerable. However, when we consider protein families, continued sequencing seems to yield an ever-increasing number of novel families. Until we reconcile these two views, the limits of protein space will remain obscured.
Affiliation(s)
- Jeroen Raes
- European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
39
|
Gollery M, Harper J, Cushman J, Mittler T, Girke T, Zhu JK, Bailey-Serres J, Mittler R. What makes species unique? The contribution of proteins with obscure features. Genome Biol 2007; 7:R57. [PMID: 16859532 PMCID: PMC1779552 DOI: 10.1186/gb-2006-7-7-r57] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.2] [Received: 03/06/2006] [Revised: 04/28/2006] [Accepted: 06/27/2006] [Indexed: 11/23/2022] Open
Abstract
An analysis of proteins with obscure features in ten eukaryotic genomes revealed that the majority are species-specific. Background: Proteins with obscure features (POFs), which lack currently defined motifs or domains, represent between 18% and 38% of a typical eukaryotic proteome. To evaluate the contribution of this class of proteins to the diversity of eukaryotes, we performed a comparative analysis of the predicted proteomes derived from 10 different sequenced genomes, including budding and fission yeast, worm, fly, mosquito, Arabidopsis, rice, mouse, rat, and human. Results: Only 1,650 protein groups were found to be conserved among these proteomes (BLAST E-value threshold of 10⁻⁶). Of these, only three were designated as POFs. Surprisingly, we found that, on average, 60% of the POFs identified in these 10 proteomes (44,236 in total) were species specific. In contrast, only 7.5% of the proteins with defined features (PDFs) were species specific (17,554 in total). As a group, POFs appear similar to PDFs in their relative contribution to biological functions, as indicated by their expression, participation in protein-protein interactions and association with mutant phenotypes. However, POFs have more predicted disordered structure than PDFs, implying that they may exhibit preferential involvement in species-specific regulatory and signaling networks. Conclusion: Because the majority of eukaryotic POFs are not well conserved, and by definition do not have defined domains or motifs upon which to formulate a functional working hypothesis, understanding their biochemical and biological functions will require species-specific investigations.
Affiliation(s)
- Martin Gollery
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Jeff Harper
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- John Cushman
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Taliah Mittler
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
- Thomas Girke
- Center for Plant Cell Biology, University of California, Riverside, CA 92521, USA
- Jian-Kang Zhu
- Center for Plant Cell Biology, University of California, Riverside, CA 92521, USA
- Julia Bailey-Serres
- Center for Plant Cell Biology, University of California, Riverside, CA 92521, USA
- Ron Mittler
- Department of Biochemistry and Molecular Biology, University of Nevada, Reno, NV 89557, USA
|
40
|
Marti-Renom MA, Rossi A, Al-Shahrour F, Davis FP, Pieper U, Dopazo J, Sali A. The AnnoLite and AnnoLyze programs for comparative annotation of protein structures. BMC Bioinformatics 2007; 8 Suppl 4:S4. [PMID: 17570147 PMCID: PMC1892083 DOI: 10.1186/1471-2105-8-s4-s4] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.8] [Indexed: 11/10/2022] Open
Abstract
Background: Advances in structural biology, including structural genomics, have resulted in a rapid increase in the number of experimentally determined protein structures. However, about half of the structures deposited by the structural genomics consortia have little or no information about their biological function. Therefore, there is a need for tools for automatically and comprehensively annotating the function of protein structures. We aim to provide such tools by applying comparative protein structure annotation that relies on detectable relationships between protein structures to transfer functional annotations. Here we introduce two programs, AnnoLite and AnnoLyze, which use the structural alignments deposited in the DBAli database. Description: AnnoLite predicts the SCOP, CATH, EC, InterPro, PfamA, and GO terms with an average sensitivity of ~90% and average precision of ~80%. AnnoLyze predicts ligand binding site and domain interaction patches with an average sensitivity of ~70% and average precision of ~30%, correctly localizing binding sites for small molecules in ~95% of its predictions. Conclusion: The AnnoLite and AnnoLyze programs for comparative annotation of protein structures can reliably and automatically annotate new protein structures. The programs are fully accessible via the Internet as part of the DBAli suite of tools at .
Affiliation(s)
- Marc A Marti-Renom
- Structural Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Andrea Rossi
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Fátima Al-Shahrour
- Functional Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Fred P Davis
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Ursula Pieper
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
- Joaquín Dopazo
- Functional Genomics Unit, Bioinformatics Department, Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry, and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA 94143, USA
|
41
|
Marsden RL, Lewis TA, Orengo CA. Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint. BMC Bioinformatics 2007; 8:86. [PMID: 17349043 PMCID: PMC1829165 DOI: 10.1186/1471-2105-8-86] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Received: 10/31/2006] [Accepted: 03/09/2007] [Indexed: 11/25/2022] Open
Abstract
Background: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is to structurally characterise protein families which, so far, lack a structural representative. It is therefore of considerable interest to gain insights into the number and distribution of these families, and what efforts may be required to achieve a comprehensive structural coverage across all protein families. Results: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportions of structurally uncharacterised families are accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst also pursuing additional targets from large structurally characterised protein superfamilies. Conclusion: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve a comprehensive structural coverage of the genomes, leading to greater insights into structure and the mechanisms that underlie protein evolution.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
- Tony A Lewis
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
- Christine A Orengo
- Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK
|
42
|
Rudiño-Piñera E, Ravelli RBG, Sheldrick GM, Nanao MH, Korostelev VV, Werner JM, Schwarz-Linek U, Potts JR, Garman EF. The solution and crystal structures of a module pair from the Staphylococcus aureus-binding site of human fibronectin--a tale with a twist. J Mol Biol 2007; 368:833-44. [PMID: 17368672 DOI: 10.1016/j.jmb.2007.02.061] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Received: 12/13/2006] [Revised: 02/10/2007] [Accepted: 02/16/2007] [Indexed: 10/23/2022]
Abstract
An important goal of structural studies of modular proteins is to determine the inter-module orientation, which often influences biological function. The N-terminal domain of human fibronectin (Fn) is composed of a string of five type 1 modules (F1). Despite their small size, to date F1 modules have proved intractable to X-ray structure solution, although there are several NMR structures available. Here, we present the first structures (two X-ray models and an NMR-derived model) of the ²F1³F1 module pair, which forms part of the binding site for Fn-binding proteins from pathogenic bacteria. The crystallographic structure determination was aided by the novel technique of UV radiation damage-induced phasing. The individual module structures are very similar in all three models. In the NMR structure and one of the X-ray structures, a similar but smaller interdomain interface than that observed previously for ⁴F1⁵F1 is seen. The other X-ray structure has a different interdomain orientation. This work underlines the benefits of combining X-ray and NMR data in the studies of multi-domain proteins.
|
43
|
Pratelli R, Pilot G. The plant-specific VIMAG domain of Glutamine Dumper1 is necessary for the function of the protein in Arabidopsis. FEBS Lett 2006; 580:6961-6. [PMID: 17157837 DOI: 10.1016/j.febslet.2006.11.064] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 11/08/2006] [Revised: 11/21/2006] [Accepted: 11/21/2006] [Indexed: 11/23/2022]
Abstract
The over-expression of the Arabidopsis GLUTAMINE DUMPER1 gene (GDU1) leads to increased amino acid content and transport. In a screen for mutations suppressing this phenotype, a mutant was isolated. The mutation leads to a glycine-to-arginine substitution in one of the two conserved domains of the protein, the VIMAG domain. More detailed structure-function relationship analyses showed that the presence of this domain and the membrane localisation are both necessary for the function of the GDU1 protein. These results shed light on the function of the GDU1 protein, whose family is specific to plants.
Affiliation(s)
- Réjane Pratelli
- Institute for Cellular and Molecular Botany (IZMB), Kirschallee 1, 53115 Bonn, Germany
|
44
|
Abstract
Owing to the ongoing success of the genome sequencing and structural genomics projects, the increase in both sequence and structural data is rapid. The development of tools for the annotation of sequence and structural data has become more important in the hope of keeping up with this data explosion. Scientists in this field have addressed these issues over the last 10 years, and there now exists a wealth of methods and approaches to help interpret these data. However, there is currently no way in which these methods can be easily combined so that the resulting annotations can be viewed together. This review discusses the development of these annotation methods and introduces the BioSapiens Network of Excellence, which has been formed in order to integrate the methods that have been developed in Europe.
Affiliation(s)
- Gabrielle A Reeves
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
|
45
|
Marsden RL, Ranea JAG, Sillero A, Redfern O, Yeats C, Maibaum M, Lee D, Addou S, Reeves GA, Dallman TJ, Orengo CA. Exploiting protein structure data to explore the evolution of protein function and biological complexity. Philos Trans R Soc Lond B Biol Sci 2006; 361:425-40. [PMID: 16524831 PMCID: PMC1609337 DOI: 10.1098/rstb.2005.1801] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022] Open
Abstract
New directions in biology are being driven by the complete sequencing of genomes, which has given us the protein repertoires of diverse organisms from all kingdoms of life. In tandem with this accumulation of sequence data, worldwide structural genomics initiatives, advanced by the development of improved technologies in X-ray crystallography and NMR, are expanding our knowledge of structural families and increasing our fold libraries. Methods for detecting remote sequence similarities have also been made more sensitive and this means that we can map domains from these structural families onto genome sequences to understand how these families are distributed throughout the genomes and reveal how they might influence the functional repertoires and biological complexities of the organisms. We have used robust protocols to assign sequences from completed genomes to domain structures in the CATH database, allowing up to 60% of domain sequences in these genomes, depending on the organism, to be assigned to a domain family of known structure. Analysis of the distribution of these families throughout bacterial genomes identified more than 300 universal families, some of which had expanded significantly in proportion to genome size. These highly expanded families are primarily involved in metabolism and regulation and appear to make major contributions to the functional repertoire and complexity of bacterial organisms. When comparisons are made across all kingdoms of life, we find a smaller set of universal domain families (approx. 140), of which families involved in protein biosynthesis are the largest conserved component. Analysis of the behaviour of other families reveals that some (e.g. those involved in metabolism, regulation) have remained highly innovative during evolution, making it harder to trace their evolutionary ancestry. Structural analyses of metabolic families provide some insights into the mechanisms of functional innovation, which include changes in domain partnerships and significant structural embellishments leading to modulation of active sites and protein interactions.
Affiliation(s)
- Russell L Marsden
- Department of Biochemistry, University College London, Gower Street, London WC1E 6BT, UK
|