1
|
Zheng Z, Goncearenco A, Berezovsky IN. Back in time to the Gly-rich prototype of the phosphate binding elementary function. Curr Res Struct Biol 2024; 7:100142. [PMID: 38655428 PMCID: PMC11035071 DOI: 10.1016/j.crstbi.2024.100142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2023] [Revised: 03/31/2024] [Accepted: 04/03/2024] [Indexed: 04/26/2024] Open
Abstract
Binding of nucleotides and their derivatives is one of the most ancient elementary functions dating back to the Origin of Life. We review here the works considering one of the key elements in binding of (di)nucleotide-containing ligands - phosphate binding. We start from a brief discussion of major participants, conditions, and events in prebiotic evolution that resulted in the Origin of Life. Tracing back to the basic functions, including metal and phosphate binding, and, potentially, formation of primitive protein-protein interactions, we focus here on the phosphate binding. Critically assessing works on the structural, functional, and evolutionary aspects of phosphate binding, we perform a simple computational experiment reconstructing its most ancient and generic sequence prototype. The profiles of the phosphate binding signatures have been derived in form of position-specific scoring matrices (PSSMs), their peculiarities depending on the type of the ligands have been analyzed, and evolutionary connections between them have been delineated. Then, the apparent prototype that gave rise to all relevant phosphate-binding signatures had also been reconstructed. We show that two major signatures of the phosphate binding that discriminate between the binding of dinucleotide- and nucleotide-containing ligands are GxGxxG and GxxGxG, respectively. It appears that the signature archetypal for dinucleotide-containing ligands is more generic, and it can frequently bind phosphate groups in nucleotide-containing ligands as well. The reconstructed prototype's key signature GxGGxG underlies the role of glycine residues in providing flexibility and interactions necessary for binding the phosphate groups. The prototype also contains other ancient amino acids, valine, and alanine, showing versatility towards evolutionary design and functional diversification.
Collapse
Affiliation(s)
- Zejun Zheng
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| | | | - Igor N. Berezovsky
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
| |
Collapse
|
2
|
Meng W, Pan H, Sha Y, Zhai X, Xing A, Lingampelly SS, Sripathi SR, Wang Y, Li K. Metabolic Connectome and Its Role in the Prediction, Diagnosis, and Treatment of Complex Diseases. Metabolites 2024; 14:93. [PMID: 38392985 PMCID: PMC10890086 DOI: 10.3390/metabo14020093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2023] [Revised: 01/17/2024] [Accepted: 01/25/2024] [Indexed: 02/25/2024] Open
Abstract
The interconnectivity of advanced biological systems is essential for their proper functioning. In modern connectomics, biological entities such as proteins, genes, RNA, DNA, and metabolites are often represented as nodes, while the physical, biochemical, or functional interactions between them are represented as edges. Among these entities, metabolites are particularly significant as they exhibit a closer relationship to an organism's phenotype compared to genes or proteins. Moreover, the metabolome has the ability to amplify small proteomic and transcriptomic changes, even those from minor genomic changes. Metabolic networks, which consist of complex systems comprising hundreds of metabolites and their interactions, play a critical role in biological research by mediating energy conversion and chemical reactions within cells. This review provides an introduction to common metabolic network models and their construction methods. It also explores the diverse applications of metabolic networks in elucidating disease mechanisms, predicting and diagnosing diseases, and facilitating drug development. Additionally, it discusses potential future directions for research in metabolic networks. Ultimately, this review serves as a valuable reference for researchers interested in metabolic network modeling, analysis, and their applications.
Collapse
Affiliation(s)
- Weiyu Meng
- Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
| | - Hongxin Pan
- Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
| | - Yuyang Sha
- Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
| | - Xiaobing Zhai
- Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
| | - Abao Xing
- Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
| | | | - Srinivasa R Sripathi
- Henderson Ocular Stem Cell Laboratory, Retina Foundation of the Southwest, Dallas, TX 75231, USA
| | - Yuefei Wang
- National Key Laboratory of Chinese Medicine Modernization, State Key Laboratory of Component-Based Chinese Medicine, Tianjin University of Traditional Chinese Medicine, Tianjin 301617, China
- Haihe Laboratory of Modern Chinese Medicine, Tianjin 301617, China
| | - Kefeng Li
- Center for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, Macau SAR 999078, China
| |
Collapse
|
3
|
Riziotis IG, Ribeiro AJM, Borkakoti N, Thornton JM. The 3D Modules of Enzyme Catalysis: Deconstructing Active Sites into Distinct Functional Entities. J Mol Biol 2023; 435:168254. [PMID: 37652131 DOI: 10.1016/j.jmb.2023.168254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2023] [Revised: 08/20/2023] [Accepted: 08/22/2023] [Indexed: 09/02/2023]
Abstract
Enzyme catalysis is governed by a limited toolkit of residues and organic or inorganic co-factors. Therefore, it is expected that recurring residue arrangements will be found across the enzyme space, which perform a defined catalytic function, are structurally similar and occur in unrelated enzymes. Leveraging the integrated information in the Mechanism and Catalytic Site Atlas (M-CSA) (enzyme structure, sequence, catalytic residue annotations, catalysed reaction, detailed mechanism description), 3D templates were derived to represent compact groups of catalytic residues. A fuzzy template-template search, allowed us to identify those recurring motifs, which are conserved or convergent, that we define as the "modules of enzyme catalysis". We show that a large fraction of these modules facilitate binding of metal ions, co-factors and substrates, and are frequently the result of convergent evolution. A smaller number of convergent modules perform a well-defined catalytic role, such as the variants of the catalytic triad (i.e. Ser-His-Asp/Cys-His-Asp) and the saccharide-cleaving Asp/Glu triad. It is also shown that enzymes whose functions have diverged during evolution preserve regions of their active site unaltered, as shown by modules performing similar or identical steps of the catalytic mechanism. We have compiled a comprehensive library of catalytic modules, that characterise a broad spectrum of enzymes. These modules can be used as templates in enzyme design and for better understanding catalysis in 3D.
Collapse
Affiliation(s)
- Ioannis G Riziotis
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK.
| | - António J M Ribeiro
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
| | - Neera Borkakoti
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
| | - Janet M Thornton
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, CB10 1SD Cambridge, UK
| |
Collapse
|
4
|
Laveglia V, Giachetti A, Sala D, Andreini C, Rosato A. Learning to Identify Physiological and Adventitious Metal-Binding Sites in the Three-Dimensional Structures of Proteins by Following the Hints of a Deep Neural Network. J Chem Inf Model 2022; 62:2951-2960. [PMID: 35679182 PMCID: PMC9241070 DOI: 10.1021/acs.jcim.2c00522] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/18/2023]
Abstract
Thirty-eight percent of protein structures in the Protein Data Bank contain at least one metal ion. However, not all these metal sites are biologically relevant. Cations present as impurities during sample preparation or in the crystallization buffer can cause the formation of protein-metal complexes that do not exist in vivo. We implemented a deep learning approach to build a classifier able to distinguish between physiological and adventitious zinc-binding sites in the 3D structures of metalloproteins. We trained the classifier using manually annotated sites extracted from the MetalPDB database. Using a 10-fold cross validation procedure, the classifier achieved an accuracy of about 90%. The same neural classifier could predict the physiological relevance of non-heme mononuclear iron sites with an accuracy of nearly 80%, suggesting that the rules learned on zinc sites have general relevance. By quantifying the relative importance of the features describing the input zinc sites from the network perspective and by analyzing the characteristics of the MetalPDB datasets, we inferred some common principles. Physiological sites present a low solvent accessibility of the aminoacids forming coordination bonds with the metal ion (the metal ligands), a relatively large number of residues in the metal environment (≥20), and a distinct pattern of conservation of Cys and His residues in the site. Adventitious sites, on the other hand, tend to have a low number of donor atoms from the polypeptide chain (often one or two). These observations support the evaluation of the physiological relevance of novel metal-binding sites in protein structures.
Collapse
Affiliation(s)
- Vincenzo Laveglia
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Davide Sala
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy.,Institute for Drug Discovery, Leipzig University, Brüderstr. 34, 04103 Leipzig, Germany.,Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Claudia Andreini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy.,Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy.,Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Antonio Rosato
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy.,Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy.,Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| |
Collapse
|
5
|
Paiva VDA, Gomes IDS, Monteiro CR, Mendonça MV, Martins PM, Santana CA, Gonçalves-Almeida V, Izidoro SC, Melo-Minardi RCD, Silveira SDA. Protein structural bioinformatics: An overview. Comput Biol Med 2022; 147:105695. [DOI: 10.1016/j.compbiomed.2022.105695] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 06/01/2022] [Accepted: 06/02/2022] [Indexed: 11/27/2022]
|
6
|
Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chin J Chem Eng 2022. [DOI: 10.1016/j.cjche.2021.08.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
|
7
|
Wu D, Saleem M, He T, He G. The Mechanism of Metal Homeostasis in Plants: A New View on the Synergistic Regulation Pathway of Membrane Proteins, Lipids and Metal Ions. MEMBRANES 2021; 11:membranes11120984. [PMID: 34940485 PMCID: PMC8706360 DOI: 10.3390/membranes11120984] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/29/2021] [Revised: 12/04/2021] [Accepted: 12/11/2021] [Indexed: 12/15/2022]
Abstract
Heavy metal stress (HMS) is one of the most destructive abiotic stresses which seriously affects the growth and development of plants. Recent studies have shown significant progress in understanding the molecular mechanisms underlying plant tolerance to HMS. In general, three core signals are involved in plants' responses to HMS; these are mitogen-activated protein kinase (MAPK), calcium, and hormonal (abscisic acid) signals. In addition to these signal components, other regulatory factors, such as microRNAs and membrane proteins, also play an important role in regulating HMS responses in plants. Membrane proteins interact with the highly complex and heterogeneous lipids in the plant cell environment. The function of membrane proteins is affected by the interactions between lipids and lipid-membrane proteins. Our review findings also indicate the possibility of membrane protein-lipid-metal ion interactions in regulating metal homeostasis in plant cells. In this review, we investigated the role of membrane proteins with specific substrate recognition in regulating cell metal homeostasis. The understanding of the possible interaction networks and upstream and downstream pathways is developed. In addition, possible interactions between membrane proteins, metal ions, and lipids are discussed to provide new ideas for studying metal homeostasis in plant cells.
Collapse
Affiliation(s)
- Danxia Wu
- College of Agricultural, Guizhou University, Guiyang 550025, China;
| | - Muhammad Saleem
- Department of Biological Sciences, Alabama State University, Montgomery, AL 36104, USA;
| | - Tengbing He
- College of Agricultural, Guizhou University, Guiyang 550025, China;
- Institute of New Rural Development, West Campus, Guizhou University, Guiyang 550025, China
- Correspondence: (T.H.); (G.H.)
| | - Guandi He
- College of Agricultural, Guizhou University, Guiyang 550025, China;
- Correspondence: (T.H.); (G.H.)
| |
Collapse
|
8
|
Yin M, Goncearenco A, Berezovsky IN. Deriving and Using Descriptors of Elementary Functions in Rational Protein Design. FRONTIERS IN BIOINFORMATICS 2021; 1:657529. [PMID: 36303771 PMCID: PMC9581014 DOI: 10.3389/fbinf.2021.657529] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2021] [Accepted: 03/15/2021] [Indexed: 06/26/2024] Open
Abstract
The rational design of proteins with desired functions requires a comprehensive description of the functional building blocks. The evolutionary conserved functional units constitute nature's toolbox; however, they are not readily available to protein designers. This study focuses on protein units of subdomain size that possess structural properties and amino acid residues sufficient to carry out elementary reactions in the catalytic mechanisms. The interactions within such elementary functional loops (ELFs) and the interactions with the surrounding protein scaffolds constitute the descriptor of elementary function. The computational approach to deriving descriptors directly from protein sequences and structures and applying them in rational design was implemented in a proof-of-concept DEFINED-PROTEINS software package. Once the descriptor is obtained, the ELF can be fitted into existing or novel scaffolds to obtain the desired function. For instance, the descriptor may be used to determine the necessary spatial restraints in a fragment-based grafting protocol. We illustrated the approach by applying it to well-known cases of ELFs, including phosphate-binding P-loop, diphosphate-binding glycine-rich motif, and calcium-binding EF-hand motif, which could be used to jumpstart templates for user applications. The DEFINED-PROTEINS package is available for free at https://github.com/MelvinYin/Defined_Proteins.
Collapse
Affiliation(s)
- Melvin Yin
- Bioinformatics Institute, Agency for Science, Technology, and Research (ASTAR), Singapore, Singapore
| | - Alexander Goncearenco
- National Center for Biotechnology Information, National Institute of Health (NIH), Bethesda, MD, United States
| | - Igor N. Berezovsky
- Bioinformatics Institute, Agency for Science, Technology, and Research (ASTAR), Singapore, Singapore
- Department of Biological Sciences (DBS), National University of Singapore (NUS), Singapore, Singapore
| |
Collapse
|
9
|
Lopez-Del Rio A, Martin M, Perera-Lluna A, Saidi R. Effect of sequence padding on the performance of deep learning models in archaeal protein functional prediction. Sci Rep 2020; 10:14634. [PMID: 32884053 PMCID: PMC7471694 DOI: 10.1038/s41598-020-71450-8] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2020] [Accepted: 08/06/2020] [Indexed: 11/08/2022] Open
Abstract
The use of raw amino acid sequences as input for deep learning models for protein functional prediction has gained popularity in recent years. This scheme obliges to manage proteins with different lengths, while deep learning models require same-shape input. To accomplish this, zeros are usually added to each sequence up to a established common length in a process called zero-padding. However, the effect of different padding strategies on model performance and data structure is yet unknown. We propose and implement four novel types of padding the amino acid sequences. Then, we analysed the impact of different ways of padding the amino acid sequences in a hierarchical Enzyme Commission number prediction problem. Results show that padding has an effect on model performance even when there are convolutional layers implied. Contrastingly to most of deep learning works which focus mainly on architectures, this study highlights the relevance of the deemed-of-low-importance process of padding and raises awareness of the need to refine it for better performance. The code of this analysis is publicly available at https://github.com/b2slab/padding_benchmark .
Collapse
Affiliation(s)
- Angela Lopez-Del Rio
- B2SLab, Department d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028, Barcelona, Spain.
- Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Dèu, 08950, Esplugues de Llobregat, Spain.
| | - Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
| | - Alexandre Perera-Lluna
- B2SLab, Department d'Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, 08028, Barcelona, Spain
- Department of Biomedical Engineering, Institut de Recerca Pediàtrica Hospital Sant Joan de Dèu, 08950, Esplugues de Llobregat, Spain
| | - Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, CB10 1SD, UK
| |
Collapse
|
10
|
Chu XY, Zhang HY. Cofactors as Molecular Fossils To Trace the Origin and Evolution of Proteins. Chembiochem 2020; 21:3161-3168. [PMID: 32515532 DOI: 10.1002/cbic.202000027] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2020] [Revised: 06/03/2020] [Indexed: 12/16/2022]
Abstract
Due to their early origin and extreme conservation, cofactors are valuable molecular fossils for tracing the origin and evolution of proteins. First, as the order of protein folds binding with cofactors roughly coincides with protein-fold chronology, cofactors are considered to have facilitated the origin of primitive proteins by selecting them from pools of random amino acid sequences. Second, in the subsequent evolution of proteins, cofactors still played an important role. More interestingly, as metallic cofactors evolved with geochemical variations, some geochemical events left imprints in the chronology of protein architecture; this provides further evidence supporting the coevolution of biochemistry and geochemistry. In this paper, we attempt to review the molecular fossils used in tracing the origin and evolution of proteins, with a special focus on cofactors.
Collapse
Affiliation(s)
- Xin-Yi Chu
- Hubei Key Laboratory of Agricultural Bioinformatics College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| | - Hong-Yu Zhang
- Hubei Key Laboratory of Agricultural Bioinformatics College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
| |
Collapse
|
11
|
Holliday GL, Brown SD, Mischel D, Polacco BJ, Babbitt PC. A strategy for large-scale comparison of evolutionary- and reaction-based classifications of enzyme function. Database (Oxford) 2020; 2020:baaa034. [PMID: 32449511 PMCID: PMC7246345 DOI: 10.1093/database/baaa034] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/05/2019] [Revised: 03/18/2020] [Accepted: 04/27/2020] [Indexed: 12/12/2022]
Abstract
Determining the molecular function of enzymes discovered by genome sequencing represents a primary foundation for understanding many aspects of biology. Historically, classification of enzyme reactions has used the enzyme nomenclature system developed to describe the overall reactions performed by biochemically characterized enzymes, irrespective of their associated sequences. In contrast, functional classification and assignment for the millions of protein sequences of unknown function now available is largely done in two computational steps, first by similarity-based assignment of newly obtained sequences to homologous groups, followed by transferring to them the known functions of similar biochemically characterized homologs. Due to the fundamental differences in their etiologies and practice, `how' these chemistry- and evolution-centric functional classification systems relate to each other has been difficult to explore on a large scale. To investigate this issue in a new way, we integrated two published ontologies that had previously described each of these classification systems independently. The resulting infrastructure was then used to compare the functional assignments obtained from each classification system for the well-studied and functionally diverse enolase superfamily. Mapping these function assignments to protein structure and reaction similarity networks shows a profound and complex disconnect between the homology- and chemistry-based classification systems. This conclusion mirrors previous observations suggesting that except for closely related sequences, facile annotation transfer from small numbers of characterized enzymes to the huge number uncharacterized homologs to which they are related is problematic. Our extension of these comparisons to large enzyme superfamilies in a computationally intelligent manner provides a foundation for new directions in protein function prediction for the huge proportion of sequences of unknown function represented in major databases. Interactive sequence, reaction, substrate and product similarity networks computed for this work for the enolase and two other superfamilies are freely available for download from the Structure Function Linkage Database Archive (http://sfld.rbvi.ucsf.edu).
Collapse
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Present Address: Medicines Discovery Catapult, Mereside, Alderley Park, Alderley Edge SK10 4TG, UK
| | - Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
| | - David Mischel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
| | - Benjamin J Polacco
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
- Quantitative Biosciences Institute, University of California, San Francisco, San Francisco, 1700 4th Street, CA 94143, USA
| |
Collapse
|
12
|
Ribeiro AJM, Tyzack JD, Borkakoti N, Holliday GL, Thornton JM. A global analysis of function and conservation of catalytic residues in enzymes. J Biol Chem 2019; 295:314-324. [PMID: 31796628 DOI: 10.1074/jbc.rev119.006289] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022] Open
Abstract
The catalytic residues of an enzyme comprise the amino acids located in the active center responsible for accelerating the enzyme-catalyzed reaction. These residues lower the activation energy of reactions by performing several catalytic functions. Decades of enzymology research has established general themes regarding the roles of specific residues in these catalytic reactions, but it has been more difficult to explore these roles in a more systematic way. Here, we review the data on the catalytic residues of 648 enzymes, as annotated in the Mechanism and Catalytic Site Atlas (M-CSA), and compare our results with those in previous studies. We structured this analysis around three key properties of the catalytic residues: amino acid type, catalytic function, and sequence conservation in homologous proteins. As expected, we observed that catalysis is mostly accomplished by a small set of residues performing a limited number of catalytic functions. Catalytic residues are typically highly conserved, but to a smaller degree in homologues that perform different reactions or are nonenzymes (pseudoenzymes). Cross-analysis yielded further insights revealing which residues perform particular functions and how often. We obtained more detailed specificity rules for certain functions by identifying the chemical group upon which the residue acts. Finally, we show the mutation tolerance of the catalytic residues based on their roles. The characterization of the catalytic residues, their functions, and conservation, as presented here, is key to understanding the impact of mutations in evolution, disease, and enzyme design. The tools developed for this analysis are available at the M-CSA website and allow for user specific analysis of the same data.
Collapse
Affiliation(s)
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom.
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
| |
Collapse
|
13
|
Ribeiro AJM, Holliday GL, Furnham N, Tyzack JD, Ferris K, Thornton JM. Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res 2019; 46:D618-D623. [PMID: 29106569 PMCID: PMC5753290 DOI: 10.1093/nar/gkx1012] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2017] [Accepted: 10/13/2017] [Indexed: 12/28/2022] Open
Abstract
M-CSA (Mechanism and Catalytic Site Atlas) is a database of enzyme active sites and reaction mechanisms that can be accessed at www.ebi.ac.uk/thornton-srv/m-csa. Our objectives with M-CSA are to provide an open data resource for the community to browse known enzyme reaction mechanisms and catalytic sites, and to use the dataset to understand enzyme function and evolution. M-CSA results from the merging of two existing databases, MACiE (Mechanism, Annotation and Classification in Enzymes), a database of enzyme mechanisms, and CSA (Catalytic Site Atlas), a database of catalytic sites of enzymes. We are releasing M-CSA as a new website and underlying database architecture. At the moment, M-CSA contains 961 entries, 423 of these with detailed mechanism information, and 538 with information on the catalytic site residues only. In total, these cover 81% (195/241) of third level EC numbers with a PDB structure, and 30% (840/2793) of fourth level EC numbers with a PDB structure, out of 6028 in total. By searching for close homologues, we are able to extend M-CSA coverage of PDB and UniProtKB to 51 993 structures and to over five million sequences, respectively, of which about 40% and 30% have a conserved active site.
Collapse
Affiliation(s)
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 1HT, UK
| | - Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Katherine Ferris
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
14
|
Abstract
Plants produce a diverse portfolio of sesquiterpenes that are important in their response to herbivores and the interaction with other plants. Their biosynthesis from farnesyl diphosphate depends on the sesquiterpene synthases that admit different cyclizations and rearrangements to yield a blend of sesquiterpenes. Here, we investigate to what extent sesquiterpene biosynthesis metabolic pathways can be reconstructed just from the knowledge of the final product and the reaction mechanisms catalyzed by sesquiterpene synthases. We use the software package MedØlDatschgerl (MØD) to generate chemical networks and to elucidate pathways contained in them. As examples, we successfully consider the reachability of the important plant sesquiterpenes β -caryophyllene, α -humulene, and β -farnesene. We also introduce a graph database to integrate the simulation results with experimental biological evidence for the selected predicted sesquiterpenes biosynthesis.
Collapse
|
15
|
Silveira MC, Azevedo da Silva R, Faria da Mota F, Catanho M, Jardim R, R Guimarães AC, de Miranda AB. Systematic Identification and Classification of β-Lactamases Based on Sequence Similarity Criteria: β-Lactamase Annotation. Evol Bioinform Online 2018; 14:1176934318797351. [PMID: 30210232 PMCID: PMC6131288 DOI: 10.1177/1176934318797351] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2018] [Accepted: 08/08/2018] [Indexed: 12/11/2022] Open
Abstract
β-lactamases, the enzymes responsible for resistance to β-lactam antibiotics, are
widespread among prokaryotic genera. However, current β-lactamase classification
schemes do not represent their present diversity. Here, we propose a workflow to
identify and classify β-lactamases. Initially, a set of curated sequences was
used as a model for the construction of profiles Hidden Markov Models (HMM),
specific for each β-lactamase class. An extensive, nonredundant set of
β-lactamase sequences was constructed from 7 different resistance proteins
databases to test the methodology. The profiles HMM were improved for their
specificity and sensitivity and then applied to fully assembled genomes. Five
hierarchical classification levels are described, and a new class of
β-lactamases with fused domains is proposed. Our profiles HMM provide a better
annotation of β-lactamases, with classes and subclasses defined by objective
criteria such as sequence similarity. This classification offers a solid base to
the elaboration of studies on the diversity, dispersion, prevalence, and
evolution of the different classes and subclasses of this critical enzymatic
activity.
Collapse
Affiliation(s)
- Melise Chaves Silveira
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| | - Rangeline Azevedo da Silva
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| | - Fábio Faria da Mota
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| | - Marcos Catanho
- Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| | - Rodrigo Jardim
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| | - Ana Carolina R Guimarães
- Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| | - Antonio B de Miranda
- Laboratório de Biologia Computacional e Sistemas, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil
| |
Collapse
|
16
|
Sivakumar TV, Bhaduri A, Duvvuru Muni RR, Park JH, Kim TY. SimCAL: a flexible tool to compute biochemical reaction similarity. BMC Bioinformatics 2018; 19:254. [PMID: 29969981 PMCID: PMC6029250 DOI: 10.1186/s12859-018-2248-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2017] [Accepted: 06/14/2018] [Indexed: 11/29/2022] Open
Abstract
Background Computation of reaction similarity is a pre-requisite for several bioinformatics applications including enzyme identification for specific biochemical reactions, enzyme classification and mining for specific inhibitors. Reaction similarity is often assessed at either two levels: (i) comparison across all the constituent substrates and products of a reaction, reaction level similarity, (ii) comparison at the transformation center with various degrees of neighborhood, transformation level similarity. Existing reaction similarity computation tools are designed for specific applications and use different features and similarity measures. A single system integrating these diverse features enables comparison of the impact of different molecular properties on similarity score computation. Results To address these requirements, we present SimCAL, an integrated system to calculate reaction similarity with novel features and capability to perform comparative assessment. SimCAL provides reaction similarity computation at both whole reaction level and transformation level. Novel physicochemical features such as stereochemistry, mass, volume and charge are included in computing reaction fingerprint. Users can choose from four different fingerprint types and nine molecular similarity measures. Further, a comparative assessment of these features is also enabled. The performance of SimCAL is assessed on 3,688,122 reaction pairs with Enzyme Commission (EC) number from MetaCyc and achieved an area under the curve (AUC) of > 0.9. In addition, SimCAL results showed strong correlation with state-of-the-art EC-BLAST and molecular signature based reaction similarity methods. Conclusions SimCAL is developed in java and is available as a standalone tool, with intuitive, user-friendly graphical interface and also as a console application. With its customizable feature selection and similarity calculations, it is expected to cater a wide audience interested in studying and analyzing biochemical reactions and metabolic networks. Electronic supplementary material The online version of this article (10.1186/s12859-018-2248-5) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
| | - Anirban Bhaduri
- Bioinformatics Lab, Samsung Advanced Institute of Technology, Bangalore, 560037, India
| | | | - Jin Hwan Park
- Biomaterials Lab, Materials Center, Samsung Advanced Institute of Technology, Gyeonggi-do, 443803, South Korea
| | - Tae Yong Kim
- Biomaterials Lab, Materials Center, Samsung Advanced Institute of Technology, Gyeonggi-do, 443803, South Korea.
| |
Collapse
|
17
|
Holliday GL, Brown SD, Akiva E, Mischel D, Hicks MA, Morris JH, Huang CC, Meng EC, Pegg SCH, Ferrin TE, Babbitt PC. Biocuration in the structure-function linkage database: the anatomy of a superfamily. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:3074783. [PMID: 28365730 PMCID: PMC5467563 DOI: 10.1093/database/bax006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/28/2016] [Accepted: 01/23/2017] [Indexed: 12/11/2022]
Abstract
With ever-increasing amounts of sequence data available in both the primary literature and sequence repositories, there is a bottleneck in annotating molecular function to a sequence. This article describes the biocuration process and methods used in the structure-function linkage database (SFLD) to help address some of the challenges. We discuss how the hierarchy within the SFLD allows us to infer detailed functional properties for functionally diverse enzyme superfamilies in which all members are homologous, conserve an aspect of their chemical function and have associated conserved structural features that enable the chemistry. Also presented is the Enzyme Structure-Function Ontology (ESFO), which has been designed to capture the relationships between enzyme sequence, structure and function that underlie the SFLD and is used to guide the biocuration processes within the SFLD. Database URL:http://sfld.rbvi.ucsf.edu/
Collapse
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Shoshana D Brown
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Eyal Akiva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - David Mischel
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA
| | - Michael A Hicks
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA.,Human Longevity, Inc, San Diego, CA 92121, USA
| | - John H Morris
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
| | - Conrad C Huang
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
| | - Elaine C Meng
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA
| | | | - Thomas E Ferrin
- Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA.,California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94143, USA.,Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, CA 94143, USA.,California Institute for Quantitative Biosciences, University of California, San Francisco, CA 94158, USA
| |
Collapse
|
18
|
Protein Moonlighting Revealed by Noncatalytic Phenotypes of Yeast Enzymes. Genetics 2017; 208:419-431. [PMID: 29127264 DOI: 10.1534/genetics.117.300377] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/09/2017] [Accepted: 11/06/2017] [Indexed: 12/19/2022] Open
Abstract
A single gene can partake in several biological processes, and therefore gene deletions can lead to different-sometimes unexpected-phenotypes. However, it is not always clear whether such pleiotropy reflects the loss of a unique molecular activity involved in different processes or the loss of a multifunctional protein. Here, using Saccharomyces cerevisiae metabolism as a model, we systematically test the null hypothesis that enzyme phenotypes depend on a single annotated molecular function, namely their catalysis. We screened a set of carefully selected genes by quantifying the contribution of catalysis to gene deletion phenotypes under different environmental conditions. While most phenotypes were explained by loss of catalysis, slow growth was readily rescued by a catalytically inactive protein in about one-third of the enzymes tested. Such noncatalytic phenotypes were frequent in the Alt1 and Bat2 transaminases and in the isoleucine/valine biosynthetic enzymes Ilv1 and Ilv2, suggesting novel "moonlighting" activities in these proteins. Furthermore, differential genetic interaction profiles of gene deletion and catalytic mutants indicated that ILV1 is functionally associated with regulatory processes, specifically to chromatin modification. Our systematic study shows that gene loss phenotypes and their genetic interactions are frequently not driven by the loss of an annotated catalytic function, underscoring the moonlighting nature of cellular metabolism.
Collapse
|
19
|
Schomburg I, Jeske L, Ulbrich M, Placzek S, Chang A, Schomburg D. The BRENDA enzyme information system–From a database to an expert system. J Biotechnol 2017; 261:194-206. [DOI: 10.1016/j.jbiotec.2017.04.020] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2017] [Revised: 04/11/2017] [Accepted: 04/18/2017] [Indexed: 02/06/2023]
|
20
|
|
21
|
Lee SH, Hong SH, An JU, Kim KR, Kim DE, Kang LW, Oh DK. Structure-based prediction and identification of 4-epimerization activity of phosphate sugars in class II aldolases. Sci Rep 2017; 7:1934. [PMID: 28512318 PMCID: PMC5434028 DOI: 10.1038/s41598-017-02211-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/05/2017] [Accepted: 04/06/2017] [Indexed: 12/29/2022] Open
Abstract
Sugar 4-epimerization reactions are important for the production of rare sugars and their derivatives, which have various potential industrial applications. For example, the production of tagatose, a functional sweetener, from fructose by sugar 4-epimerization is currently constrained because a fructose 4-epimerase does not exist in nature. We found that class II d-fructose-1,6-bisphosphate aldolase (FbaA) catalyzed the 4-epimerization of d-fructose-6-phosphate (F6P) to d-tagatose-6-phosphate (T6P) based on the prediction via structural comparisons with epimerase and molecular docking and the identification of the condensed products of C3 sugars. In vivo, the 4-epimerization activity of FbaA is normally repressed. This can be explained by our results showing the catalytic efficiency of d-fructose-6-phosphate kinase for F6P phosphorylation was significantly higher than that of FbaA for F6P epimerization. Here, we identified the epimerization reactions and the responsible catalytic residues through observation of the reactions of FbaA and l-rhamnulose-1-phosphate aldolases (RhaD) variants with substituted catalytic residues using different substrates. Moreover, we obtained detailed potential epimerization reaction mechanism of FbaA and a general epimerization mechanism of the class II aldolases l-fuculose-1-phosphate aldolase, RhaD, and FbaA. Thus, class II aldolases can be used as 4-epimerases for the stereo-selective synthesis of valuable carbohydrates.
Collapse
Affiliation(s)
- Seon-Hwa Lee
- Department of Bioscience and Biotechnology, Konkuk University, Seoul, 05029, Republic of Korea
| | - Seung-Hye Hong
- Department of Bioscience and Biotechnology, Konkuk University, Seoul, 05029, Republic of Korea
| | - Jung-Ung An
- Department of Bioscience and Biotechnology, Konkuk University, Seoul, 05029, Republic of Korea
| | - Kyoung-Rok Kim
- Department of Bioscience and Biotechnology, Konkuk University, Seoul, 05029, Republic of Korea
| | - Dong-Eun Kim
- Department of Bioscience and Biotechnology, Konkuk University, Seoul, 05029, Republic of Korea
| | - Lin-Woo Kang
- Department of Biological Sciences, Konkuk University, Seoul, 05029, Republic of Korea
| | - Deok-Kun Oh
- Department of Bioscience and Biotechnology, Konkuk University, Seoul, 05029, Republic of Korea.
| |
Collapse
|
22
|
Abstract
The GO captures many aspects of functional annotations, but there are other alternative complementary sources of protein function information. For example, enzyme functional annotations are described in a range of resources from the Enzyme Commission (E.C.) hierarchical classification to the Kyoto Encyclopedia of Genes and Genomes (KEGG) to the Catalytic Site Atlas amongst many others. This chapter describes some of the main resources available and how they can be used in conjunction with GO.
Collapse
Affiliation(s)
- Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London, WC1E 1HT, UK.
| |
Collapse
|
23
|
Berezovsky IN, Guarnera E, Zheng Z. Basic units of protein structure, folding, and function. PROGRESS IN BIOPHYSICS AND MOLECULAR BIOLOGY 2016; 128:85-99. [PMID: 27697476 DOI: 10.1016/j.pbiomolbio.2016.09.009] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/29/2016] [Revised: 09/05/2016] [Accepted: 09/26/2016] [Indexed: 10/20/2022]
Abstract
Study of the hierarchy of domain structure with alternative sets of domains and analysis of discontinuous domains, consisting of remote segments of the polypeptide chain, raised a question about the minimal structural unit of the protein domain. The hypothesis on the decisive role of the polypeptide backbone in determining the elementary units of globular proteins have led to the discovery of closed loops. It is reviewed here how closed loops form the loop-n-lock structure of proteins, providing the foundation for stability and designability of protein folds/domain and underlying their co-translational folding. Simplified protein sequences are considered here with the aim to explore the basic principles that presumably dominated the folding and stability of proteins in the early stages of structural evolution. Elementary functional loops (EFLs), closed loops with one or few catalytic residues, are, in turn, units of the protein function. They are apparent descendants of the prebiotic ring-like peptides, which gave rise to the first functional folds/domains being fused in the beginning of the evolution of protein structure. It is also shown how evolutionary relations between protein functional superfamilies and folds delineated with the help of EFLs can contribute to establishing the rules for design of desired enzymatic functions. Generalized descriptors of the elementary functions are proposed to be used as basic units in the future computational design.
Collapse
Affiliation(s)
- Igor N Berezovsky
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore; Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore.
| | - Enrico Guarnera
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| | - Zejun Zheng
- Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| |
Collapse
|
24
|
Rahman SA, Torrance G, Baldacci L, Martínez Cuesta S, Fenninger F, Gopal N, Choudhary S, May JW, Holliday GL, Steinbeck C, Thornton JM. Reaction Decoder Tool (RDT): extracting features from chemical reactions. Bioinformatics 2016; 32:2065-6. [PMID: 27153692 PMCID: PMC4920114 DOI: 10.1093/bioinformatics/btw096] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/09/2015] [Accepted: 02/12/2016] [Indexed: 01/22/2023] Open
Abstract
Summary: Extracting chemical features like Atom–Atom Mapping (AAM), Bond Changes (BCs) and Reaction Centres from biochemical reactions helps us understand the chemical composition of enzymatic reactions. Reaction Decoder is a robust command line tool, which performs this task with high accuracy. It supports standard chemical input/output exchange formats i.e. RXN/SMILES, computes AAM, highlights BCs and creates images of the mapped reaction. This aids in the analysis of metabolic pathways and the ability to perform comparative studies of chemical reactions based on these features. Availability and implementation: This software is implemented in Java, supported on Windows, Linux and Mac OSX, and freely available at https://github.com/asad/ReactionDecoder Contact: asad@ebi.ac.uk or s9asad@gmail.com
Collapse
Affiliation(s)
- Syed Asad Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Congenica Ltd, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
| | - Gilliean Torrance
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Lorenzo Baldacci
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK DISI, University of Bologna, V.Le Risorgimento 2, Bologna, Italy
| | - Sergio Martínez Cuesta
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Cancer Research UK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Franz Fenninger
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Michael Smith Laboratories, the University of British Columbia, Vancouver, British Columbia V6T 1Z4, Canada
| | - Nimish Gopal
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Saket Choudhary
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - John W May
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Innovation Centre (Unit 23), Cambridge Science Park, NextMove Software Ltd, Cambridge CB4 0EY, UK
| | - Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Bioengineering, UCSF School of Pharmacy, San Francisco, CA 94158, USA
| | - Christoph Steinbeck
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
25
|
Abstract
Isomerization reactions are fundamental in biology, and isomers usually differ in their biological role and pharmacological effects. In this study, we have cataloged the isomerization reactions known to occur in biology using a combination of manual and computational approaches. This method provides a robust basis for comparison and clustering of the reactions into classes. Comparing our results with the Enzyme Commission (EC) classification, the standard approach to represent enzyme function on the basis of the overall chemistry of the catalyzed reaction, expands our understanding of the biochemistry of isomerization. The grouping of reactions involving stereoisomerism is straightforward with two distinct types (racemases/epimerases and cis-trans isomerases), but reactions entailing structural isomerism are diverse and challenging to classify using a hierarchical approach. This study provides an overview of which isomerases occur in nature, how we should describe and classify them, and their diversity.
Collapse
|
26
|
Beattie KE, De Ferrari L, Mitchell JBO. Why do Sequence Signatures Predict Enzyme Mechanism? Homology versus Chemistry. Evol Bioinform Online 2015; 11:267-74. [PMID: 26740739 PMCID: PMC4696837 DOI: 10.4137/ebo.s31482] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2015] [Revised: 11/04/2015] [Accepted: 11/08/2015] [Indexed: 01/25/2023] Open
Abstract
First, we identify InterPro sequence signatures representing evolutionary relatedness and, second, signatures identifying specific chemical machinery. Thus, we predict the chemical mechanisms of enzyme-catalyzed reactions from catalytic and non-catalytic subsets of InterPro signatures. We first scanned our 249 sequences using InterProScan and then used the MACiE database to identify those amino acid residues that are important for catalysis. The sequences were mutated in silico to replace these catalytic residues with glycine and then again scanned using InterProScan. Those signature matches from the original scan that disappeared on mutation were called catalytic. Mechanism was predicted using all signatures, only the 78 “catalytic” signatures, or only the 519 “non-catalytic” signatures. The non-catalytic signatures gave indistinguishable results from those for the whole feature set, with precision of 0.991 and sensitivity of 0.970. The catalytic signatures alone gave less impressive predictivity, with precision and sensitivity of 0.791 and 0.735, respectively. These results show that our successful prediction of enzyme mechanism is mostly by homology rather than by identifying catalytic machinery.
Collapse
Affiliation(s)
- Kirsten E Beattie
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
| | - Luna De Ferrari
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
| | - John B O Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
| |
Collapse
|
27
|
Mussa HY, De Ferrari L, Mitchell JBO. Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces. BMC Res Notes 2015; 8:744. [PMID: 26634450 PMCID: PMC4669639 DOI: 10.1186/s13104-015-1730-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2015] [Accepted: 11/20/2015] [Indexed: 11/25/2022] Open
Abstract
Background We recently reported that one may be able to predict with high accuracy the chemical mechanism of an enzyme by employing a simple pattern recognition approach: a k Nearest Neighbour rule with k = 1 (k1NN) and 321 InterPro sequence signatures as enzyme features. The nearest-neighbour rule is known to be highly sensitive to errors in the training data, in particular when the available training dataset is small. This was the case in our previous study, in which our dataset comprised 248 enzymes annotated against 71 enzymatic mechanism labels from the MACiE database. In the current study, we have carefully re-analysed our dataset and prediction results to “explain” why a high variance k1NN rule exhibited such remarkable classification performance. Results We find that enzymes with different chemical mechanism labels in this dataset reside in barely overlapping subspaces in the feature space defined by the 321 features selected. These features contain the appropriate information needed to accurately classify the enzymatic mechanisms, rendering our classification problem a basic look-up exercise. This observation dovetails with the low misclassification rate we reported. Conclusion Our results provide explanations for the “anomaly”—a basic nearest-neighbour algorithm exhibiting remarkable prediction performance for enzymatic mechanism despite the fact that the feature space was large and sparse. Our results also dovetail well with another finding we reported, namely that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also suggest simple rules that might enable one to inductively predict whether a novel enzyme possesses any of our 71 predefined mechanisms.
Collapse
Affiliation(s)
- Hamse Y Mussa
- EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, KY16 9ST, Scotland, UK.
| | - Luna De Ferrari
- The Centre for Genomic and Experimental Medicine, Western General Hospital, Crewe Road, Edinburgh, EH4 2XU, Scotland, UK.
| | - John B O Mitchell
- EaStCHEM School of Chemistry and Biomedical Sciences Research Complex, University of St Andrews, North Haugh, St Andrews, KY16 9ST, Scotland, UK.
| |
Collapse
|
28
|
Hagedoorn PL. Microbial Metalloproteomics. Proteomes 2015; 3:424-439. [PMID: 28248278 PMCID: PMC5217388 DOI: 10.3390/proteomes3040424] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2015] [Revised: 11/04/2015] [Accepted: 11/23/2015] [Indexed: 12/12/2022] Open
Abstract
Metalloproteomics is a rapidly developing field of science that involves the comprehensive analysis of all metal-containing or metal-binding proteins in a biological sample. The purpose of this review is to offer a comprehensive overview of the research involving approaches that can be categorized as inductively coupled plasma (ICP)-MS based methods, X-ray absorption/fluorescence, radionuclide based methods and bioinformatics. Important discoveries in microbial proteomics will be reviewed, as well as the outlook to new emerging approaches and research areas.
Collapse
Affiliation(s)
- Peter-Leon Hagedoorn
- Department of Biotechnology, Delft University of Technology, Julianalaan 67, Delft 2628 BC, The Netherlands.
| |
Collapse
|
29
|
Sillitoe I, Furnham N. FunTree: advances in a resource for exploring and contextualising protein function evolution. Nucleic Acids Res 2015; 44:D317-23. [PMID: 26590404 PMCID: PMC4702901 DOI: 10.1093/nar/gkv1274] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2015] [Accepted: 11/03/2015] [Indexed: 11/13/2022] Open
Abstract
FunTree is a resource that brings together protein sequence, structure and functional information, including overall chemical reaction and mechanistic data, for structurally defined domain superfamilies. Developed in tandem with the CATH database, the original FunTree contained just 276 superfamilies focused on enzymes. Here, we present an update of FunTree that has expanded to include 2340 superfamilies including both enzymes and proteins with non-enzymatic functions annotated by Gene Ontology (GO) terms. This allows the investigation of how novel functions have evolved within a structurally defined superfamily and provides a means to analyse trends across many superfamilies. This is done not only within the context of a protein's sequence and structure but also the relationships of their functions. New measures of functional similarity have been integrated, including for enzymes comparisons of overall reactions based on overall bond changes, reaction centres (the local environment atoms involved in the reaction) and the sub-structure similarities of the metabolites involved in the reaction and for non-enzymes semantic similarities based on the GO. To identify and highlight changes in function through evolution, ancestral character estimations are made and presented. All this is accessible through a new re-designed web interface that can be found at http://www.funtree.info.
Collapse
Affiliation(s)
- Ian Sillitoe
- Institute of Structural and Molecular Biology, University College London, Darwin Building, Gower Street, London WC1E 6BT, UK
| | - Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
| |
Collapse
|
30
|
Large-Scale Analysis Exploring Evolution of Catalytic Machineries and Mechanisms in Enzyme Superfamilies. J Mol Biol 2015; 428:253-267. [PMID: 26585402 PMCID: PMC4751976 DOI: 10.1016/j.jmb.2015.11.010] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2015] [Revised: 10/05/2015] [Accepted: 11/10/2015] [Indexed: 01/28/2023]
Abstract
Enzymes, as biological catalysts, form the basis of all forms of life. How these proteins have evolved their functions remains a fundamental question in biology. Over 100 years of detailed biochemistry studies, combined with the large volumes of sequence and protein structural data now available, means that we are able to perform large-scale analyses to address this question. Using a range of computational tools and resources, we have compiled information on all experimentally annotated changes in enzyme function within 379 structurally defined protein domain superfamilies, linking the changes observed in functions during evolution to changes in reaction chemistry. Many superfamilies show changes in function at some level, although one function often dominates one superfamily. We use quantitative measures of changes in reaction chemistry to reveal the various types of chemical changes occurring during evolution and to exemplify these by detailed examples. Additionally, we use structural information of the enzymes active site to examine how different superfamilies have changed their catalytic machinery during evolution. Some superfamilies have changed the reactions they perform without changing catalytic machinery. In others, large changes of enzyme function, in terms of both overall chemistry and substrate specificity, have been brought about by significant changes in catalytic machinery. Interestingly, in some superfamilies, relatives perform similar functions but with different catalytic machineries. This analysis highlights characteristics of functional evolution across a wide range of superfamilies, providing insights that will be useful in predicting the function of uncharacterised sequences and the design of new synthetic enzymes. Examining how enzyme function evolves using sequence, structure, and reaction mechanism data. Quantifying changes in reaction mechanisms reveals how function has diverged in many superfamilies. Homologous domains frequently use different catalytic residues, which sometimes perform the same enzyme chemistry. This large-scale analysis has significance in protein function prediction and enzyme design.
Collapse
|
31
|
Zheng Z, Goncearenco A, Berezovsky IN. Nucleotide binding database NBDB--a collection of sequence motifs with specific protein-ligand interactions. Nucleic Acids Res 2015; 44:D301-7. [PMID: 26507856 PMCID: PMC4702817 DOI: 10.1093/nar/gkv1124] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/19/2015] [Accepted: 10/14/2015] [Indexed: 11/14/2022] Open
Abstract
NBDB database describes protein motifs, elementary functional loops (EFLs) that are involved in binding of nucleotide-containing ligands and other biologically relevant cofactors/coenzymes, including ATP, AMP, ATP, GMP, GDP, GTP, CTP, PAP, PPS, FMN, FAD(H), NAD(H), NADP, cAMP, cGMP, c-di-AMP and c-di-GMP, ThPP, THD, F-420, ACO, CoA, PLP and SAM. The database is freely available online at http://nbdb.bii.a-star.edu.sg. In total, NBDB contains data on 249 motifs that work in interactions with 24 ligands. Sequence profiles of EFL motifs were derived de novo from nonredundant Uniprot proteome sequences. Conserved amino acid residues in the profiles interact specifically with distinct chemical parts of nucleotide-containing ligands, such as nitrogenous bases, phosphate groups, ribose, nicotinamide, and flavin moieties. Each EFL profile in the database is characterized by a pattern of corresponding ligand–protein interactions found in crystallized ligand–protein complexes. NBDB database helps to explore the determinants of nucleotide and cofactor binding in different protein folds and families. NBDB can also detect fragments that match to profiles of particular EFLs in the protein sequence provided by user. Comprehensive information on sequence, structures, and interactions of EFLs with ligands provides a foundation for experimental and computational efforts on design of required protein functions.
Collapse
Affiliation(s)
- Zejun Zheng
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
| | | | - Igor N Berezovsky
- Bioinformatics Institute, Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
| |
Collapse
|
32
|
The history of the CATH structural classification of protein domains. Biochimie 2015; 119:209-17. [PMID: 26253692 PMCID: PMC4678953 DOI: 10.1016/j.biochi.2015.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2015] [Accepted: 08/01/2015] [Indexed: 11/21/2022]
Abstract
This article presents a historical review of the protein structure classification database CATH. Together with the SCOP database, CATH remains comprehensive and reasonably up-to-date with the now more than 100,000 protein structures in the PDB. We review the expansion of the CATH and SCOP resources to capture predicted domain structures in the genome sequence data and to provide information on the likely functions of proteins mediated by their constituent domains. The establishment of comprehensive function annotation resources has also meant that domain families can be functionally annotated allowing insights into functional divergence and evolution within protein families. We present a historical review of the protein structure database CATH. We review the expansion of the CATH and SCOP resources with sequence data and functional annotations. How functional annotation resources allow insights into functional divergence and evolution within protein families.
Collapse
|
33
|
Goncearenco A, Berezovsky IN. Protein function from its emergence to diversity in contemporary proteins. Phys Biol 2015; 12:045002. [PMID: 26057563 DOI: 10.1088/1478-3975/12/4/045002] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022]
Abstract
The goal of this work is to learn from nature the rules that govern evolution and the design of protein function. The fundamental laws of physics lie in the foundation of the protein structure and all stages of the protein evolution, determining optimal sizes and shapes at different levels of structural hierarchy. We looked back into the very onset of the protein evolution with a goal to find elementary functions (EFs) that came from the prebiotic world and served as building blocks of the first enzymes. We defined the basic structural and functional units of biochemical reactions-elementary functional loops. The diversity of contemporary enzymes can be described via combinations of a limited number of elementary chemical reactions, many of which are performed by the descendants of primitive prebiotic peptides/proteins. By analyzing protein sequences we were able to identify EFs shared by seemingly unrelated protein superfamilies and folds and to unravel evolutionary relations between them. Binding and metabolic processing of the metal- and nucleotide-containing cofactors and ligands are among the most abundant ancient EFs that became indispensable in many natural enzymes. Highly designable folds provide structural scaffolds for many different biochemical reactions. We show that contemporary proteins are built from a limited number of EFs, making their analysis instrumental for establishing the rules for protein design. Evolutionary studies help us to accumulate the library of essential EFs and to establish intricate relations between different folds and functional superfamilies. Generalized sequence-structure descriptors of the EF will become useful in future design and engineering of desired enzymatic functions.
Collapse
Affiliation(s)
- Alexander Goncearenco
- Computational Biology Unit and Department of Informatics, University of Bergen, N-5008 Bergen, Norway
| | | |
Collapse
|
34
|
Holliday GL, Bairoch A, Bagos PG, Chatonnet A, Craik DJ, Finn RD, Henrissat B, Landsman D, Manning G, Nagano N, O’Donovan C, Pruitt KD, Rawlings ND, Saier M, Sowdhamini R, Spedding M, Srinivasan N, Vriend G, Babbitt PC, Bateman A. Key challenges for the creation and maintenance of specialist protein resources. Proteins 2015; 83:1005-13. [PMID: 25820941 PMCID: PMC4446195 DOI: 10.1002/prot.24803] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2015] [Revised: 03/06/2015] [Accepted: 03/20/2015] [Indexed: 11/12/2022]
Abstract
As the volume of data relating to proteins increases, researchers rely more and more on the analysis of published data, thus increasing the importance of good access to these data that vary from the supplemental material of individual articles, all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to their databases for a focused topic or family of proteins, using specialized approaches that are not feasible in the major reference databases. Many are labors of love, run by a single lab with little or no dedicated funding and there are many challenges to building and maintaining them. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on August 11 and 12, 2014. During this meeting some common key challenges involved in creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues impact our resources and illustrate ways in which our working together could enhance their accuracy, currency, and overall value.
Collapse
Affiliation(s)
- Gemma L Holliday
- Department of Bioengineering and Therapeutic Sciences, University of CaliforniaSan Francisco, California, 94158
| | - Amos Bairoch
- SIB—Swiss Institute of Bioinformatics, University of GenevaGeneva, Switzerland
| | - Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of ThessalyLamia, 35100, Greece
| | - Arnaud Chatonnet
- INRA, Umr866 Dynamique Musculaire Et MétabolismeMontpellier, F-34000, France
- Université MontpellierMontpellier, F-34000, France
| | - David J Craik
- Institute for Molecular Bioscience. The University of QueenslandBrisbane, Queensland, 4072, Australia
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)Wellcome Trust Genome Campus, Hinxton, Cambridge, Cb10 1SD, United Kingdom
| | - Bernard Henrissat
- Architecture Et Fonction Des Macromolécules Biologiques, CNRS, Aix-Marseille UniversitéMarseille, 13288, France
- Department of Biological Sciences, King Abdulaziz UniversityJeddah, Saudi Arabia
| | - David Landsman
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of HealthBethesda, Maryland, 20892
| | - Gerard Manning
- Department of Bioinformatics & Computational Biology, Genentech1 DNA Way, South San Francisco, California, 98010
| | - Nozomi Nagano
- Computational Biology Research Center, National Institute of Advanced Industrial Science and TechnologyTokyo, 135-0064, Japan
| | - Claire O’Donovan
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)Wellcome Trust Genome Campus, Hinxton, Cambridge, Cb10 1SD, United Kingdom
| | - Kim D Pruitt
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of HealthBethesda, Maryland, 20892
| | - Neil D Rawlings
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)Wellcome Trust Genome Campus, Hinxton, Cambridge, Cb10 1SD, United Kingdom
- Wellcome Trust Sanger InstituteWellcome Trust Genome Campus, Hinxton, Cambridge, Cb10 1SD, United Kingdom
| | - Milton Saier
- Department of Molecular Biology, University of California at San DiegoLa Jolla, California, 92093
| | - Ramanathan Sowdhamini
- National Centre for Biological Sciences, TIFRGKVK Campus, Bellary Road, Bangalore, 560065, India
| | - Michael Spedding
- Chair NC-IUPHAR, Spedding Research Solutions SARL6 Rue Ampere, Le Vesinet, 78110, France
| | | | - Gert Vriend
- Centre for Molecular and Biomolecular Informatics (CMBI), Radboud University Medical Center, Geert Grooteplein Zuid 26-28, 6525 GANijmegen, The Netherlands
| | - Patricia C Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of CaliforniaSan Francisco, California, 94158
| | - Alex Bateman
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI)Wellcome Trust Genome Campus, Hinxton, Cambridge, Cb10 1SD, United Kingdom
| |
Collapse
|
35
|
Bastian FB, Chibucos MC, Gaudet P, Giglio M, Holliday GL, Huang H, Lewis SE, Niknejad A, Orchard S, Poux S, Skunca N, Robinson-Rechavi M. The Confidence Information Ontology: a step towards a standard for asserting confidence in annotations. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2015; 2015:bav043. [PMID: 25957950 PMCID: PMC4425939 DOI: 10.1093/database/bav043] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/19/2014] [Accepted: 04/15/2015] [Indexed: 02/01/2023]
Abstract
Biocuration has become a cornerstone for analyses in biology, and to meet needs, the amount of annotations has considerably grown in recent years. However, the reliability of these annotations varies; it has thus become necessary to be able to assess the confidence in annotations. Although several resources already provide confidence information about the annotations that they produce, a standard way of providing such information has yet to be defined. This lack of standardization undermines the propagation of knowledge across resources, as well as the credibility of results from high-throughput analyses. Seeded at a workshop during the Biocuration 2012 conference, a working group has been created to address this problem. We present here the elements that were identified as essential for assessing confidence in annotations, as well as a draft ontology—the Confidence Information Ontology—to illustrate how the problems identified could be addressed. We hope that this effort will provide a home for discussing this major issue among the biocuration community. Tracker URL:https://github.com/BgeeDB/confidence-information-ontology Ontology URL:https://raw.githubusercontent.com/BgeeDB/confidence-information-ontology/master/src/ontology/cio-simple.obo
Collapse
Affiliation(s)
- Frederic B Bastian
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley Nat
| | - Marcus C Chibucos
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Pascale Gaudet
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Michelle Giglio
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Gemma L Holliday
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Hong Huang
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Suzanna E Lewis
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Anne Niknejad
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley Nat
| | - Sandra Orchard
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Sylvain Poux
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK
| | - Nives Skunca
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley Nat
| | - Marc Robinson-Rechavi
- Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley National Lab, 1 Cyclotron Rd., Berkeley, 94720 CA USA, European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK, Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Centre Medical Universitaire, Geneva, Switzerland, ETH Zurich, Department of Computer Science, Universitätstr. 19, 8092 Zürich, Switzerland, SIB Swiss Institute of Bioinformatics, Universitätstr. 6, 8092 Zürich, Switzerland and University College London, Gower St, London WC1E 6BT, UK Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland, SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland, Department of Microbiology and Immunology and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, SIB Swiss Institute of Bioinformatics, 1 Rue Michel Servet, 1211 Geneva, Switzerland, Department of Medicine and Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore MD, USA, Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA 94158, USA, School of Information, University of South Florida, Tampa, FL, 33647, USA, Genomics Division, Lawrence Berkeley Nat
| |
Collapse
|
36
|
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 251] [Impact Index Per Article: 27.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
Collapse
Affiliation(s)
- Andrew Currin
- Manchester Institute of Biotechnology , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK . ; http://dbkgroup.org/; @dbkell ; Tel: +44 (0)161 306 4492
- School of Chemistry , The University of Manchester , Manchester M13 9PL , UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM) , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK
| | - Neil Swainston
- Manchester Institute of Biotechnology , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK . ; http://dbkgroup.org/; @dbkell ; Tel: +44 (0)161 306 4492
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM) , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK
- School of Computer Science , The University of Manchester , Manchester M13 9PL , UK
| | - Philip J. Day
- Manchester Institute of Biotechnology , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK . ; http://dbkgroup.org/; @dbkell ; Tel: +44 (0)161 306 4492
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM) , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK
- Faculty of Medical and Human Sciences , The University of Manchester , Manchester M13 9PT , UK
| | - Douglas B. Kell
- Manchester Institute of Biotechnology , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK . ; http://dbkgroup.org/; @dbkell ; Tel: +44 (0)161 306 4492
- School of Chemistry , The University of Manchester , Manchester M13 9PL , UK
- Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM) , The University of Manchester , 131, Princess St , Manchester M1 7DN , UK
| |
Collapse
|
37
|
Morgat A, Axelsen KB, Lombardot T, Alcántara R, Aimo L, Zerara M, Niknejad A, Belda E, Hyka-Nouspikel N, Coudert E, Redaschi N, Bougueleret L, Steinbeck C, Xenarios I, Bridge A. Updates in Rhea--a manually curated resource of biochemical reactions. Nucleic Acids Res 2014; 43:D459-64. [PMID: 25332395 PMCID: PMC4384025 DOI: 10.1093/nar/gku961] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
Rhea (http://www.ebi.ac.uk/rhea) is a comprehensive and non-redundant resource of expert-curated biochemical reactions described using species from the ChEBI (Chemical Entities of Biological Interest) ontology of small molecules. Rhea has been designed for the functional annotation of enzymes and the description of genome-scale metabolic networks, providing stoichiometrically balanced enzyme-catalyzed reactions (covering the IUBMB Enzyme Nomenclature list and additional reactions), transport reactions and spontaneously occurring reactions. Rhea reactions are extensively curated with links to source literature and are mapped to other publicly available enzyme and pathway databases such as Reactome, BioCyc, KEGG and UniPathway, through manual curation and computational methods. Here we describe developments in Rhea since our last report in the 2012 database issue of Nucleic Acids Research. These include significant growth in the number of Rhea reactions and the inclusion of reactions involving complex macromolecules such as proteins, nucleic acids and other polymers that lie outside the scope of ChEBI. Together these developments will significantly increase the utility of Rhea as a tool for the description, analysis and reconciliation of genome-scale metabolic models.
Collapse
Affiliation(s)
- Anne Morgat
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland Genoscope-LABGeM, CEA, Evry, F-91057, France
| | - Kristian B Axelsen
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Thierry Lombardot
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Rafael Alcántara
- Equipe BAMBOO, INRIA Grenoble Rhône-Alpes, Montbonnot Saint-Martin, F-38330, France
| | - Lucila Aimo
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Mohamed Zerara
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Anne Niknejad
- Cheminformatics and Metabolism Team, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Eugeni Belda
- Department of Biochemistry, University of Geneva, Geneva, CH-1206, Switzerland
| | - Nevila Hyka-Nouspikel
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Elisabeth Coudert
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Nicole Redaschi
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Lydie Bougueleret
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| | - Christoph Steinbeck
- Equipe BAMBOO, INRIA Grenoble Rhône-Alpes, Montbonnot Saint-Martin, F-38330, France
| | - Ioannis Xenarios
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland Cheminformatics and Metabolism Team, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
| | - Alan Bridge
- Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, CH-1206, Switzerland
| |
Collapse
|
38
|
Nagano N, Nakayama N, Ikeda K, Fukuie M, Yokota K, Doi T, Kato T, Tomii K. EzCatDB: the enzyme reaction database, 2015 update. Nucleic Acids Res 2014; 43:D453-8. [PMID: 25324316 PMCID: PMC4384017 DOI: 10.1093/nar/gku946] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023] Open
Abstract
The EzCatDB database (http://ezcatdb.cbrc.jp/EzCatDB/) has emphasized manual classification of enzyme reactions from the viewpoints of enzyme active-site structures and their catalytic mechanisms based on literature information, amino acid sequences of enzymes (UniProtKB) and the corresponding tertiary structures from the Protein Data Bank (PDB). Reaction types such as hydrolysis, transfer, addition, elimination, isomerization, hydride transfer and electron transfer have been included in the reaction classification, RLCP. This database includes information related to ligand molecules on the enzyme structures in the PDB data, classified in terms of cofactors, substrates, products and intermediates, which are also necessary to elucidate the catalytic mechanisms. Recently, the database system was updated. The 3D structures of active sites for each PDB entry can be viewed using Jmol or Rasmol software. Moreover, sequence search systems of two types were developed for the EzCatDB database: EzCat-BLAST and EzCat-FORTE. EzCat-BLAST is suitable for quick searches, adopting the BLAST algorithm, whereas EzCat-FORTE is more suitable for detecting remote homologues, adopting the algorithm for FORTE protein structure prediction software. Another system, EzMetAct, is also available to searching for major active-site structures in EzCatDB, for which PDB-formatted queries can be searched.
Collapse
Affiliation(s)
- Nozomi Nagano
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Naoko Nakayama
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
| | - Kazuyoshi Ikeda
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
| | - Masaru Fukuie
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
| | - Kiyonobu Yokota
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
| | - Takuo Doi
- Level Five Co. Ltd., Grove Tower 4112, 4-21-1, Shibaura, Minato-ku, Tokyo 108-0023, Japan
| | - Tsuyoshi Kato
- Faculty of Science and Engineering, Gunma University, 1-5-1 Tenjin-cho, Kiryu, Gunma 376-8515, Japan Center for Informational Biology, Ochanomizu University, 2-1-1 Otsuka, Bunkyo-ku, Tokyo 112-8610, Japan
| | - Kentaro Tomii
- Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST), Tokyo Waterfront Bio-IT Research Building, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
| |
Collapse
|
39
|
Alderson RG, Barker D, Mitchell JBO. One origin for metallo-β-lactamase activity, or two? An investigation assessing a diverse set of reconstructed ancestral sequences based on a sample of phylogenetic trees. J Mol Evol 2014; 79:117-29. [PMID: 25185655 PMCID: PMC4185109 DOI: 10.1007/s00239-014-9639-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2014] [Accepted: 08/11/2014] [Indexed: 01/04/2023]
Abstract
Bacteria use metallo-β-lactamase enzymes to hydrolyse lactam rings found in many antibiotics, rendering them ineffective. Metallo-β-lactamase activity is thought to be polyphyletic, having arisen on more than one occasion within a single functionally diverse homologous superfamily. Since discovery of multiple origins of enzymatic activity conferring antibiotic resistance has broad implications for the continued clinical use of antibiotics, we test the hypothesis of polyphyly further; if lactamase function has arisen twice independently, the most recent common ancestor (MRCA) is not expected to possess lactam-hydrolysing activity. Two major problems present themselves. Firstly, even with a perfectly known phylogeny, ancestral sequence reconstruction is error prone. Secondly, the phylogeny is not known, and in fact reconstructing a single, unambiguous phylogeny for the superfamily has proven impossible. To obtain a more statistical view of the strength of evidence for or against MRCA lactamase function, we reconstructed a sample of 98 MRCAs of the metallo-β-lactamases, each based on a different tree in a bootstrap sample of reconstructed phylogenies. InterPro sequence signatures and homology modelling were then used to assess our sample of MRCAs for lactamase functionality. Only 5 % of these models conform to our criteria for metallo-β-lactamase functionality, suggesting that the ancestor was unlikely to have been a metallo-β-lactamase. On the other hand, given that ancestral proteins may have had metallo-β-lactamase functionality with variation in sequence and structural properties compared with extant enzymes, our criteria are conservative, estimating a lower bound of evidence for metallo-β-lactamase functionality but not an upper bound.
Collapse
Affiliation(s)
- Rosanna G. Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland, UK
| | - Daniel Barker
- Sir Harold Mitchell Building, School of Biology, University of St Andrews, St Andrews, KY16 9TH Scotland, UK
| | - John B. O. Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, KY16 9ST Scotland, UK
| |
Collapse
|
40
|
A systems approach to predict oncometabolites via context-specific genome-scale metabolic networks. PLoS Comput Biol 2014; 10:e1003837. [PMID: 25232952 PMCID: PMC4168981 DOI: 10.1371/journal.pcbi.1003837] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2013] [Accepted: 07/29/2014] [Indexed: 02/08/2023] Open
Abstract
Altered metabolism in cancer cells has been viewed as a passive response required for a malignant transformation. However, this view has changed through the recently described metabolic oncogenic factors: mutated isocitrate dehydrogenases (IDH), succinate dehydrogenase (SDH), and fumarate hydratase (FH) that produce oncometabolites that competitively inhibit epigenetic regulation. In this study, we demonstrate in silico predictions of oncometabolites that have the potential to dysregulate epigenetic controls in nine types of cancer by incorporating massive scale genetic mutation information (collected from more than 1,700 cancer genomes), expression profiling data, and deploying Recon 2 to reconstruct context-specific genome-scale metabolic models. Our analysis predicted 15 compounds and 24 substructures of potential oncometabolites that could result from the loss-of-function and gain-of-function mutations of metabolic enzymes, respectively. These results suggest a substantial potential for discovering unidentified oncometabolites in various forms of cancers. Cancer and metabolism have been considered to be associated for a long period since Otto Warburg observed that tumor cells consume glucose and convert most of it to lactate, despite the presence of oxygen. However, the role of the Warburg effect in oncogenesis had been under doubt because no solid links between genetic variations in metabolic genes and cancer had been observed until recent days. When with the development of sequencing technologies, researchers found mutations in IDH1, IDH2 (isocitrate dehydrogenase) in medium-grade glioma and acute leukemia. As these mutated metabolic genes initiate unexpected enzymatic reactions, cancer cells show altered concentration of particular metabolites, here called “oncometabolites”. The oncometabolites regulate the epigenetic controls of cell differentiations. In this study, we predict potential oncometabolites that might originate from loss or gain-of-function mutations in nine types of cancer from massive scale cancer mutation data with a systems biology approach.
Collapse
|
41
|
Martinez Cuesta S, Furnham N, Rahman SA, Sillitoe I, Thornton JM. The evolution of enzyme function in the isomerases. Curr Opin Struct Biol 2014; 26:121-30. [PMID: 25000289 PMCID: PMC4139412 DOI: 10.1016/j.sbi.2014.06.002] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2014] [Revised: 06/02/2014] [Accepted: 06/10/2014] [Indexed: 01/14/2023]
Abstract
The advent of computational approaches to measure functional similarity between enzymes adds a new dimension to existing evolutionary studies based on sequence and structure. This paper reviews research efforts aiming to understand the evolution of enzyme function in superfamilies, presenting a novel strategy to provide an overview of the evolution of enzymes belonging to an individual EC class, using the isomerases as an exemplar.
Collapse
Affiliation(s)
- Sergio Martinez Cuesta
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
| | - Nicholas Furnham
- Department of Pathogen Molecular Biology, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, United Kingdom
| | - Syed Asad Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom
| | - Ian Sillitoe
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, Gower Street, London, WC1E 6BT, United Kingdom
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, United Kingdom.
| |
Collapse
|
42
|
De Ferrari L, Mitchell JBO. From sequence to enzyme mechanism using multi-label machine learning. BMC Bioinformatics 2014; 15:150. [PMID: 24885296 PMCID: PMC4229970 DOI: 10.1186/1471-2105-15-150] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/06/2014] [Accepted: 05/07/2014] [Indexed: 12/30/2022] Open
Abstract
Background In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors, details with important consequences for drug and enzyme design. Work that predicts enzyme catalytic activity based on 3D protein structure features limits the prediction of mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling. Results In this study, we evaluate whether sequence identity, InterPro or Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. By splitting MACiE (Mechanism, Annotation and Classification in Enzymes database) mechanism labels to a finer granularity, which includes the role of the protein chain in the overall enzyme complex, the method can predict at 96% accuracy (and 96% micro-averaged precision, 99.9% macro-averaged recall) the MACiE mechanism definitions of 248 proteins available in the MACiE, EzCatDb (Database of Enzyme Catalytic Mechanisms) and SFLD (Structure Function Linkage Database) databases using an off-the-shelf K-Nearest Neighbours multi-label algorithm. Conclusion We find that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also find that incorporating Catalytic Site Atlas attributes does not seem to provide additional accuracy. The software code (ml2db), data and results are available online at
http://sourceforge.net/projects/ml2db/ and as supplementary files.
Collapse
Affiliation(s)
- Luna De Ferrari
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland KY16 9ST, UK.
| | | |
Collapse
|
43
|
Holliday GL, Rahman SA, Furnham N, Thornton JM. Exploring the biological and chemical complexity of the ligases. J Mol Biol 2014; 426:2098-111. [PMID: 24657765 PMCID: PMC4018984 DOI: 10.1016/j.jmb.2014.03.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2013] [Revised: 03/01/2014] [Accepted: 03/14/2014] [Indexed: 12/03/2022]
Abstract
Using a novel method to map and cluster chemical reactions, we have re-examined the chemistry of the ligases [Enzyme Commission (EC) Class 6] and their associated protein families in detail. The type of bond formed by the ligase can be automatically extracted from the equation of the reaction, replicating the EC subclass division. However, this subclass division hides considerable complexities, especially for the C-N forming ligases, which fall into at least three distinct types. The lower levels of the EC classification for ligases are somewhat arbitrary in their definition and add little to understanding their chemistry or evolution. By comparing the multi-domain architecture of the enzymes and using sequence similarity networks, we examined the links between overall reaction and evolution of the ligases. These show that, whilst many enzymes that perform the same overall chemistry group together, both convergent (similar function, different ancestral lineage) and divergent (different function, common ancestor) evolution of function are observed. However, a common theme is that a single conserved domain (often the nucleoside triphosphate binding domain) is combined with ancillary domains that provide the variation in substrate binding and function.
Collapse
Affiliation(s)
- Gemma L Holliday
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
| | - Syed Asad Rahman
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Nicholas Furnham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| |
Collapse
|
44
|
Nath N, Mitchell JBO, Caetano-Anollés G. The natural history of biocatalytic mechanisms. PLoS Comput Biol 2014; 10:e1003642. [PMID: 24874434 PMCID: PMC4038463 DOI: 10.1371/journal.pcbi.1003642] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2013] [Accepted: 04/09/2014] [Indexed: 11/29/2022] Open
Abstract
Phylogenomic analysis of the occurrence and abundance of protein domains in proteomes has recently showed that the α/β architecture is probably the oldest fold design. This holds important implications for the origins of biochemistry. Here we explore structure-function relationships addressing the use of chemical mechanisms by ancestral enzymes. We test the hypothesis that the oldest folds used the most mechanisms. We start by tracing biocatalytic mechanisms operating in metabolic enzymes along a phylogenetic timeline of the first appearance of homologous superfamilies of protein domain structures from CATH. A total of 335 enzyme reactions were retrieved from MACiE and were mapped over fold age. We define a mechanistic step type as one of the 51 mechanistic annotations given in MACiE, and each step of each of the 335 mechanisms was described using one or more of these annotations. We find that the first two folds, the P-loop containing nucleotide triphosphate hydrolase and the NAD(P)-binding Rossmann-like homologous superfamilies, were α/β architectures responsible for introducing 35% (18/51) of the known mechanistic step types. We find that these two oldest structures in the phylogenomic analysis of protein domains introduced many mechanistic step types that were later combinatorially spread in catalytic history. The most common mechanistic step types included fundamental building blocks of enzyme chemistry: "Proton transfer," "Bimolecular nucleophilic addition," "Bimolecular nucleophilic substitution," and "Unimolecular elimination by the conjugate base." They were associated with the most ancestral fold structure typical of P-loop containing nucleotide triphosphate hydrolases. Over half of the mechanistic step types were introduced in the evolutionary timeline before the appearance of structures specific to diversified organisms, during a period of architectural diversification. The other half unfolded gradually after organismal diversification and during a period that spanned ∼2 billion years of evolutionary history.
Collapse
Affiliation(s)
- Neetika Nath
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Scotland, United Kingdom
| | - John B. O. Mitchell
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, University of St. Andrews, North Haugh, St. Andrews, Scotland, United Kingdom
| | - Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
| |
Collapse
|
45
|
Reddy JG, Hosur RV. Complete backbone and DENQ side chain NMR assignments in proteins from a single experiment: implications to structure-function studies. JOURNAL OF STRUCTURAL AND FUNCTIONAL GENOMICS 2014; 15:25-32. [PMID: 24535112 DOI: 10.1007/s10969-014-9175-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/11/2013] [Accepted: 02/10/2014] [Indexed: 06/03/2023]
Abstract
Resonance assignment is the first and the most crucial step in all nuclear magnetic resonance (NMR) investigations on structure-function relationships in biological macromolecules. Often, the assignment exercise has to be repeated several times when specific interactions with ligands, substrates etc., have to be elucidated for understanding the functional mechanisms. While the protein backbone serves to provide a scaffold, the side chains interact directly with the ligands. Such investigations will be greatly facilitated, if there are rapid methods for obtaining exhaustive information with minimum of NMR experimentation. In this context, we present here a pulse sequence which exploits the recently introduced technique of parallel detection of multiple nuclei, e.g. (1)H and (13)C, and results in two 3D-data sets simultaneously. These yield complete backbone resonance assignment ((1)H(N), (15)N, (13)CO, (1)Hα/(13)Cα, and (1)Hβ/(13)Cβ chemical shifts) and side chain assignment of D, E, N and Q residues. Such an exhaustive assignment has the potential of yielding accurate 3D structures using one or more of several algorithms which calculate structures of the molecules very reliably on the basis of NMR chemical shifts alone. The side chain assignments of D, E, N, and Q will be extremely valuable for interaction studies with different ligands; D and E side chains are known to be involved in majority of catalytic activities. Utility of this experiment has been demonstrated with Ca(2+) bound M-crystallin, which contains largely D, E, N and Q residues at the metal binding sites.
Collapse
Affiliation(s)
- Jithender G Reddy
- Department of Chemical Sciences, Tata Institute of Fundamental Research, Homi Bhabha Road, Mumbai, 400 005, India
| | | |
Collapse
|
46
|
Ranaghan KE, Hung JE, Bartlett GJ, Mooibroek TJ, Harvey JN, Woolfson DN, van der Donk WA, Mulholland AJ. A catalytic role for methionine revealed by a combination of computation and experiments on phosphite dehydrogenase. Chem Sci 2014. [DOI: 10.1039/c3sc53009d] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023] Open
Abstract
Novel role for methionine in enzyme catalysis.
Collapse
Affiliation(s)
- Kara E. Ranaghan
- Centre for Computational Chemistry
- School of Chemistry
- University of Bristol
- Bristol, UK
- School of Chemistry
| | - John E. Hung
- Department of Chemistry and The Howard Hughes Medical Institute
- University of Illinois at Urbana-Champaign
- Urbana, USA
| | | | | | - Jeremy N. Harvey
- Centre for Computational Chemistry
- School of Chemistry
- University of Bristol
- Bristol, UK
- School of Chemistry
| | - Derek N. Woolfson
- School of Chemistry
- University of Bristol
- Bristol, UK
- School of Biochemistry
- Medical Sciences
| | - Wilfred A. van der Donk
- Department of Chemistry and The Howard Hughes Medical Institute
- University of Illinois at Urbana-Champaign
- Urbana, USA
| | - Adrian J. Mulholland
- Centre for Computational Chemistry
- School of Chemistry
- University of Bristol
- Bristol, UK
- School of Chemistry
| |
Collapse
|
47
|
|
48
|
Furnham N, Holliday GL, de Beer TAP, Jacobsen JOB, Pearson WR, Thornton JM. The Catalytic Site Atlas 2.0: cataloging catalytic sites and residues identified in enzymes. Nucleic Acids Res 2013; 42:D485-9. [PMID: 24319146 PMCID: PMC3964973 DOI: 10.1093/nar/gkt1243] [Citation(s) in RCA: 127] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Understanding which are the catalytic residues in an enzyme and what function they perform is crucial to many biology studies, particularly those leading to new therapeutics and enzyme design. The original version of the Catalytic Site Atlas (CSA) (http://www.ebi.ac.uk/thornton-srv/databases/CSA) published in 2004, which catalogs the residues involved in enzyme catalysis in experimentally determined protein structures, had only 177 curated entries and employed a simplistic approach to expanding these annotations to homologous enzyme structures. Here we present a new version of the CSA (CSA 2.0), which greatly expands the number of both curated (968) and automatically annotated catalytic sites in enzyme structures, utilizing a new method for annotation transfer. The curated entries are used, along with the variation in residue type from the sequence comparison, to generate 3D templates of the catalytic sites, which in turn can be used to find catalytic sites in new structures. To ease the transfer of CSA annotations to other resources a new ontology has been developed: the Enzyme Mechanism Ontology, which has permitted the transfer of annotations to Mechanism, Annotation and Classification in Enzymes (MACiE) and UniProt Knowledge Base (UniProtKB) resources. The CSA database schema has been re-designed and both the CSA data and search capabilities are presented in a new modern web interface.
Collapse
Affiliation(s)
- Nicholas Furnham
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and Department of Biochemistry and Molecular Genetics, University of Virginia, 1300 Jefferson Park Ave., Charlottesville, VA 22908, USA
| | | | | | | | | | | |
Collapse
|
49
|
Akiva E, Brown S, Almonacid DE, Barber AE, Custer AF, Hicks MA, Huang CC, Lauck F, Mashiyama ST, Meng EC, Mischel D, Morris JH, Ojha S, Schnoes AM, Stryke D, Yunes JM, Ferrin TE, Holliday GL, Babbitt PC. The Structure-Function Linkage Database. Nucleic Acids Res 2013; 42:D521-30. [PMID: 24271399 PMCID: PMC3965090 DOI: 10.1093/nar/gkt1130] [Citation(s) in RCA: 180] [Impact Index Per Article: 16.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
The Structure–Function Linkage Database (SFLD, http://sfld.rbvi.ucsf.edu/) is a manually curated classification resource describing structure–function relationships for functionally diverse enzyme superfamilies. Members of such superfamilies are diverse in their overall reactions yet share a common ancestor and some conserved active site features associated with conserved functional attributes such as a partial reaction. Thus, despite their different functions, members of these superfamilies ‘look alike’, making them easy to misannotate. To address this complexity and enable rational transfer of functional features to unknowns only for those members for which we have sufficient functional information, we subdivide superfamily members into subgroups using sequence information, and lastly into families, sets of enzymes known to catalyze the same reaction using the same mechanistic strategy. Browsing and searching options in the SFLD provide access to all of these levels. The SFLD offers manually curated as well as automatically classified superfamily sets, both accompanied by search and download options for all hierarchical levels. Additional information includes multiple sequence alignments, tab-separated files of functional and other attributes, and sequence similarity networks. The latter provide a new and intuitively powerful way to visualize functional trends mapped to the context of sequence similarity.
Collapse
Affiliation(s)
- Eyal Akiva
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA 94158, USA, Universidad Andres Bello, Center for Bioinformatics and Integrative Biology, Facultad de Ciencias Biologicas, Santiago 8370146, Chile, Nodality, Inc., South San Francisco, CA 94080, USA, Department of Electrical and Computer Engineering, College of Engineering, Boston University, Boston, MA 02215, USA, Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA, Department of Pharmaceutical Chemistry, School of Pharmacy, University of California, San Francisco, San Francisco, CA 94158, USA, Center for Bioinformatics (ZBH), University of Hamburg, Hamburg 20146, Germany, Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA, School of Medicine, University of California, San Francisco, San Francisco, CA 94143, USA, UC Berkeley - UCSF Graduate Program in Bioengineering, University of California, San Francisco, CA 94158 and Berkeley, CA 94720, USA and California Institute for Quantitative Biosciences, University of California, San Francisco, San Francisco, CA 94158, USA
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
50
|
Predictions of Enzymatic Parameters: A Mini-Review with Focus on Enzymes for Biofuel. Appl Biochem Biotechnol 2013; 171:590-615. [DOI: 10.1007/s12010-013-0328-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2013] [Accepted: 06/11/2013] [Indexed: 12/25/2022]
|