1
Elisée E, Ducrot L, Méheust R, Bastard K, Fossey-Jouenne A, Grogan G, Pelletier E, Petit JL, Stam M, de Berardinis V, Zaparucha A, Vallenet D, Vergne-Vaxelaire C. A refined picture of the native amine dehydrogenase family revealed by extensive biodiversity screening. Nat Commun 2024; 15:4933. [PMID: 38858403 PMCID: PMC11164908 DOI: 10.1038/s41467-024-49009-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 05/20/2024] [Indexed: 06/12/2024] Open
Abstract
Native amine dehydrogenases offer sustainable access to chiral amines, so the search for scaffolds capable of converting more diverse carbonyl compounds is required to reach the full potential of this alternative to conventional synthetic reductive aminations. Here we report a multidisciplinary strategy combining bioinformatics, chemoinformatics and biocatalysis to extensively screen billions of sequences in silico and to efficiently find native amine dehydrogenases features using computational approaches. In this way, we achieve a comprehensive overview of the initial native amine dehydrogenase family, extending it from 2,011 to 17,959 sequences, and identify native amine dehydrogenases with non-reported substrate spectra, including hindered carbonyls and ethyl ketones, and accepting methylamine and cyclopropylamine as amine donor. We also present preliminary model-based structural information to inform the design of potential (R)-selective amine dehydrogenases, as native amine dehydrogenases are mostly (S)-selective. This integrated strategy paves the way for expanding the resource of other enzyme families and in highlighting enzymes with original features.
Affiliation(s)
- Eddy Elisée
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Laurine Ducrot
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Raphaël Méheust
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Karine Bastard
- School of Pharmacy, Faculty of Medicine and Health, University of Sydney, Sydney, NSW, 2006, Australia
- Aurélie Fossey-Jouenne
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Gideon Grogan
- York Structural Biology Laboratory, Department of Chemistry, University of York, Heslington, York, YO10 5DD, UK
- Eric Pelletier
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Jean-Louis Petit
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Mark Stam
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Véronique de Berardinis
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Anne Zaparucha
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- David Vallenet
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Carine Vergne-Vaxelaire
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
2
Paul M, Banerjee A, Maiti S, Mitra D, DasMohapatra PK, Thatoi H. Evaluation of substrate specificity and catalytic promiscuity of Bacillus albus cellulase: an insight into in silico proteomic study aiming at enhanced production of renewable energy. J Biomol Struct Dyn 2023:1-23. [PMID: 38126200 DOI: 10.1080/07391102.2023.2295971] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 12/11/2023] [Indexed: 12/23/2023]
Abstract
Cellulases are enzymes that aid in the hydrolysis of cellulosic fibers and have a wide range of industrial uses. In the present in silico study, sequence alignment between cellulases from different Bacillus species revealed that most of the residues are conserved in those aligned enzymes. Three dimensional structures of cellulase enzymes from 23 different Bacillus species have been predicted and based on the alignment between the modeled structures, those enzymes have been categorized into 7 different groups according to the homology in their conformational folds. There are two structural contents in Gr-I cellulase namely β1-α2 and β3-α5 loops which varies greatly according to their static position. Molecular docking study between the B. albus cellulase and its various cellulosic substrates including xylanoglucan oligosaccharides revealed that residues viz. Phe154, Tyr258, Tyr282, Tyr285, and Tyr376 of B. albus cellulase are significantly involved in formation stacking interaction during enzyme-substrate binding. Residue interaction network and binding energy analysis for the B. albus cellulase with different cellulosic substrates depicted the strong affinity of XylGlc3 substrate with the receptor enzyme. Molecular interaction and molecular dynamics simulation studies exhibited structural stability of enzyme-substrate complexes which are greatly influenced by the presence of catalytic promiscuity in their substrate binding sites. Screening of B. albus in carboxymethylcellulose (CMC) and xylan supplemented agar media revealed the capability of the bacterium in degrading both cellulose and xylan. Overall, the study demonstrated B. albus cellulase as an effective biocatalyst candidate with the potential role of catalytic promiscuity for possible applications in biofuel industries. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Manish Paul
- Department of Biotechnology, Maharaja Sriram Chandra Bhanja Deo University, Baripada, India
- Microbiology and Immunology, University of California San Francisco, San Francisco, CA, USA
- Amrita Banerjee
- Oriental Institute of Science and Technology, Midnapore, India
- Smarajit Maiti
- Oriental Institute of Science and Technology, Midnapore, India
- Debanjan Mitra
- Department of Microbiology, Raiganj University, Raiganj, India
- Pradeep K DasMohapatra
- Department of Microbiology, Raiganj University, Raiganj, India
- PAKB Environment Conservation Centre, Raiganj University, Raiganj, India
- Hrudayanath Thatoi
- Department of Biotechnology, Maharaja Sriram Chandra Bhanja Deo University, Baripada, India
3
Singh R, Sledzieski S, Bryson B, Cowen L, Berger B. Contrastive learning in protein language space predicts interactions between drugs and protein targets. Proc Natl Acad Sci U S A 2023; 120:e2220778120. [PMID: 37289807 PMCID: PMC10268324 DOI: 10.1073/pnas.2220778120] [Citation(s) in RCA: 24] [Impact Index Per Article: 24.0] [Received: 12/06/2022] [Accepted: 04/10/2023] [Indexed: 06/10/2023] Open
Abstract
Sequence-based prediction of drug-target interactions has the potential to accelerate drug discovery by complementing experimental screens. Such computational prediction needs to be generalizable and scalable while remaining sensitive to subtle variations in the inputs. However, current computational techniques fail to simultaneously meet these goals, often sacrificing performance of one to achieve the others. We develop a deep learning model, ConPLex, successfully leveraging the advances in pretrained protein language models ("PLex") and employing a protein-anchored contrastive coembedding ("Con") to outperform state-of-the-art approaches. ConPLex achieves high accuracy, broad adaptivity to unseen data, and specificity against decoy compounds. It makes predictions of binding based on the distance between learned representations, enabling predictions at the scale of massive compound libraries and the human proteome. Experimental testing of 19 kinase-drug interaction predictions validated 12 interactions, including four with subnanomolar affinity, plus a strongly binding EPHB1 inhibitor (KD = 1.3 nM). Furthermore, ConPLex embeddings are interpretable, which enables us to visualize the drug-target embedding space and use embeddings to characterize the function of human cell-surface proteins. We anticipate that ConPLex will facilitate efficient drug discovery by making highly sensitive in silico drug screening feasible at the genome scale. ConPLex is available open source at https://ConPLex.csail.mit.edu.
Affiliation(s)
- Rohit Singh
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Samuel Sledzieski
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Bryan Bryson
- Ragon Institute of MGH, MIT and Harvard, Cambridge, MA 02139
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139
- Lenore Cowen
- Department of Computer Science, Tufts University, Medford, MA 02155
- Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
- Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139
4
Sherill-Rofe D, Raban O, Findlay S, Rahat D, Unterman I, Samiei A, Yasmeen A, Kaiser Z, Kuasne H, Park M, Foulkes WD, Bloch I, Zick A, Gotlieb WH, Tabach Y, Orthwein A. Multi-omics data integration analysis identifies the spliceosome as a key regulator of DNA double-strand break repair. NAR Cancer 2022; 4:zcac013. [PMID: 35399185 PMCID: PMC8991968 DOI: 10.1093/narcan/zcac013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2021] [Revised: 02/25/2022] [Accepted: 03/23/2022] [Indexed: 11/14/2022] Open
Abstract
DNA repair by homologous recombination (HR) is critical for the maintenance of genome stability. Germline and somatic mutations in HR genes have been associated with an increased risk of developing breast (BC) and ovarian cancers (OvC). However, the extent of factors and pathways that are functionally linked to HR with clinical relevance for BC and OvC remains unclear. To gain a broader understanding of this pathway, we used multi-omics datasets coupled with machine learning to identify genes that are associated with HR and to predict their sub-function. Specifically, we integrated our phylogenetic-based co-evolution approach (CladePP) with 23 distinct genetic and proteomic screens that monitored, directly or indirectly, DNA repair by HR. This omics data integration analysis yielded a new database (HRbase) that contains a list of 464 predictions, including 76 gold standard HR genes. Interestingly, the spliceosome machinery emerged as one major pathway with significant cross-platform interactions with the HR pathway. We functionally validated 6 spliceosome factors, including the RNA helicase SNRNP200 and its co-factor SNW1. Importantly, their RNA expression correlated with BC/OvC patient outcome. Altogether, we identified novel clinically relevant DNA repair factors and delineated their specific sub-function by machine learning. Our results, supported by evolutionary and multi-omics analyses, suggest that the spliceosome machinery plays an important role during the repair of DNA double-strand breaks (DSBs).
Affiliation(s)
- Dana Sherill-Rofe
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Oded Raban
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Steven Findlay
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Dolev Rahat
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Irene Unterman
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Arash Samiei
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Amber Yasmeen
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
- Zafir Kaiser
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Hellen Kuasne
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- Morag Park
- Department of Biochemistry, McGill University, Montreal, QC H3G 1Y6, Canada
- William D Foulkes
- The Research Institute of the McGill University Health Centre, Montreal, QC H4A 3J1, Canada
- Idit Bloch
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Aviad Zick
- Department of Oncology, Hadassah Medical Center, Faculty of Medicine, Hebrew University of Jerusalem, Ein-Kerem, Jerusalem 91120, Israel
- Walter H Gotlieb
- Division of Gynecology Oncology, Segal Cancer Center, Jewish General Hospital, McGill University, Montreal, QC H3T 1E2, Canada
- Yuval Tabach
- Department of Developmental Biology and Cancer Research, Institute for Medical Research Israel-Canada, Hebrew University of Jerusalem-Hadassah Medical School, Jerusalem 91120, Israel
- Alexandre Orthwein
- Lady Davis Institute for Medical Research, Segal Cancer Centre, Jewish General Hospital, 3755 Chemin de la Côte-Sainte-Catherine, Montréal, QC H3T 1E2, Canada
5
Machine learning modeling of family wide enzyme-substrate specificity screens. PLoS Comput Biol 2022; 18:e1009853. [PMID: 35143485 PMCID: PMC8865696 DOI: 10.1371/journal.pcbi.1009853] [Citation(s) in RCA: 32] [Impact Index Per Article: 16.0] [Received: 10/30/2021] [Revised: 02/23/2022] [Accepted: 01/21/2022] [Indexed: 11/19/2022] Open
Abstract
Biocatalysis is a promising approach to sustainably synthesize pharmaceuticals, complex natural products, and commodity chemicals at scale. However, the adoption of biocatalysis is limited by our ability to select enzymes that will catalyze their natural chemical transformation on non-natural substrates. While machine learning and in silico directed evolution are well-posed for this predictive modeling challenge, efforts to date have primarily aimed to increase activity against a single known substrate, rather than to identify enzymes capable of acting on new substrates of interest. To address this need, we curate 6 different high-quality enzyme family screens from the literature that each measure multiple enzymes against multiple substrates. We compare machine learning-based compound-protein interaction (CPI) modeling approaches from the literature used for predicting drug-target interactions. Surprisingly, comparing these interaction-based models against collections of independent (single task) enzyme-only or substrate-only models reveals that current CPI approaches are incapable of learning interactions between compounds and proteins in the current family level data regime. We further validate this observation by demonstrating that our no-interaction baseline can outperform CPI-based models from the literature used to guide the discovery of kinase inhibitors. Given the high performance of non-interaction based models, we introduce a new structure-based strategy for pooling residue representations across a protein sequence. Altogether, this work motivates a principled path forward in order to build and evaluate meaningful predictive models for biocatalysis and other drug discovery applications.

Predicting interactions between compounds and proteins represents a long-standing dream of drug discovery and protein engineering. Robust models of enzyme-substrate scope would dramatically advance our ability to design synthetic routes involving enzymatic catalysis. However, the lack of standardization between compound-protein interaction studies makes it difficult to evaluate the generalizability of such models. In this work we take a critical step forward by standardizing high-quality datasets measuring enzyme-substrate interactions, outlining rigorous evaluations, and proposing a new way to integrate structural information into protein representations. In testing previous modeling approaches, we highlight a surprising inability of existing models to effectively leverage compound-protein interactions to improve generalization, which challenges a perception in the literature. This establishes future opportunities for model development and integration of enzyme-substrate scope models into computer-aided synthesis planning software.
6
Pazos F. Computational prediction of protein functional sites-Applications in biotechnology and biomedicine. Adv Protein Chem Struct Biol 2022; 130:39-57. [PMID: 35534114 DOI: 10.1016/bs.apcsb.2021.12.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 06/14/2023]
Abstract
There are many computational approaches for predicting protein functional sites based on different sequence and structural features. These methods are essential to cope with the sequence deluge that is filling databases with uncharacterized protein sequences. They complement the more expensive and time-consuming experimental approaches by pointing them to possible candidate positions. In many cases they are jointly used to characterize the functional sites in proteins of biotechnological and biomedical interest and eventually modify them for different purposes. There is a clear trend towards approaches based on machine learning and those using structural information, due to the recent developments in these areas. Nevertheless, "classic" methods based on sequence and evolutionary features are still playing an important role as these features are strongly related to functionality. In this review, the main approaches for predicting general functional sites in a protein are discussed, with a focus on sequence-based approaches.
Affiliation(s)
- Florencio Pazos
- Computational Systems Biology Group, National Center for Biotechnology (CNB-CSIC), Madrid, Spain
7
Rosen MR, Leuthaeuser JB, Parish CA, Fetrow JS. Isofunctional Clustering and Conformational Analysis of the Arsenate Reductase Superfamily Reveals Nine Distinct Clusters. Biochemistry 2020; 59:4262-4284. [PMID: 33135415 DOI: 10.1021/acs.biochem.0c00651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/28/2022]
Abstract
Arsenate reductase (ArsC) is a superfamily of enzymes that reduce arsenate. Due to active site similarities, some ArsC can function as low-molecular weight protein tyrosine phosphatases (LMW-PTPs). Broad superfamily classifications align with redox partners (Trx- or Grx-linked). To understand this superfamily's mechanistic diversity, the ArsC superfamily is classified on the basis of active site features utilizing the tools TuLIP (two-level iterative clustering process) and autoMISST (automated multilevel iterative sequence searching technique). This approach identified nine functionally relevant (perhaps isofunctional) protein groups. Five groups exhibit distinct ArsC mechanisms. Three are Grx-linked: group 4AA (classical ArsC), group 3AAA (YffB-like), and group 5BAA. Two are Trx-linked: groups 6AAAAA and 7AAAAAAAA. One is an Spx-like transcriptional regulatory group, group 5AAA. Three are potential LMW-PTP groups: groups 7BAAAA, and 7AAAABAA, which have not been previously identified, and the well-studied LMW-PTP family group 8AAA. Molecular dynamics simulations were utilized to explore functional site details. In several families, we confirm and add detail to literature-based mechanistic information. Mechanistic roles are hypothesized for conserved active site residues in several families. In three families, simulations of the unliganded structure sample specific conformational ensembles, which are proposed to represent either a more ligand-binding-competent conformation or a pathway toward a more binding-competent state; these active sites may be designed to traverse high-energy barriers to the lower-energy conformations necessary to more readily bind ligands. This more detailed biochemical understanding of ArsC and ArsC-like PTP mechanisms opens possibilities for further understanding of arsenate bioremediation and the LMW-PTP mechanism.
Affiliation(s)
- Mikaela R Rosen
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Janelle B Leuthaeuser
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Carol A Parish
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
- Jacquelyn S Fetrow
- Department of Chemistry, Gottwald Center for the Sciences, University of Richmond, Richmond, Virginia 23713, United States
8
Domain-mediated interactions for protein subfamily identification. Sci Rep 2020; 10:264. [PMID: 31937869 PMCID: PMC6959277 DOI: 10.1038/s41598-019-57187-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/12/2019] [Accepted: 12/23/2019] [Indexed: 11/24/2022] Open
Abstract
Within a protein family, proteins with the same domain often exhibit different cellular functions, despite the shared evolutionary history and molecular function of the domain. We hypothesized that domain-mediated interactions (DMIs) may categorize a protein family into subfamilies because the diversified functions of a single domain often depend on interacting partners of domains. Here we systematically identified DMI subfamilies, in which proteins share domains with DMI partners, as well as with various functional and physical interaction networks in individual species. In humans, DMI subfamily members are associated with similar diseases, including cancers, and are frequently co-associated with the same diseases. DMI information relates to the functional and evolutionary subdivisions of human kinases. In yeast, DMI subfamilies contain proteins with similar phenotypic outcomes from specific chemical treatments. Therefore, the systematic investigation here provides insights into the diverse functions of subfamilies derived from a protein family with a link-centric approach and suggests a useful resource for annotating the functions and phenotypic outcomes of proteins.
9
Weissenbach J. Exploring biochemical diversity in bacteria. An Acad Bras Cienc 2019; 91:e20190252. [PMID: 31365611 DOI: 10.1590/0001-3765201920190252] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 02/28/2019] [Accepted: 04/18/2019] [Indexed: 11/21/2022] Open
Abstract
The various descriptors of biochemical diversity and an evaluation of its status of knowledge are briefly outlined. Using a few examples from in house research projects, I illustrate strategies used to increase this knowledge. Because bacteria represent an extremely diverse domain of life and carry out the widest known range of biochemical transformations, this mini-review focusses on bacteria.
Affiliation(s)
- Jean Weissenbach
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 2 rue Gaston Crémieux, 91057 Evry, France
10
11
Bastard K, Isabet T, Stura EA, Legrand P, Zaparucha A. Structural Studies based on two Lysine Dioxygenases with Distinct Regioselectivity Brings Insights Into Enzyme Specificity within the Clavaminate Synthase-Like Family. Sci Rep 2018; 8:16587. [PMID: 30410048 PMCID: PMC6224419 DOI: 10.1038/s41598-018-34795-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/13/2018] [Accepted: 10/23/2018] [Indexed: 12/19/2022] Open
Abstract
Iron(II)/α-ketoacid-dependent oxygenases (αKAOs) are enzymes that catalyze the oxidation of unactivated C-H bonds, mainly through hydroxylation. Among these, those that are active towards amino-acids and their derivatives are grouped in the Clavaminate Synthase Like (CSL) family. CSL enzymes exhibit high regio- and stereoselectivities with strict substrate specificity. This study reports the structural elucidation of two new regiodivergent members, KDO1 and KDO5, active towards lysine, and the structural and computational analysis of the whole family through modelling and classification of active sites. The structures of KDO1 and KDO5 in complex with their ligands show that one exact position in the active site controls the regioselectivity of the reaction. Our results suggest that the substrate specificity and high stereoselectivity typical of this family is linked to a lid that closes up in order to form a sub-pocket around the side chain of the substrate. This dynamic lid is found throughout the family with varying sequence and length and is associated with a conserved stable dimeric interface. Results from this study could be a starting-point for exploring the functional diversity of the CSL family and direct in vitro screening in the search for new enzymatic activities.
Affiliation(s)
- Karine Bastard
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
- Tatiana Isabet
- Synchrotron SOLEIL, L'Orme des Merisiers, Saint-Aubin, BP 48, 91192, Gif-sur-Yvette, France
- Enrico A Stura
- CEA, Institut des Sciences du Vivant Frédéric Joliot, Service d'Ingénierie Moléculaire des Protéines (SIMOPRO), Université Paris-Saclay, Gif-sur-Yvette, 91190, France
- Pierre Legrand
- Synchrotron SOLEIL, L'Orme des Merisiers, Saint-Aubin, BP 48, 91192, Gif-sur-Yvette, France
- Anne Zaparucha
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay, 91057, Evry, France
12
Affiliation(s)
- Jacquelyn S. Fetrow
- Office of the President, Albright College, Reading, Pennsylvania, United States of America
- Patricia C. Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
13
Sánchez-Gracia A, Guirao-Rico S, Hinojosa-Alvarez S, Rozas J. Computational prediction of the phenotypic effects of genetic variants: basic concepts and some application examples in Drosophila nervous system genes. J Neurogenet 2017; 31:307-319. [DOI: 10.1080/01677063.2017.1398241] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
Affiliation(s)
- Alejandro Sánchez-Gracia
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Sara Guirao-Rico
- Center for Research in Agricultural Genomics (CRAG) CSIC-IRTA-UAB-UB, Bellaterra, Spain
- Silvia Hinojosa-Alvarez
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
- Julio Rozas
- Departament de Genètica, Microbiologia i Estadística and Institut de Recerca de la Biodiversitat (IRBio), Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
14
Bastard K, Perret A, Mariage A, Bessonnet T, Pinet-Turpault A, Petit JL, Darii E, Bazire P, Vergne-Vaxelaire C, Brewee C, Debard A, Pellouin V, Besnard-Gonnet M, Artiguenave F, Médigue C, Vallenet D, Danchin A, Zaparucha A, Weissenbach J, Salanoubat M, de Berardinis V. Parallel evolution of non-homologous isofunctional enzymes in methionine biosynthesis. Nat Chem Biol 2017; 13:858-866. [PMID: 28581482 DOI: 10.1038/nchembio.2397] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 07/25/2016] [Accepted: 03/22/2017] [Indexed: 12/30/2022]
Abstract
Experimental validation of enzyme function is crucial for genome interpretation, but it remains challenging because it cannot be scaled up to accommodate the constant accumulation of genome sequences. We tackled this issue for the MetA and MetX enzyme families, phylogenetically unrelated families of acyl-L-homoserine transferases involved in L-methionine biosynthesis. Members of these families are prone to incorrect annotation because MetX and MetA enzymes are assumed to always use acetyl-CoA and succinyl-CoA, respectively. We determined the enzymatic activities of 100 enzymes from diverse species, and interpreted the results by structural classification of active sites based on protein structure modeling. We predict that >60% of the 10,000 sequences from these families currently present in databases are incorrectly annotated, and suggest that acetyl-CoA was originally the sole substrate of these isofunctional enzymes, which evolved to use exclusively succinyl-CoA in the most recent bacteria. We also uncovered a divergent subgroup of MetX enzymes in fungi that participate only in L-cysteine biosynthesis as O-succinyl-L-serine transferases.
Collapse
Affiliation(s)
- Karine Bastard
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Alain Perret
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Aline Mariage
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Thomas Bessonnet
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Agnès Pinet-Turpault
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Jean-Louis Petit
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Ekaterina Darii
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Pascal Bazire
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Carine Vergne-Vaxelaire
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Clémence Brewee
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Adrien Debard
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Virginie Pellouin
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Marielle Besnard-Gonnet
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Claudine Médigue
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - David Vallenet
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Antoine Danchin
- Institute of Cardiometabolism and Nutrition (ICAN), Hôpital de la Pitié-Salpêtrière, Paris, France
| | - Anne Zaparucha
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Jean Weissenbach
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Marcel Salanoubat
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
| | - Véronique de Berardinis
- CEA, DRF, Genoscope, Evry, France; CNRS, UMR8030 Génomique Métabolique, Evry, France; Université d'Evry Val d'Essonne, Evry, France; Université Paris-Saclay, Evry, France
|
15
|
Knutson ST, Westwood BM, Leuthaeuser JB, Turner BE, Nguyendac D, Shea G, Kumar K, Hayden JD, Harper AF, Brown SD, Morris JH, Ferrin TE, Babbitt PC, Fetrow JS. An approach to functionally relevant clustering of the protein universe: Active site profile-based clustering of protein structures and sequences. Protein Sci 2017; 26:677-699. [PMID: 28054422 PMCID: PMC5368075 DOI: 10.1002/pro.3112] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 12/10/2016] [Accepted: 12/22/2016] [Indexed: 01/11/2023]
Abstract
Protein function identification remains a significant problem. Solving this problem at the molecular functional level would allow the identification of mechanistic determinants: amino acids that distinguish details between functional families within a superfamily. Active site profiling was developed to identify mechanistic determinants. DASP and DASP2 were developed as tools to search sequence databases using active site profiling. Here, TuLIP (Two-Level Iterative clustering Process) is introduced as an iterative, divisive clustering process that utilizes active site profiling to separate structurally characterized superfamily members into functionally relevant clusters. Underlying TuLIP is the observation that functionally relevant families (curated by Structure-Function Linkage Database, SFLD) self-identify in DASP2 searches; clusters containing multiple functional families do not. Each TuLIP iteration produces candidate clusters, each evaluated to determine if it self-identifies using DASP2. If so, it is deemed a functionally relevant group. Divisive clustering continues until each structure is either a functionally relevant group member or a singlet. TuLIP is validated on enolase and glutathione transferase structures, superfamilies well-curated by SFLD. Correlation is strong; small numbers of structures prevent statistically significant analysis. TuLIP-identified enolase clusters are used in DASP2 GenBank searches to identify sequences sharing functional site features. Analysis shows a true positive rate of 96%, false negative rate of 4%, and maximum false positive rate of 4%. F-measure and performance analysis on the enolase search results and comparison to GEMMA and SCI-PHY demonstrate that TuLIP avoids the over-division problem of these methods. Mechanistic determinants for enolase families are evaluated and shown to correlate well with literature results.
Affiliation(s)
- Stacy T. Knutson
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Brian M. Westwood
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
- Department of Computer Science, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Janelle B. Leuthaeuser
- Molecular Genetics and Genomics Program, Wake Forest School of Medicine, Winston-Salem, North Carolina 27157
| | - Brandon E. Turner
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Don Nguyendac
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Gabrielle Shea
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Kiran Kumar
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Julia D. Hayden
- Biochemistry Program, Dickinson College, Carlisle, Pennsylvania 17013
| | - Angela F. Harper
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina 27106
| | - Shoshana D. Brown
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
| | - John H. Morris
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
| | - Thomas E. Ferrin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
| | - Patricia C. Babbitt
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158
|
16
|
Harper AF, Leuthaeuser JB, Babbitt PC, Morris JH, Ferrin TE, Poole LB, Fetrow JS. An Atlas of Peroxiredoxins Created Using an Active Site Profile-Based Approach to Functionally Relevant Clustering of Proteins. PLoS Comput Biol 2017; 13:e1005284. [PMID: 28187133 PMCID: PMC5302317 DOI: 10.1371/journal.pcbi.1005284] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/15/2016] [Accepted: 12/06/2016] [Indexed: 12/15/2022] Open
Abstract
Peroxiredoxins (Prxs or Prdxs) are a large protein superfamily of antioxidant enzymes that rapidly detoxify damaging peroxides and/or affect signal transduction and, thus, have roles in proliferation, differentiation, and apoptosis. Prx superfamily members are widespread across phylogeny and multiple methods have been developed to classify them. Here we present an updated atlas of the Prx superfamily identified using a novel method called MISST (Multi-level Iterative Sequence Searching Technique). MISST is an iterative search process developed to be both agglomerative, to add sequences containing similar functional site features, and divisive, to split groups when functional site features suggest distinct functionally-relevant clusters. Superfamily members need not be identified initially; MISST begins with a minimal representative set of known structures and searches GenBank iteratively. Further, the method's novelty lies in the manner in which isofunctional groups are selected; rather than use a single or shifting threshold to identify clusters, the groups are deemed isofunctional when they pass a self-identification criterion, such that the group identifies itself and nothing else in a search of GenBank. The method was preliminarily validated on the Prxs, as the Prxs presented challenges of both agglomeration and division. For example, previous sequence analysis clustered the Prx functional families Prx1 and Prx6 into one group. Subsequent expert analysis clearly identified Prx6 as a distinct functionally relevant group. The MISST process distinguishes these two closely related, though functionally distinct, families. Through MISST search iterations, over 38,000 Prx sequences were identified, which the method divided into six isofunctional clusters, consistent with previous expert analysis. The results represent the most complete computational functional analysis of proteins comprising the Prx superfamily. The feasibility of this novel method is demonstrated by the Prx superfamily results, laying the foundation for potential functionally relevant clustering of the universe of protein sequences.
Affiliation(s)
- Angela F. Harper
- Department of Physics, Wake Forest University, Winston-Salem, North Carolina, United States of America
| | - Janelle B. Leuthaeuser
- Department of Molecular Genetics and Genomics, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
| | - Patricia C. Babbitt
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco School of Pharmacy, San Francisco, California, United States of America
| | - John H. Morris
- Department of Pharmaceutical Chemistry, University of California San Francisco School of Pharmacy, San Francisco, California, United States of America
| | - Thomas E. Ferrin
- Department of Pharmaceutical Chemistry, University of California San Francisco School of Pharmacy, San Francisco, California, United States of America
| | - Leslie B. Poole
- Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
| | - Jacquelyn S. Fetrow
- Department of Chemistry, University of Richmond, Richmond, Virginia, United States of America
|
17
|
Moll M, Finn PW, Kavraki LE. Structure-guided selection of specificity determining positions in the human Kinome. BMC Genomics 2016; 17 Suppl 4:431. [PMID: 27556159 PMCID: PMC5001202 DOI: 10.1186/s12864-016-2790-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/25/2022] Open
Abstract
Background The human kinome contains many important drug targets. It is well-known that inhibitors of protein kinases bind with very different selectivity profiles. This is also the case for inhibitors of many other protein families. The increased availability of protein 3D structures has provided much information on the structural variation within a given protein family. However, the relationship between structural variations and binding specificity is complex and incompletely understood. We have developed a structural bioinformatics approach which provides an analysis of key determinants of binding selectivity as a tool to enhance the rational design of drugs with a specific selectivity profile. Results We propose a greedy algorithm that computes a subset of residue positions in a multiple sequence alignment such that structural and chemical variation in those positions helps explain known binding affinities. By providing this information, the main purpose of the algorithm is to provide experimentalists with possible insights into how the selectivity profile of certain inhibitors is achieved, which is useful for lead optimization. In addition, the algorithm can also be used to predict binding affinities for structures whose affinity for a given inhibitor is unknown. The algorithm’s performance is demonstrated using an extensive dataset for the human kinome. Conclusion We show that the binding affinity of 38 different kinase inhibitors can be explained with consistently high precision and accuracy using the variation of at most six residue positions in the kinome binding site. We show for several inhibitors that we are able to identify residues that are known to be functionally important.
Affiliation(s)
- Mark Moll
- Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA.
| | - Paul W Finn
- University of Buckingham, Hunter St, Buckingham, UK
| | - Lydia E Kavraki
- Department of Computer Science, Rice University, PO Box 1892, Houston, 77251, TX, USA
|
18
|
Boari de Lima E, Meira W, de Melo-Minardi RC. Isofunctional Protein Subfamily Detection Using Data Integration and Spectral Clustering. PLoS Comput Biol 2016; 12:e1005001. [PMID: 27348631 PMCID: PMC4922564 DOI: 10.1371/journal.pcbi.1005001] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/30/2015] [Accepted: 05/22/2016] [Indexed: 01/14/2023] Open
Abstract
As increasingly more genomes are sequenced, the vast majority of proteins may only be annotated computationally, given that experimental investigation is extremely costly. This highlights the need for computational methods to determine protein functions quickly and reliably. We believe dividing a protein family into subtypes which share specific functions uncommon to the whole family reduces the function annotation problem's complexity. Hence, this work's purpose is to detect isofunctional subfamilies inside a family of unknown function, while identifying differentiating residues. Similarity between protein pairs according to various properties is interpreted as functional similarity evidence. Data are integrated using genetic programming and provided to a spectral clustering algorithm, which creates clusters of similar proteins. The proposed framework was applied to well-known protein families and to a family of unknown function, then compared to ASMC. Results showed our fully automated technique obtained better clusters than ASMC for two families, besides equivalent results for the other two, including one whose clusters were manually defined. Clusters produced by our framework showed great correspondence with the known subfamilies, besides being more contrasting than those produced by ASMC. Additionally, for the families whose specificity determining positions are known, such residues were among those our technique considered most important to differentiate a given group. When run with the crotonase and enolase SFLD superfamilies, the results showed great agreement with this gold standard. Best results consistently involved multiple data types, thus confirming our hypothesis that similarities according to different knowledge domains may be used as functional similarity evidence. Our main contributions are the proposed strategy for selecting and integrating data types, along with the ability to work with noisy and incomplete data; domain knowledge usage for detecting subfamilies in a family with different specificities, thus reducing the complexity of the experimental function characterization problem; and the identification of residues responsible for specificity.
Affiliation(s)
- Elisa Boari de Lima
- Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
| | - Wagner Meira
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
|
19
|
Ligand-binding specificity and promiscuity of the main lignocellulolytic enzyme families as revealed by active-site architecture analysis. Sci Rep 2016; 6:23605. [PMID: 27009476 PMCID: PMC4806347 DOI: 10.1038/srep23605] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 11/06/2015] [Accepted: 03/09/2016] [Indexed: 02/02/2023] Open
Abstract
Biomass can be converted into sugars by a series of lignocellulolytic enzymes, which belong to the glycoside hydrolase (GH) families summarized in CAZy databases. Here, using a structural bioinformatics method, we analyzed the active site architecture of the main lignocellulolytic enzyme families. The aromatic amino acids Trp/Tyr and polar amino acids Glu/Asp/Asn/Gln/Arg occurred at higher frequencies in the active site architecture than in the whole enzyme structure. And the number of potential subsites was significantly different among different families. In the cellulase and xylanase families, the conserved amino acids in the active site architecture were mostly found at the −2 to +1 subsites, while in β-glucosidase they were mainly concentrated at the −1 subsite. Families with more conserved binding amino acid residues displayed strong selectivity for their ligands, while those with fewer conserved binding amino acid residues often exhibited promiscuity when recognizing ligands. Enzymes with different activities also tended to bind different hydroxyl oxygen atoms on the ligand. These results may help us to better understand the common and unique structural bases of enzyme-ligand recognition from different families and provide a theoretical basis for the functional evolution and rational design of major lignocellulolytic enzymes.
|
20
|
Schwarz RF, Tamuri AU, Kultys M, King J, Godwin J, Florescu AM, Schultz J, Goldman N. ALVIS: interactive non-aggregative visualization and explorative analysis of multiple sequence alignments. Nucleic Acids Res 2016; 44:e77. [PMID: 26819408 PMCID: PMC4856975 DOI: 10.1093/nar/gkw022] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/03/2015] [Accepted: 01/08/2016] [Indexed: 12/19/2022] Open
Abstract
Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).
Affiliation(s)
- Roland F Schwarz
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Asif U Tamuri
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
| | - Marek Kultys
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - James King
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - James Godwin
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - Ana M Florescu
- Science Practice, 83-85 Paul Street, London, EC2A 4NQ, UK
| | - Jörg Schultz
- Center for Computational and Theoretical Biology and Department of Bioinformatics, University of Würzburg, Biocenter, Am Hubland, 97074 Würzburg, Germany
| | - Nick Goldman
- European Molecular Biology Laboratory-European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK
|
21
|
Substrate-binding specificity of chitinase and chitosanase as revealed by active-site architecture analysis. Carbohydr Res 2015; 418:50-56. [PMID: 26545262 DOI: 10.1016/j.carres.2015.10.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 07/10/2015] [Revised: 10/03/2015] [Accepted: 10/06/2015] [Indexed: 11/21/2022]
Abstract
Chitinases and chitosanases, referred to as chitinolytic enzymes, are two important categories of glycoside hydrolases (GH) that play a key role in degrading chitin and chitosan, two naturally abundant polysaccharides. Here, we investigate the active site architecture of the major chitosanase (GH8, GH46) and chitinase families (GH18, GH19). Both charged (Glu, His, Arg, Asp) and aromatic amino acids (Tyr, Trp, Phe) are observed with higher frequency within chitinolytic active sites as compared to elsewhere in the enzyme structure, indicating significant roles related to enzyme function. Hydrogen bonds between chitinolytic enzymes and the substrate C2 functional groups, i.e. amino groups and N-acetyl groups, drive substrate recognition, while non-specific CH-π interactions between aromatic residues and substrate mainly contribute to tighter binding and enhanced processivity evident in GH8 and GH18 enzymes. For different families of chitinolytic enzymes, the number, type, and position of substrate atoms bound in the active site vary, resulting in different substrate-binding specificities. The data presented here explain the synergistic action of multiple enzyme families at a molecular level and provide a more reasonable method for functional annotation, which can be further applied toward the practical engineering of chitinases and chitosanases.
|
22
|
Chagoyen M, García-Martín JA, Pazos F. Practical analysis of specificity-determining residues in protein families. Brief Bioinform 2015; 17:255-61. [DOI: 10.1093/bib/bbv045] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Received: 04/01/2015] [Accepted: 06/15/2015] [Indexed: 12/17/2022] Open
|
23
|
Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014; 16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022] Open
|
24
|
Revealing the hidden functional diversity of an enzyme family. Nat Chem Biol 2013; 10:42-9. [DOI: 10.1038/nchembio.1387] [Citation(s) in RCA: 81] [Impact Index Per Article: 7.4] [Received: 07/12/2013] [Accepted: 10/02/2013] [Indexed: 11/08/2022]
|
25
|
|
26
|
Chen Z, Zeng AP. Protein design in systems metabolic engineering for industrial strain development. Biotechnol J 2013; 8:523-33. [PMID: 23589416 DOI: 10.1002/biot.201200238] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/08/2012] [Revised: 01/24/2013] [Accepted: 02/27/2013] [Indexed: 12/20/2022]
Abstract
Accelerating the process of industrial bacterial host strain development, aimed at increasing productivity, generating new bio-products or utilizing alternative feedstocks, requires the integration of complementary approaches to manipulate cellular metabolism and regulatory networks. Systems metabolic engineering extends the concept of classical metabolic engineering to the systems level by incorporating the techniques used in systems biology and synthetic biology, and offers a framework for the development of the next generation of industrial strains. As one of the most useful tools of systems metabolic engineering, protein design allows us to design and optimize cellular metabolism at a molecular level. Here, we review the current strategies of protein design for engineering cellular synthetic pathways, metabolic control systems and signaling pathways, and highlight the challenges of this subfield within the context of systems metabolic engineering.
Affiliation(s)
- Zhen Chen
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Hamburg, Germany
|
27
|
Gaston D, Susko E, Roger AJ. A phylogenetic mixture model for the identification of functionally divergent protein residues. Bioinformatics 2011; 27:2655-63. [PMID: 21840876 DOI: 10.1093/bioinformatics/btr470] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION To understand the evolution of molecular function within protein families, it is important to identify those amino acid residues responsible for functional divergence; i.e. those sites in a protein family that affect cofactor, protein or substrate binding preferences; affinity; catalysis; flexibility; or folding. Type I functional divergence (FD) results from changes in conservation (evolutionary rate) at a site between protein subfamilies, whereas type II FD occurs when there has been a shift in preferences for different amino acid chemical properties. A variety of methods have been developed for identifying both site types in protein subfamilies, both from phylogenetic and information-theoretic angles. However, evaluation of the performance of these methods has typically relied upon a handful of reasonably well-characterized biological datasets or analyses of a single biological example. While experimental validation of many truly functionally divergent sites (true positives) can be relatively straightforward, determining that particular sites do not contribute to functional divergence (i.e. false positives and true negatives) is much more difficult, resulting in noisy 'gold standard' examples. RESULTS We describe a novel, phylogeny-based functional divergence classifier, FunDi. Unlike previous approaches, FunDi uses a unified mixture model-based approach to detect type I and type II FD. To assess FunDi's overall classification performance relative to other methods, we introduce two methods for simulating functionally divergent datasets. We find that the FunDi method performs better than several other predictors over a wide variety of simulation conditions. AVAILABILITY http://rogerlab.biochem.dal.ca/Software CONTACT andrew.roger@dal.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel Gaston
- Centre for Comparative Genomics and Evolutionary Bioinformatics, Dalhousie University, Halifax, Canada, B3H 1X5
|
28
|
Perret A, Lechaplais C, Tricot S, Perchat N, Vergne C, Pellé C, Bastard K, Kreimeyer A, Vallenet D, Zaparucha A, Weissenbach J, Salanoubat M. A novel acyl-CoA beta-transaminase characterized from a metagenome. PLoS One 2011; 6:e22918. [PMID: 21826218 PMCID: PMC3149608 DOI: 10.1371/journal.pone.0022918] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/17/2010] [Accepted: 07/09/2011] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Bacteria are key components in all ecosystems. However, our knowledge of bacterial metabolism is based solely on the study of cultivated organisms which represent just a tiny fraction of microbial diversity. To access new enzymatic reactions and new or alternative pathways, we investigated bacterial metabolism through analyses of uncultivated bacterial consortia. METHODOLOGY/PRINCIPAL FINDINGS We applied the gene context approach to assembled sequences of the metagenome of the anaerobic digester of a municipal wastewater treatment plant, and identified a new gene which may participate in an alternative pathway of lysine fermentation. CONCLUSIONS We characterized a novel, unique aminotransferase that acts exclusively on Coenzyme A (CoA) esters, and proposed a variant route for lysine fermentation. Results suggest that most of the lysine fermenting organisms use this new pathway in the digester. Its presence in organisms representative of two distinct bacterial divisions indicates that it may also be present in other organisms.
Affiliation(s)
- Alain Perret
- Commissariat à l'Energie Atomique et aux Energies Alternatives, Institut de Génomique, Genoscope, Evry, France.
|