1
Hogg BN, Schnepel C, Finnigan JD, Charnock SJ, Hayes MA, Turner NJ. The Impact of Metagenomics on Biocatalysis. Angew Chem Int Ed Engl 2024; 63:e202402316. [PMID: 38494442 DOI: 10.1002/anie.202402316] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2024] [Revised: 03/11/2024] [Accepted: 03/12/2024] [Indexed: 03/19/2024]
Abstract
In the ever-growing demand for sustainable ways to produce high-value small molecules, biocatalysis has come to the forefront of greener routes to these chemicals. As such, the need to constantly find and optimise suitable biocatalysts for specific transformations has never been greater. Metagenome mining has been shown to rapidly expand the toolkit of promiscuous enzymes needed for new transformations, without requiring protein engineering steps. If protein engineering is needed, the metagenomic candidate can often provide a better starting point for engineering than a previously discovered enzyme on the open database or from literature, for instance. In this review, we highlight where metagenomics has made substantial impact on the area of biocatalysis in recent years. We review the discovery of enzymes in previously unexplored or 'hidden' sequence space, leading to the characterisation of enzymes with enhanced properties that originate from natural selection pressures in native environments.
Affiliation(s)
- Bethany N Hogg
  - Department of Chemistry, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, M1 7DN, UK
- Christian Schnepel
  - School of Engineering Sciences in Chemistry, Biotechnology and Health, Department of Industrial Biotechnology, KTH Royal Institute of Technology, AlbaNova University Center, 11421, Stockholm, SE
- James D Finnigan
  - Prozomix, Building 4, West End Ind. Estate, Haltwhistle, NE49 9HA, UK
- Simon J Charnock
  - Prozomix, Building 4, West End Ind. Estate, Haltwhistle, NE49 9HA, UK
- Martin A Hayes
  - Compound Synthesis and Management, Discovery Sciences, Biopharmaceuticals R&D, AstraZeneca, Mölndal 431 50, Gothenburg, SE
- Nicholas J Turner
  - Department of Chemistry, University of Manchester, Manchester Institute of Biotechnology, 131 Princess Street, Manchester, M1 7DN, UK
2
Thermophilic Carboxylesterases from Hydrothermal Vents of the Volcanic Island of Ischia Active on Synthetic and Biobased Polymers and Mycotoxins. Appl Environ Microbiol 2023; 89:e0170422. [PMID: 36719236 PMCID: PMC9972953 DOI: 10.1128/aem.01704-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2023] Open
Abstract
Hydrothermal vents are geographically widespread and host microorganisms with robust enzymes useful in various industrial applications. We examined microbial communities and carboxylesterases of two terrestrial hydrothermal vents of the volcanic island of Ischia (Italy) predominantly composed of Firmicutes, Proteobacteria, and Bacteroidota. High-temperature enrichment cultures with the polyester plastics polyhydroxybutyrate and polylactic acid (PLA) resulted in an increase of Thermus and Geobacillus species and to some extent Fontimonas and Schleiferia species. The screening at 37 to 70°C of metagenomic fosmid libraries from above enrichment cultures identified three hydrolases (IS10, IS11, and IS12), all derived from yet-uncultured Chloroflexota and showing low sequence identity (33 to 56%) to characterized enzymes. Enzymes expressed in Escherichia coli exhibited maximal esterase activity at 70 to 90°C, with IS11 showing the highest thermostability (90% activity after 20-min incubation at 80°C). IS10 and IS12 were highly substrate promiscuous and hydrolyzed all 51 monoester substrates tested. Enzymes were active with PLA, polyethylene terephthalate model substrate, and mycotoxin T-2 (IS12). IS10 and IS12 had a classical α/β-hydrolase core domain with a serine hydrolase catalytic triad (Ser155, His280, and Asp250) in their hydrophobic active sites. The crystal structure of IS11 resolved at 2.92 Å revealed the presence of a N-terminal β-lactamase-like domain and C-terminal lipocalin domain. The catalytic cleft of IS11 included catalytic Ser68, Lys71, Tyr160, and Asn162, whereas the lipocalin domain enclosed the catalytic cleft like a lid and contributed to substrate binding. Our study identified novel thermotolerant carboxylesterases with a broad substrate range, including polyesters and mycotoxins, for potential applications in biotechnology. 
IMPORTANCE High-temperature-active microbial enzymes are important biocatalysts for many industrial applications, including recycling of synthetic and biobased polyesters increasingly used in textiles, fibers, coatings and adhesives. Here, we identified three novel thermotolerant carboxylesterases (IS10, IS11, and IS12) from high-temperature enrichment cultures from Ischia hydrothermal vents and incubated with biobased polymers. The identified metagenomic enzymes originated from uncultured Chloroflexota and showed low sequence similarity to known carboxylesterases. Active sites of IS10 and IS12 had the largest effective volumes among the characterized prokaryotic carboxylesterases and exhibited high substrate promiscuity, including hydrolysis of polyesters and mycotoxin T-2 (IS12). Though less promiscuous than IS10 and IS12, IS11 had a higher thermostability with a high temperature optimum (80 to 90°C) for activity and hydrolyzed polyesters, and its crystal structure revealed an unusual lipocalin domain likely involved in substrate binding. The polyesterase activity of these enzymes makes them attractive candidates for further optimization and potential application in plastics recycling.
3
Akdel M, Pires DEV, Pardo EP, Jänes J, Zalevsky AO, Mészáros B, Bryant P, Good LL, Laskowski RA, Pozzati G, Shenoy A, Zhu W, Kundrotas P, Serra VR, Rodrigues CHM, Dunham AS, Burke D, Borkakoti N, Velankar S, Frost A, Basquin J, Lindorff-Larsen K, Bateman A, Kajava AV, Valencia A, Ovchinnikov S, Durairaj J, Ascher DB, Thornton JM, Davey NE, Stein A, Elofsson A, Croll TI, Beltrao P. A structural biology community assessment of AlphaFold2 applications. Nat Struct Mol Biol 2022; 29:1056-1067. [PMID: 36344848 PMCID: PMC9663297 DOI: 10.1038/s41594-022-00849-w] [Citation(s) in RCA: 198] [Impact Index Per Article: 99.0] [Received: 10/25/2021] [Accepted: 09/20/2022] [Indexed: 11/09/2022]
Abstract
Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods for protein structure predictions have reached the accuracy of experimentally determined models. Although this has been independently verified, the implementation of these methods across structural-biology applications remains to be tested. Here, we evaluate the use of AlphaFold2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modeling of interactions; and modeling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modeled when compared with homology modeling, identifying structural features rarely seen in the Protein Data Bank. AF2-based predictions of protein disorder and complexes surpass dedicated tools, and AF2 models can be used across diverse applications equally well compared with experimentally determined structures, when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life-science research.
Affiliation(s)
- Mehmet Akdel
  - Bioinformatics Group, Department of Plant Sciences, Wageningen University and Research, Wageningen, the Netherlands
- Douglas E V Pires
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Eduard Porta Pardo
  - Josep Carreras Leukaemia Research Institute (IJC), Badalona, Spain
  - Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Jürgen Jänes
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Arthur O Zalevsky
  - Shemyakin-Ovchinnikov Institute of Bioorganic Chemistry, Russian Academy of Sciences, Moscow, Russian Federation
- Patrick Bryant
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Lydia L Good
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Roman A Laskowski
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Gabriele Pozzati
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Aditi Shenoy
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Wensi Zhu
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Petras Kundrotas
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Carlos H M Rodrigues
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- Alistair S Dunham
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- David Burke
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Neera Borkakoti
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Sameer Velankar
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Adam Frost
  - Department of Biochemistry and Biophysics, University of California, San Francisco, CA, USA
- Jérôme Basquin
  - Department of Structural Cell Biology, Max Planck Institute of Biochemistry, Martinsried, Germany
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Alex Bateman
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Andrey V Kajava
  - Université de Montpellier, Centre de Recherche en Biologie Cellulaire de Montpellier (CRBM) CNRS, Montpellier, France
- Sergey Ovchinnikov
  - Faculty of Arts and Sciences, Division of Science, Harvard University, Cambridge, MA, USA
- David B Ascher
  - School of Chemistry and Molecular Biology, University of Queensland, Brisbane, Queensland, Australia
- Janet M Thornton
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Amelie Stein
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Arne Elofsson
  - Dep of Biochemistry and Biophysics and Science for Life Laboratory, Solna, Sweden
- Tristan I Croll
  - Cambridge Institute for Medical Research, Department of Haematology, The University of Cambridge, Cambridge, UK
- Pedro Beltrao
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
  - Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland
4
Semwal R, Aier I, Tyagi P, Varadwaj PK. DeEPn: a deep neural network based tool for enzyme functional annotation. J Biomol Struct Dyn 2020; 39:2733-2743. [PMID: 32274968 DOI: 10.1080/07391102.2020.1754292] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/24/2022]
Abstract
With the advancement of high throughput techniques, the discovery rate of enzyme sequences has increased significantly in the recent past. All of these raw sequences are required to be precisely mapped to their respective functional attributes, which helps in deciphering their biological role. In the recent past, various prediction models have been proposed to predict the enzyme functional class; however, all of these models were able to quantify at most six functional enzyme classes (EC1 to EC6) out of existing seven functional classes, making these approaches inappropriate for handling enzymes corresponding to the seventh functional class (EC7). In this study, a Deep Neural Network-based approach, DeEPn, has been proposed, which can quantify enzymes corresponding to all seven functional classes with high precision and accuracy. The proposed model was compared with two recently developed tools, ECPred and SVM-Prot. The result demonstrated that DeEPn outperformed ECPred and SVM-Prot in terms of predictive quality. The DeEPn tool has been hosted as a web-based tool at https://bioserver.iiita.ac.in/DeEPn/. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Rahul Semwal
  - Department of Information Technology (Bioinformatics), Indian Institute of Information Technology Allahabad, Allahabad, Uttar Pradesh, India
- Imlimaong Aier
  - Department of Bioinformatics and Applied Science, Indian Institute of Information Technology, Allahabad, Allahabad, Uttar Pradesh, India
- Pankaj Tyagi
  - Department of Information Technology (Bioinformatics), Indian Institute of Information Technology Allahabad, Allahabad, Uttar Pradesh, India
- Pritish Kumar Varadwaj
  - Department of Bioinformatics and Applied Science, Indian Institute of Information Technology, Allahabad, Allahabad, Uttar Pradesh, India
5
Laurenceau R, Bliem C, Osburne MS, Becker JW, Biller SJ, Cubillos-Ruiz A, Chisholm SW. Toward a genetic system in the marine cyanobacterium Prochlorococcus. Access Microbiol 2020; 2:acmi000107. [PMID: 33005871 PMCID: PMC7523629 DOI: 10.1099/acmi.0.000107] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/11/2019] [Accepted: 01/30/2020] [Indexed: 11/26/2022] Open
Abstract
As the smallest and most abundant primary producer in the oceans, the cyanobacterium Prochlorococcus is of interest to diverse branches of science. For the past 30 years, research on this minimal phototroph has led to a growing understanding of biological organization across multiple scales, from the genome to the global ocean ecosystem. Progress in understanding drivers of its diversity and ecology, as well as molecular mechanisms underpinning its streamlined simplicity, has been hampered by the inability to manipulate these cells genetically. Multiple attempts have been made to develop an efficient genetic transformation method for Prochlorococcus over the years; all have been unsuccessful to date, despite some success with their close relative, Synechococcus. To avoid the pursuit of unproductive paths, we report here what has not worked in our hands, as well as our progress developing a method to screen the most efficient electroporation parameters for optimal DNA delivery into Prochlorococcus cells. We also report a novel protocol for obtaining axenic colonies and a new method for differentiating live and dead cells. The electroporation method can be used to optimize DNA delivery into any bacterium, making it a useful tool for advancing transformation systems in other genetically recalcitrant microorganisms.
Affiliation(s)
- Raphaël Laurenceau
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Christina Bliem
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
- Marcia S Osburne
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Present address: Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, MA, USA
- Jamie W Becker
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Present address: Department of Biology, Haverford College, Haverford, PA, USA
- Steven J Biller
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Present address: Department of Biological Sciences, Wellesley College, Wellesley, MA, USA
- Andres Cubillos-Ruiz
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Present address: Institute for Medical Engineering and Science, Department of Biological Engineering, and Synthetic Biology Center, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Present address: Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Present address: Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA, USA
- Sallie W Chisholm
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA
6
Gao R, Wang M, Zhou J, Fu Y, Liang M, Guo D, Nie J. Prediction of Enzyme Function Based on Three Parallel Deep CNN and Amino Acid Mutation. Int J Mol Sci 2019; 20:E2845. [PMID: 31212665 PMCID: PMC6600291 DOI: 10.3390/ijms20112845] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/05/2019] [Revised: 06/03/2019] [Accepted: 06/04/2019] [Indexed: 01/28/2023] Open
Abstract
During the past decade, due to the number of proteins in PDB database being increased gradually, traditional methods cannot better understand the function of newly discovered enzymes in chemical reactions. Computational models and protein feature representation for predicting enzymatic function are more important. Most of existing methods for predicting enzymatic function have used protein geometric structure or protein sequence alone. In this paper, the functions of enzymes are predicted from many-sided biological information including sequence information and structure information. Firstly, we extract the mutation information from amino acids sequence by the position scoring matrix and express structure information with amino acids distance and angle. Then, we use histogram to show the extracted sequence and structural features respectively. Meanwhile, we establish a network model of three parallel Deep Convolutional Neural Networks (DCNN) to learn three features of enzyme for function prediction simultaneously, and the outputs are fused through two different architectures. Finally, the proposed model was investigated on a large dataset of 43,843 enzymes from the PDB and achieved 92.34% correct classification when sequence information is considered, demonstrating an improvement compared with the previous result.
Affiliation(s)
- Ruibo Gao
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
- Mengmeng Wang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
- Jiaoyan Zhou
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
- Yuhang Fu
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
- Meng Liang
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
- Dongliang Guo
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
- Junlan Nie
  - School of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, Hebei, China
7
Hu G, Wang K, Song J, Uversky VN, Kurgan L. Taxonomic Landscape of the Dark Proteomes: Whole-Proteome Scale Interplay Between Structural Darkness, Intrinsic Disorder, and Crystallization Propensity. Proteomics 2018; 18:e1800243. [PMID: 30198635 DOI: 10.1002/pmic.201800243] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 07/04/2018] [Revised: 08/30/2018] [Indexed: 12/14/2022]
Abstract
Growth rate of the protein sequence universe dramatically exceeds the speed of expansion for the protein structure universe, generating an immense dark proteome that includes proteins with unknown structure. A whole-proteome scale analysis of 5.4 million proteins from 987 proteomes in the three domains of life and viruses to systematically dissect an interplay between structural coverage, degree of putative intrinsic disorder, and predicted propensity for structure determination is performed. It has been found that Archaean and Bacterial proteomes have relatively high structural coverage and low amounts of disorder, whereas Eukaryotic and Viral proteomes are characterized by a broad spread of structural coverage and higher disorder levels. The analysis reveals that dark proteomes (i.e., proteomes containing high fractions of proteins with unknown structure) have significantly elevated amounts of intrinsic disorder and are predicted to be difficult to solve structurally. Although the majority of dark proteomes are of viral origin, many dark viral proteomes have at least modest crystallization propensity and only a handful of them are enriched in the intrinsic disorder. The disorder, structural coverage, and propensity are mapped for structural determination onto a novel proteome-level sequence similarity network to analyze the interplay of these characteristics in the taxonomic landscape.
Affiliation(s)
- Gang Hu
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Kui Wang
  - School of Mathematical Sciences and LPMC, Nankai University, Tianjin, 300071, P. R. China
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Centre for Data Science, Faculty of Information Technology, Monash University, Melbourne, VIC 3800, Australia
- Vladimir N Uversky
  - Department of Molecular Medicine and USF Health Byrd Alzheimer's Research Institute, Morsani College of Medicine, University of South Florida, Tampa, 33612, USA
  - Institute for Biological Instrumentation, Russian Academy of Sciences, Pushchino, 142290, Russia
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
8
Abstract
The vast, mostly unknown protein universe can be explored by analyzing protein sequences as a string of domains. A broader coverage can be achieved when these domains, the essential blocks in protein evolution, are detected using sequence profiles. Using clustering to collapse redundant profiles into unique function words (UFWs), we find that over the years 2009–2016, the number of UFWs saturates while the number of sequences matched by a combination of two or more UFWs grows exponentially. Between 2009 and 2016 the number of protein sequences from known species increased 10-fold from 8 million to 85 million. About 80% of these sequences contain at least one region recognized by the conserved domain architecture retrieval tool (CDART) as a sequence motif. Motifs provide clues to biological function but CDART often matches the same region of a protein by two or more profiles. Such synonyms complicate estimates of functional complexity. We do full-linkage clustering of redundant profiles by finding maximum disjoint cliques: Each cluster is replaced by a single representative profile to give what we term a unique function word (UFW). From 2009 to 2016, the number of sequence profiles used by CDART increased by 80%; the number of UFWs increased more slowly by 30%, indicating that the number of UFWs may be saturating. The number of sequences matched by a single UFW (sequences with single domain architectures) increased as slowly as the number of different words, whereas the number of sequences matched by a combination of two or more UFWs in sequences with multiple domain architectures (MDAs) increased at the same rate as the total number of sequences. This combinatorial arrangement of a limited number of UFWs in MDAs accounts for the genomic diversity of protein sequences. Although eukaryotes and prokaryotes use very similar sets of “words” or UFWs (57% shared), the “sentences” (MDAs) are different (1.3% shared).
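The clustering idea in the abstract above (redundant profiles collapsed so that each cluster yields one representative UFW) can be illustrated with a minimal sketch. This is not the authors' maximum-disjoint-clique procedure: it is a simple greedy approximation of full-linkage grouping, and the profile names, the `redundant_pairs` input, and both function names are hypothetical, introduced only for illustration.

```python
def cluster_redundant_profiles(profiles, redundant_pairs):
    """Greedily partition profiles into groups of mutually redundant profiles.

    Assumption: `redundant_pairs` lists pairs of profiles that match the same
    protein region. Full linkage means a profile joins a cluster only if it is
    redundant with EVERY member already in that cluster (the cluster is a clique).
    """
    redundant = {frozenset(pair) for pair in redundant_pairs}
    clusters = []
    for profile in profiles:
        for cluster in clusters:
            if all(frozenset((profile, member)) in redundant for member in cluster):
                cluster.append(profile)
                break
        else:
            # No existing cluster accepts this profile, so it starts a new one.
            clusters.append([profile])
    return clusters


def unique_function_words(clusters):
    """Pick one representative per cluster (here simply the first member)."""
    return [cluster[0] for cluster in clusters]


# Hypothetical toy input: six profiles and their pairwise redundancies.
profiles = ["pfamA", "smartA", "cdA", "pfamB", "cdB", "pfamC"]
redundant_pairs = [
    ("pfamA", "smartA"), ("pfamA", "cdA"), ("smartA", "cdA"),
    ("pfamB", "cdB"),
    ("cdA", "pfamB"),  # a single cross-link does not merge two clusters
]

clusters = cluster_redundant_profiles(profiles, redundant_pairs)
ufws = unique_function_words(clusters)
print(clusters)
print(ufws)
```

In this toy input the single link between `cdA` and `pfamB` is not enough to merge their clusters, which is the property that distinguishes full linkage from single linkage. A greedy pass depends on input order and is not guaranteed to find the largest cliques.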
9
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States
10
Popovic A, Hai T, Tchigvintsev A, Hajighasemi M, Nocek B, Khusnutdinova AN, Brown G, Glinos J, Flick R, Skarina T, Chernikova TN, Yim V, Brüls T, Paslier DL, Yakimov MM, Joachimiak A, Ferrer M, Golyshina OV, Savchenko A, Golyshin PN, Yakunin AF. Activity screening of environmental metagenomic libraries reveals novel carboxylesterase families. Sci Rep 2017; 7:44103. [PMID: 28272521 PMCID: PMC5341072 DOI: 10.1038/srep44103] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 12/01/2016] [Accepted: 02/01/2017] [Indexed: 11/29/2022] Open
Abstract
Metagenomics has made accessible an enormous reserve of global biochemical diversity. To tap into this vast resource of novel enzymes, we have screened over one million clones from metagenome DNA libraries derived from sixteen different environments for carboxylesterase activity and identified 714 positive hits. We have validated the esterase activity of 80 selected genes, which belong to 17 different protein families including unknown and cyclase-like proteins. Three metagenomic enzymes exhibited lipase activity, and seven proteins showed polyester depolymerization activity against polylactic acid and polycaprolactone. Detailed biochemical characterization of four new enzymes revealed their substrate preference, whereas their catalytic residues were identified using site-directed mutagenesis. The crystal structure of the metal-ion dependent esterase MGS0169 from the amidohydrolase superfamily revealed a novel active site with a bound unknown ligand. Thus, activity-centered metagenomics has revealed diverse enzymes and novel families of microbial carboxylesterases, whose activity could not have been predicted using bioinformatics tools.
Affiliation(s)
- Ana Popovic
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Tran Hai
  - School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, UK
- Anatoly Tchigvintsev
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Mahbod Hajighasemi
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Boguslaw Nocek
  - Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
- Anna N Khusnutdinova
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Greg Brown
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Julia Glinos
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Robert Flick
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Tatiana Skarina
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Veronica Yim
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Thomas Brüls
  - Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Direction de la Recherche Fondamentale, Institut de Génomique, Université d'Evry Val d'Essonne (UEVE), Centre National de la Recherche Scientifique (CNRS), UMR8030, Génomique métabolique, Evry, France
- Denis Le Paslier
  - Université d'Evry Val d'Essonne (UEVE), Centre National de la Recherche Scientifique (CNRS), UMR8030, Génomique métabolique, Commissariat à l'Energie Atomique et aux Energies Alternatives (CEA), Direction de la Recherche Fondamentale, Institut de Génomique, Evry, France
- Andrzej Joachimiak
  - Midwest Center for Structural Genomics and Structural Biology Center, Biosciences Division, Argonne National Laboratory, Argonne, Illinois 60439, USA
- Olga V Golyshina
  - School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, UK
- Alexei Savchenko
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Peter N Golyshin
  - School of Biological Sciences, Bangor University, Gwynedd LL57 2UW, UK
- Alexander F Yakunin
  - Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
11
Exploring the dark foldable proteome by considering hydrophobic amino acids topology. Sci Rep 2017; 7:41425. [PMID: 28134276 PMCID: PMC5278394 DOI: 10.1038/srep41425] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/12/2016] [Accepted: 12/19/2016] [Indexed: 12/18/2022] Open
Abstract
The protein universe corresponds to the set of all proteins found in all organisms. A way to explore it is by taking into account the domain content of the proteins. However, some parts of sequences and many entire sequences remain un-annotated despite a converging number of domain families. The un-annotated part of the protein universe is referred to as the dark proteome and remains poorly characterized. In this study, we quantify the amount of foldable domains within the dark proteome by using the hydrophobic cluster analysis methodology. These un-annotated foldable domains were grouped using a combination of remote homology searches and domain annotations, leading to the definition of different levels of darkness. The dark foldable domains were analyzed to understand what makes them different from domains stored in databases and thus difficult to annotate. The un-annotated domains of the dark proteome universe display specific features relative to database domains: shorter length, non-canonical content and particular topology in hydrophobic residues, higher propensity for disorder, and a higher energy. These features make them hard to relate to known families. Based on these observations, we emphasize that domain annotation methodologies can still be improved to fully apprehend and decipher the molecular evolution of the protein universe.
|
12
|
Raad MD, Modavi C, Sukovich DJ, Anderson JC. Observing Biosynthetic Activity Utilizing Next Generation Sequencing and the DNA Linked Enzyme Coupled Assay. ACS Chem Biol 2017; 12:191-199. [PMID: 28103681 DOI: 10.1021/acschembio.6b00652] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 11/30/2022]
Abstract
Currently, the identification of new genes drastically outpaces current experimental methods for determining their enzymatic function. This disparity necessitates the development of high-throughput techniques that operate with the same scalability as modern gene synthesis and sequencing technologies. In this paper, we demonstrate the versatility of the recently reported DNA-Linked Enzyme-Coupled Assay (DLEnCA) and its ability to support high-throughput data acquisition through next-generation sequencing (NGS). Utilizing methyltransferases, we highlight DLEnCA's ability to rapidly profile an enzyme's substrate specificity, determine relative enzyme kinetics, detect biosynthetic formation of a target molecule, and its potential to benefit from the scales and standardization afforded by NGS. This improved methodology minimizes the effort in acquiring biosynthetic knowledge by tying biochemical techniques to the rapidly evolving abilities in sequencing and synthesizing DNA.
Affiliation(s)
- Markus de Raad
- Department of Biological Engineering, Synthetic Biology Institute, University of California, Berkeley, Berkeley, California 94704, United States
| | - Cyrus Modavi
- Department of Biological Engineering, Synthetic Biology Institute, University of California, Berkeley, Berkeley, California 94704, United States
| | - David J. Sukovich
- Department of Biological Engineering, Synthetic Biology Institute, University of California, Berkeley, Berkeley, California 94704, United States
| | - J. Christopher Anderson
- Department of Biological Engineering, Synthetic Biology Institute, University of California, Berkeley, Berkeley, California 94704, United States
| |
|
13
|
Pearson VM, Caudle SB, Rokyta DR. Viral recombination blurs taxonomic lines: examination of single-stranded DNA viruses in a wastewater treatment plant. PeerJ 2016; 4:e2585. [PMID: 27781171 PMCID: PMC5075696 DOI: 10.7717/peerj.2585] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 06/10/2016] [Accepted: 09/19/2016] [Indexed: 12/26/2022] Open
Abstract
Understanding the structure and dynamics of microbial communities, especially those of economic concern, is of paramount importance to maintaining healthy and efficient microbial communities at agricultural sites and large industrial cultures, including bioprocessors. Wastewater treatment plants are large bioprocessors which receive water from multiple sources, becoming reservoirs for the collection of many viral families that infect a broad range of hosts. To examine this complex collection of viruses, full-length genomes of circular ssDNA viruses were isolated from a wastewater treatment facility using a combination of sucrose-gradient size selection and rolling-circle amplification and sequenced on an Illumina MiSeq. Single-stranded DNA viruses are among the least understood groups of microbial pathogens due to genomic biases and culturing difficulties, particularly compared to the larger, more often studied dsDNA viruses. However, the group contains several notable well-studied examples, including agricultural pathogens which infect both livestock and crops (Circoviridae and Geminiviridae), and model organisms for genetics and evolution studies (Microviridae). Examination of the collected viral DNA provided evidence for 83 unique genotypic groupings, which were genetically dissimilar to known viral types and exhibited broad diversity within the community. Furthermore, although these genomes express similarities to known viral families, such as Circoviridae, Geminiviridae, and Microviridae, many are so divergent that they may represent new taxonomic groups. This study demonstrated the efficacy of the protocol for separating bacteria and large viruses from the sought after ssDNA viruses and the ability to use this protocol to obtain an in-depth analysis of the diversity within this group.
Affiliation(s)
- Victoria M Pearson
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| | - S Brian Caudle
- Division of Food Safety, Florida Department of Agriculture and Consumer Services, Tallahassee, FL, USA
| | - Darin R Rokyta
- Department of Biological Science, Florida State University, Tallahassee, FL, USA
| |
|
14
|
Discovery of Nigri/nox and Panto/pox site-specific recombinase systems facilitates advanced genome engineering. Sci Rep 2016; 6:30130. [PMID: 27444945 PMCID: PMC4957104 DOI: 10.1038/srep30130] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 04/25/2016] [Accepted: 06/27/2016] [Indexed: 12/21/2022] Open
Abstract
Precise genome engineering is instrumental for biomedical research and holds great promise for future therapeutic applications. Site-specific recombinases (SSRs) are valuable tools for genome engineering due to their exceptional ability to mediate precise excision, integration and inversion of genomic DNA in living systems. The ever-increasing complexity of genome manipulations and the desire to understand the DNA-binding specificity of these enzymes are driving efforts to identify novel SSR systems with unique properties. Here, we describe two novel tyrosine site-specific recombination systems designated Nigri/nox and Panto/pox. Nigri originates from Vibrio nigripulchritudo (plasmid VIBNI_pA) and recombines its target site nox with high efficiency and high target-site selectivity, without recombining target sites of the well-established SSRs Cre, Dre, Vika and VCre. Panto, derived from Pantoea sp. aB, is less specific and, in addition to its native target site pox, also recombines the target site for Dre recombinase, called rox. This relaxed specificity allowed the identification of residues that are involved in target site selectivity, thereby advancing our understanding of how SSRs recognize their respective DNA targets.
|
15
|
Lobb B, Doxey AC. Novel function discovery through sequence and structural data mining. Curr Opin Struct Biol 2016; 38:53-61. [DOI: 10.1016/j.sbi.2016.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/19/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 01/30/2023]
|
16
|
Wessels HJCT, de Almeida NM, Kartal B, Keltjens JT. Bacterial Electron Transfer Chains Primed by Proteomics. Adv Microb Physiol 2016; 68:219-352. [PMID: 27134025 DOI: 10.1016/bs.ampbs.2016.02.006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Electron transport phosphorylation is the central mechanism for most prokaryotic species to harvest energy released in the respiration of their substrates as ATP. Microorganisms have evolved incredible variations on this principle, most of which we perhaps do not yet know, considering that only a fraction of microbial richness is known. Besides these variations, microbial species may show substantial versatility in using respiratory systems. In this connection, regulatory mechanisms control the expression of these respiratory enzyme systems and their assembly at the translational and posttranslational levels, to optimally accommodate changes in the supply of their energy substrates. Here, we present an overview of methods and techniques from the field of proteomics to explore bacterial electron transfer chains and their regulation at levels ranging from the whole organism down to the ångström scale of protein structures. From the survey of the literature on this subject, it is concluded that proteomics has indeed contributed substantially to our understanding of bacterial respiratory mechanisms, often in elegant combinations with genetic and biochemical approaches. However, we also note that advanced proteomics offers a wealth of opportunities that have not been exploited at all, or are at best underexploited, in hypothesis-driving and hypothesis-driven research on bacterial bioenergetics. Examples obtained from the related area of mitochondrial oxidative phosphorylation research, where the application of advanced proteomics is more common, may illustrate these opportunities.
Affiliation(s)
- H J C T Wessels
- Nijmegen Center for Mitochondrial Disorders, Radboud Proteomics Centre, Translational Metabolic Laboratory, Radboud University Medical Center, Nijmegen, The Netherlands
| | - N M de Almeida
- Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands
| | - B Kartal
- Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands; Laboratory of Microbiology, Ghent University, Ghent, Belgium
| | - J T Keltjens
- Institute of Water and Wetland Research, Radboud University Nijmegen, Nijmegen, The Netherlands.
| |
|
17
|
Addis MF, Tanca A, Uzzau S, Oikonomou G, Bicalho RC, Moroni P. The bovine milk microbiota: insights and perspectives from -omics studies. Mol Biosyst 2016; 12:2359-72. [DOI: 10.1039/c6mb00217j] [Citation(s) in RCA: 134] [Impact Index Per Article: 16.8] [Indexed: 12/20/2022]
Abstract
Recent findings and future perspectives of -omics studies on the bovine milk microbiota, focusing on its impact on animal health.
Affiliation(s)
- M. F. Addis
- Porto Conte Ricerche
- SP 55 Porto Conte/Capo Caccia
- 07041 Alghero
- Italy
| | - A. Tanca
- Porto Conte Ricerche
- SP 55 Porto Conte/Capo Caccia
- 07041 Alghero
- Italy
| | - S. Uzzau
- Porto Conte Ricerche
- SP 55 Porto Conte/Capo Caccia
- 07041 Alghero
- Italy
- Università degli Studi di Sassari
| | - G. Oikonomou
- Epidemiology and Population Health
- Institute of Infection and Global Health
- University of Liverpool
- Liverpool
- UK
| | - R. C. Bicalho
- Cornell University
- Department of Population Medicine and Diagnostic Sciences
- College of Veterinary Medicine
- Ithaca
- USA
| | - P. Moroni
- Cornell University
- Department of Population Medicine and Diagnostic Sciences
- College of Veterinary Medicine
- Ithaca
- USA
| |
|
18
|
Punta M, Mistry J. Homology-Based Annotation of Large Protein Datasets. Methods Mol Biol 2016; 1415:153-176. [PMID: 27115632 DOI: 10.1007/978-1-4939-3572-7_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Advances in DNA sequencing technologies have led to an increasing amount of protein sequence data being generated. Only a small fraction of this protein sequence data will have experimental annotation associated with it. Here, we describe a protocol for in silico homology-based annotation of large protein datasets that makes extensive use of manually curated collections of protein families. We focus on annotations provided by the Pfam database and suggest ways to identify family outliers and family variations. This protocol may be useful to people who are new to protein data analysis, or who are unfamiliar with the current computational tools that are available.
Affiliation(s)
- Marco Punta
- Sorbonne Universités, UPMC-Univ P6, CNRS, Laboratoire de Biologie Computationnelle et Quantitative - UMR 7238, 15 rue de l'Ecole de Médecine, Paris, France.
| | - Jaina Mistry
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| |
|
19
|
Yutin N, Shevchenko S, Kapitonov V, Krupovic M, Koonin EV. A novel group of diverse Polinton-like viruses discovered by metagenome analysis. BMC Biol 2015; 13:95. [PMID: 26560305 PMCID: PMC4642659 DOI: 10.1186/s12915-015-0207-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 08/14/2015] [Accepted: 10/28/2015] [Indexed: 01/08/2023] Open
Abstract
Background The rapidly growing metagenomic databases provide increasing opportunities for computational discovery of new groups of organisms. Identification of new viruses is particularly straightforward given the comparatively small size of viral genomes, although fast evolution of viruses complicates the analysis of novel sequences. Here we report the metagenomic discovery of a distinct group of diverse viruses that are distantly related to the eukaryotic virus-like transposons of the Polinton superfamily. Results The sequence of the putative major capsid protein (MCP) of the unusual linear virophage associated with Phaeocystis globosa virus (PgVV) was used as a bait to identify potential related viruses in metagenomic databases. Assembly of the contigs encoding the PgVV MCP homologs followed by comprehensive sequence analysis of the proteins encoded in these contigs resulted in the identification of a large group of Polinton-like viruses (PLV) that resemble Polintons (polintoviruses) and virophages in genome size, and share with them a conserved minimal morphogenetic module that consists of major and minor capsid proteins and the packaging ATPase. With a single exception, the PLV lack the retrovirus-type integrase that is encoded in the genomes of all Polintons and the Mavirus group of virophages. However, some PLV encode a newly identified tyrosine recombinase-integrase that is common in bacteria and bacteriophages and is also found in the Organic Lake virophage group. Although several PLV genomes and individual genes are integrated into algal genomes, it appears likely that most of the PLV are viruses. Given the absence of protease and retrovirus-type integrase, the PLV could resemble the ancestral polintoviruses that evolved from bacterial tectiviruses. 
Apart from the conserved minimal morphogenetic module, the PLV widely differ in their genome complements but share a gene network with Polintons and virophages, suggestive of multiple gene exchanges within a shared gene pool. Conclusions The discovery of PLV substantially expands the emerging class of eukaryotic viruses and transposons that also includes Polintons and virophages. This class of selfish elements is extremely widespread and might have been a hotbed of eukaryotic virus, transposon and plasmid evolution. New families of these elements are expected to be discovered.
Affiliation(s)
- Natalya Yutin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Sofiya Shevchenko
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Vladimir Kapitonov
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA
| | - Mart Krupovic
- Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Department of Microbiology, Institut Pasteur, Paris, France
| | - Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20894, USA.
| |
|
20
|
Masuch T, Kusnezowa A, Nilewski S, Bautista JT, Kourist R, Leichert LI. A combined bioinformatics and functional metagenomics approach to discovering lipolytic biocatalysts. Front Microbiol 2015; 6:1110. [PMID: 26528261 PMCID: PMC4602143 DOI: 10.3389/fmicb.2015.01110] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/31/2015] [Accepted: 09/25/2015] [Indexed: 11/30/2022] Open
Abstract
The majority of protein sequence data published today is of metagenomic origin. However, our ability to assign functions to these sequences is often hampered by our general inability to cultivate the larger part of microbial species and the sheer amount of sequence data generated in these projects. Here we present a combination of bioinformatics, synthetic biology, and Escherichia coli genetics to discover biocatalysts in metagenomic datasets. We created a subset of the Global Ocean Sampling dataset, the largest metagenomic project published to date, by removing all proteins that matched Hidden Markov Models of known protein families from PFAM and TIGRFAM with high confidence (E-value > 10⁻⁵). This essentially left us with proteins with low or no homology to known protein families, still encompassing ~1.7 million different sequences. In this subset, we then identified protein families de novo with a Markov clustering algorithm. For each protein family, we defined a single representative based on its phylogenetic relationship to all other members in that family. This reduced the dataset to ~17,000 representatives of protein families with more than 10 members. Based on conserved regions typical for lipases and esterases, we selected a representative gene from a family of 27 members for synthesis. This protein, when expressed in E. coli, showed lipolytic activity toward para-nitrophenyl (pNP) esters. The Km-value of the enzyme was 66.68 μM for pNP-butyrate and 68.08 μM for pNP-palmitate with kcat/Km values at 3.4 × 10⁶ and 6.6 × 10⁵ M⁻¹ s⁻¹, respectively. Hydrolysis of model substrates showed enantiopreference for the R-form. Reactions yielded 43 and 61% enantiomeric excess of products with ibuprofen methyl ester and 2-phenylpropanoic acid ethyl ester, respectively. The enzyme retains 50% of its maximum activity at temperatures as low as 10°C, and its activity is enhanced in artificial seawater and in buffers with higher salt concentrations, with an optimum osmolarity of 3,890 mosmol/l.
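The kinetic figures quoted in this abstract can be unpacked with a little arithmetic. The sketch below is illustrative only: the abstract reports Km and kcat/Km but not kcat, so the turnover numbers here are back-calculated under the assumption that the two reported values describe the same measurement, and the function names are hypothetical. It also converts the reported enantiomeric excess into the implied share of the preferred (R) enantiomer, using the standard definition ee = (major − minor) / (major + minor).

```python
# Values as reported in the abstract (pNP = para-nitrophenyl).
# kcat itself is NOT reported; it is back-calculated as (kcat/Km) * Km.
REPORTED = {
    "pNP-butyrate": {"km_uM": 66.68, "kcat_over_km": 3.4e6},
    "pNP-palmitate": {"km_uM": 68.08, "kcat_over_km": 6.6e5},
}


def implied_kcat(km_uM: float, kcat_over_km: float) -> float:
    """Return kcat in s^-1 from Km (micromolar) and kcat/Km (M^-1 s^-1)."""
    km_molar = km_uM * 1e-6  # convert micromolar to molar
    return kcat_over_km * km_molar


def major_enantiomer_fraction(ee_percent: float) -> float:
    """Return the fraction of the major enantiomer for a given ee (%)."""
    return (1.0 + ee_percent / 100.0) / 2.0


for substrate, k in REPORTED.items():
    kcat = implied_kcat(k["km_uM"], k["kcat_over_km"])
    print(f"{substrate}: implied kcat = {kcat:.1f} s^-1")

for ee in (43, 61):
    share = major_enantiomer_fraction(ee)
    print(f"ee {ee}% -> major enantiomer share {share:.3f}")
```

Under these assumptions the implied turnover numbers are roughly 227 s⁻¹ for pNP-butyrate and 45 s⁻¹ for pNP-palmitate, so the fivefold difference in catalytic efficiency comes almost entirely from kcat, since the two Km values are nearly identical.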
Affiliation(s)
- Thorsten Masuch
- Department of Microbial Biochemistry, Institute of Biochemistry and Pathobiochemistry, Ruhr University Bochum, Bochum, Germany
| | - Anna Kusnezowa
- Department of Microbial Biochemistry, Institute of Biochemistry and Pathobiochemistry, Ruhr University Bochum, Bochum, Germany
| | - Sebastian Nilewski
- Department of Microbial Biochemistry, Institute of Biochemistry and Pathobiochemistry, Ruhr University Bochum, Bochum, Germany
| | - José T Bautista
- Junior Research Group for Microbial Biotechnology - Department for Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany
| | - Robert Kourist
- Junior Research Group for Microbial Biotechnology - Department for Biology and Biotechnology, Ruhr University Bochum, Bochum, Germany
| | - Lars I Leichert
- Department of Microbial Biochemistry, Institute of Biochemistry and Pathobiochemistry, Ruhr University Bochum, Bochum, Germany
| |
|
21
|
An assessment of the amount of untapped fold level novelty in under-sampled areas of the tree of life. Sci Rep 2015; 5:14717. [PMID: 26434770 PMCID: PMC4592975 DOI: 10.1038/srep14717] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/27/2015] [Accepted: 09/07/2015] [Indexed: 11/14/2022] Open
Abstract
Previous studies of protein fold space suggest that fold coverage is plateauing. However, sequence sampling has been (and remains to a large extent) heavily biased, focusing on culturable phyla. Sustained technological developments have fuelled the advent of metagenomics and single-cell sequencing, which might correct the current sequencing bias. The extent to which these efforts affect structural diversity remains unclear, although preliminary results suggest that uncultured organisms could constitute a source of new folds. We investigate to what extent genomes from uncultured and under-sampled phyla accessed through single-cell sequencing, metagenomics and high-throughput culturing efforts have the potential to increase protein fold space, and conclude that i) genomes from under-sampled phyla appear enriched in sequences not covered by current protein family and fold profile libraries, ii) this enrichment is linked to an excess of short (and possibly partly spurious) sequences in some of the datasets, iii) the discovery rate of novel folds among sequences uncovered by current fold and family profile libraries may be as high as 36%, but would ultimately translate into a marginal increase in global discovery of novel folds. Thus, genomes from under-sampled phyla should have a rather limited impact on increasing coarse-grained tertiary structure level novelty.
|
22
|
Lobb B, Kurtz DA, Moreno-Hagelsieb G, Doxey AC. Remote homology and the functions of metagenomic dark matter. Front Genet 2015; 6:234. [PMID: 26257768 PMCID: PMC4508852 DOI: 10.3389/fgene.2015.00234] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 03/05/2015] [Accepted: 06/22/2015] [Indexed: 01/26/2023] Open
Abstract
Predicted open reading frames (ORFs) that lack detectable homology to known proteins are termed ORFans. Despite their prevalence in metagenomes, the extent to which ORFans encode real proteins, the degree to which they can be annotated, and their functional contributions, remain unclear. To gain insights into these questions, we applied sensitive remote-homology detection methods to functionally analyze ORFans from soil, marine, and human gut metagenome collections. ORFans were identified, clustered into sequence families, and annotated through profile-profile comparison to proteins of known structure. We found that a considerable number of metagenomic ORFans (73,896 of 484,121, 15.3%) exhibit significant remote homology to structurally characterized proteins, providing a means for ORFan functional profiling. The extent of detected remote homology far exceeds that obtained for artificial protein families (1.4%). As expected for real genes, the predicted functions of ORFans are significantly similar to the functions of their gene neighbors (p < 0.001). Compared to the functional profiles predicted through standard homology searches, ORFans show biologically intriguing differences. Many ORFan-enriched functions are virus-related and tend to reflect biological processes associated with extreme sequence diversity. Each environment also possesses a large number of unique ORFan families and functions, including some known to play important community roles such as gut microbial polysaccharide digestion. Lastly, ORFans are a valuable resource for finding novel enzymes of interest, as we demonstrate through the identification of hundreds of novel ORFan metalloproteases that all possess a signature catalytic motif despite a general lack of similarity to known proteins. Our ORFan functional predictions are a valuable resource for discovering novel protein families and exploring the boundaries of protein sequence space. 
All remote homology predictions are available at http://doxey.uwaterloo.ca/ORFans.
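As a quick check on the headline numbers in this abstract, the short sketch below recomputes the share of ORFans with detectable remote homology and compares it with the control rate reported for artificial protein families. The counts and the 1.4% control figure are taken from the abstract; the variable and function names are illustrative assumptions, not anything from the authors' pipeline.

```python
# Counts and control rate as stated in the abstract.
ORFANS_WITH_REMOTE_HOMOLOGY = 73_896
ORFANS_TOTAL = 484_121
ARTIFICIAL_FAMILY_RATE_PCT = 1.4  # rate for artificial protein families


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


orfan_rate_pct = percent(ORFANS_WITH_REMOTE_HOMOLOGY, ORFANS_TOTAL)
fold_over_control = orfan_rate_pct / ARTIFICIAL_FAMILY_RATE_PCT

print(f"ORFans with remote homology: {orfan_rate_pct:.1f}%")  # 15.3%
print(f"Fold over artificial control: {fold_over_control:.1f}x")  # 10.9x
```

The recomputed share matches the 15.3% stated in the abstract and is about eleven times the artificial-family rate, which is the basis for the claim that the detected homology "far exceeds" the control.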
Affiliation(s)
- Briallen Lobb
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
| | - Daniel A Kurtz
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
| | | | - Andrew C Doxey
- Department of Biology, University of Waterloo, Waterloo, ON, Canada
| |
|
23
|
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 150] [Impact Index Per Article: 16.7] [Indexed: 01/05/2023] Open
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for marginal stability of natural globular proteins and how requirement for kinetic stability and avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| | - Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
| |
|
24
|
Bengtsson-Palme J, Alm Rosenblad M, Molin M, Blomberg A. Metagenomics reveals that detoxification systems are underrepresented in marine bacterial communities. BMC Genomics 2014; 15:749. [PMID: 25179155 PMCID: PMC4161860 DOI: 10.1186/1471-2164-15-749] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 04/27/2014] [Accepted: 08/26/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Environmental shotgun sequencing (metagenomics) provides a new way to study communities in microbial ecology. We here use sequence data from the Global Ocean Sampling (GOS) expedition to investigate toxicant selection pressures revealed by the presence of detoxification genes in marine bacteria. To capture a broad range of potential toxicants we selected detoxification protein families representing systems protecting microorganisms from a variety of stressors, such as metals, organic compounds, antibiotics and oxygen radicals. RESULTS Using a bioinformatics procedure based on comparative analysis to finished bacterial genomes we found that the number of detoxification genes present in marine microorganisms seems surprisingly small. The underrepresentation is particularly evident for toxicant transporters and proteins involved in detoxifying metals. Exceptions are enzymes involved in oxidative stress defense where peroxidase enzymes are more abundant in marine bacteria compared to bacteria in general. In contrast, catalases are almost completely absent from the open ocean environment, suggesting that peroxidases and peroxiredoxins constitute a core line of defense against reactive oxygen species (ROS) in the marine milieu. CONCLUSIONS We found no indication that detoxification systems would be generally more abundant close to the coast compared to the open ocean. On the contrary, for several of the protein families that displayed a significant geographical distribution, like peroxidase, penicillin binding transpeptidase and divalent ion transport protein, the open ocean samples showed the highest abundance. Along the same lines, the abundance of most detoxification proteins did not increase with estimated pollution. The low level of detoxification systems in marine bacteria indicates that the majority of marine bacteria have a low capacity to adapt to increased pollution. Our study exemplifies the use of metagenomics data in ecotoxicology, and in particular how anthropogenic consequences on life in the sea can be examined.
Affiliation(s)
- Johan Bengtsson-Palme
- Department of Chemistry and Molecular Biology, University of Gothenburg, Box 462, SE-405 30 Göteborg, Sweden.
| | | | | | | |
|
25
|
Bošnjak I, Bojović V, Šegvić-Bubić T, Bielen A. Occurrence of protein disulfide bonds in different domains of life: a comparison of proteins from the Protein Data Bank. Protein Eng Des Sel 2014; 27:65-72. [PMID: 24407015 DOI: 10.1093/protein/gzt063] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 11/14/2022] Open
Abstract
Disulfide bonds (SS bonds) are important post-translational modifications of proteins. They stabilize a three-dimensional (3D) structure (structural SS bonds) and also have the catalytic or regulatory functions (redox-active SS bonds). Although SS bonds are present in all groups of organisms, no comparative analyses of their frequency in proteins from different domains of life have been made to date. Using the Protein Data Bank, the number and subcellular locations of SS bonds in Archaea, Bacteria and Eukarya have been compared. Approximately three times higher frequency of proteins with SS bonds in eukaryotic secretory organelles (e.g. endoplasmic reticulum) than in bacterial periplasmic/secretory pathways was calculated. Protein length also affects the SS bond frequency: the average number of SS bonds is positively correlated with the length for longer proteins (>200 amino acids), while for the shorter and less stable proteins (<200 amino acids) this correlation is negative. Medium-sized proteins (250-350 amino acids) indicated a high number of SS bonds only in Archaea which could be explained by the need for additional protein stabilization in hyperthermophiles. The results emphasize higher capacity for the SS bond formation and isomerization in Eukarya when compared with Archaea and Bacteria.
Affiliation(s)
- I Bošnjak
- Laboratory for Biology and Microbial Genetics, Department of Biochemical Engineering, Faculty of Food Technology and Biotechnology, Pierottijeva 6, 10000 Zagreb, Croatia
|
26
|
Abstract
Efficient high-throughput gene cloning represents a critical first step for conducting functional and structural proteomics in the post-genomic era. The ligation-independent cloning (LIC) method has been almost universally adopted by large structural biology centers as a component of high-throughput structure determination pipelines. The LIC platform is easy to use, inexpensive, and rapid. Importantly, it is easily adapted to 96- or 384-well formats, thereby facilitating automation. Procedures are described for 96-well format cloning using the LIC technology.
Affiliation(s)
- Keehwan Kwon
- J. Craig Venter Institute, 9704 Medical Center Dr., Rockville, MD, 20850, USA
|
27
|
Sharpton TJ. An introduction to the analysis of shotgun metagenomic data. FRONTIERS IN PLANT SCIENCE 2014; 5:209. [PMID: 24982662 PMCID: PMC4059276 DOI: 10.3389/fpls.2014.00209] [Citation(s) in RCA: 280] [Impact Index Per Article: 28.0] [Received: 02/14/2014] [Accepted: 04/29/2014] [Indexed: 05/19/2023]
Abstract
Environmental DNA sequencing has revealed the expansive biodiversity of microorganisms and clarified the relationship between host-associated microbial communities and host phenotype. Shotgun metagenomic DNA sequencing is a relatively new and powerful environmental sequencing approach that provides insight into community biodiversity and function. However, the analysis of metagenomic sequences is complicated by the complex structure of the data. Fortunately, new tools and data resources have been developed to circumvent these complexities and allow researchers to determine which microbes are present in the community and what they might be doing. This review describes the analytical strategies and specific tools that can be applied to metagenomic data and the considerations and caveats associated with their use. Specifically, it documents how metagenomes can be analyzed to quantify community structure and diversity, assemble novel genomes, identify new taxa and genes, and determine which metabolic pathways are encoded in the community. It also discusses several methods that can be used to compare metagenomes to identify taxa and functions that differentiate communities.
Affiliation(s)
- Thomas J. Sharpton
- *Correspondence: Thomas J. Sharpton, Department of Microbiology and Department of Statistics, Oregon State University, 220 Nash Hall, Corvallis, OR 97331, USA e-mail:
|
28
|
Buj R, Iglesias N, Planas AM, Santalucía T. A plasmid toolkit for cloning chimeric cDNAs encoding customized fusion proteins into any Gateway destination expression vector. BMC Mol Biol 2013; 14:18. [PMID: 23957834 PMCID: PMC3765358 DOI: 10.1186/1471-2199-14-18] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/08/2013] [Accepted: 08/12/2013] [Indexed: 12/31/2022]
Abstract
Background: Valuable clone collections encoding the complete ORFeomes for some model organisms have been constructed following the completion of their genome sequencing projects. These libraries are based on Gateway cloning technology, which facilitates the study of protein function by simplifying the subcloning of open reading frames (ORFs) into any suitable destination vector. The expression of proteins of interest as fusions with functional modules is a frequent approach in their initial functional characterization. A limited number of Gateway destination expression vectors allow the construction of fusion proteins from ORFeome-derived sequences, but they are restricted to the possibilities offered by their inbuilt functional modules and their pre-defined model organism-specificity. Thus, the availability of cloning systems that overcome these limitations would be highly advantageous.
Results: We present a versatile cloning toolkit for constructing fully-customizable three-part fusion proteins based on the MultiSite Gateway cloning system. The fusion protein components are encoded in the three plasmids integral to the kit. These can recombine with any purposely-engineered destination vector that uses a heterologous promoter external to the Gateway cassette, leading to the in-frame cloning of an ORF of interest flanked by two functional modules. In contrast to previous systems, a third part becomes available for peptide-encoding as it no longer needs to contain a promoter, resulting in an increased number of possible fusion combinations. We have constructed the kit’s component plasmids and demonstrate its functionality by providing proof-of-principle data on the expression of prototype fluorescent fusions in transiently-transfected cells.
Conclusions: We have developed a toolkit for creating fusion proteins with customized N- and C-terminal modules from Gateway entry clones encoding ORFs of interest. Importantly, our method allows entry clones obtained from ORFeome collections to be used without prior modifications. Using this technology, any existing Gateway destination expression vector with its model-specific properties could be easily adapted for expressing fusion proteins.
Affiliation(s)
- Raquel Buj
- Department of Brain Ischemia and Neurodegeneration, Institut d'Investigacions Biomèdiques de Barcelona (IIBB)-Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
|
29
|
Serine/threonine kinases and E2-ubiquitin conjugating enzymes in Planctomycetes: unexpected findings. Antonie van Leeuwenhoek 2013; 104:509-20. [DOI: 10.1007/s10482-013-9993-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/03/2013] [Accepted: 07/26/2013] [Indexed: 12/25/2022]
|
30
|
Bornberg-Bauer E, Albà MM. Dynamics and adaptive benefits of modular protein evolution. Curr Opin Struct Biol 2013; 23:459-66. [PMID: 23562500 DOI: 10.1016/j.sbi.2013.02.012] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Received: 01/08/2013] [Revised: 02/15/2013] [Accepted: 02/15/2013] [Indexed: 11/29/2022]
Abstract
During protein evolution, novel domain arrangements are continuously formed. Rearrangements are important for the creation of molecular biodiversity and for functional molecular changes which underlie developmental shifts in the bauplan of organisms. Here we review the mechanisms by which new arrangements arise and the potential benefits of rearrangements. We concentrate on how new domains emerge and why they rapidly spread across genomes, gaining higher copy numbers than older, more established domains. This spread is most likely a consequence of their high adaptive potential but is unlikely to make up on its own for the drastic loss of domains, which is observed across different taxa. We show that a significant portion of the recently emerged domains, especially those in multidomain families, are highly disordered and speculate about the significance of these findings for the evolvability of novel genetic material.
Affiliation(s)
- Erich Bornberg-Bauer
- Institute for Evolution and Biodiversity, School of Biological Sciences, University of Münster, Hüfferstrasse 1, D48149 Münster, Germany.
|
31
|
New nuclear markers and exploration of the relationships among Serraniformes (Acanthomorpha, Teleostei): The importance of working at multiple scales. Mol Phylogenet Evol 2013; 67:140-55. [DOI: 10.1016/j.ympev.2012.12.020] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/02/2012] [Revised: 11/30/2012] [Accepted: 12/28/2012] [Indexed: 01/20/2023]
|
32
|
Protein structure prediction from sequence variation. Nat Biotechnol 2013; 30:1072-80. [PMID: 23138306 DOI: 10.1038/nbt.2419] [Citation(s) in RCA: 430] [Impact Index Per Article: 39.1] [Received: 08/28/2012] [Accepted: 10/15/2012] [Indexed: 02/07/2023]
Abstract
Genomic sequences contain rich evolutionary information about functional constraints on macromolecules such as proteins. This information can be efficiently mined to detect evolutionary couplings between residues in proteins and address the long-standing challenge to compute protein three-dimensional structures from amino acid sequences. Substantial progress has recently been made on this problem owing to the explosive growth in available sequences and the application of global statistical methods. In addition to three-dimensional structure, the improved understanding of covariation may help identify functional residues involved in ligand binding, protein-complex formation and conformational changes. We expect computation of covariation patterns to complement experimental structural biology in elucidating the full spectrum of protein structures, their functional interactions and evolutionary dynamics.
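As a rough illustration of what "covariation" between residues means, the sketch below computes the mutual information between two columns of a multiple sequence alignment. This is a simpler, local measure and not the global statistical method the abstract refers to; the function name and the four-sequence toy alignment are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    `alignment` is a list of equal-length aligned sequences.
    A local covariation measure, used here only as an illustration.
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)

    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        p_a = counts_i[a] / n
        p_b = counts_j[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: columns 0 and 2 vary together,
# columns 0 and 3 vary independently, column 1 is fully conserved.
toy_alignment = ["ACDE", "ACDF", "GCHE", "GCHF"]

coupled = column_mutual_information(toy_alignment, 0, 2)
independent = column_mutual_information(toy_alignment, 0, 3)
conserved = column_mutual_information(toy_alignment, 0, 1)
```

In this toy case the coupled pair scores higher than the independent or conserved pairs. Pairwise scores of this kind can be inflated by indirect correlations, which is one motivation for global statistical models.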
|
33
|
Minkiewicz P, Bucholska J, Darewicz M, Borawska J. Epitopic hexapeptide sequences from Baltic cod parvalbumin beta (allergen Gad c 1) are common in the universal proteome. Peptides 2012; 38:105-9. [PMID: 22940202 DOI: 10.1016/j.peptides.2012.08.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 07/18/2012] [Revised: 08/14/2012] [Accepted: 08/14/2012] [Indexed: 01/25/2023]
Abstract
The aim of this study was to analyze the distribution of hexapeptide fragments considered as epitopes of Baltic cod parvalbumin beta (allergen Gad c 1) in the universal proteome. Cod (Gadus morhua subsp. callarias) parvalbumin hexapeptides cataloged in the Immune Epitope Database were used as query sequences. The UniProt database was screened using the WU-BLAST 2 program. The distribution of hexapeptide fragments was investigated in various protein families, classified according to the presence of the appropriate domains, and in proteins of plant, animal and microbial species. Hexapeptides from cod parvalbumin were found in the proteins of plants and animals which are food sources, microorganisms with various applications in food technology and biotechnology, microorganisms which are human symbionts and commensals as well as human pathogens. In the last case possible coverage between epitopes from pathogens and allergens should be avoided during vaccine design.
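The idea of locating short epitope fragments within a set of proteins can be sketched with a sliding-window exact match. Note that the study itself screened UniProt with WU-BLAST 2; the code below swaps in plain substring matching for simplicity, and the function name, peptide strings and protein sequences are all hypothetical toy values (they are not real Gad c 1 epitopes or database entries).

```python
from collections import defaultdict


def find_hexapeptide_hits(epitopes, proteins, k=6):
    """Return {epitope: [(protein_id, 0-based offset), ...]} for exact matches.

    `epitopes` is an iterable of peptide strings; only those of length k are used.
    `proteins` maps a protein identifier to its amino acid sequence.
    """
    wanted = {pep for pep in epitopes if len(pep) == k}
    hits = defaultdict(list)
    for protein_id, sequence in proteins.items():
        # Slide a window of length k along the sequence.
        for start in range(len(sequence) - k + 1):
            window = sequence[start:start + k]
            if window in wanted:
                hits[window].append((protein_id, start))
    return dict(hits)


# Hypothetical toy inputs, for illustration only.
toy_epitopes = ["AAGDSD", "KLFLQN"]
toy_proteins = {
    "toy_protein_1": "MKAAGDSDGLL",
    "toy_protein_2": "MSKLFLQNAAGDSD",
    "toy_protein_3": "MTTTTTTTT",
}

hits = find_hexapeptide_hits(toy_epitopes, toy_proteins)
```

Exact matching only reports identical hexapeptides. A BLAST-style search, as used in the study, can also report near matches, so the two approaches are not interchangeable.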
Affiliation(s)
- Piotr Minkiewicz
- University of Warmia and Mazury in Olsztyn, Chair of Food Biochemistry, Olsztyn-Kortowo, Poland.
|
34
|
Dougherty MJ, D'haeseleer P, Hazen TC, Simmons BA, Adams PD, Hadi MZ. Glycoside hydrolases from a targeted compost metagenome, activity-screening and functional characterization. BMC Biotechnol 2012; 12:38. [PMID: 22759983 PMCID: PMC3477009 DOI: 10.1186/1472-6750-12-38] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 12/22/2011] [Accepted: 07/03/2012] [Indexed: 11/29/2022]
Abstract
Background: Metagenomics approaches provide access to environmental genetic diversity for biotechnology applications, enabling the discovery of new enzymes and pathways for numerous catalytic processes. Discovery of new glycoside hydrolases with improved biocatalytic properties for the efficient conversion of lignocellulosic material to biofuels is a critical challenge in the development of economically viable routes from biomass to fuels and chemicals.
Results: Twenty-two putative ORFs (open reading frames) were identified from a switchgrass-adapted compost community based on sequence homology to related gene families. These ORFs were expressed in E. coli and assayed for predicted activities. Seven of the ORFs were demonstrated to encode active enzymes, encompassing five classes of hemicellulases. Four enzymes were overexpressed in vivo, purified to homogeneity and subjected to detailed biochemical characterization. Their pH optima ranged from 5.5 to 7.5 and they exhibited moderate thermostability up to ~60-70°C.
Conclusions: Seven active enzymes were identified from this set of ORFs, comprising five different hemicellulase activities. These enzymes have been shown to have useful properties, such as moderate thermal stability and broad pH optima, and may serve as starting points for future protein engineering towards the goal of developing efficient enzyme cocktails for biomass degradation under diverse process conditions.
|
35
|
Svenson J. MabCent: Arctic marine bioprospecting in Norway. PHYTOCHEMISTRY REVIEWS : PROCEEDINGS OF THE PHYTOCHEMICAL SOCIETY OF EUROPE 2012; 12:567-578. [PMID: 24078803 PMCID: PMC3777186 DOI: 10.1007/s11101-012-9239-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 11/08/2011] [Accepted: 05/12/2012] [Indexed: 05/24/2023]
Abstract
The deep waters surrounding the coastline of the northern parts of Norway represent an exciting biotope for marine exploration. Dark and cold Arctic water generates a hostile environment where the ability to adapt is crucial to survival. These waters are nonetheless bountiful, and a diverse range of marine organisms thrive in these extreme conditions, many with the help of specialised chemical compounds. In comparison to warmer, perhaps more inviting, shallower tropical waters, the Arctic region has not been as thoroughly investigated. MabCent is a Norwegian initiative based in Tromsø that aims to change this. Since 2007, scientists within MabCent have focussed their efforts on the study of marine organisms inhabiting the Arctic waters with the long-term goal of novel drug discovery and development. The activities of MabCent are diverse and range from sampling the Arctic ice shelf to the chemical synthesis of promising secondary metabolites discovered during the screening process. The current review presents the MabCent pipeline from isolation to identification of new bioactive marine compounds via an extensive screening process. An overview of the main activities is given, with particular focus on isolation strategies, bioactivity screening and structure determination. Pitfalls, hard-earned lessons and the results so far are also discussed.
Affiliation(s)
- Johan Svenson
- SmallStruct, Department of Chemistry, University of Tromsø, Breivika, 9037 Tromsø, Norway
|
36
|
Yin Y, Mao X, Yang J, Chen X, Mao F, Xu Y. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic Acids Res 2012; 40:W445-51. [PMID: 22645317 PMCID: PMC3394287 DOI: 10.1093/nar/gks479] [Citation(s) in RCA: 1209] [Impact Index Per Article: 100.8] [Indexed: 12/14/2022]
Abstract
Carbohydrate-active enzymes (CAZymes) are very important to the biotech industry, particularly the emerging biofuel industry, because CAZymes are responsible for the synthesis, degradation and modification of all the carbohydrates on Earth. We have developed a web resource, dbCAN (http://csbl.bmb.uga.edu/dbCAN/annotate.php), to provide automated CAZyme signature domain-based annotation for any given protein data set (e.g. proteins from a newly sequenced genome) submitted to our server. To accomplish this, we have explicitly defined a signature domain for every CAZyme family, derived from CDD (Conserved Domain Database) searches and literature curation. We have also constructed a hidden Markov model (HMM) to represent the signature domain of each CAZyme family. These CAZyme family-specific HMMs are our key contribution and the foundation for the automated CAZyme annotation.
Affiliation(s)
- Yanbin Yin
- Computational System Biology Laboratory, Department of Biochemistry and Molecular Biology, Institute of Bioinformatics, BioEnergy Science Center, University of Georgia, Athens, GA, USA
|
37
|
Collison M, Hirt RP, Wipat A, Nakjang S, Sanseau P, Brown JR. Data mining the human gut microbiota for therapeutic targets. Brief Bioinform 2012; 13:751-68. [DOI: 10.1093/bib/bbs002] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/23/2022]
|
38
|
Thomas T, Gilbert J, Meyer F. Metagenomics - a guide from sampling to data analysis. MICROBIAL INFORMATICS AND EXPERIMENTATION 2012; 2:3. [PMID: 22587947 PMCID: PMC3351745 DOI: 10.1186/2042-5783-2-3] [Citation(s) in RCA: 419] [Impact Index Per Article: 34.9] [Received: 10/13/2011] [Accepted: 02/09/2012] [Indexed: 12/13/2022]
Abstract
Metagenomics applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of entire communities of organisms. The field of metagenomics has been responsible for substantial advances in microbial ecology, evolution, and diversity over the past 5 to 10 years, and many research laboratories are actively engaged in it now. With the growing numbers of activities also comes a plethora of methodological knowledge and expertise that should guide future developments in the field. This review summarizes the current opinions in metagenomics, and provides practical guidance and advice on sample processing, sequencing technology, assembly, binning, annotation, experimental design, statistical analysis, data storage, and data sharing. As more metagenomic datasets are generated, the availability of standardized procedures and shared data storage and analysis becomes increasingly important to ensure that output of individual projects can be assessed and compared.
Affiliation(s)
- Torsten Thomas
- School of Biotechnology and Biomolecular Sciences & Centre for Marine Bio-Innovation, The University of New South Wales, Sydney, NSW 2052, Australia
- Jack Gilbert
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA
- Department of Ecology and Evolution, University of Chicago, 5640 South Ellis Avenue, Chicago, IL 60637, USA
- Folker Meyer
- Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA
- Computation Institute, University of Chicago, 5640 South Ellis Avenue, Chicago, IL 60637, USA
|