1
Wang M, Doak TG, Ye Y. Subtractive assembly for comparative metagenomics, and its application to type 2 diabetes metagenomes. Genome Biol 2015; 16:243. [PMID: 26527161 PMCID: PMC4630832 DOI: 10.1186/s13059-015-0804-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/22/2014] [Accepted: 10/09/2015] [Indexed: 12/18/2022]
Abstract
Comparative metagenomics remains challenging due to the size and complexity of metagenomic datasets. Here we introduce subtractive assembly, a de novo assembly approach for comparative metagenomics that directly assembles only the differential reads that distinguish between two groups of metagenomes. Using simulated datasets, we show it improves both the efficiency of the assembly and the assembly quality of the differential genomes and genes. Further, its application to type 2 diabetes (T2D) metagenomic datasets reveals clear signatures of the T2D gut microbiome, revealing new phylogenetic and functional features of the gut microbial communities associated with T2D.
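One way to picture the read-selection step is as a k-mer abundance filter. The Python sketch below is illustrative only: the abstract does not say how differential reads are detected, so the k-mer ratio test, the pseudocount, and the names `differential_reads`, `min_ratio`, and `min_fraction` are assumptions, not the authors' implementation.

```python
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a read."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count k-mer occurrences across a group of reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def differential_reads(reads_a, reads_b, k=4, min_ratio=2.0, min_fraction=0.5):
    """Keep reads from group A whose k-mers are mostly enriched relative to group B.

    Hypothetical criterion: a k-mer is 'enriched' when its pseudocounted
    abundance ratio (A over B) reaches min_ratio; a read is kept when at
    least min_fraction of its k-mers are enriched.
    """
    counts_a = count_kmers(reads_a, k)
    counts_b = count_kmers(reads_b, k)
    kept = []
    for read in reads_a:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k
        enriched = sum(
            1 for km in read_kmers
            if (counts_a[km] + 1) / (counts_b[km] + 1) >= min_ratio
        )
        if enriched / len(read_kmers) >= min_fraction:
            kept.append(read)
    return kept
```

Only the reads that pass such a filter would then be handed to a de novo assembler, which is where the efficiency gain described in the abstract would come from.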
Affiliation(s)
- Mingjie Wang
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA.
- Thomas G Doak
- Department of Biology, Indiana University, Bloomington, IN, 47405, USA; National Center for Genome Analysis Support, Indiana University, Bloomington, IN, 47401, USA.
- Yuzhen Ye
- School of Informatics and Computing, Indiana University, Bloomington, IN, 47405, USA.
2
Aires T, Marbà N, Serrao EA, Duarte CM, Arnaud-Haond S. Selective elimination of chloroplastidial DNA for metagenomics of bacteria associated with the green alga Caulerpa taxifolia (Bryopsidophyceae). Journal of Phycology 2012; 48:483-490. [PMID: 27009738 DOI: 10.1111/j.1529-8817.2012.01124.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]
Abstract
Molecular analyses of bacteria associated with photosynthetic organisms are often confounded by coamplification of the chloroplastidial 16S rDNA with the targeted bacterial 16S rDNA. This major problem has hampered progress in the characterization of bacterial communities associated with photosynthetic organisms and has limited the full realization of the potential offered by the latest generation of metagenomics approaches. A simple and inexpensive method is presented, based on ethanol and bleach treatments prior to extraction, to efficiently discard a large part of the chloroplastidial DNA without affecting the characterization of bacterial communities through pyrosequencing. Its effectiveness for the description of bacterial lineages associated with the green alga Caulerpa taxifolia (M. Vahl) C. Agardh was much higher than that of the preexisting enrichment protocols proposed for plants. Furthermore, this new technique requires a very small amount of biological material compared to the other current protocols, making it more realistic for systematic use in ecological and phylogenetic studies and opening promising prospects for metagenomics of green algae, as shown by our data.
Affiliation(s)
- Tânia Aires
- CCMAR - Center for Marine Sciences, CIMAR, FCT, University of Algarve, Gambelas, P-8005-139, Faro, Portugal; Department of Global Change Research, IMEDEA (CSIC-UIB) Institut Mediterrani d'Estudis Avançats, Miquel Marques 21, 07190 Esporles, Mallorca, Spain; The UWA Oceans Institute, University of Western Australia, 35 Stirling Highway, Crawley 6009, Australia; IFREMER - Technopole de Brest-Iroise BP 70 29280 Plouzané, France
- Núria Marbà
- CCMAR - Center for Marine Sciences, CIMAR, FCT, University of Algarve, Gambelas, P-8005-139, Faro, Portugal; Department of Global Change Research, IMEDEA (CSIC-UIB) Institut Mediterrani d'Estudis Avançats, Miquel Marques 21, 07190 Esporles, Mallorca, Spain; The UWA Oceans Institute, University of Western Australia, 35 Stirling Highway, Crawley 6009, Australia; IFREMER - Technopole de Brest-Iroise BP 70 29280 Plouzané, France
- Ester A Serrao
- CCMAR - Center for Marine Sciences, CIMAR, FCT, University of Algarve, Gambelas, P-8005-139, Faro, Portugal; Department of Global Change Research, IMEDEA (CSIC-UIB) Institut Mediterrani d'Estudis Avançats, Miquel Marques 21, 07190 Esporles, Mallorca, Spain; The UWA Oceans Institute, University of Western Australia, 35 Stirling Highway, Crawley 6009, Australia; IFREMER - Technopole de Brest-Iroise BP 70 29280 Plouzané, France
- Carlos M Duarte
- CCMAR - Center for Marine Sciences, CIMAR, FCT, University of Algarve, Gambelas, P-8005-139, Faro, Portugal; Department of Global Change Research, IMEDEA (CSIC-UIB) Institut Mediterrani d'Estudis Avançats, Miquel Marques 21, 07190 Esporles, Mallorca, Spain; The UWA Oceans Institute, University of Western Australia, 35 Stirling Highway, Crawley 6009, Australia; IFREMER - Technopole de Brest-Iroise BP 70 29280 Plouzané, France
- Sophie Arnaud-Haond
- CCMAR - Center for Marine Sciences, CIMAR, FCT, University of Algarve, Gambelas, P-8005-139, Faro, Portugal; Department of Global Change Research, IMEDEA (CSIC-UIB) Institut Mediterrani d'Estudis Avançats, Miquel Marques 21, 07190 Esporles, Mallorca, Spain; The UWA Oceans Institute, University of Western Australia, 35 Stirling Highway, Crawley 6009, Australia; IFREMER - Technopole de Brest-Iroise BP 70 29280 Plouzané, France
3
Wu YW, Ye Y. A novel abundance-based algorithm for binning metagenomic sequences using l-tuples. J Comput Biol 2011; 18:523-34. [PMID: 21385052 DOI: 10.1089/cmb.2010.0245] [Citation(s) in RCA: 90] [Impact Index Per Article: 6.9] [Indexed: 11/12/2022]
Abstract
Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. Among the computational tools recently developed for metagenomic sequence analysis, binning tools attempt to classify the sequences in a metagenomic dataset into different bins (i.e., species), based on various DNA composition patterns (e.g., the tetramer frequencies) of various genomes. Composition-based binning methods, however, cannot be used to classify very short fragments, because of the substantial variation of DNA composition patterns within a single genome. We developed a novel approach (AbundanceBin) for metagenomics binning by utilizing the different abundances of species living in the same environment. AbundanceBin is an application of the Lander-Waterman model to metagenomics, which is based on the l-tuple content of the reads. AbundanceBin achieved accurate, unsupervised clustering of metagenomic sequences into different bins, such that the reads classified in a bin belong to species of identical or very similar abundances in the sample. In addition, AbundanceBin gave accurate estimations of species abundances, as well as their genome sizes, two important parameters for characterizing a microbial community. We also show that AbundanceBin performed well when the sequences are very short (e.g., 75 bp) or contain sequencing errors. By combining AbundanceBin and a composition-based method (MetaCluster), we can achieve even higher binning accuracy. Supplementary Material is available at www.liebertonline.com/cmb .
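The abundance-based idea can be illustrated with a small mixture model. Under a Lander-Waterman-style assumption, the number of times an l-tuple from a given species is observed is roughly Poisson-distributed with a mean set by that species' abundance, so l-tuple counts from a mixed sample look like a mixture of Poissons. The Python sketch below fits such a mixture by expectation-maximization. It is not the AbundanceBin code: the fixed number of bins, the starting values, and the function names are assumptions made for illustration, and it bins l-tuple counts rather than whole reads.

```python
import math


def poisson_logpmf(k, lam):
    """Log probability of observing count k under a Poisson with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def fit_poisson_mixture(counts, init_lambdas, iters=50):
    """Fit a Poisson mixture to l-tuple counts by EM.

    Returns (weights, lambdas, responsibilities). Each lambda plays the
    role of an abundance level; each weight is the share of l-tuples
    attributed to that level.
    """
    lambdas = list(init_lambdas)
    weights = [1.0 / len(lambdas)] * len(lambdas)
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each bin for each count.
        resp = []
        for c in counts:
            logs = [math.log(w) + poisson_logpmf(c, lam)
                    for w, lam in zip(weights, lambdas)]
            top = max(logs)
            probs = [math.exp(x - top) for x in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights and Poisson means.
        for j in range(len(lambdas)):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = max(nj / len(counts), 1e-12)
            lambdas[j] = max(sum(r[j] * c for r, c in zip(resp, counts)) / nj, 1e-12)
    return weights, lambdas, resp


def assign_bins(resp):
    """Assign each count to the bin with the highest responsibility."""
    return [max(range(len(r)), key=r.__getitem__) for r in resp]
```

For example, calling `fit_poisson_mixture([2, 3, 1, 2, 3, 2, 20, 22, 19, 21], [1.0, 10.0])` separates the low-count and high-count l-tuples into two abundance levels.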
Affiliation(s)
- Yu-Wei Wu
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
4
Ye Y, Tang H. An ORFome assembly approach to metagenomics sequences analysis. J Bioinform Comput Biol 2009; 7:455-71. [PMID: 19507285 DOI: 10.1142/s0219720009004151] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 07/15/2008] [Revised: 11/04/2008] [Accepted: 11/06/2008] [Indexed: 11/18/2022]
Abstract
Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. The current analyses of metagenomics data largely rely on the computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads will be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The whole computational framework consists of three steps. Each read from a metagenomics project will first be annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied the MetaORFA approach to several metagenomics datasets with low coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, the ORFome assembly significantly increases the sensitivity of homology searching, and may potentially improve the diversity analysis of the metagenomic data. This improvement is especially useful for metagenomic projects when the genome assembly does not work because of the low sequence coverage.
Affiliation(s)
- Yuzhen Ye
- School of Informatics, Indiana University, Bloomington, IN 47408, USA.
5
Ye Y, Doak TG. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 2009; 5:e1000465. [PMID: 19680427 PMCID: PMC2714467 DOI: 10.1371/journal.pcbi.1000465] [Citation(s) in RCA: 309] [Impact Index Per Article: 20.6] [Received: 05/27/2009] [Accepted: 07/10/2009] [Indexed: 11/18/2022]
Abstract
A common biological pathway reconstruction approach -- as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and the functional annotation of metagenomic sequences -- starts with the identification of protein functions or families (e.g., KO families for the KEGG database and the FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a complete biological pathway can be identified in a dataset if at least one of the steps associated with the pathway is found. We report, however, that this naïve mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. MinPath identified far fewer pathways for the genomes collected in the KEGG database -- as compared to the naïve mapping approach -- eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.
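The contrast between naive mapping and a parsimony criterion can be shown on a toy example. In the Python sketch below, `naive_pathways` reports every pathway that shares at least one protein family with the query, as the abstract describes. `greedy_min_pathways` approximates a minimal explaining set with a greedy set-cover heuristic. The abstract does not say how MinPath computes its minimal set, so the greedy strategy, the function names, and the example families are all assumptions made for illustration.

```python
def naive_pathways(observed, pathway_to_families):
    """Report a pathway if at least one of its families was observed."""
    return {p for p, fams in pathway_to_families.items() if fams & observed}


def greedy_min_pathways(observed, pathway_to_families):
    """Greedy approximation of a smallest pathway set explaining the families.

    Repeatedly picks the pathway covering the most still-unexplained
    families. Observed families that map to no pathway are ignored.
    """
    explainable = set().union(*pathway_to_families.values()) if pathway_to_families else set()
    uncovered = set(observed) & explainable
    chosen = set()
    while uncovered:
        # Sorting the names makes tie-breaking deterministic.
        best = max(sorted(pathway_to_families),
                   key=lambda p: len(pathway_to_families[p] & uncovered))
        chosen.add(best)
        uncovered -= pathway_to_families[best]
    return chosen


# Hypothetical pathway definitions and observed families.
pathways = {"P1": {"K1", "K2", "K3"}, "P2": {"K3", "K4"}, "P3": {"K2"}}
observed = {"K1", "K2", "K3"}
print(sorted(naive_pathways(observed, pathways)))
print(sorted(greedy_min_pathways(observed, pathways)))
```

In this toy case the naive mapping reports all three pathways, while the parsimony-style selection keeps only `P1`, because `P1` alone accounts for every observed family.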
Affiliation(s)
- Yuzhen Ye
- School of Informatics, Indiana University, Bloomington, IN, USA.
6
Abstract
The total number of prokaryotic cells on earth has been estimated to be approximately 4-6 × 10^30, with the majority of these being uncharacterized. This diversity represents a vast genetic bounty that may be exploited for the discovery of novel genes, entire metabolic pathways and potentially valuable end-products thereof. Metagenomics constitutes the functional and sequence-based analysis of the collective microbial genomes (microbiome) in a particular environment or environmental niche. Herein, we review the most recent sequence-based metagenomic analyses of some of the most microbiologically diverse locations on earth, including soil, marine water and the insect and human gut. Such studies have helped to uncover several previously unknown facts, from the true microbial diversity of extreme environments to the actual extent of symbiosis that exists in the insect and human gut. In this respect, metagenomics has played and will continue to play an essential part in the new and evolving area of microbial systems biology.
Affiliation(s)
- R D Sleator
- Alimentary Pharmabiotic Centre, University College Cork, Cork, Ireland.
7
Wooley JC, Ye Y. Metagenomics: Facts and Artifacts, and Computational Challenges. Journal of Computer Science and Technology 2009; 25:71-81. [PMID: 20648230 PMCID: PMC2905821 DOI: 10.1007/s11390-010-9306-4] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 05/13/2023]
Abstract
Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.
Affiliation(s)
- John C. Wooley
- Center for Research on BioSystems, Calit2, UC San Diego, La Jolla CA 92093
- Yuzhen Ye
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, 47408
8
9
Dupré J, O'Malley MA. Metagenomics and biological ontology. Studies in History and Philosophy of Biological and Biomedical Sciences 2007; 38:834-846. [PMID: 18053937 DOI: 10.1016/j.shpsc.2007.09.001] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 05/25/2023]
Abstract
Metagenomics is an emerging microbial systems science that is based on the large-scale analysis of the DNA of microbial communities in their natural environments. Studies of metagenomes are revealing the vast scope of biodiversity in a wide range of environments, as well as new functional capacities of individual cells and communities, and the complex evolutionary relationships between them. Our examination of this science focuses on the ontological implications of these studies of metagenomes and metaorganisms, and what they mean for common sense and philosophical understandings of multicellularity, individuality and organism. We show how metagenomics requires us to think in different ways about what human beings are and what their relation to the microbial world is. Metagenomics could also transform the way in which evolutionary processes are understood, with the most basic relationship between cells from both similar and different organisms being far more cooperative and less antagonistic than is widely assumed. In addition to raising fundamental questions about biological ontology, metagenomics generates possibilities for powerful technologies addressed to issues of climate, health and conservation. We conclude with reflections about process-oriented versus entity-oriented analysis in light of current trends towards systems approaches.
Affiliation(s)
- John Dupré
- Egenis, ESRC Centre for Genomics in Society, University of Exeter, Byrne House, St Germans Road, Exeter EX4 4PJ, UK.
10
Firkins JL, Yu Z, Morrison M. Ruminal Nitrogen Metabolism: Perspectives for Integration of Microbiology and Nutrition for Dairy. J Dairy Sci 2007; 90 Suppl 1:E1-16. [PMID: 17517749 DOI: 10.3168/jds.2006-518] [Citation(s) in RCA: 159] [Impact Index Per Article: 9.4] [Indexed: 11/19/2022]
Abstract
Our objectives are to integrate current knowledge with a future perspective regarding how metagenomics can be used to integrate rumen microbiology and nutrition. Ruminal NH3-N concentration is a crude predictor of efficiency of dietary N conversion into microbial N, but as this concentration decreases below approximately 5 mg/dL (the value most often suggested to be the requirement for optimal microbial protein synthesis), blood urea N transfer into the rumen provides an increasing buffer against excessively low NH3-N concentrations, and the supply of amino N might become increasingly important to improve microbial function in dairy diets. Defaunation typically decreases NH3-N concentration, which should increase the efficiency of blood urea N and protein-derived NH3-N conversion into microbial protein in the rumen. Thus, we explain why more emphasis should be given toward characterization of protozoal interactions with proteolytic and deaminating bacterial populations. In contrast with research evaluating effects of protozoa on N metabolism, which has primarily been done with sheep and cattle with low dry matter intake, dairy cattle have greater intakes of readily available carbohydrate combined with increased ruminal passage rates. We argue that these conditions decrease protozoal biomass relative to bacterial biomass and increase the efficiency of protozoal growth, thus reducing the negative effects of bacterial predation compared with the beneficial effects that protozoa have on stabilizing the entire microbial ecosystem. A better understanding of mechanistic processes altering the production and uptake of amino N will help us to improve the overall conversion of dietary N into microbial protein and provide key information needed to further improve mechanistic models describing rumen function and evaluating dietary conditions that influence the efficiency of conversion of dietary N into milk protein.
Affiliation(s)
- J L Firkins
- The MAPLE Research Initiative, Department of Animal Sciences, The Ohio State University, Columbus 43210, USA.
11
Raes J, Harrington ED, Singh AH, Bork P. Protein function space: viewing the limits or limited by our view? Curr Opin Struct Biol 2007; 17:362-9. [PMID: 17574832 DOI: 10.1016/j.sbi.2007.05.010] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 04/25/2007] [Revised: 04/25/2007] [Accepted: 05/31/2007] [Indexed: 12/13/2022]
Abstract
Given that the number of protein functions on earth is finite, the rapid expansion of biological knowledge and the concomitant exponential increase in the number of protein sequences should, at some point, enable the estimation of the limits of protein function space. The functional coverage of protein sequences can be investigated using computational methods, especially given the massive amount of data being generated by large-scale environmental sequencing (metagenomics). In completely sequenced genomes, the fraction of proteins to which at least some functional features can be assigned has recently risen to as much as approximately 85%. Although this fraction is more uncertain in metagenomics surveys, because of environmental complexities and differences in analysis protocols, our global knowledge of protein functions still appears to be considerable. However, when we consider protein families, continued sequencing seems to yield an ever-increasing number of novel families. Until we reconcile these two views, the limits of protein space will remain obscured.
Affiliation(s)
- Jeroen Raes
- European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
12
Zhu Y, Pulukkunat DK, Li Y. Deciphering RNA structural diversity and systematic phylogeny from microbial metagenomes. Nucleic Acids Res 2007; 35:2283-94. [PMID: 17389640 PMCID: PMC1874661 DOI: 10.1093/nar/gkm057] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/17/2022]
Abstract
Metagenomics has been employed to systematically sequence, classify, analyze and manipulate the entire genetic material isolated from environmental samples. Finding genes within metagenomic sequences remains a formidable challenge, and noncoding RNA genes other than those encoding rRNA and tRNA are not well annotated in metagenomic projects. In this work, we identify, validate and analyze the genes coding for RNase P RNA (P RNA) from all published metagenomic projects. P RNA is the RNA subunit of a ubiquitous endoribonuclease RNase P that consists of one RNA subunit and one or more protein subunits. The bacterial P RNAs are classified into two types, Type A and Type B, based on the constituents of the structure involved in precursor tRNA binding. Archaeal P RNAs are classified into Type A and Type M, where Type A is ancestral and close to Type A bacterial P RNA. Bacterial and some archaeal P RNAs are catalytically active without protein subunits, capable of cleaving precursor tRNA transcripts to produce their mature 5′-termini. We have found 328 distinctive P RNAs (320 bacterial and 8 archaeal) from all published metagenomics sequences, which led us to expand by 60% the total number of this catalytic RNA from prokaryotes. Surprisingly, all newly identified P RNAs from metagenomics sequences are Type A, i.e. neither Type B bacterial nor Type M archaeal P RNAs are found. We experimentally validate the authenticity of an archaeal P RNA from the Sargasso Sea. One of the distinctive features of some new P RNAs is that the P2 stem has kinked nucleotides in its 5′ strand. We find that the single nucleotide J2/3 joint region linking the P2 and P3 stem that was used to distinguish a bacterial P RNA from an archaeal one is no longer applicable, i.e. some archaeal P RNAs have only one nucleotide in the J2/3 joint. We also discuss the phylogenetic analysis based on a covariance model of P RNA that offers a few advantages over the one based on 16S rRNA.
Affiliation(s)
- Yanglong Zhu
- Department of Biochemistry and Molecular Biology, and Center for Genetics and Molecular Medicine, School of Medicine, University of Louisville, 319 Abraham Flexner Way, Louisville, KY, 40202, USA and Ohio State Biochemistry Program, Department of Biochemistry, Ohio State University, Columbus, OH 43210, USA
- Dileep K. Pulukkunat
- Department of Biochemistry and Molecular Biology, and Center for Genetics and Molecular Medicine, School of Medicine, University of Louisville, 319 Abraham Flexner Way, Louisville, KY, 40202, USA and Ohio State Biochemistry Program, Department of Biochemistry, Ohio State University, Columbus, OH 43210, USA
- Yong Li
- Department of Biochemistry and Molecular Biology, and Center for Genetics and Molecular Medicine, School of Medicine, University of Louisville, 319 Abraham Flexner Way, Louisville, KY, 40202, USA and Ohio State Biochemistry Program, Department of Biochemistry, Ohio State University, Columbus, OH 43210, USA
- *To whom correspondence should be addressed. +1-502-852-7551; +1-502-852-6222
13
Tress ML, Cozzetto D, Tramontano A, Valencia A. An analysis of the Sargasso Sea resource and the consequences for database composition. BMC Bioinformatics 2006; 7:213. [PMID: 16623953 PMCID: PMC1513258 DOI: 10.1186/1471-2105-7-213] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 12/22/2005] [Accepted: 04/19/2006] [Indexed: 01/20/2023]
Abstract
Background: The environmental sequencing of the Sargasso Sea has introduced a huge new resource of genomic information. Unlike the protein sequences held in the current searchable databases, the Sargasso Sea sequences originate from a single marine environment and have been sequenced from species that are not easily obtainable by laboratory cultivation. The resource also contains very many fragments of whole protein sequences, a side effect of the shotgun sequencing method. These sequences form a significant addendum to the current searchable databases but also present us with some intrinsic difficulties. While it is important to know whether it is possible to assign function to these sequences with the current methods and whether they will increase our capacity to explore sequence space, it is also interesting to know how current bioinformatics techniques will deal with the new sequences in the resource.
Results: The Sargasso Sea sequences seem to introduce a bias that decreases the potential of current methods to propose structure and function for new proteins. In particular the high proportion of sequence fragments in the resource seems to result in poor quality multiple alignments.
Conclusion: These observations suggest that the new sequences should be used with care, especially if the information is to be used in large scale analyses. On a positive note, the results may just spark improvements in computational and experimental methods to take into account the fragments generated by environmental sequencing techniques.
Affiliation(s)
- Michael L Tress
- Protein Design Group, CNB-CSIC, Calle Darwin, Cantoblanco 28049 Madrid, Spain
- Domenico Cozzetto
- Department of Biochemical Sciences, University "La Sapienza" Rome, Italy
- Anna Tramontano
- Department of Biochemical Sciences, University "La Sapienza" Rome, Italy
- Alfonso Valencia
- Protein Design Group, CNB-CSIC, Calle Darwin, Cantoblanco 28049 Madrid, Spain
14
Christen M, Christen B, Folcher M, Schauerte A, Jenal U. Identification and characterization of a cyclic di-GMP-specific phosphodiesterase and its allosteric control by GTP. J Biol Chem 2005; 280:30829-37. [PMID: 15994307 DOI: 10.1074/jbc.m504429200] [Citation(s) in RCA: 362] [Impact Index Per Article: 19.1] [Indexed: 11/06/2022]
Abstract
Cyclic diguanylic acid (c-di-GMP) is a global second messenger controlling motility and adhesion in bacterial cells. Synthesis and degradation of c-di-GMP is catalyzed by diguanylate cyclases (DGC) and c-di-GMP-specific phosphodiesterases (PDE), respectively. Whereas the DGC activity has recently been assigned to the widespread GGDEF domain, the enzymatic activity responsible for c-di-GMP cleavage has been associated with proteins containing an EAL domain. Here we show biochemically that CC3396, a GGDEF-EAL composite protein from Caulobacter crescentus, is a soluble PDE. The PDE activity, which rapidly converts c-di-GMP into the linear dinucleotide pGpG, is confined to the C-terminal EAL domain of CC3396, depends on the presence of Mg2+ ions, and is strongly inhibited by Ca2+ ions. Remarkably, the associated GGDEF domain, which contains an altered active site motif (GEDEF), lacks detectable DGC activity. Instead, this domain is able to bind GTP and in response activates the PDE activity in the neighboring EAL domain. PDE activation is specific for GTP (K_D 4 µM) and operates by lowering the K_m for c-di-GMP of the EAL domain to a physiologically significant level (420 nM). Mutational analysis suggested that the substrate-binding site (A-site) of the GGDEF domain is involved in the GTP-dependent regulatory function, arguing that a catalytically inactive GGDEF domain has retained the ability to bind GTP and in response can activate the neighboring EAL domain. Based on this we propose that the c-di-GMP-specific PDE activity is confined to the EAL domain, that GGDEF domains can either catalyze the formation of c-di-GMP or can serve as regulatory domains, and that c-di-GMP-specific phosphodiesterase activity is coupled to the cellular GTP level in bacteria.
Affiliation(s)
- Matthias Christen
- Division of Molecular Microbiology, Biozentrum, University of Basel, Klingelbergstrasse 70, 4056 Basel, Switzerland
15
Abstract
Metagenomics (also referred to as environmental and community genomics) is the genomic analysis of microorganisms by direct extraction and cloning of DNA from an assemblage of microorganisms. The development of metagenomics stemmed from the ineluctable evidence that as-yet-uncultured microorganisms represent the vast majority of organisms in most environments on earth. This evidence was derived from analyses of 16S rRNA gene sequences amplified directly from the environment, an approach that avoided the bias imposed by culturing and led to the discovery of vast new lineages of microbial life. Although the portrait of the microbial world was revolutionized by analysis of 16S rRNA genes, such studies yielded only a phylogenetic description of community membership, providing little insight into the genetics, physiology, and biochemistry of the members. Metagenomics provides a second tier of technical innovation that facilitates study of the physiology and ecology of environmental microorganisms. Novel genes and gene products discovered through metagenomics include the first bacteriorhodopsin of bacterial origin; novel small molecules with antimicrobial activity; and new members of families of known proteins, such as an Na+(Li+)/H+ antiporter, RecA, DNA polymerase, and antibiotic resistance determinants. Reassembly of multiple genomes has provided insight into energy and nutrient cycling within the community, genome structure, gene function, population genetics and microheterogeneity, and lateral gene transfer among members of an uncultured community. The application of metagenomic sequence information will facilitate the design of better culturing strategies to link genomic analysis with pure culture studies.
Affiliation(s)
- Jo Handelsman
- Department of Plant Pathology, University of Wisconsin-Madison, Madison, WI 53706, USA.