2301
Clark MS, Thorne MA, Vieira FA, Cardoso JC, Power DM, Peck LS. Insights into shell deposition in the Antarctic bivalve Laternula elliptica: gene discovery in the mantle transcriptome using 454 pyrosequencing. BMC Genomics 2010; 11:362. [PMID: 20529341 PMCID: PMC2896379 DOI: 10.1186/1471-2164-11-362] [Citation(s) in RCA: 133] [Impact Index Per Article: 9.5] [Received: 02/22/2010] [Accepted: 06/08/2010] [Indexed: 11/21/2022]
Abstract
Background The Antarctic clam, Laternula elliptica, is an infaunal stenothermal bivalve mollusc with a circumpolar distribution. It plays a significant role in bentho-pelagic coupling and has therefore been proposed as a sentinel species for climate change monitoring. Previous studies have shown that this mollusc displays a high level of plasticity in shell deposition and damage repair against a background of genetic homogeneity. The Southern Ocean has amongst the lowest present-day CaCO3 saturation states of any ocean region, and is predicted to be among the first to become undersaturated under current ocean acidification scenarios. Hence, this species is an ideal candidate for studies of calcium regulation and shell deposition in our changing ocean environments. Results 454 sequencing of L. elliptica mantle tissue generated 18,290 contigs with an average size of 535 bp (ranging from 142 bp to 5.591 kb). BLAST sequence similarity searching assigned putative function to 17% of the data set; a significant proportion of these transcripts are involved in binding and are potentially secretory, as defined by GO molecular function and biological process classifications. These results indicate that the mantle is a transcriptionally active, proliferating tissue. All transcripts were screened against an in-house database of genes shown to be involved in extracellular matrix formation and calcium homeostasis in metazoans. Putative identifications were made for a number of classical shell deposition genes, such as tyrosinase, carbonic anhydrase and metalloprotease 1, along with novel members of the family 2 G-Protein Coupled Receptors (GPCRs). A membrane transport protein (SEC61) was also characterised, demonstrating the utility of the clam sequence data as a resource for examining cold-adapted amino acid substitutions.
The sequence data contained 46,235 microsatellites and 13,084 Single Nucleotide Polymorphisms (SNPs/INDELS), providing a resource for population and gene function studies. Conclusions This is the first 454 data set from an Antarctic marine invertebrate. Sequencing of mantle tissue from this non-model species has considerably increased the resources for investigating shell deposition and repair in molluscs in a changing environment. A number of promising candidate genes were identified for functional analyses; these will be investigated further in this species and also used in model-hopping experiments in more tractable and economically important aquaculture species, such as Crassostrea gigas and Mytilus edulis.
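The abstract reports 46,235 microsatellites but does not say how they were detected. As a rough illustration only, perfect tandem repeats (simple sequence repeats) with 1 to 6 bp motifs can be located with a back-referencing regular expression. The function name `find_microsatellites` and the minimum copy numbers per motif length are hypothetical choices for this sketch, not the tool or thresholds used in the study.

```python
import re

# Hypothetical minimum copy numbers per motif length (bp); real SSR tools use their own defaults.
DEFAULT_MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_microsatellites(seq, min_copies=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats of 1-6 bp motifs."""
    if min_copies is None:
        min_copies = DEFAULT_MIN_COPIES
    seq = seq.upper()
    hits, covered = [], set()
    # Shorter motifs are scanned first so that, e.g., a poly-A run is not re-reported as "AA" repeats.
    for k in sorted(min_copies):
        # ([ACGT]{k}) captures one motif copy; \1{n-1,} demands at least n-1 more identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies[k] - 1))
        for m in pattern.finditer(seq):
            span = range(m.start(), m.end())
            if covered.intersection(span):
                continue  # overlaps a repeat already reported with a shorter motif
            covered.update(span)
            hits.append((m.start(), m.group(1), (m.end() - m.start()) // k))
    return sorted(hits)


# Toy contig, invented for illustration.
print(find_microsatellites("GG" + "CA" * 8 + "TTT" + "A" * 12 + "G"))
```

Only perfect repeats are found here; imperfect or compound repeats, which many SSR tools also report, would need a more tolerant search.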
Affiliation(s)
- Melody S Clark
- British Antarctic Survey, Natural Environment Research Council, High Cross, Madingley Road, Cambridge CB3 0ET, UK.
2302
Affiliation(s)
- Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America.
2303
Mocali S, Benedetti A. Exploring research frontiers in microbiology: the challenge of metagenomics in soil microbiology. Res Microbiol 2010; 161:497-505. [PMID: 20452420 DOI: 10.1016/j.resmic.2010.04.010] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Received: 11/19/2009] [Revised: 04/13/2010] [Accepted: 04/13/2010] [Indexed: 11/28/2022]
Abstract
Soil is one of the most complex and challenging environments for microbiologists. In fact, although it contains the largest microbial diversity on the planet, the majority of these microbes are still uncharacterized and represent an enormous unexplored reservoir of genetic and metabolic diversity. Metagenomics, the study of the entire genome of soil biota, currently represents a powerful tool for assessing the diversity of complex microbial communities, providing access to a number of new species, genes or novel molecules that are relevant for biotechnology and agricultural applications. In this paper, the onset of new high-throughput metagenomic approaches and new perspectives in soil microbial ecology and data handling are discussed.
Affiliation(s)
- Stefano Mocali
- CRA - Centro di Ricerca per lo Studio delle relazioni tra Pianta e Suolo, Via della Navicella, 2/4, 00184 Roma, Italy.
2304
Pfister CA, Meyer F, Antonopoulos DA. Metagenomic profiling of a microbial assemblage associated with the California mussel: a node in networks of carbon and nitrogen cycling. PLoS One 2010; 5:e10518. [PMID: 20463896 PMCID: PMC2865538 DOI: 10.1371/journal.pone.0010518] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.8] [Received: 11/25/2009] [Accepted: 04/06/2010] [Indexed: 11/19/2022]
Abstract
Mussels are conspicuous and often abundant members of rocky shores and may constitute an important site for the nitrogen cycle due to their feeding and excretion activities. We used shotgun metagenomics of the microbial community associated with the surface of mussels (Mytilus californianus) on Tatoosh Island in Washington state to test whether there is a nitrogen-based microbial assemblage associated with mussels. Analyses of both tidepool mussels and those on emergent benches revealed a diverse community of Bacteria and Archaea, with approximately 31 million bp sequenced from 6 mussels in each habitat. Using MG-RAST, 22.5–25.6% of the sequences were identifiable against the SEED non-redundant protein database. Of those identifiable fragments, the composition was dominated by Cyanobacteria and Alpha- and Gammaproteobacteria. Microbial composition was highly similar between the tidepool and emergent bench mussels, suggesting similar functions across these different microhabitats. One percent of the proteins identified in each sample were related to nitrogen cycling. When normalized to protein discovery rate, the diversity and abundance of nitrogen-cycle enzymes in mussel-associated microbes are as great as or greater than those described for other marine metagenomes. In some instances, the nitrogen-utilizing profile of this assemblage was more concordant with soil metagenomes from the Midwestern U.S. than with open ocean systems. Carbon fixation and Calvin cycle enzymes represented 0.65% and 1.26% of all proteins, respectively, and their abundance was comparable to that in a number of open ocean marine metagenomes. In sum, the diversity and abundance of nitrogen and carbon cycle related enzymes in the microbes occupying the shells of Mytilus californianus suggest that these mussels provide a node for microbial populations and thus for biogeochemical processes.
Affiliation(s)
- Catherine A Pfister
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois, United States of America.
2305
The functional potential of high Arctic permafrost revealed by metagenomic sequencing, qPCR and microarray analyses. ISME J 2010; 4:1206-14. [DOI: 10.1038/ismej.2010.41] [Citation(s) in RCA: 238] [Impact Index Per Article: 17.0] [Indexed: 11/08/2022]
2306
Metagenome of the Mediterranean deep chlorophyll maximum studied by direct and fosmid library 454 pyrosequencing. ISME J 2010; 4:1154-66. [PMID: 20393571 DOI: 10.1038/ismej.2010.44] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.2] [Indexed: 11/08/2022]
Abstract
The deep chlorophyll maximum (DCM) is a zone of maximal photosynthetic activity, generally located toward the base of the photic zone in lakes and oceans. In tropical waters it is a permanent feature, but in the Mediterranean and other temperate waters the DCM is a seasonal phenomenon. The metagenome from a single sample of a mature Mediterranean DCM community has been 454 pyrosequenced both directly and after cloning in fosmids. This study is the first to be carried out at this sequencing depth (ca. 600 Mb combining direct and fosmid sequencing) at any DCM. Our results indicate a microbial community massively dominated by the high-light-adapted Prochlorococcus marinus subsp. pastoris, Synechococcus sp., and the heterotroph Candidatus Pelagibacter. The sequences retrieved were remarkably similar to the existing genome of P. marinus subsp. pastoris, with a nucleotide identity over 98%. We also found a large number of cyanophages that could prey on this microbe, although their sequence conservation was much lower. The high abundance of phage sequences in the cellular size fraction indicated a remarkably high proportion of cells undergoing phage lytic attack. In addition, several fosmids clearly belonging to Group II Euryarchaeota were retrieved and recruited many fragments from the total direct DNA sequencing, suggesting that this group might be quite abundant in this habitat. The comparison between direct and fosmid sequencing revealed a bias in the fosmid libraries against low-GC DNA, and specifically against the two most dominant members of the community, Candidatus Pelagibacter and P. marinus subsp. pastoris, thus unexpectedly providing a feasible method to obtain large genomic fragments from other, less prevalent members of this community.
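The fosmid bias described above is a bias against low-GC DNA. A minimal way to see such a bias in read data is to compare the share of low-GC reads in a direct library with that in a fosmid-derived library. In this sketch the 0.40 GC cut-off, the function names and the toy reads are all hypothetical; the abstract does not state how the comparison was made.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases; ambiguous symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def low_gc_share(reads, threshold=0.40):
    """Share of reads whose GC fraction falls below a (hypothetical) low-GC threshold."""
    if not reads:
        return 0.0
    return sum(gc_fraction(r) < threshold for r in reads) / len(reads)


# Toy reads, invented for illustration: not data from the study.
direct_reads = ["ATATATGCAT", "AATTAAGCTA", "GCGCATGCAA"]
fosmid_reads = ["GCGCATGCAA", "GGCCATGCGC", "ATGCGCGCAT"]
print(low_gc_share(direct_reads), low_gc_share(fosmid_reads))
```

A markedly lower low-GC share in the fosmid set than in the direct set is the pattern the abstract describes; in practice one would compare full GC distributions rather than a single cut-off.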
2307
Hewson I, Poretsky RS, Tripp HJ, Montoya JP, Zehr JP. Spatial patterns and light-driven variation of microbial population gene expression in surface waters of the oligotrophic open ocean. Environ Microbiol 2010; 12:1940-56. [PMID: 20406287 DOI: 10.1111/j.1462-2920.2010.02198.x] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Indexed: 01/05/2023]
Abstract
Because bacterioplankton production rates do not vary strongly across vast expanses of the ocean, it is unclear how variability in community structure corresponds to functional variability in the open ocean. We surveyed community transcript functional profiles at eight open-ocean locations, in both light and dark, using the genomic subsystems approach, to understand variability in gene expression patterns in surface waters. Metatranscriptomes from geographically distinct areas, collected during the day and at night, shared a large proportion of metabolic functional similarity (74%) at the finest metabolic resolution possible. The variability between metatranscriptomes could be explained by phylogenetic differences between libraries (Mantel test, P < 0.0001). Several key gene expression pathways, including Photosystem I, Photosystem II and ammonium uptake, showed the most variability, both geographically and between light and dark. Libraries were dominated by transcripts of the cyanobacterium Prochlorococcus marinus, and most geographical and diel variability between metatranscriptomes reflected between-station differences in cyanobacterial phototrophic metabolism. Our results demonstrate that active genetic machinery in ocean surface waters is dominated by photosynthetic microorganisms and their site-to-site variability, while variability in the remainder of the assemblages depends on local taxonomic composition.
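The Mantel test cited above asks whether two distance matrices over the same set of samples (here, functional and phylogenetic dissimilarities between libraries) are correlated, using permutations of sample labels to build the null distribution. The sketch below is a generic one-sided Mantel test with Pearson correlation; the distance measures, permutation count and function names are assumptions, not the study's exact settings.

```python
import random


def upper_triangle(m):
    """Flatten the strict upper triangle (i < j) of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(d1, d2, permutations=999, seed=0):
    """One-sided Mantel test for a positive association; returns (r, p)."""
    n = len(d1)
    x = upper_triangle(d1)
    r_obs = pearson(x, upper_triangle(d2))
    rng = random.Random(seed)
    order = list(range(n))
    as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 together so it stays a valid distance matrix.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            as_extreme += 1
    # The observed labelling counts as one of the permutations.
    return r_obs, (as_extreme + 1) / (permutations + 1)


# Hypothetical distance matrices for 8 libraries; d2 is a rescaled copy of d1.
positions = [0, 1, 3, 6, 10, 15, 21, 28]
d1 = [[abs(a - b) for b in positions] for a in positions]
d2 = [[2 * v for v in row] for row in d1]
r, p = mantel(d1, d2)
print(round(r, 6), p)
```

Permuting whole rows and columns, rather than shuffling individual matrix entries, is what keeps the non-independence of pairwise distances intact under the null.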
Affiliation(s)
- Ian Hewson
- Department of Microbiology, Cornell University, Wing Hall 403, Ithaca, NY 14853, USA.
2308
Rodríguez-Palenzuela P, Matas IM, Murillo J, López-Solanilla E, Bardaji L, Pérez-Martínez I, Rodríguez-Moskera ME, Penyalver R, López MM, Quesada JM, Biehl BS, Perna NT, Glasner JD, Cabot EL, Neeno-Eckwall E, Ramos C. Annotation and overview of the Pseudomonas savastanoi pv. savastanoi NCPPB 3335 draft genome reveals the virulence gene complement of a tumour-inducing pathogen of woody hosts. Environ Microbiol 2010; 12:1604-20. [DOI: 10.1111/j.1462-2920.2010.02207.x] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
2309
Uroz S, Buée M, Murat C, Frey-Klett P, Martin F. Pyrosequencing reveals a contrasted bacterial diversity between oak rhizosphere and surrounding soil. Environ Microbiol Rep 2010; 2:281-8. [PMID: 23766079 DOI: 10.1111/j.1758-2229.2009.00117.x] [Citation(s) in RCA: 183] [Impact Index Per Article: 13.1] [Indexed: 05/19/2023]
Abstract
Several reports have highlighted that forest soil samples are more phylum-rich than agricultural soil samples. However, little is known about the structure and richness of the bacterial communities in forest soil. Using high-throughput next-generation 454 pyrosequencing, we investigated in depth the diversity of bacterial communities colonizing the oak rhizosphere niche and the surrounding soil. From three spatially independent soil samples, we obtained over 300,000 partial 16S rRNA gene sequences. The most abundant bacterial groups were the Acidobacteria, Proteobacteria and unclassified bacteria. Multifactorial analysis of the relative proportions of the different phyla revealed a clear differentiation between the bacterial communities of the rhizosphere and soil environments, suggesting an oak rhizosphere effect. Significantly more β-, γ- and unclassified Proteobacteria inhabited the rhizosphere than the surrounding soil. Conversely, significantly more unclassified bacteria were detected in the bulk soil than in the rhizosphere, demonstrating that the soil remains a challenging reservoir of complexity. This work increases our understanding of the niche effect on bacterial diversity and of the rare phylogenetic groups inhabiting the soil.
Affiliation(s)
- Stéphane Uroz
- INRA, UMR1136 INRA-Nancy Université 'Interactions Arbres-Microorganismes', IFR 110, Centre INRA de Nancy, 54280 Champenoux, France
2310
Tanenbaum DM, Goll J, Murphy S, Kumar P, Zafar N, Thiagarajan M, Madupu R, Davidsen T, Kagan L, Kravitz S, Rusch DB, Yooseph S. The JCVI standard operating procedure for annotating prokaryotic metagenomic shotgun sequencing data. Stand Genomic Sci 2010; 2:229-37. [PMID: 21304707 PMCID: PMC3035284 DOI: 10.4056/sigs.651139] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Indexed: 11/25/2022]
Abstract
The JCVI metagenomics analysis pipeline provides for the efficient and consistent annotation of shotgun metagenomics sequencing data for sampling communities of prokaryotic organisms. The process can be equally applied to individual sequence reads from traditional Sanger capillary electrophoresis sequences, newer technologies such as 454 pyrosequencing, or sequence assemblies derived from one or more of these data types. It includes the analysis of both coding and non-coding genes, whether full-length or, as is often the case for shotgun metagenomics, fragmentary. The system is designed to provide the best-supported conservative functional annotation based on a combination of trusted homology-based scientific evidence and computational assertions and an annotation value hierarchy established through extensive manual curation. The functional annotation attributes assigned by this system include gene name, gene symbol, GO terms, EC numbers, and JCVI functional role categories.
Affiliation(s)
- Sean Murphy
- J. Craig Venter Institute, Rockville, MD 20850
- Shibu Yooseph
- J. Craig Venter Institute, San Diego, CA 92121
- Corresponding author: Shibu Yooseph
2311
Marine metagenomics: new tools for the study and exploitation of marine microbial metabolism. Mar Drugs 2010; 8:608-28. [PMID: 20411118 PMCID: PMC2857354 DOI: 10.3390/md8030608] [Citation(s) in RCA: 119] [Impact Index Per Article: 8.5] [Received: 01/20/2010] [Revised: 02/04/2010] [Accepted: 03/12/2010] [Indexed: 12/21/2022]
Abstract
The marine environment is extremely diverse, with huge variations in pressure and temperature. Nevertheless, life, especially microbial life, thrives throughout the marine biosphere, and microbes have adapted to all of its divergent environments. Large-scale, DNA sequence-based approaches have recently been used to investigate the marine environment, and these studies have revealed that the oceans harbor unprecedented microbial diversity. Novel gene families with representatives only within such metagenomic datasets make up a large proportion of the ocean metagenome. The presence of so many new gene families from these uncultured and highly diverse microbial populations is a challenge for understanding and exploiting the biology and biochemistry of the ocean environment. New metagenomic and single-cell genomics tools offer new ways to explore the complete metabolic diversity of the marine biome.
2312
Aziz RK, Breitbart M, Edwards RA. Transposases are the most abundant, most ubiquitous genes in nature. Nucleic Acids Res 2010; 38:4207-17. [PMID: 20215432 PMCID: PMC2910039 DOI: 10.1093/nar/gkq140] [Citation(s) in RCA: 198] [Impact Index Per Article: 14.1] [Indexed: 01/22/2023]
Abstract
Genes, like organisms, struggle for existence, and the most successful genes persist and widely disseminate in nature. The unbiased determination of the most successful genes requires access to sequence data from a wide range of phylogenetic taxa and ecosystems, which has finally become achievable thanks to the deluge of genomic and metagenomic sequences. Here, we analyzed 10 million protein-encoding genes and gene tags in sequenced bacterial, archaeal, eukaryotic and viral genomes and metagenomes, and our analysis demonstrates that genes encoding transposases are the most prevalent genes in nature. The finding that these genes, classically considered selfish genes, outnumber essential or housekeeping genes suggests that they offer a selective advantage to the genomes and ecosystems they inhabit, a hypothesis in agreement with an emerging body of literature. Their mobile nature not only promotes dissemination of transposable elements within and between genomes but also leads to mutations and rearrangements that can accelerate biological diversification and, consequently, evolution. By securing their own replication and dissemination, transposases are guaranteed to thrive as long as nucleic acid-based life forms exist.
Affiliation(s)
- Ramy K Aziz
- Computation Institute, University of Chicago, Chicago, IL 60637, USA.
2313
Abstract
Metagenomics is a discipline that enables the genomic study of uncultured microorganisms. Faster, cheaper sequencing technologies and the ability to sequence uncultured microbes sampled directly from their habitats are expanding and transforming our view of the microbial world. Distilling meaningful information from the millions of new genomic sequences presents a serious challenge to bioinformaticians. In cultured microbes, the genomic data come from a single clone, making sequence assembly and annotation tractable. In metagenomics, the data come from heterogeneous microbial communities, sometimes containing more than 10,000 species, and the sequence data are noisy and partial. From sampling, to assembly, to gene calling and function prediction, bioinformatics faces new demands in interpreting voluminous, noisy, and often partial sequence data. Although metagenomics is a relative newcomer to science, the past few years have seen an explosion in computational methods applied to metagenomic research, so an exhaustive review is beyond the scope of this article. Rather, we provide here a concise yet comprehensive introduction to the current computational requirements presented by metagenomics, and review the recent progress made. We also note whether software exists that implements any of the methods presented here, and briefly review its utility. Readers are encouraged to use the comment section provided by this journal to relate their own experiences. Finally, the last section of this article presents a few representative studies illustrating different facets of recent scientific discoveries made using metagenomics.
Affiliation(s)
- John C. Wooley
- Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America
- Adam Godzik
- Community Cyberinfrastructure for Marine Microbial Ecology Research and Analysis, California Institute for Telecommunications and Information Technology, University of California San Diego, La Jolla, California, United States of America
- Program in Bioinformatics and Systems Biology, Burnham Institute for Medical Research, La Jolla, California, United States of America
- Iddo Friedberg
- Department of Microbiology, Miami University, Oxford, Ohio, United States of America
- Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, United States of America
2314
Schreiber F, Gumrich P, Daniel R, Meinicke P. Treephyler: fast taxonomic profiling of metagenomes. Bioinformatics 2010; 26:960-1. [DOI: 10.1093/bioinformatics/btq070] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022]
2315
Parks DH, Beiko RG. Identifying biologically relevant differences between metagenomic communities. Bioinformatics 2010; 26:715-21. [PMID: 20130030 DOI: 10.1093/bioinformatics/btq041] [Citation(s) in RCA: 666] [Impact Index Per Article: 47.6] [Indexed: 11/13/2022]
Abstract
MOTIVATION Metagenomics is the study of genetic material recovered directly from environmental samples. Taxonomic and functional differences between metagenomic samples can highlight the influence of ecological factors on patterns of microbial life in a wide range of habitats. Statistical hypothesis tests can help us distinguish ecological influences from sampling artifacts, but knowledge of only the P-value from a statistical hypothesis test is insufficient to make inferences about biological relevance. Current reporting practices for pairwise comparative metagenomics are inadequate, and better tools are needed for comparative metagenomic analysis. RESULTS We have developed a new software package, STAMP, for comparative metagenomics that supports best practices in analysis and reporting. Examination of a pair of iron mine metagenomes demonstrates that deeper biological insights can be gained using statistical techniques available in our software. An analysis of the functional potential of 'Candidatus Accumulibacter phosphatis' in two enhanced biological phosphorus removal metagenomes identified several subsystems that differ between the A. phosphatis strains in these related communities, including phosphate metabolism, secretion and metal transport. AVAILABILITY Python source code and binaries are freely available from our website at http://kiwi.cs.dal.ca/Software/STAMP CONTACT: beiko@cs.dal.ca SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
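The point that a P-value alone says little about biological relevance can be made concrete: with very large read counts, a tiny difference between two samples can be highly "significant". This sketch pairs a pooled two-proportion z-test with the difference between proportions and an asymptotic (Wald) confidence interval. The counts, function names and interval method are assumptions for illustration; STAMP offers its own selection of tests and confidence-interval methods, which may differ from this one.

```python
from math import sqrt
from statistics import NormalDist


def diff_in_proportions(x1, n1, x2, n2, confidence=0.95):
    """Effect size (difference between two proportions) with an asymptotic Wald interval."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, (diff - z * se, diff + z * se)


def two_proportion_p(x1, n1, x2, n2):
    """Two-sided P-value from a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: reads assigned to one subsystem out of all assigned reads, per sample.
diff, (lo, hi) = diff_in_proportions(10_500, 1_000_000, 10_000, 1_000_000)
p = two_proportion_p(10_500, 1_000_000, 10_000, 1_000_000)
print(f"p = {p:.1e}; difference = {diff:.4%} (95% CI {lo:.4%} to {hi:.4%})")
```

Here the P-value falls below 0.001, yet the difference is only about 0.05 percentage points; reporting the effect size and its interval alongside the P-value lets a reader judge whether such a difference matters biologically.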
Affiliation(s)
- Donovan H Parks
- Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada B3H 1W5
2316
Craft JA, Gilbert JA, Temperton B, Dempsey KE, Ashelford K, Tiwari B, Hutchinson TH, Chipman JK. Pyrosequencing of Mytilus galloprovincialis cDNAs: tissue-specific expression patterns. PLoS One 2010; 5:e8875. [PMID: 20111607 PMCID: PMC2810337 DOI: 10.1371/journal.pone.0008875] [Citation(s) in RCA: 111] [Impact Index Per Article: 7.9] [Received: 10/19/2009] [Accepted: 12/21/2009] [Indexed: 01/28/2023]
Abstract
Background Mytilus species are important in marine ecology and in environmental quality assessment, yet their molecular biology is poorly understood. Molecular aspects of their reproduction, hybridisation between species, mitochondrial inheritance, skewed sex ratios of offspring and adaptation to climatic and pollution factors are priority areas. Methodology/Principal Findings To start to address this situation, expressed genetic transcripts from M. galloprovincialis were pyrosequenced. Transcripts were isolated from the digestive gland, foot, gill and mantle of both male and female mussels. In total, 175,547 sequences were obtained; for foot and mantle, 90% of the sequences could be assembled into contiguous fragments, but this fell to 75% for the digestive gland and gill. Transcripts relating to protein metabolism and respiration dominated, including ribosomal proteins, cytochrome oxidases and NADH dehydrogenase subunits. Tissue-specific variation was identified in transcripts associated with mitochondrial energy metabolism, with the digestive gland and gill having the greatest transcript abundance. Using fragment recruitment, it was also possible to identify sites of potential small RNAs involved in mitochondrial transcriptional regulation. Sex ratios based on Vitelline Envelope Receptor for Lysin and Vitelline Coat Lysin transcript abundances indicated that an equal sex distribution was maintained. Taxonomic profiling of the M. galloprovincialis tissues highlighted an abundant microbial flora associated with the digestive gland. Profiling of the tissues for genes involved in intermediary metabolism demonstrated that the gill and digestive gland were more similar to each other than to the other two tissues, and that the foot transcriptome was the most dissimilar. Conclusions Pyrosequencing has provided extensive genomic information for M. galloprovincialis and generated novel observations on expression in different tissues, mitochondria and associated microorganisms. It will also facilitate the much-needed production of an oligonucleotide microarray for the organism.
Affiliation(s)
- John A Craft
- Biological and Biomedical Sciences, Glasgow Caledonian University, Glasgow, United Kingdom.
2317
Allgaier M, Reddy A, Park JI, Ivanova N, D'haeseleer P, Lowry S, Sapra R, Hazen TC, Simmons BA, VanderGheynst JS, Hugenholtz P. Targeted discovery of glycoside hydrolases from a switchgrass-adapted compost community. PLoS One 2010; 5:e8812. [PMID: 20098679 PMCID: PMC2809096 DOI: 10.1371/journal.pone.0008812] [Citation(s) in RCA: 155] [Impact Index Per Article: 11.1] [Received: 11/06/2009] [Accepted: 12/24/2009] [Indexed: 11/18/2022]
Abstract
Development of cellulosic biofuels from non-food crops is currently an area of intense research interest. Tailoring depolymerizing enzymes to particular feedstocks and pretreatment conditions is one promising avenue of research in this area. Here we added a green-waste compost inoculum to switchgrass (Panicum virgatum) and simulated thermophilic composting in a bioreactor to select for a switchgrass-adapted community and to facilitate targeted discovery of glycoside hydrolases. Small-subunit (SSU) rRNA-based community profiles revealed that the microbial community changed dramatically between the initial and switchgrass-adapted compost (SAC), with some bacterial populations enriched over 20-fold. We obtained 225 Mbp of 454-titanium pyrosequence data from the SAC community and conservatively identified 800 genes encoding glycoside hydrolase domains that were biased toward depolymerizing grass cell wall components. Of these, approximately 10% were putative cellulases, mostly belonging to families GH5 and GH9. We synthesized two SAC GH9 genes with codon optimization for heterologous expression in Escherichia coli and observed activity on carboxymethyl cellulose for one of them. The active GH9 enzyme has a temperature optimum of 50 °C and a pH range of 5.5 to 8, consistent with the composting conditions applied. We demonstrate that microbial communities adapt to switchgrass decomposition under simulated composting conditions, and that full-length genes can be identified from complex metagenomic sequence data, synthesized and expressed, resulting in active enzyme.
Affiliation(s)
- Martin Allgaier
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Department of Energy (DOE) Joint Genome Institute, Walnut Creek, California, United States of America
- Amitha Reddy
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Department of Biological and Agricultural Engineering, University of California Davis, Davis, California, United States of America
- Joshua I. Park
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Biomass Science and Conversion Technology Department, Sandia National Laboratories, Livermore, California, United States of America
- Natalia Ivanova
- Department of Energy (DOE) Joint Genome Institute, Walnut Creek, California, United States of America
- Patrik D'haeseleer
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Microbial Systems Biology Group, Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Steve Lowry
- Department of Energy (DOE) Joint Genome Institute, Walnut Creek, California, United States of America
- Rajat Sapra
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Biomass Science and Conversion Technology Department, Sandia National Laboratories, Livermore, California, United States of America
- Terry C. Hazen
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California, United States of America
- Blake A. Simmons
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Biomass Science and Conversion Technology Department, Sandia National Laboratories, Livermore, California, United States of America
| | - Jean S. VanderGheynst
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Department of Biological and Agricultural Engineering, University of California Davis, Davis, California, United States of America
| | - Philip Hugenholtz
- Deconstruction Division, Joint BioEnergy Institute, Emeryville, California, United States of America
- Department of Energy (DOE) Joint Genome Institute, Walnut Creek, California, United States of America
2318
Martin NF, Martin F. From Galactic archeology to soil metagenomics - surfing on massive data streams. New Phytol 2010; 185:343-347. [PMID: 20088974 DOI: 10.1111/j.1469-8137.2009.03138.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 05/28/2023]
2319
Kottmann R, Kostadinov I, Duhaime MB, Buttigieg PL, Yilmaz P, Hankeln W, Waldmann J, Glöckner FO. Megx.net: integrated database resource for marine ecological genomics. Nucleic Acids Res 2010; 38:D391-5. [PMID: 19858098 PMCID: PMC2808895 DOI: 10.1093/nar/gkp918] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Received: 09/15/2009] [Accepted: 10/08/2009] [Indexed: 11/28/2022] Open
Abstract
Megx.net is a database and portal that provides integrated access to georeferenced marker genes, environment data and marine genome and metagenome projects for microbial ecological genomics. All data are stored in the Microbial Ecological Genomics DataBase (MegDB), which is subdivided to hold both sequence and habitat data and global environmental data layers. The extended system provides access to several hundreds of genomes and metagenomes from prokaryotes and phages, as well as over a million small and large subunit ribosomal RNA sequences. With the refined Genes Mapserver, all data can be interactively visualized on a world map and statistics describing environmental parameters can be calculated. Sequence entries have been curated to comply with the proposed minimal standards for genomes and metagenomes (MIGS/MIMS) of the Genomic Standards Consortium. Access to data is facilitated by Web Services. The updated megx.net portal offers microbial ecologists greatly enhanced database content, and new features and tools for data analysis, all of which are freely accessible from our webpage http://www.megx.net.
Affiliation(s)
- Renzo Kottmann
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Ivaylo Kostadinov
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Melissa Beth Duhaime
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Pier Luigi Buttigieg
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Pelin Yilmaz
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Wolfgang Hankeln
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Jost Waldmann
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
- Frank Oliver Glöckner
- Microbial Genomics Group, Max Planck Institute for Marine Microbiology, D-28359 Bremen and Jacobs University Bremen gGmbH, D-28759 Bremen, Germany
2320
Liolios K, Chen IMA, Mavromatis K, Tavernarakis N, Hugenholtz P, Markowitz VM, Kyrpides NC. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res 2010; 38:D346-54. [PMID: 19914934 PMCID: PMC2808860 DOI: 10.1093/nar/gkp848] [Citation(s) in RCA: 312] [Impact Index Per Article: 22.3] [Received: 09/19/2009] [Accepted: 09/22/2009] [Indexed: 11/14/2022] Open
Abstract
The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/
Affiliation(s)
- Konstantinos Liolios
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- I-Min A. Chen
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Konstantinos Mavromatis
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nektarios Tavernarakis
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Philip Hugenholtz
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Victor M. Markowitz
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
- Nikos C. Kyrpides
- Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA
2321
Liu B, Pop M. Identifying Differentially Abundant Metabolic Pathways in Metagenomic Datasets. Bioinformatics Research and Applications 2010. [DOI: 10.1007/978-3-642-13078-6_12] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 12/15/2022]
2322
Standards and standard-compliance. Stand Genomic Sci 2009. [DOI: 10.1186/bf03356041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
2323
Gerlach W, Jünemann S, Tille F, Goesmann A, Stoye J. WebCARMA: a web application for the functional and taxonomic classification of unassembled metagenomic reads. BMC Bioinformatics 2009; 10:430. [PMID: 20021646 PMCID: PMC2801688 DOI: 10.1186/1471-2105-10-430] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.6] [Received: 09/15/2009] [Accepted: 12/18/2009] [Indexed: 11/20/2022] Open
Abstract
Background Metagenomics is a new field of research on natural microbial communities. High-throughput sequencing techniques like 454 or Solexa-Illumina promise new possibilities as they are able to produce huge amounts of data in much less time and with less effort and cost than the traditional Sanger technique. However, the data come in even shorter reads (35-100 bp with Illumina, 100-500 bp with 454 sequencing). CARMA is a new software pipeline for the characterisation of species composition and the genetic potential of microbial samples using short, unassembled reads. Results In this paper, we introduce WebCARMA, a refined version of CARMA available as a web application for the taxonomic and functional classification of unassembled (ultra-)short reads from metagenomic communities. In addition, we have analysed the applicability of ultra-short reads in metagenomics. Conclusions We show that unassembled reads as short as 35 bp can be used for the taxonomic classification of a metagenome. The web application is freely available at http://webcarma.cebitec.uni-bielefeld.de.
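WebCARMA assigns taxa to unassembled short reads. A common strategy for this task, used for example by MEGAN and not necessarily identical to what CARMA does, is to keep a read's near-best hits and report the lowest common ancestor (LCA) of their lineages. The minimal sketch below assumes a hypothetical input layout (a list of `(reference_id, bit_score)` hits and a `lineage_of` dictionary) and an assumed 10% score window.

```python
def lowest_common_ancestor(lineages):
    """Longest shared prefix of root-to-leaf lineages (lists of taxon names)."""
    if not lineages:
        return []
    lca = []
    for ranks in zip(*lineages):  # walk down the taxonomy rank by rank
        if all(r == ranks[0] for r in ranks):
            lca.append(ranks[0])
        else:
            break
    return lca


def classify_read(hits, lineage_of, top_percent=10.0):
    """hits: list of (reference_id, bit_score) for one read (hypothetical layout).
    lineage_of: dict reference_id -> lineage list.
    Keeps hits within top_percent of the best bit score and returns their LCA."""
    if not hits:
        return []  # no hits: read stays unclassified
    best = max(score for _, score in hits)
    cutoff = best * (1.0 - top_percent / 100.0)
    kept = [lineage_of[ref] for ref, score in hits if score >= cutoff]
    return lowest_common_ancestor(kept)
```

Short reads tend to hit several related references equally well, so the LCA deliberately reports a coarser rank rather than guessing among them.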
Affiliation(s)
- Wolfgang Gerlach
- Faculty of Technology, Bielefeld University, Bielefeld, Germany.
2324
Angly FE, Willner D, Prieto-Davó A, Edwards RA, Schmieder R, Vega-Thurber R, Antonopoulos DA, Barott K, Cottrell MT, Desnues C, Dinsdale EA, Furlan M, Haynes M, Henn MR, Hu Y, Kirchman DL, McDole T, McPherson JD, Meyer F, Miller RM, Mundt E, Naviaux RK, Rodriguez-Mueller B, Stevens R, Wegley L, Zhang L, Zhu B, Rohwer F. The GAAS metagenomic tool and its estimations of viral and microbial average genome size in four major biomes. PLoS Comput Biol 2009; 5:e1000593. [PMID: 20011103 PMCID: PMC2781106 DOI: 10.1371/journal.pcbi.1000593] [Citation(s) in RCA: 165] [Impact Index Per Article: 11.0] [Received: 06/25/2009] [Accepted: 11/03/2009] [Indexed: 11/18/2022] Open
Abstract
Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
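The length-normalization and similarity-weighting ideas described in the GAAS abstract can be illustrated with a minimal sketch. It is not the GAAS implementation: the 1/E-value weighting scheme, the input dictionaries and the name `gaas_like_estimates` are assumptions made for illustration, and the relative-alignment-length filter is omitted.

```python
from collections import defaultdict


def gaas_like_estimates(hits, genome_lengths):
    """Illustrative sketch, not the GAAS code.

    hits: dict read_id -> list of (genome_id, e_value) BLAST hits.
    genome_lengths: dict genome_id -> genome length in bp.
    Returns (relative_abundance dict, average_genome_length).
    """
    counts = defaultdict(float)
    for read_hits in hits.values():
        if not read_hits:
            continue  # unknown read: contributes nothing
        # Similarity weighting (assumed scheme): split one read across its
        # hits in proportion to 1/E-value; the floor avoids division by zero.
        weights = [(g, 1.0 / max(e, 1e-300)) for g, e in read_hits]
        total = sum(w for _, w in weights)
        for g, w in weights:
            counts[g] += w / total
    if not counts:
        return {}, 0.0
    # Length normalization: a longer genome yields proportionally more reads
    # per copy, so divide by genome length before converting to fractions.
    per_bp = {g: c / genome_lengths[g] for g, c in counts.items()}
    z = sum(per_bp.values())
    abundance = {g: v / z for g, v in per_bp.items()}
    # Average genome length, weighted by the relative abundance of genomes.
    avg_len = sum(p * genome_lengths[g] for g, p in abundance.items())
    return abundance, avg_len
```

With one read hitting a 1 kb genome and two reads hitting a 2 kb genome, raw counts would suggest a 1:2 ratio, whereas the normalized estimate is 50% each with an average genome length of 1.5 kb.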
Affiliation(s)
- Florent E Angly
- Biology Department, San Diego State University, San Diego, California, United States of America.
2325
Network analyses structure genetic diversity in independent genetic worlds. Proc Natl Acad Sci U S A 2009; 107:127-32. [PMID: 20007769 DOI: 10.1073/pnas.0908978107] [Citation(s) in RCA: 206] [Impact Index Per Article: 13.7] [Indexed: 11/18/2022] Open
Abstract
DNA flows between chromosomes and mobile elements, following rules that are poorly understood. This limited knowledge is partly explained by the limits of current approaches to study the structure and evolution of genetic diversity. Network analyses of 119,381 homologous DNA families, sampled from 111 cellular genomes and from 165,529 phage, plasmid, and environmental virome sequences, offer challenging insights. Our results support a disconnected yet highly structured network of genetic diversity, revealing the existence of multiple "genetic worlds." These divides define multiple isolated groups of DNA vehicles drawing on distinct gene pools. Mathematical studies of the centralities of these worlds' subnetworks demonstrate that plasmids, not viruses, were key vectors of genetic exchange between bacterial chromosomes, both recently and in the past. Furthermore, network methodology introduces new ways of quantifying current sampling of genetic diversity.
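In graph terms, a "disconnected" network made of several "genetic worlds" is a set of connected components. The sketch below is an illustration under stated assumptions, not the authors' pipeline: two DNA vehicles (chromosomes, plasmids, phages; identifiers hypothetical) are linked whenever they share at least one homologous gene family, and components are found with union-find. The centrality analyses reported in the paper are not reproduced.

```python
from collections import defaultdict


def genetic_worlds(memberships):
    """memberships: dict vehicle_id -> set of gene-family IDs it carries.
    Returns a list of connected components (sets of vehicle IDs)."""
    parent = {v: v for v in memberships}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Index vehicles by gene family, then merge all vehicles sharing a family.
    by_family = defaultdict(list)
    for vehicle, families in memberships.items():
        for fam in families:
            by_family[fam].append(vehicle)
    for vehicles in by_family.values():
        for other in vehicles[1:]:
            union(vehicles[0], other)

    components = defaultdict(set)
    for v in memberships:
        components[find(v)].add(v)
    return list(components.values())
```

Indexing by gene family avoids comparing every pair of vehicles, which matters when the input has hundreds of thousands of sequences.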
2326
Liang C, Schmid A, López-Sánchez MJ, Moya A, Gross R, Bernhardt J, Dandekar T. JANE: efficient mapping of prokaryotic ESTs and variable length sequence reads on related template genomes. BMC Bioinformatics 2009; 10:391. [PMID: 19943962 PMCID: PMC2789075 DOI: 10.1186/1471-2105-10-391] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/05/2008] [Accepted: 11/29/2009] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND ESTs or variable sequence reads can be available in prokaryotic studies well before a complete genome is known. Use cases include (i) transcriptome studies or (ii) single cell sequencing of bacteria. Without suitable software their further analysis and mapping would have to await finalization of the corresponding genome. RESULTS The tool JANE rapidly maps ESTs or variable sequence reads in prokaryotic sequencing and transcriptome efforts to related template genomes. It provides an easy-to-use graphics interface for information retrieval and a toolkit for EST or nucleotide sequence function prediction. Furthermore, for rapid mapping we developed an enhanced sequence alignment algorithm that reassembles and evaluates high-scoring pairs provided by the BLAST algorithm. Rapid assembly on and replacement of the template genome by sequence reads or mapped ESTs is achieved. This is illustrated (i) by data from Staphylococci as well as from a Blattabacteria sequencing effort and (ii) by mapping single cell sequencing reads from poribacteria to the sister phylum representative Rhodopirellula baltica SH1. The algorithm has been implemented in a web-server accessible at http://jane.bioapps.biozentrum.uni-wuerzburg.de. CONCLUSION Rapid prokaryotic EST mapping or mapping of sequence reads is achieved with JANE even without knowing the cognate genome sequence.
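JANE's alignment step "reassembles and evaluates" BLAST high-scoring pairs (HSPs). One standard way to do this, which may differ from JANE's actual algorithm, is HSP chaining: choose the highest-scoring subset of HSPs that is collinear (non-overlapping and in the same order) on both the read and the template genome. A minimal dynamic-programming sketch, with a hypothetical `HSP` record:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    q_start: int  # start on the query read/EST (0-based, inclusive)
    q_end: int    # end on the query (exclusive)
    s_start: int  # start on the template genome (inclusive)
    s_end: int    # end on the template genome (exclusive)
    score: float


def chain_hsps(hsps):
    """Return (best collinear chain of HSPs, total score)."""
    hsps = sorted(hsps, key=lambda h: (h.q_start, h.s_start))
    n = len(hsps)
    if n == 0:
        return [], 0.0
    best = [h.score for h in hsps]  # best chain score ending at HSP i
    prev = [-1] * n                 # back-pointers for traceback
    for i in range(n):
        for j in range(i):
            a, b = hsps[j], hsps[i]
            # a may precede b only if it ends before b starts on both sequences.
            if a.q_end <= b.q_start and a.s_end <= b.s_start:
                if best[j] + b.score > best[i]:
                    best[i] = best[j] + b.score
                    prev[i] = j
    i = max(range(n), key=best.__getitem__)
    total = best[i]
    chain = []
    while i != -1:
        chain.append(hsps[i])
        i = prev[i]
    return chain[::-1], total
```

The quadratic loop is fine for the handful of HSPs a single read produces; spurious hits elsewhere in the genome are dropped because they cannot be chained collinearly.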
Affiliation(s)
- Chunguang Liang
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Alexander Schmid
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- María José López-Sánchez
- Department of Evolutionary Genetics, Institut Cavanilles de Biodiversitat i Biologia Evolutiva, University of Valencia, Spain
- Andres Moya
- Department of Evolutionary Genetics, Institut Cavanilles de Biodiversitat i Biologia Evolutiva, University of Valencia, Spain
- Roy Gross
- Department of Microbiology, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- Jörg Bernhardt
- Institute for Microbiology, Ernst-Moritz-Arndt-University Greifswald, Jahnstrasse 15, 17487 Greifswald, Germany
- Thomas Dandekar
- Department of Bioinformatics, Biocenter, University of Würzburg, Am Hubland, D-97074 Würzburg, Germany
- EMBL, Postbox 102209, D-69012 Heidelberg, Germany
2327
Sterk P. Standards and standard-compliance. Stand Genomic Sci 2009; 1:216-7. [PMID: 21304659 PMCID: PMC3035235 DOI: 10.4056/sigs.55934] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Affiliation(s)
- Peter Sterk
- NERC Centre for Ecology and Hydrology, Oxford, OX1 3SR, United Kingdom
2328
Rosen GL, Sokhansanj BA, Polikar R, Bruns MA, Russell J, Garbarine E, Essinger S, Yok N. Signal processing for metagenomics: extracting information from the soup. Curr Genomics 2009; 10:493-510. [PMID: 20436876 PMCID: PMC2808676 DOI: 10.2174/138920209789208255] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 12/04/2008] [Revised: 03/31/2009] [Accepted: 04/25/2009] [Indexed: 11/08/2022] Open
Abstract
Traditionally, studies in microbial genomics have focused on single genomes from cultured species, thereby limiting their focus to the small percentage of species that can be cultured outside their natural environment. Fortunately, recent advances in high-throughput sequencing and computational analyses have ushered in the new field of metagenomics, which aims to decode the genomes of microbes from natural communities without the need for cultivation. Although metagenomic studies have provided a great deal of insight into bacterial diversity and coding capacity, several computational challenges remain due to the massive size and complexity of metagenomic sequence data. This paper reviews current tools and techniques that address challenges in 1) genomic fragment annotation, 2) phylogenetic reconstruction, 3) functional classification of samples, and 4) interpreting complementary metaproteomics and metametabolomics data. Also surveyed are important applications of metagenomic studies, including microbial forensics and the roles of microbial communities in shaping human health and soil ecology.
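Genomic fragment annotation, the first challenge listed in this review, is often approached by treating sequence composition as a signal. As one generic example (an illustration, not a method attributed to the review), the sketch below converts a fragment into a normalized k-mer frequency vector and assigns it to the reference whose profile has the highest cosine similarity. The function names, the default k = 4 and the toy references are assumptions.

```python
import math
from itertools import product


def kmer_profile(seq, k=4):
    """Normalized k-mer frequency vector over ACGT; k-mers with other
    characters (e.g. N) are skipped."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def assign(fragment, references, k=4):
    """Return the name of the reference with the most similar k-mer profile."""
    p = kmer_profile(fragment, k)
    return max(references, key=lambda name: cosine(p, kmer_profile(references[name], k)))
```

Composition signals need no alignment, which makes them cheap, but they become noisy for very short fragments because few k-mers are observed.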
Affiliation(s)
- Gail L. Rosen
- Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA, USA
- Bahrad A. Sokhansanj
- School of Biomedical Engineering, Science, and Health Systems, Drexel University, Philadelphia, PA, USA
- Robi Polikar
- Electrical and Computer Engineering Department, Rowan University, Glassboro, NJ, USA
- Mary Ann Bruns
- Soil Science/Microbial Ecology, Pennsylvania State University, University Park, PA, USA
- Jacob Russell
- Biology Department, Drexel University, Philadelphia, PA, USA
- Elaine Garbarine
- Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA, USA
- Steve Essinger
- Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA, USA
- Non Yok
- Electrical and Computer Engineering Department, Drexel University, Philadelphia, PA, USA
2329
Kosakovsky Pond S, Wadhawan S, Chiaromonte F, Ananda G, Chung WY, Taylor J, Nekrutenko A. Windshield splatter analysis with the Galaxy metagenomic pipeline. Genome Res 2009; 19:2144-53. [PMID: 19819906 DOI: 10.1101/gr.094508.109] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.1] [Indexed: 11/24/2022]
Abstract
How many species inhabit our immediate surroundings? A straightforward collection technique suitable for answering this question is known to anyone who has ever driven a car at highway speeds. The windshield of a moving vehicle is subjected to numerous insect strikes and can be used as a collection device for representative sampling. Unfortunately the analysis of biological material collected in that manner, as with most metagenomic studies, proves to be rather demanding due to the large number of required tools and considerable computational infrastructure. In this study, we use organic matter collected by a moving vehicle to design and test a comprehensive pipeline for phylogenetic profiling of metagenomic samples that includes all steps from processing and quality control of data generated by next-generation sequencing technologies to statistical analyses and data visualization. To the best of our knowledge, this is also the first publication that features a live online supplement providing access to exact analyses and workflows used in the article.
2330
Willner D, Furlan M, Haynes M, Schmieder R, Angly FE, Silva J, Tammadoni S, Nosrat B, Conrad D, Rohwer F. Metagenomic analysis of respiratory tract DNA viral communities in cystic fibrosis and non-cystic fibrosis individuals. PLoS One 2009; 4:e7370. [PMID: 19816605 PMCID: PMC2756586 DOI: 10.1371/journal.pone.0007370] [Citation(s) in RCA: 289] [Impact Index Per Article: 19.3] [Received: 07/01/2009] [Accepted: 09/13/2009] [Indexed: 12/28/2022] Open
Abstract
The human respiratory tract is constantly exposed to a wide variety of viruses, microbes and inorganic particulates from environmental air, water and food. Physical characteristics of inhaled particles and airway mucosal immunity determine which viruses and microbes will persist in the airways. Here we present the first metagenomic study of DNA viral communities in the airways of diseased and non-diseased individuals. We obtained sequences from sputum DNA viral communities in 5 individuals with cystic fibrosis (CF) and 5 individuals without the disease. Overall, diversity of viruses in the airways was low, with an average richness of 175 distinct viral genotypes. The majority of viral diversity was uncharacterized. CF phage communities were highly similar to each other, whereas Non-CF individuals had more distinct phage communities, which may reflect organisms in inhaled air. CF eukaryotic viral communities were dominated by a few viruses, including human herpesviruses and retroviruses. Functional metagenomics showed that all Non-CF viromes were similar, and that CF viromes were enriched in aromatic amino acid metabolism. The CF metagenomes occupied two different metabolic states, probably reflecting different disease states. There was one outlying CF virome which was characterized by an over-representation of Guanosine-5'-triphosphate,3'-diphosphate pyrophosphatase, an enzyme involved in the bacterial stringent response. Unique environments like the CF airway can drive functional adaptations, leading to shifts in metabolic profiles. These results have important clinical implications for CF, indicating that therapeutic measures may be more effective if used to change the respiratory environment, as opposed to shifting the taxonomic composition of resident microbiota.
Affiliation(s)
- Dana Willner
- Department of Biology, San Diego State University, San Diego, California, USA.
2331
Logares R, Bråte J, Heinrich F, Shalchian-Tabrizi K, Bertilsson S. Infrequent transitions between saline and fresh waters in one of the most abundant microbial lineages (SAR11). Mol Biol Evol 2009; 27:347-57. [PMID: 19808864 DOI: 10.1093/molbev/msp239] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Indexed: 01/25/2023] Open
Abstract
The aquatic bacterial group SAR11 is one of the most abundant organisms on Earth, with an estimated global population size of 2.4 × 10^28 cells in the oceans. Members of SAR11 have also been detected in brackish and fresh waters, but the evolutionary relationships between the species present in the different environments have been ambiguous. In particular, it was not clear how frequently this lineage has crossed the saline-freshwater boundary during its evolutionary diversification. Due to the huge population size of SAR11 and the potential of microbes for long-distance dispersal, we hypothesized that environmental transitions could have occurred repeatedly during the evolutionary diversification of this group. Here, we have constructed extensive 16S rDNA-based molecular phylogenies and undertaken metagenomic data analyses to assess the frequency of saline-freshwater transitions in SAR11 and to investigate the evolutionary implications of this process. Our analyses indicated that very few saline-freshwater transitions occurred during the evolutionary diversification of SAR11, generating genetically distinct saline and freshwater lineages that do not appear to exchange genes extensively via horizontal gene transfer. In contrast to lineages from saline environments, extant freshwater taxa from diverse, and sometimes distant, geographic locations were very closely related. This points to a rapid diversification and dispersal in fresh waters or to slower evolutionary rates in fresh water SAR11 when compared with marine counterparts. In addition, the colonization of both saline and fresh waters appears to have occurred early in the evolution of SAR11. We conclude that the different biogeochemical conditions that prevail in saline and fresh waters have likely prevented the environmental transitions in SAR11, promoting the evolution of clearly distinct lineages in each environment.
Affiliation(s)
- Ramiro Logares
- Limnology/Department of Ecology and Evolution, Uppsala University, Sweden.
2332
Lazarevic V, Whiteson K, Huse S, Hernandez D, Farinelli L, Osterås M, Schrenzel J, François P. Metagenomic study of the oral microbiota by Illumina high-throughput sequencing. J Microbiol Methods 2009; 79:266-71. [PMID: 19796657 DOI: 10.1016/j.mimet.2009.09.012] [Citation(s) in RCA: 267] [Impact Index Per Article: 17.8] [Received: 09/11/2009] [Accepted: 09/14/2009] [Indexed: 02/01/2023]
Abstract
To date, metagenomic studies have relied on the utilization and analysis of reads obtained using 454 pyrosequencing to replace conventional Sanger sequencing. After extensively scanning the 16S ribosomal RNA (rRNA) gene, we identified the V5 hypervariable region as a short region providing reliable identification of bacterial sequences available in public databases such as the Human Oral Microbiome Database. We amplified samples from the oral cavity of three healthy individuals using primers covering an approximately 82-base segment of the V5 loop, and sequenced them using the Illumina technology in a single orientation. We identified 135 genera or higher taxonomic ranks from the resulting 1,373,824 sequences. While the abundances of the most common phyla (Firmicutes, Proteobacteria, Actinobacteria, Fusobacteria and TM7) are largely comparable to previous studies, Bacteroidetes were less abundant. Potential sources for this difference include classification bias in this region of the 16S rRNA gene, human sample variation, sample preparation and primer bias. Using an Illumina sequencing approach, we achieved a much greater depth of coverage than previous oral microbiota studies, allowing us to identify several taxa not yet discovered in these types of samples, and to estimate that at least 30,000 additional reads would be required to identify just one additional phylotype. The evolution of high-throughput sequencing technologies, and their subsequent improvements in read length, enable the use of different platforms for studying communities of complex flora. Access to large amounts of data is already leading to a better representation of sample diversity at a reasonable cost.
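The estimate that tens of thousands of extra reads would be needed to find one more phylotype is a statement about how flat a rarefaction curve has become. The abstract does not say which estimator the authors used, so the following is only an illustration: it computes the expected number of taxa in a random subsample of n reads with Hurlbert's rarefaction formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], evaluated in log space with `math.lgamma` so it stays tractable at the ~1.4 million reads reported. The function names are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_richness(counts, n):
    """Expected number of distinct taxa in a subsample of n reads drawn
    without replacement. counts: reads observed per taxon in the full sample."""
    N = sum(counts)
    if not 0 <= n <= N:
        raise ValueError("n must be between 0 and the total number of reads")
    total = 0.0
    for Ni in counts:
        if N - Ni < n:
            total += 1.0  # taxon cannot be missed in a subsample this large
        else:
            # Probability that the subsample contains no read from this taxon.
            total += 1.0 - exp(log_comb(N - Ni, n) - log_comb(N, n))
    return total
```

Evaluating the function at increasing n traces the rarefaction curve; the smaller its slope at full depth, the more additional reads are needed per newly observed phylotype.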
Affiliation(s)
- Vladimir Lazarevic
- Genomic Research Laboratory, Geneva University Hospitals, Rue Gabrielle-Perret-Gentil 4, CH-1211 Geneva 14, Switzerland.
2333
Abstract
We present FIGfams, a new collection of over 100 000 protein families that are the product of manual curation and close strain comparison. The manual curation is carried out using the Subsystem approach, ensuring a previously unattained degree of throughput and consistency. FIGfams are based on over 950 000 manually annotated proteins from many hundreds of Bacteria and Archaea. Associated with each FIGfam is a two-tiered, rapid, accurate decision procedure to determine family membership for new proteins. FIGfams are freely available under an open source license and can be downloaded at ftp://ftp.theseed.org/FIGfams/. The web site for FIGfams is http://www.theseed.org/wiki/FIGfams/
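The abstract does not describe how FIGfams' two-tiered membership test works, so the sketch below is only a generic illustration of the two-tier idea: a cheap k-mer screen to shortlist candidate families, followed by a more expensive similarity check. The thresholds, the crude ungapped identity score and all function names are hypothetical; a real system would use a proper alignment score.

```python
def kmers(seq, k):
    """Set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def ungapped_identity(a, b):
    """Fraction of identical positions over the shorter sequence, without
    gaps: a crude stand-in for a real alignment score."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def assign_family(protein, families, k=3, min_shared=2, min_identity=0.5):
    """families: dict family_id -> list of member sequences (hypothetical).
    Returns the chosen family_id, or None if no family qualifies."""
    query = kmers(protein, k)
    # Tier 1 (fast screen): keep families with a member sharing enough k-mers.
    candidates = [
        fid for fid, members in families.items()
        if max((len(query & kmers(m, k)) for m in members), default=0) >= min_shared
    ]
    # Tier 2 (slower check): best member identity must clear the threshold.
    best_fid, best_score = None, -1.0
    for fid in candidates:
        score = max((ungapped_identity(protein, m) for m in families[fid]), default=0.0)
        if score >= min_identity and score > best_score:
            best_fid, best_score = fid, score
    return best_fid
```

The first tier discards most families without any alignment, so the costly comparison in the second tier runs only on a short candidate list.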
Affiliation(s)
- Folker Meyer
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA.
2334
Simon C, Daniel R. Achievements and new knowledge unraveled by metagenomic approaches. Appl Microbiol Biotechnol 2009; 85:265-76. [PMID: 19760178 PMCID: PMC2773367 DOI: 10.1007/s00253-009-2233-z] [Citation(s) in RCA: 109] [Impact Index Per Article: 7.3] [Received: 07/24/2009] [Revised: 08/25/2009] [Accepted: 08/25/2009] [Indexed: 02/01/2023]
Abstract
Metagenomics has paved the way for cultivation-independent assessment and exploitation of microbial communities present in complex ecosystems. In recent years, significant progress has been made in this research area. A major breakthrough was the improvement and development of high-throughput next-generation sequencing technologies. The application of these technologies resulted in the generation of large datasets derived from various environments such as soil and ocean water. The analyses of these datasets opened a window into the enormous phylogenetic and metabolic diversity of microbial communities living in a variety of ecosystems. In this way, structure, functions, and interactions of microbial communities were elucidated. Metagenomics has proven to be a powerful tool for the recovery of novel biomolecules. In most cases, functional metagenomics comprising construction and screening of complex metagenomic DNA libraries has been applied to isolate new enzymes and drugs of industrial importance. For this purpose, several novel and improved screening strategies that allow efficient screening of large collections of clones harboring metagenomes have been introduced.
Affiliation(s)
- Carola Simon
- Department of Genomic and Applied Microbiology, Institute of Microbiology and Genetics, Georg-August University Göttingen, Grisebachstr 8, 37077 Göttingen, Germany
2335
Tsafnat G, Coiera E, Partridge SR, Schaeffer J, Iredell JR. Context-driven discovery of gene cassettes in mobile integrons using a computational grammar. BMC Bioinformatics 2009; 10:281. [PMID: 19735578 PMCID: PMC3087341 DOI: 10.1186/1471-2105-10-281] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 01/12/2009] [Accepted: 09/08/2009] [Indexed: 01/13/2023]
Abstract
Background Gene discovery algorithms typically examine sequence data for low-level patterns. A novel method to computationally discover higher-order DNA structures is presented, using a context-sensitive grammar. The algorithm was applied to the discovery of gene cassettes associated with integrons. The discovery and annotation of antibiotic resistance genes in such cassettes is essential for effective monitoring of antibiotic resistance patterns and formulation of public health antibiotic prescription policies. Results We discovered two new putative gene cassettes using the method, from 276 integron features and 978 GenBank sequences. The system achieved κ = 0.972 annotation agreement with an expert gold standard of 300 sequences. In rediscovery experiments, we deleted 789,196 cassette instances over 2030 experiments and correctly relabeled 85.6% (α ≥ 95%, E ≤ 1%, mean sensitivity = 0.86, specificity = 1, F-score = 0.93), with no false positives. Error analysis demonstrated that for 72,338 missed deletions, two adjacent deleted cassettes were labeled as a single cassette; accounting for these merged labels increases performance to 94.8% (mean sensitivity = 0.92, specificity = 1, F-score = 0.96). Conclusion Using grammars, we were able to represent heuristic background knowledge about large and complex structures in DNA. Importantly, we were also able to use the context embedded in the model to discover new putative antibiotic resistance gene cassettes. The method is complementary to existing automatic annotation systems, which operate at the sequence level.
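The idea that context, not sequence alone, decides whether a gene is labeled as a cassette can be sketched with a toy parser over features that have already been annotated. This is a hypothetical simplification, not the authors' grammar: the token names (`attI`, `gene`, `attC`, `qacE`) and the two rules are assumptions, and the rules are regular rather than context-sensitive.

```python
from typing import List, Tuple

# Toy grammar (hypothetical, for illustration only):
#   integron -> "attI" cassette*
#   cassette -> "gene" "attC"
# A gene/attC pair is labeled only when it follows an attI site,
# either directly or after earlier cassettes in the same array.

def find_cassettes(features: List[str]) -> List[Tuple[int, int]]:
    """Return (gene_index, attC_index) pairs parsed as cassettes."""
    cassettes: List[Tuple[int, int]] = []
    i = 0
    while i < len(features):
        if features[i] != "attI":
            i += 1  # outside an integron context: nothing is labeled
            continue
        j = i + 1
        # Consume consecutive gene/attC pairs downstream of attI.
        while j + 1 < len(features) and features[j] == "gene" and features[j + 1] == "attC":
            cassettes.append((j, j + 1))
            j += 2
        i = j
    return cassettes


features = ["gene", "attC", "attI", "gene", "attC", "gene", "attC", "qacE"]
print(find_cassettes(features))  # [(3, 4), (5, 6)]
```

The first gene/attC pair is not labeled because no `attI` precedes it; the same local pattern is accepted only inside the integron context.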
Affiliation(s)
- Guy Tsafnat
- Centre for Health Informatics, Univ. of New South Wales, Sydney, NSW 2052, Australia.
2336
Meinicke P. UFO: a web server for ultra-fast functional profiling of whole genome protein sequences. BMC Genomics 2009; 10:409. [PMID: 19725959 PMCID: PMC2744726 DOI: 10.1186/1471-2164-10-409] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 04/27/2009] [Accepted: 09/02/2009] [Indexed: 11/10/2022]
Abstract
Background Functional profiling is a key technique to characterize and compare the functional potential of entire genomes. The estimation of profiles according to an assignment of sequences to functional categories is a computationally expensive task because it requires the comparison of all protein sequences from a genome with a usually large database of annotated sequences or sequence families. Description Based on machine learning techniques for Pfam domain detection, the UFO web server for ultra-fast functional profiling allows researchers to process large protein sequence collections instantaneously. Besides the frequencies of Pfam and GO categories, the user also obtains the sequence-specific assignments to Pfam domain families. In addition, a comparison with existing genomes provides dissimilarity scores with respect to 821 reference proteomes. Considering the underlying UFO domain detection, the results on 206 test genomes indicate a high sensitivity of the approach. In comparison with current state-of-the-art HMMs, the runtime measurements show a considerable speed-up of about four orders of magnitude. For an average-size prokaryotic genome, the computation of a functional profile together with its comparison typically requires about 10 seconds of processing time. Conclusion For the first time, the UFO web server makes it possible to get a quick overview of the functional inventory of newly sequenced organisms. The genome-scale comparison with a large number of precomputed profiles allows a first guess about functionally related organisms. The service is freely available and does not require user registration or specification of a valid email address.
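A functional profile of the kind described can be sketched as relative Pfam frequencies compared against precomputed reference profiles. The sketch assumes domain assignments are already available (UFO's own machine-learning detector is not reproduced) and uses half the L1 distance as the dissimilarity score, which is an assumption since the abstract does not name UFO's score. The Pfam accessions and reference profiles are illustrative placeholders.

```python
from collections import Counter
from typing import Dict, Iterable, List

Profile = Dict[str, float]

def functional_profile(assignments: Iterable[List[str]]) -> Profile:
    """Relative frequency of each Pfam family over all per-protein assignments."""
    counts = Counter(family for domains in assignments for family in domains)
    total = sum(counts.values())
    return {family: n / total for family, n in counts.items()} if total else {}

def dissimilarity(p: Profile, q: Profile) -> float:
    """Half the L1 distance (assumed score): 0 for identical profiles, 1 for disjoint ones."""
    families = set(p) | set(q)
    return 0.5 * sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in families)

def rank_references(query: Profile, references: Dict[str, Profile]) -> List[str]:
    """Reference proteome names, most similar first."""
    return sorted(references, key=lambda name: dissimilarity(query, references[name]))


# Per-protein domain assignments for a hypothetical three-protein genome.
query = functional_profile([["PF00001"], ["PF00001", "PF00005"], []])
references = {
    "ref_A": {"PF00001": 0.5, "PF00005": 0.5},
    "ref_B": {"PF00072": 1.0},
}
print(rank_references(query, references))  # ['ref_A', 'ref_B']
```

Because the profiles are normalized, genomes of different sizes can be compared directly; the expensive step in practice is producing the domain assignments, which is what UFO accelerates.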
Affiliation(s)
- Peter Meinicke
- Department of Bioinformatics, Institute of Microbiology and Genetics, Georg-August-University Göttingen, Germany.
2337
Kristiansson E, Hugenholtz P, Dalevi D. ShotgunFunctionalizeR: an R-package for functional comparison of metagenomes. Bioinformatics 2009; 25:2737-8. [PMID: 19696045 DOI: 10.1093/bioinformatics/btp508] [Citation(s) in RCA: 110] [Impact Index Per Article: 7.3] [Indexed: 11/13/2022]
Abstract
Microorganisms are ubiquitous in nature and constitute intrinsic parts of almost every ecosystem. A culture-independent and powerful way to study microbial communities is metagenomics. In such studies, functional analysis is performed on fragmented genetic material from multiple species in the community. The recent advances in high-throughput sequencing have greatly increased the amount of data in metagenomic projects. At present, there is an urgent need for efficient statistical tools to analyse these data. We have created ShotgunFunctionalizeR, an R-package for functional comparison of metagenomes. The package contains tools for importing, annotating and visualizing metagenomic data produced by shotgun high-throughput sequencing. ShotgunFunctionalizeR contains several statistical procedures for assessing functional differences between samples, both for individual genes and for entire pathways. In addition to standard and previously published methods, we have developed and implemented a novel approach based on a Poisson model. This procedure is highly flexible and thus applicable to a wide range of different experimental designs. We demonstrate the potential of ShotgunFunctionalizeR by performing a regression analysis on metagenomes sampled at multiple depths in the Pacific Ocean. AVAILABILITY http://shotgun.zool.gu.se
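The intuition behind a Poisson-based comparison can be sketched for the simplest case of one gene in two samples. This is not ShotgunFunctionalizeR's implementation, which supports more general designs such as the depth regression mentioned above; it is the classical conditional test for two Poisson rates, assuming gene counts are Poisson with rates proportional to each sample's total read count. Conditional on the combined count k, the first sample's count is then binomial with success probability n1 / (n1 + n2).

```python
from math import exp, lgamma, log

def binomial_pmf(i: int, k: int, p: float) -> float:
    """Binomial probability computed in log space to avoid overflow."""
    log_pmf = (lgamma(k + 1) - lgamma(i + 1) - lgamma(k - i + 1)
               + i * log(p) + (k - i) * log(1 - p))
    return exp(log_pmf)

def poisson_rate_test(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value for equal per-read rates of one gene in two samples.

    x1, x2: reads assigned to the gene; n1, n2: total reads per sample (> 0).
    """
    k = x1 + x2
    p = n1 / (n1 + n2)  # expected share of the k hits under H0
    probs = [binomial_pmf(i, k, p) for i in range(k + 1)]
    observed = probs[x1]
    # Sum every outcome at least as unlikely as the observed one.
    tail = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, tail)


print(poisson_rate_test(10, 1_000_000, 0, 1_000_000))  # about 0.00195
```

Conditioning on the total removes the unknown common rate, so no parameter has to be estimated; a regression formulation is needed once there are more than two samples or a continuous covariate such as depth.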
Affiliation(s)
- Erik Kristiansson
- Department of Zoology, University of Gothenburg, Department of Neuroscience and Physiology, the Sahlgrenska Academy at the University of Gothenburg, Göteborg, Sweden.
2338
Ye Y, Doak TG. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol 2009; 5:e1000465. [PMID: 19680427 PMCID: PMC2714467 DOI: 10.1371/journal.pcbi.1000465] [Citation(s) in RCA: 309] [Impact Index Per Article: 20.6] [Received: 05/27/2009] [Accepted: 07/10/2009] [Indexed: 11/18/2022]
Abstract
A common biological pathway reconstruction approach -- as implemented by many automatic biological pathway services (such as the KAAS and RAST servers) and the functional annotation of metagenomic sequences -- starts with the identification of protein functions or families (e.g., KO families for the KEGG database and the FIG families for the SEED database) in the query sequences, followed by a direct mapping of the identified protein families onto pathways. Given a predicted patchwork of individual biochemical steps, some metric must be applied in deciding what pathways actually exist in the genome or metagenome represented by the sequences. Commonly, and straightforwardly, a complete biological pathway can be identified in a dataset if at least one of the steps associated with the pathway is found. We report, however, that this naïve mapping approach leads to an inflated estimate of biological pathways, and thus overestimates the functional diversity of the sample from which the DNA sequences are derived. We developed a parsimony approach, called MinPath (Minimal set of Pathways), for biological pathway reconstructions using protein family predictions, which yields a more conservative, yet more faithful, estimation of the biological pathways for a query dataset. MinPath identified far fewer pathways for the genomes collected in the KEGG database -- as compared to the naïve mapping approach -- eliminating some obviously spurious pathway annotations. Results from applying MinPath to several metagenomes indicate that the common methods used for metagenome annotation may significantly overestimate the biological pathways encoded by microbial communities.
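The contrast between naive mapping and the parsimony criterion can be sketched as a minimum set-cover problem: find the smallest set of pathways that together contain every detected family that maps to any pathway. The exhaustive search below is only practical for toy inputs and stands in for whatever solver MinPath actually uses; the family and pathway names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set

def naive_pathways(families: Set[str], pathways: Dict[str, Set[str]]) -> Set[str]:
    """A pathway is reported if at least one of its families was detected."""
    return {name for name, members in pathways.items() if members & families}

def minimal_pathways(families: Set[str], pathways: Dict[str, Set[str]]) -> Set[str]:
    """Smallest set of pathways explaining every detected, pathway-mapped family."""
    to_explain = families & set().union(*pathways.values())
    candidates = sorted(naive_pathways(families, pathways))
    # Try subsets in order of increasing size; the first cover found is minimal.
    for size in range(len(candidates) + 1):
        for combo in combinations(candidates, size):
            covered = set().union(*(pathways[name] for name in combo))
            if to_explain <= covered:
                return set(combo)
    return set(candidates)  # unreachable: the full candidate set always covers


detected = {"K1", "K2", "K3"}
pathways = {
    "glycolysis": {"K1", "K2", "K9"},
    "tca_cycle": {"K2", "K3"},
    "urea_cycle": {"K3", "K8"},
    "spurious": {"K2"},
}
print(len(naive_pathways(detected, pathways)))  # 4
print(sorted(minimal_pathways(detected, pathways)))  # ['glycolysis', 'tca_cycle']
```

Naive mapping reports all four pathways, including one supported only by a family already explained elsewhere. Here `glycolysis` plus `urea_cycle` would be an equally small cover; ties are broken by the alphabetical search order in this sketch.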
Affiliation(s)
- Yuzhen Ye
- School of Informatics, Indiana University, Bloomington, IN, USA.
2339
Abstract
Microbiology is a relatively modern scientific discipline intended to objectively study microorganisms, including pathogens and nonpathogens. However, since its birth, this science has been negatively affected by anthropocentric convictions, including rational and irrational beliefs. Among these, for example, is the artificial separation between environmental and medical microbiology that weakens both disciplines. Anthropocentric microbiology also fails to properly answer questions concerning the evolution of microbial pathogenesis. Here, I argue that an exclusively biocentric microbiology is imperative for improving our understanding not only of the microbial world, but also of our own species, our guts, and the world around us.
Affiliation(s)
- Ramy Karam Aziz
- Department of Microbiology and Immunology, Faculty of Pharmacy, Cairo University, 11562 Cairo, Egypt.
2340
Vega Thurber R, Willner-Hall D, Rodriguez-Mueller B, Desnues C, Edwards RA, Angly F, Dinsdale E, Kelly L, Rohwer F. Metagenomic analysis of stressed coral holobionts. Environ Microbiol 2009; 11:2148-63. [PMID: 19397678 DOI: 10.1111/j.1462-2920.2009.01935.x] [Citation(s) in RCA: 352] [Impact Index Per Article: 23.5] [Indexed: 11/29/2022]
2341
Gomez-Alvarez V, Teal TK, Schmidt TM. Systematic artifacts in metagenomes from complex microbial communities. ISME J 2009; 3:1314-7. [PMID: 19587772 DOI: 10.1038/ismej.2009.72] [Citation(s) in RCA: 329] [Impact Index Per Article: 21.9] [Indexed: 12/12/2022]
Abstract
Metagenomics is providing an unprecedented view of the taxonomic diversity, metabolic potential and ecological role of microbial communities in biomes as diverse as the mammalian gastrointestinal tract, the marine water column and soils. However, we have found a systematic error in metagenomes generated by 454-based pyrosequencing that leads to an overestimation of gene and taxon abundance; between 11% and 35% of sequences in a typical metagenome are artificial replicates. Here we document the error in several published and original datasets and offer a web-based solution (http://microbiomes.msu.edu/replicates) for identifying and removing these artifacts.
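A minimal sketch of replicate flagging, assuming that an artificial replicate can be approximated as a read that shares an identical prefix of `k` bases with another read. The prefix rule and the value of `k` are assumptions for illustration; the abstract does not describe the criteria used by the web tool, and a practical filter would also need to tolerate sequencing errors.

```python
from collections import defaultdict
from typing import Dict, List

def replicate_groups(reads: Dict[str, str], k: int = 20) -> List[List[str]]:
    """Groups of read IDs whose first k bases are identical (reads shorter than k are ignored)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, seq in reads.items():
        if len(seq) >= k:
            groups[seq[:k]].append(read_id)
    return [ids for ids in groups.values() if len(ids) > 1]

def dereplicate(reads: Dict[str, str], k: int = 20) -> Dict[str, str]:
    """Keep the first read of each group and drop the presumed replicates."""
    drop = {rid for ids in replicate_groups(reads, k) for rid in ids[1:]}
    return {rid: seq for rid, seq in reads.items() if rid not in drop}


reads = {
    "r1": "ACGTACGTAACCCC",
    "r2": "ACGTACGTAAGG",
    "r3": "TTTTACGTAACC",
}
kept = dereplicate(reads, k=10)
print(sorted(kept), f"{1 - len(kept) / len(reads):.0%} flagged")  # ['r1', 'r3'] 33% flagged
```

Removing replicates before counting is what prevents the inflated gene and taxon abundances the abstract describes: each duplicated template contributes one read instead of several.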
Affiliation(s)
- Vicente Gomez-Alvarez
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI, USA
2342
Hamady M, Knight R. Microbial community profiling for human microbiome projects: Tools, techniques, and challenges. Genome Res 2009; 19:1141-52. [PMID: 19383763 PMCID: PMC3776646 DOI: 10.1101/gr.085464.108] [Citation(s) in RCA: 653] [Impact Index Per Article: 43.5] [Indexed: 12/28/2022]
Abstract
High-throughput sequencing studies and new software tools are revolutionizing microbial community analyses, yet the variety of experimental and computational methods can be daunting. In this review, we discuss some of the different approaches to community profiling, highlighting strengths and weaknesses of various experimental approaches, sequencing methodologies, and analytical methods. We also address one key question emerging from various Human Microbiome Projects: Is there a substantial core of abundant organisms or lineages that we all share? It appears that in some human body habitats, such as the hand and the gut, the diversity among individuals is so great that we can rule out the possibility that any species is at high abundance in all individuals: It is possible that the focus should instead be on higher-level taxa or on functional genes.
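The core-microbiome question can be made concrete with a small sketch: a taxon belongs to the core at a given threshold only if its relative abundance reaches that threshold in every sample. The counts and the genus-to-phylum mapping below are invented toy data and the threshold is arbitrary; the sketch shows how pooling counts at a higher rank can produce a shared core where none exists at genus level.

```python
from collections import defaultdict
from typing import Dict, Set

Counts = Dict[str, int]

def core_taxa(samples: Dict[str, Counts], min_rel_abundance: float) -> Set[str]:
    """Taxa whose relative abundance is >= the threshold in every sample."""
    core = None
    for counts in samples.values():
        total = sum(counts.values())
        abundant = {t for t, n in counts.items() if total and n / total >= min_rel_abundance}
        core = abundant if core is None else core & abundant
    return core or set()

def collapse(counts: Counts, higher_rank: Dict[str, str]) -> Counts:
    """Sum counts of lower-level taxa into their higher-level taxon."""
    pooled: Dict[str, int] = defaultdict(int)
    for taxon, n in counts.items():
        pooled[higher_rank[taxon]] += n
    return dict(pooled)


samples = {
    "subject_1": {"Bacteroides": 60, "Prevotella": 30, "Escherichia": 10},
    "subject_2": {"Bacteroides": 5, "Prevotella": 80, "Faecalibacterium": 15},
}
phylum = {"Bacteroides": "Bacteroidetes", "Prevotella": "Bacteroidetes",
          "Escherichia": "Proteobacteria", "Faecalibacterium": "Firmicutes"}

print(core_taxa(samples, 0.5))  # set(): no genus is dominant in both subjects
pooled = {s: collapse(c, phylum) for s, c in samples.items()}
print(core_taxa(pooled, 0.5))  # {'Bacteroidetes'}
```

The same `collapse` step works for functional categories: mapping taxa or genes to pathways instead of phyla gives a functional core.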
Affiliation(s)
- Micah Hamady
- Department of Computer Science, University of Colorado, Boulder, Colorado 80309, USA
- Rob Knight
- Department of Chemistry and Biochemistry, University of Colorado, Boulder, Colorado 80309, USA
2343
Abstract
Numerically, microbial species dominate the oceans, yet their population dynamics, metabolic complexity and synergistic interactions remain largely uncharted. A full understanding of life in the ocean requires more than knowledge of marine microbial taxa and their genome sequences. The latest experimental techniques and analytical approaches can provide a fresh perspective on the biological interactions within marine ecosystems, aiding in the construction of predictive models that can interrelate microbial dynamics with the biogeochemical matter and energy fluxes that make up the ocean ecosystem.
2344
Abstract
BACKGROUND Metagenomics is the study of the genomic content of an environmental sample of microbes. Advances in the throughput and cost-efficiency of sequencing technology are fueling a rapid increase in the number and size of metagenomic datasets being generated. Bioinformatics is faced with the problem of how to handle and analyze these datasets in an efficient and useful way. One goal of these metagenomic studies is to get a basic understanding of the microbial world both surrounding us and within us. One major challenge is how to compare multiple datasets. Furthermore, there is a need for bioinformatics tools that can process many large datasets and are easy to use. RESULTS This article describes two new and helpful techniques for comparing multiple metagenomic datasets. The first is a visualization technique for multiple datasets and the second is a new statistical method for highlighting the differences in a pairwise comparison. We have developed implementations of both methods that are suitable for very large datasets and provide these in Version 3 of our standalone metagenome analysis tool MEGAN. CONCLUSION These new methods are suitable for the visual comparison of many large metagenomes and the statistical comparison of two metagenomes at a time. Nevertheless, more work needs to be done to support the comparative analysis of multiple metagenome datasets. AVAILABILITY Version 3 of MEGAN, which implements all ideas presented in this article, can be obtained from our web site at: www-ab.informatik.uni-tuebingen.de/software/megan. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
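For a pairwise comparison, one common way to ask whether the share of reads assigned to a taxon or function differs between two samples is a two-proportion z-test. The abstract does not name MEGAN's test, so the sketch below is a generic illustration under that assumption rather than a description of MEGAN 3; the read counts in the usage line are made up.

```python
from math import erfc, sqrt
from typing import Tuple

def two_proportion_z(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """z statistic and two-sided p-value for H0: x1/n1 == x2/n2.

    x1, x2: reads assigned to the node of interest; n1, n2: total reads per sample.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # pooled proportion is 0 or 1: no variation to test
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z(50, 100, 30, 100)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Applied to every node of a taxonomy, such a test yields one p-value per node, so some multiple-testing correction would be needed before highlighting differences.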
Affiliation(s)
- Suparna Mitra
- Center for Bioinformatics ZBIT, Tübingen University, Sand 14, 72076 Tübingen, Germany.
2345
White JR, Nagarajan N, Pop M. Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol 2009; 5:e1000352. [PMID: 19360128 PMCID: PMC2661018 DOI: 10.1371/journal.pcbi.1000352] [Citation(s) in RCA: 1112] [Impact Index Per Article: 74.1] [Received: 10/09/2008] [Accepted: 03/09/2009] [Indexed: 12/18/2022]
Abstract
Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. Important prerequisites for such discoveries are computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level.
While designed for metagenomic applications, our software can also be applied to digital gene expression studies (e.g. SAGE). A web server implementation of our methods and freely available source code can be found at http://metastats.cbcb.umd.edu/.
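Two ingredients named in the abstract, Fisher's exact test for sparse features and false-discovery-rate control, can be sketched as follows. This is not Metastats itself: it covers only the sparse-feature branch, on counts pooled per treatment group, and it controls the FDR with the Benjamini-Hochberg procedure, which may differ from the estimator Metastats uses. The feature tables are hypothetical.

```python
from math import comb
from typing import List, Sequence

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: treatment groups; columns: reads from the feature vs. all other reads.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, col1)
    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    # Hypergeometric probability of each admissible top-left cell value.
    probs = {x: comb(row1, x) * comb(n - row1, col1 - x) / total for x in range(lo, hi + 1)}
    observed = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= observed * (1 + 1e-7)))

def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical sparse features: (group-1 hits, group-1 other reads, group-2 hits, group-2 other reads)
tables = [(3, 997, 0, 1000), (0, 1000, 1, 999), (12, 988, 1, 999)]
pvals = [fisher_exact_two_sided(*t) for t in tables]
print([round(q, 4) for q in benjamini_hochberg(pvals)])
```

Fisher's exact test is used here because with only a handful of hits per group, approximations based on means and variances are unreliable; the exact hypergeometric probabilities remain valid at any count.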
Affiliation(s)
- James Robert White
- Applied Mathematics and Scientific Computation Program, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Niranjan Nagarajan
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Mihai Pop
- Department of Computer Science, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
2346
Rokas A, Abbot P. Harnessing genomics for evolutionary insights. Trends Ecol Evol 2009; 24:192-200. [PMID: 19201503 DOI: 10.1016/j.tree.2008.11.004] [Citation(s) in RCA: 116] [Impact Index Per Article: 7.7] [Received: 05/29/2008] [Revised: 11/07/2008] [Accepted: 11/10/2008] [Indexed: 11/25/2022]
Abstract
Next-generation DNA sequencing technologies can generate unprecedented amounts of genomic data, even for non-model organisms. Here we describe how these new technologies have facilitated recent key advances in ecology and evolutionary biology, and highlight several outstanding ecological and evolutionary questions that are distinctly suited to the innovations they provide. Importantly, using these technologies to their full potential requires careful experimental design and critical consideration of several caveats associated with them. Although several significant challenges remain to be resolved before the integration of next-generation sequencing technologies into single-investigator research programs, we argue that they will soon transform ecology and evolution by fundamentally changing the ranges and types of questions that can be addressed.
Affiliation(s)
- Antonis Rokas
- Department of Biological Sciences, Vanderbilt University, VU Station B 35-1634, Nashville, TN 37235, USA.
2347
Wooley JC, Ye Y. Metagenomics: Facts and Artifacts, and Computational Challenges. J Comput Sci Technol 2009; 25:71-81. [PMID: 20648230 PMCID: PMC2905821 DOI: 10.1007/s11390-010-9306-4] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 05/13/2023]
Abstract
Metagenomics is the study of microbial communities sampled directly from their natural environment, without prior culturing. By enabling an analysis of populations including many (so-far) unculturable and often unknown microbes, metagenomics is revolutionizing the field of microbiology, and has excited researchers in many disciplines that could benefit from the study of environmental microbes, including those in ecology, environmental sciences, and biomedicine. Specific computational and statistical tools have been developed for metagenomic data analysis and comparison. New studies, however, have revealed various kinds of artifacts present in metagenomics data caused by limitations in the experimental protocols and/or inadequate data analysis procedures, which often lead to incorrect conclusions about a microbial community. Here, we review some of the artifacts, such as overestimation of species diversity and incorrect estimation of gene family frequencies, and discuss emerging computational approaches to address them. We also review potential challenges that metagenomics may encounter with the extensive application of next-generation sequencing (NGS) techniques.
Affiliation(s)
- John C. Wooley
- Center for Research on BioSystems, Calit2, UC San Diego, La Jolla CA 92093
- Yuzhen Ye
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, 47408
2348
White AK, Smith RJ, Bigler CR, Brooke WF, Schauer PR. Head and neck manifestations of neurofibromatosis. Laryngoscope 1986; 47:75-85. [PMID: 3088347 DOI: 10.1249/jes.0000000000000183] [Citation(s) in RCA: 248] [Impact Index Per Article: 6.5] [Indexed: 02/07/2023]
Abstract
Neurofibromatosis is a neurocutaneous systemic disease that occurs in 1 in 2,500 to 3,300 live births. Prevalence figures have shown it to be as common as cystic fibrosis or Down's syndrome and more than twice as common as muscular dystrophy. In this study, our experience with 257 cases of neurofibromatosis seen since 1972 is reviewed. Intracranial, bony, and extracranial anomalies are described in the 223 patients (87%) who presented with, or ultimately developed, head and neck manifestations of the disease. The most common intracranial tumor was optic glioma, found in 35 patients (14%), 19 of them younger than 10 years of age. Acoustic neuromas were diagnosed in eight individuals (3%) and were bilateral in three. The most common skull anomaly was macrocephaly, noted in 78 patients (30%). Absence of the sphenoid wing occurred in 11 patients (4%), and 19 others (7%) had facial asymmetry due to other skull abnormalities. Extracranial manifestations included neurofibromas of the plexiform and nonplexiform type, Lisch nodules, and café-au-lait spots.