1
Peeters J, Bot DM, Rovelo Ruiz G, Aerts J. Snowflake: visualizing microbiome abundance tables as multivariate bipartite graphs. Frontiers in Bioinformatics 2024; 4:1331043. [PMID: 38375239 PMCID: PMC10875061 DOI: 10.3389/fbinf.2024.1331043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 01/23/2024] [Indexed: 02/21/2024] Open
Abstract
Current visualizations in microbiome research rely on aggregations in taxonomic classifications or do not show less abundant taxa. We introduce Snowflake: a new visualization method that creates a clear overview of the microbiome composition in collected samples without losing any information due to classification or neglecting less abundant reads. Snowflake displays every observed OTU/ASV in the microbiome abundance table and provides a solution to include the data's hierarchical structure and additional information obtained from downstream analysis (e.g., alpha- and beta-diversity) and metadata. Based on the value-driven ICE-T evaluation methodology, Snowflake was positively received. Experts in microbiome research found the visualizations to be user-friendly and detailed and liked the possibility of including and relating additional information to the microbiome's composition. Exploring the topological structure of the microbiome abundance table allows them to quickly identify which taxa are unique to specific samples and which are shared among multiple samples (i.e., separating sample-specific taxa from the core microbiome), and see the compositional differences between samples. An R package for constructing and visualizing Snowflake microbiome composition graphs is available at https://gitlab.com/vda-lab/snowflake.
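The idea of reading an abundance table as a bipartite graph can be illustrated with a minimal Python sketch. It is not the Snowflake R package or its API: the table layout, the function names `bipartite_edges` and `split_taxa`, and the toy counts are all assumptions made for illustration. Samples and taxa are the two node sets, every non-zero count is an edge, and the degree of a taxon node separates sample-specific taxa from taxa shared by all samples.

```python
from collections import defaultdict

def bipartite_edges(abundance):
    """Turn {sample: {taxon: count}} into (sample, taxon, count) edges, skipping zero counts."""
    return [(s, t, c) for s, taxa in abundance.items() for t, c in taxa.items() if c > 0]

def split_taxa(abundance):
    """Return (sample_specific, core) taxa based on the degree of each taxon node."""
    samples_per_taxon = defaultdict(set)
    for sample, taxon, _ in bipartite_edges(abundance):
        samples_per_taxon[taxon].add(sample)
    n_samples = len(abundance)
    # Degree 1: the taxon is observed in exactly one sample.
    specific = {t: next(iter(s)) for t, s in samples_per_taxon.items() if len(s) == 1}
    # Degree equal to the number of samples: the taxon is shared by all of them.
    core = {t for t, s in samples_per_taxon.items() if len(s) == n_samples}
    return specific, core

# Hypothetical toy abundance table (counts are made up).
table = {
    "S1": {"ASV1": 120, "ASV2": 3, "ASV3": 0},
    "S2": {"ASV1": 80, "ASV2": 0, "ASV3": 7},
    "S3": {"ASV1": 95, "ASV2": 0, "ASV3": 2},
}
specific, core = split_taxa(table)
print(specific)  # {'ASV2': 'S1'}
print(core)      # {'ASV1'}
```

In this toy table ASV3 is neither sample-specific nor core, since it appears in two of the three samples.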
Affiliation(s)
- Jannes Peeters
  - Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Daniël M. Bot
  - Data Science Institute, Hasselt University, Diepenbeek, Belgium
- Gustavo Rovelo Ruiz
  - Expertise Center for Digital Media, Hasselt University - Flanders Make, Diepenbeek, Belgium
- Jan Aerts
  - Visual Data Analysis Lab, Department of Biosystems, KU Leuven, Leuven, Belgium
2
Comparison of Metagenomics and Metatranscriptomics Tools: A Guide to Making the Right Choice. Genes (Basel) 2022; 13:genes13122280. [PMID: 36553546 PMCID: PMC9777648 DOI: 10.3390/genes13122280] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 10/29/2022] [Revised: 11/28/2022] [Accepted: 12/01/2022] [Indexed: 12/09/2022] Open
Abstract
The study of microorganisms is a field of great interest due to their environmental (e.g., soil contamination) and biomedical (e.g., parasitic diseases, autism) importance. The advent of revolutionary next-generation sequencing techniques, and their application to the hypervariable regions of the 16S, 18S or 23S ribosomal subunits, have allowed more in-depth research on a large variety of organisms, including bacteria, archaea, eukaryotes and fungi. Additionally, together with the development of analysis software, the creation of specific databases (e.g., SILVA or RDP) has boosted the enormous growth of these studies. As the cost of sequencing per sample has continuously decreased, new protocols have also emerged, such as shotgun sequencing, which allows the profiling of all taxonomic domains in a sample. The sequencing of hypervariable regions and shotgun sequencing are technologies that enable the taxonomic classification of microorganisms from the DNA present in microbial communities. However, they are not capable of measuring what is actively expressed. Conversely, we advocate that metatranscriptomics is a "new" technology that makes the identification of the mRNAs of a microbial community possible, quantifying gene expression levels and active biological pathways. Furthermore, it can also be used to characterise symbiotic interactions between the host and its microbiome. In this manuscript, we examine the three technologies above, and discuss the implementation of different software and databases, which greatly impact the obtaining of reliable results. Finally, we have developed two easy-to-use pipelines leveraging Nextflow technology. These aim to provide everything required for an average user to perform a metagenomic analysis of marker genes with QIIME2 and a metatranscriptomic study using Kraken2/Bracken.
3
Vainberg-Slutskin I, Kowalsman N, Silberberg Y, Cohen T, Gold J, Kario E, Weiner I, Gahali-Sass I, Kredo-Russo S, Zak NB, Bassan M. OUP accepted manuscript. Bioinformatics 2022; 38:3288-3290. [PMID: 35551337 PMCID: PMC9191209 DOI: 10.1093/bioinformatics/btac319] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/10/2022] [Revised: 04/06/2022] [Accepted: 05/06/2022] [Indexed: 11/14/2022] Open
Abstract
Summary: Next-Generation Sequencing is widely used as a tool for identifying and quantifying microorganisms pooled together in either natural or designed samples. However, a prominent obstacle is achieving correct quantification when the pooled microbes are genetically related. In such cases, the outcome mostly depends on the method used for assigning reads to the individual targets. To address this challenge, we have developed Exodus, a reference-based Python algorithm for quantification of genomes, including those that are highly similar, when they are sequenced together in a single mix. To test Exodus’ performance, we generated both empirical and in silico next-generation sequencing data of mixed genomes. When applying Exodus to these data, we observed median error rates varying between 0% and 0.21% as a function of the complexity of the mix. Importantly, no false negatives were recorded, demonstrating that Exodus’ likelihood of missing an existing genome is very low, even if the genome’s relative abundance is low and similar genomes are present in the same mix. Taken together, these data position Exodus as a reliable tool for identifying and quantifying genomes in mixed samples. Exodus is open source and free to use at: https://github.com/ilyavs/exodus.
Availability and implementation: Exodus is implemented in Python within a Snakemake framework. It is available on GitHub alongside a docker containing the required dependencies: https://github.com/ilyavs/exodus. The data underlying this article will be shared on reasonable request to the corresponding author.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tal Cohen
  - BiomX Ltd., Ness Ziona 7414002, Israel
4
Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
5
Zhao H, Wang S, Yuan X. Detection of Pathogenic Microbe Composition Using Next-Generation Sequencing Data. Front Genet 2020; 11:603093. [PMID: 33329748 PMCID: PMC7734255 DOI: 10.3389/fgene.2020.603093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2020] [Accepted: 10/21/2020] [Indexed: 11/23/2022] Open
Abstract
Next-generation sequencing (NGS) technologies have provided great opportunities to analyze pathogenic microbes with high-resolution data. The main goal is to accurately detect microbial composition and abundances in a sample. However, high similarity among sequences from different species and the existence of sequencing errors pose various challenges. Numerous methods have been developed for quantifying microbial composition and abundance, but they are not versatile enough for the analysis of samples with mixtures of noise. In this paper, we propose a new computational method, PGMicroD, for the detection of pathogenic microbial composition in a sample using NGS data. The method first filters out reads that are potentially mapped by mistake and extracts multiple species-related features from the sequencing reads of 16S rRNA. Then it trains a Support Vector Machine classifier to predict the microbial composition. Finally, it groups all multiple-mapped sequencing reads into the references of the predicted species to estimate the abundance of each species. The performance of PGMicroD is evaluated on both simulated and real sequencing data and is compared with several existing methods. The results demonstrate that our proposed method achieves superior performance. The software package of PGMicroD is available at https://github.com/BDanalysis/PGMicroD.
Affiliation(s)
- Haiyong Zhao
  - School of Computer Science and Technology, Liaocheng University, Liaocheng, China
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Shuang Wang
  - School of Computer Science and Technology, Xidian University, Xi'an, China
- Xiguo Yuan
  - School of Computer Science and Technology, Xidian University, Xi'an, China
6
Flavodoxins as Novel Therapeutic Targets against Helicobacter pylori and Other Gastric Pathogens. Int J Mol Sci 2020; 21:ijms21051881. [PMID: 32164177 PMCID: PMC7084853 DOI: 10.3390/ijms21051881] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/14/2020] [Revised: 03/04/2020] [Accepted: 03/06/2020] [Indexed: 02/06/2023] Open
Abstract
Flavodoxins are small soluble electron transfer proteins widely present in bacteria and absent in vertebrates. Flavodoxins participate in different metabolic pathways and, in some bacteria, they have been shown to be essential proteins representing promising therapeutic targets to fight bacterial infections. Using purified flavodoxin and chemical libraries, leads can be identified that block flavodoxin function and act as bactericidal molecules, as it has been demonstrated for Helicobacter pylori (Hp), the most prevalent human gastric pathogen. Increasing antimicrobial resistance by this bacterium has led current therapies to lose effectiveness, so alternative treatments are urgently required. Here, we summarize, with a focus on flavodoxin, opportunities for pharmacological intervention offered by the potential protein targets described for this bacterium and provide information on other gastrointestinal pathogens and also on bacteria from the gut microbiota that contain flavodoxin. The process of discovery and development of novel antimicrobials specific for Hp flavodoxin that is being carried out in our group is explained, as it can be extrapolated to the discovery of inhibitors specific for other gastric pathogens. The high specificity for Hp of the antimicrobials developed may be of help to reduce damage to the gut microbiota and to slow down the development of resistant Hp mutants.
7
Nalbantoglu OU, Sayood K. MIMOSA: Algorithms for Microbial Profiling. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:2023-2034. [PMID: 29994027 DOI: 10.1109/tcbb.2018.2830324] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 06/08/2023]
Abstract
A significant goal of the study of metagenomes obtained from an environment is to find the microbial diversity and the abundance of each organism in the community. Phylotyping and binning methods which address this problem generally operate using either marker sequences or by classifying each genome fragment individually. However, these approaches might not use all the information contained in the metagenome. We propose an approach based on a Multiple Input Multiple Output (MIMO) communication system model. Results from two different implementations of this approach, one using DNA-DNA hybridization simulations and one using short read mapping are evaluated using simulated and actual metagenomes and compared with other methods of phylotyping. The proposed approaches generally performed better under different scenarios including pathogen detection tasks of community complexity and low and high sequencing coverage while being highly computationally effective. The resulting framework can be integrated to metagenome analysis pipelines for phylogenetic diversity estimation. The approach is modular so that techniques other than hybridization simulations and short read mapping may be integrated. We have observed that even for low coverage samples, the method provides accurate estimates. Therefore, the use of the proposed strategy could enable the task of exploring biodiversity with limited resources.
8
Thissen JB, Isshiki M, Jaing C, Nagao Y, Lebron Aldea D, Allen JE, Izui M, Slezak TR, Ishida T, Sano T. A novel variant of torque teno virus 7 identified in patients with Kawasaki disease. PLoS One 2018; 13:e0209683. [PMID: 30592753 PMCID: PMC6310298 DOI: 10.1371/journal.pone.0209683] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 07/27/2018] [Accepted: 12/09/2018] [Indexed: 11/18/2022] Open
Abstract
Kawasaki disease (KD), first identified in 1967, is a pediatric vasculitis of unknown etiology that has an increasing incidence in Japan and many other countries. KD can cause coronary artery aneurysms. Its epidemiological characteristics, such as seasonality and clinical picture of acute systemic inflammation with prodromal intestinal/respiratory symptoms, suggest an infectious etiology for KD. Interestingly, multiple host genotypes have been identified as predisposing factors for KD. To explore experimental methodology for identifying etiological agent(s) for KD and to optimize epidemiological study design (particularly the sample size) for future studies, we conducted a pilot study. For a 1-year period, we prospectively enrolled 11 patients with KD. To each KD patient, we assigned two control individuals (one with diarrhea and the other with respiratory infections), matched for age, sex, and season of diagnosis. During the acute phase of disease, we collected peripheral blood, nasopharyngeal aspirate, and feces. We also determined genotypes, to identify those that confer susceptibility to KD. There was no statistically significant difference in the frequency of the risk genotypes between KD patients and control subjects. We also used unbiased metagenomic sequencing to analyze these samples. Metagenomic sequencing and PCR detected torque teno virus 7 (TTV7) in two patients with KD (18%), but not in control subjects (P = 0.111). Sanger sequencing revealed that the TTV7 found in the two KD patients contained almost identical variants in nucleotide and identical changes in resulting amino acid, relative to the reference sequence. Additionally, we estimated the sample size that would be required to demonstrate a statistical correlation between TTV7 and KD. Future larger scale studies with carefully optimized metagenomic sequencing experiments and adequate sample size are warranted to further examine the association between KD and potential pathogens, including TTV7.
Affiliation(s)
- James B. Thissen
  - Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Mariko Isshiki
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
- Crystal Jaing
  - Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Yoshiro Nagao
  - Department of Pediatrics, Japan Community Health Care Organization Osaka Hospital, Osaka, Japan
- Dayanara Lebron Aldea
  - Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Jonathan E. Allen
  - Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Masafumi Izui
  - Department of Pediatrics, Japan Community Health Care Organization Osaka Hospital, Osaka, Japan
- Thomas R. Slezak
  - Lawrence Livermore National Laboratory, Livermore, California, United States of America
- Takafumi Ishida
  - Department of Biological Sciences, Graduate School of Science, University of Tokyo, Tokyo, Japan
- Tetsuya Sano
  - Department of Pediatrics, Japan Community Health Care Organization Osaka Hospital, Osaka, Japan
9
Yao Y, Jin Z, Lee JH. An improved statistical model for taxonomic assignment of metagenomics. BMC Genet 2018; 19:98. [PMID: 30373533 PMCID: PMC6206629 DOI: 10.1186/s12863-018-0680-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2018] [Accepted: 10/02/2018] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: With the advances in next-generation sequencing technologies, researchers can now rapidly examine the composition of samples from humans and their surroundings. To enhance the accuracy of taxonomy assignments in metagenomic samples, we developed a method that allows multiple mismatch probabilities from different genomes.
RESULTS: We extended the algorithm of taxonomic assignment of metagenomic sequence reads (TAMER) by developing an improved method that can set a different mismatch probability for each genome rather than imposing a single parameter for all genomes, thereby obtaining a greater degree of accuracy. This method, which we call TADIP (Taxonomic Assignment of metagenomics based on DIfferent Probabilities), was comprehensively tested in simulated and real datasets. The results support that TADIP improved the performance of TAMER, especially in large sample size datasets with high complexity.
CONCLUSIONS: TADIP was developed as a statistical model to improve the estimate accuracy of taxonomy assignments. Based on its varying mismatch probability setting and correlated variance matrix setting, its performance was enhanced for high complexity samples when compared with TAMER.
Affiliation(s)
- Yujing Yao
  - Department of Biostatistics, Columbia University, New York, NY, USA
- Zhezhen Jin
  - Department of Biostatistics, Columbia University, New York, NY, USA
- Joseph H Lee
  - Sergievsky Center, Taub Institute, and Departments of Epidemiology and Neurology, Columbia University, New York, NY, USA
  - Sergievsky Center, Columbia University, 630 West 168th Street, P&S Unit 16, New York, NY, 10032, USA
10
Guajardo-Leiva S, Pedrós-Alió C, Salgado O, Pinto F, Díez B. Active Crossfire Between Cyanobacteria and Cyanophages in Phototrophic Mat Communities Within Hot Springs. Front Microbiol 2018; 9:2039. [PMID: 30233525 PMCID: PMC6129581 DOI: 10.3389/fmicb.2018.02039] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 04/10/2018] [Accepted: 08/13/2018] [Indexed: 01/16/2023] Open
Abstract
Cyanophages are viruses with a wide distribution in aquatic ecosystems that specifically infect Cyanobacteria. These viruses can be readily isolated from marine and freshwater environments; however, their presence in cosmopolitan thermophilic phototrophic mats remains largely unknown. This study investigates the morphological diversity (TEM), taxonomic composition (metagenomics), and active infectivity (metatranscriptomics) of viral communities over a thermal gradient in hot spring phototrophic mats from Northern Patagonia (Chile). The mats were dominated (up to 53%) by cosmopolitan thermophilic filamentous true-branching cyanobacteria from the genus Mastigocladus. The associated viral community was predominantly composed of Caudovirales (70%), with most of the active infections driven by cyanophages (up to 90% of Caudovirales transcripts). Metagenomic assembly led to the first full genome description of a T7-like Thermophilic Cyanophage recovered from a hot spring (Porcelana Hot Spring, Chile), with a temperature of 58°C (TC-CHP58). This could potentially represent a world-wide thermophilic lineage of podoviruses that infect cyanobacteria. In the hot spring, TC-CHP58 was active over a temperature gradient from 48 to 66°C, showing a high population variability represented by 1979 single nucleotide variants (SNVs). TC-CHP58 was associated with Mastigocladus spp. by CRISPR spacers. Marked differences in metagenomic CRISPR loci number and spacer diversity, as well as SNVs, in the TC-CHP58 proto-spacers at different temperatures, reinforce the theory of co-evolution between natural virus populations and cyanobacterial hosts. Considering the importance of cyanobacteria in hot spring biogeochemical cycles, the description of this new cyanopodovirus lineage may have global implications for the functioning of these extreme ecosystems.
Affiliation(s)
- Sergio Guajardo-Leiva
  - Department of Molecular Genetics and Microbiology, Pontificia Universidad Católica de Chile, Santiago, Chile
- Carlos Pedrós-Alió
  - Programa de Biología de Sistemas, Centro Nacional de Biotecnología - Consejo Superior de Investigaciones Científicas, Madrid, Spain
- Oscar Salgado
  - Department of Molecular Genetics and Microbiology, Pontificia Universidad Católica de Chile, Santiago, Chile
- Fabián Pinto
  - Department of Molecular Genetics and Microbiology, Pontificia Universidad Católica de Chile, Santiago, Chile
- Beatriz Díez
  - Department of Molecular Genetics and Microbiology, Pontificia Universidad Católica de Chile, Santiago, Chile
  - Center for Climate and Resilience Research, Santiago, Chile
11
Reppell M, Novembre J. Using pseudoalignment and base quality to accurately quantify microbial community composition. PLoS Comput Biol 2018; 14:e1006096. [PMID: 29659582 PMCID: PMC5945057 DOI: 10.1371/journal.pcbi.1006096] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 08/14/2017] [Revised: 05/10/2018] [Accepted: 03/19/2018] [Indexed: 12/31/2022] Open
Abstract
Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low-memory requirements of k-mer based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with different length reference sequences, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.
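To show how base quality can help resolve a read that matches several references, here is a small Python sketch. It uses the standard Phred error model (error probability 10^(-Q/10), with a miscalled base assumed equally likely to be any of the three other bases) and Bayes' rule over candidate references. It is not Karp's implementation: the function names, the toy sequences, the equal priors, and the omission of pseudoalignment and iterative abundance estimation are all simplifying assumptions.

```python
def phred_to_error(q):
    """Convert a Phred quality score into a base-call error probability."""
    return 10 ** (-q / 10)

def read_likelihood(read, quals, ref):
    """Likelihood of a read given a reference of the same length, under the Phred model."""
    lik = 1.0
    for base, q, ref_base in zip(read, quals, ref):
        e = phred_to_error(q)
        # A match is a correct call; a mismatch is an error into one of three other bases.
        lik *= (1 - e) if base == ref_base else e / 3
    return lik

def posterior(read, quals, refs, priors):
    """Posterior probability that the read came from each candidate reference."""
    weighted = {name: priors[name] * read_likelihood(read, quals, seq)
                for name, seq in refs.items()}
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}

# Hypothetical references differing at the third base, with equal priors.
refs = {"taxonA": "ACGT", "taxonB": "ACTT"}
priors = {"taxonA": 0.5, "taxonB": 0.5}

confident = posterior("ACGT", [30, 30, 30, 30], refs, priors)
uncertain = posterior("ACGT", [30, 30, 3, 30], refs, priors)
print(round(confident["taxonA"], 3))
print(round(uncertain["taxonA"], 3))
```

With a high quality score at the distinguishing position the read is assigned almost entirely to `taxonA`; with a low score there, the assignment is much less decisive.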
Affiliation(s)
- Mark Reppell
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
- John Novembre
  - Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
12
Krehenwinkel H, Wolf M, Lim JY, Rominger AJ, Simison WB, Gillespie RG. Estimating and mitigating amplification bias in qualitative and quantitative arthropod metabarcoding. Sci Rep 2017; 7:17668. [PMID: 29247210 PMCID: PMC5732254 DOI: 10.1038/s41598-017-17333-x] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 08/31/2017] [Accepted: 11/16/2017] [Indexed: 11/09/2022] Open
Abstract
Amplicon based metabarcoding promises rapid and cost-efficient analyses of species composition. However, it is disputed whether abundance estimates can be derived from metabarcoding due to taxon specific PCR amplification biases. PCR-free approaches have been suggested to mitigate this problem, but come with considerable increases in workload and cost. Here, we analyze multilocus datasets of diverse arthropod communities, to evaluate whether amplification bias can be countered by (1) targeting loci with highly degenerate primers or conserved priming sites, (2) increasing PCR template concentration, (3) reducing PCR cycle number or (4) avoiding locus specific amplification by directly sequencing genomic DNA. Amplification bias is reduced considerably by degenerate primers or targeting amplicons with conserved priming sites. Surprisingly, a reduction of PCR cycles did not have a strong effect on amplification bias. The association of taxon abundance and read count was actually less predictable with fewer cycles. Even a complete exclusion of locus specific amplification did not exclude bias. Copy number variation of the target loci may be another explanation for read abundance differences between taxa, which would affect amplicon based and PCR free methods alike. As read abundance biases are taxon specific and predictable, the application of correction factors allows abundance estimates.
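The closing point about correction factors can be made concrete with a short Python sketch. The abstract does not give the form of the correction, so this assumes the simplest one: each taxon has a known multiplicative amplification bias, read counts are divided by it, and the result is renormalized. The taxon names, the bias values, and the function name `corrected_abundance` are hypothetical.

```python
def corrected_abundance(read_counts, bias):
    """Divide each taxon's read count by its amplification bias, then renormalize to proportions."""
    adjusted = {taxon: read_counts[taxon] / bias[taxon] for taxon in read_counts}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}

# Hypothetical read counts and per-taxon bias factors (values > 1 mean over-amplified).
reads = {"spider": 600, "beetle": 300, "fly": 100}
bias = {"spider": 2.0, "beetle": 1.0, "fly": 0.5}

print(corrected_abundance(reads, bias))
# {'spider': 0.375, 'beetle': 0.375, 'fly': 0.25}
```

The raw read proportions (0.6, 0.3, 0.1) would overstate the spider and understate the fly; such a correction only works if the bias is predictable per taxon, as the abstract reports.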
Affiliation(s)
- Henrik Krehenwinkel
  - Department of Environmental Sciences, Policy and Management, University of California Berkeley, Mulford Hall, Berkeley, California, USA
  - Center for Comparative Genomics, California Academy of Sciences, Music Concourse Drive, San Francisco, California, USA
- Madeline Wolf
  - Department of Environmental Sciences, Policy and Management, University of California Berkeley, Mulford Hall, Berkeley, California, USA
- Jun Ying Lim
  - Department of Environmental Sciences, Policy and Management, University of California Berkeley, Mulford Hall, Berkeley, California, USA
- Andrew J Rominger
  - Department of Environmental Sciences, Policy and Management, University of California Berkeley, Mulford Hall, Berkeley, California, USA
- Warren B Simison
  - Center for Comparative Genomics, California Academy of Sciences, Music Concourse Drive, San Francisco, California, USA
- Rosemary G Gillespie
  - Department of Environmental Sciences, Policy and Management, University of California Berkeley, Mulford Hall, Berkeley, California, USA
13
Brittnacher MJ, Heltshe SL, Hayden HS, Radey MC, Weiss EJ, Damman CJ, Zisman TL, Suskind DL, Miller SI. GUTSS: An Alignment-Free Sequence Comparison Method for Use in Human Intestinal Microbiome and Fecal Microbiota Transplantation Analysis. PLoS One 2016; 11:e0158897. [PMID: 27391011 PMCID: PMC4938407 DOI: 10.1371/journal.pone.0158897] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 03/10/2016] [Accepted: 06/23/2016] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Comparative analysis of gut microbiomes in clinical studies of human diseases typically relies on identification and quantification of species or genes. In addition to exploring specific functional characteristics of the microbiome and the potential significance of species diversity or expansion, microbiome similarity is also calculated to study change in response to therapies directed at altering the microbiome. Established ecological measures of similarity can be constructed from species abundances; however, methods for calculating these commonly used measures directly from whole genome shotgun (WGS) metagenomic sequence are lacking.
RESULTS: We present an alignment-free method for calculating similarity of WGS metagenomic sequences that is analogous to the Bray-Curtis index for species, implemented by the General Utility for Testing Sequence Similarity (GUTSS) software application. This method was applied to intestinal microbiomes of healthy young children to measure developmental changes toward an adult microbiome during the first 3 years of life. We also calculate similarity of donor and recipient microbiomes to measure establishment, or engraftment, of donor microbiota in fecal microbiota transplantation (FMT) studies focused on mild to moderate Crohn's disease. We show how a relative index of similarity to donor can be calculated as a measure of change in a patient's microbiome toward that of the donor in response to FMT.
CONCLUSION: Because clinical efficacy of the transplant procedure cannot be fully evaluated without analysis methods to quantify actual FMT engraftment, we developed a method for detecting change in the gut microbiome that is independent of species identification and database bias, sensitive to changes in relative abundance of the microbial constituents, and can be formulated as an index for correlating engraftment success with clinical measures of disease. More generally, this method may be applied to clinical evaluation of human microbiomes and provide potential diagnostic determination of individuals who may be candidates for specific therapies directed at alteration of the microbiome.
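As a rough illustration of an alignment-free, Bray-Curtis-style comparison, the Python sketch below counts k-mers in two sets of sequences and applies the standard Bray-Curtis similarity, 2 * sum(min(a, b)) / (sum(a) + sum(b)), to the k-mer counts. The abstract does not spell out GUTSS's exact formulation, so the use of raw k-mer counts, the choice of k, the toy sequences, and the function names are assumptions for illustration only.

```python
from collections import Counter

def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a collection of sequences."""
    counts = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def bray_curtis_similarity(a, b):
    """Bray-Curtis similarity between two count profiles (1.0 means identical, 0.0 means disjoint)."""
    shared = sum(min(a[key], b[key]) for key in a.keys() & b.keys())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 0.0

# Hypothetical toy "metagenomes": one short read each.
donor = kmer_counts(["ACGTAC"], k=3)      # ACG, CGT, GTA, TAC
recipient = kmer_counts(["ACGTTT"], k=3)  # ACG, CGT, GTT, TTT

print(bray_curtis_similarity(donor, recipient))  # 0.5
print(bray_curtis_similarity(donor, donor))      # 1.0
```

Tracking this similarity between a recipient and the donor over time is one way an engraftment index could be built, though the relative index described in the abstract may be defined differently.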
Affiliation(s)
- Mitchell J. Brittnacher
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Sonya L. Heltshe
  - Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
  - Seattle Children's Research Institute, Seattle, Washington, United States of America
- Hillary S. Hayden
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Matthew C. Radey
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Eli J. Weiss
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
- Christopher J. Damman
  - Division of Gastroenterology, University of Washington, Seattle, Washington, United States of America
- Timothy L. Zisman
  - Division of Gastroenterology, University of Washington, Seattle, Washington, United States of America
- David L. Suskind
  - Department of Pediatrics, University of Washington, Seattle, Washington, United States of America
  - Seattle Children’s Hospital, Seattle, Washington, United States of America
- Samuel I. Miller
  - Department of Microbiology, University of Washington, Seattle, Washington, United States of America
  - Department of Medicine, University of Washington, Seattle, Washington, United States of America
  - Department of Immunology, University of Washington, Seattle, Washington, United States of America
  - Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
14
Sohn MB, Du R, An L. A robust approach for identifying differentially abundant features in metagenomic samples. Bioinformatics 2015; 31:2269-75. [PMID: 25792553 DOI: 10.1093/bioinformatics/btv165] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 11/20/2014] [Accepted: 03/16/2015] [Indexed: 12/11/2022]
Abstract
MOTIVATION The analysis of differential abundance for features (e.g. species or genes) can give us a better understanding of microbial communities and their behaviors. However, it could also mislead us about the characteristics of microbial communities if the abundances or counts of features on different scales are not properly normalized within and between communities prior to the analysis of differential abundance. Normalization methods used in differential analysis typically try to adjust counts on different scales to a common scale using the total sum, mean or median of representative features across all samples. These methods often yield undesirable results when the difference in total counts of differentially abundant features (DAFs) across different conditions is large. RESULTS We develop a novel method, Ratio Approach for Identifying Differential Abundance (RAIDA), which utilizes the ratio between features in a modified zero-inflated lognormal model. RAIDA removes possible problems associated with counts on different scales within and between conditions. As a result, its performance is not affected by the amount of difference in total abundances of DAFs across different conditions. In comprehensive simulation studies, our method is consistently powerful, and in some situations RAIDA greatly surpasses other existing methods. We also apply RAIDA to real datasets of type II diabetes and find results consistent with previous reports. AVAILABILITY AND IMPLEMENTATION An R package for RAIDA can be accessed from http://cals.arizona.edu/%7Eanling/sbg/software.htm.
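The motivation above can be illustrated with a toy calculation. It shows why total-sum scaling can distort unchanged features when one feature changes strongly, while ratios between unchanged features stay the same. This is a sketch of the motivation only, not of RAIDA's zero-inflated lognormal model. The function names and counts are hypothetical.

```python
def total_sum_scale(counts):
    """Divide each count by the sample total (total-sum scaling)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return [c / total for c in counts]


def ratios_to_reference(counts, ref_index):
    """Express each count as a ratio to one reference feature."""
    ref = counts[ref_index]
    if ref == 0:
        raise ValueError("reference feature has zero count")
    return [c / ref for c in counts]


# Hypothetical counts: only the last feature differs between conditions.
control = [100, 200, 300, 400]
treated = [100, 200, 300, 4000]

# Total-sum scaling shrinks the three unchanged features in the treated
# sample, so they look less abundant although their counts are identical.
print(total_sum_scale(control))
print(total_sum_scale(treated))

# Ratios to an unchanged reference (feature 0) agree for the unchanged
# features and differ only for the feature that really changed.
print(ratios_to_reference(control, 0))
print(ratios_to_reference(treated, 0))
```

In practice the unchanged features are not known in advance, which is part of what a method such as RAIDA has to address. The sketch fixes the reference by hand purely for illustration.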
Affiliation(s)
- Ruofei Du
  - Department of Agricultural and Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA
- Lingling An
  - Interdisciplinary Program in Statistics and Department of Agricultural and Biosystems Engineering, University of Arizona, Tucson, AZ 85721, USA
|
15
|
An L, Pookhao N, Jiang H, Xu J. Statistical approach of functional profiling for a microbial community. PLoS One 2014; 9:e106588. [PMID: 25198674 PMCID: PMC4157783 DOI: 10.1371/journal.pone.0106588] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/08/2014] [Accepted: 07/31/2014] [Indexed: 12/21/2022]
Abstract
BACKGROUND Metagenomics is a relatively new but fast-growing field within environmental biology and medical sciences. It enables researchers to understand the diversity of microbes, their functions, cooperation, and evolution in a particular ecosystem. Traditional methods in genomics and microbiology are not efficient in capturing the structure of the microbial community in an environment. High-throughput next-generation sequencing technologies are now powerfully driving metagenomic studies. However, there is an urgent need to develop efficient statistical methods and computational algorithms to rapidly analyze the massive metagenomic short sequencing data and to accurately detect the features/functions present in the microbial community. Although several issues about functions of metagenomes at the pathway or subsystem level have been investigated, there is a lack of studies focusing on functional analysis at a low level of a hierarchical functional tree, such as the SEED subsystem tree. RESULTS A two-step statistical procedure (metaFunction) is proposed to detect all possible functional roles at the low level from a metagenomic sample/community. In the first step, a statistical mixture model is proposed at the base of gene codons to estimate the abundances for the candidate functional roles, with sequencing error being considered. Because a gene can be involved in multiple biological processes, the functional assignment is adjusted in the second step using an error distribution. The performance of the proposed procedure is evaluated through comprehensive simulation studies. Compared with other existing methods in metagenomic functional analysis, the new approach is more accurate in assigning reads to functional roles, and therefore also at more general levels. The method is also employed to analyze two real data sets. CONCLUSIONS metaFunction is a powerful tool for accurately profiling functions in a metagenomic sample.
Affiliation(s)
- Lingling An
  - Department of Agricultural & Biosystems Engineering, University of Arizona, Tucson, Arizona, United States of America
  - Interdisciplinary Programs in Statistics, University of Arizona, Tucson, Arizona, United States of America
- Nauromal Pookhao
  - Department of Agricultural & Biosystems Engineering, University of Arizona, Tucson, Arizona, United States of America
- Hongmei Jiang
  - Department of Statistics, Northwestern University, Evanston, Illinois, United States of America
- Jiannong Xu
  - Department of Biology, New Mexico State University, Las Cruces, New Mexico, United States of America
|