1
|
Zhou B, Wang C, Putzel G, Hu J, Liu M, Wu F, Chen Y, Pironti A, Li H. An integrated strain-level analytic pipeline utilizing longitudinal metagenomic data. Microbiol Spectr 2024; 12:e0143124. [PMID: 39311770 PMCID: PMC11542597 DOI: 10.1128/spectrum.01431-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Accepted: 08/28/2024] [Indexed: 11/08/2024] Open
Abstract
With the development of sequencing technology and analytic tools, studying within-species variations enhances the understanding of microbial biological processes. Nevertheless, most existing methods designed for strain-level analysis lack the capability to concurrently assess both strain proportions and genome-wide single nucleotide variants (SNVs) across longitudinal metagenomic samples. In this study, we introduce LongStrain, an integrated pipeline for the analysis of large-scale metagenomic data from individuals with longitudinal or repeated samples. In LongStrain, we first utilize two efficient tools, Kraken2 and Bowtie2, for the taxonomic classification and alignment of sequencing reads, respectively. Subsequently, we propose to jointly model strain proportions and shared haplotypes across samples within individuals. This approach specifically targets tracking a primary strain and a secondary strain for each subject, providing their respective proportions and SNVs as output. With extensive simulation studies of a microbial community and single species, our results demonstrate that LongStrain is superior to two genotyping methods and two deconvolution methods across a majority of scenarios. Furthermore, we illustrate the potential applications of LongStrain in the real data analysis of The Environmental Determinants of Diabetes in the Young study and a gastric intestinal metaplasia microbiome study. In summary, the proposed analytic pipeline demonstrates marked statistical efficiency over the same type of methods and has great potential in understanding the genomic variants and dynamic changes at strain level. LongStrain and its tutorial are freely available online at https://github.com/BoyanZhou/LongStrain.
IMPORTANCE: The advancement in DNA-sequencing technology has enabled the high-resolution identification of microorganisms in microbial communities. Since different microbial strains within a species may show extreme phenotypic variability (e.g., nutrition metabolism, antibiotic resistance, and pathogen virulence), investigating within-species variations holds great scientific promise in understanding the underlying mechanism of microbial biological processes. To fully utilize the shared genomic variants across longitudinal metagenomics samples collected in microbiome studies, we develop an integrated analytic pipeline (LongStrain) for longitudinal metagenomics data. It concurrently leverages the information on proportions of mapped reads for individual strains and genome-wide SNVs to enhance the efficiency and accuracy of strain identification. Our method helps to understand strains' dynamic changes and their association with genome-wide variants. Given the fast-growing longitudinal studies of microbial communities, LongStrain, which streamlines analyses of large-scale raw sequencing data, should be of great value in microbiome research communities.
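The abstract above describes tracking a primary and a secondary strain and reporting their proportions per sample. As a deliberately simplified illustration of that idea (not LongStrain's joint model, which the abstract does not spell out), the sketch below pools read counts at SNV sites and takes the fraction of reads carrying the secondary-strain allele as a per-sample proportion estimate. The function name, the input layout, and the assumption that the alternative allele is carried only by the secondary strain are all hypothetical.

```python
def secondary_strain_proportion(site_counts):
    """Pooled estimate of the secondary strain's proportion in one sample.

    site_counts: list of (ref_reads, alt_reads) tuples, one per SNV site.
    Assumption (hypothetical): at every listed site the primary strain carries
    the reference allele and the secondary strain carries the alternative allele.
    """
    alt = sum(a for _, a in site_counts)
    total = sum(r + a for r, a in site_counts)
    if total == 0:
        raise ValueError("no reads cover the listed SNV sites")
    return alt / total


# Made-up counts for two time points of one subject.
samples = {
    "t1": [(18, 2), (27, 3), (45, 5)],
    "t2": [(12, 8), (30, 20), (6, 4)],
}
proportions = {name: secondary_strain_proportion(c) for name, c in samples.items()}
for name, p in proportions.items():
    print(f"{name}: secondary strain proportion ~ {p:.2f}")
```

Pooling across sites treats every site as sharing one proportion per sample, which is the intuition behind modelling proportions and haplotypes together; a real method would also have to infer which allele belongs to which strain.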
Affiliation(s)
- Boyan Zhou: Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Chan Wang: Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Gregory Putzel: Department of Microbiology, New York University School of Medicine, New York, New York, USA
- Jiyuan Hu: Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Menghan Liu: Department of Biological Sciences, Columbia University in the City of New York, New York, New York, USA
- Fen Wu: Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Yu Chen: Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Alejandro Pironti: Department of Microbiology, New York University School of Medicine, New York, New York, USA
- Huilin Li: Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
|
2
|
Sasikumar R, Saranya S, Lourdu Lincy L, Thamanna L, Chellapandi P. Genomic insights into fish pathogenic bacteria: A systems biology perspective for sustainable aquaculture. Fish & Shellfish Immunology 2024; 154:109978. [PMID: 39442738 DOI: 10.1016/j.fsi.2024.109978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2024] [Revised: 10/12/2024] [Accepted: 10/20/2024] [Indexed: 10/25/2024]
Abstract
Fish diseases significantly challenge global aquaculture, causing substantial financial losses and impacting sustainability, trade, and socioeconomic conditions. Understanding microbial pathogenesis and virulence at the molecular level is crucial for disease prevention in commercial fish. This review provides genomic insights into fish pathogenic bacteria from a systems biology perspective, aiming to promote sustainable aquaculture. It covers the genomic characteristics of various fish pathogens and their industry impact. The review also explores the systems biology of zebrafish, fish bacterial pathogens, and probiotic bacteria, offering insights into fish production, potential vaccines, and therapeutic drugs. Genome-scale metabolic models aid in studying pathogenic bacteria, contributing to disease management and antimicrobial development. Researchers have also investigated probiotic strains to improve aquaculture health. Additionally, the review highlights bioinformatics resources for fish and fish pathogens, which are essential for researchers. Systems biology approaches enhance understanding of bacterial fish pathogens by revealing virulence factors and host interactions. Despite challenges from the adaptability and pathogenicity of bacterial infections, sustainable alternatives are necessary to meet seafood demand. This review underscores the potential of systems biology in understanding fish pathogen biology, improving production, and promoting sustainable aquaculture.
Affiliation(s)
- R Sasikumar: Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
- S Saranya: Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
- L Lourdu Lincy: Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
- L Thamanna: Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
- P Chellapandi: Industrial Systems Biology Lab, Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, 620024, Tamil Nadu, India
|
3
|
Rocha U, Kasmanas JC, Toscan R, Sanches DS, Magnusdottir S, Saraiva JP. Simulation of 69 microbial communities indicates sequencing depth and false positives are major drivers of bias in prokaryotic metagenome-assembled genome recovery. PLoS Comput Biol 2024; 20:e1012530. [PMID: 39436938 PMCID: PMC11530072 DOI: 10.1371/journal.pcbi.1012530] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 11/01/2024] [Accepted: 10/01/2024] [Indexed: 10/25/2024] Open
Abstract
We hypothesize that sample species abundance, sequencing depth, and taxonomic relatedness influence the recovery of metagenome-assembled genomes (MAGs). To test this hypothesis, we assessed MAG recovery in three in silico microbial communities composed of 42 species with the same richness but different sample species abundance, sequencing depth, and taxonomic distribution profiles using three different pipelines for MAG recovery. The pipeline developed by Parks and colleagues (8K) generated the highest number of MAGs and the lowest number of true positives per community profile. The pipeline by Karst and colleagues (DT) showed the most accurate results (~92%), outperforming the 8K and the Multi-Metagenome pipeline (MM) developed by Albertsen and collaborators. Sequencing depth influenced the accurate recovery of genomes when using the 8K and MM pipelines, although with contrasting patterns: the MM pipeline recovered more MAGs found in the original communities when employing sequencing depths up to 60 million reads, while the 8K recovered more true positives in communities sequenced above 60 million reads. DT showed the best recovery of species from the same genus, even though closely related species have a low recovery rate in all pipelines. Our results highlight that more bins do not translate to the actual community composition and that sequencing depth plays a role in MAG recovery and increased community resolution. Even low MAG recovery error rates can significantly impact biological inferences. Our data indicate that the scientific community should curate their findings from MAG recovery, especially when asserting novel species or metabolic traits.
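The observation that more bins do not imply a more faithful picture of the community can be made concrete with a small worked example. The sketch below compares two invented sets of recovered bins against a known in silico community and reports true positives, false positives, precision, and recall. The function name and all data are hypothetical, and the abstract does not state how its accuracy figure was computed, so the precision and recall here are standard definitions rather than the authors' metric.

```python
def recovery_stats(recovered_bins, true_genomes):
    """Compare the species assigned to recovered MAGs with the known community."""
    recovered = set(recovered_bins)
    truth = set(true_genomes)
    true_pos = recovered & truth    # recovered and really in the community
    false_pos = recovered - truth   # recovered but absent from the community
    return {
        "bins": len(recovered),
        "true_positives": len(true_pos),
        "false_positives": len(false_pos),
        "precision": len(true_pos) / len(recovered) if recovered else 0.0,
        "recall": len(true_pos) / len(truth) if truth else 0.0,
    }


# Invented example: a five-species community and two pipelines' outputs.
community = ["sp1", "sp2", "sp3", "sp4", "sp5"]
many_bins = recovery_stats(["sp1", "sp2", "sp3", "x1", "x2", "x3"], community)
few_bins = recovery_stats(["sp1", "sp2", "sp3", "sp4"], community)
print(many_bins)
print(few_bins)
```

In this toy case the pipeline that reports more bins has both lower precision and lower recall, which is the pattern the abstract warns about.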
Affiliation(s)
- Ulisses Rocha: Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Jonas Coelho Kasmanas: Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Rodolfo Toscan: Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Danilo S. Sanches: Department of Computer Science, Federal University of Technology, Paraná, UTFPR, Cornélio Procópio, Brazil
- Stefania Magnusdottir: Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
- Joao Pedro Saraiva: Department of Applied Microbial Ecology, Helmholtz Center for Environmental Research-UFZ, Leipzig, Germany
|
4
|
Knobloch S, Salimi F, Buaya A, Ploch S, Thines M. RAPiD: a rapid and accurate plant pathogen identification pipeline for on-site nanopore sequencing. PeerJ 2024; 12:e17893. [PMID: 39346055 PMCID: PMC11438431 DOI: 10.7717/peerj.17893] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 07/19/2024] [Indexed: 10/01/2024] Open
Abstract
Nanopore sequencing technology has enabled the rapid, on-site taxonomic identification of samples from anything and anywhere. However, sequencing errors, inadequate databases, as well as the need for bioinformatic expertise and powerful computing resources, have hampered the widespread use of the technology for pathogen identification in the agricultural sector. Here we present RAPiD, a lightweight and accurate real-time taxonomic profiling pipeline. Compared to other metagenomic profilers, RAPiD had a higher classification precision achieved through the use of a curated, non-redundant database of common agricultural pathogens and extensive quality filtering of alignments. On a fungal, bacterial and mixed mock community RAPiD was the only pipeline to detect all members of the communities. We also present a protocol for in-field sample processing enabling pathogen identification from plant sample to sequence within 3 h using low-cost equipment. With sequencing costs continuing to decrease and more high-quality reference genomes becoming available, nanopore sequencing provides a viable method for rapid and accurate pathogen identification in the field. A web implementation of the RAPiD pipeline for real-time analysis is available at https://agrifuture.senckenberg.de.
Affiliation(s)
- Stephen Knobloch: Senckenberg Biodiversity and Climate Research Centre, Senckenberg Society for Nature Research, Frankfurt, Germany; Department of Food Technology, Fulda University of Applied Sciences, Fulda, Germany
- Fatemeh Salimi: Senckenberg Biodiversity and Climate Research Centre, Senckenberg Society for Nature Research, Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany
- Anthony Buaya: Senckenberg Biodiversity and Climate Research Centre, Senckenberg Society for Nature Research, Frankfurt, Germany
- Sebastian Ploch: Senckenberg Biodiversity and Climate Research Centre, Senckenberg Society for Nature Research, Frankfurt, Germany
- Marco Thines: Senckenberg Biodiversity and Climate Research Centre, Senckenberg Society for Nature Research, Frankfurt, Germany; LOEWE Centre for Translational Biodiversity Genomics, Frankfurt, Germany; Department of Biological Sciences, Institute of Ecology, Evolution and Diversity, Goethe University Frankfurt, Frankfurt, Germany
|
5
|
Mather AE, Gilmour MW, Reid SWJ, French NP. Foodborne bacterial pathogens: genome-based approaches for enduring and emerging threats in a complex and changing world. Nat Rev Microbiol 2024; 22:543-555. [PMID: 38789668 DOI: 10.1038/s41579-024-01051-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/16/2024] [Indexed: 05/26/2024]
Abstract
Foodborne illnesses pose a substantial health and economic burden, presenting challenges in prevention due to the diverse microbial hazards that can enter and spread within food systems. Various factors, including natural, political and commercial drivers, influence food production and distribution. The risks of foodborne illness will continue to evolve in step with these drivers and with changes to food systems. For example, climate impacts on water availability for agriculture, changes in food sustainability targets and evolving customer preferences can all have an impact on the ecology of foodborne pathogens and the agrifood niches that can carry microorganisms. Whole-genome and metagenome sequencing, combined with microbial surveillance schemes and insights from the food system, can provide authorities and businesses with transformative information to address risks and implement new food safety interventions across the food chain. In this Review, we describe how genome-based approaches have advanced our understanding of the evolution and spread of enduring bacterial foodborne hazards as well as their role in identifying emerging foodborne hazards. Furthermore, foodborne hazards exist in complex microbial communities across the entire food chain, and consideration of these co-existing organisms is essential to understanding the entire ecology supporting pathogen persistence and transmission in an evolving food system.
Affiliation(s)
- Alison E Mather: Quadram Institute Bioscience, Norwich, UK; University of East Anglia, Norwich, UK
- Matthew W Gilmour: Quadram Institute Bioscience, Norwich, UK; University of East Anglia, Norwich, UK
- Nigel P French: Tāuwharau Ora, School of Veterinary Science, Te Kunenga Ki Pūrehuroa, Massey University, Papaioea, Palmerston North, Aotearoa New Zealand
|
6
|
Acheampong DA, Jenjaroenpun P, Wongsurawat T, Kurilung A, Pomyen Y, Kandel S, Kunadirek P, Chuaypen N, Kusonmano K, Nookaew I. CAIM: coverage-based analysis for identification of microbiome. Brief Bioinform 2024; 25:bbae424. [PMID: 39222062 PMCID: PMC11367759 DOI: 10.1093/bib/bbae424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2024] [Revised: 06/26/2024] [Accepted: 08/13/2024] [Indexed: 09/04/2024] Open
Abstract
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole shotgun metagenomic approach. In this study, we developed a new bioinformatics tool, coverage-based analysis for identification of microbiome (CAIM), for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. CAIM maintained consistently better performance across datasets than other tools in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms and found high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries and of 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with highly confident species markers.
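Two ideas in this abstract lend themselves to a small worked example: filtering candidate species by genome coverage instead of relative abundance, and estimating abundance from aligned nucleotides instead of read counts. The sketch below is a minimal illustration under stated assumptions, not CAIM's implementation: the `Alignment` record, the `min_breadth` threshold of 0.5, and the absence of any genome-length normalization are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    genome: str   # reference genome the read aligned to
    start: int    # 0-based start position on the reference
    length: int   # number of aligned nucleotides


def breadth_of_coverage(alns, genome_len):
    """Fraction of reference positions covered by at least one alignment."""
    covered = set()
    for a in alns:
        covered.update(range(a.start, min(a.start + a.length, genome_len)))
    return len(covered) / genome_len


def abundances(alignments, genome_lengths, min_breadth=0.5):
    """Return (read-count, nucleotide-count) relative abundances after a coverage filter."""
    by_genome = {}
    for a in alignments:
        by_genome.setdefault(a.genome, []).append(a)
    # Hypothetical filter: keep genomes whose breadth of coverage passes the cutoff.
    kept = {g: alns for g, alns in by_genome.items()
            if breadth_of_coverage(alns, genome_lengths[g]) >= min_breadth}
    if not kept:
        return {}, {}
    read_total = sum(len(alns) for alns in kept.values())
    nt = {g: sum(a.length for a in alns) for g, alns in kept.items()}
    nt_total = sum(nt.values())
    read_ab = {g: len(alns) / read_total for g, alns in kept.items()}
    nt_ab = {g: nt[g] / nt_total for g in kept}
    return read_ab, nt_ab


# Invented data: one long read on A, four short reads on B, one stray read on C.
lengths = {"A": 100, "B": 100, "C": 1000}
alns = [Alignment("A", 10, 80)]
alns += [Alignment("B", s, 20) for s in (0, 20, 40, 60)]
alns += [Alignment("C", 0, 20)]
read_ab, nt_ab = abundances(alns, lengths)
print(read_ab)  # read counts favour B, which has more but shorter reads
print(nt_ab)    # nucleotide counts weight A and B equally
```

The example shows why the two estimators can disagree when read lengths vary, as they do with long-read data: genome B has four times as many reads as genome A but the same number of aligned nucleotides. Genome C is dropped because a single read covers only 2% of its reference.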
Affiliation(s)
- Daniel A Acheampong: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Stowers Institute for Medical Research, 1000 E 50 St, Kansas City, MO 64110, United States
- Piroon Jenjaroenpun: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Road, Siriraj, Bangkok Noi, Bangkok 10700, Thailand
- Thidathip Wongsurawat: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Road, Siriraj, Bangkok Noi, Bangkok 10700, Thailand
- Alongkorn Kurilung: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States
- Yotsawat Pomyen: Translational Research Unit, Chulabhorn Research Institute, 54 Kamphaeng Phet Rd., Laksi, Bangkok 10210, Thailand
- Sangam Kandel: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Influenza Research Institute, Department of Pathobiological Sciences, School of Veterinary Medicine, University of Wisconsin-Madison, 575 Science Drive, Madison, WI 53711, United States
- Pattapon Kunadirek: Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Rama 4 road, Pathumwan, Bangkok 10330, Thailand
- Natthaya Chuaypen: Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Rama 4 road, Pathumwan, Bangkok 10330, Thailand
- Kanthida Kusonmano: Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Road, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand; Systems Biology and Bioinformatics Research Laboratory, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi, 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Road, Tha Kham, Bang Khun Thian, Bangkok 10150, Thailand
- Intawat Nookaew: Department of Biomedical Informatics, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Division of Endocrinology, Department of Medicine, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Department of Physiology and Cell Biology, University of Arkansas for Medical Sciences, 4301 W Markham St, Little Rock, AR 72205, United States; Department of Biochemistry, Faculty of Medicine Siriraj Hospital, Mahidol University, 2 Wang Lang Road, Siriraj, Bangkok Noi, Bangkok 10700, Thailand
|
7
|
Pinto Y, Bhatt AS. Sequencing-based analysis of microbiomes. Nat Rev Genet 2024:10.1038/s41576-024-00746-6. [PMID: 38918544 DOI: 10.1038/s41576-024-00746-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/27/2024]
Abstract
Microbiomes occupy a range of niches and, in addition to having diverse compositions, they have varied functional roles that have an impact on agriculture, environmental sciences, and human health and disease. The study of microbiomes has been facilitated by recent technological and analytical advances, such as cheaper and higher-throughput DNA and RNA sequencing, improved long-read sequencing and innovative computational analysis methods. These advances are providing a deeper understanding of microbiomes at the genomic, transcriptional and translational level, generating insights into their function and composition at resolutions beyond the species level.
Affiliation(s)
- Yishay Pinto: Department of Genetics, Stanford University, Stanford, CA, USA; Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
- Ami S Bhatt: Department of Genetics, Stanford University, Stanford, CA, USA; Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
|
8
|
Enav H, Paz I, Ley RE. Strain tracking in complex microbiomes using synteny analysis reveals per-species modes of evolution. Nat Biotechnol 2024:10.1038/s41587-024-02276-2. [PMID: 38898177 DOI: 10.1038/s41587-024-02276-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 05/10/2024] [Indexed: 06/21/2024]
Abstract
Microbial species diversify into strains through single-nucleotide mutations and structural changes, such as recombination, insertions and deletions. Most strain-comparison methods quantify differences in single-nucleotide polymorphisms (SNPs) and are insensitive to structural changes. However, recombination is an important driver of phenotypic diversification in many species, including human pathogens. We introduce SynTracker, a tool that compares microbial strains using genome synteny (the order of sequence blocks in homologous genomic regions) in pairs of metagenomic assemblies or genomes. Genome synteny is a rich source of genomic information untapped by current strain-comparison tools. SynTracker has low sensitivity to SNPs, has no database requirement and is robust to sequencing errors. It outperforms existing tools when tracking strains in metagenomic data and is particularly suited for phages, plasmids and other low-data contexts. Applied to single-species datasets and human gut metagenomes, SynTracker, combined with an SNP-based tool, detects strains enriched in either point mutations or structural changes, providing insights into microbial evolution in situ.
Affiliation(s)
- Hagay Enav: Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Inbal Paz: Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany
- Ruth E Ley: Department of Microbiome Science, Max Planck Institute for Biology, Tübingen, Germany; Cluster of Excellence EXC 2124: Controlling Microbes to Fight Infections (CMFI), University of Tübingen, Tübingen, Germany
|
9
|
Ju N, Liu J, He Q. SNP-slice resolves mixed infections: simultaneously unveiling strain haplotypes and linking them to hosts. Bioinformatics 2024; 40:btae344. [PMID: 38885409 PMCID: PMC11187496 DOI: 10.1093/bioinformatics/btae344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/09/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information sometimes have to discard mixed infection samples as many downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A scalable tool to learn and resolve the SNP-haplotypes from polygenomic data is an urgent need in molecular epidemiology. RESULTS We develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP-haplotypes of all strains in the populations but also which strains infect which hosts. Our method reconstructs SNP-haplotypes and individual heterozygosities accurately without reference panels and outperforms the state-of-the-art methods at estimating the multiplicity of infections and allele frequencies. Thus, SNP-Slice introduces a novel approach to address polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for using our method on empirical datasets. AVAILABILITY AND IMPLEMENTATION The implementation of the SNP-Slice algorithm, as well as scripts to analyze SNP-Slice outputs, are available at https://github.com/nianqiaoju/snp-slice.
Affiliation(s)
- Nianqiao Ju: Department of Statistics, Purdue University, West Lafayette, IN 47907, United States
- Jiawei Liu: Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, United States
- Qixin He: Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, United States
|
10
|
Wattanasombat S, Tongjai S. Easing genomic surveillance: A comprehensive performance evaluation of long-read assemblers across multi-strain mixture data of HIV-1 and Other pathogenic viruses for constructing a user-friendly bioinformatic pipeline. F1000Res 2024; 13:556. [PMID: 38984017 PMCID: PMC11231628 DOI: 10.12688/f1000research.149577.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/14/2024] [Indexed: 07/11/2024] Open
Abstract
Background: Determining the appropriate computational requirements and software performance is essential for efficient genomic surveillance. The lack of standardized benchmarking complicates software selection, especially with limited resources. Methods: We developed a containerized benchmarking pipeline to evaluate seven long-read assemblers (Canu, GoldRush, MetaFlye, Strainline, HaploDMF, iGDA, and RVHaplo) for viral haplotype reconstruction, using both simulated and experimental Oxford Nanopore sequencing data of HIV-1 and other viruses. Benchmarking was conducted on three computational systems to assess each assembler's performance, utilizing QUAST and BLASTN for quality assessment. Results: Our findings show that assembler choice significantly impacts assembly time, with CPU and memory usage having minimal effect. Assembler selection also influences the size of the contigs, with a minimum read length of 2,000 nucleotides required for quality assembly. A 4,000-nucleotide read length improves quality further. Canu was efficient among de novo assemblers but not suitable for multi-strain mixtures, while GoldRush produced only consensus assemblies. Strainline and MetaFlye were suitable for metagenomic sequencing data, with Strainline requiring high memory and MetaFlye operable on low-specification machines. Among reference-based assemblers, iGDA had high error rates, RVHaplo showed the best runtime and accuracy but became ineffective with similar sequences, and HaploDMF, utilizing machine learning, had fewer errors with a slightly longer runtime. Conclusions: The HIV-64148 pipeline, containerized using Docker, facilitates easy deployment and offers flexibility to select from a range of assemblers to match computational systems or study requirements. This tool aids in genome assembly and provides valuable information on HIV-1 sequences, enhancing viral evolution monitoring and understanding.
Affiliation(s)
- Sara Wattanasombat: Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
- Siripong Tongjai: Department of Microbiology, Faculty of Medicine, Chiang Mai University, Chiang Mai, 50200, Thailand
|
11
|
Kuster R, Staton M. Readsynth: short-read simulation for consideration of composition-biases in reduced metagenome sequencing approaches. BMC Bioinformatics 2024; 25:191. [PMID: 38750423 PMCID: PMC11095026 DOI: 10.1186/s12859-024-05809-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 05/10/2024] [Indexed: 05/19/2024] Open
Abstract
BACKGROUND The application of reduced metagenomic sequencing approaches holds promise as a middle ground between targeted amplicon sequencing and whole metagenome sequencing approaches but has not been widely adopted as a technique. A major barrier to adoption is the lack of read simulation software built to handle characteristic features of these novel approaches. Reduced metagenomic sequencing (RMS) produces unique patterns of fragmentation per genome that are sensitive to restriction enzyme choice, and the non-uniform size selection of these fragments may introduce novel challenges to taxonomic assignment as well as relative abundance estimates. RESULTS Through the development and application of simulation software, readsynth, we compare simulated metagenomic sequencing libraries with existing RMS data to assess the influence of multiple library preparation and sequencing steps on downstream analytical results. Based on read depth per position, readsynth achieved 0.79 Pearson's correlation and 0.94 Spearman's correlation to these benchmarks. Application of a novel estimation approach, fixed length taxonomic ratios, improved quantification accuracy of simulated human gut microbial communities when compared to estimates of mean or median coverage. CONCLUSIONS We investigate the possible strengths and weaknesses of applying the RMS technique to profiling microbial communities via simulations with readsynth. The choice of restriction enzymes and size selection steps in library prep are non-trivial decisions that bias downstream profiling and quantification. The simulations investigated in this study illustrate the possible limits of preparing metagenomic libraries with a reduced representation sequencing approach, but also allow for the development of strategies for producing and handling the sequence data produced by this promising application.
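The abstract reports both a Pearson and a Spearman correlation between simulated and observed read depth per position, with the rank-based figure being the higher one. The sketch below, written in plain Python with invented depth values, shows how the two statistics are computed and why they can differ: Spearman's coefficient is Pearson's coefficient applied to ranks, so it rewards any monotone relationship, while Pearson's rewards only a linear one. The function names and data are illustrative and are not taken from readsynth.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-position depths: observed depth grows monotonically but
# non-linearly with simulated depth.
simulated_depth = [1, 2, 3, 4, 5]
observed_depth = [1, 4, 9, 16, 25]
r_pearson = pearson(simulated_depth, observed_depth)
r_spearman = spearman(simulated_depth, observed_depth)
print(f"Pearson {r_pearson:.3f}, Spearman {r_spearman:.3f}")
```

A gap between the two coefficients, as in the abstract, is consistent with the simulation ranking positions by depth well while deviating from a strictly linear match, although the abstract itself does not give the cause.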
Affiliation(s)
- Ryan Kuster
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
- Margaret Staton
  - Department of Entomology and Plant Pathology, University of Tennessee, Knoxville, TN, USA
|
12
|
Acheampong DA, Jenjaroenpun P, Wongsurawat T, Krulilung A, Pomyen Y, Kandel S, Kunadirek P, Chuaypen N, Kusonmano K, Nookaew I. CAIM: Coverage-based Analysis for Identification of Microbiome. bioRxiv 2024:2024.04.25.591018. [PMID: 38746391 PMCID: PMC11091946 DOI: 10.1101/2024.04.25.591018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing technologies have contributed tremendously toward understanding these microbes at species resolution through a whole-metagenome shotgun (WMS) approach. In this study, we developed a new bioinformatics tool, CAIM, for accurate taxonomic classification and quantification within both long- and short-read metagenomic samples using an alignment-based method. CAIM depends on two different containment techniques to identify species in metagenomic samples, using their genome coverage information to filter out false positives rather than the traditional approach of relative abundance. In addition, we propose a nucleotide-count-based abundance estimation, which yields a lower root mean square error than the traditional read-count approach. We evaluated the performance of CAIM on 28 metagenomic mock communities and 2 synthetic datasets by comparing it with other top-performing tools. CAIM performed consistently better than other tools across datasets in identifying microbial taxa and in estimating relative abundances. CAIM was then applied to a real dataset sequenced on both Nanopore (with and without amplification) and Illumina sequencing platforms and found high similarity of taxonomic profiles between the sequencing platforms. Lastly, CAIM was applied to fecal shotgun metagenomic datasets of 232 colorectal cancer patients and 229 controls obtained from 4 different countries, and of 44 primary liver cancer patients and 76 controls. The predictive performance of models using the genome-coverage cutoff was better than that of models using the relative-abundance cutoffs in discriminating colorectal cancer and primary liver cancer patients from healthy controls with high-confidence species markers.
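The difference between read-count and nucleotide-count abundance can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give CAIM's formula, so the normalization here (a plain share of reads versus a plain share of aligned bases), the function names, and the toy alignments are all hypothetical.

```python
from collections import defaultdict


def relative_abundance(totals):
    """Scale per-taxon totals so that they sum to 1."""
    grand_total = sum(totals.values())
    return {taxon: value / grand_total for taxon, value in totals.items()}


def abundance_by_reads(alignments):
    """Read-count approach: every aligned read counts once."""
    counts = defaultdict(int)
    for taxon, _aligned_bases in alignments:
        counts[taxon] += 1
    return relative_abundance(counts)


def abundance_by_nucleotides(alignments):
    """Nucleotide-count approach: every aligned base counts once."""
    bases = defaultdict(int)
    for taxon, aligned_bases in alignments:
        bases[taxon] += aligned_bases
    return relative_abundance(bases)


# Hypothetical alignments as (taxon, aligned bases): three short reads
# for species_A and one long read for species_B.
alignments = [
    ("species_A", 150),
    ("species_A", 150),
    ("species_A", 150),
    ("species_B", 9_000),
]

by_reads = abundance_by_reads(alignments)
by_bases = abundance_by_nucleotides(alignments)
print(by_reads)
print(by_bases)
```

In this toy case the two estimates disagree sharply because read lengths differ by taxon, which is the situation a tool handling both long and short reads has to cope with.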
Affiliation(s)
- Daniel A. Acheampong
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
  - Stowers Institute for Medical Research, Kansas City, MO, USA
- Piroon Jenjaroenpun
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
  - Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Thidathip Wongsurawat
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
  - Division of Medical Bioinformatics, Department of Research, Faculty of Medicine Siriraj Hospital, Mahidol University, Bangkok, Thailand
- Alongkorn Krulilung
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Yotsawat Pomyen
  - Translational Research Unit, Chulabhorn Research Institute, Bangkok, 10210, Thailand
- Sangam Kandel
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
- Pattapon Kunadirek
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Natthaya Chuaypen
  - Center of Excellence in Hepatitis and Liver Cancer, Department of Biochemistry, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand
- Kanthida Kusonmano
  - Bioinformatics and Systems Biology Program, School of Bioresources and Technology, King Mongkut’s University of Technology Thonburi, Bangkok, 10150, Thailand
  - Systems Biology and Bioinformatics Research Laboratory, Pilot Plant Development and Training Institute, King Mongkut’s University of Technology Thonburi, Bangkok, 10150, Thailand
- Intawat Nookaew
  - Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, AR, USA
|
13
|
Murugaiyan V, Utreja S, Hovey KM, Sun Y, LaMonte MJ, Wactawski-Wende J, Diaz PI, Buck MJ. Defining Porphyromonas gingivalis strains associated with periodontal disease. Sci Rep 2024; 14:6222. [PMID: 38485747 PMCID: PMC10940620 DOI: 10.1038/s41598-024-56849-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Accepted: 03/12/2024] [Indexed: 03/18/2024] Open
Abstract
Porphyromonas gingivalis, a Gram-negative anaerobic bacterium commonly found in human subgingival plaque, is a major etiologic agent for periodontitis and has been associated with multiple systemic pathologies. Many P. gingivalis strains have been identified and different strains possess different virulence factors. Current oral microbiome approaches (16S or shotgun) have been unable to differentiate P. gingivalis strains. This study presents a new approach that aims to improve the accuracy of strain identification, using a detection method based on sequencing of the intergenic spacer region (ISR) which is variable between P. gingivalis strains. Our approach uses two-step PCR to amplify only the P. gingivalis ISR region. Samples are then sequenced with an Illumina sequencer and mapped to specific strains. Our approach was validated by examining subgingival plaque from 153 participants with and without periodontal disease. We identified the avirulent strain ATCC33277/381 as the most abundant strain across all sample types. The W83/W50 strain was significantly enriched in periodontitis, with 13% of participants harboring that strain. Overall, this approach can have significant implications not only for the diagnosis and treatment of periodontal disease but also for other diseases where P. gingivalis or its toxins have been implicated, such as Alzheimer's disease.
Affiliation(s)
- Vijaya Murugaiyan
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Simran Utreja
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Kathleen M Hovey
  - Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY, USA
- Yijun Sun
  - Department of Microbiology and Immunology, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Michael J LaMonte
  - Department of Epidemiology and Environmental Health, School of Public Health and Health Professions, University at Buffalo, Buffalo, NY, USA
- Jean Wactawski-Wende
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
- Patricia I Diaz
  - UB Microbiome Center, University at Buffalo, Buffalo, NY, USA
  - Department of Oral Biology, School of Dental Medicine, University at Buffalo, Buffalo, NY, USA
- Michael J Buck
  - Department of Biochemistry, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
  - Department of Biomedical Informatics, Jacobs School of Medicine and Biomedical Sciences, University at Buffalo, Buffalo, NY, USA
|
14
|
McHugh MP, Pettigrew KA, Taori S, Evans TJ, Leanord A, Gillespie SH, Templeton KE, Holden MTG. Consideration of within-patient diversity highlights transmission pathways and antimicrobial resistance gene variability in vancomycin-resistant Enterococcus faecium. J Antimicrob Chemother 2024; 79:656-668. [PMID: 38323373 PMCID: PMC11090465 DOI: 10.1093/jac/dkae023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Accepted: 01/02/2024] [Indexed: 02/08/2024] Open
Abstract
BACKGROUND WGS is increasingly being applied to healthcare-associated vancomycin-resistant Enterococcus faecium (VREfm) outbreaks. Within-patient diversity could complicate transmission resolution if single colonies are sequenced from identified cases. OBJECTIVES Determine the impact of within-patient diversity on transmission resolution of VREfm. MATERIALS AND METHODS Fourteen colonies were collected from VREfm positive rectal screens, single colonies were collected from clinical samples and Illumina WGS was performed. Two isolates were selected for Oxford Nanopore sequencing and hybrid genome assembly to generate lineage-specific reference genomes. Mapping to closely related references was used to identify genetic variations and closely related genomes. A transmission network was inferred for the entire genome set using Phyloscanner. RESULTS AND DISCUSSION In total, 229 isolates from 11 patients were sequenced. Carriage of two or three sequence types was detected in 27% of patients. Presence of antimicrobial resistance genes and plasmids was variable within genomes from the same patient and sequence type. We identified two dominant sequence types (ST80 and ST1424), with two putative transmission clusters of two patients within ST80, and a single cluster of six patients within ST1424. We found transmission resolution was impaired using fewer than 14 colonies. CONCLUSIONS Patients can carry multiple sequence types of VREfm, and even within related lineages the presence of mobile genetic elements and antimicrobial resistance genes can vary. VREfm within-patient diversity could be considered in future to aid accurate resolution of transmission networks.
Affiliation(s)
- Martin P McHugh
  - School of Medicine, University of St Andrews, St Andrews, UK
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh, UK
- Surabhi Taori
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh, UK
- Thomas J Evans
  - School of Infection and Immunity, University of Glasgow, Glasgow, UK
- Alistair Leanord
  - School of Infection and Immunity, University of Glasgow, Glasgow, UK
  - Scottish Microbiology Reference Laboratories, Glasgow Royal Infirmary, Glasgow, UK
- Kate E Templeton
  - Medical Microbiology, Department of Laboratory Medicine, Royal Infirmary of Edinburgh, Edinburgh, UK
|
15
|
Kim M, Parrish RC, Shah VS, Ross M, Cormier J, Baig A, Huang CY, Brenner L, Neuringer I, Whiteson K, Harris JK, Willis AD, Lai PS. Host DNA depletion on frozen human respiratory samples enables successful metagenomic sequencing for microbiome studies. Research Square 2024:rs.3.rs-3638876. [PMID: 38343829 PMCID: PMC10854296 DOI: 10.21203/rs.3.rs-3638876/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/22/2024]
Abstract
Background Most respiratory microbiome studies have focused on amplicon rather than metagenomics sequencing due to high host DNA content. We evaluated efficacy of five host DNA depletion methods on previously frozen human bronchoalveolar lavage (BAL), nasal swabs, and sputum prior to metagenomic sequencing. Results Median sequencing depth was 76.4 million reads per sample. Untreated nasal, sputum and BAL samples had 94.1%, 99.2%, and 99.7% host-reads. The effect of host depletion differed by sample type. Most treatment methods increased microbial reads, species richness and predicted functional richness; the increase in species and predicted functional richness was mediated by higher effective sequencing depth. For BAL and nasal samples, most methods did not change Morisita-Horn dissimilarity suggesting limited bias introduced by host depletion. Conclusions Metagenomics sequencing without host depletion will underestimate microbial diversity of most respiratory samples due to shallow effective sequencing depth and is not recommended. Optimal host depletion methods vary by sample type.
Affiliation(s)
- Minsik Kim
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital; Department of Medicine, Harvard Medical School
- Raymond C Parrish
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital
- Viral S Shah
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital
- Matthew Ross
  - Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine
- Juwan Cormier
  - Alkek Center for Metagenomics and Microbiome Research, Baylor College of Medicine
- Aribah Baig
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital; College of Science, Northeastern University
- Ching-Ying Huang
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital
- Laura Brenner
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital; Department of Medicine, Harvard Medical School
- Isabel Neuringer
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital
- Katrine Whiteson
  - Department of Molecular Biology & Biochemistry, University of California
- J Kirk Harris
  - Department of Pediatrics, University of Colorado Anschutz Medical Campus
- Amy D Willis
  - Department of Biostatistics, University of Washington School of Public Health
- Peggy S Lai
  - Division of Pulmonary and Critical Care Medicine, Massachusetts General Hospital; Department of Medicine, Harvard Medical School
|
16
|
Dong Z, Xie Q, Yuan Y, Shen X, Hao Y, Li J, Xu H, Kuang W. Strain-level structure of gut microbiome showed potential association with cognitive function in major depressive disorder: A pilot study. J Affect Disord 2023; 341:236-247. [PMID: 37657622 DOI: 10.1016/j.jad.2023.08.129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2023] [Revised: 08/13/2023] [Accepted: 08/29/2023] [Indexed: 09/03/2023]
Abstract
BACKGROUND Although the association between gut microbiota and the pathogenesis of major depressive disorder (MDD) has been well studied, it is unclear whether gut microbiota affects cognitive function in patients with MDD. In this study, we explored the association between gut microbiota and cognitive function in MDD and its possible mechanisms. METHODS We enrolled 57 patients with MDD and 30 healthy controls (HCs) and used 16S rRNA gene sequencing analysis and shotgun metagenomic sequencing analysis to determine gut microbial composition. RESULTS The richness and diversity of gut microbiota in patients with MDD were the same as those in HCs, but there were differences in the abundance of Bifidobacterium and Blautia. Compared with HCs, two strains (bin_32 and bin_55) were significantly increased, and one strain (bin_31) was significantly decreased in patients with MDD based on the strain-level meta-analysis. Time to complete the Stroop-C had significant negative correlations with bin_31 and bin_32. Bin_55 had significant negative correlations with time to complete the Stroop-C, time to complete the Stroop-CW, and repeated animal words in 60 s but significant positive correlations with correct answers in 120 s on the Stroop-CW. LIMITATIONS This study only tested the cognitive function of MDD in a small sample, which may have caused some bias. CONCLUSIONS Based on our strain-level analysis, we found that gut microbiota may be associated with the pathogenesis of MDD and may have potential effects on cognitive function.
Affiliation(s)
- Zaiquan Dong
  - Mental Health Center, West China Hospital, Sichuan University, Chengdu 610041, PR China; Department of Psychiatry and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China
- Qinglian Xie
  - Department of Outpatient, West China Hospital of Sichuan University, Chengdu 610041, PR China
- Yanling Yuan
  - Department of Pharmacy, West China Hospital of Sichuan University, Chengdu 610041, PR China
- Xiaoling Shen
  - Mental Health Center, West China Hospital, Sichuan University, Chengdu 610041, PR China
- Yanni Hao
  - Mental Health Center, West China Hospital, Sichuan University, Chengdu 610041, PR China
- Jin Li
  - Mental Health Center, West China Hospital, Sichuan University, Chengdu 610041, PR China
- Haizhen Xu
  - Mental Health Center, West China Hospital, Sichuan University, Chengdu 610041, PR China
- Weihong Kuang
  - Mental Health Center, West China Hospital, Sichuan University, Chengdu 610041, PR China; Department of Psychiatry and National Clinical Research Center for Geriatrics, West China Hospital, Sichuan University, Chengdu 610041, PR China
|
17
|
Ventolero M, Wang S, Hu H, Li X. Are the predicted known bacterial strains in a sample really present? A case study. PLoS One 2023; 18:e0291964. [PMID: 37831725 PMCID: PMC10575510 DOI: 10.1371/journal.pone.0291964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2022] [Accepted: 09/10/2023] [Indexed: 10/15/2023] Open
Abstract
With mutations constantly accumulating in bacterial genomes, it is unclear whether the previously identified bacterial strains are really present in an extant sample. To address this question, we did a case study on the known strains of the bacterial species S. aureus and S. epidermidis in 68 atopic dermatitis shotgun metagenomic samples. We evaluated the likelihood of the presence of all sixteen known strains predicted in the original study and by two popular tools in this study. We found that even with the same tool, only two known strains were predicted by both the original study and this study. Moreover, none of the sixteen known strains was likely present in these 68 samples. Our study thus indicates the limitation of the known-strain-based studies, especially those on rapidly evolving bacterial species. It implies the unlikely presence of the previously identified known strains in a current environmental sample. It also calls for de novo bacterial strain identification directly from shotgun metagenomic reads.
Affiliation(s)
- Minerva Ventolero
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, Florida, United States of America
- Haiyan Hu
  - Department of Computer Science, Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, Florida, United States of America
- Xiaoman Li
  - Burnett School of Biomedical Science, College of Medicine, University of Central Florida, Orlando, Florida, United States of America
|
18
|
Qian X, Wu Y, Zuo X, Peng X, Guo Y, Yang R, Zhang X, Cui Y. mStrain: strain-level identification of Yersinia pestis using metagenomic data. Bioinformatics Advances 2023; 3:vbad115. [PMID: 37745000 PMCID: PMC10516513 DOI: 10.1093/bioadv/vbad115] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 08/08/2023] [Accepted: 09/07/2023] [Indexed: 09/26/2023]
Abstract
Motivation High-resolution target pathogen detection using metagenomic sequencing data represents a major challenge due to the low concentration of target pathogens in samples. We introduced mStrain, a novel Yersinia pestis strain/lineage-level identification tool that utilizes metagenomic data. mStrain successfully identified Y. pestis at the strain/lineage level by extracting sufficient information regarding single-nucleotide polymorphisms (SNPs), and can therefore be an effective tool for identification and source tracking of Y. pestis based on metagenomic data during a plague outbreak. Definitions Strain-level identification: assigning the reads in the metagenomic sequencing data to an exactly known or most closely representative Y. pestis strain. Lineage-level identification: assigning the reads in the metagenomic sequencing data to a specific lineage on the phylogenetic tree. canoSNPs: the unique and typical SNPs present in all representative strains. Ancestor/derived state: an SNP is defined as the ancestor state when consistent with the allele of Yersinia pseudotuberculosis strain IP32953; otherwise, the SNP is defined as the derived state. Availability and implementation The code for running mStrain, the test dataset, and instructions for running the code can be found at the following GitHub repository: https://github.com/xwqian1123/mStrain.
Affiliation(s)
- Xiuwei Qian
  - School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Yarong Wu
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Xiujuan Zuo
  - School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Xin Peng
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Yan Guo
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Ruifu Yang
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Xianglilan Zhang
  - School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
- Yujun Cui
  - School of Basic Medical Sciences, Anhui Medical University, Hefei 230032, China
  - State Key Laboratory of Pathogen and Biosecurity, Beijing Institute of Microbiology and Epidemiology, Beijing 100071, China
|
19
|
Ju NP, Liu J, He Q. SNP-Slice Resolves Mixed Infections: Simultaneously Unveiling Strain Haplotypes and Linking Them to Hosts. bioRxiv 2023:2023.07.29.551098. [PMID: 37546891 PMCID: PMC10402141 DOI: 10.1101/2023.07.29.551098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information have to discard mixed infection samples, because existing downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A reliable tool to learn and resolve the SNP haplotypes from polygenomic data is an urgent need in molecular epidemiology. In this work, we develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP haplotypes of all strains in the populations but also which strains infect which hosts. Our method reconstructs SNP haplotypes and individual heterozygosities accurately without reference panels and outperforms state-of-the-art methods at estimating the multiplicity of infections and allele frequencies. Thus, SNP-Slice introduces a novel approach to address polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for the practical use of the method.
|
20
|
Liao H, Ji Y, Sun Y. High-resolution strain-level microbiome composition analysis from short reads. Microbiome 2023; 11:183. [PMID: 37587527 PMCID: PMC10433603 DOI: 10.1186/s40168-023-01615-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/24/2022] [Accepted: 07/07/2023] [Indexed: 08/18/2023]
Abstract
BACKGROUND Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there is a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mer indexing structure to strike a balance between the strain identification accuracy and the computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing datasets and benchmarked StrainScan against popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input, and its source code is freely available at https://github.com/liaoherui/StrainScan.
Affiliation(s)
- Herui Liao
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yongxin Ji
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
|
21
|
Chen Y, Jiang Q, Liu Q, Gan M, Takiff HE, Gao Q. Whole-Genome Sequencing Exhibits Better Diagnostic Performance than Variable-Number Tandem Repeats for Identifying Mixed Infections of Mycobacterium tuberculosis. Microbiol Spectr 2023; 11:e0357022. [PMID: 37098911 PMCID: PMC10269500 DOI: 10.1128/spectrum.03570-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/05/2022] [Accepted: 04/10/2023] [Indexed: 04/27/2023] Open
Abstract
Mixed infections of Mycobacterium tuberculosis, defined as the coexistence of multiple genetically distinct strains within a single host, have been associated with unfavorable treatment outcomes. Different methods have been used to detect mixed infections, but their performances have not been carefully evaluated. To compare the sensitivity of whole-genome sequencing (WGS) and variable-number tandem repeats (VNTR) typing to detect mixed infections, we prepared 10 artificial samples composed of DNA mixtures from two strains in different proportions and retrospectively collected 1,084 clinical isolates. The limit of detection (LOD) for the presence of a minor strain was 5% for both WGS and VNTR typing. The overall clinical detection rate of mixed infections was 3.7% (40/1,084) for the two methods combined; WGS identified 37/1,084 (3.4%) and VNTR typing identified 14/1,084 (1.3%), including 11 also identified by WGS. Multivariate analysis demonstrated that retreatment patients had a 2.7 times (95% confidence interval [CI], 1.2 to 6.0) higher risk of mixed infections than new cases. Collectively, WGS is a more reliable tool to identify mixed infections than VNTR typing, and mixed infections are more common in retreated patients. IMPORTANCE Mixed infections of M. tuberculosis have the potential to render treatment regimens ineffective and affect the transmission dynamics of the disease. VNTR typing, currently the most widely used method for the detection of mixed infections, detects mixed infections only by interrogating a small fraction of the M. tuberculosis genome, which necessarily limits sensitivity. With the introduction of WGS, it became possible to study the entire genome, but no quantitative comparison has yet been undertaken. 
Our systematic comparison of the ability of WGS and VNTR typing to detect mixed infections, using both artificial samples and clinical isolates, revealed the superior performance of WGS at a high sequencing depth (~100×) and found that mixed infections are more common in patients being retreated for tuberculosis (TB) in the populations studied. This provides valuable information for the application of WGS in the detection of mixed infections and the implications of mixed infections for tuberculosis control.
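The detection rates quoted in this abstract follow from inclusion-exclusion on the reported counts. The short check below uses only the numbers stated above; the variable names and the `percent` helper are illustrative.

```python
# Counts as reported in the abstract.
total_isolates = 1084
wgs_positive = 37    # mixed infections identified by WGS
vntr_positive = 14   # mixed infections identified by VNTR typing
both_positive = 11   # identified by both methods

# Inclusion-exclusion: isolates flagged by at least one method.
either_positive = wgs_positive + vntr_positive - both_positive
wgs_only = wgs_positive - both_positive
vntr_only = vntr_positive - both_positive


def percent(count, total):
    """Percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


print(either_positive, percent(either_positive, total_isolates))
print(wgs_only, vntr_only)
print(percent(wgs_positive, total_isolates), percent(vntr_positive, total_isolates))
```

The derived split (cases found only by WGS versus only by VNTR typing) is not stated in the abstract; it is simply what the reported counts imply.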
Affiliation(s)
- Yiwang Chen
  - National Clinical Research Center for Infectious Diseases, Shenzhen Third People’s Hospital, Shenzhen, Guangdong, China
  - Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
- Qi Jiang
  - School of Public Health, Public Health Research Institute of Renmin Hospital, Wuhan University, Wuhan, China
- Qingyun Liu
  - Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, USA
- Mingyu Gan
  - Molecular Medical Center, Children’s Hospital of Fudan University, Shanghai, China
- Howard E. Takiff
  - Instituto Venezolano de Investigaciones Cientificas (IVIC), Caracas, Venezuela
- Qian Gao
  - National Clinical Research Center for Infectious Diseases, Shenzhen Third People’s Hospital, Shenzhen, Guangdong, China
  - Key Laboratory of Medical Molecular Virology (MOE/NHC/CAMS), School of Basic Medical Sciences, Shanghai Medical College, Fudan University, Shanghai, China
|
22
|
Zhou B, Li H. STEMSIM: a simulator of within-strain short-term evolutionary mutations for longitudinal metagenomic data. Bioinformatics 2023; 39:btad302. [PMID: 37154701 PMCID: PMC10188296 DOI: 10.1093/bioinformatics/btad302] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2022] [Revised: 03/29/2023] [Accepted: 04/29/2023] [Indexed: 05/10/2023] Open
Abstract
MOTIVATION As the resolution of metagenomic analysis increases, the evolution of microbial genomes in longitudinal metagenomic data has become a research focus. Some software has been developed for the simulation of complex microbial communities at the strain level. However, a tool for simulating within-strain evolutionary signals in longitudinal samples is still lacking. RESULTS In this study, we introduce STEMSIM, a user-friendly command-line simulator of short-term evolutionary mutations for longitudinal metagenomic data. The input is simulated longitudinal raw sequencing reads of microbial communities or single species. The output is the modified reads with within-strain evolutionary mutations, together with the relevant information about these mutations. STEMSIM will be of great use for the evaluation of analytic tools that detect short-term evolutionary mutations in metagenomic data. AVAILABILITY AND IMPLEMENTATION STEMSIM and its tutorial are freely available online at https://github.com/BoyanZhou/STEMSim.
Affiliation(s)
- Boyan Zhou
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
| | - Huilin Li
- Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY 10016, USA
23
Valadez-Cano C, Reyes-Prieto A, Beach DG, Rafuse C, McCarron P, Lawrence J. Genomic characterization of coexisting anatoxin-producing and non-toxigenic Microcoleus subspecies in benthic mats from the Wolastoq, New Brunswick, Canada. Harmful Algae 2023; 124:102405. [PMID: 37164558 DOI: 10.1016/j.hal.2023.102405] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/07/2022] [Revised: 02/13/2023] [Accepted: 02/14/2023] [Indexed: 05/12/2023]
Abstract
The presence of toxigenic benthic cyanobacteria in riverine ecosystems is an increasing concern around the world. In 2018, the death of three dogs along the Wolastoq (also known as the Saint John River) in New Brunswick, Canada, was attributed to anatoxin exposure after they ingested benthic microbial mats found along the shore. Here, we shotgun sequenced the DNA of 15 non-axenic cyanobacterial isolates derived from four anatoxin-containing benthic mat samples associated with the dog deaths. Anatoxins were produced by some of the isolates, but not all. We retrieved near-complete Microcoleus metagenome-assembled genomes (MAGs) from the isolates that are closely related to anatoxin-producing Microcoleus from the Cardrona River (New Zealand), although the Microcoleus MAGs from the Wolastoq varied in the presence/absence of the anatoxin-a biosynthesis cluster. Sequence similarity at the genomic level suggests that toxigenic and non-toxigenic Microcoleus MAGs from the Wolastoq belong to the same species but are separate subspecies. The toxigenic and nontoxic Wolastoq Microcoleus subspecies coexisted in the mat samples in similar relative abundance. Overall genomic comparisons revealed that toxigenic Microcoleus MAGs are longer and code for more accessory genes than their non-toxigenic relatives, suggesting a differential responsiveness to changing environments, stress conditions and nutrient availability.
Affiliation(s)
- Cecilio Valadez-Cano
- Department of Biology, University of New Brunswick, 10 Bailey Drive, Fredericton, New Brunswick, E3B 5A3, Canada
| | - Adrian Reyes-Prieto
- Department of Biology, University of New Brunswick, 10 Bailey Drive, Fredericton, New Brunswick, E3B 5A3, Canada
| | - Daniel G Beach
- Biotoxin Metrology, National Research Council Canada, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
| | - Cheryl Rafuse
- Biotoxin Metrology, National Research Council Canada, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
| | - Pearse McCarron
- Biotoxin Metrology, National Research Council Canada, 1411 Oxford Street, Halifax, Nova Scotia, B3H 3Z1, Canada
| | - Janice Lawrence
- Department of Biology, University of New Brunswick, 10 Bailey Drive, Fredericton, New Brunswick, E3B 5A3, Canada.
24
Yorki S, Shea T, Cuomo CA, Walker BJ, LaRocque RC, Manson AL, Earl AM, Worby CJ. Comparison of long- and short-read metagenomic assembly for low-abundance species and resistance genes. Brief Bioinform 2023; 24:bbad050. [PMID: 36804804 PMCID: PMC10025444 DOI: 10.1093/bib/bbad050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2022] [Revised: 01/13/2023] [Accepted: 01/26/2023] [Indexed: 02/23/2023] Open
Abstract
Recent technological and computational advances have made metagenomic assembly a viable approach to achieving high-resolution views of complex microbial communities. In previous benchmarking, short-read (SR) metagenomic assemblers had the highest accuracy, long-read (LR) assemblers generated the most contiguous sequences and hybrid (HY) assemblers balanced length and accuracy. However, no assessments have specifically compared the performance of these assemblers on low-abundance species, which include clinically relevant organisms in the gut. We generated semi-synthetic LR and SR datasets by spiking small and increasing amounts of Escherichia coli isolate reads into fecal metagenomes and, using different assemblers, examined E. coli contigs and the presence of antibiotic resistance genes (ARGs). For ARG assembly, although SR assemblers recovered more ARGs with high accuracy, even at low coverages, LR assemblies allowed for the placement of ARGs within longer, E. coli-specific contigs, thus pinpointing their taxonomic origin. HY assemblies identified resistance genes with high accuracy and had lower contiguity than LR assemblies. Each assembler type's strengths were maintained even when our isolate was spiked in with a competing strain, which fragmented and reduced the accuracy of all assemblies. For strain characterization and determining gene context, LR assembly is optimal, while for base-accurate gene identification, SR assemblers outperform other options. HY assembly offers contiguity and base accuracy, but requires generating data on multiple platforms, and may suffer high misassembly rates when strain diversity exists. Our results highlight the trade-offs associated with each approach for recovering low-abundance taxa, and that the optimal approach is goal-dependent.
Affiliation(s)
- Sosie Yorki
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Terrance Shea
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Christina A Cuomo
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Bruce J Walker
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Applied Invention, LLC, Cambridge, MA, USA
| | - Regina C LaRocque
- Division of Infectious Diseases, Massachusetts General Hospital, Boston, MA, USA
| | - Abigail L Manson
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Ashlee M Earl
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
| | - Colin J Worby
- Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
25
Zhao C, Shi ZJ, Pollard KS. Pitfalls of genotyping microbial communities with rapidly growing genome collections. Cell Syst 2023; 14:160-176.e3. [PMID: 36657438 PMCID: PMC9957970 DOI: 10.1016/j.cels.2022.12.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 10/15/2022] [Accepted: 12/19/2022] [Indexed: 01/20/2023]
Abstract
Detecting genetic variants in metagenomic data is a priority for understanding the evolution, ecology, and functional characteristics of microbial communities. Many tools that perform this metagenotyping rely on aligning reads of unknown origin to a database of sequences from many species before calling variants. In this synthesis, we investigate how databases of increasingly diverse and closely related species have pushed the limits of current alignment algorithms, thereby degrading the performance of metagenotyping tools. We identify multi-mapping reads as a prevalent source of errors and illustrate a trade-off between retaining correct alignments versus limiting incorrect alignments, many of which map reads to the wrong species. Then we evaluate several actionable mitigation strategies and review emerging methods showing promise to further improve metagenotyping in response to the rapid growth in genome collections. Our results have implications beyond metagenotyping to the many tools in microbial genomics that depend upon accurate read mapping.
Affiliation(s)
- Chunyu Zhao
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
| | - Zhou Jason Shi
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
| | - Katherine S Pollard
- Chan Zuckerberg Biohub, San Francisco, CA, USA; Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA; Department of Epidemiology & Biostatistics, University of California, San Francisco, San Francisco, CA, USA.
26
Hellmann J, Ta A, Ollberding NJ, Bezold R, Lake K, Jackson K, Dirksing K, Bonkowski E, Haslam DB, Denson LA. Patient-Reported Outcomes Correlate With Microbial Community Composition Independent of Mucosal Inflammation in Pediatric Inflammatory Bowel Disease. Inflamm Bowel Dis 2023; 29:286-296. [PMID: 35972440 PMCID: PMC9890220 DOI: 10.1093/ibd/izac175] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 02/21/2022] [Indexed: 02/04/2023]
Abstract
BACKGROUND Inflammatory bowel diseases (IBDs) involve an aberrant host response to intestinal microbiota causing mucosal inflammation and gastrointestinal symptoms. Patient-reported outcomes (PROs) are increasingly important in clinical care and research. Our aim was to examine associations between PROs and fecal microbiota in patients 0 to 22 years of age with IBD. METHODS A longitudinal, prospective, single-center study tested for associations between microbial community composition via shotgun metagenomics and PROs including stool frequency and rectal bleeding in ulcerative colitis (UC) and abdominal pain and stool frequency in Crohn's disease (CD). Mucosal inflammation was assessed with fecal calprotectin. A negative binomial mixed-effects model including clinical characteristics and fecal calprotectin tested for differentially abundant species and metabolic pathways by PROs. RESULTS In 70 CD patients with 244 stool samples, abdominal pain correlated with increased relative abundance of Haemophilus and reduced Clostridium spp. There were no differences relative to calprotectin level. In 23 UC patients with 76 samples, both rectal bleeding and increased stool frequency correlated with increased Klebsiella and reduced Bacteroides spp. Conversely, UC patients with lower calprotectin had reduced Klebsiella. Both UC and CD patients with active symptoms exhibited less longitudinal microbial community stability. No differences in metabolic pathways were observed in CD. Increased sulfoglycolysis and ornithine biosynthesis correlated with symptomatic UC. CONCLUSIONS Microbial community composition correlated with PROs in both CD and UC. Metabolic pathways differed relative to PROs in UC, but not CD. Data suggest that microbiota may contribute to patient symptoms in IBD, in addition to effects of mucosal inflammation.
Affiliation(s)
- Jennifer Hellmann
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Allison Ta
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
| | - Nicholas J Ollberding
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Ramona Bezold
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Kathleen Lake
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Kimberly Jackson
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Kelsie Dirksing
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Erin Bonkowski
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - David B Haslam
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
- Division of Infectious Disease, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
| | - Lee A Denson
- Division of Gastroenterology, Hepatology, and Nutrition, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
- Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA
27
Moreno E, Ron R, Serrano-Villar S. The microbiota as a modulator of mucosal inflammation and HIV/HPV pathogenesis: From association to causation. Front Immunol 2023; 14:1072655. [PMID: 36756132 PMCID: PMC9900135 DOI: 10.3389/fimmu.2023.1072655] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/17/2022] [Accepted: 01/06/2023] [Indexed: 01/24/2023] Open
Abstract
Although the microbiota has been widely associated with the pathogenesis of viral infections, most studies using omics techniques are correlational and hypothesis-generating. The mechanisms affecting the immune responses to viral infections are not yet fully understood. Here we focus on the two most important sexually transmitted persistent viruses, HPV and HIV. Sophisticated omics techniques are boosting our ability to understand microbiota-pathogen-host interactions from a functional perspective by surveying host and bacterial protein and metabolite production using systems biology approaches. However, while these strategies have made it possible to describe interaction networks and identify potential novel microbiota-associated biomarkers or therapeutic targets to prevent or treat infectious diseases, the analyses are typically based on high-dimensional datasets (thousands of features in small cohorts of patients). As a result, we are far from their clinical use. Here we provide a broad overview of how the microbiota influences the immune responses to HIV and HPV disease. Furthermore, we highlight experimental approaches for better understanding the microbiota-host-virus interactions, which might increase our potential to identify biomarkers and therapeutic agents with clinical applications.
Affiliation(s)
- Elena Moreno
- Department of Infectious Diseases, Hospital Universitario Ramón y Cajal, Facultad de Medicina, Universidad de Alcalá, IRYCIS, Madrid, Spain
- CIBERINFEC, Instituto de Salud Carlos III, Madrid, Spain
| | - Raquel Ron
- Department of Infectious Diseases, Hospital Universitario Ramón y Cajal, Facultad de Medicina, Universidad de Alcalá, IRYCIS, Madrid, Spain
- CIBERINFEC, Instituto de Salud Carlos III, Madrid, Spain
| | - Sergio Serrano-Villar
- Department of Infectious Diseases, Hospital Universitario Ramón y Cajal, Facultad de Medicina, Universidad de Alcalá, IRYCIS, Madrid, Spain
- CIBERINFEC, Instituto de Salud Carlos III, Madrid, Spain
28
Chuan J, Xu H, Hammill DL, Hale L, Chen W, Li X. Clasnip: a web-based intraspecies classifier and multi-locus sequence typing for pathogenic microorganisms using fragmented sequences. PeerJ 2023; 11:e14490. [PMID: 36643626 PMCID: PMC9835710 DOI: 10.7717/peerj.14490] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 07/25/2022] [Accepted: 11/09/2022] [Indexed: 01/11/2023] Open
Abstract
Bioinformatic approaches for the identification of microorganisms have evolved rapidly, but existing methods are time-consuming, complicated or expensive for massive screening of pathogens and their non-pathogenic relatives. Also, bioinformatic classifiers usually lack automatically generated performance statistics for specific databases. To address this problem, we developed Clasnip (www.clasnip.com), an easy-to-use web-based platform for the classification and similarity evaluation of closely related microorganisms at interspecies and intraspecies levels. Clasnip mainly consists of two modules: database building and sample classification. In database building, labeled nucleotide sequences are mapped to a reference sequence, and then single nucleotide polymorphisms (SNPs) statistics are generated. A probability model of SNPs and classification groups is built using Hidden Markov Models and solved using the maximum likelihood method. Database performance is estimated using three replicates of two-fold cross-validation. Sensitivity (recall), specificity (selectivity), precision, accuracy and other metrics are computed for all samples, training sets, and test sets. In sample classification, Clasnip accepts inputs of genes, short fragments, contigs and even whole genomes. It can report classification probability and a multi-locus sequence typing table for SNPs. The classification performance was tested using short sequences of 16S, 16-23S and 50S rRNA regions for 12 haplotypes of Candidatus Liberibacter solanacearum (CLso), a regulated plant pathogen associated with severe disease in economically important Apiaceous and Solanaceous crops. The program was able to classify CLso samples with even only 1-2 SNPs available, and achieved 97.2%, 98.8% and 100.0% accuracy based on 16S, 16-23S, and 50S rRNA sequences, respectively. 
Based on a comparison with all 12 existing haplotypes, we propose that, to be classified as a new haplotype, a sample must have at least 2 SNPs in the combined 16S rRNA (OA2/Lsc2) and 16-23S IGS (Lp Frag 4-1611F/Lp Frag 4-480R) regions and 2 SNPs in the 50S rplJ/rplL (CL514F/CL514R) region. In addition, we have included databases for differentiating Dickeya spp., Pectobacterium spp. and Clavibacter spp. Beyond bacteria, we also tested Clasnip performance on potato virus Y (PVY): all 251 PVY genomes were correctly classified into seven groups (PVYC, PVYN, PVYO, PVYNTN, PVYN:O, Poha, and Chile3). In conclusion, Clasnip is a statistically sound and user-friendly bioinformatic application for microorganism classification at the intraspecies level. The Clasnip service is freely available at www.clasnip.com.
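Editorial note: the performance metrics named in this abstract follow the standard confusion-matrix definitions. The Python sketch below is illustrative only and is not Clasnip's code; the function names, the one-vs-rest treatment of each classification group, and the toy labels are assumptions made for the example.

```python
import math


def confusion_counts(group, truth, predicted):
    """Count one-vs-rest outcomes (TP, FP, TN, FN) for a single group."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if p == group and t == group:
            tp += 1  # correctly assigned to the group
        elif p == group:
            fp += 1  # assigned to the group but belongs elsewhere
        elif t == group:
            fn += 1  # belongs to the group but assigned elsewhere
        else:
            tn += 1  # correctly kept out of the group
    return tp, fp, tn, fn


def ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def classification_metrics(group, truth, predicted):
    """Sensitivity, specificity, precision and accuracy for one group."""
    tp, fp, tn, fn = confusion_counts(group, truth, predicted)
    return {
        "sensitivity": ratio(tp, tp + fn),  # recall
        "specificity": ratio(tn, tn + fp),  # selectivity
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Toy labels for hypothetical groups A, B and C (not data from the paper).
truth = ["A", "A", "B", "B", "C"]
predicted = ["A", "B", "B", "B", "C"]
print(classification_metrics("B", truth, predicted))
```

In a cross-validation setting such as the one the abstract describes, the same function would be applied separately to the training and test partitions of each replicate.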
Affiliation(s)
- Jiacheng Chuan
- Charlottetown Laboratory, Canadian Food Inspection Agency, Charlottetown, Prince Edward Island, Canada; Department of Biology, University of Prince Edward Island, Charlottetown, Prince Edward Island, Canada
| | - Huimin Xu
- Charlottetown Laboratory, Canadian Food Inspection Agency, Charlottetown, Prince Edward Island, Canada
| | - Desmond L. Hammill
- Charlottetown Laboratory, Canadian Food Inspection Agency, Charlottetown, Prince Edward Island, Canada
| | - Lawrence Hale
- Department of Biology, University of Prince Edward Island, Charlottetown, Prince Edward Island, Canada
| | - Wen Chen
- Department of Biology, University of Ottawa, Ottawa, Ontario, Canada; Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, Ontario, Canada
| | - Xiang Li
- Charlottetown Laboratory, Canadian Food Inspection Agency, Charlottetown, Prince Edward Island, Canada
29
Ma S, Li H. Statistical and Computational Methods for Microbial Strain Analysis. Methods Mol Biol 2023; 2629:231-245. [PMID: 36929080 DOI: 10.1007/978-1-0716-2986-4_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
Microbial strains are interpreted as lineages derived from a recent ancestor that have not experienced "too many" recombination events and that can be successfully retrieved with culture-independent techniques using metagenomic sequencing. Such strain variability has increasingly been shown to produce phenotypic heterogeneity that affects host health, in traits such as virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track strains in samples from shotgun metagenomic data, using either reference genome sequences or metagenome-assembled genomes (MAGs). In this paper, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in whether reference genome sequences are needed, how SNVs are called, what deconvolution methods are used, and whether the methods can be applied to multiple samples. We conclude our review with areas that require further research.
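Editorial note: as a rough illustration of what deconvolution from SNV frequency counts means, the Python sketch below estimates the proportion of one strain in a two-strain mixture. It is not any of the reviewed methods. It assumes exactly two strains with known genotypes, sites at which exactly one of the two strains carries the alternative allele, equal coverage of both genomes per cell, and no sequencing error; the function name and the read counts are hypothetical.

```python
def estimate_primary_proportion(sites):
    """Estimate the primary strain's proportion in a two-strain mixture.

    `sites` is a list of (alt_count, total_count, primary_has_alt) tuples,
    one per SNV at which exactly one strain carries the alternative allele.
    Under the stated assumptions every read at such a site comes from the
    primary strain with probability p, so the pooled fraction of
    primary-matching reads is the binomial maximum-likelihood estimate of p.
    """
    primary_reads = 0
    total_reads = 0
    for alt_count, total_count, primary_has_alt in sites:
        if not 0 <= alt_count <= total_count:
            raise ValueError("alt_count must lie between 0 and total_count")
        # Reads matching the primary strain's allele at this site.
        primary_reads += alt_count if primary_has_alt else total_count - alt_count
        total_reads += total_count
    if total_reads == 0:
        raise ValueError("no reads cover the discriminating sites")
    return primary_reads / total_reads


# Hypothetical counts: two SNVs, each covered by 100 reads.
sites = [
    (70, 100, True),   # primary strain carries the alternative allele
    (25, 100, False),  # secondary strain carries the alternative allele
]
print(estimate_primary_proportion(sites))
```

Real methods must additionally infer the unknown strain genotypes, handle sequencing error, and often share information across samples, which is where the approaches compared in the review differ.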
Affiliation(s)
- Siyuan Ma
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
| | - Hongzhe Li
- Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
30
Wei S, Jespersen ML, Baunwall SMD, Myers PN, Smith EM, Dahlerup JF, Rasmussen S, Nielsen HB, Licht TR, Bahl MI, Hvas CL. Cross-generational bacterial strain transfer to an infant after fecal microbiota transplantation to a pregnant patient: a case report. Microbiome 2022; 10:193. [PMID: 36352460 PMCID: PMC9647999 DOI: 10.1186/s40168-022-01394-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/31/2022] [Accepted: 10/13/2022] [Indexed: 06/14/2023]
Abstract
BACKGROUND Fecal microbiota transplantation (FMT) effectively prevents the recurrence of Clostridioides difficile infection (CDI). Long-term engraftment of donor-specific microbial consortia may occur in the recipient, but potential further transfer to other sites, including the vertical transmission of donor-specific strains to future generations, has not been investigated. Here, we report, for the first time, the cross-generational transmission of specific bacterial strains from an FMT donor to a pregnant patient with CDI and further to her child, born at term, 26 weeks after the FMT treatment. METHODS A pregnant woman (gestation week 12 + 5) with CDI was treated with FMT via colonoscopy. She gave vaginal birth at term to a healthy baby. Fecal samples were collected from the feces donor, the mother (before FMT, and 1, 8, 15, 22, 26, and 50 weeks after FMT), and the infant (meconium at birth and 3 and 6 months after birth). Fecal samples were profiled by deep metagenomic sequencing for strain-level analysis. The microbial transfer was monitored using single nucleotide variants in metagenomes and further compared to a collection of metagenomic samples from 651 healthy infants and 58 healthy adults. RESULTS The single FMT procedure led to an uneventful and sustained clinical resolution in the patient, who experienced no further CDI-related symptoms up to 50 weeks after treatment. The gut microbiota of the patient with CDI differed considerably from the healthy donor and was characterized as low in alpha diversity and enriched for several potential pathogens. The FMT successfully normalized the patient's gut microbiota, likely by donor microbiota transfer and engraftment. Importantly, our analysis revealed that some specific strains were transferred from the donor to the patient and then further to the infant, thus demonstrating cross-generational microbial transfer. 
CONCLUSIONS The evidence for cross-generational strain transfer following FMT provides novel insights into the dynamics and engraftment of bacterial strains from healthy donors. The data suggest FMT treatment of pregnant women as a potential strategy to introduce beneficial strains or even bacterial consortia to infants, i.e., neonatal seeding.
Affiliation(s)
- Shaodong Wei
- National Food Institute, Technical University of Denmark, Kemitorvet 202, 2800, Kgs Lyngby, Denmark
| | - Marie Louise Jespersen
- National Food Institute, Technical University of Denmark, Kemitorvet 202, 2800, Kgs Lyngby, Denmark
- Clinical-Microbiomics A/S, Copenhagen, Denmark
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Simon Mark Dahl Baunwall
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, Aarhus, Denmark
- Institute of Clinical Medicine, Aarhus University, Aarhus, Denmark
| | | | - Emilie Milton Smith
- National Food Institute, Technical University of Denmark, Kemitorvet 202, 2800, Kgs Lyngby, Denmark
| | - Jens Frederik Dahlerup
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, Aarhus, Denmark
| | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | | | - Tine Rask Licht
- National Food Institute, Technical University of Denmark, Kemitorvet 202, 2800, Kgs Lyngby, Denmark
| | - Martin Iain Bahl
- National Food Institute, Technical University of Denmark, Kemitorvet 202, 2800, Kgs Lyngby, Denmark.
| | - Christian Lodberg Hvas
- Department of Hepatology and Gastroenterology, Aarhus University Hospital, Aarhus, Denmark
- Institute of Clinical Medicine, Aarhus University, Aarhus, Denmark
31
Zhu K, Schäffer AA, Robinson W, Xu J, Ruppin E, Ergun AF, Ye Y, Sahinalp SC. Strain level microbial detection and quantification with applications to single cell metagenomics. Nat Commun 2022; 13:6430. [PMID: 36307411 PMCID: PMC9616933 DOI: 10.1038/s41467-022-33869-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/23/2021] [Accepted: 10/04/2022] [Indexed: 12/25/2022] Open
Abstract
Computational identification and quantification of distinct microbes from high-throughput sequencing data is crucial for our understanding of human health. Existing methods use either accurate but computationally expensive alignment-based approaches or faster but less accurate alignment-free approaches, which often fail to correctly assign reads to genomes. Here we introduce CAMMiQ, a combinatorial optimization framework to identify and quantify distinct genomes (specified by a database) in a metagenomic dataset. As a key methodological innovation, CAMMiQ uses substrings of variable length and those that appear in two genomes in the database, as opposed to the commonly used fixed-length, unique substrings. These substrings make it possible to accurately decouple mixtures of highly similar genomes, resulting in higher accuracy than the leading alternatives without requiring additional computational resources, as demonstrated on commonly used benchmarking datasets. Importantly, we show that CAMMiQ can distinguish closely related bacterial strains in simulated metagenomic and real single-cell metatranscriptomic data.
Affiliation(s)
- Kaiyuan Zhu
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Department of Computer Science & Engineering, UC San Diego, La Jolla, CA, USA
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - Alejandro A Schäffer
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Welles Robinson
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
- Surgery Branch, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Junyan Xu
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Eytan Ruppin
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - A Funda Ergun
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - Yuzhen Ye
- Department of Computer Science, Indiana University, Bloomington, IN, USA
| | - S Cenk Sahinalp
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA.
- Department of Computer Science, Indiana University, Bloomington, IN, USA.
32
Oyewole ORA, Latzin P, Brugger SD, Hilty M. Strain-level resolution and pneumococcal carriage dynamics by single-molecule real-time (SMRT) sequencing of the plyNCR marker: a longitudinal study in Swiss infants. Microbiome 2022; 10:152. [PMID: 36138483 PMCID: PMC9502908 DOI: 10.1186/s40168-022-01344-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Accepted: 08/05/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND Pneumococcal carriage has often been studied from a serotype perspective; however, little is known about strain-specific carriage and inter-strain interactions. Here, we examined the strain-level carriage and co-colonization dynamics of Streptococcus pneumoniae in a Swiss birth cohort by PacBio single-molecule real-time (SMRT) sequencing of the plyNCR marker. METHODS A total of 872 nasal swab (NS) samples were included from 47 healthy infants during the first year of life. Pneumococcal carriage was determined by quantitative real-time polymerase chain reaction (qPCR) targeting the lytA gene. The plyNCR marker was amplified from 214 samples having lytA-based carriage for pneumococcal strain resolution. Amplicons were sequenced using SMRT technology, and sequences were analyzed with the DADA2 pipeline. In addition, pneumococcal serotypes were determined using conventional, multiplex PCR (cPCR). RESULTS PCR-based plyNCR amplification demonstrated 94.2% sensitivity and 100% specificity for Streptococcus pneumoniae when compared with lytA qPCR. The overall carriage prevalence was 63.8%, and pneumococcal co-colonization (≥ 2 plyNCR amplicon sequence variants (ASVs)) was detected in 38/213 (17.8%) sequenced samples, with the relative proportion of the least abundant strain(s) ranging from 1.1 to 48.8% (median, 17.2%; IQR, 5.8-33.4%). The median age at first acquisition was 147 days, and having ≥ 2 siblings increased the risk of acquisition. CONCLUSION The plyNCR amplicon sequencing is species-specific and enables pneumococcal strain resolution. We therefore recommend its application for longitudinal strain-level carriage studies of Streptococcus pneumoniae.
Affiliation(s)
- Oluwaseun Rume-Abiola Oyewole
  - Institute for Infectious Diseases, University of Bern, Friedbühlstrasse 51, 3001, Bern, Switzerland
  - Graduate School for Cellular and Biomedical Sciences, University of Bern, Bern, Switzerland
- Philipp Latzin
  - Division of Respiratory Medicine, Department of Pediatrics, Inselspital, University of Bern, Bern, Switzerland
- Silvio D Brugger
  - Department of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Markus Hilty
  - Institute for Infectious Diseases, University of Bern, Friedbühlstrasse 51, 3001, Bern, Switzerland
|
33
|
Hu H, Tan Y, Li C, Chen J, Kou Y, Xu ZZ, Liu Y, Tan Y, Dai L. StrainPanDA: Linked reconstruction of strain composition and gene content profiles via pangenome-based decomposition of metagenomic data. IMETA 2022; 1:e41. [PMID: 38868710 PMCID: PMC10989911 DOI: 10.1002/imt2.41] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/15/2022] [Revised: 05/20/2022] [Accepted: 06/28/2022] [Indexed: 06/14/2024]
Abstract
Microbial strains of variable functional capacities coexist in microbiomes. Current bioinformatics methods of strain analysis cannot provide the direct linkage between strain composition and their gene contents from metagenomic data. Here we present Strain-level Pangenome Decomposition Analysis (StrainPanDA), a novel method that uses the pangenome coverage profile of multiple metagenomic samples to simultaneously reconstruct the composition and gene content variation of coexisting strains in microbial communities. We systematically validate the accuracy and robustness of StrainPanDA using synthetic data sets. To demonstrate the power of gene-centric strain profiling, we then apply StrainPanDA to analyze the gut microbiome samples of infants, as well as patients treated with fecal microbiota transplantation. We show that the linked reconstruction of strain composition and gene content profiles is critical for understanding the relationship between microbial adaptation and strain-specific functions (e.g., nutrient utilization and pathogenicity). Finally, StrainPanDA has minimal requirements for computing resources and can be scaled to process multiple species in a community in parallel. In short, StrainPanDA can be applied to metagenomic data sets to detect the association between molecular functions and microbial/host phenotypes to formulate testable hypotheses and gain novel biological insights at the strain or subspecies level.
Affiliation(s)
- Han Hu
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Yuxiang Tan
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Chenhao Li
  - Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Richard B. Simches Research Center, Boston, Massachusetts, USA
- Junyu Chen
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yan Kou
  - Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Zhenjiang Zech Xu
  - Department of Food Science and Technology, State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China
- Yang-Yu Liu
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Yan Tan
  - Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Lei Dai
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
34
|
Purushothaman S, Meola M, Egli A. Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics. Int J Mol Sci 2022; 23:9834. [PMID: 36077231 PMCID: PMC9456280 DOI: 10.3390/ijms23179834] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 07/04/2022] [Revised: 08/24/2022] [Accepted: 08/26/2022] [Indexed: 12/21/2022] Open
Abstract
Whole genome sequencing (WGS) provides the highest resolution for genome-based species identification and can provide insight into the antimicrobial resistance and virulence potential of a single microbiological isolate during the diagnostic process. In contrast, metagenomic sequencing allows the analysis of DNA segments from multiple microorganisms within a community, either using an amplicon- or shotgun-based approach. However, WGS and shotgun metagenomic data are rarely combined, although such an approach may generate additive or synergistic information, critical for, e.g., patient management, infection control, and pathogen surveillance. To produce a combined workflow with actionable outputs, we need to understand the pre-to-post analytical process of both technologies. This will require specific databases storing interlinked sequencing and metadata, and also involves customized bioinformatic analytical pipelines. This review article will provide an overview of the critical steps and potential clinical application of combining WGS and metagenomics together for microbiological diagnosis.
Affiliation(s)
- Srinithi Purushothaman
  - Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
  - Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
- Marco Meola
  - Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
  - Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
  - Swiss Institute of Bioinformatics, University of Basel, 4031 Basel, Switzerland
- Adrian Egli
  - Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
  - Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
  - Clinical Bacteriology and Mycology, University Hospital Basel, 4031 Basel, Switzerland
|
35
|
Zhang L, Jonscher KR, Zhang Z, Xiong Y, Mueller RS, Friedman JE, Pan C. Islet autoantibody seroconversion in type-1 diabetes is associated with metagenome-assembled genomes in infant gut microbiomes. Nat Commun 2022; 13:3551. [PMID: 35729161 PMCID: PMC9213500 DOI: 10.1038/s41467-022-31227-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/15/2021] [Accepted: 06/09/2022] [Indexed: 12/13/2022] Open
Abstract
The immune system of some genetically susceptible children can be triggered by certain environmental factors to produce islet autoantibodies (IA) against pancreatic β cells, which greatly increases their risk for Type-1 diabetes. An environmental factor under active investigation is the gut microbiome due to its important role in immune system education. Here, we study gut metagenomes that are de-novo-assembled in 887 at-risk children in the Environmental Determinants of Diabetes in the Young (TEDDY) project. Our results reveal a small set of core protein families, present in >50% of the subjects, which account for 64% of the sequencing reads. Time-series binning generates 21,536 high-quality metagenome-assembled genomes (MAGs) from 883 species, including 176 species that hitherto have no MAG representation in previous comprehensive human microbiome surveys. IA seroconversion is positively associated with 2373 MAGs and negatively with 1549 MAGs. Comparative genomics analysis identifies lipopolysaccharides biosynthesis in Bacteroides MAGs and sulfate reduction in Anaerostipes MAGs as functional signatures of MAGs with positive IA-association. The functional signatures in the MAGs with negative IA-association include carbohydrate degradation in lactic acid bacteria MAGs and nitrate reduction in Escherichia MAGs. Overall, our results show a distinct set of gut microorganisms associated with IA seroconversion and uncovered the functional genomics signatures of these IA-associated microorganisms.
Affiliation(s)
- Li Zhang
  - Harold Hamm Diabetes Center, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, USA
- Karen R Jonscher
  - Harold Hamm Diabetes Center, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Zuyuan Zhang
  - School of Computer Science, University of Oklahoma, Norman, OK, USA
- Yi Xiong
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, USA
- Ryan S Mueller
  - Department of Microbiology, Oregon State University, Corvallis, OR, USA
- Jacob E Friedman
  - Harold Hamm Diabetes Center, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Department of Biochemistry and Molecular Biology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Department of Physiology, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
- Chongle Pan
  - Harold Hamm Diabetes Center, University of Oklahoma Health Sciences Center, Oklahoma City, OK, USA
  - Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, USA
  - School of Computer Science, University of Oklahoma, Norman, OK, USA
|
36
|
Lourenco JM, Welch CB. Using microbiome information to understand and improve animal performance. ITALIAN JOURNAL OF ANIMAL SCIENCE 2022. [DOI: 10.1080/1828051x.2022.2077147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
|
37
|
Liu P, Hu S, He Z, Feng C, Dong G, An S, Liu R, Xu F, Chen Y, Ying X. Towards Strain-Level Complexity: Sequencing Depth Required for Comprehensive Single-Nucleotide Polymorphism Analysis of the Human Gut Microbiome. Front Microbiol 2022; 13:828254. [PMID: 35602026 PMCID: PMC9119422 DOI: 10.3389/fmicb.2022.828254] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2021] [Accepted: 03/25/2022] [Indexed: 11/13/2022] Open
Abstract
Intestinal bacterial strains play crucial roles in maintaining host health. Researchers have increasingly recognized the importance of strain-level analysis in metagenomic studies. Many analysis tools and several cutting-edge sequencing techniques like single cell sequencing have been proposed to decipher strains in metagenomes. However, strain-level complexity is far from well characterized to date. As the indicator of strain-level complexity, metagenomic single-nucleotide polymorphisms (SNPs) have been utilized to disentangle conspecific strains. Many SNP-based tools have been developed to identify strains in metagenomes. However, the sequencing depth sufficient for SNP and strain-level analysis remains unclear. We conducted ultra-deep sequencing of the human gut microbiome and constructed an unbiased framework to perform reliable SNP analysis. SNP profiles of the human gut metagenome by ultra-deep sequencing were obtained. SNPs identified from conventional and ultra-deep sequencing data were thoroughly compared and the relationship between SNP identification and sequencing depth was investigated. The results show that the commonly used shallow-depth sequencing is incapable of supporting systematic metagenomic SNP discovery. In contrast, ultra-deep sequencing could detect more functionally important SNPs, which leads to reliable downstream analyses and novel discoveries. We also constructed a machine learning model to provide guidance for researchers to determine the optimal sequencing depth for their projects (SNPsnp, https://github.com/labomics/SNPsnp). To conclude, the SNP profiles based on ultra-deep sequencing data extend current knowledge on metagenomics and highlight the importance of evaluating sequencing depth before starting SNP analysis. This study provides new ideas and references for future strain-level investigations.
Affiliation(s)
- Pu Liu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Shuofeng Hu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Zhen He
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Chao Feng
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Guohua Dong
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Sijing An
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Runyan Liu
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Fang Xu
  - Yongkang First People’s Hospital, Yongkang, China
- Yaowen Chen
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
- Xiaomin Ying
  - Center for Computational Biology, Beijing Institute of Basic Medical Sciences, Beijing, China
|
38
|
Altermann E, Tegetmeyer HE, Chanyi RM. The evolution of bacterial genome assemblies - where do we need to go next? MICROBIOME RESEARCH REPORTS 2022; 1:15. [PMID: 38046358 PMCID: PMC10688829 DOI: 10.20517/mrr.2022.02] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/05/2022] [Revised: 03/08/2022] [Accepted: 03/24/2022] [Indexed: 12/05/2023]
Abstract
Genome sequencing has fundamentally changed our ability to decipher and understand the genetic blueprint of life and how it changes over time in response to environmental and evolutionary pressures. The pace of sequencing is still increasing in response to advances in technologies, paving the way from sequenced genes to genomes to metagenomes to metagenome-assembled genomes (MAGs). Our ability to interrogate increasingly complex microbial communities through metagenomes and MAGs is opening up a tantalizing future where we may be able to delve deeper into the mechanisms and genetic responses emerging over time. In the near future, we will be able to detect MAG assembly variations within strains originating from diverging sub-populations, and one of the emerging challenges will be to capture these variations in a biologically relevant way. Here, we present a brief overview of sequencing technologies and the current state of metagenome assemblies to suggest the need to develop new data formats that can capture the genetic variations within strains and communities, which previously remained invisible due to sequencing technology limitations.
Affiliation(s)
- Eric Altermann
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
  - Massey University, School of Veterinary Science, Palmerston North 4100, New Zealand
- Halina E. Tegetmeyer
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Center for Biotechnology, Bielefeld University, Universitaetsstrasse 27, Bielefeld 33615, Germany
- Ryan M. Chanyi
  - AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
  - Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
|
39
|
Strain identification and quantitative analysis in microbial communities. J Mol Biol 2022; 434:167582. [DOI: 10.1016/j.jmb.2022.167582] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/01/2022] [Revised: 03/31/2022] [Accepted: 04/03/2022] [Indexed: 12/14/2022]
|
40
|
Smith BJ, Piceno Y, Zydek M, Zhang B, Syriani LA, Terdiman JP, Kassam Z, Ma A, Lynch SV, Pollard KS, El-Nachef N. Strain-resolved analysis in a randomized trial of antibiotic pretreatment and maintenance dose delivery mode with fecal microbiota transplant for ulcerative colitis. Sci Rep 2022; 12:5517. [PMID: 35365713 PMCID: PMC8976058 DOI: 10.1038/s41598-022-09307-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 09/02/2021] [Accepted: 03/16/2022] [Indexed: 01/04/2023] Open
Abstract
Fecal microbiota transplant is a promising therapy for ulcerative colitis. Parameters maximizing effectiveness and tolerability are not yet clear, and it is not known how important the transmission of donor microbes to patients is. Here (clinicaltrials.gov: NCT03006809) we have tested the effects of antibiotic pretreatment and compared two modes of maintenance dose delivery, capsules versus enema, in a randomized, pilot, open-label, 2 × 2 factorial design with 22 patients analyzed with mild to moderate UC. Clinically, the treatment was well-tolerated with a favorable safety profile. Of patients who received antibiotic pretreatment, 6 of 11 experienced remission after 6 weeks of treatment, versus 2 of 11 non-pretreated patients (log odds ratio: 1.69, 95% confidence interval: −0.25 to 3.62). No significant differences were found between maintenance dosing via capsules versus enema. In exploratory analyses, microbiome turnover at both the species and strain levels was extensive and significantly more pronounced in the pretreated patients. Associations were also revealed between taxonomic turnover and changes in the composition of primary and secondary bile acids. Together these findings suggest that antibiotic pretreatment contributes to microbiome engraftment and possibly clinical effectiveness, and validate longitudinal strain tracking as a powerful way to monitor the dynamics and impact of microbiota transfer.
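The remission counts quoted in this abstract (6 of 11 pretreated versus 2 of 11 non-pretreated) are enough to recompute the reported log odds ratio. The following is a minimal sketch, assuming a plain Wald interval on the 2 × 2 table; the abstract does not state which interval method the authors used, and the function name `log_odds_ratio_wald` is hypothetical.

```python
import math


def log_odds_ratio_wald(a, b, c, d, z=1.96):
    """Log odds ratio and Wald interval for a 2 x 2 table.

    a, b: events and non-events in group 1 (here: pretreated)
    c, d: events and non-events in group 2 (here: non-pretreated)
    z:    normal quantile; 1.96 gives an approximate 95% interval
    The Wald method is an assumption, not taken from the abstract.
    """
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, log_or - z * se, log_or + z * se


# 6 of 11 pretreated in remission; 2 of 11 non-pretreated in remission.
log_or, lower, upper = log_odds_ratio_wald(6, 5, 2, 9)
print(f"log OR = {log_or:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under this assumption the sketch gives values that round to the figures quoted in the abstract. The interval spans zero, which is consistent with the abstract's cautious wording about clinical effectiveness.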
Affiliation(s)
- Byron J Smith
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
- Martin Zydek
  - Division of Gastroenterology, University of California, San Francisco, CA, USA
- Bing Zhang
  - Division of Gastroenterology, University of California, San Francisco, CA, USA
  - Division of Gastrointestinal and Liver Diseases, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
- Lara Aboud Syriani
  - College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, USA
- Jonathan P Terdiman
  - Division of Gastroenterology, University of California, San Francisco, CA, USA
- Averil Ma
  - Department of Medicine, University of California, San Francisco, CA, USA
- Susan V Lynch
  - Division of Gastroenterology, University of California, San Francisco, CA, USA
  - Benioff Center for Microbiome Medicine, University of California, San Francisco, CA, USA
- Katherine S Pollard
  - Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, CA, USA
  - Chan Zuckerberg Biohub, San Francisco, CA, USA
- Najwa El-Nachef
  - Division of Gastroenterology, University of California, San Francisco, CA, USA
|
41
|
van Dijk LR, Walker BJ, Straub TJ, Worby CJ, Grote A, Schreiber HL, Anyansi C, Pickering AJ, Hultgren SJ, Manson AL, Abeel T, Earl AM. StrainGE: a toolkit to track and characterize low-abundance strains in complex microbial communities. Genome Biol 2022; 23:74. [PMID: 35255937 PMCID: PMC8900328 DOI: 10.1186/s13059-022-02630-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 05/28/2021] [Accepted: 02/09/2022] [Indexed: 01/21/2023] Open
Abstract
Human-associated microbial communities comprise not only complex mixtures of bacterial species, but also mixtures of conspecific strains, the implications of which are mostly unknown since strain level dynamics are underexplored due to the difficulties of studying them. We introduce the Strain Genome Explorer (StrainGE) toolkit, which deconvolves strain mixtures and characterizes component strains at the nucleotide level from short-read metagenomic sequencing with higher sensitivity and resolution than other tools. StrainGE is able to identify strains at 0.1x coverage and detect variants for multiple conspecific strains within a sample from coverages as low as 0.5x.
Affiliation(s)
- Lucas R. van Dijk
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE The Netherlands
- Bruce J. Walker
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Applied Invention, Cambridge, MA USA
- Timothy J. Straub
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115 USA
- Colin J. Worby
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
- Alexandra Grote
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
- Henry L. Schreiber
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110 USA
  - Center for Women’s Infectious Disease Research (CWIDR), Washington University School of Medicine, St. Louis, MO 63110 USA
- Christine Anyansi
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE The Netherlands
- Amy J. Pickering
  - Department of Civil and Environmental Engineering, University of California, Berkeley, Berkeley, CA 94720 USA
  - Stuart B. Levy Center for Integrated Management of Antimicrobial Resistance (Levy CIMAR), Tufts University, Boston, MA USA
- Scott J. Hultgren
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110 USA
  - Center for Women’s Infectious Disease Research (CWIDR), Washington University School of Medicine, St. Louis, MO 63110 USA
- Abigail L. Manson
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
- Thomas Abeel
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE The Netherlands
- Ashlee M. Earl
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
|
42
|
Chen W, Radford D, Hambleton S. Towards Improved Detection and Identification of Rust Fungal Pathogens in Environmental Samples Using a Metabarcoding Approach. PHYTOPATHOLOGY 2022; 112:535-548. [PMID: 34384241 DOI: 10.1094/phyto-01-21-0020-r] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Indexed: 06/13/2023]
Abstract
The dispersion of fungal inocula such as the airborne spores of rust fungi (Pucciniales) can be monitored through metabarcoding of the internal transcribed spacer 2 (ITS2) of the rRNA gene in environmental DNAs. This method is largely dependent on a high-quality reference database (refDB) and primers with proper taxonomic coverage and specificity. For this study, a curated ITS2 reference database (named CR-ITS2-refDB) comprising representatives of the major cereal rust fungi and phylogenetically related species was compiled. Interspecific and intraspecific variation analyses suggested that the ITS2 region had reasonable discriminating power for the majority of the Puccinia species or species complexes in the database. In silico evaluation of nine forward and seven reverse ITS2 primers, including three newly designed, revealed marked variation in DNA amplification efficiency for the rusts. We validated the theoretical assessment of rust-enhanced (Rust2inv/ITS4var_H) and universal fungal (ITS9F/ITS4) ITS2 primer pairs by profiling the airborne rust fungal communities from environmental samples via a metabarcoding approach. Species- or subspecies-level identification of the rusts was improved by use of CR-ITS2-refDB and the Automated Oligonucleotide Design Pipeline (AODP), which identified all mutations distinguishing highly conserved DNA markers between close relatives. A generic bioinformatics pipeline was developed, including all steps used in this study from in silico evaluation of primers to accurate identification of short metabarcodes at the level of interest for defining phytopathogens. The results highlight the importance of primer selection, refDBs that are resolved to reflect phylogenetic relationships, and the use of AODP for improving the reliability of metabarcoding in phytopathogen biosurveillance.
Affiliation(s)
- Wen Chen
  - Biodiversity and Bioresources, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, K1A 0C6, Canada
- Devon Radford
  - Biodiversity and Bioresources, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, K1A 0C6, Canada
- Sarah Hambleton
  - Biodiversity and Bioresources, Ottawa Research and Development Centre, Agriculture and Agri-Food Canada, Ottawa, ON, K1A 0C6, Canada
|
43
|
Billington C, Kingsbury JM, Rivas L. Metagenomics Approaches for Improving Food Safety: A Review. J Food Prot 2022; 85:448-464. [PMID: 34706052 DOI: 10.4315/jfp-21-301] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 08/02/2021] [Accepted: 10/21/2021] [Indexed: 11/11/2022]
Abstract
Advancements in next-generation sequencing technology have dramatically reduced the cost and increased the ease of microbial whole genome sequencing. This approach is revolutionizing the identification and analysis of foodborne microbial pathogens, facilitating expedited detection and mitigation of foodborne outbreaks, improving public health outcomes, and limiting costly recalls. However, next-generation sequencing is still anchored in the traditional laboratory practice of the selection and culture of a single isolate. Metagenomic-based approaches, including metabarcoding and shotgun and long-read metagenomics, are part of the next disruptive revolution in food safety diagnostics and offer the potential to directly identify entire microbial communities in a single food, ingredient, or environmental sample. In this review, metagenomic-based approaches are introduced and placed within the context of conventional detection and diagnostic techniques, and essential considerations for undertaking metagenomic assays and data analysis are described. Recent applications of the use of metagenomics for food safety are discussed alongside current limitations and knowledge gaps and new opportunities arising from the use of this technology.
Affiliation(s)
- Craig Billington
  - Institute of Environmental Science and Research, 27 Creyke Road, Ilam, Christchurch 8041, New Zealand
- Joanne M Kingsbury
  - Institute of Environmental Science and Research, 27 Creyke Road, Ilam, Christchurch 8041, New Zealand
- Lucia Rivas
  - Institute of Environmental Science and Research, 27 Creyke Road, Ilam, Christchurch 8041, New Zealand
|
44
|
Ventolero MF, Wang S, Hu H, Li X. Computational analyses of bacterial strains from shotgun reads. Brief Bioinform 2022; 23:6524011. [PMID: 35136954 DOI: 10.1093/bib/bbac013] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 11/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022] Open
Abstract
Shotgun sequencing is routinely employed to study bacteria in microbial communities. With the vast amount of shotgun sequencing reads generated in a metagenomic project, it is crucial to determine the microbial composition at the strain level. This study investigated 20 computational tools that attempt to infer bacterial strain genomes from shotgun reads. For the first time, we discussed the methodology behind these tools. We also systematically evaluated six novel-strain-targeting tools on the same datasets and found that BHap, mixtureS and StrainFinder performed better than other tools. Because the performance of the best tools is still suboptimal, we discussed future directions that may address the limitations.
Affiliation(s)
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA
|
45
|
Ruan Z, Zou S, Wang Z, Zhang L, Chen H, Wu Y, Jia H, Draz MS, Feng Y. Toward accurate diagnosis and surveillance of bacterial infections using enhanced strain-level metagenomic next-generation sequencing of infected body fluids. Brief Bioinform 2022; 23:6519793. [PMID: 35108376 DOI: 10.1093/bib/bbac004] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/28/2021] [Revised: 12/17/2021] [Accepted: 01/04/2022] [Indexed: 12/12/2022] Open
Abstract
Metagenomic next-generation sequencing (mNGS) enables comprehensive pathogen detection and has become increasingly popular in clinical diagnosis. The distinct pathogenic traits between strains require mNGS to achieve a strain-level resolution, but an equivocal concept of 'strain' as well as the low pathogen loads in most clinical specimens hinders such strain awareness. Here we introduce a metagenomic intra-species typing (MIST) tool (https://github.com/pandafengye/MIST), which hierarchically organizes reference genomes based on average nucleotide identity (ANI) and performs maximum likelihood estimation to infer the strain-level compositional abundance. In silico analysis using synthetic datasets showed that MIST accurately predicted the strain composition at a 99.9% ANI resolution with a sequencing depth of merely 0.001×. When applying MIST to 359 culture-positive and 359 culture-negative real-world specimens of infected body fluids, we found that the presence of multiple strains reached considerable frequencies (30.39%-93.22%), which were otherwise underestimated by current diagnostic techniques due to their limited resolution. Several high-risk clones were identified to be prevalent across samples, including Acinetobacter baumannii sequence type (ST)208/ST195, Staphylococcus aureus ST22/ST398 and Klebsiella pneumoniae ST11/ST15, indicating potential outbreak events occurring in the clinical settings. Interestingly, contaminations caused by the engineered Escherichia coli strains K-12 and BL21 throughout the mNGS datasets were also identified by MIST instead of the statistical decontamination approach. Our study systematically characterized the infected body fluids at the strain level for the first time. Extension of mNGS testing to the strain level can greatly benefit clinical diagnosis of bacterial infections, including the identification of multi-strain infection, decontamination and infection control surveillance.
Affiliation(s)
- Zhi Ruan
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shengmei Zou
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Zeyu Wang
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Luhan Zhang
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Hangfei Chen
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yuye Wu
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Huiqiong Jia
  - Department of Laboratory Medicine, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Mohamed S Draz
  - Department of Medicine, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Ye Feng
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
46
Hillestad EMR, van der Meeren A, Nagaraja BH, Bjørsvik BR, Haleem N, Benitez-Paez A, Sanz Y, Hausken T, Lied GA, Lundervold A, Berentsen B. Gut bless you: The microbiota-gut-brain axis in irritable bowel syndrome. World J Gastroenterol 2022; 28:412-431. [PMID: 35125827 PMCID: PMC8790555 DOI: 10.3748/wjg.v28.i4.412] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 04/17/2021] [Revised: 06/24/2021] [Accepted: 01/13/2022] [Indexed: 12/16/2022]
Abstract
Irritable bowel syndrome (IBS) is a common clinical label for medically unexplained gastrointestinal symptoms, recently described as a disturbance of the microbiota-gut-brain axis. Despite decades of research, the pathophysiology of this highly heterogeneous disorder remains elusive. However, a dramatic change in the understanding of the underlying pathophysiological mechanisms surfaced when the importance of gut microbiota protruded the scientific picture. Are we getting any closer to understanding IBS' etiology, or are we drowning in unspecific, conflicting data because we possess limited tools to unravel the cluster of secrets our gut microbiota is concealing? In this comprehensive review we are discussing some of the major important features of IBS and their interaction with gut microbiota, clinical microbiota-altering treatment such as the low FODMAP diet and fecal microbiota transplantation, neuroimaging and methods in microbiota analyses, and current and future challenges with big data analysis in IBS.
Affiliation(s)
- Eline Margrete Randulff Hillestad
  - Department of Clinical Medicine, University of Bergen, Bergen 5021, Norway
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
- Aina van der Meeren
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
- Bharat Halandur Nagaraja
  - Mohn Medical Imaging and Visualization Center, Department of Radiology, Haukeland University Hospital, Bergen 5021, Norway
- Ben René Bjørsvik
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
  - Mohn Medical Imaging and Visualization Center, Department of Radiology, Haukeland University Hospital, Bergen 5021, Norway
- Noman Haleem
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
  - Mohn Medical Imaging and Visualization Center, Department of Radiology, Haukeland University Hospital, Bergen 5021, Norway
- Alfonso Benitez-Paez
  - Host-Microbe Interactions in Metabolic Health Laboratory, Principe Felipe Research Center, Valencia 46012, Spain
- Yolanda Sanz
  - Microbial Ecology, Nutrition and Health Research Unit, Institute of Agrochemistry and Food Technology, National Research Council, Paterna-Valencia 46980, Spain
- Trygve Hausken
  - Department of Clinical Medicine, University of Bergen, Bergen 5021, Norway
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
- Gülen Arslan Lied
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
  - Center for Nutrition, Department of Clinical Medicine, University of Bergen, Bergen 5021, Norway
- Arvid Lundervold
  - Mohn Medical Imaging and Visualization Center, Department of Radiology, Haukeland University Hospital, Bergen 5021, Norway
  - Department of Biomedicine, University of Bergen, Bergen 5021, Norway
- Birgitte Berentsen
  - Department of Clinical Medicine, University of Bergen, Bergen 5021, Norway
  - National Center for Functional Gastrointestinal Disorders, Department of Medicine, Haukeland University Hospital, Bergen 5021, Norway
47
Zhang L, Chen F, Zeng Z, Xu M, Sun F, Yang L, Bi X, Lin Y, Gao Y, Hao H, Yi W, Li M, Xie Y. Advances in Metagenomics and Its Application in Environmental Microorganisms. Front Microbiol 2022; 12:766364. [PMID: 34975791 PMCID: PMC8719654 DOI: 10.3389/fmicb.2021.766364] [Citation(s) in RCA: 64] [Impact Index Per Article: 32.0] [Received: 08/29/2021] [Accepted: 11/18/2021] [Indexed: 01/04/2023]
Abstract
Metagenomics is a new approach to study microorganisms obtained from a specific environment by functional gene screening or sequencing analysis. Metagenomics studies focus on microbial diversity, community constitute, genetic and evolutionary relationships, functional activities, and interactions and relationships with the environment. Sequencing technologies have evolved from shotgun sequencing to high-throughput, next-generation sequencing (NGS), and third-generation sequencing (TGS). NGS and TGS have shown the advantage of rapid detection of pathogenic microorganisms. With the help of new algorithms, we can better perform the taxonomic profiling and gene prediction of microbial species. Functional metagenomics is helpful to screen new bioactive substances and new functional genes from microorganisms and microbial metabolites. In this article, basic steps, classification, and applications of metagenomics are reviewed.
Affiliation(s)
- Lu Zhang
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- FengXin Chen
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Zhan Zeng
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
- Mengjiao Xu
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Fangfang Sun
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Liu Yang
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Xiaoyue Bi
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Yanjie Lin
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
- YuanJiao Gao
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- HongXiao Hao
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Wei Yi
  - Department of Gynecology and Obstetrics, Beijing Ditan Hospital, Capital Medical University, Beijing, China
- Minghui Li
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
- Yao Xie
  - Department of Hepatology Division 2, Beijing Ditan Hospital, Capital Medical University, Beijing, China
  - Department of Hepatology Division 2, Peking University Ditan Teaching Hospital, Beijing, China
48
Bani A, Randall KC, Clark DR, Gregson BH, Henderson DK, Losty EC, Ferguson RM. Mind the gaps: What do we know about how multiple chemical stressors impact freshwater aquatic microbiomes? Adv Ecol Res 2022. [DOI: 10.1016/bs.aecr.2022.09.003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
49
Lawrence D, Campbell DE, Schriefer LA, Rodgers R, Walker FC, Turkin M, Droit L, Parkes M, Handley SA, Baldridge MT. Single-cell genomics for resolution of conserved bacterial genes and mobile genetic elements of the human intestinal microbiota using flow cytometry. Gut Microbes 2022; 14:2029673. [PMID: 35130125 PMCID: PMC8824198 DOI: 10.1080/19490976.2022.2029673] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/19/2021] [Revised: 12/03/2021] [Accepted: 01/07/2022] [Indexed: 02/04/2023]
Abstract
As our understanding of the importance of the human microbiota in health and disease grows, so does our need to carefully resolve and delineate its genomic content. 16S rRNA gene-based analyses yield important insights into taxonomic composition, and metagenomics-based approaches reveal the functional potential of microbial communities. However, these methods generally fail to directly link genetic features, including bacterial genes and mobile genetic elements, to each other and to their source bacterial genomes. Further, they are inadequate to capture the microdiversity present within a genus, species, or strain of bacteria within these complex communities. Here, we present a method utilizing fluorescence-activated cell sorting for isolation of single bacterial cells, amplifying their genomes, screening them by 16S rRNA gene analysis, and selecting cells for genomic sequencing. We apply this method to both a cultured laboratory strain of Escherichia coli and human stool samples. Our analyses reveal the capacity of this method to provide nearly complete coverage of bacterial genomes when applied to isolates and partial genomes of bacterial species recovered from complex communities. Additionally, this method permits exploration and comparison of conserved and variable genomic features between individual cells. We generate assemblies of novel genomes within the Ruminococcaceae family and the Holdemanella genus by combining several 16S rRNA gene-matched single cells, and report novel prophages and conjugative transposons for both Bifidobacterium and Ruminococcaceae. Thus, we demonstrate an approach for flow cytometric separation and sequencing of single bacterial cells from the human microbiota, which yields a variety of critical insights into both the functional potential of individual microbes and the variation among those microbes. 
This method definitively links a variety of conserved and mobile genomic features, and can be extended to further resolve diverse elements present in the human microbiota.
Affiliation(s)
- Dylan Lawrence
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Danielle E. Campbell
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Lawrence A. Schriefer
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Rachel Rodgers
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Forrest C. Walker
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Marissa Turkin
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
- Lindsay Droit
  - Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Miles Parkes
  - Division of Gastroenterology Addenbrooke’s Hospital and Department of Medicine, University of Cambridge, Cambridge, UK
- Scott A. Handley
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
  - Department of Pathology & Immunology, Washington University School of Medicine, St. Louis, MO, USA
- Megan T. Baldridge
  - Department of Medicine, Division of Infectious Diseases, Washington University School of Medicine, St. Louis, MO, USA
  - Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, USA
50
Siekaniec G, Roux E, Lemane T, Guédon E, Nicolas J. Identification of isolated or mixed strains from long reads: a challenge met on Streptococcus thermophilus using a MinION sequencer. Microb Genom 2021; 7. [PMID: 34812718 PMCID: PMC8743539 DOI: 10.1099/mgen.0.000654] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023]
Abstract
This study aimed to provide efficient recognition of bacterial strains on personal computers from MinION (Nanopore) long read data. Thanks to the fall in sequencing costs, the identification of bacteria can now proceed by whole genome sequencing. MinION is a fast, but highly error-prone sequencing device and it is a challenge to successfully identify the strain content of unknown simple or complex microbial samples. It is heavily constrained by memory management and fast access to the read and genome fragments. Our strategy involves three steps: indexing of known genomic sequences for a given or several bacterial species; a request process to assign a read to a strain by matching it to the closest reference genomes; and a final step looking for a minimum set of strains that best explains the observed reads. We have applied our method, called ORI, on 77 strains of Streptococcus thermophilus. We worked on several genomic distances and obtained a detailed classification of the strains, together with a criterion that allows merging of what we termed 'sibling' strains, only separated by a few mutations. Overall, isolated strains can be safely recognized from MinION data. For mixtures of several non-sibling strains, results depend on strain abundance.
Affiliation(s)
- Grégoire Siekaniec
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
  - INRAE, Institut Agro, STLO, F-35000, Rennes, France
- Emeline Roux
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
  - CALBINOTOX (Composés ALimentaire BIofonctionnalités et risques NeuTOXiques) EA7488 Université de Lorraine, France
- Téo Lemane
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
- Eric Guédon
  - INRAE, Institut Agro, STLO, F-35000, Rennes, France
  - *Correspondence: Eric Guédon,
- Jacques Nicolas
  - Univ Rennes, INRIA, Campus de Beaulieu 35042 Rennes cedex, Rennes, France
  - *Correspondence: Jacques Nicolas,