1
|
McKnight DJE, Wong-Bajracharya J, Okoh EB, Snijders F, Lidbetter F, Webster J, Haughton M, Darling AE, Djordjevic SP, Bogema DR, Chapman TA. Xanthomonas rydalmerensis sp. nov., a non-pathogenic member of Group 1 Xanthomonas. Int J Syst Evol Microbiol 2024; 74:006294. [PMID: 38536071 PMCID: PMC10995728 DOI: 10.1099/ijsem.0.006294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 03/04/2024] [Indexed: 04/07/2024]
Abstract
Five bacterial isolates were recovered from Fragaria × ananassa in 1976 in Rydalmere, Australia, during routine biosecurity surveillance. Initially, the results of biochemical characterisation indicated that these isolates represented members of the genus Xanthomonas. To determine their species, further analysis was conducted using both phenotypic and genotypic approaches. Phenotypic analysis involved using MALDI-TOF MS and BIOLOG GEN III microplates, which confirmed that the isolates represented members of the genus Xanthomonas but did not allow them to be classified with respect to species. Genome relatedness indices and the results of extensive phylogenetic analysis confirmed that the isolates were members of the genus Xanthomonas and represented a novel species. On the basis of the minimal presence of virulence-associated factors typically found in genomes of members of the genus Xanthomonas, we suggest that these isolates are non-pathogenic. This conclusion was supported by the results of a pathogenicity assay. On the basis of these findings, we propose the name Xanthomonas rydalmerensis, with DAR 34855T = ICMP 24941 as the type strain.
|
2
|
Jaya FR, Brito BP, Darling AE. Evaluation of recombination detection methods for viral sequencing. Virus Evol 2023; 9:vead066. [PMID: 38131005 PMCID: PMC10734630 DOI: 10.1093/ve/vead066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/03/2023] [Accepted: 11/15/2023] [Indexed: 12/23/2023]
Abstract
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can impact evolutionary estimations or complicate their interpretation. Therefore, identifying signals for recombination in sequencing data is a key prerequisite to further analyses. A repertoire of recombination detection methods (RDMs) has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed eight RDMs, namely PhiPack (Profile), 3SEQ, GENECONV, recombination detection program (RDP) (OpenRDP), MaxChi (OpenRDP), Chimaera (OpenRDP), UCHIME (VSEARCH), and gmos, to determine if any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Furthermore, we provide a practical example for the analysis and validation of empirical data. We find that RDMs need to be scalable, use an analytical approach and resolution that is suitable for the intended research application, and be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, the benefits and shortcomings of each assessed method, and future considerations for recombination detection methods for the assessment of large-scale viral sequencing data.
|
3
|
Brito BP, Frost MJ, Anantanawat K, Jaya F, Batterham T, Djordjevic SP, Chang WS, Holmes EC, Darling AE, Kirkland PD. Expanding the range of the respiratory infectome in Australian feedlot cattle with and without respiratory disease using metatranscriptomics. Microbiome 2023; 11:158. [PMID: 37491320 PMCID: PMC10367309 DOI: 10.1186/s40168-023-01591-1] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 10/31/2022] [Accepted: 06/03/2023] [Indexed: 07/27/2023]
Abstract
BACKGROUND Bovine respiratory disease (BRD) is one of the most common diseases in intensively managed cattle, often resulting in high morbidity and mortality. Although several pathogens have been isolated and extensively studied, the complete infectome of the respiratory complex consists of a more extensive range of unrecognised species. Here, we used total RNA sequencing (i.e., metatranscriptomics) of nasal and nasopharyngeal swabs collected from animals with and without BRD from two cattle feedlots in Australia. RESULTS A high abundance of bovine nidovirus, influenza D, bovine rhinitis A and bovine coronavirus was found in the samples. Additionally, we obtained the complete or near-complete genome of bovine rhinitis B, enterovirus E1, bovine viral diarrhea virus (sub-genotypes 1a and 1c) and bovine respiratory syncytial virus, and partial sequences of other viruses. A new species of paramyxovirus was also identified. Overall, the most abundant RNA virus was the bovine nidovirus. Characterisation of bacterial species from the transcriptome revealed a high abundance and diversity of Mollicutes in BRD cases and unaffected control animals. Of the non-Mollicutes species, Histophilus somni was detected, whereas there was a low abundance of Mannheimia haemolytica. CONCLUSION This study highlights the use of untargeted sequencing approaches to study the unrecognised range of microorganisms present in healthy or diseased animals and the need to study previously uncultured viral species that may have an important role in cattle respiratory disease.
|
4
|
Krishnan S, DeMaere MZ, Beck D, Ostrowski M, Seymour JR, Darling AE. Rhometa: Population recombination rate estimation from metagenomic read datasets. PLoS Genet 2023; 19:e1010683. [PMID: 36972309 PMCID: PMC10079220 DOI: 10.1371/journal.pgen.1010683] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/25/2022] [Revised: 04/06/2023] [Accepted: 02/27/2023] [Indexed: 03/29/2023]
Abstract
Prokaryotic evolution is influenced by the exchange of genetic information between species through a process referred to as recombination. The rate of recombination is a useful measure for the adaptive capacity of a prokaryotic population. We introduce Rhometa (https://github.com/sid-krish/Rhometa), a new software package to determine recombination rates from shotgun sequencing reads of metagenomes. It extends the composite likelihood approach for population recombination rate estimation and enables the analysis of modern short-read datasets. We evaluated Rhometa over a broad range of sequencing depths and complexities, using simulated and real experimental short-read data aligned to external reference genomes. Rhometa offers a comprehensive solution for determining population recombination rates from contemporary metagenomic read datasets. Rhometa extends the capabilities of conventional sequence-based composite likelihood population recombination rate estimators to include modern aligned metagenomic read datasets with diverse sequencing depths, thereby enabling the effective application of these techniques and their high accuracy rates to the field of metagenomics. Using simulated datasets, we show that our method performs well, with its accuracy improving with increasing numbers of genomes. Rhometa was validated on a real S. pneumoniae transformation experiment, where we show that it obtains plausible estimates of the rate of recombination. Finally, the program was also run on ocean surface water metagenomic datasets, through which we demonstrate that the program works on uncultured metagenomic datasets.
|
5
|
Gaio D, DeMaere MZ, Anantanawat K, Eamens GJ, Falconer L, Chapman TA, Djordjevic S, Darling AE. Phylogenetic diversity analysis of shotgun metagenomic reads describes gut microbiome development and treatment effects in the post-weaned pig. PLoS One 2022; 17:e0270372. [PMID: 35749534 PMCID: PMC9232140 DOI: 10.1371/journal.pone.0270372] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/23/2022] [Accepted: 06/08/2022] [Indexed: 11/18/2022]
Abstract
Intensive farming practices can increase exposure of animals to infectious agents against which antibiotics are used. Orally administered antibiotics are well known to cause dysbiosis. To counteract dysbiotic effects, numerous studies in the past two decades sought to understand whether probiotics are a valid tool to help re-establish a healthy gut microbial community after antibiotic treatment. Although dysbiotic effects of antibiotics are well investigated, little is known about the effects of intramuscular antibiotic treatment on the gut microbiome, and few studies have attempted to study treatment effects using phylogenetic diversity analysis techniques. In this study we sought to determine the effects of two probiotic treatments and one intramuscularly administered antibiotic treatment on the developing gut microbiome of post-weaning piglets between their 3rd and 9th week of life. Shotgun metagenomic sequences from over 800 faecal time-series samples derived from 126 post-weaning piglets and 42 sows were analysed in a phylogenetic framework. Differences between individual hosts, such as breed, litter, and age, were found to be important contributors to variation in the community composition. Host age was the dominant factor in shaping the gut microbiota of piglets after weaning. The post-weaning pig gut microbiome appeared to follow a highly structured developmental program with characteristic post-weaning changes that can distinguish hosts that were born as little as two days apart in the second month of life. Treatment effects of the antibiotic and probiotic treatments were found but were subtle and included a higher representation of Mollicutes associated with intramuscular antibiotic treatment, and an increase of Lactobacillus associated with probiotic treatment. The discovery of correlations between experimental factors and microbial community composition is more commonly addressed with OTU-based methods and rarely analysed via phylogenetic diversity measures.
The latter method, although less intuitive than the former, suffers less from library size normalization biases, and it proved to be instrumental in this study for the discovery of correlations between microbiome composition and host and treatment factors.
|
6
|
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 120] [Impact Index Per Article: 60.0] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains were still challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
|
7
|
Gaio D, Anantanawat K, To J, Liu M, Monahan L, Darling AE. Hackflex: low-cost, high-throughput, Illumina Nextera Flex library construction. Microb Genom 2022; 8. [PMID: 35014949 PMCID: PMC8914357 DOI: 10.1099/mgen.0.000744] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Indexed: 11/25/2022]
Abstract
We developed a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 14 times more libraries for high-throughput Illumina sequencing to be generated for the same cost. We call this new method Hackflex. The quality of library preparation was tested by constructing libraries from Escherichia coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex (recently renamed as Illumina DNA Prep) or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. In order to test the library quality for genomes with a higher and a lower G+C content, library construction methods were also tested on Pseudomonas aeruginosa PAO1 and Staphylococcus aureus ATCC 25923, respectively. We demonstrated that Hackflex can produce high-quality libraries and yields a highly uniform coverage, equivalent to the standard Nextera Flex kit. We show that strongly size-selected libraries produce sufficient yield and complexity to support de novo microbial genome assembly, and that assemblies of the large-insert libraries can be much more contiguous than standard libraries without strong size selection. We introduce a new set of sample barcodes that are distinct from standard Illumina barcodes, enabling Hackflex samples to be multiplexed with samples barcoded using standard Illumina kits. Using Hackflex, we were able to achieve a per-sample reagent cost for library prep of A$7.22 (Australian dollars) (US $5.60; UK £3.87, £1=A$1.87), which is 9.87 times lower than the standard Nextera Flex protocol at advertised retail price. An additional simple modification and further simplification of the protocol by omitting the wash step enables a further price reduction to reach an overall 14-fold cost saving. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programmes where sequencing large numbers of libraries is beneficial.
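The cost figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the standard-kit price is not stated in the abstract and is inferred here from the quoted 9.87-fold ratio, and the A$1000 budget is a hypothetical value chosen for the example.

```python
# Figures quoted in the abstract.
HACKFLEX_COST_AUD = 7.22      # per-sample reagent cost with Hackflex
FOLD_REDUCTION = 9.87         # Hackflex is this many times cheaper than standard

# Assumption: the standard per-sample cost is inferred from the ratio above;
# it is not given directly in the abstract.
implied_standard_cost_aud = HACKFLEX_COST_AUD * FOLD_REDUCTION


def libraries_within_budget(budget_aud: float, cost_per_library_aud: float) -> int:
    """Return how many whole libraries fit within a reagent budget."""
    return int(budget_aud // cost_per_library_aud)


budget = 1000.0  # hypothetical budget in Australian dollars
n_hackflex = libraries_within_budget(budget, HACKFLEX_COST_AUD)
n_standard = libraries_within_budget(budget, implied_standard_cost_aud)

print(f"Implied standard cost per sample: A${implied_standard_cost_aud:.2f}")
print(f"Libraries for A${budget:.0f}: Hackflex {n_hackflex}, standard {n_standard}")
```

Because only whole libraries can be prepared, the ratio of library counts for a fixed budget will not match the per-sample cost ratio exactly.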
|
8
|
DeMaere MZ, Darling AE. qc3C: Reference-free quality control for Hi-C sequencing data. PLoS Comput Biol 2021; 17:e1008839. [PMID: 34634030 PMCID: PMC8530316 DOI: 10.1371/journal.pcbi.1008839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2021] [Revised: 10/21/2021] [Accepted: 09/16/2021] [Indexed: 11/19/2022]
Abstract
Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and more recently the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while key post-sequencing quality indicators used have—thus far—relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is to our knowledge the first to implement a reference-free Hi-C QC tool, and also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods. The Hi-C sequencing technique offers the potential for significant scientific insight about the spatial arrangement of DNA, however achieving such outcomes is highly dependent on the quality of the resulting sequencing library. 
Unlike conventional next-gen sequencing, only a fraction of a given Hi-C library contains this useful spatial information (the signal) with the remainder being effectively noise. As Hi-C remains a challenging laboratory technique, signal strength of resulting libraries can vary greatly. As a quality metric, the quantification of a library’s signal content is an essential asset in any quality mitigation strategy. Quality assessment of Hi-C data has until now relied on access to an (ideally refined) reference sequence, by which indirect indicators of quality are determined. Here we describe qc3C, a software tool capable of the direct, reference-free estimation of the signal content of a Hi-C library. In doing so, not only can researchers make informed decisions on how to progress based on library information content, but eliminating the reference also enables Hi-C quality management for non-model organism and metagenomics researchers.
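The observation that proximity ligation creates sequences unlikely to occur naturally can be illustrated with a naive junction count. This sketch is not qc3C's algorithm, which the abstract describes only as k-mer based; it assumes a palindromic restriction site with a 5' overhang that is filled in and blunt-ligated, and the function names and toy reads are hypothetical.

```python
from typing import Iterable


def ligation_junction(site: str, cut_offset: int) -> str:
    """Junction formed when a 5'-overhang cut site is filled in and re-ligated.

    `site` is a palindromic recognition sequence and `cut_offset` is the
    top-strand cut position (0 for DpnII ^GATC, 1 for HindIII A^AGCTT).
    """
    left_end = site[: len(site) - cut_offset]   # filled-in upstream fragment end
    right_end = site[cut_offset:]               # filled-in downstream fragment start
    return left_end + right_end


def fraction_with_junction(reads: Iterable[str], junction: str) -> float:
    """Naive proxy: share of reads containing the junction sequence."""
    reads = list(reads)
    if not reads:
        return 0.0
    hits = sum(junction in read.upper() for read in reads)
    return hits / len(reads)


# Toy reads (hypothetical); two of the four contain the DpnII junction.
toy_reads = ["ACGTGATCGATCTTAA", "ACGTACGTACGT", "TTGATCAA", "gatcgatc"]
dpnii_junction = ligation_junction("GATC", 0)
print(dpnii_junction, fraction_with_junction(toy_reads, dpnii_junction))
```

A raw count like this is not a calibrated estimate: junction sequences also occur by chance in unligated DNA, and many ligation products are not sequenced through the junction, which is why a statistical treatment is needed in practice.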
|
9
|
Gaio D, DeMaere MZ, Anantanawat K, Chapman TA, Djordjevic SP, Darling AE. Post-weaning shifts in microbiome composition and metabolism revealed by over 25 000 pig gut metagenome-assembled genomes. Microb Genom 2021; 7. [PMID: 34370660 PMCID: PMC8549361 DOI: 10.1099/mgen.0.000501] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022]
Abstract
Using a previously described metagenomics dataset of 27 billion reads, we reconstructed over 50 000 metagenome-assembled genomes (MAGs) of organisms resident in the porcine gut, 46.5 % of which were classified as >70 % complete with a <10 % contamination rate, and 24.4 % were nearly complete genomes. Here, we describe the generation and analysis of those MAGs using time-series samples. The gut microbial communities of piglets appear to follow a highly structured developmental programme in the weeks following weaning, and this development is robust to treatments including an intramuscular antibiotic treatment and two probiotic treatments. The high resolution we obtained allowed us to identify specific taxonomic ‘signatures’ that characterize the gut microbial development immediately after weaning. Additionally, we characterized the carbohydrate repertoire of the organisms resident in the porcine gut. We tracked the abundance shifts of 294 carbohydrate active enzymes, and identified the species and higher-level taxonomic groups carrying each of these enzymes in their MAGs. This knowledge can contribute to the design of probiotics and prebiotic interventions as a means to modify the piglet gut microbiome.
|
10
|
Quince C, Nurk S, Raguideau S, James R, Soyer OS, Summers JK, Limasset A, Eren AM, Chikhi R, Darling AE. STRONG: metagenomics strain resolution on assembly graphs. Genome Biol 2021; 22:214. [PMID: 34311761 PMCID: PMC8311964 DOI: 10.1186/s13059-021-02419-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 16.3] [Received: 08/22/2020] [Accepted: 06/29/2021] [Indexed: 12/30/2022]
Abstract
We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly, and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads.
|
11
|
Vicedomini R, Quince C, Darling AE, Chikhi R. Strainberry: automated strain separation in low-complexity metagenomes using long reads. Nat Commun 2021; 12:4485. [PMID: 34301928 PMCID: PMC8302730 DOI: 10.1038/s41467-021-24515-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/11/2021] [Accepted: 06/18/2021] [Indexed: 02/07/2023]
Abstract
High-throughput short-read metagenomics has enabled large-scale species-level analysis and functional characterization of microbial communities. Microbiomes often contain multiple strains of the same species, and different strains have been shown to have important differences in their functional roles. Recent advances in long-read-based methods have enabled accurate assembly of bacterial genomes from complex microbiomes and offer an as-yet-unrealized opportunity to resolve strains. Here we present Strainberry, a metagenome assembly pipeline that performs strain separation in single-sample low-complexity metagenomes and that relies exclusively on long-read data. We benchmarked Strainberry on mock communities for which it produces strain-resolved assemblies with near-complete reference coverage and 99.9% base accuracy. We also applied Strainberry to real datasets, for which it improved assemblies, generating 20-118% more genomic material than conventional metagenome assemblies on individual strain genomes. We show that Strainberry is also able to refine microbial diversity in a complex microbiome, with complete separation of strain genomes. We anticipate this work to be a starting point for further methodological improvements on strain-resolved metagenome assembly in environments of higher complexities.
|
12
|
Gaio D, DeMaere MZ, Anantanawat K, Eamens GJ, Liu M, Zingali T, Falconer L, Chapman TA, Djordjevic SP, Darling AE. A large-scale metagenomic survey dataset of the post-weaning piglet gut lumen. Gigascience 2021; 10:giab039. [PMID: 34080630 PMCID: PMC8173662 DOI: 10.1093/gigascience/giab039] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/24/2020] [Revised: 02/22/2021] [Accepted: 05/04/2021] [Indexed: 12/23/2022]
Abstract
BACKGROUND Early weaning and intensive farming practices predispose piglets to the development of infectious and often lethal diseases, against which antibiotics are used. Besides contributing to the build-up of antimicrobial resistance, antibiotics are known to modulate the gut microbial composition. As an alternative to antibiotic treatment, studies have previously investigated the potential of probiotics for the prevention of postweaning diarrhea. In order to describe the post-weaning gut microbiota, and to study the effects of two probiotic formulations and of intramuscular antibiotic treatment on the gut microbiota, we sampled and processed over 800 faecal time-series samples from 126 piglets and 42 sows. RESULTS Here we report on the largest shotgun metagenomic dataset of the pig gut lumen microbiome to date, consisting of >8 Tbp of shotgun metagenomic sequencing data. The animal trial, the workflow from sample collection to sample processing, and the preparation of libraries for sequencing are described in detail. We provide a preliminary analysis of the dataset, centered on a taxonomic profiling of the samples, and a 16S-based beta diversity analysis of the mothers and the piglets in the first 5 weeks after weaning. CONCLUSIONS This study was conducted to generate a publicly available databank of the faecal metagenome of weaner piglets aged between 3 and 9 weeks old, treated with different probiotic formulations and intramuscular antibiotic treatment. Besides investigating the effects of the probiotic and intramuscular antibiotic treatment, the dataset can be explored to assess a wide range of ecological questions with regard to antimicrobial resistance, host-associated microbial and phage communities, and their dynamics during the aging of the host.
|
13
|
Meyer F, Lesker TR, Koslicki D, Fritz A, Gurevich A, Darling AE, Sczyrba A, Bremges A, McHardy AC. Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat Protoc 2021; 16:1785-1801. [PMID: 33649565 DOI: 10.1038/s41596-020-00480-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/10/2020] [Accepted: 11/26/2020] [Indexed: 01/31/2023]
Abstract
Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
|
14
|
Hastak P, Fourment M, Darling AE, Gottlieb T, Cheong E, Merlino J, Myers GSA, Djordjevic SP, Roy Chowdhury P. Escherichia coli ST8196 is a novel, locally evolved, and extensively drug resistant pathogenic lineage within the ST131 clonal complex. Emerg Microbes Infect 2020; 9:1780-1792. [PMID: 32686595 PMCID: PMC7473005 DOI: 10.1080/22221751.2020.1797541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2020] [Accepted: 07/14/2020] [Indexed: 11/25/2022]
Abstract
The H30Rx subclade of Escherichia coli ST131 is a clinically important, globally dispersed pathogenic lineage that typically displays resistance to fluoroquinolones and extended spectrum β-lactams. Isolates EC233 and EC234, variants of ST131-H30Rx with a novel sequence type (ST) 8196, isolated from unrelated patients presenting with bacteraemia at a Sydney hospital in 2014, are characterised here. EC233 and EC234 are phylogroup B2, serotype O25:H4A, and resistant to ampicillin, amoxicillin, cefoxitin, ceftazidime, ceftriaxone, ciprofloxacin, norfloxacin and gentamicin and are likely clonal. Both harbour an IncFII_2 plasmid (pSPRC_Ec234-FII) that carries most of the resistance genes on an IS26-associated translocatable unit, two small plasmids and a novel IncI1 plasmid (pSPRC_Ec234-I). SNP-based phylogenetic analysis of the core genome of representatives within the ST131 clonal complex places both isolates in a subclade with three clinical Australian ST131-H30Rx clade-C isolates. A MrBayes phylogeny analysis of EC233 and EC234 indicates that ST8196 shares a most recent common ancestor with ST131-H30Rx strain EC70 isolated from the same hospital in 2013. Our study identified genomic hallmarks that define the ST131-H30Rx subclade in the ST8196 isolates and highlights a need for unbiased genomic surveillance approaches to identify novel high-risk MDR E. coli pathogens that impact healthcare facilities.
|
15
|
Bogema DR, McKinnon J, Liu M, Hitchick N, Miller N, Venturini C, Iredell J, Darling AE, Roy Chowdury P, Djordjevic SP. Whole-genome analysis of extraintestinal Escherichia coli sequence type 73 from a single hospital over a 2 year period identified different circulating clonal groups. Microb Genom 2020; 6. [PMID: 30810518 PMCID: PMC7067039 DOI: 10.1099/mgen.0.000255] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Sequence type (ST)73 has emerged as one of the most frequently isolated extraintestinal pathogenic Escherichia coli. To examine the localized diversity of ST73 clonal groups, including their mobile genetic element profile, we sequenced the genomes of 16 multiple-drug resistant ST73 isolates from patients with urinary tract infection from a single hospital in Sydney, Australia, between 2009 and 2011. Genome sequences were used to generate a SNP-based phylogenetic tree to determine the relationship of these isolates in a global context with ST73 sequences (n=210) from public databases. There was no evidence of a dominant outbreak strain of ST73 in patients from this hospital; rather, we identified at least eight separate groups, several of which reoccurred, over a 2 year period. The inferred phylogeny of all ST73 strains (n=226) including the ST73 clone D i2 reference genome shows high bootstrap support and clusters into four major groups that correlate with serotype. The Sydney ST73 strains carry a wide variety of virulence-associated genes, but the presence of iss, pic and several iron-acquisition operons was notable.
|
16
|
Lodge CJ, Lowe AJ, Milanzi E, Bowatte G, Abramson MJ, Tsimiklis H, Axelrad C, Robertson B, Darling AE, Svanes C, Wjst M, Dharmage SC, Bode L. Human milk oligosaccharide profiles and allergic disease up to 18 years. J Allergy Clin Immunol 2020; 147:1041-1048. [PMID: 32650022 DOI: 10.1016/j.jaci.2020.06.027] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/18/2019] [Revised: 06/18/2020] [Accepted: 06/24/2020] [Indexed: 01/01/2023]
Abstract
BACKGROUND Human milk oligosaccharides (HMOs) are a diverse range of sugars secreted in breast milk that have direct and indirect effects on immunity. The profiles of HMOs produced differ between mothers. OBJECTIVE We sought to determine the relationship between maternal HMO profiles and offspring allergic diseases up to age 18 years. METHODS Colostrum and early lactation milk samples were collected from 285 mothers enrolled in a high-allergy-risk birth cohort, the Melbourne Atopy Cohort Study. Nineteen HMOs were measured. Profiles/patterns of maternal HMOs were determined using latent class analysis (LCA). Details of allergic disease outcomes including sensitization, wheeze, asthma, and eczema were collected at multiple follow-ups up to age 18 years. Adjusted logistic regression analyses and generalized estimating equations were used to determine the relationship between HMO profiles and allergy. RESULTS The levels of several HMOs were highly correlated with each other. LCA determined 7 distinct maternal milk profiles with memberships of between 10% and 20%. Compared with offspring exposed to the neutral Lewis HMO profile, exposure to acidic Lewis HMOs was associated with a higher risk of allergic disease and asthma over childhood (odds ratio asthma at 18 years, 5.82; 95% CI, 1.59-21.23), whereas exposure to the acidic-predominant profile was associated with a reduced risk of food sensitization (OR at 12 years, 0.08; 95% CI, 0.01-0.67). CONCLUSIONS In this high-allergy-risk birth cohort, some profiles of HMOs were associated with increased and some with decreased allergic disease risks over childhood. Further studies are needed to confirm these findings and realize the potential for intervention.
|
17
|
DeMaere MZ, Liu MYZ, Lin E, Djordjevic SP, Charles IG, Worden P, Burke CM, Monahan LG, Gardiner M, Borody TJ, Darling AE. Metagenomic Hi-C of a Healthy Human Fecal Microbiome Transplant Donor. Microbiol Resour Announc 2020; 9:e01523-19. [PMID: 32029559 PMCID: PMC7005124 DOI: 10.1128/mra.01523-19] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/12/2019] [Accepted: 01/06/2020] [Indexed: 11/20/2022] Open
Abstract
We report the availability of a high-quality metagenomic Hi-C data set generated from a fecal sample taken from a healthy fecal microbiome transplant donor subject. We report on basic features of the data to evaluate their quality.
|
18
|
Fourment M, Darling AE. Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics. PeerJ 2019; 7:e8272. [PMID: 31976168 PMCID: PMC6966998 DOI: 10.7717/peerj.8272] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/05/2019] [Accepted: 11/22/2019] [Indexed: 12/21/2022] Open
Abstract
Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes-Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
|
19
|
Ayres DL, Cummings MP, Baele G, Darling AE, Lewis PO, Swofford DL, Huelsenbeck JP, Lemey P, Rambaut A, Suchard MA. BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Syst Biol 2019; 68:1052-1061. [PMID: 31034053 PMCID: PMC6802572 DOI: 10.1093/sysbio/syz020] [Citation(s) in RCA: 126] [Impact Index Per Article: 25.2] [Received: 04/27/2018] [Revised: 04/10/2019] [Accepted: 04/10/2019] [Indexed: 11/12/2022] Open
Abstract
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
|
20
|
Coil DA, Jospin G, Darling AE, Wallis C, Davis IJ, Harris S, Eisen JA, Holcombe LJ, O’Flynn C. Genomes from bacteria associated with the canine oral cavity: A test case for automated genome-based taxonomic assignment. PLoS One 2019; 14:e0214354. [PMID: 31181071 PMCID: PMC6557473 DOI: 10.1371/journal.pone.0214354] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/11/2019] [Accepted: 05/27/2019] [Indexed: 11/18/2022] Open
Abstract
Taxonomy for bacterial isolates is commonly assigned via sequence analysis. However, the most common sequence-based approaches (e.g. 16S rRNA gene-based phylogeny or whole genome comparisons) are still labor intensive and subjective to varying degrees. Here we present a set of 33 bacterial genomes, isolated from the canine oral cavity. Taxonomy of these isolates was first assigned by PCR amplification of the 16S rRNA gene, Sanger sequencing, and taxonomy assignment using BLAST. After genome sequencing, taxonomy was revisited through a manual process using a combination of average nucleotide identity (ANI), concatenated marker gene phylogenies, and 16S rRNA gene phylogenies. This taxonomy was then compared to the automated taxonomic assignment given by the recently proposed Genome Taxonomy Database (GTDB). We found the results of all three methods to be similar (25 out of the 33 had matching genera), but the GTDB approach required fewer subjective decisions, and required far less labor. The primary differences in the non-identical taxonomic assignments involved cases where GTDB has proposed taxonomic revisions.
|
21
|
Roy Chowdhury P, Fourment M, DeMaere MZ, Monahan L, Merlino J, Gottlieb T, Darling AE, Djordjevic SP. Identification of a novel lineage of plasmids within phylogenetically diverse subclades of IncHI2-ST1 plasmids. Plasmid 2019; 102:56-61. [PMID: 30885788 DOI: 10.1016/j.plasmid.2019.03.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/10/2018] [Revised: 02/22/2019] [Accepted: 03/13/2019] [Indexed: 11/17/2022]
Abstract
IncHI2-ST1 plasmids play an important role in co-mobilizing genes conferring resistance to critically important antibiotics and heavy metals. Here we present the identification and analysis of IncHI2-ST1 plasmid pSPRC-Echo1, isolated from an Enterobacter hormaechei strain from a Sydney hospital, which predates other multi-drug resistant IncHI2-ST1 plasmids reported from Australia. Our time-resolved phylogeny analysis indicates that pSPRC-Echo1 represents a new lineage of IncHI2-ST1 plasmids and shows how their diversification relates to the era of antibiotics.
|
22
|
DeMaere MZ, Darling AE. bin3C: exploiting Hi-C sequencing data to accurately resolve metagenome-assembled genomes. Genome Biol 2019; 20:46. [PMID: 30808380 PMCID: PMC6391755 DOI: 10.1186/s13059-019-1643-1] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 08/17/2018] [Accepted: 01/29/2019] [Indexed: 11/10/2022] Open
Abstract
Most microbes cannot be easily cultured, and metagenomics provides a means to study them. Current techniques aim to resolve individual genomes from metagenomes, so-called metagenome-assembled genomes (MAGs). Leading approaches depend upon time series or transect studies, the efficacy of which is a function of community complexity, target abundance, and sequencing depth. We describe an unsupervised method that exploits the hierarchical nature of Hi-C interaction rates to resolve MAGs using a single time point. We validate the method and directly compare against a recently announced proprietary service, ProxiMeta. bin3C is an open-source pipeline and makes use of the Infomap clustering algorithm (https://github.com/cerebis/bin3C).
|
23
|
Monahan LG, DeMaere MZ, Cummins ML, Djordjevic SP, Roy Chowdhury P, Darling AE. High contiguity genome sequence of a multidrug-resistant hospital isolate of Enterobacter hormaechei. Gut Pathog 2019; 11:3. [PMID: 30805030 PMCID: PMC6373042 DOI: 10.1186/s13099-019-0288-7] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 12/09/2018] [Accepted: 02/06/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Enterobacter hormaechei is an important emerging pathogen and a key member of the highly diverse Enterobacter cloacae complex. E. hormaechei strains can persist and spread in nosocomial environments, and often exhibit resistance to multiple clinically important antibiotics. However, the genomic regions that harbour resistance determinants are typically highly repetitive and impossible to resolve with standard short-read sequencing technologies. RESULTS Here we used both short- and long-read methods to sequence the genome of a multidrug-resistant hospital isolate (C15117), which we identified as E. hormaechei. Hybrid assembly generated a complete circular chromosome of 4,739,272 bp and a fully resolved plasmid of 339,920 bp containing several antibiotic resistance genes. The strain also harboured a 34,857 bp repeat encoding copper resistance, which was present in both the chromosome and plasmid. Long reads that unambiguously spanned this repeat were required to resolve the chromosome and plasmid into separate replicons. CONCLUSION This study provides important insights into the evolution and potential spread of antimicrobial resistance in a nosocomial E. hormaechei strain. More broadly, it further exemplifies the power of long-read sequencing technologies, particularly the Oxford Nanopore platform, for the characterisation of bacteria with complex resistance loci and large repeat elements.
|
24
|
Fritz A, Hofmann P, Majda S, Dahms E, Dröge J, Fiedler J, Lesker TR, Belmann P, DeMaere MZ, Darling AE, Sczyrba A, Bremges A, McHardy AC. CAMISIM: simulating metagenomes and microbial communities. Microbiome 2019; 7:17. [PMID: 30736849 PMCID: PMC6368784 DOI: 10.1186/s40168-019-0633-6] [Citation(s) in RCA: 97] [Impact Index Per Article: 19.4] [Received: 04/18/2018] [Accepted: 01/21/2019] [Indexed: 05/11/2023]
Abstract
BACKGROUND Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM. CONCLUSIONS CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.
|
25
|
Reid CJ, Wyrsch ER, Roy Chowdhury P, Zingali T, Liu M, Darling AE, Chapman TA, Djordjevic SP. Porcine commensal Escherichia coli: a reservoir for class 1 integrons associated with IS26. Microb Genom 2019; 3. [PMID: 29306352 PMCID: PMC5761274 DOI: 10.1099/mgen.0.000143] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 12/21/2022] Open
Abstract
Porcine faecal waste is a serious environmental pollutant. Carriage of antimicrobial-resistance genes (ARGs) and virulence-associated genes (VAGs), and the zoonotic potential of commensal Escherichia coli from swine are largely unknown. Furthermore, little is known about the role of commensal E. coli as contributors to the mobilization of ARGs between food animals and the environment. Here, we report whole-genome sequence analysis of 103 class 1 integron-positive E. coli from the faeces of healthy pigs from two commercial production facilities in New South Wales, Australia. Most strains belonged to phylogroups A and B1, and carried VAGs linked with extraintestinal infection in humans. The 103 strains belonged to 37 multilocus sequence types and clonal complex 10 featured prominently. Seventeen ARGs were detected and 97 % (100/103) of strains carried three or more ARGs. Heavy-metal-resistance genes merA, cusA and terA were also common. IS26 was observed in 98 % (101/103) of strains and was often physically associated with structurally diverse class 1 integrons that carried unique genetic features, which may be tracked. This study provides, to our knowledge, the first detailed genomic analysis and point of reference for commensal E. coli of porcine origin in Australia, facilitating tracking of specific lineages and the mobile resistance genes they carry.
|