1
McKnight DJE, Wong-Bajracharya J, Okoh EB, Snijders F, Lidbetter F, Webster J, Haughton M, Darling AE, Djordjevic SP, Bogema DR, Chapman TA. Xanthomonas rydalmerensis sp. nov., a non-pathogenic member of Group 1 Xanthomonas. Int J Syst Evol Microbiol 2024; 74:006294. [PMID: 38536071 PMCID: PMC10995728 DOI: 10.1099/ijsem.0.006294] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/06/2023] [Accepted: 03/04/2024] [Indexed: 04/07/2024]
Abstract
Five bacterial isolates were obtained from Fragaria × ananassa in 1976 in Rydalmere, Australia, during routine biosecurity surveillance. Initially, the results of biochemical characterisation indicated that these isolates represented members of the genus Xanthomonas. To determine their species, further analysis was conducted using both phenotypic and genotypic approaches. Phenotypic analysis involved using MALDI-TOF MS and BIOLOG GEN III microplates, which confirmed that the isolates represented members of the genus Xanthomonas but did not allow them to be classified with respect to species. Genome relatedness indices and the results of extensive phylogenetic analysis confirmed that the isolates were members of the genus Xanthomonas and represented a novel species. On the basis of the minimal presence of virulence-associated factors typically found in genomes of members of the genus Xanthomonas, we suggest that these isolates are non-pathogenic. This conclusion was supported by the results of a pathogenicity assay. On the basis of these findings, we propose the name Xanthomonas rydalmerensis, with DAR 34855T = ICMP 24941 as the type strain.
Affiliation(s)
- Daniel J. E. McKnight
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
  - University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia
- Johanna Wong-Bajracharya
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
- Efenaide B. Okoh
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
  - Western Sydney University, Penrith, NSW, Australia
- Fridtjof Snijders
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
- Fiona Lidbetter
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
- John Webster
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
- Mathew Haughton
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
- Aaron E. Darling
  - University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia
- Daniel R. Bogema
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
  - University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia
- Toni A. Chapman
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
  - University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia
2
Jaya FR, Brito BP, Darling AE. Evaluation of recombination detection methods for viral sequencing. Virus Evol 2023; 9:vead066. [PMID: 38131005 PMCID: PMC10734630 DOI: 10.1093/ve/vead066] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2023] [Revised: 08/03/2023] [Accepted: 11/15/2023] [Indexed: 12/23/2023]
Abstract
Recombination is a key evolutionary driver in shaping novel viral populations and lineages. When unaccounted for, recombination can impact evolutionary estimations or complicate their interpretation. Therefore, identifying signals for recombination in sequencing data is a key prerequisite to further analyses. A repertoire of recombination detection methods (RDMs) has been developed over the past two decades; however, the prevalence of pandemic-scale viral sequencing data poses a computational challenge for existing methods. Here, we assessed eight RDMs: PhiPack (Profile), 3SEQ, GENECONV, recombination detection program (RDP) (OpenRDP), MaxChi (OpenRDP), Chimaera (OpenRDP), UCHIME (VSEARCH), and gmos, to determine if any are suitable for the analysis of bulk sequencing data. To test the performance and scalability of these methods, we analysed simulated viral sequencing data across a range of sequence diversities, recombination frequencies, and sample sizes. Furthermore, we provide a practical example for the analysis and validation of empirical data. We find that RDMs need to be scalable, use an analytical approach and resolution that is suitable for the intended research application, and be accurate for the properties of a given dataset (e.g. sequence diversity and estimated recombination frequency). Analysis of simulated and empirical data revealed that the assessed methods exhibited considerable trade-offs between these criteria. Overall, we provide general guidelines for the validation of recombination detection results, the benefits and shortcomings of each assessed method, and future considerations for recombination detection methods for the assessment of large-scale viral sequencing data.
Affiliation(s)
- Frederick R Jaya
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
  - Ecology and Evolution, Research School of Biology, Australian National University, 134 Linnaeus Way, Acton, Australian Capital Territory 2600, Australia
- Barbara P Brito
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
  - New South Wales Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, Menangle, New South Wales 2568, Australia
- Aaron E Darling
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, 15 Broadway, Ultimo, New South Wales 2007, Australia
  - Illumina Australia Pty Ltd, Ultimo, New South Wales 2007, Australia
3
Brito BP, Frost MJ, Anantanawat K, Jaya F, Batterham T, Djordjevic SP, Chang WS, Holmes EC, Darling AE, Kirkland PD. Expanding the range of the respiratory infectome in Australian feedlot cattle with and without respiratory disease using metatranscriptomics. Microbiome 2023; 11:158. [PMID: 37491320 PMCID: PMC10367309 DOI: 10.1186/s40168-023-01591-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/31/2022] [Accepted: 06/03/2023] [Indexed: 07/27/2023]
Abstract
BACKGROUND: Bovine respiratory disease (BRD) is one of the most common diseases in intensively managed cattle, often resulting in high morbidity and mortality. Although several pathogens have been isolated and extensively studied, the complete infectome of the respiratory complex consists of a more extensive range of unrecognised species. Here, we used total RNA sequencing (i.e., metatranscriptomics) of nasal and nasopharyngeal swabs collected from animals with and without BRD from two cattle feedlots in Australia. RESULTS: A high abundance of bovine nidovirus, influenza D, bovine rhinitis A and bovine coronavirus was found in the samples. Additionally, we obtained the complete or near-complete genome of bovine rhinitis B, enterovirus E1, bovine viral diarrhea virus (sub-genotypes 1a and 1c) and bovine respiratory syncytial virus, and partial sequences of other viruses. A new species of paramyxovirus was also identified. Overall, the most abundant RNA virus was the bovine nidovirus. Characterisation of bacterial species from the transcriptome revealed a high abundance and diversity of Mollicutes in BRD cases and unaffected control animals. Of the non-Mollicutes species, Histophilus somni was detected, whereas there was a low abundance of Mannheimia haemolytica. CONCLUSION: This study highlights the use of untargeted sequencing approaches to study the unrecognised range of microorganisms present in healthy or diseased animals and the need to study previously uncultured viral species that may have an important role in cattle respiratory disease.
Affiliation(s)
- Barbara P Brito
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Ultimo, New South Wales, Australia
  - New South Wales Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, New South Wales, Australia
  - Present Address: Biosecurity and Food Safety, NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute (EMAI), Menangle, New South Wales, Australia
- Melinda J Frost
  - New South Wales Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, New South Wales, Australia
- Kay Anantanawat
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Ultimo, New South Wales, Australia
  - Illumina Australia, Ultimo, New South Wales, Australia
- Frederick Jaya
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Ultimo, New South Wales, Australia
- Steven P Djordjevic
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Ultimo, New South Wales, Australia
- Wei-Shan Chang
  - Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney, NSW, Australia
- Edward C Holmes
  - Sydney Institute for Infectious Diseases, School of Medical Sciences, The University of Sydney, Sydney, NSW, Australia
- Aaron E Darling
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Ultimo, New South Wales, Australia
  - Illumina Australia, Ultimo, New South Wales, Australia
- Peter D Kirkland
  - New South Wales Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, New South Wales, Australia
4
Krishnan S, DeMaere MZ, Beck D, Ostrowski M, Seymour JR, Darling AE. Rhometa: Population recombination rate estimation from metagenomic read datasets. PLoS Genet 2023; 19:e1010683. [PMID: 36972309 PMCID: PMC10079220 DOI: 10.1371/journal.pgen.1010683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Revised: 04/06/2023] [Accepted: 02/27/2023] [Indexed: 03/29/2023]
Abstract
Prokaryotic evolution is influenced by the exchange of genetic information between species through a process referred to as recombination. The rate of recombination is a useful measure for the adaptive capacity of a prokaryotic population. We introduce Rhometa (https://github.com/sid-krish/Rhometa), a new software package to determine recombination rates from shotgun sequencing reads of metagenomes. It extends the composite likelihood approach for population recombination rate estimation and enables the analysis of modern short-read datasets. We evaluated Rhometa over a broad range of sequencing depths and complexities, using simulated and real experimental short-read data aligned to external reference genomes. Rhometa offers a comprehensive solution for determining population recombination rates from contemporary metagenomic read datasets. Rhometa extends the capabilities of conventional sequence-based composite likelihood population recombination rate estimators to include modern aligned metagenomic read datasets with diverse sequencing depths, thereby enabling the effective application of these techniques and their high accuracy rates to the field of metagenomics. Using simulated datasets, we show that our method performs well, with its accuracy improving with increasing numbers of genomes. Rhometa was validated on a real S. pneumoniae transformation experiment, where we show that it obtains plausible estimates of the rate of recombination. Finally, the program was also run on ocean surface water metagenomic datasets, through which we demonstrate that the program works on uncultured metagenomic datasets.
Affiliation(s)
- Sidaswar Krishnan
  - Climate Change Cluster, Faculty of Science, University of Technology Sydney, Sydney, NSW, Australia
- Matthew Z. DeMaere
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Sydney, NSW, Australia
- Dominik Beck
  - Centre for Health Technologies and the School of Biomedical Engineering, University of Technology Sydney, Sydney, NSW, Australia
- Martin Ostrowski
  - Climate Change Cluster, Faculty of Science, University of Technology Sydney, Sydney, NSW, Australia
- Justin R. Seymour
  - Climate Change Cluster, Faculty of Science, University of Technology Sydney, Sydney, NSW, Australia
- Aaron E. Darling
  - Australian Institute for Microbiology & Infection, University of Technology Sydney, Sydney, NSW, Australia
  - Illumina Australia Pty Ltd, Ultimo, NSW, Australia
5
Gaio D, DeMaere MZ, Anantanawat K, Eamens GJ, Falconer L, Chapman TA, Djordjevic S, Darling AE. Phylogenetic diversity analysis of shotgun metagenomic reads describes gut microbiome development and treatment effects in the post-weaned pig. PLoS One 2022; 17:e0270372. [PMID: 35749534 PMCID: PMC9232140 DOI: 10.1371/journal.pone.0270372] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 02/23/2022] [Accepted: 06/08/2022] [Indexed: 11/18/2022]
Abstract
Intensive farming practices can increase exposure of animals to infectious agents against which antibiotics are used. Orally administered antibiotics are well known to cause dysbiosis. To counteract dysbiotic effects, numerous studies in the past two decades sought to understand whether probiotics are a valid tool to help re-establish a healthy gut microbial community after antibiotic treatment. Although dysbiotic effects of antibiotics are well investigated, little is known about the effects of intramuscular antibiotic treatment on the gut microbiome, and few studies have attempted to study treatment effects using phylogenetic diversity analysis techniques. In this study we sought to determine the effects of two probiotic treatments and one intramuscularly administered antibiotic treatment on the developing gut microbiome of post-weaning piglets between their 3rd and 9th week of life. Shotgun metagenomic sequences from over 800 faecal time-series samples derived from 126 post-weaning piglets and 42 sows were analysed in a phylogenetic framework. Differences between individual hosts, such as breed, litter, and age, were found to be important contributors to variation in the community composition. Host age was the dominant factor in shaping the gut microbiota of piglets after weaning. The post-weaning pig gut microbiome appeared to follow a highly structured developmental program with characteristic post-weaning changes that can distinguish hosts that were born as little as two days apart in the second month of life. Treatment effects of the antibiotic and probiotic treatments were found but were subtle and included a higher representation of Mollicutes associated with intramuscular antibiotic treatment, and an increase of Lactobacillus associated with probiotic treatment. The discovery of correlations between experimental factors and microbial community composition is more commonly addressed with OTU-based methods and rarely analysed via phylogenetic diversity measures. The latter method, although less intuitive than the former, suffers less from library size normalization biases, and it proved to be instrumental in this study for the discovery of correlations between microbiome composition and host and treatment factors.
Affiliation(s)
- Daniela Gaio
  - iThree Institute, University of Technology Sydney, Ultimo, Australia
- Kay Anantanawat
  - iThree Institute, University of Technology Sydney, Ultimo, Australia
- Graeme J. Eamens
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, Australia
- Linda Falconer
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, Australia
- Toni A. Chapman
  - NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, Australia
- Steven Djordjevic
  - iThree Institute, University of Technology Sydney, Ultimo, Australia
- Aaron E. Darling
  - iThree Institute, University of Technology Sydney, Ultimo, Australia
6
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 89] [Impact Index Per Article: 44.5] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
Affiliation(s)
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Adrian Fritz
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Zhi-Luo Deng
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- Till Robin Lesker
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Gary Robertson
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Mohammed Alser
  - Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
- Dmitry Antipov
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Jan Buchmann
  - Institute for Biological Data Science, Heinrich-Heine-University, Düsseldorf, Germany
- Aydin Buluç
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Bo Chen
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Philip T L C Clausen
  - National Food Institute, Division of Global Surveillance, Technical University of Denmark, Lyngby, Denmark
- Alexandru Cristian
  - Drexel University, Philadelphia, PA, USA
  - Google Inc., Philadelphia, PA, USA
- Piotr Wojciech Dabrowski
  - Robert Koch-Institut, Berlin, Germany
  - Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany
- Rob Egan
  - DOE Joint Genome Institute, Berkeley, CA, USA
  - Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Eleazar Eskin
  - University of California, Los Angeles, Los Angeles, CA, USA
- Eugene Goltsman
  - DOE Joint Genome Institute, Berkeley, CA, USA
  - Lawrence Berkeley National Laboratories, Berkeley, CA, USA
- Melissa A Gray
  - Drexel University, Philadelphia, PA, USA
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
- Lars Hestbjerg Hansen
  - University of Copenhagen, Department of Plant and Environmental Science, Frederiksberg, Denmark
- Steven Hofmeyr
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Pingqin Huang
  - School of Computer Science, Fudan University, Shanghai, China
- Luiz Irber
  - University of California, Davis, Davis, CA, USA
- Huijue Jia
  - BGI-Shenzhen, Shenzhen, China
  - Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Tue Sparholt Jørgensen
  - Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Lyngby, Denmark
  - Aarhus University, Department of Environmental Science, Roskilde, Denmark
- Silas D Kieser
  - Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Swiss Institute of Bioinformatics, Geneva, Switzerland
- Axel Kola
  - Charité-Universitätsmedizin Berlin, Berlin, Germany
- Mikhail Kolmogorov
  - Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
- Anton Korobeynikov
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
  - Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
- Jason Kwan
  - University of Wisconsin-Madison, Madison, WI, USA
- Chenhao Li
  - Genome Institute of Singapore, Singapore, Singapore
- Fabio Malcher-Miranda
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Vanessa R Marcelino
  - Sydney Medical School, The University of Sydney, Sydney, Australia
  - Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, Australia
- Pierre Marijon
  - Department of Computer Science, Inria, University of Lille, CNRS, Lille, France
- Dmitry Meleshko
  - Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
- Daniel R Mende
  - Amsterdam University Medical Center, Amsterdam, the Netherlands
- Alessio Milanese
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Niranjan Nagarajan
  - Genome Institute of Singapore, A*STAR, Singapore, Singapore
  - National University of Singapore, Singapore, Singapore
- Sergey Nurk
  - Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
- Leonid Oliker
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Lucas Paoli
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Vitor C Piro
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
- Simon Rasmussen
  - Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Evan R Rees
  - University of Wisconsin-Madison, Madison, WI, USA
- Knut Reinert
  - Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Bernhard Renard
  - Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
  - Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
- Gail L Rosen
  - Drexel University, Philadelphia, PA, USA
  - Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
  - Center for Biological Discovery from Big Data, Philadelphia, PA, USA
- Hans-Joachim Ruscheweyh
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Varuni Sarwal
  - University of California, Los Angeles, Los Angeles, CA, USA
- Nicola Segata
  - Department CIBIO, University of Trento, Trento, Italy
- Enrico Seiler
  - Institute for Bioinformatics, FU Berlin, Berlin, Germany
- Lizhen Shi
  - Florida Polytechnic University, Lakeland, FL, USA
- Fengzhu Sun
  - Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, USA
- Shinichi Sunagawa
  - Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
- Ashleigh Thomas
  - DOE Joint Genome Institute, Berkeley, CA, USA
  - University of British Columbia, Vancouver, British Columbia, Canada
- Mirko Trajkovski
  - Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland
  - Diabetes Center, Faculty of Medicine, University of Geneva, Geneva, Switzerland
- Julien Tremblay
  - Energy, Mining and Environment, National Research Council Canada, Montreal, Quebec, Canada
- Zhengyang Wang
  - School of Computer Science, Fudan University, Shanghai, China
- Ziye Wang
  - School of Mathematical Sciences, Fudan University, Shanghai, China
- Zhong Wang
  - Department of Energy Joint Genome Institute, Berkeley, CA, USA
  - Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - School of Natural Sciences, University of California at Merced, Merced, CA, USA
- Katherine Yelick
  - Lawrence Berkeley National Laboratory, Berkeley, CA, USA
  - University of California, Berkeley, Berkeley, CA, USA
- Ronghui You
  - School of Computer Science, Fudan University, Shanghai, China
- Georg Zeller
  - Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Shanfeng Zhu
  - Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China
  - Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
- Jie Zhu
  - BGI-Shenzhen, Shenzhen, China
  - Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
- Susanne Häußler
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Ariane Khaledi
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Fantin Mesny
  - Max Planck Institute for Plant Breeding Research, Köln, Germany
- Nathiana Smit
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Till Strowig
  - Helmholtz Centre for Infection Research, Braunschweig, Germany
- Andreas Bremges
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
- Alexander Sczyrba
  - Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
- Alice Carolyn McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
7
Abstract
We developed a low-cost method for the production of Illumina-compatible sequencing libraries that allows up to 14 times more libraries for high-throughput Illumina sequencing to be generated for the same cost. We call this new method Hackflex. The quality of library preparation was tested by constructing libraries from Escherichia coli MG1655 genomic DNA using either Hackflex, standard Nextera Flex (recently renamed as Illumina DNA Prep) or a variation of standard Nextera Flex in which the bead-linked transposase is diluted prior to use. In order to test the library quality for genomes with a higher and a lower G+C content, library construction methods were also tested on Pseudomonas aeruginosa PAO1 and Staphylococcus aureus ATCC 25923, respectively. We demonstrated that Hackflex can produce high-quality libraries and yield highly uniform coverage, equivalent to the standard Nextera Flex kit. We show that strongly size-selected libraries produce sufficient yield and complexity to support de novo microbial genome assembly, and that assemblies of the large-insert libraries can be much more contiguous than standard libraries without strong size selection. We introduce a new set of sample barcodes that are distinct from standard Illumina barcodes, enabling Hackflex samples to be multiplexed with samples barcoded using standard Illumina kits. Using Hackflex, we were able to achieve a per-sample reagent cost for library prep of A$7.22 (Australian dollars) (US $5.60; UK £3.87, £1=A$1.87), which is 9.87 times lower than the standard Nextera Flex protocol at advertised retail price. An additional simple modification and further simplification of the protocol by omitting the wash step enable a further price reduction to reach an overall 14-fold cost saving. This method will allow researchers to construct more libraries within a given budget, thereby yielding more data and facilitating research programmes where sequencing large numbers of libraries is beneficial.
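The cost figures quoted in this abstract can be checked with a little arithmetic. The sketch below is illustrative only: it takes the A$7.22 per-sample cost and the 9.87-fold and 14-fold savings from the abstract, back-calculates the standard-kit cost those figures imply, and estimates how many libraries a fixed budget would cover. The A$1,000 budget and all function and variable names are assumptions made for the example, not values from the paper.

```python
import math

HACKFLEX_COST_AUD = 7.22   # per-sample reagent cost quoted in the abstract
FOLD_SAVING = 9.87         # quoted saving versus standard Nextera Flex
MAX_FOLD_SAVING = 14.0     # quoted overall saving after the extra modifications


def implied_standard_cost(hackflex_cost, fold_saving):
    """Back-calculate the standard per-sample cost implied by a fold saving."""
    return hackflex_cost * fold_saving


def libraries_for_budget(budget, cost_per_library):
    """Number of whole libraries that fit within a budget."""
    return math.floor(budget / cost_per_library)


standard_cost = implied_standard_cost(HACKFLEX_COST_AUD, FOLD_SAVING)
reduced_cost = standard_cost / MAX_FOLD_SAVING  # cost implied by the 14-fold saving

budget = 1000.0  # hypothetical budget in A$, chosen only for illustration
for label, cost in [
    ("standard Nextera Flex (implied)", standard_cost),
    ("Hackflex", HACKFLEX_COST_AUD),
    ("Hackflex, 14-fold variant (implied)", reduced_cost),
]:
    n = libraries_for_budget(budget, cost)
    print(f"{label}: A${cost:.2f} per sample, {n} libraries per A${budget:.0f}")
```

The implied costs are derived purely from the ratios stated in the abstract; actual kit prices vary with supplier and date.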
Affiliation(s)
- Daniela Gaio
- iThree Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Kay Anantanawat
- iThree Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Joyce To
- iThree Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Michael Liu
- iThree Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Leigh Monahan
- iThree Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Aaron E Darling
- iThree Institute, University of Technology Sydney, Sydney, NSW, Australia
| |
|
8
|
DeMaere MZ, Darling AE. qc3C: Reference-free quality control for Hi-C sequencing data. PLoS Comput Biol 2021; 17:e1008839. [PMID: 34634030 PMCID: PMC8530316 DOI: 10.1371/journal.pcbi.1008839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2021] [Revised: 10/21/2021] [Accepted: 09/16/2021] [Indexed: 11/19/2022] Open
Abstract
Hi-C is a sample preparation method that enables high-throughput sequencing to capture genome-wide spatial interactions between DNA molecules. The technique has been successfully applied to solve challenging problems such as 3D structural analysis of chromatin, scaffolding of large genome assemblies and more recently the accurate resolution of metagenome-assembled genomes (MAGs). Despite continued refinements, however, preparing a Hi-C library remains a complex laboratory protocol. To avoid costly failures and maximise the odds of successful outcomes, diligent quality management is recommended. Current wet-lab methods provide only a crude assay of Hi-C library quality, while key post-sequencing quality indicators used have, thus far, relied upon reference-based read-mapping. When a reference is accessible, this reliance introduces a concern for quality, where an incomplete or inexact reference skews the resulting quality indicators. We propose a new, reference-free approach that infers the total fraction of read-pairs that are a product of proximity ligation. This quantification of Hi-C library quality requires only a modest amount of sequencing data and is independent of other application-specific criteria. The algorithm builds upon the observation that proximity ligation events are likely to create k-mers that would not naturally occur in the sample. Our software tool (qc3C) is, to our knowledge, the first reference-free Hi-C QC tool, and also provides reference-based QC, enabling Hi-C to be more easily applied to non-model organisms and environmental samples. We characterise the accuracy of the new algorithm on simulated and real datasets and compare it to reference-based methods. The Hi-C sequencing technique offers the potential for significant scientific insight about the spatial arrangement of DNA; however, achieving such outcomes is highly dependent on the quality of the resulting sequencing library.
Unlike conventional next-gen sequencing, only a fraction of a given Hi-C library contains this useful spatial information (the signal) with the remainder being effectively noise. As Hi-C remains a challenging laboratory technique, signal strength of resulting libraries can vary greatly. As a quality metric, the quantification of a library’s signal content is an essential asset in any quality mitigation strategy. Quality assessment of Hi-C data has until now relied on access to an (ideally refined) reference sequence, by which indirect indicators of quality are determined. Here we describe qc3C, a software tool capable of the direct, reference-free estimation of the signal content of a Hi-C library. In doing so, not only can researchers make informed decisions on how to progress based on library information content, but eliminating the reference also enables Hi-C quality management for non-model organism and metagenomics researchers.
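The observation that proximity ligation creates sequence not naturally present in the sample can be illustrated with a deliberately naive sketch. This is not the qc3C algorithm or its API: it only counts reads that contain the ligation junction outright, which at best gives a crude lower-bound proxy for signal content. The function names are hypothetical, and the junction construction assumes a palindromic restriction site that leaves a 5' overhang, followed by end fill-in and blunt-end ligation.

```python
from typing import Iterable


def junction_sequence(site: str, cut: int) -> str:
    """Build the ligation junction for a 5'-overhang restriction enzyme.

    `site` is the recognition sequence and `cut` is the number of bases
    before the top-strand cut position (0 for ^GATC, 1 for A^AGCTT).
    After fill-in and blunt ligation the overhang is duplicated.
    """
    site = site.upper()
    if not 0 <= cut <= len(site) // 2:
        raise ValueError("cut must lie in the first half of the site")
    return site[: len(site) - cut] + site[cut:]


def junction_fraction(reads: Iterable[str], junction: str) -> float:
    """Fraction of reads that contain the junction sequence verbatim."""
    total = 0
    hits = 0
    for read in reads:
        total += 1
        if junction in read.upper():
            hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    dpnii = junction_sequence("GATC", 0)
    toy_reads = ["ACGTGATCGATCAA", "ACGTACGTACGT"]  # made-up reads
    print(dpnii, junction_fraction(toy_reads, dpnii))
```

A plain substring count ignores junctions that fall outside the sequenced portion of a fragment and junction sequences that occur naturally in the genome, which is why a statistical treatment such as the one the abstract describes is needed for a real estimate.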
Affiliation(s)
- Matthew Z. DeMaere
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
| | - Aaron E. Darling
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
| |
|
9
|
Gaio D, DeMaere MZ, Anantanawat K, Chapman TA, Djordjevic SP, Darling AE. Post-weaning shifts in microbiome composition and metabolism revealed by over 25 000 pig gut metagenome-assembled genomes. Microb Genom 2021; 7. [PMID: 34370660 PMCID: PMC8549361 DOI: 10.1099/mgen.0.000501] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/16/2022] Open
Abstract
Using a previously described metagenomics dataset of 27 billion reads, we reconstructed over 50 000 metagenome-assembled genomes (MAGs) of organisms resident in the porcine gut, 46.5 % of which were classified as >70 % complete with a <10 % contamination rate, and 24.4 % were nearly complete genomes. Here, we describe the generation and analysis of those MAGs using time-series samples. The gut microbial communities of piglets appear to follow a highly structured developmental programme in the weeks following weaning, and this development is robust to treatments including an intramuscular antibiotic treatment and two probiotic treatments. The high resolution we obtained allowed us to identify specific taxonomic ‘signatures’ that characterize the gut microbial development immediately after weaning. Additionally, we characterized the carbohydrate repertoire of the organisms resident in the porcine gut. We tracked the abundance shifts of 294 carbohydrate active enzymes, and identified the species and higher-level taxonomic groups carrying each of these enzymes in their MAGs. This knowledge can contribute to the design of probiotics and prebiotic interventions as a means to modify the piglet gut microbiome.
Affiliation(s)
- Daniela Gaio
- iThree Institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Matthew Z DeMaere
- iThree Institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Kay Anantanawat
- iThree Institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Toni A Chapman
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, New South Wales, Australia
| | - Steven P Djordjevic
- iThree Institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Aaron E Darling
- iThree Institute, University of Technology Sydney, Sydney, New South Wales, Australia
| |
|
10
|
Quince C, Nurk S, Raguideau S, James R, Soyer OS, Summers JK, Limasset A, Eren AM, Chikhi R, Darling AE. STRONG: metagenomics strain resolution on assembly graphs. Genome Biol 2021; 22:214. [PMID: 34311761 PMCID: PMC8311964 DOI: 10.1186/s13059-021-02419-7] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 08/22/2020] [Accepted: 06/29/2021] [Indexed: 12/30/2022] Open
Abstract
We introduce STrain Resolution ON assembly Graphs (STRONG), which identifies strains de novo, from multiple metagenome samples. STRONG performs coassembly, and binning into metagenome assembled genomes (MAGs), and stores the coassembly graph prior to variant simplification. This enables the subgraphs and their unitig per-sample coverages, for individual single-copy core genes (SCGs) in each MAG, to be extracted. A Bayesian algorithm, BayesPaths, determines the number of strains present, their haplotypes or sequences on the SCGs, and abundances. STRONG is validated using synthetic communities and for a real anaerobic digestor time series generates haplotypes that match those observed from long Nanopore reads.
Affiliation(s)
- Christopher Quince
- Organisms and Ecosystems, Earlham Institute, Norwich, NR4 7UZ, UK.
- Gut Microbes and Health, Quadram Institute, Norwich, NR4 7UQ, UK.
- Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK.
| | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, 20892, MD, USA.
| | - Sebastien Raguideau
- Organisms and Ecosystems, Earlham Institute, Norwich, NR4 7UZ, UK
- Warwick Medical School, University of Warwick, Coventry, CV4 7AL, UK
| | - Robert James
- Gut Microbes and Health, Quadram Institute, Norwich, NR4 7UQ, UK
| | - Orkun S Soyer
- School of Life Sciences, University of Warwick, Coventry, CV4 7AL, UK
| | | | | | - A Murat Eren
- Department of Medicine, University of Chicago, Chicago, Illinois, USA
- Josephine Bay Paul Center, Marine Biological Laboratory, Woods Hole, Massachusetts, USA
| | - Rayan Chikhi
- Department of Computational Biology, Institut Pasteur, C3BI USR 3756 IP CNRS, Paris, France
| | - Aaron E Darling
- The iThree institute, University of Technology Sydney, 15 Broadway, Ultimo, 2007, NSW, Australia
| |
|
11
|
Vicedomini R, Quince C, Darling AE, Chikhi R. Strainberry: automated strain separation in low-complexity metagenomes using long reads. Nat Commun 2021; 12:4485. [PMID: 34301928 PMCID: PMC8302730 DOI: 10.1038/s41467-021-24515-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/11/2021] [Accepted: 06/18/2021] [Indexed: 02/07/2023] Open
Abstract
High-throughput short-read metagenomics has enabled large-scale species-level analysis and functional characterization of microbial communities. Microbiomes often contain multiple strains of the same species, and different strains have been shown to have important differences in their functional roles. Recent advances on long-read based methods enabled accurate assembly of bacterial genomes from complex microbiomes and an as-yet-unrealized opportunity to resolve strains. Here we present Strainberry, a metagenome assembly pipeline that performs strain separation in single-sample low-complexity metagenomes and that relies uniquely on long-read data. We benchmarked Strainberry on mock communities for which it produces strain-resolved assemblies with near-complete reference coverage and 99.9% base accuracy. We also applied Strainberry on real datasets for which it improved assemblies generating 20-118% additional genomic material than conventional metagenome assemblies on individual strain genomes. We show that Strainberry is also able to refine microbial diversity in a complex microbiome, with complete separation of strain genomes. We anticipate this work to be a starting point for further methodological improvements on strain-resolved metagenome assembly in environments of higher complexities.
Affiliation(s)
- Riccardo Vicedomini
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France.
| | - Christopher Quince
- Organisms and Ecosystems, Earlham Institute, Norwich, United Kingdom
- Gut Microbes and Health, Quadram Institute, Norwich, United Kingdom
- Warwick Medical School, University of Warwick, Coventry, United Kingdom
| | - Aaron E Darling
- The iThree Institute, University of Technology Sydney, Ultimo, NSW, Australia
| | - Rayan Chikhi
- Sequence Bioinformatics, Department of Computational Biology, Institut Pasteur, Paris, France
| |
|
12
|
Gaio D, DeMaere MZ, Anantanawat K, Eamens GJ, Liu M, Zingali T, Falconer L, Chapman TA, Djordjevic SP, Darling AE. A large-scale metagenomic survey dataset of the post-weaning piglet gut lumen. Gigascience 2021; 10:giab039. [PMID: 34080630 PMCID: PMC8173662 DOI: 10.1093/gigascience/giab039] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/24/2020] [Revised: 02/22/2021] [Accepted: 05/04/2021] [Indexed: 12/23/2022] Open
Abstract
BACKGROUND Early weaning and intensive farming practices predispose piglets to the development of infectious and often lethal diseases, against which antibiotics are used. Besides contributing to the build-up of antimicrobial resistance, antibiotics are known to modulate the gut microbial composition. As an alternative to antibiotic treatment, studies have previously investigated the potential of probiotics for the prevention of postweaning diarrhea. In order to describe the post-weaning gut microbiota, and to study the effects of two probiotics formulations and of intramuscular antibiotic treatment on the gut microbiota, we sampled and processed over 800 faecal time-series samples from 126 piglets and 42 sows. RESULTS Here we report on the largest shotgun metagenomic dataset of the pig gut lumen microbiome to date, consisting of >8 Tbp of shotgun metagenomic sequencing data. The animal trial, the workflow from sample collection to sample processing, and the preparation of libraries for sequencing, are described in detail. We provide a preliminary analysis of the dataset, centered on a taxonomic profiling of the samples, and a 16S-based beta diversity analysis of the mothers and the piglets in the first 5 weeks after weaning. CONCLUSIONS This study was conducted to generate a publicly available databank of the faecal metagenome of weaner piglets aged between 3 and 9 weeks old, treated with different probiotic formulations and intramuscular antibiotic treatment. Besides investigating the effects of the probiotic and intramuscular antibiotic treatment, the dataset can be explored to assess a wide range of ecological questions with regards to antimicrobial resistance, host-associated microbial and phage communities, and their dynamics during the aging of the host.
Affiliation(s)
- Daniela Gaio
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| | - Matthew Z DeMaere
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| | - Kay Anantanawat
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| | - Graeme J Eamens
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
| | - Michael Liu
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| | - Tiziana Zingali
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| | - Linda Falconer
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
| | - Toni A Chapman
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Rd, Menangle NSW 2568, Australia
| | - Steven P Djordjevic
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| | - Aaron E Darling
- The iThree Institute, University of Technology Sydney, Sydney, NSW 2007, Australia
| |
|
13
|
Meyer F, Lesker TR, Koslicki D, Fritz A, Gurevich A, Darling AE, Sczyrba A, Bremges A, McHardy AC. Tutorial: assessing metagenomics software with the CAMI benchmarking toolkit. Nat Protoc 2021; 16:1785-1801. [PMID: 33649565 DOI: 10.1038/s41596-020-00480-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/10/2020] [Accepted: 11/26/2020] [Indexed: 01/31/2023]
Abstract
Computational methods are key in microbiome research, and obtaining a quantitative and unbiased performance estimate is important for method developers and applied researchers. For meaningful comparisons between methods, to identify best practices and common use cases, and to reduce overhead in benchmarking, it is necessary to have standardized datasets, procedures and metrics for evaluation. In this tutorial, we describe emerging standards in computational meta-omics benchmarking derived and agreed upon by a larger community of researchers. Specifically, we outline recent efforts by the Critical Assessment of Metagenome Interpretation (CAMI) initiative, which supplies method developers and applied researchers with exhaustive quantitative data about software performance in realistic scenarios and organizes community-driven benchmarking challenges. We explain the most relevant evaluation metrics for assessing metagenome assembly, binning and profiling results, and provide step-by-step instructions on how to generate them. The instructions use simulated mouse gut metagenome data released in preparation for the second round of CAMI challenges and showcase the use of a repository of tool results for CAMI datasets. This tutorial will serve as a reference for the community and facilitate informative and reproducible benchmarking in microbiome research.
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Till-Robin Lesker
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,German Center for Infection Research (DZIF), Braunschweig, Germany
| | - David Koslicki
- Computer Science and Engineering, Biology, and The Huck Institutes of the Life Sciences, Penn State University, State College, PA, USA
| | - Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, St. Petersburg State University, St. Petersburg, Russia
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, Australia
| | - Alexander Sczyrba
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,German Center for Infection Research (DZIF), Braunschweig, Germany
| | - Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.
| |
|
14
|
Hastak P, Fourment M, Darling AE, Gottlieb T, Cheong E, Merlino J, Myers GSA, Djordjevic SP, Roy Chowdhury P. Escherichia coli ST8196 is a novel, locally evolved, and extensively drug resistant pathogenic lineage within the ST131 clonal complex. Emerg Microbes Infect 2020; 9:1780-1792. [PMID: 32686595 PMCID: PMC7473005 DOI: 10.1080/22221751.2020.1797541] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/12/2020] [Accepted: 07/14/2020] [Indexed: 11/25/2022]
Abstract
The H30Rx subclade of Escherichia coli ST131 is a clinically important, globally dispersed pathogenic lineage that typically displays resistance to fluoroquinolones and extended spectrum β-lactams. Isolates EC233 and EC234, variants of ST131-H30Rx with a novel sequence type (ST) 8196, isolated from unrelated patients presenting with bacteraemia at a Sydney Hospital in 2014 are characterised here. EC233 and EC234 are phylogroup B2, serotype O25:H4A, and resistant to ampicillin, amoxicillin, cefoxitin, ceftazidime, ceftriaxone, ciprofloxacin, norfloxacin and gentamicin and are likely clonal. Both harbour an IncFII_2 plasmid (pSPRC_Ec234-FII) that carries most of the resistance genes on an IS26 associated translocatable unit, two small plasmids and a novel IncI1 plasmid (pSPRC_Ec234-I). SNP-based phylogenetic analysis of the core genome of representatives within the ST131 clonal complex places both isolates in a subclade with three clinical Australian ST131-H30Rx clade-C isolates. A MrBayes phylogeny analysis of EC233 and EC234 indicates that ST8196 shares a most recent common ancestor with ST131-H30Rx strain EC70 isolated from the same hospital in 2013. Our study identified genomic hallmarks that define the ST131-H30Rx subclade in the ST8196 isolates and highlights a need for unbiased genomic surveillance approaches to identify novel high-risk MDR E. coli pathogens that impact healthcare facilities.
Affiliation(s)
- Priyanka Hastak
- The ithree institute, University of Technology Sydney, Ultimo, Australia
| | - Mathieu Fourment
- The ithree institute, University of Technology Sydney, Ultimo, Australia
| | - Aaron E. Darling
- The ithree institute, University of Technology Sydney, Ultimo, Australia
| | - Thomas Gottlieb
- Department of Microbiology and Infectious Diseases, Concord Hospital, Concord, Australia
- Faculty of Medicine, University of Sydney, Sydney, Australia
| | - Elaine Cheong
- Department of Microbiology and Infectious Diseases, Concord Hospital, Concord, Australia
- Faculty of Medicine, University of Sydney, Sydney, Australia
| | - John Merlino
- Department of Microbiology and Infectious Diseases, Concord Hospital, Concord, Australia
- Faculty of Medicine, University of Sydney, Sydney, Australia
| | - Garry S. A. Myers
- The ithree institute, University of Technology Sydney, Ultimo, Australia
| | - Steven P. Djordjevic
- The ithree institute, University of Technology Sydney, Ultimo, Australia
- Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Broadway, Australia
| | - Piklu Roy Chowdhury
- The ithree institute, University of Technology Sydney, Ultimo, Australia
- Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Broadway, Australia
| |
|
15
|
Bogema DR, McKinnon J, Liu M, Hitchick N, Miller N, Venturini C, Iredell J, Darling AE, Roy Chowdury P, Djordjevic SP. Whole-genome analysis of extraintestinal Escherichia coli sequence type 73 from a single hospital over a 2 year period identified different circulating clonal groups. Microb Genom 2020; 6. [PMID: 30810518 PMCID: PMC7067039 DOI: 10.1099/mgen.0.000255] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Sequence type (ST)73 has emerged as one of the most frequently isolated extraintestinal pathogenic Escherichia coli. To examine the localized diversity of ST73 clonal groups, including their mobile genetic element profile, we sequenced the genomes of 16 multiple-drug resistant ST73 isolates from patients with urinary tract infection from a single hospital in Sydney, Australia, between 2009 and 2011. Genome sequences were used to generate a SNP-based phylogenetic tree to determine the relationship of these isolates in a global context with ST73 sequences (n=210) from public databases. There was no evidence of a dominant outbreak strain of ST73 in patients from this hospital, rather we identified at least eight separate groups, several of which reoccurred, over a 2 year period. The inferred phylogeny of all ST73 strains (n=226) including the ST73 clone D i2 reference genome shows high bootstrap support and clusters into four major groups that correlate with serotype. The Sydney ST73 strains carry a wide variety of virulence-associated genes, but the presence of iss, pic and several iron-acquisition operons was notable.
Affiliation(s)
- D R Bogema
- Elizabeth Macarthur Agricultural Institute, NSW Department of Primary Industries, Menangle, NSW 2568, Australia.,The ithree Institute, University of Technology Sydney, NSW 2007, Australia
| | - J McKinnon
- The ithree Institute, University of Technology Sydney, NSW 2007, Australia
| | - M Liu
- The ithree Institute, University of Technology Sydney, NSW 2007, Australia
| | - N Hitchick
- San Pathology, Sydney Adventist Hospital, Wahroonga, NSW 2076, Australia
| | - N Miller
- San Pathology, Sydney Adventist Hospital, Wahroonga, NSW 2076, Australia
| | - C Venturini
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia
| | - J Iredell
- Centre for Infectious Diseases and Microbiology, Westmead Institute for Medical Research, The University of Sydney, Westmead, NSW 2145, Australia
| | - A E Darling
- The ithree Institute, University of Technology Sydney, NSW 2007, Australia
| | - P Roy Chowdury
- The ithree Institute, University of Technology Sydney, NSW 2007, Australia
| | - S P Djordjevic
- The ithree Institute, University of Technology Sydney, NSW 2007, Australia
| |
|
16
|
Lodge CJ, Lowe AJ, Milanzi E, Bowatte G, Abramson MJ, Tsimiklis H, Axelrad C, Robertson B, Darling AE, Svanes C, Wjst M, Dharmage SC, Bode L. Human milk oligosaccharide profiles and allergic disease up to 18 years. J Allergy Clin Immunol 2020; 147:1041-1048. [PMID: 32650022 DOI: 10.1016/j.jaci.2020.06.027] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 11/18/2019] [Revised: 06/18/2020] [Accepted: 06/24/2020] [Indexed: 01/01/2023]
Abstract
BACKGROUND Human milk oligosaccharides (HMOs) are a diverse range of sugars secreted in breast milk that have direct and indirect effects on immunity. The profiles of HMOs produced differ between mothers. OBJECTIVE We sought to determine the relationship between maternal HMO profiles and offspring allergic diseases up to age 18 years. METHODS Colostrum and early lactation milk samples were collected from 285 mothers enrolled in a high-allergy-risk birth cohort, the Melbourne Atopy Cohort Study. Nineteen HMOs were measured. Profiles/patterns of maternal HMOs were determined using latent class analysis (LCA). Details of allergic disease outcomes including sensitization, wheeze, asthma, and eczema were collected at multiple follow-ups up to age 18 years. Adjusted logistic regression analyses and generalized estimating equations were used to determine the relationship between HMO profiles and allergy. RESULTS The levels of several HMOs were highly correlated with each other. LCA determined 7 distinct maternal milk profiles with memberships of 10% and 20%. Compared with offspring exposed to the neutral Lewis HMO profile, exposure to acidic Lewis HMOs was associated with a higher risk of allergic disease and asthma over childhood (odds ratio asthma at 18 years, 5.82; 95% CI, 1.59-21.23), whereas exposure to the acidic-predominant profile was associated with a reduced risk of food sensitization (OR at 12 years, 0.08; 95% CI, 0.01-0.67). CONCLUSIONS In this high-allergy-risk birth cohort, some profiles of HMOs were associated with increased and some with decreased allergic disease risks over childhood. Further studies are needed to confirm these findings and realize the potential for intervention.
Affiliation(s)
- Caroline J Lodge
- Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; Murdoch Childrens Research Institute, Royal Childrens Hospital, Parkville, Australia.
| | - Adrian J Lowe
- Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; Murdoch Childrens Research Institute, Royal Childrens Hospital, Parkville, Australia
| | - Elasma Milanzi
- Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia
| | - Gayan Bowatte
- Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; Department of Basic Sciences, Faculty of Allied Health Sciences, University of Peradeniya, Peradeniya, Sri Lanka; National Institute of Fundamental Studies, Kandy, Sri Lanka
| | - Michael J Abramson
- School of Public Health & Preventive Medicine, Monash University, Melbourne, Australia
| | - Helen Tsimiklis
- Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton, Australia
| | - Christine Axelrad
- Murdoch Childrens Research Institute, Royal Childrens Hospital, Parkville, Australia
| | - Bianca Robertson
- Department of Pediatrics and Larsson-Rosenquist Foundation Mother-Milk-Infant Center of Research Excellence, University of California San Diego, La Jolla, Calif
| | - Aaron E Darling
- The ithree Institute, University of Technology Sydney, Ultimo, Australia
| | - Cecilie Svanes
- Department of Occupational Medicine, Haukeland University Hospital, Bergen, Norway
| | - Matthias Wjst
- Institute of Lung Biology and Disease, Helmholtz Zentrum Muenchen, German Research Center for Environmental Health (GmbH), Neuherberg, Germany; Institut für Medizinische Informatik Statistik und Epidemiologie, Lehrstuhl für Medizinische Informatik, Klinikum rechts der Isar, Munich, Germany
| | - Shyamali C Dharmage
- Allergy and Lung Health Unit, Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Parkville, Australia; Murdoch Childrens Research Institute, Royal Childrens Hospital, Parkville, Australia
| | - Lars Bode
- Department of Pediatrics and Larsson-Rosenquist Foundation Mother-Milk-Infant Center of Research Excellence, University of California San Diego, La Jolla, Calif
| |
|
17
|
DeMaere MZ, Liu MYZ, Lin E, Djordjevic SP, Charles IG, Worden P, Burke CM, Monahan LG, Gardiner M, Borody TJ, Darling AE. Metagenomic Hi-C of a Healthy Human Fecal Microbiome Transplant Donor. Microbiol Resour Announc 2020; 9:e01523-19. [PMID: 32029559 PMCID: PMC7005124 DOI: 10.1128/mra.01523-19] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/12/2019] [Accepted: 01/06/2020] [Indexed: 11/20/2022] Open
Abstract
We report the availability of a high-quality metagenomic Hi-C data set generated from a fecal sample taken from a healthy fecal microbiome transplant donor subject. We report on basic features of the data to evaluate their quality.
Affiliation(s)
- Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Michael Y Z Liu
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Enmoore Lin
- The Centre for Digestive Diseases, Five Dock, NSW, Australia
| | - Steven P Djordjevic
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | | | - Paul Worden
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Catherine M Burke
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Leigh G Monahan
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Melissa Gardiner
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Thomas J Borody
- The Centre for Digestive Diseases, Five Dock, NSW, Australia
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, NSW, Australia
| |
|
18
|
Fourment M, Darling AE. Evaluating probabilistic programming and fast variational Bayesian inference in phylogenetics. PeerJ 2019; 7:e8272. [PMID: 31976168 PMCID: PMC6966998 DOI: 10.7717/peerj.8272] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/05/2019] [Accepted: 11/22/2019] [Indexed: 12/21/2022] Open
Abstract
Recent advances in statistical machine learning techniques have led to the creation of probabilistic programming frameworks. These frameworks enable probabilistic models to be rapidly prototyped and fit to data using scalable approximation methods such as variational inference. In this work, we explore the use of the Stan language for probabilistic programming in application to phylogenetic models. We show that many commonly used phylogenetic models including the general time reversible substitution model, rate heterogeneity among sites, and a range of coalescent models can be implemented using a probabilistic programming language. The posterior probability distributions obtained via the black box variational inference engine in Stan were compared to those obtained with reference implementations of Markov chain Monte Carlo (MCMC) for phylogenetic inference. We find that black box variational inference in Stan is less accurate than MCMC methods for phylogenetic models, but requires far less compute time. Finally, we evaluate a custom implementation of mean-field variational inference on the Jukes-Cantor substitution model and show that a specialized implementation of variational inference can be two orders of magnitude faster and more accurate than a general purpose probabilistic implementation.
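For readers unfamiliar with the Jukes-Cantor model mentioned in the abstract above, the following is a minimal illustrative sketch of its transition probabilities and the resulting pairwise likelihood. It is not the authors' Stan or custom variational implementation; the function names and the example sequences are hypothetical and chosen only for illustration.

```python
import math


def jc69_transition_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in the same (or one specific
    different) base after a branch of length t, measured in expected
    substitutions per site."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def jc69_pair_log_likelihood(seq_a: str, seq_b: str, t: float) -> float:
    """Log-likelihood of two aligned, gap-free sequences separated by t."""
    total = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the stationary frequency of each base under Jukes-Cantor.
        total += math.log(0.25 * jc69_transition_prob(a == b, t))
    return total


def jc69_distance(seq_a: str, seq_b: str) -> float:
    """Closed-form maximum-likelihood distance under Jukes-Cantor."""
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = diffs / min(len(seq_a), len(seq_b))
    if p >= 0.75:
        raise ValueError("sequences too divergent for a finite JC69 distance")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Because this model has a closed-form likelihood for a pair of sequences, it is a convenient test case for checking an approximate inference scheme against an exact answer.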
Affiliation(s)
- Mathieu Fourment
- ithree Institute, University of Technology Sydney, Sydney, NSW, Australia
| | - Aaron E. Darling
- ithree Institute, University of Technology Sydney, Sydney, NSW, Australia
| |
|
19
|
Ayres DL, Cummings MP, Baele G, Darling AE, Lewis PO, Swofford DL, Huelsenbeck JP, Lemey P, Rambaut A, Suchard MA. BEAGLE 3: Improved Performance, Scaling, and Usability for a High-Performance Computing Library for Statistical Phylogenetics. Syst Biol 2019; 68:1052-1061. [PMID: 31034053 PMCID: PMC6802572 DOI: 10.1093/sysbio/syz020] [Citation(s) in RCA: 101] [Impact Index Per Article: 20.2] [Received: 04/27/2018] [Revised: 04/10/2019] [Accepted: 04/10/2019] [Indexed: 11/12/2022] Open
Abstract
BEAGLE is a high-performance likelihood-calculation library for phylogenetic inference. The BEAGLE library defines a simple, but flexible, application programming interface (API), and includes a collection of efficient implementations for calculation under a variety of evolutionary models on different hardware devices. The library has been integrated into recent versions of popular phylogenetics software packages including BEAST and MrBayes and has been widely used across a diverse range of evolutionary studies. Here, we present BEAGLE 3 with new parallel implementations, increased performance for challenging data sets, improved scalability, and better usability. We have added new OpenCL and central processing unit-threaded implementations to the library, allowing the effective utilization of a wider range of modern hardware. Further, we have extended the API and library to support concurrent computation of independent partial likelihood arrays, for increased performance of nucleotide-model analyses with greater flexibility of data partitioning. For better scalability and usability, we have improved how phylogenetic software packages use BEAGLE in multi-GPU (graphics processing unit) and cluster environments, and introduced an automated method to select the fastest device given the data set, evolutionary model, and hardware. For application developers who wish to integrate the library, we also have developed an online tutorial. To evaluate the effect of the improvements, we ran a variety of benchmarks on state-of-the-art hardware. For a partitioned exemplar analysis, we observe run-time performance improvements as high as 5.9-fold over our previous GPU implementation. BEAGLE 3 is free, open-source software licensed under the Lesser GPL and available at https://beagle-dev.github.io.
Affiliation(s)
- Daniel L Ayres
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
| | - Michael P Cummings
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
| | - Guy Baele
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, 3000 Leuven, Belgium
| | - Aaron E Darling
- The ithree Institute, University of Technology Sydney, Ultimo, New South Wales 2007, Australia
| | - Paul O Lewis
- Department of Ecology and Evolutionary Biology, University of Connecticut, Unit 3043, Storrs, CT 06269, USA
| | - David L Swofford
- Florida Museum of Natural History, University of Florida, Gainesville, FL 32611, USA
| | - John P Huelsenbeck
- Department of Integrative Biology, University of California, Berkeley, CA 94720, USA
| | - Philippe Lemey
- Department of Microbiology, Immunology and Transplantation, Rega Institute, KU Leuven – University of Leuven, 3000 Leuven, Belgium
| | - Andrew Rambaut
- Institute of Evolutionary Biology, University of Edinburgh, King’s Buildings, Edinburgh EH9 3FL, UK
- Fogarty International Center, National Institutes of Health, Bethesda, MD 20892, USA
| | - Marc A Suchard
- Department of Biomathematics, University of California, Los Angeles, CA 90095, USA
- Department of Biostatistics, University of California, Los Angeles, CA 90095, USA
- Department of Human Genetics, University of California, Los Angeles, CA 90095, USA
| |
|
20
|
Coil DA, Jospin G, Darling AE, Wallis C, Davis IJ, Harris S, Eisen JA, Holcombe LJ, O’Flynn C. Genomes from bacteria associated with the canine oral cavity: A test case for automated genome-based taxonomic assignment. PLoS One 2019; 14:e0214354. [PMID: 31181071 PMCID: PMC6557473 DOI: 10.1371/journal.pone.0214354] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 03/11/2019] [Accepted: 05/27/2019] [Indexed: 11/18/2022] Open
Abstract
Taxonomy for bacterial isolates is commonly assigned via sequence analysis. However, the most common sequence-based approaches (e.g. 16S rRNA gene-based phylogeny or whole genome comparisons) are still labor intensive and subjective to varying degrees. Here we present a set of 33 bacterial genomes, isolated from the canine oral cavity. Taxonomy of these isolates was first assigned by PCR amplification of the 16S rRNA gene, Sanger sequencing, and taxonomy assignment using BLAST. After genome sequencing, taxonomy was revisited through a manual process using a combination of average nucleotide identity (ANI), concatenated marker gene phylogenies, and 16S rRNA gene phylogenies. This taxonomy was then compared to the automated taxonomic assignment given by the recently proposed Genome Taxonomy Database (GTDB). We found the results of all three methods to be similar (25 out of the 33 had matching genera), but the GTDB approach required fewer subjective decisions, and required far less labor. The primary differences in the non-identical taxonomic assignments involved cases where GTDB has proposed taxonomic revisions.
Affiliation(s)
- David A. Coil
- Genome Center, University of California, Davis, CA, United States of America
| | - Guillaume Jospin
- Genome Center, University of California, Davis, CA, United States of America
| | - Aaron E. Darling
- The Ithree Institute, University of Technology Sydney, Ultimo NSW, Australia
| | - Corrin Wallis
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Ian J. Davis
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Stephen Harris
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Jonathan A. Eisen
- Genome Center, University of California, Davis, CA, United States of America
- Evolution and Ecology, Medical Microbiology and Immunology, University of California, Davis, Davis, CA, United States of America
| | - Lucy J. Holcombe
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| | - Ciaran O’Flynn
- The Waltham Centre for Pet Nutrition, Melton Mowbray, Leicestershire, United Kingdom
| |
|
21
|
Roy Chowdhury P, Fourment M, DeMaere MZ, Monahan L, Merlino J, Gottlieb T, Darling AE, Djordjevic SP. Identification of a novel lineage of plasmids within phylogenetically diverse subclades of IncHI2-ST1 plasmids. Plasmid 2019; 102:56-61. [PMID: 30885788 DOI: 10.1016/j.plasmid.2019.03.002] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/10/2018] [Revised: 02/22/2019] [Accepted: 03/13/2019] [Indexed: 11/17/2022]
Abstract
IncHI2-ST1 plasmids play an important role in co-mobilizing genes conferring resistance to critically important antibiotics and heavy metals. Here we present the identification and analysis of IncHI2-ST1 plasmid pSPRC-Echo1, isolated from an Enterobacter hormaechei strain from a Sydney hospital, which predates other multi-drug resistant IncHI2-ST1 plasmids reported from Australia. Our time-resolved phylogenetic analysis indicates that pSPRC-Echo1 represents a new lineage of IncHI2-ST1 plasmids and shows how their diversification relates to the era of antibiotics.
Affiliation(s)
- Piklu Roy Chowdhury
- The Ithree Institute, University of Technology Sydney, City Campus, Ultimo, NSW 2007, Australia; NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Sydney, Australia.
| | - Mathieu Fourment
- The Ithree Institute, University of Technology Sydney, City Campus, Ultimo, NSW 2007, Australia
| | - Matthew Z DeMaere
- The Ithree Institute, University of Technology Sydney, City Campus, Ultimo, NSW 2007, Australia
| | - Leigh Monahan
- The Ithree Institute, University of Technology Sydney, City Campus, Ultimo, NSW 2007, Australia
| | - John Merlino
- Department of Microbiology and Infectious Diseases, Concord Hospital, NSW Health Pathology, Hospital Road, Concord 2139, NSW, Australia; Faculty of Medicine, University of Sydney, NSW, Australia
| | - Thomas Gottlieb
- Department of Microbiology and Infectious Diseases, Concord Hospital, NSW Health Pathology, Hospital Road, Concord 2139, NSW, Australia; Faculty of Medicine, University of Sydney, NSW, Australia
| | - Aaron E Darling
- The Ithree Institute, University of Technology Sydney, City Campus, Ultimo, NSW 2007, Australia
| | - Steven P Djordjevic
- The Ithree Institute, University of Technology Sydney, City Campus, Ultimo, NSW 2007, Australia
| |
|
22
|
Abstract
Most microbes cannot be easily cultured, and metagenomics provides a means to study them. Current techniques aim to resolve individual genomes from metagenomes, so-called metagenome-assembled genomes (MAGs). Leading approaches depend upon time series or transect studies, the efficacy of which is a function of community complexity, target abundance, and sequencing depth. We describe an unsupervised method that exploits the hierarchical nature of Hi-C interaction rates to resolve MAGs using a single time point. We validate the method and directly compare against a recently announced proprietary service, ProxiMeta. bin3C is an open-source pipeline and makes use of the Infomap clustering algorithm (https://github.com/cerebis/bin3C).
Affiliation(s)
- Matthew Z. DeMaere
- The ithree institute, University of Technology Sydney, 15 Broadway, Ultimo, 2007 NSW Australia
| | - Aaron E. Darling
- The ithree institute, University of Technology Sydney, 15 Broadway, Ultimo, 2007 NSW Australia
| |
|
23
|
Monahan LG, DeMaere MZ, Cummins ML, Djordjevic SP, Roy Chowdhury P, Darling AE. High contiguity genome sequence of a multidrug-resistant hospital isolate of Enterobacter hormaechei. Gut Pathog 2019; 11:3. [PMID: 30805030 PMCID: PMC6373042 DOI: 10.1186/s13099-019-0288-7] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 12/09/2018] [Accepted: 02/06/2019] [Indexed: 11/10/2022] Open
Abstract
Background: Enterobacter hormaechei is an important emerging pathogen and a key member of the highly diverse Enterobacter cloacae complex. E. hormaechei strains can persist and spread in nosocomial environments, and often exhibit resistance to multiple clinically important antibiotics. However, the genomic regions that harbour resistance determinants are typically highly repetitive and impossible to resolve with standard short-read sequencing technologies. Results: Here we used both short- and long-read methods to sequence the genome of a multidrug-resistant hospital isolate (C15117), which we identified as E. hormaechei. Hybrid assembly generated a complete circular chromosome of 4,739,272 bp and a fully resolved plasmid of 339,920 bp containing several antibiotic resistance genes. The strain also harboured a 34,857 bp repeat encoding copper resistance, which was present in both the chromosome and plasmid. Long reads that unambiguously spanned this repeat were required to resolve the chromosome and plasmid into separate replicons. Conclusion: This study provides important insights into the evolution and potential spread of antimicrobial resistance in a nosocomial E. hormaechei strain. More broadly, it further exemplifies the power of long-read sequencing technologies, particularly the Oxford Nanopore platform, for the characterisation of bacteria with complex resistance loci and large repeat elements.
Affiliation(s)
- Leigh G Monahan
- ithree institute, University of Technology Sydney, Broadway Street, Ultimo, 2007 Australia
| | - Matthew Z DeMaere
- ithree institute, University of Technology Sydney, Broadway Street, Ultimo, 2007 Australia
| | - Max L Cummins
- ithree institute, University of Technology Sydney, Broadway Street, Ultimo, 2007 Australia
| | - Steven P Djordjevic
- ithree institute, University of Technology Sydney, Broadway Street, Ultimo, 2007 Australia
| | - Piklu Roy Chowdhury
- ithree institute, University of Technology Sydney, Broadway Street, Ultimo, 2007 Australia; NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, Menangle, 2568 Australia
| | - Aaron E Darling
- ithree institute, University of Technology Sydney, Broadway Street, Ultimo, 2007 Australia
| |
|
24
|
Fritz A, Hofmann P, Majda S, Dahms E, Dröge J, Fiedler J, Lesker TR, Belmann P, DeMaere MZ, Darling AE, Sczyrba A, Bremges A, McHardy AC. CAMISIM: simulating metagenomes and microbial communities. Microbiome 2019; 7:17. [PMID: 30736849 PMCID: PMC6368784 DOI: 10.1186/s40168-019-0633-6] [Citation(s) in RCA: 86] [Impact Index Per Article: 17.2] [Received: 04/18/2018] [Accepted: 01/21/2019] [Indexed: 05/11/2023]
Abstract
BACKGROUND: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required. RESULTS: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, using several thousand small data sets generated with CAMISIM. CONCLUSIONS: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM.
Affiliation(s)
- Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
| | - Peter Hofmann
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225 Germany
| | - Stephan Majda
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225 Germany
| | - Eik Dahms
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225 Germany
| | - Johannes Dröge
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225 Germany
| | - Jessika Fiedler
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225 Germany
| | - Till R. Lesker
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, 38124 Germany
| | - Peter Belmann
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Center for Biotechnology and Faculty of Technology, Bielefeld University, Bielefeld, 33615 Germany
| | - Matthew Z. DeMaere
- The ithree institute, University of Technology Sydney, Sydney NSW, 2007 Australia
| | - Aaron E. Darling
- The ithree institute, University of Technology Sydney, Sydney NSW, 2007 Australia
| | - Alexander Sczyrba
- Center for Biotechnology and Faculty of Technology, Bielefeld University, Bielefeld, 33615 Germany
| | - Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, 38124 Germany
| | - Alice C. McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, 38124 Germany
- Formerly Department of Algorithmic Bioinformatics, Heinrich-Heine University Düsseldorf, Düsseldorf, 40225 Germany
| |
|
25
|
Reid CJ, Wyrsch ER, Roy Chowdhury P, Zingali T, Liu M, Darling AE, Chapman TA, Djordjevic SP. Porcine commensal Escherichia coli: a reservoir for class 1 integrons associated with IS26. Microb Genom 2019; 3. [PMID: 29306352 PMCID: PMC5761274 DOI: 10.1099/mgen.0.000143] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 12/21/2022] Open
Abstract
Porcine faecal waste is a serious environmental pollutant. Carriage of antimicrobial-resistance genes (ARGs) and virulence-associated genes (VAGs), and the zoonotic potential of commensal Escherichia coli from swine are largely unknown. Furthermore, little is known about the role of commensal E. coli as contributors to the mobilization of ARGs between food animals and the environment. Here, we report whole-genome sequence analysis of 103 class 1 integron-positive E. coli from the faeces of healthy pigs from two commercial production facilities in New South Wales, Australia. Most strains belonged to phylogroups A and B1, and carried VAGs linked with extraintestinal infection in humans. The 103 strains belonged to 37 multilocus sequence types and clonal complex 10 featured prominently. Seventeen ARGs were detected and 97 % (100/103) of strains carried three or more ARGs. Heavy-metal-resistance genes merA, cusA and terA were also common. IS26 was observed in 98 % (101/103) of strains and was often physically associated with structurally diverse class 1 integrons that carried unique genetic features, which may be tracked. This study provides, to our knowledge, the first detailed genomic analysis and point of reference for commensal E. coli of porcine origin in Australia, facilitating tracking of specific lineages and the mobile resistance genes they carry.
Affiliation(s)
- Cameron J Reid
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Ethan R Wyrsch
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Piklu Roy Chowdhury
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Tiziana Zingali
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Michael Liu
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Aaron E Darling
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Toni A Chapman
- NSW Department of Primary Industries, Elizabeth MacArthur Agricultural Institute, Menangle, NSW 2568, Australia
| | - Steven P Djordjevic
- The i3 institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| |
|
26
|
Abstract
Background: Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. Findings: We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. Conclusions: We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
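The distance-decay relationship mentioned in the abstract above can be illustrated with a short sketch. This is not the simulator's code: the power-law form, the `alpha` exponent, the linear (non-circular) genome, and all function names are assumptions made purely for illustration.

```python
import random


def distance_decay_weights(max_separation: int, alpha: float = 1.0) -> list:
    """Normalized contact probabilities for separations 1..max_separation,
    assumed here to be proportional to separation ** (-alpha)."""
    raw = [s ** (-alpha) for s in range(1, max_separation + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_contact_pairs(genome_length: int, n_pairs: int,
                         alpha: float = 1.0, seed: int = 0) -> list:
    """Draw (left, right) coordinate pairs on a linear genome so that
    closely spaced loci are paired more often than distant ones."""
    rng = random.Random(seed)
    separations = list(range(1, genome_length))
    weights = distance_decay_weights(genome_length - 1, alpha)
    pairs = []
    for sep in rng.choices(separations, weights=weights, k=n_pairs):
        # Place the pair uniformly wherever it fits on the genome.
        left = rng.randrange(0, genome_length - sep)
        pairs.append((left, left + sep))
    return pairs
```

Calling `sample_contact_pairs(1000, 50)` returns 50 coordinate pairs, most of them separated by short distances; a realistic simulator would also need to model inter-chromosomal contacts, spurious ligation, and sequencing error, as the abstract describes.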
Affiliation(s)
- Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2077, Australia
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, PO Box 123, Broadway, NSW 2077, Australia
| |
|
27
|
Fourment M, Claywell BC, Dinh V, McCoy C, Matsen Iv FA, Darling AE. Effective Online Bayesian Phylogenetics via Sequential Monte Carlo with Guided Proposals. Syst Biol 2018; 67:490-502. [PMID: 29186587 PMCID: PMC5920299 DOI: 10.1093/sysbio/syx090] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 06/02/2017] [Accepted: 11/20/2017] [Indexed: 11/14/2022] Open
Abstract
Modern infectious disease outbreak surveillance produces continuous streams of sequence data which require phylogenetic analysis as data arrives. Current software packages for Bayesian phylogenetic inference are unable to quickly incorporate new sequences as they become available, making them less useful for dynamically unfolding evolutionary stories. This limitation can be addressed by applying a class of Bayesian statistical inference algorithms called sequential Monte Carlo (SMC) to conduct online inference, wherein new data can be continuously incorporated to update the estimate of the posterior probability distribution. In this article, we describe and evaluate several different online phylogenetic sequential Monte Carlo (OPSMC) algorithms. We show that proposing new phylogenies with a density similar to the Bayesian prior suffers from poor performance, and we develop “guided” proposals that better match the proposal density to the posterior. Furthermore, we show that the simplest guided proposals can exhibit pathological behavior in some situations, leading to poor results, and that the situation can be resolved by heating the proposal density. The results demonstrate that relative to the widely used MCMC-based algorithm implemented in MrBayes, the total time required to compute a series of phylogenetic posteriors as sequences arrive can be significantly reduced by the use of OPSMC, without incurring a significant loss in accuracy.
Affiliation(s)
- Mathieu Fourment
- ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | | | - Vu Dinh
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Connor McCoy
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | | | - Aaron E Darling
- ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
| |
|
28
|
O'Donoghue SI, Baldi BF, Clark SJ, Darling AE, Hogan JM, Kaur S, Maier-Hein L, McCarthy DJ, Moore WJ, Stenau E, Swedlow JR, Vuong J, Procter JB. Visualization of Biomedical Data. Annu Rev Biomed Data Sci 2018. [DOI: 10.1146/annurev-biodatasci-080917-013424] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Indexed: 11/09/2022]
Abstract
The rapid increase in volume and complexity of biomedical data requires changes in research, communication, and clinical practices. This includes learning how to effectively integrate automated analysis with high–data density visualizations that clearly express complex phenomena. In this review, we summarize key principles and resources from data visualization research that help address this difficult challenge. We then survey how visualization is being used in a selection of emerging biomedical research areas, including three-dimensional genomics, single-cell RNA sequencing (RNA-seq), the protein structure universe, phosphoproteomics, augmented reality–assisted surgery, and metagenomics. While specific research areas need highly tailored visualizations, there are common challenges that can be addressed with general methods and strategies. Also common, however, are poor visualization practices. We outline ongoing initiatives aimed at improving visualization practices in biomedical research via better tools, peer-to-peer learning, and interdisciplinary collaboration with computer scientists, science communicators, and graphic designers. These changes are revolutionizing how we see and think about our data.
Affiliation(s)
- Seán I. O'Donoghue
- Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
- School of Biotechnology and Biomolecular Sciences, University of New South Wales (UNSW), Kensington NSW 2033, Australia
| | - Benedetta Frida Baldi
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
| | - Susan J. Clark
- Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia
| | - Aaron E. Darling
- The ithree Institute, University of Technology Sydney, Ultimo NSW 2007, Australia
| | - James M. Hogan
- School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane QLD, 4000, Australia
| | - Sandeep Kaur
- School of Computer Science and Engineering, University of New South Wales (UNSW), Kensington NSW 2033, Australia
| | - Lena Maier-Hein
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Davis J. McCarthy
- European Bioinformatics Institute (EBI), European Molecular Biology Laboratory (EMBL), Wellcome Genome Campus, Hinxton CB10 1SD, United Kingdom
- St. Vincent's Institute of Medical Research, Fitzroy VIC 3065, Australia
| | - William J. Moore
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - Esther Stenau
- Division of Computer Assisted Medical Interventions (CAMI), German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
| | - Jason R. Swedlow
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| | - Jenny Vuong
- Data61, Commonwealth Scientific and Industrial Research Organisation (CSIRO), Eveleigh NSW 2015, Australia
| | - James B. Procter
- School of Life Sciences, University of Dundee, Dundee DD1 5EH, United Kingdom
| |
|
29
|
Abstract
Time-resolved phylogenetic methods use information about the time of sample collection to estimate the rate of evolution. Originally, the models used to estimate evolutionary rates were quite simple, assuming that all lineages evolve at the same rate, an assumption commonly known as the molecular clock. Richer and more complex models have since been introduced to capture the phenomenon of substitution rate variation among lineages. Two well-known model extensions are the local clock, wherein all lineages in a clade share a common substitution rate, and the uncorrelated relaxed clock, wherein the substitution rate on each lineage is independent of other lineages while being constrained to fit some parametric distribution. We introduce a further model extension, called the flexible local clock (FLC), which provides a flexible framework to combine relaxed clock models with local clock models. We evaluate the flexible local clock on simulated and real datasets and show that it provides substantially improved fit to an influenza dataset. An implementation of the model is available for download from https://www.github.com/4ment/flc.
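The three established clock models described in the abstract above differ only in how a substitution rate is assigned to each branch, which the following sketch tries to make concrete. It does not implement the flexible local clock itself; the function names are hypothetical, and the lognormal distribution is an assumed stand-in for the "parametric distribution" of the relaxed clock.

```python
import math
import random


def strict_clock(branches, rate):
    """Molecular clock: every branch evolves at the same rate."""
    return {b: rate for b in branches}


def local_clock(branches, background_rate, clade_branches, clade_rate):
    """Local clock: branches inside the chosen clade share their own rate."""
    clade = set(clade_branches)
    return {b: (clade_rate if b in clade else background_rate) for b in branches}


def uncorrelated_relaxed_clock(branches, mean_rate, sigma, rng):
    """Relaxed clock: each branch draws an independent rate from a
    lognormal distribution (an assumption) with the given mean."""
    # Choose mu so that the lognormal has expectation mean_rate.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return {b: rng.lognormvariate(mu, sigma) for b in branches}
```

For example, `local_clock(["a", "b", "c"], 1e-3, ["c"], 5e-3)` assigns the faster rate only to branch `c`, whereas the relaxed clock gives every branch its own independently drawn rate.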
Affiliation(s)
- Mathieu Fourment
- ithree institute, University of Technology Sydney, Sydney, Australia
| | - Aaron E Darling
- ithree institute, University of Technology Sydney, Sydney, Australia
| |
|
30
|
Wang K, Chen YQ, Salido MM, Kohli GS, Kong JL, Liang HJ, Yao ZT, Xie YT, Wu HY, Cai SQ, Drautz-Moses DI, Darling AE, Schuster SC, Yang L, Ding Y. The rapid in vivo evolution of Pseudomonas aeruginosa in ventilator-associated pneumonia patients leads to attenuated virulence. Open Biol 2018; 7:rsob.170029. [PMID: 28878043 PMCID: PMC5627047 DOI: 10.1098/rsob.170029] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 02/07/2017] [Accepted: 07/26/2017] [Indexed: 01/15/2023]
Abstract
Pseudomonas aeruginosa is an opportunistic pathogen that causes severe airway infections in humans. These infections are usually difficult to treat and associated with high mortality rates. While colonizing the human airways, P. aeruginosa could accumulate genetic mutations that often lead to its better adaptability to the host environment. Understanding these evolutionary traits may provide important clues for the development of effective therapies to treat P. aeruginosa infections. In this study, 25 P. aeruginosa isolates were longitudinally sampled from the airways of four ventilator-associated pneumonia (VAP) patients. PacBio and Illumina sequencing were used to analyse the in vivo evolutionary trajectories of these isolates. Our analysis showed that positive selection dominantly shaped P. aeruginosa genomes during VAP infections and led to three convergent evolution events, including loss-of-function mutations of lasR and mpl, and a pyoverdine-deficient phenotype. Specifically, lasR encodes one of the major transcriptional regulators in quorum sensing, whereas mpl encodes an enzyme responsible for recycling cell wall peptidoglycan. We also found that P. aeruginosa isolated at late stages of VAP infections produce less elastase and are less virulent in vivo than their earlier isolated counterparts, suggesting the short-term in vivo evolution of P. aeruginosa leads to attenuated virulence.
Affiliation(s)
- Ke Wang
- Department of Respiratory Disease, First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
- Centre for Genomic and Personalized Medicine, Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Yi-Qiang Chen
- Department of Respiratory Disease, First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - May M Salido
- Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore 637551, Singapore
| | - Gurjeet S Kohli
- Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore 637551, Singapore
| | - Jin-Liang Kong
- Department of Respiratory Disease, First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Hong-Jie Liang
- Department of Clinical Laboratory, First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Zi-Ting Yao
- Centre for Genomic and Personalized Medicine, Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Yan-Tong Xie
- The First Clinical School of Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Hua-Yu Wu
- Department of Cell Biology and Genetics, Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Shuang-Qi Cai
- Department of Respiratory Disease, First Affiliated Hospital of Guangxi Medical University, Nanning 530021, Guangxi, People's Republic of China
| | - Daniela I Drautz-Moses
- Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore 637551, Singapore
| | - Aaron E Darling
- The ithree Institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Stephan C Schuster
- Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore 637551, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Liang Yang
- Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore 637551, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
| | - Yichen Ding
- Singapore Centre for Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore 637551, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
- Interdisciplinary Graduate School, SCELSE, Nanyang Technological University, Singapore 639798, Singapore
| |
|
31
|
Deutscher AT, Burke CM, Darling AE, Riegler M, Reynolds OL, Chapman TA. Near full-length 16S rRNA gene next-generation sequencing revealed Asaia as a common midgut bacterium of wild and domesticated Queensland fruit fly larvae. Microbiome 2018; 6:85. [PMID: 29729663 PMCID: PMC5935925 DOI: 10.1186/s40168-018-0463-y] [Citation(s) in RCA: 32] [Impact Index Per Article: 5.3] [Received: 10/30/2017] [Accepted: 04/19/2018] [Indexed: 05/25/2023]
Abstract
BACKGROUND Gut microbiota affects tephritid (Diptera: Tephritidae) fruit fly development, physiology, behavior, and thus the quality of flies mass-reared for the sterile insect technique (SIT), a target-specific, sustainable, environmentally benign form of pest management. The Queensland fruit fly, Bactrocera tryoni (Tephritidae), is a significant horticultural pest in Australia and can be managed with SIT. Little is known about the impacts that laboratory-adaptation (domestication) and mass-rearing have on the tephritid larval gut microbiome. Read lengths of previous fruit fly next-generation sequencing (NGS) studies have limited the resolution of microbiome studies, and the diversity within populations is often overlooked. In this study, we used a new near full-length (> 1300 nt) 16S rRNA gene amplicon NGS approach to characterize gut bacterial communities of individual B. tryoni larvae from two field populations (developing in peaches) and three domesticated populations (mass- or laboratory-reared on artificial diets).
RESULTS Near full-length 16S rRNA gene sequences were obtained for 56 B. tryoni larvae. OTU clustering at 99% similarity revealed that gut bacterial diversity was low and significantly lower in domesticated larvae. Bacteria commonly associated with fruit (Acetobacteraceae, Enterobacteriaceae, and Leuconostocaceae) were detected in wild larvae, but were largely absent from domesticated larvae. However, Asaia, an acetic acid bacterium not frequently detected within adult tephritid species, was detected in larvae of both wild and domesticated populations (55 out of 56 larval gut samples). Larvae from the same single peach shared a similar gut bacterial profile, whereas larvae from different peaches collected from the same tree had different gut bacterial profiles. Clustering of the Asaia near full-length sequences at 100% similarity showed that the wild flies from different locations had different Asaia strains.
CONCLUSIONS Variation in the gut bacterial communities of B. tryoni larvae depends on diet, domestication, and horizontal acquisition. Bacterial variation in wild larvae suggests that more than one bacterial species can perform the same functional role; however, Asaia could be an important gut bacterium in larvae and warrants further study. A greater understanding of the functions of the bacteria detected in larvae could lead to increased fly quality and performance as part of the SIT.
Affiliation(s)
- Ania T. Deutscher
- Present Address: Biosecurity and Food Safety, NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW Australia
- Graham Centre for Agricultural Innovation (an alliance between NSW Department of Primary Industries and Charles Sturt University), Elizabeth Macarthur Agricultural Institute, Menangle, NSW Australia
| | - Catherine M. Burke
- School of Life Sciences, University of Technology Sydney, Sydney, NSW Australia
| | - Aaron E. Darling
- The ithree institute, University of Technology Sydney, Sydney, NSW Australia
| | - Markus Riegler
- Hawkesbury Institute for the Environment, Western Sydney University, Richmond, NSW Australia
| | - Olivia L. Reynolds
- Present Address: Biosecurity and Food Safety, NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW Australia
- Graham Centre for Agricultural Innovation (an alliance between NSW Department of Primary Industries and Charles Sturt University), Elizabeth Macarthur Agricultural Institute, Menangle, NSW Australia
| | - Toni A. Chapman
- Present Address: Biosecurity and Food Safety, NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW Australia
| |
|
32
|
Dinh V, Darling AE, Matsen IV FA. Online Bayesian Phylogenetic Inference: Theoretical Foundations via Sequential Monte Carlo. Syst Biol 2018; 67:503-517. [PMID: 29244177 PMCID: PMC5920340 DOI: 10.1093/sysbio/syx087] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/02/2017] [Revised: 11/08/2017] [Accepted: 11/09/2017] [Indexed: 11/29/2022]
Abstract
Phylogenetics, the inference of evolutionary trees from molecular sequence data such as DNA, is an enterprise that yields valuable evolutionary understanding of many biological systems. Bayesian phylogenetic algorithms, which approximate a posterior distribution on trees, have become a popular if computationally expensive means of doing phylogenetics. Modern data collection technologies are quickly adding new sequences to already substantial databases. With all current techniques for Bayesian phylogenetics, computation must start anew each time a sequence becomes available, making it costly to maintain an up-to-date estimate of a phylogenetic posterior. These considerations highlight the need for an online Bayesian phylogenetic method which can update an existing posterior with new sequences. Here, we provide theoretical results on the consistency and stability of methods for online Bayesian phylogenetic inference based on Sequential Monte Carlo (SMC) and Markov chain Monte Carlo. We first show a consistency result, demonstrating that the method samples from the correct distribution in the limit of a large number of particles. Next, we derive the first reported set of bounds on how phylogenetic likelihood surfaces change when new sequences are added. These bounds enable us to characterize the theoretical performance of sampling algorithms by bounding the effective sample size (ESS) with a given number of particles from below. We show that the ESS is guaranteed to grow linearly as the number of particles in an SMC sampler grows. Surprisingly, this result holds even though the dimensions of the phylogenetic model grow with each new added sequence.
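For readers unfamiliar with the effective sample size (ESS) mentioned in this abstract, the following Python sketch computes the commonly used estimate ESS = (sum of w)^2 / (sum of w^2) from particle log-weights. The abstract does not state which ESS definition the paper uses, so this formula and the function name `effective_sample_size` are assumptions for illustration.

```python
import math


def effective_sample_size(log_weights):
    """Estimate ESS = (sum w)^2 / (sum w^2) from unnormalized log-weights.

    Subtracting the maximum log-weight before exponentiating avoids
    overflow and does not change the ratio.
    """
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    total = sum(w)
    total_sq = sum(x * x for x in w)
    return total * total / total_sq


# Equal weights: every particle contributes, so ESS equals the particle count.
uniform_ess = effective_sample_size([0.0, 0.0, 0.0, 0.0])

# One dominant particle: the sample effectively collapses to a single point.
degenerate_ess = effective_sample_size([0.0, -1000.0, -1000.0])
```

Under this estimate the ESS ranges from 1 (one particle carries all the weight) to the number of particles (all weights equal), which is why a lower bound on it is a useful measure of sampler quality.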
Affiliation(s)
- Vu Dinh
- Department of Mathematical Sciences, University of Delaware, 312 Ewing Hall, Newark, DE 19716, USA
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, 15 Broadway, Ultimo NSW 2007, Australia
| | - Frederick A Matsen IV
- Program in Computational Biology, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA
| |
|
33
|
Bogema DR, Micallef ML, Liu M, Padula MP, Djordjevic SP, Darling AE, Jenkins C. Analysis of Theileria orientalis draft genome sequences reveals potential species-level divergence of the Ikeda, Chitose and Buffeli genotypes. BMC Genomics 2018; 19:298. [PMID: 29703152 PMCID: PMC5921998 DOI: 10.1186/s12864-018-4701-2] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 09/04/2017] [Accepted: 04/18/2018] [Indexed: 11/10/2022]
Abstract
BACKGROUND Theileria orientalis (Apicomplexa: Piroplasmida) has caused clinical disease in cattle of Eastern Asia for many years and its recent rapid spread throughout Australian and New Zealand herds has caused substantial economic losses to production through cattle deaths, late term abortion and morbidity. Disease outbreaks have been linked to the detection of a pathogenic genotype of T. orientalis, genotype Ikeda, which is also responsible for disease outbreaks in Asia. Here, we sequenced and compared the draft genomes of one pathogenic (Ikeda) and two apathogenic (Chitose, Buffeli) isolates of T. orientalis sourced from Australian herds.
RESULTS Using de novo assembled sequences and a single nucleotide variant (SNV) analysis pipeline, we found extensive genetic divergence between the T. orientalis genotypes. A genome-wide phylogeny reconstructed to address continued confusion over nomenclature of this species displayed concordance with prior phylogenetic studies based on the major piroplasm surface protein (MPSP) gene. However, average nucleotide identity (ANI) values revealed that the divergence between isolates is comparable to that observed between other theilerias which represent distinct species. Analysis of SNVs revealed putative recombination between the Chitose and Buffeli genotypes and also between Australian and Japanese Ikeda isolates. Finally, to inform future vaccine studies, dN/dS ratios and surface location predictions were analysed. Six predicted surface protein targets were confirmed to be expressed during the piroplasm phase of the parasite by mass spectrometry.
CONCLUSIONS We used whole genome sequencing to demonstrate that the T. orientalis Ikeda, Chitose and Buffeli variants show substantial genetic divergence. Our data indicates that future researchers could potentially consider disease-associated Ikeda and closely related genotypes as a separate species from non-pathogenic Chitose and Buffeli.
Affiliation(s)
- Daniel R Bogema
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, Australia
| | - Melinda L Micallef
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, Australia
| | - Michael Liu
- The ithree institute, University of Technology Sydney, Ultimo, NSW, Australia
| | - Matthew P Padula
- The ithree institute, University of Technology Sydney, Ultimo, NSW, Australia
| | - Steven P Djordjevic
- The ithree institute, University of Technology Sydney, Ultimo, NSW, Australia
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Ultimo, NSW, Australia
| | - Cheryl Jenkins
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Menangle, NSW, Australia
| |
|
34
|
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017; 14:1063-1071. [PMID: 28967888 DOI: 10.1101/099127] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 12/29/2016] [Accepted: 08/25/2017] [Indexed: 05/25/2023]
Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Affiliation(s)
- Alexander Sczyrba
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Peter Hofmann
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Peter Belmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - David Koslicki
- Mathematics Department, Oregon State University, Corvallis, Oregon, USA
| | - Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Department of Pediatrics, University of California, San Diego, California, USA
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
| | - Johannes Dröge
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ivan Gregor
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Stephan Majda
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
| | - Jessika Fiedler
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
| | - Eik Dahms
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Andreas Bremges
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
| | - Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ruben Garrido-Oter
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Tue Sparholt Jørgensen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Department of Science and Environment, Roskilde University, Roskilde, Denmark
| | - Nicole Shapiro
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Philip D Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Yang Bai
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Dmitrij Turaev
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Rayan Chikhi
- Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France
- National Centre of the Scientific Research (CNRS), Rennes, France
| | - Niranjan Nagarajan
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Christopher Quince
- Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK
| | - Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Monika Balvočiūtė
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
| | - Lars Hestbjerg Hansen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
| | - Søren J Sørensen
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
| | - Burton K H Chia
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Bertrand Denis
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Jeff L Froula
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Robert Egan
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Dongwan Don Kang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Charles Deltel
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Michael Beckstette
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Claire Lemaitre
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Pierre Peterlongo
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Guillaume Rizk
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Algorizk-IT consulting and software systems, Paris, France
| | - Dominique Lavenier
- National Centre of the Scientific Research (CNRS), Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Yu-Wei Wu
- Joint BioEnergy Institute, Emeryville, California, USA
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Steven W Singer
- Joint BioEnergy Institute, Emeryville, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Marc Strous
- Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada
| | - Heiner Klingenberg
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Peter Meinicke
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Michael D Barton
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Hsin-Hung Lin
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | - Yu-Chieh Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | | | - Daniel A Cuevas
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Robert A Edwards
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Surya Saha
- Boyce Thompson Institute for Plant Research, New York, New York, USA
| | - Vitor C Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil
| | - Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
| | - Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
| | - Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
| | - Markus Göker
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Tanja Woyke
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Paul Schulze-Lefert
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Edward M Rubin
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Thomas Rattei
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Alice C McHardy
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
| |
|
35
|
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017; 14:1063-1071. [PMID: 28967888 DOI: 10.1038/nmeth.4458] [Citation(s) in RCA: 430] [Impact Index Per Article: 61.4] [Received: 12/29/2016] [Accepted: 08/25/2017] [Indexed: 12/12/2022]
Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Affiliation(s)
- Alexander Sczyrba
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Peter Hofmann
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Peter Belmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany.,Center for Biotechnology, Bielefeld University, Bielefeld, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - David Koslicki
- Mathematics Department, Oregon State University, Corvallis, Oregon, USA
| | - Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Department of Pediatrics, University of California, San Diego, California, USA.,Department of Computer Science and Engineering, University of California, San Diego, California, USA
| | - Johannes Dröge
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ivan Gregor
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Stephan Majda
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
| | - Jessika Fiedler
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
| | - Eik Dahms
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Andreas Bremges
- Faculty of Technology, Bielefeld University, Bielefeld, Germany.,Center for Biotechnology, Bielefeld University, Bielefeld, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany.,German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
| | - Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ruben Garrido-Oter
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany.,Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany.,Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Tue Sparholt Jørgensen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark.,Department of Microbiology, University of Copenhagen, Copenhagen, Denmark.,Department of Science and Environment, Roskilde University, Roskilde, Denmark
| | - Nicole Shapiro
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Philip D Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Yang Bai
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Dmitrij Turaev
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Rayan Chikhi
- Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France.,National Centre of the Scientific Research (CNRS), Rennes, France
| | - Niranjan Nagarajan
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Christopher Quince
- Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK
| | - Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Monika Balvočiūtė
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
| | - Lars Hestbjerg Hansen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
| | - Søren J Sørensen
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
| | - Burton K H Chia
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Bertrand Denis
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Jeff L Froula
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Robert Egan
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Dongwan Don Kang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Charles Deltel
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France.,Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Michael Beckstette
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Claire Lemaitre
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France.,Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Pierre Peterlongo
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France.,Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Guillaume Rizk
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France.,Algorizk-IT consulting and software systems, Paris, France
| | - Dominique Lavenier
- National Centre of the Scientific Research (CNRS), Rennes, France.,Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Yu-Wei Wu
- Joint BioEnergy Institute, Emeryville, California, USA.,Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Steven W Singer
- Joint BioEnergy Institute, Emeryville, California, USA.,Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Marc Strous
- Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada
| | - Heiner Klingenberg
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Peter Meinicke
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Michael D Barton
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Hsin-Hung Lin
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | - Yu-Chieh Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | | | - Daniel A Cuevas
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Robert A Edwards
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Surya Saha
- Boyce Thompson Institute for Plant Research, New York, New York, USA
| | - Vitor C Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany.,Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil
| | - Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
| | - Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA.,Department of Computer Science, University of Maryland, College Park, Maryland, USA
| | - Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
| | - Markus Göker
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Tanja Woyke
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Paul Schulze-Lefert
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany.,Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Edward M Rubin
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Thomas Rattei
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Alice C McHardy
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany.,Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany.,Cluster of Excellence on Plant Sciences (CEPLAS)
| |
Collapse
|
36
|
Liu MY, Worden P, Monahan LG, DeMaere MZ, Burke CM, Djordjevic SP, Charles IG, Darling AE. Evaluation of ddRADseq for reduced representation metagenome sequencing. PeerJ 2017; 5:e3837. [PMID: 28948110 PMCID: PMC5609526 DOI: 10.7717/peerj.3837] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 07/15/2017] [Accepted: 08/31/2017] [Indexed: 11/23/2022] Open
Abstract
Background: Profiling of microbial communities via metagenomic shotgun sequencing has enabled researchers to gain unprecedented insight into microbial community structure and the functional roles of community members. This study describes a method and basic analysis for a metagenomic adaptation of the double digest restriction site associated DNA sequencing (ddRADseq) protocol for reduced representation metagenome profiling. Methods: This technique takes advantage of the sequence specificity of restriction endonucleases to construct an Illumina-compatible sequencing library containing DNA fragments that lie between a pair of restriction sites located in close proximity. This results in a reduced sequencing library with coverage breadth that can be tuned by size selection. We assessed the performance of the metagenomic ddRADseq approach by applying the full method to human stool samples and generating sequence data. Results: The ddRADseq data yield an estimate of community taxonomic profile similar to that obtained from shotgun metagenome sequencing of the same human stool samples. No obvious bias with respect to genomic G + C content and the estimated relative species abundance was detected. Discussion: Although ddRADseq does introduce some bias in taxonomic representation, the bias is likely to be small relative to DNA extraction bias. ddRADseq appears feasible and could have value as a tool for metagenome-wide association studies.
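The fragment-selection idea in the Methods can be illustrated with a minimal in-silico sketch. It is not the authors' pipeline. The function name `double_digest_fragments`, the two recognition sites (GAATTC and CCGG, chosen only for illustration) and the size window are all assumptions, and the sketch cuts at the start of each recognition site rather than modelling the true cut offset or overhangs.

```python
import re


def double_digest_fragments(seq, site_a, site_b, min_len, max_len):
    """Return fragments bounded by one site_a cut and one site_b cut.

    Hypothetical helper: cuts are placed at the start of each recognition
    site, and only fragments whose length falls inside the size-selection
    window [min_len, max_len] are kept.
    """
    # Collect (position, enzyme) for every occurrence of either site.
    # A lookahead is used so that overlapping occurrences are also found.
    cuts = sorted(
        [(m.start(), "A") for m in re.finditer(f"(?={re.escape(site_a)})", seq)]
        + [(m.start(), "B") for m in re.finditer(f"(?={re.escape(site_b)})", seq)]
    )
    fragments = []
    # Adjacent cuts define the fragments produced by a complete digest.
    for (p1, e1), (p2, e2) in zip(cuts, cuts[1:]):
        length = p2 - p1
        # Keep only fragments flanked by two different enzymes, as in a
        # double digest library, and inside the size window.
        if e1 != e2 and min_len <= length <= max_len:
            fragments.append(seq[p1:p2])
    return fragments


# Toy sequence with sites at positions 0 (A), 16 (B), 24 (B) and 31 (A).
toy = "GAATTC" + "A" * 10 + "CCGG" + "T" * 4 + "CCGG" + "A" * 3 + "GAATTC"
selected = double_digest_fragments(toy, "GAATTC", "CCGG", 10, 20)
```

Widening or narrowing the `min_len`/`max_len` window changes how many fragments survive, which is the sense in which coverage breadth is tuned by size selection.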
Affiliation(s)
- Michael Y Liu
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Paul Worden
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Leigh G Monahan
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Catherine M Burke
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Steven P Djordjevic
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Ian G Charles
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
- Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
|
37
|
Fourment M, Darling AE, Holmes EC. The impact of migratory flyways on the spread of avian influenza virus in North America. BMC Evol Biol 2017; 17:118. [PMID: 28545432 PMCID: PMC5445350 DOI: 10.1186/s12862-017-0965-4] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 02/07/2017] [Accepted: 05/11/2017] [Indexed: 11/16/2022] Open
Abstract
Background: Wild birds are the major reservoir hosts for influenza A viruses (AIVs) and have been implicated in the emergence of pandemic events in livestock and human populations. Understanding how AIVs spread within and across continents is therefore critical to the development of successful strategies to manage and reduce the impact of influenza outbreaks. In North America many bird species undergo seasonal migratory movements along a North-South axis, thereby providing opportunities for viruses to spread over long distances. However, the role played by such avian flyways in shaping the genetic structure of AIV populations remains uncertain. Results: To assess the relative contribution of bird migration along flyways to the genetic structure of AIV we performed a large-scale phylogeographic study of viruses sampled in the USA and Canada, involving the analysis of 3805 to 4505 sequences from 36 to 38 geographic localities depending on the gene segment data set. To assist in this we developed a maximum likelihood-based genetic algorithm to explore a wide range of complex spatial models, depicting a more complete picture of the migration network than determined previously. Conclusions: Based on phylogenies estimated from nucleotide sequence data sets, our results show that AIV migration rates are significantly higher within than between flyways, indicating that the migratory patterns of birds play a key role in viral dispersal. These findings provide valuable insights into the evolution, maintenance and transmission of AIVs, in turn allowing the development of improved programs for surveillance and risk assessment.
Affiliation(s)
- Mathieu Fourment
- ithree institute, University of Technology Sydney, Sydney, Australia; Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, Australia
- Aaron E Darling
- ithree institute, University of Technology Sydney, Sydney, Australia
- Edward C Holmes
- Marie Bashir Institute for Infectious Diseases and Biosecurity, Charles Perkins Centre, School of Life and Environmental Sciences and Sydney Medical School, The University of Sydney, Sydney, Australia
|
38
|
DeMaere MZ, Darling AE. Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C). PeerJ 2016; 4:e2676. [PMID: 27843713 PMCID: PMC5103821 DOI: 10.7717/peerj.2676] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 05/05/2016] [Accepted: 10/11/2016] [Indexed: 11/20/2022] Open
Abstract
BACKGROUND: Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occurring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. METHODS: We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. RESULTS: When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. DISCUSSION: Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios; however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development.
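For readers unfamiliar with the validation measure named in the Methods, the sketch below computes the basic B-cubed precision, recall and F1 for a hard clustering. This is the standard per-item formulation only. It is not the extended B-cubed adaptation for soft clusterings used in the study, and the function name `bcubed` and the toy labels are assumptions made for illustration.

```python
from collections import Counter


def bcubed(truth, predicted):
    """Basic B-cubed scores for a hard clustering.

    truth and predicted map each item to a single label. Both must cover
    the same non-empty set of items. Returns (precision, recall, f1).
    """
    items = list(truth)
    cluster_sizes = Counter(predicted[i] for i in items)
    class_sizes = Counter(truth[i] for i in items)
    # Number of items sharing both a predicted cluster and a true class.
    overlap = Counter((predicted[i], truth[i]) for i in items)

    # Per-item precision: fraction of the item's cluster with its true class.
    precision = sum(
        overlap[(predicted[i], truth[i])] / cluster_sizes[predicted[i]]
        for i in items
    ) / len(items)
    # Per-item recall: fraction of the item's true class found in its cluster.
    recall = sum(
        overlap[(predicted[i], truth[i])] / class_sizes[truth[i]]
        for i in items
    ) / len(items)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: four contigs from two genomes, all merged into one cluster.
true_genome = {"c1": "g1", "c2": "g1", "c3": "g2", "c4": "g2"}
merged = {"c1": "k", "c2": "k", "c3": "k", "c4": "k"}
p, r, f = bcubed(true_genome, merged)
```

Merging two genomes into one cluster keeps recall at 1 while halving precision, which is the kind of error expected when closely related strains cannot be separated.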
Affiliation(s)
- Matthew Z. DeMaere
- ithree institute, University of Technology Sydney, Sydney, NSW, Australia
- Aaron E. Darling
- ithree institute, University of Technology Sydney, Sydney, NSW, Australia
|
39
|
Burke CM, Darling AE. A method for high precision sequencing of near full-length 16S rRNA genes on an Illumina MiSeq. PeerJ 2016; 4:e2492. [PMID: 27688981 PMCID: PMC5036073 DOI: 10.7717/peerj.2492] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 12/16/2015] [Accepted: 08/25/2016] [Indexed: 12/21/2022] Open
Abstract
Background: The bacterial 16S rRNA gene has historically been used in defining bacterial taxonomy and phylogeny. However, there are currently no high-throughput methods to sequence full-length 16S rRNA genes present in a sample with precision. Results: We describe a method for sequencing near full-length 16S rRNA gene amplicons using the high throughput Illumina MiSeq platform and test it using DNA from human skin swab samples. Proof of principle of the approach is demonstrated, with the generation of 1,604 sequences greater than 1,300 nt from a single Nano MiSeq run, with accuracy estimated to be 100-fold higher than standard Illumina reads. The reads were chimera filtered using information from a single molecule dual tagging scheme that boosts the signal available for chimera detection. Conclusions: This method could be scaled up to generate many thousands of sequences per MiSeq run and could be applied to other sequencing platforms. This has great potential for populating databases with high quality, near full-length 16S rRNA gene sequences from under-represented taxa and environments and facilitates analyses of microbial communities at higher resolution.
Affiliation(s)
- Catherine M Burke
- The i3 Institute, University of Technology Sydney, Sydney, NSW, Australia
- Aaron E Darling
- The i3 Institute, University of Technology Sydney, Sydney, NSW, Australia
|
40
|
Roy Chowdhury P, DeMaere M, Chapman T, Worden P, Charles IG, Darling AE, Djordjevic SP. Comparative genomic analysis of toxin-negative strains of Clostridium difficile from humans and animals with symptoms of gastrointestinal disease. BMC Microbiol 2016; 16:41. [PMID: 26971047 PMCID: PMC4789261 DOI: 10.1186/s12866-016-0653-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/29/2015] [Accepted: 03/02/2016] [Indexed: 12/13/2022] Open
Abstract
Background: Clostridium difficile infections (CDI) are a significant health problem to humans and food animals. Clostridial toxins ToxA and ToxB encoded by genes tcdA and tcdB are located on a pathogenicity locus known as the PaLoc and are the major virulence factors of C. difficile. While toxin-negative strains of C. difficile are often isolated from faeces of animals and patients suffering from CDI, they are not considered to play a role in disease. Toxin-negative strains of C. difficile have been used successfully to treat recurring CDI but their propensity to acquire the PaLoc via lateral gene transfer and express clinically relevant levels of toxins has reinforced the need to characterise them genetically. In addition, further studies that examine the pathogenic potential of toxin-negative strains of C. difficile and the frequency with which toxin-negative strains may acquire the PaLoc are needed. Results: We undertook a comparative genomic analysis of five Australian toxin-negative isolates of C. difficile that lack tcdA, tcdB and both binary toxin genes cdtA and cdtB that were recovered from humans and farm animals with symptoms of gastrointestinal disease. Our analyses show that the five C. difficile isolates cluster closely with virulent toxigenic strains of C. difficile belonging to the same sequence type (ST) and have virulence gene profiles akin to those in toxigenic strains. Furthermore, phage acquisition appears to have played a key role in the evolution of C. difficile. Conclusions: Our results are consistent with the C. difficile global population structure comprising six clades each containing both toxin-positive and toxin-negative strains. Our data also suggest that toxin-negative strains of C. difficile encode a repertoire of putative virulence factors that are similar to those found in toxigenic strains of C. difficile, raising the possibility that acquisition of PaLoc by toxin-negative strains poses a threat to human health.
Studies in appropriate animal models are needed to examine the pathogenic potential of toxin-negative strains of C. difficile and to determine the frequency with which toxin-negative strains may acquire the PaLoc. Electronic supplementary material: The online version of this article (doi:10.1186/s12866-016-0653-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Piklu Roy Chowdhury
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia; NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, PMB 8, Camden, NSW, 2570, Australia
- Matthew DeMaere
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
- Toni Chapman
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, PMB 8, Camden, NSW, 2570, Australia
- Paul Worden
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
- Ian G Charles
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia; Institute of Food Research, Norwich Research Park, Colney, Norwich, NR4 7UA, UK
- Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
- Steven P Djordjevic
- The ithree institute, University of Technology Sydney, Sydney, 2007, Australia
|
41
|
Joss TV, Burke CM, Hudson BJ, Darling AE, Forer M, Alber DG, Charles IG, Stow NW. Bacterial Communities Vary between Sinuses in Chronic Rhinosinusitis Patients. Front Microbiol 2016; 6:1532. [PMID: 26834708 PMCID: PMC4722142 DOI: 10.3389/fmicb.2015.01532] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 08/17/2015] [Accepted: 12/21/2015] [Indexed: 12/02/2022] Open
Abstract
Chronic rhinosinusitis (CRS) is a common and potentially debilitating disease characterized by inflammation of the sinus mucosa for longer than 12 weeks. Bacterial colonization of the sinuses and its role in the pathogenesis of this disease is an ongoing area of research. Recent advances in culture-independent molecular techniques for bacterial identification have the potential to provide a more accurate and complete assessment of the sinus microbiome; however, there is little concordance in results between studies, possibly due to differences in the sampling location and techniques. This study aimed to determine whether the microbial communities from one sinus could be considered representative of all sinuses, and to examine differences between two commonly used methods for sample collection: swabs and tissue biopsies. High-throughput DNA sequencing of the bacterial 16S rRNA gene was applied to both swab and tissue samples from multiple sinuses of 19 patients undergoing surgery for treatment of CRS. Results from swabs and tissue biopsies showed a high degree of similarity, indicating that swabbing is sufficient to recover the microbial community from the sinuses. Microbial communities from different sinuses within individual patients differed to varying degrees, demonstrating that it is possible for distinct microbiomes to exist simultaneously in different sinuses of the same patient. The sequencing results correlated well with culture-based pathogen identification conducted in parallel, although the culturing missed many species detected by sequencing. This finding has implications for future research into the sinus microbiome, which should take this heterogeneity into account by sampling patients from more than one sinus.
Affiliation(s)
- Tom V Joss
- Faculty of Medicine, Sydney Medical School, University of Sydney, Sydney, NSW, Australia
- Catherine M Burke
- Faculty of Science, The Ithree Institute, University of Technology Sydney, NSW, Australia
- Bernard J Hudson
- Department of Microbiology and Infectious Disease, Royal North Shore Hospital, Sydney, NSW, Australia
- Aaron E Darling
- Faculty of Science, The Ithree Institute, University of Technology Sydney, NSW, Australia
- Martin Forer
- Department of Otolaryngology, Royal North Shore Hospital, Sydney, NSW, Australia
- Dagmar G Alber
- Faculty of Science, The Ithree Institute, University of Technology Sydney, NSW, Australia
- Ian G Charles
- Faculty of Science, The Ithree Institute, University of Technology Sydney, NSW, Australia
- Nicholas W Stow
- Department of Otolaryngology, Mona Vale Hospital, Sydney, NSW, Australia
|
42
|
O'Flynn C, Deusch O, Darling AE, Eisen JA, Wallis C, Davis IJ, Harris SJ. Comparative Genomics of the Genus Porphyromonas Identifies Adaptations for Heme Synthesis within the Prevalent Canine Oral Species Porphyromonas cangingivalis. Genome Biol Evol 2015; 7:3397-413. [PMID: 26568374 PMCID: PMC4700951 DOI: 10.1093/gbe/evv220] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022] Open
Abstract
Porphyromonads play an important role in human periodontal disease and recently have been shown to be highly prevalent in canine mouths. Porphyromonas cangingivalis is the most prevalent canine oral bacterial species in both plaque from healthy gingiva and plaque from dogs with early periodontitis. The ability of P. cangingivalis to flourish in the different environmental conditions characterized by these two states suggests a degree of metabolic flexibility. To characterize the genes responsible for this, the genomes of 32 isolates (including 18 newly sequenced and assembled) from 18 Porphyromonad species from dogs, humans, and other mammals were compared. Phylogenetic trees inferred using core genes largely matched previous findings; however, comparative genomic analysis identified several genes and pathways relating to heme synthesis that were present in P. cangingivalis but not in other Porphyromonads. Porphyromonas cangingivalis has a complete protoporphyrin IX synthesis pathway, potentially allowing it to synthesize its own heme, unlike pathogenic Porphyromonads such as Porphyromonas gingivalis that acquire heme predominantly from blood. Other pathway differences, such as the ability to synthesize siroheme and vitamin B12, point to enhanced metabolic flexibility for P. cangingivalis, which may underlie its prevalence in the canine oral cavity.
Affiliation(s)
- Ciaran O'Flynn
- The WALTHAM Centre for Pet Nutrition, Waltham-on-the-Wolds, United Kingdom
- Oliver Deusch
- The WALTHAM Centre for Pet Nutrition, Waltham-on-the-Wolds, United Kingdom
- Aaron E Darling
- The ithree Institute, University of Technology Sydney, Ultimo, New South Wales, Australia
- Jonathan A Eisen
- Department of Evolution and Ecology, University of California, Davis; Department of Medical Microbiology and Immunology, University of California, Davis; UC Davis Genome Center, University of California, Davis
- Corrin Wallis
- The WALTHAM Centre for Pet Nutrition, Waltham-on-the-Wolds, United Kingdom
- Ian J Davis
- The WALTHAM Centre for Pet Nutrition, Waltham-on-the-Wolds, United Kingdom
- Stephen J Harris
- The WALTHAM Centre for Pet Nutrition, Waltham-on-the-Wolds, United Kingdom
|
43
|
Abstract
The sequencing, assembly, and basic analysis of microbial genomes, once a painstaking and expensive undertaking, has become much easier for research labs with access to standard molecular biology and computational tools. However, there is a confusing variety of options available for DNA library preparation and sequencing, and inexperience with bioinformatics can pose a significant barrier to entry for many who may be interested in microbial genomics. The objective of the present study was to design, test, troubleshoot, and publish a simple, comprehensive workflow from the collection of an environmental sample (a swab) to a published microbial genome, empowering even a lab or classroom with limited resources and bioinformatics experience to perform it.
|
44
|
Wyrsch E, Roy Chowdhury P, Abraham S, Santos J, Darling AE, Charles IG, Chapman TA, Djordjevic SP. Comparative genomic analysis of a multiple antimicrobial resistant enterotoxigenic E. coli O157 lineage from Australian pigs. BMC Genomics 2015; 16:165. [PMID: 25888127 PMCID: PMC4384309 DOI: 10.1186/s12864-015-1382-y] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 09/03/2014] [Accepted: 02/23/2015] [Indexed: 01/01/2023] Open
Abstract
Background Enterotoxigenic Escherichia coli (ETEC) are a major economic threat to pig production globally, with serogroups O8, O9, O45, O101, O138, O139, O141, O149 and O157 implicated as the leading diarrhoeal pathogens affecting pigs below four weeks of age. A multiple antimicrobial resistant ETEC O157 (O157 SvETEC) representative of O157 isolates from a pig farm in New South Wales, Australia that experienced repeated bouts of pre- and post-weaning diarrhoea resulting in multiple fatalities was characterized here. Enterohaemorrhagic E. coli (EHEC) O157:H7 cause both sporadic and widespread outbreaks of foodborne disease, predominantly have a ruminant origin and belong to the ST11 clonal complex. Here, for the first time, we conducted comparative genomic analyses of two epidemiologically-unrelated porcine, disease-causing ETEC O157; E. coli O157 SvETEC and E. coli O157:K88 734/3, and examined their phylogenetic relationship with EHEC O157:H7. Results O157 SvETEC and O157:K88 734/3 belong to a novel sequence type (ST4245) that comprises part of the ST23 complex and are genetically distinct from EHEC O157. Comparative phylogenetic analysis using PhyloSift shows that E. coli O157 SvETEC and E. coli O157:K88 734/3 group into a single clade and are most similar to the extraintestinal avian pathogenic Escherichia coli (APEC) isolate O78 that clusters within the ST23 complex. Genome content was highly similar between E. coli O157 SvETEC, O157:K88 734/3 and APEC O78, with variability predominantly limited to laterally acquired elements, including prophages, plasmids and antimicrobial resistance gene loci. Putative ETEC virulence factors, including the toxins STb and LT and the K88 (F4) adhesin, were conserved between O157 SvETEC and O157:K88 734/3. The O157 SvETEC isolate also encoded the heat stable enterotoxin STa and a second allele of STb, whilst a prophage within O157:K88 734/3 encoded the serum survival gene bor. 
Both isolates harbor a large repertoire of antibiotic resistance genes but their association with mobile elements remains undetermined. Conclusions We present an analysis of the first draft genome sequences of two epidemiologically-unrelated, pathogenic ETEC O157. E. coli O157 SvETEC and E. coli O157:K88 734/3 belong to the ST23 complex and are phylogenetically distinct to EHEC O157 lineages that reside within the ST11 complex.
Affiliation(s)
- Ethan Wyrsch
- The ithree institute, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia.
- Piklu Roy Chowdhury
- The ithree institute, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia; NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Private Bag 4008, Narellan, NSW, 2567, Australia.
- Sam Abraham
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Private Bag 4008, Narellan, NSW, 2567, Australia; School of Animal and Veterinary Sciences, University of Adelaide, Adelaide, South Australia, 5371, Australia.
- Jerran Santos
- The ithree institute, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia.
- Aaron E Darling
- The ithree institute, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia.
- Ian G Charles
- The ithree institute, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia.
- Toni A Chapman
- NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Private Bag 4008, Narellan, NSW, 2567, Australia.
- Steven P Djordjevic
- The ithree institute, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia.
|
45
|
|
46
|
Lauro FM, Senstius SJ, Cullen J, Neches R, Jensen RM, Brown MV, Darling AE, Givskov M, McDougald D, Hoeke R, Ostrowski M, Philip GK, Paulsen IT, Grzymski JJ. The common oceanographer: crowdsourcing the collection of oceanographic data. PLoS Biol 2014; 12:e1001947. [PMID: 25203659 PMCID: PMC4159111 DOI: 10.1371/journal.pbio.1001947] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/18/2022] Open
Affiliation(s)
- Federico M. Lauro
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales, Australia
- Singapore Centre on Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore
- * E-mail: (FML); (JJG)
- Jay Cullen
- School of Earth and Ocean Sciences, University of Victoria, Victoria, British Columbia, Canada
- Russell Neches
- Genome Center, University of California, Davis, California, United States of America
- Rachelle M. Jensen
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales, Australia
- Mark V. Brown
- School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, New South Wales, Australia
- Aaron E. Darling
- The ithree institute, University of Technology Sydney, Ultimo, New South Wales, Australia
- Michael Givskov
- Singapore Centre on Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore
- Costerton Biofilm Center, Department of International Health, Immunology, and Microbiology, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Diane McDougald
- Singapore Centre on Environmental Life Sciences Engineering (SCELSE), Nanyang Technological University, Singapore
- Centre for Marine Bio-Innovation, University of New South Wales, Sydney, New South Wales, Australia
- Ron Hoeke
- Centre for Australian Climate and Weather Research, CSIRO, Aspendale, Victoria, Australia
- Martin Ostrowski
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Gayle K. Philip
- VLSCI Life Sciences Computation Centre, University of Melbourne, Melbourne, Victoria, Australia
- Ian T. Paulsen
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, New South Wales, Australia
- Joseph J. Grzymski
- Division of Earth and Ecosystem Sciences, Desert Research Institute, Reno, Nevada, United States of America
- * E-mail: (FML); (JJG)
|
47
|
Darling AE, McKinnon J, Worden P, Santos J, Charles IG, Chowdhury PR, Djordjevic SP. A draft genome of Escherichia coli sequence type 127 strain 2009-46. Gut Pathog 2014; 6:32. [PMID: 25197321 PMCID: PMC4155142 DOI: 10.1186/1757-4749-6-32] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/03/2014] [Accepted: 07/15/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Escherichia coli are a frequent cause of urinary tract infections (UTI) and are thought to have a foodborne origin. E. coli with sequence type 127 (ST127) are emerging pathogens increasingly implicated as a cause of UTI globally. An ST127 isolate (2009-46) resistant to ampicillin and trimethoprim was recovered from the urine of a 56-year-old patient with a UTI at a hospital in Sydney, Australia, and was characterised here. RESULTS We sequenced the genome of Escherichia coli 2009-46 using the Illumina Nextera XT and MiSeq technologies. Assembly of the sequence data reconstructed a 5.14 Mbp genome in 89 scaffolds with an N50 of 161 kbp. The genome has extensive similarity to other sequenced uropathogenic E. coli genomes, but also has several genes that are potentially related to virulence and pathogenicity that are not present in the reference E. coli strain. CONCLUSION E. coli 2009-46 is a multiple antibiotic resistant, phylogroup B2 isolate recovered from a patient with a UTI. This is the first description of a drug-resistant E. coli ST127 in Australia.
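The abstract above reports an N50 of 161 kbp without defining the metric. As a minimal sketch, assuming the common definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly), it could be computed as follows. The function name `n50` and the toy scaffold lengths are illustrative assumptions and do not come from the article; individual assembly tools may differ in edge-case handling.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches at least half of
    the total assembly size.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp (not data from the article).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

For the toy input the total is 300 bp, so the running sum first reaches 150 bp at the second-longest scaffold and the function returns 80.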
Affiliation(s)
- Aaron E Darling
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia
- Jessica McKinnon
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia
- Paul Worden
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia
- Jerran Santos
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia
- Ian G Charles
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia
- Piklu Roy Chowdhury
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia; NSW Department of Primary Industries, Elizabeth Macarthur Agricultural Institute, Woodbridge Road, 2568 Menangle, Australia
- Steven P Djordjevic
- ithree Institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia
|
48
|
Beitel CW, Froenicke L, Lang JM, Korf IF, Michelmore RW, Eisen JA, Darling AE. Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products. PeerJ 2014; 2:e415. [PMID: 24918035 PMCID: PMC4045339 DOI: 10.7717/peerj.415] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Received: 03/03/2014] [Accepted: 05/15/2014] [Indexed: 12/13/2022] Open
Abstract
Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of "binning" the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a simple synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provide a long-range signal of strain-specific genotypes, indicating such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that is not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.
Affiliation(s)
- Lutz Froenicke
- The University of California, Davis Genome Center, Davis, CA, USA
- Jenna M Lang
- The University of California, Davis Genome Center, Davis, CA, USA
- Ian F Korf
- The University of California, Davis Genome Center, Davis, CA, USA; Department of Molecular and Cellular Biology, University of California, Davis, CA, USA
- Richard W Michelmore
- The University of California, Davis Genome Center, Davis, CA, USA; Department of Molecular and Cellular Biology, University of California, Davis, CA, USA; Department of Plant Sciences, University of California, Davis, CA, USA
- Jonathan A Eisen
- The University of California, Davis Genome Center, Davis, CA, USA; Department of Medical Microbiology and Immunology, University of California, Davis, CA, USA; Department of Evolution and Ecology, University of California, Davis, CA, USA
- Aaron E Darling
- ithree institute, University of Technology Sydney, Sydney, NSW, Australia
|
49
|
Abstract
Background Clostridium difficile is the leading cause of infectious diarrhea in humans and responsible for large outbreaks of enteritis in neonatal pigs in both North America and Europe. Disease caused by C. difficile typically occurs during antibiotic therapy and its emergence over the past 40 years is linked with the widespread use of broad-spectrum antibiotics in both human and veterinary medicine. Results We sequenced the genome of Clostridium difficile 5.3 using the Illumina Nextera XT and MiSeq technologies. Assembly of the sequence data reconstructed a 4,009,318 bp genome in 27 scaffolds with an N50 of 786 kbp. The genome has extensive similarity to other sequenced C. difficile genomes, but also has several genes that are potentially related to virulence and pathogenicity that are not present in the reference C. difficile strain. Conclusion Genome sequencing of human and animal isolates is needed to understand the molecular events driving the emergence of C. difficile as a gastrointestinal pathogen of humans and food animals and to better define its zoonotic potential.
Affiliation(s)
- Aaron E Darling
- ithree institute, University of Technology Sydney, Broadway Street, 2007 Ultimo, Australia.
|
50
|
Darling AE, Jospin G, Lowe E, Matsen FA, Bik HM, Eisen JA. PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ 2014; 2:e243. [PMID: 24482762 PMCID: PMC3897386 DOI: 10.7717/peerj.243] [Citation(s) in RCA: 407] [Impact Index Per Article: 40.7] [Received: 03/21/2013] [Accepted: 12/19/2013] [Indexed: 12/13/2022] Open
Abstract
Like all organisms on the planet, environmental microbes are subject to the forces of molecular evolution. Metagenomic sequencing provides a means to access the DNA sequence of uncultured microbes. By combining DNA sequencing of microbial communities with evolutionary modeling and phylogenetic analysis we might obtain new insights into microbiology and also provide a basis for practical tools such as forensic pathogen detection. In this work we present an approach to leverage phylogenetic analysis of metagenomic sequence data to conduct several types of analysis. First, we present a method to conduct phylogeny-driven Bayesian hypothesis tests for the presence of an organism in a sample. Second, we present a means to compare community structure across a collection of many samples and develop direct associations between the abundance of certain organisms and sample metadata. Third, we apply new tools to analyze the phylogenetic diversity of microbial communities and again demonstrate how this can be associated to sample metadata. These analyses are implemented in an open source software pipeline called PhyloSift. As a pipeline, PhyloSift incorporates several other programs including LAST, HMMER, and pplacer to automate phylogenetic analysis of protein coding and RNA sequences in metagenomic datasets generated by modern sequencing platforms (e.g., Illumina, 454).
Affiliation(s)
- Aaron E Darling
- ithree institute, University of Technology Sydney, Sydney, Australia; Genome Center, University of California, Davis, CA, United States of America
- Guillaume Jospin
- Genome Center, University of California, Davis, CA, United States of America
- Eric Lowe
- Genome Center, University of California, Davis, CA, United States of America
- Frederick A Matsen
- Fred Hutchinson Cancer Research Center, Seattle, WA, United States of America
- Holly M Bik
- Genome Center, University of California, Davis, CA, United States of America
- Jonathan A Eisen
- Department of Evolution and Ecology, University of California, Davis, CA, United States of America; Department of Medical Microbiology and Immunology, University of California, Davis, CA, United States of America
|