1
Gladstone RA, Lo SW, Goater R, Yeats C, Taylor B, Hadfield J, Lees JA, Croucher NJ, van Tonder AJ, Bentley LJ, Quah FX, Blaschke AJ, Pershing NL, Byington CL, Balaji V, Hryniewicz W, Sigauque B, Ravikumar K, Almeida SCG, Ochoa TJ, Ho PL, du Plessis M, Ndlangisa KM, Cornick JE, Kwambana-Adams B, Benisty R, Nzenze SA, Madhi SA, Hawkins PA, Pollard AJ, Everett DB, Antonio M, Dagan R, Klugman KP, von Gottberg A, Metcalf BJ, Li Y, Beall BW, McGee L, Breiman RF, Aanensen DM, Bentley SD. Visualizing variation within Global Pneumococcal Sequence Clusters (GPSCs) and country population snapshots to contextualize pneumococcal isolates. Microb Genom 2020; 6:e000357. [PMID: 32375991 PMCID: PMC7371119 DOI: 10.1099/mgen.0.000357] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 12/13/2019] [Accepted: 03/03/2020] [Indexed: 11/21/2022] Open
Abstract
Knowledge of pneumococcal lineages, their geographic distribution and antibiotic resistance patterns can give insights into global pneumococcal disease. We provide interactive bioinformatic outputs to explore such topics, aiming to increase dissemination of genomic insights to the wider community, without the need for specialist training. We prepared 12 country-specific phylogenetic snapshots, and international phylogenetic snapshots of 73 common Global Pneumococcal Sequence Clusters (GPSCs) previously defined using PopPUNK, and present them in Microreact. Gene presence and absence defined using Roary, and recombination profiles derived from Gubbins, are presented in Phandango for each GPSC. Temporal phylogenetic signal was assessed for each GPSC using BactDating. We provide examples of how such resources can be used. In our example use of a country-specific phylogenetic snapshot we determined that serotype 14 was observed in nine unrelated genetic backgrounds in South Africa. The international phylogenetic snapshot of GPSC9, in which most serotype 14 isolates from South Africa were observed, highlights that there were three independent sub-clusters represented by South African serotype 14 isolates. We estimated from the GPSC9-dated tree that the sub-clusters were each established in South Africa during the 1980s. We show how recombination plots allowed the identification of a 20 kb recombination spanning the capsular polysaccharide locus within GPSC97. This was consistent with a switch from serotype 6A to 19A estimated to have occurred in the 1990s from the GPSC97-dated tree. Plots of gene presence/absence of resistance genes (tet, erm, cat) across the GPSC23 phylogeny were consistent with acquisition of a composite transposon. We estimated from the GPSC23-dated tree that the acquisition occurred between 1953 and 1975. Finally, we demonstrate the assignment of GPSC31 to 17 externally generated pneumococcal serotype 1 assemblies from Utah via Pathogenwatch. Most of the Utah isolates clustered within GPSC31 in a USA-specific clade with the most recent common ancestor estimated between 1958 and 1981. The resources we have provided can be used to explore data, test hypotheses and generate new hypotheses. The accessible assignment of GPSCs allows others to contextualize their own collections beyond the data presented here.
Affiliation(s)
- Stephanie W. Lo
- Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK
- Richard Goater
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Corin Yeats
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- Ben Taylor
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
- James Hadfield
- Vaccine and Infectious Disease Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- John A. Lees
- Faculty of Medicine, School of Public Health, Imperial College London, UK
- Andries J. van Tonder
- Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK
- Department of Veterinary Medicine, University of Cambridge, Cambridge, UK
- Leon J. Bentley
- Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK
- Fu Xiang Quah
- Parasites and Microbes, Wellcome Sanger Institute, Hinxton, UK
- Anne J. Blaschke
- Division of Pediatric Infectious Diseases, Department of Pediatrics, School of Medicine, University of Utah, 295 Chipeta Way, Salt Lake City, UT, 84108, USA
- Nicole L. Pershing
- Division of Pediatric Infectious Diseases, Department of Pediatrics, School of Medicine, University of Utah, 295 Chipeta Way, Salt Lake City, UT, 84108, USA
- Waleria Hryniewicz
- National Medicines Institute, Division of Clinical Microbiology and Infection Prevention, Warsaw, Poland
- Betuel Sigauque
- Fundação Manhiça / Centro de Investigação em Saúde da Manhiça (CISM), Maputo, Mozambique; Instituto Nacional de Saúde, Ministério de Saúde, Maputo, Mozambique
- K.L. Ravikumar
- Central Research Laboratory, Department of Microbiology, Kempegowda Institute of Medical Sciences Hospital & Research Center, Bangalore, India
- Theresa J. Ochoa
- Instituto de Medicina Tropical, Universidad Peruana Cayetano Heredia, Lima, Peru
- Pak Leung Ho
- Department of Microbiology and Carol Yu Centre for Infection, The University of Hong Kong, Queen Mary Hospital, Hong Kong, PR China
- Mignon du Plessis
- Centre for Respiratory Diseases and Meningitis, National Institute for Communicable Diseases, Johannesburg, South Africa
- Kedibone M. Ndlangisa
- Centre for Respiratory Diseases and Meningitis, National Institute for Communicable Diseases, Johannesburg, South Africa
- Brenda Kwambana-Adams
- NIHR Global Health Research Unit on Mucosal Pathogens, Division of Infection and Immunity, University College London, London, UK
- WHO Collaborating Centre for New Vaccines Surveillance, Medical Research Council Unit The Gambia at The London School of Hygiene & Tropical Medicine, Fajara, The Gambia
- Rachel Benisty
- The Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Susan A. Nzenze
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa
- Department of Science and Technology/National Research Foundation: Vaccine Preventable Diseases, University of the Witwatersrand, Johannesburg, South Africa
- Shabir A. Madhi
- Medical Research Council: Respiratory and Meningeal Pathogens Research Unit, University of the Witwatersrand, Johannesburg, South Africa
- Department of Science and Technology/National Research Foundation: Vaccine Preventable Diseases, University of the Witwatersrand, Johannesburg, South Africa
- Andrew J. Pollard
- Oxford Vaccine Group, Department of Paediatrics, University of Oxford, and the NIHR Oxford Biomedical Research Centre, Oxford, UK
- Martin Antonio
- WHO Collaborating Centre for New Vaccines Surveillance, Medical Research Council Unit The Gambia at The London School of Hygiene & Tropical Medicine, Fajara, The Gambia
- Ron Dagan
- The Faculty of Health Sciences, Ben-Gurion University of the Negev, Beer-Sheva, Israel
- Anne von Gottberg
- Department of Microbiology and Carol Yu Centre for Infection, The University of Hong Kong, Queen Mary Hospital, Hong Kong, PR China
- Yuan Li
- Centers for Disease Control and Prevention, Atlanta, GA, USA
- Lesley McGee
- Centers for Disease Control and Prevention, Atlanta, GA, USA
- Robert F. Breiman
- Rollins School of Public Health, Emory University, GA, USA
- Emory Global Health Institute, Atlanta, GA, USA
- David M. Aanensen
- Centre for Genomic Pathogen Surveillance, Wellcome Genome Campus, Hinxton, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
2
Hayati M, Biller P, Colijn C. Predicting the short-term success of human influenza virus variants with machine learning. Proc Biol Sci 2020; 287:20200319. [PMID: 32259469 PMCID: PMC7209065 DOI: 10.1098/rspb.2020.0319] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/13/2020] [Accepted: 03/16/2020] [Indexed: 12/13/2022] Open
Abstract
Seasonal influenza viruses are constantly changing and produce a different set of circulating strains each season. Small genetic changes can accumulate over time and result in antigenically different viruses; this may prevent the body's immune system from recognizing those viruses. Because of rapid mutation, particularly in the haemagglutinin (HA) gene, seasonal influenza vaccines must be updated frequently. This requires choosing strains to include in the updates to maximize the vaccines' benefits, according to estimates of which strains will be circulating in upcoming seasons. This is a challenging prediction task. In this paper, we use longitudinally sampled phylogenetic trees based on HA sequences from human influenza viruses, together with counts of epitope site polymorphisms in HA, to predict which influenza virus strains are likely to be successful. We extract small groups of taxa (subtrees) and use a suite of features of these subtrees as key inputs to the machine learning tools. Using a range of training and testing strategies, including training on H3N2 and testing on H1N1, we find that successful prediction of future expansion of small subtrees is possible from these data, with accuracies of 0.71-0.85 and a classifier 'area under the curve' of 0.75-0.9.
Affiliation(s)
- Maryam Hayati
- Department of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Priscila Biller
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Caroline Colijn
- Department of Mathematics, Simon Fraser University, Burnaby, British Columbia, Canada V5A 1S6
- Department of Mathematics, Imperial College London, London SW7 2BU, UK
3
Goig GA, Blanco S, Garcia-Basteiro AL, Comas I. Contaminant DNA in bacterial sequencing experiments is a major source of false genetic variability. BMC Biol 2020; 18:24. [PMID: 32122347 PMCID: PMC7053099 DOI: 10.1186/s12915-020-0748-z] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 08/19/2019] [Accepted: 02/11/2020] [Indexed: 12/16/2022] Open
Abstract
Background: Contaminant DNA is a well-known confounding factor in molecular biology and in genomic repositories. Strikingly, analysis workflows for whole-genome sequencing (WGS) data commonly do not account for errors potentially introduced by contamination, which could lead to the wrong assessment of allele frequency both in basic and clinical research. Results: We used a taxonomic filter to remove contaminant reads from more than 4000 bacterial samples from 20 different studies and performed a comprehensive evaluation of the extent and impact of contaminant DNA in WGS. We found that contamination is pervasive and can introduce large biases in variant analysis. We showed that these biases can result in hundreds of false positive and negative SNPs, even for samples with slight contamination. Studies investigating complex biological traits from sequencing data can be completely biased if contamination is neglected during the bioinformatic analysis, and we demonstrate that removing contaminant reads with a taxonomic classifier permits more accurate variant calling. We used both real and simulated data to evaluate and implement reliable, contamination-aware analysis pipelines. Conclusion: As sequencing technologies consolidate as precision tools that are increasingly adopted in the research and clinical context, our results call for the implementation of contamination-aware analysis pipelines. Taxonomic classifiers are a powerful tool to implement such pipelines.
Affiliation(s)
- Galo A Goig
- Institute of Biomedicine of Valencia, IBV-CSIC, St. Jaume Roig 11, 46010, Valencia, Spain
- Silvia Blanco
- Centro de Investigação em Saúde de Manhiça (CISM), Bairro Cambeve, Rua 12, Distrito da Manhiça, 1929, Maputo, Mozambique
- Alberto L Garcia-Basteiro
- Centro de Investigação em Saúde de Manhiça (CISM), Bairro Cambeve, Rua 12, Distrito da Manhiça, 1929, Maputo, Mozambique; ISGlobal, Hospital Clínic - Universitat de Barcelona, Barcelona, Spain
- Iñaki Comas
- Institute of Biomedicine of Valencia, IBV-CSIC, St. Jaume Roig 11, 46010, Valencia, Spain; CIBER in Epidemiology and Public Health, Madrid, Spain
4
Uelze L, Grützke J, Borowiak M, Hammerl JA, Juraschek K, Deneke C, Tausch SH, Malorny B. Typing methods based on whole genome sequencing data. One Health Outlook 2020; 2:3. [PMID: 33829127 PMCID: PMC7993478 DOI: 10.1186/s42522-020-0010-1] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 10/04/2019] [Accepted: 01/08/2020] [Indexed: 05/12/2023]
Abstract
Whole genome sequencing (WGS) of foodborne pathogens has become an effective method for investigating the information contained in the genome sequence of bacterial pathogens. In addition, its highly discriminative power enables the comparison of genetic relatedness between bacteria even on a sub-species level. For this reason, WGS is being implemented worldwide and across sectors (human, veterinary, food, and environment) for the investigation of disease outbreaks, source attribution, and improved risk characterization models. In order to extract relevant information from the large quantity of complex data produced by WGS, a host of bioinformatics tools has been developed, allowing users to analyze and interpret sequencing data, ranging from simple gene searches to complex phylogenetic studies. Depending on the research question, the complexity of the dataset and their bioinformatics skill set, users can choose between a great variety of tools for the analysis of WGS data. In this review, we describe the relevant approaches for phylogenomic analyses in outbreak studies and give an overview of selected tools for the characterization of foodborne pathogens based on WGS data. Despite the efforts of recent years, harmonization and standardization of typing tools are still urgently needed to allow for an easy comparison of data between laboratories, moving towards a One Health worldwide surveillance system for foodborne pathogens.
Affiliation(s)
- Laura Uelze
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Josephine Grützke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Maria Borowiak
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Jens Andre Hammerl
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Katharina Juraschek
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Carlus Deneke
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Simon H. Tausch
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
- Burkhard Malorny
- Department for Biological Safety, German Federal Institute for Risk Assessment, BfR, Max-Dohrn Straße 8-10, 10589 Berlin, Germany
6
Radomski N, Cadel-Six S, Cherchame E, Felten A, Barbet P, Palma F, Mallet L, Le Hello S, Weill FX, Guillier L, Mistou MY. A Simple and Robust Statistical Method to Define Genetic Relatedness of Samples Related to Outbreaks at the Genomic Scale - Application to Retrospective Salmonella Foodborne Outbreak Investigations. Front Microbiol 2019; 10:2413. [PMID: 31708892 PMCID: PMC6821717 DOI: 10.3389/fmicb.2019.02413] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 07/08/2019] [Accepted: 10/07/2019] [Indexed: 12/21/2022] Open
Abstract
The investigation of foodborne outbreaks (FBOs) from genomic data typically relies on inspecting the relatedness of samples through a phylogenomic tree computed on either SNPs, genes, kmers, or alleles (i.e., cgMLST and wgMLST). The phylogenomic reconstruction is often time-consuming and computation-intensive, and depends on hidden assumptions, pipeline implementation and parameterization. In the context of FBO investigations, robust links between isolates are required in a timely manner to trigger appropriate management actions. Here, we propose a non-parametric statistical method to assert the relatedness of samples (i.e., outbreak cases) or to reject it (i.e., non-outbreak cases). With typical computation running within minutes on a desktop computer, we benchmarked the ability of three non-parametric statistical tests (i.e., Wilcoxon rank-sum, Kolmogorov-Smirnov and Kruskal-Wallis) on six different genomic features (i.e., SNPs, SNPs excluding recombination events, genes, kmers, cgMLST alleles, and wgMLST alleles) to discriminate outbreak cases (i.e., positive control: C+) from non-outbreak cases (i.e., negative control: C-). We leveraged four well-characterized and retrospectively investigated FBOs of Salmonella Typhimurium and its monophasic variant S. 1,4,[5],12:i:- from France, setting positive and negative controls in all the assays. We show that the approaches relying on pairwise SNP differences distinguished all four considered outbreaks, in contrast to the other tested genomic features (i.e., genes, kmers, cgMLST alleles, and wgMLST alleles). The freely available non-parametric method written in R has been designed to be independent of both the phylogenomic reconstruction and the detection methods of genomic features (i.e., SNPs, genes, kmers, or alleles), making it widely and easily usable by anybody working on genomic data from suspected samples.
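The idea of comparing pairwise SNP differences with a rank-based test can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the function names `pairwise_snp_differences` and `rank_sum_test`, the toy aligned sequences, and the use of a normal approximation without tie or continuity correction are all assumptions made for illustration only.

```python
from itertools import combinations
from math import erfc, sqrt


def pairwise_snp_differences(seqs):
    """Count differing positions for every pair of equal-length aligned sequences."""
    return [sum(a != b for a, b in zip(s, t)) for s, t in combinations(seqs, 2)]


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test via normal approximation.

    Tied values receive average ranks; no tie or continuity correction is applied,
    so the p-value is only approximate for small or heavily tied samples.
    Returns (U statistic for x, approximate two-sided p-value).
    """
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    z = (u1 - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, erfc(abs(z) / sqrt(2))


# Hypothetical toy alignments: a tight group (suspected outbreak) and a diverse one.
suspected = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACGAACGT"]
background = ["TCGTTCGA", "ACCTACTT", "GCGTAGGC", "ACTTCCGT"]

d_suspected = pairwise_snp_differences(suspected)
d_background = pairwise_snp_differences(background)
u, p = rank_sum_test(d_suspected, d_background)
print(f"U = {u}, approximate p = {p:.4f}")
```

In this toy data every pairwise difference within the suspected group is smaller than every difference within the background group, so the test separates the two distributions. Pairwise distances that share isolates are not independent observations, so a p-value from a sketch like this should be read as a descriptive score rather than a calibrated significance level.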
Affiliation(s)
- Nicolas Radomski
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Sabrina Cadel-Six
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Emeline Cherchame
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Arnaud Felten
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Pauline Barbet
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Federica Palma
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Ludovic Mallet
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Simon Le Hello
- Unité des Bactéries Pathogènes Entériques, Institut Pasteur, Centre National de Référence des Salmonella, Paris, France
- François-Xavier Weill
- Unité des Bactéries Pathogènes Entériques, Institut Pasteur, Centre National de Référence des Salmonella, Paris, France
- Laurent Guillier
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France
- Michel-Yves Mistou
- ANSES, Laboratory for Food Safety, Université PARIS-EST, Maisons-Alfort, France