1
Castelli P, De Ruvo A, Bucciacchio A, D'Alterio N, Cammà C, Di Pasquale A, Radomski N. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics 2023; 24:560. [PMID: 37736708 PMCID: PMC10515079 DOI: 10.1186/s12864-023-09667-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND Genomic data-based machine learning tools are promising for real-time surveillance activities performing source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method. METHODS A large collection of 1 100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, areas under the receiver operating characteristic, precision-recall and precision-recall-gain curves, and execution time. RESULTS The average testing accuracies from accessory genes and pan kmers were significantly higher than accuracies from core alleles or SNPs. While the accuracies from 70 and 80% of training dataset splitting were not significantly different, those from 80% were significantly higher than the other tested proportions. Near-zero variance removal yielded no results for 7-locus alleles, did not significantly affect accuracy for core alleles, accessory genes and pan kmers, and significantly decreased accuracy for core SNPs. The SVM and XGB models did not differ significantly from each other in accuracy and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers. CONCLUSIONS In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
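As a rough illustration of three ingredients named in the abstract above (near-zero variance removal, a holdout split at a chosen training proportion, and Cohen's kappa), the following Python sketch uses only the standard library. It is not the authors' workflow: the function names, the 0.95 majority-fraction threshold and the toy profiles are assumptions made for illustration, and the variance filter is a simplified stand-in for whatever definition the study applied.

```python
import random
from collections import Counter, defaultdict


def drop_near_constant(profiles, max_major_fraction=0.95):
    """Return indices of features kept by a simplified near-zero variance filter.

    A feature is dropped when its most common value covers more than
    `max_major_fraction` of the samples (hypothetical threshold).
    """
    n_samples = len(profiles)
    kept = []
    for j in range(len(profiles[0])):
        column = [row[j] for row in profiles]
        major_count = Counter(column).most_common(1)[0][1]
        if major_count / n_samples <= max_major_fraction:
            kept.append(j)
    return kept


def stratified_holdout(labels, train_fraction=0.8, seed=0):
    """Split sample indices into train/test sets, keeping class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


def cohen_kappa(truth, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(truth)
    observed = sum(t == p for t, p in zip(truth, predicted)) / n
    truth_counts, pred_counts = Counter(truth), Counter(predicted)
    expected = sum(truth_counts[c] * pred_counts[c] for c in truth_counts) / n**2
    if expected == 1.0:  # degenerate case: one class everywhere
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy presence/absence profiles (4 genomes x 3 features), invented for the demo.
profiles = [[1, 0, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0]]
print(drop_near_constant(profiles))  # [1, 2]: feature 0 is constant
print(cohen_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"]))  # 0.5
```

Kappa is reported alongside accuracy because, for unbalanced source classes, a model can reach high accuracy simply by predicting the majority class; kappa discounts that chance agreement.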
Affiliation(s)
- Pierluigi Castelli
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea De Ruvo
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea Bucciacchio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicola D'Alterio
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Cesare Cammà
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Adriano Di Pasquale
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicolas Radomski
- Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
2
Palma F, Radomski N, Guérin A, Sévellec Y, Félix B, Bridier A, Soumet C, Roussel S, Guillier L. Genomic elements located in the accessory repertoire drive the adaptation to biocides in Listeria monocytogenes strains from different ecological niches. Food Microbiol 2022; 106:103757. [PMID: 35690455 DOI: 10.1016/j.fm.2021.103757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2020] [Revised: 01/04/2021] [Accepted: 01/29/2021] [Indexed: 11/25/2022]
Abstract
In response to the massive use of biocides for controlling Listeria monocytogenes (hereafter Lm) contaminations along the food chain, strains showing biocide tolerance emerged. Here, accessory genomic elements were associated with biocide tolerance through pangenome-wide associations performed on 197 Lm strains from different lineages and of different ecological, geographical and temporal origins. Mobile elements, including prophage-related loci, the Tn6188_qacH transposon and the pLMST6_emrC plasmid, were widespread across lineage I and II food strains and associated with tolerance to benzalkonium-chloride (BC), a quaternary ammonium compound (QAC) widely used in food processing. The pLMST6_emrC plasmid was also associated with tolerance to another QAC, didecyldimethylammonium-chloride, displaying a pleiotropic effect. While no associations were detected for chemically reactive biocides (alcohols and chlorines), genes encoding cell-surface proteins were associated with BC or polymeric biguanide tolerance. The latter was restricted to lineage I strains from animals and the environment. In conclusion, different genetic markers, polygenic or not, appear to have driven Lm adaptation to biocides, especially in food strains but also in strains from animals and the environment. These markers could help to monitor and predict the spread of biocide-tolerant Lm genotypes across different ecological niches, ultimately reducing the risk of such strains in food industrial settings.
Affiliation(s)
- Federica Palma
- Maisons-Alfort Laboratory of food safety, University Paris-Est, ANSES, Maisons-Alfort, France
- Nicolas Radomski
- Maisons-Alfort Laboratory of food safety, University Paris-Est, ANSES, Maisons-Alfort, France
- Alizée Guérin
- Fougères Laboratory, Antibiotics, Biocides, Residues and Resistance Unit, ANSES, Fougères, France
- Yann Sévellec
- Maisons-Alfort Laboratory of food safety, University Paris-Est, ANSES, Maisons-Alfort, France
- Benjamin Félix
- Maisons-Alfort Laboratory of food safety, University Paris-Est, ANSES, Maisons-Alfort, France
- Arnaud Bridier
- Fougères Laboratory, Antibiotics, Biocides, Residues and Resistance Unit, ANSES, Fougères, France
- Christophe Soumet
- Fougères Laboratory, Antibiotics, Biocides, Residues and Resistance Unit, ANSES, Fougères, France
- Sophie Roussel
- Maisons-Alfort Laboratory of food safety, University Paris-Est, ANSES, Maisons-Alfort, France
- Laurent Guillier
- Maisons-Alfort Laboratory of food safety, University Paris-Est, ANSES, Maisons-Alfort, France; Maisons-Alfort Risk Assessment Department, University Paris-Est, ANSES, Maisons-Alfort, France
3
Purushothaman S, Meola M, Egli A. Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics. Int J Mol Sci 2022; 23:9834. [PMID: 36077231 PMCID: PMC9456280 DOI: 10.3390/ijms23179834] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 07/04/2022] [Revised: 08/24/2022] [Accepted: 08/26/2022] [Indexed: 12/21/2022]
Abstract
Whole genome sequencing (WGS) provides the highest resolution for genome-based species identification and can provide insight into the antimicrobial resistance and virulence potential of a single microbiological isolate during the diagnostic process. In contrast, metagenomic sequencing allows the analysis of DNA segments from multiple microorganisms within a community, either using an amplicon- or shotgun-based approach. However, WGS and shotgun metagenomic data are rarely combined, although such an approach may generate additive or synergistic information, critical for, e.g., patient management, infection control, and pathogen surveillance. To produce a combined workflow with actionable outputs, we need to understand the pre-to-post analytical process of both technologies. This will require specific databases storing interlinked sequencing and metadata, and also involves customized bioinformatic analytical pipelines. This review article will provide an overview of the critical steps and potential clinical application of combining WGS and metagenomics together for microbiological diagnosis.
Affiliation(s)
- Srinithi Purushothaman
- Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
- Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
- Marco Meola
- Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
- Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
- Swiss Institute of Bioinformatics, University of Basel, 4031 Basel, Switzerland
- Adrian Egli
- Applied Microbiology Research, Department of Biomedicine, University of Basel, 4031 Basel, Switzerland
- Institute of Medical Microbiology, University of Zurich, 8006 Zurich, Switzerland
- Clinical Bacteriology and Mycology, University Hospital Basel, 4031 Basel, Switzerland
4
CleanSeq: A Pipeline for Contamination Detection, Cleanup, and Mutation Verifications from Microbial Genome Sequencing Data. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12126209] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
Contaminations frequently occur in bacterial cultures, which significantly affect the reproducibility and reliability of the results from whole-genome sequencing (WGS). Decontaminated WGS data with clean reads is the only desirable source for detecting possible variants correctly. Improvements in bioinformatics are essential to analyze the contaminated WGS dataset. Existing pipelines usually contain contamination detection, decontamination, and variant calling separately. The efficiency and results from existing pipelines fluctuate since distinctive computational models and parameters are applied. It is then promising to develop a bioinformatical tool containing functions to discriminate and remove contaminated reads and improve variant calling from clean reads. In this study, we established a Python-based pipeline named CleanSeq for automatic detection and removal of contaminating reads, analyzing possible genome variants with proper verifications via local re-alignments. The application and reproducibility are proven in either simulated, publicly available datasets or actual genome sequencing reads from our experimental evolution study in Escherichia coli. We successfully obtained decontaminated reads, called out all seven consistent mutations from the contaminated bacterial sample, and derived five colonies. Collectively, the results demonstrated that CleanSeq could effectively process the contaminated samples to achieve decontaminated reads, based on which reliable results (i.e., variant calling) could be obtained.
5
A European-wide dataset to uncover adaptive traits of Listeria monocytogenes to diverse ecological niches. Sci Data 2022; 9:190. [PMID: 35484273 PMCID: PMC9050667 DOI: 10.1038/s41597-022-01278-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/31/2020] [Accepted: 03/23/2022] [Indexed: 02/08/2023]
Abstract
Listeria monocytogenes (Lm) is a ubiquitous bacterium that causes listeriosis, a serious foodborne illness. In the nature-to-human transmission route, Lm can prosper in various ecological niches. Soil and decaying organic matter are its primary reservoirs. Certain clonal complexes (CCs) are over-represented in food production and represent a challenge to food safety. To gain new understanding of Lm adaptation mechanisms in food, the genetic background of strains found in animals and environment should be investigated in comparison to that of food strains. Twenty-one partners, including food, environment, veterinary and public health laboratories, constructed a dataset of 1484 genomes originating from Lm strains collected in 19 European countries. This dataset encompasses a large number of CCs occurring worldwide, covers many diverse habitats and is balanced between ecological compartments and geographic regions. The dataset presented here will contribute to improve our understanding of Lm ecology and should aid in the surveillance of Lm. This dataset provides a basis for the discovery of the genetic traits underlying Lm adaptation to different ecological niches.
Measurement(s): whole genome sequencing
Technology Type(s): Illumina Sequencing
Factor Type(s): Multi-locus sequence types; Geographic location; Animal associated environment isolates; Food product and food production environment isolates
Sample Characteristic - Organism: Listeria monocytogenes
Sample Characteristic - Environment: Farm; Ruminant; Agricultural soil; Wild animals; food processing building; dairy food product; meat or meat product (from mammal) (us cfr); chicken meat food product; fish food product; vegetable or vegetable product (us cfr)
Sample Characteristic - Location: Europe
6
Cornet L, Baurain D. Contamination detection in genomic data: more is not enough. Genome Biol 2022; 23:60. [PMID: 35189924 PMCID: PMC8862208 DOI: 10.1186/s13059-022-02619-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 15.5] [Received: 10/14/2021] [Accepted: 01/18/2022] [Indexed: 12/20/2022]
Abstract
The decreasing cost of sequencing and concomitant augmentation of publicly available genomes have created an acute need for automated software to assess genomic contamination. During the last 6 years, 18 programs have been published, each with its own strengths and weaknesses. Deciding which tools to use becomes more and more difficult without an understanding of the underlying algorithms. We review these programs, benchmarking six of them, and present their main operating principles. This article is intended to guide researchers in the selection of appropriate tools for specific applications. Finally, we present future challenges in the developing field of contamination detection.
Affiliation(s)
- Luc Cornet
- BCCM/IHEM, Mycology and Aerobiology, Sciensano, Bruxelles, Belgium
- Denis Baurain
- InBioS-PhytoSYSTEMS, Eukaryotic Phylogenomics, University of Liège, Liège, Belgium
7
Maćkiw E, Korsak D, Kowalska J, Felix B, Stasiak M, Kucharek K, Antoszewska A, Postupolski J. Genetic diversity of Listeria monocytogenes isolated from ready-to-eat food products in retail in Poland. Int J Food Microbiol 2021; 358:109397. [PMID: 34536853 DOI: 10.1016/j.ijfoodmicro.2021.109397] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/14/2021] [Revised: 09/01/2021] [Accepted: 09/04/2021] [Indexed: 12/22/2022]
Abstract
The study describes the characterization of Listeria monocytogenes isolated from the general 2017-2019 national official control and monitoring sampling program. A total of 60,928 ready-to-eat (RTE) food products were collected in retail in Poland, while the number of L. monocytogenes contaminated samples was 67 (0.1%). The majority of the strains belonged to molecular serotype IVb followed by IIa, frequently associated with human listeriosis. Furthermore, 61.2% of the isolates were resistant to at least one of the tested antimicrobials: penicillin, ampicillin, meropenem, erythromycin, sulfamethoxazole-trimethoprim, amoxicillin-clavulanic acid, ciprofloxacin, chloramphenicol, gentamicin, vancomycin, tetracycline and rifampicin. Virulence genes inlA, inlC, inlJ and lmo2672 were detected in all of the isolates. In our study the llsX gene (encoding LLS) exhibited 11.6% positivity. The 32 strains were grouped into 12 clonal complexes (CCs) which belong to the major clones that are in circulation in Europe. Among them, seven strains with the cgMLST close relatedness (CC2) were isolated from diverse food sectors, underlining a large circulation of this clone in Poland, most likely from multiple introduction sources. Additionally, two RTE strains CC6 and one CC37 were identified as closely related by cgMLST to two publicly available genomes of clinical strains isolated in Poland in 2012-2013. These results indicate the large strain circulation and point to RTE food products as a potential source of human listeriosis. The present study provided data to capture the contamination status of L. monocytogenes in foods at the retail level in Poland and assess the potential risk of this pathogen for human safety.
Affiliation(s)
- Elżbieta Maćkiw
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
- Dorota Korsak
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
- Joanna Kowalska
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
- Benjamin Felix
- European Union Reference Laboratory for L. monocytogenes, ANSES, Laboratory for Food Safety, University of Paris-Est, 94700 Maisons-Alfort, France
- Monika Stasiak
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
- Katarzyna Kucharek
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
- Aleksandra Antoszewska
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
- Jacek Postupolski
- Department of Food Safety, National Institute of Public Health NIH - National Research Institute, Warsaw, Poland
8
Pettengill JB, Kase JA, Murray MH. The Population Genetics, Virulence, and Public Health Concerns of Escherichia coli Collected From Rats Within an Urban Environment. Front Microbiol 2021; 12:631761. [PMID: 34777266 PMCID: PMC8585510 DOI: 10.3389/fmicb.2021.631761] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/20/2020] [Accepted: 10/04/2021] [Indexed: 11/13/2022]
Abstract
The co-existence of rats and humans in urban environments has long been a cause for concern regarding human health because of the potential for rats to harbor and transmit disease-causing pathogens. Here, we analyze whole-genome sequence (WGS) data from 41 Escherichia coli isolates collected from rat feces from 12 locations within the city of Chicago, IL, United States to determine the potential for rats to serve as a reservoir for pathogenic E. coli and describe its population structure. We identified 25 different serotypes, none of which were isolated from strains containing significant virulence markers indicating the presence of Shiga toxin-producing and other disease-causing E. coli. Nor did the E. coli isolates harbor any particularly rare stress tolerant or antimicrobial resistance genes. We then compared the isolates against a public database of approximately 100,000 E. coli and Shigella isolates of primarily food, food facility, or clinical origin. We found that only one isolate was genetically similar to genome sequences in the database. Phylogenetic analyses showed that isolates cluster by serotype, and there was little geographic structure (e.g., isolation by distance) among isolates. However, a greater signal of isolation by distance was observed when we compared genetic and geographic distances among isolates of the same serotype. This suggests that E. coli serotypes are independent lineages and recombination between serotypes is rare.
Affiliation(s)
- J B Pettengill
- Division of Biostatistics and Bioinformatics, Office of Analytics and Outreach, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, MD, United States
- J A Kase
- Division of Microbiology, Office of Regulatory Science, Center for Food Safety and Applied Nutrition, United States Food and Drug Administration, College Park, MD, United States
- M H Murray
- Davee Center for Epidemiology and Endocrinology, Urban Wildlife Institute, Lincoln Park Zoo, Chicago, IL, United States
9
Multidrug Resistance Dynamics in Salmonella in Food Animals in the United States: An Analysis of Genomes from Public Databases. Microbiol Spectr 2021; 9:e0049521. [PMID: 34704804 PMCID: PMC8549754 DOI: 10.1128/spectrum.00495-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 12/04/2022]
Abstract
The number of bacterial genomes deposited each year in public databases is growing exponentially. However, efforts to use these genomes to track trends in antimicrobial resistance (AMR) have been limited thus far. We used 22,102 genomes from public databases to track AMR trends in nontyphoidal Salmonella in food animals in the United States. In 2018, genomes deposited in public databases carried genes conferring resistance, on average, to 2.08 antimicrobial classes in poultry, 1.74 in bovines, and 1.28 in swine. This represents a decline in AMR of over 70% compared to the levels in 2000 in bovines and swine, and an increase of 13% for poultry. Trends in resistance inferred from genomic data showed good agreement with U.S. phenotypic surveillance data (weighted mean absolute difference ± standard deviation, 5.86% ± 8.11%). In 2018, resistance to 3rd-generation cephalosporins in bovines, swine, and poultry decreased to 9.97% on average, whereas in quinolones and 4th-generation cephalosporins, resistance increased to 12.53% and 3.87%, respectively. This was concomitant with a decrease of blaCMY-2 but an increase in blaCTX-M-65 and gyrA D87Y (encoding a change of D to Y at position 87). Core genome single-nucleotide polymorphism (SNP) phylogenies show that resistance to these antimicrobial classes was predominantly associated with Salmonella enterica serovar Infantis and, to a lesser extent, S. enterica serovar Typhimurium and its monophasic variant I 4,[5],12:i:−, whereas quinolone resistance was also associated with S. enterica serovar Dublin. Between 2000 and 2018, trends in serovar prevalence showed a composition shift where S. Typhimurium decreased while S. Infantis increased. Our findings illustrate the growing potential of using genomes in public databases to track AMR in regions where sequencing capacities are currently expanding. IMPORTANCE Next-generation sequencing has led to an exponential increase in the number of genomes deposited in public repositories. 
This growing volume of information presents opportunities to track the prevalence of genes conferring antimicrobial resistance (AMR), a growing threat to the health of humans and animals. Using 22,102 public genomes, we estimated that the prevalence of multidrug resistance (MDR) in the United States decreased in nontyphoidal Salmonella isolates recovered from bovines and swine between 2000 and 2018, whereas it increased in poultry. These trends are consistent with those detected by national surveillance systems that monitor resistance using phenotypic testing. However, using genomes, we identified that genes conferring resistance to critically important antimicrobials were associated with specific MDR serovars that could be the focus for future interventions. Our analysis illustrates the growing potential of public repositories to monitor AMR trends and shows that similar efforts could soon be carried out in other regions where genomic surveillance is increasing.
10
Labbé G, Kruczkiewicz P, Robertson J, Mabon P, Schonfeld J, Kein D, Rankin MA, Gopez M, Hole D, Son D, Knox N, Laing CR, Bessonov K, Taboada EN, Yoshida C, Ziebell K, Nichani A, Johnson RP, Van Domselaar G, Nash JHE. Rapid and accurate SNP genotyping of clonal bacterial pathogens with BioHansel. Microb Genom 2021; 7. [PMID: 34554082 PMCID: PMC8715432 DOI: 10.1099/mgen.0.000651] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022]
Abstract
Hierarchical genotyping approaches can provide insights into the source, geography and temporal distribution of bacterial pathogens. Multiple hierarchical SNP genotyping schemes have previously been developed so that new isolates can rapidly be placed within pre-computed population structures, without the need to rebuild phylogenetic trees for the entire dataset. This classification approach has, however, seen limited uptake in routine public health settings due to analytical complexity and the lack of standardized tools that provide clear and easy ways to interpret results. The BioHansel tool was developed to provide an organism-agnostic tool for hierarchical SNP-based genotyping. The tool identifies split k-mers that distinguish predefined lineages in whole genome sequencing (WGS) data using SNP-based genotyping schemes. BioHansel uses the Aho-Corasick algorithm to type isolates from assembled genomes or raw read sequence data in a matter of seconds, with limited computational resources. This makes BioHansel ideal for use by public health agencies that rely on WGS methods for surveillance of bacterial pathogens. Genotyping results are evaluated using a quality assurance module which identifies problematic samples, such as low-quality or contaminated datasets. Using existing hierarchical SNP schemes for Mycobacterium tuberculosis and Salmonella Typhi, we compare the genotyping results obtained with the k-mer-based tools BioHansel and SKA, with those of the organism-specific tools TBProfiler and genotyphi, which use gold-standard reference-mapping approaches. We show that the genotyping results are fully concordant across these different methods, and that the k-mer-based tools are significantly faster. We also test the ability of the BioHansel quality assurance module to detect intra-lineage contamination and demonstrate that it is effective, even in populations with low genetic diversity. We demonstrate the scalability of the tool using a dataset of ~8100 S. Typhi public genomes and provide the aggregated results of geographical distributions as part of the tool’s output. BioHansel is an open source Python 3 application available on PyPI and Conda repositories and as a Galaxy tool from the public Galaxy Toolshed. In a public health context, BioHansel enables rapid and high-resolution classification of bacterial pathogens with low genetic diversity.
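The idea of placing an isolate in a predefined hierarchy from lineage-specific k-mers can be sketched in a few lines of Python. This is only a toy under stated assumptions and not BioHansel's code or interface: the `genotype` function, the 7-mer scheme and the sequences are invented for illustration, and plain substring search replaces the Aho-Corasick matching that the abstract describes.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string (non-ACGT symbols are left unchanged)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def genotype(sequence, scheme):
    """Return the most specific genotype whose marker k-mers are all present.

    `scheme` maps a hierarchical genotype name such as "2.1" to k-mers that
    carry the lineage-defining SNP allele at their middle position.
    Both strands are searched; "N" keeps matches from spanning the junction.
    """
    both_strands = sequence + "N" + reverse_complement(sequence)
    hits = [
        name
        for name, kmers in scheme.items()
        if all(kmer in both_strands for kmer in kmers)
    ]
    if not hits:
        return None
    # Deeper levels of the hierarchy have more dot-separated fields.
    return max(hits, key=lambda name: name.count("."))


# Hypothetical scheme: lineages 1 and 2 differ at the middle base of one 7-mer,
# and sublineage 2.1 is marked by a second 7-mer.
scheme = {
    "1": ["TTGACCA"],
    "2": ["TTGGCCA"],
    "2.1": ["CATTGAA"],
}
genome = "GGGTTGGCCATTTCATTGAACC"  # made-up sequence
print(genotype(genome, scheme))  # 2.1
```

A production tool would additionally check that a sublineage hit is consistent with its parent lineage and would flag samples matching conflicting markers, which is the kind of signal a quality assurance step can use to detect contamination.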
Affiliation(s)
- Geneviève Labbé
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- James Robertson
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Philip Mabon
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Justin Schonfeld
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Daniel Kein
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Marisa A Rankin
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Matthew Gopez
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Darian Hole
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- David Son
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Natalie Knox
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada; Department of Medical Microbiology & Infectious Diseases, Max Rady College of Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, Manitoba, Canada
- Chad R Laing
- National Centres for Animal Disease Lethbridge Laboratory, Canadian Food Inspection Agency, Lethbridge, AB, Canada
- Kyrylo Bessonov
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Eduardo N Taboada
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Catherine Yoshida
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Kim Ziebell
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Anil Nichani
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Roger P Johnson
- National Microbiology Laboratory, Public Health Agency of Canada, Guelph, Ontario, Canada
- Gary Van Domselaar
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada; National Centres for Animal Disease Lethbridge Laboratory, Canadian Food Inspection Agency, Lethbridge, AB, Canada
- John H E Nash
- National Microbiology Laboratory, Public Health Agency of Canada, Toronto, Ontario, Canada
11
Deneke C, Brendebach H, Uelze L, Borowiak M, Malorny B, Tausch SH. Species-Specific Quality Control, Assembly and Contamination Detection in Microbial Isolate Sequences with AQUAMIS. Genes (Basel) 2021; 12:644. [PMID: 33926025 PMCID: PMC8145556 DOI: 10.3390/genes12050644] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 03/26/2021] [Revised: 04/23/2021] [Accepted: 04/24/2021] [Indexed: 01/13/2023]
Abstract
Sequencing of whole microbial genomes has become a standard procedure for cluster detection, source tracking, outbreak investigation and surveillance of many microorganisms. An increasing number of laboratories are currently in a transition phase from classical methods towards next generation sequencing, generating unprecedented amounts of data. Since the precision of downstream analyses depends significantly on the quality of raw data generated on the sequencing instrument, a comprehensive, meaningful primary quality control is indispensable. Here, we present AQUAMIS, a Snakemake workflow for an extensive quality control and assembly of raw Illumina sequencing data, allowing laboratories to automatize the initial analysis of their microbial whole-genome sequencing data. AQUAMIS performs all steps of primary sequence analysis, consisting of read trimming, read quality control (QC), taxonomic classification, de-novo assembly, reference identification, assembly QC and contamination detection, both on the read and assembly level. The results are visualized in an interactive HTML report including species-specific QC thresholds, allowing non-bioinformaticians to assess the quality of sequencing experiments at a glance. All results are also available as a standard-compliant JSON file, facilitating easy downstream analyses and data exchange. We have applied AQUAMIS to analyze ~13,000 microbial isolates as well as ~1000 in-silico contaminated datasets, proving the workflow's ability to perform in high throughput routine sequencing environments and reliably predict contaminations. We found that intergenus and intragenus contaminations can be detected most accurately using a combination of different QC metrics available within AQUAMIS.
Affiliation(s)
- Simon H. Tausch
- Department Biological Safety, German Federal Institute for Risk Assessment, 10589 Berlin, Germany; (C.D.); (H.B.); (L.U.); (M.B.); (B.M.)
12
San JE, Ngcapu S, Kanzi AM, Tegally H, Fonseca V, Giandhari J, Wilkinson E, Nelson CW, Smidt W, Kiran AM, Chimukangara B, Pillay S, Singh L, Fish M, Gazy I, Martin DP, Khanyile K, Lessells R, de Oliveira T. Transmission dynamics of SARS-CoV-2 within-host diversity in two major hospital outbreaks in South Africa. Virus Evol 2021; 7:veab041. [PMID: 34035952 PMCID: PMC8135343 DOI: 10.1093/ve/veab041] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Indexed: 12/11/2022]
Abstract
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) causes acute, highly transmissible respiratory infection in humans and a wide range of animal species. Its rapid global spread has resulted in a major public health emergency, necessitating commensurately rapid research to improve control strategies. In particular, the ability to effectively retrace transmission chains in outbreaks remains a major challenge, partly due to our limited understanding of the virus' underlying evolutionary dynamics within and between hosts. We used high-throughput sequencing whole-genome data coupled with bottleneck analysis to retrace the pathways of viral transmission in two nosocomial outbreaks that were previously characterised by epidemiological and phylogenetic methods. Additionally, we assessed the mutational landscape, selection pressures, and diversity at the within-host level for both outbreaks. Our findings show evidence of within-host selection and transmission of variants between samples. Both bottleneck and diversity analyses highlight within-host and consensus-level variants shared by putative source-recipient pairs in both outbreaks, suggesting that certain within-host variants in these outbreaks may have been transmitted upon infection rather than arising de novo independently within multiple hosts. Overall, our findings demonstrate the utility of combining within-host diversity and bottleneck estimations for elucidating transmission events in SARS-CoV-2 outbreaks, provide insight into the maintenance of viral genetic diversity, provide a list of candidate targets of positive selection for further investigation, and demonstrate that within-host variants can be transferred between patients. Together these results will help in developing strategies to understand the nature of transmission events and curtail the spread of SARS-CoV-2.
Affiliation(s)
- James E San
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Sinaye Ngcapu
- Centre for the AIDS Programme of Research in South Africa (CAPRISA), Durban, South Africa
- Department of Medical Microbiology, University of KwaZulu-Natal, Durban, South Africa
- Aquillah M Kanzi
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Houriiyah Tegally
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Vagner Fonseca
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Jennifer Giandhari
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Eduan Wilkinson
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Chase W Nelson
- Biodiversity Research Center, Academia Sinica, Taipei, Taiwan
- Institute for Comparative Genomics, American Museum of Natural History, New York, NY, USA
- Werner Smidt
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Africa Health Research Institute (AHRI), Durban, South Africa
- Anmol M Kiran
- Malawi-Liverpool-Wellcome Trust, Queen Elizabeth Central Hospital, College of Medicine, Blantyre, Malawi
- Centre for Inflammation Research, Queens Research Institute, University of Edinburgh, Edinburgh, UK
- Benjamin Chimukangara
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Sureshnee Pillay
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Lavanya Singh
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Maryam Fish
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Inbal Gazy
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Darren P Martin
- Institute of Infectious Diseases and Molecular Medicine, Division of Computational Biology, Department of Integrative Biomedical Sciences, University of Cape Town, South Africa
- Khulekani Khanyile
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Richard Lessells
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Tulio de Oliveira
- KwaZulu-Natal Research Innovation and Sequencing Platform (KRISP), School of Laboratory Medicine & Medical Sciences, University of KwaZulu-Natal, Durban, South Africa
- Department of Global Health, University of Washington, Seattle, WA, USA
13
Maćkiw E, Korsak D, Kowalska J, Felix B, Stasiak M, Kucharek K, Postupolski J. Incidence and genetic variability of Listeria monocytogenes isolated from vegetables in Poland. Int J Food Microbiol 2020; 339:109023. [PMID: 33341686 DOI: 10.1016/j.ijfoodmicro.2020.109023] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/21/2020] [Revised: 12/08/2020] [Accepted: 12/09/2020] [Indexed: 02/07/2023]
Abstract
The aim of the present study was to investigate the prevalence and genetic diversity of Listeria monocytogenes in various fresh and frozen vegetable products available in Poland. The samples were collected at retail markets within the framework of the national official control and monitoring program. In the years 2016-2019, a total of 49 out of 8712 collected vegetable samples were positive for L. monocytogenes. Our findings demonstrated that the occurrence of L. monocytogenes in various vegetable products was generally low, on average only 0.56% in the studied years. All isolates were susceptible to 11 antimicrobial agents: penicillin, ampicillin, meropenem, erythromycin, sulfamethoxazole-trimethoprim, amoxicillin-clavulanic acid, ciprofloxacin, chloramphenicol, gentamicin, vancomycin, and tetracycline. All of them harbored the virulence-associated genes inlA, inlC, and lmo2672; 82% harbored the inlJ gene, and a few (22%) also possessed the llsX gene. The majority of collected isolates (65%) belonged to molecular serogroup 1/2a-3a, followed by 4ab-4b-4d-4e (33%), and only one belonged to serogroup 1/2b-3b-7 (2%). Isolates yielded 18 different restriction profiles, revealing a large cluster of contamination linked to frozen corn (21 strains) distributed across 3 pulsotypes. MLST analysis classified selected isolates into nine clonal complexes (CCs). The obtained results contribute to characterizing the diversity of L. monocytogenes isolated from various vegetable products in Poland and their impact on food safety and public health.
Affiliation(s)
- Elżbieta Maćkiw
- Department of Food Safety, National Institute of Public Health - National Institute of Hygiene, Warsaw, Poland
- Dorota Korsak
- Department of Food Safety, National Institute of Public Health - National Institute of Hygiene, Warsaw, Poland
- Joanna Kowalska
- Department of Food Safety, National Institute of Public Health - National Institute of Hygiene, Warsaw, Poland
- Benjamin Felix
- European Union Reference Laboratory for L. monocytogenes, ANSES, Laboratory for Food Safety, University of Paris-Est, 94700 Maisons-Alfort, France
- Monika Stasiak
- Department of Food Safety, National Institute of Public Health - National Institute of Hygiene, Warsaw, Poland
- Katarzyna Kucharek
- Department of Food Safety, National Institute of Public Health - National Institute of Hygiene, Warsaw, Poland
- Jacek Postupolski
- Department of Food Safety, National Institute of Public Health - National Institute of Hygiene, Warsaw, Poland
14
Mansour MN, Yaghi J, El Khoury A, Felten A, Mistou MY, Atoui A, Radomski N. Prediction of Salmonella serovars isolated from clinical and food matrices in Lebanon and genomic-based investigation focusing on Enteritidis serovar. Int J Food Microbiol 2020; 333:108831. [PMID: 32854018 DOI: 10.1016/j.ijfoodmicro.2020.108831] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/25/2020] [Revised: 08/09/2020] [Accepted: 08/10/2020] [Indexed: 12/17/2022]
Abstract
Salmonella enterica subsp. enterica serovars are considered major causes of food poisoning, and this study was performed because Salmonella is a burden in Lebanon. The present study investigated the ability of genomic information to predict serovar using a collection of Salmonella isolates from infected humans (n = 24) and contaminated food (n = 63) in Lebanon. Further, the phylogenomic relationships of the serovar that predominated in Lebanon (i.e., S. Enteritidis; n = 25) were investigated in comparison with isolates from other countries (n = 130) based on coregenome single nucleotide polymorphisms (SNPs). Genetic elements, specifically Salmonella pathogenicity islands (SPIs), plasmid replicons, and antibiotic-resistance genes, were screened in S. Enteritidis genomes (n = 155). Our results revealed that serovar identification by seroagglutination of the samples isolated in Lebanon (n = 87) was highly correlated with the genomic-based prediction of serovars (80.4-85.0% with SeqSero1 and 93.1-94.2% with SeqSero2). The Salmonella serovars isolated from human and food samples in Lebanon were mainly Enteritidis (28.7%) and Infantis (26%). Less frequently, other serovars included Amager, Anatum, Bredeney, Chincol, Heidelberg, Hofit, Kentucky, Montevideo, Muenster, Newport, Schwarzengrund, Senftenberg and Typhimurium. In comparison with other countries, S. Enteritidis samples isolated in Lebanon (56 ± 27 intra-group pairwise SNP differences) showed strong phylogenomic relatedness at the coregenome level with samples from other countries, for example those isolated from Syria (65 ± 31 inter-group pairwise SNP differences). Most of the studied S. Enteritidis genomes encoded 10 SPIs involved in survival in immune cells (i.e. SPIs 1, 2, 3, 4, 5, 12, 13, 14, 16 and 17). The plasmid replicons IncFIB (S)_1 and IncFII (S)_1, encoding elements involved in virulence, were identified in the majority of the S. Enteritidis genomes (94% and 96%, respectively), the majority also carrying the aminoglycoside-resistance gene aac(6')-Iaa_1. The IncI_1_Alpha replicon responsible for ampicillin resistance was detected in only 2 of 25 S. Enteritidis Lebanese strains. Genomic-based risk assessment of Salmonella serovars in Lebanon showed that food imported from Syria might be an origin of the S. Enteritidis human cases in Lebanon. The detection of several SPIs involved in survival, plasmid replicons involved in virulence, and aminoglycoside-resistance genes emphasizes that S. Enteritidis is of paramount importance for public health in Lebanon and other countries.
Affiliation(s)
- Marie Noel Mansour
- Centre d'Analyses et de Recherche (CAR), Unité de Recherche « Technologies et Valorisation Agro-alimentaire » (UR-TVA), Faculty of Sciences, Saint-Joseph University of Beirut, Campus of Sciences and Technologies, Mar Roukos, Lebanon.
- Joseph Yaghi
- Centre d'Analyses et de Recherche (CAR), Unité de Recherche « Technologies et Valorisation Agro-alimentaire » (UR-TVA), Faculty of Sciences, Saint-Joseph University of Beirut, Campus of Sciences and Technologies, Mar Roukos, Lebanon.
- André El Khoury
- Centre d'Analyses et de Recherche (CAR), Unité de Recherche « Technologies et Valorisation Agro-alimentaire » (UR-TVA), Faculty of Sciences, Saint-Joseph University of Beirut, Campus of Sciences and Technologies, Mar Roukos, Lebanon.
- Arnaud Felten
- Paris-Est University, French Agency for Food, Environmental and Occupational Health and Safety (Anses), Laboratory for Food Safety (LSAL), Maisons-Alfort, France.
- Michel-Yves Mistou
- Applied Mathematics and Computer Science, From Genomes to the Environment (MaIAGE), National Institute for Agricultural, Food and Environmental Research (INRAE), Université Paris-Saclay, Jouy-en-Josas, France.
- Ali Atoui
- Laboratory of Microbiology, Department of Life and Earth Sciences, Faculty of Sciences I, Lebanese University, Hadat Campus, Beirut, Lebanon.
- Nicolas Radomski
- Paris-Est University, French Agency for Food, Environmental and Occupational Health and Safety (Anses), Laboratory for Food Safety (LSAL), Maisons-Alfort, France.