1
Guzinski J, Tang Y, Chattaway MA, Dallman TJ, Petrovska L. Development and validation of a random forest algorithm for source attribution of animal and human Salmonella Typhimurium and monophasic variants of S. Typhimurium isolates in England and Wales utilising whole genome sequencing data. Front Microbiol 2024; 14:1254860. [PMID: 38533130 PMCID: PMC10963456 DOI: 10.3389/fmicb.2023.1254860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 12/22/2023] [Indexed: 03/28/2024] Open
Abstract
Source attribution has traditionally involved combining epidemiological data with different pathogen characterisation methods, including 7-gene multi locus sequence typing (MLST) or serotyping, however, these approaches have limited resolution. In contrast, whole genome sequencing data provide an overview of the whole genome that can be used by attribution algorithms. Here, we applied a random forest (RF) algorithm to predict the primary sources of human clinical Salmonella Typhimurium (S. Typhimurium) and monophasic variants (monophasic S. Typhimurium) isolates. To this end, we utilised single nucleotide polymorphism diversity in the core genome MLST alleles obtained from 1,061 laboratory-confirmed human and animal S. Typhimurium and monophasic S. Typhimurium isolates as inputs into a RF model. The algorithm was used for supervised learning to classify 399 animal S. Typhimurium and monophasic S. Typhimurium isolates into one of eight distinct primary source classes comprising common livestock and pet animal species: cattle, pigs, sheep, other mammals (pets: mostly dogs and horses), broilers, layers, turkeys, and game birds (pheasants, quail, and pigeons). When applied to the training set animal isolates, model accuracy was 0.929 and kappa 0.905, whereas for the test set animal isolates, for which the primary source class information was withheld from the model, the accuracy was 0.779 and kappa 0.700. Subsequently, the model was applied to assign 662 human clinical cases to the eight primary source classes. In the dataset, 60/399 (15.0%) of the animal and 141/662 (21.3%) of the human isolates were associated with a known outbreak of S. Typhimurium definitive type (DT) 104. All but two of the 141 DT104 outbreak linked human isolates were correctly attributed by the model to the primary source classes identified as the origin of the DT104 outbreak. 
A model that was run without the clonal DT104 animal isolates produced largely congruent outputs (training set accuracy 0.989 and kappa 0.985; test set accuracy 0.781 and kappa 0.663). Overall, our results show that RF offers considerable promise as a suitable methodology for epidemiological tracking and source attribution for foodborne pathogens.
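The abstract above reports model performance as accuracy and kappa. As a reading aid, here is a minimal sketch of how these two metrics can be computed from a confusion matrix. The function name and the three-class toy matrix are illustrative assumptions, not code or data from the study.

```python
def accuracy_and_kappa(confusion):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    confusion[i][j] is the number of isolates whose true class is i
    and whose predicted class is j. Assumes chance agreement is below 1.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Observed agreement: fraction of isolates on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total

    # Agreement expected by chance, from the row and column marginals.
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical 3-class example (rows: true class, columns: predicted class).
toy = [[8, 1, 1],
       [2, 7, 1],
       [0, 2, 8]]
acc, kappa = accuracy_and_kappa(toy)
```

Kappa discounts the agreement a classifier would reach by chance given the class frequencies, which is why it is lower than accuracy in both the toy example and the figures quoted in the abstract.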
Affiliation(s)
- Jaromir Guzinski
  - Animal and Plant Health Agency, Bacteriology Department, Addlestone, United Kingdom
- Yue Tang
  - Animal and Plant Health Agency, Bacteriology Department, Addlestone, United Kingdom
- Marie Anne Chattaway
  - Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London, United Kingdom
- Timothy J. Dallman
  - Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London, United Kingdom
- Liljana Petrovska
  - Animal and Plant Health Agency, Bacteriology Department, Addlestone, United Kingdom
2
Cardim Falcao R, Edwards MR, Hurst M, Fraser E, Otterstatter M. A Review on Microbiological Source Attribution Methods of Human Salmonellosis: From Subtyping to Whole-Genome Sequencing. Foodborne Pathog Dis 2024; 21:137-146. [PMID: 38032610 PMCID: PMC10924193 DOI: 10.1089/fpd.2023.0075] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/01/2023] Open
Abstract
Salmonella is one of the main causes of human foodborne illness. It is endemic worldwide, with different animals and animal-based food products as reservoirs and vehicles of infection. Identifying animal reservoirs and potential transmission pathways of Salmonella is essential for prevention and control. There are many approaches for source attribution, each using different statistical models and data streams. Some aim to identify the animal reservoir, while others aim to determine the point at which exposure occurred. With the advance of whole-genome sequencing (WGS) technologies, new source attribution models will greatly benefit from the discriminating power gained with WGS. This review discusses some key source attribution methods and their mathematical and statistical tools. We also highlight recent studies utilizing WGS for source attribution and discuss open questions and challenges in developing new WGS methods. We aim to provide a better understanding of the current state of these methodologies with application to Salmonella and other foodborne pathogens that are common sources of illness in the poultry and human sectors.
Affiliation(s)
- Rebeca Cardim Falcao
  - British Columbia Centre for Disease Control, Vancouver, Canada
  - School of Population and Public Health, The University of British Columbia, Vancouver, Canada
- Megan R Edwards
  - British Columbia Centre for Disease Control, Vancouver, Canada
  - School of Population and Public Health, The University of British Columbia, Vancouver, Canada
- Matt Hurst
  - Public Health Agency of Canada, Guelph, Canada
- Erin Fraser
  - British Columbia Centre for Disease Control, Vancouver, Canada
  - School of Population and Public Health, The University of British Columbia, Vancouver, Canada
- Michael Otterstatter
  - British Columbia Centre for Disease Control, Vancouver, Canada
  - School of Population and Public Health, The University of British Columbia, Vancouver, Canada
3
Djordjevic SP, Jarocki VM, Seemann T, Cummins ML, Watt AE, Drigo B, Wyrsch ER, Reid CJ, Donner E, Howden BP. Genomic surveillance for antimicrobial resistance - a One Health perspective. Nat Rev Genet 2024; 25:142-157. [PMID: 37749210 DOI: 10.1038/s41576-023-00649-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/02/2023] [Indexed: 09/27/2023]
Abstract
Antimicrobial resistance (AMR) - the ability of microorganisms to adapt and survive under diverse chemical selection pressures - is influenced by complex interactions between humans, companion and food-producing animals, wildlife, insects and the environment. To understand and manage the threat posed to health (human, animal, plant and environmental) and security (food and water security and biosecurity), a multifaceted 'One Health' approach to AMR surveillance is required. Genomic technologies have enabled monitoring of the mobilization, persistence and abundance of AMR genes and mutations within and between microbial populations. Their adoption has also allowed source-tracing of AMR pathogens and modelling of AMR evolution and transmission. Here, we highlight recent advances in genomic AMR surveillance and the relative strengths of different technologies for AMR surveillance and research. We showcase recent insights derived from One Health genomic surveillance and consider the challenges to broader adoption both in developed and in lower- and middle-income countries.
Affiliation(s)
- Steven P Djordjevic
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Veronica M Jarocki
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Torsten Seemann
  - Centre for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Max L Cummins
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Anne E Watt
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Barbara Drigo
  - UniSA STEM, University of South Australia, Adelaide, South Australia, Australia
  - Future Industries Institute, University of South Australia, Adelaide, South Australia, Australia
- Ethan R Wyrsch
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Cameron J Reid
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Erica Donner
  - Future Industries Institute, University of South Australia, Adelaide, South Australia, Australia
  - Cooperative Research Centre for Solving Antimicrobial Resistance in Agribusiness, Food, and Environments (CRC SAAFE), Adelaide, South Australia, Australia
- Benjamin P Howden
  - Centre for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
4
Gu W, Cui Z, Stroika S, Carleton HA, Conrad A, Katz LS, Richardson LC, Hunter J, Click ES, Bruce BB. Predicting Food Sources of Listeria monocytogenes Based on Genomic Profiling Using Random Forest Model. Foodborne Pathog Dis 2023; 20:579-586. [PMID: 37699246 DOI: 10.1089/fpd.2023.0046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023] Open
Abstract
Listeria monocytogenes can cause severe foodborne illness, including miscarriage during pregnancy or death in newborn infants. When outbreaks of L. monocytogenes illness occur, it may be possible to determine the food source of the outbreak. However, most reported L. monocytogenes illnesses do not occur as part of a recognized outbreak and most of the time the food source of sporadic L. monocytogenes illness in people cannot be determined. In the United States, L. monocytogenes isolates from patients, foods, and environments are routinely sequenced and analyzed by whole genome multilocus sequence typing (wgMLST) for outbreak detection by PulseNet, the national molecular surveillance system for foodborne illnesses. We investigated whether machine learning approaches applied to wgMLST allele call data could assist in attribution analysis of food source of L. monocytogenes isolates. We compiled isolates with a known source from five food categories (dairy, fruit, meat, seafood, and vegetable) using the metadata of L. monocytogenes isolates in PulseNet, deduplicated closely genetically related isolates, and developed random forest models to predict the food sources of isolates. Prediction accuracy of the final model varied across the food categories; it was highest for meat (65%), followed by fruit (45%), vegetable (45%), dairy (44%), and seafood (37%); overall accuracy was 49%, compared with the naive prediction accuracy of 28%. Our results show that random forest can be used to capture genetically complex features of high-resolution wgMLST for attribution of isolates to their sources.
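The abstract above compares per-category and overall accuracy against a "naive prediction accuracy". The sketch below shows one common way to compute such figures, assuming the naive baseline means always predicting the most frequent category. The abstract does not state which definition the authors used, and the labels and function names here are made up for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(true_labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(true_labels)
    return max(counts.values()) / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of isolates of each true class that were predicted correctly."""
    totals = Counter(true_labels)
    correct = Counter()
    for truth_label, predicted in zip(true_labels, predicted_labels):
        if truth_label == predicted:
            correct[truth_label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical labels for ten isolates across three food categories.
truth = ["meat"] * 4 + ["dairy"] * 3 + ["seafood"] * 3
pred = ["meat", "meat", "meat", "dairy",
        "dairy", "dairy", "meat",
        "seafood", "meat", "dairy"]

baseline = majority_baseline_accuracy(truth)
by_class = per_class_accuracy(truth, pred)
overall = sum(t == p for t, p in zip(truth, pred)) / len(truth)
```

Reporting the baseline next to the model's overall accuracy shows how much of the performance comes from the genomic features rather than from class imbalance alone.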
Affiliation(s)
- Weidong Gu
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Zhaohui Cui
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Steven Stroika
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Heather A Carleton
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Amanda Conrad
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Lee S Katz
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- LaTonia C Richardson
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Jennifer Hunter
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Eleanor S Click
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Beau B Bruce
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
5
Chalka A, Dallman TJ, Vohra P, Stevens MP, Gally DL. The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA. Microb Genom 2023; 9:001116. [PMID: 37843883 PMCID: PMC10634445 DOI: 10.1099/mgen.0.001116] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/08/2023] [Accepted: 10/02/2023] [Indexed: 10/17/2023] Open
Abstract
Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and compared them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source. 
Notably, IGRs were the feature with the best performance across multiple models which may be due to IGRs acting as both a representation of their flanking genes, equivalent to PVs, while also capturing genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45 % of the human infections with STm in the USA originate from bovine, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models with IGRs and PVs as features compared to SNP-based and core genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.
Affiliation(s)
- Antonia Chalka
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- Tim J. Dallman
  - Institute for Risk Assessment Sciences (IRAS), University of Utrecht, Heidelberglaan, Utrecht, Netherlands
- Prerna Vohra
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- Mark P. Stevens
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- David L. Gally
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
6
Lagerstrom KM, Hadly EA. Under-Appreciated Phylogroup Diversity of Escherichia coli within and between Animals at the Urban-Wildland Interface. Appl Environ Microbiol 2023:e0014223. [PMID: 37191541 DOI: 10.1128/aem.00142-23] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 05/17/2023] Open
Abstract
Wild animals have been implicated as reservoirs and even "melting pots" of pathogenic and antimicrobial-resistant bacteria of concern to human health. Though Escherichia coli is common among vertebrate guts and plays a role in the propagation of such genetic information, few studies have explored its diversity beyond humans nor the ecological factors that influence its diversity and distribution in wild animals. We characterized an average of 20 E. coli isolates per scat sample (n = 84) from a community of 14 wild and 3 domestic species. The phylogeny of E. coli comprises 8 phylogroups that are differentially associated with pathogenicity and antibiotic resistance, and we uncovered all of them in one small biological preserve surrounded by intense human activity. Challenging previous assumptions that a single isolate is representative of within-host phylogroup diversity, 57% of individual animals sampled carried multiple phylogroups simultaneously. Host species' phylogroup richness saturated at different levels across species and encapsulated vast within-sample and within-species variation, indicating that distribution patterns are influenced both by isolation source and laboratory sampling depth. Using ecological methods that ensure statistical relevance, we identify trends in phylogroup prevalence associated with host and environmental factors. The vast genetic diversity and broad distribution of E. coli in wildlife populations has implications for biodiversity conservation, agriculture, and public health, as well as for gauging unknown risks at the urban-wildland interface. We propose critical directions for future studies of the "wild side" of E. coli that will expand our understanding of its ecology and evolution beyond the human environment. IMPORTANCE To our knowledge, neither the phylogroup diversity of E. coli within individual wild animals nor that within an interacting multispecies community have previously been assessed. 
In doing so, we uncovered the globally known phylogroup diversity from an animal community on a preserve imbedded in a human-dominated landscape. We revealed that the phylogroup composition in domestic animals differed greatly from that in their wild counterparts, implying potential human impacts on the domestic animal gut. Significantly, many wild individuals hosted multiple phylogroups simultaneously, indicating the potential for strain-mixing and zoonotic spillback, especially as human encroachment into wildlands increases in the Anthropocene. We reason that due to extensive anthropogenic environmental contamination, wildlife is increasingly exposed to our waste, including E. coli and antibiotics. The gaps in the ecological and evolutionary understanding of E. coli thus necessitate a significant uptick in research to better understand human impacts on wildlife and the risk for zoonotic pathogen emergence.
Affiliation(s)
- Elizabeth A Hadly
  - Department of Biology, Stanford University, Stanford, California, USA
  - Jasper Ridge Biological Preserve, Stanford University, Stanford, California, USA
  - Center for Innovation in Global Health, Stanford University, Stanford, California, USA
7
Wainaina L, Merlotti A, Remondini D, Henri C, Hald T, Njage PMK. Source Attribution of Human Campylobacteriosis Using Whole-Genome Sequencing Data and Network Analysis. Pathogens 2022; 11:645. [PMID: 35745499 PMCID: PMC9229307 DOI: 10.3390/pathogens11060645] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/01/2022] [Revised: 05/26/2022] [Accepted: 05/28/2022] [Indexed: 02/04/2023] Open
Abstract
Campylobacter spp. are a leading and increasing cause of gastrointestinal infections worldwide. Source attribution, which apportions human infection cases to different animal species and food reservoirs, has been instrumental in control- and evidence-based intervention efforts. The rapid increase in whole-genome sequencing data provides an opportunity for higher-resolution source attribution models. Important challenges, including the high dimension and complex structure of WGS data, have inspired concerted research efforts to develop new models. We propose network analysis models as an accurate, high-resolution source attribution approach for the sources of human campylobacteriosis. A weighted network analysis approach was used in this study for source attribution comparing different WGS data inputs. The compared model inputs consisted of cgMLST and wgMLST distance matrices from 717 human and 717 animal isolates from cattle, chickens, dogs, ducks, pigs and turkeys. SNP distance matrices from 720 human and 720 animal isolates were also used. The data were collected from 2015 to 2017 in Denmark, with the animal sources consisting of domestic and imports from 7 European countries. Clusters consisted of network nodes representing respective genomes and links representing distances between genomes. Based on the results, animal sources were the main driving factor for cluster formation, followed by type of species and sampling year. The coherence source clustering (CSC) values based on animal sources were 78%, 81% and 78% for cgMLST, wgMLST and SNP, respectively. The CSC values based on Campylobacter species were 78%, 79% and 69% for cgMLST, wgMLST and SNP, respectively. Including human isolates in the network resulted in 88%, 77% and 88% of the total human isolates being clustered with the different animal sources for cgMLST, wgMLST and SNP, respectively. Between 12% and 23% of human isolates were not attributed to any animal source. 
Most of the human genomes were attributed to chickens from Denmark, with an average attribution percentage of 52.8%, 52.2% and 51.2% for cgMLST, wgMLST and SNP distance matrices respectively, while ducks from Denmark showed the least attribution of 0% for all three distance matrices. The best-performing model was the one using wgMLST distance matrix as input data, which had a CSC value of 81%. Results from our study show that the weighted network-based approach for source attribution is reliable and can be used as an alternative method for source attribution considering the high performance of the model. The model is also robust across the different Campylobacter species, animal sources and WGS data types used as input.
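The abstract above describes clusters built from a network in which nodes are genomes and links are distances between genomes. The sketch below is a deliberately simplified stand-in for that idea and not the authors' weighted network method. It keeps only links below a distance threshold, takes connected components as clusters, and assigns each human isolate to the most frequent animal source in its cluster. All isolate names, distances, the threshold and the function names are hypothetical.

```python
from collections import Counter


def connected_components(nodes, edges):
    """Group nodes into connected components using a small union-find."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent links to the root, shortening the path on the way.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in nodes:
        groups.setdefault(find(node), []).append(node)
    return list(groups.values())


def attribute_humans(distances, sources, threshold):
    """Map each human isolate to the commonest animal source in its cluster.

    distances: {(isolate_a, isolate_b): genomic distance}
    sources:   {isolate: animal source name, or "human"}
    Humans in a cluster without animal isolates are mapped to None.
    Ties between sources are broken by first occurrence.
    """
    edges = [pair for pair, dist in distances.items() if dist <= threshold]
    attribution = {}
    for component in connected_components(list(sources), edges):
        animal_counts = Counter(
            sources[n] for n in component if sources[n] != "human")
        top = animal_counts.most_common(1)[0][0] if animal_counts else None
        for n in component:
            if sources[n] == "human":
                attribution[n] = top
    return attribution


# Hypothetical isolates and pairwise distances.
sources = {"H1": "human", "H2": "human",
           "C1": "chicken", "C2": "chicken", "P1": "pig"}
distances = {("H1", "C1"): 3, ("C1", "C2"): 5, ("H1", "P1"): 40,
             ("H2", "P1"): 60, ("C2", "P1"): 55}
attribution = attribute_humans(distances, sources, threshold=10)
```

In this toy run H1 clusters with the two chicken isolates, while H2 has no link under the threshold and stays unattributed, mirroring the abstract's observation that a share of human isolates was not attributed to any animal source.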
Affiliation(s)
- Lynda Wainaina
  - Department of Mathematics, University of Padova, 35121 Padova, Italy
- Alessandra Merlotti
  - Department of Physics and Astronomy, University of Bologna, 40126 Bologna, Italy
- Daniel Remondini
  - Department of Physics and Astronomy, University of Bologna, 40126 Bologna, Italy
- Clementine Henri
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Tine Hald
  - Research Group for Foodborne Pathogens and Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
- Patrick Murigu Kamau Njage
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, 2800 Kgs. Lyngby, Denmark
  - Correspondence:
8
Stevens EL, Carleton HA, Beal J, Tillman GE, Lindsey RL, Lauer AC, Pightling A, Jarvis KG, Ottesen A, Ramachandran P, Hintz L, Katz LS, Folster JP, Whichard JM, Trees E, Timme RE, McDermott P, Wolpert B, Bazaco M, Zhao S, Lindley S, Bruce BB, Griffin PM, Brown E, Allard M, Tallent S, Irvin K, Hoffmann M, Wise M, Tauxe R, Gerner-Smidt P, Simmons M, Kissler B, Defibaugh-Chavez S, Klimke W, Agarwala R, Lindsay J, Cook K, Austerman SR, Goldman D, McGarry S, Hale KR, Dessai U, Musser SM, Braden C. Use of Whole Genome Sequencing by the Federal Interagency Collaboration for Genomics for Food and Feed Safety in the United States. J Food Prot 2022; 85:755-772. [PMID: 35259246 DOI: 10.4315/jfp-21-437] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 12/08/2021] [Accepted: 02/22/2022] [Indexed: 11/11/2022]
Abstract
This multiagency report developed by the Interagency Collaboration for Genomics for Food and Feed Safety provides an overview of the use of and transition to whole genome sequencing (WGS) technology for detection and characterization of pathogens transmitted commonly by food and for identification of their sources. We describe foodborne pathogen analysis, investigation, and harmonization efforts among the following federal agencies: National Institutes of Health; Department of Health and Human Services, Centers for Disease Control and Prevention (CDC) and U.S. Food and Drug Administration (FDA); and the U.S. Department of Agriculture, Food Safety and Inspection Service, Agricultural Research Service, and Animal and Plant Health Inspection Service. We describe single nucleotide polymorphism, core-genome, and whole genome multilocus sequence typing data analysis methods as used in the PulseNet (CDC) and GenomeTrakr (FDA) networks, underscoring the complementary nature of the results for linking genetically related foodborne pathogens during outbreak investigations while allowing flexibility to meet the specific needs of Interagency Collaboration partners. We highlight how we apply WGS to pathogen characterization (virulence and antimicrobial resistance profiles) and source attribution efforts and increase transparency by making the sequences and other data publicly available through the National Center for Biotechnology Information. We also highlight the impact of current trends in the use of culture-independent diagnostic tests for human diagnostic testing on analytical approaches related to food safety and what is next for the use of WGS in the area of food safety.
Affiliation(s)
- Eric L Stevens
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Heather A Carleton
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
| | - Jennifer Beal
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Glenn E Tillman
- U.S. Department of Agriculture, Food Safety and Inspection Service, Washington, DC 20250
| | - Rebecca L Lindsey
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
| | - A C Lauer
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
| | - Arthur Pightling
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Karen G Jarvis
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Andrea Ottesen
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Padmini Ramachandran
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Leslie Hintz
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
| | - Lee S Katz
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Jason P Folster
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Jean M Whichard
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Eija Trees
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Ruth E Timme
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Patrick McDermott
- U.S. Food and Drug Administration, Center for Veterinary Medicine, Laurel, Maryland 20708
- Beverly Wolpert
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Michael Bazaco
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Shaohua Zhao
- U.S. Food and Drug Administration, Center for Veterinary Medicine, Laurel, Maryland 20708
- Sabina Lindley
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Beau B Bruce
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Patricia M Griffin
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Eric Brown
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Marc Allard
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Sandra Tallent
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Kari Irvin
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Maria Hoffmann
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Matt Wise
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Robert Tauxe
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Peter Gerner-Smidt
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Mustafa Simmons
- U.S. Department of Agriculture, Food Safety and Inspection Service, Washington, DC 20250
- Bonnie Kissler
- U.S. Department of Agriculture, Food Safety and Inspection Service, Washington, DC 20250
- William Klimke
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- Richa Agarwala
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894
- James Lindsay
- U.S. Department of Agriculture, Agricultural Research Service, Beltsville, Maryland 20705
- Kimberly Cook
- U.S. Department of Agriculture, Agricultural Research Service, Beltsville, Maryland 20705
- Suelee Robbe Austerman
- U.S. Department of Agriculture, Animal and Plant Health Inspection Service, Ames, Iowa 50010, USA
- David Goldman
- U.S. Department of Agriculture, Food Safety and Inspection Service, Washington, DC 20250
- Sherri McGarry
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
- Kis Robertson Hale
- U.S. Department of Agriculture, Food Safety and Inspection Service, Washington, DC 20250
- Uday Dessai
- U.S. Department of Agriculture, Food Safety and Inspection Service, Washington, DC 20250
- Steven M Musser
- U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, Maryland 20740
- Chris Braden
- Centers for Disease Control and Prevention, Division of Foodborne, Waterborne and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Atlanta, Georgia 30329
9
Salmonella enterica serovar Typhimurium from Wild Birds in the United States Represent Distinct Lineages Defined by Bird Type. Appl Environ Microbiol 2022; 88:e0197921. [PMID: 35108089 PMCID: PMC8939312 DOI: 10.1128/aem.01979-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022] Open
Abstract
Salmonella enterica serovar Typhimurium is typically considered a host generalist; however, certain isolates are associated with specific hosts and show genetic features of host adaptation. Here, we sequenced 131 S. Typhimurium isolates from wild birds collected in 30 U.S. states during 1978–2019. We found that isolates from broad taxonomic host groups including passerine birds, water birds (Aequornithes), and larids (gulls and terns) represented three distinct lineages and certain S. Typhimurium CRISPR types presented in individual lineages. We also showed that lineages formed by wild bird isolates differed from most isolates originating from domestic animal sources, and that genomes from these lineages substantially improved source attribution of Typhimurium genomes to wild birds by a machine learning classifier. Furthermore, virulence gene signatures that differentiated S. Typhimurium from passerines, water birds, and larids were detected. Passerine isolates tended to lack S. Typhimurium-specific virulence plasmids. Isolates from the passerine, water bird, and larid lineages had close genetic relatedness with human clinical isolates, including those from a 2021 U.S. outbreak linked to passerine birds. These observations indicate that S. Typhimurium from wild birds in the United States are likely host-adapted, and the representative genomic data set examined in this study can improve source prediction and facilitate outbreak investigation.
IMPORTANCE Within-host evolution of S. Typhimurium may lead to pathovars adapted to specific hosts. Here, we report the emergence of disparate avian S. Typhimurium lineages with distinct virulence gene signatures. The findings highlight the importance of wild birds as a reservoir for S. Typhimurium and contribute to our understanding of the genetic diversity of S. Typhimurium from wild birds. Our study indicates that S. Typhimurium may have undergone adaptive evolution within wild birds in the United States. The representative S. Typhimurium genomes from wild birds, together with the virulence gene signatures identified in these bird isolates, are valuable for S. Typhimurium source attribution and epidemiological surveillance.
10
Phylogeny and potential virulence of cryptic clade Escherichia coli species complex isolates derived from an arable field trial. Current Research in Microbial Sciences 2022; 3:100093. [PMID: 35005658 PMCID: PMC8718834 DOI: 10.1016/j.crmicr.2021.100093] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/22/2021] [Revised: 12/08/2021] [Accepted: 12/09/2021] [Indexed: 11/22/2022] Open
Abstract
Escherichia coli taxonomy has expanded into a species complex with the identification of divergent cryptic clades. A key question is the evolutionary trajectory of these clades and their relationship to isolates of clinical or veterinary importance. Since they have some environmental association, we screened a collection of E. coli isolated from a long-term spring barley field trial for their presence. While most isolates clustered into the enteric clade, four of them clustered into Clade-V, and one into Clade-IV. The Clade-V isolates shared >96% intra-clade average nucleotide sequence identity but <91% with other clades. Although pan-genomics analysis confirmed their taxonomy as Clade-V (E. marmotae), retrospective phylogroup PCR did not discriminate them correctly. Differences in metabolic and adherence gene alleles occurred in the Clade-V isolates compared to E. coli sensu stricto. They also encoded the bacteriophage-associated cyto-lethal distending toxin (CDT) and antimicrobial resistance (AMR) genes, including an ESBL, blaOXA-453. Thus, the isolate collection encompassed genetic diversity and included cryptic clade isolates that encode potential virulence factors. The analysis has determined the phylogenetic relationship of cryptic clade isolates with E. coli sensu stricto and indicates a potential for horizontal transfer of virulence factors.
11
Wang X, Bouzembrak Y, Lansink AO, van der Fels-Klerx HJ. Application of machine learning to the monitoring and prediction of food safety: A review. Compr Rev Food Sci Food Saf 2021; 21:416-434. [PMID: 34907645 DOI: 10.1111/1541-4337.12868] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 06/23/2021] [Revised: 10/15/2021] [Accepted: 10/21/2021] [Indexed: 12/13/2022]
Abstract
Machine learning (ML) has proven to be a useful technology for data analysis and modeling in a wide variety of domains, including food science and engineering. The use of ML models for the monitoring and prediction of food safety has grown in recent years. Several studies have already reviewed ML applications to foodborne disease and deep learning applications to food. This article presents a literature review on ML applications for monitoring and predicting food safety. The paper summarizes and categorizes ML applications in this domain, categorizes and discusses data types used for ML modeling, and provides suggestions for data sources and input variables for future ML applications. The review is based on three scientific literature databases: Scopus, CAB Abstracts, and IEEE. It includes studies that were published in English in the period from January 1, 2011 to April 1, 2021. Results show that most studies applied Bayesian networks, neural networks, or support vector machines. Across the ML models reviewed, all relevant studies reported high prediction accuracy in validation. Based on the ML applications, this article identifies several avenues for future studies applying ML models for the monitoring and prediction of food safety, in addition to providing suggestions for data sources and input variables.
Affiliation(s)
- Xinxin Wang
- Business Economics, Wageningen University & Research, Wageningen, The Netherlands
- Yamine Bouzembrak
- Wageningen Food Safety Research, Wageningen University & Research, Wageningen, The Netherlands
- Agjm Oude Lansink
- Business Economics, Wageningen University & Research, Wageningen, The Netherlands
- H J van der Fels-Klerx
- Business Economics, Wageningen University & Research, Wageningen, The Netherlands
- Wageningen Food Safety Research, Wageningen University & Research, Wageningen, The Netherlands
12
Holden N. Genomic data of an environmental Escherichia coli isolate shows high resemblance to E. coli K-12 reference strain MG1655. Data Brief 2021; 39:107586. [PMID: 34849384 PMCID: PMC8609132 DOI: 10.1016/j.dib.2021.107586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2021] [Revised: 10/28/2021] [Accepted: 11/12/2021] [Indexed: 11/30/2022] Open
Abstract
The species Escherichia coli exhibits high genomic diversification arising from evolution, mobile genetic elements and recombination. An environmental E. coli isolate, 'JHI_5025', from a crop trial appeared to be clonally related to the historical reference isolate E. coli K-12 strain 'MG1655', warranting further genomic analysis. Their genomes share an average nucleotide identity of 99.74%, and whole genome alignment showed little rearrangement of the JHI_5025 sequence compared to the reference. Five genomic islands not in the reference aligned to other sequences in the Enterobacteriaceae. Isolate JHI_5025 contained E. coli K-12 F plasmid sequence and at least one complete prophage sequence. The genome and comparison dataset support the use of E. coli JHI_5025 as a representative contemporary genetic mimic of a well-known and much-used workhorse strain.
Affiliation(s)
- Nicola Holden
- SRUC, Department of Rural Land Use, Craibstone Estate, Aberdeen AB21 9YA, UK
- Cell and Molecular Sciences, James Hutton Institute, Dundee DD2 5DA, UK
13
Carlson CJ, Farrell MJ, Grange Z, Han BA, Mollentze N, Phelan AL, Rasmussen AL, Albery GF, Bett B, Brett-Major DM, Cohen LE, Dallas T, Eskew EA, Fagre AC, Forbes KM, Gibb R, Halabi S, Hammer CC, Katz R, Kindrachuk J, Muylaert RL, Nutter FB, Ogola J, Olival KJ, Rourke M, Ryan SJ, Ross N, Seifert SN, Sironen T, Standley CJ, Taylor K, Venter M, Webala PW. The future of zoonotic risk prediction. Philos Trans R Soc Lond B Biol Sci 2021; 376:20200358. [PMID: 34538140 PMCID: PMC8450624 DOI: 10.1098/rstb.2020.0358] [Citation(s) in RCA: 34] [Impact Index Per Article: 11.3] [Accepted: 07/15/2021] [Indexed: 01/26/2023] Open
Abstract
In the light of the urgency raised by the COVID-19 pandemic, global investment in wildlife virology is likely to increase, and new surveillance programmes will identify hundreds of novel viruses that might someday pose a threat to humans. To support the extensive task of laboratory characterization, scientists may increasingly rely on data-driven rubrics or machine learning models that learn from known zoonoses to identify which animal pathogens could someday pose a threat to global health. We synthesize the findings of an interdisciplinary workshop on zoonotic risk technologies to answer the following questions. What are the prerequisites, in terms of open data, equity and interdisciplinary collaboration, to the development and application of those tools? What effect could the technology have on global health? Who would control that technology, who would have access to it and who would benefit from it? Would it improve pandemic prevention? Could it create new challenges? This article is part of the theme issue 'Infectious disease macroecology: parasite diversity and dynamics across the globe'.
Affiliation(s)
- Colin J. Carlson
- Center for Global Health Science and Security, Georgetown University Medical Center, Washington, DC 20007, USA
- Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, DC 20007, USA
- Maxwell J. Farrell
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Zoe Grange
- Public Health Scotland, Glasgow G2 6QE, UK
- Barbara A. Han
- Cary Institute of Ecosystem Studies, Millbrook, NY 12545, USA
- Nardus Mollentze
- Medical Research Council, University of Glasgow Centre for Virus Research, Glasgow G61 1QH, UK
- Institute of Biodiversity, Animal Health and Comparative Medicine, College of Medical, Veterinary and Life Sciences, University of Glasgow, Glasgow G12 8QQ, UK
- Alexandra L. Phelan
- Center for Global Health Science and Security, Georgetown University Medical Center, Washington, DC 20007, USA
- O'Neill Institute for National and Global Health Law, Georgetown University Law Center, Washington, DC 20001, USA
- Angela L. Rasmussen
- Center for Global Health Science and Security, Georgetown University Medical Center, Washington, DC 20007, USA
- Gregory F. Albery
- Department of Biology, Georgetown University, Washington, DC 20007, USA
- Bernard Bett
- Animal and Human Health Program, International Livestock Research Institute, PO Box 30709-00100, Nairobi, Kenya
- David M. Brett-Major
- Department of Epidemiology, College of Public Health, University of Nebraska Medical Center, Omaha, NE, USA
- Lily E. Cohen
- Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Tad Dallas
- Department of Biological Sciences, Louisiana State University, Baton Rouge, LA 70806, USA
- Evan A. Eskew
- Department of Biology, Pacific Lutheran University, Tacoma, WA, USA
- Anna C. Fagre
- Department of Microbiology, Immunology, and Pathology, College of Veterinary Medicine and Biomedical Sciences, Colorado State University, Fort Collins, CO, USA
- Kristian M. Forbes
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR 72701, USA
- Rory Gibb
- Centre on Climate Change and Planetary Health, London School of Hygiene and Tropical Medicine, London, UK
- Centre for Mathematical Modelling of Infectious Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Sam Halabi
- O'Neill Institute for National and Global Health Law, Georgetown University Law Center, Washington, DC 20001, USA
- Charlotte C. Hammer
- Centre for the Study of Existential Risk, University of Cambridge, Cambridge, UK
- Rebecca Katz
- Center for Global Health Science and Security, Georgetown University Medical Center, Washington, DC 20007, USA
- Jason Kindrachuk
- Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, Manitoba, Canada R3E 0J9
- Renata L. Muylaert
- Molecular Epidemiology and Public Health Laboratory, Hopkirk Research Institute, Massey University, Palmerston North, New Zealand
- Felicia B. Nutter
- Department of Infectious Disease and Global Health, Cummings School of Veterinary Medicine, Tufts University, North Grafton, MA 01536, USA
- Department of Public Health and Community Medicine, School of Medicine, Tufts University, Boston, MA 02111, USA
- Michelle Rourke
- Law Futures Centre, Griffith Law School, Griffith University, Nathan, Queensland 4111, Australia
- Sadie J. Ryan
- Department of Geography and Emerging Pathogens Institute, University of Florida, Gainesville, FL, USA
- School of Life Sciences, University of KwaZulu-Natal, Durban, South Africa
- Noam Ross
- EcoHealth Alliance, New York, NY 10018, USA
- Stephanie N. Seifert
- Paul G. Allen School for Global Health, Washington State University, Pullman, WA, USA
- Tarja Sironen
- Department of Virology, University of Helsinki, Helsinki, Finland
- Department of Veterinary Biosciences, University of Helsinki, Helsinki, Finland
- Claire J. Standley
- Center for Global Health Science and Security, Georgetown University Medical Center, Washington, DC 20007, USA
- Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, DC 20007, USA
- Kishana Taylor
- Department of Chemical Engineering, Carnegie Mellon University, Pittsburgh, PA 15213, USA
- Marietjie Venter
- Zoonotic Arbo and Respiratory Virus Program, Centre for Viral Zoonoses, Department of Medical Virology, University of Pretoria, Pretoria, South Africa
- Paul W. Webala
- Department of Forestry and Wildlife Management, Maasai Mara University, Narok 20500, Kenya
14
VanOeffelen M, Nguyen M, Aytan-Aktug D, Brettin T, Dietrich EM, Kenyon RW, Machi D, Mao C, Olson R, Pusch GD, Shukla M, Stevens R, Vonstein V, Warren AS, Wattam AR, Yoo H, Davis JJ. A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes. Brief Bioinform 2021; 22:bbab313. [PMID: 34379107 PMCID: PMC8575023 DOI: 10.1093/bib/bbab313] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/27/2021] [Revised: 06/18/2021] [Accepted: 07/20/2021] [Indexed: 11/14/2022] Open
Abstract
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. Besides supporting efforts to track the evolution and spread of AMR, the data serve as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.
Affiliation(s)
- Marcus Nguyen
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Thomas Brettin
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Emily M Dietrich
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Ronald W Kenyon
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Dustin Machi
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Chunhong Mao
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Robert Olson
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Gordon D Pusch
- Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Maulik Shukla
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Rick Stevens
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Department of Computer Science, University of Chicago, Chicago, IL, USA
- Andrew S Warren
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Alice R Wattam
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Hyunseung Yoo
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- James J Davis
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, IL, USA
15
Arning N, Sheppard SK, Bayliss S, Clifton DA, Wilson DJ. Machine learning to predict the source of campylobacteriosis using whole genome data. PLoS Genet 2021; 17:e1009436. [PMID: 34662334 PMCID: PMC8553134 DOI: 10.1371/journal.pgen.1009436] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 02/20/2021] [Revised: 10/28/2021] [Accepted: 08/26/2021] [Indexed: 11/18/2022] Open
Abstract
Campylobacteriosis is among the world's most common foodborne illnesses, caused predominantly by the bacterium Campylobacter jejuni. Effective interventions require determination of the infection source, which is challenging as transmission occurs via multiple sources such as contaminated meat, poultry, and drinking water. Strain variation has allowed source tracking based upon allelic variation in multi-locus sequence typing (MLST) genes, allowing isolates from infected individuals to be attributed to specific animal or environmental reservoirs. However, the accuracy of probabilistic attribution models has been limited by the ability to differentiate isolates based upon just 7 MLST genes. Here, we broaden the input data spectrum to include core genome MLST (cgMLST) and whole genome sequences (WGS), and implement multiple machine learning algorithms, allowing more accurate source attribution. We increase attribution accuracy from 64% using the standard iSource population genetic approach to 71% for MLST, 85% for cgMLST and 78% for kmerized WGS data using the classifier we named aiSource. To gain insight beyond the source model prediction, we use Bayesian inference to analyse the relative affinity of C. jejuni strains to infect humans and identify potential differences in source-human transmission ability among clonally related isolates in the most common disease-causing lineage (ST-21 clonal complex). Providing generalizable, computationally efficient methods based upon machine learning and population genetics, we provide a scalable approach to global disease surveillance that can continuously incorporate novel samples for source attribution and identify fine-scale variation in transmission potential.
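As a rough illustration of what attribution from allelic profiles involves, the sketch below assigns an isolate to the majority source among its nearest reference isolates, measured by the number of differing loci. This is a deliberately simple k-nearest-neighbour stand-in and not the aiSource classifier described in the abstract; the allele numbers, source labels, and function names are all hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of loci at which two allelic profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def attribute(profile, reference, k=3):
    """Return the majority source among the k reference isolates nearest to `profile`."""
    nearest = sorted(reference, key=lambda item: hamming(profile, item[0]))[:k]
    votes = Counter(source for _, source in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical allele numbers at five loci, each paired with a known source.
reference = [
    ((1, 4, 2, 7, 3), "chicken"),
    ((1, 4, 2, 7, 5), "chicken"),
    ((1, 4, 9, 7, 3), "chicken"),
    ((6, 2, 2, 1, 8), "cattle"),
    ((6, 2, 5, 1, 8), "cattle"),
    ((6, 3, 2, 1, 8), "cattle"),
]

# A hypothetical human clinical isolate whose source is to be predicted.
human_isolate = (1, 4, 2, 7, 8)
predicted = attribute(human_isolate, reference)
print(predicted)
```

Real cgMLST schemes span far more loci than this toy profile, so a practical classifier would also need held-out validation of the kind the abstract reports.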
Affiliation(s)
- Nicolas Arning
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
- Samuel K. Sheppard
- The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- Sion Bayliss
- The Milner Centre of Evolution, Department of Biology & Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
- David A. Clifton
- Department of Engineering Science, University of Oxford, Oxford, UK
- Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Daniel J. Wilson
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Li Ka Shing Centre for Health Information and Discovery, Old Road Campus, Oxford, United Kingdom
16
Mehat JW, van Vliet AHM, La Ragione RM. The Avian Pathogenic Escherichia coli (APEC) pathotype is comprised of multiple distinct, independent genotypes. Avian Pathol 2021; 50:402-416. [PMID: 34047644 DOI: 10.1080/03079457.2021.1915960] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 10/21/2022]
Abstract
Avian Pathogenic E. coli (APEC) is the causative agent of avian colibacillosis, which results in economic losses to the poultry industry through morbidity, mortality and carcass condemnation, and impacts the welfare of poultry. Colibacillosis remains a complex disease to manage, hampered by diagnostic and classification strategies for E. coli that are inadequate for defining APEC. However, increased accessibility of whole genome sequencing (WGS) technology has enabled phylogenetic approaches to be applied to the classification of E. coli and genomic characterization of the most common APEC serotypes associated with colibacillosis: O1, O2 and O78. These approaches have demonstrated that the O78 serotype is representative of two distinct APEC lineages, ST-23 in phylogroup C and ST-117 in phylogroup G. The O1 and O2 serotypes belong to a third lineage comprised of three sub-populations in phylogroup B2: ST-95, ST-140 and ST-428/ST-429. The frequency with which these genotypes are associated with colibacillosis implicates them as the predominant APEC populations and distinct from those causing incidental or opportunistic infections. The fact that these are disparate clusters from multiple phylogroups suggests that these lineages may have become adapted to the poultry niche independently. WGS studies have highlighted the limitations of traditional APEC classification and can now provide a path towards a robust and more meaningful definition of the APEC pathotype. Future studies should focus on characterizing individual APEC populations in detail and using this information to develop improved diagnostics and interventions.
Affiliation(s)
- Jai W Mehat
- Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
- Arnoud H M van Vliet
- Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
- Roberto M La Ragione
- Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
17
Allen JP, Snitkin E, Pincus NB, Hauser AR. Forest and Trees: Exploring Bacterial Virulence with Genome-wide Association Studies and Machine Learning. Trends Microbiol 2021; 29:621-633. [PMID: 33455849 PMCID: PMC8187264 DOI: 10.1016/j.tim.2020.12.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/28/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 12/15/2022]
Abstract
The advent of inexpensive and rapid sequencing technologies has allowed bacterial whole-genome sequences to be generated at an unprecedented pace. This wealth of information has revealed an unanticipated degree of strain-to-strain genetic diversity within many bacterial species. Awareness of this genetic heterogeneity has corresponded with a greater appreciation of intraspecies variation in virulence. A number of comparative genomic strategies have been developed to link these genotypic and pathogenic differences with the aim of discovering novel virulence factors. Here, we review recent advances in comparative genomic approaches to identify bacterial virulence determinants, with a focus on genome-wide association studies and machine learning.
Affiliation(s)
- Jonathan P Allen
- Department of Microbiology and Immunology, Loyola University Chicago Stritch School of Medicine, Maywood, IL 60153, USA
- Evan Snitkin
- Department of Microbiology and Immunology, Department of Internal Medicine/Division of Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, USA
- Nathan B Pincus
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Alan R Hauser
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
- Department of Medicine/Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
18
Im H, Hwang SH, Kim BS, Choi SH. Pathogenic potential assessment of the Shiga toxin-producing Escherichia coli by a source attribution-considered machine learning model. Proc Natl Acad Sci U S A 2021; 118:e2018877118. [PMID: 33986113 PMCID: PMC8157976 DOI: 10.1073/pnas.2018877118] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/18/2022] Open
Abstract
Methods have been developed to evaluate the pathogenic potential of newly emerging pathogens as alternatives to conventional serotyping and virulence gene combination methods. Among them, machine learning (ML)-based methods using whole-genome sequencing (WGS) data are attracting attention because of recent advances in ML algorithms and sequencing technologies. Here, we developed various ML models to predict the pathogenicity of Shiga toxin-producing Escherichia coli (STEC) isolates using their WGS data. The input dataset for the ML models was generated using distinct gene repertoires from positive (pathogenic) and negative (nonpathogenic) control groups in which each STEC isolate was designated based on the source attribution, the relative risk potential of the isolation sources. Among the various ML models examined, a model using the support vector machine (SVM) algorithm, the SVM model, discriminated between the two control groups most accurately. The SVM model successfully predicted the pathogenicity of the isolates from the major sources of STEC outbreaks, the isolates with a history of outbreaks, and the isolates that cannot be assessed by conventional methods. Furthermore, the SVM model effectively differentiated the pathogenic potentials of the isolates at a finer resolution. Permutation importance analyses of the input dataset further revealed the genes important for the estimation, suggesting genes potentially essential for the pathogenicity of STEC. Altogether, these results suggest that the SVM model is a more reliable and broadly applicable method to evaluate the pathogenic potential of STEC isolates compared with conventional methods.
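For readers unfamiliar with the permutation importance analysis mentioned in the abstract, the following is a minimal sketch of the general technique: shuffle one feature column at a time and record how much a fitted model's accuracy drops. It is not the authors' pipeline. The gene presence/absence table, the labels, and the rule-based `toy_model` (standing in for a trained SVM) are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this one column replaced.
            permuted = [r[:col] + (v,) + r[col + 1:] for r, v in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical presence/absence of two genes (gene_a, gene_b) per isolate,
# with labels 1 = pathogenic control group, 0 = nonpathogenic control group.
rows = [(1, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 1)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def toy_model(row):
    """Stand-in classifier: predicts pathogenic whenever gene_a is present."""
    return row[0]


importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

In this toy setup the model ignores gene_b, so shuffling that column never changes a prediction and its importance is zero, whereas shuffling gene_a degrades accuracy.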
Affiliation(s)
- Hanhyeok Im
  - National Research Laboratory of Molecular Microbiology and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Center for Food Safety and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea
- Seung-Ho Hwang
  - National Research Laboratory of Molecular Microbiology and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Center for Food Safety and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea
- Byoung Sik Kim
  - Department of Food Science and Engineering, Ewha Womans University, 03760 Seoul, Republic of Korea
- Sang Ho Choi
  - National Research Laboratory of Molecular Microbiology and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea
  - Department of Agricultural Biotechnology and Center for Food Safety and Toxicology, Seoul National University, 08826 Seoul, Republic of Korea
  - Center for Food and Bioconvergence, Seoul National University, 08826 Seoul, Republic of Korea
|
19
|
Arnold M, Smith RP, Tang Y, Guzinski J, Petrovska L. Bayesian Source Attribution of Salmonella Typhimurium Isolates From Human Patients and Farm Animals in England and Wales. Front Microbiol 2021; 12:579888. [PMID: 33584605 PMCID: PMC7876086 DOI: 10.3389/fmicb.2021.579888] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/03/2020] [Accepted: 01/07/2021] [Indexed: 12/13/2022] Open
Abstract
The purpose of the study was to apply a Bayesian source attribution model to England and Wales based data on Salmonella Typhimurium (ST) and monophasic variants (MST), using different subtyping approaches based on sequence data. The data consisted of laboratory confirmed human cases and mainly livestock samples collected from surveillance or monitoring schemes. Three different subtyping methods were used, 7-loci Multi-Locus Sequence Typing (MLST), Core-genome MLST, and Single Nucleotide Polymorphism distance, with the impact of varying the genetic distance over which isolates would be grouped together being varied for the latter two approaches. A Bayesian frequency matching method, known as the modified Hald method, was applied to the data from each of the subtyping approaches. Pigs were found to be the main contributor to human infection for ST/MST, with approximately 60% of human cases attributed to them, followed by other mammals (mostly horses) and cattle. It was found that the use of different clustering methods based on sequence data had minimal impact on the estimates of source attribution. However, there was an impact of genetic distance over which isolates were grouped: grouping isolates which were relatively closely related increased uncertainty but tended to have a better model fit.
Affiliation(s)
- Mark Arnold
  - Department of Epidemiological Sciences, Animal and Plant Health Agency (APHA), Addlestone, United Kingdom
- Richard Piers Smith
  - Department of Epidemiological Sciences, Animal and Plant Health Agency (APHA), Addlestone, United Kingdom
- Yue Tang
  - Department of Bacteriology, Animal and Plant Health Agency (APHA), Addlestone, United Kingdom
- Jaromir Guzinski
  - Department of Bacteriology, Animal and Plant Health Agency (APHA), Addlestone, United Kingdom
- Liljana Petrovska
  - Department of Bacteriology, Animal and Plant Health Agency (APHA), Addlestone, United Kingdom
|
20
|
Munck N, Njage PMK, Leekitcharoenphon P, Litrup E, Hald T. Application of Whole-Genome Sequences and Machine Learning in Source Attribution of Salmonella Typhimurium. Risk Anal 2020; 40:1693-1705. [PMID: 32515055 PMCID: PMC7540586 DOI: 10.1111/risa.13510] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Received: 05/24/2019] [Revised: 05/01/2020] [Accepted: 05/04/2020] [Indexed: 06/11/2023]
Abstract
Prevention of the emergence and spread of foodborne diseases is an important prerequisite for the improvement of public health. Source attribution models link sporadic human cases of a specific illness to food sources and animal reservoirs. With the next generation sequencing technology, it is possible to develop novel source attribution models. We investigated the potential of machine learning to predict the animal reservoir from which a bacterial strain isolated from a human salmonellosis case originated based on whole-genome sequencing. Machine learning methods recognize patterns in large and complex data sets and use this knowledge to build models. The model learns patterns associated with genetic variations in bacteria isolated from the different animal reservoirs. We selected different machine learning algorithms to predict sources of human salmonellosis cases and trained the model with Danish Salmonella Typhimurium isolates sampled from broilers (n = 34), cattle (n = 2), ducks (n = 11), layers (n = 4), and pigs (n = 159). Using cgMLST as input features, the model yielded an average accuracy of 0.783 (95% CI: 0.77-0.80) in the source prediction for the random forest and 0.933 (95% CI: 0.92-0.94) for the logit boost algorithm. Logit boost algorithm was most accurate (valid accuracy: 92%, CI: 0.8706-0.9579) and predicted the origin of 81% of the domestic sporadic human salmonellosis cases. The most important source was Danish produced pigs (53%) followed by imported pigs (16%), imported broilers (6%), imported ducks (2%), Danish produced layers (2%), Danish produced cattle and imported cattle (<1%) while 18% was not predicted. Machine learning has potential for improving source attribution modeling based on sequence data. Results of such models can inform risk managers to identify and prioritize food safety interventions.
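The training-set counts quoted in this abstract are heavily skewed towards pigs, which matters when reading the reported accuracies. As a rough worked example, the sketch below computes the accuracy of a trivial classifier that always predicts the largest class. It assumes, purely for illustration, that the evaluation data share the class proportions of these five Danish groups; the abstract does not state this, and the full training data may also include imported sources.

```python
# Training-set counts for the Danish isolates, as quoted in the abstract above.
train_counts = {"broilers": 34, "cattle": 2, "ducks": 11, "layers": 4, "pigs": 159}

total = sum(train_counts.values())

# A majority-class baseline always predicts the most frequent source.
majority_source, majority_n = max(train_counts.items(), key=lambda kv: kv[1])
baseline_accuracy = majority_n / total

print(f"{majority_source}: {majority_n}/{total} = {baseline_accuracy:.3f}")
# prints "pigs: 159/210 = 0.757"
```

Under that assumption, an accuracy near 0.76 is attainable without learning anything from the genomes, which gives a reference point for the 0.783 and 0.933 figures reported above.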
Affiliation(s)
- Nanna Munck
  - Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Patrick Murigu Kamau Njage
  - Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Pimlapas Leekitcharoenphon
  - Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Eva Litrup
  - Statens Serum Institute, Copenhagen, Denmark
- Tine Hald
  - Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
|
21
|
Lees JA, Mai TT, Galardini M, Wheeler NE, Horsfield ST, Parkhill J, Corander J. Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions. mBio 2020; 11:e01344-20. [PMID: 32636251 PMCID: PMC7343994 DOI: 10.1128/mbio.01344-20] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/22/2020] [Accepted: 06/05/2020] [Indexed: 12/19/2022] Open
Abstract
Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations creates challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially.
IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. 
These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
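For readers unfamiliar with the elastic net penalization named in this abstract, its standard form for a continuous phenotype can be written as below. The symbols are assumptions introduced here for illustration and are not taken from the paper: $y_i$ is the phenotype of isolate $i$, $x_i$ its vector of variant (for example unitig) presence/absence values, $\lambda \ge 0$ the overall penalty strength, and $\alpha \in [0, 1]$ the mixing weight between the lasso ($\alpha = 1$) and ridge ($\alpha = 0$) penalties. The paper's exact loss and parametrization may differ.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_i - \beta_0 - x_i^{\top} \beta \right)^2
    + \lambda \left( \alpha \lVert \beta \rVert_1
    + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right)
```

The $\ell_1$ term drives most coefficients to exactly zero, which is what yields a compact set of selected variants, while the $\ell_2$ term stabilizes the fit when variants are strongly correlated.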
Affiliation(s)
- John A Lees
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- T Tien Mai
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Marco Galardini
  - Biological Design Center, Boston University, Boston, Massachusetts, USA
- Nicole E Wheeler
  - Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Samuel T Horsfield
  - MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
- Julian Parkhill
  - Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
- Jukka Corander
  - Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
  - Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
  - Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
|
22
|
Guillier L, Gourmelon M, Lozach S, Cadel-Six S, Vignaud ML, Munck N, Hald T, Palma F. AB_SA: Accessory genes-Based Source Attribution - tracing the source of Salmonella enterica Typhimurium environmental strains. Microb Genom 2020; 6:mgen000366. [PMID: 32320376 PMCID: PMC7478624 DOI: 10.1099/mgen.0.000366] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/31/2019] [Accepted: 03/20/2020] [Indexed: 12/31/2022] Open
Abstract
The partitioning of pathogenic strains isolated in environmental or human cases to their sources is challenging. The pathogens usually colonize multiple animal hosts, including livestock, which contaminate the food-production chain and the environment (e.g. soil and water), posing an additional public-health burden and major challenges in the identification of the source. Genomic data opens up new opportunities for the development of statistical models aiming to indicate the likely source of pathogen contamination. Here, we propose a computationally fast and efficient multinomial logistic regression source-attribution classifier to predict the animal source of bacterial isolates based on 'source-enriched' loci extracted from the accessory-genome profiles of a pangenomic dataset. Depending on the accuracy of the model's self-attribution step, the modeller selects the number of candidate accessory genes that best fit the model for calculating the likelihood of (source) category membership. The Accessory genes-Based Source Attribution (AB_SA) method was applied to a dataset of strains of Salmonella enterica Typhimurium and its monophasic variant (S. enterica 1,4,[5],12:i:-). The model was trained on 69 strains with known animal-source categories (i.e. poultry, ruminant and pig). The AB_SA method helped to identify 8 genes as predictors among the 2802 accessory genes. The self-attribution accuracy was 80 %. The AB_SA model was then able to classify 25 of the 29 S. enterica Typhimurium and S. enterica 1,4,[5],12:i:- isolates collected from the environment (considered to be of unknown source) into a specific category (i.e. animal source), with more than 85 % of probability. The AB_SA method herein described provides a user-friendly and valuable tool for performing source-attribution studies in only a few steps. AB_SA is written in R and freely available at https://github.com/lguillier/AB_SA.
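The classification step described in this abstract, a multinomial logistic regression followed by a probability cut-off, can be sketched as follows. This is not the AB_SA code (which is written in R): the coefficients, the three hypothetical genes (the abstract reports eight predictors), and the function names are all assumptions chosen for illustration. Only the three source categories and the 85 % threshold come from the abstract.

```python
import math

SOURCES = ["poultry", "ruminant", "pig"]

# Hypothetical model coefficients: one intercept and one weight per gene
# (presence/absence of three made-up accessory genes) for each source.
INTERCEPTS = {"poultry": 0.0, "ruminant": 0.0, "pig": 0.0}
WEIGHTS = {
    "poultry": [3.0, -1.0, 0.0],
    "ruminant": [-1.0, 3.0, 0.0],
    "pig": [0.0, 0.0, 3.0],
}


def source_probabilities(profile):
    """Softmax over the linear scores of each source category."""
    scores = {
        s: INTERCEPTS[s] + sum(w * x for w, x in zip(WEIGHTS[s], profile))
        for s in SOURCES
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {s: math.exp(v - top) for s, v in scores.items()}
    norm = sum(exps.values())
    return {s: e / norm for s, e in exps.items()}


def attribute(profile, threshold=0.85):
    """Assign a source only if its probability exceeds the threshold."""
    probs = source_probabilities(profile)
    best = max(probs, key=probs.get)
    label = best if probs[best] > threshold else "unassigned"
    return label, probs


print(attribute([1, 0, 0])[0])  # one source clearly favoured
print(attribute([1, 1, 0])[0])  # two sources tie, so no assignment is made
```

The threshold makes the classifier abstain on ambiguous profiles, which corresponds to the isolates in the abstract that could not be placed in a specific category.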
Affiliation(s)
- Laurent Guillier
  - Laboratory for Food Safety, ANSES, University of Paris-EST, Maisons-Alfort, France
  - Risk Assessment Department, ANSES, University of Paris-EST, Maisons-Alfort, France
- Michèle Gourmelon
  - RBE–SGMM, Health, Environment and Microbiology Laboratory, IFREMER, Plouzané, France
- Solen Lozach
  - RBE–SGMM, Health, Environment and Microbiology Laboratory, IFREMER, Plouzané, France
- Sabrina Cadel-Six
  - Laboratory for Food Safety, ANSES, University of Paris-EST, Maisons-Alfort, France
- Marie-Léone Vignaud
  - Laboratory for Food Safety, ANSES, University of Paris-EST, Maisons-Alfort, France
- Nanna Munck
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark (DTU), Kongens Lyngby, Denmark
- Tine Hald
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark (DTU), Kongens Lyngby, Denmark
- Federica Palma
  - Laboratory for Food Safety, ANSES, University of Paris-EST, Maisons-Alfort, France
|
23
|
Application of artificial intelligence to the in silico assessment of antimicrobial resistance and risks to human and animal health presented by priority enteric bacterial pathogens. Can Commun Dis Rep 2020; 46:180-185. [PMID: 32673383 DOI: 10.14745/ccdr.v46i06a05] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/08/2022]
Abstract
Each year, approximately one in eight Canadians are affected by foodborne illness, either through outbreaks or sporadic illness, with animals being the major reservoir for the pathogens. Whole genome sequence analyses are now routinely implemented by public and animal health laboratories to define epidemiological disease clusters and to identify potential sources of infection. Similarly, a number of bioinformatics tools can be used to identify virulence and antimicrobial resistance (AMR) determinants in the genomes of pathogenic strains. Many important clinical and phenotypic characteristics of these pathogens can now be predicted using machine learning algorithms applied to whole genome sequence data. In this overview, we compare the ability of support vector machines, gradient-boosted decision trees and artificial neural networks to predict the levels of AMR within Salmonella enterica and extended-spectrum β-lactamase (ESBL) producing Escherichia coli. We show that minimum inhibitory concentrations (MIC) for each of 13 antimicrobials for S. enterica strains can be accurately determined, and that ESBL-producing E. coli strains can be accurately classified as susceptible, intermediate or resistant for each of seven antimicrobials. In addition to AMR and bacterial populations of greatest risk to human health, artificial intelligence algorithms hold promise as tools to predict other clinically and epidemiologically important phenotypes of enteric pathogens.
|
24
|
Cen S, Yin R, Mao B, Zhao J, Zhang H, Zhai Q, Chen W. Comparative genomics shows niche-specific variations of Lactobacillus plantarum strains isolated from human, Drosophila melanogaster, vegetable and dairy sources. Food Biosci 2020. [DOI: 10.1016/j.fbio.2020.100581] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 02/07/2023]
|
25
|
Lupolova N, Lycett SJ, Gally DL. A guide to machine learning for bacterial host attribution using genome sequence data. Microb Genom 2020; 5. [PMID: 31778355 PMCID: PMC6939162 DOI: 10.1099/mgen.0.000317] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/18/2022] Open
Abstract
With the ever-expanding number of available sequences from bacterial genomes, and the expectation that this data type will be the primary one generated from both diagnostic and research laboratories for the foreseeable future, then there is both an opportunity and a need to evaluate how effectively computational approaches can be used within bacterial genomics to predict and understand complex phenotypes, such as pathogenic potential and host source. This article applied various quantitative methods such as diversity indexes, pangenome-wide association studies (GWAS) and dimensionality reduction techniques to better understand the data and then compared how well unsupervised and supervised machine learning (ML) methods could predict the source host of the isolates. The study uses the example of the pangenomes of 1203 Salmonella enterica serovar Typhimurium isolates in order to predict 'host of isolation' using these different methods. The article is aimed as a review of recent applications of ML in infection biology, but also, by working through this specific dataset, it allows discussion of the advantages and drawbacks of the different techniques. As with all such sub-population studies, the biological relevance will be dependent on the quality and diversity of the input data. Given this major caveat, we show that supervised ML has the potential to add real value to interpretation of bacterial genomic data, as it can provide probabilistic outcomes for important phenotypes, something that is very difficult to achieve with the other methods.
Affiliation(s)
- Nadejda Lupolova
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Samantha J Lycett
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- David L Gally
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
|
26
|
Munck N, Leekitcharoenphon P, Litrup E, Kaas R, Meinen A, Guillier L, Tang Y, Malorny B, Palma F, Borowiak M, Gourmelon M, Simon S, Banerji S, Petrovska L, Dallman TJ, Hald T. Four European Salmonella Typhimurium datasets collected to develop WGS-based source attribution methods. Sci Data 2020; 7:75. [PMID: 32127544 PMCID: PMC7054362 DOI: 10.1038/s41597-020-0417-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/03/2019] [Accepted: 02/03/2020] [Indexed: 11/22/2022] Open
Abstract
Zoonotic Salmonella causes millions of human salmonellosis infections worldwide each year. Information about the source of the bacteria guides risk managers on control and preventive strategies. Source attribution is the effort to quantify the number of sporadic human cases of a specific illness to specific sources and animal reservoirs. Source attribution methods for Salmonella have so far been based on traditional wet-lab typing methods. With the change to whole genome sequencing there is a need to develop new methods for source attribution based on sequencing data. Four European datasets collected in Denmark (DK), Germany (DE), the United Kingdom (UK) and France (FR) are presented in this descriptor. The datasets contain sequenced samples of Salmonella Typhimurium and its monophasic variants isolated from human, food, animal and the environment. The objective of the datasets was either to attribute the human salmonellosis cases to animal reservoirs or to investigate contamination of the environment by attributing the environmental isolates to different animal reservoirs.
Affiliation(s)
- Nanna Munck
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Pimlapas Leekitcharoenphon
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Eva Litrup
  - Foodborne Infections, Department of Bacteria, Parasites and Fungi, Statens Serum Institute, Copenhagen, Denmark
- Rolf Kaas
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Anika Meinen
  - Department for Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany
- Laurent Guillier
  - Université Paris Est, ANSES, Laboratory for Food Safety, F-94701, Maisons-Alfort, France
- Yue Tang
  - Department of Bacteriology, Animal and Plant Health Agency, Weybridge, Surrey, UK
- Burkhard Malorny
  - Department of Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany
- Federica Palma
  - Université Paris Est, ANSES, Laboratory for Food Safety, F-94701, Maisons-Alfort, France
- Maria Borowiak
  - Department of Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany
- Michèle Gourmelon
  - Ifremer, Environment and Microbiology Laboratory, RBE, SGMM, Plouzané, France
- Sandra Simon
  - National Reference Center for Salmonella and other bacterial enteric pathogens, Robert Koch Institute, Wernigerode, Germany
- Sangeeta Banerji
  - National Reference Center for Salmonella and other bacterial enteric pathogens, Robert Koch Institute, Wernigerode, Germany
- Liljana Petrovska
  - Department of Bacteriology, Animal and Plant Health Agency, Weybridge, Surrey, UK
- Tine Hald
  - Research Group for Genomic Epidemiology, National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
|
27
|
Vilne B, Meistere I, Grantiņa-Ieviņa L, Ķibilds J. Machine Learning Approaches for Epidemiological Investigations of Food-Borne Disease Outbreaks. Front Microbiol 2019; 10:1722. [PMID: 31447800 PMCID: PMC6691741 DOI: 10.3389/fmicb.2019.01722] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/07/2019] [Accepted: 07/12/2019] [Indexed: 12/14/2022] Open
Abstract
Foodborne diseases (FBDs) are infections of the gastrointestinal tract caused by foodborne pathogens (FBPs) such as bacteria [Salmonella, Listeria monocytogenes and Shiga toxin-producing E. coli (STEC)] and several viruses, but also parasites and some fungi. Artificial intelligence (AI) and its sub-discipline machine learning (ML) are re-emerging and gaining an ever increasing popularity in the scientific community and industry, and could lead to actionable knowledge in diverse ranges of sectors including epidemiological investigations of FBD outbreaks and antimicrobial resistance (AMR). As genotyping using whole-genome sequencing (WGS) is becoming more accessible and affordable, it is increasingly used as a routine tool for the detection of pathogens, and has the potential to differentiate between outbreak strains that are closely related, identify virulence/resistance genes and provide improved understanding of transmission events within hours to days. In most cases, the computational pipeline of WGS data analysis can be divided into four (though, not necessarily consecutive) major steps: de novo genome assembly, genome characterization, comparative genomics, and inference of phylogeny or phylogenomics. In each step, ML could be used to increase the speed and potentially the accuracy (provided increasing amounts of high-quality input data) of identification of the source of ongoing outbreaks, leading to more efficient treatment and prevention of additional cases. In this review, we explore whether ML or any other form of AI algorithms have already been proposed for the respective tasks and compare those with mechanistic model-based approaches.
Affiliation(s)
- Baiba Vilne
  - Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
  - SIA net-OMICS, Riga, Latvia
- Irēna Meistere
  - Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
- Juris Ķibilds
  - Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
|
28
|
Zhang S, Li S, Gu W, den Bakker H, Boxrud D, Taylor A, Roe C, Driebe E, Engelthaler DM, Allard M, Brown E, McDermott P, Zhao S, Bruce BB, Trees E, Fields PI, Deng X. Zoonotic Source Attribution of Salmonella enterica Serotype Typhimurium Using Genomic Surveillance Data, United States. Emerg Infect Dis 2019; 25:82-91. [PMID: 30561314 PMCID: PMC6302586 DOI: 10.3201/eid2501.180835] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Indexed: 11/19/2022] Open
Abstract
Increasingly, routine surveillance and monitoring of foodborne pathogens using whole-genome sequencing is creating opportunities to study foodborne illness epidemiology beyond routine outbreak investigations and case–control studies. Using a global phylogeny of Salmonella enterica serotype Typhimurium, we found that major livestock sources of the pathogen in the United States can be predicted through whole-genome sequencing data. Relatively steady rates of sequence divergence in livestock lineages enabled the inference of their recent origins. Elevated accumulation of lineage-specific pseudogenes after divergence from generalist populations and possible metabolic acclimation in a representative swine isolate indicates possible emergence of host adaptation. We developed and retrospectively applied a machine learning Random Forest classifier for genomic source prediction of Salmonella Typhimurium that correctly attributed 7 of 8 major zoonotic outbreaks in the United States during 1998–2013. We further identified 50 key genetic features that were sufficient for robust livestock source prediction.
|
29
|
Computational Health Engineering Applied to Model Infectious Diseases and Antimicrobial Resistance Spread. Appl Sci (Basel) 2019. [DOI: 10.3390/app9122486] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Infectious diseases are the primary cause of mortality worldwide. The dangers of infectious disease are compounded with antimicrobial resistance, which remains the greatest concern for human health. Although novel approaches are under investigation, the World Health Organization predicts that by 2050, septicaemia caused by antimicrobial resistant bacteria could result in 10 million deaths per year. One of the main challenges in medical microbiology is to develop novel experimental approaches, which enable a better understanding of bacterial infections and antimicrobial resistance. After the introduction of whole genome sequencing, there was a great improvement in bacterial detection and identification, which also enabled the characterization of virulence factors and antimicrobial resistance genes. Today, the use of in silico experiments jointly with computational and machine learning offer an in depth understanding of systems biology, allowing us to use this knowledge for the prevention, prediction, and control of infectious disease. Herein, the aim of this review is to discuss the latest advances in human health engineering and their applicability in the control of infectious diseases. An in-depth knowledge of host–pathogen–protein interactions, combined with a better understanding of a host’s immune response and bacterial fitness, are key determinants for halting infectious diseases and antimicrobial resistance dissemination.
|
30
|
Wheeler NE, Gardner PP, Barquist L. Machine learning identifies signatures of host adaptation in the bacterial pathogen Salmonella enterica. PLoS Genet 2018; 14:e1007333. [PMID: 29738521 PMCID: PMC5940178 DOI: 10.1371/journal.pgen.1007333] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 12/01/2017] [Accepted: 03/24/2018] [Indexed: 11/18/2022] Open
Abstract
Emerging pathogens are a major threat to public health, however understanding how pathogens adapt to new niches remains a challenge. New methods are urgently required to provide functional insights into pathogens from the massive genomic data sets now being generated from routine pathogen surveillance for epidemiological purposes. Here, we measure the burden of atypical mutations in protein coding genes across independently evolved Salmonella enterica lineages, and use these as input to train a random forest classifier to identify strains associated with extraintestinal disease. Members of the species fall along a continuum, from pathovars which cause gastrointestinal infection and low mortality, associated with a broad host-range, to those that cause invasive infection and high mortality, associated with a narrowed host range. Our random forest classifier learned to perfectly discriminate long-established gastrointestinal and invasive serovars of Salmonella. Additionally, it was able to discriminate recently emerged Salmonella Enteritidis and Typhimurium lineages associated with invasive disease in immunocompromised populations in sub-Saharan Africa, and within-host adaptation to invasive infection. We dissect the architecture of the model to identify the genes that were most informative of phenotype, revealing a common theme of degradation of metabolic pathways in extraintestinal lineages. This approach accurately identifies patterns of gene degradation and diversifying selection specific to invasive serovars that have been captured by more labour-intensive investigations, but can be readily scaled to larger analyses.
Affiliation(s)
- Nicole E. Wheeler
  - Wellcome Sanger Institute, Hinxton, United Kingdom
  - Biomolecular Interaction Centre, School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
- Paul P. Gardner
  - Biomolecular Interaction Centre, School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Lars Barquist
  - Institute for Molecular Infection Biology, University of Wuerzburg, Wuerzburg, Germany
  - Helmholtz Institute for RNA-based Infection Research, Wuerzburg, Germany
|