1
Feng S, Karanth S, Almuhaideb E, Parveen S, Pradhan AK. Machine learning to predict the relationship between Vibrio spp. concentrations in seawater and oysters and prevalent environmental conditions. Food Res Int 2024; 188:114464. [PMID: 38823834 DOI: 10.1016/j.foodres.2024.114464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2024] [Revised: 04/26/2024] [Accepted: 05/01/2024] [Indexed: 06/03/2024]
Abstract
Vibrio parahaemolyticus and Vibrio vulnificus are bacteria with a significant public health impact. Identifying factors impacting their presence and concentrations in food sources could enable the identification of significant risk factors and prevent incidences of foodborne illness. In recent years, machine learning has shown promise in modeling microbial presence based on prevalent external and internal variables, such as environmental variables and gene presence/absence, respectively, particularly with the generation and availability of large amounts and diverse sources of data. Such analyses can prove useful in predicting microbial behavior in food systems, particularly under the influence of the constant changes in environmental variables. In this study, we tested the efficacy of six machine learning regression models (random forest, support vector machine, elastic net, neural network, k-nearest neighbors, and extreme gradient boosting) in predicting the relationship between environmental variables and total and pathogenic V. parahaemolyticus and V. vulnificus concentrations in seawater and oysters. In general, environmental variables were found to be reliable predictors of total and pathogenic V. parahaemolyticus and V. vulnificus concentrations in seawater, and pathogenic V. parahaemolyticus in oysters (Acceptable Prediction Zone >70 %) when analyzed using our machine learning models. SHapley Additive exPlanations, which was used to identify variables influencing Vibrio concentrations, identified chlorophyll a content, seawater salinity, seawater temperature, and turbidity as influential variables. It is important to note that different strains were differentially impacted by the same environmental variable, indicating the need for further research to study the causes and potential mechanisms of these variations. In conclusion, environmental variables could be important predictors of Vibrio growth and behavior in seafood. 
Moreover, the models developed in this study could prove invaluable in assessing and managing the risks associated with V. parahaemolyticus and V. vulnificus, particularly in the face of a changing environment.
Affiliation(s)
- Shuyi Feng
  - Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
- Shraddha Karanth
  - Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
- Esam Almuhaideb
  - Department of Agriculture, Food and Resource Sciences, University of Maryland Eastern Shore, Princess Anne, MD 21853, USA
- Salina Parveen
  - Department of Agriculture, Food and Resource Sciences, University of Maryland Eastern Shore, Princess Anne, MD 21853, USA
- Abani K Pradhan
  - Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA; Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
2
Garcia-Vozmediano A, Maurella C, Ceballos LA, Crescio E, Meo R, Martelli W, Pitti M, Lombardi D, Meloni D, Pasqualini C, Ru G. Machine learning approach as an early warning system to prevent foodborne Salmonella outbreaks in northwestern Italy. Vet Res 2024; 55:72. [PMID: 38840261 PMCID: PMC11154984 DOI: 10.1186/s13567-024-01323-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 04/15/2024] [Indexed: 06/07/2024] Open
Abstract
Salmonellosis, one of the most common foodborne infections in Europe, is monitored by food safety surveillance programmes, resulting in the generation of extensive databases. By leveraging tree-based machine learning (ML) algorithms, we exploited data from food safety audits to predict spatiotemporal patterns of salmonellosis in northwestern Italy. Data on human cases confirmed in 2015-2018 (n = 1969) and food surveillance data collected in 2014-2018 were used to develop ML algorithms. We integrated the monthly municipal human incidence with 27 potential predictors, including the observed prevalence of Salmonella in food. We applied the tree regression, random forest and gradient boosting algorithms considering different scenarios and evaluated their predictivity in terms of the mean absolute percentage error (MAPE) and R². Using a similar dataset from the year 2019, spatiotemporal predictions and their relative sensitivities and specificities were obtained. Random forest and gradient boosting (R² = 0.55, MAPE = 7.5%) outperformed the tree regression algorithm (R² = 0.42, MAPE = 8.8%). Salmonella prevalence in food; spatial features; and monitoring efforts in ready-to-eat milk, fruits and vegetables, and pig meat products contributed the most to the models' predictivity, reducing the variance by 90.5%. Conversely, the number of positive samples obtained for specific food matrices minimally influenced the predictions (2.9%). Spatiotemporal predictions for 2019 showed sensitivity and specificity levels of 46.5% (due to the lack of some infection hotspots) and 78.5%, respectively. This study demonstrates the added value of integrating data from human and veterinary health services to develop predictive models of human salmonellosis occurrence, providing early warnings useful for mitigating foodborne disease impacts on public health.
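The two evaluation metrics named in this abstract, MAPE and the coefficient of determination, can be illustrated with a minimal sketch. The function names and the sample values below are hypothetical and are not taken from the study; the formulas are the standard textbook definitions, which may differ in detail from what the authors computed.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no value in `actual` is zero (the ratio is undefined there).
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up monthly incidence values, for illustration only.
observed = [10.0, 20.0, 40.0, 50.0]
forecast = [11.0, 18.0, 44.0, 45.0]

print(f"MAPE = {mape(observed, forecast):.1f}%")
print(f"R^2  = {r_squared(observed, forecast):.3f}")
```

A lower MAPE and a higher coefficient of determination both indicate a closer fit, which is the sense in which the abstract ranks the three algorithms.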
Affiliation(s)
- Aitor Garcia-Vozmediano
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Cristiana Maurella
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Leonardo A Ceballos
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Elisabetta Crescio
  - Tecnológico de Monterrey, Av. Eugenio Garza Sada 2501 Sur, Tecnológico, 64849, Monterrey, N.L., México
- Rosa Meo
  - Department of Computer Science, University of Turin, Corso Svizzera 185, 10149, Turin, Italy
- Walter Martelli
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Monica Pitti
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Daniela Lombardi
  - Piedmont Regional Service for the Epidemiology of Infectious Diseases (SeREMI), Via Venezia 6, 15121, Alessandria, Italy
- Daniela Meloni
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
- Chiara Pasqualini
  - Piedmont Regional Service for the Epidemiology of Infectious Diseases (SeREMI), Via Venezia 6, 15121, Alessandria, Italy
- Giuseppe Ru
  - Istituto Zooprofilattico Sperimentale del Piemonte, Liguria e Valle d'Aosta, Via Bologna 148, 10154, Turin, Italy
3
Mather AE, Gilmour MW, Reid SWJ, French NP. Foodborne bacterial pathogens: genome-based approaches for enduring and emerging threats in a complex and changing world. Nat Rev Microbiol 2024:10.1038/s41579-024-01051-z. [PMID: 38789668 DOI: 10.1038/s41579-024-01051-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/16/2024] [Indexed: 05/26/2024]
Abstract
Foodborne illnesses pose a substantial health and economic burden, presenting challenges in prevention due to the diverse microbial hazards that can enter and spread within food systems. Various factors, including natural, political and commercial drivers, influence food production and distribution. The risks of foodborne illness will continue to evolve in step with these drivers and with changes to food systems. For example, climate impacts on water availability for agriculture, changes in food sustainability targets and evolving customer preferences can all have an impact on the ecology of foodborne pathogens and the agrifood niches that can carry microorganisms. Whole-genome and metagenome sequencing, combined with microbial surveillance schemes and insights from the food system, can provide authorities and businesses with transformative information to address risks and implement new food safety interventions across the food chain. In this Review, we describe how genome-based approaches have advanced our understanding of the evolution and spread of enduring bacterial foodborne hazards as well as their role in identifying emerging foodborne hazards. Furthermore, foodborne hazards exist in complex microbial communities across the entire food chain, and consideration of these co-existing organisms is essential to understanding the entire ecology supporting pathogen persistence and transmission in an evolving food system.
Affiliation(s)
- Alison E Mather
  - Quadram Institute Bioscience, Norwich, UK
  - University of East Anglia, Norwich, UK
- Matthew W Gilmour
  - Quadram Institute Bioscience, Norwich, UK
  - University of East Anglia, Norwich, UK
- Nigel P French
  - Tāuwharau Ora, School of Veterinary Science, Te Kunenga Ki Pūrehuroa, Massey University, Papaioea, Palmerston North, Aotearoa New Zealand
4
Taiwo OR, Onyeaka H, Oladipo EK, Oloke JK, Chukwugozie DC. Advancements in Predictive Microbiology: Integrating New Technologies for Efficient Food Safety Models. Int J Microbiol 2024; 2024:6612162. [PMID: 38799770 PMCID: PMC11126350 DOI: 10.1155/2024/6612162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 04/01/2024] [Accepted: 04/23/2024] [Indexed: 05/29/2024] Open
Abstract
Predictive microbiology is a rapidly evolving field that has gained significant interest over the years due to its diverse application in food safety. Predictive models are widely used in food microbiology to estimate the growth of microorganisms in food products. These models represent the dynamic interactions between intrinsic and extrinsic food factors as mathematical equations and then apply these data to predict shelf life, spoilage, and microbial risk assessment. Due to their ability to predict the microbial risk, these tools are also integrated into hazard analysis critical control point (HACCP) protocols. However, like most new technologies, several limitations have been linked to their use. Predictive models have been found incapable of modeling the intricate microbial interactions in food colonized by different bacteria populations under dynamic environmental conditions. To address this issue, researchers are integrating several new technologies into predictive models to improve efficiency and accuracy. Increasingly, newer technologies such as whole genome sequencing (WGS), metagenomics, artificial intelligence, and machine learning are being rapidly adopted into newer-generation models. This has facilitated the development of devices based on robotics, the Internet of Things, and time-temperature indicators that are being incorporated into food processing both domestically and industrially globally. This study reviewed current research on predictive models, limitations, challenges, and newer technologies being integrated into developing more efficient models. Machine learning algorithms commonly employed in predictive modeling are discussed with emphasis on their application in research and industry and their advantages over traditional models.
Affiliation(s)
- Helen Onyeaka
  - School of Chemical Engineering, University of Birmingham, Edgbaston B15 2TT, Birmingham, UK
- Elijah K. Oladipo
  - Genomics Unit, Helix Biogen Institute, Ogbomosho, Oyo, Nigeria
  - Department of Microbiology, Laboratory of Molecular Biology, Immunology and Bioinformatics, Adeleke University, Ede, Osun, Nigeria
- Julius Kola Oloke
  - Department of Natural Science, Microbiology Unit, Precious Cornerstone University, Ibadan, Oyo, Nigeria
5
Guzinski J, Tang Y, Chattaway MA, Dallman TJ, Petrovska L. Development and validation of a random forest algorithm for source attribution of animal and human Salmonella Typhimurium and monophasic variants of S. Typhimurium isolates in England and Wales utilising whole genome sequencing data. Front Microbiol 2024; 14:1254860. [PMID: 38533130 PMCID: PMC10963456 DOI: 10.3389/fmicb.2023.1254860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2023] [Accepted: 12/22/2023] [Indexed: 03/28/2024] Open
Abstract
Source attribution has traditionally involved combining epidemiological data with different pathogen characterisation methods, including 7-gene multi locus sequence typing (MLST) or serotyping; however, these approaches have limited resolution. In contrast, whole genome sequencing data provide an overview of the whole genome that can be used by attribution algorithms. Here, we applied a random forest (RF) algorithm to predict the primary sources of human clinical Salmonella Typhimurium (S. Typhimurium) and monophasic variants (monophasic S. Typhimurium) isolates. To this end, we utilised single nucleotide polymorphism diversity in the core genome MLST alleles obtained from 1,061 laboratory-confirmed human and animal S. Typhimurium and monophasic S. Typhimurium isolates as inputs into an RF model. The algorithm was used for supervised learning to classify 399 animal S. Typhimurium and monophasic S. Typhimurium isolates into one of eight distinct primary source classes comprising common livestock and pet animal species: cattle, pigs, sheep, other mammals (pets: mostly dogs and horses), broilers, layers, turkeys, and game birds (pheasants, quail, and pigeons). When applied to the training set animal isolates, model accuracy was 0.929 and kappa 0.905, whereas for the test set animal isolates, for which the primary source class information was withheld from the model, the accuracy was 0.779 and kappa 0.700. Subsequently, the model was applied to assign 662 human clinical cases to the eight primary source classes. In the dataset, 60/399 (15.0%) of the animal and 141/662 (21.3%) of the human isolates were associated with a known outbreak of S. Typhimurium definitive type (DT) 104. All but two of the 141 DT104 outbreak linked human isolates were correctly attributed by the model to the primary source classes identified as the origin of the DT104 outbreak.
A model that was run without the clonal DT104 animal isolates produced largely congruent outputs (training set accuracy 0.989 and kappa 0.985; test set accuracy 0.781 and kappa 0.663). Overall, our results show that RF offers considerable promise as a suitable methodology for epidemiological tracking and source attribution for foodborne pathogens.
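The accuracy and kappa figures quoted in this abstract can be illustrated with a minimal sketch of Cohen's kappa for a multi-class classifier. The function name, the three source classes used, and the label lists are hypothetical and not taken from the study; kappa is computed here as (observed agreement - chance agreement) / (1 - chance agreement), the standard definition, which is assumed to match what the authors report.

```python
from collections import Counter


def accuracy_and_kappa(true_labels, predicted_labels):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists.

    Kappa is undefined when chance agreement equals 1 (a single class
    everywhere); that case is not handled in this sketch.
    """
    n = len(true_labels)
    # Observed agreement: fraction of isolates assigned the correct class.
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    # Chance agreement: sum over classes of P(true = c) * P(predicted = c).
    true_counts = Counter(true_labels)
    pred_counts = Counter(predicted_labels)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Made-up source labels for ten isolates, for illustration only.
true_sources = ["cattle"] * 4 + ["pigs"] * 4 + ["sheep"] * 2
predicted_sources = ["cattle", "cattle", "cattle", "pigs",
                     "pigs", "pigs", "pigs", "cattle",
                     "sheep", "pigs"]

acc, kappa = accuracy_and_kappa(true_sources, predicted_sources)
print(f"accuracy = {acc:.3f}, kappa = {kappa:.3f}")
```

Kappa is lower than accuracy because it discounts the agreement expected by chance, which is why the abstract reports both values for the training and test sets.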
Affiliation(s)
- Jaromir Guzinski
  - Animal and Plant Health Agency, Bacteriology Department, Addlestone, United Kingdom
- Yue Tang
  - Animal and Plant Health Agency, Bacteriology Department, Addlestone, United Kingdom
- Marie Anne Chattaway
  - Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London, United Kingdom
- Timothy J. Dallman
  - Gastrointestinal Bacteria Reference Unit, UK Health Security Agency, London, United Kingdom
- Liljana Petrovska
  - Animal and Plant Health Agency, Bacteriology Department, Addlestone, United Kingdom
6
Zhang T, Rabhi F, Chen X, Paik HY, MacIntyre CR. A machine learning-based universal outbreak risk prediction tool. Comput Biol Med 2024; 169:107876. [PMID: 38176209 DOI: 10.1016/j.compbiomed.2023.107876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 12/12/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
In order to prevent and control the increasing number of serious epidemics, the ability to predict the risk caused by emerging outbreaks is essential. However, most current risk prediction tools, except EPIRISK, are limited by being designed to target only one specific disease and one country. Differences between countries and diseases (e.g., different economic conditions, different modes of transmission, etc.) pose challenges for building models with cross-country and cross-disease prediction capabilities. The limitation of universality affects domestic and international efforts to control and prevent pandemic outbreaks. To address this problem, we used outbreak data from 43 diseases in 206 countries to develop a universal risk prediction system that can be used across countries and diseases. This system used five machine learning models (including Neural Network, XGBoost, Logistic Boost, Random Forest and Kernel SVM) to predict and vote together to make ensemble predictions. It can make predictions with around 80%-90% accuracy from economic, cultural, social, and epidemiological factors. Three different datasets were designed to test the performance of ML models under different realistic situations. This prediction system has strong predictive ability, adaptability, and generality. It can give universal outbreak risk assessments that are not limited by border or disease type, and can facilitate rapid response to pandemic outbreaks, government decision-making and international cooperation.
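The phrase "predict and vote together" can be illustrated with a minimal sketch of hard (majority) voting. This is an assumption: the abstract does not state which voting rule the authors used, and the function name, dictionary keys, and risk labels below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models.

    With an odd number of voters and two classes there is always a winner;
    on a tie, Counter.most_common keeps the label that was seen first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Made-up per-model risk labels for one outbreak record, for illustration only.
votes = {
    "neural_network": "high",
    "xgboost": "high",
    "logit_boost": "low",
    "random_forest": "high",
    "kernel_svm": "low",
}

print(majority_vote(list(votes.values())))
```

Using five voters means a two-class vote can never tie, which is one common reason for choosing an odd number of base models.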
Affiliation(s)
- Tianyu Zhang
  - FinanceIT Research Group, University of New South Wales, Sydney, NSW, Australia
- Fethi Rabhi
  - FinanceIT Research Group, University of New South Wales, Sydney, NSW, Australia
- Xin Chen
  - Biosecurity Program, The Kirby Institute, University of New South Wales, Sydney, NSW, 2052, Australia
- Hye-Young Paik
  - School of Computer Science and Engineering, Faculty of Engineering, University of New South Wales, Sydney, NSW, 2052, Australia
- Chandini Raina MacIntyre
  - Biosecurity Program, The Kirby Institute, University of New South Wales, Sydney, NSW, 2052, Australia; College of Public Service & Community Solutions, Arizona State University, Tempe, AZ, 85004, United States
7
Djordjevic SP, Jarocki VM, Seemann T, Cummins ML, Watt AE, Drigo B, Wyrsch ER, Reid CJ, Donner E, Howden BP. Genomic surveillance for antimicrobial resistance - a One Health perspective. Nat Rev Genet 2024; 25:142-157. [PMID: 37749210 DOI: 10.1038/s41576-023-00649-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/02/2023] [Indexed: 09/27/2023]
Abstract
Antimicrobial resistance (AMR) - the ability of microorganisms to adapt and survive under diverse chemical selection pressures - is influenced by complex interactions between humans, companion and food-producing animals, wildlife, insects and the environment. To understand and manage the threat posed to health (human, animal, plant and environmental) and security (food and water security and biosecurity), a multifaceted 'One Health' approach to AMR surveillance is required. Genomic technologies have enabled monitoring of the mobilization, persistence and abundance of AMR genes and mutations within and between microbial populations. Their adoption has also allowed source-tracing of AMR pathogens and modelling of AMR evolution and transmission. Here, we highlight recent advances in genomic AMR surveillance and the relative strengths of different technologies for AMR surveillance and research. We showcase recent insights derived from One Health genomic surveillance and consider the challenges to broader adoption both in developed and in lower- and middle-income countries.
Affiliation(s)
- Steven P Djordjevic
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Veronica M Jarocki
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Torsten Seemann
  - Centre for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Max L Cummins
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Anne E Watt
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Barbara Drigo
  - UniSA STEM, University of South Australia, Adelaide, South Australia, Australia
  - Future Industries Institute, University of South Australia, Adelaide, South Australia, Australia
- Ethan R Wyrsch
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Cameron J Reid
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Erica Donner
  - Future Industries Institute, University of South Australia, Adelaide, South Australia, Australia
  - Cooperative Research Centre for Solving Antimicrobial Resistance in Agribusiness, Food, and Environments (CRC SAAFE), Adelaide, South Australia, Australia
- Benjamin P Howden
  - Centre for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
8
Benefo EO, Karanth S, Pradhan AK. A machine learning approach to identifying Salmonella stress response genes in isolates from poultry processing. Food Res Int 2024; 175:113635. [PMID: 38128977 DOI: 10.1016/j.foodres.2023.113635] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2023] [Revised: 10/21/2023] [Accepted: 10/24/2023] [Indexed: 12/23/2023]
Abstract
We explored the potential of machine learning to identify significant genes associated with Salmonella stress response during poultry processing using whole genome sequencing (WGS) data. The Salmonella isolates (n = 177) used in this study were obtained from various chicken sources (skin before chiller, chicken carcass before chiller, frozen chicken, and post-chill chicken carcass). Six machine learning algorithms (random forest, neural network, cost-sensitive learning, logit boost, and support vector machine linear and radial kernels) were trained on Salmonella WGS data, and model fit was assessed using standard evaluation metrics such as the area under the receiver operating characteristic (AUROC) curve and confusion matrix statistics. All models achieved high performances based on the AUROC metric, with logit boost showing the best performance with an AUROC score of 0.904, sensitivity of 0.889, and specificity of 0.920. The significant genes identified included ybtX, which encodes a Yersiniabactin-associated zinc transporter, and the transferase-encoding genes yccK and thiS. Additionally, genes coding for cold (cspA, cspD, and cspE) and heat shock (rpoH and rpoE) responses were identified. Other significant genes included those involved in lipopolysaccharide biosynthesis (irp1, waaD, rfc, and rfbX), DNA repair and replication (traI), biofilm formation (ccdA and fyuA), and cellular metabolism (irtA).
Affiliation(s)
- Edmund O Benefo
  - Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
- Shraddha Karanth
  - Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
- Abani K Pradhan
  - Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA; Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
9
Gu W, Cui Z, Stroika S, Carleton HA, Conrad A, Katz LS, Richardson LC, Hunter J, Click ES, Bruce BB. Predicting Food Sources of Listeria monocytogenes Based on Genomic Profiling Using Random Forest Model. Foodborne Pathog Dis 2023; 20:579-586. [PMID: 37699246 DOI: 10.1089/fpd.2023.0046] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/14/2023] Open
Abstract
Listeria monocytogenes can cause severe foodborne illness, including miscarriage during pregnancy or death in newborn infants. When outbreaks of L. monocytogenes illness occur, it may be possible to determine the food source of the outbreak. However, most reported L. monocytogenes illnesses do not occur as part of a recognized outbreak and most of the time the food source of sporadic L. monocytogenes illness in people cannot be determined. In the United States, L. monocytogenes isolates from patients, foods, and environments are routinely sequenced and analyzed by whole genome multilocus sequence typing (wgMLST) for outbreak detection by PulseNet, the national molecular surveillance system for foodborne illnesses. We investigated whether machine learning approaches applied to wgMLST allele call data could assist in attribution analysis of food source of L. monocytogenes isolates. We compiled isolates with a known source from five food categories (dairy, fruit, meat, seafood, and vegetable) using the metadata of L. monocytogenes isolates in PulseNet, deduplicated closely genetically related isolates, and developed random forest models to predict the food sources of isolates. Prediction accuracy of the final model varied across the food categories; it was highest for meat (65%), followed by fruit (45%), vegetable (45%), dairy (44%), and seafood (37%); overall accuracy was 49%, compared with the naive prediction accuracy of 28%. Our results show that random forest can be used to capture genetically complex features of high-resolution wgMLST for attribution of isolates to their sources.
Affiliation(s)
- Weidong Gu
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Zhaohui Cui
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Steven Stroika
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Heather A Carleton
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Amanda Conrad
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Lee S Katz
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- LaTonia C Richardson
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Jennifer Hunter
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Eleanor S Click
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
- Beau B Bruce
  - Division of Foodborne, Waterborne and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
10
D'Onofrio F, Schirone M, Krasteva I, Tittarelli M, Iannetti L, Pomilio F, Torresi M, Paparella A, D'Alterio N, Luciani M. A comprehensive investigation of protein expression profiles in L. monocytogenes exposed to thermal abuse, mild acid, and salt stress conditions. Front Microbiol 2023; 14:1271787. [PMID: 37876777 PMCID: PMC10591339 DOI: 10.3389/fmicb.2023.1271787] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 09/19/2023] [Indexed: 10/26/2023] Open
Abstract
Preventing L. monocytogenes infection is crucial for food safety, considering its widespread presence in the environment and its association with contaminated RTE foods. The pathogen's ability to persist under adverse conditions, for example, in food processing facilities, is linked to virulence and resistance mechanisms, including biofilm formation. In this study, the protein expression patterns of two L. monocytogenes 1/2a strains, grown under environmental stressors (mild acidic pH, thermal abuse, and high concentration of NaCl), were investigated. Protein identification and prediction were performed by nLC-ESI-MS/MS and nine different bioinformatic software programs, respectively. Gene enrichment analysis was carried out by STRING v11.05. A total of 1,215 proteins were identified, of which 335 were non-cytosolic proteins and 265 were immunogenic proteins. Proteomic analysis revealed differences in protein expression between L. monocytogenes strains in stressful conditions. The two strains exhibited unique protein expression profiles linked to stress response, virulence, and pathogenesis. Studying the proteomic profiles of such microorganisms provides information about adaptation and potential treatments, highlighting their genetic diversity and demonstrating the utility of bioinformatics and proteomics for a broader analysis of pathogens.
Collapse
Affiliation(s)
- Federica D'Onofrio
  - Department of Bioscience and Technology for Food, Agriculture and Environment, University of Teramo, Teramo, Italy
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Maria Schirone
  - Department of Bioscience and Technology for Food, Agriculture and Environment, University of Teramo, Teramo, Italy
- Ivanka Krasteva
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Manuela Tittarelli
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Luigi Iannetti
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Francesco Pomilio
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Marina Torresi
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Antonello Paparella
  - Department of Bioscience and Technology for Food, Agriculture and Environment, University of Teramo, Teramo, Italy
- Nicola D'Alterio
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Mirella Luciani
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise “G. Caporale”, Teramo, Italy
11
Castelli P, De Ruvo A, Bucciacchio A, D'Alterio N, Cammà C, Di Pasquale A, Radomski N. Harmonization of supervised machine learning practices for efficient source attribution of Listeria monocytogenes based on genomic data. BMC Genomics 2023; 24:560. [PMID: 37736708 PMCID: PMC10515079 DOI: 10.1186/s12864-023-09667-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023]
Abstract
BACKGROUND: Genomic data-based machine learning tools are promising for real-time surveillance activities performing source attribution of foodborne bacteria such as Listeria monocytogenes. Given the heterogeneity of machine learning practices, our aim was to identify those influencing the source prediction performance of the usual holdout method combined with the repeated k-fold cross-validation method.
METHODS: A large collection of 1,100 L. monocytogenes genomes with known sources was built according to several genomic metrics to ensure authenticity and completeness of genomic profiles. Based on these genomic profiles (i.e. 7-locus alleles, core alleles, accessory genes, core SNPs and pan kmers), we developed a versatile workflow assessing the prediction performance of different combinations of training dataset splitting (i.e. 50, 60, 70, 80 and 90%), data preprocessing (i.e. with or without near-zero variance removal), and learning models (i.e. BLR, ERT, RF, SGB, SVM and XGB). The performance metrics included accuracy, Cohen's kappa, F1-score, area under the receiver operating characteristic curve, precision-recall curve or precision-recall gain curve, and execution time.
RESULTS: The average testing accuracies from accessory genes and pan kmers were significantly higher than accuracies from core alleles or SNPs. While the accuracies from 70 and 80% of training dataset splitting were not significantly different, those from 80% were significantly higher than those from the other tested proportions. Near-zero variance removal prevented results from being produced for 7-locus alleles, did not significantly affect accuracy for core alleles, accessory genes and pan kmers, and significantly decreased accuracy for core SNPs. The SVM and XGB models did not differ significantly from each other in accuracy and reached significantly higher accuracies than BLR, SGB, ERT and RF, in that order. However, the SVM model required more computing power than the XGB model, especially for large numbers of descriptors such as core SNPs and pan kmers.
CONCLUSIONS: In addition to recommendations about machine learning practices for L. monocytogenes source attribution based on genomic data, the present study also provides a freely available workflow to solve other balanced or unbalanced multiclass phenotypes from binary and categorical genomic profiles of other microorganisms without source code modifications.
Affiliation(s)
- Pierluigi Castelli
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea De Ruvo
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Andrea Bucciacchio
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicola D'Alterio
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Cesare Cammà
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Adriano Di Pasquale
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
- Nicolas Radomski
  - Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise "Giuseppe Caporale" (IZSAM), National Reference Centre (NRC) for Whole Genome Sequencing of microbial pathogens: data base and bioinformatics analysis (GENPAT), Via Campo Boario, Teramo, TE, 64100, Italy
12
Artificial Intelligence Models for Zoonotic Pathogens: A Survey. Microorganisms 2022; 10:1911. [PMID: 36296187 PMCID: PMC9607465 DOI: 10.3390/microorganisms10101911] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/16/2022] [Revised: 09/19/2022] [Accepted: 09/22/2022] [Indexed: 11/22/2022]
Abstract
Zoonotic diseases, or zoonoses, are infections caused by the natural transmission of pathogens between species (animals and humans). More than 70% of emerging infectious diseases are attributed to animal origin. Artificial Intelligence (AI) models have been used for studying zoonotic pathogens and the factors that contribute to their spread. The aim of this literature survey is to synthesize and analyze machine learning and deep learning approaches applied to zoonotic diseases, in order to understand predictive models that help researchers identify risk factors and develop mitigation strategies. Based on our survey findings, machine learning and deep learning are commonly used to predict both foodborne and zoonotic pathogens, as well as the factors associated with the presence of these pathogens.