1
Nsubuga M, Galiwango R, Jjingo D, Mboowa G. Generalizability of machine learning in predicting antimicrobial resistance in E. coli: a multi-country case study in Africa. BMC Genomics 2024; 25:287. [PMID: 38500034 PMCID: PMC10946178 DOI: 10.1186/s12864-024-10214-4] [Received: 09/28/2023] [Accepted: 03/11/2024] [Indexed: 03/20/2024] Open
Abstract
BACKGROUND Antimicrobial resistance (AMR) remains a significant global health threat, particularly impacting low- and middle-income countries (LMICs). These regions often grapple with limited healthcare resources and access to advanced diagnostic tools. Consequently, there is a pressing need for innovative approaches that can enhance AMR surveillance and management. Machine learning (ML), though underutilized in these settings, presents a promising avenue. This study leverages ML models trained on whole-genome sequencing data from England, where such data is more readily available, to predict AMR in E. coli, targeting key antibiotics such as ciprofloxacin, ampicillin, and cefotaxime. A crucial part of our work involved the validation of these models using an independent dataset from Africa, specifically from Uganda, Nigeria, and Tanzania, to ascertain their applicability and effectiveness in LMICs.
RESULTS Model performance varied across antibiotics. The Support Vector Machine excelled in predicting ciprofloxacin resistance (87% accuracy, F1 Score: 0.57), Light Gradient Boosting Machine for cefotaxime (92% accuracy, F1 Score: 0.42), and Gradient Boosting for ampicillin (58% accuracy, F1 Score: 0.66). In validation with data from Africa, Logistic Regression showed high accuracy for ampicillin (94%, F1 Score: 0.97), while Random Forest and Light Gradient Boosting Machine were effective for ciprofloxacin (50% accuracy, F1 Score: 0.56) and cefotaxime (45% accuracy, F1 Score: 0.54), respectively. Key mutations associated with AMR were identified for these antibiotics.
CONCLUSION As the threat of AMR continues to rise, the successful application of these models, particularly on genomic datasets from LMICs, signals a promising avenue for improving AMR prediction to support large AMR surveillance programs. This work thus not only expands our current understanding of the genetic underpinnings of AMR but also provides a robust methodological framework that can guide future research and applications in the fight against AMR.
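The abstract above reports accuracy and F1 score side by side, and the two can diverge noticeably when resistant and susceptible isolates are unevenly represented. The following is a minimal sketch of how both metrics follow from a confusion matrix. The function name and all counts are hypothetical and chosen only to illustrate the pattern; they are not the study's confusion matrices.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) with 'resistant' treated as the positive class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Hypothetical counts (assumption, not study data): mostly susceptible isolates.
# Accuracy is dominated by the many true negatives, while F1 stays modest.
mostly_susceptible = accuracy_and_f1(tp=8, fp=7, fn=5, tn=80)

# Hypothetical counts (assumption, not study data): mostly resistant isolates.
# Both accuracy and F1 are high because the positive class dominates.
mostly_resistant = accuracy_and_f1(tp=90, fp=4, fn=2, tn=4)

print("mostly susceptible: accuracy=%.2f, F1=%.2f" % mostly_susceptible)
print("mostly resistant:   accuracy=%.2f, F1=%.2f" % mostly_resistant)
```

Under these assumed counts, a high accuracy paired with a low F1 suggests few resistant isolates in the evaluated set, which is one reason to read the two metrics together.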
Affiliation(s)
- Mike Nsubuga
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O Box 7072, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Sciences, Infectious Diseases Institute, College of Health Sciences, Makerere University, P.O Box 22418, Kampala, Uganda
  - Faculty of Health Sciences, University of Bristol, Bristol, BS40 5DU, UK
  - Jean Golding Institute, University of Bristol, Bristol, BS8 1UH, UK
- Ronald Galiwango
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O Box 7072, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Sciences, Infectious Diseases Institute, College of Health Sciences, Makerere University, P.O Box 22418, Kampala, Uganda
- Daudi Jjingo
  - Department of Computer Science, College of Computing and Information Sciences, Makerere University, P.O Box 7062, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Sciences, Infectious Diseases Institute, College of Health Sciences, Makerere University, P.O Box 22418, Kampala, Uganda
- Gerald Mboowa
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O Box 7072, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Sciences, Infectious Diseases Institute, College of Health Sciences, Makerere University, P.O Box 22418, Kampala, Uganda
  - Africa Centres for Disease Control and Prevention, African Union Commission, P.O Box 3243, Roosevelt Street, Addis Ababa, W21 K19, Ethiopia
2
Singh S, Sharma P, Pal N, Sarma DK, Tiwari R, Kumar M. Holistic One Health Surveillance Framework: Synergizing Environmental, Animal, and Human Determinants for Enhanced Infectious Disease Management. ACS Infect Dis 2024; 10:808-826. [PMID: 38415654 DOI: 10.1021/acsinfecdis.3c00625] [Indexed: 02/29/2024]
Abstract
Recent pandemics, including the COVID-19 outbreak, have brought up growing concerns about transmission of zoonotic diseases from animals to humans. This highlights the requirement for a novel approach to discern and address the escalating health threats. The One Health paradigm has been developed as a responsive strategy to confront forthcoming outbreaks through early warning, highlighting the interconnectedness of humans, animals, and their environment. The system employs several innovative methods such as the use of advanced technology, global collaboration, and data-driven decision-making to come up with an extraordinary solution for improving worldwide disease responses. This Review deliberates environmental, animal, and human factors that influence disease risk, analyzes the challenges and advantages inherent in using the One Health surveillance system, and demonstrates how these can be empowered by Big Data and Artificial Intelligence. The Holistic One Health Surveillance Framework presented herein holds the potential to revolutionize our capacity to monitor, understand, and mitigate the impact of infectious diseases on global populations.
Affiliation(s)
- Samradhi Singh
  - ICMR - National Institute for Research in Environmental Health, Bhopal Bypass Road, Bhouri, Bhopal-462030, Madhya Pradesh, India
- Poonam Sharma
  - ICMR - National Institute for Research in Environmental Health, Bhopal Bypass Road, Bhouri, Bhopal-462030, Madhya Pradesh, India
- Namrata Pal
  - ICMR - National Institute for Research in Environmental Health, Bhopal Bypass Road, Bhouri, Bhopal-462030, Madhya Pradesh, India
- Devojit Kumar Sarma
  - ICMR - National Institute for Research in Environmental Health, Bhopal Bypass Road, Bhouri, Bhopal-462030, Madhya Pradesh, India
- Rajnarayan Tiwari
  - ICMR - National Institute for Research in Environmental Health, Bhopal Bypass Road, Bhouri, Bhopal-462030, Madhya Pradesh, India
- Manoj Kumar
  - ICMR - National Institute for Research in Environmental Health, Bhopal Bypass Road, Bhouri, Bhopal-462030, Madhya Pradesh, India
3
Djordjevic SP, Jarocki VM, Seemann T, Cummins ML, Watt AE, Drigo B, Wyrsch ER, Reid CJ, Donner E, Howden BP. Genomic surveillance for antimicrobial resistance - a One Health perspective. Nat Rev Genet 2024; 25:142-157. [PMID: 37749210 DOI: 10.1038/s41576-023-00649-y] [Accepted: 08/02/2023] [Indexed: 09/27/2023]
Abstract
Antimicrobial resistance (AMR) - the ability of microorganisms to adapt and survive under diverse chemical selection pressures - is influenced by complex interactions between humans, companion and food-producing animals, wildlife, insects and the environment. To understand and manage the threat posed to health (human, animal, plant and environmental) and security (food and water security and biosecurity), a multifaceted 'One Health' approach to AMR surveillance is required. Genomic technologies have enabled monitoring of the mobilization, persistence and abundance of AMR genes and mutations within and between microbial populations. Their adoption has also allowed source-tracing of AMR pathogens and modelling of AMR evolution and transmission. Here, we highlight recent advances in genomic AMR surveillance and the relative strengths of different technologies for AMR surveillance and research. We showcase recent insights derived from One Health genomic surveillance and consider the challenges to broader adoption both in developed and in lower- and middle-income countries.
Affiliation(s)
- Steven P Djordjevic
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Veronica M Jarocki
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Torsten Seemann
  - Centre for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Max L Cummins
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Anne E Watt
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
- Barbara Drigo
  - UniSA STEM, University of South Australia, Adelaide, South Australia, Australia
  - Future Industries Institute, University of South Australia, Adelaide, South Australia, Australia
- Ethan R Wyrsch
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Cameron J Reid
  - Australian Institute for Microbiology and Infection, University of Technology Sydney, Sydney, New South Wales, Australia
  - Australian Centre for Genomic Epidemiological Microbiology, University of Technology Sydney, Sydney, New South Wales, Australia
- Erica Donner
  - Future Industries Institute, University of South Australia, Adelaide, South Australia, Australia
  - Cooperative Research Centre for Solving Antimicrobial Resistance in Agribusiness, Food, and Environments (CRC SAAFE), Adelaide, South Australia, Australia
- Benjamin P Howden
  - Centre for Pathogen Genomics, University of Melbourne, Melbourne, Victoria, Australia
  - Microbiological Diagnostic Unit Public Health Laboratory, Department of Microbiology and Immunology, University of Melbourne at the Doherty Institute for Infection and Immunity, Melbourne, Victoria, Australia
4
Zaidan AM. The leading global health challenges in the artificial intelligence era. Front Public Health 2023; 11:1328918. [PMID: 38089037 PMCID: PMC10711066 DOI: 10.3389/fpubh.2023.1328918] [Received: 10/27/2023] [Accepted: 11/13/2023] [Indexed: 12/18/2023] Open
Abstract
Millions of people's health is at risk because of several factors and multiple overlapping crises, all of which hit the vulnerable the most. These challenges are dynamic and evolve in response to emerging health challenges and concerns, which need effective collaboration among countries working toward achieving Sustainable Development Goals (SDGs) and securing global health. Mental health, the impact of climate change, cardiovascular diseases (CVDs), diabetes, infectious diseases, health systems, and population aging are examples of challenges known to pose a vast burden worldwide. We are at a point known as the "digital revolution," characterized by the expansion of artificial intelligence (AI) and a fusion of technology types. AI has emerged as a powerful tool for addressing various health challenges, and the last ten years have been influential due to the rapid expansion in the production and accessibility of health-related data. The computational models and algorithms can understand complicated health and medical data to perform various functions and deep-learning strategies. This narrative mini-review summarizes the most current AI applications to address the leading global health challenges. Harnessing its capabilities can ultimately mitigate the impact of these challenges and revolutionize the field. It has the ability to strengthen global health through personalized health care and improved preparedness and response to future challenges. However, ethical and legal concerns about individual or community privacy and autonomy must be addressed for effective implementation.
Affiliation(s)
- Amal Mousa Zaidan
  - King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia
  - King Abdullah International Medical Research Center (KAIMRC), Riyadh, Saudi Arabia
  - Ministry of National Guard Health Affairs, Riyadh, Saudi Arabia
5
Chalka A, Dallman TJ, Vohra P, Stevens MP, Gally DL. The advantage of intergenic regions as genomic features for machine-learning-based host attribution of Salmonella Typhimurium from the USA. Microb Genom 2023; 9:001116. [PMID: 37843883 PMCID: PMC10634445 DOI: 10.1099/mgen.0.001116] [Received: 03/08/2023] [Accepted: 10/02/2023] [Indexed: 10/17/2023] Open
Abstract
Salmonella enterica is a taxonomically diverse pathogen with over 2600 serovars associated with a wide variety of animal hosts including humans, other mammals, birds and reptiles. Some serovars are host-specific or host-restricted and cause disease in distinct host species, while others, such as serovar S. Typhimurium (STm), are generalists and have the potential to colonize a wide variety of species. However, even within generalist serovars such as STm it is becoming clear that pathovariants exist that differ in tropism and virulence. Identifying the genetic factors underlying host specificity is complex, but the availability of thousands of genome sequences and advances in machine learning have made it possible to build specific host prediction models to aid outbreak control and predict the human pathogenic potential of isolates from animals and other reservoirs. We have advanced this area by building host-association prediction models trained on a wide range of genomic features and compared them with predictions based on nearest-neighbour phylogeny. SNPs, protein variants (PVs), antimicrobial resistance (AMR) profiles and intergenic regions (IGRs) were extracted from 3883 high-quality STm assemblies collected from humans, swine, bovine and poultry in the USA, and used to construct Random Forest (RF) machine learning models. An additional 244 recent STm assemblies from farm animals were used as a test set for further validation. The models based on PVs and IGRs had the best performance in terms of predicting the host of origin of isolates and outperformed nearest-neighbour phylogenetic host prediction as well as models based on SNPs or AMR data. However, the models did not yield reliable predictions when tested with isolates that were phylogenetically distinct from the training set. The IGR and PV models were often able to differentiate human isolates in clusters where the majority of isolates were from a single animal source. 
Notably, IGRs were the feature with the best performance across multiple models which may be due to IGRs acting as both a representation of their flanking genes, equivalent to PVs, while also capturing genomic regulatory variation, such as altered promoter regions. The IGR and PV models predict that ~45 % of the human infections with STm in the USA originate from bovine, ~40 % from poultry and ~14.5 % from swine, although sequences of isolates from other sources were not used for training. In summary, the research demonstrates a significant gain in accuracy for models with IGRs and PVs as features compared to SNP-based and core genome phylogeny predictions when applied within the existing population structure. This article contains data hosted by Microreact.
Affiliation(s)
- Antonia Chalka
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- Tim J. Dallman
  - Institute for Risk Assessment Sciences (IRAS), University of Utrecht, Heidelberglaan, Utrecht, Netherlands
- Prerna Vohra
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- Mark P. Stevens
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
- David L. Gally
  - The Roslin Institute and R(D)SVS, University of Edinburgh, Edinburgh, UK
6
Graña-Miraglia L, Morales-Lizcano N, Wang PW, Hwang DM, Yau YCW, Waters VJ, Guttman DS. Predictive modeling of antibiotic eradication therapy success for new-onset Pseudomonas aeruginosa pulmonary infections in children with cystic fibrosis. PLoS Comput Biol 2023; 19:e1011424. [PMID: 37672526 PMCID: PMC10506723 DOI: 10.1371/journal.pcbi.1011424] [Received: 11/15/2022] [Revised: 09/18/2023] [Accepted: 08/09/2023] [Indexed: 09/08/2023] Open
Abstract
Chronic Pseudomonas aeruginosa (Pa) lung infections are the leading cause of mortality among cystic fibrosis (CF) patients; therefore, the eradication of new-onset Pa lung infections is an important therapeutic goal that can have long-term health benefits. The use of early antibiotic eradication therapy (AET) has been shown to clear the majority of new-onset Pa infections, and it is hoped that identifying the underlying basis for AET failure will further improve treatment outcomes. Here we generated machine learning models to predict AET outcomes based on pathogen genomic data. We used a nested cross validation design, population structure control, and recursive feature selection to improve model performance and showed that incorporating population structure control was crucial for improving model interpretation and generalizability. Our best model, controlling for population structure and using only 30 recursively selected features, had an area under the curve of 0.87 for a holdout test dataset. The top-ranked features were generally associated with motility, adhesion, and biofilm formation.
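The nested cross validation design mentioned in the abstract above can be pictured as index bookkeeping: an outer loop holds out a test fold for performance estimation, and an inner loop, restricted to the remaining samples, is used for choices such as feature selection. The sketch below shows only that split structure in plain Python. The function names, the fold counts (5 outer, 3 inner), and the sample size are assumptions for illustration; the authors' population structure control and recursive feature selection are not reproduced here.

```python
def k_fold_indices(indices, k):
    """Split a list of indices into k (train, test) pairs using interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    pairs = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        pairs.append((train, test))
    return pairs


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_pairs) for each outer fold.

    inner_pairs are built from outer_train only, so the outer test fold
    never influences model or feature selection.
    """
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k):
        inner_pairs = k_fold_indices(outer_train, inner_k)
        yield outer_train, outer_test, inner_pairs


# Hypothetical sample size (assumption), used only to show the structure.
splits = list(nested_cv_splits(n_samples=30))
for fold_number, (outer_train, outer_test, inner_pairs) in enumerate(splits, start=1):
    print(fold_number, len(outer_train), len(outer_test), len(inner_pairs))
```

In practice the samples would usually be shuffled or grouped before splitting; grouping related isolates into the same fold is one common way to account for population structure, though the abstract does not say which approach was used.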
Affiliation(s)
- Lucía Graña-Miraglia
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Nadia Morales-Lizcano
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Pauline W. Wang
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
  - Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- David M. Hwang
  - Department of Laboratory Medicine and Pathobiology, Toronto, Ontario, Canada
  - Laboratory Medicine and Molecular Diagnostics, Sunnybrook Health Sciences Centre, Toronto, Ontario, Canada
- Yvonne C. W. Yau
  - Department of Laboratory Medicine and Pathobiology, Toronto, Ontario, Canada
  - Department of Paediatric Laboratory Medicine, Division of Microbiology, The Hospital for Sick Children, Toronto, Ontario, Canada
- Valerie J. Waters
  - Department of Pediatrics, Division of Infectious Diseases, The Hospital for Sick Children, Toronto, Ontario, Canada
  - Translational Medicine, Research Institute, Hospital for Sick Children, Toronto, Ontario, Canada
- David S. Guttman
  - Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
  - Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
7
Tsai CN, Massicotte MA, MacNair CR, Perry JN, Brown ED, Coombes BK. Screening under infection-relevant conditions reveals chemical sensitivity in multidrug resistant invasive non-typhoidal Salmonella (iNTS). RSC Chem Biol 2023; 4:600-612. [PMID: 37547457 PMCID: PMC10398353 DOI: 10.1039/d3cb00014a] [Received: 02/06/2023] [Accepted: 06/30/2023] [Indexed: 08/08/2023] Open
Abstract
Bloodstream infections caused by invasive, non-typhoidal Salmonella (iNTS) are a major global health concern, particularly in Africa where the pathogenic variant of Salmonella Typhimurium sequence type (ST) 313 is dominant. Unlike S. Typhimurium strains that cause gastroenteritis, iNTS strains cause bloodstream infections and are resistant to multiple first-line antibiotics, thus limiting current treatment options. Here, we developed and implemented multiple small molecule screens under physiological, infection-relevant conditions to reveal chemical sensitivities in ST313 and to identify host-directed therapeutics as entry points to drug discovery to combat the clinical burden of iNTS. Screening ST313 iNTS under host-mimicking growth conditions identified 92 compounds with antimicrobial activity despite inherent multidrug resistance. We characterized the antimicrobial activity of the nucleoside analog 3'-azido-3'-deoxythymidine as an exemplary compound from this screen, which depended on bacterial thymidine kinase activity for antimicrobial activity. In a companion macrophage-based screening platform designed to enrich for host-directed therapeutics, we identified three compounds (amodiaquine, berbamine, and indatraline) as actives that required the presence of host cells for antibacterial activity. These three compounds had antimicrobial activity only in the presence of host cells that significantly inhibited intracellular ST313 iNTS replication in macrophages. This work provides evidence that despite high invasiveness and multidrug resistance, ST313 iNTS remains susceptible to unconventional drug discovery approaches.
Affiliation(s)
- Caressa N Tsai
  - Department of Biochemistry & Biomedical Sciences, McMaster University, Hamilton, ON L8S 4L8, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, Hamilton, ON, Canada
- Marie-Ange Massicotte
  - Department of Biochemistry & Biomedical Sciences, McMaster University, Hamilton, ON L8S 4L8, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, Hamilton, ON, Canada
- Craig R MacNair
  - Department of Biochemistry & Biomedical Sciences, McMaster University, Hamilton, ON L8S 4L8, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, Hamilton, ON, Canada
- Jordyn N Perry
  - Department of Biochemistry & Biomedical Sciences, McMaster University, Hamilton, ON L8S 4L8, Canada
- Eric D Brown
  - Department of Biochemistry & Biomedical Sciences, McMaster University, Hamilton, ON L8S 4L8, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, Hamilton, ON, Canada
- Brian K Coombes
  - Department of Biochemistry & Biomedical Sciences, McMaster University, Hamilton, ON L8S 4L8, Canada
  - Michael G. DeGroote Institute for Infectious Disease Research, Hamilton, ON, Canada
  - Farncombe Family Digestive Health Research Institute, Hamilton, ON, Canada
8
Wong F, de la Fuente-Nunez C, Collins JJ. Leveraging artificial intelligence in the fight against infectious diseases. Science 2023; 381:164-170. [PMID: 37440620 PMCID: PMC10663167 DOI: 10.1126/science.adh1114] [Received: 04/01/2023] [Accepted: 06/05/2023] [Indexed: 07/15/2023]
Abstract
Despite advances in molecular biology, genetics, computation, and medicinal chemistry, infectious disease remains an ominous threat to public health. Addressing the challenges posed by pathogen outbreaks, pandemics, and antimicrobial resistance will require concerted interdisciplinary efforts. In conjunction with systems and synthetic biology, artificial intelligence (AI) is now leading to rapid progress, expanding anti-infective drug discovery, enhancing our understanding of infection biology, and accelerating the development of diagnostics. In this Review, we discuss approaches for detecting, treating, and understanding infectious diseases, underscoring the progress supported by AI in each case. We suggest future applications of AI and how it might be harnessed to help control infectious disease outbreaks and pandemics.
Affiliation(s)
- Felix Wong
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA 19104, USA
- James J. Collins
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
  - Institute for Medical Engineering & Science and Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
  - Wyss Institute for Biologically Inspired Engineering, Harvard University, Boston, MA 02115, USA
9
Moradigaravand D, Li L, Dechesne A, Nesme J, de la Cruz R, Ahmad H, Banzhaf M, Sørensen SJ, Smets BF, Kreft JU. Plasmid permissiveness of wastewater microbiomes can be predicted from 16S rRNA sequences by machine learning. Bioinformatics 2023; 39:btad400. [PMID: 37348862 PMCID: PMC10318386 DOI: 10.1093/bioinformatics/btad400] [Received: 07/09/2022] [Revised: 06/13/2023] [Accepted: 06/21/2023] [Indexed: 06/24/2023] Open
Abstract
MOTIVATION Wastewater treatment plants (WWTPs) harbor a dense and diverse microbial community. They constantly receive antimicrobial residues and resistant strains, and therefore provide conditions for horizontal gene transfer (HGT) of antimicrobial resistance (AMR) determinants. This facilitates the transmission of clinically important genes between, e.g., enteric and environmental bacteria, and vice versa. Despite the clinical importance, tools for predicting HGT remain underdeveloped.
RESULTS In this study, we examined to what extent water cycle microbial community composition, as inferred by partial 16S rRNA gene sequences, can predict plasmid permissiveness, i.e., the ability of cells to receive a plasmid through conjugation, based on data from standardized filter mating assays using fluorescent bio-reporter plasmids. We leveraged a range of machine learning models for predicting the permissiveness for each taxon in the community, representing the range of hosts a plasmid is able to transfer to, for three broad host-range resistance IncP plasmids (pKJK5, pB10, and RP4). Our results indicate that the predicted permissiveness from the best performing model (random forest) showed a moderate-to-strong average correlation of 0.49 for pB10 [95% confidence interval (CI): 0.44-0.55], 0.43 for pKJK5 (95% CI: 0.41-0.49), and 0.53 for RP4 (95% CI: 0.48-0.57) with the experimental permissiveness in the unseen test dataset. Predictive phylogenetic signals occurred despite the broad host-range nature of these plasmids. Our results provide a framework that contributes to the assessment of the risk of AMR pollution in wastewater systems.
AVAILABILITY AND IMPLEMENTATION The predictive tool is available as an application at https://github.com/DaneshMoradigaravand/PlasmidPerm.
Affiliation(s)
- Danesh Moradigaravand
  - Laboratory of Infectious Disease Epidemiology, KAUST Smart-Health Initiative and Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
  - KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
- Liguan Li
  - Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs Lyngby, Denmark
  - Department of Civil Engineering, The University of Hong Kong, Hong Kong, China
- Arnaud Dechesne
  - Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs Lyngby, Denmark
- Joseph Nesme
  - Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
- Roberto de la Cruz
  - Center for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, B15 2TT, United Kingdom
  - School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Huda Ahmad
  - Laboratory of Infectious Disease Epidemiology, KAUST Smart-Health Initiative and Biological and Environmental Science and Engineering (BESE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
  - KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955-6900, Saudi Arabia
  - Center for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Manuel Banzhaf
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, B15 2TT, United Kingdom
  - School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
- Søren J Sørensen
  - Department of Biology, University of Copenhagen, 2100 Copenhagen, Denmark
- Barth F Smets
  - Department of Environmental Engineering, Technical University of Denmark, 2800 Kgs Lyngby, Denmark
- Jan-Ulrich Kreft
  - Center for Computational Biology, University of Birmingham, Birmingham, B15 2TT, United Kingdom
  - Institute of Microbiology and Infection, University of Birmingham, Birmingham, B15 2TT, United Kingdom
  - School of Biosciences, University of Birmingham, Birmingham, B15 2TT, United Kingdom
10
Karanth S, Pradhan AK. Development of a novel machine learning-based weighted modeling approach to incorporate Salmonella enterica heterogeneity on a genetic scale in a dose-response modeling framework. Risk Anal 2023; 43:440-450. [PMID: 35413139 DOI: 10.1111/risa.13924] [Indexed: 06/14/2023]
Abstract
Estimating microbial dose-response is an important aspect of a food safety risk assessment. In recent years, there has been considerable interest in advancing these models by incorporating gene expression data. The aim of this study was to develop a novel machine learning model that considers the weights of expression of Salmonella genes that could be associated with illness, given exposure, in hosts. Here, an elastic net-based weighted Poisson regression method was proposed to identify Salmonella enterica genes that could be significantly associated with the illness response, irrespective of serovar. The best-fit elastic net model was obtained by 10-fold cross-validation. The best-fit elastic net model identified 33 gene expression-dose interaction terms that added to the predictability of the model. Of these, nine genes associated with Salmonella metabolism and virulence were found to be significant by the best-fit Poisson regression model (p < 0.05). This method could improve or redefine dose-response relationships for illness from relative proportions of significant genes from a microbial genetic dataset, which would help in refining endpoint and risk estimations.
Affiliation(s)
- Shraddha Karanth
- Department of Nutrition and Food Science, University of Maryland, College Park, Maryland, USA
| | - Abani K Pradhan
- Department of Nutrition and Food Science, University of Maryland, College Park, Maryland, USA
- Center for Food Safety and Security Systems, University of Maryland, College Park, Maryland, USA
| |
|
11
|
Sharma P, Dahiya S, Kaur P, Kapil A. Computational biology: Role and scope in taming antimicrobial resistance. Indian J Med Microbiol 2023; 41:33-38. [PMID: 36870746 DOI: 10.1016/j.ijmmb.2022.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2022] [Revised: 12/11/2022] [Accepted: 12/16/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Infectious diseases pose many challenges due to increasing threat of antimicrobial resistance, which necessitates continuous research to develop novel strategies for development of new molecules with antibacterial activity. In the era of computational biology there are tools and techniques available to address and solve the disease management issues in the field of clinical microbiology. The sequencing techniques, structural biology and machine learning can be implemented collectively to tackle infectious diseases e.g. for the diagnosis, epidemiological typing, pathotyping, antimicrobial resistance detection as well as the discovery of novel drugs and vaccine biomarkers. OBJECTIVES The present review is a narrative review representing a comprehensive literature-based assessment regarding the use of whole genome sequencing, structural biology and machine learning for the diagnosis, molecular typing and antibacterial drug discovery. CONTENT Here, we seek to present an overview of molecular and structural basis of resistance to antibiotics, while focusing on the recent bioinformatics approaches in whole genome sequencing and structural biology. The application of next generation sequencing in management of bacterial infections focusing on investigation of microbial population diversity, genotypic resistance testing and scope for the identification of targets for novel drug and vaccine candidates, has been addressed along with the use of structural biophysics and artificial intelligence.
Affiliation(s)
- Priyanka Sharma
- Department of Biophysics, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India.
| | - Sushila Dahiya
- Department of Microbiology, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India.
| | - Punit Kaur
- Department of Biophysics, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India.
| | - Arti Kapil
- Department of Microbiology, All India Institute of Medical Sciences, Ansari Nagar, New Delhi, 110029, India.
| |
|
12
|
Guillier L, Palma F, Fritsch L. Taking account of genomics in quantitative microbial risk assessment: what methods? what issues? Curr Opin Food Sci 2022. [DOI: 10.1016/j.cofs.2022.100922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
|
13
|
Wang H, Jia C, Li H, Yin R, Chen J, Li Y, Yue M. Paving the way for precise diagnostics of antimicrobial resistant bacteria. Front Mol Biosci 2022; 9:976705. [PMID: 36032670 PMCID: PMC9413203 DOI: 10.3389/fmolb.2022.976705] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2022] [Accepted: 07/19/2022] [Indexed: 12/26/2022] Open
Abstract
The antimicrobial resistance (AMR) crisis from bacterial pathogens is frequently emerging and rapidly disseminated during the sustained antimicrobial exposure in human-dominated communities, posing a compelling threat as one of the biggest challenges in humans. The frequent incidences of some common but untreatable infections unfold the public health catastrophe that antimicrobial-resistant pathogens have outpaced the available countermeasures, now explicitly amplified during the COVID-19 pandemic. Nowadays, biotechnology and machine learning advancements help create more fundamental knowledge of distinct spatiotemporal dynamics in AMR bacterial adaptation and evolutionary processes. Integrated with reliable diagnostic tools and powerful analytic approaches, a collaborative and systematic surveillance platform with high accuracy and predictability should be established and implemented, which is not just for an effective controlling strategy on AMR but also for protecting the longevity of valuable antimicrobials currently and in the future.
Affiliation(s)
- Hao Wang
- Institute of Preventive Veterinary Sciences & Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou, China
| | - Chenhao Jia
- Institute of Preventive Veterinary Sciences & Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
| | - Hongzhao Li
- Institute of Preventive Veterinary Sciences & Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
| | - Rui Yin
- Institute of Preventive Veterinary Sciences & Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou, China
| | - Jiang Chen
- Department of Microbiology, Zhejiang Provincial Center for Disease Control and Prevention, Hangzhou, China
- *Correspondence: Jiang Chen; Yan Li; Min Yue
| | - Yan Li
- Institute of Preventive Veterinary Sciences & Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Zhejiang Provincial Key Laboratory of Preventive Veterinary Medicine, Hangzhou, China
- *Correspondence: Jiang Chen; Yan Li; Min Yue
| | - Min Yue
- Institute of Preventive Veterinary Sciences & Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou, China
- Hainan Institute of Zhejiang University, Sanya, China
- Zhejiang Provincial Key Laboratory of Preventive Veterinary Medicine, Hangzhou, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou, China
- *Correspondence: Jiang Chen; Yan Li; Min Yue
| |
|
14
|
Tanui CK, Benefo EO, Karanth S, Pradhan AK. A Machine Learning Model for Food Source Attribution of Listeria monocytogenes. Pathogens 2022; 11:691. [PMID: 35745545 PMCID: PMC9230378 DOI: 10.3390/pathogens11060691] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/07/2022] [Revised: 06/06/2022] [Accepted: 06/10/2022] [Indexed: 12/07/2022] Open
Abstract
Despite its low morbidity, listeriosis has a high mortality rate due to the severity of its clinical manifestations. The source of human listeriosis is often unclear. In this study, we investigate the ability of machine learning to predict the food source from which clinical Listeria monocytogenes isolates originated. Four machine learning classification algorithms were trained on core genome multilocus sequence typing data of 1212 L. monocytogenes isolates from various food sources. The average accuracies of random forest, support vector machine radial kernel, stochastic gradient boosting, and logit boost were found to be 0.72, 0.61, 0.7, and 0.73, respectively. Logit boost showed the best performance and was used in model testing on 154 L. monocytogenes clinical isolates. The model attributed 17.5% of human clinical cases to dairy, 32.5% to fruits, 14.3% to leafy greens, 9.7% to meat, 4.6% to poultry, and 18.8% to vegetables. The final model also provided us with genetic features that were predictive of specific sources. Thus, this combination of genomic data and machine learning-based models can greatly enhance our ability to track L. monocytogenes from different food sources.
Affiliation(s)
- Collins K. Tanui
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
- Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
| | - Edmund O. Benefo
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
| | - Shraddha Karanth
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
| | - Abani K. Pradhan
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
- Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
| |
|
15
|
Giulieri SG, Guérillot R, Duchene S, Hachani A, Daniel D, Seemann T, Davis JS, Tong SYC, Young BC, Wilson DJ, Stinear TP, Howden BP. Niche-specific genome degradation and convergent evolution shaping Staphylococcus aureus adaptation during severe infections. eLife 2022; 11:77195. [PMID: 35699423 PMCID: PMC9270034 DOI: 10.7554/elife.77195] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 01/19/2022] [Accepted: 06/08/2022] [Indexed: 11/13/2022] Open
Abstract
During severe infections, Staphylococcus aureus moves from its colonising sites to blood and tissues and is exposed to new selective pressures, thus, potentially driving adaptive evolution. Previous studies have shown the key role of the agr locus in S. aureus pathoadaptation; however, a more comprehensive characterisation of genetic signatures of bacterial adaptation may enable prediction of clinical outcomes and reveal new targets for treatment and prevention of these infections. Here, we measured adaptation using within-host evolution analysis of 2590 S. aureus genomes from 396 independent episodes of infection. By capturing a comprehensive repertoire of single nucleotide and structural genome variations, we found evidence of a distinctive evolutionary pattern within the infecting populations compared to colonising bacteria. These invasive strains had up to 20-fold enrichments for genome degradation signatures and displayed significantly convergent mutations in a distinctive set of genes, linked to antibiotic response and pathogenesis. In addition to agr-mediated adaptation, we identified non-canonical, genome-wide significant loci including sucA-sucB and stp1. The prevalence of adaptive changes increased with infection extent, emphasising the clinical significance of these signatures. These findings provide a high-resolution picture of the molecular changes when S. aureus transitions from colonisation to severe infection and may inform correlation of infection outcomes with adaptation signatures.
The bacterium Staphylococcus aureus lives harmlessly on our skin and noses. However, occasionally, it gets into our blood and internal organs, such as our bones and joints, where it causes severe, long-lasting infections that are difficult to treat. Over time, S. aureus acquire characteristics that help them to adapt to different locations, such as transitioning from the nose to the blood, and avoid being killed by antibiotics.
Previous studies have identified changes, or ‘mutations’, in genes that are likely to play an important role in this evolutionary process. One of these genes, called accessory gene regulator (or agr for short), has been shown to control the mechanisms S. aureus use to infect cells and disseminate in the body. However, it is unclear if there are changes in other genes that also help S. aureus adapt to life inside the human body. To help resolve this mystery, Giulieri et al. collected 2,500 samples of S. aureus from almost 400 people. This included bacteria harmlessly living on the skin or in the nose, as well as strains that caused an infection. Gene sequencing revealed a small number of genes, referred to as ‘adaptive genes’, that often acquire mutations during infection. Of these, agr was the most commonly altered. However, mutations in less well-known genes were also identified: some of these genes are related to resistance to antibiotics, while others are involved in chemical processes that help the bacteria to process nutrients. Most mutations were caused by random errors being introduced into the bacteria’s genetic code which stopped genes from working. However, in some cases, genes were turned off by small fragments of DNA moving around and inserting themselves into different parts of the genome. This study highlights a group of genes that help S. aureus to thrive inside the body and cause severe and prolonged infections. If these results can be confirmed, it may help to guide which antibiotics are used to treat different infections. Furthermore, understanding which genes are important for infection could lead to new strategies for eliminating this dangerous bacterium.
Affiliation(s)
- Stefano G Giulieri
- Department of Microbiology and Immunology, University of Melbourne, Parkville, Australia
| | - Romain Guérillot
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Sebastian Duchene
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Abderrahman Hachani
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Diane Daniel
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Torsten Seemann
- Microbiological Diagnostic Unit, University of Melbourne, Melbourne, Australia
| | - Joshua S Davis
- Department of Infectious Diseases, John Hunter Hospital, Newcastle, Australia
| | - Steven Y C Tong
- Victorian Infectious Diseases Service, University of Melbourne, Melbourne, Australia
| | | | | | - Timothy P Stinear
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| | - Benjamin P Howden
- Department of Microbiology and Immunology, University of Melbourne, Melbourne, Australia
| |
|
16
|
Qin X, Chiang CWK, Gaggiotti OE. KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis. Brief Bioinform 2022; 23:6596986. [PMID: 35649387 PMCID: PMC9294434 DOI: 10.1093/bib/bbac202] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/12/2022] [Revised: 04/05/2022] [Accepted: 04/29/2022] [Indexed: 12/30/2022] Open
Abstract
Geographic patterns of human genetic variation provide important insights into human evolution and disease. A commonly used tool to detect and describe them is principal component analysis (PCA) or the supervised linear discriminant analysis of principal components (DAPC). However, genetic features produced from both approaches could fail to correctly characterize population structure for complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that could rectify the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. The application of our method to empirical European and East Asian genome-wide genetic datasets indicated that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin when compared to PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, design of genome scans and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
Affiliation(s)
- Xinghu Qin
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
| | - Charleston W K Chiang
- Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA
| | - Oscar E Gaggiotti
- Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK
| |
|
17
|
Salmonella enterica serovar Typhimurium from Wild Birds in the United States Represent Distinct Lineages Defined by Bird Type. Appl Environ Microbiol 2022; 88:e0197921. [PMID: 35108089 PMCID: PMC8939312 DOI: 10.1128/aem.01979-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022] Open
Abstract
Salmonella enterica serovar Typhimurium is typically considered a host generalist; however, certain isolates are associated with specific hosts and show genetic features of host adaptation. Here, we sequenced 131 S. Typhimurium isolates from wild birds collected in 30 U.S. states during 1978–2019. We found that isolates from broad taxonomic host groups including passerine birds, water birds (Aequornithes), and larids (gulls and terns) represented three distinct lineages and certain S. Typhimurium CRISPR types presented in individual lineages. We also showed that lineages formed by wild bird isolates differed from most isolates originating from domestic animal sources, and that genomes from these lineages substantially improved source attribution of Typhimurium genomes to wild birds by a machine learning classifier. Furthermore, virulence gene signatures that differentiated S. Typhimurium from passerines, water birds, and larids were detected. Passerine isolates tended to lack S. Typhimurium-specific virulence plasmids. Isolates from the passerine, water bird, and larid lineages had close genetic relatedness with human clinical isolates, including those from a 2021 U.S. outbreak linked to passerine birds. These observations indicate that S. Typhimurium from wild birds in the United States are likely host-adapted, and the representative genomic data set examined in this study can improve source prediction and facilitate outbreak investigation. IMPORTANCE Within-host evolution of S. Typhimurium may lead to pathovars adapted to specific hosts. Here, we report the emergence of disparate avian S. Typhimurium lineages with distinct virulence gene signatures. The findings highlight the importance of wild birds as a reservoir for S. Typhimurium and contribute to our understanding of the genetic diversity of S. Typhimurium from wild birds. Our study indicates that S. Typhimurium may have undergone adaptive evolution within wild birds in the United States. 
The representative S. Typhimurium genomes from wild birds, together with the virulence gene signatures identified in these bird isolates, are valuable for S. Typhimurium source attribution and epidemiological surveillance.
|
18
|
Karanth S, Tanui CK, Meng J, Pradhan AK. Exploring the predictive capability of advanced machine learning in identifying severe disease phenotype in Salmonella enterica. Food Res Int 2022; 151:110817. [PMID: 34980422 DOI: 10.1016/j.foodres.2021.110817] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 09/17/2021] [Revised: 11/12/2021] [Accepted: 11/17/2021] [Indexed: 11/26/2022]
Abstract
The past few years have seen a significant increase in availability of whole genome sequencing information, allowing for its incorporation in predictive modeling for foodborne pathogens to account for inter- and intra-species differences in their virulence. However, this is hindered by the inability of traditional statistical methods to analyze such large amounts of data compared to the number of observations/isolates. In this study, we have explored the applicability of machine learning (ML) models to predict the disease outcome, while identifying features that exert a significant effect on the prediction. This study was conducted on Salmonella enterica, a major foodborne pathogen with considerable inter- and intra-serovar variation. WGS of isolates obtained from various sources (i.e., human, chicken, and swine) were used as input in four machine learning models (logistic regression with ridge, random forest, support vector machine, and AdaBoost) to classify isolates based on disease severity (extraintestinal vs. gastrointestinal) in the host. The predictive performances of all models were tested with and without Elastic Net regularization to combat dimensionality issues. Elastic Net-regularized logistic regression model showed the best area under the receiver operating characteristic curve (AUC-ROC; 0.86) and outcome prediction accuracy (0.76). Additionally, genes coding for transcriptional regulation, acidic, oxidative, and anaerobic stress response, and antibiotic resistance were found to be significant predictors of disease severity. These genes, which were significantly associated with each outcome, could possibly be input in amended, gene-expression-specific predictive models to estimate virulence pattern-specific effect of Salmonella and other foodborne pathogens on human health.
Affiliation(s)
- Shraddha Karanth
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA
| | - Collins K Tanui
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA; Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA
| | - Jianghong Meng
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA; Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA; Joint Institute for Food Safety and Applied Nutrition, University of Maryland, College Park, MD 20742, USA
| | - Abani K Pradhan
- Department of Nutrition and Food Science, University of Maryland, College Park, MD 20742, USA; Center for Food Safety and Security Systems, University of Maryland, College Park, MD 20742, USA.
| |
|
19
|
Tanui CK, Karanth S, Njage PM, Meng J, Pradhan AK. Machine learning-based predictive modeling to identify genotypic traits associated with Salmonella enterica disease endpoints in isolates from ground chicken. Lebensm Wiss Technol 2022. [DOI: 10.1016/j.lwt.2021.112701] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 12/12/2022]
|
20
|
VanOeffelen M, Nguyen M, Aytan-Aktug D, Brettin T, Dietrich EM, Kenyon RW, Machi D, Mao C, Olson R, Pusch GD, Shukla M, Stevens R, Vonstein V, Warren AS, Wattam AR, Yoo H, Davis JJ. A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes. Brief Bioinform 2021; 22:bbab313. [PMID: 34379107 PMCID: PMC8575023 DOI: 10.1093/bib/bbab313] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/27/2021] [Revised: 06/18/2021] [Accepted: 07/20/2021] [Indexed: 11/14/2022] Open
Abstract
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. In addition to using the data to track the evolution and spread of AMR, it also serves as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.
Affiliation(s)
| | - Marcus Nguyen
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
| | - Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
| | - Thomas Brettin
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
| | - Emily M Dietrich
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
| | - Ronald W Kenyon
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
| | - Dustin Machi
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
| | - Chunhong Mao
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
| | - Robert Olson
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
| | - Gordon D Pusch
- Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
| | - Maulik Shukla
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
| | - Rick Stevens
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Department of Computer Science, University of Chicago, Chicago, IL, USA
| | | | - Andrew S Warren
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
| | - Alice R Wattam
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
| | - Hyunseung Yoo
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
| | - James J Davis
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, IL, USA
| |
|
21
|
Sandholt AKS, Neimanis A, Roos A, Eriksson J, Söderlund R. Genomic signatures of host adaptation in group B Salmonella enterica ST416/ST417 from harbour porpoises. Vet Res 2021; 52:134. [PMID: 34674747 PMCID: PMC8529817 DOI: 10.1186/s13567-021-01001-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/19/2021] [Accepted: 09/21/2021] [Indexed: 11/21/2022] Open
Abstract
A type of monophasic group B Salmonella enterica with the antigenic formula 4,12:a:- (“Fulica-like”) has been described as associated with harbour porpoises (Phocoena phocoena), most frequently recovered from lung samples. In the present study, lung tissue samples from 47 porpoises found along the Swedish coast or as bycatch in fishing nets were analysed, two of which were positive for S. enterica. Pneumonia due to the infection was considered the likely cause of death for one of the two animals. The recovered isolates were whole genome sequenced and found to belong to sequence type (ST) 416 and to be closely related to ST416/ST417 porpoise isolates from UK waters as determined by core-genome MLST. Serovars Bispebjerg, Fulica and Abortusequi were identified as distantly related to the porpoise isolates, but no close relatives from other host species were found. All ST416/417 isolates had extensive loss of function mutations in key Salmonella pathogenicity islands, but carried accessory genetic elements associated with extraintestinal infection such as iron uptake systems. Gene ontology and pathway analysis revealed reduced secondary metabolic capabilities and loss of function in terms of signalling and response to environmental cues, consistent with adaptation for the extraintestinal niche. A classification system based on machine learning identified ST416/417 as more invasive than classical gastrointestinal serovars. Genome analysis results are thus consistent with ST416/417 as a host-adapted and extraintestinal clonal population of S. enterica, which while found in porpoises without associated pathology can also cause severe opportunistic infections.
Affiliation(s)
- Arnar K S Sandholt
- Department of Microbiology, National Veterinary Institute, Uppsala, Sweden
| | - Aleksija Neimanis
- Department of Pathology and Wildlife Diseases, National Veterinary Institute, Uppsala, Sweden
| | - Anna Roos
- Department of Environmental Research and Monitoring, Swedish Museum of Natural History, Stockholm, Sweden
| | - Jenny Eriksson
- Department of Microbiology, National Veterinary Institute, Uppsala, Sweden
| | - Robert Söderlund
- Department of Microbiology, National Veterinary Institute, Uppsala, Sweden.
| |
|
22
|
Abstract
Accumulation of phosphorylated intermediates during cellular metabolism can have wide-ranging toxic effects on many organisms, including humans and the pathogens that infect them. These toxicities can be induced by feeding an upstream metabolite (a sugar, for instance) while simultaneously blocking the appropriate metabolic pathway with either a mutation or an enzyme inhibitor. Here, we survey the toxicities that can arise in the metabolism of glucose, galactose, fructose, fructose-asparagine, glycerol, trehalose, maltose, mannose, mannitol, arabinose, and rhamnose. Select enzymes in these metabolic pathways may serve as novel therapeutic targets. Some are conserved broadly among prokaryotes and eukaryotes (e.g., glucose and galactose) and are therefore unlikely to be viable drug targets. However, others are found only in bacteria (e.g., fructose-asparagine, rhamnose, and arabinose), and one is found in fungi but not in humans (trehalose). We discuss what is known about the mechanisms of toxicity and how resistance is achieved in order to identify the prospects and challenges associated with targeted exploitation of these pervasive metabolic vulnerabilities.
|
23
|
Machine Learning Prediction of Resistance to Subinhibitory Antimicrobial Concentrations from Escherichia coli Genomes. mSystems 2021; 6:e0034621. [PMID: 34427505 PMCID: PMC8407197 DOI: 10.1128/msystems.00346-21] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/20/2022] Open
Abstract
Escherichia coli is an important cause of bacterial infections worldwide, with multidrug-resistant strains incurring substantial costs on human lives. Besides therapeutic concentrations of antimicrobials in health care settings, the presence of subinhibitory antimicrobial residues in the environment and in clinics selects for antimicrobial resistance (AMR), but the underlying genetic repertoire is less well understood. Here, we used machine learning to predict the population doubling time and cell growth yield of 1,407 genetically diverse E. coli strains expanding under exposure to three subinhibitory concentrations of six classes of antimicrobials from single-nucleotide genetic variants, accessory gene variation, and the presence of known AMR genes. We predicted cell growth yields in the held-out test data with an average correlation (Spearman's ρ) of 0.63 (0.36 to 0.81 across concentrations) and cell doubling times with an average correlation of 0.59 (0.32 to 0.92 across concentrations), with moderate increases in sample size unlikely to improve predictions further. This finding points to the remaining missing heritability of growth under antimicrobial exposure being explained by effects that are too rare or weak to be captured unless sample size is dramatically increased, or by effects other than those conferred by the presence of individual single-nucleotide polymorphisms (SNPs) and genes. Predictions based on whole-genome information were generally superior to those based only on known AMR genes and were accurate for AMR at therapeutic concentrations. We pinpointed genes and SNPs determining the predicted growth and thereby recapitulated many known AMR determinants. Finally, we estimated the effect sizes of resistance genes across the entire collection of strains, disclosing the growth effects for known resistance genes in each individual strain.
Our results underscore the potential of predictive modeling of growth patterns from genomic data under subinhibitory concentrations of antimicrobials, although the remaining missing heritability poses a challenge for achieving the accuracy and precision required for clinical use. IMPORTANCE Predicting bacterial growth from genome sequences is important for a rapid characterization of strains in clinical diagnostics and to disclose candidate novel targets for anti-infective drugs. Previous studies have dissected the relationship between bacterial growth and genotype in mutant libraries for laboratory strains, yet no study so far has examined the predictive power of genome sequence in natural strains. In this study, we used a high-throughput phenotypic assay to measure the growth of a systematic collection of natural Escherichia coli strains and then employed machine learning models to predict bacterial growth from genomic data under nontherapeutic subinhibitory concentrations of antimicrobials that are common in nonclinical settings. We found a moderate to strong correlation between predicted and actual values for the different collected data sets. Moreover, we observed that the known resistance genes are still effective at sublethal concentrations, pointing to clinical implications of these concentrations.
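As a rough illustration of the evaluation metric named in this abstract, the sketch below computes Spearman's ρ between predicted and measured growth values on a held-out set. It is a minimal, standard-library sketch and not the authors' pipeline: the function names `average_ranks` and `spearman_rho` are hypothetical, and the numbers are made up for illustration rather than taken from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values that starts at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(predicted, measured):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one sequence is constant")
    return cov / (var_x * var_y) ** 0.5


# Made-up held-out values, for illustration only (not data from the study).
predicted_yield = [0.42, 0.55, 0.31, 0.78, 0.60]
measured_yield = [0.40, 0.62, 0.35, 0.70, 0.58]
rho = spearman_rho(predicted_yield, measured_yield)
print(f"Spearman's rho on held-out strains: {rho:.2f}")
```

Because ρ depends only on ranks, it rewards a model that orders strains correctly by growth even when the predicted magnitudes are off, which is one plausible reason to prefer it for this kind of comparison.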
|
24
|
Peters S, Pascoe B, Wu Z, Bayliss SC, Zeng X, Edwinson A, Veerabadhran-Gurunathan S, Jawahir S, Calland JK, Mourkas E, Patel R, Wiens T, Decuir M, Boxrud D, Smith K, Parker CT, Farrugia G, Zhang Q, Sheppard SK, Grover M. Campylobacter jejuni genotypes are associated with post-infection irritable bowel syndrome in humans. Commun Biol 2021; 4:1015. [PMID: 34462533 PMCID: PMC8405632 DOI: 10.1038/s42003-021-02554-8] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 11/18/2020] [Accepted: 08/13/2021] [Indexed: 02/08/2023] Open
Abstract
Campylobacter enterocolitis may lead to post-infection irritable bowel syndrome (PI-IBS) and while some C. jejuni strains are more likely than others to cause human disease, genomic and virulence characteristics promoting PI-IBS development remain uncharacterized. We combined pangenome-wide association studies and phenotypic assays to compare C. jejuni isolates from patients who developed PI-IBS with those who did not. We show that variation in bacterial stress response (Cj0145_phoX), adhesion protein (Cj0628_CapA), and core biosynthetic pathway genes (biotin: Cj0308_bioD; purine: Cj0514_purQ; isoprenoid: Cj0894c_ispH) were associated with PI-IBS development. In vitro assays demonstrated greater adhesion, invasion, IL-8 and TNFα secretion on colonocytes with PI-IBS compared to PI-no-IBS strains. A risk-score for PI-IBS development was generated using 22 genomic markers, four of which were from Cj1631c, a putative heme oxidase gene linked to virulence. Our finding that specific Campylobacter genotypes confer greater in vitro virulence and increased risk of PI-IBS has potential to improve understanding of the complex host-pathogen interactions underlying this condition. Stephanie Peters, Ben Pascoe, et al. use whole-genome sequencing and phenotypic analysis of clinical strains from patients to identify potential genetic factors involved in irritable bowel syndrome resulting from Campylobacter jejuni infection. Their data suggest that genes involved in the bacterial stress response and biosynthetic pathways may contribute toward irritable bowel syndrome, providing further insight into links between Campylobacter genotypes and risk of disease.
Affiliation(s)
- Stephanie Peters
- Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
| | - Ben Pascoe
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK
| | - Zuowei Wu
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA
| | - Sion C Bayliss
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK
| | - Ximin Zeng
- Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
| | - Adam Edwinson
- Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
| | | | | | - Jessica K Calland
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK
| | - Evangelos Mourkas
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK
| | - Robin Patel
- Division of Clinical Microbiology, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Terra Wiens
- Division of Clinical Microbiology, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Marijke Decuir
- Division of Clinical Microbiology, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - David Boxrud
- Division of Clinical Microbiology, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Kirk Smith
- Division of Clinical Microbiology, Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, USA
| | - Craig T Parker
- United States Department of Agriculture, Albany, CA, USA
| | - Gianrico Farrugia
- Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA
| | - Qijing Zhang
- Department of Veterinary Microbiology and Preventive Medicine, Iowa State University, Ames, IA, USA
| | - Samuel K Sheppard
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, UK.
| | - Madhusudan Grover
- Department of Gastroenterology and Hepatology, Mayo Clinic, Rochester, MN, USA.
| |
|
25
|
Mehat JW, van Vliet AHM, La Ragione RM. The Avian Pathogenic Escherichia coli (APEC) pathotype is comprised of multiple distinct, independent genotypes. Avian Pathol 2021; 50:402-416. [PMID: 34047644 DOI: 10.1080/03079457.2021.1915960] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 10/21/2022]
Abstract
Avian Pathogenic E. coli (APEC) is the causative agent of avian colibacillosis, resulting in economic losses to the poultry industry through morbidity, mortality and carcass condemnation, and impacts the welfare of poultry. Colibacillosis remains a complex disease to manage, hampered by diagnostic and classification strategies for E. coli that are inadequate for defining APEC. However, increased accessibility of whole genome sequencing (WGS) technology has enabled phylogenetic approaches to be applied to the classification of E. coli and genomic characterization of the most common APEC serotypes associated with colibacillosis O1, O2 and O78. These approaches have demonstrated that the O78 serotype is representative of two distinct APEC lineages, ST-23 in phylogroup C and ST-117 in phylogroup G. The O1 and O2 serotypes belong to a third lineage comprised of three sub-populations in phylogroup B2; ST-95, ST-140 and ST-428/ST-429. The frequency with which these genotypes are associated with colibacillosis implicates them as the predominant APEC populations and distinct from those causing incidental or opportunistic infections. The fact that these are disparate clusters from multiple phylogroups suggests that these lineages may have become adapted to the poultry niche independently. WGS studies have highlighted the limitations of traditional APEC classification and can now provide a path towards a robust and more meaningful definition of the APEC pathotype. Future studies should focus on characterizing individual APEC populations in detail and using this information to develop improved diagnostics and interventions.
Affiliation(s)
- Jai W Mehat
- Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
| | - Arnoud H M van Vliet
- Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
| | - Roberto M La Ragione
- Department of Pathology and Infectious Diseases, School of Veterinary Medicine, Faculty of Health and Medical Sciences, University of Surrey, Guildford, UK
| |
|
26
|
Stevens MP, Kingsley RA. Salmonella pathogenesis and host-adaptation in farmed animals. Curr Opin Microbiol 2021; 63:52-58. [PMID: 34175673 DOI: 10.1016/j.mib.2021.05.013] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 05/06/2021] [Accepted: 05/28/2021] [Indexed: 10/21/2022]
Abstract
Salmonella is an animal and zoonotic pathogen of global importance. Depending on pathogen and host factors, infections can be asymptomatic or involve acute gastroenteritis or invasive disease. Genomic signatures associated with host-range, tissue tropism or differential virulence of Salmonella enterica serovars, and their variants, have emerged. In turn, it is becoming feasible to predict invasive potential, host-adaptation and zoonotic risk of Salmonella from sequence data to improve outbreak investigation, risk assessment and control strategies. Functional annotation of Salmonella genomes has accelerated with the screening of high-density mutant libraries, revealing host-specific, niche-specific and serovar-specific virulence factors. As natural hosts and reservoirs, farmed animals provide powerful insights into host-adaptation and pathogenesis of Salmonella not always evident from surrogate rodent or cell-based models.
Affiliation(s)
- Mark P Stevens
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Easter Bush, Midlothian, EH25 9RG, United Kingdom.
| | - Robert A Kingsley
- Quadram Institute Bioscience, Norwich Research Park, NR4 7UQ, United Kingdom; School of Biological Science, University of East Anglia, Norwich, NR4 7EA, United Kingdom.
| |
|
27
|
Pavlovikj N, Gomes-Neto JC, Deogun JS, Benson AK. ProkEvo: an automated, reproducible, and scalable framework for high-throughput bacterial population genomics analyses. PeerJ 2021; 9:e11376. [PMID: 34055480 PMCID: PMC8142932 DOI: 10.7717/peerj.11376] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/12/2020] [Accepted: 04/08/2021] [Indexed: 12/28/2022] Open
Abstract
Whole Genome Sequence (WGS) data from bacterial species is used for a variety of applications ranging from basic microbiological research to diagnostics and epidemiological surveillance. The availability of WGS data from hundreds of thousands of individual isolates of individual microbial species poses a tremendous opportunity for discovery and hypothesis-generating research into ecology and evolution of these microorganisms. Flexibility, scalability, and user-friendliness of existing pipelines for population-scale inquiry, however, limit applications of systematic, population-scale approaches. Here, we present ProkEvo, an automated, scalable, reproducible, and open-source framework for bacterial population genomics analyses using WGS data. ProkEvo was specifically developed to achieve the following goals: (1) Automation and scaling of complex combinations of computational analyses for many thousands of bacterial genomes from inputs of raw Illumina paired-end sequence reads; (2) Use of workflow management systems (WMS) such as Pegasus WMS to ensure reproducibility, scalability, modularity, fault-tolerance, and robust file management throughout the process; (3) Use of high-performance and high-throughput computational platforms; (4) Generation of hierarchical-based population structure analysis based on combinations of multi-locus and Bayesian statistical approaches for classification for ecological and epidemiological inquiries; (5) Association of antimicrobial resistance (AMR) genes, putative virulence factors, and plasmids from curated databases with the hierarchically-related genotypic classifications; and (6) Production of pan-genome annotations and data compilation that can be utilized for downstream analysis such as identification of population-specific genomic signatures. The scalability of ProkEvo was measured with two datasets comprising significantly different numbers of input genomes (one with ~2,400 genomes, and the second with ~23,000 genomes).
Depending on the dataset and the computational platform used, the running time of ProkEvo varied from ~3 to ~26 days. ProkEvo can be used with virtually any bacterial species, and the Pegasus WMS uniquely facilitates addition or removal of programs from the workflow or modification of options within them. To demonstrate versatility of the ProkEvo platform, we performed hierarchical-based population structure analyses from available genomes of three distinct pathogenic bacterial species as individual case studies. The specific case studies illustrate how hierarchical analyses of population structures, genotype frequencies, and distribution of specific gene functions can be integrated into an analysis. Collectively, our study shows that ProkEvo presents a practical, viable option for scalable, automated analyses of bacterial populations with direct applications for basic microbiology research, clinical microbiological diagnostics, and epidemiological surveillance.
Affiliation(s)
- Natasha Pavlovikj
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Joao Carlos Gomes-Neto
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America.,Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Jitender S Deogun
- Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| | - Andrew K Benson
- Department of Food Science and Technology, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America.,Nebraska Food for Health Center, University of Nebraska-Lincoln, Lincoln, Nebraska, United States of America
| |
|
28
|
|
29
|
Machine learning approach to support taxonomic species discrimination based on helminth collections data. Parasit Vectors 2021; 14:230. [PMID: 33933139 PMCID: PMC8088700 DOI: 10.1186/s13071-021-04721-6] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/27/2020] [Accepted: 04/07/2021] [Indexed: 11/10/2022] Open
Abstract
Background There are more than 300 species of capillariids that parasitize various vertebrate groups worldwide. Species identification is hindered because of the few taxonomically informative structures available, making the task laborious and genus definition controversial. Thus, its taxonomy is one of the most complex among Nematoda. Eggs are the parasitic structures most viewed in coprological analysis in both modern and ancient samples; consequently, their presence is indicative of positive diagnosis for infection. The structure of the egg could play a role in genera or species discrimination. Institutional biological collections are taxonomic repositories of specimens described and strictly identified by systematics specialists. Methods The present work aims to characterize eggs of capillariid species deposited in institutional helminth collections and to process the morphological, morphometric and ecological data using machine learning (ML) as a new approach for taxonomic identification. Specimens of 28 species and 8 genera deposited at Coleção Helmintológica do Instituto Oswaldo Cruz (CHIOC, IOC/FIOCRUZ/Brazil) and Collection de Nématodes Zooparasites du Muséum National d’Histoire Naturelle de Paris (MNHN/France) were examined under light microscopy. In the morphological and morphometric analyses (MM), the total length and width of eggs as well as plugs and shell thickness were considered. In addition, eggshell ornamentations and ecological parameters of the geographical location (GL) and host (H) were included. Results The performance of the logistic model tree (LMT) algorithm showed the highest values in all metrics compared with the other algorithms. Algorithm J48 produced the most reliable decision tree for species identification alongside REPTree. The Majority Voting algorithm showed high metric values, but the combined classifiers did not attenuate the errors revealed in each algorithm alone. 
The statistical evaluation of the dataset indicated a significant difference between trees, with the GL + H + MM and MM-only feature sets achieving the best scores. Conclusions The present research proposed a novel procedure for taxonomic species identification, integrating data from centenary biological collections and the logic of artificial intelligence techniques. This study will support future research on taxonomic identification and diagnosis of both modern and archaeological capillariids. Supplementary Information The online version contains supplementary material available at 10.1186/s13071-021-04721-6.
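The Majority Voting result reported in this abstract can be pictured with a small sketch of how votes from several classifiers are combined. This is a hedged illustration only: `majority_vote` is a hypothetical helper, the species labels are placeholders, and the per-classifier outputs are invented rather than taken from the study. The classifier names (LMT, J48, REPTree) are the ones the abstract mentions, but the study's actual ensemble configuration is not described here.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for one specimen.

    Ties are broken in favour of the label that appears first in
    `predictions`, i.e. the earliest-listed classifier wins a tie.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Invented outputs for three egg specimens (placeholder species labels).
per_classifier = {
    "LMT": ["species_A", "species_B", "species_A"],
    "J48": ["species_A", "species_A", "species_A"],
    "REPTree": ["species_B", "species_A", "species_A"],
}

# zip(*...) regroups the lists so each tuple holds one specimen's votes.
combined = [majority_vote(votes) for votes in zip(*per_classifier.values())]
print(combined)
```

If the second specimen were truly `species_B`, two of the three classifiers would be wrong in the same way and the vote would repeat their mistake. That is consistent with the abstract's observation that combining classifiers did not attenuate the errors seen in each algorithm alone.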
|
30
|
Pulford CV, Perez-Sepulveda BM, Canals R, Bevington JA, Bengtsson RJ, Wenner N, Rodwell EV, Kumwenda B, Zhu X, Bennett RJ, Stenhouse GE, Malaka De Silva P, Webster HJ, Bengoechea JA, Dumigan A, Tran-Dien A, Prakash R, Banda HC, Alufandika L, Mautanga MP, Bowers-Barnard A, Beliavskaia AY, Predeus AV, Rowe WPM, Darby AC, Hall N, Weill FX, Gordon MA, Feasey NA, Baker KS, Hinton JCD. Stepwise evolution of Salmonella Typhimurium ST313 causing bloodstream infection in Africa. Nat Microbiol 2021; 6:327-338. [PMID: 33349664 PMCID: PMC8018540 DOI: 10.1038/s41564-020-00836-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 18.0] [Received: 06/01/2020] [Accepted: 11/20/2020] [Indexed: 02/07/2023]
Abstract
Bloodstream infections caused by nontyphoidal Salmonella are a major public health concern in Africa, causing ~49,600 deaths every year. The most common Salmonella enterica pathovariant associated with invasive nontyphoidal Salmonella disease is Salmonella Typhimurium sequence type (ST)313. It has been proposed that antimicrobial resistance and genome degradation have contributed to the success of ST313 lineages in Africa, but the evolutionary trajectory of such changes was unclear. Here, to define the evolutionary dynamics of ST313, we sub-sampled from two comprehensive collections of Salmonella isolates from African patients with bloodstream infections, spanning 1966 to 2018. The resulting 680 genome sequences led to the discovery of a pan-susceptible ST313 lineage (ST313 L3), which emerged in Malawi in 2016 and is closely related to ST313 variants that cause gastrointestinal disease in the United Kingdom and Brazil. Genomic analysis revealed degradation events in important virulence genes in ST313 L3, which had not occurred in other ST313 lineages. Despite arising only recently in the clinic, ST313 L3 is a phylogenetic intermediate between ST313 L1 and L2, with a characteristic accessory genome. Our in-depth genotypic and phenotypic characterization identifies the crucial loss-of-function genetic events that occurred during the stepwise evolution of invasive S. Typhimurium across Africa.
Affiliation(s)
- Caisey V Pulford
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Blanca M Perez-Sepulveda
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Rocío Canals
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Jessica A Bevington
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Rebecca J Bengtsson
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Nicolas Wenner
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Ella V Rodwell
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | | | - Xiaojun Zhu
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Rebecca J Bennett
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - George E Stenhouse
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - P Malaka De Silva
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Hermione J Webster
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Jose A Bengoechea
- Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, UK
| | - Amy Dumigan
- Wellcome-Wolfson Institute for Experimental Medicine, Queen's University Belfast, Belfast, UK
| | - Alicia Tran-Dien
- Institut Pasteur, Unité des Bactéries Pathogènes Entériques, Paris, France
| | - Reenesh Prakash
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Happy C Banda
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Lovemore Alufandika
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Mike P Mautanga
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Arthur Bowers-Barnard
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Alexandra Y Beliavskaia
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Alexander V Predeus
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Will P M Rowe
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Alistair C Darby
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Neil Hall
- Earlham Institute, Norwich Research Park, Norwich, UK
| | | | - Melita A Gordon
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Nicholas A Feasey
- Malawi-Liverpool-Wellcome Trust Clinical Research Programme, Blantyre, Malawi
| | - Kate S Baker
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK
| | - Jay C D Hinton
- Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK.
| |
|
31
|
Kubicek-Sutherland JZ, Xie G, Shakya M, Dighe PK, Jacobs LL, Daligault H, Davenport K, Stromberg LR, Stromberg ZR, Cheng Q, Kempaiah P, Ong’echa JM, Otieno V, Raballah E, Anyona S, Ouma C, Chain PSG, Perkins DJ, Mukundan H, McMahon BH, Doggett NA. Comparative genomic and phenotypic characterization of invasive non-typhoidal Salmonella isolates from Siaya, Kenya. PLoS Negl Trop Dis 2021; 15:e0008991. [PMID: 33524010 PMCID: PMC7877762 DOI: 10.1371/journal.pntd.0008991] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/20/2020] [Revised: 02/11/2021] [Accepted: 11/17/2020] [Indexed: 02/07/2023] Open
Abstract
Non-typhoidal Salmonella (NTS) is a major global health concern that often causes bloodstream infections in areas of the world affected by malnutrition and comorbidities such as HIV and malaria. Developing a strategy to control the emergence and spread of highly invasive and antimicrobial resistant NTS isolates requires a comprehensive analysis of epidemiological factors and molecular pathogenesis. Here, we characterize 11 NTS isolates that caused bloodstream infections in pediatric patients in Siaya, Kenya from 2003 to 2010. Nine isolates were identified as S. Typhimurium sequence type 313 while the other two were S. Enteritidis. Comprehensive genotypic and phenotypic analyses were performed to compare these isolates to those previously identified in sub-Saharan Africa. We identified a S. Typhimurium isolate referred to as UGA14 that displayed novel plasmid, pseudogene and resistance features as compared to other isolates reported from Africa. Notably, UGA14 is able to ferment both lactose and sucrose due to the acquisition of insertion elements on the pKST313 plasmid. These findings show for the first time the co-evolution of plasmid-mediated lactose and sucrose metabolism along with cephalosporin resistance in NTS, further elucidating the evolutionary mechanisms of invasive NTS phenotypes. These results further support the use of combined genomic and phenotypic approaches to detect and characterize atypical NTS isolates in order to advance biosurveillance efforts that inform countermeasures aimed at controlling invasive and antimicrobial resistant NTS.
Affiliation(s)
- Gary Xie
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
- Migun Shakya
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
- Priya K. Dighe
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
- Lindsey L. Jacobs
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
- Karen Davenport
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
- Qiuying Cheng
  - Center for Global Health, University of New Mexico, Albuquerque, New Mexico, United States of America
- Prakasha Kempaiah
  - Center for Global Health, University of New Mexico, Albuquerque, New Mexico, United States of America
- John Michael Ong’echa
  - University of New Mexico/KEMRI Laboratories of Parasitic and Viral Diseases, Kenya Medical Research Institute, Kisumu, Kenya
- Vincent Otieno
  - University of New Mexico/KEMRI Laboratories of Parasitic and Viral Diseases, Kenya Medical Research Institute, Kisumu, Kenya
- Evans Raballah
  - University of New Mexico/KEMRI Laboratories of Parasitic and Viral Diseases, Kenya Medical Research Institute, Kisumu, Kenya
  - Department of Medical Laboratory Science, School of Public Health, Biomedical Sciences and Technology, Masinde Muliro University of Science and Technology, Kakamega, Kenya
- Samuel Anyona
  - University of New Mexico/KEMRI Laboratories of Parasitic and Viral Diseases, Kenya Medical Research Institute, Kisumu, Kenya
  - Department of Medical Biochemistry, School of Medicine, Maseno University, Maseno, Kenya
- Collins Ouma
  - University of New Mexico/KEMRI Laboratories of Parasitic and Viral Diseases, Kenya Medical Research Institute, Kisumu, Kenya
  - Department of Biomedical Sciences and Technology, School of Public Health and Community Development, Maseno University, Maseno, Kenya
- Douglas J. Perkins
  - Center for Global Health, University of New Mexico, Albuquerque, New Mexico, United States of America
  - University of New Mexico/KEMRI Laboratories of Parasitic and Viral Diseases, Kenya Medical Research Institute, Kisumu, Kenya
- Harshini Mukundan
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
- Norman A. Doggett
  - Los Alamos National Laboratory, Los Alamos, New Mexico, United States
32
Abstract
Food safety continues to threaten public health. Machine learning holds potential in leveraging large, emerging data sets to improve the safety of the food supply and mitigate the impact of food safety incidents. Foodborne pathogen genomes and novel data streams, including text, transactional, and trade data, have seen emerging applications enabled by a machine learning approach, such as prediction of antibiotic resistance, source attribution of pathogens, and foodborne outbreak detection and risk assessment. In this article, we provide a gentle introduction to machine learning in the context of food safety and an overview of recent developments and applications. With many of these applications still in their nascence, general and domain-specific pitfalls and challenges associated with machine learning have begun to be recognized and addressed, which are critical to prospective use and future deployment of large data sets and their associated machine learning models for food safety applications.
Affiliation(s)
- Xiangyu Deng
  - Center for Food Safety, University of Georgia, Griffin, Georgia 30223, USA
- Shuhao Cao
  - Department of Mathematics and Statistics, Washington University, St. Louis, Missouri 63105, USA
- Abigail L Horn
  - Department of Preventive Medicine, University of Southern California, Los Angeles, California 90032, USA
33
Machine learning and statistics to qualify environments through multi-traits in Coffea arabica. PLoS One 2021; 16:e0245298. [PMID: 33434204 PMCID: PMC7802962 DOI: 10.1371/journal.pone.0245298] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/21/2020] [Accepted: 12/25/2020] [Indexed: 11/30/2022]
Abstract
Several factors, such as genotype, environment, and post-harvest processing, can affect the responses of important traits in the coffee production chain. Determining the influence of these factors is of great relevance, as they can be indicators of the characteristics of the coffee produced. The choice of the most efficient models should take into account the variety of information and the particularities of each biological material. This study was developed to evaluate statistical and machine learning models that would better discriminate environments through multi-traits of coffee genotypes and to identify the main agronomic and beverage quality traits responsible for the variation among environments. For that, 31 morpho-agronomic and post-harvest traits were evaluated in field experiments installed in three municipalities in the Matas de Minas region, in the State of Minas Gerais, Brazil. Two types of post-harvest processing were evaluated: natural and pulped. The apparent error rate was estimated for each method. The Multilayer Perceptron and Radial Basis Function networks were able to discriminate the coffee samples across environments more efficiently than the other methods, identifying differences in multi-trait responses according to the production sites and type of post-harvest processing. The local factors did not present specific traits that favored the severity of diseases and differentiated vegetative vigor. The sensory traits acidity and fragrance/aroma score also made little contribution to the discrimination process, indicating that acidity and fragrance/aroma are characteristic of the coffee produced and that all coffee samples evaluated are of the special type in the Matas de Minas region. The main traits responsible for the differentiation of production sites are plant height, fruit size, and bean production. The sensory trait "Body" is the main one that discriminates the form of post-harvest processing.
34
Abstract
A balanced gut microbiota contributes to health, but the mechanisms maintaining homeostasis remain elusive. Microbiota assembly during infancy is governed by competition between species and by environmental factors, termed habitat filters, that determine the range of successful traits within the microbial community. These habitat filters include the diet, host-derived resources, and microbiota-derived metabolites, such as short-chain fatty acids. Once the microbiota has matured, competition and habitat filtering prevent engraftment of new microbes, thereby providing protection against opportunistic infections. Competition with endogenous Enterobacterales, habitat filtering by short-chain fatty acids, and a host-derived habitat filter, epithelial hypoxia, also contribute to colonization resistance against Salmonella serovars. However, at a high challenge dose, these frank pathogens can overcome colonization resistance by using their virulence factors to trigger intestinal inflammation. In turn, inflammation increases the luminal availability of host-derived resources, such as oxygen, nitrate, tetrathionate, and lactate, thereby creating a state of abnormal habitat filtering that enables the pathogen to overcome growth inhibition by short-chain fatty acids. Thus, studying the process of ecosystem invasion by Salmonella serovars clarifies that colonization resistance can become weakened by disrupting host-mediated habitat filtering. This insight is relevant for understanding how inflammation triggers dysbiosis linked to noncommunicable diseases, conditions in which endogenous Enterobacterales expand in the fecal microbiota using some of the same growth-limiting resources required by Salmonella serovars for ecosystem invasion. In essence, ecosystem invasion by Salmonella serovars suggests that homeostasis and dysbiosis simply represent states where competition and habitat filtering are normal or abnormal, respectively.
35
Higdon SM, Huang BC, Bennett AB, Weimer BC. Identification of Nitrogen Fixation Genes in Lactococcus Isolated from Maize Using Population Genomics and Machine Learning. Microorganisms 2020; 8:2043. [PMID: 33419343 PMCID: PMC7768417 DOI: 10.3390/microorganisms8122043] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/10/2020] [Revised: 12/08/2020] [Accepted: 12/17/2020] [Indexed: 02/06/2023]
Abstract
Sierra Mixe maize is a landrace variety from Oaxaca, Mexico, that utilizes nitrogen derived from the atmosphere via an undefined nitrogen fixation mechanism. The diazotrophic microbiota associated with the plant’s mucilaginous aerial root exudate composed of complex carbohydrates was previously identified and characterized by our group where we found 23 lactococci capable of biological nitrogen fixation (BNF) without containing any of the proposed essential genes for this trait (nifHDKENB). To determine the genes in Lactococcus associated with this phenotype, we selected 70 lactococci from the dairy industry that are not known to be diazotrophic to conduct a comparative population genomic analysis. This showed that the diazotrophic lactococcal genomes were distinctly different from the dairy isolates. Examining the pangenome followed by genome-wide association study and machine learning identified genes with the functions needed for BNF in the maize isolates that were absent from the dairy isolates. Many of the putative genes received an ‘unknown’ annotation, which led to the domain analysis of the 135 homologs. This revealed genes with molecular functions needed for BNF, including mucilage carbohydrate catabolism, glycan-mediated host adhesion, iron/siderophore utilization, and oxidation/reduction control. This is the first report of this pathway in this organism to underpin BNF. Consequently, we proposed a model needed for BNF in lactococci that plausibly accounts for BNF in the absence of the nif operon in this organism.
Affiliation(s)
- Shawn M. Higdon
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Bihua C. Huang
  - Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
  - 100 K Pathogen Genome Project, University of California, Davis, CA 95616, USA
- Alan B. Bennett
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
- Bart C. Weimer
  - Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
  - 100 K Pathogen Genome Project, University of California, Davis, CA 95616, USA
36
Genetic Variation and Preliminary Indications of Divergent Niche Adaptation in Cryptic Clade II of Escherichia. Microorganisms 2020; 8:1713. [PMID: 33142902 PMCID: PMC7716201 DOI: 10.3390/microorganisms8111713] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/30/2020] [Revised: 10/24/2020] [Accepted: 10/30/2020] [Indexed: 12/03/2022]
Abstract
The evolution, habitat, and lifestyle of the cryptic clade II of Escherichia, which was first recovered at low frequency from non-human hosts and later from external environments, have been poorly understood. Here, the genomes of selected strains were analyzed for preliminary indications of ecological differentiation within their population. We adopted the delta bitscore metric to detect functional divergence of their orthologous genes and trained a random forest classifier to differentiate the genomes according to habitat (gastrointestinal vs external environment). The model was built with the inclusion of other Escherichia genomes previously demonstrated to exhibit genomic traits of adaptation to one of the habitats. Overall, gene degradation was more prominent in the gastrointestinal strains. The trained model correctly classified the genomes, identifying a set of predictor genes that were informative of habitat association. Functional divergence in many of these genes was reflective of ecological divergence. The accuracy of the trained model was confirmed by its correct prediction of the habitats of an independent set of strains with known habitat association. In summary, the cryptic clade II of Escherichia displayed genomic signatures that are consistent with divergent adaptation to gastrointestinal and external environments.
37
Abstract
Variation in the genome of Pseudomonas aeruginosa, an important pathogen, can have dramatic impacts on the bacterium's ability to cause disease. We therefore asked whether it was possible to predict the virulence of P. aeruginosa isolates based on their genomic content. We applied a machine learning approach to a genetically and phenotypically diverse collection of 115 clinical P. aeruginosa isolates using genomic information and corresponding virulence phenotypes in a mouse model of bacteremia. We defined the accessory genome of these isolates through the presence or absence of accessory genomic elements (AGEs), sequences present in some strains but not others. Machine learning models trained using AGEs were predictive of virulence, with a mean nested cross-validation accuracy of 75% using the random forest algorithm. However, individual AGEs did not have a large influence on the algorithm's performance, suggesting instead that virulence predictions are derived from a diffuse genomic signature. These results were validated with an independent test set of 25 P. aeruginosa isolates whose virulence was predicted with 72% accuracy. Machine learning models trained using core genome single-nucleotide variants and whole-genome k-mers also predicted virulence. Our findings are a proof of concept for the use of bacterial genomes to predict pathogenicity in P. aeruginosa and highlight the potential of this approach for predicting patient outcomes. IMPORTANCE Pseudomonas aeruginosa is a clinically important Gram-negative opportunistic pathogen. P. aeruginosa shows a large degree of genomic heterogeneity both through variation in sequences found throughout the species (core genome) and through the presence or absence of sequences in different isolates (accessory genome). P. aeruginosa isolates also differ markedly in their ability to cause disease. In this study, we used machine learning to predict the virulence level of P. aeruginosa isolates in a mouse bacteremia model based on genomic content. We show that both the accessory and core genomes are predictive of virulence. This study provides a machine learning framework to investigate relationships between bacterial genomes and complex phenotypes such as virulence.
38
Agany DD, Pietri JE, Gnimpieba EZ. Assessment of vector-host-pathogen relationships using data mining and machine learning. Comput Struct Biotechnol J 2020; 18:1704-1721. [PMID: 32670510 PMCID: PMC7340972 DOI: 10.1016/j.csbj.2020.06.031] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/02/2020] [Revised: 06/19/2020] [Accepted: 06/19/2020] [Indexed: 12/15/2022]
Abstract
Infectious diseases, including vector-borne diseases transmitted by arthropods, are a leading cause of morbidity and mortality worldwide. In the era of big data, addressing broad-scale, fundamental questions regarding the complex dynamics of these diseases will increasingly require the integration of diverse datasets to produce new biological knowledge. This review provides a current snapshot of the systematic assessment of the relationships between microbial pathogens, arthropod vectors and mammalian hosts using data mining and machine learning. We employ PRISMA to identify 32 key papers relevant to this topic. Our analysis shows an increasing use of data mining and machine learning tasks and techniques, including prediction, classification, clustering, association rules mining, and deep learning, over the last decade. However, it also reveals a number of critical challenges in applying these to the study of vector-host-pathogen interactions at various systems biology levels. Here, relevant studies, current limitations and future directions are discussed. Furthermore, the quality of data in relevant papers was assessed using the FAIR (Findable, Accessible, Interoperable, Reusable) compliance criteria to evaluate and encourage reproducibility and shareability of research outcomes. Although shortcomings in their application remain, data mining and machine learning have significant potential to break new ground in understanding fundamental aspects of vector-host-pathogen relationships and their application in this field should be encouraged. In particular, while predictive modeling, feature engineering and supervised machine learning are already being used in the field, other data mining and machine learning methods such as deep learning and association rules analysis lag behind and should be implemented in combination with established methods to accelerate hypothesis and knowledge generation in the domain.
Affiliation(s)
- Diing D.M. Agany
  - University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
  - 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
- Jose E. Pietri
  - University of South Dakota, Sanford School of Medicine, Division of Basic Biomedical Sciences, Vermillion, SD, United States
- Etienne Z. Gnimpieba
  - University of South Dakota, Biomedical Engineering Program, Sioux Falls, SD, United States
  - 2DBEST (2-Dimensional Materials for Biofilm Engineering, Science and Technology), United States
39
Bawn M, Alikhan NF, Thilliez G, Kirkwood M, Wheeler NE, Petrovska L, Dallman TJ, Adriaenssens EM, Hall N, Kingsley RA. Evolution of Salmonella enterica serotype Typhimurium driven by anthropogenic selection and niche adaptation. PLoS Genet 2020; 16:e1008850. [PMID: 32511244 PMCID: PMC7302871 DOI: 10.1371/journal.pgen.1008850] [Citation(s) in RCA: 39] [Impact Index Per Article: 9.8] [Received: 01/22/2020] [Revised: 06/18/2020] [Accepted: 05/12/2020] [Indexed: 12/25/2022]
Abstract
Salmonella enterica serotype Typhimurium (S. Typhimurium) is a leading cause of gastroenteritis and bacteraemia worldwide, and a model organism for the study of host-pathogen interactions. Two S. Typhimurium strains (SL1344 and ATCC14028) are widely used to study host-pathogen interactions, yet genotypic variation results in strains with diverse host range, pathogenicity and risk to food safety. The population structure of diverse strains of S. Typhimurium revealed a major phylogroup of predominantly sequence type 19 (ST19) and a minor phylogroup of ST36. The major phylogroup had a population structure with two high-order clades (α and β) and multiple subclades on extended internal branches that exhibited distinct signatures of host adaptation and anthropogenic selection. Clade α contained a number of subclades composed of strains from well-characterized epidemics in domesticated animals, while clade β contained multiple subclades associated with wild avian species. The contrasting epidemiology of strains in clades α and β was reflected by the distinct distribution of antimicrobial resistance (AMR) genes, accumulation of hypothetically disrupted coding sequences (HDCS), and signatures of functional diversification. These observations were consistent with elevated anthropogenic selection of clade α lineages from adaptation to circulation in populations of domesticated livestock, and the predisposition of clade β lineages to undergo adaptation to an invasive lifestyle by a process of convergent evolution with host-adapted Salmonella serotypes. Gene flux was predominantly driven by acquisition and recombination of prophage and associated cargo genes, with only occasional loss of these elements. The acquisition of large chromosomally encoded genetic islands was limited but was, notably, a feature of two recent pandemic clones (DT104 and monophasic S. Typhimurium ST34) of clade α (SGI-1 and SGI-4).
Affiliation(s)
- Matt Bawn
  - Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom
  - Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Gaëtan Thilliez
  - Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom
- Mark Kirkwood
  - Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom
- Nicole E. Wheeler
  - Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Cambridge, United Kingdom
- Timothy J. Dallman
  - Gastrointestinal Bacteria Reference Unit, National Infection Service, Public Health England, London, United Kingdom
- Neil Hall
  - Earlham Institute, Norwich Research Park, Norwich, United Kingdom
- Robert A. Kingsley
  - Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom
  - University of East Anglia, Norwich, United Kingdom
40
Abstract
Rhinology studies the anatomy, physiology and diseases of the nasal region. One of the most modern techniques for diagnosing these diseases is nasal cytology, or rhinocytology, which involves analyzing the cells contained in the nasal mucosa under a microscope and searching for other elements, such as bacteria, that may point to a pathology. During microscopic observation, bacteria can be detected in the form of biofilm, that is, a bacterial colony surrounded by a protective organic extracellular matrix made of polysaccharides. In the field of nasal cytology, the presence of biofilm in microscopic samples denotes the presence of an infection. In this paper, we describe the design and testing of a diagnostic support for the automatic detection of biofilm, based on a convolutional neural network (CNN). To demonstrate the reliability of the system, alternative solutions based on isolation forest and deep random forest techniques were also tested. Texture analysis is used, with Haralick feature extraction and dominant color. The CNN-based biofilm detection system shows an accuracy of about 98%, an average accuracy of about 100% on the test set and about 99% on the validation set. The CNN-based system designed in this study is confirmed as the most reliable among the best automatic image recognition technologies, in the specific context of this study. The developed system allows the specialist to obtain a rapid and accurate identification of the biofilm in the slide images.
41
Nair S, Fookes M, Corton C, Thomson NR, Wain J, Langridge GC. Genetic Markers in S. Paratyphi C Reveal Primary Adaptation to Pigs. Microorganisms 2020; 8:657. [PMID: 32365926 PMCID: PMC7285187 DOI: 10.3390/microorganisms8050657] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/17/2020] [Revised: 04/24/2020] [Accepted: 04/27/2020] [Indexed: 12/24/2022]
Abstract
Salmonella enterica with the identical antigenic formula 6,7:c:1,5 can be differentiated biochemically and by disease syndrome. One grouping, Salmonella Paratyphi C, is currently considered a typhoidal serovar, responsible for enteric fever in humans. The human-restricted typhoidal serovars (S. Typhi and Paratyphi A, B and C) typically display high levels of genome degradation and are cited as an example of convergent evolution for host adaptation in humans. However, S. Paratyphi C presents a different clinical picture to S. Typhi/Paratyphi A, in a patient group with predisposition, raising the possibility that its natural history is different, and that infection is invasive salmonellosis rather than enteric fever. Using whole genome sequencing and metabolic pathway analysis, we compared the genomes of 17 S. Paratyphi C strains to other members of the 6,7:c:1,5 group and to two typhoidal serovars: S. Typhi and Paratyphi A. The genome degradation observed in S. Paratyphi C was much lower than in S. Typhi/Paratyphi A, but similar to that of the other 6,7:c:1,5 strains. Genomic and metabolic comparisons revealed little to no overlap between S. Paratyphi C and the other typhoidal serovars, arguing against convergent evolution and instead providing evidence of a primary adaptation to pigs in accordance with the 6,7:c:1,5 strains.
Affiliation(s)
- Satheesh Nair
  - Gastrointestinal Bacteria Reference Unit, Public Health England, Colindale, London NW9 5EQ, UK
- Maria Fookes
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Craig Corton
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- Nicholas R. Thomson
  - Wellcome Trust Sanger Institute, Hinxton, Cambridge CB10 1SA, UK
- John Wain
  - Norwich Medical School, University of East Anglia, Norwich NR4 7UQ, UK
  - Microbes in the Food Chain, Quadram Institute, Norwich Research Park, Norwich NR4 7UQ, UK
- Gemma C. Langridge
  - Microbes in the Food Chain, Quadram Institute, Norwich Research Park, Norwich NR4 7UQ, UK
42
Lupolova N, Lycett SJ, Gally DL. A guide to machine learning for bacterial host attribution using genome sequence data. Microb Genom 2020; 5. [PMID: 31778355 PMCID: PMC6939162 DOI: 10.1099/mgen.0.000317] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/18/2022]
Abstract
With the ever-expanding number of available sequences from bacterial genomes, and the expectation that this data type will be the primary one generated by both diagnostic and research laboratories for the foreseeable future, there is both an opportunity and a need to evaluate how effectively computational approaches can be used within bacterial genomics to predict and understand complex phenotypes, such as pathogenic potential and host source. This article applied various quantitative methods, such as diversity indexes, pangenome-wide association studies (GWAS) and dimensionality reduction techniques, to better understand the data and then compared how well unsupervised and supervised machine learning (ML) methods could predict the source host of the isolates. The study uses the example of the pangenomes of 1203 Salmonella enterica serovar Typhimurium isolates to predict 'host of isolation' using these different methods. The article is intended as a review of recent applications of ML in infection biology, but also, by working through this specific dataset, it allows discussion of the advantages and drawbacks of the different techniques. As with all such sub-population studies, the biological relevance will depend on the quality and diversity of the input data. Given this major caveat, we show that supervised ML has the potential to add real value to the interpretation of bacterial genomic data, as it can provide probabilistic outcomes for important phenotypes, something that is very difficult to achieve with the other methods.
Affiliation(s)
- Nadejda Lupolova
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- Samantha J Lycett
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
- David L Gally
  - Division of Infection and Immunity, The Roslin Institute, University of Edinburgh, Easter Bush Campus, Edinburgh, EH25 9RG, UK
43
Khaledi A, Weimann A, Schniederjans M, Asgari E, Kuo TH, Oliver A, Cabot G, Kola A, Gastmeier P, Hogardt M, Jonas D, Mofrad MR, Bremges A, McHardy AC, Häussler S. Predicting antimicrobial resistance in Pseudomonas aeruginosa with machine learning-enabled molecular diagnostics. EMBO Mol Med 2020; 12:e10264. [PMID: 32048461 PMCID: PMC7059009 DOI: 10.15252/emmm.201910264] [Citation(s) in RCA: 90] [Impact Index Per Article: 22.5] [Received: 01/02/2019] [Revised: 12/24/2019] [Accepted: 01/09/2020] [Indexed: 12/20/2022]
Abstract
Limited therapy options due to antibiotic resistance underscore the need for optimization of current diagnostics. In some bacterial species, antimicrobial resistance can be unambiguously predicted based on their genome sequence. In this study, we sequenced the genomes and transcriptomes of 414 drug‐resistant clinical Pseudomonas aeruginosa isolates. By training machine learning classifiers on information about the presence or absence of genes, their sequence variation, and expression profiles, we generated predictive models and identified biomarkers of resistance to four commonly administered antimicrobial drugs. Using these data types alone or in combination resulted in high (0.8–0.9) or very high (> 0.9) sensitivity and predictive values. For all drugs except for ciprofloxacin, gene expression information improved diagnostic performance. Our results pave the way for the development of a molecular resistance profiling tool that reliably predicts antimicrobial susceptibility based on genomic and transcriptomic markers. The implementation of a molecular susceptibility test system in routine microbiology diagnostics holds promise to provide earlier and more detailed information on antibiotic resistance profiles of bacterial pathogens and thus could change how physicians treat bacterial infections.
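The abstract above reports performance as sensitivity and predictive values. As a minimal sketch of how those quantities follow from a confusion matrix, the Python below computes them for binary resistance labels. The function name `diagnostic_metrics` and the toy labels are hypothetical illustrations and are not taken from the study.

```python
def diagnostic_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary labels (1 = resistant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # resistant, called resistant
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # susceptible, called susceptible
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # susceptible, called resistant
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # resistant, called susceptible

    def ratio(num, den):
        # Return NaN rather than raising when a class is absent.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # share of resistant isolates detected
        "specificity": ratio(tn, tn + fp),  # share of susceptible isolates detected
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


# Hypothetical toy data: 4 resistant and 6 susceptible isolates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = diagnostic_metrics(y_true, y_pred)
print(metrics)
```

Unlike sensitivity and specificity, the predictive values depend on how common resistance is in the tested collection, so they may not carry over to a population with a different resistance prevalence.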
Affiliation(s)
- Ariane Khaledi
  - Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Bacteriology Group, TWINCORE-Centre for Experimental and Clinical Infection Research, Hannover, Germany
- Aaron Weimann
  - Molecular Bacteriology Group, TWINCORE-Centre for Experimental and Clinical Infection Research, Hannover, Germany
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Monika Schniederjans
  - Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Bacteriology Group, TWINCORE-Centre for Experimental and Clinical Infection Research, Hannover, Germany
- Ehsaneddin Asgari
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
- Tzu-Hao Kuo
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
- Antonio Oliver
  - Servicio de Microbiología y Unidad de Investigación Hospital Universitario Son Espases, Instituto de Investigación Sanitaria Illes Balears (IdISPa), Palma de Mallorca, Spain
- Gabriel Cabot
  - Servicio de Microbiología y Unidad de Investigación Hospital Universitario Son Espases, Instituto de Investigación Sanitaria Illes Balears (IdISPa), Palma de Mallorca, Spain
- Axel Kola
  - Institute of Hygiene and Environmental Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Petra Gastmeier
  - Institute of Hygiene and Environmental Medicine, Charité - Universitätsmedizin Berlin, Berlin, Germany
- Michael Hogardt
  - Institute of Medical Microbiology and Infection Control, University Hospital Frankfurt, Frankfurt/Main, Germany
- Daniel Jonas
  - Faculty of Medicine, Institute for Infection Prevention and Hospital Epidemiology, Medical Center-University of Freiburg, Freiburg, Germany
- Mohammad Rk Mofrad
  - Molecular Cell Biomechanics Laboratory, Departments of Bioengineering and Mechanical Engineering, University of California, Berkeley, CA, USA
  - Molecular Biophysics and Integrated Bioimaging Division, Lawrence Berkeley National Lab, Berkeley, CA, USA
- Andreas Bremges
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Alice C McHardy
  - Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - German Center for Infection Research (DZIF), Braunschweig, Germany
- Susanne Häussler
  - Department of Molecular Bacteriology, Helmholtz Centre for Infection Research, Braunschweig, Germany
  - Molecular Bacteriology Group, TWINCORE-Centre for Experimental and Clinical Infection Research, Hannover, Germany
44
Van Puyvelde S, Pickard D, Vandelannoote K, Heinz E, Barbé B, de Block T, Clare S, Coomber EL, Harcourt K, Sridhar S, Lees EA, Wheeler NE, Klemm EJ, Kuijpers L, Mbuyi Kalonji L, Phoba MF, Falay D, Ngbonda D, Lunguya O, Jacobs J, Dougan G, Deborggraeve S. An African Salmonella Typhimurium ST313 sublineage with extensive drug-resistance and signatures of host adaptation. Nat Commun 2019; 10:4280. [PMID: 31537784 PMCID: PMC6753159 DOI: 10.1038/s41467-019-11844-z] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 01/16/2019] [Accepted: 08/07/2019] [Indexed: 12/22/2022]
Abstract
Bloodstream infections by Salmonella enterica serovar Typhimurium constitute a major health burden in sub-Saharan Africa (SSA). These invasive non-typhoidal (iNTS) infections are dominated by isolates of the antibiotic resistance-associated sequence type (ST) 313. Here, we report emergence of ST313 sublineage II.1 in the Democratic Republic of the Congo. Sublineage II.1 exhibits extensive drug resistance, involving a combination of multidrug resistance, extended spectrum β-lactamase production and azithromycin resistance. ST313 lineage II.1 isolates harbour an IncHI2 plasmid we name pSTm-ST313-II.1, with one isolate also exhibiting decreased ciprofloxacin susceptibility. Whole genome sequencing reveals that ST313 II.1 isolates have accumulated genetic signatures potentially associated with altered pathogenicity and host adaptation, related to changes observed in biofilm formation and metabolic capacity. Sublineage II.1 emerged at the beginning of the 21st century and is involved in on-going outbreaks. Our data provide evidence of further evolution within the ST313 clade associated with iNTS in SSA.
Affiliation(s)
- Sandra Van Puyvelde
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Laboratory of Medical Microbiology, Vaccine & Infectious Disease Institute, University of Antwerp, Antwerp, Belgium
- Derek Pickard
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Koen Vandelannoote
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Eva Heinz
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Vector Biology, Liverpool School of Tropical Medicine, Pembroke Place, Liverpool, L3 5QA, UK
- Barbara Barbé
  - Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Tessa de Block
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
- Simon Clare
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Eve L Coomber
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Katherine Harcourt
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Sushmita Sridhar
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Emily A Lees
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Nicole E Wheeler
  - Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Elizabeth J Klemm
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Laura Kuijpers
  - Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
  - Department of Microbiology and Immunology, KU Leuven, Herestraat 49-box 1030, 3000, Leuven, Belgium
- Lisette Mbuyi Kalonji
  - Department of Microbiology, National Institute for Biomedical Research, Av. De La Démocratie no, 5345, Kinshasa, Democratic Republic of the Congo
  - Department of Microbiology, University Hospital of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Marie-France Phoba
  - Department of Microbiology, National Institute for Biomedical Research, Av. De La Démocratie no, 5345, Kinshasa, Democratic Republic of the Congo
  - Department of Microbiology, University Hospital of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Dadi Falay
  - Department of Pediatrics, University Hospital of Kisangani, Avenue Munyororo C/Makiso, Kisangani, BP 2012, Democratic Republic of the Congo
- Dauly Ngbonda
  - Department of Pediatrics, University Hospital of Kisangani, Avenue Munyororo C/Makiso, Kisangani, BP 2012, Democratic Republic of the Congo
- Octavie Lunguya
  - Department of Microbiology, National Institute for Biomedical Research, Av. De La Démocratie no, 5345, Kinshasa, Democratic Republic of the Congo
  - Department of Microbiology, University Hospital of Kinshasa, Kinshasa, Democratic Republic of the Congo
- Jan Jacobs
  - Department of Clinical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
  - Department of Microbiology and Immunology, KU Leuven, Herestraat 49-box 1030, 3000, Leuven, Belgium
- Gordon Dougan
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
  - Department of Medicine, Addenbrooke's Hospital, University of Cambridge, Cambridge, CB2 0SP, UK
- Stijn Deborggraeve
  - Department of Biomedical Sciences, Institute of Tropical Medicine, Nationalestraat 155, 2000, Antwerp, Belgium
45. Vilne B, Meistere I, Grantiņa-Ieviņa L, Ķibilds J. Machine Learning Approaches for Epidemiological Investigations of Food-Borne Disease Outbreaks. Front Microbiol 2019; 10:1722. [PMID: 31447800 PMCID: PMC6691741 DOI: 10.3389/fmicb.2019.01722] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/07/2019] [Accepted: 07/12/2019] [Indexed: 12/14/2022]
Abstract
Foodborne diseases (FBDs) are infections of the gastrointestinal tract caused by foodborne pathogens (FBPs) such as bacteria [Salmonella, Listeria monocytogenes and Shiga toxin-producing E. coli (STEC)] and several viruses, but also parasites and some fungi. Artificial intelligence (AI) and its sub-discipline machine learning (ML) are re-emerging and gaining an ever increasing popularity in the scientific community and industry, and could lead to actionable knowledge in diverse ranges of sectors including epidemiological investigations of FBD outbreaks and antimicrobial resistance (AMR). As genotyping using whole-genome sequencing (WGS) is becoming more accessible and affordable, it is increasingly used as a routine tool for the detection of pathogens, and has the potential to differentiate between outbreak strains that are closely related, identify virulence/resistance genes and provide improved understanding of transmission events within hours to days. In most cases, the computational pipeline of WGS data analysis can be divided into four (though, not necessarily consecutive) major steps: de novo genome assembly, genome characterization, comparative genomics, and inference of phylogeny or phylogenomics. In each step, ML could be used to increase the speed and potentially the accuracy (provided increasing amounts of high-quality input data) of identification of the source of ongoing outbreaks, leading to more efficient treatment and prevention of additional cases. In this review, we explore whether ML or any other form of AI algorithms have already been proposed for the respective tasks and compare those with mechanistic model-based approaches.
Affiliation(s)
- Baiba Vilne
  - Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
  - SIA net-OMICS, Riga, Latvia
- Irēna Meistere
  - Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
- Juris Ķibilds
  - Institute of Food Safety, Animal Health and Environment “BIOR”, Riga, Latvia
46. Computational Health Engineering Applied to Model Infectious Diseases and Antimicrobial Resistance Spread. APPLIED SCIENCES-BASEL 2019. [DOI: 10.3390/app9122486] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
Infectious diseases are the primary cause of mortality worldwide. The dangers of infectious disease are compounded with antimicrobial resistance, which remains the greatest concern for human health. Although novel approaches are under investigation, the World Health Organization predicts that by 2050, septicaemia caused by antimicrobial resistant bacteria could result in 10 million deaths per year. One of the main challenges in medical microbiology is to develop novel experimental approaches, which enable a better understanding of bacterial infections and antimicrobial resistance. After the introduction of whole genome sequencing, there was a great improvement in bacterial detection and identification, which also enabled the characterization of virulence factors and antimicrobial resistance genes. Today, the use of in silico experiments jointly with computational and machine learning offer an in depth understanding of systems biology, allowing us to use this knowledge for the prevention, prediction, and control of infectious disease. Herein, the aim of this review is to discuss the latest advances in human health engineering and their applicability in the control of infectious diseases. An in-depth knowledge of host–pathogen–protein interactions, combined with a better understanding of a host’s immune response and bacterial fitness, are key determinants for halting infectious diseases and antimicrobial resistance dissemination.
47. Wheeler NE, Blackmore T, Reynolds AD, Midwinter AC, Marshall J, French NP, Savoian MS, Gardner PP, Biggs PJ. Genomic correlates of extraintestinal infection are linked with changes in cell morphology in Campylobacter jejuni. Microb Genom 2019; 5:e000251. [PMID: 30777818 PMCID: PMC6421344 DOI: 10.1099/mgen.0.000251] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/25/2018] [Accepted: 12/16/2018] [Indexed: 12/12/2022]
Abstract
Campylobacter jejuni is the most common cause of bacterial diarrheal disease in the world. Clinical outcomes of infection can range from asymptomatic infection to life-threatening extraintestinal infections. This variability in outcomes for infected patients has raised questions as to whether genetic differences between C. jejuni isolates contribute to their likelihood of causing severe disease. In this study, we compare the genomes of ten C. jejuni isolates that were implicated in extraintestinal infections with reference gastrointestinal isolates, in order to identify unusual patterns of sequence variation associated with infection outcome. We identified a collection of genes that display a higher burden of uncommon mutations in invasive isolates compared with gastrointestinal close relatives, including some that have been previously linked to virulence and invasiveness in C. jejuni. Among the top genes identified were mreB and pgp1, which are both involved in determining cell shape. Electron microscopy confirmed morphological differences in isolates carrying unusual sequence variants of these genes, indicating a possible relationship between extraintestinal infection and changes in cell morphology.
Affiliation(s)
- Nicole E. Wheeler
  - Center for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Hinxton, UK
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
- Angela D. Reynolds
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Anne C. Midwinter
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Jonathan Marshall
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Nigel P. French
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
  - New Zealand Food Safety Science and Research Centre, Palmerston North, New Zealand
- Matthew S. Savoian
  - Institute of Fundamental Sciences, Massey University, Palmerston North, New Zealand
- Paul P. Gardner
  - School of Biological Sciences, University of Canterbury, Christchurch, New Zealand
  - Biomolecular Interaction Centre, University of Canterbury, Christchurch, New Zealand
  - Department of Biochemistry, University of Otago, Dunedin, New Zealand
- Patrick J. Biggs
  - EpiLab, School of Veterinary Science, Massey University, Palmerston North, New Zealand
  - New Zealand Genomics Ltd (NZGL – as Massey Genome Service) Massey University, Palmerston North, New Zealand
  - Allan Wilson Centre for Molecular Ecology and Evolution, Massey University, Palmerston North, New Zealand
48. Canals R, Hammarlöf DL, Kröger C, Owen SV, Fong WY, Lacharme-Lora L, Zhu X, Wenner N, Carden SE, Honeycutt J, Monack DM, Kingsley RA, Brownridge P, Chaudhuri RR, Rowe WPM, Predeus AV, Hokamp K, Gordon MA, Hinton JCD. Adding function to the genome of African Salmonella Typhimurium ST313 strain D23580. PLoS Biol 2019; 17:e3000059. [PMID: 30645593 PMCID: PMC6333337 DOI: 10.1371/journal.pbio.3000059] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Indexed: 11/25/2022]
Abstract
Salmonella Typhimurium sequence type (ST) 313 causes invasive nontyphoidal Salmonella (iNTS) disease in sub-Saharan Africa, targeting susceptible HIV+, malarial, or malnourished individuals. An in-depth genomic comparison between the ST313 isolate D23580 and the well-characterized ST19 isolate 4/74 that causes gastroenteritis across the globe revealed extensive synteny. To understand how the 856 nucleotide variations generated phenotypic differences, we devised a large-scale experimental approach that involved the global gene expression analysis of strains D23580 and 4/74 grown in 16 infection-relevant growth conditions. Comparison of transcriptional patterns identified virulence and metabolic genes that were differentially expressed between D23580 versus 4/74, many of which were validated by proteomics. We also uncovered the S. Typhimurium D23580 and 4/74 genes that showed expression differences during infection of murine macrophages. Our comparative transcriptomic data are presented in a new enhanced version of the Salmonella expression compendium, SalComD23580: http://bioinf.gen.tcd.ie/cgi-bin/salcom_v2.pl. We discovered that the ablation of melibiose utilization was caused by three independent SNP mutations in D23580 that are shared across ST313 lineage 2, suggesting that the ability to catabolize this carbon source has been negatively selected during ST313 evolution. The data revealed a novel, to our knowledge, plasmid maintenance system involving a plasmid-encoded CysS cysteinyl-tRNA synthetase, highlighting the power of large-scale comparative multicondition analyses to pinpoint key phenotypic differences between bacterial pathovariants.
Affiliation(s)
- Rocío Canals
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Disa L Hammarlöf
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Carsten Kröger
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Siân V Owen
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Wai Yee Fong
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Lizeth Lacharme-Lora
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Xiaojun Zhu
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Nicolas Wenner
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Sarah E Carden
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Jared Honeycutt
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Denise M Monack
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Robert A Kingsley
  - Quadram Institute Bioscience, Norwich Research Park, Norwich, United Kingdom
- Philip Brownridge
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Roy R Chaudhuri
  - Department of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, United Kingdom
- Will P M Rowe
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Alexander V Predeus
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
- Karsten Hokamp
  - Department of Genetics, School of Genetics and Microbiology, Smurfit Institute of Genetics, Trinity College Dublin, Ireland
- Melita A Gordon
  - Institute of Infection and Global Health, University of Liverpool, Liverpool, United Kingdom
  - Malawi-Liverpool-Wellcome Trust Clinical Research Programme, University of Malawi College of Medicine, Malawi, Central Africa
- Jay C D Hinton
  - Institute of Integrative Biology, University of Liverpool, Liverpool, United Kingdom
49. Moradigaravand D, Palm M, Farewell A, Mustonen V, Warringer J, Parts L. Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput Biol 2018; 14:e1006258. [PMID: 30550564 PMCID: PMC6310291 DOI: 10.1371/journal.pcbi.1006258] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 05/31/2018] [Revised: 12/28/2018] [Accepted: 11/18/2018] [Indexed: 12/17/2022]
Abstract
The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves the way for integrating machine learning approaches into diagnostic tools in the clinic.
Affiliation(s)
- Danesh Moradigaravand
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Center for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
- Martin Palm
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Anne Farewell
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Ville Mustonen
  - Organismal and Evolutionary Biology Research Programme, Department of Computer Science, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology HIIT, Helsinki, Finland
- Jonas Warringer
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Leopold Parts
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Department of Computer Science, University of Tartu, Tartu, Estonia
50. Aun E, Brauer A, Kisand V, Tenson T, Remm M. A k-mer-based method for the identification of phenotype-associated genomic biomarkers and predicting phenotypes of sequenced bacteria. PLoS Comput Biol 2018; 14:e1006434. [PMID: 30346947 PMCID: PMC6211763 DOI: 10.1371/journal.pcbi.1006434] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 04/09/2018] [Revised: 11/01/2018] [Accepted: 08/15/2018] [Indexed: 11/18/2022]
Abstract
We have developed an easy-to-use and memory-efficient method called PhenotypeSeeker that (a) identifies phenotype-specific k-mers, (b) generates a k-mer-based statistical model for predicting a given phenotype and (c) predicts the phenotype from the sequencing data of a given bacterial isolate. The method was validated on 167 Klebsiella pneumoniae isolates (virulence), 200 Pseudomonas aeruginosa isolates (ciprofloxacin resistance) and 459 Clostridium difficile isolates (azithromycin resistance). The phenotype prediction models trained from these datasets obtained the F1-measure of 0.88 on the K. pneumoniae test set, 0.88 on the P. aeruginosa test set and 0.97 on the C. difficile test set. The F1-measures were the same for assembled sequences and raw sequencing data; however, building the model from assembled genomes is significantly faster. On these datasets, the model building on a mid-range Linux server takes approximately 3 to 5 hours per phenotype if assembled genomes are used and 10 hours per phenotype if raw sequencing data are used. The phenotype prediction from assembled genomes takes less than one second per isolate. Thus, PhenotypeSeeker should be well-suited for predicting phenotypes from large sequencing datasets. PhenotypeSeeker is implemented in Python programming language, is open-source software and is available at GitHub (https://github.com/bioinfo-ut/PhenotypeSeeker/).
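The three steps named in this abstract (finding phenotype-specific k-mers, building a predictor from them, and scoring predictions with the F1-measure) can be illustrated with a toy sketch. This is not PhenotypeSeeker's code or API: the function names, the "present only in positive isolates" filter, and the any-marker prediction rule are all simplifying assumptions chosen for illustration, and the sequences are invented.

```python
from collections import defaultdict


def kmers(sequence, k):
    """Return the set of distinct k-mers (substrings of length k) in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def phenotype_specific_kmers(genomes, phenotypes, k):
    """Keep k-mers seen only in isolates labelled 1 (hypothetical filter,
    standing in for the statistical test a real tool would apply)."""
    carriers = defaultdict(set)
    for name, seq in genomes.items():
        for kmer in kmers(seq, k):
            carriers[kmer].add(name)
    positives = {name for name, label in phenotypes.items() if label == 1}
    return {kmer for kmer, names in carriers.items() if names <= positives}


def predict(sequence, markers, k):
    """Predict 1 if the sequence carries any marker k-mer, else 0 (assumed rule)."""
    return 1 if kmers(sequence, k) & markers else 0


def f1_score(true_labels, predicted_labels):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no positives at all."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Invented toy data: two "resistant" (1) and two "susceptible" (0) isolates.
genomes = {
    "iso1": "ACGTACGGA",
    "iso2": "TTACGGACC",
    "iso3": "ACGTACGTA",
    "iso4": "TTACGTACC",
}
phenotypes = {"iso1": 1, "iso2": 1, "iso3": 0, "iso4": 0}

markers = phenotype_specific_kmers(genomes, phenotypes, k=4)
test_sequences = ["GGACGGATT", "ACGTACCTT"]
predictions = [predict(seq, markers, k=4) for seq in test_sequences]
```

A real tool would replace the all-or-nothing filter with a statistical association test and fit a weighted model over the selected k-mers; the sketch only shows how k-mer presence can act as a feature and how the F1-measure summarizes the resulting predictions.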
Affiliation(s)
- Erki Aun
  - Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Age Brauer
  - Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Veljo Kisand
  - Institute of Technology, University of Tartu, Tartu, Estonia
- Tanel Tenson
  - Institute of Technology, University of Tartu, Tartu, Estonia
- Maido Remm
  - Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia