1
Pikalyova K, Orlov A, Horvath D, Marcou G, Varnek A. Predicting S. aureus antimicrobial resistance with interpretable genomic space maps. Mol Inform 2024; 43:e202300263. [PMID: 38386182 DOI: 10.1002/minf.202300263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2023] [Revised: 01/15/2024] [Accepted: 02/08/2024] [Indexed: 02/23/2024]
Abstract
Increasing antimicrobial resistance (AMR) represents a global healthcare threat. To decrease the spread of AMR and associated mortality, methods for rapid selection of optimal antibiotic treatment are urgently needed. Machine learning (ML) models based on genomic data to predict resistant phenotypes can serve as a fast screening tool prior to phenotypic testing. Nonetheless, many existing ML methods lack interpretability. Therefore, we present a methodology for visualization of sequence space and AMR prediction based on the non-linear dimensionality reduction method - generative topographic mapping (GTM). This approach, applied to AMR data of >5000 S. aureus isolates retrieved from the PATRIC database, yielded GTM models with reasonable accuracy for all drugs (balanced accuracy values ≥0.75). The Generative Topographic Maps (GTMs) represent data in the form of illustrative maps of the genomic space and allow for antibiotic-wise comparison of resistant phenotypes. The maps were also found to be useful for the analysis of genetic determinants responsible for drug resistance. Overall, the GTM-based methodology is a useful tool for both the illustrative exploration of the genomic sequence space and AMR prediction.
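The abstract above reports model quality as balanced accuracy. As a minimal sketch (not the authors' code), balanced accuracy for a binary resistant/susceptible task can be computed as the mean of sensitivity and specificity. The function name, the "R"/"S" label encoding, and the example labels are assumptions made for illustration only.

```python
def balanced_accuracy(y_true, y_pred, positive="R"):
    """Mean of sensitivity and specificity for a binary task.

    Labels are assumed to be "R" (resistant) or "S" (susceptible);
    this encoding is hypothetical, not taken from the cited paper.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # resistant isolates correctly called
    specificity = tn / (tn + fp)  # susceptible isolates correctly called
    return (sensitivity + specificity) / 2


# Made-up labels: 2 resistant and 4 susceptible isolates.
y_true = ["R", "R", "S", "S", "S", "S"]
y_pred = ["R", "S", "S", "S", "S", "R"]
score = balanced_accuracy(y_true, y_pred)
```

Unlike plain accuracy, this metric is not inflated when one phenotype dominates the dataset, which is common in AMR collections.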
Affiliation(s)
- Karina Pikalyova
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
- Alexey Orlov
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
- Dragos Horvath
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
- Gilles Marcou
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
- Alexandre Varnek
  - Laboratoire de Chémoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg, 67000, France
2
Rusic D, Kumric M, Seselja Perisin A, Leskur D, Bukic J, Modun D, Vilovic M, Vrdoljak J, Martinovic D, Grahovac M, Bozic J. Tackling the Antimicrobial Resistance "Pandemic" with Machine Learning Tools: A Summary of Available Evidence. Microorganisms 2024; 12:842. [PMID: 38792673 PMCID: PMC11123121 DOI: 10.3390/microorganisms12050842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 04/16/2024] [Accepted: 04/19/2024] [Indexed: 05/26/2024]
Abstract
Antimicrobial resistance is recognised as one of the top threats healthcare is bound to face in the future. There have been various attempts to preserve the efficacy of existing antimicrobials, develop new and efficient antimicrobials, manage infections with multi-drug resistant strains, and improve patient outcomes, resulting in a growing mass of routinely available data, including electronic health records and microbiological information that can be employed to develop individualised antimicrobial stewardship. Machine learning methods have been developed to predict antimicrobial resistance from whole-genome sequencing data, forecast medication susceptibility, recognise epidemic patterns for surveillance purposes, or propose new antibacterial treatments and accelerate scientific discovery. Unfortunately, there is an evident gap between the number of machine learning applications in science and the effective implementation of these systems. This narrative review highlights some of the outstanding opportunities that machine learning offers when applied in research related to antimicrobial resistance. In the future, machine learning tools may prove to be superbugs' kryptonite. This review aims to provide an overview of available publications to aid researchers that are looking to expand their work with new approaches and to acquaint them with the current application of machine learning techniques in this field.
Affiliation(s)
- Doris Rusic
  - Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (D.R.); (A.S.P.); (D.L.); (J.B.); (D.M.)
- Marko Kumric
  - Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (M.K.); (M.V.); (J.V.); (D.M.)
  - Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Ana Seselja Perisin
  - Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (D.R.); (A.S.P.); (D.L.); (J.B.); (D.M.)
- Dario Leskur
  - Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (D.R.); (A.S.P.); (D.L.); (J.B.); (D.M.)
- Josipa Bukic
  - Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (D.R.); (A.S.P.); (D.L.); (J.B.); (D.M.)
- Darko Modun
  - Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (D.R.); (A.S.P.); (D.L.); (J.B.); (D.M.)
- Marino Vilovic
  - Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (M.K.); (M.V.); (J.V.); (D.M.)
  - Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Josip Vrdoljak
  - Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (M.K.); (M.V.); (J.V.); (D.M.)
  - Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Dinko Martinovic
  - Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (M.K.); (M.V.); (J.V.); (D.M.)
  - Department of Maxillofacial Surgery, University Hospital of Split, Spinciceva 1, 21000 Split, Croatia
- Marko Grahovac
  - Department of Pharmacology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Josko Bozic
  - Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia; (M.K.); (M.V.); (J.V.); (D.M.)
  - Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
3
Yurtseven A, Buyanova S, Agrawal AA, Bochkareva OO, Kalinina OV. Machine learning and phylogenetic analysis allow for predicting antibiotic resistance in M. tuberculosis. BMC Microbiol 2023; 23:404. [PMID: 38124060 PMCID: PMC10731705 DOI: 10.1186/s12866-023-03147-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 12/07/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND: Antimicrobial resistance (AMR) poses a significant global health threat, and accurate prediction of bacterial resistance patterns is critical for effective treatment and control strategies. In recent years, machine learning (ML) approaches have emerged as powerful tools for analyzing large-scale bacterial AMR data. However, ML methods often ignore evolutionary relationships among bacterial strains, which can greatly affect their performance, especially when the goal is to detect resistance-associated features. Genome-wide association study (GWAS) methods such as linear mixed models account for the evolutionary relationships in bacteria, but they uncover only highly significant variants that have already been reported in the literature.
RESULTS: In this work, we introduce a novel phylogeny-related parallelism score (PRPS), which measures whether a certain feature is correlated with the population structure of a set of samples. We demonstrate that PRPS can be used, in combination with SVM- and random forest-based models, to reduce the number of features in the analysis while simultaneously increasing the models' performance. We applied our pipeline to publicly available AMR data from the PATRIC database for Mycobacterium tuberculosis against six common antibiotics.
CONCLUSIONS: Using our pipeline, we re-discovered known resistance-associated mutations as well as new candidate mutations that may be related to resistance and have not previously been reported in the literature. We demonstrated that taking phylogenetic relationships into account not only improves model performance but also yields more biologically relevant top-contributing resistance markers.
Affiliation(s)
- Alper Yurtseven
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken, 66123, Saarland, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, 66123, Saarland, Germany
- Sofia Buyanova
  - Institute of Science and Technology Austria (ISTA), Am Campus 1, Klosterneuburg, 3400, Austria
- Amay Ajaykumar Agrawal
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken, 66123, Saarland, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, 66123, Saarland, Germany
- Olga O Bochkareva
  - Institute of Science and Technology Austria (ISTA), Am Campus 1, Klosterneuburg, 3400, Austria
  - Centre for Microbiology and Environmental Systems Science, Division of Computational System Biology, University of Vienna, Djerassiplatz 1 A, Wien, 1030, Austria
- Olga V Kalinina
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken, 66123, Saarland, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, 66123, Saarland, Germany
  - Faculty of Medicine, Saarland University, Homburg, 66421, Saarland, Germany
4
Humphries RM, Miller L, Zimmer B, Matuschek E, Hindler JA. Contemporary Considerations for Establishing Reference Methods for Antibacterial Susceptibility Testing. J Clin Microbiol 2023; 61:e0188622. [PMID: 36971571 PMCID: PMC10281161 DOI: 10.1128/jcm.01886-22] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 03/29/2023]
Abstract
Antibacterial susceptibility testing (AST) is performed to guide therapy, perform resistance surveillance studies, and support development of new antibacterial agents. For 5 decades, broth microdilution (BMD) has served as the reference method to assess in vitro activity of antibacterial agents against which both novel agents and diagnostic tests have been measured. BMD relies on in vitro inhibition or killing of bacteria. It is associated with several limitations: it is a poor mimic of the in vivo milieu of bacterial infections, requires multiple days to perform, and is associated with subtle, difficult to control variability. In addition, new reference methods will soon be needed for novel agents whose activity cannot be evaluated by BMD (e.g., those that target virulence). Any new reference methods must be standardized, correlated with clinical efficacy and be recognized internationally by researchers, industry, and regulators. Herein, we describe current reference methods for in vitro assessment of antibacterial activity and highlight key considerations for the generation of novel reference methods.
Affiliation(s)
- Romney M. Humphries
  - Pathology, Microbiology and Immunology, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Linda Miller
  - CMID Pharma Consulting, LLC, Dresher, Pennsylvania, USA
- Barbara Zimmer
  - Beckman Coulter Microbiology, Sacramento, California, USA
- Janet A. Hindler
  - Los Angeles County Department of Public Health, Public Health Laboratory, Los Angeles, California, USA
5
Metagenomic Antimicrobial Susceptibility Testing from Simulated Native Patient Samples. Antibiotics (Basel) 2023; 12:366. [PMID: 36830277 PMCID: PMC9952719 DOI: 10.3390/antibiotics12020366] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/17/2023] [Revised: 02/06/2023] [Accepted: 02/08/2023] [Indexed: 02/12/2023]
Abstract
Genomic antimicrobial susceptibility testing (AST) has been shown to be accurate for many pathogens and antimicrobials. However, these methods have not been systematically evaluated for clinical metagenomic data. We investigate the performance of in-silico AST from clinical metagenomes (MG-AST). Using isolate sequencing data from a multi-center study on antimicrobial resistance (AMR) as well as shotgun-sequenced septic urine samples, we simulate over 2000 complicated urinary tract infection (cUTI) metagenomes with known resistance phenotype to 5 antimicrobials. Applying rule-based and machine learning-based genomic AST classifiers, we explore the impact of sequencing depth and technology, metagenome complexity, and bioinformatics processing approaches on AST accuracy. By using an optimized metagenomics assembly and binning workflow, MG-AST achieved balanced accuracy within 5.1% of isolate-derived genomic AST. For poly-microbial infections, taxonomic sample complexity and relatedness of taxa in the sample is a key factor influencing metagenomic binning and downstream MG-AST accuracy. We show that the reassignment of putative plasmid contigs by their predicted host range and investigation of whole resistome capabilities improved MG-AST performance on poly-microbial samples. We further demonstrate that machine learning-based methods enable MG-AST with superior accuracy compared to rule-based approaches on simulated native patient samples.
6
Martin SL, Mortimer TD, Grad YH. Machine learning models for Neisseria gonorrhoeae antimicrobial susceptibility tests. Ann N Y Acad Sci 2023; 1520:74-88. [PMID: 36573759 PMCID: PMC9974846 DOI: 10.1111/nyas.14549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
Abstract
Neisseria gonorrhoeae is an urgent public health threat due to the emergence of antibiotic resistance. As most isolates in the United States are susceptible to at least one antibiotic, rapid molecular antimicrobial susceptibility tests (ASTs) would offer the opportunity to tailor antibiotic therapy, thereby expanding treatment options. With genome sequence and antibiotic resistance phenotype data for nearly 20,000 clinical N. gonorrhoeae isolates now available, there is an opportunity to use statistical methods to develop sequence-based diagnostics that predict antibiotic susceptibility from genotype. N. gonorrhoeae, therefore, provides a useful example illustrating how to apply machine learning models to aid in the design of sequence-based ASTs. We present an overview of this framework, which begins with establishing the assay technology, the performance criteria, the population in which the diagnostic will be used, and the clinical goals, and extends to the choices that must be made to arrive at a set of features with the desired properties for predicting susceptibility phenotype from genotype. While we focus on the example of N. gonorrhoeae, the framework generalizes to other organisms for which large-scale genotype and antibiotic resistance data can be combined to aid in diagnostics development.
Affiliation(s)
- Skylar L. Martin
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Tatum D. Mortimer
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Yonatan H. Grad
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  - Division of Infectious Diseases, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, USA
7
Validation and Application of Long-Read Whole-Genome Sequencing for Antimicrobial Resistance Gene Detection and Antimicrobial Susceptibility Testing. Antimicrob Agents Chemother 2023; 67:e0107222. [PMID: 36533931 PMCID: PMC9872642 DOI: 10.1128/aac.01072-22] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/23/2022]
Abstract
Next-generation sequencing applications are increasingly used for detection and characterization of antimicrobial-resistant pathogens in clinical settings. Oxford Nanopore Technologies (ONT) sequencing offers advantages for clinical use compared with other sequencing methodologies because it enables real-time basecalling, produces long sequencing reads that increase the ability to correctly assemble DNA fragments, provides short turnaround times, and requires relatively uncomplicated sample preparation. A drawback of ONT sequencing, however, is its lower per-read accuracy than short-read sequencing. We sought to identify best practices in ONT sequencing protocols. As some variability in sequencing results may be introduced by the DNA extraction methodology, we tested three DNA extraction kits across three independent laboratories using a representative set of six bacterial isolates to investigate accuracy and reproducibility of ONT technology. All DNA extraction techniques showed comparable performance; however, the DNeasy PowerSoil Pro kit had the highest sequencing yield. This kit was subsequently applied to 42 sequentially collected bacterial isolates from blood cultures to assess Ares Genetics's pipelines for predictive whole-genome sequencing antimicrobial susceptibility testing (WGS-AST) performance compared to phenotypic triplicate broth microdilution results. WGS-AST results ranged across the organisms and resulted in an overall categorical agreement of 95% for penicillins, 82.4% for cephalosporins, 76.7% for carbapenems, 86.9% for fluoroquinolones, and 96.2% for aminoglycosides. Very major errors/major errors were 0%/16.7% (penicillins), 11.7%/3.6% (cephalosporins), 0%/24.4% (carbapenems), 2.5%/7.7% (fluoroquinolones), and 0%/4.1% (aminoglycosides), respectively. This work showed that, although additional refinements are necessary, ONT sequencing demonstrates potential as a method to perform WGS-AST on cultured isolates for patient care.
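The abstract above reports categorical agreement together with very major and major error rates. The sketch below illustrates the commonly used convention, in which a very major error is a resistant isolate predicted susceptible (rate taken over reference-resistant isolates) and a major error is a susceptible isolate predicted resistant (rate taken over reference-susceptible isolates). The abstract does not state how the study computed these values, so the convention, the function name, and the example calls are assumptions; intermediate categories and minor errors are left out for brevity.

```python
def ast_agreement(reference, predicted):
    """Compare predicted S/R calls with reference (e.g. BMD) calls.

    Simplified illustration: only "S" and "R" categories are handled.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    pairs = list(zip(reference, predicted))
    n_resistant = sum(1 for ref, _ in pairs if ref == "R")
    n_susceptible = sum(1 for ref, _ in pairs if ref == "S")
    agree = sum(1 for ref, pred in pairs if ref == pred)
    # Very major error: false susceptible call on a resistant isolate.
    vme = sum(1 for ref, pred in pairs if ref == "R" and pred == "S")
    # Major error: false resistant call on a susceptible isolate.
    me = sum(1 for ref, pred in pairs if ref == "S" and pred == "R")
    return {
        "categorical_agreement": agree / len(pairs),
        "very_major_error_rate": vme / n_resistant if n_resistant else None,
        "major_error_rate": me / n_susceptible if n_susceptible else None,
    }


# Made-up calls for ten isolates and one antibiotic.
reference = ["R", "R", "R", "R", "S", "S", "S", "S", "S", "S"]
predicted = ["R", "R", "R", "S", "S", "S", "S", "S", "R", "R"]
result = ast_agreement(reference, predicted)
```

Because the two error rates use different denominators, a method can show high categorical agreement while still having a high error rate in the smaller class.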
8
Conzemius R, Bergman Y, Májek P, Beisken S, Lewis S, Jacobs EB, Tamma PD, Simner PJ. Automated antimicrobial susceptibility testing and antimicrobial resistance genotyping using Illumina and Oxford Nanopore Technologies sequencing data among Enterobacteriaceae. Front Microbiol 2022; 13:973605. [PMID: 36003946 PMCID: PMC9393496 DOI: 10.3389/fmicb.2022.973605] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/20/2022] [Accepted: 07/08/2022] [Indexed: 11/13/2022]
Abstract
Whole-genome sequencing (WGS) enables the molecular characterization of bacterial pathogens. We compared the accuracy of the Illumina and Oxford Nanopore Technologies (ONT) sequencing platforms for the determination of AMR classes and antimicrobial susceptibility testing (AST) among 181 clinical Enterobacteriaceae isolates. Sequencing reads for each isolate were uploaded to AREScloud (Ares Genetics) to determine the presence of AMR markers and the predicted WGS-AST profile. The profiles of both sequencing platforms were compared to broth microdilution (BMD) AST. Isolates were delineated by resistance to third-generation cephalosporins and carbapenems as well as the presence of AMR markers to determine clinically relevant AMR classes. The overall categorical agreement (CA) was 90% (Illumina) and 88% (ONT) across all antimicrobials, 96% for the prediction of resistance to third-generation cephalosporins for both platforms, and 94% (Illumina) and 91% (ONT) for the prediction of resistance to carbapenems. Carbapenem resistance was overestimated on ONT with a major error of 16%. Sensitivity for the detection of carbapenemases, extended-spectrum β-lactamases, and plasmid-mediated ampC genes was 98, 95, and 70% by ONT compared to the Illumina dataset as the reference. Our results highlight the potential of the ONT platform’s use in clinical microbiology laboratories. When combined with robust bioinformatics methods, WGS-AST predictions may be a future approach to guide effective antimicrobial decision-making.
Affiliation(s)
- Yehudit Bergman
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Shawna Lewis
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Emily B. Jacobs
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Pranita D. Tamma
  - Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD, United States
- Patricia J. Simner
  - Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD, United States
  - Correspondence: Patricia J. Simner
9
Ruigrok M, Xue B, Catanach A, Zhang M, Jesson L, Davy M, Wellenreuther M. The Relative Power of Structural Genomic Variation versus SNPs in Explaining the Quantitative Trait Growth in the Marine Teleost Chrysophrys auratus. Genes (Basel) 2022; 13:1129. [PMID: 35885912 PMCID: PMC9320665 DOI: 10.3390/genes13071129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2022] [Revised: 06/08/2022] [Accepted: 06/20/2022] [Indexed: 02/04/2023]
Abstract
Background: Genetic diversity provides the basic substrate for evolution. Genetic variation consists of changes ranging from single base pairs (single-nucleotide polymorphisms, or SNPs) to larger-scale structural variants, such as inversions, deletions, and duplications. SNPs have long been used as the general currency for investigations into how genetic diversity fuels evolution. However, structural variants can affect more base pairs in the genome than SNPs and can be responsible for adaptive phenotypes due to their impact on linkage and recombination. In this study, we investigate the first steps needed to explore the genetic basis of an economically important growth trait in the marine teleost finfish Chrysophrys auratus using both SNP and structural variant data. Specifically, we use feature selection methods in machine learning to explore the relative predictive power of both types of genetic variants in explaining growth and discuss the feature selection results of the evaluated methods.
Methods: SNP and structural variant callers were used to generate catalogues of variant data from 32 individual fish at ages 1 and 3 years. Three feature selection algorithms (ReliefF, Chi-square, and a mutual-information-based method) were used to reduce the dataset by selecting the most informative features. Following this selection process, the subset of variants was used as features to classify fish into small, medium, or large size categories using KNN, naïve Bayes, random forest, and logistic regression. The top-scoring features in each feature selection method were subsequently mapped to annotated genomic regions in the zebrafish genome, and a permutation test was conducted to see if the number of mapped regions was greater than when random sampling was applied.
Results: Without feature selection, the prediction accuracies ranged from 0 to 0.5 for both structural variants and SNPs. Following feature selection, the prediction accuracy increased only slightly to between 0 and 0.65 for structural variants and between 0 and 0.75 for SNPs. The highest prediction accuracy for the logistic regression was achieved for age 3 fish using SNPs, although generally predictions for age 1 and 3 fish were very similar (ranging from 0–0.65 for both SNPs and structural variants). The Chi-square feature selection of SNP data was the only method that had a significantly higher number of matches to annotated genomic regions of zebrafish than would be explained by chance alone.
Conclusions: Predicting a complex polygenic trait such as growth using data collected from a low number of individuals remains challenging. While we demonstrate that both SNPs and structural variants provide important information to help understand the genetic basis of phenotypic traits such as fish growth, the full complexities that exist within a genome cannot be easily captured by classical machine learning techniques. When using high-dimensional data, feature selection shows some increase in the prediction accuracy of classification models and provides the potential to identify unknown genomic correlates with growth. Our results show that both SNPs and structural variants significantly impact growth, and we therefore recommend that researchers interested in the genotype–phenotype map should strive to go beyond SNPs and incorporate structural variants in their studies as well. We discuss how our machine learning models can be further expanded to serve as a test bed to inform evolutionary studies and the applied management of species.
Affiliation(s)
- Mike Ruigrok
  - The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
  - Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand; (B.X.); (M.Z.)
- Bing Xue
  - Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand; (B.X.); (M.Z.)
- Andrew Catanach
  - The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
- Mengjie Zhang
  - Wellington Faculty of Engineering, Victoria University of Wellington, Wellington 6012, New Zealand; (B.X.); (M.Z.)
- Linley Jesson
  - The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
- Marcus Davy
  - The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
- Maren Wellenreuther
  - The New Zealand Institute for Plant & Food Research Ltd., Nelson 7010, New Zealand; (M.R.); (A.C.); (L.J.); (M.D.)
  - School of Biological Sciences, University of Auckland, Auckland 1010, New Zealand
10
Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction. Int J Mol Sci 2021; 22:13049. [PMID: 34884852 PMCID: PMC8657983 DOI: 10.3390/ijms222313049] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 01/21/2023]
Abstract
The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning has been successfully applied. AMR machine learning models typically use nucleotide k-mer counts to represent genomic sequences. While k-mer representation efficiently captures sequence variation, it also results in high-dimensional and sparse data. With limited training data available, achieving acceptable model performance or model interpretability is challenging. In this study, we explore the utility of feature engineering with several biologically relevant signals. We propose to predict the functional impact of observed mutations with PROVEAN to use the predicted impact as a new feature for each protein in an organism’s proteome. The addition of the new features was tested on a total of 19,521 isolates across nine clinically relevant pathogens and 30 different antibiotics. The new features significantly improved the predictive performance of trained AMR models for Pseudomonas aeruginosa, Citrobacter freundii, and Escherichia coli. The balanced accuracy of the respective models of those three pathogens improved by 6.0% on average.
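The abstract above notes that k-mer count representations are high-dimensional and sparse. As a minimal sketch (not the study's pipeline), the following counts overlapping nucleotide k-mers in a sequence and compares the number of distinct k-mers observed with the 4^k possible ones. The function name and the toy sequence are hypothetical, and windows containing characters other than A, C, G, or T are simply skipped.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count overlapping k-mers made only of A, C, G and T."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            counts[kmer] += 1
    return counts


# Toy sequence, far shorter than a real genome.
counts = kmer_counts("ATGCATGC", k=3)
observed = len(counts)   # distinct 3-mers present in the sequence
possible = 4 ** 3        # size of the full 3-mer feature space
```

For the values of k typically used with genomes, the feature space grows as 4^k while each genome fills only part of it, which is the sparsity the abstract refers to.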
11
VanOeffelen M, Nguyen M, Aytan-Aktug D, Brettin T, Dietrich EM, Kenyon RW, Machi D, Mao C, Olson R, Pusch GD, Shukla M, Stevens R, Vonstein V, Warren AS, Wattam AR, Yoo H, Davis JJ. A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes. Brief Bioinform 2021; 22:bbab313. [PMID: 34379107 PMCID: PMC8575023 DOI: 10.1093/bib/bbab313] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/27/2021] [Revised: 06/18/2021] [Accepted: 07/20/2021] [Indexed: 11/14/2022]
Abstract
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences that are paired with laboratory-derived AST data. This collection currently contains AST data for over 67 000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow, and showing areas where attention is needed from the research community to improve sampling and tracking efforts. In addition to using the data to track the evolution and spread of AMR, it also serves as a useful starting point for building machine learning models for predicting AMR phenotypes. We demonstrate this by describing two machine learning models that are built from the entire dataset to show where the predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.
Affiliation(s)
- Marcus Nguyen
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Derya Aytan-Aktug
  - National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Thomas Brettin
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Emily M Dietrich
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Ronald W Kenyon
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Dustin Machi
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Chunhong Mao
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Robert Olson
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Gordon D Pusch
  - Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Maulik Shukla
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Rick Stevens
  - Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
  - Department of Computer Science, University of Chicago, Chicago, IL, USA
- Andrew S Warren
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Alice R Wattam
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
  - Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Hyunseung Yoo
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- James J Davis
  - University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
  - Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
  - Northwestern Argonne Institute for Science and Engineering, Evanston, IL, USA
|
12
|
Core Genome Multilocus Sequence Typing and Prediction of Antimicrobial Susceptibility Using Whole-Genome Sequences of Escherichia coli Bloodstream Infection Isolates. Antimicrob Agents Chemother 2021; 65:e0113921. [PMID: 34424049 DOI: 10.1128/aac.01139-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022] Open
Abstract
In total, 50 Escherichia coli bloodstream isolates from the clinical laboratory and 12 E. coli isolates referred for pulsed-field gel electrophoresis (PFGE) were sequenced, assessed for clonality using core genome multilocus sequence typing (cgMLST), and evaluated for genomic susceptibility predictions using ARESdb. Results of sequence typing using whole-genome sequencing (WGS)-based MLST and sequence type (ST)-specific PCR were identical. Overall categorical agreement between genotypic (ARESdb) and phenotypic susceptibility testing for 62 isolates and 11 antimicrobial agents was 91%. Among the referred isolates, high major error rates were found for ceftazidime, cefepime, and piperacillin-tazobactam.
|
13
|
Lüftinger L, Ferreira I, Frank BJH, Beisken S, Weinberger J, von Haeseler A, Rattei T, Hofstaetter JG, Posch AE, Materna A. Predictive Antibiotic Susceptibility Testing by Next-Generation Sequencing for Periprosthetic Joint Infections: Potential and Limitations. Biomedicines 2021; 9:910. [PMID: 34440114 PMCID: PMC8389688 DOI: 10.3390/biomedicines9080910] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/09/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 01/18/2023] Open
Abstract
Joint replacement surgeries are among the most frequent medical interventions globally. Infections of prosthetic joints are a major health challenge and typically require prolonged or even indefinite antibiotic treatment. As multidrug-resistant pathogens continue to rise globally, novel diagnostics are critical to ensure appropriate treatment and to support the management of prosthetic joint infections (PJIs). To this end, recent studies have shown the potential of molecular methods such as next-generation sequencing to complement established phenotypic, culture-based methods. Together with advanced bioinformatics approaches, next-generation sequencing can provide comprehensive information on pathogen identity as well as antimicrobial susceptibility, potentially enabling rapid diagnosis and targeted therapy of PJIs. In this review, we summarize current developments in predictive antibiotic susceptibility testing based on next-generation sequencing and discuss its potential and limitations for common PJI pathogens.
Affiliation(s)
- Lukas Lüftinger
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030 Vienna, Austria
  - Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1030 Vienna, Austria
- Ines Ferreira
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030 Vienna, Austria
  - Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, University of Vienna, 1030 Vienna, Austria
  - Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, Medical University of Vienna, 1030 Vienna, Austria
- Bernhard J. H. Frank
  - Michael Ogon Laboratory for Orthopaedic Research, Orthopaedic Hospital Vienna-Speising, 1130 Vienna, Austria
- Stephan Beisken
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030 Vienna, Austria
- Johannes Weinberger
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030 Vienna, Austria
- Arndt von Haeseler
  - Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, University of Vienna, 1030 Vienna, Austria
  - Center for Integrative Bioinformatics Vienna, Max Perutz Laboratories, Medical University of Vienna, 1030 Vienna, Austria
  - Bioinformatics and Computational Biology, Faculty of Computer Science, University of Vienna, 1090 Vienna, Austria
- Thomas Rattei
  - Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, 1030 Vienna, Austria
- Jochen G. Hofstaetter
  - Michael Ogon Laboratory for Orthopaedic Research, Orthopaedic Hospital Vienna-Speising, 1130 Vienna, Austria
- Andreas E. Posch
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030 Vienna, Austria
- Arne Materna
  - Ares Genetics GmbH, Karl-Farkas-Gasse 18, 1030 Vienna, Austria
|