1
Rusic D, Kumric M, Seselja Perisin A, Leskur D, Bukic J, Modun D, Vilovic M, Vrdoljak J, Martinovic D, Grahovac M, Bozic J. Tackling the Antimicrobial Resistance "Pandemic" with Machine Learning Tools: A Summary of Available Evidence. Microorganisms 2024; 12:842. [PMID: 38792673 PMCID: PMC11123121 DOI: 10.3390/microorganisms12050842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2024] [Revised: 04/16/2024] [Accepted: 04/19/2024] [Indexed: 05/26/2024]
Abstract
Antimicrobial resistance is recognised as one of the top threats healthcare is bound to face in the future. There have been various attempts to preserve the efficacy of existing antimicrobials, develop new and efficient antimicrobials, manage infections with multi-drug resistant strains, and improve patient outcomes, resulting in a growing mass of routinely available data, including electronic health records and microbiological information, that can be employed to develop individualised antimicrobial stewardship. Machine learning methods have been developed to predict antimicrobial resistance from whole-genome sequencing data, forecast medication susceptibility, recognise epidemic patterns for surveillance purposes, or propose new antibacterial treatments and accelerate scientific discovery. Unfortunately, there is an evident gap between the number of machine learning applications in science and the effective implementation of these systems. This narrative review highlights some of the outstanding opportunities that machine learning offers when applied in research related to antimicrobial resistance. In the future, machine learning tools may prove to be superbugs' kryptonite. This review aims to provide an overview of available publications to aid researchers who are looking to expand their work with new approaches and to acquaint them with the current application of machine learning techniques in this field.
Affiliation(s)
- Doris Rusic
- Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Marko Kumric
- Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Ana Seselja Perisin
- Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Dario Leskur
- Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Josipa Bukic
- Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Darko Modun
- Department of Pharmacy, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Marino Vilovic
- Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Josip Vrdoljak
- Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Dinko Martinovic
- Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Department of Maxillofacial Surgery, University Hospital of Split, Spinciceva 1, 21000 Split, Croatia
- Marko Grahovac
- Department of Pharmacology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Josko Bozic
- Department of Pathophysiology, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
- Laboratory for Cardiometabolic Research, University of Split School of Medicine, Soltanska 2A, 21000 Split, Croatia
2
Hu K, Meyer F, Deng ZL, Asgari E, Kuo TH, Münch PC, McHardy AC. Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes. Brief Bioinform 2024; 25:bbae206. [PMID: 38706320 PMCID: PMC11070729 DOI: 10.1093/bib/bbae206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.
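The finding that ML methods do well on closely related strains but less well on divergent genomes depends on how samples are split between training and testing. The sketch below is a minimal illustration, not the benchmark's actual workflow: it contrasts a random split with a split that keeps whole similarity clusters on one side. The cluster labels are hypothetical and are assumed to come from some earlier phylogenetic or homology-based clustering step.

```python
import random
from collections import defaultdict


def random_split(sample_ids, test_fraction=0.2, seed=0):
    """Shuffle individual genomes; close relatives may end up on both sides."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def cluster_split(cluster_of, test_fraction=0.2, seed=0):
    """Assign whole clusters to train or test so no cluster is shared between them.

    cluster_of maps genome id -> hypothetical similarity-cluster label.
    """
    members = defaultdict(list)
    for genome, cluster in cluster_of.items():
        members[cluster].append(genome)
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)
    target = len(cluster_of) * test_fraction
    train, test = [], []
    for cluster in clusters:
        # Fill the test set cluster by cluster until it reaches the target size.
        (test if len(test) < target else train).extend(members[cluster])
    return sorted(train), sorted(test)


clusters = {"g1": "A", "g2": "A", "g3": "B", "g4": "B", "g5": "C"}
train, test = cluster_split(clusters, test_fraction=0.4)
print("train:", train, "test:", test)
```

Because whole clusters move together, the test set can overshoot the target fraction; if every genome is its own cluster, the procedure degenerates to an ordinary random split.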
Affiliation(s)
- Kaixin Hu
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Ehsaneddin Asgari
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Molecular Cell Biomechanics Laboratory, Department of Bioengineering and Mechanical Engineering, University of California, Berkeley, USA
- Tzu-Hao Kuo
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Philipp C Münch
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
3
Rahman MK, Williams RB, Ajulo S, Levent G, Loneragan GH, Awosile B. Predictive Modeling of Phenotypic Antimicrobial Susceptibility of Selected Beta-Lactam Antimicrobials from Beta-Lactamase Resistance Genes. Antibiotics (Basel) 2024; 13:224. [PMID: 38534659 DOI: 10.3390/antibiotics13030224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2024] [Revised: 02/23/2024] [Accepted: 02/26/2024] [Indexed: 03/28/2024]
Abstract
The outcome of bacterial infection management relies on prompt diagnosis and effective treatment, but conventional antimicrobial susceptibility testing can be slow and labor-intensive. Therefore, this study aims to predict phenotypic antimicrobial susceptibility of selected beta-lactam antimicrobials in bacteria of the family Enterobacteriaceae from different beta-lactamase resistance genotypes. Using human datasets extracted from the Antimicrobial Testing Leadership and Surveillance (ATLAS) program conducted by Pfizer and retail meat datasets from the National Antimicrobial Resistance Monitoring System for Enteric Bacteria (NARMS), we used a robust or weighted least squares multivariable linear regression modeling framework to explore the relationship between antimicrobial susceptibility data of beta-lactam antimicrobials and different types of beta-lactamase resistance genes. In humans, in the presence of the blaCTX-M-1, blaCTX-M-2, blaCTX-M-8/25, and blaCTX-M-9 groups, MICs of cephalosporins significantly increased by values between 0.34 and 3.07 μg/mL; however, the MICs of carbapenems significantly decreased by values between 0.81 and 0.87 μg/mL. In the presence of carbapenemase genes (blaKPC, blaNDM, blaIMP, and blaVIM), the MICs of cephalosporins significantly increased by values between 1.06 and 5.77 μg/mL, while the MICs of carbapenems significantly increased by values between 5.39 and 67.38 μg/mL. In retail meat, the MIC of ceftriaxone increased significantly in the presence of blaCMY-2, blaCTX-M-1, blaCTX-M-55, blaCTX-M-65, and blaSHV-2 by 55.16, 222.70, 250.81, 204.89, and 31.51 μg/mL, respectively. The MIC of cefoxitin increased significantly in the presence of blaCTX-M-65 and blaTEM-1 by 1.57 and 1.04 μg/mL, respectively. In the presence of blaCMY-2, the MIC of cefoxitin increased by an average of 8.66 μg/mL over 17 years. Compared to E. coli isolates, the MIC of cefoxitin in Salmonella enterica isolates decreased significantly by 0.67 μg/mL. On the other hand, the MIC of ceftiofur increased in the presence of blaCTX-M-1, blaCTX-M-65, blaSHV-2, and blaTEM-1 by 8.82, 9.11, 8.18, and 1.04 μg/mL, respectively. In the presence of blaCMY-2, the MIC of ceftiofur increased by an average of 10.20 μg/mL over 14 years. The ability to predict antimicrobial susceptibility of beta-lactam antimicrobials directly from beta-lactamase resistance genes may help reduce reliance on routine phenotypic testing, which has longer turnaround times, in the diagnosis, treatment, and surveillance of antimicrobial-resistant bacteria of the family Enterobacteriaceae.
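The coefficients reported above read as estimated changes in MIC (μg/mL) associated with the presence of a gene, holding the other terms fixed. As a minimal sketch of that kind of model, the following fits a weighted least squares regression on binary gene-presence indicators with NumPy. The weights, gene columns and data are hypothetical and synthetic; the study's actual weighting scheme and robust-regression settings are not described in the abstract.

```python
import numpy as np


def weighted_least_squares(X, y, w):
    """Fit y ~ b0 + X @ b by minimising sum_i w_i * (y_i - fitted_i)**2.

    X: (n, p) matrix of 0/1 gene-presence indicators (no intercept column).
    y: (n,) response, e.g. MIC in ug/mL.
    w: (n,) positive observation weights (hypothetical here).
    Returns an array whose first entry is the intercept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    sqrt_w = np.sqrt(np.asarray(w, dtype=float))
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    # Scaling each row by sqrt(w_i) turns WLS into ordinary least squares.
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


# Synthetic data: two hypothetical genes, MIC = 1 + 2*geneA + 5*geneB exactly.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [1, 0], [0, 1]])
y = 1 + 2 * X[:, 0] + 5 * X[:, 1]
w = np.array([1, 2, 1, 3, 1, 2])
print(weighted_least_squares(X, y, w))  # coefficients close to intercept 1, geneA 2, geneB 5
```

Each slope is then the estimated MIC shift attributed to one gene, which is the form in which the abstract reports its results.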
Affiliation(s)
- Md Kaisar Rahman
- School of Veterinary Medicine, Texas Tech University, Amarillo, TX 79106, USA
- Ryan B Williams
- School of Veterinary Medicine, Texas Tech University, Amarillo, TX 79106, USA
- Samuel Ajulo
- School of Veterinary Medicine, Texas Tech University, Amarillo, TX 79106, USA
- Gizem Levent
- School of Veterinary Medicine, Texas Tech University, Amarillo, TX 79106, USA
- Guy H Loneragan
- School of Veterinary Medicine, Texas Tech University, Amarillo, TX 79106, USA
- Babafela Awosile
- School of Veterinary Medicine, Texas Tech University, Amarillo, TX 79106, USA
4
Ayoola MB, Das AR, Krishnan BS, Smith DR, Nanduri B, Ramkumar M. Predicting Salmonella MIC and Deciphering Genomic Determinants of Antibiotic Resistance and Susceptibility. Microorganisms 2024; 12:134. [PMID: 38257961 PMCID: PMC10819212 DOI: 10.3390/microorganisms12010134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/04/2024] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
Salmonella spp., a leading cause of foodborne illness, is a formidable global menace due to escalating antimicrobial resistance (AMR). The evaluation of minimum inhibitory concentration (MIC) for antimicrobials is critical for characterizing AMR. Current whole-genome sequencing (WGS)-based approaches for predicting MIC are hindered by both computational and feature identification constraints. We propose an innovative methodology called the "Genome Feature Extractor Pipeline" that integrates traditional machine learning (random forest, RF) with deep learning models (multilayer perceptron (MLP) and DeepLift) for WGS-based MIC prediction. We used a dataset from the National Antimicrobial Resistance Monitoring System (NARMS), comprising 4500 assembled genomes of nontyphoidal Salmonella, each annotated with MIC metadata for 15 antibiotics. Our pipeline involves the batch downloading of annotated genomes, the determination of feature importance using RF, Gini-index-based selection of crucial 10-mers, and their expansion to 20-mers. This is followed by an MLP network, with four hidden layers of 1024 neurons each, to predict MIC values. Using DeepLift, key 20-mers and associated genes influencing MIC are identified. The 10 most significant 20-mers for each antibiotic are listed, showcasing our ability to discern genomic features affecting Salmonella MIC prediction with enhanced precision. The methodology replaces binary indicators with k-mer counts, offering a more nuanced analysis. The combination of RF and MLP addresses the limitations of existing WGS-based approaches, providing a robust and efficient method for predicting MIC values in Salmonella that could potentially be applied to other pathogens.
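The pipeline above works on k-mer counts rather than binary presence/absence indicators. The following is a minimal, pure-Python sketch of that first step only: counting overlapping k-mers in assembled sequences and turning them into a count matrix over a chosen k-mer vocabulary. It does not reproduce the RF-based selection, the 10-mer to 20-mer expansion or the MLP, and, as a simplifying assumption, it does not merge reverse-complement k-mers, which many k-mer tools do.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def count_kmers(sequence, k):
    """Count overlapping k-mers, skipping any window that contains a non-ACGT symbol."""
    sequence = sequence.upper()
    counts = Counter()
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if VALID_BASES.issuperset(kmer):
            counts[kmer] += 1
    return counts


def count_matrix(genomes, vocabulary, k):
    """One row per genome, one column per vocabulary k-mer; cells hold counts, not 0/1."""
    matrix = []
    for sequence in genomes:
        counts = count_kmers(sequence, k)
        matrix.append([counts.get(kmer, 0) for kmer in vocabulary])
    return matrix


# Toy example with k = 4 (the pipeline described above uses 10-mers and 20-mers).
genomes = ["ACGTACGT", "AAAAC"]
vocabulary = ["ACGT", "AAAA", "TACG"]
print(count_matrix(genomes, vocabulary, 4))
```

The resulting matrix is the kind of input a feature-importance step such as a random forest could rank before later modeling stages.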
Affiliation(s)
- Moses B. Ayoola
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- Athish Ram Das
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- B. Santhana Krishnan
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- David R. Smith
- Department of Population Medicine, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- Bindu Nanduri
- Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- Mahalingam Ramkumar
- Department of Computer Science and Engineering, Mississippi State University, Starkville, MS 39762, USA
5
Yang J, Eyre DW, Lu L, Clifton DA. Interpretable machine learning-based decision support for prediction of antibiotic resistance for complicated urinary tract infections. npj Antimicrobials and Resistance 2023; 1:14. [PMID: 38686216 PMCID: PMC11057209 DOI: 10.1038/s44259-023-00015-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Accepted: 10/04/2023] [Indexed: 05/02/2024]
Abstract
Urinary tract infections (UTIs) are among the most common bacterial infections worldwide; however, increasing antimicrobial resistance in bacterial pathogens is making it challenging for clinicians to prescribe appropriate antibiotics. In this study, we present four interpretable machine learning-based decision support algorithms for predicting antimicrobial resistance. Using electronic health record data from a large cohort of patients diagnosed with potentially complicated UTIs, we demonstrate high predictability of antibiotic resistance across four antibiotics: nitrofurantoin, co-trimoxazole, ciprofloxacin, and levofloxacin. We additionally demonstrate the generalizability of our methods on a separate cohort of patients with uncomplicated UTIs, showing that machine learning-driven approaches can help reduce the risk of administering antibiotics to which the pathogen is not susceptible, facilitate rapid and effective clinical interventions, and enable personalized treatment suggestions. Additionally, these techniques offer model interpretability, explaining the basis for generated predictions.
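Interpretability here means that each prediction can be traced back to the inputs that drove it. One common way to do this, shown only as an illustration and not as the study's method, is to split a logistic model's log-odds into additive per-feature contributions. The feature names, weights and bias below are hypothetical and not taken from the study.

```python
import math


def explain_prediction(weights, bias, features):
    """Return P(resistant) and per-feature contributions to the log-odds, largest first.

    Each contribution is weight * feature value, so bias + sum(contributions) = log-odds.
    Features missing from `features` are treated as 0.
    """
    contributions = {name: w * features.get(name, 0.0) for name, w in weights.items()}
    log_odds = bias + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return probability, ranked


# Hypothetical weights and patient record, for illustration only.
weights = {"prior_resistant_isolate": 1.2, "recent_antibiotic_use": 0.8, "age_over_65": 0.3}
patient = {"prior_resistant_isolate": 1, "recent_antibiotic_use": 1, "age_over_65": 0}
probability, ranked = explain_prediction(weights, bias=-2.0, features=patient)
print(f"P(resistant) = {probability:.2f}")
for name, contribution in ranked:
    print(f"{name}: {contribution:+.2f}")
```

Listing contributions alongside the probability is what lets a clinician see why a given antibiotic was flagged, rather than receiving a bare score.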
Affiliation(s)
- Jenny Yang
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- David W. Eyre
- Big Data Institute, Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Lei Lu
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- David A. Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, UK
- Oxford-Suzhou Centre for Advanced Research (OSCAR), Suzhou, China
6
Yang MR, Su SF, Wu YW. Using bacterial pan-genome-based feature selection approach to improve the prediction of minimum inhibitory concentration (MIC). Front Genet 2023; 14:1054032. [PMID: 37323667 PMCID: PMC10267731 DOI: 10.3389/fgene.2023.1054032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 05/16/2023] [Indexed: 06/17/2023]
Abstract
Background: Predicting the resistance profiles of pathogens with antimicrobial resistance (AMR) is becoming increasingly important in treating infectious diseases. Various attempts have been made to build machine learning models that classify pathogens as resistant or susceptible based on either known antimicrobial resistance genes or the entire gene set. However, these phenotypic annotations are derived from the minimum inhibitory concentration (MIC), the lowest antibiotic concentration that inhibits visible growth of a given strain. Since the MIC breakpoints that classify a strain as resistant or susceptible to a specific antibiotic may be revised by governing institutes, we refrained from translating MIC values into the categories "susceptible" or "resistant" and instead attempted to predict the MIC values directly using machine learning approaches. Results: By applying a machine learning feature selection approach to a Salmonella enterica pan-genome, in which protein sequences were clustered to identify highly similar gene families, we showed that the selected features (genes) performed better than known AMR genes and that models built on the selected genes achieved very accurate MIC prediction. Functional analysis revealed that about half of the selected genes were annotated as hypothetical proteins (i.e., with unknown functional roles) and that only a small portion of known AMR genes were among the selected genes, indicating that applying feature selection to the entire gene set has the potential to uncover novel genes that may be associated with, and may contribute to, antimicrobial resistance in pathogens. Conclusion: The pan-genome-based machine learning approach was indeed capable of predicting MIC values with very high accuracy. The feature selection process may also identify novel AMR genes for inferring bacterial antimicrobial resistance phenotypes.
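The argument for predicting MICs directly is that the susceptible/resistant label depends on breakpoints that can change, whereas the MIC itself does not. A minimal sketch, assuming MICs on the usual two-fold dilution series: the same MIC is categorised differently under two hypothetical breakpoint pairs, and MIC predictions are scored by whether they fall within one two-fold dilution of the measured value, a common tolerance for MIC prediction. The breakpoint values are illustrative, and the comparison convention used here (S if MIC <= S breakpoint, R if MIC >= R breakpoint) is an assumption, since conventions differ between standards bodies.

```python
import math


def categorize(mic, s_breakpoint, r_breakpoint):
    """Assumed convention: S if MIC <= s_breakpoint, R if MIC >= r_breakpoint, else I."""
    if mic <= s_breakpoint:
        return "S"
    if mic >= r_breakpoint:
        return "R"
    return "I"


def within_one_dilution(predicted_mic, measured_mic):
    """True if the MICs differ by at most one two-fold dilution (|log2 ratio| <= 1)."""
    # Small tolerance guards against floating-point error in log2 of non-powers of two.
    return abs(math.log2(predicted_mic) - math.log2(measured_mic)) <= 1 + 1e-9


def dilution_accuracy(predicted, measured):
    """Fraction of predictions within +/-1 two-fold dilution of the measured MIC."""
    hits = sum(within_one_dilution(p, m) for p, m in zip(predicted, measured))
    return hits / len(measured)


# The same MIC of 2 ug/mL under two hypothetical breakpoint pairs.
print(categorize(2, s_breakpoint=2, r_breakpoint=8))  # S under the older pair
print(categorize(2, s_breakpoint=1, r_breakpoint=4))  # I after a revision
print(dilution_accuracy([1, 4, 32], [2, 16, 32]))
```

A model trained on the MIC keeps its value when breakpoints are revised, because the category can always be recomputed from the predicted MIC.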
Affiliation(s)
- Ming-Ren Yang
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
- Shun-Feng Su
- Department of Electrical Engineering, National Taiwan University of Science and Technology, Taipei, Taiwan
- Yu-Wei Wu
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei, Taiwan
- TMU Research Center for Digestive Medicine, Taipei Medical University, Taipei, Taiwan
7
Li S, Wu J, Ma N, Liu W, Shao M, Ying N, Zhu L. Prediction of genome-wide imipenem resistance features in Klebsiella pneumoniae using machine learning. J Med Microbiol 2023; 72. [PMID: 36753438 DOI: 10.1099/jmm.0.001657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
Abstract
Introduction. The resistance rate of Klebsiella pneumoniae (K. pneumoniae) to imipenem is increasing year by year, and the imipenem resistance mechanism of K. pneumoniae is complex. Therefore, new strategies are urgently needed to explore the mechanism of imipenem resistance so that imipenem can be used effectively and accurately in clinical practice.
Hypothesis/Gap Statement. Machine learning could identify resistance features and biological processes that influence microbial resistance from whole-genome sequencing (WGS) data.
Aims. This work aimed to predict imipenem resistance genetic features in K. pneumoniae from whole-genome k-mer features, and to analyse their function to understand the resistance mechanism.
Methods. This study analysed WGS data of K. pneumoniae combined with imipenem resistance phenotypes and established a K. pneumoniae imipenem genotype-phenotype model to predict resistance features using the chi-squared test and random forest. An external clinical dataset was used to verify the predictive power of the resistance features. Potential genes were identified by aligning the resistance features with the K. pneumoniae reference genome using blastn; the functions of these genes were further analysed with GO and KEGG analysis to explore resistance-related signalling pathways, and resistance sequence patterns were screened using STREME. Finally, the resistance features were combined and modelled with four machine-learning algorithms (logistic regression, SVM, GBDT and XGBoost) to evaluate their ability to predict the phenotype.
Results. A total of 16 670 imipenem resistance features were predicted from the genotype-phenotype model. Thirty potential genes were identified by annotating the resistance features, and these corresponded to known antibiotic-related genes (mdtM, dedA, rne, etc.). GO and KEGG pathway analyses indicated a possible association of imipenem resistance with metabolic processes and the cell membrane. The sequence patterns CRYCAGCDN and CGRDAAAN were found among the imipenem resistance features and are widely present in reported β-lactam resistance genes (blaSHV, blaCTX-M, blaTEM, etc.), and YCYAGCMCAST, associated with metabolic functions (organic substance metabolic process, nitrogen compound metabolic process and cellular metabolic process), was identified among the top 50 resistance features. The 25 resistance genes identified in the training dataset included 19 genes also found in the external dataset, which verified the accuracy of prediction. The area under the curve values of logistic regression, SVM, GBDT and XGBoost were 0.965, 0.966, 0.969 and 0.969, respectively, indicating that the imipenem resistance features have strong predictive power.
Conclusion. Machine-learning methods could effectively predict imipenem resistance features in K. pneumoniae and provide resistance sequence profiles for predicting the resistance phenotype and exploring potential resistance mechanisms. This provides important insight into potential therapeutic strategies against imipenem-resistant K. pneumoniae and speeds up the application of machine learning in routine diagnosis.
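The methods combine a chi-squared screen of k-mer features with classifiers evaluated by the area under the ROC curve (AUC). The sketch below shows both calculations in plain Python, for illustration only: a Pearson chi-squared statistic (without continuity correction) for one k-mer's presence/absence against the resistance phenotype, and an AUC computed as the probability that a resistant isolate is scored above a susceptible one. The counts and scores are made up, and whether the study applied a continuity correction is not stated in the abstract.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table:

                      resistant  susceptible
        k-mer present     a           b
        k-mer absent      c           d

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # an empty row or column carries no evidence of association
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def roc_auc(scores, labels):
    """AUC = P(score of a resistant isolate > score of a susceptible one); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up counts: the k-mer occurs in 40 of 50 resistant and 5 of 50 susceptible isolates.
print(chi2_2x2(40, 5, 10, 45))
print(roc_auc([0.9, 0.8, 0.4, 0.35, 0.1], [1, 1, 0, 1, 0]))
```

In a screen like the one described, the test would be applied to every k-mer and the most strongly associated ones passed on to the classifiers.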
Affiliation(s)
- Shanshan Li
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Jun Wu
- Lin'an Center for Disease Control and Prevention, Lin'an, 311300, PR China
- Nan Ma
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Wenjia Liu
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou 310018, PR China
- Mengjie Shao
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Nanjiao Ying
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Lei Zhu
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
8
Machine Learning for Antimicrobial Resistance Prediction: Current Practice, Limitations, and Clinical Perspective. Clin Microbiol Rev 2022; 35:e0017921. [PMID: 35612324 DOI: 10.1128/cmr.00179-21] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Indexed: 12/17/2022]
Abstract
Antimicrobial resistance (AMR) is a global health crisis that poses a great threat to modern medicine. Effective prevention strategies are urgently required to slow the emergence and further dissemination of AMR. Given the availability of data sets encompassing hundreds or thousands of pathogen genomes, machine learning (ML) is increasingly being used to predict resistance to different antibiotics in pathogens based on gene content and genome composition. A key objective of this work is to advocate for the incorporation of ML into front-line settings but also highlight the further refinements that are necessary to safely and confidently incorporate these methods. The question of what to predict is not trivial given the existence of different quantitative and qualitative laboratory measures of AMR. ML models typically treat genes as independent predictors, with no consideration of structural and functional linkages; they also may not be accurate when new mutational variants of known AMR genes emerge. Finally, to have the technology trusted by end users in public health settings, ML models need to be transparent and explainable to ensure that the basis for prediction is clear. We strongly advocate that the next set of AMR-ML studies should focus on the refinement of these limitations to be able to bridge the gap to diagnostic implementation.
9
Aytan-Aktug D, Clausen PTLC, Szarvas J, Munk P, Otani S, Nguyen M, Davis JJ, Lund O, Aarestrup FM. PlasmidHostFinder: Prediction of Plasmid Hosts Using Random Forest. mSystems 2022; 7:e0118021. [PMID: 35382558 PMCID: PMC9040769 DOI: 10.1128/msystems.01180-21] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/27/2021] [Accepted: 03/16/2022] [Indexed: 11/20/2022]
Abstract
Plasmids play a major role in facilitating the spread of antimicrobial resistance between bacteria. Understanding the host range and dissemination trajectories of plasmids is critical for surveillance and prevention of antimicrobial resistance. Because of the diversity and genetic plasticity of plasmids, automated pattern detection methods could identify plasmid host ranges better than homology-based methods. In this study, we developed a method for predicting the host range of plasmids using machine learning, specifically random forests. We trained the models with 8,519 plasmids from 359 different bacterial species per taxonomic level; the models achieved Matthews correlation coefficients of 0.662 and 0.867 at the species and order levels, respectively. Our results suggest that despite the diverse nature and genetic plasticity of plasmids, our random forest model can accurately distinguish between plasmid hosts. This tool is available online through the Center for Genomic Epidemiology (https://cge.cbs.dtu.dk/services/PlasmidHostFinder/).
IMPORTANCE Antimicrobial resistance is a global health threat to humans and animals, causing high mortality and morbidity while effectively ending decades of success in fighting against bacterial infections. Plasmids confer extra genetic capabilities on their host organisms through accessory genes that can encode antimicrobial resistance and virulence. In addition to vertical inheritance, plasmids can be transferred horizontally between bacterial taxa. Therefore, detection of the host range of plasmids is crucial for understanding and predicting the dissemination trajectories of extrachromosomal genes and bacterial evolution, as well as for taking effective countermeasures against antimicrobial resistance.
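The Matthews correlation coefficient (MCC) reported above summarises a confusion matrix in a single number between -1 and 1, where 1 means perfect prediction and 0 means no better than chance. A minimal sketch of the multi-class form, which reduces to the familiar binary formula (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)) for two classes. Whether PlasmidHostFinder computed MCC exactly this way is not stated in the abstract, and the host labels in the example are hypothetical.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true, y_pred):
    """Multi-class MCC from paired label lists (equals the binary MCC for two classes)."""
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))  # correctly predicted samples
    t_k = Counter(y_true)  # how often each class truly occurs
    p_k = Counter(y_pred)  # how often each class is predicted
    classes = set(t_k) | set(p_k)
    cov_tp = c * s - sum(t_k[k] * p_k[k] for k in classes)
    cov_pp = s * s - sum(p_k[k] ** 2 for k in classes)
    cov_tt = s * s - sum(t_k[k] ** 2 for k in classes)
    if cov_pp == 0 or cov_tt == 0:
        return 0.0  # a constant prediction or constant truth: MCC is undefined, report 0
    return cov_tp / math.sqrt(cov_pp * cov_tt)


# Hypothetical species-level host predictions for four plasmids.
y_true = ["E. coli", "E. coli", "K. pneumoniae", "S. enterica"]
y_pred = ["E. coli", "K. pneumoniae", "K. pneumoniae", "S. enterica"]
print(round(matthews_corrcoef(y_true, y_pred), 3))
```

Unlike plain accuracy, MCC stays informative when some host classes are much rarer than others, which is typical when hundreds of species are involved.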
Affiliation(s)
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Judit Szarvas
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Patrick Munk
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Saria Otani
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Marcus Nguyen
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- James J. Davis
- Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, Illinois, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, Illinois, USA
- Ole Lund
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Frank M. Aarestrup
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
10
Florensa AF, Kaas RS, Clausen PTLC, Aytan-Aktug D, Aarestrup FM. ResFinder - an open online resource for identification of antimicrobial resistance genes in next-generation sequencing data and prediction of phenotypes from genotypes. Microb Genom 2022; 8. [PMID: 35072601 PMCID: PMC8914360 DOI: 10.1099/mgen.0.000748] [Citation(s) in RCA: 131] [Impact Index Per Article: 65.5] [Indexed: 11/26/2022]
Abstract
Antimicrobial resistance (AMR) is one of the most important health threats globally. The ability to accurately identify resistant bacterial isolates and the individual antimicrobial resistance genes (ARGs) is essential for understanding the evolution and emergence of AMR and for providing appropriate treatment. Rapid developments in next-generation sequencing technologies have made this technology available to researchers and microbiologists in routine laboratories around the world. However, tools for those with limited bioinformatics experience are lacking, especially tools that enable researchers and microbiologists in low- and middle-income countries (LMICs) to perform their own studies. The CGE-tools (Center for Genomic Epidemiology), including ResFinder (https://cge.cbs.dtu.dk/services/ResFinder/), were developed to provide freely available, easy-to-use online bioinformatic tools that allow inexperienced researchers and microbiologists to perform simple bioinformatic analyses. The main purpose was, and remains, to provide these solutions for people involved in frontline diagnosis, especially in LMICs. Since its original publication in 2012, ResFinder has undergone a number of improvements, including updates to the code and databases, the inclusion of point mutations for selected bacterial species, and phenotype prediction, also for selected species. As of 28 September 2021, 820,803 analyses had been performed using ResFinder from 61,776 IP addresses in 171 countries. ResFinder clearly fulfills a need for many people around the globe, and we hope to continue providing this service free of charge in the future. We also hope and expect to provide further improvements, including phenotypic predictions for additional bacterial species.
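The genotype-to-phenotype step can be pictured as two stages: keep only gene hits whose alignment to a reference gene is good enough, then look up which antimicrobial classes each accepted gene is linked to. The sketch below follows that picture under loud assumptions: the three-entry `GENE_TO_CLASSES` table, the `Hit` record and the 90% identity / 60% coverage thresholds are hypothetical placeholders, not ResFinder's actual database, code or default settings.

```python
from dataclasses import dataclass

# Hypothetical, illustrative gene-to-class table (NOT the ResFinder database).
GENE_TO_CLASSES = {
    "blaCTX-M-15": {"beta-lactam"},
    "tet(A)": {"tetracycline"},
    "sul1": {"sulfonamide"},
}


@dataclass
class Hit:
    """One alignment of assembled contigs or reads to a reference ARG (hypothetical record)."""
    gene: str
    identity: float  # percent identity to the reference gene
    coverage: float  # percent of the reference gene length covered


def predict_resistance(hits, min_identity=90.0, min_coverage=60.0):
    """Return the sorted antimicrobial classes implied by hits passing both thresholds."""
    classes = set()
    for hit in hits:
        if hit.identity >= min_identity and hit.coverage >= min_coverage:
            classes |= GENE_TO_CLASSES.get(hit.gene, set())  # unknown genes add nothing
    return sorted(classes)
```

Lowering the identity threshold trades specificity for sensitivity: divergent gene variants are accepted, but so are more spurious hits, which is one reason curated thresholds and databases matter for a public service.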
Affiliation(s)
- Rolf Sommer Kaas
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark
11
Nguyen Q, Nguyen TTN, Pham P, Chau V, Nguyen LPH, Nguyen TD, Ha TT, Le NTQ, Vu DT, Baker S, Thwaites GE, Rabaa MA, Pham DT. Genomic insights into the circulation of pandemic fluoroquinolone-resistant extra-intestinal pathogenic Escherichia coli ST1193 in Vietnam. Microb Genom 2021; 7. [PMID: 34904942 PMCID: PMC8767341 DOI: 10.1099/mgen.0.000733] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/29/2022]
Abstract
Extra-intestinal pathogenic Escherichia coli (ExPEC) ST1193, a globally emergent fluoroquinolone-resistant clone, has become an important cause of bloodstream infections (BSIs) associated with significant morbidity and mortality. Previous studies have reported the emergence of fluoroquinolone-resistant ExPEC ST1193 in Vietnam; however, limited data exist regarding the genetic structure, antimicrobial resistance (AMR) determinants and transmission dynamics of this pandemic clone. Here, we performed genomic and phylogenetic analyses of 46 ST1193 isolates obtained from BSIs and healthy individuals in Ho Chi Minh City, Vietnam, to investigate the pathogen population structure, molecular mechanisms of AMR and potential transmission patterns. We further examined the phylogenetic structure of ST1193 isolates in a global context. We found that the endemic E. coli ST1193 population was heterogeneous and highly dynamic, largely driven by multiple strain importations. Several well-supported phylogenetic clusters (C1–C6) were identified and associated with distinct blaCTX-M variants, including blaCTX-M-27 (C1–C3, C5), blaCTX-M-55 (C4) and blaCTX-M-15 (C6). Most ST1193 isolates were multidrug-resistant and carried an extensive array of AMR genes. ST1193 isolates also exhibited the ability to acquire further resistance while circulating in Vietnam. There were phylogenetic links between ST1193 isolates from BSIs and from healthy individuals, suggesting that these organisms may both establish long-term colonization of the human intestinal tract and cause infections. Our study uncovers factors shaping the population structure and transmission dynamics of multidrug-resistant ST1193 in Vietnam, and highlights the urgent need for local One Health genomic surveillance to capture newly emerging ExPEC clones and to better understand the origins and transmission patterns of these pathogens.
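Phylogenetic links between bloodstream and carriage isolates are often screened with pairwise SNP distances computed on a core-genome alignment. A minimal sketch, assuming aligned sequences of equal length and a hypothetical SNP threshold (the study's actual pipeline and cut-off are not given here), counts differing sites while ignoring gaps and Ns, then groups isolates by single linkage using a union-find structure.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count differing aligned positions, ignoring gaps ('-') and ambiguous 'N' sites."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    ignore = {"-", "N"}
    return sum(
        x != y
        for x, y in zip(a.upper(), b.upper())
        if x not in ignore and y not in ignore
    )


def snp_clusters(seqs, threshold):
    """Single-linkage clusters: isolates within `threshold` SNPs end up in one group."""
    parent = {name: name for name in seqs}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        if snp_distance(s1, s2) <= threshold:
            parent[find(n1)] = find(n2)  # merge the two clusters

    groups = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

Single linkage is a deliberate simplification: a chain of closely related isolates is merged into one cluster even if its endpoints are far apart, so tree-based clustering is usually preferred for formal analyses.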
Affiliation(s)
- Quynh Nguyen
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Phuong Pham
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Vinh Chau
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Tuyen Thanh Ha
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Nhi Thi Quynh Le
- The University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
- Stephen Baker
- Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Department of Medicine, University of Cambridge, Cambridge, UK
- Guy E Thwaites
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Maia A Rabaa
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Duy Thanh Pham
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Medicine, University of Oxford, Oxford, UK
12
Schuele L, Cassidy H, Peker N, Rossen JWA, Couto N. Future potential of metagenomics in clinical laboratories. Expert Rev Mol Diagn 2021; 21:1273-1285. [PMID: 34755585 DOI: 10.1080/14737159.2021.2001329] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 10/19/2022]
Abstract
INTRODUCTION Rapid and sensitive diagnostic strategies are necessary for patient care and public health. Most current conventional microbiological assays detect only a restricted panel of pathogens at a time or require a microbe to be successfully cultured from a sample. Clinical metagenomic next-generation sequencing (mNGS) has the potential to detect all pathogens in a sample in an unbiased way, increasing the sensitivity of detection and enabling the discovery of unknown infectious agents. AREAS COVERED High expectations have been built around mNGS; however, this technique is far from widely available. This review highlights the advances and currently available options in terms of costs, turnaround time, sensitivity, specificity, validation, and reproducibility of mNGS as a diagnostic tool in clinical microbiology laboratories. EXPERT OPINION The need for a novel diagnostic tool to increase the sensitivity of microbial diagnostics is clear. mNGS has the potential to revolutionise clinical microbiology. However, its role as a diagnostic tool has yet to be widely established, which is crucial for successfully implementing the technique. A clear definition of diagnostic algorithms that include mNGS is vital to demonstrate clinical utility. Like real-time PCR, mNGS will one day become a vital tool in any testing algorithm.
Affiliation(s)
- Leonard Schuele
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, the Netherlands
- Hayley Cassidy
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, the Netherlands
- Nilay Peker
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, the Netherlands
- John W A Rossen
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, the Netherlands
- Department of Pathology, University of Utah School of Medicine, Salt Lake City, UT, USA
- Natacha Couto
- University of Groningen, University Medical Center Groningen, Department of Medical Microbiology and Infection Prevention, Groningen, the Netherlands
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
13
VanOeffelen M, Nguyen M, Aytan-Aktug D, Brettin T, Dietrich EM, Kenyon RW, Machi D, Mao C, Olson R, Pusch GD, Shukla M, Stevens R, Vonstein V, Warren AS, Wattam AR, Yoo H, Davis JJ. A genomic data resource for predicting antimicrobial resistance from laboratory-derived antimicrobial susceptibility phenotypes. Brief Bioinform 2021; 22:bbab313. [PMID: 34379107 PMCID: PMC8575023 DOI: 10.1093/bib/bbab313] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/27/2021] [Revised: 06/18/2021] [Accepted: 07/20/2021] [Indexed: 11/14/2022]
Abstract
Antimicrobial resistance (AMR) is a major global health threat that affects millions of people each year. Funding agencies worldwide and the global research community have expended considerable capital and effort tracking the evolution and spread of AMR by isolating and sequencing bacterial strains and performing antimicrobial susceptibility testing (AST). For the last several years, we have been capturing these efforts by curating data from the literature and data resources and building a set of assembled bacterial genome sequences paired with laboratory-derived AST data. This collection currently contains AST data for over 67,000 genomes encompassing approximately 40 genera and over 100 species. In this paper, we describe the characteristics of this collection, highlighting areas where sampling is comparatively deep or shallow and showing where attention is needed from the research community to improve sampling and tracking efforts. In addition to supporting the tracking of AMR evolution and spread, the collection serves as a useful starting point for building machine learning models that predict AMR phenotypes. We demonstrate this by describing two machine learning models built from the entire dataset, showing where predictive power is comparatively high or low. This AMR metadata collection is freely available and maintained on the Bacterial and Viral Bioinformatics Resource Center (BV-BRC) FTP site ftp://ftp.bvbrc.org/RELEASE_NOTES/PATRIC_genomes_AMR.txt.
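A first look at where sampling is deep or shallow can be as simple as counting distinct genomes per species, antibiotic and phenotype. The sketch below assumes a tab-delimited file with `genome_id`, `genome_name`, `antibiotic` and `resistant_phenotype` columns and takes the species as the first two words of `genome_name`; these column names are assumptions, so check the header of the released file before relying on them.

```python
import csv
from collections import defaultdict


def sampling_depth(lines):
    """Count distinct genomes per (species, antibiotic, phenotype) in tab-separated AST metadata.

    `lines` is any iterable of text lines (an open file works); the column names are assumed.
    """
    genomes = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        species = " ".join(row["genome_name"].split()[:2])  # e.g. "Klebsiella pneumoniae"
        key = (
            species,
            row["antibiotic"].strip().lower(),  # normalise "Meropenem" vs "meropenem"
            row["resistant_phenotype"].strip(),
        )
        genomes[key].add(row["genome_id"])  # a set, so repeated rows count once
    return {key: len(ids) for key, ids in genomes.items()}
```

Counting distinct genome identifiers rather than rows avoids inflating the depth estimate when the same isolate was tested more than once against the same drug.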
Affiliation(s)
- Marcus Nguyen
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Thomas Brettin
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Emily M Dietrich
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Ronald W Kenyon
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Dustin Machi
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Chunhong Mao
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Robert Olson
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Gordon D Pusch
- Fellowship for Interpretation of Genomes, Burr Ridge, IL, USA
- Maulik Shukla
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Rick Stevens
- Computing Environment and Life Sciences, Argonne National Laboratory, Argonne, IL, USA
- Department of Computer Science, University of Chicago, Chicago, IL, USA
- Andrew S Warren
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Alice R Wattam
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Biocomplexity Institute and Initiative, University of Virginia, Virginia, USA
- Hyunseung Yoo
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- James J Davis
- University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, IL, USA
- Data Science and Learning Division, Argonne National Laboratory, Argonne, IL, USA
- Northwestern Argonne Institute for Science and Engineering, Evanston, IL, USA
14
Tan R, Yu A, Liu Z, Liu Z, Jiang R, Wang X, Liu J, Gao J, Wang X. Prediction of Minimal Inhibitory Concentration of Meropenem Against Klebsiella pneumoniae Using Metagenomic Data. Front Microbiol 2021; 12:712886. [PMID: 34497594 PMCID: PMC8421019 DOI: 10.3389/fmicb.2021.712886] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/21/2021] [Accepted: 07/26/2021] [Indexed: 11/29/2022]
Abstract
Minimal inhibitory concentration (MIC) is defined as the lowest concentration of an antimicrobial agent that inhibits the visible growth of a particular microorganism after overnight incubation. Clinically, antibiotic doses for specific infections are determined with reference to the MIC. Therefore, a credible assessment of MICs provides physicians with valuable information for choosing a therapeutic strategy. Early and precise use of antibiotics is key to treating infection. Compared with the traditional culture-based method, using whole genome sequencing to identify MICs can shorten the experimental time, thereby improving clinical efficacy. Klebsiella pneumoniae is one of the most significant members of the genus Klebsiella in the Enterobacteriaceae family and a common nosocomial pathogen. Meropenem is a broad-spectrum antibacterial agent of the carbapenem family that is active against most Gram-positive and Gram-negative bacteria. In this study, we used single-nucleotide polymorphism (SNP) information and nucleotide k-mer counts based on metagenomic data to predict MICs of meropenem against K. pneumoniae. Features from 110 sequenced K. pneumoniae genomes were then combined and modeled with the XGBoost algorithm and a deep neural network (DNN) algorithm to predict MICs. We first used an XGBoost classification model and an XGBoost regression model. After five runs, the average accuracy on the test set was calculated. Using nucleotide k-mers, the XGBoost classification and regression models predicted MICs with accuracies of 84.5% and 89.1%, respectively; using SNPs, the corresponding accuracies were 80% and 81.8%. These results show that XGBoost regression outperformed XGBoost classification in predicting MICs from both nucleotide k-mers and SNPs.
We further selected the 40 nucleotide k-mers and 40 SNPs with the highest correlation with MIC values as features to retrain the XGBoost regression model and a DNN regression model. After 100 and 1,000 runs, the accuracy of both models had improved. The accuracy of the XGBoost regression model for k-mers, SNPs, and k-mers & SNPs was 91.1%, 85.2%, and 91.3%, respectively. The accuracy of the DNN regression model was 91.9%, 87.1%, and 91.8%, respectively. External verification found some of the selected features to be related to drug resistance.
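The feature-selection step described above, keeping the 40 k-mers and 40 SNPs most correlated with MIC, can be sketched with plain Pearson correlation. Two assumptions are labelled here: MIC values are log2-transformed before correlation (a common convention for doubling-dilution data that the abstract does not state), and features are ranked by the absolute value of r so that strongly negative associations are also kept. The function names are hypothetical.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient; returns None if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def top_k_by_correlation(features, mic_values, k=40):
    """Rank feature columns by |r| with log2(MIC) (an assumed transform) and keep the top k.

    `features` maps a feature name (k-mer or SNP) to its per-isolate values,
    aligned with `mic_values`.
    """
    target = [log2(m) for m in mic_values]
    scores = {}
    for name, column in features.items():
        r = pearson(column, target)
        if r is not None:  # constant features carry no signal and are skipped
            scores[name] = abs(r)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Univariate filtering like this is cheap and easy to interpret, but it ignores interactions between features; the retrained XGBoost or DNN models then work only with the reduced feature set.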
Affiliation(s)
- Rundong Tan
- Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai, China
- Shanghai Zhangjiang Institute of Medical Innovation, Shanghai, China
- Anqi Yu
- Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai, China
- Shanghai Zhangjiang Institute of Medical Innovation, Shanghai, China
- Ziming Liu
- Medical Information Engineering, Department of Medical Information, Harbin Medical University, Harbin, China
- Ziqi Liu
- Department of Biostatistics, School of Global Public Health, New York University, New York, NY, United States
- Rongfeng Jiang
- Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai, China
- Shanghai Zhangjiang Institute of Medical Innovation, Shanghai, China
- Xiaoli Wang
- Department of Critical Care Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Jialin Liu
- Department of Critical Care Medicine, Ruijin Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
- Junhui Gao
- Shanghai Biotecan Pharmaceuticals Co., Ltd., Shanghai, China
- Shanghai Zhangjiang Institute of Medical Innovation, Shanghai, China
- Xinjun Wang
- Translational Medical Center for Stem Cell Therapy, Shanghai East Hospital, School of Medicine, Tongji University, Shanghai, China
15
Abstract
Antimicrobial resistance (AMR) is an important global health threat that affects millions of people worldwide each year. Developing methods that can detect and predict AMR phenotypes can help to mitigate the spread of AMR by informing clinical decision making and appropriate mitigation strategies. Many bioinformatic methods have been developed for predicting AMR phenotypes from whole-genome sequences and AMR genes, but recent studies have indicated that predictions can be made from incomplete genome sequence data. To understand this more systematically, we built random forest-based machine learning classifiers for predicting susceptible and resistant phenotypes for Klebsiella pneumoniae (1,640 strains), Mycobacterium tuberculosis (2,497 strains), and Salmonella enterica (1,981 strains). We started by building models from alignments based on a reference chromosome for each species. We then subsampled each chromosomal alignment and built models for the resulting subalignments, finding that very small regions, representing approximately 0.1 to 0.2% of the chromosome, are predictive. In K. pneumoniae, M. tuberculosis, and S. enterica, the subalignments can predict multiple AMR phenotypes with at least 70% accuracy, even though most do not encode an AMR-related function. We used these models to identify regions of the chromosome with high and low predictive signals. Finally, subalignments that retain high accuracy across larger phylogenetic distances were examined in greater detail, revealing genes and intergenic regions with potential links to AMR, virulence, transport, and survival under stress conditions. IMPORTANCE Antimicrobial resistance causes thousands of deaths annually worldwide. Understanding the regions of the genome that are involved in antimicrobial resistance is important for developing mitigation strategies and preventing transmission.
Machine learning models are capable of predicting antimicrobial resistance phenotypes from bacterial genome sequence data by identifying resistance genes, mutations, and other correlated features. They are also capable of implicating regions of the genome that have not been previously characterized as being involved in resistance. In this study, we generated global chromosomal alignments for Klebsiella pneumoniae, Mycobacterium tuberculosis, and Salmonella enterica and systematically searched them for small conserved regions of the genome that enable the prediction of antimicrobial resistance phenotypes. In addition to known antimicrobial resistance genes, this analysis identified genes involved in virulence and transport functions, as well as many genes with no previous implication in antimicrobial resistance.
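The subsampling idea, cutting a whole-chromosome alignment into small consecutive regions and training one model per region, can be sketched as below. The default window width of 0.1% of the alignment length mirrors the 0.1 to 0.2% figure in the abstract; the function name, the non-overlapping tiling and the dict-of-aligned-sequences input are assumptions for illustration, and the per-window random forest training is left out.

```python
def subalignment_windows(alignment, fraction=0.001):
    """Tile a multiple alignment into consecutive, non-overlapping column windows.

    `alignment` maps strain name -> aligned sequence (all the same length).
    Each window spans about `fraction` of the alignment length (at least 1 column).
    Returns a list of (start, end, {strain: subsequence}) tuples.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    length = lengths.pop()
    width = max(1, round(length * fraction))
    windows = []
    for start in range(0, length, width):
        end = min(start + width, length)  # the last window may be shorter
        windows.append(
            (start, end, {name: seq[start:end] for name, seq in alignment.items()})
        )
    return windows
```

Each window's subsequences would then be encoded (for example one-hot per column) and paired with the strains' phenotypes to train and score a separate classifier, so that prediction accuracy can be mapped back onto chromosome coordinates.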
16
Zhang Z, van Dijk F, de Klein N, van Gijn ME, Franke LH, Sinke RJ, Swertz MA, van der Velde KJ. Feasibility of predicting allele specific expression from DNA sequencing using machine learning. Sci Rep 2021; 11:10606. [PMID: 34012022 PMCID: PMC8134421 DOI: 10.1038/s41598-021-89904-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 01/21/2021] [Accepted: 05/04/2021] [Indexed: 11/09/2022]
Abstract
Allele specific expression (ASE) refers to unequal expression of the alternative alleles of a gene and is measured by RNA sequencing. Multiple studies show that ASE plays a role in hereditary diseases by modulating penetrance or phenotype severity. However, genome diagnostics is based on DNA sequencing and therefore neglects gene expression regulation such as ASE. To take advantage of ASE in the absence of RNA sequencing, it must be predicted from DNA variation alone. We have constructed ASE models from BIOS (n = 3432) and GTEx (n = 369) that predict ASE using DNA features. These models are highly reproducible and comprise many different feature types, highlighting the complex regulation that underlies ASE. We applied the BIOS-trained model to population variants in three genes in which ASE plays a clinically relevant role: BRCA2, RET and NF1. This resulted in predicted ASE effects for 27 variants, of which 10 were known pathogenic variants. We demonstrated that ASE can be predicted from DNA features using machine learning. Future efforts may improve sensitivity and translate these models into a new type of genome diagnostic tool that prioritizes candidate pathogenic variants, or their regulators, for follow-up validation by RNA sequencing. All code and machine learning models used are available on GitHub and Zenodo.
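On the measurement side, ASE at a heterozygous site is commonly summarised as the fraction of RNA-seq reads carrying the alternative allele, tested against the 50:50 expectation with an exact binomial test. The sketch below shows that generic calculation; it is not the BIOS or GTEx ASE-calling pipeline, and both function names are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of outcomes no likelier than k."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance so outcomes tied with the observed one are not lost to rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-7)))


def allelic_imbalance(ref_reads, alt_reads):
    """Return (alternative-allele fraction, two-sided p-value against a 50:50 ratio)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / n, binomial_two_sided_p(alt_reads, n)
```

In practice, pipelines also correct for reference-mapping bias and overdispersion (often with a beta-binomial model), so the plain binomial test shown here is best read as a baseline.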
Affiliation(s)
- Zhenhua Zhang
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Freerk van Dijk
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Prinses Maxima Center for Child Oncology, Heidelberglaan 25, 3584 CS, Utrecht, The Netherlands
- Niek de Klein
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Mariëlle E van Gijn
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Lude H Franke
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Richard J Sinke
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Morris A Swertz
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- K Joeri van der Velde
- Genomics Coordination Center, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
- Department of Genetics, University of Groningen and University Medical Center Groningen, Antonius Deusinglaan 1, 9713 AV, Groningen, The Netherlands
17
Avershina E, Sharma P, Taxt AM, Singh H, Frye SA, Paul K, Kapil A, Naseer U, Kaur P, Ahmad R. AMR-Diag: Neural network based genotype-to-phenotype prediction of resistance towards β-lactams in Escherichia coli and Klebsiella pneumoniae. Comput Struct Biotechnol J 2021; 19:1896-1906. [PMID: 33897984 PMCID: PMC8060595 DOI: 10.1016/j.csbj.2021.03.027] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Received: 01/04/2021] [Revised: 03/15/2021] [Accepted: 03/23/2021] [Indexed: 12/11/2022]
Abstract
Antibiotic resistance poses a major threat to public health. More effective approaches to antibiotic prescription are needed to delay the spread of antibiotic resistance. Employing sequencing technologies together with trained neural network algorithms for genotype-to-phenotype prediction will reduce the time needed to identify antibiotic susceptibility profiles from days to hours. In this work, we sequenced and phenotypically characterized 171 clinical isolates of Escherichia coli and Klebsiella pneumoniae from Norway and India. Based on these data, we created neural networks to predict susceptibility to ampicillin, 3rd generation cephalosporins and carbapenems. All networks were trained on unassembled data, enabling prediction within minutes after the sequencing information becomes available. Moreover, they can be used with both Illumina and MinION data and do not require high genome coverage for phenotype prediction. We cross-checked our networks against previously published algorithms for genotype-to-phenotype prediction and their corresponding datasets. We also created an ensemble of networks trained on different datasets, which improved cross-dataset prediction compared with a single network. Additionally, we used data from direct sequencing of spiked blood cultures and found that AMR-Diag networks, coupled with MinION sequencing, can predict bacterial species, resistome, and phenotype within 1–8 h of the start of sequencing. To our knowledge, this is the first study of genotype-to-phenotype prediction that: (1) employs a neural network method; (2) uses data from more than one sequencing platform; and (3) utilizes sequence data from spiked blood cultures.
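Working on unassembled data usually means extracting features directly from reads. A common representation is a table of canonical k-mer counts, in which each k-mer and its reverse complement are counted together so that read orientation does not matter. The sketch below shows that generic step under stated assumptions (k = 11 by default, k-mers containing non-ACGT bases skipped); the abstract does not describe AMR-Diag's exact input encoding, so this should not be read as its implementation.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k=11):
    """Count canonical k-mers across raw reads, skipping k-mers with N or other symbols."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:
                counts[canonical(kmer)] += 1
    return counts
```

A fixed vocabulary of informative k-mers can then be turned into a presence or count vector per sample for a classifier; with error-prone long reads, a minimum count is often required before a k-mer is treated as present.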
Affiliation(s)
- Ekaterina Avershina
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317 Hamar, Norway
- Priyanka Sharma
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Arne M Taxt
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317 Hamar, Norway
- Department of Microbiology, Division of Laboratory Medicine, Oslo University Hospital, PB 4956, Nydalen, 0424 Oslo, Norway
- Harpreet Singh
- Informatics, System and Research Management, Indian Council of Medical Research, New Delhi, India
- Stephan A Frye
- Department of Microbiology, Division of Laboratory Medicine, Oslo University Hospital, PB 4956, Nydalen, 0424 Oslo, Norway
- Kolin Paul
- Department of Computer Science & Engineering, IIT Delhi, New Delhi, India
- Arti Kapil
- Department of Microbiology, All India Institute of Medical Sciences, New Delhi, India
- Umaer Naseer
- Department of Zoonotic, Food- and Waterborne Infections, Norwegian Institute of Public Health, 0213 Oslo, Norway
- Punit Kaur
- Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Rafi Ahmad
- Department of Biotechnology, Inland Norway University of Applied Sciences, Holsetgata 22, 2317 Hamar, Norway
- Institute of Clinical Medicine, Faculty of Health Sciences, UiT - The Arctic University of Norway, Hansine Hansens veg 18, 9019 Tromsø, Norway
18
Amino Acid k-mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights. Biology 2020; 9:biology9110365. [PMID: 33126516 PMCID: PMC7694136 DOI: 10.3390/biology9110365] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/21/2020] [Revised: 10/17/2020] [Accepted: 10/19/2020] [Indexed: 12/31/2022]
Abstract
Machine learning algorithms can learn mechanisms of antimicrobial resistance from DNA sequence data without any a priori information. Interpreting a trained machine learning algorithm can be exploited to validate the model and obtain new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, there are trade-offs between interpretability, computational complexity and accuracy across feature extraction methods. In this study, we propose a new feature extraction method, counting amino acid k-mers or oligopeptides, which allows easier model interpretation than counting nucleotide k-mers and reaches the same or even better accuracy than other methods. Additionally, we trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We built a new feature selection pipeline for extracting important features, so that new AMR determinants can be discovered by analyzing them. This pipeline allows the construction of models that use only a small number of features and can predict resistance accurately.
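Counting amino acid k-mers, as proposed above, amounts to translating coding sequences and then sliding a window of length k over the protein. A minimal sketch follows, assuming in-frame coding sequences and the standard genetic code (NCBI translation table 1); translating ambiguous codons as `X` and skipping peptides that span a stop or ambiguous codon are choices made for this sketch, not details taken from the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order, '*' = stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(cds):
    """Translate an in-frame coding sequence; unknown codons (e.g. containing N) become 'X'."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def count_aa_kmers(protein, k=3):
    """Count oligopeptides of length k, skipping any that include a stop or 'X'."""
    counts = Counter()
    for i in range(len(protein) - k + 1):
        peptide = protein[i:i + k]
        if "*" not in peptide and "X" not in peptide:
            counts[peptide] += 1
    return counts
```

Part of the interpretability gain described in the abstract is visible here: an important feature such as a specific tripeptide points directly to a protein region, whereas a nucleotide k-mer first has to be located and translated before its effect on the protein can be assessed.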