1
Broschat SL, Siu SWI, de la Fuente-Nunez C. Editorial: Machine learning approaches to antimicrobials: discovery and resistance. Front Bioinform 2024; 4:1458237. [PMID: 39184338 PMCID: PMC11341447 DOI: 10.3389/fbinf.2024.1458237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2024] [Accepted: 07/22/2024] [Indexed: 08/27/2024]
Affiliation(s)
- Shira L. Broschat
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, United States
- Paul G. Allen School for Global Health, Washington State University, Pullman, WA, United States
- Shirley W. I. Siu
- Centre for Artificial Intelligence Driven Drug Discovery, Macao Polytechnic University, Macao, Macao SAR, China
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, Macao SAR, China
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States
- Department of Bioengineering, University of Pennsylvania, Philadelphia, PA, United States
- Department of Chemical and Biomolecular Engineering, University of Pennsylvania, Philadelphia, PA, United States
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA, United States
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, United States

2
Akhter S, Miller JH. BPAGS: a web application for bacteriocin prediction via feature evaluation using alternating decision tree, genetic algorithm, and linear support vector classifier. Front Bioinform 2024; 3:1284705. [PMID: 38268970 PMCID: PMC10807691 DOI: 10.3389/fbinf.2023.1284705] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 12/12/2023] [Indexed: 01/26/2024]
Abstract
The use of bacteriocins has emerged as a propitious strategy in the development of new drugs to combat antibiotic resistance, given their ability to kill bacteria with both broad and narrow natural spectra. Hence, there is a compelling need for a precise and efficient computational model that can accurately predict novel bacteriocins. Because machine learning can learn patterns and features from bacteriocin sequences that are difficult to capture with sequence matching-based methods, it is a potentially superior choice for accurate prediction. In this study, we created a web application for predicting bacteriocins using a machine learning approach. The feature sets employed in the application were chosen using alternating decision tree (ADTree), genetic algorithm (GA), and linear support vector classifier (linear SVC)-based feature evaluation methods. Initially, potential features were extracted from the physicochemical, structural, and sequence-profile attributes of both bacteriocin and non-bacteriocin protein sequences. We first assessed the candidate features using the Pearson correlation coefficient, followed by separate evaluations with ADTree, GA, and linear SVC to eliminate unnecessary features. Finally, we constructed random forest (RF), support vector machine (SVM), decision tree (DT), logistic regression (LR), k-nearest neighbors (KNN), and Gaussian naïve Bayes (GNB) models using the reduced feature sets. The overall top-performing model was SVM with ADTree-reduced features, which achieved an accuracy of 99.11% and an AUC of 0.9984 on the testing dataset. We also compared the predictive capabilities of our best-performing model for each reduced feature set with those of our previously developed software solution, a sequence alignment-based tool, and a deep-learning approach.
A web application, titled BPAGS (Bacteriocin Prediction based on ADTree, GA, and linear SVC), was developed to incorporate the predictive models built using the ADTree-, GA-, and linear SVC-based feature sets. Currently, the web-based tool provides classification results with associated probability values and offers options to add new samples to the training data to improve predictive efficacy. BPAGS is freely accessible at https://shiny.tricities.wsu.edu/bacteriocin-prediction/.
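The Pearson-correlation screening step described above can be illustrated with a minimal pure-Python sketch. It assumes features are stored column-wise, that a feature counts as redundant when its absolute correlation with an already kept feature reaches a hypothetical threshold of 0.9, and that features are visited in their original order. The abstract does not state BPAGS's actual threshold or ordering, so `pearson` and `drop_correlated` are illustrative helpers only.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant feature carries no linear signal; treat as uncorrelated
    return cov / (sd_x * sd_y)


def drop_correlated(columns: List[Sequence[float]], threshold: float = 0.9) -> List[int]:
    """Greedily keep a feature only if |r| < threshold against every feature kept so far.

    The 0.9 default is a hypothetical choice, not a value taken from BPAGS.
    Returns the indices of the kept feature columns in their original order.
    """
    kept: List[int] = []
    for j, column in enumerate(columns):
        if all(abs(pearson(column, columns[k])) < threshold for k in kept):
            kept.append(j)
    return kept


# Toy feature columns measured on five hypothetical sequences.
features = [
    [1, 2, 3, 4, 5],   # feature 0
    [2, 4, 6, 8, 10],  # feature 1: exact multiple of feature 0 (r = 1)
    [5, 3, 4, 1, 2],   # feature 2: r = -0.8 with feature 0
    [1, 1, 2, 2, 3],   # feature 3: r is about 0.94 with feature 0
]
print(drop_correlated(features))  # [0, 2]
```

In the pipeline the abstract describes, a filter like this would only be a first pass; the surviving features were then evaluated separately with ADTree, GA, and linear SVC.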
Affiliation(s)
- Suraiya Akhter
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States
- School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, United States
- John H. Miller
- School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, United States

3
Akhter S, Miller JH. BaPreS: a software tool for predicting bacteriocins using an optimal set of features. BMC Bioinformatics 2023; 24:313. [PMID: 37592230 PMCID: PMC10433575 DOI: 10.1186/s12859-023-05330-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/27/2022] [Accepted: 05/09/2023] [Indexed: 08/19/2023]
Abstract
BACKGROUND Antibiotic resistance is a major public health concern around the globe. As a result, researchers are continually looking for new compounds to develop antibiotic drugs that combat antibiotic-resistant bacteria. Bacteriocins have become promising antimicrobial agents against antibiotic resistance because they can have either broad or narrow killing spectra. Sequence matching methods are widely used to identify bacteriocins by comparing them with known bacteriocin sequences; however, these methods often fail to detect new bacteriocin sequences because of their high diversity. A machine learning approach can help find new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The aim of this work is to develop a machine learning-based software tool called BaPreS (Bacteriocin Prediction Software) that uses an optimal set of features to detect bacteriocin protein sequences with high accuracy. We extracted potential features from known bacteriocin and non-bacteriocin sequences by considering the physicochemical and structural properties of the protein sequences. We then reduced the feature set using statistical justifications and the recursive feature elimination technique. Finally, we built support vector machine (SVM) and random forest (RF) models using the selected features and used the best machine learning model to implement the software tool. RESULTS We applied BaPreS to an established dataset and evaluated its prediction performance. The results show that the software tool achieves a prediction accuracy of 95.54% on the testing protein sequences. The tool allows users to add new bacteriocin or non-bacteriocin sequences to the training dataset to further enhance its predictive power. We compared the prediction performance of BaPreS with a popular sequence matching-based tool and a deep learning-based method, and our software tool outperformed both.
CONCLUSIONS BaPreS is a bacteriocin prediction tool that can be used to discover new, highly dissimilar bacteriocins for developing highly effective antibiotic drugs. The software runs on Windows, Linux, and macOS. The open-source software package and its user manual are available at https://github.com/suraiya14/BaPreS.
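The recursive feature elimination (RFE) step named in the abstract can be sketched without external libraries. This is a minimal illustration, assuming standardized features and using a plain logistic-regression model, fitted by gradient descent, as the estimator whose weights rank the features. The abstract does not say which estimator BaPreS paired with RFE (scikit-learn's `RFE` wrapped around a linear SVM would be a common alternative), and `train_logreg` and `rfe` are hypothetical helpers.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
                 epochs: int = 500, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit logistic regression by full-batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X: Sequence[Sequence[float]], y: Sequence[int], n_keep: int) -> List[int]:
    """Recursive feature elimination: refit, drop the feature with the smallest |weight|, repeat."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        subset = [[row[j] for j in remaining] for row in X]
        w, _ = train_logreg(subset, y)
        weakest = min(range(len(remaining)), key=lambda i: abs(w[i]))
        del remaining[weakest]
    return remaining


# Hypothetical standardized toy data: feature 0 separates the classes,
# features 1 and 2 are symmetric noise that is independent of the label.
X, y = [], []
for label, f0_values in ((0, (-1.0, -0.6)), (1, (0.6, 1.0))):
    for f0, f1, f2 in product(f0_values, (-0.5, 0.5), (-0.3, 0.3)):
        X.append([f0, f1, f2])
        y.append(label)
print(rfe(X, y, n_keep=1))  # [0]
```

Refitting after every removal is what distinguishes RFE from ranking all features once: a feature's weight can change once a correlated partner is gone.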
Affiliation(s)
- Suraiya Akhter
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
- School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, USA
- John H Miller
- School of Engineering and Applied Sciences, Washington State University Tri-Cities, Richland, WA, USA

4
Perea-Jacobo R, Paredes-Gutiérrez GR, Guerrero-Chevannier MÁ, Flores DL, Muñiz-Salazar R. Machine Learning of the Whole Genome Sequence of Mycobacterium tuberculosis: A Scoping PRISMA-Based Review. Microorganisms 2023; 11:1872. [PMID: 37630431 PMCID: PMC10456961 DOI: 10.3390/microorganisms11081872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/13/2023] [Accepted: 07/14/2023] [Indexed: 08/27/2023]
Abstract
Tuberculosis (TB) remains one of the most significant global health problems, posing a major challenge to public health systems worldwide. Diagnosing drug-resistant tuberculosis (DR-TB) has become increasingly difficult due to the rising number of multidrug-resistant (MDR-TB) cases, despite the development of new TB diagnostic tools. Even World Health Organization-recommended methods such as Xpert MTB/XDR or Truenat cannot detect all the Mycobacterium tuberculosis genome mutations associated with drug resistance. While whole genome sequencing offers a more precise DR profile, the lack of user-friendly bioinformatics analysis applications hinders its widespread use. This review explores various artificial intelligence models for predicting DR-TB profiles, analyzing relevant English-language articles with the PRISMA methodology on the Covidence platform. Our findings indicate that artificial neural networks are the most commonly employed method, with non-statistical dimensionality reduction techniques preferred over traditional statistical approaches such as Principal Component Analysis or t-distributed Stochastic Neighbor Embedding.
Affiliation(s)
- Ricardo Perea-Jacobo
- Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Escuela de Ciencias de la Salud, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22890, Mexico
- Guillermo René Paredes-Gutiérrez
- Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Miguel Ángel Guerrero-Chevannier
- Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Dora-Luz Flores
- Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Raquel Muñiz-Salazar
- Escuela de Ciencias de la Salud, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22890, Mexico

5
Sidorczuk K, Gagat P, Kała J, Nielsen H, Pietluch F, Mackiewicz P, Burdukiewicz M. Prediction of protein subplastid localization and origin with PlastoGram. Sci Rep 2023; 13:8365. [PMID: 37225726 DOI: 10.1038/s41598-023-35296-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/24/2023] [Accepted: 05/16/2023] [Indexed: 05/26/2023]
Abstract
Owing to their complex evolutionary history, plastids contain proteins encoded in both the nuclear and the plastid genome. Moreover, these proteins localize to various subplastid compartments. Since a protein's localization is associated with its function, predicting subplastid localization is one of the most important steps in plastid protein annotation, providing insight into potential function. We therefore created a novel manually curated data set of plastid proteins and built an ensemble model for predicting protein subplastid localization. We also discuss problems associated with the task, e.g. data set sizes and homology reduction. PlastoGram classifies proteins as nuclear- or plastid-encoded and predicts their localization to the envelope, stroma, thylakoid membrane, or thylakoid lumen; for lumen proteins, the import pathway is also predicted. We also provide an additional function to differentiate nuclear-encoded inner and outer membrane proteins. PlastoGram is available as a web server at https://biogenies.info/PlastoGram and as an R package at https://github.com/BioGenies/PlastoGram. The code used for the described analyses is available at https://github.com/BioGenies/PlastoGram-analysis.
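Homology reduction, which the abstract lists among the task's difficulties, usually means removing sequences that are too similar to one another before training and evaluation. The sketch below is a greedy, CD-HIT-style illustration under stated assumptions: similarity is approximated with Python's `difflib.SequenceMatcher.ratio()` rather than a true alignment-based percent identity, the 0.8 threshold is hypothetical, and longer sequences are kept as representatives first. It is not PlastoGram's procedure.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]: difflib's ratio 2*M/T, a stand-in for alignment identity."""
    # autojunk=False stops difflib from ignoring frequent residues in sequences of 200+ characters.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reduce_redundancy(seqs: List[str], threshold: float = 0.8) -> List[int]:
    """Keep a sequence only if its similarity to every kept representative is below threshold.

    Sequences are visited longest first (ties keep input order), so each group of
    near-duplicates is represented by its longest member. Returns kept indices in input order.
    """
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]), reverse=True)
    representatives: List[int] = []
    for i in order:
        if all(similarity(seqs[i], seqs[r]) < threshold for r in representatives):
            representatives.append(i)
    return sorted(representatives)


# Hypothetical toy peptides: the first two differ only at the last residue.
toy = ["MKTAYIAKQR", "MKTAYIAKQK", "GGGWWWPPPL"]
print(reduce_redundancy(toy))  # [0, 2]
```

Reducing redundancy before splitting the data keeps near-identical proteins from landing in both the training and the test set, which would otherwise inflate reported accuracy.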
Affiliation(s)
- Przemysław Gagat
- Faculty of Biotechnology, University of Wrocław, 50-383, Wrocław, Poland
- Jakub Kała
- Faculty of Mathematics and Information Science, Warsaw University of Technology, 00-662, Warsaw, Poland
- Henrik Nielsen
- Department of Health Technology, Technical University of Denmark, 2800, Kgs. Lyngby, Denmark
- Filip Pietluch
- Faculty of Biotechnology, University of Wrocław, 50-383, Wrocław, Poland
- Paweł Mackiewicz
- Faculty of Biotechnology, University of Wrocław, 50-383, Wrocław, Poland
- Michał Burdukiewicz
- Institute of Biotechnology and Biomedicine, Autonomous University of Barcelona, 08193, Cerdanyola del Vallès, Spain
- Clinical Research Centre, Medical University of Białystok, 15-089, Białystok, Poland

6
Lu J, Tsoi R, Luo N, Ha Y, Wang S, Kwak M, Baig Y, Moiseyev N, Tian S, Zhang A, Gong NZ, You L. Distributed information encoding and decoding using self-organized spatial patterns. Patterns (N Y) 2022; 3:100590. [PMID: 36277815 PMCID: PMC9583124 DOI: 10.1016/j.patter.2022.100590] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2022] [Revised: 07/29/2022] [Accepted: 08/24/2022] [Indexed: 11/28/2022]
Abstract
Dynamical systems often generate distinct outputs according to different initial conditions, and one can infer the corresponding input configuration given an output. This property captures the essence of information encoding and decoding. Here, we demonstrate the use of self-organized patterns that generate high-dimensional outputs, combined with machine learning, to achieve distributed information encoding and decoding. Our approach exploits a critical property of many natural pattern-formation systems: in repeated realizations, each initial configuration generates similar but not identical output patterns due to randomness in the patterning process. However, for sufficiently small randomness, different groups of patterns that arise from different initial configurations can be distinguished from one another. Modulating pattern generation and machine learning model training can tune the trade-off between encoding capacity and security. We further show that this strategy is scalable by implementing the encoding and decoding of all characters of the standard English keyboard.
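The decoding idea, in which repeated realizations from one initial configuration form a group that can be told apart from other groups, can be illustrated with a toy simulation. This sketch rests on loud assumptions: each "pattern" is a random template vector plus Gaussian noise rather than the output of a self-organizing patterning system, and decoding uses a nearest-centroid classifier rather than the machine learning models used in the study. All function names and parameter values are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def realize(template: Sequence[float], noise: float, rng: random.Random) -> Vector:
    """One noisy realization of the output pattern produced by a given initial configuration."""
    return [v + rng.gauss(0.0, noise) for v in template]


def centroid(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of a group of patterns."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def decode(pattern: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Infer the initial configuration as the index of the nearest class centroid."""
    return min(range(len(centroids)), key=lambda k: math.dist(pattern, centroids[k]))


def decoding_accuracy(noise: float, n_configs: int = 5, dim: int = 64,
                      n_train: int = 20, n_test: int = 20, seed: int = 0) -> float:
    """Fraction of held-out realizations decoded to the configuration that generated them."""
    rng = random.Random(seed)
    # Each initial configuration maps to a hypothetical template pattern.
    templates = [[rng.random() for _ in range(dim)] for _ in range(n_configs)]
    # "Training": average several realizations per configuration.
    centroids = [centroid([realize(t, noise, rng) for _ in range(n_train)]) for t in templates]
    correct = 0
    for k, template in enumerate(templates):
        for _ in range(n_test):
            if decode(realize(template, noise, rng), centroids) == k:
                correct += 1
    return correct / (n_configs * n_test)


for sigma in (0.05, 0.5, 3.0):
    print(sigma, decoding_accuracy(sigma))
```

Raising `sigma` plays the role of stronger randomness in the patterning process: the groups begin to overlap and decoding accuracy tends to fall, mirroring the abstract's point that groups are distinguishable only when randomness is sufficiently small.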
Affiliation(s)
- Jia Lu
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Ryan Tsoi
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Nan Luo
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Yuanchi Ha
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Shangying Wang
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Minjun Kwak
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Yasa Baig
- Department of Physics, Duke University, Durham, NC 27708, USA
- Nicole Moiseyev
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Shari Tian
- Department of Statistical Science, Duke University, Durham, NC 27708, USA
- Alison Zhang
- Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
- Neil Zhenqiang Gong
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Electrical and Computer Engineering, Duke University, Durham, NC 27708, USA
- Lingchong You
- Department of Biomedical Engineering, Duke University, Durham, NC 27708, USA
- Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
- Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27708, USA

7
Sivaramakrishnan M, Suresh R, Ponraj K. Predicting quorum sensing peptides using stacked generalization ensemble with gradient boosting based feature selection. J Microbiol 2022; 60:756-765. [DOI: 10.1007/s12275-022-2044-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2022] [Revised: 03/30/2022] [Accepted: 04/11/2022] [Indexed: 11/24/2022]

8
Sharma A, Machado E, Lima KVB, Suffys PN, Conceição EC. Tuberculosis drug resistance profiling based on machine learning: A literature review. Braz J Infect Dis 2022; 26:102332. [PMID: 35176257 PMCID: PMC9387475 DOI: 10.1016/j.bjid.2022.102332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2021] [Revised: 12/18/2021] [Accepted: 01/01/2022] [Indexed: 11/30/2022]
Abstract
Tuberculosis (TB), caused by Mycobacterium tuberculosis (MTB), is one of the top 10 causes of death worldwide. Drug-resistant tuberculosis (DR-TB) poses a major threat to the World Health Organization's "End TB" strategy, which has set 2035 as its target year. In 2019, there were close to 0.5 million cases of DR-TB, of which 78% were resistant to multiple TB drugs. The traditional culture-based drug susceptibility test (DST, the current gold standard) often takes several weeks, and the necessary laboratory facilities are not readily available in low-income countries. Whole genome sequencing (WGS) is rapidly becoming an important tool in clinical and research applications, including transmission detection and prediction of DR-TB. For the latter, many tools have recently been developed using curated databases of known resistance-conferring mutations. However, documenting all mutations and their effects is a time-consuming, continuous process, so Machine Learning (ML) techniques can be useful for predicting DR-TB from WGS data. This can pave the way to earlier detection of drug resistance and, consequently, more efficient treatment than the traditional DST allows.
Affiliation(s)
- Abhinav Sharma
- Faculty of Engineering and Technology, Liverpool John Moores University (LJMU), Liverpool, United Kingdom
- Edson Machado
- Fundação Oswaldo Cruz-Fiocruz, Instituto Oswaldo Cruz, Laboratório de Biologia Molecular Aplicada a Micobactérias, Rio de Janeiro, RJ, Brazil
- Karla Valeria Batista Lima
- Instituto Evandro Chagas, Seção de Bacteriologia e Micologia, Ananindeua, PA, Brazil
- Universidade do Estado do Pará, Instituto de Ciências Biológicas e da Saúde, Pós-Graduação em Biologia Parasitária na Amazônia, Belém, PA, Brazil
- Philip Noel Suffys
- Fundação Oswaldo Cruz-Fiocruz, Instituto Oswaldo Cruz, Laboratório de Biologia Molecular Aplicada a Micobactérias, Rio de Janeiro, RJ, Brazil
- Emilyn Costa Conceição
- Programa de Pós-graduação em Pesquisa Clínica e Doenças Infecciosas, Instituto Nacional de Infectologia Evandro Chagas, Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil
- Department of Science and Innovation - National Research Foundation Centre of Excellence for Biomedical Tuberculosis Research, South African Medical Research Council Centre for Tuberculosis Research, Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, Stellenbosch University, Cape Town, South Africa

9
Chaplin AV, Korzhanova M, Korostin DO. Identification of bacterial antibiotic resistance genes in next-generation sequencing data (review of literature). Klin Lab Diagn 2021; 66:684-688. [PMID: 34882354 DOI: 10.51620/0869-2084-2021-66-11-684-688] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
The spread of antibiotic-resistant human bacterial pathogens is a serious threat to modern medicine. Antibiotic susceptibility testing is essential for optimizing treatment regimens and preventing the dissemination of antibiotic resistance. Therefore, developing antibiotic susceptibility testing methods is a priority for laboratory medicine. The aim of this review is to analyze the capabilities of bioinformatics tools for processing bacterial whole genome sequence data. The PubMed database, the Russian scientific electronic library eLIBRARY, and the information networks of the World Health Organization and the European Society of Clinical Microbiology and Infectious Diseases (ESCMID) were used for the analysis. This review describes the whole genome sequencing platforms suitable for detecting bacterial genetic resistance determinants. The classic step in searching for genetic resistance determinants is an alignment between the query nucleotide/protein sequence and the subject (database) nucleotide/protein sequence, performed against nucleotide and protein sequence databases. The most commonly used databases are ResFinder, CARD, and the Bacterial Antimicrobial Resistance Reference Gene Database. Searching for resistance determinants in genome assemblies gives more accurate results than searching in contigs. New bioinformatics tools for resistance gene searching, such as neural networks and machine learning, are also discussed. After critically appraising current antibiotic resistance databases, we designed a protocol for predicting antibiotic resistance from whole genome sequence data. The protocol can serve as the basis of an algorithm for qualitative and quantitative antimicrobial susceptibility testing based on whole genome sequence data.
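The classic step described here, aligning a query sequence against database entries and keeping the best-scoring hit, can be sketched with textbook Smith-Waterman local alignment. The scoring scheme (match +2, mismatch -1, linear gap -2) and the two-entry database are hypothetical; production pipelines typically rely on heuristic aligners and substitution matrices, so this exhaustive O(mn) scorer is for illustration only.

```python
from typing import Dict, Tuple


def smith_waterman(query: str, subject: str, match: int = 2,
                   mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between two sequences (linear gap penalty)."""
    prev = [0] * (len(subject) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (len(subject) + 1)
        for j in range(1, len(subject) + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            # Local alignment: scores are floored at 0 so an alignment can start anywhere.
            curr[j] = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query: str, database: Dict[str, str]) -> Tuple[str, int]:
    """Return (entry name, score) for the highest-scoring database entry."""
    scores = {name: smith_waterman(query, seq) for name, seq in database.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical toy "resistance gene" entries, not real database records.
toy_db = {"geneA": "MKTAYIAKQR", "geneB": "GGGWWWPPPL"}
print(best_hit("AYIAK", toy_db))  # ('geneA', 10)
```

Keeping only two matrix rows reduces memory to O(n) while still returning the optimal local score; recovering the alignment itself would require the full matrix or a traceback strategy.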
Affiliation(s)
- A V Chaplin
- Pirogov Russian National Research Medical University
- M Korzhanova
- Pirogov Russian National Research Medical University
- D O Korostin
- Pirogov Russian National Research Medical University

10
Melo MCR, Maasch JRMA, de la Fuente-Nunez C. Accelerating antibiotic discovery through artificial intelligence. Commun Biol 2021; 4:1050. [PMID: 34504303 PMCID: PMC8429579 DOI: 10.1038/s42003-021-02586-0] [Citation(s) in RCA: 70] [Impact Index Per Article: 23.3] [Received: 02/18/2021] [Accepted: 07/16/2021] [Indexed: 02/07/2023]
Abstract
By targeting invasive organisms, antibiotics insert themselves into the ancient struggle of the host-pathogen evolutionary arms race. As pathogens evolve tactics for evading antibiotics, therapies decline in efficacy and must be replaced, distinguishing antibiotics from most other forms of drug development. Together with a slow and expensive antibiotic development pipeline, the proliferation of drug-resistant pathogens drives urgent interest in computational methods that promise to expedite candidate discovery. Strides in artificial intelligence (AI) have encouraged its application to multiple dimensions of computer-aided drug design, with increasing application to antibiotic discovery. This review describes AI-facilitated advances in the discovery of both small molecule antibiotics and antimicrobial peptides. Beyond the essential prediction of antimicrobial activity, emphasis is also given to antimicrobial compound representation, determination of drug-likeness traits, antimicrobial resistance, and de novo molecular design. Given the urgency of the antimicrobial resistance crisis, we analyze uptake of open science best practices in AI-driven antibiotic discovery and argue for openness and reproducibility as a means of accelerating preclinical research. Finally, trends in the literature and areas for future inquiry are discussed, as artificially intelligent enhancements to drug discovery at large offer many opportunities for future applications in antibiotic development.
Affiliation(s)
- Marcelo C R Melo
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Jacqueline R M A Maasch
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Computer and Information Science, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA

11
Applications of Machine Learning to the Problem of Antimicrobial Resistance: an Emerging Model for Translational Research. J Clin Microbiol 2021; 59:e0126020. [PMID: 33536291 DOI: 10.1128/jcm.01260-20] [Citation(s) in RCA: 58] [Impact Index Per Article: 19.3] [Indexed: 01/05/2023]
Abstract
Antimicrobial resistance (AMR) remains one of the most challenging phenomena of modern medicine. Machine learning (ML) is a subfield of artificial intelligence that focuses on the development of algorithms that learn how to accurately predict outcome variables using large sets of predictor variables that are typically not hand selected and are minimally curated. Models are parameterized using a training data set and then applied to a test data set on which predictive performance is evaluated. The application of ML algorithms to the problem of AMR has garnered increasing interest in the past 5 years due to the exponential growth of experimental and clinical data, heavy investment in computational capacity, improvements in algorithm performance, and increasing urgency for innovative approaches to reducing the burden of disease. Here, we review the current state of research at the intersection of ML and AMR with an emphasis on three domains of work. The first is the prediction of AMR using genomic data. The second is the use of ML to gain insight into the cellular functions disrupted by antibiotics, which forms the basis for understanding mechanisms of action and developing novel anti-infectives. The third focuses on the application of ML for antimicrobial stewardship using data extracted from the electronic health record. Although the use of ML for understanding, diagnosing, treating, and preventing AMR is still in its infancy, the continued growth of data and interest ensures it will become an important tool for future translational research programs.

12
Chowdhury AS, Reehl SM, Kehn-Hall K, Bishop B, Webb-Robertson BJM. Better understanding and prediction of antiviral peptides through primary and secondary structure feature importance. Sci Rep 2020; 10:19260. [PMID: 33159146 PMCID: PMC7648056 DOI: 10.1038/s41598-020-76161-8] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 07/17/2020] [Accepted: 10/20/2020] [Indexed: 12/19/2022]
Abstract
The emergence of viral epidemics throughout the world is of concern due to the scarcity of available effective antiviral therapeutics. The discovery of new antiviral therapies is imperative to address this challenge, and antiviral peptides (AVPs) represent a valuable resource for the development of novel therapies to combat viral infection. We present a new machine learning model to distinguish AVPs from non-AVPs using the most informative features derived from the physicochemical and structural properties of their amino acid sequences. To focus on those features that are most likely to contribute to antiviral performance, we filter potential features based on their importance for classification. These feature selection analyses suggest that secondary structure is the most important peptide sequence feature for predicting AVPs. Our Feature-Informed Reduced Machine Learning for Antiviral Peptide Prediction (FIRM-AVP) approach achieves a higher accuracy than either the model with all features or current state-of-the-art single classifiers. Understanding the features that are associated with AVP activity is a core need to identify and design new AVPs in novel systems. The FIRM-AVP code and standalone software package are available at https://github.com/pmartR/FIRM-AVP with an accompanying web application at https://msc-viz.emsl.pnnl.gov/AVPR.
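Filtering features by their importance for classification, as described for FIRM-AVP, can be sketched with a simple per-feature ranking. The abstract does not specify the importance measure, so this sketch swaps in the Fisher score (squared difference of class means divided by the sum of class variances) and a hypothetical cutoff of the top k features. It is an illustration, not the FIRM-AVP implementation, and the toy data are invented.

```python
import math
from statistics import mean, pvariance
from typing import List, Sequence


def fisher_score(values: Sequence[float], labels: Sequence[int]) -> float:
    """(mean_pos - mean_neg)^2 / (var_pos + var_neg) for one feature and binary labels."""
    pos = [v for v, label in zip(values, labels) if label == 1]
    neg = [v for v, label in zip(values, labels) if label == 0]
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread == 0:
        # Constant within each class: perfectly separating if the class means differ.
        return math.inf if gap > 0 else 0.0
    return gap / spread


def top_k_features(columns: List[Sequence[float]], labels: Sequence[int], k: int) -> List[int]:
    """Indices of the k highest-scoring feature columns, best first."""
    scores = [fisher_score(column, labels) for column in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy features for six peptides (1 = AVP, 0 = non-AVP).
labels = [0, 0, 0, 1, 1, 1]
columns = [
    [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # strongly class-dependent
    [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],  # identical distribution in both classes
    [0.0, 0.1, 0.3, 0.2, 0.4, 0.5],  # weakly class-dependent
]
print(top_k_features(columns, labels, k=2))  # [0, 2]
```

A filter score like this is cheap and model-free; model-based importances (for example, from a random forest) can capture interactions that a per-feature score misses.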
Affiliation(s)
- Abu Sayed Chowdhury
- Biological Sciences Division, Pacific Northwest National Laboratory, J4-18, P.O. Box 999, Richland, WA, 99354, USA
- Sarah M Reehl
- Computing and Analytics Division, Pacific Northwest National Laboratory, P.O. Box 999, Richland, WA, 99354, USA
- Kylene Kehn-Hall
- School of Systems Biology, George Mason University, Manassas, VA, 20110, USA; National Center for Biodefense and Infectious Diseases, George Mason University, Manassas, VA, 20110, USA; Department of Biomedical Sciences and Pathobiology, Virginia Tech, Blacksburg, VA, 24061, USA
- Barney Bishop
- Department of Chemistry and Biochemistry, George Mason University, Manassas, VA, 20110, USA
- Bobbie-Jo M Webb-Robertson
- Biological Sciences Division, Pacific Northwest National Laboratory, J4-18, P.O. Box 999, Richland, WA, 99354, USA

13
PARGT: a software tool for predicting antimicrobial resistance in bacteria. Sci Rep 2020; 10:11033. [PMID: 32620856 PMCID: PMC7335159 DOI: 10.1038/s41598-020-67949-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/17/2020] [Accepted: 06/16/2020] [Indexed: 11/08/2022]
Abstract
With the ever-increasing availability of whole-genome sequences, machine-learning approaches can be used as an alternative to traditional alignment-based methods for identifying new antimicrobial-resistance genes. Such approaches are especially helpful when pathogens cannot be cultured in the lab. In previous work, we proposed a game-theory-based feature evaluation algorithm. When using the protein characteristics identified by this algorithm, called ‘features’ in machine learning, our model accurately identified antimicrobial resistance (AMR) genes in Gram-negative bacteria. Here we extend our study to Gram-positive bacteria showing that coupling game-theory-identified features with machine learning achieved classification accuracies between 87% and 90% for genes encoding resistance to the antibiotics bacitracin and vancomycin. Importantly, we present a standalone software tool that implements the game-theory algorithm and machine-learning model used in these studies.
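The abstract does not describe how PARGT's game-theory-based feature evaluation works. As one standard example of scoring features with cooperative game theory, the sketch below computes exact Shapley values, treating features as players and a user-supplied value function (for example, a classifier's accuracy on a feature subset) as the payoff of each coalition. This is a swapped-in technique for illustration, not PARGT's algorithm; the toy payoff is hypothetical, and exact enumeration is exponential in the number of features.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Sequence

ValueFn = Callable[[FrozenSet[Hashable]], float]


def shapley_values(players: Sequence[Hashable], value: ValueFn) -> Dict[Hashable, float]:
    """Exact Shapley value of each player: its weighted average marginal contribution."""
    n = len(players)
    result: Dict[Hashable, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):  # coalition sizes 0 .. n-1 drawn from the other players
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for coalition in combinations(others, r):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical payoff: features "a" and "b" are only useful together; "c" adds nothing.
def toy_value(subset: FrozenSet[Hashable]) -> float:
    return 1.0 if {"a", "b"} <= subset else 0.0


phi = shapley_values(["a", "b", "c"], toy_value)
print({name: round(v, 6) for name, v in phi.items()})  # {'a': 0.5, 'b': 0.5, 'c': 0.0}
```

The values always sum to the payoff of the full feature set (the efficiency property), so they can be read as a fair division of a model's performance among features, including features that only help in combination.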
14
Peiffer-Smadja N, Dellière S, Rodriguez C, Birgand G, Lescure FX, Fourati S, Ruppé E. Machine learning in the clinical microbiology laboratory: has the time come for routine practice? Clin Microbiol Infect 2020; 26:1300-1309. [PMID: 32061795 DOI: 10.1016/j.cmi.2020.02.006] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 12/31/2019] [Revised: 02/04/2020] [Accepted: 02/06/2020] [Indexed: 12/20/2022]
Abstract
BACKGROUND Machine learning (ML) allows the analysis of complex and large data sets and has the potential to improve health care. The clinical microbiology laboratory, at the interface of clinical practice and diagnostics, is of special interest for the development of ML systems.
AIMS This narrative review aims to explore the current use of ML in clinical microbiology.
SOURCES References for this review were identified through searches of MEDLINE/PubMed, EMBASE, Google Scholar, bioRxiv, arXiv, ACM Digital Library and IEEE Xplore Digital Library up to November 2019.
CONTENT We found 97 ML systems aiming to assist clinical microbiologists. Overall, 82 ML systems (85%) targeted bacterial infections, 11 (11%) parasitic infections, nine (9%) viral infections and three (3%) fungal infections. Forty ML systems (41%) focused on microorganism detection, identification and quantification, 36 (37%) evaluated antimicrobial susceptibility, and 21 (22%) targeted the diagnosis, disease classification and prediction of clinical outcomes. The ML systems used very diverse data sources: 21 (22%) used genomic data of microorganisms, 19 (20%) microbiota data obtained by metagenomic sequencing, 19 (20%) analysed microscopic images, 17 (18%) spectroscopy data, eight (8%) targeted gene sequencing, six (6%) volatile organic compounds, four (4%) photographs of bacterial colonies, four (4%) transcriptome data, three (3%) protein structure, and three (3%) clinical data. Most systems used data from high-income countries (n = 71, 73%), but a significant number used data from low- and middle-income countries (n = 36, 37%). Performance measures were reported for all 97 ML systems, but no article described their use in clinical practice or reported their impact on processes or clinical outcomes.
IMPLICATIONS In clinical microbiology, ML has been used with various data sources and diverse practical applications. The evaluation and implementation processes represent the main gap in existing ML systems, requiring a focus on their interpretability and potential integration into real-world settings.
Affiliation(s)
- N Peiffer-Smadja
- National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, London, UK
- Université de Paris, IAME, INSERM, F-75018 Paris, France
- S Dellière
- Université de Paris, Laboratoire de Parasitologie-Mycologie, Groupe Hospitalier Saint-Louis-Lariboisière-Fernand-Widal, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- C Rodriguez
- Department of Prevention, Diagnosis and Treatment of Infections, Henri-Mondor Hospital, APHP, Université Paris-Est Créteil, IMRB, INSERM U955, Créteil, France
- G Birgand
- National Institute for Health Research Health Protection Research Unit in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, London, UK
- F-X Lescure
- Université de Paris, IAME, INSERM, F-75018 Paris, France
- S Fourati
- Department of Prevention, Diagnosis and Treatment of Infections, Henri-Mondor Hospital, APHP, Université Paris-Est Créteil, IMRB, INSERM U955, Créteil, France
- E Ruppé
- Université de Paris, IAME, INSERM, F-75018 Paris, France
15
Chowdhury AS, Lofgren ET, Moehring RW, Broschat SL. Identifying predictors of antimicrobial exposure in hospitalized patients using a machine learning approach. J Appl Microbiol 2019; 128:688-696. [PMID: 31651068 DOI: 10.1111/jam.14499] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/22/2019] [Revised: 10/17/2019] [Accepted: 10/22/2019] [Indexed: 01/07/2023]
Abstract
AIMS Analysis and tracking of antimicrobial utilization (AU) are crucial in antimicrobial stewardship efforts, which are used to find effective interventions for controlling antimicrobial resistance. In antimicrobial stewardship, standard risk adjustment models are needed for benchmarking appropriate AU and for fair inter-facility comparison. In this study, we identify patient- and facility-level predictors of antimicrobial usage in hospitalized patients using a machine learning approach; these predictors can inform a risk adjustment model to facilitate assessment of AU. To our knowledge, this is the first time machine learning has been applied for this purpose.
METHODS AND RESULTS Patient admission records, which include clinical data for 27 community hospitals in the southeastern United States, were retrieved from the Duke Antimicrobial Stewardship Outreach Network. Candidate features (predictors) were then generated from these records. The number of features was reduced using a statistical approach, and missing values in the reduced feature set were imputed using bootstrapping and an expectation-maximization algorithm. Finally, support vector regression (SVR) and cubist regression (CB) models were fitted, and their root-mean-square error (RMSE) values were used to evaluate the selected feature set. The SVR and CB models performed better than linear null and negative binomial null models, demonstrating the effectiveness of the selected features.
CONCLUSIONS Relevant patient- and facility-level predictors of antimicrobial usage, measured in days of therapy, were obtained and evaluated. This potential predictor set can be used in risk adjustment strategies for benchmarking antimicrobial use.
SIGNIFICANCE AND IMPACT OF THE STUDY One reason for the rapid emergence of antimicrobial resistance is inappropriate use of antibiotics in hospitalized patients. Identifying predictors of antimicrobial exposure using a machine learning technique can improve AU, enhance patient health outcomes, and reduce the spread of infections caused by antimicrobial-resistant organisms.
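The abstract evaluates its feature set by comparing the RMSE of fitted models against null models. The sketch below shows that comparison in its simplest form, assuming an intercept-only baseline that always predicts the training mean; the function names and the days-of-therapy numbers are hypothetical, and the SVR/cubist predictions are stand-in values rather than output from a fitted model.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def null_predictions(train_outcomes: Sequence[float], n: int) -> List[float]:
    """Intercept-only baseline: predict the training mean for every test case."""
    mean = sum(train_outcomes) / len(train_outcomes)
    return [mean] * n


# Hypothetical days-of-therapy outcomes for four test admissions.
observed = [3.0, 5.0, 2.0, 8.0]
model_pred = [3.5, 4.5, 2.0, 7.0]  # stand-in for SVR or cubist predictions
baseline = null_predictions([4.0, 5.0, 3.0, 6.0], len(observed))  # training mean 4.5

model_rmse = rmse(observed, model_pred)
null_rmse = rmse(observed, baseline)
print(f"model RMSE = {model_rmse:.3f}, null RMSE = {null_rmse:.3f}")
# prints: model RMSE = 0.612, null RMSE = 2.291
```

A feature set is only useful if the model built on it beats a baseline that ignores every feature; a lower RMSE than the null model is the minimal evidence of that.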
Affiliation(s)
- A S Chowdhury
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
- E T Lofgren
- Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, USA
- Department of Mathematics and Statistics, Washington State University, Pullman, WA, USA
- R W Moehring
- Department of Medicine, Duke University School of Medicine, Durham, NC, USA
- S L Broschat
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
- Paul G. Allen School for Global Animal Health, Washington State University, Pullman, WA, USA
- Department of Veterinary Microbiology and Pathology, Washington State University, Pullman, WA, USA
16
Antimicrobial Resistance Prediction for Gram-Negative Bacteria via Game Theory-Based Feature Evaluation. Sci Rep 2019; 9:14487. [PMID: 31597945 PMCID: PMC6785542 DOI: 10.1038/s41598-019-50686-z] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/09/2019] [Accepted: 09/13/2019] [Indexed: 12/16/2022]
Abstract
The increasing prevalence of antimicrobial-resistant bacteria drives the need for advanced methods to identify antimicrobial-resistance (AMR) genes in bacterial pathogens. With the availability of whole-genome sequences, best-hit methods can be used to identify AMR genes by comparing unknown sequences with known AMR sequences in existing online repositories. Nevertheless, these methods may not perform well when identifying resistance genes whose sequences have low identity to known sequences. We present a machine learning approach that uses protein sequences, with sequence identity ranging between 10% and 90%, as an alternative to conventional DNA sequence alignment-based approaches to identify putative AMR genes in Gram-negative bacteria. By using game theory to choose which protein characteristics to use in our machine learning model, we can predict AMR protein sequences for Gram-negative bacteria with an accuracy ranging from 93% to 99%. To obtain similar classification results with BLASTp, identity thresholds as low as 53% were required.
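Best-hit methods compare a query against known sequences and accept a match when its identity clears a cutoff. The sketch below, assuming one common definition of percent identity (identical non-gap columns divided by alignment length), shows how a fixed threshold such as the 53% mentioned in the abstract decides whether a match is called. The sequences and function names are hypothetical, and BLASTp's own identity reporting may differ in detail.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Identical, non-gap columns as a percentage of the alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


def is_hit(identity: float, threshold: float) -> bool:
    """Best-hit style call: accept the match only if identity meets the threshold."""
    return identity >= threshold


# Hypothetical aligned protein fragments (query vs. database entry).
query = "MKT-LLV"
subject = "MKTALIV"
pid = percent_identity(query, subject)  # 5 identical columns out of 7
print(round(pid, 1), is_hit(pid, 53.0), is_hit(pid, 90.0))  # 71.4 True False
```

The same alignment is accepted or rejected depending only on the cutoff, which is why lowering the threshold is the lever a best-hit method has for recovering distant homologs, at the cost of admitting more false matches.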