1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
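The k-mer and absent-sequence ideas in this abstract can be illustrated with a minimal Python sketch. It is not taken from the review: the function names `count_kmers` and `absent_kmers` are hypothetical, and enumerating all 4^k candidate k-mers is an assumption that only suits small k.

```python
from collections import Counter
from itertools import product

DNA = "ACGT"


def count_kmers(seq, k):
    """Count every overlapping k-mer made only of A, C, G, T."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return Counter(w for w in windows if set(w) <= set(DNA))


def absent_kmers(seq, k):
    """List the k-mers that never occur in seq (illustrative only)."""
    present = count_kmers(seq, k)
    candidates = ("".join(p) for p in product(DNA, repeat=k))
    return sorted(c for c in candidates if c not in present)


if __name__ == "__main__":
    print(count_kmers("ACGTAC", 2))
    print(absent_kmers("ACGTAC", 2))
```

For the toy sequence `ACGTAC`, four distinct 2-mers are present, so twelve of the sixteen possible 2-mers are absent from it.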
Affiliation(s)
- Camille Moeckel
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
  - Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
  - Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
2
Wu S, Feng T, Tang W, Qi C, Gao J, He X, Wang J, Zhou H, Fang Z. metaProbiotics: a tool for mining probiotic from metagenomic binning data based on a language model. Brief Bioinform 2024; 25:bbae085. [PMID: 38487846 PMCID: PMC10940841 DOI: 10.1093/bib/bbae085] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/26/2024] [Accepted: 02/15/2024] [Indexed: 03/18/2024]
Abstract
Beneficial bacteria remain largely unexplored. Lacking systematic methods, understanding probiotic community traits becomes challenging, leading to various conclusions about their probiotic effects among different publications. We developed language model-based metaProbiotics to rapidly detect probiotic bins from metagenomes, demonstrating superior performance in simulated benchmark datasets. Testing on gut metagenomes from probiotic-treated individuals, it revealed the probioticity of intervention strains-derived bins and other probiotic-associated bins beyond the training data, such as a plasmid-like bin. Analyses of these bins revealed various probiotic mechanisms and bai operon as probiotic Ruminococcaceae's potential marker. In different health-disease cohorts, these bins were more common in healthy individuals, signifying their probiotic role, but relevant health predictions based on the abundance profiles of these bins faced cross-disease challenges. To better understand the heterogeneous nature of probiotics, we used metaProbiotics to construct a comprehensive probiotic genome set from global gut metagenomic data. Module analysis of this set shows that diseased individuals often lack certain probiotic gene modules, with significant variation of the missing modules across different diseases. Additionally, different gene modules on the same probiotic have heterogeneous effects on various diseases. We thus believe that gene function integrity of the probiotic community is more crucial in maintaining gut homeostasis than merely increasing specific gene abundance, and adding probiotics indiscriminately might not boost health. We expect that the innovative language model-based metaProbiotics tool will promote novel probiotic discovery using large-scale metagenomic data and facilitate systematic research on bacterial probiotic effects. The metaProbiotics program can be freely downloaded at https://github.com/zhenchengfang/metaProbiotics.
Affiliation(s)
- Shufang Wu
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Tao Feng
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Waijiao Tang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Cancan Qi
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jie Gao
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
  - Department of Gastroenterology, The Second Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xiaolong He
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Jiaxuan Wang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Hongwei Zhou
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
- Zhencheng Fang
  - Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, China
3
Ayoola MB, Das AR, Krishnan BS, Smith DR, Nanduri B, Ramkumar M. Predicting Salmonella MIC and Deciphering Genomic Determinants of Antibiotic Resistance and Susceptibility. Microorganisms 2024; 12:134. [PMID: 38257961 PMCID: PMC10819212 DOI: 10.3390/microorganisms12010134] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 01/04/2024] [Accepted: 01/08/2024] [Indexed: 01/24/2024]
Abstract
Salmonella spp., a leading cause of foodborne illness, is a formidable global menace due to escalating antimicrobial resistance (AMR). The evaluation of minimum inhibitory concentration (MIC) for antimicrobials is critical for characterizing AMR. The current whole genome sequencing (WGS)-based approaches for predicting MIC are hindered by both computational and feature identification constraints. We propose an innovative methodology called the "Genome Feature Extractor Pipeline" that integrates traditional machine learning (random forest, RF) with deep learning models (multilayer perceptron (MLP) and DeepLift) for WGS-based MIC prediction. We used a dataset from the National Antimicrobial Resistance Monitoring System (NARMS), comprising 4500 assembled genomes of nontyphoidal Salmonella, each annotated with MIC metadata for 15 antibiotics. Our pipeline involves the batch downloading of annotated genomes, the determination of feature importance using RF, Gini-index-based selection of crucial 10-mers, and their expansion to 20-mers. This is followed by an MLP network, with four hidden layers of 1024 neurons each, to predict MIC values. Using DeepLift, key 20-mers and associated genes influencing MIC are identified. The 10 most significant 20-mers for each antibiotic are listed, showcasing our ability to discern genomic features affecting Salmonella MIC prediction with enhanced precision. The methodology replaces binary indicators with k-mer counts, offering a more nuanced analysis. The combination of RF and MLP addresses the limitations of the existing WGS approach, providing a robust and efficient method for predicting MIC values in Salmonella that could potentially be applied to other pathogens.
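The contrast this abstract draws between binary indicators and k-mer counts can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline: the function name `kmer_feature_vector`, the toy genome, and the three-entry vocabulary are all hypothetical, and the vocabulary is assumed to contain k-mers of a single length.

```python
from collections import Counter


def kmer_feature_vector(genome, vocabulary, binary=False):
    """Turn one genome into a feature vector over a fixed k-mer vocabulary.

    With binary=True each feature only records presence or absence;
    otherwise it records how many times the k-mer occurs.
    """
    k = len(vocabulary[0])  # assumes all vocabulary entries share one length
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    if binary:
        return [1 if counts[kmer] > 0 else 0 for kmer in vocabulary]
    return [counts[kmer] for kmer in vocabulary]


if __name__ == "__main__":
    toy_genome = "ACGACGTT"            # hypothetical sequence
    toy_vocab = ["ACG", "CGT", "TTT"]  # hypothetical selected k-mers
    print(kmer_feature_vector(toy_genome, toy_vocab))               # counts
    print(kmer_feature_vector(toy_genome, toy_vocab, binary=True))  # presence
```

The count vector keeps the information that `ACG` occurs twice, which the binary vector discards. That extra information is what the abstract refers to as a more nuanced representation.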
Affiliation(s)
- Moses B. Ayoola
  - Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- Athish Ram Das
  - Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- B. Santhana Krishnan
  - Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- David R. Smith
  - Department of Population Medicine, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- Bindu Nanduri
  - Department of Comparative Biomedical Sciences, College of Veterinary Medicine, Mississippi State University, Starkville, MS 39762, USA
- Mahalingam Ramkumar
  - Department of Computer Science and Engineering, Mississippi State University, Starkville, MS 39762, USA
4
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023]
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
  - Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
5
Álvarez VE, Quiroga MP, Centrón D. Identification of a Specific Biomarker of Acinetobacter baumannii Global Clone 1 by Machine Learning and PCR Related to Metabolic Fitness of ESKAPE Pathogens. mSystems 2023:e0073422. [PMID: 37184409 DOI: 10.1128/msystems.00734-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
Since the emergence of high-risk clones worldwide, constant investigations have been undertaken to comprehend the molecular basis that led to their prevalent dissemination in nosocomial settings over time. So far, the complex and multifactorial genetic traits of this type of epidemic clones have allowed only the identification of biomarkers with low specificity. A machine learning algorithm was able to recognize unequivocally a biomarker for early and accurate detection of Acinetobacter baumannii global clone 1 (GC1), one of the most disseminated high-risk clones. A support vector machine model identified the U1 sequence with a length of 367 nucleotides that matched a fragment of the moaCB gene, which encodes the molybdenum cofactor biosynthesis C and B proteins. U1 differentiates specifically between A. baumannii GC1 and non-GC1 strains, becoming a suitable biomarker capable of being translated into clinical settings as a molecular typing method for early diagnosis based on PCR as shown here. Since the metabolic pathways of Mo enzymes have been recognized as putative therapeutic targets for ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens, our findings highlight that machine learning can also be useful in knowledge gaps of high-risk clones and provides noteworthy support to the literature to identify relevant nosocomial biomarkers for other multidrug-resistant high-risk clones. IMPORTANCE A. baumannii GC1 is an important high-risk clone that rapidly develops extreme drug resistance in the nosocomial niche. Furthermore, several strains have been identified worldwide in environmental samples, exacerbating the risk of human interactions. Early diagnosis is mandatory to limit its dissemination and to outline appropriate antibiotic stewardship schedules. 
A region with a length of 367 bp (U1) within the moaCB gene that is not subjected to lateral genetic transfer or to antibiotic pressures was successfully found by a support vector machine model that predicts A. baumannii GC1 strains. At the same time, research on the group of Mo enzymes proposed this metabolic pathway related to the superbug's metabolism as a potential future drug target site for ESKAPE pathogens due to its central role in bacterial fitness during infection. These findings confirm that machine learning used for the identification of biomarkers of high-risk lineages can also serve to identify putative novel therapeutic target sites.
Affiliation(s)
- Verónica Elizabeth Álvarez
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- María Paula Quiroga
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
  - Nodo de Bioinformática. Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Daniela Centrón
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
6
Li S, Wu J, Ma N, Liu W, Shao M, Ying N, Zhu L. Prediction of genome-wide imipenem resistance features in Klebsiella pneumoniae using machine learning. J Med Microbiol 2023; 72. [PMID: 36753438 DOI: 10.1099/jmm.0.001657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
Abstract
Introduction. The resistance rate of Klebsiella pneumoniae (K. pneumoniae) to imipenem is increasing year by year, and the imipenem resistance mechanism of K. pneumoniae is complex. Therefore, it is urgent to develop new strategies to explore the resistance mechanism of imipenem for its effective and accurate use in clinical practice. Hypothesis/Gap Statement. Machine learning could identify resistance features and biological processes that influence microbial resistance from whole-genome sequencing (WGS) data. Aims. This work aimed to predict imipenem resistance genetic features in K. pneumoniae from whole-genome k-mer features, and to analyse their function to understand the resistance mechanism. Methods. This study analysed WGS data of K. pneumoniae combined with the imipenem resistance phenotype, and established a K. pneumoniae imipenem genotype-phenotype model to predict resistance features using the chi-squared test and random forest. An external clinical dataset was used to verify the predictive power of the resistance features. Potential genes were identified by aligning the resistance features with the K. pneumoniae reference genome using blastn; the functions of the potential genes were further analysed with GO and KEGG analysis to explore resistance-related signalling pathways; and resistance sequence patterns were screened using the streme software. Finally, the resistance features were combined and modelled with four machine-learning algorithms (logistic regression, SVM, GBDT and XGBoost) to evaluate their phenotype prediction ability. Results. A total of 16 670 imipenem resistance features were predicted from the genotype-phenotype model. Thirty potential genes were identified by annotating the resistance features and corresponded to known antibiotic-related genes (mdtM, dedA, rne, etc.). GO and KEGG pathway analyses indicated a possible association of imipenem resistance with metabolic processes and the cell membrane. The patterns CRYCAGCDN and CGRDAAAN were found in the imipenem resistance features and were widely present in reported β-lactam resistance genes (bla SHV, bla CTX-M, bla TEM, etc.), and YCYAGCMCAST, associated with metabolic functions (organic substance metabolic process, nitrogen compound metabolic process and cellular metabolic process), was identified from the top 50 resistance features. The 25 resistance genes in the training dataset included 19 genes in the external dataset, which verified the accuracy of prediction. The area under the curve values of logistic regression, SVM, GBDT and XGBoost were 0.965, 0.966, 0.969 and 0.969, respectively, indicating that the imipenem resistance features have strong predictive power. Conclusion. Machine-learning methods could effectively predict imipenem resistance features in K. pneumoniae and provide resistance sequence profiles for predicting the resistance phenotype and exploring potential resistance mechanisms. This provides important insight into potential therapeutic strategies for K. pneumoniae resistance to imipenem and could speed up the application of machine learning in routine diagnosis.
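The sequence patterns reported in this abstract (for example CRYCAGCDN) are written with IUPAC degenerate nucleotide codes. The following Python sketch shows one way such a pattern could be scanned against a DNA sequence by translating it to a regular expression. It is an illustration only: the study discovered its patterns with streme, whereas this code merely searches for a given pattern, and the function names and the toy sequence are hypothetical.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def iupac_to_regex(motif):
    """Translate a degenerate motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all (overlapping) motif matches."""
    # A lookahead lets re.finditer report overlapping occurrences.
    pattern = re.compile("(?=(" + iupac_to_regex(motif) + "))")
    return [m.start() for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    toy_sequence = "TTCACCAGCATGG"  # hypothetical sequence
    print(iupac_to_regex("CRYCAGCDN"))
    print(find_motif("CRYCAGCDN", toy_sequence))
```

In the toy sequence, the nine bases starting at position 2 (`CACCAGCAT`) satisfy every position of CRYCAGCDN, so that position is reported.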
Affiliation(s)
- Shanshan Li
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Jun Wu
  - Lin'an Center for Disease Control and Prevention, Lin'an, 311300, PR China
- Nan Ma
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Wenjia Liu
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
  - College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou 310018, PR China
- Mengjie Shao
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Nanjiao Ying
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
  - Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Lei Zhu
  - College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
  - Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
7
Deelder W, Manko E, Phelan JE, Campino S, Palla L, Clark TG. Geographical classification of malaria parasites through applying machine learning to whole genome sequence data. Sci Rep 2022; 12:21150. [PMID: 36476815 PMCID: PMC9729610 DOI: 10.1038/s41598-022-25568-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Accepted: 12/01/2022] [Indexed: 12/12/2022]
Abstract
Malaria, caused by Plasmodium parasites, is a major global health challenge. Whole genome sequencing (WGS) of Plasmodium falciparum and Plasmodium vivax genomes is providing insights into parasite genetic diversity, transmission patterns, and can inform decision making for clinical and surveillance purposes. Advances in sequencing technologies are helping to generate timely and big genomic datasets, with the prospect of applying Artificial Intelligence analytical techniques (e.g., machine learning) to support programmatic malaria control and elimination. Here, we assess the potential of applying deep learning convolutional neural network approaches to predict the geographic origin of infections (continents, countries, GPS locations) using WGS data of P. falciparum (n = 5957; 27 countries) and P. vivax (n = 659; 13 countries) isolates. Using identified high-quality genome-wide single nucleotide polymorphisms (SNPs) (P. falciparum: 750 k, P. vivax: 588 k), an analysis of population structure and ancestry revealed clustering at the country-level. When predicting locations for both species, classification (compared to regression) methods had the lowest distance errors, and > 90% accuracy at a country level. Our work demonstrates the utility of machine learning approaches for geo-classification of malaria parasites. With timelier WGS data generation across more malaria-affected regions, the performance of machine learning approaches for geo-classification will improve, thereby supporting disease control activities.
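The "distance errors" mentioned in this abstract compare predicted and true geographic locations. The abstract does not say how those errors were computed, so the following Python sketch is an assumption: it uses the haversine great-circle formula with a mean Earth radius of 6371 km, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption for this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def mean_distance_error(predicted, actual):
    """Average haversine distance over paired (lat, lon) predictions."""
    distances = [haversine_km(p[0], p[1], a[0], a[1])
                 for p, a in zip(predicted, actual)]
    return sum(distances) / len(distances)


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    print(mean_distance_error([(0.0, 0.0), (10.0, 20.0)],
                              [(0.0, 1.0), (10.0, 20.0)]))
```

Under this metric, a classifier that assigns each isolate to a country could be scored by mapping the predicted country to a representative coordinate, which is again an assumption rather than the paper's stated procedure.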
Affiliation(s)
- Wouter Deelder
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Dalberg Advisors, 7 Rue de Chantepoulet, 1201, Geneva, Switzerland
- Emilia Manko
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Jody E Phelan
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Susana Campino
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Luigi Palla
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Department of Public Health and Infectious Diseases, University of Rome La Sapienza, Rome, Italy
- Taane G Clark
  - London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
8
Yee R, Simner PJ. Next-Generation Sequencing Approaches to Predicting Antimicrobial Susceptibility Testing Results. Clin Lab Med 2022; 42:557-572. [PMID: 36368782 DOI: 10.1016/j.cll.2022.09.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Affiliation(s)
- Rebecca Yee
  - Division of Medical Microbiology, Department of Pathology, Johns Hopkins University School of Medicine, Meyer B1-193, 600 North Wolfe Street, Baltimore, MD 21287-7093, USA
- Patricia J Simner
  - Division of Medical Microbiology, Department of Pathology, Johns Hopkins University School of Medicine, Meyer B1-193, 600 North Wolfe Street, Baltimore, MD 21287-7093, USA
9
Zamora-Mendoza L, Guamba E, Miño K, Romero MP, Levoyer A, Alvarez-Barreto JF, Machado A, Alexis F. Antimicrobial Properties of Plant Fibers. Molecules 2022; 27:7999. [PMID: 36432099 PMCID: PMC9699224 DOI: 10.3390/molecules27227999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2022] [Revised: 11/09/2022] [Accepted: 11/12/2022] [Indexed: 11/19/2022]
Abstract
Healthcare-associated infections (HAI), or nosocomial infections, are a global health and economic problem in developed and developing countries, particularly for immunocompromised patients in their intensive care units (ICUs) and surgical site hospital areas. Recurrent pathogens in HAIs prevail over antibiotic-resistant bacteria, such as methicillin-resistant Staphylococcus aureus (MRSA) and Pseudomonas aeruginosa. For this reason, natural antibacterial mechanisms are a viable alternative for HAI treatment. Natural fibers can inhibit bacterial growth, which can be considered a great advantage in these applications. Moreover, these fibers have been reported to be biocompatible and biodegradable, essential features for biomedical materials to avoid complications due to infections and significant immune responses. Consequently, tissue engineering, medical textiles, orthopedics, and dental implants, as well as cosmetics, are fields currently expanding the use of plant fibers. In this review, we will discuss the source of natural fibers with antimicrobial properties, antimicrobial mechanisms, and their biomedical applications.
Affiliation(s)
- Lizbeth Zamora-Mendoza
  - School of Biological Sciences & Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- Esteban Guamba
  - School of Biological Sciences & Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- Karla Miño
  - School of Biological Sciences & Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- Maria Paula Romero
  - School of Biological Sciences & Engineering, Yachay Tech University, Urcuquí 100119, Ecuador
- Anghy Levoyer
  - Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Departamento de Ingeniería Química, Quito 170901, Ecuador
- José F. Alvarez-Barreto
  - Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Departamento de Ingeniería Química, Quito 170901, Ecuador
- António Machado
  - Colegio de Ciencias Biológicas y Ambientales COCIBA, Instituto de Microbiología, Universidad San Francisco de Quito (USFQ), Laboratorio de Bacteriología, Quito 170901, Ecuador
- Frank Alexis
  - Colegio de Ciencias e Ingenierías, Universidad San Francisco de Quito (USFQ), Departamento de Ingeniería Química, Quito 170901, Ecuador
10
Aljeldah MM. Antimicrobial Resistance and Its Spread Is a Global Threat. Antibiotics (Basel) 2022; 11:1082. [PMID: 36009948 PMCID: PMC9405321 DOI: 10.3390/antibiotics11081082] [Citation(s) in RCA: 42] [Impact Index Per Article: 21.0] [Received: 06/18/2022] [Revised: 07/20/2022] [Accepted: 07/27/2022] [Indexed: 02/07/2023]
Abstract
Antimicrobial resistance (AMR) is a challenge to human wellbeing the world over and is one of the more serious public health concerns. AMR has the potential to emerge as a serious healthcare threat if left unchecked, and could put into motion another pandemic. This establishes the need for the establishment of global health solutions around AMR, taking into account microdata from different parts of the world. The positive influences in this regard could be establishing conducive social norms, charting individual and group behavior practices that favor global human health, and lastly, increasing collective awareness around the need for such action. Apart from being an emerging threat in the clinical space, AMR also increases treatment complexity, posing a real challenge to the existing guidelines around the management of antibiotic resistance. The attribute of resistance development has been linked to many genetic elements, some of which have complex transmission pathways between microbes. Beyond this, new mechanisms underlying the development of AMR are being discovered, making this field an important aspect of medical microbiology. Apart from the genetic aspects of AMR, other practices, including misdiagnosis, exposure to broad-spectrum antibiotics, and lack of rapid diagnosis, add to the creation of resistance. However, upgrades and innovations in DNA sequencing technologies with bioinformatics have revolutionized the diagnostic industry, aiding the real-time detection of causes of AMR and its elements, which are important to delineating control and prevention approaches to fight the threat.
Affiliation(s)
- Mohammed M Aljeldah
  - Department of Clinical Laboratory Sciences, College of Applied Medical Sciences, University of Hafr Al Batin, Hafar al-Batin 31991, Saudi Arabia
11
Balaji A, Kille B, Kappell AD, Godbold GD, Diep M, Elworth RAL, Qian Z, Albin D, Nasko DJ, Shah N, Pop M, Segarra S, Ternus KL, Treangen TJ. SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning. Genome Biol 2022; 23:133. [PMID: 35725628 PMCID: PMC9208262 DOI: 10.1186/s13059-022-02695-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2021] [Accepted: 05/25/2022] [Indexed: 11/10/2022]
Abstract
The COVID-19 pandemic has emphasized the importance of accurate detection of known and emerging pathogens. However, robust characterization of pathogenic sequences remains an open challenge. To address this need, we developed SeqScreen, which accurately characterizes short nucleotide sequences using taxonomic and functional labels and a customized set of curated Functions of Sequences of Concern (FunSoCs) specific to microbial pathogenesis. We show that our ensemble machine learning model can label protein-coding sequences with FunSoCs with high recall and precision. SeqScreen is a step towards a novel paradigm of functionally informed synthetic DNA screening and pathogen characterization, available for download at www.gitlab.com/treangenlab/seqscreen.
Affiliation(s)
- Advait Balaji
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Bryce Kille
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Anthony D Kappell
- Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA
| | - Gene D Godbold
- Signature Science, LLC, 1670 Discovery Drive, Charlottesville, VA, USA
| | - Madeline Diep
- Fraunhofer USA Center Mid-Atlantic CMA, Riverdale, MD, USA
| | - R A Leo Elworth
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Zhiqin Qian
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Dreycey Albin
- Department of Computer Science, Rice University, Houston, TX, USA
| | - Daniel J Nasko
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Nidhi Shah
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Mihai Pop
- Department of Computer Science, University of Maryland, College Park, MD, USA
| | - Santiago Segarra
- Department of Electrical and Computer Engineering, Rice University, Houston, TX, USA
| | - Krista L Ternus
- Signature Science, LLC, 8329 North Mopac Expressway, Austin, TX, USA.
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
| |
|
12
|
Nguyen AQ, Vu HP, Nguyen LN, Wang Q, Djordjevic SP, Donner E, Yin H, Nghiem LD. Monitoring antibiotic resistance genes in wastewater treatment: Current strategies and future challenges. The Science of the Total Environment 2021; 783:146964. [PMID: 33866168 DOI: 10.1016/j.scitotenv.2021.146964] [Citation(s) in RCA: 97] [Impact Index Per Article: 32.3] [Received: 01/10/2021] [Revised: 04/01/2021] [Accepted: 04/01/2021] [Indexed: 05/29/2023]
Abstract
Antimicrobial resistance (AMR) is a growing threat to human and animal health. Progress in molecular biology has revealed new and significant challenges for AMR mitigation given the immense diversity of antibiotic resistance genes (ARGs), the complexity of ARG transfer, and the broad range of omnipresent factors contributing to AMR. Municipal, hospital and abattoir wastewater are collected and treated in wastewater treatment plants (WWTPs), where the presence of diverse selection pressures together with a highly concentrated consortium of pathogenic/commensal microbes create favourable conditions for the transfer of ARGs and proliferation of antibiotic resistant bacteria (ARB). The rapid emergence of antibiotic resistant pathogens of clinical and veterinary significance over the past 80 years has re-defined the role of WWTPs as a focal point in the fight against AMR. By reviewing the occurrence of ARGs in wastewater and sludge and the current technologies used to quantify ARGs and identify ARB, this paper provides a research roadmap to address existing challenges in AMR control via wastewater treatment. Wastewater treatment is a double-edged sword that can act as either a pathway for AMR spread or as a barrier to reduce the environmental release of anthropogenic AMR. State of the art ARB identification technologies, such as metagenomic sequencing and fluorescence-activated cell sorting, have enriched ARG/ARB databases, unveiled keystone species in AMR networks, and improved the resolution of AMR dissemination models. Data and information provided in this review highlight significant knowledge gaps. These include inconsistencies in ARG reporting units, lack of ARG/ARB monitoring surrogates, lack of a standardised protocol for determining ARG removal via wastewater treatments, and the inability to support appropriate risk assessment. 
This is due to a lack of standard monitoring targets and agreed threshold values, and paucity of information on the ARG-pathogen host relationship and risk management. These research gaps need to be addressed and research findings need to be transformed into practical guidance for WWTP operators to enable effective progress towards mitigating the evolution and spread of AMR.
Affiliation(s)
- Anh Q Nguyen
- Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Hang P Vu
- Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Luong N Nguyen
- Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Qilin Wang
- Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Steven P Djordjevic
- Institute of Infection, Immunity and Innovation, University of Technology Sydney, Ultimo, NSW 2007, Australia
| | - Erica Donner
- Future Industries Institute, University of South Australia, Mawson Lakes, SA 5095, Australia
| | - Huabing Yin
- School of Engineering, University of Glasgow, Glasgow G12 8LT, UK
| | - Long D Nghiem
- Centre for Technology in Water and Wastewater, School of Civil and Environmental Engineering, University of Technology Sydney, Ultimo, NSW 2007, Australia; Institute of Environmental Sciences, Nguyen Tat Thanh University, Ho Chi Minh City, Viet Nam.
| |
|
13
|
Karlsen ST, Vesth TC, Oregaard G, Poulsen VK, Lund O, Henderson G, Bælum J. Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis. PLoS One 2021; 16:e0246287. [PMID: 33720959 PMCID: PMC7959382 DOI: 10.1371/journal.pone.0246287] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/21/2020] [Accepted: 01/17/2021] [Indexed: 11/18/2022] Open
Abstract
Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is such a strain-dependent property. To predict the maximum hourly acidification rate (Vmax), we trained Random Forest (RF) models on four different genomic representations: Presence/absence of gene families, counts of Pfam domains, the 8 nucleotide long subsequences of their DNA (8-mers), and the 9 nucleotide long subsequences of their DNA (9-mers). Vmax was measured at different temperatures, volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between the measured Vmax and the predicted Vmax was evaluated with Pearson Correlation Coefficients (PC) on a separate dataset of 85 strains. The models all had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, degradation of casein, and pH stress response. Each model also predicted a set of features not found by the other models.
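As a rough illustration of two ingredients named in this abstract, the sketch below counts overlapping k-mers in a DNA string (the kind of 8-mer or 9-mer representation described) and computes a Pearson correlation coefficient between measured and predicted values. The function names `kmer_counts` and `pearson` are hypothetical and chosen for this example; the abstract does not state how the authors computed either quantity, and the Random Forest step is not reproduced here.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA string (hypothetical helper)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Covariance and standard deviations, all without the 1/n factor (it cancels).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In a setting like the one described, each strain's k-mer counts would form one feature row, and `pearson` would compare measured and predicted Vmax on the held-out strains.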
Affiliation(s)
- Signe Tang Karlsen
- Chr. Hansen A/S, Hoersholm, Denmark
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
| | - Ole Lund
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
| |
|
14
|
Overview of bioinformatic methods for analysis of antibiotic resistome from genome and metagenome data. J Microbiol 2021; 59:270-280. [DOI: 10.1007/s12275-021-0652-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/11/2020] [Revised: 01/28/2021] [Accepted: 01/29/2021] [Indexed: 12/13/2022]
|
15
|
Lv J, Deng S, Zhang L. A review of artificial intelligence applications for antimicrobial resistance. Biosafety and Health 2021. [DOI: 10.1016/j.bsheal.2020.08.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Indexed: 12/20/2022] Open
|
16
|
Robust detection of point mutations involved in multidrug-resistant Mycobacterium tuberculosis in the presence of co-occurrent resistance markers. PLoS Comput Biol 2020; 16:e1008518. [PMID: 33347430 PMCID: PMC7785249 DOI: 10.1371/journal.pcbi.1008518] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 06/11/2020] [Revised: 01/05/2021] [Accepted: 11/11/2020] [Indexed: 11/23/2022] Open
Abstract
Tuberculosis disease is a major global public health concern and the growing prevalence of drug-resistant Mycobacterium tuberculosis is making disease control more difficult. However, the increasing application of whole-genome sequencing as a diagnostic tool is leading to the profiling of drug resistance to inform clinical practice and treatment decision making. Computational approaches for identifying established and novel resistance-conferring mutations in genomic data include genome-wide association study (GWAS) methodologies, tests for convergent evolution and machine learning techniques. These methods may be confounded by extensive co-occurrent resistance, where statistical models for a drug include unrelated mutations known to be causing resistance to other drugs. Here, we introduce a novel ‘cannibalistic’ elimination algorithm (“Hungry, Hungry SNPos”) that attempts to remove these co-occurrent resistant variants. Using an M. tuberculosis genomic dataset for the virulent Beijing strain-type (n = 3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach is considerably more robust than traditional methods and detects resistance-associated variants too rare to be likely picked up by correlation-based techniques like GWAS. Tuberculosis is one of the deadliest infectious diseases, being responsible for more than one million deaths per year. The causing bacteria are becoming increasingly drug-resistant, which is hampering disease control. At the same time, an unprecedented amount of bacterial whole-genome sequencing is increasingly informing clinical practice. In order to detect the genetic alterations responsible for developing drug resistance and predict resistance status from genomic data, bio-statistical methods and machine learning models have been employed. 
However, due to strongly overlapping drug resistance phenotypes and genotypes in multidrug-resistant datasets, the results of these correlation-based approaches frequently also contain mutations related to resistance against other drugs. In the past, this issue has often been ignored or partially resolved by either restricting the input data or in post-analysis screening—with both strategies relying on prior information. Here we present a heuristic algorithm for finding resistance-associated variants and demonstrate that it is considerably more robust towards co-occurrent resistance compared to traditional techniques. The software is available at https://github.com/julibeg/HHS.
|
17
|
Jaillard M, Palmieri M, van Belkum A, Mahé P. Interpreting k-mer-based signatures for antibiotic resistance prediction. Gigascience 2020; 9:giaa110. [PMID: 33068113 PMCID: PMC7568433 DOI: 10.1093/gigascience/giaa110] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 03/15/2020] [Revised: 07/23/2020] [Accepted: 09/16/2020] [Indexed: 11/22/2022] Open
Abstract
BACKGROUND Recent years have witnessed the development of several k-mer-based approaches aiming to predict phenotypic traits of bacteria on the basis of their whole-genome sequences. While often convincing in terms of predictive performance, the underlying models are in general not straightforward to interpret, the interplay between the actual genetic determinant and its translation as k-mers being generally hard to decipher. RESULTS We propose a simple and computationally efficient strategy allowing one to cope with the high correlation inherent to k-mer-based representations in supervised machine learning models, leading to concise and easily interpretable signatures. We demonstrate the benefit of this approach on the task of predicting the antibiotic resistance profile of a Klebsiella pneumoniae strain from its genome, where our method leads to signatures defined as weighted linear combinations of genetic elements that can easily be identified as genuine antibiotic resistance determinants, with state-of-the-art predictive performance. CONCLUSIONS By enhancing the interpretability of genomic k-mer-based antibiotic resistance prediction models, our approach improves their clinical utility and hence will facilitate their adoption in routine diagnostics by clinicians and microbiologists. While antibiotic resistance was the motivating application, the method is generic and can be transposed to any other bacterial trait. An R package implementing our method is available at https://gitlab.com/biomerieux-data-science/clustlasso.
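To make the "high correlation inherent to k-mer-based representations" concrete, the sketch below groups k-mers whose presence/absence pattern across a set of genomes is identical, since such k-mers are perfectly correlated and carry the same information for a model. This is a simplified illustration only, not the method implemented in the clustlasso package; the function name and the input layout (a dict from k-mer to a tuple of 0/1 flags) are assumptions made for the example.

```python
from collections import defaultdict


def group_identical_patterns(presence):
    """Group k-mers that share the same presence/absence pattern.

    presence: dict mapping k-mer -> tuple of 0/1 flags, one flag per genome
              (assumed layout for this sketch).
    Returns a sorted list of k-mer groups, one group per distinct pattern.
    """
    groups = defaultdict(list)
    for kmer, pattern in presence.items():
        groups[tuple(pattern)].append(kmer)
    # Sort inside and across groups so the output is deterministic.
    return sorted(sorted(group) for group in groups.values())
```

Overlapping k-mers that cover the same gene tend to fall into one group, which is one reason a signature expressed over groups can be easier to read than one expressed over individual k-mers.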
Affiliation(s)
- Pierre Mahé
- bioMérieux, Chemin de l'Orme, 69280 Marcy l'Etoile, France
| |
|
18
|
Lees JA, Mai TT, Galardini M, Wheeler NE, Horsfield ST, Parkhill J, Corander J. Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions. mBio 2020; 11:e01344-20. [PMID: 32636251 PMCID: PMC7343994 DOI: 10.1128/mbio.01344-20] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 05/22/2020] [Accepted: 06/05/2020] [Indexed: 12/19/2022] Open
Abstract
Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially. IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis.
These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
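For readers unfamiliar with elastic net penalization, the sketch below evaluates a penalized least-squares objective in one common parameterization: residual sum of squares divided by 2n, plus a penalty weighted by `lam` that mixes an L1 term (weight `alpha`) and an L2 term (weight `(1 - alpha) / 2`). The abstract does not give the authors' exact formulation, so the scaling constants, the function name, and the use of a continuous-phenotype loss are assumptions for illustration.

```python
def elastic_net_objective(X, y, beta, lam, alpha):
    """Evaluate a penalized least-squares objective (illustrative sketch).

    X:     list of rows, each a list of feature values (e.g. unitig presence).
    y:     list of phenotype values, one per row.
    beta:  list of coefficients, one per feature.
    lam:   overall penalty strength (assumed name).
    alpha: mixing weight; 1.0 gives a pure L1 penalty, 0.0 a pure L2 penalty.
    """
    n = len(y)
    residual_ss = sum(
        (yi - sum(xij * bj for xij, bj in zip(row, beta))) ** 2
        for row, yi in zip(X, y)
    )
    l1 = sum(abs(b) for b in beta)   # encourages sparse coefficient vectors
    l2 = sum(b * b for b in beta)    # shrinks correlated coefficients together
    return residual_ss / (2 * n) + lam * (alpha * l1 + (1 - alpha) / 2 * l2)
```

The L1 part is what lets a fitted model keep only a small set of variants, while the L2 part keeps the fit stable when many variants are strongly correlated.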
Affiliation(s)
- John A Lees
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - T Tien Mai
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
| | - Marco Galardini
- Biological Design Center, Boston University, Boston, Massachusetts, USA
| | - Nicole E Wheeler
- Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
| | - Samuel T Horsfield
- MRC Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, Imperial College London, London, United Kingdom
| | - Julian Parkhill
- Department of Veterinary Medicine, University of Cambridge, Cambridge, United Kingdom
| | - Jukka Corander
- Oslo Centre for Biostatistics and Epidemiology, Department of Biostatistics, University of Oslo, Oslo, Norway
- Centre for Genomic Pathogen Surveillance, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, United Kingdom
- Helsinki Institute of Information Technology, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| |
|
19
|
Macesic N, Bear Don't Walk OJ, Pe'er I, Tatonetti NP, Peleg AY, Uhlemann AC. Predicting Phenotypic Polymyxin Resistance in Klebsiella pneumoniae through Machine Learning Analysis of Genomic Data. mSystems 2020; 5:e00656-19. [PMID: 32457240 PMCID: PMC7253370 DOI: 10.1128/msystems.00656-19] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 10/09/2019] [Accepted: 05/01/2020] [Indexed: 02/06/2023] Open
Abstract
Polymyxins are used as treatments of last resort for Gram-negative bacterial infections. Their increased use has led to concerns about emerging polymyxin resistance (PR). Phenotypic polymyxin susceptibility testing is resource intensive and difficult to perform accurately. The complex polygenic nature of PR and our incomplete understanding of its genetic basis make it difficult to predict PR using detection of resistance determinants. We therefore applied machine learning (ML) to whole-genome sequencing data from >600 Klebsiella pneumoniae clonal group 258 (CG258) genomes to predict phenotypic PR. Using a reference-based representation of genomic data with ML outperformed a rule-based approach that detected variants in known PR genes (area under receiver-operator curve [AUROC], 0.894 versus 0.791, P = 0.006). We noted modest increases in performance by using a bacterial genome-wide association study to filter relevant genomic features and by integrating clinical data in the form of prior polymyxin exposure. Conversely, reference-free representation of genomic data as k-mers was associated with decreased performance (AUROC, 0.692 versus 0.894, P = 0.015). When ML models were interpreted to extract genomic features, six of seven known PR genes were correctly identified by models without prior programming and several genes involved in stress responses and maintenance of the cell membrane were identified as potential novel determinants of PR. These findings are a proof of concept that whole-genome sequencing data can accurately predict PR in K. pneumoniae CG258 and may be applicable to other forms of complex antimicrobial resistance. IMPORTANCE Polymyxins are last-resort antibiotics used to treat highly resistant Gram-negative bacteria. There are increasing reports of polymyxin resistance emerging, raising concerns of a postantibiotic era.
Polymyxin resistance is therefore a significant public health threat, but current phenotypic methods for detection are difficult and time-consuming to perform. There have been increasing efforts to use whole-genome sequencing for detection of antibiotic resistance, but this has been difficult to apply to polymyxin resistance because of its complex polygenic nature. The significance of our research is that we successfully applied machine learning methods to predict polymyxin resistance in Klebsiella pneumoniae clonal group 258, a common health care-associated and multidrug-resistant pathogen. Our findings highlight that machine learning can be successfully applied even in complex forms of antibiotic resistance and represent a significant contribution to the literature that could be used to predict resistance in other bacteria and to other antibiotics.
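The AUROC values quoted in this abstract can be read as the probability that a randomly chosen resistant isolate receives a higher model score than a randomly chosen susceptible one. The sketch below computes AUROC directly from that pairwise definition, counting ties as half. It is a minimal illustration with a hypothetical function name, not the evaluation code used in the study, and it is quadratic in the number of isolates.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (illustrative sketch).

    labels: iterable of 0/1 values (1 = resistant, 0 = susceptible).
    scores: iterable of model scores, higher meaning more likely resistant.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))
```

For example, `auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75, because three of the four resistant/susceptible pairs are ranked correctly.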
Affiliation(s)
- Nenad Macesic
- Division of Infectious Diseases, Columbia University Irving Medical Center, New York, New York, USA
- Department of Infectious Diseases, The Alfred Hospital and Central Clinical School, Monash University, Melbourne, Australia
| | - Itsik Pe'er
- Department of Computer Science, Columbia University, New York, New York, USA
| | - Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
| | - Anton Y Peleg
- Department of Infectious Diseases, The Alfred Hospital and Central Clinical School, Monash University, Melbourne, Australia
- Infection and Immunity Program, Monash Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria, Australia
| | - Anne-Catrin Uhlemann
- Division of Infectious Diseases, Columbia University Irving Medical Center, New York, New York, USA
- Microbiome & Pathogen Genomics Core, Columbia University Irving Medical Center, New York, New York, USA
| |
|
20
|
Jung LC, Wang H, Li X, Wu C. A machine learning method for selection of genetic variants to increase prediction accuracy of type 2 diabetes mellitus using sequencing data. Stat Anal Data Min 2020. [DOI: 10.1002/sam.11456] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Affiliation(s)
- Luann C. Jung
- Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology Cambridge Massachusetts USA
| | - Haiyan Wang
- Department of Statistics, Kansas State University Manhattan Kansas USA
| | - Xukun Li
- Department of Statistics, Kansas State University Manhattan Kansas USA
| | - Cen Wu
- Department of Statistics, Kansas State University Manhattan Kansas USA
| |
|
21
|
Vandenberg O, Durand G, Hallin M, Diefenbach A, Gant V, Murray P, Kozlakidis Z, van Belkum A. Consolidation of Clinical Microbiology Laboratories and Introduction of Transformative Technologies. Clin Microbiol Rev 2020; 33:e00057-19. [PMID: 32102900 PMCID: PMC7048017 DOI: 10.1128/cmr.00057-19] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/14/2022] Open
Abstract
Clinical microbiology is experiencing revolutionary advances in the deployment of molecular, genome sequencing-based, and mass spectrometry-driven detection, identification, and characterization assays. Laboratory automation and the linkage of information systems for big(ger) data management, including artificial intelligence (AI) approaches, also are being introduced. The initial optimism associated with these developments has now entered a more reality-driven phase of reflection on the significant challenges, complexities, and health care benefits posed by these innovations. With this in mind, the ongoing process of clinical laboratory consolidation, covering large geographical regions, represents an opportunity for the efficient and cost-effective introduction of new laboratory technologies and improvements in translational research and development. This will further define and generate the mandatory infrastructure used in validation and implementation of newer high-throughput diagnostic approaches. Effective, structured access to large numbers of well-documented biobanked biological materials from networked laboratories will release countless opportunities for clinical and scientific infectious disease research and will generate positive health care impacts. We describe why consolidation of clinical microbiology laboratories will generate quality benefits for many, if not most, aspects of the services separate institutions already provided individually. We also define the important role of innovative and large-scale diagnostic platforms. Such platforms lend themselves particularly well to computational (AI)-driven genomics and bioinformatics applications. These and other diagnostic innovations will allow for better infectious disease detection, surveillance, and prevention with novel translational research and optimized (diagnostic) product and service development opportunities as key results.
Affiliation(s)
- Olivier Vandenberg
- Innovation and Business Development Unit, LHUB-ULB, Groupement Hospitalier Universitaire de Bruxelles (GHUB), Université Libre de Bruxelles, Brussels, Belgium
- Division of Infection and Immunity, Faculty of Medical Sciences, University College London, London, United Kingdom
| | - Géraldine Durand
- bioMérieux, Microbiology Research and Development, La Balme Les Grottes, France
| | - Marie Hallin
- Department of Microbiology, LHUB-ULB, Groupement Hospitalier Universitaire de Bruxelles (GHUB), Université Libre de Bruxelles, Brussels, Belgium
| | - Andreas Diefenbach
- Department of Microbiology, Infectious Diseases and Immunology, Charité-Universitätsmedizin Berlin, Berlin, Germany
- Labor Berlin, Charité-Vivantes GmbH, Berlin, Germany
| | - Vanya Gant
- Department of Clinical Microbiology, University College London Hospitals NHS Foundation Trust, London, United Kingdom
| | - Patrick Murray
- BD Life Sciences Integrated Diagnostic Solutions, Scientific Affairs, Sparks, Maryland, USA
| | - Zisis Kozlakidis
- Laboratory Services and Biobank Group, International Agency for Research on Cancer, World Health Organization, Lyon, France
| | - Alex van Belkum
- bioMérieux, Open Innovation and Partnerships, La Balme Les Grottes, France
| |
|
22
|
Bioinformatics Approaches to the Understanding of Molecular Mechanisms in Antimicrobial Resistance. Int J Mol Sci 2020; 21:ijms21041363. [PMID: 32085478 PMCID: PMC7072858 DOI: 10.3390/ijms21041363] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 01/24/2020] [Revised: 02/13/2020] [Accepted: 02/17/2020] [Indexed: 12/30/2022] Open
Abstract
Antimicrobial resistance (AMR) is a major health concern worldwide. A better understanding of the underlying molecular mechanisms is needed. Advances in whole genome sequencing and other high-throughput unbiased instrumental technologies to study the molecular pathogenicity of infectious diseases enable the accumulation of large amounts of data that are amenable to bioinformatic analysis and the discovery of new signatures of AMR. In this work, we review representative methods published in the past five years to define major approaches developed to-date in the understanding of AMR mechanisms. Advantages and limitations for applications of these methods in clinical laboratory testing and basic research are discussed.
|
23
|
Liu Z, Deng D, Lu H, Sun J, Lv L, Li S, Peng G, Ma X, Li J, Li Z, Rong T, Wang G. Evaluation of Machine Learning Models for Predicting Antimicrobial Resistance of Actinobacillus pleuropneumoniae From Whole Genome Sequences. Front Microbiol 2020; 11:48. [PMID: 32117101 PMCID: PMC7016212 DOI: 10.3389/fmicb.2020.00048] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 05/28/2019] [Accepted: 01/10/2020] [Indexed: 01/05/2023] Open
Abstract
Antimicrobial resistance (AMR) is becoming a huge problem in countries all over the world, and new approaches to identifying strains resistant or susceptible to certain antibiotics are essential in fighting against antibiotic-resistant pathogens. Genotype-based machine learning methods have shown great promise as a diagnostic tool, due to the increasing availability of genomic datasets and AST phenotypes. In this article, Support Vector Machine (SVM) and Set Covering Machine (SCM) models were used to learn and predict resistance to five drugs (Tetracycline, Ampicillin, Sulfisoxazole, Trimethoprim, and Enrofloxacin). The SVM model used the number of co-occurring k-mers between the genome of the isolates and the reference genes to learn and predict the phenotypes of the bacteria to a specific antimicrobial, while the SCM model used a greedy approach to construct a conjunction or disjunction of Boolean functions to find the most concise set of k-mers that allows for accurate prediction of the phenotype. Five-fold cross-validation was performed on the training set of the SVM and SCM models to select the best hyperparameter values and avoid model overfitting. The training accuracy (mean cross-validation score) and the testing accuracy of the SVM and SCM models for the five drugs were above 90%, regardless of whether the resistance mechanism was acquired resistance or a point mutation in the chromosome. The correlation between the phenotype and the model predictions for the five drugs indicated that both SVM and SCM models could significantly distinguish the resistant isolates from the sensitive isolates of the bacteria (p < 0.01), and could be used as potential tools in antimicrobial resistance surveillance and clinical diagnosis in veterinary medicine.
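The SVM feature described in this abstract, the number of k-mers that co-occur in an isolate genome and a reference gene, can be sketched as a set intersection. The abstract does not say whether repeated k-mers are counted once or many times, so counting distinct k-mers is an assumption here, and the names `kmer_set` and `shared_kmer_count` are hypothetical.

```python
def kmer_set(seq: str, k: int) -> set:
    """Return the set of distinct overlapping k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(genome: str, reference_gene: str, k: int) -> int:
    """Count distinct k-mers present in both sequences (assumed definition)."""
    return len(kmer_set(genome, k) & kmer_set(reference_gene, k))
```

Computing this count against each reference gene in a resistance gene collection would give one numeric feature per gene for each isolate, which is the kind of input a kernel classifier can consume.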
Affiliation(s)
- Zhichang Liu
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Dun Deng
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Huijie Lu
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Jian Sun
- National Veterinary Microbiological Drug Resistance Risk Assessment Laboratory, College of Veterinary Medicine, South China Agricultural University, Guangzhou, China
| | - Luchao Lv
- National Veterinary Microbiological Drug Resistance Risk Assessment Laboratory, College of Veterinary Medicine, South China Agricultural University, Guangzhou, China
| | - Shuhong Li
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Guanghui Peng
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Xianyong Ma
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Jiazhou Li
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Zhenming Li
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Ting Rong
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| | - Gang Wang
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China; State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China; Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China; Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
| |
|
24
|
Panyukov VV, Kiselev SS, Ozoline ON. Unique k-mers as Strain-Specific Barcodes for Phylogenetic Analysis and Natural Microbiome Profiling. Int J Mol Sci 2020; 21:ijms21030944. [PMID: 32023871 PMCID: PMC7037511 DOI: 10.3390/ijms21030944] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2019] [Revised: 01/21/2020] [Accepted: 01/28/2020] [Indexed: 02/07/2023]
Abstract
The need for a comparative analysis of natural metagenomes stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers turned out to be capable of identifying not only species, but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from microbiomes of four healthy people and four patients with Crohn's disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree obtained from concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific "barcodes" for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. This points to the intraspecies heterogeneity of E. coli in the natural microflora and indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis.
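The idea of unique k-mers as strain-specific barcodes can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' pipeline: a k-mer counts as "unique" if it occurs in exactly one genome of the given set, reverse complements are ignored, and the strain names, toy sequences, and k = 3 are hypothetical (the abstract refers to 18-mers and 22-mers).

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(genomes, k):
    """Map each genome name to the k-mers found in no other genome of the set."""
    kmer_sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    # Count in how many genomes each k-mer appears.
    occurrences = Counter(km for s in kmer_sets.values() for km in s)
    return {
        name: {km for km in s if occurrences[km] == 1}
        for name, s in kmer_sets.items()
    }


# Toy genomes for illustration only.
genomes = {"strain1": "ACGTAC", "strain2": "ACGTTT"}
barcodes = unique_kmers(genomes, k=3)
for name in sorted(barcodes):
    print(name, sorted(barcodes[name]))
```

A sample could then be screened for these barcode k-mers to suggest which strains are present, which is the kind of use the abstract describes for the 22-mers.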
Affiliation(s)
- Valery V. Panyukov
- Institute of Mathematical Problems of Biology RAS—the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, 142290 Pushchino, Russia;
- Structural and Functional Genomics Group, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia;
| | - Sergey S. Kiselev
- Structural and Functional Genomics Group, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia;
- Institute of Cell Biophysics of the Russian Academy of Sciences, 142290 Pushchino, Russia
| | - Olga N. Ozoline
- Structural and Functional Genomics Group, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia;
- Institute of Cell Biophysics of the Russian Academy of Sciences, 142290 Pushchino, Russia
| |
|
25
|
Mahé P, El Azami M, Barlas P, Tournoud M. A large scale evaluation of TBProfiler and Mykrobe for antibiotic resistance prediction in Mycobacterium tuberculosis. PeerJ 2019; 7:e6857. [PMID: 31106066 PMCID: PMC6500375 DOI: 10.7717/peerj.6857] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 12/06/2018] [Accepted: 03/25/2019] [Indexed: 02/02/2023]
Abstract
Recent years have seen growing interest in predicting antibiotic resistance from whole-genome sequencing data, with promising results obtained for Staphylococcus aureus and Mycobacterium tuberculosis. In this work, we gathered 6,574 sequencing read datasets of public M. tuberculosis genomes with associated antibiotic resistance profiles for both first- and second-line antibiotics. We performed a systematic evaluation of TBProfiler and Mykrobe, two widely recognized software tools for predicting resistance in M. tuberculosis. The size of the dataset allowed us to obtain confident estimates of their overall predictive performance, to assess precisely the individual predictive power of the markers they rely on, and to study how these tools behave across the major M. tuberculosis lineages. While this study confirmed the overall good performance of these tools, it revealed that a substantial fraction of the mutations in their catalogs is of limited predictive power. It also revealed that these tools offer different sensitivity/specificity trade-offs, mainly because of the different sets of mutations they embed but also because of their underlying genotyping pipelines. More importantly, it showed that their predictive performance varies greatly across lineages for some antibiotics, suggesting that the confidence placed in their predictions should depend on the lineage inferred and on the predictive performance of the marker(s) actually detected. Finally, we evaluated machine learning approaches operating on the set of markers detected by these tools and show that they offer an attractive alternative strategy, reaching better performance for several drugs while significantly reducing the number of candidate mutations to consider.
Affiliation(s)
- Pierre Mahé
- Data Analytics Department, bioMérieux, Marcy l'Etoile, France
| | - Meriem El Azami
- Data Analytics Department, bioMérieux, Marcy l'Etoile, France
| | | | - Maud Tournoud
- Data Analytics Department, bioMérieux, Marcy l'Etoile, France
| |
|