1
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
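The abstract of this entry refers to absent sequences (nullomers). As a minimal sketch of the idea, assuming a DNA alphabet of A, C, G and T and ignoring reverse complements, the following enumerates every possible k-mer and keeps those never observed in the input. The function names and the toy sequence are illustrative and are not taken from the review.

```python
from itertools import product


def observed_kmers(sequences, k):
    """Collect every length-k substring seen in any input sequence."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    return seen


def nullomers(sequences, k, alphabet="ACGT"):
    """Return the k-mers over `alphabet` that never occur in the input.

    Simplification (an assumption of this sketch): only the given strand
    is scanned, so reverse complements are not counted as observed.
    """
    seen = observed_kmers(sequences, k)
    candidates = ("".join(p) for p in product(alphabet, repeat=k))
    return [kmer for kmer in candidates if kmer not in seen]


# Hypothetical toy input; real genomes need far larger k to have nullomers.
toy_genome = ["ACGTACGGTCA"]
missing = nullomers(toy_genome, 2)
print(len(missing), missing)
```

Exhaustive enumeration costs 4^k candidates, so this only illustrates the definition; production tools use more compact k-mer counting structures.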
2
Mouratidis I, Baltoumas FA, Chantzi N, Patsakis M, Chan CS, Montgomery A, Konnaris MA, Aplakidou E, Georgakopoulos GC, Das A, Chartoumpekis DV, Kovac J, Pavlopoulos GA, Georgakopoulos-Soares I. kmerDB: A database encompassing the set of genomic and proteomic sequence information for each species. Comput Struct Biotechnol J 2024; 23:1919-1928. [PMID: 38711760 PMCID: PMC11070822 DOI: 10.1016/j.csbj.2024.04.050] [Received: 12/12/2023] [Revised: 04/17/2024] [Accepted: 04/18/2024] [Indexed: 05/08/2024]
Abstract
The decrease in sequencing expenses has facilitated the creation of reference genomes and proteomes for an expanding array of organisms. Nevertheless, no established repository that details organism-specific genomic and proteomic sequences of specific lengths, referred to as kmers, exists to our knowledge. In this article, we present kmerDB, a database accessible through an interactive web interface that provides kmer-based information from genomic and proteomic sequences in a systematic way. kmerDB currently contains 202,340,859,107 base pairs and 19,304,903,356 amino acids, spanning 54,039 and 21,865 reference genomes and proteomes, respectively, as well as 6,905,362 and 149,305,183 genomic and proteomic species-specific sequences, termed quasi-primes. Additionally, we provide access to 5,186,757 nucleic and 214,904,089 peptide sequences absent from every genome and proteome, termed primes. kmerDB features a user-friendly interface offering various search options and filters for easy parsing and searching. The service is available at: www.kmerdb.com.
Affiliation(s)
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Fotis A. Baltoumas
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Michail Patsakis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Austin Montgomery
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA, USA
- Department of Statistics, The Pennsylvania State University, University Park, PA, USA
- Eleni Aplakidou
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
- Department of Basic Sciences, School of Medicine, University of Crete, Heraklion, Greece
- George C. Georgakopoulos
- National Technical University of Athens, School of Electrical and Computer Engineering, Athens, Greece
- Anshuman Das
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Dionysios V. Chartoumpekis
- Service of Endocrinology, Diabetology and Metabolism, Lausanne University Hospital, Lausanne, Switzerland
- Jasna Kovac
- Department of Food Science, The Pennsylvania State University, University Park, PA 16802, USA
- Georgios A. Pavlopoulos
- Institute for Fundamental Biomedical Research, BSRC "Alexander Fleming", Vari, 16672, Greece
- Center for New Biotechnologies and Precision Medicine, School of Medicine, National and Kapodistrian University of Athens, Athens, 11527, Greece
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
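The kmerDB abstract defines quasi-primes as species-specific sequences and primes as sequences absent from every genome and proteome. A minimal sketch of both definitions over a handful of toy DNA strings might look like the following. The function names, the species labels and the input strings are hypothetical, and nothing here reflects how kmerDB itself is implemented.

```python
from collections import Counter
from itertools import product


def kmers(seq, k):
    """Return the set of distinct length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def quasi_primes(species_seqs, k):
    """Map each species to the k-mers found in that species and no other."""
    per_species = {name: kmers(seq, k) for name, seq in species_seqs.items()}
    # Count in how many species each distinct k-mer appears.
    counts = Counter(kmer for ks in per_species.values() for kmer in ks)
    return {
        name: {kmer for kmer in ks if counts[kmer] == 1}
        for name, ks in per_species.items()
    }


def primes(species_seqs, k, alphabet="ACGT"):
    """Return the k-mers over `alphabet` that occur in none of the species."""
    seen = set().union(*(kmers(seq, k) for seq in species_seqs.values()))
    universe = {"".join(p) for p in product(alphabet, repeat=k)}
    return universe - seen


# Hypothetical toy data standing in for reference genomes.
toy = {"species_a": "ACGTA", "species_b": "CGTTG"}
specific = quasi_primes(toy, 2)
absent = primes(toy, 2)
print(specific)
print(len(absent))
```

With real reference genomes the same set logic applies, but the k-mer sets would be too large to hold as plain Python sets.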
3
Hu K, Meyer F, Deng ZL, Asgari E, Kuo TH, Münch PC, McHardy AC. Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes. Brief Bioinform 2024; 25:bbae206. [PMID: 38706320 PMCID: PMC11070729 DOI: 10.1093/bib/bbae206] [Received: 11/10/2023] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in the performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.
Affiliation(s)
- Kaixin Hu
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Ehsaneddin Asgari
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Molecular Cell Biomechanics Laboratory, Department of Bioengineering and Mechanical Engineering, University of California, Berkeley, USA
- Tzu-Hao Kuo
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Philipp C Münch
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
- German Center for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
4
Baker M, Zhang X, Maciel-Guerra A, Babaarslan K, Dong Y, Wang W, Hu Y, Renney D, Liu L, Li H, Hossain M, Heeb S, Tong Z, Pearcy N, Zhang M, Geng Y, Zhao L, Hao Z, Senin N, Chen J, Peng Z, Li F, Dottorini T. Convergence of resistance and evolutionary responses in Escherichia coli and Salmonella enterica co-inhabiting chicken farms in China. Nat Commun 2024; 15:206. [PMID: 38182559 PMCID: PMC10770378 DOI: 10.1038/s41467-023-44272-1] [Received: 04/04/2023] [Accepted: 12/06/2023] [Indexed: 01/07/2024]
Abstract
Sharing of genetic elements among different pathogens and commensals inhabiting the same hosts and environments has significant implications for antimicrobial resistance (AMR), especially in settings with high antimicrobial exposure. We analysed 661 Escherichia coli and Salmonella enterica isolates collected within and across hosts and environments, in 10 Chinese chicken farms over 2.5 years using data-mining methods. Most isolates within the same hosts possessed the same clinically relevant AMR-carrying mobile genetic elements (plasmids: 70.6%, transposons: 78%), which also showed recent common evolution. Supervised machine learning classifiers revealed known and novel AMR-associated mutations and genes underlying resistance to 28 antimicrobials, primarily associated with resistance in E. coli and susceptibility in S. enterica. Many were essential and affected the same metabolic processes in both species, albeit with varying degrees of phylogenetic penetration. Multi-modal strategies are crucial to investigate the interplay of mobilome, resistance and metabolism in cohabiting bacteria, especially in ecological settings where community-driven resistance selection occurs.
Affiliation(s)
- Michelle Baker
- School of Veterinary Medicine and Science, University of Nottingham, College Road, Sutton Bonington, Loughborough, Leicestershire, LE12 5RD, UK
- Xibin Zhang
- Shandong New Hope Liuhe Group Co. Ltd. and Qingdao Key Laboratory of Animal Feed Safety, Qingdao, Shandong, 266000, P.R. China
- Alexandre Maciel-Guerra
- School of Veterinary Medicine and Science, University of Nottingham, College Road, Sutton Bonington, Loughborough, Leicestershire, LE12 5RD, UK
- Kubra Babaarslan
- School of Veterinary Medicine and Science, University of Nottingham, College Road, Sutton Bonington, Loughborough, Leicestershire, LE12 5RD, UK
- Yinping Dong
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, 100021, P. R. China
- Wei Wang
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, 100021, P. R. China
- Yujie Hu
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, 100021, P. R. China
- David Renney
- Nimrod Veterinary Products Limited, 2, Wychwood Court, Cotswold Business Village, Moreton-in-Marsh, GL56 0JQ, London, UK
- Longhai Liu
- Shandong Kaijia Food Co. Ltd, Weifang, P. R. China
- Hui Li
- Luoyang Center for Disease Control and Prevention, No. 9, Zhenghe Road, Luolong District, Luoyang City, Henan Province, Luolong, 471000, P. R. China
- Maqsud Hossain
- School of Veterinary Medicine and Science, University of Nottingham, College Road, Sutton Bonington, Loughborough, Leicestershire, LE12 5RD, UK
- Stephan Heeb
- School of Life Sciences, University of Nottingham, East Drive, Nottingham, Nottinghamshire, NG7 2RD, UK
- Zhiqin Tong
- Luoyang Center for Disease Control and Prevention, No. 9, Zhenghe Road, Luolong District, Luoyang City, Henan Province, Luolong, 471000, P. R. China
- Nicole Pearcy
- School of Veterinary Medicine and Science, University of Nottingham, College Road, Sutton Bonington, Loughborough, Leicestershire, LE12 5RD, UK
- School of Life Sciences, University of Nottingham, East Drive, Nottingham, Nottinghamshire, NG7 2RD, UK
- Meimei Zhang
- Liaoning Provincial Center for Disease Control and Prevention, No. 168, Jinfeng Street, Hunnan District, Shenyang City, Liaoning Province, 110072, P. R. China
- Yingzhi Geng
- Liaoning Provincial Center for Disease Control and Prevention, No. 168, Jinfeng Street, Hunnan District, Shenyang City, Liaoning Province, 110072, P. R. China
- Li Zhao
- Agricultural Biopharmaceutical Laboratory, College of Chemistry and Pharmaceutical Sciences, Qingdao Agricultural University, No. 700 Changcheng Road, Chengyang District, Qingdao City, Shandong Province, 266109, P. R. China
- Zhihui Hao
- Chinese Veterinary Medicine Innovation Center, College of Veterinary Medicine, China Agricultural University, Haidian District, Beijing City, 100193, P. R. China
- Nicola Senin
- Department of Engineering, University of Perugia, Perugia, I06125, Italy
- Junshi Chen
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, 100021, P. R. China
- Zixin Peng
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, 100021, P. R. China
- Fengqin Li
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, 100021, P. R. China
- Tania Dottorini
- School of Veterinary Medicine and Science, University of Nottingham, College Road, Sutton Bonington, Loughborough, Leicestershire, LE12 5RD, UK
- Centre for Smart Food Research, Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China, Ningbo, 315100, P. R. China
5
Yang S, Chen J, Fu J, Huang J, Li T, Yao Z, Ye X. Disease-Associated Streptococcus pneumoniae Genetic Variation. Emerg Infect Dis 2024; 30:39-49. [PMID: 38146979 PMCID: PMC10756394 DOI: 10.3201/eid3001.221927] [Indexed: 12/27/2023]
Abstract
Streptococcus pneumoniae is an opportunistic pathogen that causes substantial illness and death among children worldwide. The genetic backgrounds of pneumococci that cause infection versus asymptomatic carriage vary substantially. To determine the evolutionary mechanisms of opportunistic pathogenicity, we conducted a genomic surveillance study in China. We collected 783 S. pneumoniae isolates from infected and asymptomatic children. By using a 2-stage genomewide association study process, we compared genomic differences between infection and carriage isolates to address genomic variation associated with pathogenicity. We identified 8 consensus k-mers associated with adherence, antimicrobial resistance, and immune modulation, which were unevenly distributed in the infection isolates. Classification accuracy of the best k-mer predictor for S. pneumoniae infection was good, giving a simple target for predicting pathogenic isolates. Our findings suggest that S. pneumoniae pathogenicity is complex and multifactorial, and we provide genetic evidence for precise targeted interventions.
6
Dutta A, McDonald BA, Croll D. Combined reference-free and multi-reference based GWAS uncover cryptic variation underlying rapid adaptation in a fungal plant pathogen. PLoS Pathog 2023; 19:e1011801. [PMID: 37972199 PMCID: PMC10688896 DOI: 10.1371/journal.ppat.1011801] [Received: 05/10/2023] [Revised: 11/30/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Microbial pathogens often harbor substantial functional diversity driven by structural genetic variation. Rapid adaptation from such standing variation threatens global food security and human health. Genome-wide association studies (GWAS) provide a powerful approach to identify genetic variants underlying recent pathogen adaptation. However, the reliance on single reference genomes and single nucleotide polymorphisms (SNPs) obscures the true extent of adaptive genetic variation. Here, we show quantitatively how a combination of multiple reference genomes and reference-free approaches captures substantially more relevant genetic variation compared to single reference mapping. We performed reference-genome based association mapping across 19 reference-quality genomes covering the diversity of the species. We contrasted the results with a reference-free (i.e., k-mer) approach using raw whole-genome sequencing data in a panel of 145 strains collected across the global distribution range of the fungal wheat pathogen Zymoseptoria tritici. We mapped the genetic architecture of 49 life history traits including virulence, reproduction and growth in multiple stressful environments. The inclusion of additional reference genome SNP datasets provides a nearly linear increase in additional loci mapped through GWAS. Variants detected through the k-mer approach explained a higher proportion of phenotypic variation than a reference genome-based approach and revealed functionally confirmed loci that classic GWAS approaches failed to map. The power of GWAS in microbial pathogens can be significantly enhanced by comprehensively capturing structural genetic variation. Our approach is generalizable to a large number of species and will uncover novel mechanisms driving rapid adaptation of pathogens.
Affiliation(s)
- Anik Dutta
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Bruce A. McDonald
- Plant Pathology, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Daniel Croll
- Laboratory of Evolutionary Genetics, Institute of Biology, University of Neuchâtel, Neuchâtel, Switzerland
7
Aylward AJ, Petrus S, Mamerto A, Hartwick NT, Michael TP. PanKmer: k-mer-based and reference-free pangenome analysis. Bioinformatics 2023; 39:btad621. [PMID: 37846049 PMCID: PMC10603592 DOI: 10.1093/bioinformatics/btad621] [Received: 03/31/2023] [Revised: 08/29/2023] [Accepted: 10/13/2023] [Indexed: 10/18/2023]
Abstract
SUMMARY: Pangenomes are replacing single reference genomes as the definitive representation of DNA sequence within a species or clade. Pangenome analysis predominantly leverages graph-based methods that require computationally intensive multiple genome alignments, do not scale to highly complex eukaryotic genomes, limit their scope to identifying structural variants (SVs), or incur bias by relying on a reference genome. Here, we present PanKmer, a toolkit designed for reference-free analysis of pangenome datasets consisting of dozens to thousands of individual genomes. PanKmer decomposes a set of input genomes into a table of observed k-mers and their presence-absence values in each genome. These are stored in an efficient k-mer index data format that encodes SNPs, INDELs, and SVs. It also includes functions for downstream analysis of the k-mer index, such as calculating sequence similarity statistics between individuals at whole-genome or local scales. For example, k-mers can be "anchored" in any individual genome to quantify sequence variability or conservation at a specific locus. This facilitates workflows with various biological applications, e.g. identifying cases of hybridization between plant species. PanKmer provides researchers with a valuable and convenient means to explore the full scope of genetic variation in a population, without reference bias. AVAILABILITY AND IMPLEMENTATION: PanKmer is implemented as a Python package with components written in Rust, released under a BSD license. The source code is available from the Python Package Index (PyPI) at https://pypi.org/project/pankmer/ as well as GitLab at https://gitlab.com/salk-tm/pankmer. Full documentation is available at https://salk-tm.gitlab.io/pankmer/.
Affiliation(s)
- Anthony J Aylward
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Semar Petrus
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Allen Mamerto
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Nolan T Hartwick
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
- Todd P Michael
- The Plant Molecular and Cellular Biology Laboratory, The Salk Institute for Biological Studies, La Jolla, CA 92037, United States
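The PanKmer abstract describes decomposing input genomes into a table of observed k-mers with presence-absence values per genome, then deriving similarity statistics from that table. The sketch below illustrates that data structure in plain Python and deliberately does not use the PanKmer package or its API. The function names, the toy genomes and the choice of the Jaccard index as the similarity statistic are assumptions made for illustration, and canonical (strand-independent) k-mers are ignored.

```python
def kmer_set(seq, k):
    """Return the set of distinct length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence_table(genomes, k):
    """Build {kmer: (0/1 per genome)} with columns in `names` order."""
    names = list(genomes)
    sets = [kmer_set(genomes[name], k) for name in names]
    all_kmers = sorted(set().union(*sets))
    table = {kmer: tuple(int(kmer in s) for s in sets) for kmer in all_kmers}
    return names, table


def jaccard(table, i, j):
    """Jaccard similarity between genome columns i and j of the table."""
    shared = sum(1 for row in table.values() if row[i] and row[j])
    either = sum(1 for row in table.values() if row[i] or row[j])
    return shared / either if either else 0.0


# Hypothetical toy genomes; real inputs would be whole assemblies.
toy_genomes = {"g1": "ACGTAC", "g2": "ACGTTC", "g3": "GGGTAC"}
names, table = presence_absence_table(toy_genomes, 3)
for kmer, row in table.items():
    print(kmer, row)
print(jaccard(table, 0, 1), jaccard(table, 1, 2))
```

Storing one tuple per k-mer keeps the sketch readable; a compact bit-level encoding would be the natural next step for thousands of genomes.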
8
Karlsen ST, Rau MH, Sánchez BJ, Jensen K, Zeidan AA. From genotype to phenotype: computational approaches for inferring microbial traits relevant to the food industry. FEMS Microbiol Rev 2023; 47:fuad030. [PMID: 37286882 PMCID: PMC10337747 DOI: 10.1093/femsre/fuad030] [Received: 02/28/2023] [Revised: 05/31/2023] [Accepted: 06/06/2023] [Indexed: 06/09/2023]
Abstract
When selecting microbial strains for the production of fermented foods, various microbial phenotypes need to be taken into account to achieve target product characteristics, such as biosafety, flavor, texture, and health-promoting effects. Through continuous advances in sequencing technologies, microbial whole-genome sequences of increasing quality can now be obtained both cheaper and faster, which increases the relevance of genome-based characterization of microbial phenotypes. Prediction of microbial phenotypes from genome sequences makes it possible to quickly screen large strain collections in silico to identify candidates with desirable traits. Several microbial phenotypes relevant to the production of fermented foods can be predicted using knowledge-based approaches, leveraging our existing understanding of the genetic and molecular mechanisms underlying those phenotypes. In the absence of this knowledge, data-driven approaches can be applied to estimate genotype-phenotype relationships based on large experimental datasets. Here, we review computational methods that implement knowledge- and data-driven approaches for phenotype prediction, as well as methods that combine elements from both approaches. Furthermore, we provide examples of how these methods have been applied in industrial biotechnology, with special focus on the fermented food industry.
Affiliation(s)
- Signe T Karlsen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Martin H Rau
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Benjamín J Sánchez
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Kristian Jensen
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
- Ahmad A Zeidan
- Bioinformatics & Modeling, R&D Digital Innovation, Chr. Hansen A/S, Bøge Allé 10-12, 2970 Hørsholm, Denmark
9
Chong YY, Chan PK, Chan VWK, Cheung A, Luk MH, Cheung MH, Fu H, Chiu KY. Application of machine learning in the prevention of periprosthetic joint infection following total knee arthroplasty: a systematic review. ARTHROPLASTY 2023; 5:38. [PMID: 37316877 DOI: 10.1186/s42836-023-00195-2] [Received: 12/30/2022] [Accepted: 05/11/2023] [Indexed: 06/16/2023]
Abstract
BACKGROUND: Machine learning is a promising and powerful technology with increasing use in orthopedics. Periprosthetic joint infection following total knee arthroplasty results in increased morbidity and mortality. This systematic review investigated the use of machine learning in preventing periprosthetic joint infection. METHODS: A systematic review was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. PubMed was searched in November 2022. All studies that investigated the clinical applications of machine learning in the prevention of periprosthetic joint infection following total knee arthroplasty were included. Non-English studies, studies with no full text available, studies focusing on non-clinical applications of machine learning, reviews and meta-analyses were excluded. For each included study, its characteristics, machine learning applications, algorithms, statistical performances, strengths and limitations were summarized. Limitations of the current machine learning applications and the studies, including their 'black box' nature, overfitting, the requirement of a large dataset, the lack of external validation, and their retrospective nature were identified. RESULTS: Eleven studies were included in the final analysis. Machine learning applications in the prevention of periprosthetic joint infection were divided into four categories: prediction, diagnosis, antibiotic application and prognosis. CONCLUSION: Machine learning may be a favorable alternative to manual methods in the prevention of periprosthetic joint infection following total knee arthroplasty. It aids in preoperative health optimization, preoperative surgical planning, the early diagnosis of infection, the early application of suitable antibiotics, and the prediction of clinical outcomes. Future research is warranted to resolve the current limitations and bring machine learning into clinical settings.
Affiliation(s)
- Yuk Yee Chong
- Department of Orthopaedics and Traumatology, School of Clinical Medicine, The University of Hong Kong, Hong Kong SAR, China
- Ping Keung Chan
- Department of Orthopaedics and Traumatology, School of Clinical Medicine, The University of Hong Kong, Hong Kong SAR, China
- Vincent Wai Kwan Chan
- Department of Orthopaedics and Traumatology, Queen Mary Hospital, Hong Kong SAR, China
- Amy Cheung
- Department of Orthopaedics and Traumatology, Queen Mary Hospital, Hong Kong SAR, China
- Michelle Hilda Luk
- Department of Orthopaedics and Traumatology, Queen Mary Hospital, Hong Kong SAR, China
- Man Hong Cheung
- Department of Orthopaedics and Traumatology, School of Clinical Medicine, The University of Hong Kong, Hong Kong SAR, China
- Henry Fu
- Department of Orthopaedics and Traumatology, School of Clinical Medicine, The University of Hong Kong, Hong Kong SAR, China
- Kwong Yuen Chiu
- Department of Orthopaedics and Traumatology, School of Clinical Medicine, The University of Hong Kong, Hong Kong SAR, China
|
10
|
Álvarez VE, Quiroga MP, Centrón D. Identification of a Specific Biomarker of Acinetobacter baumannii Global Clone 1 by Machine Learning and PCR Related to Metabolic Fitness of ESKAPE Pathogens. mSystems 2023:e0073422. [PMID: 37184409 DOI: 10.1128/msystems.00734-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
Since the emergence of high-risk clones worldwide, constant investigations have been undertaken to comprehend the molecular basis that led to their prevalent dissemination in nosocomial settings over time. So far, the complex and multifactorial genetic traits of this type of epidemic clone have allowed only the identification of biomarkers with low specificity. A machine learning algorithm was able to unequivocally recognize a biomarker for early and accurate detection of Acinetobacter baumannii global clone 1 (GC1), one of the most disseminated high-risk clones. A support vector machine model identified the U1 sequence, with a length of 367 nucleotides, which matched a fragment of the moaCB gene, which encodes the molybdenum cofactor biosynthesis C and B proteins. U1 differentiates specifically between A. baumannii GC1 and non-GC1 strains, making it a suitable biomarker that can be translated into clinical settings as a PCR-based molecular typing method for early diagnosis, as shown here. Since the metabolic pathways of Mo enzymes have been recognized as putative therapeutic targets for ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens, our findings highlight that machine learning can also help close knowledge gaps about high-risk clones and provide noteworthy support to the literature in identifying relevant nosocomial biomarkers for other multidrug-resistant high-risk clones. IMPORTANCE A. baumannii GC1 is an important high-risk clone that rapidly develops extreme drug resistance in the nosocomial niche. Furthermore, several strains have been identified worldwide in environmental samples, exacerbating the risk of human interactions. Early diagnosis is mandatory to limit its dissemination and to outline appropriate antibiotic stewardship schedules.
A region with a length of 367 bp (U1) within the moaCB gene that is not subjected to lateral genetic transfer or to antibiotic pressures was successfully found by a support vector machine model that predicts A. baumannii GC1 strains. At the same time, research on the group of Mo enzymes proposed this metabolic pathway related to the superbug's metabolism as a potential future drug target site for ESKAPE pathogens due to its central role in bacterial fitness during infection. These findings confirm that machine learning used for the identification of biomarkers of high-risk lineages can also serve to identify putative novel therapeutic target sites.
Affiliation(s)
- Verónica Elizabeth Álvarez
- Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
| | - María Paula Quiroga
- Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Nodo de Bioinformática. Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
| | - Daniela Centrón
- Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Tecnológicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
|
11
|
Li S, Wu J, Ma N, Liu W, Shao M, Ying N, Zhu L. Prediction of genome-wide imipenem resistance features in Klebsiella pneumoniae using machine learning. J Med Microbiol 2023; 72. [PMID: 36753438 DOI: 10.1099/jmm.0.001657] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/09/2023]
Abstract
Introduction. The resistance rate of Klebsiella pneumoniae (K. pneumoniae) to imipenem is increasing year by year, and the imipenem resistance mechanism of K. pneumoniae is complex. Therefore, it is urgent to develop new strategies to explore the resistance mechanism of imipenem for its effective and accurate use in clinical practice. Hypothesis/Gap Statement. Machine learning could identify resistance features and biological processes that influence microbial resistance from whole-genome sequencing (WGS) data. Aims. This work aimed to predict imipenem resistance genetic features in K. pneumoniae from whole-genome k-mer features, and analyse their function to understand its resistance mechanism. Methods. This study analysed WGS data of K. pneumoniae combined with the resistance phenotype for imipenem, and established a K. pneumoniae imipenem genotype-phenotype model to predict resistance features using the chi-squared test and random forest. An external clinical dataset was used to verify the predictive power of the resistance features. Potential genes were identified by aligning the resistance features with the K. pneumoniae reference genome using blastn; the functions of the potential genes were further analysed with GO and KEGG analysis to explore resistance-related signalling pathways; and resistance sequence patterns were screened using the STREME software. Finally, the resistance features were combined and modelled with four machine-learning algorithms (logistic regression, SVM, GBDT and XGBoost) to evaluate their phenotype prediction ability. Results. A total of 16 670 imipenem resistance features were predicted from the genotype-phenotype model. Thirty potential genes were identified by annotating the resistance features and corresponded to known antibiotic-related genes (mdtM, dedA, rne, etc.). GO and KEGG pathway analyses indicated a possible association of imipenem resistance with metabolic processes and the cell membrane.
The sequence patterns CRYCAGCDN and CGRDAAAN were found among the imipenem resistance features and were widely present in reported β-lactam resistance genes (blaSHV, blaCTX-M, blaTEM, etc.), and YCYAGCMCAST, associated with metabolic functions (organic substance metabolic process, nitrogen compound metabolic process and cellular metabolic process), was identified from the top 50 resistance features. The 25 resistance genes in the training dataset included 19 genes in the external dataset, which verified the accuracy of prediction. The area under the curve values of logistic regression, SVM, GBDT and XGBoost were 0.965, 0.966, 0.969 and 0.969, respectively, indicating that the imipenem resistance features have strong predictive power. Conclusion. Machine-learning methods could effectively predict imipenem resistance features in K. pneumoniae and provide resistance sequence profiles for predicting resistance phenotype and exploring potential resistance mechanisms. This provides important insight into potential therapeutic strategies for K. pneumoniae resistance to imipenem and could speed up the application of machine learning in routine diagnosis.
Affiliation(s)
- Shanshan Li
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
| | - Jun Wu
- Lin'an Center for Disease Control and Prevention, Lin'an, 311300, PR China
| | - Nan Ma
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
| | - Wenjia Liu
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- College of Electronics and Information Engineering, Hangzhou Dianzi University, Hangzhou 310018, PR China
| | - Mengjie Shao
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
| | - Nanjiao Ying
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
| | - Lei Zhu
- College of Automation, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
- Institute of Biomedical Engineering and Instrument, Hangzhou Dianzi University, Hangzhou, Zhejiang, 310018, PR China
|
12
|
Iquebal MA, Jagannadham J, Jaiswal S, Prabha R, Rai A, Kumar D. Potential Use of Microbial Community Genomes in Various Dimensions of Agriculture Productivity and Its Management: A Review. Front Microbiol 2022; 13:708335. [PMID: 35655999 PMCID: PMC9152772 DOI: 10.3389/fmicb.2022.708335] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/11/2021] [Accepted: 03/17/2022] [Indexed: 12/12/2022]
Abstract
Agricultural productivity is highly influenced by its associated microbial community. With advancements in omics technology, metagenomics is known to play a vital role in microbial world studies by unlocking the uncultured microbial populations present in the environment. Metagenomics is a diagnostic tool to target unique signature loci of plant and animal pathogens as well as beneficial microorganisms from samples. Here, we review various aspects of metagenomics, from experimental methods to techniques used for sequencing, as well as diversified computational resources, including databases and software tools. We focus in depth on the application of metagenomics in agriculture, covering areas including pathogen and plant disease identification, disease resistance breeding, plant pest control, weed management, abiotic stress management, post-harvest management, discoveries in agriculture, sources of novel molecules/compounds, biosurfactants and natural products, identification of biosynthetic molecules, use in genetically modified crops, and antibiotic-resistant genes. Metagenomics-wide association studies in agriculture on crop productivity rates, intercropping analysis, and agronomic fields are also analyzed. This article is the first comprehensive study of its kind, with prospects from an agriculture perspective, focusing on a wider range of applications of metagenomics and its association studies.
Affiliation(s)
- Mir Asif Iquebal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Jaisri Jagannadham
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Sarika Jaiswal
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Ratna Prabha
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Anil Rai
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
| | - Dinesh Kumar
- Centre for Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India
- School of Interdisciplinary and Applied Sciences, Central University of Haryana, Mahendergarh, Haryana, India
|
13
|
McElhinney JMWR, Catacutan MK, Mawart A, Hasan A, Dias J. Interfacing Machine Learning and Microbial Omics: A Promising Means to Address Environmental Challenges. Front Microbiol 2022; 13:851450. [PMID: 35547145 PMCID: PMC9083327 DOI: 10.3389/fmicb.2022.851450] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 01/09/2022] [Accepted: 03/14/2022] [Indexed: 11/13/2022]
Abstract
Microbial communities are ubiquitous and carry an exceptionally broad metabolic capability. Upon environmental perturbation, microbes are also amongst the first natural responsive elements with perturbation-specific cues and markers. These communities are thereby uniquely positioned to inform on the status of environmental conditions. The advent of microbial omics has led to an unprecedented volume of complex microbiological data sets. Importantly, these data sets are rich in biological information with potential for predictive environmental classification and forecasting. However, the patterns in this information are often hidden amongst the inherent complexity of the data. There has been a continued rise in the development and adoption of machine learning (ML) and deep learning architectures for solving research challenges of this sort. Indeed, the interface between molecular microbial ecology and artificial intelligence (AI) appears to show considerable potential for significantly advancing environmental monitoring and management practices through their application. Here, we provide a primer for ML, highlight the notion of retaining biological sample information for supervised ML, discuss workflow considerations, and review the state of the art of the exciting, yet nascent, interdisciplinary field of ML-driven microbial ecology. Current limitations in this sphere of research are also addressed to frame a forward-looking perspective toward the realization of what we anticipate will become a pivotal toolkit for addressing environmental monitoring and management challenges in the years ahead.
Affiliation(s)
- James M. W. R. McElhinney
- Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University, Abu Dhabi, United Arab Emirates
| | | | - Aurelie Mawart
- Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University, Abu Dhabi, United Arab Emirates
| | - Ayesha Hasan
- Applied Genomics Laboratory, Center for Membranes and Advanced Water Technology, Khalifa University, Abu Dhabi, United Arab Emirates
- Department of Biomedical Engineering, Khalifa University, Abu Dhabi, United Arab Emirates
| | - Jorge Dias
- EECS, Center for Autonomous Robotic Systems, Khalifa University, Abu Dhabi, United Arab Emirates
|
14
|
Peng Z, Maciel-Guerra A, Baker M, Zhang X, Hu Y, Wang W, Rong J, Zhang J, Xue N, Barrow P, Renney D, Stekel D, Williams P, Liu L, Chen J, Li F, Dottorini T. Whole-genome sequencing and gene sharing network analysis powered by machine learning identifies antibiotic resistance sharing between animals, humans and environment in livestock farming. PLoS Comput Biol 2022; 18:e1010018. [PMID: 35333870 PMCID: PMC8986120 DOI: 10.1371/journal.pcbi.1010018] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 11/19/2021] [Revised: 04/06/2022] [Accepted: 03/14/2022] [Indexed: 01/26/2023]
Abstract
Anthropogenic environments, such as those created by intensive farming of livestock, have been proposed to provide ideal selection pressure for the emergence of antimicrobial-resistant Escherichia coli bacteria and antimicrobial resistance genes (ARGs) and their spread to humans. Here, we performed a longitudinal study in a large-scale commercial poultry farm in China, collecting E. coli isolates from both farm and slaughterhouse, targeting animals, carcasses, workers and their households and environment. By using whole-genome phylogenetic analysis and network analysis based on single nucleotide polymorphisms (SNPs), we found highly interrelated non-pathogenic and pathogenic E. coli strains with phylogenetic intermixing, and a high prevalence of shared multidrug resistance profiles amongst livestock, human and environment. Through an original data processing pipeline which combines omics, machine learning, gene sharing network and mobile genetic elements analysis, we investigated the resistance to 26 different antimicrobials and identified 361 genes associated with antimicrobial resistance (AMR) phenotypes; 58 of these were known AMR-associated genes and 35 were associated with multidrug resistance. We uncovered an extensive network of genes, correlated with AMR phenotypes, shared among livestock, humans, farm and slaughterhouse environments. We also found several human, livestock and environmental isolates sharing closely related mobile genetic elements carrying ARGs across host species and environments. In a scenario where no consensus exists on how antibiotic use in livestock may affect antibiotic resistance in the human population, our findings provide novel insights into the broader epidemiology of antimicrobial resistance in livestock farming. Moreover, our original data analysis method has the potential to uncover AMR transmission pathways when applied to the study of other pathogens active in other anthropogenic environments characterised by complex interconnections between host species.
Livestock have been suggested as an important source of antimicrobial-resistant (AMR) Escherichia coli, capable of infecting humans and carrying resistance to drugs used in human medicine. China has a large intensive livestock farming industry, poultry being the second most important source of meat in the country, and is the largest user of antibiotics for food production in the world. Here we studied antimicrobial resistance gene overlap between E. coli isolates collected from humans, livestock and their shared environments in a large-scale Chinese poultry farm and associated slaughterhouse. By using a computational approach that integrates machine learning, whole-genome sequencing, gene sharing network and mobile genetic elements analysis, we characterized the E. coli community structure, antimicrobial resistance phenotypes and the genetic relatedness of non-pathogenic and pathogenic E. coli strains. We uncovered the network of genes, associated with AMR, shared across host species (animals and workers) and environments (farm and slaughterhouse). Our approach opens up new avenues for the development of fast, affordable and effective computational solutions that provide novel insights into the broader epidemiology of antimicrobial resistance in livestock farming.
Affiliation(s)
- Zixin Peng
- NHC Key Laboratory of Food Safety Risk Assessment, Chinese Academy of Medical Science Research Unit (2019RU014), China National Center for Food Safety Risk Assessment, Beijing, People’s Republic of China
| | - Alexandre Maciel-Guerra
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, United Kingdom
| | - Michelle Baker
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, United Kingdom
| | - Xibin Zhang
- Qingdao Tian run Food Co., Ltd, New Hope, Beijing, People’s Republic of China
| | - Yue Hu
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, United Kingdom
| | - Wei Wang
- NHC Key Laboratory of Food Safety Risk Assessment, Chinese Academy of Medical Science Research Unit (2019RU014), China National Center for Food Safety Risk Assessment, Beijing, People’s Republic of China
| | - Jia Rong
- Qingdao Tian run Food Co., Ltd, New Hope, Beijing, People’s Republic of China
| | - Jing Zhang
- NHC Key Laboratory of Food Safety Risk Assessment, Chinese Academy of Medical Science Research Unit (2019RU014), China National Center for Food Safety Risk Assessment, Beijing, People’s Republic of China
| | - Ning Xue
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, United Kingdom
| | - Paul Barrow
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, United Kingdom
- School of Veterinary Medicine, University of Surrey, Guildford, Surrey, United Kingdom
| | - David Renney
- Nimrod Veterinary Products Limited, Moreton-in-Marsh, United Kingdom
| | - Dov Stekel
- School of Biosciences, University of Nottingham, Sutton Bonington, United Kingdom
| | - Paul Williams
- Biodiscovery Institute and School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
| | - Longhai Liu
- Qingdao Tian run Food Co., Ltd, New Hope, Beijing, People’s Republic of China
| | - Junshi Chen
- NHC Key Laboratory of Food Safety Risk Assessment, Chinese Academy of Medical Science Research Unit (2019RU014), China National Center for Food Safety Risk Assessment, Beijing, People’s Republic of China
| | - Fengqin Li
- NHC Key Laboratory of Food Safety Risk Assessment, Chinese Academy of Medical Science Research Unit (2019RU014), China National Center for Food Safety Risk Assessment, Beijing, People’s Republic of China
| | - Tania Dottorini
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, United Kingdom
|
15
|
Genome-Wide Mutation Scoring for Machine-Learning-Based Antimicrobial Resistance Prediction. Int J Mol Sci 2021; 22:ijms222313049. [PMID: 34884852 PMCID: PMC8657983 DOI: 10.3390/ijms222313049] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/15/2021] [Revised: 11/25/2021] [Accepted: 11/29/2021] [Indexed: 01/21/2023]
Abstract
The prediction of antimicrobial resistance (AMR) based on genomic information can improve patient outcomes. Genetic mechanisms have been shown to explain AMR with accuracies in line with standard microbiology laboratory testing. To translate genetic mechanisms into phenotypic AMR, machine learning has been successfully applied. AMR machine learning models typically use nucleotide k-mer counts to represent genomic sequences. While k-mer representation efficiently captures sequence variation, it also results in high-dimensional and sparse data. With limited training data available, achieving acceptable model performance or model interpretability is challenging. In this study, we explore the utility of feature engineering with several biologically relevant signals. We propose to predict the functional impact of observed mutations with PROVEAN to use the predicted impact as a new feature for each protein in an organism’s proteome. The addition of the new features was tested on a total of 19,521 isolates across nine clinically relevant pathogens and 30 different antibiotics. The new features significantly improved the predictive performance of trained AMR models for Pseudomonas aeruginosa, Citrobacter freundii, and Escherichia coli. The balanced accuracy of the respective models of those three pathogens improved by 6.0% on average.
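The k-mer count representation mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `kmer_counts` and `kmer_vector` and the toy sequence are hypothetical, and the alphabet is assumed to be A, C, G, T only. The sketch shows why the representation becomes high-dimensional and sparse: the feature vector has 4**k entries, most of which stay at zero for any single sequence.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_vector(sequence: str, k: int) -> list[int]:
    """Fixed-length feature vector over all 4**k possible k-mers (A, C, G, T)."""
    counts = kmer_counts(sequence, k)
    # Enumerate the vocabulary in lexicographic order so that every
    # sequence maps to the same feature positions.
    vocabulary = ("".join(p) for p in product("ACGT", repeat=k))
    return [counts[kmer] for kmer in vocabulary]


toy_genome = "ATGCGATGCA"  # hypothetical toy sequence, not data from the study
vector = kmer_vector(toy_genome, k=3)
nonzero = sum(1 for count in vector if count > 0)
print(f"{len(vector)} features, {nonzero} non-zero")
```

Even for k = 3 the toy sequence fills only a small fraction of the vector. With the larger k values typically used for genomes, the number of features grows as 4**k while the number of isolates stays fixed, which is the setting in which the abstract's feature engineering is motivated.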
|
16
|
Zhao Z, Woloszynek S, Agbavor F, Mell JC, Sokhansanj BA, Rosen GL. Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network. PLoS Comput Biol 2021; 17:e1009345. [PMID: 34550967 PMCID: PMC8496832 DOI: 10.1371/journal.pcbi.1009345] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/14/2020] [Revised: 10/07/2021] [Accepted: 08/12/2021] [Indexed: 01/04/2023]
Abstract
Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).
Affiliation(s)
- Zhengqiao Zhao
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Stephen Woloszynek
- Beth Israel Deaconess Medical Center, Boston, Massachusetts, United States of America
- Harvard Medical School, Boston, Massachusetts, United States of America
| | - Felix Agbavor
- School of Biomedical Engineering, Science and Health Systems, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Joshua Chang Mell
- College of Medicine, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Bahrad A. Sokhansanj
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
| | - Gail L. Rosen
- Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Department of Electrical and Computer Engineering, College of Engineering, Drexel University, Philadelphia, Pennsylvania, United States of America
|
17
|
Bellabarba A, Bacci G, Decorosi F, Aun E, Azzarello E, Remm M, Giovannetti L, Viti C, Mengoni A, Pini F. Competitiveness for Nodule Colonization in Sinorhizobium meliloti: Combined In Vitro-Tagged Strain Competition and Genome-Wide Association Analysis. mSystems 2021. [PMID: 34313466 DOI: 10.1101/2020.09.15.298034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 05/07/2023]
Abstract
Associations between leguminous plants and symbiotic nitrogen-fixing rhizobia are a classic example of mutualism between a eukaryotic host and a specific group of prokaryotic microbes. Although this symbiosis is in part species specific, different rhizobial strains may colonize the same nodule. Some rhizobial strains are commonly known as better competitors than others, but detailed analyses that aim to predict rhizobial competitive abilities based on genomes are still scarce. Here, we performed a bacterial genome-wide association (GWAS) analysis to define the genomic determinants related to the competitive capabilities in the model rhizobial species Sinorhizobium meliloti. For this, 13 tester strains were green fluorescent protein (GFP) tagged and assayed versus 3 red fluorescent protein (RFP)-tagged reference competitor strains (Rm1021, AK83, and BL225C) in a Medicago sativa nodule occupancy test. Competition data and strain genomic sequences were employed to build a model for GWAS based on k-mers. Among the k-mers with the highest scores, 51 k-mers mapped on the genomes of four strains showing the highest competition phenotypes (>60% single strain nodule occupancy; GR4, KH35c, KH46, and SM11) versus BL225C. These k-mers were mainly located on the symbiosis-related megaplasmid pSymA, specifically on genes coding for transporters, proteins involved in the biosynthesis of cofactors, and proteins related to metabolism (e.g., fatty acids). The same analysis was performed considering the sum of single and mixed nodules obtained in the competition assays versus BL225C, retrieving k-mers mapped on the genes previously found and on vir genes. Therefore, the competition abilities seem to be linked to multiple genetic determinants and comprise several cellular components. IMPORTANCE Decoding the competitive pattern that occurs in the rhizosphere is challenging in the study of bacterial social interaction strategies. 
To date, the single-gene approach has mainly been used to uncover the bases of nodulation, but there is still a knowledge gap regarding the main features that a priori characterize rhizobial strains able to outcompete indigenous rhizobia. Therefore, tracking down which traits make different rhizobial strains able to win the competition for plant infection over other indigenous rhizobia will improve the strain selection process and, consequently, plant yield in sustainable agricultural production systems. We proved that a k-mer-based GWAS approach can efficiently identify the competition determinants of a panel of strains previously analyzed for their plant tissue occupancy using double fluorescent labeling. The reported strategy will be useful for detailed studies on the genomic aspects of the evolution of bacterial symbiosis and for an extensive evaluation of rhizobial inoculants.
Affiliation(s)
- Agnese Bellabarba
- Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
- Genexpress Laboratory, Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
| | - Giovanni Bacci
- Department of Biology, University of Florence, Sesto Fiorentino, Italy
| | - Francesca Decorosi
- Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
- Genexpress Laboratory, Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
| | - Erki Aun
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Elisa Azzarello
- Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
| | - Maido Remm
- Department of Bioinformatics, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
| | - Luciana Giovannetti
- Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
- Genexpress Laboratory, Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
| | - Carlo Viti
- Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
- Genexpress Laboratory, Department of Agronomy, Food, Environmental and Forestry (DAGRI), University of Florence, Sesto Fiorentino, Italy
| | - Alessio Mengoni
- Department of Biology, University of Florence, Sesto Fiorentino, Italy
| | - Francesco Pini
- Department of Biology, University of Bari Aldo Moro, Bari, Italy
| |
|
18
|
Predictive Antibiotic Susceptibility Testing by Next-Generation Sequencing for Periprosthetic Joint Infections: Potential and Limitations. Biomedicines 2021; 9:biomedicines9080910. [PMID: 34440114 PMCID: PMC8389688 DOI: 10.3390/biomedicines9080910] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/09/2021] [Revised: 07/21/2021] [Accepted: 07/22/2021] [Indexed: 01/18/2023] Open
Abstract
Joint replacement surgeries are among the most frequent medical interventions globally. Infections of prosthetic joints are a major health challenge and typically require prolonged or even indefinite antibiotic treatment. As multidrug-resistant pathogens continue to rise globally, novel diagnostics are critical to ensure appropriate treatment and to support the management of prosthetic joint infections (PJIs). To this end, recent studies have shown the potential of molecular methods such as next-generation sequencing to complement established phenotypic, culture-based methods. Together with advanced bioinformatics approaches, next-generation sequencing can provide comprehensive information on pathogen identity as well as antimicrobial susceptibility, potentially enabling rapid diagnosis and targeted therapy of PJIs. In this review, we summarize current developments in next-generation sequencing-based predictive antibiotic susceptibility testing and discuss its potential and limitations for common PJI pathogens.
|
19
|
Competitiveness for Nodule Colonization in Sinorhizobium meliloti: Combined In Vitro-Tagged Strain Competition and Genome-Wide Association Analysis. mSystems 2021; 6:e0055021. [PMID: 34313466 PMCID: PMC8407117 DOI: 10.1128/msystems.00550-21] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/15/2023] Open
Abstract
Associations between leguminous plants and symbiotic nitrogen-fixing rhizobia are a classic example of mutualism between a eukaryotic host and a specific group of prokaryotic microbes. Although this symbiosis is in part species specific, different rhizobial strains may colonize the same nodule. Some rhizobial strains are commonly known as better competitors than others, but detailed analyses that aim to predict rhizobial competitive abilities based on genomes are still scarce. Here, we performed a bacterial genome-wide association (GWAS) analysis to define the genomic determinants related to the competitive capabilities in the model rhizobial species Sinorhizobium meliloti. For this, 13 tester strains were green fluorescent protein (GFP) tagged and assayed versus 3 red fluorescent protein (RFP)-tagged reference competitor strains (Rm1021, AK83, and BL225C) in a Medicago sativa nodule occupancy test. Competition data and strain genomic sequences were employed to build a model for GWAS based on k-mers. Among the k-mers with the highest scores, 51 k-mers mapped on the genomes of four strains showing the highest competition phenotypes (>60% single strain nodule occupancy; GR4, KH35c, KH46, and SM11) versus BL225C. These k-mers were mainly located on the symbiosis-related megaplasmid pSymA, specifically on genes coding for transporters, proteins involved in the biosynthesis of cofactors, and proteins related to metabolism (e.g., fatty acids). The same analysis was performed considering the sum of single and mixed nodules obtained in the competition assays versus BL225C, retrieving k-mers mapped on the genes previously found and on vir genes. Therefore, the competition abilities seem to be linked to multiple genetic determinants and comprise several cellular components. IMPORTANCE Decoding the competitive pattern that occurs in the rhizosphere is challenging in the study of bacterial social interaction strategies. 
To date, the single-gene approach has mainly been used to uncover the bases of nodulation, but there is still a knowledge gap regarding the main features that a priori characterize rhizobial strains able to outcompete indigenous rhizobia. Therefore, tracking down which traits make different rhizobial strains able to win the competition for plant infection over other indigenous rhizobia will improve the strain selection process and, consequently, plant yield in sustainable agricultural production systems. We proved that a k-mer-based GWAS approach can efficiently identify the competition determinants of a panel of strains previously analyzed for their plant tissue occupancy using double fluorescent labeling. The reported strategy will be useful for detailed studies on the genomic aspects of the evolution of bacterial symbiosis and for an extensive evaluation of rhizobial inoculants.
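The k-mer-based association idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the strain names, sequences, and phenotype values below are invented placeholders, and the scoring rule (mean phenotype of strains carrying a k-mer minus mean phenotype of strains lacking it) is an assumption chosen for brevity. Real bacterial GWAS tools use regression models and correct for population structure.

```python
from statistics import mean

def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def score_kmers(genomes, phenotype, k):
    """Score each k-mer by mean phenotype of carriers minus non-carriers.

    genomes:   dict strain -> sequence (hypothetical toy data)
    phenotype: dict strain -> numeric trait, e.g. a nodule occupancy fraction
    """
    presence = {s: kmers(seq, k) for s, seq in genomes.items()}
    all_kmers = set().union(*presence.values())
    scores = {}
    for km in all_kmers:
        carriers = [phenotype[s] for s in genomes if km in presence[s]]
        others = [phenotype[s] for s in genomes if km not in presence[s]]
        if carriers and others:  # k-mers shared by every strain carry no signal
            scores[km] = mean(carriers) - mean(others)
    return scores

# Invented toy data, for illustration only.
genomes = {
    "strainA": "ACGTACGGA",
    "strainB": "ACGTACGGT",
    "strainC": "ACGTTCGGT",
}
phenotype = {"strainA": 0.7, "strainB": 0.6, "strainC": 0.2}

scores = score_kmers(genomes, phenotype, k=4)
top = sorted(scores, key=scores.get, reverse=True)[:3]  # highest-scoring k-mers
```

In this toy setting, high-scoring k-mers are those found only in the strains with the larger phenotype values; mapping such k-mers back to genome coordinates is the step that links them to genes.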
|
20
|
Allen JP, Snitkin E, Pincus NB, Hauser AR. Forest and Trees: Exploring Bacterial Virulence with Genome-wide Association Studies and Machine Learning. Trends Microbiol 2021; 29:621-633. [PMID: 33455849 PMCID: PMC8187264 DOI: 10.1016/j.tim.2020.12.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Received: 08/28/2020] [Revised: 12/07/2020] [Accepted: 12/08/2020] [Indexed: 12/15/2022]
Abstract
The advent of inexpensive and rapid sequencing technologies has allowed bacterial whole-genome sequences to be generated at an unprecedented pace. This wealth of information has revealed an unanticipated degree of strain-to-strain genetic diversity within many bacterial species. Awareness of this genetic heterogeneity has corresponded with a greater appreciation of intraspecies variation in virulence. A number of comparative genomic strategies have been developed to link these genotypic and pathogenic differences with the aim of discovering novel virulence factors. Here, we review recent advances in comparative genomic approaches to identify bacterial virulence determinants, with a focus on genome-wide association studies and machine learning.
Affiliation(s)
- Jonathan P Allen
- Department of Microbiology and Immunology, Loyola University Chicago Stritch School of Medicine, Maywood, IL 60153, USA.
| | - Evan Snitkin
- Department of Microbiology and Immunology, Department of Internal Medicine/Division of Infectious Diseases, University of Michigan, Ann Arbor, MI 48109, USA
| | - Nathan B Pincus
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| | - Alan R Hauser
- Department of Microbiology-Immunology, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA; Department of Medicine/Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL 60611, USA
| |
|
21
|
Wang W, Baker M, Hu Y, Xu J, Yang D, Maciel-Guerra A, Xue N, Li H, Yan S, Li M, Bai Y, Dong Y, Peng Z, Ma J, Li F, Dottorini T. Whole-Genome Sequencing and Machine Learning Analysis of Staphylococcus aureus from Multiple Heterogeneous Sources in China Reveals Common Genetic Traits of Antimicrobial Resistance. mSystems 2021; 6:e0118520. [PMID: 34100643 PMCID: PMC8579812 DOI: 10.1128/msystems.01185-20] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 11/12/2020] [Accepted: 05/10/2021] [Indexed: 12/27/2022] Open
Abstract
Staphylococcus aureus is a worldwide leading cause of numerous diseases ranging from food-poisoning to lethal infections. Methicillin-resistant S. aureus (MRSA) has been found capable of acquiring resistance to most antimicrobials. MRSA is ubiquitous and diverse even in terms of antimicrobial resistance (AMR) profiles, posing a challenge for treatment. Here, we present a comprehensive study of S. aureus in China, addressing epidemiology, phylogenetic reconstruction, genomic characterization, and identification of AMR profiles. The study analyzes 673 S. aureus isolates from food as well as from hospitalized and healthy individuals. The isolates have been collected over a 9-year period, between 2010 and 2018, from 27 provinces across China. By whole-genome sequencing, Bayesian divergence analysis, and supervised machine learning, we reconstructed the phylogeny of the isolates and compared them to references from other countries. We identified 72 sequence types (STs), of which, 29 were novel. We found 81 MRSA lineages by multilocus sequence type (MLST), spa, staphylococcal cassette chromosome mec element (SCCmec), and Panton-Valentine leukocidin (PVL) typing. In addition, novel variants of SCCmec type IV hosting extra metal and antimicrobial resistance genes, as well as a new SCCmec type, were found. New Bayesian dating of the split times of major clades showed that ST9, ST59, and ST239 in China and European countries fell in different branches, whereas this pattern was not observed for the ST398 clone. On the contrary, the clonal transmission of ST398 was more intermixed in regard to geographic origin. Finally, we identified genetic determinants of resistance to 10 antimicrobials, discriminating drug-resistant bacteria from susceptible strains in the cohort. 
Our results reveal the emergence of Chinese MRSA lineages enriched in AMR determinants that share similar genetic traits of antimicrobial resistance across human and food isolates, hinting at a complex scenario of evolving transmission routes. IMPORTANCE Little information is available on the epidemiology and characterization of Staphylococcus aureus in China. The role of food is a cause of major concern: staphylococcal foodborne diseases affect thousands every year, and the presence of resistant Staphylococcus strains on raw retail meat products is well documented. We studied a large heterogeneous data set of S. aureus isolates from many provinces of China, isolated from food as well as from individuals. Our large whole-genome collection represents a unique catalogue that can be easily meta-analyzed and integrated with further studies and adds to the library of S. aureus sequences in the public domain in a currently underrepresented geographical region. The new Bayesian dating of the split times of major drug-resistant enriched clones is relevant in showing that Chinese and European methicillin-resistant S. aureus (MRSA) have evolved differently. Our machine learning approach, across a large number of antibiotics, shows novel determinants underlying resistance and reveals frequent resistant traits in specific clonal complexes, highlighting the importance of particular clonal complexes in China. Our findings substantially expand what is known of the evolution and genetic determinants of resistance in food-associated S. aureus in China and add crucial information for whole-genome sequencing (WGS)-based surveillance of S. aureus.
Affiliation(s)
- Wei Wang
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Michelle Baker
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, Leicestershire, United Kingdom
| | - Yue Hu
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, Leicestershire, United Kingdom
| | - Jin Xu
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Dajin Yang
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | | | - Ning Xue
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, Leicestershire, United Kingdom
| | - Hui Li
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Shaofei Yan
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Menghan Li
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Yao Bai
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Yinping Dong
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Zixin Peng
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Jinjing Ma
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
- School of Chemistry and Chemical Engineering, Anqing Normal University, Anqing, Anhui, China
| | - Fengqin Li
- NHC Key Laboratory of Food Safety Risk Assessment, China National Center for Food Safety Risk Assessment, Beijing, China
| | - Tania Dottorini
- School of Veterinary Medicine and Science, University of Nottingham, Sutton Bonington, Leicestershire, United Kingdom
| |
|
22
|
Applications of Machine Learning to the Problem of Antimicrobial Resistance: an Emerging Model for Translational Research. J Clin Microbiol 2021; 59:e0126020. [PMID: 33536291 DOI: 10.1128/jcm.01260-20] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Indexed: 01/05/2023] Open
Abstract
Antimicrobial resistance (AMR) remains one of the most challenging phenomena of modern medicine. Machine learning (ML) is a subfield of artificial intelligence that focuses on the development of algorithms that learn how to accurately predict outcome variables using large sets of predictor variables that are typically not hand selected and are minimally curated. Models are parameterized using a training data set and then applied to a test data set on which predictive performance is evaluated. The application of ML algorithms to the problem of AMR has garnered increasing interest in the past 5 years due to the exponential growth of experimental and clinical data, heavy investment in computational capacity, improvements in algorithm performance, and increasing urgency for innovative approaches to reducing the burden of disease. Here, we review the current state of research at the intersection of ML and AMR with an emphasis on three domains of work. The first is the prediction of AMR using genomic data. The second is the use of ML to gain insight into the cellular functions disrupted by antibiotics, which forms the basis for understanding mechanisms of action and developing novel anti-infectives. The third focuses on the application of ML for antimicrobial stewardship using data extracted from the electronic health record. Although the use of ML for understanding, diagnosing, treating, and preventing AMR is still in its infancy, the continued growth of data and interest ensures it will become an important tool for future translational research programs.
|
23
|
Alfalfa for a Sustainable Ovine Farming System: Proposed Research for a New Feeding Strategy Based on Alfalfa and Ecological Leftovers in Drought Conditions. Sustainability 2021. [DOI: 10.3390/su13073880] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 01/04/2023]
Abstract
In the past 10 years, the average demand for meat and milk across the world has significantly increased, especially in developing countries. Therefore, to support the production of animal-derived food products, a huge quantity of feed resources is needed. This paper does not present original research, but rather provides a conceptual strategy to improve primary production in a sustainable way, in relation to forthcoming issues linked to climate change. Increases in meat and milk production could be achieved by formulating balanced diets for ovines based on alfalfa integrated with local agricultural by-products. As the central component of the diet is alfalfa, one goal of the project is increasing the yield of alfalfa in a sustainable way via inoculating seeds with symbiotic rhizobia (i.e., Sinorhizobium meliloti). Seed inoculants are already present on the market but have not been optimized for arid soils. Furthermore, a part of the project is focused on the selection of elite symbiotic strains that show increased resistance to salt stress and competitiveness. The second component of the experimental diets is bio-waste, especially that obtained from olive oil manufacturing (i.e., pomace). The addition of agro-by-products allows us to use such waste as a resource for animal feeding, and possibly, to modulate rumen metabolism, thereby increasing the nutritional quality of milk and meat.
|
24
|
Karlsen ST, Vesth TC, Oregaard G, Poulsen VK, Lund O, Henderson G, Bælum J. Machine learning predicts and provides insights into milk acidification rates of Lactococcus lactis. PLoS One 2021; 16:e0246287. [PMID: 33720959 PMCID: PMC7959382 DOI: 10.1371/journal.pone.0246287] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/21/2020] [Accepted: 01/17/2021] [Indexed: 11/18/2022] Open
Abstract
Lactococcus lactis strains are important components in industrial starter cultures for cheese manufacturing. They have many strain-dependent properties, which affect the final product. Here, we explored the use of machine learning to create systematic, high-throughput screening methods for these properties. Fast acidification of milk is such a strain-dependent property. To predict the maximum hourly acidification rate (Vmax), we trained Random Forest (RF) models on four different genomic representations: Presence/absence of gene families, counts of Pfam domains, the 8 nucleotide long subsequences of their DNA (8-mers), and the 9 nucleotide long subsequences of their DNA (9-mers). Vmax was measured at different temperatures, volumes, and in the presence or absence of yeast extract. These conditions were added as features in each RF model. The four models were trained on 257 strains, and the correlation between the measured Vmax and the predicted Vmax was evaluated with Pearson Correlation Coefficients (PC) on a separate dataset of 85 strains. The models all had high PC scores: 0.83 (gene presence/absence model), 0.84 (Pfam domain model), 0.76 (8-mer model), and 0.85 (9-mer model). The models all based their predictions on relevant genetic features and showed consensus on systems for lactose metabolism, degradation of casein, and pH stress response. Each model also predicted a set of features not found by the other models.
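Two ingredients named in this abstract, a k-mer count representation of a genome and the Pearson correlation used for evaluation, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the example sequence is invented, and the Random Forest model itself is not shown. For 8-mers the vector below would have 4**8 = 65,536 entries per genome.

```python
from itertools import product
from math import sqrt

def kmer_count_vector(seq, k):
    """Count every possible DNA k-mer in seq; returns a list of 4**k counts."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # windows containing non-ACGT symbols are skipped
        if j is not None:
            counts[j] += 1
    return counts

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented sequence; k=2 keeps the vector small enough to inspect by eye.
features = kmer_count_vector("ACGTACGTNACG", k=2)
```

Each genome's count vector would form one row of the feature matrix, and `pearson(measured, predicted)` would then compare measured and predicted values on held-out strains.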
Affiliation(s)
- Signe Tang Karlsen
- Chr. Hansen A/S, Hoersholm, Denmark
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
| | | | | | | | - Ole Lund
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
| | | | | |
|
25
|
Lüftinger L, Májek P, Beisken S, Rattei T, Posch AE. Learning From Limited Data: Towards Best Practice Techniques for Antimicrobial Resistance Prediction From Whole Genome Sequencing Data. Front Cell Infect Microbiol 2021; 11:610348. [PMID: 33659219 PMCID: PMC7917081 DOI: 10.3389/fcimb.2021.610348] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 09/25/2020] [Accepted: 01/11/2021] [Indexed: 01/20/2023] Open
Abstract
Antimicrobial resistance prediction from whole genome sequencing (WGS) data is an emerging application of machine learning, promising to improve antimicrobial resistance surveillance and outbreak monitoring. Despite significant reductions in sequencing cost, the availability and sampling diversity of WGS data with matched antimicrobial susceptibility testing (AST) profiles required for training of WGS-AST prediction models remain limited. Best practice machine learning techniques are required to ensure trained models generalize to independent data for optimal predictive performance. Limited data restrict the choice of machine learning training and evaluation methods and can result in overestimation of model performance. We demonstrate that the widely used random k-fold cross-validation method is ill-suited for application to small bacterial genomics datasets and offer an alternative cross-validation method based on genomic distance. We benchmarked three machine learning architectures previously applied to the WGS-AST problem on a set of 8,704 genome assemblies from five clinically relevant pathogens across 77 species-compound combinations collated from public databases. We show that individual models can be effectively ensembled to improve model performance. By combining models via stacked generalization with cross-validation, a model ensembling technique suitable for small datasets, we improved average sensitivity and specificity of individual models by 1.77% and 3.20%, respectively. Furthermore, stacked models exhibited improved robustness and were thus less prone to outlier performance drops than individual component models. In this study, we highlight best practice techniques for antimicrobial resistance prediction from WGS data and introduce the combination of genome distance aware cross-validation and stacked generalization for robust and accurate WGS-AST.
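The contrast between random k-fold splitting and distance-aware splitting can be illustrated with a small Python sketch. The abstract does not specify the distance measure or the clustering procedure, so everything below is an assumption made for illustration: genomes are compared by Jaccard distance on k-mer sets, near-identical genomes are linked by single linkage under a hypothetical `threshold`, and whole clusters are placed into folds so that close relatives never straddle a train/test boundary.

```python
def kmer_set(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two k-mer sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def distance_aware_folds(genomes, k, threshold, n_folds):
    """Assign genomes to folds so that near-identical genomes share a fold.

    genomes: dict name -> sequence (toy data). Genomes closer than
    `threshold` are merged into one cluster (single linkage, union-find),
    and each cluster goes as a block into the currently smallest fold.
    """
    names = list(genomes)
    sets = {n: kmer_set(genomes[n], k) for n in names}
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if jaccard_distance(sets[a], sets[b]) < threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)

    folds = [[] for _ in range(n_folds)]
    for members in sorted(clusters.values(), key=len, reverse=True):
        min(folds, key=len).extend(members)
    return folds

# Invented toy genomes: g1/g2 and g3/g4 are near-duplicates of each other.
toy = {"g1": "ACGTACGTAC", "g2": "ACGTACGTAA",
       "g3": "TTTTGGGGCC", "g4": "TTTTGGGGCA"}
folds = distance_aware_folds(toy, k=3, threshold=0.5, n_folds=2)
```

With a random split, g1 could land in training and g2 in testing, which would make the test score look better than performance on genuinely unrelated genomes; grouping by distance avoids that leak.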
Affiliation(s)
- Lukas Lüftinger
- Ares Genetics GmbH, Vienna, Austria
- Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | | | | | - Thomas Rattei
- Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | | |
|
26
|
Lepuschitz S, Weinmaier T, Mrazek K, Beisken S, Weinberger J, Posch AE. Analytical Performance Validation of Next-Generation Sequencing Based Clinical Microbiology Assays Using a K-mer Analysis Workflow. Front Microbiol 2020; 11:1883. [PMID: 32849463 PMCID: PMC7422695 DOI: 10.3389/fmicb.2020.01883] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 05/12/2020] [Accepted: 07/17/2020] [Indexed: 12/13/2022] Open
Abstract
Next-generation sequencing (NGS) enables clinical microbiology assays such as molecular typing of bacterial isolates, which is now routinely applied for infection control and epidemiology. Additionally, the feasibility of NGS-based identification of antimicrobial resistance (AMR) markers as well as genetic prediction of antibiotic susceptibility testing results has been demonstrated. Various bioinformatics approaches enabling NGS-based clinical microbiology assays exist, but standardized, computationally efficient and scalable sample-to-results workflows including validated quality control parameters are still lacking. Bioinformatics analysis workflows based on k-mers have been shown to allow for fast and efficient analysis of large genomics data sets as obtained from microbial sequencing applications. Here, we demonstrate the applicability of k-mer based clinical microbiology assays for whole-genome sequencing (WGS) including variant calling, taxonomic identification, bacterial typing as well as AMR marker detection. The wet-lab and dry-lab workflows were developed and validated in line with Clinical Laboratory Improvement Amendments (CLIA) guidelines for laboratory-developed tests (LDTs) on multi-drug resistant ESKAPE pathogens. The developed k-mer based workflow demonstrated ≥99.39% repeatability, ≥99.09% reproducibility and ≥99.76% accuracy for variant calling and applied assays as determined by intra-day and inter-day triplicate measurements. The limit of detection (LOD) across assays was found to be at 20× sequencing depth and 15× for AMR marker detection. Thorough benchmarking of the k-mer based workflow revealed that analytical performance is comparable to state-of-the-art alignment based workflows across clinical microbiology assays. Diagnostic sensitivity and specificity for multilocus sequence typing (MLST) and phylogenetic analysis were 100% for both approaches. 
For AMR marker detection, sensitivity and specificity were 95.29% and 99.78% for the k-mer based workflow as compared to 95.17% and 99.77% for the alignment-based approach. In summary, the results illustrate that k-mer based analysis workflows enable a broad range of clinical microbiology assays, potentially not only for WGS-based typing and AMR gene detection but also genetic prediction of antibiotic susceptibility testing results.
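For readers unfamiliar with how sensitivity and specificity figures like those above are derived, the following Python sketch computes both from a set of called markers and a reference set. The marker calls and the candidate panel are invented for illustration and are not data from the study; the helper names are hypothetical.

```python
def confusion_counts(called, truth, universe):
    """Return (tp, fp, tn, fn) for marker calls against a reference.

    called:   markers reported by the workflow
    truth:    markers actually present according to the reference method
    universe: every marker the assay is able to report
    """
    tp = len(called & truth)             # present and reported
    fp = len(called - truth)             # reported but not present
    fn = len(truth - called)             # present but missed
    tn = len(universe - called - truth)  # absent and not reported
    return tp, fp, tn, fn

def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Invented calls on a five-marker panel, for illustration only.
universe = {"blaTEM", "mecA", "vanA", "tetM", "ermB"}
called = {"blaTEM", "mecA", "vanA"}
truth = {"blaTEM", "mecA", "tetM"}

counts = confusion_counts(called, truth, universe)
sens, spec = sensitivity_specificity(*counts)
```

Because true negatives usually dominate when a large marker panel is screened, specificity tends to sit close to 100% while sensitivity is the more discriminating figure, which matches the pattern of the values reported in the abstract.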
|
27
|
Santos-Cortez RLP, Bhutta MF, Earl JP, Hafrén L, Jennings M, Mell JC, Pichichero ME, Ryan AF, Tateossian H, Ehrlich GD. Panel 3: Genomics, precision medicine and targeted therapies. Int J Pediatr Otorhinolaryngol 2020; 130 Suppl 1:109835. [PMID: 32007292 PMCID: PMC7155947 DOI: 10.1016/j.ijporl.2019.109835] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
OBJECTIVE To review the most recent advances in human and bacterial genomics as applied to pathogenesis and clinical management of otitis media. DATA SOURCES PubMed articles published since the last meeting in June 2015 up to June 2019. REVIEW METHODS A panel of experts in human and bacterial genomics of otitis media was formed. Each panel member reviewed the literature in their respective fields and wrote draft reviews. The reviews were shared with all panel members, and a merged draft was created. The panel met at the 20th International Symposium on Recent Advances in Otitis Media in June 2019, discussed the review and refined the content. A final draft was made, circulated, and approved by the panel members. CONCLUSION Trans-disciplinary approaches applying pan-omic technologies to identify human susceptibility to otitis media and to understand microbial population dynamics, patho-adaptation and virulence mechanisms are crucial to the development of novel, personalized therapeutics and prevention strategies for otitis media. IMPLICATIONS FOR PRACTICE In the future otitis media prevention strategies may be augmented by mucosal immunization, combination vaccines targeting multiple pathogens, and modulation of the middle ear microbiome. Both treatment and vaccination may be tailored to an individual's otitis media phenotype as defined by molecular profiles obtained by using rapidly developing techniques in microbial and host genomics.
Affiliation(s)
- Regie Lyn P. Santos-Cortez
- Department of Otolaryngology, School of Medicine, University of Colorado Anschutz Medical Campus, 12700 E. 19 Ave., Aurora, CO 80045, USA
| | - Mahmood F. Bhutta
- Department of ENT, Royal Sussex County Hospital, Eastern Road, Brighton BN2 5BE, UK
| | - Joshua P. Earl
- Center for Genomic Sciences, Institute for Molecular Medicine and Infectious Disease; Department of Microbiology and Immunology; Drexel University College of Medicine, 245 N. 15 St., Philadelphia, PA 19102, USA
| | - Lena Hafrén
- Department of Otorhinolaryngology, Head & Neck Surgery, University of Helsinki and Helsinki University Hospital, Helsinki, Tukholmankatu 8A, 00290 Helsinki, Finland
| | - Michael Jennings
- Institute for Glycomics, Gold Coast campus, Griffith University, QLD 4222, Australia
| | - Joshua C. Mell
- Center for Genomic Sciences, Institute for Molecular Medicine and Infectious Disease; Department of Microbiology and Immunology; Drexel University College of Medicine, 245 N. 15 St., Philadelphia, PA 19102, USA
| | - Michael E. Pichichero
- Center for Infectious Diseases and Immunology, Rochester General Hospital Research Institute, 1425 Portland Ave., Rochester, NY 14621, USA
| | - Allen F. Ryan
- Department of Surgery/Otolaryngology, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, USA
| | - Hilda Tateossian
- Mammalian Genetics Unit, MRC Harwell Institute, Harwell, Oxford, Didcot OX11 0RD, UK
| | - Garth D. Ehrlich
- Center for Genomic Sciences, Institute for Molecular Medicine and Infectious Disease; Department of Microbiology and Immunology; Drexel University College of Medicine, 245 N. 15 St., Philadelphia, PA 19102, USA
| |
|
28
|
Panyukov VV, Kiselev SS, Ozoline ON. Unique k-mers as Strain-Specific Barcodes for Phylogenetic Analysis and Natural Microbiome Profiling. Int J Mol Sci 2020; 21:ijms21030944. [PMID: 32023871 PMCID: PMC7037511 DOI: 10.3390/ijms21030944] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/29/2019] [Revised: 01/21/2020] [Accepted: 01/28/2020] [Indexed: 02/07/2023] Open
Abstract
The need for a comparative analysis of natural metagenomes stimulated the development of new methods for their taxonomic profiling. Alignment-free approaches based on the search for marker k-mers turned out to be capable of identifying not only species, but also strains of microorganisms with known genomes. Here, we evaluated the ability of genus-specific k-mers to distinguish eight phylogroups of Escherichia coli (A, B1, C, E, D, F, G, B2) and assessed the presence of their unique 22-mers in clinical samples from microbiomes of four healthy people and four patients with Crohn's disease. We found that a phylogenetic tree inferred from the pairwise distance matrix for unique 18-mers and 22-mers of 124 genomes was fully consistent with the topology of the tree, obtained with concatenated aligned sequences of orthologous genes. Therefore, we propose strain-specific "barcodes" for rapid phylotyping. Using unique 22-mers for taxonomic analysis, we detected microbes of all groups in human microbiomes; however, their presence in the five samples was significantly different. Pointing to the intraspecies heterogeneity of E. coli in the natural microflora, this also indicates the feasibility of further studies of the role of this heterogeneity in maintaining population homeostasis.
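The notion of unique k-mers acting as "barcodes" can be sketched in a few lines of Python. This is a toy illustration rather than the authors' method: the genome names, sequences, and reads are invented, k is kept tiny for readability (the abstract uses 18-mers and 22-mers), and reverse complements are ignored, which a real implementation would have to handle.

```python
from collections import Counter

def kmer_set(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def unique_kmers(genomes, k):
    """Map each genome name to the k-mers found in that genome and no other."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    owners = Counter(km for s in sets.values() for km in s)  # genomes per k-mer
    return {name: {km for km in s if owners[km] == 1} for name, s in sets.items()}

def barcode_hits(sample_reads, barcodes, k):
    """Count, per genome, how many of its unique k-mers occur in the reads."""
    seen = set()
    for read in sample_reads:
        seen |= kmer_set(read, k)
    return {name: len(bc & seen) for name, bc in barcodes.items()}

# Invented toy genomes and reads, for illustration only.
genomes = {"g1": "AAACCC", "g2": "AAAGGG"}
barcodes = unique_kmers(genomes, k=3)
hits = barcode_hits(["TACCCT"], barcodes, k=3)
```

Here the shared k-mer `AAA` is excluded from both barcodes, and the sample read matches only k-mers unique to `g1`, which is the signal a profiling step would use to report that genome as present.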
Affiliation(s)
- Valery V. Panyukov
- Institute of Mathematical Problems of Biology RAS—the Branch of Keldysh Institute of Applied Mathematics of Russian Academy of Sciences, 142290 Pushchino, Russia
- Structural and Functional Genomics Group, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia
| | - Sergey S. Kiselev
- Structural and Functional Genomics Group, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia
- Institute of Cell Biophysics of the Russian Academy of Sciences, 142290 Pushchino, Russia
| | - Olga N. Ozoline
- Structural and Functional Genomics Group, Federal Research Center “Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences”, 142290 Pushchino, Russia
- Institute of Cell Biophysics of the Russian Academy of Sciences, 142290 Pushchino, Russia
- Correspondence:
| |
Collapse
29
San JE, Baichoo S, Kanzi A, Moosa Y, Lessells R, Fonseca V, Mogaka J, Power R, de Oliveira T. Current Affairs of Microbial Genome-Wide Association Studies: Approaches, Bottlenecks and Analytical Pitfalls. Front Microbiol 2020; 10:3119. [PMID: 32082269 PMCID: PMC7002396 DOI: 10.3389/fmicb.2019.03119] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/03/2019] [Accepted: 12/24/2019] [Indexed: 12/12/2022]
Abstract
Microbial genome-wide association studies (mGWAS) are a new and exciting research field that is adapting human GWAS methods to understand how variations in microbial genomes affect host or pathogen phenotypes, such as drug resistance, virulence, host specificity and prognosis. Several computational tools and methods have been developed or adapted from human GWAS to facilitate the discovery of novel mutations and structural variations that are associated with the phenotypes of interest. However, no comprehensive, end-to-end, user-friendly tool is currently available. The development of a broadly applicable pipeline presents a real opportunity among computational biologists. Here, (i) we review the prominent and promising tools, (ii) discuss analytical pitfalls and bottlenecks in mGWAS, (iii) provide insights into the selection of appropriate tools, (iv) highlight the gaps that still need to be filled and how users and developers can work together to overcome these bottlenecks. Use of mGWAS research can inform drug repositioning decisions as well as accelerate the discovery and development of more effective vaccines and antimicrobials for pressing infectious diseases of global health significance, such as HIV, TB, influenza, and malaria.
Affiliation(s)
- James Emmanuel San
  - Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Shakuntala Baichoo
  - Department of Digital Technologies, FoICDT, University of Mauritius, Réduit, Mauritius
- Aquillah Kanzi
  - Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Yumna Moosa
  - Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Richard Lessells
  - Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
- Vagner Fonseca
  - Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
  - Laboratório de Genética Celular e Molecular, ICB, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- John Mogaka
  - Discipline of Public Health, University of Kwazulu-Natal, Durban, South Africa
- Robert Power
  - St Edmund Hall, Oxford University, Oxford, United Kingdom
- Tulio de Oliveira
  - Kwazulu-Natal Research and Innovation Sequencing Platform (KRISP), College of Health Sciences, University of KwaZulu-Natal, Durban, South Africa
  - Department of Global Health, University of Washington, Seattle, WA, United States
30
Collineau L, Boerlin P, Carson CA, Chapman B, Fazil A, Hetman B, McEwen SA, Parmley EJ, Reid-Smith RJ, Taboada EN, Smith BA. Integrating Whole-Genome Sequencing Data Into Quantitative Risk Assessment of Foodborne Antimicrobial Resistance: A Review of Opportunities and Challenges. Front Microbiol 2019; 10:1107. [PMID: 31231317 PMCID: PMC6558386 DOI: 10.3389/fmicb.2019.01107] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 12/21/2018] [Accepted: 05/01/2019] [Indexed: 12/20/2022]
Abstract
Whole-genome sequencing (WGS) will soon replace traditional phenotypic methods for routine testing of foodborne antimicrobial resistance (AMR). WGS is expected to improve AMR surveillance by providing a greater understanding of the transmission of resistant bacteria and AMR genes throughout the food chain, and therefore support risk assessment activities. At this stage, it is unclear how WGS data can be integrated into quantitative microbial risk assessment (QMRA) models and whether their integration will impact final risk estimates or the assessment of risk mitigation measures. This review explores opportunities and challenges of integrating WGS data into QMRA models that follow the Codex Alimentarius Guidelines for Risk Analysis of Foodborne AMR. We describe how WGS offers an opportunity to enhance the next-generation of foodborne AMR QMRA modeling. Instead of considering all hazard strains as equally likely to cause disease, WGS data can improve hazard identification by focusing on those strains of highest public health relevance. WGS results can be used to stratify hazards into strains with similar genetic profiles that are expected to behave similarly, e.g., in terms of growth, survival, virulence or response to antimicrobial treatment. The QMRA input distributions can be tailored to each strain accordingly, making it possible to capture the variability in the strains of interest while decreasing the uncertainty in the model. WGS also allows for a more meaningful approach to explore genetic similarity among bacterial populations found at successive stages of the food chain, improving the estimation of the probability and magnitude of exposure to AMR hazards at point of consumption. WGS therefore has the potential to substantially improve the utility of foodborne AMR QMRA models. 
However, some degree of uncertainty remains in relation to the thresholds of genetic similarity to be used, as well as the degree of correlation between genotypic and phenotypic profiles. The latter could be improved using a functional approach based on prediction of microbial behavior from a combination of 'omics' techniques (e.g., transcriptomics, proteomics and metabolomics). We strongly recommend that methodologies to incorporate WGS data in risk assessment be included in any future revision of the Codex Alimentarius Guidelines for Risk Analysis of Foodborne AMR.
Affiliation(s)
- Lucie Collineau
  - Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada
- Patrick Boerlin
  - Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Carolee A. Carson
  - Centre for Foodborne, Environmental and Zoonotic Infectious Diseases, Public Health Agency of Canada, Guelph, ON, Canada
- Brennan Chapman
  - Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Aamir Fazil
  - Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada
- Benjamin Hetman
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Scott A. McEwen
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- E. Jane Parmley
  - Centre for Foodborne, Environmental and Zoonotic Infectious Diseases, Public Health Agency of Canada, Guelph, ON, Canada
- Richard J. Reid-Smith
  - Centre for Foodborne, Environmental and Zoonotic Infectious Diseases, Public Health Agency of Canada, Guelph, ON, Canada
  - Department of Population Medicine, Ontario Veterinary College, University of Guelph, Guelph, ON, Canada
- Eduardo N. Taboada
  - National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, MB, Canada
- Ben A. Smith
  - Public Health Risk Sciences Division, National Microbiology Laboratory, Public Health Agency of Canada, Guelph, ON, Canada
31
Moradigaravand D, Palm M, Farewell A, Mustonen V, Warringer J, Parts L. Prediction of antibiotic resistance in Escherichia coli from large-scale pan-genome data. PLoS Comput Biol 2018; 14:e1006258. [PMID: 30550564 PMCID: PMC6310291 DOI: 10.1371/journal.pcbi.1006258] [Citation(s) in RCA: 85] [Impact Index Per Article: 14.2] [Received: 05/31/2018] [Revised: 12/28/2018] [Accepted: 11/18/2018] [Indexed: 12/17/2022]
Abstract
The emergence of microbial antibiotic resistance is a global health threat. In clinical settings, the key to controlling spread of resistant strains is accurate and rapid detection. As traditional culture-based methods are time consuming, genetic approaches have recently been developed for this task. The detection of antibiotic resistance is typically made by measuring a few known determinants previously identified from genome sequencing, and thus requires the prior knowledge of its biological mechanisms. To overcome this limitation, we employed machine learning models to predict resistance to 11 compounds across four classes of antibiotics from existing and novel whole genome sequences of 1936 E. coli strains. We considered a range of methods, and examined population structure, isolation year, gene content, and polymorphism information as predictors. Gradient boosted decision trees consistently outperformed alternative models with an average accuracy of 0.91 on held-out data (range 0.81-0.97). While the best models most frequently employed gene content, an average accuracy score of 0.79 could be obtained using population structure information alone. Single nucleotide variation data were less useful, and significantly improved prediction only for two antibiotics, including ciprofloxacin. These results demonstrate that antibiotic resistance in E. coli can be accurately predicted from whole genome sequences without a priori knowledge of mechanisms, and that both genomic and epidemiological data can be informative. This paves way to integrating machine learning approaches into diagnostic tools in the clinic.
Affiliation(s)
- Danesh Moradigaravand
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Center for Computational Biology, Institute of Cancer and Genomic Sciences, University of Birmingham, Birmingham, United Kingdom
- Martin Palm
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Anne Farewell
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Ville Mustonen
  - Organismal and Evolutionary Biology Research Programme, Department of Computer Science, Institute of Biotechnology, University of Helsinki, Helsinki, Finland
  - Helsinki Institute for Information Technology HIIT, Helsinki, Finland
- Jonas Warringer
  - Department for Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden
  - Centre for Antibiotic Resistance Research at the University of Gothenburg, Gothenburg, Sweden
- Leopold Parts
  - Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - Department of Computer Science, University of Tartu, Tartu, Estonia