1
Bergin S, Doorley LA, Rybak JM, Wolfe KH, Butler G, Cuomo CA, Rogers PD. Analysis of clinical Candida parapsilosis isolates reveals copy number variation in key fluconazole resistance genes. Antimicrob Agents Chemother 2024:e0161923. [PMID: 38712935 DOI: 10.1128/aac.01619-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2023] [Accepted: 04/08/2024] [Indexed: 05/08/2024] Open
Abstract
We used whole-genome sequencing, together with coverage analysis and GWAS techniques, to analyze a collection of 35 fluconazole-resistant and 7 susceptible Candida parapsilosis isolates and identify new mechanisms of fluconazole resistance. Phylogenetic analysis shows that although the collection is diverse, it contains two persistent clinical lineages. We identified copy number variation (CNV) of two genes, ERG11 and CDR1B, in resistant isolates. Two strains have a CNV at the ERG11 locus; the entire ORF is amplified in one, and only the promoter region is amplified in the other. We show that the annotated telomeric gene CDR1B is actually an artifactual in silico fusion of two highly similar neighboring CDR genes due to an assembly error in the C. parapsilosis CDC317 reference genome. We report highly variable copy numbers of the CDR1B region across the collection. In several strains, the two genes have expanded into a tandem array of new chimeric genes. Other strains have experienced a deletion between the two genes, creating a single gene with a reciprocal chimerism. We find translocations, duplications, and gene conversion across the CDR gene family in the C. parapsilosis species complex, showing that it is a highly dynamic family.
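The abstract does not say how coverage was turned into copy number, so the following is only a minimal sketch of one common approach: compare the median read depth over a region with the genome-wide median depth. The function name `estimate_copy_number`, the `ploidy` parameter (defaulting to 2 on the assumption of a diploid genome), and the toy depth values are all hypothetical and not taken from the paper.

```python
from statistics import median


def estimate_copy_number(region_depths, genome_depths, ploidy=2):
    """Estimate the copy number of a region from per-base read depths.

    Assumption (not from the paper): copy number scales linearly with the
    ratio of the region's median depth to the genome-wide median depth.
    """
    baseline = median(genome_depths)
    if baseline == 0:
        raise ValueError("genome-wide median depth is zero")
    ratio = median(region_depths) / baseline
    return ratio * ploidy


# Toy depths: a genome-wide baseline near 50x and a candidate region near 150x.
genome = [48, 50, 52, 49, 51, 50, 50]
region = [148, 150, 152, 151, 149]
print(estimate_copy_number(region, genome))
```

Medians are used rather than means so that a few extreme positions (for example, repeats that attract mis-mapped reads) do not dominate the estimate. A real analysis would also need to correct for GC content and mappability, which this sketch ignores.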
Affiliation(s)
- Sean Bergin
  - School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Laura A Doorley
  - Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Jeffrey M Rybak
  - Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
- Kenneth H Wolfe
  - School of Medicine, Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Geraldine Butler
  - School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Christina A Cuomo
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - Molecular Microbiology and Immunology Department, Brown University, Providence, Rhode Island, USA
- P David Rogers
  - Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee, USA
2
Batisti Biffignandi G, Chindelevitch L, Corbella M, Feil EJ, Sassera D, Lees JA. Optimising machine learning prediction of minimum inhibitory concentrations in Klebsiella pneumoniae. Microb Genom 2024; 10:001222. [PMID: 38529944 PMCID: PMC10995625 DOI: 10.1099/mgen.0.001222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Accepted: 03/07/2024] [Indexed: 03/27/2024] Open
Abstract
Minimum Inhibitory Concentrations (MICs) are the gold standard for quantitatively measuring antibiotic resistance. However, lab-based MIC determination can be time-consuming and suffers from low reproducibility, and interpretation as sensitive or resistant relies on guidelines which change over time. Genome sequencing and machine learning promise to allow in silico MIC prediction as an alternative approach which overcomes some of these difficulties, although the predicted MICs must still be interpreted. Nevertheless, precisely how we should handle MIC data when dealing with predictive models remains unclear, since they are measured semi-quantitatively, with varying resolution, and are typically also left- and right-censored within varying ranges. We therefore investigated genome-based prediction of MICs in the pathogen Klebsiella pneumoniae using 4367 genomes with both simulated semi-quantitative traits and real MICs. As we were focused on clinical interpretation, we used interpretable rather than black-box machine learning models, namely, Elastic Net, Random Forests, and linear mixed models. Simulated traits were generated accounting for oligogenic, polygenic, and homoplastic genetic effects with different levels of heritability. Then we assessed how model prediction accuracy was affected when MIC prediction was framed as a regression or as a classification problem. Our results showed that treating the MICs differently depending on the number of concentration levels of antibiotic available was the most promising learning strategy. Specifically, to optimise both prediction accuracy and inference of the correct causal variants, we recommend considering the MICs as continuous and framing the learning problem as a regression when the number of observed antibiotic concentration levels is large, whereas with a smaller number of concentration levels they should be treated as a categorical variable and the learning problem should be framed as a classification.
Our findings also underline how predictive models can be improved when prior biological knowledge is taken into account, due to the varying genetic architecture of each antibiotic resistance trait. Finally, we emphasise that expanding the population database is pivotal for the future clinical implementation of these models to support routine machine-learning-based diagnostics.
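The recommendation above (regression when many concentration levels are observed, classification when few) can be illustrated with a small sketch. The abstract gives no numeric cut-off, so `min_levels_for_regression=6` is a purely hypothetical threshold; the log2 transform reflects the usual doubling-dilution layout of MIC assays and is likewise an assumption, as is the function name `frame_mic_problem`. Censored values (such as "≤0.5" or ">32") are not handled here.

```python
import math


def frame_mic_problem(mics, min_levels_for_regression=6):
    """Choose a learning task and build targets from MIC values (mg/L).

    Hypothetical rule: with at least `min_levels_for_regression` distinct
    concentration levels, return log2(MIC) as a continuous target;
    otherwise return an ordinal class index per observed level.
    """
    levels = sorted(set(mics))
    if len(levels) >= min_levels_for_regression:
        return "regression", [math.log2(m) for m in mics]
    class_index = {level: i for i, level in enumerate(levels)}
    return "classification", [class_index[m] for m in mics]


# Seven distinct doubling dilutions: treated as a continuous trait.
print(frame_mic_problem([0.5, 1, 2, 4, 8, 16, 32]))
# Three distinct levels: treated as a categorical trait.
print(frame_mic_problem([2, 8, 2, 32]))
```

The returned targets could then be passed to any regressor or classifier; the choice of model (Elastic Net, Random Forest, or a linear mixed model in the study) is independent of this framing step.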
Affiliation(s)
- Gherard Batisti Biffignandi
  - Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, UK
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
- Leonid Chindelevitch
  - MRC Centre for Global Infectious Disease Analysis, Imperial College, London, England, UK
- Marta Corbella
  - Microbiology and Virology Unit, Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- Edward J. Feil
  - The Milner Centre for Evolution, Department of Life Sciences, University of Bath, Bath, UK
- Davide Sassera
  - Department of Biology and Biotechnology, University of Pavia, Pavia, Italy
  - Fondazione IRCCS Policlinico San Matteo, Pavia, Italy
- John A. Lees
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, UK
3
Carhuaricra-Huaman D, Setubal JC. Step-by-Step Bacterial Genome Comparison. Methods Mol Biol 2024; 2802:107-134. [PMID: 38819558 DOI: 10.1007/978-1-0716-3838-5_5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Thanks to advancements in genome sequencing and bioinformatics, thousands of bacterial genome sequences are available in public databases. This presents an opportunity to study bacterial diversity in unprecedented detail. This chapter describes a complete bioinformatics workflow for comparative genomics of bacterial genomes, including genome annotation, pangenome reconstruction and visualization, phylogenetic analysis, and identification of sequences of interest such as antimicrobial-resistance genes, virulence factors, and phage sequences. The workflow uses state-of-the-art, open-source tools. The workflow is presented by means of a comparative analysis of Salmonella enterica serovar Typhimurium genomes. The workflow is based on Linux commands and scripts, and result visualization relies on the R environment. The chapter provides a step-by-step protocol that researchers with basic expertise in bioinformatics can easily follow to conduct investigations on their own genome datasets.
Affiliation(s)
- Dennis Carhuaricra-Huaman
  - Programa de Pós-Graduação Interunidades em Bioinformática, Instituto de Matemática e Estatística, Universidade de São Paulo, Sao Paulo, SP, Brazil
  - Research Group in Biotechnology Applied to Animal Health, Production and Conservation (SANIGEN), Laboratory of Biology and Molecular Genetics, Faculty of Veterinary Medicine, Universidad Nacional Mayor de San Marcos, San Borja, Lima, Peru
- João Carlos Setubal
  - Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, Sao Paulo, SP, Brazil
4
Bergin S, Doorley LA, Rybak JM, Wolfe KH, Butler G, Cuomo CA, Rogers PD. Analysis of clinical Candida parapsilosis isolates reveals copy number variation in key fluconazole resistance genes. bioRxiv 2023:2023.12.13.571446. [PMID: 38168157 PMCID: PMC10760152 DOI: 10.1101/2023.12.13.571446] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
We used whole-genome sequencing, together with coverage analysis and GWAS techniques, to analyse a collection of 35 fluconazole-resistant and 7 susceptible Candida parapsilosis isolates and identify new mechanisms of fluconazole resistance. Phylogenetic analysis shows that although the collection is diverse, it contains two probable outbreak groups. We identified copy number variation (CNV) of two genes, ERG11 and CDR1B, in resistant isolates. Two strains have a CNV at the ERG11 locus; the entire ORF is amplified in one, and only the promoter region is amplified in the other. We show that the annotated telomeric gene CDR1B is actually an artefactual in silico fusion of two highly similar neighbouring CDR genes due to an assembly error in the C. parapsilosis CDC317 reference genome. We report highly variable copy numbers of the CDR1B region across the collection. In several strains, the two genes have expanded into a tandem array of new chimeric genes. Other strains have experienced a deletion between the two genes, creating a single gene with a reciprocal chimerism. We find translocations, duplications, and gene conversion across the CDR gene family in the C. parapsilosis species complex, showing that it is a highly dynamic family.
Affiliation(s)
- Sean Bergin
  - School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Laura A Doorley
  - Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Jeffrey M Rybak
  - Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
- Kenneth H Wolfe
  - School of Medicine, Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Geraldine Butler
  - School of Biomolecular and Biomedical Science, Conway Institute, University College Dublin, Belfield, Dublin, Ireland
- Christina A Cuomo
  - Infectious Disease and Microbiome Program, Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Molecular Microbiology and Immunology Department, Brown University, Providence, RI, USA
- P David Rogers
  - Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, TN, USA
5
Wang CC, Hung YT, Chou CY, Hsuan SL, Chen ZW, Chang PY, Jan TR, Tung CW. Using random forest to predict antimicrobial minimum inhibitory concentrations of nontyphoidal Salmonella in Taiwan. Vet Res 2023; 54:11. [PMID: 36747286 PMCID: PMC9903507 DOI: 10.1186/s13567-023-01141-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 10/10/2022] [Accepted: 01/13/2023] [Indexed: 02/08/2023] Open
Abstract
Antimicrobial resistance (AMR) is a global health issue, and surveillance of AMR can be useful for understanding AMR trends and planning intervention strategies. Salmonella, widely distributed in food-producing animals, has been considered the first priority for inclusion in the AMR surveillance program by the World Health Organization (WHO). Recent advances in rapid and affordable whole-genome sequencing (WGS) techniques have led to the emergence of WGS as a one-stop test to predict antimicrobial susceptibility. Since variation in sequencing and minimum inhibitory concentration (MIC) measurement methods could produce different results, this study aimed to develop WGS-based random forest models for predicting MIC values of 24 drugs using data generated from the same laboratories in Taiwan. The WGS data were transformed into feature vectors of 10-mers for machine learning. Based on rigorous validation and independent tests, good performance was obtained, with an average mean absolute error (MAE) of less than 1 in both the validation and the independent test. Feature selection was then applied to identify top-ranked 10-mers that can further improve the prediction performance. For surveillance purposes, genome sequence-based machine learning methods could be used to monitor the difference between predicted and experimental MICs, where a large difference might warrant investigation of emerging genomic determinants.
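As a rough illustration of the two ingredients named in this abstract, the sketch below counts k-mers in a sequence to build a fixed-order feature vector and computes a mean absolute error. It is not the authors' pipeline: the function names, the toy sequence, and the use of k=4 in the example (so the counts are easy to check by hand; the study used 10-mers) are all assumptions, and the abstract does not state the scale on which MAE was measured.

```python
from collections import Counter


def kmer_counts(sequence, k=10):
    """Count every overlapping k-mer in a nucleotide sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def feature_vector(sequence, vocabulary, k=10):
    """Return k-mer counts in the fixed order given by `vocabulary`."""
    counts = kmer_counts(sequence, k)
    return [counts[kmer] for kmer in vocabulary]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between paired predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Toy example with k=4; "AAAA" does not occur, so its count is zero.
print(feature_vector("ACGTACGTACGT", ["ACGT", "CGTA", "AAAA"], k=4))
print(mean_absolute_error([1, 2, 4], [1, 3, 2]))
```

Feature vectors built this way, one per genome over a shared vocabulary, are the kind of input a random forest regressor would be trained on; the vocabulary itself would normally be restricted by feature selection, since the full set of possible 10-mers is very large.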
Affiliation(s)
- Chia-Chi Wang
  - Department and Graduate Institute of Veterinary Medicine, School of Veterinary Medicine, National Taiwan University, Taipei, 106, Taiwan
- Yu-Ting Hung
  - Animal Technology Laboratories, Agricultural Technology Research Institute, Hsinchu City, 300, Taiwan
  - Graduate Institute of Veterinary Pathobiology, College of Veterinary Medicine, National Chung Hsing University, Taichung, 402, Taiwan
- Che-Yu Chou
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, 106, Taiwan
- Shih-Ling Hsuan
  - Graduate Institute of Veterinary Pathobiology, College of Veterinary Medicine, National Chung Hsing University, Taichung, 402, Taiwan
- Zeng-Weng Chen
  - Animal Technology Laboratories, Agricultural Technology Research Institute, Hsinchu City, 300, Taiwan
- Pei-Yu Chang
  - Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 350, Taiwan
- Tong-Rong Jan
  - Department and Graduate Institute of Veterinary Medicine, School of Veterinary Medicine, National Taiwan University, Taipei, 106, Taiwan
- Chun-Wei Tung
  - Graduate Institute of Data Science, College of Management, Taipei Medical University, Taipei, 106, Taiwan
  - Institute of Biotechnology and Pharmaceutical Research, National Health Research Institutes, Miaoli County, 350, Taiwan
6
Combination of Whole Genome Sequencing and Metagenomics for Microbiological Diagnostics. Int J Mol Sci 2022; 23:ijms23179834. [PMID: 36077231 PMCID: PMC9456280 DOI: 10.3390/ijms23179834] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 07/04/2022] [Revised: 08/24/2022] [Accepted: 08/26/2022] [Indexed: 12/21/2022] Open
Abstract
Whole genome sequencing (WGS) provides the highest resolution for genome-based species identification and can provide insight into the antimicrobial resistance and virulence potential of a single microbiological isolate during the diagnostic process. In contrast, metagenomic sequencing allows the analysis of DNA segments from multiple microorganisms within a community, using either an amplicon- or a shotgun-based approach. However, WGS and shotgun metagenomic data are rarely combined, although such an approach may generate additive or synergistic information that is critical for, e.g., patient management, infection control, and pathogen surveillance. To produce a combined workflow with actionable outputs, we need to understand the pre-to-post analytical process of both technologies. This will require specific databases storing interlinked sequencing data and metadata, as well as customized bioinformatic analytical pipelines. This review article provides an overview of the critical steps and potential clinical applications of combining WGS and metagenomics for microbiological diagnosis.