1
Hall MB, Lima L, Coin LJM, Iqbal Z. Drug resistance prediction for Mycobacterium tuberculosis with reference graphs. Microb Genom 2023; 9:mgen001081. [PMID: 37552534 PMCID: PMC10483414 DOI: 10.1099/mgen.0.001081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2023] [Accepted: 07/14/2023] [Indexed: 08/09/2023]
Abstract
Tuberculosis is a global pandemic disease with a rising burden of antimicrobial resistance. As a result, the World Health Organization (WHO) has a goal of enabling universal access to drug susceptibility testing (DST). Given the slowness of and infrastructure requirements for phenotypic DST, whole-genome sequencing, followed by genotype-based prediction of DST, now provides a route to achieving this. Since a central component of genotypic DST is to detect the presence of any known resistance-causing mutations, a natural approach is to use a reference graph that allows encoding of known variation. We have developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptible-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44 709 and 138 publicly available Illumina and Nanopore samples with associated phenotypes. We find that DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less computational memory than both tools, and provides significantly faster runtimes, except when runtime is compared to Mykrobe with Nanopore data. We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis - including deletion of genes katG and pncA - and suggest mutations that may warrant reclassification as associated with resistance.
Affiliation(s)
- Michael B. Hall
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Australia
- Leandro Lima
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
- Lachlan J. M. Coin
  - Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, Australia
- Zamin Iqbal
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridgeshire, UK
2
Jiang Z, Lu Y, Liu Z, Wu W, Xu X, Dinnyés A, Yu Z, Chen L, Sun Q. Drug resistance prediction and resistance genes identification in Mycobacterium tuberculosis based on a hierarchical attentive neural network utilizing genome-wide variants. Brief Bioinform 2022; 23:6553603. [PMID: 35325021 DOI: 10.1093/bib/bbac041] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/01/2021] [Revised: 01/18/2022] [Accepted: 01/27/2022] [Indexed: 01/25/2023]
Abstract
Prediction of antimicrobial resistance based on whole-genome sequencing data has attracted growing attention due to its rapidity and convenience. Numerous machine learning-based studies have used genetic variants to predict drug resistance in Mycobacterium tuberculosis (MTB) under the assumption that variants are homogeneous. Most of these studies, however, have ignored the essential correlation between variants and their corresponding genes when encoding variants, and have used only a limited number of variants as prediction input. In this study, taking advantage of genome-wide variants for drug-resistance prediction and inspired by natural language processing, we frame drug resistance prediction as document classification, in which variants are treated as words, mutated genes in an isolate as sentences, and an isolate as a document. We propose a novel hierarchical attentive neural network model (HANN) that helps discover drug resistance-related genes and variants and yields more interpretable biological results. It captures the interaction among variants in a mutated gene as well as among mutated genes in an isolate. Our results show that for the four first-line drugs isoniazid (INH), rifampicin (RIF), ethambutol (EMB) and pyrazinamide (PZA), the HANN achieves an optimal area under the ROC curve of 97.90, 99.05, 96.44 and 95.14% and an optimal sensitivity of 94.63, 96.31, 92.56 and 87.05%, respectively. In addition, without any domain knowledge, the model identifies drug resistance-related genes and variants consistent with those confirmed by previous studies and, more importantly, discovers one additional potential drug-resistance-related gene.
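The word/sentence/document analogy in this abstract can be illustrated with a small sketch. This is not the authors' HANN: it assumes fixed toy embeddings and plain dot-product attention pooling in place of learned encoders, and every name in it (`attend`, `encode_isolate`, `geneA`, `geneB`, the context vectors) is hypothetical. It only shows how variant-level attention can be pooled into gene vectors, and gene-level attention into a single isolate vector, while keeping the attention weights available for interpretation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(vectors, context):
    """Attention pooling: weight each vector by softmax(dot(vector, context))."""
    scores = [sum(a * b for a, b in zip(vec, context)) for vec in vectors]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[d] for w, vec in zip(weights, vectors))
        for d in range(len(context))
    ]
    return pooled, weights


def encode_isolate(isolate, variant_context, gene_context):
    """Pool variants ("words") into genes ("sentences"), then genes into one isolate ("document")."""
    gene_names = list(isolate)
    gene_vectors = []
    variant_weights = {}
    for gene in gene_names:
        gene_vector, weights = attend(isolate[gene], variant_context)
        gene_vectors.append(gene_vector)
        variant_weights[gene] = weights
    isolate_vector, gene_attention = attend(gene_vectors, gene_context)
    return isolate_vector, dict(zip(gene_names, gene_attention)), variant_weights


# Toy input (hypothetical): each mutated gene maps to the embeddings of its variants.
isolate = {
    "geneA": [[1.0, 0.0], [0.5, 0.5]],
    "geneB": [[0.0, 1.0]],
}
# In a trained model the context vectors would be learned; here they are fixed by hand.
isolate_vector, gene_weights, variant_weights = encode_isolate(
    isolate, variant_context=[1.0, 0.0], gene_context=[0.0, 1.0]
)
```

The per-gene and per-variant weights returned here are the kind of quantity one would inspect to rank genes and variants by their contribution to a prediction.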
Affiliation(s)
- Zhonghua Jiang
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Yongmei Lu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Zhuochong Liu
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Wei Wu
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Xinyi Xu
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- András Dinnyés
  - BioTalentum Ltd. Aulich Lajos str. 26. 2100 Gödöllõ, Hungary
- Zhonghua Yu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Li Chen
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Qun Sun
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
3
Yang F, Yang J, Zhang Z, Tu G, Yao X, Xue W, Zhu F. Recent Advances in Computer-aided Antiviral Drug Design Targeting HIV-1 Integrase and Reverse Transcriptase Associated Ribonuclease H. Curr Med Chem 2021; 29:1664-1676. [PMID: 34238145 DOI: 10.2174/0929867328666210708090123] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/03/2021] [Revised: 04/29/2021] [Accepted: 05/06/2021] [Indexed: 11/22/2022]
Abstract
Acquired immunodeficiency syndrome (AIDS) has long been a chronic, life-threatening disease. However, a broad range of antiretroviral drug regimens is available for the successful suppression of virus replication in people infected with human immunodeficiency virus type 1 (HIV-1). Mutation-induced drug resistance arising during the treatment of AIDS has forced a continuous search for new antiviral agents. HIV-1 integrase (IN) and reverse transcriptase associated ribonuclease H (RT-RNase H), two pivotal enzymes in the HIV-1 replication process, have gained popularity as druggable targets for designing novel HIV-1 antiviral drugs. During the development of HIV-1 IN and/or RT-RNase H inhibitors, computer-aided drug design (CADD), including homology modeling, pharmacophore modeling, docking, molecular dynamics (MD) simulation, and binding free energy calculation, represents a significant tool to accelerate the discovery of new drug candidates and reduce costs in antiviral drug development. In this review, we summarize recent advances in the design of single- and dual-target inhibitors against HIV-1 IN and/or RT-RNase H, as well as in the prediction of mutation-induced drug resistance based on computational methods. We highlight the results of the reported literature and propose some perspectives on the design of novel and more effective antiviral drugs in the future.
Affiliation(s)
- Fengyuan Yang
  - School of Pharmaceutical Sciences, Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, Chongqing University, Chongqing 401331, China
- Jingyi Yang
  - School of Pharmaceutical Sciences, Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, Chongqing University, Chongqing 401331, China
- Zhao Zhang
  - School of Pharmaceutical Sciences, Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, Chongqing University, Chongqing 401331, China
- Gao Tu
  - School of Pharmaceutical Sciences, Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, Chongqing University, Chongqing 401331, China
- Xiaojun Yao
  - State Key Laboratory of Applied Organic Chemistry and Department of Chemistry, Lanzhou University, Lanzhou 730000, China
- Weiwei Xue
  - School of Pharmaceutical Sciences, Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, Chongqing University, Chongqing 401331, China
- Feng Zhu
  - School of Pharmaceutical Sciences, Chongqing Key Laboratory of Natural Product Synthesis and Drug Research, Chongqing University, Chongqing 401331, China
4
Alves NG, Mata AI, Luís JP, Brito RMM, Simões CJV. An Innovative Sequence-to-Structure-Based Approach to Drug Resistance Interpretation and Prediction: The Use of Molecular Interaction Fields to Detect HIV-1 Protease Binding-Site Dissimilarities. Front Chem 2020; 8:243. [PMID: 32411655 PMCID: PMC7202381 DOI: 10.3389/fchem.2020.00243] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/06/2019] [Accepted: 03/13/2020] [Indexed: 12/15/2022]
Abstract
In silico methodologies have opened new avenues of research into understanding and predicting drug resistance, a pressing health issue that keeps rising at an alarming pace. Sequence-based interpretation systems are routinely applied in clinical contexts in an attempt to predict mutation-based drug resistance and thus aid the choice of the most adequate antibiotic and antiviral therapy. An important limitation of approaches based exclusively on genotypic data is that mutations are not considered in the context of the three-dimensional (3D) structure of the target. Structure-based in silico methodologies are inherently more suitable for interpreting and predicting the impact of mutations on target-drug interactions, at the cost of higher computational and time demands when compared with sequence-based approaches. Herein, we present a fast, computationally inexpensive, sequence-to-structure-based approach to drug resistance prediction, which makes use of 3D protein structures encoded by input target sequences to draw binding-site comparisons with susceptible templates. Rather than performing atom-by-atom comparisons between input target and template structures, our workflow generates and compares Molecular Interaction Fields (MIFs) that map the areas of energetically favorable interactions between several chemical probe types and the target binding site. Quantitative, pairwise dissimilarity measurements between the target and the template binding sites are thus produced. The method is particularly suited to understanding changes to the 3D structure and the physicochemical environment introduced by mutations into the target binding site. Furthermore, the workflow relies exclusively on freeware, making it accessible to anyone. Using four datasets of known HIV-1 protease sequences as a case study, we show that our approach is capable of correctly classifying resistant and susceptible sequences given as input. Guided by ROC curve analyses, we fine-tuned a dissimilarity threshold of classification that results in remarkable discriminatory performance (accuracy ≈ ROC AUC ≈ 0.99), illustrating the high potential of sequence-to-structure-, MIF-based approaches in the context of drug resistance prediction. We discuss the complementarity of the proposed methodology to existing prediction algorithms based on genotypic data. The present work represents a new step toward a more comprehensive and structurally informed interpretation of the impact of genetic variability on the response to HIV-1 therapies.
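As a rough illustration of threshold-based classification on a dissimilarity score, the sketch below scans candidate thresholds and keeps the one that maximizes Youden's J (true-positive rate minus false-positive rate). The abstract does not state which ROC-derived criterion the authors used, so Youden's J is an assumption here, and the function name and toy scores are hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, J) maximizing Youden's J = TPR - FPR.

    scores: dissimilarity of each sequence's binding site to a susceptible template.
    labels: True if the sequence is known to be resistant.
    A sequence is called resistant when its score is >= threshold.
    Requires at least one resistant and one susceptible example.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    best = None
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
        j = tp / positives - fp / negatives
        if best is None or j > best[1]:
            best = (threshold, j)
    return best


# Toy data (hypothetical): three susceptible and three resistant sequences.
toy_scores = [0.05, 0.10, 0.12, 0.40, 0.55, 0.70]
toy_labels = [False, False, False, True, True, True]
threshold, youden_j = best_threshold(toy_scores, toy_labels)
```

On perfectly separable toy data such as this, the selected threshold is the lowest score among the resistant examples; on real data the choice trades sensitivity against specificity.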
Affiliation(s)
- Nuno G Alves
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
- Ana I Mata
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
- João P Luís
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
- Rui M M Brito
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
- Carlos J V Simões
  - Department of Chemistry, Coimbra Chemistry Centre, University of Coimbra, Coimbra, Portugal
  - BSIM Therapeutics, Instituto Pedro Nunes, Coimbra, Portugal
5
Enkirch T, Werngren J, Groenheit R, Alm E, Advani R, Lind Karlberg M, Mansjö M. Systematic Review of Whole-Genome Sequencing Data To Predict Phenotypic Drug Resistance and Susceptibility in Swedish Mycobacterium tuberculosis Isolates, 2016 to 2018. Antimicrob Agents Chemother 2020; 64:e02550-19. [PMID: 32122893 DOI: 10.1128/AAC.02550-19] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/23/2019] [Accepted: 02/21/2020] [Indexed: 11/20/2022]
Abstract
In this retrospective study, whole-genome sequencing (WGS) data generated on an Ion Torrent platform was used to predict phenotypic drug resistance profiles for first- and second-line drugs among Swedish clinical Mycobacterium tuberculosis isolates from 2016 to 2018. The accuracy was ∼99% for all first-line drugs and 100% for four second-line drugs. Our analysis supports the introduction of WGS into routine diagnostics, which might, at least in Sweden, replace phenotypic drug susceptibility testing in the future.
6
Shea J, Halse TA, Lapierre P, Shudt M, Kohlerschmidt D, Van Roey P, Limberger R, Taylor J, Escuyer V, Musser KA. Comprehensive Whole-Genome Sequencing and Reporting of Drug Resistance Profiles on Clinical Cases of Mycobacterium tuberculosis in New York State. J Clin Microbiol 2017; 55:1871-1882. [PMID: 28381603 PMCID: PMC5442544 DOI: 10.1128/jcm.00298-17] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 02/22/2017] [Accepted: 03/28/2017] [Indexed: 12/19/2022]
Abstract
Whole-genome sequencing (WGS) is a newer alternative for tuberculosis (TB) diagnostics and is capable of providing rapid drug resistance profiles while performing species identification and capturing the data necessary for genotyping. Our laboratory developed and validated a comprehensive and sensitive WGS assay to characterize Mycobacterium tuberculosis and other M. tuberculosis complex (MTBC) strains, composed of a novel DNA extraction, optimized library preparation, paired-end WGS, and an in-house-developed bioinformatics pipeline. This new assay was assessed using 608 MTBC isolates, with 146 isolates during the validation portion of this study and 462 samples received prospectively. In February 2016, this assay was implemented to test all clinical cases of MTBC in New York State, including isolates and early positive Bactec mycobacterial growth indicator tube (MGIT) 960 cultures from primary specimens. Since the inception of the assay, we have assessed the accuracy of identification of MTBC strains to the species level, concordance with culture-based drug susceptibility testing (DST), and turnaround time. Species identification by WGS was determined to be 99% accurate. Concordance between drug resistance profiles generated by WGS and culture-based DST methods was 96% for eight drugs, with an average resistance-predictive value of 93% and susceptible-predictive value of 96%. This single comprehensive WGS assay has replaced seven molecular assays and has resulted in resistance profiles being reported to physicians an average of 9 days sooner than with culture-based DST for first-line drugs and 32 days sooner for second-line drugs.
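The agreement measures quoted in this abstract can be computed from a two-by-two comparison of WGS predictions against culture-based DST. The sketch below assumes the usual definitions, which the abstract itself does not spell out: concordance is the fraction of matching calls, the resistance-predictive value is the fraction of WGS-resistant calls that are resistant by DST, and the susceptible-predictive value is the fraction of WGS-susceptible calls that are susceptible by DST. The function name and the counts are hypothetical and are not taken from the study.

```python
def dst_agreement(tp, fp, tn, fn):
    """Agreement between WGS predictions and culture-based DST for one drug.

    tp: resistant by both WGS and DST
    fp: resistant by WGS, susceptible by DST
    tn: susceptible by both WGS and DST
    fn: susceptible by WGS, resistant by DST
    Assumes at least one WGS-resistant and one WGS-susceptible call.
    """
    total = tp + fp + tn + fn
    return {
        "concordance": (tp + tn) / total,
        "resistance_predictive_value": tp / (tp + fp),
        "susceptible_predictive_value": tn / (tn + fn),
    }


# Illustrative counts only (hypothetical, not data from the paper).
metrics = dst_agreement(tp=9, fp=1, tn=88, fn=2)
```

Because resistance is usually the rarer outcome, concordance can stay high even when the resistance-predictive value is modest, which is one reason to report the predictive values separately.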
Affiliation(s)
- Joseph Shea
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Tanya A Halse
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Pascal Lapierre
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Matthew Shudt
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Donna Kohlerschmidt
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Patrick Van Roey
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Ronald Limberger
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Jill Taylor
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Vincent Escuyer
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
- Kimberlee A Musser
  - Wadsworth Center, New York State Department of Health, Albany, New York, USA
7
Durham EEA, Yu X, Harrison RW. FDT 2.0: Improving scalability of the fuzzy decision tree induction tool - integrating database storage. Proc IEEE Symp Comput Intell Healthc Ehealth 2015; 2014:187-190. [PMID: 29226916 DOI: 10.1109/cicare.2014.7007853] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/05/2022]
Abstract
Effective machine-learning handles large datasets efficiently. One key feature of handling large data is the use of databases such as MySQL. The freeware fuzzy decision tree induction tool, FDT, is a scalable supervised-classification software tool implementing fuzzy decision trees. It is based on an optimized fuzzy ID3 (FID3) algorithm. FDT 2.0 improves upon FDT 1.0 by bridging the gap between data science and data engineering: it combines a robust decisioning tool with data retention for future decisions, so that the tool does not need to be recalibrated from scratch every time a new decision is required. In this paper we briefly review the analytical capabilities of the freeware FDT tool and its major features and functionalities; examples of large biological datasets from HIV, microRNAs and sRNAs are included. This work shows how to integrate fuzzy decision algorithms with modern database technology. In addition, we show that integrating the fuzzy decision tree induction tool with database storage allows for optimal user satisfaction in today's Data Analytics world.
Affiliation(s)
- Xiaxia Yu
  - Department of Computer Science, Georgia State University, Atlanta, USA
- Robert W Harrison
  - Department of Computer Science, Georgia State University, Atlanta, USA