1
Guliaev A, Hjort K, Rossi M, Jonsson S, Nicoloff H, Guy L, Andersson DI. Machine learning detection of heteroresistance in Escherichia coli. EBioMedicine 2025; 113:105618. [PMID: 39986174 DOI: 10.1016/j.ebiom.2025.105618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 02/10/2025] [Accepted: 02/11/2025] [Indexed: 02/24/2025]
Abstract
BACKGROUND Heteroresistance (HR) is a significant type of antibiotic resistance observed for several bacterial species and antibiotic classes where a susceptible main population contains small subpopulations of resistant cells. Mathematical models, animal experiments and clinical studies associate HR with treatment failure. Currently used susceptibility tests do not detect heteroresistance reliably, which can result in misclassification of heteroresistant isolates as susceptible which might lead to treatment failure. Here we examined if whole genome sequence (WGS) data and machine learning (ML) can be used to detect bacterial HR. METHODS We classified 467 Escherichia coli clinical isolates as HR or non-HR to the often used β-lactam/inhibitor combination piperacillin-tazobactam using pre-screening and Population Analysis Profiling tests. We sequenced the isolates, assembled the whole genomes and created a set of predictors based on current knowledge of HR mechanisms. Then we trained several machine learning models on 80% of this data set aiming to detect HR isolates. We compared performance of the best ML models on the remaining 20% of the data set with a baseline model based solely on the presence of β-lactamase genes. Furthermore, we sequenced the resistant sub-populations in order to analyse the genetic mechanisms underlying HR. FINDINGS The best ML model achieved 100% sensitivity and 84.6% specificity, outperforming the baseline model. The strongest predictors of HR were the total number of β-lactamase genes, β-lactamase gene variants and presence of IS elements flanking them. Genetic analysis of HR strains confirmed that HR is caused by an increased copy number of resistance genes via gene amplification or plasmid copy number increase. This aligns with the ML model's findings, reinforcing the hypothesis that this mechanism underlies HR in Gram-negative bacteria. 
INTERPRETATION We demonstrate that a combination of WGS and ML can identify HR in bacteria with perfect sensitivity and high specificity. This improved detection would allow for better-informed treatment decisions and potentially reduce the occurrence of treatment failures associated with HR. FUNDING Funding provided to DIA from the Swedish Research Council (2021-02091) and NIH (1U19AI158080-01).
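The sensitivity and specificity quoted above are ratios taken from a confusion matrix on the held-out test set. A minimal sketch of that arithmetic follows; the function name and the counts are hypothetical and chosen only so that the ratios resemble the reported percentages, since the abstract does not give the underlying counts.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)  # share of true HR isolates that were flagged
    specificity = tn / (tn + fp)  # share of non-HR isolates that were cleared
    return sensitivity, specificity


# Hypothetical counts, NOT taken from the paper
sens, spec = sensitivity_specificity(tp=20, fn=0, tn=11, fp=2)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
# prints: sensitivity=1.000, specificity=0.846
```

With no false negatives the sensitivity is exactly 1, which is the property that matters most when a missed HR isolate could lead to treatment failure.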
Affiliation(s)
- Andrei Guliaev
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Karin Hjort
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Michele Rossi
  - Department of Biosciences, University of Milan, Milan, Italy; Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan, Italy
- Sofia Jonsson
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Hervé Nicoloff
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Lionel Guy
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; SciLifeLab, Uppsala University, Uppsala, Sweden
- Dan I Andersson
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
2
Babirye SR, Nsubuga M, Mboowa G, Batte C, Galiwango R, Kateete DP. Machine learning-based prediction of antibiotic resistance in Mycobacterium tuberculosis clinical isolates from Uganda. BMC Infect Dis 2024; 24:1391. [PMID: 39639222 PMCID: PMC11622658 DOI: 10.1186/s12879-024-10282-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/25/2024] [Accepted: 11/27/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND Efforts toward tuberculosis management and control are challenged by the emergence of Mycobacterium tuberculosis (MTB) resistance to existing anti-TB drugs. This study aimed to explore the potential of machine learning algorithms in predicting resistance to four anti-TB drugs (rifampicin, isoniazid, streptomycin, and ethambutol) in MTB using whole-genome sequence and clinical data from Uganda. We also assessed the models' generalizability on another dataset from South Africa. RESULTS We trained ten machine learning algorithms on a dataset comprising 182 MTB isolates with clinical data variables (age, sex, HIV status) and SNP mutations across the entire genome as predictor variables and phenotypic drug-susceptibility data for the four drugs as the outcome variable. Model performance varied across the four anti-TB drugs after five-fold cross-validation. The best model was selected using the highest Matthews Correlation Coefficient (MCC) and Area Under the Receiver Operating Characteristic Curve (AUC) score as key metrics. Logistic regression (LR) excelled in predicting resistance to rifampicin (MCC: 0.83 (95% confidence interval (CI) 0.73-0.86) and AUC: 0.96 (95% CI 0.95-0.98)) and streptomycin (MCC: 0.44 (95% CI 0.27-0.58) and AUC: 0.80 (95% CI 0.74-0.82)), Extreme Gradient Boosting (XGBoost) for ethambutol (MCC: 0.65 (95% CI 0.54-0.74) and AUC: 0.90 (95% CI 0.83-0.96)) and Gradient Boosting (GBC) for isoniazid (MCC: 0.69 (95% CI 0.61-0.78) and AUC: 0.91 (95% CI 0.88-0.96)). The best performing model per drug was trained on the SNP dataset only, after excluding the clinical data variables, because integrating them with SNP mutations showed only a marginal improvement in the model's performance. Despite the high MCC (0.18 to 0.72) and AUC (0.66 to 0.95) scores for all the best models with the Uganda test dataset, the LR model for rifampicin and streptomycin did not generalize to the South Africa dataset, in contrast to the GBC and XGBoost models. 
Compared to TB profiler, LR for RIF was very sensitive and the GBC for INH and XGBoost for EMB were very specific on the Uganda dataset. TB profiler outperformed all the best models on the South Africa dataset. We identified key mutations associated with drug resistance for these antibiotics. HIV status was also identified among the top significant features in predicting drug resistance. CONCLUSION Leveraging machine learning applications in predicting antimicrobial resistance represents a promising avenue in addressing the global health challenge posed by antimicrobial resistance. This work demonstrates that integration of diverse data types such as genomic and clinical data could improve resistance predictions while using machine learning algorithms, support robust surveillance systems and also inform targeted interventions to curb the rising threat of antimicrobial resistance.
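Because model selection here hinges on the Matthews Correlation Coefficient, it may help to see how MCC is derived from the four confusion-matrix counts. The sketch below is a plain-Python illustration; the function name `mcc_from_counts` and the example counts are hypothetical and unrelated to the study's data.

```python
import math


def mcc_from_counts(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to
    +1 (perfect prediction). Returns 0.0 when any marginal total is
    zero, a common convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for a resistant-vs-susceptible classifier
example_mcc = mcc_from_counts(tp=40, tn=45, fp=5, fn=10)
print(f"MCC = {example_mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which makes it less flattering on imbalanced resistance datasets.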
Affiliation(s)
- Sandra Ruth Babirye
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O. Box 7072, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Science (ACE), Kampala, Uganda
- Mike Nsubuga
  - The African Center of Excellence in Bioinformatics and Data-Intensive Science (ACE), Kampala, Uganda
  - Faculty of Health Sciences, University of Bristol, Bristol, BS40 5DU, UK
  - Jean Golding Institute, University of Bristol, Bristol, BS8 1UH, UK
  - The Infectious Diseases Institute, Makerere University, Kampala, Uganda
- Gerald Mboowa
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O. Box 7072, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Science (ACE), Kampala, Uganda
- Charles Batte
  - Lung Institute, School of Medicine, College of Health Sciences, Makerere University, Kampala, Uganda
- Ronald Galiwango
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O. Box 7072, Kampala, Uganda
  - The African Center of Excellence in Bioinformatics and Data-Intensive Science (ACE), Kampala, Uganda
  - The Infectious Diseases Institute, Makerere University, Kampala, Uganda
- David Patrick Kateete
  - Department of Immunology and Molecular Biology, School of Biomedical Sciences, College of Health Sciences, Makerere University, P.O. Box 7072, Kampala, Uganda
3
Jin C, Jia C, Hu W, Xu H, Shen Y, Yue M. Predicting antimicrobial resistance in E. coli with discriminative position fused deep learning classifier. Comput Struct Biotechnol J 2024; 23:559-565. [PMID: 38274998 PMCID: PMC10809114 DOI: 10.1016/j.csbj.2023.12.041] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/04/2023] [Revised: 12/26/2023] [Accepted: 12/26/2023] [Indexed: 01/27/2024]
Abstract
Escherichia coli (E. coli) has become a particular concern due to the increasing incidence of antimicrobial resistance (AMR) observed worldwide. Using machine learning (ML) to predict E. coli AMR is a more efficient method than traditional laboratory testing. However, further improvement in the predictive performance of existing models remains challenging. In this study, we collected 1937 high-quality whole genome sequencing (WGS) data from public databases with an antimicrobial resistance phenotype and modified the existing workflow by adding an attention mechanism to enable the modified workflow to focus more on core single nucleotide polymorphisms (SNPs) that may significantly lead to the development of AMR in E. coli. While comparing the model performance before and after adding the attention mechanism, we also performed a cross-comparison among the published models using random forest (RF), support vector machine (SVM), logistic regression (LR), and convolutional neural network (CNN). Our study demonstrates that the discriminative positional colors of Chaos Game Representation (CGR) images can selectively influence and highlight genome regions without prior knowledge, enhancing prediction accuracy. Furthermore, we developed an online tool (https://github.com/tjiaa/E.coli-ML/tree/main) for assisting clinicians in the rapid prediction of the AMR phenotype of E. coli and accelerating clinical decision-making.
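For readers unfamiliar with Chaos Game Representation, the sketch below shows the basic construction: each nucleotide moves the current point halfway toward that base's corner of the unit square, and the visited points are binned into a grid that can be rendered as an image. This is a plain CGR only; the corner assignment is one common convention assumed here, the function names are hypothetical, and the paper's discriminative positional colouring and attention mechanism are not reproduced.

```python
# Corner assignment is an assumed convention; other layouts are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence: str) -> list[tuple[float, float]]:
    """Return the CGR trajectory of a DNA sequence inside the unit square."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in sequence.upper():
        corner = CORNERS.get(base)
        if corner is None:  # skip ambiguous bases such as N
            continue
        x = (x + corner[0]) / 2  # move halfway toward the base's corner
        y = (y + corner[1]) / 2
        points.append((x, y))
    return points


def cgr_grid(sequence: str, size: int = 8) -> list[list[int]]:
    """Bin CGR points into a size x size count matrix (a grey-scale image)."""
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


image = cgr_grid("ACGTTGCAACGT", size=4)
```

Each grid cell corresponds to a short suffix of the sequence, so the count matrix acts as a fixed-size fingerprint that image models such as a CNN can consume regardless of genome length.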
Affiliation(s)
- Canghong Jin
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
- Chenghao Jia
  - Institute of Preventive Veterinary Sciences and Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou 310058, China
- Wenkang Hu
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
  - College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
- Haidong Xu
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
- Yanyi Shen
  - School of Computer and Computing Science, Hangzhou City University, Hangzhou 310015, China
- Min Yue
  - Institute of Preventive Veterinary Sciences and Department of Veterinary Medicine, Zhejiang University College of Animal Sciences, Hangzhou 310058, China
  - Hainan Institute of Zhejiang University, Sanya 572000, China
  - Zhejiang Provincial Key Laboratory of Preventive Veterinary Medicine, Hangzhou 310058, China
  - State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, National Medical Center for Infectious Diseases, The First Affiliated Hospital, College of Medicine, Zhejiang University, Hangzhou 310003, China
4
Pruthi SS, Billows N, Thorpe J, Campino S, Phelan JE, Mohareb F, Clark TG. Leveraging large-scale Mycobacterium tuberculosis whole genome sequence data to characterise drug-resistant mutations using machine learning and statistical approaches. Sci Rep 2024; 14:27091. [PMID: 39511309 PMCID: PMC11544221 DOI: 10.1038/s41598-024-77947-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Accepted: 10/28/2024] [Indexed: 11/15/2024]
Abstract
Tuberculosis disease (TB), caused by Mycobacterium tuberculosis (Mtb), is a major global public health problem, resulting in > 1 million deaths each year. Drug resistance (DR), including the multi-drug form (MDR-TB), is challenging control of the disease. Whilst many DR mutations in the Mtb genome are known, analysis of large datasets generated using whole genome sequencing (WGS) platforms can reveal new variants through the assessment of genotype-phenotype associations. Here, we apply tree-based ensemble methods to a dataset comprised of 35,777 Mtb WGS and phenotypic drug-susceptibility test data across first- and second-line drugs. We compare model performance across models trained using mutations in drug-specific regions and genome-wide variants, and find high predictive ability for both first-line (area under ROC curve (AUC); range 88.3-96.5) and second-line (AUC range 84.1-95.4) drugs. To aggregate information from low-frequency variants, we pool mutations by functional impact and observe large improvements in predictive accuracy (e.g., sensitivity: pyrazinamide + 25%; ethionamide + 10%). We further characterise loss-of-function mutations observed in resistant phenotypes, uncovering putative markers of resistance (e.g., ndh 293dupG, Rv3861 78delC). Finally, we profile the distribution of known DR-associated single nucleotide polymorphisms across discretised minimum inhibitory concentration (MIC) data generated from phenotypic testing (n = 12,066), and identify mutations associated with highly resistant phenotypes (e.g., inhA - 779G > T and 62T > C). Overall, our work demonstrates that applying machine learning to large-scale WGS data is useful for providing insights into predicting Mtb binary drug resistance and MIC phenotypes, thereby potentially assisting diagnosis and treatment decision-making for infection control.
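Pooling mutations by functional impact can be pictured as collapsing many rare variants into a few per-gene burden features. The sketch below is one possible way to do that, not the authors' pipeline: the consequence labels, the function name `pool_by_impact` and the example variant calls are all hypothetical.

```python
from collections import Counter

# Assumed consequence labels treated as loss of function (LoF); real
# annotation tools use their own vocabularies.
LOF_CONSEQUENCES = {"frameshift", "stop_gained", "start_lost"}


def pool_by_impact(variants):
    """Collapse (gene, consequence) pairs into per-gene burden counts.

    Returns a Counter keyed by "<gene>:LoF" or "<gene>:other", so that
    individually rare variants contribute to a shared feature.
    """
    pooled = Counter()
    for gene, consequence in variants:
        impact = "LoF" if consequence in LOF_CONSEQUENCES else "other"
        pooled[f"{gene}:{impact}"] += 1
    return pooled


# Made-up variant calls for a single isolate
isolate = [
    ("pncA", "frameshift"),
    ("pncA", "stop_gained"),
    ("pncA", "missense"),
    ("ethA", "frameshift"),
]
features = pool_by_impact(isolate)
print(dict(features))
```

A tree-based model trained on such pooled counts sees one informative column per gene and impact class instead of thousands of near-empty variant columns.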
Affiliation(s)
- Siddharth Sanjay Pruthi
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - School of Water, Energy and Environment, Cranfield University, Bedford, UK
- Nina Billows
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Joseph Thorpe
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Susana Campino
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Jody E Phelan
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Fady Mohareb
  - School of Water, Energy and Environment, Cranfield University, Bedford, UK
- Taane G Clark
  - Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
  - Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, WC1E 7HT, UK
5
Falcao IWS, Cardoso DL, Coutinho dos Santos Santos AE, Paixao E, Costa FAR, Figueiredo K, Carneiro S, Seruffo MCDR. Model for predicting drug resistance based on the clinical profile of tuberculosis patients using machine learning techniques. PeerJ Comput Sci 2024; 10:e2246. [PMID: 39650511 PMCID: PMC11623081 DOI: 10.7717/peerj-cs.2246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2024] [Accepted: 07/17/2024] [Indexed: 12/11/2024]
Abstract
Tuberculosis (TB) is a disease caused by the bacterium Mycobacterium tuberculosis and despite effective treatments, still affects millions of people worldwide. The advent of new treatments has not eliminated the significant challenge of TB drug resistance. Repeated and inadequate exposure to drugs has led to the development of strains of the bacteria that are resistant to conventional treatments, making the eradication of the disease even more complex. In this context, it is essential to seek more effective approaches to fighting TB. This article proposes a model for predicting drug resistance based on the clinical profile of TB patients, using machine learning techniques. The model aims to optimize the work of health professionals directly involved with tuberculosis patients, driving the creation of new containment strategies and preventive measures, as it specifies the clinical data that has the greatest impact and identifies the individuals with the greatest predisposition to develop resistance to anti-tuberculosis drugs. The results obtained show, in one of the scenarios, a probability of development of 70% and an accuracy of 84.65% for predicting drug resistance.
Affiliation(s)
- Erminio Paixao
  - Institute of Technology, Federal University of Para, Belém, PA, Brazil
- Karla Figueiredo
  - Computer Science, State University of Rio de Janeiro, Rio de Janeiro, RJ, Brazil
- Saul Carneiro
  - João de Barros Barreto University Hospital, Federal University of Para, Belém, PA, Brazil
6
Sawant PA, Hiralkar SS, Hulsurkar YP, Phutane MS, Mahajan US, Kudale AM. Predicting over-the-counter antibiotic use in rural Pune, India, using machine learning methods. Epidemiol Health 2024; 46:e2024044. [PMID: 38637971 PMCID: PMC11417445 DOI: 10.4178/epih.e2024044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 03/25/2024] [Indexed: 04/20/2024]
Abstract
OBJECTIVES Over-the-counter (OTC) antibiotic use can cause antibiotic resistance, threatening global public health gains. To counter OTC use, this study used machine learning (ML) methods to identify predictors of OTC antibiotic use in rural Pune, India. METHODS The features of OTC antibiotic use were selected using stepwise logistic, lasso, random forest, XGBoost, and Boruta algorithms. Regression and tree-based models with all confirmed and tentatively important features were built to predict the use of OTC antibiotics. Five-fold cross-validation was used to tune the models' hyperparameters. The final model was selected based on the highest area under the receiver operating characteristic curve (AUROC) with a 95% confidence interval (CI) and the lowest log-loss. RESULTS In rural Pune, the prevalence of OTC antibiotic use was 35.9% (95% CI, 31.6 to 40.5). The perception that buying medicines directly from a medicine shop/pharmacy is useful, using antibiotics for eye-related complaints, more household members consuming antibiotics, and longer duration and higher doses of antibiotic consumption in rural blocks and other social groups were confirmed as important features by the Boruta algorithm. The final model was the XGBoost+Boruta model with 7 predictors (AUROC, 0.934; 95% CI, 0.891 to 0.978; log-loss, 0.279). CONCLUSIONS XGBoost+Boruta, with 7 predictors, was the most accurate model for predicting OTC antibiotic use in rural Pune. Using OTC antibiotics for eye-related complaints, higher consumption of antibiotics and the perception that buying antibiotics directly from a medicine shop/pharmacy is useful were identified as key factors for planning interventions to improve awareness about proper antibiotic use.
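Log-loss, the second selection criterion above, penalises confident wrong probabilities more heavily than hesitant ones. A minimal plain-Python sketch of the binary form is given below; the function name, the clipping constant `eps` and the example values are assumptions for illustration, not details from the study.

```python
import math


def binary_log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities.

    y_true holds 0/1 labels; y_prob holds the predicted probability of class 1.
    Probabilities are clipped to [eps, 1 - eps] so that log(0) never occurs.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


# Made-up labels (1 = OTC antibiotic use) and predicted probabilities
labels = [1, 0, 1, 0]
probabilities = [0.9, 0.2, 0.7, 0.4]
print(f"log-loss = {binary_log_loss(labels, probabilities):.3f}")
```

A model that always predicts 0.5 scores ln 2 (about 0.693), which gives a useful reference point when reading reported log-loss values.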
Affiliation(s)
- Pravin Arun Sawant
  - Department of Health Sciences, School of Health Sciences, Savitribai Phule Pune University, Pune, India
- Sakshi Shantanu Hiralkar
  - Department of Health Sciences, School of Health Sciences, Savitribai Phule Pune University, Pune, India
- Mugdha Sharad Phutane
  - Department of Health Sciences, School of Health Sciences, Savitribai Phule Pune University, Pune, India
- Uma Satish Mahajan
  - Department of Health Sciences, School of Health Sciences, Savitribai Phule Pune University, Pune, India
- Abhay Machindra Kudale
  - Department of Health Sciences, School of Health Sciences, Savitribai Phule Pune University, Pune, India
7
Carter JJ, Walker TM, Walker AS, Whitfield MG, Morlock GP, Lynch CI, Adlard D, Peto TEA, Posey JE, Crook DW, Fowler PW. Prediction of pyrazinamide resistance in Mycobacterium tuberculosis using structure-based machine-learning approaches. JAC Antimicrob Resist 2024; 6:dlae037. [PMID: 38500518 PMCID: PMC10946228 DOI: 10.1093/jacamr/dlae037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024]
Abstract
Background Pyrazinamide is one of four first-line antibiotics used to treat tuberculosis; however, antibiotic susceptibility testing for pyrazinamide is challenging. Resistance to pyrazinamide is primarily driven by genetic variation in pncA, encoding an enzyme that converts pyrazinamide into its active form. Methods We curated a dataset of 664 non-redundant, missense amino acid mutations in PncA with associated high-confidence phenotypes from published studies and then trained three different machine-learning models to predict pyrazinamide resistance. All models had access to a range of protein structural-, chemical- and sequence-based features. Results The best model, a gradient-boosted decision tree, achieved a sensitivity of 80.2% and a specificity of 76.9% on the hold-out test dataset. The clinical performance of the models was then estimated by predicting the binary pyrazinamide resistance phenotype of 4027 samples harbouring 367 unique missense mutations in pncA derived from 24 231 clinical isolates. Conclusions This work demonstrates how machine learning can enhance the sensitivity/specificity of pyrazinamide resistance prediction in genetics-based clinical microbiology workflows, highlights novel mutations for future biochemical investigation, and is a proof of concept for using this approach in other drugs.
Affiliation(s)
- Joshua J Carter
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- Timothy M Walker
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- A Sarah Walker
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
  - National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
  - NIHR Health Protection Research Unit in Healthcare Associated Infection and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Michael G Whitfield
  - Division of Molecular Biology and Human Genetics, Faculty of Medicine and Health Sciences, SAMRC Centre for Tuberculosis Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Stellenbosch University, Tygerberg, South Africa
- Glenn P Morlock
  - Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Charlotte I Lynch
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- Dylan Adlard
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- Timothy E A Peto
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
  - National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
- James E Posey
  - Division of Tuberculosis Elimination, National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA
- Derrick W Crook
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
  - National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
  - NIHR Health Protection Research Unit in Healthcare Associated Infection and Antimicrobial Resistance, University of Oxford, Oxford, UK
- Philip W Fowler
  - Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
  - National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way, Oxford OX3 9DU, UK
8
Dixit A, Freschi L, Vargas R, Gröschel MI, Nakhoul M, Tahseen S, Alam SMM, Kamal SMM, Skrahina A, Basilio RP, Lim DR, Ismail N, Farhat MR. Estimation of country-specific tuberculosis resistance antibiograms using pathogen genomics and machine learning. BMJ Glob Health 2024; 9:e013532. [PMID: 38548342 PMCID: PMC10982777 DOI: 10.1136/bmjgh-2023-013532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 02/26/2024] [Indexed: 04/02/2024]
Abstract
BACKGROUND Global tuberculosis (TB) drug resistance (DR) surveillance focuses on rifampicin. We examined the potential of public and surveillance Mycobacterium tuberculosis (Mtb) whole-genome sequencing (WGS) data, to generate expanded country-level resistance prevalence estimates (antibiograms) using in silico resistance prediction. METHODS We curated and quality-controlled Mtb WGS data. We used a validated random forest model to predict phenotypic resistance to 12 drugs and bias-corrected for model performance, outbreak sampling and rifampicin resistance oversampling. Validation leveraged a national DR survey conducted in South Africa. RESULTS Mtb isolates from 29 countries (n=19 149) met sequence quality criteria. Global marginal genotypic resistance among mono-resistant TB estimates overlapped with the South African DR survey, except for isoniazid, ethionamide and second-line injectables, which were underestimated (n=3134). Among multidrug resistant (MDR) TB (n=268), estimates overlapped for the fluoroquinolones but overestimated other drugs. Globally pooled mono-resistance to isoniazid was 10.9% (95% CI: 10.2-11.7%, n=14 012). Mono-levofloxacin resistance rates were highest in South Asia (Pakistan 3.4% (0.1-11%), n=111 and India 2.8% (0.08-9.4%), n=114). Given the recent interest in drugs enhancing ethionamide activity and their expected activity against isolates with resistance discordance between isoniazid and ethionamide, we measured this rate and found it to be high at 74.4% (IQR: 64.5-79.7%) of isoniazid-resistant isolates predicted to be ethionamide susceptible. The global susceptibility rate to pyrazinamide and levofloxacin among MDR was 15.1% (95% CI: 10.2-19.9%, n=3964). CONCLUSIONS This is the first attempt at global Mtb antibiogram estimation. 
DR prevalence in Mtb can be reliably estimated using public WGS and phenotypic resistance prediction for key antibiotics, but public WGS data demonstrates oversampling of isolates with higher resistance levels than MDR. Nevertheless, our results raise concerns about the empiric use of short-course fluoroquinolone regimens for drug-susceptible TB in South Asia and indicate underutilisation of ethionamide in MDR treatment.
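The abstract mentions bias correction for model performance without naming the method. One standard way to adjust an apparent prevalence for an imperfect classifier is the Rogan-Gladen estimator, shown below purely as an illustration; it is a swapped-in technique and is not claimed to be the authors' procedure. The function name and all numbers are hypothetical.

```python
def rogan_gladen(apparent_prevalence: float, sensitivity: float, specificity: float) -> float:
    """Adjust an observed positive rate for classifier sensitivity and specificity.

    Rogan-Gladen estimator:
        true = (apparent + specificity - 1) / (sensitivity + specificity - 1)
    The result is clipped to [0, 1] because sampling noise can push the
    raw estimate outside that range.
    """
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("classifier must perform better than chance")
    estimate = (apparent_prevalence + specificity - 1) / youden
    return min(max(estimate, 0.0), 1.0)


# Hypothetical values: 20% of isolates predicted resistant by a model
# with 90% sensitivity and 95% specificity.
corrected = rogan_gladen(0.20, sensitivity=0.90, specificity=0.95)
print(f"corrected prevalence = {corrected:.3f}")
```

In this made-up case the corrected prevalence is lower than the apparent one because false positives among the large susceptible group outweigh the resistant isolates that were missed.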
Affiliation(s)
- Avika Dixit
  - Division of Infectious Diseases, Department of Pediatrics, Boston Children's Hospital, Boston, Massachusetts, USA
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Luca Freschi
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Roger Vargas
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  - Center for Computational Biomedicine, Harvard Medical School, Boston, Massachusetts, USA
- Matthias I Gröschel
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Maria Nakhoul
  - Informatics and Analytics Department, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Sabira Tahseen
  - National Tuberculosis Control Programme, Islamabad, Pakistan
- S M Masud Alam
  - Ministry of Health and Family Welfare, Kolkata, West Bengal, India
- S M Mostofa Kamal
  - National Institute of Diseases of the Chest and Hospital, Dhaka, Bangladesh
- Alena Skrahina
  - Republican Scientific and Practical Center for Pulmonology and Tuberculosis, Minsk, Belarus
- Ramon P Basilio
  - Research Institute for Tropical Medicine, Muntinlupa City, Philippines
- Dodge R Lim
  - Research Institute for Tropical Medicine, Muntinlupa City, Philippines
- Nazir Ismail
  - Clinical Microbiology and Infectious Diseases, University of the Witwatersrand Johannesburg Faculty of Health Sciences, Johannesburg, South Africa
- Maha R Farhat
  - Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
  - Pulmonary and Critical Care Medicine, Massachusetts General Hospital, Boston, Massachusetts, USA
9
Hu K, Meyer F, Deng ZL, Asgari E, Kuo TH, Münch PC, McHardy AC. Assessing computational predictions of antimicrobial resistance phenotypes from microbial genomes. Brief Bioinform 2024; 25:bbae206. [PMID: 38706320 PMCID: PMC11070729 DOI: 10.1093/bib/bbae206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 04/08/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
The advent of rapid whole-genome sequencing has created new opportunities for computational prediction of antimicrobial resistance (AMR) phenotypes from genomic data. Both rule-based and machine learning (ML) approaches have been explored for this task, but systematic benchmarking is still needed. Here, we evaluated four state-of-the-art ML methods (Kover, PhenotypeSeeker, Seq2Geno2Pheno and Aytan-Aktug), an ML baseline and the rule-based ResFinder by training and testing each of them across 78 species-antibiotic datasets, using a rigorous benchmarking workflow that integrates three evaluation approaches, each paired with three distinct sample splitting methods. Our analysis revealed considerable variation in performance across techniques and datasets. Whereas ML methods generally excelled for closely related strains, ResFinder excelled at handling divergent genomes. Overall, Kover most frequently ranked top among the ML approaches, followed by PhenotypeSeeker and Seq2Geno2Pheno. AMR phenotypes for antibiotic classes such as macrolides and sulfonamides were predicted with the highest accuracies. The quality of predictions varied substantially across species-antibiotic combinations, particularly for beta-lactams; across species, resistance phenotyping of the beta-lactam compounds aztreonam, amoxicillin/clavulanic acid, cefoxitin, ceftazidime and piperacillin/tazobactam, alongside tetracyclines, demonstrated more variable performance than the other benchmarked antibiotics. By organism, Campylobacter jejuni and Enterococcus faecium phenotypes were more robustly predicted than those of Escherichia coli, Staphylococcus aureus, Salmonella enterica, Neisseria gonorrhoeae, Klebsiella pneumoniae, Pseudomonas aeruginosa, Acinetobacter baumannii, Streptococcus pneumoniae and Mycobacterium tuberculosis. In addition, our study provides software recommendations for each species-antibiotic combination. 
It furthermore highlights the need for optimization for robust clinical applications, particularly for strains that diverge substantially from those used for training.
Affiliation(s)
- Kaixin Hu
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Fernando Meyer
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Zhi-Luo Deng
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Ehsaneddin Asgari
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Molecular Cell Biomechanics Laboratory, Department of Bioengineering and Mechanical Engineering, University of California, Berkeley, USA
- Tzu-Hao Kuo
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
- Philipp C Münch
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
  - German Center for Infection Research (DZIF), partner site Hannover Braunschweig, Braunschweig, Germany
  - Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
- Alice C McHardy
  - Computational Biology of Infection Research, Helmholtz Center for Infection Research, Braunschweig, Germany
  - Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
|
10
|
Perea-Jacobo R, Paredes-Gutiérrez GR, Guerrero-Chevannier MÁ, Flores DL, Muñiz-Salazar R. Machine Learning of the Whole Genome Sequence of Mycobacterium tuberculosis: A Scoping PRISMA-Based Review. Microorganisms 2023; 11:1872. [PMID: 37630431 PMCID: PMC10456961 DOI: 10.3390/microorganisms11081872] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Revised: 07/13/2023] [Accepted: 07/14/2023] [Indexed: 08/27/2023]
Abstract
Tuberculosis (TB) remains one of the most significant global health problems, posing a significant challenge to public health systems worldwide. However, diagnosing drug-resistant tuberculosis (DR-TB) has become increasingly challenging due to the rising number of multidrug-resistant (MDR-TB) cases, despite the development of new TB diagnostic tools. Even the World Health Organization-recommended methods such as Xpert MTB/XDR or Truenat are unable to detect all the Mycobacterium tuberculosis genome mutations associated with drug resistance. While Whole Genome Sequencing offers a more precise DR profile, the lack of user-friendly bioinformatics analysis applications hinders its widespread use. This review focuses on exploring various artificial intelligence models for predicting DR-TB profiles, analyzing relevant English-language articles using the PRISMA methodology through the Covidence platform. Our findings indicate that an Artificial Neural Network is the most commonly employed method, with non-statistical dimensionality reduction techniques preferred over traditional statistical approaches such as Principal Component Analysis or t-distributed Stochastic Neighbor Embedding.
Affiliation(s)
- Ricardo Perea-Jacobo
  - Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
  - Escuela de Ciencias de la Salud, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22890, Mexico
- Guillermo René Paredes-Gutiérrez
  - Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Miguel Ángel Guerrero-Chevannier
  - Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Dora-Luz Flores
  - Facultad de Ingeniería Arquitectura y Diseño, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22860, Mexico
- Raquel Muñiz-Salazar
  - Escuela de Ciencias de la Salud, Universidad Autónoma de Baja California, Campus Ensenada, Ensenada 22890, Mexico
|
11
|
Álvarez VE, Quiroga MP, Centrón D. Identification of a Specific Biomarker of Acinetobacter baumannii Global Clone 1 by Machine Learning and PCR Related to Metabolic Fitness of ESKAPE Pathogens. mSystems 2023:e0073422. [PMID: 37184409 DOI: 10.1128/msystems.00734-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2023]
Abstract
Since the emergence of high-risk clones worldwide, constant investigations have been undertaken to comprehend the molecular basis that led to their prevalent dissemination in nosocomial settings over time. So far, the complex and multifactorial genetic traits of this type of epidemic clones have allowed only the identification of biomarkers with low specificity. A machine learning algorithm was able to recognize unequivocally a biomarker for early and accurate detection of Acinetobacter baumannii global clone 1 (GC1), one of the most disseminated high-risk clones. A support vector machine model identified the U1 sequence with a length of 367 nucleotides that matched a fragment of the moaCB gene, which encodes the molybdenum cofactor biosynthesis C and B proteins. U1 differentiates specifically between A. baumannii GC1 and non-GC1 strains, becoming a suitable biomarker capable of being translated into clinical settings as a molecular typing method for early diagnosis based on PCR as shown here. Since the metabolic pathways of Mo enzymes have been recognized as putative therapeutic targets for ESKAPE (Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Enterobacter species) pathogens, our findings highlight that machine learning can also be useful in knowledge gaps of high-risk clones and provides noteworthy support to the literature to identify relevant nosocomial biomarkers for other multidrug-resistant high-risk clones. IMPORTANCE A. baumannii GC1 is an important high-risk clone that rapidly develops extreme drug resistance in the nosocomial niche. Furthermore, several strains have been identified worldwide in environmental samples, exacerbating the risk of human interactions. Early diagnosis is mandatory to limit its dissemination and to outline appropriate antibiotic stewardship schedules. A region with a length of 367 bp (U1) within the moaCB gene that is not subjected to lateral genetic transfer or to antibiotic pressures was successfully found by a support vector machine model that predicts A. baumannii GC1 strains. At the same time, research on the group of Mo enzymes proposed this metabolic pathway related to the superbug's metabolism as a potential future drug target site for ESKAPE pathogens due to its central role in bacterial fitness during infection. These findings confirm that machine learning used for the identification of biomarkers of high-risk lineages can also serve to identify putative novel therapeutic target sites.
Affiliation(s)
- Verónica Elizabeth Álvarez
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- María Paula Quiroga
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
  - Nodo de Bioinformática. Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
- Daniela Centrón
  - Laboratorio de Investigaciones en Mecanismos de Resistencia a Antibióticos (LIMRA), Instituto de Investigaciones en Microbiología y Parasitología Médica, Facultad de Medicina, Universidad de Buenos Aires-Consejo Nacional de Investigaciones Científicas y Técnicas (IMPaM, UBA-CONICET), Ciudad Autónoma de Buenos Aires, Argentina
|
12
|
Naidu A, Nayak SS, Lulu S S, Sundararajan V. Advances in computational frameworks in the fight against TB: The way forward. Front Pharmacol 2023; 14:1152915. [PMID: 37077815 PMCID: PMC10106641 DOI: 10.3389/fphar.2023.1152915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2023] [Accepted: 03/20/2023] [Indexed: 04/05/2023]
Abstract
Around 1.6 million people lost their lives to tuberculosis (TB) in 2021 according to WHO estimates. Although an intensive treatment plan exists against the causal agent, Mycobacterium tuberculosis, the evolution of multi-drug resistant strains of the pathogen puts a large share of the global population at risk. A vaccine that can induce long-term protection is still in the making, with many candidates currently in different phases of clinical trials. The COVID-19 pandemic has further aggravated the situation by affecting early TB diagnosis and treatment. Yet, WHO remains committed to its "End TB" strategy and aims to substantially reduce TB incidence and deaths by the year 2035. Such an ambitious goal would require a multi-sectoral approach, which would greatly benefit from the latest computational advancements. To highlight the progress of these tools against TB, through this review, we summarize recent studies that have used advanced computational tools and algorithms for early TB diagnosis, anti-mycobacterium drug discovery and the design of the next generation of TB vaccines. At the end, we give an insight into other computational tools and machine learning approaches that have been successfully applied in biomedical research and discuss their prospects and applications against TB.
Affiliation(s)
- Vino Sundararajan
  - Department of Biotechnology, School of Bio Sciences and Technology, VIT University, Vellore, India
|
13
|
Kumar R, Yadav G, Kuddus M, Ashraf GM, Singh R. Unlocking the microbial studies through computational approaches: how far have we reached? ENVIRONMENTAL SCIENCE AND POLLUTION RESEARCH INTERNATIONAL 2023; 30:48929-48947. [PMID: 36920617 PMCID: PMC10016191 DOI: 10.1007/s11356-023-26220-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/23/2022] [Accepted: 02/24/2023] [Indexed: 04/16/2023]
Abstract
The metagenomics approach accelerated the study of genetic information from uncultured microbes and complex microbial communities. In silico research also facilitated an understanding of protein-DNA interactions, protein-protein interactions, docking between proteins and phyto/biochemicals for drug design, and modeling of the 3D structure of proteins. These in silico approaches provided insight into analyzing pathogenic and nonpathogenic strains that helped in the identification of probable genes for vaccines and antimicrobial agents and comparing whole-genome sequences to microbial evolution. Artificial intelligence, more precisely machine learning (ML) and deep learning (DL), has proven to be a promising approach in the field of microbiology to handle, analyze, and utilize large data that are generated through nucleic acid sequencing and proteomics. This enabled the understanding of the functional and taxonomic diversity of microorganisms. ML and DL have been used in the prediction and forecasting of diseases and applied to trace environmental contaminants and environmental quality. This review presents an in-depth analysis of the recent application of in silico approaches in microbial genomics, proteomics, functional diversity, vaccine development, and drug design.
Affiliation(s)
- Rajnish Kumar
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
  - Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, MO, USA
- Garima Yadav
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
- Mohammed Kuddus
  - Department of Biochemistry, College of Medicine, University of Hail, Hail, Saudi Arabia
- Ghulam Md Ashraf
  - Department of Medical Laboratory Sciences, College of Health Sciences, and Sharjah Institute for Medical Research, University of Sharjah, Sharjah, 27272, United Arab Emirates
- Rachana Singh
  - Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
|
14
|
Libiseller-Egger J, Wang L, Deelder W, Campino S, Clark TG, Phelan JE. TB-ML-a framework for comparing machine learning approaches to predict drug resistance of Mycobacterium tuberculosis. BIOINFORMATICS ADVANCES 2023; 3:vbad040. [PMID: 37033466 PMCID: PMC10074023 DOI: 10.1093/bioadv/vbad040] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/13/2022] [Revised: 03/03/2023] [Accepted: 03/22/2023] [Indexed: 04/08/2023]
Abstract
Motivation: Machine learning (ML) has shown impressive performance in predicting antimicrobial resistance (AMR) from sequence data, including for Mycobacterium tuberculosis, the causative agent of tuberculosis. However, current ML development and publication practices make it difficult for researchers and clinicians to use, test or reproduce published models. Results: We packaged a number of published and unpublished ML models for predicting AMR of M. tuberculosis into Docker containers. Similarly, the pipelines required for pre-processing genomic data into the formats required by the models were also packaged into separate containers. By following a minimal container I/O standard, we ensured as much interoperability as possible. We also created a command-line application, TB-ML, which can be used to easily combine pre-processing and prediction containers into complete pipelines ready for predicting resistance from novel, raw data with a single command. As long as there is adherence to this minimal standard for the container interface, containers produced by researchers holding new models can likewise be included in these pipelines, making benchmark comparisons of different models simple and facilitating faster uptake in the clinic. Availability and implementation: TB-ML contains a simple Docker API written in Python and is available at https://github.com/jodyphelan/tb-ml. Example Docker containers for resistance prediction and corresponding data pre-processing as well as a tutorial on how to create new containers for TB-ML are available at https://tb-ml.github.io/tb-ml-containers/. Contact: jody.phelan@lshtm.ac.uk.
Affiliation(s)
- Julian Libiseller-Egger
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
- Linfeng Wang
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
- Wouter Deelder
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
- Susana Campino
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
- Taane G Clark
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
  - Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
- Jody E Phelan
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene & Tropical Medicine, London WC1E 7HT, UK
|
15
|
Velichko A, Huyut MT, Belyaev M, Izotov Y, Korzun D. Machine Learning Sensors for Diagnosis of COVID-19 Disease Using Routine Blood Values for Internet of Things Application. SENSORS (BASEL, SWITZERLAND) 2022; 22:7886. [PMID: 36298235 PMCID: PMC9610709 DOI: 10.3390/s22207886] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 09/15/2022] [Revised: 10/10/2022] [Accepted: 10/14/2022] [Indexed: 05/16/2023]
Abstract
Healthcare digitalization requires effective applications of human sensors, when various parameters of the human body are instantly monitored in everyday life due to the Internet of Things (IoT). In particular, machine learning (ML) sensors for the prompt diagnosis of COVID-19 are an important option for IoT application in healthcare and ambient assisted living (AAL). Determining a COVID-19 infected status with various diagnostic tests and imaging results is costly and time-consuming. This study provides a fast, reliable and cost-effective alternative tool for the diagnosis of COVID-19 based on the routine blood values (RBVs) measured at admission. The dataset of the study consists of a total of 5296 patients with the same number of negative and positive COVID-19 test results and 51 routine blood values. In this study, 13 popular classifier machine learning models and the LogNNet neural network model were examined. The most successful classifier model in terms of time and accuracy in the detection of the disease was the histogram-based gradient boosting (HGB) (accuracy: 100%, time: 6.39 sec). The HGB classifier identified the 11 most important features (LDL, cholesterol, HDL-C, MCHC, triglyceride, amylase, UA, LDH, CK-MB, ALP and MCH) to detect the disease with 100% accuracy. In addition, the importance of single, double and triple combinations of these features in the diagnosis of the disease was discussed. We propose to use these 11 features and their binary combinations as important biomarkers for ML sensors in the diagnosis of the disease, supporting edge computing on Arduino and cloud IoT service.
Affiliation(s)
- Andrei Velichko
  - Institute of Physics and Technology, Petrozavodsk State University, 33 Lenin Ave., 185910 Petrozavodsk, Russia
- Mehmet Tahir Huyut
  - Department of Biostatistics and Medical Informatics, Faculty of Medicine, Erzincan Binali Yıldırım University, 24000 Erzincan, Türkiye
- Maksim Belyaev
  - Institute of Physics and Technology, Petrozavodsk State University, 33 Lenin Ave., 185910 Petrozavodsk, Russia
- Yuriy Izotov
  - Institute of Physics and Technology, Petrozavodsk State University, 33 Lenin Ave., 185910 Petrozavodsk, Russia
- Dmitry Korzun
  - Department of Computer Science, Institute of Mathematics and Information Technology, Petrozavodsk State University, 33 Lenin Ave., 185910 Petrozavodsk, Russia
|
16
|
Tzelves L, Lazarou L, Feretzakis G, Kalles D, Mourmouris P, Loupelis E, Basourakos S, Berdempes M, Manolitsis I, Mitsogiannis I, Skolarikos A, Varkarakis I. Using machine learning techniques to predict antimicrobial resistance in stone disease patients. World J Urol 2022; 40:1731-1736. [PMID: 35616713 DOI: 10.1007/s00345-022-04043-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 02/07/2022] [Accepted: 05/02/2022] [Indexed: 10/18/2022]
Abstract
PURPOSE Artificial intelligence is part of our daily life and machine learning techniques offer possibilities unknown until now in medicine. This study aims to evaluate the performance of machine learning (ML) techniques for predicting bacterial resistance in a urology department. METHODS Data were retrieved from the laboratory information system (LIS) concerning 239 patients with urolithiasis hospitalized in the urology department of a tertiary hospital over a 1-year period (2019): age, gender, Gram stain (positive, negative), bacterial species, sample type, antibiotics and antimicrobial susceptibility. In our experiments, we compared several classifiers following a tenfold cross-validation approach on 2 different versions of our dataset; the first contained only information on Gram stain, while the second had knowledge of bacterial species. RESULTS The best results in the balanced dataset containing Gram stain achieved a weighted average receiver operating characteristic (ROC) area of 0.768 and F-measure of 0.708, using a multinomial logistic regression model with a ridge estimator. The corresponding results for the balanced dataset that contained bacterial species achieved a weighted average ROC area of 0.874 and F-measure of 0.783, with a bagging classifier. CONCLUSIONS Artificial intelligence technology can be used for making predictions on antibiotic resistance patterns with an accuracy of 77% when Gram staining is known and nearly 87% when the specific microorganism is identified. This knowledge can aid urologists in prescribing the appropriate antibiotic 24-48 h before test results are known.
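The weighted average F-measure reported in this abstract can be illustrated with a short sketch. The abstract does not state how the average is weighted; the code below assumes the common convention of weighting each class's F-measure by its share of the true labels. The function name and the toy labels ("R" for resistant, "S" for susceptible) are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_f_measure(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores.

    Assumption: each class is weighted by its frequency in y_true;
    the cited abstract does not specify its weighting scheme.
    """
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp  # true members of this class that were missed
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += (n / total) * f1
    return score


# Toy example with made-up labels, not data from the study.
truth = ["R", "R", "S", "S"]
preds = ["R", "S", "S", "S"]
print(round(weighted_f_measure(truth, preds), 4))
```

In a tenfold cross-validation setting, a score like this would typically be computed on each held-out fold and then averaged.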
Affiliation(s)
- Lazaros Tzelves
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Lazaros Lazarou
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Georgios Feretzakis
  - School of Science and Technology, Hellenic Open University, Patras, Greece
  - Department of Quality Control, Research and Continuing Education, Sismanogleio General Hospital, Marousi, Greece
  - Information Technologies Department, Sismanogleio General Hospital, Marousi, Greece
- Dimitris Kalles
  - School of Science and Technology, Hellenic Open University, Patras, Greece
- Panagiotis Mourmouris
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Evangelos Loupelis
  - Information Technologies Department, Sismanogleio General Hospital, Marousi, Greece
- Spyridon Basourakos
  - Department of Urology, New York Presbyterian Hospital/Weill Cornell Medicine, New York, NY, USA
- Marinos Berdempes
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Ioannis Manolitsis
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Iraklis Mitsogiannis
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Andreas Skolarikos
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
- Ioannis Varkarakis
  - 2nd Department of Urology, Sismanogleio General Hospital, National and Kapodistrian University of Athens, Sismanogleiou 37, Marousi, 15126, Athens, Greece
|
17
|
Walker TM, Crook DW. Realising the Potential of Genomics for M. tuberculosis: A Silver Lining to the Pandemic? China CDC Wkly 2022; 4:437-439. [PMID: 35685689 PMCID: PMC9167617 DOI: 10.46234/ccdcw2022.063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Accepted: 03/23/2022] [Indexed: 11/06/2022]
Affiliation(s)
- Timothy M Walker
  - Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Derrick W Crook
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
|
18
|
Jiang Z, Lu Y, Liu Z, Wu W, Xu X, Dinnyés A, Yu Z, Chen L, Sun Q. Drug resistance prediction and resistance genes identification in Mycobacterium tuberculosis based on a hierarchical attentive neural network utilizing genome-wide variants. Brief Bioinform 2022; 23:6553603. [PMID: 35325021 DOI: 10.1093/bib/bbac041] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/01/2021] [Revised: 01/18/2022] [Accepted: 01/27/2022] [Indexed: 01/25/2023]
Abstract
Prediction of antimicrobial resistance based on whole-genome sequencing data has attracted greater attention due to its rapidity and convenience. Numerous machine learning-based studies have used genetic variants to predict drug resistance in Mycobacterium tuberculosis (MTB), assuming that variants are homogeneous, and most of these studies, however, have ignored the essential correlation between variants and corresponding genes when encoding variants, and used a limited number of variants as prediction input. In this study, taking advantage of genome-wide variants for drug-resistance prediction and inspired by natural language processing, we summarize drug resistance prediction into document classification, in which variants are considered as words, mutated genes in an isolate as sentences, and an isolate as a document. We propose a novel hierarchical attentive neural network model (HANN) that helps discover drug resistance-related genes and variants and acquire more interpretable biological results. It captures the interaction among variants in a mutated gene as well as among mutated genes in an isolate. Our results show that for the four first-line drugs of isoniazid (INH), rifampicin (RIF), ethambutol (EMB) and pyrazinamide (PZA), the HANN achieves the optimal area under the ROC curve of 97.90, 99.05, 96.44 and 95.14% and the optimal sensitivity of 94.63, 96.31, 92.56 and 87.05%, respectively. In addition, without any domain knowledge, the model identifies drug resistance-related genes and variants consistent with those confirmed by previous studies, and more importantly, it discovers one more potential drug-resistance-related gene.
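The document analogy described in this abstract (variants as words, mutated genes as sentences, an isolate as a document) can be sketched as a small data-structure transformation. This covers only the input encoding, not the attention network itself. The function name, the tuple layout and the example variant calls are illustrative assumptions, not the authors' code or data.

```python
from collections import defaultdict


def isolate_to_document(variant_calls):
    """Group an isolate's variants by gene.

    variant_calls: iterable of (gene, variant) pairs (hypothetical format).
    Returns a 'document': a list of (gene, [variants]) 'sentences',
    where each variant plays the role of a 'word'.
    """
    by_gene = defaultdict(list)
    for gene, variant in variant_calls:
        by_gene[gene].append(variant)
    # Sort for a deterministic layout; a real model might order genes
    # by genome position instead.
    return [(gene, sorted(words)) for gene, words in sorted(by_gene.items())]


# Illustrative variant calls for a single isolate.
calls = [("rpoB", "S450L"), ("katG", "S315T"), ("rpoB", "D435V")]
for gene, words in isolate_to_document(calls):
    print(gene, words)
```

A hierarchical model can then attend first over the words within each sentence and then over the sentences within the document, which is what allows gene-level and variant-level importance to be read out separately.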
Affiliation(s)
- Zhonghua Jiang
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Yongmei Lu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Zhuochong Liu
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Wei Wu
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- Xinyi Xu
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
- András Dinnyés
  - BioTalentum Ltd. Aulich Lajos str. 26. 2100 Gödöllő, Hungary
- Zhonghua Yu
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Li Chen
  - College of Computer Science, Sichuan University, Chengdu, Sichuan 610065, China
- Qun Sun
  - Key Laboratory of Bio-resources and Eco-environment of the Ministry of Education, College of Life Sciences, Sichuan University, Chengdu, Sichuan 610064, China
|
19
|
Müller SJ, Meraba RL, Dlamini GS, Mapiye DS. First-line drug resistance profiling of Mycobacterium tuberculosis: a machine learning approach. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2022; 2021:891-899. [PMID: 35309001 PMCID: PMC8861754] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The persistence and emergence of new multi-drug resistant Mycobacterium tuberculosis (M. tb) strains continues to advance the devastating tuberculosis (TB) epidemic. Robust systems are needed to accurately and rapidly perform drug-resistance profiling, and machine learning (ML) methods combined with genomic sequence data may provide novel insights into drug-resistance mechanisms. Using 372 M. tb isolates, the combined utility of ML and bioinformatics to perform drug-resistance profiling is demonstrated. SNPs, InDels, and dinucleotide frequencies are explored as input features for three ML models, namely Decision Trees, Random Forest, and the eXtreme Gradient Boosted model. Using SNPs and InDels, all three models performed equally well yielding a 99% accuracy, 97% recall, and 99% F1-score. Using dinucleotide frequencies, the XGBoost algorithm was superior with a 97% accuracy, 94% recall and 97% F1-score. This study validates the use of variants and presents dinucleotide features as another effective feature encoding method for ML-based phenotype classification.
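Dinucleotide frequencies, named in this abstract as an alternative feature encoding, are straightforward to compute from a sequence. The following is a minimal sketch assuming overlapping dinucleotides over the alphabet A, C, G, T, with pairs containing ambiguous bases skipped; the abstract does not describe the authors' exact procedure, and the function name is hypothetical.

```python
from itertools import product


def dinucleotide_frequencies(sequence):
    """Return the relative frequency of each of the 16 dinucleotides.

    Assumptions: overlapping windows of length 2, case-insensitive,
    and any pair containing a non-ACGT symbol (e.g. N) is ignored.
    """
    counts = {a + b: 0 for a, b in product("ACGT", repeat=2)}
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


features = dinucleotide_frequencies("ACGTAC")
print(features["AC"])  # AC occurs in 2 of the 5 overlapping pairs
```

The resulting 16 values form a fixed-length feature vector regardless of genome length, which is what makes this encoding convenient as input to tree-based classifiers such as those named in the abstract.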
|
20
|
Kuang X, Wang F, Hernandez KM, Zhang Z, Grossman RL. Accurate and rapid prediction of tuberculosis drug resistance from genome sequence data using traditional machine learning algorithms and CNN. Sci Rep 2022; 12:2427. [PMID: 35165358 PMCID: PMC8844416 DOI: 10.1038/s41598-022-06449-4] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 09/23/2021] [Accepted: 01/31/2022] [Indexed: 12/04/2022]
Abstract
Effective and timely antibiotic treatment depends on accurate and rapid in silico antimicrobial-resistant (AMR) predictions. Existing statistical rule-based Mycobacterium tuberculosis (MTB) drug resistance prediction methods using bacterial genomic sequencing data often achieve varying results: high accuracy on some antibiotics but relatively low accuracy on others. Traditional machine learning (ML) approaches have been applied to classify drug resistance for MTB and have shown more stable performance. However, there is no study that uses deep learning architecture like Convolutional Neural Network (CNN) on a large and diverse cohort of MTB samples for AMR prediction. We developed 24 binary classifiers of MTB drug resistance status across eight anti-MTB drugs and three different ML algorithms: logistic regression, random forest and 1D CNN using a training dataset of 10,575 MTB isolates collected from 16 countries across six continents, where an extended pan-genome reference was used for detecting genetic features. Our 1D CNN architecture was designed to integrate both sequential and non-sequential features. In terms of F1-scores, 1D CNN models are our best classifiers that are also more accurate and stable than the state-of-the-art rule-based tool Mykrobe predictor (81.1 to 93.8%, 93.7 to 96.2%, 93.1 to 94.8%, 95.9 to 97.2% and 97.1 to 98.2% for ethambutol, rifampicin, pyrazinamide, isoniazid and ofloxacin respectively). We applied filter-based feature selection to find AMR relevant features. All selected variant features are AMR-related ones in CARD database. 78.8% of them are also in the catalogue of MTB mutations that were recently identified as drug resistance-associated ones by WHO. To facilitate ML model development for AMR prediction, we packaged every step into an automated pipeline and shared the source code at https://github.com/KuangXY3/MTB-AMR-classification-CNN.
|
21
|
Deelder W, Napier G, Campino S, Palla L, Phelan J, Clark TG. A modified decision tree approach to improve the prediction and mutation discovery for drug resistance in Mycobacterium tuberculosis. BMC Genomics 2022; 23:46. [PMID: 35016609 PMCID: PMC8753810 DOI: 10.1186/s12864-022-08291-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 06/25/2021] [Accepted: 01/03/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Drug-resistant Mycobacterium tuberculosis is complicating the effective treatment and control of tuberculosis disease (TB). With the adoption of whole genome sequencing as a diagnostic tool, machine learning approaches are being employed to predict M. tuberculosis resistance and identify underlying genetic mutations. However, machine learning approaches can overfit and fail to identify causal mutations if they are applied out of the box and not adapted to the disease-specific context. We introduce a machine learning approach that is customized to the TB setting, which extracts a library of genomic variants re-occurring across individual studies to improve genotypic profiling. RESULTS We developed a customized decision tree approach, called Treesist-TB, that performs TB drug resistance prediction by extracting and evaluating genomic variants across multiple studies. The application of Treesist-TB to rifampicin (RIF), isoniazid (INH) and ethambutol (EMB), drugs for which resistance mutations are known, demonstrated a level of predictive accuracy similar to the widely used TB-Profiler tool (Treesist-TB vs. TB-Profiler tool: RIF 97.5% vs. 97.6%; INH 96.8% vs. 96.5%; EMB 96.8% vs. 95.8%). Application of Treesist-TB to less understood second-line drugs of interest, ethionamide (ETH), cycloserine (CYS) and para-aminosalicylic acid (PAS), led to the identification of new variants (52, 6 and 11, respectively), with a high number absent from the TB-Profiler library (45, 4, and 6, respectively). Thereby, Treesist-TB had improved predictive sensitivity (Treesist-TB vs. TB-Profiler tool: PAS 64.3% vs. 38.8%; CYS 45.3% vs. 30.7%; ETH 72.1% vs. 71.1%). CONCLUSION Our work reinforces the utility of machine learning for drug resistance prediction, while highlighting the need to customize approaches to the disease-specific context. Through applying a modified decision learning approach (Treesist-TB) across a range of anti-TB drugs, we identified plausible resistance-encoding genomic variants with high predictive ability, whilst potentially overcoming the overfitting challenges that can affect standard machine learning applications.
Affiliation(s)
- Wouter Deelder
- London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Dalberg Advisors, 7 Rue de Chantepoulet, CH-1201, Geneva, Switzerland
| | - Gary Napier
- London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
| | - Susana Campino
- London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
| | - Luigi Palla
- London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
- Department of Public Health and Infectious Diseases, University of Rome La Sapienza, Rome, Italy
| | - Jody Phelan
- London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK
| | - Taane G Clark
- London School of Hygiene & Tropical Medicine, Keppel Street, London, WC1E 7HT, UK.
- Department of Infection Biology, Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK.
|
22
|
Ren Y, Chakraborty T, Doijad S, Falgenhauer L, Falgenhauer J, Goesmann A, Schwengers O, Heider D. Multi-label classification for multi-drug resistance prediction of Escherichia coli. Comput Struct Biotechnol J 2022; 20:1264-1270. [PMID: 35317240 PMCID: PMC8918850 DOI: 10.1016/j.csbj.2022.03.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 01/31/2022] [Revised: 03/08/2022] [Accepted: 03/08/2022] [Indexed: 11/03/2022] Open
|
23
|
Nwanosike EM, Conway BR, Merchant HA, Hasan SS. Potential applications and performance of machine learning techniques and algorithms in clinical practice: A systematic review. Int J Med Inform 2021; 159:104679. [PMID: 34990939 DOI: 10.1016/j.ijmedinf.2021.104679] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 05/29/2021] [Revised: 12/08/2021] [Accepted: 12/27/2021] [Indexed: 12/11/2022]
Abstract
PURPOSE The advent of clinically adapted machine learning algorithms can solve numerous problems ranging from disease diagnosis and prognosis to therapy recommendations. This systematic review examines the performance of machine learning (ML) algorithms and evaluates the progress made to date towards their implementation in clinical practice. METHODS Databases (PubMed, MEDLINE, Scopus, Google Scholar, Cochrane Library and WHO Covid-19 database) were systematically searched to identify original articles published between January 2011 and October 2021. Studies reporting ML techniques in clinical practice involving humans and ML algorithms with a performance metric were considered. RESULTS Of 873 unique articles identified, 36 studies were eligible for inclusion. The XGBoost (extreme gradient boosting) algorithm showed the highest potential for clinical applications (n = 7 studies); this was followed jointly by the random forest algorithm, logistic regression, and the support vector machine (n = 5 studies each). Prediction of outcomes (n = 33), in particular for inflammatory diseases (n = 7), received the most attention, followed by cancer and neuropsychiatric disorders (n = 5 for each) and Covid-19 (n = 4). Thirty-three of the thirty-six included studies passed more than 50% of the selected quality assessment criteria in the TRIPOD checklist. In contrast, none of the studies achieved an ideal overall bias rating of 'low' based on the PROBAST checklist. Moreover, only three studies showed evidence of the deployment of ML algorithm(s) in clinical practice. CONCLUSIONS ML is potentially a reliable tool for clinical decision support. Although advocated widely in clinical practice, work is still in progress to validate clinically adapted ML algorithms. Improving quality standards, transparency, and interpretability of ML models will further lower the barriers to acceptability.
Affiliation(s)
- Ezekwesiri Michael Nwanosike
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
| | - Barbara R Conway
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
| | - Hamid A Merchant
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom
| | - Syed Shahzad Hasan
- Department of Pharmacy, School of Applied Sciences, University of Huddersfield, Queensgate Huddersfield HD1 3DH, West Yorkshire, United Kingdom; School of Biomedical Sciences & Pharmacy, University of Newcastle, Callaghan, Australia.
|
24
|
Yang Y, Walker TM, Kouchaki S, Wang C, Peto TEA, Crook DW, Clifton DA. An end-to-end heterogeneous graph attention network for Mycobacterium tuberculosis drug-resistance prediction. Brief Bioinform 2021; 22:6355133. [PMID: 34414415 PMCID: PMC8575050 DOI: 10.1093/bib/bbab299] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/03/2021] [Revised: 06/28/2021] [Accepted: 07/16/2021] [Indexed: 11/23/2022] Open
Abstract
Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method—heterogeneous graph attention network (‘HGAT–AMR’)—to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs, which the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.
Affiliation(s)
- Yang Yang
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, OX3 7DQ, UK.,Oxford-Suzhou Centre for Advanced Research, Suzhou, 215123, China
| | - Timothy M Walker
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
| | - Samaneh Kouchaki
- Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, UK
| | - Chenyang Wang
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, OX3 7DQ, UK
| | - Timothy E A Peto
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital Headley Way, OX3 9DU, Oxford, UK
| | - Derrick W Crook
- Nuffield Department of Medicine, University of Oxford, John Radcliffe Hospital Headley Way, OX3 9DU, Oxford, UK.,NIHR Oxford Biomedical Research Centre, John Radcliffe Hospital, Headley Way Headington, OX3 9DU, Oxford, UK
| | | | - David A Clifton
- Institute of Biomedical Engineering, Department of Engineering Science, University of Oxford, Oxford, OX3 7DQ, UK.,Oxford-Suzhou Centre for Advanced Research, Suzhou, 215123, China
|
25
|
Li D, Wang Y, Hu W, Chen F, Zhao J, Chen X, Han L. Application of Machine Learning Classifier to Candida auris Drug Resistance Analysis. Front Cell Infect Microbiol 2021; 11:742062. [PMID: 34722336 PMCID: PMC8554202 DOI: 10.3389/fcimb.2021.742062] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/15/2021] [Accepted: 09/22/2021] [Indexed: 12/30/2022] Open
Abstract
Candida auris (C. auris) is an emerging fungus associated with high morbidity. It has a unique transmission ability and is often resistant to multiple drugs. In this study, we evaluated the ability of different machine learning models to classify drug resistance, and we predicted and ranked the drug resistance mutations of C. auris. Two C. auris strains were obtained. Combined with 356 other strains collected from the European Bioinformatics Institute (EBI) databases, their whole genome sequencing (WGS) data were analyzed using bioinformatics methods. Machine learning classifiers were used to build drug resistance models, which were evaluated and compared by various evaluation methods based on AUC value. Briefly, the two strains were assigned to Clade III in the phylogenetic tree, which was consistent with previous studies; nevertheless, the phylogenetic tree was not completely consistent with the previously reported clustering according to geographical location. The clustering results of C. auris were related to its drug resistance. The resistance genes of C. auris were not under additional strong selection pressure, and the performance of different models varied greatly for different drugs. For drugs such as azoles and echinocandins, the models performed relatively well. In addition, two machine learning algorithms, based on the balanced test and imbalanced test, were designed and evaluated; for most drugs, the evaluation results on the balanced test set were better than on the imbalanced test set. The mutations strongly associated with drug resistance of C. auris were predicted and ranked by Recursive Feature Elimination with Cross-Validation (RFECV) combined with a machine learning classifier. In addition to known drug resistance mutations, some new resistance mutations were predicted, such as the Y501H and I466M mutations in the ERG11 gene and the R278H mutation in the ERG10 gene, which may be associated with fluconazole (FCZ), micafungin (MCF), and amphotericin B (AmB) resistance, respectively; these mutations were in the “hot spot” regions of the ergosterol pathway. To sum up, this study suggested that machine learning classifiers are a useful and cost-effective method to identify fungal drug resistance-related mutations, which is of great significance for research on the resistance mechanism of C. auris.
Affiliation(s)
- Dingchen Li
- Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China
| | - Yaru Wang
- Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China.,School of Mathematics and Statistics, Shaanxi Normal University, Xi'an, China
| | - Wenjuan Hu
- Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China.,School of Mathematics and Statistics, Shaanxi Normal University, Xi'an, China
| | - Fangyan Chen
- Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China
| | - Jingya Zhao
- Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China
| | - Xia Chen
- School of Mathematics and Statistics, Shaanxi Normal University, Xi'an, China
| | - Li Han
- Department of Disinfection and Infection Control, Chinese People's Liberation Army (PLA) Center for Disease Control and Prevention, Beijing, China
|
26
|
He S, Leanse LG, Feng Y. Artificial intelligence and machine learning assisted drug delivery for effective treatment of infectious diseases. Adv Drug Deliv Rev 2021; 178:113922. [PMID: 34461198 DOI: 10.1016/j.addr.2021.113922] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 05/01/2021] [Revised: 07/14/2021] [Accepted: 08/09/2021] [Indexed: 12/23/2022]
Abstract
In the era of antimicrobial resistance, the prevalence of multidrug-resistant microorganisms that resist conventional antibiotic treatment has steadily increased. Thus, it is now unquestionable that infectious diseases are significant global burdens that urgently require innovative treatment strategies. Emerging studies have demonstrated that artificial intelligence (AI) can transform drug delivery to promote effective treatment of infectious diseases. In this review, we propose to evaluate the significance, essential principles, and popular tools of AI in drug delivery for infectious disease treatment. Specifically, we will focus on the achievements and key findings of current research, as well as the applications of AI on drug delivery throughout the whole antimicrobial treatment process, with an emphasis on drug development, treatment regimen optimization, drug delivery system and administration route design, and drug delivery outcome prediction. To that end, the challenges of AI in drug delivery for infectious disease treatments and their current solutions and future perspective will be presented and discussed.
Affiliation(s)
- Sheng He
- Boston Children's Hospital, Harvard Medical School, Harvard University, Boston, MA, USA.
| | - Leon G Leanse
- Massachusetts General Hospital, Harvard Medical School, Harvard University, Boston, MA, USA
| | - Yanfang Feng
- Massachusetts General Hospital, Harvard Medical School, Harvard University, Boston, MA, USA.
|
27
|
Mugumbate G, Nyathi B, Zindoga A, Munyuki G. Application of Computational Methods in Understanding Mutations in Mycobacterium tuberculosis Drug Resistance. Front Mol Biosci 2021; 8:643849. [PMID: 34651013 PMCID: PMC8505691 DOI: 10.3389/fmolb.2021.643849] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/18/2020] [Accepted: 08/16/2021] [Indexed: 11/23/2022] Open
Abstract
The emergence of drug-resistant strains of Mycobacterium tuberculosis (Mtb) impedes the End TB Strategy by the World Health Organization aiming for zero deaths, disease, and suffering at the hands of tuberculosis (TB). Mutations within anti-TB drug targets play a major role in conferring drug resistance within Mtb; hence, computational methods and tools are being used to understand the mechanisms by which they facilitate drug resistance. In this article, computational techniques such as molecular docking and molecular dynamics are applied to explore point mutations and their roles in affecting binding affinities for anti-TB drugs, often times lowering the protein’s affinity for the drug. Advances and adoption of computational techniques, chemoinformatics, and bioinformatics in molecular biosciences and resources supporting machine learning techniques are in abundance, and this has seen a spike in its use to predict mutations in Mtb. This article highlights the importance of molecular modeling in deducing how point mutations in proteins confer resistance through destabilizing binding sites of drugs and effectively inhibiting the drug action.
Affiliation(s)
- Grace Mugumbate
- Department of Chemical Sciences, Midlands State University, Gweru, Zimbabwe
| | - Brilliant Nyathi
- Department of Chemistry, Chinhoyi University of Technology, Chinhoyi, Zimbabwe
| | - Albert Zindoga
- Department of Chemistry, Chinhoyi University of Technology, Chinhoyi, Zimbabwe
| | - Gadzikano Munyuki
- Department of Chemistry, Chinhoyi University of Technology, Chinhoyi, Zimbabwe
|
28
|
Ren Y, Chakraborty T, Doijad S, Falgenhauer L, Falgenhauer J, Goesmann A, Hauschild AC, Schwengers O, Heider D. Prediction of antimicrobial resistance based on whole-genome sequencing and machine learning. Bioinformatics 2021; 38:325-334. [PMID: 34613360 PMCID: PMC8722762 DOI: 10.1093/bioinformatics/btab681] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 06/29/2021] [Revised: 08/27/2021] [Accepted: 09/24/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION Antimicrobial resistance (AMR) is one of the biggest global problems threatening human and animal health. Rapid and accurate AMR diagnostic methods are thus very urgently needed. However, traditional antimicrobial susceptibility testing (AST) is time-consuming, low throughput and viable only for cultivable bacteria. Machine learning methods may pave the way for automated AMR prediction based on genomic data of the bacteria. However, comparing different machine learning methods for the prediction of AMR based on different encodings and whole-genome sequencing data without previously known knowledge remains to be done. RESULTS In this study, we evaluated logistic regression (LR), support vector machine (SVM), random forest (RF) and convolutional neural network (CNN) for the prediction of AMR for the antibiotics ciprofloxacin, cefotaxime, ceftazidime and gentamicin. We could demonstrate that these models can effectively predict AMR with label encoding, one-hot encoding and frequency matrix chaos game representation (FCGR encoding) on whole-genome sequencing data. We trained these models on a large AMR dataset and evaluated them on an independent public dataset. Generally, RFs and CNNs perform better than LR and SVM with AUCs up to 0.96. Furthermore, we were able to identify mutations that are associated with AMR for each antibiotic. AVAILABILITY AND IMPLEMENTATION Source code in data preparation and model training are provided at GitHub website (https://github.com/YunxiaoRen/ML-iAMR). SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
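The three encodings named in the abstract above (label, one-hot and FCGR) can be illustrated on a toy nucleotide string. This is a minimal sketch under stated assumptions, not the ML-iAMR implementation: the integer codes in `LABELS`, the corner layout in `CORNERS`, and all function names are illustrative choices, and the study applies its encodings to variant data rather than to a raw sequence like the one used here.

```python
# Minimal sketch (assumed conventions, not the authors' code) of label encoding,
# one-hot encoding and a frequency chaos game representation (FCGR).

LABELS = {"A": 1, "C": 2, "G": 3, "T": 4}  # assumption: 0 marks any other symbol
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}  # assumed CGR layout


def label_encode(seq):
    """Map each nucleotide to a single integer code."""
    return [LABELS.get(base, 0) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each nucleotide to a 4-element indicator vector in A, C, G, T order."""
    return [[1 if base == ref else 0 for ref in "ACGT"] for base in seq.upper()]


def fcgr(seq, k):
    """Count every k-mer into a 2^k x 2^k grid addressed by the CGR corner bits."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue  # skip k-mers containing ambiguous symbols such as N
        row = col = 0
        for base in kmer:
            r, c = CORNERS[base]
            row = row * 2 + r  # each base halves the remaining square
            col = col * 2 + c
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    print(label_encode("ACGN"))
    print(one_hot_encode("AC"))
    print(fcgr("ACGTACGT", 2))
```

The FCGR grid has a fixed shape for a given k regardless of sequence length, which is what makes it convenient as image-like input for a CNN.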
Affiliation(s)
- Yunxiao Ren
- Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg 35032, Germany
| | - Trinad Chakraborty
- Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen 35392, Germany,German Center for Infection Research, Partner site Giessen-Marburg-Langen, Giessen 35392, Germany
| | - Swapnil Doijad
- Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen 35392, Germany,German Center for Infection Research, Partner site Giessen-Marburg-Langen, Giessen 35392, Germany
| | - Linda Falgenhauer
- German Center for Infection Research, Partner site Giessen-Marburg-Langen, Giessen 35392, Germany,Institute of Hygiene and Environmental Medicine, Justus Liebig University Giessen, Giessen 35392, Germany,Hessisches universitäres Kompetenzzentrum Krankenhaushygiene, Giessen 35392, Germany
| | - Jane Falgenhauer
- Institute of Medical Microbiology, Justus Liebig University Giessen, Giessen 35392, Germany,German Center for Infection Research, Partner site Giessen-Marburg-Langen, Giessen 35392, Germany
| | - Alexander Goesmann
- German Center for Infection Research, Partner site Giessen-Marburg-Langen, Giessen 35392, Germany,Department of Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen 35392, Germany
| | - Anne-Christin Hauschild
- Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, Philipps-University of Marburg, Marburg 35032, Germany
| | - Oliver Schwengers
- German Center for Infection Research, Partner site Giessen-Marburg-Langen, Giessen 35392, Germany,Department of Bioinformatics and Systems Biology, Justus Liebig University Giessen, Giessen 35392, Germany
|
29
|
Zabeti H, Dexter N, Safari AH, Sedaghat N, Libbrecht M, Chindelevitch L. INGOT-DR: an interpretable classifier for predicting drug resistance in M. tuberculosis. Algorithms Mol Biol 2021; 16:17. [PMID: 34376217 PMCID: PMC8353837 DOI: 10.1186/s13015-021-00198-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 02/11/2021] [Accepted: 07/23/2021] [Indexed: 12/13/2022] Open
Abstract
Motivation Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data. Contribution In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time. Results We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via The Python Package Index (Pypi) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.
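The group-testing idea in the abstract above can be read as an interpretable rule of the form "resistant if any selected variant is present". The sketch below illustrates that reading only and is not the INGOT-DR algorithm: the abstract does not describe how the rule is optimized, so a simple greedy cover is swapped in here, and the function names and variant labels (`v1`, `v2`, `v3`) are hypothetical.

```python
# Minimal sketch of a disjunctive ("any of these variants") resistance rule.
# The greedy fitting step is a stand-in technique, not the published method.


def fit_disjunctive_rule(isolates, labels, max_features):
    """Greedily pick variants seen only in resistant isolates (label 1).

    isolates: list of sets of variant names present in each isolate.
    labels:   list of 0 (susceptible) or 1 (resistant).
    """
    seen_in_susceptible = set().union(
        *(x for x, y in zip(isolates, labels) if y == 0)
    )
    candidates = set().union(*isolates) - seen_in_susceptible
    uncovered = {i for i, y in enumerate(labels) if y == 1}
    rule = []
    while uncovered and candidates and len(rule) < max_features:
        # Choose the variant explaining the most still-unexplained resistant isolates;
        # sorting first makes tie-breaking deterministic.
        best = max(
            sorted(candidates),
            key=lambda v: sum(1 for i in uncovered if v in isolates[i]),
        )
        gained = {i for i in uncovered if best in isolates[i]}
        if not gained:
            break
        rule.append(best)
        candidates.discard(best)
        uncovered -= gained
    return rule


def predict(rule, isolate):
    """Return 1 (resistant) if the isolate carries any variant in the rule, else 0."""
    return int(any(v in isolate for v in rule))


if __name__ == "__main__":
    isolates = [{"v1", "v3"}, {"v2"}, {"v3"}, set()]
    labels = [1, 1, 0, 0]
    rule = fit_disjunctive_rule(isolates, labels, max_features=5)
    print(rule, [predict(rule, x) for x in isolates])
```

A rule of this shape is interpretable by construction, since every positive prediction can be traced to a named variant; `max_features` caps how long the rule may grow.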
|
30
|
Goodswen SJ, Barratt JLN, Kennedy PJ, Kaufer A, Calarco L, Ellis JT. Machine learning and applications in microbiology. FEMS Microbiol Rev 2021; 45:6174022. [PMID: 33724378 PMCID: PMC8498514 DOI: 10.1093/femsre/fuab015] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 01/16/2021] [Accepted: 02/28/2021] [Indexed: 12/15/2022] Open
Abstract
To understand the intricacies of microorganisms at the molecular level requires making sense of copious volumes of data such that it may now be humanly impossible to detect insightful data patterns without an artificial intelligence application called machine learning. Applying machine learning to address biological problems is expected to grow at an unprecedented rate, yet it is perceived by the uninitiated as a mysterious and daunting entity entrusted to the domain of mathematicians and computer scientists. The aim of this review is to identify key points required to start the journey of becoming an effective machine learning practitioner. These key points are further reinforced with an evaluation of how machine learning has been applied so far in a broad scope of real-life microbiology examples. This includes predicting drug targets or vaccine candidates, diagnosing microorganisms causing infectious diseases, classifying drug resistance against antimicrobial medicines, predicting disease outbreaks and exploring microbial interactions. Our hope is to inspire microbiologists and other related researchers to join the emerging machine learning revolution.
Affiliation(s)
- Stephen J Goodswen
- School of Life Sciences, University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Joel L N Barratt
- Parasitic Diseases Branch, Division of Parasitic Diseases and Malaria, Center for Global Health, Centers for Disease Control and Prevention, Atlanta, GA, USA
| | - Paul J Kennedy
- School of Computer Science, Faculty of Engineering and Information Technology and the Australian Artificial Intelligence Institute, University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Alexa Kaufer
- School of Life Sciences, University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - Larissa Calarco
- School of Life Sciences, University of Technology Sydney (UTS), Ultimo, NSW, Australia
| | - John T Ellis
- School of Life Sciences, University of Technology Sydney (UTS), Ultimo, NSW, Australia
|
31
|
Robust detection of point mutations involved in multidrug-resistant Mycobacterium tuberculosis in the presence of co-occurrent resistance markers. PLoS Comput Biol 2020; 16:e1008518. [PMID: 33347430 PMCID: PMC7785249 DOI: 10.1371/journal.pcbi.1008518] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/11/2020] [Revised: 01/05/2021] [Accepted: 11/11/2020] [Indexed: 11/23/2022] Open
Abstract
Tuberculosis disease is a major global public health concern and the growing prevalence of drug-resistant Mycobacterium tuberculosis is making disease control more difficult. However, the increasing application of whole-genome sequencing as a diagnostic tool is leading to the profiling of drug resistance to inform clinical practice and treatment decision making. Computational approaches for identifying established and novel resistance-conferring mutations in genomic data include genome-wide association study (GWAS) methodologies, tests for convergent evolution and machine learning techniques. These methods may be confounded by extensive co-occurrent resistance, where statistical models for a drug include unrelated mutations known to be causing resistance to other drugs. Here, we introduce a novel ‘cannibalistic’ elimination algorithm (“Hungry, Hungry SNPos”) that attempts to remove these co-occurrent resistant variants. Using an M. tuberculosis genomic dataset for the virulent Beijing strain-type (n = 3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach is considerably more robust than traditional methods and detects resistance-associated variants too rare to be likely picked up by correlation-based techniques like GWAS. Tuberculosis is one of the deadliest infectious diseases, being responsible for more than one million deaths per year. The causing bacteria are becoming increasingly drug-resistant, which is hampering disease control. At the same time, an unprecedented amount of bacterial whole-genome sequencing is increasingly informing clinical practice. In order to detect the genetic alterations responsible for developing drug resistance and predict resistance status from genomic data, bio-statistical methods and machine learning models have been employed. 
However, due to strongly overlapping drug resistance phenotypes and genotypes in multidrug-resistant datasets, the results of these correlation-based approaches frequently also contain mutations related to resistance against other drugs. In the past, this issue has often been ignored or partially resolved by either restricting the input data or in post-analysis screening—with both strategies relying on prior information. Here we present a heuristic algorithm for finding resistance-associated variants and demonstrate that it is considerably more robust towards co-occurrent resistance compared to traditional techniques. The software is available at https://github.com/julibeg/HHS.
|
32
|
Khalili E, Kouchaki S, Ramazi S, Ghanati F. Machine Learning Techniques for Soybean Charcoal Rot Disease Prediction. Front Plant Sci 2020; 11:590529. [PMID: 33381132 PMCID: PMC7767839 DOI: 10.3389/fpls.2020.590529] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 08/01/2020] [Accepted: 11/23/2020] [Indexed: 06/01/2023]
Abstract
Early prediction of pathogen infestation is a key factor in reducing the spread of disease in plants. Macrophomina phaseolina (Tassi) Goid, one of the main causes of charcoal rot disease, significantly suppresses plant productivity. Charcoal rot disease is one of the most severe threats to soybean productivity. Prediction of this disease in soybeans is tedious and impractical using traditional approaches. Machine learning (ML) techniques have recently gained substantial traction across numerous domains. ML methods can be applied to detect plant diseases prior to the full appearance of symptoms. In this paper, several ML techniques were developed and examined for prediction of charcoal rot disease in soybean for a cohort of 2,000 healthy and infected plants. A hybrid set of physiological and morphological features was suggested as input to the ML models. All developed ML models achieved better than 90% accuracy. Gradient Tree Boosting (GBT) was the best-performing classifier, with a sensitivity of 96.25% and a specificity of 97.33%. Our findings supported the applicability of ML, especially GBT, for charcoal rot disease prediction in a real environment. Moreover, our analysis demonstrated the importance of including physiological features in the learning. The collected dataset and source code can be found at https://github.com/Elham-khalili/Soybean-Charcoal-Rot-Disease-Prediction-Dataset-code.
Affiliation(s)
- Elham Khalili
- Department of Plant Science, Faculty of Science, Tarbiat Modarres University, Tehran, Iran
- Samaneh Kouchaki
- Faculty of Engineering and Physical Sciences, Centre for Vision, Speech, and Signal Processing, University of Surrey, Guildford, United Kingdom
- Shahin Ramazi
- Department of Biophysics, Faculty of Biological Science, Tarbiat Modares University, Tehran, Iran
- Faezeh Ghanati
- Department of Plant Science, Faculty of Science, Tarbiat Modarres University, Tehran, Iran
33
Li X, Lin J, Hu Y, Zhou J. PARMAP: A Pan-Genome-Based Computational Framework for Predicting Antimicrobial Resistance. Front Microbiol 2020; 11:578795. [PMID: 33193203 PMCID: PMC7642336 DOI: 10.3389/fmicb.2020.578795] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 07/01/2020] [Accepted: 09/24/2020] [Indexed: 11/17/2022] Open
Abstract
Antimicrobial resistance (AMR) has emerged as one of the most urgent global threats to public health. Accurate detection of AMR phenotypes is critical for reducing the spread of AMR strains. Here, we developed PARMAP (Prediction of Antimicrobial Resistance by MAPping genetic alterations in pan-genome) to predict AMR phenotypes and to identify AMR-associated genetic alterations based on the pan-genome of bacteria by utilizing machine learning algorithms. When we applied PARMAP to 1,597 Neisseria gonorrhoeae strains, it successfully predicted their AMR phenotypes based on a pan-genome analysis. Furthermore, it identified 328 genetic alterations in 23 known AMR genes and discovered many new AMR-associated genetic alterations in ciprofloxacin-resistant N. gonorrhoeae, and it clearly indicated the genetic heterogeneity of AMR genes in different subtypes of resistant N. gonorrhoeae. Additionally, PARMAP performed well in predicting the AMR phenotypes of Mycobacterium tuberculosis and Escherichia coli, indicating the robustness of the PARMAP framework. In conclusion, PARMAP not only precisely predicts the AMR of a population of strains of a given species but also uses whole-genome sequencing data to prioritize candidate AMR-associated genetic alterations based on their likelihood of contributing to AMR. Thus, we believe that PARMAP will accelerate investigations into AMR mechanisms in other human pathogens.
Affiliation(s)
- Xuefei Li
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Jingxia Lin
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Yongfei Hu
- Dermatology Hospital, Southern Medical University, Guangzhou, China
- Jiajian Zhou
- Dermatology Hospital, Southern Medical University, Guangzhou, China
34
Pataki BÁ, Matamoros S, van der Putten BCL, Remondini D, Giampieri E, Aytan-Aktug D, Hendriksen RS, Lund O, Csabai I, Schultsz C. Understanding and predicting ciprofloxacin minimum inhibitory concentration in Escherichia coli with machine learning. Sci Rep 2020; 10:15026. [PMID: 32929164 PMCID: PMC7490380 DOI: 10.1038/s41598-020-71693-5] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 04/25/2020] [Accepted: 08/18/2020] [Indexed: 11/13/2022] Open
Abstract
It is important that antibiotic prescriptions are based on antimicrobial susceptibility data to ensure effective treatment outcomes. With the increasing availability of next-generation sequencing, bacterial whole genome sequencing (WGS) can provide a more reliable and faster alternative to traditional phenotyping for the detection and surveillance of antimicrobial resistance (AMR). This work proposes a machine learning approach that can predict the minimum inhibitory concentration (MIC) for a given antibiotic, here ciprofloxacin, on the basis of both genome-wide mutation profiles and profiles of acquired antimicrobial resistance genes. We analysed 704 Escherichia coli genomes combined with their respective MIC measurements for ciprofloxacin originating from different countries. The four most important predictors found by the model, mutations in gyrA residues Ser83 and Asp87, a mutation in parC residue Ser80 and presence of the qnrS1 gene, have been experimentally validated before. Using only these four predictors in a linear regression model, 65% and 93% of the test samples' MICs were correctly predicted within a two- and a four-fold dilution range, respectively. The presented work does not treat machine learning as a black box, but also identifies the genomic features that determine susceptibility. The recent progress in WGS technology in combination with machine learning analysis approaches indicates that in the near future WGS of bacteria might become cheaper and faster than an MIC measurement.
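The "within a two- and a four-fold dilution range" figures can be illustrated with a short calculation. This is a sketch under one stated assumption: a prediction counts as within an n-fold range when the predicted and measured MIC differ by at most a factor of n, which is checked on a log2 scale. The abstract does not spell out its exact tolerance, and the MIC values below are invented for illustration.

```python
import math


def fraction_within_fold(true_mic, pred_mic, fold):
    """Share of predictions whose MIC is within a factor `fold` of the truth."""
    tolerance = math.log2(fold)
    hits = 0
    for t, p in zip(true_mic, pred_mic):
        # Compare on a log2 scale; the small epsilon absorbs rounding error.
        if abs(math.log2(p) - math.log2(t)) <= tolerance + 1e-9:
            hits += 1
    return hits / len(true_mic)


# Hypothetical measured and predicted MICs in mg/L.
true_mic = [0.015, 0.25, 4.0, 32.0]
pred_mic = [0.03, 0.25, 1.0, 4.0]

within_two = fraction_within_fold(true_mic, pred_mic, 2)
within_four = fraction_within_fold(true_mic, pred_mic, 4)
print(within_two, within_four)
```

Working on a log2 scale matches how MICs are measured, since broth dilution series double the concentration at each step.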
Affiliation(s)
- Bálint Ármin Pataki
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary.,Department of Computational Sciences, Wigner Research Centre for Physics of the HAS, Budapest, Hungary
- Sébastien Matamoros
- Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Boas C L van der Putten
- Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.,Department of Global Health, Amsterdam Institute for Global Health and Development, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
- Daniel Remondini
- Department of Physics and Astronomy (DIFA), University of Bologna, Bologna, Italy
- Enrico Giampieri
- Department of Experimental, Diagnostic and Specialty Medicine (DIMES), University of Bologna, Bologna, Italy
- Derya Aytan-Aktug
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Rene S Hendriksen
- National Food Institute, Technical University of Denmark, Lyngby, Denmark
- Ole Lund
- Department of Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- István Csabai
- Department of Physics of Complex Systems, ELTE Eötvös Loránd University, Budapest, Hungary.,Department of Computational Sciences, Wigner Research Centre for Physics of the HAS, Budapest, Hungary
- Constance Schultsz
- Department of Medical Microbiology, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands.,Department of Global Health, Amsterdam Institute for Global Health and Development, Amsterdam UMC, University of Amsterdam, Amsterdam, The Netherlands
35
Raja Kumaran S, Othman MS, Mi Yusuf L. Estimation of Missing Values Using Optimised Hybrid Fuzzy C-Means and Majority Vote for Microarray Data. Journal of Information and Communication Technology 2020. [DOI: 10.32890/jict2020.19.4.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/22/2022]
Abstract
Missing values are a major constraint when microarray technologies are used to identify disease-causing genes, and field experts routinely have to estimate them. Imputation is an effective way to fill in suitable values so that the next steps of microarray analysis can proceed, and missing value imputation methods may increase classification accuracy. Although these methods predict the values, it is the classification accuracy rate that demonstrates their ability to estimate the missing values in gene expression data. In this study, a novel method, Optimised Hybrid of Fuzzy C-Means and Majority Vote (opt-FCMMV), was proposed to estimate the missing values in the data. Using Majority Vote (MV) and optimisation through Particle Swarm Optimisation (PSO), this study predicted missing values to form more informative and robust data. To verify the effectiveness of opt-FCMMV, several experiments were carried out on two publicly available biomedical microarray datasets (Ovary and Lung Cancer) under three missing value mechanisms and five missing-value percentages, using a Support Vector Machine (SVM) classifier. The experimental results showed that the proposed method achieved the highest accuracy rate compared with no imputation, imputation by Fuzzy C-Means (FCM), and imputation by Fuzzy C-Means with Majority Vote (FCMMV). For example, the accuracy rates for Ovary Cancer data with 5% missing values were 64.0% for no imputation, 81.8% (FCM), 90.0% (FCMMV), and 93.7% (opt-FCMMV). Such an outcome indicates that opt-FCMMV may also be applied in different domains to prepare datasets for various data mining tasks.
36
Macesic N, Bear Don't Walk OJ, Pe'er I, Tatonetti NP, Peleg AY, Uhlemann AC. Predicting Phenotypic Polymyxin Resistance in Klebsiella pneumoniae through Machine Learning Analysis of Genomic Data. mSystems 2020; 5:e00656-19. [PMID: 32457240 PMCID: PMC7253370 DOI: 10.1128/msystems.00656-19] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 10/09/2019] [Accepted: 05/01/2020] [Indexed: 02/06/2023] Open
Abstract
Polymyxins are used as treatments of last resort for Gram-negative bacterial infections. Their increased use has led to concerns about emerging polymyxin resistance (PR). Phenotypic polymyxin susceptibility testing is resource intensive and difficult to perform accurately. The complex polygenic nature of PR and our incomplete understanding of its genetic basis make it difficult to predict PR using detection of resistance determinants. We therefore applied machine learning (ML) to whole-genome sequencing data from >600 Klebsiella pneumoniae clonal group 258 (CG258) genomes to predict phenotypic PR. Using a reference-based representation of genomic data with ML outperformed a rule-based approach that detected variants in known PR genes (area under receiver-operator curve [AUROC], 0.894 versus 0.791, P = 0.006). We noted modest increases in performance by using a bacterial genome-wide association study to filter relevant genomic features and by integrating clinical data in the form of prior polymyxin exposure. Conversely, reference-free representation of genomic data as k-mers was associated with decreased performance (AUROC, 0.692 versus 0.894, P = 0.015). When ML models were interpreted to extract genomic features, six of seven known PR genes were correctly identified by models without prior programming and several genes involved in stress responses and maintenance of the cell membrane were identified as potential novel determinants of PR. These findings are a proof of concept that whole-genome sequencing data can accurately predict PR in K. pneumoniae CG258 and may be applicable to other forms of complex antimicrobial resistance. IMPORTANCE Polymyxins are last-resort antibiotics used to treat highly resistant Gram-negative bacteria. There are increasing reports of polymyxin resistance emerging, raising concerns of a postantibiotic era.
Polymyxin resistance is therefore a significant public health threat, but current phenotypic methods for detection are difficult and time-consuming to perform. There have been increasing efforts to use whole-genome sequencing for detection of antibiotic resistance, but this has been difficult to apply to polymyxin resistance because of its complex polygenic nature. The significance of our research is that we successfully applied machine learning methods to predict polymyxin resistance in Klebsiella pneumoniae clonal group 258, a common health care-associated and multidrug-resistant pathogen. Our findings highlight that machine learning can be successfully applied even in complex forms of antibiotic resistance and represent a significant contribution to the literature that could be used to predict resistance in other bacteria and to other antibiotics.
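The AUROC values used to compare models in this abstract have a simple probabilistic reading: the chance that a randomly chosen resistant isolate receives a higher score than a randomly chosen susceptible one. The sketch below computes it by direct pairwise comparison, assuming label 1 means resistant; the scores are invented and the function is illustrative, not the authors' evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for three resistant and three susceptible isolates.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
area = auroc(labels, scores)
print(round(area, 3))
```

The pairwise form is quadratic in the number of samples, which is fine for a few hundred genomes; rank-based formulas give the same value faster on larger sets.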
Affiliation(s)
- Nenad Macesic
- Division of Infectious Diseases, Columbia University Irving Medical Center, New York, New York, USA
- Department of Infectious Diseases, The Alfred Hospital and Central Clinical School, Monash University, Melbourne, Australia
- Itsik Pe'er
- Department of Computer Science, Columbia University, New York, New York, USA
- Nicholas P Tatonetti
- Department of Biomedical Informatics, Columbia University, New York, New York, USA
- Anton Y Peleg
- Department of Infectious Diseases, The Alfred Hospital and Central Clinical School, Monash University, Melbourne, Australia
- Infection and Immunity Program, Monash Biomedicine Discovery Institute, Department of Microbiology, Monash University, Clayton, Victoria, Australia
- Anne-Catrin Uhlemann
- Division of Infectious Diseases, Columbia University Irving Medical Center, New York, New York, USA
- Microbiome & Pathogen Genomics Core, Columbia University Irving Medical Center, New York, New York, USA
37
Kouchaki S, Yang Y, Lachapelle A, Walker TM, Walker AS, Peto TEA, Crook DW, Clifton DA. Multi-Label Random Forest Model for Tuberculosis Drug Resistance Classification and Mutation Ranking. Front Microbiol 2020; 11:667. [PMID: 32390972 PMCID: PMC7188832 DOI: 10.3389/fmicb.2020.00667] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/02/2020] [Accepted: 03/24/2020] [Indexed: 12/12/2022] Open
Abstract
Resistance prediction and mutation ranking are important tasks in the analysis of tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models for both predicting phenotypic resistance from whole genome sequences and identifying important mutations for better prediction of four first-line drugs in a dataset of 13,402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that retraining our analysis to a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
Affiliation(s)
- Samaneh Kouchaki
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, United Kingdom
- Yang Yang
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, United Kingdom
- Oxford-Suzhou Centre for Advanced Research, Suzhou, China
- Alexander Lachapelle
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, United Kingdom
- Timothy M. Walker
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, United Kingdom
- Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- A. Sarah Walker
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, United Kingdom
- NIHR Biomedical Research Centre, Oxford, United Kingdom
- Timothy E. A. Peto
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, United Kingdom
- NIHR Biomedical Research Centre, Oxford, United Kingdom
- Derrick W. Crook
- Nuffield Department of Medicine, University of Oxford, Oxford, United Kingdom
- National Institute of Health Research Oxford Biomedical Research Centre, John Radcliffe Hospital, Oxford, United Kingdom
- NIHR Biomedical Research Centre, Oxford, United Kingdom
- David A. Clifton
- Department of Engineering Science, Institute of Biomedical Engineering, University of Oxford, Oxford, United Kingdom
38
Liu Z, Deng D, Lu H, Sun J, Lv L, Li S, Peng G, Ma X, Li J, Li Z, Rong T, Wang G. Evaluation of Machine Learning Models for Predicting Antimicrobial Resistance of Actinobacillus pleuropneumoniae From Whole Genome Sequences. Front Microbiol 2020; 11:48. [PMID: 32117101 PMCID: PMC7016212 DOI: 10.3389/fmicb.2020.00048] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 05/28/2019] [Accepted: 01/10/2020] [Indexed: 01/05/2023] Open
Abstract
Antimicrobial resistance (AMR) is becoming a huge problem in countries all over the world, and new approaches to identifying strains resistant or susceptible to certain antibiotics are essential in fighting antibiotic-resistant pathogens. Genotype-based machine learning methods have shown great promise as diagnostic tools, owing to the increasing availability of genomic datasets and antimicrobial susceptibility testing (AST) phenotypes. In this article, Support Vector Machine (SVM) and Set Covering Machine (SCM) models were used to learn and predict resistance to five drugs (Tetracycline, Ampicillin, Sulfisoxazole, Trimethoprim, and Enrofloxacin). The SVM model used the number of co-occurring k-mers between the genome of the isolates and the reference genes to learn and predict the phenotype of the bacteria for a specific antimicrobial, while the SCM model used a greedy approach to construct a conjunction or disjunction of Boolean functions to find the most concise set of k-mers that allows accurate prediction of the phenotype. Five-fold cross-validation was performed on the training set of the SVM and SCM models to select the best hyperparameter values and avoid model overfitting. The training accuracy (mean cross-validation score) and the testing accuracy of the SVM and SCM models for the five drugs were above 90%, regardless of whether the resistance mechanism was acquired resistance or a point mutation in the chromosome. The correlation between the phenotype and the model predictions for the five drugs indicated that both SVM and SCM models could significantly distinguish resistant from sensitive isolates (p < 0.01), and could be used as potential tools in antimicrobial resistance surveillance and clinical diagnosis in veterinary medicine.
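The SVM input described here, the number of k-mers that an isolate genome shares with a reference gene, can be sketched in a few lines. This is an illustrative simplification with stated assumptions: it counts distinct k-mers rather than occurrences, ignores reverse complements, and uses invented sequences and a small k; the study's actual feature pipeline is not given in the abstract.

```python
def kmers(sequence, k):
    """Return the set of distinct k-mers in a DNA sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def shared_kmer_count(genome, gene, k):
    """Number of distinct k-mers present in both the genome and the gene."""
    return len(kmers(genome, k) & kmers(gene, k))


# Hypothetical toy sequences; real genomes and genes are far longer.
genome = "ATGCGTACGTTAGC"
resistance_gene = "CGTACGTT"
unrelated_gene = "AAAAAAAA"

feature_present = shared_kmer_count(genome, resistance_gene, 4)
feature_absent = shared_kmer_count(genome, unrelated_gene, 4)
print(feature_present, feature_absent)
```

One such count per reference gene yields a fixed-length numeric vector for each isolate, which is the kind of input an SVM classifier expects.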
Affiliation(s)
- Zhichang Liu
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Dun Deng
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Huijie Lu
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Jian Sun
- National Veterinary Microbiological Drug Resistance Risk Assessment Laboratory, College of Veterinary Medicine, South China Agricultural University, Guangzhou, China
- Luchao Lv
- National Veterinary Microbiological Drug Resistance Risk Assessment Laboratory, College of Veterinary Medicine, South China Agricultural University, Guangzhou, China
- Shuhong Li
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Guanghui Peng
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Xianyong Ma
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Jiazhou Li
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Zhenming Li
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Ting Rong
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
- Gang Wang
- Institute of Animal Science, Guangdong Academy of Agricultural Sciences, Guangzhou, China.,State Key Laboratory of Livestock and Poultry Breeding, Guangzhou, China.,Key Laboratory of Animal Nutrition and Feed Science of Ministry of Agriculture (South China), Guangzhou, China.,Guangdong Engineering Technology Research Center of Animal Meat Quality and Safety Control and Evaluation, Guangzhou, China
39
Deelder W, Christakoudi S, Phelan J, Benavente ED, Campino S, McNerney R, Palla L, Clark TG. Machine Learning Predicts Accurately Mycobacterium tuberculosis Drug Resistance From Whole Genome Sequencing Data. Front Genet 2019; 10:922. [PMID: 31616478 PMCID: PMC6775242 DOI: 10.3389/fgene.2019.00922] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 05/24/2019] [Accepted: 09/02/2019] [Indexed: 11/25/2022] Open
Abstract
Background: Tuberculosis disease, caused by Mycobacterium tuberculosis, is a major public health problem. The emergence of M. tuberculosis strains resistant to existing treatments threatens to derail control efforts. Resistance is mainly conferred by mutations in genes coding for drug targets or converting enzymes, but our knowledge of these mutations is incomplete. Whole genome sequencing (WGS) is an increasingly common approach to rapidly characterize isolates and identify mutations predicting antimicrobial resistance, thereby providing a diagnostic tool to assist clinical decision making. Methods: We applied machine learning approaches to 16,688 M. tuberculosis isolates that have undergone WGS and laboratory drug-susceptibility testing (DST) across 14 antituberculosis drugs, with 22.5% of samples being multidrug resistant and 2.1% being extensively drug resistant. We used non-parametric classification-tree and gradient-boosted-tree models to predict drug resistance and uncover any associated novel putative mutations. We fitted separate models for each drug, with and without "co-occurrent resistance" markers known to be causing resistance to drugs other than the one of interest. Predictive performance was measured using sensitivity, specificity, and the area under the receiver operating characteristic curve, assuming DST results as the gold standard. Results: The predictive performance was highest for resistance to first-line drugs, amikacin, kanamycin, ciprofloxacin, moxifloxacin, and multidrug-resistant tuberculosis (area under the receiver operating characteristic curve above 96%), and lowest for third-line drugs such as D-cycloserine and para-aminosalicylic acid (area under the curve below 85%). The inclusion of co-occurrent resistance markers led to improved performance for some drugs and superior results when compared to similar models in other large-scale studies, which had smaller sample sizes.
Overall, the gradient-boosted-tree models performed better than the classification-tree models. The mutation-rank analysis detected no new single nucleotide polymorphisms linked to drug resistance. Discordance between DST and genotypically inferred resistance may be explained by DST errors, novel rare mutations, hetero-resistance, and nongenomic drivers such as efflux-pump upregulation. Conclusion: Our work demonstrates the utility of machine learning as a flexible approach to drug resistance prediction that is able to accommodate a much larger number of predictors and to summarize their predictive ability, thus assisting clinical decision making and single nucleotide polymorphism detection in an era of increasing WGS data generation.
Affiliation(s)
- Wouter Deelder
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Dalberg Advisors, Geneva, Switzerland
- Sofia Christakoudi
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Epidemiology and Biostatistics Department, Imperial College London, St Mary’s Campus, London, United Kingdom
- Jody Phelan
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Ernest Diez Benavente
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Susana Campino
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Ruth McNerney
- Department of Medicine, University of Cape Town, Cape Town, South Africa
- Luigi Palla
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
- Taane G. Clark
- Faculties of Epidemiology & Population Health and Infectious & Tropical Diseases, London School of Hygiene & Tropical Medicine, London, United Kingdom
40
Millstein J, Battaglin F, Barrett M, Cao S, Zhang W, Stintzing S, Heinemann V, Lenz HJ. Partition: a surjective mapping approach for dimensionality reduction. Bioinformatics 2019; 36:676-681. [PMID: 31504178 PMCID: PMC8215926 DOI: 10.1093/bioinformatics/btz661] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/14/2019] [Revised: 05/22/2019] [Accepted: 08/20/2019] [Indexed: 01/31/2023] Open
Abstract
MOTIVATION Large amounts of information generated by genomic technologies are accompanied by statistical and computational challenges due to redundancy, badly behaved data and noise. Dimensionality reduction (DR) methods have been developed to mitigate these challenges. However, many approaches are not scalable to large dimensions or result in excessive information loss. RESULTS The proposed approach partitions data into subsets of related features and summarizes each into one and only one new feature, thus defining a surjective mapping. A constraint on information loss determines the size of the reduced dataset. Simulation studies demonstrate that when multiple related features are associated with a response, this approach can substantially increase the number of true associations detected as compared to principal components analysis, non-negative matrix factorization or no DR. This increase in true discoveries is explained both by a reduced multiple-testing challenge and a reduction in extraneous noise. In an application to real data collected from metastatic colorectal cancer tumors, more associations between gene expression features and progression free survival and response to treatment were detected in the reduced than in the full untransformed dataset. AVAILABILITY AND IMPLEMENTATION Freely available R package from CRAN, https://cran.r-project.org/package=partition. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
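The core idea in this abstract, grouping related features and summarizing each group into exactly one new feature, can be shown with a deliberately simplified sketch. The code below is not the algorithm of the R package: it assumes a greedy grouping rule based on Pearson correlation with the first member of each group, uses the group mean as the summary, and lets the hypothetical `min_corr` threshold stand in for the paper's information-loss constraint. The feature names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def partition_features(features, min_corr):
    """Greedily group features, then summarize each group by its mean.

    Every original feature lands in exactly one group, so the mapping from
    original to reduced features is surjective.
    """
    groups = []
    for name, values in features.items():
        for group in groups:
            # Join the first group whose founding feature is similar enough.
            if pearson(features[group[0]], values) >= min_corr:
                group.append(name)
                break
        else:
            groups.append([name])
    reduced = {}
    for group in groups:
        n_samples = len(features[group[0]])
        reduced["+".join(group)] = [
            sum(features[g][i] for g in group) / len(group)
            for i in range(n_samples)
        ]
    return groups, reduced


# Hypothetical expression values for three genes across four samples.
features = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [1.1, 2.1, 2.9, 4.2],
    "g3": [4.0, 1.0, 3.0, 2.0],
}
groups, reduced = partition_features(features, min_corr=0.9)
print(groups)
```

Raising `min_corr` keeps more features separate and loses less information; lowering it merges more aggressively, which mirrors the trade-off the abstract describes between reduction and information loss.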
Affiliation(s)
- Francesca Battaglin
- Department of Medicine, Division of Medical Oncology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA,Clinical and Experimental Oncology Department, Medical Oncology Unit 1, Veneto Institute of Oncology IOV-IRCCS, Padua 35128, Italy
- Shu Cao
- Department of Preventive Medicine, CA 90033, USA
- Wu Zhang
- Department of Medicine, Division of Medical Oncology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA
- Sebastian Stintzing
- Medical Department, Division of Oncology and Hematology, Charité Universitaetsmedizin Berlin, Berlin 10117, Germany
- Volker Heinemann
- Department of Medicine III, University Hospital Munich, Munich 80336, Germany
- Heinz-Josef Lenz
- Department of Medicine, Division of Medical Oncology, Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90033, USA