Molecular diversity of Mycobacterium tuberculosis complex in Sikkim, India and prediction of dominant spoligotypes using artificial intelligence.
Sci Rep 2021; 11:7365. [PMID: 33795751 PMCID: PMC8016865 DOI: 10.1038/s41598-021-86626-z]
[Received: 05/02/2020] [Accepted: 03/12/2021] [Indexed: 02/06/2023]
Abstract
Tuberculosis is an enormous public health problem in India. This study provides the first description of the molecular diversity of the Mycobacterium tuberculosis complex (MTBC) in Sikkim, India. A total of 399 acid-fast bacilli (AFB) sputum-positive samples were cultured on Löwenstein-Jensen medium, and genetic characterisation was done by spoligotyping and 24-loci MIRU-VNTR typing. Spoligotyping revealed 58 different spoligotypes. The Beijing spoligotype was dominant, constituting 62.41% of all isolates, and was associated with multidrug resistance. Minimum spanning tree analysis of 249 Beijing strains based on 24-loci MIRU-VNTR typing identified 12 clonal complexes (single-locus variants). Principal component analysis was used to visualise possible grouping of MTBC isolates from Sikkim belonging to the major spoligotypes using their 24-loci MIRU-VNTR profiles. Artificial intelligence-based machine learning (ML) methods, namely Random Forests (RF), Support Vector Machines (SVM) and Artificial Neural Networks (ANN), were used to predict the dominant spoligotypes of MTBC from MIRU-VNTR data. K-fold cross-validation and validation on an unseen test data set showed high accuracy (93-99%) for ANN, RF, and SVM in predicting the Beijing, CAS1_Delhi, and T1 spoligotypes. However, prediction on a new external validation data set showed that the RF model was more accurate than SVM and ANN.
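The idea of grouping isolates into clonal complexes through single-locus variants can be illustrated with a minimal Python sketch. It is not the study's pipeline: the abstract does not name the software or the exact complex definition used, so the rule here (isolates joined when their profiles differ at no more than one locus, with chains of such links forming one complex) is an assumption, and the function names and the 6-locus toy profiles are hypothetical stand-ins for real 24-loci MIRU-VNTR data.

```python
from itertools import combinations


def locus_differences(a, b):
    """Count the loci at which two equal-length MIRU-VNTR profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def clonal_complexes(profiles):
    """Group isolates linked by chains of single-locus variants (assumed rule).

    `profiles` maps an isolate name to a tuple of repeat counts.
    Returns a sorted list of sorted name lists, one per group that
    contains at least two isolates.
    """
    # Union-find structure: every isolate starts as its own group.
    parent = {name: name for name in profiles}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # Merge any two isolates whose profiles differ at no more than one locus.
    for a, b in combinations(profiles, 2):
        if locus_differences(profiles[a], profiles[b]) <= 1:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


# Hypothetical 6-locus profiles; real profiles would have 24 loci.
toy_profiles = {
    "iso1": (2, 4, 3, 5, 3, 2),
    "iso2": (2, 4, 3, 5, 3, 3),  # differs from iso1 at one locus
    "iso3": (2, 4, 4, 5, 3, 3),  # differs from iso2 at one locus
    "iso4": (1, 2, 2, 3, 5, 6),
    "iso5": (1, 2, 2, 3, 5, 6),  # identical to iso4
    "iso6": (7, 7, 7, 7, 7, 7),  # unrelated singleton
}
complexes = clonal_complexes(toy_profiles)
```

In this toy input, `iso1` and `iso3` differ at two loci but still fall into the same group because both are single-locus variants of `iso2`; the unrelated `iso6` is left out as a singleton.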