1
|
Winkler DA. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Front Chem 2021; 9:614073. [PMID: 33791277 PMCID: PMC8005575 DOI: 10.3389/fchem.2021.614073] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/05/2020] [Accepted: 01/18/2021] [Indexed: 12/11/2022] Open
Abstract
Neglected tropical diseases continue to create high levels of morbidity and mortality in a sizeable fraction of the world’s population, despite ongoing research into new treatments. Some of the most important technological developments that have accelerated drug discovery for diseases of affluent countries have not flowed down to neglected tropical disease drug discovery. Pharmaceutical development business models, the cost of developing new drug treatments and subsequent costs to patients, and the accessibility of technologies to scientists in most of the affected countries are some of the reasons for this low uptake and slow development relative to that for common diseases in developed countries. Computational methods are starting to make significant inroads into the discovery of drugs for neglected tropical diseases due to the increasing availability of large databases that can be used to train machine learning (ML) models, the increasing accuracy of these methods, a lower entry barrier for researchers, and the widespread availability of public domain machine learning codes. Here, the application of artificial intelligence, largely the subset called machine learning, to modelling and prediction of biological activities and discovery of new drugs for neglected tropical diseases is summarized. The pathways for the development of machine learning methods in the short to medium term and the use of other artificial intelligence methods for drug discovery are discussed. The current roadblocks to, and likely impacts of, synergistic new technological developments on the use of ML methods for neglected tropical disease drug discovery in the future are also discussed.
Affiliation(s)
- David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia; La Trobe Institute for Molecular Science, La Trobe University, Bundoora, VIC, Australia; School of Pharmacy, University of Nottingham, Nottingham, United Kingdom; CSIRO Data61, Pullenvale, QLD, Australia
| |
|
2
|
Shi M, Wang J, Zhang L, Yan Y, Miao YD, Zhang X. Effects of Integrated Case Payment on Medical Expenditure and Readmission of Inpatients with Chronic Obstructive Pulmonary Disease: A Nonrandomized, Comparative Study in Xi County, China. Curr Med Sci 2018; 38:558-566. [PMID: 30074226 DOI: 10.1007/s11596-018-1914-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 06/12/2017] [Revised: 11/07/2017] [Indexed: 01/05/2023]
Abstract
In the past few decades, the Chinese government has attempted to reduce the economic burden of chronic diseases and lower the family financial risk of patients by establishing nationwide coverage of the Social Health Insurance system. However, the payment mode of Social Health Insurance varies across Chinese healthcare settings, and the effectiveness of each mode differs. This study aimed to evaluate the effects of integrated case payment on medical expenditure and readmission of inpatients with chronic obstructive pulmonary disease (COPD), a complex, multicomponent, chronic condition. A nonrandomized, comparative method was used in this study. Inpatients with COPD before (n=1569) and after the integrated case payment reform (n=4764) were selected from the inpatient information database of the New Cooperative Medical Scheme Agency of Xi County. The integrated case payment comprises the case payment (including price-cap case payment and fixed-reimbursement case payment) and clinical pathway (including clinical pathway A, clinical pathway B and clinical pathway C). Effects of integrated case payment were evaluated with indicators of per capita total medical expense and readmission within 30 days. A multivariate linear regression and a binary logistic regression were used to conduct statistical analysis. The results showed that case payment, comprising price-cap case payment (β=2382.988, P<0.001) and fixed-reimbursement case payment (β=2613.564, P<0.001), and clinical pathway C (β=1996.467, P<0.001) were risk factors of per capita total medical expenses. Clinical pathway A (β=1443.409, P<0.001) and clinical pathway B (β=1583.791, P<0.001) were protective factors. The interactive effects of case payment with hospital level (β=0.710, P<0.001) lowered the readmission rate within 30 days. Meanwhile, clinical pathways A (β=18.949, P<0.001), B (β=19.152, P<0.001) and C (β=1.882, P<0.001) were associated with the rate increase.
The findings revealed that integrated case payment ensured the quality of care for inpatients with COPD to some extent. However, this payment mode increased the per capita total medical expense. Further, policy-makers should set reasonable reimbursement standards of case payment, unify the type of case payment, and strengthen the supervision of the reform to enhance its function on medical cost control.
Affiliation(s)
- Meng Shi
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China; Research Center for Rural Health Service, Key Research Institute of Humanities & Social Sciences of Hubei Provincial Department of Education, Wuhan, 430030, China
| | - Jing Wang
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China; Research Center for Rural Health Service, Key Research Institute of Humanities & Social Sciences of Hubei Provincial Department of Education, Wuhan, 430030, China
| | - Liang Zhang
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China; Research Center for Rural Health Service, Key Research Institute of Humanities & Social Sciences of Hubei Provincial Department of Education, Wuhan, 430030, China
| | - Yan Yan
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China; Research Center for Rural Health Service, Key Research Institute of Humanities & Social Sciences of Hubei Provincial Department of Education, Wuhan, 430030, China
| | - Yu-Dong Miao
- School of Health Policy and Management, Nanjing Medical University, Nanjing, 211166, China
| | - Xiang Zhang
- School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, China; Research Center for Rural Health Service, Key Research Institute of Humanities & Social Sciences of Hubei Provincial Department of Education, Wuhan, 430030, China
| |
|
3
|
Machine Learning Approaches Toward Building Predictive Models for Small Molecule Modulators of miRNA and Its Utility in Virtual Screening of Molecular Databases. Methods Mol Biol 2018; 1517:155-168. [PMID: 27924481 DOI: 10.1007/978-1-4939-6563-2_11] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
The ubiquitous role of microRNAs (miRNAs) in a number of pathological processes has suggested that they could act as potential drug targets. RNA-binding small molecules offer an attractive means for modulating miRNA function. The availability in the public domain of bioassay data sets for a variety of biological assays and molecules provides a new opportunity to build models from them and to use those models in in silico virtual screening approaches that prioritize small molecules or assign potential functions to them. Here, we describe a computational strategy based on machine learning for creation of predictive models from high-throughput biological screens for virtual screening of small molecules with the potential to inhibit microRNAs. Such models could potentially be used for computational prioritization of small molecules before performing high-throughput biological assays.
|
4
|
Ai L, Tian H, Chen Z, Chen H, Xu J, Fang JY. Systematic evaluation of supervised classifiers for fecal microbiota-based prediction of colorectal cancer. Oncotarget 2017; 8:9546-9556. [PMID: 28061434 PMCID: PMC5354752 DOI: 10.18632/oncotarget.14488] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 10/27/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022] Open
Abstract
Predicting colorectal cancer (CRC) based on fecal microbiota presents a promising method for non-invasive screening of CRC, but the optimization of classification models remains an unaddressed question. The purpose of this study was to systematically evaluate the effectiveness of different supervised machine-learning models in predicting CRC in two independent eastern and western populations. The structures of intestinal microflora in feces in a Chinese population (N = 141) were determined by 454 FLX pyrosequencing, and different supervised classifiers were employed to predict CRC based on fecal microbiota operational taxonomic units (OTUs). Bayes Net and Random Forest displayed higher accuracies than other algorithms in both populations, although Bayes Net had a lower false negative rate than Random Forest. Gut microbiota-based prediction was more accurate than the standard fecal occult blood test (FOBT), and the combination of both approaches further improved the prediction accuracy. Moreover, when unclassified OTUs were used as input, the BayesDMNB text algorithm achieved higher accuracy in the Chinese population (AUC=0.994). Taken together, our results suggest that the Bayes Net classification model combined with unclassified OTUs may present an accurate method for predicting CRC based on the composition of gut microbiota.
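The distinction this abstract draws between accuracy and false negative rate can be illustrated with a small sketch. The labels and predictions below are invented for illustration and are not data from the study; `model_a` and `model_b` are hypothetical stand-ins for any two classifiers. The point is only that two classifiers can reach the same accuracy while missing different numbers of true cases.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count confusion-matrix cells for binary labels (1 = case, 0 = control)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1
        elif truth == 0 and pred == 1:
            counts["fp"] += 1
        elif truth == 0 and pred == 0:
            counts["tn"] += 1
        else:  # truth == 1 and pred == 0: a missed case
            counts["fn"] += 1
    return counts


def accuracy(c: Dict[str, int]) -> float:
    """Fraction of all samples that were classified correctly."""
    return (c["tp"] + c["tn"]) / sum(c.values())


def false_negative_rate(c: Dict[str, int]) -> float:
    """Fraction of true cases that the classifier missed."""
    positives = c["tp"] + c["fn"]
    return c["fn"] / positives if positives else 0.0


# Invented toy data: four cases followed by four controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 1]  # hypothetical classifier A
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical classifier B

for name, preds in (("model_a", model_a), ("model_b", model_b)):
    c = confusion_counts(y_true, preds)
    print(name, accuracy(c), false_negative_rate(c))
```

In a screening setting a missed case is usually the costlier error, which is why a lower false negative rate can justify preferring one model over another of similar accuracy.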
Affiliation(s)
- Luoyan Ai
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
| | - Haiying Tian
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
| | - Zhaofei Chen
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
| | - Huimin Chen
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
| | - Jie Xu
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
| | - Jing-Yuan Fang
- Division of Gastroenterology and Hepatology, Shanghai Institute of Digestive Disease, Key Laboratory of Gastroenterology and Hepatology, Ministry of Health, State Key Laboratory for Oncogenes and Related Genes, Renji Hospital, School of Medicine, Shanghai Jiao-Tong University, Shanghai 200001, China
| |
|
5
|
Jamal S, Arora S, Scaria V. Computational Analysis and Predictive Cheminformatics Modeling of Small Molecule Inhibitors of Epigenetic Modifiers. PLoS One 2016; 11:e0083032. [PMID: 27622288 PMCID: PMC5021286 DOI: 10.1371/journal.pone.0083032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/10/2013] [Accepted: 10/30/2013] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The dynamic and differential regulation and expression of genes is largely governed by the complex interactions of a subset of biomolecules in the cell operating at multiple levels, from genome organisation to protein post-translational regulation. The regulatory layer contributed by epigenetics has recently been one of the most active areas of interest. This layer of regulation, as we know it today, largely comprises DNA modifications, histone modifications and noncoding RNA regulation, and the interplay between these major components. Epigenetic regulation has recently been shown to be central to the development of a number of disease processes. The availability of datasets from high-throughput screens of molecules for biological properties offers a new opportunity to develop computational methodologies that would enable in silico screening of large molecular libraries. METHODS In the present study, we have used data from high-throughput screens for inhibitors of epigenetic modifiers. Computational predictive models were constructed based on molecular descriptors. Two supervised machine learning algorithms, Naive Bayes and Random Forest, were used to generate predictive models for small-molecule inhibitors of histone methyltransferases and demethylases. Random Forest, with an accuracy of 80%, was identified as the most accurate classifier. We further complemented the study with a substructure search approach, filtering out the probable pharmacophores from the active molecules leading to drug molecules. RESULTS We show that appropriate computational algorithms can be used effectively to learn molecular and structural correlates of the biological activities of small molecules. The computational models developed could potentially be used to screen and identify new biological activities of molecules from large molecular libraries and prioritise them for in-depth biological assays.
To the best of our knowledge, this is the first and most comprehensive computational analysis towards understanding the activities of small-molecule inhibitors of epigenetic modifiers.
Affiliation(s)
- Salma Jamal
- CSIR Open Source Drug Discovery Unit (CSIR-OSDD), Anusandhan Bhawan, Delhi, India
| | - Sonam Arora
- Delhi Technological University, Delhi, India
| | - Vinod Scaria
- GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
| |
|
6
|
Reporting Statistics in Biomedical Research Literature: The Numbers Say it All. Indian Pediatr 2016; 53:811-814. [PMID: 27771648 DOI: 10.1007/s13312-016-0936-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
7
|
Computational Analysis and In silico Predictive Modeling for Inhibitors of PhoP Regulon in S. typhi on High-Throughput Screening Bioassay Dataset. Interdiscip Sci 2015; 8:95-101. [DOI: 10.1007/s12539-015-0273-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/03/2014] [Revised: 09/01/2014] [Accepted: 09/10/2014] [Indexed: 10/23/2022]
|
8
|
Jamal S, Goyal S, Shanker A, Grover A. Checking the STEP-Associated Trafficking and Internalization of Glutamate Receptors for Reduced Cognitive Deficits: A Machine Learning Approach-Based Cheminformatics Study and Its Application for Drug Repurposing. PLoS One 2015; 10:e0129370. [PMID: 26066505 PMCID: PMC4466797 DOI: 10.1371/journal.pone.0129370] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 02/05/2015] [Accepted: 05/07/2015] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Alzheimer's disease, a lethal neurodegenerative disorder that leads to progressive memory loss, is the most common form of dementia. Owing to the complexity of the disease, its root cause remains unclear. The existing anti-Alzheimer's drugs are unable to cure the disease, while the current therapeutic options have provided only limited help in restoring moderate memory and remain ineffective at restricting the disease's progression. The striatal-enriched protein tyrosine phosphatase (STEP) has been shown to be involved in the internalization of the N-methyl-D-aspartate receptor (NMDAR) and is thus associated with the disease. The present study was performed using machine learning algorithms, a docking protocol and molecular dynamics (MD) simulations to develop STEP inhibitors, which could be novel anti-Alzheimer's molecules. METHODS The present study deals with the generation of computational predictive models based on chemical descriptors of compounds using machine learning approaches, followed by substructure fragment analysis. To perform this analysis, 2D molecular descriptors were generated and machine learning algorithms (Naïve Bayes, Random Forest and Sequential Minimal Optimization) were utilized. The binding mechanisms and the molecular interactions between the predicted active compounds and the target protein were modelled using docking methods. Further, the stability of the protein-ligand complex was evaluated using MD simulation studies. The substructure fragment analysis was performed using the Substructure fingerprint (SubFp), which was further explored using a predefined dictionary. RESULTS The present study demonstrates that the computational methodology used can be employed to examine the biological activities of small molecules and prioritize them for experimental screening. Large unscreened chemical libraries can be screened to identify potential novel hits and accelerate the drug discovery process.
Additionally, chemical libraries can be searched for the significant substructure patterns reported in the present study, which possibly contribute to the activity of these molecules.
Affiliation(s)
- Salma Jamal
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, India
| | - Sukriti Goyal
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, India
| | - Asheesh Shanker
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, Rajasthan, India
| | - Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, India
| |
|
9
|
Balekundri U, Sajjan SS, Madagi SB. Two dimensional quantitative structure activity relationship models for 5alpha-reductase type 2 inhibitors. JOURNAL OF PHARMACEUTICAL INVESTIGATION 2015. [DOI: 10.1007/s40005-015-0173-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
|
10
|
Wahi D, Jamal S, Goyal S, Singh A, Jain R, Rana P, Grover A. Cheminformatics models based on machine learning approaches for design of USP1/UAF1 abrogators as anticancer agents. SYSTEMS AND SYNTHETIC BIOLOGY 2015; 9:33-43. [PMID: 25972987 DOI: 10.1007/s11693-015-9162-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 10/08/2014] [Revised: 01/14/2015] [Accepted: 01/23/2015] [Indexed: 12/17/2022]
Abstract
Cancer cells have upregulated DNA repair mechanisms, enabling them to survive DNA damage induced during repeated rapid cell divisions and targeted chemotherapeutic treatments. Targeting cancer cell proliferation and survival via inhibition of DNA repair pathways is currently a very promising anti-tumor approach. The deubiquitinating enzyme USP1 is known to promote DNA repair by complexing with UAF1. The USP1/UAF1 complex is responsible for regulating DNA break repair pathways such as the trans-lesion synthesis pathway, the Fanconi anemia pathway and homologous recombination. Thus, USP1/UAF1 inhibition presents an efficient anti-cancer strategy. The recently released high-throughput screen data for anti-USP1/UAF1 activity prompted us to compute bioactivity predictive models that could help in screening for potential USP1/UAF1 inhibitors having anti-cancer properties. The current study utilizes a publicly available high-throughput screen data set of chemical compounds evaluated for their potential USP1/UAF1 inhibitory effect. A machine learning approach was devised for generation of computational models that could predict potential anti-USP1/UAF1 biological activity of novel anticancer compounds. Additional efficacy of active compounds was screened by applying a SMARTS filter to eliminate molecules with non-drug-like features. Structural fragment analysis was further performed to explore the structural properties of the molecules. We demonstrated that modern machine learning approaches could be efficiently employed in building predictive computational models and that their predictive performance is statistically accurate. The structural fragment analysis revealed the structures that could play an important role in identification of USP1/UAF1 inhibitors.
Affiliation(s)
- Divya Wahi
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067 India
| | - Salma Jamal
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, 304022 Rajasthan India
| | - Sukriti Goyal
- Department of Bioscience and Biotechnology, Banasthali University, Tonk, 304022 Rajasthan India
| | - Aditi Singh
- Department of Biotechnology, TERI University, Plot No. 10, Institutional Area, Vasant Kunj, New Delhi, 110 070 India
| | - Ritu Jain
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067 India
| | - Preeti Rana
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067 India
| | - Abhinav Grover
- School of Biotechnology, Jawaharlal Nehru University, New Delhi, 110067 India
| |
|
11
|
Kaur H, Ahmad M, Scaria V. Computational analysis and In-silico predictive modeling for inhibitors of PhoP regulon in S. typhi on high-throughput screening bioassay dataset. Interdiscip Sci 2015. [PMID: 25595584 DOI: 10.1007/s12539-014-0212-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2014] [Revised: 09/01/2014] [Accepted: 09/10/2014] [Indexed: 06/04/2023]
Abstract
The emergence of multidrug-resistant Salmonella enterica serotype Typhi in pandemic proportions throughout the world makes it necessary to speed up the discovery of novel molecules that have different modes of action, are less susceptible to resistance formation, and could be used as drugs for the treatment of salmonellosis, particularly typhoid fever. The PhoP regulon is well studied and has been shown to be a critical regulator of a number of genes required for the intracellular survival of S. enterica and for the pathophysiology of diseases such as typhoid. The evident roles of the two-component PhoP/PhoQ-regulated products in Salmonella virulence have motivated attempts to target them therapeutically. However, discovering biologically active compounds for the treatment of typhoid by hit finding with high-throughput screening technology alone is very expensive and time-consuming when performed on a large scale. With recent advances in combinatorial chemistry and contemporary techniques for compound synthesis, more and more compounds are becoming available, giving ample growth of diverse compound libraries, but the time and effort required to screen these massive, unfocused and diverse libraries has been only slightly reduced in the past years. Hence there is a demand to improve hit quality and the success rate of high-throughput screening (HTS), which requires compound libraries focused and biased towards the particular target. Therefore we still need an advantageous and expedient method to prioritize the molecules that will be used in biological screens, one that saves time and is inexpensive. In this context, in silico methods such as machine learning are widely applicable techniques used to build computational models for high-throughput virtual screens that prioritize molecules for further study.
Furthermore, we extended our computational analysis to identify the common enriched structural entities among the biologically active compounds in order to find the privileged scaffold.
Affiliation(s)
- Harleen Kaur
- Department of Computer Science, Hamdard University, New Delhi, India
| | | | | |
|
12
|
Cheminformatics models for inhibitors of Schistosoma mansoni thioredoxin glutathione reductase. ScientificWorldJournal 2014; 2014:957107. [PMID: 25629082 PMCID: PMC4275605 DOI: 10.1155/2014/957107] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 07/08/2014] [Revised: 09/13/2014] [Accepted: 10/01/2014] [Indexed: 12/25/2022] Open
Abstract
Schistosomiasis is a neglected tropical disease caused by the parasite Schistosoma mansoni and affects over 200 million people annually. With the recent emergence of drug resistance, there is an urgent need to discover novel therapeutic options to control the disease. The multifunctional protein thioredoxin glutathione reductase (TGR), an enzyme essential for the survival of the pathogen in the redox environment, has been actively explored as a potential drug target. The recent availability of small-molecule screening datasets against this target provides a unique opportunity to learn molecular properties and apply computational models for discovery of activities in large molecular libraries. Such a prioritisation approach could reduce the cost of failures in lead discovery. A supervised learning approach was employed to develop a cost-sensitive classification model to evaluate the biological activity of the molecules. Random Forest was identified as the best of the classifiers tested, with an accuracy of around 80 percent. Independent analysis using a maximally occurring substructure analysis revealed 10 highly enriched scaffolds in the actives dataset, and their docking against the target was also performed. We show that combining machine learning with other cheminformatics approaches such as substructure comparison and molecular docking is an efficient way to prioritise molecules from large molecular datasets.
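The abstract does not say how cost sensitivity was implemented, so the following is only a sketch of one common form of it, namely choosing the decision threshold that minimises total misclassification cost. The scores, labels and cost values are invented, and the function names are hypothetical. It shows how penalising missed actives more heavily than false alarms pushes the threshold down.

```python
from typing import List, Sequence


def total_cost(y_true: Sequence[int], scores: Sequence[float],
               threshold: float, fn_cost: float, fp_cost: float) -> float:
    """Sum misclassification costs when 'score >= threshold' means active."""
    cost = 0.0
    for truth, score in zip(y_true, scores):
        predicted_active = score >= threshold
        if truth == 1 and not predicted_active:
            cost += fn_cost  # an active molecule was missed
        elif truth == 0 and predicted_active:
            cost += fp_cost  # an inactive molecule was flagged
    return cost


def best_threshold(y_true: Sequence[int], scores: Sequence[float],
                   fn_cost: float, fp_cost: float) -> float:
    """Return the lowest candidate threshold with minimal total cost."""
    # Candidates are the observed scores plus 'predict nothing as active'.
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    return min(candidates,
               key=lambda t: total_cost(y_true, scores, t, fn_cost, fp_cost))


# Invented toy data: 1 = active, 0 = inactive, with model scores.
y_true = [1, 0, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.5, 0.4, 0.3, 0.25, 0.1]

equal_costs = best_threshold(y_true, scores, fn_cost=1.0, fp_cost=1.0)
costly_misses = best_threshold(y_true, scores, fn_cost=5.0, fp_cost=1.0)
print(equal_costs, costly_misses)
```

High-throughput screening data typically contain far fewer actives than inactives, which is the usual motivation for weighting missed actives more heavily.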
|
13
|
Cheng T, Pan Y, Hao M, Wang Y, Bryant SH. PubChem applications in drug discovery: a bibliometric analysis. Drug Discov Today 2014; 19:1751-1756. [PMID: 25168772 DOI: 10.1016/j.drudis.2014.08.008] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 04/29/2014] [Revised: 07/17/2014] [Accepted: 08/18/2014] [Indexed: 12/18/2022]
Abstract
A bibliometric analysis of PubChem applications is presented by reviewing 1132 research articles. The massive volume of chemical structure and bioactivity data in PubChem and its online services have been used globally in various fields including chemical biology, medicinal chemistry and informatics research. PubChem supports drug discovery in many aspects such as lead identification and optimization, compound-target profiling, polypharmacology studies and unknown chemical identity elucidation. PubChem has also become a valuable resource for developing secondary databases, informatics tools and web services. The growing PubChem resource with its public availability offers support and great opportunities for the interrogation of pharmacological mechanisms and the genetic basis of diseases, which are vital for drug innovation and repurposing.
Affiliation(s)
- Tiejun Cheng
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Yongmei Pan
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Ming Hao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
| | - Yanli Wang
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
| | - Stephen H Bryant
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA.
| |
|
14
|
Ekins S, Freundlich JS, Reynolds RC. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:2157-65. [PMID: 24968215 DOI: 10.1021/ci500264r] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 02/07/2023]
Abstract
Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data in the public domain from large phenotypic screens against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We have evaluated 12 models ranging from different single-point, dual-event dose-response, single-point and dual-event dose-response as well as combined data sets for three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group as well as a smaller set of 177 active compounds from GlaxoSmithKline as test sets. Our data suggest that combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models based on the receiver operator curve (ROC) for these models (internal ROC range 0.83-0.91, external ROC range 0.62-0.83) compared to the orders of magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
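The ROC values quoted in this abstract are areas under the ROC curve. As a hedged illustration of what such a number measures, the sketch below computes the area as the probability that a randomly chosen active is scored above a randomly chosen inactive, with ties counted as half. The data are invented and the function name is hypothetical; this is not the software used in the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of actives and inactives."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # active ranked above inactive
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Invented toy data: 1 = active, 0 = inactive.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = roc_auc(y_true, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the internal and external ranges reported above.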
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
|
15
|
Jamal S, Scaria V. Data-mining of potential antitubercular activities from molecular ingredients of traditional Chinese medicines. PeerJ 2014; 2:e476. [PMID: 25081126 PMCID: PMC4106188 DOI: 10.7717/peerj.476] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/08/2014] [Accepted: 06/16/2014] [Indexed: 11/20/2022]
Abstract
Background. Traditional Chinese medicine is a well-established alternative system of medicine based on a broad range of herbal formulations and is practiced extensively in the region for the treatment of a wide variety of diseases. In recent years, several reports have described in-depth studies of the biological activities, including antibacterial activities, of the molecular ingredients of traditional Chinese medicines. The availability of a well-curated dataset of molecular ingredients of traditional Chinese medicines, accurate in silico cheminformatics models for mining antitubercular agents, and computational filters to prioritize molecules has prompted us to search for potential hits in these datasets. Results. We used a consensus approach to predict molecules with potential antitubercular activities from a large dataset of molecular ingredients of traditional Chinese medicines available in the public domain. We further prioritized 160 molecules based on five computational filters (SMARTSfilter) so as to avoid potentially undesirable molecules. We then examined the molecules for permeability across the mycobacterial cell wall and for potential activities against non-replicating and drug-tolerant mycobacteria. Additional in-depth literature surveys of the reported antitubercular activities of the molecular ingredients and their sources were used to support the prioritization. Conclusions. Our analysis suggests that datasets of molecular ingredients of traditional Chinese medicines offer a new opportunity to mine for potential biological activities. In this report, we suggest a proof-of-concept methodology to prioritize molecules for further experimental assays using a variety of computational tools. We also suggest that a subset of prioritized molecules could be evaluated for tuberculosis because of their additional effect against non-replicating tuberculosis and the hepato-protection offered by the sources of these ingredients.
Affiliation(s)
- Salma Jamal
- CSIR Open Source Drug Discovery Unit, Anusandhan Bhavan, Delhi, India
- Vinod Scaria
- CSIR Open Source Drug Discovery Unit, Anusandhan Bhavan, Delhi, India
|
16
|
Ekins S, Pottorf R, Reynolds R, Williams AJ, Clark AM, Freundlich JS. Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:1070-82. [PMID: 24665947 PMCID: PMC4004261 DOI: 10.1021/ci500077v] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 02/08/2014] [Indexed: 02/07/2023]
Abstract
Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, support vector machines, and recursive partitioning) built from 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost-effective predictor for in vivo testing in other diseases.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
- Richard Pottorf
- Department of Pharmacology & Physiology, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
- Robert C. Reynolds
- Department of Chemistry, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
- Antony J. Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, North Carolina 27587, United States
- Alex M. Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, Quebec, Canada H3J 2S1
- Joel S. Freundlich
- Department of Pharmacology & Physiology, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
- Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
|
17
|
|
18
|
Jamal S, Scaria V. Cheminformatic models based on machine learning for pyruvate kinase inhibitors of Leishmania mexicana. BMC Bioinformatics 2013; 14:329. [PMID: 24252103 PMCID: PMC4225525 DOI: 10.1186/1471-2105-14-329] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 03/18/2013] [Accepted: 10/01/2013] [Indexed: 02/04/2023]
Abstract
Background. Leishmaniasis is a neglected tropical disease that affects approximately 12 million individuals worldwide and is caused by the parasite Leishmania. The current drugs used in the treatment of leishmaniasis are highly toxic, and the widespread emergence of drug-resistant strains necessitates the development of new therapeutic options. The available high-throughput screening data have made it possible to generate predictive computational models that can assess the active scaffolds in a chemical library, followed by their ADME/toxicity properties in biological trials. Results. In the present study, we have used publicly available, high-throughput screen datasets of chemical moieties which have been adjudged to target the pyruvate kinase enzyme of L. mexicana (LmPK). A machine learning approach was used to create computational models capable of predicting the biological activity of novel antileishmanial compounds. Further, we evaluated the molecules using a substructure-based approach to identify the common substructures contributing to their activity. Conclusion. We generated computational models based on machine learning methods and evaluated the performance of these models based on various statistical figures of merit. The random forest based approach was found to be the most sensitive, with better accuracy and ROC. We further added a substructure-based approach to analyze the molecules and identify potentially enriched substructures in the active dataset. We believe that the models developed in the present study would reduce the cost and length of clinical studies, so that newer drugs would reach the market faster and provide better healthcare options to patients.
Affiliation(s)
- Salma Jamal
- GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology, Mall Road, Delhi, 110007, India.
|
19
|
Ekins S, Freundlich JS, Reynolds RC. Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model 2013; 53:3054-63. [PMID: 24144044 DOI: 10.1021/ci400480s] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
Abstract
The search for new tuberculosis treatments continues as we need to find molecules that can act more quickly, be accommodated in multidrug regimens, and overcome ever increasing levels of drug resistance. Multiple large scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) were used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in external validation receiver operator curve (ROC) of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test set dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning construction as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
|
20
|
Ekins S, Freundlich JS, Hobrath JV, Lucile White E, Reynolds RC. Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery. Pharm Res 2013; 31:414-35. [PMID: 24132686 DOI: 10.1007/s11095-013-1172-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.6] [Received: 05/04/2013] [Accepted: 07/28/2013] [Indexed: 12/19/2022]
Abstract
PURPOSE: Tuberculosis treatments need to be shorter and overcome drug resistance. Our previous large scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) has identified 737 active compounds and thousands that are inactive. We have used these data for building computational models as an approach to minimize the number of compounds tested. METHODS: A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that application of these models for screening set selections can enrich the hit rate. RESULTS: In order to explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules has been selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10 fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. CONCLUSION: Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents.
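The enrichment quoted in this abstract (about 10% of the hits found in the top 1% of ranked compounds, hence roughly 10-fold) is the fraction of hits recovered divided by the fraction of the library screened. The sketch below illustrates that arithmetic on synthetic data. The function name `enrichment_factor` and the toy library are hypothetical and are not from the study.

```python
def enrichment_factor(scores, is_hit, top_fraction):
    """Fold enrichment of hits in the top-ranked fraction of a library.

    Computed as (fraction of all hits found in the top slice)
    divided by (fraction of the library in that slice).
    """
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    n = len(scores)
    n_top = max(1, round(n * top_fraction))
    # Rank compounds by model score, best first.
    ranked = sorted(zip(scores, is_hit), key=lambda pair: pair[0], reverse=True)
    hits_total = sum(1 for hit in is_hit if hit)
    if hits_total == 0:
        raise ValueError("library contains no hits")
    hits_in_top = sum(1 for _, hit in ranked[:n_top] if hit)
    return (hits_in_top / hits_total) / (n_top / n)


# Synthetic library: 1000 compounds, 50 hits, 5 of them ranked in the top 10.
toy_scores = [1000 - i for i in range(1000)]
toy_hits = [i < 5 or 100 <= i < 145 for i in range(1000)]
toy_ef = enrichment_factor(toy_scores, toy_hits, 0.01)
```

In this toy case 5 of 50 hits (10%) fall in the top 1% of the ranking, which gives a 10-fold enrichment over random selection.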
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California, 94010, USA
|
21
|
Screening strategies to identify new chemical diversity for drug development to treat kinetoplastid infections. Parasitology 2013; 141:140-6. [DOI: 10.1017/s003118201300142x] [Citation(s) in RCA: 113] [Impact Index Per Article: 10.3] [Indexed: 11/06/2022]
Abstract
SUMMARY: The Drugs for Neglected Diseases initiative (DNDi) has defined and implemented an early discovery strategy over the last few years, in keeping with its virtual R&D business model. This strategy relies on a medium- to high-throughput phenotypic assay platform to expedite the screening of compound libraries accessed through its collaborations with partners from the pharmaceutical industry. We review the pragmatic approaches used to select compound libraries for screening against kinetoplastids, taking into account screening capacity. The advantages, limitations and current achievements in identifying new quality series for further development into preclinical candidates are critically discussed, together with attractive new approaches currently under investigation.
|
22
|
Singla D, Tewari R, Kumar A, Raghava GP. Designing of inhibitors against drug tolerant Mycobacterium tuberculosis (H37Rv). Chem Cent J 2013; 7:49. [PMID: 23497593 PMCID: PMC3639817 DOI: 10.1186/1752-153x-7-49] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 11/23/2012] [Accepted: 02/25/2013] [Indexed: 01/26/2023]
Abstract
Background. Mycobacterium tuberculosis (M.tb) is the causative agent of tuberculosis, killing ~1.7 million people annually. The remarkable capacity of this pathogen to escape the host immune system for decades and then to cause active tuberculosis disease makes M.tb a successful pathogen. Currently available anti-mycobacterial therapy has poor compliance because it requires prolonged treatment, resulting in accelerated emergence of drug-resistant strains. Hence, there is an urgent need to identify new chemical entities with novel mechanisms of action and potent activity against the drug-resistant strains. Results. This study describes novel computational models developed for predicting inhibitors against both the replicative and non-replicative phases of drug-tolerant M.tb under carbon starvation. These models were trained on a highly diverse dataset of 2135 compounds using four classes of binary fingerprints, namely PubChem, MACCS, EState and SubStructure. We achieved the best performance, a Matthews correlation coefficient (MCC) of 0.45, using the model based on MACCS fingerprints for the replicative phase inhibitor dataset. In the case of the non-replicative phase, a hybrid model based on PubChem, MACCS, EState and SubStructure fingerprints performed better, with a maximum MCC value of 0.28. In this study, we have shown that the molecular weight, polar surface area and rotatable bond count of inhibitors (replicating and non-replicating phase) are significantly different from those of non-inhibitors. The fragment analysis suggests that substructures like hetero_N_nonbasic, heterocyclic, carboxylic_ester, and hetero_N_basic_no_H are predominant in replicating phase inhibitors while hetero_O, ketone, secondary_mixed_amine are preferred in the non-replicative phase inhibitors. It was observed that nitro, alkyne, and enamine are important for the molecules inhibiting bacilli residing in both phases. In this study, we introduced a new algorithm based on the Matthews correlation coefficient, called MCCA, for feature selection and found that this algorithm is better than or comparable to a frequency-based approach. Conclusion. In this study, we have developed computational models to predict phase-specific inhibitors against drug resistant strains of M.tb grown under carbon starvation. Based on simple molecular properties, we have derived some rules, which would be useful in robust identification of tuberculosis inhibitors. Based on these observations, we have developed a webserver for predicting inhibitors against drug tolerant M.tb H37Rv available at http://crdd.osdd.net/oscadd/mdri/.
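For readers unfamiliar with the Matthews correlation coefficient reported in this abstract, a minimal sketch of the standard formula computed from a binary confusion matrix follows. The function name `matthews_corrcoef_from_counts` and the example counts are hypothetical; this is not the authors' MCCA feature-selection algorithm, which the abstract does not specify.

```python
import math


def matthews_corrcoef_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Ranges from -1 (total disagreement) to +1 (perfect prediction);
    0 corresponds to chance-level prediction.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Conventional value when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only, not results from the study.
example_mcc = matthews_corrcoef_from_counts(tp=60, tn=80, fp=20, fn=40)
```

Because MCC uses all four cells of the confusion matrix, it remains informative on imbalanced active/inactive sets where accuracy alone can be misleading.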
Affiliation(s)
- Deepak Singla
- Bioinformatics Centre, CSIR-Institute of Microbial Technology, Sector 39A, Chandigarh, India.
|
23
|
Jamal S, Periwal V, Scaria V. Predictive modeling of anti-malarial molecules inhibiting apicoplast formation. BMC Bioinformatics 2013; 14:55. [PMID: 23419172 PMCID: PMC3599641 DOI: 10.1186/1471-2105-14-55] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 07/23/2012] [Accepted: 02/04/2013] [Indexed: 11/13/2022]
Abstract
Background. Malaria is a major healthcare problem worldwide, resulting in an estimated 0.65 million deaths every year. It is caused by members of the parasite genus Plasmodium. The current therapeutic options for malaria are limited to a few classes of molecules and are fast shrinking due to the emergence of widespread drug resistance in the pathogen. The recent availability of high-throughput phenotypic screen datasets for antimalarial activity offers a possibility to create computational models for bioactivity based on chemical descriptors of molecules, with the potential to accelerate drug discovery for malaria. Results. In the present study, we have used high-throughput screen datasets for the discovery of apicoplast inhibitors of the malarial pathogen as assayed from the delayed death response. We employed a machine learning approach and developed computational predictive models to predict the biological activity of new antimalarial compounds. The molecules were further evaluated for common substructures using a Maximum Common Substructure (MCS) based approach. Conclusions. We created computational models using state-of-the-art machine learning algorithms. The models were evaluated based on multiple statistical criteria. We found that the Random Forest based approach provides better accuracy as assessed from ROC curve analysis. We further evaluated the active molecules using a substructure-based approach to identify common substructures enriched in the active set. We argue that the computational models generated could be effectively used to screen large molecular datasets to prioritize them for phenotypic screens, drastically reducing cost while improving the hit rate.
Affiliation(s)
- Salma Jamal
- CSIR Open Source Drug Discovery Unit, Anusandhan Bhavan, Delhi 110001, India
|
24
|
Abstract
The search for small molecules with activity against Mycobacterium tuberculosis increasingly uses high-throughput screening and computational methods. Previously, we have analyzed recent studies in which computational tools were used for cheminformatics. We have now updated this analysis to illustrate how they may assist in finding desirable leads for tuberculosis drug discovery. We provide our thoughts on strategies for drug discovery efforts for neglected diseases.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, Fuquay Varina, NC, USA
|
25
|
Jamal S, Periwal V, Scaria V. Computational analysis and predictive modeling of small molecule modulators of microRNA. J Cheminform 2012; 4:16. [PMID: 22889302 PMCID: PMC3466443 DOI: 10.1186/1758-2946-4-16] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 03/24/2012] [Accepted: 07/30/2012] [Indexed: 11/30/2022]
Abstract
Background. MicroRNAs (miRNAs) are small endogenously transcribed regulatory RNAs which modulate gene expression at a post-transcriptional level. These small RNAs have now been shown to be critical regulators in a number of biological processes in the cell, including the pathophysiology of diseases like cancers. The increasingly evident roles of microRNA in disease processes have also motivated attempts to target them therapeutically. Recently there has been immense interest in understanding small molecule mediated regulation of RNA, including microRNA. Results. We have used publicly available datasets of high-throughput screens on small molecules with potential to inhibit microRNA. We employed computational methods based on chemical descriptors and machine learning to create predictive computational models for the biological activity of small molecules. We further used a substructure-based approach to understand common substructures potentially contributing to the activity. Conclusion. We generated computational models based on Naïve Bayes and Random Forest towards mining small RNA binding molecules from large molecular datasets. We complement this with a substructure-based approach to identify and understand potentially enriched substructures in the active dataset. We use this approach to identify the miRNA binding potential of a set of approved drugs, suggesting a probable novel mechanism of off-target activity of these drugs. To the best of our knowledge, this is the first and most comprehensive computational analysis towards understanding RNA binding activities of small molecules and predictive modeling of these activities.
Affiliation(s)
- Salma Jamal
- GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Mall Road, Delhi, 110007, India.
|