1
Artificial intelligence and machine-learning approaches in structure and ligand-based discovery of drugs affecting central nervous system. Mol Divers 2022; 27:959-985. [PMID: 35819579 DOI: 10.1007/s11030-022-10489-3] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/27/2022] [Accepted: 06/21/2022] [Indexed: 12/11/2022]
Abstract
CNS disorders are indications with very high unmet medical need, a relatively small number of available drugs, and subpar satisfaction among patients and caregivers. Discovery of CNS drugs is an extremely expensive affair with its own unique challenges, leading to extremely high attrition rates and low efficiency. With the explosion of data in the information age, there is hardly any aspect of life that has not been touched by data-driven technologies such as artificial intelligence (AI) and machine learning (ML). Drug discovery is no exception: the emergence of big data via genomic, proteomic, biological, and chemical technologies has driven pharmaceutical giants to collaborate with AI-oriented companies to revolutionise drug discovery, with the goal of increasing the efficiency of the process. In recent years, many examples of innovative applications of AI and ML techniques in CNS drug discovery have been reported. Research on therapeutics for diseases such as schizophrenia, Alzheimer's and Parkinsonism has been given a new direction and thrust by these developments. AI and ML have been applied to both ligand-based and structure-based drug discovery and design of CNS therapeutics. In this review, we summarise the general aspects of AI and ML from the perspective of drug discovery, followed by comprehensive coverage of recent developments in the applications of AI/ML techniques in CNS drug discovery.
2
Lemenze A, Mittal N, Perryman AL, Daher SS, Ekins S, Occi J, Ahn YM, Wang X, Russo R, Patel JS, Daugherty RM, Wood DO, Connell N, Freundlich JS. Rickettsia Aglow: A Fluorescence Assay and Machine Learning Model to Identify Inhibitors of Intracellular Infection. ACS Infect Dis 2022; 8:1280-1290. [PMID: 35748568 PMCID: PMC9912140 DOI: 10.1021/acsinfecdis.2c00014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023]
Abstract
Rickettsia is a genus of Gram-negative bacteria that has for centuries caused large-scale morbidity and mortality. In recent years, the resurgence of rickettsial diseases as a major cause of pyrexias of unknown origin, bioterrorism concerns, vector movement, and concerns over drug resistance are driving a need to identify novel treatments for these obligate intracellular bacteria. Utilizing a uvGFP plasmid reporter, we developed a screen for identifying anti-rickettsial small-molecule inhibitors using Rickettsia canadensis as a model organism. The screening data were utilized to train a Bayesian model to predict growth inhibition in this assay. This two-pronged methodology identified anti-rickettsial compounds, including duartin and JSF-3204, as highly specific, efficacious, and noncytotoxic. Both molecules exhibited in vitro growth inhibition of R. prowazekii, the causative agent of epidemic typhus. These small molecules and the workflow, featuring a high-throughput phenotypic screen for growth inhibitors of intracellular Rickettsia spp. and machine learning models for the prediction of growth inhibition of an obligate intracellular Gram-negative bacterium, should prove useful in the search for new therapeutic strategies to treat infections from Rickettsia spp. and other obligate intracellular bacteria.
Affiliation(s)
- Alexander Lemenze
- Department of Medicine, and the Ruy V. Lourenco Center for the Study of Emerging and Reemerging Pathogens, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Department of Pathology, Immunology, and Laboratory Medicine, Rutgers University - New Jersey Medical School, Cancer Center Building, 205 South Orange Avenue, Newark, New Jersey 07103, United States
- Nisha Mittal
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Bristol Myers Squibb, 1 Squibb Drive, Building 85 Room A-WS216D, New Brunswick, New Jersey 08901, United States
- Alexander L. Perryman
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Repare Therapeutics, 7171 Rue Frederick-Banting, Montreal, Quebec H4S 1Z9, Canada
- Samer S. Daher
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Ambrx, 10975 N. Torrey Pines Road, La Jolla, California 92037, United States
- Sean Ekins
- Collaborations in Chemistry, Fuquay-Varina, North Carolina 27526, United States; Present Address: Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, North Carolina 27606, United States
- James Occi
- Department of Medicine, and the Ruy V. Lourenco Center for the Study of Emerging and Reemerging Pathogens, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Center for Vector Biology, Department of Entomology, Rutgers University, 180 Jones Avenue, New Brunswick, New Jersey 08901, United States
- Yong-Mo Ahn
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States
- Xin Wang
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, United States
- Riccardo Russo
- Department of Medicine, and the Ruy V. Lourenco Center for the Study of Emerging and Reemerging Pathogens, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States
- Jimmy S. Patel
- Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: Department of Radiation Oncology, Winship Cancer Institute of Emory University, 1365-A Clifton Road NE, Atlanta, Georgia 30322, United States
- Robin M. Daugherty
- Department of Microbiology and Immunology, University of South Alabama, Mobile, Alabama 36688, United States
- David O. Wood
- Department of Microbiology and Immunology, University of South Alabama, Mobile, Alabama 36688, United States
- Nancy Connell
- Department of Medicine, and the Ruy V. Lourenco Center for the Study of Emerging and Reemerging Pathogens, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States; Present Address: U.S. National Academies of Science, Engineering and Medicine, 500 5th Street NW, Washington, District of Columbia 20002, United States
- Joel S. Freundlich
- Department of Medicine, and the Ruy V. Lourenco Center for the Study of Emerging and Reemerging Pathogens and Department of Pharmacology, Physiology, and Neuroscience, Rutgers University - New Jersey Medical School, Newark, New Jersey 07103, United States
3
Lane TR, Urbina F, Rank L, Gerlach J, Riabova O, Lepioshkin A, Kazakova E, Vocat A, Tkachenko V, Cole S, Makarov V, Ekins S. Machine Learning Models for Mycobacterium tuberculosis In Vitro Activity: Prediction and Target Visualization. Mol Pharm 2022; 19:674-689. [PMID: 34964633 PMCID: PMC9121329 DOI: 10.1021/acs.molpharmaceut.1c00791] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Tuberculosis (TB) is a major global health challenge, with approximately 1.4 million deaths per year. There is still a need to develop novel treatments for patients infected with Mycobacterium tuberculosis (Mtb). There have been many large-scale phenotypic screens that have led to the identification of thousands of new compounds. Yet, there is very limited investment in TB drug discovery which points to the need for new methods to increase the efficiency of drug discovery against Mtb. We have used machine learning approaches to learn from the public Mtb data, resulting in many data sets and models with robust enrichment and hit rates leading to the discovery of new active compounds. Recently, we have curated predominantly small-molecule Mtb data and developed new machine learning classification models with 18 886 molecules at different activity cutoffs. We now describe the further validation of these Bayesian models using a library of over 1000 molecules synthesized as part of EU-funded New Medicines for TB and More Medicines for TB programs. We highlight molecular features which are enriched in these active compounds. In addition, we provide new regression and classification models that can be used for scoring compound libraries or used to design new molecules. We have also visualized these molecules in the context of known molecular targets and identified clusters in chemical property space, which may aid in future target identification efforts. Finally, we are also making these data sets publicly available, representing a significant increase to the available Mtb inhibition data in the public domain.
Affiliation(s)
- Thomas R. Lane
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Fabio Urbina
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Laura Rank
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Jacob Gerlach
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
- Olga Riabova
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Elena Kazakova
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Anthony Vocat
- Global Health Institute, Ecole Polytechnique Fédérale de Lausanne, Lausanne 1015, Switzerland
- Valery Tkachenko
- Science Data Experts, 14909 Forest Landing Cir, Rockville, MD 20850
- Vadim Makarov
- Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC 27606, USA
4
Alajlani MM. The Chemical Property Position of Bedaquiline Construed by a Chemical Global Positioning System-Natural Product. Molecules 2022; 27:753. [DOI: 10.3390/molecules27030753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Bedaquiline is a novel adenosine triphosphate synthase inhibitor anti-tuberculosis drug. Bedaquiline belongs to the class of diarylquinolines, which are antituberculosis drugs that are quite different mechanistically from quinolines and fluoroquinolones. The fact that relatively similar chemical drugs produce different mechanisms of action is still not widely understood. To enhance discrimination in favor of bedaquiline, a new approach using eight-score principal component analysis (PCA), provided by a ChemGPS-NP model, is proposed. PCA scores were calculated based on 35 + 1 different physicochemical properties and demonstrated clear differences when compared with other quinolines. The ChemGPS-NP model provided an exceptional set of 100 compounds nearest to bedaquiline from antituberculosis screening sets (with a cumulative Euclidean distance of 196.83), compared with the 2D similarity provided by Tanimoto methods (extended connectivity fingerprints and the Molecular ACCess System, showing 30% and 182% increases in cumulative Euclidean distance, respectively). Potentially similar compounds from publicly available antituberculosis compounds and Maybridge sets, based on bedaquiline’s eight-dimensional similarity and different filtrations, were also identified.
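The two similarity measures compared in this abstract can be illustrated with a minimal Python sketch. It is not the ChemGPS-NP implementation: the function names, the toy eight-dimensional score vectors, and the representation of fingerprints as sets of on-bit indices are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cumulative_nearest_distance(query, candidates, k):
    """Sum of the distances from `query` to its k nearest candidates."""
    distances = sorted(euclidean(query, c) for c in candidates)
    return sum(distances[:k])

def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient for fingerprints stored as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0

# Hypothetical eight-dimensional PCA scores (toy values, not from the paper).
query_scores = [0.0] * 8
library_scores = [
    [3.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # distance 5 from the query
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # distance 1 from the query
    [0.0, 0.0, 6.0, 8.0, 0.0, 0.0, 0.0, 0.0],  # distance 10 from the query
]

# Cumulative distance of the two nearest compounds: 1 + 5.
total = cumulative_nearest_distance(query_scores, library_scores, k=2)

# Two toy fingerprints sharing two of four distinct bits.
similarity = tanimoto({1, 2, 3}, {2, 3, 4})
```

A smaller cumulative distance means the selected neighbours sit closer to the query in property space, which is how the abstract compares the PCA-based selection with the fingerprint-based ones.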
5
The Chemical Property Position of Bedaquiline Construed by a Chemical Global Positioning System-Natural Product. Molecules 2022; 27:753. [PMID: 35164018 PMCID: PMC8838968 DOI: 10.3390/molecules27030753] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 11/18/2022]
6
Computational Drug Repurposing for Antituberculosis Therapy: Discovery of Multi-Strain Inhibitors. Antibiotics (Basel) 2021; 10:1005. [PMID: 34439055 PMCID: PMC8388932 DOI: 10.3390/antibiotics10081005] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/01/2021] [Revised: 08/15/2021] [Accepted: 08/17/2021] [Indexed: 12/13/2022]
Abstract
Tuberculosis remains the most afflicting infectious disease known to humankind, with one quarter of the population estimated to have it in the latent state. Discovering antituberculosis drugs is a challenging, complex, expensive, and time-consuming task. To overcome the substantial costs and accelerate drug discovery and development, drug repurposing has emerged as an attractive alternative for finding new applications for “old” drugs, and computational approaches play an essential role in it by filtering the chemical space. This work reports the first multi-condition model based on quantitative structure–activity relationships and an ensemble of neural networks (mtc-QSAR-EL) for the virtual screening of potential antituberculosis agents able to act as multi-strain inhibitors. The mtc-QSAR-EL model exhibited an accuracy higher than 85%. A physicochemical and fragment-based structural interpretation of this model was provided, and a large dataset of agency-regulated chemicals was virtually screened, with the mtc-QSAR-EL model identifying already proven antituberculosis drugs while proposing chemicals with great potential to be experimentally repurposed as antituberculosis (multi-strain inhibitor) agents. Some of the most promising molecules identified by the mtc-QSAR-EL model as antituberculosis agents were also confirmed by another computational approach, supporting the capabilities of the mtc-QSAR-EL model as an efficient tool for computational drug repurposing.
7
Recent advances in drug repurposing using machine learning. Curr Opin Chem Biol 2021; 65:74-84. [PMID: 34274565 DOI: 10.1016/j.cbpa.2021.06.001] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/11/2021] [Revised: 05/28/2021] [Accepted: 06/01/2021] [Indexed: 12/11/2022]
Abstract
Drug repurposing aims to find new uses for already existing and approved drugs. We now provide a brief overview of recent developments in drug repurposing using machine learning alongside other computational approaches for comparison. We also highlight several applications for cancer using kinase inhibitors, Alzheimer's disease as well as COVID-19.
8
Vatansever S, Schlessinger A, Wacker D, Kaniskan HÜ, Jin J, Zhou M, Zhang B. Artificial intelligence and machine learning-aided drug discovery in central nervous system diseases: State-of-the-arts and future directions. Med Res Rev 2021; 41:1427-1473. [PMID: 33295676 PMCID: PMC8043990 DOI: 10.1002/med.21764] [Citation(s) in RCA: 131] [Impact Index Per Article: 32.8] [Received: 07/23/2020] [Revised: 10/30/2020] [Accepted: 11/20/2020] [Indexed: 01/11/2023]
Abstract
Neurological disorders significantly outnumber diseases in other therapeutic areas. However, developing drugs for central nervous system (CNS) disorders remains the most challenging area in drug discovery, accompanied by long timelines and high attrition rates. With the rapid growth of biomedical data enabled by advanced experimental technologies, artificial intelligence (AI) and machine learning (ML) have emerged as indispensable tools to draw meaningful insights and improve decision making in drug discovery. Thanks to advancements in AI and ML algorithms, AI/ML-driven solutions now have an unprecedented potential to accelerate the process of CNS drug discovery with a better success rate. In this review, we comprehensively summarize AI/ML-powered pharmaceutical discovery efforts and their implementations in the CNS area. After introducing the AI/ML models as well as the conceptualization and data preparation, we outline the applications of AI/ML technologies to several key procedures in drug discovery, including target identification, compound screening, hit/lead generation and optimization, drug response and synergy prediction, de novo drug design, and drug repurposing. We review the current state of the art of AI/ML-guided CNS drug discovery, focusing on blood-brain barrier permeability prediction and implementation into therapeutic discovery for neurological diseases. Finally, we discuss the major challenges and limitations of current approaches and possible future directions that may provide resolutions to these difficulties.
Affiliation(s)
- Sezen Vatansever
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Avner Schlessinger
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Daniel Wacker
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Neuroscience, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- H. Ümit Kaniskan
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Jian Jin
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Therapeutics Discovery, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Ming-Ming Zhou
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Oncological Sciences, Tisch Cancer Institute, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Mount Sinai Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
9
Ye Q, Chai X, Jiang D, Yang L, Shen C, Zhang X, Li D, Cao D, Hou T. Identification of active molecules against Mycobacterium tuberculosis through machine learning. Brief Bioinform 2021; 22:6209685. [PMID: 33822874 DOI: 10.1093/bib/bbab068] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 12/14/2020] [Revised: 01/23/2021] [Accepted: 02/09/2021] [Indexed: 11/14/2022]
Abstract
Tuberculosis (TB) is an infectious disease caused by Mycobacterium tuberculosis (Mtb), and it has been one of the top 10 causes of death globally. Extensively drug-resistant tuberculosis (XDR-TB), which is resistant to the commonly used first-line drugs, has emerged as a major challenge to TB treatment. Hence, it is necessary to discover novel drug candidates for TB treatment. In this study, based on different types of molecular representations, four machine learning (ML) algorithms, including support vector machine, random forest (RF), extreme gradient boosting (XGBoost) and deep neural networks (DNN), were used to develop classification models to distinguish Mtb inhibitors from noninhibitors. The results demonstrate that the XGBoost model exhibits the best prediction performance. Then, two consensus strategies were employed to integrate the predictions from multiple models. The evaluation results illustrate that the consensus model built by stacking the RF, XGBoost and DNN predictions offers the best predictions, with an area under the receiver operating characteristic curve of 0.842 and 0.942 for the 10-fold cross-validated training set and external test set, respectively. In addition, the association between the important descriptors and the bioactivities of molecules was interpreted by using the Shapley additive explanations method. Finally, an online webserver called ChemTB (http://cadd.zju.edu.cn/chemtb/) was developed, and it offers a freely available computational tool to detect potential Mtb inhibitors.
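The metric quoted in this abstract, the area under the receiver operating characteristic curve, can be computed directly from its rank interpretation. The following Python sketch does so on toy data. Note that the consensus shown here is a plain average of model scores, used as a simple stand-in. It is not the stacking strategy the authors report, and all names and values are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    Ties between an active and an inactive count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

def average_consensus(*model_scores):
    """Average per-molecule scores across models (not stacking)."""
    return [sum(column) / len(column) for column in zip(*model_scores)]

# Toy labels (1 = inhibitor, 0 = noninhibitor) and two hypothetical models.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.4, 0.5, 0.1]  # ranks one active below an inactive
model_b = [0.6, 0.8, 0.3, 0.4]  # ranks both actives above both inactives

auc_a = roc_auc(labels, model_a)
consensus = average_consensus(model_a, model_b)
auc_consensus = roc_auc(labels, consensus)
```

In this toy case the averaged scores rank every active above every inactive, so the consensus AUC exceeds that of `model_a` alone. A stacking model would instead train a second-level learner on the base-model predictions.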
Affiliation(s)
- Qing Ye
- College of Pharmaceutical Sciences at Zhejiang University, China
- Xin Chai
- College of Pharmaceutical Sciences at Zhejiang University, China
- Dejun Jiang
- College of Pharmaceutical Sciences at Zhejiang University, China
- Liu Yang
- College of Pharmaceutical Sciences at Zhejiang University, China
- Chao Shen
- College of Pharmaceutical Sciences at Zhejiang University, China
- Xujun Zhang
- College of Pharmaceutical Sciences at Zhejiang University, China
- Dan Li
- College of Pharmaceutical Sciences at Zhejiang University, China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences at Central South University, China
- Tingjun Hou
- College of Pharmaceutical Sciences at Zhejiang University, China
10
Minias A, Żukowska L, Lechowicz E, Gąsior F, Knast A, Podlewska S, Zygała D, Dziadek J. Early Drug Development and Evaluation of Putative Antitubercular Compounds in the -Omics Era. Front Microbiol 2021; 11:618168. [PMID: 33603720 PMCID: PMC7884339 DOI: 10.3389/fmicb.2020.618168] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/16/2020] [Accepted: 12/30/2020] [Indexed: 12/14/2022]
Abstract
Tuberculosis (TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. According to the WHO, the disease is one of the top 10 causes of death worldwide. Mycobacterium tuberculosis is an intracellular pathogen with an unusually thick, waxy cell wall and a complex life cycle. These factors, combined with the ability of M. tuberculosis to enter prolonged periods of latency, make the bacterium very difficult to eradicate. The standard treatment of TB requires 6-20 months, depending on the drug susceptibility of the infecting strain. The need to take cocktails of antibiotics to treat tuberculosis effectively and the emergence of drug-resistant strains prompt the search for new antitubercular compounds. This review provides a perspective on how modern -omic technologies facilitate the drug discovery process for tuberculosis treatment. We discuss how methods of DNA and RNA sequencing, proteomics, and genetic manipulation of organisms increase our understanding of mechanisms of action of antibiotics and allow the evaluation of drugs. We explore the utility of mathematical modeling and modern computational analysis for the drug discovery process. Finally, we summarize how -omic technologies contribute to our understanding of the emergence of drug resistance.
Affiliation(s)
- Alina Minias
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Lidia Żukowska
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- BioMedChem Doctoral School of the University of Lodz and the Institutes of the Polish Academy of Sciences in Lodz, Lodz, Poland
- Ewelina Lechowicz
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Institute of Microbiology, Biotechnology and Immunology, Faculty of Biology and Environmental Protection, University of Lodz, Lodz, Poland
- Filip Gąsior
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- BioMedChem Doctoral School of the University of Lodz and the Institutes of the Polish Academy of Sciences in Lodz, Lodz, Poland
- Agnieszka Knast
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Institute of Molecular and Industrial Biotechnology, Faculty of Biotechnology and Food Sciences, Lodz University of Technology, Lodz, Poland
- Sabina Podlewska
- Department of Technology and Biotechnology of Drugs, Jagiellonian University Medical College, Krakow, Poland
- Maj Institute of Pharmacology, Polish Academy of Sciences, Krakow, Poland
- Daria Zygała
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
- Institute of Microbiology, Biotechnology and Immunology, Faculty of Biology and Environmental Protection, University of Lodz, Lodz, Poland
- Jarosław Dziadek
- Laboratory of Genetics and Physiology of Mycobacterium, Institute of Medical Biology, Polish Academy of Sciences, Lodz, Poland
11
Pires DEV, Ascher DB. mycoCSM: Using Graph-Based Signatures to Identify Safe Potent Hits against Mycobacteria. J Chem Inf Model 2020; 60:3450-3456. [PMID: 32615035 DOI: 10.1021/acs.jcim.0c00362] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/16/2022]
Abstract
Development of new potent, safe drugs to treat Mycobacteria has proven to be challenging, with limited hit rates of initial screens restricting subsequent development efforts. Despite significant efforts and the evolution of quantitative structure-activity relationship as well as machine learning-based models for computationally predicting molecule bioactivity, there is an unmet need for efficient and reliable methods for identifying biologically active compounds against Mycobacterium that are also safe for humans. Here we developed mycoCSM, a graph-based signature approach to rapidly identify compounds likely to be active against bacteria from the genus Mycobacterium, or against specific Mycobacteria species. mycoCSM was trained and validated on eight organism-specific data sets and, for the first time, a general Mycobacteria data set, achieving correlation coefficients of up to 0.89 on cross-validation and 0.88 on independent blind tests when predicting bioactivity in terms of minimum inhibitory concentration. In addition, we developed a predictor to identify those compounds likely to penetrate necrotic tuberculosis foci, which achieved a correlation coefficient of 0.75. Together with a built-in estimator of the maximum tolerated dose in humans, we believe this method will provide a valuable resource to enrich screening libraries with potent, safe molecules. To provide simple guidance in the selection of libraries with favorable anti-Mycobacteria properties, we made mycoCSM freely available online at http://biosig.unimelb.edu.au/myco_csm.
Affiliation(s)
- Douglas E V Pires
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne 3004, VIC, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville 3052, VIC, Australia; School of Computing and Information Systems, University of Melbourne, Parkville 3052, VIC, Australia
- David B Ascher
- Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, 75 Commercial Road, Melbourne 3004, VIC, Australia; Department of Biochemistry and Molecular Biology, Bio21 Institute, University of Melbourne, 30 Flemington Rd, Parkville 3052, VIC, Australia; Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, England
12
Makarov V, Salina E, Reynolds RC, Kyaw Zin PP, Ekins S. Molecule Property Analyses of Active Compounds for Mycobacterium tuberculosis. J Med Chem 2020; 63:8917-8955. [PMID: 32259446 DOI: 10.1021/acs.jmedchem.9b02075] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Indexed: 12/12/2022]
Abstract
Tuberculosis (TB) continues to claim the lives of around 1.7 million people per year. Most concerning are the reports of multidrug resistance. Paradoxically, this global health pandemic is demanding new therapies when resources and interest are waning. However, continued tuberculosis drug discovery is critical to address the global health need and burgeoning multidrug resistance. Many diverse classes of antitubercular compounds have been identified with activity in vitro and in vivo. Our analyses of over 100 active leads, which are representative of thousands of active compounds generated over the past decade, suggest that they come from few chemical classes or natural product sources. We are therefore repeatedly identifying compounds that are similar to those that preceded them. Our molecule-centered cheminformatics analyses point to the need to dramatically increase the diversity of chemical libraries tested and get outside of the historic Mtb property space if we are to generate novel improved antitubercular leads.
Affiliation(s)
- Vadim Makarov
- FRC Fundamentals of Biotechnology, Russian Academy of Science, Moscow 119071, Russia
| | - Elena Salina
- FRC Fundamentals of Biotechnology, Russian Academy of Science, Moscow 119071, Russia
| | - Robert C Reynolds
- Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 Second Avenue South, Birmingham, Alabama 35294-3300, United States
| | - Phyo Phyo Kyaw Zin
- Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States.,Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510 Raleigh, North Carolina 27606, United States
|
13
|
Amin SA, Banerjee S, Adhikari N, Jha T. Discriminations of active from inactive HDAC8 inhibitors Part II: Bayesian classification study to find molecular fingerprints. SAR AND QSAR IN ENVIRONMENTAL RESEARCH 2020; 31:245-260. [PMID: 32073312 DOI: 10.1080/1062936x.2020.1723136] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 11/29/2019] [Accepted: 01/26/2020] [Indexed: 06/10/2023]
Abstract
In continuation of our earlier work (DOI: 10.1080/07391102.2019.1661876), a statistically validated and robust Bayesian model was developed on a large diverse set of HDAC8 inhibitors. The training set comprised 676 small molecules, and 293 compounds were considered as test set molecules. The findings of this analysis will help to explore some major directions regarding the HDAC8 inhibitor designing approach. Acrylamide (G1-G3, G9), N-substituted 2-phenylimidazole (G4-G8, G9, G12-G13, G16-G19), benzimidazole (G10-G11), piperidine substituted pyrrole (G13-G14) groups, alkyl/aryl amide (G15) and aryloxy carboxamide (G20) fingerprints were found to play a crucial role in HDAC8 inhibitory activity, whereas the -CH-N=CH- (B1, B4-B6, B14) motif, benzamide (B2-B3, B9-B13, B16-B17) groups and the heptazepine (B7-B8, B15, B18-B20) group were found to negatively influence the HDAC8 inhibitory activity. The importance of such fingerprints was further validated by the HDAC8 enzyme and related inhibitor interactions at the receptor level. These results are in close agreement with those of our previous work, and the two studies validate each other. Moreover, this comparative learning may enrich future endeavours regarding the designing strategy of HDAC8 inhibitors.
Affiliation(s)
- S A Amin
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
| | - S Banerjee
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
| | - N Adhikari
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
| | - T Jha
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata, India
|
14
|
Evaluating Antimycobacterial Screening Schemes Using Chemical Global Positioning System-Natural Product Analysis. Molecules 2020; 25:945. [PMID: 32093238 PMCID: PMC7071165 DOI: 10.3390/molecules25040945] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/27/2019] [Revised: 02/06/2020] [Accepted: 02/14/2020] [Indexed: 11/20/2022] Open
Abstract
Most of the targeted discoveries in tuberculosis research have covered previously explored chemical structures but neglected physicochemical properties. Until now, no efficient prediction tools have been developed to discriminate the novelty of screened compounds at early stages. To overcome this deficit, a drastic novel approach must include physicochemical property filters provided by Chemical Global Positioning System-Natural Product analysis (ChemGPS-NP). Three different screening schemes, GSK, GVKBio, and NIAID, provided 776, 2880, and 3779 compounds respectively and were evaluated based on their physicochemical properties and thereby proposed as deduction examples. Charting the physicochemical property spaces of these sets identified the merits and demerits of each screening scheme by simply observing the distribution over the chemical property space. We found that the GSK screening set was confined to a certain space, losing potentially active compounds when compared with an in-house constructed set of 459 highly active compounds (active set), while the GVKBio and NIAID screening schemes were evenly distributed through space. The latter two sets had the advantage, as they covered a larger space and presented compounds with additional variety of properties and activities. The in-house active set was cross-validated with MycPermCheck and SmartsFilter to be able to identify priority compounds. The model demonstrated undiscovered spaces when matched with Maybridge drug-like space, providing further potential targets. These undiscovered spaces should be considered in any future investigations. We have included the most active compounds along with permeability and toxicity filters as supplementary material.
|
15
|
Alajlani MM, Backlund A. Evaluating Antimycobacterial Screening Schemes Using Chemical Global Positioning System-Natural Product Analysis. Molecules 2020; 25:945. [DOI: 10.3390/molecules25040945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023] Open
|
16
|
Amin SA, Ghosh K, Mondal D, Jha T, Gayen S. Exploring indole derivatives as myeloid cell leukaemia-1 (Mcl-1) inhibitors with multi-QSAR approach: a novel hope in anti-cancer drug discovery. NEW J CHEM 2020. [DOI: 10.1039/d0nj03863f] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 01/25/2023]
Abstract
In humans, the over-expression of Mcl-1 protein causes different cancers and it is also responsible for cancer resistance to different cytotoxic agents.
Affiliation(s)
- Sk. Abdul Amin
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata
| | - Kalyan Ghosh
- Laboratory of Drug Design and Discovery, Department of Pharmaceutical Sciences, Dr Harisingh Gour University, Sagar, India
| | - Dipayan Mondal
- Laboratory of Drug Design and Discovery, Department of Pharmaceutical Sciences, Dr Harisingh Gour University, Sagar, India
| | - Tarun Jha
- Natural Science Laboratory, Division of Medicinal and Pharmaceutical Chemistry, Department of Pharmaceutical Technology, Jadavpur University, Kolkata
| | - Shovanlal Gayen
- Laboratory of Drug Design and Discovery, Department of Pharmaceutical Sciences, Dr Harisingh Gour University, Sagar, India
|
17
|
Zhou WN, Zhang YM, Qiao X, Pan J, Yin LF, Zhu L, Zhao JN, Lu S, Lu T, Chen YD, Liu HC. Virtual Screening Strategy Combined Bayesian Classification Model, Molecular Docking for Acetyl-CoA Carboxylases Inhibitors. Curr Comput Aided Drug Des 2019; 15:193-205. [PMID: 30411690 DOI: 10.2174/1573409914666181109110030] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/11/2017] [Revised: 08/11/2018] [Accepted: 10/16/2018] [Indexed: 11/22/2022]
Abstract
INTRODUCTION Acetyl-CoA Carboxylases (ACC) have been an important target for the therapy of metabolic syndrome, such as obesity, hepatic steatosis, insulin resistance, dyslipidemia, non-alcoholic fatty liver disease (NAFLD), non-alcoholic steatohepatitis (NASH), type 2 diabetes (T2DM), and some other diseases. METHODS In this study, a virtual screening strategy combined with Bayesian categorization modeling, molecular docking and binding site analysis with protein ligand interaction fingerprint (PLIF) was adopted to validate some potent ACC inhibitors. First, the best Bayesian model, with an excellent Area Under Curve (AUC) value (training set AUC: 0.972, test set AUC: 0.955), was used to screen compounds of the validation library. Then the compounds screened by the best Bayesian model were further screened by molecular docking. RESULTS Finally, the hit compounds evaluated with four percentages (1%, 2%, 5%, 10%) were verified to reveal enrichment rates for the compounds. The combination of the ligand-based Bayesian model and structure-based virtual screening resulted in the identification of the top four compounds, which exhibited excellent IC50 values against ACC, in the top 1% of the validation library. CONCLUSION In summary, the whole strategy is of high efficiency, and would be helpful for the discovery of ACC inhibitors and some other target inhibitors.
Affiliation(s)
- Wei-Neng Zhou
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Yan-Min Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Xin Qiao
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Jing Pan
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Ling-Feng Yin
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Lu Zhu
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Jun-Nan Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Shuai Lu
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China.,State Key Laboratory of Natural Medicines, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Ya-Dong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
| | - Hai-Chun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Basic Science, China Pharmaceutical University, Nanjing, Jiangsu, China
|
18
|
Azimi F, Ghasemi JB, Saghaei L, Hassanzadeh F, Mahdavi M, Sadeghi-Aliabadi H, Scotti MT, Scotti L. Identification of Essential 2D and 3D Chemical Features for Discovery of the Novel Tubulin Polymerization Inhibitors. Curr Top Med Chem 2019; 19:1092-1120. [DOI: 10.2174/1568026619666190520083655] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/12/2018] [Revised: 02/12/2019] [Accepted: 04/02/2019] [Indexed: 12/21/2022]
Abstract
Background: Tubulin polymerization inhibitors interfere with microtubule assembly, and their functions lead to mitotic arrest; therefore, they are an attractive target for the design and development of novel anticancer compounds.
Objective: The proposed novel and effective structures following the use of three-dimensional quantitative structure-activity relationship (3D-QSAR) pharmacophore-based virtual screening clearly demonstrate the high efficiency of this method in modern drug discovery.
Method: A combined computational approach was applied to extract the essential 2D and 3D feature requirements for higher activity as well as to identify new anti-tubulin agents.
Results: The best quantitative pharmacophore model, Hypo1, exhibited a good correlation of 0.943 (RMSD = 1.019) and excellent predictive power in the training set compounds. The generated model, AHHHR, was well mapped to the colchicine site, and the three-dimensional spatial arrangement of its features was in good agreement with the vital interactions in the active site. Based on total prediction accuracy (0.92 for the training set and 0.86 for the test set), enrichment factor (4.2 for the training set and 4.5 for the test set) and the area under the ROC curve (0.86 for the training set and 0.94 for the test set), the model developed using Extended Class FingerPrints of maximum diameter 4 (ECFP_4) was chosen as the best model.
Conclusion: The developed computational platform provided a better understanding of the feature requirements for colchicine site inhibitors, and we believe the results of this study might be useful for the rational design and optimization of new inhibitors.
Affiliation(s)
- Fateme Azimi
- Department of Medicinal Chemistry, Faculty of Pharmacy, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Jahan B. Ghasemi
- Department of Chemistry, Faculty of Sciences, University of Tehran, Tehran, Iran
| | - Lotfollah Saghaei
- Department of Medicinal Chemistry, Faculty of Pharmacy, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Farshid Hassanzadeh
- Department of Medicinal Chemistry, Faculty of Pharmacy, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Mohammad Mahdavi
- Endocrinology and Metabolism Research Center, Endocrinology and Metabolism Research Institute, Tehran University of Medical Sciences, Tehran, Iran
| | - Hojjat Sadeghi-Aliabadi
- Department of Medicinal Chemistry, Faculty of Pharmacy, Isfahan University of Medical Sciences, Isfahan, Iran
| | - Marcus T. Scotti
- Federal University of Paraiba, Health Sciences Center, Campus I, Joao Pessoa, PB, Brazil
| | - Luciana Scotti
- Federal University of Paraiba, Health Sciences Center, Campus I, Joao Pessoa, PB, Brazil
|
19
|
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 369] [Impact Index Per Article: 61.5] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI which have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state-of-the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
| | - Ryan Byrne
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Gisbert Schneider
- ETH Zurich , Department of Chemistry and Applied Biosciences , Vladimir-Prelog-Weg 4 , CH-8093 Zurich , Switzerland
| | - Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital , Sichuan University , Chengdu , Sichuan 610041 , China
|
20
|
Lane T, Russo DP, Zorn KM, Clark AM, Korotcov A, Tkachenko V, Reynolds RC, Perryman AL, Freundlich JS, Ekins S. Comparing and Validating Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. Mol Pharm 2018; 15:4346-4360. [PMID: 29672063 PMCID: PMC6167198 DOI: 10.1021/acs.molpharmaceut.8b00083] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Indexed: 12/31/2022]
Abstract
Tuberculosis is a global health dilemma. In 2016, the WHO reported 10.4 million incidences and 1.7 million deaths. The need to develop new treatments for those infected with Mycobacterium tuberculosis (Mtb) has led to many large-scale phenotypic screens and many thousands of new active compounds identified in vitro. However, with limited funding, efforts to discover new active molecules against Mtb need to be more efficient. Several computational machine learning approaches have been shown to have good enrichment and hit rates. We have curated small molecule Mtb data and developed new models with a total of 18,886 molecules with activity cutoffs of 10 μM, 1 μM, and 100 nM. These data sets were used to evaluate different machine learning methods (including deep learning) and metrics and to generate predictions for additional molecules published in 2017. One Mtb model, a combined in vitro and in vivo data Bayesian model at a 100 nM activity cutoff, yielded the following metrics for 5-fold cross validation: accuracy = 0.88, precision = 0.22, recall = 0.91, specificity = 0.88, kappa = 0.31, and MCC = 0.41. We have also curated an evaluation set (n = 153 compounds) published in 2017, and when used to test our model, it showed comparable statistics (accuracy = 0.83, precision = 0.27, recall = 1.00, specificity = 0.81, kappa = 0.36, and MCC = 0.47). We have also compared these models with additional machine learning algorithms, showing that Bayesian machine learning models constructed with literature Mtb data generated by different laboratories generally were equivalent to or outperformed deep neural networks with external test sets. Finally, we have also compared our training and test sets to show they were suitably diverse and different in order to represent useful evaluation sets. Such Mtb machine learning models could help prioritize compounds for testing in vitro and in vivo.
Affiliation(s)
- Thomas Lane
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, NC 27599, USA
| | - Daniel P. Russo
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
- The Rutgers Center for Computational and Integrative Biology, Camden, NJ, 08102, USA
| | - Kimberley M. Zorn
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
| | - Alex M. Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal H3J 2S1, Quebec, Canada
| | - Alexandru Korotcov
- Science Data Software, LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
| | - Valery Tkachenko
- Science Data Software, LLC, 14914 Bradwill Court, Rockville, MD 20850, USA
| | - Robert C. Reynolds
- Department of Medicine, Division of Hematology and Oncology, University of Alabama at Birmingham, NP 2540 J, 1720 2nd Avenue South, Birmingham, AL 35294-3300, USA
| | - Alexander L. Perryman
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, USA
| | - Joel S. Freundlich
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, USA
- Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University–New Jersey Medical School, Newark, New Jersey 07103, USA
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., Main Campus Drive, Lab 3510 Raleigh, NC 27606, USA
|
21
|
Diverse classes of HDAC8 inhibitors: in search of molecular fingerprints that regulate activity. Future Med Chem 2018; 10:1589-1602. [PMID: 29953251 DOI: 10.4155/fmc-2018-0005] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Indexed: 01/15/2023] Open
Abstract
AIM HDAC8 is one of the crucial enzymes involved in malignancy. Structural explorations of HDAC8 inhibitory activity and selectivity are required. MATERIALS & METHODS A mathematical framework was constructed to explore important molecular fragments responsible for HDAC8 inhibition. Bayesian classification models were developed on a large set of structurally diverse HDAC8 inhibitors. RESULTS This study helps to understand the structural importance of HDAC8 inhibitors. The hydrophobic aryl cap function is important for HDAC8 inhibition whereas benzamide moiety shows a negative impact on HDAC8 inhibition. CONCLUSION This work validates our previously proposed structural features for better HDAC8 inhibition. The comparative learning between the statistical and intelligent methods will surely enrich future drug design aspects of HDAC8 inhibitors.
|
22
|
Raychaudhury C, Rizvi MIH, Pal D. Combinatorial Design of Molecule using Activity-Linked Substructural Topological Information as Applied to Antitubercular Compounds. Curr Comput Aided Drug Des 2018; 15:67-81. [PMID: 29741142 DOI: 10.2174/1573409914666180509152711] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/10/2017] [Revised: 04/20/2018] [Accepted: 04/30/2018] [Indexed: 01/07/2023]
Abstract
BACKGROUND Generating a large number of compounds using combinatorial methods increases the possibility of finding novel bioactive compounds. Although some combinatorial structure generation algorithms are available, no method for generating structures from activity-linked substructural topological information has yet been reported. OBJECTIVE To develop a method using graph-theoretical techniques for generating structures of antitubercular compounds combinatorially from activity-linked substructural topological information, predict activity, and prioritize and screen potential drug candidates. METHODS Activity-related vertices are identified from datasets composed of both active and inactive, or differently active, compounds, and structures are generated combinatorially using the topological distance distribution associated with those vertices. Biological activities are predicted using topological distance based vertex indices and a rule based method. Generated structures are prioritized using a newly defined Molecular Priority Score (MPS). RESULTS Studies considering a series of Acid Alkyl Ester (AAE) compounds and three known antitubercular drugs show that active compounds can be generated from substructural information of other active compounds for all these classes of compounds. Activity predictions show a high success rate, and a number of highly active AAE compounds produced high MPS values, indicating that the MPS may help prioritize and screen potential drug molecules. A possible relation of this work with scaffold hopping and the inverse Quantitative Structure-Activity Relationship (iQSAR) problem has also been discussed. CONCLUSION The proposed method seems to hold promise for discovering novel therapeutic candidates for combating tuberculosis and may be useful for discovering novel drug molecules for the treatment of other diseases as well.
Affiliation(s)
- Chandan Raychaudhury
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
| | - Md Imbesat Hassan Rizvi
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
| | - Debnath Pal
- Department of Computational and Data Sciences, Indian Institute of Science, Bangalore, India
|
23
|
Amin SA, Adhikari N, Gayen S, Jha T. An integrated ligand-based modelling approach to explore the structure-property relationships of influenza endonuclease inhibitors. Struct Chem 2017. [DOI: 10.1007/s11224-017-0933-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
|
24
|
Shoombuatong W, Prathipati P, Owasirikul W, Worachartcheewan A, Simeon S, Anuwongcharoen N, Wikberg JES, Nantasenamat C. Towards the Revival of Interpretable QSAR Models. CHALLENGES AND ADVANCES IN COMPUTATIONAL CHEMISTRY AND PHYSICS 2017. [DOI: 10.1007/978-3-319-56850-8_1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 12/20/2022]
|
25
|
Improved pose and affinity predictions using different protocols tailored on the basis of data availability. J Comput Aided Mol Des 2016; 30:817-828. [DOI: 10.1007/s10822-016-9982-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/10/2016] [Accepted: 09/28/2016] [Indexed: 12/22/2022]
|
26
|
Robust design of some selective matrix metalloproteinase-2 inhibitors over matrix metalloproteinase-9 through in silico/fragment-based lead identification and de novo lead modification: Syntheses and biological assays. Bioorg Med Chem 2016; 24:4291-4309. [DOI: 10.1016/j.bmc.2016.07.023] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 05/24/2016] [Revised: 07/11/2016] [Accepted: 07/12/2016] [Indexed: 12/28/2022]
|
27
|
Ekins S, Perryman AL, Clark AM, Reynolds RC, Freundlich JS. Machine Learning Model Analysis and Data Visualization with Small Molecules Tested in a Mouse Model of Mycobacterium tuberculosis Infection (2014-2015). J Chem Inf Model 2016; 56:1332-43. [PMID: 27335215 PMCID: PMC4962118 DOI: 10.1021/acs.jcim.6b00004] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Indexed: 01/01/2023]
Abstract
The renewed urgency to develop new treatments for Mycobacterium tuberculosis (Mtb) infection has resulted in large-scale phenotypic screening and thousands of new active compounds in vitro. The next challenge is to identify candidates to pursue in a mouse in vivo efficacy model as a step to predicting clinical efficacy. We previously analyzed over 70 years of this mouse in vivo efficacy data, which we used to generate and validate machine learning models. Curation of 60 additional small molecules with in vivo data published in 2014 and 2015 was undertaken to further test these models. This represents a much larger test set than for the previous models. Several computational approaches have now been applied to analyze these molecules and compare their molecular properties beyond those attempted previously. Our previous machine learning models have been updated, and a novel aspect has been added in the form of mouse liver microsomal half-life (MLM t1/2) and in vitro-based Mtb models incorporating cytotoxicity data that were used to predict in vivo activity for comparison. Our best Mtb in vivo models possess fivefold ROC values > 0.7, sensitivity > 80%, and concordance > 60%, while the best specificity value is > 40%. Use of an MLM t1/2 Bayesian model affords comparable results for scoring the 60 compounds tested. Combining MLM stability and in vitro Mtb models in a novel consensus workflow in the best cases has a positive predicted value (hit rate) > 77%. Our results indicate that Bayesian models constructed with literature in vivo Mtb data generated by different laboratories in various mouse models can have predictive value and may be used alongside MLM t1/2 and in vitro-based Mtb models to assist in selecting antitubercular compounds with desirable in vivo efficacy. We demonstrate for the first time that consensus models of any kind can be used to predict in vivo activity for Mtb. In addition, we describe a new clustering method for data visualization and apply this to the in vivo training and test data, ultimately making the method accessible in a mobile app.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| | - Alexander L Perryman
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
| | - Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques #302, Montreal, Quebec H3J 2S1, Canada
| | - Robert C Reynolds
- Division of Hematology and Oncology, Department of Medicine, and Department of Chemistry, College of Arts and Sciences, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
| | - Joel S Freundlich
- Department of Pharmacology, Physiology and Neuroscience, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
- Division of Infectious Diseases, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey 07103, United States
|
28
|
Perryman AL, Stratton TP, Ekins S, Freundlich JS. Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 2016; 33:433-49. [PMID: 26415647 PMCID: PMC4712113 DOI: 10.1007/s11095-015-1800-5] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 06/30/2015] [Accepted: 09/22/2015] [Indexed: 02/07/2023]
Abstract
PURPOSE Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. METHODS Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). RESULTS "Pruning" out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h. CONCLUSIONS Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.
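The "pruning" step described in this abstract can be sketched as a simple filter on the training set. The 60-minute stable cutoff follows the half-life ≥1 h criterion in the abstract; the 30-minute unstable cutoff, the function name, and the example records are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple

# (compound identifier, MLM half-life in minutes)
Record = Tuple[str, float]


def prune_training_set(
    records: Sequence[Record],
    unstable_max_min: float,
    stable_min_min: float,
) -> List[Tuple[str, int]]:
    """Keep clearly unstable (label 0) and clearly stable (label 1) compounds.

    Compounds whose half-life falls strictly between the two cutoffs are the
    moderately stable / moderately unstable ones and are dropped.
    """
    if unstable_max_min >= stable_min_min:
        raise ValueError("unstable cutoff must be below the stable cutoff")
    labelled = []
    for compound_id, half_life in records:
        if half_life >= stable_min_min:
            labelled.append((compound_id, 1))
        elif half_life <= unstable_max_min:
            labelled.append((compound_id, 0))
    return labelled


# Made-up half-lives; the 30-minute lower cutoff is a hypothetical choice.
toy_records = [("cpd-1", 5.0), ("cpd-2", 45.0), ("cpd-3", 60.0), ("cpd-4", 120.0), ("cpd-5", 30.0)]
pruned = prune_training_set(toy_records, unstable_max_min=30.0, stable_min_min=60.0)
print(pruned)
```

The pruned set is smaller, but the two classes are better separated, which is the property the abstract credits for the improved test set enrichment.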
Affiliation(s)
- Alexander L Perryman
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA
| | - Thomas P Stratton
- Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA
| | - Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, USA
| | - Joel S Freundlich
- Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA.
- Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA.
|
29
|
Ekins S, Freundlich JS, Clark AM, Anantpadma M, Davey RA, Madrid P. Machine learning models identify molecules active against the Ebola virus in vitro. F1000Res 2016; 4:1091. [PMID: 26834994 DOI: 10.12688/f1000research.7217.2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Accepted: 12/23/2015] [Indexed: 12/15/2022] Open
Abstract
The search for small molecule inhibitors of Ebola virus (EBOV) has led to several high throughput screens over the past 3 years. These have identified a range of FDA-approved active pharmaceutical ingredients (APIs) with anti-EBOV activity in vitro, several of which are also active in a mouse infection model. There are millions of additional commercially-available molecules that could be screened for potential activities as anti-EBOV compounds. One way to prioritize compounds for testing is to generate computational models based on the high throughput screening data and then virtually screen compound libraries. In the current study, we have generated Bayesian machine learning models with viral pseudotype entry assay and the EBOV replication assay data. We have validated the models internally and externally. We have also used these models to computationally score the MicroSource library of drugs to select those likely to be potential inhibitors. Three of the highest scoring molecules that were not in the model training sets, quinacrine, pyronaridine and tilorone, were tested in vitro and had EC50 values of 350, 420 and 230 nM, respectively. Pyronaridine is a component of a combination therapy for malaria that was recently approved by the European Medicines Agency, which may make it more readily accessible for clinical testing. Like other known antimalarial drugs active against EBOV, it shares the 4-aminoquinoline scaffold. Tilorone is an investigational antiviral agent that has shown a broad array of biological activities including cell growth inhibition in cancer cells, antifibrotic properties, α7 nicotinic receptor agonist activity, radioprotective activity and activation of hypoxia inducible factor-1. Quinacrine is an antimalarial but also has use as an anthelmintic. Our results suggest data sets with fewer than 1,000 molecules can produce validated machine learning models that can in turn be utilized to identify novel EBOV inhibitors in vitro.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, Fuquay-Varina, NC, 27526, USA
- Collaborations Pharmaceuticals Inc, Fuquay-Varina, NC, 27526, USA
- Collaborative Drug Discovery, Burlingame, CA, 94010, USA
| | - Joel S Freundlich
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ, New Jersey Medical School, Newark, NJ, 07103, USA
| | - Alex M Clark
- Molecular Materials Informatics, Inc., Montreal, 94025, Canada
| | - Manu Anantpadma
- Texas Biomedical Research Institute, San Antonio, TX, 78227, USA
| | - Robert A Davey
- Texas Biomedical Research Institute, San Antonio, TX, 78227, USA
|
30
|
Fang J, Pang X, Yan R, Lian W, Li C, Wang Q, Liu AL, Du GH. Discovery of neuroprotective compounds by machine learning approaches. RSC Adv 2016. [DOI: 10.1039/c5ra23035g] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Indexed: 01/27/2023] Open
Abstract
The classification models were constructed to discover neuroprotective compounds against glutamate or H2O2-induced neurotoxicity through machine learning approaches.
Affiliation(s)
- Jiansong Fang
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
- Institute of Clinical Pharmacology
| | - Xiaocong Pang
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
| | - Rong Yan
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
| | - Wenwen Lian
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
| | - Chao Li
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
| | - Qi Wang
- Institute of Clinical Pharmacology, Guangzhou University of Traditional Chinese Medicine, Guangzhou 510006, China
| | - Ai-Lin Liu
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
- Beijing Key Laboratory of Drug Target and Screening Research
| | - Guan-Hua Du
- Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, PR China
- Beijing Key Laboratory of Drug Target and Screening Research
|
31
|
Ekins S, Freundlich JS, Clark AM, Anantpadma M, Davey RA, Madrid P. Machine learning models identify molecules active against the Ebola virus in vitro. F1000Res 2015; 4:1091. [PMID: 26834994 DOI: 10.12688/f1000research.7217.1] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Accepted: 10/15/2015] [Indexed: 12/23/2022] Open
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, Fuquay-Varina, NC, 27526, USA
- Collaborations Pharmaceuticals Inc, Fuquay-Varina, NC, 27526, USA
- Collaborative Drug Discovery, Burlingame, CA, 94010, USA
| | - Joel S Freundlich
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ, New Jersey Medical School, Newark, NJ, 07103, USA
| | - Alex M Clark
- Molecular Materials Informatics, Inc., Montreal, 94025, Canada
| | - Manu Anantpadma
- Texas Biomedical Research Institute, San Antonio, TX, 78227, USA
| | - Robert A Davey
- Texas Biomedical Research Institute, San Antonio, TX, 78227, USA
|
32
|
Ekins S, Freundlich JS, Clark AM, Anantpadma M, Davey RA, Madrid P. Machine learning models identify molecules active against the Ebola virus in vitro. F1000Res 2015; 4:1091. [PMID: 26834994 PMCID: PMC4706063 DOI: 10.12688/f1000research.7217.3] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Accepted: 01/17/2017] [Indexed: 12/21/2022] Open
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, Fuquay-Varina, NC, 27526, USA
- Collaborations Pharmaceuticals Inc, Fuquay-Varina, NC, 27526, USA
- Collaborative Drug Discovery, Burlingame, CA, 94010, USA
| | - Joel S Freundlich
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ, New Jersey Medical School, Newark, NJ, 07103, USA
| | - Alex M Clark
- Molecular Materials Informatics, Inc., Montreal, 94025, Canada
| | - Manu Anantpadma
- Texas Biomedical Research Institute, San Antonio, TX, 78227, USA
| | - Robert A Davey
- Texas Biomedical Research Institute, San Antonio, TX, 78227, USA
|
33
|
Xiong X, Yuan H, Zhang Y, Xu J, Ran T, Liu H, Lu S, Xu A, Li H, Jiang Y, Lu T, Chen Y. Protein flexibility oriented virtual screening strategy for JAK2 inhibitors. J Mol Struct 2015. [DOI: 10.1016/j.molstruc.2015.05.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 12/26/2022]
|
34
|
Ekins S, Lage de Siqueira-Neto J, McCall LI, Sarker M, Yadav M, Ponder EL, Kallel EA, Kellar D, Chen S, Arkin M, Bunin BA, McKerrow JH, Talcott C. Machine Learning Models and Pathway Genome Data Base for Trypanosoma cruzi Drug Discovery. PLoS Negl Trop Dis 2015; 9:e0003878. [PMID: 26114876 PMCID: PMC4482694 DOI: 10.1371/journal.pntd.0003878] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Received: 03/27/2015] [Accepted: 06/05/2015] [Indexed: 12/21/2022] Open
Abstract
Background Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The current clinical and preclinical pipeline for T. cruzi is extremely sparse and lacks drug target diversity. Methodology/Principal Findings In the present study we developed a computational approach that utilized data from several public whole-cell, phenotypic high throughput screens that have been completed for T. cruzi by the Broad Institute, including a single screen of over 300,000 molecules in the search for chemical probes as part of the NIH Molecular Libraries program. We have also compiled and curated relevant biological and chemical compound screening data including (i) compounds and biological activity data from the literature, (ii) high throughput screening datasets, and (iii) predicted metabolites of T. cruzi metabolic pathways. This information was used to help us identify compounds and their potential targets. We have constructed a Pathway Genome Data Base for T. cruzi. In addition, we have developed Bayesian machine learning models that were used to virtually screen libraries of compounds. Ninety-seven compounds were selected for in vitro testing, and 11 of these were found to have EC50 < 10 μM. We progressed five compounds to an in vivo mouse efficacy model of Chagas disease and validated that the machine learning model could identify in vitro active compounds not in the training set, as well as known positive controls. The antimalarial pyronaridine possessed 85.2% efficacy in the acute Chagas mouse model. We have also proposed potential targets (for future verification) for this compound based on structural similarity to known compounds with targets in T. cruzi. Conclusions/Significance We have demonstrated how combining chemoinformatics and bioinformatics for T. cruzi drug discovery can bring interesting in vivo active molecules to light that may have been overlooked. The approach we have taken is broadly applicable to other NTDs.
Author Summary Chagas disease is a neglected tropical disease (NTD) caused by the eukaryotic parasite Trypanosoma cruzi. The disease is endemic to Latin America but is increasingly found in North America and Europe, primarily through immigration, and the spread of this disease is bringing new attention to the need for novel, safe, and effective therapeutics to treat T. cruzi infection. We have used data from a phenotypic screen to build Bayesian models to predict anti-parasitic activity against T. cruzi in vitro. These models were used to score various small libraries of molecules. We selected fewer than 100 compounds for testing and found in vitro actives, some of which were tested in an in vivo efficacy model. We identified the antimalarial pyronaridine as having in vivo efficacy, which provides us with a new starting point for further investigation and optimization.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, Burlingame, California, United States of America
- Collaborations in Chemistry, Fuquay-Varina, North Carolina, United States of America
| | - Jair Lage de Siqueira-Neto
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
| | - Laura-Isobel McCall
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
| | - Malabika Sarker
- SRI International, Menlo Park, California, United States of America
| | - Maneesh Yadav
- SRI International, Menlo Park, California, United States of America
| | - Elizabeth L. Ponder
- Chemistry, Engineering & Medicine for Human Health (ChEM-H), Stanford, California, United States of America
| | - E. Adam Kallel
- Collaborative Drug Discovery, Burlingame, California, United States of America
| | - Danielle Kellar
- Department of Pathology, University of California, San Francisco, San Francisco, California, United States of America
| | - Steven Chen
- Small Molecule Discovery Center and Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, United States of America
| | - Michelle Arkin
- Small Molecule Discovery Center and Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, United States of America
| | - Barry A. Bunin
- Collaborative Drug Discovery, Burlingame, California, United States of America
| | - James H. McKerrow
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, San Diego, California, United States of America
| | - Carolyn Talcott
- SRI International, Menlo Park, California, United States of America
|
35
|
Clark AM, Ekins S. Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 2015; 55:1246-60. [PMID: 25995041 DOI: 10.1021/acs.jcim.5b00144] [Citation(s) in RCA: 63] [Impact Index Per Article: 6.3] [Indexed: 02/05/2023]
Abstract
In an associated paper, we have described a reference implementation of Laplacian-corrected naïve Bayesian model building using extended connectivity fingerprints (ECFP) and molecular function class fingerprints (FCFP) of maximum diameter 6. As a follow-up, we have now undertaken a large-scale validation study in order to ensure that the technique generalizes to a broad variety of drug discovery datasets. To achieve this, we have used the ChEMBL (version 20) database and split it into more than 2000 separate datasets, each of which consists of compounds and measurements with the same target and activity measurement. In order to test these datasets with the two-state Bayesian classification, we developed an automated algorithm for detecting a suitable threshold for active/inactive designation, which we applied to all collections. With these datasets, we were able to establish that our Bayesian model implementation is effective for the large majority of cases, and we were able to quantify the impact of fingerprint folding on the receiver operator curve cross-validation metrics. We were also able to study the impact that the choice of training/testing set partitioning has on the resulting recall rates. The datasets have been made publicly available to be downloaded, along with the corresponding model data files, which can be used in conjunction with the CDK and several mobile apps. We have also explored some novel visualization methods which leverage the structural origins of the ECFP/FCFP fingerprints to attribute regions of a molecule responsible for positive and negative contributions to activity. The ability to score molecules across thousands of relevant datasets across organisms also may help to assess desirable and undesirable off-target effects as well as suggest potential targets for compounds derived from phenotypic screens.
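A minimal sketch of Laplacian-corrected naïve Bayesian scoring over fingerprint bits may help make the technique concrete. It uses the commonly cited per-feature contribution log((A + 1) / (T·P + 1)), where T is the number of training compounds containing a bit, A the number of those that are active, and P the overall active fraction. This is an illustrative assumption about the general method, not the authors' reference implementation; fingerprints are represented as plain sets of integers rather than real ECFP/FCFP output, and all names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set


def train_laplacian_nb(fingerprints: List[Set[int]], actives: List[bool]) -> Dict[int, float]:
    """Return a per-bit contribution table: log((A + 1) / (T * P + 1))."""
    n_total = len(fingerprints)
    n_active = sum(actives)
    if n_total == 0 or n_active == 0:
        raise ValueError("need at least one compound and one active")
    base_rate = n_active / n_total  # P: fraction of actives in the training set

    total_with_bit: Counter = Counter()   # T per bit
    active_with_bit: Counter = Counter()  # A per bit
    for bits, is_active in zip(fingerprints, actives):
        for bit in bits:
            total_with_bit[bit] += 1
            if is_active:
                active_with_bit[bit] += 1

    return {
        bit: math.log((active_with_bit[bit] + 1.0) / (total_with_bit[bit] * base_rate + 1.0))
        for bit in total_with_bit
    }


def score(model: Dict[int, float], bits: Iterable[int]) -> float:
    """Sum the contributions of known bits; bits unseen in training contribute 0."""
    return sum(model.get(bit, 0.0) for bit in bits)


# Toy training set: four compounds with made-up fingerprint bits.
toy_fps = [{1, 2}, {1, 3}, {2, 4}, {3, 4}]
toy_labels = [True, True, False, False]
toy_model = train_laplacian_nb(toy_fps, toy_labels)
print(score(toy_model, {1, 2}), score(toy_model, {3, 4}))
```

The score is an unnormalized log-odds-like sum rather than a probability, which is why such models are typically paired with a threshold chosen on the training data.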
Affiliation(s)
- Alex M Clark
- Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
| | - Sean Ekins
- Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
|
36
|
Fragment virtual screening based on Bayesian categorization for discovering novel VEGFR-2 scaffolds. Mol Divers 2015; 19:895-913. [DOI: 10.1007/s11030-015-9592-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/24/2014] [Accepted: 03/25/2015] [Indexed: 12/24/2022]
|
37
|
Wang L, Le X, Li L, Ju Y, Lin Z, Gu Q, Xu J. Discovering New Agents Active against Methicillin-Resistant Staphylococcus aureus with Ligand-Based Approaches. J Chem Inf Model 2014; 54:3186-97. [DOI: 10.1021/ci500253q] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 01/17/2023]
Affiliation(s)
- Ling Wang
- Research Center for Drug Discovery and Institute of Human Virology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Xiu Le
- Research Center for Drug Discovery and Institute of Human Virology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Long Li
- Research Center for Drug Discovery and Institute of Human Virology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Yingchen Ju
- Research Center for Drug Discovery and Institute of Human Virology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Zhongxiang Lin
- College of Chemical Engineering, Nanjing Forestry University, Nanjing 210037, China
| | - Qiong Gu
- Research Center for Drug Discovery and Institute of Human Virology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
| | - Jun Xu
- Research Center for Drug Discovery and Institute of Human Virology, School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, China
|
38
|
LBVS: an online platform for ligand-based virtual screening using publicly accessible databases. Mol Divers 2014; 18:829-40. [DOI: 10.1007/s11030-014-9545-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/24/2014] [Accepted: 08/12/2014] [Indexed: 12/20/2022]
|
39
|
Novel Bayesian classification models for predicting compounds blocking hERG potassium channels. Acta Pharmacol Sin 2014; 35:1093-102. [PMID: 24976154 DOI: 10.1038/aps.2014.35] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 01/24/2014] [Accepted: 04/10/2014] [Indexed: 02/03/2023] Open
Abstract
AIM A large number of drug-induced long QT syndromes are ascribed to blockage of hERG potassium channels. The aim of this study was to construct novel computational models to predict compounds blocking hERG channels. METHODS Doddareddy's hERG blockage data containing 2644 compounds were used, which were divided into training (2389) and test (255) sets. Laplacian-corrected Bayesian classification models were constructed using Discovery Studio. The models were internally validated with the training set of compounds, and then applied to the test set for validation. Doddareddy's experimentally validated dataset with 60 compounds was used for external test set validation. RESULTS A Bayesian classification model considering the effects of four molecular properties (Mw, PPSA, ALogP and pKa_basic) as well as extended-connectivity fingerprints (ECFP_14) exhibited a global accuracy of 91%, sensitivity of 90% and specificity of 92% in the test set validation, and a global accuracy of 58%, sensitivity of 61% and specificity of 57% in the external test set validation. CONCLUSION The novel model is better than those in the literature for predicting compounds blocking hERG channels, and can be used for large-scale prediction.
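The accuracy, sensitivity and specificity figures quoted in this abstract are all derived from a confusion matrix. The sketch below shows the standard definitions; the counts are made up for illustration and are not the study's data, and the class and function names are hypothetical.

```python
from typing import NamedTuple


class ConfusionCounts(NamedTuple):
    true_pos: int   # blockers predicted as blockers
    false_neg: int  # blockers predicted as non-blockers
    true_neg: int   # non-blockers predicted as non-blockers
    false_pos: int  # non-blockers predicted as blockers


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of actual positives that were predicted positive."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of actual negatives that were predicted negative."""
    return c.true_neg / (c.true_neg + c.false_pos)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that were correct."""
    total = c.true_pos + c.false_neg + c.true_neg + c.false_pos
    return (c.true_pos + c.true_neg) / total


# Toy counts for twenty hypothetical compounds.
toy_counts = ConfusionCounts(true_pos=8, false_neg=2, true_neg=6, false_pos=4)
print(sensitivity(toy_counts), specificity(toy_counts), accuracy(toy_counts))
```

Reporting all three together matters because accuracy alone can look high on an imbalanced set even when one class is predicted poorly.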
|
40
|
Liu Z, Zheng M, Yan X, Gu Q, Gasteiger J, Tijhuis J, Maas P, Li J, Xu J. ChemStable: a web server for rule-embedded naïve Bayesian learning approach to predict compound stability. J Comput Aided Mol Des 2014; 28:941-50. [PMID: 25031075 DOI: 10.1007/s10822-014-9778-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/08/2014] [Accepted: 07/09/2014] [Indexed: 11/26/2022]
Abstract
Predicting compound chemical stability is important because unstable compounds can lead to either false positive or false negative conclusions in bioassays. Experimental data (COMDECOM) measured from DMSO/H2O solutions stored at 50 °C for 105 days were used to predict stability by applying rule-embedded naïve Bayesian learning, based upon atom center fragment (ACF) features. To build the naïve Bayesian classifier, we derived ACF features from 9,746 compounds in the COMDECOM dataset. By recursively applying naïve Bayesian learning from the data set, each ACF is assigned an expected stable probability (p(s)) and an unstable probability (p(uns)). 13,340 ACFs, together with their p(s) and p(uns) data, were stored in a knowledge base for use by the Bayesian classifier. For a given compound, its ACFs were derived from its structure connection table with the same protocol used to derive ACFs from the training data. Then, the Bayesian classifier assigned p(s) and p(uns) values to the compound ACFs by a structural pattern recognition algorithm, which was implemented in-house. Compound instability is calculated, with Bayes' theorem, based upon the p(s) and p(uns) values of the compound ACFs. We were able to achieve performance with an AUC value of 84% and a tenfold cross validation accuracy of 76.5%. To reduce false negatives, a rule-based approach has been embedded in the classifier. The rule-based module allows the program to improve its predictivity by expanding its compound instability knowledge base, thus further reducing the possibility of false negatives. To our knowledge, this is the first in silico prediction service for the prediction of the stabilities of organic compounds.
Affiliation(s)
- Zhihong Liu
- Research Center for Drug Discovery, School of Pharmaceutical Sciences, Sun Yat-sen University, 132 East Circle at University City, Guangzhou, 510006, China
|
41
|
Ekins S, Freundlich JS, Reynolds RC. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:2157-65. [PMID: 24968215 DOI: 10.1021/ci500264r] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 02/07/2023]
Abstract
Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data from large phenotypic screens in the public domain against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We have evaluated 12 models ranging from different single-point, dual-event dose-response, single-point and dual-event dose-response as well as combined data sets for three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group as well as a smaller set of 177 active compounds from GlaxoSmithKline as test sets. Our data suggest combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models based on the receiver operator curve (ROC) for these models (internal ROC range 0.83-0.91, external ROC range 0.62-0.83) compared to the orders of magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
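The ROC values used to compare models in this abstract are areas under the ROC curve. One way to compute such a value, sketched below under the assumption that higher model scores mean "more likely active", is the pairwise-ranking form: the fraction of (active, inactive) pairs in which the active compound scores higher, with ties counted as half. The function name and toy scores are illustrative only.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Area under the ROC curve via pairwise comparison of actives and inactives."""
    actives = [s for s, is_active in zip(scores, labels) if is_active]
    inactives = [s for s, is_active in zip(scores, labels) if not is_active]
    if not actives or not inactives:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for a in actives:
        for i in inactives:
            if a > i:
                wins += 1.0   # active ranked above inactive
            elif a == i:
                wins += 0.5   # tie counts as half
    return wins / (len(actives) * len(inactives))


# Made-up model scores for five compounds, two of them active.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
toy_actives = [True, False, True, False, False]
auc = roc_auc(toy_scores, toy_actives)
print(f"ROC AUC: {auc:.3f}")
```

The quadratic loop is fine for small test sets such as the 177-compound set mentioned above; a rank-based formulation would be preferable for very large screens.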
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
|
42
|
Martins F, Santos S, Ventura C, Elvas-Leitão R, Santos L, Vitorino S, Reis M, Miranda V, Correia HF, Aires-de-Sousa J, Kovalishyn V, Latino DA, Ramos J, Viveiros M. Design, synthesis and biological evaluation of novel isoniazid derivatives with potent antitubercular activity. Eur J Med Chem 2014; 81:119-38. [DOI: 10.1016/j.ejmech.2014.04.077] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 10/16/2013] [Revised: 03/08/2014] [Accepted: 04/26/2014] [Indexed: 11/28/2022]
|
43
|
Predicting mTOR inhibitors with a classifier using recursive partitioning and Naïve Bayesian approaches. PLoS One 2014; 9:e95221. [PMID: 24819222 PMCID: PMC4018356 DOI: 10.1371/journal.pone.0095221] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 01/06/2014] [Accepted: 03/25/2014] [Indexed: 01/31/2023] Open
Abstract
Background: Mammalian target of rapamycin (mTOR) is a central controller of cell growth, proliferation, metabolism, and angiogenesis. Thus, there is a great deal of interest in developing clinical drugs based on mTOR. In this paper, in silico models based on multi-scaffolds were developed to predict mTOR inhibitors or non-inhibitors. Methods: First, 1,264 diverse compounds were collected and categorized as mTOR inhibitors and non-inhibitors. Two methods, recursive partitioning (RP) and naïve Bayesian (NB), were used to build combinatorial classification models of mTOR inhibitors versus non-inhibitors using physicochemical descriptors, fingerprints, and atom center fragments (ACFs). Results: A total of 253 models were constructed and the overall predictive accuracies of the best models were more than 90% for both the training set of 964 and the external test set of 300 diverse compounds. The scaffold hopping abilities of the best models were successfully evaluated through predicting 37 new recently published mTOR inhibitors. Compared with the best RP and Bayesian models, the classifier based on ACFs and the Bayesian method shows comparable or slightly better performance and scaffold hopping abilities. A web server was developed based on the ACFs and Bayesian method (http://rcdd.sysu.edu.cn/mtor/). This web server can be used to predict online whether a compound is an mTOR inhibitor or non-inhibitor. Conclusion: In silico models were constructed to predict mTOR inhibitors using recursive partitioning and naïve Bayesian methods, and a web server (mTOR Predictor) was also developed based on the best model results. Compound prediction or virtual screening can be carried out through our web server. Moreover, the favorable and unfavorable fragments for mTOR inhibitors obtained from Bayesian classifiers will be helpful for lead optimization or the design of new mTOR inhibitors.
|
44
|
Ekins S, Pottorf R, Reynolds R, Williams AJ, Clark AM, Freundlich JS. Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:1070-82. [PMID: 24665947 PMCID: PMC4004261 DOI: 10.1021/ci500077v] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 02/08/2014] [Indexed: 02/07/2023]
Abstract
Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, support vector machines, and recursive partitioning) built from 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost-effective predictor for in vivo testing in other diseases.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| | - Richard Pottorf
- Department of Pharmacology & Physiology, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Robert C. Reynolds
- Department of Chemistry, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
| | - Antony J. Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, North Carolina 27587, United States
| | - Alex M. Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, Quebec, Canada H3J 2S1
| | - Joel S. Freundlich
- Department of Pharmacology & Physiology, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
- Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
|
45
|
Kandel DD, Raychaudhury C, Pal D. Two new atom centered fragment descriptors and scoring function enhance classification of antibacterial activity. J Mol Model 2014; 20:2164. [PMID: 24664120 DOI: 10.1007/s00894-014-2164-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 11/08/2013] [Accepted: 01/30/2014] [Indexed: 11/26/2022]
Abstract
Classification of the pharmacologic activity of a chemical compound is an essential step in any drug discovery process. We develop two new atom-centered fragment descriptors (vertex indices): one based solely on topological considerations without discriminating atom or bond types, and another based on topological and electronic features. We also assess their usefulness by devising a method to rank and classify molecules with regard to their antibacterial activity. The classification performance of our method is found to be superior to that of two previous studies on large heterogeneous data sets for hit finding and hit-to-lead studies, even though we use far fewer parameters. It is found that for hit finding studies topological features (simple graph) alone provide significant discriminating power, and that for the hit-to-lead process a small but consistent improvement can be made by additionally including electronic features (colored graph). Our approach is simple, interpretable, and suitable for design of molecules as we do not use any physicochemical properties. The singular use of vertex index as descriptor, novel range based feature extraction, and rigorous statistical validation are the key elements of this study.
|
46
|
Theoretical approaches to identify the potent scaffold for human sirtuin1 activator: Bayesian modeling and density functional theory. Med Chem Res 2014. [DOI: 10.1007/s00044-014-0983-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/11/2022]
|
47
|
Ekins S, Casey AC, Roberts D, Parish T, Bunin BA. Bayesian models for screening and TB Mobile for target inference with Mycobacterium tuberculosis. Tuberculosis (Edinb) 2014; 94:162-9. [PMID: 24440548 PMCID: PMC4394018 DOI: 10.1016/j.tube.2013.12.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 08/21/2013] [Revised: 12/04/2013] [Accepted: 12/09/2013] [Indexed: 12/19/2022]
Abstract
The search for compounds active against Mycobacterium tuberculosis is reliant upon high-throughput screening (HTS) in whole cells. We have used Bayesian machine learning models which can predict anti-tubercular activity to filter an internal library of over 150,000 compounds prior to in vitro testing. We used this to select and test 48 compounds in vitro; 11 were active with MIC values ranging from 0.4 μM to 10.2 μM, giving a high hit rate of 22.9%. Among the hits, we identified several compounds belonging to the same series including five quinolones (including ciprofloxacin), three molecules with long aliphatic linkers and three singletons. This approach represents a rapid method to prioritize compounds for testing that can be used alongside medicinal chemistry insight and other filters to identify active molecules. Such models can significantly increase the hit rate of HTS, above the usual 1% or lower rates seen. In addition, the potential targets for the 11 molecules were predicted using TB Mobile and clustering alongside a set of over 740 molecules with known M. tuberculosis target annotations. These predictions may serve as a mechanism for prioritizing compounds for further optimization.
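The hit-rate figures quoted in this abstract can be checked with a short calculation. The sketch below is illustrative only: the function name `hit_rate` is hypothetical, and the 1% baseline is an assumption that takes the abstract's "usual 1% or lower" HTS rate at its upper bound.

```python
def hit_rate(n_active: int, n_tested: int) -> float:
    """Return the fraction of tested compounds that were active."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_active / n_tested


# Figures quoted in the abstract: 11 actives among 48 compounds tested.
model_rate = hit_rate(11, 48)

# Assumption: baseline HTS hit rate taken as exactly 1%
# (the abstract says "1% or lower", so this is a conservative bound).
baseline_rate = 0.01
fold_improvement = model_rate / baseline_rate

print(f"model-guided hit rate: {model_rate:.1%}")
print(f"fold improvement over assumed baseline: {fold_improvement:.1f}")
```

Under the assumed 1% baseline, the model-guided selection corresponds to roughly a 23-fold higher hit rate; a lower true baseline would make the ratio larger.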
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
| | - Allen C Casey
- Infectious Disease Research Institute, Seattle, WA, USA
| | - David Roberts
- Infectious Disease Research Institute, Seattle, WA, USA
| | - Tanya Parish
- Infectious Disease Research Institute, Seattle, WA, USA
| | - Barry A Bunin
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA
|
48
|
Samat N, Tan PJ, Shaari K, Abas F, Lee HB. Prioritization of natural extracts by LC-MS-PCA for the identification of new photosensitizers for photodynamic therapy. Anal Chem 2014; 86:1324-31. [PMID: 24405504 DOI: 10.1021/ac403709a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 01/16/2023]
Abstract
Photodynamic therapy (PDT) is an alternative treatment for cancer that involves administration of a photosensitive drug or photosensitizer that localizes at the tumor tissue, followed by in situ excitation at an appropriate wavelength of light. Tumor tissues are then killed by cytotoxic reactive oxygen species generated by the photosensitizer. Targeted excitation and photokilling of affected tissues is achieved through focal light irradiation, thereby minimizing systemic side effects to the normal healthy tissues. Currently, only a small number of photosensitizers are in the clinic, and many of these share the same structural core based on cyclic tetrapyrroles. This paper describes how metabolic tools are utilized to prioritize natural extracts to search for structurally new photosensitizers from Malaysian biodiversity. As proof of concept, we analyzed 278 photocytotoxic extracts using a hyphenated technique of liquid chromatography-mass spectrometry coupled with principal component analysis (LC-MS-PCA) and prioritized 27 extracts that potentially contained new photosensitizers for chemical dereplication using an in-house UPLC-PDA-MS-Photocytotoxic assay platform. This led to the identification of 2 new photosensitizers with cyclic tetrapyrrolic structures, thereby demonstrating the feasibility of the metabolic approach.
Affiliation(s)
- Norazwana Samat
- Cancer Research Initiatives Foundation (CARIF), Drug Discovery Laboratory, 12A, Jalan TP 5, Taman Perindustrian UEP, 47600 Subang Jaya, Selangor Darul Ehsan, Malaysia
|
49
|
Ekins S, Freundlich JS, Reynolds RC. Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model 2013; 53:3054-63. [PMID: 24144044 DOI: 10.1021/ci400480s] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
The search for new tuberculosis treatments continues as we need to find molecules that can act more quickly, be accommodated in multidrug regimens, and overcome ever increasing levels of drug resistance. Multiple large scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) was used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in an external validation receiver operator curve (ROC) value of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test-set-dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning construction as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
|
50
|
Ekins S, Freundlich JS, Hobrath JV, Lucile White E, Reynolds RC. Combining computational methods for hit to lead optimization in Mycobacterium tuberculosis drug discovery. Pharm Res 2013; 31:414-35. [PMID: 24132686 DOI: 10.1007/s11095-013-1172-7] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 05/04/2013] [Accepted: 07/28/2013] [Indexed: 12/19/2022]
Abstract
PURPOSE: Tuberculosis treatments need to be shorter and overcome drug resistance. Our previous large scale phenotypic high-throughput screening against Mycobacterium tuberculosis (Mtb) has identified 737 active compounds and thousands that are inactive. We have used these data for building computational models as an approach to minimize the number of compounds tested. METHODS: A cheminformatics clustering approach followed by Bayesian machine learning models (based on publicly available Mtb screening data) was used to illustrate that application of these models for screening set selections can enrich the hit rate. RESULTS: In order to explore chemical diversity around active cluster scaffolds of the dose-response hits obtained from our previous Mtb screens, a set of 1924 commercially available molecules has been selected and evaluated for antitubercular activity and cytotoxicity using Vero, THP-1 and HepG2 cell lines with 4.3%, 4.2% and 2.7% hit rates, respectively. We demonstrate that models incorporating antitubercular and cytotoxicity data in Vero cells can significantly enrich the selection of non-toxic actives compared to random selection. Across all cell lines, the Molecular Libraries Small Molecule Repository (MLSMR) and cytotoxicity model identified ~10% of the hits in the top 1% screened (>10 fold enrichment). We also showed that seven out of nine Mtb active compounds from different academic published studies and eight out of eleven Mtb active compounds from a pharmaceutical screen (GSK) would have been identified by these Bayesian models. CONCLUSION: Combining clustering and Bayesian models represents a useful strategy for compound prioritization and hit-to-lead optimization of antitubercular agents.
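The enrichment figure in this abstract (about 10% of hits recovered in the top 1% of the ranked list) follows the usual enrichment-factor arithmetic, which can be sketched as below. This is a minimal illustration, not the authors' code: the function name `enrichment_factor` is hypothetical and the ranked labels are invented toy data chosen to mirror the quoted proportions.

```python
def enrichment_factor(labels_ranked, top_fraction):
    """Enrichment factor for a model-ranked list of 0/1 activity labels.

    EF = (hit rate within the top fraction) / (hit rate in the whole list).
    labels_ranked must be ordered from best to worst model score.
    """
    n = len(labels_ranked)
    total_hits = sum(labels_ranked)
    if n == 0 or total_hits == 0:
        raise ValueError("need a non-empty list with at least one hit")
    n_top = max(1, round(n * top_fraction))
    hits_top = sum(labels_ranked[:n_top])
    return (hits_top / n_top) / (total_hits / n)


# Toy data (invented for illustration): 1000 ranked compounds, 40 hits overall,
# 4 of which fall in the top 10 positions (the top 1% of the list).
top_ten = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
remainder = [1] * 36 + [0] * 954
ranked = top_ten + remainder

ef = enrichment_factor(ranked, top_fraction=0.01)
print(f"enrichment factor in top 1%: {ef:.1f}")
```

In this toy case 4 of 40 hits (10%) sit in the top 1% of the list, so the enrichment factor is about 10, matching the relationship between the two figures quoted in the abstract.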
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, USA
|