1
Patel JS, Norambuena J, Al-Tameemi H, Ahn YM, Perryman AL, Wang X, Daher SS, Occi J, Russo R, Park S, Zimmerman M, Ho HP, Perlin DS, Dartois V, Ekins S, Kumar P, Connell N, Boyd JM, Freundlich JS. Bayesian Modeling and Intrabacterial Drug Metabolism Applied to Drug-Resistant Staphylococcus aureus. ACS Infect Dis 2021; 7:2508-2521. [PMID: 34342426 DOI: 10.1021/acsinfecdis.1c00265] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 12/14/2022]
Abstract
We present the application of Bayesian modeling to identify chemical tools and/or drug discovery entities pertinent to drug-resistant Staphylococcus aureus infections. The quinoline JSF-3151 was predicted by modeling and then empirically demonstrated to be active against in vitro cultured clinical methicillin- and vancomycin-resistant strains while also exhibiting efficacy in a mouse peritonitis model of methicillin-resistant S. aureus infection. We highlight the utility of an intrabacterial drug metabolism (IBDM) approach to probe the mechanism by which JSF-3151 is transformed within the bacteria. We also identify and then validate two mechanisms of resistance in S. aureus: one mechanism involves increased expression of a lipocalin protein, and the other arises from the loss of function of an azoreductase. The computational and experimental approaches, discovery of an antibacterial agent, and elucidated resistance mechanisms collectively hold promise to advance our understanding of therapeutic regimens for drug-resistant S. aureus.
Affiliation(s)
- Jimmy S. Patel
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University − New Jersey Medical School, 185 South Orange Ave, Newark, New Jersey 07103, United States
- Javiera Norambuena
  - Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08901, United States
- Hassan Al-Tameemi
  - Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08901, United States
- Yong-Mo Ahn
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University − New Jersey Medical School, 185 South Orange Ave, Newark, New Jersey 07103, United States
- Alexander L. Perryman
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University − New Jersey Medical School, 185 South Orange Ave, Newark, New Jersey 07103, United States
- Xin Wang
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University − New Jersey Medical School, 185 South Orange Ave, Newark, New Jersey 07103, United States
- Samer S. Daher
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University − New Jersey Medical School, 185 South Orange Ave, Newark, New Jersey 07103, United States
- James Occi
  - Department of Medicine, Center for Emerging and Re-emerging Pathogens, Rutgers University − New Jersey Medical School, Newark, New Jersey 07103, United States
- Riccardo Russo
  - Department of Medicine, Center for Emerging and Re-emerging Pathogens, Rutgers University − New Jersey Medical School, Newark, New Jersey 07103, United States
- Steven Park
  - Public Health Research Institute, Rutgers University − New Jersey Medical School, 225 Warren St, Newark, New Jersey 07103, United States
- Matthew Zimmerman
  - Public Health Research Institute, Rutgers University − New Jersey Medical School, 225 Warren St, Newark, New Jersey 07103, United States
- Hsin-Pin Ho
  - Public Health Research Institute, Rutgers University − New Jersey Medical School, 225 Warren St, Newark, New Jersey 07103, United States
- David S. Perlin
  - Public Health Research Institute, Rutgers University − New Jersey Medical School, 225 Warren St, Newark, New Jersey 07103, United States
- Véronique Dartois
  - Public Health Research Institute, Rutgers University − New Jersey Medical School, 225 Warren St, Newark, New Jersey 07103, United States
- Sean Ekins
  - Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
- Pradeep Kumar
  - Department of Medicine, Center for Emerging and Re-emerging Pathogens, Rutgers University − New Jersey Medical School, Newark, New Jersey 07103, United States
- Nancy Connell
  - Department of Medicine, Center for Emerging and Re-emerging Pathogens, Rutgers University − New Jersey Medical School, Newark, New Jersey 07103, United States
- Jeffrey M. Boyd
  - Department of Biochemistry and Microbiology, Rutgers, The State University of New Jersey, New Brunswick, New Jersey 08901, United States
- Joel S. Freundlich
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University − New Jersey Medical School, 185 South Orange Ave, Newark, New Jersey 07103, United States
  - Department of Medicine, Center for Emerging and Re-emerging Pathogens, Rutgers University − New Jersey Medical School, Newark, New Jersey 07103, United States
2
Pereira JC, Daher SS, Zorn KM, Sherwood M, Russo R, Perryman AL, Wang X, Freundlich MJ, Ekins S, Freundlich JS. Machine Learning Platform to Discover Novel Growth Inhibitors of Neisseria gonorrhoeae. Pharm Res 2020; 37:141. [PMID: 32661900 DOI: 10.1007/s11095-020-02876-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/03/2020] [Accepted: 07/06/2020] [Indexed: 12/17/2022]
Abstract
PURPOSE To advance fundamental biological and translational research with the bacterium Neisseria gonorrhoeae through the prediction of novel small molecule growth inhibitors via naïve Bayesian modeling methodology. METHODS Inspection and curation of data from the publicly available ChEMBL web site for small molecule growth inhibition data of the bacterium Neisseria gonorrhoeae resulted in a training set for the construction of machine learning models. A naïve Bayesian model for bacterial growth inhibition was utilized in a workflow to predict novel antibacterial agents against this bacterium of global health relevance from a commercial library of >10⁵ drug-like small molecules. Follow-up efforts involved empirical assessment of the predictions and validation of the hits. RESULTS Specifically, two small molecules were found that exhibited promising activity profiles and represent novel chemotypes for agents against N. gonorrhoeae. CONCLUSIONS This represents, to the best of our knowledge, the first machine learning approach to successfully predict novel growth inhibitors of this bacterium. To assist the chemical tool and drug discovery fields, we have made our curated training set available as part of the Supplementary Material and the Bayesian model is accessible via the web.
Affiliation(s)
- Janaina Cruz Pereira
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Samer S Daher
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Kimberley M Zorn
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Matthew Sherwood
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Riccardo Russo
  - Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
- Alexander L Perryman
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
  - Repare Therapeutics, 7210 Rue Frederick-Banting Suite 100, Montreal, QC, H4S 2A1, Canada
- Xin Wang
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Madeleine J Freundlich
  - Stuart Country Day School of the Sacred Heart, 1200 Stuart Road, Princeton, NJ, 08540, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
  - Collaborations in Chemistry, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
- Joel S Freundlich
  - Department of Pharmacology, Physiology, and Neuroscience, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
  - Division of Infectious Disease, Department of Medicine and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University New Jersey Medical School, I-503 185 South Orange Avenue, Newark, NJ, 07103, USA
3
GCAC: galaxy workflow system for predictive model building for virtual screening. BMC Bioinformatics 2019; 19:550. [PMID: 30717669 PMCID: PMC7394323 DOI: 10.1186/s12859-018-2492-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 06/09/2018] [Accepted: 11/13/2018] [Indexed: 11/16/2022] Open
Abstract
Background Traditional drug discovery approaches are time-consuming, tedious and expensive. Identifying a potential drug-like molecule using high throughput screening (HTS) with high confidence is always a challenging task in drug discovery and cheminformatics. Only a small percentage of molecules that pass the clinical trial phases receive FDA approval. This whole process takes 10–12 years and millions of dollars of investment. The inconsistency in HTS is also a challenge for reproducible results. Reproducibility in computational research is highly desirable as a measure to evaluate scientific claims and published findings. This paper describes the development and availability of a knowledge-based predictive model building system using the R Statistical Computing Environment, with reproducibility ensured by the Galaxy workflow system. Results We describe a web-enabled data mining analysis pipeline which employs reproducible research approaches to confront the issue of availability of tools in high throughput virtual screening. The pipeline, named “Galaxy for Compound Activity Classification (GCAC)”, includes descriptor calculation, feature selection, model building, and screening to extract potent candidates, by leveraging the combined capabilities of R statistical packages and literate programming tools contained within a workflow system environment with automated configuration. Conclusion GCAC can serve as a standard for screening drug candidates using predictive model building under the Galaxy environment, allowing for easy installation and reproducibility. A demo site of the tool is available at http://ccbb.jnu.ac.in/gcac
4
Gad A, Manuel AT, K R J, John L, R S, V G SP, U C AJ. Virtual screening and repositioning of inconclusive molecules of beta-lactamase Bioassays-A data mining approach. Comput Biol Chem 2017; 70:65-88. [PMID: 28822333 DOI: 10.1016/j.compbiolchem.2017.07.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/20/2015] [Revised: 03/17/2017] [Accepted: 07/26/2017] [Indexed: 10/19/2022]
Abstract
This study focuses on the best possible way forward in utilizing inconclusive molecules of PubChem bioassays AID 1332, AID 434987 and AID 434955, which are related to beta-lactamase inhibitors of Mycobacterium tuberculosis (Mtb). Inadequacies in the experimental methods observed during the in vitro screening resulted in an inconclusive dataset. This could be due to certain moieties present within the molecules. In order to reconsider such molecules, in silico methods can be suggested in place of in vitro methods. For instance, data mining and medicinal chemistry methods have been adopted to prioritise the inconclusive dataset into active or inactive molecules. These include the Random Forest algorithm for data mining, Lilly MedChem rules for virtually screening out promiscuous molecules, and Self Organizing Maps (SOM) for clustering the active molecules and enlisting them for repositioning through the use of artificial neural networks. These repositioned molecules could then be prioritized for downstream drug discovery analysis.
Affiliation(s)
- Akshata Gad
  - CSIR-OSDD Research Unit, Indian Institute of Science Campus, Bengaluru, Karnataka, 560012, India
- Andrew Titus Manuel
  - Open Source Pharma Foundation, 22-WTC, Brigade Campus, Malleshwaram, Bengaluru, Karnataka, 560055, India
- Jinuraj K R
  - Research and Development Centre, Bharathiar University, Marudhamalai Rd, Coimbatore, Tamil Nadu, 641046, India
- Lijo John
  - CSIR-OSDD Research Unit, Indian Institute of Science Campus, Bengaluru, Karnataka, 560012, India
- Sajeev R
  - CSIR-OSDD Research Unit, Indian Institute of Science Campus, Bengaluru, Karnataka, 560012, India
- Shanmuga Priya V G
  - Department of Biotechnology, KLE's Dr. M.S.S. College of Engineering and Technology, Belgaum, Karnataka, 590008, India
- Abdul Jaleel U C
  - Open Source Pharma Foundation, 22-WTC, Brigade Campus, Malleshwaram, Bengaluru, Karnataka, 560055, India
5
Collaborative drug discovery for More Medicines for Tuberculosis (MM4TB). Drug Discov Today 2016; 22:555-565. [PMID: 27884746 DOI: 10.1016/j.drudis.2016.10.009] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 05/13/2016] [Revised: 10/11/2016] [Accepted: 10/21/2016] [Indexed: 01/30/2023]
Abstract
Neglected disease drug discovery is generally poorly funded compared with major diseases and hence there is an increasing focus on collaboration and precompetitive efforts such as public-private partnerships (PPPs). The More Medicines for Tuberculosis (MM4TB) project is one such collaboration funded by the EU with the goal of discovering new drugs for tuberculosis. Collaborative Drug Discovery has provided a commercial web-based platform called CDD Vault which is a hosted collaborative solution for securely sharing diverse chemistry and biology data. Using CDD Vault alongside other commercial and free cheminformatics tools has enabled support of this and other large collaborative projects, aiding drug discovery efforts and fostering collaboration. We will describe CDD's efforts in assisting with the MM4TB project.
6
Jamal S, Arora S, Scaria V. Computational Analysis and Predictive Cheminformatics Modeling of Small Molecule Inhibitors of Epigenetic Modifiers. PLoS One 2016; 11:e0083032. [PMID: 27622288 PMCID: PMC5021286 DOI: 10.1371/journal.pone.0083032] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/10/2013] [Accepted: 10/30/2013] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND The dynamic and differential regulation and expression of genes is largely governed by the complex interactions of a subset of biomolecules in the cell operating at multiple levels, from genome organisation to protein post-translational regulation. The regulatory layer contributed by epigenetics has recently been one of the favoured areas of interest. This layer of regulation, as we know it today, largely comprises DNA modifications, histone modifications and noncoding RNA regulation, and the interplay between each of these major components. Epigenetic regulation has recently been shown to be central to the development of a number of disease processes. The availability of datasets from high-throughput screens of molecules for biological properties offers a new opportunity to develop computational methodologies which would enable in silico screening of large molecular libraries. METHODS In the present study, we have used data from high throughput screens for the inhibitors of epigenetic modifiers. Computational predictive models were constructed based on the molecular descriptors. Two supervised machine learning algorithms, Naive Bayes and Random Forest, were used to generate predictive models for the small molecule inhibitors of histone methyl-transferases and demethylases. Random Forest, with an accuracy of 80%, was identified as the most accurate classifier. Further, we complemented the study with a substructure search approach, filtering out the probable pharmacophores from the active molecules leading to drug molecules. RESULTS We show that appropriate computational algorithms can be used effectively to learn molecular and structural correlates of biological activities of small molecules. The computational models developed could potentially be used to screen and identify new biological activities of molecules from large molecular libraries and prioritise them for in-depth biological assays.
To the best of our knowledge, this is the first and most comprehensive computational analysis towards understanding activities of small molecules inhibitors of epigenetic modifiers.
Affiliation(s)
- Salma Jamal
  - CSIR Open Source Drug Discovery Unit (CSIR-OSDD), Anusandhan Bhawan, Delhi, India
- Sonam Arora
  - Delhi Technological University, Delhi, India
- Vinod Scaria
  - GN Ramachandran Knowledge Center for Genome Informatics, CSIR Institute of Genomics and Integrative Biology (CSIR-IGIB), Delhi, India
7
Perryman AL, Stratton TP, Ekins S, Freundlich JS. Predicting Mouse Liver Microsomal Stability with "Pruned" Machine Learning Models and Public Data. Pharm Res 2016; 33:433-49. [PMID: 26415647 PMCID: PMC4712113 DOI: 10.1007/s11095-015-1800-5] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 06/30/2015] [Accepted: 09/22/2015] [Indexed: 02/07/2023]
Abstract
PURPOSE Mouse efficacy studies are a critical hurdle to advance translational research of potential therapeutic compounds for many diseases. Although mouse liver microsomal (MLM) stability studies are not a perfect surrogate for in vivo studies of metabolic clearance, they are the initial model system used to assess metabolic stability. Consequently, we explored the development of machine learning models that can enhance the probability of identifying compounds possessing MLM stability. METHODS Published assays on MLM half-life values were identified in PubChem, reformatted, and curated to create a training set with 894 unique small molecules. These data were used to construct machine learning models assessed with internal cross-validation, external tests with a published set of antitubercular compounds, and independent validation with an additional diverse set of 571 compounds (PubChem data on percent metabolism). RESULTS "Pruning" out the moderately unstable / moderately stable compounds from the training set produced models with superior predictive power. Bayesian models displayed the best predictive power for identifying compounds with a half-life ≥1 h. CONCLUSIONS Our results suggest the pruning strategy may be of general benefit to improve test set enrichment and provide machine learning models with enhanced predictive value for the MLM stability of small organic molecules. This study represents the most exhaustive study to date of using machine learning approaches with MLM data from public sources.
Affiliation(s)
- Alexander L Perryman
  - Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA
- Thomas P Stratton
  - Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA
- Sean Ekins
  - Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
  - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, USA
- Joel S Freundlich
  - Division of Infectious Disease, Department of Medicine, and the Ruy V. Lourenço Center for the Study of Emerging and Re-emerging Pathogens, Rutgers University-New Jersey Medical School, Newark, New Jersey, 07103, USA
  - Department of Pharmacology & Physiology, Rutgers University-New Jersey Medical School, Medical Sciences Building, I-503, 185 South Orange Ave., Newark, New Jersey, 07103, USA
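The "pruning" step described in the Perryman et al. abstract above can be illustrated with a minimal Python sketch. This is not the authors' code. The abstract only states that moderately unstable and moderately stable compounds were removed from the training set and that the models targeted a half-life ≥1 h, so the 30-minute lower cutoff, the function name `prune_training_set`, and the `(compound_id, half_life_min)` record layout are all assumptions made for illustration.

```python
def prune_training_set(records, unstable_max_min=30.0, stable_min_min=60.0):
    """Keep only clearly stable or clearly unstable compounds.

    records: iterable of (compound_id, half_life_min) pairs (hypothetical layout).
    unstable_max_min: assumed cutoff; half-lives at or below it count as unstable.
    stable_min_min: half-lives at or above it count as stable (60 min = 1 h).
    Compounds falling between the two cutoffs are dropped ("pruned").
    """
    pruned = []
    for compound_id, half_life_min in records:
        if half_life_min >= stable_min_min:
            pruned.append((compound_id, True))   # labelled stable
        elif half_life_min <= unstable_max_min:
            pruned.append((compound_id, False))  # labelled unstable
        # otherwise: intermediate stability, excluded from training
    return pruned


# Toy data, invented for the example
toy = [("a", 10.0), ("b", 45.0), ("c", 60.0), ("d", 120.0)]
pruned_toy = prune_training_set(toy)
```

With these toy values, compound "b" (45 min) falls between the two cutoffs and is excluded, so a binary classifier would be trained only on the three remaining, unambiguous examples.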
8
Ekins S, Madrid PB, Sarker M, Li SG, Mittal N, Kumar P, Wang X, Stratton TP, Zimmerman M, Talcott C, Bourbon P, Travers M, Yadav M, Freundlich JS. Combining Metabolite-Based Pharmacophores with Bayesian Machine Learning Models for Mycobacterium tuberculosis Drug Discovery. PLoS One 2015; 10:e0141076. [PMID: 26517557 PMCID: PMC4627656 DOI: 10.1371/journal.pone.0141076] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/30/2015] [Accepted: 10/05/2015] [Indexed: 12/15/2022] Open
Abstract
Integrated computational approaches for Mycobacterium tuberculosis (Mtb) are useful to identify new molecules that could lead to future tuberculosis (TB) drugs. Our approach uses information derived from the TBCyc pathway and genome database, the Collaborative Drug Discovery TB database combined with 3D pharmacophores and dual event Bayesian models of whole-cell activity and lack of cytotoxicity. We have prioritized a large number of molecules that may act as mimics of substrates and metabolites in the TB metabolome. We computationally searched over 200,000 commercial molecules using 66 pharmacophores based on substrates and metabolites from Mtb and further filtering with Bayesian models. We ultimately tested 110 compounds in vitro that resulted in two compounds of interest, BAS 04912643 and BAS 00623753 (MIC of 2.5 and 5 μg/mL, respectively). These molecules were used as a starting point for hit-to-lead optimization. The most promising class proved to be the quinoxaline di-N-oxides, evidenced by transcriptional profiling to induce mRNA level perturbations most closely resembling known protonophores. One of these, SRI58, exhibited an MIC = 1.25 μg/mL versus Mtb and a CC50 in Vero cells of >40 μg/mL, while featuring fair Caco-2 A-B permeability (2.3 × 10⁻⁶ cm/s), kinetic solubility (125 μM at pH 7.4 in PBS) and mouse metabolic stability (63.6% remaining after 1 h incubation with mouse liver microsomes). Despite demonstration of how a combined bioinformatics/cheminformatics approach afforded a small molecule with promising in vitro profiles, we found that SRI58 did not exhibit quantifiable blood levels in mice.
Affiliation(s)
- Sean Ekins
  - Collaborative Drug Discovery Inc., 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, United States of America
  - Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, United States of America
- Peter B. Madrid
  - SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
- Malabika Sarker
  - SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
- Shao-Gang Li
  - Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
- Nisha Mittal
  - Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
- Pradeep Kumar
  - Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
- Xin Wang
  - Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
- Thomas P. Stratton
  - Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
- Matthew Zimmerman
  - Public Health Research Institute, Rutgers University–New Jersey Medical School, Newark, NJ, 07103, United States of America
- Carolyn Talcott
  - SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
- Pauline Bourbon
  - SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
- Mike Travers
  - Collaborative Drug Discovery Inc., 1633 Bayshore Highway, Suite 342, Burlingame, CA, 94010, United States of America
- Maneesh Yadav
  - SRI International, 333 Ravenswood Avenue, Menlo Park, CA, 94025, United States of America
- Joel S. Freundlich
  - Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University–New Jersey Medical School, 185 South Orange Avenue, Newark, NJ, 07103, United States of America
9
Clark AM, Ekins S. Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL. J Chem Inf Model 2015; 55:1246-60. [PMID: 25995041 DOI: 10.1021/acs.jcim.5b00144] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Indexed: 02/05/2023]
Abstract
In an associated paper, we have described a reference implementation of Laplacian-corrected naïve Bayesian model building using extended connectivity (ECFP)- and molecular function class fingerprints of maximum diameter 6 (FCFP)-type fingerprints. As a follow-up, we have now undertaken a large-scale validation study in order to ensure that the technique generalizes to a broad variety of drug discovery datasets. To achieve this, we have used the ChEMBL (version 20) database and split it into more than 2000 separate datasets, each of which consists of compounds and measurements with the same target and activity measurement. In order to test these datasets with the two-state Bayesian classification, we developed an automated algorithm for detecting a suitable threshold for active/inactive designation, which we applied to all collections. With these datasets, we were able to establish that our Bayesian model implementation is effective for the large majority of cases, and we were able to quantify the impact of fingerprint folding on the receiver operator curve cross-validation metrics. We were also able to study the impact that the choice of training/testing set partitioning has on the resulting recall rates. The datasets have been made publicly available to be downloaded, along with the corresponding model data files, which can be used in conjunction with the CDK and several mobile apps. We have also explored some novel visualization methods which leverage the structural origins of the ECFP/FCFP fingerprints to attribute regions of a molecule responsible for positive and negative contributions to activity. The ability to score molecules across thousands of relevant datasets across organisms also may help to access desirable and undesirable off-target effects as well as suggest potential targets for compounds derived from phenotypic screens.
Affiliation(s)
- Alex M Clark
  - Molecular Materials Informatics, Inc., 1900 St. Jacques No. 302, Montreal H3J 2S1, Quebec, Canada
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
  - Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
  - Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
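The Laplacian-corrected naïve Bayesian scoring mentioned in the Clark and Ekins abstract above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' reference implementation. It assumes the commonly cited form of the correction, in which a feature found in T_F training molecules, A_F of them active, contributes log((A_F + 1) / (T_F · P + 1)), where P is the overall fraction of actives. Fingerprints are represented as plain sets of integer feature IDs; in practice these would come from an ECFP/FCFP generator in a cheminformatics toolkit. The function names are hypothetical.

```python
import math
from collections import Counter


def train_laplacian_nb(fingerprints, labels):
    """Return a dict mapping feature id -> Laplacian-corrected log contribution.

    fingerprints: list of sets of integer feature ids (stand-ins for ECFP bits).
    labels: list of booleans, True for active molecules.
    """
    total = len(fingerprints)
    base_rate = sum(labels) / total  # P: overall fraction of actives
    feat_total = Counter()           # T_F: molecules containing feature F
    feat_active = Counter()          # A_F: active molecules containing feature F
    for fp, is_active in zip(fingerprints, labels):
        for feature in fp:
            feat_total[feature] += 1
            if is_active:
                feat_active[feature] += 1
    return {
        f: math.log((feat_active[f] + 1.0) / (feat_total[f] * base_rate + 1.0))
        for f in feat_total
    }


def score(model, fingerprint):
    """Sum the contributions of known features; unseen features add nothing."""
    return sum(model.get(f, 0.0) for f in fingerprint)


# Toy training set, invented for the example
toy_fps = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
toy_labels = [True, True, False, False]
toy_model = train_laplacian_nb(toy_fps, toy_labels)
```

Scores from this kind of model are unscaled sums of log ratios rather than probabilities, which is why an activity threshold has to be chosen separately, for instance from cross-validation, before the score can be used for two-state classification.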
10
Ekins S, Freundlich JS, Coffee M. A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus. F1000Res 2014; 3:277. [PMID: 25653841 PMCID: PMC4304229 DOI: 10.12688/f1000research.5741.2] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Accepted: 12/12/2014] [Indexed: 01/01/2023] Open
Abstract
We are currently faced with a global infectious disease crisis which has been anticipated for decades. While many promising biotherapeutics are being tested, the search for a small molecule has yet to deliver an approved drug or therapeutic for the Ebola or similar filoviruses that cause haemorrhagic fever. Two recent high throughput screens published in 2013 did however identify several hits that progressed to animal studies that are FDA approved drugs used for other indications. The current computational analysis uses these molecules from two different structural classes to construct a common features pharmacophore. This ligand-based pharmacophore implicates a possible common target or mechanism that could be further explored. A recent structure based design project yielded nine co-crystal structures of pyrrolidinone inhibitors bound to the viral protein 35 (VP35). When receptor-ligand pharmacophores based on the analogs of these molecules and the protein structures were constructed, the molecular features partially overlapped with the common features of solely ligand-based pharmacophore models based on FDA approved drugs. These previously identified FDA approved drugs with activity against Ebola were therefore docked into this protein. The antimalarials chloroquine and amodiaquine docked favorably in VP35. We propose that these drugs identified to date as inhibitors of the Ebola virus may be targeting VP35. These computational models may provide preliminary insights into the molecular features that are responsible for their activity against Ebola virus in vitro and in vivo and we propose that this hypothesis could be readily tested.
Affiliation(s)
- Sean Ekins
  - Collaborations in Chemistry, Fuquay-Varina, NC, 27526, USA
  - Collaborative Drug Discovery, Burlingame, CA, 94010, USA
- Joel S Freundlich
  - Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ - New Jersey Medical School, NJ, 07103, USA
- Megan Coffee
  - Center for Infectious Diseases and Emerging Readiness, University of California, Berkeley, CA, 94720, USA
11
Ekins S, Freundlich JS, Coffee M. A common feature pharmacophore for FDA-approved drugs inhibiting the Ebola virus. F1000Res 2014; 3:277. [PMID: 25653841 DOI: 10.12688/f1000research.5741.1] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Accepted: 11/14/2014] [Indexed: 01/05/2023] Open
Abstract
We are currently faced with a global infectious disease crisis which has been anticipated for decades. While many promising biotherapeutics are being tested, the search for a small molecule has yet to deliver an approved drug or therapeutic for the Ebola or similar filoviruses that cause haemorrhagic fever. Two recent high throughput screens published in 2013 did however identify several hits that progressed to animal studies that are FDA approved drugs used for other indications. The current computational analysis uses these molecules from two different structural classes to construct a common features pharmacophore. This ligand-based pharmacophore implicates a possible common target or mechanism that could be further explored. A recent structure based design project yielded nine co-crystal structures of pyrrolidinone inhibitors bound to the viral protein 35 (VP35). When receptor-ligand pharmacophores based on the analogs of these molecules and the protein structures were constructed, the molecular features partially overlapped with the common features of solely ligand-based pharmacophore models based on FDA approved drugs. These previously identified FDA approved drugs with activity against Ebola were therefore docked into this protein. The antimalarials chloroquine and amodiaquine docked favorably in VP35. We propose that these drugs identified to date as inhibitors of the Ebola virus may be targeting VP35. These computational models may provide preliminary insights into the molecular features that are responsible for their activity against Ebola virus in vitro and in vivo and we propose that this hypothesis could be readily tested.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, Fuquay-Varina, NC, 27526, USA; Collaborative Drug Discovery, Burlingame, CA, 94010, USA
| | - Joel S Freundlich
- Departments of Pharmacology & Physiology and Medicine, Center for Emerging and Reemerging Pathogens, UMDNJ - New Jersey Medical School, NJ, 07103, USA
| | - Megan Coffee
- Center for Infectious Diseases and Emerging Readiness, University of California, Berkeley, CA, 94720, USA
|
12
|
Ekins S, Clark AM, Swamidass SJ, Litterman N, Williams AJ. Bigger data, collaborative tools and the future of predictive drug discovery. J Comput Aided Mol Des 2014; 28:997-1008. [PMID: 24943138 PMCID: PMC4198464 DOI: 10.1007/s10822-014-9762-y] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 03/29/2014] [Accepted: 06/09/2014] [Indexed: 12/31/2022]
Abstract
Over the past decade we have seen a growth in the provision of chemistry data and cheminformatics tools as either free websites or software as a service commercial offerings. These have transformed how we find molecule-related data and use such tools in our research. There have also been efforts to improve collaboration between researchers either openly or through secure transactions using commercial tools. A major challenge in the future will be how such databases and software approaches handle larger amounts of data as it accumulates from high throughput screening and enables the user to draw insights, enable predictions and move projects forward. We now discuss how information from some drug discovery datasets can be made more accessible and how privacy of data should not overwhelm the desire to share it at an appropriate time with collaborators. We also discuss additional software tools that could be made available and provide our thoughts on the future of predictive drug discovery in this age of big data. We use some examples from our own research on neglected diseases, collaborations, mobile apps and algorithm development to illustrate these ideas.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC, 27526, USA
|
13
|
Ekins S, Freundlich JS, Reynolds RC. Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:2157-65. [PMID: 24968215 DOI: 10.1021/ci500264r] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 02/07/2023]
Abstract
Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data from large phenotypic screens in the public domain against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We have evaluated 12 models ranging from different single-point, dual-event dose-response, single-point and dual-event dose-response as well as combined data sets for three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group as well as a smaller set of 177 active compounds from GlaxoSmithKline as test sets. Our data suggest combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models based on the receiver operator curve (ROC) for these models (internal ROC range 0.83-0.91, external ROC range 0.62-0.83) compared to the orders of magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25 000-350 000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
|
14
|
Ekins S, Pottorf R, Reynolds R, Williams AJ, Clark AM, Freundlich JS. Looking back to the future: predicting in vivo efficacy of small molecules versus Mycobacterium tuberculosis. J Chem Inf Model 2014; 54:1070-82. [PMID: 24665947 PMCID: PMC4004261 DOI: 10.1021/ci500077v] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 02/08/2014] [Indexed: 02/07/2023]
Abstract
Selecting and translating in vitro leads for a disease into molecules with in vivo activity in an animal model of the disease is a challenge that takes considerable time and money. As an example, recent years have seen whole-cell phenotypic screens of millions of compounds yielding over 1500 inhibitors of Mycobacterium tuberculosis (Mtb). These must be prioritized for testing in the mouse in vivo assay for Mtb infection, a validated model utilized to select compounds for further testing. We demonstrate learning from in vivo active and inactive compounds using machine learning classification models (Bayesian, support vector machines, and recursive partitioning) consisting of 773 compounds. The Bayesian model predicted 8 out of 11 additional in vivo actives not included in the model as an external test set. Curation of 70 years of Mtb data can therefore provide statistically robust computational models to focus resources on in vivo active small molecule antituberculars. This highlights a cost-effective predictor for in vivo testing elsewhere in other diseases.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, North Carolina 27526, United States
| | - Richard Pottorf
- Department of Pharmacology & Physiology, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
| | - Robert C. Reynolds
- Department of Chemistry, University of Alabama at Birmingham, 1530 Third Avenue South, Birmingham, Alabama 35294-1240, United States
| | - Antony J. Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, North Carolina 27587, United States
| | - Alex M. Clark
- Molecular Materials Informatics, 1900 St. Jacques #302, Montreal, Quebec, Canada H3J 2S1
| | - Joel S. Freundlich
- Department of Pharmacology & Physiology, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
- Department of Medicine, Center for Emerging and Reemerging Pathogens, Rutgers University − New Jersey Medical School, 185 South Orange Avenue, Newark, New Jersey 07103, United States
|
15
|
Ekins S, Casey AC, Roberts D, Parish T, Bunin BA. Bayesian models for screening and TB Mobile for target inference with Mycobacterium tuberculosis. Tuberculosis (Edinb) 2014; 94:162-9. [PMID: 24440548 PMCID: PMC4394018 DOI: 10.1016/j.tube.2013.12.001] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 08/21/2013] [Revised: 12/04/2013] [Accepted: 12/09/2013] [Indexed: 12/19/2022]
Abstract
The search for compounds active against Mycobacterium tuberculosis is reliant upon high-throughput screening (HTS) in whole cells. We have used Bayesian machine learning models which can predict anti-tubercular activity to filter an internal library of over 150,000 compounds prior to in vitro testing. We used this to select and test 48 compounds in vitro; 11 were active with MIC values ranging from 0.4 μM to 10.2 μM, giving a high hit rate of 22.9%. Among the hits, we identified several compounds belonging to the same series including five quinolones (including ciprofloxacin), three molecules with long aliphatic linkers and three singletons. This approach represents a rapid method to prioritize compounds for testing that can be used alongside medicinal chemistry insight and other filters to identify active molecules. Such models can significantly increase the hit rate of HTS, above the usual 1% or lower rates seen. In addition, the potential targets for the 11 molecules were predicted using TB Mobile and clustering alongside a set of over 740 molecules with known M. tuberculosis target annotations. These predictions may serve as a mechanism for prioritizing compounds for further optimization.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA; Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA
| | - Allen C Casey
- Infectious Disease Research Institute, Seattle, WA, USA
| | - David Roberts
- Infectious Disease Research Institute, Seattle, WA, USA
| | - Tanya Parish
- Infectious Disease Research Institute, Seattle, WA, USA
| | - Barry A Bunin
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA
|
16
|
Ekins S, Freundlich JS, Reynolds RC. Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. J Chem Inf Model 2013; 53:3054-63. [PMID: 24144044 DOI: 10.1021/ci400480s] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
Abstract
The search for new tuberculosis treatments continues as we need to find molecules that can act more quickly, be accommodated in multidrug regimens, and overcome ever increasing levels of drug resistance. Multiple large scale phenotypic high-throughput screens against Mycobacterium tuberculosis (Mtb) have generated dose response data, enabling the generation of machine learning models. These models also incorporated cytotoxicity data and were recently validated with a large external data set. A cheminformatics data-fusion approach followed by Bayesian machine learning, Support Vector Machine, or Recursive Partitioning model development (based on publicly available Mtb screening data) was used to compare individual data sets and subsequent combined models. A set of 1924 commercially available molecules with promising antitubercular activity (and lack of relative cytotoxicity to Vero cells) were used to evaluate the predictive nature of the models. We demonstrate that combining three data sets incorporating antitubercular and cytotoxicity data in Vero cells from our previous screens results in external validation receiver operator curve (ROC) of 0.83 (Bayesian or RP Forest). Models that do not have the highest 5-fold cross-validation ROC scores can outperform other models in a test set dependent manner. We demonstrate with predictions for a recently published set of Mtb leads from GlaxoSmithKline that no single machine learning model may be enough to identify compounds of interest. Data set fusion represents a further useful strategy for machine learning construction as illustrated with Mtb. Coverage of chemistry and Mtb target spaces may also be limiting factors for the whole-cell screening data generated to date.
Affiliation(s)
- Sean Ekins
- Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, California 94010, United States
|
17
|
Ekins S, Williams AJ. Curing TB with open science. Tuberculosis (Edinb) 2013; 94:183-5. [PMID: 24388836 DOI: 10.1016/j.tube.2013.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 10/08/2013] [Accepted: 10/16/2013] [Indexed: 12/27/2022]
Abstract
Many funded efforts are under way in tuberculosis research and in drug or vaccine development. There is little if any global coordination or collaboration, and consequently there are likely to be huge data silos and duplication of effort. We now propose steps to remedy this by fostering more open science in TB research.
Affiliation(s)
- Sean Ekins
- Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA; Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA
| | - Antony J Williams
- Royal Society of Chemistry, 904 Tamaras Circle, Wake Forest, NC 27587, USA
|
18
|
Ponder EL, Freundlich JS, Sarker M, Ekins S. Computational models for neglected diseases: gaps and opportunities. Pharm Res 2013; 31:271-7. [PMID: 23990313 DOI: 10.1007/s11095-013-1170-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 05/02/2013] [Accepted: 07/28/2013] [Indexed: 01/22/2023]
Abstract
Neglected diseases, such as Chagas disease, African sleeping sickness, and intestinal worms, affect millions of the world's poor. They disproportionately affect marginalized populations, lack effective treatments or vaccines, or existing products are not accessible to the populations affected. Computational approaches have been used across many of these diseases for various aspects of research or development, and yet data produced by computational approaches are not integrated and widely accessible to others. Here, we identify gaps in which computational approaches have been used for some neglected diseases and not others. We also make recommendations for the broad-spectrum integration of these techniques into a neglected disease drug discovery and development workflow.
Affiliation(s)
- Elizabeth L Ponder
- Center for Emerging and Neglected Diseases, Berkeley, 444A Li Ka Shing Center, Berkeley, California, 94720-3370, USA
|