1
Humayun F, Khan F, Khan A, Alshammari A, Ji J, Farhan A, Fawad N, Alam W, Ali A, Wei DQ. De novo generation of dual-target ligands for the treatment of SARS-CoV-2 using deep learning, virtual screening, and molecular dynamic simulations. J Biomol Struct Dyn 2024; 42:3019-3029. [PMID: 37449757 DOI: 10.1080/07391102.2023.2234481] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 04/30/2023] [Indexed: 07/18/2023]
Abstract
De novo generation of molecules with desired properties is a promising application of artificial intelligence, particularly deep generative approaches. However, creating novel compounds with biological activity toward two distinct targets remains very challenging. In this study, we develop a computational framework, referred to as the dual-target ligand generative network, for the de novo generation of bioactive compounds directed at two predetermined therapeutic targets. Our approach treats a sequence-based simplified molecular input line entry system (SMILES) generator as a stochastic policy for exploring chemical space. The high-level workflow is to gather and prepare training data for both targets' molecules, build and train a neural network model to generate molecules, create new molecules using generative AI, and then virtually screen the validated molecules against the SARS-CoV-2 PLpro and 3CLpro drug targets. Results show that the novel molecules generated have higher binding affinity for both targets than the conventional drug, remdesivir, used for the treatment of SARS-CoV-2. Communicated by Ramaswamy H. Sarma.
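The final screening step of this workflow can be illustrated with a minimal sketch. It assumes each generated molecule already has one docking score per target (in kcal/mol, where lower means stronger predicted binding) and keeps only the candidates that beat a reference ligand on both targets. The class name, function names, and all numeric values are hypothetical placeholders, not data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockingResult:
    name: str            # hypothetical identifier for a molecule
    plpro_score: float   # assumed docking score against PLpro (lower is better)
    clpro_score: float   # assumed docking score against 3CLpro (lower is better)


def beats_reference(candidate: DockingResult, reference: DockingResult) -> bool:
    """True only if the candidate scores better than the reference on BOTH targets."""
    return (candidate.plpro_score < reference.plpro_score
            and candidate.clpro_score < reference.clpro_score)


def select_dual_target_hits(candidates, reference):
    """Keep dual-target hits, ranked by their weaker (less negative) score."""
    hits = [c for c in candidates if beats_reference(c, reference)]
    return sorted(hits, key=lambda c: max(c.plpro_score, c.clpro_score))


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    reference = DockingResult("reference", -7.0, -7.5)
    candidates = [
        DockingResult("cand_A", -8.2, -8.0),
        DockingResult("cand_B", -9.0, -7.0),  # fails on the second target
        DockingResult("cand_C", -7.5, -9.1),
    ]
    for hit in select_dual_target_hits(candidates, reference):
        print(hit.name, hit.plpro_score, hit.clpro_score)
```

Ranking by the weaker of the two scores is one possible design choice: it favours molecules that are balanced across both targets over molecules that are excellent on one and marginal on the other.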
Affiliation(s)
- Fahad Humayun
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Fatima Khan
  - National Institute of Health, Islamabad, Pakistan
- Abbas Khan
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Abdulrahman Alshammari
  - Department of Pharmacology and Toxicology, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Jun Ji
  - Henan Provincial Engineering and Technology Center of Health Products for Livestock and Poultry, Henan Provincial Engineering and Technology Center of Animal Disease Diagnosis and Integrated Control, Nanyang Normal University, Nanyang, PR China
- Ali Farhan
  - Department of Chemistry, Chung Yuan Christian University, Taoyuan, Taiwan
- Nasim Fawad
  - Poultry Research Institute, Rawalpindi, Pakistan
- Waheed Alam
  - National Institute of Health, Islamabad, Pakistan
- Arif Ali
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
- Dong-Qing Wei
  - Department of Bioinformatics and Biological Statistics, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - State Key Laboratory of Microbial Metabolism and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, PR China
  - Centre for Research in Molecular Modeling, Concordia University, Québec, Canada
2
Day EC, Chittari SS, Bogen MP, Knight AS. Navigating the Expansive Landscapes of Soft Materials: A User Guide for High-Throughput Workflows. ACS POLYMERS AU 2023; 3:406-427. [PMID: 38107416 PMCID: PMC10722570 DOI: 10.1021/acspolymersau.3c00025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/15/2023] [Revised: 11/02/2023] [Accepted: 11/07/2023] [Indexed: 12/19/2023]
Abstract
Synthetic polymers are highly customizable with tailored structures and functionality, yet this versatility generates challenges in the design of advanced materials due to the size and complexity of the design space. Thus, exploration and optimization of polymer properties using combinatorial libraries has become increasingly common, which requires careful selection of synthetic strategies, characterization techniques, and rapid processing workflows to obtain fundamental principles from these large data sets. Herein, we provide guidelines for strategic design of macromolecule libraries and workflows to efficiently navigate these high-dimensional design spaces. We describe synthetic methods for multiple library sizes and structures as well as characterization methods to rapidly generate data sets, including tools that can be adapted from biological workflows. We further highlight relevant insights from statistics and machine learning to aid in data featurization, representation, and analysis. This Perspective acts as a "user guide" for researchers interested in leveraging high-throughput screening toward the design of multifunctional polymers and predictive modeling of structure-property relationships in soft materials.
Affiliation(s)
- Matthew P. Bogen
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
- Abigail S. Knight
  - Department of Chemistry, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
3
Shyam Mohan AH, Rao SN, D S, Rajeswari N. In silico structural, phylogenetic and drug target analysis of putrescine monooxygenase from Shewanella putrefaciens-95. J Genet Eng Biotechnol 2022; 20:57. [PMID: 35412199 PMCID: PMC9005580 DOI: 10.1186/s43141-022-00338-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2021] [Accepted: 03/22/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The enormous and irresponsible use of antibiotics has led to the emergence of resistant strains of bacteria globally. A new approach to combat this crisis is nutritional immunity, which limits the availability of nutrients to pathogens. Targeting the siderophore biosynthetic pathway, which supports the acquisition of iron, an essential microelement for bacteria, has recently attracted interest and backs the concept of nutritional immunity. Supporting this view, we have chosen to study a key enzyme in the biosynthetic pathway of putrebactin called putrescine monooxygenase (SpPMO) from Shewanella putrefaciens. In our previous study, we co-expressed putrescine monooxygenase recombinantly in Escherichia coli BL21 Star (DE3). The bioinformatic analysis and screening of inhibitors will broaden the scope of SpPMO as a drug target.
RESULTS: In the present study, we have analysed the physicochemical properties of the target enzyme and other N-hydroxylating monooxygenases (NMOs) using the ExPASy server. The target enzyme SpPMO and most of the selected NMOs have a slightly acidic isoelectric point and are moderately thermostable and generally insoluble. The multiple sequence alignment identified the GXGXX(N/A) and DXXXFATGYXXXXP motifs and conserved amino acids involved in FAD binding, NADP binding, secondary structure formation and substrate binding. The phylogenetic analysis indicated the distribution of the monooxygenases into different clades according to their substrate specificity. Further, a 3D model of SpPMO was predicted using the I-TASSER online tool with DfoA from Erwinia amylovora as a template. The model was validated using the SAVES server and deposited to the Protein Model Database with the accession number PM0082222. The molecular docking analysis with different substrates revealed the presence of a putrescine binding pocket made of conserved amino acids and another binding pocket present on the surface of the protein wherein all other ligands interact with high binding affinity. The molecular docking of naturally occurring inhibitor molecules with the SpPMO 3D model identified curcumin and niazirin, with inhibition constants of 1.83 and 2.81 μM, as two promising inhibitors. Further in vitro studies on the kinetic parameters of the curcumin and niazirin inhibitors determined the Ki to be 2.6±0.0036 μM and 18.38±0.008 μM, respectively.
CONCLUSION: This analysis will help us understand the structural, phylogenetic and drug target aspects of putrescine monooxygenase from Shewanella putrefaciens-95 in detail. It sheds light on the precautionary measures that can be developed to inhibit the enzyme and thereby the secondary infections caused by them.
Affiliation(s)
- Anil H Shyam Mohan
  - Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Shavige Malleswara Hills, Bengaluru-78, Karnataka, India
- Saroja Narsing Rao
  - Pesticide Residue and Food Quality Analysis Laboratory, University of Agricultural Sciences, Raichur, Karnataka, 584104, India
- Srividya D
  - Department of Biotechnology, Davangere University, Shivagangothri, Davangere, Karnataka, 577007, India
- N Rajeswari
  - Department of Biotechnology, Dayananda Sagar College of Engineering, Kumaraswamy Layout, Shavige Malleswara Hills, Bengaluru-78, Karnataka, India
4
Lu F, Li M, Min X, Li C, Zeng X. De novo generation of dual-target ligands using adversarial training and reinforcement learning. Brief Bioinform 2021; 22:6354720. [PMID: 34410338 DOI: 10.1093/bib/bbab333] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/09/2021] [Revised: 07/26/2021] [Indexed: 12/25/2022]
Abstract
Artificial intelligence, such as deep generative methods, represents a promising solution to de novo design of molecules with the desired properties. However, generating new molecules with biological activities toward two specific targets remains an extremely difficult challenge. In this work, we conceive a novel computational framework, herein called dual-target ligand generative network (DLGN), for the de novo generation of bioactive molecules toward two given objectives. Via adversarial training and reinforcement learning, DLGN treats a sequence-based simplified molecular input line entry system (SMILES) generator as a stochastic policy for exploring chemical spaces. Two discriminators are then used to encourage the generation of molecules that belong to the intersection of two bioactive-compound distributions. In a case study, we employ our methods to design a library of dual-target ligands targeting dopamine receptor D2 and 5-hydroxytryptamine receptor 1A as new antipsychotics. Experimental results demonstrate that the proposed model can generate novel compounds with high similarity to both bioactive datasets in several structure-based metrics. Our model exhibits a performance comparable to that of various state-of-the-art multi-objective molecule generation models. We envision that this framework will become a generally applicable approach for designing dual-target drugs in silico.
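The idea of using two discriminators to steer a generator toward the intersection of two bioactive-compound distributions can be sketched as a reward function. The abstract does not state how DLGN combines the two discriminator outputs, so the geometric mean below is purely an assumption chosen for illustration: it is high only when both discriminators rate the molecule as plausible for their target. The function and parameter names are hypothetical.

```python
import math


def dual_target_reward(p_target_a: float, p_target_b: float) -> float:
    """Combine two discriminator probabilities into one scalar reward.

    Assumption: each input is a probability in [0, 1] that a generated
    molecule resembles the bioactive set of one target. The geometric mean
    is an illustrative choice, not the formula used in the paper.
    """
    for p in (p_target_a, p_target_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("discriminator outputs must lie in [0, 1]")
    return math.sqrt(p_target_a * p_target_b)


if __name__ == "__main__":
    print(dual_target_reward(0.9, 0.9))   # balanced molecule, high reward
    print(dual_target_reward(1.0, 0.0))   # single-target molecule, zero reward
```

With this choice, a molecule that convinces only one discriminator receives no reward, which is one simple way to encode the intersection requirement in a policy-gradient setting.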
Affiliation(s)
- Fengqing Lu
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Mufei Li
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xiaoping Min
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Chunyan Li
  - Department of Computer Science, Xiamen University, Xiamen 361005, China
- Xiangxiang Zeng
  - Department of Computer Science, Hunan University, 410086 Changsha, China
5
Li M, Zhou J, Hu J, Fan W, Zhang Y, Gu Y, Karypis G. DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science. ACS OMEGA 2021; 6:27233-27238. [PMID: 34693143 PMCID: PMC8529678 DOI: 10.1021/acsomega.1c04017] [Citation(s) in RCA: 59] [Impact Index Per Article: 19.7] [Received: 07/27/2021] [Accepted: 09/24/2021] [Indexed: 06/01/2023]
Abstract
Graph neural networks (GNNs) constitute a class of deep learning methods for graph data. They have wide applications in chemistry and biology, such as molecular property prediction, reaction prediction, and drug-target interaction prediction. Despite the interest, GNN-based modeling is challenging as it requires graph data preprocessing and modeling in addition to programming and deep learning. Here, we present Deep Graph Library (DGL)-LifeSci, an open-source package for deep learning on graphs in life science. DGL-LifeSci is a Python toolkit based on RDKit, PyTorch, and Deep Graph Library (DGL). DGL-LifeSci allows GNN-based modeling on custom datasets for molecular property prediction, reaction prediction, and molecule generation. With its command-line interfaces, users can perform modeling without any background in programming and deep learning. We test the command-line interfaces using standard benchmarks MoleculeNet, USPTO, and ZINC. Compared with previous implementations, DGL-LifeSci achieves a speedup of up to 6×. For modeling flexibility, DGL-LifeSci provides well-optimized modules for various stages of the modeling pipeline. In addition, DGL-LifeSci provides pretrained models for reproducing the test experiment results and applying models without training. The code is distributed under an Apache-2.0 License and is freely accessible at https://github.com/awslabs/dgl-lifesci.
Affiliation(s)
- Mufei Li
  - AWS Shanghai AI Lab, 5F-102, 1901 Huashan Road, Shanghai 200030, P. R. China
- Jinjing Zhou
  - AWS Shanghai AI Lab, 5F-102, 1901 Huashan Road, Shanghai 200030, P. R. China
- Jiajing Hu
  - Maurice Wohl Clinical Neuroscience Institute, King’s College London, 5 Cutcombe Road, London SE5 9RT, U.K.
- Wenxuan Fan
  - School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- Yangkang Zhang
  - College of Computer Science and Technology, Zhejiang University, 866 Yuhangtang Road, Hangzhou 310058, P. R. China
- Yaxin Gu
  - School of Pharmacy, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, P. R. China
- George Karypis
  - AWS AI, East Palo Alto, California 94303, United States
  - Department of Computer Science and Engineering, University of Minnesota, 4-192 KHKH, 200 Union St SE, Minneapolis, Minnesota 55455, United States
6
Clemons PA, Bittker JA, Wagner FF, Hands A, Dančík V, Schreiber SL, Choudhary A, Wagner BK. The Use of Informer Sets in Screening: Perspectives on an Efficient Strategy to Identify New Probes. SLAS DISCOVERY 2021; 26:855-861. [PMID: 34130532 DOI: 10.1177/24725552211019410] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/04/2023]
Abstract
Small-molecule discovery typically involves large-scale screening campaigns, spanning multiple compound collections. However, such activities can be cost- or time-prohibitive, especially when using complex assay systems, limiting the number of compounds tested. Further, low hit rates can make the process inefficient. Sparse coverage of chemical structure or biological activity space can lead to limited success in a primary screen and represents a missed opportunity by virtue of selecting the "wrong" compounds to test. Thus, the choice of screening collections becomes of paramount importance. In this perspective, we discuss the utility of generating "informer sets" for small-molecule discovery, and how this strategy can be leveraged to prioritize probe candidates. While many researchers may assume that informer sets are focused on particular targets (e.g., kinases) or processes (e.g., autophagy), efforts to assemble informer sets based on historical bioactivity or successful human exposure (e.g., repurposing collections) have shown promise as well. Another method for generating informer sets is based on chemical structure, particularly when the compounds have unknown activities and targets. We describe our efforts to screen an informer set representing a collection of 100,000 small molecules synthesized through diversity-oriented synthesis (DOS). This process enables researchers to identify activity early and more extensively screen only a few chemical scaffolds, rather than the entire collection. This elegant and economic outcome is a goal of the informer set approach. Here, we aim not only to shed light on this process, but also to promote the use of informer sets more widely in small-molecule discovery projects.
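The two-stage logic described here (screen a small structure-based informer set first, then follow up only on the scaffolds that show activity) can be sketched as follows. This is a minimal illustration under stated assumptions: scaffold labels are taken as given, representatives are picked in sorted order rather than by any chemically informed criterion, and all names and data are hypothetical.

```python
from collections import defaultdict


def build_informer_set(compound_to_scaffold, per_scaffold=1):
    """Pick a few representatives per scaffold (here simply the first by name)."""
    by_scaffold = defaultdict(list)
    for compound, scaffold in sorted(compound_to_scaffold.items()):
        by_scaffold[scaffold].append(compound)
    return {s: members[:per_scaffold] for s, members in by_scaffold.items()}


def expand_hits(compound_to_scaffold, hit_compounds):
    """Return every compound sharing a scaffold with at least one informer hit."""
    hit_scaffolds = {compound_to_scaffold[c] for c in hit_compounds}
    return sorted(c for c, s in compound_to_scaffold.items() if s in hit_scaffolds)


if __name__ == "__main__":
    # Hypothetical five-compound library spread over three scaffolds.
    library = {"c1": "S1", "c2": "S1", "c3": "S2", "c4": "S2", "c5": "S3"}
    informers = build_informer_set(library)
    print(informers)
    # Suppose the primary screen flags c3: only scaffold S2 is followed up.
    print(expand_hits(library, ["c3"]))
```

The follow-up screen then covers only the members of active scaffolds instead of the entire collection.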
Affiliation(s)
- Paul A Clemons
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Joshua A Bittker
  - Center for the Development of Therapeutics, Broad Institute, Cambridge, MA, USA
  - Vertex Pharmaceuticals, Boston, MA, USA
- Florence F Wagner
  - Center for the Development of Therapeutics, Broad Institute, Cambridge, MA, USA
- Allison Hands
  - Center for the Development of Therapeutics, Broad Institute, Cambridge, MA, USA
- Vlado Dančík
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Stuart L Schreiber
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Amit Choudhary
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
- Bridget K Wagner
  - Chemical Biology and Therapeutics Science Program, Broad Institute, Cambridge, MA, USA
7
Shrivastava AD, Kell DB. FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space. Molecules 2021; 26:2065. [PMID: 33916824 PMCID: PMC8038408 DOI: 10.3390/molecules26072065] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/01/2021] [Revised: 03/29/2021] [Accepted: 04/01/2021] [Indexed: 12/12/2022]
Abstract
The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when as here coupled with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in their construction while allowing "similar" molecules to cluster together in an effective and interpretable way.
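Contrastive learning, one of the three elements combined here, trains an encoder so that embeddings of "similar" inputs end up close together and embeddings of other inputs end up far apart. The abstract does not specify the loss used in FragNet, so the sketch below shows a generic temperature-scaled contrastive loss for a single anchor, offered only as an illustration of the technique. The function names and the default temperature are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchor, positive, negatives, temperature=0.5):
    """Generic contrastive loss for one anchor embedding.

    The loss is small when the anchor is much more similar to its positive
    than to any negative. This is an illustrative form, not necessarily the
    objective used in the paper.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    print(contrastive_loss(anchor, [1.0, 0.0], [[0.0, 1.0]]))  # well separated
    print(contrastive_loss(anchor, [0.0, 1.0], [[1.0, 0.0]]))  # poorly separated
```

Minimising such a loss over many anchors is what pushes related molecules to cluster in the latent space.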
Affiliation(s)
- Aditya Divyakant Shrivastava
  - Department of Computer Science and Engineering, Nirma University, Ahmedabad 382481, India
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St., Liverpool L69 7ZB, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
  - Mellizyme Ltd., Liverpool Science Park, IC1, 131 Mount Pleasant, Liverpool L3 5TF, UK
8
Salcedo-Sora JE, Jindal S, O'Hagan S, Kell DB. A palette of fluorophores that are differentially accumulated by wild-type and mutant strains of Escherichia coli: surrogate ligands for profiling bacterial membrane transporters. MICROBIOLOGY (READING, ENGLAND) 2021; 167:001016. [PMID: 33406033 PMCID: PMC8131027 DOI: 10.1099/mic.0.001016] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 07/21/2020] [Accepted: 12/15/2020] [Indexed: 12/12/2022]
Abstract
Our previous work demonstrated that two commonly used fluorescent dyes that were accumulated by wild-type Escherichia coli MG1655 were differentially transported in single-gene knockout strains, and also that they might be used as surrogates in flow cytometric transporter assays. We summarize the desirable properties of such stains, and here survey 143 candidate dyes. We eventually triage them (on the basis of signal, accumulation levels and cost) to a palette of 39 commercially available and affordable fluorophores that are accumulated significantly by wild-type cells of the 'Keio' strain BW25113, as measured flow cytometrically. Cheminformatic analyses indicate both their similarities and their (much more considerable) structural differences. We describe the effects of pH and of the efflux pump inhibitor chlorpromazine on the accumulation of the dyes. Even the 'wild-type' MG1655 and BW25113 strains can differ significantly in their ability to take up such dyes. We illustrate the highly differential uptake of our dyes into strains with particular lesions in, or overexpressed levels of, three particular transporters or transporter components (yhjV, yihN and tolC). The relatively small collection of dyes described offers a rapid, inexpensive, convenient and informative approach to the assessment of microbial physiology and phenotyping of membrane transporter function.
Affiliation(s)
- Jesus Enrique Salcedo-Sora
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Srijan Jindal
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Steve O'Hagan
  - Department of Chemistry and Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, Faculty of Health and Life Sciences, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
9
van Santen JA, Kautsar SA, Medema MH, Linington RG. Microbial natural product databases: moving forward in the multi-omics era. Nat Prod Rep 2021; 38:264-278. [PMID: 32856641 PMCID: PMC7864863 DOI: 10.1039/d0np00053a] [Citation(s) in RCA: 41] [Impact Index Per Article: 13.7] [Indexed: 12/21/2022]
Abstract
Covering: 2010-2020. The digital revolution is driving significant changes in how people store, distribute, and use information. With the advent of new technologies around linked data, machine learning and large-scale network inference, the natural products research field is beginning to embrace real-time sharing and large-scale analysis of digitized experimental data. Databases play a key role in this, as they allow systematic annotation and storage of data for both basic and advanced applications. The quality of the content, structure, and accessibility of these databases all contribute to their usefulness for the scientific community in practice. This review covers the development of databases relevant for microbial natural product discovery during the past decade (2010-2020), including repositories of chemical structures/properties, metabolomics, and genomic data (biosynthetic gene clusters). It provides an overview of the most important databases and their functionalities, highlights some early meta-analyses using such databases, and discusses basic principles to enable widespread interoperability between databases. Furthermore, it points out conceptual and practical challenges in the curation and usage of natural products databases. Finally, the review closes with a discussion of key action points required for the field moving forward, not only for database developers but for any scientist active in the field.
10
O’Hagan S, Kell DB. Structural Similarities between Some Common Fluorophores Used in Biology, Marketed Drugs, Endogenous Metabolites, and Natural Products. Mar Drugs 2020; 18:E582. [PMID: 33238416 PMCID: PMC7700180 DOI: 10.3390/md18110582] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/08/2020] [Revised: 11/16/2020] [Accepted: 11/20/2020] [Indexed: 12/12/2022]
Abstract
It is known that at least some fluorophores can act as 'surrogate' substrates for solute carriers (SLCs) involved in pharmaceutical drug uptake, and this promiscuity is taken to reflect at least a certain structural similarity. As part of a comprehensive study seeking the 'natural' substrates of 'orphan' transporters that also serve to take up pharmaceutical drugs into cells, we have noted that many drugs bear structural similarities to natural products. A cursory inspection of common fluorophores indicates that they too are surprisingly 'drug-like', and they also enter at least some cells. Some are also known to be substrates of efflux transporters. Consequently, we sought to assess the structural similarity of common fluorophores to marketed drugs, endogenous mammalian metabolites, and natural products. We used a set of some 150 fluorophores along with standard fingerprinting methods and the Tanimoto similarity metric. Results: The great majority of fluorophores tested exhibited significant similarity (Tanimoto similarity > 0.75) to at least one drug, as judged via descriptor properties (especially their aromaticity, for identifiable reasons that we explain), by molecular fingerprints, by visual inspection, and via the "quantitative estimate of drug likeness" technique. It is concluded that this set of fluorophores does overlap with a significant part of both the drug space and natural products space. Consequently, fluorophores do indeed offer a much wider opportunity than had possibly been realised to be used as surrogate uptake molecules in the competitive or trans-stimulation assay of membrane transporter activities.
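The Tanimoto similarity used here is the size of the intersection of two fingerprints divided by the size of their union. The sketch below represents each fingerprint as a Python set of "on" bit positions; the bit values are made-up placeholders, not real fingerprints, and the helper names are hypothetical. The 0.75 threshold is the one quoted in the abstract.

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto coefficient |A ∩ B| / |A ∪ B| for two sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / len(fp_a | fp_b)


def most_similar(query_fp: set, library: dict):
    """Return (name, similarity) of the library entry closest to the query."""
    name = max(library, key=lambda n: tanimoto(query_fp, library[n]))
    return name, tanimoto(query_fp, library[name])


if __name__ == "__main__":
    # Placeholder bit sets standing in for a fluorophore and two drugs.
    fluorophore = {1, 2, 3, 4}
    drugs = {"drug_X": {2, 3, 4, 5}, "drug_Y": {1, 2, 3, 4, 5}}
    name, score = most_similar(fluorophore, drugs)
    print(name, score, score > 0.75)
```

In practice the bit sets would come from a cheminformatics toolkit, and the resulting value depends strongly on which fingerprint encoding is used.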
Affiliation(s)
- Steve O’Hagan
  - Department of Chemistry, The University of Manchester, Manchester M13 9PT, UK
  - Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Molecular, Integrative and Systems Biology, Biosciences Building, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kongens Lyngby, Denmark
11
Khemchandani Y, O'Hagan S, Samanta S, Swainston N, Roberts TJ, Bollegala D, Kell DB. DeepGraphMolGen, a multi-objective, computational strategy for generating molecules with desirable properties: a graph convolution and reinforcement learning approach. J Cheminform 2020; 12:53. [PMID: 33431037 PMCID: PMC7487898 DOI: 10.1186/s13321-020-00454-3] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 05/29/2020] [Accepted: 08/18/2020] [Indexed: 02/03/2023]
Abstract
We address the problem of generating novel molecules with desired interaction properties as a multi-objective optimization problem. Interaction binding models are learned from binding data using graph convolution networks (GCNs). Since the experimentally obtained property scores are recognised as having potentially gross errors, we adopted a robust loss for the model. Combinations of these terms, including drug likeness and synthetic accessibility, are then optimized using reinforcement learning based on a graph convolution policy approach. Some of the molecules generated, while legitimate chemically, can have excellent drug-likeness scores but appear unusual. We provide an example based on the binding potency of small molecules to dopamine transporters. We extend our method successfully to use a multi-objective reward function, in this case for generating novel molecules that bind with dopamine transporters but not with those for norepinephrine. Our method should be generally applicable to the generation in silico of molecules with desirable properties.
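A multi-objective reward of the kind described (bind the dopamine transporter, avoid the norepinephrine transporter, stay drug-like and synthesizable) can be sketched as a weighted sum. The abstract does not give the actual reward formula, so the linear form, the weights, and the assumption that every term is pre-normalised to [0, 1] are all illustrative choices; the function and parameter names are hypothetical.

```python
def multi_objective_reward(dat_affinity, net_affinity, drug_likeness,
                           synth_accessibility,
                           weights=(1.0, 1.0, 0.5, 0.5)):
    """Scalarise four objectives into one reward for a generated molecule.

    Assumptions: every input is normalised to [0, 1] with higher meaning
    "more of that property"; binding to the desired target is rewarded and
    binding to the off-target is penalised. The weights are arbitrary.
    """
    w_dat, w_net, w_qed, w_sa = weights
    return (w_dat * dat_affinity
            - w_net * net_affinity
            + w_qed * drug_likeness
            + w_sa * synth_accessibility)


if __name__ == "__main__":
    selective = multi_objective_reward(0.9, 0.2, 0.8, 0.6)
    unselective = multi_objective_reward(0.9, 0.9, 0.8, 0.6)
    print(selective, unselective)
```

A weighted sum is the simplest scalarisation; it makes the trade-off between objectives explicit through the weights, at the cost of having to tune them.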
Affiliation(s)
- Yash Khemchandani
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
  - Indian Institute of Technology Bombay, Powai, Mumbai, Maharashtra, 400 076, India
- Stephen O'Hagan
  - Dept of Chemistry, Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester, M1 7DN, UK
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Timothy J Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
- Danushka Bollegala
  - Dept of Computer Science, University of Liverpool, Ashton Building, Ashton Street, Liverpool, L69 3BX, UK
- Douglas B Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool, L69 7ZB, UK
  - The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kemitorvet 200, Kgs, 2800, Lyngby, Denmark
12
|
Samanta S, O’Hagan S, Swainston N, Roberts TJ, Kell DB. VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder. Molecules 2020; 25:E3446. [PMID: 32751155 PMCID: PMC7435890 DOI: 10.3390/molecules25153446] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/26/2020] [Revised: 07/21/2020] [Accepted: 07/28/2020] [Indexed: 01/13/2023]
Abstract
Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
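To make the idea of a latent-vector distance concrete, the sketch below ranks library molecules by their distance to a query in latent space. It assumes the latent vectors have already been produced by a trained encoder, which is not shown. The vectors, the molecule names and the choice of Euclidean distance are illustrative assumptions, since the abstract does not specify which distance is used.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("latent vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_latent_distance(query: Sequence[float],
                            library: Dict[str, Sequence[float]]
                            ) -> List[Tuple[str, float]]:
    """Return (name, distance) pairs sorted from most to least similar,
    where a smaller latent distance is read as higher similarity."""
    pairs = [(name, euclidean_distance(query, vec))
             for name, vec in library.items()]
    return sorted(pairs, key=lambda pair: pair[1])


# Hypothetical 3-dimensional latent vectors; real encoders use many more dimensions.
library = {
    "mol_A": [0.1, 0.2, 0.3],
    "mol_B": [0.9, 0.8, 0.7],
    "mol_C": [0.1, 0.2, 0.4],
}
for name, dist in rank_by_latent_distance([0.1, 0.2, 0.3], library):
    print(name, round(dist, 3))
```

Once the vectors exist, each comparison is a single pass over the vector components, which is consistent with the abstract's point that the metric is rapidly calculated.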
Affiliation(s)
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Steve O’Hagan
  - Department of Chemistry, The Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Timothy J. Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
|
13
|
Generation of a Small Library of Natural Products Designed to Cover Chemical Space Inexpensively. Pharmaceutical Frontiers 2019; 1:e190005. [PMID: 31485581 PMCID: PMC6726486 DOI: 10.20900/pf20190005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/15/2022]
Abstract
Natural products space includes at least 200,000 compounds and the structures of most of these compounds are available in digital format. Previous analyses showed (i) that although they were capable of taking up synthetic pharmaceutical drugs, such exogenous molecules were likely the chief ‘natural’ substrates in the evolution of the transporters used to gain cellular entry by pharmaceutical drugs, and (ii) that a relatively simple but rapid clustering algorithm could produce clusters from which individual elements might serve to form a representative library covering natural products space. This exploited the fact that the larger clusters were likely to be formed early in evolution (and hence to have been accompanied by suitable transporters), so that very small clusters, including singletons, could be ignored. In the latter work, we assumed that the molecule chosen might be that in the middle of the cluster. However, this ignored two other criteria, namely the commercial availability and the financial cost of the individual elements of these clusters. We here develop a small representative library in which we seek to satisfy the somewhat competing criteria of coverage (‘representativeness’), availability and cost. It is intended that the library chosen might serve as a testbed of molecules that may or may not be substrates for known or orphan drug transporters. A supplementary spreadsheet provides details of the compounds and their availability via a particular supplier.
|
14
|
Baidoo EEK, Teixeira Benites V. Mass Spectrometry-Based Microbial Metabolomics: Techniques, Analysis, and Applications. Methods Mol Biol 2019; 1859:11-69. [PMID: 30421222 DOI: 10.1007/978-1-4939-8757-3_2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 06/09/2023]
Abstract
The demand for understanding the roles genes play in biological systems has steered the biosciences in the direction of the metabolome, as it closely reflects the metabolic activities within a cell. The importance of the metabolome is further highlighted by its ability to influence the genome, transcriptome, and proteome. Consequently, metabolomic information is being used to understand microbial metabolic networks. At the forefront of this work is mass spectrometry, the most popular metabolomics measurement technique. Mass spectrometry-based metabolomic analyses have made significant contributions to microbiological research in the environment and human disease. In this chapter, we break down the technical aspects of mass spectrometry-based metabolomics and discuss its application to microbiological research.
Affiliation(s)
- Edward E K Baidoo
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Joint BioEnergy Institute, Emeryville, California, USA
- Veronica Teixeira Benites
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
  - Joint BioEnergy Institute, Emeryville, California, USA
|
15
|
Lagarde N, Rey J, Gyulkhandanyan A, Tufféry P, Miteva MA, Villoutreix BO. Online structure-based screening of purchasable approved drugs and natural compounds: retrospective examples of drug repositioning on cancer targets. Oncotarget 2018; 9:32346-32361. [PMID: 30190791 PMCID: PMC6122352 DOI: 10.18632/oncotarget.25966] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 06/23/2018] [Accepted: 07/31/2018] [Indexed: 12/11/2022]
Abstract
Drug discovery is a long and difficult process that benefits from the integration of virtual screening methods into experimental screening campaigns, so as to generate testable hypotheses and to accelerate and/or reduce the cost of drug development. The current drug attrition rate is still a major issue in all therapeutic areas, especially in the field of cancer. Drug repositioning and the screening of natural compounds constitute promising approaches to accelerate drug discovery and improve its success rate. We developed three libraries of purchasable compounds, Drugs-lib, FOOD-lib and NP-lib, which contain approved drugs, food constituents and natural products, respectively, and are optimized for structure-based virtual screening studies. The three compound libraries are implemented in the MTiOpenScreen web server, which allows users to perform structure-based virtual screening computations on their selected protein targets. The server outputs a list of 1,500 molecules with predicted binding scores that can then be processed further by the users and purchased for experimental validation. To illustrate the potential of our service for drug repositioning endeavours, we selected five recently published drugs that have been repositioned in vitro and/or in vivo on cancer targets. For each drug, we used the MTiOpenScreen service to screen the Drugs-lib collection against the corresponding anti-cancer target, and we show that our protocol is able to rank these drugs among the top-ranked compounds. This web server should assist the discovery of promising molecules that could benefit patients, with faster development times and reduced costs and risk.
Affiliation(s)
- Nathalie Lagarde
  - Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, INSERM UMR-S 973, Paris, France
  - INSERM, U973, Paris, France
- Julien Rey
  - Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, INSERM UMR-S 973, Paris, France
  - INSERM, U973, Paris, France
- Aram Gyulkhandanyan
  - Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, INSERM UMR-S 973, Paris, France
  - INSERM, U973, Paris, France
- Pierre Tufféry
  - Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, INSERM UMR-S 973, Paris, France
  - INSERM, U973, Paris, France
- Maria A. Miteva
  - Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, INSERM UMR-S 973, Paris, France
  - INSERM, U973, Paris, France
- Bruno O. Villoutreix
  - Université Paris Diderot, Sorbonne Paris Cité, Molécules Thérapeutiques In Silico, INSERM UMR-S 973, Paris, France
  - INSERM, U973, Paris, France
|
16
|
Blaženović I, Kind T, Ji J, Fiehn O. Software Tools and Approaches for Compound Identification of LC-MS/MS Data in Metabolomics. Metabolites 2018; 8:E31. [PMID: 29748461 PMCID: PMC6027441 DOI: 10.3390/metabo8020031] [Citation(s) in RCA: 373] [Impact Index Per Article: 62.2] [Received: 04/08/2018] [Revised: 04/26/2018] [Accepted: 05/06/2018] [Indexed: 01/17/2023]
Abstract
The annotation of small molecules remains a major challenge in untargeted mass spectrometry-based metabolomics. We here critically discuss structured elucidation approaches and software that are designed to help during the annotation of unknown compounds. Only by elucidating unknown metabolites first is it possible to biologically interpret complex systems, to map compounds to pathways and to create reliable predictive metabolic models for translational and clinical research. These strategies include the construction and quality of tandem mass spectral databases such as the coalition of MassBank repositories and investigations of MS/MS matching confidence. We present in silico fragmentation tools such as MS-FINDER, CFM-ID, MetFrag, ChemDistiller and CSI:FingerID that can annotate compounds from existing structure databases and that have been used in the CASMI (critical assessment of small molecule identification) contests. Furthermore, the use of retention time models from liquid chromatography and the utility of collision cross-section modelling from ion mobility experiments are covered. Workflows and published examples of successfully annotated unknown compounds are included.
Affiliation(s)
- Ivana Blaženović
  - NIH West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA 95616, USA
- Tobias Kind
  - NIH West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA 95616, USA
- Jian Ji
  - State Key Laboratory of Food Science and Technology, School of Food Science of Jiangnan University, School of Food Science Synergetic Innovation Center of Food Safety and Nutrition, Wuxi 214122, China
- Oliver Fiehn
  - NIH West Coast Metabolomics Center, UC Davis Genome Center, University of California, Davis, CA 95616, USA
  - Department of Biochemistry, Faculty of Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia
|
17
|
O'Hagan S, Wright Muelas M, Day PJ, Lundberg E, Kell DB. GeneGini: Assessment via the Gini Coefficient of Reference "Housekeeping" Genes and Diverse Human Transporter Expression Profiles. Cell Syst 2018; 6:230-244.e1. [PMID: 29428416 PMCID: PMC5840522 DOI: 10.1016/j.cels.2018.01.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Received: 06/26/2017] [Revised: 09/26/2017] [Accepted: 12/30/2017] [Indexed: 01/13/2023]
Abstract
The expression levels of SLC or ABC membrane transporter transcripts typically differ 100- to 10,000-fold between different tissues. The Gini coefficient characterizes such inequalities and here is used to describe the distribution of the expression of each transporter among different human tissues and cell lines. Many transporters exhibit extremely high Gini coefficients even for common substrates, indicating considerable specialization consistent with divergent evolution. The expression profiles of SLC transporters in different cell lines behave similarly, although Gini coefficients for ABC transporters tend to be larger in cell lines than in tissues, implying selection. Transporter genes are significantly more heterogeneously expressed than the members of most non-transporter gene classes. Transcripts with the stablest expression have a low Gini index and often differ significantly from the "housekeeping" genes commonly used for normalization in transcriptomics/qPCR studies. PCBP1 has a low Gini coefficient, is reasonably expressed, and is an excellent novel reference gene. The approach, referred to as GeneGini, provides rapid and simple characterization of expression-profile distributions and improved normalization of genome-wide expression-profiling data.
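For readers unfamiliar with the Gini coefficient, the sketch below computes it for one gene's expression values across a set of samples using the standard sorted-rank formula. The expression values are made up for illustration, and the function is a generic textbook implementation rather than the GeneGini code itself.

```python
from typing import Iterable


def gini(values: Iterable[float]) -> float:
    """Gini coefficient of non-negative values.

    Uses G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    with x sorted in ascending order and i running from 1 to n.
    Returns 0.0 for an empty input or when all values are zero.
    """
    xs = sorted(values)
    if any(x < 0 for x in xs):
        raise ValueError("values must be non-negative")
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical expression levels of two genes across four tissues.
evenly_expressed = [5.0, 5.0, 5.0, 5.0]   # same level everywhere
tissue_specific = [0.0, 0.0, 0.0, 1.0]    # expressed in a single tissue
print(gini(evenly_expressed), gini(tissue_specific))
```

A value near 0 corresponds to the stable expression wanted in a reference gene, while a value approaching 1 corresponds to the tissue-specific expression the abstract reports for many transporters.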
Affiliation(s)
- Steve O'Hagan
  - School of Chemistry, 131, Princess Street, Manchester M1 7DN, UK; The Manchester Institute of Biotechnology, 131, Princess Street, Manchester M1 7DN, UK
- Marina Wright Muelas
  - School of Chemistry, 131, Princess Street, Manchester M1 7DN, UK; The Manchester Institute of Biotechnology, 131, Princess Street, Manchester M1 7DN, UK
- Philip J Day
  - The Manchester Institute of Biotechnology, 131, Princess Street, Manchester M1 7DN, UK; Faculty of Biology, Medicine and Health, The University of Manchester, Oxford Road, Manchester M13 9PL, UK
- Emma Lundberg
  - Science for Life Laboratory, Royal Institute of Technology (KTH), SE-17121 Solna, Sweden
- Douglas B Kell
  - School of Chemistry, 131, Princess Street, Manchester M1 7DN, UK; The Manchester Institute of Biotechnology, 131, Princess Street, Manchester M1 7DN, UK
|