1
Ahmad S, Singh V, Gautam HK, Raza K. Multisampling-based docking reveals Imidazolidinyl urea as a multitargeted inhibitor for lung cancer: an optimisation followed multi-simulation and in-vitro study. J Biomol Struct Dyn 2024; 42:2494-2511. [PMID: 37154501 DOI: 10.1080/07391102.2023.2209673] [Citation(s) in RCA: 20] [Impact Index Per Article: 20.0] [Received: 01/04/2023] [Accepted: 04/16/2023] [Indexed: 05/10/2023]
Abstract
Lung cancer is one of the deadliest cancers, responsible for more than 1.80 million deaths annually worldwide, and it is on the WHO priority list. When cancer cells become resistant to a drug, the drug becomes less effective and the patient is left vulnerable. To overcome this, researchers are constantly working on new drugs that can help fight drug resistance and improve patient outcomes. In this study, we took five main lung cancer proteins, namely RSK4 N-terminal kinase, guanylate kinase, cyclin-dependent kinase 2, kinase CK2 holoenzyme, and tumour necrosis factor-alpha, and screened the prepared DrugBank library of 155,888 compounds against all of them using three Glide-based docking algorithms, namely HTVS, standard precision and extra precision, obtaining docking scores ranging from -5.422 to -8.432 kcal/mol. The poses were filtered with MM/GBSA calculations, which helped identify Imidazolidinyl urea, C11H16N8O8 (DB14075), as a multitargeted inhibitor for lung cancer. The result was validated with further computations such as ADMET and interaction pattern fingerprints, and the compound was optimised with Jaguar, producing a satisfactory relative energy. All five complexes were subjected to 100 ns MD simulation in the NPT ensemble, producing cumulative deviations and fluctuations < 2 Å and a web of intermolecular interactions, indicating that the complexes are stable. Further, in-vitro analyses (morphological imaging, Annexin V/PI FACS assay, ROS and MMP analysis, and caspase-3/7 activity) were performed on the A549 cell line and produced promising results, suggesting the compound can be an option to treat lung cancer at a significantly lower cost. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Shaban Ahmad
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
- Vijay Singh
  - Immunology and Infectious Disease, Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
- Hemant K Gautam
  - Immunology and Infectious Disease, Institute of Genomics and Integrative Biology (IGIB), New Delhi, India
- Khalid Raza
  - Department of Computer Science, Jamia Millia Islamia, New Delhi, India
2
Vivek-Ananth R, Sahoo AK, Baskaran SP, Samal A. Scaffold and Structural Diversity of the Secondary Metabolite Space of Medicinal Fungi. ACS OMEGA 2023; 8:3102-3113. [PMID: 36713723 PMCID: PMC9878629 DOI: 10.1021/acsomega.2c06428] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/05/2022] [Accepted: 11/10/2022] [Indexed: 06/18/2023]
Abstract
Medicinal fungi, including mushrooms, have well-documented therapeutic uses. In this study, we perform a cheminformatics-based investigation of the scaffold and structural diversity of the secondary metabolite space of medicinal fungi and, moreover, perform a detailed comparison with approved drugs, other natural product libraries, and semi-synthetic libraries. We find that the secondary metabolite space of medicinal fungi has similar or higher scaffold diversity in comparison to other natural product libraries analyzed here. Notably, 94% of the scaffolds in the secondary metabolite space of medicinal fungi are not present in the approved drugs. Further, we find that the secondary metabolites, on the one hand, are structurally far from the approved drugs, while, on the other hand, they are close in terms of molecular properties to the approved drugs. Lastly, chemical space visualization using dimensionality reduction methods showed that the secondary metabolite space has minimal overlap with the approved drug space. In a nutshell, our results underscore that the secondary metabolite space of medicinal fungi is a valuable resource for identifying potential lead molecules for natural product-based drug discovery.
Affiliation(s)
- R.P. Vivek-Ananth
  - The Institute of Mathematical Sciences (IMSc), Chennai 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai 400094, India
- Ajaya Kumar Sahoo
  - The Institute of Mathematical Sciences (IMSc), Chennai 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai 400094, India
- Shanmuga Priya Baskaran
  - The Institute of Mathematical Sciences (IMSc), Chennai 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai 400094, India
- Areejit Samal
  - The Institute of Mathematical Sciences (IMSc), Chennai 600113, India
  - Homi Bhabha National Institute (HBNI), Mumbai 400094, India
3
Priya S, Tripathi G, Singh DB, Jain P, Kumar A. Machine learning approaches and their applications in drug discovery and design. Chem Biol Drug Des 2022; 100:136-153. [PMID: 35426249 DOI: 10.1111/cbdd.14057] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 01/30/2022] [Revised: 03/30/2022] [Accepted: 04/10/2022] [Indexed: 01/04/2023]
Abstract
This review is focused on several machine learning approaches used in chemoinformatics. Machine learning approaches provide tools and algorithms to improve drug discovery. Many physicochemical properties of drugs like toxicity, absorption, drug-drug interaction, carcinogenesis, and distribution have been effectively modeled by QSAR techniques. Machine learning is a subset of artificial intelligence, and this technique has shown tremendous potential in the field of drug discovery. Techniques discussed in this review are capable of modeling non-linear datasets, as well as big data of increasing depth and complexity. Various machine learning-based approaches are being used for drug target prediction, modeling the structure of drug target, binding site prediction, ligand-based similarity searching, de novo designing of ligands with desired properties, developing scoring functions for molecular docking, building QSAR model for biological activity prediction, and prediction of pharmacokinetic and pharmacodynamic properties of ligands. In recent years, these predictive tools and models have achieved good accuracy. By the use of more related input data, relevant parameters, and appropriate algorithms, the accuracy of these predictions can be further improved.
Affiliation(s)
- Sonal Priya
  - Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
- Garima Tripathi
  - Department of Chemistry, T. N. B. College, TMBU, Bhagalpur, India
- Dev Bukhsh Singh
  - Department of Biotechnology, Siddharth University, Siddharth Nagar, India
- Priyanka Jain
  - National Institute of Plant Genome Research, New Delhi, India
- Abhijeet Kumar
  - Department of Chemistry, Mahatma Gandhi Central University, Motihari, India
4
Exploring Future Promising Technologies in Hydrogen Fuel Cell Transportation. SUSTAINABILITY 2022. [DOI: 10.3390/su14020917] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/31/2022]
Abstract
The purpose of this research was to derive promising technologies for the transport of hydrogen fuel cells, thereby supporting the development of research and development policy and presenting directions for investment. We also provide researchers with information about technology that will lead the technology field in the future. Hydrogen energy, as the core of carbon neutral and green energy, is a major issue in changing the future industrial structure and national competitive advantage. In this study, we derived promising technology at the core of future hydrogen fuel cell transportation using the published US patent and paper databases (DB). We first performed text mining and data preprocessing and then discovered promising technologies through generative topographic mapping analysis. We analyzed both the patent DB and treatise DB in parallel and compared the results. As a result, two promising technologies were derived from the patent DB analysis, and five were derived from the paper DB analysis.
5
Bort W, Baskin II, Gimadiev T, Mukanov A, Nugmanov R, Sidorov P, Marcou G, Horvath D, Klimchuk O, Madzhidov T, Varnek A. Discovery of novel chemical reactions by deep generative recurrent neural network. Sci Rep 2021; 11:3178. [PMID: 33542271 PMCID: PMC7862614 DOI: 10.1038/s41598-021-81889-y] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 08/08/2020] [Accepted: 01/06/2021] [Indexed: 12/18/2022]
Abstract
The "creativity" of Artificial Intelligence (AI) in terms of generating de novo molecular structures opened a novel paradigm in compound design, weaknesses (stability & feasibility issues of such structures) notwithstanding. Here we show that "creative" AI may be as successfully taught to enumerate novel chemical reactions that are stoichiometrically coherent. Furthermore, when coupled to reaction space cartography, de novo reaction design may be focused on the desired reaction class. A sequence-to-sequence autoencoder with bidirectional Long Short-Term Memory layers was trained on on-purpose developed "SMILES/CGR" strings, encoding reactions of the USPTO database. The autoencoder latent space was visualized on a generative topographic map. Novel latent space points were sampled around a map area populated by Suzuki reactions and decoded to corresponding reactions. These can be critically analyzed by the expert, cleaned of irrelevant functional groups and eventually experimentally attempted, herewith enlarging the synthetic purpose of popular synthetic pathways.
Affiliation(s)
- William Bort
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Igor I Baskin
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
  - Department of Materials Science and Engineering, Technion - Israel Institute of Technology, 3200003, Haifa, Israel
- Timur Gimadiev
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
- Artem Mukanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Ramil Nugmanov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Pavel Sidorov
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Olga Klimchuk
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
- Timur Madzhidov
  - Laboratory of Chemoinformatics and Molecular Modeling, Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya str. 18, 420008, Kazan, Russia
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7140 CNRS, University of Strasbourg, 1, rue Blaise Pascal, 67000, Strasbourg, France
  - Institute for Chemical Reaction Design and Discovery (WPI-ICReDD), Hokkaido University, Kita 21 Nishi 10, Kita-ku, Sapporo, 001-0021, Japan
6
Identification of Vacant and Emerging Technologies in Smart Mobility Through the GTM-Based Patent Map Development. SUSTAINABILITY 2020. [DOI: 10.3390/su12229310] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/16/2022]
Abstract
With the development of the online platforms and the Internet of Things (IoT), various transportation services have been provided, and the lifestyle of the general public has changed significantly. However, the speed of development of technologies and services for the mobility handicapped has been relatively slow. Accordingly, in this paper, the smart mobility patent data for the mobility handicapped is subdivided through clustering to derive the mobility handicapped-related vacant technologies, and the prospect of the vacant technology is verified. For each cluster, a technology level map is generated in consideration of the technology growth level and the scope of authority of the vacant technology derived through the generative topographic map (GTM) patent map, and the level of the vacant technology is checked in terms of quantity and quality. Both indicators perform time series analyses on superior technology to predict technology trends and determine the technology’s promisingness. Unlike the precedent studies that focused only on quantitative analysis methods, this paper identified the usefulness of the technology through clustering and various verification processes and materialized it as a vacant technology that is applicable to actual R&D. Accordingly, through this empirical paper, it is possible to understand the current level of vacant technology in smart mobility for the mobility handicapped and establish an R&D strategy to prevent monopoly in technology in the future market and maintain competitiveness. It can also be utilized for new technology development in consideration of convergence with currently developed technology.
7
Gasteiger J. Chemistry in Times of Artificial Intelligence. Chemphyschem 2020; 21:2233-2242. [PMID: 32808729 PMCID: PMC7702165 DOI: 10.1002/cphc.202000518] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Received: 06/15/2020] [Revised: 08/14/2020] [Indexed: 11/09/2022]
Abstract
Chemists have to a large extent gained their knowledge by doing experiments and thus gather data. By putting various data together and then analyzing them, chemists have fostered their understanding of chemistry. Since the 1960s, computer methods have been developed to perform this process from data to information to knowledge. Simultaneously, methods were developed for assisting chemists in solving their fundamental questions such as the prediction of chemical, physical, or biological properties, the design of organic syntheses, and the elucidation of the structure of molecules. This eventually led to a discipline of its own: chemoinformatics. Chemoinformatics has found important applications in the fields of drug discovery, analytical chemistry, organic chemistry, agrichemical research, food science, regulatory science, material science, and process control. From its inception, chemoinformatics has utilized methods from artificial intelligence, an approach that has recently gained more momentum.
Affiliation(s)
- Johann Gasteiger
  - Computer-Chemie-Centrum and Institute of Organic Chemistry, University of Erlangen-Nuremberg, Naegelsbachstrasse 25, 91052 Erlangen, Germany
8
Samanta S, O’Hagan S, Swainston N, Roberts TJ, Kell DB. VAE-Sim: A Novel Molecular Similarity Measure Based on a Variational Autoencoder. Molecules 2020; 25:E3446. [PMID: 32751155 PMCID: PMC7435890 DOI: 10.3390/molecules25153446] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 06/26/2020] [Revised: 07/21/2020] [Accepted: 07/28/2020] [Indexed: 01/13/2023]
Abstract
Molecular similarity is an elusive but core "unsupervised" cheminformatics concept, yet different "fingerprint" encodings of molecular structures return very different similarity values, even when using the same similarity metric. Each encoding may be of value when applied to other problems with objective or target functions, implying that a priori none are "better" than the others, nor than encoding-free metrics such as maximum common substructure (MCSS). We here introduce a novel approach to molecular similarity, in the form of a variational autoencoder (VAE). This learns the joint distribution p(z|x) where z is a latent vector and x are the (same) input/output data. It takes the form of a "bowtie"-shaped artificial neural network. In the middle is a "bottleneck layer" or latent vector in which inputs are transformed into, and represented as, a vector of numbers (encoding), with a reverse process (decoding) seeking to return the SMILES string that was the input. We train a VAE on over six million druglike molecules and natural products (including over one million in the final holdout set). The VAE vector distances provide a rapid and novel metric for molecular similarity that is both easily and rapidly calculated. We describe the method and its application to a typical similarity problem in cheminformatics.
Affiliation(s)
- Soumitra Samanta
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Steve O’Hagan
  - Department of Chemistry, The Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Neil Swainston
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Timothy J. Roberts
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
- Douglas B. Kell
  - Department of Biochemistry and Systems Biology, Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Crown St, Liverpool L69 7ZB, UK
  - Novo Nordisk Foundation Centre for Biosustainability, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs Lyngby, Denmark
9
Sattarov B, Baskin II, Horvath D, Marcou G, Bjerrum EJ, Varnek A. De Novo Molecular Design by Combining Deep Autoencoder Recurrent Neural Networks with Generative Topographic Mapping. J Chem Inf Model 2019; 59:1182-1196. [PMID: 30785751 DOI: 10.1021/acs.jcim.8b00751] [Citation(s) in RCA: 77] [Impact Index Per Article: 15.4] [Indexed: 11/28/2022]
Abstract
Here we show that Generative Topographic Mapping (GTM) can be used to explore the latent space of the SMILES-based autoencoders and generate focused molecular libraries of interest. We have built a sequence-to-sequence neural network with Bidirectional Long Short-Term Memory layers and trained it on the SMILES strings from ChEMBL23. Very high reconstruction rates of the test set molecules were achieved (>98%), which are comparable to the ones reported in related publications. Using GTM, we have visualized the autoencoder latent space on the two-dimensional topographic map. Targeted map zones can be used for generating novel molecular structures by sampling associated latent space points and decoding them to SMILES. The sampling method based on a genetic algorithm was introduced to optimize compound properties "on the fly". The generated focused molecular libraries were shown to contain original and a priori feasible compounds which, pending actual synthesis and testing, showed encouraging behavior in independent structure-based affinity estimation procedures (pharmacophore matching, docking).
Affiliation(s)
- Boris Sattarov
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Igor I Baskin
  - Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 19991, Russia
- Dragos Horvath
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Gilles Marcou
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
- Esben Jannik Bjerrum
  - Wildcard Pharmaceutical Consulting, Zeaborg Science Center, Frødings Allé 41, 2860 Søborg, Denmark
- Alexandre Varnek
  - Laboratory of Chemoinformatics, UMR 7177 University of Strasbourg/CNRS, 4 rue B. Pascal, 67000 Strasbourg, France
10
Delalande C, Awale M, Rubin M, Probst D, Ozhathil LC, Gertsch J, Abriel H, Reymond JL. Optimizing TRPM4 inhibitors in the MHFP6 chemical space. Eur J Med Chem 2019; 166:167-177. [DOI: 10.1016/j.ejmech.2019.01.048] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/24/2018] [Revised: 12/18/2018] [Accepted: 01/19/2019] [Indexed: 12/12/2022]
11
A Novel Discovery: Holistic Efficacy at the Special Organ Level of Pungent Flavored Compounds from Pungent Traditional Chinese Medicine. Int J Mol Sci 2019; 20:752. [PMID: 30754631 PMCID: PMC6387020 DOI: 10.3390/ijms20030752] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 12/23/2018] [Revised: 01/31/2019] [Accepted: 02/01/2019] [Indexed: 12/25/2022]
Abstract
Pungent traditional Chinese medicines (TCMs) play a vital role in the clinical treatment of hepatobiliary disease, gastrointestinal diseases, cardiovascular diseases, diabetes, skin diseases and so on. Pungent TCMs have a vastness of pungent flavored (with pungent taste or smell) compounds. To elucidate the molecular mechanism of pungent flavored compounds in treating cardiovascular diseases (CVDs) and liver diseases, five pungent TCMs with the action of blood-activating and stasis-resolving (BASR) were selected. Here, an integrated systems pharmacology approach is presented for illustrating the molecular correlations between pungent flavored compounds and their holistic efficacy at the special organ level. First, we identified target proteins that are associated with pungent flavored compounds and found that these targets were functionally related to CVDs and liver diseases. Then, based on the phenotype that directly links human genes to the body parts they affect, we clustered target modules associated with pungent flavored compounds into liver and heart organs. We applied systems-based analysis to introduce a pungent flavored compound-target-pathway-organ network that clarifies mechanisms of pungent substances treating cardiovascular diseases and liver diseases by acting on the heart/liver organ. The systems pharmacology also suggests a novel systematic strategy for rational drug development from pungent TCMs in treating cardiovascular disease and associated liver diseases.
12
Miyao T, Funatsu K, Bajorath J. Three-Dimensional Activity Landscape Models of Different Design and Their Application to Compound Mapping and Potency Prediction. J Chem Inf Model 2018; 59:993-1004. [DOI: 10.1021/acs.jcim.8b00661] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Affiliation(s)
- Tomoyuki Miyao
  - Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
- Kimito Funatsu
  - Data Science Center and Graduate School of Science and Technology, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma, Nara 630-0192, Japan
  - Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Jürgen Bajorath
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany
13
Lo YC, Rensi SE, Torng W, Altman RB. Machine learning in chemoinformatics and drug discovery. Drug Discov Today 2018; 23:1538-1546. [PMID: 29750902 DOI: 10.1016/j.drudis.2018.05.010] [Citation(s) in RCA: 451] [Impact Index Per Article: 75.2] [Received: 12/15/2017] [Revised: 03/29/2018] [Accepted: 05/02/2018] [Indexed: 01/03/2023]
Abstract
Chemoinformatics is an established discipline focusing on extracting, processing and extrapolating meaningful data from chemical structures. With the rapid explosion of chemical 'big' data from HTS and combinatorial synthesis, machine learning has become an indispensable tool for drug designers to mine chemical information from large compound databases to design drugs with important biological properties. To process the chemical data, we first reviewed multiple processing layers in the chemoinformatics pipeline followed by the introduction of commonly used machine learning models in drug discovery and QSAR analysis. Here, we present basic principles and recent case studies to demonstrate the utility of machine learning techniques in chemoinformatics analyses; and we discuss limitations and future directions to guide further development in this evolving field.
Affiliation(s)
- Yu-Chen Lo
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Stefano E Rensi
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Wen Torng
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
- Russ B Altman
  - Department of Bioengineering, Stanford University, Stanford, CA, USA
14
Chakravarti SK. Distributed Representation of Chemical Fragments. ACS OMEGA 2018; 3:2825-2836. [PMID: 30023852 PMCID: PMC6044751 DOI: 10.1021/acsomega.7b02045] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 12/21/2017] [Accepted: 02/23/2018] [Indexed: 06/08/2023]
Abstract
This article describes an unsupervised machine learning method for computing distributed vector representation of molecular fragments. These vectors encode fragment features in a continuous high-dimensional space and enable similarity computation between individual fragments, even for small fragments with only two heavy atoms. The method is based on a word embedding algorithm borrowed from natural language processing field, and approximately 6 million unlabeled PubChem chemicals were used for training. The resulting dense fragment vectors are in contrast to the traditional sparse "one-hot" fragment representation and capture rich relational structure in the fragment space. The vectors of small linear fragments were averaged to yield distributed vectors of bigger fragments and molecules, which were used for different tasks, e.g., clustering, ligand recall, and quantitative structure-activity relationship modeling. The distributed vectors were found to be better at clustering ring systems and recall of kinase ligands as compared to standard binary fingerprints. This work demonstrates unsupervised learning of fragment chemistry from large sets of unlabeled chemical structures and subsequent application to supervised training on relatively small data sets of labeled chemicals.
15
Olmedo DA, González-Medina M, Gupta MP, Medina-Franco JL. Cheminformatic characterization of natural products from Panama. Mol Divers 2017; 21:779-789. [PMID: 28831697 DOI: 10.1007/s11030-017-9781-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 05/13/2017] [Accepted: 08/07/2017] [Indexed: 12/26/2022]
Abstract
In this work, we discuss the characterization and diversity analysis of 354 natural products (NPs) from Panama, systematically analyzed for the first time. The in-house database was compared to NPs from Brazil, compounds from Traditional Chinese Medicine, natural and semisynthetic collections used in high-throughput screening, and compounds from ChEMBL. An analysis of the "global diversity" was conducted using molecular properties of pharmaceutical interest, three molecular fingerprints of different design, molecular scaffolds, and molecular complexity. The global diversity was visualized using consensus diversity plots that revealed that the secondary metabolites in the Panamanian flora have a large scaffold diversity as compared to other composite databases and also have several unique scaffolds. The large scaffold diversity is in agreement with the broad range of biological activities that this collection of NPs from Panama has shown. This study also provided further quantitative evidence of the large structural complexity of NPs. The results obtained in this study support that NPs from Panama are promising candidates to identify selective molecules and are suitable sources of compounds for virtual screening campaigns.
Affiliation(s)
- Dionisio A Olmedo
  - CIFLORPAN, Center for Pharmacognostic Research on Panamanian Flora, College of Pharmacy, University of Panama, Campus Universitario Octavio Méndez Pereira, Avenida Octavio Méndez Pereira, P.O. Box 0824-00172, Panama City, Republic of Panama
- Mariana González-Medina
  - Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
- Mahabir P Gupta
  - CIFLORPAN, Center for Pharmacognostic Research on Panamanian Flora, College of Pharmacy, University of Panama, Campus Universitario Octavio Méndez Pereira, Avenida Octavio Méndez Pereira, P.O. Box 0824-00172, Panama City, Republic of Panama
- José L Medina-Franco
  - Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Avenida Universidad 3000, 04510, Mexico City, Mexico
16
Predictive cartography of metal binders using generative topographic mapping. J Comput Aided Mol Des 2017; 31:701-714. [DOI: 10.1007/s10822-017-0033-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 04/02/2017] [Accepted: 06/11/2017] [Indexed: 12/27/2022]
17
González-Medina M, Owen JR, El-Elimat T, Pearce CJ, Oberlies NH, Figueroa M, Medina-Franco JL. Scaffold Diversity of Fungal Metabolites. Front Pharmacol 2017; 8:180. [PMID: 28420994 PMCID: PMC5376591 DOI: 10.3389/fphar.2017.00180] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 02/14/2017] [Accepted: 03/17/2017] [Indexed: 11/26/2022]
Abstract
Many drug discovery projects rely on commercial compounds to discover active leads. However, current commercial libraries, with mostly synthetic compounds, access a small fraction of the possible chemical diversity. Natural products, in contrast, possess a vast structural diversity and have proven to be an outstanding source of new drugs. Several chemoinformatic analyses of natural products have demonstrated their diversity and structural complexity. However, to our knowledge, the scaffold content and structural diversity of fungal secondary metabolites have never been studied. Herein, the scaffold diversity of 223 fungal metabolites was measured and compared to the diversity of approved drugs and commercial libraries for HTS containing natural, synthetic, and semi-synthetic compounds. In addition, the global diversity of the fungal isolates was assessed and compared to other reference data sets using Consensus Diversity Plots, a chemoinformatic tool recently developed. It was concluded that fungal secondary metabolites are cyclic systems with few ramifications and more diverse than the commercial libraries with natural products and semi-synthetic compounds. The fungal metabolites data set was one of the most structurally diverse, containing a large proportion of different and unique scaffolds not found in the other compound data sets including ChEMBL. Therefore, fungal metabolites offer a rich source of molecules suited for identifying diverse candidates for drug discovery.
Affiliation(s)
- Mariana González-Medina
- Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico
- John R Owen
- High-Performance Computing Research Group, ECIT Institute, Northern Ireland Science Park, Belfast, UK
- Tamam El-Elimat
- Department of Medicinal Chemistry and Pharmacognosy, Faculty of Pharmacy, Jordan University of Science and Technology, Irbid, Jordan
- Nicholas H Oberlies
- Department of Chemistry and Biochemistry, University of North Carolina at Greensboro, Greensboro, NC, USA
- Mario Figueroa
- Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico
- José L Medina-Franco
- Departamento de Farmacia, Facultad de Química, Universidad Nacional Autónoma de México, Mexico, Mexico
|
18
|
On Generative Topographic Mapping and Graph Theory combined approach for unsupervised non-linear data visualization and fault identification. Comput Chem Eng 2017. [DOI: 10.1016/j.compchemeng.2016.12.009] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022]
|
19
|
Velkoborsky J, Hoksza D. Scaffold analysis of PubChem database as background for hierarchical scaffold-based visualization. J Cheminform 2016; 8:74. [PMID: 28090217 PMCID: PMC5199768 DOI: 10.1186/s13321-016-0186-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2016] [Accepted: 12/02/2016] [Indexed: 11/25/2022] Open
Abstract
Background: Visualization of large molecular datasets is a challenging yet important topic utilised in diverse fields of chemistry ranging from material engineering to drug design. Especially in drug design, modern methods of high-throughput screening generate large amounts of molecular data that call for methods enabling their analysis. One such method is classification of compounds based on their molecular scaffolds, a concept widely used by medicinal chemists to group molecules of similar properties. This classification can then be utilized for intuitive visualization of compounds. Results: In this paper, we propose a scaffold hierarchy as a result of large-scale analysis of the PubChem Compound database. The analysis not only provided insights into scaffold diversity of the PubChem Compound database, but also enables scaffold-based hierarchical visualization of user compound data sets on the background of empirical chemical space, as defined by the PubChem data, or on the background of any other user-defined data set. The visualization is performed by a web based client-server application called Scaffvis. It provides an interactive zoomable tree map visualization of data sets up to hundreds of thousands of molecules. Scaffvis is free to use and its source codes have been published under an open source license. Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0186-7) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Jakub Velkoborsky
- Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
- David Hoksza
- Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
|
20
|
Consensus Diversity Plots: a global diversity analysis of chemical libraries. J Cheminform 2016; 8:63. [PMID: 27895718 PMCID: PMC5105260 DOI: 10.1186/s13321-016-0176-9] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2016] [Accepted: 10/27/2016] [Indexed: 01/14/2023] Open
Abstract
Background: Measuring the structural diversity of compound databases is relevant in drug discovery and many other areas of chemistry. Since molecular diversity depends on molecular representation, comprehensive chemoinformatic analysis of the diversity of libraries uses multiple criteria. For instance, the diversity of the molecular libraries is typically evaluated employing molecular scaffolds, structural fingerprints, and physicochemical properties. However, the assessment with each criterion is analyzed independently and it is not straightforward to provide an evaluation of the “global diversity”. Results: Herein the Consensus Diversity Plot (CDP) is proposed as a novel method to represent in low dimensions the diversity of chemical libraries considering simultaneously multiple molecular representations. We illustrate the application of CDPs to classify eight compound data sets and two subsets with different sizes and compositions using molecular scaffolds, structural fingerprints, and physicochemical properties. Conclusions: CDPs are general data mining tools that represent in two-dimensions the global diversity of compound data sets using multiple metrics. These plots can be constructed using single or combined measures of diversity. An online version of the CDPs is freely available at: https://consensusdiversityplots-difacquim-unam.shinyapps.io/RscriptsCDPlots/.
Graphical abstract: Consensus Diversity Plot is a novel data mining tool that represents in two-dimensions the global diversity of compound data sets using multiple metrics.
Electronic supplementary material: The online version of this article (doi:10.1186/s13321-016-0176-9) contains supplementary material, which is available to authorized users.
|
21
|
Awale M, Reymond JL. Web-based 3D-visualization of the DrugBank chemical space. J Cheminform 2016; 8:25. [PMID: 27148409 PMCID: PMC4855437 DOI: 10.1186/s13321-016-0138-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/18/2016] [Accepted: 04/27/2016] [Indexed: 12/14/2022] Open
Abstract
Background: Similarly to the periodic table for elements, chemical space offers an organizing principle for representing the diversity of organic molecules, usually in the form of multi-dimensional property spaces that are subjected to dimensionality reduction methods to obtain 3D-spaces or 2D-maps suitable for visual inspection. Unfortunately, tools to look at chemical space on the internet are currently very limited. Results: Herein we present webDrugCS, a web application freely available at www.gdb.unibe.ch to visualize DrugBank (www.drugbank.ca, containing over 6000 investigational and approved drugs) in five different property spaces. WebDrugCS displays 3D-clouds of color-coded grid points representing molecules, whose structural formula is displayed on mouse over with an option to link to the corresponding molecule page at the DrugBank website. The 3D-clouds are obtained by principal component analysis of high dimensional property spaces describing constitution and topology (42D molecular quantum numbers MQN), structural features (34D SMILES fingerprint SMIfp), molecular shape (20D atom pair fingerprint APfp), pharmacophores (55D atom category extended atom pair fingerprint Xfp) and substructures (1024D binary substructure fingerprint Sfp). User defined molecules can be uploaded as SMILES lists and displayed together with DrugBank. In contrast to 2D-maps where many compounds fold onto each other, these 3D-spaces have a comparable resolution to their parent high-dimensional chemical space. Conclusion: To the best of our knowledge webDrugCS is the first publicly available web tool for interactive visualization and exploration of the DrugBank chemical space in 3D. WebDrugCS works on computers, tablets and phones, and facilitates the visual exploration of DrugBank to rapidly learn about the structural diversity of small molecule drugs.
Graphical abstract: webDrugCS visualization of DrugBank projected in 3D MQN space color-coded by ring count, with pointer showing the drug 5-fluorouracil.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
|
22
|
Gaspar HA, Baskin II, Varnek A. Visualization of a Multidimensional Descriptor Space. FRONTIERS IN MOLECULAR DESIGN AND CHEMICAL INFORMATION SCIENCE - HERMAN SKOLNIK AWARD SYMPOSIUM 2015: JÜRGEN BAJORATH 2016. [DOI: 10.1021/bk-2016-1222.ch012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/26/2023]
Affiliation(s)
- Héléna A. Gaspar
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
- Igor I. Baskin
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
- Alexandre Varnek
- Laboratoire de Chemoinformatique, UMR 7140, Université de Strasbourg, 1 rue Blaise Pascal, Strasbourg 67000, France
- Faculty of Physics, M.V. Lomonosov Moscow State University, Leninskie Gory, Moscow 119991, Russia
- Laboratory of Chemoinformatics, Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
|
23
|
Awale M, Reymond JL. Similarity Mapplet: Interactive Visualization of the Directory of Useful Decoys and ChEMBL in High Dimensional Chemical Spaces. J Chem Inf Model 2015. [PMID: 26207526 DOI: 10.1021/acs.jcim.5b00182] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
An Internet portal accessible at www.gdb.unibe.ch has been set up to automatically generate color-coded similarity maps of the ChEMBL database in relation to up to two sets of active compounds taken from the enhanced Directory of Useful Decoys (eDUD), a random set of molecules, or up to two sets of user-defined reference molecules. These maps visualize the relationships between the selected compounds and ChEMBL in six different high dimensional chemical spaces, namely MQN (42-D molecular quantum numbers), SMIfp (34-D SMILES fingerprint), APfp (20-D shape fingerprint), Xfp (55-D pharmacophore fingerprint), Sfp (1024-bit substructure fingerprint), and ECfp4 (1024-bit extended connectivity fingerprint). The maps are supplied in form of Java based desktop applications called "similarity mapplets" allowing interactive content browsing and linked to a "Multifingerprint Browser for ChEMBL" (also accessible directly at www.gdb.unibe.ch ) to perform nearest neighbor searches. One can obtain six similarity mapplets of ChEMBL relative to random reference compounds, 606 similarity mapplets relative to single eDUD active sets, 30,300 similarity mapplets relative to pairs of eDUD active sets, and any number of similarity mapplets relative to user-defined reference sets to help visualize the structural diversity of compound series in drug optimization projects and their relationship to other known bioactive compounds.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
|
24
|
Osolodkin DI, Radchenko EV, Orlov AA, Voronkov AE, Palyulin VA, Zefirov NS. Progress in visual representations of chemical space. Expert Opin Drug Discov 2015; 10:959-73. [DOI: 10.1517/17460441.2015.1060216] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
|
25
|
Kaneko H, Funatsu K. Applicability Domain Based on Ensemble Learning in Classification and Regression Analyses. J Chem Inf Model 2014; 54:2469-82. [DOI: 10.1021/ci500364e] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Hiromasa Kaneko
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Kimito Funatsu
- Department of Chemical Systems Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
|
26
|
Ovchinnikova SI, Bykov AA, Tsivadze AY, Dyachkov EP, Kireeva NV. Supervised extensions of chemography approaches: case studies of chemical liabilities assessment. J Cheminform 2014; 6:20. [PMID: 24868246 PMCID: PMC4018504 DOI: 10.1186/1758-2946-6-20] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/25/2013] [Accepted: 04/28/2014] [Indexed: 12/04/2022] Open
Abstract
Chemical liabilities, such as adverse effects and toxicity, play a significant role in modern drug discovery process. In silico assessment of chemical liabilities is an important step aimed to reduce costs and animal testing by complementing or replacing in vitro and in vivo experiments. Herein, we propose an approach combining several classification and chemography methods to be able to predict chemical liabilities and to interpret obtained results in the context of impact of structural changes of compounds on their pharmacological profile. To our knowledge for the first time, the supervised extension of Generative Topographic Mapping is proposed as an effective new chemography method. New approach for mapping new data using supervised Isomap without re-building models from the scratch has been proposed. Two approaches for estimation of model's applicability domain are used in our study to our knowledge for the first time in chemoinformatics. The structural alerts responsible for the negative characteristics of pharmacological profile of chemical compounds has been found as a result of model interpretation.
Affiliation(s)
- Svetlana I Ovchinnikova
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Arseniy A Bykov
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
- Aslan Yu Tsivadze
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Evgeny P Dyachkov
- Kurnakov Institute of General and Inorganic Chemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russia
- Natalia V Kireeva
- Frumkin Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31-4, 119071 Moscow, Russia
- Moscow Institute of Physics and Technology, Institutsky per., 9, 141700 Dolgoprudny, Russia
|
27
|
Nonlinear Dimensionality Reduction for Visualizing Toxicity Data: Distance-Based Versus Topology-Based Approaches. ChemMedChem 2014; 9:1047-59. [DOI: 10.1002/cmdc.201400027] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/10/2014] [Indexed: 01/11/2023]
|
28
|
Impact of distance-based metric learning on classification and visualization model performance and structure–activity landscapes. J Comput Aided Mol Des 2014; 28:61-73. [DOI: 10.1007/s10822-014-9719-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/15/2013] [Accepted: 01/24/2014] [Indexed: 10/25/2022]
|
29
|
Medina-Franco JL, Méndez-Lucio O, Martinez-Mayorga K. The Interplay Between Molecular Modeling and Chemoinformatics to Characterize Protein–Ligand and Protein–Protein Interactions Landscapes for Drug Discovery. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 96:1-37. [DOI: 10.1016/bs.apcsb.2014.06.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]
|
30
|
Hasegawa K, Funatsu K. Prediction of Protein-Protein Interaction Pocket Using L-Shaped PLS Approach and Its Visualizations by Generative Topographic Mapping. Mol Inform 2013; 33:65-72. [DOI: 10.1002/minf.201300137] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/21/2013] [Accepted: 10/02/2013] [Indexed: 12/26/2022]
|
31
|
|
32
|
Kaneko H, Funatsu K. Criterion for Evaluating the Predictive Ability of Nonlinear Regression Models without Cross-Validation. J Chem Inf Model 2013; 53:2341-8. [DOI: 10.1021/ci4003766] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- Hiromasa Kaneko
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Kimito Funatsu
- Department of Chemical System Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
|
33
|
Xiao X, Min JL, Wang P, Chou KC. iCDI-PseFpt: identify the channel-drug interaction in cellular networking with PseAAC and molecular fingerprints. J Theor Biol 2013; 337:71-9. [PMID: 23988798 DOI: 10.1016/j.jtbi.2013.08.013] [Citation(s) in RCA: 104] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2013] [Revised: 07/26/2013] [Accepted: 08/14/2013] [Indexed: 12/29/2022]
Abstract
Many crucial functions in life, such as heartbeat, sensory transduction and central nervous system response, are controlled by cell signalings via various ion channels. Therefore, ion channels have become an excellent drug target, and study of ion channel-drug interaction networks is an important topic for drug development. However, it is both time-consuming and costly to determine whether a drug and a protein ion channel are interacting with each other in a cellular network by means of experimental techniques. Although some computational methods were developed in this regard based on the knowledge of the 3D (three-dimensional) structure of protein, unfortunately their usage is quite limited because the 3D structures for most protein ion channels are still unknown. With the avalanche of protein sequences generated in the post-genomic age, it is highly desirable to develop the sequence-based computational method to address this problem. To take up the challenge, we developed a new predictor called iCDI-PseFpt, in which the protein ion-channel sample is formulated by the PseAAC (pseudo amino acid composition) generated with the gray model theory, the drug compound by the 2D molecular fingerprint, and the operation engine is the fuzzy K-nearest neighbor algorithm. The overall success rate achieved by iCDI-PseFpt via the jackknife cross-validation was 87.27%, which is remarkably higher than that by any of the existing predictors in this area. As a user-friendly web-server, iCDI-PseFpt is freely accessible to the public at the website http://www.jci-bioinfo.cn/iCDI-PseFpt/. Furthermore, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated math equations presented in the paper just for its integrity. It has not escaped our notice that the current approach can also be used to study other drug-target interaction networks.
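The fuzzy K-nearest neighbor step named in this abstract can be illustrated with a minimal sketch. It assumes the commonly used inverse-distance weighting with fuzzifier `m` and crisp training labels; the two-dimensional feature vectors, the label names, and the function `fuzzy_knn` are hypothetical stand-ins for the PseAAC and fingerprint descriptors, not the published predictor.

```python
import math
from collections import defaultdict


def fuzzy_knn(train, query, k=3, m=2.0, eps=1e-12):
    """Return class memberships for `query` from its k nearest training samples.

    Each neighbor votes with weight 1 / d^(2 / (m - 1)), so closer samples
    count more; memberships are normalized to sum to 1.
    `train` is a list of (feature_tuple, label) pairs (toy data here).
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    weights = defaultdict(float)
    for features, label in neighbors:
        d = math.dist(features, query)
        # eps avoids division by zero when the query coincides with a sample
        weights[label] += 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Hypothetical toy samples: (feature vector, class label)
train = [
    ((0.0, 0.0), "interacting"),
    ((0.0, 1.0), "interacting"),
    ((3.0, 3.0), "non-interacting"),
    ((4.0, 3.0), "non-interacting"),
]

memberships = fuzzy_knn(train, query=(1.0, 0.0), k=3)
predicted = max(memberships, key=memberships.get)
```

Unlike a plain majority vote, the result is a graded membership per class, which gives a rough indication of how clear-cut a prediction is.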
Affiliation(s)
- Xuan Xiao
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China; Information School, Zhe-Jiang Textile & Fashion College, Ning-Bo 315211, China; Gordon Life Science Institute, 53 South Cottage Road, Belmont, MA 02478, United States
- Jian-Liang Min
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
- Pu Wang
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
- Kuo-Chen Chou
- Center of Excellence in Genomic Medicine Research (CEGMR), King Abdulaziz University, Jeddah, Saudi Arabia; Gordon Life Science Institute, 53 South Cottage Road, Belmont, MA 02478, United States
|
34
|
Medina-Franco JL, Aguayo-Ortiz R. Progress in the Visualization and Mining of Chemical and Target Spaces. Mol Inform 2013; 32:942-53. [DOI: 10.1002/minf.201300041] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2013] [Accepted: 05/06/2013] [Indexed: 01/15/2023]
|
35
|
MacCuish JD, MacCuish NE. Chemoinformatics applications of cluster analysis. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2013. [DOI: 10.1002/wcms.1152] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/02/2023]
|
36
|
Awale M, van Deursen R, Reymond JL. MQN-mapplet: visualization of chemical space with interactive maps of DrugBank, ChEMBL, PubChem, GDB-11, and GDB-13. J Chem Inf Model 2013; 53:509-18. [PMID: 23297797 DOI: 10.1021/ci300513m] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/21/2023]
Abstract
The MQN-mapplet is a Java application giving access to the structure of small molecules in large databases via color-coded maps of their chemical space. These maps are projections from a 42-dimensional property space defined by 42 integer value descriptors called molecular quantum numbers (MQN), which count different categories of atoms, bonds, polar groups, and topological features and categorize molecules by size, rigidity, and polarity. Despite its simplicity, MQN-space is relevant to biological activities. The MQN-mapplet allows localization of any molecule on the color-coded images, visualization of the molecules, and identification of analogs as neighbors on the MQN-map or in the original 42-dimensional MQN-space. No query molecule is necessary to start the exploration, which may be particularly attractive for nonchemists. To our knowledge, this type of interactive exploration tool is unprecedented for very large databases such as PubChem and GDB-13 (almost one billion molecules). The application is freely available for download at www.gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
|
37
|
Kireeva N, Kuznetsov SL, Tsivadze AY. Toward Navigating Chemical Space of Ionic Liquids: Prediction of Melting Points Using Generative Topographic Maps. Ind Eng Chem Res 2012. [DOI: 10.1021/ie3021895] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Affiliation(s)
- Natalia Kireeva
- Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russian Federation
- Laboratoire d’Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- Sergey L. Kuznetsov
- Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russian Federation
- Aslan Yu. Tsivadze
- Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119071 Moscow, Russian Federation
|
38
|
Medina-Franco JL. Interrogating Novel Areas of Chemical Space for Drug Discovery using Chemoinformatics. Drug Dev Res 2012. [DOI: 10.1002/ddr.21034] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/16/2022]
|
39
|
Waddell J, Medina-Franco JL. Bioactivity landscape modeling: Chemoinformatic characterization of structure–activity relationships of compounds tested across multiple targets. Bioorg Med Chem 2012; 20:5443-52. [DOI: 10.1016/j.bmc.2011.11.051] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2011] [Revised: 11/01/2011] [Accepted: 11/23/2011] [Indexed: 12/14/2022]
|
40
|
Yongye AB, Waddell J, Medina-Franco JL. Molecular scaffold analysis of natural products databases in the public domain. Chem Biol Drug Des 2012; 80:717-24. [PMID: 22863071 DOI: 10.1111/cbdd.12011] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022]
Abstract
Natural products represent important sources of bioactive compounds in drug discovery efforts. In this work, we compiled five natural products databases available in the public domain and performed a comprehensive chemoinformatic analysis focused on the content and diversity of the scaffolds with an overview of the diversity based on molecular fingerprints. The natural products databases were compared with each other and with a set of molecules obtained from in-house combinatorial libraries, and with a general screening commercial library. It was found that publicly available natural products databases have different scaffold diversity. In contrast to the common concept that larger libraries have the largest scaffold diversity, the largest natural products collection analyzed in this work was not the most diverse. The general screening library showed, overall, the highest scaffold diversity. However, considering the most frequent scaffolds, the general reference library was the least diverse. In general, natural products databases in the public domain showed low molecule overlap. In addition to benzene and acyclic compounds, flavones, coumarins, and flavanones were identified as the most frequent molecular scaffolds across the different natural products collections. The results of this work have direct implications in the computational and experimental screening of natural product databases for drug discovery.
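The scaffold-content comparison summarized in this abstract can be illustrated with a few simple counts. The sketch below assumes each molecule has already been reduced to a scaffold label by some external tool; the function `scaffold_summary`, the toy labels, and the choice of statistics are illustrative assumptions, not the metrics reported in the paper.

```python
from collections import Counter


def scaffold_summary(scaffolds):
    """Summarize scaffold content for a list of per-molecule scaffold labels."""
    counts = Counter(scaffolds)
    n_molecules = len(scaffolds)
    n_scaffolds = len(counts)
    # Singletons: scaffolds represented by exactly one molecule
    n_singletons = sum(1 for c in counts.values() if c == 1)
    top_scaffold, top_count = counts.most_common(1)[0]
    return {
        "molecules": n_molecules,
        "scaffolds": n_scaffolds,
        "scaffolds_per_molecule": n_scaffolds / n_molecules,
        "singleton_fraction": n_singletons / n_scaffolds,
        "top_scaffold": top_scaffold,
        "top_scaffold_fraction": top_count / n_molecules,
    }


# Hypothetical toy collection: one scaffold label per molecule
toy_scaffolds = [
    "benzene", "benzene", "benzene",
    "flavone", "flavone",
    "coumarin", "flavanone", "acyclic",
]

summary = scaffold_summary(toy_scaffolds)
```

A higher scaffolds-per-molecule ratio suggests a more diverse collection, while a large top-scaffold fraction indicates that a few frequent scaffolds dominate; the two views can disagree, which is consistent with the contrast the abstract draws between overall diversity and diversity among the most frequent scaffolds.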
Affiliation(s)
- Austin B Yongye
- Torrey Pines Institute for Molecular Studies, 11350 SW Village Parkway, Port St. Lucie, FL 34987, USA
|
41
|
López-Vallejo F, Giulianotti MA, Houghten RA, Medina-Franco JL. Expanding the medicinally relevant chemical space with compound libraries. Drug Discov Today 2012; 17:718-26. [PMID: 22515962 DOI: 10.1016/j.drudis.2012.04.001] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/31/2012] [Revised: 03/01/2012] [Accepted: 04/02/2012] [Indexed: 02/04/2023]
Abstract
Analysis of marketed drugs and commercial vendor libraries used in high-throughput screening suggests that the medicinally relevant chemical space may be expanded to unexplored regions. Novel regions of the chemical space can be conveniently explored with structurally unique molecules with increased complexity and balanced physicochemical properties. As a case study, we discuss the chemoinformatic profile of natural products in the Traditional Chinese Medicine (TCM) database and a large collection assembled from 30 small-molecule combinatorial libraries with emphasis on assessing molecular complexity. The herein surveyed combinatorial libraries have been successfully used over the past 20 years to identify novel bioactive compounds across different therapeutic areas. Combinatorial libraries and natural products are suitable sources to expand the traditional relevant medicinal chemistry space.
Affiliation(s)
- Fabian López-Vallejo
- Torrey Pines Institute for Molecular Studies, 11350 SW Village Parkway, Port St. Lucie, FL 34987, USA
|
42
|
Kireeva N, Baskin II, Gaspar HA, Horvath D, Marcou G, Varnek A. Generative Topographic Mapping (GTM): Universal Tool for Data Visualization, Structure-Activity Modeling and Dataset Comparison. Mol Inform 2012; 31:301-12. [PMID: 27477099 DOI: 10.1002/minf.201100163] [Citation(s) in RCA: 86] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2011] [Accepted: 02/29/2012] [Indexed: 11/10/2022]
Abstract
Here, the utility of Generative Topographic Maps (GTM) for data visualization, structure-activity modeling and database comparison is evaluated, on hand of subsets of the Database of Useful Decoys (DUD). Unlike other popular dimensionality reduction approaches like Principal Component Analysis, Sammon Mapping or Self-Organizing Maps, the great advantage of GTMs is providing data probability distribution functions (PDF), both in the high-dimensional space defined by molecular descriptors and in 2D latent space. PDFs for the molecules of different activity classes were successfully used to build classification models in the framework of the Bayesian approach. Because PDFs are represented by a mixture of Gaussian functions, the Bhattacharyya kernel has been proposed as a measure of the overlap of datasets, which leads to an elegant method of global comparison of chemical libraries.
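The idea of measuring dataset overlap with a Bhattacharyya-type measure can be sketched in a simplified discrete form. The example below assumes each library is summarized as a probability distribution over the nodes of the 2D latent grid, obtained by averaging per-molecule responsibilities; the responsibility values and the function names are hypothetical, and the paper's kernel on Gaussian mixtures may differ in detail.

```python
import math


def node_distribution(responsibilities):
    """Average per-molecule responsibility vectors into one distribution over nodes."""
    n_nodes = len(responsibilities[0])
    totals = [0.0] * n_nodes
    for row in responsibilities:
        for k, r in enumerate(row):
            totals[k] += r
    norm = sum(totals)
    return [t / norm for t in totals]


def bhattacharyya_coefficient(p, q):
    """Overlap of two discrete distributions: 1 for identical, 0 for disjoint."""
    return sum(math.sqrt(pk * qk) for pk, qk in zip(p, q))


# Hypothetical responsibilities of two small libraries over a 4-node latent grid
library_a = [[0.7, 0.2, 0.1, 0.0], [0.6, 0.3, 0.1, 0.0]]
library_b = [[0.0, 0.1, 0.3, 0.6], [0.1, 0.1, 0.2, 0.6]]

p = node_distribution(library_a)
q = node_distribution(library_b)
overlap_ab = bhattacharyya_coefficient(p, q)  # partial overlap between libraries
overlap_aa = bhattacharyya_coefficient(p, p)  # self-overlap, equals 1
```

Because the coefficient is symmetric and bounded between 0 and 1, it can be tabulated for every pair of libraries to give a global comparison in a single matrix.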
Affiliation(s)
- N Kireeva
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France; Institute of Physical Chemistry and Electrochemistry RAS, Leninsky pr-t 31, 119991 Moscow, Russian Federation
- I I Baskin
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France; Department of Chemistry, Lomonosov Moscow State University, 119991, Moscow, Russian Federation
- H A Gaspar
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- D Horvath
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- G Marcou
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
- A Varnek
- Laboratoire d'Infochimie, UMR 7177 CNRS, Université de Strasbourg, 4 rue B. Pascal, Strasbourg 67000, France
|
43
|
Lounkine E, Kutchukian P, Petrone P, Davies JW, Glick M. Chemotography for multi-target SAR analysis in the context of biological pathways. Bioorg Med Chem 2012; 20:5416-27. [PMID: 22405595 DOI: 10.1016/j.bmc.2012.02.034] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 01/05/2012] [Revised: 02/08/2012] [Accepted: 02/11/2012] [Indexed: 10/28/2022]
Abstract
The increasing amount of chemogenomics data, that is, activity measurements of many compounds across a variety of biological targets, allows for a better understanding of pharmacology in a broad biological context. Rather than assessing activity at individual biological targets, researchers today often use phenotypic screens to understand how compounds interact with complex biological systems and molecular pathways. This perspective poses novel challenges to structure-activity relationship (SAR) assessment. Today, the bottleneck of drug discovery lies in understanding the SAR of rich datasets that go beyond single targets, in the context of biological pathways, potential off-targets, and complex selectivity profiles. To aid in the understanding and interpretation of such complex SAR, we introduce Chemotography (chemotype chromatography), which encodes chemical space using a color spectrum by combining clustering and multidimensional scaling. Rich biological data in our approach were visualized using spatial dimensions traditionally reserved for chemical space. This allowed us to analyze SAR in the context of target hierarchies and phylogenetic trees, two-target activity scatter plots, and biological pathways. Chemotography, in combination with the Kyoto Encyclopedia of Genes and Genomes (KEGG), also allowed us to extract pathway-relevant SAR from the ChEMBL database. We identified chemotypes showing polypharmacology and selectivity-conferring scaffolds, even in cases where individual compounds have not been tested against all relevant targets. In addition, we analyzed SAR in ChEMBL across the entire kinome, going beyond individual compounds. Our method combines the strengths of chemical space visualization for SAR analysis and graphical representation of complex biological data.
Chemotography is a new paradigm for chemogenomic data visualization and its versatile applications presented here may allow for improved assessment of SAR in biological context, such as phenotypic assay hit lists.
Affiliation(s)
- Eugen Lounkine
- Lead Discovery Informatics, Novartis Institutes for Biomedical Research, 250 Massachusetts Ave., Cambridge, MA 02139, USA.
|
44
|
Colliandre L, Le Guilloux V, Bourg S, Morin-Allory L. Visual characterization and diversity quantification of chemical libraries: 2. Analysis and selection of size-independent, subspace-specific diversity indices. J Chem Inf Model 2012; 52:327-42. [PMID: 22181665 DOI: 10.1021/ci200535y] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Indexed: 11/29/2022]
Abstract
High Throughput Screening (HTS) is a standard technique widely used to find hit compounds in drug discovery projects. The high costs associated with such experiments have highlighted the need to carefully design screening libraries in order to avoid wasting resources. Molecular diversity is an established concept that has been used to this end for many years. In this article, a new approach to quantify the molecular diversity of screening libraries is presented. The approach is based on the Delimited Reference Chemical Subspace (DRCS) methodology, a new method that can be used to delimit the densest subspace spanned by a reference library in a reduced 2D continuous space. A total of 22 diversity indices were implemented or adapted to this methodology, which is used here to remove outliers and obtain a relevant cell-based partition of the subspace. The behavior of these indices was assessed and compared in various extreme situations and with respect to a set of theoretical rules that a diversity function should satisfy when libraries of different sizes have to be compared. Some gold standard indices are found inappropriate in such a context, while none of the tested indices behave perfectly in all cases. Five DRCS-based indices accounting for different aspects of diversity were finally selected, and a simple framework is proposed to use them effectively. Various libraries have been profiled with respect to more specific subspaces, which further illustrate the interest of the method.
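To make the idea of a cell-based diversity index concrete, the sketch below partitions a 2D projection into a regular grid and computes two generic measures: the fraction of occupied cells and a normalized Shannon entropy. These are common textbook indices chosen for illustration. They are not claimed to be among the five DRCS-based indices the authors selected, and the grid, function names and sample points are assumptions.

```python
import math
from collections import Counter


def cell_counts(points, n_bins, lo=0.0, hi=1.0):
    """Count points per grid cell; assumes all coordinates lie in [lo, hi]."""
    width = (hi - lo) / n_bins
    counts = Counter()
    for x, y in points:
        # Clamp so that a coordinate equal to `hi` falls in the last cell.
        i = min(int((x - lo) / width), n_bins - 1)
        j = min(int((y - lo) / width), n_bins - 1)
        counts[(i, j)] += 1
    return counts


def coverage(counts, n_bins):
    """Fraction of grid cells that contain at least one compound."""
    return len(counts) / (n_bins * n_bins)


def normalized_entropy(counts, n_bins):
    """Shannon entropy of cell occupancy, scaled to [0, 1] (needs n_bins > 1)."""
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return entropy / math.log(n_bins * n_bins)


# Hypothetical 2D coordinates of four compounds on a 2 x 2 grid.
library = [(0.1, 0.1), (0.1, 0.1), (0.9, 0.9), (0.6, 0.2)]
counts = cell_counts(library, n_bins=2)
library_coverage = coverage(counts, n_bins=2)
library_entropy = normalized_entropy(counts, n_bins=2)
```

Coverage depends strongly on library size, which is one reason the abstract stresses size-independent indices when libraries of different sizes are compared.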
Affiliation(s)
- Lionel Colliandre
- Institut de Chimie Organique et Analytique (ICOA), Université d'Orléans-CNRS, UMR 7311 B.P. 6759 Rue de Chartres, 45067 Orléans Cedex 2, France
|
45
|
Yoo J, Medina-Franco JL. Chemoinformatic Approaches for Inhibitors of DNA Methyltransferases: Comprehensive Characterization of Screening Libraries. Computational Molecular Bioscience 2011. [DOI: 10.4236/cmb.2011.11002] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/20/2022]
|