1
Thomas JR, Shelton C, Murphy J, Brittain S, Bray MA, Aspesi P, Concannon J, King FJ, Ihry RJ, Ho DJ, Henault M, Hadjikyriacou A, Neri M, Sigoillot FD, Pham HT, Shum M, Barys L, Jones MD, Martin EJ, Blechschmidt A, Rieffel S, Troxler TJ, Mapa FA, Jenkins JL, Jain RK, Kutchukian PS, Schirle M, Renner S. Enhancing the Small-Scale Screenable Biological Space beyond Known Chemogenomics Libraries with Gray Chemical Matter─Compounds with Novel Mechanisms from High-Throughput Screening Profiles. ACS Chem Biol 2024; 19:938-952. [PMID: 38565185 PMCID: PMC11040606 DOI: 10.1021/acschembio.3c00737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/28/2024] [Accepted: 03/01/2024] [Indexed: 04/04/2024]
Abstract
Phenotypic assays have become an established approach to drug discovery. Greater disease relevance is often achieved through cellular models with increased complexity and more detailed readouts, such as gene expression or advanced imaging. However, the intricate nature and cost of these assays impose limitations on their screening capacity, often restricting screens to well-characterized small compound sets such as chemogenomics libraries. Here, we outline a cheminformatics approach to identify a small set of compounds with likely novel mechanisms of action (MoAs), expanding the MoA search space for throughput limited phenotypic assays. Our approach is based on mining existing large-scale, phenotypic high-throughput screening (HTS) data. It enables the identification of chemotypes that exhibit selectivity across multiple cell-based assays, which are characterized by persistent and broad structure activity relationships (SAR). We validate the effectiveness of our approach in broad cellular profiling assays (Cell Painting, DRUG-seq, and Promotor Signature Profiling) and chemical proteomics experiments. These experiments revealed that the compounds behave similarly to known chemogenetic libraries, but with a notable bias toward novel protein targets. To foster collaboration and advance research in this area, we have curated a public set of such compounds based on the PubChem BioAssay dataset and made it available for use by the scientific community.
Affiliation(s)
- Jason R. Thomas
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Claude Shelton
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Jason Murphy
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Scott Brittain
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Mark-Anthony Bray
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Peter Aspesi
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- John Concannon
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Frederick J. King
  - Novartis Biomedical Research, San Diego, California 92121, United States
- Robert J. Ihry
  - Novartis Biomedical Research, San Diego, California 92121, United States
- Daniel J. Ho
  - Novartis Biomedical Research, San Diego, California 92121, United States
- Martin Henault
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Marilisa Neri
  - Novartis Biomedical Research, Basel 4056, Switzerland
- Helen T. Pham
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Matthew Shum
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Louise Barys
  - Novartis Biomedical Research, Basel 4056, Switzerland
- Michael D. Jones
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Eric J. Martin
  - Novartis Biomedical Research, Emeryville, California 94608, United States
- Felipa A. Mapa
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Jeremy L. Jenkins
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Rishi K. Jain
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
- Markus Schirle
  - Novartis Biomedical Research, Cambridge, Massachusetts 02139, United States
2
Melancon K, Pliushcheuskaya P, Meiler J, Künze G. Targeting ion channels with ultra-large library screening for hit discovery. Front Mol Neurosci 2024; 16:1336004. [PMID: 38249296 PMCID: PMC10796734 DOI: 10.3389/fnmol.2023.1336004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Accepted: 12/05/2023] [Indexed: 01/23/2024] Open
Abstract
Ion channels play a crucial role in a variety of physiological and pathological processes, making them attractive targets for drug development in diseases such as diabetes, epilepsy, hypertension, cancer, and chronic pain. Despite the importance of ion channels in drug discovery, the vastness of chemical space and the complexity of ion channels pose significant challenges for identifying drug candidates. The use of in silico methods in drug discovery has dramatically reduced the time and cost of drug development and has the potential to revolutionize the field of medicine. Recent advances in computer hardware and software have enabled the screening of ultra-large compound libraries. Integration of different methods at various scales and dimensions is becoming an inevitable trend in drug development. In this review, we provide an overview of current state-of-the-art computational chemistry methodologies for ultra-large compound library screening and their application to ion channel drug discovery research. We discuss the advantages and limitations of various in silico techniques, including virtual screening, molecular mechanics/dynamics simulations, and machine learning-based approaches. We also highlight several successful applications of computational chemistry methodologies in ion channel drug discovery and provide insights into future directions and challenges in this field.
Affiliation(s)
- Kortney Melancon
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, United States
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, United States
  - Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
- Georg Künze
  - Medical Faculty, Institute for Drug Discovery, Leipzig University, Leipzig, Germany
  - Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Germany
  - Interdisciplinary Center for Bioinformatics, Leipzig University, Leipzig, Germany
3
Hanser T. Federated learning for molecular discovery. Curr Opin Struct Biol 2023; 79:102545. [PMID: 36804704 DOI: 10.1016/j.sbi.2023.102545] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 09/22/2022] [Revised: 01/06/2023] [Accepted: 01/13/2023] [Indexed: 02/18/2023]
Abstract
Federated Learning enables machine learning across multiple sources of data and alleviates the risk of leaking private information between partners thereby encouraging knowledge sharing and collaborative modelling. Hence, Federated Learning opens the ways to a new generation of improved models. Domains involving molecular informatics, like Drug Discovery, are progressively adopting Federated Learning; this review describes the main projects and applications of Federated Learning for molecular discovery with a special focus on their benefits and the remaining challenges. All the studies demonstrate a real benefit of Federated Learning, namely the improvement of the performance of models as well as their applicability domain thanks to knowledge aggregation. The selected publications also reveal several remaining challenges to be addressed to fully exploit Federated Learning.
Affiliation(s)
- Thierry Hanser
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PS, United Kingdom
4
MORTAR: a rich client application for in silico molecule fragmentation. J Cheminform 2023; 15:1. [PMID: 36593523 PMCID: PMC9809053 DOI: 10.1186/s13321-022-00674-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Accepted: 12/17/2022] [Indexed: 01/03/2023] Open
Abstract
Developing and implementing computational algorithms for the extraction of specific substructures from molecular graphs (in silico molecule fragmentation) is an iterative process. It involves repeated sequences of implementing a rule set, applying it to relevant structural data, checking the results, and adjusting the rules. This requires a computational workflow with data import, fragmentation algorithm integration, and result visualisation. The described workflow is normally unavailable for a new algorithm and must be set up individually. This work presents an open Java rich client Graphical User Interface (GUI) application to support the development of new in silico molecule fragmentation algorithms and make them readily available upon release. The MORTAR (MOlecule fRagmenTAtion fRamework) application visualises fragmentation results of a set of molecules in various ways and provides basic analysis features. Fragmentation algorithms can be integrated and developed within MORTAR by using a specific wrapper class. In addition, fragmentation pipelines with any combination of the available fragmentation methods can be executed. Upon release, three fragmentation algorithms are already integrated: ErtlFunctionalGroupsFinder, Sugar Removal Utility, and Scaffold Generator. These algorithms, as well as all cheminformatics functionalities in MORTAR, are implemented based on the Chemistry Development Kit (CDK).
5
Dai X, Xu Y, Qiu H, Qian X, Lin M, Luo L, Zhao Y, Huang D, Zhang Y, Chen Y, Liu H, Jiang Y. KID: A Kinase-Focused Interaction Database and Its Application in the Construction of Kinase-Focused Molecule Databases. J Chem Inf Model 2022; 62:6022-6034. [PMID: 36447388 DOI: 10.1021/acs.jcim.2c00908] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 12/02/2022]
Abstract
Protein kinases are important drug targets for the treatment of several diseases. The interaction between kinases and ligands is vital in the process of small-molecule kinase inhibitor (SMKI) design. In this study, we propose a method to extract fragments and amino acid residues from crystal structures for kinase-ligand interactions. In addition, core fragments that interact with the important hinge region of kinases were extracted along with their decorations. Based on the superimposed structural data of kinases from the kinase-ligand interaction fingerprint and structure database, we obtained two libraries, namely, a hinge-unfocused fragment-amino acid pair library (FAP Lib) that contains 6672 pairs of fragments and corresponding amino-acids, and a hinge-focused hinge binder library (HB Lib) of 3560 pairs of hinge-binding scaffolds with their corresponding decorations. These two libraries constitute a kinase-focused interaction database (KID). In depth analysis was conducted on KID to explore important characteristics of fragments in the design of SMKIs. With KID, we built two kinase-focused molecule databases, one called Recomb_DB, which contains 1,72,346 molecules generated through fragment recombination based on the FAP Lib, and another called RsdHB_DB, which contains 93,030 molecules generated based on our HB Lib using molecular generation methods. Compared with five databases both commercial and non-commercial, these two databases both ranked top 3 in scaffold diversity, top 4 in molecule fingerprint diversity, and are more focused on the chemical space of kinase inhibitors. Hence, KID presents a useful addition to existing databases for the exploration of novel SMKIs.
Affiliation(s)
- Xiaowen Dai
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuan Xu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haodi Qiu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xu Qian
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Mingde Lin
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lin Luo
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yang Zhao
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Dingfang Huang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yadong Chen
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yulei Jiang
  - Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
6
Gider V, Budak C. Instruction of molecular structure similarity and scaffolds of drugs under investigation in ebola virus treatment by atom-pair and graph network: A combination of favipiravir and molnupiravir. Comput Biol Chem 2022; 101:107778. [DOI: 10.1016/j.compbiolchem.2022.107778] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 11/26/2022]
7
Schaub J, Zander J, Zielesny A, Steinbeck C. Scaffold Generator: a Java library implementing molecular scaffold functionalities in the Chemistry Development Kit (CDK). J Cheminform 2022; 14:79. [PMID: 36357931 PMCID: PMC9650898 DOI: 10.1186/s13321-022-00656-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/24/2022] [Accepted: 10/30/2022] [Indexed: 11/12/2022] Open
Abstract
The concept of molecular scaffolds as defining core structures of organic molecules is utilised in many areas of chemistry and cheminformatics, e.g. drug design, chemical classification, or the analysis of high-throughput screening data. Here, we present Scaffold Generator, a comprehensive open library for the generation, handling, and display of molecular scaffolds, scaffold trees and networks. The new library is based on the Chemistry Development Kit (CDK) and highly customisable through multiple settings, e.g. five different structural framework definitions are available. For display of scaffold hierarchies, the open GraphStream Java library is utilised. Performance snapshots with natural products (NP) from the COCONUT (COlleCtion of Open Natural prodUcTs) database and drug molecules from DrugBank are reported. The generation of a scaffold network from more than 450,000 NP can be achieved within a single day.
Affiliation(s)
- Jonas Schaub
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessing Strasse 8, 07743 Jena, Germany
- Julian Zander
  - Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665 Recklinghausen, Germany
- Achim Zielesny
  - Institute for Bioinformatics and Chemoinformatics, Westphalian University of Applied Sciences, August-Schmidt-Ring 10, 45665 Recklinghausen, Germany
- Christoph Steinbeck
  - Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Lessing Strasse 8, 07743 Jena, Germany
8
Tang B, He F, Liu D, He F, Wu T, Fang M, Niu Z, Wu Z, Xu D. AI-Aided Design of Novel Targeted Covalent Inhibitors against SARS-CoV-2. Biomolecules 2022. [PMID: 35740872 DOI: 10.1101/2020.03.03.972133v1.full] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 04/26/2023] Open
Abstract
The drug repurposing of known approved drugs (e.g., lopinavir/ritonavir) has failed to treat SARS-CoV-2-infected patients. Therefore, it is important to generate new chemical entities against this virus. As a critical enzyme in the lifecycle of the coronavirus, the 3C-like main protease (3CLpro or Mpro) is the most attractive target for antiviral drug design. Based on a recently solved structure (PDB ID: 6LU7), we developed a novel advanced deep Q-learning network with a fragment-based drug design (ADQN-FBDD) for generating potential lead compounds targeting SARS-CoV-2 3CLpro. We obtained a series of derivatives from the lead compounds based on our structure-based optimization policy (SBOP). All of the 47 lead compounds obtained directly with our AI model and related derivatives based on the SBOP are accessible in our molecular library. These compounds can be used as potential candidates by researchers to develop drugs against SARS-CoV-2.
Affiliation(s)
- Bowen Tang
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
  - MindRank AI Ltd., Hangzhou 310000, China
- Fengming He
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dongpeng Liu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Fei He
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Tong Wu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100006, China
- Meijuan Fang
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Zhen Wu
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
9
Tang B, He F, Liu D, He F, Wu T, Fang M, Niu Z, Wu Z, Xu D. AI-Aided Design of Novel Targeted Covalent Inhibitors against SARS-CoV-2. Biomolecules 2022; 12:746. [PMID: 35740872 PMCID: PMC9220321 DOI: 10.3390/biom12060746] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/14/2022] [Revised: 05/17/2022] [Accepted: 05/20/2022] [Indexed: 02/04/2023] Open
Abstract
The drug repurposing of known approved drugs (e.g., lopinavir/ritonavir) has failed to treat SARS-CoV-2-infected patients. Therefore, it is important to generate new chemical entities against this virus. As a critical enzyme in the lifecycle of the coronavirus, the 3C-like main protease (3CLpro or Mpro) is the most attractive target for antiviral drug design. Based on a recently solved structure (PDB ID: 6LU7), we developed a novel advanced deep Q-learning network with a fragment-based drug design (ADQN-FBDD) for generating potential lead compounds targeting SARS-CoV-2 3CLpro. We obtained a series of derivatives from the lead compounds based on our structure-based optimization policy (SBOP). All of the 47 lead compounds obtained directly with our AI model and related derivatives based on the SBOP are accessible in our molecular library. These compounds can be used as potential candidates by researchers to develop drugs against SARS-CoV-2.
Affiliation(s)
- Bowen Tang
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
  - MindRank AI Ltd., Hangzhou 310000, China
- Fengming He
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dongpeng Liu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
- Fei He
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Tong Wu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
  - Department of Epidemiology and Statistics, Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences, School of Basic Medicine, Peking Union Medical College, Beijing 100006, China
- Meijuan Fang
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Zhen Wu
  - Fujian Provincial Key Laboratory of Innovative Drug Target Research, School of Pharmaceutical Sciences, Xiamen University, Xiamen 361000, China
- Dong Xu
  - Department of Electrical Engineering and Computer Science, Informatics Institute, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
10
Simm J, Humbeck L, Zalewski A, Sturm N, Heyndrickx W, Moreau Y, Beck B, Schuffenhauer A. Splitting chemical structure data sets for federated privacy-preserving machine learning. J Cheminform 2021; 13:96. [PMID: 34876230 PMCID: PMC8650276 DOI: 10.1186/s13321-021-00576-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 07/28/2021] [Accepted: 11/22/2021] [Indexed: 11/10/2022] Open
Abstract
With the increase in applications of machine learning methods in drug design and related fields, the challenge of designing sound test sets becomes more and more prominent. The goal of this challenge is to have a realistic split of chemical structures (compounds) between training, validation and test set such that the performance on the test set is meaningful to infer the performance in a prospective application. This challenge is by its own very interesting and relevant, but is even more complex in a federated machine learning approach where multiple partners jointly train a model under privacy-preserving conditions where chemical structures must not be shared between the different participating parties. In this work we discuss three methods which provide a splitting of a data set and are applicable in a federated privacy-preserving setting, namely: a. locality-sensitive hashing (LSH), b. sphere exclusion clustering, c. scaffold-based binning (scaffold network). For evaluation of these splitting methods we consider the following quality criteria (compared to random splitting): bias in prediction performance, classification label and data imbalance, similarity distance between the test and training set compounds. The main findings of the paper are a. both sphere exclusion clustering and scaffold-based binning result in high quality splitting of the data sets, b. in terms of compute costs sphere exclusion clustering is very expensive in the case of federated privacy-preserving setting.
Affiliation(s)
- Jaak Simm
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, 3001, Heverlee, Belgium
- Lina Humbeck
  - Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Adam Zalewski
  - Amgen Research (Munich) GmbH, Staffelseestraße 2, 81477, Munich, Germany
- Noe Sturm
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002, Basel, Switzerland
- Wouter Heyndrickx
  - Janssen Pharmaceutica N.V., Janssen Pharmaceutica, Turnhoutseweg 30, 2340, Beerse, Belgium
- Yves Moreau
  - KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, 3001, Heverlee, Belgium
- Bernd Beck
  - Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, 88397, Biberach an der Riss, Germany
- Ansgar Schuffenhauer
  - Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002, Basel, Switzerland
11
Naveja JJ, Vogt M. Automatic Identification of Analogue Series from Large Compound Data Sets: Methods and Applications. Molecules 2021; 26:5291. [PMID: 34500724 PMCID: PMC8433811 DOI: 10.3390/molecules26175291] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/06/2021] [Revised: 08/27/2021] [Accepted: 08/28/2021] [Indexed: 01/21/2023] Open
Abstract
Analogue series play a key role in drug discovery. They arise naturally in lead optimization efforts where analogues are explored based on one or a few core structures. However, it is much harder to accurately identify and extract pairs or series of analogue molecules in large compound databases with no predefined core structures. This methodological review outlines the most common and recent methodological developments to automatically identify analogue series in large libraries. Initial approaches focused on using predefined rules to extract scaffold structures, such as the popular Bemis-Murcko scaffold. Later on, the matched molecular pair concept led to efficient algorithms to identify similar compounds sharing a common core structure by exploring many putative scaffolds for each compound. Further developments of these ideas yielded, on the one hand, approaches for hierarchical scaffold decomposition and, on the other hand, algorithms for the extraction of analogue series based on single-site modifications (so-called matched molecular series) by exploring potential scaffold structures based on systematic molecule fragmentation. Eventually, further development of these approaches resulted in methods for extracting analogue series defined by a single core structure with several substitution sites that allow convenient representations, such as R-group tables. These methods enable the efficient analysis of large data sets with hundreds of thousands or even millions of compounds and have spawned many related methodological developments.
Affiliation(s)
- José J. Naveja
  - Instituto de Química, Universidad Nacional Autónoma de México, Mexico City 04510, Mexico
- Martin Vogt
  - Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5-6, 53115 Bonn, Germany
12
Manelfi C, Gemei M, Talarico C, Cerchia C, Fava A, Lunghini F, Beccari AR. "Molecular Anatomy": a new multi-dimensional hierarchical scaffold analysis tool. J Cheminform 2021; 13:54. [PMID: 34301327 PMCID: PMC8299179 DOI: 10.1186/s13321-021-00526-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 12/30/2020] [Accepted: 06/13/2021] [Indexed: 11/10/2022] Open
Abstract
The scaffold representation is widely employed to classify bioactive compounds on the basis of common core structures or correlate compound classes with specific biological activities. In this paper, we present a novel approach called "Molecular Anatomy" as a flexible and unbiased molecular scaffold-based metrics to cluster large set of compounds. We introduce a set of nine molecular representations at different abstraction levels, combined with fragmentation rules, to define a multi-dimensional network of hierarchically interconnected molecular frameworks. We demonstrate that the introduction of a flexible scaffold definition and multiple pruning rules is an effective method to identify relevant chemical moieties. This approach allows to cluster together active molecules belonging to different molecular classes, capturing most of the structure activity information, in particular when libraries containing a huge number of singletons are analyzed. We also propose a procedure to derive a network visualization that allows a full graphical representation of compounds dataset, permitting an efficient navigation in the scaffold's space and significantly contributing to perform high quality SAR analysis. The protocol is freely available as a web interface at https://ma.exscalate.eu .
Affiliation(s)
- Candida Manelfi
- Dompé Farmaceutici SpA, Via Campo di Pile, 67100, L'Aquila, Italy
| | - Marica Gemei
- Dompé Farmaceutici SpA, Via Campo di Pile, 67100, L'Aquila, Italy
| | - Carmine Talarico
- Dompé Farmaceutici SpA, Via Campo di Pile, 67100, L'Aquila, Italy
| | - Carmen Cerchia
- Department of Pharmacy, University of Naples "Federico II", 80131, Napoli, Italy
| | - Anna Fava
- Dompé Farmaceutici SpA, Via Campo di Pile, 67100, L'Aquila, Italy
| | - Filippo Lunghini
- Dompé Farmaceutici SpA, Via Campo di Pile, 67100, L'Aquila, Italy
| | | |
Collapse
|
13
|
Balachandra C, Padhi D, Govindaraju T. Cyclic Dipeptide: A Privileged Molecular Scaffold to Derive Structural Diversity and Functional Utility. ChemMedChem 2021; 16:2558-2587. [PMID: 33938157 DOI: 10.1002/cmdc.202100149] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Received: 03/01/2021] [Indexed: 12/11/2022]
Abstract
Cyclic dipeptides (CDPs) are the simplest form of cyclic peptides with a wide range of applications from therapeutics to biomaterials. CDP is a versatile molecular platform endowed with unique properties such as conformational rigidity, intermolecular interactions, structural diversification through chemical synthesis, bioavailability and biocompatibility. A variety of natural products with the CDP core exhibit anticancer, antifungal, antibacterial, and antiviral activities. The inherent bioactivities have inspired the development of synthetic analogues as drug candidates and drug delivery systems. CDP plays a crucial role as conformation and molecular assembly directing core in the design of molecular receptors, peptidomimetics and fabrication of functional material architectures. In recent years, CDP has rapidly become a privileged scaffold for the design of advanced drug candidates, drug delivery agents, bioimaging, and biomaterials to mitigate numerous disease conditions. This review describes the structural diversification and multifarious biomedical applications of the CDP scaffold, discusses challenges, and provides future directions for the emerging field.
Affiliation(s)
- Chenikkayala Balachandra
- Bioorganic Chemistry Laboratory, New Chemistry Unit and School of Advanced Materials (SAMat), Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Jakkur P.O., Bangalore, 560064, India
- Dikshaa Padhi
- Bioorganic Chemistry Laboratory, New Chemistry Unit and School of Advanced Materials (SAMat), Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Jakkur P.O., Bangalore, 560064, India
- Thimmaiah Govindaraju
- Bioorganic Chemistry Laboratory, New Chemistry Unit and School of Advanced Materials (SAMat), Jawaharlal Nehru Centre for Advanced Scientific Research (JNCASR), Jakkur P.O., Bangalore, 560064, India
|
14
|
Yoshimori A, Hu H, Bajorath J. Adapting the DeepSARM approach for dual-target ligand design. J Comput Aided Mol Des 2021; 35:587-600. [PMID: 33712972 PMCID: PMC8131309 DOI: 10.1007/s10822-021-00379-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/31/2021] [Accepted: 02/24/2021] [Indexed: 11/29/2022]
Abstract
The structure–activity relationship (SAR) matrix (SARM) methodology and data structure was originally developed to extract structurally related compound series from data sets of any composition, organize these series in matrices reminiscent of R-group tables, and visualize SAR patterns. The SARM approach combines the identification of structural relationships between series of active compounds with analog design, which is facilitated by systematically exploring combinations of core structures and substituents that have not been synthesized. The SARM methodology was extended through the introduction of DeepSARM, which added deep learning and generative modeling to target-based analog design by taking compound information from related targets into account to further increase structural novelty. Herein, we present the foundations of the SARM methodology and discuss how DeepSARM modeling can be adapted for the design of compounds with dual-target activity. Generating dual-target compounds represents an equally attractive and challenging task for polypharmacology-oriented drug discovery. The DeepSARM-based approach is illustrated using a computational proof-of-concept application focusing on the design of candidate inhibitors for two prominent anti-cancer targets.
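The matrix idea described in this abstract can be illustrated with a toy sketch. It assumes each compound has already been decomposed into a (core, substituent) pair; the decomposition step is not shown, and the function name and data below are hypothetical. Under that assumption, the unexplored combinations are simply the empty cells of the core-by-substituent matrix.

```python
from itertools import product


def virtual_candidates(observed):
    """Return (core, substituent) pairs absent from the observed set.

    `observed` is an iterable of (core, substituent) tuples. The result
    lists the empty cells of the core x substituent matrix in sorted order.
    """
    observed = set(observed)
    cores = sorted({core for core, _ in observed})
    substituents = sorted({sub for _, sub in observed})
    # Every combination of a known core and a known substituent that has
    # not been observed is a candidate "virtual" analog.
    return [pair for pair in product(cores, substituents) if pair not in observed]


# Hypothetical toy data: labels only, not real structures.
known = [("coreA", "F"), ("coreA", "OMe"), ("coreB", "F"), ("coreC", "Cl")]
candidates = virtual_candidates(known)
```

In this toy case the matrix has three cores and three substituents, so five of the nine cells are unoccupied and are returned as candidates. The generative DeepSARM extension mentioned in the abstract is outside the scope of this sketch.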
Affiliation(s)
- Atsushi Yoshimori
- Institute for Theoretical Medicine, Inc., 26-1 Muraoka-Higashi 2-chome, Fujisawa, Kanagawa, 251-0012, Japan
- Huabin Hu
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, 53115, Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, 53115, Bonn, Germany
|
15
|
Lai J, Li X, Wang Y, Yin S, Zhou J, Liu Z. AIScaffold: A Web-Based Tool for Scaffold Diversification Using Deep Learning. J Chem Inf Model 2020; 61:1-6. [PMID: 33356237 DOI: 10.1021/acs.jcim.0c00867] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
Abstract
Molecular scaffolds are widely used in drug design. Many methods and tools have been developed to utilize the information in scaffolds. Scaffold diversification is frequently used by medicinal chemists in tasks such as lead compound optimization, but tools for scaffold diversification are still lacking. Here, we propose AIScaffold (https://iaidrug.stonewise.cn), a web-based tool for scaffold diversification using the deep generative model. This tool can perform large-scale (up to 500,000 molecules) diversification in several minutes and recommend the top 500 (top 0.1%) molecules. Features such as site-specific diversification are also supported. This tool can facilitate the scaffold diversification process for medicinal chemists, thereby accelerating drug design.
Affiliation(s)
- Junyong Lai
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Xiangbin Li
- Stonewise, No. 19 Zhongguancun Street, Haidian District, 100080 Beijing, P. R. China
- Yanxing Wang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
- Shiqiu Yin
- Stonewise, No. 19 Zhongguancun Street, Haidian District, 100080 Beijing, P. R. China
- Jielong Zhou
- Stonewise, No. 19 Zhongguancun Street, Haidian District, 100080 Beijing, P. R. China
- Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, 100191 Beijing, P. R. China
|
16
|
Scott OB, Edith Chan AW. ScaffoldGraph: an open-source library for the generation and analysis of molecular scaffold networks and scaffold trees. Bioinformatics 2020; 36:3930-3931. [PMID: 32232438 DOI: 10.1093/bioinformatics/btaa219] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 12/10/2019] [Revised: 03/17/2020] [Accepted: 03/25/2020] [Indexed: 11/13/2022] Open
Abstract
Summary: ScaffoldGraph (SG) is an open-source Python library and command-line tool for the generation and analysis of molecular scaffold networks and trees, with the capability of processing large sets of input molecules. With the increase in high-throughput screening data, scaffold graphs have proven useful for the navigation and analysis of chemical space, being used for visualization, clustering, scaffold-diversity analysis and active-series identification. Built on RDKit and NetworkX, SG integrates scaffold graph analysis into the growing scientific/cheminformatics Python stack, increasing the flexibility and extendibility of the tool compared to existing software. Availability and implementation: SG is freely available and released under the MIT licence at https://github.com/UCLCheminformatics/ScaffoldGraph.
Affiliation(s)
- Oliver B Scott
- Wolfson Institute of Biomedical Research, University College London, London WC1E 6BT, UK
- A W Edith Chan
- Wolfson Institute of Biomedical Research, University College London, London WC1E 6BT, UK
|
17
|
Schuffenhauer A, Schneider N, Hintermann S, Auld D, Blank J, Cotesta S, Engeloch C, Fechner N, Gaul C, Giovannoni J, Jansen J, Joslin J, Krastel P, Lounkine E, Manchester J, Monovich LG, Pelliccioli AP, Schwarze M, Shultz MD, Stiefl N, Baeschlin DK. Evolution of Novartis' Small Molecule Screening Deck Design. J Med Chem 2020; 63:14425-14447. [PMID: 33140646 DOI: 10.1021/acs.jmedchem.0c01332] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/22/2022]
Abstract
This article summarizes the evolution of the screening deck at the Novartis Institutes for BioMedical Research (NIBR). Historically, the screening deck was an assembly of all available compounds. In 2015, we designed a first deck to facilitate access to diverse subsets with optimized properties. We allocated the compounds as plated subsets on a 2D grid with property based ranking in one dimension and increasing structural redundancy in the other. The learnings from the 2015 screening deck were applied to the design of a next generation in 2019. We found that using traditional leadlikeness criteria (mainly MW, clogP) reduces the hit rates of attractive chemical starting points in subset screening. Consequently, the 2019 deck relies on solubility and permeability to select preferred compounds. The 2019 design also uses NIBR's experimental assay data and inferred biological activity profiles in addition to structural diversity to define redundancy across the compound sets.
Affiliation(s)
- Ansgar Schuffenhauer
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Nadine Schneider
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Samuel Hintermann
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Douglas Auld
- Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Jutta Blank
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Simona Cotesta
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Caroline Engeloch
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Nikolas Fechner
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Christoph Gaul
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Jerome Giovannoni
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Johanna Jansen
- Novartis Institutes for BioMedical Research-Emeryville, 5300 Chiron Way, Emeryville, California 94608-2916, United States
- John Joslin
- Genomics Institute of the Novartis Foundation, San Diego, California 92121, United States
- Philipp Krastel
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Eugen Lounkine
- Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- John Manchester
- Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Lauren G Monovich
- Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Anna Paola Pelliccioli
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Manuel Schwarze
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Michael D Shultz
- Novartis Institutes for BioMedical Research Inc., 181 Massachusetts Avenue, Cambridge, Massachusetts 02139, United States
- Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Daniel K Baeschlin
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4002 Basel, Switzerland
|
18
|
Kruger F, Stiefl N, Landrum GA. rdScaffoldNetwork: The Scaffold Network Implementation in RDKit. J Chem Inf Model 2020; 60:3331-3335. [PMID: 32584031 DOI: 10.1021/acs.jcim.0c00296] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 01/03/2023]
Abstract
We present an implementation of the scaffold network in the open source cheminformatics toolkit RDKit. Scaffold networks have been introduced in the literature as a powerful method to navigate and analyze large screening data sets in medicinal chemistry. Such a network can be created by iteratively applying predefined fragmentation rules to the investigated set of small molecules and by linking the produced fragments according to their descendence. This procedure results in a network graph, where the nodes correspond to the fragments and the edges correspond to the operations producing one fragment from another. In extension to the scaffold network implementations suggested in the literature, the presented implementation in RDKit allows an enhanced flexibility in terms of customizing the fragmentation rules and enables the inclusion of atom- and bond-generic scaffolds into the network. The output, providing node and edge information on the network, enables a simple and elegant navigation through the network, laying the basis to organize and better understand the data set being investigated.
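The iterative construction described in this abstract can be sketched in a few lines. The sketch below is a deliberately simplified toy, not the RDKit implementation: a scaffold is modeled as a set of ring labels, and the only "fragmentation rule" is removing one ring at a time, ignoring connectivity and real bond chemistry. The function name and input data are hypothetical.

```python
def build_scaffold_network(molecules):
    """Build nodes and parent -> child edges by iterative ring removal.

    Each molecule is an iterable of ring labels (a toy stand-in for a
    scaffold). Nodes are frozensets of ring labels; an edge records that
    the child was produced from the parent by removing one ring.
    """
    nodes, edges = set(), set()
    stack = [frozenset(m) for m in molecules]
    while stack:
        parent = stack.pop()
        if parent in nodes:
            continue  # this fragment was already expanded
        nodes.add(parent)
        if len(parent) == 1:
            continue  # single rings are not fragmented further
        for ring in parent:
            child = parent - {ring}
            edges.add((parent, child))
            stack.append(child)
    return nodes, edges


# Hypothetical input: two molecules described only by their ring systems.
nodes, edges = build_scaffold_network(
    [{"benzene", "pyridine", "furan"}, {"benzene", "pyridine"}]
)
```

Because the second molecule's scaffold is also a fragment of the first, both molecules map onto the same node, which is the property that makes such networks useful for grouping screening data.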
Affiliation(s)
- Franziska Kruger
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
|
19
|
Kruger F, Fechner N, Stiefl N. Automated Identification of Chemical Series: Classifying like a Medicinal Chemist. J Chem Inf Model 2020; 60:2888-2902. [PMID: 32374165 DOI: 10.1021/acs.jcim.0c00204] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/29/2022]
Abstract
We investigate different automated approaches for the classification of chemical series in early drug discovery, with the aim of closely mimicking human chemical series conception. Chemical series, which are commonly defined by hand-drawn scaffolds, organize datasets in drug discovery projects. Often, they form the basis for further project decisions. To trace and evaluate these decisions in historic and ongoing projects, it is important to know or reconstruct chemical series. There is not a unique correct definition of chemical series, and the human definition certainly involves a subjective bias. Hence, we first develop quality metrics for the chemical series definitions, evaluating the size and specificity of chemical series. These metrics are applied to categorize human series definitions and implemented in automated classification approaches. For the automated classification of chemical series, we test different fragmentation and similarity-based clustering algorithms and apply different approaches to infer series definitions from these clusters or sets of fragments. We benchmark the classification results against human-defined series from 30 internal projects. The best results in reproducing the composition of human-defined series are achieved when applying UPGMA (unweighted pair group method with arithmetic mean) clustering to the project dataset and calculating maximum common substructures of the clusters as series definitions. We evaluate this approach in more detail on a public dataset and assess its robustness by 10-fold cross-validation, each time sampling 40% of the dataset. Through these benchmarking and validation experiments, we show that the proposed automated approach is able to accurately and robustly identify human-defined series, which comply with a certain, predefined level of specificity and size. 
Suggesting a thoroughly tested algorithm for series classification, as well as quality metrics for series and several benchmarking approaches, this work lays the foundation for further analysis of project decisions, and it offers an enhanced understanding of the properties of human-defined chemical series.
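For readers unfamiliar with UPGMA, the clustering step named in this abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions: the input is a precomputed pairwise distance table (for compounds this might be 1 minus a fingerprint similarity, which is an assumption here), the labels and values are made up, and the maximum-common-substructure step of the paper is not covered.

```python
def upgma(labels, dist):
    """Agglomerative clustering with average linkage (UPGMA).

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    Returns the merges in order as (cluster_1, cluster_2, distance) tuples.
    """
    clusters = [(label,) for label in labels]

    def average_distance(c1, c2):
        # UPGMA: unweighted mean over all member pairs of the two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        pairs = [
            (i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        # Pick the closest pair of clusters and merge it.
        i, j = min(pairs, key=lambda p: average_distance(clusters[p[0]], clusters[p[1]]))
        d = average_distance(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], d))
        merged = tuple(sorted(clusters[i] + clusters[j]))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical distances between four compounds.
toy_dist = {
    frozenset(("a", "b")): 0.2,
    frozenset(("a", "c")): 0.8,
    frozenset(("a", "d")): 0.9,
    frozenset(("b", "c")): 0.7,
    frozenset(("b", "d")): 0.8,
    frozenset(("c", "d")): 0.3,
}
merges = upgma(["a", "b", "c", "d"], toy_dist)
```

Cutting the resulting merge sequence at a chosen distance threshold yields flat clusters, which in the setting described above would then be summarized by a common substructure to serve as series definitions.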
Affiliation(s)
- Franziska Kruger
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolas Fechner
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
- Nikolaus Stiefl
- Novartis Institutes for BioMedical Research, Novartis Pharma AG, Novartis Campus, 4002 Basel, Switzerland
|
20
|
Tang B, He F, Liu D, Fang M, Wu Z, Xu D. AI-aided design of novel targeted covalent inhibitors against SARS-CoV-2. bioRxiv 2020. [PMID: 32511346 DOI: 10.1101/2020.03.03.972133] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Indexed: 11/25/2022]
Abstract
The focused repurposing of known approved drugs (such as lopinavir/ritonavir) has reportedly failed to cure SARS-CoV-2-infected patients. It is urgent to generate new chemical entities against this virus. As a key enzyme in the life cycle of the coronavirus, the 3C-like main protease (3CLpro or Mpro) is the most attractive target for antiviral drug design. Based on a recently solved structure (PDB ID: 6LU7), we developed a novel advanced deep Q-learning network with fragment-based drug design (ADQN-FBDD) for generating potential lead compounds targeting SARS-CoV-2 3CLpro. We obtained a series of derivatives from those lead compounds by our structure-based optimization policy (SBOP). All 47 lead compounds obtained directly from our AI model and the related derivatives based on SBOP are accessible in our molecular library at https://github.com/tbwxmu/2019-nCov. These compounds can be used as potential candidates by researchers developing drugs against SARS-CoV-2.
|
21
|
Li Y, Hu J, Wang Y, Zhou J, Zhang L, Liu Z. DeepScaffold: A Comprehensive Tool for Scaffold-Based De Novo Drug Discovery Using Deep Learning. J Chem Inf Model 2019; 60:77-91. [PMID: 31809029 DOI: 10.1021/acs.jcim.9b00727] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Indexed: 12/21/2022]
Abstract
The ultimate goal of drug design is to find novel compounds with desirable pharmacological properties. Designing molecules retaining particular scaffolds as their core structures is an efficient way to obtain potential drug candidates. We propose a scaffold-based molecular generative model for drug discovery, which performs molecule generation based on a wide spectrum of scaffold definitions, including Bemis-Murcko scaffolds, cyclic skeletons, and scaffolds with specifications on side-chain properties. The model can generalize the learned chemical rules of adding atoms and bonds to a given scaffold. The generated compounds were evaluated by molecular docking in DRD2 targets, and the results demonstrated that this approach can be effectively applied to solve several drug design problems, including the generation of compounds containing a given scaffold and de novo drug design of potential drug candidates with specific docking scores.
Affiliation(s)
- Yibo Li
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191 Beijing, China; Stonewise, Haidian Middle Street 15, Haidian District, 100080 Beijing, China
- Jianxing Hu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191 Beijing, China
- Yanxing Wang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191 Beijing, China
- Jielong Zhou
- Stonewise, Haidian Middle Street 15, Haidian District, 100080 Beijing, China
- Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191 Beijing, China
- Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, 100191 Beijing, China
|
22
|
|
23
|
Liu M, Karuso P, Feng Y, Kellenberger E, Liu F, Wang C, Quinn RJ. Is it time for artificial intelligence to predict the function of natural products based on 2D-structure. MedChemComm 2019; 10:1667-1677. [PMID: 31803392 PMCID: PMC6836574 DOI: 10.1039/c9md00128j] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 03/02/2019] [Accepted: 06/04/2019] [Indexed: 12/17/2022]
Abstract
Currently, there is no established technique that allows the function of a compound produced by nature to be predicted by looking at its 2-dimensional chemical structure. One of chemistry's grand challenges is to find a function for every known metabolite. We explore the opportunity for artificial intelligence to provide rational interrogation of metabolites to predict their function.
Affiliation(s)
- Miaomiao Liu
- Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia; Tel: +61 7 3735 6006
- Peter Karuso
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Yunjiang Feng
- Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia; Tel: +61 7 3735 6006
- Esther Kellenberger
- Laboratory of Therapeutic Innovation, Medalis Drug Discovery Center, University of Strasbourg, Illkirch, France
- Fei Liu
- Department of Molecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
- Can Wang
- School of Information and Communication Technology, Griffith University, Gold Coast campus, Qld 4222, Australia
- Ronald J Quinn
- Griffith Institute for Drug Discovery, Griffith University, Brisbane, Qld 4111, Australia; Tel: +61 7 3735 6006
|
24
|
Bandyopadhyay D, Kreatsoulas C, Brady PG, Boyer J, He Z, Scavello G, Peryea T, Jadhav A, Nguyen DT, Guha R. Scaffold-Based Analytics: Enabling Hit-to-Lead Decisions by Visualizing Chemical Series Linked across Large Datasets. J Chem Inf Model 2019; 59:4880-4892. [DOI: 10.1021/acs.jcim.9b00243] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023]
Affiliation(s)
- Deepak Bandyopadhyay
- GlaxoSmithKline, 1250 S. Collegeville Rd, Collegeville, Pennsylvania 19426, United States
- Pat G. Brady
- GlaxoSmithKline, 1250 S. Collegeville Rd, Collegeville, Pennsylvania 19426, United States
- Joseph Boyer
- GlaxoSmithKline, 1250 S. Collegeville Rd, Collegeville, Pennsylvania 19426, United States
- Zangdong He
- GlaxoSmithKline, 1250 S. Collegeville Rd, Collegeville, Pennsylvania 19426, United States
- Genaro Scavello
- GlaxoSmithKline, 1250 S. Collegeville Rd, Collegeville, Pennsylvania 19426, United States
- Tyler Peryea
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Ajit Jadhav
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Dac-Trung Nguyen
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
- Rajarshi Guha
- National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, Maryland 20850, United States
|
25
|
Kunkel C, Schober C, Oberhofer H, Reuter K. Knowledge discovery through chemical space networks: the case of organic electronics. J Mol Model 2019; 25:87. [DOI: 10.1007/s00894-019-3950-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 11/13/2018] [Accepted: 01/29/2019] [Indexed: 12/14/2022]
|
26
|
Saha A, Varghese T, Liu A, Allen SJ, Mirzadegan T, Hack MD. An Analysis of Different Components of a High-Throughput Screening Library. J Chem Inf Model 2018; 58:2057-2068. [PMID: 30204440 DOI: 10.1021/acs.jcim.8b00258] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/13/2022]
Abstract
Since many projects at pharmaceutical organizations get their start from a high-throughput screening (HTS) campaign, improving the quality of the HTS deck can improve the likelihood of discovering a high-quality lead molecule that can be progressed to a drug candidate. Over the past decade, Janssen has implemented several strategies for external compound acquisition to augment the screening deck beyond the chemical space and number of molecules synthesized for internal projects. In this report, we analyzed the performance of each of those compound collections in the screening campaigns performed internally within Janssen during the last five years. We classified the screening library into two broad categories: Internal and External. The comparison of the performance of these sets of libraries was done by considering the primary, confirmation, and dose response hit rates. Our analysis revealed that Internal compounds (resulting from numerous medicinal chemistry efforts against diverse protein targets) have higher average confirmation hit rates than External ones; however, actives from both categories show similar probabilities of hitting multiple distinct targets. We also investigated the property landscape of both sets of libraries to identify the key elements which make a difference in these categories of compounds. From this analysis, Janssen aims to understand the descriptor landscape of the compounds with the highest hit rates and to use them for improving its future acquisition strategies as well as to inform our plating strategy.
Affiliation(s)
- Arjun Saha
- Janssen Pharmaceutical Research and Development, 3210 Merryfield Row, La Jolla, California 92121, United States
- Teena Varghese
- Janssen Pharmaceutical Research and Development, 3210 Merryfield Row, La Jolla, California 92121, United States
- Annie Liu
- Janssen Pharmaceutical Research and Development, 3210 Merryfield Row, La Jolla, California 92121, United States
- Samantha J Allen
- Janssen Pharmaceutical Research and Development, 3210 Merryfield Row, La Jolla, California 92121, United States
- Taraneh Mirzadegan
- Janssen Pharmaceutical Research and Development, 3210 Merryfield Row, La Jolla, California 92121, United States
- Michael D Hack
- Janssen Pharmaceutical Research and Development, 3210 Merryfield Row, La Jolla, California 92121, United States
|
27
|
Li Y, Zhang L, Liu Z. Multi-objective de novo drug design with conditional graph generative model. J Cheminform 2018; 10:33. [PMID: 30043127 PMCID: PMC6057868 DOI: 10.1186/s13321-018-0287-6] [Citation(s) in RCA: 141] [Impact Index Per Article: 23.5] [Received: 02/22/2018] [Accepted: 07/13/2018] [Indexed: 12/31/2022] Open
Abstract
Recently, deep generative models have revealed themselves as a promising way of performing de novo molecule design. However, previous research has focused mainly on generating SMILES strings instead of molecular graphs. Although available, current graph generative models are often too general and computationally expensive. In this work, a new de novo molecular design framework is proposed based on a type of sequential graph generator that does not use atom-level recurrent units. Compared with previous graph generative models, the proposed method is much more tuned for molecule generation and has been scaled up to cover significantly larger molecules in the ChEMBL database. It is shown that the graph-based model outperforms SMILES-based models in a variety of metrics, especially in the rate of valid outputs. For the application of drug design tasks, a conditional graph generative model is employed. This method offers higher flexibility and is suitable for generation based on multiple objectives. The results have demonstrated that this approach can be effectively applied to solve several drug design problems, including the generation of compounds containing a given scaffold, compounds with specific drug-likeness and synthetic accessibility requirements, as well as dual inhibitors against JNK3 and GSK-3β.
Affiliation(s)
- Yibo Li
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, Beijing, 100191, China
- Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, Beijing, 100191, China
- Zhenming Liu
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Xueyuan Road 38, Haidian District, Beijing, 100191, China
|
28
|
Lin A, Horvath D, Afonina V, Marcou G, Reymond JL, Varnek A. Mapping of the Available Chemical Space versus the Chemical Universe of Lead-Like Compounds. ChemMedChem 2018; 13:540-554. [PMID: 29154440 DOI: 10.1002/cmdc.201700561] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 09/15/2017] [Revised: 11/07/2017] [Indexed: 12/15/2022]
Abstract
This is, to our knowledge, the most comprehensive analysis to date based on generative topographic mapping (GTM) of fragment-like chemical space (40 million molecules with no more than 17 heavy atoms, both from the theoretically enumerated GDB-17 and real-world PubChem/ChEMBL databases). The challenge was to prove that a robust map of fragment-like chemical space can actually be built, in spite of a limited (≪10^5) maximal number of compounds ("frame set") usable for fitting the GTM manifold. An evolutionary map building strategy has been updated with a "coverage check" step, which discards manifolds failing to accommodate compounds outside the frame set. The evolved map has a good propensity to separate actives from inactives for more than 20 external structure-activity sets. It was proven to properly accommodate the entire collection of 40 million compounds. Next, it served as a library comparison tool to highlight biases of real-world molecules (PubChem and ChEMBL) versus the universe of all possible species represented by FDB-17, a fragment-like subset of GDB-17 containing 10 million molecules. Specific patterns, proper to some libraries and absent from others (diversity holes), were highlighted.
Affiliation(s)
- Arkadii Lin
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Dragos Horvath
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Valentina Afonina
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France; Laboratory of Chemoinformatics and Molecular Modeling, Department of Organic Chemistry, A.M. Butlerov Institute of Chemistry, Kazan Federal University, 18 Kremlyovskaya str., 420008, Kazan, Russia
- Gilles Marcou
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
- Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Berne, 3 Freiestrasse, 3012, Berne, Switzerland
- Alexandre Varnek
- Laboratory of Chemoinformatics, Faculty of Chemistry, University of Strasbourg, 4 Blaise Pascal str., 67081, Strasbourg, France
|
29
|
Nakagawa T, Miyao T, Funatsu K. Identification of Bioactive Scaffolds Based on QSAR Models. Mol Inform 2017; 37. [DOI: 10.1002/minf.201700103] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 08/10/2017] [Accepted: 10/02/2017] [Indexed: 11/10/2022]
Affiliation(s)
- Tomoki Nakagawa
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
- Tomoyuki Miyao
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstr. 2, D-53113 Bonn, Germany
- Kimito Funatsu
- Department of Chemical System Engineering, School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8656, Japan
|
30
|
Gaspar HA, Breen G. Drug enrichment and discovery from schizophrenia genome-wide association results: an analysis and visualisation approach. Sci Rep 2017; 7:12460. [PMID: 28963561 PMCID: PMC5622077 DOI: 10.1038/s41598-017-12325-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 05/15/2017] [Accepted: 09/06/2017] [Indexed: 12/27/2022]
Abstract
Using successful genome-wide association results in psychiatry for drug repurposing is an ongoing challenge. Databases collecting drug targets and gene annotations are growing and can be harnessed to shed new light on psychiatric disorders. We used genome-wide association study (GWAS) summary statistics from the Psychiatric Genetics Consortium (PGC) Schizophrenia working group to build a drug repositioning model for schizophrenia. As sample size increases, schizophrenia GWAS results show increasing enrichment for known antipsychotic drugs, selective calcium channel blockers, and antiepileptics. Each of these therapeutic classes targets different gene subnetworks. We identify 123 Bonferroni-significant druggable genes outside the MHC, and 128 FDR-significant biological pathways related to neurons, synapses, genic intolerance, membrane transport, epilepsy, and mental disorders. These results suggest that, in schizophrenia, current well-powered GWAS results can reliably detect known schizophrenia drugs and thus may hold considerable potential for the identification of new therapeutic leads. Moreover, antiepileptics and calcium channel blockers may provide repurposing opportunities. This study also reveals significant pathways in schizophrenia that were not identified previously, and provides a workflow for pathway analysis and drug repurposing using GWAS results.
Affiliation(s)
- H A Gaspar
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, MRC Social, Genetic and Developmental Psychiatry (SGDP) Centre, London, UK.
- National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Trust, London, UK.
- G Breen
- King's College London, Institute of Psychiatry, Psychology and Neuroscience, MRC Social, Genetic and Developmental Psychiatry (SGDP) Centre, London, UK
- National Institute for Health Research Biomedical Research Centre, South London and Maudsley National Health Service Trust, London, UK
|
31
|
Marcou G, Horvath D, Varnek A. Neighboring Structure Visualization on a Grid-based Layout. Mol Inform 2017; 36. [PMID: 28902973 DOI: 10.1002/minf.201700047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2017] [Accepted: 06/12/2017] [Indexed: 11/09/2022]
Abstract
Here, we describe an algorithm to visualize chemical structures on a grid-based layout in such a way that similar structures are neighboring. It is based on structure reordering with the help of the Hilbert-Schmidt Independence Criterion, representing an empirical estimate of the Hilbert-Schmidt norm of the cross-covariance operator. The method can be applied to any layout of two- or three-dimensional shape. The approach is demonstrated on a set of dopamine D5 ligands visualized on square, disk and spherical layouts.
Affiliation(s)
- G Marcou
- Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- D Horvath
- Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
- A Varnek
- Laboratory of Chemoinformatics, University of Strasbourg, 1 rue Blaise Pascal, 67000, Strasbourg, France
|
32
|
Schäfer T, Kriege N, Humbeck L, Klein K, Koch O, Mutzel P. Scaffold Hunter: a comprehensive visual analytics framework for drug discovery. J Cheminform 2017; 9:28. [PMID: 29086162 PMCID: PMC5425364 DOI: 10.1186/s13321-017-0213-3] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 10/16/2016] [Accepted: 04/10/2017] [Indexed: 01/31/2023]
Abstract
The era of big data is influencing the way rational drug discovery and the development of bioactive molecules are performed, and versatile tools are needed to assist in molecular design workflows. Scaffold Hunter is a flexible visual analytics framework for the analysis of chemical compound data and combines techniques from several fields such as data mining and information visualization. The framework allows analyzing high-dimensional chemical compound data in an interactive fashion, combining intuitive visualizations with automated analysis methods including versatile clustering methods. Originally designed to analyze the scaffold tree, Scaffold Hunter is continuously revised and extended. We describe recent extensions that significantly increase the applicability for a variety of tasks.
Affiliation(s)
- Till Schäfer
- Department of Computer Science, TU Dortmund University, Otto-Hahn-Str. 14, Dortmund, 44227, Germany
- Nils Kriege
- Department of Computer Science, TU Dortmund University, Otto-Hahn-Str. 14, Dortmund, 44227, Germany
- Lina Humbeck
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Str. 6, Dortmund, 44227, Germany
- Karsten Klein
- Department of Computer and Information Science, University of Konstanz, Universitaetsstrasse 10, Konstanz, 78464, Germany
- Oliver Koch
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Str. 6, Dortmund, 44227, Germany
- Petra Mutzel
- Department of Computer Science, TU Dortmund University, Otto-Hahn-Str. 14, Dortmund, 44227, Germany
|
33
|
Hu Y, Stumpfe D, Bajorath J. Computational Exploration of Molecular Scaffolds in Medicinal Chemistry. J Med Chem 2016; 59:4062-76. [DOI: 10.1021/acs.jmedchem.5b01746] [Citation(s) in RCA: 79] [Impact Index Per Article: 9.9] [Indexed: 12/19/2022]
Affiliation(s)
- Ye Hu
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
- Dagmar Stumpfe
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Dahlmannstrasse 2, D-53113 Bonn, Germany
|
34
|
Systematic assessment of scaffold hopping versus activity cliff formation across bioactive compound classes following a molecular hierarchy. Bioorg Med Chem 2015; 23:3183-91. [DOI: 10.1016/j.bmc.2015.04.067] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/25/2015] [Revised: 04/21/2015] [Accepted: 04/24/2015] [Indexed: 11/22/2022]
|
35
|
Hu Y, Zhang B, Bajorath J. Method for Systematic Assessment of Chemical Changes in Molecular Scaffolds with Conserved Topology and Application to the Analysis of Scaffold-Activity Relationships. Mol Inform 2015; 34:531-49. [DOI: 10.1002/minf.201500034] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/20/2015] [Accepted: 04/23/2015] [Indexed: 11/10/2022]
|
36
|
Osolodkin DI, Radchenko EV, Orlov AA, Voronkov AE, Palyulin VA, Zefirov NS. Progress in visual representations of chemical space. Expert Opin Drug Discov 2015; 10:959-73. [DOI: 10.1517/17460441.2015.1060216] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 01/06/2023]
|
37
|
Du F, Babcock JJ, Yu H, Zou B, Li M. Global analysis reveals families of chemical motifs enriched for HERG inhibitors. PLoS One 2015; 10:e0118324. [PMID: 25700001 PMCID: PMC4336329 DOI: 10.1371/journal.pone.0118324] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 08/12/2014] [Accepted: 12/01/2014] [Indexed: 11/18/2022]
Abstract
Promiscuous inhibition of the human ether-à-go-go-related gene (hERG) potassium channel by drugs poses a major risk for life threatening arrhythmia and costly drug withdrawals. Current knowledge of this phenomenon is derived from a limited number of known drugs and tool compounds. However, in a diverse, naïve chemical library, it remains unclear which and to what degree chemical motifs or scaffolds might be enriched for hERG inhibition. Here we report electrophysiology measurements of hERG inhibition and computational analyses of >300,000 diverse small molecules. We identify chemical ‘communities’ with high hERG liability, containing both canonical scaffolds and structurally distinctive molecules. These data enable the development of more effective classifiers to computationally assess hERG risk. The resultant predictive models now accurately classify naïve compound libraries for tendency of hERG inhibition. Together these results provide a more complete reference map of characteristic chemical motifs for hERG liability and advance a systematic approach to rank chemical collections for cardiotoxicity risk.
Affiliation(s)
- Fang Du
- The Solomon H. Snyder Department of Neuroscience, High Throughput Biology Center and Johns Hopkins Ion Channel Center (JHICC), Johns Hopkins University, 733 North Broadway, Baltimore, MD 21205, United States of America
- Joseph J. Babcock
- The Solomon H. Snyder Department of Neuroscience, High Throughput Biology Center and Johns Hopkins Ion Channel Center (JHICC), Johns Hopkins University, 733 North Broadway, Baltimore, MD 21205, United States of America
- Haibo Yu
- The Solomon H. Snyder Department of Neuroscience, High Throughput Biology Center and Johns Hopkins Ion Channel Center (JHICC), Johns Hopkins University, 733 North Broadway, Baltimore, MD 21205, United States of America
- Beiyan Zou
- The Solomon H. Snyder Department of Neuroscience, High Throughput Biology Center and Johns Hopkins Ion Channel Center (JHICC), Johns Hopkins University, 733 North Broadway, Baltimore, MD 21205, United States of America
- Min Li
- The Solomon H. Snyder Department of Neuroscience, High Throughput Biology Center and Johns Hopkins Ion Channel Center (JHICC), Johns Hopkins University, 733 North Broadway, Baltimore, MD 21205, United States of America
|
38
|
Gaspar HA, Baskin II, Marcou G, Horvath D, Varnek A. Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. J Chem Inf Model 2014; 55:84-94. [PMID: 25423612 DOI: 10.1021/ci500575y] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Indexed: 01/31/2023]
Abstract
This paper is devoted to the analysis and visualization in 2-dimensional space of large data sets of millions of compounds using the incremental version of generative topographic mapping (iGTM). The iGTM algorithm implemented in the in-house ISIDA-GTM program was applied to a database of more than 2 million compounds combining data sets of 36 chemical suppliers and the NCI collection, encoded either by MOE descriptors or by MACCS keys. Taking advantage of the probabilistic nature of GTM, several approaches to data analysis were proposed. The chemical space coverage was evaluated using the normalized Shannon entropy. Different views of the data (property landscapes) were obtained by mapping various physical and chemical properties (molecular weight, aqueous solubility, LogP, etc.) onto the iGTM map. The superposition of these views helped to identify the regions in the chemical space populated by compounds with desirable physicochemical profiles and the suppliers providing them. The similarity of data sets in the latent space was assessed by applying several metrics (Euclidean distance, Tanimoto and Bhattacharyya coefficients) to data probability distributions based on cumulated responsibility vectors. As a complementary approach, data sets were compared by considering them as individual objects on a meta-GTM map, built on cumulated responsibility vectors or property landscapes produced with iGTM. We believe that the iGTM methodology described in this article represents a fast and reliable way to analyze and visualize large chemical databases.
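The coverage measure named in this abstract can be illustrated with a minimal sketch of a normalized Shannon entropy computed over per-node occupancy weights. The function name `normalized_shannon_entropy` and the toy weight vectors are assumptions made for illustration; they are not taken from the ISIDA-GTM program, whose exact formulation may differ.

```python
import math


def normalized_shannon_entropy(weights):
    """Shannon entropy of a non-negative weight vector, scaled to [0, 1].

    `weights` stands in for cumulated responsibilities per map node
    (a hypothetical input chosen for this sketch).
    """
    total = sum(weights)
    if total <= 0 or len(weights) < 2:
        return 0.0
    # Convert weights to probabilities; zero entries contribute nothing.
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    # Divide by the maximum possible entropy, log(number of nodes).
    return entropy / math.log(len(weights))


# Compounds spread evenly over four nodes versus all on one node.
uniform = normalized_shannon_entropy([1.0, 1.0, 1.0, 1.0])
peaked = normalized_shannon_entropy([1.0, 0.0, 0.0, 0.0])
```

Under this sketch, a value near 1 indicates that a data set spreads evenly over the map, while a value near 0 indicates that it concentrates on a few nodes.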
Affiliation(s)
- Héléna A Gaspar
- Laboratory of Chemoinformatics, University of Strasbourg, 67081 Strasbourg, France
|
39
|
Abstract
Efforts to compile the phenotypic effects of drugs and environmental chemicals offer the opportunity to adopt a chemo-centric view of human health that does not require detailed mechanistic information. Here, we consider thousands of chemicals and analyze the relationship of their structures with adverse and therapeutic responses. Our study includes molecules related to the etiology of 934 health threatening conditions and used to treat 835 diseases. We first identify chemical moieties that could be independently associated with each phenotypic effect. Using these fragments, we build accurate predictors for approximately 400 clinical phenotypes, finding many privileged and liable structures. Finally, we connect two diseases if they relate to similar chemical structures. The resulting networks of human conditions are able to predict disease comorbidities, as well as identifying potential drug side effects and opportunities for drug repositioning, and show a remarkable coincidence with clinical observations.
|
40
|
Skuta C, Bartůněk P, Svozil D. InCHlib - interactive cluster heatmap for web applications. J Cheminform 2014; 6:44. [PMID: 25264459 PMCID: PMC4173117 DOI: 10.1186/s13321-014-0044-4] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 07/15/2014] [Accepted: 09/08/2014] [Indexed: 12/27/2022]
Abstract
BACKGROUND Hierarchical clustering is an exploratory data analysis method that reveals the groups (clusters) of similar objects. The result of the hierarchical clustering is a tree structure called dendrogram that shows the arrangement of individual clusters. To investigate the row/column hierarchical cluster structure of a data matrix, a visualization tool called 'cluster heatmap' is commonly employed. In the cluster heatmap, the data matrix is displayed as a heatmap, a 2-dimensional array in which the colour of each element corresponds to its value. The rows/columns of the matrix are ordered such that similar rows/columns are near each other. The ordering is given by the dendrogram which is displayed on the side of the heatmap. RESULTS We developed InCHlib (Interactive Cluster Heatmap Library), a highly interactive and lightweight JavaScript library for cluster heatmap visualization and exploration. InCHlib enables the user to select individual or clustered heatmap rows, to zoom in and out of clusters or to flexibly modify heatmap appearance. The cluster heatmap can be augmented with additional metadata displayed in a different colour scale. In addition, to further enhance the visualization, the cluster heatmap can be interconnected with external data sources or analysis tools. Data clustering and the preparation of the input file for InCHlib are facilitated by the Python utility script inchlib_clust. CONCLUSIONS The cluster heatmap is one of the most popular visualizations of large chemical and biomedical data sets originating, e.g., in high-throughput screening, genomics or transcriptomics experiments. The presented JavaScript library InCHlib is a client-side solution for cluster heatmap exploration. InCHlib can be easily deployed into any modern web application and configured to cooperate with external tools and data sources. Though InCHlib is primarily intended for the analysis of chemical or biological data, it is a versatile tool whose application domain is not limited to the life sciences.
Affiliation(s)
- Ctibor Skuta
- Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic; CZ-OPENSCREEN, Institute of Molecular Genetics of the ASCR, v. v. i, Vídeňská 1083, CZ-142 20 Prague, Czech Republic
- Petr Bartůněk
- CZ-OPENSCREEN, Institute of Molecular Genetics of the ASCR, v. v. i, Vídeňská 1083, CZ-142 20 Prague, Czech Republic
- Daniel Svozil
- Laboratory of Informatics and Chemistry, Faculty of Chemical Technology, Institute of Chemical Technology Prague, Technická 5, CZ-166 28 Prague, Czech Republic; CZ-OPENSCREEN, Institute of Molecular Genetics of the ASCR, v. v. i, Vídeňská 1083, CZ-142 20 Prague, Czech Republic
|
41
|
Ertl P. Intuitive ordering of scaffolds and scaffold similarity searching using scaffold keys. J Chem Inf Model 2014; 54:1617-22. [PMID: 24846291 DOI: 10.1021/ci5001983] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Indexed: 02/08/2023]
Abstract
Scaffold Keys (scaffold descriptors based on simple topological parameters such as number of ring and chain atoms, number and type of heteroatoms, and other simple structural features) are presented. Scaffold Keys enable intuitive ordering of scaffolds from small and simple to large and complex, an ordering that is consistent with the way medicinal chemists themselves classify scaffolds. Scaffold Keys may also be used as descriptors for scaffold similarity searches, providing results compatible with expectations of chemists and well-suited for use in scaffold bioisosteric replacement and scaffold hopping. Scaffold Keys also support visualization of large chemical data sets. Scaffold Keys descriptors are easy for chemists to understand as well as easy to implement.
Affiliation(s)
- Peter Ertl
- Novartis Institutes for BioMedical Research, Novartis Campus, CH-4056, Basel, Switzerland
|
42
|
Lind P. Construction and Use of Fragment-Augmented Molecular Hasse Diagrams. J Chem Inf Model 2014; 54:387-95. [DOI: 10.1021/ci4004464] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/29/2022]
Affiliation(s)
- Peter Lind
- Medivir AB, Box 1086, 14122 Huddinge, Sweden
|
43
|
Analysis of chemical and biological features yields mechanistic insights into drug side effects. Chem Biol 2013; 20:594-603. [PMID: 23601648 DOI: 10.1016/j.chembiol.2013.03.017] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 11/14/2012] [Revised: 03/08/2013] [Accepted: 03/25/2013] [Indexed: 12/31/2022]
Abstract
Side effects (SEs) are the unintended consequence of therapeutic treatments, but they can also be seen as valuable readouts of drug effects, resulting from the perturbation of biological systems by chemical compounds. Unfortunately, biology and chemistry are often considered separately, leading to incomplete models unable to provide a unified view of SEs. Here, we investigate the molecular bases of over 1,600 SEs by navigating both chemical and biological spaces. We identified characteristic molecular traits for 1,162 SEs, 38% of which can be explained using solely biological arguments, and only 6% are exclusively associated with the chemistry of the compounds, implying that the drug action is somewhat unspecific. Overall, we provide mechanistic insights for most SEs and emphasize the need to blend biology and chemistry to surpass intricate phenomena not captured in the molecular biology view.
|
44
|
Klein K, Koch O, Kriege N, Mutzel P, Schäfer T. Visual Analysis of Biological Activity Data with Scaffold Hunter. Mol Inform 2013; 32:964-75. [DOI: 10.1002/minf.201300087] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 05/08/2013] [Accepted: 07/25/2013] [Indexed: 02/03/2023]
|
45
|
Matlock MK, Zaretzki JM, Swamidass SJ. Scaffold network generator: a tool for mining molecular structures. Bioinformatics 2013; 29:2655-6. [DOI: 10.1093/bioinformatics/btt448] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
|
46
|
Florent JC. [Small compounds libraries: a research tool for chemical biology]. Biol Aujourdhui 2013; 207:39-54. [PMID: 23694724 DOI: 10.1051/jbio/2013006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/22/2013] [Indexed: 06/02/2023]
Abstract
Obtaining and screening collections of small molecules remains a challenge for biologists. Recent advances in analytical techniques and instrumentation now make screening possible in academia. The history of the creation of such public or commercial collections, and of their accessibility, is recounted. It shows that an academic laboratory involved in medicinal chemistry, chemogenomics or "chemical biology" has an interest in organizing its own collection and making it available through existing networks such as the French National chimiothèque or the European partner network "European Infrastructure of open screening platforms for Chemical Biology" (EU-OpenScreen), which is under construction.
Affiliation(s)
- Jean-Claude Florent
- Laboratoire de Conception, Synthèse et Vectorisation de Biomolécules (CSVB), UMR 176 CNRS-Institut Curie, Institut Curie Centre de Recherche, 75248 Paris Cedex, France.
|
47
|
|
48
|
Titarenko Z, Vasilevich N, Zernov V, Kirpichenok M, Genis D. Oxygen-containing fragments in natural products. J Comput Aided Mol Des 2012; 27:125-60. [PMID: 23271273 DOI: 10.1007/s10822-012-9629-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/25/2012] [Accepted: 12/17/2012] [Indexed: 01/08/2023]
Abstract
An analysis of the chemical environment of the oxygen atoms in the DNP database compared to the CMC and SCD databases was performed. Some structural clusters were identified which are predominant among the natural products and can be considered as distinctive features of NPs. Fifty-three oxygen-containing structural fragments that are distinctive for the DNP (distinctive set of fragments, DSF) in comparison with the SCD have been identified. A new descriptor, Mc, was introduced for describing the ratio of atoms involved in the DSF to the total number of heavy atoms. A significant difference in the Mc values among the reference databases allowed the use of a specific cluster of the DSF as a tool for performing similarity searches for oxygen-containing NP molecules, or for evaluation or comparison of databases according to their NP-likeness. An example is provided illustrating that the suggested approach could not only estimate NP-likeness but also serve as a tool for designing new NP-like compounds. The suggested approach for NP-likeness evaluation moves away from the traditional ideas of scaffolds, cycles, linkers and substituents.
Affiliation(s)
- Zoya Titarenko
- ASINEX, 20 Geroev Panfilovtsev Str., Moscow 125480, Russia
|
49
|
Natural-product-derived fragments for fragment-based ligand discovery. Nat Chem 2012; 5:21-8. [DOI: 10.1038/nchem.1506] [Citation(s) in RCA: 217] [Impact Index Per Article: 18.1] [Received: 04/24/2012] [Accepted: 10/19/2012] [Indexed: 12/11/2022]
|
50
|
Vogt M, Bajorath J. Chemoinformatics: A view of the field and current trends in method development. Bioorg Med Chem 2012; 20:5317-23. [DOI: 10.1016/j.bmc.2012.03.030] [Citation(s) in RCA: 40] [Impact Index Per Article: 3.3] [Received: 01/23/2012] [Revised: 03/09/2012] [Accepted: 03/12/2012] [Indexed: 12/18/2022]
|