1
Orsi M, Reymond JL. One chiral fingerprint to find them all. J Cheminform 2024; 16:53. [PMID: 38741153 DOI: 10.1186/s13321-024-00849-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 04/28/2024] [Indexed: 05/16/2024] Open
Abstract
Molecular fingerprints are indispensable tools in cheminformatics. However, stereochemistry is generally not considered, which is problematic for large molecules which are almost all chiral. Herein we report MAP4C, a chiral version of our previously reported fingerprint MAP4, which lists MinHashes computed from character strings containing the SMILES of all pairs of circular substructures up to a diameter of four bonds and the shortest topological distance between their central atoms. MAP4C includes the Cahn-Ingold-Prelog (CIP) annotation (R, S, r or s) whenever the chiral atom is the center of a circular substructure, a question mark for undefined stereocenters, and double bond cis-trans information if specified. MAP4C performs slightly better than the achiral MAP4, ECFP and AP fingerprints in non-stereoselective virtual screening benchmarks. Furthermore, MAP4C distinguishes between stereoisomers in chiral molecules from small molecule drugs to large natural products and peptides comprising thousands of diastereomers, with a degree of distinction smaller than between structural isomers and proportional to the number of chirality changes. Due to its excellent performance across diverse molecular classes and its ability to handle stereochemistry, MAP4C is recommended as a generally applicable chiral molecular fingerprint. SCIENTIFIC CONTRIBUTION: The ability of our chiral fingerprint MAP4C to handle stereoisomers from small molecules to large natural products and peptides is unprecedented and opens the way for cheminformatics to include stereochemistry as an important molecular parameter across all fields of molecular design.
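The MinHash step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' MAP4C implementation: the shingle strings below are hand-written placeholders rather than real substructure SMILES pairs, and the helper names `minhash_signature` and `estimated_similarity`, the salted BLAKE2b hashing, and the signature length of 64 are all assumptions made for illustration.

```python
import hashlib


def minhash_signature(shingles, num_perm=64):
    """Return a MinHash signature for a set of strings.

    For each of num_perm salted hash functions, keep the smallest
    64-bit hash value seen over all shingles.
    """
    if not shingles:
        raise ValueError("need at least one shingle")
    signature = []
    for seed in range(num_perm):
        # A different salt per position stands in for a different hash function.
        salt = seed.to_bytes(8, "little")
        smallest = min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingles
        )
        signature.append(smallest)
    return signature


def estimated_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Hypothetical shingles of the form "substructure|distance|substructure".
# The two sets differ only in a made-up stereo label on one shingle.
isomer_1 = {"CC|1|CN", "CN|2|C(=O)O", "CC|3|C(=O)O", "R:C(C)N|1|C(=O)O"}
isomer_2 = {"CC|1|CN", "CN|2|C(=O)O", "CC|3|C(=O)O", "S:C(C)N|1|C(=O)O"}

sig_1 = minhash_signature(isomer_1)
sig_2 = minhash_signature(isomer_2)
similarity = estimated_similarity(sig_1, sig_2)
print(f"estimated similarity: {similarity:.2f}")
```

Because only the shingle carrying the stereo label differs, the two signatures agree at many but not all positions, which mirrors the abstract's point that stereoisomers are separated by a smaller distance than structural isomers.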
Affiliation(s)
- Markus Orsi
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Jean-Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
2
Kim H, Lee K, Kim C, Lim J, Kim WY. DFRscore: Deep Learning-Based Scoring of Synthetic Complexity with Drug-Focused Retrosynthetic Analysis for High-Throughput Virtual Screening. J Chem Inf Model 2024; 64:2432-2444. [PMID: 37651152 DOI: 10.1021/acs.jcim.3c01134] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/01/2023]
Abstract
Recently emerging generative AI models enable us to produce a vast number of compounds for potential applications. While they can provide novel molecular structures, the synthetic feasibility of the generated molecules is often questioned. To address this issue, a few recent studies have attempted to use deep learning models to estimate the synthetic accessibility of many molecules rapidly. However, retrosynthetic analysis tools used to train the models rely on reaction templates automatically extracted from a large reaction database that are not domain-specific and may exhibit low chemical correctness. To overcome this limitation, we introduce DFRscore (Drug-Focused Retrosynthetic score), a deep learning-based approach for a more practical assessment of synthetic accessibility in drug discovery. The DFRscore model is trained exclusively on drug-focused reactions, providing a predicted number of minimally required synthetic steps for each compound. This approach enables practitioners to filter out compounds that do not meet their desired level of synthetic accessibility at an early stage of high-throughput virtual screening for accelerated drug discovery. The proposed strategy can be easily adapted to other domains by adjusting the synthesis planning setup of the reaction templates and starting materials.
Affiliation(s)
- Hyeongwoo Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Kyunghoon Lee
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Chansu Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- Jaechang Lim
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- Woo Youn Kim
- Department of Chemistry, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
- HITS Incorporation, 124 Teheran-ro, Gangnam-gu, Seoul 06234, Republic of Korea
- AI Institute, KAIST, 291 Daehak-ro, Yuseong-gu, Daejeon 34141, Republic of Korea
3
Olmedo DA, Durant-Archibold AA, López-Pérez JL, Medina-Franco JL. Design and Diversity Analysis of Chemical Libraries in Drug Discovery. Comb Chem High Throughput Screen 2024; 27:502-515. [PMID: 37409545 DOI: 10.2174/1386207326666230705150110] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2023] [Revised: 05/30/2023] [Accepted: 05/30/2023] [Indexed: 07/07/2023]
Abstract
Chemical libraries and compound data sets are among the main inputs to start the drug discovery process at universities, research institutes, and the pharmaceutical industry. The approach used in the design of compound libraries, the chemical information they possess, and the representation of structures, play a fundamental role in the development of studies: chemoinformatics, food informatics, in silico pharmacokinetics, computational toxicology, bioinformatics, and molecular modeling to generate computational hits that will continue the optimization process of drug candidates. The prospects for growth in drug discovery and development processes in chemical, biotechnological, and pharmaceutical companies began a few years ago by integrating computational tools with artificial intelligence methodologies. It is anticipated that it will increase the number of drugs approved by regulatory agencies shortly.
Affiliation(s)
- Dionisio A Olmedo
- Centro de Investigaciones Farmacognósticas de la Flora Panameña (CIFLORPAN), Facultad de Farmacia, Universidad de Panamá, Ciudad de Panamá, Apartado, 0824-00178, Panamá
- Sistema Nacional de Investigación (SNI), Secretaria Nacional de Ciencia, Tecnología e Innovación (SENACYT), Ciudad del Saber, Clayton, Panamá
- Armando A Durant-Archibold
- Centro de Biodiversidad y Descubrimiento de Drogas, Instituto de Investigaciones Científicas y Servicios de Alta Tecnología (INDICASAT AIP), Apartado, 0843-01103, Panamá
- Departamento de Bioquímica, Facultad de Ciencias Naturales, Exactas y Tecnología, Universidad de Panamá, Ciudad de Panamá, Panamá
- José Luis López-Pérez
- CESIFAR, Departamento de Farmacología, Facultad de Medicina, Universidad de Panamá, Ciudad de Panamá, Panamá
- Departamento de Ciencias Farmacéuticas, Facultad de Farmacia, Universidad de Salamanca, Avda. Campo Charro s/n, 37071 Salamanca, España
- José Luis Medina-Franco
- DIFACQUIM Grupo de Investigación, Departamento de Farmacia, Escuela de Química, Universidad Nacional Autónoma de México, Ciudad de México, Apartado, 04510, México
4
Korn M, Ehrt C, Ruggiu F, Gastreich M, Rarey M. Navigating large chemical spaces in early-phase drug discovery. Curr Opin Struct Biol 2023; 80:102578. [PMID: 37019067 DOI: 10.1016/j.sbi.2023.102578] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 12/24/2022] [Revised: 01/28/2023] [Accepted: 02/26/2023] [Indexed: 04/07/2023]
Abstract
The size of actionable chemical spaces is surging, owing to a variety of novel techniques, both computational and experimental. As a consequence, novel molecular matter is now at our fingertips that cannot and should not be neglected in early-phase drug discovery. Huge, combinatorial, make-on-demand chemical spaces with high probability of synthetic success rise exponentially in content, generative machine learning models go hand in hand with synthesis prediction, and DNA-encoded libraries offer new ways of hit structure discovery. These technologies enable to search for new chemical matter in a much broader and deeper manner with less effort and fewer financial resources. These transformational developments require new cheminformatics approaches to make huge chemical spaces searchable and analyzable with low resources, and with as little energy consumption as possible. Substantial progress has been made in the past years with respect to computation as well as organic synthesis. First examples of bioactive compounds resulting from the successful use of these novel technologies demonstrate their power to contribute to tomorrow's drug discovery programs. This article gives a compact overview of the state-of-the-art.
Affiliation(s)
- Malte Korn
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany
- Christiane Ehrt
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany
- Fiorella Ruggiu
- insitro, 279 E Grand Ave., CA 94608, South San Francisco, USA
- Marcus Gastreich
- BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany
- Matthias Rarey
- Universität Hamburg, ZBH - Center for Bioinformatics, Bundesstr. 43, 20146 Hamburg, Germany
5
Béquignon OJM, Bongers BJ, Jespers W, IJzerman AP, van der Water B, van Westen GJP. Papyrus: a large-scale curated dataset aimed at bioactivity predictions. J Cheminform 2023; 15:3. [PMID: 36609528 PMCID: PMC9824924 DOI: 10.1186/s13321-022-00672-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/17/2022] [Indexed: 01/07/2023] Open
Abstract
With the ongoing rapid growth of publicly available ligand-protein bioactivity data, there is a trove of valuable data that can be used to train a plethora of machine-learning algorithms. However, not all data is equal in terms of size and quality and a significant portion of researchers' time is needed to adapt the data to their needs. On top of that, finding the right data for a research question can often be a challenge on its own. To meet these challenges, we have constructed the Papyrus dataset. Papyrus is comprised of around 60 million data points. This dataset contains multiple large publicly available datasets such as ChEMBL and ExCAPE-DB combined with several smaller datasets containing high-quality data. The aggregated data has been standardised and normalised in a manner that is suitable for machine learning. We show how data can be filtered in a variety of ways and also perform some examples of quantitative structure-activity relationship analyses and proteochemometric modelling. Our ambition is that this pruned data collection constitutes a benchmark set that can be used for constructing predictive models, while also providing an accessible data source for research.
Affiliation(s)
- O. J. M. Béquignon
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. J. Bongers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- W. Jespers
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- A. P. IJzerman
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- B. van der Water
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
- G. J. P. van Westen
- Division of Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden University, Leiden, The Netherlands
6
Li Y, Zhang L, Wang Y, Zou J, Yang R, Luo X, Wu C, Yang W, Tian C, Xu H, Wang F, Yang X, Li L, Yang S. Generative deep learning enables the discovery of a potent and selective RIPK1 inhibitor. Nat Commun 2022; 13:6891. [PMID: 36371441 PMCID: PMC9653409 DOI: 10.1038/s41467-022-34692-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 04/12/2022] [Accepted: 11/03/2022] [Indexed: 11/13/2022] Open
Abstract
The retrieval of hit/lead compounds with novel scaffolds during early drug development is an important but challenging task. Various generative models have been proposed to create drug-like molecules. However, the capacity of these generative models to design wet-lab-validated and target-specific molecules with novel scaffolds has hardly been verified. We herein propose a generative deep learning (GDL) model, a distribution-learning conditional recurrent neural network (cRNN), to generate tailor-made virtual compound libraries for given biological targets. The GDL model is then applied to RIPK1. Virtual screening against the generated tailor-made compound library and subsequent bioactivity evaluation lead to the discovery of a potent and selective RIPK1 inhibitor with a previously unreported scaffold, RI-962. This compound displays potent in vitro activity in protecting cells from necroptosis, and good in vivo efficacy in two inflammatory models. Collectively, the findings prove the capacity of our GDL model in generating hit/lead compounds with unreported scaffolds, highlighting a great potential of deep learning in drug discovery.
Affiliation(s)
- Yueshan Li
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Liting Zhang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Yifei Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Jun Zou
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Ruicheng Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Xinling Luo
- Key Laboratory of Drug Targeting and Drug Delivery System of Ministry of Education, West China School of Pharmacy, Sichuan University, 610041 Chengdu, Sichuan, China
- Chengyong Wu
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Wei Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Chenyu Tian
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Haixing Xu
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Falu Wang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Xin Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
- Linli Li
- Key Laboratory of Drug Targeting and Drug Delivery System of Ministry of Education, West China School of Pharmacy, Sichuan University, 610041 Chengdu, Sichuan, China
- Shengyong Yang
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, 610041 Chengdu, Sichuan, China
7
Kumar R, Sharma A, Alexiou A, Ashraf GM. Artificial Intelligence in De novo Drug Design: Are We Still There? Curr Top Med Chem 2022; 22:2483-2492. [PMID: 36263480 DOI: 10.2174/1568026623666221017143244] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2022] [Revised: 09/06/2022] [Accepted: 09/15/2022] [Indexed: 01/20/2023]
Abstract
BACKGROUND The artificial intelligence (AI)-assisted design of drug candidates with novel structures and desired properties has received significant attention in the recent past, so related areas of forward prediction that aim to discover chemical matters worth synthesizing and further experimental investigation. OBJECTIVES The purpose behind developing AI-driven models is to explore the broader chemical space and suggest new drug candidate scaffolds with promising therapeutic value. Moreover, it is anticipated that such AI-based models may not only significantly reduce the cost and time but also decrease the attrition rate of drug candidates that fail to reach the desirable endpoints at the final stages of drug development. In an attempt to develop AI-based models for de novo drug design, numerous methods have been proposed by various study groups by applying machine learning and deep learning algorithms to chemical datasets. However, there are many challenges in obtaining accurate predictions, and real breakthroughs in de novo drug design are still scarce. METHODS In this review, we explore the recent trends in developing AI-based models for de novo drug design to assess the current status, challenges, and opportunities in the field. CONCLUSION The consistently improved AI algorithms and the abundance of curated training chemical data indicate that AI-based de novo drug design should perform better than the current models. Improvements in the performance are warranted to obtain better outcomes in the form of potential drug candidates, which can perform well in in vivo conditions, especially in the case of more complex diseases.
Affiliation(s)
- Rajnish Kumar
- Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Uttar Pradesh, India
- Anju Sharma
- Department of Applied Science, Indian Institute of Information Technology, Allahabad, Uttar Pradesh, India
- Athanasios Alexiou
- Novel Global Community Educational Foundation, Hebersham, 2770 NSW, Australia
- AFNP Med Austria, 1010 Wien, Austria
- Ghulam Md Ashraf
- Pre-Clinical Research Unit (PCRU), King Fahd Medical Research Center, King Abdulaziz University, Jeddah, Saudi Arabia
- Department of Medical Laboratory Technology, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
8
Petrović D, Scott JS, Bodnarchuk MS, Lorthioir O, Boyd S, Hughes GM, Lane J, Wu A, Hargreaves D, Robinson J, Sadowski J. Virtual Screening in the Cloud Identifies Potent and Selective ROS1 Kinase Inhibitors. J Chem Inf Model 2022; 62:3832-3843. [PMID: 35920716 DOI: 10.1021/acs.jcim.2c00644] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/28/2022]
Abstract
ROS1 rearrangements account for 1-2% of non-small cell lung cancer patients, yet there are no specifically designed, selective ROS1 therapies in the clinic. Previous knowledge of potent ROS1 inhibitors with selectivity over TrkA, a selected antitarget, enabled virtual screening as a hit finding approach in this project. The ligand-based virtual screening was focused on identifying molecules with a similar 3D shape and pharmacophore to the known actives. To that end, we turned to the AstraZeneca virtual library, estimated to cover 10^15 synthesizable make-on-demand molecules. We used cloud computing-enabled FastROCS technology to search the enumerated 10^10 subset of the full virtual space. A small number of specific libraries were prioritized based on the compound properties and a medicinal chemistry assessment and further enumerated with available building blocks. Following the docking evaluation to the ROS1 structure, the most promising hits were synthesized and tested, resulting in the identification of several potent and selective series. The best among them gave a nanomolar ROS1 inhibitor with over 1000-fold selectivity over TrkA and, from the preliminary established SAR, these have the potential to be further optimized. Our prospective study describes how conceptually simple shape-matching approaches can identify potent and selective compounds by searching ultralarge virtual libraries, demonstrating the applicability of such workflows and their importance in early drug discovery.
Affiliation(s)
- Dušan Petrović
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 50, Sweden
- James S Scott
- Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
- Scott Boyd
- Oncology R&D, AstraZeneca, Cambridge CB4 0WG, United Kingdom
- George M Hughes
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
- Jordan Lane
- Discovery Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
- Allan Wu
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, Massachusetts 02451, United States
- David Hargreaves
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
- James Robinson
- Mechanistic and Structural Biology, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge CB2 0AA, United Kingdom
- Jens Sadowski
- Hit Discovery, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Gothenburg 431 50, Sweden
9
Saldívar-González FI, Medina-Franco JL. Approaches for enhancing the analysis of chemical space for drug discovery. Expert Opin Drug Discov 2022; 17:789-798. [PMID: 35640229 DOI: 10.1080/17460441.2022.2084608] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 12/22/2022]
Abstract
INTRODUCTION Chemical space is a powerful, general, and practical conceptual framework in drug discovery and other areas in chemistry that addresses the diversity of molecules and it has various applications. Moreover, chemical space is a cornerstone of chemoinformatics as a scientific discipline. In response to the increase in the set of chemical compounds in databases, generators of chemical structures, and tools to calculate molecular descriptors, novel approaches to generate visual representations of chemical space in low dimensions are emerging and evolving. Such approaches include a wide range of commercial and free applications, software, and open-source methods. AREAS COVERED The current state of chemical space in drug design and discovery is reviewed. The topics discussed herein include advances for efficient navigation in chemical space, the use of this concept in assessing the diversity of different data sets, exploring structure-property/activity relationships for one or multiple endpoints, and compound library design. Recent advances in methodologies for generating visual representations of chemical space have been highlighted, thereby emphasizing open-source methods. EXPERT OPINION Quantitative and qualitative generation and analysis of chemical space require novel approaches for handling the increasing number of molecules and their information available in chemical databases (including emerging ultra-large libraries). In addition, it is of utmost importance to note that chemical space is a conceptual framework that goes beyond visual representation in low dimensions. However, the graphical representation of chemical space has several practical applications in drug discovery and beyond.
Affiliation(s)
- Fernanda I Saldívar-González
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
- José L Medina-Franco
- DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, Universidad Nacional Autónoma de México, Avenida Universidad 3000, Mexico City 04510, Mexico
10
Moshawih S, Goh HP, Kifli N, Idris AC, Yassin H, Kotra V, Goh KW, Liew KB, Ming LC. Synergy between machine learning and natural products cheminformatics: Application to the lead discovery of anthraquinone derivatives. Chem Biol Drug Des 2022; 100:185-217. [PMID: 35490393 DOI: 10.1111/cbdd.14062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2022] [Revised: 04/15/2022] [Accepted: 04/23/2022] [Indexed: 11/28/2022]
Abstract
Cheminformatics utilizing machine learning (ML) techniques have opened up a new horizon in drug discovery. This is owing to vast chemical space expansion with rocketing numbers of expected hits and lead compounds that match druggable macromolecular targets, in particular from natural compounds. Due to the natural products' (NP) structural complexity, uniqueness, and diversity, they could occupy a bigger space in pharmaceuticals, allowing the industry to pursue more selective leads in the nanomolar range of binding affinity. ML is an essential part of each step of the drug design pipeline, such as target prediction, compound library preparation, and lead optimization. Notably, molecular mechanic and dynamic simulations, induced docking, and free energy perturbations are essential in predicting best binding poses, binding free energy values, and molecular mechanics force fields. Those applications have leveraged from artificial intelligence (AI), which decreases the computational costs required for such costly simulations. This review aimed to describe chemical space and compound libraries related to NPs. High-throughput screening utilized for fractionating NPs and high-throughput virtual screening and their strategies, and significance, are reviewed. Particular emphasis was given to AI approaches, ML tools, algorithms, and techniques, especially in drug discovery of macrocyclic compounds and approaches in computer-aided and ML-based drug discovery. Anthraquinone derivatives were discussed as a source of new lead compounds that can be developed using ML tools for diverse medicinal uses such as cancer, infectious diseases, and metabolic disorders. Furthermore, the power of principal component analysis in understanding relevant protein conformations, and molecular modeling of protein-ligand interaction were also presented. 
Apart from being a concise reference for cheminformatics, this review is a useful text to understand the application of ML-based algorithms to molecular dynamics simulation and in silico absorption, distribution, metabolism, excretion, and toxicity prediction.
Affiliation(s)
- Said Moshawih
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hui Poh Goh
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Nurolaini Kifli
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Azam Che Idris
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Hayati Yassin
- Faculty of Integrated Technologies, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
- Vijay Kotra
- Faculty of Pharmacy, Quest International University, Perak, Malaysia
- Khang Wen Goh
- Faculty of Data Science and Information Technology, INTI International University, Nilai, Malaysia
- Kai Bin Liew
- Faculty of Pharmacy, University of Cyberjaya, Cyberjaya, Malaysia
- Long Chiau Ming
- PAP Rashidah Sa'adatul Bolkiah Institute of Health Sciences, Universiti Brunei Darussalam, Gadong, Brunei Darussalam
11
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
- Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
12
Warr WA, Nicklaus MC, Nicolaou CA, Rarey M. Exploration of Ultralarge Compound Collections for Drug Discovery. J Chem Inf Model 2022; 62:2021-2034. [PMID: 35421301 DOI: 10.1021/acs.jcim.2c00224] [Citation(s) in RCA: 46] [Impact Index Per Article: 23.0] [Indexed: 02/07/2023]
Abstract
Designing new medicines more cheaply and quickly is tightly linked to the quest of exploring chemical space more widely and efficiently. Chemical space is monumentally large, but recent advances in computer software and hardware have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures. This review specifically concerns collections of many millions or even billions of enumerated chemical structures as well as even larger chemical spaces that are not fully enumerated. We present examples of chemical libraries and spaces and the means used to construct them, and we discuss new technologies for searching huge libraries and for searching combinatorially in chemical space. We also cover space navigation techniques and consider new approaches to de novo drug design and the impact of the "autonomous laboratory" on synthesis of designed compounds. Finally, we summarize some other challenges and opportunities for the future.
Affiliation(s)
- Wendy A Warr
- Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Crewe, Cheshire CW4 7HZ, United Kingdom
| | - Marc C Nicklaus
- NCI, NIH, CADD Group, NCI-Frederick, Frederick, Maryland 21702, United States
| | - Christos A Nicolaou
- Discovery Chemistry, Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana 46285, United States
| | - Matthias Rarey
- Universität Hamburg, ZBH Center for Bioinformatics, 20146 Hamburg, Germany
| |
|
13
|
Pujol‐Giménez J, Poirier M, Bühlmann S, Schuppisser C, Bhardwaj R, Awale M, Visini R, Javor S, Hediger MA, Reymond J. Inhibitors of Human Divalent Metal Transporters DMT1 (SLC11A2) and ZIP8 (SLC39A8) from a GDB-17 Fragment Library. ChemMedChem 2021; 16:3306-3314. [PMID: 34309203 PMCID: PMC8596699 DOI: 10.1002/cmdc.202100467] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/02/2021] [Indexed: 11/06/2022]
Abstract
Solute carrier proteins (SLCs) are membrane proteins controlling fluxes across biological membranes and represent an emerging class of drug targets. Here we searched for inhibitors of divalent metal transporters in a library of 1,676 commercially available 3D-shaped fragment-like molecules from the generated database GDB-17, which lists all possible organic molecules up to 17 atoms of C, N, O, S and halogen following simple criteria for chemical stability and synthetic feasibility. While screening against DMT1 (SLC11A2), an iron transporter associated with hemochromatosis and for which only very few inhibitors are known, only yielded two weak inhibitors, our approach led to the discovery of the first inhibitor of ZIP8 (SLC39A8), a zinc transporter associated with manganese homeostasis and osteoarthritis but with no previously reported pharmacology, demonstrating that this target is druggable.
Affiliation(s)
- Jonai Pujol‐Giménez
- Department of Biomedical Research and Department of Nephrology and Hypertension, Membrane Transport Discovery Lab, Inselspital, Bern University Hospital, University of Bern, CH-3010 Bern, Switzerland
| | - Marion Poirier
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Sven Bühlmann
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Céline Schuppisser
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Rajesh Bhardwaj
- Department of Biomedical Research and Department of Nephrology and Hypertension, Membrane Transport Discovery Lab, Inselspital, Bern University Hospital, University of Bern, CH-3010 Bern, Switzerland
| | - Mahendra Awale
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Ricardo Visini
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Sacha Javor
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Matthias A. Hediger
- Department of Biomedical Research and Department of Nephrology and Hypertension, Membrane Transport Discovery Lab, Inselspital, Bern University Hospital, University of Bern, CH-3010 Bern, Switzerland
| | - Jean‐Louis Reymond
- Department of Chemistry, Biochemistry and Pharmaceutical Sciences, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
14
|
Marques G, Leswing K, Robertson T, Giesen D, Halls MD, Goldberg A, Marshall K, Staker J, Morisato T, Maeshima H, Arai H, Sasago M, Fujii E, Matsuzawa NN. De Novo Design of Molecules with Low Hole Reorganization Energy Based on a Quarter-Million Molecule DFT Screen. J Phys Chem A 2021; 125:7331-7343. [PMID: 34342466 DOI: 10.1021/acs.jpca.1c04587] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/30/2022]
Abstract
Materials exhibiting higher mobilities than conventional organic semiconducting materials such as fullerenes and fused thiophenes are in high demand for applications in printed electronics. To discover new molecules in the heteroacene family that might show improved hole mobility, three de novo design methods were applied. Machine learning (ML) models were generated based on previously calculated hole reorganization energies of a quarter million examples of heteroacenes, where the energies were calculated by applying density functional theory (DFT) and a massive cloud computing environment. The three generative methods applied were (1) the continuous space method, where molecular structures are converted into continuous variables by applying the variational autoencoder/decoder technique; (2) the method based on reinforcement learning of SMILES strings (the REINVENT method); and (3) the junction tree variational autoencoder method that directly generates molecular graphs. Among the three methods, the second and third methods succeeded in obtaining chemical structures whose DFT-calculated hole reorganization energy was lower than the lowest energy in the training dataset. This suggests that an extrapolative materials design protocol can be developed by applying generative modeling to a quantitative structure-property relationship (QSPR) utility function.
Affiliation(s)
- Gabriel Marques
- Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
| | - Karl Leswing
- Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
| | - Tim Robertson
- Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
| | - David Giesen
- Schrödinger Inc., 120 West 45th Street, 17th Floor, New York, New York 10036, United States
| | - Mathew D Halls
- Schrödinger Inc., 10201 Wateridge Circle, Suite 220, San Diego, California 92121, United States
| | - Alexander Goldberg
- Schrödinger Inc., 10201 Wateridge Circle, Suite 220, San Diego, California 92121, United States
| | - Kyle Marshall
- Schrödinger Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
| | - Joshua Staker
- Schrödinger Inc., 101 SW Main Street, Suite 1300, Portland, Oregon 97204, United States
| | - Tsuguo Morisato
- Schrödinger Inc., 13th Floor, Marunouchi Trust Tower North Building, 1-8-1 Marunouchi, Chiyoda-ku, Tokyo 100-0005, Japan
| | - Hiroyuki Maeshima
- Engineering Division, Industrial Solutions Company, Panasonic Corp., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Hideyuki Arai
- Engineering Division, Industrial Solutions Company, Panasonic Corp., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Masaru Sasago
- Engineering Division, Industrial Solutions Company, Panasonic Corp., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Eiji Fujii
- Engineering Division, Industrial Solutions Company, Panasonic Corp., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| | - Nobuyuki N Matsuzawa
- Engineering Division, Industrial Solutions Company, Panasonic Corp., 1006 Kadoma, Kadoma, Osaka 571-8506, Japan
| |
|
15
|
Meier K, Arús‐Pous J, Reymond J. A Potent and Selective Janus Kinase Inhibitor with a Chiral 3D‐Shaped Triquinazine Ring System from Chemical Space. Angew Chem Int Ed Engl 2021. [DOI: 10.1002/ange.202012049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Affiliation(s)
- Kris Meier
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Josep Arús‐Pous
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean‐Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
16
|
Meier K, Arús‐Pous J, Reymond J. A Potent and Selective Janus Kinase Inhibitor with a Chiral 3D‐Shaped Triquinazine Ring System from Chemical Space. Angew Chem Int Ed Engl 2020; 60:2074-2077. [DOI: 10.1002/anie.202012049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/03/2020] [Revised: 09/25/2020] [Indexed: 01/31/2023]
Affiliation(s)
- Kris Meier
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Josep Arús‐Pous
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| | - Jean‐Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Bern, Switzerland
| |
|
17
|
Berenger F, Yamanishi Y. Ranking Molecules with Vanishing Kernels and a Single Parameter: Active Applicability Domain Included. J Chem Inf Model 2020; 60:4376-4387. [PMID: 32281797 DOI: 10.1021/acs.jcim.9b01075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
Abstract
In ligand-based virtual screening, high-throughput screening (HTS) data sets can be exploited to train classification models. Such models can be used to prioritize yet untested molecules, from the most likely active (against a protein target of interest) to the least likely active. In this study, a single-parameter ranking method with an Applicability Domain (AD) is proposed. In effect, Kernel Density Estimates (KDE) are revisited to improve their computational efficiency and incorporate an AD. Two modifications are proposed: (i) using vanishing kernels (i.e., kernel functions with a finite support) and (ii) using the Tanimoto distance between molecular fingerprints as a radial basis function. This construction is termed "Vanishing Ranking Kernels" (VRK). Using VRK on 21 HTS assays, it is shown that VRK can compete in performance with a graph convolutional deep neural network. VRK are conceptually simple and fast to train. During training, they require optimizing a single parameter. A trained VRK model usually defines an active AD. Exploiting this AD can significantly increase the screening frequency of a VRK model. Software: https://github.com/UnixJunkie/rankers. Data sets: https://zenodo.org/record/1320776 and https://zenodo.org/record/3540423.
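The two modifications named in this abstract (a kernel with finite support, and the Tanimoto distance as its radial argument) can be illustrated with a short Python sketch. This is not the authors' implementation, which is available at the GitHub link above. The triangular kernel shape, the signed sum over actives and inactives, the `in_domain` flag, and the function names are all assumptions made for illustration. Fingerprints are modelled as sets of "on" bit indices.

```python
from typing import FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the bits set in a binary fingerprint


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - |a & b| / |a | b|; two empty fingerprints count as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def vanishing_kernel(distance: float, bandwidth: float) -> float:
    """Triangular kernel (assumed shape): exactly zero at or beyond the bandwidth."""
    if distance >= bandwidth:
        return 0.0
    return 1.0 - distance / bandwidth


def vrk_like_score(
    query: Fingerprint,
    training: List[Tuple[Fingerprint, bool]],
    bandwidth: float,
) -> Tuple[float, bool]:
    """Score a query molecule against labelled training fingerprints.

    Actives add their kernel value and inactives subtract it (an assumed
    scoring rule). The boolean reports whether any training molecule lies
    within the bandwidth, a simplified stand-in for an applicability domain.
    """
    total = 0.0
    in_domain = False
    for fingerprint, is_active in training:
        k = vanishing_kernel(tanimoto_distance(query, fingerprint), bandwidth)
        if k > 0.0:
            in_domain = True
            total += k if is_active else -k
    return total, in_domain
```

Because the kernel vanishes outside the bandwidth, a query far from every training molecule receives no contribution at all, which is what lets a single bandwidth parameter double as a domain boundary in this sketch.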
Affiliation(s)
- Francois Berenger
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, 680-4 Iizuka, Japan
| | - Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Kawazu, 680-4 Iizuka, Japan
| |
|
18
|
Poroikov VV. Computer-Aided Drug Design: from Discovery of Novel Pharmaceutical Agents to Systems Pharmacology. BIOCHEMISTRY (MOSCOW), SUPPLEMENT SERIES B: BIOMEDICAL CHEMISTRY 2020. [DOI: 10.1134/s1990750820030117] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 01/25/2023]
|
19
|
Poroikov VV. [Computer-aided drug design: from discovery of novel pharmaceutical agents to systems pharmacology]. BIOMEDIT︠S︡INSKAI︠A︡ KHIMII︠A︡ 2020; 66:30-41. [PMID: 32116224 DOI: 10.18097/pbmc20206601030] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/23/2022]
Abstract
New drug discovery is based on the analysis of public information about disease mechanisms, molecular targets, and ligands whose interaction with the target could normalize the pathological process. The available data on diseases, drugs, pharmacological effects, molecular targets, and drug-like substances, taking into account the combinatorics of the associative relations between them, constitute Big Data. Analyzing such data requires computer-aided drug design methods. An overview of the studies in this area performed by the Laboratory for Structure-Function Based Drug Design of IBMC is presented. We have developed approaches for identifying promising pharmacological targets, predicting several thousand types of biological activity from the structural formula of a compound, analyzing protein-ligand interactions based on the local similarity of amino acid sequences, identifying likely molecular mechanisms of drug side effects, calculating the integral toxicity of drugs taking into account their metabolism in the human body, predicting resistant and sensitive strain variants, and evaluating the effectiveness of combinations of antiretroviral drugs in patients, taking into account the molecular genetic characteristics of clinical isolates of HIV-1. Our computer programs are implemented as web services freely available on the Internet, which are used by thousands of researchers in many countries to select the most promising substances for synthesis and to determine priority areas for experimental testing of their biological activity.
Affiliation(s)
- V V Poroikov
- Institute of Biomedical Chemistry, Moscow, Russia
| |
|
20
|
Rosenstein JK, Rose C, Reda S, Weber PM, Kim E, Sello J, Geiser J, Kennedy E, Arcadia C, Dombroski A, Oakley K, Chen SL, Tann H, Rubenstein BM. Principles of Information Storage in Small-Molecule Mixtures. IEEE Trans Nanobioscience 2020; 19:378-384. [PMID: 32142450 DOI: 10.1109/tnb.2020.2977304] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 11/09/2022]
Abstract
Molecular data systems have the potential to store information at dramatically higher density than existing electronic media. Some of the first experimental demonstrations of this idea have used DNA, but nature also uses a wide diversity of smaller non-polymeric molecules to preserve, process, and transmit information. In this paper, we present a general framework for quantifying chemical memory, which is not limited to polymers and extends to mixtures of molecules of all types. We show that the theoretical limit for molecular information is two orders of magnitude denser by mass than DNA, although this comes with different practical constraints on total capacity. We experimentally demonstrate kilobyte-scale information storage in mixtures of small synthetic molecules, and we consider some of the new perspectives that will be necessary to harness the information capacity available from the vast non-genomic chemical space.
|
21
|
Bühlmann S, Reymond JL. ChEMBL-Likeness Score and Database GDBChEMBL. Front Chem 2020; 8:46. [PMID: 32117874 PMCID: PMC7010641 DOI: 10.3389/fchem.2020.00046] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 10/25/2019] [Accepted: 01/15/2020] [Indexed: 01/02/2023] Open
Abstract
The generated database GDB17 enumerates 166.4 billion molecules up to 17 atoms of C, N, O, S and halogens following simple rules of chemical stability and synthetic feasibility. However, most molecules in GDB17 are too complex to be considered for chemical synthesis. To address this limitation, we report GDBChEMBL as a subset of GDB17 featuring 10 million molecules selected according to a ChEMBL-likeness score (CLscore) calculated from the frequency of occurrence of circular substructures in ChEMBL, followed by uniform sampling across molecular size, stereocenters and heteroatoms. Compared to the previously reported subsets FDB17 and GDBMedChem selected from GDB17 by fragment-likeness, respectively, medicinal chemistry criteria, our new subset features molecules with higher synthetic accessibility and possibly bioactivity yet retains a broad and continuous coverage of chemical space typical of the entire GDB17. GDBChEMBL is accessible at http://gdb.unibe.ch for download and for browsing using an interactive chemical space map at http://faerun.gdb.tools.
Affiliation(s)
- Sven Bühlmann
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Bern, Switzerland
| |
|
22
|
Yang J, Wang D, Jia C, Wang M, Hao G, Yang G. Freely Accessible Chemical Database Resources of Compounds for In Silico Drug Discovery. Curr Med Chem 2020; 26:7581-7597. [PMID: 29737247 DOI: 10.2174/0929867325666180508100436] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/14/2017] [Revised: 01/26/2018] [Accepted: 04/18/2018] [Indexed: 11/22/2022]
Abstract
BACKGROUND In silico drug discovery has been proved to be a solidly established key component in early drug discovery. However, this task is hampered by the limitation of quantity and quality of compound databases for screening. In order to overcome these obstacles, freely accessible database resources of compounds have bloomed in recent years. Nevertheless, how to choose appropriate tools to treat these freely accessible databases is crucial. To the best of our knowledge, this is the first systematic review on this issue. OBJECTIVE The existed advantages and drawbacks of chemical databases were analyzed and summarized based on the collected six categories of freely accessible chemical databases from literature in this review. RESULTS Suggestions on how and in which conditions the usage of these databases could be reasonable were provided. Tools and procedures for building 3D structure chemical libraries were also introduced. CONCLUSION In this review, we described the freely accessible chemical database resources for in silico drug discovery. In particular, the chemical information for building chemical database appears as attractive resources for drug design to alleviate experimental pressure.
Affiliation(s)
- JingFang Yang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
| | - Di Wang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
| | - Chenyang Jia
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
| | - Mengyao Wang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
| | - GeFei Hao
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China
| | - GuangFu Yang
- Key Laboratory of Pesticide & Chemical Biology, Ministry of Education, College of Chemistry, Central China Normal University, Wuhan 430079, China.,Collaborative Innovation Center of Chemical Science and Engineering, Tianjin 300072, China
| |
|
23
|
Chen G, Shen Z, Iyer A, Ghumman UF, Tang S, Bi J, Chen W, Li Y. Machine-Learning-Assisted De Novo Design of Organic Molecules and Polymers: Opportunities and Challenges. Polymers (Basel) 2020; 12:E163. [PMID: 31936321 PMCID: PMC7023065 DOI: 10.3390/polym12010163] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/01/2019] [Revised: 12/27/2019] [Accepted: 01/02/2020] [Indexed: 12/18/2022] Open
Abstract
Organic molecules and polymers have a broad range of applications in biomedical, chemical, and materials science fields. Traditional design approaches for organic molecules and polymers are mainly experimentally-driven, guided by experience, intuition, and conceptual insights. Though they have been successfully applied to discover many important materials, these methods are facing significant challenges due to the tremendous demand for new materials and the vast design space of organic molecules and polymers. Accelerated and inverse materials design is an ideal solution to these challenges. With advancements in high-throughput computation, artificial intelligence (especially machine learning, ML), and the growth of materials databases, ML-assisted materials design is emerging as a promising tool to foster breakthroughs in many areas of materials science and engineering. To date, using ML-assisted approaches, the quantitative structure property/activity relation for material property prediction can be established more accurately and efficiently. In addition, materials design can be revolutionized and accelerated much faster than ever, through ML-enabled molecular generation and inverse molecular design. In this perspective, we review recent progress in ML-guided design of organic molecules and polymers, highlight several successful examples, and examine future opportunities in biomedical, chemical, and materials science fields. We further discuss the relevant challenges to solve in order to fully realize the potential of ML-assisted materials design for organic molecules and polymers. In particular, this study summarizes publicly available materials databases, feature representations for organic molecules, open-source tools for feature generation, methods for molecular generation, and ML models for prediction of material properties, which serve as a tutorial for researchers who have little prior experience with ML and want to apply it to various applications.
Last but not least, it draws insights into the current limitations of ML-guided design of organic molecules and polymers. We anticipate that ML-assisted materials design for organic molecules and polymers will be the driving force in the near future, to meet the tremendous demand of new materials with tailored properties in different fields.
Affiliation(s)
- Guang Chen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA; (G.C.); (Z.S.)
| | - Zhiqiang Shen
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA; (G.C.); (Z.S.)
| | - Akshay Iyer
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA; (A.I.); (U.F.G.)
| | - Umar Farooq Ghumman
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA; (A.I.); (U.F.G.)
| | - Shan Tang
- State Key Laboratory of Structural Analysis for Industrial Equipment, Department of Engineering Mechanics, and International Research Center for Computational Mechanics, Dalian University of Technology, Dalian 116023, China;
| | - Jinbo Bi
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA;
| | - Wei Chen
- Department of Mechanical Engineering, Northwestern University, Evanston, IL 60208, USA; (A.I.); (U.F.G.)
| | - Ying Li
- Department of Mechanical Engineering, University of Connecticut, Storrs, CT 06269, USA; (G.C.); (Z.S.)
- Polymer Program, Institute of Materials Science, University of Connecticut, Storrs, CT 06269, USA
| |
|
24
|
Arús-Pous J, Johansson SV, Prykhodko O, Bjerrum EJ, Tyrchan C, Reymond JL, Chen H, Engkvist O. Randomized SMILES strings improve the quality of molecular generative models. J Cheminform 2019; 11:71. [PMID: 33430971 PMCID: PMC6873550 DOI: 10.1186/s13321-019-0393-0] [Citation(s) in RCA: 120] [Impact Index Per Article: 24.0] [Received: 07/30/2019] [Accepted: 11/09/2019] [Indexed: 12/22/2022] Open
Abstract
Recurrent Neural Networks (RNNs) trained with a set of molecules represented as unique (canonical) SMILES strings, have shown the capacity to create large chemical spaces of valid and meaningful structures. Herein we perform an extensive benchmark on models trained with subsets of GDB-13 of different sizes (1 million, 10,000 and 1000), with different SMILES variants (canonical, randomized and DeepSMILES), with two different recurrent cell types (LSTM and GRU) and with different hyperparameter combinations. To guide the benchmarks new metrics were developed that define how well a model has generalized the training set. The generated chemical space is evaluated with respect to its uniformity, closedness and completeness. Results show that models that use LSTM cells trained with 1 million randomized SMILES, a non-unique molecular string representation, are able to generalize to larger chemical spaces than the other approaches and they represent more accurately the target chemical space. Specifically, a model was trained with randomized SMILES that was able to generate almost all molecules from GDB-13 with a quasi-uniform probability. Models trained with smaller samples show an even bigger improvement when trained with randomized SMILES models. Additionally, models were trained on molecules obtained from ChEMBL and illustrate again that training with randomized SMILES lead to models having a better representation of the drug-like chemical space. Namely, the model trained with randomized SMILES was able to generate at least double the amount of unique molecules with the same distribution of properties comparing to one trained with canonical SMILES.
Affiliation(s)
- Josep Arús-Pous
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden.
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland.
| | | | - Oleksii Prykhodko
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | | | - Christian Tyrchan
- Medicinal Chemistry, BioPharmaceuticals Early RIA, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| | - Ola Engkvist
- Hit Discovery, Discovery Sciences, R&D, AstraZeneca Gothenburg, Mölndal, Sweden
| |
|
25
|
Kausar S, Falcao AO. A visual approach for analysis and inference of molecular activity spaces. J Cheminform 2019; 11:63. [PMID: 33430986 PMCID: PMC6805449 DOI: 10.1186/s13321-019-0386-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/01/2018] [Accepted: 10/05/2019] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND Molecular space visualization can help to explore the diversity of large heterogeneous chemical data, which ultimately may increase the understanding of structure-activity relationships (SAR) in drug discovery projects. Visual SAR analysis can therefore be useful for library design, chemical classification for their biological evaluation and virtual screening for the selection of compounds for synthesis or in vitro testing. As such, computational approaches for molecular space visualization have become an important issue in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). This similarity matrix is transformed in 2D using different dimension reduction algorithms (Principal Coordinates Analysis ( PCooA), Kruskal multidimensional scaling, Sammon mapping and t-SNE). From this projection, a kernel density function is applied to compute the probability of activity for each coordinate in the new projected space. RESULTS This methodology was tested over four different quantitative structure-activity relationship (QSAR) binary classification data sets and the PSMAs were computed for each. The generated maps showed internal consistency with active molecules grouped together for all data sets and all dimensionality reduction algorithms. To validate the quality of the generated maps, the 2D coordinates of test molecules were computed into the new reference space using a data transformation matrix. In total sixteen PSMAs were built, and their performance was assessed using the Area Under Curve (AUC) and the Matthews Coefficient Correlation (MCC). For the best projections for each data set, AUC testing results ranged from 0.87 to 0.98 and the MCC scores ranged from 0.33 to 0.77, suggesting this methodology can validly capture the complexities of the molecular activity space. 
All four mapping functions provided generally good results yet the overall performance of PCooA and t-SNE was slightly better than Sammon mapping and Kruskal multidimensional scaling. CONCLUSIONS Our result showed that by using an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces it is possible to produce a visual PSMA for which its consistency has been validated by using this map as a classification model. The produced maps can be used as prediction tools as it is simple to project any molecule into this new reference space as long as the similarities to the molecules used to compute the initial similarity matrix can be computed.
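The last step described in this abstract, applying a kernel density function to the 2D projection to obtain a probability of activity at each coordinate, can be sketched in Python as follows. The abstract does not give the estimator, so the isotropic Gaussian kernel, the ratio of active to total kernel mass, the `bandwidth` parameter, and the function names are assumptions made for illustration. The dimension reduction step (PCooA, MDS, Sammon mapping or t-SNE) is taken as already done, with molecules given as 2D points.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # 2D coordinates of a projected molecule


def kernel_mass(point: Point, centers: List[Point], bandwidth: float) -> float:
    """Sum of isotropic Gaussian kernels centred on each projected molecule."""
    px, py = point
    return sum(
        math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2.0 * bandwidth ** 2))
        for x, y in centers
    )


def activity_probability(
    point: Point,
    actives: List[Point],
    inactives: List[Point],
    bandwidth: float = 1.0,
) -> Optional[float]:
    """Estimate the probability of activity at a map coordinate.

    Returns the share of kernel mass contributed by active molecules, or
    None when the point is so far from all data that both masses underflow
    to zero and no estimate is possible.
    """
    active_mass = kernel_mass(point, actives, bandwidth)
    total = active_mass + kernel_mass(point, inactives, bandwidth)
    if total == 0.0:
        return None
    return active_mass / total
```

Evaluating `activity_probability` on a regular grid of coordinates would give a surface of the kind the abstract calls a PSMA. Thresholding the value at a projected test molecule turns the map into a classifier, in line with how the abstract describes its validation.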
Affiliation(s)
- Samina Kausar
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal
| | - Andre O. Falcao
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciencias, Universidade de Lisboa, 1749-016 Lisboa, Portugal
| |
|
26
|
Zeng T, Liu Z, Liu H, He W, Tang X, Xie L, Wu R. Exploring Chemical and Biological Space of Terpenoids. J Chem Inf Model 2019; 59:3667-3678. [DOI: 10.1021/acs.jcim.9b00443] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
Affiliation(s)
- Tao Zeng
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, P.R. China
- Zhihong Liu
  - State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Guangdong Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, P.R. China
- Huawei Liu
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, P.R. China
- Wengan He
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, P.R. China
- Xiaowen Tang
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, P.R. China
- Liwei Xie
  - State Key Laboratory of Applied Microbiology Southern China, Guangdong Provincial Key Laboratory of Microbial Culture Collection and Application, Guangdong Open Laboratory of Applied Microbiology, Guangdong Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou 510070, P.R. China
- Ruibo Wu
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510006, P.R. China
27
Yang X, Wang Y, Byrne R, Schneider G, Yang S. Concepts of Artificial Intelligence for Computer-Assisted Drug Discovery. Chem Rev 2019; 119:10520-10594. [PMID: 31294972 DOI: 10.1021/acs.chemrev.8b00728] [Citation(s) in RCA: 346] [Impact Index Per Article: 69.2] [Indexed: 02/05/2023]
Abstract
Artificial intelligence (AI), and, in particular, deep learning as a subcategory of AI, provides opportunities for the discovery and development of innovative drugs. Various machine learning approaches have recently (re)emerged, some of which may be considered instances of domain-specific AI that have been successfully employed for drug discovery and design. This review provides a comprehensive portrayal of these machine learning techniques and of their applications in medicinal chemistry. After introducing the basic principles, alongside some application notes, of the various machine learning algorithms, the current state of the art of AI-assisted pharmaceutical discovery is discussed, including applications in structure- and ligand-based virtual screening, de novo drug design, physicochemical and pharmacokinetic property prediction, drug repurposing, and related aspects. Finally, several challenges and limitations of the current methods are summarized, with a view to potential future directions for AI-assisted drug discovery and design.
Affiliation(s)
- Xin Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Yifei Wang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Ryan Byrne
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Gisbert Schneider
  - ETH Zurich, Department of Chemistry and Applied Biosciences, Vladimir-Prelog-Weg 4, CH-8093 Zurich, Switzerland
- Shengyong Yang
  - State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
28
Pawar G, Madden JC, Ebbrell D, Firman JW, Cronin MTD. In Silico Toxicology Data Resources to Support Read-Across and (Q)SAR. Front Pharmacol 2019; 10:561. [PMID: 31244651 PMCID: PMC6580867 DOI: 10.3389/fphar.2019.00561] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 01/21/2019] [Accepted: 05/03/2019] [Indexed: 12/14/2022] Open
Abstract
A plethora of databases exist online that can assist in in silico chemical or drug safety assessment. However, a systematic review and grouping of databases, based on purpose and information content, consolidated in a single source, has been lacking. To resolve this issue, this review provides a comprehensive listing of the key in silico data resources relevant to: chemical identity and properties, drug action, toxicology (including nano-material toxicity), exposure, omics, pathways, Absorption, Distribution, Metabolism and Elimination (ADME) properties, clinical trials, pharmacovigilance, patents-related databases, biological (genes, enzymes, proteins, other macromolecules etc.) databases, protein-protein interactions (PPIs), environmental exposure related databases, and finally databases relating to animal alternatives in support of 3Rs policies. More than nine hundred databases were identified and reviewed against criteria relating to accessibility, data coverage, interoperability or application programming interface (API), appropriate identifiers, types of in vitro, in vivo, clinical or other data recorded, and suitability for modelling, read-across, or similarity searching. This review also specifically addresses the need for solutions for mapping and integration of databases into a common platform for better translatability of preclinical data to clinical data.
Affiliation(s)
- Mark T. D. Cronin
  - School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
29
Awale M, Sirockin F, Stiefl N, Reymond JL. Medicinal Chemistry Aware Database GDBMedChem. Mol Inform 2019; 38:e1900031. [PMID: 31169974 DOI: 10.1002/minf.201900031] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/26/2019] [Accepted: 05/21/2019] [Indexed: 12/17/2022]
Abstract
The generated database GDB17 enumerates 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogens following simple chemical stability and synthetic feasibility rules; however, medicinal chemistry criteria are not taken into account. Here we applied rules inspired by medicinal chemistry to exclude problematic functional groups and complex molecules from GDB17, and sampled the resulting subset uniformly across molecular size, stereochemistry and polarity to form GDBMedChem as a compact collection of 10 million small molecules. This collection has reduced complexity and better synthetic accessibility than the entire GDB17 but retains a higher sp3-carbon fraction and higher natural product likeness scores than known drugs. GDBMedChem molecules are more diverse and very different from known molecules in terms of substructures and represent an unprecedented source of diversity for drug design. GDBMedChem is available for 3D-visualization, similarity searching and for download at http://gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
- Finton Sirockin
  - Novartis Institutes for Biomedical Research, Basel, Switzerland
- Nikolaus Stiefl
  - Novartis Institutes for Biomedical Research, Basel, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
30
The next level in chemical space navigation: going far beyond enumerable compound libraries. Drug Discov Today 2019; 24:1148-1156. [PMID: 30851414 DOI: 10.1016/j.drudis.2019.02.013] [Citation(s) in RCA: 127] [Impact Index Per Article: 25.4] [Received: 11/19/2018] [Revised: 02/01/2019] [Accepted: 02/28/2019] [Indexed: 10/27/2022]
Abstract
Recent innovations have brought pharmacophore-driven methods for navigating virtual chemical spaces, the size of which can reach into the billions of molecules, to the fingertips of every chemist. There has been a paradigm shift in the underlying computational chemistry that drives chemical space search applications, incorporating intelligent reaction knowledge into their core so that they can readily deliver commercially available molecules as nearest neighbor hits from within giant virtual spaces. These vast resources enable medicinal chemists to execute rapid scaffold-hopping experiments, rapid hit expansion, and structure-activity relationship (SAR) exploitation in largely intellectual property (IP)-free territory and at unparalleled low cost.
31
Berenger F, Yamanishi Y. A Distance-Based Boolean Applicability Domain for Classification of High Throughput Screening Data. J Chem Inf Model 2019; 59:463-476. [PMID: 30567434 DOI: 10.1021/acs.jcim.8b00499] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 01/19/2023]
Abstract
In Quantitative Structure-Activity Relationship (QSAR) modeling, one must come up with an activity model but also with an applicability domain for that model. Some existing methods to create an applicability domain are complex, hard to implement, and/or difficult to interpret. Also, they often require the user to select a threshold value, or they embed an empirical constant. In this work, we propose a trivial to interpret and fully automatic Distance-Based Boolean Applicability Domain (DBBAD) algorithm for category QSAR. In retrospective experiments on High Throughput Screening data sets, this applicability domain improves the classification performance and early retrieval of support vector machine and random forest based classifiers, while improving the scaffold diversity among top-ranked active molecules.
Affiliation(s)
- Francois Berenger
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Japan
- Yoshihiro Yamanishi
  - Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Japan; PRESTO, Japan Science and Technology Agency, Kawaguchi, Saitama 332-0012, Japan
32
Manganese coordination compounds of mefenamic acid: In vitro screening and in silico prediction of biological activity. J Inorg Biochem 2019; 190:1-14. [DOI: 10.1016/j.jinorgbio.2018.09.017] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 08/29/2018] [Revised: 09/14/2018] [Accepted: 09/26/2018] [Indexed: 02/07/2023]
33
Awale M, Reymond JL. Polypharmacology Browser PPB2: Target Prediction Combining Nearest Neighbors with Machine Learning. J Chem Inf Model 2018; 59:10-17. [PMID: 30558418 DOI: 10.1021/acs.jcim.8b00524] [Citation(s) in RCA: 74] [Impact Index Per Article: 12.3] [Indexed: 02/06/2023]
Abstract
Here we report PPB2 as a target prediction tool assigning targets to a query molecule based on ChEMBL data. PPB2 computes ligand similarities using molecular fingerprints encoding composition (MQN), molecular shape and pharmacophores (Xfp), and substructures (ECfp4) and features an unprecedented combination of nearest neighbor (NN) searches and Naïve Bayes (NB) machine learning, together with simple NN searches, NB and Deep Neural Network (DNN) machine learning models as further options. Although NN(ECfp4) gives the best results in terms of recall in a 10-fold cross-validation study, combining NN searches with NB machine learning provides superior precision statistics, as well as better results in a case study predicting off-targets of a recently reported TRPV6 calcium channel inhibitor, illustrating the value of this combined approach. PPB2 is available to assess possible off-targets of small molecule drug-like compounds by public access at http://gdb.unibe.ch.
Affiliation(s)
- Mahendra Awale
  - Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
34
Dittrich J, Schmidt D, Pfleger C, Gohlke H. Converging a Knowledge-Based Scoring Function: DrugScore2018. J Chem Inf Model 2018; 59:509-521. [DOI: 10.1021/acs.jcim.8b00582] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 11/29/2022]
Affiliation(s)
- Jonas Dittrich
  - Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Denis Schmidt
  - Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Christopher Pfleger
  - Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Holger Gohlke
  - Mathematisch-Naturwissenschaftliche Fakultät, Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
  - John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC) & Institute for Complex Systems−Structural Biochemistry (ICS-6), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
35
Leveridge M, Chung CW, Gross JW, Phelps CB, Green D. Integration of Lead Discovery Tactics and the Evolution of the Lead Discovery Toolbox. SLAS DISCOVERY 2018; 23:881-897. [PMID: 29874524 DOI: 10.1177/2472555218778503] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/26/2022]
Abstract
There has been much debate around the success rates of various screening strategies to identify starting points for drug discovery. Although high-throughput target-based and phenotypic screening has been the focus of this debate, techniques such as fragment screening, virtual screening, and DNA-encoded library screening are also increasingly reported as a source of new chemical equity. Here, we provide examples in which integration of more than one screening approach has improved the campaign outcome and discuss how strengths and weaknesses of various methods can be used to build a complementary toolbox of approaches, giving researchers the greatest probability of successfully identifying leads. Among others, we highlight case studies for receptor-interacting serine/threonine-protein kinase 1 and the bromo- and extra-terminal domain family of bromodomains. In each example, the unique insight or chemistries individual approaches provided are described, emphasizing the synergy of information obtained from the various tactics employed and the particular question each tactic was employed to answer. We conclude with a short prospective discussing how screening strategies are evolving, what this screening toolbox might look like in the future, how to maximize success through integration of multiple tactics, and scenarios that drive selection of one combination of tactics over another.
Affiliation(s)
- Melanie Leveridge
  - GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
- Chun-Wa Chung
  - GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
- Jeffrey W Gross
  - GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Collegeville, PA, USA
- Christopher B Phelps
  - GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Cambridge, MA, USA
- Darren Green
  - GlaxoSmithKline Drug Design and Selection, Platform Technology and Science, Stevenage, Hertfordshire, UK
36
Filimonov D, Druzhilovskiy D, Lagunin A, Gloriozova T, Rudik A, Dmitriev A, Pogodin P, Poroikov V. Computer-aided prediction of biological activity spectra for chemical compounds: opportunities and limitation. ACTA ACUST UNITED AC 2018. [DOI: 10.18097/bmcrm00004] [Citation(s) in RCA: 77] [Impact Index Per Article: 12.8] [Indexed: 11/23/2022]
Abstract
An essential characteristic of chemical compounds is their biological activity since its presence can become the basis for the use of the substance for therapeutic purposes, or, on the contrary, limit the possibilities of its practical application due to the manifestation of side action and toxic effects. Computer assessment of the biological activity spectra makes it possible to determine the most promising directions for the study of the pharmacological action of particular substances, and to filter out potentially dangerous molecules at the early stages of research. For more than 25 years, we have been developing and improving the computer program PASS (Prediction of Activity Spectra for Substances), designed to predict the biological activity spectrum of substance based on the structural formula of its molecules. The prediction is carried out by the analysis of structure-activity relationships for the training set, which currently contains information on structures and known biological activities for more than one million molecules. The structure of the organic compound is represented in PASS using Multilevel Neighborhoods of Atoms descriptors; the activity prediction for new compounds is performed by the naive Bayes classifier and the structure-activity relationships determined by the analysis of the training set. We have created and improved both local versions of the PASS program and freely available web resources based on PASS (http://www.way2drug.com). 
They predict several thousand biological activities (pharmacological effects, molecular mechanisms of action, specific toxicity and adverse effects, interaction with the unwanted targets, metabolism and action on molecular transport), cytotoxicity for tumor and non-tumor cell lines, carcinogenicity, induced changes of gene expression profiles, metabolic sites of the major enzymes of the first and second phases of xenobiotics biotransformation, and belonging to substrates and/or metabolites of metabolic enzymes. The web resource Way2Drug is used by over 18,000 researchers from more than 90 countries around the world, which allowed them to obtain over 600,000 predictions and publish about 500 papers describing the obtained results. The analysis of the published works shows that in some cases the interpretation of the prediction results presented by the authors of these publications requires an adjustment. In this work, we provide the theoretical basis and consider, on particular examples, the opportunities and limitations of computer-aided prediction of biological activity spectra.
Affiliation(s)
- A.A. Lagunin
  - Institute of Biomedical Chemistry; Pirogov Russian National Research Medical University, Moscow, Russia
- A.V. Rudik
  - Institute of Biomedical Chemistry, Moscow, Russia
- P.V. Pogodin
  - Institute of Biomedical Chemistry, Moscow, Russia
37
O'Hagan S, Kell DB. Analysing and Navigating Natural Products Space for Generating Small, Diverse, But Representative Chemical Libraries. Biotechnol J 2017; 13. [PMID: 29168302 DOI: 10.1002/biot.201700503] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 09/26/2017] [Revised: 11/09/2017] [Indexed: 01/01/2023]
Abstract
Armed with the digital availability of two natural products libraries, amounting to some 195 885 molecular entities, we ask the question of how we can best sample from them to maximize their "representativeness" in smaller and more usable libraries of 96, 384, 1152, and 1920 molecules. The term "representativeness" is intended to include diversity, but for numerical reasons (and the likelihood of being able to perform a QSAR) it is necessary to focus on areas of chemical space that are more highly populated. Encoding chemical structures as fingerprints using the RDKit "patterned" algorithm, we first assess the granularity of the natural products space using a simple clustering algorithm, showing that there are major regions of "denseness" but also a great many very sparsely populated areas. We then apply a "hybrid" hierarchical K-means clustering algorithm to the data to produce more statistically robust clusters from which representative and appropriate numbers of samples may be chosen. There is necessarily again a trade-off between cluster size and cluster number, but within these constraints, libraries containing 384 or 1152 molecules can be found that come from clusters that represent some 18 and 30% of the whole chemical space, with cluster sizes of, respectively, 50 and 27 or above, just about sufficient to perform a QSAR. By using the online availability of molecules via the Molport system (www.molport.com), we are also able to construct (and, for the first time, provide the contents of) a small virtual library of available molecules that provided effective coverage of the chemical space described. Consistent with this, the average molecular similarities of the contents of the libraries developed is considerably smaller than is that of the original libraries. The suggested libraries may have use in molecular or phenotypic screening, including for determining possible transporter substrates.
Affiliation(s)
- Steve O'Hagan
  - Dr. S. O'Hagan, Prof. D. B. Kell, School of Chemistry, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK; Dr. S. O'Hagan, Prof. D. B. Kell, The Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
- Douglas B Kell
  - Dr. S. O'Hagan, Prof. D. B. Kell, School of Chemistry, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK; Dr. S. O'Hagan, Prof. D. B. Kell, The Manchester Institute of Biotechnology, The University of Manchester, 131 Princess St, Manchester M1 7DN, UK; Prof. D. B. Kell, Centre for the Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131 Princess St, Manchester M1 7DN, UK
38
Berenger F, Vu O, Meiler J. Consensus queries in ligand-based virtual screening experiments. J Cheminform 2017; 9:60. [PMID: 29185065 PMCID: PMC5705545 DOI: 10.1186/s13321-017-0248-5] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/20/2017] [Accepted: 11/20/2017] [Indexed: 11/10/2022] Open
Abstract
Background In ligand-based virtual screening experiments, a known active ligand is used in similarity searches to find putative active compounds for the same protein target. When there are several known active molecules, screening using all of them is more powerful than screening using a single ligand. A consensus query can be created by either screening serially with different ligands before merging the obtained similarity scores, or by combining the molecular descriptors (i.e. chemical fingerprints) of those ligands.
Results We report on the discriminative power and speed of several consensus methods, on two datasets made only of experimentally verified molecules. The two datasets contain a total of 19 protein targets, 3776 known active and ~2 × 10^6 inactive molecules. Three chemical fingerprints are investigated: MACCS 166 bits, ECFP4 2048 bits and an unfolded version of MOLPRINT2D. Four different consensus policies and five consensus sizes were benchmarked.
Conclusions The best consensus method is to rank candidate molecules using the maximum score obtained by each candidate molecule versus all known actives. When the number of actives used is small, the same screening performance can be approached by a consensus fingerprint. However, if the computational exploration of the chemical space is limited by speed (i.e. throughput), a consensus fingerprint can outperform this consensus of scores.
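The two kinds of consensus query named in this abstract can be sketched in a few lines. The example below assumes that fingerprints are represented as Python sets of "on" feature indices, that similarity is the Tanimoto coefficient, and that the consensus fingerprint is the union of the actives' features; none of these choices is stated in the abstract, and the function names are hypothetical.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient between two fingerprints given as sets of features."""
    if not fp_a and not fp_b:
        return 0.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def max_score(candidate, actives):
    """Consensus of scores: best similarity of the candidate to any known active."""
    return max(tanimoto(candidate, active) for active in actives)


def union_fingerprint(actives):
    """Consensus fingerprint (assumed policy: union of all actives' features)."""
    return set().union(*actives)


def rank_by_max_score(candidates, actives):
    """Return candidate names sorted from most to least similar to the actives."""
    return sorted(
        candidates,
        key=lambda name: max_score(candidates[name], actives),
        reverse=True,
    )


# Toy data: two known actives and three candidate molecules.
actives = [{1, 2, 3}, {4, 5, 6}]
candidates = {"a": {1, 2, 3, 7}, "b": {4, 9}, "c": {8}}

ranking = rank_by_max_score(candidates, actives)
consensus = union_fingerprint(actives)
score_vs_consensus = tanimoto(candidates["a"], consensus)
```

The max-score policy needs one comparison per known active for every candidate, whereas the consensus fingerprint needs a single comparison per candidate, which is the throughput trade-off the abstract points to.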
Affiliation(s)
- Francois Berenger
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA; Division of System Cohort, Medical Institute of Bioregulation, Kyushu University, Fukuoka, Japan
- Oanh Vu
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
39
Elkamhawy A, Paik S, Hassan AHE, Lee YS, Roh EJ. Hit discovery of 4-amino-N-(4-(3-(trifluoromethyl)phenoxy)pyrimidin-5-yl)benzamide: A novel EGFR inhibitor from a designed small library. Bioorg Chem 2017; 75:393-405. [PMID: 29102722 DOI: 10.1016/j.bioorg.2017.10.009] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 06/22/2017] [Revised: 10/10/2017] [Accepted: 10/22/2017] [Indexed: 12/14/2022]
Abstract
Searching for hit compounds within the huge chemical space resembles the attempt to find a needle in a haystack. Cheminformatics-guided selection of a few representative molecules from a rationally designed virtual combinatorial library is a powerful tool to confront this challenge, speed up hit identification and cut costs. Herein, this approach has been applied to identify hit compounds with novel scaffolds able to inhibit EGFR kinase. From a generated virtual library, six 4-aryloxy-5-aminopyrimidine scaffold-derived compounds were selected, synthesized and evaluated as hit EGFR inhibitors. 4-Aryloxy-5-benzamidopyrimidines inhibited EGFR with IC50 values of 1.05-5.37 μM. A cell-based assay of the most potent EGFR inhibitor hit (10ac) confirmed its cytotoxicity against different cancerous cells. Although no EGFR, HER2 or VEGFR1 inhibition was elicited by 4-aryloxy-5-(thio)ureidopyrimidine derivatives, cell-based evaluation suggested them as antiproliferative hits acting by other mechanism(s). A molecular docking study provided a plausible explanation for the inability of 4-aryloxy-5-(thio)ureidopyrimidines to inhibit EGFR and suggested a reasonable binding mode for 4-aryloxy-5-benzamidopyrimidines, which provides a basis for developing more optimized ligands.
Affiliation(s)
- Ahmed Elkamhawy
  - Chemical Kinomics Research Center, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea; Department of Pharmaceutical Organic Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura 35516, Egypt
- Sora Paik
  - Department of Fundamental Pharmaceutical Sciences, College of Pharmacy, Kyung Hee University, Seoul 02447, Republic of Korea
- Ahmed H E Hassan
  - Department of Medicinal Chemistry, Faculty of Pharmacy, Mansoura University, Mansoura 35516, Egypt; Medicinal Chemistry Laboratory, Department of Pharmacy, College of Pharmacy, Kyung Hee University, Seoul 02447, Republic of Korea; Department of Life and Nonopharmaceutical Science, College of Pharmacy, Kyung Hee University, Seoul 02447, Republic of Korea
- Yong Sup Lee
  - Department of Fundamental Pharmaceutical Sciences, College of Pharmacy, Kyung Hee University, Seoul 02447, Republic of Korea; Medicinal Chemistry Laboratory, Department of Pharmacy, College of Pharmacy, Kyung Hee University, Seoul 02447, Republic of Korea; Department of Life and Nonopharmaceutical Science, College of Pharmacy, Kyung Hee University, Seoul 02447, Republic of Korea
- Eun Joo Roh
  - Chemical Kinomics Research Center, Korea Institute of Science and Technology (KIST), Seoul 02792, Republic of Korea; Division of Bio-Medical Science & Technology, KIST School, Korea University of Science and Technology, Seoul 02792, Republic of Korea
40
Visini R, Arús-Pous J, Awale M, Reymond JL. Virtual Exploration of the Ring Systems Chemical Universe. J Chem Inf Model 2017; 57:2707-2718. [PMID: 29019686 DOI: 10.1021/acs.jcim.7b00457] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022]
Abstract
Here, we explore the chemical space of all virtually possible organic molecules focusing on ring systems, which represent the cyclic cores of organic molecules obtained by removing all acyclic bonds and converting all remaining atoms to carbon. This approach circumvents the combinatorial explosion encountered when enumerating the molecules themselves. We report the chemical universe database GDB4c containing 916 130 ring systems up to four saturated or aromatic rings and maximum ring size of 14 atoms and GDB4c3D containing the corresponding 6 555 929 stereoisomers. Almost all (98.6%) of these ring systems are unknown and represent chiral 3D-shaped macrocycles containing small rings and quaternary centers reminiscent of polycyclic natural products. We envision that GDB4c can serve to select new ring systems from which to design analogs of such natural products. The database is available for download at www.gdb.unibe.ch together with interactive visualization and search tools as a resource for molecular design.
Affiliation(s)
- Ricardo Visini
  - Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Josep Arús-Pous
  - Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Mahendra Awale
  - Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
- Jean-Louis Reymond
  - Department of Chemistry and Biochemistry, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
41
von Roemeling CA, Caulfield TR, Marlow L, Bok I, Wen J, Miller JL, Hughes R, Hazlehurst L, Pinkerton AB, Radisky DC, Tun HW, Kim YSB, Lane AL, Copland JA. Accelerated bottom-up drug design platform enables the discovery of novel stearoyl-CoA desaturase 1 inhibitors for cancer therapy. Oncotarget 2017; 9:3-20. [PMID: 29416592 PMCID: PMC5787466 DOI: 10.18632/oncotarget.21545] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 06/06/2017] [Accepted: 08/16/2017] [Indexed: 11/26/2022] Open
Abstract
Here we present an innovative computational-based drug discovery strategy, coupled with machine-based learning and functional assessment, for the rational design of novel small molecule inhibitors of the lipogenic enzyme stearoyl-CoA desaturase 1 (SCD1). Our methods resulted in the discovery of several unique molecules, of which our lead compound SSI-4 demonstrates potent anti-tumor activity, with an excellent pharmacokinetic and toxicology profile. We improve upon key characteristics, including chemoinformatics and absorption/distribution/metabolism/excretion (ADME) toxicity, while driving the IC50 to 0.6 nM in some instances. This approach to drug design can be executed in smaller research settings, applied to a wealth of other targets, and paves a path forward for bringing small-batch based drug programs into the Clinic.
Affiliation(s)
| | | | - Laura Marlow
- Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA
| | - Ilah Bok
- Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA
| | - Jiang Wen
- Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
| | - James L Miller
- Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA
| | - Robert Hughes
- Department of Chemistry, University of North Florida, Jacksonville, FL, USA
| | | | - Anthony B Pinkerton
- Conrad Prebys Center for Chemical Genomics, Sanford Burnham Medical Discovery Institute, La Jolla, CA, USA
| | - Derek C Radisky
- Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA
| | - Han W Tun
- Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA; Department of Hematology/Oncology, Mayo Clinic, Jacksonville, FL, USA
| | - Yon Son Betty Kim
- Department of Neuroscience, Mayo Clinic, Jacksonville, FL, USA; Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA; Department of Neurosurgery, Mayo Clinic, Jacksonville, FL, USA
| | - Amy L Lane
- Department of Chemistry, University of North Florida, Jacksonville, FL, USA
| | - John A Copland
- Department of Cancer Biology, Mayo Clinic, Jacksonville, FL, USA
| |
|
42
|
Molecular de-novo design through deep reinforcement learning. J Cheminform 2017; 9:48. [PMID: 29086083 PMCID: PMC5583141 DOI: 10.1186/s13321-017-0235-x] [Citation(s) in RCA: 482] [Impact Index Per Article: 68.9] [Received: 05/14/2017] [Accepted: 08/23/2017] [Indexed: 01/15/2023] Open
Abstract
This work introduces a method to tune a sequence-based generative model for molecular de novo design that, through augmented episodic likelihood, can learn to generate structures with certain specified desirable properties. We demonstrate how this model can execute a range of tasks such as generating analogues to a query structure and generating compounds predicted to be active against a biological target. As a proof of principle, the model is first trained to generate molecules that do not contain sulphur. As a second example, the model is trained to generate analogues to the drug Celecoxib, a technique that could be used for scaffold hopping or library expansion starting from a single molecule. Finally, when tuning the model towards generating compounds predicted to be active against the dopamine receptor type 2, the model generates structures of which more than 95% are predicted to be active, including experimentally confirmed actives that have not been included in either the generative model or the activity prediction model.
|
43
|
Kontijevskis A. Mapping of Drug-like Chemical Universe with Reduced Complexity Molecular Frameworks. J Chem Inf Model 2017; 57:680-699. [DOI: 10.1021/acs.jcim.7b00006] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/03/2023]
|
44
|
Abstract
To better understand chemical space, we recently enumerated the database GDB-17 containing 166.4 billion possible molecules up to 17 atoms of C, N, O, S and halogen following the simple rules of chemical stability and synthetic feasibility. However, due to the combinatorial explosion caused by systematic enumeration, GDB-17 is strongly biased toward the largest, functionally and stereochemically most complex molecules and is far too large for most virtual screening tools. Herein we selected a much smaller subset of GDB-17, called the fragment database FDB-17, which contains 10 million fragment-like molecules evenly covering a broad value range for molecular size, polarity, and stereochemical complexity. The database is available at www.gdb.unibe.ch for download and free use, together with an interactive visualization application and a Web-based nearest neighbor search tool to facilitate the selection of new fragment-sized molecules for chemical synthesis.
Affiliation(s)
- Ricardo Visini
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Berne, Switzerland
| | - Mahendra Awale
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
45
|
Pottel J, Moitessier N. Customizable Generation of Synthetically Accessible, Local Chemical Subspaces. J Chem Inf Model 2017; 57:454-467. [DOI: 10.1021/acs.jcim.6b00648] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 12/16/2022]
Affiliation(s)
- Joshua Pottel
- Department of Chemistry, McGill University, 801 Sherbrooke Street W., Montréal, Québec, Canada H3A 0B8
| | - Nicolas Moitessier
- Department of Chemistry, McGill University, 801 Sherbrooke Street W., Montréal, Québec, Canada H3A 0B8
| |
|
46
|
Awale M, Reymond JL. The polypharmacology browser: a web-based multi-fingerprint target prediction tool using ChEMBL bioactivity data. J Cheminform 2017; 9:11. [PMID: 28270862 PMCID: PMC5319934 DOI: 10.1186/s13321-017-0199-x] [Citation(s) in RCA: 64] [Impact Index Per Article: 9.1] [Received: 08/29/2016] [Accepted: 02/10/2017] [Indexed: 12/31/2022] Open
Abstract
Background: Several web-based tools have been reported recently which predict the possible targets of a small molecule by similarity to compounds of known bioactivity using molecular fingerprints (fps); however, predictions in each case rely on similarities computed from only one or two fps. Considering that structural similarity, and therefore the predicted targets, strongly depend on the method used for comparison, it would be highly desirable to predict targets using a broader set of fps simultaneously. Results: Herein, we present the polypharmacology browser (PPB), a web-based platform which predicts possible targets for small molecules by searching for nearest neighbors using ten different fps describing composition, substructures, molecular shape and pharmacophores. PPB searches through 4613 groups of at least 10 bioactive molecules from ChEMBL annotated with the same target and returns a list of predicted targets ranked by a consensus voting scheme and p value. A validation study across 670 drugs with up to 20 targets showed that combining the predictions from all 10 fps gives the best results, with on average 50% of the known targets of a drug being correctly predicted with a hit rate of 25%. Furthermore, when profiling a new inhibitor of the calcium channel TRPV6 against 24 targets taken from a safety screen panel, we observed inhibition in 5 out of 5 targets predicted by PPB and in 7 out of 18 targets not predicted by PPB. The rate of correct (5/12) and incorrect (0/12) predictions for this compound by PPB was comparable to that of other web-based prediction tools. Conclusion: PPB offers a versatile platform for target prediction based on multi-fingerprint comparisons, and is freely accessible at www.gdb.unibe.ch as a valuable support for drug discovery.
Affiliation(s)
- Mahendra Awale
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR Chemical Biology and NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, National Center of Competence in Research NCCR Chemical Biology and NCCR TransCure, University of Berne, Freiestrasse 3, 3012 Berne, Switzerland
| |
|
47
|
Fraaije JGEM, van Male J, Becherer P, Serral Gracià R. Coarse-Grained Models for Automated Fragmentation and Parametrization of Molecular Databases. J Chem Inf Model 2016; 56:2361-2377. [DOI: 10.1021/acs.jcim.6b00003] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/28/2022]
Affiliation(s)
- Johannes G. E. M. Fraaije
- Leiden Institute of Chemistry, Leiden University, Einsteinweg 55, 2300 RA Leiden, The Netherlands
- Culgi BV, Galileiweg 8, 2333 BD Leiden, The Netherlands
| | - Jan van Male
- Culgi BV, Galileiweg 8, 2333 BD Leiden, The Netherlands
| | - Paul Becherer
- Culgi BV, Galileiweg 8, 2333 BD Leiden, The Netherlands
| | | |
|
48
|
Lewis R, Deheuvels J, Ertl P, Pirard B, Sirockin F. Building Compound Archives for the Future. Mol Inform 2016; 35:580-582. [PMID: 27870238 DOI: 10.1002/minf.201600042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2016] [Accepted: 05/02/2016] [Indexed: 11/11/2022]
Abstract
Will the targets of the future be covered by the compound libraries of today? This communication will cover a critical review of past strategies before turning to a new measure of diversity, protein pockets. A fingerprint descriptor for pockets will be described.
|
49
|
Druzhilovskiy DS, Rudik AV, Filimonov DA, Lagunin AA, Gloriozova TA, Poroikov VV. Online resources for the prediction of biological activity of organic compounds. Russ Chem Bull 2016. [DOI: 10.1007/s11172-016-1310-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 10/20/2022]
|
50
|
Tetko IV, Engkvist O, Koch U, Reymond JL, Chen H. BIGCHEM: Challenges and Opportunities for Big Data Analysis in Chemistry. Mol Inform 2016; 35:615-621. [PMID: 27464907 PMCID: PMC5129546 DOI: 10.1002/minf.201600073] [Citation(s) in RCA: 68] [Impact Index Per Article: 8.5] [Received: 05/19/2016] [Accepted: 07/06/2016] [Indexed: 01/19/2023]
Abstract
The increasing volume of biomedical data in chemistry and life sciences requires the development of new methods and approaches for their handling. Here, we briefly discuss some challenges and opportunities of this fast-growing area of research with a focus on those to be addressed within the BIGCHEM project. The article starts with a brief description of some available resources for “Big Data” in chemistry and a discussion of the importance of data quality. We then discuss challenges with visualization of millions of compounds by combining chemical and biological data, the expectations from mining the “Big Data” using advanced machine‐learning methods, and their applications in polypharmacology prediction and target de‐convolution in phenotypic screening. We show that the efficient exploration of billions of molecules requires the development of smart strategies. We also address the issue of secure information sharing without disclosing chemical structures, which is critical to enable bi‐party or multi‐party data sharing. Data sharing is important in the context of the recent trend of “open innovation” in the pharmaceutical industry, which has led not only to more information sharing among academics and pharma industries but also to the so‐called “precompetitive” collaboration between pharma companies. At the end, we highlight the importance of education in “Big Data” for further progress in this area.
Affiliation(s)
- Igor V Tetko
- Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH), Institute of Structural Biology, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany; BIGCHEM GmbH, Ingolstädter Landstraße 1, b. 60w, D-85764, Neuherberg, Germany
| | - Ola Engkvist
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
| | - Uwe Koch
- Lead Discovery Center GmbH, Otto-Hahn Strasse 15, Dortmund, 44227, Germany
| | - Jean-Louis Reymond
- Department of Chemistry and Biochemistry, University of Bern, Freiestrasse 3, 3012, Bern, Switzerland
| | - Hongming Chen
- Discovery Sciences, AstraZeneca R&D Gothenburg, Pepparedsleden 1, Mölndal, SE-43183, Sweden
| |
|