1
Wang L, Song Y, Wang H, Zhang X, Wang M, He J, Li S, Zhang L, Li K, Cao L. Advances of Artificial Intelligence in Anti-Cancer Drug Design: A Review of the Past Decade. Pharmaceuticals (Basel) 2023; 16:253. [PMID: 37259400 PMCID: PMC9963982 DOI: 10.3390/ph16020253] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 12/29/2022] [Revised: 01/25/2023] [Accepted: 02/06/2023] [Indexed: 10/13/2023]
Abstract
Anti-cancer drug design has been acknowledged as a complicated, expensive, time-consuming, and challenging task. How to reduce the research costs and speed up the development process of anti-cancer drug designs has become a challenging and urgent question for the pharmaceutical industry. Computer-aided drug design methods have played a major role in the development of cancer treatments for over three decades. Recently, artificial intelligence has emerged as a powerful and promising technology for faster, cheaper, and more effective anti-cancer drug designs. This study is a narrative review that reviews a wide range of applications of artificial intelligence-based methods in anti-cancer drug design. We further clarify the fundamental principles of these methods, along with their advantages and disadvantages. Furthermore, we collate a large number of databases, including the omics database, the epigenomics database, the chemical compound database, and drug databases. Other researchers can consider them and adapt them to their own requirements.
Affiliation(s)
- Kang Li
- Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China
- Lei Cao
- Department of Biostatistics, School of Public Health, Harbin Medical University, Harbin 150081, China
2
Moradi S, Kundu S, Saidaminov MI. High-Throughput Synthesis of Thin Films for the Discovery of Energy Materials: A Perspective. ACS Materials Au 2022; 2:516-524. [PMID: 36124002 PMCID: PMC9479136 DOI: 10.1021/acsmaterialsau.2c00028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Thin films are an integral part of many electronic and optoelectronic devices. They also provide an excellent platform for material characterization. Therefore, strategies for the fabrication of thin films are constantly developed and have significantly benefited from the advent of high-throughput synthesis (HTS) platforms. This perspective summarizes recent advances in HTS of thin films from experimentalists’ point of view. The work analyzes general strategies of HTS and then discusses their use in developing new energy materials for applications that rely on thin films, such as solar cells, light-emitting diodes, batteries, superconductors, and thermoelectrics. The perspective also summarizes some key challenges and opportunities in the HTS of thin films.
Affiliation(s)
- Shahram Moradi
- Department of Electrical & Computer Engineering, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Soumya Kundu
- Department of Chemistry, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Makhsud I. Saidaminov
- Department of Electrical & Computer Engineering, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Department of Chemistry, University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
- Centre for Advanced Materials and Related Technologies (CAMTEC), University of Victoria, 3800 Finnerty Road, Victoria, British Columbia V8P 5C2, Canada
3
Chee Wezen X, Chandran A, Eapen RS, Waters E, Bricio-Moreno L, Tosi T, Dolan S, Millership C, Kadioglu A, Gründling A, Itzhaki LS, Welch M, Rahman T. Structure-Based Discovery of Lipoteichoic Acid Synthase Inhibitors. J Chem Inf Model 2022; 62:2586-2599. [PMID: 35533315 PMCID: PMC9131456 DOI: 10.1021/acs.jcim.2c00300] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 03/22/2022] [Indexed: 01/20/2023]
Abstract
Lipoteichoic acid synthase (LtaS) is a key enzyme for the cell wall biosynthesis of Gram-positive bacteria. Gram-positive bacteria that lack lipoteichoic acid (LTA) exhibit impaired cell division and growth defects. Thus, LtaS appears to be an attractive antimicrobial target. The pharmacology around LtaS remains largely unexplored with only two small-molecule LtaS inhibitors reported, namely "compound 1771" and the Congo red dye. Structure-based drug discovery efforts against LtaS remain unattempted due to the lack of an inhibitor-bound structure of LtaS. To address this, we combined the use of a molecular docking technique with molecular dynamics (MD) simulations to model a plausible binding mode of compound 1771 to the extracellular catalytic domain of LtaS (eLtaS). The model was validated using alanine mutagenesis studies combined with isothermal titration calorimetry. Additionally, lead optimization driven by our computational model resulted in an improved version of compound 1771, namely, compound 4 which showed greater affinity for binding to eLtaS than compound 1771 in biophysical assays. Compound 4 reduced LTA production in S. aureus dose-dependently, induced aberrant morphology as seen for LTA-deficient bacteria, and significantly reduced bacteria titers in the lung of mice infected with S. aureus. Analysis of our MD simulation trajectories revealed the possible formation of a transient cryptic pocket in eLtaS. Virtual screening (VS) against the cryptic pocket led to the identification of a new class of inhibitors that could potentiate β-lactams against methicillin-resistant S. aureus. Our overall workflow and data should encourage further drug design campaign against LtaS. Finally, our work reinforces the importance of considering protein conformational flexibility to a successful VS endeavor.
Affiliation(s)
- Xavier Chee Wezen
- Science Program, School of Chemical Engineering and Science, Faculty of Engineering, Computing and Science, Swinburne University of Technology Sarawak, Kuching 93350, Malaysia
- Aneesh Chandran
- Department of Biotechnology & Microbiology, Kannur University, Kannur 670 661, Kerala, India
- Elaine Waters
- Department of Clinical Infection Microbiology and Immunology, Institute of Infection and Global Health, University of Liverpool, Liverpool L69 7BE, U.K.
- Laura Bricio-Moreno
- Department of Clinical Infection Microbiology and Immunology, Institute of Infection and Global Health, University of Liverpool, Liverpool L69 7BE, U.K.
- Tommaso Tosi
- Section of Molecular Microbiology and MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London SW7 2AZ, U.K.
- Stephen Dolan
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, U.K.
- Charlotte Millership
- Section of Molecular Microbiology and MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London SW7 2AZ, U.K.
- Aras Kadioglu
- Department of Clinical Infection Microbiology and Immunology, Institute of Infection and Global Health, University of Liverpool, Liverpool L69 7BE, U.K.
- Angelika Gründling
- Section of Molecular Microbiology and MRC Centre for Molecular Bacteriology and Infection, Imperial College London, London SW7 2AZ, U.K.
- Laura S. Itzhaki
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, U.K.
- Martin Welch
- Department of Biochemistry, University of Cambridge, Cambridge CB2 1QW, U.K.
- Taufiq Rahman
- Department of Pharmacology, University of Cambridge, Cambridge CB2 1PD, U.K.
4
Clyde A. Ultrahigh Throughput Protein-Ligand Docking with Deep Learning. Methods Mol Biol 2022; 2390:301-319. [PMID: 34731475 DOI: 10.1007/978-1-0716-1787-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Ultrahigh-throughput virtual screening (uHTVS) is an emerging field linking together classical docking techniques with high-throughput AI methods. We outline mechanistic docking models' goals and successes. We present different AI accelerated workflows for uHTVS, mainly through surrogate docking models. We showcase a novel feature representation technique, molecular depictions (images), as a surrogate model for docking. Along with a discussion on analyzing screens using regression enrichment surfaces at the tens of billion scale, we outline a future for uHTVS screening pipelines with deep learning.
Affiliation(s)
- Austin Clyde
- Department of Computer Science, University of Chicago, Chicago, IL, USA.
- Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, USA.
5
Mathai N, Stork C, Kirchmair J. BonMOLière: Small-Sized Libraries of Readily Purchasable Compounds, Optimized to Produce Genuine Hits in Biological Screens across the Protein Space. Int J Mol Sci 2021; 22:7773. [PMID: 34360558 PMCID: PMC8346018 DOI: 10.3390/ijms22157773] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/24/2021] [Revised: 07/13/2021] [Accepted: 07/15/2021] [Indexed: 12/21/2022]
Abstract
Experimental screening of large sets of compounds against macromolecular targets is a key strategy to identify novel bioactivities. However, large-scale screening requires substantial experimental resources and is time-consuming and challenging. Therefore, small to medium-sized compound libraries with a high chance of producing genuine hits on an arbitrary protein of interest would be of great value to fields related to early drug discovery, in particular biochemical and cell research. Here, we present a computational approach that incorporates drug-likeness, predicted bioactivities, biological space coverage, and target novelty, to generate optimized compound libraries with maximized chances of producing genuine hits for a wide range of proteins. The computational approach evaluates drug-likeness with a set of established rules, predicts bioactivities with a validated, similarity-based approach, and optimizes the composition of small sets of compounds towards maximum target coverage and novelty. We found that, in comparison to the random selection of compounds for a library, our approach generates substantially improved compound sets. Quantified as the "fitness" of compound libraries, the calculated improvements ranged from +60% (for a library of 15,000 compounds) to +184% (for a library of 1000 compounds). The best of the optimized compound libraries prepared in this work are available for download as a dataset bundle ("BonMOLière").
Affiliation(s)
- Neann Mathai
- Computational Biology Unit (CBU) and Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Conrad Stork
- Center for Bioinformatics (ZBH), Department of Informatics, Universität Hamburg, 20146 Hamburg, Germany
- Johannes Kirchmair
- Computational Biology Unit (CBU) and Department of Chemistry, University of Bergen, N-5020 Bergen, Norway
- Division of Pharmaceutical Chemistry, Department of Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria
6
Sakai M, Nagayasu K, Shibui N, Andoh C, Takayama K, Shirakawa H, Kaneko S. Prediction of pharmacological activities from chemical structures with graph convolutional neural networks. Sci Rep 2021; 11:525. [PMID: 33436854 PMCID: PMC7803991 DOI: 10.1038/s41598-020-80113-7] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 07/14/2020] [Accepted: 12/17/2020] [Indexed: 01/29/2023]
Abstract
Many therapeutic drugs are compounds that can be represented by simple chemical structures, which contain important determinants of affinity at the site of action. Recently, graph convolutional neural network (GCN) models have exhibited excellent results in classifying the activity of such compounds. For models that make quantitative predictions of activity, more complex information has been utilized, such as the three-dimensional structures of compounds and the amino acid sequences of their respective target proteins. As another approach, we hypothesized that if sufficient experimental data were available and there were enough nodes in hidden layers, a simple compound representation would quantitatively predict activity with satisfactory accuracy. In this study, we report that GCN models constructed solely from the two-dimensional structural information of compounds demonstrated a high degree of activity predictability against 127 diverse targets from the ChEMBL database. Using the information entropy as a metric, we also show that the structural diversity had less effect on the prediction performance. Finally, we report that virtual screening using the constructed model identified a new serotonin transporter inhibitor with activity comparable to that of a marketed drug in vitro and exhibited antidepressant effects in behavioural studies.
Affiliation(s)
- Miyuki Sakai
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Medical Database Ltd., 2-5-5 Sumitomoshibadaimon building, Shibadaimon, Minato-ku, Tokyo, 105-0012 Japan
- Kazuki Nagayasu
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Norihiro Shibui
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Chihiro Andoh
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Kaito Takayama
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Hisashi Shirakawa
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
- Shuji Kaneko
- Department of Molecular Pharmacology, Graduate School of Pharmaceutical Sciences, Kyoto University, 46-29 Yoshida-Shimoadachi-cho, Sakyo-ku, Kyoto, 606-8501 Japan
7
Škuta C, Cortés-Ciriano I, Dehaen W, Kříž P, van Westen GJP, Tetko IV, Bender A, Svozil D. QSAR-derived affinity fingerprints (part 1): fingerprint construction and modeling performance for similarity searching, bioactivity classification and scaffold hopping. J Cheminform 2020; 12:39. [PMID: 33431038 PMCID: PMC7260783 DOI: 10.1186/s13321-020-00443-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 08/07/2019] [Accepted: 05/16/2020] [Indexed: 02/11/2023]
Abstract
An affinity fingerprint is the vector consisting of compound’s affinity or potency against the reference panel of protein targets. Here, we present the QAFFP fingerprint, 440 elements long in silico QSAR-based affinity fingerprint, components of which are predicted by Random Forest regression models trained on bioactivity data from the ChEMBL database. Both real-valued (rv-QAFFP) and binary (b-QAFFP) versions of the QAFFP fingerprint were implemented and their performance in similarity searching, biological activity classification and scaffold hopping was assessed and compared to that of the 1024 bits long Morgan2 fingerprint (the RDKit implementation of the ECFP4 fingerprint). In both similarity searching and biological activity classification, the QAFFP fingerprint yields retrieval rates, measured by AUC (~ 0.65 and ~ 0.70 for similarity searching depending on data sets, and ~ 0.85 for classification) and EF5 (~ 4.67 and ~ 5.82 for similarity searching depending on data sets, and ~ 2.10 for classification), comparable to that of the Morgan2 fingerprint (similarity searching AUC of ~ 0.57 and ~ 0.66, and EF5 of ~ 4.09 and ~ 6.41, depending on data sets, classification AUC of ~ 0.87, and EF5 of ~ 2.16). However, the QAFFP fingerprint outperforms the Morgan2 fingerprint in scaffold hopping as it is able to retrieve 1146 out of existing 1749 scaffolds, while the Morgan2 fingerprint reveals only 864 scaffolds.
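To make the similarity-searching and scaffold-hopping figures in this abstract concrete, the sketch below ranks a toy library against a query by Tanimoto similarity over binary fingerprints, then converts the reported scaffold counts into fractions. The abstract does not name the similarity measure, so Tanimoto is an assumption here; the fingerprints, compound names, and helper functions are hypothetical. Only the counts 1146, 864, and 1749 come from the abstract.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_similarity(query: set[int], library: dict[str, set[int]]) -> list[str]:
    """Return library compound names, most similar to the query first."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]), reverse=True)


# Hypothetical binary fingerprints, stored as the indices of their set bits.
query = {1, 4, 7, 9}
library = {
    "cpd_a": {1, 4, 7, 8},
    "cpd_b": {2, 3},
    "cpd_c": {1, 4, 7, 9},
}
ranking = rank_by_similarity(query, library)

# Scaffold-hopping counts reported in the abstract, expressed as fractions.
total_scaffolds = 1749
qaffp_fraction = 1146 / total_scaffolds
morgan2_fraction = 864 / total_scaffolds
print(ranking, round(qaffp_fraction, 3), round(morgan2_fraction, 3))
```

Read this way, the reported counts correspond to roughly 66% of scaffolds retrieved with QAFFP against roughly 49% with Morgan2.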
Affiliation(s)
- C Škuta
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- I Cortés-Ciriano
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- W Dehaen
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- P Kříž
- Department of Mathematics, Faculty of Chemical Engineering, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- G J P van Westen
- Computational Drug Discovery, Drug Discovery and Safety, LACDR, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands
- I V Tetko
- Helmholtz Zentrum Muenchen - German Research Center for Environmental Health (GmbH) and BIGCHEM GmbH, Ingolstaedter Landstrasse 1, 85764, Neuherberg, Germany
- A Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, UK
- D Svozil
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Institute of Molecular Genetics of the ASCR, v. v. i., Vídeňská 1083, 142 20, Prague 4, Czech Republic
- CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
8
Willems H, De Cesco S, Svensson F. Computational Chemistry on a Budget: Supporting Drug Discovery with Limited Resources. J Med Chem 2020; 63:10158-10169. [DOI: 10.1021/acs.jmedchem.9b02126] [Citation(s) in RCA: 28] [Impact Index Per Article: 7.0] [Indexed: 12/16/2022]
Affiliation(s)
- Henriëtte Willems
- The ALBORADA Drug Discovery Institute, University of Cambridge, Island Research Building, Cambridge Biomedical Campus, Hills Road, Cambridge CB2 0AH, U.K.
- Stephane De Cesco
- Alzheimer’s Research UK Oxford Drug Discovery Institute, University of Oxford, NDM Research Building, Old Road Campus, Roosevelt Drive, Oxford OX3 7FZ, U.K.
- Fredrik Svensson
- Alzheimer’s Research UK UCL Drug Discovery Institute, University College London, The Cruciform Building, Gower Street, London WC1E 6BT, U.K.
9
Singh N, Chaput L, Villoutreix BO. Virtual screening web servers: designing chemical probes and drug candidates in the cyberspace. Brief Bioinform 2020; 22:1790-1818. [PMID: 32187356 PMCID: PMC7986591 DOI: 10.1093/bib/bbaa034] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 12/13/2022]
Abstract
The interplay between life sciences and advancing technology drives a continuous cycle of chemical data growth; these data are most often stored in open or partially open databases. In parallel, many different types of algorithms are being developed to manipulate these chemical objects and associated bioactivity data. Virtual screening methods are among the most popular computational approaches in pharmaceutical research. Today, user-friendly web-based tools are available to help scientists perform virtual screening experiments. This article provides an overview of internet resources enabling and supporting chemical biology and early drug discovery with a main emphasis on web servers dedicated to virtual ligand screening and small-molecule docking. This survey first introduces some key concepts and then presents recent and easily accessible virtual screening and related target-fishing tools as well as briefly discusses case studies enabled by some of these web services. Notwithstanding further improvements, already available web-based tools not only contribute to the design of bioactive molecules and assist drug repositioning but also help to generate new ideas and explore different hypotheses in a timely fashion while contributing to teaching in the field of drug development.
Affiliation(s)
- Natesh Singh
- Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Ludovic Chaput
- Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Bruno O Villoutreix
- Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
10
Méndez-Lucio O, Baillif B, Clevert DA, Rouquié D, Wichard J. De novo generation of hit-like molecules from gene expression signatures using artificial intelligence. Nat Commun 2020; 11:10. [PMID: 31900408 PMCID: PMC6941972 DOI: 10.1038/s41467-019-13807-w] [Citation(s) in RCA: 176] [Impact Index Per Article: 44.0] [Received: 11/14/2018] [Accepted: 11/27/2019] [Indexed: 01/20/2023]
Abstract
Finding new molecules with a desired biological activity is an extremely difficult task. In this context, artificial intelligence and generative models have been used for molecular de novo design and compound optimization. Herein, we report a generative model that bridges systems biology and molecular design, conditioning a generative adversarial network with transcriptomic data. By doing so, we can automatically design molecules that have a high probability to induce a desired transcriptomic profile. As long as the gene expression signature of the desired state is provided, this model is able to design active-like molecules for desired targets without any previous target annotation of the training compounds. Molecules designed by this model are more similar to active compounds than the ones identified by similarity of gene expression signatures. Overall, this method represents an alternative approach to bridge chemistry and biology in the long and difficult road of drug discovery.
Affiliation(s)
- Oscar Méndez-Lucio
- Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France
- Bloomoon, 13 Avenue Albert Einstein, 69100, Villeurbanne, France
- Benoit Baillif
- Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France
- Djork-Arné Clevert
- Department of Machine Learning Research, Bayer AG, 13353, Berlin, Germany
- David Rouquié
- Bayer SAS, Bayer Crop Science, 355 rue Dostoïevski, CS 90153, 06906, Valbonne, Sophia Antipolis Cedex, France
- Joerg Wichard
- Department of Genetic Toxicology, Bayer AG, 13353, Berlin, Germany
11
Zhang Y, Lee AA. Bayesian semi-supervised learning for uncertainty-calibrated prediction of molecular properties and active learning. Chem Sci 2019; 10:8154-8163. [PMID: 31857882 PMCID: PMC6837061 DOI: 10.1039/c9sc00616h] [Citation(s) in RCA: 55] [Impact Index Per Article: 11.0] [Received: 02/03/2019] [Accepted: 07/04/2019] [Indexed: 12/16/2022]
Abstract
We report a statistically principled method to quantify the uncertainty of machine learning models for molecular properties prediction. We show that this uncertainty estimate can be used to judiciously design experiments.
Predicting bioactivity and physical properties of small molecules is a central challenge in drug discovery. Deep learning is becoming the method of choice but studies to date focus on mean accuracy as the main metric. However, to replace costly and mission-critical experiments by models, a high mean accuracy is not enough: outliers can derail a discovery campaign, thus models need to reliably predict when it will fail, even when the training data is biased; experiments are expensive, thus models need to be data-efficient and suggest informative training sets using active learning. We show that uncertainty quantification and active learning can be achieved by Bayesian semi-supervised graph convolutional neural networks. The Bayesian approach estimates uncertainty in a statistically principled way through sampling from the posterior distribution. Semi-supervised learning disentangles representation learning and regression, keeping uncertainty estimates accurate in the low data limit and allowing the model to start active learning from a small initial pool of training data. Our study highlights the promise of Bayesian deep learning for chemistry.
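The link between posterior sampling and active learning described in this abstract can be sketched in a few lines. This is only an illustration of uncertainty-based selection, not the authors' Bayesian graph convolutional model. The per-molecule lists stand in for predictions drawn from a posterior, and the molecule names, values, and function names are all hypothetical.

```python
from statistics import mean, pvariance

# Hypothetical posterior samples: several predicted activities per unlabeled molecule.
posterior_samples = {
    "mol_1": [0.50, 0.52, 0.49, 0.51],
    "mol_2": [0.10, 0.90, 0.40, 0.70],
    "mol_3": [0.80, 0.75, 0.85, 0.78],
}


def summarize(samples):
    """Map each molecule to (predictive mean, predictive variance)."""
    return {name: (mean(values), pvariance(values)) for name, values in samples.items()}


def select_most_uncertain(samples, k=1):
    """Pick the k molecules whose predictions disagree most across samples."""
    stats = summarize(samples)
    return sorted(stats, key=lambda name: stats[name][1], reverse=True)[:k]


# The molecule chosen here would be sent for measurement and added to the training set.
next_batch = select_most_uncertain(posterior_samples, k=1)
print(next_batch)
```

Variance across samples is one simple acquisition criterion; other criteria could be swapped in without changing the overall loop.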
Affiliation(s)
- Yao Zhang
- Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK
- Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 0WA, UK
- Alpha A Lee
- Cavendish Laboratory, University of Cambridge, Cambridge CB3 0HE, UK
12
Bolgár B, Antal P. VB-MK-LMF: fusion of drugs, targets and interactions using variational Bayesian multiple kernel logistic matrix factorization. BMC Bioinformatics 2017; 18:440. [PMID: 28978313 PMCID: PMC5628496 DOI: 10.1186/s12859-017-1845-z] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 12/20/2022]
Abstract
BACKGROUND Computational fusion approaches to drug-target interaction (DTI) prediction, capable of utilizing multiple sources of background knowledge, were reported to achieve superior predictive performance in multiple studies. Other studies showed that specificities of the DTI task, such as weighting the observations and focusing the side information are also vital for reaching top performance. METHOD We present Variational Bayesian Multiple Kernel Logistic Matrix Factorization (VB-MK-LMF), which unifies the advantages of (1) multiple kernel learning, (2) weighted observations, (3) graph Laplacian regularization, and (4) explicit modeling of probabilities of binary drug-target interactions. RESULTS VB-MK-LMF achieves significantly better predictive performance in standard benchmarks compared to state-of-the-art methods, which can be traced back to multiple factors. The systematic evaluation of the effect of multiple kernels confirms their benefits, but also highlights the limitations of linear kernel combinations, already recognized in other fields. The analysis of the effect of prior kernels using varying sample sizes sheds light on the balance of data and knowledge in DTI tasks and on the rate at which the effect of priors vanishes. This also shows the existence of "small sample size" regions where using side information offers significant gains. Alongside favorable predictive performance, a notable property of MF methods is that they provide a unified space for drugs and targets using latent representations. Compared to earlier studies, the dimensionality of this space proved to be surprisingly low, which makes the latent representations constructed by VB-MK-LMF especially well-suited for visual analytics. The probabilistic nature of the predictions allows the calculation of the expected values of hits in functionally relevant sets, which we demonstrate by predicting drug promiscuity. The variational Bayesian approximation is also implemented for general purpose graphics processing units yielding significantly improved computational time. CONCLUSION In standard benchmarks, VB-MK-LMF shows significantly improved predictive performance in a wide range of settings. Beyond these benchmarks, another contribution of our work is highlighting and providing estimates for further pharmaceutically relevant quantities, such as promiscuity, druggability and total number of interactions.
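The abstract's point that probabilistic predictions allow expected hit counts, and hence promiscuity estimates, can be illustrated with the basic logistic matrix factorization form, in which an interaction probability is the sigmoid of a drug-target latent dot product. The sketch below assumes that basic form only. It omits the variational inference, kernels, observation weighting, and regularization of VB-MK-LMF, and the latent vectors and names are made up for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def interaction_probability(drug_vec, target_vec) -> float:
    """Probability of a drug-target interaction from latent vectors."""
    score = sum(d * t for d, t in zip(drug_vec, target_vec))
    return sigmoid(score)


def expected_hits(drug_vec, target_vecs) -> float:
    """Expected number of interacting targets in a set (a promiscuity estimate)."""
    return sum(interaction_probability(drug_vec, t) for t in target_vecs)


# Hypothetical two-dimensional latent representations.
drug_latent = [1.0, -1.0]
target_latents = {
    "target_1": [1.0, -1.0],   # aligned with the drug: high probability
    "target_2": [0.0, 0.0],    # uninformative: probability 0.5
    "target_3": [-1.0, 1.0],   # opposed to the drug: low probability
}

promiscuity = expected_hits(drug_latent, target_latents.values())
print(round(promiscuity, 3))
```

Because each prediction is a probability rather than a hard label, the expected count is simply the sum over the target set.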
Affiliation(s)
- Bence Bolgár
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
- Péter Antal
- Department of Measurement and Information Systems, Budapest University of Technology and Economics, Magyar tudósok krt. 2., Budapest, 1117 Hungary
13
Svensson F, Norinder U, Bender A. Improving Screening Efficiency through Iterative Screening Using Docking and Conformal Prediction. J Chem Inf Model 2017; 57:439-444. [DOI: 10.1021/acs.jcim.6b00532] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Indexed: 12/11/2022]
Affiliation(s)
- Fredrik Svensson
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom
- Ulf Norinder
- Swetox, Karolinska Institutet, Unit of Toxicology Sciences, Forskargatan 20, SE-151 36 Södertälje, Sweden
- Department of Computer and Systems Sciences, Stockholm University, Box 7003, SE-164 07 Kista, Sweden
- Andreas Bender
- Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, United Kingdom