1
Bande AY, Baday S. Accelerating Molecular Docking using Machine Learning Methods. Mol Inform 2024; 43:e202300167. [PMID: 38850231 DOI: 10.1002/minf.202300167] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2023] [Accepted: 04/18/2023] [Indexed: 06/10/2024]
Abstract
Virtual screening (VS) is one of the well-established approaches in drug discovery; it speeds up the search for a bioactive molecule and reduces the costs and effort associated with experiments. VS helps to narrow down the chemical search space and allows selecting fewer and more probable candidate compounds for experimental testing. Docking calculations are one of the commonly used and highly appreciated structure-based drug discovery methods. Databases for chemical structures of small molecules have been growing rapidly. However, at the moment virtual screening of large libraries via docking is not very common. In this work, we aim to accelerate docking studies by predicting docking scores without explicitly performing docking calculations. We experimented with an attention-based long short-term memory (LSTM) neural network for an efficient prediction of docking scores as well as other machine learning models such as XGBoost. By using docking scores of a small number of ligands we trained our models and predicted docking scores of a few million molecules. Specifically, we tested our approaches on 11 datasets that were produced from in-house drug discovery studies. On average, by training models using only 7000 molecules we predicted docking scores of approximately 3.8 million molecules with R² (coefficient of determination) of 0.77 and Spearman rank correlation coefficient of 0.85. We designed the system with ease of use in mind. All the user needs to provide is a CSV file containing SMILES and their respective docking scores; the system then outputs a model that the user can use for the prediction of the docking score for a new molecule.
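The two metrics quoted in this abstract, R² and the Spearman rank correlation, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names and the five docked/predicted score pairs are hypothetical and serve only to show how the metrics are computed.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical docking scores (more negative = better) and model predictions.
docked = [-9.1, -7.4, -8.2, -6.0, -7.9]
predicted = [-8.8, -7.0, -8.5, -6.3, -7.5]

print("R2:", r_squared(docked, predicted))
print("Spearman:", spearman(docked, predicted))
```

R² penalises errors in the predicted values themselves, whereas the Spearman coefficient depends only on how well the ranking is preserved, which is often what matters when prioritising compounds for docking.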
Affiliation(s)
- Abdulsalam Y Bande
  - Computer Science Department, Informatics Institute, Istanbul Technical University, Istanbul, Türkiye
- Sefer Baday
  - Computer Science Department, Informatics Institute, Istanbul Technical University, Istanbul, Türkiye
  - Applied Informatics Department, Informatics Institute, Istanbul Technical University, Istanbul, Türkiye
  - Artificial Intelligence and Data Engineering Department, Faculty of Computer Informatics and Engineering, Istanbul Technical University, Istanbul, 34469, Türkiye
2
Cieślak M, Danel T, Krzysztyńska-Kuleta O, Kalinowska-Tłuścik J. Machine learning accelerates pharmacophore-based virtual screening of MAO inhibitors. Sci Rep 2024; 14:8228. [PMID: 38589405 PMCID: PMC11369158 DOI: 10.1038/s41598-024-58122-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/26/2024] [Indexed: 04/10/2024] Open
Abstract
Nowadays, an efficient and robust virtual screening procedure is crucial in the drug discovery process, especially when performed on large and chemically diverse databases. Virtual screening methods, like molecular docking and classic QSAR models, are limited in their ability to handle vast numbers of compounds and to learn from scarce data, respectively. In this study, we introduce a universal methodology that uses a machine learning-based approach to predict docking scores without the need for time-consuming molecular docking procedures. The developed protocol yielded 1000 times faster binding energy predictions than classical docking-based screening. The proposed predictive model learns from docking results, allowing users to choose their preferred docking software without relying on insufficient and incoherent experimental activity data. The methodology described employs multiple types of molecular fingerprints and descriptors to construct an ensemble model that further reduces prediction errors and is capable of delivering highly precise docking score values for monoamine oxidase ligands, enabling faster identification of promising compounds. An extensive pharmacophore-constrained screening of the ZINC database resulted in a selection of 24 compounds that were synthesized and evaluated for their biological activity. A preliminary screen discovered weak inhibitors of MAO-A with a percentage efficiency index close to a known drug at the lowest tested concentration. The approach presented here can be successfully applied to other biological targets as target-specific knowledge is not incorporated at the screening phase.
Affiliation(s)
- Marcin Cieślak
  - Faculty of Chemistry, Jagiellonian University, Gronostajowa 2, 30-387, Kraków, Małopolska, Poland.
  - Doctoral School of Exact and Natural Sciences, Jagiellonian University, Prof. S. Łojasiewicza 11, 30-348, Kraków, Małopolska, Poland.
  - Computational Chemistry Department, Selvita, Bobrzynskiego 14, 30-348, Kraków, Małopolska, Poland.
- Tomasz Danel
  - Faculty of Chemistry, Jagiellonian University, Gronostajowa 2, 30-387, Kraków, Małopolska, Poland
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Prof. S. Łojasiewicza 6, 30-348, Kraków, Małopolska, Poland
- Olga Krzysztyńska-Kuleta
  - Cell and Molecular Biology Department, Selvita, Bobrzynskiego 14, 30-348, Kraków, Małopolska, Poland
3
Tropsha A, Isayev O, Varnek A, Schneider G, Cherkasov A. Integrating QSAR modelling and deep learning in drug discovery: the emergence of deep QSAR. Nat Rev Drug Discov 2024; 23:141-155. [PMID: 38066301 DOI: 10.1038/s41573-023-00832-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/21/2023] [Indexed: 02/08/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modelling, an approach that was introduced 60 years ago, is widely used in computer-aided drug design. In recent years, progress in artificial intelligence techniques, such as deep learning, the rapid growth of databases of molecules for virtual screening and dramatic improvements in computational power have supported the emergence of a new field of QSAR applications that we term 'deep QSAR'. Marking a decade from the pioneering applications of deep QSAR to tasks involved in small-molecule drug discovery, we herein describe key advances in the field, including deep generative and reinforcement learning approaches in molecular design, deep learning models for synthetic planning and the application of deep QSAR models in structure-based virtual screening. We also reflect on the emergence of quantum computing, which promises to further accelerate deep QSAR applications and the need for open-source and democratized resources to support computer-aided drug design.
Affiliation(s)
- Artem Cherkasov
  - University of British Columbia, Vancouver, BC, Canada.
  - Photonic Inc., Coquitlam, BC, Canada.
4
Wojtuch A, Danel T, Podlewska S, Maziarka Ł. Extended study on atomic featurization in graph neural networks for molecular property prediction. J Cheminform 2023; 15:81. [PMID: 37726841 PMCID: PMC10507875 DOI: 10.1186/s13321-023-00751-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023] Open
Abstract
Graph neural networks have recently become a standard method for analyzing chemical compounds. In the field of molecular property prediction, the emphasis is now on designing new model architectures, and the importance of atom featurization is oftentimes belittled. When contrasting two graph neural networks, the use of different representations possibly leads to incorrect attribution of the results solely to the network architecture. To better understand this issue, we compare multiple atom representations by evaluating them on the prediction of free energy, solubility, and metabolic stability using graph convolutional networks. We discover that the choice of atom representation has a significant impact on model performance and that the optimal subset of features is task-specific. Additional experiments involving more sophisticated architectures, including graph transformers, support these findings. Moreover, we demonstrate that some commonly used atom features, such as the number of neighbors or the number of hydrogens, can be easily predicted using only information about bonds and atom type, yet their explicit inclusion in the representation has a positive impact on model performance. Finally, we explain the predictions of the best-performing models to better understand how they utilize the available atomic features.
Affiliation(s)
- Agnieszka Wojtuch
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland.
- Tomasz Danel
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
- Sabina Podlewska
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343, Kraków, Poland
- Łukasz Maziarka
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
5
Rakhimbekova A, Lopukhov A, Klyachko N, Kabanov A, Madzhidov TI, Tropsha A. Efficient design of peptide-binding polymers using active learning approaches. J Control Release 2023; 353:903-914. [PMID: 36402234 DOI: 10.1016/j.jconrel.2022.11.023] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/01/2022] [Revised: 10/21/2022] [Accepted: 11/13/2022] [Indexed: 12/23/2022]
Abstract
Active learning (AL) has become a subject of active recent research both in industry and academia as an efficient approach for rapid design and discovery of novel chemicals, materials, and polymers. Herein, we have assessed the applicability of AL for the discovery of polymeric micelle formulations for poorly soluble drugs. We were motivated by the key advantages of this approach, which make it a desirable strategy for rational design of drug delivery systems due to its ability to (i) employ relatively small datasets for model development, (ii) iterate between model development and model assessment using small external datasets that can be either generated in focused experimental studies or formed from subsets of the initial training data, and (iii) progressively evolve models towards increasingly more reliable predictions and the identification of novel chemicals with the desired properties. In this study, we compared various AL protocols for their effectiveness in finding biologically active molecules using synthetic datasets. We have investigated the dependency of AL performance on the size of the initial training set, the relative complexity of the task, and the choice of the initial training dataset. We found that AL techniques as applied to regression modeling offer no benefits over random search, while AL used for classification tasks performs better than models built for randomly selected training sets but is still quite far from perfect. Finally, the best performing AL approach was employed to discover and experimentally validate novel binding polymers for a case study of asialoglycoprotein receptor (ASGPR).
Affiliation(s)
- Assima Rakhimbekova
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
- Anton Lopukhov
  - Laboratory of Chemical Design of Bionanomaterials, Faculty of Chemistry, M.V. Lomonosov Moscow State University, Moscow, Russia
- Natalia Klyachko
  - Laboratory of Chemical Design of Bionanomaterials, Faculty of Chemistry, M.V. Lomonosov Moscow State University, Moscow, Russia
- Alexander Kabanov
  - Laboratory of Chemical Design of Bionanomaterials, Faculty of Chemistry, M.V. Lomonosov Moscow State University, Moscow, Russia; Center for Nanotechnology in Drug Delivery, Division of Pharmacoengineering and Molecular Pharmaceutics, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, NC, USA
- Timur I Madzhidov
  - A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kazan 420008, Russia
- Alexander Tropsha
  - Laboratory for Molecular Modeling, Division of Chemical Biology and Medicinal Chemistry, UNC Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, NC 27599, USA.
6
Qiu M, Liang X, Deng S, Li Y, Ke Y, Wang P, Mei H. A unified GCNN model for predicting CYP450 inhibitors by using graph convolutional neural networks with attention mechanism. Comput Biol Med 2022; 150:106177. [PMID: 36242811 DOI: 10.1016/j.compbiomed.2022.106177] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/20/2022] [Revised: 09/19/2022] [Accepted: 10/01/2022] [Indexed: 11/17/2022]
Abstract
Undesirable drug-drug interactions (DDIs) may lead to serious adverse side effects when more than two drugs are administered to a patient simultaneously. One of the most common DDIs is caused by unexpected inhibition of a specific human cytochrome P450 (CYP450), which plays a dominant role in the metabolism of the co-administered drugs. Therefore, a unified and reliable method for predicting the potential inhibitors of CYP450 family is extremely important in drug development. In this work, graph convolutional neural network (GCN) with attention mechanism and 1-D convolutional neural network (CNN) were used to extract the features of CYP ligands and the binding sites of CYP450 respectively, which were then combined to establish a unified GCN-CNN (GCNN) model for predicting the inhibitors of 5 dominant CYP isoforms, i.e., 1A2, 2C9, 2C19, 2D6, and 3A4. Overall, the established GCNN model showed good performances on the test samples and achieved better performances than the recently proposed iCYP-MFE model by using the same datasets. Based on the heat-map analysis of the resulting molecular graphs, the key structural determinants of the CYP inhibitors were further explored.
Affiliation(s)
- Minyao Qiu
  - Key Laboratory of Biorheological Science and Technology (Ministry of Education), College of Bioengineering, Chongqing University, Chongqing, 400044, China; College of Bioengineering, Chongqing University, Chongqing, 400044, China
- Xiaoqi Liang
  - College of Bioengineering, Chongqing University, Chongqing, 400044, China
- Siyao Deng
  - College of Bioengineering, Chongqing University, Chongqing, 400044, China
- Yufang Li
  - College of Bioengineering, Chongqing University, Chongqing, 400044, China
- Yanlan Ke
  - College of Bioengineering, Chongqing University, Chongqing, 400044, China
- Pingqing Wang
  - College of Bioengineering, Chongqing University, Chongqing, 400044, China
- Hu Mei
  - Key Laboratory of Biorheological Science and Technology (Ministry of Education), College of Bioengineering, Chongqing University, Chongqing, 400044, China; College of Bioengineering, Chongqing University, Chongqing, 400044, China.
7
Danel T, Wojtuch A, Podlewska S. Generation of new inhibitors of selected cytochrome P450 subtypes- In silico study. Comput Struct Biotechnol J 2022; 20:5639-5651. [PMID: 36284709 PMCID: PMC9582735 DOI: 10.1016/j.csbj.2022.10.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/23/2022] [Revised: 09/30/2022] [Accepted: 10/02/2022] [Indexed: 11/16/2022] Open
Abstract
Physicochemical and pharmacokinetic compound profile has crucial impact on compound potency to become a future drug. Ligands with desired activity profile cannot be used for treatment if they are characterized by unfavourable physicochemical or ADMET properties. In the study, we consider metabolic stability and focus on selected subtypes of cytochrome P450 - proteins, which take part in the first phase of compound transformations in the organism. We develop a protocol for generation of new potential inhibitors of selected cytochrome isoforms. Its subsequent stages are composed of generation and assessment of new derivatives of known cytochrome inhibitors, docking and evaluation of the compound possible inhibition on the basis of the obtained ligand-protein complexes. Besides the library of new potential agents inhibiting particular cytochrome subtypes, we also prepare a graph neural network that predicts the change in activity for all modifications of the starting molecule. In addition, we perform a systematic statistical study on the influence of particular substitutions on the potential inhibition properties of generated compounds (both mono- and di-substitutions are considered), provide explanations of the inhibitory predictions and prepare an on-line visualization platform enabling manual inspection of the results. The developed methodology can greatly support the design of new cytochrome P450 inhibitors with the overarching goal of generation of new metabolically stable compounds. It enables instant evaluation of possible compound-cytochrome interactions and selection of ligands with the highest potential of possessing desired biological activity.
Key Words
- CYP inhibitors
- CYP, cytochrome P450
- CYP450
- DL, deep learning
- DNNs, deep neural networks
- Docking
- Explainability
- GNN, graph neural network
- Graph neural networks
- ML, machine learning
- MSE, mean squared error
- Morgan FP, Morgan fingerprint
- New compounds generation
- On-line platform
- QSPR, quantitative structure-property relationship
- RF, random forest
- SRD, sum of ranking differences
Affiliation(s)
- Tomasz Danel
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Agnieszka Wojtuch
  - Faculty of Mathematics and Computer Science, Jagiellonian University, 6 Łojasiewicza Street, 30-348 Kraków, Poland
- Sabina Podlewska
  - Maj Institute of Pharmacology, Polish Academy of Sciences, Department of Medicinal Chemistry, 31-343 Kraków, Smętna Street 12, Poland. Corresponding author.
8
Gentile F, Yaacoub JC, Gleave J, Fernandez M, Ton AT, Ban F, Stern A, Cherkasov A. Artificial intelligence-enabled virtual screening of ultra-large chemical libraries with deep docking. Nat Protoc 2022; 17:672-697. [PMID: 35121854 DOI: 10.1038/s41596-021-00659-2] [Citation(s) in RCA: 116] [Impact Index Per Article: 58.0] [Received: 06/11/2021] [Accepted: 11/08/2021] [Indexed: 12/14/2022]
Abstract
With the recent explosion of chemical libraries beyond a billion molecules, more efficient virtual screening approaches are needed. The Deep Docking (DD) platform enables up to 100-fold acceleration of structure-based virtual screening by docking only a subset of a chemical library, iteratively synchronized with a ligand-based prediction of the remaining docking scores. This method results in hundreds- to thousands-fold virtual hit enrichment (without significant loss of potential drug candidates) and hence enables the screening of billion molecule-sized chemical libraries without using extraordinary computational resources. Herein, we present and discuss the generalized DD protocol that has been proven successful in various computer-aided drug discovery (CADD) campaigns and can be applied in conjunction with any conventional docking program. The protocol encompasses eight consecutive stages: molecular library preparation, receptor preparation, random sampling of a library, ligand preparation, molecular docking, model training, model inference and the residual docking. The standard DD workflow enables iterative application of stages 3-7 with continuous augmentation of the training set, and the number of such iterations can be adjusted by the user. A predefined recall value allows for control of the percentage of top-scoring molecules that are retained by DD and can be adjusted to control the library size reduction. The procedure takes 1-2 weeks (depending on the available resources) and can be completely automated on computing clusters managed by job schedulers. This open-source protocol, at https://github.com/jamesgleave/DD_protocol, can be readily deployed by CADD researchers and can significantly accelerate the effective exploration of ultra-large portions of a chemical space.
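The role of the predefined recall value can be illustrated with a small sketch. It is not taken from the DD_protocol repository: it assumes a model that outputs, for each molecule, a probability of being a top-scoring hit, and a validation set where the true hit labels are known from docking. The function names and the ten validation molecules are hypothetical.

```python
import math


def threshold_for_recall(probs, is_hit, target_recall):
    """Return the largest probability cutoff that still retains at least
    target_recall of the true hits (molecules with prob >= cutoff are kept)."""
    hit_probs = sorted((p for p, h in zip(probs, is_hit) if h), reverse=True)
    if not hit_probs:
        raise ValueError("validation set contains no hits")
    # Number of hits that must survive the cutoff to reach the target recall.
    needed = max(1, math.ceil(target_recall * len(hit_probs)))
    return hit_probs[needed - 1]


def retained_fraction(probs, cutoff):
    """Fraction of the library kept for the next round at this cutoff."""
    return sum(p >= cutoff for p in probs) / len(probs)


# Hypothetical validation set: predicted hit probabilities and docking-derived labels.
probs = [0.95, 0.90, 0.80, 0.70, 0.40, 0.35, 0.20, 0.10, 0.05, 0.02]
is_hit = [True, True, False, True, False, True, False, False, False, False]

cutoff = threshold_for_recall(probs, is_hit, target_recall=0.75)
print("cutoff:", cutoff)
print("fraction of library retained:", retained_fraction(probs, cutoff))
```

Raising the target recall lowers the cutoff, so fewer true hits are lost but more of the library is carried forward; this is the trade-off between hit retention and library size reduction that the abstract describes.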
Affiliation(s)
- Francesco Gentile
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Jean Charle Yaacoub
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- James Gleave
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Michael Fernandez
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Anh-Tien Ton
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Fuqiang Ban
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada
- Artem Cherkasov
  - Vancouver Prostate Centre, Department of Urologic Sciences, The University of British Columbia, Vancouver, BC, Canada.
9
Abstract
Computational methods play an increasingly important role in drug discovery. Structure-based drug design (SBDD), in particular, includes techniques that take into account the structure of the macromolecular target to predict compounds that are likely to establish optimal interactions with the binding site. The current interest in machine learning algorithms based on deep neural networks encouraged the application of deep learning to SBDD related problems. This chapter covers selected works in this active area of research.
10
Abstract
Within the context of the latest resurgence in the application of artificial intelligence approaches, deep learning has undergone a renaissance over recent years. These methods have been applied to a number of problems in computational chemistry. Compared to other machine learning approaches, the practical performance advantages of deep neural networks are often unclear. However, deep learning does appear to offer a number of other advantages such as the facile incorporation of multitask learning and the enhancement of generative modeling. The high complexity of contemporary network architectures represents a potentially significant barrier to their future adoption due to the costs of training such models and challenges in interpreting their predictions. When combined with the relative paucity of very large datasets, it is interesting to reflect on whether deep learning is likely to have the kind of transformational impact on computational chemistry that it is commonly held to have had in other domains such as image recognition.
11
Dong J, Zhao M, Liu Y, Su Y, Zeng X. Deep learning in retrosynthesis planning: datasets, models and tools. Brief Bioinform 2021; 23:6375056. [PMID: 34571535 DOI: 10.1093/bib/bbab391] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 07/06/2021] [Revised: 08/16/2021] [Accepted: 08/30/2021] [Indexed: 12/29/2022] Open
Abstract
In recent years, synthesizing drugs powered by artificial intelligence has brought great convenience to society. Since retrosynthetic analysis occupies an essential position in synthetic chemistry, it has received broad attention from researchers. In this review, we comprehensively summarize the development process of retrosynthesis in the context of deep learning. This review covers all aspects of retrosynthesis, including datasets, models and tools. Specifically, we report representative models from academia, in addition to a detailed description of the available and stable platforms in the industry. We also discuss the disadvantages of the existing models and provide potential future trends, so that more abecedarians will quickly understand and participate in the family of retrosynthesis planning.
Affiliation(s)
- Jingxin Dong
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Mingyi Zhao
  - Department of Pediatrics, Third Xiangya Hospital, Central South University, 400013, Hunan, China
- Yuansheng Liu
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
- Yansen Su
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, 230601, Hefei, China
- Xiangxiang Zeng
  - College of Information Science and Engineering, Hunan University, 2 Lushan S Rd, Yuelu District, 410086, Hunan, China
12
Di Filippo JI, Cavasotto CN. Guided structure-based ligand identification and design via artificial intelligence modeling. Expert Opin Drug Discov 2021; 17:71-78. [PMID: 34544293 DOI: 10.1080/17460441.2021.1979514] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/20/2022]
Abstract
INTRODUCTION: The implementation of Artificial Intelligence (AI) methodologies in drug discovery (DD) is on the rise. Several applications have been developed for structure-based DD, where AI methods provide an alternative framework for the identification of ligands for validated therapeutic targets, as well as the de novo design of ligands through generative models.
AREAS COVERED: Herein, the authors review the contributions from 2019 to the present regarding the application of AI methods to structure-based virtual screening (SBVS), which encompasses mainly molecular docking applications (binding pose prediction and binary classification for ligand or hit identification), as well as de novo drug design driven by machine learning (ML) generative models, and the validation of AI models in structure-based screening. Studies are reviewed in terms of their main objective, used databases, implemented methodology, input and output, and key results.
EXPERT OPINION: More profound analyses regarding the validity and applicability of AI methods in DD have begun to appear. In the near future, we expect to see more structure-based generative models (which are scarce in comparison to ligand-based generative models), the implementation of standard guidelines for validating the generated structures, and more analyses regarding the validation of AI methods in structure-based DD.
Affiliation(s)
- Juan I Di Filippo
  - Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina
  - Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Pilar, Buenos Aires, Argentina
  - Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Buenos Aires, Argentina
- Claudio N Cavasotto
  - Computational Drug Design and Biomedical Informatics Laboratory, Instituto de Investigaciones en Medicina Traslacional (IIMT), CONICET-Universidad Austral, Pilar, Buenos Aires, Argentina
  - Facultad de Ciencias Biomédicas, and Facultad de Ingeniería, Universidad Austral, Pilar, Buenos Aires, Argentina
  - Austral Institute for Applied Artificial Intelligence, Universidad Austral, Pilar, Buenos Aires, Argentina
13
Mordalski S, Wojtuch A, Podolak I, Kurczab R, Bojarski AJ. 2D SIFt: a matrix of ligand-receptor interactions. J Cheminform 2021; 13:66. [PMID: 34496955 PMCID: PMC8424890 DOI: 10.1186/s13321-021-00545-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2021] [Accepted: 08/21/2021] [Indexed: 11/10/2022] Open
Abstract
Depicting a ligand-receptor complex via Interaction Fingerprints has been shown to be both a viable data visualization and an analysis tool. The spectrum of its applications ranges from simple visualization of the binding site through analysis of molecular dynamics runs, to the evaluation of the homology models and virtual screening. Here we present a novel tool derived from the Structural Interaction Fingerprints providing a detailed and unique insight into the interactions between receptor and specific regions of the ligand (grouped into pharmacophore features) in the form of a matrix, a 2D-SIFt descriptor. The provided implementation is easy to use and extends the python library, allowing the generation of interaction matrices and their manipulation (reading and writing as well as producing the average 2D-SIFt). The library for handling the interaction matrices is available via repository http://bitbucket.org/zchl/sift2d.
Affiliation(s)
- Stefan Mordalski
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology Polish Academy of Sciences, Krakow, Poland.
- Agnieszka Wojtuch
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland
- Igor Podolak
  - Faculty of Mathematics and Computer Science, Jagiellonian University, Krakow, Poland
- Rafał Kurczab
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology Polish Academy of Sciences, Krakow, Poland
- Andrzej J Bojarski
  - Department of Medicinal Chemistry, Maj Institute of Pharmacology Polish Academy of Sciences, Krakow, Poland
14
Soleimany AP, Amini A, Goldman S, Rus D, Bhatia SN, Coley CW. Evidential Deep Learning for Guided Molecular Property Prediction and Discovery. ACS Central Science 2021; 7:1356-1367. [PMID: 34471680 PMCID: PMC8393200 DOI: 10.1021/acscentsci.1c00546] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 05/05/2021] [Indexed: 05/12/2023]
Abstract
While neural networks achieve state-of-the-art performance for many molecular modeling and structure-property prediction tasks, these models can struggle with generalization to out-of-domain examples, exhibit poor sample efficiency, and produce uncalibrated predictions. In this paper, we leverage advances in evidential deep learning to demonstrate a new approach to uncertainty quantification for neural network-based molecular structure-property prediction at no additional computational cost. We develop both evidential 2D message passing neural networks and evidential 3D atomistic neural networks and apply these networks across a range of different tasks. We demonstrate that evidential uncertainties enable (1) calibrated predictions where uncertainty correlates with error, (2) sample-efficient training through uncertainty-guided active learning, and (3) improved experimental validation rates in a retrospective virtual screening campaign. Our results suggest that evidential deep learning can provide an efficient means of uncertainty quantification useful for molecular property prediction, discovery, and design tasks in the chemical and physical sciences.
Affiliation(s)
- Ava P. Soleimany
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
- Graduate Program in Biophysics, Harvard University, Boston, Massachusetts 02115, United States
- Microsoft Research New England, Cambridge, Massachusetts 02142, United States
- Alexander Amini
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
- Samuel Goldman
- Computational and Systems Biology, MIT, Cambridge, Massachusetts 02139, United States
- Daniela Rus
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
- Sangeeta N. Bhatia
- Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, Massachusetts 02139, United States
- Department of Electrical Engineering and Computer Science, MIT, Cambridge, Massachusetts 02139, United States
- Howard Hughes Medical Institute, Cambridge, Massachusetts 02139, United States
- Connor W. Coley
- Department of Chemical Engineering, MIT, Cambridge, Massachusetts 02139, United States
15
Berenger F, Kumar A, Zhang KYJ, Yamanishi Y. Lean-Docking: Exploiting Ligands' Predicted Docking Scores to Accelerate Molecular Docking. J Chem Inf Model 2021; 61:2341-2352. [PMID: 33861591 DOI: 10.1021/acs.jcim.0c01452] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
In structure-based virtual screening (SBVS), a binding site on a protein structure is used to search for ligands with favorable nonbonded interactions. Because it is computationally difficult, docking is time-consuming and any docking user will eventually encounter a chemical library that is too big to dock. This problem might arise because there is not enough computing power or because preparing and storing so many three-dimensional (3D) ligands requires too much space. In this study, however, we show that quality regressors can be trained to predict docking scores from molecular fingerprints. Although typical docking has a screening rate of less than one ligand per second on one CPU core, our regressors can predict about 5800 docking scores per second. This approach allows us to focus docking on the portion of a database that is predicted to have docking scores below a user-chosen threshold. Herein, usage examples are shown, where only 25% of a ligand database is docked, without any significant virtual screening performance loss. We call this method "lean-docking". To validate lean-docking, a massive docking campaign using several state-of-the-art docking software packages was undertaken on an unbiased data set, with only wet-lab tested active and inactive molecules. Although regressors allow the screening of a larger chemical space, even at a constant docking power, it is also clear that significant progress in the virtual screening power of docking scores is desirable.
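The filtering idea in this abstract (predict docking scores from fingerprints, then dock only the ligands predicted to fall below a user-chosen threshold) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the regressors used in the paper. It swaps in a similarity-weighted k-nearest-neighbour predictor over Tanimoto similarity, represents each fingerprint as a set of on-bit indices, and uses invented ligand names and scores. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # indices of the "on" bits of a binary fingerprint


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity between two sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_score(query: Fingerprint,
                  training: List[Tuple[Fingerprint, float]],
                  k: int = 3) -> float:
    """Similarity-weighted mean docking score of the k nearest training ligands."""
    neighbours = sorted(training,
                        key=lambda item: tanimoto(query, item[0]),
                        reverse=True)[:k]
    weights = [tanimoto(query, fp) for fp, _ in neighbours]
    total = sum(weights)
    if total == 0.0:
        # No bit shared with any neighbour: fall back to the mean training score.
        return sum(score for _, score in training) / len(training)
    return sum(w * score for w, (_, score) in zip(weights, neighbours)) / total


def select_for_docking(library: Dict[str, Fingerprint],
                       training: List[Tuple[Fingerprint, float]],
                       threshold: float,
                       k: int = 3) -> List[str]:
    """Names of ligands whose predicted score is below (better than) the threshold."""
    return [name for name, fp in library.items()
            if predict_score(fp, training, k) < threshold]


# Toy data: a few already-docked ligands (fingerprint, docking score).
training = [
    (frozenset({1, 2, 3, 4}), -9.5),
    (frozenset({1, 2, 3, 5}), -9.0),
    (frozenset({10, 11, 12}), -5.0),
    (frozenset({10, 11, 13}), -4.5),
]
# Undocked library to be filtered before any real docking run.
library = {
    "lig_a": frozenset({1, 2, 3, 6}),
    "lig_b": frozenset({10, 11, 14}),
    "lig_c": frozenset({1, 2, 4, 5}),
    "lig_d": frozenset({20, 21}),
}
selected = select_for_docking(library, training, threshold=-8.0, k=2)
print(selected)
```

Only the ligands in `selected` would be passed to the docking program. The threshold controls the trade-off between docking effort and the risk of discarding a good binder.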
Affiliation(s)
- Francois Berenger
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Japan
- Ashutosh Kumar
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama, Kanagawa 230-0045, Japan
- Yoshihiro Yamanishi
- Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka 820-8502, Japan
16
Winkler DA. Use of Artificial Intelligence and Machine Learning for Discovery of Drugs for Neglected Tropical Diseases. Front Chem 2021; 9:614073. [PMID: 33791277 PMCID: PMC8005575 DOI: 10.3389/fchem.2021.614073] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 10/05/2020] [Accepted: 01/18/2021] [Indexed: 12/11/2022]
Abstract
Neglected tropical diseases continue to create high levels of morbidity and mortality in a sizeable fraction of the world’s population, despite ongoing research into new treatments. Some of the most important technological developments that have accelerated drug discovery for diseases of affluent countries have not flowed down to neglected tropical disease drug discovery. Pharmaceutical development business models, cost of developing new drug treatments and subsequent costs to patients, and accessibility of technologies to scientists in most of the affected countries are some of the reasons for this low uptake and slow development relative to that for common diseases in developed countries. Computational methods are starting to make significant inroads into discovery of drugs for neglected tropical diseases due to the increasing availability of large databases that can be used to train ML models, increasing accuracy of these methods, lower entry barrier for researchers, and widespread availability of public domain machine learning codes. Here, the application of artificial intelligence, largely the subset called machine learning, to modelling and prediction of biological activities and discovery of new drugs for neglected tropical diseases is summarized. The pathways for the development of machine learning methods in the short to medium term and the use of other artificial intelligence methods for drug discovery are discussed. The current roadblocks to, and likely impacts of, synergistic new technological developments on the use of ML methods for neglected tropical disease drug discovery in the future are also discussed.
Affiliation(s)
- David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia
- Latrobe Institute for Molecular Science, La Trobe University, Bundoora, VIC, Australia
- School of Pharmacy, University of Nottingham, Nottingham, United Kingdom
- CSIRO Data61, Pullenvale, QLD, Australia
17
Wang ZW, Zhao LX, Ma P, Ye T, Fu Y, Ye F. Fragments recombination, design, synthesis, safener activity and CoMFA model of novel substituted dichloroacetylphenyl sulfonamide derivatives. PEST MANAGEMENT SCIENCE 2021; 77:1724-1738. [PMID: 33236407 DOI: 10.1002/ps.6193] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 08/28/2020] [Revised: 11/12/2020] [Accepted: 11/25/2020] [Indexed: 06/11/2023]
Abstract
BACKGROUND: Isoxaflutole (IXF), a 4-hydroxyphenylpyruvate dioxygenase (HPPD) inhibitor, has been widely used on many kinds of plants. IXF can cause injury in corn, including leaf and stem bleaching, plant height reduction or stunting, and reduced crop stand. Safeners are co-applied with herbicides to protect crops without compromising weed control efficacy. With the ultimate goal of addressing Zea mays injury caused by IXF, a series of novel substituted dichloroacetylphenyl sulfonamide derivatives was designed on the basis of scaffold hopping and active substructure splicing. RESULTS: A total of 35 compounds were synthesized via acylation reactions. All the compounds were characterized by infrared (IR), proton and carbon-13 nuclear magnetic resonance (¹H-NMR and ¹³C-NMR), and high-resolution mass spectrometry (HRMS). The configuration of compound II-1 was confirmed by single crystal X-ray diffraction. The bioassay results showed that all the title compounds displayed remarkable protection against IXF via improved carotenoid content. In particular, compound II-1 showed better glutathione transferase (GST) activity and carotenoid content than the contrast safener cyprosulfamide (CSA). The statistical parameters suggested that the Comparative Molecular Field Analysis (CoMFA) model was reliable and stable [cross-validated coefficient q² = 0.527, r² = 0.995, r²pred = 0.931]. The molecular docking simulation indicated that compound II-1 and CSA could compete with diketonitrile (DKN), a hydrolyzed product of IXF in plants, at the active site of HPPD, causing the herbicide to be ineffective. CONCLUSIONS: The present work revealed that compound II-1 deserves further attention as a candidate structure for safeners. © 2020 Society of Chemical Industry.
Affiliation(s)
- Zi-Wei Wang
- Department of Applied Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin, China
- Li-Xia Zhao
- Department of Applied Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin, China
- Peng Ma
- Department of Applied Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin, China
- Tong Ye
- Department of Applied Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin, China
- Ying Fu
- Department of Applied Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin, China
- Fei Ye
- Department of Applied Chemistry, College of Arts and Sciences, Northeast Agricultural University, Harbin, China
18
Krishnan SR, Bung N, Bulusu G, Roy A. Accelerating De Novo Drug Design against Novel Proteins Using Deep Learning. J Chem Inf Model 2021; 61:621-630. [PMID: 33491455 DOI: 10.1021/acs.jcim.0c01060] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Indexed: 12/14/2022]
Abstract
In the world plagued by the emergence of new diseases, it is essential that we accelerate the drug design process to develop new therapeutics against them. In recent years, deep learning-based methods have shown some success in ligand-based drug design. Yet, these methods face the problem of data scarcity while designing drugs against a novel target. In this work, the potential of deep learning and molecular modeling approaches was leveraged to develop a drug design pipeline, which can be useful for cases where there is limited or no availability of target-specific ligand datasets. Inhibitors of the homologues of the target protein were screened at the active site of the target protein to create an initial target-specific dataset. Transfer learning was used to learn the features of the target-specific dataset. A deep predictive model was utilized to predict the docking scores of newly designed molecules. Both these models were combined using reinforcement learning to design new chemical entities with an optimized docking score. The pipeline was validated by designing inhibitors against the human JAK2 protein, where none of the existing JAK2 inhibitors were used for training. The ability of the method to reproduce existing molecules from the validation dataset and design molecules with better binding energy demonstrates the potential of the proposed approach.
Affiliation(s)
- Sowmya Ramaswamy Krishnan
- TCS Innovation Labs-Hyderabad (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
- Navneet Bung
- TCS Innovation Labs-Hyderabad (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
- Gopalakrishnan Bulusu
- TCS Innovation Labs-Hyderabad (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India
- Arijit Roy
- TCS Innovation Labs-Hyderabad (Life Sciences Division), Tata Consultancy Services Limited, Hyderabad 500081, India