1
van Tilborg D, Brinkmann H, Criscuolo E, Rossen L, Özçelik R, Grisoni F. Deep learning for low-data drug discovery: Hurdles and opportunities. Curr Opin Struct Biol 2024; 86:102818. [PMID: 38669740 DOI: 10.1016/j.sbi.2024.102818] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/27/2024] [Accepted: 03/29/2024] [Indexed: 04/28/2024]
Abstract
Deep learning is becoming increasingly relevant in drug discovery, from de novo design to protein structure prediction and synthesis planning. However, it is often challenged by the small data regimes typical of certain drug discovery tasks. In such scenarios, deep learning approaches, which are notoriously 'data-hungry', might fail to live up to their promise. Developing novel approaches to leverage the power of deep learning in low-data scenarios is attracting great attention, and future developments are expected to propel the field further. This mini-review provides an overview of recent low-data-learning approaches in drug discovery, analyzing their hurdles and advantages. Finally, we venture to provide a forecast of future research directions in low-data learning for drug discovery.
Affiliation(s)
- Derek van Tilborg
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/DerekvTilborg
- Helena Brinkmann
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/hlnbrkmnn
- Emanuele Criscuolo
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/emanuelecriscu9
- Luke Rossen
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands. https://twitter.com/molecular_ml
- Rıza Özçelik
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands. https://twitter.com/Rza_ozcelik
- Francesca Grisoni
- Institute for Complex Molecular Systems (ICMS), Department of Biomedical Engineering, Eindhoven University of Technology, PO Box 513, 5600 MB Eindhoven, the Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Princetonlaan 6, 3584 CB, Utrecht, the Netherlands.
2
Plau J, Morgan CE, Fedorov Y, Banerjee S, Adams DJ, Blaner WS, Yu EW, Golczak M. Discovery of Nonretinoid Inhibitors of CRBP1: Structural and Dynamic Insights for Ligand-Binding Mechanisms. ACS Chem Biol 2023; 18:2309-2323. [PMID: 37713257 PMCID: PMC10591915 DOI: 10.1021/acschembio.3c00402] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Accepted: 09/01/2023] [Indexed: 09/16/2023]
Abstract
The dysregulation of retinoid metabolism has been linked to prevalent ocular diseases including age-related macular degeneration and Stargardt disease. Modulating retinoid metabolism through pharmacological approaches holds promise for the treatment of these eye diseases. Cellular retinol-binding protein 1 (CRBP1) is the primary transporter of all-trans-retinol (atROL) in the eye, and its inhibition has recently been shown to protect mouse retinas from light-induced retinal damage. In this report, we employed high-throughput screening to identify new chemical scaffolds for competitive, nonretinoid inhibitors of CRBP1. To understand the mechanisms of interaction between CRBP1 and these inhibitors, we solved high-resolution X-ray crystal structures of the protein in complex with six selected compounds. By combining protein crystallography with hydrogen/deuterium exchange mass spectrometry, we quantified the conformational changes in CRBP1 caused by different inhibitors and correlated their magnitude with apparent binding affinities. Furthermore, using molecular dynamic simulations, we provided evidence for the functional significance of the "closed" conformation of CRBP1 in retaining ligands within the binding pocket. Collectively, our study outlines the molecular foundations for understanding the mechanism of high-affinity interactions between small molecules and CRBPs, offering a framework for the rational design of improved inhibitors for this class of lipid-binding proteins.
Affiliation(s)
- Jacqueline Plau
- Department of Pharmacology, Small Molecule Drug Development Core Facility, Department of Genetics, and Cleveland Center for Membrane and Structural Biology, School of Medicine, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, United States
- Christopher E. Morgan
- Department of Pharmacology, Small Molecule Drug Development Core Facility, Department of Genetics, and Cleveland Center for Membrane and Structural Biology, School of Medicine, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, United States
- Department of Chemistry, Thiel College, Greenville, Pennsylvania 16125, United States
- Yuriy Fedorov
- Department of Pharmacology, Small Molecule Drug Development Core Facility, Department of Genetics, and Cleveland Center for Membrane and Structural Biology, School of Medicine, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, United States
- Surajit Banerjee
- Department of Chemistry and Chemical Biology, Cornell University, Ithaca, New York 14850, United States
- Northeastern Collaborative Access Team, Argonne National Laboratory, Argonne, Illinois 60439, United States
- Drew J. Adams
- Department of Pharmacology, Small Molecule Drug Development Core Facility, Department of Genetics, and Cleveland Center for Membrane and Structural Biology, School of Medicine, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, United States
- William S. Blaner
- Department of Medicine, College of Physicians and Surgeons, Columbia University, New York, New York 10032, United States
- Edward W. Yu
- Department of Pharmacology, Small Molecule Drug Development Core Facility, Department of Genetics, and Cleveland Center for Membrane and Structural Biology, School of Medicine, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, United States
- Marcin Golczak
- Department of Pharmacology, Small Molecule Drug Development Core Facility, Department of Genetics, and Cleveland Center for Membrane and Structural Biology, School of Medicine, Case Western Reserve University, 10900 Euclid Avenue, Cleveland, Ohio 44106, United States
3
Yan X, Yue T, Winkler DA, Yin Y, Zhu H, Jiang G, Yan B. Converting Nanotoxicity Data to Information Using Artificial Intelligence and Simulation. Chem Rev 2023. [PMID: 37262026 DOI: 10.1021/acs.chemrev.3c00070] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Indexed: 06/03/2023]
Abstract
Decades of nanotoxicology research have generated extensive and diverse data sets. However, data is not equal to information. The question is how to extract critical information buried in vast data streams. Here we show that artificial intelligence (AI) and molecular simulation play key roles in transforming nanotoxicity data into critical information, i.e., constructing the quantitative nanostructure (physicochemical properties)-toxicity relationships, and elucidating the toxicity-related molecular mechanisms. For AI and molecular simulation to realize their full impacts in this mission, several obstacles must be overcome. These include the paucity of high-quality nanomaterials (NMs) and standardized nanotoxicity data, the lack of model-friendly databases, the scarcity of specific and universal nanodescriptors, and the inability to simulate NMs at realistic spatial and temporal scales. This review provides a comprehensive and representative, but not exhaustive, summary of the current capability gaps and tools required to fill these formidable gaps. Specifically, we discuss the applications of AI and molecular simulation, which can address the large-scale data challenge for nanotoxicology research. The need for model-friendly nanotoxicity databases, powerful nanodescriptors, new modeling approaches, molecular mechanism analysis, and design of the next-generation NMs are also critically discussed. Finally, we provide a perspective on future trends and challenges.
Affiliation(s)
- Xiliang Yan
- Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
- Tongtao Yue
- Key Laboratory of Marine Environment and Ecology, Ministry of Education, Institute of Coastal Environmental Pollution Control, Ocean University of China, Qingdao 266100, China
- David A Winkler
- Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
- School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K
- Department of Biochemistry and Chemistry, La Trobe Institute for Molecular Science, La Trobe University, Melbourne, Victoria 3086, Australia
- Yongguang Yin
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Hao Zhu
- Department of Chemistry and Biochemistry, Rowan University, Glassboro, New Jersey 08028, United States
- Guibin Jiang
- State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
- Bing Yan
- Institute of Environmental Research at the Greater Bay Area, Key Laboratory for Water Quality and Conservation of the Pearl River Delta, Ministry of Education, Guangzhou University, Guangzhou 510006, China
4
Gu S, Shen C, Yu J, Zhao H, Liu H, Liu L, Sheng R, Xu L, Wang Z, Hou T, Kang Y. Can molecular dynamics simulations improve predictions of protein-ligand binding affinity with machine learning? Brief Bioinform 2023; 24:6995375. [PMID: 36681903 DOI: 10.1093/bib/bbad008] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 10/08/2022] [Revised: 12/04/2022] [Accepted: 12/30/2023] [Indexed: 01/23/2023]
Abstract
Binding affinity prediction largely determines the discovery efficiency of lead compounds in drug discovery. Recently, machine learning (ML)-based approaches have attracted much attention in hopes of enhancing the predictive performance of traditional physics-based approaches. In this study, we evaluated the impact of structural dynamic information on the binding affinity prediction by comparing the models trained on different dimensional descriptors, using three targets (i.e. JAK1, TAF1-BD2 and DDR1) and their corresponding ligands as the examples. Here, 2D descriptors are traditional ECFP4 fingerprints, 3D descriptors are the energy terms of the Smina and NNscore scoring functions and 4D descriptors contain the structural dynamic information derived from the trajectories based on molecular dynamics (MD) simulations. We systematically investigate the MD-refined binding affinity prediction performance of three classical ML algorithms (i.e. RF, SVR and XGB) as well as two common virtual screening methods, namely Glide docking and MM/PBSA. The outcomes of the ML models built using various dimensional descriptors and their combinations reveal that the MD refinement with the optimized protocol can improve the predictive performance on the TAF1-BD2 target with considerable structural flexibility, but not for the less flexible JAK1 and DDR1 targets, when taking docking poses as the initial structure instead of the crystal structures. The results highlight the importance of the initial structures to the final performance of the model through conformational analysis on the three targets with different flexibility.
Affiliation(s)
- Shukai Gu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jiahui Yu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Hong Zhao
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Huanxiang Liu
- Faculty of Applied Science, Macao Polytechnic University, Macao, SAR, China
- Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China
- Rong Sheng
- Health Technology Development Dept, Huawei Device Co., Ltd., Dongguan 523808, Guangdong, China
- Lei Xu
- Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou 213001, China
- Zhe Wang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
5
Bahia MS, Kaspi O, Touitou M, Binayev I, Dhail S, Spiegel J, Khazanov N, Yosipof A, Senderowitz H. A comparison between 2D and 3D descriptors in QSAR modeling based on bio-active conformations. Mol Inform 2023; 42:e2200186. [PMID: 36617991 DOI: 10.1002/minf.202200186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 01/10/2023]
Abstract
QSAR models are widely and successfully used in many research areas. The success of such models highly depends on molecular descriptors, typically classified as 1D, 2D, 3D, or 4D. While 3D information is likely important, e.g., for modeling ligand-protein binding, previous comparisons between the performances of 2D and 3D descriptors were inconclusive. Yet in such comparisons the modeled ligands were not necessarily represented by their bioactive conformations. With this in mind, we mined the PDB for sets of protein-ligand complexes sharing the same protein for which uniform activity data were reported. The results, totaling 461 structures spread across six series, were compiled into a carefully curated, first-of-its-kind dataset in which each ligand is represented by its bioactive conformation. Next, each set was characterized by 2D, 3D and 2D + 3D descriptors and modeled using three machine learning algorithms, namely, k-Nearest Neighbors, Random Forest and Lasso Regression. Models' performances were evaluated on external test sets derived from the parent datasets either randomly or in a rational manner. We found that many more significant models were obtained when combining 2D and 3D descriptors. We attribute these improvements to the ability of 2D and 3D descriptors to code for different, yet complementary molecular properties.
Affiliation(s)
- Omer Kaspi
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Meir Touitou
- School of Cancer and Pharmaceutical Sciences, King's College London, London, 150 Stamford Street, SE1 9NH, United Kingdom
- Idan Binayev
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Seema Dhail
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Jacob Spiegel
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Netaly Khazanov
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Abraham Yosipof
- Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak, 5110801, Israel
- Hanoch Senderowitz
- Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
6
Quantitative structure-activity relationship modeling for predication of inhibition potencies of imatinib derivatives using SMILES attributes. Sci Rep 2022; 12:21708. [PMID: 36522400 PMCID: PMC9755126 DOI: 10.1038/s41598-022-26279-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022]
Abstract
Chronic myelogenous leukemia (CML), which results from the BCR-ABL tyrosine kinase (TK) chimeric oncoprotein, is a malignant clonal disorder of hematopoietic stem cells. Imatinib is used as an inhibitor of BCR-ABL TK in the treatment of CML patients. The main objective of the present manuscript is to construct quantitative structure-activity relationship (QSAR) models for the prediction of inhibition potencies of a large series of imatinib derivatives against BCR-ABL TK. Herein, the inbuilt Monte Carlo algorithm of CORAL software is employed to develop QSAR models. The SMILES notations of chemical structures are used to compute the descriptor of correlation weights (CWs). QSAR models are established using the balance of correlation method with the index of ideality of correlation (IIC). The data set of 306 molecules is randomly divided into three splits. In QSAR modeling, the numerical values of R2, Q2, and IIC for the validation set of splits 1 to 3 are in the ranges of 0.7180-0.7755, 0.6891-0.7561, and 0.4431-0.8611, respectively. The numerical result of [Formula: see text] > 0.5 for all three constructed models in the Y-randomization test validates the reliability of the established models. The promoters of increase/decrease for pIC50 are recognized and used for the mechanistic interpretation of structural attributes.
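Note: the R2 and Q2 statistics quoted in this abstract can be illustrated with a minimal sketch. It assumes a single-descriptor least-squares model and the common leave-one-out definition Q2 = 1 - PRESS / SS_tot; the abstract does not state which Q2 variant was used, and the toy descriptor and pIC50 values below are hypothetical, not taken from the paper or from CORAL.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one descriptor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def r_squared(xs, ys):
    """Coefficient of determination on the data used for fitting."""
    a, b = fit_line(xs, ys)
    my = mean(ys)
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out Q2: refit without each point, then predict it."""
    my = mean(ys)
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a * xs[i] + b)) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot


# Hypothetical toy data: one descriptor value and one pIC50 per molecule.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
pic50 = [5.1, 5.9, 7.2, 7.8, 9.1, 9.8]

r2 = r_squared(descriptor, pic50)
q2 = q_squared_loo(descriptor, pic50)
print(f"R2 = {r2:.4f}, Q2(LOO) = {q2:.4f}")
```

For a least-squares fit, the leave-one-out Q2 cannot exceed the fitted R2, which is why a small gap between the two is usually read as a sign of a stable model.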
7
Ding W, Nan Y, Wu J, Han C, Xin X, Li S, Liu H, Zhang L. Combining multi-dimensional molecular fingerprints to predict the hERG cardiotoxicity of compounds. Comput Biol Med 2022; 144:105390. [DOI: 10.1016/j.compbiomed.2022.105390] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/07/2022] [Revised: 03/06/2022] [Accepted: 03/07/2022] [Indexed: 01/28/2023]
8
Gervasoni S, Malloci G, Bosin A, Vargiu AV, Zgurskaya HI, Ruggerone P. AB-DB: Force-Field parameters, MD trajectories, QM-based data, and Descriptors of Antimicrobials. Sci Data 2022; 9:148. [PMID: 35365662 PMCID: PMC8976083 DOI: 10.1038/s41597-022-01261-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Accepted: 03/11/2022] [Indexed: 12/13/2022]
Abstract
Antibiotic resistance is a major threat to public health. The development of chemo-informatic tools to guide medicinal chemistry campaigns in the efficient design of antibacterial libraries is urgently needed. We present AB-DB, an open database of all-atom force-field parameters, molecular dynamics trajectories, quantum-mechanical properties, and curated physico-chemical descriptors of antimicrobial compounds. We considered more than 300 molecules belonging to 25 families that include the most relevant antibiotic classes in clinical use, such as β-lactams and (fluoro)quinolones, as well as inhibitors of key bacterial proteins. We provide traditional descriptors together with properties obtained with Density Functional Theory calculations. Notably, AB-DB contains less conventional descriptors extracted from μs-long molecular dynamics simulations in explicit solvent. In addition, for each compound we make available force-field parameters for the major micro-species at physiological pH. With the rise of multi-drug-resistant pathogens and the consequent need for novel antibiotics, inhibitors, and drug re-purposing strategies, curated databases containing reliable properties that are not straightforward to obtain facilitate the integration of data mining and statistics into the discovery of new antimicrobials.
Affiliation(s)
- Silvia Gervasoni
- University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Giuliano Malloci
- University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy.
- Andrea Bosin
- University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Attilio V Vargiu
- University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Helen I Zgurskaya
- University of Oklahoma, Department of Chemistry and Biochemistry, Norman, OK, 73072, United States
- Paolo Ruggerone
- University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
9
Geslin D, Lepailleur A, Manguin JL, Vo NV, Lamotte JL, Cuissart B, Bureau R. Deciphering a Pharmacophore Network: A Case Study Using BCR-ABL Data. J Chem Inf Model 2022; 62:678-691. [PMID: 35080879 DOI: 10.1021/acs.jcim.1c00427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
This paper introduces a general method that can be used to create groups of pharmacophores to support their further in-depth analysis. A BCR-ABL molecular dataset was used to calculate graph edit distances between pharmacophores and led to their organization into a novel pharmacophore network. The application of a graph layout algorithm allowed us to discriminate between the pharmacophores associated with active compounds and those associated with inactive compounds. A clustering approach was used to refine the partitioning by grouping the pharmacophores based on their structures, activities, and binding modes. Analysis of a newly spatialized pharmacophore network provided us with critical insight into structure-activity relationships, most notably those that revealed distinctions between activity classes and chemical families. As shown, this method permits us to identify families of structurally homogeneous pharmacophores.
Affiliation(s)
- Damien Geslin
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France; Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France
- Alban Lepailleur
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
- Jean-Luc Manguin
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France
- Nhat-Vinh Vo
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France
- Jean-Luc Lamotte
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France; Sorbonne Université, UFR 919, 4 place Jussieu, F-75252 Paris Cedex 05, France
- Bertrand Cuissart
- Groupe de Recherche en Informatique, Image, Automatique et Instrumentation de Caen, Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen, France
- Ronan Bureau
- Centre d'Etudes et de Recherche sur le Médicament de Normandie, Normandie Univ, UNICAEN, CERMN, 14000 Caen, France
10
Zankov DV, Matveieva M, Nikonenko AV, Nugmanov RI, Baskin II, Varnek A, Polishchuk P, Madzhidov TI. QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach. J Chem Inf Model 2021; 61:4913-4923. [PMID: 34554736 DOI: 10.1021/acs.jcim.1c00692] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Modern QSAR approaches have wide practical applications in drug discovery for designing potentially bioactive molecules. If such models are based on the use of 2D descriptors, important information contained in the spatial structures of molecules is lost. The major problem in constructing models using 3D descriptors is the choice of a putative bioactive conformation, which affects the predictive performance. The multi-instance (MI) learning approach considering multiple conformations in model training could be a reasonable solution to the above problem. In this study, we implemented several multi-instance algorithms, both conventional and based on deep learning, and investigated their performance. We compared the performance of MI-QSAR models with those based on the classical single-instance QSAR (SI-QSAR) approach in which each molecule is encoded by either 2D descriptors computed for the corresponding molecular graph or 3D descriptors issued for a single lowest energy conformation. The calculations were carried out on 175 data sets extracted from the ChEMBL23 database. It is demonstrated that (i) MI-QSAR outperforms SI-QSAR in numerous cases and (ii) MI algorithms can automatically identify plausible bioactive conformations.
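Note: the multi-instance idea in this abstract, where a molecule is a "bag" of conformer descriptor vectors sharing one activity label, can be sketched with two simple baselines. This is an illustrative sketch only and not the algorithms implemented in the study; the function names, the toy bag, and the fixed linear scorer are all hypothetical.

```python
def mean_pool(bag):
    """Bag-level baseline: average the conformer descriptor vectors
    so that any single-instance model can be trained on the result."""
    n = len(bag)
    return [sum(column) / n for column in zip(*bag)]


def bag_prediction(bag, instance_model):
    """Instance-level baseline: score every conformer, then average."""
    scores = [instance_model(conformer) for conformer in bag]
    return sum(scores) / len(scores)


def top_conformer(bag, instance_model):
    """Index of the conformer with the highest instance score, a crude
    stand-in for picking a candidate bioactive conformation."""
    return max(range(len(bag)), key=lambda i: instance_model(bag[i]))


def toy_model(descriptors):
    """Hypothetical fixed linear scorer standing in for a trained model."""
    return 0.5 * descriptors[0] + 0.25 * descriptors[1]


# Hypothetical bag: three conformers, two 3D descriptors each.
bag = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(mean_pool(bag))
print(bag_prediction(bag, toy_model))
print(top_conformer(bag, toy_model))
```

Mean pooling discards which conformer drove the prediction, whereas instance-level scoring keeps per-conformer scores that can be inspected afterwards.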
Affiliation(s)
- Dmitry V Zankov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya 29, 420111 Kazan, Russia; Laboratory of Chemoinformatics, Institute Le Bel, University of Strasbourg, B. Pascal 4, 67081 Strasbourg, France
- Mariia Matveieva
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900 Olomouc, Czech Republic
- Aleksandra V Nikonenko
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900 Olomouc, Czech Republic
- Ramil I Nugmanov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya 29, 420111 Kazan, Russia
- Igor I Baskin
- Department of Materials Science and Engineering, Technion-Israel Institute of Technology, 3200003 Haifa, Israel
- Alexandre Varnek
- Laboratory of Chemoinformatics, Institute Le Bel, University of Strasbourg, B. Pascal 4, 67081 Strasbourg, France
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900 Olomouc, Czech Republic
- Timur I Madzhidov
- Laboratory of Chemoinformatics and Molecular Modeling, A. M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlyovskaya 29, 420111 Kazan, Russia