1
Bahia MS, Kaspi O, Touitou M, Binayev I, Dhail S, Spiegel J, Khazanov N, Yosipof A, Senderowitz H. A comparison between 2D and 3D descriptors in QSAR modeling based on bio-active conformations. Mol Inform 2023; 42:e2200186. [PMID: 36617991 DOI: 10.1002/minf.202200186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/03/2023] [Revised: 01/03/2023] [Accepted: 01/04/2023] [Indexed: 01/10/2023]
Abstract
QSAR models are widely and successfully used in many research areas. The success of such models depends heavily on molecular descriptors, typically classified as 1D, 2D, 3D, or 4D. While 3D information is likely important, e.g., for modeling ligand-protein binding, previous comparisons between the performances of 2D and 3D descriptors were inconclusive. Yet in such comparisons the modeled ligands were not necessarily represented by their bioactive conformations. With this in mind, we mined the PDB for sets of protein-ligand complexes sharing the same protein for which uniform activity data were reported. The results, totaling 461 structures spread across six series, were compiled into a carefully curated, first-of-its-kind dataset in which each ligand is represented by its bioactive conformation. Next, each set was characterized by 2D, 3D and 2D + 3D descriptors and modeled using three machine learning algorithms, namely, k-Nearest Neighbors, Random Forest and Lasso Regression. Model performance was evaluated on external test sets derived from the parent datasets either randomly or in a rational manner. We found that many more significant models were obtained when combining 2D and 3D descriptors. We attribute these improvements to the ability of 2D and 3D descriptors to code for different, yet complementary molecular properties.
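As a rough illustration of what a "2D + 3D" descriptor set means in practice, the sketch below concatenates two descriptor blocks per ligand and passes the result to a bare-bones k-Nearest Neighbors regressor. The function names, the toy numbers, and the unscaled Euclidean distance are assumptions made for illustration; the abstract does not describe the authors' implementation, and real descriptor blocks would normally be scaled before computing distances.

```python
import math


def combine_descriptors(desc_2d, desc_3d):
    """Concatenate per-ligand 2D and 3D descriptor vectors (hypothetical helper)."""
    if len(desc_2d) != len(desc_3d):
        raise ValueError("both blocks must describe the same ligands")
    return [list(a) + list(b) for a, b in zip(desc_2d, desc_3d)]


def knn_predict(train_x, train_y, query, k=3):
    """Predict an activity as the mean of the k nearest training activities."""
    # Sort training ligands by Euclidean distance to the query vector.
    ranked = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


if __name__ == "__main__":
    # Toy data: one 2D and one 3D descriptor per ligand (made-up values).
    two_d = [[0.0], [1.0], [10.0]]
    three_d = [[0.0], [1.0], [10.0]]
    activities = [1.0, 3.0, 9.0]
    features = combine_descriptors(two_d, three_d)
    print(knn_predict(features, activities, [0.5, 0.5], k=2))
```

Concatenation is the simplest way to let a model see both kinds of information at once; whether it helps depends on the dataset, as the abstract itself notes.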
Affiliation(s)
- Omer Kaspi
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Meir Touitou
  - School of Cancer and Pharmaceutical Sciences, King's College London, London, 150 Stamford Street, SE1 9NH, United Kingdom
- Idan Binayev
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Seema Dhail
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Jacob Spiegel
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Netaly Khazanov
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
- Abraham Yosipof
  - Department of Information Systems, College of Law & Business, Ramat-Gan, P.O. Box 852, Bnei Brak, 5110801, Israel
- Hanoch Senderowitz
  - Department of Chemistry, Bar-Ilan University, Ramat-Gan, 5290002, Israel
2
van Tilborg D, Alenicheva A, Grisoni F. Exposing the Limitations of Molecular Machine Learning with Activity Cliffs. J Chem Inf Model 2022; 62:5938-5951. [PMID: 36456532 PMCID: PMC9749029 DOI: 10.1021/acs.jcim.2c01073] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 08/23/2022] [Indexed: 12/03/2022]
Abstract
Machine learning has become a crucial tool in drug discovery and chemistry at large, e.g., to predict molecular properties, such as bioactivity, with high accuracy. However, activity cliffs (pairs of molecules that are highly similar in their structure but exhibit large differences in potency) have received limited attention for their effect on model performance. Not only are these edge cases informative for molecule discovery and optimization but also models that are well equipped to accurately predict the potency of activity cliffs have increased potential for prospective applications. Our work aims to fill the current knowledge gap on best-practice machine learning methods in the presence of activity cliffs. We benchmarked a total of 24 machine and deep learning approaches on curated bioactivity data from 30 macromolecular targets for their performance on activity cliff compounds. While all methods struggled in the presence of activity cliffs, machine learning approaches based on molecular descriptors outperformed more complex deep learning methods. Our findings highlight large case-by-case differences in performance, advocating for (a) the inclusion of dedicated "activity-cliff-centered" metrics during model development and evaluation and (b) the development of novel algorithms to better predict the properties of activity cliffs. To this end, the methods, metrics, and results of this study have been encapsulated into an open-access benchmarking platform named MoleculeACE (Activity Cliff Estimation, available on GitHub at: https://github.com/molML/MoleculeACE). MoleculeACE is designed to steer the community toward addressing the pressing but overlooked limitation of molecular machine learning models posed by activity cliffs.
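The definition of an activity cliff given in this abstract (high structural similarity, large potency difference) can be sketched in a few lines. The example below is not MoleculeACE code: fingerprints are represented as plain Python sets of feature indices, similarity is the Tanimoto coefficient, and the two thresholds (0.9 similarity, one log unit of potency) are assumed defaults chosen for illustration.

```python
from itertools import combinations


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity between two fingerprints stored as sets of on-bits."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def find_activity_cliffs(fingerprints, potencies,
                         sim_threshold=0.9, potency_gap=1.0):
    """Return index pairs that are similar in structure but far apart in potency.

    potencies are assumed to be on a log scale (e.g. pKi), so a gap of 1.0
    corresponds to a tenfold difference. Both thresholds are illustrative.
    """
    cliffs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        distant = abs(potencies[i] - potencies[j]) >= potency_gap
        if similar and distant:
            cliffs.append((i, j))
    return cliffs


if __name__ == "__main__":
    # Made-up fingerprints: molecules 0 and 1 share 9 of 10 features.
    fps = [set(range(10)), set(range(9)), set(range(20, 30))]
    pki = [6.0, 8.0, 6.1]
    print(find_activity_cliffs(fps, pki))
```

A pairwise scan like this is quadratic in the number of compounds, which is acceptable for a single target's dataset but would need indexing or clustering for larger collections.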
Affiliation(s)
- Derek van Tilborg
  - Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
- Francesca Grisoni
  - Institute for Complex Molecular Systems and Dept. Biomedical Engineering, Eindhoven University of Technology, 5612 AZ Eindhoven, The Netherlands
  - Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, 3584 CB Utrecht, The Netherlands
3
Gervasoni S, Malloci G, Bosin A, Vargiu AV, Zgurskaya HI, Ruggerone P. AB-DB: Force-Field parameters, MD trajectories, QM-based data, and Descriptors of Antimicrobials. Sci Data 2022; 9:148. [PMID: 35365662 PMCID: PMC8976083 DOI: 10.1038/s41597-022-01261-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Accepted: 03/11/2022] [Indexed: 12/13/2022] Open
Abstract
Antibiotic resistance is a major threat to public health. The development of chemo-informatic tools to guide medicinal chemistry campaigns in the efficient design of antibacterial libraries is urgently needed. We present AB-DB, an open database of all-atom force-field parameters, molecular dynamics trajectories, quantum-mechanical properties, and curated physico-chemical descriptors of antimicrobial compounds. We considered more than 300 molecules belonging to 25 families that include the most relevant antibiotic classes in clinical use, such as β-lactams and (fluoro)quinolones, as well as inhibitors of key bacterial proteins. We provide traditional descriptors together with properties obtained with Density Functional Theory calculations. Notably, AB-DB contains less conventional descriptors extracted from μs-long molecular dynamics simulations in explicit solvent. In addition, for each compound we make available force-field parameters for the major micro-species at physiological pH. With the rise of multi-drug-resistant pathogens and the consequent need for novel antibiotics, inhibitors, and drug re-purposing strategies, curated databases containing reliable and not straightforward properties facilitate the integration of data mining and statistics into the discovery of new antimicrobials.
Measurement(s): molecular physical property analysis objective
Technology Type(s): Computer Modeling
Affiliation(s)
- Silvia Gervasoni
  - University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Giuliano Malloci
  - University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Andrea Bosin
  - University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Attilio V Vargiu
  - University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
- Helen I Zgurskaya
  - University of Oklahoma, Department of Chemistry and Biochemistry, Norman, OK, 73072, United States
- Paolo Ruggerone
  - University of Cagliari, Department of Physics, I-09042, Monserrato (Cagliari), Italy
4
Matsuzaka Y, Uesawa Y. A Deep Learning-Based Quantitative Structure-Activity Relationship System Construct Prediction Model of Agonist and Antagonist with High Performance. Int J Mol Sci 2022; 23:ijms23042141. [PMID: 35216254 PMCID: PMC8877122 DOI: 10.3390/ijms23042141] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 01/26/2022] [Revised: 02/12/2022] [Accepted: 02/14/2022] [Indexed: 01/27/2023] Open
Abstract
Molecular design and evaluation for drug development and chemical safety assessment have been advanced by quantitative structure–activity relationship (QSAR) using artificial intelligence techniques, such as deep learning (DL). Previously, we reported high-performance prediction models of molecular initiation events (MIEs) in adverse toxicological outcomes using a DL-based QSAR method called DeepSnap-DL. As a novel QSAR analytical system, this method can extract feature values from images generated from a three-dimensional (3D) chemical structure. However, there is room for improvement in the time this system consumes. Therefore, in this study, we constructed an improved DeepSnap-DL system by combining the processes of generating an image from a 3D chemical structure, DL using the image as input data, and statistical calculation of prediction performance. Consequently, we found that the three prediction models of agonists or antagonists of MIEs achieved high prediction performance by optimizing the parameters of DeepSnap, such as the angle used in the depiction of the image of a 3D chemical structure, data split, and hyperparameters in DL. The improved DeepSnap-DL system will be a powerful tool for computer-aided molecular design as a novel QSAR system.
Affiliation(s)
- Yasunari Matsuzaka
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose 204-8588, Japan
  - Center for Gene and Cell Therapy, Division of Molecular and Medical Genetics, The Institute of Medical Science, University of Tokyo, Minato City 108-8639, Japan
- Yoshihiro Uesawa
  - Department of Medical Molecular Informatics, Meiji Pharmaceutical University, Kiyose 204-8588, Japan
  - Correspondence: Tel.: +81-42-495-8983
5
Hatmal MM, Abuyaman O, Taha M. Docking-generated multiple ligand poses for bootstrapping bioactivity classifying Machine Learning: Repurposing covalent inhibitors for COVID-19-related TMPRSS2 as case study. Comput Struct Biotechnol J 2021; 19:4790-4824. [PMID: 34426763 PMCID: PMC8373588 DOI: 10.1016/j.csbj.2021.08.023] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/22/2021] [Revised: 08/03/2021] [Accepted: 08/16/2021] [Indexed: 01/10/2023] Open
Abstract
In the present work we introduce the use of multiple docked poses for bootstrapping machine learning-based QSAR modelling. Ligand-receptor contact fingerprints are implemented as descriptor variables. We implemented this method for the discovery of potential inhibitors of the serine protease enzyme TMPRSS2, which is involved in the infectivity of coronaviruses. Several machine learners were scanned; however, XGBoost, support vector machines (SVM) and random forests (RF) were the best, with testing set accuracies reaching 90%. Three potential hits were identified upon using the method to scan known untested FDA-approved drugs against TMPRSS2. Subsequent molecular dynamics simulation and covalent docking supported the results of the new computational approach.
Affiliation(s)
- Ma'mon M. Hatmal
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
- Omar Abuyaman
  - Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, PO Box 330127, Zarqa 13133, Jordan
- Mutasem Taha
  - Department of Pharmaceutical Sciences, Faculty of Pharmacy, University of Jordan, Amman 11942, Jordan
6
Carracedo-Reboredo P, Liñares-Blanco J, Rodríguez-Fernández N, Cedrón F, Novoa FJ, Carballal A, Maojo V, Pazos A, Fernandez-Lozano C. A review on machine learning approaches and trends in drug discovery. Comput Struct Biotechnol J 2021; 19:4538-4558. [PMID: 34471498 PMCID: PMC8387781 DOI: 10.1016/j.csbj.2021.08.011] [Citation(s) in RCA: 103] [Impact Index Per Article: 34.3] [Received: 03/01/2021] [Revised: 08/06/2021] [Accepted: 08/06/2021] [Indexed: 12/30/2022] Open
Abstract
Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, the approach used in this search has acquired an important computer science component, with the use of machine learning techniques skyrocketing due to their democratization. With the objectives set by the Precision Medicine initiative and the new challenges generated, it is necessary to establish robust, standard and reproducible computational methodologies to achieve those objectives. Currently, predictive models based on Machine Learning have gained great importance in the step prior to preclinical studies. This stage manages to drastically reduce costs and research times in the discovery of new drugs. This review article focuses on how these new methodologies have been used in recent years of research. Analyzing the state of the art in this field will give us an idea of where cheminformatics is heading in the short term, the limitations it presents and the positive results it has achieved. This review will focus mainly on the methods used to model the molecular data, as well as the biological problems addressed and the Machine Learning algorithms used for drug discovery in recent years.
Key Words
- ADMET, Absorption, distribution, metabolism, elimination and toxicity
- ADR, Adverse Drug Reaction
- AI, Artificial Intelligence
- ANN, Artificial Neural Networks
- APFP, Atom Pairs 2d FingerPrint
- AUC, Area under the Curve
- BBB, Blood–Brain barrier
- CDK, Chemistry Development Kit
- CNN, Convolutional Neural Networks
- CNS, Central Nervous System
- CPI, Compound-protein interaction
- CV, Cross Validation
- Cheminformatics
- DL, Deep Learning
- DNA, Deoxyribonucleic acid
- Deep Learning
- Drug Discovery
- ECFP, Extended Connectivity Fingerprints
- FDA, Food and Drug Administration
- FNN, Fully Connected Neural Networks
- FP, Fingerprints
- FS, Feature Selection
- GCN, Graph Convolutional Networks
- GEO, Gene Expression Omnibus
- GNN, Graph Neural Networks
- GO, Gene Ontology
- KEGG, Kyoto Encyclopedia of Genes and Genomes
- MACCS, Molecular ACCess System
- MCC, Matthews correlation coefficient
- MD, Molecular Descriptors
- MKL, Multiple Kernel Learning
- ML, Machine Learning
- Machine Learning
- Molecular Descriptors
- NB, Naive Bayes
- OOB, Out of Bag
- PCA, Principal Component Analysis
- QSAR
- QSAR, Quantitative structure–activity relationship
- RF, Random Forest
- RNA, Ribonucleic Acid
- SMILES, simplified molecular-input line-entry system
- SVM, Support Vector Machines
- TCGA, The Cancer Genome Atlas
- WHO, World Health Organization
- t-SNE, t-Distributed Stochastic Neighbor Embedding
Affiliation(s)
- Paula Carracedo-Reboredo
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Jose Liñares-Blanco
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
- Nereida Rodríguez-Fernández
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco Cedrón
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Francisco J. Novoa
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Adrian Carballal
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Department of Computer Science and Information Technologies, Faculty of Communication Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
- Victor Maojo
  - Biomedical Informatics Group, Artificial Intelligence Department, Polytechnic University of Madrid, Calle de los Ciruelos, Boadilla del Monte, Madrid 28660, Spain
- Alejandro Pazos
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
  - Department of Computer Science and Information Technologies, Faculty of Computer Science, Universidade da Coruna, Campus Elviña s/n, A Coruña 15071, Spain
  - CITIC-Research Center of Information and Communication Technologies, Universidade da Coruna, A Coruña 15071, Spain
  - Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR), Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
7
Two Decades of 4D-QSAR: A Dying Art or Staging a Comeback? Int J Mol Sci 2021; 22:ijms22105212. [PMID: 34069090 PMCID: PMC8156896 DOI: 10.3390/ijms22105212] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/21/2021] [Revised: 05/11/2021] [Accepted: 05/12/2021] [Indexed: 01/01/2023] Open
Abstract
A key question confronting computational chemists concerns the preferable ligand geometry that fits complementarily into the receptor pocket. Typically, the postulated ‘bioactive’ 3D ligand conformation is constructed as a ‘sophisticated guess’ (unnecessarily geometry-optimized) mirroring the pharmacophore hypothesis, sometimes based on an erroneous prerequisite. Hence, the 4D-QSAR scheme and its ‘dialects’ have been practically implemented as a higher level of model abstraction that allows the examination of multiple molecular conformations, orientations and protonation representations, respectively. Nearly a quarter of a century has passed since the eminent work of Hopfinger appeared on the stage; therefore, the natural question arises whether the 4D-QSAR approach is still appealing to the scientific community. With no intention to be comprehensive, a review of the current state of the art in the field of receptor-independent (RI) and receptor-dependent (RD) 4D-QSAR methodology is provided with a brief examination of the ‘mainstream’ algorithms. In fact, a myriad of 4D-QSAR methods have been implemented and applied practically for a diverse range of molecules. It seems that the 4D-QSAR approach has been experiencing a promising renaissance of interest that might be fuelled by the rising power of the graphics processing unit (GPU) clusters applied to full-atom MD-based simulations of protein-ligand complexes.
8
Mizera M, Latek D. Ligand-Receptor Interactions and Machine Learning in GCGR and GLP-1R Drug Discovery. Int J Mol Sci 2021; 22:ijms22084060. [PMID: 33920024 PMCID: PMC8071054 DOI: 10.3390/ijms22084060] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 02/24/2021] [Revised: 03/31/2021] [Accepted: 04/07/2021] [Indexed: 12/03/2022] Open
Abstract
The large amount of data that has been collected so far for G protein-coupled receptors requires machine learning (ML) approaches to fully exploit its potential. Our previous ML model based on gradient boosting used for prediction of drug affinity and selectivity for a receptor subtype was compared with explicit information on ligand-receptor interactions from induced-fit docking. Both methods have proved their usefulness in drug response predictions. Yet, their successful combination still requires allosteric/orthosteric assignment of ligands from datasets. Our ligand datasets included activities of two members of the secretin receptor family: GCGR and GLP-1R. Simultaneous activation of two or three receptors of this family by dual or triple agonists is not a typical kind of information included in compound databases. A precise allosteric/orthosteric ligand assignment requires a continuous update based on new structural and biological data. This data incompleteness remains the main obstacle for current ML methods applied to class B GPCR drug discovery. Even so, for these two class B receptors, our ligand-based ML model demonstrated high accuracy (5-fold cross-validation Q2 > 0.63 and Q2 > 0.67 for GLP-1R and GCGR, respectively). In addition, we performed a ligand annotation using recent cryogenic-electron microscopy (cryo-EM) and X-ray crystallographic data on small-molecule complexes of GCGR and GLP-1R. As a result, we assigned GLP-1R and GCGR actives deposited in ChEMBL to four small-molecule binding sites occupied by positive and negative allosteric modulators and a full agonist. Annotated compounds were added to our recently released repository of GPCR data.
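For readers unfamiliar with the cross-validated Q2 statistic quoted in this abstract, the sketch below computes it for a toy one-descriptor linear model: each compound is predicted by a model fitted without its fold, and Q2 is one minus the ratio of the prediction error sum of squares to the total sum of squares. The interleaved fold assignment, the least-squares line, and the use of the overall mean in the denominator are assumptions; conventions differ between papers, and the authors' gradient boosting model is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def cross_validated_q2(xs, ys, k=5):
    """Q2 = 1 - PRESS / SS_tot, with predictions taken from held-out folds."""
    n = len(xs)
    predictions = [0.0] * n
    for fold in range(k):
        # Interleaved folds: compound i belongs to fold i % k (an assumption).
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in test:
            predictions[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / n
    press = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    descriptor = [float(i) for i in range(10)]
    # Made-up activities: a linear trend with alternating noise.
    activity = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5)
                for i, x in enumerate(descriptor)]
    print(cross_validated_q2(descriptor, activity))
```

Unlike the fitted R2, Q2 can be negative when held-out predictions are worse than simply predicting the mean, which is why it is the more conservative figure to report.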
9
Kyaw Zin PP, Borrel A, Fourches D. Benchmarking 2D/3D/MD-QSAR Models for Imatinib Derivatives: How Far Can We Predict? J Chem Inf Model 2020; 60:3342-3360. [PMID: 32623886 DOI: 10.1021/acs.jcim.0c00200] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/13/2023]
Abstract
Imatinib, a 2-phenylaminopyridine-based BCR-ABL tyrosine kinase inhibitor, is a highly effective drug for treating Chronic Myeloid Leukemia (CML). However, cases of drug resistance are constantly emerging due to various mutations in the ABL kinase domain; thus, it is crucial to identify novel bioactive analogues. Reliable QSAR models and molecular docking protocols have been shown to facilitate the discovery of new compounds from chemical libraries prior to experimental testing. However, as the vast majority of QSAR models rely strictly on 2D descriptors, the rise of 3D descriptors directly computed from molecular dynamics simulations offers new opportunities to potentially augment the reliability of QSAR models. Herein, we employed molecular docking and molecular dynamics on a large series of Imatinib derivatives and developed an ensemble of QSAR models relying on deep neural nets (DNN) and hybrid sets of 2D/3D/MD descriptors in order to predict the binding affinity and inhibition potencies of those compounds. Through rigorous validation tests, we showed that our DNN regression models achieved excellent external prediction performances for the pKi data set (n = 555, R2 ≥ 0.71 and MAE ≤ 0.85), and the pIC50 data set (n = 306, R2 ≥ 0.54 and MAE ≤ 0.71) with strict validation protocols based on external test sets and 10-fold native and nested cross validations. Interestingly, the best DNN and random forest models performed similarly across all descriptor sets. In fact, for this particular series of compounds, our external test results suggest that incorporating additional 3D protein-ligand binding site fingerprint, descriptors, or even MD time-series descriptors did not significantly improve the overall R2 but lowered the MAE of DNN QSAR models. Those augmented models could still help in identifying and understanding the key dynamic protein-ligand interactions to be optimized for further molecular design.
Affiliation(s)
- Phyo Phyo Kyaw Zin
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
- Alexandre Borrel
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
- Denis Fourches
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina 27695, United States
10
Matsuzaka Y, Uesawa Y. Molecular Image-Based Prediction Models of Nuclear Receptor Agonists and Antagonists Using the DeepSnap-Deep Learning Approach with the Tox21 10K Library. Molecules 2020; 25:molecules25122764. [PMID: 32549344 PMCID: PMC7356846 DOI: 10.3390/molecules25122764] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/07/2020] [Revised: 06/06/2020] [Accepted: 06/12/2020] [Indexed: 02/07/2023] Open
Abstract
The interaction of nuclear receptors (NRs) with chemical compounds can cause dysregulation of endocrine signaling pathways, leading to adverse health outcomes due to the disruption of natural hormones. Thus, identifying possible ligands of NRs is a crucial task for understanding the adverse outcome pathway (AOP) for human toxicity as well as the development of novel drugs. However, the experimental assessment of novel ligands remains expensive and time-consuming. Therefore, an in silico approach with a wide range of applications instead of experimental examination is highly desirable. The recently developed novel molecular image-based deep learning (DL) method, DeepSnap-DL, can produce multiple snapshots from three-dimensional (3D) chemical structures and has achieved high performance in the prediction of chemicals for toxicological evaluation. In this study, we used DeepSnap-DL to construct prediction models of 35 agonist and antagonist allosteric modulators of NRs for chemicals derived from the Tox21 10K library. We demonstrate the high performance of DeepSnap-DL in constructing prediction models. These findings may aid in interpreting the key molecular events of toxicity and support the development of new fields of machine learning to identify environmental chemicals with the potential to interact with NR signaling pathways.
11
Li X, Fourches D. Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT. J Cheminform 2020; 12:27. [PMID: 33430978 PMCID: PMC7178569 DOI: 10.1186/s13321-020-00430-x] [Citation(s) in RCA: 50] [Impact Index Per Article: 12.5] [Received: 01/29/2020] [Accepted: 04/15/2020] [Indexed: 12/25/2022] Open
Abstract
Deep neural networks can directly learn from chemical structures without extensive, user-driven selection of descriptors in order to predict molecular properties/activities with high reliability. But these approaches typically require large training sets to learn the endpoint-specific structural features and ensure reasonable prediction accuracy. Even though large datasets are becoming the new normal in drug discovery, especially when it comes to high-throughput screening or metabolomics datasets, one should also consider smaller datasets with challenging endpoints to model and forecast. Thus, it would be highly relevant to better utilize the tremendous compendium of unlabeled compounds from publicly-available datasets for improving the model performances for the user’s particular series of compounds. In this study, we propose the Molecular Prediction Model Fine-Tuning (MolPMoFiT) approach, an effective transfer learning method based on self-supervised pre-training + task-specific fine-tuning for QSPR/QSAR modeling. A large-scale molecular structure prediction model is pre-trained using one million unlabeled molecules from ChEMBL in a self-supervised learning manner, and can then be fine-tuned on various QSPR/QSAR tasks for smaller chemical datasets with specific endpoints. Herein, the method is evaluated on four benchmark datasets (lipophilicity, FreeSolv, HIV, and blood–brain barrier penetration). The results showed the method can achieve strong performances for all four datasets compared to other state-of-the-art machine learning modeling techniques reported in the literature so far.
Affiliation(s)
- Xinhao Li
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
- Denis Fourches
  - Department of Chemistry, Bioinformatics Research Center, North Carolina State University, Raleigh, NC, 27695, USA
12
Singh N, Chaput L, Villoutreix BO. Virtual screening web servers: designing chemical probes and drug candidates in the cyberspace. Brief Bioinform 2020; 22:1790-1818. [PMID: 32187356 PMCID: PMC7986591 DOI: 10.1093/bib/bbaa034] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Indexed: 12/13/2022] Open
Abstract
The interplay between life sciences and advancing technology drives a continuous cycle of chemical data growth; these data are most often stored in open or partially open databases. In parallel, many different types of algorithms are being developed to manipulate these chemical objects and associated bioactivity data. Virtual screening methods are among the most popular computational approaches in pharmaceutical research. Today, user-friendly web-based tools are available to help scientists perform virtual screening experiments. This article provides an overview of internet resources enabling and supporting chemical biology and early drug discovery with a main emphasis on web servers dedicated to virtual ligand screening and small-molecule docking. This survey first introduces some key concepts and then presents recent and easily accessible virtual screening and related target-fishing tools as well as briefly discusses case studies enabled by some of these web services. Notwithstanding further improvements, already available web-based tools not only contribute to the design of bioactive molecules and assist drug repositioning but also help to generate new ideas and explore different hypotheses in a timely fashion while contributing to teaching in the field of drug development.
Affiliation(s)
- Natesh Singh
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Ludovic Chaput
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
- Bruno O Villoutreix
  - Univ. Lille, Inserm, Institut Pasteur de Lille, U1177 Drugs and Molecules for Living Systems, F-59000 Lille, France
13
Jones MR, Brooks BR. Quantum chemical predictions of water-octanol partition coefficients applied to the SAMPL6 logP blind challenge. J Comput Aided Mol Des 2020; 34:485-493. [PMID: 32002778 DOI: 10.1007/s10822-020-00286-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 10/15/2019] [Accepted: 01/08/2020] [Indexed: 11/30/2022]
Abstract
Theoretical approaches for predicting physicochemical properties are valuable tools for accelerating the drug discovery process. In this work, quantum chemical methods are used to predict water-octanol partition coefficients as a part of the SAMPL6 blind challenge. The SMD continuum solvent model was employed with MP2 and eight DFT functionals in conjunction with correlation consistent basis sets to determine the water-octanol transfer free energy. Several tactics towards improving the predictions of the partition coefficient were examined, including increasing the quality of basis sets, considering tautomerization, and accounting for inhomogeneities in the water and n-octanol phases. Evaluation of these various schemes highlights the impact of modeling approaches across different methods. With the inclusion of tautomers and adjustments to the permittivity constants, the best predictions were obtained with smaller basis sets and the O3LYP functional, which yielded an RMSE of 0.79 logP units. The results presented correspond to the SAMPL6 logP submission IDs: DYXBT, O7DJK, and AHMTF.
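The step from a computed transfer free energy to a partition coefficient follows the standard thermodynamic relation logP = (ΔG_solv,water − ΔG_solv,octanol) / (RT ln 10). The sketch below applies it to made-up solvation free energies in kcal/mol; the function name, the temperature of 298.15 K, and the example values are assumptions for illustration and are not results from the paper.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice is an assumption of this sketch.
R_KCAL_PER_MOL_K = 1.987204e-3


def log_p_from_solvation(dg_water, dg_octanol, temperature=298.15):
    """Water-octanol logP from solvation free energies (kcal/mol).

    A more negative solvation free energy in octanol than in water means the
    solute prefers octanol, which gives a positive logP.
    """
    rt_ln10 = R_KCAL_PER_MOL_K * temperature * math.log(10.0)
    return (dg_water - dg_octanol) / rt_ln10


if __name__ == "__main__":
    # Hypothetical solvation free energies for a single solute.
    print(round(log_p_from_solvation(-5.0, -7.0), 2))
```

At room temperature RT ln 10 is roughly 1.36 kcal/mol, so an error of about 1.4 kcal/mol in the transfer free energy shifts the predicted logP by one unit.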
Affiliation(s)
- Michael R Jones
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892-5690, USA
- Bernard R Brooks
  - Laboratory of Computational Biology, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, 20892-5690, USA