1
Snyder SH, Vignaux PA, Ozalp MK, Gerlach J, Puhl AC, Lane TR, Corbett J, Urbina F, Ekins S. The Goldilocks paradigm: comparing classical machine learning, large language models, and few-shot learning for drug discovery applications. Commun Chem 2024; 7:134. [PMID: 38866916 PMCID: PMC11169557 DOI: 10.1038/s42004-024-01220-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 06/04/2024] [Indexed: 06/14/2024] Open
Abstract
Recent advances in machine learning (ML) have led to newer model architectures including transformers (large language models, LLMs) showing state of the art results in text generation and image analysis as well as few-shot learning (FSLC) models which offer predictive power with extremely small datasets. These new architectures may offer promise, yet the 'no free lunch' theorem suggests that no single model algorithm can outperform at all possible tasks. Here, we explore the capabilities of classical (SVR), FSLC, and transformer models (MolBART) over a range of dataset tasks and show a 'goldilocks zone' for each model type, in which dataset size and feature distribution (i.e. dataset "diversity") determine the optimal algorithm strategy. When datasets are small (<50 molecules), FSLC tend to outperform both classical ML and transformers. When datasets are small-to-medium sized (50-240 molecules) and diverse, transformers outperform both classical models and few-shot learning. Finally, when datasets are larger and of sufficient size, classical models then perform the best, suggesting that the optimal model to choose likely depends on the dataset available, its size and diversity. These findings may help to answer the perennial question of which ML algorithm is to be used when faced with a new dataset.
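The size and diversity thresholds reported in this abstract can be read as a simple decision rule. The sketch below only illustrates that reading and is not the authors' code: the function name `suggest_model`, the boolean `diverse` flag, the treatment of every dataset above 240 molecules as "larger", and the fallback for cases the abstract does not cover are all assumptions.

```python
def suggest_model(n_molecules: int, diverse: bool) -> str:
    """Return the model family the abstract reports as strongest for a dataset.

    Thresholds (50 and 240 molecules) come from the abstract; everything
    else, including the 'no clear recommendation' fallback, is an assumption.
    """
    if n_molecules < 50:
        # Small datasets: few-shot learning is reported to outperform the others
        return "few-shot learning"
    if n_molecules <= 240:
        # The abstract only reports transformers winning when the data are diverse
        return "transformer" if diverse else "no clear recommendation"
    # Larger datasets: classical models (e.g. SVR) are reported to perform best
    return "classical ML"


for size, is_diverse in [(30, True), (120, True), (120, False), (5000, True)]:
    print(size, is_diverse, suggest_model(size, is_diverse))
```

A rule like this is at most a starting point; the abstract itself frames the zones as tendencies ("tend to outperform", "likely depends") rather than hard boundaries.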
Affiliation(s)
- Scott H Snyder
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Patricia A Vignaux
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Mustafa Kemal Ozalp
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Jacob Gerlach
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Ana C Puhl
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Thomas R Lane
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- John Corbett
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Fabio Urbina
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
- Sean Ekins
  - Collaborations Pharmaceuticals, Inc., 840 Main Campus Drive, Lab 3510, Raleigh, NC, 27606, USA
2
Oliveira PF, Guedes RC, Falcao AO. Inferring molecular inhibition potency with AlphaFold predicted structures. Sci Rep 2024; 14:8252. [PMID: 38589418 PMCID: PMC11001998 DOI: 10.1038/s41598-024-58394-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Accepted: 03/28/2024] [Indexed: 04/10/2024] Open
Abstract
Even though in silico drug ligand-based methods have been successful in predicting interactions with known target proteins, they struggle with new, unassessed targets. To address this challenge, we propose an approach that integrates structural data from AlphaFold 2 predicted protein structures into machine learning models. Our method extracts 3D structural protein fingerprints and combines them with ligand structural data to train a single machine learning model. This model captures the relationship between ligand properties and the unique structural features of various target proteins, enabling predictions for never before tested molecules and protein targets. To assess our model, we used a dataset of 144 Human G-protein Coupled Receptors (GPCRs) with over 140,000 measured inhibition constants (Ki) values. Results strongly suggest that our approach performs as well as state-of-the-art ligand-based methods. In a second modeling approach that used 129 targets for training and a separate test set of 15 different protein targets, our model correctly predicted interactions for 73% of targets, with explained variances exceeding 0.50 in 22% of cases. Our findings further verified that the usage of experimentally determined protein structures produced models that were statistically indistinct from the Alphafold synthetic structures. This study presents a proteo-chemometric drug screening approach that uses a simple and scalable method for extracting protein structural information for usage in machine learning models capable of predicting protein-molecule interactions even for orphan targets.
Affiliation(s)
- Pedro F Oliveira
  - LASIGE, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Rita C Guedes
  - Research Institute for Medicines (iMed.ULisboa), Faculdade de Farmácia, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
- Andre O Falcao
  - Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016, Lisboa, Portugal
3
Kumar S, Bhowmik R, Oh JM, Abdelgawad MA, Ghoneim MM, Al-Serwi RH, Kim H, Mathew B. Machine learning driven web-based app platform for the discovery of monoamine oxidase B inhibitors. Sci Rep 2024; 14:4868. [PMID: 38418571 PMCID: PMC10901862 DOI: 10.1038/s41598-024-55628-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 02/26/2024] [Indexed: 03/01/2024] Open
Abstract
Monoamine oxidases (MAOs), specifically MAO-A and MAO-B, play important roles in the breakdown of monoamine neurotransmitters. Therefore, MAO inhibitors are crucial for treating various neurodegenerative disorders, including Parkinson's disease (PD), Alzheimer's disease (AD), and amyotrophic lateral sclerosis (ALS). In this study, we developed a novel cheminformatics pipeline by generating three diverse molecular feature-based machine learning-assisted quantitative structural activity relationship (ML-QSAR) models concerning MAO-B inhibition. PubChem fingerprints, substructure fingerprints, and one-dimensional (1D) and two-dimensional (2D) molecular descriptors were implemented to unravel the structural insights responsible for decoding the origin of MAO-B inhibition in 249 non-reductant molecules. Based on a random forest ML algorithm, the final PubChem fingerprint, substructure fingerprint, and 1D and 2D molecular descriptor prediction models demonstrated significant robustness, with correlation coefficients of 0.9863, 0.9796, and 0.9852, respectively. The significant features of each predictive model responsible for MAO-B inhibition were extracted using a comprehensive variance importance plot (VIP) and correlation matrix analysis. The final predictive models were further developed as a web application, MAO-B-pred (https://mao-b-pred.streamlit.app/), to allow users to predict the bioactivity of molecules against MAO-B. Molecular docking and dynamics studies were conducted to gain insight into the atomic-level molecular interactions between the ligand-receptor complexes. These findings were compared with the structural features obtained from the ML-QSAR models, which supported the mechanistic understanding of the binding phenomena. The presented models have the potential to serve as tools for identifying crucial molecular characteristics for the rational design of MAO-B target inhibitors, which may be used to develop effective drugs for neurodegenerative disorders.
Affiliation(s)
- Sunil Kumar
  - Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi, India
- Ratul Bhowmik
  - Department of Pharmaceutical Chemistry, School of Pharmaceutical Education and Research, Jamia Hamdard, New Delhi, India
- Jong Min Oh
  - Department of Pharmacy, and Research Institute of Life Pharmaceutical Sciences, Sunchon National University, Suncheon, 57922, Republic of Korea
- Mohamed A Abdelgawad
  - Department of Pharmaceutical Chemistry, College of Pharmacy, Jouf University, 72341, Sakaka, Aljouf, Saudi Arabia
- Mohammed M Ghoneim
  - Department of Pharmacy Practice, College of Pharmacy, AlMaarefa University, 13713, Ad Diriyah, Riyadh, Saudi Arabia
- Rasha Hamed Al-Serwi
  - Department of Basic Dental Sciences, College of Dentistry, Princess Nourah Bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Hoon Kim
  - Department of Pharmacy, and Research Institute of Life Pharmaceutical Sciences, Sunchon National University, Suncheon, 57922, Republic of Korea
- Bijo Mathew
  - Department of Pharmaceutical Chemistry, Amrita School of Pharmacy, Amrita Vishwa Vidyapeetham, AIMS Health Sciences Campus, Kochi, India
4
Kızılcan DŞ, Güzel Y, Türkmenoğlu B. Clustering of atoms relative to vector space in the Z-matrix coordinate system and 'graphical fingerprint' analysis of 3D pharmacophore structure. Mol Divers 2024:10.1007/s11030-023-10798-1. [PMID: 38280974 DOI: 10.1007/s11030-023-10798-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 12/20/2023] [Indexed: 01/29/2024]
Abstract
The behavior of a molecule within its environment is governed by chemical fields present in 3D space. However, beyond local descriptors in 3D, the conformations a molecule assumes, and the resulting clusters also play a role in influencing structure-activity models. This study focuses on the clustering of atoms according to the vector space of four atoms aligned in the Z-Matrix Reference system for molecular similarity. Using 3D-QSAR analysis, it was aimed to determine the pharmacophore groups as interaction points in the binding region of the β2-adrenoceptor target of fenoterol stereoisomers. Different types of local reactive descriptors of ligands have been used to elucidate points of interaction with the target. Activity values for ligand-receptor interaction energy were determined using the Levenberg-Marquardt algorithm. Using the Molecular Comparative Electron Topology method, the 3D pharmacophore model (3D-PhaM) was obtained after aligning and superimposing the molecules and was further validated by the molecular docking method. Best guesses were calculated with a non-output validation (LOO-CV) method. Finally, the data were calculated using the 'graphic fingerprint' technique. Based on the eLKlopman (Electrostatic LUMO Klopman) descriptor, the Q2 value of this derivative set was calculated as 0.981 and the R2ext value is calculated as 0.998.
Affiliation(s)
- Dilek Şeyma Kızılcan
  - Department of Chemistry, Faculty of Science, Erciyes University, Kayseri, Turkey
- Yahya Güzel
  - Department of Chemistry, Faculty of Science, Erciyes University, Kayseri, Turkey
- Burçin Türkmenoğlu
  - Department of Analytical Chemistry, Faculty of Pharmacy, Erzincan Binali Yıldırım University, Erzincan, Turkey
5
Kalian AD, Benfenati E, Osborne OJ, Gott D, Potter C, Dorne JLCM, Guo M, Hogstrand C. Exploring Dimensionality Reduction Techniques for Deep Learning Driven QSAR Models of Mutagenicity. Toxics 2023; 11:572. [PMID: 37505541 PMCID: PMC10384850 DOI: 10.3390/toxics11070572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 06/28/2023] [Accepted: 06/28/2023] [Indexed: 07/29/2023]
Abstract
Dimensionality reduction techniques are crucial for enabling deep learning driven quantitative structure-activity relationship (QSAR) models to navigate higher dimensional toxicological spaces, however the use of specific techniques is often arbitrary and poorly explored. Six dimensionality techniques (both linear and non-linear) were hence applied to a higher dimensionality mutagenicity dataset and compared in their ability to power a simple deep learning driven QSAR model, following grid searches for optimal hyperparameter values. It was found that comparatively simpler linear techniques, such as principal component analysis (PCA), were sufficient for enabling optimal QSAR model performances, which indicated that the original dataset was at least approximately linearly separable (in accordance with Cover's theorem). However certain non-linear techniques such as kernel PCA and autoencoders performed at closely comparable levels, while (especially in the case of autoencoders) being more widely applicable to potentially non-linearly separable datasets. Analysis of the chemical space, in terms of XLogP and molecular weight, uncovered that the vast majority of testing data occurred within the defined applicability domain, as well as that certain regions were measurably more problematic and antagonised performances. It was however indicated that certain dimensionality reduction techniques were able to facilitate uniquely beneficial navigations of the chemical space.
Affiliation(s)
- Alexander D Kalian
  - Department of Nutritional Sciences, King's College London, Franklin-Wilkins Building, 150 Stamford St., London SE1 9NH, UK
- Emilio Benfenati
  - Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
- David Gott
  - Food Standards Agency, 70 Petty France, London SW1H 9EX, UK
- Claire Potter
  - Food Standards Agency, 70 Petty France, London SW1H 9EX, UK
- Jean-Lou C M Dorne
  - European Food Safety Authority (EFSA), Via Carlo Magno 1A, 43126 Parma, Italy
- Miao Guo
  - Department of Engineering, King's College London, Strand Campus, Strand, London WC2R 2LS, UK
- Christer Hogstrand
  - Department of Analytical, Environmental and Forensic Sciences, King's College London, Franklin-Wilkins Building, 150 Stamford St., London SE1 9NH, UK
6
Pinel P, Guichaoua G, Najm M, Labouille S, Drizard N, Gaston-Mathé Y, Hoffmann B, Stoven V. Exploring isofunctional molecules: Design of a benchmark and evaluation of prediction performance. Mol Inform 2023; 42:e2200216. [PMID: 36633361 DOI: 10.1002/minf.202200216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2022] [Revised: 12/19/2022] [Accepted: 01/11/2023] [Indexed: 01/13/2023]
Abstract
Identification of novel chemotypes with biological activity similar to a known active molecule is an important challenge in drug discovery called 'scaffold hopping'. Small-, medium-, and large-step scaffold hopping efforts may lead to increasing degrees of chemical structure novelty with respect to the parent compound. In the present paper, we focus on the problem of large-step scaffold hopping. We assembled a high quality and well characterized dataset of scaffold hopping examples comprising pairs of active molecules and including a variety of protein targets. This dataset was used to build a benchmark corresponding to the setting of real-life applications: one active molecule is known, and the second active is searched among a set of decoys chosen in a way to avoid statistical bias. This allowed us to evaluate the performance of computational methods for solving large-step scaffold hopping problems. In particular, we assessed how difficult these problems are, particularly for classical 2D and 3D ligand-based methods. We also showed that a machine-learning chemogenomic algorithm outperforms classical methods and we provided some useful hints for future improvements.
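The benchmark setting described in this abstract (one known active, a second active hidden among decoys) can be scored by asking where the hidden active lands when all candidates are ranked. The sketch below is an illustration only: the function name, the idea that each candidate already has a numeric score (higher meaning more similar to the known active), and the tie-handling rule are assumptions, not details taken from the paper.

```python
def rank_of_hidden_active(active_score: float, decoy_scores: list[float]) -> int:
    """Rank (1 = best) of the hidden active among the decoys.

    Only decoys scoring strictly higher than the active push it down the
    list; ties are resolved in the active's favour (an assumption).
    """
    return 1 + sum(1 for score in decoy_scores if score > active_score)


# Hypothetical scores for one benchmark case
decoys = [0.91, 0.40, 0.35, 0.12]
print(rank_of_hidden_active(0.62, decoys))  # one decoy scores higher
```

Under this reading, a method solves a case well when the hidden active is ranked at or near position 1, and the distribution of ranks over all cases summarises how hard large-step scaffold hopping is for that method.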
Affiliation(s)
- Philippe Pinel
  - Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France
  - Institut Curie, 75248, Paris, France
  - INSERM U900, 75428, Paris, France
  - Iktos SAS, 75017, Paris, France
- Gwenn Guichaoua
  - Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France
  - Institut Curie, 75248, Paris, France
  - INSERM U900, 75428, Paris, France
- Matthieu Najm
  - Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France
  - Institut Curie, 75248, Paris, France
  - INSERM U900, 75428, Paris, France
- Véronique Stoven
  - Center for Computational Biology, Mines Paris-PSL, PSL Research University, 75006, Paris, France
  - Institut Curie, 75248, Paris, France
  - INSERM U900, 75428, Paris, France
7
Yang J, Cai Y, Zhao K, Xie H, Chen X. Concepts and applications of chemical fingerprint for hit and lead screening. Drug Discov Today 2022; 27:103356. [PMID: 36113834 DOI: 10.1016/j.drudis.2022.103356] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 12/07/2021] [Revised: 07/28/2022] [Accepted: 09/08/2022] [Indexed: 11/22/2022]
Abstract
Molecular fingerprints are used to represent chemical (structural, physicochemical, etc.) properties of large-scale chemical sets in a low computational cost way. They have a prominent role in transforming chemical data sets into consistent input formats (bit strings or numeric values) suitable for in silico approaches. In this review, we summarize and classify common and state-of-the-art fingerprints into eight different types (dictionary based, circular, topological, pharmacophore, protein-ligand interaction, shape based, reinforced, and multi). We also highlight applications of fingerprints in early drug research and development (R&D). Thus, this review provides a guide for the selection of appropriate fingerprints of compounds (or ligand-protein complexes) for use in drug R&D.
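As a concrete illustration of the bit-string representation mentioned in this abstract, the sketch below compares two toy fingerprints with the Tanimoto coefficient. The abstract does not name a similarity measure, so the choice of Tanimoto, the function name, and the example bit strings are assumptions made purely for illustration.

```python
def tanimoto(fp_a: str, fp_b: str) -> float:
    """Tanimoto coefficient between two equal-length '0'/'1' bit strings."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    # Positions of the bits that are switched on in each fingerprint
    on_a = {i for i, bit in enumerate(fp_a) if bit == "1"}
    on_b = {i for i, bit in enumerate(fp_b) if bit == "1"}
    union = on_a | on_b
    if not union:
        return 0.0  # convention chosen here for two all-zero fingerprints
    return len(on_a & on_b) / len(union)


# Two hypothetical 8-bit fingerprints sharing two of their four set bits
print(tanimoto("11010000", "10010010"))
```

Because both inputs are fixed-length bit strings, the same comparison works for any fingerprint type that can be folded into that format.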
Affiliation(s)
- Jingbo Yang
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Yiyang Cai
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Kairui Zhao
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Hongbo Xie
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
- Xiujie Chen
  - Department of Pharmagenomics, College of Bioinformatics Science and Technology, Harbin Medical University, 150081 Harbin, Heilongjiang, China
8
Structural Model Based on Genetic Algorithm for Inhibiting Fatty Acid Amide Hydrolase. AI 2022. [DOI: 10.3390/ai3040052] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022] Open
Abstract
The fatty acid amide hydrolase (FAAH) is an enzyme responsible for the degradation of anandamide, an endocannabinoid. Pharmacologically blocking this target can lead to anxiolytic effects; therefore, new inhibitors can improve therapy in this field. In order to speed up the process of drug discovery, various in silico methods can be used, such as molecular docking, quantitative structure–activity relationship models (QSAR), and artificial intelligence (AI) classification algorithms. Besides architecture, one important factor for an AI model with high accuracy is the dataset quality. This issue can be solved by a genetic algorithm that can select optimal features for the prediction. The objective of the current study is to use this feature selection method in order to identify the most relevant molecular descriptors that can be used as independent variables, thus improving the efficacy of AI algorithms that can predict FAAH inhibitors. The model that used features chosen by the genetic algorithm had better accuracy than the model that used all molecular descriptors generated by the CDK descriptor calculator 1.4.6 software. Hence, carefully selecting the input data used by AI classification algorithms by using a GA is a promising strategy in drug development.
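The feature-selection idea in this abstract can be sketched as a small genetic algorithm over binary masks, where each bit says whether a descriptor is kept. This is a minimal illustration under stated assumptions, not the study's implementation: the function names, population settings, and especially the toy fitness function are hypothetical. In the study the fitness would presumably be the accuracy of the classifier trained on the selected descriptors.

```python
import random


def select_features(n_features, fitness, pop_size=20, generations=30,
                    mutation_rate=0.1, seed=0):
    """Evolve 0/1 masks over features; return (best_mask, best_fitness_history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        next_gen = [ranked[0][:]]          # elitism: the best mask always survives
        parents = ranked[:pop_size // 2]   # truncation selection of the top half
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = mother[:cut] + father[cut:]
            # Flip each bit independently with probability mutation_rate
            child = [1 - bit if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    history.append(fitness(best))
    return best, history


# Toy stand-in for model accuracy: reward two "informative" descriptors
# (indices 0 and 2, chosen arbitrarily) and penalise mask size.
INFORMATIVE = {0, 2}


def toy_fitness(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in INFORMATIVE)
    return hits - 0.1 * sum(mask)


best_mask, best_history = select_features(6, toy_fitness)
print(best_mask, best_history[-1])
```

Keeping the best mask in every generation means the best fitness can never decrease, which makes the run easy to monitor; the price is a somewhat higher risk of converging early on a local optimum.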
9
Devillers J, Sartor V, Devillers H. Predicting mosquito repellents for clothing application from molecular fingerprint-based artificial neural network SAR models. SAR and QSAR in Environmental Research 2022; 33:729-751. [PMID: 36106833 DOI: 10.1080/1062936x.2022.2124014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2022] [Accepted: 09/06/2022] [Indexed: 06/15/2023]
Abstract
Spraying repellents on clothing limits toxicity and allergy problems that can occur when the repellents are directly applied to skin. This also allows the use of higher doses to ensure longer lasting effects. As the number of repellents available on the market is limited, it is necessary to propose new ones, especially by using in silico methods that reduce costs and time. In this context SAR models were built from a dataset of 2027 chemicals for which repellent activity on clothing was measured against Aedes aegypti. The interest of using either the ECFP or MACCS fingerprints as input neurons of a three-layer perceptron was evaluated. Transformation of MACCS bit strings into disjunctive tables led to interesting results. Models obtained with both types of fingerprints were compared to a model including physicochemical and topological descriptors.
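The abstract does not say how the MACCS bit strings were turned into disjunctive tables. The sketch below assumes the usual complete disjunctive coding, in which every binary variable is replaced by two indicator columns, one for "bit is 0" and one for "bit is 1"; the function name and the example bits are hypothetical.

```python
def to_disjunctive(bits):
    """Expand a 0/1 fingerprint into (is_zero, is_one) indicator pairs."""
    row = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("fingerprint bits must be 0 or 1")
        # Exactly one of the two indicators is set for every original bit
        row.extend([1 - bit, bit])
    return row


# A hypothetical 3-bit fragment of a fingerprint
print(to_disjunctive([1, 0, 1]))  # [0, 1, 1, 0, 0, 1]
```

With this coding the absence of a structural key gets its own input neuron rather than being a plain zero, which doubles the input layer width.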
Affiliation(s)
- V Sartor
  - Laboratoire des IMRCP, Université de Toulouse, CNRS UMR 5623, Université Toulouse III - Paul Sabatier, Toulouse, France
- H Devillers
  - SPO, Univ Montpellier, INRAE, Institut Agro, Montpellier, France
10
Franco C, Kausar S, Silva MFB, Guedes RC, Falcao AO, Brito MA. Multi-Targeting Approach in Glioblastoma Using Computer-Assisted Drug Discovery Tools to Overcome the Blood–Brain Barrier and Target EGFR/PI3Kp110β Signaling. Cancers (Basel) 2022; 14:3506. [PMID: 35884571 PMCID: PMC9317902 DOI: 10.3390/cancers14143506] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/02/2022] [Accepted: 07/12/2022] [Indexed: 02/04/2023] Open
Simple Summary
Treatment of glioblastoma is hampered by the activation of compensatory survival mechanisms by malignant cells that lead to drug resistance. Moreover, the blood–brain barrier (BBB) precludes the brain entrance of most drugs. We hypothesized that computer-assisted drug discovery tools would reveal novel multi-targeting drug candidates with BBB-permeant and favorable ADMET properties. We aimed to discover molecules with predicted ability to inhibit the EGFR/PI3Kp110β pathway and to validate their efficacy and safety in biological assays. We used quantitative structure–activity relationship models and structure-based virtual screening, and assessed ADMET properties, to identify BBB-permeant drug candidates. Moreover, we tested their anti-tumor efficacy and BBB safety and permeation in cell models. We found two EGFR, two PI3Kp110β, and, mostly, two dual inhibitors with anti-tumor effects. Among them, one EGFR and two PI3Kp110β inhibitors were able to cross the BBB endothelium without compromising it. These studies revealed novel drug candidates for glioblastoma treatment.
Abstract
The epidermal growth factor receptor (EGFR) is upregulated in glioblastoma, becoming an attractive therapeutic target. However, activation of compensatory pathways generates inputs to downstream PI3Kp110β signaling, leading to anti-EGFR therapeutic resistance. Moreover, the blood–brain barrier (BBB) limits drugs’ brain penetration. We aimed to discover EGFR/PI3Kp110β pathway inhibitors for a multi-targeting approach, with favorable ADMET and BBB-permeant properties. We used quantitative structure–activity relationship models and structure-based virtual screening, and assessed ADMET properties, to identify BBB-permeant drug candidates. Predictions were validated in in vitro models of the human BBB and BBB-glioma co-cultures. The results disclosed 27 molecules (18 EGFR, 6 PI3Kp110β, and 3 dual inhibitors) for biological validation, performed in two glioblastoma cell lines (U87MG and U87MG overexpressing EGFR). Six molecules (two EGFR, two PI3Kp110β, and two dual inhibitors) decreased cell viability by 40–99%, with the greatest effect observed for the dual inhibitors. The glioma cytotoxicity was confirmed by analysis of targets’ downregulation and increased apoptosis (15–85%). Safety to BBB endothelial cells was confirmed for three of those molecules (one EGFR and two PI3Kp110β inhibitors). These molecules crossed the endothelial monolayer in the BBB in vitro model and in the BBB-glioblastoma co-culture system. These results revealed novel drug candidates for glioblastoma treatment.
Affiliation(s)
- Catarina Franco
  - LASIGE, Department of Informatics, Faculty of Sciences, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
  - Research Institute for Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
- Samina Kausar
  - LASIGE, Department of Informatics, Faculty of Sciences, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
  - Research Institute for Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
- Margarida F. B. Silva
  - Research Institute for Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
  - Department of Pharmaceutical Sciences and Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
- Rita C. Guedes
  - Research Institute for Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
  - Department of Pharmaceutical Sciences and Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
- Andre O. Falcao
  - LASIGE, Department of Informatics, Faculty of Sciences, Universidade de Lisboa, Campo Grande, 1749-016 Lisboa, Portugal
  - Correspondence: Tel.: +351-217500239 (A.O.F.); +351-217946449 (M.A.B.)
- Maria Alexandra Brito
  - Research Institute for Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
  - Department of Pharmaceutical Sciences and Medicines, Faculty of Pharmacy, Universidade de Lisboa, Av. Prof. Gama Pinto, 1649-003 Lisboa, Portugal
  - Correspondence: Tel.: +351-217500239 (A.O.F.); +351-217946449 (M.A.B.)
11
Lovrić M, Malev O, Klobučar G, Kern R, Liu JJ, Lučić B. Predictive Capability of QSAR Models Based on the CompTox Zebrafish Embryo Assays: An Imbalanced Classification Problem. Molecules 2021; 26:1617. [PMID: 33803931 PMCID: PMC7998177 DOI: 10.3390/molecules26061617] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/05/2021] [Revised: 03/03/2021] [Accepted: 03/11/2021] [Indexed: 02/06/2023] Open
Abstract
The CompTox Chemistry Dashboard (ToxCast) contains one of the largest public databases on Zebrafish (Danio rerio) developmental toxicity. The data consists of 19 toxicological endpoints on 1018 unique compounds measured in relatively low concentration ranges. The endpoints are related to developmental effects occurring in dechorionated zebrafish embryos for 120 hours post fertilization and monitored via gross malformations and mortality. We report the predictive capability of 209 quantitative structure-activity relationship (QSAR) models developed by machine learning methods using penalization techniques and diverse model quality metrics to cope with the imbalanced endpoints. All these QSAR models were generated to test how the imbalanced classification (toxic or non-toxic) endpoints could be predicted regardless of which of three algorithms is used: logistic regression, multi-layer perceptron, or random forests. Additionally, QSAR toxicity models are developed starting from sets of classical molecular descriptors, structural fingerprints and their combinations. Only 8 out of 209 models passed the 0.20 Matthews correlation coefficient value defined a priori as a threshold for acceptable model quality on the test sets. The best models were obtained for endpoints mortality (MORT), ActivityScore and JAW (deformation). The low predictability of the QSAR model developed from the zebrafish embryotoxicity data in the database is mainly due to a higher sensitivity of 19 measurements of endpoints carried out on dechorionated embryos at low concentrations.
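The quality criterion in this abstract is the Matthews correlation coefficient (MCC), which suits imbalanced endpoints because it uses all four cells of the confusion matrix. The sketch below computes it from raw counts and applies the 0.20 cut-off from the abstract. The function names, the convention of returning 0.0 when the denominator is zero, the strict "greater than" comparison, and the example counts are assumptions for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column is empty
    return (tp * tn - fp * fn) / denom


def passes_threshold(tp: int, tn: int, fp: int, fn: int,
                     threshold: float = 0.20) -> bool:
    """True if the model clears the a priori MCC threshold."""
    return matthews_corrcoef(tp, tn, fp, fn) > threshold


# Hypothetical imbalanced test set: 5 toxic and 95 non-toxic compounds.
# Predicting "non-toxic" for everything gives 95% accuracy but MCC = 0.
print(matthews_corrcoef(tp=0, tn=95, fp=0, fn=5))
# A model that recovers 3 of the 5 toxic compounds at the cost of 5 false alarms
print(matthews_corrcoef(tp=3, tn=90, fp=5, fn=2))
```

The first case shows why accuracy alone would be misleading for endpoints like these: a trivial majority-class predictor looks excellent by accuracy yet fails the MCC threshold.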
Affiliation(s)
- Mario Lovrić
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia
- Olga Malev
  - Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia
  - Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia
- Göran Klobučar
  - Department of Biology, Faculty of Science, University of Zagreb, Rooseveltov Trg 6, 10000 Zagreb, Croatia
- Roman Kern
  - Know-Center, Inffeldgasse 13, 8010 Graz, Austria
  - Institute of Interactive Systems and Data Science, TU Graz, Inffeldgasse 16c, 8010 Graz, Austria
- Jay J. Liu
  - Department of Chemical Engineering, Pukyong National University, Busan 608-739, Korea
- Bono Lučić
  - Ruđer Bošković Institute, P.O. Box 180, 10002 Zagreb, Croatia
12
Matsuzaka Y, Hosaka T, Ogaito A, Yoshinari K, Uesawa Y. Prediction Model of Aryl Hydrocarbon Receptor Activation by a Novel QSAR Approach, DeepSnap-Deep Learning. Molecules 2020; 25:1317. [PMID: 32183141 PMCID: PMC7144728 DOI: 10.3390/molecules25061317] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 02/17/2020] [Revised: 03/05/2020] [Accepted: 03/09/2020] [Indexed: 12/31/2022] Open
Abstract
The aryl hydrocarbon receptor (AhR) is a ligand-dependent transcription factor that senses environmental exogenous and endogenous ligands or xenobiotic chemicals. In particular, exposure of the liver to environmental metabolism-disrupting chemicals contributes to the development and propagation of steatosis and hepatotoxicity. However, the mechanisms for AhR-induced hepatotoxicity and tumor propagation in the liver remain to be revealed, due to the wide variety of AhR ligands. Recently, quantitative structure–activity relationship (QSAR) analysis using deep neural network (DNN) has shown superior performance for the prediction of chemical compounds. Therefore, this study proposes a novel QSAR analysis using deep learning (DL), called the DeepSnap–DL method, to construct prediction models of chemical activation of AhR. Compared with conventional machine learning (ML) techniques, such as the random forest, XGBoost, LightGBM, and CatBoost, the proposed method achieves high-performance prediction of AhR activation. Thus, the DeepSnap–DL method may be considered a useful tool for achieving high-throughput in silico evaluation of AhR-induced hepatotoxicity.
Affiliation(s)
- Yasunari Matsuzaka
- Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 204-8588 Tokyo, Japan
- Takuomi Hosaka
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Anna Ogaito
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Kouichi Yoshinari
- Laboratory of Molecular Toxicology, School of Pharmaceutical Sciences, University of Shizuoka, Shizuoka 422-8529, Japan
- Yoshihiro Uesawa
- Department of Medical Molecular Informatics, Meiji Pharmaceutical University, 204-8588 Tokyo, Japan
13
Kausar S, Falcao AO. A visual approach for analysis and inference of molecular activity spaces. J Cheminform 2019; 11:63. [PMID: 33430986 PMCID: PMC6805449 DOI: 10.1186/s13321-019-0386-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 10/01/2018] [Accepted: 10/05/2019] [Indexed: 11/25/2022] Open
Abstract
BACKGROUND: Molecular space visualization can help to explore the diversity of large heterogeneous chemical data, which may ultimately increase the understanding of structure-activity relationships (SAR) in drug discovery projects. Visual SAR analysis can therefore be useful for library design, for classifying chemicals ahead of biological evaluation, and for virtual screening to select compounds for synthesis or in vitro testing. As such, computational approaches for molecular space visualization have become an important topic in cheminformatics research. The proposed approach uses molecular similarity as the sole input for computing a probabilistic surface of molecular activity (PSMA). The similarity matrix is projected into 2D using different dimension reduction algorithms (Principal Coordinates Analysis (PCooA), Kruskal multidimensional scaling, Sammon mapping and t-SNE). A kernel density function is then applied to this projection to compute the probability of activity for each coordinate in the new projected space.
RESULTS: This methodology was tested on four different quantitative structure-activity relationship (QSAR) binary classification data sets, and the PSMAs were computed for each. The generated maps showed internal consistency, with active molecules grouped together for all data sets and all dimensionality reduction algorithms. To validate the quality of the generated maps, the 2D coordinates of test molecules were computed in the new reference space using a data transformation matrix. In total, sixteen PSMAs were built, and their performance was assessed using the Area Under Curve (AUC) and the Matthews correlation coefficient (MCC). For the best projection for each data set, AUC testing results ranged from 0.87 to 0.98 and MCC scores ranged from 0.33 to 0.77, suggesting that this methodology can validly capture the complexities of the molecular activity space. All four mapping functions provided generally good results, yet the overall performance of PCooA and t-SNE was slightly better than that of Sammon mapping and Kruskal multidimensional scaling.
CONCLUSIONS: Our results show that an appropriate combination of metric space representation and dimensionality reduction applied over metric spaces makes it possible to produce a visual PSMA whose consistency was validated by using the map as a classification model. The produced maps can be used as prediction tools, because any molecule can be projected into the new reference space as long as its similarities to the molecules used to compute the initial similarity matrix can be computed.
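The kernel density step and the MCC assessment described in this abstract can be illustrated with a minimal Python sketch. It assumes the 2D coordinates have already been obtained from one of the projection methods, and it uses a Gaussian-kernel-weighted fraction of active neighbours as the probability estimate; this is one plausible reading of the abstract, not necessarily the authors' exact formulation. The function names `activity_probability` and `mcc`, the `bandwidth` parameter, and the toy data are all hypothetical.

```python
import math


def activity_probability(point, coords, labels, bandwidth=1.0):
    """Estimate P(active) at `point` from labelled 2D training coordinates.

    Assumption: a Gaussian kernel weights each training molecule by its
    distance to `point`; the estimate is the weighted fraction of actives.
    """
    active_weight = 0.0
    total_weight = 0.0
    for (x, y), is_active in zip(coords, labels):
        dist_sq = (point[0] - x) ** 2 + (point[1] - y) ** 2
        weight = math.exp(-dist_sq / (2.0 * bandwidth ** 2))
        total_weight += weight
        if is_active:
            active_weight += weight
    if total_weight == 0.0:
        # Far from every training point: fall back to the overall active rate.
        return sum(labels) / len(labels)
    return active_weight / total_weight


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy data (hypothetical): two actives near the origin, two inactives far away.
train_coords = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
train_labels = [1, 1, 0, 0]

# Classify test points by thresholding the estimated probability at 0.5.
test_points = [(0.05, 0.0), (5.05, 5.0)]
predictions = [
    1 if activity_probability(p, train_coords, train_labels) >= 0.5 else 0
    for p in test_points
]
```

Evaluating `activity_probability` over a regular grid of points would give a surface comparable in spirit to the PSMA, and `mcc` can then score the thresholded predictions against held-out labels.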
Affiliation(s)
- Samina Kausar
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- Andre O. Falcao
- LaSIGE, Departamento de Informática, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal
- BioISI: Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, 1749-016 Lisboa, Portugal