1
Xu C, Liu R, Huang S, Li W, Li Z, Luo HB. 3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation. Brief Bioinform 2023; 24:bbad327. [PMID: 37756591 DOI: 10.1093/bib/bbad327] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 08/19/2023] [Accepted: 08/30/2023] [Indexed: 09/29/2023]
Abstract
In drug discovery, a key problem is how to improve the biological activity and ADMET properties of a compound starting from a specific structure, a task known as structural optimization. Using a deep generative model to generate molecules with desired drug-like properties from a starting scaffold would provide a powerful tool for accelerating structural optimization. However, existing generative models still struggle to extract molecular features efficiently in 3D space and thus to generate drug-like 3D molecules. Moreover, most existing ADMET prediction models predict different properties with a single model, which can reduce prediction accuracy on some datasets. To generate molecules effectively from a specific scaffold and provide a basis for structural optimization, we present 3D-SMGE (3-Dimensional Scaffold-based Molecular Generation and Evaluation), a pipeline consisting of molecular generation and ADMET property prediction. For molecular generation, we propose 3D-SMG, a novel deep generative model for the end-to-end design of 3D molecules. In the 3D-SMG model, we designed the cross-aggregated continuous-filter convolution (ca-cfconv), which achieves efficient, low-cost 3D spatial feature extraction while ensuring invariance to rotation of the atoms in space. 3D-SMG was shown to generate valid, unique and novel molecules with high drug-likeness. In addition, the proposed data-adaptive multi-model ADMET prediction method outperformed or matched the best evaluation metrics on 24 out of 27 ADMET benchmark datasets. 3D-SMGE is anticipated to become a powerful tool for hit-to-lead structural optimization and to accelerate the drug discovery process.
Affiliation(s)
- Chao Xu
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Runduo Liu
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Shuheng Huang
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Wenchao Li
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Zhe Li
- School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou, 510000, Guangdong, P.R. China
- Hai-Bin Luo
- Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
2
Sohlenius-Sternbeck AK, Terelius Y. Evaluation of ADMET Predictor™ in early discovery DMPK project work. Drug Metab Dispos 2021; 50:95-104. [PMID: 34750195 DOI: 10.1124/dmd.121.000552] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/25/2021] [Accepted: 11/05/2021] [Indexed: 11/22/2022]
Abstract
A dataset consisting of measured values for LogD, solubility, metabolic stability in human liver microsomes (HLM) and Caco-2 permeability was used to evaluate the prediction models for lipophilicity (S+LogD), water solubility (S+Sw_pH), metabolic stability in HLM (CYP_HLM_Clint), intestinal permeability (S+Peff) and P-gp substrate identification (P-gp substrate) in the software ADMET Predictor™ (AP) from Simulations Plus. The dataset consisted of a total of 4794 compounds from multiple discovery projects at Medivir, each with at least metabolic stability data from HLM. Our evaluation shows that the global AP models can be used to categorize high and low values based on predicted results for metabolic stability in HLM and intestinal permeability, and to give good predictions of LogD (R² = 0.79), guiding the synthesis of new compounds and the prioritization of in vitro ADME experiments. The model seems to overpredict solubility for the Medivir compounds, however. We also used the in-house datasets to build local models for LogD, solubility, metabolic stability and permeability by using artificial neural network (ANN) models in the optional Modeler™ module of AP. Predictions of the test sets were performed with both the global and the local models; the R² value of the linear regression of predicted versus measured HLM CLint, based on logarithmic data, was 0.72 for the in-house model and 0.53 for the AP model. The improved predictions with the local models are likely explained both by the specific chemical space of the Medivir dataset and by lab-specific assay conditions for parameters that require biological assay systems.
Significance Statement: AP is useful early in projects for predicting and categorizing LogD, metabolic stability and permeability, to guide the synthesis of new compounds and to prioritize in vitro ADME experiments. Building local in-house prediction models with the optional AP Modeler module can improve prediction success, since these models are built on data from the same experimental setup and can also be based on compounds with similar structures.
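The abstract reports R² for a linear regression of predicted versus measured HLM CLint on logarithmic data. As a rough illustration of how such a figure can be obtained, the sketch below log10-transforms both series and returns the squared Pearson correlation, which equals R² for a simple one-variable linear fit. The function name `r_squared_log10` and the clearance values are made up for illustration; they are not taken from the study, and the authors' exact procedure is not described in the abstract.

```python
import math


def r_squared_log10(measured, predicted):
    """R^2 of a simple linear regression between log10(measured) and log10(predicted).

    For a one-variable least-squares fit, R^2 equals the squared Pearson
    correlation, which is what is computed here. All values must be positive.
    """
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two equally long series with at least two points")
    x = [math.log10(v) for v in measured]
    y = [math.log10(v) for v in predicted]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("both series must vary")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical clearance values (arbitrary units), invented for this example.
measured_clint = [10.0, 20.0, 40.0, 80.0]
biased_prediction = [20.0, 40.0, 80.0, 160.0]  # systematic 2-fold overprediction
noisy_prediction = [15.0, 18.0, 70.0, 60.0]

r2_biased = r_squared_log10(measured_clint, biased_prediction)
r2_noisy = r_squared_log10(measured_clint, noisy_prediction)
```

Note that a constant fold error only shifts the log-transformed values, so `r2_biased` is still essentially 1; R² measures correlation, not absolute agreement, which is one reason categorization accuracy is often reported alongside it.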
3
Schaduangrat N, Lampa S, Simeon S, Gleeson MP, Spjuth O, Nantasenamat C. Towards reproducible computational drug discovery. J Cheminform 2020; 12:9. [PMID: 33430992 PMCID: PMC6988305 DOI: 10.1186/s13321-020-0408-x] [Citation(s) in RCA: 78] [Impact Index Per Article: 19.5] [Received: 07/17/2019] [Accepted: 01/02/2020] [Indexed: 12/11/2022]
Abstract
The reproducibility of experiments has been a long-standing impediment to further scientific progress. Computational methods have been instrumental in drug discovery efforts owing to their multifaceted use in data collection, pre-processing, analysis and inference. This article provides in-depth coverage of the reproducibility of computational drug discovery. The review explores the following topics: (1) the current state of the art in reproducible research, (2) research documentation (e.g. electronic laboratory notebook, Jupyter notebook, etc.), (3) the science of reproducible research (i.e. comparison and contrast with related concepts such as replicability, reusability and reliability), (4) model development in computational drug discovery, (5) computational issues in model development and deployment, and (6) use case scenarios for streamlining the computational drug discovery protocol. In computational disciplines, it has become common practice to share the data and code used for numerical calculations, not only to facilitate reproducibility but also to foster collaboration (i.e. to drive the project further by introducing new ideas, growing the data, augmenting the code, etc.). It is therefore inevitable that the field of computational drug design would adopt an open approach towards the collection, curation and sharing of data and code.
Affiliation(s)
- Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, 10700, Bangkok, Thailand
- Samuel Lampa
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden
- Saw Simeon
- Interdisciplinary Graduate Program in Bioscience, Faculty of Science, Kasetsart University, 10900, Bangkok, Thailand
- Matthew Paul Gleeson
- Department of Biomedical Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, 10520, Bangkok, Thailand.
- Ola Spjuth
- Department of Pharmaceutical Biosciences, Uppsala University, 751 24, Uppsala, Sweden.
- Chanin Nantasenamat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, 10700, Bangkok, Thailand.
4
Simeon S, Montanari D, Gleeson MP. Investigation of Factors Affecting the Performance of in silico Volume Distribution QSAR Models for Human, Rat, Mouse, Dog & Monkey. Mol Inform 2019; 38:e1900059. [DOI: 10.1002/minf.201900059] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 05/23/2019] [Accepted: 07/03/2019] [Indexed: 01/09/2023]
Affiliation(s)
- Saw Simeon
- Interdisciplinary Graduate Program in Bioscience, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Center for Advanced Studies in Nanotechnology for Chemical, Food and Agricultural Industries, KU Institute for Advanced Studies, Kasetsart University, Bangkok 10900, Thailand
- Dino Montanari
- DMPK and Bioanalysis, Aptuit, Via Alessandro Fleming 4, 37135 Verona VR, Italy
- Matthew Paul Gleeson
- Department of Chemistry, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
- Department of Biomedical Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok 10520, Thailand
5
Marine natural products with anti-inflammatory activity. Appl Microbiol Biotechnol 2015; 100:1645-1666. [DOI: 10.1007/s00253-015-7244-3] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 09/29/2015] [Revised: 12/07/2015] [Accepted: 12/09/2015] [Indexed: 12/14/2022]
6
Abstract
In recent decades, in silico absorption, distribution, metabolism, excretion (ADME), and toxicity (T) modelling as a tool for rational drug design has received considerable attention from pharmaceutical scientists, and various ADME/T-related prediction models have been reported. The high-throughput and low-cost nature of these models permits a more streamlined drug development process in which the identification of hits or their structural optimization can be guided based on a parallel investigation of bioavailability and safety, along with activity. However, the effectiveness of these tools is highly dependent on their capacity to cope with needs at different stages, e.g. their use in candidate selection has been limited due to their lack of the required predictability. For some events or endpoints involving more complex mechanisms, the current in silico approaches still need further improvement. In this review, we will briefly introduce the development of in silico models for some physicochemical parameters, ADME properties and toxicity evaluation, with an emphasis on the modelling approaches thereof, their application in drug discovery, and the potential merits or deficiencies of these models. Finally, the outlook for future ADME/T modelling based on big data analysis and systems sciences will be discussed.
7
Abstract
Drug discovery utilizes chemical biology and computational drug design approaches for the efficient identification and optimization of lead compounds. Chemical biology is mostly involved in the elucidation of the biological function of a target and the mechanism of action of a chemical modulator. On the other hand, computer-aided drug design makes use of the structural knowledge of either the target (structure-based) or known ligands with bioactivity (ligand-based) to facilitate the determination of promising candidate drugs. Various virtual screening techniques are now being used by both pharmaceutical companies and academic research groups to reduce the cost and time required for the discovery of a potent drug. Despite the rapid advances in these methods, continuous improvements are critical for future drug discovery tools. Advantages presented by structure-based and ligand-based drug design suggest that their complementary use, as well as their integration with experimental routines, has a powerful impact on rational drug design. In this article, we give an overview of the current computational drug design and their application in integrated rational drug development to aid in the progress of drug discovery research.
Affiliation(s)
- Stephani Joy Y Macalino
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea
- Vijayakumar Gosu
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea
- Sunhye Hong
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea
- Sun Choi
- National Leading Research Laboratory of Molecular Modeling and Drug Design, College of Pharmacy and Graduate School of Pharmaceutical Sciences, and Global Top 5 Research Program, Ewha Womans University, Seoul, 120-750, Korea.
8
Caldwell GW. In silico tools used for compound selection during target-based drug discovery and development. Expert Opin Drug Discov 2015; 10:901-23. [DOI: 10.1517/17460441.2015.1043885] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/30/2022]
Affiliation(s)
- Gary W Caldwell
- Janssen Research & Development LLC, Discovery Sciences, Spring House, PA, USA
9
Time dependent analysis of assay comparability: a novel approach to understand intra- and inter-site variability over time. J Comput Aided Mol Des 2015; 29:795-807. [DOI: 10.1007/s10822-015-9836-5] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 11/13/2014] [Accepted: 02/09/2015] [Indexed: 01/06/2023]
10
Focused chemical libraries--design and enrichment: an example of protein-protein interaction chemical space. Future Med Chem 2014; 6:1291-307. [PMID: 24773599 DOI: 10.4155/fmc.14.57] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 02/06/2023]
Abstract
One of the many obstacles in the development of new drugs lies in the limited number of therapeutic targets and in the quality of screening collections of compounds. In this review, we present general strategies for building target-focused chemical libraries with a particular emphasis on protein-protein interactions (PPIs). We describe the chemical spaces spanned by nine commercially available PPI-focused libraries and compare them to our 2P2I3D academic library, dedicated to orthosteric PPI modulators. We show that although PPI-focused libraries have been designed using different strategies, they share common subspaces. PPI inhibitors are larger and more hydrophobic than standard drugs; however, an effort has been made to improve the drug-likeness of focused chemical libraries dedicated to this challenging class of targets.
11
Yongye AB, Medina-Franco JL. Systematic characterization of structure-activity relationships and ADMET compliance: a case study. Drug Discov Today 2013; 18:732-9. [PMID: 23583765 DOI: 10.1016/j.drudis.2013.04.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 12/24/2012] [Revised: 03/18/2013] [Accepted: 04/04/2013] [Indexed: 01/29/2023]
Abstract
Traditionally, activity landscape modeling has been focused on analyzing SAR, despite the fact that lead optimization in drug discovery involves concurrent enhancements of activity and ADMET properties of leads. As a case study, we discuss the systematic analysis of activity landscapes, incorporating ADMET considerations, using a dataset of 166 compounds screened for kappa-opioid receptor activity. Pairwise MACCS/Tanimoto structure similarities, property similarities utilizing 33 ADMET descriptors and a 35-dimensional 'violation bit vector' representing drug-likeness are analyzed. We address the question about the range of ADMET property violations that arise from structural changes, subtle and significant. Pairs of compounds are identified bearing identical, comparable and significantly different drug-likeness in the three informative regions of structure-activity landscapes.
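The pairwise structure similarity mentioned in this abstract is commonly computed as a Tanimoto coefficient over fingerprint bits. The sketch below is a minimal illustration, not the authors' code: the function name `tanimoto` and the two fingerprints are hypothetical, and each fingerprint is represented simply as the set of bit positions that are switched on rather than as real MACCS keys.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bit positions.

    Defined as |A and B| / |A or B|. Two empty fingerprints are treated as
    identical (1.0), which is a convention chosen for this sketch.
    """
    if not bits_a and not bits_b:
        return 1.0
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


# Hypothetical fingerprints: three shared bits, one unique bit each.
fp_a = {1, 5, 9, 42}
fp_b = {1, 5, 9, 77}

similarity = tanimoto(fp_a, fp_b)  # 3 shared / 5 in the union
```

The same function could in principle be applied to a violation bit vector to compare which drug-likeness rules two compounds break, assuming the vector is also encoded as a set of on-bit positions.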
Affiliation(s)
- Austin B Yongye
- Torrey Pines Institute for Molecular Studies, 11350 SW Village Parkway, Port St. Lucie, FL 34987, USA.
12
Wood DJ, Carlsson L, Eklund M, Norinder U, Stålring J. QSAR with experimental and predictive distributions: an information theoretic approach for assessing model quality. J Comput Aided Mol Des 2013; 27:203-19. [PMID: 23504478 PMCID: PMC3639359 DOI: 10.1007/s10822-013-9639-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 12/23/2012] [Accepted: 03/05/2013] [Indexed: 11/29/2022]
Abstract
We propose that quantitative structure–activity relationship (QSAR) predictions should be explicitly represented as predictive (probability) distributions. If both predictions and experimental measurements are treated as probability distributions, the quality of a set of predictive distributions output by a model can be assessed with Kullback–Leibler (KL) divergence: a widely used information theoretic measure of the distance between two probability distributions. We have assessed a range of different machine learning algorithms and error estimation methods for producing predictive distributions with an analysis against three of AstraZeneca’s global DMPK datasets. Using the KL-divergence framework, we have identified a few combinations of algorithms that produce accurate and valid compound-specific predictive distributions. These methods use reliability indices to assign predictive distributions to the predictions output by QSAR models so that reliable predictions have tight distributions and vice versa. Finally we show how valid predictive distributions can be used to estimate the probability that a test compound has properties that hit single- or multi- objective target profiles.
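To make the idea of scoring a predictive distribution against an experimental one concrete, the sketch below evaluates the closed-form KL divergence between two univariate Gaussians. Treating both distributions as Gaussian is an assumption made here for illustration; the abstract does not state the distributional form used, and the function name `kl_gaussian` and all numbers are hypothetical.

```python
import math


def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """KL(P || Q) for univariate Gaussians P = N(mu_p, sigma_p^2), Q = N(mu_q, sigma_q^2).

    Closed form: ln(sigma_q / sigma_p)
                 + (sigma_p^2 + (mu_p - mu_q)^2) / (2 * sigma_q^2) - 1/2
    """
    if sigma_p <= 0 or sigma_q <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma_q / sigma_p)
        + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2 * sigma_q ** 2)
        - 0.5
    )


# Hypothetical experimental measurement: value 2.0 with assay standard deviation 0.3.
experiment = (2.0, 0.3)

# Two hypothetical predictions that are equally far off (mean 3.0),
# one overconfident (tight) and one with an honest, wider uncertainty.
kl_overconfident = kl_gaussian(*experiment, 3.0, 0.1)
kl_honest = kl_gaussian(*experiment, 3.0, 1.0)
```

In this toy case the overconfident prediction receives a much larger divergence than the wider one, which matches the abstract's point that reliable predictions should have tight distributions and unreliable ones should not.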