1. Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024] Open
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aiming at producing models useful for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
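As a rough illustration of the interlaboratory standard deviation mentioned in the abstract, the sketch below groups replicate logS values by compound and computes their spread. The record layout, the function name `summarize_replicates`, and the 0.5 log-unit cutoff are illustrative assumptions, not the authors' workflow.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_replicates(records, max_sd=0.5):
    """Group (compound_id, logS) pairs and report mean, SD, and a consistency flag.

    max_sd is a hypothetical cutoff in log units, not a value from the paper.
    """
    by_compound = defaultdict(list)
    for compound_id, log_s in records:
        by_compound[compound_id].append(log_s)

    summary = {}
    for compound_id, values in by_compound.items():
        # The sample SD needs at least two measurements; single values get None.
        sd = stdev(values) if len(values) > 1 else None
        summary[compound_id] = {
            "n": len(values),
            "mean_logS": mean(values),
            "sd_logS": sd,
            "consistent": sd is not None and sd <= max_sd,
        }
    return summary


# Made-up example values, for illustration only.
example = [("A", -3.0), ("A", -3.2), ("B", -1.0), ("B", -2.5), ("C", -4.1)]
result = summarize_replicates(example)
```

Compounds measured only once cannot be assessed this way, so the sketch leaves them unflagged rather than treating them as reliable.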
Affiliation(s)
- P Llompart
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
  - IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
2. Kim Y, Jung H, Kumar S, Paton RS, Kim S. Designing solvent systems using self-evolving solubility databases and graph neural networks. Chem Sci 2024; 15:923-939. [PMID: 38239675 PMCID: PMC10793204 DOI: 10.1039/d3sc03468b] [Received: 07/06/2023] [Accepted: 12/04/2023] [Indexed: 01/22/2024] Open
Abstract
Designing solvent systems is key to achieving the facile synthesis and separation of desired products from chemical processes, so many machine learning models have been developed to predict solubilities. However, breakthroughs are needed to address deficiencies in the models' predictive accuracy and generalizability; this can be addressed by expanding and integrating experimental and computational solubility databases. To maximize predictive accuracy, models should not be trained on these two databases separately, and the databases should not be simply combined without reconciling the discrepancies arising from different magnitudes of errors and uncertainties. Here, we introduce self-evolving solubility databases and graph neural networks developed through semi-supervised self-training approaches. Solubilities from quantum-mechanical calculations are referred to during semi-supervised learning, but they are not directly added to the experimental database. Dataset augmentation is performed from 11 637 experimental solubilities to >900 000 data points in the integrated database, while correcting for the discrepancies between experiment and computation. Our model was successfully applied to study solvent selection in organic reactions and separation processes. The accuracy (mean absolute error around 0.2 kcal mol⁻¹ for the test set) is quantitatively useful in exploring Linear Free Energy Relationships between reaction rates and solvation free energies for 11 organic reactions. Our model also accurately predicted the partition coefficients of lignin-derived monomers and drug-like molecules. While there is room for expanding solubility predictions to transition states, radicals, charged species, and organometallic complexes, this approach will be attractive to predictive chemistry areas where experimental, computational, and other heterogeneous data should be combined.
Affiliation(s)
- Yeonjoon Kim
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
  - Department of Chemistry, Pukyong National University, Busan 48513, Republic of Korea
- Hojin Jung
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Sabari Kumar
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Robert S Paton
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
- Seonah Kim
  - Department of Chemistry, Colorado State University, Fort Collins, CO 80523, USA
3. Xu C, Liu R, Huang S, Li W, Li Z, Luo HB. 3D-SMGE: a pipeline for scaffold-based molecular generation and evaluation. Brief Bioinform 2023; 24:bbad327. [PMID: 37756591 DOI: 10.1093/bib/bbad327] [Received: 05/17/2023] [Revised: 08/19/2023] [Accepted: 08/30/2023] [Indexed: 09/29/2023] Open
Abstract
In the process of drug discovery, one of the key problems is how to improve the biological activity and ADMET properties starting from a specific structure, which is also called structural optimization. Based on a starting scaffold, the use of a deep generative model to generate molecules with desired drug-like properties will provide a powerful tool to accelerate the structural optimization process. However, existing generative models still struggle to extract molecular features efficiently in 3D space to generate drug-like 3D molecules. Moreover, most existing ADMET prediction models predict different properties through a single model, which can reduce prediction accuracy on some datasets. To effectively generate molecules from a specific scaffold and provide a basis for structural optimization, the 3D-SMGE (3-Dimensional Scaffold-based Molecular Generation and Evaluation) pipeline, consisting of molecular generation and prediction of ADMET properties, is presented. For the molecular generation, we proposed 3D-SMG, a novel deep generative model for the end-to-end design of 3D molecules. In the 3D-SMG model, we designed the cross-aggregated continuous-filter convolution (ca-cfconv), which is used to achieve efficient and low-cost 3D spatial feature extraction while ensuring the invariance of atomic space rotation. 3D-SMG was shown to generate valid, unique and novel molecules with high drug-likeness. In addition, the proposed data-adaptive multi-model ADMET prediction method outperformed or maintained the best evaluation metrics on 24 out of 27 ADMET benchmark datasets. 3D-SMGE is anticipated to emerge as a powerful tool for hit-to-lead structural optimization and to accelerate the drug discovery process.
Affiliation(s)
- Chao Xu
  - Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Runduo Liu
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510000, Guangdong, P.R. China
- Shuheng Huang
  - Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
- Wenchao Li
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510000, Guangdong, P.R. China
- Zhe Li
  - School of Pharmaceutical Sciences, Sun Yat-Sen University, Guangzhou 510000, Guangdong, P.R. China
- Hai-Bin Luo
  - Key Laboratory of Tropical Biological Resources of Ministry of Education, School of Pharmaceutical Sciences, Hainan University, Haikou 570228, Hainan, P.R. China
4. Avdeef A. Mechanistically transparent models for predicting aqueous solubility of rigid, slightly flexible, and very flexible drugs (MW<2000) Accuracy near that of random forest regression. ADMET AND DMPK 2023; 11:317-330. [PMID: 37829322 PMCID: PMC10567068 DOI: 10.5599/admet.1879] [Received: 05/08/2023] [Revised: 08/15/2023] [Indexed: 10/14/2023] Open
Abstract
Yalkowsky's General Solubility Equation (GSE), with its three fixed constants, is popular and easy to apply, but is not very accurate for polar, zwitterionic, or flexible molecules. This review examines the findings of a series of studies, where we have sought to come up with a better prediction model, by comparing the performances of the GSE to Abraham's Solvation Equation (ABSOLV), and Random Forest regression (RFR) machine-learning (ML) method. Large, well-curated aqueous intrinsic solubility databases are available. However, drugs may be sparsely distributed in chemical space, concentrated in clusters. Even a large database might overlook some regions. Test compounds from under-represented portions of space may be poorly predicted, as might be the case with the 'loose' set of 32 drugs in the Second Solubility Challenge (2020). There appears to be still a need for better coverage of drug space. Increasingly, current trends in predictions of solubility use calculated input descriptors, which may be an advantage for exploring properties of molecules yet to be synthesized. The risk may be that overall prediction approaches might be based on accumulated uncertainty. The increasing use of ML/AI methods can lead to accurate predictions, but such predictions may not readily suggest the strategies to pursue in selecting yet-to-be-synthesized compounds. Based on our latest findings, we recommend predictions based on both 'grouped' ABSOLV(GRP) and 'Flexible Acceptor' GSE(Φ,B) models with the provided best-fit parameters, where Φ is the Kier molecular flexibility index and B is the Abraham H-bond acceptor strength. For molecules with Φ < 11, the prudent choice is to pick the Consensus Model, the average of ABSOLV(GRP) and GSE(Φ,B). For more flexible molecules, GSE(Φ,B) is recommended.
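The recommendation at the end of this abstract can be sketched in a few lines. The first function is the classic GSE with its three fixed constants (log S0 = 0.5 − 0.01·(MP − 25) − log P, MP in °C). The second only encodes the selection rule for Φ < 11; it takes the ABSOLV(GRP) and GSE(Φ,B) predictions as precomputed inputs, because their best-fit parameters are not given here. The function and argument names are illustrative assumptions.

```python
def gse_classic(log_p, melting_point_c):
    """Yalkowsky's General Solubility Equation for intrinsic solubility (log mol/L).

    log S0 = 0.5 - 0.01 * (MP - 25) - log P, with MP in degrees Celsius.
    For compounds melting below 25 C the melting-point term is set to zero.
    """
    return 0.5 - 0.01 * max(melting_point_c - 25.0, 0.0) - log_p


def recommended_log_s(phi, absolv_grp_pred, gse_phi_b_pred, phi_cutoff=11.0):
    """Select the prediction following the abstract's recommendation.

    phi is the Kier molecular flexibility index. The two predictions are
    precomputed log S0 values; the fitted model parameters are not
    reproduced in this sketch.
    """
    if phi < phi_cutoff:
        # Consensus Model: plain average of the two predictions.
        return 0.5 * (absolv_grp_pred + gse_phi_b_pred)
    # More flexible molecules: GSE(Phi,B) alone.
    return gse_phi_b_pred
```

For example, `gse_classic(2.0, 125.0)` evaluates 0.5 − 1.0 − 2.0, and `recommended_log_s(5.0, -3.0, -4.0)` returns the average of the two inputs.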
5. Stienstra CMK, Ieritano C, Haack A, Hopkins WS. Bridging the Gap between Differential Mobility, Log S, and Log P Using Machine Learning and SHAP Analysis. Anal Chem 2023. [PMID: 37384824 DOI: 10.1021/acs.analchem.3c00921] [Indexed: 07/01/2023]
Abstract
Aqueous solubility, log S, and the water-octanol partition coefficient, log P, are physicochemical properties that are used to screen the viability of drug candidates and to estimate mass transport in the environment. In this work, differential mobility spectrometry (DMS) experiments performed in microsolvating environments are used to train machine learning (ML) frameworks that predict the log S and log P of various molecule classes. In lieu of a consistent source of experimentally measured log S and log P values, the OPERA package was used to evaluate the aqueous solubility and hydrophobicity of 333 analytes. With ion mobility/DMS data (e.g., CCS, dispersion curves) as input, we used ML regressors and ensemble stacking to derive relationships with a high degree of explainability, as assessed via SHapley Additive exPlanations (SHAP) analysis. The DMS-based regression models returned scores of R² = 0.67 and RMSE = 1.03 ± 0.10 for log S predictions and R² = 0.67 and RMSE = 1.20 ± 0.10 for log P after 5-fold random cross-validation. SHAP analysis reveals that the regressors strongly weighted gas-phase clustering in log P correlations. The addition of structural descriptors (e.g., # of aromatic carbons) improved log S predictions to yield RMSE = 0.84 ± 0.07 and R² = 0.78. Similarly, log P predictions using the same data resulted in an RMSE of 0.83 ± 0.04 and R² = 0.84. The SHAP analysis of log P models highlights the need for additional experimental parameters describing hydrophobic interactions. These results were achieved with a smaller dataset (333 instances) and minimal structural correlation compared to purely structure-based models, underscoring the value of employing DMS data in predictive models.
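For readers comparing the scores quoted above, the sketch below shows how RMSE and R² are commonly computed from paired observed and predicted values. It assumes R² means the coefficient of determination (1 − SS_res/SS_tot); the abstract does not state which definition the authors used, and the function names and sample values are illustrative only.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
example_rmse = rmse(observed, predicted)
example_r2 = r_squared(observed, predicted)
```

In a k-fold setting these functions would be applied to each held-out fold, with the mean and spread across folds giving figures of the form "value ± uncertainty".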
Affiliation(s)
- Cailum M K Stienstra
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Christian Ieritano
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Alexander Haack
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- W Scott Hopkins
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
  - Watermine Innovation, Waterloo, Ontario N0B 2T0, Canada
  - Centre for Eye and Vision Research, Hong Kong Science Park, New Territories 999077, Hong Kong
6. Guo J, Sun M, Zhao X, Shi C, Su H, Guo Y, Pu X. General Graph Neural Network-Based Model To Accurately Predict Cocrystal Density and Insight from Data Quality and Feature Representation. J Chem Inf Model 2023; 63:1143-1156. [PMID: 36734616 DOI: 10.1021/acs.jcim.2c01538] [Indexed: 02/04/2023]
Abstract
Cocrystal engineering, as an effective way to modify solid-state properties, has attracted great interest from diverse materials fields, and cocrystal density is an important property closely correlated with material function. In order to accurately predict the cocrystal density, we develop a graph neural network (GNN)-based deep learning framework by considering three key factors of machine learning (data quality, feature representation, and model architecture). The results show that different stoichiometric ratios of molecules in cocrystals can significantly influence prediction performance, highlighting the importance of data quality. In addition, complementing the molecular graph representation with extra features is not suitable for cocrystal density prediction, suggesting that a complementary strategy needs to consider whether the extra features can sufficiently supply the information missing from the original representation. Based on these results, 4144 cocrystals with a 1:1 stoichiometric ratio are selected as the dataset, supplemented by data augmentation through exchanging a pair of coformers. The molecular graph is chosen to learn the feature representation used to train the GNN-based model. Global attention is introduced to further optimize the feature space and identify important atoms, making the model interpretable. Benefiting from these advantages, our model significantly outperforms three competitive models and exhibits high prediction accuracy for unseen cocrystals, showcasing its robustness and generality. Overall, our work not only provides a general cocrystal density prediction tool for experimental investigations but also provides useful guidelines for machine learning applications. All source code is freely available at https://github.com/Xiao-Gua00/CCPGraph.
Affiliation(s)
- Jiali Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Ming Sun
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xueyan Zhao
  - Institute of Chemical Materials, China Academy of Engineering Physics, Mianyang 621900, China
- Chaojie Shi
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Haoming Su
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Yanzhi Guo
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
- Xuemei Pu
  - College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
7. Conn JM, Carter JW, Conn JJA, Subramanian V, Baxter A, Engkvist O, Llinas A, Ratkova EL, Pickett SD, McDonagh JL, Palmer DS. Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models. J Chem Inf Model 2023; 63:1099-1113. [PMID: 36758178 PMCID: PMC9976279 DOI: 10.1021/acs.jcim.2c01189] [Indexed: 02/11/2023]
Abstract
Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.
Affiliation(s)
- Jonathan G. M. Conn
  - Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- James W. Carter
  - Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- Justin J. A. Conn
  - Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- Vigneshwari Subramanian
  - Drug Metabolism and Pharmacokinetics, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Göteborg, Sweden
- Andrew Baxter
  - GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, U.K.
- Ola Engkvist
  - Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, SE-431 50 Göteborg, Sweden
  - Department of Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
- Antonio Llinas
  - Drug Metabolism and Pharmacokinetics, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Göteborg, Sweden
- Ekaterina L. Ratkova
  - Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, SE-431 50 Göteborg, Sweden
- Stephen D. Pickett
  - Computational Sciences, GlaxoSmithKline R&D Pharmaceuticals, Stevenage SG1 2NY, U.K.
- James L. McDonagh
  - IBM Research Europe, Hartree Centre, SciTech Daresbury, Warrington, Cheshire WA4 4AD, U.K.
- David S. Palmer
  - Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
8. Alghamdi S, Abbas F, Hussein R, Alhamzani A, El‐Shamy N. Spectroscopic characterization (IR, UV-Vis), and HOMO-LUMO, MEP, NLO, NBO Analysis and the Antifungal Activity for 4-Bromo-N-(2-nitrophenyl) benzamide; Using DFT Modeling and In silico Molecular Docking. J Mol Struct 2023. [DOI: 10.1016/j.molstruc.2022.134001] [Indexed: 10/15/2022]
9. Wu J, Wang J, Wu Z, Zhang S, Deng Y, Kang Y, Cao D, Hsieh CY, Hou T. ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction. J Chem Inf Model 2022; 62:5975-5987. [PMID: 36417544 DOI: 10.1021/acs.jcim.2c01290] [Indexed: 11/24/2022]
Abstract
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. The accurate prediction of these properties remains to be solved due to data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore the ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network for each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the dilemma of limited data. Additionally, we substitute the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvement of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying its high robustness and generalizability. The interpretability analysis proves that the atom contributions learned by ALipSol are more reasonable compared with the vanilla Attentive FP, and the substitution effects in benzene derivatives agreed well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
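The decomposition of logD into logP and pKa described above rests on a standard textbook relationship, sketched below for a monoprotic compound under the simplifying assumption that only the neutral species partitions into octanol. This is not the ALipSol model itself; the function name `log_d_monoprotic` and its arguments are illustrative.

```python
import math


def log_d_monoprotic(log_p, pka, ph, kind):
    """Estimate logD at a given pH for a monoprotic acid or base.

    Assumes only the neutral form partitions into the organic phase:
      acid: logD = logP - log10(1 + 10**(pH - pKa))
      base: logD = logP - log10(1 + 10**(pKa - pH))
    """
    if kind == "acid":
        exponent = ph - pka
    elif kind == "base":
        exponent = pka - ph
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_p - math.log10(1.0 + 10.0 ** exponent)


# Hypothetical compounds at pH 7.4, for illustration only.
acid_log_d = log_d_monoprotic(3.0, 4.4, 7.4, "acid")
base_log_d = log_d_monoprotic(3.0, 9.4, 7.4, "base")
```

When pH equals pKa, half of the compound is ionized and logD sits log10(2), about 0.3 units, below logP, which is a quick sanity check on any implementation.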
Affiliation(s)
- Jialu Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Junmei Wang
  - Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhenxing Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Shengyu Zhang
  - Tencent Quantum Laboratory, Tencent, Shenzhen 518057, Guangdong, P. R. China
- Yafeng Deng
  - CarbonSilicon AI Technology Co., Ltd, Hangzhou 310018, Zhejiang, P. R. China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Dongsheng Cao
  - Xiangya School of Pharmaceutical Sciences, Central South University, Changsha 410004, Hunan, P. R. China
- Chang-Yu Hsieh
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, P. R. China
10. Oja M, Sild S, Piir G, Maran U. Intrinsic Aqueous Solubility: Mechanistically Transparent Data-Driven Modeling of Drug Substances. Pharmaceutics 2022; 14:2248. [PMID: 36297685 PMCID: PMC9611068 DOI: 10.3390/pharmaceutics14102248] [Received: 09/08/2022] [Revised: 10/12/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022] Open
Abstract
Intrinsic aqueous solubility is a foundational property for understanding the chemical, technological, pharmaceutical, and environmental behavior of drug substances. Despite years of solubility research, molecular structure-based prediction of the intrinsic aqueous solubility of drug substances is still under active investigation. This paper describes the authors’ systematic data-driven modelling in which two fit-for-purpose training data sets for intrinsic aqueous solubility were collected and curated, and three quantitative structure–property relationships were derived to make predictions for the most recent solubility challenge. All three models perform well individually, while being mechanistically transparent and easy to understand. Molecular descriptors involved in the models are related to the following key steps in the solubility process: dissociation of the molecule from the crystal, formation of a cavity in the solvent, and insertion of the molecule into the solvent. A consensus modeling approach with these models remarkably improved prediction capability and reduced the number of strong outliers by more than two times. The performance and outliers of the second solubility challenge predictions were analyzed retrospectively. All developed models have been published in the QsarDB.org repository according to FAIR principles and can be used without restrictions for exploring, downloading, and making predictions.
Affiliation(s)
- Uko Maran
  - Correspondence: Tel.: +372-7-375-254; Fax: +372-7-375-264
11. Wu J, Kang Y, Pan P, Hou T. Machine learning methods for pKa prediction of small molecules: Advances and challenges. Drug Discov Today 2022; 27:103372. [PMID: 36167281 DOI: 10.1016/j.drudis.2022.103372] [Received: 06/20/2022] [Revised: 08/15/2022] [Accepted: 09/21/2022] [Indexed: 11/27/2022]
Abstract
The acid-base dissociation constant (pKa) is a fundamental property influencing many ADMET properties of small molecules. However, rapid and accurate pKa prediction remains a great challenge. In this review, we outline the current advances in machine-learning-based QSAR models for pKa prediction, including descriptor-based and graph-based approaches, and summarize their pros and cons. Moreover, we highlight the current challenges and future directions regarding experimental data, crucial factors influencing pKa, and in silico prediction tools. We hope that this review can provide practical guidance for follow-up studies.
Affiliation(s)
- Jialu Wu
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Peichen Pan
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences and Cancer Center, Zhejiang University, Hangzhou 310058, Zhejiang, China
12. Avdeef A, Kansy M. Trends in PhysChem Properties of Newly Approved Drugs over the Last Six Years; Predicting Solubility of Drugs Approved in 2021. J SOLUTION CHEM 2022. [DOI: 10.1007/s10953-022-01199-3] [Indexed: 12/01/2022]
13. Nikita S, Thakur G, Jesubalan NG, Kulkarni A, Yezhuvath VB, Rathore AS. AI-ML applications in bioprocessing: ML as an enabler of real time quality prediction in continuous manufacturing of mAbs. Comput Chem Eng 2022. [DOI: 10.1016/j.compchemeng.2022.107896] [Indexed: 12/24/2022]
14. Kosinska GP, Ognichenko LM, Shyrykalova AO, Burdina YF, Artemenko AG, Kuz’min VE. Influence of Chemical Structure of Molecules on Blood–Brain Barrier Permeability on the Pampa Model. THEOR EXP CHEM+ 2022. [DOI: 10.1007/s11237-022-09718-5] [Indexed: 11/27/2022]
15. Avdeef A, Kansy M. Predicting Solubility of Newly-Approved Drugs (2016–2020) with a Simple ABSOLV and GSE(Flexible-Acceptor) Consensus Model Outperforming Random Forest Regression. J SOLUTION CHEM 2022; 51:1020-1055. [PMID: 35153342 PMCID: PMC8818506 DOI: 10.1007/s10953-022-01141-7] [Received: 09/08/2021] [Accepted: 11/10/2021] [Indexed: 11/24/2022]
Abstract
This study applies the ‘Flexible-Acceptor’ variant of the General Solubility Equation, GSE(Φ,B), to the prediction of the aqueous intrinsic solubility, log10 S0, of recently FDA-approved (2016–2020) ‘small-molecule’ new molecular entities (NMEs). The novel equation had been shown to predict the solubility of drugs beyond Lipinski’s ‘Rule of 5’ chemical space (bRo5) to a precision nearly matching that of the Random Forest Regression (RFR) machine learning method. Since then, it was found that GSE(Φ,B) appears to work well not only for bRo5 NMEs, but also for Ro5 drugs. To put GSE(Φ,B) in context, Yalkowsky’s GSE(classic), Abraham’s ABSOLV, and Breiman’s RFR models were also applied to predict log10 S0 of 72 newly approved NMEs, for which usable reported solubility values could be accessed (nearly 60% from FDA New Drug Application published reports). Except for GSE(classic), the prediction models were retrained with an enlarged version of the Wiki-pS0 database (nearly 400 added log10 S0 entries since our previous study). Thus, these four models were further validated by the additional independent solubility measurements which the newly approved drugs introduced. The prediction methods ranked RFR ~ GSE(Φ,B) > ABSOLV > GSE(classic) in performance. It was further demonstrated that the biases generated in the four separate models could be nearly eliminated in a consensus model based on the average of just two of the methods: GSE(Φ,B) and ABSOLV. The resulting consensus prediction equation is simple in form and can be easily incorporated into spreadsheet calculations. Even more significantly, it slightly outperformed the RFR method.
|
16
|
Discovery solubility measurement and assessment of small molecules with drug development in mind. Drug Discov Today 2022; 27:1315-1325. [PMID: 35114363 DOI: 10.1016/j.drudis.2022.01.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/28/2021] [Revised: 01/04/2022] [Accepted: 01/27/2022] [Indexed: 12/24/2022]
Abstract
Solubility is a key physicochemical property for the success of any drug candidate. Although the methods used and their rationales for determining solubility are subject to project needs and stages along the drug discovery-drug development pipeline, an artificial boundary can exist at the discovery-development interface. This boundary results in less effective solubility knowledge sharing and data integration among scientists in both drug discovery and drug development. Herein, we present a refreshed perspective on solubility. Solubility experimentation is not a one-size-fits-all measurement; instead, we stress the importance of constructing a seamless solubility understanding of a molecule as it progresses from a new chemical entity into a drug product.
|
17
|
El-Shamy NT, Alkaoud AM, Hussein RK, Ibrahim MA, Alhamzani AG, Abou-Krisha MM. DFT, ADMET and Molecular Docking Investigations for the Antimicrobial Activity of 6,6'-Diamino-1,1',3,3'-tetramethyl-5,5'-(4-chlorobenzylidene)bis[pyrimidine-2,4(1H,3H)-dione]. Molecules 2022; 27:620. [PMID: 35163880 PMCID: PMC8839838 DOI: 10.3390/molecules27030620] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 12/25/2021] [Revised: 01/05/2022] [Accepted: 01/15/2022] [Indexed: 12/24/2022] Open
Abstract
Heterocyclic compounds, including pyrimidine derivatives, exhibit a broad variety of biological and pharmacological activities. In this paper, a previously synthesized novel pyrimidine molecule is proposed, and its pharmaceutical properties are investigated. Computational techniques such as density functional theory, ADMET evaluation, and molecular docking were applied to elucidate the chemical nature, drug likeness and antibacterial function of the molecule. Quantum chemical computations revealed that the molecule is relatively stable and has a high electrophilic nature. The contour maps of HOMO-LUMO and molecular electrostatic potential were analyzed to illustrate the charge density distributions that could be associated with the biological activity. Natural bond orbital (NBO) analysis revealed details about the interaction between donor and acceptor within the bond. Drug likeness and ADMET analysis showed that the molecule possesses the safety and efficacy attributes expected of a pharmaceutical drug. The antimicrobial activity was investigated using molecular docking. The investigated molecule demonstrated a high affinity for binding within the active sites of antibacterial and antimalarial proteins. The high affinity for the antibacterial protein was shown by its low binding energy (-7.97 kcal/mol) and a low inhibition constant value (1.43 µM). The formation of four conventional hydrogen bonds in ligand-protein interactions confirmed the high stability of the resulting complexes. When compared to known standard drugs, the studied molecule displayed a remarkable antimalarial activity, as indicated by higher binding affinity (B.E. -5.86 kcal/mol & Ki = 50.23 µM). The pre-selected molecule could be presented as a promising drug candidate for the development of novel antimicrobial agents.
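The binding energies and inhibition constants quoted in this abstract are consistent with the standard thermodynamic relation Ki = exp(ΔG / RT). The sketch below checks that arithmetic. It assumes T = 298.15 K and R = 1.987 cal/(mol·K), a convention common in docking software; the abstract itself does not state how Ki was derived, and the function name is hypothetical.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3  # gas constant in kcal/(mol*K)


def inhibition_constant_molar(binding_energy_kcal: float,
                              temperature_k: float = 298.15) -> float:
    """Estimate Ki (mol/L) from a binding free energy via Ki = exp(dG / RT).

    Assumption: the docking binding energy is treated as a standard
    free energy of binding at the given temperature.
    """
    return math.exp(binding_energy_kcal / (R_KCAL_PER_MOL_K * temperature_k))


# Binding energies quoted in the abstract, converted to micromolar Ki.
for energy in (-7.97, -5.86):
    ki_micromolar = inhibition_constant_molar(energy) * 1e6
    print(f"{energy:6.2f} kcal/mol -> Ki ~ {ki_micromolar:.2f} uM")
```

Under these assumptions both energies map to micromolar inhibition constants close to the reported figures; small differences are expected because the quoted energies are rounded to two decimals.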
Affiliation(s)
- Nesreen T. El-Shamy
  - Physics Department, Faculty of Science, Taibah University, Al-Madina Al Munawarah 44256, Saudi Arabia
  - Physics Department, Faculty of Women, Ain Shams University, Cairo 11865, Egypt
- Ahmed M. Alkaoud
  - Physics Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11623, Saudi Arabia
- Rageh K. Hussein
  - Physics Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11623, Saudi Arabia
- Moez A. Ibrahim
  - Physics Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11623, Saudi Arabia
- Abdulrahman G. Alhamzani
  - Chemistry Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11623, Saudi Arabia
- Mortaga M. Abou-Krisha
  - Chemistry Department, College of Science, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 11623, Saudi Arabia
  - Department of Chemistry, Faculty of Science, South Valley University, Qena 83523, Egypt
|
18
|
Avdeef A, Sugano K. Salt Solubility and Disproportionation - Uses and Limitations of Equations for pH max and the In-silico Prediction of pH max. J Pharm Sci 2021; 111:225-246. [PMID: 34863819 DOI: 10.1016/j.xphs.2021.11.017] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 08/09/2021] [Revised: 11/23/2021] [Accepted: 11/23/2021] [Indexed: 10/19/2022]
Abstract
A multiphasic mass action equilibrium model was used to study the phase properties near the critical pH ('pHmax') in an acid-base transformation of a solid drug salt into its corresponding solid free base form in pure water slurries. The goal of this study was to better define the characteristics of disproportionation of pharmaceutical salts, with the objectives (i) to classify salts as μ-type (microclimate stable) or δ-type (disproportionation prone) based on the relationship between the calculated pHmax and the calculated pH of the saturated salt solution, (ii) to compare the distribution of μ/δ-type salts to predictions from the disproportionation potential equation introduced by Merritt et al. [20], (iii) to determine if the intrinsic solubility of the free base, S0, can be predicted from the measured μ-type salt solubility as a means of estimating the value of pHmax, (iv) to determine S0 directly from the measured δ-type salt solubility, and (v) to address some of the limitations of the equations commonly used to calculate pHmax. When the salt solubility is measured for a basic API (the pKa of which is known), but the experimental value of S0 is unavailable, a potentially useful simple screen for disproportionation is still possible, since pHmax can be estimated from a 'μ-predicted' (objective iii) or 'δ-measured' S0 (objective iv). Twelve model weak-base APIs were selected for the study. For each API, 2-17 different salt forms with reported salt solubilities in distilled water were sourced from the literature. In all, 73 salt solubility values based on 29 different salt-forming acids comprise the studied set. All the corresponding free base solubility values were available. The pKa values for all the acids and bases studied are generally well known. For each API salt, an acid-base titration simulation was performed, anchored to the measured salt solubility value, using the general mass action analysis program pDISOL-X.
The log S-pH profiles were drawn out by analytic continuity from pH 0 to 13, as described in detail previously [24]. Potentially useful in-silico models were developed that correlate pS0 to linear functions of the salt solubility in water, pSw, the partition coefficient of the salt-forming acid (log P_OCT^acid) and the melting point (mp) of the drug salt, thereby enabling the derivation of the approximate pHmax value from the predicted pS0.
Affiliation(s)
- Alex Avdeef
  - in-ADME Research, 1732 First Avenue, #102, New York, NY, 10128, USA.
- Kiyohiko Sugano
  - Molecular Pharmaceutics Lab., College of Pharmaceutical Sciences, Ritsumeikan University, 1-1-1, Noji-higashi, Kusatsu, Shiga, 525-8577, Japan
|
19
|
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. Methods in Molecular Biology (Clifton, N.J.) 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 02/06/2023]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization, by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by the increase in computing power and storage capacities, which allow the processing of large amounts of data in a short time. This review provides the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
  - Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
- Obdulia Rabal
  - Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
|
20
|
Tosca EM, Bartolucci R, Magni P. Application of Artificial Neural Networks to Predict the Intrinsic Solubility of Drug-Like Molecules. Pharmaceutics 2021; 13:1101. [PMID: 34371792 PMCID: PMC8309152 DOI: 10.3390/pharmaceutics13071101] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/13/2021] [Revised: 07/15/2021] [Accepted: 07/16/2021] [Indexed: 11/25/2022] Open
Abstract
Machine learning (ML) approaches are receiving increasing attention from pharmaceutical companies and regulatory agencies, given their ability to mine knowledge from available data. In drug discovery, for example, they are employed in quantitative structure-property relationship (QSPR) models to predict biological properties from the chemical structure of a drug molecule. In this paper, following the Second Solubility Challenge (SC-2), a QSPR model based on artificial neural networks (ANNs) was built to predict the intrinsic solubility (logS0) of the 100-compound low-variance tight set and the 32-compound high-variance loose set provided by SC-2 as test datasets. First, a training dataset of 270 drug-like molecules with logS0 value experimentally determined was gathered from the literature. Then, a standard three-layer feed-forward neural network was defined by using 10 ChemGPS physico-chemical descriptors as input features. The developed ANN showed adequate predictive performances on both of the SC-2 test datasets. Benefits and limitations of ML approaches have been highlighted and discussed, starting from this case-study. The main findings confirmed that ML approaches are an attractive and promising tool to predict logS0; however, many aspects, such as data quality, molecular descriptor computation and selection, and assessment of applicability domain, are crucial but often neglected, and should be carefully considered to improve predictions based on ML.
Affiliation(s)
- Paolo Magni
  - Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 5, I-27100 Pavia, Italy
|
21
|
Tang ZQ, Zhao L, Chen GX, Chen CYC. Novel and versatile artificial intelligence algorithms for investigating possible GHSR1α and DRD1 agonists for Alzheimer's disease. RSC Adv 2021; 11:6423-6446. [PMID: 35423219 PMCID: PMC8694922 DOI: 10.1039/d0ra10077c] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/29/2020] [Accepted: 01/18/2021] [Indexed: 11/21/2022] Open
Abstract
Hippocampal lesions are recognized as the earliest pathological changes in Alzheimer's disease (AD). Recent research has shown that the co-activation of growth hormone secretagogue receptor 1α (GHSR1α) and dopamine receptor D1 (DRD1) could recover hippocampal synaptic function and cognition. We combined traditional virtual screening technology with artificial intelligence models to screen multi-target agonists for the target proteins from a TCM database, and a novel boost Generalized Regression Neural Network (GRNN) model is proposed in this article to improve the poor adjustability of GRNN. R-square was chosen to evaluate the accuracy of these artificial intelligence models. For the GHSR1α agonist dataset, Adaptive Boosting (AdaBoost), Linear Ridge Regression (LRR), Support Vector Machine (SVM), and boost GRNN achieved good results; the R-square of the test set of these models reached 0.900, 0.813, 0.708, and 0.802, respectively. For the DRD1 agonist dataset, Gradient Boosting (GB), Random Forest (RF), SVM, and boost GRNN achieved good results; the R-square of the test set of these models reached 0.839, 0.781, 0.763, and 0.815, respectively. According to these R-square values, boost GRNN and SVM adapt better to different data sets, and boost GRNN is more accurate than SVM. To evaluate the reliability of the screening results, molecular dynamics (MD) simulation experiments were performed to make sure that candidates were docked well in the protein binding site. By analyzing the results of these artificial intelligence models and MD experiments, we suggest that 2007_17103 and 2007_13380 are possible dual-target drugs for Alzheimer's disease (AD).
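For readers comparing the R-square values quoted above, the usual coefficient of determination can be computed as below. This is a minimal sketch: the abstract does not specify which definition of R-square the authors used, so the common form 1 - SS_res / SS_tot is assumed, and the data are made up.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot.

    Assumption: the standard definition relative to the mean of the
    observed values; other definitions (e.g. squared Pearson r) exist.
    """
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up test-set activities and model predictions.
observed_values = [5.1, 6.3, 7.0, 4.8, 6.9]
predicted_values = [5.0, 6.0, 7.2, 5.1, 6.5]
print(round(r_squared(observed_values, predicted_values), 3))
```

With this definition, a model that always predicts the mean of the observed values scores 0, and a perfect model scores 1.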
Affiliation(s)
- Zi-Qiang Tang
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China
- Lu Zhao
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China
  - Department of Clinical Laboratory, The Sixth Affiliated Hospital, Sun Yat-sen University Guangzhou 510655 China
- Guan-Xing Chen
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China
- Calvin Yu-Chian Chen
  - Artificial Intelligence Medical Center, School of Intelligent Systems Engineering, Sun Yat-sen University Shenzhen Guangzhou 510275 China
  - Department of Medical Research, China Medical University Hospital Taichung 40447 Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University Taichung 41354 Taiwan
|
22
|
Sorkun MC, Koelman JVA, Er S. Pushing the limits of solubility prediction via quality-oriented data selection. iScience 2021; 24:101961. [PMID: 33437941 PMCID: PMC7788089 DOI: 10.1016/j.isci.2020.101961] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 10/16/2020] [Revised: 11/18/2020] [Accepted: 12/15/2020] [Indexed: 01/19/2023] Open
Abstract
Accurate prediction of the solubility of chemical substances in solvents remains a challenge. The sparsity of high-quality solubility data is recognized as the biggest hurdle in the development of robust data-driven methods for practical use. Nonetheless, the effects of the quality and quantity of data on aqueous solubility predictions have not yet been scrutinized. In this study, the roles of the size and the quality of data sets on the performances of the solubility prediction models are unraveled, and the concepts of actual and observed performances are introduced. In an effort to curtail the gap between actual and observed performances, a quality-oriented data selection method, which evaluates the quality of data and extracts the most accurate part of it through statistical validation, is designed. Applying this method on the largest publicly available solubility database and using a consensus machine learning approach, a top-performing solubility prediction model is achieved.
Affiliation(s)
- Murat Cihan Sorkun
  - DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
  - CCER - Center for Computational Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
  - Department of Applied Physics, Eindhoven University of Technology, 5600 MB Eindhoven, the Netherlands
- J.M. Vianney A. Koelman
  - DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
  - CCER - Center for Computational Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
  - Department of Applied Physics, Eindhoven University of Technology, 5600 MB Eindhoven, the Netherlands
- Süleyman Er
  - DIFFER - Dutch Institute for Fundamental Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
  - CCER - Center for Computational Energy Research, De Zaale 20, 5612 AJ Eindhoven, the Netherlands
|
23
|
QSPR models for water solubility of ammonium hexafluorosilicates: analysis of the effects of hydrogen bonds. Struct Chem 2020. [DOI: 10.1007/s11224-020-01652-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 11/25/2022]
|
24
|
Abstract
This study describes a novel nonlinear variant of the well-known Yalkowsky general solubility equation (GSE). The modified equation can be trained with small molecules, mostly from the Lipinski Rule of 5 (Ro5) chemical space, to predict the intrinsic aqueous solubility, S0, of large molecules (MW > 800 Da) from beyond the rule of 5 (bRo5) space, to an accuracy almost equal to that of a recently described random forest regression (RFR) machine learning analysis. The new approach replaces the GSE constant factors in the intercept (0.5), the octanol-water log P (-1.0), and melting point, mp (-0.01) terms with simple exponential functions incorporating the sum descriptor, Φ+B (Kier Φ molecular flexibility and Abraham H-bond acceptor potential). The constants in the modified three-variable (log P, mp, Φ+B) equation were determined by partial least-squares (PLS) refinement using a small-molecule log S0 training set (n = 6541) of mostly druglike molecules. In this "flexible-acceptor" GSE(Φ,B) model, the coefficient of log P (normally fixed at -1.0) varies smoothly from -1.1 for rigid nonionizable molecules (Φ+B = 0) to -0.39 for typically flexible (Φ ∼ 20, B ∼ 6) large molecules. The intercept (traditionally fixed at +0.5) varies smoothly from +1.9 for completely inflexible small molecules to -2.2 for typically flexible large molecules. The mp coefficient (-0.007) remains practically constant, near the traditional value (-0.01) for most molecules, which suggests that the small-to-large molecule continuum is mainly solvation responsive, apparently with only minor changes in the crystal lattice contributions. 
For a test set of 32 large molecules (e.g., cyclosporine A, gramicidin A, leuprolide, nafarelin, oxytocin, vancomycin, and mostly natural-product-derived therapeutics used in infectious/viral diseases, in immunosuppression, and in oncology) the modified equation predicted the intrinsic solubility with a root-mean-square error of 1.10 log unit, compared to 3.0 by the traditional GSE, and 1.07 by RFR.
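The constants this abstract attributes to the traditional GSE (intercept 0.5, log P coefficient -1.0, melting point coefficient -0.01) correspond to the commonly cited form log S0 = 0.5 - 0.01 (mp - 25) - log P. The sketch below implements only that classic form. The function name is hypothetical, the (mp - 25 °C) offset and the zeroing of the term for liquids are assumptions taken from the usual textbook statement of the GSE, and the exponential functions of Φ+B used in the modified GSE(Φ,B) are not given in the abstract, so they are not reproduced.

```python
def gse_classic_log_s0(log_p: float, melting_point_c: float) -> float:
    """Classic general solubility equation, log10 S0 in mol/L.

    Assumptions: melting point in degrees Celsius, referenced to 25 C,
    with the melting-point term set to zero for compounds that are
    liquid at 25 C.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


# Illustrative inputs, not taken from the paper.
print(gse_classic_log_s0(log_p=2.0, melting_point_c=125.0))  # solid
print(gse_classic_log_s0(log_p=1.0, melting_point_c=10.0))   # liquid at 25 C
```

In the modified equation described above, each of the three fixed constants would instead become a function of Φ+B; fitting those functions requires the training data and is outside the scope of this sketch.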
Affiliation(s)
- Alex Avdeef
  - in-ADME Research, 1732 First Avenue, no. 102, New York 10128, United States
|
25
|
Llinas A, Oprisiu I, Avdeef A. Findings of the Second Challenge to Predict Aqueous Solubility. J Chem Inf Model 2020; 60:4791-4803. [DOI: 10.1021/acs.jcim.0c00701] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/06/2023]
Affiliation(s)
- Antonio Llinas
  - DMPK, Research and Early Development, Respiratory & Immunology (R&I), BioPharmaceuticals R&D, AstraZeneca, Gothenburg SE 431 50, Sweden
- Ioana Oprisiu
  - Data Science & Artificial Intelligence, Imaging & Data Analytics, Clinical Pharmacology & Safety Sciences, R&D, AstraZeneca, Gothenburg SE 431 50, Sweden
- Alex Avdeef
  - in-ADME Research, 1732 First Avenue, #102, New York, New York 10128, United States
|