1
Zukić S, Osmanović A, Harej Hrkać A, Kraljević Pavelić S, Špirtović-Halilović S, Veljović E, Roca S, Trifunović S, Završnik D, Maran U. Data-Driven Modelling of Substituted Pyrimidine and Uracil-Based Derivatives Validated with Newly Synthesized and Antiproliferative Evaluated Compounds. Int J Mol Sci 2024; 25:9390. [PMID: 39273338 PMCID: PMC11395534 DOI: 10.3390/ijms25179390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2024] [Revised: 08/21/2024] [Accepted: 08/26/2024] [Indexed: 09/15/2024]
Abstract
The pyrimidine heterocycle plays an important role in anticancer research. In particular, the uracil family of pyrimidine derivatives shows promise as a source of structural scaffolds relevant to cervical cancer. This group of chemicals lacks data-driven machine learning quantitative structure-activity relationships (QSARs) that allow for generalization and prediction in the search for new active compounds. To address this gap, a dataset of pyrimidine and uracil compounds was collected from ChEMBL and curated. A workflow was developed for data-driven machine learning QSAR using an intuitive dataset design and forward selection of molecular descriptors. The model was thoroughly externally validated against available data. Blind validation was also performed through the synthesis and antiproliferative evaluation of newly synthesized uracil-based and pyrimidine derivatives. For the most active of the newly synthesized derivatives, a 2,4,5-trisubstituted pyrimidine, the QSAR prediction differed from the experimentally measured activity by 0.02.
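The forward selection of molecular descriptors mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the learning algorithm or the selection criterion, so the code below assumes ordinary least squares and a residual-sum-of-squares criterion; the function name `forward_select` and the synthetic descriptor matrix are hypothetical and are not taken from the paper.

```python
import numpy as np


def forward_select(X, y, max_features=3):
    """Greedy forward selection (illustrative assumption, not the paper's workflow).

    At each step, add the descriptor column that most reduces the
    least-squares residual sum of squares (RSS) of a linear model
    with an intercept. Stop when no column improves the fit.
    """
    n_samples, n_features = X.shape
    selected = []
    best_rss = float(np.sum((y - y.mean()) ** 2))  # intercept-only baseline
    while len(selected) < max_features:
        best_candidate = None
        for j in range(n_features):
            if j in selected:
                continue
            cols = selected + [j]
            A = np.column_stack([np.ones(n_samples), X[:, cols]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss - 1e-12:  # keep the best candidate seen so far
                best_rss, best_candidate = rss, j
        if best_candidate is None:
            break
        selected.append(best_candidate)
    return selected, best_rss


# Synthetic example: only columns 1 and 4 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
y = 2.0 * X[:, 1] - 3.0 * X[:, 4] + 0.01 * rng.normal(size=50)
selected, rss = forward_select(X, y, max_features=2)
print(sorted(selected), rss)
```

In practice the selection criterion would more likely be a cross-validated score than the training RSS used here, since training RSS never increases when a descriptor is added.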
Affiliation(s)
- Selma Zukić
  - Institute of Chemistry, University of Tartu, Ravila Street 14a, 50411 Tartu, Estonia
- Amar Osmanović
  - University of Sarajevo-Faculty of Pharmacy, Zmaja od Bosne 8, 71000 Sarajevo, Bosnia and Herzegovina
- Anja Harej Hrkać
  - Department of Basic and Clinical Pharmacology and Toxicology, Faculty of Medicine, University of Rijeka, Braće Branchetta 20, 51000 Rijeka, Croatia
- Selma Špirtović-Halilović
  - University of Sarajevo-Faculty of Pharmacy, Zmaja od Bosne 8, 71000 Sarajevo, Bosnia and Herzegovina
- Elma Veljović
  - University of Sarajevo-Faculty of Pharmacy, Zmaja od Bosne 8, 71000 Sarajevo, Bosnia and Herzegovina
- Sunčica Roca
  - Centre for Nuclear Magnetic Resonance (NMR), Ruđer Bošković Institute, Bijenička Street 54, 10000 Zagreb, Croatia
- Snežana Trifunović
  - Faculty of Chemistry, University of Belgrade, Studentski trg 12-16, 11158 Belgrade, Serbia
- Davorka Završnik
  - University of Sarajevo-Faculty of Pharmacy, Zmaja od Bosne 8, 71000 Sarajevo, Bosnia and Herzegovina
- Uko Maran
  - Institute of Chemistry, University of Tartu, Ravila Street 14a, 50411 Tartu, Estonia
2
Khan T, Hussain A, Siddique MUM, Altamimi MA, Malik A, Bhat ZR. HSPiP, Computational, and Thermodynamic Model-Based Optimized Solvents for Subcutaneous Delivery of Tolterodine Tartrate and GastroPlus-Based In Vivo Prediction in Humans: Part II. AAPS PharmSciTech 2024; 25:160. [PMID: 38992299 DOI: 10.1208/s12249-024-02880-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2024] [Accepted: 06/22/2024] [Indexed: 07/13/2024]
Abstract
In Part I, we reported Hansen solubility parameters (HSP, obtained with the HSPiP program) and experimental solubility at varied temperatures for the delivery of tolterodine tartrate (TOTA). Here, we studied dose volume selection, stability, pH, osmolality, dispersion, clarity, and viscosity of the explored combinations (I-VI). Ex vivo permeation and deposition studies were performed to observe the relative diffusion rate from the injection site in rat skin. A confocal laser scanning microscopy (CLSM) study was conducted to support the ex vivo findings. Moreover, GastroPlus was used to predict in vivo parameters in humans and the impact of various critical factors on pharmacokinetic (PK) parameters. The immediate-release product (IR) contained 60% PEG400, whereas the controlled-release formulation (CR) contained PEG400 (60%), water (10%), and d-limonene (30%) to deliver 2 mg of TOTA. GastroPlus predicted the plasma drug concentration of weakly basic TOTA as a function of pH (from pH 2.0 to 9). Cumulative drug permeation and drug deposition across rat skin followed the order B-VI > C-VI > A-VI. This finding was further supported by CLSM. Moreover, IR and CR were predicted to achieve Cmax values of 0.0038 µg/mL and 0.00023 µg/mL, respectively, after subcutaneous (sub-Q) delivery. The limonene added to CR extended the plasma drug concentration over a period of 12 h, as predicted in GastroPlus. Parameter sensitivity analysis (PSA) predicted that the sub-Q blood flow rate is the only factor affecting PK parameters for the IR formulation, whereas it was insignificant for CR. Thus, sub-Q delivery of CR would be a promising alternative with ease of delivery to children and elderly patients.
Affiliation(s)
- Tasneem Khan
  - Department of Pharmaceutics, School of Pharmaceutical Education and Research, Jamia Hamdard, New Delhi, 110062, India
- Afzal Hussain
  - Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh, 11451, Saudi Arabia
- Mohd Usman Mohd Siddique
  - Department of Pharmaceutical Chemistry, Shri Vile Parle Kelavani Mandal's Institute of Pharmacy Dhule, Dhule, MH, 424001, India
- Mohammad A Altamimi
  - Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh, 11451, Saudi Arabia
- Abdul Malik
  - Department of Pharmaceutics, College of Pharmacy, King Saud University, Riyadh, 11451, Saudi Arabia
- Zahid Rafiq Bhat
  - Department of Molecular and Cellular Oncology, MD Anderson Cancer Centre, Houston, Texas, USA
3
Martinez-Mayorga K, Rosas-Jiménez JG, Gonzalez-Ponce K, López-López E, Neme A, Medina-Franco JL. The pursuit of accurate predictive models of the bioactivity of small molecules. Chem Sci 2024; 15:1938-1952. [PMID: 38332817 PMCID: PMC10848664 DOI: 10.1039/d3sc05534e] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/18/2023] [Accepted: 01/09/2024] [Indexed: 02/10/2024]
Abstract
Property prediction is a key interest in chemistry. For several decades there has been continued and incremental development of mathematical models to predict properties. As more data is generated and accumulated, there seem to be more opportunities to develop models with increased accuracy. The same is true if one considers the large developments in machine and deep learning models. However, alongside these opportunities and developments, issues and challenges remain, and with more data new challenges emerge, such as the quality, quantity, and reliability of the data, and model reproducibility. Herein, we discuss the status of the accuracy of predictive models and present the authors' perspective on the direction of the field, emphasizing good practices. We focus on predictive models of bioactive properties of small molecules relevant to drug discovery, agrochemistry, food chemistry, natural product research, and related fields.
Affiliation(s)
- Karina Martinez-Mayorga
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico, Merida-Tetiz Highway, Km. 4.5, Ucu, Yucatan, Mexico
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico, Sierra Papacal, Merida, Yucatan, Mexico
- José G Rosas-Jiménez
  - Department of Theoretical Biophysics, IMPRS on Cellular Biophysics, Max-von-Laue Strasse 3, Frankfurt am Main 60438, Germany
- Karla Gonzalez-Ponce
  - Institute of Chemistry, Merida Unit, National Autonomous University of Mexico, Merida-Tetiz Highway, Km. 4.5, Ucu, Yucatan, Mexico
- Edgar López-López
  - Department of Chemistry and Graduate Program in Pharmacology, Center for Research and Advanced Studies of the National Polytechnic Institute, Mexico City 07000, Mexico
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
- Antonio Neme
  - Institute for Applied Mathematics and Systems, Merida Research Unit, National Autonomous University of Mexico, Sierra Papacal, Merida, Yucatan, Mexico
- José L Medina-Franco
  - DIFACQUIM Research Group, Department of Pharmacy, School of Chemistry, National Autonomous University of Mexico, Mexico City 04510, Mexico
4
Kotli M, Piir G, Maran U. Pesticide effect on earthworm lethality via interpretable machine learning. J Hazard Mater 2024; 461:132577. [PMID: 37793249 DOI: 10.1016/j.jhazmat.2023.132577] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Revised: 09/15/2023] [Accepted: 09/16/2023] [Indexed: 10/06/2023]
Abstract
Earthworms are among the most important invertebrates for soil health. Many chemical substances released into nature for agricultural development, such as pesticides, may have unwanted effects on these organisms. It is therefore essential to first understand the extent of the impact of chemicals on soil health and then make proper decisions for regulatory or commercial purposes. We hypothesize that there is an expressible quantitative structure-activity relationship (QSAR) between the structure of pesticide compounds and their acute toxicity to the earthworm species Eisenia fetida. Describing this relationship allows for a better assessment of the impact of chemicals on this earthworm. To this end, a dataset of chemicals was collected from open-access sources to develop a mathematical model. A novel approach, combining a genetic algorithm and Bayesian optimization, was used to select structural features for the model and to optimize model parameters. The final QSAR classification model was created with the Random Forest algorithm and exhibited a good prediction accuracy of 0.78 on the training set and 0.80 on the test set. The model representation follows FAIR principles and is available on QsarDB.org.
Affiliation(s)
- Mihkel Kotli
  - University of Tartu, Institute of Chemistry, Tartu, Estonia
- Geven Piir
  - University of Tartu, Institute of Chemistry, Tartu, Estonia
- Uko Maran
  - University of Tartu, Institute of Chemistry, Tartu, Estonia
5
Jovic O, Mouras R. Extreme Gradient Boosting Combined with Conformal Predictors for Informative Solubility Estimation. Molecules 2023; 29:19. [PMID: 38202602 PMCID: PMC10779886 DOI: 10.3390/molecules29010019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Revised: 12/15/2023] [Accepted: 12/17/2023] [Indexed: 01/12/2024]
Abstract
We used the extreme gradient boosting (XGB) algorithm to predict the experimental solubility of chemical compounds in water and organic solvents and to select significant molecular descriptors. In the first step, the prediction accuracy of our forward stepwise top-importance XGB (FSTI-XGB) on curated solubility data sets, in terms of RMSE, was found to be 0.59-0.76 Log(S) for two water data sets, 0.69-0.79 Log(S) for the methanol data set, 0.65-0.79 Log(S) for the ethanol data set, and 0.62-0.70 Log(S) for the acetone data set. In the second step, we used uncurated and curated AquaSolDB data sets for applicability domain (AD) tests of the Drugbank, PubChem, and COCONUT databases and determined that more than 95% of the ca. 500,000 compounds studied were within the AD. In the third step, we applied conformal prediction to obtain narrow prediction intervals, which we successfully validated against the true solubility values of the test sets. In the fourth and last step, we used these prediction intervals to estimate individual error margins and the accuracy class of the solubility prediction for molecules within the AD of the three public databases. All of this was possible without knowledge of the experimental solubilities in those databases. We consider this four-step approach novel because solubility-related works usually study only the first step or the first two steps.
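The conformal prediction step can be illustrated with a minimal sketch. The abstract does not say which conformal variant or nonconformity score was used, so the code below assumes split (inductive) conformal prediction with absolute residuals on a held-out calibration set; the function name `split_conformal_interval` and the example numbers are hypothetical.

```python
import numpy as np


def split_conformal_interval(cal_true, cal_pred, new_pred, alpha=0.1):
    """Split conformal intervals with target coverage 1 - alpha.

    Assumption for illustration: the nonconformity score is the absolute
    residual |y - y_hat| on a calibration set not used to fit the model.
    """
    scores = np.abs(np.asarray(cal_true, dtype=float) - np.asarray(cal_pred, dtype=float))
    n = scores.size
    # Rank of the finite-sample-corrected quantile of the calibration scores.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        raise ValueError("calibration set too small for this alpha")
    q = np.sort(scores)[k - 1]
    new_pred = np.asarray(new_pred, dtype=float)
    return new_pred - q, new_pred + q


# Ten calibration compounds with absolute errors 0.1, 0.2, ..., 1.0 log units.
cal_true = np.zeros(10)
cal_pred = 0.1 * np.arange(1, 11)
lower, upper = split_conformal_interval(cal_true, cal_pred, [-2.0], alpha=0.2)
print(lower, upper)
```

This variant yields intervals of constant width; per-compound widths such as the individual error margins described in the abstract would need a normalized nonconformity score, which is not shown here.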
Affiliation(s)
- Rabah Mouras
  - Pharmaceutical Manufacturing Technology Centre, Bernal Institute, Department of Chemical Sciences, University of Limerick, V94 T9PX Limerick, Ireland
6
Preikša J, Petrikaitė V, Petrauskas V, Matulis D. Intrinsic Solubility of Ionizable Compounds from pKa Shift. ACS Omega 2023; 8:44571-44577. [PMID: 38046347 PMCID: PMC10688098 DOI: 10.1021/acsomega.3c04071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Accepted: 09/20/2023] [Indexed: 12/05/2023]
Abstract
Aqueous solubility of pharmaceutical substances plays an important role in small molecule drug discovery and development, and drug candidate compounds often contain ionizable groups to increase their solubility. Recognizing that the electrostatically charged form of a compound is much more soluble than the uncharged form, this work proposes a model to explore the relationship between the pKa shift of the ionizable group and dissolution equilibria. The model considers three forms of a compound: dissolved-charged, dissolved-uncharged, and aggregated-uncharged. It analyzes two linked equilibria: the protonation of the ionizable group and the dissolution-aggregation of the uncharged form, with the observed pKa shift depending on the total concentration of the compound. The active concentration of the aggregates determines this shift. The model was explored by determining the pKa shift and intrinsic solubility of specific compounds: ICPD47, a high-affinity inhibitor of the Hsp90 chaperone protein (an anticancer target), as well as benzoic acid and benzydamine. The model holds potential for a more nuanced understanding of intrinsic solubility and may lead to advancements in drug discovery and development.
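The premise that the charged form is far more soluble than the uncharged form can be made concrete with the textbook Henderson-Hasselbalch solubility-pH relation for a monoprotic compound. This sketch is not the model proposed in the paper, whose equations the abstract does not give; the function name `total_solubility` and the numerical values are illustrative assumptions.

```python
def total_solubility(s0, pka, ph, kind):
    """Total solubility of a monoprotic acid or base at a given pH.

    Textbook Henderson-Hasselbalch relation (an assumption here, not the
    paper's model): the uncharged form is capped at the intrinsic
    solubility s0, and the charged form follows from the ionization ratio.
    """
    if kind == "acid":
        ratio = 10.0 ** (ph - pka)  # [A-] / [HA]
    elif kind == "base":
        ratio = 10.0 ** (pka - ph)  # [BH+] / [B]
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return s0 * (1.0 + ratio)


# Illustrative values only: a weak acid with pKa 4.2 and s0 = 1e-3 mol/L.
for ph in (2.2, 4.2, 6.2):
    print(ph, total_solubility(1e-3, 4.2, ph, "acid"))
```

At pH equal to the pKa the total solubility is twice the intrinsic solubility, and each further pH unit toward the ionized side adds roughly a tenfold increase, until salt solubility limits apply (not modeled here).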
Affiliation(s)
- Jokūbas Preikša
  - Department of Molecular Compound Physics, Center for Physical Sciences and Technology, Savanoriu Ave. 231, Vilnius, LT-02300, Lithuania
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
- Vilma Petrikaitė
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
  - Laboratory of Drug Targets Histopathology, Institute of Cardiology, Lithuanian University of Health Sciences, Sukileliu pr. 13, Kaunas, LT-50162, Lithuania
- Vytautas Petrauskas
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
- Daumantas Matulis
  - Department of Biothermodynamics and Drug Design, Institute of Biotechnology, Life Sciences Center, Vilnius University, Saulėtekio 7, Vilnius, LT-10257, Lithuania
7
Avdeef A. Mechanistically transparent models for predicting aqueous solubility of rigid, slightly flexible, and very flexible drugs (MW<2000): Accuracy near that of random forest regression. ADMET DMPK 2023; 11:317-330. [PMID: 37829322 PMCID: PMC10567068 DOI: 10.5599/admet.1879] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2023] [Revised: 08/15/2023] [Indexed: 10/14/2023]
Abstract
Yalkowsky's General Solubility Equation (GSE), with its three fixed constants, is popular and easy to apply, but is not very accurate for polar, zwitterionic, or flexible molecules. This review examines the findings of a series of studies in which we sought a better prediction model by comparing the performance of the GSE with that of Abraham's Solvation Equation (ABSOLV) and the Random Forest regression (RFR) machine-learning (ML) method. Large, well-curated aqueous intrinsic solubility databases are available. However, drugs may be sparsely distributed in chemical space and concentrated in clusters, so even a large database might overlook some regions. Test compounds from under-represented portions of that space may be poorly predicted, as might be the case with the 'loose' set of 32 drugs in the Second Solubility Challenge (2020). There still appears to be a need for better coverage of drug space. Current trends in solubility prediction increasingly use calculated input descriptors, which may be an advantage for exploring the properties of molecules yet to be synthesized. The risk is that such prediction approaches may be built on accumulated uncertainty. The increasing use of ML/AI methods can lead to accurate predictions, but such predictions may not readily suggest strategies to pursue in selecting yet-to-be-synthesized compounds. Based on our latest findings, we recommend predictions based on both the 'grouped' ABSOLV(GRP) and the 'Flexible Acceptor' GSE(Φ,B) models with the provided best-fit parameters, where Φ is the Kier molecular flexibility index and B is the Abraham H-bond acceptor strength. For molecules with Φ < 11, the prudent choice is the Consensus Model, the average of ABSOLV(GRP) and GSE(Φ,B). For more flexible molecules, GSE(Φ,B) is recommended.
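The recommendation at the end of the abstract amounts to a simple selection rule, sketched below together with the classic GSE in its commonly quoted form, log S0 = 0.5 - 0.01 (mp - 25) - log P, with the melting point in °C and S0 in mol/L. The best-fit parameters of ABSOLV(GRP) and GSE(Φ,B) are not given in the abstract, so those two predictions are treated as inputs; the function names and example numbers are hypothetical.

```python
def gse_log_s(log_p, melting_point_c):
    """Classic General Solubility Equation (commonly quoted form).

    log S0 = 0.5 - 0.01 * (mp - 25) - log P, with mp in degrees Celsius.
    The melting-point term is set to zero for compounds that are liquid
    at 25 degrees Celsius.
    """
    mp_term = max(melting_point_c - 25.0, 0.0)
    return 0.5 - 0.01 * mp_term - log_p


def recommended_log_s(phi, absolv_grp_log_s, gse_phi_b_log_s, phi_cutoff=11.0):
    """Selection rule stated in the abstract.

    phi is the Kier molecular flexibility index. The two log S0 inputs are
    assumed to come from the ABSOLV(GRP) and GSE(phi, B) models, whose
    fitted parameters are not reproduced here.
    """
    if phi < phi_cutoff:
        # Consensus Model: average of the two predictions.
        return 0.5 * (absolv_grp_log_s + gse_phi_b_log_s)
    # Very flexible molecules: GSE(phi, B) alone.
    return gse_phi_b_log_s


# Illustrative numbers only.
print(gse_log_s(log_p=2.0, melting_point_c=125.0))
print(recommended_log_s(phi=5.0, absolv_grp_log_s=-3.0, gse_phi_b_log_s=-4.0))
print(recommended_log_s(phi=15.0, absolv_grp_log_s=-3.0, gse_phi_b_log_s=-4.0))
```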
8
Oja M, Sild S, Piir G, Maran U. Intrinsic Aqueous Solubility: Mechanistically Transparent Data-Driven Modeling of Drug Substances. Pharmaceutics 2022; 14:2248. [PMID: 36297685 PMCID: PMC9611068 DOI: 10.3390/pharmaceutics14102248] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 09/08/2022] [Revised: 10/12/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022]
Abstract
Intrinsic aqueous solubility is a foundational property for understanding the chemical, technological, pharmaceutical, and environmental behavior of drug substances. Despite years of solubility research, molecular structure-based prediction of the intrinsic aqueous solubility of drug substances is still under active investigation. This paper describes the authors’ systematic data-driven modeling, in which two fit-for-purpose training data sets for intrinsic aqueous solubility were collected and curated, and three quantitative structure–property relationships were derived to make predictions for the most recent solubility challenge. All three models perform well individually while being mechanistically transparent and easy to understand. The molecular descriptors in the models are related to the following key steps in the solubility process: dissociation of the molecule from the crystal, formation of a cavity in the solvent, and insertion of the molecule into the solvent. A consensus modeling approach with these models markedly improved prediction capability and reduced the number of strong outliers by more than a factor of two. The performance and outliers of the second solubility challenge predictions were analyzed retrospectively. All developed models have been published in the QsarDB.org repository according to FAIR principles and can be used without restrictions for exploring, downloading, and making predictions.
Affiliation(s)
- Uko Maran
  - Correspondence: Tel.: +372-7-375-254; Fax: +372-7-375-264