1
Cysewski P, Jeliński T, Przybyłek M, Mai A, Kułak J. Experimental and Machine-Learning-Assisted Design of Pharmaceutically Acceptable Deep Eutectic Solvents for the Solubility Improvement of Non-Selective COX Inhibitors Ibuprofen and Ketoprofen. Molecules 2024; 29:2296. [PMID: 38792157 PMCID: PMC11124057 DOI: 10.3390/molecules29102296] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2024] [Revised: 05/09/2024] [Accepted: 05/12/2024] [Indexed: 05/26/2024] [Open access]
Abstract
Deep eutectic solvents (DESs) are commonly used in pharmaceutical applications as excellent solubilizers of active substances. This study investigated the tuning of ibuprofen and ketoprofen solubility utilizing DESs containing choline chloride or betaine as hydrogen bond acceptors and various polyols (ethylene glycol, diethylene glycol, triethylene glycol, glycerol, 1,2-propanediol, 1,3-butanediol) as hydrogen bond donors. Experimental solubility data were collected for all DES systems. A machine learning model was developed using COSMO-RS molecular descriptors to predict solubility. All studied DESs exhibited a cosolvency effect, increasing drug solubility at modest concentrations of water. The model accurately predicted solubility for ibuprofen, ketoprofen, and related analogs (flurbiprofen, felbinac, phenylacetic acid, diphenylacetic acid). A machine learning approach utilizing COSMO-RS descriptors enables the rational design and solubility prediction of DES formulations for improved pharmaceutical applications.
Affiliation(s)
- Piotr Cysewski
  - Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-096 Bydgoszcz, Poland
2
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024] [Open access]
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performances, but their reliability may be deceiving when used prospectively. This study investigates the origins of these discrepancies, following three directions: a historical perspective, an analysis of the aqueous solubility dataverse and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performances. We also propose a workflow to curate aqueous solubility data, aiming at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public usage because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute and data sources. The herein obtained models and quality-assessed datasets are publicly available.
Affiliation(s)
- P Llompart
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
  - IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
  - Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
3
Gheta SKO, Bonin A, Gerlach T, Göller AH. Predicting absolute aqueous solubility by applying a machine learning model for an artificially liquid-state as proxy for the solid-state. J Comput Aided Mol Des 2023; 37:765-789. [PMID: 37878216 DOI: 10.1007/s10822-023-00538-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 10/02/2023] [Indexed: 10/26/2023]
Abstract
In this study, we use machine learning algorithms with QM-derived COSMO-RS descriptors, along with Morgan fingerprints, to predict the absolute solubility of drug-like compounds. The QM-derived descriptors account for the molecular properties of the solute, i.e., the solute-solute interactions in an artificial-liquid-state (super-cooled liquid), and the solute-solvent interactions in solution. We employ two main approaches to predict solubility: (i) a hypothetical pathway that involves melting the solute at room temperature T = T¯ ([Formula: see text]) and mixing the artificially liquid solute into the solvent ([Formula: see text]). In this approach [Formula: see text] is predicted using machine learning models, and the [Formula: see text] is obtained from COSMO-RS calculations; (ii) direct solubility prediction using machine learning algorithms. The models were trained on a large number of Bayer in-house compounds for which water solubility data is available at physiological pH of 6.5 and ambient temperature. We also evaluated our models using external datasets from a solubility challenge. Our models present great improvements compared to the absolute solubility prediction with the QSAR model for the artificial liquid state as implemented in the COSMOtherm software, for both in-house and external datasets. We are furthermore able to demonstrate the superiority of QM-derived descriptors compared to cheminformatics descriptors. We finally present low-cost alternative models using fragment-based COSMOquick calculations with only marginal reduction in the quality of predicted solubility.
Affiliation(s)
- Sadra Kashef Ol Gheta
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Anne Bonin
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
- Thomas Gerlach
  - Bayer AG, Crop Science, R&D, Digital Transformation, 40789, Monheim, Germany
  - Bayer AG, Engineering & Technology, Thermal Separation Technologies, 51368, Leverkusen, Germany
- Andreas H Göller
  - Bayer AG, Pharmaceuticals, R&D, Computational Molecular Design, 42096, Wuppertal, Germany
4
Raza A, Chohan TA, Buabeid M, Arafa ESA, Chohan TA, Fatima B, Sultana K, Ullah MS, Murtaza G. Deep learning in drug discovery: a futuristic modality to materialize the large datasets for cheminformatics. J Biomol Struct Dyn 2023; 41:9177-9192. [PMID: 36305195 DOI: 10.1080/07391102.2022.2136244] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/24/2022] [Accepted: 10/08/2022] [Indexed: 10/31/2022]
Abstract
Artificial intelligence (AI) development imitates the workings of the human brain to comprehend modern problems. The traditional approaches such as high throughput screening (HTS) and combinatorial chemistry are lengthy and expensive to the pharmaceutical industry as they can only handle a smaller dataset. Deep learning (DL) is a sophisticated AI method that uses a thorough comprehension of particular systems. The pharmaceutical industry is now adopting DL techniques to enhance the research and development process. Multi-oriented algorithms play a crucial role in the processing of QSAR analysis, de novo drug design, ADME evaluation, physicochemical analysis, preclinical development, followed by clinical trial data precision. In this study, we investigated the performance of several algorithms, including deep neural networks (DNN), convolutional neural networks (CNN) and multi-task learning (MTL), with the aim of generating high-quality, interpretable big and diverse databases for drug design and development. Studies have demonstrated that CNN, recurrent neural network and deep belief network are compatible, accurate and effective for the molecular description of pharmacodynamic properties. In Covid-19, existing pharmacological compounds have also been repurposed using DL models. In the absence of the Covid-19 vaccine, remdesivir and oseltamivir have been widely employed to treat severe SARS-CoV-2 infections. In conclusion, the results indicate the potential benefits of employing the DL strategies in the drug discovery process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ali Raza
  - Department of pharmaceutical chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
  - Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
- Talha Ali Chohan
  - Institute of Molecular Biology and Biochemistry, The University of Lahore, Pakistan
  - Institute of Pharmaceutical Science, UVAS, Lahore, Pakistan
- Manal Buabeid
  - Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
- El-Shaima A Arafa
  - Department of Clinical Sciences, College of Pharmacy and Health Sciences, Ajman University, Ajman, United Arab Emirates
  - Centre of Medical and Bio-Allied Health Sciences Research, Ajman University, Ajman, United Arab Emirates
- Batool Fatima
  - Department of biochemistry, Bahauddin Zakariya University, Multan, Pakistan
- Kishwar Sultana
  - Department of pharmaceutical chemistry, Faculty of Pharmacy, The University of Lahore, Pakistan
- Malik Saad Ullah
  - Department of Pharmacy, Government College University, Faisalabad, Pakistan
- Ghulam Murtaza
  - Department of Pharmacy, COMSATS University Islamabad, Lahore Campus, Pakistan
5
Liu T, Johnson KR, Jansone-Popova S, Jiang DE. Advancing Rare-Earth Separation by Machine Learning. JACS Au 2022; 2:1428-1434. [PMID: 35783179 PMCID: PMC9241157 DOI: 10.1021/jacsau.2c00122] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 02/22/2022] [Revised: 05/24/2022] [Accepted: 06/01/2022] [Indexed: 05/24/2023]
Abstract
Constituting the bulk of rare-earth elements, lanthanides need to be separated to fully realize their potential as critical materials in many important technologies. The discovery of new ligands for improving rare-earth separations by solvent extraction, the most practical rare-earth separation process, is still largely based on trial and error, a low-throughput and inefficient approach. A predictive model that allows high-throughput screening of ligands is needed to identify suitable ligands to achieve enhanced separation performance. Here, we show that deep neural networks, trained on the available experimental data, can be used to predict accurate distribution coefficients for solvent extraction of lanthanide ions, thereby opening the door to high-throughput screening of ligands for rare-earth separations. One innovative approach that we employed is a combined representation of ligands with both molecular physicochemical descriptors and atomic extended-connectivity fingerprints, which greatly boosts the accuracy of the trained model. More importantly, we synthesized four new ligands and found that the predicted distribution coefficients from our trained machine-learning model match well with the measured values. Therefore, our machine-learning approach paves the way for accelerating the discovery of new ligands for rare-earth separations.
Affiliation(s)
- Tongyu Liu
  - Department of Chemistry, University of California, Riverside, California 92521, United States
- Katherine R. Johnson
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- Santa Jansone-Popova
  - Chemical Sciences Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee 37831, United States
- De-en Jiang
  - Department of Chemistry, University of California, Riverside, California 92521, United States
6
Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS Omega 2022; 7:15695-15710. [PMID: 35571767 PMCID: PMC9096921 DOI: 10.1021/acsomega.2c00642] [Citation(s) in RCA: 18] [Impact Index Per Article: 9.0] [Received: 01/31/2022] [Accepted: 04/11/2022] [Indexed: 05/17/2023]
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goals of this study are to assess current deep learning methods for solubility prediction, develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure and explore several different molecular representations including molecular descriptors, simplified molecular-input line-entry system strings, molecular graphs, and three-dimensional atomic coordinates using four different neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, perform feature analysis to understand which information about the molecular structure is most valuable for prediction, and perform a transfer learning and data size study to understand the impact of data availability on model performance.
7
Lee S, Lee M, Gyak KW, Kim SD, Kim MJ, Min K. Novel Solubility Prediction Models: Molecular Fingerprints and Physicochemical Features vs Graph Convolutional Neural Networks. ACS Omega 2022; 7:12268-12277. [PMID: 35449985 PMCID: PMC9016862 DOI: 10.1021/acsomega.2c00697] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 02/03/2022] [Accepted: 03/18/2022] [Indexed: 05/27/2023]
Abstract
Predicting both accurate and reliable solubility values has long been a crucial but challenging task. In this work, surrogated model-based methods were developed to accurately predict the solubility of two molecules (solute and solvent) through machine learning and deep learning. The current study employed two methods: (1) converting molecules into molecular fingerprints and adding optimal physicochemical properties as descriptors and (2) using graph convolutional network (GCN) models to convert molecules into a graph representation and deal with prediction tasks. Then, two prediction tasks were conducted with each method: (1) the solubility value (regression) and (2) the solubility class (classification). The fingerprint-based method clearly demonstrates that high performance is possible by adding simple but significant physicochemical descriptors to molecular fingerprints, while the GCN method shows that it is possible to predict various properties of chemical compounds with relatively simplified features from the graph representation. The developed methodologies provide a comprehensive understanding of constructing a proper model for predicting solubility and can be employed to find suitable solutes and solvents.
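The first method described in this abstract, a fingerprint extended with a few physicochemical descriptors for both solute and solvent, can be sketched roughly as follows. This is only an illustration under stated assumptions: `toy_fingerprint` is a hypothetical substring-hashing stand-in, not a real molecular fingerprint and not the one used in the paper, and the two descriptors (molecular weight and heavy-atom count) are chosen here purely for demonstration.

```python
import zlib


def toy_fingerprint(smiles: str, n_bits: int = 64, max_len: int = 3) -> list[int]:
    """Hash every SMILES substring of length 1..max_len into a fixed-size bit vector.

    Hypothetical stand-in for a chemistry-aware fingerprint; it only looks at
    characters in the string, not at atoms or bonds.
    """
    bits = [0] * n_bits
    for size in range(1, max_len + 1):
        for start in range(len(smiles) - size + 1):
            fragment = smiles[start:start + size]
            # crc32 is deterministic across runs, unlike the built-in hash()
            bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits


def featurize(smiles: str, descriptors: list[float], n_bits: int = 64) -> list[float]:
    """Concatenate the bit vector with the physicochemical descriptors."""
    fingerprint = [float(b) for b in toy_fingerprint(smiles, n_bits)]
    return fingerprint + [float(d) for d in descriptors]


def pair_features(solute, solvent, n_bits: int = 64) -> list[float]:
    """Build one feature vector for a (solute, solvent) pair.

    Each argument is a (smiles, descriptors) tuple.
    """
    return featurize(*solute, n_bits=n_bits) + featurize(*solvent, n_bits=n_bits)


# Descriptors assumed for illustration: [molecular weight in g/mol, heavy-atom count]
benzoic_acid = ("OC(=O)c1ccccc1", [122.12, 9.0])
water = ("O", [18.02, 1.0])

x = pair_features(benzoic_acid, water)
print(len(x))  # 132 = 2 * (64 fingerprint bits + 2 descriptors)
```

A vector like `x` could then be passed to any regressor (solubility value) or classifier (solubility class); in practice a cheminformatics toolkit would supply the fingerprint.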
Affiliation(s)
- Sumin Lee
  - Department of Industrial and Information Systems Engineering, School of Systems Biomedical Science, School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Myeonghun Lee
  - Department of Industrial and Information Systems Engineering, School of Systems Biomedical Science, School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Ki-Won Gyak
  - Polymer Research Lab, Samsung Advanced Institute of Technology, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea
- Sung Dug Kim
  - Polymer Research Lab, Samsung Advanced Institute of Technology, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea
- Mi-Jeong Kim
  - Polymer Research Lab, Samsung Advanced Institute of Technology, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea
- Kyoungmin Min
  - Department of Industrial and Information Systems Engineering, School of Systems Biomedical Science, School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
8
Nazarova AL, Yang L, Liu K, Mishra A, Kalia RK, Nomura KI, Nakano A, Vashishta P, Rajak P. Dielectric Polymer Property Prediction Using Recurrent Neural Networks with Optimizations. J Chem Inf Model 2021; 61:2175-2186. [PMID: 33871989 DOI: 10.1021/acs.jcim.0c01366] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/13/2023]
Abstract
Despite the growing success of machine learning for predicting structure-property relationships in molecules and materials, such as predicting the dielectric properties of polymers, it is still in its infancy. We report on the effectiveness of solving structure-property relationships for a computer-generated database of dielectric polymers using recurrent neural network (RNN) models. The implementation of a series of optimization strategies was crucial to achieving high learning speeds and sufficient accuracy: (1) binary and nonbinary representations of SMILES (Simplified Molecular Input Line System) fingerprints and (2) backpropagation with affine transformation of the input sequence (ATransformedBP) and resilient backpropagation with initial weight update parameter optimizations (iRPROP- optimized). For the investigated database of polymers, the binary SMILES representation was found to be superior to the decimal representation with respect to the training and prediction performance. All developed and optimized Elman-type RNN algorithms outperformed nonoptimized RNN models in the efficient prediction of nonlinear structure-activity relationships. The average relative standard deviation (RSD) remained well below 5%, and the maximum RSD did not exceed 30%. Moreover, we provide a C++ codebase as a testbed for a new generation of open programming languages that target increasingly diverse computer architectures.
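The contrast between a binary and a decimal SMILES representation mentioned in this abstract can be illustrated with a small sketch. The paper's exact encoding is not given here, so everything below is an assumption: each character gets an integer code from a vocabulary built on the fly (the decimal form), and the binary form writes each code as a fixed-width bit pattern. The function names are hypothetical.

```python
def build_vocab(smiles_list: list[str]) -> dict[str, int]:
    """Map each distinct character to an integer code; 0 is reserved for padding."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i + 1 for i, ch in enumerate(chars)}


def encode_decimal(smiles: str, vocab: dict[str, int], length: int) -> list[int]:
    """One integer per character, right-padded with zeros to a fixed length."""
    if len(smiles) > length:
        raise ValueError("SMILES string is longer than the requested length")
    codes = [vocab[ch] for ch in smiles]
    return codes + [0] * (length - len(codes))


def encode_binary(smiles: str, vocab: dict[str, int], length: int) -> list[int]:
    """Same codes, each expanded into a fixed-width group of 0/1 inputs."""
    width = max(vocab.values()).bit_length()
    bits: list[int] = []
    for code in encode_decimal(smiles, vocab, length):
        bits.extend(int(b) for b in format(code, f"0{width}b"))
    return bits


# Small illustrative molecules, not taken from the paper's polymer database
examples = ["CCO", "C=C", "CC(C)O"]
vocab = build_vocab(examples)  # {'(': 1, ')': 2, '=': 3, 'C': 4, 'O': 5}

decimal = encode_decimal("CCO", vocab, length=6)
binary = encode_binary("CCO", vocab, length=6)
print(decimal)      # [4, 4, 5, 0, 0, 0]
print(binary[:9])   # [1, 0, 0, 1, 0, 0, 1, 0, 1]
```

In the binary form every network input is 0 or 1, whereas the decimal form feeds integer codes whose magnitudes carry no chemical meaning; this is one plausible reading of why the two forms might train differently.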
Affiliation(s)
- Antonina L Nazarova
  - Department of Chemistry, Loker Hydrocarbon Research Institute, and USC Bridge Institute, University of Southern California, Los Angeles, California 90089, United States
- Liqiu Yang
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Kuang Liu
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Ankit Mishra
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Rajiv K Kalia
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Ken-Ichi Nomura
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Aiichiro Nakano
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Priya Vashishta
  - Collaboratory of Advanced Computing and Simulations, Department of Computer Science, Department of Physics & Astronomy, Department of Chemical Engineering & Materials Science, and Department of Biological Sciences, University of Southern California, Los Angeles, California 90089, United States
- Pankaj Rajak
  - Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, Illinois 60439, United States
9
Boobier S, Hose DRJ, Blacker AJ, Nguyen BN. Machine learning with physicochemical relationships: solubility prediction in organic solvents and water. Nat Commun 2020; 11:5753. [PMID: 33188226 PMCID: PMC7666209 DOI: 10.1038/s41467-020-19594-z] [Citation(s) in RCA: 77] [Impact Index Per Article: 19.3] [Received: 04/23/2020] [Accepted: 10/12/2020] [Indexed: 11/09/2022] [Open access]
Abstract
Solubility prediction remains a critical challenge in drug development, synthetic route and chemical process design, extraction and crystallisation. Here we report a successful approach to solubility prediction in organic solvents and water using a combination of machine learning (ANN, SVM, RF, ExtraTrees, Bagging and GP) and computational chemistry. Rational interpretation of the dissolution process into a numerical problem led to a small set of selected descriptors and subsequent predictions which are independent of the applied machine learning method. These models gave significantly more accurate predictions compared to benchmarked open-access and commercial tools, achieving accuracy close to the expected level of noise in training data (LogS ± 0.7). Finally, they reproduced the physicochemical relationships between solubility and molecular properties in different solvents, which led to rational approaches to improve the accuracy of each model.
Affiliation(s)
- Samuel Boobier
  - Institute of Process Research & Development, School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK
- David R J Hose
  - Chemical Development, Pharmaceutical Technology and Development, Operations, AstraZeneca, Macclesfield, SK10 2NA, UK
- A John Blacker
  - Institute of Process Research & Development, School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK
- Bao N Nguyen
  - Institute of Process Research & Development, School of Chemistry, University of Leeds, Woodhouse Lane, Leeds, LS2 9JT, UK