1
Bao Z, Tom G, Cheng A, Watchorn J, Aspuru-Guzik A, Allen C. Towards the prediction of drug solubility in binary solvent mixtures at various temperatures using machine learning. J Cheminform 2024; 16:117. [PMID: 39468626 PMCID: PMC11520512 DOI: 10.1186/s13321-024-00911-3] [Received: 03/26/2024] [Accepted: 09/28/2024] [Open access]
Abstract
Drug solubility is an important parameter in the drug development process, yet it is often tedious and challenging to measure, especially for expensive drugs or those available only in small quantities. To alleviate these challenges, machine learning (ML) has been applied to predict drug solubility as an alternative approach. However, most existing ML research has focused on predicting aqueous solubility and/or solubility at specific temperatures, which restricts model applicability in pharmaceutical development. To bridge this gap, we compiled a dataset of 27,000 solubility datapoints, including the solubility of small molecules measured in a range of binary solvent mixtures at various temperatures. Next, a panel of ML models was trained on this dataset, with hyperparameters tuned using Bayesian optimization. The resulting top-performing models, both gradient-boosted decision trees (light gradient boosting machine and extreme gradient boosting), achieved mean absolute errors (MAE) of 0.33 for LogS (S in g/100 g) on the holdout set. These models were further validated through a prospective study, in which the solubility of four drug molecules was predicted by the models and then validated with in-house solubility experiments. This prospective study demonstrated that the models accurately predicted the solubility of solutes in specific binary solvent mixtures at different temperatures, especially for drugs whose features closely align with those of the solutes in the dataset (MAE < 0.5 for LogS). To support future research and facilitate advancements in the field, we have made the dataset and code openly available.
Scientific contribution
Our research advances the state of the art in predicting solubility for small molecules by leveraging ML and a uniquely comprehensive dataset. Unlike existing ML studies that predominantly focus on solubility in aqueous solvents at fixed temperatures, our work enables prediction of drug solubility in a variety of binary solvent mixtures over a broad temperature range, providing practical insights into modeling solubility for realistic pharmaceutical applications. These advancements, along with the open-access dataset and code, support key steps in the drug development process, including new molecule discovery, drug analysis and formulation.
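Because LogS is a base-10 logarithm, an MAE in LogS can be read as a typical fold error in S. The worked equation below is our own illustration, not taken from the paper: exponentiating the MAE gives the geometric mean of the fold errors, so the holdout MAE of 0.33 corresponds to a typical over- or under-prediction by a factor of about 2.1, and the prospective bound MAE < 0.5 to a factor below about 3.2. This reading does not depend on the units of S (here g/100 g).

```latex
\begin{aligned}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|\log_{10}\hat{S}_i - \log_{10}S_i\right|
             = \frac{1}{n}\sum_{i=1}^{n}\log_{10} f_i,
  \qquad f_i = \max\!\left(\frac{\hat{S}_i}{S_i},\,\frac{S_i}{\hat{S}_i}\right)\\
10^{\mathrm{MAE}} &= \Bigl(\prod_{i=1}^{n} f_i\Bigr)^{1/n},
  \qquad 10^{0.33} \approx 2.14,\quad 10^{0.5} \approx 3.16
\end{aligned}
```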
Affiliation(s)
- Zeqing Bao
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, M5S 3M2, Canada
- Gary Tom
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
- Austin Cheng
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
- Alán Aspuru-Guzik
- Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, M5S 2E4, Canada
- Vector Institute for Artificial Intelligence, Toronto, ON, M5S 1M1, Canada
- Acceleration Consortium, Toronto, ON, M5S 3H6, Canada
- Lebovic Fellow, Canadian Institute for Advanced Research (CIFAR), Toronto, ON, M5S 1M1, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
- Department of Materials Science and Engineering, University of Toronto, Toronto, ON, M5S 3E4, Canada
- CIFAR Artificial Intelligence Research Chair, Vector Institute, Toronto, ON, M5S 1M1, Canada
- Christine Allen
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, ON, M5S 3M2, Canada
- Acceleration Consortium, Toronto, ON, M5S 3H6, Canada
- Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, ON, M5S 3E5, Canada
2
Ahmad W, Chong KT, Tayara H. GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction. J Chem Inf Model 2024; 64:7833-7843. [PMID: 39387596 DOI: 10.1021/acs.jcim.4c00792]
Abstract
Aqueous solubility is a critical physicochemical property in drug discovery. Solubility is a key issue in pharmaceutical development because it can limit a drug's absorption. Accurate solubility prediction is crucial for pharmacological, environmental, and drug development studies. This research introduces a novel method for solubility prediction that combines gated graph neural networks (GGNNs) and graph attention networks (GATs) with Smiles2Seq encoding. Our methodology converts chemical compounds into graph structures, with nodes representing atoms and edges representing chemical bonds. These graphs are then processed using a specialized graph neural network (GNN) architecture. Incorporating attention mechanisms into the GNN allows it to capture subtle structural dependencies, improving solubility predictions. Furthermore, we use the Smiles2Seq encoding technique to bridge the semantic gap between molecular structures and their textual representations. Smiles2Seq converts chemical notation into numeric sequences, facilitating the efficient transfer of information into our model. We demonstrate the efficacy of our approach through comprehensive experiments on benchmark solubility data sets, showing superior predictive performance compared with traditional methods. Our model outperforms existing solubility prediction models and provides interpretable insights into the molecular features driving solubility behavior. This research represents an important advancement in solubility prediction, offering powerful tools for drug discovery, formulation development, and environmental assessments. The fusion of GGNN and Smiles2Seq encoding establishes a robust framework for accurately forecasting solubility across various chemical compounds, fostering innovation in domains that rely on solubility data.
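The abstract describes Smiles2Seq only as converting chemical notation into numeric sequences. As a hedged illustration of that general idea, and not the authors' implementation (their vocabulary and tokenization rules are not given here), the sketch below splits a SMILES string with a simple regular expression and maps tokens to integer ids, padding to a fixed length. The regex, the `tokenize` and `encode` helpers, and the padding convention are all assumptions.

```python
import re

# Assumed tokenization rules (illustrative only): bracket atoms such as [NH3+],
# the two-letter organic-subset halogens Br and Cl, %nn ring-closure labels,
# and otherwise single characters.
SMILES_TOKEN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens using the assumed rules above."""
    return SMILES_TOKEN.findall(smiles)


def encode(smiles: str, vocab: dict[str, int], max_len: int, pad_id: int = 0) -> list[int]:
    """Map a SMILES string to a fixed-length list of integer ids.

    Unseen tokens are added to `vocab` (ids start at 1; 0 is reserved
    for padding). Sequences longer than `max_len` are truncated.
    """
    ids = []
    for tok in tokenize(smiles):
        if tok not in vocab:
            vocab[tok] = len(vocab) + 1
        ids.append(vocab[tok])
    ids = ids[:max_len]
    return ids + [pad_id] * (max_len - len(ids))


if __name__ == "__main__":
    vocab: dict[str, int] = {}
    print(encode("CCO", vocab, 6))    # ethanol
    print(encode("ClCCl", vocab, 6))  # dichloromethane, reuses the id for C
    print(tokenize("c1ccccc1[NH3+]"))
```

In practice the vocabulary would be fixed on the training set, with a reserved unknown-token id for anything unseen at inference time, rather than grown on the fly as in this sketch.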
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea
3
Zheng T, Mitchell JBO, Dobson S. Revisiting the Application of Machine Learning Approaches in Predicting Aqueous Solubility. ACS Omega 2024; 9:35209-35222. [PMID: 39157153 PMCID: PMC11325511 DOI: 10.1021/acsomega.4c06163] [Received: 07/03/2024] [Revised: 07/19/2024] [Accepted: 07/22/2024]
Abstract
The solubility of chemical substances in water is a critical parameter in pharmaceutical development, environmental chemistry, agrochemistry, and other fields; however, accurately predicting it remains a challenge. This study aims to evaluate and compare the effectiveness of some of the most popular machine learning modeling methods and molecular featurization techniques in predicting aqueous solubility. Although these methods were not implemented in a competitive environment, some of them surpassed previous benchmarks, offering gradual but significant improvements. Our results show that methods based on graph convolution and graph attention mechanisms demonstrated exceptional predictive ability on high-quality data sets, albeit with sensitivity to data noise and errors. In contrast, models leveraging molecular descriptors not only provided better interpretability but also proved more resilient to inherent noise and errors in the data. Our analysis of over 4000 molecular descriptors used in various models identified approximately 800 that make a significant contribution to solubility prediction. These insights offer guidance for future developments in solubility prediction.
Affiliation(s)
- Tianyuan Zheng
- School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, U.K.
- John B. O. Mitchell
- EaStCHEM School of Chemistry, University of St Andrews, St Andrews, Fife KY16 9ST, U.K.
- Simon Dobson
- School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, U.K.
4
Ramani V, Karmakar T. Graph Neural Networks for Predicting Solubility in Diverse Solvents Using MolMerger Incorporating Solute-Solvent Interactions. J Chem Theory Comput 2024. [PMID: 39041858 DOI: 10.1021/acs.jctc.4c00382]
Abstract
The prediction of solubility is a complex and challenging physicochemical problem with tremendous implications for the chemical and pharmaceutical industries. Recent advancements in machine learning have created great scope for reliably predicting the solubility of a large number of molecular systems. However, most of these methods rely on physical properties obtained from experiments and expensive quantum chemical calculations. Here, we developed a method that uses a graph representation of solute-solvent interactions built with "MolMerger," which captures the strongest polar interactions between molecules using Gasteiger charges and creates a single graph representing the combined solute-solvent system. Using these graphs as input, a neural network learns the correlation between the structural properties of a molecule, in the form of node embeddings, and its physicochemical properties as the output. This approach has been used to calculate molecular solubility by predicting the log solubility (LogS) values of various organic molecules and pharmaceuticals in diverse sets of solvents.
Affiliation(s)
- Vansh Ramani
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
- Tarak Karmakar
- Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
5
Ramos MC, White AD. Predicting small molecules solubility on endpoint devices using deep ensemble neural networks. Digit Discov 2024; 3:786-795. [PMID: 38638648 PMCID: PMC11022985 DOI: 10.1039/d3dd00217a] [Received: 11/03/2023] [Accepted: 03/07/2024]
Abstract
Aqueous solubility is a valuable yet challenging property to predict. Computing solubility using first-principles methods requires accounting for the competing effects of entropy and enthalpy, resulting in long computations for relatively poor accuracy. Data-driven approaches, such as deep learning, offer improved accuracy and computational efficiency but typically lack uncertainty quantification. Additionally, ease of use remains a concern for any computational technique, which has sustained the popularity of group contribution methods. In this work, we address these problems with a deep learning model with predictive uncertainty that runs on a static website (without a server). This approach moves computing needs onto the website visitor without requiring installation, removing the need to pay for and maintain servers. Our model achieves satisfactory results in solubility prediction. Furthermore, we demonstrate how to create molecular property prediction models that balance uncertainty and ease of use. The code is available at https://github.com/ur-whitelab/mol.dev, and the model is usable at https://mol.dev.
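The abstract does not say how the ensemble's predictive uncertainty is computed. A common rule for deep ensembles, shown here only as an assumption and not as the mol.dev code, treats each member as predicting a Gaussian with mean μᵢ and variance σᵢ², and reports the mean and variance of the equal-weight mixture: the average member variance plus the spread of the member means. The function name and the example LogS values are hypothetical.

```python
from statistics import fmean, pvariance


def combine_ensemble(means: list[float], variances: list[float]) -> tuple[float, float]:
    """Combine per-member Gaussian predictions into one mean and variance.

    Equal-weight mixture moments (assumed aggregation rule):
        mu  = mean(mu_i)
        var = mean(var_i) + population variance of the mu_i
    """
    if not means or len(means) != len(variances):
        raise ValueError("need one (mean, variance) pair per ensemble member")
    mu = fmean(means)
    # Aleatoric part (average member variance) + epistemic part (disagreement)
    var = fmean(variances) + pvariance(means, mu)
    return mu, var


if __name__ == "__main__":
    # Hypothetical LogS predictions from a three-member ensemble
    mu, var = combine_ensemble([-2.0, -2.2, -1.8], [0.04, 0.04, 0.04])
    print(f"LogS = {mu:.2f} +/- {var ** 0.5:.2f}")
```

Splitting the variance this way keeps the two sources of uncertainty visible: if every member reports a small variance but the members disagree, the combined variance still grows.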
Affiliation(s)
- Mayk Caldas Ramos
- Department of Chemical Engineering, University of Rochester, Rochester, NY 14642, USA
- Andrew D White
- Department of Chemical Engineering, University of Rochester, Rochester, NY 14642, USA
6
Biehn SE, Goncalves LM, Lehmann J, Marty JD, Mueller C, Ramirez SA, Tillier F, Sage CR. BioPrint meets the AI age: development of artificial intelligence-based ADMET models for the drug-discovery platform SAFIRE. Future Med Chem 2024; 16:587-599. [PMID: 38372202 DOI: 10.4155/fmc-2024-0007] [Received: 01/08/2024] [Accepted: 02/08/2024] [Open access]
Abstract
Background: To prioritize compounds with a higher likelihood of success, artificial intelligence models can be used to predict the absorption, distribution, metabolism, excretion and toxicity (ADMET) properties of molecules quickly and efficiently. Methods: Models were trained on proprietary BioPrint database data together with public datasets to predict various ADMET endpoints for the SAFIRE platform. Results: SAFIRE models achieved at least 75% accuracy and a Matthews correlation coefficient of at least 0.4 on validation sets. Training with both proprietary and public data improved model performance and expanded the chemical space on which the models were trained. The platform features scoring functionality to guide user decision-making. Conclusion: High-quality datasets, together with chemical space considerations, yielded ADMET models that perform favorably and are useful in the drug discovery process.
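The SAFIRE results pair accuracy with the Matthews correlation coefficient (MCC), which stays informative when classes are imbalanced. A minimal sketch of both metrics from binary confusion-matrix counts follows. The counts in the example are hypothetical and are chosen to show that a classifier can reach 75% accuracy on an imbalanced set while its MCC is only about 0.29, which is why requiring both thresholds is stricter than accuracy alone.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of correct predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Some row or column of the confusion matrix is empty: return 0 by convention
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical imbalanced validation set: 75 positives, 25 negatives
    counts = dict(tp=65, fn=10, fp=15, tn=10)
    print(f"accuracy = {accuracy(**counts):.2f}")
    print(f"MCC      = {matthews_corrcoef(**counts):.2f}")
```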
Affiliation(s)
- Sarah E Biehn
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Juerg Lehmann
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Jessica D Marty
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Christoph Mueller
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Samuel A Ramirez
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Fabien Tillier
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
- Carleton R Sage
- Eurofins DiscoveryAI, Eurofins Panlabs, Inc., Saint Charles, MO 63304, USA
7
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Received: 09/01/2023] [Accepted: 02/29/2024] [Open access]
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aimed at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public use because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models and quality-assessed datasets obtained herein are publicly available.
Affiliation(s)
- P Llompart
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
8
Kim Y, Jung H, Kumar S, Paton RS, Kim S. Designing solvent systems using self-evolving solubility databases and graph neural networks. Chem Sci 2024; 15:923-939. [PMID: 38239675 PMCID: PMC10793204 DOI: 10.1039/d3sc03468b] [Received: 07/06/2023] [Accepted: 12/04/2023] [Open access]
Abstract
Designing solvent systems is key to achieving facile synthesis and separation of desired products in chemical processes, and many machine learning models have therefore been developed to predict solubilities. However, breakthroughs are needed to address deficiencies in these models' predictive accuracy and generalizability; this can be achieved by expanding and integrating experimental and computational solubility databases. To maximize predictive accuracy, models should not be trained on these two databases separately, nor should the databases simply be combined without reconciling the discrepancies arising from their different magnitudes of error and uncertainty. Here, we introduce self-evolving solubility databases and graph neural networks developed through semi-supervised self-training. Solubilities from quantum-mechanical calculations are used as references during semi-supervised learning, but they are not directly added to the experimental database. The dataset is augmented from 11,637 experimental solubilities to >900,000 data points in the integrated database, while correcting for the discrepancies between experiment and computation. Our model was successfully applied to study solvent selection in organic reactions and separation processes. Its accuracy (mean absolute error of around 0.2 kcal mol⁻¹ on the test set) is quantitatively useful for exploring linear free energy relationships between reaction rates and solvation free energies for 11 organic reactions. Our model also accurately predicted the partition coefficients of lignin-derived monomers and drug-like molecules. While there is room to extend solubility predictions to transition states, radicals, charged species, and organometallic complexes, this approach will be attractive in areas of predictive chemistry where experimental, computational, and other heterogeneous data should be combined.
Affiliation(s)
- Yeonjoon Kim
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Department of Chemistry, Pukyong National University Busan 48513 Republic of Korea
- Hojin Jung
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Sabari Kumar
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Robert S Paton
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Seonah Kim
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
9
Ahmad W, Tayara H, Shim H, Chong KT. SolPredictor: Predicting Solubility with Residual Gated Graph Neural Network. Int J Mol Sci 2024; 25:715. [PMID: 38255790 PMCID: PMC10815788 DOI: 10.3390/ijms25020715] [Received: 12/06/2023] [Revised: 12/26/2023] [Accepted: 01/04/2024] [Open access]
Abstract
Computational methods play a pivotal role in efficient drug discovery, enabling rapid assessment of compound properties before costly and time-consuming laboratory experiments. With advances in technology and the availability of large datasets, machine and deep learning methods have proven efficient at predicting molecular solubility. High-precision in silico solubility prediction has revolutionized drug development by enhancing formulation design, guiding lead optimization, and predicting pharmacokinetic parameters. These benefits yield considerable cost and time savings and a more efficient, shorter drug development process. SolPredictor is a computational model for solubility prediction based on residual graph neural network convolution (RGNN). RGNNs are designed to capture long-range dependencies in graph-structured data: residual connections let information be used across layers, helping the model capture and preserve essential features and patterns scattered throughout the network. The two largest datasets available to date were compiled, and the model takes simplified molecular-input line-entry system (SMILES) representations as input. In ten-fold cross-validation, SolPredictor achieved a squared Pearson correlation coefficient (R²) of 0.79 ± 0.02 and a root mean square error (RMSE) of 1.03 ± 0.04. The model was also evaluated on five independent datasets. Error analysis, hyperparameter optimization analysis, and model explainability were used to identify the molecular features most valuable for prediction.
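SolPredictor's figures are an RMSE and a squared Pearson correlation (R²), reported as mean ± spread over ten folds; we assume the spread is the standard deviation across folds, which the abstract does not state. The sketch below computes both metrics for one fold in plain Python and summarizes per-fold scores; the helper names and example values are illustrative. Squared Pearson r measures linear association only, so it can equal 1 for predictions that are systematically offset, whereas RMSE penalizes that offset; it also differs in general from the coefficient of determination 1 − SS_res/SS_tot.

```python
import math
from statistics import fmean, stdev


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean square error."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def pearson_r2(y_true: list[float], y_pred: list[float]) -> float:
    """Square of the Pearson correlation coefficient between y_true and y_pred."""
    mt, mp = fmean(y_true), fmean(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov * cov / (var_t * var_p)


def summarize_folds(scores: list[float]) -> str:
    """Report a cross-validation metric as mean +/- sample standard deviation."""
    return f"{fmean(scores):.2f} +/- {stdev(scores):.2f}"


if __name__ == "__main__":
    # Hypothetical LogS values for one fold
    y_true = [-1.0, -2.0, -3.0, -4.0]
    y_pred = [-1.5, -2.0, -2.5, -4.0]
    print(f"RMSE = {rmse(y_true, y_pred):.3f}, R2 = {pearson_r2(y_true, y_pred):.3f}")
```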
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- HyunJoo Shim
- School of Pharmacy, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
10
Ghahremanpour MM, Saar A, Tirado-Rives J, Jorgensen WL. Ensemble Geometric Deep Learning of Aqueous Solubility. J Chem Inf Model 2023; 63:7338-7349. [PMID: 37990484 DOI: 10.1021/acs.jcim.3c01536]
Abstract
Geometric deep learning is one of the main workhorses for harnessing big data to predict molecular properties such as aqueous solubility, which is key to improving the pharmacokinetics of drug candidates. Two ensembles of graph neural network architectures were built, one based on spectral convolution and the other on spatial convolution. The pretrained models, denoted SolNet-GCN and SolNet-GAT, respectively, significantly outperformed existing neural networks benchmarked on a validation set of 207 molecules. SolNet-GCN showed the best performance on both the training and validation sets, with RMSE values of 0.53 and 0.72 log molar units and Pearson r² values of 0.95 and 0.75, respectively. Further, the ranking power of the SolNet models agreed well with a QM-based thermodynamic cycle approach at the PBE-vdW level of theory on a series of benzophenylurea derivatives and a series of benzodiazepine derivatives. Nevertheless, testing the resulting models on a set of inhibitors of macrophage migration inhibitory factor (MIF) showed that adding atomic attributes that distinguish atoms with a higher tendency to form intermolecular hydrogen bonds in the crystalline state, and that identify planar or nonplanar substructures, can benefit the prediction of aqueous solubility.
Affiliation(s)
- Anastasia Saar
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
- Julian Tirado-Rives
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
- William L Jorgensen
- Department of Chemistry, Yale University, New Haven, Connecticut 06520-8107, United States
11
Yin T, Panapitiya G, Coda ED, Saldanha EG. Evaluating uncertainty-based active learning for accelerating the generalization of molecular property prediction. J Cheminform 2023; 15:105. [PMID: 37941055 PMCID: PMC10633997 DOI: 10.1186/s13321-023-00753-5] [Received: 11/01/2022] [Accepted: 08/25/2023] [Open access]
Abstract
Deep learning models have proven to be powerful tools for predicting molecular properties in applications including drug design and the development of energy storage materials. However, to learn accurate and robust structure-property mappings, these models require large amounts of data, which can be challenging to collect given the time- and resource-intensive nature of experimental materials characterization. Additionally, such models fail to generalize to new types of molecular structures that were not included in their training data. Accelerating materials development through uncertainty-guided experimental design promises to significantly reduce data requirements and enable faster generalization to new types of materials. To evaluate the potential of such approaches for electrolyte design, we comprehensively evaluate existing uncertainty quantification methods on the prediction of two relevant molecular properties: aqueous solubility and redox potential. We develop novel evaluation methods to probe the utility of the uncertainty estimates for both in-domain and out-of-domain data sets. Finally, we use selected uncertainty estimation methods for active learning to evaluate their capacity to support experimental design.
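Yin et al. compare several uncertainty quantification methods, and the abstract does not say which ones were selected for active learning. As one common choice, shown here as an assumption, the sketch scores each unlabeled candidate by the spread (population standard deviation) of an ensemble's predictions and proposes the k most uncertain candidates for the next round of measurements. The function name, candidate identifiers and prediction values are hypothetical.

```python
from statistics import pstdev


def select_most_uncertain(candidate_ids: list[str],
                          member_predictions: list[list[float]],
                          k: int) -> list[str]:
    """Return the k candidates whose ensemble predictions disagree most.

    member_predictions[i] holds every ensemble member's prediction for
    candidate_ids[i]. Ties keep the input order (sorted() is stable).
    """
    if len(candidate_ids) != len(member_predictions):
        raise ValueError("one prediction list per candidate is required")
    spread = [pstdev(preds) for preds in member_predictions]
    ranked = sorted(range(len(candidate_ids)), key=lambda i: spread[i], reverse=True)
    return [candidate_ids[i] for i in ranked[:k]]


if __name__ == "__main__":
    pool = ["mol-A", "mol-B", "mol-C"]
    preds = [[-2.0, -2.1, -1.9],   # members roughly agree
             [-3.0, -1.0, -2.0],   # members disagree strongly
             [-4.0, -4.0, -4.0]]   # members agree exactly
    print(select_most_uncertain(pool, preds, k=2))
```

In a full loop, the selected molecules would be measured, added to the training set, and the ensemble retrained before the next selection.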
Affiliation(s)
- Tianzhixi Yin
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, USA
- Gihan Panapitiya
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, USA
- Elizabeth D Coda
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, USA
- The University of California, San Diego, La Jolla, CA, USA
- Emily G Saldanha
- Pacific Northwest National Laboratory, 902 Battelle Blvd, Richland, WA, USA
12
Gouveia TIA, Alves A, Santos MSF. Theoretical rejection of fifty-four antineoplastic drugs by different nanofiltration membranes. Environ Sci Pollut Res Int 2023; 30:106099-106111. [PMID: 37723401 PMCID: PMC10579118 DOI: 10.1007/s11356-023-29830-w] [Received: 06/07/2023] [Accepted: 09/07/2023]
Abstract
The rise of nanofiltration technologies holds great promise for more effective and affordable techniques to remove undesirable pollutants from wastewater. Despite nanofiltration's potential for removing antineoplastic drugs from liquid matrices, information on this topic is limited, so it is important to estimate rejection rates for a larger number of compounds, particularly emerging ones, to anticipate nanofiltration performance. To obtain preliminary estimates of the rejection rates of antineoplastic drugs by nanofiltration, 54 antineoplastic drugs were studied with 5 nanofiltration membranes (Desal 5DK, Desal HL, Trisep TS-80, NF270, and NF50) using a quantitative structure-activity relationship (QSAR) model. While this methodology provides useful and reliable predictions of the rejection of compounds by nanofiltration, particularly for hydrophilic and neutral compounds, QSAR results should always be corroborated by experimental assays, as the predictions were confirmed to have limitations (especially for hydrophobic and charged compounds). Of the 54 antineoplastic drugs studied, 29 were predicted to have rejections of up to 100%, regardless of the membrane used. Nonetheless, for 2 antineoplastic drugs, fluorouracil and thiotepa, negligible removals were obtained (<21%). This study's findings may contribute to (i) the selection of the most appropriate nanofiltration membranes for removing antineoplastic drugs from wastewater and (ii) the design of effective treatment approaches for their removal.
Affiliation(s)
- Teresa I A Gouveia
- LEPABE - Laboratory for Process, Environmental, Biotechnology and Energy Engineering, Faculty of Engineering, University of Porto, R. Dr. Roberto Frias, 4200-465, Porto, Portugal
- ALiCE - Associate Laboratory in Chemical Engineering, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
- Arminda Alves
- LEPABE - Laboratory for Process, Environmental, Biotechnology and Energy Engineering, Faculty of Engineering, University of Porto, R. Dr. Roberto Frias, 4200-465, Porto, Portugal
- ALiCE - Associate Laboratory in Chemical Engineering, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
- Mónica S F Santos
- LEPABE - Laboratory for Process, Environmental, Biotechnology and Energy Engineering, Faculty of Engineering, University of Porto, R. Dr. Roberto Frias, 4200-465, Porto, Portugal
- ALiCE - Associate Laboratory in Chemical Engineering, Faculty of Engineering, University of Porto, Rua Dr. Roberto Frias, 4200-465, Porto, Portugal
- EPIUnit - Institute of Public Health, University of Porto, Rua das Taipas, no. 135, 4050-600, Porto, Portugal
- ITR - Laboratory for Integrative and Translational Research in Population Health, University of Porto, Rua das Taipas, no. 135, 4050-600, Porto, Portugal
13
Zhu X, Polyakov VR, Bajjuri K, Hu H, Maderna A, Tovee CA, Ward SC. Building Machine Learning Small Molecule Melting Points and Solubility Models Using CCDC Melting Points Dataset. J Chem Inf Model 2023; 63:2948-2959. [PMID: 37125691 DOI: 10.1021/acs.jcim.3c00308]
Abstract
Predicting the solubility of small molecules is very difficult because reliable and consistent experimental solubility data are lacking. It is well known that for a molecule in a crystal lattice to dissolve, it must first dissociate from the lattice and then be solvated. The melting point of a compound reflects its lattice energy, and the octanol-water partition coefficient (log P) is a measure of the compound's solvation efficiency. The CCDC melting point dataset of almost one hundred thousand compounds was used to create widely applicable machine learning models of small molecule melting points. Using the general solubility equation, the aqueous thermodynamic solubilities of the same compounds can then be predicted. The global model can easily be localized by adding melting point measurements for a chemical series of interest.
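The abstract invokes the general solubility equation (GSE) without stating it. Its widely used form, due to Yalkowsky and co-workers, estimates aqueous solubility from the melting point and log P; we assume, but have not confirmed, that this is the version the authors applied. The worked numbers are hypothetical.

```latex
\log S_w = 0.5 - 0.01\,(T_m - 25) - \log K_{ow}

% S_w: molar aqueous solubility (mol/L); T_m: melting point in degrees C;
% log K_ow = log P. For solutes that are liquid at 25 C (T_m < 25 C),
% the melting-point term is taken as zero.
% Hypothetical example: T_m = 125 C, log P = 2
\log S_w = 0.5 - 0.01\,(125 - 25) - 2 = 0.5 - 1.0 - 2 = -2.5
```

Under this form, a melting-point model plus a log P estimate is enough to produce a solubility estimate, and a 100 °C error in the predicted T_m shifts the predicted log S by one unit.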
Affiliation(s)
- Xiangwei Zhu
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Valery R Polyakov
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Krishna Bajjuri
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Huiyong Hu
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Andreas Maderna
- Sutro Biopharma, 111 Oyster Point Blvd, South San Francisco, California 94080, United States
- Clare A Tovee
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.
- Suzanna C Ward
- Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.
14
Tuttle MR, Brackman EM, Sorourifar F, Paulson J, Zhang S. Predicting the Solubility of Organic Energy Storage Materials Based on Functional Group Identity and Substitution Pattern. J Phys Chem Lett 2023; 14:1318-1325. [PMID: 36724735 DOI: 10.1021/acs.jpclett.3c00182] [Indexed: 06/18/2023]
Abstract
Organic electrode materials (OEMs) provide sustainable alternatives to conventional electrode materials based on transition metals. However, the application of OEMs in lithium-ion and redox flow batteries requires low or high solubility, respectively. Currently, the identification of new OEM candidates relies on chemical intuition and trial-and-error experimental testing, which is costly and time intensive. Herein, we develop a simple empirical model that predicts the solubility of anthraquinones based on functional group identity and substitution pattern. Within this statistical scaffold, a training set of 18 anthraquinone derivatives allows us to predict the solubility of 808 quinones. Internal and external validations show that our model can predict the solubility of anthraquinones in battery electrolytes within log S ± 0.7, a much higher accuracy than that of existing solubility models. As a demonstration of the utility of our approach, we identified several new anthraquinones with low solubilities and successfully demonstrated their utility experimentally in Li-organic cells.
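The abstract does not give the model's equations. One common way to build an additive model from functional-group identity and substitution position, assumed here and not confirmed as the authors' formulation, is a Free-Wilson-style fit: each (group, position) pair becomes a 0/1 indicator column and least squares estimates its contribution to log S. The indicator matrix and coefficients below are synthetic placeholders.

```python
import numpy as np


def fit_additive_model(X: np.ndarray, log_s: np.ndarray) -> np.ndarray:
    """Least-squares fit of log S = b0 + sum_j b_j * x_j.

    X holds 0/1 indicators, one column per (functional group, position) pair.
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    design = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, log_s, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    # Parent-scaffold value plus the sum of the present substituent contributions
    return coef[0] + X @ coef[1:]


# Synthetic training set: 6 derivatives, 3 hypothetical substituent/position indicators
X = np.array([
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
], dtype=float)
true_coef = np.array([-2.0, 0.8, -0.5, 1.2])
log_s = true_coef[0] + X @ true_coef[1:]

coef = fit_additive_model(X, log_s)
print(np.round(coef, 3))                            # recovers true_coef on noise-free data
print(predict(coef, np.array([[1.0, 0.0, 1.0]])))   # an unseen substitution pattern
```

Because each contribution is shared across compounds, a small training set can cover many combinations that were never measured, which is the kind of extrapolation from 18 to 808 compounds the abstract describes.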
Affiliation(s)
- Madison R Tuttle
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
- Emma M Brackman
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
- Farshud Sorourifar
- Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
- Joel Paulson
- Department of Chemical and Biomolecular Engineering, The Ohio State University, 151 W. Woodruff Avenue, Columbus, Ohio 43210, United States
- Shiyu Zhang
- Department of Chemistry & Biochemistry, The Ohio State University, 100 West 18th Avenue, Columbus, Ohio 43210, United States
15
Ahmad W, Tayara H, Chong KT. Attention-Based Graph Neural Network for Molecular Solubility Prediction. ACS OMEGA 2023; 8:3236-3244. [PMID: 36713733 PMCID: PMC9878542 DOI: 10.1021/acsomega.2c06702] [Received: 10/18/2022] [Accepted: 12/23/2022] [Indexed: 06/18/2023]
Abstract
Drug discovery (DD) research is aimed at the discovery of new medications. Solubility is an important physicochemical property in drug development. Active pharmaceutical ingredients (APIs) are essential substances for high drug efficacy. During DD research, aqueous solubility (AS) is a key physicochemical attribute required for API characterization. High-precision in silico solubility prediction reduces the experimental cost and time of drug development. Several artificial intelligence tools based on machine learning and deep learning techniques have been employed for solubility prediction. This study aims to create different deep learning models that can predict the solubility of a wide range of molecules using the largest currently available solubility data set. Simplified molecular-input line-entry system (SMILES) strings were used as the molecular representation, and models were developed using simple graph convolution, graph isomorphism network, graph attention network, and AttentiveFP network architectures. Based on the performance of the models, the AttentiveFP-based network model was finally selected. The model was trained and tested on 9943 compounds. On a set of 62 anticancer compounds, the model outperformed the comparison methods, with a Pearson correlation R² of 0.52 and a root-mean-square error of 0.61. AS prediction could be improved further by refining the graph algorithm or adding more molecular properties.
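Many of the abstracts in this list report RMSE, MAE and R². The squared Pearson correlation and the coefficient of determination (1 - SS_res/SS_tot) coincide for an ordinary least-squares fit scored on its own training data, but on a test set they can differ sharply, so it helps to know which one a paper quotes. A small NumPy sketch computing both; the function name `regression_metrics` and the data are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = float(np.sqrt(np.mean(residuals ** 2)))
    mae = float(np.mean(np.abs(residuals)))
    # Coefficient of determination: 1 - SS_res / SS_tot
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    # Squared Pearson correlation between observations and predictions
    pearson_r2 = float(np.corrcoef(y_true, y_pred)[0, 1] ** 2)
    return {"rmse": rmse, "mae": mae, "r2": r2, "pearson_r2": pearson_r2}


# Predictions perfectly correlated with the data but shifted by +1 log unit
y_obs = [-3.0, -2.0, -1.0, 0.0]
y_hat = [-2.0, -1.0, 0.0, 1.0]
print(regression_metrics(y_obs, y_hat))
```

Here the squared Pearson correlation is 1 while the coefficient of determination is only 0.2, because the correlation ignores the constant offset that RMSE and MAE penalize.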
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, South Korea
16
Cysewski P, Jeliński T, Przybyłek M, Nowak W, Olczak M. Solubility Characteristics of Acetaminophen and Phenacetin in Binary Mixtures of Aqueous Organic Solvents: Experimental and Deep Machine Learning Screening of Green Dissolution Media. Pharmaceutics 2022; 14:pharmaceutics14122828. [PMID: 36559321 PMCID: PMC9781932 DOI: 10.3390/pharmaceutics14122828] [Received: 11/26/2022] [Revised: 12/10/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022]
Abstract
The solubility of active pharmaceutical ingredients is a mandatory physicochemical characteristic in pharmaceutical practice. However, the number of potential solvents and their mixtures prevents direct measurements of all possible combinations for finding environmentally friendly, operational and cost-effective solubilizers. That is why support from theoretical screening seems to be valuable. Here, a collection of acetaminophen and phenacetin solubility data in neat and binary solvent mixtures was used for the development of a nonlinear deep machine learning model using new intuitive molecular descriptors derived from COSMO-RS computations. The literature dataset was augmented with results of new measurements in aqueous binary mixtures of 4-formylmorpholine, DMSO and DMF. The solubility values back-computed with the developed ensemble of neural networks are in perfect agreement with the experimental data, which enables the extensive screening of many combinations of solvents not studied experimentally within the applicability domain of the trained model. The final predictions were presented not only in the form of the set of optimal hyperparameters but also in a more intuitive way by the set of parameters of the Jouyban-Acree equation often used in the co-solvency domain. This new and effective approach is easily extendible to other systems, enabling the fast and reliable selection of candidates for new solvents and directing the experimental solubility screening of active pharmaceutical ingredients.
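The Jouyban-Acree equation referred to above is usually written as ln x_m,T = w1 ln x1,T + w2 ln x2,T + (w1 w2 / T) Σ_i J_i (w1 - w2)^i, where w1 and w2 are the fractions (typically mass fractions) of the two solvents in the solute-free mixture, x1,T and x2,T are the solubilities in the neat solvents at temperature T, and the J_i are fitted constants. A minimal evaluation sketch, assuming that form; the function name and the J values in the usage lines are hypothetical placeholders, not parameters reported by the authors.

```python
import math
from typing import Sequence


def jouyban_acree_ln_x(w1: float, ln_x1: float, ln_x2: float,
                       temperature_k: float, j_params: Sequence[float]) -> float:
    """ln solubility in a binary solvent mixture from the Jouyban-Acree model.

    w1: fraction of solvent 1 in the solute-free mixture (w2 = 1 - w1).
    ln_x1, ln_x2: ln solubility in the neat solvents at temperature_k.
    j_params: fitted constants J_0, J_1, J_2, ...
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    w2 = 1.0 - w1
    # Polynomial interaction term in (w1 - w2); it vanishes for either neat solvent
    interaction = sum(j * (w1 - w2) ** i for i, j in enumerate(j_params))
    return w1 * ln_x1 + w2 * ln_x2 + (w1 * w2 / temperature_k) * interaction


# Hypothetical neat-solvent mole-fraction solubilities and J constants at 298.15 K
ln_x_cosolvent = math.log(0.02)
ln_x_water = math.log(1e-4)
j_hypothetical = [600.0, -150.0, 80.0]
for w1 in (0.0, 0.25, 0.5, 0.75, 1.0):
    x_mix = math.exp(jouyban_acree_ln_x(w1, ln_x_cosolvent, ln_x_water, 298.15, j_hypothetical))
    print(f"w1={w1:.2f}  x={x_mix:.3e}")
```

At w1 = 0 and w1 = 1 the expression reduces to the neat-solvent values, so the fitted J_i only shape the curve between the two endpoints; this is why a small parameter set can summarize a model's predictions across a whole composition range.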
17
Wu J, Wang J, Wu Z, Zhang S, Deng Y, Kang Y, Cao D, Hsieh CY, Hou T. ALipSol: An Attention-Driven Mixture-of-Experts Model for Lipophilicity and Solubility Prediction. J Chem Inf Model 2022; 62:5975-5987. [PMID: 36417544 DOI: 10.1021/acs.jcim.2c01290] [Indexed: 11/24/2022]
Abstract
Lipophilicity (logD) and aqueous solubility (logSw) play a central role in drug development. The accurate prediction of these properties remains to be solved due to data scarcity. Current methodologies neglect the intrinsic relationships between physicochemical properties and usually ignore the ionization effects. Here, we propose an attention-driven mixture-of-experts (MoE) model named ALipSol, which explicitly reproduces the hierarchy of task relationships. We adopt the principle of divide-and-conquer by breaking down the complex end point (logD or logSw) into simpler ones (acidic pKa, basic pKa, and logP) and allocating a specific expert network for each subproblem. Subsequently, we implement transfer learning to extract knowledge from related tasks, thus alleviating the dilemma of limited data. Additionally, we substitute the gating network with an attention mechanism to better capture the dynamic task relationships on a per-example basis. We adopt local fine-tuning and consensus prediction to further boost model performance. Extensive evaluation experiments verify the success of the ALipSol model, which achieves RMSE improvement of 8.04%, 2.49%, 8.57%, 12.8%, and 8.60% on the Lipop, ESOL, AqSolDB, external logD, and external logS data sets, respectively, compared with Attentive FP and the state-of-the-art in silico tools. In particular, our model yields more significant advantages (Welch's t-test) for small training data, implying its high robustness and generalizability. The interpretability analysis proves that the atom contributions learned by ALipSol are more reasonable compared with the vanilla Attentive FP, and the substitution effects in benzene derivatives agreed well with empirical constants, revealing the potential of our model to extract useful patterns from data and provide guidance for lead optimization.
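To illustrate what replacing a gating network with attention can look like, the NumPy sketch below weights the outputs of several expert sub-models by a per-example softmax over scaled dot products between a molecule embedding and one key per expert. This is a generic, assumed construction on random placeholder tensors; it does not reproduce ALipSol's actual layers, transfer learning, or fine-tuning, and all names are hypothetical.

```python
from typing import Tuple

import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_mixture(query: np.ndarray, expert_keys: np.ndarray,
                      expert_outputs: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Combine expert predictions with per-example scaled dot-product attention.

    query:          (batch, d)     embedding of each molecule
    expert_keys:    (batch, k, d)  one key vector per expert and molecule
    expert_outputs: (batch, k)     scalar prediction from each expert
    Returns combined predictions (batch,) and attention weights (batch, k).
    """
    d = query.shape[-1]
    scores = np.einsum("bd,bkd->bk", query, expert_keys) / np.sqrt(d)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    combined = np.sum(weights * expert_outputs, axis=-1)
    return combined, weights


# Random placeholders: 4 molecules, 3 hypothetical experts
# (for example acidic pKa, basic pKa and logP sub-models), 8-dim embeddings
rng = np.random.default_rng(0)
batch, n_experts, dim = 4, 3, 8
query = rng.normal(size=(batch, dim))
keys = rng.normal(size=(batch, n_experts, dim))
outputs = rng.normal(size=(batch, n_experts))
pred, w = attention_mixture(query, keys, outputs)
print(pred.shape, w.sum(axis=1))
```

Unlike a fixed gate, the weights here are recomputed for every molecule, which is the "per-example" behavior the abstract attributes to its attention mechanism.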
Affiliation(s)
- Jialu Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
- Junmei Wang
- Department of Pharmaceutical Sciences and Computational Chemical Genomics Screening Center, School of Pharmacy, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
- Zhenxing Wu
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
- Shengyu Zhang
- Tencent Quantum Laboratory, Tencent, Shenzhen, 518057 Guangdong, P. R. China
- Yafeng Deng
- CarbonSilicon AI Technology Co., Ltd, Hangzhou, 310018 Zhejiang, P. R. China
- Yu Kang
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, 410004 Hunan, P. R. China
- Chang-Yu Hsieh
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
- Tingjun Hou
- Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058 Zhejiang, P. R. China
18
Gao P, Andersen A, Sepulveda J, Panapitiya GU, Hollas A, Saldanha EG, Murugesan V, Wang W. SOMAS: a platform for data-driven material discovery in redox flow battery development. Sci Data 2022; 9:740. [PMID: 36456604 PMCID: PMC9715657 DOI: 10.1038/s41597-022-01814-4] [Received: 07/16/2021] [Accepted: 10/31/2022] [Indexed: 12/05/2022]
Abstract
Aqueous organic redox flow batteries offer an environmentally benign, tunable, and safe route to large-scale energy storage. The energy density is one of the key performance parameters of organic redox flow batteries, which critically depends on the solubility of the redox-active molecule in water. Prediction of aqueous solubility remains a challenge in chemistry. Recently, machine learning models have been developed for molecular properties prediction in chemistry and material science. The fidelity of a machine learning model critically depends on the diversity, accuracy, and abundance of the training datasets. We build a comprehensive open access organic molecular database, "Solubility of Organic Molecules in Aqueous Solution" (SOMAS), containing about 12,000 molecules and covering wider chemical and solubility regimes suitable for aqueous organic redox flow battery development efforts. In addition to experimental solubility, we also provide eight distinctive quantum descriptors, including optimized geometry, derived from high-throughput density functional theory calculations, along with six molecular descriptors for each molecule. SOMAS builds a critical foundation for future efforts in artificial intelligence-based solubility prediction models.
Affiliation(s)
- Peiyuan Gao
- Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA.
- Amity Andersen
- Environmental Molecular Sciences Laboratory, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Jonathan Sepulveda
- Energy and Environment Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Gihan U Panapitiya
- National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Aaron Hollas
- Energy and Environment Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Emily G Saldanha
- National Security Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA
- Vijayakumar Murugesan
- Physical and Computational Sciences Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA.
- Wei Wang
- Energy and Environment Directorate, Pacific Northwest National Laboratory, Richland, WA, 99354, USA.
19
Li M, Chen H, Zhang H, Zeng M, Chen B, Guan L. Prediction of the Aqueous Solubility of Compounds Based on Light Gradient Boosting Machines with Molecular Fingerprints and the Cuckoo Search Algorithm. ACS OMEGA 2022; 7:42027-42035. [PMID: 36440111 PMCID: PMC9685740 DOI: 10.1021/acsomega.2c03885] [Received: 06/21/2022] [Accepted: 10/18/2022] [Indexed: 06/16/2023]
Abstract
Aqueous solubility is one of the most important physicochemical properties in drug discovery. At present, predicting the aqueous solubility of compounds remains a challenging problem. Machine learning has shown great potential in solubility prediction. The performance of most machine learning models depends heavily on their hyperparameter settings and can be improved by choosing those settings more effectively. In this paper, we used MACCS fingerprints to represent the structural features and optimized the hyperparameters of the light gradient boosting machine (LightGBM) with the cuckoo search (CS) algorithm. Based on this representation and optimization, the CS-LightGBM model was established to predict the aqueous solubility of 2446 organic compounds, and its predictions were compared with those of six other machine learning models (RF, GBDT, XGBoost, LightGBM, SVR, and BO-LightGBM). The comparison showed that the CS-LightGBM model outperformed the other six models, with RMSE, MAE, and R² of 0.7785, 0.5117, and 0.8575, respectively. In addition, this model has good scalability and can be used to solve solubility prediction problems in other fields, such as solvent selection and drug screening.
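Cuckoo search (Yang and Deb) is a population-based metaheuristic: candidate solutions ("nests") take heavy-tailed Lévy-flight steps, a fraction pa of solution components is periodically "discovered" and replaced by a random walk, and a candidate is kept only if it improves on its nest. The sketch below follows that textbook scheme and minimizes a hypothetical toy function standing in for a cross-validated LightGBM error; the defaults (25 nests, pa = 0.25, step scale 0.01, Lévy exponent 1.5) are common textbook choices, not settings reported by the authors.

```python
import math

import numpy as np


def levy_steps(rng, beta, shape):
    """Heavy-tailed Levy-flight step lengths via Mantegna's algorithm."""
    sigma_u = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
               / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma_u, shape)
    v = rng.normal(0.0, 1.0, shape)
    return u / np.abs(v) ** (1 / beta)


def cuckoo_search(objective, lower, upper, n_nests=25, pa=0.25,
                  alpha=0.01, beta=1.5, n_iter=200, seed=0):
    """Minimize `objective` inside the box [lower, upper] with cuckoo search."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    nests = rng.uniform(lower, upper, size=(n_nests, lower.size))
    fitness = np.array([objective(x) for x in nests])

    def keep_improvements(candidates):
        # Greedy selection: a nest is replaced only if its candidate scores better
        cand_fit = np.array([objective(x) for x in candidates])
        better = cand_fit < fitness
        nests[better] = candidates[better]
        fitness[better] = cand_fit[better]

    for _ in range(n_iter):
        best = nests[fitness.argmin()].copy()
        # 1) Levy flights whose size scales with each nest's distance from the best
        step = alpha * levy_steps(rng, beta, nests.shape) * (nests - best)
        keep_improvements(np.clip(nests + step * rng.normal(size=nests.shape), lower, upper))
        # 2) A fraction pa of components is "discovered" and moved by a random walk
        discovered = rng.random(nests.shape) < pa
        diff = nests[rng.permutation(n_nests)] - nests[rng.permutation(n_nests)]
        keep_improvements(np.clip(nests + rng.random() * diff * discovered, lower, upper))

    i_best = int(fitness.argmin())
    return nests[i_best].copy(), float(fitness[i_best])


def toy_cv_error(params):
    # Hypothetical stand-in for cross-validated RMSE over
    # (log10 learning rate, number of leaves); minimum at (-1.5, 40)
    log_lr, leaves = params
    return (log_lr + 1.5) ** 2 + ((leaves - 40.0) / 20.0) ** 2


best_x, best_f = cuckoo_search(toy_cv_error, lower=[-3.0, 8.0], upper=[0.0, 128.0])
print(best_x, best_f)
```

In a real tuning loop the objective would train LightGBM with each candidate setting (rounding integer hyperparameters such as the number of leaves) and return a validation error; that part is deliberately left out of this sketch.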
20
Tevosyan A, Khondkaryan L, Khachatrian H, Tadevosyan G, Apresyan L, Babayan N, Stopper H, Navoyan Z. Improving VAE based molecular representations for compound property prediction. J Cheminform 2022; 14:69. [PMID: 36242073 PMCID: PMC9569108 DOI: 10.1186/s13321-022-00648-x] [Received: 04/15/2022] [Accepted: 10/01/2022] [Indexed: 11/25/2022]
Abstract
Collecting labeled data for many important tasks in chemoinformatics is time consuming and requires expensive experiments. In recent years, machine learning has been used to learn rich representations of molecules using large scale unlabeled molecular datasets and transfer the knowledge to solve the more challenging tasks with limited datasets. Variational autoencoders are one of the tools that have been proposed to perform the transfer for both chemical property prediction and molecular generation tasks. In this work we propose a simple method to improve chemical property prediction performance of machine learning models by incorporating additional information on correlated molecular descriptors in the representations learned by variational autoencoders. We verify the method on three property prediction tasks. We explore the impact of the number of incorporated descriptors, correlation between the descriptors and the target properties, sizes of the datasets etc. Finally, we show the relation between the performance of property prediction models and the distance between property prediction dataset and the larger unlabeled dataset in the representation space.
Affiliation(s)
- Ani Tevosyan
- YerevaNN, Charents str. 20, 0025, Yerevan, Armenia
- Lusine Khondkaryan
- Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
- Hrant Khachatrian
- YerevaNN, Charents str. 20, 0025, Yerevan, Armenia; Yerevan State University, Alex Manoogian str. 1, 0025, Yerevan, Armenia
- Gohar Tadevosyan
- Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
- Lilit Apresyan
- Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia
- Nelly Babayan
- Laboratory of Cell Technologies, Institute of Molecular Biology, National Academy of Sciences of RA, Hasratyan str. 7, 0014, Yerevan, Armenia; Toxometris.ai, Sarmen str. 7, 0009, Yerevan, Armenia
- Helga Stopper
- Department of Toxicology, Institute of Pharmacology and Toxicology, University of Würzburg, Versbacher str. 9, 97078, Würzburg, Germany
- Zaven Navoyan
- Toxometris.ai, Sarmen str. 7, 0009, Yerevan, Armenia
21
Vijayan S, Loganathan C, Sakayanathan P, Thayumanavan P. Synthesis and Characterization of Plumbagin S-Allyl Cysteine Ester: Determination of Anticancer Activity In Silico and In Vitro. Appl Biochem Biotechnol 2022; 194:5827-5847. [PMID: 35819687 DOI: 10.1007/s12010-022-04079-0] [Received: 03/18/2022] [Accepted: 07/05/2022] [Indexed: 11/29/2022]
Abstract
In recent years, derivatives of natural compounds have been synthesized to improve their bioavailability, pharmacological, and pharmacokinetic properties. The naphthoquinone plumbagin (PLU) is well known for its anticancer activity. However, the clinical use of PLU is hindered by its toxicity. Previous reports have shown that modifying PLU at the 5'-hydroxyl group reduced its toxicity towards a normal cell line. Accordingly, in the present study, the 5'-hydroxyl group of PLU was esterified with S-allyl cysteine (SAC) to obtain the PLU-SAC ester. The drug-likeness of PLU-SAC was assessed by in silico ADME analysis. PLU-SAC was characterized by UV-visible spectroscopy, mass spectrometry, and nuclear magnetic resonance (NMR) spectroscopy. Molecular docking and dynamics simulation analyses revealed the interaction of PLU-SAC with proteins of interest in cancer therapy, such as human estrogen receptor α, the tumor protein p53 negative regulator mouse double minute 2, and cyclin-dependent kinase 2. MMGBSA calculations showed favorable binding energies, indicating stable binding of PLU-SAC to these proteins. PLU-SAC triggered apoptosis in a breast cancer cell line (MCF-7) by inducing oxidative stress, disturbing mitochondrial function, arresting cells at the G1 phase of the cell cycle, and initiating DNA fragmentation. However, PLU-SAC did not show toxicity towards the normal Vero cell line. In summary, PLU-SAC was synthesized and structurally characterized, and its anticancer activity was determined by in silico and in vitro analyses.
Affiliation(s)
- Sudha Vijayan
- Department of Biochemistry, Periyar University, Salem, Tamil Nadu, 636011, India
- Chitra Loganathan
- Department of Biochemistry, Periyar University, Salem, Tamil Nadu, 636011, India; Research and Development Center, Bioinnov Solutions LLP, Salem, Tamil Nadu, 636002, India
22
Deng C, Liang L, Xing G, Hua Y, Lu T, Zhang Y, Chen Y, Liu H. Multi-channel GCN ensembled machine learning model for molecular aqueous solubility prediction on a clean dataset. Mol Divers 2022:10.1007/s11030-022-10465-x. [PMID: 35739374 DOI: 10.1007/s11030-022-10465-x] [Received: 04/05/2022] [Accepted: 05/19/2022] [Indexed: 10/17/2022]
Abstract
This study constructed a new aqueous solubility dataset and a solubility regression model that ensembles a GCN with machine learning models. Aqueous solubility is a key physicochemical property of small molecules in drug discovery. In the past few decades, there have been many studies on solubility prediction. However, many of these studies report a high root mean squared error (RMSE). Moreover, their datasets contain salt compounds and solubility data obtained under different experimental conditions. In this paper, we constructed a clean dataset of 2609 compounds, which is small but contains only salt-free solubility records measured at the same temperature (25 °C). Here, we applied a graph convolutional neural network (GCN) to construct an aqueous solubility prediction model. To enhance the performance of the model, molecular MACCS key fingerprints and physicochemical descriptors were also combined with the GCN model to build a multi-channel model. Additionally, the authors built two machine learning models (support vector regression and gradient boosting decision tree) and ensembled them with the GCN model to improve the root mean squared error (RMSE = 0.665). Finally, comparative experiments showed that our framework achieved the best performance on the ESOL dataset (RMSEval = 0.56, RMSEtest = 0.44) and surpassed four established software tools on aqueous solubility prediction of new compounds.
Affiliation(s)
- Chenglong Deng
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Li Liang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Guomeng Xing
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yi Hua
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing, 210009, China
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing, 211198, China
23
Vermeire FH, Chung Y, Green WH. Predicting Solubility Limits of Organic Solutes for a Wide Range of Solvents and Temperatures. J Am Chem Soc 2022; 144:10785-10797. [PMID: 35687887 DOI: 10.1021/jacs.2c01768] [Indexed: 12/18/2022]
Abstract
The solubility of organic molecules is crucial in organic synthesis and industrial chemistry; it is important in the design of many phase separation and purification units, and it controls the migration of many species into the environment. To decide which solvents and temperatures can be used in the design of new processes, trial and error is often used, as the choice is restricted by unknown solid solubility limits. Here, we present a fast and convenient computational method for estimating the solubility of solid neutral organic molecules in water and many organic solvents for a broad range of temperatures. The model is developed by combining fundamental thermodynamic equations with machine learning models for solvation free energy, solvation enthalpy, Abraham solute parameters, and aqueous solid solubility at 298 K. We provide free open-source and online tools for the prediction of solid solubility limits and a curated data collection (SolProp) that includes more than 5000 experimental solid solubility values for validation of the model. The model predictions are accurate for aqueous systems and for a huge range of organic solvents up to 550 K or higher. Methods to further improve solid solubility predictions by providing experimental data on the solute of interest in another solvent, or on the solute's sublimation enthalpy, are also presented.
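The abstract does not spell out its thermodynamic equations, so the following is only a simplified illustration of the kind of relation used to carry a 298 K solubility to other temperatures: the integrated van't Hoff equation with a temperature-independent enthalpy of dissolution. Vermeire et al. obtain the required quantities (for example, solvation enthalpies and aqueous solubility at 298 K) from machine learning models, which are not reproduced here; the enthalpy in the usage lines is a made-up placeholder and the function name is hypothetical.

```python
import math

R_J_PER_MOL_K = 8.314462618  # molar gas constant, J/(mol K)


def extrapolate_log10_solubility(log10_x_ref: float, delta_h_diss_j_mol: float,
                                 temperature_k: float, t_ref_k: float = 298.15) -> float:
    """Carry a reference log10 mole-fraction solubility to another temperature.

    Integrated van't Hoff relation, assuming a temperature-independent
    dissolution enthalpy (a simplification):
        ln x(T) = ln x(T_ref) - (dH_diss / R) * (1/T - 1/T_ref)
    """
    delta_ln_x = -(delta_h_diss_j_mol / R_J_PER_MOL_K) * (1.0 / temperature_k - 1.0 / t_ref_k)
    return log10_x_ref + delta_ln_x / math.log(10.0)  # convert ln to log10


# Hypothetical solute: log10 x = -3.0 at 298.15 K, dissolution enthalpy 30 kJ/mol
for t in (298.15, 323.15, 348.15):
    print(t, round(extrapolate_log10_solubility(-3.0, 30_000.0, t), 3))
```

With a positive (endothermic) dissolution enthalpy the predicted solubility rises with temperature; the constant-enthalpy assumption becomes less reliable far from the reference temperature, which is one reason a more complete model is needed for ranges up to 550 K.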
Affiliation(s)
- Florence H Vermeire
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- Yunsie Chung
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- William H Green
- Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
24
Panapitiya G, Girard M, Hollas A, Sepulveda J, Murugesan V, Wang W, Saldanha E. Evaluation of Deep Learning Architectures for Aqueous Solubility Prediction. ACS OMEGA 2022; 7:15695-15710. [PMID: 35571767 PMCID: PMC9096921 DOI: 10.1021/acsomega.2c00642] [Received: 01/31/2022] [Accepted: 04/11/2022] [Indexed: 05/17/2023]
Abstract
Determining the aqueous solubility of molecules is a vital step in many pharmaceutical, environmental, and energy storage applications. Despite efforts made over decades, there are still challenges associated with developing a solubility prediction model with satisfactory accuracy for many of these applications. The goals of this study are to assess current deep learning methods for solubility prediction, develop a general model capable of predicting the solubility of a broad range of organic molecules, and to understand the impact of data properties, molecular representation, and modeling architecture on predictive performance. Using the largest currently available solubility data set, we implement deep learning-based models to predict solubility from the molecular structure and explore several different molecular representations including molecular descriptors, simplified molecular-input line-entry system strings, molecular graphs, and three-dimensional atomic coordinates using four different neural network architectures: fully connected neural networks, recurrent neural networks, graph neural networks (GNNs), and SchNet. We find that models using molecular descriptors achieve the best performance, with GNN models also achieving good performance. We perform extensive error analysis to understand the molecular properties that influence model performance, perform feature analysis to understand which information about the molecular structure is most valuable for prediction, and perform a transfer learning and data size study to understand the impact of data availability on model performance.
Affiliation(s)
- Gihan Panapitiya
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Michael Girard
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Aaron Hollas
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Jonathan Sepulveda
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Wei Wang
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
- Emily Saldanha
- Pacific Northwest National Laboratory, Richland, Washington 99352, United States
25
Modeling of the Crystallization Conditions for Organic Synthesis Product Purification Using Deep Learning. ELECTRONICS 2022. [DOI: 10.3390/electronics11091360] [Indexed: 11/16/2022]
Abstract
Crystallization is an important purification technique for solid products in a chemical laboratory. However, the correct selection of a solvent is important for the success of the procedure. In order to accelerate the solvent or solvent mixture search process, we offer an in silico alternative, i.e., a never previously demonstrated approach that can model the reaction mixture crystallization conditions which are invariant to the reaction type. The offered deep learning-based method is trained to directly predict the solvent labels used in the crystallization steps of the synthetic procedure. Our solvent label prediction task is a multi-label multi-class classification task during which the method must correctly choose one or several solvents from 13 possible examples. During the experimental investigation, we tested two multi-label classifiers (i.e., Feed-Forward and Long Short-Term Memory neural networks) applied on top of vectors. For the vectorization, we used two methods (i.e., extended-connectivity fingerprints and autoencoders) with various parameters. Our optimized technique was able to reach the accuracy of 0.870 ± 0.004 (which is 0.693 above the baseline) on the testing dataset. This allows us to assume that the proposed approach can help to accelerate manual R&D processes in chemical laboratories.
26
Lee S, Lee M, Gyak KW, Kim SD, Kim MJ, Min K. Novel Solubility Prediction Models: Molecular Fingerprints and Physicochemical Features vs Graph Convolutional Neural Networks. ACS Omega 2022; 7:12268-12277. [PMID: 35449985 PMCID: PMC9016862 DOI: 10.1021/acsomega.2c00697] [Received: 02/03/2022] [Accepted: 03/18/2022]
Abstract
Predicting accurate and reliable solubility values has long been a crucial but challenging task. In this work, surrogate model-based methods were developed to accurately predict solubility for a pair of molecules (solute and solvent) through machine learning and deep learning. The study employed two methods: (1) converting molecules into molecular fingerprints and adding optimal physicochemical properties as descriptors and (2) using graph convolutional network (GCN) models to convert molecules into a graph representation for the prediction tasks. Two prediction tasks were then conducted with each method: (1) the solubility value (regression) and (2) the solubility class (classification). The fingerprint-based method clearly demonstrates that high performance is possible by adding simple but significant physicochemical descriptors to molecular fingerprints, while the GCN method shows that various properties of chemical compounds can be predicted with relatively simple features from the graph representation. The developed methodologies provide a comprehensive understanding of how to construct a suitable model for predicting solubility and can be employed to find suitable solutes and solvents.
Affiliation(s)
- Sumin Lee
- Department of Industrial and Information Systems Engineering, School of Systems Biomedical Science, School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Myeonghun Lee
- Department of Industrial and Information Systems Engineering, School of Systems Biomedical Science, School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
- Ki-Won Gyak
- Polymer Research Lab, Samsung Advanced Institute of Technology, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea
- Sung Dug Kim
- Polymer Research Lab, Samsung Advanced Institute of Technology, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea
- Mi-Jeong Kim
- Polymer Research Lab, Samsung Advanced Institute of Technology, 130 Samsung-ro, Suwon, Gyeonggi-do 16678, Republic of Korea
- Kyoungmin Min
- Department of Industrial and Information Systems Engineering, School of Systems Biomedical Science, School of Mechanical Engineering, Soongsil University, 369 Sangdo-ro, Dongjak-gu, Seoul 06978, Republic of Korea
27
Rowaiye AB, Ogugua AJ, Ibeanu G, Bur D, Asala MT, Ogbeide OB, Abraham EO, Usman HB. Identifying potential natural inhibitors of Brucella melitensis Methionyl-tRNA synthetase through an in-silico approach. PLoS Negl Trop Dis 2022; 16:e0009799. [PMID: 35312681 PMCID: PMC8970508 DOI: 10.1371/journal.pntd.0009799] [Received: 09/08/2021] [Revised: 03/31/2022] [Accepted: 02/16/2022]
Abstract
Background: Brucellosis is an infectious disease caused by bacteria of the genus Brucella. It is the most common zoonosis worldwide, and there are increasing reports of drug resistance and of relapse after long-term treatment with the existing drugs of choice. This study therefore aims to identify possible natural inhibitors of Brucella melitensis Methionyl-tRNA synthetase through an in silico approach.
Methods: Using the PyRx 0.8 virtual screening software, the target was docked against a library of natural compounds obtained from edible African plants. The compound 2-({3-[(3,5-dichlorobenzyl) amino] propyl} amino) quinolin-4(1H)-one (OOU), a ligand co-crystallized with the target, was used as the reference compound. The molecular descriptors of the compounds were screened for bioavailability, pharmacokinetic properties, and bioactivity using the SwissADME, pkCSM, and Molinspiration web servers, respectively. The Fpocket and PLIP web servers were used to analyze the binding pockets and the protein-ligand interactions. The time-resolved trajectories of the apo and holo forms of the target were analyzed using the Galaxy and MDWeb servers.
Results: The lead compounds, Strophanthidin and Isopteropodin, are present in Corchorus olitorius and Uncaria tomentosa (Cat’s claw) plants, respectively. Isopteropodin had a binding affinity score of -8.9 kcal/mol with the target and had 17 anti-correlating residues in Pocket 1 after molecular dynamics simulation. The complex formed by Isopteropodin and the target had a total RMSD of 4.408 and a total RMSF of 9.8067. However, Strophanthidin formed 3 hydrogen bonds with the target at ILE21, GLY262, and LEU294, and induced a total RMSF of 5.4541 at Pocket 1.
Conclusion: Overall, Isopteropodin and Strophanthidin were found to be better drug candidates than OOU and showed potential to inhibit Brucella melitensis Methionyl-tRNA synthetase at Pocket 1, and hence potential to treat brucellosis. In vivo and in vitro investigations are needed to further evaluate the efficacy and toxicity of the lead compounds.

The cure for brucellosis involves a long course of treatment with a combination of antibiotics. However, some of these drugs are not recommended for very young children and pregnant women. Moreover, cases of relapse and resistance to these drugs have been reported. With Brucella Methionyl-tRNA synthetase as the target, molecular docking and virtual screening were used to identify possible drug candidates from a library of 1524 compounds obtained from edible African plants. Two lead compounds, Strophanthidin and Isopteropodin, usually present in Corchorus olitorius and Uncaria tomentosa (Cat’s claw) plants, showed potential to inhibit Brucella melitensis Methionyl-tRNA synthetase. Their bioactivities were also confirmed in molecular dynamics simulations with the target protein. Consequently, both compounds have potential for safe and effective use in the treatment of brucellosis.
Affiliation(s)
- Akwoba Joseph Ogugua
- Department of Veterinary Public Health and Preventive Medicine, University of Nigeria, Nsukka, Nigeria
- Gordon Ibeanu
- Department of Pharmaceutical Science, North Carolina Central University, Durham, North Carolina, United States of America
- Doofan Bur
- Department of Medical Biotechnology, National Biotechnology Development Agency, Abuja, Nigeria
- Mercy Titilayo Asala
- Department of Medical Biotechnology, National Biotechnology Development Agency, Abuja, Nigeria
- Hamzah Bundu Usman
- Department of Plant Science and Biotechnology, Federal University Gusau, Gusau, Nigeria
28
Accurate Physical Property Predictions via Deep Learning. Molecules 2022; 27:1668. [PMID: 35268770 PMCID: PMC8912091 DOI: 10.3390/molecules27051668] [Received: 01/15/2022] [Revised: 03/01/2022] [Accepted: 03/01/2022]
Abstract
Neural networks and deep learning have been applied successfully to problems in drug discovery, with increasing accuracy over time. Nevertheless, many challenges and opportunities remain for improving the accuracy of molecular property prediction even further. Here, we propose a deep-learning architecture, the Bidirectional long short-term memory with Channel and Spatial Attention network (BCSA), whose training is fully data-driven and end-to-end. It is based on data augmentation and SMILES tokenization and does not rely on auxiliary knowledge such as complex spatial structure. In addition, our model takes advantage of the strength of the long short-term memory (LSTM) network in sequence processing. The embedded channel and spatial attention modules in turn identify the key factors in the SMILES sequence for predicting properties. The model was further improved by Bayesian optimization. In this work, we demonstrate that the trained BCSA model is capable of predicting aqueous solubility. Furthermore, our proposed method shows noticeable advantages and competitiveness in predicting the oil-water partition coefficient when compared with state-of-the-art graph models, including the graph convolutional network (GCN), message-passing neural network (MPNN), and AttentiveFP.
29
State-of-the-Art Review of Artificial Neural Networks to Predict, Characterize and Optimize Pharmaceutical Formulation. Pharmaceutics 2022; 14:183. [PMID: 35057076 PMCID: PMC8779224 DOI: 10.3390/pharmaceutics14010183] [Received: 12/01/2021] [Revised: 12/29/2021] [Accepted: 01/06/2022]
Abstract
During the development of a pharmaceutical formulation, a powerful tool is needed to extract the key points from complicated process parameters and material attributes. Artificial neural networks (ANNs), a promising and flexible modeling technique, can address intricate real-world questions through highly parallel, distributed processing in the manner of biological neural networks. Data mining and analysis based on ANNs can replace hundreds of trial-and-error experiments. Pharmaceutics researchers have used ANNs for data analysis since the 1990s, and ANNs have now become a research method in pharmaceutical science. This review focuses on the latest progress in applying ANNs to the prediction, characterization, and optimization of pharmaceutical formulations, to provide a reference for further interdisciplinary study of pharmaceutics and ANNs.
30
Shin HK. Topological Distance-Based Electron Interaction Tensor to Apply a Convolutional Neural Network on Drug-like Compounds. ACS Omega 2021; 6:35757-35768. [PMID: 34984306 PMCID: PMC8717557 DOI: 10.1021/acsomega.1c05693] [Received: 10/12/2021] [Accepted: 12/08/2021]
Abstract
Deep learning (DL) models for quantitative structure-activity relationships (QSAR) feed the molecular structure directly to the network, without human-designed descriptors, by representing the molecule as a graph or a string (e.g., a SMILES code). However, both representations oversimplify real molecules and do not fully reflect the chemical properties of molecular structures. Because the choice of molecular representation determines which DL architecture can be applied, a novel molecular representation can open the way to applying diverse DL networks developed and used in other fields. In this study, a topological distance-based electron interaction (TDEi) tensor was developed, inspired by the quantum mechanical model of the molecule, which defines a molecule in terms of electrons and protons. In the TDEi tensor, the atomic orbitals (AOs) of each atom are represented by an electron configuration (EC) vector, a bit string encoding the presence or absence of electrons in each AO, with spin indicated by positive and negative signs. Interactions between EC vectors were calculated based on the topological distance between atoms in the molecule. Because each molecular structure is translated into a 3D array, CNN models (a modified VGGNet) could be applied to the TDEi tensor to predict four physicochemical properties of drug-like compound datasets (number of compounds in parentheses): MP (275,131), Lipop (4193), Esol (1127), and Freesolv (639). The models achieved good prediction accuracy. Principal component analysis (PCA) showed that features extracted from deeper layers correlated more strongly with the target endpoint.
Affiliation(s)
- Hyun Kil Shin
- Department of Predictive Toxicology, Korea Institute of Toxicology, Daejeon 34114, Republic of Korea
- Human and Environmental Toxicology, University of Science and Technology, Daejeon 34113, Republic of Korea
31
32
Muller C, Rabal O, Diaz Gonzalez C. Artificial Intelligence, Machine Learning, and Deep Learning in Real-Life Drug Design Cases. Methods Mol Biol 2021; 2390:383-407. [PMID: 34731478 DOI: 10.1007/978-1-0716-1787-8_16]
Abstract
The discovery and development of drugs is a long and expensive process with a high attrition rate. Computational drug discovery contributes to ligand discovery and optimization by using models that describe the properties of ligands and their interactions with biological targets. In recent years, artificial intelligence (AI) has made remarkable modeling progress, driven by new algorithms and by increases in computing power and storage capacity, which allow large amounts of data to be processed in a short time. This review summarizes the current state of the art of AI methods applied to drug discovery, with a focus on structure- and ligand-based virtual screening, library design and high-throughput analysis, drug repurposing and drug sensitivity, de novo design, chemical reactions and synthetic accessibility, ADMET, and quantum mechanics.
Affiliation(s)
- Christophe Muller
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
- Obdulia Rabal
- Evotec (France) SAS, Computational Drug Discovery, Integrated Drug Discovery, Toulouse, France
33
Machine learning-based solubility prediction and methodology evaluation of active pharmaceutical ingredients in industrial crystallization. Front Chem Sci Eng 2021. [DOI: 10.1007/s11705-021-2083-5]
34
Wieder O, Kuenemann M, Wieder M, Seidel T, Meyer C, Bryant SD, Langer T. Improved Lipophilicity and Aqueous Solubility Prediction with Composite Graph Neural Networks. Molecules 2021; 26:6185. [PMID: 34684766 PMCID: PMC8539502 DOI: 10.3390/molecules26206185] [Received: 08/30/2021] [Revised: 09/30/2021] [Accepted: 10/08/2021]
Abstract
The accurate prediction of molecular properties, such as lipophilicity and aqueous solubility, is of great importance and poses challenges at several stages of the drug discovery pipeline. Machine learning methods, such as graph-based neural networks (GNNs), have shown exceptionally good performance in predicting these properties. In this work, we introduce a novel GNN architecture, the directed edge graph isomorphism network (D-GIN). It is composed of two distinct sub-architectures (D-MPNN and GIN) and improves accuracy over either sub-architecture across various learning and featurization strategies. We argue that combining models with different key aspects helps make graph neural networks deeper while simultaneously increasing their predictive power. Furthermore, we address a current limitation in the assessment of deep-learning models, namely, the comparison of performance metrics from single training runs, and offer a more robust solution.
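The concern about single training runs can be illustrated generically: instead of reporting one score, train and evaluate several times with different random seeds and report the mean and spread. This is a hedged, generic sketch and not the evaluation protocol used for D-GIN; `toy_train_and_score` is a hypothetical stand-in for a real training run that simply returns a noisy fake RMSE.

```python
import random
import statistics
from typing import Callable, Iterable, Tuple


def repeated_runs(train_and_score: Callable[[int], float],
                  seeds: Iterable[int]) -> Tuple[float, float]:
    """Run training/evaluation once per seed; return (mean, sample std) of the metric."""
    scores = [train_and_score(seed) for seed in seeds]
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


def toy_train_and_score(seed: int) -> float:
    """Hypothetical stand-in for a real training run: a fake RMSE with seed-dependent noise."""
    rng = random.Random(seed)
    return 0.60 + rng.gauss(0.0, 0.02)


if __name__ == "__main__":
    mean_rmse, std_rmse = repeated_runs(toy_train_and_score, seeds=range(5))
    print(f"RMSE = {mean_rmse:.3f} ± {std_rmse:.3f} over 5 runs")
```

Reporting the spread makes it visible when the gap between two architectures is smaller than the run-to-run variation of either one.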
Affiliation(s)
- Oliver Wieder
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
- Mélaine Kuenemann
- Servier Research Institute-CentEx Biotechnology, 125 Chemin de Ronde, 78290 Croissy-sur-Seine, France
- Marcus Wieder
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
- Thomas Seidel
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
- Christophe Meyer
- Servier Research Institute-CentEx Biotechnology, 125 Chemin de Ronde, 78290 Croissy-sur-Seine, France
- Sharon D. Bryant
- Inte:Ligand Software Entwicklungs und Consulting GmbH, 74B/11 Mariahilferstrasse, 1070 Vienna, Austria
- Thierry Langer
- Department of Pharmaceutical Chemistry, University of Vienna, Althanstraße 14, A-1090 Vienna, Austria
35
Kashyap K, Siddiqi MI. Recent trends in artificial intelligence-driven identification and development of anti-neurodegenerative therapeutic agents. Mol Divers 2021; 25:1517-1539. [PMID: 34282519 DOI: 10.1007/s11030-021-10274-8] [Received: 03/28/2021] [Accepted: 07/05/2021]
Abstract
Neurological disorders affect various aspects of life. Finding drugs for the central nervous system is a very challenging and complex task because of the blood-brain barrier, P-glycoprotein, and the high attrition rates of drug candidates. The availability of big data in online databases and resources has enabled artificial intelligence techniques, including machine learning, to process and analyze these data and to predict unknown data with high efficiency. The use of these modern techniques has revolutionized the whole drug development paradigm, with an unprecedented acceleration in central nervous system drug discovery programs. Also, the new deep learning architectures proposed in many recent works have given a better understanding of how artificial intelligence can tackle the large, complex problems posed by central nervous system disorders. Therefore, the present review provides comprehensive and up-to-date information on machine learning/artificial intelligence-driven efforts in the brain-care domain. In addition, a brief overview is presented of machine learning algorithms and their uses in structure-based drug design, ligand-based drug design, ADMET prediction, de novo drug design, and drug repurposing. Lastly, we discuss the major challenges and limitations and how they can be tackled in the future using these modern machine learning/artificial intelligence approaches.
Affiliation(s)
- Kushagra Kashyap
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
- Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
- Mohammad Imran Siddiqi
- Academy of Scientific and Innovative Research (AcSIR), CSIR-Central Drug Research Institute (CSIR-CDRI) Campus, Lucknow, India
- Molecular and Structural Biology Division, CSIR-Central Drug Research Institute (CSIR-CDRI), Sector 10, Jankipuram Extension, Sitapur Road, Lucknow, 226031, India
36
Falcón-Cano G, Molina C, Cabrera-Pérez MÁ. ADME prediction with KNIME: A retrospective contribution to the second "Solubility Challenge". ADMET DMPK 2021; 9:209-218. [PMID: 35300359 PMCID: PMC8920098 DOI: 10.5599/admet.979] [Received: 03/09/2021] [Revised: 06/21/2021]
Abstract
Computational models for predicting aqueous solubility from molecular structure represent a promising strategy for drug design and discovery. Since the first "Solubility Challenge", these initiatives have marked the state of the art in the modelling algorithms used to predict drug solubility. In this regard, the quality of the input experimental data and its influence on model performance have been frequently discussed. In our previous study, we developed a computational model for aqueous solubility based on recursive random forest approaches. The aim of the current commentary is to analyse the performance of this already trained predictive model on the molecules of the second "Solubility Challenge". Even though our training set has inconsistencies related to the pH, solid form, and temperature conditions of the solubility measurements, the model predicted the two sets from the second "Solubility Challenge" with statistics comparable to those of the top-ranked models. Finally, we provide an automated KNIME workflow to predict the aqueous solubility of new drug candidates during the early stages of drug discovery and development, ensuring the applicability and reproducibility of our model.
Affiliation(s)
- Gabriela Falcón-Cano
- Unit of Modelling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central "Marta Abreu" de las Villas. Santa Clara 54830, Villa Clara, Cuba
- Miguel Ángel Cabrera-Pérez
- Unit of Modelling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central "Marta Abreu" de las Villas. Santa Clara 54830, Villa Clara, Cuba
- Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
- Department of Engineering, Area of Pharmacy and Pharmaceutical Technology, Miguel Hernández University, 03550 Sant Joan d'Alacant, Alicante, Spain
37
Ge K, Ji Y. Novel Computational Approach by Combining Machine Learning with Molecular Thermodynamics for Predicting Drug Solubility in Solvents. Ind Eng Chem Res 2021. [DOI: 10.1021/acs.iecr.1c00998]
Affiliation(s)
- Kai Ge
- Jiangsu Province Hi-Tech Key Laboratory for Biomedical Research, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, People’s Republic of China
- Yuanhui Ji
- Jiangsu Province Hi-Tech Key Laboratory for Biomedical Research, School of Chemistry and Chemical Engineering, Southeast University, Nanjing 211189, People’s Republic of China
38
Francoeur PG, Koes DR. SolTranNet-A Machine Learning Tool for Fast Aqueous Solubility Prediction. J Chem Inf Model 2021; 61:2530-2536. [PMID: 34038123 DOI: 10.1021/acs.jcim.1c00331]
Abstract
While accurate prediction of aqueous solubility remains a challenge in drug discovery, machine learning (ML) approaches have become increasingly popular for this task. For instance, in the Second Challenge to Predict Aqueous Solubility (SC2), all groups utilized machine learning methods in their submissions. We present SolTranNet, a molecule attention transformer to predict aqueous solubility from a molecule's SMILES representation. Atypically, we demonstrate that larger models perform worse at this task, with SolTranNet's final architecture having 3,393 parameters while outperforming linear ML approaches. SolTranNet has a 3-fold scaffold split cross-validation root-mean-square error (RMSE) of 1.459 on AqSolDB and an RMSE of 1.711 on a withheld test set. We also demonstrate that, when used as a classifier to filter out insoluble compounds, SolTranNet achieves a sensitivity of 94.8% on the SC2 data set and is competitive with the other methods submitted to the competition. SolTranNet is distributed via pip, and its source code is available at https://github.com/gnina/SolTranNet.
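The two figures quoted for SolTranNet, an RMSE on logS and a sensitivity when the regressor is used as a filter for insoluble compounds, can be computed as sketched below. This is a generic illustration under stated assumptions: the logS cutoff that turns a regression output into an "insoluble" flag is a hypothetical value chosen for the example (not the threshold used for SC2), and treating "insoluble" as the positive class is likewise an assumption.

```python
import math
from typing import List, Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between measured and predicted logS values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def flag_insoluble(log_s: Sequence[float], cutoff: float) -> List[bool]:
    """Label a compound 'insoluble' (positive class) when its logS is below the cutoff."""
    return [value < cutoff for value in log_s]


def sensitivity(actual: Sequence[bool], predicted: Sequence[bool]) -> float:
    """True-positive rate TP / (TP + FN): the share of insoluble compounds that are caught."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    if tp + fn == 0:
        raise ValueError("no positive (insoluble) compounds in the reference labels")
    return tp / (tp + fn)


if __name__ == "__main__":
    measured = [-6.0, -2.0, -5.5, -1.0]   # hypothetical measured logS values
    predicted = [-5.0, -2.5, -4.0, -1.5]  # hypothetical model outputs
    CUTOFF = -4.0                         # hypothetical insolubility threshold
    print(f"RMSE = {rmse(measured, predicted):.3f}")
    truth = flag_insoluble(measured, CUTOFF)
    calls = flag_insoluble(predicted, CUTOFF)
    print(f"sensitivity = {sensitivity(truth, calls):.2f}")
```

Because the comparison is strict, a prediction of exactly -4.0 is not flagged at a cutoff of -4.0; whether the boundary counts as insoluble is a design choice that changes the sensitivity.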
Affiliation(s)
- Paul G Francoeur
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
- David R Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, United States
39
Falcón-Cano G, Molina C, Cabrera-Pérez MÁ. ADME prediction with KNIME: In silico aqueous solubility consensus model based on supervised recursive random forest approaches. ADMET DMPK 2020; 8:251-273. [PMID: 35300309 PMCID: PMC8915604 DOI: 10.5599/admet.852] [Received: 05/18/2020] [Revised: 08/01/2020]
Abstract
In silico prediction of aqueous solubility plays an important role in the drug discovery and development processes. For many years, the limited performance of in silico solubility models has been attributed to the lack of high-quality solubility data for pharmaceutical molecules. However, some studies suggest that the poor accuracy of solubility prediction is not related to the quality of the experimental data and that more precise methodologies (algorithms and/or sets of descriptors) are required for predicting the aqueous solubility of pharmaceutical molecules. In this study, a large and diverse database was generated with aqueous solubility values collected from two public sources; two new recursive machine-learning approaches were developed for data cleaning and variable selection; and a consensus model based on regression and classification algorithms was created. The modeling protocol, which includes the curation of chemical and experimental data, was implemented in KNIME with the aim of obtaining an automated workflow for prediction on new databases. Finally, we compared several methods and models available in the literature with our consensus model, which showed results comparable to, or even better than, previously published models.
Affiliation(s)
- Gabriela Falcón-Cano
- Unit of Modeling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central “Marta Abreu” de las Villas. Santa Clara 54830, Villa Clara, Cuba
- Miguel Ángel Cabrera-Pérez
- Unit of Modeling and Experimental Biopharmaceutics. Centro de Bioactivos Químicos. Universidad Central “Marta Abreu” de las Villas. Santa Clara 54830, Villa Clara, Cuba
- Department of Pharmacy and Pharmaceutical Technology, University of Valencia, Burjassot 46100, Valencia, Spain
- Department of Engineering, Area of Pharmacy and Pharmaceutical Technology, Miguel Hernández University, 03550 Sant Joan d'Alacant, Alicante, Spain