1
Odugbemi AI, Nyirenda C, Christoffels A, Egieyeh SA. Artificial intelligence in antidiabetic drug discovery: The advances in QSAR and the prediction of α-glucosidase inhibitors. Comput Struct Biotechnol J 2024; 23:2964-2977. [PMID: 39148608 PMCID: PMC11326494 DOI: 10.1016/j.csbj.2024.07.003] [Received: 04/16/2024] [Revised: 07/03/2024] [Accepted: 07/03/2024] [Indexed: 08/17/2024]
Abstract
Artificial intelligence is transforming drug discovery, particularly the hit identification phase for therapeutic compounds. One tool that has been instrumental in this transformation is Quantitative Structure-Activity Relationship (QSAR) analysis. This computer-aided drug design tool uses machine learning to predict the biological activity of new compounds against various biological targets from numerical representations of their chemical structures. With diabetes mellitus becoming a significant health challenge in recent times, there is intense research interest in modulating antidiabetic drug targets. α-Glucosidase is an antidiabetic target that has gained attention because its inhibition suppresses postprandial hyperglycaemia, a key contributor to diabetic complications. This review explored a detailed approach to developing QSAR models, focusing on strategies for generating input variables (molecular descriptors) and computational approaches ranging from classical machine learning algorithms to modern deep learning algorithms. We also highlighted studies that have used these approaches to develop predictive models for α-glucosidase inhibitors to modulate this critical antidiabetic drug target.
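QSAR, as summarised above, maps a numerical descriptor vector for each compound to a predicted activity. A minimal sketch, assuming the descriptors have already been computed and scaled: k-nearest-neighbour regression, one of the classical machine learning algorithms used for QSAR, predicts a query compound's activity (for example a pIC50 against α-glucosidase) as the mean activity of its k closest training compounds in descriptor space. The function name `knn_qsar_predict`, the descriptor values and the activities are hypothetical and are not taken from the review.

```python
import math
from typing import Sequence


def knn_qsar_predict(
    train_X: Sequence[Sequence[float]],
    train_y: Sequence[float],
    query: Sequence[float],
    k: int = 3,
) -> float:
    """Predict the activity of `query` as the mean activity of its k nearest
    training compounds (Euclidean distance in descriptor space)."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training compounds")
    # Pair each training compound's distance to the query with its activity
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    # Average the activities of the k closest compounds
    return sum(y for _, y in dists[:k]) / k


# Hypothetical 2-descriptor data set with hypothetical activities
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
y = [4.0, 5.0, 6.0, 9.0]
print(knn_qsar_predict(X, y, [0.1, 0.1], k=3))
```

Because Euclidean distance is dominated by large-valued descriptors, real QSAR work would standardise each descriptor before using a distance-based model like this one.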
Affiliation(s)
- Adeshina I Odugbemi
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- School of Pharmacy, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- National Institute for Theoretical and Computational Sciences (NITheCS), South Africa
- Clement Nyirenda
- Department of Computer Science, University of the Western Cape, Cape Town 7535, South Africa
- Alan Christoffels
- South African Medical Research Council Bioinformatics Unit, South African National Bioinformatics Institute, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- Africa Centres for Disease Control and Prevention, African Union, Addis Ababa, Ethiopia
- Samuel A Egieyeh
- School of Pharmacy, University of the Western Cape, Bellville, Cape Town 7535, South Africa
- National Institute for Theoretical and Computational Sciences (NITheCS), South Africa

2
Yang Y, Yang Z, Pang X, Cao H, Sun Y, Wang L, Zhou Z, Wang P, Liang Y, Wang Y. Molecular designing of potential environmentally friendly PFAS based on deep learning and generative models. Sci Total Environ 2024; 953:176095. [PMID: 39245376 DOI: 10.1016/j.scitotenv.2024.176095] [Received: 07/04/2024] [Revised: 09/03/2024] [Accepted: 09/04/2024] [Indexed: 09/10/2024]
Abstract
Perfluoroalkyl and polyfluoroalkyl substances (PFAS) are widely used across a spectrum of industrial and consumer goods. Nonetheless, their persistent nature and tendency to accumulate in biological systems pose substantial environmental and health threats. Consequently, striking a balance between maximizing product efficiency and minimizing environmental and health risks by tailoring the molecular structure of PFAS has become a pivotal challenge in the fields of environmental chemistry and sustainable development. To address this issue, a computational workflow was proposed for designing environmentally friendly PFAS by incorporating deep learning (DL) and molecular generative models. MolHGT+, a hybrid DL architecture based on a heterogeneous graph neural network with transformer-like attention, was applied to predict the surface tension, bioaccumulation, and hepatotoxicity of the molecules. Virtual screening of the PFAS master database using MolHGT+ indicated that incorporating the siloxane group and betaine fragment can effectively decrease both the bioaccumulation and hepatotoxicity of PFAS while preserving low surface tension. In addition, molecular generative models were employed to create a structurally diverse pool of novel PFASs, with the aforementioned hit molecules serving as the initial template structures. Overall, our study presents a promising AI-driven method for advancing the development of environmentally friendly PFAS.
Affiliation(s)
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Zhen Zhou
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Pu Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yawei Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China; State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China

3
Ahmad W, Chong KT, Tayara H. GGAS2SN: Gated Graph and SmilesToSeq Network for Solubility Prediction. J Chem Inf Model 2024; 64:7833-7843. [PMID: 39387596 DOI: 10.1021/acs.jcim.4c00792] [Indexed: 10/15/2024]
Abstract
Aqueous solubility is a critical physicochemical property in drug discovery. Solubility is a key issue in pharmaceutical development because it can limit a drug's absorption capacity. Accurate solubility prediction is crucial for pharmacological, environmental, and drug development studies. This research introduces a novel method for solubility prediction that combines gated graph neural networks (GGNNs) and graph attention neural networks (GATs) with Smiles2Seq encoding. Our methodology converts chemical compounds into graph structures, with nodes representing atoms and edges representing chemical bonds. These graphs are then processed by a specialized graph neural network (GNN) architecture. Incorporating attention mechanisms into the GNN allows it to capture subtle structural dependencies, improving solubility predictions. Furthermore, we used the Smiles2Seq encoding technique to bridge the semantic gap between molecular structures and their textual representations. Smiles2Seq converts chemical notations into numeric sequences, allowing this information to be passed efficiently into our model. We demonstrate the efficacy of our approach through comprehensive experiments on benchmark solubility data sets, showing superior predictive performance compared with traditional methods. Our model outperforms existing solubility prediction models and provides interpretable insights into the molecular features driving solubility behavior. This research represents an important advance in solubility prediction, offering potent tools for drug discovery, formulation development, and environmental assessments. The fusion of GGNN and Smiles2Seq encoding establishes a robust framework for accurately forecasting solubility across diverse chemical compounds, fostering innovation in domains that rely on solubility data.
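The abstract treats atoms as nodes and bonds as edges and adds attention to the GNN. As an illustration of what one attention step does, the sketch below implements a single-head aggregation in the spirit of graph attention networks (GAT), assuming scalar atom features (atomic number is used here purely for illustration) and hand-set weights `w`, `a_self`, `a_neigh`. It omits the gating of GGNNs and the Smiles2Seq branch, and the function name and parameter values are hypothetical, not the GGAS2SN implementation.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple


def attention_aggregate(
    features: Sequence[float],
    bonds: Sequence[Tuple[int, int]],
    w: float = 1.0,
    a_self: float = 0.1,
    a_neigh: float = 0.1,
    slope: float = 0.2,
) -> List[float]:
    """One single-head, GAT-style update with scalar atom features."""
    n = len(features)
    # Atoms are nodes, each bond is an undirected edge; add self-loops as in GAT
    neighbours: Dict[int, Set[int]] = {i: {i} for i in range(n)}
    for u, v in bonds:
        neighbours[u].add(v)
        neighbours[v].add(u)
    z = [w * h for h in features]  # shared linear transform W*h
    updated: List[float] = []
    for i in range(n):
        js = sorted(neighbours[i])
        # Unnormalised score e_ij = LeakyReLU(a_self*z_i + a_neigh*z_j)
        raw = [a_self * z[i] + a_neigh * z[j] for j in js]
        scores = [s if s > 0 else slope * s for s in raw]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]  # numerically stable softmax
        total = sum(exps)
        # New feature = attention-weighted sum of transformed neighbour features
        updated.append(sum((e / total) * z[j] for e, j in zip(exps, js)))
    return updated


# Ethanol heavy atoms C-C-O, featurised (hypothetically) by atomic number
print(attention_aggregate([6.0, 6.0, 8.0], [(0, 1), (1, 2)]))
```

Because the attention weights form a softmax over each atom's neighbourhood, every updated feature is a weighted average of neighbouring (transformed) features, which is what lets the layer emphasise some bonds over others.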
Affiliation(s)
- Waqar Ahmad
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Kil To Chong
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, South Korea
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Korea
- Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Korea

4
Zhao J, Hermans E, Sepassi K, Tistaert C, Bergström CAS, Ahmad M, Larsson P. Effect of Data Quality and Data Quantity on the Estimation of Intrinsic Solubility: Analysis Based on a Single-Source Data Set. Mol Pharm 2024; 21:5261-5271. [PMID: 39267585 PMCID: PMC11462503 DOI: 10.1021/acs.molpharmaceut.4c00685] [Received: 06/20/2024] [Revised: 09/05/2024] [Accepted: 09/05/2024] [Indexed: 09/17/2024]
Abstract
Aqueous solubility is one of the most important physicochemical properties of drug molecules and a major driving force for oral drug absorption. To date, the performance of in silico models in estimating solubility for novel chemical space has been limited. To investigate possible reasons and remedies for this, the Johnson & Johnson in-house aqueous solubility data set of over 40,000 compounds was leveraged. All data were generated through the same high-throughput assay, providing a unique opportunity to explore the relationship between data quality, data quantity, and model estimations. Six intrinsic solubility data sets with different sizes and noise levels were generated using three approaches: (i) inclusion or exclusion of amorphous solid residue, (ii) measured or experimental log D to identify the intrinsic solubility, and (iii) adopting or omitting a quality check process in the data processing workflow. A random forest regressor was trained on the data sets with three different sets of descriptors calculated from RDKit, ADMET Predictor, or Mordred, and performance was evaluated with nested cross-validation as well as ten refined test sets. The models confirm, as expected, that for the same data set size, high-quality data lead to better model performance; however, models trained with larger data sets containing analytical variability can also give estimations as accurate as those of models trained with small, clean, and diverse data sets. In contrast, the noise introduced by including measurements in which amorphous solid was present after the solubility measurement cannot be overcome by increasing data size, because it introduces a systematic positive bias into the data set, confirming the importance of critical data review. Finally, two top-performing models were tested on the first test set from the second solubility challenge, achieving RMSE values of 0.74 and 0.72, with 46% and 48% of predictions, respectively, falling within ±0.5 log S units of the measured values. These results improve on those reported in the findings of the competition, highlighting that a single-source curated data set can enhance the prediction of intrinsic solubility.
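The two figures quoted for the solubility-challenge test set, RMSE and the share of predictions within ±0.5 log S units, are simple to compute. The sketch below shows both, assuming measured and predicted values are already in log S units; the function name and the example values are hypothetical and are not data from the study.

```python
import math
from typing import Sequence, Tuple


def solubility_metrics(
    y_true: Sequence[float], y_pred: Sequence[float], tol: float = 0.5
) -> Tuple[float, float]:
    """Return (RMSE, % of predictions within +/-tol log units of the measurement)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    # Root-mean-square error over all compounds
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    # Percentage of compounds whose absolute error is at most `tol`
    pct_within = 100.0 * sum(1 for e in errors if abs(e) <= tol) / len(errors)
    return rmse, pct_within


# Hypothetical measured and predicted log S values for four compounds
print(solubility_metrics([-3.0, -4.0, -5.0, -6.0], [-3.2, -4.0, -5.8, -5.5]))
```

Reporting both metrics is useful because RMSE is sensitive to a few large errors, whereas the ±0.5 log S fraction describes how often a prediction is close enough to be practically useful.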
Affiliation(s)
- Jiaxi Zhao
- Department of Pharmacy, Uppsala University, 751 23 Uppsala, Sweden
- Eline Hermans
- Pharmaceutical & Material Sciences, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
- Kia Sepassi
- Discovery Pharmaceutics, Janssen Research & Development, LLC, La Jolla, California 92121, United States
- Christophe Tistaert
- Pharmaceutical & Material Sciences, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
- Mazen Ahmad
- In Silico Discovery, Janssen Pharmaceutica NV, B-2340 Beerse, Belgium
- Per Larsson
- Department of Pharmacy, Uppsala University, 751 23 Uppsala, Sweden

5
Neha, Aggarwal M, Soni A, Karmakar T. Polymorph-Specific Solubility Prediction of Urea Using Constant Chemical Potential Molecular Dynamics Simulations. J Phys Chem B 2024; 128:8477-8483. [PMID: 39186699 DOI: 10.1021/acs.jpcb.4c02027] [Indexed: 08/28/2024]
Abstract
Molecular dynamics simulations offer a robust approach to understanding the material properties within a system. Solubility is defined as the analytical composition of a saturated solution expressed as a proportion of designated solute in a designated solvent, according to IUPAC. It is a critical property of compounds and holds significance across numerous fields. Various computational techniques have been explored for determining solubility, including methods based on chemical potential determination, enhanced sampling simulation, and direct coexistence simulation, and lately, machine learning-based methods have shown promise. In this investigation, we have utilized Constant Chemical Potential Molecular Dynamics, a method rooted in direct coexistence simulation, to predict the solubility of urea polymorphs in aqueous solution. The primary purpose of using this method is to overcome the limitation of the direct simulation method by maintaining a constant chemical potential for a sufficiently long time. Urea is chosen as a prototypical system for our study, with a particular focus on three of its polymorphs. Our approach effectively discriminates between the polymorphs of urea based on their respective solubility values; polymorph III is found to have the highest solubility, followed by forms IV and I.
Affiliation(s)
- Neha
- Department of Chemistry, Indian Institute of Technology, Delhi, New Delhi 110016, India
- Manya Aggarwal
- Department of Chemistry, Indian Institute of Technology, Delhi, New Delhi 110016, India
- Aashutosh Soni
- Department of Chemistry, Indian Institute of Technology, Delhi, New Delhi 110016, India
- Tarak Karmakar
- Department of Chemistry, Indian Institute of Technology, Delhi, New Delhi 110016, India

6
Suriyaamporn P, Pamornpathomkul B, Patrojanasophon P, Ngawhirunpat T, Rojanarata T, Opanasopit P. The Artificial Intelligence-Powered New Era in Pharmaceutical Research and Development: A Review. AAPS PharmSciTech 2024; 25:188. [PMID: 39147952 DOI: 10.1208/s12249-024-02901-y] [Received: 04/28/2024] [Accepted: 07/22/2024] [Indexed: 08/17/2024]
Abstract
Currently, artificial intelligence (AI), machine learning (ML), and deep learning (DL) are gaining increased interest in many fields, particularly in pharmaceutical research and development, where they assist decision-making in complex situations. Numerous research studies and advancements have demonstrated how these computational technologies are used in various aspects of pharmaceutical research and development, including drug discovery, personalized medicine, drug formulation, optimization, predictions, drug interactions, pharmacokinetics/pharmacodynamics, quality control/quality assurance, and manufacturing processes. Using advanced modeling techniques, these computational technologies can enhance efficiency and accuracy, handle complex data, and facilitate novel discoveries within minutes. Furthermore, these technologies offer several advantages over conventional statistics. They allow pattern recognition in complex datasets, and the models, typically developed from data-driven algorithms, can predict a given outcome (model output) from a set of features (model inputs). Additionally, this review discusses emerging trends and provides perspectives on the application of AI with quality by design (QbD) and the future role of AI in this field. Ethical and regulatory considerations associated with integrating AI into pharmaceutical technology are also examined. This review aims to offer insights to researchers, professionals, and others on the current state of AI applications in pharmaceutical research and development and their potential role in the future of research and the era of pharmaceutical Industry 4.0 and 5.0.
Affiliation(s)
- Phuvamin Suriyaamporn
- Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Boonnada Pamornpathomkul
- Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Prasopchai Patrojanasophon
- Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Tanasait Ngawhirunpat
- Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Theerasak Rojanarata
- Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand
- Praneet Opanasopit
- Pharmaceutical Development of Green Innovations Group (PDGIG), Department of Industrial Pharmacy, Faculty of Pharmacy, Silpakorn University, Nakhon Pathom, Thailand

7
Ramani V, Karmakar T. Graph Neural Networks for Predicting Solubility in Diverse Solvents Using MolMerger Incorporating Solute-Solvent Interactions. J Chem Theory Comput 2024. [PMID: 39041858 DOI: 10.1021/acs.jctc.4c00382] [Indexed: 07/24/2024]
Abstract
The prediction of solubility is a complex and challenging physicochemical problem that has tremendous implications for the chemical and pharmaceutical industry. Recent advancements in machine learning methods have made it possible to reliably predict the solubility of a large number of molecular systems. However, most of these methods rely on physical properties obtained from experiments or from expensive quantum chemical calculations. Here, we developed a method that represents solute-solvent interactions as a graph using "MolMerger," which captures the strongest polar interactions between molecules using Gasteiger charges and creates a graph incorporating the true nature of the system. Using these graphs as input, a neural network learns the correlation between the structural properties of a molecule, in the form of node embeddings, and its physicochemical properties as the output. This approach has been used to predict the log solubility values of various organic molecules and pharmaceuticals in diverse sets of solvents.
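The abstract says MolMerger uses Gasteiger charges to capture the strongest polar interaction between solute and solvent and merges them into one graph, but it does not give the exact rule. The sketch below is a hypothetical reading only: it assumes partial charges are already available (for example from RDKit's `ComputeGasteigerCharges`), re-indexes the solvent atoms after the solute atoms, and adds one inter-molecular edge between the solute/solvent atom pair whose charges have the most negative product. The function name, the pairing rule and the charge values are assumptions, not the published MolMerger algorithm.

```python
from typing import List, Optional, Sequence, Tuple

Edge = Tuple[int, int]


def merge_solute_solvent(
    solute_charges: Sequence[float],
    solute_edges: Sequence[Edge],
    solvent_charges: Sequence[float],
    solvent_edges: Sequence[Edge],
) -> Tuple[List[float], List[Edge], Optional[Edge]]:
    """Join solute and solvent graphs into one graph (hypothetical rule).

    Links the solute/solvent atom pair whose partial charges have the most
    negative product, i.e. the strongest opposite-sign pair.
    """
    offset = len(solute_charges)
    # Re-index solvent atoms so they follow the solute atoms in the merged graph
    edges: List[Edge] = list(solute_edges) + [
        (u + offset, v + offset) for u, v in solvent_edges
    ]
    best_pair: Optional[Edge] = None
    best_score = 0.0
    for i, qi in enumerate(solute_charges):
        for j, qj in enumerate(solvent_charges):
            score = qi * qj  # negative only when the charges have opposite signs
            if score < best_score:
                best_score = score
                best_pair = (i, j + offset)
    if best_pair is not None:
        edges.append(best_pair)  # the single inter-molecular edge
    return list(solute_charges) + list(solvent_charges), edges, best_pair


# Hypothetical charges: a 3-atom solute fragment and water (O, H, H)
charges, edges, link = merge_solute_solvent(
    [-0.5, 0.2, 0.1], [(0, 1), (1, 2)], [-0.41, 0.205, 0.205], [(0, 1), (0, 2)]
)
print(link, edges)
```

Keeping the merged structure as an ordinary node list plus edge list means it can be fed to any standard graph neural network without special handling for the two molecules.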
Affiliation(s)
- Vansh Ramani
- Department of Chemical Engineering, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India
- Tarak Karmakar
- Department of Chemistry, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi 110016, India

8
Yang Z, Wang Y, Du G, Zhan Y, Zhan W. Prediction method of pharmacokinetic parameters of small molecule drugs based on GCN network model. J Mol Model 2024; 30:264. [PMID: 38995407 DOI: 10.1007/s00894-024-06051-7] [Received: 04/19/2024] [Accepted: 06/26/2024] [Indexed: 07/13/2024]
Abstract
CONTEXT Accurately predicting plasma protein binding rate (PPBR) and oral bioavailability (OBA) helps to reveal the absorption and distribution of drugs in the human body and to inform subsequent drug design. Although machine learning models have achieved good prediction accuracy, they often perform poorly on data with irregular topological structures. METHODS This study therefore proposes a pharmacokinetic parameter prediction framework based on graph convolutional networks (GCN) that predicts the PPBR and OBA of small-molecule drugs. In the framework, a GCN first extracts spatial feature information from the topological structure of drug molecules to better learn node features and the associations between nodes. Then, based on the principle of drug similarity, this study calculates the similarity between small-molecule drugs, selects different thresholds to construct datasets, and establishes a prediction model centered on the GCN algorithm. The experimental results show that, compared with traditional machine learning prediction models, the GCN-based prediction model performs best on the PPBR and OBA datasets with an inter-molecular similarity threshold of 0.25, with MAEs of 0.155 and 0.167, respectively. To further improve the accuracy of the prediction model, GCN is combined with other algorithms; compared with a single GCN, the distribution of predicted values from the combined model is more consistent with the true values. In summary, this work provides a new method for improving the rate of early drug screening in the future.
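The framework above relies on a GCN to learn node features from the molecular graph. The sketch below shows the widely used graph-convolution propagation rule of Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W), in plain Python. It is an illustration under stated assumptions, not the study's network: the layer sizes, weights and inputs are hypothetical, and a full PPBR or OBA model would add more layers, a pooling (readout) step over atoms, and a regression head.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 . H . W)."""
    n = len(adj)
    # Add self-loops so each atom keeps part of its own features
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation of the self-looped adjacency matrix
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]  # element-wise ReLU


# Hypothetical two-atom graph with one input feature and a 1x1 weight matrix
print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))
```

The symmetric normalisation keeps feature magnitudes from growing with atom degree, so highly connected atoms do not dominate the learned representation.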
Affiliation(s)
- Zhihua Yang
- Department of Radiation Oncology, General Hospital of Ningxia Medical University, Yinchuan, 750004, China
- Ying Wang
- Engineering Research Center of Molecular and Neuro Imaging of the Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Getao Du
- Engineering Research Center of Molecular and Neuro Imaging of the Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Yonghua Zhan
- Engineering Research Center of Molecular and Neuro Imaging of the Ministry of Education, School of Life Science and Technology, Xidian University, Xi'an, 710071, Shaanxi, China
- Wenhua Zhan
- Department of Radiation Oncology, General Hospital of Ningxia Medical University, Yinchuan, 750004, China

9
Kim B, Manchuri AR, Oh GT, Lim Y, Son Y, Choi S, Kang M, Jang J, Ha J, Cho CH, Lee MW, Lee DS. Experimental analysis and prediction of radionuclide solubility using machine learning models: Effects of organic complexing agents. J Hazard Mater 2024; 469:134012. [PMID: 38492397 DOI: 10.1016/j.jhazmat.2024.134012] [Received: 01/31/2024] [Revised: 03/02/2024] [Accepted: 03/10/2024] [Indexed: 03/18/2024]
Abstract
Radioactive wastes contain organic complexing agents that can form complexes with radionuclides and enhance their solubility, increasing the mobility of radionuclides over great distances from a radioactive waste repository. In this study, four radionuclides (cobalt, strontium, iodine, and uranium) and three organic complexing agents (ethylenediaminetetraacetic acid, nitrilotriacetic acid, and isosaccharinic acid) were selected, and the solubility of these radionuclides was assessed under realistic environmental conditions, including different pH values (7, 9, 11, and 13), temperatures (10 °C, 20 °C, and 40 °C), and organic complexing agent concentrations (10⁻⁵ to 10⁻² M). A total of 720 data points were generated from solubility batch experiments. Four supervised machine learning (ML) models, namely Gaussian process regression (GPR), ensemble-boosted trees, artificial neural networks, and support vector machines, were developed to predict radionuclide solubility. Each ML model was optimized using a Bayesian optimization algorithm. GPR emerged as a robust model that provided accurate predictions by capturing the underlying solubility patterns and the intricate relationships among the independent parameters of the dataset. At the 95% confidence level, the experimental results and the GPR estimates were closely correlated, confirming the suitability of the GPR model for future explorations.
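The abstract credits Gaussian process regression (GPR) with accurate predictions and a 95% uncertainty band. As a rough illustration of where such a band comes from, the sketch below implements exact GPR for a single input variable (for example pH) with a squared-exponential (RBF) kernel, fixed hand-picked hyperparameters, a Cholesky solve, and the Gaussian 1.96σ rule for a 95% interval on the latent function. The study tuned its models by Bayesian optimisation and used several input parameters; neither is reproduced here, and the function names and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def rbf(x1: float, x2: float, length: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential (RBF) covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length) ** 2)


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L @ L^T == a (a must be positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(a[i][i] - s) if i == j else (a[i][j] - s) / L[j][j]
    return L


def forward_sub(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L x = b for lower-triangular L."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def back_sub_transpose(L: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(
    x_train: Sequence[float], y_train: Sequence[float], x_new: float,
    length: float = 1.0, variance: float = 1.0, noise: float = 1e-6,
) -> Tuple[float, Tuple[float, float]]:
    """Posterior mean and 95% interval of the latent function at x_new."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j], length, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = back_sub_transpose(L, forward_sub(L, y_train))  # K^-1 y
    k_star = [rbf(x_new, xi, length, variance) for xi in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(L, k_star)
    # Posterior variance k** - k*^T K^-1 k*, clipped at 0 against round-off
    var = max(rbf(x_new, x_new, length, variance) - sum(vi * vi for vi in v), 0.0)
    half_width = 1.96 * math.sqrt(var)
    return mean, (mean - half_width, mean + half_width)


# Hypothetical scaled solubility values at three input settings
print(gp_predict([1.0, 2.0, 3.0], [0.5, 1.0, 0.2], 2.5))
```

Near the training inputs the interval is narrow, and far from them it widens towards ±1.96·sqrt(variance); this built-in uncertainty estimate is one reason GPR suits small experimental data sets.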
Affiliation(s)
- Bolam Kim
- Department of Environmental Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea
- Amaranadha Reddy Manchuri
- Department of Environmental Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea
- Gi-Taek Oh
- Department of Chemical Engineering, Keimyung University, 1095 Dalgubeol-daero, Dalseo-gu, Daegu 42601, Republic of Korea
- Youngsu Lim
- Department of Environmental Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea
- Yuhwa Son
- LILW Technology Team, Korea Radioactive Waste Agency, 19 Chunghyochun-gil, Gyeongju-si, Gyeongsangbuk-do 38062, Republic of Korea
- Seho Choi
- LILW Technology Team, Korea Radioactive Waste Agency, 19 Chunghyochun-gil, Gyeongju-si, Gyeongsangbuk-do 38062, Republic of Korea
- Myunggoo Kang
- LILW Technology Team, Korea Radioactive Waste Agency, 19 Chunghyochun-gil, Gyeongju-si, Gyeongsangbuk-do 38062, Republic of Korea
- Jiseon Jang
- HLW Technology Development Institute, Korea Radioactive Waste Agency, 174 Gajeong-ro, Yuseong-gu, Daejeon 34129, Republic of Korea
- Jaechul Ha
- LILW Technology Team, Korea Radioactive Waste Agency, 19 Chunghyochun-gil, Gyeongju-si, Gyeongsangbuk-do 38062, Republic of Korea
- Chun-Hyung Cho
- HLW Technology Development Institute, Korea Radioactive Waste Agency, 174 Gajeong-ro, Yuseong-gu, Daejeon 34129, Republic of Korea
- Min-Woo Lee
- Department of Chemical Engineering, Keimyung University, 1095 Dalgubeol-daero, Dalseo-gu, Daegu 42601, Republic of Korea
- Dae Sung Lee
- Department of Environmental Engineering, Kyungpook National University, 80 Daehak-ro, Buk-gu, Daegu 41566, Republic of Korea

10
Cysewski P, Jeliński T, Przybyłek M. Experimental and Theoretical Insights into the Intermolecular Interactions in Saturated Systems of Dapsone in Conventional and Deep Eutectic Solvents. Molecules 2024; 29:1743. [PMID: 38675562 PMCID: PMC11051893 DOI: 10.3390/molecules29081743] [Received: 03/14/2024] [Revised: 04/05/2024] [Accepted: 04/10/2024] [Indexed: 04/28/2024]
Abstract
Solubility is not only a crucial physicochemical property for laboratory practice but also provides valuable insight into the mechanism of saturated system organization, as a measure of the interplay between various intermolecular interactions. These data are particularly important for active pharmaceutical ingredients (APIs) such as dapsone, a commonly used anti-inflammatory and antimicrobial agent whose low solubility hampers its efficient application. In this project, deep eutectic solvents (DESs) composed of choline chloride and one of six polyols were used as solubilizing agents for dapsone as an alternative to traditional solvents. Additionally, water-DES mixtures were studied as ternary solvents. The solubility of dapsone in these systems was determined spectrophotometrically. This study also used the COSMO-RS framework to analyze the intermolecular interactions, not only in the studied eutectic systems but also in a wide range of systems reported in the literature. The intermolecular interactions were quantified as affinity values, which correspond to the Gibbs free energy of pair formation between dapsone molecules and the constituents of regular solvents and choline chloride-based deep eutectic solvents. The patterns of solute-solute, solute-solvent, and solvent-solvent interactions that affect solubility were identified using Orange data mining software (version 3.36.2). Finally, the computed affinity values were used as descriptors for machine learning. The analysis of how intermolecular interactions affect dapsone solubility in neat solvents, binary organic solvent mixtures, and deep eutectic solvents underscored the crucial role of dapsone self-association and provided valuable insights into complex solubility phenomena. Solvent-solvent diversity was also identified as a factor determining dapsone solubility. The nu-support vector regression (NuSVR) model, in conjunction with unique molecular descriptors, showed exceptional predictive accuracy. Overall, this study underscores the power of computed molecular characteristics and machine learning models for unraveling complex molecular interactions, thereby advancing the understanding of solubility phenomena within the scientific community.
Affiliation(s)
- Piotr Cysewski
- Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-096 Bydgoszcz, Poland (shared with T. Jeliński and M. Przybyłek)

11
Llompart P, Minoletti C, Baybekov S, Horvath D, Marcou G, Varnek A. Will we ever be able to accurately predict solubility? Sci Data 2024; 11:303. [PMID: 38499581 PMCID: PMC10948805 DOI: 10.1038/s41597-024-03105-6] [Received: 09/01/2023] [Accepted: 02/29/2024] [Indexed: 03/20/2024]
Abstract
Accurate prediction of thermodynamic solubility by machine learning remains a challenge. Recent models often display good performance, but their reliability may be deceptive when they are used prospectively. This study investigates the origins of these discrepancies along three directions: a historical perspective, an analysis of the aqueous solubility dataverse, and data quality. We investigated over 20 years of published solubility datasets and models, highlighting overlooked datasets and the overlaps between popular sets. We benchmarked recently published models on a novel curated solubility dataset and report poor performance. We also propose a workflow to curate aqueous solubility data, aimed at producing useful models for bench chemists. Our results demonstrate that some state-of-the-art models are not ready for public use because they lack a well-defined applicability domain and overlook historical data sources. We report the impact of factors influencing the utility of the models: interlaboratory standard deviation, ionic state of the solute, and data sources. The models obtained herein and the quality-assessed datasets are publicly available.
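Interlaboratory standard deviation is named above as a factor affecting model utility. The sketch below shows one common curation step built on that idea, under stated assumptions: replicate log S values for the same compound are averaged, and compounds whose sample standard deviation exceeds a threshold are dropped. The 0.5 log unit threshold, the function name `curate_solubility` and the example records are hypothetical; this is not the authors' published workflow.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def curate_solubility(
    records: Iterable[Tuple[str, float]], max_sd: float = 0.5
) -> Dict[str, Tuple[float, Optional[float], int]]:
    """Merge replicate log S values per compound and drop inconsistent ones.

    Returns {compound_id: (mean log S, standard deviation or None, n)}.
    """
    grouped: Dict[str, List[float]] = defaultdict(list)
    for compound_id, log_s in records:
        grouped[compound_id].append(log_s)
    curated: Dict[str, Tuple[float, Optional[float], int]] = {}
    for compound_id, values in grouped.items():
        # Sample standard deviation needs at least two measurements
        sd = statistics.stdev(values) if len(values) > 1 else None
        if sd is not None and sd > max_sd:
            continue  # sources disagree too much to trust a consensus value
        curated[compound_id] = (statistics.mean(values), sd, len(values))
    return curated


# Hypothetical records: (compound identifier, measured log S)
data = [("A", -2.0), ("A", -2.2), ("B", -4.0), ("B", -5.5), ("C", -3.1)]
print(curate_solubility(data))
```

Single measurements are kept here with no standard deviation attached; a stricter workflow might flag or exclude them instead, which is a design choice that trades data quantity for confidence.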
Affiliation(s)
- P Llompart
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- IDD/CADD, Sanofi, Vitry-Sur-Seine, France
- S Baybekov
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- D Horvath
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- G Marcou
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
- A Varnek
- Laboratory of Chemoinformatics, UMR7140, University of Strasbourg, Strasbourg, France
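Llompart et al. identify the interlaboratory standard deviation as one factor that limits model utility. The sketch below is a hedged illustration of that idea only, not the authors' curation workflow. It averages repeated logS measurements per compound and sets aside compounds whose measurements disagree by more than a cutoff. The function name `curate_solubility` and the 0.5 log-unit threshold are assumptions chosen for the example.

```python
from statistics import mean, stdev

def curate_solubility(records, max_sd=0.5):
    """Merge repeated logS measurements and flag inconsistent compounds.

    records: iterable of (compound_id, logS) pairs, possibly from several sources.
    max_sd:  hypothetical cutoff on the between-source standard deviation (log units).
    Returns (kept, rejected): kept maps compound_id -> mean logS,
    rejected maps compound_id -> observed standard deviation.
    """
    grouped = {}
    for compound_id, log_s in records:
        grouped.setdefault(compound_id, []).append(log_s)

    kept, rejected = {}, {}
    for compound_id, values in grouped.items():
        # A single measurement gives no spread estimate, so it is kept as-is.
        spread = stdev(values) if len(values) > 1 else 0.0
        if spread <= max_sd:
            kept[compound_id] = mean(values)
        else:
            rejected[compound_id] = spread
    return kept, rejected


records = [("A", -2.0), ("A", -2.2), ("B", -4.0), ("B", -1.0), ("C", -3.1)]
kept, rejected = curate_solubility(records)
print(kept)      # A and C kept
print(rejected)  # B rejected because its two reports disagree strongly
```

Two reports of -2.0 and -2.2 for one compound are merged to their mean. Reports of -4.0 and -1.0 for another compound are flagged for review instead of being averaged into a misleading value.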

12
Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 2024; 58:2817-2829. [PMID: 38291630 DOI: 10.1021/acs.est.3c09779] [Indexed: 02/01/2024]
Abstract
Over the past few decades, extensive research has indicated that exposure to bisphenol A (BPA) increases the health risks in humans. Toxicological studies have demonstrated that BPA can bind to the androgen receptor (AR), resulting in endocrine-disrupting effects. In recent investigations, many alternatives to BPA have been detected in various environmental media as major pollutants. However, related experimental evaluations of BPA alternatives have not been systematically implemented for the assessment of chemical safety and the effects of structural characteristics on the antagonistic activity of the AR. To promote the green development of BPA alternatives, high-throughput toxicological screening is fundamental for prioritizing chemical tests. Therefore, we proposed a hybrid deep learning architecture that combines molecular descriptors and molecular graphs to predict AR antagonistic activity. Compared to previous models, this hybrid architecture can extract substantial chemical information from various molecular representations to improve the model's generalization ability for BPA alternatives. Our predictions suggest that lignin-derivable bisguaiacols, as alternatives to BPA, are likely to be nonantagonist for AR compared to bisphenol analogues. Additionally, molecular dynamics (MD) simulations identified the dihydrotestosterone-bound pocket, rather than the surface, as the major binding site of bisphenol analogues. The conformational changes of key helix H12 from an agonistic to an antagonistic conformation can be evaluated qualitatively by accelerated MD simulations to explain the underlying mechanism. Overall, our computational study is helpful for toxicological screening of BPA alternatives and the design of environmentally friendly BPA alternatives.
Affiliation(s)
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
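Yang et al. combine molecular descriptors with molecular graphs in a single hybrid model. The sketch below shows only the general idea, under simplifying assumptions, and all function names are hypothetical:

- one round of untrained mean-neighbour message passing;
- a sum readout;
- concatenation with a precomputed descriptor vector.

A real model would apply learned weight matrices and nonlinearities at each step.

```python
def message_pass(node_feats, edges):
    """One round of mean-aggregation message passing (no learned weights)."""
    n = len(node_feats)
    neighbors = [[] for _ in range(n)]
    for i, j in edges:  # bonds are undirected
        neighbors[i].append(j)
        neighbors[j].append(i)
    updated = []
    for i in range(n):
        dim = len(node_feats[i])
        if neighbors[i]:
            agg = [sum(node_feats[j][k] for j in neighbors[i]) / len(neighbors[i])
                   for k in range(dim)]
        else:
            agg = [0.0] * dim
        # New atom state = own features + mean of neighbour features.
        updated.append([node_feats[i][k] + agg[k] for k in range(dim)])
    return updated

def readout(node_feats):
    """Sum-pool atom states into one graph-level vector."""
    dim = len(node_feats[0])
    return [sum(f[k] for f in node_feats) for k in range(dim)]

def hybrid_representation(node_feats, edges, descriptors):
    """Concatenate the graph embedding with a descriptor vector."""
    return readout(message_pass(node_feats, edges)) + list(descriptors)


atom_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # toy one-hot atom features
bonds = [(0, 1), (1, 2)]                           # three-atom chain
print(hybrid_representation(atom_feats, bonds, [2.5]))
```

For this three-atom chain with one descriptor value of 2.5, the combined vector is `[3.0, 3.0, 2.5]`. The first two entries come from the graph branch and the last from the descriptor branch.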

13
Adebar N, Keupp J, Emenike VN, Kühlborn J, Vom Dahl L, Möckel R, Smiatek J. Scientific Deep Machine Learning Concepts for the Prediction of Concentration Profiles and Chemical Reaction Kinetics: Consideration of Reaction Conditions. J Phys Chem A 2024; 128:929-944. [PMID: 38271617 DOI: 10.1021/acs.jpca.3c06265] [Indexed: 01/27/2024]
Abstract
Emerging concepts from scientific deep machine learning such as physics-informed neural networks (PINNs) enable a data-driven approach for the study of complex kinetic problems. We present an extended framework that combines the advantages of PINNs with the detailed consideration of experimental parameter variations for the simulation and prediction of chemical reaction kinetics. The approach is based on truncated Taylor series expansions for the underlying fundamental equations, whereby the external variations can be interpreted as perturbations of the kinetic parameters. Accordingly, our method allows for an efficient consideration of experimental parameter settings and their influence on the concentration profiles and reaction kinetics. A particular advantage of our approach, in addition to the consideration of univariate and multivariate parameter variations, is the robust model-based exploration of the parameter space to determine optimal reaction conditions in combination with advanced reaction insights. The benefits of this concept are demonstrated for higher-order chemical reactions including catalytic and oscillatory systems in combination with small amounts of training data. All predicted values show a high level of accuracy, demonstrating the broad applicability and flexibility of our approach.
Affiliation(s)
- Niklas Adebar
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
- Julian Keupp
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
- Victor N Emenike
- HP BioP Launch and Innovation, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
- Jonas Kühlborn
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
- Lisa Vom Dahl
- Development NCE, Analytical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
- Robert Möckel
- Development NCE, Chemical Development, Boehringer Ingelheim Pharma GmbH & Co. KG, D-55218 Ingelheim (Rhein), Germany
- Jens Smiatek
- Institute for Computational Physics, University of Stuttgart, D-70569 Stuttgart, Germany
- Development NCE, Strategy NCEs, Boehringer Ingelheim Pharma GmbH & Co. KG, D-88397 Biberach (Riss), Germany
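Adebar et al. treat changes in reaction conditions as perturbations of kinetic parameters, expanded in truncated Taylor series. The sketch below is not their PINN framework. It only illustrates the expansion idea for a first-order reaction, assuming an Arrhenius rate constant k(T) = A exp(-Ea/(RT)) expanded to first or second order in a temperature offset dT around a reference T0. The parameter values are invented for illustration.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1

def arrhenius(A, Ea, T):
    """Exact Arrhenius rate constant k(T) = A * exp(-Ea / (R T))."""
    return A * math.exp(-Ea / (R * T))

def k_taylor(A, Ea, T0, dT, order=2):
    """Truncated Taylor expansion of k(T0 + dT) around the reference T0."""
    k0 = arrhenius(A, Ea, T0)
    g = Ea / (R * T0 ** 2)      # d ln k / dT at T0, so dk/dT = k0 * g
    k = k0 + k0 * g * dT        # first-order term
    if order >= 2:
        d2k = k0 * (g ** 2 - 2.0 * Ea / (R * T0 ** 3))  # second derivative at T0
        k += 0.5 * d2k * dT ** 2
    return k

def concentration(c0, k, t):
    """Analytical solution of first-order decay dc/dt = -k c."""
    return c0 * math.exp(-k * t)


A, Ea, T0, dT = 1e7, 5.0e4, 300.0, 2.0   # illustrative values only
exact = arrhenius(A, Ea, T0 + dT)
for order in (1, 2):
    approx = k_taylor(A, Ea, T0, dT, order)
    print(order, abs(approx - exact) / exact)  # relative error of the expansion
# Remaining fraction of reactant after 10 s, using the second-order estimate of k.
print(concentration(1.0, k_taylor(A, Ea, T0, dT), 10.0))
```

With these values the second-order expansion matches the exact rate constant to roughly 0.03 %, while the first-order term alone is off by about 0.7 %. This is the kind of trade-off the truncation order controls.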

14
Setiya A, Jani V, Sonavane U, Joshi R. MolToxPred: small molecule toxicity prediction using machine learning approach. RSC Adv 2024; 14:4201-4220. [PMID: 38292268 PMCID: PMC10826801 DOI: 10.1039/d3ra07322j] [Received: 10/27/2023] [Accepted: 01/23/2024] [Indexed: 02/01/2024]
Abstract
Different types of chemicals and products may exhibit various health risks when administered into the human body. For toxicity reasons, the number of new drugs entering the market through the conventional drug development process has been reduced over the years. However, with the advent of big data and artificial intelligence, machine learning techniques have emerged as a potential solution for predicting toxicity and ensuring efficient drug development and chemical safety. An ML model for toxicity prediction can reduce experimental costs and time while addressing ethical concerns by drastically reducing the need for animals and clinical trials. Herein, MolToxPred, an ML-based tool, has been developed using a stacked model approach to predict the potential toxicity of small molecules and metabolites. The stacked model consists of random forest, multi-layer perceptron, and LightGBM as base classifiers and Logistic Regression as the meta classifier. For training and validation purposes, a comprehensive set of toxic and non-toxic molecules was curated. Different structural and physicochemical-based features in the form of molecular descriptors and fingerprints were employed. MolToxPred utilizes a comprehensive feature selection process and optimizes its hyperparameters through Bayesian optimization with stratified 5-fold cross-validation. In the evaluation phase, MolToxPred achieved an AUROC of 87.76% on the test set and 88.84% on an external validation set. The McNemar test was used as the post-hoc test to determine whether the stacked model's performance was significantly different from that of the base learners. The developed stacked model outperformed its base classifiers and an existing tool in the literature, reaffirming its better performance. The hypothesis is that the incorporation of a diverse set of data, the subsequent feature selection, and a stacked ensemble approach give MolToxPred the edge over other methods.
In addition to this, an attempt has been made to identify structural alerts responsible for endpoints of the Tox21 data to determine the association of a molecule with a plausible downstream pathway of action. MolToxPred may be helpful for drug discovery and regulatory pipelines in pharmaceutical and other industries for in silico toxicity prediction of small molecule candidates.
Affiliation(s)
- Anjali Setiya
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Vinod Jani
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Uddhavesh Sonavane
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
- Rajendra Joshi
- HPC-Medical & Bioinformatics Applications Group, Centre for Development of Advanced Computing (C-DAC) Innovation Park, Panchawati, Pashan Pune 411008 India
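Setiya et al. use the McNemar test to check whether the stacked model's predictions differ significantly from those of its base learners. Below is a minimal sketch of the textbook test, with an optional continuity correction. The abstract does not say which variant was used, so the `correction` flag and the function name are assumptions. Only discordant pairs enter the statistic: these are samples where exactly one of the two classifiers is correct.

```python
import math

def mcnemar(y_true, pred_a, pred_b, correction=True):
    """McNemar's chi-squared test for two classifiers scored on the same samples.

    Only discordant pairs matter:
      only_a = samples classifier A gets right and classifier B gets wrong
      only_b = samples classifier B gets right and classifier A gets wrong
    Returns (statistic, p_value), with the p-value taken from a chi-squared
    distribution with one degree of freedom.
    """
    only_a = only_b = 0
    for truth, a, b in zip(y_true, pred_a, pred_b):
        if a == truth and b != truth:
            only_a += 1
        elif b == truth and a != truth:
            only_b += 1
    n_discordant = only_a + only_b
    if n_discordant == 0:
        return 0.0, 1.0  # the classifiers never disagree in correctness
    diff = abs(only_a - only_b)
    if correction:
        diff = max(diff - 1, 0)  # continuity correction
    statistic = diff ** 2 / n_discordant
    # Chi-squared (1 df) survival function: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


y = [1] * 12
stacked = [1] * 10 + [0] * 2   # hypothetical predictions of the stacked model
base = [0] * 10 + [1] * 2      # hypothetical predictions of one base learner
print(mcnemar(y, stacked, base))
```

Here model A alone is correct on 10 molecules and model B alone on 2. The corrected statistic is (|10 - 2| - 1)^2 / 12, about 4.08, which gives p of about 0.04.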

15
Kim Y, Jung H, Kumar S, Paton RS, Kim S. Designing solvent systems using self-evolving solubility databases and graph neural networks. Chem Sci 2024; 15:923-939. [PMID: 38239675 PMCID: PMC10793204 DOI: 10.1039/d3sc03468b] [Received: 07/06/2023] [Accepted: 12/04/2023] [Indexed: 01/22/2024]
Abstract
Designing solvent systems is key to achieving the facile synthesis and separation of desired products from chemical processes, so many machine learning models have been developed to predict solubilities. However, breakthroughs are needed to address deficiencies in the model's predictive accuracy and generalizability; this can be addressed by expanding and integrating experimental and computational solubility databases. To maximize predictive accuracy, these two databases should not be trained separately, and they should not be simply combined without reconciling the discrepancies from different magnitudes of errors and uncertainties. Here, we introduce self-evolving solubility databases and graph neural networks developed through semi-supervised self-training approaches. Solubilities from quantum-mechanical calculations are referred to during semi-supervised learning, but they are not directly added to the experimental database. Dataset augmentation is performed from 11 637 experimental solubilities to >900 000 data points in the integrated database, while correcting for the discrepancies between experiment and computation. Our model was successfully applied to study solvent selection in organic reactions and separation processes. The accuracy (mean absolute error around 0.2 kcal mol⁻¹ for the test set) is quantitatively useful in exploring Linear Free Energy Relationships between reaction rates and solvation free energies for 11 organic reactions. Our model also accurately predicted the partition coefficients of lignin-derived monomers and drug-like molecules. While there is room for expanding solubility predictions to transition states, radicals, charged species, and organometallic complexes, this approach will be attractive to predictive chemistry areas where experimental, computational, and other heterogeneous data should be combined.
Affiliation(s)
- Yeonjoon Kim
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Department of Chemistry, Pukyong National University Busan 48513 Republic of Korea
- Hojin Jung
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Sabari Kumar
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Robert S Paton
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
- Seonah Kim
- Department of Chemistry, Colorado State University Fort Collins CO 80523 USA
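Kim et al. work with solvation free energies and use their model to predict partition coefficients. The standard thermodynamic link is log10 P = (dG_solv,water - dG_solv,octanol) / (RT ln 10), assuming the same standard-state convention in both phases. The sketch below applies it to octanol/water as an example; the function names are hypothetical. It also shows how a free-energy error of about 0.2 kcal mol⁻¹ translates into log units at 298.15 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal mol^-1 K^-1

def rt_ln10(T=298.15):
    """RT ln 10 in kcal/mol: the free energy per decade of an equilibrium constant."""
    return R_KCAL * T * math.log(10)

def log_partition(dg_solv_water, dg_solv_octanol, T=298.15):
    """log10 P(octanol/water) from solvation free energies given in kcal/mol.

    Transfer water -> octanol: dG_transfer = dG_solv_octanol - dG_solv_water,
    and log10 P = -dG_transfer / (RT ln 10).
    """
    return (dg_solv_water - dg_solv_octanol) / rt_ln10(T)

def free_energy_error_in_log_units(err_kcal, T=298.15):
    """Convert a free-energy error (kcal/mol) into log10 units."""
    return err_kcal / rt_ln10(T)


# Hypothetical solvation free energies for one solute (kcal/mol).
print(log_partition(dg_solv_water=-6.0, dg_solv_octanol=-8.0))
print(free_energy_error_in_log_units(0.2))
```

At 298.15 K, RT ln 10 is about 1.364 kcal mol⁻¹. A 0.2 kcal mol⁻¹ error in a free-energy difference therefore corresponds to roughly 0.15 log units in the resulting partition coefficient.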

16
Hong RS, Rojas AV, Bhardwaj RM, Wang L, Mattei A, Abraham NS, Cusack KP, Pierce MO, Mondal S, Mehio N, Bordawekar S, Kym PR, Abel R, Sheikh AY. Free Energy Perturbation Approach for Accurate Crystalline Aqueous Solubility Predictions. J Med Chem 2023; 66:15883-15893. [PMID: 38016916 DOI: 10.1021/acs.jmedchem.3c01339] [Indexed: 11/30/2023]
Abstract
Early assessment of crystalline thermodynamic solubility continues to be elusive for drug discovery and development despite its critical importance, especially for the ever-increasing fraction of poorly soluble drug candidates. Here we present a detailed evaluation of a physics-based free energy perturbation (FEP+) approach for computing the thermodynamic aqueous solubility. The predictive power of this approach is assessed across diverse chemical spaces, spanning pharmaceutically relevant literature compounds and more complex AbbVie compounds. Our approach achieves predictive (RMSE = 0.86) and differentiating power (R² = 0.69) and therefore provides notably improved correlations to experimental solubility compared to state-of-the-art machine learning approaches that utilize quantum mechanics-based descriptors. The importance of explicit considerations of crystalline packing in predicting solubility by the FEP+ approach is also highlighted in this study. Finally, we show how computed energetics, including hydration and sublimation free energies, can provide further insights into molecule design to feed the medicinal chemistry DMTA cycle.
Affiliation(s)
- Richard S Hong
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Ana V Rojas
- Schrödinger Inc., 1540 Broadway 24th Floor, New York, New York 10036, United States
- Rajni Miglani Bhardwaj
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Lingle Wang
- Schrödinger Inc., 1540 Broadway 24th Floor, New York, New York 10036, United States
- Alessandra Mattei
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Nathan S Abraham
- Ventus Therapeutics, 100 Beaver St, Waltham, Massachusetts 02453, United States
- Kevin P Cusack
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- M Olivia Pierce
- Bristol Myers Squibb, 100 Binney Street, Cambridge, Massachusetts 02142, United States
- Sayan Mondal
- Schrödinger Inc., 1540 Broadway 24th Floor, New York, New York 10036, United States
- Nada Mehio
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Shailendra Bordawekar
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Philip R Kym
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
- Robert Abel
- Schrödinger Inc., 1540 Broadway 24th Floor, New York, New York 10036, United States
- Ahmad Y Sheikh
- AbbVie Inc., Research & Development, 1 N Waukegan Road, North Chicago, Illinois 60064, United States
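Hong et al. report predictive power as RMSE and differentiating power as R². The sketch below computes both metrics. It assumes R² here means the squared Pearson correlation between predicted and measured values; the study may define it differently. The example shows why the two metrics are reported separately. A prediction that is offset by a constant tracks the data perfectly (R² = 1) yet still carries a sizeable RMSE.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error: the typical size of a prediction error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def pearson_r2(y_true, y_pred):
    """Squared Pearson correlation: strength of the linear relationship."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("R^2 is undefined when either series is constant")
    return cov ** 2 / (var_t * var_p)


measured = [-4.0, -3.0, -2.0]
shifted = [-3.0, -2.0, -1.0]   # every prediction is 1 log unit too high
print(rmse(measured, shifted), pearson_r2(measured, shifted))
```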

17
Murray JD, Lange JJ, Bennett-Lenane H, Holm R, Kuentz M, O'Dwyer PJ, Griffin BT. Advancing algorithmic drug product development: Recommendations for machine learning approaches in drug formulation. Eur J Pharm Sci 2023; 191:106562. [PMID: 37562550 DOI: 10.1016/j.ejps.2023.106562] [Received: 05/15/2023] [Revised: 07/09/2023] [Accepted: 08/07/2023] [Indexed: 08/12/2023]
Abstract
Artificial intelligence is a rapidly expanding area of research, with the disruptive potential to transform traditional approaches in the pharmaceutical industry, from drug discovery and development to clinical practice. Machine learning, a subfield of artificial intelligence, has fundamentally transformed in silico modelling and has the capacity to streamline clinical translation. This paper reviews data-driven modelling methodologies with a focus on drug formulation development. Despite recent advances, there is limited modelling guidance specific to drug product development and a trend towards suboptimal modelling practices, resulting in models that may not give reliable predictions in practice. There is an overwhelming focus on benchtop experimental outcomes obtained for a specific modelling aim, leaving the capabilities of data scraping or the use of combined modelling approaches yet to be fully explored. Moreover, the preference for high accuracy can lead to a reliance on black box methods over interpretable models. This further limits the widespread adoption of machine learning as black boxes yield models that cannot be easily understood for the purposes of enhancing product performance. In this review, recommendations for conducting machine learning research for drug product development to ensure trustworthiness, transparency, and reliability of the models produced are presented. Finally, possible future directions on how research in this area might develop are discussed to aim for models that provide useful and robust guidance to formulators.
Affiliation(s)
- Jack D Murray
- School of Pharmacy, University College Cork, Cork, Ireland
- Justus J Lange
- School of Pharmacy, University College Cork, Cork, Ireland; Roche Pharmaceutical Research & Early Development, Pre-Clinical CMC, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd, Grenzacherstrasse 124, Basel, Switzerland
- René Holm
- Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Campusvej 55, Odense 5230, Denmark
- Martin Kuentz
- School of Life Sciences, University of Applied Sciences and Arts Northwestern Switzerland, Muttenz CH 4132, Switzerland

18
Liu J, Lei X, Ji C, Pan Y. Fragment-pair based drug molecule solubility prediction through attention mechanism. Front Pharmacol 2023; 14:1255181. [PMID: 37881183 PMCID: PMC10595153 DOI: 10.3389/fphar.2023.1255181] [Received: 07/08/2023] [Accepted: 09/26/2023] [Indexed: 10/27/2023]
Abstract
The purpose of drug discovery is to identify new drugs, and the solubility of drug molecules is an important physicochemical property in medicinal chemistry that plays a crucial role in drug discovery. In solubility prediction, high-precision computational methods can significantly reduce the experimental costs and time associated with drug development. Therefore, artificial intelligence technologies have been widely used for solubility prediction. This study used an attention mechanism in the deep learning model to consider the atomic-level features of the molecules, and used gated recurrent neural networks to aggregate vectors between layers. It also used molecular fragmentation to divide the complete molecule into pairs of fragments, extracted features from each fragment pair, and finally fused these features to predict the solubility of drug molecules. We compared our method with five existing models using two performance evaluation metrics, demonstrating that our method has better performance and greater robustness.
Affiliation(s)
- Jianping Liu
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi’an, China
- Chunyan Ji
- Computer Science Department, BNU-HKBU United International College, Zhuhai, China
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institute of Advanced Technology, Shenzhen, China
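Liu et al. apply an attention mechanism to atom-level features. The sketch below shows a generic attention readout, not the paper's fragment-pair and GRU architecture. Each atom vector is scored against a weight vector `w`, the scores are normalised with a softmax, and the atom vectors are summed using those weights. In a trained model `w` would be learned; here it is a hypothetical fixed input.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(atom_feats, w):
    """Attention-weighted readout over atom feature vectors.

    atom_feats: list of equal-length feature vectors, one per atom.
    w: scoring vector (learned in a real model; fixed here for illustration).
    Returns (pooled_vector, attention_weights).
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in atom_feats]
    alpha = softmax(scores)
    dim = len(atom_feats[0])
    pooled = [sum(a * x[k] for a, x in zip(alpha, atom_feats)) for k in range(dim)]
    return pooled, alpha


atoms = [[1.0, 0.0], [0.0, 1.0]]
print(attention_pool(atoms, [0.0, 0.0]))   # equal weights
print(attention_pool(atoms, [10.0, 0.0]))  # weight concentrates on the first atom
```

With `w = [0, 0]` every atom receives weight 0.5. Raising the first component of `w` shifts nearly all of the weight to atoms whose first feature is large, which is how attention lets a model emphasise particular atoms.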

19
Cysewski P, Przybyłek M, Jeliński T. Intermolecular Interactions as a Measure of Dapsone Solubility in Neat Solvents and Binary Solvent Mixtures. Materials (Basel) 2023; 16:6336. [PMID: 37763610 PMCID: PMC10532775 DOI: 10.3390/ma16186336] [Received: 09/07/2023] [Revised: 09/19/2023] [Accepted: 09/20/2023] [Indexed: 09/29/2023]
Abstract
Dapsone is an effective antibacterial drug used to treat a variety of conditions. However, the aqueous solubility of this drug is limited, as is its permeability. This study expands the available solubility data pool for dapsone by measuring its solubility in several pure organic solvents: N-methyl-2-pyrrolidone (CAS: 872-50-4), dimethyl sulfoxide (CAS: 67-68-5), 4-formylmorpholine (CAS: 4394-85-8), tetraethylene pentamine (CAS: 112-57-2), and diethylene glycol bis(3-aminopropyl) ether (CAS: 4246-51-9). Furthermore, the study proposes the use of intermolecular interactions as molecular descriptors to predict the solubility of dapsone in neat solvents and binary mixtures using machine learning models. An ensemble of regressors was used, including support vector machines, random forests, gradient boosting, and neural networks. Affinities of dapsone to solvent molecules were calculated using COSMO-RS and used as input for model training. Due to the polymorphic nature of dapsone, fusion data are not available, which prohibits the direct use of COSMO-RS for solubility calculations. Therefore, a consonance solvent approach was tested, which allows an indirect estimation of the fusion properties. Unfortunately, the resulting accuracy is unsatisfactory. In contrast, the developed regressors showed high predictive potential. This work documents that intermolecular interactions characterized by solute-solvent contacts can be considered valuable molecular descriptors for solubility modeling and that the wealth of encoded information is sufficient for solubility predictions for new systems, including those for which experimental measurements of thermodynamic properties are unavailable.
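The dapsone study combines several regressors: support vector machines, random forests, gradient boosting and neural networks. The abstract does not state how the member predictions were combined. The sketch below therefore assumes simple averaging, and reports the population standard deviation across members as a rough disagreement signal. The toy lambdas stand in for fitted models and the input vector stands in for COSMO-RS affinity descriptors; both are hypothetical.

```python
from statistics import mean, pstdev

def ensemble_predict(models, x):
    """Average the outputs of several fitted regressors for one input vector.

    models: callables mapping a descriptor vector to a predicted solubility.
    Returns (mean_prediction, spread); the spread is the population standard
    deviation across members, a rough sign of how much they disagree.
    """
    preds = [model(x) for model in models]
    return mean(preds), pstdev(preds)


# Hypothetical stand-ins for three fitted regressors.
members = [
    lambda x: -1.0 * x[0],
    lambda x: -1.25 * x[0],
    lambda x: -1.5 * x[0],
]
affinities = [2.0]  # hypothetical solute-solvent affinity descriptor
print(ensemble_predict(members, affinities))
```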
Affiliation(s)
- Piotr Cysewski
- Department of Physical Chemistry, Pharmacy Faculty, Collegium Medicum of Bydgoszcz, Nicolaus Copernicus University in Toruń, Kurpińskiego 5, 85-096 Bydgoszcz, Poland

20
Wojtuch A, Danel T, Podlewska S, Maziarka Ł. Extended study on atomic featurization in graph neural networks for molecular property prediction. J Cheminform 2023; 15:81. [PMID: 37726841 PMCID: PMC10507875 DOI: 10.1186/s13321-023-00751-7] [Received: 03/27/2023] [Accepted: 08/23/2023] [Indexed: 09/21/2023]
Abstract
Graph neural networks have recently become a standard method for analyzing chemical compounds. In the field of molecular property prediction, the emphasis is now on designing new model architectures, and the importance of atom featurization is oftentimes belittled. When contrasting two graph neural networks, the use of different representations possibly leads to incorrect attribution of the results solely to the network architecture. To better understand this issue, we compare multiple atom representations by evaluating them on the prediction of free energy, solubility, and metabolic stability using graph convolutional networks. We discover that the choice of atom representation has a significant impact on model performance and that the optimal subset of features is task-specific. Additional experiments involving more sophisticated architectures, including graph transformers, support these findings. Moreover, we demonstrate that some commonly used atom features, such as the number of neighbors or the number of hydrogens, can be easily predicted using only information about bonds and atom type, yet their explicit inclusion in the representation has a positive impact on model performance. Finally, we explain the predictions of the best-performing models to better understand how they utilize the available atomic features.
Affiliation(s)
- Agnieszka Wojtuch
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
- Tomasz Danel
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
- Sabina Podlewska
- Maj Institute of Pharmacology, Polish Academy of Sciences, Smętna 12, 31-343, Kraków, Poland
- Łukasz Maziarka
- Faculty of Mathematics and Computer Science, Jagiellonian University, Łojasiewicza 6, 30-348, Kraków, Poland
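Wojtuch et al. note that the neighbour count and the hydrogen count can be recovered from bonds and atom type alone. A minimal sketch of that observation builds per-atom feature vectors (one-hot type, degree, implicit hydrogens) from a bond list. It assumes neutral atoms, single bonds only, and a deliberately tiny atom-type vocabulary. The names and the valence table are illustrative and are not the feature sets compared in the paper.

```python
ATOM_TYPES = ["C", "N", "O", "other"]        # deliberately tiny vocabulary
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}   # neutral atoms, single bonds assumed

def one_hot(value, choices):
    """One-hot encode value; anything unknown goes to the last ('other') slot."""
    vec = [0] * len(choices)
    idx = choices.index(value) if value in choices else len(choices) - 1
    vec[idx] = 1
    return vec

def featurize(atoms, bonds):
    """Per-atom features: one-hot type, degree, implicit hydrogen count.

    atoms: element symbols of the heavy atoms.
    bonds: (i, j) index pairs, all treated as single bonds.
    Degree and hydrogen count are derived from the bond list, not given.
    """
    degree = [0] * len(atoms)
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    feats = []
    for idx, symbol in enumerate(atoms):
        n_h = max(DEFAULT_VALENCE.get(symbol, 0) - degree[idx], 0)
        feats.append(one_hot(symbol, ATOM_TYPES) + [degree[idx], n_h])
    return feats


# Heavy atoms of ethanol (CH3-CH2-OH).
print(featurize(["C", "C", "O"], [(0, 1), (1, 2)]))
```

For ethanol this yields `[1, 0, 0, 0, 1, 3]`, `[1, 0, 0, 0, 2, 2]` and `[0, 0, 1, 0, 1, 1]`: the methyl carbon has one neighbour and three hydrogens, the methylene carbon two of each, and the oxygen one of each. The paper reports that including such derivable features explicitly still helped model performance.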

21
Teng S, Yin C, Wang Y, Chen X, Yan Z, Cui L, Wei L. MolFPG: Multi-level fingerprint-based Graph Transformer for accurate and robust drug toxicity prediction. Comput Biol Med 2023; 164:106904. [PMID: 37453376 DOI: 10.1016/j.compbiomed.2023.106904] [Received: 02/23/2023] [Revised: 03/20/2023] [Accepted: 04/10/2023] [Indexed: 07/18/2023]
Abstract
Drug toxicity prediction is essential to drug development, which can help screen compounds with potential toxicity and reduce the cost and risk of animal experiments and clinical trials. However, traditional handcrafted feature-based and molecular-graph-based approaches are insufficient for molecular representation learning. To address the problem, we developed an innovative molecular fingerprint Graph Transformer framework (MolFPG) with a global-aware module for interpretable toxicity prediction. Our approach encodes compounds using multiple molecular fingerprinting techniques and integrates Graph Transformer-based molecular representation for feature learning and toxicity prediction. Experimental results show that our proposed approach has high accuracy and reliability in predicting drug toxicity. In addition, we explored the relationship between drug features and toxicity through an interpretive analysis approach, which improved the interpretability of the approach. Our results highlight the potential of Graph Transformers and multi-level fingerprints for accelerating the drug discovery process by reliably and effectively flagging drug safety concerns. We believe that our study will provide vital support and reference for further development in the field of drug development and toxicity assessment.
Affiliation(s)
- Saisai Teng
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Chenglin Yin
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Yu Wang
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Zhongmin Yan
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Lizhen Cui
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China
- Leyi Wei
- School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan, China

22
Tseng YJ, Chuang PJ, Appell M. When Machine Learning and Deep Learning Come to the Big Data in Food Chemistry. ACS Omega 2023; 8:15854-15864. [PMID: 37179635 PMCID: PMC10173424 DOI: 10.1021/acsomega.2c07722] [Received: 12/06/2022] [Accepted: 04/07/2023] [Indexed: 05/15/2023]
Abstract
Since the first food database was released over one hundred years ago, food databases have become more diversified, including food composition databases, food flavor databases, and food chemical compound databases. These databases provide detailed information about the nutritional compositions, flavor molecules, and chemical properties of various food compounds. As artificial intelligence (AI) is becoming popular in every field, AI methods can also be applied to food industry research and molecular chemistry. Machine learning and deep learning are valuable tools for analyzing big data sources such as food databases. Studies investigating food compositions, flavors, and chemical compounds with AI concepts and learning methods have emerged in the past few years. This review illustrates several well-known food databases, focusing on their primary contents, interfaces, and other essential features. We also introduce some of the most common machine learning and deep learning methods. Furthermore, a few studies related to food databases are given as examples, demonstrating their applications in food pairing, food-drug interactions, and molecular modeling. Based on the results of these applications, it is expected that the combination of food databases and AI will play an essential role in food science and food chemistry.
Affiliation(s)
- Yufeng Jane Tseng
- Graduate
Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
| | - Pei-Jiun Chuang
- Graduate
Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, No. 1 Roosevelt Rd. Sec. 4, Taipei 10617, Taiwan
- Michael Appell
- USDA, Agricultural Research Service, National Center for Agricultural Utilization Research, Mycotoxin Prevention and Applied Microbiology Research Unit, 1815 N. University, Peoria, Illinois 61604, United States
23
Conn JM, Carter JW, Conn JJA, Subramanian V, Baxter A, Engkvist O, Llinas A, Ratkova EL, Pickett SD, McDonagh JL, Palmer DS. Blinded Predictions and Post Hoc Analysis of the Second Solubility Challenge Data: Exploring Training Data and Feature Set Selection for Machine and Deep Learning Models. J Chem Inf Model 2023; 63:1099-1113. [PMID: 36758178 PMCID: PMC9976279 DOI: 10.1021/acs.jcim.2c01189] [Indexed: 02/11/2023]
Abstract
Accurate methods to predict solubility from molecular structure are highly sought after in the chemical sciences. To assess the state of the art, the American Chemical Society organized a "Second Solubility Challenge" in 2019, in which competitors were invited to submit blinded predictions of the solubilities of 132 drug-like molecules. In the first part of this article, we describe the development of two models that were submitted to the Blind Challenge in 2019 but which have not previously been reported. These models were based on computationally inexpensive molecular descriptors and traditional machine learning algorithms and were trained on a relatively small data set of 300 molecules. In the second part of the article, to test the hypothesis that predictions would improve with more advanced algorithms and higher volumes of training data, we compare these original predictions with those made after the deadline using deep learning models trained on larger solubility data sets consisting of 2999 and 5697 molecules. The results show that there are several algorithms that are able to obtain near state-of-the-art performance on the solubility challenge data sets, with the best model, a graph convolutional neural network, resulting in an RMSE of 0.86 log units. Critical analysis of the models reveals systematic differences between the performance of models using certain feature sets and training data sets. The results suggest that careful selection of high quality training data from relevant regions of chemical space is critical for prediction accuracy but that other methodological issues remain problematic for machine learning solubility models, such as the difficulty in modeling complex chemical spaces from sparse training data sets.
Affiliation(s)
- Jonathan G. M. Conn
- Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- James W. Carter
- Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- Justin J. A. Conn
- Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
- Vigneshwari Subramanian
- Drug Metabolism and Pharmacokinetics, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Göteborg, Sweden
- Andrew Baxter
- GSK Medicines Research Centre, Gunnels Wood Road, Stevenage SG1 2NY, U.K.
- Ola Engkvist
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, SE-431 50 Göteborg, Sweden
- Department of Computer Science and Engineering, Chalmers University of Technology, SE-412 96 Göteborg, Sweden
- Antonio Llinas
- Drug Metabolism and Pharmacokinetics, Research and Early Development, Respiratory & Immunology, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, SE-431 83 Göteborg, Sweden
- Ekaterina L. Ratkova
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, SE-431 50 Göteborg, Sweden
- Stephen D. Pickett
- Computational Sciences, GlaxoSmithKline R&D Pharmaceuticals, Stevenage SG1 2NY, U.K.
- James L. McDonagh
- IBM Research Europe, Hartree Centre, SciTech Daresbury, Warrington, Cheshire WA4 4AD, U.K.
- David S. Palmer
- Department of Pure and Applied Chemistry, University of Strathclyde, Thomas Graham Building, 295 Cathedral Street, Glasgow G1 1XL, U.K.
24
Cysewski P, Jeliński T, Przybyłek M, Nowak W, Olczak M. Solubility Characteristics of Acetaminophen and Phenacetin in Binary Mixtures of Aqueous Organic Solvents: Experimental and Deep Machine Learning Screening of Green Dissolution Media. Pharmaceutics 2022; 14:pharmaceutics14122828. [PMID: 36559321 PMCID: PMC9781932 DOI: 10.3390/pharmaceutics14122828] [Received: 11/26/2022] [Revised: 12/10/2022] [Accepted: 12/12/2022] [Indexed: 12/23/2022]
Abstract
The solubility of active pharmaceutical ingredients is a mandatory physicochemical characteristic in pharmaceutical practice. However, the number of potential solvents and their mixtures prevents direct measurements of all possible combinations for finding environmentally friendly, operational and cost-effective solubilizers. That is why support from theoretical screening seems to be valuable. Here, a collection of acetaminophen and phenacetin solubility data in neat and binary solvent mixtures was used for the development of a nonlinear deep machine learning model using new intuitive molecular descriptors derived from COSMO-RS computations. The literature dataset was augmented with results of new measurements in aqueous binary mixtures of 4-formylmorpholine, DMSO and DMF. The solubility values back-computed with the developed ensemble of neural networks are in perfect agreement with the experimental data, which enables the extensive screening of many combinations of solvents not studied experimentally within the applicability domain of the trained model. The final predictions were presented not only in the form of the set of optimal hyperparameters but also in a more intuitive way by the set of parameters of the Jouyban-Acree equation often used in the co-solvency domain. This new and effective approach is easily extendible to other systems, enabling the fast and reliable selection of candidates for new solvents and directing the experimental solubility screening of active pharmaceutical ingredients.
25
Zhao Z, Gui J, Yao A, Le NQK, Chua MCH. Improved Prediction Model of Protein and Peptide Toxicity by Integrating Channel Attention into a Convolutional Neural Network and Gated Recurrent Units. ACS Omega 2022; 7:40569-40577. [PMID: 36385847 PMCID: PMC9647964 DOI: 10.1021/acsomega.2c05881] [Received: 09/11/2022] [Accepted: 10/19/2022] [Indexed: 06/16/2023]
Abstract
In recent times, the importance of peptides in the biomedical domain has received increasing concern in terms of their effect on multiple disease treatments. However, before successful large-scale implementation in the industry, accurate identification of peptide toxicity is a vital prerequisite. The existing computational methods have reached good results from toxicity prediction, and we present an improved model based on different deep learning architectures. The modification mainly focuses on two aspects: sequence encoding and variational information bottlenecks. Consequently, one of our modified plans shows an obvious increase in sensitivity, while the rest show good performance meanwhile adding novelty in the peptide toxicity prediction domain. In detail, our best model could achieve an accuracy of 97.38 and 95.03% in protein and peptide toxicity predictions, respectively. The performance was superior to previous predictors on the same datasets.
Affiliation(s)
- Zhengyun Zhao
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
- Jingyu Gui
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
- Anqi Yao
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei 106, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei 106, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei 110, Taiwan
- Matthew Chin Heng Chua
- Institute of Systems Science, National University of Singapore, 25 Heng Mui Keng Terrace, Singapore 119615, Singapore