1
Chen M, Yang J, Tang C, Lu X, Wei Z, Liu Y, Yu P, Li H. Improving ADMET Prediction Accuracy for Candidate Drugs: Factors to Consider in QSPR Modeling Approaches. Curr Top Med Chem 2024; 24:222-242. [PMID: 38083894 DOI: 10.2174/0115680266280005231207105900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2023] [Revised: 11/02/2023] [Accepted: 11/10/2023] [Indexed: 05/04/2024]
Abstract
Quantitative Structure-Property Relationship (QSPR) employs mathematical and statistical methods to reveal quantitative correlations between the pharmacokinetics of compounds and their molecular structures, as well as their physical and chemical properties. QSPR models have been widely applied in the prediction of drug absorption, distribution, metabolism, excretion, and toxicity (ADMET). However, the accuracy of QSPR models for predicting drug ADMET properties still needs improvement. Therefore, this paper comprehensively reviews the tools employed in various stages of QSPR predictions for drug ADMET. It summarizes commonly used approaches to building QSPR models, systematically analyzing the advantages and limitations of each modeling method to ensure their judicious application. We provide an overview of recent advancements in the application of QSPR models for predicting drug ADMET properties. Furthermore, this review explores the inherent challenges in QSPR modeling while also proposing a range of considerations aimed at enhancing model prediction accuracy. The objective is to enhance the predictive capabilities of QSPR models in the field of drug development and provide valuable reference and guidance for researchers in this domain.
Affiliation(s)
- Meilun Chen
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Jie Yang
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Chunhua Tang
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Xiaoling Lu
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Zheng Wei
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Yijie Liu
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- Peng Yu
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
- HuanHuan Li
  - Xiangya School of Pharmaceutical Sciences, Central South University, No. 172, Tongzipo Road, Changsha, Hunan, 410013, China
2
Chaka MD, Mekonnen YS, Wu Q, Geffe CA. Advancing energy storage through solubility prediction: leveraging the potential of deep learning. Phys Chem Chem Phys 2023; 25:31836-31847. [PMID: 37966375 DOI: 10.1039/d3cp03992g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2023]
Abstract
Solubility prediction plays a crucial role in energy storage applications, such as redox flow batteries, because it directly affects the efficiency and reliability. Researchers have developed various methods that utilize quantum calculations and descriptors to predict the aqueous solubilities of organic molecules. Notably, machine learning models based on descriptors have shown promise for solubility prediction. As deep learning tools, graph neural networks (GNNs) have emerged to capture complex structure-property relationships for material property prediction. Specifically, MolGAT, a type of GNN model, was designed to incorporate n-dimensional edge attributes, enabling the modeling of intricacies in molecular graphs and enhancing the prediction capabilities. In a previous study, MolGAT successfully screened 23 467 promising redox-active molecules from a database of over 500 000 compounds, based on redox potential predictions. This study focused on applying the MolGAT model to predict the aqueous solubility (log S) of a broad range of organic compounds, including those previously screened for redox activity. The model was trained on a diverse sample of 8494 organic molecules from AqSolDB and benchmarked against literature data, demonstrating superior accuracy compared with other state of the art graph-based and descriptor-based models. Subsequently, the trained MolGAT model was employed to screen redox-active organic compounds identified in the first phase of high-throughput virtual screening, targeting favorable solubility in energy storage applications. The second round of screening, which considered solubility, yielded 12 332 promising redox-active and soluble organic molecules suitable for use in aqueous redox flow batteries. 
Thus, the two-phase high-throughput virtual screening approach utilizing MolGAT, specifically trained for redox potential and solubility, is an effective strategy for selecting suitable intrinsically soluble redox-active molecules from extensive databases, potentially advancing energy storage through reliable material development. This indicates that the model is reliable for predicting the solubility of various molecules and provides valuable insights for energy storage, pharmaceutical, environmental, and chemical applications.
Affiliation(s)
- Mesfin Diro Chaka
  - Department of Physics, College of Natural and Computational Sciences, Addis Ababa University, P. O. Box 1176, Addis Ababa, Ethiopia
  - Computational Data Science Program, College of Natural and Computational Sciences, Addis Ababa University, P. O. Box 1176, Addis Ababa, Ethiopia
- Yedilfana Setarge Mekonnen
  - Center for Environmental Science, College of Natural and Computational Sciences, Addis Ababa University, P. O. Box 1176, Addis Ababa, Ethiopia
- Qin Wu
  - Center for Functional Nanomaterials, Brookhaven National Laboratory, Upton, NY 11973, USA
- Chernet Amente Geffe
  - Department of Physics, College of Natural and Computational Sciences, Addis Ababa University, P. O. Box 1176, Addis Ababa, Ethiopia
3
Tran TTV, Tayara H, Chong KT. Recent Studies of Artificial Intelligence on In Silico Drug Absorption. J Chem Inf Model 2023; 63:6198-6211. [PMID: 37819031 DOI: 10.1021/acs.jcim.3c00960] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/13/2023]
Abstract
Absorption is an important area of research in pharmacochemistry and drug development, because the drug has to be absorbed before any drug effects can occur. Furthermore, the ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) profile of drugs can be directly and considerably altered by modulating factors affecting absorption. Many drugs in development fail because of poor absorption. The research and continuous efforts of researchers in recent years have brought many successes and promises in drug absorption property prediction, especially in silico, which helps to reduce the time and cost significantly for screening undesirable drug candidates. In this report, we explicitly provide an overview of recent in silico studies on predicting absorption properties, especially from 2019 to the present, using artificial intelligence. Additionally, we have collected and investigated public databases that support absorption prediction research. On those grounds, we also proposed the challenges and development directions of absorption prediction in the future. We hope this review can provide researchers with valuable guidelines on absorption prediction to facilitate the development of newer approaches in drug discovery.
Affiliation(s)
- Thi Tuyet Van Tran
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju 54896, Republic of Korea
  - Faculty of Information Technology, An Giang University, Long Xuyen 880000, Vietnam
  - Vietnam National University, Ho Chi Minh City, Ho Chi Minh 700000, Vietnam
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju 54896, Republic of Korea
- Kil To Chong
  - Advances Electronics and Information Research Center, Jeonbuk National University, Jeonju 54896, Republic of Korea
4
Lowe CN, Charest N, Ramsland C, Chang DT, Martin TM, Williams AJ. Transparency in Modeling through Careful Application of OECD's QSAR/QSPR Principles via a Curated Water Solubility Data Set. Chem Res Toxicol 2023; 36:465-478. [PMID: 36877669 PMCID: PMC10357388 DOI: 10.1021/acs.chemrestox.2c00379] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Indexed: 03/07/2023]
Abstract
The need for careful assembly, training, and validation of quantitative structure-activity/property models (QSAR/QSPR) is more significant than ever as data sets become larger and sophisticated machine learning tools become increasingly ubiquitous and accessible to the scientific community. Regulatory agencies such as the United States Environmental Protection Agency must carefully scrutinize each aspect of a resulting QSAR/QSPR model to determine its potential use in environmental exposure and hazard assessment. Herein, we revisit the goals of the Organisation for Economic Cooperation and Development (OECD) in our application and discuss the validation principles for structure-activity models. We apply these principles to a model for predicting water solubility of organic compounds derived using random forest regression, a common machine learning approach in the QSA/PR literature. Using public sources, we carefully assembled and curated a data set consisting of 10,200 unique chemical structures with associated water solubility measurements. This data set was then used as a focal narrative to methodically consider the OECD's QSA/PR principles and how they can be applied to random forests. Despite some expert, mechanistically informed supervision of descriptor selection to enhance model interpretability, we achieved a model of water solubility with comparable performance to previously published models (5-fold cross validated performance 0.81 R2 and 0.98 RMSE). We hope this work will catalyze a necessary conversation around the importance of cautiously modernizing and explicitly leveraging OECD principles while pursuing state-of-the-art machine learning approaches to derive QSA/PR models suitable for regulatory consideration.
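The 5-fold cross-validated R2 and RMSE quoted in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the straight-line least-squares fit is only a stand-in for the random forest regressor named in the abstract, the data are synthetic, and the metrics are computed on pooled out-of-fold predictions, which may differ from how the authors aggregated their folds.

```python
import math
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Hypothetical stand-in model; it is NOT the random forest of the abstract.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def cross_validate(xs, ys, k=5, seed=0):
    """Fit on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    out_of_fold = [None] * len(xs)
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            out_of_fold[i] = slope * xs[i] + intercept
    return r2_score(ys, out_of_fold), rmse(ys, out_of_fold)


# Synthetic toy data (one made-up descriptor against a made-up response).
rng = random.Random(42)
xs = [i / 10 for i in range(50)]
ys = [-1.2 * x + 0.5 + rng.gauss(0, 0.3) for x in xs]

cv_r2, cv_rmse = cross_validate(xs, ys, k=5)
print(f"5-fold CV: R2 = {cv_r2:.3f}, RMSE = {cv_rmse:.3f}")
```

Because every sample is predicted exactly once by a model that never saw it, the pooled scores estimate performance on unseen compounds rather than the fit to the training data.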
Affiliation(s)
- Charles N. Lowe
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Nathaniel Charest
  - ORAU Student Services Contractor to Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Christian Ramsland
  - ORAU Student Services Contractor to Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Daniel T. Chang
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Todd M. Martin
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
- Antony J. Williams
  - Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina 27711, United States
5
Mahanty B, Behera SK, Sahoo NK. Misinterpretation of Dubinin–Radushkevich isotherm and its implications on adsorption parameter estimates. Sep Sci Technol 2023. [DOI: 10.1080/01496395.2023.2189050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Affiliation(s)
- Biswanath Mahanty
  - Department of Biotechnology, Karunya Institute of Technology and Sciences, Coimbatore, India
- Shishir Kumar Behera
  - Industrial Ecology Research Group, School of Chemical Engineering, Vellore Institute of Technology, Vellore, India
- Naresh Kumar Sahoo
  - Department of Chemistry, Environmental Science Program, (ITER), Siksha ‘O’ Anusandhan (Deemed to be University), Bhubaneswar, Odisha, India
6
Development of QSPR-ANN models for the estimation of critical properties of pure hydrocarbons. J Mol Graph Model 2023; 121:108450. [PMID: 36907016 DOI: 10.1016/j.jmgm.2023.108450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2022] [Revised: 02/21/2023] [Accepted: 03/06/2023] [Indexed: 03/09/2023]
Abstract
The current work aimed to predict three critical properties: critical temperature (Tc), critical volume (Vc), and critical pressure (Pc) of pure hydrocarbons. A multi-layer perceptron artificial neural network (MLP-ANN) has been adopted as a nonlinear modeling technique and computational approach based on a few relevant molecular descriptors. A set of diverse data points was used to build three QSPR-ANN models, including 223 points for Tc, Vc, and 221 for Pc. The entire database was randomly split into two subsets: 80% for the training set and 20% for the testing set. A large number of 1666 molecular descriptors were calculated and then reduced by a statistical methodology based on several phases to retain them into a reasonable number of relevant descriptors, wherein about 99% of initial descriptors were excluded. Thus, the Quasi-Newton backpropagation (BFGS) algorithm was applied to train the ANN structure. The results of three QSPR-ANN models showed good precision, confirmed by the high values of determination coefficient (R2) ranging from 0.9990 to 0.9945, and the low values of calculated errors, such as the Mean Absolute Percentage Error (MAPE) that ranged from 2.2497 to 0.7424% for the best three models of Tc, Vc, and Pc. The weight sensitivity analysis method was applied to know the contribution of each input descriptor individually or by class on each appropriate QSPR-ANN model. Moreover, the applicability domain (AD) method was also used with a strict limit of standardized residual values (di = ±2). However, the results were promising, with nearly 88% of the data points validated within the AD range. Finally, the results of the proposed QSPR-ANN models were compared with other well-known QSPR or ANN models for each property. Consequently, our three models provided satisfactory results, outperforming most of the models mentioned in this comparison. 
This computational approach can be applied in petroleum engineering and other related fields to accurately determine the critical properties of pure hydrocarbons: Tc, Vc, and Pc.
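Three steps from this abstract lend themselves to a short sketch: the random 80%/20% split, the MAPE, and the applicability-domain check with a standardized-residual limit of ±2. The abstract does not say how the residuals were standardized, so the sketch assumes centering by the mean residual and scaling by the sample standard deviation. The function names and the toy numbers are hypothetical.

```python
import math
import random


def train_test_split(n_samples, test_fraction=0.2, seed=0):
    """Randomly split sample indices into (train, test) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; y_true must be nonzero."""
    n = len(y_true)
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n


def standardized_residuals(y_true, y_pred):
    """Residuals centered on their mean and scaled by their sample standard deviation.

    Assumed definition; the abstract does not specify the scaling.
    """
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    n = len(residuals)
    mean_r = sum(residuals) / n
    std_r = math.sqrt(sum((r - mean_r) ** 2 for r in residuals) / (n - 1))
    return [(r - mean_r) / std_r for r in residuals]


def fraction_inside_domain(y_true, y_pred, limit=2.0):
    """Share of points whose standardized residual lies within +/- limit."""
    d = standardized_residuals(y_true, y_pred)
    return sum(1 for value in d if abs(value) <= limit) / len(d)


# 223 data points, as quoted for Tc and Vc, split 80/20.
train_idx, test_idx = train_test_split(223, test_fraction=0.2, seed=0)
print(len(train_idx), len(test_idx))

# Made-up property values with one deliberately poor prediction.
y_true = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
y_pred = [101.0, 199.0, 302.0, 398.0, 501.0, 599.0, 702.0, 798.0, 901.0, 970.0]
print(f"MAPE = {mape(y_true, y_pred):.3f}%")
print(f"inside AD = {fraction_inside_domain(y_true, y_pred):.0%}")
```

A point outside the ±2 band is one the model predicts unusually badly compared with the rest of the set, so it is treated as lying outside the domain where the model can be trusted.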
7
Li M, Chen H, Zhang H, Zeng M, Chen B, Guan L. Prediction of the Aqueous Solubility of Compounds Based on Light Gradient Boosting Machines with Molecular Fingerprints and the Cuckoo Search Algorithm. ACS Omega 2022; 7:42027-42035. [PMID: 36440111 PMCID: PMC9685740 DOI: 10.1021/acsomega.2c03885] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/21/2022] [Accepted: 10/18/2022] [Indexed: 06/16/2023]
Abstract
Aqueous solubility is one of the most important physicochemical properties in drug discovery. At present, the prediction of aqueous solubility of compounds is still a challenging problem. Machine learning has shown great potential in solubility prediction. Most machine learning models largely rely on the setting of hyperparameters, and their performance can be improved by setting the hyperparameters in a better way. In this paper, we used MACCS fingerprints to represent the structural features and optimized the hyperparameters of the light gradient boosting machine (LightGBM) with the cuckoo search algorithm (CS). Based on the above representation and optimization, the CS-LightGBM model was established to predict the aqueous solubility of 2446 organic compounds and the obtained prediction results were compared with those obtained with the other six different machine learning models (RF, GBDT, XGBoost, LightGBM, SVR, and BO-LightGBM). The comparison results showed that the CS-LightGBM model had a better prediction performance than the other six different models. RMSE, MAE, and R2 of the CS-LightGBM model were, respectively, 0.7785, 0.5117, and 0.8575. In addition, this model has good scalability and can be used to solve solubility prediction problems in other fields such as solvent selection and drug screening.
8
Kadaoluwa Pathirannahalage SP, Meftahi N, Elbourne A, Weiss ACG, McConville CF, Padua A, Winkler DA, Costa Gomes M, Greaves TL, Le TC, Besford QA, Christofferson AJ. Systematic Comparison of the Structural and Dynamic Properties of Commonly Used Water Models for Molecular Dynamics Simulations. J Chem Inf Model 2021; 61:4521-4536. [PMID: 34406000 DOI: 10.1021/acs.jcim.1c00794] [Citation(s) in RCA: 64] [Impact Index Per Article: 21.3] [Indexed: 12/17/2022]
Abstract
Water is a unique solvent that is ubiquitous in biology and present in a variety of solutions, mixtures, and materials settings. It therefore forms the basis for all molecular dynamics simulations of biological phenomena, as well as for many chemical, industrial, and materials investigations. Over the years, many water models have been developed, and it remains a challenge to find a single water model that accurately reproduces all experimental properties of water simultaneously. Here, we report a comprehensive comparison of structural and dynamic properties of 30 commonly used 3-point, 4-point, 5-point, and polarizable water models simulated using consistent settings and analysis methods. For the properties of density, coordination number, surface tension, dielectric constant, self-diffusion coefficient, and solvation free energy of methane, models published within the past two decades consistently show better agreement with experimental values compared to models published earlier, albeit with some notable exceptions. However, no single model reproduced all experimental values exactly, highlighting the need to carefully choose a water model for a particular study, depending on the phenomena of interest. Finally, machine learning algorithms quantified the relationship between the water model force field parameters and the resulting bulk properties, providing insight into the parameter-property relationship and illustrating the challenges of developing a water model that can accurately reproduce all properties of water simultaneously.
Affiliation(s)
- Sachini P Kadaoluwa Pathirannahalage
  - School of Science, RMIT University, Melbourne, Victoria 3000, Australia
  - Laboratoire de Chimie, Ecole Normale Supérieure de Lyon, CNRS, Lyon 69342, France
- Nastaran Meftahi
  - ARC Centre of Excellence in Exciton Science, School of Science, RMIT University, Melbourne, Victoria 3000, Australia
- Aaron Elbourne
  - School of Science, RMIT University, Melbourne, Victoria 3000, Australia
- Alessia C G Weiss
  - Leibniz-Institut für Polymerforschung e.V., Hohe Straße 6, 01069 Dresden, Germany
- Chris F McConville
  - School of Science, RMIT University, Melbourne, Victoria 3000, Australia
  - Institute for Frontier Materials, Deakin University, Geelong, Victoria 3220, Australia
- Agilio Padua
  - Laboratoire de Chimie, Ecole Normale Supérieure de Lyon, CNRS, Lyon 69342, France
- David A Winkler
  - School of Biochemistry and Genetics, La Trobe Institute for Molecular Science, La Trobe University, Bundoora, Victoria 3086, Australia
  - Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, Victoria 3052, Australia
  - School of Pharmacy, University of Nottingham, Nottingham NG7 2QL, U.K
- Tamar L Greaves
  - School of Science, RMIT University, Melbourne, Victoria 3000, Australia
- Tu C Le
  - School of Engineering, RMIT University, Melbourne, Victoria 3001, Australia
- Quinn A Besford
  - Leibniz-Institut für Polymerforschung e.V., Hohe Straße 6, 01069 Dresden, Germany
- Andrew J Christofferson
  - School of Science, RMIT University, Melbourne, Victoria 3000, Australia
  - ARC Centre of Excellence in Exciton Science, School of Science, RMIT University, Melbourne, Victoria 3000, Australia
9
Gala M, Žoldák G. Classifying Residues in Mechanically Stable and Unstable Substructures Based on a Protein Sequence: The Case Study of the DnaK Hsp70 Chaperone. Nanomaterials (Basel, Switzerland) 2021; 11:2198. [PMID: 34578514 PMCID: PMC8467864 DOI: 10.3390/nano11092198] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/21/2021] [Revised: 08/16/2021] [Accepted: 08/24/2021] [Indexed: 12/17/2022]
Abstract
Artificial proteins can be constructed from stable substructures, whose stability is encoded in their protein sequence. Identifying stable protein substructures experimentally is the only available option at the moment because no suitable method exists to extract this information from a protein sequence. In previous research, we examined the mechanics of E. coli Hsp70 and found four mechanically stable (S class) and three unstable substructures (U class). Of the total 603 residues in the folded domains of Hsp70, 234 residues belong to one of four mechanically stable substructures, and 369 residues belong to one of three unstable substructures. Here our goal is to develop a machine learning model to categorize Hsp70 residues using sequence information. We applied three supervised methods: logistic regression (LR), random forest, and support vector machine. The LR method showed the highest accuracy, 0.925, to predict the correct class of a particular residue only when context-dependent physico-chemical features were included. The cross-validation of the LR model yielded a prediction accuracy of 0.879 and revealed that most of the misclassified residues lie at the borders between substructures. We foresee machine learning models being used to identify stable substructures as candidates for building blocks to engineer new proteins.
Affiliation(s)
- Michal Gala
  - Department of Biophysics, Faculty of Science, P. J. Šafárik University, Jesena 5, 040 01 Košice, Slovakia
- Gabriel Žoldák
  - Center for Interdisciplinary Biosciences, Technology and Innovation Park, P. J. Šafárik University, Trieda SNP 1, 040 11 Košice, Slovakia