1
Wang Z, You X, Lan L, Huang G, Zhu T, Tian S, Yang B, Zhuo Q. Electrocatalytic oxidation of hexafluoropropylene oxide homologues in water using a boron-doped diamond electrode. ENVIRONMENTAL TECHNOLOGY 2024:1-12. [PMID: 39128835 DOI: 10.1080/09593330.2024.2382937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2024] [Accepted: 06/03/2024] [Indexed: 08/13/2024]
Abstract
Hexafluoropropylene oxide (GenX) is a substitute for PFOA, which is listed under the Stockholm Convention. In this study, the degradation of GenX was investigated using a boron-doped diamond anode in an electrochemical oxidation system. The effects of operating parameters on GenX degradation were studied, including current density (0.5-10 mA/cm²), initial pH (3.0-11.49), initial GenX concentration (20-150 mg/L), electrode distance (0.5-2 cm), electrolyte type (Na2SO4, NaCl, NaNO3 and NaHCO3) and Na2SO4 electrolyte concentration (40-80 mM). GenX was almost completely degraded under the optimal operating parameters after 180 min of electrolysis. Free radical quenching experiments were carried out to investigate the effects of hydroxyl radicals and sulphate radicals on the degradation of GenX. The degradation intermediates were identified by ultra-high performance liquid chromatography coupled with tandem mass spectrometry, and degradation mechanisms were proposed. Finally, the toxicities of GenX and its degradation products were evaluated using QSAR models. The novelty of the work is that the degradation mechanisms of high-concentration GenX (100 mg/L) were elucidated from the free radical quenching experiments and the detected intermediates at a degradation ratio of 100%.
Affiliation(s)
- Zihao Wang
  - School of Environment and Civil Engineering, Dongguan University of Technology, Dongguan Key Laboratory of Emerging Contaminants, Dongguan, People's Republic of China
- Xiaolin You
  - College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, People's Republic of China
- Liying Lan
  - School of Environment and Civil Engineering, Dongguan University of Technology, Dongguan Key Laboratory of Emerging Contaminants, Dongguan, People's Republic of China
- Gang Huang
  - School of Environment and Civil Engineering, Dongguan University of Technology, Dongguan Key Laboratory of Emerging Contaminants, Dongguan, People's Republic of China
- Tongyin Zhu
  - School of Environment and Civil Engineering, Dongguan University of Technology, Dongguan Key Laboratory of Emerging Contaminants, Dongguan, People's Republic of China
- Shengpeng Tian
  - School of Environment and Civil Engineering, Dongguan University of Technology, Dongguan Key Laboratory of Emerging Contaminants, Dongguan, People's Republic of China
- Bo Yang
  - College of Chemistry and Environmental Engineering, Shenzhen University, Shenzhen, People's Republic of China
- Qiongfang Zhuo
  - School of Environment and Civil Engineering, Dongguan University of Technology, Dongguan Key Laboratory of Emerging Contaminants, Dongguan, People's Republic of China
2
Saifi I, Bhat BA, Hamdani SS, Bhat UY, Lobato-Tapia CA, Mir MA, Dar TUH, Ganie SA. Artificial intelligence and cheminformatics tools: a contribution to the drug development and chemical science. J Biomol Struct Dyn 2024; 42:6523-6541. [PMID: 37434311 DOI: 10.1080/07391102.2023.2234039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2023] [Accepted: 07/03/2023] [Indexed: 07/13/2023]
Abstract
In the ever-evolving field of drug discovery, the integration of Artificial Intelligence (AI) and Machine Learning (ML) with cheminformatics has proven to be a powerful combination. Cheminformatics, which combines the principles of computer science and chemistry, is used to extract chemical information and search compound databases, while the application of AI and ML allows for the identification of potential hit compounds, optimization of synthesis routes, and prediction of drug efficacy and toxicity. This collaborative approach has led to the discovery, preclinical evaluation and approval of over 70 drugs in recent years. To aid researchers in the pursuit of new drugs, this article presents a comprehensive list of databases, datasets, predictive and generative models, scoring functions and web platforms that were launched between 2021 and 2022. These resources provide a wealth of information and tools for computer-assisted drug development, and are a valuable asset for those working in the field of cheminformatics. Overall, the integration of AI, ML and cheminformatics has greatly advanced the drug discovery process and continues to hold great potential for the future. As new resources and technologies become available, we can expect to see even more groundbreaking discoveries and advancements in these fields. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Ifra Saifi
  - Chaudhary Charan Singh University, Meerut, Uttar Pradesh, India
- Basharat Ahmad Bhat
  - Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Syed Suhail Hamdani
  - Department of Bioresources, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Umar Yousuf Bhat
  - Department of Zoology, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
- Mushtaq Ahmad Mir
  - Department of Clinical Laboratory Sciences, College of Applied Medical Science, King Khalid University, KSA, Saudi Arabia
- Tanvir Ul Hasan Dar
  - Department of Biotechnology, School of Biosciences and Biotechnology, BGSB University, Rajouri, India
- Showkat Ahmad Ganie
  - Department of Clinical Biochemistry, School of Biological Sciences, University of Kashmir, Srinagar, J&K, India
3
Ja’afaru SC, Uzairu A, Bayil I, Sallau MS, Ndukwe GI, Ibrahim MT, Moin AT, Mollah AKMM, Absar N. Unveiling potent inhibitors for schistosomiasis through ligand-based drug design, molecular docking, molecular dynamics simulations and pharmacokinetics predictions. PLoS One 2024; 19:e0302390. [PMID: 38923997 PMCID: PMC11207139 DOI: 10.1371/journal.pone.0302390] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Accepted: 04/02/2024] [Indexed: 06/28/2024] Open
Abstract
Schistosomiasis is a neglected tropical disease which imposes a considerable and enduring impact on affected regions, leading to persistent morbidity, hindering child development, diminishing productivity, and imposing economic burdens. Due to the emergence of drug resistance and limited management options, there is a need to develop additional effective inhibitors for schistosomiasis. In view of this, quantitative structure-activity relationship (QSAR) studies, molecular docking, molecular dynamics simulations, drug-likeness and pharmacokinetics predictions were applied to 39 Schistosoma mansoni Thioredoxin Glutathione Reductase (SmTGR) inhibitors. The chosen QSAR model demonstrated robust statistical parameters, including an R² of 0.798, R²adj of 0.767, Q²cv of 0.681, LOF of 0.930, R²test of 0.776, and cR²p of 0.746, confirming its reliability. The most active derivative (compound 40) was identified as a lead candidate for the development of new potential non-covalent inhibitors through ligand-based design. Subsequently, 12 novel compounds (40a-40l) were designed with enhanced anti-schistosomiasis activity and binding affinity. Molecular docking studies revealed strong and stable interactions, including hydrogen bonding, between the designed compounds and the target receptor. Molecular dynamics simulations over 100 nanoseconds and MM-PBSA binding free energy (ΔGbind) calculations validated the stability of the two best-designed molecules. Furthermore, drug-likeness and pharmacokinetics prediction analyses affirmed the potential of these designed compounds, suggesting their promise as innovative agents for the treatment of schistosomiasis.
Affiliation(s)
- Saudatu Chinade Ja’afaru
  - Department of Chemistry, Ahmadu Bello University Zaria, Zaria, Nigeria
  - Department of Chemistry, Aliko Dangote University of Science and Technology, Wudil, Kano, Nigeria
- Adamu Uzairu
  - Department of Chemistry, Ahmadu Bello University Zaria, Zaria, Nigeria
- Imren Bayil
  - Department of Bioinformatics and Computational Biology, Gaziantep University, Gaziantep, Turkey
- Abu Tayab Moin
  - Department of Genetic Engineering and Biotechnology, Faculty of Biological Sciences, University of Chittagong, Chattogram, Bangladesh
- Nurul Absar
  - Department of Biochemistry and Biotechnology, Faculty of Basic Medical and Pharmaceutical Sciences, University of Science & Technology Chittagong, Khulshi, Chittagong, Bangladesh
4
Walter M, Webb SJ, Gillet VJ. Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features. J Chem Inf Model 2024; 64:3670-3688. [PMID: 38686880 PMCID: PMC11094726 DOI: 10.1021/acs.jcim.4c00127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand predictions made by these models which limits confidence. Current techniques to tackle this problem such as SHAP or integrated gradients provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which constitute how neural networks learn. We present a novel technique to interpret neural networks which identifies chemical substructures in training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with and complementary to explanations obtained from an established feature attribution method.
Affiliation(s)
- Moritz Walter
  - Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
- Samuel J. Webb
  - Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PY, U.K.
- Valerie J. Gillet
  - Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
5
Feng H, Qin L, Zhang B, Zhou J. Prediction and Interpretability of Melting Points of Ionic Liquids Using Graph Neural Networks. ACS OMEGA 2024; 9:16016-16025. [PMID: 38617653 PMCID: PMC11007696 DOI: 10.1021/acsomega.3c09543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 03/13/2024] [Accepted: 03/15/2024] [Indexed: 04/16/2024]
Abstract
Ionic liquids (ILs) have wide and promising applications in fields such as chemical engineering, energy, and the environment. However, the melting points (MPs) of ILs are one of the most crucial properties affecting their applications. The MPs of ILs are affected by various factors, and tuning these in a laboratory is time-consuming and costly. Therefore, an accurate and efficient method is required to predict the desired MPs in the design of novel targeted ILs. In this study, three descriptor-based machine learning (DBML) models and eight graph neural network (GNN) models were proposed to predict the MPs of ILs. Fingerprints and molecular graphs were used to represent molecules for the DBML and GNNs, respectively. The GNN models demonstrated performance superior to that of the DBML models. Among all of the examined models, the graph convolutional model exhibited the best performance with high accuracy (root-mean-squared error = 37.06, mean absolute error = 28.79, and correlation coefficient = 0.76). Benefiting from molecular graph representation, we built a GNN-based interpretable model to reveal the atomistic contribution to the MPs of ILs using a data-driven procedure. According to our interpretable model, amino groups, S+, N+, and P+ would increase the MPs of ILs, while the negatively charged halogen atoms, S-, and N- would decrease the MPs of ILs. The results of this study provide new insight into the rapid screening and synthesis of targeted ILs with appropriate MPs.
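The three figures of merit quoted in this abstract (root-mean-squared error, mean absolute error and correlation coefficient) can be computed as in the minimal sketch below. It assumes the correlation coefficient is the Pearson coefficient, which the abstract does not state; the function name `regression_metrics` and the sample melting points are hypothetical and are not taken from the paper.

```python
from math import sqrt


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, Pearson r) for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    # Pearson r = covariance / (product of standard deviations);
    # the 1/n factors cancel, so plain sums are enough.
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    r = cov / sqrt(var_t * var_p)
    return rmse, mae, r


# Hypothetical observed and predicted melting points (illustrative only).
observed = [300.0, 350.0, 400.0]
predicted = [310.0, 340.0, 400.0]
rmse, mae, r = regression_metrics(observed, predicted)
```

RMSE penalises large individual errors more strongly than MAE, which is why the two values are usually reported together.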
Affiliation(s)
- Haijun Feng
  - School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
- Lanlan Qin
  - School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
- Bingxuan Zhang
  - School of Computer Sciences, Shenzhen Institute of Information Technology, Shenzhen, Guangdong 518172, China
- Jian Zhou
  - School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, Guangdong 510640, China
6
He D, Liu Q, Mi Y, Meng Q, Xu L, Hou C, Wang J, Li N, Liu Y, Chai H, Yang Y, Liu J, Wang L, Hou Y. De Novo Generation and Identification of Novel Compounds with Drug Efficacy Based on Machine Learning. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2024; 11:e2307245. [PMID: 38204214 PMCID: PMC10962488 DOI: 10.1002/advs.202307245] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/29/2023] [Revised: 12/05/2023] [Indexed: 01/12/2024]
Abstract
One of the main challenges in small molecule drug discovery is finding novel chemical compounds with desirable activity. Traditional drug development typically begins with target selection, but the correlation between targets and disease remains to be further investigated, and drugs designed based on targets may not always have the desired drug efficacy. The emergence of machine learning provides a powerful tool to overcome this challenge. Herein, a machine learning-based strategy termed DTLS (Deep Transfer Learning-based Strategy) is developed for de novo generation of novel compounds with drug efficacy, using a dataset of disease-direct-related activity as input. DTLS is applied to two diseases: colorectal cancer (CRC) and Alzheimer's disease (AD). In each case, a novel compound is discovered and identified in in vitro and in vivo disease models, and its mechanism of action is further explored. The experimental results reveal that DTLS can not only realize the generation and identification of novel compounds with drug efficacy but also has the advantage of identifying compounds by focusing on protein targets to facilitate the mechanism study. This work highlights the significant impact of machine learning on the design of novel compounds with drug efficacy, which provides a powerful new approach to drug discovery.
Affiliation(s)
- Dakuo He
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
- Qing Liu
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
- Yan Mi
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110169, China
  - Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, China
- Qingqi Meng
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110169, China
  - Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, China
- Libin Xu
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110169, China
  - Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, China
- Chunyu Hou
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
- Jinpeng Wang
  - College of Information Science and Engineering, State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University, Shenyang 110819, China
- Ning Li
  - School of Traditional Chinese Materia Medica, Key Laboratory for TCM Material Basis Study and Innovative Drug Development of Shenyang City, Shenyang Pharmaceutical University, Shenyang 110016, China
- Yang Liu
  - Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, China
- Huifang Chai
  - School of Pharmacy, Guizhou University of Traditional Chinese Medicine, Guiyang 550025, China
- Yanqiu Yang
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110169, China
  - Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, China
- Jingyu Liu
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110169, China
  - Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, China
- Lihui Wang
  - Department of Pharmacology, Shenyang Pharmaceutical University, Shenyang 110016, China
- Yue Hou
  - Key Laboratory of Bioresource Research and Development of Liaoning Province, College of Life and Health Sciences, National Frontiers Science Center for Industrial Intelligence and Systems Optimization, Northeastern University, Shenyang 110169, China
  - Key Laboratory of Data Analytics and Optimization for Smart Industry, Ministry of Education, Northeastern University, Shenyang 110169, China
7
Moesgaard L, Pedersen ML, Uhd Nielsen C, Kongsted J. Structure-based discovery of novel P-glycoprotein inhibitors targeting the nucleotide binding domains. Sci Rep 2023; 13:21217. [PMID: 38040777 PMCID: PMC10692163 DOI: 10.1038/s41598-023-48281-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/22/2023] [Accepted: 11/24/2023] [Indexed: 12/03/2023] Open
Abstract
P-glycoprotein (P-gp), a membrane transport protein overexpressed in certain drug-resistant cancer cells, has been the target of numerous drug discovery projects aimed at overcoming drug resistance in cancer. Most characterized P-gp inhibitors bind at the large hydrophobic drug binding domain (DBD), but none have yet attained regulatory approval. In this study, we explored the potential of designing inhibitors that target the nucleotide binding domains (NBDs), by computationally screening a large library of 2.6 billion synthesizable molecules, using a combination of machine learning-guided molecular docking and molecular dynamics (MD). 14 of the computationally best-scoring molecules were subsequently tested for their ability to inhibit P-gp mediated calcein-AM efflux. In total, five diverse compounds exhibited inhibitory effects in the calcein-AM assay without displaying toxicity. The activity of these compounds was confirmed by their ability to decrease the verapamil-stimulated ATPase activity of P-gp in a subsequent assay. The discovery of these five novel P-gp inhibitors demonstrates the potential of in-silico screening in drug discovery and provides a new stepping point towards future potent P-gp inhibitors.
Affiliation(s)
- Laust Moesgaard
  - Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
- Maria L Pedersen
  - Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
- Carsten Uhd Nielsen
  - Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
- Jacob Kongsted
  - Department of Physics, Chemistry and Pharmacy, University of Southern Denmark, Odense M, 5230, Denmark
8
Liu X, Guo Y, Pan W, Xue Q, Fu J, Qu G, Zhang A. Exogenous Chemicals Impact Virus Receptor Gene Transcription: Insights from Deep Learning. ENVIRONMENTAL SCIENCE & TECHNOLOGY 2023; 57:18038-18047. [PMID: 37186679 DOI: 10.1021/acs.est.2c09837] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/17/2023]
Abstract
Although coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has been disrupting human life and health worldwide since the outbreak in late 2019, the impact of exogenous substance exposure on the viral infection remains unclear. It is well known that, during viral infection, organism receptors play a significant role in mediating the entry of viruses into host cells. A major receptor of SARS-CoV-2 is the angiotensin-converting enzyme 2 (ACE2). This study proposes a deep learning model based on the graph convolutional network (GCN) that enables, for the first time, the prediction of exogenous substances that affect the transcriptional expression of the ACE2 gene. It outperforms other machine learning models, achieving an area under the receiver operating characteristic curve (AUROC) of 0.712 and 0.703 on the validation and internal test sets, respectively. In addition, quantitative polymerase chain reaction (qPCR) experiments provided additional supporting evidence for indoor air pollutants identified by the GCN model. More broadly, the proposed methodology can be applied to predict the effect of environmental chemicals on the gene transcription of other virus receptors as well. In contrast to typical deep learning models, which are black boxes, we further highlight the interpretability of the proposed GCN model and how it facilitates deeper understanding of gene change at the structural level.
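For readers unfamiliar with the AUROC values quoted above, the metric can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison; it is a generic illustration rather than the authors' evaluation code, and the function name `auroc`, the labels and the scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positives and negatives.

    Counts the fraction of (positive, negative) pairs in which the positive
    has the higher score; tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = affects transcription) and model scores.
example_labels = [1, 1, 0, 0]
example_scores = [0.9, 0.4, 0.5, 0.1]
example_auc = auroc(example_labels, example_scores)  # 3 of 4 pairs ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so values near 0.7 indicate a moderate ability to rank active chemicals above inactive ones.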
Affiliation(s)
- Xian Liu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- Yunhe Guo
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- Wenxiao Pan
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- Qiao Xue
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
- Jianjie Fu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, P. R. China
  - College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
  - Institute of Environment and Health, Jianghan University, Wuhan 430056, P. R. China
- Guangbo Qu
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, P. R. China
  - College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
- Aiqian Zhang
  - State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, P. R. China
  - School of Environment, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou 310012, P. R. China
  - College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100190, P. R. China
  - Institute of Environment and Health, Jianghan University, Wuhan 430056, P. R. China
9
Mastropietro A, Feldmann C, Bajorath J. Calculation of exact Shapley values for explaining support vector machine models using the radial basis function kernel. Sci Rep 2023; 13:19561. [PMID: 37949930 PMCID: PMC10638308 DOI: 10.1038/s41598-023-46930-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/01/2023] [Accepted: 11/07/2023] [Indexed: 11/12/2023] Open
Abstract
Machine learning (ML) algorithms are extensively used in pharmaceutical research. Most ML models have black-box character, thus preventing the interpretation of predictions. However, rationalizing model decisions is of critical importance if predictions should aid in experimental design. Accordingly, in interdisciplinary research, there is growing interest in explaining ML models. Methods devised for this purpose are a part of the explainable artificial intelligence (XAI) spectrum of approaches. In XAI, the Shapley value concept originating from cooperative game theory has become popular for identifying features determining predictions. The Shapley value concept has been adapted as a model-agnostic approach for explaining predictions. Since the computational time required for Shapley value calculations scales exponentially with the number of features used, local approximations such as Shapley additive explanations (SHAP) are usually required in ML. The support vector machine (SVM) algorithm is one of the most popular ML methods in pharmaceutical research and beyond. SVM models are often explained using SHAP. However, there is only limited correlation between SHAP and exact Shapley values, as previously demonstrated for SVM calculations using the Tanimoto kernel, which limits SVM model explanation. Since the Tanimoto kernel is a special kernel function mostly applied for assessing chemical similarity, we have developed the Shapley value-expressed radial basis function (SVERAD), a computationally efficient approach for the calculation of exact Shapley values for SVM models based upon radial basis function kernels that are widely applied in different areas. SVERAD is shown to produce meaningful explanations of SVM predictions.
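To make the exponential scaling mentioned in this abstract concrete, the sketch below computes exact Shapley values by brute-force enumeration of every feature coalition. This is the generic textbook definition, not the SVERAD algorithm described in the paper; the function `exact_shapley` and the toy value function `toy_value` are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions.

    value_fn maps a frozenset of feature indices to a payoff. The number of
    value_fn evaluations grows as 2**n_features, which is why approximations
    such as SHAP are normally used for realistic feature counts.
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_value(coalition):
    """Hypothetical payoff: additive terms plus one pairwise interaction."""
    base = {0: 1.0, 1: 2.0, 2: 0.5}
    total = sum(base[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        total += 1.0  # interaction term, split equally between features 0 and 1
    return total


phi = exact_shapley(toy_value, 3)
```

In this toy case the interaction term is shared equally between features 0 and 1, and the three values sum to the payoff of the full feature set, which is the efficiency property of Shapley values.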
Affiliation(s)
- Andrea Mastropietro
  - Department of Computer, Control and Management Engineering "Antonio Ruberti", Sapienza University of Rome, 00185, Rome, Italy
- Christian Feldmann
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
- Jürgen Bajorath
  - Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 5/6, 53115, Bonn, Germany
10
Rodríguez-Belenguer P, March-Vila E, Pastor M, Mangas-Sanjuan V, Soria-Olivas E. Usage of model combination in computational toxicology. Toxicol Lett 2023; 389:34-44. [PMID: 37890682 DOI: 10.1016/j.toxlet.2023.10.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2023] [Revised: 10/17/2023] [Accepted: 10/24/2023] [Indexed: 10/29/2023]
Abstract
New Approach Methodologies (NAMs) have ushered in a new era in the field of toxicology, aiming to replace animal testing. However, despite these advancements, they are not exempt from the inherent complexities associated with the study's endpoint. In this review, we have identified three major groups of complexities: mechanistic, chemical space, and methodological. The mechanistic complexity arises from interconnected biological processes within a network that are challenging to model in a single step. In the second group, chemical space complexity arises from significant dissimilarity between compounds in the training and test series. The third group encompasses algorithmic and molecular descriptor limitations and typical class imbalance problems. To address these complexities, this work provides a guide to the usage of a combination of predictive Quantitative Structure-Activity Relationship (QSAR) models, known as metamodels. This combination of low-level models (LLMs) enables a more precise approach to the problem by focusing on different sub-mechanisms or sub-processes. For mechanistic complexity, multiple Molecular Initiating Events (MIEs) or levels of information are combined to form a mechanistic-based metamodel. Regarding the complexity arising from chemical space, two types of approaches were reviewed to construct a fragment-based chemical space metamodel: those with and without structure sharing. Metamodels with structure sharing utilize unsupervised strategies to identify data patterns and build low-level models for each cluster, which are then combined. For situations without structure sharing due to pharmaceutical industry intellectual property, prediction sharing and federated learning approaches have been reviewed. Lastly, to tackle methodological complexity, various algorithms are combined to overcome their limitations, diverse descriptors are employed to enhance problem definition and balanced dataset combinations are used to address class imbalance issues (methodological-based metamodels). Remarkably, metamodels consistently outperformed classical QSAR models across all cases, highlighting the importance of alternatives to classical QSAR models when faced with such complexities.
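The idea of combining low-level models into a metamodel can be illustrated with the simplest possible combiner, an unweighted average of predicted probabilities. The review covers a range of combination strategies and does not prescribe this one; the function `metamodel_predict`, the three toy models and the descriptor names below are all hypothetical.

```python
def metamodel_predict(low_level_models, descriptors, threshold=0.5):
    """Average the probabilities of several low-level models.

    Each low-level model is any callable that maps a descriptor dictionary to
    a probability of toxicity in [0, 1]. Returns (mean probability, label).
    """
    if not low_level_models:
        raise ValueError("at least one low-level model is required")
    probabilities = [model(descriptors) for model in low_level_models]
    mean_probability = sum(probabilities) / len(probabilities)
    return mean_probability, int(mean_probability >= threshold)


# Hypothetical low-level models, each focusing on a different sub-mechanism.
def mie_receptor_binding(d):
    return 0.9 if d["logp"] > 3.0 else 0.2


def mie_reactive_group(d):
    return 0.6 if d["has_alert"] else 0.1


def bioavailability(d):
    return 0.2 if d["mol_weight"] > 500.0 else 0.7


compound = {"logp": 3.5, "has_alert": True, "mol_weight": 620.0}
probability, label = metamodel_predict(
    [mie_receptor_binding, mie_reactive_group, bioavailability], compound
)
```

Replacing the average with a trained second-level classifier (stacking) or a weighted vote follows the same pattern: the low-level outputs become the inputs of the combiner.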
Affiliation(s)
- Pablo Rodríguez-Belenguer
  - Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain; Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain
- Eric March-Vila
  - Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Manuel Pastor
  - Research Programme on Biomedical Informatics (GRIB), Department of Medicine and Life Sciences, Universitat Pompeu Fabra, Hospital del Mar Medical Research Institute, 08003 Barcelona, Spain
- Victor Mangas-Sanjuan
  - Department of Pharmacy and Pharmaceutical Technology and Parasitology, Universitat de València, 46100 Valencia, Spain; Interuniversity Research Institute for Molecular Recognition and Technological Development, Universitat Politècnica de València, 46100 Valencia, Spain
- Emilio Soria-Olivas
  - IDAL, Intelligent Data Analysis Laboratory, ETSE, Universitat de València, 46100 Valencia, Spain
| |
Collapse
|
11
|
Hadfield TE, Scantlebury J, Deane CM. Exploring the ability of machine learning-based virtual screening models to identify the functional groups responsible for binding. J Cheminform 2023; 15:84. [PMID: 37726844 PMCID: PMC10509074 DOI: 10.1186/s13321-023-00755-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 08/25/2023] [Indexed: 09/21/2023] Open
Abstract
Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty in establishing the ground truth importance of each atom of a large scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups with 39% more efficiency than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS .
Affiliation(s)
- Thomas E Hadfield
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Jack Scantlebury
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK
- Charlotte M Deane
- Oxford Protein Informatics Group, Department of Statistics, University of Oxford, Oxford, UK.
|
12
|
Amara K, Rodríguez-Pérez R, Jiménez-Luna J. Explaining compound activity predictions with a substructure-aware loss for graph neural networks. J Cheminform 2023; 15:67. [PMID: 37491407 PMCID: PMC10369817 DOI: 10.1186/s13321-023-00733-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 04/04/2023] [Accepted: 07/08/2023] [Indexed: 07/27/2023] Open
Abstract
Explainable machine learning is increasingly used in drug discovery to help rationalize compound property predictions. Feature attribution techniques are popular choices to identify which molecular substructures are responsible for a predicted property change. However, established molecular feature attribution methods have so far displayed low performance for popular deep learning algorithms such as graph neural networks (GNNs), especially when compared with simpler modeling alternatives such as random forests coupled with atom masking. To mitigate this problem, a modification of the regression objective for GNNs is proposed to specifically account for common core structures between pairs of molecules. The presented approach shows higher accuracy on a recently-proposed explainability benchmark. This methodology has the potential to assist with model explainability in drug discovery pipelines, particularly in lead optimization efforts where specific chemical series are investigated.
Affiliation(s)
- Kenza Amara
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
- Department of Computer Science, ETH Zurich, Andreasstrasse 5, 8050 Zurich, Switzerland
- José Jiménez-Luna
- Microsoft Research AI4Science, 21 Station Rd., Cambridge, CB1 2FB UK
|
13
|
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] Open
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling the underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, in conjunction with the structure-activity relationships (SARs), are pivotal in unlocking the pathway to small-molecule drug discovery. Technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed herein. Various ML algorithms, such as regression analysis and support vector machines, are showcased in the text for their ability to predict and comprehend the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. With the adoption of these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
- College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
- School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
|
14
|
Singh AV, Varma M, Laux P, Choudhary S, Datusalia AK, Gupta N, Luch A, Gandhi A, Kulkarni P, Nath B. Artificial intelligence and machine learning disciplines with the potential to improve the nanotoxicology and nanomedicine fields: a comprehensive review. Arch Toxicol 2023; 97:963-979. [PMID: 36878992 PMCID: PMC10025217 DOI: 10.1007/s00204-023-03471-x] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 01/15/2023] [Accepted: 02/20/2023] [Indexed: 03/08/2023]
Abstract
The use of nanomaterials in medicine depends largely on nanotoxicological evaluation in order to ensure safe application on living organisms. Artificial intelligence (AI) and machine learning (ML) can be used to analyze and interpret large amounts of data in the field of toxicology, such as data from toxicological databases and high-content image-based screening data. Physiologically based pharmacokinetic (PBPK) models and nano-quantitative structure-activity relationship (QSAR) models can be used to predict the behavior and toxic effects of nanomaterials, respectively. PBPK and nano-QSAR are prominent ML tools for harmful event analysis that are used to understand the mechanisms by which chemical compounds can cause toxic effects, while toxicogenomics is the study of the genetic basis of toxic responses in living organisms. Despite the potential of these methods, there are still many challenges and uncertainties that need to be addressed in the field. In this review, we provide an overview of AI and ML techniques in nanomedicine and nanotoxicology to better understand the potential toxic effects of these materials at the nanoscale.
Affiliation(s)
- Ajay Vikram Singh
- Department of Chemical and Product Safety, German Federal Institute for Risk Assessment (BfR), Max-Dohrn-Straße 8-10, 10589, Berlin, Germany.
- Mansi Varma
- Department of Regulatory Toxicology, National Institute of Pharmaceutical Education and Research (NIPER-Raebareli), Lucknow, 229001, India
- Peter Laux
- Department of Chemical and Product Safety, German Federal Institute for Risk Assessment (BfR), Max-Dohrn-Straße 8-10, 10589, Berlin, Germany
- Sunil Choudhary
- Department of Radiotherapy and Radiation Medicine, Institute of Medical Sciences, Banaras Hindu University, Varanasi, 221005, India
- Ashok Kumar Datusalia
- Department of Regulatory Toxicology, National Institute of Pharmaceutical Education and Research (NIPER-Raebareli), Lucknow, 229001, India
- Neha Gupta
- Department of Radiation Oncology, Apex Hospital, Varanasi, 221005, India
- Andreas Luch
- Department of Chemical and Product Safety, German Federal Institute for Risk Assessment (BfR), Max-Dohrn-Straße 8-10, 10589, Berlin, Germany
- Anusha Gandhi
- Elisabeth-Selbert-Gymnasium, Tübinger Str. 71, 70794, Filderstadt, Germany
- Pranav Kulkarni
- Seeta Nursing Home, Shivaji Nagar, Nashik, Maharashtra, 422002, India
- Banashree Nath
- Department of Obstetrics and Gynaecology, All India Institute of Medical Sciences, Raebareli, Uttar Pradesh, 229405, India
|
15
|
Belfield SJ, Cronin MTD, Enoch SJ, Firman JW. Guidance for good practice in the application of machine learning in development of toxicological quantitative structure-activity relationships (QSARs). PLoS One 2023; 18:e0282924. [PMID: 37163504 PMCID: PMC10171609 DOI: 10.1371/journal.pone.0282924] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 10/11/2022] [Accepted: 02/26/2023] [Indexed: 05/12/2023] Open
Abstract
Recent years have seen a substantial growth in the adoption of machine learning approaches for the purposes of quantitative structure-activity relationship (QSAR) development. Such a trend has coincided with a desire to see a shift in the focus of methodology employed within chemical safety assessment: away from traditional reliance upon animal-intensive in vivo protocols, and towards increased application of in silico (or computational) predictive toxicology. With QSAR central amongst techniques applied in this area, the emergence of algorithms trained through machine learning with the objective of toxicity estimation has, quite naturally, arisen. On account of the pattern-recognition capabilities of the underlying methods, the statistical power of the ensuing models is potentially considerable, appropriate for the handling even of vast, heterogeneous datasets. However, such potency comes at a price, manifesting as the general practical deficits observed with respect to the reproducibility, interpretability and generalisability of the resulting tools. Unsurprisingly, these elements have served to hinder broader uptake (most notably within a regulatory setting). Areas of uncertainty liable to accompany (and hence detract from applicability of) toxicological QSAR have previously been highlighted, accompanied by the forwarding of suggestions for "best practice" aimed at mitigation of their influence. However, the scope of such exercises has remained limited to "classical" QSAR, that is, QSAR conducted through use of linear regression and related techniques, with the adoption of comparatively few features or descriptors. Accordingly, the intention of this study has been to extend the remit of best practice guidance, so as to address concerns specific to employment of machine learning within the field.
In doing so, the impact of strategies aimed at enhancing the transparency (feature importance, feature reduction), generalisability (cross-validation) and predictive power (hyperparameter optimisation) of algorithms, trained upon real toxicity data through six common learning approaches, is evaluated.
Affiliation(s)
- Samuel J Belfield
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Mark T D Cronin
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- Steven J Enoch
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
- James W Firman
- School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, United Kingdom
|
16
|
Zheng X, Tomiura Y, Hayashi K. Investigation of the structure-odor relationship using a Transformer model. J Cheminform 2022; 14:88. [PMID: 36581889 PMCID: PMC9798546 DOI: 10.1186/s13321-022-00671-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 12/14/2022] [Indexed: 12/30/2022] Open
Abstract
The relationships between molecular structures and their properties are subtle and complex, and the properties of odor are no exception. Molecules with similar structures, such as a molecule and its optical isomer, may have completely different odors, whereas molecules with completely distinct structures may have similar odors. Many works have attempted to explain the molecular structure-odor relationship from chemical and data-driven perspectives. The Transformer model is widely used in natural language processing and computer vision, and the attention mechanism included in the Transformer model can identify relationships between inputs and outputs. In this paper, we describe the construction of a Transformer model for predicting molecular properties and interpreting the prediction results. The SMILES data of 100,000 molecules are collected and used to predict the existence of molecular substructures, and our proposed model achieves an F1 value of 0.98. The attention matrix is visualized to investigate the substructure annotation performance of the attention mechanism, and we find that certain atoms in the target substructures are accurately annotated. Finally, we collect 4462 molecules and their odor descriptors and use the proposed model to infer 98 odor descriptors, obtaining an average F1 value of 0.33. For the 19 odor descriptors that achieved F1 values greater than 0.45, we also attempt to summarize the relationship between the molecular substructures and odor quality through the attention matrix.
Affiliation(s)
- Xiaofan Zheng
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
- Yoichi Tomiura
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
- Kenshi Hayashi
- Graduate School of Information Science and Electrical Engineering, Department of Electronics, Kyushu University, Fukuoka, Japan
|
17
|
Prediction and Screening Model for Products Based on Fusion Regression and XGBoost Classification. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:4987639. [PMID: 35958779 PMCID: PMC9357736 DOI: 10.1155/2022/4987639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/20/2022] [Revised: 06/14/2022] [Accepted: 06/27/2022] [Indexed: 11/18/2022]
Abstract
Performance prediction based on candidates and screening based on predicted performance value are the core of product development. For example, the performance prediction and screening of equipment components and parts are an important guarantee for the reliability of equipment products. The prediction and screening of drug bioactivity value and performance are the keys to pharmaceutical product development. The main reasons for the failure of pharmaceutical discovery are the low bioactivity of the candidate compounds and the deficiencies in their efficacy and safety, which are related to the absorption, distribution, metabolism, excretion, and toxicity (ADMET) of the compounds. Therefore, it is necessary to quickly and effectively perform systematic bioactivity value prediction and ADMET property evaluation for candidate compounds in the early stage of drug discovery. In this paper, a data-driven pharmaceutical products screening prediction model is proposed to screen drug candidates with higher bioactivity value and better ADMET properties. First, a quantitative prediction method for bioactivity value is proposed using the fusion regression of LGBM and neural network based on backpropagation (BP-NN). Then, the ADMET properties prediction method is proposed using XGBoost. According to the predicted bioactivity value and ADMET properties, the BVAP method is defined to screen the drug candidates. The screening model is validated on the dataset of antagonized ERα active compounds, in which the mean square error (MSE) of fusion regression is 1.1496 and the XGBoost prediction accuracies for ADMET properties are 94.0% for Caco-2, 95.7% for CYP3A4, 89.4% for hERG, 88.6% for HOB, and 96.2% for MN.
Compared with the commonly used methods for ADMET properties such as SVM, RF, KNN, LDA, and NB, the XGBoost in this paper has the highest prediction accuracy and AUC value, which has better guiding significance and can help screen pharmaceutical product candidates with good bioactivity, pharmacokinetic properties, and safety.
|
18
|
Bender A, Schneider N, Segler M, Patrick Walters W, Engkvist O, Rodrigues T. Evaluation guidelines for machine learning tools in the chemical sciences. Nat Rev Chem 2022; 6:428-442. [PMID: 37117429 DOI: 10.1038/s41570-022-00391-9] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Accepted: 04/13/2022] [Indexed: 02/07/2023]
Abstract
Machine learning (ML) promises to tackle the grand challenges in chemistry and speed up the generation, improvement and/or ordering of research hypotheses. Despite the overarching applicability of ML workflows, one usually finds diverse evaluation study designs. The current heterogeneity in evaluation techniques and metrics leads to difficulty in (or the impossibility of) comparing and assessing the relevance of new algorithms. Ultimately, this may delay the digitalization of chemistry at scale and confuse method developers, experimentalists, reviewers and journal editors. In this Perspective, we critically discuss a set of method development and evaluation guidelines for different types of ML-based publications, emphasizing supervised learning. We provide a diverse collection of examples from various authors and disciplines in chemistry. While taking into account varying accessibility across research groups, our recommendations focus on reporting completeness and standardizing comparisons between tools. We aim to further contribute to improved ML transparency and credibility by suggesting a checklist of retro-/prospective tests and dissecting their importance. We envisage that the wide adoption and continuous update of best practices will encourage an informed use of ML on real-world problems related to the chemical sciences.
|
19
|
Rodríguez-Pérez R, Miljković F, Bajorath J. Machine Learning in Chemoinformatics and Medicinal Chemistry. Annu Rev Biomed Data Sci 2022; 5:43-65. [PMID: 35440144 DOI: 10.1146/annurev-biodatasci-122120-124216] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 11/09/2022]
Abstract
In chemoinformatics and medicinal chemistry, machine learning has evolved into an important approach. In recent years, increasing computational resources and new deep learning algorithms have put machine learning onto a new level, addressing previously unmet challenges in pharmaceutical research. In silico approaches for compound activity predictions, de novo design, and reaction modeling have been further advanced by new algorithmic developments and the emergence of big data in the field. Herein, novel applications of machine learning and deep learning in chemoinformatics and medicinal chemistry are reviewed. Opportunities and challenges for new methods and applications are discussed, placing emphasis on proper baseline comparisons, robust validation methodologies, and new applicability domains.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany. Current affiliation: Novartis Institutes for Biomedical Research, Novartis Campus, Basel, Switzerland
- Filip Miljković
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany. Current affiliation: Data Science and AI, Imaging and Data Analytics, Clinical Pharmacology and Safety Sciences, R&D AstraZeneca, Gothenburg, Sweden
- Jürgen Bajorath
- Department of Life Science Informatics, B-IT (Bonn-Aachen International Center for Information Technology), Chemical Biology and Medicinal Chemistry Program Unit, LIMES (Life and Medical Sciences Institute), Rheinische Friedrich-Wilhelms-Universität, Bonn, Germany
|
20
|
Jiménez-Luna J, Skalic M, Weskamp N. Benchmarking Molecular Feature Attribution Methods with Activity Cliffs. J Chem Inf Model 2022; 62:274-283. [PMID: 35019265 DOI: 10.1021/acs.jcim.1c01163] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 01/04/2023]
Abstract
Feature attribution techniques are popular choices within the explainable artificial intelligence toolbox, as they can help elucidate which parts of the provided inputs used by an underlying supervised-learning method are considered relevant for a specific prediction. In the context of molecular design, these approaches typically involve the coloring of molecular graphs, whose presentation to medicinal chemists can be useful for making a decision of which compounds to synthesize or prioritize. The consistency of the highlighted moieties alongside expert background knowledge is expected to contribute to the understanding of machine-learning models in drug design. Quantitative evaluation of such coloring approaches, however, has so far been limited to substructure identification tasks. We here present an approach that is based on maximum common substructure algorithms applied to experimentally-determined activity cliffs. Using the proposed benchmark, we found that molecule coloring approaches in conjunction with classical machine-learning models tend to outperform more modern, graph-neural-network alternatives. The provided benchmark data are fully open sourced, which we hope will facilitate the testing of newly developed molecular feature attribution techniques.
Affiliation(s)
- José Jiménez-Luna
- Department of Chemistry and Applied Biosciences, RETHINK, ETH Zurich, 8093 Zurich, Switzerland; Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Miha Skalic
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
- Nils Weskamp
- Department of Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Straße 65, 88397 Biberach an der Riss, Germany
|
21
|
Romano JD, Hao Y, Moore JH. Improving QSAR Modeling for Predictive Toxicology using Publicly Aggregated Semantic Graph Data and Graph Neural Networks. PACIFIC SYMPOSIUM ON BIOCOMPUTING 2022; 27:187-198. [PMID: 34890148 PMCID: PMC8714189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
Quantitative Structure-Activity Relationship (QSAR) modeling is a common computational technique for predicting chemical toxicity, but a lack of new methodological innovations has impeded QSAR performance on many tasks. We show that contemporary QSAR modeling for predictive toxicology can be substantially improved by incorporating semantic graph data aggregated from open-access public databases, and analyzing those data in the context of graph neural networks (GNNs). Furthermore, we introspect the GNNs to demonstrate how they can lead to more interpretable applications of QSAR, and use ablation analysis to explore the contribution of different data elements to the final models' performance.
Affiliation(s)
- Jason H. Moore
- Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, Pennsylvania 19104, United States
|
22
|
Dibia KT, Igbokwe PK, Ezemagu GI, Asadu CO. Exploration of the quantitative Structure-Activity relationships for predicting Cyclooxygenase-2 inhibition bioactivity by Machine learning approaches. RESULTS IN CHEMISTRY 2022. [DOI: 10.1016/j.rechem.2021.100272] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022] Open
|
23
|
Rodríguez-Pérez R, Bajorath J. Explainable Machine Learning for Property Predictions in Compound Optimization. J Med Chem 2021; 64:17744-17752. [PMID: 34902252 DOI: 10.1021/acs.jmedchem.1c01789] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Indexed: 11/28/2022]
Abstract
The prediction of compound properties from chemical structure is a main task for machine learning (ML) in medicinal chemistry. ML is often applied to large data sets in applications such as compound screening, virtual library enumeration, or generative chemistry. Albeit desirable, a detailed understanding of ML model decisions is typically not required in these cases. By contrast, compound optimization efforts rely on small data sets to identify structural modifications leading to desired property profiles. In this situation, if ML is applied, one is usually reluctant to make decisions based on predictions that cannot be rationalized. Only a few ML methods are interpretable. However, to yield insights into complex ML model decisions, explanatory approaches can be applied. Herein, methodologies for better understanding of ML models or explaining individual predictions are reviewed, and current challenges in integrating ML into medicinal chemistry programs as well as future opportunities are discussed.
Affiliation(s)
- Raquel Rodríguez-Pérez
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany; Novartis Institutes for Biomedical Research, Novartis Campus, CH-4002 Basel, Switzerland
- Jürgen Bajorath
- Department of Life Science Informatics and Data Science, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Friedrich-Hirzebruch-Allee 6, D-53115 Bonn, Germany
|
24
|
Nikonenko A, Zankov D, Baskin I, Madzhidov T, Polishchuk P. Multiple Conformer Descriptors for QSAR Modeling. Mol Inform 2021; 40:e2060030. [PMID: 34342944 DOI: 10.1002/minf.202060030] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/14/2021] [Accepted: 07/19/2021] [Indexed: 12/11/2022]
Abstract
The most widely used QSAR approaches are mainly based on 2D molecular representation, which ignores the stereoconfiguration and conformational flexibility of compounds. 3D QSAR uses a single conformer of each compound, which is difficult to choose reasonably. 4D QSAR uses multiple conformers to overcome the issues of 2D and 3D methods. However, many existing 4D QSAR models suffer from the necessity to pre-align conformers, while alignment-independent approaches often ignore the stereoconfiguration of compounds. In this study we propose a QSAR modeling approach based on transforming chirality-aware 3D pharmacophore descriptors of individual conformers into a set of latent variables representing the whole conformer set of a molecule. This is achieved by clustering together all conformers of all training set compounds. The final representation of a compound is a bit string encoding cluster membership of its conformers. In our study we used Random Forest, but this representation can be used in combination with any machine learning method. We compared this approach with conventional 2D and 3D approaches using multiple data sets and investigated the sensitivity of the proposed approach to its tuning parameters: the number of conformers and the number of clusters.
Affiliation(s)
- Aleksandra Nikonenko
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
- Dmitry Zankov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
- Igor Baskin
- Department of Materials Science and Engineering, Technion-Israel Institute of Technology, 3200003, Haifa, Israel
- Timur Madzhidov
- A.M. Butlerov Institute of Chemistry, Kazan Federal University, Kremlevskaya Str. 18, 420008, Kazan, Russia
- Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacký University and University Hospital in Olomouc, Hnevotinska 5, 77900, Olomouc, Czech Republic
|