1
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
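The attention mechanism that the review builds on can be illustrated with scaled dot-product attention, softmax(QKᵀ/√d_k)V, as defined for the original transformer. The following is a minimal pure-Python sketch for illustration only, not code from the review; vectors are plain lists and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query q, return sum_j softmax(q . k_j / sqrt(d_k)) * v_j."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Attention-weighted average of the value vectors
        dim_v = len(values[0])
        outputs.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return outputs


# Toy example: two "tokens" (e.g. atoms) with 2-dimensional embeddings
out = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[2.0, 0.0], [4.0, 2.0]],
)
print(out)  # [[3.0, 1.0]]  (equal keys give equal weights)
```

Graph attention networks apply a related weighting, restricted to each node's neighbours in the molecular graph.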
Affiliation(s)
- Antonio Lavecchia
- Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.
2
Banerjee A, Roy K. ARKA: a framework of dimensionality reduction for machine-learning classification modeling, risk assessment, and data gap-filling of sparse environmental toxicity data. Environ Sci Process Impacts 2024; 26:991-1007. [PMID: 38743054 DOI: 10.1039/d4em00173g] [Indexed: 05/16/2024]
Abstract
Due to the lack of experimental toxicity data for environmental chemicals, data gaps need to be filled by in silico approaches. One of the most commonly used in silico approaches for toxicity assessment of small datasets is Quantitative Structure-Activity Relationship (QSAR) modeling, which generates predictive models for query compounds. However, the reliability of predictions from QSAR models derived from small datasets is often statistically questionable, because the number of descriptors is large relative to the number of training compounds, which reduces the degrees of freedom of the developed model. To reduce the overall prediction error of a QSAR model, we propose here the computation of novel Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. We reduce the number of modeling descriptors in a supervised manner by partitioning them into K classes (K = 2 here), assigning each descriptor to the response class for which its mean normalized value is higher, thus preventing the loss of chemical information. A scatter plot of the data points using the values of the two ARKA descriptors (ARKA_2 vs. ARKA_1) can potentially identify activity cliffs, less confident data points, and less modelable data points. We applied the ARKA framework to classification modeling of five representative environmentally relevant endpoints with graded responses: skin sensitization, earthworm toxicity, milk/plasma partitioning, algal toxicity, and rodent carcinogenicity of hazardous chemicals.
Comparing models generated using conventional QSAR descriptors with models generated using ARKA descriptors, the ARKA-based models showed much better prediction quality according to decision criteria derived from multiple graded-data validation metrics, signifying the potential of ARKA descriptors in ecotoxicological classification modeling of small data sets. The same holds for the Read-Across approach: Read-Across predictions using ARKA descriptors outperform those generated from QSAR descriptors. For ease of use, a Java-based expert system has been developed that computes the ARKA descriptors from input QSAR descriptors.
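The descriptor-partitioning step described above can be sketched as follows. This is a hypothetical illustration, not the authors' Java implementation: it assumes min-max normalization of each descriptor, binary response classes (K = 2), and aggregates each group by summing its normalized values and dividing by the total number of descriptors. The exact ARKA formula, and which group corresponds to ARKA_1 or ARKA_2, should be taken from the original paper.

```python
from statistics import mean


def min_max_normalize(column):
    """Scale one descriptor column to [0, 1]; constant columns map to 0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in column]


def arka_style_descriptors(X, y):
    """Hypothetical ARKA-like reduction for binary labels y (0 or 1).

    X is a list of compounds, each a list of descriptor values.
    Returns (groups, scores): groups maps class -> indices of descriptors whose
    mean normalized value is higher in that class; scores holds one
    (group-0 score, group-1 score) pair per compound.
    """
    n_desc = len(X[0])
    norm_cols = [min_max_normalize([row[j] for row in X]) for j in range(n_desc)]

    # Supervised partitioning: assign each descriptor to the class with the higher mean
    groups = {0: [], 1: []}
    for j, col in enumerate(norm_cols):
        mean0 = mean([v for v, label in zip(col, y) if label == 0])
        mean1 = mean([v for v, label in zip(col, y) if label == 1])
        groups[1 if mean1 > mean0 else 0].append(j)

    scores = []
    for i in range(len(X)):
        # Assumed aggregation: sum of the group's normalized values / n_desc
        pair = tuple(sum(norm_cols[j][i] for j in groups[k]) / n_desc for k in (0, 1))
        scores.append(pair)
    return groups, scores


X = [[1, 10], [2, 20], [3, 0], [4, 5]]
y = [0, 0, 1, 1]
groups, scores = arka_style_descriptors(X, y)
print(groups)     # {0: [1], 1: [0]}
print(scores[3])  # (0.125, 0.5)
```

Plotting the two scores against each other gives the kind of two-dimensional view the abstract describes, in which compounds sitting in the "wrong" region are candidates for activity cliffs or less modelable points.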
Affiliation(s)
- Arkaprava Banerjee
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700 032, India.
3
Zheng X, Tomiura Y. A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence. J Cheminform 2024; 16:71. [PMID: 38898528 PMCID: PMC11186148 DOI: 10.1186/s13321-024-00848-7] [Received: 05/17/2023] [Accepted: 04/27/2024] [Indexed: 06/21/2024]
Abstract
Obtaining desired molecular properties, among the many possible properties and their combinations, through theory or experiment is costly. Using machine learning to analyze molecular structural features and predict molecular properties is a potentially efficient alternative. In this study, we analyze molecular properties through molecular structure from the perspective of machine learning. We use SMILES sequences as inputs to an artificial neural network to extract molecular structural features and predict molecular properties. A SMILES sequence consists of symbols representing a molecular structure. Because a SMILES sequence differs from actual molecular structural data, we propose a pretraining model for SMILES sequences based on BERT, which is widely used in natural language processing, so that the model learns to extract the molecular structural information contained in the SMILES sequence. In our experiments, we first pretrain the proposed model on 100,000 SMILES sequences and then use the pretrained model to predict molecular properties on 22 data sets and the odor characteristics of molecules (98 types of odor descriptor). The experimental results show that the proposed pretraining model effectively improves the performance of molecular property prediction. SCIENTIFIC CONTRIBUTION: The 2-encoder pretraining is proposed based on two observations: symbols in a SMILES string depend less on their context than words in a natural-language sentence, and one compound corresponds to multiple SMILES sequences. The model pretrained with the 2-encoder shows higher robustness in molecular property prediction tasks than BERT, which is adept at natural language.
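BERT-style pretraining on SMILES starts by splitting each string into tokens and hiding a fraction of them for the model to recover. The sketch below shows only that generic masked-language-model input preparation; it does not reproduce the paper's 2-encoder scheme. The regular expression is the atom-level SMILES tokenizer commonly used in the reaction-prediction literature, and the 15% masking rate with plain [MASK] replacement is a simplifying assumption (BERT itself also uses random and unchanged replacements).

```python
import random
import re

# Atom-level SMILES tokenizer: bracket atoms, two-letter halogens, ring-bond
# labels such as %10, and single-character symbols each become one token.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|\/|:|~|@|\?|>|\*|\$|\%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    tokens = SMILES_TOKEN_RE.findall(smiles)
    # Refuse strings containing characters the pattern does not cover
    if "".join(tokens) != smiles:
        raise ValueError(f"could not tokenize {smiles!r} completely")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Replace a random subset of tokens with mask_token.

    Returns the masked sequence and a dict {position: original token}
    that serves as the prediction targets during pretraining.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = masked[pos]
        masked[pos] = mask_token
    return masked, targets


tokens = tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
masked, targets = mask_tokens(tokens)
print(len(tokens), len(targets))  # 21 3
```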
Affiliation(s)
- Xiaofan Zheng
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
- Yoichi Tomiura
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan.
4
Yuan Y, Tang X, Li H, Lang X, Li C, Song Y, Sun S, Yang Y, Zhou Z. KLSD: a kinase database focused on ligand similarity and diversity. Front Pharmacol 2024; 15:1400136. [PMID: 38957398 PMCID: PMC11217335 DOI: 10.3389/fphar.2024.1400136] [Received: 03/14/2024] [Accepted: 05/28/2024] [Indexed: 07/04/2024]
Abstract
Because of the similarity and diversity among kinases, small molecule kinase inhibitors (SMKIs) often display multi-target effects or selectivity, which correlate strongly with their efficacy and safety. However, because of the limited number of well-known databases and their restricted data mining capabilities, along with a significant scarcity of databases focusing on the pharmacological similarity and diversity of SMKIs, researchers find it challenging to access relevant information quickly. The KLIFS database is representative of specialized application databases in the field, focusing on kinase structure and co-crystallised kinase-ligand interactions, whereas the KLSD database presented in this paper emphasizes the analysis of SMKIs across all reported kinase targets. To address the lack of professional application databases in kinase research and to provide centralized, standardized, reliable and efficient data resources for kinase researchers, this paper proposes a research program based on the ChEMBL database that focuses on comparing the activities of kinase ligands. The scheme extracts kinase data, standardizes and normalizes them, and then performs kinase target difference analysis to determine kinase activity thresholds. It then constructs a specialized and personalized kinase database platform, adopting the front-end/back-end separation of the SpringBoot architecture, and builds an extensible web application that handles the storage, retrieval and analysis of the data, ultimately enabling data visualization and interaction. This study aims to develop a kinase database platform to collect, organize, and provide standardized kinase-related data. By offering essential resources and tools, it supports kinase research and drug development, thereby advancing scientific research and innovation in kinase-related fields. It is freely accessible at: http://ai.njucm.edu.cn:8080.
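The abstract does not specify how activity values are standardized or how the activity threshold is set. As a hedged illustration of one common approach, the sketch converts ChEMBL-style IC50/Ki values to a negative-log scale (pActivity = 9 - log10(value in nM)), takes the median per target-compound pair, and labels each pair against a hypothetical cutoff of 6.0 (1 µM). The target and compound identifiers are made up, and this is not KLSD's actual rule.

```python
import math
from collections import defaultdict
from statistics import median

# Conversion factors to nanomolar for a few common concentration units
UNIT_TO_NM = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}


def p_activity(value, unit):
    """Negative log10 of a molar concentration, e.g. 1 uM -> 6.0."""
    return 9.0 - math.log10(value * UNIT_TO_NM[unit])


def standardize(records, threshold=6.0):
    """records: iterable of (target, compound, value, unit) tuples.

    Returns {(target, compound): (median pActivity, is_active)}, where
    is_active uses a hypothetical pActivity cutoff.
    """
    grouped = defaultdict(list)
    for target, compound, value, unit in records:
        grouped[(target, compound)].append(p_activity(value, unit))
    result = {}
    for key, values in grouped.items():
        p = median(values)  # robust summary of replicate measurements
        result[key] = (p, p >= threshold)
    return result


records = [
    ("KINASE_A", "cpd1", 10, "nM"),
    ("KINASE_A", "cpd1", 100, "nM"),
    ("KINASE_A", "cpd2", 5, "uM"),
]
print(standardize(records))
```

A per-target threshold, rather than the single global cutoff used here, would be one way to reflect the "kinase target difference analysis" mentioned above.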
Affiliation(s)
- Yuqian Yuan
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
- Xiaozhu Tang
- School of Medicine and Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing, China
- Hongyan Li
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
- Xufeng Lang
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
- Can Li
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
- Yihua Song
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
- Shanliang Sun
- National and Local Collaborative Engineering Center of Chinese Medicinal Resources Industrialization and Formulae Innovative Medicine, Jiangsu Collaborative Innovation Center of Chinese Medicinal Resources Industrialization, Jiangsu Key Laboratory for High Technology Research of TCM Formulae, Nanjing University of Chinese Medicine, Nanjing, China
- Ye Yang
- School of Medicine and Holistic Integrative Medicine, Nanjing University of Chinese Medicine, Nanjing, China
- Zuojian Zhou
- School of Artificial Intelligence and Information Technology, Nanjing University of Chinese Medicine, Nanjing, China
5
Zhang R, Nolte D, Sanchez-Villalobos C, Ghosh S, Pal R. Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling. Nat Commun 2024; 15:5072. [PMID: 38871711 DOI: 10.1038/s41467-024-49372-0] [Received: 05/13/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024]
Abstract
Quantitative structure-activity relationship (QSAR) modeling is a powerful tool for drug discovery, yet the lack of interpretability of commonly used QSAR models hinders their application in molecular design. We propose a similarity-based regression framework, topological regression (TR), that offers a statistically grounded, computationally fast, and interpretable technique to predict drug responses. We compare the predictive performance of TR on 530 ChEMBL human target activity datasets against the predictive performance of deep-learning-based QSAR models. Our results suggest that our sparse TR model can achieve equal, if not better, performance than the deep learning-based QSAR models and provide better intuitive interpretation by extracting an approximate isometry between the chemical space of the drugs and their activity space.
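Topological regression relates distances between compounds in chemical space to distances in activity space. The sketch below only illustrates that input side: a Tanimoto (Jaccard) distance matrix over fingerprint on-bits, with activity-space distances taken as absolute differences in response. Tanimoto distance and absolute response differences are assumptions chosen for illustration; the sketch does not implement TR's anchor selection or its regression step, and the fingerprints and pIC50 values are made up.

```python
def tanimoto_similarity(bits_a, bits_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def chemical_distances(fingerprints):
    """Pairwise Tanimoto distance (1 - similarity) matrix."""
    return [[1.0 - tanimoto_similarity(x, y) for y in fingerprints] for x in fingerprints]


def activity_distances(responses):
    """Pairwise absolute differences in response."""
    return [[abs(r - s) for s in responses] for r in responses]


# Hypothetical on-bit sets and pIC50 values for three compounds
fps = [{1, 2, 3}, {2, 3, 4}, {7, 8}]
pic50 = [7.1, 6.9, 4.2]
D_chem = chemical_distances(fps)
D_act = activity_distances(pic50)
print(D_chem[0][1])  # 0.5
```

Pairs that are close in `D_chem` but far apart in `D_act` are where a distance-preserving mapping between the two spaces would break down.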
Affiliation(s)
- Ruibo Zhang
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Daniel Nolte
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Cesar Sanchez-Villalobos
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA
- Souparno Ghosh
- Department of Statistics, University of Nebraska - Lincoln, Lincoln, NE, 68588, USA.
- Ranadip Pal
- Department of Electrical and Computer Engineering, Texas Tech University, Lubbock, TX, 79409, USA.
6
Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. Environ Sci Technol 2024; 58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Indexed: 05/29/2024]
Abstract
In recent years, alternatives to animal testing, such as computational and machine learning approaches, have become increasingly crucial for toxicity testing. However, the complexity and scarcity of available biomedical data challenge the development of predictive models. Combining nonlinear machine learning with multicondition descriptors offers a way to use data from various assays to create a robust model. This work applies multicondition descriptors (MCDs) to develop a QSTR (Quantitative Structure-Toxicity Relationship) model based on a large toxicity data set comprising more than 80,000 compounds and 59 different end points (122,572 data points). The prediction capabilities of the developed single-task multi-end point machine learning models, as well as a novel data analysis approach using Convolutional Neural Networks (CNN), are discussed. The results show that using MCDs significantly improves the model, and using them with CNN-1D yields the best result (R²train = 0.93, R²ext = 0.70). Several structural features showed a high contribution to toxicity, including van der Waals surface area (VSA), number of nitrogen-containing fragments (nN+), presence of S-P fragments, ionization potential, and presence of C-N fragments. The developed models can be useful tools to predict the toxicity of various compounds under different conditions, enabling quick toxicity assessment of new compounds.
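Multicondition descriptors in this line of work are typically built with a Box-Jenkins moving-average operator: each descriptor is expressed as its deviation from the average of that descriptor over the data points measured under the same condition (here, the same toxicity end point). The minimal sketch below assumes the average is taken over all training points sharing the condition; published variants often restrict it to a reference subset such as active compounds. The end-point labels are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def multicondition_descriptor(values, conditions):
    """Return D - <D>_c for each data point.

    values: descriptor value D for each data point.
    conditions: condition label c for each data point (e.g. end point).
    """
    by_condition = defaultdict(list)
    for value, condition in zip(values, conditions):
        by_condition[condition].append(value)
    # Average of the descriptor within each condition
    condition_mean = {c: mean(vals) for c, vals in by_condition.items()}
    return [value - condition_mean[c] for value, c in zip(values, conditions)]


descriptor = [1.0, 3.0, 10.0, 20.0]
end_points = ["rat_oral_LD50", "rat_oral_LD50", "fish_LC50", "fish_LC50"]
print(multicondition_descriptor(descriptor, end_points))  # [-1.0, 1.0, -5.0, 5.0]
```

Because each data point becomes a (compound, condition) pair, a single model can be trained across all end points at once.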
Affiliation(s)
- Amirreza Daghighi
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Gerardo M Casanola-Martin
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Kweeni Iduoku
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
- Hrvoje Kusic
- Faculty of Chemical Engineering and Technology, University of Zagreb, Marulicev Trg 19, Zagreb 10000, Croatia
- Humberto González-Díaz
- Department of Organic and Inorganic Chemistry, University of Basque Country UPV/EHU, Leioa 48940, Spain
- BIOFISIKA, Basque Center for Biophysics CSIC-UPV/EHU, Leioa 48940, Spain
- IKERBASQUE, Basque Foundation for Science, Bilbao, Biscay 48011, Spain
- Bakhtiyor Rasulev
- Department of Coatings and Polymeric Materials, North Dakota State University, Fargo, North Dakota 58102, United States
- Biomedical Engineering Program, North Dakota State University, Fargo, North Dakota 58102, United States
7
Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024; 64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]
Abstract
By accelerating time-consuming processes with high efficiency, computing has become an essential part of many modern chemical pipelines. Machine learning is a class of computing methods that can discover patterns within chemical data and use this knowledge for a wide variety of downstream tasks, such as property prediction or substance generation. The complex and diverse chemical space requires complex machine learning architectures with great learning power. Recently, learning models based on transformer architectures have revolutionized multiple domains of machine learning, including natural language processing and computer vision. Naturally, there have been ongoing efforts to adopt these techniques in the chemical domain, resulting in a surge of publications within a short period. The diversity of chemical structures, use cases, and learning models necessitates a comprehensive summary of existing works. In this paper, we review recent innovations in adapting transformers to solve learning problems in chemistry. Because chemical data are diverse and complex, we structure our discussion based on chemical representations. Specifically, we highlight the strengths and weaknesses of each representation, the current progress of adapting transformer architectures, and future directions.
Affiliation(s)
- Kha-Dinh Luong
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
- Ambuj Singh
- Department of Computer Science, University of California Santa Barbara, Santa Barbara, CA 93106, United States
8
Kumar A, Ojha PK, Roy K. The first report on the assessment of maximum acceptable daily intake (MADI) of pesticides for humans using intelligent consensus predictions. Environ Sci Process Impacts 2024; 26:870-881. [PMID: 38652036 DOI: 10.1039/d4em00059e] [Indexed: 04/25/2024]
Abstract
Direct or indirect consumption of pesticides and their related products by humans and other living organisms without safe dosing may pose a health risk. The risk may arise after a short or long time, depending on the nature and amount of the chemicals consumed. Therefore, the maximum acceptable daily intake of chemicals must be calculated to prevent these risks. In the present work, regression-based quantitative structure-activity relationship (QSAR) models were developed using 39 pesticides with maximum acceptable daily intake (MADI) for humans as the endpoint. From the statistical results (R² = 0.674-0.712, Q²LOO = 0.553-0.580, Q²(F1) = 0.544-0.611, and Q²(F2) = 0.531-0.599), it can be inferred that the developed models were robust, reliable, reproducible, accurate, and predictive. Intelligent Consensus Prediction (ICP) was employed to improve the external predictivity of the models (Q²(F1) = 0.579-0.657 and Q²(F2) = 0.563-0.647). Some of the chemical markers responsible for toxicity enhancement are the presence of unsaturated bonds, lipophilicity, the presence of C< (double bond-single bond-single bonded carbon), and the presence of sulphur and phosphate bonds at topological distances 1 and 6, while the presence of hydrophilic groups and short-chain fragments reduces toxicity. The Pesticide Properties Database (PPDB) (1694 pesticides) was also screened with the developed models. This work can therefore help in assessing the toxicity of pesticides before their synthesis, developing eco-friendly and safer pesticides, and filling data gaps, reducing time, cost, and animal experimentation. Thus, this study may hold promise for future MADI assessment of pesticides and provide a meaningful contribution to the field of risk assessment.
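The external validation metrics quoted above follow the standard QSAR definitions: Q²(F1) compares the prediction errors on the test set with the deviations of the test observations from the training-set mean, while Q²(F2) uses the test-set mean instead. A minimal sketch with made-up values:

```python
from statistics import mean


def press(y_obs, y_pred):
    """Predicted residual sum of squares."""
    return sum((o - p) ** 2 for o, p in zip(y_obs, y_pred))


def q2_f1(y_train, y_test, y_pred):
    """Q2(F1): reference deviations measured from the training-set mean."""
    mean_train = mean(y_train)
    return 1.0 - press(y_test, y_pred) / sum((o - mean_train) ** 2 for o in y_test)


def q2_f2(y_test, y_pred):
    """Q2(F2): reference deviations measured from the test-set mean."""
    mean_test = mean(y_test)
    return 1.0 - press(y_test, y_pred) / sum((o - mean_test) ** 2 for o in y_test)


y_train = [1.0, 2.0, 3.0]
y_test = [2.0, 4.0, 6.0]
y_pred = [2.5, 4.0, 5.5]
print(round(q2_f1(y_train, y_test, y_pred), 4), round(q2_f2(y_test, y_pred), 4))
# 0.975 0.9375
```

When the training and test means differ, Q²(F1) tends to be more optimistic than Q²(F2), which is why both are usually reported.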
Affiliation(s)
- Ankur Kumar
- Drug Discovery and Development Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Probir Kumar Ojha
- Drug Discovery and Development Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Kunal Roy
- Drug Theoretics and Cheminformatics (DTC) Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
9
Isinkaye FO, Olusanya MO, Singh PK. Deep learning and content-based filtering techniques for improving plant disease identification and treatment recommendations: A comprehensive review. Heliyon 2024; 10:e29583. [PMID: 38737274 PMCID: PMC11088271 DOI: 10.1016/j.heliyon.2024.e29583] [Received: 09/14/2023] [Revised: 03/30/2024] [Accepted: 04/10/2024] [Indexed: 05/14/2024]
Abstract
The importance of identifying plant diseases has risen recently because of their adverse effect on agricultural production. Plant diseases have been a major concern in agriculture, as they affect crop production and constitute a major threat to global food security. In modern agriculture, effective plant disease management is vital to ensure healthy crop yields and sustainable practices. Traditional means of identifying plant diseases face many challenges, and the need for better and more efficient detection methods cannot be overemphasized. Emerging technologies, particularly deep learning and content-based filtering techniques, could, if integrated, change the way plant diseases are identified and treated, enabling fast and accurate identification of plant diseases and efficient treatment recommendations, which are key to sustainable food production. In this work, we investigate the current state of research, identify gaps and limitations in knowledge, and suggest future directions for researchers, experts and farmers that could help to provide better ways of mitigating plant disease problems.
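Content-based filtering recommends items whose feature vectors resemble a query profile. The hypothetical sketch below shows how it could be combined with a disease classifier: the classifier's output (here a made-up vector of scores for a few disease features) is compared by cosine similarity with feature vectors describing candidate treatments. The treatment names and feature encoding are invented for illustration and are not taken from the review.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no preference
    return dot / (norm_a * norm_b)


def recommend(profile, treatments, top_k=2):
    """Rank treatments by cosine similarity to the profile; return top_k names."""
    ranked = sorted(
        treatments.items(),
        key=lambda item: cosine_similarity(profile, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


# Features (hypothetical): [fungal leaf spot, insect damage, blight]
detected = [0.9, 0.0, 0.8]
treatments = {
    "fungicide_A": [1.0, 0.0, 1.0],
    "insecticide_B": [0.0, 1.0, 0.0],
    "fungicide_C": [1.0, 0.0, 0.0],
}
print(recommend(detected, treatments))  # ['fungicide_A', 'fungicide_C']
```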
Affiliation(s)
- Folasade Olubusola Isinkaye
- Department of Computer Science and Information Technology, Sol Plaatje University Kimberley, 8301, South Africa
- Michael Olusoji Olusanya
- Department of Computer Science and Information Technology, Sol Plaatje University Kimberley, 8301, South Africa
- Pramod Kumar Singh
- Department of Computer Science and Engineering, ABV-Indian Institute of Information Technology and Management Gwalior, Gwalior, 474015, MP, India
10
Schlosser L, Rana D, Pflüger P, Katzenburg F, Glorius F. EnTdecker - A Machine Learning-Based Platform for Guiding Substrate Discovery in Energy Transfer Catalysis. J Am Chem Soc 2024; 146:13266-13275. [PMID: 38695558 DOI: 10.1021/jacs.4c01352] [Indexed: 05/16/2024]
Abstract
Due to the magnitude of chemical space, the discovery of novel substrates in energy transfer (EnT) catalysis remains a daunting task. Experimental and computational strategies to identify compounds that successfully undergo EnT-mediated reactions are limited by their time and cost efficiency. To accelerate the discovery process in EnT catalysis, we herein present the EnTdecker platform, which facilitates large-scale virtual screening of potential substrates using machine-learning (ML) based predictions of their excited state properties. To achieve this, a data set containing more than 34,000 molecules is created, aiming to cover a large fraction of the synthetically relevant compound space for EnT catalysis. Using these data, predictive models are trained, and their suitability for in-lab application is demonstrated by rediscovering successful substrates from the literature as well as by experimental validation through luminescence-based screening. By reducing the computational effort needed to obtain excited state properties, the EnTdecker platform represents a tool to efficiently guide substrate selection and increase the experimental success rate for EnT catalysis. Moreover, EnTdecker is publicly accessible through an easy-to-use web application at entdecker.uni-muenster.de.
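Once excited-state properties are predicted, screening reduces to filtering a candidate list. The hedged sketch below assumes the property of interest is the substrate's triplet energy E_T and applies the common rule of thumb that triplet energy transfer is favourable when the photocatalyst's E_T is at least that of the substrate; an optional tolerance allows slightly uphill transfer. Substrate names, energies and the tolerance are hypothetical, and this is not the EnTdecker web application's interface.

```python
def screen_substrates(predicted_et, catalyst_et, tolerance=0.0):
    """Keep substrates whose predicted triplet energy (kcal/mol) does not exceed
    the catalyst's by more than `tolerance`; sort by driving force, largest first."""
    hits = [
        (name, et)
        for name, et in predicted_et.items()
        if et <= catalyst_et + tolerance
    ]
    # Driving force = catalyst E_T - substrate E_T
    return sorted(hits, key=lambda item: catalyst_et - item[1], reverse=True)


predictions = {"substrate_1": 55.0, "substrate_2": 62.0, "substrate_3": 70.0}
print(screen_substrates(predictions, catalyst_et=61.0))
# [('substrate_1', 55.0)]
print(screen_substrates(predictions, catalyst_et=61.0, tolerance=2.0))
# [('substrate_1', 55.0), ('substrate_2', 62.0)]
```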
Affiliation(s)
- Leon Schlosser
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Debanjan Rana
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Philipp Pflüger
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Felix Katzenburg
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
- Frank Glorius
- Organisch-Chemisches Institut, University of Münster, Corrensstraße 36, 48149 Münster, Germany
11
Walter M, Webb SJ, Gillet VJ. Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features. J Chem Inf Model 2024; 64:3670-3688. [PMID: 38686880 PMCID: PMC11094726 DOI: 10.1021/acs.jcim.4c00127] [Received: 01/23/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]
Abstract
Neural network models have become a popular machine-learning technique for the toxicity prediction of chemicals. However, due to their complex structure, it is difficult to understand predictions made by these models which limits confidence. Current techniques to tackle this problem such as SHAP or integrated gradients provide insights by attributing importance to the input features of individual compounds. While these methods have produced promising results in some cases, they do not shed light on how representations of compounds are transformed in hidden layers, which constitute how neural networks learn. We present a novel technique to interpret neural networks which identifies chemical substructures in training data found to be responsible for the activation of hidden neurons. For individual test compounds, the importance of hidden neurons is determined, and the associated substructures are leveraged to explain the model prediction. Using structural alerts for mutagenicity from the Derek Nexus expert system as ground truth, we demonstrate the validity of the approach and show that model explanations are competitive with and complementary to explanations obtained from an established feature attribution method.
Affiliation(s)
- Moritz Walter
- Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
- Samuel J. Webb
- Lhasa Limited, Granary Wharf House, 2 Canal Wharf, Leeds LS11 5PY, U.K.
- Valerie J. Gillet
- Information School, University of Sheffield, The Wave, 2 Whitham Road, Sheffield S10 2AH, U.K.
12
Tian T, Li S, Fang M, Zhao D, Zeng J. MolSHAP: Interpreting Quantitative Structure-Activity Relationships Using Shapley Values of R-Groups. J Chem Inf Model 2024; 64:2236-2249. [PMID: 37584270 DOI: 10.1021/acs.jcim.3c00465] [Indexed: 08/17/2023]
Abstract
Optimizing the activities and properties of lead compounds is an essential step in the drug discovery process. Despite recent advances in machine learning-aided drug discovery, most of the existing methods focus on making predictions for the desired objectives directly while ignoring the explanations for predictions. Although several techniques can provide interpretations for machine learning-based methods such as feature attribution, there are still gaps between these interpretations and the principles commonly adopted by medicinal chemists when designing and optimizing molecules. Here, we propose an interpretation framework, named MolSHAP, for quantitative structure-activity relationship analysis by estimating the contributions of R-groups. Instead of attributing the activities to individual input features, MolSHAP regards the R-group fragments as the basic units of interpretation, which is in accordance with the fragment-based modifications in molecule optimization. MolSHAP is a model-agnostic method that can interpret activity regression models with arbitrary input formats and model architectures. Based on the evaluations of numerous representative activity regression models on a specially designed R-group ranking task, MolSHAP achieved significantly better interpretation power compared with other methods. In addition, we developed a compound optimization algorithm based on MolSHAP and illustrated the reliability of the optimized compounds using an independent case study. These results demonstrated that MolSHAP can provide a useful tool for accurately interpreting the quantitative structure-activity relationships and rationally optimizing the compound activities in drug discovery.
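Shapley values assign each R-group a share of the difference between the full molecule's prediction and a baseline, by averaging its marginal contribution over all coalitions of the other R-groups. The generic exact computation is sketched below with a toy additive-plus-interaction model standing in for a trained activity model. How MolSHAP builds coalitions (for example, what replaces an absent R-group) is not described in the abstract, so the value function here is hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value_fn):
    """Exact Shapley values; value_fn maps a frozenset of players to a number.

    Cost grows as 2**n, which is acceptable for the handful of R-group
    positions in a typical congeneric series.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(len(others) + 1):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                total += weight * (value_fn(coalition | {p}) - value_fn(coalition))
        phi[p] = total
    return phi


def toy_activity(present):
    """Hypothetical predicted pIC50 when only the R-groups in `present` are kept."""
    activity = 5.0
    if "R1" in present:
        activity += 1.0
    if "R2" in present:
        activity += 0.5
    if {"R1", "R2"} <= present:
        activity += 0.4  # interaction term, split evenly by Shapley
    return activity


phi = shapley_values(["R1", "R2"], toy_activity)
print({k: round(v, 3) for k, v in phi.items()})  # {'R1': 1.2, 'R2': 0.7}
```

The values sum to the full prediction minus the empty-coalition baseline (1.9 here), which is the efficiency property that makes R-group contributions directly comparable.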
Affiliation(s)
- Tingzhong Tian
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Shuya Li
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Meng Fang
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Dan Zhao
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
- Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing 100084, China
13
Hartog PBR, Krüger F, Genheden S, Tetko IV. Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition. J Cheminform 2024; 16:39. [PMID: 38576047 PMCID: PMC10993590 DOI: 10.1186/s13321-024-00824-1] [Received: 12/20/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024]
Abstract
Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference to compare differences between multiple representations of the same ground truth. In this study, we investigate the robustness of eight XAI methods using test-time augmentation for a molecular-representation model in the field of computational toxicity prediction. We report significant differences between explanations for different representations of the same ground truth, and show that randomized models have similar variance. We hypothesize that text-based molecular representations in this and past research reflect tokenization more than learned parameters. Furthermore, we see a greater variance between in-domain predictions than between out-of-domain predictions, indicating that XAI measures something other than learned parameters. Finally, we investigate the relative importance given to expert-derived structural alerts and find similar importance regardless of applicability domain, randomization and varying training procedures. We therefore caution future research against validating methods through such comparisons with human intuition without further investigation. SCIENTIFIC CONTRIBUTION: In this research we critically investigate XAI through test-time augmentation, contrasting previous assumptions about using expert validation and showing inconsistencies within models for identical representations. SMILES augmentation has been used to increase model accuracy, but here it was adapted from image test-time augmentation to serve as an independent indication of the consistency within SMILES-based molecular representation models.
Affiliation(s)
- Peter B R Hartog
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden.
- Institute of Structural Biology, Helmholtz Munich, Munich, 85764, Germany.
- Fabian Krüger
- Institute of Structural Biology, Helmholtz Munich, Munich, 85764, Germany
- Samuel Genheden
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, 431 83, Mölndal, Sweden
- Igor V Tetko
- Institute of Structural Biology, Helmholtz Munich, Munich, 85764, Germany
14
Kovalishyn V, Severin O, Kachaeva M, Kobzar O, Keith KA, Harden EA, Hartline CB, James SH, Vovk A, Brovarets V. In Silico Design and Experimental Validation of Novel Oxazole Derivatives Against Varicella zoster virus. Mol Biotechnol 2024; 66:707-717. [PMID: 36709460 DOI: 10.1007/s12033-023-00670-w] [Received: 09/30/2022] [Accepted: 01/14/2023] [Indexed: 01/30/2023]
Abstract
Varicella zoster virus (VZV) infection causes severe diseases such as chickenpox, shingles, and postherpetic neuralgia, often leading to disability. Reactivation of latent VZV is associated with declining specific cellular immunity in the elderly and in patients with immunodeficiency. Because existing therapies have limited efficacy and antiviral resistance is emerging, new and effective antiviral drugs are needed to treat diseases caused by VZV, particularly opportunistic infections. The goal of this work is to identify potent oxazole derivatives as anti-VZV agents by machine learning, followed by their synthesis and experimental validation. Predictive QSAR models were developed using the Online Chemical Modeling Environment (OCHEM). Data on compounds exhibiting antiviral activity were collected from ChEMBL and uploaded to the OCHEM database. The predictive ability of the models was tested by cross-validation, giving a coefficient of determination q² = 0.87-0.9. Validation on an external test set (q² = 0.83-0.84) shows that the models can predict the antiviral activity of newly designed and known compounds with reasonable accuracy within the applicability domain. The models were then applied to screen a virtual chemical library for compounds with expected activity against VZV. The seven most promising oxazole derivatives were identified, synthesized, and tested. Two of them showed activity against the VZV Ellen strain in primary in vitro antiviral screening. The synthesized compounds may represent an interesting starting point for further development of oxazole derivatives against VZV. The developed models are available online at OCHEM http://ochem.eu/article/145978 and can be used to virtually screen for potential compounds with anti-VZV activity.
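Cross-validated q² values such as those quoted above are commonly computed as q² = 1 - PRESS/SS_tot from out-of-fold predictions. The abstract does not give OCHEM's exact formula, so the following Python sketch assumes this standard definition; the activity values are hypothetical.

```python
from typing import Sequence


def q_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Cross-validated coefficient of determination, q^2 = 1 - PRESS / SS_tot.

    `predicted` must hold out-of-fold predictions, i.e. each value comes from
    a model that did not see that compound during training.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with >= 2 values")
    y_mean = sum(observed) / len(observed)
    # Predictive residual sum of squares.
    press = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total sum of squares around the mean of the observed values.
    ss_tot = sum((y - y_mean) ** 2 for y in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / ss_tot


# Hypothetical activities and out-of-fold predictions (illustrative only).
obs = [5.0, 6.0, 7.0, 8.0]
pred = [5.5, 6.0, 6.5, 8.0]
print(round(q_squared(obs, pred), 3))
```

With these toy numbers PRESS = 0.5 and SS_tot = 5.0, so q² = 0.9. For external test sets several q² variants exist (for example, using the training-set mean in the denominator), and the abstract does not state which one was used.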
Affiliation(s)
- Vasyl Kovalishyn
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Science of Ukraine, Kyiv, 02094, Ukraine.
- Oleksandr Severin
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Maryna Kachaeva
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Oleksandr Kobzar
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Kathy A Keith
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
- Emma A Harden
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
- Caroll B Hartline
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
- Scott H James
- Department of Pediatrics, Division of Pediatric Infectious Diseases, University of Alabama at Birmingham, Birmingham, Alabama, 35233, USA
- Andriy Vovk
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Volodymyr Brovarets
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of the National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
15
Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS DISCOVERY : ADVANCING LIFE SCIENCES R & D 2024; 29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]
Abstract
The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms for predicting the aqueous solubility of small molecules, using experimental data for 100K compounds. In total, a hundred teams took part in the challenge, which required classifying compounds as having low, medium or high solubility as measured by a nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) at https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus of 28 models calculated using descriptor-based and representation learning methods achieved the best score, higher than that of any individual approach or of consensus models developed within each individual approach. Combining diverse models decreased both the bias and the variance of the individual models and yielded the highest score. The Transformer CNN model achieved the best individual score, highlighting the power of Natural Language Processing (NLP) methods. Including information about aleatoric uncertainty would help contestants better understand and use the challenge data.
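The abstract does not specify how the 28 models were combined. The Python sketch below assumes the simplest consensus, an unweighted average of per-class probabilities over the three solubility classes, with three hypothetical models; it illustrates the idea and is not the openOCHEM implementation. Averaging tends to cancel errors that individual models do not share, which is one way a consensus lowers the variance of its members.

```python
from typing import Sequence

CLASSES = ("low", "medium", "high")


def consensus(prob_sets: Sequence[Sequence[float]]) -> tuple[str, list[float]]:
    """Average per-class probabilities over models and return the top class."""
    if not prob_sets:
        raise ValueError("need predictions from at least one model")
    if any(len(p) != len(CLASSES) for p in prob_sets):
        raise ValueError("each model must give one probability per class")
    n_models = len(prob_sets)
    # Unweighted mean probability for each class (assumed combination rule).
    averaged = [sum(p[k] for p in prob_sets) / n_models for k in range(len(CLASSES))]
    best = max(range(len(CLASSES)), key=lambda k: averaged[k])
    return CLASSES[best], averaged


# Hypothetical class probabilities for one compound from three models.
model_a = [0.6, 0.3, 0.1]
model_b = [0.2, 0.5, 0.3]
model_c = [0.3, 0.5, 0.2]
label, averaged = consensus([model_a, model_b, model_c])
print(label, [round(v, 2) for v in averaged])
```

Here the averaged probabilities are about 0.37, 0.43 and 0.20, so the consensus label is "medium" even though the first model alone would have predicted "low". Weighted averaging or stacking are common alternatives when some members are known to be stronger.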
Affiliation(s)
- Andrea Hunklinger
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Peter Hartog
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany
- Martin Šícho
- Leiden Academic Centre for Drug Research, Leiden University, 55 Einsteinweg, 2333 CC Leiden, the Netherlands; CZ-OPENSCREEN: National Infrastructure for Chemical Biology, Department of Informatics and Chemistry, Faculty of Chemical Technology, University of Chemistry and Technology Prague, Technická 5, 166 28, Prague, Czech Republic
- Guillaume Godin
- dsm-firmenich SA, Rue de la Bergère 7, CH-1242 Satigny, Switzerland
- Igor V Tetko
- Institute of Structural Biology, Molecular Targets and Therapeutics Center, Helmholtz Munich-Deutsches Forschungszentrum für Gesundheit und Umwelt (GmbH), DE-85764 Neuherberg, Germany; BIGCHEM GmbH, Valerystr. 49, DE-85716 Unterschleißheim, Germany.
16
Shen T, Li S, Wang XS, Wang D, Wu S, Xia J, Zhang L. Deep reinforcement learning enables better bias control in benchmark for virtual screening. Comput Biol Med 2024; 171:108165. [PMID: 38402838 DOI: 10.1016/j.compbiomed.2024.108165] [Received: 11/01/2023] [Revised: 02/07/2024] [Accepted: 02/14/2024] [Indexed: 02/27/2024]
Abstract
Virtual screening (VS) has been incorporated into the paradigm of modern drug discovery. The field is now undergoing a new wave of revolution driven by artificial intelligence and, more specifically, machine learning (ML). However, the off-the-shelf datasets available for model training and benchmarking are limited in data volume and applicability domain, and they suffer from the biases repeatedly reported in ML applications. To address these issues, we present a novel benchmark named MUBDsyn. The main feature of MUBDsyn is its use of synthetic decoys (i.e., presumed inactives), with deep reinforcement learning leveraged for bias control during decoy generation. We then carried out extensive validations of this new benchmark. First, we confirmed that MUBDsyn was superior to the classical benchmarks in controlling domain bias, artificial enrichment bias and analogue bias. Moreover, analysis of asymmetric validation embedding bias revealed that assessing ML models on MUBDsyn was less biased. In addition, MUBDsyn posed a more suitable benchmarking challenge for deep learning models than NRLiSt-BDB. Overall, we have shown that MUBDsyn is a close-to-ideal benchmark for VS. The computational tool is publicly available, allowing MUBDsyn to be easily extended.
Affiliation(s)
- Tao Shen
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China
- Shan Li
- College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China
- Xiang Simon Wang
- Artificial Intelligence and Drug Discovery Core Laboratory for District of Columbia Center for AIDS Research (DC CFAR), Department of Pharmaceutical Sciences, College of Pharmacy, Howard University, USA
- Dongmei Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.
- Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.
- Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, 100050, China.
- Liangren Zhang
- State Key Laboratory of Natural and Biomimetic Drugs, School of Pharmaceutical Sciences, Peking University, Beijing, 100191, China
17
Nwadiugwu M, Onwuekwe I, Ezeanolue E, Deng H. Beyond Amyloid: A Machine Learning-Driven Approach Reveals Properties of Potent GSK-3β Inhibitors Targeting Neurofibrillary Tangles. Int J Mol Sci 2024; 25:2646. [PMID: 38473895 PMCID: PMC10931970 DOI: 10.3390/ijms25052646] [Received: 01/13/2024] [Revised: 02/16/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024]
Abstract
Current treatments for Alzheimer's disease (AD) focus on slowing memory and cognitive decline, but none offer curative outcomes. This study aims to explore and curate the common properties of active, drug-like molecules that modulate glycogen synthase kinase 3β (GSK-3β), a well-documented kinase whose increased activity is implicated in tau hyperphosphorylation and neurofibrillary tangles, hallmarks of AD pathology. Leveraging quantitative structure-activity relationship (QSAR) data from the PubChem and ChEMBL databases, we employed seven machine learning models: logistic regression (LogR), k-nearest neighbors (KNN), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), neural networks (NNs), and ensemble majority voting. Our goal was to correctly classify compounds as active or inactive GSK-3β inhibitors and to identify their key properties. Among the six individual models, the NN demonstrated the highest performance, with an AUC-ROC of 79% on unbalanced external validation data, while the SVM model was superior in accurately classifying the compounds. The SVM and RF models surpassed the NN in terms of Kappa values, and the ensemble majority voting model achieved slightly better accuracy than the NN on the external validation data. Feature importance analysis revealed that hydrogen bonds, phenol groups, and specific electronic characteristics are important molecular descriptor features that correlate positively with active GSK-3β inhibition. Conversely, structural features such as imidazole rings, sulfides, and methoxy groups correlated negatively. Our study highlights the significance of structural, electronic, and physicochemical descriptors in screening for active candidates against GSK-3β. These predictive features could help elucidate the important properties of candidate GSK-3β inhibitors, which may benefit non-amyloid-based AD treatments targeting neurofibrillary tangles.
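Ensemble majority voting over the six individual classifiers can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: labels are binary (1 = active, 0 = inactive), each model casts one vote per compound, ties (possible with an even number of models) are resolved as inactive, and all predictions are hypothetical.

```python
from typing import Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> list[int]:
    """Hard majority vote over binary labels (1 = active, 0 = inactive).

    predictions[m][i] is model m's label for compound i. A compound is called
    active only with a strict majority; ties count as inactive (assumption).
    """
    if not predictions:
        raise ValueError("need at least one model")
    n_models = len(predictions)
    n_compounds = len(predictions[0])
    if any(len(p) != n_compounds for p in predictions):
        raise ValueError("all models must label the same compounds")
    # Number of 'active' votes each compound receives.
    votes = [sum(p[i] for p in predictions) for i in range(n_compounds)]
    return [1 if 2 * v > n_models else 0 for v in votes]


# Hypothetical labels for four compounds from the six individual models.
models = {
    "LogR": [1, 0, 1, 0],
    "KNN": [1, 0, 0, 1],
    "RF": [1, 1, 1, 0],
    "SVM": [1, 0, 1, 1],
    "XGB": [0, 0, 1, 0],
    "NN": [1, 1, 0, 1],
}
print(majority_vote(list(models.values())))
```

The fourth compound receives three of six votes, so under the assumed tie rule it is labeled inactive. Using an odd number of voters, or soft voting on predicted probabilities, avoids such ties.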
Affiliation(s)
- Martin Nwadiugwu
- Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Ikenna Onwuekwe
- Neurology Unit, Department of Medicine, University of Nigeria Teaching Hospital, Ituku-Ozalla 400001, Enugu, Nigeria
- Department of Medicine, College of Medicine, University of Nigeria, Enugu Campus, Nsukka 400001, Enugu, Nigeria
- Echezona Ezeanolue
- Center for Translation and Implementation Research (CTAIR), University of Nigeria, Nsukka 410001, Enugu, Nigeria
- Healthy Sunrise Foundation, Las Vegas, NV 89107, USA
- Hongwen Deng
- Tulane Center for Biomedical Informatics and Genomics, Deming Department of Medicine, Tulane University School of Medicine, Tulane University, New Orleans, LA 70112, USA
18
Shen T, Guo J, Han Z, Zhang G, Liu Q, Si X, Wang D, Wu S, Xia J. AutoMolDesigner for Antibiotic Discovery: An AI-Based Open-Source Software for Automated Design of Small-Molecule Antibiotics. J Chem Inf Model 2024; 64:575-583. [PMID: 38265916 DOI: 10.1021/acs.jcim.3c01562] [Indexed: 01/26/2024]
Abstract
Discovery of small-molecule antibiotics with novel chemotypes is one of the essential strategies for addressing antibiotic resistance. Although a considerable number of computational tools for molecular design have been reported, there is a shortage of holistic and efficient tools developed specifically for small-molecule antibiotic discovery. To address this issue, we report AutoMolDesigner, a computational modeling software dedicated to small-molecule antibiotic design. It is a generalized framework comprising two functional modules, i.e., molecular generation enabled by generative deep learning and antibacterial activity/property prediction based on automated machine learning, with individually trained models and curated datasets available out of the box for whole-cell-based antibiotic screening and design. It is open-source, allowing new features to be incorporated for flexible use. Unlike most such software, which runs on Linux from the command line, this application is equipped with a Qt-based graphical user interface and can be run on personal computers with multiple operating systems, making it much easier for experimental scientists to use. The software and related materials are freely available at GitHub (https://github.com/taoshen99/AutoMolDesigner) and Zenodo (https://zenodo.org/record/10097899).
Affiliation(s)
- Tao Shen
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Jiale Guo
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Zunsheng Han
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Gao Zhang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Qingxin Liu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
- Xinxin Si
- School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
- Dongmei Wang
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Song Wu
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
- Jie Xia
- State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
19
Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024; 15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024]
Abstract
There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationships. Both aim to obtain a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The pharmaceutical sciences have also benefited significantly from multiple applications of artificial intelligence, especially in drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning and drug repurposing, among other tasks. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audio, video and novel chemical molecules. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via the de novo drug design approach. Various basic and advanced models are discussed, along with their recent applications. The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness its potential for generating novel drug molecules faster and more affordably. Some clinical-level assets generated by generative artificial intelligence are also discussed to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.
Affiliation(s)
- Amit Gangwal
- Department of Natural Product Chemistry, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Azim Ansari
- Computer Aided Drug Design Center, Shri Vile Parle Kelavani Mandal’s Institute of Pharmacy, Dhule, Maharashtra, India
- Iqrar Ahmad
- Department of Pharmaceutical Chemistry, Prof. Ravindra Nikam College of Pharmacy, Dhule, India
- Abul Kalam Azad
- Faculty of Pharmacy, University College of MAIWP International, Batu Caves, Malaysia
- Vinoth Kumarasamy
- Department of Parasitology and Medical Entomology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Cheras, Malaysia
- Vetriselvan Subramaniyan
- Pharmacology Unit, Jeffrey Cheah School of Medicine and Health Sciences, Monash University Malaysia, Selangor, Malaysia
- School of Bioengineering and Biosciences, Lovely Professional University, Phagwara, Punjab, India
- Ling Shing Wong
- Faculty of Health and Life Sciences, INTI International University, Nilai, Malaysia
20
Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. ENVIRONMENTAL POLLUTION (BARKING, ESSEX : 1987) 2024; 342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]
Abstract
The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model from showing notable advantages on the molecular graphs. Considering both computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) with molecular descriptors was chosen to build reliable QSAR-ML models for predicting chronic toxicity from small- or medium-sized tabular data. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, which covers most environmental scenarios. Model interpretation identified SlogP and exposure duration as the primary influential factors. SlogP, which represents the logarithm of a molecule's partition coefficient between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. The exposure duration also played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model established in this research. Our study provides a robust and feasible methodology for evaluating the chronic ecological risk of various types of pollutants and could increase the use of ML applications in environmental fields.
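The abstract does not describe how its kernel density estimation was set up. One plausible use, shown here purely as an assumption, is to estimate where the toxicity values are concentrated on a log10 concentration scale. The Python sketch uses a Gaussian kernel with a hand-picked bandwidth of 0.5 log10 units; both choices and the data are hypothetical.

```python
import math
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Return a Gaussian kernel density estimate f(x) built from `samples`."""
    if not samples or bandwidth <= 0:
        raise ValueError("need samples and a positive bandwidth")
    n = len(samples)
    # Normalising constant so that the estimate integrates to 1.
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Sum of one Gaussian bump centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


# Hypothetical log10 concentrations (mg/L) of test compounds; illustrative only.
log10_conc = [-3.0, -2.2, -1.5, -1.0, 0.0, 0.4, 1.1, 2.0]
density = gaussian_kde(log10_conc, bandwidth=0.5)
for x in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"log10(C) = {x:+.1f}: density = {density(x):.3f}")
```

Regions where the density is high are where the model has seen most data; the bandwidth strongly shapes the curve and is normally chosen by a rule of thumb (such as Scott's or Silverman's) or by cross-validation rather than fixed by hand.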
Affiliation(s)
- Lang Lei
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Liangmao Zhang
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Zhibang Han
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Qirui Chen
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Pengcheng Liao
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
- Dong Wu
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Jun Tai
- Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
- Bing Xie
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
- Yinglong Su
- Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China.
21
Song Z, Chen J, Cheng J, Chen G, Qi Z. Computer-Aided Molecular Design of Ionic Liquids as Advanced Process Media: A Review from Fundamentals to Applications. Chem Rev 2024; 124:248-317. [PMID: 38108629 DOI: 10.1021/acs.chemrev.3c00223] [Indexed: 12/19/2023]
Abstract
The unique physicochemical properties, flexible structural tunability, and giant chemical space of ionic liquids (ILs) give them great potential to be tailored to different target properties for use as advanced process media. The crux of the matter is how to efficiently and reliably tailor suitable ILs to a specific application. In this regard, the computer-aided molecular design (CAMD) approach has been widely adapted to this family of high-profile chemicals, an approach termed computer-aided IL design (CAILD). This review discusses the past developments that have contributed to the state of the art of CAILD and offers a perspective on how future work could accelerate the practical application of ILs. In the broad context of CAILD, key aspects of the forward structure-property modeling and the reverse molecular design of ILs are reviewed. For the forward task, diverse IL molecular representations, modeling algorithms, and representative models for the physical, thermodynamic, and other properties of ILs are introduced. For the reverse task, representative works formulating different molecular design scenarios are summarized. Finally, beyond the substantial progress made, some perspectives on moving CAILD a step forward are provided.
Affiliation(s)
- Zhen Song
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Jiahui Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Jie Cheng
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Guzhong Chen
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
- Zhiwen Qi
- State Key Laboratory of Chemical Engineering, School of Chemical Engineering, East China University of Science and Technology, 130 Meilong Road, Shanghai 200237, China
22
Gryniukova A, Borysko P, Myziuk I, Alieksieieva D, Hodyna D, Semenyuta I, Kovalishyn V, Metelytsia L, Rogalsky S, Tcherniuk S. Anticancer activity features of imidazole-based ionic liquids and lysosomotropic detergents: in silico and in vitro studies. Mol Divers 2024:10.1007/s11030-023-10779-4. [PMID: 38246950 DOI: 10.1007/s11030-023-10779-4] [Received: 11/08/2023] [Accepted: 11/20/2023] [Indexed: 01/23/2024]
Abstract
Long-chain imidazole-based ionic liquids (compounds 2, 4, 9) and lysosomotropic detergents (compounds 7, 3, 8) with potent anticancer activity were synthesized. Their inhibitory activities against neuroblastoma and leukaemia cell lines were predicted by new in silico QSAR models. The cytotoxic activities of the synthesized imidazole derivatives were investigated on the SK-N-DZ (human neuroblastoma) and K-562 (human chronic myeloid leukaemia) cell lines. Compounds 2 and 7 showed the highest in vitro cytotoxic effect on both cancer cell lines. Docking of compounds 2 and 7 into the NAD+ coenzyme binding site of the deacetylase Sirtuin-1 (SIRT1) showed the formation of protein-ligand complexes with calculated binding energies of -8.0 and -8.1 kcal/mol, respectively. The interaction of SIRT1 with compounds 2, 7 and 9 and the interaction of bromodomain-containing protein 4 (BRD4) with compounds 7 and 9 were also demonstrated by thermal shift assay. Compounds 2, 4, 7 and 9 inhibited SIRT1 deacetylase activity in the SIRT-Glo assay. Compounds 7 and 9 showed moderate inhibitory activity against Aurora kinase A. In addition, compounds 3, 4, 8 and 9 inhibited Janus kinase 2 activity. These results show that long-chain imidazole derivatives exhibit cytotoxic activity against the K-562 leukaemia and SK-N-DZ neuroblastoma cell lines. Furthermore, these compounds inhibited a panel of molecular targets involved in leukaemia and neuroblastoma tumorigenesis. Taken together, these results suggest that both long-chain imidazole-based ionic liquids and lysosomotropic detergents may be an effective alternative for the treatment of neuroblastoma and chronic myeloid leukaemia and merit further investigation.
Affiliation(s)
- Anastasiia Gryniukova
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str, Kyiv, 02094, Ukraine
- Bienta/Enamine Ltd, 78 Winston Churchill Str, Kyiv, 02094, Ukraine
| | - Petro Borysko
- Bienta/Enamine Ltd, 78 Winston Churchill Str, Kyiv, 02094, Ukraine
| | - Iryna Myziuk
- Bienta/Enamine Ltd, 78 Winston Churchill Str, Kyiv, 02094, Ukraine
| | | | - Diana Hodyna
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str, Kyiv, 02094, Ukraine
| | - Ivan Semenyuta
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str, Kyiv, 02094, Ukraine
| | - Vasyl Kovalishyn
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str, Kyiv, 02094, Ukraine
| | - Larysa Metelytsia
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 1 Academician Kukhar Str, Kyiv, 02094, Ukraine
| | - Sergiy Rogalsky
- Laboratory of Modification of Polymers, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, National Academy of Science of Ukraine, 50 Kharkivske shose, Kyiv, 02160, Ukraine.
| | - Sergey Tcherniuk
- IdeSip, 4 Rue Pierre Fontaine, 91058, Évry-Courcouronnes, France.
- Department of Biological Sciences, Youth Academy of Sciences, 2 Nemyrovych-Danchenko Str, Kyiv, 01011, Ukraine.
| |
Collapse
|
23
|
Siramshetty VB, Xu X, Shah P. Artificial Intelligence in ADME Property Prediction. Methods Mol Biol 2024; 2714:307-327. [PMID: 37676606 DOI: 10.1007/978-1-0716-3441-7_17] [Indexed: 09/08/2023]
Abstract
Absorption, distribution, metabolism, and excretion (ADME) are key properties of a small molecule that govern its pharmacokinetic profile and affect its efficacy and safety. Computational methods such as machine learning and artificial intelligence have attracted significant interest in both academic and industrial settings for predicting the pharmacokinetic properties of small molecules. In drug discovery, these methods are applied to optimize chemical libraries, prioritize hits from biological screens, and optimize the ADME properties of lead molecules. In recent years, the drug discovery community has adopted a range of neural network architectures, such as deep neural networks, recurrent neural networks, graph neural networks, and transformer neural networks, which marked a paradigm shift in computer-aided drug design and development. This chapter discusses recent developments, with an emphasis on their application to ADME property prediction.
Affiliation(s)
- Vishal B Siramshetty
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Department of Safety Assessment, Genentech, Inc., South San Francisco, CA, USA
- Xin Xu
- National Center for Advancing Translational Sciences, Rockville, MD, USA
- Pranav Shah
- National Center for Advancing Translational Sciences, Rockville, MD, USA.
24
Li Y, Cardoso-Silva J, Kelly JM, Delves MJ, Furnham N, Papageorgiou LG, Tsoka S. Optimisation-based modelling for explainable lead discovery in malaria. Artif Intell Med 2024; 147:102700. [PMID: 38184363 DOI: 10.1016/j.artmed.2023.102700] [Received: 03/21/2023] [Revised: 10/17/2023] [Accepted: 10/29/2023] [Indexed: 01/08/2024]
Abstract
BACKGROUND The search for new antimalarial treatments is urgent because of growing resistance to existing therapies. The Open Source Malaria (OSM) project offers a promising starting point, having extensively screened various compounds for their effectiveness. Further analysis of the chemical space surrounding these compounds could provide a route to innovative drugs. METHODS We report an optimisation-based method for quantitative structure-activity relationship (QSAR) modelling that provides explainable modelling of ligand activity through a mathematical programming formulation. The methodology is based on piecewise regression principles and offers optimal detection of breakpoint features, efficient allocation of samples into distinct sub-groups based on breakpoint feature values, and insightful regression coefficients. Analysis of OSM antimalarial compounds yields interpretable results through rules generated by the model that reflect the contribution of individual fingerprint fragments to ligand activity prediction. Using knowledge of fragment prioritisation and screening of commercially available compound libraries, potential lead compounds for antimalarials are identified and evaluated experimentally via a Plasmodium falciparum asexual growth inhibition assay (PfGIA) and a human cell cytotoxicity assay. CONCLUSIONS Three compounds are identified as potential leads for antimalarials using this methodology. This work illustrates how explainable predictive models based on mathematical optimisation can pave the way towards more efficient fragment-based lead discovery, as applied here to malaria.
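The breakpoint idea can be illustrated with a deliberately simplified sketch. Assuming a single numeric feature (the paper works with fingerprint features and a mathematical programming formulation, which is not reproduced here), the code below scans every observed feature value as a candidate breakpoint, splits the samples into two sub-groups, fits a separate least-squares line to each, and keeps the breakpoint with the lowest total squared error. The names `fit_line` and `best_breakpoint` and the toy data are hypothetical.

```python
# Simplified one-breakpoint piecewise linear regression (illustrative only).
# Assumption: a single numeric feature and an exhaustive scan over breakpoints;
# the cited paper instead solves a mathematical programming formulation.
from typing import Optional, Sequence, Tuple

Fit = Tuple[float, float, float]  # (intercept, slope, sum of squared errors)


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Fit:
    """Ordinary least squares for y = a + b*x on one sub-group."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # flat line if x is constant
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def best_breakpoint(
    xs: Sequence[float], ys: Sequence[float], min_size: int = 2
) -> Optional[Tuple[float, float, Fit, Fit]]:
    """Return (total SSE, breakpoint, left fit, right fit) minimizing total SSE.

    Samples with x <= breakpoint form the left sub-group; the rest form the
    right sub-group. Each sub-group gets its own regression coefficients.
    Returns None if no breakpoint leaves min_size samples on both sides.
    """
    best = None
    for bp in sorted(set(xs)):
        left = [(x, y) for x, y in zip(xs, ys) if x <= bp]
        right = [(x, y) for x, y in zip(xs, ys) if x > bp]
        if len(left) < min_size or len(right) < min_size:
            continue
        left_fit = fit_line(*zip(*left))
        right_fit = fit_line(*zip(*right))
        total = left_fit[2] + right_fit[2]
        if best is None or total < best[0]:
            best = (total, bp, left_fit, right_fit)
    return best


if __name__ == "__main__":
    # Toy data with a jump after x = 3 (hypothetical, not OSM data).
    xs = [0, 1, 2, 3, 4, 5, 6, 7]
    ys = [1, 1, 1, 1, 10, 12, 14, 16]
    total, bp, left, right = best_breakpoint(xs, ys)
    print(f"breakpoint={bp}, SSE={total:.3f}, left={left[:2]}, right={right[:2]}")
```

This brute-force scan stays cheap only for one feature; the approach described in the abstract also selects which feature carries the breakpoint, which a loop over every feature and value would make expensive.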
Affiliation(s)
- Yutong Li
- Department of Informatics, King's College London, Bush House, London, WC2B 4BG, UK
- Jonathan Cardoso-Silva
- Data Science Institute, London School of Economics and Political Science, Houghton St, London, WC2A 2AE, UK
- John M Kelly
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Michael J Delves
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Nicholas Furnham
- Department of Infection Biology, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, UK
- Lazaros G Papageorgiou
- The Sargent Centre for Process Systems Engineering, Department of Chemical Engineering, University College London, Torrington Place, London, WC1E 7JE, UK
- Sophia Tsoka
- Department of Informatics, King's College London, Bush House, London, WC2B 4BG, UK.
25
Pérez-Correa I, Giunta PD, Mariño FJ, Francesconi JA. Transformer-Based Representation of Organic Molecules for Potential Modeling of Physicochemical Properties. J Chem Inf Model 2023; 63:7676-7688. [PMID: 38062559 DOI: 10.1021/acs.jcim.3c01548] [Indexed: 12/26/2023]
Abstract
In this work, we study the use of three configurations of an autoencoder (AE) neural network to process organic substances, with the aim of generating meaningful molecular descriptors that can be employed to develop property prediction models. A total of 18,322,500 compounds represented as SMILES strings were used to train the model, demonstrating that a latent space of 24 units can adequately reconstruct the data. After AE training, the latent space was analyzed in terms of compound similarity, indicating that this space has the properties desired for developing models that forecast the physical properties of organic compounds. As a final step, a QSPR model was developed to predict the boiling point of chemical substances from the AE descriptors. A total of 5276 substances were used for the regression task, and the predictive ability was compared with that of models from the literature evaluated on the same database. The final AE model has an overall error of 1.40% (1.39% with augmented SMILES) in predicting the boiling temperature, whereas the other models have errors between 2.0 and 3.2%. This shows that the SMILES-derived representation is comparable to, and even outperforms, the state-of-the-art representations widely used in the literature.
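If the reported "overall error" is a mean absolute percentage error (an assumption; the abstract does not name the metric), it can be computed as in the sketch below. The helper `mape` is hypothetical, and the toy values are not from the cited data set.

```python
# Mean absolute percentage error (MAPE) for boiling-point predictions.
# Assumption: the abstract's "overall error" is read here as MAPE; the paper
# may define its error metric differently.
from typing import Sequence


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when a true value is zero")
    errors = (abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(errors) / len(y_true)


if __name__ == "__main__":
    # The same 5-degree miss expressed on two temperature scales
    # (hypothetical values, not from the cited data set).
    print(mape([400.0], [405.0]))    # reference value in Kelvin
    print(mape([126.85], [131.85]))  # the same temperature in Celsius
```

The two calls give roughly 1.25% and 3.9% for the same 5-degree miss, because a percentage error scales with the magnitude of the reference value. Percentage errors for boiling points are therefore only comparable between models evaluated on the same temperature scale and data set.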
Affiliation(s)
- Ignacio Pérez-Correa
- Instituto de Tecnologías del Hidrógeno y Energías Sostenibles (ITHES), UBA-CONICET, Ciudad Universitaria, Intendente Güiraldes 2160, Ciudad de Buenos Aires C1428EGA, Argentina
- Pablo D Giunta
- Instituto de Tecnologías del Hidrógeno y Energías Sostenibles (ITHES), UBA-CONICET, Ciudad Universitaria, Intendente Güiraldes 2160, Ciudad de Buenos Aires C1428EGA, Argentina
- Fernando J Mariño
- Instituto de Tecnologías del Hidrógeno y Energías Sostenibles (ITHES), UBA-CONICET, Ciudad Universitaria, Intendente Güiraldes 2160, Ciudad de Buenos Aires C1428EGA, Argentina
- Javier A Francesconi
- Centro de Investigación y Desarrollo en Tecnología de Alimentos (CIDTA), UTN-FRRo, Estanislao Zeballos 1341, Rosario S2000BQA, Argentina
26
Sandhu H, Garg P. Machine Learning Enables Accurate Prediction of Quinone Formation during Drug Metabolism. Chem Res Toxicol 2023; 36:1876-1890. [PMID: 37885227 DOI: 10.1021/acs.chemrestox.3c00162] [Indexed: 10/28/2023]
Abstract
Metabolism helps eliminate drugs from the human body by making them more hydrophilic. Sometimes, however, drugs are bioactivated to highly reactive metabolites or intermediates during metabolism, and these reactive metabolites are often responsible for drug-associated toxicities. Identifying the reactive metabolites of drug candidates can therefore be very helpful in the early stages of drug discovery. Quinones are soft electrophiles that are generated as reactive intermediates during metabolism, and they make up more than 40% of reactive metabolites. In this work, a reliable data set of 510 molecules was used to develop machine learning and deep learning-based models that predict the formation of quinone-type metabolites. Molecules were represented with two-dimensional (2D) descriptors, PubChem fingerprints, electrotopological state (E-state) fingerprints, and metabolic reactivity-based descriptors. The developed models were compared with the existing Xenosite web server on an untouched test set of 102 molecules. The best model achieved an accuracy of 86.27%, whereas the Xenosite server achieved only 52.94% on the test set. Descriptor analysis revealed that a greater number of polar moieties in a molecule can prevent the formation of quinone-type metabolites. In addition, the presence of a nitrogen atom in an aromatic ring and of the metabolophores V51, V52, and V53 (SMARTCyp descriptors) decreases the probability of quinone formation. Finally, a tool based on the best machine learning models was developed and is accessible at http://14.139.57.41/quinonepred/.
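The reported percentages can be sanity-checked against the size of the untouched test set. Assuming accuracy is simply the fraction of the 102 test molecules classified correctly, the sketch below finds the integer counts consistent with each two-decimal percentage; `accuracy_percent` and `counts_matching` are illustrative names, not part of the cited tool.

```python
# Back-of-envelope check of the reported test-set accuracies.
# Assumption: accuracy = correct predictions / 102 test molecules. The counts
# printed below are inferred from the percentages, not quoted from the paper.
from typing import List

TEST_SET_SIZE = 102


def accuracy_percent(correct: int, total: int = TEST_SET_SIZE) -> float:
    """Accuracy as a percentage of the test set."""
    return 100.0 * correct / total


def counts_matching(reported: float, total: int = TEST_SET_SIZE) -> List[int]:
    """All integer counts whose accuracy rounds to the reported 2-decimal value."""
    return [c for c in range(total + 1)
            if round(accuracy_percent(c, total), 2) == reported]


if __name__ == "__main__":
    for label, pct in [("best model", 86.27), ("Xenosite", 52.94)]:
        print(label, pct, "->", counts_matching(pct), "of", TEST_SET_SIZE)
```

Under that assumption, 86.27% corresponds to 88 of 102 molecules and 52.94% to 54 of 102, a gap of 34 molecules; these counts are inferred rather than stated in the abstract.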
Affiliation(s)
- Hardeep Sandhu
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar 160062, Punjab, India
- Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar 160062, Punjab, India
27
Ali H, Qureshi R, Shah Z. Artificial Intelligence-Based Methods for Integrating Local and Global Features for Brain Cancer Imaging: Scoping Review. JMIR Med Inform 2023; 11:e47445. [PMID: 37976086 PMCID: PMC10692876 DOI: 10.2196/47445] [Received: 03/20/2023] [Revised: 07/02/2023] [Accepted: 07/12/2023] [Indexed: 11/19/2023]
Abstract
BACKGROUND Transformer-based models are gaining popularity in medical imaging and cancer imaging applications. Many recent studies have demonstrated the use of transformer-based models for brain cancer imaging applications such as diagnosis and tumor segmentation. OBJECTIVE This study aims to review how different vision transformers (ViTs) have contributed to advancing brain cancer diagnosis and tumor segmentation using brain image data. It examines the different architectures developed to enhance brain tumor segmentation and explores how ViT-based models augmented the performance of convolutional neural networks for brain cancer imaging. METHODS This review performed the study search and selection following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. The search covered 4 popular scientific databases: PubMed, Scopus, IEEE Xplore, and Google Scholar. The search terms were formulated to cover the interventions (ie, ViTs) and the target application (ie, brain cancer imaging). Title and abstract screening for study selection was performed independently by 2 reviewers and validated by a third reviewer. Data extraction was performed by 2 reviewers and validated by a third reviewer. Finally, the data were synthesized using a narrative approach. RESULTS Of the 736 retrieved studies, 22 (3%) were included in this review. These studies were published in 2021 and 2022. The most commonly addressed task in these studies was tumor segmentation using ViTs. No study reported early detection of brain cancer. Among the different ViT architectures, Shifted Window transformer-based architectures have recently become the most popular choice in the research community. Among the included architectures, UNet transformer and TransUNet had the highest number of parameters and thus needed a cluster of as many as 8 graphics processing units for model training.
The brain tumor segmentation challenge data set was the most popular data set used in the included studies. ViT was used in different combinations with convolutional neural networks to capture both the global and local context of the input brain imaging data. CONCLUSIONS It can be argued that the computational complexity of transformer architectures is a bottleneck in advancing the field and enabling clinical transformations. This review provides the current state of knowledge on the topic, and the findings of this review will be helpful for researchers in the field of medical artificial intelligence and its applications in brain cancer.
Affiliation(s)
- Hazrat Ali
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Rizwan Qureshi
- Department of Imaging Physics, MD Anderson Cancer Center, University of Texas, Houston, TX, United States
- Zubair Shah
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
28
Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023; 24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]
Abstract
Artificial intelligence (AI) has gained significant traction in the field of drug discovery, with deep learning (DL) algorithms playing a crucial role in predicting protein-ligand binding affinities. Despite advances in neural network architectures, system representation, and training techniques, the performance of DL affinity prediction has reached a plateau, raising the question of whether the problem is truly solved or whether current performance is overly optimistic and reliant on biased, easily predictable data. As with other DL-related problems, this issue seems to stem from the training and test sets used to build the models. In this work, we investigate the impact of several input-data parameters on the performance of neural network affinity prediction models. Notably, we identify the size of the binding pocket as a critical factor influencing the performance of our statistical models; furthermore, training a model with as much data as possible is more important than restricting training to only high-quality datasets. Finally, we confirm the bias in the test sets currently in typical use. Therefore, several types of evaluation and benchmarking are required to understand models' decision-making processes and to compare model performance accurately.
Affiliation(s)
- Pierre-Yves Libouban
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France
- Samia Aci-Sèche
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France
- Jose Carlos Gómez-Tamayo
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium
- Gary Tresadern
- Computational Chemistry, Janssen Research & Development, Janssen Pharmaceutica N. V., B-2340 Beerse, Belgium
- Pascal Bonnet
- Institute of Organic and Analytical Chemistry (ICOA), UMR7311, Université d’Orléans, CNRS, Pôle de Chimie rue de Chartres, 45067 Orléans, CEDEX 2, France
29
Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023; 22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Accepted: 07/20/2023] [Indexed: 09/13/2023]
Abstract
Developments in computational omics technologies have provided new means to access the hidden diversity of natural products, unearthing new potential for drug discovery. In parallel, artificial intelligence approaches such as machine learning have led to exciting developments in the computational drug design field, facilitating biological activity prediction and de novo drug design for molecular targets of interest. Here, we describe current and future synergies between these developments to effectively identify drug candidates from the plethora of molecules produced by nature. We also discuss how to address key challenges in realizing the potential of these synergies, such as the need for high-quality datasets to train deep learning algorithms and appropriate strategies for algorithm validation.
Affiliation(s)
- Katherine R Duncan
- Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
- Somayah S Elsayed
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Neha Garg
- School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
- Justin J J van der Hooft
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
- Nathaniel I Martin
- Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
- David Meijer
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Barbara R Terlouw
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Friederike Biermann
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Kai Blin
- The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
- Marina Gorostiola González
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- ONCODE institute, Leiden, The Netherlands
- Eric J N Helfrich
- Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany
- LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
- Florian Huber
- Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
- Stefan Leopold-Messer
- Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
- Kohulan Rajan
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
- Tristan de Rond
- School of Chemical Sciences, University of Auckland, Auckland, New Zealand
- Jeffrey A van Santen
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
- Maria Sorokina
- Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany
- Pharmaceuticals R&D, Bayer AG, Berlin, Germany
- Marcy J Balunas
- Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA
- Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
- Mehdi A Beniddir
- Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
- Doris A van Bergeijk
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Laura M Carroll
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
- Chase M Clark
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Chao Du
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Francesca Grisoni
- Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands
- Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
- Willem Jespers
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
- Olga V Kalinina
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- Hyunwoo Kim
- College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
- Tiago F Leao
- Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
- Joleen Masschelein
- Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium
- Department of Biology, KU Leuven, Heverlee, Belgium
- Evan R Rees
- Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
- Raphael Reher
- Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany
- Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
- Daniel Reker
- Department of Biomedical Engineering, Duke University, Durham, NC, USA
- Duke Microbiome Center, Duke University, Durham, NC, USA
- Philippe Schwaller
- Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- Michael A Skinnider
- Adapsyn Bioscience, Hamilton, Ontario, Canada
- Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
- Allison S Walker
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
- Egon L Willighagen
- Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
- Barbara Zdrazil
- European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
- Nadine Ziemert
- Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
- Pierre Guyomard
- Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
- Andrea Volkamer
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
- William H Gerwick
- Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun Uk Kim
- Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
- Rolf Müller
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany
- Department of Pharmacy, Saarland University, Saarbrücken, Germany
- German Center for Infection Research (DZIF), Braunschweig, Germany
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
- Gilles P van Wezel
- Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
- Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
- Gerard J P van Westen
- Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands.
- Anna K H Hirsch
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany.
- Department of Pharmacy, Saarland University, Saarbrücken, Germany.
- German Center for Infection Research (DZIF), Braunschweig, Germany.
- Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany.
- Roger G Linington
- Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada.
- Serina L Robinson
- Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland.
- Marnix H Medema
- Bioinformatics Group, Wageningen University, Wageningen, The Netherlands.
- Institute of Biology, Leiden University, Leiden, The Netherlands.
30
Zhao X, Kong Y, Ji Y, Xin X, Chen L, Chen G, Yu C. Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis. Mol Divers 2023:10.1007/s11030-023-10735-2. [PMID: 37910346 DOI: 10.1007/s11030-023-10735-2] [Received: 07/14/2023] [Accepted: 09/22/2023] [Indexed: 11/03/2023]
Abstract
Tropomyosin receptor kinases (TRKs) are important broad-spectrum anticancer targets. Oncogenic rearrangement of the NTRK gene disrupts the extracellular domain and the epitopes for therapeutic antibodies, making small-molecule inhibitors essential for treating NTRK fusion-driven tumors. In this work, several algorithms were used to construct descriptor-based and non-descriptor-based models, which were evaluated by outer 10-fold cross-validation. To find a model with good generalization ability, the dataset was partitioned by random and cluster-based splitting to construct in-domain and cross-domain models, respectively. Among the 48 models built, the model combining the deep neural network (DNN) algorithm with extended-connectivity fingerprint 4 (ECFP4) descriptors achieved excellent performance under both dataset divisions. The results indicate that the DNN algorithm has strong generalization ability and that feature richness plays a vital role in predicting molecules from unexplored regions of chemical space. Additionally, we combined the clustering results with decision tree models based on fingerprint descriptors to perform a structure-activity relationship analysis. Nitrogen-containing aromatic heterocycles and benzo-fused heterocyclic structures were found to play a crucial role in enhancing the activity of TRK inhibitors. [Graphical abstract: Workflow for generating predictive models for TRK inhibitors.]
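The difference between the random (in-domain) and cluster-based (cross-domain) partitions can be sketched as follows. The sketch assumes cluster labels have already been computed, for example from fingerprint similarity; the clustering method, the fold-balancing rule, and the function names `random_kfold` and `cluster_kfold` are assumptions, not the authors' protocol.

```python
# Random vs cluster-based ("grouped") k-fold splitting, as a hypothetical sketch.
# Assumption: cluster labels are precomputed (e.g., by clustering fingerprints).
# In the cluster split every member of a cluster lands in the same fold, so each
# test fold holds chemotypes unseen in training; the random split does not.
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def random_kfold(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices and deal them round-robin into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cluster_kfold(clusters: Sequence[int], k: int, seed: int = 0) -> List[List[int]]:
    """Assign whole clusters to folds: largest first, each to the smallest fold."""
    members: Dict[int, List[int]] = defaultdict(list)
    for i, c in enumerate(clusters):
        members[c].append(i)
    labels = list(members)
    random.Random(seed).shuffle(labels)  # random tie-breaking among equal sizes
    labels.sort(key=lambda c: len(members[c]), reverse=True)  # stable sort
    folds: List[List[int]] = [[] for _ in range(k)]
    for c in labels:
        min(folds, key=len).extend(members[c])
    return folds


if __name__ == "__main__":
    clusters = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 4, 4]  # toy cluster labels
    print("random: ", random_kfold(len(clusters), 3))
    print("cluster:", cluster_kfold(clusters, 3))
```

For the outer 10-fold cross-validation mentioned in the abstract, `k` would be 10. With cluster splitting there must be at least `k` clusters, otherwise some folds stay empty, and fold sizes can become uneven when a few large clusters dominate.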
Affiliation(s)
- Xiaoman Zhao
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- College of Bioengineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
- Yue Kong
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Yueshan Ji
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Xiulan Xin
- College of Bioengineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
- Liang Chen
- College of Bioengineering, No. 9 Liangshuihe 1st Street, Beijing, 100176, People's Republic of China
- Guang Chen
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China
- Changyuan Yu
- College of Life Science and Technology, Beijing University of Chemical Technology, 15 BeiSanHuan East Road, Beijing, 100029, People's Republic of China.
31
Banerjee A, Roy K. Read-across-based intelligent learning: development of a global q-RASAR model for the efficient quantitative predictions of skin sensitization potential of diverse organic chemicals. ENVIRONMENTAL SCIENCE. PROCESSES & IMPACTS 2023; 25:1626-1644. [PMID: 37682520 DOI: 10.1039/d3em00322a] [Indexed: 09/09/2023]
Abstract
Environmental chemicals and contaminants have a wide array of harmful effects on terrestrial and aquatic life, ranging from skin sensitization to acute oral toxicity. The current study aims to assess the quantitative skin sensitization potential of a large set of industrial and environmental chemicals acting through different mechanisms using the novel quantitative Read-Across Structure-Activity Relationship (q-RASAR) approach. Based on an identified set of important structural and physicochemical features, read-across hyperparameters were optimized using the training set compounds, followed by the calculation of similarity- and error-based RASAR descriptors. Data fusion, further feature selection, and removal of prediction confidence outliers were performed to generate a partial least squares (PLS) q-RASAR model, after which various Machine Learning (ML) tools were applied to check the quality of predictions. The PLS model was found to be the best among the models tested. A simple, user-friendly Java-based software tool was developed from the PLS model; it predicts the toxicity value(s) of query compound(s) together with their Applicability Domain (AD) status in terms of leverage values. This model was developed using structurally diverse compounds and is expected to predict the skin sensitization potential of environmental chemicals efficiently and quantitatively, allowing their occupational and health hazards to be estimated.
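The read-across step underlying q-RASAR can be illustrated with a generic sketch. It assumes fingerprints stored as sets of on-bit indices, Tanimoto similarity, and a fixed number of neighbours `k`; the cited work optimizes its own similarity settings on the training set, so `tanimoto` and `read_across` here are hypothetical stand-ins. Besides the similarity-weighted prediction, the function returns the mean similarity of the neighbours and the standard deviation of their responses, as examples in the spirit of the similarity- and error-based quantities that RASAR descriptors summarize.

```python
# Similarity-weighted read-across with two RASAR-style summary quantities.
# Assumptions: fingerprints are sets of "on" bit indices, similarity is
# Tanimoto, and k is fixed. The cited q-RASAR work tunes its own similarity
# measure and hyperparameters on the training set; this is not that code.
import statistics
from typing import FrozenSet, Sequence, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Intersection over union of on-bits; two empty fingerprints count as identical."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def read_across(
    query: FrozenSet[int],
    train_fps: Sequence[FrozenSet[int]],
    train_y: Sequence[float],
    k: int = 3,
) -> Tuple[float, float, float]:
    """Return (prediction, mean similarity of the k neighbours, SD of their responses)."""
    pairs = [(tanimoto(query, fp), y) for fp, y in zip(train_fps, train_y)]
    neighbours = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
    weight_sum = sum(s for s, _ in neighbours)
    if weight_sum > 0:
        pred = sum(s * y for s, y in neighbours) / weight_sum
    else:  # no bit overlap with any neighbour: fall back to the plain mean
        pred = statistics.fmean(y for _, y in neighbours)
    mean_sim = statistics.fmean(s for s, _ in neighbours)
    sd_y = statistics.pstdev([y for _, y in neighbours])
    return pred, mean_sim, sd_y


if __name__ == "__main__":
    train = [frozenset({1, 2, 3}), frozenset({1, 2}), frozenset({7, 8})]
    responses = [2.0, 4.0, 10.0]  # hypothetical potency values
    print(read_across(frozenset({1, 2, 3}), train, responses, k=2))
```

On the toy data with `k=2`, the identical neighbour (similarity 1.0) is weighted more heavily than the partial match (similarity 2/3) and the dissimilar compound is ignored, giving a prediction of about 2.8.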
Affiliation(s)
- Arkaprava Banerjee
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
- Kunal Roy
- Drug Theoretics and Cheminformatics Laboratory, Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India.
32
Hodyna D, Kovalishyn V, Romanenko Y, Semenyuta I, Blagodatny V, Kachaeva M, Brazhko O, Metelytsia L. Quinoline Hydrazone Derivatives as New Antibacterials against Multidrug Resistant Strains. Chem Biodivers 2023; 20:e202300839. [PMID: 37552570 DOI: 10.1002/cbdv.202300839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 08/01/2023] [Accepted: 08/07/2023] [Indexed: 08/10/2023]
Abstract
To develop novel antimicrobial agents, a series of 2(4)-hydrazone derivatives of quinoline were designed, synthesized and tested. QSAR models of the antibacterial activity of quinoline derivatives were developed on the OCHEM web platform using different machine learning methods. A virtual set of quinoline derivatives was verified with a previously published classification model of anti-E. coli activity and screened using the regression model of anti-S. aureus activity. The selected and synthesized 2(4)-hydrazone derivatives of quinoline exhibited antibacterial activity against standard and antibiotic-resistant S. aureus and E. coli strains, with growth inhibition zone diameters ranging from 15 to 30 mm. Molecular docking showed that the studied compounds form complexes within the catalytic domain of dihydrofolate reductase, with estimated binding affinities from -8.4 to -9.4 kcal/mol.
Affiliation(s)
- Diana Hodyna
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Academician Kukhar Str., 1, Ukraine
- Vasyl Kovalishyn
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Academician Kukhar Str., 1, Ukraine
- Yanina Romanenko
- Zaporizhzhya National University, Faculty of Biology, Zaporizhzhya, 69095, Zhukovs'ky Str., 66, Ukraine
- Ivan Semenyuta
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Academician Kukhar Str., 1, Ukraine
- Volodymyr Blagodatny
- Shupyk National Healthcare University of Ukraine, Kyiv, 04112, Dorogozhytska Str., 9, Ukraine
- Maryna Kachaeva
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Academician Kukhar Str., 1, Ukraine
- Oleksandr Brazhko
- Zaporizhzhya National University, Faculty of Biology, Zaporizhzhya, 69095, Zhukovs'ky Str., 66, Ukraine
- Larysa Metelytsia
- V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Academician Kukhar Str., 1, Ukraine
33
Sar S, Mitra S, Panda P, Mandal SC, Ghosh N, Halder AK, Cordeiro MNDS. In Silico Modeling and Structural Analysis of Soluble Epoxide Hydrolase Inhibitors for Enhanced Therapeutic Design. Molecules 2023; 28:6379. [PMID: 37687207 PMCID: PMC10490281 DOI: 10.3390/molecules28176379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 08/17/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023] [Open Access]
Abstract
Human soluble epoxide hydrolase (sEH), a dual-functioning homodimeric enzyme with hydrolase and phosphatase activities, is known for its pivotal role in the hydrolysis of epoxyeicosatrienoic acids. Inhibitors targeting sEH have shown promising potential in the treatment of various life-threatening diseases. In this study, we employed a range of in silico modeling approaches to investigate a diverse dataset of structurally distinct sEH inhibitors. Our primary aim was to develop predictive and validated models while gaining insights into the structural requirements necessary for achieving higher inhibitory potential. To accomplish this, we initially calculated molecular descriptors using nine different descriptor-calculating tools, coupled with stochastic and non-stochastic feature selection strategies, to identify the most statistically significant linear 2D-QSAR model. The resulting model highlighted the critical roles played by topological characteristics, 2D pharmacophore features, and specific physicochemical properties in enhancing inhibitory potential. In addition to conventional 2D-QSAR modeling, we implemented the Transformer-CNN methodology to develop QSAR models, enabling us to obtain structural interpretations based on the Layer-wise Relevance Propagation (LRP) algorithm. Moreover, a comprehensive 3D-QSAR analysis provided additional insights into the structural requirements of these compounds as potent sEH inhibitors. To validate the findings from the QSAR modeling studies, we performed molecular dynamics (MD) simulations using selected compounds from the dataset. The simulation results offered crucial insights into receptor-ligand interactions, supporting the predictions obtained from the QSAR models. Collectively, our work serves as an essential guideline for the rational design of novel sEH inhibitors with enhanced therapeutic potential. 
Importantly, all the in silico studies were performed using open-access tools to ensure reproducibility and accessibility.
Affiliation(s)
- Shuvam Sar
- Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Soumya Mitra
- Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Campus Dr. Meghnad Saha Sarani, Durgapur 713206, India
- Parthasarathi Panda
- Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Campus Dr. Meghnad Saha Sarani, Durgapur 713206, India
- Subhash C. Mandal
- Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Nilanjan Ghosh
- Department of Pharmaceutical Technology, Jadavpur University, Kolkata 700032, India
- Amit Kumar Halder
- Dr. B. C. Roy College of Pharmacy and Allied Health Sciences, Campus Dr. Meghnad Saha Sarani, Durgapur 713206, India
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
- Maria Natalia D. S. Cordeiro
- LAQV@REQUIMTE, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, Portugal
34
Miao Y, Ma H, Huang J. Recent Advances in Toxicity Prediction: Applications of Deep Graph Learning. Chem Res Toxicol 2023; 36:1206-1226. [PMID: 37562046 DOI: 10.1021/acs.chemrestox.2c00384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
The development of new drugs is time-consuming and expensive, so accurately predicting the potential toxicity of a drug candidate is crucial to ensuring its safety and efficacy. Recently, deep graph learning has become prevalent in this field due to its computational power and cost efficiency. Many novel deep graph learning methods aid toxicity prediction and further promote drug development. This review aims to connect fundamental knowledge with burgeoning deep graph learning methods. We first summarize the essential components of deep graph learning models for toxicity prediction, including molecular descriptors, molecular representations, evaluation metrics, validation methods, and data sets. Furthermore, based on various graph-related representations of molecules, we introduce several representative studies and methods for toxicity prediction from the perspective of graph neural network (GNN) architectures and graph pretrained models. Compared to other types of models, deep graph models not only achieve higher accuracy and efficiency but also provide more intuitive insights, which is significant for model interpretation and generalization ability. Graph pretrained models are emerging because they can extract prominent features from large-scale unlabeled molecular graph data and improve the performance of downstream toxicity prediction tasks. We hope this survey can serve as a handbook for individuals interested in exploring deep graph learning for toxicity prediction.
Affiliation(s)
- Yuwei Miao
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Hehuan Ma
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
- Junzhou Huang
- Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, Texas 76019, United States
35
Mrug G, Hodyna D, Metelytsia L, Kovalishyn V, Trokhimenko O, Bondarenko S, Kondratyuk K, Kozitskiy A, Frasinyuk M. Structure-Activity Relationship Prediction-Based Synthesis and Cytotoxicity Evaluation against the HEp-2 Laryngeal Carcinoma Cell of Isoflavone-Cytisine Mannich Bases. Chem Biodivers 2023; 20:e202300560. [PMID: 37477067 DOI: 10.1002/cbdv.202300560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/15/2023] [Accepted: 07/20/2023] [Indexed: 07/22/2023]
Abstract
QSAR analysis of previously synthesized and nature-inspired virtual isoflavone-cytisine hybrids against the HEp-2 laryngeal carcinoma cell line was performed using the OCHEM web platform. Validation of the models on an external test set showed that they can be used to predict the activity of newly designed compounds such as 8-cytisinylmethyl derivatives of 5,7- and 6,7-dihydroxyisoflavones. A synthetic procedure for the selective aminomethylation of 5,7-dihydroxyisoflavones with cytisine was developed. In vitro testing identified compound 7f, with cisplatin-level cytotoxicity against HEp-2 cells, and compound 10, which was twice as active as cisplatin after 72 h of incubation.
Affiliation(s)
- Galyna Mrug
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Diana Hodyna
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Larysa Metelytsia
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Vasyl Kovalishyn
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Olena Trokhimenko
- Shupyk National Healthcare University of Ukraine, Kyiv, 04112, Ukraine
- Svitlana Bondarenko
- Department of Food Chemistry, National University of Food Technologies, Kyiv, 01601, Ukraine
- Kostyantyn Kondratyuk
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Mykhaylo Frasinyuk
- V. P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry of National Academy of Science of Ukraine, Kyiv, 02094, Ukraine
- Enamine Ltd., Kyiv, 02094, Ukraine
36
Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023; 24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] [Open Access]
Abstract
In modern drug discovery, the combination of chemoinformatics and quantitative structure-activity relationship (QSAR) modeling has emerged as a formidable alliance, enabling researchers to harness the vast potential of machine learning (ML) techniques for predictive molecular design and analysis. This review delves into the fundamental aspects of chemoinformatics, elucidating the intricate nature of chemical data and the crucial role of molecular descriptors in unveiling underlying molecular properties. Molecular descriptors, including 2D fingerprints and topological indices, together with structure-activity relationships (SARs), are pivotal to small-molecule drug discovery. The technical intricacies of developing robust ML-QSAR models, including feature selection, model validation, and performance evaluation, are discussed. Various ML algorithms, such as regression analysis and support vector machines, are presented for their ability to predict and comprehend the relationships between molecular structures and biological activities. This review serves as a comprehensive guide for researchers, providing an understanding of the synergy between chemoinformatics, QSAR, and ML. By embracing these cutting-edge technologies, predictive molecular analysis holds promise for expediting the discovery of novel therapeutic agents in the pharmaceutical sciences.
Affiliation(s)
- Sarfaraz K Niazi
- College of Pharmacy, University of Illinois, Chicago, IL 61820, USA
- Zamara Mariam
- School of Interdisciplinary Engineering & Sciences (SINES), National University of Sciences & Technology (NUST), Islamabad 24090, Pakistan
37
Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023; 123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]
Abstract
Small data are often used in scientific and engineering research due to various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. However, while big data have been the focus for the past decade, small data and their challenges have received little attention, even though these challenges are technically more severe in machine learning (ML) and deep learning (DL) studies. Overall, the small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI), which enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions to small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including the chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.
Affiliation(s)
- Bozheng Dou
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Zailiang Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Ekaterina Merkurjev
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Lu Ke
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Long Chen
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jian Jiang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Yueying Zhu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Jie Liu
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Bengong Zhang
- Research Center of Nonlinear Science, School of Mathematical and Physical Sciences, Wuhan Textile University, Wuhan 430200, P. R. China
- Guo-Wei Wei
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Electrical and Computer Engineering, Michigan State University, East Lansing, Michigan 48824, United States
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
38
Srivathsa AV, Sadashivappa NM, Hegde AK, Radha S, Mahesh AR, Ammunje DN, Sen D, Theivendren P, Govindaraj S, Kunjiappan S, Pavadai P. A Review on Artificial Intelligence Approaches and Rational Approaches in Drug Discovery. Curr Pharm Des 2023; 29:1180-1192. [PMID: 37132148 DOI: 10.2174/1381612829666230428110542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/06/2023] [Accepted: 02/27/2023] [Indexed: 05/04/2023]
Abstract
Artificial intelligence (AI) speeds up the drug development process and reduces its cost, which is of enormous importance during outbreaks such as COVID-19. It uses a set of machine learning algorithms that collect available data from various resources, categorise and process it, and develop novel learning methodologies. Virtual screening is a successful application of AI, used to screen huge drug-like databases and filter them down to a small number of compounds. The "brain" of AI is its neural networks, which use architectures such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Applications range from small-molecule drug discovery to the development of vaccines. In this review article, we discuss various AI-based techniques for drug design (structure- and ligand-based), pharmacokinetics, and toxicity prediction. Rapid discovery is the need of the hour, and AI is a targeted approach to achieving it.
Affiliation(s)
- Anjana Vidya Srivathsa
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Nandini Markuli Sadashivappa
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Apeksha Krishnamurthy Hegde
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Srimathi Radha
- Department of Pharmaceutical Chemistry, SRM College of Pharmacy, Faculty of Medicine and Health Sciences, SRM Institute of Science and Technology, Chengalpattu District, Kattankulathur, Tamil Nadu, 603203, India
- Agasa Ramu Mahesh
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Damodar Nayak Ammunje
- Department of Pharmacology, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
- Debanjan Sen
- Department of Pharmaceutical Chemistry, BCDA College of Pharmacy & Technology, Hridaypur, Kolkata, 700127, West Bengal, India
- Panneerselvam Theivendren
- Department of Pharmaceutical Chemistry, Swamy Vivekanandha College of Pharmacy, Elayampalayam, Tiruchengode, 637205, India
- Saravanan Govindaraj
- Department of Pharmaceutical Chemistry, MNR College of Pharmacy, Fasalwadi, Sangareddy, 502 001, India
- Selvaraj Kunjiappan
- Department of Biotechnology, Kalasalingam Academy of Research and Education, Krishnankoil, 626126, India
- Parasuraman Pavadai
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, M.S. Ramaiah University of Applied Sciences, M.S.R. Nagar, Bengaluru, 560054, India
39
Qian X, Dai X, Luo L, Lin M, Xu Y, Zhao Y, Huang D, Qiu H, Liang L, Liu H, Liu Y, Gu L, Lu T, Chen Y, Zhang Y. An Interpretable Multitask Framework BiLAT Enables Accurate Prediction of Cyclin-Dependent Protein Kinase Inhibitors. J Chem Inf Model 2023. [PMID: 37171216 DOI: 10.1021/acs.jcim.3c00473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]
Abstract
The cyclin-dependent protein kinases (CDKs) are protein-serine/threonine kinases with crucial effects on the regulation of the cell cycle and transcription. CDKs can be a hallmark of cancer, since their excessive expression can lead to dysregulated cell proliferation. However, most CDK inhibitors developed so far lack sufficient selectivity, which has hindered their therapeutic use. In this study, we propose a multitask deep learning framework called BiLAT, based on the SMILES representation, for predicting the inhibitory activity of molecules on eight CDK subtypes (CDK1, 2, 4-9). The framework is mainly composed of an improved bidirectional long short-term memory (BiLSTM) module and the encoder layer of the Transformer framework. Additionally, SMILES enumeration is applied as a data augmentation method to improve the performance of the model. Compared with baseline predictive models based on three conventional machine learning methods and two multitask deep learning algorithms, BiLAT achieves the best performance, with the highest average AUC, ACC, F1-score, and MCC values of 0.938, 0.894, 0.911, and 0.715 on the test set. Moreover, we constructed a targeted external data set, CDK-Dec, for the CDK family, which mainly contains decoys screened by 3D similarity to active compounds. This data set was used in the subsequent evaluation of our model. Notably, the BiLAT model is interpretable and can be used by chemists to design and synthesize compounds with improved activity. To further verify the generalization ability of the multitask BiLAT model, we also conducted another evaluation on three public data sets (Tox21, ClinTox, and SIDER). Compared with several currently popular models, BiLAT shows the best performance on two of these data sets. These results indicate that BiLAT is an effective tool for accelerating drug discovery.
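The test-set metrics quoted above (AUC, ACC, F1-score, MCC) have standard definitions for a binary active/inactive task. The sketch below computes them in plain Python for illustration. It assumes labels coded 1 (active) and 0 (inactive), scores where a higher value means "more likely active", and that a multitask figure is a simple mean of per-subtype values. It is not the BiLAT evaluation code, and the function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels coded as 1 (active) and 0 (inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the chance that a random active outscores a random inactive (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_over_tasks(metric, tasks):
    """Average one metric over several (y_true, y_output) tasks, e.g. one per CDK subtype."""
    values = [metric(y_true, y_out) for y_true, y_out in tasks]
    return sum(values) / len(values)


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), mcc(y_true, y_pred))
```

For the toy labels in the `__main__` block (3 true positives, 3 true negatives, 1 false positive, 1 false negative), ACC and F1 are both 0.75 and MCC is 0.5. MCC uses all four confusion-matrix cells, which is why it is often reported alongside ACC when classes are imbalanced.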
Affiliation(s)
- Xu Qian
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Xiaowen Dai
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lin Luo
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Mingde Lin
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yuan Xu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yang Zhao
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Dingfang Huang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haodi Qiu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Li Liang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Haichun Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yingbo Liu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Lingxi Gu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Tao Lu
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
- Yadong Chen
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
- Yanmin Zhang
- Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
40
Zhang B, Lin J, Du L, Zhang L. Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model. Polymers (Basel) 2023; 15:polym15092224. [PMID: 37177370 PMCID: PMC10180765 DOI: 10.3390/polym15092224] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Revised: 05/03/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023] [Open Access]
Abstract
As a template-free, data-driven methodology, the molecular transformer model provides an alternative way to predict the outcome of chemical reactions and to design retrosynthetic routes in organic synthesis and polymer chemistry. However, because chemical reaction datasets are often small, the data-driven model suffers from low accuracy in reaction prediction tasks. In this contribution, we integrate the molecular transformer model with data augmentation and normalization preprocessing strategies to accomplish three reaction prediction tasks: forward reaction prediction, and single-step retrosynthetic prediction with and without reaction classes. The results demonstrate that the proposed strategies significantly raise the prediction accuracy of the molecular transformer model on all three tasks. Notably, after the introduction of 40-level data augmentation and normalization preprocessing, the top-1 accuracy of forward prediction increases markedly from 71.6% to 84.2%, and the top-1 accuracy of single-step retrosynthetic prediction with additional reaction class increases from 53.2% to 63.4%. Furthermore, the superior performance of the data-driven model is found to originate from the correction of grammatical errors in the SMILES strings, especially for reaction classes with small datasets.
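Top-1 accuracy, the metric behind the 71.6% to 84.2% figure above, counts a prediction as correct only if the model's highest-ranked candidate matches the recorded answer (the product for forward prediction, the reactants for retrosynthesis). A minimal sketch of the general top-k version follows. It assumes each candidate list is already sorted by model score and that all SMILES have been normalized to one canonical form beforehand (in practice usually done with a cheminformatics toolkit, which is not shown), so that plain string equality is meaningful. The example molecules and predictions are made up.

```python
def top_k_accuracy(ranked_candidates, targets, k=1):
    """Fraction of reactions whose reference answer appears among the first k candidates.

    Assumes every SMILES string (candidates and targets) has already been
    normalized to one canonical form, so plain string equality is meaningful.
    """
    if len(ranked_candidates) != len(targets):
        raise ValueError("need exactly one candidate list per target")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for cands, target in zip(ranked_candidates, targets)
               if target in cands[:k])
    return hits / len(targets)


# Toy example with hypothetical beam-search outputs (best candidate first).
predictions = [
    ["CCO", "CC"],
    ["c1ccccc1", "C1CCCCC1"],
    ["CC(=O)O", "CCO"],
]
references = ["CCO", "C1CCCCC1", "CCN"]
print(top_k_accuracy(predictions, references, k=1))
print(top_k_accuracy(predictions, references, k=2))
```

Here only the first reaction is right at rank 1, so top-1 accuracy is 1/3; the second reaction's answer appears at rank 2, so top-2 accuracy rises to 2/3. Without prior canonicalization, two valid SMILES for the same molecule would be counted as a miss.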
Affiliation(s)
- Boyu Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Jiaping Lin
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Lei Du
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
- Liangshun Zhang
- Shanghai Key Laboratory of Advanced Polymeric Materials, School of Materials Science and Engineering, East China University of Science and Technology, Shanghai 200237, China
41
Patlewicz G, Paul-Friedman K, Houck K, Zhang L, Huang R, Xia M, Brown J, Simmons SO. Evaluating the utility of a high throughput thiol-containing fluorescent probe to screen for reactivity: A case study with the Tox21 library. COMPUTATIONAL TOXICOLOGY (AMSTERDAM, NETHERLANDS) 2023; 26:10.1016/j.comtox.2023.100271. [PMID: 37388277 PMCID: PMC10304587 DOI: 10.1016/j.comtox.2023.100271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
High-throughput screening (HTS) assays for bioactivity in the Tox21 program aim to evaluate an array of different biological targets and pathways, but a significant barrier to interpreting these data is the lack of HTS assays intended to identify non-specific reactive chemicals. Such assays are important for prioritising chemicals to test in specific assays, for identifying promiscuous chemicals based on their reactivity, and for addressing hazards such as skin sensitisation, which are not necessarily initiated by a receptor-mediated effect but act through a non-specific mechanism. Herein, a fluorescence-based HTS assay that identifies thiol-reactive compounds was used to screen 7,872 unique chemicals in the Tox21 10K chemical library. Active chemicals were compared with profiling outcomes obtained using structural alerts encoding electrophilic information. Random Forest classification models based on chemical fingerprints were developed to predict assay outcomes and were evaluated through 10-fold stratified cross-validation (CV). The mean CV balanced accuracy across the validation folds was 0.648. The model shows promise as a tool to screen untested chemicals for potential electrophilic reactivity based solely on chemical structural features.
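The evaluation protocol described above (10-fold stratified CV scored by balanced accuracy) can be sketched in plain Python as follows. This is an illustration under assumptions: the fold-assignment scheme (shuffle within each class, then deal indices round-robin across folds) is one common way to stratify, not necessarily the one used in the study, and `fit_predict` is a hypothetical placeholder where the Random Forest fingerprint model would go. Balanced accuracy is computed per held-out fold and then averaged, matching the "mean CV" wording.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that preserve class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for y in sorted(by_class):
        members = by_class[y]
        rng.shuffle(members)
        for j, i in enumerate(members):
            folds[(offset + j) % k].append(i)
        offset += len(members)  # continue the round-robin so fold sizes stay even
    return folds


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; for two classes this is (sensitivity + specificity) / 2."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)


def mean_cv_balanced_accuracy(labels, fit_predict, k=10, seed=0):
    """Average balanced accuracy over the k held-out folds.

    fit_predict(train_idx, test_idx) is a hypothetical placeholder for any
    classifier (for example a random forest on fingerprints); it must return
    predicted labels for test_idx in the same order.
    """
    scores = []
    for test_idx in stratified_folds(labels, k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in test_set]
        y_pred = fit_predict(train_idx, test_idx)
        scores.append(balanced_accuracy([labels[i] for i in test_idx], y_pred))
    return sum(scores) / len(scores)
```

Balanced accuracy is a natural choice for reactivity screens because inactive chemicals usually far outnumber actives: a model that predicts "inactive" for everything scores 0.5, however skewed the classes are, whereas its plain accuracy could look deceptively high.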
Affiliation(s)
- Grace Patlewicz
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Katie Paul-Friedman
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Keith Houck
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Li Zhang
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Ruili Huang
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Menghang Xia
- National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, Bethesda, MD 20892, USA
- Jason Brown
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
- Steven O. Simmons
- Center for Computational Toxicology & Exposure (CCTE), U.S. Environmental Protection Agency, Research Triangle Park, Durham, NC, 27709, USA
42
Ksenofontov AA, Isaev YI, Lukanov MM, Makarov DM, Eventova VA, Khodov IA, Berezin MB. Accurate prediction of 11B NMR chemical shift of BODIPYs via machine learning. Phys Chem Chem Phys 2023; 25:9472-9481. [PMID: 36935644 DOI: 10.1039/d3cp00253e] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/08/2023]
Abstract
In this article, we present a model for predicting the 11B NMR chemical shift of BODIPYs, built with random forest regression (RFR) using ISIDA fragment descriptors. The model is freely available at https://ochem.eu/article/146458. It predicts the 11B NMR chemical shift with high accuracy (RMSE = 0.40 ppm in 5-fold cross-validation on the FINALE training set; RMSE = 0.14 ppm on the TEST set). In addition, we compared the computational "cost" and user-friendliness of the model with those of quantum-chemical calculations using the DFT/GIAO approach. The model's prediction accuracy (RMSE) for the 11B NMR chemical shift is more than three times higher than that of the DFT/GIAO calculations, and the model is far faster. As a result, we provide all researchers with a convenient tool, together with the database we collected, for predicting the 11B NMR chemical shifts of boron-containing dyes. We believe that the new model will make it easier for researchers to correctly interpret experimentally determined 11B NMR chemical shifts and to select better conditions for NMR experiments.
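RMSE, the error measure quoted above, is the square root of the mean squared difference between predicted and reference shifts, and it is expressed in the same unit (ppm). A minimal sketch follows; `rmse` and `pooled_cv_rmse` are hypothetical helpers, and pooling all out-of-fold residuals before taking the root is one common cross-validation convention assumed here, not necessarily the one used by OCHEM.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length sequences of numbers."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def pooled_cv_rmse(fold_results):
    """RMSE over all out-of-fold predictions pooled together.

    fold_results: list of (predicted, observed) pairs, one per CV fold.
    Averaging per-fold RMSEs instead can give a slightly different value.
    """
    all_pred, all_obs = [], []
    for predicted, observed in fold_results:
        all_pred.extend(predicted)
        all_obs.extend(observed)
    return rmse(all_pred, all_obs)


# Hypothetical shifts (ppm) for two folds; not data from the study.
folds = [([0.5, 1.0], [0.3, 1.0]), ([1.5], [1.9])]
print(round(pooled_cv_rmse(folds), 3))  # prints 0.258
```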
Affiliation(s)
- Alexander A Ksenofontov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Yaroslav I Isaev
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Michail M Lukanov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Dmitry M Makarov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Varvara A Eventova
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia; Ivanovo State University of Chemistry and Technology, 7, Sheremetevskiy Avenue, Ivanovo 153000, Russia
- Ilya A Khodov
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
- Mechail B Berezin
- G.A. Krestov Institute of Solution Chemistry of the Russian Academy of Sciences, Akademicheskaya Street, 153045 Ivanovo, Russia.
43
ASI-DBNet: An Adaptive Sparse Interactive ResNet-Vision Transformer Dual-Branch Network for the Grading of Brain Cancer Histopathological Images. Interdiscip Sci 2023; 15:15-31. [PMID: 35810266 DOI: 10.1007/s12539-022-00532-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/19/2022] [Revised: 05/26/2022] [Accepted: 05/31/2022] [Indexed: 10/17/2022]
Abstract
Brain cancer, which occurs in the brain and central nervous system, is among the deadliest cancers, and rapid and precise grading is essential to reduce patient suffering and improve survival. Traditional computer-aided diagnosis algorithms based on convolutional neural networks (CNNs) cannot fully utilize the global information in pathology images, while the recently popular vision transformer (ViT) model does not focus sufficiently on their local details; both shortcomings make the model's focus imprecise and the grading of brain cancer inaccurate. To solve this problem, we propose an adaptive sparse interaction ResNet-ViT dual-branch network (ASI-DBNet). First, we design a parallel ResNet-ViT structure to simultaneously capture and retain the local and global information of pathology images. Second, we design an adaptive sparse interaction block (ASIB) through which the ResNet branch and the ViT branch interact. Furthermore, we introduce an attention mechanism into the ASIB to adaptively filter redundant information from the two branches during the interaction, so that the feature maps exchanged are more useful. Extensive experiments show that ASI-DBNet outperforms various baseline and state-of-the-art (SOTA) models, achieving 95.24% accuracy across four grades. In particular, for highly deteriorated brain tumors (Grade III and Grade IV), the highest diagnostic accuracies achieved by ASI-DBNet are 97.93% and 96.28%, respectively, which is of great clinical significance. Meanwhile, gradient-weighted class activation mapping (Grad-CAM) and attention rollout visualizations are used to reveal the working logic behind the model, and the resulting feature maps highlight the important distinguishing features related to the diagnosis. These visualizations improve the interpretability of, and confidence in, the model, which is of great value for the clinical diagnosis of brain cancer.
44
SuHAN: Substructural hierarchical attention network for molecular representation. J Mol Graph Model 2023; 119:108401. [PMID: 36584590 DOI: 10.1016/j.jmgm.2022.108401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/16/2022] [Accepted: 12/23/2022] [Indexed: 12/26/2022]
Abstract
Molecular representation and property exploration combined with neural networks have recently come to play a critical role in drug design and discovery. However, previous research on molecular representation relies heavily on manual feature extraction based on biological experiments, which may introduce noise into the molecular information and is costly in time and money. In this paper, we propose a novel method, the Substructural Hierarchical Attention Network (SuHAN), to discover inherent characteristics of molecules for representation learning. Specifically, SuHAN is composed of two cascaded layers: an atom-level layer and a substructure-level layer. A molecule in SMILES format is divided into several substructural fragments by predefined partition rules, and the fragments are fed successively into the atom-level and substructure-level layers to obtain feature representations from two perspectives: an atomic view and a substructural view. In this way, prominent structural features that may be missed by global extraction are mined from a fine-grained viewpoint and fused to reconstruct a representative pattern of the whole molecule. Experiments on biophysics and physiology datasets demonstrate that our model is competitive, with significant improvements in both accuracy and stability. We confirmed that the substructural segments and progressive hierarchical networks yield an effective molecular representation for downstream tasks. These results offer a novel perspective on reconstructing the overall pattern of a molecule from its prominent local structures.
45
Nascimben M, Rimondini L. Molecular Toxicity Virtual Screening Applying a Quantized Computational SNN-Based Framework. Molecules 2023; 28:molecules28031342. [PMID: 36771009 PMCID: PMC9919191 DOI: 10.3390/molecules28031342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/27/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023]
Abstract
Spiking neural networks are biologically inspired machine learning algorithms that attract researchers' attention because they can run on energy-efficient hardware alternatives to traditional computers. In the current work, spiking neural networks were tested in a quantitative structure-activity analysis targeting the toxicity of molecules. Multiple public-domain compound databases were evaluated with spiking neural networks, achieving accuracies comparable to those of high-quality frameworks presented in the previous literature. The numerical experiments also included an analysis of hyperparameters and tested the spiking neural networks on molecular fingerprints of different lengths. Proposing alternatives to traditional software and hardware for time- and resource-consuming tasks, such as those found in chemoinformatics, may open the door to new research and improvements in the field.
Affiliation(s)
- Mauro Nascimben
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
- Enginsoft SpA, 35129 Padua, Italy
- Lia Rimondini
- Department of Health Sciences, Center on Autoimmune and Allergic Diseases CAAD, Università del Piemonte Orientale, 28100 Novara, Italy
46
XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J Cheminform 2023; 15:2. [PMID: 36609340 PMCID: PMC9817292 DOI: 10.1186/s13321-022-00673-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/30/2022] [Accepted: 12/17/2022] [Indexed: 01/09/2023]
Abstract
BACKGROUND Explainable artificial intelligence (XAI) methods have shown increasing applicability in chemistry. In this context, visualization techniques can highlight regions of a molecule to reveal their influence on a predicted property. For this purpose, some XAI techniques calculate attribution scores associated with tokens of SMILES strings or with atoms of a molecule. While a score associated with an atom can be represented directly on a molecule diagram, scores computed for SMILES non-atom tokens cannot. For instance, the substring [N+] contains 3 non-atom tokens, i.e., [, +, and ], and, depending on the model, their attributions do not necessarily reveal an influence of the nitrogen atom on the predicted property; for that reason, these scores cannot be represented on a molecule diagram. Moreover, SMILES notation is complex, underscoring the need for techniques that facilitate the analysis of explanations associated with its tokens. RESULTS We propose XSMILES, an interactive visualization technique for exploring XAI attribution scores and supporting the interpretation of SMILES. Users can input any type of score attributed to atom and non-atom tokens and visualize the scores on top of a 2D molecule diagram coordinated with a bar chart that represents the SMILES string. Through two use cases, we demonstrate how interactivity helps users evaluate and better interpret attributions calculated for SMILES strings. CONCLUSIONS Data scientists can use XSMILES to understand their models' behavior and compare multiple modeling approaches. The tool provides a set of parameters to adapt the visualization to users' needs, and it can be integrated into different platforms. We believe XSMILES can support data scientists in developing, improving, and communicating their models by making it easier to identify patterns and compare attributions through interactive exploratory visualization.
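The atom/non-atom distinction in the [N+] example can be made concrete with a character-level SMILES tokenizer that flags which tokens are atoms. This is a simplified sketch under stated assumptions, not XSMILES's own tokenizer: every character is its own token except element symbols (Cl and Br outside brackets, and the first element symbol inside brackets), while hydrogen counts, charges, isotopes, chirality marks, bonds, branches and ring-closure digits are treated as non-atom tokens. The name `tokenize_smiles` is hypothetical.

```python
ATOM_CHARS = set("BCNOPSFIbcnops*")  # organic subset, aromatic forms, wildcard
ORGANIC_TWO = {"Cl", "Br"}           # two-letter organic-subset atoms
AROMATIC_TWO = {"se", "as"}          # two-letter aromatic symbols in brackets


def tokenize_smiles(smiles):
    """Character-level SMILES tokenizer returning (token, is_atom) pairs."""
    tokens = []
    i = 0
    in_bracket = False
    element_seen = False  # only the first element symbol inside [...] is an atom
    while i < len(smiles):
        ch = smiles[i]
        pair = smiles[i:i + 2]
        if ch == "[":
            in_bracket, element_seen = True, False
            tokens.append((ch, False))
            i += 1
        elif ch == "]":
            in_bracket = False
            tokens.append((ch, False))
            i += 1
        elif in_bracket:
            if not element_seen and ch.isalpha():
                if ch.isupper() and i + 1 < len(smiles) and smiles[i + 1].islower():
                    size = 2  # e.g. "Na", "Fe", "Cl"
                elif pair in AROMATIC_TWO:
                    size = 2
                else:
                    size = 1  # e.g. "N" in [N+], "n" in [nH]
                tokens.append((smiles[i:i + size], True))
                element_seen = True
                i += size
            else:
                # Isotope digits, H counts, charges and chirality marks.
                tokens.append((ch, False))
                i += 1
        elif pair in ORGANIC_TWO:
            tokens.append((pair, True))
            i += 2
        elif ch in ATOM_CHARS:
            tokens.append((ch, True))
            i += 1
        else:
            # Bonds, branches and ring-closure digits.
            tokens.append((ch, False))
            i += 1
    return tokens
```

With this scheme, `tokenize_smiles("[N+]")` yields one atom token, N, and three non-atom tokens, [, + and ], which is exactly the case in which a non-atom attribution has no atom on which it can be drawn.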
47
Zheng X, Tomiura Y, Hayashi K. Investigation of the structure-odor relationship using a Transformer model. J Cheminform 2022; 14:88. [PMID: 36581889 PMCID: PMC9798546 DOI: 10.1186/s13321-022-00671-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 12/14/2022] [Indexed: 12/30/2022]
Abstract
The relationships between molecular structures and their properties are subtle and complex, and odor is no exception. Molecules with similar structures, such as a molecule and its optical isomer, may have completely different odors, whereas molecules with completely distinct structures may have similar odors. Many works have attempted to explain the molecular structure-odor relationship from chemical and data-driven perspectives. The Transformer model is widely used in natural language processing and computer vision, and its attention mechanism can identify relationships between inputs and outputs. In this paper, we describe the construction of a Transformer model for predicting molecular properties and interpreting the prediction results. SMILES strings of 100,000 molecules were collected and used to predict the presence of molecular substructures, and the proposed model achieves an F1 score of 0.98. We visualize the attention matrix to investigate how well the attention mechanism annotates substructures and find that certain atoms in the target substructures are accurately annotated. Finally, we collected 4462 molecules with their odor descriptors and used the proposed model to infer 98 odor descriptors, obtaining an average F1 score of 0.33. For the 19 odor descriptors with F1 scores greater than 0.45, we also attempt to summarize the relationship between molecular substructures and odor quality through the attention matrix.
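In a standard Transformer, the attention matrix referred to above is the row-wise softmax of scaled query-key dot products, softmax(QK^T / sqrt(d_k)), so each row sums to 1 and can be read as how strongly one token attends to every other token. The pure-Python sketch below computes that matrix for a single head; the toy vectors and the helper names `attention_weights` and `attend` are illustrative assumptions, not weights or code from the paper's model.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Single-head scaled dot-product attention matrix softmax(Q K^T / sqrt(d_k)).

    queries, keys: lists of equal-length float vectors, one per token.
    Returns a len(queries) x len(keys) matrix whose rows sum to 1.
    """
    scale = 1.0 / math.sqrt(len(keys[0]))
    matrix = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        matrix.append(softmax(scores))
    return matrix


def attend(weights, values):
    """Output vectors: each row is the attention-weighted sum of the value vectors."""
    dim = len(values[0])
    return [[sum(w * v[j] for w, v in zip(row, values)) for j in range(dim)]
            for row in weights]
```

Plotting a matrix like the one returned by `attention_weights`, row by row over the SMILES tokens, is the kind of inspection the authors describe for checking which atoms the model associates with a target substructure.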
Affiliation(s)
- Xiaofan Zheng
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
- Yoichi Tomiura
- Graduate School of Information Science and Electrical Engineering, Department of Informatics, Kyushu University, Fukuoka, Japan
- Kenshi Hayashi
- Graduate School of Information Science and Electrical Engineering, Department of Electronics, Kyushu University, Fukuoka, Japan
48
Makarov D, Fadeeva Y, Safonova E, Shmukler L. Predictive modeling of antibacterial activity of ionic liquids by machine learning methods. Comput Biol Chem 2022; 101:107775. [DOI: 10.1016/j.compbiolchem.2022.107775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 09/24/2022] [Accepted: 10/03/2022] [Indexed: 11/03/2022]
49
Muzychka LV, Verves EV, Yaremchuk IO, Zinchenko AM, Shishkina SV, Semenyuta IV, Hodyna DM, Metelytsia LO, Kovalishyn V, Smolii OB. Synthesis, QSAR modeling, and molecular docking of novel fused 7-deazaxanthine derivatives as adenosine A2A receptor antagonists. Chem Biol Drug Des 2022; 100:1025-1032. [PMID: 34651417 DOI: 10.1111/cbdd.13975] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 01/25/2023]
Abstract
Predictive QSAR models for the search for new adenosine A2A receptor antagonists were developed using the OCHEM platform. The regression models showed a predictive coefficient of determination of q² = 0.65-0.71 in cross-validation and on an independent test set. The inhibitory activities of novel fused 7-deazaxanthine compounds were predicted by the developed QSAR models. A preparative method for the synthesis of pyrimido[5',4':4,5]pyrrolo[1,2-a][1,4]diazepine derivatives was developed, and 11 new adenosine A2A receptor antagonists were obtained. Preliminary investigations of the toxicity of the fused 7-deazaxanthine compounds toward the invertebrate cladoceran D. magna, a model organism commonly used in toxicity assessment, are also described.
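The q² reported above is conventionally computed from cross-validated or test-set predictions as 1 - PRESS/SS, where PRESS is the sum of squared prediction errors and SS is the sum of squared deviations of the observed activities from their mean. Definitions differ on which mean to use for an external test set; this sketch uses the mean of the supplied observations, an assumption rather than OCHEM's documented formula, and `q_squared` is a hypothetical name.

```python
def q_squared(observed, predicted):
    """q2 = 1 - PRESS / SS, with SS taken about the mean of `observed`.

    observed: measured activities; predicted: out-of-fold or test-set
    predictions for the same compounds, in the same order.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations")
    mean_obs = sum(observed) / len(observed)
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / ss_total


# Hypothetical activities, not data from the study.
print(round(q_squared([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 3.0, 3.5]), 3))  # prints 0.9
```

A q² of 1 means perfect prediction and 0 means the model does no better than always predicting the mean activity; values of 0.65-0.71 therefore correspond to a PRESS of roughly 29-35% of the total sum of squares.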
Affiliation(s)
- Liubov V Muzychka
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Evgenii V Verves
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine; Enamine Ltd, Kyiv, Ukraine
- Iryna O Yaremchuk
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Anna M Zinchenko
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Svitlana V Shishkina
- Department of X-ray Diffraction Studies and Quantum Chemistry, STC "Institute for Single Crystals", NAS of Ukraine, Kharkiv, Ukraine
- Ivan V Semenyuta
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Diana M Hodyna
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Larysa O Metelytsia
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Vasyl Kovalishyn
- Department of Medical and Biological Researches, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
- Oleg B Smolii
- Department of Chemistry of Natural Compounds, V.P. Kukhar Institute of Bioorganic Chemistry and Petrochemistry, NAS of Ukraine, Kyiv, Ukraine
50
Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022; 56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Accepted: 10/24/2022] [Indexed: 11/18/2022]
Abstract
Recently, the use of artificial intelligence (AI) in drug discovery has received much attention because it significantly reduces the time and cost of developing new drugs. Deep learning (DL)-based approaches are increasingly used in all stages of drug development as DL technology advances and drug-related data grow. This paper therefore presents a systematic literature review (SLR) that integrates recent DL technologies and applications in drug discovery, including drug-target interactions (DTIs), drug-drug similarity interactions (DDIs), drug sensitivity and responsiveness, and drug-side effect predictions. We review more than 300 articles published between 2000 and 2022. The benchmark datasets, databases, and evaluation measures are also presented. In addition, this paper provides an overview of how explainable AI (XAI) supports drug discovery problems. Drug dosing optimization and success stories are discussed as well. Finally, digital twinning (DT) and open issues are suggested as future research challenges for drug discovery problems. Challenges to be addressed and future research directions are identified, and an extensive bibliography is also included.
Affiliation(s)
- Heba Askr
- Faculty of Computers and Artificial Intelligence, University of Sadat City, Sadat City, Egypt
- Enas Elgeldawi
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Heba Aboul Ella
- Faculty of Pharmacy and Drug Technology, Chinese University in Egypt (CUE), Cairo, Egypt
- Mamdouh M. Gomaa
- Computer Science Department, Faculty of Science, Minia University, Minia, Egypt
- Aboul Ella Hassanien
- Faculty of Computers and Artificial Intelligence, Cairo University, Cairo, Egypt