Reference Citation Analysis

For: Karpov P, Godin G, Tetko IV. Transformer-CNN: Swiss knife for QSAR modeling and interpretation. J Cheminform 2020;12:17. [PMID: 33431004 PMCID: PMC7079452 DOI: 10.1186/s13321-020-00423-w] [Citation(s) in RCA: 107] [Impact Index Per Article: 26.8] [Received: 10/12/2019] [Accepted: 03/09/2020] [Indexed: 01/03/2023]

Cited by Other Article(s)

Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024;29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]

Banerjee A, Roy K. ARKA: a framework of dimensionality reduction for machine-learning classification modeling, risk assessment, and data gap-filling of sparse environmental toxicity data. Environ Sci Process Impacts 2024;26:991-1007. [PMID: 38743054 DOI: 10.1039/d4em00173g] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]

Abstract

Due to the lack of experimental toxicity data for environmental chemicals, there is a need to fill data gaps with in silico approaches. One of the most commonly used in silico approaches for toxicity assessment of small datasets is Quantitative Structure-Activity Relationship (QSAR) modeling, which generates predictive models for the efficient prediction of query compounds. However, the reliability of predictions from QSARs derived from small datasets is often questionable from a statistical point of view. This is because such models often use a large number of descriptors relative to the number of training compounds, which reduces the degrees of freedom of the developed model. To reduce the overall prediction error of a particular QSAR model, we propose the novel Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. We reduce the number of modeling descriptors in a supervised manner by partitioning them into K classes (here K = 2) according to the response class for which each descriptor has the higher mean normalized value, thus preventing the loss of chemical information. A scatter plot of the data points using the values of the two ARKA descriptors (ARKA_2 vs. ARKA_1) can potentially identify activity cliffs, less confident data points, and less modelable data points. We applied the ARKA framework to classification modeling of five representative environmentally relevant endpoints with graded responses: skin sensitization, earthworm toxicity, milk/plasma partitioning, algal toxicity, and rodent carcinogenicity of hazardous chemicals.
When models built with conventional QSAR descriptors were compared with models built with ARKA descriptors, the ARKA-based models showed much better prediction quality under decision criteria derived from multiple graded-data validation metrics, signifying the potential of ARKA descriptors in ecotoxicological classification modeling of small data sets. The same holds for the Read-Across approach: Read-Across predictions using ARKA descriptors outperform those generated from QSAR descriptors. For ease of use, a Java-based expert system has been developed that computes the ARKA descriptors from input QSAR descriptors.
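The supervised partitioning step described in the abstract above can be illustrated with a small Python sketch. It is not the published ARKA formula: min-max normalization, assigning ties to class 0, and averaging each group's normalized descriptors into one score are all assumptions made for illustration, and `arka_like_descriptors` is a hypothetical name.

```python
from statistics import mean


def min_max_normalize(column):
    """Scale one descriptor column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def arka_like_descriptors(X, y):
    """Hypothetical K = 2 reduction in the spirit of ARKA (not the published formula).

    X: list of compounds, each a list of descriptor values.
    y: binary response class (0 or 1) for each compound.
    Returns one [group_0_score, group_1_score] pair per compound.
    """
    if set(y) != {0, 1}:
        raise ValueError("both response classes must be present")
    n_desc = len(X[0])
    columns = [min_max_normalize([row[j] for row in X]) for j in range(n_desc)]

    # Supervised partitioning: each descriptor joins the class in which its
    # mean normalized value is higher (ties go to class 0).
    groups = {0: [], 1: []}
    for j, col in enumerate(columns):
        mean0 = mean(v for v, label in zip(col, y) if label == 0)
        mean1 = mean(v for v, label in zip(col, y) if label == 1)
        groups[1 if mean1 > mean0 else 0].append(j)

    # Assumed aggregation: average the normalized descriptors of each group.
    scores = []
    for i in range(len(X)):
        pair = []
        for k in (0, 1):
            idx = groups[k]
            pair.append(mean(columns[j][i] for j in idx) if idx else 0.0)
        scores.append(pair)
    return scores
```

Plotting the two returned columns against each other gives a scatter plot analogous to the ARKA_2 vs. ARKA_1 plot mentioned in the abstract, however many input descriptors there were.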

Zheng X, Tomiura Y. A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence. J Cheminform 2024;16:71. [PMID: 38898528 PMCID: PMC11186148 DOI: 10.1186/s13321-024-00848-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 04/27/2024] [Indexed: 06/21/2024]

Abstract

Obtaining desired molecular properties, or combinations of them, through theory or experiment is costly. Using machine learning to analyze molecular structure features and predict molecular properties is a potentially efficient alternative. In this study, we analyze molecular properties through molecular structure from a machine-learning perspective. We use SMILES sequences as inputs to an artificial neural network to extract molecular structural features and predict molecular properties. A SMILES sequence comprises symbols representing a molecular structure. Because a SMILES sequence differs from actual molecular structural data, we propose a pretraining model for SMILES sequences based on BERT, which is widely used in natural language processing, so that the model learns to extract the molecular structural information contained in the SMILES sequence. In our experiments, we first pretrain the proposed model on 100,000 SMILES sequences and then use the pretrained model to predict molecular properties on 22 data sets and the odor characteristics of molecules (98 types of odor descriptor). The experimental results show that our proposed pretraining model effectively improves the performance of molecular property prediction. SCIENTIFIC CONTRIBUTION: The 2-encoder pretraining is motivated by two observations: symbols in a SMILES sequence depend less on their context than words in a natural-language sentence do, and one compound corresponds to multiple SMILES sequences. The model pretrained with the 2-encoder shows higher robustness in molecular property prediction tasks than BERT, which is adept at natural language.
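The abstract above builds on BERT-style pretraining over SMILES. As a hedged sketch of the generic masked-token objective only (not the authors' 2-encoder model), the Python below splits a SMILES string into atom-level tokens with a widely used regular expression and masks about 15% of them. The function names, the 15% rate, and the simplified masking (no 80/10/10 replacement split as in full BERT) are assumptions for illustration.

```python
import random
import re

# Regex commonly used to split SMILES into tokens: bracket atoms, two-letter
# halogens, aromatic atoms, bonds, branches, and ring-closure digits.
SMILES_TOKEN_RE = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly if any character is skipped."""
    tokens = SMILES_TOKEN_RE.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"could not fully tokenize {smiles!r}")
    return tokens


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Simplified BERT-style masking: replace a random subset with [MASK].

    Returns the masked sequence and {position: original token}, i.e. the
    targets a model would be trained to recover.
    """
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = "[MASK]"
    return masked, targets


masked, targets = mask_tokens(tokenize_smiles("CC(=O)Oc1ccccc1C(=O)O"))
```

Because one compound has many valid SMILES strings, the same masking could be applied to several enumerated SMILES of one molecule; generating those variants (for example with a cheminformatics toolkit) is outside this sketch.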

Yuan Y, Tang X, Li H, Lang X, Li C, Song Y, Sun S, Yang Y, Zhou Z. KLSD: a kinase database focused on ligand similarity and diversity. Front Pharmacol 2024;15:1400136. [PMID: 38957398 PMCID: PMC11217335 DOI: 10.3389/fphar.2024.1400136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/28/2024] [Indexed: 07/04/2024]

Abstract

Due to the similarity and diversity among kinases, small molecule kinase inhibitors (SMKIs) often display multi-target effects or selectivity, which strongly correlate with the efficacy and safety of these inhibitors. However, because well-known databases are few and have restricted data mining capabilities, and databases focusing on the pharmacological similarity and diversity of SMKIs are especially scarce, researchers find it challenging to access relevant information quickly. The KLIFS database is representative of specialized application databases in the field, focusing on kinase structure and co-crystallised kinase-ligand interactions, whereas the KLSD database in this paper emphasizes the analysis of SMKIs among all reported kinase targets. To address the lack of professional application databases in kinase research and to provide centralized, standardized, reliable and efficient data resources for kinase researchers, this paper proposes a research program based on the ChEMBL database that focuses on comparing the activities of kinase ligands. This scheme extracts kinase data, standardizes and normalizes them, and then performs kinase target difference analysis to determine kinase activity thresholds. It then constructs a specialized and personalized kinase database platform, adopts front-end/back-end separation based on the SpringBoot architecture, and builds an extensible web application that handles the storage, retrieval and analysis of the data, ultimately realizing data visualization and interaction. This study aims to develop a kinase database platform to collect, organize, and provide standardized data related to kinases. By offering essential resources and tools, it supports kinase research and drug development, thereby advancing scientific research and innovation in kinase-related fields. It is freely accessible at: http://ai.njucm.edu.cn:8080.
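The standardization and threshold steps summarized above are not specified in the abstract. The following Python sketch shows one common way to put IC50/Ki values on a shared scale (negative log10 molar, similar in spirit to ChEMBL's pChEMBL value) and apply an activity cut-off. The unit table, the default threshold of 6.0 (1 uM), and all function names are hypothetical choices, not KLSD's actual rules.

```python
import math

# Factors converting a few common concentration units to nanomolar (assumed subset).
UNIT_TO_NM = {"nM": 1.0, "uM": 1e3, "mM": 1e6, "M": 1e9}


def p_activity(value, unit="nM"):
    """Negative log10 of a molar IC50/Ki/Kd value, e.g. 1 uM -> 6.0."""
    if value <= 0:
        raise ValueError("concentration must be positive")
    if unit not in UNIT_TO_NM:
        raise ValueError(f"unsupported unit: {unit}")
    return 9.0 - math.log10(value * UNIT_TO_NM[unit])


def label_activity(p, threshold=6.0):
    """Hypothetical cut-off: pActivity >= threshold counts as active."""
    return "active" if p >= threshold else "inactive"


def selectivity_window(p_on_target, p_off_target):
    """Difference in pActivity (log units) between two kinases for one ligand."""
    return p_on_target - p_off_target
```

Working in log units makes differences between kinase targets additive: a ligand with 10 nM activity on one kinase and 1 uM on another has a selectivity window of 2 log units.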

Zhang R, Nolte D, Sanchez-Villalobos C, Ghosh S, Pal R. Topological regression as an interpretable and efficient tool for quantitative structure-activity relationship modeling. Nat Commun 2024;15:5072. [PMID: 38871711 DOI: 10.1038/s41467-024-49372-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 06/04/2024] [Indexed: 06/15/2024]

Daghighi A, Casanola-Martin GM, Iduoku K, Kusic H, González-Díaz H, Rasulev B. Multi-Endpoint Acute Toxicity Assessment of Organic Compounds Using Large-Scale Machine Learning Modeling. Environ Sci Technol 2024;58:10116-10127. [PMID: 38797941 DOI: 10.1021/acs.est.4c01017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]

Luong KD, Singh A. Application of Transformers in Cheminformatics. J Chem Inf Model 2024;64:4392-4409. [PMID: 38815246 PMCID: PMC11167597 DOI: 10.1021/acs.jcim.3c02070] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 04/05/2024] [Accepted: 05/06/2024] [Indexed: 06/01/2024]

Kumar A, Ojha PK, Roy K. The first report on the assessment of maximum acceptable daily intake (MADI) of pesticides for humans using intelligent consensus predictions. Environ Sci Process Impacts 2024;26:870-881. [PMID: 38652036 DOI: 10.1039/d4em00059e] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/25/2024]

Isinkaye FO, Olusanya MO, Singh PK. Deep learning and content-based filtering techniques for improving plant disease identification and treatment recommendations: A comprehensive review. Heliyon 2024;10:e29583. [PMID: 38737274 PMCID: PMC11088271 DOI: 10.1016/j.heliyon.2024.e29583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 03/30/2024] [Accepted: 04/10/2024] [Indexed: 05/14/2024]

Schlosser L, Rana D, Pflüger P, Katzenburg F, Glorius F. EnTdecker - A Machine Learning-Based Platform for Guiding Substrate Discovery in Energy Transfer Catalysis. J Am Chem Soc 2024;146:13266-13275. [PMID: 38695558 DOI: 10.1021/jacs.4c01352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]

Walter M, Webb SJ, Gillet VJ. Interpreting Neural Network Models for Toxicity Prediction by Extracting Learned Chemical Features. J Chem Inf Model 2024;64:3670-3688. [PMID: 38686880 PMCID: PMC11094726 DOI: 10.1021/acs.jcim.4c00127] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/23/2024] [Revised: 04/15/2024] [Accepted: 04/15/2024] [Indexed: 05/02/2024]

Tian T, Li S, Fang M, Zhao D, Zeng J. MolSHAP: Interpreting Quantitative Structure-Activity Relationships Using Shapley Values of R-Groups. J Chem Inf Model 2024;64:2236-2249. [PMID: 37584270 DOI: 10.1021/acs.jcim.3c00465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/17/2023]

Hartog PBR, Krüger F, Genheden S, Tetko IV. Using test-time augmentation to investigate explainable AI: inconsistencies between method, model and human intuition. J Cheminform 2024;16:39. [PMID: 38576047 PMCID: PMC10993590 DOI: 10.1186/s13321-024-00824-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Accepted: 03/09/2024] [Indexed: 04/06/2024]

Abstract

Stakeholders of machine learning models desire explainable artificial intelligence (XAI) to produce human-understandable and consistent interpretations. In computational toxicity, augmentation of text-based molecular representations has been used successfully for transfer learning on downstream tasks. Augmentations of molecular representations can also be used at inference to compare differences between multiple representations of the same ground truth. In this study, we investigate the robustness of eight XAI methods using test-time augmentation for a molecular-representation model in the field of computational toxicity prediction. We report significant differences between explanations for different representations of the same ground truth, and show that randomized models have similar variance. We hypothesize that text-based molecular representations in this and past research reflect tokenization more than learned parameters. Furthermore, we see a greater variance between in-domain predictions than out-of-domain predictions, indicating that XAI measures something other than learned parameters. Finally, we investigate the relative importance given to expert-derived structural alerts and find that similar importance is given regardless of applicability domain, randomization and training procedure. We therefore caution future research against validating methods through a similar comparison to human intuition without further investigation. SCIENTIFIC CONTRIBUTION: In this research we critically investigate XAI through test-time augmentation, challenging previous assumptions about using expert validation and showing inconsistencies within models for identical representations. SMILES augmentation has been used to increase model accuracy, but was adapted here from the field of image test-time augmentation to serve as an independent indication of the consistency within SMILES-based molecular representation models.
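The consistency check described above compares explanations for different SMILES of the same molecule. A minimal Python sketch of that comparison, assuming per-atom attribution scores have already been mapped from SMILES tokens back to a shared atom order, is to compute each atom's mean and population variance across the variants. The abstract does not state which statistic the authors used, so the choice of variance and the function name are assumptions.

```python
from statistics import mean, pvariance


def attribution_consistency(attributions_per_variant):
    """Per-atom mean and population variance of XAI scores across SMILES variants.

    attributions_per_variant: one list per SMILES variant of the same molecule,
    each holding one attribution score per atom, already mapped back to a
    shared atom order (the token-to-atom mapping is assumed done elsewhere).
    """
    n_atoms = len(attributions_per_variant[0])
    if any(len(a) != n_atoms for a in attributions_per_variant):
        raise ValueError("every variant must score the same atoms")
    per_atom = list(zip(*attributions_per_variant))
    means = [mean(scores) for scores in per_atom]
    variances = [pvariance(scores) for scores in per_atom]
    return means, variances
```

Running the same function on attributions from a model with randomized weights provides the kind of baseline the abstract refers to: if the variances are similar, the explanations reflect the representation more than what the model learned.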

Kovalishyn V, Severin O, Kachaeva M, Kobzar O, Keith KA, Harden EA, Hartline CB, James SH, Vovk A, Brovarets V. In Silico Design and Experimental Validation of Novel Oxazole Derivatives Against Varicella zoster virus. Mol Biotechnol 2024;66:707-717. [PMID: 36709460 DOI: 10.1007/s12033-023-00670-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Accepted: 01/14/2023] [Indexed: 01/30/2023]

Abstract

Varicella zoster virus (VZV) infection causes severe diseases such as chickenpox, shingles, and postherpetic neuralgia, often leading to disability. Reactivation of latent VZV is associated with a decrease in specific cellular immunity in the elderly and in patients with immunodeficiency. However, due to the limited efficacy of existing therapy and the emergence of antiviral resistance, new and effective antiviral drugs are needed for the treatment of diseases caused by VZV, particularly in the setting of opportunistic infections. The goal of this work is to identify potent oxazole derivatives as anti-VZV agents by machine learning, followed by their synthesis and experimental validation. Predictive QSAR models were developed using the Online Chemical Modeling Environment (OCHEM). Data on compounds exhibiting antiviral activity were collected from ChEMBL and uploaded to the OCHEM database. The predictive ability of the models was tested by cross-validation, giving a coefficient of determination q² = 0.87-0.90. Validation of the models on an external test set (q² = 0.83-0.84) shows that they can predict the antiviral activity of newly designed and known compounds with reasonable accuracy within the applicability domain. The models were applied to screen a virtual chemical library for compounds with expected activity against VZV. The seven most promising oxazole derivatives were identified, synthesized, and tested. Two of them showed activity against the VZV Ellen strain in primary in vitro antiviral screening. The synthesized compounds may represent an interesting starting point for further development of oxazole derivatives against VZV. The developed models are available online at OCHEM http://ochem.eu/article/145978 and can be used to virtually screen for potential compounds with anti-VZV activity.
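The q² values quoted above summarize out-of-sample predictive ability. A short Python sketch of the usual definition, q² = 1 - PRESS / SS, where PRESS sums squared errors of held-out predictions and SS sums squared deviations from a reference mean, is given below. The function name is illustrative; OCHEM's exact computation is not described in the abstract.

```python
from statistics import mean


def q_squared(observed, predicted, reference_mean=None):
    """q^2 = 1 - PRESS / SS for out-of-sample (cross-validated or external) predictions.

    observed:  measured activities.
    predicted: predictions made for each compound while it was held out.
    reference_mean: mean used in the denominator; defaults to the mean of
        `observed`. For an external test set some authors pass the
        training-set mean instead (Q^2_F1) rather than the test-set mean (Q^2_F2).
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need at least two paired observations")
    ref = mean(observed) if reference_mean is None else reference_mean
    press = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss = sum((o - ref) ** 2 for o in observed)
    if ss == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - press / ss
```

For example, observed values 1, 2, 3, 4 with held-out predictions 1.1, 1.9, 3.2, 3.8 give PRESS = 0.10 and SS = 5.0, so q² is 0.98.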

Hunklinger A, Hartog P, Šícho M, Godin G, Tetko IV. The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS joint compound solubility challenge. SLAS Discov 2024;29:100144. [PMID: 38316342 DOI: 10.1016/j.slasd.2024.01.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2023] [Revised: 01/06/2024] [Accepted: 01/22/2024] [Indexed: 02/07/2024]

Shen T, Li S, Wang XS, Wang D, Wu S, Xia J, Zhang L. Deep reinforcement learning enables better bias control in benchmark for virtual screening. Comput Biol Med 2024;171:108165. [PMID: 38402838 DOI: 10.1016/j.compbiomed.2024.108165] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Revised: 02/07/2024] [Accepted: 02/14/2024] [Indexed: 02/27/2024]

Nwadiugwu M, Onwuekwe I, Ezeanolue E, Deng H. Beyond Amyloid: A Machine Learning-Driven Approach Reveals Properties of Potent GSK-3β Inhibitors Targeting Neurofibrillary Tangles. Int J Mol Sci 2024;25:2646. [PMID: 38473895 PMCID: PMC10931970 DOI: 10.3390/ijms25052646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/13/2024] [Revised: 02/16/2024] [Accepted: 02/21/2024] [Indexed: 03/14/2024]

Abstract

Current treatments for Alzheimer's disease (AD) focus on slowing memory and cognitive decline, but none offer curative outcomes. This study aims to explore and curate the common properties of active, drug-like molecules that modulate glycogen synthase kinase 3β (GSK-3β), a well-documented kinase whose increased activity is implicated in tau hyperphosphorylation and neurofibrillary tangles, hallmarks of AD pathology. Leveraging quantitative structure-activity relationship (QSAR) data from the PubChem and ChEMBL databases, we employed seven machine learning models: logistic regression (LogR), k-nearest neighbors (KNN), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGB), neural networks (NNs), and ensemble majority voting. Our goal was to correctly predict active and inactive compounds that inhibit GSK-3β activity and to identify their key properties. Among the six individual models, the NN demonstrated the highest performance, with an AUC-ROC of 79% on unbalanced external validation data, while the SVM model was superior in accurately classifying the compounds. The SVM and RF models surpassed the NN in terms of Kappa values, and the ensemble majority voting model demonstrated slightly better accuracy than the NN on the external validation data. Feature importance analysis revealed that hydrogen bonds, phenol groups, and specific electronic characteristics are important molecular-descriptor features that positively correlate with active GSK-3β inhibition. Conversely, structural features such as imidazole rings, sulfides, and methoxy groups showed a negative correlation. Our study highlights the significance of structural, electronic, and physicochemical descriptors in screening active candidates against GSK-3β. These predictive features could help in understanding the important properties of candidate GSK-3β inhibitors, which may benefit non-amyloid-based AD treatments targeting neurofibrillary tangles.
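The abstract above combines six classifiers by ensemble majority voting and compares them with Cohen's kappa. The Python sketch below shows plain majority voting over per-compound class labels and the standard kappa formula, kappa = (p_o - p_e) / (1 - p_e). The tie-breaking rule and function names are assumptions; the authors' ensemble may weight or break ties differently.

```python
from collections import Counter


def majority_vote(predictions_by_model):
    """Combine per-compound class labels from several models by majority vote.

    predictions_by_model: list of equal-length label lists, one per model.
    Ties are broken in favour of the tied label predicted by the earliest
    model in the list (an arbitrary choice for this sketch).
    """
    n = len(predictions_by_model[0])
    if any(len(p) != n for p in predictions_by_model):
        raise ValueError("all models must predict the same compounds")
    combined = []
    for i in range(n):
        votes = [preds[i] for preds in predictions_by_model]
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two label lists corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    p_observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    p_expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)
```

With three models predicting `[1, 0, 1, 1]`, `[1, 1, 0, 1]` and `[0, 0, 1, 1]`, the vote yields `[1, 0, 1, 1]`; against true labels `[1, 0, 1, 0]` that gives 75% observed agreement, 50% chance agreement, and a kappa of 0.5.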

Shen T, Guo J, Han Z, Zhang G, Liu Q, Si X, Wang D, Wu S, Xia J. AutoMolDesigner for Antibiotic Discovery: An AI-Based Open-Source Software for Automated Design of Small-Molecule Antibiotics. J Chem Inf Model 2024;64:575-583. [PMID: 38265916 DOI: 10.1021/acs.jcim.3c01562] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/26/2024]

Affiliation(s)

Tao Shen State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
Jiale Guo State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
Zunsheng Han State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
Gao Zhang State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
Qingxin Liu State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China; School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
Xinxin Si School of Pharmacy, Jiangsu Ocean University, Lianyungang, Jiangsu 222005, China
Dongmei Wang State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
Song Wu State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China
Jie Xia State Key Laboratory of Bioactive Substance and Function of Natural Medicines, Institute of Materia Medica, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100050, China

Gangwal A, Ansari A, Ahmad I, Azad AK, Kumarasamy V, Subramaniyan V, Wong LS. Generative artificial intelligence in drug discovery: basic framework, recent advances, challenges, and opportunities. Front Pharmacol 2024;15:1331062. [PMID: 38384298 PMCID: PMC10879372 DOI: 10.3389/fphar.2024.1331062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Accepted: 01/17/2024] [Indexed: 02/23/2024]

Abstract

There are two main ways to discover or design small drug molecules. The first involves fine-tuning existing molecules or commercially successful drugs through quantitative structure-activity relationships and virtual screening. The second approach involves generating new molecules through de novo drug design or inverse quantitative structure-activity relationship. Both methods aim to get a drug molecule with the best pharmacokinetic and pharmacodynamic profiles. However, bringing a new drug to market is an expensive and time-consuming endeavor, with the average cost being estimated at around $2.5 billion. One of the biggest challenges is screening the vast number of potential drug candidates to find one that is both safe and effective. The development of artificial intelligence in recent years has been phenomenal, ushering in a revolution in many fields. The field of pharmaceutical sciences has also significantly benefited from multiple applications of artificial intelligence, especially drug discovery projects. Artificial intelligence models are finding use in molecular property prediction, molecule generation, virtual screening, synthesis planning, repurposing, among others. Lately, generative artificial intelligence has gained popularity across domains for its ability to generate entirely new data, such as images, sentences, audios, videos, novel chemical molecules, etc. Generative artificial intelligence has also delivered promising results in drug discovery and development. This review article delves into the fundamentals and framework of various generative artificial intelligence models in the context of drug discovery via de novo drug design approach. Various basic and advanced models have been discussed, along with their recent applications. 
The review also explores recent examples and advances in the generative artificial intelligence approach, as well as the challenges and ongoing efforts to fully harness the potential of generative artificial intelligence in generating novel drug molecules faster and more affordably. Some clinical-stage assets generated by generative artificial intelligence are also discussed to show the ever-increasing application of artificial intelligence in drug discovery through commercial partnerships.

Lei L, Zhang L, Han Z, Chen Q, Liao P, Wu D, Tai J, Xie B, Su Y. Advancing chronic toxicity risk assessment in freshwater ecology by molecular characterization-based machine learning. Environ Pollut 2024;342:123093. [PMID: 38072027 DOI: 10.1016/j.envpol.2023.123093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2023] [Revised: 11/30/2023] [Accepted: 12/02/2023] [Indexed: 01/26/2024]

Abstract

The continuously increasing production of various chemicals and their release into the environment have raised concerns about potential negative effects on ecological health. However, traditional labor-intensive assessment methods cannot evaluate these hazards effectively and rapidly, especially chronic risk. In this study, machine learning (ML) was employed to construct quantitative structure-activity relationship (QSAR) models that predict chronic toxicity to aquatic organisms from the molecular characteristics of pollutants, namely molecular descriptors, fingerprints, and graphs. The limited dataset size prevented the graph attention network (GAT) model from showing notable advantages on the molecular graphs. Considering computational efficiency and performance (R² = 0.78; RMSE = 0.77), XGBoost (XGB) was used to build reliable QSAR-ML models that predict chronic toxicity from small- or medium-sized tabular data and molecular descriptors. Further kernel density estimation analysis confirmed the high accuracy of the model for pollutant concentrations ranging from 10⁻³ to 10² mg/L, which covers most environmental scenarios. Model interpretation showed SlogP and exposure duration to be the primary influential factors. SlogP, representing the distribution coefficient of a molecule between lipophilic and hydrophilic environments, had a negative effect on the toxicity outcomes. Additionally, exposure duration played a crucial role in determining chronic toxicity. Finally, chronic toxicity data for bisphenol A validated the robustness and reliability of the model. Our study provides a robust and feasible methodology for chronic ecological risk evaluation of various types of pollutants and could facilitate and increase the use of ML applications in environmental fields.
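The abstract above mentions a kernel density estimation analysis over concentrations spanning 10⁻³ to 10² mg/L but does not say exactly which quantity was smoothed. As a hedged illustration of the technique only, this Python sketch builds a one-dimensional Gaussian KDE on log10-transformed concentrations. The default bandwidth uses the normal-reference rule of thumb (1.06 · sd · n^(-1/5)), and the sample concentrations are made-up values, not data from the study.

```python
import math
from statistics import stdev


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the density of `samples` with Gaussian kernels.

    If no bandwidth is given, the normal-reference rule of thumb
    h = 1.06 * sd * n**(-1/5) is used.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    h = bandwidth if bandwidth is not None else 1.06 * stdev(samples) * n ** (-0.2)
    if h <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps centred on each sample, scaled to integrate to 1.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


# Illustrative concentrations in mg/L, smoothed on a log10 scale.
concentrations_mg_per_l = [0.001, 0.05, 0.3, 2.0, 15.0, 100.0]
log_conc = [math.log10(c) for c in concentrations_mg_per_l]
density = gaussian_kde(log_conc)
```

Working on the log10 scale keeps concentrations that span five orders of magnitude on comparable footing; otherwise a single Gaussian bandwidth would be dominated by the largest values.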

Affiliation(s)

Lang Lei Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Liangmao Zhang Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Zhibang Han Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Qirui Chen Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Pengcheng Liao Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China
Dong Wu Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
Jun Tai Shanghai Environmental Sanitation Engineering Design Institute Co., Ltd., Shanghai, 200232, China
Bing Xie Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China
Yinglong Su Shanghai Engineering Research Center of Biotransformation of Organic Solid Waste, School of Ecological and Environmental Sciences, East China Normal University, Shanghai, 200241, China; Chongqing Key Laboratory of Precision Optics, Chongqing Institute of East China Normal University, Chongqing, 401120, China; Shanghai Institute of Pollution Control and Ecological Security, Shanghai, 200092, China

Song Z, Chen J, Cheng J, Chen G, Qi Z. Computer-Aided Molecular Design of Ionic Liquids as Advanced Process Media: A Review from Fundamentals to Applications. Chem Rev 2024;124:248-317. [PMID: 38108629 DOI: 10.1021/acs.chemrev.3c00223] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/19/2023]

Gryniukova A, Borysko P, Myziuk I, Alieksieieva D, Hodyna D, Semenyuta I, Kovalishyn V, Metelytsia L, Rogalsky S, Tcherniuk S. Anticancer activity features of imidazole-based ionic liquids and lysosomotropic detergents: in silico and in vitro studies. Mol Divers 2024:10.1007/s11030-023-10779-4. [PMID: 38246950 DOI: 10.1007/s11030-023-10779-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Accepted: 11/20/2023] [Indexed: 01/23/2024]

Siramshetty VB, Xu X, Shah P. Artificial Intelligence in ADME Property Prediction. Methods Mol Biol 2024;2714:307-327. [PMID: 37676606 DOI: 10.1007/978-1-0716-3441-7_17] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/08/2023]

Li Y, Cardoso-Silva J, Kelly JM, Delves MJ, Furnham N, Papageorgiou LG, Tsoka S. Optimisation-based modelling for explainable lead discovery in malaria. Artif Intell Med 2024;147:102700. [PMID: 38184363 DOI: 10.1016/j.artmed.2023.102700] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2023] [Revised: 10/17/2023] [Accepted: 10/29/2023] [Indexed: 01/08/2024]

Pérez-Correa I, Giunta PD, Mariño FJ, Francesconi JA. Transformer-Based Representation of Organic Molecules for Potential Modeling of Physicochemical Properties. J Chem Inf Model 2023;63:7676-7688. [PMID: 38062559 DOI: 10.1021/acs.jcim.3c01548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/26/2023]

Sandhu H, Garg P. Machine Learning Enables Accurate Prediction of Quinone Formation during Drug Metabolism. Chem Res Toxicol 2023;36:1876-1890. [PMID: 37885227 DOI: 10.1021/acs.chemrestox.3c00162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]

Ali H, Qureshi R, Shah Z. Artificial Intelligence-Based Methods for Integrating Local and Global Features for Brain Cancer Imaging: Scoping Review. JMIR Med Inform 2023;11:e47445. [PMID: 37976086 PMCID: PMC10692876 DOI: 10.2196/47445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 07/02/2023] [Accepted: 07/12/2023] [Indexed: 11/19/2023]

Abstract

BACKGROUND

Transformer-based models are gaining popularity in medical imaging and cancer imaging applications. Many recent studies have demonstrated the use of transformer-based models for brain cancer imaging applications such as diagnosis and tumor segmentation.

OBJECTIVE

This study aims to review how different vision transformers (ViTs) contributed to advancing brain cancer diagnosis and tumor segmentation using brain image data. This study examines the different architectures developed for enhancing the task of brain tumor segmentation. Furthermore, it explores how the ViT-based models augmented the performance of convolutional neural networks for brain cancer imaging.

METHODS

This review performed the study search and study selection following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. The search covered 4 popular scientific databases: PubMed, Scopus, IEEE Xplore, and Google Scholar. The search terms were formulated to cover the interventions (ie, ViTs) and the target application (ie, brain cancer imaging). Title and abstract screening for study selection was performed by 2 reviewers independently and validated by a third reviewer. Data extraction was performed by 2 reviewers and validated by a third reviewer. Finally, the data were synthesized using a narrative approach.

RESULTS

Of the 736 retrieved studies, 22 (3%) were included in this review. These studies were published in 2021 and 2022. The most commonly addressed task in these studies was tumor segmentation using ViTs. No study reported early detection of brain cancer. Among the different ViT architectures, Shifted Window transformer-based architectures have recently become the most popular choice of the research community. Among the included architectures, UNet transformer and TransUNet had the highest number of parameters and thus needed a cluster of as many as 8 graphics processing units for model training. The brain tumor segmentation challenge data set was the most popular data set used in the included studies. ViT was used in different combinations with convolutional neural networks to capture both the global and local context of the input brain imaging data.

CONCLUSIONS

It can be argued that the computational complexity of transformer architectures is a bottleneck in advancing the field and enabling clinical transformations. This review provides the current state of knowledge on the topic, and the findings of this review will be helpful for researchers in the field of medical artificial intelligence and its applications in brain cancer.

Libouban PY, Aci-Sèche S, Gómez-Tamayo JC, Tresadern G, Bonnet P. The Impact of Data on Structure-Based Binding Affinity Predictions Using Deep Neural Networks. Int J Mol Sci 2023;24:16120. [PMID: 38003312 PMCID: PMC10671244 DOI: 10.3390/ijms242216120] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/30/2023] [Accepted: 11/01/2023] [Indexed: 11/26/2023]

Mullowney MW, Duncan KR, Elsayed SS, Garg N, van der Hooft JJJ, Martin NI, Meijer D, Terlouw BR, Biermann F, Blin K, Durairaj J, Gorostiola González M, Helfrich EJN, Huber F, Leopold-Messer S, Rajan K, de Rond T, van Santen JA, Sorokina M, Balunas MJ, Beniddir MA, van Bergeijk DA, Carroll LM, Clark CM, Clevert DA, Dejong CA, Du C, Ferrinho S, Grisoni F, Hofstetter A, Jespers W, Kalinina OV, Kautsar SA, Kim H, Leao TF, Masschelein J, Rees ER, Reher R, Reker D, Schwaller P, Segler M, Skinnider MA, Walker AS, Willighagen EL, Zdrazil B, Ziemert N, Goss RJM, Guyomard P, Volkamer A, Gerwick WH, Kim HU, Müller R, van Wezel GP, van Westen GJP, Hirsch AKH, Linington RG, Robinson SL, Medema MH. Artificial intelligence for natural product drug discovery. Nat Rev Drug Discov 2023;22:895-916. [PMID: 37697042 DOI: 10.1038/s41573-023-00774-7] [Citation(s) in RCA: 16] [Impact Index Per Article: 16.0] [Accepted: 07/20/2023] [Indexed: 09/13/2023]

Affiliation(s)

Michael W Mullowney Duchossois Family Institute, The University of Chicago, Chicago, IL, USA
Katherine R Duncan Strathclyde Institute of Pharmacy and Biomedical Sciences, University of Strathclyde, Glasgow, UK
Somayah S Elsayed Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Neha Garg School of Chemistry and Biochemistry, Center for Microbial Dynamics and Infection, Georgia Institute of Technology, Atlanta, GA, USA
Justin J J van der Hooft Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Department of Biochemistry, University of Johannesburg, Johannesburg, South Africa
Nathaniel I Martin Biological Chemistry Group, Institute of Biology, Leiden University, Leiden, The Netherlands
David Meijer Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Barbara R Terlouw Bioinformatics Group, Wageningen University, Wageningen, The Netherlands
Friederike Biermann Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
Kai Blin The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Kongens Lyngby, Denmark
Janani Durairaj Biozentrum, University of Basel, Basel, Switzerland
Marina Gorostiola González Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands; ONCODE institute, Leiden, The Netherlands
Eric J N Helfrich Institute of Molecular Bio Science, Goethe-University Frankfurt, Frankfurt am Main, Germany; LOEWE Center for Translational Biodiversity Genomics (TBG), Frankfurt am Main, Germany
Florian Huber Center for Digitalization and Digitality, Hochschule Düsseldorf, Düsseldorf, Germany
Stefan Leopold-Messer Institut für Mikrobiologie, Eidgenössische Technische Hochschule (ETH) Zürich, Zürich, Switzerland
Kohulan Rajan Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller-University Jena, Jena, Germany
Tristan de Rond School of Chemical Sciences, University of Auckland, Auckland, New Zealand
Jeffrey A van Santen Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Maria Sorokina Institute for Inorganic and Analytical Chemistry, Friedrich-Schiller University, Jena, Germany; Pharmaceuticals R&D, Bayer AG, Berlin, Germany
Marcy J Balunas Department of Microbiology and Immunology, University of Michigan, Ann Arbor, MI, USA; Department of Medicinal Chemistry, University of Michigan, Ann Arbor, MI, USA
Mehdi A Beniddir Équipe "Chimie des Substances Naturelles", Université Paris-Saclay, CNRS, BioCIS, Orsay, France
Doris A van Bergeijk Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Laura M Carroll Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
Chase M Clark Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
Djork-Arné Clevert WRDM - Machine Learning Research, Pfizer, Berlin, Germany
Chris A Dejong Adapsyn Bioscience, Hamilton, Ontario, Canada
Chao Du Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands
Scarlet Ferrinho Chemistry Department, University of St Andrews, St Andrews, UK
Francesca Grisoni Institute for Complex Molecular Systems, Department of Biomedical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands; Centre for Living Technologies, Alliance TU/e, WUR, UU, UMC Utrecht, Utrecht, The Netherlands
Albert Hofstetter Laboratory of Physical Chemistry, ETH Zürich, Zürich, Switzerland
Willem Jespers Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
Olga V Kalinina Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany; Drug Bioinformatics, Medical Faculty, Saarland University, Homburg, Germany; Center for Bioinformatics, Saarland University, Saarbrücken, Germany
Satria A Kautsar Department of Chemistry, Scripps Research, FL, USA
Hyunwoo Kim College of Pharmacy and Integrated Research Institute for Drug Development, Dongguk University Seoul, Goyang-si, Republic of Korea
Tiago F Leao Center for Nuclear Energy in Agriculture, University of São Paulo, Piracicaba, Brazil
Joleen Masschelein Center for Microbiology, VIB-KU Leuven, Heverlee, Belgium; Department of Biology, KU Leuven, Heverlee, Belgium
Evan R Rees Division of Pharmaceutical Sciences, School of Pharmacy, University of Wisconsin-Madison, Madison, WI, USA
Raphael Reher Institute of Pharmaceutical Biology and Biotechnology, University of Marburg, Marburg, Germany; Institute of Pharmacy, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany
Daniel Reker Department of Biomedical Engineering, Duke University, Durham, NC, USA; Duke Microbiome Center, Duke University, Durham, NC, USA
Philippe Schwaller Laboratory of Artificial Chemical Intelligence, Institut des Sciences et Ingénierie Chimiques, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
Marwin Segler Microsoft Research, Cambridge, UK
Michael A Skinnider Adapsyn Bioscience, Hamilton, Ontario, Canada; Michael Smith Laboratories, University of British Columbia, Vancouver, British Columbia, Canada
Allison S Walker Department of Chemistry, Vanderbilt University, Nashville, TN, USA; Department of Biological Sciences, Vanderbilt University, Nashville, TN, USA
Egon L Willighagen Department of Bioinformatics - BiGCaT, NUTRIM, Maastricht University, Maastricht, The Netherlands
Barbara Zdrazil European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, UK
Nadine Ziemert Interfaculty Institute for Microbiology and Infection Medicine Tuebingen (IMIT), Institute for Bioinformatics and Medical Informatics (IBMI), University of Tuebingen, Tuebingen, Germany
Rebecca J M Goss Chemistry Department, University of St Andrews, St Andrews, UK
Pierre Guyomard Bonsai team, CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, Université de Lille, Villeneuve d'Ascq Cedex, France
Andrea Volkamer Center for Bioinformatics, Saarland University, Saarbrücken, Germany; In silico Toxicology and Structural Bioinformatics, Institute of Physiology, Charité - Universitätsmedizin Berlin, Berlin, Germany
William H Gerwick Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
Hyun Uk Kim Department of Chemical and Biomolecular Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Korea
Rolf Müller Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany; Department of Pharmacy, Saarland University, Saarbrücken, Germany; German Center for infection research (DZIF), Braunschweig, Germany; Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
Gilles P van Wezel Department of Molecular Biotechnology, Institute of Biology, Leiden University, Leiden, The Netherlands; Netherlands Institute of Ecology, NIOO-KNAW, Wageningen, The Netherlands
Gerard J P van Westen Drug Discovery and Safety, Leiden Academic Centre for Drug Research, Leiden, The Netherlands
Anna K H Hirsch Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Saarbrücken, Germany; Department of Pharmacy, Saarland University, Saarbrücken, Germany; German Center for infection research (DZIF), Braunschweig, Germany; Helmholtz International Lab for Anti-Infectives, Saarbrücken, Germany
Roger G Linington Department of Chemistry, Simon Fraser University, Burnaby, British Columbia, Canada
Serina L Robinson Department of Environmental Microbiology, Eawag: Swiss Federal Institute for Aquatic Science and Technology, Dübendorf, Switzerland
Marnix H Medema Bioinformatics Group, Wageningen University, Wageningen, The Netherlands; Institute of Biology, Leiden University, Leiden, The Netherlands

Zhao X, Kong Y, Ji Y, Xin X, Chen L, Chen G, Yu C. Classification models for predicting the bioactivity of pan-TRK inhibitors and SAR analysis. Mol Divers 2023:10.1007/s11030-023-10735-2. [PMID: 37910346 DOI: 10.1007/s11030-023-10735-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/22/2023] [Indexed: 11/03/2023]

Banerjee A, Roy K. Read-across-based intelligent learning: development of a global q-RASAR model for the efficient quantitative predictions of skin sensitization potential of diverse organic chemicals. Environ Sci Process Impacts 2023;25:1626-1644. [PMID: 37682520 DOI: 10.1039/d3em00322a] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/09/2023]

Hodyna D, Kovalishyn V, Romanenko Y, Semenyuta I, Blagodatny V, Kachaeva M, Brazhko O, Metelytsia L. Quinoline Hydrazone Derivatives as New Antibacterials against Multidrug Resistant Strains. Chem Biodivers 2023;20:e202300839. [PMID: 37552570 DOI: 10.1002/cbdv.202300839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/09/2023] [Revised: 08/01/2023] [Accepted: 08/07/2023] [Indexed: 08/10/2023]

Sar S, Mitra S, Panda P, Mandal SC, Ghosh N, Halder AK, Cordeiro MNDS. In Silico Modeling and Structural Analysis of Soluble Epoxide Hydrolase Inhibitors for Enhanced Therapeutic Design. Molecules 2023;28:6379. [PMID: 37687207 PMCID: PMC10490281 DOI: 10.3390/molecules28176379] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 08/17/2023] [Accepted: 08/28/2023] [Indexed: 09/10/2023]

Abstract

Human soluble epoxide hydrolase (sEH), a dual-functioning homodimeric enzyme with hydrolase and phosphatase activities, is known for its pivotal role in the hydrolysis of epoxyeicosatrienoic acids. Inhibitors targeting sEH have shown promising potential in the treatment of various life-threatening diseases. In this study, we employed a range of in silico modeling approaches to investigate a diverse dataset of structurally distinct sEH inhibitors. Our primary aim was to develop predictive and validated models while gaining insights into the structural requirements necessary for achieving higher inhibitory potential. To accomplish this, we initially calculated molecular descriptors using nine different descriptor-calculating tools, coupled with stochastic and non-stochastic feature selection strategies, to identify the most statistically significant linear 2D-QSAR model. The resulting model highlighted the critical roles played by topological characteristics, 2D pharmacophore features, and specific physicochemical properties in enhancing inhibitory potential. In addition to conventional 2D-QSAR modeling, we implemented the Transformer-CNN methodology to develop QSAR models, enabling us to obtain structural interpretations based on the Layer-wise Relevance Propagation (LRP) algorithm. Moreover, a comprehensive 3D-QSAR analysis provided additional insights into the structural requirements of these compounds as potent sEH inhibitors. To validate the findings from the QSAR modeling studies, we performed molecular dynamics (MD) simulations using selected compounds from the dataset. The simulation results offered crucial insights into receptor-ligand interactions, supporting the predictions obtained from the QSAR models. Collectively, our work serves as an essential guideline for the rational design of novel sEH inhibitors with enhanced therapeutic potential. Importantly, all the in silico studies were performed using open-access tools to ensure reproducibility and accessibility.

Miao Y, Ma H, Huang J. Recent Advances in Toxicity Prediction: Applications of Deep Graph Learning. Chem Res Toxicol 2023;36:1206-1226. [PMID: 37562046 DOI: 10.1021/acs.chemrestox.2c00384] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]

Mrug G, Hodyna D, Metelytsia L, Kovalishyn V, Trokhimenko O, Bondarenko S, Kondratyuk K, Kozitskiy A, Frasinyuk M. Structure-Activity Relationship Prediction-Based Synthesis and Cytotoxicity Evaluation against the HEp-2 Laryngeal Carcinoma Cell of Isoflavone-Cytisine Mannich Bases. Chem Biodivers 2023;20:e202300560. [PMID: 37477067 DOI: 10.1002/cbdv.202300560] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2023] [Revised: 07/15/2023] [Accepted: 07/20/2023] [Indexed: 07/22/2023]

Niazi SK, Mariam Z. Recent Advances in Machine-Learning-Based Chemoinformatics: A Comprehensive Review. Int J Mol Sci 2023;24:11488. [PMID: 37511247 PMCID: PMC10380192 DOI: 10.3390/ijms241411488] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 06/09/2023] [Revised: 06/30/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023]

Dou B, Zhu Z, Merkurjev E, Ke L, Chen L, Jiang J, Zhu Y, Liu J, Zhang B, Wei GW. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem Rev 2023;123:8736-8780. [PMID: 37384816 PMCID: PMC10999174 DOI: 10.1021/acs.chemrev.3c00189] [Citation(s) in RCA: 21] [Impact Index Per Article: 21.0] [Indexed: 07/01/2023]

Abstract

Small data are often used in scientific and engineering research due to various constraints, such as time, cost, ethics, privacy, security, and technical limitations in data acquisition. Although big data have been the focus for the past decade, small data and their challenges have received little attention, even though they are technically more severe in machine learning (ML) and deep learning (DL) studies. The small data challenge is often compounded by issues such as data diversity, imputation, noise, imbalance, and high dimensionality. Fortunately, the current big data era is characterized by technological breakthroughs in ML, DL, and artificial intelligence (AI) that enable data-driven scientific discovery, and many advanced ML and DL technologies developed for big data have inadvertently provided solutions for small data problems. As a result, significant progress has been made in ML and DL for small data challenges in the past decade. In this review, we summarize and analyze several emerging potential solutions to small data challenges in molecular science, including chemical and biological sciences. We review both basic machine learning algorithms, such as linear regression, logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), kernel learning (KL), random forest (RF), and gradient boosting trees (GBT), and more advanced techniques, including artificial neural network (ANN), convolutional neural network (CNN), U-Net, graph neural network (GNN), generative adversarial network (GAN), long short-term memory (LSTM), autoencoder, transformer, transfer learning, active learning, graph-based semi-supervised learning, combining deep learning with traditional machine learning, and physical model-based data augmentation. We also briefly discuss the latest advances in these methods. Finally, we conclude the survey with a discussion of promising trends in small data challenges in molecular science.

Srivathsa AV, Sadashivappa NM, Hegde AK, Radha S, Mahesh AR, Ammunje DN, Sen D, Theivendren P, Govindaraj S, Kunjiappan S, Pavadai P. A Review on Artificial Intelligence Approaches and Rational Approaches in Drug Discovery. Curr Pharm Des 2023;29:1180-1192. [PMID: 37132148 DOI: 10.2174/1381612829666230428110542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 02/06/2023] [Accepted: 02/27/2023] [Indexed: 05/04/2023]

Qian X, Dai X, Luo L, Lin M, Xu Y, Zhao Y, Huang D, Qiu H, Liang L, Liu H, Liu Y, Gu L, Lu T, Chen Y, Zhang Y. An Interpretable Multitask Framework BiLAT Enables Accurate Prediction of Cyclin-Dependent Protein Kinase Inhibitors. J Chem Inf Model 2023. [PMID: 37171216 DOI: 10.1021/acs.jcim.3c00473] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/13/2023]

Abstract

The cyclin-dependent protein kinases (CDKs) are protein-serine/threonine kinases with crucial effects on the regulation of the cell cycle and transcription. CDKs can be a hallmark of cancer since their excessive expression could lead to impaired cell proliferation. However, most CDK inhibitors developed to date lack sufficient selectivity, which has hindered their therapeutic use. In this study, we propose a multitask deep learning framework called BiLAT, based on the SMILES representation, for predicting the inhibitory activity of molecules on eight CDK subtypes (CDK1, 2, 4-9). The framework is mainly composed of an improved bidirectional long short-term memory (BiLSTM) module and the encoder layer of the Transformer framework. Additionally, SMILES enumeration is applied as a data augmentation method to improve the performance of the model. Compared with baseline predictive models based on three conventional machine learning methods and two multitask deep learning algorithms, BiLAT achieves the best performance, with the highest average AUC, ACC, F1-score, and MCC values of 0.938, 0.894, 0.911, and 0.715 on the test set. Moreover, we constructed a targeted external data set, CDK-Dec, for the CDK family, which mainly contains decoys screened by 3D similarity with active compounds. This data set was used in the subsequent evaluation of our model. Notably, the BiLAT model is interpretable and can be used by chemists to design and synthesize compounds with improved activity. To further verify the generalization ability of the multitask BiLAT model, we also conducted another evaluation on three public data sets (Tox21, ClinTox, and SIDER). Compared with several currently popular models, BiLAT shows the best performance on two of these data sets. These results indicate that BiLAT is an effective tool for accelerating drug discovery.
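The BiLAT abstract reports ACC, F1-score, and MCC alongside AUC. As a minimal sketch using the standard textbook formulas (not code from the BiLAT authors), the three threshold-based metrics can be computed from the four counts of a binary confusion matrix for a single task. The function name `binary_metrics` and the counts in the usage line are hypothetical; averaging the per-task values over the eight CDK subtypes is not shown.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Return accuracy, F1-score and Matthews correlation coefficient (MCC)
    for one binary task, given its confusion-matrix counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # MCC uses all four cells, so it stays informative on imbalanced data.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: report MCC = 0 when any row or column sum is zero.
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for one CDK subtype (not data from the paper).
    print(binary_metrics(tp=80, fp=10, tn=90, fn=20))
```

MCC ranges from -1 to 1 and only reaches high values when both classes are predicted well, which is why it is often reported next to accuracy for imbalanced activity data.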

Affiliation(s)

Xu Qian Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Xiaowen Dai Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Lin Luo Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Mingde Lin Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Yuan Xu Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Yang Zhao Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Dingfang Huang Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Haodi Qiu Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Li Liang Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Haichun Liu Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Yingbo Liu Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Lingxi Gu Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Tao Lu Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China; State Key Laboratory of Natural Medicines, China Pharmaceutical University, 24 Tongjiaxiang, Nanjing 210009, China
Yadong Chen Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China
Yanmin Zhang Laboratory of Molecular Design and Drug Discovery, School of Science, China Pharmaceutical University, 639 Longmian Avenue, Nanjing 211198, China

Zhang B, Lin J, Du L, Zhang L. Harnessing Data Augmentation and Normalization Preprocessing to Improve the Performance of Chemical Reaction Predictions of Data-Driven Model. Polymers (Basel) 2023;15:2224. [PMID: 37177370 PMCID: PMC10180765 DOI: 10.3390/polym15092224] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/31/2023] [Revised: 05/03/2023] [Accepted: 05/03/2023] [Indexed: 05/15/2023]

Patlewicz G, Paul-Friedman K, Houck K, Zhang L, Huang R, Xia M, Brown J, Simmons SO. Evaluating the utility of a high throughput thiol-containing fluorescent probe to screen for reactivity: A case study with the Tox21 library. Comput Toxicol 2023;26:100271. [PMID: 37388277 PMCID: PMC10304587 DOI: 10.1016/j.comtox.2023.100271] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]

Ksenofontov AA, Isaev YI, Lukanov MM, Makarov DM, Eventova VA, Khodov IA, Berezin MB. Accurate prediction of ¹¹B NMR chemical shift of BODIPYs via machine learning. Phys Chem Chem Phys 2023;25:9472-9481. [PMID: 36935644 DOI: 10.1039/d3cp00253e] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 03/08/2023]

ASI-DBNet: An Adaptive Sparse Interactive ResNet-Vision Transformer Dual-Branch Network for the Grading of Brain Cancer Histopathological Images. Interdiscip Sci 2023;15:15-31. [PMID: 35810266 DOI: 10.1007/s12539-022-00532-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/19/2022] [Revised: 05/26/2022] [Accepted: 05/31/2022] [Indexed: 10/17/2022]

Abstract

Brain cancer is the deadliest cancer that occurs in the brain and central nervous system, and rapid and precise grading is essential to reduce patient suffering and improve survival. Traditional convolutional neural network (CNN)-based computer-aided diagnosis algorithms cannot fully utilize the global information of pathology images, and the recently popular vision transformer (ViT) model does not focus enough on the local details of pathology images; both shortcomings reduce the precision of the model's focus and the accuracy of brain cancer grading. To solve this problem, we propose an adaptive sparse interaction ResNet-ViT dual-branch network (ASI-DBNet). First, we design a parallel ResNet-ViT structure to simultaneously capture and retain the local and global information of pathology images. Second, we design an adaptive sparse interaction block (ASIB) through which the ResNet branch interacts with the ViT branch. Furthermore, we introduce an attention mechanism in the ASIB to adaptively filter redundant information from the dual branches during the interaction, so that the feature maps exchanged between the branches are more informative. Extensive experiments show that ASI-DBNet performs best among various baseline and state-of-the-art (SOTA) models, with 95.24% accuracy across four grades. In particular, for brain tumors with a high degree of malignancy (Grade III and Grade IV), ASI-DBNet achieves diagnostic accuracies of 97.93% and 96.28%, respectively, which is of great clinical significance. Meanwhile, gradient-weighted class activation mapping (Grad-CAM) and attention rollout visualizations are used to reveal the working logic behind the model, and the resulting feature maps highlight the important distinguishing features related to the diagnosis. Therefore, the interpretability and confidence of the model are improved, which is of great value for the clinical diagnosis of brain cancer.

SuHAN: Substructural hierarchical attention network for molecular representation. J Mol Graph Model 2023;119:108401. [PMID: 36584590 DOI: 10.1016/j.jmgm.2022.108401] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Revised: 12/16/2022] [Accepted: 12/23/2022] [Indexed: 12/26/2022]

Nascimben M, Rimondini L. Molecular Toxicity Virtual Screening Applying a Quantized Computational SNN-Based Framework. Molecules 2023;28:molecules28031342. [PMID: 36771009 PMCID: PMC9919191 DOI: 10.3390/molecules28031342] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 01/27/2023] [Accepted: 01/29/2023] [Indexed: 02/04/2023]

XSMILES: interactive visualization for molecules, SMILES and XAI attribution scores. J Cheminform 2023;15:2. [PMID: 36609340 PMCID: PMC9817292 DOI: 10.1186/s13321-022-00673-w] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 09/30/2022] [Accepted: 12/17/2022] [Indexed: 01/09/2023]

Abstract

BACKGROUND

Explainable artificial intelligence (XAI) methods have shown increasing applicability in chemistry. In this context, visualization techniques can highlight regions of a molecule to reveal their influence on a predicted property. For this purpose, some XAI techniques calculate attribution scores associated with tokens of SMILES strings or with atoms of a molecule. While a score associated with an atom can be represented directly on a molecule diagram, scores computed for SMILES non-atom tokens cannot. For instance, the substring [N+] contains three non-atom tokens, i.e., [, +, and ], and their attributions, depending on the model, do not necessarily reveal an influence of the nitrogen atom on the predicted property; for that reason, these scores cannot be represented on a molecule diagram. Moreover, SMILES notation is complex, highlighting the need for techniques that facilitate the analysis of explanations associated with its tokens.

RESULTS

We propose XSMILES, an interactive visualization technique for exploring explainable artificial intelligence attribution scores and supporting the interpretation of SMILES. Users can input any type of score attributed to atom and non-atom tokens and visualize these scores on top of a 2D molecule diagram coordinated with a bar chart that represents the SMILES string. We demonstrate with two use cases how attributions calculated for SMILES strings can be evaluated and better interpreted through interaction.

CONCLUSIONS

Data scientists can use XSMILES to understand their models' behavior and compare multiple modeling approaches. The tool provides a set of parameters to adapt the visualization to users' needs, and it can be integrated into different platforms. We believe XSMILES can help data scientists develop, improve, and communicate their models by making it easier to identify patterns and compare attributions through interactive exploratory visualization.

Zheng X, Tomiura Y, Hayashi K. Investigation of the structure-odor relationship using a Transformer model. J Cheminform 2022;14:88. [PMID: 36581889 PMCID: PMC9798546 DOI: 10.1186/s13321-022-00671-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/13/2022] [Accepted: 12/14/2022] [Indexed: 12/30/2022]

Makarov D, Fadeeva Y, Safonova E, Shmukler L. Predictive modeling of antibacterial activity of ionic liquids by machine learning methods. Comput Biol Chem 2022;101:107775. [DOI: 10.1016/j.compbiolchem.2022.107775] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2022] [Revised: 09/24/2022] [Accepted: 10/03/2022] [Indexed: 11/03/2022]

Muzychka LV, Verves EV, Yaremchuk IO, Zinchenko AM, Shishkina SV, Semenyuta IV, Hodyna DM, Metelytsia LO, Kovalishyn V, Smolii OB. Synthesis, QSAR modeling, and molecular docking of novel fused 7-deazaxanthine derivatives as adenosine A2A receptor antagonists. Chem Biol Drug Des 2022;100:1025-1032. [PMID: 34651417 DOI: 10.1111/cbdd.13975] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/25/2021] [Revised: 09/21/2021] [Accepted: 10/10/2021] [Indexed: 01/25/2023]

Askr H, Elgeldawi E, Aboul Ella H, Elshaier YAMM, Gomaa MM, Hassanien AE. Deep learning in drug discovery: an integrative review and future challenges. Artif Intell Rev 2022;56:5975-6037. [PMID: 36415536 PMCID: PMC9669545 DOI: 10.1007/s10462-022-10306-1] [Citation(s) in RCA: 29] [Impact Index Per Article: 14.5] [Accepted: 10/24/2022] [Indexed: 11/18/2022]