1
|
Yao R, Shen Z, Xu X, Ling G, Xiang R, Song T, Zhai F, Zhai Y. Knowledge mapping of graph neural networks for drug discovery: a bibliometric and visualized analysis. Front Pharmacol 2024; 15:1393415. [PMID: 38799167 PMCID: PMC11116974 DOI: 10.3389/fphar.2024.1393415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/29/2024] [Accepted: 04/12/2024] [Indexed: 05/29/2024] Open
Abstract
Introduction In recent years, graph neural network has been extensively applied to drug discovery research. Although researchers have made significant progress in this field, there is less research on bibliometrics. The purpose of this study is to conduct a comprehensive bibliometric analysis of graph neural network applications in drug discovery in order to identify current research hotspots and trends, as well as serve as a reference for future research. Methods Publications from 2017 to 2023 about the application of graph neural network in drug discovery were collected from the Web of Science Core Collection. Bibliometrix, VOSviewer, and Citespace were mainly used for bibliometric studies. Results and Discussion In this paper, a total of 652 papers from 48 countries/regions were included. Research interest in this field is continuously increasing. China and the United States have a significant advantage in terms of funding, the number of publications, and collaborations with other institutions and countries. Although some cooperation networks have been formed in this field, extensive worldwide cooperation still needs to be strengthened. The results of the keyword analysis clarified that graph neural network has primarily been applied to drug-target interaction, drug repurposing, and drug-drug interaction, while graph convolutional neural network and its related optimization methods are currently the core algorithms in this field. Data availability and ethical supervision, balancing computing resources, and developing novel graph neural network models with better interpretability are the key technical issues currently faced. This paper analyzes the current state, hot spots, and trends of graph neural network applications in drug discovery through bibliometric approaches, as well as the current issues and challenges in this field. These findings provide researchers with valuable insights on the current status and future directions of this field.
Collapse
Affiliation(s)
| | | | | | | | | | | | - Fei Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
| | - Yuxuan Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
| |
Collapse
|
2
|
Ong WJG, Kirubakaran P, Karanicolas J. Poor Generalization by Current Deep Learning Models for Predicting Binding Affinities of Kinase Inhibitors. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.09.04.556234. [PMID: 37732243 PMCID: PMC10508770 DOI: 10.1101/2023.09.04.556234] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 09/22/2023]
Abstract
The extreme surge of interest over the past decade surrounding the use of neural networks has inspired many groups to deploy them for predicting binding affinities of drug-like molecules to their receptors. A model that can accurately make such predictions has the potential to screen large chemical libraries and help streamline the drug discovery process. However, despite reports of models that accurately predict quantitative inhibition using protein kinase sequences and inhibitors' SMILES strings, it is still unclear whether these models can generalize to previously unseen data. Here, we build a Convolutional Neural Network (CNN) analogous to those previously reported and evaluate the model over four datasets commonly used for inhibitor/kinase predictions. We find that the model performs comparably to those previously reported, provided that the individual data points are randomly split between the training set and the test set. However, model performance is dramatically deteriorated when all data for a given inhibitor is placed together in the same training/testing fold, implying that information leakage underlies the models' performance. Through comparison to simple models in which the SMILES strings are tokenized, or in which test set predictions are simply copied from the closest training set data points, we demonstrate that there is essentially no generalization whatsoever in this model. In other words, the model has not learned anything about molecular interactions, and does not provide any benefit over much simpler and more transparent models. These observations strongly point to the need for richer structure-based encodings, to obtain useful prospective predictions of not-yet-synthesized candidate inhibitors.
Collapse
Affiliation(s)
- Wern Juin Gabriel Ong
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
- Bowdoin College, Brunswick, ME 04011
| | - Palani Kirubakaran
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
| | - John Karanicolas
- Cancer Signaling & Microenvironment Program, Fox Chase Cancer Center, Philadelphia, PA 19111
| |
Collapse
|
3
|
Williams AH, Zhan CG. Staying Ahead of the Game: How SARS-CoV-2 has Accelerated the Application of Machine Learning in Pandemic Management. BioDrugs 2023; 37:649-674. [PMID: 37464099 DOI: 10.1007/s40259-023-00611-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/28/2023] [Indexed: 07/20/2023]
Abstract
In recent years, machine learning (ML) techniques have garnered considerable interest for their potential use in accelerating the rate of drug discovery. With the emergence of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, the utilization of ML has become even more crucial in the search for effective antiviral medications. The pandemic has presented the scientific community with a unique challenge, and the rapid identification of potential treatments has become an urgent priority. Researchers have been able to accelerate the process of identifying drug candidates, repurposing existing drugs, and designing new compounds with desirable properties using machine learning in drug discovery. To train predictive models, ML techniques in drug discovery rely on the analysis of large datasets, including both experimental and clinical data. These models can be used to predict the biological activities, potential side effects, and interactions with specific target proteins of drug candidates. This strategy has proven to be an effective method for identifying potential coronavirus disease 2019 (COVID-19) and other disease treatments. This paper offers a thorough analysis of the various ML techniques implemented to combat COVID-19, including supervised and unsupervised learning, deep learning, and natural language processing. The paper discusses the impact of these techniques on pandemic drug development, including the identification of potential treatments, the understanding of the disease mechanism, and the creation of effective and safe therapeutics. The lessons learned can be applied to future outbreaks and drug discovery initiatives.
Collapse
Affiliation(s)
- Alexander H Williams
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA
- GSK Upper Providence, 1250 S. Collegeville Road, Collegeville, PA, 19426, USA
| | - Chang-Guo Zhan
- Molecular Modeling and Biopharmaceutical Center, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA.
- Department of Pharmaceutical Sciences, College of Pharmacy, University of Kentucky, 789 South Limestone Street, Lexington, KY, 40536, USA.
| |
Collapse
|
4
|
Yang Z, Cai X, Ye Q, Zhao Y, Li X, Zhang S, Zhang L. High-Throughput Screening for the Potential Inhibitors of SARS-CoV-2 with Essential Dynamic Behavior. Curr Drug Targets 2023; 24:532-545. [PMID: 36876836 DOI: 10.2174/1389450124666230306141725] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2022] [Revised: 11/09/2022] [Accepted: 01/11/2023] [Indexed: 03/07/2023]
Abstract
Global health security has been challenged by the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic. Due to the lengthy process of generating vaccinations, it is vital to reposition currently available drugs in order to relieve anti-epidemic tensions and accelerate the development of therapies for Coronavirus Disease 2019 (COVID-19), the public threat caused by SARS-CoV-2. High throughput screening techniques have established their roles in the evaluation of already available medications and the search for novel potential agents with desirable chemical space and more cost-effectiveness. Here, we present the architectural aspects of highthroughput screening for SARS-CoV-2 inhibitors, especially three generations of virtual screening methodologies with structural dynamics: ligand-based screening, receptor-based screening, and machine learning (ML)-based scoring functions (SFs). By outlining the benefits and drawbacks, we hope that researchers will be motivated to adopt these methods in the development of novel anti- SARS-CoV-2 agents.
Collapse
Affiliation(s)
- Zhiwei Yang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| | - Xinhui Cai
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| | - Qiushi Ye
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| | - Yizhen Zhao
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| | - Xuhua Li
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| | - Shengli Zhang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| | - Lei Zhang
- MOE Key Laboratory for Nonequilibrium Synthesis and Modulation of Condensed Matter, School of Physics, Xi'an Jiaotong University, Xi'an710049, China
| |
Collapse
|
5
|
Hybrid Approach to Identifying Druglikeness Leading Compounds against COVID-19 3CL Protease. Pharmaceuticals (Basel) 2022; 15:ph15111333. [DOI: 10.3390/ph15111333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2022] [Revised: 10/21/2022] [Accepted: 10/25/2022] [Indexed: 11/16/2022] Open
Abstract
SARS-CoV-2 is a positive single-strand RNA-based macromolecule that has caused the death of more than 6.3 million people since June 2022. Moreover, by disturbing global supply chains through lockdowns, the virus has indirectly caused devastating damage to the global economy. It is vital to design and develop drugs for this virus and its various variants. In this paper, we developed an in silico study-based hybrid framework to repurpose existing therapeutic agents in finding drug-like bioactive molecules that would cure COVID-19. In the first step, a total of 133 drug-likeness bioactive molecules are retrieved from the ChEMBL database against SARS coronavirus 3CL Protease. Based on the standard IC50, the dataset is divided into three classes: active, inactive, and intermediate. Our comparative analysis demonstrated that the proposed Extra Tree Regressor (ETR)-based QSAR model has improved prediction results related to the bioactivity of chemical compounds as compared to Gradient Boosting-, XGBoost-, Support Vector-, Decision Tree-, and Random Forest-based regressor models. ADMET analysis is carried out to identify thirteen bioactive molecules with the ChEMBL IDs 187460, 190743, 222234, 222628, 222735, 222769, 222840, 222893, 225515, 358279, 363535, 365134, and 426898. These molecules are highly suitable drug candidates for SARS-CoV-2 3CL Protease. In the next step, the efficacy of the bioactive molecules is computed in terms of binding affinity using molecular docking, and then six bioactive molecules are shortlisted, with the ChEMBL IDs 187460, 222769, 225515, 358279, 363535, and 365134. These molecules can be suitable drug candidates for SARS-CoV-2. It is anticipated that the pharmacologist and/or drug manufacturer would further investigate these six molecules to find suitable drug candidates for SARS-CoV-2. They can adopt these promising compounds for their downstream drug development stages.
Collapse
|
6
|
Wei Y, Li S, Li Z, Wan Z, Lin J. Interpretable-ADMET: a web service for ADMET prediction and optimization based on deep neural representation. Bioinformatics 2022; 38:2863-2871. [PMID: 35561160 DOI: 10.1093/bioinformatics/btac192] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2021] [Revised: 03/05/2022] [Accepted: 03/28/2022] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION In the process of discovery and optimization of lead compounds, it is difficult for non-expert pharmacologists to intuitively determine the contribution of substructure to a particular property of a molecule. RESULTS In this work, we develop a user-friendly web service, named interpretable-absorption, distribution, metabolism, excretion and toxicity (ADMET), which predict 59 ADMET-associated properties using 90 qualitative classification models and 28 quantitative regression models based on graph convolutional neural network and graph attention network algorithms. In interpretable-ADMET, there are 250 729 entries associated with 59 kinds of ADMET-associated properties for 80 167 chemical compounds. In addition to making predictions, interpretable-ADMET provides interpretation models based on gradient-weighted class activation map for identifying the substructure, which is important to the particular property. Interpretable-ADMET also provides an optimize module to automatically generate a set of novel virtual candidates based on matched molecular pair rules. We believe that interpretable-ADMET could serve as a useful tool for lead optimization in drug discovery. AVAILABILITY AND IMPLEMENTATION Interpretable-ADMET is available at http://cadd.pharmacy.nankai.edu.cn/interpretableadmet/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Yu Wei
- State Key Laboratory of Medicinal Chemical Biology, Frontiers Science Center for Cell Responses, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300353, China
| | - Shanshan Li
- State Key Laboratory of Medicinal Chemical Biology, Frontiers Science Center for Cell Responses, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300353, China
- Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300457, China
| | - Zhonglin Li
- State Key Laboratory of Medicinal Chemical Biology, Frontiers Science Center for Cell Responses, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300353, China
- Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300457, China
| | - Ziwei Wan
- State Key Laboratory of Medicinal Chemical Biology, Frontiers Science Center for Cell Responses, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300353, China
- Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300457, China
| | - Jianping Lin
- State Key Laboratory of Medicinal Chemical Biology, Frontiers Science Center for Cell Responses, College of Pharmacy and Tianjin Key Laboratory of Molecular Drug Research, Nankai University, Tianjin 300353, China
- Platform of Pharmaceutical Intelligence, Tianjin International Joint Academy of Biomedicine, Tianjin 300457, China
- Biodesign Center, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin 300308, China
| |
Collapse
|
7
|
Harigua-Souiai E, Oualha R, Souiai O, Abdeljaoued-Tej I, Guizani I. Applied Machine Learning Toward Drug Discovery Enhancement: Leishmaniases as a Case Study. Bioinform Biol Insights 2022; 16:11779322221090349. [PMID: 35478992 PMCID: PMC9036323 DOI: 10.1177/11779322221090349] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2021] [Accepted: 03/04/2022] [Indexed: 11/25/2022] Open
Abstract
Drug discovery (DD) research is a complex field with a high attrition rate. Machine learning (ML) approaches combined to chemoinformatics are of valuable input to this field. We, herein, focused on implementing multiple ML algorithms that shall learn from different molecular fingerprints (FPs) of 65 057 molecules that have been identified as active or inactive against Leishmania major promastigotes. We sought to build a classifier able to predict whether a given molecule has the potential of being anti-leishmanial or not. Using the RDkit library, we calculated 5 molecular FPs of the molecules. Then, we implemented 4 ML algorithms that we trained and tested for their ability to classify the molecules into active/inactive classes based on their chemical structure, encoded by the molecular FPs. Best performers were random forest (RF) and support vector machine (SVM), while atom-pair and topology torsion FPs were the best embedding functions. Both models were further assessed on different stratification levels of the dataset and showed stable performances. At last, we used them to predict the potential of molecules within the Food and Drug Administration (FDA)-approved drugs collection to present anti-Leishmania effects. We ranked these drugs according to their anti-Leishmanial probability and obtained in total seven anti-Leishmania agents, previously described in the literature, within the top 10 of each model. This validates the robustness of the approach, the algorithms, and FPs choices as well as the importance of the dataset size and content. We further engaged these molecules into reverse docking experiments on 3D crystal structures of seven well-studied Leishmania drug targets and could predict the molecular targets for 4 drugs. The results bring novel insights into anti-Leishmania compounds.
Collapse
Affiliation(s)
- Emna Harigua-Souiai
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Rafeh Oualha
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Oussama Souiai
- Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| | - Ines Abdeljaoued-Tej
- Laboratory of Bioinformatics, BioMathematics and BioStatistics LR20IPT09, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia.,Engineering School of Statistics and Information Analysis, University of Carthage, Ariana, Tunisia
| | - Ikram Guizani
- Laboratory of Molecular Epidemiology and Experimental Pathology-LR16IPT04, Institut Pasteur de Tunis, Université de Tunis El Manar, Tunis, Tunisia
| |
Collapse
|