1
Liu H, Hu B, Chen P, Wang X, Wang H, Wang S, Wang J, Lin B, Cheng M. Docking Score ML: Target-Specific Machine Learning Models Improving Docking-Based Virtual Screening in 155 Targets. J Chem Inf Model 2024. [PMID: 38958413 DOI: 10.1021/acs.jcim.4c00072] [Indexed: 07/04/2024]
Abstract
In drug discovery, molecular docking methods face challenges in accurately predicting energy. Scoring functions used in molecular docking often fail to simulate complex protein-ligand interactions fully and accurately, leading to biases and inaccuracies in virtual screening and target predictions. We introduce "Docking Score ML", developed from an analysis of over 200,000 docked complexes from 155 known targets for cancer treatments. The scoring functions used are founded on bioactivity data sourced from ChEMBL and have been fine-tuned using both supervised machine learning and deep learning techniques. We validated our approach extensively through a validation of the selectivity mechanism, tests on the DUDE, DUD-AD, and LIT-PCBA data sets, and a multitarget analysis on drugs like sunitinib. To enhance prediction accuracy, feature fusion techniques were explored. By merging the capabilities of the Graph Convolutional Network (GCN) with multiple docking functions, our results indicated a clear superiority of our methodologies over conventional approaches. These advantages demonstrate that Docking Score ML is an efficient and accurate tool for virtual screening and reverse docking.
Affiliation(s)
- Haihan Liu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Baichun Hu
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Peiying Chen
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Xiao Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Hanxun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Shizun Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Jian Wang
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Bin Lin
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Maosheng Cheng
- Key Laboratory of Structure-Based Drug Design & Discovery of Ministry of Education, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- Key Laboratory of Intelligent Drug Design and New Drug Discovery of Liaoning Province, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
- School of Pharmaceutical Engineering, Shenyang Pharmaceutical University, Shenyang 110016, People's Republic of China
2
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
- Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.
3
Zhao D, Zhang Y, Chen Y, Li B, Zhou W, Wang L. Highly Accurate and Explainable Predictions of Small-Molecule Antioxidants for Eight In Vitro Assays Simultaneously through an Alternating Multitask Learning Strategy. J Chem Inf Model 2024. [PMID: 38888465 DOI: 10.1021/acs.jcim.4c00748] [Indexed: 06/20/2024]
Abstract
Small-molecule antioxidants can inhibit or retard oxidation reactions and protect against free radical damage to cells, thus playing a key role in food, cosmetics, pharmaceuticals, the environment, and materials. Experimentally driven antioxidant discovery is the major paradigm, whereas computationally assisted antioxidant discovery is rarely reported. In this study, a functional-group-based alternating multitask self-supervised molecular representation learning method is proposed to simultaneously predict the antioxidant activities of small molecules for eight commonly used in vitro antioxidant assays. Extensive evaluation results reveal that, compared with the baseline models, the multitask FG-BERT model achieves the best overall predictive performance, with the highest average F1, BA, ROC-AUC, and PRC-AUC values of 0.860, 0.880, 0.954, and 0.937 for the test sets, respectively. The Y-scrambling testing results further demonstrate that the performance of this deep learning model did not arise by chance and that it has reliable predictive capabilities. Additionally, the excellent interpretability of the multitask FG-BERT model makes it easy to identify key structural fragments/groups that contribute significantly to the antioxidant effect of a given molecule. Finally, an online antioxidant activity prediction platform called AOP (freely available at https://aop.idruglab.cn/) and its local version were developed based on the high-quality multitask FG-BERT model for experts and nonexperts in the field. We anticipate that it will contribute to the discovery of novel small-molecule antioxidants.
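Note on the metrics reported above: F1 and balanced accuracy (BA) can both be derived from a binary confusion matrix. The following is a minimal sketch under that assumption; the function names and the toy labels are illustrative only and are not taken from the paper or its code.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def balanced_accuracy(y_true, y_pred):
    """BA = mean of sensitivity (recall on actives) and specificity (recall on inactives)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


# Toy labels (hypothetical): 3 actives and 5 inactives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
f1 = f1_score(y_true, y_pred)
ba = balanced_accuracy(y_true, y_pred)
print(round(f1, 3), round(ba, 3))
```

Balanced accuracy is often reported alongside F1 on imbalanced assay data because it weights the active and inactive classes equally.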
Affiliation(s)
- Duancheng Zhao
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Biaoshun Li
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Wenguang Zhou
- Central Laboratory of The Sixth Affiliated Hospital, School of Medicine, South China University of Technology, Foshan 528200, China
- Ling Wang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
4
Cui Z, Ma R, Yang CH, Malpani A, Chu TN, Ghazi A, Davis JW, Miles BJ, Lau C, Liu Y, Hung AJ. Capturing relationships between suturing sub-skills to improve automatic suturing assessment. NPJ Digit Med 2024; 7:152. [PMID: 38862627 PMCID: PMC11167055 DOI: 10.1038/s41746-024-01143-3] [Received: 12/19/2023] [Accepted: 05/22/2024] [Indexed: 06/13/2024] Open
Abstract
Suturing skill scores have demonstrated strong predictive capabilities for patient functional recovery. Suturing can be broken down into several substep components, such as needle repositioning and needle entry angle. Artificial intelligence (AI) systems have been explored to automate suturing skill scoring. Traditional approaches to skill assessment typically focus on evaluating individual sub-skills required for particular substeps in isolation. However, surgical procedures require the integration and coordination of multiple sub-skills to achieve successful outcomes. Significant associations among technical sub-skills have been established by existing studies. In this paper, we propose a framework for joint skill assessment that takes into account the interconnected nature of sub-skills required in surgery. The prior known relationships among sub-skills are first identified. Our proposed AI system then uses these prior known relationships to perform suturing skill scoring for each sub-skill domain simultaneously. Our approach can effectively improve skill assessment performance through the prior known relationships among sub-skills. Through the proposed approach to joint skill assessment, we aspire to enhance the evaluation of surgical proficiency and ultimately improve patient outcomes in surgery.
Affiliation(s)
- Zijun Cui
- University of Southern California, Los Angeles, CA, USA
- Runzhuo Ma
- Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Cherine H Yang
- Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA
- Timothy N Chu
- University of Southern California, Los Angeles, CA, USA
- Ahmed Ghazi
- Johns Hopkins University, Baltimore, MD, USA
- John W Davis
- University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Yan Liu
- University of Southern California, Los Angeles, CA, USA
- Andrew J Hung
- Department of Urology, Cedars-Sinai Medical Center, Los Angeles, CA, USA.
5
Ru J, Zhu Z, Shi J. Spatial and geometric learning for classification of breast tumors from multi-center ultrasound images: a hybrid learning approach. BMC Med Imaging 2024; 24:133. [PMID: 38840240 PMCID: PMC11155188 DOI: 10.1186/s12880-024-01307-3] [Received: 04/21/2024] [Accepted: 05/27/2024] [Indexed: 06/07/2024] Open
Abstract
BACKGROUND Breast cancer is the most common cancer among women, and ultrasound is a common tool for early screening. Deep learning techniques are now applied as auxiliary tools that provide predictive results to help doctors decide whether further examination or treatment is needed. This study aimed to develop a hybrid learning approach for breast ultrasound classification by extracting more potential features from local and multi-center ultrasound data. METHODS We proposed a hybrid learning approach to classify breast tumors as benign or malignant. Three multi-center datasets (BUSI, BUS, OASBUD) were used to pretrain a model by federated learning, and the model was then fine-tuned locally on each dataset. The proposed model consisted of a convolutional neural network (CNN) and a graph neural network (GNN), aiming to extract features from images at a spatial level and from graphs at a geometric level. The input images are small-sized and free from pixel-level labels, and the input graphs are generated automatically in an unsupervised manner, which saves labor and memory costs. RESULTS The classification AUCROC of our proposed method is 0.911, 0.871 and 0.767 for BUSI, BUS and OASBUD, respectively. The balanced accuracy is 87.6%, 85.2% and 61.4%, respectively. The results show that our method outperforms conventional methods. CONCLUSIONS Our hybrid approach can learn the inter-features among multi-center data and the intra-features of local data. It shows potential in aiding doctors with breast tumor classification in ultrasound at an early stage.
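Note on the federated pretraining step: the abstract does not state which aggregation rule was used. A common choice is federated averaging (FedAvg), in which a server averages client parameters weighted by local sample counts. The sketch below assumes FedAvg and uses made-up parameter vectors and dataset sizes; it is not the authors' implementation.

```python
def federated_average(client_weights, client_sizes):
    """Weighted average of client parameter vectors (FedAvg-style aggregation).

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples per client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Hypothetical two-parameter models from three centers; sizes are illustrative,
# not the real BUSI/BUS/OASBUD sample counts.
client_weights = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
client_sizes = [100, 100, 200]
global_weights = federated_average(client_weights, client_sizes)
print(global_weights)  # [3.5, 4.5]
```

After such a pretraining round, each center would continue training from `global_weights` on its own data, which corresponds to the local fine-tuning step described in the abstract.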
Affiliation(s)
- Jintao Ru
- Department of Medical Engineering, Shaoxing Hospital of Traditional Chinese Medicine, Shaoxing, Zhejiang, People's Republic of China.
- Zili Zhu
- Department of Radiology, The First Affiliated Hospital of Ningbo University, Ningbo, Zhejiang, People's Republic of China
- Jialin Shi
- Rehabilitation Medicine Institute, Zhejiang Rehabilitation Medical Center, Hangzhou, Zhejiang, People's Republic of China
6
Qian X, Ju B, Shen P, Yang K, Li L, Liu Q. Meta Learning with Attention Based FP-GNNs for Few-Shot Molecular Property Prediction. ACS OMEGA 2024; 9:23940-23948. [PMID: 38854580 PMCID: PMC11154901 DOI: 10.1021/acsomega.4c02147] [Received: 03/05/2024] [Revised: 05/09/2024] [Accepted: 05/14/2024] [Indexed: 06/11/2024]
Abstract
Molecular property prediction holds significant importance in drug discovery, enabling the identification of biologically active compounds with favorable drug-like properties. However, the low data problem, arising from the scarcity of labeled data in drug discovery, poses a substantial obstacle for accurate predictions. To address this challenge, we introduce a novel architecture, AttFPGNN-MAML, for few-shot molecular property prediction. The proposed approach incorporates a hybrid feature representation to enrich molecular representations and model intermolecular relationships specific to the task. By leveraging ProtoMAML, a meta-learning strategy, our model is trained and adapted to new tasks. Evaluation on two few-shot data sets, MoleculeNet and FS-Mol, demonstrates our method's superior performance in three out of four tasks and across various support set sizes. These results convincingly validate the effectiveness of our method in the realm of few-shot molecular property prediction. The source code is publicly available at https://github.com/sanomics-lab/AttFPGNN-MAML.
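Note on ProtoMAML: in the meta-learning literature, ProtoMAML is usually described as MAML whose task-specific output layer is initialised from class prototypes (the mean support embedding of each class), using weights 2·c_k and bias −‖c_k‖². The sketch below illustrates only that initialisation on toy 2-D embeddings; it assumes this standard formulation, and all names and numbers are hypothetical rather than taken from the AttFPGNN-MAML code.

```python
def class_prototypes(support_embeddings, support_labels):
    """Mean embedding per class, computed from the few-shot support set."""
    sums, counts = {}, {}
    for emb, label in zip(support_embeddings, support_labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, v in enumerate(emb):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {k: [v / counts[k] for v in acc] for k, acc in sums.items()}


def proto_init_linear(prototypes):
    """Initial linear layer per class: weights 2*c_k, bias -||c_k||^2."""
    return {
        k: ([2.0 * v for v in c], -sum(v * v for v in c))
        for k, c in prototypes.items()
    }


def predict(layer, query):
    """Class with the highest logit; with this initialisation it is the nearest prototype."""
    logits = {
        k: sum(w_i * x_i for w_i, x_i in zip(w, query)) + b
        for k, (w, b) in layer.items()
    }
    return max(logits, key=logits.get)


# Toy support set: two inactive (0) and two active (1) molecules.
support = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
labels = [0, 0, 1, 1]
prototypes = class_prototypes(support, labels)
layer = proto_init_linear(prototypes)
print(predict(layer, [1.0, 1.0]), predict(layer, [5.0, 5.0]))
```

In a full ProtoMAML setup, this layer and the encoder would then be adapted with a few gradient steps on the support set; that inner loop is omitted here.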
Affiliation(s)
- Xiaoliang Qian
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- Bin Ju
- SanOmics AI Co., Ltd., Hangzhou 311103, China
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Ping Shen
- State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310009, China
- Keda Yang
- Shulan International Medical College, Zhejiang Shuren University, Hangzhou 310015, China
- Li Li
- Department of Hepatobiliary Surgery, The First People’s Hospital of Kunming, Kunming 650034, China
- Qi Liu
- Translational Medical Center for Stem Cell Therapy and Institute for Regenerative Medicine, Shanghai East Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Key Laboratory of Spine and Spinal Cord Injury Repair and Regeneration (Tongji University), Ministry of Education, Orthopaedic Department of Tongji Hospital, Frontier Science Center for Stem Cell Research, Bioinformatics Department, School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shanghai Research Institute for Intelligent Autonomous Systems, Shanghai 201804, China
7
Shi S, Fu L, Yi J, Yang Z, Zhang X, Deng Y, Wang W, Wu C, Zhao W, Hou T, Zeng X, Lyu A, Cao D. ChemFH: an integrated tool for screening frequent false positives in chemical biology and drug discovery. Nucleic Acids Res 2024:gkae424. [PMID: 38783035 DOI: 10.1093/nar/gkae424] [Received: 02/06/2024] [Revised: 04/25/2024] [Accepted: 05/10/2024] [Indexed: 05/25/2024] Open
Abstract
High-throughput screening rapidly tests an extensive array of chemical compounds to identify hit compounds for specific biological targets in drug discovery. However, false-positive results disrupt hit compound screening, leading to wastage of time and resources. To address this, we propose ChemFH, an integrated online platform facilitating rapid virtual evaluation of potential false positives, including colloidal aggregators, spectroscopic interference compounds, firefly luciferase inhibitors, chemical reactive compounds, promiscuous compounds, and other assay interferences. By leveraging a dataset containing 823 391 compounds, we constructed high-quality prediction models using multi-task directed message-passing network (DMPNN) architectures combining uncertainty estimation, yielding an average AUC value of 0.91. Furthermore, ChemFH incorporated 1441 representative alert substructures derived from the collected data and ten commonly used frequent hitter screening rules. ChemFH was validated with an external set of 75 compounds. Subsequently, the virtual screening capability of ChemFH was successfully confirmed through its application to five virtual screening libraries. Furthermore, ChemFH underwent additional validation on two natural products and FDA-approved drugs, yielding reliable and accurate results. ChemFH is a comprehensive, reliable, and computationally efficient screening pipeline that facilitates the identification of true positive results in assays, contributing to enhanced efficiency and success rates in drug discovery. ChemFH is freely available via https://chemfh.scbdd.com/.
Affiliation(s)
- Shaohua Shi
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Jiacai Yi
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Ziyi Yang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Xiaochen Zhang
- School of Information Technology, Shangqiu Normal University, Shangqiu, Henan 476000, P.R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Chengkun Wu
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Wentao Zhao
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, P.R. China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
8
Zhang R, Lin Y, Wu Y, Deng L, Zhang H, Liao M, Peng Y. MvMRL: a multi-view molecular representation learning method for molecular property prediction. Brief Bioinform 2024; 25:bbae298. [PMID: 38920342 PMCID: PMC11200189 DOI: 10.1093/bib/bbae298] [Received: 02/22/2024] [Revised: 05/09/2024] [Accepted: 06/07/2024] [Indexed: 06/27/2024] Open
Abstract
Effective molecular representation learning is very important for Artificial Intelligence-driven Drug Design because it affects the accuracy and efficiency of molecular property prediction and other molecular modeling relevant tasks. However, previous molecular representation learning studies often suffer from limitations, such as over-reliance on a single molecular representation, failure to fully capture both local and global information in molecular structure, and ineffective integration of multiscale features from different molecular representations. These limitations restrict the complete and accurate representation of molecular structure and properties, ultimately impacting the accuracy of predicting molecular properties. To this end, we propose a novel multi-view molecular representation learning method called MvMRL, which can incorporate feature information from multiple molecular representations and capture both local and global information from different views well, thus improving molecular property prediction. Specifically, MvMRL consists of four parts: a multiscale CNN-SE Simplified Molecular Input Line Entry System (SMILES) learning component and a multiscale Graph Neural Network encoder to extract local feature information and global feature information from the SMILES view and the molecular graph view, respectively; a Multi-Layer Perceptron network to capture complex non-linear relationship features from the molecular fingerprint view; and a dual cross-attention component to fuse feature information on the multi-views deeply for predicting molecular properties. We evaluate the performance of MvMRL on 11 benchmark datasets, and experimental results show that MvMRL outperforms state-of-the-art methods, indicating its rationality and effectiveness in molecular property prediction. The source code of MvMRL was released in https://github.com/jedison-github/MvMRL.
Affiliation(s)
- Ru Zhang
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Yanmei Lin
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Center for Applied Mathematics of Guangxi, Nanning Normal University, 508 Xinning Road, Wuming District, Nanning 530100, China
- Yijia Wu
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 932 Lushan South Road, Changsha 410083, China
- Hao Zhang
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Shenzhen 518000, China
- Mingzhi Liao
- Center of Bioinformatics, College of Life Sciences, Northwest A&F University, 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Yuzhong Peng
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Nanning Normal University, No. 175, Mingxiu East Road, Xixiang Tang District, Nanning 530001, China
- Guangxi Academy of Sciences, 174 East University Road, Nanning 530007, China
9
Pang Y, Chen Y, Lin M, Zhang Y, Zhang J, Wang L. MMSyn: A New Multimodal Deep Learning Framework for Enhanced Prediction of Synergistic Drug Combinations. J Chem Inf Model 2024; 64:3689-3705. [PMID: 38676916 DOI: 10.1021/acs.jcim.4c00165] [Indexed: 04/29/2024]
Abstract
Combination therapy is a promising strategy for the successful treatment of cancer. The large number of possible combinations, however, means that it is laborious and expensive to screen for synergistic drug combinations in vitro. Nevertheless, because of the availability of high-throughput screening data and advances in computational techniques, deep learning (DL) can be a useful tool for the prediction of synergistic drug combinations. In this study, we proposed a multimodal DL framework, MMSyn, for the prediction of synergistic drug combinations. First, features embedded in the drug molecules were extracted: structure, fingerprint, and string encoding. Then, gene expression data, DNA copy number, and pathway activity were used to describe cancer cell lines. Finally, these processed features were integrated using an attention mechanism and an interaction module and then input into a multilayer perceptron to predict drug synergy. Experimental results showed that our method outperformed five state-of-the-art DL methods and three traditional machine learning models for drug combination prediction. We verified that MMSyn achieved superior performance in stratified cross-validation settings using both the drug combination and cell line data. Moreover, we performed a set of ablation experiments to illustrate the effectiveness of each component and the efficacy of our model. In addition, our visual representation and case studies further confirmed the effectiveness of our model. All results showed that MMSyn can be used as a powerful tool for the prediction of synergistic drug combinations.
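Note on attention-based feature integration: the abstract does not specify the attention design. One simple form is attention pooling, where each modality's feature vector is scored against a query vector, the scores are normalised with a softmax, and the vectors are summed with those weights before being passed to a downstream predictor. The sketch below assumes that form; the query vector stands in for a learned parameter, and all values are hypothetical rather than drawn from MMSyn.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(features, query):
    """Fuse same-length modality vectors with dot-product attention weights."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    dim = len(features[0])
    fused = [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]
    return fused, weights


# Toy 2-D features for three drug views: structure, fingerprint, string encoding.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 0.0]  # placeholder for a learned query vector
fused, weights = attention_fuse(features, query)
print(fused, weights)
```

Here the structure and string views score higher against the query than the fingerprint view, so they contribute more to the fused vector.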
Affiliation(s)
- Yu Pang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yihao Chen
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Mujie Lin
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yanhong Zhang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jiquan Zhang
- Guizhou Provincial Engineering Technology Research Center for Chemical Drug R&D, College of Pharmacy, Guizhou Medical University, Guiyang 550025, P. R. China
| | - Ling Wang
- Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
| |
Collapse
|
10
|
Yao R, Shen Z, Xu X, Ling G, Xiang R, Song T, Zhai F, Zhai Y. Knowledge mapping of graph neural networks for drug discovery: a bibliometric and visualized analysis. Front Pharmacol 2024; 15:1393415. [PMID: 38799167 PMCID: PMC11116974 DOI: 10.3389/fphar.2024.1393415] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 04/12/2024] [Indexed: 05/29/2024] Open
Abstract
Introduction: In recent years, graph neural networks have been extensively applied to drug discovery research. Although researchers have made significant progress in this field, bibliometric studies of it remain scarce. The purpose of this study is to conduct a comprehensive bibliometric analysis of graph neural network applications in drug discovery in order to identify current research hotspots and trends, as well as to serve as a reference for future research. Methods: Publications from 2017 to 2023 about the application of graph neural networks in drug discovery were collected from the Web of Science Core Collection. Bibliometrix, VOSviewer, and CiteSpace were mainly used for bibliometric studies. Results and Discussion: A total of 652 papers from 48 countries/regions were included. Research interest in this field is continuously increasing. China and the United States have a significant advantage in terms of funding, the number of publications, and collaborations with other institutions and countries. Although some cooperation networks have been formed in this field, extensive worldwide cooperation still needs to be strengthened. The results of the keyword analysis clarified that graph neural networks have primarily been applied to drug-target interaction, drug repurposing, and drug-drug interaction, while graph convolutional neural networks and their related optimization methods are currently the core algorithms in this field. Data availability and ethical supervision, balancing computing resources, and developing novel graph neural network models with better interpretability are the key technical issues currently faced. This paper analyzes the current state, hot spots, and trends of graph neural network applications in drug discovery through bibliometric approaches, as well as the current issues and challenges in this field. These findings provide researchers with valuable insights on the current status and future directions of this field.
Affiliation(s)
- Fei Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
- Yuxuan Zhai
- Faculty of Medical Device, Shenyang Pharmaceutical University, Shenyang, China
|
11
|
Zhang X, Sheng Y, Liu X, Yang J, Goddard WA III, Ye C, Zhang W. Polymer-Unit Graph: Advancing Interpretability in Graph Neural Network Machine Learning for Organic Polymer Semiconductor Materials. J Chem Theory Comput 2024; 20:2908-2920. [PMID: 38551455 DOI: 10.1021/acs.jctc.3c01385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/10/2024]
Abstract
The graph representation of complex materials plays a crucial role in the field of inorganic and organic materials investigations for developing data-centric materials science, such as those using graph neural networks (GNNs). However, the currently prevalent GNN models are primarily employed for investigating periodic crystals and organic small molecule data, yet they still encounter challenges in terms of interpretability and computational efficiency when applied to polymer monomers and organic macromolecules data. There is still a lack of graph representation of organic polymers and macromolecules specifically tailored for GNN models to explore the structural characteristics. The Polymer-unit Graph, a novel coarse-grained graph representation method introduced in this study, is dedicated to expressing and analyzing polymers and macromolecules. By incorporating the Polymer-unit Graph into the GNN models and analyzing the organic semiconductor (OSC) materials database, it becomes possible to uncover intricate structure-property relationships involving branched-chain engineering, fluoridation substitution, and donor-acceptor combination effects on the elementary structure of OSC polymers. Furthermore, the Polymer-unit Graph enables visualizing the relationship between target properties and polymer units while reducing training time by an impressive 98% and minimizing molecular graph representation models. In conclusion, the Polymer-unit Graph successfully integrates the concept of Polymer-unit into the field of GNNs, enabling more accurate analysis and understanding of organic polymers and macromolecules.
Affiliation(s)
- Xinyue Zhang
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Ye Sheng
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Xiumin Liu
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Key Laboratory of Soft Chemistry and Functional Materials of MOE, School of Chemistry and Chemical Engineering, Nanjing University of Science and Technology, Nanjing 210094, PR China
- Jiong Yang
- Materials Genome Institute, Shanghai University, Shanghai 200444, PR China
- William A Goddard III
- Materials and Process Simulation Center (MSC), California Institute of Technology, Pasadena, California 91125, United States
- Caichao Ye
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
- Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen 518055, PR China
- Wenqing Zhang
- Department of Materials Science and Engineering & Guangdong Provincial Key Laboratory of Computational Science and Material Design, Southern University of Science and Technology, Shenzhen 518055, PR China
|
12
|
Fu L, Shi S, Yi J, Wang N, He Y, Wu Z, Peng J, Deng Y, Wang W, Wu C, Lyu A, Zeng X, Zhao W, Hou T, Cao D. ADMETlab 3.0: an updated comprehensive online ADMET prediction platform enhanced with broader coverage, improved performance, API functionality and decision support. Nucleic Acids Res 2024:gkae236. [PMID: 38572755 DOI: 10.1093/nar/gkae236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2024] [Revised: 03/10/2024] [Accepted: 03/21/2024] [Indexed: 04/05/2024] Open
Abstract
ADMETlab 3.0 is the second updated version of the web server that provides a comprehensive and efficient platform for evaluating ADMET-related parameters as well as physicochemical properties and medicinal chemistry characteristics involved in the drug discovery process. This new release addresses the limitations of the previous version and offers broader coverage, improved performance, API functionality, and decision support. For supporting data and endpoints, this version includes 119 features, an increase of 31 compared to the previous version. The updated number of entries is 1.5 times larger than the previous version with over 400 000 entries. ADMETlab 3.0 incorporates a multi-task DMPNN architecture coupled with molecular descriptors, a method that not only guaranteed calculation speed for each endpoint simultaneously, but also achieved a superior performance in terms of accuracy and robustness. In addition, an API has been introduced to meet the growing demand for programmatic access to large amounts of data in ADMETlab 3.0. Moreover, this version includes uncertainty estimates in the prediction results, aiding in the confident selection of candidate compounds for further studies and experiments. ADMETlab 3.0 is publicly accessible without the need for registration at: https://admetlab3.scbdd.com.
Affiliation(s)
- Li Fu
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Shaohua Shi
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Jiacai Yi
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Ningning Wang
- Xiangya Hospital of Central South University, Changsha, Hunan 410008, P.R. China
- Yuanhang He
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Zhenxing Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Jinfu Peng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Youchao Deng
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Wenxuan Wang
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
- Chengkun Wu
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Aiping Lyu
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, Hong Kong SAR, 999077, P.R. China
- Xiangxiang Zeng
- Department of Computer Science, Hunan University, Changsha, Hunan 410082, P.R. China
- Wentao Zhao
- School of Computer Science, National University of Defense Technology, Changsha, Hunan 410073, P.R. China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, Zhejiang 310058, P.R. China
- Dongsheng Cao
- Xiangya School of Pharmaceutical Sciences, Central South University, Changsha, Hunan 410013, P.R. China
|
13
|
Wu K, Yang X, Wang Z, Li N, Zhang J, Liu L. Data-balanced transformer for accelerated ionizable lipid nanoparticles screening in mRNA delivery. Brief Bioinform 2024; 25:bbae186. [PMID: 38670158 PMCID: PMC11052633 DOI: 10.1093/bib/bbae186] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Revised: 02/26/2024] [Accepted: 04/05/2024] [Indexed: 04/28/2024] Open
Abstract
Despite the widespread use of ionizable lipid nanoparticles (LNPs) in clinical applications for messenger RNA (mRNA) delivery, the mRNA drug delivery system faces an efficiency challenge in the screening of LNPs. Traditional screening methods often require a substantial amount of experimental time and incur high research and development costs. To accelerate the early development stage of LNPs, we propose TransLNP, a transformer-based transfection prediction model designed to aid in the selection of LNPs for mRNA drug delivery systems. TransLNP uses two types of molecular information to perceive the relationship between structure and transfection efficiency: coarse-grained atomic sequence information and fine-grained atomic spatial relationship information. Due to the scarcity of existing LNP experimental data, we find that pretraining the molecular model is crucial for better understanding the task of predicting LNP properties, which is achieved through reconstructing atomic 3D coordinates and masking atom predictions. In addition, the issue of data imbalance is particularly prominent in the real-world exploration of LNPs. We introduce the BalMol block to solve this problem by smoothing the distribution of labels and molecular features. Our approach outperforms state-of-the-art works in transfection property prediction under both random and scaffold data splitting. Additionally, we establish a relationship between molecular structural similarity and transfection differences, selecting 4267 pairs of molecular transfection cliffs, which are pairs of molecules that exhibit high structural similarity but significant differences in transfection efficiency, thereby revealing the primary source of prediction errors. The code, model and data are made publicly available at https://github.com/wklix/TransLNP.
Affiliation(s)
- Kun Wu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Xiulong Yang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Zixu Wang
- Department of Computer Science, University of Tsukuba, Tsukuba 3058577, Japan
- Na Li
- National Facility for Protein Science in Shanghai, Zhangjiang Laboratory, Shanghai Advanced Research Institute, Chinese Academy of Sciences
- Jialu Zhang
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
- Lizhuang Liu
- Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai 201210, China
- University of Chinese Academy of Sciences, Beijing 100049, China
|
14
|
Yang Z, Wang L, Yang Y, Pang X, Sun Y, Liang Y, Cao H. Screening of the Antagonistic Activity of Potential Bisphenol A Alternatives toward the Androgen Receptor Using Machine Learning and Molecular Dynamics Simulation. Environ Sci Technol 2024; 58:2817-2829. [PMID: 38291630 DOI: 10.1021/acs.est.3c09779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/01/2024]
Abstract
Over the past few decades, extensive research has indicated that exposure to bisphenol A (BPA) increases the health risks in humans. Toxicological studies have demonstrated that BPA can bind to the androgen receptor (AR), resulting in endocrine-disrupting effects. In recent investigations, many alternatives to BPA have been detected in various environmental media as major pollutants. However, related experimental evaluations of BPA alternatives have not been systematically implemented for the assessment of chemical safety and the effects of structural characteristics on the antagonistic activity of the AR. To promote the green development of BPA alternatives, high-throughput toxicological screening is fundamental for prioritizing chemical tests. Therefore, we proposed a hybrid deep learning architecture that combines molecular descriptors and molecular graphs to predict AR antagonistic activity. Compared to previous models, this hybrid architecture can extract substantial chemical information from various molecular representations to improve the model's generalization ability for BPA alternatives. Our predictions suggest that lignin-derivable bisguaiacols, as alternatives to BPA, are likely to be nonantagonist for AR compared to bisphenol analogues. Additionally, molecular dynamics (MD) simulations identified the dihydrotestosterone-bound pocket, rather than the surface, as the major binding site of bisphenol analogues. The conformational changes of key helix H12 from an agonistic to an antagonistic conformation can be evaluated qualitatively by accelerated MD simulations to explain the underlying mechanism. Overall, our computational study is helpful for toxicological screening of BPA alternatives and the design of environmentally friendly BPA alternatives.
Affiliation(s)
- Zeguo Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ling Wang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Ying Yang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Xudi Pang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yuzhen Sun
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Yong Liang
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
- Huiming Cao
- Hubei Key Laboratory of Environmental and Health Effects of Persistent Toxic Substances, School of Environment and Health, Jianghan University, Wuhan 430056, China
|
15
|
Ma M, Lei X. A deep learning framework for predicting molecular property based on multi-type features fusion. Comput Biol Med 2024; 169:107911. [PMID: 38160501 DOI: 10.1016/j.compbiomed.2023.107911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Revised: 12/18/2023] [Accepted: 12/24/2023] [Indexed: 01/03/2024]
Abstract
Extracting expressive molecular features is essential for molecular property prediction. Sequence-based representation is a common representation of molecules, but it ignores the structural information of molecules, while molecular graph representation is weak at expressing 3D structure. In this article, we try to make use of the advantages of different types of representations simultaneously for molecular property prediction. Thus, we propose a fusion model named DLF-MFF, which integrates multi-type molecular features. Specifically, we first extract four different types of features from molecular fingerprints, the 2D molecular graph, the 3D molecular graph and the molecular image. Then, in order to learn molecular features individually, we use four essential deep learning frameworks, which correspond to the four distinct molecular representations. The final molecular representation is created by integrating the four feature vectors and feeding them into a prediction layer to predict molecular properties. We compare DLF-MFF with 7 state-of-the-art methods on 6 benchmark datasets covering multiple molecular properties; the experimental results show that DLF-MFF achieves state-of-the-art performance on the 6 benchmark datasets. Moreover, DLF-MFF is applied to identify potential anti-SARS-CoV-2 inhibitors from 2500 drugs. We predict the probability of each drug being a 3CL protease inhibitor and also calculate the binding affinity scores between each drug and 3CL protease. The results show that DLF-MFF produces better performance in the identification of anti-SARS-CoV-2 inhibitors. This work is expected to offer novel research perspectives for accurate prediction of molecular properties and provide valuable insights into drug repurposing for COVID-19.
Affiliation(s)
- Mei Ma
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China; School of Mathematics and Statistics, Qinghai Normal University, Qinghai, 810000, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, China.
|
16
|
Wu J, Chen Y, Wu J, Zhao D, Huang J, Lin M, Wang L. Large-scale comparison of machine learning methods for profiling prediction of kinase inhibitors. J Cheminform 2024; 16:13. [PMID: 38291477 PMCID: PMC10829268 DOI: 10.1186/s13321-023-00799-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 12/22/2023] [Indexed: 02/01/2024] Open
Abstract
Conventional machine learning (ML) and deep learning (DL) play a key role in the selectivity prediction of kinase inhibitors. A number of models based on available datasets can be used to predict the kinase profile of compounds, but there is still controversy about the advantages and disadvantages of ML and DL for such tasks. In this study, we constructed a comprehensive benchmark dataset of kinase inhibitors, involving 141,086 unique compounds and 216,823 well-defined bioassay data points for 354 kinases. We then systematically compared the performance of 12 ML and DL methods on the kinase profiling prediction task. Extensive experimental results reveal that (1) descriptor-based ML models generally slightly outperform fingerprint-based ML models in terms of predictive performance, and RF, as an ensemble learning approach, displays the overall best predictive performance; (2) single-task graph-based DL models are generally inferior to conventional descriptor- and fingerprint-based ML models, whereas the corresponding multi-task models generally improve the average accuracy of kinase profile prediction (for example, the multi-task FP-GNN model outperforms the conventional descriptor- and fingerprint-based ML models with an average AUC of 0.807); and (3) fusion models based on voting and stacking methods can further improve the performance of the kinase profiling prediction task; specifically, the RF::AtomPairs + FP2 + RDKitDes fusion model performs best, with the highest average AUC value of 0.825 on the test sets. These findings provide useful information for guiding choices of the ML and DL methods for the kinase profiling prediction tasks. Finally, an online platform called KIPP (https://kipp.idruglab.cn) and Python software have been developed based on the best models to support kinase profiling prediction, as well as various kinase inhibitor identification tasks including virtual screening, compound repositioning and target fishing.
Affiliation(s)
- Jiangxia Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yihao Chen
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jingxing Wu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Jindi Huang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- MuJie Lin
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
|
17
|
Hao Y, Chen X, Fei A, Jia Q, Chen Y, Shao J, Pandiyan S, Wang L. SG-ATT: A Sequence Graph Cross-Attention Representation Architecture for Molecular Property Prediction. Molecules 2024; 29:492. [PMID: 38276570 PMCID: PMC10819071 DOI: 10.3390/molecules29020492] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/06/2024] [Accepted: 01/14/2024] [Indexed: 01/27/2024] Open
Abstract
Existing formats based on the simplified molecular input line entry system (SMILES) encoding and molecular graph structure are designed to encode the complete semantic and structural information of molecules. However, the physicochemical properties of molecules are complex, and a single encoding of molecular features from SMILES sequences or molecular graph structures cannot adequately represent molecular information. Aiming to address this problem, this study proposes a sequence graph cross-attention (SG-ATT) representation architecture for a molecular property prediction model to efficiently use domain knowledge to enhance molecular graph feature encoding and combine the features of molecular SMILES sequences. The SG-ATT fuses the two-dimensional molecular features so that the current model input molecular information contains molecular structure information and semantic information. The SG-ATT was tested on nine molecular property prediction tasks. Among them, the biggest SG-ATT model performance improvement was 4.5% on the BACE dataset, and the average model performance improvement was 1.83% on the full dataset. Additionally, specific model interpretability studies were conducted to showcase the performance of the SG-ATT model on different datasets. In-depth analysis was provided through case studies of in vitro validation. Finally, network tools for molecular property prediction were developed for the use of researchers.
Affiliation(s)
- Yajie Hao
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Xing Chen
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Ailu Fei
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Qifeng Jia
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Yu Chen
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Jinsong Shao
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Sanjeevi Pandiyan
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Li Wang
- School of Information Science and Technology, Nantong University, Nantong 226001, China; (Y.H.); (X.C.); (A.F.); (Q.J.); (Y.C.); (J.S.); (S.P.)
- Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
|
18
|
Wang J, Zhang L, Sun J, Yang X, Wu W, Chen W, Zhao Q. Predicting drug-induced liver injury using graph attention mechanism and molecular fingerprints. Methods 2024; 221:18-26. [PMID: 38040204 DOI: 10.1016/j.ymeth.2023.11.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2023] [Revised: 11/14/2023] [Accepted: 11/25/2023] [Indexed: 12/03/2023] Open
Abstract
Drug-induced liver injury (DILI) is a significant issue in drug development and clinical treatment due to its potential to cause liver dysfunction or damage, which, in severe cases, can lead to liver failure or even fatality. DILI has numerous pathogenic factors, many of which remain incompletely understood. Consequently, it is imperative to devise methodologies and tools for anticipatory assessment of DILI risk in the initial phases of drug development. In this study, we present DMFPGA, a novel deep learning predictive model designed to predict DILI. To provide a comprehensive description of molecular properties, we employ a multi-head graph attention mechanism to extract features from the molecular graphs, representing characteristics at the level of compound nodes. Additionally, we combine multiple fingerprints of molecules to capture features at the molecular level of compounds. The fusion of molecular fingerprints and graph features can more fully express the properties of compounds. Subsequently, we employ a fully connected neural network to classify compounds as either DILI-positive or DILI-negative. To rigorously evaluate DMFPGA's performance, we conduct a 5-fold cross-validation experiment. The obtained results demonstrate the superiority of our method over four existing state-of-the-art computational approaches, exhibiting an average AUC of 0.935 and an average ACC of 0.934. We believe that DMFPGA is helpful for early-stage DILI prediction and assessment in drug development.
Affiliation(s)
- Jifeng Wang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Li Zhang
- School of Life Science, Liaoning University, Shenyang 110036, China
- Jianqiang Sun
- School of Information Science and Engineering, Linyi University, Linyi 276000, China
- Xin Yang
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Wu
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
- Wei Chen
- Innovative Institute of Chinese Medicine and Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.
- Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China.
|
19
|
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanism and derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and Artificial Intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
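The principle underlying the attention-based models surveyed here is scaled dot-product attention: each query is compared with every key, the scores are normalized with a softmax, and the resulting weights mix the value vectors. A minimal pure-Python sketch follows; the function names are illustrative and are not taken from the review.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key, and value vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted average of the value vectors, one output channel at a time.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights
```

The returned `weights` are what interpretability analyses typically inspect: a large weight indicates that the corresponding atom, token, or residue contributed strongly to the output for that query.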
Affiliation(s)
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
- Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
- Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
- Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
- School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
20
Li B, Lin M, Chen T, Wang L. FG-BERT: a generalized and self-supervised functional group-based molecular representation learning framework for properties prediction. Brief Bioinform 2023; 24:bbad398. [PMID: 37930026 DOI: 10.1093/bib/bbad398] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 06/30/2023] [Revised: 09/25/2023] [Accepted: 10/14/2023] [Indexed: 11/07/2023] Open
Abstract
Artificial intelligence-based molecular property prediction plays a key role in the design of molecules such as bioactive compounds and functional materials. In this study, we propose a self-supervised pretraining deep learning (DL) framework, called functional group bidirectional encoder representations from transformers (FG-BERT), pretrained on ~1.45 million unlabeled drug-like molecules, to learn meaningful representations of molecules from functional groups (FGs). The pretrained FG-BERT framework can be fine-tuned to predict molecular properties. Compared to state-of-the-art (SOTA) machine learning and DL methods, we demonstrate the high performance of FG-BERT in evaluating molecular properties in tasks involving physical chemistry, biophysics and physiology across 44 benchmark datasets. In addition, FG-BERT utilizes attention mechanisms to focus on FG features that are critical to the target properties, thereby providing excellent interpretability for downstream training tasks. Collectively, FG-BERT does not require any manually crafted features as input and has excellent interpretability, providing an out-of-the-box framework for developing SOTA models for a variety of molecule (especially drug) discovery tasks.
Affiliation(s)
- Biaoshun Li
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Mujie Lin
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Tiegen Chen
- Zhongshan Institute for Drug Discovery, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Room 109, Building C, SSIP Healthcare and Medicine Demonstration Zone, Zhongshan Tsuihang New District, Zhongshan, Guangdong, 528400, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Ministry of Education, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
21
Han S, Fu H, Wu Y, Zhao G, Song Z, Huang F, Zhang Z, Liu S, Zhang W. HimGNN: a novel hierarchical molecular graph representation learning framework for property prediction. Brief Bioinform 2023; 24:bbad305. [PMID: 37594313 DOI: 10.1093/bib/bbad305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Revised: 07/18/2023] [Accepted: 08/04/2023] [Indexed: 08/19/2023] Open
Abstract
Accurate prediction of molecular properties is an important topic in drug discovery. Recent works have developed various representation schemes for molecular structures to capture different chemical information in molecules. The atom and motif can be viewed as hierarchical molecular structures that are widely used for learning molecular representations to predict chemical properties. Previous works have attempted to exploit both atom and motif to address the problem of information loss in single representation learning for various tasks. To further fuse such hierarchical information, the correspondence between learned chemical features from different molecular structures should be considered. Herein, we propose a novel framework for molecular property prediction, called hierarchical molecular graph neural networks (HimGNN). HimGNN learns hierarchical topology representations by applying graph neural networks on atom- and motif-based graphs. In order to boost the representational power of the motif feature, we design a Transformer-based local augmentation module to enrich motif features by introducing heterogeneous atom information in motif representation learning. Besides, we focus on the molecular hierarchical relationship and propose a simple yet effective rescaling module, called contextual self-rescaling, that adaptively recalibrates molecular representations by explicitly modelling interdependencies between atom and motif features. Extensive computational experiments demonstrate that HimGNN can achieve promising performances over state-of-the-art baselines on both classification and regression tasks in molecular property prediction.
Affiliation(s)
- Shen Han
- College of Informatics, Huazhong Agricultural University, People's Republic of China
- Haitao Fu
- College of Informatics, Huazhong Agricultural University, People's Republic of China
- Yuyang Wu
- College of Plant Science and Technology, Huazhong Agricultural University, People's Republic of China
- Ganglan Zhao
- College of Informatics, Huazhong Agricultural University, People's Republic of China
- Zhenyu Song
- College of Informatics, Huazhong Agricultural University, People's Republic of China
- Feng Huang
- College of Informatics, Huazhong Agricultural University, People's Republic of China
- Zhongfei Zhang
- Computer Science Department, Binghamton University, Binghamton, NY, USA
- Shichao Liu
- College of Informatics, Huazhong Agricultural University, People's Republic of China and Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Animal Farming Technology, Ministry of Agriculture, Huazhong Agricultural University
- Wen Zhang
- College of Informatics, Huazhong Agricultural University, People's Republic of China and Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, Key Laboratory of Smart Animal Farming Technology, Ministry of Agriculture, Huazhong Agricultural University
22
Song Y, Chang S, Tian J, Pan W, Feng L, Ji H. A Comprehensive Comparative Analysis of Deep Learning Based Feature Representations for Molecular Taste Prediction. Foods 2023; 12:3386. [PMID: 37761095 PMCID: PMC10529232 DOI: 10.3390/foods12183386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/30/2023] [Accepted: 09/01/2023] [Indexed: 09/29/2023] Open
Abstract
Taste determination in small molecules is critical in food chemistry but traditional experimental methods can be time-consuming. Consequently, computational techniques have emerged as valuable tools for this task. In this study, we explore taste prediction using various molecular feature representations and assess the performance of different machine learning algorithms on a dataset comprising 2601 molecules. The results reveal that GNN-based models outperform other approaches in taste prediction. Moreover, consensus models that combine diverse molecular representations demonstrate improved performance. Among these, the molecular fingerprints + GNN consensus model emerges as the top performer, highlighting the complementary strengths of GNNs and molecular fingerprints. These findings have significant implications for food chemistry research and related fields. By leveraging these computational approaches, taste prediction can be expedited, leading to advancements in understanding the relationship between molecular structure and taste perception in various food components and related compounds.
Affiliation(s)
- Yu Song
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Sihao Chang
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Jing Tian
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Weihua Pan
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- Lu Feng
- Zhengzhou Research Base, State Key Laboratory of Cotton Biology, School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, China
- Hongchao Ji
- Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Shenzhen 518120, China
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
23
Wu T, Tang Y, Sun Q, Xiong L. Molecular Joint Representation Learning via Multi-Modal Information of SMILES and Graphs. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3044-3055. [PMID: 37028366 DOI: 10.1109/tcbb.2023.3253862] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
In recent years, artificial intelligence has played an important role in accelerating the whole process of drug discovery. Various molecular representation schemes of different modalities (e.g., textual sequence or graph) have been developed. By digitally encoding them, different chemical information can be learned through corresponding network structures. Molecular graphs and the Simplified Molecular Input Line Entry System (SMILES) are currently popular means for molecular representation learning. Previous works have attempted to combine both of them to solve the problem of specific information loss in single-modal representation on various tasks. To further fuse such multi-modal information, the correspondence between learned chemical features from different representations should be considered. To realize this, we propose a novel framework of molecular joint representation learning via Multi-Modal information of SMILES and molecular Graphs, called MMSG. We improve the self-attention mechanism by introducing bond-level graph representation as attention bias in the Transformer to reinforce feature correspondence between multi-modal information. We further propose a Bidirectional Message Communication Graph Neural Network (BMC GNN) to strengthen the information flow aggregated from graphs for further combination. Numerous experiments on public property prediction datasets have demonstrated the effectiveness of our model.
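The idea of adding a graph-derived attention bias to Transformer self-attention can be written, in its most common general form, as an additive term inside the softmax. The formula below is a generic sketch of that form and is an assumption here; the abstract does not specify how MMSG constructs or applies its bias, and the symbol $B$ is introduced only for illustration.

```latex
\mathrm{Attention}(Q, K, V) \;=\; \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}} + B\right) V
```

Here $Q$, $K$, and $V$ are the query, key, and value matrices computed from the SMILES tokens, $d_k$ is the key dimension, and $B$ is a matrix whose entry $B_{ij}$ would encode the bond-level relationship between the atoms behind tokens $i$ and $j$. Setting $B = 0$ recovers standard scaled dot-product attention.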
24
Fang K, Zhang Y, Du S, He J. ColdDTA: Utilizing data augmentation and attention-based feature fusion for drug-target binding affinity prediction. Comput Biol Med 2023; 164:107372. [PMID: 37597410 DOI: 10.1016/j.compbiomed.2023.107372] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/06/2023] [Revised: 07/26/2023] [Accepted: 08/12/2023] [Indexed: 08/21/2023]
Abstract
Accurate prediction of drug-target affinity (DTA) plays a crucial role in drug discovery and development. Recently, deep learning methods have shown excellent predictive performance on randomly split public datasets. However, whether this splitting method reflects real-world problems in practical applications still requires verification. In a cold-start experimental setup, where drugs or proteins in the test set do not appear in the training set, the performance of deep learning models often decreases significantly. This indicates that improving the generalization ability of the models remains a challenge. To this end, in this study, we propose ColdDTA, which uses data augmentation and attention-based feature fusion to improve the generalization ability of drug-target binding affinity prediction. Specifically, ColdDTA generates new drug-target pairs by removing subgraphs of drugs. The attention-based feature fusion module is also used to better capture the drug-target interactions. We conduct cold-start experiments on three benchmark datasets, and the consistency index (CI) and mean square error (MSE) results on the Davis and KIBA datasets show that ColdDTA outperforms the five state-of-the-art baseline methods. Meanwhile, the results for the area under the receiver operating characteristic curve (ROC-AUC) on the BindingDB dataset show that ColdDTA also performs better on the classification task. Furthermore, visualizing the model weights allows for interpretable insights. Overall, ColdDTA can better solve the realistic DTA prediction problem. The code is publicly available.
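The cold-start setup and the CI metric mentioned in this abstract can be illustrated with a short sketch. It is not the ColdDTA code: the function names, the `(drug_id, target_id, affinity)` tuple layout, and the 20% test fraction are assumptions, and CI is implemented as the pairwise ranking agreement more commonly called the concordance index.

```python
import random

def cold_drug_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug_id, target_id, affinity) tuples so test drugs never appear in training."""
    drugs = sorted({drug for drug, _, _ in pairs})
    random.Random(seed).shuffle(drugs)
    n_test = max(1, round(len(drugs) * test_fraction))
    test_drugs = set(drugs[:n_test])
    train = [p for p in pairs if p[0] not in test_drugs]
    test = [p for p in pairs if p[0] in test_drugs]
    return train, test

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs ranked in the right order (prediction ties count 0.5)."""
    concordant, comparable = 0.0, 0
    for i in range(len(y_true)):
        for j in range(i + 1, len(y_true)):
            if y_true[i] == y_true[j]:
                continue  # pairs with equal true affinity carry no ranking information
            comparable += 1
            agreement = (y_true[i] - y_true[j]) * (y_pred[i] - y_pred[j])
            if agreement > 0:
                concordant += 1.0
            elif agreement == 0:
                concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A cold-target split follows the same pattern with `p[1]` in place of `p[0]`. Under a random split the same drug usually appears on both sides, which is why scores obtained that way tend to overstate how well a model generalizes to unseen compounds.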
Affiliation(s)
- Kejie Fang
- Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, 315211, China
- Yiming Zhang
- Engineering Laboratory of Advanced Energy Materials, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315201, China
- Shiyu Du
- Engineering Laboratory of Advanced Energy Materials, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, 315201, China
- School of Materials Science and Engineering and School of Computer Science, China University of Petroleum (East China), Qingdao, 266580, China
- Jian He
- State Key Laboratory of Systems Medicine for Cancer, Center for Single-Cell Omics, School of Public Health, Shanghai Jiao Tong University School of Medicine, Shanghai, 200025, China.
25
Xu H, Zhang B, Liu Q. Deep learning-based classification model for GPR151 activator activity prediction. BMC Bioinformatics 2023; 24:245. [PMID: 37296398 DOI: 10.1186/s12859-023-05369-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 05/29/2023] [Indexed: 06/12/2023] Open
Abstract
BACKGROUND: GPR151 is a protein belonging to the G protein-coupled receptor family that is closely associated with a variety of physiological and pathological processes. The potential use of GPR151 as a therapeutic target for the management of metabolic disorders has been demonstrated in several studies, highlighting the demand to explore its activators further. Activity prediction serves as a vital preliminary step in drug discovery, which is both costly and time-consuming. Thus, the development of a reliable activity classification model has become essential in the drug discovery process, aiming to enhance the efficiency of virtual screening. RESULTS: We propose a learning-based method based on a feature extractor and a deep neural network to predict the activity of GPR151 activators. We first introduce a new molecular feature extraction algorithm which uses the idea of the bag-of-words model from natural language processing to densify the sparse fingerprint vector. The Mol2vec method is also used to extract diverse features. Then, we construct three classical feature selection algorithms and three types of deep learning models to enhance the representational capacity of molecules and predict the activity label with five different classifiers. We conduct experiments using our own dataset of GPR151 activators. The results demonstrate high classification accuracy and stability, with the optimal model Mol2vec-CNN significantly improving performance across multiple classifiers. The SVM classifier achieves the best accuracy of 0.92 and F1 score of 0.76, which indicates promising applications for our method in the field of activity prediction. CONCLUSION: The results suggest that the experimental design of this study is appropriate and well-conceived. The deep learning-based feature extraction algorithm established in this study outperforms traditional feature selection algorithms for activity prediction. The model developed can be effectively utilized in the pre-screening stage of drug virtual screening.
Affiliation(s)
- Huangchao Xu
- Computer Network Information Center, Chinese Academy of Sciences, Dongsheng Sourth Street No.2, Haidian District, Beijing, 100190, China
- University of Chinese Academy of Sciences, No.1 Yanqihu East Rd, Huairou District, Beijing, 101408, China
- Baohua Zhang
- Computer Network Information Center, Chinese Academy of Sciences, Dongsheng Sourth Street No.2, Haidian District, Beijing, 100190, China
- Qian Liu
- Computer Network Information Center, Chinese Academy of Sciences, Dongsheng Sourth Street No.2, Haidian District, Beijing, 100190, China.
26
Wu J, Xiao Y, Lin M, Cai H, Zhao D, Li Y, Luo H, Tang C, Wang L. DeepCancerMap: A versatile deep learning platform for target- and cell-based anticancer drug discovery. Eur J Med Chem 2023; 255:115401. [PMID: 37116265 DOI: 10.1016/j.ejmech.2023.115401] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/07/2022] [Revised: 03/29/2023] [Accepted: 04/18/2023] [Indexed: 04/30/2023]
Abstract
Discovering new anticancer drugs has attracted wide attention and remains an open challenge. Target- and phenotypic-based experimental screening represent two mainstream anticancer drug discovery methods, both of which are time-consuming, labor-intensive, and experimentally costly. In this study, we collected 485,900 compounds involved in 3,919,974 bioactivity records against 426 anticancer targets and 346 cancer cell lines from academic literature, as well as 60 tumor cell lines from the NCI-60 panel. A total of 832 classification models (426 target- and 406 cell-based predictive models) were then constructed to predict the inhibitory activity of compounds against targets and tumor cell lines using the FP-GNN deep learning method. Compared to classical machine learning and deep learning methods, the FP-GNN models achieve considerable overall predictive performance, with the highest AUC values of 0.91, 0.88, and 0.91 for the test sets of targets, academia-sourced cancer cell lines, and NCI-60 cancer cell lines, respectively. A user-friendly webserver called DeepCancerMap and its local version were developed based on these high-quality models, enabling users to perform anticancer drug discovery-related tasks including large-scale virtual screening, profiling prediction of anticancer agents, target fishing, and drug repositioning. We anticipate that this platform will accelerate the discovery of anticancer drugs in the field. DeepCancerMap is freely available at https://deepcancermap.idruglab.cn.
Affiliation(s)
- Jingxing Wu
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yi Xiao
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Mujie Lin
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Hanxuan Cai
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Duancheng Zhao
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China
- Yirui Li
- School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Hailin Luo
- School of Software Engineering, South China University of Technology, Guangzhou, 510006, China
- Chuanqi Tang
- School of Design, South China University of Technology, Guangzhou, 510006, China
- Ling Wang
- School of Biology and Biological Engineering, South China University of Technology, Guangzhou, 510006, China.
27
Zhu W, Zhang Y, Zhao D, Xu J, Wang L. HiGNN: A Hierarchical Informative Graph Neural Network for Molecular Property Prediction Equipped with Feature-Wise Attention. J Chem Inf Model 2023; 63:43-55. [PMID: 36519623 DOI: 10.1021/acs.jcim.2c01099] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Indexed: 12/23/2022]
Abstract
Elucidating and accurately predicting the druggability and bioactivities of molecules plays a pivotal role in drug design and discovery and remains an open challenge. Recently, graph neural networks (GNNs) have made remarkable advancements in graph-based molecular property prediction. However, current graph-based deep learning methods neglect the hierarchical information of molecules and the relationships between feature channels. In this study, we propose a well-designed hierarchical informative graph neural network (termed HiGNN) framework for predicting molecular property by utilizing a corepresentation learning of molecular graphs and chemically synthesizable breaking of retrosynthetically interesting chemical substructure (BRICS) fragments. Furthermore, a plug-and-play feature-wise attention block is first designed in HiGNN architecture to adaptively recalibrate atomic features after the message passing phase. Extensive experiments demonstrate that HiGNN achieves state-of-the-art predictive performance on many challenging drug discovery-associated benchmark data sets. In addition, we devise a molecule-fragment similarity mechanism to comprehensively investigate the interpretability of the HiGNN model at the subgraph level, indicating that HiGNN as a powerful deep learning tool can help chemists and pharmacists identify the key components of molecules for designing better molecules with desired properties or functions. The source code is publicly available at https://github.com/idruglab/hignn.
Affiliation(s)
- Weimin Zhu
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Yi Zhang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Duancheng Zhao
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
- Jianrong Xu
- Department of Pharmacology and Chemical Biology, Shanghai Jiao Tong University School of Medicine, Shanghai 200025, China
- Academy of Integrative Medicine, Shanghai University of Traditional Chinese Medicine, Shanghai 201203, China
- Ling Wang
- Guangdong Provincial Key Laboratory of Fermentation and Enzyme Engineering, Joint International Research Laboratory of Synthetic Biology and Medicine, Guangdong Provincial Engineering and Technology Research Center of Biopharmaceuticals, School of Biology and Biological Engineering, South China University of Technology, Guangzhou 510006, China
28
Ai D, Cai H, Wei J, Zhao D, Chen Y, Wang L. DEEPCYPs: A deep learning platform for enhanced cytochrome P450 activity prediction. Front Pharmacol 2023; 14:1099093. [PMID: 37101544 PMCID: PMC10123292 DOI: 10.3389/fphar.2023.1099093] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/15/2022] [Accepted: 03/31/2023] [Indexed: 04/28/2023] Open
Abstract
Cytochrome P450 (CYP) is a superfamily of heme-containing oxidizing enzymes involved in the metabolism of a wide range of medicines, xenobiotics, and endogenous compounds. Five of the CYPs (1A2, 2C9, 2C19, 2D6, and 3A4) are responsible for metabolizing the vast majority of approved drugs. Adverse drug-drug interactions, many of which are mediated by CYPs, are one of the important causes of the premature termination of drug development and of drug withdrawal from the market. In this work, we report in silico classification models to predict the inhibitory activity of molecules against these five CYP isoforms using our recently developed FP-GNN deep learning method. The evaluation results showed that, to the best of our knowledge, the multi-task FP-GNN model achieved the best predictive performance with the highest average AUC (0.905), F1 (0.779), BA (0.819), and MCC (0.647) values for the test sets, even compared to advanced machine learning, deep learning, and existing models. Y-scrambling testing confirmed that the results of the multi-task FP-GNN model were not attributable to chance correlation. Furthermore, the interpretability of the multi-task FP-GNN model enables the discovery of critical structural fragments associated with CYP inhibition. Finally, an online webserver called DEEPCYPs and its local version software were created based on the optimal multi-task FP-GNN model to detect whether compounds bear potential inhibitory activity against CYPs, thereby promoting the prediction of drug-drug interactions in clinical practice; the platform could also be used to rule out inappropriate compounds in the early stages of drug discovery and/or to identify new CYP inhibitors.
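The F1, BA (balanced accuracy), and MCC (Matthews correlation coefficient) values reported in this abstract are all functions of the binary confusion matrix. The sketch below shows the standard definitions in plain Python; the function names are illustrative and the code is not taken from DEEPCYPs.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def classification_metrics(y_true, y_pred):
    """Compute F1, balanced accuracy (BA), and MCC; undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"F1": f1, "BA": (sensitivity + specificity) / 2, "MCC": mcc}
```

BA and MCC are often reported alongside AUC for inhibitor data sets because, unlike plain accuracy, they are not inflated when one class dominates.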
29
Ai D, Wu J, Cai H, Zhao D, Chen Y, Wei J, Xu J, Zhang J, Wang L. A multi-task FP-GNN framework enables accurate prediction of selective PARP inhibitors. Front Pharmacol 2022; 13:971369. [DOI: 10.3389/fphar.2022.971369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2022] [Accepted: 09/14/2022] [Indexed: 11/13/2022] Open
Abstract
The PARP (poly ADP-ribose polymerase) family comprises crucial DNA repair enzymes that respond to DNA damage, regulate apoptosis, and maintain genome stability; therefore, PARP inhibitors represent a promising therapeutic strategy for the treatment of various human diseases including COVID-19. In this study, a multi-task FP-GNN (Fingerprint and Graph Neural Networks) deep learning framework was proposed to predict the inhibitory activity of molecules against four PARP isoforms (PARP-1, PARP-2, PARP-5A, and PARP-5B). Compared with baseline predictive models based on four conventional machine learning methods (RF, SVM, XGBoost, and LR) and six deep learning algorithms (DNN, Attentive FP, MPNN, GAT, GCN, and D-MPNN), the evaluation results indicate that the multi-task FP-GNN method achieves the best performance, with the highest average BA, F1, and AUC values of 0.753 ± 0.033, 0.910 ± 0.045, and 0.888 ± 0.016 for the test set. In addition, Y-scrambling testing verified that the model's results were not due to chance correlation. More importantly, the interpretability of the multi-task FP-GNN model enabled the identification of key structural fragments associated with the inhibition of each PARP isoform. To facilitate the use of the multi-task FP-GNN model in the field, an online webserver called PARPi-Predict and its local version software were created to predict whether compounds bear potential inhibitory activity against PARPs, thereby contributing to the design and discovery of better selective PARP inhibitors.
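Y-scrambling, the chance-correlation check mentioned in this abstract, retrains the model on randomly permuted training labels and compares the resulting scores with the score obtained on the real labels. The sketch below illustrates the procedure with a deliberately simple one-dimensional nearest-centroid classifier standing in for the real model; every name in it is hypothetical and none of it comes from the PARPi-Predict code.

```python
import random

def fit_centroids(xs, ys):
    """Toy stand-in model: the mean feature value of each class."""
    centroids = {}
    for cls in sorted(set(ys)):
        members = [x for x, y in zip(xs, ys) if y == cls]
        centroids[cls] = sum(members) / len(members)
    return centroids

def predict_class(centroids, x):
    """Assign x to the class with the closest centroid."""
    return min(centroids, key=lambda cls: abs(x - centroids[cls]))

def holdout_accuracy(train_x, train_y, test_x, test_y):
    """Train on the given labels and score on the untouched test set."""
    model = fit_centroids(train_x, train_y)
    hits = sum(predict_class(model, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)

def y_scrambling(train_x, train_y, test_x, test_y, n_rounds=100, seed=0):
    """Return the real score and the scores obtained with shuffled training labels."""
    rng = random.Random(seed)
    real = holdout_accuracy(train_x, train_y, test_x, test_y)
    scrambled = []
    for _ in range(n_rounds):
        shuffled = list(train_y)
        rng.shuffle(shuffled)  # breaks the structure-activity link, keeps class balance
        scrambled.append(holdout_accuracy(train_x, shuffled, test_x, test_y))
    return real, scrambled
```

If the real score sits well above the distribution of scrambled scores, the model has learned a genuine relationship between structure and activity; if the two are similar, the apparent performance may be a chance correlation.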