1
Gong X, Liu Q, Han R, Guo Y, Wang G. MIFS: An adaptive multipath information fused self-supervised framework for drug discovery. Neural Netw 2025; 184:107088. [PMID: 39778297 DOI: 10.1016/j.neunet.2024.107088] [Received: 07/09/2024] [Revised: 12/13/2024] [Accepted: 12/21/2024] [Indexed: 01/11/2025]
Abstract
The production of expressive molecular representations with scarce labeled data is challenging for AI-driven drug discovery. Mainstream studies often follow a pipeline that pre-trains a specific molecular encoder and then fine-tunes it. However, the significant challenges of these methods are (1) neglecting the propagation of diverse information within molecules and (2) the absence of knowledge and chemical constraints in the pre-training strategy. In this study, we propose an adaptive multipath information fused self-supervised framework (MIFS) that explores molecular representations from large-scale unlabeled data to aid drug discovery. In MIFS, we innovatively design a dedicated molecular graph encoder called Mol-EN, which implements three pathways of information propagation: atom-to-atom, chemical bond-to-atom, and group-to-atom, to comprehensively perceive and capture abundant semantic information. Furthermore, a novel adaptive pre-training strategy based on molecular scaffolds is devised to pre-train Mol-EN on 11 million unlabeled molecules. It optimizes Mol-EN by constructing a topological contrastive loss to provide additional chemical insights into molecular structures. Subsequently, the pre-trained Mol-EN is fine-tuned on 14 widespread drug discovery benchmark datasets, including molecular properties prediction, drug-target interactions, and drug-drug interactions. Notably, to further enhance chemical knowledge, we introduce an elemental knowledge graph (ElementKG) in the fine-tuning phase. Extensive experiments show that MIFS achieves competitive performance while providing plausible explanations for predictions from a chemical perspective.
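The abstract names a contrastive loss but does not give its formula. As a rough illustration of the general idea only, the sketch below implements a generic InfoNCE-style contrastive loss in plain Python. It assumes (this is not stated in the source) that the positive is the embedding of a molecule sharing the anchor's scaffold and the negatives are embeddings of unrelated molecules. The function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: low when the anchor is closer to the
    positive than to every negative. Not the MIFS loss itself."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

# Toy 2-D embeddings, invented for illustration.
anchor = [1.0, 0.0]
same_scaffold = [0.9, 0.1]          # assumed positive
others = [[0.0, 1.0], [-1.0, 0.0]]  # assumed negatives

loss_aligned = info_nce(anchor, same_scaffold, others)
# Swapping a dissimilar vector into the positive slot should raise the loss.
loss_misaligned = info_nce(anchor, [0.0, 1.0], [same_scaffold, [-1.0, 0.0]])
```

Minimizing such a loss pulls the anchor toward its positive and away from the negatives. How MIFS chooses pairs and weights them is described in the paper, not here.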
Affiliation(s)
- Xu Gong
  - Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Qun Liu
  - Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Rui Han
  - Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China.
- Yike Guo
  - Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, 999077, Hong Kong, China.
- Guoyin Wang
  - Chongqing Key Laboratory of Computational Intelligence, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China; College of Computer and Information Science, Chongqing Normal University, Chongqing, 401331, China.
2
Dai J, Zhou Z, Zhao Y, Kong F, Zhai Z, Zhu Z, Cai J, Huang S, Xu Y, Sun T. Combined usage of ligand- and structure-based virtual screening in the artificial intelligence era. Eur J Med Chem 2025; 283:117162. [PMID: 39673863 DOI: 10.1016/j.ejmech.2024.117162] [Received: 09/18/2024] [Revised: 11/27/2024] [Accepted: 12/09/2024] [Indexed: 12/16/2024]
Abstract
Drug design has always pursued techniques that save time and cost. Virtual screening, generally classified into ligand-based (LBVS) and structure-based (SBVS) approaches, can identify active compounds in large chemical libraries and thereby reduce both. Owing to the intrinsic flaws and complementary nature of the two approaches, continued efforts have been made to combine them to mitigate their limitations. Meanwhile, the emergence of machine learning (ML) offers opportunities to leverage vast amounts of data to remedy their defects. However, there has been little discussion of how to merge ML-improved LBVS and SBVS. Therefore, this review provides insights into the combined usage of ML-improved LBVS and SBVS, to help medicinal chemists use these joint strategies to raise screening efficiency and to help AI professionals design novel techniques.
Affiliation(s)
- Jingyi Dai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Ziyi Zhou
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Yanru Zhao
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Fanjing Kong
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Zhenwei Zhai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Zhishan Zhu
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Jie Cai
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Sha Huang
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
- Ying Xu
  - Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan, China.
- Tao Sun
  - School of Intelligent Medicine, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China; State Key Laboratory of Southwestern Chinese Medicine Resources, School of Pharmacy, Chengdu University of Traditional Chinese Medicine, Chengdu, 611137, Sichuan, China.
3
Zuo Y, Wu X, Ge F, Yan H, Fei S, Liang J, Deng Z. Research progress on Drug-Target Interactions in the last five years. Anal Biochem 2025; 697:115691. [PMID: 39455038 DOI: 10.1016/j.ab.2024.115691] [Received: 08/20/2024] [Revised: 10/06/2024] [Accepted: 10/16/2024] [Indexed: 10/28/2024]
Abstract
The identification of Drug-Target Interactions (DTIs) is an important step in drug discovery and has high application value in fields such as drug repositioning and repurposing. However, the high cost of experimental validation limits such identification. In contrast, computation-based approaches are both economical and efficient. This review first synthesizes existing chemical genomic approaches, provides a comprehensive summary of prevalent databases for predicting DTIs, and categorizes the feature encodings from recent years. This is followed by an overview and brief description of the methods currently in use for predicting DTIs. The strengths and weaknesses of newly proposed prediction methods in the last five years (2020-2024), including those based on network representation learning and graph neural networks, are then discussed in detail, evaluating the performance of the different methods on a wide range of datasets. Finally, this review explores potential directions for future DTI research, emphasizing how to improve prediction accuracy and efficiency by combining big data and emerging computing technologies.
Affiliation(s)
- Yun Zuo
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China.
- Xubin Wu
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Fei Ge
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Hongjin Yan
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Sirui Fei
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Jingwen Liang
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China
- Zhaohong Deng
  - School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, 214000, China.
4
He H, Chen G, Tang Z, Chen CYC. Dual modality feature fused neural network integrating binding site information for drug target affinity prediction. NPJ Digit Med 2025; 8:67. [PMID: 39875637 PMCID: PMC11775287 DOI: 10.1038/s41746-025-01464-x] [Received: 08/06/2024] [Accepted: 01/15/2025] [Indexed: 01/30/2025] Open
Abstract
Accurately predicting binding affinities between drugs and targets is crucial for drug discovery but remains challenging due to the complexity of modeling interactions between small drugs and large targets. This study proposes DMFF-DTA, a dual-modality neural network model that integrates sequence and graph structure information from drugs and proteins for drug-target affinity prediction. The model introduces a binding site-focused graph construction approach to extract binding information, enabling more balanced and efficient modeling of drug-target interactions. Comprehensive experiments demonstrate that DMFF-DTA outperforms state-of-the-art methods with significant improvements. The model exhibits excellent generalization capabilities on completely unseen drugs and targets, achieving an improvement of over 8% compared to existing methods. Model interpretability analysis validates the biological relevance of the model. A case study in pancreatic cancer drug repurposing demonstrates its practical utility. This work provides an interpretable, robust approach to integrating multi-view drug and protein features for advancing computational drug discovery.
Affiliation(s)
- Haohuai He
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Guanxing Chen
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Zhenchao Tang
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China
  - Artificial Intelligence Medical Research Center, School of Intelligent Systems Engineering, Sun Yat-sen University, Shenzhen, 510275, China
- Calvin Yu-Chian Chen
  - State Key Laboratory of Chemical Oncogenomics, Key Laboratory of Chemical Genomics, School of Chemical Biology and Biotechnology, Peking University Shenzhen Graduate School, Shenzhen, 518055, China.
  - Department of Medical Research, China Medical University Hospital, Taichung, 40447, Taiwan.
  - Department of Bioinformatics and Medical Engineering, Asia University, Taichung, 41354, Taiwan.
5
Grover A, Singh S, Sindhu S, Lath A, Kumar S. Advances in cyclotide research: bioactivity to cyclotide-based therapeutics. Mol Divers 2025:10.1007/s11030-025-11113-w. [PMID: 39862350 DOI: 10.1007/s11030-025-11113-w] [Received: 11/19/2024] [Accepted: 01/07/2025] [Indexed: 01/27/2025]
Abstract
Cyclotides are a class of plant-derived cyclic peptides having a distinctive structure with a cyclic cystine knot (CCK) motif. They are stable molecules that naturally play a role in plant defense. To date, more than 750 cyclotides have been reported among diverse plant taxa belonging to Cucurbitaceae, Violaceae, Rubiaceae, Solanaceae, and Fabaceae. These native cyclotides exhibit several bioactivities, such as anti-bacterial, anti-HIV, anti-fungal, pesticidal, cytotoxic, and hemolytic activities, which have immense significance in agriculture and therapeutics. The general mode of action of cyclotides is related to their structure: their hydrophobic face penetrates the cell membrane and disrupts it, producing anti-microbial, cytotoxic, or hemolytic activities. Thus, the structure-activity relationship is of significance in cyclotides. Further, owing to their small size, stability, and potential to interact with and cross the membrane barrier of cells, they make promising choices for developing peptide-based biologics. However, challenges such as production complexity, pharmacokinetic limitations, and off-target effects hinder their development. Advancements in cyclotide engineering, such as peptide grafting, ligand conjugation, nanocarrier integration, and heterologous production, along with computational design optimization, can help overcome these challenges. Given the potential of these cyclic peptides, the present review focuses on the diversity, bioactivities, and structure-activity relationships of cyclotides, and on advancements in cyclotide engineering, emphasizing their unique attributes for diverse medical and biotechnological applications.
Affiliation(s)
- Ankita Grover
  - Department of Microbiology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Sawraj Singh
  - Department of Microbiology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India
- Sonal Sindhu
  - Department of Medical Biotechnology, Maharshi Dayanand University, Rohtak, Haryana, India
- Amit Lath
  - Department of Biotechnology, Maharshi Dayanand University, Rohtak, Haryana, India
- Sanjay Kumar
  - Department of Microbiology, Maharshi Dayanand University, Rohtak, Haryana, 124001, India.
6
Li Z, Zeng Y, Jiang M, Wei B. Deep Drug-Target Binding Affinity Prediction Base on Multiple Feature Extraction and Fusion. ACS Omega 2025; 10:2020-2032. [PMID: 39866608 PMCID: PMC11755178 DOI: 10.1021/acsomega.4c08048] [Received: 09/02/2024] [Revised: 12/25/2024] [Accepted: 01/03/2025] [Indexed: 01/28/2025]
Abstract
Accurate drug-target binding affinity (DTA) prediction is crucial in drug discovery. Recently, deep learning methods for DTA prediction have made significant progress. However, there are still two challenges: (1) recent models often ignore the correlations in drug and target data in the drug/target representation process and (2) the interaction learning of drug-target pairs is usually done by simple concatenation, which is insufficient to explore their fusion. To overcome these challenges, we propose an end-to-end sequence-based model called BTDHDTA. In the feature extraction process, the bidirectional gated recurrent unit (GRU), transformer encoder, and dilated convolution are employed to extract global, local, and their correlation patterns of drug and target input. Additionally, a module combining convolutional neural networks with a Highway connection is introduced to fuse drug and protein deep features. We evaluate the performance of BTDHDTA on three benchmark data sets (Davis, KIBA, and Metz), demonstrating its superiority over several current state-of-the-art methods in key metrics such as Mean Squared Error (MSE), Concordance Index (CI), and Regression toward the mean (R_m^2). The results indicate that our method achieves a better performance in DTA prediction. In the case study, we use the BTDHDTA model to predict the binding affinities between 3137 FDA-approved drugs and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) replication-related proteins, validating the model's effectiveness in practical scenarios.
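Two of the metrics named in this abstract, MSE and the Concordance Index, have standard definitions that are easy to state in code. The sketch below is a minimal pure-Python version under the usual conventions (pairs with equal measured affinity are skipped, tied predictions count as half-concordant). It is not taken from the BTDHDTA code base, and the affinity values are invented toy numbers, not Davis, KIBA, or Metz data.

```python
from itertools import combinations

def mean_squared_error(y_true, y_pred):
    """Average squared difference between measured and predicted affinities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that the predictions rank in the
    same order as the measurements."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # equal measured values give no ordering to test
        comparable += 1
        true_diff = y_true[i] - y_true[j]
        pred_diff = y_pred[i] - y_pred[j]
        if pred_diff == 0:
            concordant += 0.5  # tied prediction: half credit
        elif (true_diff > 0) == (pred_diff > 0):
            concordant += 1.0
    return concordant / comparable if comparable else float("nan")

# Toy affinities, invented for illustration only.
measured = [5.0, 6.2, 7.1, 8.4]
predicted = [5.3, 6.0, 7.5, 8.1]

mse = mean_squared_error(measured, predicted)
ci = concordance_index(measured, predicted)
```

MSE penalizes the size of each error, while CI only checks whether pairs are ranked correctly, so a model can score well on one and poorly on the other.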
Affiliation(s)
- Zepeng Li
  - School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Yuni Zeng
  - School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Mingfeng Jiang
  - School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
- Bo Wei
  - School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou 310018, China
  - Longgang Research Institute, Zhejiang Sci-Tech University, Longgang 325000, Zhejiang, China
7
Deng M, Wang J, Zhao Y, Zhao Y, Cao H, Wang Z. Predicting drug and target interaction with dilated reparameterize convolution. Sci Rep 2025; 15:2579. [PMID: 39833385 PMCID: PMC11747116 DOI: 10.1038/s41598-025-86918-8] [Received: 10/04/2024] [Accepted: 01/15/2025] [Indexed: 01/22/2025] Open
Abstract
Predicting drug-target interaction (DTI) stands as a pivotal and formidable challenge in pharmaceutical research. Many existing deep learning methods only learn the high-dimensional representation of ligands and targets on a small scale. However, it is difficult for the model to obtain the potential law of combining pockets or multiple binding sites on a large scale. To address this lacuna, we designed a large-kernel convolutional block for extracting large-scale sequence information and proposed a novel DTI prediction framework, named Rep-ConvDTI. The reparameterization method is introduced to help large-kernel convolutions capture small-scale information. We have also developed a gated attention mechanism to more efficiently characterize the interaction of drugs and targets. Extensive experiments demonstrate that Rep-ConvDTI achieves the most competitive performance against state-of-the-art baselines on the three benchmark datasets. Furthermore, we validated the potential of Rep-ConvDTI as a drug screening tool through model interpretative studies and drug screening experiments with cystathionine-β-synthase.
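The abstract does not spell out which reparameterization is used. One common reading, assumed here, is structural reparameterization: a large-kernel branch and a parallel small-kernel branch are trained together and then merged into a single large kernel for inference. The pure-Python sketch below shows why that merge is exact for 1D convolution with odd kernel sizes. All function names and numbers are hypothetical and are not taken from Rep-ConvDTI.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 1D cross-correlation; output length equals input length.
    Assumes an odd kernel length."""
    half = len(kernel) // 2
    padded = [0.0] * half + list(signal) + [0.0] * half
    return [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(signal))
    ]

def merge_kernels(large, small):
    """Center the small kernel inside the large one and add the weights."""
    offset = (len(large) - len(small)) // 2
    merged = list(large)
    for j, w in enumerate(small):
        merged[offset + j] += w
    return merged

# Toy sequence features and kernels, invented for illustration.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 2.5, -2.0]
large_kernel = [0.1, -0.2, 0.3, 0.4, 0.3, -0.2, 0.1]  # size 7
small_kernel = [0.5, 1.0, 0.5]                         # size 3

# Training-time view: two parallel branches whose outputs are summed.
two_branch = [
    a + b
    for a, b in zip(conv1d_same(signal, large_kernel),
                    conv1d_same(signal, small_kernel))
]
# Inference-time view: one convolution with the merged kernel.
single_branch = conv1d_same(signal, merge_kernels(large_kernel, small_kernel))
```

Because convolution is linear in the kernel weights, the two views give the same output, so the small-scale branch adds no inference cost once merged.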
Affiliation(s)
- Moping Deng
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Jian Wang
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Yiming Zhao
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
- Yongjia Zhao
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
- Hao Cao
  - School of Life Science and Biopharmaceutics, Shenyang Pharmaceutical University, 103 Wenhua Road, Shenyang, 110016, Liaoning Province, China
- Zhuo Wang
  - Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China.
  - University of Chinese Academy of Sciences, Beijing, 100049, China.
8
Wang X, Zhou J, Mueller J, Quinn D, Carvalho A, Moody TS, Huang M. BioStructNet: Structure-Based Network with Transfer Learning for Predicting Biocatalyst Functions. J Chem Theory Comput 2025; 21:474-490. [PMID: 39705058 PMCID: PMC11736791 DOI: 10.1021/acs.jctc.4c01391] [Received: 10/16/2024] [Revised: 12/03/2024] [Accepted: 12/12/2024] [Indexed: 12/21/2024]
Abstract
Enzyme-substrate interactions are essential to both biological processes and industrial applications. Advanced machine learning techniques have significantly accelerated biocatalysis research, revolutionizing the prediction of biocatalytic activities and facilitating the discovery of novel biocatalysts. However, the limited availability of data for specific enzyme functions, such as conversion efficiency and stereoselectivity, presents challenges for prediction accuracy. In this study, we developed BioStructNet, a structure-based deep learning network that integrates both protein and ligand structural data to capture the complexity of enzyme-substrate interactions. Benchmarking studies with different algorithms showed the enhanced predictive accuracy of BioStructNet. To further optimize the prediction accuracy for the small data set, we implemented transfer learning in the framework, training a source model on a large data set and fine-tuning it on a small, function-specific data set, using the CalB data set as a case study. The model performance was validated by comparing the attention heat maps generated by the BioStructNet interaction module with the enzyme-substrate interactions revealed from molecular dynamics simulations of enzyme-substrate complexes. BioStructNet would accelerate the discovery of functional enzymes for industrial use, particularly in cases where the training data sets for machine learning are small.
Affiliation(s)
- Xiangwen Wang
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, BT9 5AG Belfast, Northern Ireland, U.K.
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Jiahui Zhou
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, BT9 5AG Belfast, Northern Ireland, U.K.
- Jane Mueller
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Derek Quinn
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Alexandra Carvalho
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
- Thomas S. Moody
  - Department of Biocatalysis and Isotope Chemistry, Almac Sciences, BT63 5QD Craigavon, Northern Ireland, U.K.
  - Arran Chemical Company Limited, Unit 1 Monksland Industrial Estate, Athlone, Co. Roscommon N37 DN24, Ireland
- Meilan Huang
  - School of Chemistry and Chemical Engineering, Queen’s University Belfast, BT9 5AG Belfast, Northern Ireland, U.K.
9
Chen X, Wang T, Guo T, Guo K, Zhou J, Li H, Song Z, Gao X, Zhang X. Unveiling the power of language models in chemical research question answering. Commun Chem 2025; 8:4. [PMID: 39757259 DOI: 10.1038/s42004-024-01394-x] [Received: 09/13/2024] [Accepted: 12/12/2024] [Indexed: 01/07/2025] Open
Abstract
While the abilities of language models are thoroughly evaluated in areas like general domains and biomedicine, academic chemistry remains less explored. Chemical QA tools also play a crucial role in both education and research by effectively translating complex chemical information into an understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. Specifically, the questions are from paper titles with a question mark, and the multi-choice answers are reasoned out based on the corresponding abstracts. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a ChemMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. Experiments show that Large Language Models (LLMs) still have significant room for improvement in the field of chemistry. Moreover, ChemMatch significantly outperforms recent similar-scale baselines: https://github.com/iriscxy/chemmatch .
Affiliation(s)
- Xiuying Chen
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE.
  - King Abdullah University of Science and Technology, Jeddah, Saudi Arabia.
- Tairan Wang
  - King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
- Kehan Guo
  - University of Notre Dame, Notre Dame, IN, USA
- Juexiao Zhou
  - King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
- Haoyang Li
  - King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
- Zirui Song
  - Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, UAE
- Xin Gao
  - King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
- Xiangliang Zhang
  - King Abdullah University of Science and Technology, Jeddah, Saudi Arabia
  - University of Notre Dame, Notre Dame, IN, USA
10
Yang Z, Zhong W, Lv Q, Dong T, Chen G, Chen CYC. Interaction-Based Inductive Bias in Graph Neural Networks: Enhancing Protein-Ligand Binding Affinity Predictions From 3D Structures. IEEE Trans Pattern Anal Mach Intell 2024; 46:8191-8208. [PMID: 38739515 DOI: 10.1109/tpami.2024.3400515] [Indexed: 05/16/2024]
Abstract
Inductive bias in machine learning (ML) is the set of assumptions describing how a model makes predictions. Different ML-based methods for protein-ligand binding affinity (PLA) prediction have different inductive biases, leading to different levels of generalization capability and interpretability. Intuitively, the inductive bias of an ML-based model for PLA prediction should fit in with biological mechanisms relevant for binding to achieve good predictions with meaningful reasons. To this end, we propose an interaction-based inductive bias to restrict neural networks to functions relevant for binding with two assumptions: 1) A protein-ligand complex can be naturally expressed as a heterogeneous graph with covalent and non-covalent interactions; 2) The predicted PLA is the sum of pairwise atom-atom affinities determined by non-covalent interactions. The interaction-based inductive bias is embodied by an explainable heterogeneous interaction graph neural network (EHIGN) for explicitly modeling pairwise atom-atom interactions to predict PLA from 3D structures. Extensive experiments demonstrate that EHIGN achieves better generalization capability than other state-of-the-art ML-based baselines in PLA prediction and structure-based virtual screening. More importantly, comprehensive analyses of distance-affinity, pose-affinity, and substructure-affinity relations suggest that the interaction-based inductive bias can guide the model to learn atomic interactions that are consistent with physical reality. As a case study to demonstrate practical usefulness, our method is tested for predicting the efficacy of Nirmatrelvir against SARS-CoV-2 variants. EHIGN successfully recognizes the changes in the efficacy of Nirmatrelvir for different SARS-CoV-2 variants with meaningful reasons.
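The second assumption in this abstract, that the predicted affinity is a sum of pairwise atom-atom terms, can be illustrated without any neural network. In the sketch below a hand-written distance-decay function stands in for the learned per-pair score. That function, the 5 Å cutoff, and the coordinates are hypothetical placeholders chosen for illustration, not EHIGN's model.

```python
import math

def pairwise_affinity(distance, cutoff=5.0):
    """Hypothetical stand-in for a learned per-pair score: decays with
    distance and is zero at or beyond the cutoff (in angstroms)."""
    if distance >= cutoff:
        return 0.0
    return math.exp(-distance)

def predict_affinity(protein_atoms, ligand_atoms, cutoff=5.0):
    """Sum per-pair scores over all protein-ligand atom pairs and keep
    the nonzero contributions so the total can be traced back to pairs."""
    total = 0.0
    contributions = {}
    for pi, p_xyz in enumerate(protein_atoms):
        for li, l_xyz in enumerate(ligand_atoms):
            score = pairwise_affinity(math.dist(p_xyz, l_xyz), cutoff)
            if score > 0.0:
                contributions[(pi, li)] = score
            total += score
    return total, contributions

# Toy 3D coordinates, invented for illustration.
protein = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
ligand = [(1.0, 0.0, 0.0)]

affinity, per_pair = predict_affinity(protein, ligand)
```

The additive form is what makes such a prediction explainable: each entry in `per_pair` is one atom pair's share of the total, and distant pairs contribute nothing.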
11
Luo Z, Wu W, Sun Q, Wang J. Accurate and transferable drug-target interaction prediction with DrugLAMP. Bioinformatics 2024; 40:btae693. [PMID: 39570605 PMCID: PMC11629708 DOI: 10.1093/bioinformatics/btae693] [Received: 07/17/2024] [Revised: 10/29/2024] [Accepted: 11/14/2024] [Indexed: 11/22/2024] Open
Abstract
MOTIVATION: Accurate prediction of drug-target interactions (DTIs), especially for novel targets or drugs, is crucial for accelerating drug discovery. Recent advances in pretrained language models (PLMs) and multi-modal learning present new opportunities to enhance DTI prediction by leveraging vast unlabeled molecular data and integrating complementary information from multiple modalities.
RESULTS: We introduce DrugLAMP (PLM-assisted multi-modal prediction), a PLM-based multi-modal framework for accurate and transferable DTI prediction. DrugLAMP integrates molecular graph and protein sequence features extracted by PLMs and traditional feature extractors. We introduce two novel multi-modal fusion modules: (i) pocket-guided co-attention (PGCA), which uses protein pocket information to guide the attention mechanism on drug features, and (ii) paired multi-modal attention (PMMA), which enables effective cross-modal interactions between drug and protein features. These modules work together to enhance the model's ability to capture complex drug-protein interactions. Moreover, the contrastive compound-protein pre-training (2C2P) module enhances the model's generalization to real-world scenarios by aligning features across modalities and conditions. Comprehensive experiments demonstrate DrugLAMP's state-of-the-art performance on both standard benchmarks and challenging settings simulating real-world drug discovery, where test drugs/targets are unseen during training. Visualizations of attention maps and application to predict cryptic pockets and drug side effects further showcase DrugLAMP's strong interpretability and generalizability. Ablation studies confirm the contributions of the proposed modules.
AVAILABILITY AND IMPLEMENTATION: Source code and datasets are freely available at https://github.com/Lzcstan/DrugLAMP. All data originate from public sources.
Affiliation(s)
- Zhengchao Luo
  - Department of Big Data and Biomedical AI, College of Future Technology, Peking University, Beijing 100871, China
- Wei Wu
  - Department of Big Data and Biomedical AI, College of Future Technology, Peking University, Beijing 100871, China
- Qichen Sun
  - School of Mathematical Sciences, Peking University, Beijing 100871, China
- Jinzhuo Wang
  - Department of Big Data and Biomedical AI, College of Future Technology, Peking University, Beijing 100871, China
12
Li F, Ackloo S, Arrowsmith CH, Ban F, Barden CJ, Beck H, Beránek J, Berenger F, Bolotokova A, Bret G, Breznik M, Carosati E, Chau I, Chen Y, Cherkasov A, Corte DD, Denzinger K, Dong A, Draga S, Dunn I, Edfeldt K, Edwards A, Eguida M, Eisenhuth P, Friedrich L, Fuerll A, Gardiner SS, Gentile F, Ghiabi P, Gibson E, Glavatskikh M, Gorgulla C, Guenther J, Gunnarsson A, Gusev F, Gutkin E, Halabelian L, Harding RJ, Hillisch A, Hoffer L, Hogner A, Houliston S, Irwin JJ, Isayev O, Ivanova A, Jacquemard C, Jarrett AJ, Jensen JH, Kireev D, Kleber J, Koby SB, Koes D, Kumar A, Kurnikova MG, Kutlushina A, Lessel U, Liessmann F, Liu S, Lu W, Meiler J, Mettu A, Minibaeva G, Moretti R, Morris CJ, Narangoda C, Noonan T, Obendorf L, Pach S, Pandit A, Perveen S, Poda G, Polishchuk P, Puls K, Pütter V, Rognan D, Roskams-Edris D, Schindler C, Sindt F, Spiwok V, Steinmann C, Stevens RL, Talagayev V, Tingey D, Vu O, Walters WP, Wang X, Wang Z, Wolber G, Wolf CA, Wortmann L, Zeng H, Zepeda CA, Zhang KYJ, Zhang J, Zheng S, Schapira M. CACHE Challenge #1: Targeting the WDR Domain of LRRK2, A Parkinson's Disease Associated Protein. J Chem Inf Model 2024; 64:8521-8536. [PMID: 39499532 DOI: 10.1021/acs.jcim.4c01267] [Indexed: 11/07/2024]
Abstract
The CACHE challenges are a series of prospective benchmarking exercises to evaluate progress in the field of computational hit-finding. Here we report the results of the inaugural CACHE challenge in which 23 computational teams each selected up to 100 commercially available compounds that they predicted would bind to the WDR domain of the Parkinson's disease target LRRK2, a domain with no known ligand and only an apo structure in the PDB. The lack of known binding data and the presumably low druggability of the target are a challenge to computational hit-finding methods. Of the 1955 molecules predicted by participants in Round 1 of the challenge, 73 were found to bind to LRRK2 in an SPR assay with a KD lower than 150 μM. These 73 molecules were advanced to the Round 2 hit expansion phase, where computational teams each selected up to 50 analogs. Binding was observed in two orthogonal assays for seven chemically diverse series, with affinities ranging from 18 to 140 μM. The seven successful computational workflows varied in their screening strategies and techniques. Three used molecular dynamics to produce a conformational ensemble of the targeted site, three included a fragment docking step, three implemented a generative design strategy and five used one or more deep learning steps. CACHE #1 reflects a highly exploratory phase in computational drug design where participants adopted strikingly diverging screening strategies. Machine learning-accelerated methods achieved similar results to brute-force (e.g., exhaustive) docking. First-in-class, experimentally confirmed compounds were rare and weakly potent, indicating that recent advances are not sufficient to effectively address challenging targets.
Affiliation(s)
- Fengling Li
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Suzanne Ackloo
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Cheryl H Arrowsmith
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Medical Biophysics, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2C4, Canada
| | - Fuqiang Ban
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, British Columbia V6H 3Z6, Canada
| | - Christopher J Barden
- Treventis Corporation, Toronto, Ontario M5T 0S8, Canada
- University Health Network, Toronto, Ontario M5G 2C4, Canada
| | - Hartmut Beck
- Bayer AG, Drug Discovery Sciences, 42096 Wuppertal, Germany
| | - Jan Beránek
- Department of Biochemistry and Microbiology, University of Chemistry and Technology, Technická 5, 16628 Prague Czech Republic
| | - Francois Berenger
- Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwa-no-ha, Kashiwa, Chiba 277-8561, Japan
| | - Albina Bolotokova
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Guillaume Bret
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | - Marko Breznik
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Emanuele Carosati
- Department of Chemical and Pharmaceutical Sciences, University of Trieste, 34127 Trieste, Italy
| | - Irene Chau
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Yu Chen
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Artem Cherkasov
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, British Columbia V6H 3Z6, Canada
| | - Dennis Della Corte
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
| | - Katrin Denzinger
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Aiping Dong
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Sorin Draga
- Virtual Discovery, Inc., Boston, Massachusetts 02108, United States
- Non-Governmental Research Organization Biologic, 14 Schitului Street, Bucharest 032044, Romania
| | - Ian Dunn
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Kristina Edfeldt
- Structural Genomics Consortium, Department of Medicine, Karolinska University Hospital and Karolinska Institutet, 171 76 Stockholm, Sweden
| | - Aled Edwards
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Conscience Medicines Network, Toronto, Ontario M5G 1L7 Canada
| | - Merveille Eguida
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | - Paul Eisenhuth
- Institute for Drug Discovery, Medical Faculty, Leipzig University, Leipzig, Saxony 04103, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Saxony 04105, Germany
| | - Lukas Friedrich
- Computational Drug Design, Merck KGaA, 64293 Darmstadt Germany
| | - Alexander Fuerll
- Institute for Drug Discovery, Medical Faculty, Leipzig University, Leipzig, Saxony 04103, Germany
| | - Spencer S Gardiner
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
| | - Francesco Gentile
- Vancouver Prostate Centre, University of British Columbia, 2660 Oak Street, Vancouver, British Columbia V6H 3Z6, Canada
- Department of Chemistry and Biomolecular Sciences, University of Ottawa, Ottawa, Ontario K1N 6N5, Canada
- Ottawa Institute of Systems Biology, University of Ottawa, K1H 8M5 Ottawa, Ontario Canada
| | - Pegah Ghiabi
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Elisa Gibson
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Marta Glavatskikh
- University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, United States
| | - Christoph Gorgulla
- St. Jude Children's Research Hospital, Memphis, Tennessee 38105, United States
- Department of Physics, Harvard University, Cambridge, Massachusetts 02138, United States
| | | | - Anders Gunnarsson
- Structure and Biophysics, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Pepparedsleden 1, Mölndal 431 50, Sweden
| | - Filipp Gusev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Evgeny Gutkin
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Levon Halabelian
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
| | - Rachel J Harding
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, Ontario M5S 3M2, Canada
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
| | | | - Laurent Hoffer
- Drug Discovery, Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
| | - Anders Hogner
- Medicinal Chemistry, Research and Early Development, Cardiovascular, Renal and Metabolism (CVRM), BioPharmaceuticals R&D, AstraZeneca, 431 50 Gothenburg Sweden
| | - Scott Houliston
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario M5G 2C4, Canada
| | - John J Irwin
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158, United States
| | - Olexandr Isayev
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
- Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Aleksandra Ivanova
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Hnevotinska 5, 77900 Olomouc Czech Republic
| | - Celien Jacquemard
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | - Austin J Jarrett
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
| | - Jan H Jensen
- Department of Chemistry, University of Copenhagen, Universitetsparken 5, 2100 Copenhagen, Denmark
| | - Dmitri Kireev
- Department of Chemistry, University of Missouri, Columbia, Missouri 65211-7600, United States
| | - Julian Kleber
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - S Benjamin Koby
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - David Koes
- Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15261, United States
| | - Ashutosh Kumar
- Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | - Maria G Kurnikova
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Alina Kutlushina
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Hnevotinska 5, 77900 Olomouc Czech Republic
| | - Uta Lessel
- Boehringer Ingelheim Pharma GmbH & Co. KG, 88400 Biberach an der Riss, Germany
| | - Fabian Liessmann
- Institute for Drug Discovery, Medical Faculty, Leipzig University, Leipzig, Saxony 04103, Germany
| | - Sijie Liu
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Wei Lu
- Galixir Technologies, 200100 Shanghai, China
| | - Jens Meiler
- Institute for Drug Discovery, Medical Faculty, Leipzig University, Leipzig, Saxony 04103, Germany
- Center for Scalable Data Analytics and Artificial Intelligence, Leipzig University, Leipzig, Saxony 04105, Germany
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Akhila Mettu
- Department of Chemistry, University of Missouri, Columbia, Missouri 65211-7600, United States
| | - Guzel Minibaeva
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Hnevotinska 5, 77900 Olomouc Czech Republic
| | - Rocco Moretti
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
| | - Connor J Morris
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
| | - Chamali Narangoda
- Department of Chemistry, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, United States
| | - Theresa Noonan
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Leon Obendorf
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Szymon Pach
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Amit Pandit
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Sumera Perveen
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | - Gennady Poda
- Drug Discovery, Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada
- Leslie Dan Faculty of Pharmacy, University of Toronto, Toronto, Ontario M5S 3M2, Canada
| | - Pavel Polishchuk
- Institute of Molecular and Translational Medicine, Faculty of Medicine and Dentistry, Palacky University in Olomouc, Hnevotinska 5, 77900 Olomouc Czech Republic
| | - Kristina Puls
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | | | - Didier Rognan
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | | | | | - François Sindt
- Laboratoire d'innovation thérapeutique, UMR7200 CNRS-Université de Strasbourg, F-67400 Illkirch, France
| | - Vojtěch Spiwok
- Department of Biochemistry and Microbiology, University of Chemistry and Technology, Technická 5, 16628 Prague Czech Republic
| | - Casper Steinmann
- Department of Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220, Aalborg, Denmark
| | - Rick L Stevens
- Department of Computer Science, University of Chicago, Chicago, Illinois 60637, United States
- Argonne National Laboratory, Lemont, Illinois 60439, United States
| | - Valerij Talagayev
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Damon Tingey
- Department of Physics and Astronomy, Brigham Young University, Provo, Utah 84602, United States
| | - Oanh Vu
- Center for Structural Biology, Vanderbilt University, Nashville, Tennessee 37235, United States
| | | | - Xiaowen Wang
- Department of Chemistry, University of Missouri, Columbia, Missouri 65211-7600, United States
| | - Zhenyu Wang
- Galixir Technologies, 200100 Shanghai, China
- Global Institute of Future Technology, Shanghai Jiao Tong University, 200240 Shanghai, China
| | - Gerhard Wolber
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Clemens Alexander Wolf
- Computational Molecular Design, Institute of Pharmacy, Freie Universitaet Berlin, Koenigin-Luisestr. 2 + 4, 14195 Berlin, Germany
| | - Lars Wortmann
- Boehringer Ingelheim Pharma GmbH & Co. KG, 88400 Biberach an der Riss, Germany
| | - Hong Zeng
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
| | | | - Kam Y J Zhang
- Center for Biosystems Dynamics Research, RIKEN, 1-7-22 Suehiro, Tsurumi, Yokohama, Kanagawa 230-0045, Japan
| | | | - Shuangjia Zheng
- Global Institute of Future Technology, Shanghai Jiao Tong University, 200240 Shanghai, China
| | - Matthieu Schapira
- Structural Genomics Consortium, University of Toronto, Toronto, Ontario M5G 1L7, Canada
- Department of Pharmacology & Toxicology, University of Toronto, Toronto, Ontario, M5S 1A8, Canada
| |
|
13
|
Wang Y, Ji J, Yao Y, Nie J, Xie F, Xie Y, Li G. Current status and challenges of model-informed drug discovery and development in China. Adv Drug Deliv Rev 2024; 214:115459. [PMID: 39389423 DOI: 10.1016/j.addr.2024.115459] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 08/18/2024] [Accepted: 10/04/2024] [Indexed: 10/12/2024]
Abstract
In the past decade, biopharmaceutical research and development in China has been notably boosted by government policies, regulatory initiatives and increasing investments in life sciences. With the regulatory agency acting as a strong driver, model-informed drug development (MIDD) is transitioning rapidly from an academic pursuit to a critical component of innovative drug discovery and development within the country. In this article, we provided a cross-sectional summary of the current status of MIDD implementations across early and late-stage drug development in China, illustrated by case examples. We also shared insights into regulatory policy development and decision-making. Various modeling and simulation approaches were presented across a range of applications. Furthermore, the challenges and opportunities of MIDD in China were discussed and compared with other regions where these practices have a more established history. Through this analysis, we highlighted the potential of MIDD to enhance drug development efficiency and effectiveness in China's evolving pharmaceutical landscape.
Affiliation(s)
- Yuzhu Wang
- Center for Drug Evaluation, National Medicine Products Administration, China
| | - Jia Ji
- Johnson & Johnson Innovative Medicine, Beijing, China
| | - Ye Yao
- Certara (Shanghai) Pharmaceutical Consulting Co., Ltd, Shanghai, China
| | - Jing Nie
- Abbisko Therapeutics Co., Ltd, Shanghai, China
| | - Fengbo Xie
- School of Data Science and Technology, North University of China, Taiyuan, China
| | - Yehua Xie
- Certara (Shanghai) Pharmaceutical Consulting Co., Ltd, Shanghai, China
| | - Gailing Li
- Certara (Shanghai) Pharmaceutical Consulting Co., Ltd, Shanghai, China.
| |
|
14
|
Son H, Lee S, Kim J, Park H, Hwang MH, Yi GS. BASE: a web service for providing compound-protein binding affinity prediction datasets with reduced similarity bias. BMC Bioinformatics 2024; 25:340. [PMID: 39478454 PMCID: PMC11526688 DOI: 10.1186/s12859-024-05968-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Accepted: 10/23/2024] [Indexed: 11/02/2024] [Open Access]
Abstract
BACKGROUND Deep learning-based drug-target affinity (DTA) prediction methods have shown impressive performance, despite a high number of training parameters relative to the available data. Previous studies have highlighted the presence of dataset bias by suggesting that models trained solely on protein or ligand structures may perform similarly to those trained on complex structures. However, these studies did not propose solutions and focused solely on analyzing complex structure-based models. Even when ligands are excluded, protein-only models trained on complex structures still incorporate some ligand information at the binding sites. Therefore, it is unclear whether binding affinity can be accurately predicted using only compound or protein features due to potential dataset bias. In this study, we expanded our analysis to comprehensive databases and investigated dataset bias through compound and protein feature-based methods using multilayer perceptron models. We assessed the impact of this bias on current prediction models and proposed the binding affinity similarity explorer (BASE) web service, which provides bias-reduced datasets. RESULTS By analyzing eight binding affinity databases using multilayer perceptron models, we confirmed a bias where the compound-protein binding affinity can be accurately predicted using compound features alone. This bias arises because most compounds show consistent binding affinities due to high sequence or functional similarity among their target proteins. Our Uniform Manifold Approximation and Projection analysis based on compound fingerprints further revealed that low and high variation compounds do not exhibit significant structural differences. This suggests that the primary factor driving the consistent binding affinities is protein similarity rather than compound structure. 
We addressed this bias by creating datasets with progressively reduced protein similarity between the training and test sets, observing significant changes in model performance. We developed the BASE web service to allow researchers to download and utilize these datasets. Feature importance analysis revealed that previous models heavily relied on protein features. However, using bias-reduced datasets increased the importance of compound and interaction features, enabling a more balanced extraction of key features. CONCLUSIONS We propose the BASE web service, providing both the affinity prediction results of existing models and bias-reduced datasets. These resources contribute to the development of generalized and robust predictive models, enhancing the accuracy and reliability of DTA predictions in the drug discovery process. BASE is freely available online at https://synbi2024.kaist.ac.kr/base .
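The idea of building datasets with progressively reduced protein similarity between training and test sets can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure or thresholds BASE uses, so the k-mer Jaccard similarity here is a simple stand-in, and the function names, records, and sequences are hypothetical.

```python
def kmer_set(seq, k=3):
    """Set of overlapping k-mers in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity between two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def filter_training_set(train, test, max_similarity, k=3):
    """Keep training records whose protein is at most `max_similarity`
    similar to every test protein.

    Each record is (compound_id, protein_sequence, affinity).
    k-mer Jaccard is an assumed stand-in for a real sequence-similarity measure.
    """
    test_kmers = [kmer_set(protein, k) for _, protein, _ in test]
    kept = []
    for record in train:
        kmers = kmer_set(record[1], k)
        if all(jaccard(kmers, t) <= max_similarity for t in test_kmers):
            kept.append(record)
    return kept


# Toy records (hypothetical sequences and affinities).
train = [
    ("c1", "MKTAYIAKQR", 7.2),   # identical to the test protein
    ("c2", "MKTAYIAKQK", 6.9),   # one residue different
    ("c3", "GSHMLEDPVV", 5.1),   # unrelated
]
test = [("c4", "MKTAYIAKQR", 7.0)]

strict = filter_training_set(train, test, max_similarity=0.3)
loose = filter_training_set(train, test, max_similarity=0.9)
```

Lowering `max_similarity` removes more near-duplicate proteins from training, so a model evaluated on the test set can rely less on having seen a close homolog.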
Affiliation(s)
- Hyojin Son
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Sechan Lee
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Jaeuk Kim
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Haangik Park
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Myeong-Ha Hwang
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea
| | - Gwan-Su Yi
- Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon, Republic of Korea.
| |
|
15
|
Quan L, Wu J, Jiang Y, Pan D, Qiang L. DTA-GTOmega: Enhancing Drug-Target Binding Affinity Prediction with Graph Transformers Using OmegaFold Protein Structures. J Mol Biol 2024:168843. [PMID: 39481634 DOI: 10.1016/j.jmb.2024.168843] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 10/05/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024]
Abstract
Understanding drug-protein interactions is crucial for elucidating drug mechanisms and optimizing drug development. However, existing methods have limitations in representing the three-dimensional structure of targets and capturing the complex relationships between drugs and targets. This study proposes a new method, DTA-GTOmega, for predicting drug-target binding affinity. DTA-GTOmega utilizes OmegaFold to predict protein three-dimensional structures and construct target graphs, while processing drug SMILES sequences with RDKit to generate drug graphs. By employing multi-layer graph transformer modules and co-attention modules, this method effectively integrates atomic-level features of drugs and residue-level features of targets, accurately modeling the complex interactions between drugs and targets, thereby significantly improving the accuracy of binding affinity predictions. Our method outperforms existing techniques on benchmark datasets such as KIBA, Davis, and BindingDB_Kd under the cold-start setting. Moreover, DTA-GTOmega demonstrates competitive performance in real-world DTI scenarios involving DrugBank data and drug-target interactions related to cardiovascular and nervous system diseases, highlighting its robust generalization capabilities. Additionally, the introduced DTI evaluation metrics further validate DTA-GTOmega's potential in handling imbalanced data.
Affiliation(s)
- Lijun Quan
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China
| | - Jian Wu
- China Mobile (Suzhou) Software Technology Co., Ltd., Suzhou 215000, China
| | - Yelu Jiang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Deng Pan
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China
| | - Lyu Qiang
- School of Computer Science and Technology, Soochow University, Jiangsu 215006, China; Collaborative Innovation Center of Novel Software Technology and Industrialization, Jiangsu 210000, China.
| |
|
16
|
Tao W, Lin X, Liu Y, Zeng L, Ma T, Cheng N, Jiang J, Zeng X, Yuan S. Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction. BMC Biol 2024; 22:248. [PMID: 39468510 PMCID: PMC11520867 DOI: 10.1186/s12915-024-02049-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Accepted: 10/17/2024] [Indexed: 10/30/2024] [Open Access]
Abstract
BACKGROUND Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods aim to learn from the chemical structures of compounds and proteins yet ignore conceptual knowledge, that is, the interrelationships among the fundamental elements in the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information such as pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, the prevalence of knowledge-missing compounds and proteins is a critical barrier to injecting knowledge into data-driven models. RESULTS Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. BEACON learns consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge and predicts the missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with 5.1% and 6.6% performance gains on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. CONCLUSIONS Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction.
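Maximizing mutual information between two views is often done in practice by minimizing a contrastive bound such as InfoNCE, for which the mutual information is bounded below by log N minus the loss over a batch of N pairs. The abstract does not state which estimator BEACON uses, so the sketch below is an assumed, generic illustration rather than the authors' method; the embeddings and the `temperature` value are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def info_nce(struct_embs, know_embs, temperature=0.1):
    """Mean InfoNCE loss over a batch.

    struct_embs[i] (structure view) and know_embs[i] (knowledge view) form a
    positive pair; every other know_embs[j] in the batch acts as a negative.
    """
    n = len(struct_embs)
    total = 0.0
    for i in range(n):
        logits = [dot(struct_embs[i], know_embs[j]) / temperature for j in range(n)]
        m = max(logits)
        # Stable log-sum-exp over all candidates.
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / n


# Toy embeddings (hypothetical): aligned views versus deliberately mismatched views.
structure = [[1.0, 0.0], [0.0, 1.0]]
knowledge_aligned = [[1.0, 0.0], [0.0, 1.0]]
knowledge_shuffled = [[0.0, 1.0], [1.0, 0.0]]

loss_aligned = info_nce(structure, knowledge_aligned)
loss_shuffled = info_nce(structure, knowledge_shuffled)
mi_lower_bound = math.log(len(structure)) - loss_aligned
```

When the two views agree, the loss is close to zero and the bound approaches log N; mismatched views give a much larger loss.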
Affiliation(s)
- Wen Tao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Xuan Lin
- School of Computer Science, Xiangtan University, Xiangtan, 411105, Hunan, China
- Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China.
- Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105, Hunan, China.
| | - Li Zeng
- Department of AIDD, Shanghai Yuyao Biotechnology Co., Ltd., Shanghai, 201109, Shanghai, China
| | - Tengfei Ma
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Ning Cheng
- School of Informatics, Hunan University of Chinese Medicine, Changsha, 410208, Hunan, China
| | - Jing Jiang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082, Hunan, China
| | - Sisi Yuan
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, 28223, NC, USA.
| |
|
17
|
Zhao L, Wang H, Shi S. PocketDTA: an advanced multimodal architecture for enhanced prediction of drug-target affinity from 3D structural data of target binding pockets. Bioinformatics 2024; 40:btae594. [PMID: 39365726 PMCID: PMC11502498 DOI: 10.1093/bioinformatics/btae594] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2024] [Revised: 09/20/2024] [Accepted: 10/02/2024] [Indexed: 10/06/2024] [Open Access]
Abstract
MOTIVATION Accurately predicting the drug-target binding affinity (DTA) is crucial to drug discovery and repurposing. Although deep learning has been widely used in this field, it still faces challenges with insufficient generalization performance, inadequate use of 3D information, and poor interpretability. RESULTS To alleviate these problems, we developed the PocketDTA model. This model enhances generalization performance by using the pre-trained models ESM-2 and GraphMVP. It handles the top-3 target binding pockets and drug 3D information through customized GVP-GNN Layers and GraphMVP-Decoder. In addition, it uses a bilinear attention network to enhance interpretability. Comparative analysis with state-of-the-art (SOTA) methods on the optimized Davis and KIBA datasets reveals that the PocketDTA model exhibits significant performance advantages. Further, ablation studies confirm the effectiveness of the model components, whereas cold-start experiments illustrate its robust generalization capabilities. In particular, the PocketDTA model has shown significant advantages in identifying key drug functional groups and amino acid residues via molecular docking and literature validation, highlighting its strong potential for interpretability. AVAILABILITY AND IMPLEMENTATION Code and data are available at: https://github.com/zhaolongNCU/PocketDTA.
Affiliation(s)
- Long Zhao
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
| | - Hongmei Wang
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
| | - Shaoping Shi
- Department of Mathematics, School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330031, China
- Institute of Mathematics and Interdisciplinary Sciences, Nanchang University, Nanchang 330031, China
| |
|
18
|
Ahmed KT, Ansari MI, Zhang W. DTI-LM: language model powered drug-target interaction prediction. Bioinformatics 2024; 40:btae533. [PMID: 39221997 PMCID: PMC11520403 DOI: 10.1093/bioinformatics/btae533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 08/05/2024] [Accepted: 08/29/2024] [Indexed: 09/04/2024] [Open Access]
Abstract
MOTIVATION The identification and understanding of drug-target interactions (DTIs) play a pivotal role in the drug discovery and development process. Sequence representations of drugs and proteins in computational model offer advantages such as their widespread availability, easier input quality control, and reduced computational resource requirements. These make them an efficient and accessible tools for various computational biology and drug discovery applications. Many sequence-based DTI prediction methods have been developed over the years. Despite the advancement in methodology, cold start DTI prediction involving unknown drug or protein remains a challenging task, particularly for sequence-based models. Introducing DTI-LM, a novel framework leveraging advanced pretrained language models, we harness their exceptional context-capturing abilities along with neighborhood information to predict DTIs. DTI-LM is specifically designed to rely solely on sequence representations for drugs and proteins, aiming to bridge the gap between warm start and cold start predictions. RESULTS Large-scale experiments on four datasets show that DTI-LM can achieve state-of-the-art performance on DTI predictions. Notably, it excels in overcoming the common challenges faced by sequence-based models in cold start predictions for proteins, yielding impressive results. The incorporation of neighborhood information through a graph attention network further enhances prediction accuracy. Nevertheless, a disparity persists between cold start predictions for proteins and drugs. A detailed examination of DTI-LM reveals that language models exhibit contrasting capabilities in capturing similarities between drugs and proteins. AVAILABILITY AND IMPLEMENTATION Source code is available at: https://github.com/compbiolabucf/DTI-LM.
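The difference between warm start and cold start evaluation comes down to how the data are split: in a cold start split, every drug (or protein) in the test set is absent from training. The following minimal sketch shows one way to build such a split; it is an assumption for illustration and not taken from the DTI-LM code, and the function name, record layout, and toy identifiers are hypothetical.

```python
import random


def cold_start_split(pairs, entity_index, test_fraction=0.2, seed=0):
    """Split (drug, protein, label) records so held-out entities never
    appear in training.

    entity_index: 0 to hold out drugs (cold start for drugs),
                  1 to hold out proteins (cold start for proteins).
    """
    entities = sorted({record[entity_index] for record in pairs})
    rng = random.Random(seed)
    rng.shuffle(entities)
    n_test = max(1, round(len(entities) * test_fraction))
    held_out = set(entities[:n_test])
    train = [r for r in pairs if r[entity_index] not in held_out]
    test = [r for r in pairs if r[entity_index] in held_out]
    return train, test


# Toy interaction records (hypothetical identifiers and labels).
pairs = [
    (f"drug{d}", f"prot{p}", (d + p) % 2)
    for d in range(5)
    for p in range(4)
]

train_pairs, test_pairs = cold_start_split(pairs, entity_index=1, test_fraction=0.25)
train_proteins = {r[1] for r in train_pairs}
test_proteins = {r[1] for r in test_pairs}
```

A warm start split would instead shuffle the records themselves, so the same drugs and proteins can appear on both sides; holding out whole entities is what makes the cold start setting harder.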
Affiliation(s)
- Khandakar Tanvir Ahmed
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Md Istiaq Ansari
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
- Wei Zhang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, United States
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, United States
|
19
|
Zhang L, Zeng W, Chen J, Chen J, Li K. ParaCPI: A Parallel Graph Convolutional Network for Compound-Protein Interaction Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:1565-1578. [PMID: 38787671 DOI: 10.1109/tcbb.2024.3404889] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/26/2024]
Abstract
Identifying compound-protein interactions (CPIs) is critical in drug discovery, as accurate prediction of CPIs can remarkably reduce the time and cost of new drug development. The rapid growth of existing biological knowledge has opened up possibilities for leveraging known biological knowledge to predict unknown CPIs. However, existing CPI prediction models still fall short of meeting the needs of practical drug discovery applications. A novel parallel graph convolutional network model for CPI prediction (ParaCPI) is proposed in this study. This model constructs feature representation of compounds using a unique approach to predict unknown CPIs from known CPI data more effectively. Experiments are conducted on five public datasets, and the results are compared with current state-of-the-art (SOTA) models under three different experimental settings to evaluate the model's performance. In the three cold-start settings, ParaCPI achieves an average performance gain of 26.75%, 23.84%, and 14.68% in terms of area under the curve compared with the other SOTA models. In addition, the results of the experiments in the case study show ParaCPI's superior ability to predict unknown CPIs based on known data, with higher accuracy and stronger generalization compared with the SOTA models. Researchers can leverage ParaCPI to accelerate the drug discovery process.
|
20
|
Peng L, Liu X, Chen M, Liao W, Mao J, Zhou L. MGNDTI: A Drug-Target Interaction Prediction Framework Based on Multimodal Representation Learning and the Gating Mechanism. J Chem Inf Model 2024; 64:6684-6698. [PMID: 39137398 DOI: 10.1021/acs.jcim.4c00957] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2024]
Abstract
Drug-Target Interaction (DTI) prediction facilitates acceleration of drug discovery and promotes drug repositioning. Most existing deep learning-based DTI prediction methods can better extract discriminative features for drugs and proteins, but they rarely consider multimodal features of drugs. Moreover, learning the interaction representations between drugs and targets needs further exploration. Here, we proposed a simple Multi-modal Gating Network for DTI prediction, MGNDTI, based on multimodal representation learning and the gating mechanism. MGNDTI first learns the sequence representations of drugs and targets using different retentive networks. Next, it extracts molecular graph features of drugs through a graph convolutional network. Subsequently, it devises a multimodal gating network to obtain the joint representations of drugs and targets. Finally, it builds a fully connected network for computing the interaction probability. MGNDTI was benchmarked against seven state-of-the-art DTI prediction models (CPI-GNN, TransformerCPI, MolTrans, BACPI, CPGL, GIFDTI, and FOTF-CPI) using four data sets (i.e., Human, C. elegans, BioSNAP, and BindingDB) under four different experimental settings. Through evaluation with AUROC, AUPRC, accuracy, F1 score, and MCC, MGNDTI significantly outperformed the above seven methods. MGNDTI is a powerful tool for DTI prediction, showcasing its superior robustness and generalization ability on diverse data sets and different experimental settings. It is freely available at https://github.com/plhhnu/MGNDTI.
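The abstract does not specify the form of the gating network, so the following is only a generic sketch of a gating mechanism for fusing two feature vectors, not the MGNDTI implementation. The function name `gated_fusion` and the per-dimension weights `w_seq`, `w_graph`, and `bias` are hypothetical; the assumed gate is g = sigmoid(w_seq * s + w_graph * h + b), with the output g * s + (1 - g) * h.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(seq_feat, graph_feat, w_seq, w_graph, bias):
    """Fuse two equal-length feature vectors with an element-wise learned gate."""
    lengths = {len(seq_feat), len(graph_feat), len(w_seq), len(w_graph), len(bias)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same length")
    fused = []
    for s, h, ws, wg, b in zip(seq_feat, graph_feat, w_seq, w_graph, bias):
        # The gate decides, per dimension, how much of each modality to keep.
        gate = sigmoid(ws * s + wg * h + b)
        fused.append(gate * s + (1.0 - gate) * h)
    return fused


# With zero weights and bias the gate is 0.5, so the result is the plain average.
joint = gated_fusion([1.0, 2.0], [3.0, 6.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])
```

Because the gate lies strictly between 0 and 1, each fused value always falls between the two modality values for that dimension; in a trained model the weights would be learned rather than fixed.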
Affiliation(s)
- Lihong Peng
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Xin Liu
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Min Chen
  - School of Computer Science and Engineering, Hunan Institute of Technology, Hengyang, Hunan 421002, China
- Wen Liao
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Jiale Mao
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan 412007, China
- Liqian Zhou
  - College of Life Science and Chemistry, Hunan University of Technology, Zhuzhou, Hunan 412007, China
|
21
|
Hao Y, Li B, Huang D, Wu S, Wang T, Fu L, Liu X. Developing a Semi-Supervised Approach Using a PU-Learning-Based Data Augmentation Strategy for Multitarget Drug Discovery. Int J Mol Sci 2024; 25:8239. [PMID: 39125808 PMCID: PMC11312053 DOI: 10.3390/ijms25158239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 07/26/2024] [Accepted: 07/26/2024] [Indexed: 08/12/2024] Open
Abstract
Multifactorial diseases demand therapeutics that can modulate multiple targets for enhanced safety and efficacy, yet the clinical approval of multitarget drugs remains rare. The integration of machine learning (ML) and deep learning (DL) in drug discovery has revolutionized virtual screening. This study investigates the synergy between ML/DL methodologies, molecular representations, and data augmentation strategies. Notably, we found that SVM can match or even surpass the performance of state-of-the-art DL methods. However, conventional data augmentation often involves a trade-off between the true positive rate and false positive rate. To address this, we introduce Negative-Augmented PU-bagging (NAPU-bagging) SVM, a novel semi-supervised learning framework. By leveraging ensemble SVM classifiers trained on resampled bags containing positive, negative, and unlabeled data, our approach is capable of managing false positive rates while maintaining high recall rates. We applied this method to the identification of multitarget-directed ligands (MTDLs), where high recall rates are critical for compiling a list of interaction candidate compounds. Case studies demonstrate that NAPU-bagging SVM can identify structurally novel MTDL hits for ALK-EGFR with favorable docking scores and binding modes, as well as pan-agonists for dopamine receptors. The NAPU-bagging SVM methodology should serve as a promising avenue to virtual screening, especially for the discovery of MTDLs.
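The bagging idea described in this abstract can be sketched as follows. This is a simplified illustration, not the NAPU-bagging SVM method itself: a nearest-centroid scorer is swapped in for the SVM base learner to keep the example dependency-free, the unlabeled points drawn into each bag are treated as provisional negatives (an assumption about the resampling scheme), and all names (`pu_bagging_scores`, `bag_size`, the toy coordinates) are hypothetical.

```python
import random


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_centroid_scorer(pos, neg):
    """Stand-in base learner: score > 0 when x is closer to the positive centroid."""
    c_pos, c_neg = centroid(pos), centroid(neg)
    return lambda x: sq_dist(x, c_neg) - sq_dist(x, c_pos)


def pu_bagging_scores(positives, negatives, unlabeled, queries,
                      n_bags=25, bag_size=3, seed=0):
    """Average the scores of base learners trained on resampled bags."""
    rng = random.Random(seed)
    totals = [0.0] * len(queries)
    for _ in range(n_bags):
        # Each bag uses the labeled data plus a random draw of unlabeled
        # points that are provisionally treated as negatives.
        drawn = rng.sample(unlabeled, min(bag_size, len(unlabeled)))
        scorer = fit_centroid_scorer(positives, negatives + drawn)
        for i, q in enumerate(queries):
            totals[i] += scorer(q)
    return [t / n_bags for t in totals]


# Toy 2-D data, for illustration only.
positives = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2)]
negatives = [(-1.0, -1.0), (-1.2, -0.9)]
unlabeled = [(0.9, 0.9), (-0.9, -1.1), (-1.1, -0.8),
             (1.1, 1.0), (-0.8, -1.0), (-1.0, -1.2)]
scores = pu_bagging_scores(positives, negatives, unlabeled,
                           queries=[(0.9, 1.1), (-1.0, -0.8)])
```

Averaging over many bags dampens the effect of hidden positives that happen to be drawn as provisional negatives in any single bag, which is the intuition behind controlling false positives while keeping recall high.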
Affiliation(s)
- Yang Hao
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Bo Li
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Daiyun Huang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - School of Life Sciences, Fudan University, Shanghai 200092, China
- Sijin Wu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Tianjun Wang
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZX, UK
- Lei Fu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
- Xin Liu
  - Wisdom Lake Academy of Pharmacy, Xi’an Jiaotong-Liverpool University, Suzhou 215123, China
|
22
|
Wu H, Liu J, Zhang R, Lu Y, Cui G, Cui Z, Ding Y. A review of deep learning methods for ligand based drug virtual screening. Fundamental Research 2024; 4:715-737. [PMID: 39156568 PMCID: PMC11330120 DOI: 10.1016/j.fmre.2024.02.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2023] [Revised: 01/10/2024] [Accepted: 02/18/2024] [Indexed: 08/20/2024] Open
Abstract
Drug discovery is costly and time consuming, and modern drug discovery endeavors are progressively reliant on computational methodologies, aiming to mitigate temporal and financial expenditures associated with the process. In particular, the time required for vaccine and drug discovery is prolonged during emergency situations such as the coronavirus 2019 pandemic. Recently, the performance of deep learning methods in drug virtual screening has been particularly prominent. It has become a concern for researchers how to summarize the existing deep learning in drug virtual screening, select different models for different drug screening problems, exploit the advantages of deep learning models, and further improve the capability of deep learning in drug virtual screening. This review first introduces the basic concepts of drug virtual screening, common datasets, and data representation methods. Then, large numbers of common deep learning methods for drug virtual screening are compared and analyzed. In addition, a dataset of different sizes is constructed independently to evaluate the performance of each deep learning model for the difficult problem of large-scale ligand virtual screening. Finally, the existing challenges and future directions in the field of virtual screening are presented.
Affiliation(s)
- Hongjie Wu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Junkai Liu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Runhua Zhang
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yaoyao Lu
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Guozeng Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Zhiming Cui
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
|
23
|
Feng BM, Zhang YY, Zhou XC, Wang JL, Feng YF. MolLoG: A Molecular Level Interpretability Model Bridging Local to Global for Predicting Drug Target Interactions. J Chem Inf Model 2024; 64:4348-4358. [PMID: 38709146 DOI: 10.1021/acs.jcim.4c00171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/07/2024]
Abstract
Developing new pharmaceuticals is a costly and time-consuming endeavor fraught with significant safety risks. A critical aspect of drug research and disease therapy is discerning the existence of interactions between drugs and proteins. In recent years, advances in deep learning (DL) have aided this task remarkably. Yet, two challenges remain: (i) balancing the extraction of deep, locally cohesive characteristics while warding off vanishing gradients and (ii) globally representing and understanding the interactions between the drug and target local attributes, which is vital for delivering molecular level insights indispensable to drug development. In response to these challenges, we propose a DL network structure, MolLoG, primarily comprising two modules: local feature encoders (LFE) and global interactive learning (GIL). Within the LFE module, graph convolution networks and leap blocks capture the local features of drug and protein molecules, respectively. The GIL module enables the efficient amalgamation of feature information, facilitating the global learning of feature structural semantics and procuring multihead attention weights for abstract features stemming from two modalities, providing biologically pertinent explanations for black-box results. Finally, predictive outcomes are achieved by decoding the unified representation via a multilayer perceptron. Our experimental analysis reveals that MolLoG outperforms several cutting-edge baselines across four data sets, delivering superior overall performance and providing satisfactory results when elucidating various facets of drug-target interaction predictions.
Affiliation(s)
- Bao-Ming Feng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Yuan-Yuan Zhang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Xiao-Chen Zhou
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Jin-Long Wang
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
- Yin-Fei Feng
  - School of Information and Control Engineering, Qingdao University of Technology, Qingdao, 266520 Shandong, China
|
24
|
Zhang S, Tian X, Chen C, Su Y, Huang W, Lv X, Chen C, Li H. AIGO-DTI: Predicting Drug-Target Interactions Based on Improved Drug Properties Combined with Adaptive Iterative Algorithms. J Chem Inf Model 2024; 64:4373-4384. [PMID: 38743013 DOI: 10.1021/acs.jcim.4c00584] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/16/2024]
Abstract
Artificial intelligence-based methods for predicting drug-target interactions (DTIs) aim to explore reliable drug candidate targets rapidly and cost-effectively to accelerate the drug development process. However, current methods are often limited by the topological regularities of drug molecules, making them difficult to generalize to a broader chemical space. Additionally, the use of similarity to measure DTI network links often introduces noise, leading to false DTI relationships and affecting the prediction accuracy. To address these issues, this study proposes an Adaptive Iterative Graph Optimization (AIGO)-DTI prediction framework. This framework integrates atomic cluster information and enhances molecular features through the design of functional group prompts and graph encoders, optimizing the construction of DTI association networks. Furthermore, the optimization of graph structure is transformed into a node similarity learning problem, utilizing multihead similarity metric functions to iteratively update the network structure to improve the quality of DTI information. Experimental results demonstrate the outstanding performance of AIGO-DTI on multiple public data sets and label reversal data sets. Case studies, molecular docking, and existing research validate its effectiveness and reliability. Overall, the method proposed in this study can construct comprehensive and reliable DTI association network information, providing new graphing and optimization strategies for DTI prediction, which contribute to efficient drug development and reduce target discovery costs.
Affiliation(s)
- Sizhe Zhang
  - College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Xuecong Tian
  - College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Chen Chen
  - College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Ying Su
  - College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Wanhua Huang
  - College of Information Science and Engineering, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Xiaoyi Lv
  - College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Cheng Chen
  - College of Software, Xinjiang University, Urumqi, 830046 Xinjiang, China
- Hongyi Li
  - Xinjiang University, Urumqi, 830046 Xinjiang, China
|
25
|
Rao J, Xie J, Yuan Q, Liu D, Wang Z, Lu Y, Zheng S, Yang Y. A variational expectation-maximization framework for balanced multi-scale learning of protein and drug interactions. Nat Commun 2024; 15:4476. [PMID: 38796523 PMCID: PMC11530528 DOI: 10.1038/s41467-024-48801-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Accepted: 05/14/2024] [Indexed: 05/28/2024] Open
Abstract
Protein functions are characterized by interactions with proteins, drugs, and other biomolecules. Understanding these interactions is essential for deciphering the molecular mechanisms underlying biological processes and developing new therapeutic strategies. Current computational methods mostly predict interactions based on either molecular network or structural information, without integrating them within a unified multi-scale framework. While a few multi-view learning methods are devoted to fusing the multi-scale information, these methods tend to rely heavily on a single scale and under-fit the others, likely because of the imbalanced nature and inherent greediness of multi-scale learning. To alleviate the optimization imbalance, we present MUSE, a multi-scale representation learning framework based on a variant expectation maximization to optimize different scales in an alternating procedure over multiple iterations. This strategy efficiently fuses multi-scale information between atomic structure and molecular network scale through mutual supervision and iterative optimization. MUSE outperforms the current state-of-the-art models not only in molecular interaction (protein-protein, drug-protein, and drug-drug) tasks but also in protein interface prediction at the atomic structure scale. More importantly, the multi-scale learning framework shows potential for extension to other scales of computational drug discovery.
Affiliation(s)
- Jiahua Rao
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Jiancong Xie
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Deqin Liu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Zhen Wang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Yutong Lu
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
- Shuangjia Zheng
  - Global Institute of Future Technology, Shanghai Jiao Tong University, Shanghai, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Sun Yat-sen University, Guangzhou, China
  - State Key Laboratory of Oncology in South China, Sun Yat-sen University, Guangzhou, China
|
26
|
Goles M, Daza A, Cabas-Mora G, Sarmiento-Varón L, Sepúlveda-Yañez J, Anvari-Kazemabad H, Davari MD, Uribe-Paredes R, Olivera-Nappa Á, Navarrete MA, Medina-Ortiz D. Peptide-based drug discovery through artificial intelligence: towards an autonomous design of therapeutic peptides. Brief Bioinform 2024; 25:bbae275. [PMID: 38856172 PMCID: PMC11163380 DOI: 10.1093/bib/bbae275] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Revised: 04/23/2024] [Accepted: 06/04/2024] [Indexed: 06/11/2024] Open
Abstract
With their diverse biological activities, peptides are promising candidates for therapeutic applications, showing antimicrobial, antitumour and hormonal signalling capabilities. Despite their advantages, therapeutic peptides face challenges such as short half-life, limited oral bioavailability and susceptibility to plasma degradation. The rise of computational tools and artificial intelligence (AI) in peptide research has spurred the development of advanced methodologies and databases that are pivotal in the exploration of these complex macromolecules. This perspective delves into integrating AI in peptide development, encompassing classifier methods, predictive systems and the avant-garde design facilitated by deep-generative models like generative adversarial networks and variational autoencoders. There are still challenges, such as the need for processing optimization and careful validation of predictive models. This work outlines traditional strategies for machine learning model construction and training techniques and proposes a comprehensive AI-assisted peptide design and validation pipeline. The evolving landscape of peptide design using AI is emphasized, showcasing the practicality of these methods in expediting the development and discovery of novel peptides within the context of peptide-based drug discovery.
Affiliation(s)
- Montserrat Goles
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
  - Departamento de Ingeniería Química, Biotecnología y Materiales, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Anamaría Daza
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Gabriel Cabas-Mora
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Lindybeth Sarmiento-Varón
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile
- Julieta Sepúlveda-Yañez
  - Facultad de Ciencias de la Salud, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Hoda Anvari-Kazemabad
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Mehdi D Davari
  - Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, 06120, Halle, Germany
- Roberto Uribe-Paredes
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- Álvaro Olivera-Nappa
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
- Marcelo A Navarrete
  - Centro Asistencial de Docencia e Investigación, CADI, Universidad de Magallanes, Av. Los Flamencos 01364, 6210005, Punta Arenas, Chile
  - Escuela de Medicina, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
- David Medina-Ortiz
  - Departamento de Ingeniería en Computación, Universidad de Magallanes, Av. Pdte. Manuel Bulnes 01855, 6210427, Punta Arenas, Chile
  - Centre for Biotechnology and Bioengineering, CeBiB, Universidad de Chile, Beauchef 851, 8370456, Santiago, Chile
|
27
|
Tan D, Jiang H, Li H, Xie Y, Su Y. Prediction of drug-protein interaction based on dual channel neural networks with attention mechanism. Brief Funct Genomics 2024; 23:286-294. [PMID: 37642213 DOI: 10.1093/bfgp/elad037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Revised: 07/16/2023] [Accepted: 08/08/2023] [Indexed: 08/31/2023] Open
Abstract
The precise identification of drug-protein interactions (DPIs) can significantly speed up the drug discovery process. Bioassay methods are too time-consuming and expensive to screen every drug-protein pair. Machine-learning-based methods cannot accurately predict a large number of DPIs. Compared with traditional computing methods, deep learning methods need less domain knowledge and have strong data learning ability. In this study, we construct a DPI prediction model based on dual channel neural networks with an efficient path attention mechanism, called DCA-DPI. The drug molecular graph and protein sequence are used as the data input of the model, and the residual graph neural network and the residual convolution network are used to learn the feature representation of the drug and protein, respectively, to obtain the feature vector of the drug and the hidden vectors of the protein. To get a more accurate protein feature vector, a weighted sum of the protein hidden vectors is computed using the neural attention mechanism. In the end, the drug and protein vectors are concatenated and input into the fully connected layer for classification. To evaluate the performance of DCA-DPI, three widely used public datasets, Human, C. elegans and DUD-E, are used in the experiment. The evaluation metric values in the experiment are superior to those of other relevant methods. Experiments show that our model is efficient for DPI prediction.
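The attention-weighted sum of protein hidden vectors described in this abstract can be sketched in a few lines. This is a generic dot-product attention pooling, not the DCA-DPI code: the function name `attention_pool` is hypothetical, and using a single query vector (for example, the drug feature vector) to score each protein position is an assumption.

```python
import math


def attention_pool(hidden, query):
    """Return the softmax-weighted sum of hidden vectors and the weights used."""
    # Score each position by its dot product with the query vector.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden]
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(hidden[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden)) for d in range(dim)]
    return pooled, weights


# A zero query scores every position equally, so pooling reduces to the mean.
pooled, weights = attention_pool([[1.0, 0.0], [3.0, 4.0]], [0.0, 0.0])
```

The pooled protein vector would then be concatenated with the drug vector before the fully connected classification layer, as the abstract describes.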
Affiliation(s)
- Dayu Tan
  - Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, 230601, Hefei, China
- Haijun Jiang
  - Key Laboratory of Intelligent Computing and Signal Processing, School of Computer Science and Technology, Anhui University, 111 Jiulong Road, 230601, Hefei, China
- Haitao Li
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Hefei, China
- Ying Xie
  - School of Mechanical, Electrical and Information Engineering, Putian University, China
- Yansen Su
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, 111 Jiulong Road, 230601, Hefei, China
|
28
|
Du W, Zhao L, Wu R, Huang B, Liu S, Liu Y, Huang H, Shi G. Predicting drug-Protein interaction with deep learning framework for molecular graphs and sequences: Potential candidates against SAR-CoV-2. PLoS One 2024; 19:e0299696. [PMID: 38728335 PMCID: PMC11086825 DOI: 10.1371/journal.pone.0299696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 02/14/2024] [Indexed: 05/12/2024] Open
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) caused the COVID-19 disease, which represents a new life-threatening disaster. Many therapeutics against the viral infection, such as vaccines and receptor decoys, have been investigated to alleviate the epidemic. However, the continuously mutating coronavirus, especially the Delta and Omicron variants, tends to invalidate therapeutic biological products. Thus, it is necessary to develop molecular entities as broad-spectrum antiviral drugs. Coronavirus replication is controlled by the viral 3-chymotrypsin-like cysteine protease (3CLpro) enzyme, which is required for the virus's life cycle. In the cases of severe acute respiratory syndrome coronavirus (SARS-CoV) and Middle East respiratory syndrome coronavirus (MERS-CoV), 3CLpro has been shown to be a promising therapeutic development target. Here we proposed an attention-based deep learning framework for molecular graphs and sequences, trained on the BindingDB 3CLpro dataset (114,555 compounds). After constructing this model, we conducted large-scale screening of the in vivo/vitro dataset (276,003 compounds) from the Zinc Database and visualized the candidate compounds with attention scores. Geometric-based affinity prediction was employed for validation. Finally, we established a 3CLpro-specific deep learning framework, namely GraphDPI-3CL (AUROC: 0.958), which achieved performance superior to the existing state-of-the-art model and discovered 10 molecules with a high binding affinity for 3CLpro and a superior binding mode.
Affiliation(s)
- Weian Du
  - Department of Dermatology, Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Liang Zhao
  - Shenzhen Health Development Research and Data Management Center, Shenzhen, China
- Rong Wu
  - Department of Dermatology, Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Boning Huang
  - School of Finance, Shanghai University of Finance and Economics, Shanghai, China
- Si Liu
  - Department of Cosmetic and Plastic Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Yufeng Liu
  - Department of Cosmetic and Plastic Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
- Huaiqiu Huang
  - Department of Dermatology, Third Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
- Ge Shi
  - Department of Cosmetic and Plastic Surgery, The Sixth Affiliated Hospital, Sun Yat-sen University, Guangzhou, China
|
29
|
Xia Y, Pan X, Shen HB. Heterogeneous sampled subgraph neural networks with knowledge distillation to enhance double-blind compound-protein interaction prediction. Structure 2024; 32:611-620.e4. [PMID: 38447575 DOI: 10.1016/j.str.2024.02.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 12/18/2023] [Accepted: 02/08/2024] [Indexed: 03/08/2024]
Abstract
Identifying binding compounds against a target protein is crucial for large-scale virtual screening in drug development. Recently, network-based methods have been developed for compound-protein interaction (CPI) prediction. However, they are difficult to apply to unseen (i.e., never-seen-before) proteins and compounds. In this study, we propose SgCPI, which incorporates local known interaction networks to predict CPIs. SgCPI randomly samples the local CPI network of the query compound-protein pair as a subgraph and applies a heterogeneous graph neural network (HGNN) to embed the active/inactive message of the subgraph. For unseen compounds and proteins, SgCPI-KD takes SgCPI as the teacher model and distills its knowledge by estimating the potential neighbors. Experimental results indicate: (1) the sampled subgraphs of the CPI network introduce efficient knowledge for unseen molecular prediction with the HGNNs, and (2) the knowledge distillation strategy is beneficial to the double-blind interaction prediction by estimating molecular neighbors and distilling knowledge.
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| |
|
30
|
Pan F, Yin C, Liu SQ, Huang T, Bian Z, Yuen PC. BindingSiteDTI: differential-scale binding site modelling for drug-target interaction prediction. Bioinformatics 2024; 40:btae308. [PMID: 38730554 PMCID: PMC11256917 DOI: 10.1093/bioinformatics/btae308] [Received: 11/14/2023] [Revised: 03/06/2024] [Accepted: 05/09/2024] [Indexed: 05/13/2024] Open
Abstract
MOTIVATION: Enhanced by contemporary computational advances, the prediction of drug-target interactions (DTIs) has become crucial in developing de novo and effective drugs. Existing deep learning approaches to DTI prediction are frequently beleaguered by a tendency to overfit specific molecular representations, which significantly impedes their predictive reliability and utility in novel drug discovery contexts. Furthermore, existing DTI networks often disregard the molecular size variance between macro molecules (targets) and micro molecules (drugs) by treating them at an equivalent scale, which undermines the accurate elucidation of their interaction. RESULTS: We propose a novel DTI network, named BindingSiteDTI, with a differential-scale scheme that models the binding site to enhance DTI prediction. It explicitly extracts multiscale substructures from targets with different scales of molecular size and fixed-scale substructures from drugs, facilitating the identification of structurally similar substructural tokens, and models the concealed relationships at the substructural level to construct interaction features. Experiments conducted on popular benchmarks, including DUD-E, human, and BindingDB, show that BindingSiteDTI achieves significant improvements compared with recent DTI prediction methods. AVAILABILITY AND IMPLEMENTATION: The source code of BindingSiteDTI can be accessed at https://github.com/MagicPF/BindingSiteDTI.
Affiliation(s)
- Feng Pan
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Chong Yin
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Si-Qi Liu
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
- Shenzhen Research Institute of Big Data, The Chinese University of Hong Kong (Shenzhen), 518172, China
| | - Tao Huang
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Zhaoxiang Bian
- School of Chinese Medicine, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| | - Pong Chi Yuen
- Department of Computer Science, Hong Kong Baptist University, Kowloon, 999077, Hong Kong
| |
|
31
|
Gao M, Zhang D, Chen Y, Zhang Y, Wang Z, Wang X, Li S, Guo Y, Webb GI, Nguyen ATN, May L, Song J. GraphormerDTI: A graph transformer-based approach for drug-target interaction prediction. Comput Biol Med 2024; 173:108339. [PMID: 38547658 DOI: 10.1016/j.compbiomed.2024.108339] [Received: 11/19/2023] [Revised: 03/05/2024] [Accepted: 03/17/2024] [Indexed: 04/17/2024]
Abstract
The application of Artificial Intelligence (AI) to screen drug molecules with potential therapeutic effects has revolutionized the drug discovery process, with significantly lower economic cost and time consumption than the traditional drug discovery pipeline. With the great power of AI, it is possible to rapidly search the vast chemical space for potential drug-target interactions (DTIs) between candidate drug molecules and disease protein targets. However, only a small proportion of molecules have labelled DTIs, consequently limiting the performance of AI-based drug screening. To solve this problem, a machine learning-based approach with great ability to generalize DTI prediction across molecules is desirable. Many existing machine learning approaches for DTI identification failed to exploit the full information with respect to the topological structures of candidate molecules. To develop a better approach for DTI prediction, we propose GraphormerDTI, which employs the powerful Graph Transformer neural network to model molecular structures. GraphormerDTI embeds molecular graphs into vector-format representations through iterative Transformer-based message passing, which encodes molecules' structural characteristics by node centrality encoding, node spatial encoding and edge encoding. With a strong structural inductive bias, the proposed GraphormerDTI approach can effectively infer informative representations for out-of-sample molecules and as such, it is capable of predicting DTIs across molecules with an exceptional performance. GraphormerDTI integrates the Graph Transformer neural network with a 1-dimensional Convolutional Neural Network (1D-CNN) to extract the drugs' and target proteins' representations and leverages an attention mechanism to model the interactions between them. 
To examine GraphormerDTI's performance for DTI prediction, we conduct experiments on three benchmark datasets, where GraphormerDTI achieves superior performance over five state-of-the-art baselines for out-of-molecule DTI prediction, including GNN-CPI, GNN-PT, DeepEmbedding-DTI, MolTrans and HyperAttentionDTI, and is on par with the best baseline for transductive DTI prediction. The source code and datasets are publicly accessible at https://github.com/mengmeng34/GraphormerDTI.
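The node centrality and spatial encodings named in this abstract are built from two graph quantities: each atom's degree and the shortest-path distance between every pair of atoms. The sketch below computes both for a toy molecular graph. The function name `structural_encodings` is hypothetical, and the abstract does not state how GraphormerDTI maps these integers to learned embeddings, so that step is left out.

```python
from collections import deque

def structural_encodings(num_atoms, bonds):
    """Return (degree, spd) for an undirected molecular graph.

    degree[i]  : number of bonds on atom i (input to a centrality encoding).
    spd[i][j]  : number of bonds on the shortest path from i to j,
                 or -1 if j is unreachable (input to a spatial encoding).
    """
    adj = {i: [] for i in range(num_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    degree = [len(adj[i]) for i in range(num_atoms)]

    spd = [[-1] * num_atoms for _ in range(num_atoms)]
    for src in range(num_atoms):
        # Breadth-first search gives shortest paths in an unweighted graph.
        spd[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if spd[src][v] == -1:
                    spd[src][v] = spd[src][u] + 1
                    queue.append(v)
    return degree, spd

# Heavy atoms of ethanol, C-C-O, indexed 0, 1, 2.
degree, spd = structural_encodings(3, [(0, 1), (1, 2)])
```

In Graphormer-style models these integers typically index learnable lookup tables: the degree embedding is added to the node features and the distance-dependent term is added to the attention scores.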
Affiliation(s)
- Mengmeng Gao
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China
| | - Daokun Zhang
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia.
| | - Yi Chen
- School of Biological Science and Medical Engineering, Southeast University, Nanjing, China.
| | - Yiwen Zhang
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Zhikang Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Xiaoyu Wang
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
| | - Shanshan Li
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Yuming Guo
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC, 3004, Australia
| | - Geoffrey I Webb
- Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Melbourne, Australia
| | - Anh T N Nguyen
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Lauren May
- Drug Discovery Biology Theme, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, Australia
| | - Jiangning Song
- Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
| |
|
32
|
Svensson E, Hoedt PJ, Hochreiter S, Klambauer G. HyperPCM: Robust Task-Conditioned Modeling of Drug-Target Interactions. J Chem Inf Model 2024; 64:2539-2553. [PMID: 38185877 PMCID: PMC11005051 DOI: 10.1021/acs.jcim.3c01417] [Received: 09/04/2023] [Revised: 11/27/2023] [Accepted: 11/27/2023] [Indexed: 01/09/2024]
Abstract
A central problem in drug discovery is to identify the interactions between drug-like compounds and protein targets. Over the past few decades, various quantitative structure-activity relationship (QSAR) and proteo-chemometric (PCM) approaches have been developed to model and predict these interactions. While QSAR approaches solely utilize representations of the drug compound, PCM methods incorporate both representations of the protein target and the drug compound, enabling them to achieve above-chance predictive accuracy on previously unseen protein targets. Both QSAR and PCM approaches have recently been improved by machine learning and deep neural networks, which allow the development of drug-target interaction prediction models from measurement data. However, deep neural networks typically require large amounts of training data and cannot robustly adapt to new tasks, such as predicting interactions for unseen protein targets at inference time. In this work, we propose to use HyperNetworks to efficiently transfer information between tasks during inference and thus to accurately predict drug-target interactions on unseen protein targets. Our HyperPCM method reaches state-of-the-art performance compared to previous methods on multiple well-known benchmarks, including Davis, DUD-E, and a ChEMBL-derived data set, and particularly excels at zero-shot inference involving unseen protein targets. Our method, as well as reproducible data preparation, is available at https://github.com/ml-jku/hyper-dti.
Affiliation(s)
- Emma Svensson
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Molecular AI, Discovery Sciences, R&D, AstraZeneca, Gothenburg, 431 83, Sweden
| | - Pieter-Jan Hoedt
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| | - Sepp Hochreiter
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
- Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna 1030, Austria
| | - Günter Klambauer
- ELLIS Unit Linz & Institute for Machine Learning, Johannes Kepler University, Linz 4040, Austria
| |
|
33
|
Zeng X, Chen W, Lei B. CAT-DTI: cross-attention and Transformer network with domain adaptation for drug-target interaction prediction. BMC Bioinformatics 2024; 25:141. [PMID: 38566002 PMCID: PMC11264959 DOI: 10.1186/s12859-024-05753-2] [Received: 02/09/2024] [Accepted: 03/19/2024] [Indexed: 04/04/2024] Open
Abstract
Accurate and efficient prediction of drug-target interaction (DTI) is critical to advance drug development and reduce the cost of drug discovery. Recently, the employment of deep learning methods has enhanced DTI prediction precision and efficacy, but it still encounters several challenges. The first challenge lies in the efficient learning of drug and protein feature representations alongside their interaction features to enhance DTI prediction. Another important challenge is to improve the generalization capability of the DTI model within real-world scenarios. To address these challenges, we propose CAT-DTI, a model based on cross-attention and Transformer, possessing domain adaptation capability. CAT-DTI effectively captures the drug-target interactions while adapting to out-of-distribution data. Specifically, we use a convolution neural network combined with a Transformer to encode the distance relationship between amino acids within protein sequences and employ a cross-attention module to capture the drug-target interaction features. Generalization to new DTI prediction scenarios is achieved by leveraging a conditional domain adversarial network, aligning DTI representations under diverse distributions. Experimental results within in-domain and cross-domain scenarios demonstrate that CAT-DTI model overall improves DTI prediction performance compared with previous methods.
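The cross-attention module mentioned in this abstract can be illustrated with plain scaled dot-product attention in which the queries come from one molecule and the keys and values from the other. This is a generic sketch rather than the CAT-DTI architecture: the learned query, key, and value projections are omitted, and the function names and toy vectors are assumptions.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention across two modalities.

    queries     : feature vectors of one modality (e.g. drug tokens).
    keys/values : feature vectors of the other (e.g. protein residues).
    Returns one attended vector per query.
    """
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out

# One drug token attends over two protein residues.
drug_tokens = [[1.0, 0.0]]
protein_residues = [[1.0, 0.0], [0.0, 1.0]]
attended = cross_attention(drug_tokens, protein_residues, protein_residues)
```

Because the drug token is more similar to the first residue, the attended vector puts more weight on that residue's features.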
Affiliation(s)
- Xiaoting Zeng
- School of Computer and Software, Shenzhen University, Shenzhen, 518060, China
| | - Weilin Chen
- Marshall Laboratory of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, 518055, China.
| | - Baiying Lei
- School of Biomedical Engineering, Shenzhen University, Shenzhen, 518055, China.
| |
|
34
|
Wang J, Chen S, Yuan Q, Chen J, Li D, Wang L, Yang Y. Predicting the effects of mutations on protein solubility using graph convolution network and protein language model representation. J Comput Chem 2024; 45:436-445. [PMID: 37933773 DOI: 10.1002/jcc.27249] [Received: 06/26/2023] [Revised: 10/11/2023] [Accepted: 10/21/2023] [Indexed: 11/08/2023]
Abstract
Solubility is one of the most important properties of proteins. Protein solubility can be greatly changed by single amino acid mutations, and reduced protein solubility can lead to diseases. Since experimental methods to determine solubility are time-consuming and expensive, in-silico methods have been developed to predict the protein solubility changes caused by mutations, mostly through protein evolution information. However, these methods are slow since it takes a long time to obtain evolution information through multiple sequence alignment. In addition, these methods perform poorly because they do not fully utilize protein 3D structures, owing to a lack of experimental structures for most proteins. Here, we proposed a sequence-based method, DeepMutSol, to predict solubility change from residual mutations based on the Graph Convolutional Neural Network (GCN), where the protein graph was initiated according to the protein structure predicted by AlphaFold2, and the nodes (residues) were represented by protein language embeddings. To circumvent the small data of solubility changes, we further pretrained the model over absolute protein solubility. DeepMutSol was shown to outperform state-of-the-art methods in benchmark tests. In addition, we applied the method to clinically relevant genes from the ClinVar database, and the predicted solubility changes were shown to be able to separate pathogenic mutations. All of the data sets and the source code are available at https://github.com/biomed-AI/DeepMutSol.
Affiliation(s)
- Jing Wang
- Guangzhou Institute of Technology, Xidian University, Guangzhou, China
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Sheng Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Qianmu Yuan
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Jianwen Chen
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| | - Danping Li
- School of Telecommunications Engineering, Xidian University, Xi'an, China
| | - Lei Wang
- School of Electronic Engineering, Xidian University, Xi'an, China
| | - Yuedong Yang
- School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
| |
|
35
|
Liu Y, Xing L, Zhang L, Cai H, Guo M. GEFormerDTA: drug target affinity prediction based on transformer graph for early fusion. Sci Rep 2024; 14:7416. [PMID: 38548825 PMCID: PMC10979032 DOI: 10.1038/s41598-024-57879-1] [Received: 11/28/2023] [Accepted: 03/22/2024] [Indexed: 04/01/2024] Open
Abstract
Predicting the interaction affinity between drugs and target proteins is crucial for rapid and accurate drug discovery and repositioning. Therefore, more accurate prediction of drug-target affinity (DTA) has become a key area of research in the field of drug discovery and drug repositioning. However, traditional experimental methods have disadvantages such as long operation cycles, high manpower requirements, and high economic costs, making it difficult to predict specific interactions between drugs and target proteins quickly and accurately. Some methods mainly use the SMILES sequence of drugs and the primary structure of proteins as inputs, ignoring graph information such as bond encoding, degree centrality encoding, and spatial encoding of drug molecule graphs, as well as structural information of proteins such as secondary structure and accessible surface area. Moreover, previous methods were based on protein sequences to learn feature representations, neglecting the completeness of information. To address the completeness of drug and protein structure information, we propose a Transformer graph-based early fusion research approach for drug-target affinity prediction (GEFormerDTA). Our method reduces prediction errors caused by insufficient feature learning. Experimental results on the Davis and KIBA datasets showed better prediction of drug-target affinity than existing affinity prediction methods.
Affiliation(s)
- Youzhi Liu
- Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
| | - Linlin Xing
- Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China.
| | - Longbo Zhang
- Department of Computer Science and Technology, Shandong University of Technology, Zibo, 255000, China
| | - Hongzhen Cai
- Department of Agricultural Engineering and Food Science, Shandong University of Technology, Zibo, 255000, China
| | - Maozu Guo
- Department of Electrical and Information Engineering, Beijing University of Architecture, Beijing, 102616, China
| |
|
36
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods that focus on sequence and structure to study protein-ligand interactions are examined. The paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, the diverse applications of protein-ligand interaction models in drug research are presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
|
37
|
Yin Z, Chen Y, Hao Y, Pandiyan S, Shao J, Wang L. FOTF-CPI: A compound-protein interaction prediction transformer based on the fusion of optimal transport fragments. iScience 2024; 27:108756. [PMID: 38230261 PMCID: PMC10790010 DOI: 10.1016/j.isci.2023.108756] [Received: 07/25/2023] [Revised: 11/05/2023] [Accepted: 12/13/2023] [Indexed: 01/18/2024] Open
Abstract
Compound-protein interaction (CPI) affinity prediction plays an important role in reducing the cost and time of drug discovery. However, the interpretability of how fragments function in CPI is impacted by the fact that current methods ignore the affinity relationships between fragments of compounds and fragments of proteins in CPI modeling. This article introduces an improved Transformer called FOTF-CPI (a Fusion of Optimal Transport Fragments compound-protein interaction prediction model). We use an optimal transport-based fragmentation approach to improve the model's understanding of compound and protein sequences. Additionally, a fused attention mechanism is employed, which combines the features of fragments to capture full affinity information. This fused attention redistributes higher attention scores to fragments with higher affinity. Experimental results show FOTF-CPI achieves an average 2% higher performance than other models on all three datasets. Furthermore, the visualization confirms the potential of FOTF-CPI for drug discovery applications.
Affiliation(s)
- Zeyu Yin
- School of Information Science and Technology, Nantong University, Nantong 226001, China
| | - Yu Chen
- School of Information Science and Technology, Nantong University, Nantong 226001, China
| | - Yajie Hao
- School of Information Science and Technology, Nantong University, Nantong 226001, China
| | - Sanjeevi Pandiyan
- Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
| | - Jinsong Shao
- School of Information Science and Technology, Nantong University, Nantong 226001, China
| | - Li Wang
- School of Information Science and Technology, Nantong University, Nantong 226001, China
- Research Center for Intelligent Information Technology, Nantong University, Nantong 226001, China
| |
|
38
|
Xu W, Yang X, Guan Y, Cheng X, Wang Y. Integrative approach for predicting drug-target interactions via matrix factorization and broad learning systems. Mathematical Biosciences and Engineering: MBE 2024; 21:2608-2625. [PMID: 38454698 DOI: 10.3934/mbe.2024115] [Indexed: 03/09/2024]
Abstract
In the drug discovery process, time and costs are the most typical problems resulting from the experimental screening of drug-target interactions (DTIs). To address these limitations, many computational methods have been developed to achieve more accurate predictions. However, identifying DTIs mostly relies on separate learning tasks with drug and target features that neglect the interaction representation between drugs and targets. In addition, the lack of these relationships may greatly impair performance on the prediction of DTIs. Aiming at capturing comprehensive drug-target representations and simplifying the network structure, we propose an integrative approach with a convolution broad learning system for DTI prediction (ConvBLS-DTI) to reduce the impact of data sparsity and incompleteness. First, given the lack of known interactions for the drug and target, the weighted K-nearest known neighbors (WKNKN) method was used as a preprocessing strategy for unknown drug-target pairs. Second, a neighborhood regularized logistic matrix factorization (NRLMF) was applied to extract features of updated drug-target interaction information, which focused more on the known interaction pair parties. Then, a broad learning network incorporating a convolutional neural network was established to predict DTIs, which can make classification more effective using a different perspective. Finally, based on the four benchmark datasets in three scenarios, ConvBLS-DTI's overall performance outperformed some mainstream methods. The test results demonstrate that our model achieves improved prediction performance on the area under the receiver operating characteristic curve and the precision-recall curve.
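The WKNKN preprocessing step named in this abstract replaces zeros in the interaction matrix with likelihood scores borrowed from similar drugs and targets. The sketch below follows the commonly described form of the method and is simplified. The decay factor `eta`, the choice to exclude an entity from its own neighbor list, and every name in the code are assumptions, since the abstract gives no implementation details.

```python
def wknkn(Y, S_d, S_t, K=2, eta=0.7):
    """Fill unknown entries of a drug x target matrix Y (lists of lists).

    S_d, S_t : drug-drug and target-target similarity matrices.
    K        : number of nearest known neighbors to use.
    eta      : decay applied to lower-ranked neighbors (assumed value).
    """
    def row_estimates(M, S):
        n, m = len(M), len(M[0])
        # "Known" entities have at least one recorded interaction.
        known = [i for i in range(n) if any(M[i])]
        est = []
        for i in range(n):
            nbrs = sorted((j for j in known if j != i), key=lambda j: -S[i][j])[:K]
            norm = sum(S[i][j] for j in nbrs)
            row = [0.0] * m
            if norm > 0:
                for rank, j in enumerate(nbrs):
                    w = (eta ** rank) * S[i][j]
                    for c in range(m):
                        row[c] += w * M[j][c]
                row = [x / norm for x in row]
            est.append(row)
        return est

    Y_d = row_estimates(Y, S_d)                      # drug-side estimate
    Y_T = [list(col) for col in zip(*Y)]             # transpose: targets as rows
    Y_t = row_estimates(Y_T, S_t)                    # target-side estimate (transposed)
    # Average both estimates and never lower an existing entry.
    return [
        [max(Y[i][j], (Y_d[i][j] + Y_t[j][i]) / 2) for j in range(len(Y[0]))]
        for i in range(len(Y))
    ]

Y = [[1, 0], [0, 0]]
S_d = [[1.0, 0.8], [0.8, 1.0]]
S_t = [[1.0, 0.5], [0.5, 1.0]]
filled = wknkn(Y, S_d, S_t, K=1)
```

Known interactions keep their value of 1, while unknown pairs that share a drug or target with a known interaction receive a fractional score instead of a hard zero.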
Affiliation(s)
- Wanying Xu
- College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
| | - Xixin Yang
- College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
- School of Automation, Qingdao University, Qingdao 266071, China
| | - Yuanlin Guan
- Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao 266520, China
- School of Mechanical & Automotive Engineering, Qingdao University of Technology, Qingdao 266520, China
| | - Xiaoqing Cheng
- College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
| | - Yu Wang
- College of Computer Science & Technology, Qingdao University, Qingdao 266071, China
| |
|
39
|
Wu H, Liu J, Jiang T, Zou Q, Qi S, Cui Z, Tiwari P, Ding Y. AttentionMGT-DTA: A multi-modal drug-target affinity prediction using graph transformer and attention mechanism. Neural Netw 2024; 169:623-636. [PMID: 37976593 DOI: 10.1016/j.neunet.2023.11.018] [Received: 06/26/2023] [Revised: 09/29/2023] [Accepted: 11/07/2023] [Indexed: 11/19/2023]
Abstract
The accurate prediction of drug-target affinity (DTA) is a crucial step in drug discovery and design. Traditional experiments are very expensive and time-consuming. Recently, deep learning methods have achieved notable performance improvements in DTA prediction. However, one challenge for deep learning-based models is obtaining appropriate and accurate representations of drugs and targets, especially given the lack of effective exploration of target representations. Another challenge is how to comprehensively capture the interaction information between different instances, which is also important for predicting DTA. In this study, we propose AttentionMGT-DTA, a multi-modal attention-based model for DTA prediction. AttentionMGT-DTA represents drugs and targets by a molecular graph and a binding pocket graph, respectively. Two attention mechanisms are adopted to integrate information across different protein modalities and to model interactions between drug-target pairs. The experimental results showed that our proposed model outperformed state-of-the-art baselines on two benchmark datasets. In addition, AttentionMGT-DTA also had high interpretability by modeling the interaction strength between drug atoms and protein residues. Our code is available at https://github.com/JK-Liu7/AttentionMGT-DTA.
Affiliation(s)
- Hongjie Wu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
| | - Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China; Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
| | - Tengsheng Jiang
- Gusu School, Nanjing Medical University, Suzhou, 215009, China.
| | - Quan Zou
- Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
| | - Shujie Qi
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
| | - Zhiming Cui
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China.
| | - Prayag Tiwari
- School of Information Technology, Halmstad University, Sweden.
| | - Yijie Ding
- Yangtze Delta Region Institute(Quzhou), University of Electronic Science and Technology of China, Quzhou, 324003, China.
| |
|
40
|
Liu Z, Bao Y, Wang W, Pan L, Wang H, Lin GN. Emden: A novel method integrating graph and transformer representations for predicting the effect of mutations on clinical drug response. Comput Biol Med 2023; 167:107678. [PMID: 37976823 DOI: 10.1016/j.compbiomed.2023.107678] [Received: 09/14/2023] [Revised: 10/22/2023] [Accepted: 11/06/2023] [Indexed: 11/19/2023]
Abstract
Precision medicine based on personalized genomics provides promising strategies to enhance the efficacy of molecular-targeted therapies. However, the clinical effectiveness of drugs has been severely limited due to genetic variations that lead to drug resistance. Predicting the impact of missense mutations on clinical drug response is an essential way to reduce the cost of clinical trials and understand genetic diseases. Here, we present Emden, a novel method integrating graph and transformer representations that predicts the effect of missense mutations on drug response through binary classification with interpretability. Emden utilized protein sequences-based features and drug structures as inputs for rapid prediction, employing competitive representation learning and demonstrating strong generalization capabilities and robustness. Our study showed promising potential for clinical drug guidance and deep insight into computer-assisted precision medicine. Emden is freely available as a web server at https://www.psymukb.net/Emden.
Affiliation(s)
- Zhe Liu
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Yihang Bao
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Weidi Wang
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China
| | - Liangwei Pan
- Department of Thoracic Surgery, Zhongshan Hospital, Fudan University, Shanghai, China
| | - Han Wang
- School of Information Science and Technology, Institute of Computational Biology, Northeast Normal University, Changchun, China.
| | - Guan Ning Lin
- Shanghai Mental Health Center, Shanghai Jiao Tong University School of Medicine, School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, China; Shanghai Key Laboratory of Psychotic Disorders, Shanghai, China.
|
41
|
Li Y, Fan Z, Rao J, Chen Z, Chu Q, Zheng M, Li X. An overview of recent advances and challenges in predicting compound-protein interaction (CPI). MEDICAL REVIEW (2021) 2023; 3:465-486. [PMID: 38282802 PMCID: PMC10808869 DOI: 10.1515/mr-2023-0030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Accepted: 08/30/2023] [Indexed: 01/30/2024]
Abstract
Compound-protein interactions (CPIs) are critical in drug discovery for identifying therapeutic targets, drug side effects, and repurposing existing drugs. Machine learning (ML) algorithms have emerged as powerful tools for CPI prediction, offering notable advantages in cost-effectiveness and efficiency. This review provides an overview of recent advances in both structure-based and non-structure-based CPI prediction ML models, highlighting their performance and achievements. It also offers insights into CPI prediction-related datasets and evaluation benchmarks. Lastly, the article presents a comprehensive assessment of the current landscape of CPI prediction, elucidating the challenges faced and outlining emerging trends to advance the field.
Affiliation(s)
- Yanbei Li
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhehuan Fan
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Jingxin Rao
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Zhiyi Chen
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Qinyu Chu
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Mingyue Zheng
- School of Pharmaceutical Science and Technology, Hangzhou Institute for Advanced Study, UCAS, Hangzhou, Zhejiang Province, China
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Xutong Li
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai, China
- University of Chinese Academy of Sciences, Beijing, China
|
42
|
Song N, Dong R, Pu Y, Wang E, Xu J, Guo F. Pmf-cpi: assessing drug selectivity with a pretrained multi-functional model for compound-protein interactions. J Cheminform 2023; 15:97. [PMID: 37838703 PMCID: PMC10576287 DOI: 10.1186/s13321-023-00767-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/31/2023] [Accepted: 09/28/2023] [Indexed: 10/16/2023] Open
Abstract
Compound-protein interactions (CPI) play significant roles in drug development. To avoid side effects, it is also crucial to evaluate drug selectivity when binding to different targets. However, most selectivity prediction models are constructed for specific targets with limited data. In this study, we present a pretrained multi-functional model for compound-protein interaction prediction (PMF-CPI) and fine-tune it to assess drug selectivity. This model uses recurrent neural networks to process the protein embedding based on the pretrained language model TAPE, extracts molecular information from a graph encoder, and produces the output from dense layers. PMF-CPI obtained the best performance compared to outstanding approaches on both the binding affinity regression and CPI classification tasks. Meanwhile, we apply the model to analyzing drug selectivity after fine-tuning it on three datasets related to specific targets, including human cytochrome P450s. The study shows that PMF-CPI can accurately predict different drug affinities or opposite interactions toward similar targets, recognizing selective drugs for precise therapeutics.
Affiliation(s)
- Nan Song
- School of New Media and Communication, Tianjin University, Tianjin, Tianjin, 300072, China
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China
| | - Ruihan Dong
- Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, Beijing, 100871, China
| | - Yuqian Pu
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China
| | - Ercheng Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, 310058, Zhejiang, China.
- Zhejiang Laboratory, Hangzhou, 311100, Zhejiang, China.
| | - Junhai Xu
- School of New Media and Communication, Tianjin University, Tianjin, Tianjin, 300072, China.
- College of Intelligence and Computing, Tianjin University, Tianjin, Tianjin, 300350, China.
| | - Fei Guo
- School of Computer Science and Engineering, Central South University, Changsha, 410083, Hunan, China.
|
43
|
Song Y, Yuan Q, Zhao H, Yang Y. Accurately identifying nucleic-acid-binding sites through geometric graph learning on language model predicted structures. Brief Bioinform 2023; 24:bbad360. [PMID: 37824738 DOI: 10.1093/bib/bbad360] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/09/2023] [Revised: 09/18/2023] [Accepted: 09/18/2023] [Indexed: 10/14/2023] Open
Abstract
The interactions between nucleic acids and proteins are important in diverse biological processes. The high-quality prediction of nucleic-acid-binding sites continues to pose a significant challenge. Presently, the predictive efficacy of sequence-based methods is constrained by their exclusive consideration of sequence context information, whereas structure-based methods are unsuitable for proteins lacking known tertiary structures. Though protein structures predicted by AlphaFold2 could be used, the extensive computing requirement of AlphaFold2 hinders its use for genome-wide applications. Based on the recent breakthrough of ESMFold for fast prediction of protein structures, we have developed GLMSite, which accurately identifies DNA- and RNA-binding sites using geometric graph learning on ESMFold-predicted structures. Here, the predicted protein structures are employed to construct a protein structural graph with residues as nodes and spatially neighboring residue pairs as edges. The node representations are further enhanced through the pre-trained language model ProtTrans. The network was trained using a geometric vector perceptron, and the geometric embeddings were subsequently fed into a common network to acquire common binding characteristics. Finally, these characteristics were input into two fully connected layers to predict binding sites with DNA and RNA, respectively. Through comprehensive tests on DNA/RNA benchmark datasets, GLMSite was shown to surpass the latest sequence-based methods and be comparable with structure-based methods. Moreover, the prediction was shown to be useful for inferring nucleic-acid-binding proteins, demonstrating its potential for protein function discovery. The datasets, codes, and trained models are available at https://github.com/biomed-AI/nucleic-acid-binding.
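The graph construction mentioned in this abstract (residues as nodes, spatially neighboring residue pairs as edges) can be sketched roughly as follows. This is a minimal illustration, not the GLMSite code: the function name `build_residue_graph`, the use of one coordinate per residue, and the 10 angstrom cutoff are all assumptions made for the example.

```python
import math
from itertools import combinations


def build_residue_graph(coords, cutoff=10.0):
    """Return an undirected edge list linking residues within `cutoff` of each other.

    coords: list of (x, y, z) tuples, one per residue (for example C-alpha positions).
    cutoff: distance threshold in angstroms; 10.0 is an illustrative value only.
    """
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        # Connect two residues when their Euclidean distance is within the cutoff.
        if math.dist(coords[i], coords[j]) <= cutoff:
            edges.append((i, j))
    return edges


# Toy input: four residues on a straight line, 6 angstroms apart.
toy_coords = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0), (18.0, 0.0, 0.0)]
toy_edges = build_residue_graph(toy_coords, cutoff=10.0)
```

With this toy input only consecutive residues fall within the cutoff, so the graph is a simple chain; a larger cutoff adds longer-range edges.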
Affiliation(s)
- Yidong Song
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Qianmu Yuan
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Huiying Zhao
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
| | - Yuedong Yang
- Key Laboratory of Machine Intelligence and Advanced Computing of MOE, School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
|
44
|
Tao W, Liu Y, Lin X, Song B, Zeng X. Prediction of multi-relational drug-gene interaction via Dynamic hyperGraph Contrastive Learning. Brief Bioinform 2023; 24:bbad371. [PMID: 37864294 DOI: 10.1093/bib/bbad371] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 09/11/2023] [Accepted: 09/29/2023] [Indexed: 10/22/2023] Open
Abstract
Drug-gene interaction prediction occupies a crucial position in various areas of drug discovery, such as drug repurposing, lead discovery and off-target detection. Previous studies show good performance, but they are limited to exploring binding interactions and ignore other interaction relationships. Graph neural networks have emerged as promising approaches owing to their powerful capability of modeling correlations under drug-gene bipartite graphs. Despite the widespread adoption of graph neural network-based methods, many of them experience performance degradation in situations where high-quality and sufficient training data are unavailable. Unfortunately, in practical drug discovery scenarios, interaction data are often sparse and noisy, which may lead to unsatisfactory results. To address the above challenges, we propose a novel Dynamic hyperGraph Contrastive Learning (DGCL) framework that exploits local and global relationships between drugs and genes. Specifically, graph convolutions are adopted to extract explicit local relations among drugs and genes. Meanwhile, the cooperation of dynamic hypergraph structure learning and hypergraph message passing enables the model to aggregate information in a global region. With flexible global-level messages, a self-augmented contrastive learning component is designed to constrain hypergraph structure learning and enhance the discrimination of drug/gene representations. Experiments conducted on three datasets show that DGCL is superior to eight state-of-the-art methods and notably gains a 7.6% performance improvement on the DGIdb dataset. Further analyses verify the robustness of DGCL for alleviating data sparsity and over-smoothing issues.
Affiliation(s)
- Wen Tao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
| | - Yuansheng Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
| | - Xuan Lin
- School of Computer Science, Xiangtan University, Xiangtan, 411105 Hunan, China
- Key Laboratory of Intelligent Computing and Information Processing, Ministry of Education (Xiangtan University), Xiangtan, 411105 Hunan, China
| | - Bosheng Song
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
| | - Xiangxiang Zeng
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, 410082 Hunan, China
|
45
|
Wang L, Zhou Y, Chen Q. AMMVF-DTI: A Novel Model Predicting Drug-Target Interactions Based on Attention Mechanism and Multi-View Fusion. Int J Mol Sci 2023; 24:14142. [PMID: 37762445 PMCID: PMC10531525 DOI: 10.3390/ijms241814142] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/09/2023] [Revised: 09/09/2023] [Accepted: 09/12/2023] [Indexed: 09/29/2023] Open
Abstract
Accurate identification of potential drug-target interactions (DTIs) is a crucial task in drug development and repositioning. Despite the remarkable progress achieved in recent years, improving the performance of DTI prediction still presents significant challenges. In this study, we propose a novel end-to-end deep learning model called AMMVF-DTI (attention mechanism and multi-view fusion), which leverages a multi-head self-attention mechanism to explore varying degrees of interaction between drugs and target proteins. More importantly, AMMVF-DTI extracts interactive features between drugs and proteins from both node-level and graph-level embeddings, enabling a more effective modeling of DTIs. This advantage is generally lacking in existing DTI prediction models. Consequently, when compared to many of the state-of-the-art methods, AMMVF-DTI demonstrated excellent performance on the human, C. elegans, and DrugBank baseline datasets, which can be attributed to its ability to incorporate interactive information and mine features from both local and global structures. The results from additional ablation experiments also confirmed the importance of each module in our AMMVF-DTI model. Finally, a case study is presented utilizing our model for COVID-19-related DTI prediction. We believe the AMMVF-DTI model can not only achieve reasonable accuracy in DTI prediction, but also provide insights into the understanding of potential interactions between drugs and targets.
|
46
|
Xiaolin X, Xiaozhi L, Guoping H, Hongwei L, Jinkuo G, Xiyun B, Zhen T, Xiaofang M, Yanxia L, Na X, Chunyan Z, Rui G, Kuan W, Cheng Z, Cuancuan W, Mingyong L, Xinping D. Overfit deep neural network for predicting drug-target interactions. iScience 2023; 26:107646. [PMID: 37680476 PMCID: PMC10480310 DOI: 10.1016/j.isci.2023.107646] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2022] [Revised: 06/28/2023] [Accepted: 08/11/2023] [Indexed: 09/09/2023] Open
Abstract
Drug-target interaction (DTI) prediction is an important step in drug discovery. As traditional biological experiments and high-throughput screening are costly and time-consuming, many deep learning models have been developed. Overfitting must be avoided when training deep learning models. We propose a simple framework, called OverfitDTI, for DTI prediction. In OverfitDTI, a deep neural network (DNN) model is overfit to sufficiently learn the features of the chemical space of drugs and the biological space of targets. The weights of the trained DNN model form an implicit representation of the nonlinear relationship between drugs and targets. Performance of OverfitDTI on three public datasets showed that the overfit DNN models fit the nonlinear relationship with high accuracy. We identified fifteen compounds that interacted with TEK, a receptor tyrosine kinase contributing to vascular homeostasis, and the predicted AT9283 and dorsomorphin were experimentally demonstrated as inhibitors of TEK in human umbilical vein endothelial cells (HUVECs).
Affiliation(s)
- Xiao Xiaolin
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - Liu Xiaozhi
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - He Guoping
- Geriatrics Department, Traditional Chinese Medicine Hospital of Binhai New Area, Tianjin, China
| | - Liu Hongwei
- School of Clinical Medicine, North China University of Science and Technology, Tangshan, Hebei, China
- Department of Anesthesiology, Tangshan Maternal and Child Health Hospital, Tangshan, Hebei, China
| | - Guo Jinkuo
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- College of Food Science and Engineering, Tianjin University of Science & Technology, Tianjin, China
| | - Bian Xiyun
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - Tian Zhen
- Deepwater Technology Research Institute, China National Offshore Oil Corporation, Tianjin, China
| | - Ma Xiaofang
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - Li Yanxia
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - Xue Na
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - Zhang Chunyan
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Central Laboratory, Tianjin Fifth Central Hospital, Tianjin, China
| | - Gao Rui
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
| | - Wang Kuan
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
| | - Zhang Cheng
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
| | - Wang Cuancuan
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
| | - Liu Mingyong
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- Department of Urology, Tianjin Fifth Central Hospital, Tianjin, China
| | - Du Xinping
- Department of Cardiology, Tianjin Fifth Central Hospital, Tianjin, China
- Tianjin Key Laboratory of Epigenetics for Organ Development of Premature Infants, Tianjin Fifth Central Hospital, Tianjin, China
- College of Food Science and Engineering, Tianjin University of Science & Technology, Tianjin, China
|
47
|
Dong L, Shi S, Qu X, Luo D, Wang B. Ligand binding affinity prediction with fusion of graph neural networks and 3D structure-based complex graph. Phys Chem Chem Phys 2023; 25:24110-24120. [PMID: 37655493 DOI: 10.1039/d3cp03651k] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/02/2023]
Abstract
Accurate prediction of protein-ligand binding affinity is pivotal for drug design and discovery. Here, we proposed a novel deep fusion graph neural network framework named FGNN to learn the protein-ligand interactions from the 3D structures of protein-ligand complexes. Unlike 1D sequences for proteins or 2D graphs for ligands, the 3D graph of a protein-ligand complex enables more accurate representations of the protein-ligand interactions. Benchmark studies have shown that our FGNN fusion models can achieve more accurate prediction of binding affinity than any individual algorithm. The advantages of fusion strategies have been demonstrated in terms of expressive power of data, learning efficiency and model interpretability. Our fusion models show satisfactory performances on diverse data sets, demonstrating their generalization ability. Given the good performances in both binding affinity prediction and virtual screening, our fusion models are expected to be practically applied for drug screening and design. Our work highlights the potential of the fusion graph neural network algorithm in solving complex prediction problems in computational biology and chemistry. The fusion graph neural network (FGNN) model is freely available at https://github.com/LinaDongXMU/FGNN.
Affiliation(s)
- Lina Dong
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Shuai Shi
- Department of Algorithm, TuringQ Co., Ltd., Shanghai, 200240, China
| | - Xiaoyang Qu
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Ding Luo
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
| | - Binju Wang
- State Key Laboratory of Physical Chemistry of Solid Surfaces and Fujian Provincial Key Laboratory of Theoretical and Computational Chemistry, iChEM, College of Chemistry and Chemical Engineering, Xiamen University, Xiamen, 361005, China.
- Innovation Laboratory for Sciences and Technologies of Energy Materials of Fujian Province (IKKEM), Xiamen, 361005, China
|
48
|
Liu L, Zhang Q, Wei Y, Zhao Q, Liao B. A Biological Feature and Heterogeneous Network Representation Learning-Based Framework for Drug-Target Interaction Prediction. Molecules 2023; 28:6546. [PMID: 37764321 PMCID: PMC10535805 DOI: 10.3390/molecules28186546] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 09/06/2023] [Accepted: 09/07/2023] [Indexed: 09/29/2023] Open
Abstract
The prediction of drug-target interaction (DTI) is crucial to drug discovery. Although the interactions between the drug and target can be accurately verified by traditional biochemical experiments, the determination of DTI through biochemical experiments is a time-consuming, laborious, and expensive process. Therefore, we propose a learning-based framework named BG-DTI for drug-target interaction prediction. Our model combines two main approaches based on biological features and heterogeneous networks to identify interactions between drugs and targets. First, we extract original features from the sequence to encode each drug and target. Later, we further consider the relationships among various biological entities by constructing drug-drug similarity networks and target-target similarity networks. Furthermore, a graph convolutional network and a graph attention network in the graph representation learning module help us learn the features representation of drugs and targets. After obtaining the features from graph representation learning modules, these features are combined into fusion descriptors for drug-target pairs. Finally, we send the fusion descriptors and labels to a random forest classifier for predicting DTI. The evaluation results show that BG-DTI achieves an average AUC of 0.938 and an average AUPR of 0.930, which is better than those of five existing state-of-the-art methods. We believe that BG-DTI can facilitate the development of drug discovery or drug repurposing.
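The two metrics reported in this abstract, AUC and AUPR, can be computed from predicted scores as in the rough sketch below. It is not taken from BG-DTI: the function names and toy labels are invented for illustration, AUPR is approximated by average precision (one common estimator), and both classes are assumed to be present in the labels.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values at each correctly ranked positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy drug-target pairs: 1 = interacting, 0 = non-interacting (made-up values).
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ap = average_precision(toy_labels, toy_scores)
```

On this toy data seven of the nine positive-negative pairs are ordered correctly, so `toy_auc` is 7/9.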
Affiliation(s)
- Liwei Liu
- College of Science, Dalian Jiaotong University, Dalian 116028, China
- Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China
| | - Qi Zhang
- College of Science, Dalian Jiaotong University, Dalian 116028, China
| | - Yuxiao Wei
- College of Software, Dalian Jiaotong University, Dalian 116028, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan 114051, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China
|
49
|
Ma J, Li C, Zhang Y, Wang Z, Li S, Guo Y, Zhang L, Liu H, Gao X, Song J. MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning. Bioinformatics 2023; 39:btad524. [PMID: 37610353 PMCID: PMC10518077 DOI: 10.1093/bioinformatics/btad524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2023] [Revised: 07/26/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023] Open
Abstract
MOTIVATION: Identifying drug-protein interactions (DPIs) is a critical step in drug repositioning, which allows reuse of approved drugs that may be effective for treating a different disease and thereby alleviates the challenges of new drug development. Despite the fact that a great variety of computational approaches for DPI prediction have been proposed, key challenges, such as extendable and unbiased similarity calculation, heterogeneous information utilization, and reliable negative sample selection, remain to be addressed. RESULTS: To address these issues, we propose a novel, unified multi-view graph autoencoder framework, termed MULGA, for both DPI and drug repositioning predictions. MULGA features: (i) a multi-view learning technique to effectively learn authentic drug affinity and target affinity matrices; (ii) a graph autoencoder to infer missing DPI interactions; and (iii) a new "guilty-by-association"-based negative sampling approach for selecting highly reliable non-DPIs. Benchmark experiments demonstrate that MULGA outperforms state-of-the-art methods in DPI prediction and the ablation studies verify the effectiveness of each proposed component. Importantly, we highlight the top drugs shortlisted by MULGA that target the spike glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), offering additional insights into and potentially useful treatment options for COVID-19. Together with the availability of datasets and source codes, we envision that MULGA can be explored as a useful tool for DPI prediction and drug repositioning. AVAILABILITY AND IMPLEMENTATION: MULGA is publicly available for academic purposes at https://github.com/jianiM/MULGA/.
Affiliation(s)
- Jiani Ma
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Chen Li
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Yiwen Zhang
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Zhikang Wang
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
| | - Shanshan Li
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Yuming Guo
- Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
| | - Lin Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Hui Liu
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
| | - Xin Gao
- KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Jiangning Song
- Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Wenzhou Medical University-Monash Biomedicine Discovery Institute (BDI) Alliance in Clinical and Experimental Biomedicine, Wenzhou 325035, China
- Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
|
50
|
Zhu C, Xia X, Li N, Zhong F, Yang Z, Liu L. RDKG-115: Assisting drug repurposing and discovery for rare diseases by trimodal knowledge graph embedding. Comput Biol Med 2023; 164:107262. [PMID: 37481946 DOI: 10.1016/j.compbiomed.2023.107262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Revised: 07/07/2023] [Accepted: 07/16/2023] [Indexed: 07/25/2023]
Abstract
Rare diseases (RDs) may affect individuals in small numbers, but they have a significant impact on a global scale. Accurate diagnosis of RDs is challenging, and there is a severe lack of drugs available for treatment. Pharmaceutical companies have shown a preference for repurposing existing drugs developed for other diseases due to the high investment, high risk, and long cycle involved in RD drug development. Compared to traditional approaches, knowledge graph embedding (KGE) based methods are more efficient and convenient, as they treat drug repurposing as a link prediction task. KGE models allow for the enrichment of existing knowledge by incorporating multimodal information from various sources. In this study, we constructed RDKG-115, a rare disease knowledge graph involving 115 RDs, composed of 35,643 entities, 25 relations, and 5,539,839 refined triplets, based on 372,384 high-quality literature and 4 biomedical datasets: DRKG, Pathway Commons, PharmKG, and PMapp. Subsequently, we developed a trimodal KGE model containing structure, category, and description embeddings using reverse-hyperplane projection. We utilized this model to infer 4199 reliable new triplets from RDKG-115. Finally, we calculated potential drugs and small molecules for each of the 115 RDs, taking multiple sclerosis as a case study. This study provides a paradigm for large-scale screening of drug repurposing and discovery for RDs, which will speed up the drug development process and ultimately benefit patients with RDs. The source code and data are available at https://github.com/ZhuChaoY/RDKG-115.
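The idea of treating drug repurposing as link prediction over a knowledge graph can be illustrated with a small scoring sketch. It uses the well-known TransE distance as a stand-in, not the trimodal reverse-hyperplane model described in this abstract; the entity names, relation name, and hand-picked 2-D vectors are all hypothetical.

```python
import math

# Hypothetical 2-D embeddings chosen by hand for illustration; a real model learns these.
entity_vecs = {
    "drug_A": (1.0, 0.0),
    "disease_X": (1.0, 1.0),
    "disease_Y": (3.0, 0.0),
}
relation_vecs = {"treats": (0.0, 1.0)}


def transe_score(head, relation, tail):
    """TransE distance ||h + r - t||; a lower value means a more plausible triplet."""
    h, r, t = entity_vecs[head], relation_vecs[relation], entity_vecs[tail]
    return math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def rank_tails(head, relation, candidates):
    """Sort candidate tail entities from most to least plausible for (head, relation, ?)."""
    return sorted(candidates, key=lambda tail: transe_score(head, relation, tail))


# Link prediction query: which disease is drug_A most likely to treat?
ranking = rank_tails("drug_A", "treats", ["disease_Y", "disease_X"])
```

Here `drug_A + treats` lands exactly on `disease_X`, so that candidate ranks first; new triplets are inferred by keeping the top-ranked candidates for each query.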
Affiliation(s)
- Chaoyu Zhu
- Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China
| | - Xiaoqiong Xia
- Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China
| | - Nan Li
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China
| | - Fan Zhong
- Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China.
| | - Zhihao Yang
- College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, China.
| | - Lei Liu
- Intelligent Medicine Institute, Shanghai Medical College, Fudan University, Shanghai, 200032, China; Shanghai Institute of Stem Cell Research and Clinical Translation, Shanghai, 200120, China.
|