1
Cheng X, Yang X, Guan Y, Feng Y. ERT-GFAN: A multimodal drug-target interaction prediction model based on molecular biology and knowledge-enhanced attention mechanism. Comput Biol Med 2024; 180:109012. [PMID: 39153394 DOI: 10.1016/j.compbiomed.2024.109012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2024] [Revised: 08/06/2024] [Accepted: 08/07/2024] [Indexed: 08/19/2024]
Abstract
In drug discovery, precisely identifying drug-target interactions is crucial for finding new drugs and understanding drug mechanisms. Evolving heterogeneous drug and target data make it challenging to obtain multimodal representations for drug-target interaction (DTI) prediction. To address this, we propose 'ERT-GFAN', a multimodal drug-target interaction prediction model inspired by molecular biology. First, it integrates bio-inspired principles to obtain structural features of drugs and targets using Extended Connectivity Fingerprints (ECFP). Simultaneously, the knowledge graph embedding model RotatE is employed to discover the interaction features of drug-target pairs. Subsequently, a Transformer is used to refine contextual neighborhood features from the obtained structural and interaction features, and multimodal high-dimensional fusion features are constructed from the three modalities. Finally, the DTI prediction results are produced by feeding the multimodal fusion features into a graphical high-dimensional fusion feature attention network (GFAN) using our innovative multimodal high-dimensional fusion feature attention. This multimodal approach offers a comprehensive understanding of drug-target interactions, addressing challenges in complex knowledge graphs. By combining structural, interaction, and contextual neighborhood features, 'ERT-GFAN' excels in predicting DTI. Empirical evaluations on three datasets demonstrate our method's superior performance, with AUC of 0.9739, 0.9862, and 0.9667, AUPR of 0.9598, 0.9789, and 0.9750, and Mean Reciprocal Rank (MRR) of 0.7386, 0.7035, and 0.7133. Ablation studies show an improvement of over 5% in predictive performance compared to baseline unimodal and bimodal models. These results, along with detailed case studies, highlight the efficacy and robustness of our approach.
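The abstract names RotatE for interaction features and reports Mean Reciprocal Rank. As a minimal sketch of those two ideas only (not the authors' implementation), the snippet below scores a triple the way RotatE is generally described, by rotating the head embedding in the complex plane and measuring its distance to the tail, and computes MRR from a list of ranks. The function names, the toy two-dimensional embeddings, and the example ranks are assumptions made for illustration; the margin term used in RotatE training is omitted.

```python
import cmath
import math


def rotate_score(head, phases, tail):
    """RotatE-style score: negative distance between the rotated head and the tail.

    head, tail: lists of complex numbers (entity embeddings).
    phases: list of angles in radians; each relation component is the
    unit-modulus complex number exp(i * phase), i.e. a pure rotation.
    """
    distance = 0.0
    for h, theta, t in zip(head, phases, tail):
        rotation = cmath.exp(1j * theta)
        distance += abs(h * rotation - t)
    return -distance


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries; ranks are 1-based."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Toy example (hypothetical embeddings): the relation rotates the head
# exactly onto the tail, so the distance is numerically close to zero.
drug = [1 + 0j, 0 + 1j]
interacts_with = [math.pi / 2, math.pi]
target = [0 + 1j, 0 - 1j]
good_score = rotate_score(drug, interacts_with, target)

# A mismatched tail yields a strictly lower (more negative) score.
bad_score = rotate_score(drug, interacts_with, [1 + 0j, 1 + 0j])

# Hypothetical ranks of the true target for three queries.
mrr = mean_reciprocal_rank([1, 2, 4])
```

A higher score means a more plausible drug-target pair, and an MRR closer to 1 means true targets tend to be ranked near the top.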
Affiliation(s)
- Xiaoqing Cheng
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
- Xixin Yang
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China; School of Automation, Qingdao University, Qingdao, 266071, China
- Yuanlin Guan
  - School of Mechanical and Automotive Engineering, Qingdao University of Technology, Qingdao, 266071, China; Key Lab of Industrial Fluid Energy Conservation and Pollution Control, Ministry of Education, Qingdao University of Technology, Qingdao, 266071, China
- Yihan Feng
  - College of Computer Science and Technology, Qingdao University, Qingdao, 266071, China
2
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
  - Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy
3
Wang M, Wang J, Rong Z, Wang L, Xu Z, Zhang L, He J, Li S, Cao L, Hou Y, Li K. A bidirectional interpretable compound-protein interaction prediction framework based on cross attention. Comput Biol Med 2024; 172:108239. [PMID: 38460309 DOI: 10.1016/j.compbiomed.2024.108239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/31/2023] [Revised: 02/25/2024] [Accepted: 02/26/2024] [Indexed: 03/11/2024]
Abstract
The identification of compound-protein interactions (CPIs) plays a vital role in drug discovery. However, the huge cost and labor-intensive nature of in vitro and in vivo experiments make it urgent for researchers to develop novel CPI prediction methods. Although emerging deep learning methods have achieved promising performance in CPI prediction, they still face ongoing challenges: (i) providing bidirectional interpretability of the prediction results from both the chemical and biological perspectives; (ii) comprehensively evaluating model generalization performance; (iii) demonstrating the practical applicability of these models. To overcome these challenges, we propose a cross multi-head attention oriented bidirectional interpretable CPI prediction model (CmhAttCPI). First, CmhAttCPI takes molecular graphs and protein sequences as inputs, utilizing the GCW module to learn atom features and the CNN module to learn residue features, respectively. Second, the model applies a cross multi-head attention module to compute attention weights for atoms and residues. Finally, CmhAttCPI employs a fully connected neural network to predict scores for CPIs. We evaluated the performance of CmhAttCPI on balanced and imbalanced datasets. The results consistently show that CmhAttCPI outperforms multiple state-of-the-art methods. We constructed three scenarios based on compound and protein clustering and comprehensively evaluated the model's generalization ability within these scenarios. The results demonstrate that the generalization ability of CmhAttCPI surpasses that of other models. In addition, visualizations of the attention weights reveal that CmhAttCPI provides chemical and biological interpretations for CPI prediction. Moreover, case studies confirm the practical applicability of CmhAttCPI in discovering anticancer candidates.
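The bidirectional attention step described above can be illustrated with plain scaled dot-product cross attention, run once with atoms as queries and once with residues as queries. This is a simplified single-head sketch without learned projection matrices, not the CmhAttCPI architecture; the function names and the toy two-dimensional feature vectors are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention across two modalities.

    Each query attends over all keys; the output for a query is the
    attention-weighted sum of the values. Returns (outputs, weights).
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Hypothetical toy features: 3 atoms and 2 residues, each a 2-d vector.
atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
residues = [[1.0, 0.0], [0.0, 1.0]]

# Direction 1: each atom attends over residues (which residues matter per atom).
atom_context, atom_to_residue = cross_attention(atoms, residues, residues)

# Direction 2: each residue attends over atoms (which atoms matter per residue).
residue_context, residue_to_atom = cross_attention(residues, atoms, atoms)
```

The two weight matrices are what would be visualized for interpretation: one row per atom over residues, and one row per residue over atoms.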
Affiliation(s)
- Meng Wang
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Jianmin Wang
  - School of Integrative Biotechnology and Translational Medicine, Yonsei University, Incheon, 21983, Republic of Korea
- Zhiwei Rong
  - School of Public Health, Peking University, Beijing, 100871, China
- Liuying Wang
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Zhenyi Xu
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Liuchao Zhang
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Jia He
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Shuang Li
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Lei Cao
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
- Yan Hou
  - School of Public Health, Peking University, Beijing, 100871, China
- Kang Li
  - School of Public Health, Harbin Medical University, Harbin, 150081, China
4
Zhang Y, Liu C, Liu M, Liu T, Lin H, Huang CB, Ning L. Attention is all you need: utilizing attention in AI-enabled drug discovery. Brief Bioinform 2023; 25:bbad467. [PMID: 38189543 PMCID: PMC10772984 DOI: 10.1093/bib/bbad467] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 09/30/2023] [Revised: 11/03/2023] [Accepted: 11/25/2023] [Indexed: 01/09/2024] Open
Abstract
Recently, attention mechanisms and their derived models have gained significant traction in drug development due to their outstanding performance and interpretability in handling complex data structures. This review offers an in-depth exploration of the principles underlying attention-based models and their advantages in drug discovery. We further elaborate on their applications in various aspects of drug development, from molecular screening and target binding to property prediction and molecule generation. Finally, we discuss the current challenges faced in the application of attention mechanisms and artificial intelligence technologies, including data quality, model interpretability and computational resource constraints, along with future directions for research. Given the accelerating pace of technological advancement, we believe that attention-based models will have an increasingly prominent role in future drug discovery. We anticipate that these models will usher in revolutionary breakthroughs in the pharmaceutical domain, significantly accelerating the pace of drug development.
Affiliation(s)
- Yang Zhang
  - Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Caiqi Liu
  - Department of Gastrointestinal Medical Oncology, Harbin Medical University Cancer Hospital, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
  - Key Laboratory of Molecular Oncology of Heilongjiang Province, No.150 Haping Road, Nangang District, Harbin, Heilongjiang 150081, China
- Mujiexin Liu
  - Chongqing Key Laboratory of Sichuan-Chongqing Co-construction for Diagnosis and Treatment of Infectious Diseases Integrated Traditional Chinese and Western Medicine, College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Tianyuan Liu
  - Graduate School of Science and Technology, University of Tsukuba, Tsukuba, Japan
- Hao Lin
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Cheng-Bing Huang
  - School of Computer Science and Technology, Aba Teachers University, Aba, China
- Lin Ning
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu 611844, China
5
Xia L, Xu L, Pan S, Niu D, Zhang B, Li Z. Drug-target binding affinity prediction using message passing neural network and self supervised learning. BMC Genomics 2023; 24:557. [PMID: 37730555 PMCID: PMC10510145 DOI: 10.1186/s12864-023-09664-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/20/2023] [Accepted: 09/09/2023] [Indexed: 09/22/2023] Open
Abstract
BACKGROUND: Drug-target binding affinity (DTA) prediction is important for the rapid development of drug discovery. Compared to traditional methods, deep learning methods provide a new way for DTA prediction to achieve good performance without much knowledge of the biochemical background. However, there is still room for improvement in DTA prediction: (1) focusing only on atom-level information leads to an incomplete representation of the molecular graph; (2) self-supervised learning could be introduced for protein representation. RESULTS: In this paper, a deep learning model for DTA prediction is proposed, which uses an undirected-CMPNN for molecular embedding and combines the CPCProt and MLM models for protein embedding. An attention mechanism is introduced to discover the important parts of the protein sequence. The proposed method is evaluated on the Ki and Davis datasets, and the model outperformed other deep learning methods. CONCLUSIONS: The proposed model improves the performance of DTA prediction, providing a novel strategy for deep learning-based virtual screening methods.
Affiliation(s)
- Leiming Xia
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Lei Xu
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Shourun Pan
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Dongjiang Niu
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Beiyi Zhang
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Zhen Li
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
6
Yamane H, Ishida T. Helix encoder: a compound-protein interaction prediction model specifically designed for class A GPCRs. Front Bioinform 2023; 3:1193025. [PMID: 37304403 PMCID: PMC10250622 DOI: 10.3389/fbinf.2023.1193025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Accepted: 05/15/2023] [Indexed: 06/13/2023] Open
Abstract
Class A G protein-coupled receptors (GPCRs) represent the largest class of GPCRs. They are essential targets of drug discovery, and thus various computational approaches have been applied to predict their ligands. However, class A GPCRs include a large number of orphan receptors, which makes it difficult to use a general protein-specific supervised prediction scheme. Therefore, the compound-protein interaction (CPI) prediction approach has been considered one of the most suitable for class A GPCRs. However, the accuracy of CPI prediction is still insufficient. Current CPI prediction models generally employ the whole protein sequence as the input because it is difficult to identify the important regions in general proteins. In contrast, it is well known that only a few transmembrane helices of class A GPCRs play a critical role in ligand binding. Therefore, using such domain knowledge, the CPI prediction performance could be improved by developing an encoding method that is specifically designed for this family. In this study, we developed a protein sequence encoder called the Helix encoder, which takes only the sequences of the transmembrane regions of class A GPCRs as input. The performance evaluation showed that the proposed model achieved a higher prediction accuracy than a prediction model using the entire protein sequence. Additionally, our analysis indicated that several extracellular loops are also important for the prediction, as reported in several biological studies.
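The central preprocessing idea, restricting the encoder input to transmembrane regions instead of the full sequence, can be sketched as a simple slicing step. The abstract does not say how the region boundaries are obtained or how the segments are fed to the model, so the function name, the 1-based inclusive range convention, and the toy sequence and ranges below are all hypothetical.

```python
def extract_helices(sequence, helix_ranges):
    """Return the subsequences covered by the given transmembrane ranges.

    sequence: amino-acid string for the whole protein.
    helix_ranges: list of (start, end) positions, 1-based and inclusive,
    as residue annotations are commonly written.
    """
    segments = []
    for start, end in helix_ranges:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"range ({start}, {end}) is outside the sequence")
        # Convert the 1-based inclusive range to a Python slice.
        segments.append(sequence[start - 1:end])
    return segments


# Toy 20-residue sequence with two made-up "helix" ranges; a real class A
# GPCR has seven transmembrane helices, each much longer than these.
toy_sequence = "MNGTEGPNFYVPFSNKTGVV"
toy_ranges = [(3, 6), (10, 14)]
helix_segments = extract_helices(toy_sequence, toy_ranges)
```

Only `helix_segments` would then be passed to a sequence encoder, in place of `toy_sequence` as a whole.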