1. Djeddi WE, Hermi K, Ben Yahia S, Diallo G. Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining. BMC Bioinformatics 2023; 24:488. [PMID: 38114937 PMCID: PMC10731821 DOI: 10.1186/s12859-023-05593-6] [Received: 06/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023] Open access
Abstract
BACKGROUND The pharmaceutical field faces a significant challenge in validating drug-target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This trend highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery.
RESULTS The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets.
CONCLUSIONS The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that DTIOG can serve as a valuable tool for discovering new DTIs.
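The abstract names Cosine similarity and Euclidean distance as the measures applied to embedding vectors. The following is a minimal sketch of those two measures only, not the authors' DTIOG implementation; the toy vectors, the target names `T1` to `T3`, and the helper `rank_by_similarity` are hypothetical and chosen purely for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_similarity(query, candidates, similarity):
    """Return candidate names sorted from most to least similar to the query.

    Hypothetical helper: `candidates` maps a name to its embedding vector.
    """
    return sorted(
        candidates,
        key=lambda name: similarity(query, candidates[name]),
        reverse=True,
    )


# Toy 3-dimensional "target embeddings" (assumed values, not real ProtBERT output).
target_embeddings = {
    "T1": [1.0, 0.0, 0.0],
    "T2": [0.9, 0.1, 0.0],
    "T3": [0.0, 1.0, 0.0],
}
query_embedding = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(query_embedding, target_embeddings, cosine_similarity)
```

Cosine similarity ignores vector magnitude and compares direction only, whereas Euclidean distance is sensitive to both, so the two measures can rank the same candidates differently.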
Affiliation(s)
- Warith Eddine Djeddi
  - LR11ES14, Faculty of Sciences of Tunis, University of Tunis El Manar, Campus Universitaire, 2092, Tunis, Tunisia
  - High Institute of Informatics in Kef, University of Jendouba, Saleh Ayech, 8189, Jendouba, Tunisia
- Khalil Hermi
  - High Institute of Informatics in Kef, University of Jendouba, Saleh Ayech, 8189, Jendouba, Tunisia
- Sadok Ben Yahia
  - Department of Software Science, Tallinn University of Technology, Ehitajate tee-5, 12618, Tallinn, Estonia
  - The Maersk Mc-Kinney Moller Institute, University of Southern Denmark (Syddansk Universitet), Alsion 2, 6400, Sønderborg, Denmark
- Gayo Diallo
  - Bordeaux Population Health Inserm 1219, University of Bordeaux, rue Léo Saignat, 33000, Bordeaux, France
2. Ma J, Li C, Zhang Y, Wang Z, Li S, Guo Y, Zhang L, Liu H, Gao X, Song J. MULGA, a unified multi-view graph autoencoder-based approach for identifying drug-protein interaction and drug repositioning. Bioinformatics 2023; 39:btad524. [PMID: 37610353 PMCID: PMC10518077 DOI: 10.1093/bioinformatics/btad524] [Received: 05/27/2023] [Revised: 07/26/2023] [Accepted: 08/22/2023] [Indexed: 08/24/2023] Open access
Abstract
MOTIVATION Identifying drug-protein interactions (DPIs) is a critical step in drug repositioning, which allows reuse of approved drugs that may be effective for treating a different disease and thereby alleviates the challenges of new drug development. Although a great variety of computational approaches for DPI prediction have been proposed, key challenges, such as extendable and unbiased similarity calculation, heterogeneous information utilization, and reliable negative sample selection, remain to be addressed.
RESULTS To address these issues, we propose a novel, unified multi-view graph autoencoder framework, termed MULGA, for both DPI and drug repositioning predictions. MULGA features: (i) a multi-view learning technique to effectively learn authentic drug affinity and target affinity matrices; (ii) a graph autoencoder to infer missing DPIs; and (iii) a new "guilty-by-association"-based negative sampling approach for selecting highly reliable non-DPIs. Benchmark experiments demonstrate that MULGA outperforms state-of-the-art methods in DPI prediction, and the ablation studies verify the effectiveness of each proposed component. Importantly, we highlight the top drugs shortlisted by MULGA that target the spike glycoprotein of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), offering additional insights into, and potentially useful treatment options for, COVID-19. Together with the availability of datasets and source codes, we envision that MULGA can be explored as a useful tool for DPI prediction and drug repositioning.
AVAILABILITY AND IMPLEMENTATION MULGA is publicly available for academic purposes at https://github.com/jianiM/MULGA/.
Affiliation(s)
- Jiani Ma
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Chen Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Yiwen Zhang
  - Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Zhikang Wang
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
- Shanshan Li
  - Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Yuming Guo
  - Climate, Air Quality Research Unit, School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia
- Lin Zhang
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Hui Liu
  - School of Information and Control Engineering, China University of Mining and Technology, Xuzhou 221116, China
- Xin Gao
  - KAUST Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Wenzhou Medical University-Monash Biomedicine Discovery Institute (BDI) Alliance in Clinical and Experimental Biomedicine, Wenzhou 325035, China
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia