1
|
Djeddi WE, Hermi K, Ben Yahia S, Diallo G. Advancing drug-target interaction prediction: a comprehensive graph-based approach integrating knowledge graph embedding and ProtBert pretraining. BMC Bioinformatics 2023; 24:488. [PMID: 38114937 PMCID: PMC10731821 DOI: 10.1186/s12859-023-05593-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/29/2023] [Accepted: 11/30/2023] [Indexed: 12/21/2023] Open
Abstract
BACKGROUND The pharmaceutical field faces a significant challenge in validating drug target interactions (DTIs) due to the time and cost involved, leading to only a fraction being experimentally verified. To expedite drug discovery, accurate computational methods are essential for predicting potential interactions. Recently, machine learning techniques, particularly graph-based methods, have gained prominence. These methods utilize networks of drugs and targets, employing knowledge graph embedding (KGE) to represent structured information from knowledge graphs in a continuous vector space. This phenomenon highlights the growing inclination to utilize graph topologies as a means to improve the precision of predicting DTIs, hence addressing the pressing requirement for effective computational methodologies in the field of drug discovery. RESULTS The present study presents a novel approach called DTIOG for the prediction of DTIs. The methodology employed in this study involves the utilization of a KGE strategy, together with the incorporation of contextual information obtained from protein sequences. More specifically, the study makes use of Protein Bidirectional Encoder Representations from Transformers (ProtBERT) for this purpose. DTIOG utilizes a two-step process to compute embedding vectors using KGE techniques. Additionally, it employs ProtBERT to determine target-target similarity. Different similarity measures, such as Cosine similarity or Euclidean distance, are utilized in the prediction procedure. In addition to the contextual embedding, the proposed unique approach incorporates local representations obtained from the Simplified Molecular Input Line Entry Specification (SMILES) of drugs and the amino acid sequences of protein targets. CONCLUSIONS The effectiveness of the proposed approach was assessed through extensive experimentation on datasets pertaining to Enzymes, Ion Channels, and G-protein-coupled Receptors. The remarkable efficacy of DTIOG was showcased through the utilization of diverse similarity measures in order to calculate the similarities between drugs and targets. The combination of these factors, along with the incorporation of various classifiers, enabled the model to outperform existing algorithms in its ability to predict DTIs. The consistent observation of this advantage across all datasets underlines the robustness and accuracy of DTIOG in the domain of DTIs. Additionally, our case study suggests that the DTIOG can serve as a valuable tool for discovering new DTIs.
Collapse
Affiliation(s)
- Warith Eddine Djeddi
- LR11ES14, Faculty of Sciences of Tunis, University of Tunis El Manar, Campus Universitaire, 2092, Tunis, Tunisia.
- High Institute of Informatics in Kef, University of Jendouba, Saleh Ayech, 8189, Jendouba, Tunisia.
| | - Khalil Hermi
- High Institute of Informatics in Kef, University of Jendouba, Saleh Ayech, 8189, Jendouba, Tunisia
| | - Sadok Ben Yahia
- Department of Software Science, Tallinn University of Technology, Ehitajate tee-5, 12618, Tallinn, Estonia
- The Maersk Mc-Kinney Moller Institute, Southern Syddansk Universitet, Alsion 2, 6400, Sønderborg, Denmark
| | - Gayo Diallo
- Bordeaux Population Health Inserm 1219, University of Bordeaux, rue Léo Saignat, 33000, Bordeaux, France
| |
Collapse
|
2
|
Tan M, Xia J, Luo H, Meng G, Zhu Z. Applying the digital data and the bioinformatics tools in SARS-CoV-2 research. Comput Struct Biotechnol J 2023; 21:4697-4705. [PMID: 37841328 PMCID: PMC10568291 DOI: 10.1016/j.csbj.2023.09.044] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/12/2023] [Revised: 09/29/2023] [Accepted: 09/29/2023] [Indexed: 10/17/2023] Open
Abstract
Bioinformatics has been playing a crucial role in the scientific progress to fight against the pandemic of the coronavirus disease 2019 (COVID-19) caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The advances in novel algorithms, mega data technology, artificial intelligence and deep learning assisted the development of novel bioinformatics tools to analyze daily increasing SARS-CoV-2 data in the past years. These tools were applied in genomic analyses, evolutionary tracking, epidemiological analyses, protein structure interpretation, studies in virus-host interaction and clinical performance. To promote the in-silico analysis in the future, we conducted a review which summarized the databases, web services and software applied in SARS-CoV-2 research. Those digital resources applied in SARS-CoV-2 research may also potentially contribute to the research in other coronavirus and non-coronavirus viruses.
Collapse
Affiliation(s)
- Meng Tan
- School of Life Sciences, Chongqing University, Chongqing, China
| | - Jiaxin Xia
- School of Life Sciences, Chongqing University, Chongqing, China
| | - Haitao Luo
- School of Life Sciences, Chongqing University, Chongqing, China
| | - Geng Meng
- College of Veterinary Medicine, China Agricultural University, Beijing, China
| | - Zhenglin Zhu
- School of Life Sciences, Chongqing University, Chongqing, China
| |
Collapse
|
3
|
Doğan T, Atas H, Joshi V, Atakan A, Rifaioglu A, Nalbat E, Nightingale A, Saidi R, Volynkin V, Zellner H, Cetin-Atalay R, Martin M, Atalay V. CROssBAR: comprehensive resource of biomedical relations with knowledge graph representations. Nucleic Acids Res 2021; 49:e96. [PMID: 34181736 PMCID: PMC8450100 DOI: 10.1093/nar/gkab543] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2020] [Revised: 04/11/2021] [Accepted: 06/10/2021] [Indexed: 12/11/2022] Open
Abstract
Systemic analysis of available large-scale biological/biomedical data is critical for studying biological mechanisms, and developing novel and effective treatment approaches against diseases. However, different layers of the available data are produced using different technologies and scattered across individual computational resources without any explicit connections to each other, which hinders extensive and integrative multi-omics-based analysis. We aimed to address this issue by developing a new data integration/representation methodology and its application by constructing a biological data resource. CROssBAR is a comprehensive system that integrates large-scale biological/biomedical data from various resources and stores them in a NoSQL database. CROssBAR is enriched with the deep-learning-based prediction of relationships between numerous data entries, which is followed by the rigorous analysis of the enriched data to obtain biologically meaningful modules. These complex sets of entities and relationships are displayed to users via easy-to-interpret, interactive knowledge graphs within an open-access service. CROssBAR knowledge graphs incorporate relevant genes-proteins, molecular interactions, pathways, phenotypes, diseases, as well as known/predicted drugs and bioactive compounds, and they are constructed on-the-fly based on simple non-programmatic user queries. These intensely processed heterogeneous networks are expected to aid systems-level research, especially to infer biological mechanisms in relation to genes, proteins, their ligands, and diseases.
Collapse
Affiliation(s)
- Tunca Doğan
- Department of Computer Engineering, Hacettepe University, Ankara 06800, Turkey
- Institute of Informatics, Hacettepe University, Ankara 06800, Turkey
- Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Heval Atas
- Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
| | - Vishal Joshi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Ahmet Atakan
- Department of Computer Engineering, METU, Ankara 06800, Turkey
- Department of Computer Engineering, EBYU, Erzincan 24002, Turkey
| | - Ahmet Sureyya Rifaioglu
- Department of Computer Engineering, METU, Ankara 06800, Turkey
- Department of Computer Engineering, İskenderun Technical University, Hatay 31200, Turkey
| | - Esra Nalbat
- Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
| | - Andrew Nightingale
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Rabie Saidi
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Vladimir Volynkin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Hermann Zellner
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Rengul Cetin-Atalay
- Cancer Systems Biology Laboratory, Graduate School of Informatics, METU, Ankara 06800, Turkey
- Section of Pulmonary and Critical Care Medicine, University of Chicago, Chicago, IL 60637, USA
| | - Maria Martin
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL–EBI), Hinxton, Cambridgeshire CB10 1SD, UK
| | - Volkan Atalay
- Department of Computer Engineering, METU, Ankara 06800, Turkey
| |
Collapse
|