1
Alvarez-Mamani E, Dechant R, Beltran-Castañón CA, Ibáñez AJ. Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 2024; 25:1. [PMID: 38166530 PMCID: PMC10763173 DOI: 10.1186/s12859-023-05612-6] [Received: 01/16/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024] Open access
Abstract
Graph embedding techniques use deep learning algorithms to solve data-analysis problems such as node classification, link prediction, community detection, and visualization. Although typically used to predict friendships on social media, graph embedding techniques have found several applications in biomedical data analysis. While these approaches remain computationally demanding, developments over recent years have facilitated their application to biomedical data and may thus help advance biological discovery. In this review, we therefore discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
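The link-prediction use of graph embeddings mentioned above can be illustrated with a minimal sketch. It assumes a toy undirected graph stored as an adjacency dictionary (all node names are hypothetical) and deliberately stops short of a learned embedding: it generates uniform random walks, as in the first stage of DeepWalk-style methods, and represents each node by counts of the nodes that co-occur with it within a small window, which is the signal a skip-gram model would compress into low-dimensional vectors. Candidate links are then scored by cosine similarity. This is an illustrative stand-in, not the method of the reviewed work.

```python
import random
from collections import Counter
from math import sqrt


def random_walks(adj, walks_per_node=10, walk_length=8, seed=0):
    """Uniform random walks from every node (the first stage of DeepWalk-style methods)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adj:
            walk = [start]
            # Extend the walk until it reaches the target length or hits a dead end.
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


def cooccurrence_vectors(walks, window=2):
    """For each node, count the nodes seen within `window` steps of it on any walk."""
    vectors = {}
    for walk in walks:
        for i, node in enumerate(walk):
            context = vectors.setdefault(node, Counter())
            for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                if j != i:
                    context[walk[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_score(vectors, u, v):
    """Score a candidate edge (u, v) by how similar the two nodes' walk contexts are."""
    return cosine(vectors.get(u, Counter()), vectors.get(v, Counter()))


if __name__ == "__main__":
    # Hypothetical toy graph: two triangles (A, B, C) and (D, E, F) joined by the edge C-D.
    toy = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
           "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"]}
    vecs = cooccurrence_vectors(random_walks(toy))
    print(link_score(vecs, "A", "B"), link_score(vecs, "A", "F"))
```

Pairs inside the same triangle should score higher than pairs on opposite sides of the bridge; a real pipeline would replace the raw counts with a trained, low-dimensional embedding.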
Affiliation(s)
- Edwin Alvarez-Mamani
- Engineering Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Reinhard Dechant
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Calico Life Sciences, 1170 Veterans Blvd, San Francisco, CA, 94080, USA
- Alfredo J Ibáñez
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Science Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
2
Tayal S, Bhatnagar S. Role of molecular mimicry in the SARS-CoV-2-human interactome for pathogenesis of cardiovascular diseases: An update to ImitateDB. Comput Biol Chem 2023; 106:107919. [PMID: 37463554 DOI: 10.1016/j.compbiolchem.2023.107919] [Received: 01/15/2023] [Revised: 06/13/2023] [Accepted: 07/06/2023] [Indexed: 07/20/2023]
Abstract
Mimicry of host proteins is a strategy employed by pathogens to hijack host functions. Domain and motif mimicry was explored in the experimental and predicted SARS-CoV-2-human interactome. The first interactors of the host proteins were also added to capture the continuum of interactions. The domains and motifs of the proteins were annotated using NCBI CD Search and ScanProsite, respectively. Host and pathogen proteins that share a common host interactor and a similar domain or motif constitute a mimicry pair, indicating either global structural similarity (domain mimicry pair; DMP) or local sequence similarity (motif mimicry pair; MMP). In total, 593 DMPs and 702,472 MMPs were determined. AAA, DEXDc, and Macro domains were frequent among DMPs, whereas glycosylation, myristoylation, and RGD motifs were abundant among MMPs. The proteins involved in mimicry were visualised as a SARS-CoV-2 mimicry interaction network. The host proteins were enriched in multiple cardiovascular disease (CVD) pathways, indicating a role for mimicry in COVID-19-associated CVDs. Bridging nodes were identified as potential drug targets. Approved antihypertensive and anti-inflammatory drugs are proposed for repurposing against COVID-19-associated CVDs. The SARS-CoV-2 mimicry data have been updated in ImitateDB (http://imitatedb.sblab-nsit.net/SARSCoV2Mimicry). Determining the key mechanisms, proteins, pathways, drug targets, and repurposing candidates is critical for developing therapeutics for SARS-CoV-2-associated CVDs.
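The pairing rule described above can be sketched as follows, under simplifying assumptions: annotations are represented as sets of labels (for example, domain names from a CD Search run or ScanProsite motif identifiers), "similar" is reduced to sharing an identical label, and the function name and all protein identifiers in the example are hypothetical. It is an illustration of the rule, not the authors' pipeline.

```python
def mimicry_pairs(interactors, domains, motifs, pathogen, host):
    """Classify pathogen-host protein pairs as domain (DMP) or motif (MMP) mimicry pairs.

    interactors: protein -> set of host proteins it interacts with
    domains, motifs: protein -> set of annotation labels
    A pair qualifies only if the two proteins share at least one host interactor.
    """
    dmps, mmps = [], []
    for p in sorted(pathogen):
        for h in sorted(host):
            # Requirement 1: a common host interactor.
            if not (interactors.get(p, set()) & interactors.get(h, set())):
                continue
            # Requirement 2: a shared domain (global) or motif (local) label.
            shared_domains = domains.get(p, set()) & domains.get(h, set())
            shared_motifs = motifs.get(p, set()) & motifs.get(h, set())
            if shared_domains:
                dmps.append((p, h, frozenset(shared_domains)))
            if shared_motifs:
                mmps.append((p, h, frozenset(shared_motifs)))
    return dmps, mmps


if __name__ == "__main__":
    # Hypothetical proteins: V1 is viral; H1, H2, H8, H9 are host proteins.
    interactors = {"V1": {"H9"}, "H1": {"H9"}, "H2": {"H8"}}
    domains = {"V1": {"Macro"}, "H1": {"Macro"}, "H2": {"Macro"}}
    motifs = {"V1": {"RGD"}, "H1": {"N-myristoylation"}}
    print(mimicry_pairs(interactors, domains, motifs, {"V1"}, {"H1", "H2"}))
```

In this toy input, H2 shares the Macro label with V1 but no host interactor, so only (V1, H1) is reported, and only as a DMP because their motif labels differ.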
Affiliation(s)
- Sonali Tayal
- Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi 110078, India
- Sonika Bhatnagar
- Computational and Structural Biology Laboratory, Department of Biological Sciences and Engineering, Netaji Subhas University of Technology, Dwarka, New Delhi 110078, India
3
Nissan N, Hooker J, Arezza E, Dick K, Golshani A, Mimee B, Cober E, Green J, Samanfar B. Large-scale data mining pipeline for identifying novel soybean genes involved in resistance against the soybean cyst nematode. Frontiers in Bioinformatics 2023; 3:1199675. [PMID: 37409347 PMCID: PMC10319130 DOI: 10.3389/fbinf.2023.1199675] [Received: 04/03/2023] [Accepted: 05/31/2023] [Indexed: 07/07/2023] Open access
Abstract
The soybean cyst nematode (SCN) [Heterodera glycines Ichinohe] is a devastating pathogen of soybean [Glycine max (L.) Merr.] that is rapidly becoming a global economic issue. Two loci conferring SCN resistance, Rhg1 and Rhg4, have been identified in soybean; however, the protection they offer is declining. It is therefore imperative to identify additional mechanisms of SCN resistance. In this paper, we develop a bioinformatics pipeline that identifies protein-protein interactions related to SCN resistance by mining massive-scale datasets. The pipeline combines two leading sequence-based protein-protein interaction predictors, the Protein-protein Interaction Prediction Engine (PIPE4) and Scoring PRotein INTeractions (SPRINT), to predict high-confidence interactomes. First, we predicted the top soybean interacting partners of the Rhg1 and Rhg4 proteins. The predictions of PIPE4 and SPRINT overlapped in 58 soybean interacting partners, 19 of which had GO terms related to defense. Starting from the top predicted interactors of Rhg1 and Rhg4, we then applied a proteome-wide, in silico "guilt by association" approach to identify novel soybean genes that may be involved in SCN resistance. This pipeline identified 1,082 candidate genes whose local interactomes overlap significantly with the Rhg1 and Rhg4 interactomes. Using GO enrichment tools, we highlighted many important genes, including five with GO terms related to response to nematode (GO:0009624): Glyma.18G029000, Glyma.11G228300, Glyma.08G120500, Glyma.17G152300, and Glyma.08G265700. This study is the first of its kind to predict interacting partners of the known resistance proteins Rhg1 and Rhg4, yielding an analysis pipeline that enables researchers to focus their search for novel SCN resistance genes in soybean on high-confidence targets.
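The two steps described above (keeping the partners that both predictors rank highly, then asking whether a candidate's predicted local interactome overlaps a reference interactome more than chance would suggest) can be roughly sketched as below. The abstract does not state the cutoff or the statistical test used: the top-k intersection and the one-sided hypergeometric test are assumptions, the score dictionaries merely stand in for PIPE4 and SPRINT outputs, and all function names are hypothetical.

```python
from math import comb


def top_k(scores, k):
    """Names of the k highest-scoring partners (ties broken alphabetically)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return {name for name, _ in ranked[:k]}


def consensus_partners(pipe_scores, sprint_scores, k):
    """Partners ranked in the top k by both predictors."""
    return top_k(pipe_scores, k) & top_k(sprint_scores, k)


def overlap_p_value(candidate, reference, proteome_size):
    """One-sided hypergeometric P(overlap >= observed).

    Models drawing len(candidate) proteins at random from a proteome of
    proteome_size proteins, of which len(reference) belong to the reference set.
    """
    observed = len(candidate & reference)
    n, big_k, big_n = len(candidate), len(reference), proteome_size
    tail = sum(comb(big_k, i) * comb(big_n - big_k, n - i)
               for i in range(observed, min(n, big_k) + 1))
    return tail / comb(big_n, n)


if __name__ == "__main__":
    # Hypothetical scores and gene sets, for illustration only.
    print(consensus_partners({"a": 0.9, "b": 0.8, "c": 0.1}, {"b": 5, "a": 4, "c": 9}, 2))
    print(overlap_p_value({"g1", "g2", "g3"}, {"g1", "g2", "g3"}, 10))
```

A small p-value flags a candidate whose neighbourhood is unusually similar to the reference interactome, which is the essence of a guilt-by-association screen; a proteome-wide run would also need multiple-testing correction.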
Affiliation(s)
- Nour Nissan
- Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
- Julia Hooker
- Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
- Eric Arezza
- Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Kevin Dick
- Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Ashkan Golshani
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
- Benjamin Mimee
- Agriculture and Agri-Food Canada, Saint-Jean-sur-Richelieu Research and Development Centre, Saint-Jean-sur-Richelieu, QC, Canada
- Elroy Cober
- Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- James Green
- Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
- Bahram Samanfar
- Agriculture and Agri-Food Canada, Ottawa Research and Development Centre, Ottawa, ON, Canada
- Department of Biology and Ottawa Institute of Systems Biology, Carleton University, Ottawa, ON, Canada
4
Reciprocal perspective as a super learner improves drug-target interaction prediction (MUSDTI). Sci Rep 2022; 12:13237. [PMID: 35918366 PMCID: PMC9344797 DOI: 10.1038/s41598-022-16493-9] [Received: 02/21/2022] [Accepted: 07/11/2022] [Indexed: 11/08/2022] Open access
Abstract
The identification of novel drug-target interactions (DTIs) is critical to drug discovery and drug repurposing for addressing contemporary medical and public health challenges presented by emergent diseases. Historically, computational methods have framed DTI prediction as a binary classification problem (whether or not a drug physically interacts with a given protein target); however, framing the problem instead as regression-based prediction of the physicochemical binding affinity is more meaningful. With growing databases of experimentally derived drug-target interactions (e.g., Davis, BindingDB, and KIBA), deep learning-based DTI predictors can be effectively leveraged to achieve state-of-the-art (SOTA) performance. In this work, we formulated a DTI competition as part of the coursework for a senior undergraduate machine learning course and challenged students to generate component DTI models that might surpass SOTA models, with the aim of effectively combining these component models into a meta-model using the Reciprocal Perspective (RP) multi-view learning framework. Following 6 weeks of concerted effort, 28 student-produced component deep-learning DTI models were leveraged in this work to produce a new SOTA RP-DTI model, denoted the Meta Undergraduate Student DTI (MUSDTI) model. Through a series of experiments, we demonstrate that (1) RP can considerably improve SOTA DTI prediction, (2) our new double-cold experimental design is more appropriate for emergent DTI challenges, (3) our novel MUSDTI meta-model outperforms SOTA models, (4) RP can improve upon individual models as an ensembling method, and (5) RP can be utilized for low-computation transfer learning. This work introduces a number of important insights for DTI prediction and, more generally, for sequence-based pairwise prediction.
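Two ideas from this abstract can be sketched in a few lines. The double-cold split holds out a set of drugs and a set of targets together, so every test pair involves a drug and a target never seen during training; pairs that mix a held-out entity with a training entity are discarded. The second function is a greatly simplified take on the Reciprocal Perspective idea of judging a pair's score against each partner's one-to-all score distribution. The published RP framework extracts a richer feature set and trains a meta-learner on it, both omitted here; the function names, the 20% test fraction, and the percentile feature are assumptions for illustration.

```python
import random


def double_cold_split(pairs, test_fraction=0.2, seed=0):
    """Split (drug, target, affinity) records so test drugs AND test targets are unseen in training."""
    rng = random.Random(seed)
    drugs = sorted({d for d, _, _ in pairs})
    targets = sorted({t for _, t, _ in pairs})
    test_drugs = set(rng.sample(drugs, max(1, int(len(drugs) * test_fraction))))
    test_targets = set(rng.sample(targets, max(1, int(len(targets) * test_fraction))))
    train, test = [], []
    for d, t, y in pairs:
        if d in test_drugs and t in test_targets:
            test.append((d, t, y))
        elif d not in test_drugs and t not in test_targets:
            train.append((d, t, y))
        # Mixed pairs (one seen, one unseen entity) are dropped from both sets.
    return train, test


def reciprocal_rank_features(scores, drug, target):
    """Percentile of a pair's score within the drug's and within the target's score lists.

    scores: (drug, target) -> predicted affinity from one component model.
    """
    s = scores[(drug, target)]
    drug_scores = [v for (d, _), v in scores.items() if d == drug]
    target_scores = [v for (_, t), v in scores.items() if t == target]

    def percentile(values):
        return sum(v <= s for v in values) / len(values)

    return percentile(drug_scores), percentile(target_scores)


if __name__ == "__main__":
    # Hypothetical scores from a single component model.
    scores = {("d1", "t1"): 0.9, ("d1", "t2"): 0.1, ("d2", "t1"): 0.5}
    print(reciprocal_rank_features(scores, "d2", "t1"))
```

In an ensembling setting, such per-pair features from each component model would be concatenated and passed to a meta-model; that step is left out to keep the sketch dependency-free.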
5
Aghdam R, Habibi M, Taheri G. Using informative features in machine learning based method for COVID-19 drug repurposing. J Cheminform 2021; 13:70. [PMID: 34544500 PMCID: PMC8451172 DOI: 10.1186/s13321-021-00553-9] [Received: 03/24/2021] [Accepted: 09/06/2021] [Indexed: 01/14/2023] Open access
Abstract
Coronavirus disease 2019 (COVID-19) is caused by a novel virus named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). This virus has caused a large number of deaths and millions of confirmed cases worldwide, posing a serious threat to public health. However, there are no specific therapies or drugs available for COVID-19 treatment. While new drug discovery is a long process, repurposing available drugs for COVID-19 can help identify treatments with known clinical profiles. Computational drug repurposing methods can reduce the cost, time, and risk of drug toxicity. In this work, we build a graph representing a COVID-19-related biological network of virus targets and their associated biological processes. We then select essential proteins in this network, that is, proteins whose removal leads to a major disruption of the network. From these essential proteins, our method chooses 93 proteins related to COVID-19 pathology. Next, we propose multiple informative features based on drug-target and protein-protein interaction information. Using these informative features, we find five appropriate clusters of drugs that contain candidate COVID-19 treatments. To evaluate our results, we provide statistical and clinical evidence for our candidate drugs: 80% of the proposed candidate drugs have been investigated in other studies and clinical trials.
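The selection of essential proteins can be approximated as below. The abstract does not specify how network disruption is measured; this sketch assumes it is the drop in the size of the largest connected component when a single node is removed, on an undirected graph stored as a symmetric adjacency dictionary. Function and node names are hypothetical.

```python
from collections import deque


def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component after deleting the nodes in `removed`."""
    seen = set(removed)  # treat removed nodes as already visited so BFS never enters them
    best = 0
    for start in adj:
        if start in seen:
            continue
        size = 0
        queue = deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def rank_by_disruption(adj):
    """Rank nodes by how much their single removal shrinks the largest component."""
    baseline = largest_component_size(adj)
    drops = {n: baseline - largest_component_size(adj, {n}) for n in adj}
    return sorted(drops.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical path graph A-B-C-D-E: the central node C is the most disruptive.
    path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
    print(rank_by_disruption(path))
```

Single-node removal is the simplest disruption score; other centrality or attack-tolerance measures could be substituted without changing the overall ranking-then-filtering idea.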
Affiliation(s)
- Rosa Aghdam
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
- Mahnaz Habibi
- Department of Mathematics, Qazvin Branch, Islamic Azad University, Qazvin, Iran
- Golnaz Taheri
- Department of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Science for Life Laboratory, Stockholm, Sweden