1
Alvarez-Mamani E, Dechant R, Beltran-Castañón CA, Ibáñez AJ. Graph embedding on mass spectrometry- and sequencing-based biomedical data. BMC Bioinformatics 2024; 25:1. [PMID: 38166530 PMCID: PMC10763173 DOI: 10.1186/s12859-023-05612-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/16/2023] [Accepted: 12/11/2023] [Indexed: 01/04/2024] Open
Abstract
Graph embedding techniques use deep learning algorithms in data analysis to solve problems such as node classification, link prediction, community detection, and visualization. Although typically used in the context of guessing friendships in social media, several applications for graph embedding techniques in biomedical data analysis have emerged. While these approaches remain computationally demanding, several developments in recent years facilitate their application to biomedical data and thus may help advance biological discoveries. Therefore, in this review, we discuss the principles of graph embedding techniques and explore their usefulness for understanding biological network data derived from mass spectrometry and sequencing experiments, the current workhorses of systems biology studies. In particular, we focus on recent examples of characterizing protein-protein interaction networks and predicting novel drug functions.
Affiliation(s)
- Edwin Alvarez-Mamani
- Engineering Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Reinhard Dechant
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Calico Life Sciences, 1170 Veterans Blvd, San Francisco, CA, 94080, USA
- Alfredo J Ibáñez
- Institute for Omics Sciences and Applied Biotechnology (ICOBA PUCP), Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
- Science Department, Pontificia Universidad Católica del Perú, San Miguel, Lima, Peru
2
Schiano di Cola V, Chiaro D, Prezioso E, Izzo S, Giampaolo F. Insight Extraction From E-Health Bookings by Means of Hypergraph and Machine Learning. IEEE J Biomed Health Inform 2023; 27:4649-4659. [PMID: 37018305 DOI: 10.1109/jbhi.2022.3233498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/04/2023]
Abstract
New technologies are transforming medicine, and this revolution starts with data. Usually, health services within public healthcare systems are accessed through a booking centre managed by local health authorities and controlled by the regional government. In this perspective, structuring e-health data through a Knowledge Graph (KG) approach can provide a feasible method to quickly and simply organize data and/or retrieve new information. Starting from raw health bookings data from the public healthcare system in Italy, a KG method is presented to support e-health services through the extraction of medical knowledge and novel insights. By exploiting graph embedding, which arranges the various attributes of the entities into the same vector space, we are able to apply Machine Learning (ML) techniques to the embedded vectors. The findings suggest that KGs could be used to assess patients' medical booking patterns through either unsupervised or supervised ML. In particular, the former can reveal the possible presence of hidden groups of entities that are not immediately apparent from the original legacy dataset structure. The latter, although the performance of the algorithms used is not very high, shows encouraging results in predicting a patient's likelihood of undergoing a particular medical visit within a year. However, many technological advances remain to be made, especially in graph database technologies and graph embedding algorithms.
3
Cenikj G, Strojnik L, Angelski R, Ogrinc N, Koroušić Seljak B, Eftimov T. From language models to large-scale food and biomedical knowledge graphs. Sci Rep 2023; 13:7815. [PMID: 37188766 DOI: 10.1038/s41598-023-34981-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Accepted: 05/10/2023] [Indexed: 05/17/2023] Open
Abstract
Knowledge about the interactions between dietary and biomedical factors is scattered throughout countless research articles in an unstructured form (e.g., text, images, etc.) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist; however, they require further extension with relations between food and biomedical entities. In this study, we evaluate the performance of three state-of-the-art relation-mining pipelines (FooDis, FoodChem and ChemDis) which extract relations between food, chemical and disease entities from textual data. We perform two case studies, where relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision of around 70%, making new discoveries available to domain experts with reduced human effort, since the domain experts need only evaluate the results instead of finding and reading all new scientific papers.
Affiliation(s)
- Gjorgjina Cenikj
- Jožef Stefan Institute, Ljubljana, 1000, Slovenia.
- Jožef Stefan International Postgraduate School, Ljubljana, 1000, Slovenia.
- Nives Ogrinc
- Jožef Stefan Institute, Ljubljana, 1000, Slovenia
- Tome Eftimov
- Jožef Stefan Institute, Ljubljana, 1000, Slovenia
4
Rashid J, Kim J, Hussain A, Naseem U, Juneja S. A novel multiple kernel fuzzy topic modeling technique for biomedical data. BMC Bioinformatics 2022; 23:275. [PMID: 35820793 PMCID: PMC9277941 DOI: 10.1186/s12859-022-04780-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/24/2022] [Accepted: 06/08/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Text mining in the biomedical field has received much attention and is regarded as an important research area, since a lot of biomedical data is in text format. Topic modeling is one of the popular text mining methods used to discover hidden semantic structures, so-called topics. However, discovering topics from biomedical data is a challenging task due to the sparsity, redundancy, and unstructured format of the data. METHODS In this paper, we propose a novel multiple kernel fuzzy topic modeling (MKFTM) technique that uses fusion probabilistic inverse document frequency and a multiple kernel fuzzy c-means clustering algorithm for biomedical text mining. In detail, the proposed fusion probabilistic inverse document frequency method is used to estimate the weights of global terms, while MKFTM generates frequencies of local and global terms with bag-of-words. In addition, principal component analysis is applied to eliminate higher-order negative effects for term weights. RESULTS Extensive experiments were conducted on six biomedical datasets. MKFTM achieved the highest classification accuracies of 99.04%, 99.62%, 99.69%, and 99.61% on the Muchmore Springer dataset and 94.10%, 89.45%, 92.91%, and 90.35% on the Ohsumed dataset. The CH index value of MKFTM is higher, which shows that its clustering performance is better than that of state-of-the-art topic models. CONCLUSION The results confirm that the proposed MKFTM approach handles the sparsity and redundancy problems in biomedical text documents very efficiently. MKFTM discovers semantically relevant topics with high accuracy for biomedical documents. It gives better results for classification and clustering of biomedical documents. MKFTM is a new approach to topic modeling, which has the flexibility to work with a variety of clustering methods.
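To make the clustering step mentioned in this abstract more concrete, the following is a minimal sketch of plain fuzzy c-means applied to a bag-of-words matrix. It is not the MKFTM method itself: the multiple-kernel extension, the fusion probabilistic inverse document frequency weighting, and the PCA step are all omitted, and the toy term counts, the function name `fuzzy_c_means`, and its parameters are assumptions made for illustration only.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means (no kernels); returns soft memberships and centers.

    x is an (n_docs, n_terms) matrix and m > 1 is the fuzzifier.
    """
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalised so each row sums to 1.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Centers are membership-weighted means of the documents.
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Euclidean distance from every document to every center.
        dist = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij is proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
    return u, centers

# Hypothetical bag-of-words counts: documents 0-1 share one vocabulary,
# documents 2-3 share another.
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

memberships, centers = fuzzy_c_means(counts, n_clusters=2)
labels = memberships.argmax(axis=1)
print(np.round(memberships, 3))
print(labels)
```

Each row of `memberships` gives the degree to which a document belongs to each cluster, which is what distinguishes the fuzzy approach from hard assignments such as k-means.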
Affiliation(s)
- Junaid Rashid
- Department of Computer Science and Engineering, Kongju National University, Cheonan, 31080, Korea
- Jungeun Kim
- Department of Software, Department of Computer Science and Engineering, Kongju National University, Cheonan, 31080, Korea
- Amir Hussain
- Data Science and Cyber Analytics Research Group, Edinburgh Napier University, Edinburgh, EH11 4DY, UK
- Usman Naseem
- School of Computer Science, University of Sydney, Sydney, Australia
- Sapna Juneja
- Department of Computer Science, KIET Group of Institutions, Delhi NCR, Ghaziabad, India
5
Pavel A, del Giudice G, Federico A, Di Lieto A, Kinaret PAS, Serra A, Greco D. Integrated network analysis reveals new genes suggesting COVID-19 chronic effects and treatment. Brief Bioinform 2021; 22:1430-1441. [PMID: 33569598 PMCID: PMC7929418 DOI: 10.1093/bib/bbaa417] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Received: 08/03/2020] [Revised: 11/13/2020] [Accepted: 12/19/2020] [Indexed: 01/08/2023] Open
Abstract
The COVID-19 disease led to an unprecedented health emergency, still ongoing worldwide. Given the lack of a vaccine or a clear therapeutic strategy to counteract the infection as well as its secondary effects, there is currently a pressing need to generate new insights into the SARS-CoV-2 induced host response. Biomedical data can help to investigate new aspects of the COVID-19 pathogenesis, but source heterogeneity represents a major drawback and limitation. In this work, we applied data integration methods to develop a Unified Knowledge Space (UKS) and used it to identify a new set of genes associated with SARS-CoV-2 host response, both in vitro and in vivo. Functional analysis of these genes reveals possible long-term systemic effects of the infection, such as vascular remodelling and fibrosis. Finally, we identified a set of potentially relevant drugs targeting proteins involved in multiple steps of the host response to the virus.
Affiliation(s)
- Alisa Pavel
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Giusy del Giudice
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Antonio Federico
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Antonio Di Lieto
- Department of Forensic Psychiatry, Aarhus University, Aarhus, Denmark
- Pia A S Kinaret
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
- Angela Serra
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Dario Greco
- Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
- BioMediTech Institute, Tampere University, Tampere, Finland
- Institute of Biotechnology, University of Helsinki, Helsinki, Finland
6
Li X, Rousseau JF, Ding Y, Song M, Lu W. Understanding Drug Repurposing From the Perspective of Biomedical Entities and Their Evolution: Bibliographic Research Using Aspirin. JMIR Med Inform 2020; 8:e16739. [PMID: 32543442 PMCID: PMC7327595 DOI: 10.2196/16739] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 10/19/2019] [Revised: 01/08/2020] [Accepted: 03/31/2020] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Drug development is still a costly and time-consuming process with a low rate of success. Drug repurposing (DR) has attracted considerable attention because of its significant advantages over traditional approaches in terms of development time, cost, and safety. Entitymetrics, defined as bibliometric indicators based on biomedical entities (eg, diseases, drugs, and genes) studied in the biomedical literature, make it possible for researchers to measure knowledge evolution and the transfer of drug research. OBJECTIVE The purpose of this study was to understand DR from the perspective of biomedical entities (diseases, drugs, and genes) and their evolution. METHODS In the work reported in this paper, we extended the bibliometric indicators of biomedical entities mentioned in PubMed to detect potential patterns of biomedical entities in various phases of drug research and investigate the factors driving DR. We used aspirin (acetylsalicylic acid) as the subject of the study since it can be repurposed for many applications. We propose 4 easy, transparent measures based on entitymetrics to investigate DR for aspirin: Popularity Index (P1), Promising Index (P2), Prestige Index (P3), and Collaboration Index (CI). RESULTS We found that the maxima of P1, P3, and CI are closely associated with the different repurposing phases of aspirin. These metrics enabled us to observe the way in which biomedical entities interacted with the drug during the various phases of DR and to analyze the potential driving factors for DR at the entity level. P1 and CI were indicative of the dynamic trends of a specific biomedical entity over a long time period, while P2 was more sensitive to immediate changes. P3 reflected the early signs of the practical value of biomedical entities and could be valuable for tracking the research frontiers of a drug.
CONCLUSIONS In-depth studies of side effects and mechanisms, fierce market competition, and advanced life science technologies are driving factors for DR. This study showcases the way in which researchers can examine the evolution of DR using entitymetrics, an approach that can be valuable for enhancing decision making in the field of drug discovery and development.
Affiliation(s)
- Xin Li
- Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China
- School of Informatics, Computing, and Engineering, Indiana University, Bloomington, IN, United States
- Justin F Rousseau
- Department of Population Health and Department of Neurology, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Ying Ding
- School of Information, Dell Medical School, The University of Texas at Austin, Austin, TX, United States
- Min Song
- Department of Library and Information Science, Yonsei University, Seoul, Republic of Korea
- Wei Lu
- Information Retrieval and Knowledge Mining Laboratory, School of Information Management, Wuhan University, Wuhan, China
7
Cao Y, Peng H, Yu PS. Multi-information Source HIN for Medical Concept Embedding. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING 2020. [PMCID: PMC7206250 DOI: 10.1007/978-3-030-47436-2_30] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
8
Cao RM, Liu SY, Xu XK. Network embedding for link prediction: The pitfall and improvement. CHAOS (WOODBURY, N.Y.) 2019; 29:103102. [PMID: 31675842 DOI: 10.1063/1.5120724] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/20/2019] [Accepted: 09/10/2019] [Indexed: 06/10/2023]
Abstract
Link prediction plays a significant role in various applications of complex networks. The existing link prediction methods can be divided into two categories: structural similarity algorithms in the network domain and network embedding algorithms in the field of machine learning. However, few researchers focus on comparing these two categories of algorithms and exploring the intrinsic relationship between them. In this study, we systematically compare the two categories of algorithms and study the shortcomings of network embedding algorithms. The results indicate that network embedding algorithms perform poorly in short-path networks. Then, we explain the reasons for this phenomenon by computing the Euclidean distance distribution of node pairs after a given network has been embedded into a vector space. In the vector space of a short-path network, the distance distributions of existent and nonexistent links are often less distinguishable, which can sharply reduce the algorithmic performance. In contrast, structural similarity algorithms, which are not restricted by the distance function, can represent node similarity accurately in short-path networks. To address the above pitfall of network embedding, we propose a novel method for link prediction that supplements network embedding algorithms with local structural information. The experimental results suggest that our proposed algorithm achieves significant performance improvements in many empirical networks, especially in short-path networks. AUC and Precision can be improved by 36.7%-94.4% and 53.2%-207.2%, respectively.
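The two families of link-prediction scores contrasted in this abstract can be sketched on a toy graph. The example below is an illustration under stated assumptions, not the authors' algorithm: the graph is hypothetical, common-neighbour counting stands in for "structural similarity", and a simple spectral embedding of the adjacency matrix stands in for whichever network embedding method the paper evaluates. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3.
EDGES = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
N_NODES = 6

def adjacency(edges, n):
    """Build a symmetric 0/1 adjacency matrix from an undirected edge list."""
    adj = np.zeros((n, n), dtype=int)
    for i, j in edges:
        adj[i, j] = adj[j, i] = 1
    return adj

def common_neighbors(adj):
    """Structural similarity: entry (i, j) counts neighbours shared by i and j."""
    return adj @ adj

def spectral_embedding(adj, dim):
    """Stand-in embedding (an assumption, not the paper's method):
    eigenvectors of the adjacency matrix with the largest absolute
    eigenvalues, scaled by the square root of those absolute values."""
    vals, vecs = np.linalg.eigh(adj.astype(float))
    top = np.argsort(-np.abs(vals))[:dim]
    return vecs[:, top] * np.sqrt(np.abs(vals[top]))

def pairwise_distances(emb):
    """Euclidean distance between every pair of embedded nodes."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

adj = adjacency(EDGES, N_NODES)
cn = common_neighbors(adj)
dist = pairwise_distances(spectral_embedding(adj, dim=2))

# Split node pairs (upper triangle only) into existing and missing links,
# mirroring the distance-distribution comparison described in the abstract.
upper = np.triu(np.ones_like(adj, dtype=bool), k=1)
linked = dist[upper & (adj == 1)]
unlinked = dist[upper & (adj == 0)]

print("common neighbours of (0, 1):", cn[0, 1])
print("mean distance, existing links:", linked.mean())
print("mean distance, missing links:", unlinked.mean())
```

Comparing the `linked` and `unlinked` distance arrays is the kind of check the abstract describes: when the two distributions overlap heavily, a distance-based score has little power to separate true links from non-links, whereas the common-neighbour count does not depend on the embedding at all.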
Affiliation(s)
- Ren-Meng Cao
- College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
- Si-Yuan Liu
- College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China
- Xiao-Ke Xu
- College of Information and Communication Engineering, Dalian Minzu University, Dalian 116600, China