1
Ju W, Fang Z, Gu Y, Liu Z, Long Q, Qiao Z, Qin Y, Shen J, Sun F, Xiao Z, Yang J, Yuan J, Zhao Y, Wang Y, Luo X, Zhang M. A Comprehensive Survey on Deep Graph Representation Learning. Neural Netw 2024; 173:106207. [PMID: 38442651] [DOI: 10.1016/j.neunet.2024.106207]
Abstract
Graph representation learning aims to encode high-dimensional, sparse graph-structured data into low-dimensional dense vectors, a fundamental task studied widely across fields including machine learning and data mining. Classic graph embedding methods follow the basic idea that the embedding vectors of interconnected nodes should remain close, thereby preserving the structural information between nodes in the graph. However, this is sub-optimal because: (i) traditional methods have limited model capacity, which restricts learning performance; (ii) existing techniques typically rely on unsupervised learning strategies and fail to couple with the latest learning paradigms; (iii) representation learning and downstream tasks depend on each other and should be jointly enhanced. With the remarkable success of deep learning, deep graph representation learning has shown great potential and advantages over shallow (traditional) methods, and a large number of deep graph representation learning techniques, especially graph neural networks, have been proposed in the past decade. In this survey, we comprehensively review current deep graph representation learning algorithms and propose a new taxonomy of the state-of-the-art literature. Specifically, we systematically summarize the essential components of graph representation learning and categorize existing approaches by graph neural network architecture and by the most recent advanced learning paradigms. Moreover, this survey presents practical and promising applications of deep graph representation learning. Finally, we offer new perspectives and suggest challenging directions that deserve further investigation.
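The core operation the abstract alludes to — encoding nodes into low-dimensional dense vectors while keeping connected nodes close — can be illustrated with one symmetric-normalized propagation step, the building block of many graph neural networks. This is a minimal NumPy sketch on a hypothetical 4-node graph with random features and a random (untrained) weight matrix; all data here is illustrative, not from the survey.

```python
import numpy as np

# Toy 4-node graph: symmetric adjacency matrix (illustrative example)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)                        # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                  # high-dimensional node features
W = rng.normal(size=(8, 2))                  # weight matrix (random, untrained)

# One message-passing step + ReLU: each node aggregates its neighborhood,
# producing a low-dimensional dense embedding per node.
H = np.maximum(A_norm @ X @ W, 0.0)
print(H.shape)  # (4, 2)
```

Stacking such layers (with trained weights and a task loss) is what lets deep methods couple representation learning with downstream tasks, unlike the shallow embedding methods the abstract criticizes.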
Affiliation(s)
- Wei Ju: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Zheng Fang: School of Intelligence Science and Technology, Peking University, Beijing 100871, China
- Yiyang Gu: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Zequn Liu: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Qingqing Long: Computer Network Information Center, Chinese Academy of Sciences, Beijing 100086, China
- Ziyue Qiao: Artificial Intelligence Thrust, The Hong Kong University of Science and Technology, Guangzhou 511453, China
- Yifang Qin: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Jianhao Shen: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Fang Sun: Department of Computer Science, University of California, Los Angeles, 90095, USA
- Zhiping Xiao: Department of Computer Science, University of California, Los Angeles, 90095, USA
- Junwei Yang: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Jingyang Yuan: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Yusheng Zhao: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
- Yifan Wang: School of Information Technology & Management, University of International Business and Economics, Beijing 100029, China
- Xiao Luo: Department of Computer Science, University of California, Los Angeles, 90095, USA
- Ming Zhang: School of Computer Science, National Key Laboratory for Multimedia Information Processing, Peking University, Beijing 100871, China
2
Zou S, Liu Z, Wang K, Cao J, Liu S, Xiong W, Li S. A study on pharmaceutical text relationship extraction based on heterogeneous graph neural networks. Math Biosci Eng 2024; 21:1489-1507. [PMID: 38303474] [DOI: 10.3934/mbe.2024064]
Abstract
Effective information extraction from pharmaceutical texts is of great significance for clinical research. Ancient Chinese medicine texts have terse sentences and complex semantic relationships, and relationships may hold between heterogeneous entities. Current mainstream relation extraction models do not account for the associations between entities and relations during extraction, so the extracted semantic information is insufficient to form an effective structured representation. In this paper, we propose a heterogeneous graph neural network relation extraction model adapted to traditional Chinese medicine (TCM) text. First, the given sentence and the predefined relations are embedded with fine-tuned BERT (bidirectional encoder representations from transformers) word embeddings as model input. Second, a heterogeneous graph network is constructed that connects word, phrase, and relation nodes to obtain the hidden-layer representation. Then, in the decoding stage, a two-stage subject-object entity identification method is adopted: a binary classifier locates the start and end positions of TCM entities, identifying all subject-object entities in the sentence and finally forming TCM entity-relation groups. Experiments on the TCM relation extraction dataset show that the precision of the heterogeneous graph neural network with embedded BERT reaches 86.99% and the F1 value reaches 87.40%, improvements of 8.83% and 10.21% over the relation extraction models CNN, BERT-CNN, and Graph LSTM.
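The decoding stage described above — binary classifiers emitting start and end positions, then pairing them into entity spans — can be sketched independently of the full model. Below is a minimal span-decoding routine over hypothetical per-token probabilities (the tokens and scores are invented for illustration; a real system would obtain them from sigmoid heads on BERT outputs).

```python
import numpy as np

# Hypothetical 8-token sentence with per-token start/end probabilities,
# as the binary-classifier identifiers would produce them.
tokens = ["Ginseng", "tonifies", "qi", ",", "treats", "fatigue", ".", "[PAD]"]
start_p = np.array([0.9, 0.1, 0.2, 0.0, 0.1, 0.8, 0.0, 0.0])
end_p   = np.array([0.8, 0.1, 0.1, 0.0, 0.1, 0.9, 0.0, 0.0])

def decode_spans(start_p, end_p, thr=0.5):
    """Pair each start position above threshold with the nearest
    end position at or after it (a common two-stage decoding rule)."""
    spans = []
    ends = [i for i, p in enumerate(end_p) if p >= thr]
    for s in (i for i, p in enumerate(start_p) if p >= thr):
        following = [e for e in ends if e >= s]
        if following:
            spans.append((s, following[0]))
    return spans

entities = [" ".join(tokens[s:e + 1]) for s, e in decode_spans(start_p, end_p)]
print(entities)  # ['Ginseng', 'fatigue']
```

Running the same decoder once for subjects and once for objects (conditioned on the subject) yields the subject-object groups the paper describes.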
Affiliation(s)
- Shuilong Zou: Nanchang Institute of Science & Technology, Nanchang 330004, China
- Zhaoyang Liu: School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
- Kaiqi Wang: School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
- Jun Cao: School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
- Shixiong Liu: Nanchang Institute of Science & Technology, Nanchang 330004, China
- Wangping Xiong: School of Computer, Jiangxi University of Chinese Medicine, Nanchang 330004, China
- Shaoyi Li: Nanchang Institute of Science & Technology, Nanchang 330004, China
3
Guo Q, Liao Y, Li Z, Lin H, Liang S. Convolutional Models with Multi-Feature Fusion for Effective Link Prediction in Knowledge Graph Embedding. Entropy (Basel) 2023; 25:1472. [PMID: 37895593] [PMCID: PMC10606879] [DOI: 10.3390/e25101472]
Abstract
Link prediction remains a central task in knowledge graph embedding (KGE): inferring hidden or missing relationships within a given knowledge graph (KG). Despite its importance, contemporary methods face notable constraints, chiefly computational overhead and the difficulty of capturing multifaceted relationships. This paper introduces an approach that combines convolutional operators with graph structural information. By integrating information about entities and their immediate relational neighbors, we enhance the convolutional model, producing an embedding averaged over each entity and its neighboring nodes. Notably, our method also allows edge-specific data to be included in the convolutional model's input, giving users the latitude to calibrate the model's architecture and parameters to their dataset. Empirical evaluations show that our approach outperforms existing convolution-based link prediction baselines, particularly on the FB15k, WN18, and YAGO3-10 datasets. The overall objective of this research is to build KGE link prediction methods with greater efficiency and effectiveness, addressing salient challenges in real-world applications.
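The two ingredients the abstract names — averaging an entity's embedding with its immediate neighbors, then feeding the fused result through a convolution to score a triple — can be sketched as follows. This is a loose NumPy illustration of the general convolutional-KGE pattern, not the paper's actual architecture; the graph, embeddings, kernel, and projection weights are all random placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)
dim = 16
emb = rng.normal(size=(5, dim))      # embeddings of 5 hypothetical entities
neighbors = {0: [1, 2], 3: [0, 4]}   # immediate relational neighbors

def fused_embedding(e):
    """Average an entity's embedding with its neighbors' embeddings,
    injecting local graph structure into the convolution input."""
    vecs = [emb[e]] + [emb[n] for n in neighbors.get(e, [])]
    return np.mean(vecs, axis=0)

def conv_score(head, rel, tail, kernel):
    """Convolution-based scoring sketch: convolve the concatenated
    [fused head; relation] vector, ReLU, project, dot with the tail."""
    x = np.concatenate([fused_embedding(head), rel])   # (2*dim,)
    feat = np.maximum(np.convolve(x, kernel, mode="valid"), 0.0)
    W = rng.normal(size=(feat.size, dim))              # projection (random here)
    return float(feat @ W @ emb[tail])

rel = rng.normal(size=dim)
kernel = rng.normal(size=3)
score = conv_score(0, rel, 3, kernel)  # plausibility of the triple (0, r, 3)
```

With trained parameters, ranking candidate tails by this score is exactly the link prediction protocol evaluated on FB15k, WN18, and YAGO3-10.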
Affiliation(s)
- Qinglang Guo: School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China; National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), CETC Academy of Electronics and Information Technology Group Co., Ltd., China Academy of Electronics and Information Technology, Beijing 100041, China
- Yong Liao: School of Cyber Science and Technology, University of Science and Technology of China, Hefei 230027, China
- Zhe Li: Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR, China
- Hui Lin: National Engineering Research Center for Public Safety Risk Perception and Control by Big Data (RPP), CETC Academy of Electronics and Information Technology Group Co., Ltd., China Academy of Electronics and Information Technology, Beijing 100041, China
- Shenglin Liang: School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
4
A novel NIH research grant recommender using BERT. PLoS One 2023; 18:e0278636. [PMID: 36649346] [PMCID: PMC9844873] [DOI: 10.1371/journal.pone.0278636]
Abstract
Research grants are important for researchers to sustain a good position in academia, and many grant opportunities are available from different funding agencies. However, finding relevant grant announcements is challenging and time-consuming. To address this problem, we propose a grant announcement recommendation system for National Institutes of Health (NIH) grants based on researchers' publications. We formulate recommendation as a classification problem and build a recommender on a state-of-the-art deep learning technique, Bidirectional Encoder Representations from Transformers (BERT), to capture the intrinsic, non-linear relationship between researchers' publications and grant announcements. Internal and external evaluations were conducted to assess the system's usefulness. In the internal evaluation, grant citations were used to establish grant-publication ground truth, and results were evaluated with Recall@k, Precision@k, Mean Reciprocal Rank (MRR), and Area Under the Receiver Operating Characteristic curve (ROC-AUC). In the external evaluation, researchers' publications were clustered using a Dirichlet Process Mixture Model (DPMM), grants recommended by our model were aggregated per cluster with a recency weight, and researchers were invited to rate the recommendations to compute Precision@k. For comparison, baseline recommenders using Okapi Best Matching (BM25), Term Frequency-Inverse Document Frequency (TF-IDF), doc2vec, and Naïve Bayes (NB) were also developed. Both internal and external evaluations showed favorable performance of the proposed BERT-based recommender on all metrics.
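Of the baselines named above, the TF-IDF recommender is the simplest to make concrete: vectorize publications and announcements, score each grant by cosine similarity to the researcher's publications, and rank. Here is a dependency-free sketch on invented toy documents and grant IDs (none of this data comes from the paper).

```python
import math
from collections import Counter

# Toy corpus: a researcher's publication abstracts vs. grant announcements
# (all text and grant IDs below are hypothetical).
pubs = ["graph neural networks for drug discovery",
        "transformer models for biomedical text"]
grants = {"R01-A": "funding for graph machine learning in biomedicine",
          "R21-B": "clinical trial infrastructure support",
          "R01-C": "language models for biomedical literature mining"}

def tfidf(docs):
    """Plain TF-IDF: term frequency weighted by smoothed inverse doc frequency."""
    tokenized = [d.split() for d in docs]
    df = Counter(t for doc in tokenized for t in set(doc))
    n = len(docs)
    return [{t: c / len(doc) * math.log((1 + n) / (1 + df[t]))
             for t, c in Counter(doc).items()} for doc in tokenized]

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = tfidf(pubs + list(grants.values()))
pub_vecs, grant_vecs = vecs[:len(pubs)], vecs[len(pubs):]
# Score each grant by its best match against any publication, then rank.
scores = {gid: max(cosine(p, g) for p in pub_vecs)
          for gid, g in zip(grants, grant_vecs)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The BERT recommender replaces these sparse lexical vectors with contextual classification over publication-grant pairs, which is what lets it capture the non-linear relationships the abstract highlights.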