1
Ran Y, Xu XK, Jia T. The maximum capability of a topological feature in link prediction. PNAS Nexus 2024; 3:pgae113. [PMID: 38528954 PMCID: PMC10962729 DOI: 10.1093/pnasnexus/pgae113] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 02/21/2024] [Indexed: 03/27/2024]
Abstract
Networks offer a powerful approach to modeling complex systems by representing the underlying set of pairwise interactions. Link prediction is the task that predicts links of a network that are not directly visible, with profound applications in biological, social, and other complex systems. Despite intensive utilization of the topological feature in this task, it is unclear to what extent a feature can be leveraged to infer missing links. Here, we aim to unveil the capability of a topological feature in link prediction by identifying its prediction performance upper bound. We introduce a theoretical framework that is compatible with different indexes to gauge the feature, different prediction approaches to utilize the feature, and different metrics to quantify the prediction performance. The maximum capability of a topological feature follows a simple yet theoretically validated expression, which only depends on the extent to which the feature is held in missing and nonexistent links. Because a family of indexes based on the same feature shares the same upper bound, the potential of all others can be estimated from one single index. Furthermore, a feature's capability is lifted in the supervised prediction, which can be mathematically quantified, allowing us to estimate the benefit of applying machine learning algorithms. The universality of the pattern uncovered is empirically verified by 550 structurally diverse networks. The findings have applications in feature and method selection, and shed light on network characteristics that make a topological feature effective in link prediction.
Affiliation(s)
- Yijun Ran
  - College of Computer and Information Science, Southwest University, Chongqing 400715, P.R. China
  - Center for Computational Communication Research, Beijing Normal University, Zhuhai 519087, P.R. China
  - School of Journalism and Communication, Beijing Normal University, Beijing 100875, P.R. China
- Xiao-Ke Xu
  - Center for Computational Communication Research, Beijing Normal University, Zhuhai 519087, P.R. China
  - School of Journalism and Communication, Beijing Normal University, Beijing 100875, P.R. China
- Tao Jia
  - College of Computer and Information Science, Southwest University, Chongqing 400715, P.R. China
2
Ishii C, Asatani K, Sakata I. Detecting possible pairs of materials for composites using a material word co-occurrence network. PLoS One 2024; 19:e0297361. [PMID: 38277416 PMCID: PMC10817182 DOI: 10.1371/journal.pone.0297361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 01/02/2024] [Indexed: 01/28/2024]
Abstract
Composite materials are popular because of their high performance capabilities, but new material development is time-consuming. To accelerate this process, researchers studying material informatics, an academic discipline combining computational science and material science, have developed less time-consuming approaches for predicting possible material combinations. However, these processes remain problematic because some materials are not suited for them. The limitations of specific candidates for new composites may cause potential new material pairs to be overlooked. To solve this problem, we developed a new method to predict possible composite material pairs by considering more materials than previous techniques. We predicted possible material pairs by conducting link predictions of material word co-occurrence networks while assuming that co-occurring material word pairs in scientific papers on composites were reported as composite materials. As a result, we succeeded in predicting the co-occurrence of material words with high specificity. Nodes tended to link to many other words, generating new links in the created co-occurrence material word network; notably, the number of material words co-occurring with graphene increased rapidly. This phenomenon confirmed that graphene is an attractive composite component. We expect our method to contribute to the accelerated development of new composite materials.
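As a rough illustration of the network construction described in this abstract, the following minimal sketch builds a word co-occurrence network from per-paper sets of material words. The toy `papers` data and the `cooccurrence_edges` helper are assumptions made for illustration; they are not taken from the study, whose actual pipeline is not described here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of material words found in each paper.
papers = [
    {"graphene", "epoxy", "carbon nanotube"},
    {"graphene", "epoxy"},
    {"graphene", "polyaniline"},
]

def cooccurrence_edges(docs):
    """Count how many documents mention each unordered pair of words."""
    counts = Counter()
    for words in docs:
        # Sorting gives every pair a canonical (alphabetical) orientation.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts

edges = cooccurrence_edges(papers)

# Degree of a word = number of distinct words it co-occurs with.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
```

Each key of `edges` is a link in the co-occurrence network, weighted by the number of papers that mention both words. Pairs absent from `edges` are the candidates a link predictor would score.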
Affiliation(s)
- Chika Ishii
  - Customer Experience Department, Cisco Systems G.K., Minato-ku, Tokyo, Japan
- Kimitaka Asatani
  - Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Ichiro Sakata
  - Department of Technology Management for Innovation, Graduate School of Engineering, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
3
Lu H, Uddin S. Embedding-based link predictions to explore latent comorbidity of chronic diseases. Health Inf Sci Syst 2023; 11:2. [PMID: 36593862 PMCID: PMC9803807 DOI: 10.1007/s13755-022-00206-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2022] [Accepted: 12/13/2022] [Indexed: 12/31/2022]
Abstract
Purpose: Comorbidity is a term used to describe when a patient simultaneously has more than one chronic disease. Comorbidity is a significant health issue that affects people worldwide. This study aims to use machine learning and graph theory to predict the comorbidity of chronic diseases. Methods: A patient-disease bipartite graph is constructed based on the administrative claim data. The bipartite graph projection approach was used to create the comorbidity network. For the link prediction task, three graph machine learning embedding-based models (node2vec, graph neural networks and hand-crafted approach) with different variants were used on the comorbidity network to compare their performance. This study also considered three commonly used similarity-based link prediction approaches (Jaccard coefficient, Adamic-Adar index and Resource allocation index) for performance comparison. Results: The results showed that the embedding-based hand-crafted features technique achieved outstanding performance compared with the remaining similarity-based and embedding-based models. Especially, the hand-crafted technique with the extreme gradient boosting classifier achieved the highest accuracy (91.67%), followed by the same technique with the Logistic regression classifier (90.26%). For this shallow embedding method, the Jaccard coefficient and the degree centrality of the original chronic disease were the most important features for comorbidity prediction. Conclusion: The proposed framework can be used to predict the comorbidity of chronic disease at an early stage of hospital admission. Thus, the prediction outcome could be valuable for medical practice, giving healthcare providers more control over their services and lowering expenses.
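The three similarity-based indices named in this abstract have standard textbook definitions, sketched below on a toy undirected graph. The graph data and the `rank_candidates` helper are hypothetical and only illustrate how such scores rank unlinked node pairs; this is not the study's code or data.

```python
import math
from itertools import combinations

# Toy undirected graph as an adjacency map (hypothetical data).
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"a", "c", "e"},
    "e": {"d"},
}

def jaccard(g, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0

def adamic_adar(g, u, v):
    """Sum of 1 / log(degree) over shared neighbours.

    A shared neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v])

def resource_allocation(g, u, v):
    """Sum of 1 / degree over shared neighbours."""
    return sum(1.0 / len(g[z]) for z in g[u] & g[v])

def rank_candidates(g, score):
    """Rank currently unlinked node pairs by a similarity score, best first."""
    pairs = [(u, v) for u, v in combinations(sorted(g), 2) if v not in g[u]]
    return sorted(pairs, key=lambda p: score(g, *p), reverse=True)

ranking = rank_candidates(graph, jaccard)
```

In the toy graph, nodes `b` and `d` share both of `b`'s neighbours, so that pair ranks first under the Jaccard coefficient. In a supervised setup like the one the abstract describes, such scores would typically be used as input features for a classifier rather than as the final ranking.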
Affiliation(s)
- Haohui Lu
  - School of Project Management, Faculty of Engineering, The University of Sydney, Level 2, 21 Ross Street, Forest Lodge, NSW 2037 Australia
- Shahadat Uddin
  - School of Project Management, Faculty of Engineering, The University of Sydney, Level 2, 21 Ross Street, Forest Lodge, NSW 2037 Australia
4
Wu H, Song C, Ge Y, Ge T. Link Prediction on Complex Networks: An Experimental Survey. Data Science and Engineering 2022; 7:253-278. [PMID: 35754861 PMCID: PMC9211798 DOI: 10.1007/s41019-022-00188-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Revised: 03/31/2022] [Accepted: 06/08/2022] [Indexed: 06/15/2023]
Abstract
Complex networks have been used widely to model a large number of relationships. The outbreak of COVID-19 has had a huge impact on various complex networks in the real world, for example global trade networks, air transport networks, and even social networks, known as racial equality issues caused by the spread of the epidemic. Link prediction plays an important role in complex network analysis in that it can find missing links or predict the links which will arise in the future in the network by analyzing the existing network structures. Therefore, it is extremely important to study the link prediction problem on complex networks. There are a variety of techniques for link prediction based on the topology of the network and the properties of entities. In this work, a new taxonomy is proposed to divide the link prediction methods into five categories and a comprehensive overview of these methods is provided. The network embedding-based methods, especially graph neural network-based methods, which have attracted increasing attention in recent years, have been creatively investigated as well. Moreover, we analyze thirty-six datasets and divide them into seven types of networks according to their topological features shown in real networks and perform comprehensive experiments on these networks. We further analyze the results of experiments in detail, aiming to discover the most suitable approach for each kind of network.
Affiliation(s)
- Haixia Wu
  - College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, China
- Chunyao Song
  - College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, China
- Yao Ge
  - College of Computer Science, Tianjin Key Laboratory of Network and Data Security Technology, Nankai University, Tianjin, China
- Tingjian Ge
  - University of Massachusetts Lowell, Massachusetts, United States
5
Strydom T, Bouskila S, Banville F, Barros C, Caron D, Farrell MJ, Fortin M, Hemming V, Mercier B, Pollock LJ, Runghen R, Dalla Riva GV, Poisot T. Food web reconstruction through phylogenetic transfer of low‐rank network representation. Methods Ecol Evol 2022. [DOI: 10.1111/2041-210x.13835] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Affiliation(s)
- Tanya Strydom
  - Département de Sciences Biologiques, Université de Montréal, Montréal, Canada
  - Quebec Centre for Biodiversity Science, Montréal, Canada
- Salomé Bouskila
  - Département de Sciences Biologiques, Université de Montréal, Montréal, Canada
- Francis Banville
  - Département de Sciences Biologiques, Université de Montréal, Montréal, Canada
  - Quebec Centre for Biodiversity Science, Montréal, Canada
  - Département de Biologie, Université de Sherbrooke, Sherbrooke, Canada
- Ceres Barros
  - Department of Forest Resources Management, University of British Columbia, Vancouver, Canada
- Dominique Caron
  - Quebec Centre for Biodiversity Science, Montréal, Canada
  - Department of Biology, McGill University, Montréal, Canada
- Maxwell J. Farrell
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Canada
- Marie‐Josée Fortin
  - Department of Ecology & Evolutionary Biology, University of Toronto, Toronto, Canada
- Victoria Hemming
  - Department of Forest and Conservation Sciences, University of British Columbia, Vancouver, Canada
- Benjamin Mercier
  - Quebec Centre for Biodiversity Science, Montréal, Canada
  - Département de Biologie, Université de Sherbrooke, Sherbrooke, Canada
- Laura J. Pollock
  - Quebec Centre for Biodiversity Science, Montréal, Canada
  - Department of Biology, McGill University, Montréal, Canada
- Rogini Runghen
  - Centre for Integrative Ecology, School of Biological Sciences, University of Canterbury, Canterbury, New Zealand
- Giulio V. Dalla Riva
  - School of Mathematics and Statistics, University of Canterbury, Canterbury, New Zealand
- Timothée Poisot
  - Département de Sciences Biologiques, Université de Montréal, Montréal, Canada
  - Quebec Centre for Biodiversity Science, Montréal, Canada
6
Tang Y, Kurths J, Lin W, Ott E, Kocarev L. Introduction to Focus Issue: When machine learning meets complex systems: Networks, chaos, and nonlinear dynamics. Chaos (Woodbury, N.Y.) 2020; 30:063151. [PMID: 32611112 DOI: 10.1063/5.0016505] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 06/04/2020] [Accepted: 06/05/2020] [Indexed: 06/11/2023]
Affiliation(s)
- Yang Tang
  - Key Laboratory of Advanced Control and Optimization for Chemical Processes, Ministry of Education, East China University of Science and Technology, Shanghai, China
- Jürgen Kurths
  - Potsdam Institute for Climate Impact Research, Potsdam 14473, Germany
- Wei Lin
  - Center for Computational Systems Biology of ISTBI and Research Institute of Intelligent Complex Systems, Fudan University, Shanghai 200433, China
- Edward Ott
  - Department of Physics, University of Maryland, College Park, Maryland 20742, USA
- Ljupco Kocarev
  - Macedonian Academy of Sciences and Arts, 1000 Skopje, Macedonia
7
Banerjee A, Pathak J, Roy R, Restrepo JG, Ott E. Using machine learning to assess short term causal dependence and infer network links. Chaos (Woodbury, N.Y.) 2019; 29:121104. [PMID: 31893648 DOI: 10.1063/1.5134845] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 11/04/2019] [Accepted: 12/05/2019] [Indexed: 06/10/2023]
Abstract
We introduce and test a general machine-learning-based technique for the inference of short term causal dependence between state variables of an unknown dynamical system from time-series measurements of its state variables. Our technique leverages the results of a machine learning process for short time prediction to achieve our goal. The basic idea is to use the machine learning to estimate the elements of the Jacobian matrix of the dynamical flow along an orbit. The type of machine learning that we employ is reservoir computing. We present numerical tests on link inference of a network of interacting dynamical nodes. It is seen that dynamical noise can greatly enhance the effectiveness of our technique, while observational noise degrades the effectiveness. We believe that the competition between these two opposing types of noise will be the key factor determining the success of causal inference in many of the most important application situations.
Affiliation(s)
- Amitava Banerjee
  - Department of Physics and Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, Maryland 20742, USA
- Jaideep Pathak
  - Department of Physics and Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, Maryland 20742, USA
- Rajarshi Roy
  - Department of Physics and Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, Maryland 20742, USA
- Juan G Restrepo
  - Department of Applied Mathematics, University of Colorado, Boulder, Colorado 80309, USA
- Edward Ott
  - Department of Physics and Institute for Research in Electronics and Applied Physics, University of Maryland, College Park, Maryland 20742, USA