1
Baptista A, Brière G, Baudot A. Random walk with restart on multilayer networks: from node prioritisation to supervised link prediction and beyond. BMC Bioinformatics 2024; 25:70. [PMID: 38355439 PMCID: PMC10865648 DOI: 10.1186/s12859-024-05683-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 01/29/2024] [Indexed: 02/16/2024] Open
Abstract
BACKGROUND Biological networks have proven invaluable for representing biological knowledge. Multilayer networks, which gather different types of nodes and edges in multiplex, heterogeneous and bipartite networks, provide a natural way to integrate diverse and multi-scale data sources into a common framework. Recently, we developed MultiXrank, a Random Walk with Restart algorithm able to explore such multilayer networks. MultiXrank outputs scores reflecting the proximity between an initial set of seed node(s) and all the other nodes in the multilayer network. We illustrate here the versatility of bioinformatics tasks that can be performed using MultiXrank. RESULTS We first show that MultiXrank can be used to prioritise genes and drugs of interest by exploring multilayer networks containing interactions between genes, drugs, and diseases. In a second study, we illustrate how MultiXrank scores can also be used in a supervised strategy to train a binary classifier to predict gene-disease associations. The classifier performance is validated using outdated and novel gene-disease associations for training and evaluation, respectively. Finally, we show that MultiXrank scores can be used to compute diffusion profiles and use them as disease signatures. We computed the diffusion profiles of more than 100 immune diseases using a multilayer network that includes cell-type specific genomic information. The clustering of the immune disease diffusion profiles reveals shared phenotypic characteristics. CONCLUSION Overall, we illustrate here diverse applications of MultiXrank to showcase its versatility. We expect that this can lead to further and broader bioinformatics applications.
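The proximity scores mentioned in the abstract above come from a Random Walk with Restart (RWR). The following is a minimal sketch of the generic single-layer RWR iteration, not MultiXrank itself and not its API: the function name, the dict-of-dicts graph representation, the restart value of 0.7, and the toy path graph are all assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Generic single-layer RWR (illustrative sketch, not MultiXrank).

    adjacency: dict mapping node -> dict of neighbour -> positive weight.
    seeds: iterable of seed nodes; the walker restarts uniformly on them.
    Returns a dict mapping node -> stationary visiting probability.
    """
    nodes = list(adjacency)
    seed_set = set(seeds)
    # Restart distribution: uniform over the seed nodes, zero elsewhere.
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    # Total outgoing weight of each node, used to normalise transitions.
    strength = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                # Assumption: mass on isolated nodes is sent back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            # Otherwise the walker moves to a neighbour, proportionally to weight.
            for v, w in adjacency[u].items():
                nxt[v] += (1 - restart) * p[u] * w / strength[u]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy undirected path graph a - b - c - d, with node "a" as the only seed.
graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
scores = random_walk_with_restart(graph, seeds=["a"])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the scores decrease with distance from the seed, which is the sense in which RWR scores reflect proximity. Multilayer variants additionally need transition rules between layers, which this sketch leaves out.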
Affiliation(s)
- Anthony Baptista
- School of Mathematical Sciences, Queen Mary University of London, London, UK.
- The Alan Turing Institute, London, UK.
- Anaïs Baudot
- INSERM, MMG, Turing Center for Living Systems, Aix-Marseille Univ, Marseille, France.
- Barcelona Supercomputing Center, Barcelona, Spain.
2
Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights on disease genes. RESULTS We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature, and present benchmark experiments that expand on these insights, using three diseases and five molecular networks as a case study. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values if the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
Affiliation(s)
- Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany
3
Li X, Yuan H, Wu X, Wang C, Wu M, Shi H, Lv Y. MultiDS-MDA: Integrating multiple data sources into heterogeneous network for predicting novel metabolite-drug associations. Comput Biol Med 2023; 162:107067. [PMID: 37276756 DOI: 10.1016/j.compbiomed.2023.107067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 05/15/2023] [Accepted: 05/27/2023] [Indexed: 06/07/2023]
Abstract
Metabolic processes in the human body play an important role in maintaining normal life activities, and the abnormal concentration of metabolites is closely related to the occurrence and development of diseases. The use of drugs is considered to have a major impact on metabolism, and drug metabolites can contribute to efficacy, drug toxicity and drug-drug interaction. However, our understanding of metabolite-drug associations is far from complete, and individual data sources tend to be incomplete and noisy. Therefore, the integration of various types of data sources for inferring reliable metabolite-drug associations is urgently needed. In this study, we proposed a computational framework, MultiDS-MDA, for identifying metabolite-drug associations by integrating multiple data sources, including chemical structure information of metabolites and drugs, the relationships of metabolite-gene, metabolite-disease, drug-gene and drug-disease, the data of gene ontology (GO) and disease ontology (DO) and known metabolite-drug connections. The performance of MultiDS-MDA was evaluated by 5-fold cross-validation, which achieved an area under the ROC curve (AUROC) of 0.911 and an area under the precision-recall curve (AUPRC) of 0.907. Additionally, MultiDS-MDA showed outstanding performance compared with similar approaches. Case studies for three metabolites (cholesterol, thromboxane B2 and coenzyme Q10) and three drugs (simvastatin, pravastatin and morphine) also demonstrated the reliability and efficiency of MultiDS-MDA, and it is anticipated that MultiDS-MDA will serve as a powerful tool for future exploration of metabolite-drug interactions and contribute to drug development and drug combination.
Affiliation(s)
- Xiuhong Li
- College of Bioinformatics Science and Technology, Harbin Medical University, China
- Hao Yuan
- College of Bioinformatics Science and Technology, Harbin Medical University, China
- Xiaoliang Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, China
- Chengyi Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, China
- Meitao Wu
- College of Bioinformatics Science and Technology, Harbin Medical University, China
- Hongbo Shi
- College of Bioinformatics Science and Technology, Harbin Medical University, China.
- Yingli Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, China.
4
Kumar N, Mukhtar MS. Ranking Plant Network Nodes Based on Their Centrality Measures. ENTROPY (BASEL, SWITZERLAND) 2023; 25:e25040676. [PMID: 37190464 PMCID: PMC10137616 DOI: 10.3390/e25040676] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/23/2023] [Revised: 04/14/2023] [Accepted: 04/16/2023] [Indexed: 05/17/2023]
Abstract
Biological networks are often large and complex, making it difficult to accurately identify the most important nodes. Node prioritization algorithms are used to identify the most influential nodes in a biological network by considering their relationships with other nodes. These algorithms can help us understand the functioning of the network and the role of individual nodes. We developed CentralityCosDist, an algorithm that ranks nodes based on a combination of centrality measures and seed nodes. We applied this and four other algorithms to protein-protein interactions and co-expression patterns in Arabidopsis thaliana using pathogen effector targets as seed nodes. The accuracy of the algorithms was evaluated through functional enrichment analysis of the top 10 nodes identified by each algorithm. Most enriched terms were similar across algorithms, except for DIAMOnD. CentralityCosDist identified more plant-pathogen interactions and related functions and pathways compared to the other algorithms.
Affiliation(s)
- Nilesh Kumar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- M Shahid Mukhtar
- Department of Biology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
5
Wang Z, Gu Y, Zheng S, Yang L, Li J. MGREL: A multi-graph representation learning-based ensemble learning method for gene-disease association prediction. Comput Biol Med 2023; 155:106642. [PMID: 36805231 DOI: 10.1016/j.compbiomed.2023.106642] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Revised: 01/15/2023] [Accepted: 02/05/2023] [Indexed: 02/12/2023]
Abstract
The identification of gene-disease associations plays an important role in the exploration of pathogenic mechanisms and therapeutic targets. Computational methods have been regarded as an effective way to discover potential gene-disease associations in recent years. However, most of them ignore the combination of abundant genetic and therapeutic information with gene-disease network topology. To this end, we re-organized the current gene-disease association benchmark dataset by extracting the newest gene-disease associations from the OMIM database. Then, we developed a multi-graph representation learning-based ensemble model, named MGREL, to predict gene-disease associations. MGREL integrated two feature generation channels to extract gene and disease features, including a knowledge extraction channel which learned high-order representations from genetic and therapeutic information, and a graph learning channel which acquired network topological representations through multiple advanced graph representation learning methods. Then, an ensemble learning method with 5 machine learning models was used as the classifier to predict the gene-disease association. Comprehensive experiments have demonstrated the significant performance achieved by MGREL compared to 5 state-of-the-art methods. For the major measurements (AUC = 0.925, AUPR = 0.935), the relative improvements of MGREL compared to the suboptimal methods are 3.24% and 2.75%, respectively. MGREL also achieved impressive improvements in the challenging tasks of predicting potential associations for unknown genes/diseases. In addition, case studies implied potential applications for MGREL in the discovery of potential therapeutic targets.
Affiliation(s)
- Ziyang Wang
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China
- Yaowen Gu
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China
- Si Zheng
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China; Institute for Artificial Intelligence, Department of Computer Science and Technology, BNRist, Tsinghua University, Beijing, 100084, China
- Lin Yang
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China
- Jiao Li
- Institute of Medical Information IMI, Chinese Academy of Medical Sciences and Peking Union Medical College CAMS & PUMC, Beijing, 100020, China.
6
Voitalov I, Zhang L, Kilpatrick C, Withers JB, Saleh A, Akmaev VR, Ghiassian SD. The module triad: a novel network biology approach to utilize patients' multi-omics data for target discovery in ulcerative colitis. Sci Rep 2022; 12:21685. [PMID: 36522454 PMCID: PMC9755270 DOI: 10.1038/s41598-022-26276-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2022] [Accepted: 12/13/2022] [Indexed: 12/23/2022] Open
Abstract
Tumor necrosis factor-α inhibitors (TNFi) have been a standard treatment in ulcerative colitis (UC) for nearly 20 years. However, the insufficient response rate to TNFi therapies, along with concerns around their immunogenicity and the inconvenience of drug delivery through injections, calls for the development of UC drugs targeting alternative proteins. Here, we propose a multi-omic network biology method for prioritization of protein targets for UC treatment. Our method identifies network modules on the Human Interactome (a network of protein-protein interactions in human cells) consisting of genes contributing to the predisposition to UC (Genotype module), genes whose expression needs to be modulated to achieve low disease activity (Response module), and proteins whose perturbation alters expression of the Response module genes to a healthy state (Treatment module). Targets are prioritized based on their topological relevance to the Genotype module and functional similarity to the Treatment module. We demonstrate the utility of our method in UC and other complex diseases by efficiently recovering the protein targets associated with compounds in clinical trials and on the market. The proposed method may help to reduce the cost and time of drug development by offering a computational screening tool for identification of novel and repurposing therapeutic opportunities in UC and other complex diseases.
Affiliation(s)
- Ivan Voitalov
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Lixia Zhang
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Casey Kilpatrick
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Johanna B. Withers
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
- Alif Saleh
- Scipher Medicine Corporation, 221 Crescent St Suite 103A, Waltham, MA 02453 USA
7
Zhang L, Fan S, Vera J, Lai X. A network medicine approach for identifying diagnostic and prognostic biomarkers and exploring drug repurposing in human cancer. Comput Struct Biotechnol J 2022; 21:34-45. [PMID: 36514340 PMCID: PMC9732137 DOI: 10.1016/j.csbj.2022.11.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
Cancer is a heterogeneous disease mainly driven by abnormal gene perturbations in regulatory networks. Therefore, it is appealing to identify the common and specific perturbed genes from multiple cancer networks. We developed an integrative network medicine approach to identify novel biomarkers and investigate drug repurposing across cancer types. We used a network-based method to prioritize genes in cancer-specific networks reconstructed using human transcriptome and interactome data. The prioritized genes show extensive perturbation and strong regulatory interaction with other highly perturbed genes, suggesting their vital contribution to tumorigenesis and tumor progression, and are therefore regarded as cancer genes. The cancer genes detected show remarkable performances in discriminating tumors from normal tissues and predicting survival times of cancer patients. Finally, we developed a network proximity approach to systematically screen drugs and identified dozens of candidates with repurposable potential in several cancer types. Taken together, we demonstrated the power of the network medicine approach to identify novel biomarkers and repurposable drugs in multiple cancer types. We have also made the data and code freely accessible to ensure reproducibility and reusability of the developed computational workflow.
Affiliation(s)
- Le Zhang
- College of Computer Science, Sichuan University, Chengdu, China
- Shiwei Fan
- College of Computer Science, Sichuan University, Chengdu, China
- Julio Vera
- Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Deutsches Zentrum Immuntherapie, Erlangen, Germany, Comprehensive Cancer Center Erlangen, Erlangen, Germany
- Xin Lai
- Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, Deutsches Zentrum Immuntherapie, Erlangen, Germany, Comprehensive Cancer Center Erlangen, Erlangen, Germany, BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland. Corresponding author at: Universitätsklinikum Erlangen, Erlangen, Germany; Tampere University, Tampere, Finland.
8
Nguyen T, Yue Z, Slominski R, Welner R, Zhang J, Chen JY. WINNER: A network biology tool for biomolecular characterization and prioritization. Front Big Data 2022; 5:1016606. [DOI: 10.3389/fdata.2022.1016606] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Accepted: 10/14/2022] [Indexed: 11/06/2022] Open
Abstract
BACKGROUND AND CONTRIBUTION In network biology, molecular functions can be characterized by network-based inference, or “guilt-by-association.” PageRank-like tools have been applied in the study of biomolecular interaction networks to obtain the relative significance of all molecules in the network. However, there is a great deal of inherent noise in widely accessible data sets for gene-to-gene associations or protein-protein interactions. How to develop robust tests to expand, filter, and rank molecular entities in disease-specific networks remains an ad hoc data analysis process. RESULTS We describe a new biomolecular characterization and prioritization tool called Weighted In-Network Node Expansion and Ranking (WINNER). It takes the input of any molecular interaction network data and generates an optionally expanded network with all the nodes ranked according to their relevance to one another in the network. To help users assess the robustness of results, WINNER provides two different types of statistics. The first type is a node-expansion p-value, which helps evaluate the statistical significance of adding “non-seed” molecules to the original biomolecular interaction network consisting of “seed” molecules and molecular interactions. The second type is a node-ranking p-value, which helps evaluate the relative statistical significance of the contribution of each node to the overall network architecture. We validated the robustness of WINNER in ranking top molecules by spiking in noise in several network permutation experiments. We have found that node degree-preserving randomization of the gene network produced normally distributed ranking scores, which outperform those made with other gene network randomization techniques. Furthermore, we validated that a more significant proportion of the WINNER-ranked genes was associated with disease biology than with existing methods such as PageRank. We demonstrated the performance of WINNER with a few case studies, including Alzheimer's disease, breast cancer, myocardial infarctions, and triple-negative breast cancer (TNBC). In all these case studies, the expanded and top-ranked genes identified by WINNER reveal disease biology more significantly than those identified by other gene prioritizing software tools, including Ingenuity Pathway Analysis (IPA) and DIAMOnD. CONCLUSION WINNER ranking strongly correlates with other ranking methods when the network covers sufficient node and edge information, indicating a high network quality. WINNER users can use this new tool to robustly evaluate a list of candidate genes, proteins, or metabolites produced from high-throughput biology experiments, as long as there is available gene/protein/metabolic network information.
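The abstract above refers to randomizing a gene network while preserving node degrees. A common way to do this is repeated double-edge swapping; the sketch below illustrates that general technique only. It is not taken from WINNER, and the function names, the simple undirected edge-list input, and the retry limit are assumptions for illustration.

```python
import random
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each node of an undirected edge list."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def degree_preserving_shuffle(edges, n_swaps, seed=None):
    """Randomize a simple undirected graph by double-edge swaps.

    Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b),
    so every node keeps its degree. Swaps that would create a self-loop or a
    duplicate edge are rejected. Assumes at least two edges and no duplicates.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # assumed cap so dense graphs cannot loop forever
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: the swap would create a self-loop or no change
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # the swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


# Toy example: a six-node cycle.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "a")]
shuffled = degree_preserving_shuffle(edges, n_swaps=10, seed=0)
```

Ranking scores computed on many such shuffled networks can serve as a null distribution against which the scores from the real network are compared.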
9
Rintala TJ, Ghosh A, Fortino V. Network approaches for modeling the effect of drugs and diseases. Brief Bioinform 2022; 23:6608969. [PMID: 35704883 PMCID: PMC9294412 DOI: 10.1093/bib/bbac229] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 02/22/2022] [Revised: 04/29/2022] [Accepted: 05/17/2021] [Indexed: 12/12/2022] Open
Abstract
The network approach is quickly becoming a fundamental building block of computational methods aiming at elucidating the mechanism of action (MoA) and therapeutic effect of drugs. By modeling the effect of drugs and diseases on different biological networks, it is possible to better explain the interplay between disease perturbations and drug targets as well as how drug compounds induce favorable biological responses and/or adverse effects. Omics technologies have been extensively used to generate the data needed to study the mechanisms of action of drugs and diseases. These data are often exploited to define condition-specific networks and to study whether drugs can reverse disease perturbations. In this review, we describe network data mining algorithms that are commonly used to study drugs' MoA and to improve our understanding of the basis of chronic diseases. These methods can support fundamental stages of the drug development process, including the identification of putative drug targets and the in silico screening of drug compounds and drug combinations for the treatment of diseases. We also discuss recent studies using biological and omics-driven networks to search for possible repurposed FDA-approved drug treatments for SARS-CoV-2 infections (COVID-19).
Affiliation(s)
- T J Rintala
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
- Arindam Ghosh
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
- V Fortino
- Institute of Biomedicine, University of Eastern Finland, 70210 Kuopio, Finland
10
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, the mining and effective utilization of the module structure is still challenging in problems such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. By a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as its further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in problems such as disease-gene prediction.
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
11
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, many computational methods and tools/platforms have been developed to reveal intrinsic relationships between diseases, which can provide useful insights into the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatments. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups according to the biomedical data they use: disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we discuss and summarize the research on disease-disease associations. This review provides a systematic overview of current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
12
Li W, Zhang Y, Wang Y, Rong Z, Liu C, Miao H, Chen H, He Y, He W, Chen L. Candidate gene prioritization for chronic obstructive pulmonary disease using expression information in protein-protein interaction networks. BMC Pulm Med 2021; 21:280. [PMID: 34481483 PMCID: PMC8418003 DOI: 10.1186/s12890-021-01646-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/19/2021] [Accepted: 08/23/2021] [Indexed: 11/30/2022] Open
Abstract
Background Identifying or prioritizing genes for chronic obstructive pulmonary disease (COPD), one type of complex disease, is particularly important for its prevention and treatment. Methods In this paper, a novel method was proposed to Prioritize genes using Expression information in Protein–protein interaction networks with disease risks transferred between genes (abbreviated as PEP). A weighted COPD PPI network was constructed using expression information and then COPD candidate genes were prioritized based on their corresponding disease risk scores in descending order. Results Further analysis demonstrated that the PEP method was robust in prioritizing disease candidate genes, and superior to other existing prioritization methods exploiting either topological or functional information. Top-ranked COPD candidate genes and their significantly enriched functions were verified to be related to COPD. The top 200 candidate genes might be potential disease genes in the diagnosis and treatment of COPD. Conclusions The proposed method could provide new insights to the research of prioritizing candidate genes of COPD or other complex diseases with expression information from sequencing or microarray data. Supplementary Information The online version contains supplementary material available at 10.1186/s12890-021-01646-9.
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Yihua Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Zherou Rong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Chenyu Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Hui Miao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Hongwei Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
- Weiming He
- Institute of Opto-Electronics, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China.
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China.