1
Gravel B, Renaux A, Papadimitriou S, Smits G, Nowé A, Lenaerts T. Prioritization of oligogenic variant combinations in whole exomes. Bioinformatics 2024; 40:btae184. [PMID: 38603604 PMCID: PMC11037482 DOI: 10.1093/bioinformatics/btae184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 01/29/2024] [Accepted: 04/10/2024] [Indexed: 04/13/2024] Open
Abstract
MOTIVATION Whole exome sequencing (WES) has emerged as a powerful tool for genetic research, enabling the collection of a tremendous amount of data about human genetic variation. However, properly identifying which variants are causative of a genetic disease remains an important challenge, often due to the number of variants that need to be screened. Expanding the screening to combinations of variants in two or more genes, as would be required under the oligogenic inheritance model, simply blows this problem out of proportion. RESULTS We present here the High-throughput oligogenic prioritizer (Hop), a novel prioritization method that uses direct oligogenic information at the variant, gene and gene pair level to detect digenic variant combinations in WES data. This method leverages information from a knowledge graph, together with specialized pathogenicity predictions in order to effectively rank variant combinations based on how likely they are to explain the patient's phenotype. The performance of Hop is evaluated in cross-validation on 36 120 synthetic exomes for training and 14 280 additional synthetic exomes for independent testing. Whereas the known pathogenic variant combinations are found in the top 20 in approximately 60% of the cross-validation exomes, 71% are found in the same ranking range when considering the independent set. These results provide a significant improvement over alternative approaches that depend simply on a monogenic assessment of pathogenicity, including early attempts for digenic ranking using monogenic pathogenicity scores. AVAILABILITY AND IMPLEMENTATION Hop is available at https://github.com/oligogenic/HOP.
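The "found in the top 20" figures quoted in this abstract are top-k recovery rates. As a minimal Python sketch of how such a rate can be computed, assuming each exome contributes the 1-based rank of its known pathogenic combination (the function name and the example ranks are hypothetical and are not taken from the Hop code base):

```python
def top_k_recovery_rate(ranks, k=20):
    """Fraction of exomes whose known combination ranks within the top k.

    `ranks` holds the 1-based rank of the known pathogenic combination
    in each exome (hypothetical input format).
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Illustrative ranks for five synthetic exomes (made-up values).
example_ranks = [1, 7, 20, 21, 350]
rate = top_k_recovery_rate(example_ranks, k=20)
```

With these made-up ranks, three of the five exomes fall within the top 20.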
Affiliation(s)
- Barbara Gravel
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Alexandre Renaux
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Sofia Papadimitriou
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Brussels Interuniversity Genomics High Throughput core (BRIGHTcore), UZ Brussel, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), 1090 Brussels, Belgium
- Guillaume Smits
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Center of Human Genetics, Hôpital Erasme, Hôpital Universitaire de Bruxelles, Université Libre de Bruxelles, 1070 Brussels, Belgium
- Ann Nowé
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Tom Lenaerts
  - Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussel, 1050 Brussels, Belgium
  - Department of Computer Science, Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
  - Department of Computer Science, Artificial Intelligence Laboratory, Vrije Universiteit Brussels, 1050 Brussels, Belgium
2
Identification of Cancer Driver Genes by Integrating Multiomics Data with Graph Neural Networks. Metabolites 2023; 13:metabo13030339. [PMID: 36984779 PMCID: PMC10052551 DOI: 10.3390/metabo13030339] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 01/05/2023] [Revised: 02/20/2023] [Accepted: 02/22/2023] [Indexed: 03/02/2023] Open
Abstract
Cancer is a heterogeneous disease that is driven by the accumulation of both genetic and nongenetic alterations, so integrating multiomics data and extracting effective information from them is expected to be an effective way to predict cancer driver genes. In this paper, we first generate comprehensive instructive features for each gene from genomic, epigenomic, transcriptomic levels together with protein–protein interaction (PPI)-networks-derived attributes and then propose a novel semisupervised deep graph learning framework GGraphSAGE to predict cancer driver genes according to the impact of the alterations on a biological system. When applied to eight tumor types, experimental results suggest that GGraphSAGE outperforms several state-of-the-art computational methods for driver genes identification. Moreover, it broadens our current understanding of cancer driver genes from multiomics level and identifies driver genes specific to the tumor type rather than pan-cancer. We expect GGraphSAGE to open new avenues in precision medicine and even further predict drivers for other complex diseases.
3
Zhang L, Fan S, Vera J, Lai X. A network medicine approach for identifying diagnostic and prognostic biomarkers and exploring drug repurposing in human cancer. Comput Struct Biotechnol J 2022; 21:34-45. [PMID: 36514340 PMCID: PMC9732137 DOI: 10.1016/j.csbj.2022.11.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2022] [Revised: 11/18/2022] [Accepted: 11/18/2022] [Indexed: 12/03/2022] Open
Abstract
Cancer is a heterogeneous disease mainly driven by abnormal gene perturbations in regulatory networks. Therefore, it is appealing to identify the common and specific perturbed genes from multiple cancer networks. We developed an integrative network medicine approach to identify novel biomarkers and investigate drug repurposing across cancer types. We used a network-based method to prioritize genes in cancer-specific networks reconstructed using human transcriptome and interactome data. The prioritized genes show extensive perturbation and strong regulatory interaction with other highly perturbed genes, suggesting their vital contribution to tumorigenesis and tumor progression, and are therefore regarded as cancer genes. The cancer genes detected show remarkable performances in discriminating tumors from normal tissues and predicting survival times of cancer patients. Finally, we developed a network proximity approach to systematically screen drugs and identified dozens of candidates with repurposable potential in several cancer types. Taken together, we demonstrated the power of the network medicine approach to identify novel biomarkers and repurposable drugs in multiple cancer types. We have also made the data and code freely accessible to ensure reproducibility and reusability of the developed computational workflow.
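The abstract does not say which network proximity measure was used for drug screening. As an illustration only, the sketch below assumes a "closest distance" measure: the mean, over a drug's targets, of the shortest-path distance to the nearest disease gene in an unweighted interactome. The graph, node names, and function names are all hypothetical.

```python
from collections import deque


def shortest_path_lengths(adj, source):
    """Breadth-first search distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closest_proximity(adj, drug_targets, disease_genes):
    """Mean distance from each drug target to its nearest disease gene.

    Returns infinity if some target cannot reach any disease gene.
    """
    total = 0
    for target in drug_targets:
        dist = shortest_path_lengths(adj, target)
        reachable = [dist[g] for g in disease_genes if g in dist]
        if not reachable:
            return float("inf")
        total += min(reachable)
    return total / len(drug_targets)


# Toy interactome (hypothetical): a path T1 - X - G1 - G2 - T2.
toy_adj = {
    "T1": ["X"],
    "X": ["T1", "G1"],
    "G1": ["X", "G2"],
    "G2": ["G1", "T2"],
    "T2": ["G2"],
}
proximity = closest_proximity(toy_adj, ["T1", "T2"], ["G1", "G2"])
```

Here T1 is two steps from G1 and T2 is one step from G2, so the toy proximity is 1.5. Smaller values indicate drugs whose targets lie closer to the disease genes. A real screen would normally also compare against randomized target sets, which this sketch omits.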
Affiliation(s)
- Le Zhang
  - College of Computer Science, Sichuan University, Chengdu, China
- Shiwei Fan
  - College of Computer Science, Sichuan University, Chengdu, China
- Julio Vera
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie, Erlangen, Germany
  - Comprehensive Cancer Center Erlangen, Erlangen, Germany
- Xin Lai
  - Laboratory of Systems Tumor Immunology, Department of Dermatology, Universitätsklinikum Erlangen and Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
  - Deutsches Zentrum Immuntherapie, Erlangen, Germany
  - Comprehensive Cancer Center Erlangen, Erlangen, Germany
  - BioMediTech, Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
  - Corresponding author at: Universitätsklinikum Erlangen, Erlangen, Germany; Tampere University, Tampere, Finland.
4
Network-Based Approaches for Disease-Gene Association Prediction Using Protein-Protein Interaction Networks. Int J Mol Sci 2022; 23:ijms23137411. [PMID: 35806415 PMCID: PMC9266751 DOI: 10.3390/ijms23137411] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 05/09/2022] [Revised: 06/25/2022] [Accepted: 06/30/2022] [Indexed: 01/02/2023] Open
Abstract
Genome-wide association studies (GWAS) can be used to infer genome intervals that are involved in genetic diseases. However, investigating a large number of putative mutations for GWAS is resource- and time-intensive. Network-based computational approaches are being used for efficient disease-gene association prediction. Network-based methods are based on the underlying assumption that the genes causing the same diseases are located close to each other in a molecular network, such as a protein-protein interaction (PPI) network. In this survey, we provide an overview of network-based disease-gene association prediction methods based on three categories: graph-theoretic algorithms, machine learning algorithms, and an integration of these two. We experimented with six selected methods to compare their prediction performance using a heterogeneous network constructed by combining a genome-wide weighted PPI network, an ontology-based disease network, and disease-gene associations. The experiment was conducted in two different settings according to the presence and absence of known disease-associated genes. The results revealed that HerGePred, an integrative method, outperformed in the presence of known disease-associated genes, whereas PRINCE, which adopted a network propagation algorithm, was the most competitive in the absence of known disease-associated genes. Overall, the results demonstrated that the integrative methods performed better than the methods using graph-theory only, and the methods using a heterogeneous network performed better than those using a homogeneous PPI network only.
5
Tan H, Qiu S, Wang J, Yu G, Guo W, Guo M. Weighted deep factorizing heterogeneous molecular network for genome-phenome association prediction. Methods 2022; 205:18-28. [PMID: 35690250 DOI: 10.1016/j.ymeth.2022.05.008] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/20/2022] [Revised: 05/14/2022] [Accepted: 05/26/2022] [Indexed: 11/18/2022] Open
Abstract
Genome-phenome association (GPA) prediction can promote the understanding of the biological mechanisms underlying the complex pathology of phenotypes (i.e., traits and diseases). Traditional heterogeneous network-based GPA approaches overwhelmingly need to project heterogeneous data onto a homogeneous network for data fusion and prediction; such projections result in the loss of heterogeneous network structure information. Matrix factorization-based data fusion can avoid such projection by integrating multi-type data in a coherent way, but these methods typically perform linear factorization and cannot mine the nonlinear relationships between molecules, which compromises the accuracy of GPA analysis. Furthermore, most of them cannot selectively combine network topology and node attribute information in a principled way. In this paper, we propose a weighted deep matrix factorization-based solution (WDGPA) to predict GPAs by selectively and differentially fusing a heterogeneous molecular network and diverse node attributes. WDGPA first assigns weights to inter/intra-relational data matrices and attribute data matrices, and performs deep matrix factorization on these matrices of the heterogeneous network in a cooperative manner to obtain the nonlinear representations of different nodes. In addition, it performs low-rank representation learning on the attribute data with the shared nonlinear representations. In this way, both the network topology and node attributes are jointly mined to explore the representations of molecules and the complex interplay between molecules and phenotypes. WDGPA then uses the representational vectors of gene and phenotype nodes to predict GPAs. Experimental results on maize and human datasets confirm that WDGPA outperforms competitive methods by a large margin under different evaluation protocols.
Affiliation(s)
- Haojiang Tan
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China
- Sichao Qiu
  - School of Software, Shandong University, Jinan, China
  - Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China
- Jun Wang
  - Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China
- Guoxian Yu
  - Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China
- Wei Guo
  - Joint SDU-NTU Centre For AI Research (C-FAIR), Shandong University, Jinan, China
- Maozu Guo
  - College of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
6
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular components of the human cell, with their functional interdependencies, form a complicated biological network. Diseases are mostly caused by perturbations of a set of interacting biomolecules rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel relationships between biological entities. Hence, more and more biologists focus on studying the complex biological system instead of its individual components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of HIN and how a biological system can be viewed as a complex HIN. Then, we discuss the topological properties of HIN and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights on how HIN aids drug development and helps explore the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns among pharmacogenomics and cell-type detection based on single-cell genomic data.
Affiliation(s)
- Pingjian Ding
  - School of Computer Science, University of South China, Hengyang, China
- Wenjue Ouyang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chee-Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
7
Zhang H, Ferguson A, Robertson G, Jiang M, Zhang T, Sudlow C, Smith K, Rannikmae K, Wu H. Benchmarking network-based gene prioritization methods for cerebral small vessel disease. Brief Bioinform 2021; 22:bbab006. [PMID: 33634312 PMCID: PMC8425308 DOI: 10.1093/bib/bbab006] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/19/2020] [Revised: 12/31/2020] [Accepted: 01/04/2021] [Indexed: 12/25/2022] Open
Abstract
Network-based gene prioritization algorithms are designed to prioritize disease-associated genes based on known ones using biological networks of protein interactions, gene-disease associations (GDAs) and other relationships between biological entities. Various algorithms have been developed based on different mechanisms, but it is not obvious which algorithm is optimal for a specific disease. To address this issue, we benchmarked multiple algorithms for their application in cerebral small vessel disease (cSVD). We curated protein-gene interactions (PGIs) and GDAs from databases and assembled PGI networks and disease-gene heterogeneous networks. A screening of algorithms resulted in seven representative algorithms to be benchmarked. Performance of algorithms was assessed using both leave-one-out cross-validation (LOOCV) and external validation with MEGASTROKE genome-wide association study (GWAS). We found that random walk with restart on the heterogeneous network (RWRH) showed best LOOCV performance, with median LOOCV rediscovery rank of 185.5 (out of 19 463 genes). The GenePanda algorithm had most GWAS-confirmable genes in top 200 predictions, while RWRH had best ranks for small vessel stroke-associated genes confirmed in GWAS. In conclusion, RWRH has overall better performance for application in cSVD despite its susceptibility to bias caused by degree centrality. Choice of algorithms should be determined before applying to specific disease. Current pure network-based gene prioritization algorithms are unlikely to find novel disease-associated genes that are not associated with known ones. The tools for implementing and benchmarking algorithms have been made available and can be generalized for other diseases.
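To make the LOOCV rediscovery rank concrete, here is a simplified Python sketch. It runs a plain random walk with restart on a single unweighted gene network, not the heterogeneous-network variant (RWRH) benchmarked in the paper. It also assumes every node has at least one neighbor. The toy graph, gene names, and function names are hypothetical.

```python
from statistics import median


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    `adj` maps each node to its neighbor list; assumes no isolated nodes.
    """
    nodes = list(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            # Each node spreads its walking mass evenly over its neighbors.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def loocv_rediscovery_ranks(adj, known_genes, restart=0.5):
    """Hold out each known gene in turn and rank it among non-seed genes."""
    ranks = {}
    for held_out in known_genes:
        seeds = set(known_genes) - {held_out}
        scores = random_walk_with_restart(adj, seeds, restart)
        candidates = [n for n in adj if n not in seeds]
        ordered = sorted(candidates, key=lambda n: scores[n], reverse=True)
        ranks[held_out] = ordered.index(held_out) + 1
    return ranks


# Toy network (hypothetical gene names).
toy_net = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores_from_b = random_walk_with_restart(toy_net, {"B"})
loocv_ranks = loocv_rediscovery_ranks(toy_net, ["A", "B"])
median_rank = median(loocv_ranks.values())
```

A lower median rank means held-out known genes are recovered nearer the top of the candidate list, which is how the 185.5 out of 19 463 figure in the abstract should be read.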
Affiliation(s)
- Huayu Zhang
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Amy Ferguson
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Grant Robertson
  - Institute for Adaptive and Neural Computation, School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Muchen Jiang
  - Edinburgh Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Teng Zhang
  - Department of Orthopaedics and Traumatology, the University of Hong Kong, Hong Kong, China
- Cathie Sudlow
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Keith Smith
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Kristiina Rannikmae
  - Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
  - Health Data Research UK, London, United Kingdom
- Honghan Wu
  - Health Data Research UK, London, United Kingdom
  - Institute of Health Informatics, University College London, London, United Kingdom
8
Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021; 37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation Over the past years, many computational methods have been developed to incorporate information about phenotypes for disease–gene prioritization task. These methods generally compute the similarity between a patient’s phenotypes and a database of gene-phenotype to find the most phenotypically similar match. The main limitation in these methods is their reliance on knowledge about phenotypes associated with particular genes, which is not complete in humans as well as in many model organisms, such as the mouse and fish. Information about functions of gene products and anatomical site of gene expression is available for more genes and can also be related to phenotypes through ontologies and machine-learning models. Results We developed a novel graph-based machine-learning method for biomedical ontologies, which is able to exploit axioms in ontologies and other graph-structured data. Using our machine-learning method, we embed genes based on their associated phenotypes, functions of the gene products and anatomical location of gene expression. We then develop a machine-learning model to predict gene–disease associations based on the associations between genes and multiple biomedical ontologies, and this model significantly improves over state-of-the-art methods. Furthermore, we extend phenotype-based gene prioritization methods significantly to all genes, which are associated with phenotypes, functions or site of expression. Availability and implementation Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Chen
  - Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
- Azza Althagafi
  - Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
  - Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
9
Xiang J, Zhang J, Zheng R, Li X, Li M. NIDM: network impulsive dynamics on multiplex biological network for disease-gene prediction. Brief Bioinform 2021; 22:6236070. [PMID: 33866352 DOI: 10.1093/bib/bbab080] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 01/12/2021] [Revised: 02/11/2021] [Accepted: 02/21/2021] [Indexed: 12/12/2022] Open
Abstract
The prediction of genes related to diseases is important to the study of the diseases because of the high cost and time consumption of biological experiments. Network propagation is a popular strategy for disease-gene prediction. However, existing methods focus on the stable solution of the dynamics while ignoring the useful information hidden in the dynamical process, and it is still a challenge to make use of multiple types of physical/functional relationships between proteins/genes to effectively predict disease-related genes. Therefore, we proposed a framework of network impulsive dynamics on multiplex biological network (NIDM) to predict disease-related genes, along with four variants of NIDM models and four kinds of impulsive dynamical signatures (IDSs). NIDM identifies disease-related genes by mining the dynamical responses of nodes to impulsive signals exerted at specific nodes. Through a series of experimental evaluations on various types of biological networks, we confirmed the advantage of multiplex networks and the important role of functional associations in disease-gene prediction, demonstrated the superior performance of NIDM compared with four types of network-based algorithms, and then gave recommendations on the choice of NIDM models and IDS signatures. To facilitate the prioritization and analysis of (candidate) genes associated with specific diseases, we developed a user-friendly web server, which provides three kinds of filtering patterns for genes, network visualization, enrichment analysis and a wealth of external links (http://bioinformatics.csu.edu.cn/DGP/NID.jsp). NIDM is a protocol for disease-gene prediction integrating different types of biological networks, which may become a very useful computational tool for the study of disease-related genes.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Jiashuai Zhang
  - School of Computer Science and Engineering, Central South University, Hunan, China
- Ruiqing Zheng
  - School of Computer Science and Engineering, Central South University, China
- Xingyi Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha, China
10
Xiang J, Zhang NR, Zhang JS, Lv XY, Li M. PrGeFNE: Predicting disease-related genes by fast network embedding. Methods 2020; 192:3-12. [PMID: 32610158 DOI: 10.1016/j.ymeth.2020.06.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/30/2020] [Revised: 06/13/2020] [Accepted: 06/22/2020] [Indexed: 12/14/2022] Open
Abstract
Identifying disease-related genes is important for understanding the molecular mechanisms of diseases, as well as for their diagnosis and treatment. Many computational methods have been proposed to predict disease-related genes, but how to make full use of multi-source biological data to enhance disease-gene prediction is still challenging. In this paper, we proposed a novel method for predicting disease-related genes by using fast network embedding (PrGeFNE), which can integrate multiple types of associations related to diseases and genes. Specifically, we first constructed a heterogeneous network by using phenotype-disease, disease-gene, protein-protein and gene-GO associations, and a low-dimensional representation of nodes was extracted from the network by using a fast network embedding algorithm. Then, a dual-layer heterogeneous network was reconstructed by using the low-dimensional representation, and network propagation was applied to the dual-layer heterogeneous network to predict disease-related genes. Through cross-validation and newly added-association validation, we demonstrated the important roles of different types of association data in enhancing disease-gene prediction, and confirmed the excellent performance of PrGeFNE in comparison with state-of-the-art algorithms. Furthermore, we developed a web tool that helps researchers search for candidate genes of different diseases predicted by PrGeFNE, along with GO and pathway enrichment analysis of the candidate gene set. This may be useful for investigating the molecular mechanisms of diseases as well as for their experimental validation. The web tool is available at http://bioinformatics.csu.edu.cn/prgefne/.
Affiliation(s)
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
  - Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, 410219 Hunan, China
- Ning-Rui Zhang
  - College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Jia-Shuai Zhang
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiao-Yi Lv
  - School of Software, Xinjiang University, Urumqi 830046, China
- Min Li
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
11
Zhang Y, Li W, Zhang Y, Hu E, Rong Z, Ge L, Deng G, He Y, Lv J, Chen L, He W. Network-based integration method for potential breast cancer gene identification. J Cell Physiol 2020; 235:7960-7969. [PMID: 31943201 DOI: 10.1002/jcp.29450] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2019] [Accepted: 01/03/2020] [Indexed: 11/11/2022]
Abstract
Breast cancer is the most common cause of cancer death in women worldwide. A network-based integration method was proposed to identify potential breast cancer genes. First, genes were prioritized using a gene prioritization algorithm based on the transfer of disease risk between genes in a network with weighted vertices and edges. Our prioritization algorithm was effective and robust with respect to the number of top-ranked seed genes, and achieved higher area under the curve values than ToppGene and ToppNet. Then, 20 potential breast cancer genes were identified as the genes common to the top 50 candidate genes across multiple prioritizations, chosen for their robustness. These genes could accurately classify tumor and normal samples in the full and paired sample sets and in three independent datasets. Of the potential breast cancer genes, 18 were verified by the literature and 2 were novel genes that need further study. This study may contribute to the understanding of the genetic architecture of breast cancer and to its diagnosis and treatment.
Affiliation(s)
- Yue Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Wan Li
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Yihua Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Erqiang Hu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Zherou Rong
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Luanfeng Ge
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Gui Deng
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Yuehan He
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Junjie Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Lina Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang, China
- Weiming He
  - Institute of Opto-Electronics, Harbin Institute of Technology, Harbin, Heilongjiang, China
12
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritizing genes according to their relevance to a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. From about a hundred published gene prioritization tools, we select and briefly describe the 14 that are most up-to-date and user-friendly. We also discuss the advantages and disadvantages of existing tools, the challenges of validating them, and directions for future research.
Affiliation(s)
- Olga Zolotareva
- Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
- Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
|
13
|
Predicting disease-genes based on network information loss and protein complexes in heterogeneous network. Inf Sci (N Y) 2019. [DOI: 10.1016/j.ins.2018.12.008] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 01/12/2023]
|
14
|
Li W, Zhang Y, He Y, Wang Y, Guo S, Zhao X, Feng Y, Song Z, Zou Y, He W, Chen L. Candidate gene prioritization for non-communicable diseases based on functional information: Case studies. J Biomed Inform 2019; 93:103155. [PMID: 30902596 DOI: 10.1016/j.jbi.2019.103155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2018] [Revised: 03/14/2019] [Accepted: 03/19/2019] [Indexed: 10/27/2022]
Abstract
Candidate gene prioritization for complex non-communicable diseases is essential to understanding their mechanisms and developing better means of diagnosing and treating them. Many methods have been developed to prioritize candidate genes in protein-protein interaction (PPI) networks. Integrating functional information or similarity into disease-related PPI networks could improve prioritization performance. In this study, a candidate gene prioritization method was proposed for non-communicable diseases that models the transfer of disease risk between genes in weighted disease PPI networks, with node and edge weights based on functional information. Three non-communicable diseases with pathobiological similarity, type 2 diabetes (T2D), coronary artery disease (CAD) and dilated cardiomyopathy (DCM), were used as case studies. Literature review and pathway enrichment analysis of top-ranked genes demonstrated the effectiveness of our method, which performed better than other existing methods in comparison. Pathobiological similarity among these three diseases was further investigated through their common top-ranked genes to reveal their pathogenesis.
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yihua Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Shanshan Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Xilei Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuyan Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Zhaona Song
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Yuqing Zou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin 150000, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
|