1
Saranya KR, Vimina ER, Pinto FR. TransNeT-CGP: A cluster-based comorbid gene prioritization by integrating transcriptomics and network-topological features. Comput Biol Chem 2024; 110:108038. [PMID: 38461796 DOI: 10.1016/j.compbiolchem.2024.108038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/06/2023] [Revised: 01/11/2024] [Accepted: 02/25/2024] [Indexed: 03/12/2024]
Abstract
The local disruptions caused by the genes of one disease can influence the pathways associated with other diseases, resulting in comorbidity. For gene therapies, it is necessary to prioritize the key genes that regulate common biological mechanisms to tackle the issues caused by overlapping diseases. This work proposes a clustering-based computational approach for prioritizing comorbid genes within overlapping disease modules by analyzing protein-protein interaction networks. For this, a sub-network with the gene interactions of the disease pair was extracted from the interactome. Edge weights are assigned by combining the pairwise gene expression correlation and betweenness centrality scores. Further, a weighted graph clustering algorithm is applied, and the dominant nodes of high-density clusters are ranked based on clustering coefficients and neighborhood connectivity. Case studies based on neurodegenerative diseases, such as the Amyotrophic Lateral Sclerosis-Spinal Muscular Atrophy (ALS-SMA) pair, and cancers, such as the Ovarian Carcinoma-Invasive Ductal Breast Carcinoma (OC-IDBC) pair, were conducted to examine the efficacy of the proposed method. To identify the mechanistic role of top-ranked genes, we used functional and pathway enrichment analysis, connectivity analysis with the leave-one-out (LOO) method, analysis of associated disease-related protein complexes, and prioritization tools such as TOPPGENE and Heml2.0. From pathway analysis, it was observed that the top 10 genes obtained using the proposed method were associated with 10 pathways in ALS-SMA comorbidity and 15 in the case of OC-IDBC, whereas the similar methods SAPDSB and S2B yielded 4 and 6 pathways, respectively, for ALS-SMA and 9 and 10, respectively, for OC-IDBC. In both case studies, 70% of the disease-specific benchmark protein complexes were linked to top-ranked genes of the proposed method, compared with 55% and 60% for SAPDSB and S2B, respectively. Additionally, it was found that the removal of the top 10 genes disconnects the network into 14 distinct components in the case of ALS-SMA and 9 in the case of OC-IDBC. The experimental results show that the proposed method can be effectively used for identifying key genes in comorbidity and can offer insights into the intricate molecular relationships driving comorbid diseases.
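The connectivity analysis mentioned in this abstract (removing top-ranked genes and counting the resulting network components) can be illustrated with a minimal sketch. This is not the authors' implementation; the function name `count_components`, the toy edge list, and the gene labels `G1` to `G5` are hypothetical and chosen only for illustration.

```python
from collections import deque


def count_components(edges, removed=frozenset()):
    """Count connected components after deleting the nodes in `removed`."""
    removed = set(removed)
    # Keep every node that is not removed, including nodes that become isolated.
    nodes = {node for edge in edges for node in edge} - removed
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen = set()
    components = 0
    for start in nodes:
        if start in seen:
            continue
        components += 1  # a new, previously unvisited component
        seen.add(start)
        queue = deque([start])
        while queue:  # breadth-first traversal of this component
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return components


# Hypothetical toy network: G1 is a hub joining otherwise separate genes.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]
before = count_components(toy_edges)
after = count_components(toy_edges, removed={"G1"})
print(before, after)
```

In this toy network, removing the hub `G1` splits one component into three (`G2`, `G3`, and the pair `G4`-`G5`), which mirrors, at a much smaller scale, how the abstract reports the number of components after removing the top 10 genes.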
Affiliation(s)
- K R Saranya
  - Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
- E R Vimina
  - Department of Computer Science & IT, School of Computing, Amrita Vishwa Vidyapeetham, Kochi Campus, India.
- F R Pinto
  - Chemistry and Biochemistry Department, Faculty of Sciences, University of Lisbon, Portugal.
2
Wu X, Cao S, Zou Y, Wu F. Traditional Chinese Medicine studies for Alzheimer's disease via network pharmacology based on entropy and random walk. PLoS One 2023; 18:e0294772. [PMID: 38019798 PMCID: PMC10686466 DOI: 10.1371/journal.pone.0294772] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2023] [Accepted: 11/08/2023] [Indexed: 12/01/2023] Open
Abstract
Alzheimer's disease (AD) is a common neurodegenerative disease with complex pathogenesis, and approved drugs can only alleviate its symptoms for a period of time. Traditional Chinese medicine (TCM) contains multiple active ingredients that can act on multiple targets simultaneously. In this paper, a novel algorithm based on entropy and random walk with restart on a heterogeneous network (RWRHE) is proposed for predicting active ingredients for AD and screening out the effective TCMs for AD. First, six TCM compounds containing 20 herbs are collected from the AD drug reviews in the CNKI (China National Knowledge Infrastructure), and their active ingredients and targets are retrieved from different databases. Then, comprehensive similarity networks of active ingredients and of targets are constructed based on different aspects and entropy weights. A comprehensive heterogeneous network is constructed by integrating the known active ingredient-target association information and the two comprehensive similarity networks. Subsequently, bi-random walks are applied on the heterogeneous network to predict active ingredient-target associations. AD-related targets are selected as the seed nodes, a random walk is carried out on the target similarity network to predict the AD-target associations, and the associations between AD and active ingredients are inferred and scored. The effective herbs and compounds for AD are screened out based on their active ingredients' scores. The results measured by machine learning and bioinformatics show that the RWRHE algorithm achieves better prediction accuracy. The top 15 active ingredients may act as multi-target agents in the prevention and treatment of AD; Danshen, Gouteng and Chaihu are recommended as effective TCMs for AD, and Yiqitongyutang is recommended as an effective compound for AD.
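The seed-based random walk step described in this abstract can be sketched as a plain random walk with restart on a single weighted similarity network. This is a simplified illustration, not the RWRHE algorithm itself (which also uses entropy weighting and bi-random walks on a heterogeneous network). The function name, the restart probability of 0.3, and the toy target labels `T1` to `T4` are assumptions made for the example.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-12, max_iter=10000):
    """Return steady-state visiting probabilities for a walk restarting at `seeds`.

    `adjacency` maps each node to a dict {neighbor: non-negative weight};
    every neighbor must also appear as a key.
    """
    nodes = list(adjacency)
    seeds = set(seeds)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(p0)
    for _ in range(max_iter):
        updated = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            total = sum(adjacency[u].values())
            if total == 0:
                # Isolated node: return its walking mass to the seeds.
                for n in nodes:
                    updated[n] += (1.0 - restart) * scores[u] * p0[n]
                continue
            for v, weight in adjacency[u].items():
                # Move from u to v with probability proportional to the edge weight.
                updated[v] += (1.0 - restart) * scores[u] * weight / total
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical target similarity network: a simple chain T1 - T2 - T3 - T4.
toy_network = {
    "T1": {"T2": 1.0},
    "T2": {"T1": 1.0, "T3": 1.0},
    "T3": {"T2": 1.0, "T4": 1.0},
    "T4": {"T3": 1.0},
}
rwr_scores = random_walk_with_restart(toy_network, seeds={"T1"})
ranking = sorted(rwr_scores, key=rwr_scores.get, reverse=True)
print(ranking)
```

With `T1` as the only seed, nodes closer to the seed receive higher scores, so candidate targets can be ranked by their steady-state probability.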
Affiliation(s)
- Xiaolu Wu
  - School of Mathematical Sciences, Tiangong University, Tianjin, China
- Shujuan Cao
  - School of Mathematical Sciences, Tiangong University, Tianjin, China
- Yongming Zou
  - Department of Neurology, Tianjin Huanhu Hospital, Tianjin, China
- Fangxiang Wu
  - Division of Biomedical Engineering, Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
3
Shi W, Feng H, Li J, Liu T, Liu Z. DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding. Front Genet 2023; 14:1222346. [PMID: 37811150 PMCID: PMC10556742 DOI: 10.3389/fgene.2023.1222346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 09/11/2023] [Indexed: 10/10/2023] Open
Abstract
The study of comorbidity can provide new insights into the pathogenesis of the disease and has important economic significance in the clinical evaluation of treatment difficulty, medical expenses, length of stay, and prognosis of the disease. In this paper, we propose a disease association prediction model DapBCH, which constructs a cross-species biological network and applies heterogeneous graph embedding to predict disease association. First, we combine the human disease-gene network, mouse gene-phenotype network, human-mouse homologous gene network, and human protein-protein interaction network to reconstruct a heterogeneous biological network. Second, we apply heterogeneous graph embedding based on meta-path aggregation to generate the feature vector of disease nodes. Finally, we employ link prediction to obtain the similarity of disease pairs. The experimental results indicate that our model is highly competitive in predicting the disease association and is promising for finding potential disease associations.
Affiliation(s)
- Wanqi Shi
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Hailin Feng
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Jian Li
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Tongcun Liu
  - School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
- Zhe Liu
  - College of Media Engineering, Zhejiang University of Media and Communications, Hangzhou, Zhejiang, China
4
Liu X, Gao L, Peng Y, Fang Z, Wang J. PheSom: a term frequency-based method for measuring human phenotype similarity on the basis of MeSH vocabulary. Front Genet 2023; 14:1185790. [PMID: 37496714 PMCID: PMC10366691 DOI: 10.3389/fgene.2023.1185790] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Accepted: 06/21/2023] [Indexed: 07/28/2023] Open
Abstract
Background: Phenotype similarity calculation should be used to help improve drug repurposing. In this study, based on the MeSH terms describing the phenotypes deposited in OMIM, we proposed a method, namely, PheSom (Phenotype Similarity On MeSH), to measure the similarity between phenotypes. PheSom counted the number of overlapping MeSH terms between two phenotypes and then took the weight of every MeSH term within each phenotype into account according to the term frequency-inverse document frequency (FIDC). Phenotype-related genes were used for the evaluation of our method. Results: A 7,739 × 7,739 similarity score matrix was finally obtained and the number of phenotype pairs was dramatically decreased with the increase of similarity score. Besides, the overlapping rates of phenotype-related genes were remarkably increased with the increase of similarity score between phenotypes, which supports the reliability of our method. Conclusion: We anticipate our method can be applied to identifying novel therapeutic methods for complex diseases.
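The idea of weighting shared MeSH terms by term frequency-inverse document frequency can be sketched as follows. The abstract does not give the exact scoring formula, so the cosine similarity used here is an assumption, and the function names, phenotype labels `P1` to `P3`, and term lists are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(phenotype_terms):
    """Build a TF-IDF weight for each term of each phenotype."""
    n_docs = len(phenotype_terms)
    doc_freq = Counter()
    for terms in phenotype_terms.values():
        doc_freq.update(set(terms))  # count each term once per phenotype
    vectors = {}
    for name, terms in phenotype_terms.items():
        counts = Counter(terms)
        total = len(terms)
        vectors[name] = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed scoring)."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical phenotypes annotated with MeSH-like terms.
toy_phenotypes = {
    "P1": ["Seizures", "Ataxia", "Humans"],
    "P2": ["Seizures", "Muscle Weakness", "Humans"],
    "P3": ["Anemia", "Humans"],
}
vectors = tfidf_vectors(toy_phenotypes)
sim_12 = cosine(vectors["P1"], vectors["P2"])
sim_13 = cosine(vectors["P1"], vectors["P3"])
print(sim_12, sim_13)
```

In this toy example, "Humans" appears in every phenotype and therefore gets zero weight, so `P1` and `P3` score zero even though they share a term, while `P1` and `P2` score above zero because they share the more specific term "Seizures".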
Affiliation(s)
- Xinhua Liu
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ling Gao
  - Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Hangzhou Normal University, Hangzhou, Zhejiang, China
- Yonglin Peng
  - Shanghai Center for Systems Biomedicine, Shanghai Jiao Tong University, Shanghai, China
- Zhonghai Fang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
- Ju Wang
  - School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin, China
5
Hoang VT, Jeon HJ, You ES, Yoon Y, Jung S, Lee OJ. Graph Representation Learning and Its Applications: A Survey. Sensors (Basel, Switzerland) 2023; 23:4168. [PMID: 37112507 PMCID: PMC10144941 DOI: 10.3390/s23084168] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 03/08/2023] [Revised: 04/16/2023] [Accepted: 04/17/2023] [Indexed: 06/19/2023]
Abstract
Graphs are data structures that effectively represent relational data in the real world. Graph representation learning is a significant task since it could facilitate various downstream tasks, such as node classification, link prediction, etc. Graph representation learning aims to map graph entities to low-dimensional vectors while preserving graph structure and entity relationships. Over the decades, many models have been proposed for graph representation learning. This paper aims to show a comprehensive picture of graph representation learning models, including traditional and state-of-the-art models on various graphs in different geometric spaces. First, we begin with five types of graph embedding models: graph kernels, matrix factorization models, shallow models, deep-learning models, and non-Euclidean models. In addition, we also discuss graph transformer models and Gaussian embedding models. Second, we present practical applications of graph embedding models, from constructing graphs for specific domains to applying models to solve tasks. Finally, we discuss challenges for existing models and future research directions in detail. As a result, this paper provides a structured overview of the diversity of graph embedding models.
Affiliation(s)
- Van Thuy Hoang
  - Department of Artificial Intelligence, The Catholic University of Korea, 43, Jibong-ro, Bucheon-si 14662, Gyeonggi-do, Republic of Korea
- Hyeon-Ju Jeon
  - Data Assimilation Group, Korea Institute of Atmospheric Prediction Systems (KIAPS), 35, Boramae-ro 5-gil, Dongjak-gu, Seoul 07071, Republic of Korea
- Eun-Soon You
  - Department of Artificial Intelligence, The Catholic University of Korea, 43, Jibong-ro, Bucheon-si 14662, Gyeonggi-do, Republic of Korea
- Yoewon Yoon
  - Department of Social Welfare, Dongguk University, 30, Pildong-ro 1-gil, Jung-gu, Seoul 04620, Republic of Korea
- Sungyeop Jung
  - Semiconductor Devices and Circuits Laboratory, Advanced Institute of Convergence Technology (AICT), Seoul National University, 145, Gwanggyo-ro, Yeongtong-gu, Suwon-si 16229, Gyeonggi-do, Republic of Korea
- O-Joun Lee
  - Department of Artificial Intelligence, The Catholic University of Korea, 43, Jibong-ro, Bucheon-si 14662, Gyeonggi-do, Republic of Korea
6
Azadifar S, Ahmadi A. A novel candidate disease gene prioritization method using deep graph convolutional networks and semi-supervised learning. BMC Bioinformatics 2022; 23:422. [PMID: 36241966 PMCID: PMC9563530 DOI: 10.1186/s12859-022-04954-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Accepted: 09/20/2022] [Indexed: 11/18/2022] Open
Abstract
Background: Selecting and prioritizing candidate disease genes is necessary before conducting laboratory studies, as identifying disease genes from a large number of candidate genes using laboratory methods is a very costly and time-consuming task. There are many machine learning-based gene prioritization methods. These methods differ in various aspects, including the feature vectors of genes, the datasets used and their structures, and the learning model. Creating a suitable feature vector for genes and an appropriate learning model on a variety of data with different and non-Euclidean structures, including graphs, as well as the lack of negative data, are important challenges for these methods. Graph neural networks have recently emerged in machine learning and related fields, and they have demonstrated superior performance for a broad range of problems. Methods: In this study, a new semi-supervised learning method based on graph convolutional networks is presented, using a novel way of constructing feature vectors for each gene. In the proposed method, we first construct three feature vectors for each gene using terms from the Gene Ontology (GO) database. Then, we train a graph convolutional network on these vectors using protein–protein interaction (PPI) network data to identify candidate disease genes. Our model learns hidden-layer representations that encode both the local graph structure and the features of nodes. This method is characterized by the simultaneous consideration of topological information of the biological network (e.g., PPI) and other sources of evidence. Finally, a validation is performed to demonstrate the efficiency of our method. Results: Several experiments are performed on 16 diseases to evaluate the proposed method's performance. The experiments demonstrate that our proposed method achieves the best results in terms of precision, area under the ROC curve (AUC), and F1-score when compared with eight state-of-the-art network- and machine learning-based disease gene prioritization methods. Conclusion: This study shows that the proposed semi-supervised learning method appropriately classifies and ranks candidate disease genes using a graph convolutional network and an innovative method to create three feature vectors for genes based on the molecular function, cellular component, and biological process terms from GO data.
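How a graph convolutional layer mixes a gene's own features with those of its PPI neighbors can be sketched in a few lines. The abstract does not state the exact layer rule, so this example assumes the widely used symmetric normalization with self-loops followed by ReLU; the function names and the toy two-gene network are hypothetical, and a real model would learn `weights` during training.

```python
import math


def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def gcn_layer(adjacency, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 . H . W) (assumed rule)."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own features.
    a_tilde = [
        [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    degree = [sum(row) for row in a_tilde]
    # Symmetric normalization by node degree.
    a_hat = [
        [a_tilde[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
        for i in range(n)
    ]
    hidden = matmul(matmul(a_hat, features), weights)
    return [[max(0.0, value) for value in row] for row in hidden]


# Hypothetical example: two interacting genes with one-hot features.
toy_adjacency = [[0.0, 1.0], [1.0, 0.0]]
toy_features = [[1.0, 0.0], [0.0, 1.0]]
identity_weights = [[1.0, 0.0], [0.0, 1.0]]
gcn_output = gcn_layer(toy_adjacency, toy_features, identity_weights)
print(gcn_output)
```

After one layer, each of the two genes carries an equal mix of its own feature and its neighbor's, which is the sense in which the hidden representation encodes both graph structure and node features.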
Affiliation(s)
- Saeid Azadifar
  - Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran.
- Ali Ahmadi
  - Faculty of Computer Engineering, K. N. Toosi University of Technology, Tehran, Iran
7
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure is still challenging in problems such as disease-gene prediction. RESULTS: We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to more effectively predict disease-related genes. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integration of gene rankings is designed in order to integrate multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as its further performance improvement derived from the parameter estimation. CONCLUSIONS: The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of the multiscale module structure and its application in problems such as disease-gene prediction.
Affiliation(s)
- Ju Xiang
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
  - School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
  - Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
8
Liu J, Zhu H, Qiu J. Locally Adjust Networks Based on Connectivity and Semantic Similarities for Disease Module Detection. Front Genet 2021; 12:726596. [PMID: 34759955 PMCID: PMC8575408 DOI: 10.3389/fgene.2021.726596] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Accepted: 09/22/2021] [Indexed: 11/13/2022] Open
Abstract
For studying the pathogenesis of complex diseases, it is important to identify disease modules at the system level. Since protein-protein interaction (PPI) networks contain many missing and incorrect interactions, most existing methods leave many disease proteins isolated from disease modules. In this paper, we propose an effective disease module identification method, IDMCSS, in which the human PPI network is adjusted by adding potentially missing interactions to existing PPI networks and removing potentially incorrect ones. In IDMCSS, a network adjustment strategy is developed to add or remove links around disease proteins based on both topological and semantic information. Next, neighboring proteins of disease proteins are prioritized according to a proposed similarity between each of them and the disease proteins, and the proteins with the largest similarity to the disease proteins are added to a candidate disease protein set one by one. The stopping criterion is set to the boundary of the disease proteins. Finally, the connected subnetwork with the largest number of disease proteins is selected as the disease module. Experimental results on asthma demonstrate the effectiveness of the method in comparison to existing algorithms for disease module identification. It is also shown that IDMCSS can obtain disease modules covering crucial biological processes of asthma, and that 12 targets for drug intervention can be predicted.
Affiliation(s)
- Jia Liu
  - State Key Laboratory of Media Convergence and Communication, Communication University of China, Beijing, China
- Huole Zhu
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
- Jianfeng Qiu
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Artificial Intelligence, Anhui University, Hefei, China
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, School of Artificial Intelligence, Anhui University, Hefei, China
9
Xiao Q, Dai J, Luo J. A survey of circular RNAs in complex diseases: databases, tools and computational methods. Brief Bioinform 2021; 23:6407737. [PMID: 34676391 DOI: 10.1093/bib/bbab444] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 07/13/2021] [Revised: 09/21/2021] [Accepted: 09/28/2021] [Indexed: 01/22/2023] Open
Abstract
Circular RNAs (circRNAs) are a category of newly discovered competing endogenous non-coding RNAs that have been shown to be implicated in many complex human diseases. A large number of circRNAs have been confirmed to be involved in cancer progression and are expected to become promising biomarkers for tumor diagnosis and targeted therapy. Deciphering the underlying relationships between circRNAs and diseases may provide new insights into the pathogenesis of complex diseases and help further characterize the biological functions of circRNAs. As traditional experimental methods are usually time-consuming and laborious, computational models have made significant progress in systematically exploring potential circRNA-disease associations, which not only creates new opportunities for investigating pathogenic mechanisms at the level of circRNAs, but also helps to significantly improve the efficiency of clinical trials. In this review, we first summarize the functions and characteristics of circRNAs and introduce some representative circRNAs related to tumorigenesis. Then, we investigate the available databases and tools dedicated to circRNA and disease studies. Next, we present a comprehensive review of computational methods for predicting circRNA-disease associations and classify them into five categories: network propagation-based, path-based, matrix factorization-based, deep learning-based and other machine learning methods. Finally, we discuss the challenges and future research directions in this field.
Affiliation(s)
- Qiu Xiao
  - Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, Changsha, China
- Jianhua Dai
  - Hunan Normal University and Hunan Xiangjiang Artificial Intelligence Academy, Changsha, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
10
Boizard F, Buffin-Meyer B, Aligon J, Teste O, Schanstra JP, Klein J. PRYNT: a tool for prioritization of disease candidates from proteomics data using a combination of shortest-path and random walk algorithms. Sci Rep 2021; 11:5764. [PMID: 33707596 PMCID: PMC7952700 DOI: 10.1038/s41598-021-85135-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/07/2020] [Accepted: 01/29/2021] [Indexed: 11/14/2022] Open
Abstract
The urinary proteome is a promising pool of biomarkers of kidney disease. However, the protein changes observed in urine only partially reflect the deregulated mechanisms within kidney tissue. In order to improve the mechanistic insight based on urinary protein changes, we developed a new prioritization strategy called PRYNT (PRioritization bY protein NeTwork) that employs a combination of two closeness-based algorithms, shortest-path and random walk, and a contextualized protein-protein interaction (PPI) network, mainly based on clique consolidation of the STRING network. To assess the performance of our approach, we evaluated both the precision and the specificity of PRYNT in prioritizing kidney disease candidates. Using four urinary proteome datasets, PRYNT prioritization performed better than other prioritization methods and tools available in the literature. Moreover, PRYNT performed to a similar, but complementary, extent compared to the upstream regulator analysis from the commercial Ingenuity Pathway Analysis software. In conclusion, PRYNT appears to be a valuable, freely accessible tool to predict key proteins indirectly from urinary proteome data. In the future, the PRYNT approach could be applied to other biofluids, molecular traits and diseases. The source code is freely available on GitHub at https://github.com/Boizard/PRYNT and has been integrated as an interactive web app to improve accessibility ( https://github.com/Boizard/PRYNT/tree/master/AppPRYNT ).
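The shortest-path half of a closeness-based prioritization can be illustrated with a breadth-first search that ranks candidate proteins by their mean distance to a set of observed proteins. This is only a sketch under stated assumptions, not the PRYNT code from the repository above: the scoring rule (mean hop distance, with an assumed penalty for unreachable nodes), the function names, and the labels `S1`, `S2`, `X`, `Y`, `Z` are all hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from `source` to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def rank_by_mean_distance(adjacency, seeds):
    """Rank non-seed nodes by mean shortest-path distance to the seeds."""
    seeds = list(seeds)
    totals = {node: 0 for node in adjacency if node not in seeds}
    unreachable_penalty = len(adjacency)  # assumed penalty, larger than any path
    for seed in seeds:
        distances = bfs_distances(adjacency, seed)
        for node in totals:
            totals[node] += distances.get(node, unreachable_penalty)
    # Closest candidates first; ties are broken alphabetically.
    return sorted(totals, key=lambda node: (totals[node] / len(seeds), node))


# Hypothetical PPI chain: S1 - X - S2 - Y - Z, with S1 and S2 observed in urine.
toy_ppi = {
    "S1": ["X"],
    "X": ["S1", "S2"],
    "S2": ["X", "Y"],
    "Y": ["S2", "Z"],
    "Z": ["Y"],
}
candidates = rank_by_mean_distance(toy_ppi, seeds=["S1", "S2"])
print(candidates)
```

Here `X` ranks first because it lies one hop from both observed proteins, which captures the intuition that proteins close to many observed changes are plausible upstream candidates.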
Affiliation(s)
- Franck Boizard
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, 31432, Toulouse, France
  - Université Toulouse III Paul-Sabatier, 31330, Toulouse, France
- Bénédicte Buffin-Meyer
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, 31432, Toulouse, France
  - Université Toulouse III Paul-Sabatier, 31330, Toulouse, France
- Julien Aligon
  - Université de Toulouse, UT1, IRIT, (CNRS/UMR 5505), Toulouse, France
- Olivier Teste
  - Université de Toulouse, UT2J, IRIT, (CNRS/UMR 5505), Toulouse, France
- Joost P Schanstra
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, 31432, Toulouse, France
  - Université Toulouse III Paul-Sabatier, 31330, Toulouse, France
- Julie Klein
  - Institut National de la Santé et de la Recherche Médicale (INSERM), U1297, Institute of Cardiovascular and Metabolic Disease, 31432, Toulouse, France.
  - Université Toulouse III Paul-Sabatier, 31330, Toulouse, France.
11
Joodaki M, Ghadiri N, Maleki Z, Lotfi Shahreza M. A scalable random walk with restart on heterogeneous networks with Apache Spark for ranking disease-related genes through type-II fuzzy data fusion. J Biomed Inform 2021; 115:103688. [PMID: 33545331 DOI: 10.1016/j.jbi.2021.103688] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/02/2020] [Revised: 01/10/2021] [Accepted: 01/23/2021] [Indexed: 12/11/2022]
Abstract
One of the key missions of biology and medical science is to find disease-related genes. Recent research uses gene/protein networks to find such genes. Due to false positive interactions in these networks, the results are often not accurate or reliable. Integrating multiple gene/protein networks could overcome this drawback, yielding a network with fewer false positive interactions. The integration method plays a crucial role in the quality of the constructed network. In this paper, we integrate several sources to build a reliable heterogeneous network, i.e., a network that includes nodes of different types. Because the gene/protein sources differ, four gene-gene similarity networks are constructed first and integrated by applying the type-II fuzzy voter scheme. The resulting gene-gene network is linked to a disease-disease similarity network (itself the outcome of integrating four sources) through a two-part disease-gene network. We propose a novel algorithm, namely random walk with restart on the heterogeneous network method with fuzzy fusion (RWRHN-FF). By running RWRHN-FF over the heterogeneous network, disease-related genes are determined. Experimental results using leave-one-out cross-validation indicate that RWRHN-FF outperforms existing methods. The proposed algorithm can be applied to find new genes for prostate, breast, gastric, and colon cancers. Since the RWRHN-FF algorithm converges slowly on large heterogeneous networks, we propose a parallel implementation of the RWRHN-FF algorithm on the Apache Spark platform for high-throughput and reliable network inference. Experiments run on heterogeneous networks of different sizes indicate faster convergence compared to other non-distributed modes of implementation.
Affiliation(s)
- Mehdi Joodaki
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
- Nasser Ghadiri
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran.
- Zeinab Maleki
  - Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
12
Liu Y, Guo Y, Liu X, Wang C, Guo M. Pathogenic gene prediction based on network embedding. Brief Bioinform 2020; 22:6053103. [PMID: 33367541 DOI: 10.1093/bib/bbaa353] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/10/2020] [Revised: 11/02/2020] [Accepted: 11/03/2020] [Indexed: 11/13/2022] Open
Abstract
In disease research, the study of gene-disease correlation has always been an important topic. With the emergence of large-scale connected data sets in biology, we use known correlations between the entities, which may be from different sets, to build a biological heterogeneous network and propose a new network embedded representation algorithm to calculate the correlation between disease and genes, using the correlation score to predict pathogenic genes. Then, we conduct several experiments to compare our method to other state-of-the-art methods. The results reveal that our method achieves better performance than the traditional methods.
Affiliation(s)
- Yang Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuchen Guo
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xiaoyan Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Chunyu Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
|
13
|
Xiao Q, Luo J, Liang C, Li G, Cai J, Ding P, Liu Y. Identifying lncRNA and mRNA Co-Expression Modules from Matched Expression Data in Ovarian Cancer. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:623-634. [PMID: 30106686 DOI: 10.1109/tcbb.2018.2864129] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 06/08/2023]
Abstract
Long non-coding RNAs (lncRNAs) have been shown to be involved in multiple biological processes and play critical roles in tumorigenesis. Numerous lncRNAs have been discovered in diverse species, but the functions of most lncRNAs still remain unclear. Meanwhile, their expression patterns and regulation mechanisms are also far from being fully understood. With the advances of high-throughput technologies, the increasing availability of genomic data creates opportunities for deciphering the molecular mechanism and underlying pathogenesis of human diseases. Here, we develop an integrative framework called JONMF to identify lncRNA-mRNA co-expression modules based on the sample-matched lncRNA and mRNA expression profiles. We formulate the module detection task as an optimization problem with joint orthogonal non-negative matrix factorization that could effectively prevent multicollinearity and produce a good modularity interpretation. The constructed lncRNA-mRNA co-expression network and the gene-gene interaction network are used as the network-regularized constraints to improve the module accuracy, while the sparsity constraints are simultaneously utilized to achieve modular sparse solutions. We applied JONMF to human ovarian cancer dataset and the experiment results demonstrate that the proposed method can effectively discover biologically functional co-expression modules, which may provide insights into the function of lncRNAs and molecular mechanism of human diseases.
|
14
|
Zhang W, Lei X, Bian C. Identifying Cancer genes by combining two-rounds RWR based on multiple biological data. BMC Bioinformatics 2019; 20:518. [PMID: 31760937 PMCID: PMC6876101 DOI: 10.1186/s12859-019-3123-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/10/2022]
Abstract
Background It’s a very urgent task to identify cancer genes that enables us to understand the mechanisms of biochemical processes at a biomolecular level and facilitates the development of bioinformatics. Although a large number of methods have been proposed to identify cancer genes at recent times, the biological data utilized by most of these methods is still quite less, which reflects an insufficient consideration of the relationship between genes and diseases from a variety of factors. Results In this paper, we propose a two-rounds random walk algorithm to identify cancer genes based on multiple biological data (TRWR-MB), including protein-protein interaction (PPI) network, pathway network, microRNA similarity network, lncRNA similarity network, cancer similarity network and protein complexes. In the first-round random walk, all cancer nodes, cancer-related genes, cancer-related microRNAs and cancer-related lncRNAs, being associated with all the cancer, are used as seed nodes, and then a random walker walks on a quadruple layer heterogeneous network constructed by multiple biological data. The first-round random walk aims to select the top score k of potential cancer genes. Then in the second-round random walk, genes, microRNAs and lncRNAs, being associated with a certain special cancer in corresponding cancer class, are regarded as seed nodes, and then the walker walks on a new quadruple layer heterogeneous network constructed by lncRNAs, microRNAs, cancer and selected potential cancer genes. After the above walks finish, we combine the results of two-rounds RWR as ranking score for experimental analysis. As a result, a higher value of area under the receiver operating characteristic curve (AUC) is obtained. Besides, cases studies for identifying new cancer genes are performed in corresponding section. 
Conclusion In summary, TRWR-MB integrates multiple biological data to identify cancer genes by analyzing the relationship between genes and cancer from a variety of biological molecular perspective.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, Shaanxi, China
- Chen Bian
- School of Computer Science, Shaanxi Normal University, Xi'an, 710119, Shaanxi, China
|
15
|
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022]
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritization of genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Among about a hundred of published gene prioritization tools, we select and briefly describe 14 most up-to-date and user-friendly. Also, we discuss the advantages and disadvantages of existing tools, challenges of their validation, and the directions for future research.
Affiliation(s)
- Olga Zolotareva
- Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
- Maren Kleine
- Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
|
16
|
Ozturk K, Dow M, Carlin DE, Bejar R, Carter H. The Emerging Potential for Network Analysis to Inform Precision Cancer Medicine. J Mol Biol 2018; 430:2875-2899. [PMID: 29908887 PMCID: PMC6097914 DOI: 10.1016/j.jmb.2018.06.016] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Received: 03/12/2018] [Revised: 05/30/2018] [Accepted: 06/06/2018] [Indexed: 12/19/2022]
Abstract
Precision cancer medicine promises to tailor clinical decisions to patients using genomic information. Indeed, successes of drugs targeting genetic alterations in tumors, such as imatinib that targets BCR-ABL in chronic myelogenous leukemia, have demonstrated the power of this approach. However, biological systems are complex, and patients may differ not only by the specific genetic alterations in their tumor, but also by more subtle interactions among such alterations. Systems biology and more specifically, network analysis, provides a framework for advancing precision medicine beyond clinical actionability of individual mutations. Here we discuss applications of network analysis to study tumor biology, early methods for N-of-1 tumor genome analysis, and the path for such tools to the clinic.
Affiliation(s)
- Kivilcim Ozturk
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Michelle Dow
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Daniel E Carlin
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA
- Rafael Bejar
- Moores Cancer Center, Division of Hematology and Oncology, University of California San Diego, La Jolla, CA 92093, USA
- Hannah Carter
- Department of Medicine, Division of Medical Genetics, University of California San Diego, La Jolla, CA 92093, USA; Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA; Moores Cancer Center and Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA; CIFAR, MaRS Centre, West Tower, 661 University Ave., Suite 505, Toronto, ON M5G 1M1, Canada.
|
17
|
Zhang W, Wang SL. An efficient strategy for identifying cancer-related key genes based on graph entropy. Comput Biol Chem 2018; 74:142-148. [PMID: 29609142 DOI: 10.1016/j.compbiolchem.2018.03.022] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 07/28/2017] [Revised: 01/22/2018] [Accepted: 03/20/2018] [Indexed: 02/02/2023]
Abstract
Gene networks help identify functional genes that are highly relevant to clinical outcomes. Most current methods require information about gene or protein interactions to construct the network connections. However, the conclusions of these methods may be biased because the human interactome is still incomplete. In this paper, we propose an efficient strategy that uses gene expression data and gene mutation data to identify cancer-related key genes based on graph entropy (iKGGE). First, we construct a gene network from gene expression data alone, based on the sparse inverse covariance matrix. Next, we cluster genes with a parallel maximal-clique algorithm to quickly obtain a series of subgraphs. Finally, we introduce a novel metric that combines graph entropy with the influence of upstream gene mutation information to measure the impact factors of genes. Tests on three available cancer datasets show that our strategy can effectively extract key genes that may play distinct roles in tumorigenesis, and that cancer patient risk groups are well predicted from these key genes.
Affiliation(s)
- Wei Zhang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, Hunan, 410082, China.
- Shu-Lin Wang
- College of Computer Science and Electronics Engineering, Hunan University, Changsha, Hunan, 410082, China.
|
18
|
Abstract
Genome-wide association studies (GWAS) have identified more than 100 loci that show robust association with schizophrenia risk. However, due to the complexity of linkage disequilibrium and gene regulation, it is challenging to pinpoint the causal genes at the risk loci and translate the genetic findings from GWAS into disease mechanism and clinical treatment. Here we systematically predicted the plausible candidate causal genes for schizophrenia at genome-wide level. We utilized different approaches and strategies to predict causal genes for schizophrenia, including Sherlock, SMR, DAPPLE, Prix Fixe, NetWAS, and DEPICT. By integrating the results from different prediction approaches, we identified six top candidates that represent promising causal genes for schizophrenia, including CNTN4, GATAD2A, GPM6A, MMP16, PSMA4, and TCF4. Besides, we also identified 35 additional high-confidence causal genes for schizophrenia. The identified causal genes showed distinct spatio-temporal expression patterns in developing and adult human brain. Cell-type-specific expression analysis indicated that the expression level of the predicted causal genes was significantly higher in neurons compared with oligodendrocytes and microglia (P < 0.05). We found that synaptic transmission-related genes were significantly enriched among the identified causal genes (P < 0.05), providing further support for the dysregulation of synaptic transmission in schizophrenia. Finally, we showed that the top six causal genes are dysregulated in schizophrenia cases compared with controls and knockdown of these genes impaired the proliferation of neuronal cells. Our study depicts the landscape of plausible schizophrenia causal genes for the first time. Further genetic and functional validation of these genes will provide mechanistic insights into schizophrenia pathogenesis and may provide potential targets for future therapeutics and diagnostics.
Affiliation(s)
- Changguo Ma
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223 China
- Chunjie Gu
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223 China
- Yongxia Huo
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223 China
- Xiaoyan Li
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223 China
- Xiong-Jian Luo
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, Yunnan, 650223, China.
|
19
|
Ding P, Luo J, Liang C, Xiao Q, Cao B. Human disease MiRNA inference by combining target information based on heterogeneous manifolds. J Biomed Inform 2018; 80:26-36. [PMID: 29481877 DOI: 10.1016/j.jbi.2018.02.013] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/25/2017] [Revised: 02/11/2018] [Accepted: 02/21/2018] [Indexed: 12/12/2022]
Abstract
The emergence of network medicine has provided great insight into the identification of disease-related molecules, which could help with the development of personalized medicine. However, the state-of-the-art methods could neither simultaneously consider target information and the known miRNA-disease associations nor effectively explore novel gene-disease associations as a by-product during the process of inferring disease-related miRNAs. Computational methods incorporating multiple sources of information offer more opportunities to infer disease-related molecules, including miRNAs and genes in heterogeneous networks at a system level. In this study, we developed a novel algorithm, named inference of Disease-related MiRNAs based on Heterogeneous Manifold (DMHM), to accurately and efficiently identify miRNA-disease associations by integrating multi-omics data. Graph-based regularization was utilized to obtain a smooth function on the data manifold, which constitutes the main principle of DMHM. The novelty of this framework lies in the relatedness between diseases and miRNAs, which are measured via heterogeneous manifolds on heterogeneous networks integrating target information. To demonstrate the effectiveness of DMHM, we conducted comprehensive experiments based on HMDD datasets and compared DMHM with six state-of-the-art methods. Experimental results indicated that DMHM significantly outperformed the other six methods under fivefold cross validation and de novo prediction tests. Case studies have further confirmed the practical usefulness of DMHM.
Affiliation(s)
- Pingjian Ding
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China.
- Cheng Liang
- School of Information Science and Engineering, Shandong Normal University, Jinan 250014, China
- Qiu Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
- Buwen Cao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410083, China
|
20
|
|
21
|
Kim J, Bang C, Hwang H, Kim D, Park C, Park S. IMA: Identifying disease-related genes using MeSH terms and association rules. J Biomed Inform 2017; 76:110-123. [DOI: 10.1016/j.jbi.2017.11.009] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 06/15/2017] [Revised: 10/29/2017] [Accepted: 11/13/2017] [Indexed: 01/19/2023]
|
22
|
Ramyachitra D, Nithya R. Construction of reliable heterogeneous network using protein sequence similarity for the prioritization of candidate disease genes. Gene Rep 2017. [DOI: 10.1016/j.genrep.2017.04.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
|
23
|
Disease genes prioritizing mechanisms: a comprehensive and systematic literature review. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/s13721-017-0154-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 10/19/2022]
|
24
|
Diaz-Beltran L, Esteban FJ, Varma M, Ortuzk A, David M, Wall DP. Cross-disorder comparative analysis of comorbid conditions reveals novel autism candidate genes. BMC Genomics 2017; 18:315. [PMID: 28427329 PMCID: PMC5399393 DOI: 10.1186/s12864-017-3667-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 11/17/2016] [Accepted: 03/28/2017] [Indexed: 12/21/2022]
Abstract
BACKGROUND Numerous studies have highlighted the elevated degree of comorbidity associated with autism spectrum disorder (ASD). These comorbid conditions may add further impairments to individuals with autism and are substantially more prevalent compared to neurotypical populations. These high rates of comorbidity are not surprising taking into account the overlap of symptoms that ASD shares with other pathologies. From a research perspective, this suggests common molecular mechanisms involved in these conditions. Therefore, identifying crucial genes in the overlap between ASD and these comorbid disorders may help unravel the common biological processes involved and, ultimately, shed some light in the understanding of autism etiology. RESULTS In this work, we used a two-fold systems biology approach specially focused on biological processes and gene networks to conduct a comparative analysis of autism with 31 frequently comorbid disorders in order to define a multi-disorder subcomponent of ASD and predict new genes of potential relevance to ASD etiology. We validated our predictions by determining the significance of our candidate genes in high throughput transcriptome expression profiling studies. Using prior knowledge of disease-related biological processes and the interaction networks of the disorders related to autism, we identified a set of 19 genes not previously linked to ASD that were significantly differentially regulated in individuals with autism. In addition, these genes were of potential etiologic relevance to autism, given their enriched roles in neurological processes crucial for optimal brain development and function, learning and memory, cognition and social behavior. 
CONCLUSIONS Taken together, our approach represents a novel perspective of autism from the point of view of related comorbid disorders and proposes a model by which prior knowledge of interaction networks may enlighten and focus the genome-wide search for autism candidate genes to better define the genetic heterogeneity of ASD.
Affiliation(s)
- Leticia Diaz-Beltran
- Division of Systems Medicine, Department of Pediatrics, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305-5488, USA
- Division of Systems Medicine, Department of Psychiatry, Stanford University, Stanford, CA, USA
- Systems Biology Unit, Department of Experimental Biology, University of Jaén, Jaén, Spain
- Francisco J Esteban
- Systems Biology Unit, Department of Experimental Biology, University of Jaén, Jaén, Spain
- Maya Varma
- Division of Systems Medicine, Department of Pediatrics, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305-5488, USA
- Division of Systems Medicine, Department of Psychiatry, Stanford University, Stanford, CA, USA
- Alp Ortuzk
- Division of Systems Medicine, Department of Pediatrics, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305-5488, USA
- Division of Systems Medicine, Department of Psychiatry, Stanford University, Stanford, CA, USA
- Maude David
- Division of Systems Medicine, Department of Pediatrics, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305-5488, USA
- Division of Systems Medicine, Department of Psychiatry, Stanford University, Stanford, CA, USA
- Dennis P Wall
- Division of Systems Medicine, Department of Pediatrics, School of Medicine, Stanford University, 1265 Welch Road, Stanford, CA, 94305-5488, USA.
- Division of Systems Medicine, Department of Psychiatry, Stanford University, Stanford, CA, USA.
- Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
|
25
|
Luo J, Xiao Q. A novel approach for predicting microRNA-disease associations by unbalanced bi-random walk on heterogeneous network. J Biomed Inform 2017; 66:194-203. [PMID: 28104458 DOI: 10.1016/j.jbi.2017.01.008] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 05/24/2016] [Revised: 01/11/2017] [Accepted: 01/13/2017] [Indexed: 12/24/2022]
Abstract
MicroRNAs (miRNAs) play a critical role by regulating their targets in post-transcriptional level. Identification of potential miRNA-disease associations will aid in deciphering the pathogenesis of human polygenic diseases. Several computational models have been developed to uncover novel miRNA-disease associations based on the predicted target genes. However, due to the insufficient number of experimentally validated miRNA-target interactions as well as the relatively high false-positive and false-negative rates of predicted target genes, it is still challenging for these prediction models to obtain remarkable performances. The purpose of this study is to prioritize miRNA candidates for diseases. We first construct a heterogeneous network, which consists of a disease similarity network, a miRNA functional similarity network and a known miRNA-disease association network. Then, an unbalanced bi-random walk-based algorithm on the heterogeneous network (BRWH) is adopted to discover potential associations by exploiting bipartite subgraphs. Based on 5-fold cross validation, the proposed network-based method achieves AUC values ranging from 0.782 to 0.907 for the 22 human diseases and an average AUC of almost 0.846. The experiments indicated that BRWH can achieve better performances compared with several popular methods. In addition, case studies of some common diseases further demonstrated the superior performance of our proposed method on prioritizing disease-related miRNA candidates.
Affiliation(s)
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China.
- Qiu Xiao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
26
|
A path-based measurement for human miRNA functional similarities using miRNA-disease associations. Sci Rep 2016; 6:32533. [PMID: 27585796 PMCID: PMC5009308 DOI: 10.1038/srep32533] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 05/10/2016] [Accepted: 08/04/2016] [Indexed: 01/09/2023]
Abstract
Compared with the sequence and expression similarity, miRNA functional similarity is so important for biology researches and many applications such as miRNA clustering, miRNA function prediction, miRNA synergism identification and disease miRNA prioritization. However, the existing methods always utilized the predicted miRNA target which has high false positive and false negative to calculate the miRNA functional similarity. Meanwhile, it is difficult to achieve high reliability of miRNA functional similarity with miRNA-disease associations. Therefore, it is increasingly needed to improve the measurement of miRNA functional similarity. In this study, we develop a novel path-based calculation method of miRNA functional similarity based on miRNA-disease associations, called MFSP. Compared with other methods, our method obtains higher average functional similarity of intra-family and intra-cluster selected groups. Meanwhile, the lower average functional similarity of inter-family and inter-cluster miRNA pair is obtained. In addition, the smaller p-value is achieved, while applying Wilcoxon rank-sum test and Kruskal-Wallis test to different miRNA groups. The relationship between miRNA functional similarity and other information sources is exhibited. Furthermore, the constructed miRNA functional network based on MFSP is a scale-free and small-world network. Moreover, the higher AUC for miRNA-disease prediction indicates the ability of MFSP uncovering miRNA functional similarity.
|
27
|
Jiang J, Li W, Liang B, Xie R, Chen B, Huang H, Li Y, He Y, Lv J, He W, Chen L. A Novel Prioritization Method in Identifying Recurrent Venous Thromboembolism-Related Genes. PLoS One 2016; 11:e0153006. [PMID: 27050193 PMCID: PMC4822849 DOI: 10.1371/journal.pone.0153006] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 11/29/2015] [Accepted: 03/21/2016] [Indexed: 12/13/2022]
Abstract
Identifying the genes involved in venous thromboembolism (VTE) recurrence is important not only for understanding the pathogenesis but also for discovering therapeutic targets. We proposed a novel prioritization method called Function-Interaction-Pearson (FIP), which creates gene-disease similarity scores to prioritize candidate genes underlying VTE. The scores were calculated by integrating and optimizing three types of resources: gene expression, gene ontology and protein-protein interaction. As a result, 124 of the top 200 prioritized candidate genes have been confirmed in the literature, among which there were 34 antithrombotic drug targets. Compared with two well-known gene prioritization tools, Endeavour and ToppNet, FIP was shown to have better performance. The approach provides a valuable alternative for drug target discovery and disease therapy.
Affiliation(s)
- Jing Jiang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Binhua Liang
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Ruiqiang Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Binbin Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Hao Huang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Yiran Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Junjie Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Hei Longjiang Province, China
- * E-mail: (LC); (WH)
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Hei Longjiang Province, China, Postal code: 150081
- * E-mail: (LC); (WH)
|
28
|
Shyr C, Kushniruk A, van Karnebeek CDM, Wasserman WW. Dynamic software design for clinical exome and genome analyses: insights from bioinformaticians, clinical geneticists, and genetic counselors. J Am Med Inform Assoc 2016; 23:257-68. [PMID: 26117142 PMCID: PMC4784553 DOI: 10.1093/jamia/ocv053] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 01/15/2015] [Revised: 04/03/2015] [Accepted: 04/22/2015] [Indexed: 12/21/2022]
Abstract
BACKGROUND The transition of whole-exome and whole-genome sequencing (WES/WGS) from the research setting to routine clinical practice remains challenging. OBJECTIVES With almost no previous research specifically assessing interface designs and functionalities of WES and WGS software tools, the authors set out to ascertain perspectives from healthcare professionals in distinct domains on optimal clinical genomics user interfaces. METHODS A series of semi-scripted focus groups, structured around professional challenges encountered in clinical WES and WGS, were conducted with bioinformaticians (n = 8), clinical geneticists (n = 9), genetic counselors (n = 5), and general physicians (n = 4). RESULTS Contrary to popular existing system designs, bioinformaticians preferred command line over graphical user interfaces for better software compatibility and customization flexibility. Clinical geneticists and genetic counselors desired an overarching interactive graphical layout to prioritize candidate variants: a "tiered" system where only functionalities relevant to the user domain are made accessible. They favored a system capable of retrieving consistent representations of external genetic information from third-party sources. To streamline collaboration and patient exchanges, the authors identified user requirements toward an automated reporting system capable of summarizing key evidence-based clinical findings among the vast array of technical details. CONCLUSIONS Successful adoption of a clinical WES/WGS system is heavily dependent on its ability to address the diverse necessities and predilections among specialists in distinct healthcare domains. Tailored software interfaces suitable for each group are likely more appropriate than the current popular "one size fits all" generic framework. This study provides interfaces for future intervention studies and software engineering opportunities.
Affiliation(s)
- Casper Shyr
  - Centre for Molecular Medicine and Therapeutics; Child and Family Research Institute, Vancouver BC, Canada
  - Bioinformatics Graduate Program, University of British Columbia, Vancouver BC, Canada
  - Treatable Intellectual Disability Endeavour in British Columbia (www.tidebc.org), Vancouver, Canada
- Andre Kushniruk
  - School of Health Information Science, University of Victoria, 3800 Finnerty Rd, Victoria, BC V8P 5C2, Canada
- Clara D M van Karnebeek
  - Treatable Intellectual Disability Endeavour in British Columbia (www.tidebc.org), Vancouver, Canada
  - Division of Biochemical Diseases, BC Children's Hospital, Vancouver BC, Canada
  - Department of Pediatrics, University of British Columbia, Vancouver BC, Canada
- Wyeth W Wasserman
  - Centre for Molecular Medicine and Therapeutics; Child and Family Research Institute, Vancouver BC, Canada
  - Treatable Intellectual Disability Endeavour in British Columbia (www.tidebc.org), Vancouver, Canada
  - Department of Medical Genetics, University of British Columbia, Vancouver BC, Canada
|
29
|
Chen L, Zhang YH, Huang T, Cai YD. Identifying novel protein phenotype annotations by hybridizing protein-protein interactions and protein sequence similarities. Mol Genet Genomics 2016; 291:913-34. [PMID: 26728152 DOI: 10.1007/s00438-015-1157-9] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 05/18/2015] [Accepted: 12/08/2015] [Indexed: 01/18/2023]
Abstract
Studies of protein phenotypes represent a central challenge of modern genetics in the post-genome era. Effective and accurate investigation of protein phenotypes is one of the most critical procedures for identifying functional biological processes at the microscale; it involves the analysis of multifactorial traits and has greatly contributed to the development of modern biology. We have therefore developed a computational method that identifies novel proteins associated with certain phenotypes in yeast based on the protein-protein interaction network. Unlike some existing network-based computational methods that identify the phenotype of a query protein based on its direct neighbors in the local network, the proposed method identifies novel candidate proteins for a certain phenotype by considering all annotated proteins with this phenotype on the global network using a shortest path (SP) algorithm. The identified proteins are further filtered using both a permutation test and their interactions and sequence similarities to annotated proteins. We compared our method with another widely used method, random walk with restart (RWR). The biological functions of the proteins identified for each phenotype by our SP method and the RWR method were analyzed and compared. The results confirmed a large proportion of our novel protein phenotype annotations, and the RWR method showed a higher false positive rate than the SP method. Our method is equally effective for predicting proteins involved in all eleven clustered yeast phenotypes, with a low false positive rate. Given the universality and generalizability of our supporting materials and computing strategies, our method can be applied to other organisms, and the new functions we predicted can provide pertinent guidance for further experimental verification.
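The shortest-path idea in this abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are scored, so the following rests on an assumption: a protein counts as a candidate when it lies in the interior of a shortest path between two proteins already annotated with the phenotype. The function names, the toy network, and the protein labels are all hypothetical, and the permutation test and sequence-similarity filters mentioned in the abstract are not shown.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph, source, target):
    """Return one shortest path from source to target via BFS, or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        # Sorted iteration keeps the chosen path deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def candidate_counts(graph, annotated):
    """Count how often each unannotated protein lies inside a shortest
    path between two annotated proteins (assumed scoring scheme)."""
    counts = {}
    for a, b in combinations(sorted(annotated), 2):
        path = shortest_path(graph, a, b)
        if path is None:
            continue
        for node in path[1:-1]:
            if node not in annotated:
                counts[node] = counts.get(node, 0) + 1
    return counts


# Toy undirected interaction network with hypothetical protein names.
edges = [("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "C"), ("C", "Z")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

counts = candidate_counts(graph, annotated={"A", "B", "C"})
print(counts)
```

In this toy network, X and Y sit between annotated proteins and are counted, while Z hangs off the end of the chain and never appears on a path between two annotated proteins. A permutation test of the kind the abstract mentions could then compare these counts against counts obtained from randomly chosen protein sets of the same size.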
Affiliation(s)
- Lei Chen
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
  - College of Information Engineering, Shanghai Maritime University, Shanghai, 201306, People's Republic of China
- Yu-Hang Zhang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
- Tao Huang
  - Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, People's Republic of China
- Yu-Dong Cai
  - School of Life Sciences, Shanghai University, Shanghai, 200444, People's Republic of China
|