1
Wang X, Yang K, Jia T, Gu F, Wang C, Xu K, Shu Z, Xia J, Zhu Q, Zhou X. KDGene: knowledge graph completion for disease gene prediction using interactional tensor decomposition. Brief Bioinform 2024; 25:bbae161. [PMID: 38605639 PMCID: PMC11009469 DOI: 10.1093/bib/bbae161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2023] [Revised: 02/20/2024] [Accepted: 03/13/2024] [Indexed: 04/13/2024]
Abstract
The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating its effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, whose results promise to provide valuable references for follow-up wet-lab experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.
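As a rough illustration of how tensor-decomposition scoring can rank candidate genes, the sketch below uses a plain trilinear (CP/DistMult-style) score over disease, relation, and gene embeddings. It does not reproduce KDGene's interaction module, whose details the abstract does not give. The function names `trilinear_score` and `rank_candidates` and the toy embedding values are assumptions made for this example.

```python
from typing import Dict, List, Sequence


def trilinear_score(head: Sequence[float], relation: Sequence[float],
                    tail: Sequence[float]) -> float:
    """Sum over dimensions of head_i * relation_i * tail_i (assumed scoring form)."""
    return sum(h * r * t for h, r, t in zip(head, relation, tail))


def rank_candidates(disease_vec: Sequence[float], relation_vec: Sequence[float],
                    gene_vecs: Dict[str, Sequence[float]]) -> List[str]:
    """Return gene identifiers sorted from highest to lowest score."""
    scores = {gene: trilinear_score(disease_vec, relation_vec, vec)
              for gene, vec in gene_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy, hand-written embeddings (hypothetical values, not learned from data).
disease = [1.0, 0.0]
associated_with = [1.0, 1.0]
genes = {"g1": [0.9, 0.1], "g2": [0.2, 0.8], "g3": [0.5, 0.5]}

print(rank_candidates(disease, associated_with, genes))  # ['g1', 'g3', 'g2']
```

In a trained model the embeddings would be learned from the knowledge graph triples; only the ranking step is shown here.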
Affiliation(s)
- Kuo Yang
  - Corresponding author: Kuo Yang and Xuezhong Zhou, Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
- Xuezhong Zhou
  - Corresponding author: Kuo Yang and Xuezhong Zhou, Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China
2
Gan X, Shu Z, Wang X, Yan D, Li J, Ofaim S, Albert R, Li X, Liu B, Zhou X, Barabási AL. Network medicine framework reveals generic herb-symptom effectiveness of traditional Chinese medicine. SCIENCE ADVANCES 2023; 9:eadh0215. [PMID: 37889962 PMCID: PMC10610911 DOI: 10.1126/sciadv.adh0215] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 02/15/2023] [Accepted: 09/26/2023] [Indexed: 10/29/2023]
Abstract
Understanding natural and traditional medicine can lead to world-changing drug discoveries. Despite the therapeutic effectiveness of individual herbs, traditional Chinese medicine (TCM) lacks a scientific foundation and is often considered a myth. In this study, we establish a network medicine framework and reveal the general TCM treatment principle as the topological relationship between disease symptoms and TCM herb targets on the human protein interactome. We find that proteins associated with a symptom form a network module, and the network proximity of an herb's targets to a symptom module is predictive of the herb's effectiveness in treating the symptom. These findings are validated using patient data from a hospital. We highlight the translational value of our framework by predicting herb-symptom treatments with therapeutic potential. Our network medicine framework reveals the scientific foundation of TCM and establishes a paradigm for understanding the molecular basis of natural medicine and predicting disease treatments.
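The idea of measuring how close an herb's targets lie to a symptom module can be sketched with a "closest distance" measure: for each target, take the shortest-path distance to the nearest module protein, then average over targets. The abstract does not state which proximity measure the authors use, so this choice is an assumption, as are the function names and the toy five-protein network.

```python
from collections import deque
from typing import Dict, Iterable, List


def bfs_distances(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def closest_proximity(graph: Dict[str, List[str]], targets: Iterable[str],
                      module: Iterable[str]) -> float:
    """Average, over targets, of the distance to the nearest module node."""
    module = list(module)
    targets = list(targets)
    total = 0
    for target in targets:
        dist = bfs_distances(graph, target)
        reachable = [dist[m] for m in module if m in dist]
        if not reachable:
            raise ValueError(f"target {target!r} cannot reach the module")
        total += min(reachable)
    return total / len(targets)


# Toy interactome: a simple chain A - B - C - D - E (hypothetical proteins).
interactome = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
               "D": ["C", "E"], "E": ["D"]}
symptom_module = ["A", "B"]
herb_targets = ["C", "E"]

print(closest_proximity(interactome, herb_targets, symptom_module))  # 2.0
```

In practice such a raw distance is usually compared against randomized target sets to judge significance; that step is omitted here.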
Affiliation(s)
- Xiao Gan
  - Institute for AI in Medicine, School of Artificial Intelligence, Nanjing University of Information Science and Technology, Nanjing 210044, China
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, MA 02115, USA
  - Department of Physics, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Zixin Shu
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100063, China
- Xinyan Wang
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100063, China
- Dengying Yan
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100063, China
- Jun Li
  - Hubei University of Chinese Medicine, Wuhan 430065, China
- Shany Ofaim
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
- Réka Albert
  - Department of Physics, Pennsylvania State University, University Park, PA 16802, USA
  - Department of Biology, Pennsylvania State University, University Park, PA 16802, USA
- Xiaodong Li
  - Hubei University of Chinese Medicine, Wuhan 430065, China
  - Hubei Provincial Hospital of Traditional Chinese Medicine (Affiliated Hospital of Hubei University of Traditional Chinese Medicine, Hubei Academy of Chinese Medicine), Wuhan 430061, China
- Baoyan Liu
  - China Academy of Chinese Medical Sciences, Beijing 100700, China
- Xuezhong Zhou
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100063, China
- Albert-László Barabási
  - Network Science Institute, Northeastern University, Boston, MA 02115, USA
  - Department of Network and Data Science, Central European University, Budapest 1051, Hungary
3
Yang K, Yang Y, Fan S, Xia J, Zheng Q, Dong X, Liu J, Liu Q, Lei L, Zhang Y, Li B, Gao Z, Zhang R, Liu B, Wang Z, Zhou X. DRONet: effectiveness-driven drug repositioning framework using network embedding and ranking learning. Brief Bioinform 2023; 24:6958501. [PMID: 36562715 DOI: 10.1093/bib/bbac518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2022] [Revised: 10/11/2022] [Accepted: 10/31/2022] [Indexed: 12/24/2022]
Abstract
As one of the most vital methods in drug development, drug repositioning emphasizes further analysis and research of approved drugs based on the existing large amount of clinical and experimental data to identify new indications of drugs. However, existing drug repositioning methods have not achieved sufficient prediction performance, and they do not consider information on drug effectiveness, which makes it difficult to obtain reliable and valuable results. In this study, we proposed a drug repositioning framework termed DRONet, which makes full use of effectiveness comparative relationships (ECR) among drugs as prior information by combining network embedding and ranking learning. We utilized network embedding methods to learn the deep features of drugs from a heterogeneous drug-disease network, and constructed a high-quality drug-indication data set including effectiveness-based drug contrast relationships. The embedding features and ECR of drugs are combined effectively through a designed ranking learning model to prioritize candidate drugs. Comprehensive experiments show that DRONet has higher prediction accuracy (improving 87.4% on Hit@1 and 37.9% on mean reciprocal rank) than state-of-the-art methods. The case analysis also demonstrates the high reliability of the predicted results, which have the potential to guide clinical drug development.
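For readers unfamiliar with the two ranking metrics quoted above, the following minimal sketch computes Hit@k and mean reciprocal rank from the 1-based rank at which the correct drug appears for each query. The function names and the toy rank list are illustrative assumptions, not values from the paper.

```python
from typing import Sequence


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose correct item is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all queries (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the true indication for five test queries.
ranks = [1, 3, 2, 1, 10]

print(hit_at_k(ranks, 1))           # 0.4
print(mean_reciprocal_rank(ranks))  # about 0.587
```

A reported "improvement" on such a metric is typically relative, i.e. (new - baseline) / baseline, which is how a gain such as 87.4% on Hit@1 would be read.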
Affiliation(s)
- Kuo Yang
  - Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, China
- Shuyue Fan
  - Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, China
- Jianan Xia
  - Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, China
- Qiguang Zheng
  - Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, China
- Xin Dong
  - Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, China
- Jun Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, China
- Qiong Liu
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, China
- Lei Lei
  - Institute of Information on Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, China
- Yingying Zhang
  - Dongzhimen Hospital, Beijing University of Chinese Medicine, China
- Bing Li
  - Institute of Chinese Materia Medica, China Academy of Chinese Medical Sciences, China
- Zhuye Gao
  - Xiyuan Hospital, China Academy of Chinese Medical Sciences, National Clinical Research Center for Chinese Medicine Cardiology, China
- Runshun Zhang
  - Guanganmen Hospital, China Academy of Chinese Medical Sciences, China
- Baoyan Liu
  - Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, China
- Zhong Wang
  - Institute of Basic Research in Clinical Medicine, China Academy of Chinese Medical Sciences, China
- Xuezhong Zhou
  - Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, China
4
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how two biomedical entities are related. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.01.05.522941. [PMID: 36711546 PMCID: PMC9882000 DOI: 10.1101/2023.01.05.522941] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2023]
Abstract
Hetnets, short for "heterogeneous networks", contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious not only how metformin is related to breast cancer, but also how the GJA1 gene might be involved in insomnia. We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any two nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). We find that predictions are broadly similar to those from previously described supervised approaches for certain node type pairs. Scoring of individual paths is based on the most specific paths of a given type. Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open source implementation of these methods in our new Python package named hetmatpy.
Affiliation(s)
- Daniel S. Himmelstein
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Related Sciences
- Michael Zietz
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- Vincent Rubinetti
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Kyle Kloster
  - Carbon, Inc.; Department of Computer Science, North Carolina State University, Raleigh, North Carolina, United States of America
- Benjamin J. Heil
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania
- Faisal Alquaddoomi
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
- Dongbo Hu
  - Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
- David N. Nicholson
  - Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine University of Pennsylvania, Philadelphia PA, USA
- Yun Hao
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia PA, USA
- Michael W. Nagle
  - Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, Massachusetts, United States of America; Neurogenomics, Translational Sciences, Neurology Business Group, Eisai Inc, Cambridge, Massachusetts, United States of America
- Casey S. Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, Colorado, United States of America; Center for Health AI, University of Colorado School of Medicine, Aurora, Colorado, United States of America
5
Himmelstein DS, Zietz M, Rubinetti V, Kloster K, Heil BJ, Alquaddoomi F, Hu D, Nicholson DN, Hao Y, Sullivan BD, Nagle MW, Greene CS. Hetnet connectivity search provides rapid insights into how biomedical entities are related. Gigascience 2022; 12:giad047. [PMID: 37503959 PMCID: PMC10375517 DOI: 10.1093/gigascience/giad047] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/07/2023] [Revised: 04/14/2023] [Accepted: 06/06/2023] [Indexed: 07/29/2023]
Abstract
BACKGROUND Hetnets, short for "heterogeneous networks," contain multiple node and relationship types and offer a way to encode biomedical knowledge. One such example, Hetionet, connects 11 types of nodes (including genes, diseases, drugs, pathways, and anatomical structures) with over 2 million edges of 24 types. Previous work has demonstrated that supervised machine learning methods applied to such networks can identify drug repurposing opportunities. However, a training set of known relationships does not exist for many types of node pairs, even when it would be useful to examine how nodes of those types are meaningfully connected. For example, users may be curious about not only how metformin is related to breast cancer but also how a given gene might be involved in insomnia. FINDINGS We developed a new procedure, termed hetnet connectivity search, that proposes important paths between any 2 nodes without requiring a supervised gold standard. The algorithm behind connectivity search identifies types of paths that occur more frequently than would be expected by chance (based on node degree alone). Several optimizations were required to precompute significant instances of node connectivity at the scale of large knowledge graphs. CONCLUSION We implemented the method on Hetionet and provide an online interface at https://het.io/search. We provide an open-source implementation of these methods in our new Python package named hetmatpy.
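The abstract does not spell out the path statistic it uses. As a hedged illustration of how node degree can be factored into path counting, the sketch below computes a degree-weighted path count for a two-step path type (for example compound, gene, disease): each path is down-weighted by the degrees of the nodes it passes through, raised to a negative damping exponent. The function names, the damping value of 0.5, and the toy network are assumptions for this example; the sketch does not use the hetmatpy API.

```python
from typing import Dict, Set


def invert(edges: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Reverse an adjacency mapping so targets point back to their sources."""
    reverse: Dict[str, Set[str]] = {}
    for source, targets in edges.items():
        for target in targets:
            reverse.setdefault(target, set()).add(source)
    return reverse


def degree_weighted_path_count(edges_ab: Dict[str, Set[str]],
                               edges_bc: Dict[str, Set[str]],
                               source: str, target: str,
                               damping: float = 0.5) -> float:
    """Sum over A-B-C paths of the product of edge-type degrees ** -damping."""
    ab_reverse = invert(edges_ab)
    bc_reverse = invert(edges_bc)
    total = 0.0
    for middle in edges_ab.get(source, ()):
        if target not in edges_bc.get(middle, ()):
            continue
        degrees = [len(edges_ab[source]), len(ab_reverse[middle]),
                   len(edges_bc[middle]), len(bc_reverse[target])]
        weight = 1.0
        for degree in degrees:
            weight *= degree ** -damping
        total += weight
    return total


# Toy network with hypothetical compound (c), gene (g), and disease (d) nodes.
compound_gene = {"c1": {"g1", "g2"}, "c2": {"g2"}}
gene_disease = {"g1": {"d1"}, "g2": {"d1", "d2"}}

print(degree_weighted_path_count(compound_gene, gene_disease, "c1", "d1"))
```

With `damping=0` the function reduces to a plain path count (2 for the toy pair above), which shows how the exponent controls the penalty on high-degree nodes.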
Affiliation(s)
- Daniel S Himmelstein
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Related Sciences, Denver, CO 80202, USA
- Michael Zietz
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
- Vincent Rubinetti
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Kyle Kloster
  - Carbon, Inc., Redwood City, CA 94063, USA
  - Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA
- Benjamin J Heil
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Faisal Alquaddoomi
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
- Dongbo Hu
  - Department of Pathology, Perelman School of Medicine University of Pennsylvania, Philadelphia, PA 19104, USA
- David N Nicholson
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yun Hao
  - Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Blair D Sullivan
  - School of Computing, University of Utah, Salt Lake City, UT 84112, USA
- Michael W Nagle
  - Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc, Cambridge, MA 02139, USA
  - Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA
- Casey S Greene
  - Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA
  - Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA
6
Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023]
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of and potential therapeutic interventions for human diseases, especially for complex diseases where large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. On the contrary, phenotypes, by definition, are directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.
7
Shi Y, Yao X, Xu J, Hu X, Tu L, Lan F, Cui J, Cui L, Huang J, Li J, Bi Z, Li J. A New Approach of Fatigue Classification Based on Data of Tongue and Pulse With Machine Learning. Front Physiol 2022; 12:708742. [PMID: 35197858 PMCID: PMC8859319 DOI: 10.3389/fphys.2021.708742] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/12/2021] [Accepted: 11/03/2021] [Indexed: 11/13/2022]
Abstract
BACKGROUND Fatigue is a common and subjective symptom, which is associated with many diseases and suboptimal health status. A reliable, evidence-based approach for distinguishing disease fatigue from non-disease fatigue is lacking. This study aimed to establish a method for early differential diagnosis of fatigue, which can be used to distinguish disease fatigue from non-disease fatigue, and to investigate the feasibility of characterizing fatigue states through tongue and pulse data analysis. METHODS Tongue and Face Diagnosis Analysis-1 (TFDA-1) instrument and Pulse Diagnosis Analysis-1 (PDA-1) instrument were used to collect tongue and pulse data. Four machine learning models were used to perform classification experiments of disease fatigue vs. non-disease fatigue. RESULTS All four classifiers performed better on the joint "Tongue & Pulse" data than on tongue data alone or pulse data alone. The model accuracy rates based on logistic regression, support vector machine, random forest, and neural network were (85.51 ± 1.87)%, (83.78 ± 4.39)%, (83.27 ± 3.48)% and (85.82 ± 3.01)%, with Area Under Curve estimates of 0.9160 ± 0.0136, 0.9106 ± 0.0365, 0.8959 ± 0.0254 and 0.9239 ± 0.0174, respectively. CONCLUSION This study proposed and validated an innovative, non-invasive differential diagnosis approach. Results suggest that it is feasible to characterize disease fatigue and non-disease fatigue by using objective tongue data and pulse data.
Affiliation(s)
- Yulin Shi
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Xinghua Yao
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jiatuo Xu
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Xiaojuan Hu
  - Shanghai Innovation Center of TCM Health Service, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Liping Tu
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Fang Lan
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Ji Cui
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Longtao Cui
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jingbin Huang
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jun Li
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Zijuan Bi
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
- Jiacai Li
  - Basic Medical College, Shanghai University of Traditional Chinese Medicine, Pudong, China
8
Yang K, Zheng Y, Lu K, Chang K, Wang N, Shu Z, Yu J, Liu B, Gao Z, Zhou X. PDGNet: Predicting Disease Genes Using a Deep Neural Network With Multi-View Features. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:575-584. [PMID: 32750864 DOI: 10.1109/tcbb.2020.3002771] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 06/11/2023]
Abstract
The knowledge of phenotype-genotype associations is crucial for the understanding of disease mechanisms. Numerous studies have focused on developing efficient and accurate computing approaches to predict disease genes. However, owing to the sparseness and complexity of medical data, developing an efficient deep neural network model to identify disease genes remains a huge challenge. Therefore, we develop a novel deep neural network model that fuses the multi-view features of phenotypes and genotypes to identify disease genes (termed PDGNet). Our model integrated the multi-view features of diseases and genes and leveraged the feedback information of training samples to optimize the parameters of the deep neural network and obtain the deep vector features of diseases and genes. The evaluation experiments on a large data set indicated that PDGNet obtained higher performance than the state-of-the-art method (precision and recall improved by 9.55 and 9.63 percent). The analysis results for the candidate genes indicated that the predicted genes have strong functional homogeneity and dense interactions with known genes. We validated the top predicted genes of Parkinson's disease based on external curated data and published medical literature, which indicated that the candidate genes have a huge potential to guide the selection of causal genes in wet-lab experiments. The source codes and the data of PDGNet are available at https://github.com/yangkuoone/PDGNet.
9
Shu Z, Jia T, Tian H, Yan D, Yang Y, Zhou X. AIM in Alternative Medicine. Artif Intell Med 2022. [DOI: 10.1007/978-3-030-64573-1_57] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
10
Shu Z, Wang J, Sun H, Xu N, Lu C, Zhang R, Li X, Liu B, Zhou X. Diversity and molecular network patterns of symptom phenotypes. NPJ Syst Biol Appl 2021; 7:41. [PMID: 34848731 PMCID: PMC8632989 DOI: 10.1038/s41540-021-00206-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 11/01/2021] [Indexed: 11/08/2022]
Abstract
Symptom phenotypes have continuously been an important clinical entity for clinical diagnosis and management. However, the non-specificity of symptom phenotypes for clinical diagnosis is one of the major challenges that need to be addressed to advance symptom science and precision health. Network medicine has delivered a successful approach for understanding the underlying mechanisms of complex disease phenotypes, which will also be a useful tool for symptom science. Here, we extracted symptom co-occurrences from clinical textbooks to construct a phenotype network of symptoms with clinical co-occurrence and incorporated high-quality symptom-gene associations and protein-protein interactions to explore the molecular network patterns of symptom phenotypes. Furthermore, we adopted an established network diversity measure from network medicine to quantify both the phenotypic diversity (i.e., non-specificity) and molecular diversity of symptom phenotypes. The results showed that the clinical diversity of symptom phenotypes could partially be explained by their underlying molecular network diversity (PCC = 0.49, P-value = 2.14E-08). For example, non-specific symptoms, such as chill, vomiting, and amnesia, have both high phenotypic and molecular network diversities. Moreover, we further validated and confirmed the approach of symptom clusters to reduce the non-specificity of symptom phenotypes. Network diversity provides a useful approach to evaluate the non-specificity of symptom phenotypes and would help elucidate the underlying molecular network mechanisms of symptom phenotypes and thus promote the advance of symptom science for precision health.
Affiliation(s)
- Zixin Shu
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, China
- Jingjing Wang
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, China
- Hailong Sun
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, China
- Ning Xu
  - The First Affiliated Hospital of Henan University of Chinese Medicine (Co-construction Collaborative Innovation Center for Chinese Medicine and Respiratory Diseases by Henan, Henan University of Chinese Medicine), Zhengzhou, 450046, China
- Chenxia Lu
  - Hubei Provincial Hospital of Traditional Chinese Medicine (Affiliated Hospital of Hubei University of Traditional Chinese Medicine, Hubei Academy of Traditional Chinese Medicine), Wuhan, 430061, China
- Runshun Zhang
  - Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing, 100053, China
- Xiaodong Li
  - Hubei Provincial Hospital of Traditional Chinese Medicine (Affiliated Hospital of Hubei University of Traditional Chinese Medicine, Hubei Academy of Traditional Chinese Medicine), Wuhan, 430061, China
- Baoyan Liu
  - China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Xuezhong Zhou
  - Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100063, China
11
Identification of Hypertension Subgroups through Topological Analysis of Symptom-Based Patient Similarity. Chin J Integr Med 2021; 27:656-665. [PMID: 34060025 DOI: 10.1007/s11655-021-3336-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 06/11/2020] [Indexed: 01/12/2023]
Abstract
OBJECTIVE To obtain the subtypes of the clinical hypertension population based on symptoms and to explore the relationship between hypertension and comorbidities. METHODS The data set was collected from the Chinese medicine (CM) electronic medical records of 33,458 hypertension inpatients in the Affiliated Hospital of Shandong University of Traditional Chinese Medicine between July 2014 and May 2017. Then, a hypertension disease comorbidity network (HDCN) was built to investigate the complicated associations between hypertension and its comorbidities. Moreover, a hypertension patient similarity network (HPSN) was constructed with patients' shared symptoms, and 7 main hypertension patient subgroups were identified from HPSN with a community detection method to exhibit the characteristics of clinical phenotypes and molecular mechanisms. In addition, the significant symptoms, diseases, CM syndromes and pathways of each main patient subgroup were obtained by enrichment analysis. RESULTS The significant symptoms and diseases of these patient subgroups were associated with different damaged target organs of hypertension. Additionally, the specific phenotypic features (symptoms, diseases, and CM syndromes) were consistent with specific molecular features (pathways) in the same patient subgroup. CONCLUSION The utility and comprehensiveness of disease classification based on community detection of patient networks using shared CM symptom phenotypes showed the importance of hypertension patient subgroups.
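A patient similarity network built from shared symptoms can be sketched as follows. The abstract does not name the similarity measure or threshold, so the Jaccard index and the cutoff of 0.5 are assumptions for illustration, as are the function names and the three toy patients. The community detection step is not shown.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_similarity_network(patient_symptoms: Dict[str, Set[str]],
                             threshold: float = 0.5
                             ) -> Dict[Tuple[str, str], float]:
    """Return weighted edges between patient pairs whose similarity meets the threshold."""
    edges = {}
    for p, q in combinations(sorted(patient_symptoms), 2):
        similarity = jaccard(patient_symptoms[p], patient_symptoms[q])
        if similarity >= threshold:
            edges[(p, q)] = similarity
    return edges


# Hypothetical patients and symptoms, for illustration only.
patients = {
    "p1": {"headache", "dizziness", "palpitation"},
    "p2": {"headache", "dizziness"},
    "p3": {"edema"},
}

network = build_similarity_network(patients)
print(network)  # one edge, between p1 and p2, with weight 2/3
```

The resulting weighted edge list could then be passed to any community detection routine to obtain patient subgroups.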
|
12
|
|
13
|
Yang K, Lu K, Wu Y, Yu J, Liu B, Zhao Y, Chen J, Zhou X. A network-based machine-learning framework to identify both functional modules and disease genes. Hum Genet 2021; 140:897-913. [PMID: 33409574 DOI: 10.1007/s00439-020-02253-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 09/11/2020] [Accepted: 12/22/2020] [Indexed: 01/20/2023]
Abstract
Disease gene identification is a critical step towards uncovering the molecular mechanisms of diseases and systematically investigating complex disease phenotypes. Despite considerable efforts to develop powerful computing methods, candidate gene identification remains a severe challenge owing to the connectivity of an incomplete interactome network, which hampers the discovery of true novel candidate genes. We developed a network-based machine-learning framework to identify both functional modules and disease candidate genes. In this framework, we designed a semi-supervised non-negative matrix factorization model to obtain the functional modules related to the diseases and genes. Of note, we proposed a disease gene-prioritizing method called MapGene that integrates the correlations from both functional modules and network closeness. Our framework identified a set of functional modules with highly functional homogeneity and close gene interactions. Experiments on a large-scale benchmark dataset showed that MapGene performs significantly better than the state-of-the-art algorithms. Further analysis demonstrates MapGene can effectively relieve the impact of the incompleteness of interactome networks and obtain highly reliable rankings of candidate genes. In addition, disease cases on Parkinson's disease and diabetes mellitus confirmed the generalization of MapGene for novel candidate gene identification. This work proposed, for the first time, an integrated computing framework to predict both functional modules and disease candidate genes. The methodology and results support that our framework has the potential to help discover underlying functional modules and reliable candidate genes in human disease.
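To make the matrix factorization step in this abstract more concrete, here is a minimal Python sketch of plain (unsupervised) non-negative matrix factorization with the standard multiplicative updates for squared error. It is not the semi-supervised model or the MapGene ranking from the cited paper; the toy association matrix, the rank, the iteration count, and all function names are assumptions chosen for illustration.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialisation keeps the updates well defined.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy matrix: rows are genes, columns are diseases, 1 = known association.
v = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
w0, h0 = nmf(v, rank=2, iterations=0)  # initial factors only
w, h = nmf(v, rank=2)                  # same seed, after the updates
initial_error = frobenius_error(v, w0, h0)
final_error = frobenius_error(v, w, h)
print(initial_error, final_error)
```

Each column of W can be read as a candidate module and each row of H as that module's loading on the diseases. A semi-supervised variant would add terms that tie the factors to known labels, which this sketch leaves out.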
Affiliation(s)
- Kuo Yang
- School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China; Institute for TCM-X, MOE Key Laboratory of Bioinformatics / Bioinformatics Division, BNRIST, Department of Automation, Tsinghua University, Beijing, 10084, China
- Kezhi Lu
- School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China; imec-DistriNet, KU Leuven, Leuven, 3001, Belgium
- Yang Wu
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jian Yu
- Beijing Key Laboratory of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, China
- Baoyan Liu
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China
- Yi Zhao
- Key Laboratory of Intelligent Information Processing, Advanced Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
- Jianxin Chen
- Beijing University of Chinese Medicine, Beijing, 100029, China
- Xuezhong Zhou
- School of Computer and Information Technology, Institute of Medical Intelligence, Beijing Jiaotong University, Beijing, 100044, China; Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing, 100700, China.
|
14
|
Long Y, Luo J. Association Mining to Identify Microbe Drug Interactions Based on Heterogeneous Network Embedding Representation. IEEE J Biomed Health Inform 2021; 25:266-275. [PMID: 32750918 DOI: 10.1109/jbhi.2020.2998906] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Indexed: 11/07/2022]
Abstract
Accurately identifying microbe-drug associations plays a critical role in drug development and precision medicine. Considering that the conventional wet-lab method is time-consuming, labor-intensive and expensive, computational approach is an alternative choice. The increasing availability of numerous biological data provides a great opportunity to systematically understand complex interaction mechanisms between microbes and drugs. However, few computational methods have been developed for microbe drug prediction. In this work, we leverage multiple sources of biomedical data to construct a heterogeneous network for microbes and drugs, including drug-drug interactions, microbe-microbe interactions and microbe-drug associations. And then we propose a novel Heterogeneous Network Embedding Representation framework for Microbe-Drug Association prediction, named (HNERMDA), by combining metapath2vec with bipartite network recommendation. In this framework, we introduce metapath2vec, a heterogeneous network representation learning method, to learn low-dimensional embedding representations for microbes and drugs. Following that, we further design a bias bipartite network projection recommendation algorithm to improve prediction accuracy. Comprehensive experiments on two datasets, named MDAD and aBiofilm, demonstrated that our model consistently outperformed five baseline methods in three types of cross-validations. Case study on two popular drugs (i.e., Ciprofloxacin and Pefloxacin) further validated the effectiveness of our HNERMDA model in inferring potential target microbes for drugs.
|
15
|
AIM in Alternative Medicine. Artif Intell Med 2021. [DOI: 10.1007/978-3-030-58080-3_57-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
|
16
|
Lu K, Yang K, Niyongabo E, Shu Z, Wang J, Chang K, Zou Q, Jiang J, Jia C, Liu B, Zhou X. Integrated network analysis of symptom clusters across disease conditions. J Biomed Inform 2020; 107:103482. [PMID: 32535270 DOI: 10.1016/j.jbi.2020.103482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/24/2019] [Revised: 05/18/2020] [Accepted: 06/08/2020] [Indexed: 10/24/2022]
Abstract
Identifying the symptom clusters (two or more related symptoms) with shared underlying molecular mechanisms has been a vital analysis task to promote the symptom science and precision health. Related studies have applied the clustering algorithms (e.g. k-means, latent class model) to detect the symptom clusters mostly from various kinds of clinical data. In addition, they focused on identifying the symptom clusters (SCs) for a specific disease, which also mainly concerned with the clinical regularities for symptom management. Here, we utilized a network-based clustering algorithm (i.e., BigCLAM) to obtain 208 typical SCs across disease conditions on a large-scale symptom network derived from integrated high-quality disease-symptom associations. Furthermore, we evaluated the underlying shared molecular mechanisms for SCs, i.e., shared genes, protein-protein interaction (PPI) and gene functional annotations using integrated networks and similarity measures. We found that the symptoms in the same SCs tend to share a higher degree of genes, PPIs and have higher functional homogeneities. In addition, we found that most SCs have related symptoms with shared underlying molecular mechanisms (e.g. enriched pathways) across different disease conditions. Our work demonstrated that the integrated network analysis method could be used for identifying robust SCs and investigate the molecular mechanisms of these SCs, which would be valuable for symptom science and precision health.
Affiliation(s)
- Kezhi Lu
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Kuo Yang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Edouard Niyongabo
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Zixin Shu
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Jingjing Wang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Kai Chang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Qunsheng Zou
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Jiyue Jiang
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Caiyan Jia
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China.
- Baoyan Liu
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China.
- Xuezhong Zhou
- Institute of Medical Intelligence, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China; Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China.
|
17
|
Nicholson DN, Greene CS. Constructing knowledge graphs and their biomedical applications. Comput Struct Biotechnol J 2020; 18:1414-1428. [PMID: 32637040 PMCID: PMC7327409 DOI: 10.1016/j.csbj.2020.05.017] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 03/12/2020] [Revised: 05/22/2020] [Accepted: 05/23/2020] [Indexed: 12/31/2022] Open
Abstract
Knowledge graphs can support many biomedical applications. These graphs represent biomedical concepts and relationships in the form of nodes and edges. In this review, we discuss how these graphs are constructed and applied with a particular focus on how machine learning approaches are changing these processes. Biomedical knowledge graphs have often been constructed by integrating databases that were populated by experts via manual curation, but we are now seeing a more robust use of automated systems. A number of techniques are used to represent knowledge graphs, but often machine learning methods are used to construct a low-dimensional representation that can support many different applications. This representation is designed to preserve a knowledge graph's local and/or global structure. Additional machine learning methods can be applied to this representation to make predictions within genomic, pharmaceutical, and clinical domains. We frame our discussion first around knowledge graph construction and then around unifying representational learning techniques and unifying applications. Advances in machine learning for biomedicine are creating new opportunities across many domains, and we note potential avenues for future work with knowledge graphs that appear particularly promising.
Affiliation(s)
- David N. Nicholson
- Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, United States
- Casey S. Greene
- Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Childhood Cancer Data Lab, Alex’s Lemonade Stand Foundation, United States
|
18
|
Hu F, Li L, Huang X, Yan X, Huang P. Symptom Distribution Regularity of Insomnia: Network and Spectral Clustering Analysis. JMIR Med Inform 2020; 8:e16749. [PMID: 32297869 PMCID: PMC7193440 DOI: 10.2196/16749] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 10/26/2019] [Revised: 01/31/2020] [Accepted: 02/10/2020] [Indexed: 12/15/2022] Open
Abstract
Background: Recent research in machine-learning techniques has led to significant progress in various research fields. In particular, knowledge discovery using this method has become a hot topic in traditional Chinese medicine. As the key clinical manifestations of patients, symptoms play a significant role in clinical diagnosis and treatment, which evidently have their underlying traditional Chinese medicine mechanisms.
Objective: We aimed to explore the core symptoms and potential regularity of symptoms for diagnosing insomnia to reveal the key symptoms, hidden relationships underlying the symptoms, and their corresponding syndromes.
Methods: An insomnia dataset with 807 samples was extracted from real-world electronic medical records. After cleaning and selecting the theme data referring to the syndromes and symptoms, the symptom network analysis model was constructed using complex network theory. We used four evaluation metrics of node centrality to discover the core symptom nodes from multiple aspects. To explore the hidden relationships among symptoms, we trained each symptom node in the network to obtain the symptom embedding representation using the Skip-Gram model and node embedding theory. After acquiring the symptom vocabulary in a digital vector format, we calculated the similarities between any two symptom embeddings, and clustered these symptom embeddings into five communities using the spectral clustering algorithm.
Results: The top five core symptoms of insomnia diagnosis, including difficulty falling asleep, easy to wake up at night, dysphoria and irascibility, forgetful, and spiritlessness and weakness, were identified using evaluation metrics of node centrality. The symptom embeddings with hidden relationships were constructed, which can be considered as the basic dataset for future insomnia research. The symptom network was divided into five communities, and these symptoms were accurately categorized into their corresponding syndromes.
Conclusions: These results highlight that network and clustering analyses can objectively and effectively find the key symptoms and relationships among symptoms. Identification of the symptom distribution and symptom clusters of insomnia further provide valuable guidance for clinical diagnosis and treatment.
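Two of the building blocks mentioned in this abstract, node centrality on a symptom network and similarity between symptom embeddings, can be sketched in a few lines of Python. The abstract does not name its four centrality metrics or its similarity measure, so degree centrality and cosine similarity are assumptions here, as are the toy records and all function names. Skip-Gram training and spectral clustering are left out.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(records):
    """Link two symptoms whenever they appear in the same record."""
    adj = defaultdict(set)
    for symptoms in records:
        for a, b in combinations(sorted(set(symptoms)), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(neighbours) / (n - 1) for node, neighbours in adj.items()}


def cosine(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy records: each list holds the symptoms noted for one patient.
records = [
    ["difficulty falling asleep", "forgetfulness"],
    ["difficulty falling asleep", "irritability"],
    ["difficulty falling asleep", "fatigue", "forgetfulness"],
]
adj = cooccurrence_network(records)
centrality = degree_centrality(adj)
core_symptom = max(centrality, key=centrality.get)
print(core_symptom, centrality[core_symptom])
```

A symptom that only ever appears alone in a record gets no node in this sketch, because nodes are created from co-occurring pairs. A pairwise cosine matrix computed over learned embeddings could then serve as the affinity input to a spectral clustering routine.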
Affiliation(s)
- Fang Hu
- College of Information Engineering, Hubei University of Chinese Medicine, Wuhan, China
- Liuhuan Li
- College of Information Engineering, Hubei University of Chinese Medicine, Wuhan, China
- Xiaoyu Huang
- College of Basic Medicine, Hubei University of Chinese Medicine, Wuhan, China
- Xingyu Yan
- College of Information Engineering, Hubei University of Chinese Medicine, Wuhan, China
- Panpan Huang
- College of Basic Medicine, Hubei University of Chinese Medicine, Wuhan, China
|
19
|
Wang N, Li P, Hu X, Yang K, Peng Y, Zhu Q, Zhang R, Gao Z, Xu H, Liu B, Chen J, Zhou X. Herb Target Prediction Based on Representation Learning of Symptom related Heterogeneous Network. Comput Struct Biotechnol J 2019; 17:282-290. [PMID: 30867892 PMCID: PMC6396098 DOI: 10.1016/j.csbj.2019.02.002] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 09/19/2018] [Revised: 02/01/2019] [Accepted: 02/01/2019] [Indexed: 11/02/2022] Open
Abstract
Traditional Chinese Medicine (TCM) has received increasing attention as a complementary approach or alternative to modern medicine. However, experimental methods for identifying novel targets of TCM herbs heavily relied on the current available herb-compound-target relationships. In this work, we present an Herb-Target Interaction Network (HTINet) approach, a novel network integration pipeline for herb-target prediction mainly relying on the symptom related associations. HTINet focuses on capturing the low-dimensional feature vectors for both herbs and proteins by network embedding, which incorporate the topological properties of nodes across multi-layered heterogeneous network, and then performs supervised learning based on these low-dimensional feature representations. HTINet obtains performance improvement over a well-established random walk based herb-target prediction method. Furthermore, we have manually validated several predicted herb-target interactions from independent literatures. These results indicate that HTINet can be used to integrate heterogeneous information to predict novel herb-target interactions.
Affiliation(s)
- Ning Wang
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Peng Li
- College of Arts and Sciences, Shanxi Agricultural University, Taigu 030801, China
- Xiaochen Hu
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Kuo Yang
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Yonghong Peng
- Faculty of Computer Science, University of Sunderland, St Peters Campus, Sunderland SR6 0DD, UK
- Qiang Zhu
- Medical Intelligence Institute, School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China
- Runshun Zhang
- Guanganmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Zhuye Gao
- Department of Cardiology, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing 100091, China
- Hao Xu
- Department of Cardiology, Xiyuan Hospital of China Academy of Chinese Medical Sciences, Beijing 100091, China
- Baoyan Liu
- Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Jianxin Chen
- Beijing University of Chinese Medicine, Beijing 100029, China
- Xuezhong Zhou
- School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China; Data Center of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
|