1
|
Fu M, Yan Y, Olde Loohuis LM, Chang TS. Defining the distance between diseases using SNOMED CT embeddings. J Biomed Inform 2023; 139:104307. [PMID: 36738869 DOI: 10.1016/j.jbi.2023.104307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2022] [Revised: 12/10/2022] [Accepted: 01/29/2023] [Indexed: 02/05/2023]
Abstract
Characterizing disease relationships is essential to biomedical research to understand disease etiology and improve clinical decision-making. Measurements of distance between disease pairs enable valuable research tasks, such as subgrouping patients and identifying common time courses of disease onset. Distance metrics developed in prior work focused on smaller, targeted disease sets. Distance metrics covering all diseases have not yet been defined, which limits the applications to a broader disease spectrum. Our current study defines disease distances for all disease pairs within the International Classification of Diseases, version 10 (ICD-10), the diagnostic classification system universally used in electronic health records. Our proposed distance is computed based on a biomedical ontology, SNOMED CT (Systemized Nomenclature of Medicine, Clinical Terms), which can also be viewed as a structured knowledge graph. We compared the knowledge graph-based metric to three other distance metrics based on the hierarchical structure of ICD, clinical comorbidity, and genetic correlation, to evaluate how each may capture similar or unique aspects of disease relationships. We show that our knowledge graph-based distance metric captures known phenotypic, clinical, and molecular characteristics at a finer granularity than the other three. With the continued growth of using electronic health records data for research, we believe that our distance metric will play an important role in subgrouping patients for precision health, and enabling individualized disease prevention and treatments.
Collapse
Affiliation(s)
- Mingzhou Fu
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA; Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
| | - Yu Yan
- Medical Informatics Home Area, Department of Bioinformatics, University of California, Los Angeles, CA, USA
| | - Loes M Olde Loohuis
- Center for Neurobehavioral Genetics, Semel Institute, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA; Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA.
| | - Timothy S Chang
- Movement Disorders Program, Department of Neurology, David Geffen School of Medicine, University of California, Los Angeles, CA, USA.
| |
Collapse
|
2
|
Li W, Zhang Y, Wang Y, Rong Z, Liu C, Miao H, Chen H, He Y, He W, Chen L. Candidate gene prioritization for chronic obstructive pulmonary disease using expression information in protein-protein interaction networks. BMC Pulm Med 2021; 21:280. [PMID: 34481483 PMCID: PMC8418003 DOI: 10.1186/s12890-021-01646-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2021] [Accepted: 08/23/2021] [Indexed: 11/30/2022] Open
Abstract
Background Identifying or prioritizing genes for chronic obstructive pulmonary disease (COPD), one type of complex disease, is particularly important for its prevention and treatment. Methods In this paper, a novel method was proposed to Prioritize genes using Expression information in Protein–protein interaction networks with disease risks transferred between genes (abbreviated as PEP). A weighted COPD PPI network was constructed using expression information and then COPD candidate genes were prioritized based on their corresponding disease risk scores in descending order. Results Further analysis demonstrated that the PEP method was robust in prioritizing disease candidate genes, and superior to other existing prioritization methods exploiting either topological or functional information. Top-ranked COPD candidate genes and their significantly enriched functions were verified to be related to COPD. The top 200 candidate genes might be potential disease genes in the diagnosis and treatment of COPD. Conclusions The proposed method could provide new insights to the research of prioritizing candidate genes of COPD or other complex diseases with expression information from sequencing or microarray data. Supplementary Information The online version contains supplementary material available at 10.1186/s12890-021-01646-9.
Collapse
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Yihua Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Zherou Rong
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Chenyu Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Hui Miao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Hongwei Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China
| | - Weiming He
- Institute of Opto-Electronics, Harbin Institute of Technology, Harbin, 150000, Heilongjiang, China.
| | - Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150000, Heilongjiang, China.
| |
Collapse
|
3
|
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. MOLECULAR THERAPY. NUCLEIC ACIDS 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
Collapse
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Hengqiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
| | - Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Tianxin Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
| | - Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
| | - Shulin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada.
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
| |
Collapse
|
4
|
Li W, Zhang Y, He Y, Wang Y, Guo S, Zhao X, Feng Y, Song Z, Zou Y, He W, Chen L. Candidate gene prioritization for non-communicable diseases based on functional information: Case studies. J Biomed Inform 2019; 93:103155. [PMID: 30902596 DOI: 10.1016/j.jbi.2019.103155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2018] [Revised: 03/14/2019] [Accepted: 03/19/2019] [Indexed: 10/27/2022]
Abstract
Candidate gene prioritization for complex non-communicable diseases is essential to understanding the mechanism and developing better means for diagnosing and treating these diseases. Many methods have been developed to prioritize candidate genes in protein-protein interaction (PPI) networks. Integrating functional information/similarity into disease-related PPI networks could improve the performance of prioritization. In this study, a candidate gene prioritization method was proposed for non-communicable diseases considering disease risks transferred between genes in weighted disease PPI networks with weights for nodes and edges based on functional information. Here, three types of non-communicable diseases with pathobiological similarity, Type 2 diabetes (T2D), coronary artery disease (CAD) and dilated cardiomyopathy (DCM), were used as case studies. Literature review and pathway enrichment analysis of top-ranked genes demonstrated the effectiveness of our method. Better performance was achieved after comparing our method with other existing methods. Pathobiological similarity among these three diseases was further investigated for common top-ranked genes to reveal their pathogenesis.
Collapse
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Yihua Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Yuehan He
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Yahui Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Shanshan Guo
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Xilei Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Yuyan Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Zhaona Song
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Yuqing Zou
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China
| | - Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin 150000, Heilongjiang Province, China.
| | - Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150000, Heilongjiang Province, China.
| |
Collapse
|
5
|
Nam Y, Jhee JH, Cho J, Lee JH, Shin H. Disease gene identification based on generic and disease-specific genome networks. Bioinformatics 2018; 35:1923-1930. [DOI: 10.1093/bioinformatics/bty882] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2018] [Revised: 10/11/2018] [Accepted: 10/17/2018] [Indexed: 01/11/2023] Open
Affiliation(s)
- Yonghyun Nam
- Department of Industrial Engineering, Ajou University, Yeongtong-gu, Suwon, South Korea
| | - Jong Ho Jhee
- Department of Industrial Engineering, Ajou University, Yeongtong-gu, Suwon, South Korea
| | - Junhee Cho
- Department of Industrial Engineering, Ajou University, Yeongtong-gu, Suwon, South Korea
| | - Ji-Hyun Lee
- DR. Noah Biotech, Yeongtong-gu, Suwon, South Korea
| | - Hyunjung Shin
- Department of Industrial Engineering, Ajou University, Yeongtong-gu, Suwon, South Korea
| |
Collapse
|
6
|
Lin L, Yang T, Fang L, Yang J, Yang F, Zhao J. Gene gravity-like algorithm for disease gene prediction based on phenotype-specific network. BMC SYSTEMS BIOLOGY 2017; 11:121. [PMID: 29212543 PMCID: PMC5718078 DOI: 10.1186/s12918-017-0519-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 05/10/2017] [Accepted: 11/24/2017] [Indexed: 01/24/2023]
Abstract
Background Polygenic diseases are usually caused by the dysfunction of multiple genes. Unravelling such disease genes is crucial to fully understand the genetic landscape of diseases on molecular level. With the advent of ‘omic’ data era, network-based methods have prominently boosted disease gene discovery. However, how to make better use of different types of data for the prediction of disease genes remains a challenge. Results In this study, we improved the performance of disease gene prediction by integrating the similarity of disease phenotype, biological function and network topology. First, for each phenotype, a phenotype-specific network was specially constructed by mapping phenotype similarity information of given phenotype onto the protein-protein interaction (PPI) network. Then, we developed a gene gravity-like algorithm, to score candidate genes based on not only topological similarity but also functional similarity. We tested the proposed network and algorithm by conducting leave-one-out and leave-10%-out cross validation and compared them with state-of-art algorithms. The results showed a preference to phenotype-specific network as well as gene gravity-like algorithm. At last, we tested the predicting capacity of proposed algorithms by test gene set derived from the DisGeNET database. Also, potential disease genes of three polygenic diseases, obesity, prostate cancer and lung cancer, were predicted by proposed methods. We found that the predicted disease genes are highly consistent with literature and database evidence. Conclusions The good performance of phenotype-specific networks indicates that phenotype similarity information has positive effect on the prediction of disease genes. The proposed gene gravity-like algorithm outperforms the algorithm of Random Walk with Restart (RWR), implicating its predicting capacity by combing topological similarity with functional similarity. Our work will give an insight to the discovery of disease genes by fusing multiple similarities of genes and diseases. Electronic supplementary material The online version of this article (10.1186/s12918-017-0519-9) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Limei Lin
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Tinghong Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Ling Fang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jian Yang
- School of Pharmacy, Second Military Medical University, Shanghai, China
| | - Fan Yang
- Department of Mathematics, Army Logistics University of PLA, Chongqing, China
| | - Jing Zhao
- Institute of Interdisciplinary Complex Research, Shanghai University of Traditional Chinese Medicine, Shanghai, China.
| |
Collapse
|
7
|
Wang L, Zhang C, Watkins J, Jin Y, McNutt M, Yin Y. SoftPanel: a website for grouping diseases and related disorders for generation of customized panels. BMC Bioinformatics 2016; 17:153. [PMID: 27044653 PMCID: PMC4820874 DOI: 10.1186/s12859-016-0998-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/29/2015] [Accepted: 03/23/2016] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Targeted next-generation sequencing is playing an increasingly important role in biological research and clinical diagnosis by allowing researchers to sequence high priority genes at much higher depths and at a fraction of the cost of whole genome or exome sequencing. However, in designing the panel of genes to be sequenced, investigators need to consider the tradeoff between the better sensitivity of a broad panel and the higher specificity of a potentially more relevant panel. Although tools to prioritize candidate disease genes have been developed, the great majority of these require prior knowledge and a set of seed genes as input, which is only possible for diseases with a known genetic etiology. RESULTS To meet the demands of both researchers and clinicians, we have developed a user-friendly website called SoftPanel. This website is intended to serve users by allowing them to input a single disorder or a disorder group and generate a panel of genes predicted to underlie the disorder of interest. Various methods of retrieval including a keyword search, browsing of an arborized list of International Classification of Diseases, 10th revision (ICD-10) codes or using disorder phenotypic similarities can be combined to define a group of disorders and the genes known to be associated with them. Moreover, SoftPanel enables users to expand or refine a gene list by utilizing several biological data resources. In addition to providing users with the facility to create a "hard" panel that contains an exact gene list for targeted sequencing, SoftPanel also enables generation of a "soft" panel of genes, which may be used to further filter a significantly altered set of genes identified through whole genome or whole exome sequencing. The service and data provided by SoftPanel can be accessed at http://www.isb.pku.edu.cn/SoftPanel/ . A tutorial page is included for trying out sample data and interpreting results. CONCLUSION SoftPanel provides a convenient and powerful tool for creating a targeted panel of potential disease genes while supporting different forms of input. SoftPanel may be utilized in both genomics research and personalized medicine.
Collapse
Affiliation(s)
- Likun Wang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Cong Zhang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Johnathan Watkins
- Institute for Mathematical and Molecular Biomedicine, King's College London, Guy's Campus, London, SE1 1UL, UK.,Department of Research Oncology, King's College London, Guy's Campus, Great Maze Pond, London, SE1 9RT, UK
| | - Yan Jin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Michael McNutt
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Yuxin Yin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China.
| |
Collapse
|