1. Zhou Z, Zhang R, Zhou A, Lv J, Chen S, Zou H, Zhang G, Lin T, Wang Z, Zhang Y, Weng S, Han X, Liu Z. Proteomics appending a complementary dimension to precision oncotherapy. Comput Struct Biotechnol J 2024; 23:1725-1739. [PMID: 38689716 PMCID: PMC11058087 DOI: 10.1016/j.csbj.2024.04.044] [Received: 02/06/2024] [Revised: 04/11/2024] [Accepted: 04/17/2024] [Indexed: 05/02/2024]
Abstract
Recent advances in high-throughput proteomic profiling technologies have facilitated the precise quantification of numerous proteins across multiple specimens concurrently. Researchers have the opportunity to comprehensively analyze the molecular signatures in plentiful medical specimens or disease pattern cell lines. Along with advances in data analysis and integration, proteomics data could be efficiently consolidated and employed to recognize precise elementary molecular mechanisms and decode individual biomarkers, guiding the precision treatment of tumors. Herein, we review a broad array of proteomics technologies and the progress and methods for the integration of proteomics data and further discuss how to better merge proteomics in precision medicine and clinical settings.
Affiliation(s)
- Zhaokai Zhou
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Department of Urology, The First Affiliated Hospital of Zhengzhou University, Henan 450052, China
- Ruiqi Zhang
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Aoyang Zhou
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Jinxiang Lv
- Department of Gastroenterology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Shuang Chen
- Center of Reproductive Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Haijiao Zou
- Center of Reproductive Medicine, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Ge Zhang
- Department of Cardiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Ting Lin
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Zhan Wang
- Department of Urology, The First Affiliated Hospital of Zhengzhou University, Henan 450052, China
- Yuyuan Zhang
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Siyuan Weng
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Xinwei Han
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Interventional Institute of Zhengzhou University, Zhengzhou, Henan 450052, China
- Interventional Treatment and Clinical Research Center of Henan Province, Zhengzhou, Henan 450052, China
- Zaoqu Liu
- Department of Interventional Radiology, The First Affiliated Hospital of Zhengzhou University, Zhengzhou, Henan 450052, China
- Interventional Institute of Zhengzhou University, Zhengzhou, Henan 450052, China
- Interventional Treatment and Clinical Research Center of Henan Province, Zhengzhou, Henan 450052, China
- Institute of Basic Medical Sciences, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100730, China
2. Giudice G, Chen H, Koutsandreas T, Petsalaki E. phuEGO: A Network-Based Method to Reconstruct Active Signaling Pathways From Phosphoproteomics Datasets. Mol Cell Proteomics 2024; 23:100771. [PMID: 38642805 PMCID: PMC11134849 DOI: 10.1016/j.mcpro.2024.100771] [Received: 10/17/2023] [Revised: 04/08/2024] [Accepted: 04/17/2024] [Indexed: 04/22/2024]
Abstract
Signaling networks are critical for virtually all cell functions. Our current knowledge of cell signaling has been summarized in signaling pathway databases, which, while useful, are highly biased toward well-studied processes and do not capture context-specific network wiring or pathway cross-talk. Mass spectrometry-based phosphoproteomics data can provide a more unbiased view of active cell signaling processes in a given context; however, it suffers from a low signal-to-noise ratio and poor reproducibility across experiments. While progress in methods to extract active signaling signatures from such data has been made, there are still limitations with respect to balancing bias and interpretability. Here we present phuEGO, which combines up-to-three-layer network propagation with ego network decomposition to provide small networks comprising active functional signaling modules. PhuEGO boosts the signal-to-noise ratio from global phosphoproteomics datasets, enriches the resulting networks for functional phosphosites, and allows improved comparison and integration across datasets. We applied phuEGO to five phosphoproteomics datasets from cell lines collected upon infection with SARS-CoV-2. PhuEGO was better able to identify common active functions across datasets and to point to a subnetwork enriched for known COVID-19 targets. Overall, phuEGO provides a flexible tool to the community for the improved functional interpretation of global phosphoproteomics datasets.
Affiliation(s)
- Girolamo Giudice
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Haoqi Chen
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Thodoris Koutsandreas
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom
- Evangelia Petsalaki
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridgeshire, United Kingdom.
3. Kim Y, Han Y, Hopper C, Lee J, Joo JI, Gong JR, Lee CK, Jang SH, Kang J, Kim T, Cho KH. A gray box framework that optimizes a white box logical model using a black box optimizer for simulating cellular responses to perturbations. Cell Rep Methods 2024; 4:100773. [PMID: 38744288 PMCID: PMC11133856 DOI: 10.1016/j.crmeth.2024.100773] [Received: 05/13/2023] [Revised: 03/19/2024] [Accepted: 04/19/2024] [Indexed: 05/16/2024]
Abstract
Predicting cellular responses to perturbations requires interpretable insights into molecular regulatory dynamics to perform reliable cell fate control, despite the confounding non-linearity of the underlying interactions. There is a growing interest in developing machine learning-based perturbation response prediction models to handle the non-linearity of perturbation data, but their interpretation in terms of molecular regulatory dynamics remains a challenge. Alternatively, for meaningful biological interpretation, logical network models such as Boolean networks are widely used in systems biology to represent intracellular molecular regulation. However, determining the appropriate regulatory logic of large-scale networks remains an obstacle due to the high-dimensional and discontinuous search space. To tackle these challenges, we present a scalable derivative-free optimizer trained by meta-reinforcement learning for Boolean network models. The logical network model optimized by the trained optimizer successfully predicts anti-cancer drug responses of cancer cell lines, while simultaneously providing insight into their underlying molecular regulatory mechanisms.
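For readers unfamiliar with logical models, the sketch below shows what simulating a Boolean network under a perturbation can look like. It is a toy and not the authors' framework: the four-node network, its rules, and the names `RULES`, `step`, and `simulate` are invented for illustration, and the meta-reinforcement-learning optimizer described in the abstract is not reproduced here.

```python
# Hypothetical four-node network: a growth signal activates a kinase,
# and the kinase drives proliferation unless an inhibitor node is on.
RULES = {
    "signal": lambda s: s["signal"],          # input node keeps its value
    "inhibitor": lambda s: s["inhibitor"],    # input node keeps its value
    "kinase": lambda s: s["signal"] and not s["inhibitor"],
    "prolif": lambda s: s["kinase"],
}


def step(state, rules, clamp=None):
    """One synchronous update; `clamp` pins perturbed nodes to fixed values."""
    nxt = {node: bool(rule(state)) for node, rule in rules.items()}
    if clamp:
        nxt.update(clamp)
    return nxt


def simulate(state, rules, clamp=None, max_steps=50):
    """Iterate until the state stops changing or max_steps is exhausted."""
    state = dict(state)
    if clamp:
        state.update(clamp)
    for _ in range(max_steps):
        nxt = step(state, rules, clamp)
        if nxt == state:  # fixed point reached
            return state
        state = nxt
    return state  # may lie on a cycle rather than at a fixed point


start = {"signal": True, "inhibitor": False, "kinase": False, "prolif": False}
untreated = simulate(start, RULES)
# Model a drug as a knockout of the kinase node (an assumption of this toy).
treated = simulate(start, RULES, clamp={"kinase": False})
```

Comparing `untreated["prolif"]` with `treated["prolif"]` gives the simulated response to the perturbation. Fitting the rules themselves to data is the hard search problem the abstract refers to.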
Affiliation(s)
- Yunseong Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Younghyun Han
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Corbin Hopper
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jonghoon Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jae Il Joo
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Jeong-Ryeol Gong
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Chun-Kyung Lee
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Seong-Hoon Jang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Junsoo Kang
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Taeyoung Kim
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea
- Kwang-Hyun Cho
- Laboratory for Systems Biology and Bio-inspired Engineering, Department of Bio and Brain Engineering, Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Korea.
4. Wright SN, Colton S, Schaffer LV, Pillich RT, Churas C, Pratt D, Ideker T. State of the Interactomes: an evaluation of molecular networks for generating biological insights. bioRxiv 2024:2024.04.26.587073. [PMID: 38746239 PMCID: PMC11092493 DOI: 10.1101/2024.04.26.587073] [Indexed: 05/16/2024]
Abstract
Advancements in genomic and proteomic technologies have powered the use of gene and protein networks ("interactomes") for understanding genotype-phenotype translation. However, the proliferation of interactomes complicates the selection of networks for specific applications. Here, we present a comprehensive evaluation of 46 current human interactomes, encompassing protein-protein interactions as well as gene regulatory, signaling, colocalization, and genetic interaction networks. Our analysis shows that large composite networks such as HumanNet, STRING, and FunCoup are most effective for identifying disease genes, while smaller networks such as DIP and SIGNOR demonstrate strong interaction prediction performance. These findings provide a benchmark for interactomes across diverse network biology applications and clarify factors that influence network performance. Furthermore, our evaluation pipeline paves the way for continued assessment of emerging and updated interaction networks in the future.
5. Wang Y, Liu M, Jafari M, Tang J. A critical assessment of Traditional Chinese Medicine databases as a source for drug discovery. Front Pharmacol 2024; 15:1303693. [PMID: 38738181 PMCID: PMC11082401 DOI: 10.3389/fphar.2024.1303693] [Received: 10/01/2023] [Accepted: 04/15/2024] [Indexed: 05/14/2024]
Abstract
Traditional Chinese Medicine (TCM) has been used for thousands of years to treat human diseases. Recently, many databases have been devoted to studying TCM pharmacology. Most of these databases include information about the active ingredients of TCM herbs and their disease indications. These databases enable researchers to interrogate the mechanisms of action of TCM systematically. However, there is a need for comparative studies of these databases, as they are derived from various resources with different data processing methods. In this review, we provide a comprehensive analysis of the existing TCM databases. By comparing herbs, ingredients, and herb-ingredient pairs across these databases, we found that their contents complement one another. Therefore, data harmonization is vital to make full use of all the available information. Moreover, different TCM databases may contain various annotation types for herbs or ingredients, notably for the chemical structure of ingredients, making it challenging to integrate data from them. We also highlight the latest TCM databases on symptoms or gene expression, suggesting that using multi-omics data and advanced bioinformatics approaches may provide new insights for drug discovery in TCM. In summary, such a comparative study would help improve the understanding of data complexity, which may ultimately motivate more efficient and more standardized strategies towards the digitalization of TCM.
Affiliation(s)
- Yinyin Wang
- School of Traditional Chinese Pharmacy, China Pharmaceutical University, Nanjing, China
- Minxia Liu
- Faculty of Life Science, Anhui Medical University, Hefei, China
- Mohieddin Jafari
- Department Biochemistry and Developmental Biology, University of Helsinki, Helsinki, Finland
- Jing Tang
- Department Biochemistry and Developmental Biology, University of Helsinki, Helsinki, Finland
- Research Program in Systems Oncology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
6. Zhang P, Catterson JH, Grönke S, Partridge L. Inhibition of S6K lowers age-related inflammation and increases lifespan through the endolysosomal system. Nat Aging 2024; 4:491-509. [PMID: 38413780 PMCID: PMC11031405 DOI: 10.1038/s43587-024-00578-3] [Received: 09/09/2022] [Accepted: 01/24/2024] [Indexed: 02/29/2024]
Abstract
Suppression of target of rapamycin complex 1 (TORC1) by rapamycin ameliorates aging in diverse species. S6 kinase (S6K) is an essential mediator, but the mechanisms involved are unclear. Here we show that activation of S6K specifically in Drosophila fat-body blocked extension of lifespan by rapamycin, induced accumulation of multilamellar lysosomes and blocked age-associated hyperactivation of the NF-κB-like immune deficiency (IMD) pathway, indicative of reduced inflammaging. Syntaxin 13 mediated the effects of TORC1-S6K signaling on lysosome morphology and inflammaging, suggesting they may be linked. Inflammaging depended on the IMD receptor regulatory isoform PGRP-LC, and repression of the IMD pathway from midlife extended lifespan. Age-related inflammaging was higher in females than in males and was not lowered in males by rapamycin treatment or lowered S6K. Rapamycin treatment also elevated Syntaxin 12/13 levels in mouse liver and prevented age-related increase in noncanonical NF-κB signaling, suggesting that the effect of TORC1 on inflammaging is conserved from flies to mammals.
Affiliation(s)
- Pingze Zhang
- Max Planck Institute for Biology of Ageing, Cologne, Germany
- James H Catterson
- Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, London, UK
- Centre for Discovery Brain Sciences, UK Dementia Research Institute, University of Edinburgh, Edinburgh, UK
- Linda Partridge
- Max Planck Institute for Biology of Ageing, Cologne, Germany.
- Institute of Healthy Ageing, Department of Genetics, Evolution and Environment, University College London, London, UK.
7. Wang X, Yang K, Jia T, Gu F, Wang C, Xu K, Shu Z, Xia J, Zhu Q, Zhou X. KDGene: knowledge graph completion for disease gene prediction using interactional tensor decomposition. Brief Bioinform 2024; 25:bbae161. [PMID: 38605639 PMCID: PMC11009469 DOI: 10.1093/bib/bbae161] [Received: 12/03/2023] [Revised: 02/20/2024] [Accepted: 03/13/2024] [Indexed: 04/13/2024]
Abstract
The accurate identification of disease-associated genes is crucial for understanding the molecular mechanisms underlying various diseases. Most current methods focus on constructing biological networks and utilizing machine learning, particularly deep learning, to identify disease genes. However, these methods overlook complex relations among entities in biological knowledge graphs. Such information has been successfully applied in other areas of life science research, demonstrating its effectiveness. Knowledge graph embedding methods can learn the semantic information of different relations within the knowledge graphs. Nonetheless, the performance of existing representation learning techniques, when applied to domain-specific biological data, remains suboptimal. To solve these problems, we construct a biological knowledge graph centered on diseases and genes, and develop an end-to-end knowledge graph completion framework for disease gene prediction using interactional tensor decomposition, named KDGene. KDGene incorporates an interaction module that bridges entity and relation embeddings within tensor decomposition, aiming to improve the representation of semantically similar concepts in specific domains and enhance the ability to accurately predict disease genes. Experimental results show that KDGene significantly outperforms state-of-the-art algorithms, whether existing disease gene prediction methods or knowledge graph embedding methods for general domains. Moreover, the comprehensive biological analysis of the predicted results further validates KDGene's capability to accurately identify new candidate genes. This work proposes a scalable knowledge graph completion framework to identify disease candidate genes, whose results promise to provide valuable references for further wet-lab experiments. Data and source codes are available at https://github.com/2020MEAI/KDGene.
Affiliation(s)
- Kuo Yang
- Corresponding author: Kuo Yang and Xuezhong Zhou, Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China.
- Xuezhong Zhou
- Corresponding author: Kuo Yang and Xuezhong Zhou, Institute of Medical Intelligence, Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer Science & Technology, Beijing Jiaotong University, Beijing 100044, China.
8. Leger BS, Meredith JJ, Ideker T, Sanchez-Roige S, Palmer AA. Rare and Common Variants Associated with Alcohol Consumption Identify a Conserved Molecular Network. bioRxiv 2024:2024.02.26.582195. [PMID: 38464225 PMCID: PMC10925118 DOI: 10.1101/2024.02.26.582195] [Indexed: 03/12/2024]
Abstract
Genome-wide association studies (GWAS) have identified hundreds of common variants associated with alcohol consumption. In contrast, rare variants have only begun to be studied for their role in alcohol consumption. No studies have examined whether common and rare variants implicate the same genes and molecular networks. To address this knowledge gap, we used publicly available alcohol consumption GWAS summary statistics (GSCAN, N=666,978) and whole exome sequencing data (Genebass, N=393,099) to identify a set of common and rare variants for alcohol consumption. Gene-based analyses of each dataset implicated 294 (common variants) and 35 (rare variants) genes, including the ethanol-metabolizing genes ADH1B and ADH1C, which were identified by both analyses, and ANKRD12, GIGYF1, KIF21B, and STK31, which were identified only by rare variant analysis but have been associated with related psychiatric traits. We then used a network colocalization procedure to propagate the common and rare gene sets onto a shared molecular network, revealing significant overlap. The shared network identified gene families that function in alcohol metabolism, including ADH, ALDH, CYP, and UGT. Seventy-four of the genes in the network, including EXOC2, EPM2A, CACNB3, and CACNG4, were previously implicated in comorbid psychiatric or substance use disorders but had not previously been identified for alcohol-related behaviors. Differential gene expression analysis showed enrichment in the liver and several brain regions, supporting the role of network genes in alcohol consumption. Thus, genes implicated by common and rare variants identify shared functions relevant to alcohol consumption, which also underlie psychiatric traits and substance use disorders that are comorbid with alcohol use.
Affiliation(s)
- Brittany S Leger
- Program in Biomedical Sciences, University of California San Diego, La Jolla, CA, USA
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- John J Meredith
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Sandra Sanchez-Roige
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Department of Medicine, Division of Genetic Medicine, Vanderbilt University, Nashville, TN, USA
- Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
9. Zhang P, Zhang W, Sun W, Xu J, Hu H, Wang L, Wong L. Identification of gene biomarkers for brain diseases via multi-network topological semantics extraction and graph convolutional network. BMC Genomics 2024; 25:175. [PMID: 38350848 PMCID: PMC10865627 DOI: 10.1186/s12864-024-09967-9] [Received: 09/06/2023] [Accepted: 01/03/2024] [Indexed: 02/15/2024]
Abstract
BACKGROUND: Brain diseases pose a significant threat to human health, and various network-based methods have been proposed for identifying gene biomarkers associated with these diseases. However, the brain is a complex system, and extracting topological semantics from different brain networks is necessary yet challenging when identifying pathogenic genes for brain diseases. RESULTS: In this study, we present a multi-network representation learning framework called M-GBBD for the identification of gene biomarkers in brain diseases. Specifically, we collected multi-omics data to construct eleven networks from different perspectives. M-GBBD extracts the spatial distributions of features from these networks and iteratively optimizes them using Kullback-Leibler divergence to fuse the networks into a common semantic space that represents the gene network for the brain. Subsequently, a graph consisting of both gene and large-scale disease proximity networks learns representations through graph convolution techniques and predicts whether a gene is associated with brain diseases while providing association scores. Experimental results demonstrate that M-GBBD outperforms several baseline methods. Furthermore, our analysis supported by bioinformatics revealed CAMP as a gene significantly associated with Alzheimer's disease identified by M-GBBD. CONCLUSION: Collectively, M-GBBD provides valuable insights into identifying gene biomarkers for brain diseases and serves as a promising framework for brain network representation learning.
Affiliation(s)
- Ping Zhang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Weihan Zhang
- CAS Key Laboratory of Plant Germplasm Enhancement and Specialty Agriculture, Wuhan Botanical Garden, The Innovative Academy of Seed Design, Chinese Academy of Sciences, Hubei Hongshan Laboratory, Wuhan, 430074, China
- Weicheng Sun
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Jinsheng Xu
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Hua Hu
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, Shandong, China.
- Guangxi Key Lab of Human-Machine Interaction and Intelligent Decision, Guangxi Academy of Sciences, Nanning, 530007, China.
- Leon Wong
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen, 518118, China.
10. Visonà G, Bouzigon E, Demenais F, Schweikert G. Network propagation for GWAS analysis: a practical guide to leveraging molecular networks for disease gene discovery. Brief Bioinform 2024; 25:bbae014. [PMID: 38340090 PMCID: PMC10858647 DOI: 10.1093/bib/bbae014] [Received: 10/16/2023] [Revised: 12/28/2023] [Accepted: 01/08/2024] [Indexed: 02/12/2024]
Abstract
MOTIVATION: Genome-wide association studies (GWAS) have enabled large-scale analysis of the role of genetic variants in human disease. Despite impressive methodological advances, subsequent clinical interpretation and application remain challenging when GWAS suffer from a lack of statistical power. In recent years, however, the use of information diffusion algorithms with molecular networks has led to fruitful insights into disease genes. RESULTS: We present an overview of the design choices and pitfalls that prove crucial in the application of network propagation methods to GWAS summary statistics. We highlight general trends from the literature and present benchmark experiments to expand on these insights, selecting three diseases and five molecular networks as a case study. We verify that the use of gene-level scores based on GWAS P-values offers advantages over the selection of a set of 'seed' disease genes not weighted by the associated P-values, provided the GWAS summary statistics are of sufficient quality. Beyond that, the size and the density of the networks prove to be important factors for consideration. Finally, we explore several ensemble methods and show that combining multiple networks may improve the network propagation approach.
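As a rough illustration of the network propagation idea summarized above, the following sketch runs a random walk with restart on a toy graph. It is not the authors' benchmark code: the graph, the node names, the restart probability `alpha`, and the function name `propagate` are all hypothetical, and the weighted seed values merely stand in for gene-level scores derived from GWAS P-values.

```python
def propagate(adj, scores, alpha=0.5, tol=1e-9, max_iter=1000):
    """Random walk with restart on an undirected graph.

    adj    -- dict mapping each node to the set of its neighbours
              (assumed symmetric, with no isolated nodes)
    scores -- dict of non-negative initial scores; missing nodes count as 0
    alpha  -- restart probability (an arbitrary illustrative default)
    """
    nodes = sorted(adj)
    total = sum(scores.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("at least one node needs a positive score")
    # Normalise the initial scores into a restart distribution.
    p0 = {n: scores.get(n, 0.0) / total for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m splits its current mass evenly among its own neighbours.
            inflow = sum(p[m] / len(adj[m]) for m in adj[n])
            nxt[n] = alpha * p0[n] + (1 - alpha) * inflow
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy path graph A - B - C - D (hypothetical gene names).
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
weighted = propagate(toy, {"A": 3.0, "D": 1.0})  # score-weighted seeds
binary = propagate(toy, {"A": 1.0, "D": 1.0})    # unweighted seed set
```

With unweighted seeds the two ends of the path are treated identically, whereas the weighted run ranks the neighbourhood of the stronger seed higher, which is the kind of difference the benchmark in the abstract examines.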
Affiliation(s)
- Giovanni Visonà
- Empirical Inference, Max-Planck Institute for Intelligent Systems, Tübingen 72076, Germany
11. Chang X, Yan S, Zhang Y, Zhang Y, Li L, Gao Z, Lin X, Chi X. GINv2.0: a comprehensive topological network integrating molecular interactions from multiple knowledge bases. NPJ Syst Biol Appl 2024; 10:4. [PMID: 38218959 PMCID: PMC10787761 DOI: 10.1038/s41540-024-00330-y] [Received: 08/19/2023] [Accepted: 01/02/2024] [Indexed: 01/15/2024]
Abstract
Knowledge bases have been instrumental in advancing biological research, facilitating pathway analysis and data visualization, which are now widely employed in the scientific community. Despite the establishment of several prominent knowledge bases focusing on signaling, metabolic networks, or both, integrating these networks into a unified topological network has proven to be challenging. The intricacy of molecular interactions and the diverse formats employed to store and display them contribute to the complexity of this task. In a prior study, we addressed this challenge by introducing a "meta-pathway" structure that integrated the advantages of the Simple Interaction Format (SIF) while accommodating reaction information. Nevertheless, the earlier Global Integrative Network (GIN) was limited to reliance on KEGG alone. Here, we present GIN version 2.0, which incorporates human molecular interaction data from ten distinct knowledge bases, including KEGG, Reactome, and HumanCyc, among others. We standardized the data structure, gene IDs, and chemical IDs, and conducted a comprehensive analysis of the consistency among the ten knowledge bases before combining all unified interactions into GINv2.0. Utilizing GINv2.0, we investigated the glycolysis process and its regulatory proteins, revealing coordinated regulations on glycolysis and autophagy, particularly under glucose starvation. The expanded scope and enhanced capabilities of GINv2.0 provide a valuable resource for comprehensive systems-level analyses in the field of biological research. GINv2.0 can be accessed at: https://github.com/BIGchix/GINv2.0.
Affiliation(s)
- Xiao Chang
- Department of Dermatology and Venereal Disease, Xuan Wu Hospital, Beijing, 100053, China
- Shen Yan
- Agricultural Information Institute, Chinese Academy of Agricultural Science, Beijing, 100081, China
- Yizheng Zhang
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Yingchun Zhang
- Key Laboratory of Plant Molecular Physiology, Institute of Botany, Chinese Academy of Sciences, Beijing, 100093, China
- Luyang Li
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhanyu Gao
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- Xuefei Lin
- Department of Dermatology and Venereal Disease, Xuan Wu Hospital, Beijing, 100053, China
| | - Xu Chi
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing, 100101, China.
12
Saha S, Chatterjee P, Nasipuri M, Basu S, Chakraborti T. Computational drug repurposing for viral infectious diseases: a case study on monkeypox. Brief Funct Genomics 2024:elad058. [PMID: 38183212 DOI: 10.1093/bfgp/elad058] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/29/2023] [Revised: 12/04/2023] [Accepted: 12/12/2023] [Indexed: 01/07/2024] Open
Abstract
The traditional method of drug reuse or repurposing has significantly contributed to the identification of new antiviral compounds and therapeutic targets, enabling rapid response to developing infectious illnesses. This article presents an overview of how modern computational methods are used in drug repurposing for the treatment of viral infectious diseases. These methods utilize data sets that include reviewed information on the host's response to pathogens and drugs, as well as various connections such as gene expression patterns and protein-protein interaction networks. We assess the potential benefits and limitations of these methods by examining monkeypox as a specific example, but the knowledge acquired can be applied to other comparable disease scenarios.
Affiliation(s)
- Sovan Saha
- Department of Computer Science and Engineering (Artificial Intelligence and Machine Learning), Techno Main Salt Lake, EM-4/1, Sector V, Bidhannagar, Kolkata, West Bengal 700091, India
| | - Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia, Kolkata-700152, India
| | - Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata - 700032, India
| | - Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata - 700032, India
| | - Tapabrata Chakraborti
- Department of Medical Physics and Biomedical Engineering, University College London, UK
- Health Science Programme, The Alan Turing Institute, London, UK
- Linacre College, University of Oxford, UK
13
Mancuso CA, Johnson KA, Liu R, Krishnan A. Joint representation of molecular networks from multiple species improves gene classification. PLoS Comput Biol 2024; 20:e1011773. [PMID: 38198480 PMCID: PMC10805316 DOI: 10.1371/journal.pcbi.1011773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2023] [Revised: 01/23/2024] [Accepted: 12/20/2023] [Indexed: 01/12/2024] Open
Abstract
Network-based machine learning (ML) has the potential for predicting novel genes associated with nearly any health and disease context. However, this approach often uses network information from only the single species under consideration even though networks for most species are noisy and incomplete. While some recent methods have begun addressing this shortcoming by using networks from more than one species, they lack one or more key desirable properties: handling networks from more than two species simultaneously, incorporating many-to-many orthology information, or generating a network representation that is reusable across different types of and newly-defined prediction tasks. Here, we present GenePlexusZoo, a framework that casts molecular networks from multiple species into a single reusable feature space for network-based ML. We demonstrate that this multi-species network representation improves both gene classification within a single species and knowledge-transfer across species, even in cases where the inter-species correspondence is undetectable based on shared orthologous genes. Thus, GenePlexusZoo enables effectively leveraging the high evolutionary molecular, functional, and phenotypic conservation across species to discover novel genes associated with diverse biological contexts.
Affiliation(s)
- Christopher A. Mancuso
- Department of Biostatistics and Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
| | - Kayla A. Johnson
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
| | - Renming Liu
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
| | - Arjun Krishnan
- Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Department of Computational Mathematics, Science and Engineering, Michigan State University, East Lansing, Michigan, United States of America
14
Ratajczak F, Joblin M, Hildebrandt M, Ringsquandl M, Falter-Braun P, Heinig M. Speos: an ensemble graph representation learning framework to predict core gene candidates for complex diseases. Nat Commun 2023; 14:7206. [PMID: 37938585 PMCID: PMC10632370 DOI: 10.1038/s41467-023-42975-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/27/2023] [Accepted: 10/27/2023] [Indexed: 11/09/2023] Open
Abstract
Understanding phenotype-to-genotype relationships is a grand challenge of 21st-century biology with translational implications. The recently proposed "omnigenic" model postulates that effects of genetic variation on traits are mediated by core genes and proteins whose activities mechanistically influence the phenotype, whereas peripheral genes encode a regulatory network that indirectly affects phenotypes via core gene products. Here, we develop a positive-unlabeled graph representation-learning ensemble approach based on a nested cross-validation to predict core-like genes for diverse diseases using Mendelian disorder genes for training. Employing mouse knockout phenotypes for external validations, we demonstrate that core-like genes display several key properties of core genes: mouse knockouts of genes corresponding to our most confident predictions give rise to relevant mouse phenotypes at rates on par with the Mendelian disorder genes, and all candidates exhibit core gene properties like transcriptional deregulation in disease and loss-of-function intolerance. Moreover, as predicted for core genes, our candidates are enriched for drug targets and druggable proteins. In contrast to Mendelian disorder genes, the new core-like genes are enriched for druggable yet untargeted gene products, which are therefore attractive targets for drug development. Interpretation of the underlying deep learning model suggests plausible explanations for our core gene predictions in the form of molecular mechanisms and physical interactions. Our results demonstrate the potential of graph representation learning for the interpretation of biological complexity and pave the way for studying core gene properties and future drug development.
Affiliation(s)
- Florin Ratajczak
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany
| | | | | | | | - Pascal Falter-Braun
- Institute of Network Biology (INET), Molecular Targets and Therapeutics Center (MTTC), Helmholtz Munich, Neuherberg, Germany.
- Microbe-Host Interactions, Faculty of Biology, Ludwig-Maximilians-Universität München, Planegg-Martinsried, Germany.
| | - Matthias Heinig
- Institute of Computational Biology (ICB), Helmholtz Munich, Neuherberg, Germany.
- Department of Computer Science, TUM School of Computation, Information and Technology, Technical University of Munich, Garching, Germany.
- German Centre for Cardiovascular Research (DZHK), Munich Heart Association, Partner Site Munich, Berlin, Germany.
15
Martin-Hernandez R, Espeso-Gil S, Domingo C, Latorre P, Hervas S, Hernandez Mora JR, Kotelnikova E. Machine learning combining multi-omics data and network algorithms identifies adrenocortical carcinoma prognostic biomarkers. Front Mol Biosci 2023; 10:1258902. [PMID: 38028548 PMCID: PMC10658191 DOI: 10.3389/fmolb.2023.1258902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 10/06/2023] [Indexed: 12/01/2023] Open
Abstract
Background: Rare endocrine cancers such as Adrenocortical Carcinoma (ACC) present a serious diagnostic and prognostication challenge. The knowledge about ACC pathogenesis is incomplete, and patients have limited therapeutic options. Identification of molecular drivers and effective biomarkers is required for timely diagnosis of the disease and for stratifying patients so that they can be offered the most beneficial treatments. In this study we demonstrate how machine learning methods integrating multi-omics data, in combination with systems biology tools, can contribute to the identification of new prognostic biomarkers for ACC. Methods: ACC gene expression and DNA methylation datasets were downloaded from the Xena Browser (GDC TCGA Adrenocortical Carcinoma cohort). A highly correlated multi-omics signature discriminating groups of samples was identified with the data integration analysis for biomarker discovery using latent components (DIABLO) method. Additional regulators of the identified signature were discovered using Clarivate CBDD (Computational Biology for Drug Discovery) network propagation and hidden nodes algorithms on a curated network of molecular interactions (MetaBase™). The discriminative power of the multi-omics signature and its regulators was delineated by training a random forest classifier on 55 samples, using 10-fold cross-validation with five iterations. The prognostic value of the identified biomarkers was further assessed on an external ACC dataset obtained from GEO (GSE49280) using the Kaplan-Meier estimator method. An optimal prognostic signature was finally derived using the stepwise Akaike Information Criterion (AIC) that allowed categorization of samples into high and low-risk groups. Results: A multi-omics signature including genes, microRNAs and methylation sites was generated. Systems biology tools identified additional genes regulating the features included in the multi-omics signature. RNA-seq, miRNA-seq and DNA methylation sets of features showed high power to distinguish patients in stages I-II from those in stages III-IV, outperforming previously identified prognostic biomarkers. Using an independent dataset, associations of the genes included in the signature with Overall Survival (OS) data demonstrated that patients with differential expression levels of 8 genes and 4 microRNAs showed a statistically significant decrease in OS. We also found an independent prognostic signature for ACC with potential use in clinical practice, combining 9 gene/microRNA features, that successfully predicted high-risk ACC patients. Conclusion: Machine learning and integrative analysis of multi-omics data, in combination with Clarivate CBDD systems biology tools, identified a set of biomarkers with high prognostic value for ACC disease. Multi-omics data is a promising resource for the identification of drivers and new prognostic biomarkers in rare diseases that could be used in clinical practice.
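The Kaplan-Meier estimator named in the Methods multiplies, at each observed event time, the fraction of at-risk patients who survive that time. The following is a minimal sketch of that product-limit calculation only; the function name `kaplan_meier` and the toy follow-up values are hypothetical and are not taken from the study or from GSE49280.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time per patient.
    events: 1 if the event (death) was observed, 0 if the patient was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Product-limit step: S(t) = S(previous) * (1 - deaths / at_risk).
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after t.
        at_risk -= leaving
    return curve


# Toy data (hypothetical): months of follow-up and event indicators.
toy_durations = [5, 8, 8, 12, 15]
toy_events = [1, 1, 0, 1, 0]
for time, surv in kaplan_meier(toy_durations, toy_events):
    print(f"t={time}: S(t)={surv:.2f}")
```

Comparing two such curves (for example, high versus low expression of a gene) would additionally need a test such as the log-rank test, which this sketch does not cover.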
16
Petti M, Farina L. Network medicine for patients' stratification: From single-layer to multi-omics. WIREs Mech Dis 2023; 15:e1623. [PMID: 37323106 DOI: 10.1002/wsbm.1623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 03/08/2023] [Accepted: 05/30/2023] [Indexed: 06/17/2023]
Abstract
Precision medicine research increasingly relies on the integrated analysis of multiple types of omics. In the era of big data, the large availability of different health-related information represents a great, but at the same time untapped, chance with a potentially fundamental role in the prevention, diagnosis and prognosis of diseases. Computational methods are needed to combine this data to create a comprehensive view of a given disease. Network science can model biomedical data in terms of relationships among molecular players of different nature and has been successfully proposed as a new paradigm for studying human diseases. Patient stratification is an open challenge aimed at identifying subtypes with different disease manifestations, severity, and expected survival time. Several stratification approaches based on high-throughput gene expression measurements have been successfully applied. However, few attempts have been proposed to exploit the integration of various genotypic and phenotypic data to discover novel sub-types or improve the detection of known groupings. This article is categorized under: Cancer > Biomedical Engineering; Cancer > Computational Models; Cancer > Genetics/Genomics/Epigenetics.
Affiliation(s)
- Manuela Petti
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
| | - Lorenzo Farina
- Department of Computer, Control and Management Engineering, Sapienza University of Rome, Rome, Italy
17
Shi W, Feng H, Li J, Liu T, Liu Z. DapBCH: a disease association prediction model Based on Cross-species and Heterogeneous graph embedding. Front Genet 2023; 14:1222346. [PMID: 37811150 PMCID: PMC10556742 DOI: 10.3389/fgene.2023.1222346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Accepted: 09/11/2023] [Indexed: 10/10/2023] Open
Abstract
The study of comorbidity can provide new insights into the pathogenesis of the disease and has important economic significance in the clinical evaluation of treatment difficulty, medical expenses, length of stay, and prognosis of the disease. In this paper, we propose a disease association prediction model DapBCH, which constructs a cross-species biological network and applies heterogeneous graph embedding to predict disease association. First, we combine the human disease-gene network, mouse gene-phenotype network, human-mouse homologous gene network, and human protein-protein interaction network to reconstruct a heterogeneous biological network. Second, we apply heterogeneous graph embedding based on meta-path aggregation to generate the feature vector of disease nodes. Finally, we employ link prediction to obtain the similarity of disease pairs. The experimental results indicate that our model is highly competitive in predicting the disease association and is promising for finding potential disease associations.
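The last step described above, scoring disease pairs from their embedding vectors, can be illustrated with a small sketch. It assumes cosine similarity as the pair score, which the abstract does not specify, and it uses made-up three-dimensional vectors; the names `cosine`, `rank_disease_pairs`, and the disease labels are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_disease_pairs(embeddings):
    """Score every unordered disease pair and sort from most to least similar."""
    pairs = combinations(sorted(embeddings), 2)
    scored = [((a, b), cosine(embeddings[a], embeddings[b])) for a, b in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical disease-node embeddings, standing in for the output of a
# heterogeneous graph embedding step.
toy_embeddings = {
    "diseaseA": [1.0, 0.0, 1.0],
    "diseaseB": [0.9, 0.1, 1.1],
    "diseaseC": [0.0, 1.0, 0.0],
}
ranked = rank_disease_pairs(toy_embeddings)
for (a, b), score in ranked:
    print(f"{a} - {b}: {score:.3f}")
```

In practice a learned link predictor could replace the fixed cosine score; the ranking step would stay the same.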
Affiliation(s)
- Wanqi Shi
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Hailin Feng
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Jian Li
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Tongcun Liu
- School of Mathematics and Computer Science, Zhejiang A & F University, Hangzhou, Zhejiang, China
| | - Zhe Liu
- College of Media Engineering, Zhejiang University of Media and Communications, Hangzhou, Zhejiang, China
18
Schlüter A, Vélez-Santamaría V, Verdura E, Rodríguez-Palmero A, Ruiz M, Fourcade S, Planas-Serra L, Launay N, Guilera C, Martínez JJ, Homedes-Pedret C, Albertí-Aguiló MA, Zulaika M, Martí I, Troncoso M, Tomás-Vila M, Bullich G, García-Pérez MA, Sobrido-Gómez MJ, López-Laso E, Fons C, Del Toro M, Macaya A, Beltran S, Gutiérrez-Solana LG, Pérez-Jurado LA, Aguilera-Albesa S, de Munain AL, Casasnovas C, Pujol A. ClinPrior: an algorithm for diagnosis and novel gene discovery by network-based prioritization. Genome Med 2023; 15:68. [PMID: 37679823 PMCID: PMC10486091 DOI: 10.1186/s13073-023-01214-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Accepted: 07/24/2023] [Indexed: 09/09/2023] Open
Abstract
BACKGROUND: Whole-exome sequencing (WES) and whole-genome sequencing (WGS) have become indispensable tools to solve rare Mendelian genetic conditions. Nevertheless, there is still an urgent need for sensitive, fast algorithms to maximise WES/WGS diagnostic yield in rare disease patients. Most tools devoted to this aim take advantage of patient phenotype information for prioritization of genomic data, although they are often limited by incomplete gene-phenotype knowledge stored in biomedical databases and a lack of proper benchmarking on real-world patient cohorts. METHODS: We developed ClinPrior, a novel method for the analysis of WES/WGS data that ranks candidate causal variants based on the patient's standardized phenotypic features (in Human Phenotype Ontology (HPO) terms). The algorithm propagates the data through an interactome network-based prioritization approach. This algorithm was thoroughly benchmarked using a synthetic patient cohort and was subsequently tested on a heterogeneous prospective, real-world series of 135 families affected by hereditary spastic paraplegia (HSP) and/or cerebellar ataxia (CA). RESULTS: ClinPrior successfully identified causative variants, achieving a final positive diagnostic yield of 70% in our real-world cohort. This includes 10 novel candidate genes not previously associated with disease, 7 of which were functionally validated within this project. We used the knowledge generated by ClinPrior to create a specific interactome for HSP/CA disorders, thus enabling future diagnoses as well as the discovery of novel disease genes. CONCLUSIONS: ClinPrior is an algorithm that uses standardized phenotype information and interactome data to improve clinical genomic diagnosis. It helps in identifying atypical cases and efficiently predicts novel disease-causing genes. This leads to increasing diagnostic yield, shortening of the diagnostic odysseys and advancing our understanding of human illnesses.
Affiliation(s)
- Agatha Schlüter
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Valentina Vélez-Santamaría
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Neurology Department, Neuromuscular Unit, Bellvitge University Hospital, Universitat de Barcelona, Barcelona, Spain
| | - Edgard Verdura
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Agustí Rodríguez-Palmero
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Pediatric Neurology Unit, Pediatrics Department, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Montserrat Ruiz
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Stéphane Fourcade
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Laura Planas-Serra
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Nathalie Launay
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Cristina Guilera
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Juan José Martínez
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | - Christian Homedes-Pedret
- Neurology Department, Neuromuscular Unit, Bellvitge University Hospital, Universitat de Barcelona, Barcelona, Spain
- Neurology Department, Hospital Universitari General de Catalunya, Barcelona, Spain
| | - M Antonia Albertí-Aguiló
- Neurology Department, Neuromuscular Unit, Bellvitge University Hospital, Universitat de Barcelona, Barcelona, Spain
| | - Miren Zulaika
- Neuromuscular Area, Group of Neurodegenerative Diseases, Biodonostia Health Research Institute (Biodonostia HRI), San Sebastian, Spain
- Network Center for Biomedical Research in Neurodegenerative Diseases (CIBERNED), ISCIII, Madrid, Spain
| | - Itxaso Martí
- Neuromuscular Area, Group of Neurodegenerative Diseases, Biodonostia Health Research Institute (Biodonostia HRI), San Sebastian, Spain
- Network Center for Biomedical Research in Neurodegenerative Diseases (CIBERNED), ISCIII, Madrid, Spain
- Pediatric Neurology Department, Donostia University Hospital, University of the Basque Country (UPV-EHU), San Sebastian, Spain
| | - Mónica Troncoso
- Pediatric Neurology Department, Central Campus, Hospital Clínico San Borja Arriarán, Universidad de Chile, Santiago, Chile
| | - Miguel Tomás-Vila
- Neuropediatrics Department, Hospital Universitari i Politècnic La Fe, Valencia, Spain
| | - Gemma Bullich
- Centro Nacional Análisis Genómico (CNAG) - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, Spain
| | - M Asunción García-Pérez
- Pediatric Neurology Unit, Pediatrics Department, Hospital Universitario Fundación Alcorcón, Madrid, Spain
| | - María-Jesús Sobrido-Gómez
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Coruña Institute of Biomedical Research (INIBIC), A Coruña, Spain
- Hospital Clínico Universitario, A Coruña, Spain
| | - Eduardo López-Laso
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Pediatric Neurology Unit, Pediatrics Department, Reina Sofía University Hospital, Córdoba, Spain
- Maimonides Institute For Biomedical Research of Cordoba (IMIBIC), Córdoba, Spain
| | - Carme Fons
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Pediatric Neurology Department, Sant Joan de Déu University Hospital, Member of the ERN EpiCARE, Barcelona, Spain
- Sant Joan de Déu Research Institute, (IRSJD), Barcelona, Spain
| | - Mireia Del Toro
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Pediatric Neurology Department, Vall d'Hebron University Hospital, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pediatric Neurology Research Group, Vall d'Hebron Research Institute (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Alfons Macaya
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Pediatric Neurology Department, Vall d'Hebron University Hospital, Universitat Autònoma de Barcelona, Barcelona, Spain
- Pediatric Neurology Research Group, Vall d'Hebron Research Institute (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain
| | - Sergi Beltran
- Centro Nacional Análisis Genómico (CNAG) - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Baldiri Reixac 4, Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Departament de Genètica, Facultat de Biologia, Microbiologia i Estadística, Universitat de Barcelona (UB), Barcelona, 08028, Spain
| | - Luis G Gutiérrez-Solana
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Pediatric Neurology Department, Children's University Hospital Niño Jesús, Madrid, Spain
| | - Luis A Pérez-Jurado
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- Genetics Service, Hospital del Mar Research Institute (IMIM), Barcelona, Spain
- Department of Experimental and Health Sciences, Universitat Pompeu Fabra, Barcelona, Spain
| | - Sergio Aguilera-Albesa
- Pediatric Neurology Unit, Pediatrics Department, Navarra Health Service, Pamplona, Spain
- Navarrabiomed, Biomedical Research Center, Pamplona, Spain
| | - Adolfo López de Munain
- Neuromuscular Area, Group of Neurodegenerative Diseases, Biodonostia Health Research Institute (Biodonostia HRI), San Sebastian, Spain
- Network Center for Biomedical Research in Neurodegenerative Diseases (CIBERNED), ISCIII, Madrid, Spain
- Neurology Department, Donostia University Hospital, San Sebastian, Spain
| | - Carlos Casasnovas
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain.
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain.
- Neurology Department, Neuromuscular Unit, Bellvitge University Hospital, Universitat de Barcelona, Barcelona, Spain.
| | - Aurora Pujol
- Neurometabolic Diseases Laboratory, Bellvitge Biomedical Research Institute (IDIBELL), Hospital Duran i Reynals, Gran Via 199, L'Hospitalet de Llobregat, Barcelona, 08908, Spain.
- Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain.
- Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Catalonia, Spain.
19
Huang Y, Wu Z, Lan W, Zhong C. Predicting Disease-Associated N7-Methylguanosine (m7G) Sites via Random Walk on Heterogeneous Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:3173-3181. [PMID: 37294648 DOI: 10.1109/tcbb.2023.3284505] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 06/11/2023]
Abstract
Recent studies revealed that the modification of N7-methylguanosine (m7G) has associations with many human diseases. Effectively identifying disease-associated m7G methylation sites would provide crucial clues for disease diagnosis and treatment. Previous studies have developed computational methods to predict disease-associated m7G sites based on similarities among m7G sites and diseases. However, few have focused on the influence of the known m7G-disease association information on calculating similarity measures of m7G sites and diseases, which could improve the identification of disease-associated m7G sites. In this work, we propose a computational method called m7GDP-RW to predict m7G-disease associations by a random walk algorithm. m7GDP-RW first incorporates the feature information of m7G sites and diseases with the known m7G-disease associations to compute m7G site similarity and disease similarity. Then m7GDP-RW combines the known m7G-disease associations with the computed similarity of m7G sites and diseases to construct an m7G-disease heterogeneous network. Finally, m7GDP-RW utilizes a two-pass random walk with restart algorithm to find novel m7G-disease associations on the heterogeneous network. The experimental results show that our method achieves higher prediction accuracy compared to the existing methods. The case study also demonstrates the effectiveness of m7GDP-RW in discovering potential m7G-disease associations.
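Random walk with restart on a heterogeneous network, the core technique named above, can be sketched on a toy graph. This is a plain single-pass version, not the authors' two-pass variant, and everything in it is an assumption for illustration: the function names, the restart probability of 0.5, and the edge weights for the hypothetical nodes `site1` to `site3` and `dis1` to `dis2`.

```python
def build_adjacency(edges):
    """Build a symmetric weighted adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    W moves a walker from node u to neighbour v with probability
    weight(u, v) / total weight of u's edges. Assumes every node has an edge.
    """
    nodes = sorted(adj)
    if any(not adj[u] for u in nodes):
        raise ValueError("every node needs at least one edge in this sketch")
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy heterogeneous network (hypothetical weights): site-site similarity,
# disease-disease similarity, and known site-disease associations.
toy_edges = [
    ("site1", "site2", 0.8),
    ("site2", "site3", 0.6),
    ("dis1", "dis2", 0.7),
    ("site1", "dis1", 1.0),
    ("site3", "dis2", 1.0),
]
toy_adj = build_adjacency(toy_edges)
scores = random_walk_with_restart(toy_adj, seeds={"dis1"})
site_ranking = sorted(
    (n for n in scores if n.startswith("site")), key=scores.get, reverse=True
)
print(site_ranking)
```

Sites that are not yet linked to the seed disease but receive a high steady-state score are the candidate associations in this kind of scheme.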
20
Wright SN, Leger BS, Rosenthal SB, Liu SN, Jia T, Chitre AS, Polesskaya O, Holl K, Gao J, Cheng R, Garcia Martinez A, George A, Gileta AF, Han W, Netzley AH, King CP, Lamparelli A, Martin C, St Pierre CL, Wang T, Bimschleger H, Richards J, Ishiwari K, Chen H, Flagel SB, Meyer P, Robinson TE, Solberg Woods LC, Kreisberg JF, Ideker T, Palmer AA. Genome-wide association studies of human and rat BMI converge on synapse, epigenome, and hormone signaling networks. Cell Rep 2023; 42:112873. [PMID: 37527041 PMCID: PMC10546330 DOI: 10.1016/j.celrep.2023.112873] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/08/2022] [Revised: 07/05/2023] [Accepted: 07/11/2023] [Indexed: 08/03/2023] Open
Abstract
A vexing observation in genome-wide association studies (GWASs) is that parallel analyses in different species may not identify orthologous genes. Here, we demonstrate that cross-species translation of GWASs can be greatly improved by an analysis of co-localization within molecular networks. Using body mass index (BMI) as an example, we show that the genes associated with BMI in humans lack significant agreement with those identified in rats. However, the networks interconnecting these genes show substantial overlap, highlighting common mechanisms including synaptic signaling, epigenetic modification, and hormonal regulation. Genetic perturbations within these networks cause abnormal BMI phenotypes in mice, too, supporting their broad conservation across mammals. Other mechanisms appear species specific, including carbohydrate biosynthesis (humans) and glycerolipid metabolism (rodents). Finally, network co-localization also identifies cross-species convergence for height/body length. This study advances a general paradigm for determining whether and how phenotypes measured in model species recapitulate human biology.
Affiliation(s)
- Sarah N Wright
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Program in Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA 92093, USA
| | - Brittany S Leger
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA; Program in Biomedical Sciences, University of California San Diego, La Jolla, CA 93093, USA
| | - Sara Brin Rosenthal
- Center for Computational Biology & Bioinformatics, Department of Medicine, University of California, San Diego, La Jolla, CA 92093, USA
| | - Sophie N Liu
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Tongqiu Jia
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Apurva S Chitre
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
| | - Oksana Polesskaya
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
| | - Katie Holl
- Department of Physiology, Medical College of Wisconsin, Milwaukee, WI 53226, USA
| | - Jianjun Gao
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
| | - Riyan Cheng
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
| | - Angel Garcia Martinez
- Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Anthony George
- Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA
| | - Alexander F Gileta
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA; Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
| | - Wenyan Han
- Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Alesa H Netzley
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA
| | - Christopher P King
- Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA; Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
| | | | - Connor Martin
- Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA; Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
| | | | - Tengfei Wang
- Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Hannah Bimschleger
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA
| | - Jerry Richards
- Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA
| | - Keita Ishiwari
- Clinical and Research Institute on Addictions, University at Buffalo, Buffalo, NY 14203, USA; Department of Pharmacology and Toxicology, University at Buffalo, Buffalo, NY 14203, USA
| | - Hao Chen
- Department of Pharmacology, University of Tennessee Health Science Center, Memphis, TN 38163, USA
| | - Shelly B Flagel
- Department of Psychiatry, University of Michigan, Ann Arbor, MI 48109, USA; Michigan Neuroscience Institute, University of Michigan, Ann Arbor, MI 48109, USA
| | - Paul Meyer
- Department of Psychology, University at Buffalo, Buffalo, NY 14260, USA
| | - Terry E Robinson
- Department of Psychology, University of Michigan, Ann Arbor, MI 48109, USA
| | - Leah C Solberg Woods
- Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Jason F Kreisberg
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
| | - Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA 93093, USA; Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA.
|
21
|
Nunes S, Sousa R, Pesquita C. Multi-domain knowledge graph embeddings for gene-disease association prediction. J Biomed Semantics 2023; 14:11. [PMID: 37580835 PMCID: PMC10426189 DOI: 10.1186/s13326-023-00291-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2022] [Accepted: 07/29/2023] [Indexed: 08/16/2023] Open
Abstract
BACKGROUND Predicting gene-disease associations typically requires exploring diverse sources of information as well as sophisticated computational approaches. Knowledge graph embeddings can help tackle these challenges by creating representations of genes and diseases based on the scientific knowledge described in ontologies, which can then be explored by machine learning algorithms. However, state-of-the-art knowledge graph embeddings are produced over a single ontology or multiple but disconnected ones, ignoring the impact that considering multiple interconnected domains can have on complex tasks such as gene-disease association prediction. RESULTS We propose a novel approach to predict gene-disease associations using rich semantic representations based on knowledge graph embeddings over multiple ontologies linked by logical definitions and compound ontology mappings. The experiments showed that considering richer knowledge graphs significantly improves gene-disease prediction and that different knowledge graph embedding methods benefit more from distinct types of semantic richness. CONCLUSIONS This work demonstrated the potential for knowledge graph embeddings across multiple and interconnected biomedical ontologies to support gene-disease prediction. It also paved the way for considering other ontologies or tackling other tasks where multiple perspectives over the data can be beneficial. All software and data are freely available.
Affiliation(s)
- Susana Nunes
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
| | - Rita T. Sousa
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
| | - Catia Pesquita
- LASIGE, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal
|
22
|
Chu X, Guan B, Dai L, Liu JX, Li F, Shang J. Network embedding framework for driver gene discovery by combining functional and structural information. BMC Genomics 2023; 24:426. [PMID: 37516822 PMCID: PMC10386255 DOI: 10.1186/s12864-023-09515-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2022] [Accepted: 07/13/2023] [Indexed: 07/31/2023] Open
Abstract
Comprehensive analysis of multiple data sets can identify potential driver genes for various cancers. In recent years, driver gene discovery based on massive mutation data and gene interaction networks has attracted increasing attention, but there is still a need to explore combining functional and structural information of genes in protein interaction networks to identify driver genes. Therefore, we propose a network embedding framework combining functional and structural information to identify driver genes. Firstly, we combine the mutation data and gene interaction networks to construct a mutation integration network using a network propagation algorithm. Secondly, the struc2vec model is used to extract gene features from the mutation integration network, which contains both the functional and structural information of genes. Finally, machine learning algorithms are utilized to identify the driver genes. Compared with four previous methods, our method can find gene pairs that are distant from each other through structural similarities and performs better in identifying driver genes for 12 cancers in The Cancer Genome Atlas. We also conduct a comparative analysis of three gene interaction networks, three gene standard sets, and five machine learning algorithms. Our framework provides a new perspective for feature selection to identify novel driver genes.
Affiliation(s)
- Xin Chu
- School of Computer Science, Qufu Normal University, Rizhao, 27826, China
| | - Boxin Guan
- School of Computer Science, Qufu Normal University, Rizhao, 27826, China
| | - Lingyun Dai
- School of Computer Science, Qufu Normal University, Rizhao, 27826, China
| | - Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao, 27826, China
| | - Feng Li
- School of Computer Science, Qufu Normal University, Rizhao, 27826, China.
| | - Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao, 27826, China.
|
23
|
Kim Y, Cho YR. Predicting Drug-Gene-Disease Associations by Tensor Decomposition for Network-Based Computational Drug Repositioning. Biomedicines 2023; 11:1998. [PMID: 37509637 PMCID: PMC10377142 DOI: 10.3390/biomedicines11071998] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 07/07/2023] [Accepted: 07/12/2023] [Indexed: 07/30/2023] Open
Abstract
Drug repositioning offers the significant advantage of greatly reducing the cost and time of drug discovery by identifying new therapeutic indications for existing drugs. In particular, computational approaches using networks in drug repositioning have attracted attention for inferring potential associations between drugs and diseases efficiently based on the network connectivity. In this article, we proposed a network-based drug repositioning method to construct a drug-gene-disease tensor by integrating drug-disease, drug-gene, and disease-gene associations and predict drug-gene-disease triple associations through tensor decomposition. The proposed method, which ensembles generalized tensor decomposition (GTD) and multi-layer perceptron (MLP), models drug-gene-disease associations through GTD and learns the features of drugs, genes, and diseases through MLP, providing more flexibility and non-linearity than conventional tensor decomposition. We experimented with drug-gene-disease association prediction using two distinct networks created by chemical structures and ATC codes as drug features. Moreover, we leveraged drug, gene, and disease latent vectors obtained from the predicted triple associations to predict drug-disease, drug-gene, and disease-gene pairwise associations. Our experimental results revealed that the proposed ensemble method was superior for triple association prediction. The ensemble model achieved an AUC of 0.96 in predicting triple associations for new drugs, resulting in an approximately 7% improvement over the performance of existing models. It also showed competitive accuracy for pairwise association prediction compared with previous methods. This study demonstrated that incorporating genetic information leads to notable advancements in drug repositioning.
Affiliation(s)
- Yoonbee Kim
- Division of Software, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Republic of Korea
| | - Young-Rae Cho
- Division of Software, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Republic of Korea
- Division of Digital Healthcare, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Republic of Korea
|
24
|
Boyd SS, Slawson C, Thompson JA. AMEND: active module identification using experimental data and network diffusion. BMC Bioinformatics 2023; 24:277. [PMID: 37415126 DOI: 10.1186/s12859-023-05376-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 06/02/2023] [Indexed: 07/08/2023] Open
Abstract
BACKGROUND Molecular interaction networks have become an important tool in providing context to the results of various omics experiments. For example, by integrating transcriptomic data and protein-protein interaction (PPI) networks, one can better understand how the altered expression levels of several genes are related to one another. The challenge then becomes how to determine, in the context of the interaction network, the subset(s) of genes that best capture the main mechanisms underlying the experimental conditions. Different algorithms have been developed to address this challenge, each with specific biological questions in mind. One emerging area of interest is to determine which genes are equivalently or inversely changed between different experiments. The equivalent change index (ECI) is a recently proposed metric that measures the extent to which a gene is equivalently or inversely regulated between two experiments. The goal of this work is to develop an algorithm that makes use of the ECI and powerful network analysis techniques to identify a connected subset of genes that are highly relevant to the experimental conditions. RESULTS To address the above goal, we developed a method called Active Module identification using Experimental data and Network Diffusion (AMEND). The AMEND algorithm is designed to find a subset of connected genes in a PPI network that have large experimental values. It makes use of random walk with restart to create gene weights, and a heuristic solution to the Maximum-weight Connected Subgraph problem using these weights. This is performed iteratively until an optimal subnetwork (i.e., active module) is found. AMEND was compared to two current methods, NetCore and DOMINO, using two gene expression datasets. CONCLUSION The AMEND algorithm is an effective, fast, and easy-to-use method for identifying network-based active modules. It returned connected subnetworks with the largest median ECI by magnitude, capturing distinct but related functional groups of genes. Code is freely available at https://github.com/samboyd0/AMEND .
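The random walk with restart step mentioned in this abstract can be illustrated with a minimal sketch. It is not the AMEND implementation (that is the R code at the repository above); the function name, the dictionary-based graph, the restart probability of 0.5, and the toy four-gene path are all assumptions made for illustration. The sketch only shows how seed scores diffuse over a network to produce per-gene weights.

```python
def random_walk_with_restart(adjacency, seed_weights, restart=0.5,
                             tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency    : dict mapping node -> list of neighbouring nodes
                   (hypothetical toy representation of an undirected network)
    seed_weights : dict mapping node -> non-negative experimental score
    restart      : probability of jumping back to the seed distribution
    """
    nodes = sorted(adjacency)
    total = sum(seed_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("seed_weights needs at least one positive value")

    # Normalised seed distribution p0; the walk starts from it.
    p0 = {n: seed_weights.get(n, 0.0) / total for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Restart term: a fraction of the mass always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            neighbours = adjacency[n]
            if not neighbours:
                # Isolated node: send its walking mass back to the seeds.
                for m in nodes:
                    nxt[m] += (1 - restart) * p[n] * p0[m]
                continue
            # Walking term: split the remaining mass evenly over neighbours.
            share = (1 - restart) * p[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Toy path network A - B - C - D with all experimental signal on gene A.
toy_network = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
weights = random_walk_with_restart(toy_network, {"A": 1.0})
for gene in sorted(weights):
    print(gene, round(weights[gene], 4))
```

In this toy case the weights fall off with distance from the seeded gene, which is the property that lets diffusion scores rank genes by network proximity to strong experimental signals.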
Affiliation(s)
- Samuel S Boyd
- Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS, 66103, USA
- University of Kansas Cancer Center, Kansas City, KS, USA
| | - Chad Slawson
- Department of Biochemistry, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS, 66103, USA
- University of Kansas Cancer Center, Kansas City, KS, USA
- University of Kansas Alzheimer's Disease Research Center, Fairway, KS, USA
| | - Jeffrey A Thompson
- Department of Biostatistics and Data Science, University of Kansas Medical Center, 3901 Rainbow Blvd., Kansas City, KS, 66103, USA.
- University of Kansas Cancer Center, Kansas City, KS, USA.
|
25
|
Szebényi K, Barrio-Hernandez I, Gibbons GM, Biasetti L, Troakes C, Beltrao P, Lakatos A. A human proteogenomic-cellular framework identifies KIF5A as a modulator of astrocyte process integrity with relevance to ALS. Commun Biol 2023; 6:678. [PMID: 37386082 PMCID: PMC10310856 DOI: 10.1038/s42003-023-05041-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Accepted: 06/13/2023] [Indexed: 07/01/2023] Open
Abstract
Genome-wide association studies identified several disease-causing mutations in neurodegenerative diseases, including amyotrophic lateral sclerosis (ALS). However, the contribution of genetic variants to pathway disturbances and their cell type-specific variations, especially in glia, is poorly understood. We integrated ALS GWAS-linked gene networks with human astrocyte-specific multi-omics datasets to elucidate pathognomonic signatures. This integration predicts that KIF5A, a motor protein kinesin-1 heavy-chain isoform, previously detected only in neurons, can also potentiate disease pathways in astrocytes. Using postmortem tissue and super-resolution structured illumination microscopy in cell-based perturbation platforms, we provide evidence that KIF5A is present in astrocyte processes and its deficiency disrupts structural integrity and mitochondrial transport. We show that this may underlie cytoskeletal and trafficking changes in SOD1 ALS astrocytes characterised by low KIF5A levels, which can be rescued by c-Jun N-terminal Kinase-1 (JNK1), a kinesin transport regulator. Altogether, our pipeline reveals a mechanism controlling astrocyte process integrity, a prerequisite for synapse maintenance, and suggests a targetable loss-of-function in ALS.
Affiliation(s)
- Kornélia Szebényi
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0PY, UK
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
| | | | - George M Gibbons
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0PY, UK
| | - Luca Biasetti
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
| | - Claire Troakes
- Department of Basic and Clinical Neuroscience, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, SE5 8AF, UK
| | - Pedro Beltrao
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, CB10 1SD, UK.
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, 8093, Switzerland.
| | - András Lakatos
- John van Geest Centre for Brain Repair, Department of Clinical Neurosciences, University of Cambridge, Cambridge Biomedical Campus, Cambridge, CB2 0PY, UK.
- Wellcome Trust-MRC Cambridge Stem Cell Institute, Cambridge Biomedical Campus, Cambridge, CB2 0AW, UK.
|
26
|
Zhao L, Zhang H, Li N, Chen J, Xu H, Wang Y, Liang Q. Network pharmacology, a promising approach to reveal the pharmacology mechanism of Chinese medicine formula. J Ethnopharmacol 2023; 309:116306. [PMID: 36858276 DOI: 10.1016/j.jep.2023.116306] [Citation(s) in RCA: 99] [Impact Index Per Article: 99.0] [Received: 11/16/2022] [Revised: 02/06/2023] [Accepted: 02/19/2023] [Indexed: 05/20/2023]
Abstract
ETHNOPHARMACOLOGICAL RELEVANCE Network pharmacology is a new discipline based on systems biology theory, network analysis of biological systems, and the selection of specific signal nodes for multi-target drug molecule design. The mechanism of action of traditional Chinese medicine (TCM) formulas is characterized by multiple targets and levels. This mechanism parallels the holistic, systematic, and comprehensive character of network pharmacology, so network pharmacology is suitable for the study of the pharmacological mechanism of Chinese medicine compounds. AIM OF THE STUDY The paper summarizes the present application status and existing problems of network pharmacology in the field of Chinese medicine formulas, and sets out the research ideas, up-to-date key technologies, and application methods and strategies of network pharmacology. Its purpose is to provide guidance and reference for using network pharmacology to reveal the modern scientific connotation of Chinese medicine. MATERIALS AND METHODS The literature for this review was searched in PubMed, China National Knowledge Infrastructure (CNKI), Web of Science, ScienceDirect and Google Scholar using the keywords "traditional Chinese medicine", "Chinese herb medicine" and "network pharmacology". The literature cited in this review dates from 2002 to 2022. RESULTS Using network pharmacology methods to predict the basis and mechanism of pharmacodynamic substances of traditional Chinese medicines has become a trend. CONCLUSION Network pharmacology is a promising approach to reveal the pharmacological mechanism of Chinese medicine formulas.
Affiliation(s)
- Li Zhao
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
| | - Hong Zhang
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
| | - Ning Li
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
| | - Jinman Chen
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
| | - Hao Xu
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China
| | - Yongjun Wang
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China.
| | - Qianqian Liang
- Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China; Spine Institute, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China; Key Laboratory of Ministry of Education of Theory and Therapy of Muscles and Bones, Shanghai University of Traditional Chinese Medicine, Shanghai, 200032, China.
|
27
|
Zhan Y, Liu J, Wu M, Tan CSH, Li X, Ou-Yang L. A partially shared joint clustering framework for detecting protein complexes from multiple state-specific signed interaction networks. Comput Biol Med 2023; 159:106936. [PMID: 37105110 DOI: 10.1016/j.compbiomed.2023.106936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 03/27/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
Detecting protein complexes is critical for studying cellular organizations and functions. The accumulation of protein-protein interaction (PPI) data enables the identification of protein complexes computationally. Although a great number of computational methods have been proposed to identify protein complexes from PPI networks, most of them ignore the signs of PPIs that reflect the ways proteins interact (activation or inhibition). As not all PPIs imply co-complex relationships, taking into account the signs of PPIs can benefit the identification of protein complexes. Moreover, PPI networks are not static, but vary with changes in cell states or environments. However, existing methods are primarily designed for single-network clustering, and rarely consider joint clustering of multiple PPI networks. In this study, we propose a novel partially shared signed network clustering (PS-SNC) model for jointly identifying protein complexes from multiple state-specific signed PPI networks. PS-SNC not only considers the signs of PPIs, but also identifies the common and unique protein complexes in different states. Experimental results on synthetic and real datasets show that our PS-SNC model can achieve better performance than other state-of-the-art protein complex detection methods. Extensive analysis on real datasets demonstrates the effectiveness of PS-SNC in revealing novel insights about the underlying patterns of different cell lines.
Affiliation(s)
- Youlin Zhan
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Jiahan Liu
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
| | - Min Wu
- Institute for Infocomm Research (I2R), Agency of Science, Technology, and Research (A*STAR), 138632, Singapore
| | - Chris Soon Heng Tan
- Department of Chemistry, College of Science, Southern University of Science and Technology, Shenzhen, 518055, China
| | - Xiaoli Li
- Institute for Infocomm Research (I2R), Agency of Science, Technology, and Research (A*STAR), 138632, Singapore
| | - Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy(SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518129, China.
|
28
|
Altuntas V. Diffusion Alignment Coefficient (DAC): A Novel Similarity Metric for Protein-Protein Interaction Network. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:894-903. [PMID: 35737632 DOI: 10.1109/tcbb.2022.3185406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Interaction networks can be used to predict the functions of unknown proteins using known interactions and proteins with known functions. Many graph theory or diffusion-based methods have been proposed, using the assumption that the topological properties of a protein in a network are related to its biological function. Here we seek to improve function prediction by finding more similar neighbors with a new diffusion-based alignment technique to overcome the topological information loss of the node. In this study, we introduce the Diffusion Alignment Coefficient (DAC) algorithm, which combines diffusion, longest common subsequence, and longest common substring techniques to measure the similarity of two nodes in protein interaction networks. As a proof of concept, our experiments, conducted on real PPI networks of S. cerevisiae and Homo sapiens, demonstrated that our method obtained better results than competitors for MIPS and MSigDB Collections hallmark gene set functional categories. This is the first study to develop a measure of node function similarity using alignment to consider the positions of nodes in protein-protein interaction networks. According to the experimental results, the use of spatial information belonging to the nodes in the network has a positive effect on the detection of more functionally similar neighboring nodes.
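The two alignment primitives named in this abstract, longest common subsequence and longest common substring, can be sketched as follows. The abstract does not say how DAC combines them with diffusion, so this sketch shows only the two standard dynamic-programming routines. The function names and the example protein orderings are hypothetical.

```python
def longest_common_subsequence_length(a, b):
    """Length of the longest sequence of items appearing in both a and b
    in the same order, not necessarily contiguously."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def longest_common_substring_length(a, b):
    """Length of the longest contiguous run of items shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            # A run extends only when the current items match.
            run = prev[j - 1] + 1 if x == y else 0
            cur.append(run)
            best = max(best, run)
        prev = cur
    return best


# Hypothetical example: two nodes described by ordered lists of proteins
# (for instance, neighbours ranked by a diffusion score).
ranking_u = ["P1", "P2", "P3", "P4"]
ranking_v = ["P1", "P3", "P4", "P5"]
print(longest_common_subsequence_length(ranking_u, ranking_v))  # 3 (P1, P3, P4)
print(longest_common_substring_length(ranking_u, ranking_v))    # 2 (P3, P4)
```

Both functions accept any two sequences of comparable items, so they work on strings as well as on lists of protein identifiers.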
|
29
|
Barrio-Hernandez I, Schwartzentruber J, Shrivastava A, Del-Toro N, Gonzalez A, Zhang Q, Mountjoy E, Suveges D, Ochoa D, Ghoussaini M, Bradley G, Hermjakob H, Orchard S, Dunham I, Anderson CA, Porras P, Beltrao P. Network expansion of genetic associations defines a pleiotropy map of human cell biology. Nat Genet 2023; 55:389-398. [PMID: 36823319 PMCID: PMC10011132 DOI: 10.1038/s41588-023-01327-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 14.0] [Received: 07/19/2021] [Accepted: 01/30/2023] [Indexed: 02/25/2023]
Abstract
Interacting proteins tend to have similar functions, influencing the same organismal traits. Interaction networks can be used to expand the list of candidate trait-associated genes from genome-wide association studies. Here, we performed network-based expansion of trait-associated genes for 1,002 human traits showing that this recovers known disease genes or drug targets. The similarity of network expansion scores identifies groups of traits likely to share an underlying genetic and biological process. We identified 73 pleiotropic gene modules linked to multiple traits, enriched in genes involved in processes such as protein ubiquitination and RNA processing. In contrast to gene deletion studies, pleiotropy as defined here captures specifically multicellular-related processes. We show examples of modules linked to human diseases enriched in genes with known pathogenic variants that can be used to map targets of approved drugs for repurposing. Finally, we illustrate the use of network expansion scores to study genes at inflammatory bowel disease genome-wide association study loci, and implicate inflammatory bowel disease-relevant genes with strong functional and genetic support.
Affiliation(s)
- Inigo Barrio-Hernandez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Jeremy Schwartzentruber
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Anjali Shrivastava
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Asier Gonzalez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Qian Zhang
- Wellcome Sanger Institute, Cambridge, UK
| | - Edward Mountjoy
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Daniel Suveges
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - David Ochoa
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Maya Ghoussaini
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Glyn Bradley
- Computational Biology, Genomic Sciences, GSK, Stevenage, UK
| | - Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Ian Dunham
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Carl A Anderson
- Open Targets, Cambridge, UK
- Wellcome Sanger Institute, Cambridge, UK
| | - Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK
- Open Targets, Cambridge, UK
| | - Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Cambridge, UK.
- Open Targets, Cambridge, UK.
- Institute of Molecular Systems Biology, ETH Zürich, Zürich, Switzerland.
|
30
|
Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023; 18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open
Abstract
Graph analytical approaches permit identifying novel genes involved in complex diseases, but are limited by (i) inferring structural network similarity of connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, missing gene-disease associations' complexity; (iii) relying on disease/gene-phenotype associations' similarities, involving highly incomplete data; (iv) using binary classification, with gene-disease edges as positive training samples, and non-associated gene and disease nodes as negative samples that may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. Addressing these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model that includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional representation of nodes accounting for network structure, and extending beyond network structure using the developed Gene-Disease Prioritization Score (GDPS) reflecting the degree of gene-disease association via gene co-expression data. For negative training samples, we select non-associated gene and disease nodes with lower GDPS that are less likely to be affiliated. We evaluate the developed model's success in predicting novel disease genes by analyzing the prediction probabilities of gene-disease associations. HetIG-PreDiG successfully predicts (Micro-F1 = 0.95) gene-disease associations, outperforming baseline models, and is validated using published literature, thus advancing our understanding of complex genetic diseases.
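The Micro-F1 score quoted in this abstract pools true positives, false positives, and false negatives over all classes before computing F1. The sketch below illustrates that pooling only; the function name `micro_f1` and the toy labels are illustrative assumptions and are not taken from the HetIG-PreDiG paper.

```python
def micro_f1(y_true, y_pred, labels):
    """Micro-averaged F1: pool TP, FP and FN over all labels, then compute F1."""
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this label
            elif pred == label and truth != label:
                fp += 1  # predicted this label, but it was wrong
            elif pred != label and truth == label:
                fn += 1  # missed an instance of this label
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy example (hypothetical): 1 = gene-disease edge present, 0 = absent.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
score = micro_f1(y_true, y_pred, labels=[0, 1])
```

For single-label classification, as in this toy case, the pooled score coincides with plain accuracy (four of five predictions are correct).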
Affiliation(s)
- Kathleen M. Jagodnik
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
- Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America
| | - Yael Shvili
- Department of Surgery A, Meir Medical Center, Kfar Sava, Israel
| | - Alon Bartal
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
|
31
|
Zhang Y, Xiang J, Tang L, Yang J, Li J. PGAGP: Predicting pathogenic genes based on adaptive network embedding algorithm. Front Genet 2023; 13:1087784. [PMID: 36744177 PMCID: PMC9895109 DOI: 10.3389/fgene.2022.1087784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 12/09/2022] [Indexed: 01/21/2023] Open
Abstract
The study of disease-gene associations is an important topic in computational biology. The accumulation of massive amounts of biomedical data provides new possibilities for exploring potential relations between diseases and genes through computational strategies, but extracting valuable information from the data to predict pathogenic genes accurately and rapidly remains a challenging and meaningful task. Therefore, we present a novel computational method called PGAGP for inferring potential pathogenic genes based on an adaptive network embedding algorithm. PGAGP first extracts initial node features from a heterogeneous network of diseases and genes by Gaussian random projection and then optimizes these features through an adaptive refining process. The resulting low-dimensional features are used to improve the disease-gene heterogeneous network, and we apply network propagation to the improved heterogeneous network to predict pathogenic genes more effectively. Through a series of experiments, we study the effect of PGAGP's parameters and integration strategies on predictive performance and confirm that PGAGP outperforms state-of-the-art algorithms. Case studies show that many of the predicted candidate genes for specific diseases are supported as disease-related by literature verification and enrichment analysis, which further verifies the effectiveness of PGAGP. Overall, this work provides a useful solution for mining disease-gene heterogeneous networks to predict pathogenic genes more effectively.
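Gaussian random projection, the first step named in this abstract, maps each node's high-dimensional row to a short vector by multiplying it with a matrix of independent Gaussian entries. The following is a minimal sketch of that generic technique only. Which matrix PGAGP actually projects, and how the adaptive refinement works, are not given here, so the function `gaussian_random_projection`, the toy adjacency rows, and the choice of three dimensions are all assumptions.

```python
import math
import random


def gaussian_random_projection(rows, dim, seed=0):
    """Project each row to `dim` features using a random Gaussian matrix."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    # Entries drawn from N(0, 1/dim), so squared norms are preserved in expectation.
    projection = [
        [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
        for _ in range(n_features)
    ]
    return [
        [sum(row[k] * projection[k][j] for k in range(n_features)) for j in range(dim)]
        for row in rows
    ]


# Hypothetical toy network: rows of a 6-node adjacency matrix used as raw features.
adjacency_rows = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
embedding = gaussian_random_projection(adjacency_rows, dim=3, seed=0)
```

Fixing the seed makes the projection reproducible, which matters when the low-dimensional features are later compared across runs.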
Affiliation(s)
- Yan Zhang
- School of Computer Science and Engineering, Central South University, Changsha, China
- School of Information Science and Engineering, Changsha Medical University, Changsha, China
- Academician Workstation, Changsha Medical University, Changsha, China
| | - Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha, China
- School of Information Science and Engineering, Changsha Medical University, Changsha, China
- Academician Workstation, Changsha Medical University, Changsha, China
- School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
- Department of Basic Medical Sciences and Neuroscience Research Center, Changsha Medical University, Changsha, China
| | - Liang Tang
- Academician Workstation, Changsha Medical University, Changsha, China
- Department of Basic Medical Sciences and Neuroscience Research Center, Changsha Medical University, Changsha, China
| | - Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha, China
- Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
- Geneis Beijing Co., Ltd, Beijing, China
| | - Jianming Li
- Academician Workstation, Changsha Medical University, Changsha, China
- Department of Basic Medical Sciences and Neuroscience Research Center, Changsha Medical University, Changsha, China
|
32
|
Mapping the common gene networks that underlie related diseases. Nat Protoc 2023:10.1038/s41596-022-00797-1. [PMID: 36653526 DOI: 10.1038/s41596-022-00797-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2022] [Accepted: 11/21/2022] [Indexed: 01/19/2023]
Abstract
A longstanding goal of biomedicine is to understand how alterations in molecular and cellular networks give rise to the spectrum of human diseases. For diseases with shared etiology, understanding the common causes allows for improved diagnosis of each disease, development of new therapies and more comprehensive identification of disease genes. Accordingly, this protocol describes how to evaluate the extent to which two diseases, each characterized by a set of mapped genes, are colocalized in a reference gene interaction network. This procedure uses network propagation to measure the network 'distance' between gene sets. For colocalized diseases, the network can be further analyzed to extract common gene communities at progressive granularities. In particular, we show how to: (1) obtain input gene sets and a reference gene interaction network; (2) identify common subnetworks of genes that encompass or are in close proximity to all gene sets; (3) use multiscale community detection to identify systems and pathways represented by each common subnetwork to generate a network colocalized systems map; (4) validate identified genes and systems using a mouse variant database; and (5) visualize and further investigate select genes, interactions and systems for relevance to phenotype(s) of interest. We demonstrate the utility of this approach by identifying shared biological mechanisms underlying autism and congenital heart disease. However, this protocol is general and can be applied to any gene sets attributed to diseases or other phenotypes with suspected joint association. A typical NetColoc run takes less than an hour. Software and documentation are available at https://github.com/ucsd-ccbb/NetColoc .
|
33
|
de la Fuente L, Del Pozo-Valero M, Perea-Romero I, Blanco-Kelly F, Fernández-Caballero L, Cortón M, Ayuso C, Mínguez P. Prioritization of New Candidate Genes for Rare Genetic Diseases by a Disease-Aware Evaluation of Heterogeneous Molecular Networks. Int J Mol Sci 2023; 24:ijms24021661. [PMID: 36675175 PMCID: PMC9864172 DOI: 10.3390/ijms24021661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2022] [Revised: 01/10/2023] [Accepted: 01/11/2023] [Indexed: 01/18/2023] Open
Abstract
Screening for pathogenic variants in the diagnosis of rare genetic diseases can now be performed on all genes thanks to the application of whole exome and genome sequencing (WES, WGS). Yet the repertoire of gene-disease associations is not complete. Several computer-based algorithms and databases integrate distinct gene-gene functional networks to accelerate the discovery of gene-disease associations. We hypothesize that the ability of every type of information to extract relevant insights is disease-dependent. We compiled 33 functional networks classified into 13 knowledge categories (KCs) and observed large variability in their ability to recover genes associated with 91 genetic diseases, as measured using efficiency and exclusivity. We developed GLOWgenes, a network-based algorithm that applies random walk with restart to evaluate KCs' ability to recover genes from a given list associated with a phenotype and modulates the prediction of new candidates accordingly. Comparison with other integration strategies and tools shows that our disease-aware approach can boost the discovery of new gene-disease associations, especially for the less obvious ones. KC contribution also varies if obtained using recently discovered genes. Applied to 15 unsolved WES, GLOWgenes proposed three new genes to be involved in the phenotypes of patients with syndromic inherited retinal dystrophies.
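Random walk with restart, the propagation step this abstract mentions, scores every gene by how often a walker that keeps returning to the known disease genes visits it. The sketch below shows the generic iteration only and is not the GLOWgenes implementation; the function name, the restart probability of 0.5, and the four-node path network are illustrative assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for a walker restarting at `seeds`."""
    n = len(adjacency)
    # Column-normalise the adjacency matrix into transition probabilities.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    # The restart distribution is uniform over the seed genes.
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        updated = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(updated, p))
        p = updated
        if change < tol:
            break
    return p


# Hypothetical path network 0 - 1 - 2 - 3 with gene 0 as the only known disease gene.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(path_graph, seeds={0})
```

On this toy network the scores fall off with distance from the seed, which is the behaviour that lets candidate genes be ranked by network proximity to known disease genes.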
Affiliation(s)
- Lorena de la Fuente
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Bioinformatics Unit, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
| | - Marta Del Pozo-Valero
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
| | - Irene Perea-Romero
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
| | - Fiona Blanco-Kelly
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
| | - Lidia Fernández-Caballero
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
| | - Marta Cortón
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
| | - Carmen Ayuso
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
| | - Pablo Mínguez
- Department of Genetics, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
- Center for Biomedical Network Research on Rare Diseases (CIBERER), Instituto de Salud Carlos III (ISCIII), 28040 Madrid, Spain
- Bioinformatics Unit, Health Research Institute–Fundación Jiménez Díaz University Hospital, Universidad Autónoma de Madrid (IIS-FJD, UAM), 28049 Madrid, Spain
|
34
|
Lin L, Chen R, Zhu Y, Xie W, Jing H, Chen L, Zou M. SCCPMD: Probability matrix decomposition method subject to corrected similarity constraints for inferring long non-coding RNA-disease associations. Front Microbiol 2023; 13:1093615. [PMID: 36713213 PMCID: PMC9874942 DOI: 10.3389/fmicb.2022.1093615] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2022] [Accepted: 11/30/2022] [Indexed: 01/13/2023] Open
Abstract
Accumulating evidence has demonstrated various associations of long non-coding RNAs (lncRNAs) with human diseases, such as abnormal expression due to microbial influences that cause disease. Gaining a deeper understanding of lncRNA-disease associations is essential for disease diagnosis, treatment, and prevention. In recent years, many matrix decomposition methods have also been used to predict potential lncRNA-disease associations. However, these methods do not use microbe-disease association information to enrich disease similarity, nor do they make full use of similarity information in the decomposition process. To address these issues, we propose a correction-based similarity-constrained probability matrix decomposition method (SCCPMD) to predict lncRNA-disease associations. The microbe-disease associations are first used to enrich the disease semantic similarity matrix. A logistic function is then used to correct the lncRNA and disease similarity matrices, and the two corrected similarity matrices are added to the probability matrix decomposition as constraints to predict potential lncRNA-disease associations. The experimental results show that SCCPMD outperforms five advanced comparison algorithms. In addition, SCCPMD demonstrated excellent prediction performance in case studies of breast cancer, lung cancer, and renal cell carcinoma, with prediction accuracy reaching 80%, 100%, and 100%, respectively. Therefore, SCCPMD shows excellent predictive performance in identifying unknown lncRNA-disease associations.
Affiliation(s)
- Lieqing Lin
- Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
| | - Ruibin Chen
- School of Computer, Guangdong University of Technology, Guangzhou, China
| | - Yinting Zhu
- School of Computer, Guangdong University of Technology, Guangzhou, China
| | - Weijie Xie
- School of Computer, Guangdong University of Technology, Guangzhou, China
| | - Huaiguo Jing
- Sports Department, Guangdong University of Technology, Guangzhou, China
- *Correspondence: Huaiguo Jing
| | - Langcheng Chen
- Center of Campus Network & Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
- *Correspondence: Langcheng Chen
| | - Minqing Zou
- Department of Experiment Teaching, Guangdong University of Technology, Guangzhou, China
|
35
|
Koch E, Kauppi K, Chen CH. Candidates for drug repurposing to address the cognitive symptoms in schizophrenia. Prog Neuropsychopharmacol Biol Psychiatry 2023; 120:110637. [PMID: 36099967 DOI: 10.1016/j.pnpbp.2022.110637] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 07/23/2022] [Accepted: 09/07/2022] [Indexed: 01/24/2023]
Abstract
In the protein-protein interactome, we have previously identified a significant overlap between schizophrenia risk genes and genes associated with cognitive performance. Here, we further studied this overlap to identify potential candidate drugs for repurposing to treat the cognitive symptoms in schizophrenia. We first defined a cognition-related schizophrenia interactome from network propagation analyses, and identified drugs known to target more than one protein within this network. Thereafter, we used gene expression data to further select drugs that could counteract schizophrenia-associated gene expression perturbations. Additionally, we stratified these analyses by sex to identify sex-specific pharmacological treatment options for the cognitive symptoms in schizophrenia. After excluding drugs contraindicated in schizophrenia, we identified 12 drug repurposing candidates, most of which have anti-inflammatory and neuroprotective effects. Sex-stratified analyses showed that out of these 12 drugs, four were identified in females only, three were identified in males only, and five were identified in both sexes. Based on our bioinformatics analyses of disease genetics, we suggest 12 candidate drugs that warrant further examination for repurposing to treat the cognitive symptoms in schizophrenia, and suggest that these symptoms could be addressed by sex-specific pharmacological treatment options.
Affiliation(s)
- Elise Koch
- Department of Integrative Medical Biology, Umeå University, Umeå, Sweden
- NORMENT, Centre for Mental Disorders Research, Division of Mental Health and Addiction, Oslo University Hospital, and Institute of Clinical Medicine, University of Oslo, Oslo, Norway
| | - Karolina Kauppi
- Department of Integrative Medical Biology, Umeå University, Umeå, Sweden
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Solna, Sweden
| | - Chi-Hua Chen
- Department of Radiology and Center for Multimodal Imaging and Genetics, University of California San Diego, USA
|
36
|
Yang X, Yang G, Chu J. The Neural Metric Factorization for Computational Drug Repositioning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:731-741. [PMID: 35061591 DOI: 10.1109/tcbb.2022.3144429] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Computational drug repositioning aims to discover new therapeutic indications for marketed drugs and has the advantages of low cost, a short development cycle, and high controllability compared with traditional drug development. The matrix factorization model has become the cornerstone technique for computational drug repositioning because of its ease of implementation and excellent scalability. However, the matrix factorization model uses the inner product to represent the association between drugs and diseases, which limits its expressive ability. Moreover, the degree of similarity between drugs or between diseases is not reflected in their respective latent factor vectors, which conflicts with the common understanding in drug discovery. Therefore, a neural metric factorization model for computational drug repositioning (NMFDR) is proposed in this work. We treat the latent factor vectors of drugs and diseases as points in a high-dimensional coordinate system and propose a generalized Euclidean distance to represent the association between drugs and diseases, compensating for the shortcomings of the inner product. Furthermore, by embedding multiple drug (disease) metrics into the encoding space of the latent factor vectors, the similarity between drugs (diseases) can be reflected in the distance between latent factor vectors. Finally, we conduct extensive experiments on three real datasets to demonstrate the effectiveness of these improvements and the superiority of the NMFDR model.
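The contrast between inner-product scoring and distance-based scoring can be seen on a toy case. The sketch below uses the plain Euclidean distance rather than the generalized distance proposed for NMFDR, whose definition is not given in this abstract; the two-dimensional latent vectors and the names `drug_a`, `drug_b`, and `disease` are made up for illustration.

```python
import math


def inner_product(u, v):
    """Association score used by standard matrix factorization (higher = stronger)."""
    return sum(a * b for a, b in zip(u, v))


def euclidean_distance(u, v):
    """Metric-style score (lower = stronger association)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical latent factor vectors.
disease = [1.0, 0.0]
drug_a = [1.0, 0.0]  # sits exactly on the disease in latent space
drug_b = [3.0, 0.0]  # same direction, but far away

# The inner product prefers drug_b simply because its vector is longer.
ip_a, ip_b = inner_product(drug_a, disease), inner_product(drug_b, disease)

# The distance prefers drug_a, the point that actually coincides with the disease.
dist_a, dist_b = euclidean_distance(drug_a, disease), euclidean_distance(drug_b, disease)
```

Because a distance obeys the triangle inequality, two drugs that are both close to the same disease must also be close to each other, which is the similarity property the inner product does not guarantee.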
|
37
|
Ma J, Qin T, Xiang J. Disease-gene prediction based on preserving structure network embedding. Front Aging Neurosci 2023; 15:1061892. [PMID: 36896421 PMCID: PMC9990751 DOI: 10.3389/fnagi.2023.1061892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 01/30/2023] [Indexed: 02/23/2023] Open
Abstract
Many diseases, such as Alzheimer's disease (AD) and Parkinson's disease (PD), are caused by abnormalities or mutations of related genes. Many computational methods based on the network relationship between diseases and genes have been proposed to predict potential pathogenic genes. However, how to effectively mine the disease-gene relationship network to better predict disease genes is still an open problem. In this paper, a disease-gene prediction method based on preserving structure network embedding (PSNE) is introduced. To predict pathogenic genes more effectively, a heterogeneous network with multiple types of bio-entities was constructed by integrating disease-gene associations, the human protein network, and disease-disease associations. Furthermore, the low-dimensional features of nodes extracted from the network were used to reconstruct a new disease-gene heterogeneous network. Compared with other advanced methods, PSNE was confirmed to be more effective in disease-gene prediction. Finally, we applied the PSNE method to predict potential pathogenic genes for age-associated diseases such as AD and PD, and we assessed these predicted genes through literature verification. Overall, this work provides an effective method for disease-gene prediction and a series of high-confidence potential pathogenic genes for AD and PD, which may be helpful for the experimental discovery of disease genes.
Affiliation(s)
- Jinlong Ma
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
| | - Tian Qin
- School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
| | - Ju Xiang
- School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
- Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
|
38
|
Genome-Wide Association Screening Determines Peripheral Players in Male Fertility Maintenance. Int J Mol Sci 2022; 24:ijms24010524. [PMID: 36613967 PMCID: PMC9820667 DOI: 10.3390/ijms24010524] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/04/2022] [Revised: 12/21/2022] [Accepted: 12/24/2022] [Indexed: 12/30/2022] Open
Abstract
Deciphering the functional relationships of genes resulting from genome-wide screens for polymorphisms that are associated with phenotypic variations can be challenging. However, given the common association with certain phenotypes, a functional link should exist. We have tested this prediction in newly sequenced exomes of 100 men in total, representing different states of fertility. Fertile subjects presented with normal semen parameters and had naturally fathered offspring. In contrast, infertile probands were involuntarily childless and had reduced sperm quantity and quality. A genome-wide association study (GWAS) linked twelve non-synonymous single-nucleotide polymorphisms (SNPs) to fertility variation between the two cohorts. The SNPs localized to nine genes for which previous evidence is in line with a role in male fertility maintenance: ANAPC1, CES1, FAM131C, HLA-DRB1, KMT2C, NOMO1, SAA1, SRGAP2, and SUSD2. Most of the SNPs residing in these genes imply amino acid exchanges that should only moderately affect protein functionality. In addition, proteins encoded by genes from the present GWAS occupied peripheral positions in a protein-protein interaction network, the backbone of which consisted of genes listed in the Online Mendelian Inheritance in Man (OMIM) database for their implication in male infertility. Suggestive of an indirect impact on male fertility, the genes in focus were indeed linked to each other, albeit through other interactants. Thus, the chances of identifying a central player in male infertility by GWAS could be limited in general. Furthermore, the SNPs determined and the genes containing them might prove to have potential as biomarkers in the diagnosis of male fertility.
|
39
|
Delmas M, Filangi O, Duperier C, Paulhe N, Vinson F, Rodriguez-Mier P, Giacomoni F, Jourdan F, Frainay C. Suggesting disease associations for overlooked metabolites using literature from metabolic neighbors. Gigascience 2022; 12:giad065. [PMID: 37712592 PMCID: PMC10502579 DOI: 10.1093/gigascience/giad065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/17/2023] [Revised: 06/13/2023] [Accepted: 07/28/2023] [Indexed: 09/16/2023] Open
Abstract
In human health research, metabolic signatures extracted from metabolomics data have a strong added value for stratifying patients and identifying biomarkers. Nevertheless, one of the main challenges is to interpret and relate these lists of discriminant metabolites to pathological mechanisms. This task requires experts to combine their knowledge with information extracted from databases and the scientific literature. However, we show that most compounds (>99%) in the PubChem database lack annotated literature. This dearth of available information can have a direct impact on the interpretation of metabolic signatures, which is often restricted to a subset of significant metabolites. To suggest potential pathological phenotypes related to overlooked metabolites that lack annotated literature, we extend the "guilt-by-association" principle to literature information by using a Bayesian framework. The underlying assumption is that the literature associated with the metabolic neighbors of a compound can provide valuable insights, or a prior, into its biomedical context. The metabolic neighborhood of a compound can be defined from a metabolic network and corresponds to the metabolites to which it is connected through biochemical reactions. With the proposed approach, we suggest more than 35,000 associations between 1,047 overlooked metabolites and 3,288 diseases (or disease families). All these newly inferred associations are freely available on the FORUM ftp server (see information at https://github.com/eMetaboHUB/Forum-LiteraturePropagation).
Affiliation(s)
- Maxime Delmas
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
| | - Olivier Filangi
- IGEPP, INRAE, Institut Agro, Université de Rennes, Domaine de la Motte, 35653 Le Rheu, France
| | - Christophe Duperier
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
| | - Nils Paulhe
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
| | - Florence Vinson
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
| | - Pablo Rodriguez-Mier
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
| | - Franck Giacomoni
- Université Clermont Auvergne, INRAE, UNH, Plateforme d’Exploration du Métabolisme, MetaboHUB Clermont, F-63000 Clermont-Ferrand, France
| | - Fabien Jourdan
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
- MetaboHUB-Metatoul, National Infrastructure of Metabolomics and Fluxomics, Toulouse, 31300, France
| | - Clément Frainay
- Toxalim (Research Center in Food Toxicology), Université de Toulouse, INRAE, ENVT, INP-Purpan, UPS, 31300 Toulouse, France
|
40
|
Barrio-Hernandez I, Beltrao P. Network analysis of genome-wide association studies for drug target prioritisation. Curr Opin Chem Biol 2022; 71:102206. [PMID: 36087372 DOI: 10.1016/j.cbpa.2022.102206] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2021] [Revised: 07/29/2022] [Accepted: 08/05/2022] [Indexed: 01/27/2023]
Abstract
Over the past decades, genome-wide association studies (GWAS) have led to a dramatic expansion of genetic variants implicated in human traits and diseases. These advances are expected to result in new drug targets, but the identification of causal genes and the cell biology underlying human diseases from GWAS remains challenging. Here, we review protein interaction network-based methods to analyse GWAS data. These approaches can rank candidate drug targets at GWAS-associated loci or among interactors of disease genes without direct genetic support. These methods identify the cell biology affected in common across diseases, offering opportunities for drug repurposing, and can be combined with expression data to identify focal tissues and cell types. Going forward, we expect that these methods will further improve from advances in the characterisation of context-specific interaction networks and the joint analysis of rare and common genetic signals.
Affiliation(s)
- Inigo Barrio-Hernandez
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
| | - Pedro Beltrao
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Cambridge, CB10 1SD, UK
- Open Targets, Wellcome Genome Campus, Cambridge, CB10 1SA, UK
- Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Zurich, 8093, Switzerland
|
41
|
Chitra U, Park TY, Raphael BJ. NetMix2: A Principled Network Propagation Algorithm for Identifying Altered Subnetworks. J Comput Biol 2022; 29:1305-1323. [PMID: 36525308 PMCID: PMC9917315 DOI: 10.1089/cmb.2022.0336] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2022] Open
Abstract
A standard paradigm in computational biology is to leverage interaction networks as prior knowledge in analyzing high-throughput biological data, where the data give a score for each vertex in the network. One classical approach is the identification of altered subnetworks, or subnetworks of the interaction network that have both outlier vertex scores and a defined network topology. One class of algorithms for identifying altered subnetworks searches for high-scoring subnetworks in subnetwork families with simple topological constraints, such as connected subnetworks, and has sound statistical guarantees. A second class of algorithms employs network propagation (the smoothing of vertex scores over the network using a random walk or diffusion process) and utilizes the global structure of the network. However, network propagation algorithms often rely on ad hoc heuristics that lack a rigorous statistical foundation. In this work, we unify the subnetwork family and network propagation approaches by deriving the propagation family, a subnetwork family that approximates the sets of vertices ranked highly by network propagation approaches. We introduce NetMix2, a principled algorithm for identifying altered subnetworks from a wide range of subnetwork families. When using the propagation family, NetMix2 combines the advantages of the subnetwork family and network propagation approaches. NetMix2 outperforms other methods, including network propagation, on simulated data, pan-cancer somatic mutation data, and genome-wide association data from multiple human diseases.
Affiliation(s)
- Uthsav Chitra
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
| | - Tae Yoon Park
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
| | - Benjamin J. Raphael
- Department of Computer Science, Princeton University, Princeton, New Jersey, USA
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, USA
| |
|
42
|
He B, Wang K, Xiang J, Bing P, Tang M, Tian G, Guo C, Xu M, Yang J. DGHNE: network enhancement-based method in identifying disease-causing genes through a heterogeneous biomedical network. Brief Bioinform 2022; 23:6712302. [PMID: 36151744 DOI: 10.1093/bib/bbac405] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/16/2022] [Revised: 08/01/2022] [Accepted: 08/21/2022] [Indexed: 12/14/2022]
Abstract
The identification of disease-causing genes is critical for mechanistic understanding of disease etiology and clinical manipulation in disease prevention and treatment. Yet the existing approaches to this question are inadequate in accuracy and efficiency, demanding computational methods with higher identification power. Here, we proposed a new method called DGHNE to identify disease-causing genes through a heterogeneous biomedical network empowered by network enhancement. First, a disease-disease association network was constructed from the cosine similarity scores between phenotype annotation vectors of diseases, and a new heterogeneous biomedical network was constructed by using disease-gene associations to connect the disease-disease network and gene-gene network. Then, the heterogeneous biomedical network was further enhanced by using network embedding based on the Gaussian random projection. Finally, network propagation was used to identify candidate genes in the enhanced network. We applied DGHNE together with five other methods to the most up-to-date disease-gene association database, termed DisGeNet. Compared with all other methods, DGHNE displayed the highest area under the receiver operating characteristic curve and the precision-recall curve, as well as the highest precision and recall, in both the global 5-fold cross-validation and predicting new disease-gene associations. We further applied DGHNE to identify the candidate causal genes of Parkinson's disease and diabetes mellitus, and the genes connecting hyperglycemia and diabetes mellitus. In all cases, the predicted causal genes were enriched in disease-associated gene ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways, and the gene-disease associations were strongly supported by independent experimental studies.
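The first step described in this abstract, building a disease-disease network from cosine similarity between phenotype annotation vectors, can be sketched as follows. This is only a minimal illustration under stated assumptions: the disease names, the binary annotation vectors, the `threshold` parameter, and the helper names are hypothetical, and the later steps (network enhancement, propagation) are not shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a disease without annotations is similar to nothing
    return dot / (norm_a * norm_b)


def disease_network(annotations, threshold=0.0):
    """Return weighted edges between diseases whose similarity exceeds threshold."""
    names = sorted(annotations)
    edges = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            similarity = cosine_similarity(annotations[first], annotations[second])
            if similarity > threshold:
                edges[(first, second)] = similarity
    return edges


# Hypothetical phenotype annotation vectors (1 = phenotype term annotated).
phenotypes = {
    "disease_1": [1, 1, 0, 0],
    "disease_2": [1, 0, 0, 0],
    "disease_3": [0, 0, 1, 1],
}
network = disease_network(phenotypes)
```

Here only `disease_1` and `disease_2` share a phenotype term, so they are the only pair connected by an edge.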
Affiliation(s)
- Binsheng He
- Academician Workstation, Changsha Medical University, Changsha 410219, China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China; School of Pharmacy, Changsha Medical University, Changsha 410219, P. R. China
| | - Kun Wang
- School of Mathematical Sciences, Ocean University of China, Qingdao 266100, China
| | - Ju Xiang
- Academician Workstation, Changsha Medical University, Changsha 410219, China
| | - Pingping Bing
- Academician Workstation, Changsha Medical University, Changsha 410219, China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China; School of Pharmacy, Changsha Medical University, Changsha 410219, P. R. China
| | - Min Tang
- School of Life Sciences, Jiangsu University, Zhenjiang 212001, Jiangsu, China
| | - Geng Tian
- Geneis (Beijing) Co., Ltd., Beijing 100102, China
| | - Cheng Guo
- Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
| | - Miao Xu
- Broad Institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
| | - Jialiang Yang
- Academician Workstation, Changsha Medical University, Changsha 410219, China; Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China; School of Pharmacy, Changsha Medical University, Changsha 410219, P. R. China; Geneis (Beijing) Co., Ltd., Beijing 100102, China
| |
|
43
|
Hu S, Luo Y, Zhang Z, Xiong H, Yan W, Jiang M, Zhao B. Protein function annotation based on heterogeneous biological networks. BMC Bioinformatics 2022; 23:493. [DOI: 10.1186/s12859-022-05057-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2022] [Accepted: 11/15/2022] [Indexed: 11/19/2022]
Abstract
Background
Accurate annotation of protein function is the key to understanding life at the molecular level and has great implications for biomedicine and pharmaceuticals. The rapid developments of high-throughput technologies have generated huge amounts of protein–protein interaction (PPI) data, which prompts the emergence of computational methods to determine protein function. Plagued by errors and noises hidden in PPI data, these computational methods have undertaken to focus on the prediction of functions by integrating the topology of protein interaction networks and multi-source biological data. Despite effective improvement of these computational methods, it is still challenging to build a suitable network model for integrating multiplex biological data.
Results
In this paper, we constructed a heterogeneous biological network by initially integrating original protein interaction networks, protein-domain association data and protein complexes. To prove the effectiveness of the heterogeneous biological network, we applied the propagation algorithm on this network and proposed a novel iterative model, named Propagate on Heterogeneous Biological Networks (PHN), to score and rank functions in descending order from all functional partners. Finally, we picked out the top L of these predicted functions as candidates to annotate the target protein. Our comprehensive experimental results demonstrated that PHN outperformed seven other competing approaches using cross-validation. Experimental results indicated that PHN performs significantly better than competing methods and improves the Area Under the Receiver-Operating Curve (AUROC) in Biological Process (BP), Molecular Function (MF) and Cellular Components (CC) by no less than 33%, 15% and 28%, respectively.
Conclusions
We demonstrated that integrating multi-source data into a heterogeneous biological network can preserve the complex relationship among multiplex biological data and improve the prediction accuracy of protein function by getting rid of the constraints of errors in PPI networks effectively. PHN, our proposed method, is effective for protein function prediction.
|
44
|
Chen L, Lin D, Xu H, Li J, Lin L. WLLP: A weighted reconstruction-based linear label propagation algorithm for predicting potential therapeutic agents for COVID-19. Front Microbiol 2022; 13:1040252. [PMID: 36466666 PMCID: PMC9713947 DOI: 10.3389/fmicb.2022.1040252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 10/06/2022] [Indexed: 11/18/2022]
Abstract
The global coronavirus disease 2019 (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has led to huge health and economic crises. However, the research required to develop new drugs and vaccines is very expensive in terms of labor, money, and time. Owing to recent advances in data science, drug-repositioning technologies have become one of the most promising strategies available for developing effective treatment options. Using the previously reported human drug virus database (HDVD), we proposed a model to predict possible drug regimens based on a weighted reconstruction-based linear label propagation algorithm (WLLP). For the drug–virus association matrix, we used the weighted K-nearest known neighbors method for preprocessing, followed by label propagation on the network based on the linear neighborhood similarity of drugs and viruses, to obtain the final prediction results. Evaluated by the area under the receiver operating characteristic (ROC) curve (AUC) over 10 repetitions of 10-fold cross-validation, WLLP exhibited excellent performance with an AUC of 0.8828 ± 0.0037 and an area under the precision-recall curve of 0.5277 ± 0.0053, outperforming the other four models used for comparison. We also predicted effective drug regimens against SARS-CoV-2, and this case study showed that WLLP can be used to suggest potential drugs for the treatment of COVID-19.
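The abstract names a weighted K-nearest known neighbors preprocessing step but gives no formulas. The sketch below shows one simplified, row-wise form of that general idea: each drug's association profile is filled in from its most similar drugs, with a decay factor that down-weights more distant neighbours. The function name `fill_rows`, the parameters `k` and `decay`, and the toy matrices are assumptions for illustration, not the WLLP implementation.

```python
def fill_rows(associations, similarity, k=2, decay=0.5):
    """Estimate missing entries of each row from its k most similar rows.

    associations: list of rows (e.g. drugs) with 0/1 entries per column (e.g. viruses).
    similarity:   square matrix of row-to-row similarities.
    Known entries are never lowered: each cell keeps the larger of its
    original value and the neighbour-based estimate.
    """
    count = len(associations)
    filled = []
    for i in range(count):
        # Rank the other rows by similarity to row i, most similar first.
        neighbours = sorted(
            (j for j in range(count) if j != i),
            key=lambda j: similarity[i][j],
            reverse=True,
        )[:k]
        norm = sum(similarity[i][j] for j in neighbours)
        row = list(associations[i])
        if norm > 0:
            for col in range(len(row)):
                estimate = sum(
                    (decay ** rank) * similarity[i][j] * associations[j][col]
                    for rank, j in enumerate(neighbours)
                ) / norm
                row[col] = max(row[col], estimate)
        filled.append(row)
    return filled


# Hypothetical 3 drugs x 2 viruses association matrix and drug similarities.
known = [[1, 0], [0, 0], [0, 1]]
drug_similarity = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.2], [0.1, 0.2, 1.0]]
completed = fill_rows(known, drug_similarity, k=2, decay=0.5)
```

The second drug has no known associations, so its row is filled mainly from the first drug, which is its closest neighbour in this toy similarity matrix.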
Affiliation(s)
- Langcheng Chen
- Center of Campus Network and Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
| | - Dongying Lin
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
| | - Haojie Xu
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
| | - Jianming Li
- School of Computer Science, Guangdong University of Technology, Guangzhou, China
| | - Lieqing Lin
- Center of Campus Network and Modern Educational Technology, Guangdong University of Technology, Guangzhou, China
- *Correspondence: Lieqing Lin
| |
|
45
|
Bi Y, Wang P. Exploring drought-responsive crucial genes in Sorghum. iScience 2022; 25:105347. [PMID: 36325072 PMCID: PMC9619295 DOI: 10.1016/j.isci.2022.105347] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/07/2022] [Revised: 09/18/2022] [Accepted: 10/11/2022] [Indexed: 12/11/2022]
Abstract
Drought severely affects global food production. Sorghum is a typical drought-resistant model crop. Based on RNA-seq data for Sorghum with multiple time points and the gray correlation coefficient, this paper firstly selects candidate genes via mean variance test and constructs weighted gene differential co-expression networks (WGDCNs); then, based on guilt-by-rewiring principle, the WGDCNs and the hidden Markov random field model, drought-responsive crucial genes are identified for five developmental stages respectively. Enrichment and sequence alignment analysis reveal that the screened genes may play critical functional roles in drought responsiveness. A multilayer differential co-expression network for the screened genes reveals that Sorghum is very sensitive to pre-flowering drought. Furthermore, a crucial gene regulatory module is established, which regulates drought responsiveness via plant hormone signal transduction, MAPK cascades, and transcriptional regulations. The proposed method can well excavate crucial genes through RNA-seq data, which have implications in breeding of new varieties with improved drought tolerance.
- We design a method that unites gene rewiring network and Markov random field model
- Drought-responsive genes for five developmental stages of Sorghum are explored
- A multilayer network reveals that Sorghum is very sensitive to pre-flowering drought
- A drought-responsive crucial gene regulatory module is established for Sorghum
|
46
|
Wang L, Peng J, Kuang L, Tan Y, Chen Z. Identification of Essential Proteins Based on Local Random Walk and Adaptive Multi-View Multi-Label Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3507-3516. [PMID: 34788220 DOI: 10.1109/tcbb.2021.3128638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Accumulating evidence has indicated that essential proteins play vital roles in human physiological processes. In recent years, although research on the prediction of essential proteins has developed rapidly, various limitations remain, such as unsatisfactory data suitability and low accuracy of predictive results. In this manuscript, a novel method called RWAMVL was proposed to predict essential proteins based on the Random Walk and the Adaptive Multi-View multi-label Learning. In RWAMVL, considering that inherent noise is ubiquitous in existing datasets of known protein-protein interactions (PPIs), a variety of features, including biological features of proteins and topological features of PPI networks, were first obtained by adopting adaptive multi-view multi-label learning. Then, an improved random walk method was designed to detect essential proteins based on these features. Finally, to verify the predictive performance of RWAMVL, intensive experiments were done to compare it with multiple state-of-the-art predictive methods under different experimental frameworks. As a result, RWAMVL was shown to achieve better prediction accuracy than all competing methods, which also demonstrated that RWAMVL may be a potential tool for prediction of key proteins in the future.
|
47
|
Koch E, Demontis D. Drug repurposing candidates to treat core symptoms in autism spectrum disorder. Front Pharmacol 2022; 13:995439. [PMID: 36172193 PMCID: PMC9510394 DOI: 10.3389/fphar.2022.995439] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 08/03/2022] [Indexed: 11/17/2022]
Abstract
Autism spectrum disorder (ASD) is characterized by high heritability and clinical heterogeneity. The main core symptoms are social communication deficits. There are no medications approved for the treatment of these symptoms, and medications used to treat non-specific symptoms have serious side effects. To identify potential drugs for repurposing to effectively treat ASD core symptoms, we studied ASD risk genes within networks of protein-protein interactions of gene products. We first defined an ASD network from network-based analyses, and identified approved drugs known to interact with proteins within this network. Thereafter, we evaluated if these drugs can change ASD-associated gene expression perturbations in genes in the ASD network. This was done by analyses of drug-induced versus ASD-associated gene expression, where opposite gene expression perturbations in drug versus ASD indicate that the drug could counteract ASD-associated perturbations. Four drugs showing significant (p < 0.05) opposite gene expression perturbations in drug versus ASD were identified: Loperamide, bromocriptine, drospirenone, and progesterone. These drugs act on ASD-related biological systems, indicating that these drugs could effectively treat ASD core symptoms. Based on our bioinformatics analyses of ASD genetics, we shortlist potential drug repurposing candidates that warrant clinical translation to treat core symptoms in ASD.
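The comparison of drug-induced versus ASD-associated expression described in this abstract rests on a simple idea: a drug is a candidate if it tends to shift genes in the opposite direction to the disorder. The sketch below computes only that directional fraction; it is an assumption-laden toy, not the authors' statistical analysis (the abstract reports significance testing, which is not reproduced here). Gene identifiers and values are hypothetical.

```python
def opposite_fraction(drug_effect, disease_effect):
    """Fraction of shared genes whose drug-induced change opposes the disease change.

    Both arguments map gene identifiers to signed expression changes
    (for example log fold changes). Genes with a zero change are ignored.
    """
    shared = [
        gene
        for gene in drug_effect
        if gene in disease_effect
        and drug_effect[gene] != 0
        and disease_effect[gene] != 0
    ]
    if not shared:
        return 0.0
    opposite = sum(1 for gene in shared if drug_effect[gene] * disease_effect[gene] < 0)
    return opposite / len(shared)


# Hypothetical signed expression changes.
disease = {"gene_1": 1.2, "gene_2": -0.8, "gene_3": 0.5, "gene_4": -0.3}
drug = {"gene_1": -0.9, "gene_2": 0.4, "gene_3": 0.2, "gene_5": 1.0}
score = opposite_fraction(drug, disease)
```

Three genes are shared here and two of them move in opposite directions, so the score is two thirds; a real analysis would additionally test whether such a fraction exceeds what is expected by chance.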
Affiliation(s)
- Elise Koch
- Norwegian Centre for Mental Disorders Research (NORMENT), University of Oslo and Oslo University Hospital, Oslo, Norway
- *Correspondence: Elise Koch
| | - Ditte Demontis
- The Lundbeck Foundation Initiative for Integrative Psychiatric Research, iPSYCH, Aarhus, Denmark
- Department of Biomedicine (Human Genetics) and Centre for Integrative Sequencing, Aarhus University, Aarhus, Denmark
- Center for Genomics and Personalized Medicine, Aarhus, Denmark
| |
|
48
|
Khemka N, Rajkumar MS, Garg R, Jain M. Genome-wide analysis suggests the potential role of lncRNAs during seed development and seed size/weight determination in chickpea. Planta 2022; 256:79. [PMID: 36094579 DOI: 10.1007/s00425-022-03986-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/06/2022] [Accepted: 08/29/2022] [Indexed: 06/15/2023]
Abstract
The integrated transcriptome data analyses suggested plausible roles of lncRNAs during seed development in chickpea. The candidate lncRNAs associated with QTLs and those involved in miRNA-mediated seed size/weight determination in chickpea have been identified. Long non-coding RNAs (lncRNAs) are important regulators of various biological processes. Here, we identified lncRNAs at seven successive stages of seed development in small-seeded and large-seeded chickpea cultivars. In total, 4751 lncRNAs implicated in diverse biological processes were identified. Most of the lncRNAs were conserved between the two cultivars, whereas only a few of them were conserved in other plants, suggesting their species-specificity. A large number of lncRNAs that were differentially expressed between the two chickpea cultivars and associated with seed development-related processes were identified. The lncRNAs acting as precursors of miRNAs and those mimicking target protein-coding genes of miRNAs involved in seed size/weight determination, including HAIKU1, BIG SEEDS1, and SHB1, were also revealed. Further, lncRNAs located within seed size/weight associated quantitative trait loci were also detected. Overall, we present a comprehensive resource and identify candidate lncRNAs that may play important roles during seed development and seed size/weight determination in chickpea.
Affiliation(s)
- Niraj Khemka
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
| | - Mohan Singh Rajkumar
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India
| | - Rohini Garg
- Department of Life Sciences, School of Natural Sciences, Shiv Nadar University, Gautam Buddha Nagar, Uttar Pradesh, 201314, India
| | - Mukesh Jain
- School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, 110067, India.
| |
|
49
|
Scope of repurposed drugs against the potential targets of the latest variants of SARS-CoV-2. Struct Chem 2022; 33:1585-1608. [PMID: 35938064 PMCID: PMC9346052 DOI: 10.1007/s11224-022-02020-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2022] [Accepted: 07/19/2022] [Indexed: 11/21/2022]
Abstract
The unprecedented outbreak of the severe acute respiratory syndrome (SARS) Coronavirus-2 across the globe triggered a worldwide uproar in the search for immediate treatment strategies. With no specific drug and not much data available, alternative approaches such as drug repurposing came to the limelight. To date, extensive research on the repositioning of drugs has led to the identification of numerous drugs against various important protein targets of the coronavirus strains, with hopes of the drugs working against the major variants of concern (alpha, beta, gamma, delta, omicron) of the virus. Advancements in computational sciences have led to improved scope of repurposing via techniques such as structure-based approaches including molecular docking, molecular dynamics simulations and quantitative structure-activity relationships, network-based approaches, and artificial intelligence-based approaches with other core machine and deep learning algorithms. This review highlights the various approaches to repurposing drugs from a computational biological perspective, with various mechanisms of action of the drugs against some of the major protein targets of SARS-CoV-2. Additionally, clinical trials data on potential COVID-19 repurposed drugs are also highlighted with stress on the major SARS-CoV-2 targets and the structural effect of variants on these targets. The interaction modelling of some important repurposed drugs has also been elucidated. Furthermore, the merits and demerits of drug repurposing are also discussed, with a focus on the scope and applications of the latest advancements in repurposing.
|
50
|
Sun F, Sun J, Zhao Q. A deep learning method for predicting metabolite-disease associations via graph neural network. Brief Bioinform 2022; 23:6640005. [PMID: 35817399 DOI: 10.1093/bib/bbac266] [Citation(s) in RCA: 129] [Impact Index Per Article: 64.5] [Received: 04/25/2022] [Revised: 06/04/2022] [Accepted: 06/06/2022] [Indexed: 12/15/2022]
Abstract
Metabolism is the process by which an organism continuously replaces old substances with new substances. It plays an important role in maintaining human life, body growth and reproduction. A growing number of studies have shown that the concentrations of some metabolites in patients differ from those in healthy people. Traditional biological experiments can test some hypotheses and verify their relationships but usually take a considerable amount of time and money. Therefore, it is urgent to develop a new computational method to identify the relationships between metabolites and diseases. In this work, we present a new deep learning algorithm named graph convolutional network with graph attention network (GCNAT) to predict potential metabolite-disease associations. First, we construct a heterogeneous network based on known metabolite-disease associations, metabolite-metabolite similarities and disease-disease similarities. Metabolite and disease features are encoded and learned through the graph convolutional neural network. Then, a graph attention layer is used to combine the embeddings of multiple convolutional layers, and the corresponding attention coefficients are calculated to assign different weights to the embeddings of each layer. Further, the prediction result is obtained by decoding and scoring the final synthetic embeddings. Finally, GCNAT achieves an area under the receiver operating characteristic curve of 0.95 and an area under the precision-recall curve of 0.405, which are better than the results of five existing state-of-the-art predictive methods in 5-fold cross-validation, and the case studies show that the metabolite-disease correlations predicted by our method are supported by relevant experiments. We hope that GCNAT could be a useful biomedical research tool for predicting potential metabolite-disease associations in the future.
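The step in this abstract where attention coefficients weight the embeddings from several convolutional layers can be illustrated with a softmax-weighted sum. This is a minimal sketch under assumptions: in GCNAT the attention scores are learned during training, whereas here they are passed in as fixed numbers, and the names `softmax` and `combine_layers` and the toy embeddings are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_embeddings, layer_scores):
    """Weighted sum of one node's embeddings from several layers.

    layer_embeddings: one vector per layer, all of equal length.
    layer_scores:     one raw attention score per layer.
    """
    weights = softmax(layer_scores)
    size = len(layer_embeddings[0])
    combined = [
        sum(w * emb[i] for w, emb in zip(weights, layer_embeddings))
        for i in range(size)
    ]
    return combined, weights


# Hypothetical embeddings of one node from three layers, with equal scores.
layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
combined, weights = combine_layers(layers, [0.0, 0.0, 0.0])
```

With equal scores every layer receives weight one third and the result is the plain average of the layer embeddings; unequal scores would shift the combined embedding toward the higher-scoring layers.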
Affiliation(s)
- Feiyue Sun
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| | - Jianqiang Sun
- School of Automation and Electrical Engineering, Linyi University, Linyi, 276000, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China
| |
|