1
Xu M, Abdullah NA, Md Sabri AQ. A method to improve the prediction performance of cancer-gene association by screening negative training samples through gene network data. Comput Biol Chem 2024; 108:107997. [PMID: 38154318 DOI: 10.1016/j.compbiolchem.2023.107997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2023] [Revised: 11/03/2023] [Accepted: 12/03/2023] [Indexed: 12/30/2023]
Abstract
This work focuses on data sampling in cancer-gene association prediction. Researchers currently use machine learning methods to predict genes that are more likely to produce cancer-causing mutations. Several approaches have been proposed to improve the performance of these models, one of which is to improve the quality of the training data. Existing methods focus mainly on screening positive data, i.e. cancer driver genes. This paper proposes a screening method for low-cancer-related genes, based on gene network data and graph theory algorithms, to improve negative sample selection. Genes with low cancer correlation are used as negative training samples. Experimental verification shows that training the cancer gene classification model with negative samples screened by this method improves prediction performance. The biggest advantage of this method is that it can be easily combined with other methods that focus on enhancing the quality of positive training samples. Combining this method with three state-of-the-art cancer gene prediction methods achieved significant improvement.
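The abstract does not name the graph algorithm used for screening. As a minimal sketch of the general idea only, one simple possibility is to treat genes that lie far from every known cancer driver gene in the network as negative candidates. The function names, the `min_distance` threshold, and the toy network below are assumptions for illustration and are not taken from the paper.

```python
from collections import deque


def distance_to_nearest_driver(adjacency, drivers):
    """Multi-source breadth-first search: hops from each gene to its nearest driver."""
    dist = {g: 0 for g in drivers if g in adjacency}
    queue = deque(dist)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist  # genes unreachable from any driver are absent


def screen_negatives(adjacency, drivers, min_distance=3):
    """Keep genes at least `min_distance` hops from every driver (assumed criterion)."""
    dist = distance_to_nearest_driver(adjacency, drivers)
    return sorted(
        g for g in adjacency
        if dist.get(g, float("inf")) >= min_distance
    )


# Hypothetical toy gene network: a path A-B-C-D-E plus an isolated gene F.
toy_network = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
    "D": ["C", "E"], "E": ["D"], "F": [],
}
negatives = screen_negatives(toy_network, {"A"}, min_distance=3)
print(negatives)
```

In this sketch the screened genes would then be labeled as negative training samples for whichever classifier is being trained, which is what lets the step sit alongside methods that refine the positive set.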
Affiliation(s)
- Mingzhe Xu
  - Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia; School of Energy and Intelligence Engineering, Henan University of Animal Husbandry and Economy, #6 North Longzihu Rd, Zhengzhou 450000, China.
- Nor Aniza Abdullah
  - Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.
- Aznul Qalid Md Sabri
  - Faculty of Computer Science & Information Technology, Universiti Malaya, Kuala Lumpur, 50603 Malaysia.
2
Na D, Lim DH, Hong JS, Lee HM, Cho D, Yu MS, Shaker B, Ren J, Lee B, Song JG, Oh Y, Lee K, Oh KS, Lee MY, Choi MS, Choi HS, Kim YH, Bui JM, Lee K, Kim HW, Lee YS, Gsponer J. A multi-layered network model identifies Akt1 as a common modulator of neurodegeneration. Mol Syst Biol 2023; 19:e11801. [PMID: 37984409 DOI: 10.15252/msb.202311801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/05/2023] [Revised: 10/25/2023] [Accepted: 10/27/2023] [Indexed: 11/22/2023] Open
Abstract
The accumulation of misfolded and aggregated proteins is a hallmark of neurodegenerative proteinopathies. Although multiple genetic loci have been associated with specific neurodegenerative diseases (NDs), molecular mechanisms that may have a broader relevance for most or all proteinopathies remain poorly resolved. In this study, we developed a multi-layered network expansion (MLnet) model to predict protein modifiers that are common to a group of diseases and, therefore, may have broader pathophysiological relevance for that group. When applied to the four NDs Alzheimer's disease (AD), Huntington's disease, and spinocerebellar ataxia types 1 and 3, we predicted multiple members of the insulin pathway, including PDK1, Akt1, InR, and sgg (GSK-3β), as common modifiers. We validated these modifiers with the help of four Drosophila ND models. Further evaluation of Akt1 in human cell-based ND models revealed that activation of Akt1 signaling by the small molecule SC79 increased cell viability in all models. Moreover, treatment of AD model mice with SC79 enhanced their long-term memory and ameliorated dysregulated anxiety levels, which are commonly affected in AD patients. These findings validate MLnet as a valuable tool to uncover molecular pathways and proteins involved in the pathophysiology of entire disease groups and identify potential therapeutic targets that have relevance across disease boundaries. MLnet can be used for any group of diseases and is available as a web tool at http://ssbio.cau.ac.kr/software/mlnet.
Affiliation(s)
- Dokyun Na
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Do-Hwan Lim
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
  - School of Systems Biomedical Science, Soongsil University, Seoul, Republic of Korea
- Jae-Sang Hong
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
  - Center for Systems Biology, Massachusetts General Hospital, Boston, MA, USA
- Hyang-Mi Lee
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Daeahn Cho
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Myeong-Sang Yu
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Bilal Shaker
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Jun Ren
  - Department of Biomedical Engineering, Chung-Ang University, Seoul, Republic of Korea
- Bomi Lee
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Jae Gwang Song
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Yuna Oh
  - Korea Institute of Science and Technology, Seoul, Republic of Korea
- Kyungeun Lee
  - Korea Institute of Science and Technology, Seoul, Republic of Korea
- Kwang-Seok Oh
  - Information-based Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Mi Young Lee
  - Information-based Drug Research Center, Korea Research Institute of Chemical Technology, Daejeon, Republic of Korea
- Min-Seok Choi
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Han Saem Choi
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Yang-Hee Kim
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Jennifer M Bui
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
- Kangseok Lee
  - Department of Life Science, Chung-Ang University, Seoul, Republic of Korea
- Hyung Wook Kim
  - College of Life Sciences, Sejong University, Seoul, Republic of Korea
- Young Sik Lee
  - College of Life Sciences and Biotechnology, Korea University, Seoul, Republic of Korea
- Jörg Gsponer
  - Department of Biochemistry and Molecular Biology, Michael Smith Laboratories, University of British Columbia, Vancouver, BC, Canada
3
Acencio ML, Ostaszewski M, Mazein A, Rosenstiel P, Aden K, Mishra N, Andersen V, Sidiropoulos P, Banos A, Filia A, Rahmouni S, Finckh A, Gu W, Schneider R, Satagopam V. The SYSCID map: a graphical and computational resource of molecular mechanisms across rheumatoid arthritis, systemic lupus erythematosus and inflammatory bowel disease. Front Immunol 2023; 14:1257321. [PMID: 38022524 PMCID: PMC10646502 DOI: 10.3389/fimmu.2023.1257321] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
Chronic inflammatory diseases (CIDs), including inflammatory bowel disease (IBD), rheumatoid arthritis (RA) and systemic lupus erythematosus (SLE), are thought to emerge from an impaired complex network of inter- and intracellular biochemical interactions among several proteins and small chemical compounds under strong influence of genetic and environmental factors. CIDs are characterised by shared and disease-specific processes, which is reflected by partially overlapping genetic risk maps and pathogenic cells (e.g., T cells). Their pathogenesis involves a plethora of intracellular pathways. Translating research findings on the molecular mechanisms of CIDs into effective treatments is challenging, which may explain the low remission rates despite modern targeted therapies. Modelling CID-related causal interactions as networks allows us to tackle the complexity at a systems level and improve our understanding of the interplay of key pathways. Here we report the construction, description, and initial applications of the SYSCID map (https://syscid.elixir-luxembourg.org/), a mechanistic causal interaction network covering the molecular crosstalk between IBD, RA and SLE. We demonstrate that the map serves as an interactive, graphical review of IBD, RA and SLE molecular mechanisms, and helps to understand the complexity of omics data. Examples of such applications are illustrated using transcriptome data from time-series gene expression profiles following anti-TNF treatment and data from genome-wide association studies, which enable us to suggest potential effects on altered pathways and propose possible mechanistic biomarkers of treatment response.
Affiliation(s)
- Marcio Luis Acencio
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Marek Ostaszewski
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Esch-sur-Alzette, Luxembourg
- Alexander Mazein
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
- Philip Rosenstiel
  - Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel and University Medical Center Schleswig-Holstein, Kiel, Germany
- Konrad Aden
  - Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel and University Medical Center Schleswig-Holstein, Kiel, Germany
  - Department of Internal Medicine I, University Medical Center Schleswig-Holstein, Kiel, Germany
- Neha Mishra
  - Institute of Clinical Molecular Biology, Christian-Albrechts-University Kiel and University Medical Center Schleswig-Holstein, Kiel, Germany
- Vibeke Andersen
  - Diagnostics and Clinical Research Unit, Institute of Regional Health Research, University Hospital of Southern Denmark, Aabenraa, Denmark
  - Institute of Molecular Medicine, University of Southern Denmark, Odense, Denmark
- Prodromos Sidiropoulos
  - Rheumatology and Clinical Immunology, Medical School, University of Crete, Heraklion, Greece
  - Laboratory of Rheumatology, Autoimmunity and Inflammation, Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology (IMBB-FORTH), Heraklion, Greece
- Aggelos Banos
  - Autoimmunity and Inflammation Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens and Laboratory of Molecular Hematology, Democritus University of Thrace, University Hospital of Alexandroupolis, Alexandroupolis, Greece
- Anastasia Filia
  - Autoimmunity and Inflammation Laboratory, Biomedical Research Foundation of the Academy of Athens, Athens and Laboratory of Molecular Hematology, Democritus University of Thrace, University Hospital of Alexandroupolis, Alexandroupolis, Greece
- Souad Rahmouni
  - Unit of Animal Genomics, GIGA-Institute, University of Liège, Liège, Belgium
- Axel Finckh
  - Rheumatology Division, Geneva University Hospital (HUG), Geneva, Switzerland
  - Geneva Center for Inflammation Research (GCIR), University of Geneva (UNIGE), Geneva, Switzerland
- Wei Gu
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Esch-sur-Alzette, Luxembourg
- Reinhard Schneider
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Esch-sur-Alzette, Luxembourg
- Venkata Satagopam
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg
  - ELIXIR Luxembourg, Esch-sur-Alzette, Luxembourg
4
Xu Z, Marchionni L, Wang S. MultiNEP: a multi-omics network enhancement framework for prioritizing disease genes and metabolites simultaneously. Bioinformatics 2023; 39:btad333. [PMID: 37216914 PMCID: PMC10250081 DOI: 10.1093/bioinformatics/btad333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 04/28/2023] [Accepted: 05/19/2023] [Indexed: 05/24/2023] Open
Abstract
MOTIVATION: Many studies have successfully used network information to prioritize candidate omics profiles associated with diseases. The metabolome, as the link between genotypes and phenotypes, has attracted growing attention. Using a "multi-omics" network constructed from a gene-gene network, a metabolite-metabolite network, and a gene-metabolite network to simultaneously prioritize candidate disease-associated metabolites and gene expressions could further exploit gene-metabolite interactions that are not used when prioritizing them separately. However, the number of metabolites is usually 100 times smaller than that of genes. Without accounting for this imbalance, we cannot effectively use gene-metabolite interactions when simultaneously prioritizing disease-associated metabolites and genes. RESULTS: Here, we developed a Multi-omics Network Enhancement Prioritization (MultiNEP) framework with a weighting scheme that reweights the contributions of different sub-networks in a multi-omics network to effectively prioritize candidate disease-associated metabolites and genes simultaneously. In simulation studies, MultiNEP outperforms competing methods that do not address network imbalances and identifies more true signal genes and metabolites simultaneously when we down-weight the relative contribution of the gene-gene network and up-weight that of the metabolite-metabolite network relative to the gene-metabolite network. Applications to two human cancer cohorts show that MultiNEP prioritizes more cancer-related genes by effectively using both within- and between-omics interactions after handling network imbalance. AVAILABILITY AND IMPLEMENTATION: The developed MultiNEP framework is implemented in an R package and available at: https://github.com/Karenxzr/MultiNep.
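To make the reweighting idea concrete, the sketch below merges three edge lists into one multi-omics network and scales each sub-network by its own weight before any prioritization step. It is an illustration in Python, not the MultiNEP R package; the function name, the weight values, and the toy edges are all assumptions.

```python
def combine_subnetworks(gene_gene, metab_metab, gene_metab,
                        w_gg=0.5, w_mm=2.0, w_gm=1.0):
    """Merge three undirected edge-weight dicts, scaling each by a sub-network weight.

    The default weights are illustrative only: they down-weight the (large)
    gene-gene network and up-weight the (small) metabolite-metabolite network
    relative to the gene-metabolite network.
    """
    combined = {}
    for weight, edges in ((w_gg, gene_gene), (w_mm, metab_metab), (w_gm, gene_metab)):
        for (u, v), score in edges.items():
            combined[(u, v)] = weight * score
            combined[(v, u)] = weight * score  # keep the network symmetric
    return combined


# Hypothetical toy sub-networks with edge confidence scores in [0, 1].
gene_gene = {("g1", "g2"): 0.8}
metab_metab = {("m1", "m2"): 0.6}
gene_metab = {("g1", "m1"): 0.9}

network = combine_subnetworks(gene_gene, metab_metab, gene_metab)
print(network[("g1", "g2")], network[("m1", "m2")], network[("g1", "m1")])
```

Exposing the three weights as parameters mirrors the abstract's point that the balance between sub-networks is something to tune, since an unweighted merge would be dominated by the much larger gene-gene network.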
Affiliation(s)
- Zhuoran Xu
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, United States
- Luigi Marchionni
  - Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, United States
- Shuang Wang
  - Department of Biostatistics, Columbia University, New York, NY 10032, United States
5
Yang T, Zhao S, Sun N, Zhao Y, Wang H, Zhang Y, Hou X, Tang Y, Gao X, Fan H. Network pharmacology and in vivo studies reveal the pharmacological effects and molecular mechanisms of Celastrol against acute hepatic injury induced by LPS. Int Immunopharmacol 2023; 117:109898. [PMID: 36827925 DOI: 10.1016/j.intimp.2023.109898] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/21/2022] [Revised: 01/28/2023] [Accepted: 02/12/2023] [Indexed: 02/24/2023]
Abstract
Sepsis is currently the main cause of death in the ICU. The liver, an important organ for immunity and metabolic stability, can be acutely damaged during sepsis, and mortality is greatly increased in patients with sepsis complicated by acute liver injury. Celastrol (CEL) is derived from the root bark of Tripterygium wilfordii Hook.f. As a traditional Chinese medicine, CEL has anti-inflammatory, anti-cancer, anti-oxidant, and other biological activities. Intersection targets of CEL and acute hepatic injury (AHI) were obtained from databases, and a protein-protein interaction (PPI) network was constructed with STRING. GO functional enrichment and KEGG pathway analyses were performed in RStudio. Targets were then selected for molecular docking simulations with CEL. For the in vivo experiments, an AHI model was established by intraperitoneal injection of lipopolysaccharide (LPS) for 4 h, with CEL pre-treatment (0.5 mg/kg, 1 mg/kg, 1.5 mg/kg). The results were as follows: 273 genes at the intersection of CEL and AHI were obtained, and GO and KEGG enrichment analyses implicated mechanisms of inflammation, apoptosis, and oxidative stress-related injury. The top 10 targets selected from the PPI network were STAT3, RELA, MAPK1, MAPK3, TP53, AKT1, HSP90AA1, JUN, TNF, and MAPK14, and the pathways predicted to be involved in CEL protection against AHI were the MAPK and PI3K/AKT-related signaling pathways. In the in vivo experiments, CEL inhibited the activation of MAPK and PI3K/AKT-related pathways, reduced inflammatory response, apoptosis, and oxidative stress, and significantly improved LPS-induced AHI. In summary, this study predicted the mechanisms involved in the protective effect of CEL on AHI through network pharmacology. In vivo, CEL inhibited MAPK and PI3K/AKT-related signaling pathways and reduced inflammatory response, apoptosis, and oxidative stress to protect against LPS-induced AHI.
Affiliation(s)
- Tianyuan Yang
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Shuping Zhao
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Ning Sun
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Yuan Zhao
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Hui Wang
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Yuntong Zhang
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Xiaoyu Hou
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Yulin Tang
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China
- Xiang Gao
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China.
- Honggang Fan
  - Heilongjiang Key Laboratory for Laboratory Animals and Comparative Medicine, College of Veterinary Medicine, Northeast Agricultural University, Harbin, PR China.
6
Zhang Y, Xiang J, Tang L, Yang J, Li J. PGAGP: Predicting pathogenic genes based on adaptive network embedding algorithm. Front Genet 2023; 13:1087784. [PMID: 36744177 PMCID: PMC9895109 DOI: 10.3389/fgene.2022.1087784] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2022] [Accepted: 12/09/2022] [Indexed: 01/21/2023] Open
Abstract
The study of disease-gene associations is an important topic in the field of computational biology. The accumulation of massive amounts of biomedical data provides new possibilities for exploring potential relations between diseases and genes through computational strategies, but extracting valuable information from the data to predict pathogenic genes accurately and rapidly remains a challenging and meaningful task. Therefore, we present a novel computational method called PGAGP for inferring potential pathogenic genes based on an adaptive network embedding algorithm. PGAGP first extracts initial node features from a heterogeneous network of diseases and genes, efficiently and effectively, by Gaussian random projection, and then optimizes the node features through an adaptive refining process. These low-dimensional features are used to improve the disease-gene heterogeneous network, and we apply network propagation to the improved heterogeneous network to predict pathogenic genes more effectively. Through a series of experiments, we study the effect of PGAGP's parameters and integration strategies on predictive performance and confirm that PGAGP outperforms state-of-the-art algorithms. Case studies show that many of the predicted candidate genes for specific diseases are implicated in these diseases by literature verification and enrichment analysis, which further verifies the effectiveness of PGAGP. Overall, this work provides a useful solution for mining disease-gene heterogeneous networks to predict pathogenic genes more effectively.
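The network propagation step mentioned in the abstract is often realised as a random walk with restart. The sketch below shows that generic technique on a toy graph; it is not PGAGP's own formulation, and the function name, the `restart` value, and the example graph are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Propagate seed scores over an undirected graph until they stop changing.

    Assumes every node has at least one neighbour; otherwise probability
    mass arriving at an isolated node would be lost.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    scores = dict(start)
    for _ in range(max_iter):
        updated = {n: restart * start[n] for n in nodes}
        for u in nodes:
            # Each node passes its remaining score equally to its neighbours.
            share = (1.0 - restart) * scores[u] / len(adjacency[u])
            for v in adjacency[u]:
                updated[v] += share
        change = sum(abs(updated[n] - scores[n]) for n in nodes)
        scores = updated
        if change < tol:
            break
    return scores


# Hypothetical toy network: a path g1-g2-g3-g4 with g1 as the known disease gene.
toy_graph = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2", "g4"], "g4": ["g3"]}
scores = random_walk_with_restart(toy_graph, seeds={"g1"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Candidate genes are then ranked by their steady-state score, so genes closer to the seed in the network rank higher.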
Affiliation(s)
- Yan Zhang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - School of Information Science and Engineering, Changsha Medical University, Changsha, China
  - Academician Workstation, Changsha Medical University, Changsha, China
- Ju Xiang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - School of Information Science and Engineering, Changsha Medical University, Changsha, China
  - Academician Workstation, Changsha Medical University, Changsha, China
  - School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
  - Department of Basic Medical Sciences and Neuroscience Research Center, Changsha Medical University, Changsha, China
- Liang Tang
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Department of Basic Medical Sciences and Neuroscience Research Center, Changsha Medical University, Changsha, China
- Jialiang Yang
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Qingdao Geneis Institute of Big Data Mining and Precision Medicine, Qingdao, China
  - Geneis Beijing Co., Ltd, Beijing, China
- Jianming Li
  - Academician Workstation, Changsha Medical University, Changsha, China
  - Department of Basic Medical Sciences and Neuroscience Research Center, Changsha Medical University, Changsha, China
7
Ma J, Qin T, Xiang J. Disease-gene prediction based on preserving structure network embedding. Front Aging Neurosci 2023; 15:1061892. [PMID: 36896421 PMCID: PMC9990751 DOI: 10.3389/fnagi.2023.1061892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 01/30/2023] [Indexed: 02/23/2023] Open
Abstract
Many diseases, such as Alzheimer's disease (AD) and Parkinson's disease (PD), are caused by abnormalities or mutations of related genes. Many computational methods based on the network relationship between diseases and genes have been proposed to predict potential pathogenic genes. However, how to mine the disease-gene relationship network effectively to predict disease genes better is still an open problem. In this paper, a disease-gene prediction method based on preserving structure network embedding (PSNE) is introduced. To predict pathogenic genes more effectively, a heterogeneous network with multiple types of bio-entities was constructed by integrating disease-gene associations, a human protein network, and disease-disease associations. Furthermore, the low-dimensional features of nodes extracted from the network were used to reconstruct a new disease-gene heterogeneous network. Compared with other advanced methods, PSNE was confirmed to be more effective in disease-gene prediction. Finally, we applied the PSNE method to predict potential pathogenic genes for age-associated diseases such as AD and PD, and verified these predicted genes against the literature. Overall, this work provides an effective method for disease-gene prediction and a series of high-confidence potential pathogenic genes for AD and PD, which may be helpful for the experimental discovery of disease genes.
Affiliation(s)
- Jinlong Ma
  - School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
- Tian Qin
  - School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
- Ju Xiang
  - School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
  - Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
8
He B, Wang K, Xiang J, Bing P, Tang M, Tian G, Guo C, Xu M, Yang J. DGHNE: network enhancement-based method in identifying disease-causing genes through a heterogeneous biomedical network. Brief Bioinform 2022; 23:6712302. [PMID: 36151744 DOI: 10.1093/bib/bbac405] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/16/2022] [Revised: 08/01/2022] [Accepted: 08/21/2022] [Indexed: 12/14/2022] Open
Abstract
The identification of disease-causing genes is critical for mechanistic understanding of disease etiology and clinical manipulation in disease prevention and treatment. Yet the existing approaches to this question are inadequate in accuracy and efficiency, demanding computational methods with higher identification power. Here, we proposed a new method called DGHNE to identify disease-causing genes through a heterogeneous biomedical network empowered by network enhancement. First, a disease-disease association network was constructed from the cosine similarity scores between phenotype annotation vectors of diseases, and a new heterogeneous biomedical network was constructed by using disease-gene associations to connect the disease-disease network and the gene-gene network. Then, the heterogeneous biomedical network was further enhanced by using network embedding based on Gaussian random projection. Finally, network propagation was used to identify candidate genes in the enhanced network. We applied DGHNE together with five other methods to the most recently updated disease-gene association database, DisGeNET. Compared with all other methods, DGHNE displayed the highest area under the receiver operating characteristic curve and the precision-recall curve, as well as the highest precision and recall, in both global 5-fold cross-validation and the prediction of new disease-gene associations. We further applied DGHNE to identify the candidate causal genes of Parkinson's disease and diabetes mellitus, and the genes connecting hyperglycemia and diabetes mellitus. In all cases, the predicted causal genes were enriched in disease-associated gene ontology terms and Kyoto Encyclopedia of Genes and Genomes pathways, and the gene-disease associations were strongly supported by independent experimental studies.
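The first step described above, building a disease-disease network from cosine similarity between phenotype annotation vectors, can be sketched as follows. This is a minimal illustration rather than the DGHNE implementation; the disease names, the binary annotation vectors, and the rule of keeping every non-zero similarity as an edge are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a disease with no annotations is similar to nothing
    return dot / (norm_u * norm_v)


def disease_similarity_network(annotations):
    """Return weighted edges {(disease_a, disease_b): similarity} for similarity > 0."""
    names = sorted(annotations)
    edges = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = cosine_similarity(annotations[a], annotations[b])
            if score > 0.0:
                edges[(a, b)] = score
    return edges


# Hypothetical annotation vectors: one position per phenotype term, 1 = annotated.
toy_annotations = {
    "disease_A": [1, 1, 0, 0],
    "disease_B": [1, 0, 0, 0],
    "disease_C": [0, 0, 1, 1],
}
disease_network = disease_similarity_network(toy_annotations)
print(disease_network)
```

Diseases that share no phenotype terms get no edge, so the resulting network links only phenotypically related diseases before disease-gene associations connect it to the gene-gene network.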
Affiliation(s)
- Binsheng He
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
  - Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China
  - School of pharmacy, Changsha Medical University, Changsha 410219, P. R. China
- Kun Wang
  - School of Mathematical Sciences, Ocean University of China, Qingdao 266100, China
- Ju Xiang
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
- Pingping Bing
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
  - Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China
  - School of pharmacy, Changsha Medical University, Changsha 410219, P. R. China
- Min Tang
  - School of Life Sciences, Jiangsu University, Zhenjiang 212001, Jiangsu, China
- Geng Tian
  - Geneis (Beijing) Co., Ltd., Beijing 100102, China
- Cheng Guo
  - Center for Infection and Immunity, Mailman School of Public Health, Columbia University, New York, NY, 10032, USA
- Miao Xu
  - Broad institute of MIT and Harvard, 415 Main Street, Cambridge, MA 02142, USA
- Jialiang Yang
  - Academician Workstation, Changsha Medical University, Changsha 410219, China
  - Hunan Key Laboratory of the Research and Development of Novel Pharmaceutical Preparations, Changsha Medical University, Changsha 410219, P. R. China
  - School of pharmacy, Changsha Medical University, Changsha 410219, P. R. China
  - Geneis (Beijing) Co., Ltd., Beijing 100102, China
9
Xing X, Yang F, Li H, Zhang J, Zhao Y, Gao M, Huang J, Yao J. Multi-level attention graph neural network based on co-expression gene modules for disease diagnosis and prognosis. Bioinformatics 2022; 38:2178-2186. [PMID: 35157021 DOI: 10.1093/bioinformatics/btac088] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 09/14/2021] [Revised: 01/29/2022] [Accepted: 02/09/2022] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION: Advanced deep learning techniques have been widely applied in disease diagnosis and prognosis with clinical omics, especially gene expression data. In the regulation of biological processes and disease progression, genes often work interactively rather than individually. Therefore, investigating gene association information and co-functional gene modules can facilitate disease state prediction. RESULTS: To explore the gene modules and inter-gene relational information contained in the omics data, we propose a novel multi-level attention graph neural network (MLA-GNN) for disease diagnosis and prognosis. Specifically, we format omics data into co-expression graphs via weighted correlation network analysis, then construct multi-level graph features, and finally fuse them through a well-designed multi-level graph feature fully fusion module to conduct predictions. For model interpretation, a novel full-gradient graph saliency mechanism is developed to identify the disease-relevant genes. MLA-GNN achieves state-of-the-art performance on transcriptomic data from TCGA-LGG/TCGA-GBM and proteomic data from coronavirus disease 2019 (COVID-19)/non-COVID-19 patient sera. More importantly, the relevant genes selected by our model are interpretable and are consistent with the clinical understanding. AVAILABILITY AND IMPLEMENTATION: The codes are available at https://github.com/TencentAILabHealthcare/MLA-GNN.
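Weighted correlation network analysis turns expression profiles into a graph by raising the absolute pairwise correlation to a soft-threshold power. The sketch below shows that unsigned adjacency in plain Python on toy data; it is not the MLA-GNN code, and the gene names, expression values, and the power `beta=6` are illustrative assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)


def coexpression_adjacency(expression, beta=6):
    """Unsigned soft-threshold adjacency: a_ij = |cor(x_i, x_j)| ** beta."""
    genes = sorted(expression)
    return {
        (g, h): abs(pearson(expression[g], expression[h])) ** beta
        for i, g in enumerate(genes)
        for h in genes[i + 1:]
    }


# Hypothetical expression values for four genes across four samples.
toy_expression = {
    "gene1": [1.0, 2.0, 3.0, 4.0],
    "gene2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with gene1
    "gene3": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with gene1
    "gene4": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with gene1
}
adjacency = coexpression_adjacency(toy_expression)
print(adjacency[("gene1", "gene2")], adjacency[("gene1", "gene4")])
```

The power suppresses weak correlations while keeping strong ones near 1, which is why the resulting graph emphasises tightly co-expressed gene modules.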
Collapse
Affiliation(s)
- Xiaohan Xing
- Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong 999077, China.,AI Lab, Tencent, Shenzhen 518000, China
| | - Fan Yang
- AI Lab, Tencent, Shenzhen 518000, China
| | - Hang Li
- AI Lab, Tencent, Shenzhen 518000, China.,School of Informatics, Xiamen University, Xiamen 361005, China
| | - Jun Zhang
- AI Lab, Tencent, Shenzhen 518000, China
| | - Yu Zhao
- AI Lab, Tencent, Shenzhen 518000, China
| | - Mingxuan Gao
- AI Lab, Tencent, Shenzhen 518000, China.,School of Informatics, Xiamen University, Xiamen 361005, China
|
10
|
Xiang J, Meng X, Zhao Y, Wu FX, Li M. HyMM: hybrid method for disease-gene prediction by integrating multiscale module structure. Brief Bioinform 2022; 23:6547263. [PMID: 35275996 DOI: 10.1093/bib/bbac072] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2021] [Revised: 01/18/2022] [Accepted: 02/13/2022] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Identifying disease-related genes is an important issue in computational biology. Module structure widely exists in biomolecule networks, and complex diseases are usually thought to be caused by perturbations of local neighborhoods in the networks, which can provide useful insights for the study of disease-related genes. However, mining and effectively utilizing the module structure remains challenging in problems such as disease-gene prediction. RESULTS We propose a hybrid disease-gene prediction method integrating multiscale module structure (HyMM), which can utilize multiscale information from local to global structure to predict disease-related genes more effectively. HyMM extracts module partitions from local to global scales by multiscale modularity optimization with exponential sampling, and estimates the disease relatedness of genes in partitions by the abundance of disease-related genes within modules. Then, a probabilistic model for integrating gene rankings is designed to combine multiple predictions derived from multiscale module partitions and network propagation, and a parameter estimation strategy based on functional information is proposed to further enhance HyMM's predictive power. Through a series of experiments, we reveal the importance of module partitions at different scales, and verify the stable and good performance of HyMM compared with eight other state-of-the-art methods, as well as the further performance improvement derived from the parameter estimation. CONCLUSIONS The results confirm that HyMM is an effective framework for integrating multiscale module structure to enhance the ability to predict disease-related genes, which may provide useful insights for the study of multiscale module structure and its application in problems such as disease-gene prediction.
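The idea of scoring genes by the abundance of disease-related genes within their module can be sketched for a single partition as follows. This is only an illustration under stated assumptions: the abstract does not give HyMM's actual statistic, so the score used here (the fraction of the other module members that are known disease genes) and the names `module_scores`, `toy_partition`, and `toy_disease_genes` are hypothetical.

```python
from collections import defaultdict


def module_scores(partition, disease_genes):
    """Score each gene by the share of known disease genes among its module mates.

    partition maps gene -> module id; disease_genes is a set of known disease genes.
    The gene itself is excluded from the count (an assumed design choice), so a
    known disease gene does not raise its own score.
    """
    members = defaultdict(set)
    for gene, module in partition.items():
        members[module].add(gene)

    scores = {}
    for gene, module in partition.items():
        others = members[module] - {gene}
        if not others:
            scores[gene] = 0.0  # singleton module: no neighbours to learn from
        else:
            scores[gene] = len(others & disease_genes) / len(others)
    return scores


# Toy partition (hypothetical): two modules, two known disease genes.
toy_partition = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1}
toy_disease_genes = {"g1", "g2"}
toy_scores = module_scores(toy_partition, toy_disease_genes)
```

In a multiscale setting, a score of this kind would be computed once per partition and the resulting rankings combined afterwards; the combination step is not shown here.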
Affiliation(s)
- Ju Xiang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China; Department of Basic Medical Sciences & Academician Workstation, Changsha Medical University, Changsha, Hunan 410219, China
- Xiangmao Meng
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Fang-Xiang Wu
- Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, China
|
11
|
de Weerd HA, Åkesson J, Guala D, Gustafsson M, Lubovac-Pilav Z. MODalyseR-a novel software for inference of disease module hub regulators identified a putative multiple sclerosis regulator supported by independent eQTL data. BIOINFORMATICS ADVANCES 2022; 2:vbac006. [PMID: 36699378 PMCID: PMC9710626 DOI: 10.1093/bioadv/vbac006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2021] [Revised: 01/04/2022] [Accepted: 01/24/2022] [Indexed: 02/01/2023]
Abstract
Motivation Network-based disease modules have proven to be a powerful concept for extracting knowledge about disease mechanisms, predicting, for example, disease risk factors and side effects of treatments. Plenty of tools exist for module inference, but less effort has been put into simultaneously utilizing knowledge about regulatory mechanisms to predict disease module hub regulators. Results We developed MODalyseR, a novel software for identifying disease module regulators and reducing modules to the most disease-associated genes. This pipeline integrates and extends the previously published software packages MODifieR and ComHub and thereby provides a user-friendly network medicine framework combining the concepts of disease modules and hub regulators for precise disease gene identification from transcriptomics data. To demonstrate the usability of the tool, we designed a case study for multiple sclerosis that revealed IKZF1 as a promising hub regulator, which was supported by independent ChIP-seq data. Availability and implementation MODalyseR is available as a Docker image at https://hub.docker.com/r/ddeweerd/modalyser with user guide and installation instructions found at https://gustafsson-lab.gitlab.io/MODalyseR/. Supplementary information Supplementary data are available at Bioinformatics Advances online.
Affiliation(s)
- Hendrik A de Weerd
- School of Bioscience, Systems Biology Research Center, University of Skövde, Skövde 541 45, Sweden; Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
- Julia Åkesson
- School of Bioscience, Systems Biology Research Center, University of Skövde, Skövde 541 45, Sweden; Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden
- Dimitri Guala
- Department of Biochemistry and Biophysics, Stockholm University, Solna 17121, Sweden; Merck AB, Solna 16970, Sweden
- Mika Gustafsson
- Department of Physics, Chemistry and Biology, Linköping University, Linköping 581 83, Sweden (corresponding author)
- Zelmina Lubovac-Pilav
- School of Bioscience, Systems Biology Research Center, University of Skövde, Skövde 541 45, Sweden (corresponding author)
|
12
|
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information of complex diseases. Since omics data are usually large and complex, the method used to analyze the data and to define important information is crucial in omics study. In this review, we focus on advances in biomarker discovery methods based on omics data in the last decade, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Zhenbo Gao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Benzhe Su
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
- Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
|