1
Xu Z, Marchionni L, Wang S. MultiNEP: a multi-omics network enhancement framework for prioritizing disease genes and metabolites simultaneously. Bioinformatics 2023; 39:btad333. [PMID: 37216914 PMCID: PMC10250081 DOI: 10.1093/bioinformatics/btad333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Revised: 04/28/2023] [Accepted: 05/19/2023] [Indexed: 05/24/2023]
Abstract
MOTIVATION Many studies have successfully used network information to prioritize candidate omics profiles associated with diseases. The metabolome, as the link between genotypes and phenotypes, has attracted growing attention. Using a "multi-omics" network constructed from a gene-gene network, a metabolite-metabolite network, and a gene-metabolite network to simultaneously prioritize candidate disease-associated metabolites and gene expressions could further exploit gene-metabolite interactions that are not used when the two are prioritized separately. However, the number of metabolites is usually about 100 times smaller than the number of genes. Without accounting for this imbalance, we cannot effectively use gene-metabolite interactions when simultaneously prioritizing disease-associated metabolites and genes. RESULTS Here, we developed a Multi-omics Network Enhancement Prioritization (MultiNEP) framework with a weighting scheme that reweights the contributions of different sub-networks in a multi-omics network to effectively prioritize candidate disease-associated metabolites and genes simultaneously. In simulation studies, MultiNEP outperforms competing methods that do not address network imbalance, identifying more true signal genes and metabolites simultaneously when the relative contribution of the gene-gene network is down-weighted and that of the metabolite-metabolite network is up-weighted with respect to the gene-metabolite network. Applications to two human cancer cohorts show that MultiNEP prioritizes more cancer-related genes by effectively using both within- and between-omics interactions after handling network imbalance. AVAILABILITY AND IMPLEMENTATION The developed MultiNEP framework is implemented in an R package and available at: https://github.com/Karenxzr/MultiNep.
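The abstract describes reweighting the gene-gene, metabolite-metabolite and gene-metabolite sub-networks before prioritization. The sketch below is a minimal illustration under stated assumptions: the weight names `w_gg`, `w_mm` and `w_gm` are hypothetical, and random walk with restart is swapped in as a generic network-propagation step. It is not the MultiNEP package's API or its exact enhancement procedure.

```python
def build_multiomics_network(A_gg, A_mm, A_gm, w_gg=0.5, w_mm=2.0, w_gm=1.0):
    """Assemble a symmetric block adjacency matrix [[gg, gm], [gm^T, mm]].

    A_gg: genes x genes, A_mm: metabolites x metabolites,
    A_gm: genes x metabolites. The weights are hypothetical tuning knobs:
    w_gg < 1 down-weights the (much larger) gene-gene block and
    w_mm > 1 up-weights the metabolite-metabolite block.
    """
    n_g, n_m = len(A_gg), len(A_mm)
    n = n_g + n_m
    W = [[0.0] * n for _ in range(n)]
    for i in range(n_g):
        for j in range(n_g):
            W[i][j] = w_gg * A_gg[i][j]
        for j in range(n_m):
            # Gene-metabolite edges are mirrored to keep W symmetric.
            W[i][n_g + j] = w_gm * A_gm[i][j]
            W[n_g + j][i] = w_gm * A_gm[i][j]
    for i in range(n_m):
        for j in range(n_m):
            W[n_g + i][n_g + j] = w_mm * A_mm[i][j]
    return W


def column_normalize(W):
    """Turn W into a column-stochastic transition matrix (empty columns stay 0)."""
    n = len(W)
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    return [[W[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def random_walk_with_restart(W, p0, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - r) * P p + r * p0 until the L1 change falls below tol."""
    P = column_normalize(W)
    n = len(P)
    p = list(p0)
    for _ in range(max_iter):
        p_new = [(1 - restart) * sum(P[i][j] * p[j] for j in range(n))
                 + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(p_new, p)) < tol:
            return p_new
        p = p_new
    return p


# Toy example: three genes on a path, one metabolite linked to gene 0.
W = build_multiomics_network(
    A_gg=[[0, 1, 0], [1, 0, 1], [0, 1, 0]],
    A_mm=[[0]],
    A_gm=[[1], [0], [0]],
)
scores = random_walk_with_restart(W, p0=[1 / 3, 1 / 3, 1 / 3, 0.0])
print([round(s, 3) for s in scores])
```

Seeding only the genes still gives the metabolite a non-zero score through the gene-metabolite edge, which is the kind of between-omics information the abstract says goes unused when genes and metabolites are prioritized separately.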
Affiliation(s)
- Zhuoran Xu
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, United States
- Luigi Marchionni
- Department of Pathology and Laboratory Medicine, Weill Cornell Medicine, New York, NY 10065, United States
- Shuang Wang
- Department of Biostatistics, Columbia University, New York, NY 10032, United States
2
Network Approaches for Charting the Transcriptomic and Epigenetic Landscape of the Developmental Origins of Health and Disease. Genes (Basel) 2022; 13:genes13050764. [PMID: 35627149 PMCID: PMC9141211 DOI: 10.3390/genes13050764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 02/04/2023]
Abstract
The early developmental phase is of critical importance for human health and disease later in life. To decipher the molecular mechanisms at play, current biomedical research is increasingly relying on large quantities of diverse omics data. The integration and interpretation of the different datasets pose a critical challenge towards the holistic understanding of the complex biological processes that are involved in early development. In this review, we outline the major transcriptomic and epigenetic processes and the respective datasets that are most relevant for studying the periconceptional period. We cover both basic data processing and analysis steps, as well as more advanced data integration methods. A particular focus is given to network-based methods. Finally, we review the medical applications of such integrative analyses.
3
Ruan P, Wang S. DiSNEP: a Disease-Specific gene Network Enhancement to improve Prioritizing candidate disease genes. Brief Bioinform 2020; 22:5925270. [PMID: 33064143 DOI: 10.1093/bib/bbaa241] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 01/16/2020] [Revised: 07/25/2020] [Accepted: 08/29/2020] [Indexed: 12/27/2022]
Abstract
Biological network-based strategies are useful in prioritizing genes associated with diseases. Several comprehensive human gene networks, such as STRING, GIANT and HumanNet, have been developed and used in network-assisted algorithms to identify disease-associated genes. However, none of these networks is disease-specific, so they may not accurately reflect gene interactions for a specific disease. Aiming to improve disease gene prioritization using networks, we propose a Disease-Specific Network Enhancement Prioritization (DiSNEP) framework. DiSNEP first enhances a comprehensive gene network specifically for a disease through a diffusion process on a gene-gene similarity matrix derived from disease omics data. The enhanced disease-specific gene network thus better reflects true gene interactions for the disease and may improve the subsequent prioritization of disease-associated genes. In simulations, DiSNEP, which uses an enhanced disease-specific network, prioritizes more true signal genes than comparison methods that use a general gene network or no prioritization. Applications to prioritize cancer-associated gene expression and DNA methylation signal genes for five cancer types from The Cancer Genome Atlas (TCGA) project suggest that more of the candidate genes prioritized by DiSNEP are cancer-related according to the DisGeNET database than those prioritized by the comparison methods, consistently across all five cancer types considered and for both gene expression and DNA methylation signal genes.
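DiSNEP is described as enhancing a general gene network by diffusion on a gene-gene similarity matrix derived from disease omics data. A minimal sketch, assuming absolute Pearson correlation as the similarity and a cross-diffusion update W <- alpha * S W S^T + (1 - alpha) * W borrowed in spirit from similarity network fusion. The mixing weight `alpha`, the iteration count and all function names are hypothetical; this is not DiSNEP's published update rule.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def similarity_matrix(expr):
    """Row-normalized |Pearson r| between gene profiles (genes x samples)."""
    g = len(expr)
    S = [[abs(pearson(expr[i], expr[j])) for j in range(g)] for i in range(g)]
    normalized = []
    for row in S:
        total = sum(row)
        normalized.append([v / total if total > 0 else 0.0 for v in row])
    return normalized


def matmul(A, B):
    """Plain nested-list matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def enhance_network(W, S, alpha=0.5, n_iter=3):
    """Repeat W <- alpha * S W S^T + (1 - alpha) * W (hypothetical update rule)."""
    for _ in range(n_iter):
        diffused = matmul(matmul(S, W), transpose(S))
        W = [[alpha * d + (1 - alpha) * w for d, w in zip(d_row, w_row)]
             for d_row, w_row in zip(diffused, W)]
    return W


# Toy data: gene 2 has no edge in the general network but is co-expressed
# with genes 0 and 1, so diffusion gives it disease-specific edges.
expr = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.5], [1.0, 2.0, 3.0, 5.0]]
W_general = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W_enhanced = enhance_network(W_general, similarity_matrix(expr))
```

Because S W S^T is symmetric whenever W is, the enhanced network stays symmetric; `alpha` controls how far it moves away from the original general network.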
4
Jiang Y, Liang Y, Wang D, Xu D, Joshi T. A dynamic programing approach to integrate gene expression data and network information for pathway model generation. Bioinformatics 2020; 36:169-176. [PMID: 31168616 DOI: 10.1093/bioinformatics/btz467] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/07/2018] [Revised: 05/15/2019] [Accepted: 05/31/2019] [Indexed: 11/13/2022]
Abstract
MOTIVATION As large amounts of biological data continue to be rapidly generated, a major focus of bioinformatics research has been integrating these data to identify active pathways or modules under certain experimental conditions or phenotypes. Although biologically significant modules can often be detected globally by many existing methods, it is often hard to interpret the results or use them for pathway model generation and testing. RESULTS To address this gap, we have developed the IMPRes algorithm, a new step-wise active pathway detection method using a dynamic programming approach. IMPRes takes advantage of the existing pathway interaction knowledge in the Kyoto Encyclopedia of Genes and Genomes (KEGG). Omics data are then used to assign penalties to genes, interactions and pathways. Finally, starting from one or multiple seed genes, a shortest path algorithm is applied to detect downstream pathways that best explain the gene expression data. Since dynamic programming enables detection one step at a time, it is easy for researchers to trace the pathways, which may lead to more accurate drug design and more effective treatment strategies. Evaluation experiments conducted on three yeast datasets have shown that IMPRes can achieve competitive or better performance than other state-of-the-art methods. Furthermore, a case study was performed on a human lung cancer dataset, and we provided several previously undiscovered insights into the genes and mechanisms involved in lung cancer. AVAILABILITY AND IMPLEMENTATION The IMPRes visualization tool is available via a web server at http://digbio.missouri.edu/impres. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
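IMPRes is described as assigning omics-derived penalties to genes and interactions and then running a shortest-path search from seed genes. Dijkstra's algorithm, itself a dynamic-programming shortest-path method, gives a minimal sketch of that step. It assumes non-negative penalties and a path cost equal to the summed edge penalties plus the penalty of each gene entered; the gene labels `G1` to `G6` and all penalty values are made-up toy data, and how IMPRes actually derives penalties from expression data and KEGG is not reproduced here.

```python
import heapq


def penalized_shortest_paths(edges, node_penalty, seeds):
    """Dijkstra search from seed genes over a penalized interaction graph.

    edges:        dict gene -> list of (downstream_gene, edge_penalty)
    node_penalty: dict gene -> penalty for including that gene (default 0)
    seeds:        iterable of starting genes
    Path cost = seed penalty + sum of edge penalties + penalties of genes entered.
    All penalties must be non-negative for Dijkstra to be correct.
    """
    dist = {s: node_penalty.get(s, 0.0) for s in seeds}
    prev = {}
    heap = [(d, s) for s, d in dist.items()]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in edges.get(u, []):
            nd = d + w + node_penalty.get(v, 0.0)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def trace_path(prev, target):
    """Walk predecessor links back to a seed, then reverse."""
    path = [target]
    while path[-1] in prev:
        path.append(prev[path[-1]])
    return path[::-1]


# Toy directed interaction graph with hypothetical penalties.
edges = {"G1": [("G2", 0.2), ("G3", 0.9)], "G2": [("G4", 0.1)],
         "G3": [("G5", 0.1)], "G4": [("G6", 0.3)]}
node_penalty = {"G2": 0.1, "G3": 0.5, "G4": 0.2, "G5": 0.4, "G6": 0.1}
dist, prev = penalized_shortest_paths(edges, node_penalty, seeds=["G1"])
print(trace_path(prev, "G6"))  # -> ['G1', 'G2', 'G4', 'G6']
```

Storing predecessors is what makes the result traceable one step at a time, the property the abstract highlights for interpreting detected pathways.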
Affiliation(s)
- Yuexu Jiang
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA
- Yanchun Liang
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China
- Duolin Wang
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA
- Dong Xu
- Department of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA; Informatics Institute and Christopher S. Bond Life Sciences Center, Columbia, MO 65211, USA
- Trupti Joshi
- Department of Electrical Engineering and Computer Science, Columbia, MO 65211, USA; Informatics Institute and Christopher S. Bond Life Sciences Center, Columbia, MO 65211, USA; Department of Health Management and Informatics, University of Missouri, Columbia, MO 65211, USA
5
IMPRes-Pro: A high dimensional multiomics integration method for in silico hypothesis generation. Methods 2020; 173:16-23. [DOI: 10.1016/j.ymeth.2019.06.013] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 03/04/2019] [Revised: 06/08/2019] [Accepted: 06/13/2019] [Indexed: 01/18/2023]
6
Kim K, Sun H. Incorporating genetic networks into case-control association studies with high-dimensional DNA methylation data. BMC Bioinformatics 2019; 20:510. [PMID: 31640538 PMCID: PMC6805595 DOI: 10.1186/s12859-019-3040-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/10/2019] [Accepted: 08/21/2019] [Indexed: 12/23/2022]
Abstract
Background In human genetic association studies with high-dimensional gene expression data, it is well known that statistical selection methods utilizing prior biological network knowledge, such as genetic pathways and signaling pathways, can outperform methods that ignore genetic network structures in terms of true positive selection. In recent epigenetic research on case-control association studies, many statistical methods have been proposed to identify cancer-related CpG sites and their corresponding genes from high-dimensional DNA methylation array data. However, most existing methods are not designed to utilize genetic network information, although methylation levels of linked genes in genetic networks tend to be highly correlated with each other. Results We propose a new approach that combines data dimension reduction techniques with network-based regularization to identify outcome-related genes for analysis of high-dimensional DNA methylation data. In simulation studies, we demonstrated that the proposed approach outperforms other statistical methods that do not utilize genetic network information in terms of true positive selection. We also applied it to the 450K DNA methylation array data of the four breast invasive carcinoma cancer subtypes from The Cancer Genome Atlas (TCGA) project. Conclusions The proposed variable selection approach can utilize prior biological network information for analysis of high-dimensional DNA methylation array data. It first captures gene-level signals from multiple CpG sites using a data dimension reduction technique and then performs network-based regularization based on biological network graph information. It can select potentially cancer-related genes and genetic pathways that were missed by the existing methods. Electronic supplementary material The online version of this article (10.1186/s12859-019-3040-x) contains supplementary material, which is available to authorized users.
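Network-based regularization is commonly written as an L1 term plus a normalized graph-Laplacian smoothness term over linked genes, in the style of Li and Li's network-constrained penalty. The sketch below evaluates such a penalty on gene-level coefficients. It assumes this normalized-Laplacian form and hypothetical `lam1`/`lam2` tuning parameters; it is not confirmed to be the exact penalty used by Kim and Sun, and the preceding dimension-reduction step that summarizes CpG sites into gene-level signals is not shown.

```python
import math


def network_penalty(beta, edges, lam1=1.0, lam2=1.0):
    """L1 term plus a normalized-Laplacian smoothness term over network edges.

    beta:  dict gene -> coefficient (genes missing from beta count as 0).
    edges: list of (gene_u, gene_v, weight) for an undirected prior network.
    Penalty = lam1 * sum|beta_g|
            + lam2 * sum_{(u,v)} w_uv * (beta_u/sqrt(d_u) - beta_v/sqrt(d_v))**2
    where d_g is the weighted degree of gene g.
    """
    degree = {}
    for u, v, w in edges:
        degree[u] = degree.get(u, 0.0) + w
        degree[v] = degree.get(v, 0.0) + w
    l1 = sum(abs(b) for b in beta.values())
    smooth = 0.0
    for u, v, w in edges:
        diff = (beta.get(u, 0.0) / math.sqrt(degree[u])
                - beta.get(v, 0.0) / math.sqrt(degree[v]))
        smooth += w * diff ** 2
    return lam1 * l1 + lam2 * smooth


# Toy path network A - B - C: coefficients with the same sign on linked genes
# are penalized less than coefficients that flip sign along an edge.
edges = [("A", "B", 1.0), ("B", "C", 1.0)]
print(network_penalty({"A": 1.0, "B": 1.0, "C": 1.0}, edges))
print(network_penalty({"A": 1.0, "B": -1.0, "C": 1.0}, edges))
```

In a full method this penalty would be added to a regression loss for the case-control outcome and minimized, for example by coordinate descent; that fitting step is omitted.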
Affiliation(s)
- Kipoong Kim
- Department of Statistics, Pusan National University, Busan, 46241, Korea
- Hokeun Sun
- Department of Statistics, Pusan National University, Busan, 46241, Korea
7
Min W, Liu J, Zhang S. Edge-group sparse PCA for network-guided high dimensional data analysis. Bioinformatics 2019; 34:3479-3487. [PMID: 29726900 DOI: 10.1093/bioinformatics/bty362] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 06/03/2017] [Accepted: 05/02/2018] [Indexed: 12/14/2022]
Abstract
Motivation Principal component analysis (PCA) has been widely used to deal with high-dimensional gene expression data. In this study, we proposed an Edge-group Sparse PCA (ESPCA) model by incorporating the group structure from a prior gene network into the PCA framework for dimension reduction and feature interpretation. ESPCA enforces sparsity of principal component (PC) loadings by considering the connectivity of gene variables in the prior network. We developed an alternating iterative algorithm to solve ESPCA. The key step of this algorithm is to solve a new k-edge sparse projection problem, for which a greedy strategy has been adopted. Here we applied ESPCA to analyze multiple gene expression matrices simultaneously. By incorporating prior knowledge, our method can overcome the drawbacks of sparse PCA and capture gene modules with better biological interpretations. Results We evaluated the performance of ESPCA using a set of artificial datasets and two real biological datasets (TCGA pan-cancer expression data and ENCODE expression data), and compared its performance with that of PCA and sparse PCA. The results showed that ESPCA could identify more biologically relevant genes, improve their biological interpretations and reveal distinct sample characteristics. Availability and implementation An R package of ESPCA is available at http://page.amss.ac.cn/shihua.zhang/. Supplementary information Supplementary data are available at Bioinformatics online.
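The abstract names a k-edge sparse projection, solved greedily, as the core of ESPCA's alternating algorithm. One plausible greedy reading is sketched below under that assumption: repeatedly pick the prior-network edge whose not-yet-covered endpoints carry the most squared loading mass, keep only the covered genes, and rescale to unit norm. This is an illustration rather than the authors' exact subroutine, and the function name is hypothetical.

```python
import math


def k_edge_sparse_projection(z, edges, k):
    """Greedy k-edge projection of a loading vector z onto a prior network.

    z:     list of loading values indexed by gene.
    edges: list of (i, j) gene-index pairs from the prior network.
    k:     maximum number of edges whose endpoints may be non-zero.
    """
    selected = set()
    remaining = list(edges)
    for _ in range(k):
        best_gain, best_edge = 0.0, None
        for u, v in remaining:
            # Only genes not already covered add new squared mass.
            gain = sum(z[i] ** 2 for i in {u, v} - selected)
            if gain > best_gain:
                best_gain, best_edge = gain, (u, v)
        if best_edge is None:
            break  # no remaining edge adds any mass
        selected.update(best_edge)
        remaining.remove(best_edge)
    x = [z[i] if i in selected else 0.0 for i in range(len(z))]
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


z = [3.0, 0.5, 0.0, 4.0, 0.1]
edges = [(0, 3), (1, 2), (3, 4), (0, 1)]
print(k_edge_sparse_projection(z, edges, k=1))  # -> [0.6, 0.0, 0.0, 0.8, 0.0]
```

Inside an alternating scheme, a projection like this would be applied to X^T u at each iteration to update the sparse loading vector; that outer loop is omitted here.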
Affiliation(s)
- Wenwen Min
- School of Computer Science, Wuhan University, Wuhan, China
- Juan Liu
- School of Computer Science, Wuhan University, Wuhan, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematics Sciences, University of Chinese Academy of Sciences, Beijing, China; Center for Excellence in Animal Evolution and Genetics, Chinese Academy of Sciences, Kunming, China
8
Sun H, Wang Y, Chen Y, Li Y, Wang S. pETM: a penalized Exponential Tilt Model for analysis of correlated high-dimensional DNA methylation data. Bioinformatics 2018; 33:1765-1772. [PMID: 28165116 DOI: 10.1093/bioinformatics/btx064] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 09/29/2016] [Accepted: 01/31/2017] [Indexed: 12/31/2022]
Abstract
Motivation DNA methylation plays an important role in many biological processes and cancer progression. Recent studies have found that methylation levels can differ between groups in their variability as well as in their means. Several methods have been developed that consider both mean and variance signals in order to improve the statistical power of detecting differentially methylated loci. Moreover, as methylation levels of neighboring CpG sites are known to be strongly correlated, methods that incorporate correlations have also been developed. We previously developed a network-based penalized logistic regression for correlated methylation data, but it focused only on mean signals. We have also developed a generalized exponential tilt model that captures both mean and variance signals but examines only one CpG site at a time. Results In this article, we proposed a penalized Exponential Tilt Model (pETM) using network-based regularization that captures both mean and variance signals in DNA methylation data and takes into account the correlations among nearby CpG sites. By combining the strengths of the two models we previously developed, we demonstrated the superior power and better performance of the pETM method through simulations and applications to the 450K DNA methylation array data of the four breast invasive carcinoma cancer subtypes from The Cancer Genome Atlas (TCGA) project. The developed pETM method identifies many cancer-related methylation loci that were missed by our previously developed method, which considers correlations among nearby methylation loci but not variance signals. Availability and Implementation The R package 'pETM' is publicly available through CRAN: http://cran.r-project.org. Contact sw2206@columbia.edu. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
- Ya Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Yong Chen
- Division of Biostatistics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Yun Li
- Department of Biostatistics, University of North Carolina, Chapel Hill, NC, USA; Department of Genetics, University of North Carolina, Chapel Hill, NC, USA; Department of Computer Science, University of North Carolina, Chapel Hill, NC, USA
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
9
Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet 2017; 19:129-147. [PMID: 29129922 DOI: 10.1038/nrg.2017.86] [Citation(s) in RCA: 168] [Impact Index Per Article: 24.0] [Indexed: 02/01/2023]
Abstract
Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information.
10
Wang Y, Teschendorff AE, Widschwendter M, Wang S. Accounting for differential variability in detecting differentially methylated regions. Brief Bioinform 2017; 20:47-57. [DOI: 10.1093/bib/bbx097] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/13/2017] [Indexed: 12/11/2022]
Affiliation(s)
- Ya Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Andrew E Teschendorff
- Department of Women's Cancer, University College London, London, UK
- CAS Key Lab of Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
- Statistical Cancer Genomics, UCL Cancer Institute, University College London, London, UK
- Shuang Wang
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
11
Zhong D, Cen H. Aberrant promoter methylation profiles and association with survival in patients with hepatocellular carcinoma. Onco Targets Ther 2017; 10:2501-2509. [PMID: 28507442 PMCID: PMC5428754 DOI: 10.2147/ott.s128058] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Abstract
The aim of this study was to investigate the prognostic and diagnostic value of genes with promoter methylation in hepatocellular carcinoma (HCC) patients. On the basis of The Cancer Genome Atlas data, we identified genes with differentially methylated promoters between HCC tissues and adjacent non-tumor tissues using the linear models for microarray data (limma) approach. Cox proportional hazards regression analysis was applied to assess the prognostic value of the identified differentially methylated genes. The diagnostic value of the genes was evaluated through receiver operating characteristic (ROC) analysis. Pathway analyses were performed to illustrate the biological functions of the identified genes. Compared to adjacent tissues, 77 genes with hypermethylated promoters and 2,412 genes with hypomethylated promoters were identified in HCC. Promoter hypomethylation of RNA5SP38, IL21, SDC4P, and MIR4439 was found to be associated with HCC patient survival (P=0.035, 0.040, 0.004, and 0.024, respectively). Hypomethylated SDC4P was associated with a better prognosis (hazard ratio, 0.482; 95% confidence interval [CI], −0.147–1.110; P=0.007). The combination of promoter hypomethylation in RNA5SP38, IL21, and SDC4P showed an area under the ROC curve of 0.975 (95% CI, 0.962–0.989; P=4.811E-25). Several pathways, including olfactory transduction, cytokine–cytokine receptor interaction, natural killer cell–mediated cytotoxicity, and inflammation mediated by chemokine and cytokine signaling pathway, were annotated with the hypomethylated promoter genes. SDC4P promoter hypomethylation may be a potential prognostic biomarker. A panel of promoter methylations in RNA5SP38, IL21, and SDC4P was proven to be a novel approach to diagnosing HCC. The pathway analysis defined the extensive functional role of DNA hypomethylation in cancer.
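The abstract summarizes the diagnostic value of the three-gene panel as an area under the ROC curve. As a reminder of what that number measures, the sketch below computes an AUC through its Mann-Whitney interpretation: the probability that a randomly chosen tumor sample scores higher than a randomly chosen non-tumor sample, with ties counted as one half. How the study combined the three genes into a single score is not stated in the abstract, so the scores here are hypothetical toy values and the function name is illustrative.

```python
def auc_mann_whitney(scores_pos, scores_neg):
    """AUC = P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative pairs."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical panel scores for tumor (positive) and non-tumor (negative) samples.
tumor = [0.9, 0.8, 0.4]
non_tumor = [0.3, 0.5, 0.1]
print(auc_mann_whitney(tumor, non_tumor))
```

For these toy scores, 8 of the 9 tumor/non-tumor pairs are correctly ordered, so the AUC is 8/9. An AUC of 0.975, as reported for the panel, means almost every such pair is correctly ordered.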
Affiliation(s)
- Dani Zhong
- Department of Chemotherapy, Tumor Hospital of Guangxi Medical University, Nanning, Guangxi, People's Republic of China
- Hong Cen
- Department of Chemotherapy, Tumor Hospital of Guangxi Medical University, Nanning, Guangxi, People's Republic of China