1
Allele-specific expression of GATA2 due to epigenetic dysregulation in CEBPA double-mutant AML. Blood 2021; 138:160-177. [PMID: 33831168 DOI: 10.1182/blood.2020009244] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/28/2020] [Accepted: 03/24/2021] [Indexed: 12/11/2022] Open
Abstract
Transcriptional deregulation is a central event in the development of acute myeloid leukemia (AML). To identify potential disturbances in gene regulation, we conducted an unbiased screen of allele-specific expression (ASE) in 209 AML cases. The gene encoding GATA binding protein 2 (GATA2) displayed ASE more often than any other myeloid- or cancer-related gene. GATA2 ASE was strongly associated with CEBPA double mutations (DMs), with 95% of cases presenting GATA2 ASE. In CEBPA DM AML with GATA2 mutations, the mutated allele was preferentially expressed. We found that GATA2 ASE was a somatic event lost in complete remission, supporting the notion that it plays a role in CEBPA DM AML. Acquisition of GATA2 ASE involved silencing of 1 allele via promoter methylation and concurrent overactivation of the other allele, thereby preserving expression levels. Notably, promoter methylation was also lost in remission along with GATA2 ASE. In summary, we propose that GATA2 ASE is acquired by epigenetic mechanisms and is a prerequisite for the development of AML with CEBPA DMs. This finding constitutes a novel example of an epigenetic hit cooperating with a genetic hit in the pathogenesis of AML.
2
Mok L, Park T. HisCoM-PAGE: software for hierarchical structural component models for pathway analysis of gene expression data. Genomics Inform 2019; 17:e45. [PMID: 31896245 PMCID: PMC6944051 DOI: 10.5808/gi.2019.17.4.e45] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2019] [Accepted: 11/22/2019] [Indexed: 12/04/2022] Open
Abstract
To identify pathways associated with survival phenotypes using gene expression data, we recently proposed the hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE) method. The HisCoM-PAGE software can consider hierarchical structural relationships between genes and pathways and analyze multiple pathways simultaneously. It can be applied to various types of gene expression data, such as microarray data or RNA sequencing data. We expect that the HisCoM-PAGE software will make our method more easily accessible to researchers who want to perform pathway analysis for survival times.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Taesung Park (corresponding author)
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
3
Mok L, Kim Y, Lee S, Choi S, Lee S, Jang JY, Park T. HisCoM-PAGE: Hierarchical Structural Component Models for Pathway Analysis of Gene Expression Data. Genes (Basel) 2019; 10:E931. [PMID: 31739607 PMCID: PMC6896173 DOI: 10.3390/genes10110931] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/23/2019] [Revised: 11/06/2019] [Accepted: 11/07/2019] [Indexed: 01/10/2023] Open
Abstract
Although there have been several analyses for identifying cancer-associated pathways, based on gene expression data, most of these are based on single pathway analyses, and thus do not consider correlations between pathways. In this paper, we propose a hierarchical structural component model for pathway analysis of gene expression data (HisCoM-PAGE), which accounts for the hierarchical structure of genes and pathways, as well as the correlations among pathways. Specifically, HisCoM-PAGE focuses on the survival phenotype and identifies its associated pathways. Moreover, its application to real biological data analysis of pancreatic cancer data demonstrated that HisCoM-PAGE could successfully identify pathways associated with pancreatic cancer prognosis. Simulation studies comparing the performance of HisCoM-PAGE with other competing methods such as Gene Set Enrichment Analysis (GSEA), Global Test, and Wald-type Test showed HisCoM-PAGE to have the highest power to detect causal pathways in most simulation scenarios.
Affiliation(s)
- Lydia Mok
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
- Yongkang Kim
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul 03080, Korea
- Sungkyoung Choi
  - Department of Applied Mathematics, Hanyang University (ERICA), Ansan 15588, Korea
- Seungyeoun Lee
  - Department of Mathematics and Statistics, Sejong University, Seoul 05006, Korea
- Jin-Young Jang
  - Department of Surgery, Seoul National University College of Medicine, Seoul 03080, Korea
- Taesung Park
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul 08826, Korea
  - Department of Statistics, Seoul National University, Seoul 08826, Korea
4
Saxena A, Sachin K, Bhatia AK. System Level Meta-analysis of Microarray Datasets for Elucidation of Diabetes Mellitus Pathobiology. Curr Genomics 2017; 18:298-304. [PMID: 28659725 PMCID: PMC5476948 DOI: 10.2174/1389202918666170105093339] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/18/2016] [Revised: 06/21/2016] [Accepted: 11/16/2016] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Type 2 diabetes (T2D) is a common multi-factorial disease that is primarily accounted to ineffective insulin action in lowering blood glucose level and later escalates to impaired insulin secretion by pancreatic β cells. Deregulation in insulin signaling to its target organs is attributed to this disease phenotype. Various genome-wide microarray studies from multiple insulin responsive tissues have been conducted in past but due to inherent noise in microarray data and heterogeneity in disease etiology; reproduction of prioritized pathways/genes is very low across various studies. OBJECTIVE In this study, we aim to identify consensus signaling and metabolic pathways through system level meta-analysis of multiple expression-sets to elucidate T2D pathobiology. METHOD We used 'R', an open source statistical environment, which is routinely used for Microarray data analysis particularly using special sets of packages available at Bioconductor. We primarily focused on gene-set analysis methods to elucidate various aspects of T2D. RESULT Literature-based evidences have shown the success of our approach in exploring various known aspects of diabetes pathophysiology. CONCLUSION Our study stressed the need to develop novel bioinformatics workflows to advance our understanding further in insulin signaling.
Affiliation(s)
- Aditya Saxena
  - Department of Biotechnology, Institute of Applied Sciences & Humanities, GLA University, Mathura (U.P.) India
  - Uttarakhand Technical University, Dehradun (U.K.) India
- Kumar Sachin
  - Department of Biochemistry and Biotechnology, S.B.S. (PG) Institute of Biomedical Sciences & Research, Dehradun (U.K.) India
- Ashok Kumar Bhatia
  - Department of Biotechnology, Institute of Applied Sciences & Humanities, GLA University, Mathura (U.P.) India
5
Zeng T, Zhang W, Yu X, Liu X, Li M, Liu R, Chen L. Edge biomarkers for classification and prediction of phenotypes. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1103-14. [PMID: 25326072 DOI: 10.1007/s11427-014-4757-4] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/09/2014] [Accepted: 08/07/2014] [Indexed: 12/19/2022]
Abstract
In general, a disease manifests not from malfunction of individual molecules but from failure of the relevant system or network, which can be considered as a set of interactions or edges among molecules. Thus, instead of individual molecules, networks or edges are stable forms to reliably characterize complex diseases. This paper reviews both traditional node biomarkers and edge biomarkers, which have been newly proposed. These biomarkers are classified in terms of their contained information. In particular, we show that edge and network biomarkers provide novel ways of stably and reliably diagnosing the disease state of a sample. First, we categorize the biomarkers based on the information used in the learning and prediction steps. We then briefly introduce conventional node biomarkers, or molecular biomarkers without network information, and their computational approaches. The main focus of this paper is edge and network biomarkers, which exploit network information to improve the accuracy of diagnosis and prognosis. Moreover, by extracting both network and dynamic information from the data, we can develop dynamical network and edge biomarkers. These biomarkers not only diagnose the immediate pre-disease state but also detect the critical molecules or networks by which the biological system progresses from the healthy to the disease state. The identified critical molecules can be used as drug targets, and the critical state indicates the critical point of disease control. The paper also discusses representative biomarker-based methods.
Affiliation(s)
- Tao Zeng
  - Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 200031, China
6
Dellinger AE, Nixon AB, Pang H. Integrative Pathway Analysis Using Graph-Based Learning with Applications to TCGA Colon and Ovarian Data. Cancer Inform 2014; 13:1-9. [PMID: 25125969 PMCID: PMC4125381 DOI: 10.4137/cin.s13634] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 11/12/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 12/15/2022] Open
Abstract
Recent method development has included multi-dimensional genomic data algorithms because such methods have more accurately predicted clinical phenotypes related to disease. This study is the first to conduct an integrative genomic pathway-based analysis with a graph-based learning algorithm. The methodology of this analysis, graph-based semi-supervised learning, detects pathways that improve prediction of a dichotomous variable, which in this study is cancer stage. This analysis integrates genome-level gene expression, methylation, and single nucleotide polymorphism (SNP) data in serous cystadenocarcinoma (OV) and colon adenocarcinoma (COAD). The top 10 ranked predictive pathways in COAD and OV were biologically relevant to their respective cancer stages and significantly enhanced prediction accuracy and area under the ROC curve (AUC) when compared to single data-type analyses. This method is an effective way to simultaneously predict binary clinical phenotypes and discover their biological mechanisms.
Affiliation(s)
- Andrew E Dellinger
  - Department of Mathematics and Statistics, Elon University, Elon, NC, USA
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
- Andrew B Nixon
  - Department of Medicine, Division of Medical Oncology, Duke University School of Medicine, Durham, NC, USA
- Herbert Pang
  - Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA
  - School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
7
Zeng T, Sun SY, Wang Y, Zhu H, Chen L. Network biomarkers reveal dysfunctional gene regulations during disease progression. FEBS J 2013; 280:5682-95. [PMID: 24107168 DOI: 10.1111/febs.12536] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Received: 03/01/2013] [Revised: 08/30/2013] [Accepted: 09/09/2013] [Indexed: 12/13/2022]
Abstract
Extensive studies have been conducted on gene biomarkers by exploring the increasingly accumulated gene expression and sequence data generated from high-throughput technology. Here, we briefly report on the state-of-the-art research and application of biomarkers from single genes (i.e. gene biomarkers) to gene sets (i.e. group or set biomarkers), gene networks (i.e. network biomarkers) and dynamical gene networks (i.e. dynamical network biomarkers). In particular, differential and dynamical network biomarkers are used as representative examples to demonstrate their effectiveness in both detecting early signals for complex diseases and revealing essential mechanisms on disease initiation and progression at a network level.
Affiliation(s)
- Tao Zeng
  - Key Laboratory of Systems Biology, SIBS-Novo Nordisk Translational Research Centre for PreDiabetes, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, China
8
Komurov K, Dursun S, Erdin S, Ram PT. NetWalker: a contextual network analysis tool for functional genomics. BMC Genomics 2012; 13:282. [PMID: 22732065 PMCID: PMC3439272 DOI: 10.1186/1471-2164-13-282] [Citation(s) in RCA: 90] [Impact Index Per Article: 7.5] [Received: 11/23/2011] [Accepted: 06/25/2012] [Indexed: 11/14/2022] Open
Abstract
Background Functional analyses of genomic data within the context of a priori biomolecular networks can give valuable mechanistic insights. However, such analyses are not a trivial task, owing to the complexity of biological networks and lack of computational methods for their effective integration with experimental data. Results We developed a software application suite, NetWalker, as a one-stop platform featuring a number of novel holistic (i.e. assesses the whole data distribution without requiring data cutoffs) data integration and analysis methods for network-based comparative interpretations of genome-scale data. The central analysis components, NetWalk and FunWalk, are novel random walk-based network analysis methods that provide unique analysis capabilities to assess the entire data distributions together with network connectivity to prioritize molecular and functional networks, respectively, most highlighted in the supplied data. Extensive inter-operability between the analysis components and with external applications, including R, adds to the flexibility of data analyses. Here, we present a detailed computational analysis of our microarray gene expression data from MCF7 cells treated with lethal and sublethal doses of doxorubicin. Conclusion NetWalker, a detailed step-by-step tutorial containing the analyses presented in this paper and a manual are available at the web site http://netwalkersuite.org.
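To make the idea of a random walk over a biomolecular network concrete, the sketch below runs a generic random walk with restart on a toy graph. This is a related, simpler technique and is not the NetWalk or FunWalk algorithm described in the abstract; the node names, edge weights, restart distribution, and the `random_walk_with_restart` function are all hypothetical. It assumes every node has at least one neighbor.

```python
def random_walk_with_restart(adj, restart, alpha=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- alpha * restart + (1 - alpha) * W p until convergence.

    adj:     dict node -> dict of neighbor -> positive edge weight
    restart: dict node -> restart probability (values sum to 1)
    alpha:   probability of jumping back to the restart distribution
    """
    nodes = sorted(adj)
    p = dict(restart)
    for _ in range(max_iter):
        nxt = {n: alpha * restart[n] for n in nodes}
        for u in nodes:
            total = sum(adj[u].values())  # assumed > 0 for every node
            for v, w in adj[u].items():
                # Move probability mass from u to v in proportion to edge weight.
                nxt[v] += (1 - alpha) * p[u] * w / total
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Hypothetical 4-node path graph A - B - C - D with unit weights.
graph = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
# All restart mass on node A, standing in for a node highlighted by the data.
restart = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}

scores = random_walk_with_restart(graph, restart)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, round(scores[node], 4))
```

Nodes closer to the restart node receive higher scores, which is the sense in which such walks combine a data-derived starting signal with network connectivity.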
Affiliation(s)
- Kakajan Komurov
  - Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital Medical Center, Cincinnati, OH, USA.
9
Kaneko S, Hirakawa A, Hamada C. Gene Selection using a High-Dimensional Regression Model with Microarrays in Cancer Prognostic Studies. Cancer Inform 2012; 11:29-39. [PMID: 22442625 PMCID: PMC3298378 DOI: 10.4137/cin.s9048] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022] Open
Abstract
Mining of gene expression data to identify genes associated with patient survival is an ongoing problem in cancer prognostic studies using microarrays in order to use such genes to achieve more accurate prognoses. The least absolute shrinkage and selection operator (lasso) is often used for gene selection and parameter estimation in high-dimensional microarray data. The lasso shrinks some of the coefficients to zero, and the amount of shrinkage is determined by the tuning parameter, often determined by cross validation. The model determined by this cross validation contains many false positives whose coefficients are actually zero. We propose a method for estimating the false positive rate (FPR) for lasso estimates in a high-dimensional Cox model. We performed a simulation study to examine the precision of the FPR estimate by the proposed method. We applied the proposed method to real data and illustrated the identification of false positive genes.
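The shrinkage behavior described in this abstract can be illustrated with soft-thresholding, which is the closed-form lasso solution in the special case of a linear model with orthonormal predictors. The sketch below is only that simplified case: it is not a Cox model and not the authors' FPR estimation method, and the coefficient values and function names are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink a single coefficient z toward zero by lam (lam >= 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # coefficients with |z| <= lam are set exactly to zero


def lasso_orthonormal(ols_coefs, lam):
    """Lasso estimates for an orthonormal design: soft-threshold each OLS estimate."""
    return [soft_threshold(b, lam) for b in ols_coefs]


# Hypothetical unpenalized estimates for five genes.
ols = [2.5, -0.3, 0.8, -1.7, 0.1]

# A larger tuning parameter shrinks more coefficients to exactly zero.
for lam in (0.0, 0.5, 1.0):
    shrunk = lasso_orthonormal(ols, lam)
    n_selected = sum(1 for b in shrunk if b != 0.0)
    print(f"lambda={lam}: {n_selected} genes selected")
```

In practice the tuning parameter is usually chosen by cross-validation, as the abstract notes, and the genes retained at that value are the ones whose false positive rate the proposed method aims to estimate.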
Affiliation(s)
- Shuhei Kaneko
  - Department of Management Science, Graduate School of Engineering, Tokyo University of Science, 1-3 Kagurazaka, Shinjuku-ku, Tokyo 162-8601, Japan