1
Chekouo T, Mukherjee H. A Bayesian hierarchical hidden Markov model for clustering and gene selection: Application to kidney cancer gene expression data. Biom J 2024; 66:e2300173. [PMID: 38817110 PMCID: PMC11239327 DOI: 10.1002/bimj.202300173] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2023] [Revised: 02/18/2024] [Accepted: 03/02/2024] [Indexed: 06/01/2024]
Abstract
We introduce a Bayesian approach for biclustering that accounts for the prior functional dependence between genes using hidden Markov models (HMMs). We utilize biological knowledge gathered from gene ontologies and the hidden Markov structure to capture the potential coexpression of neighboring genes. Our interpretable model-based clustering characterizes each cluster of samples by three groups of features: overexpressed, underexpressed, and irrelevant features. The proposed methods have been implemented in an R package and are used to analyze both simulated data and The Cancer Genome Atlas kidney cancer data.
Affiliation(s)
- Thierry Chekouo
  - Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minnesota, USA
- Himadri Mukherjee
  - Department of Mathematics and Statistics, University of Minnesota Duluth, Minnesota, USA
2
Sajedi S, Ebrahimi G, Roudi R, Mehta I, Heshmat A, Samimi H, Kazempour S, Zainulabadeen A, Docking TR, Arora SP, Cigarroa F, Seshadri S, Karsan A, Zare H. Integrating DNA methylation and gene expression data in a single gene network using the iNETgrate package. Sci Rep 2023; 13:21721. [PMID: 38066050 PMCID: PMC10709411 DOI: 10.1038/s41598-023-48237-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023]
Abstract
Analyzing different omics data types independently is often too restrictive to allow for detection of subtle, but consistent, variations that are coherently supported based upon different assays. Integrating multi-omics data in one model can increase statistical power. However, designing such a model is challenging because different omics are measured at different levels. We developed the iNETgrate package ( https://bioconductor.org/packages/iNETgrate/ ) that efficiently integrates transcriptome and DNA methylation data in a single gene network. Applying iNETgrate on five independent datasets improved prognostication compared to common clinical gold standards and a patient similarity network approach.
Affiliation(s)
- Sogand Sajedi
  - Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
- Ghazal Ebrahimi
  - Bioinformatics Program, The University of British Columbia, Vancouver, BC, Canada
- Raheleh Roudi
  - Department of Radiology, Stanford University School of Medicine, Stanford, CA, 94305, USA
- Isha Mehta
  - Department of Immunology, University of Pittsburgh, Pittsburgh, PA, 15213, USA
- Amirreza Heshmat
  - Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, 77030, USA
- Hanie Samimi
  - School of Architecture, University of Utah, Salt Lake City, UT, 84112, USA
- Shiva Kazempour
  - Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
- Aamir Zainulabadeen
  - Department of Computer Science, Princeton University, Princeton, NJ, 08540, USA
- Thomas Roderick Docking
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, BC, V5Z 1L3, Canada
- Sukeshi Patel Arora
  - Mays Cancer Center, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Francisco Cigarroa
  - Malu and Carlos Alvarez Center for Transplantation, Hepatobiliary Surgery and Innovation, The University of Texas Health Science Center, San Antonio, TX, 78229, USA
- Sudha Seshadri
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA
  - Department of Neurology, University of Texas, San Antonio, TX, 78229, USA
  - Department of Neurology, Boston University School of Medicine, Boston, Massachusetts, 02139, USA
- Aly Karsan
  - Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, BC, V5Z 1L3, Canada
- Habil Zare
  - Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, TX, 78229, USA.
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, San Antonio, TX, 78229, USA.
  - Department of Cell Systems & Anatomy, 7703 Floyd Curl Drive, San Antonio, TX, 78229, USA.
3
Maehara H, Kokaji T, Hatano A, Suzuki Y, Matsumoto M, Nakayama KI, Egami R, Tsuchiya T, Ozaki H, Morita K, Shirai M, Li D, Terakawa A, Uematsu S, Hironaka KI, Ohno S, Kubota H, Araki H, Miura F, Ito T, Kuroda S. DNA hypomethylation characterizes genes encoding tissue-dominant functional proteins in liver and skeletal muscle. Sci Rep 2023; 13:19118. [PMID: 37926704 PMCID: PMC10625943 DOI: 10.1038/s41598-023-46393-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 10/31/2023] [Indexed: 11/07/2023]
Abstract
Each tissue has a dominant set of functional proteins required to mediate tissue-specific functions. Epigenetic modifications, transcription, and translational efficiency control tissue-dominant protein production. However, the coordination of these regulatory mechanisms to achieve such tissue-specific protein production remains unclear. Here, we analyzed the DNA methylome, transcriptome, and proteome in mouse liver and skeletal muscle. We found that DNA hypomethylation at promoter regions is globally associated with liver-dominant or skeletal muscle-dominant functional protein production within each tissue, as well as with genes encoding proteins involved in ubiquitous functions in both tissues. Genes encoding liver-dominant proteins, such as those involved in glycolysis or gluconeogenesis, the urea cycle, complement and coagulation systems, enzymes of tryptophan metabolism, and cytochrome P450-related metabolism, were hypomethylated in the liver, whereas those encoding skeletal muscle-dominant proteins, such as those involved in sarcomere organization, were hypomethylated in the skeletal muscle. Thus, DNA hypomethylation characterizes genes encoding tissue-dominant functional proteins.
Affiliation(s)
- Hideki Maehara
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Toshiya Kokaji
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
  - Data Science Center, Nara Institute of Science and Technology, 8916‑5 Takayama, Ikoma, Nara, Japan
- Atsushi Hatano
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
  - Department of Omics and Systems Biology, Graduate School of Medical and Dental Sciences, Niigata University, 757 Ichibancho, Asahimachi-Dori, Chuo-Ku, Niigata City, Niigata, 951-8510, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- Masaki Matsumoto
  - Department of Omics and Systems Biology, Graduate School of Medical and Dental Sciences, Niigata University, 757 Ichibancho, Asahimachi-Dori, Chuo-Ku, Niigata City, Niigata, 951-8510, Japan
- Keiichi I Nakayama
  - Department of Molecular and Cellular Biology, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-Ku, Fukuoka, 812-8582, Japan
- Riku Egami
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- Takaho Tsuchiya
  - Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, Ibaraki, 305‑8575, Japan
  - Center for Artificial Intelligence Research, University of Tsukuba, Ibaraki, 305‑8577, Japan
- Haruka Ozaki
  - Bioinformatics Laboratory, Institute of Medicine, University of Tsukuba, Ibaraki, 305‑8575, Japan
  - Center for Artificial Intelligence Research, University of Tsukuba, Ibaraki, 305‑8577, Japan
- Keigo Morita
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Masaki Shirai
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Dongzi Li
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Akira Terakawa
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Saori Uematsu
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan
- Ken-Ichi Hironaka
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
- Satoshi Ohno
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan
  - Molecular Genetics Research Laboratory, Graduate School of Science, University of Tokyo, 7‑3‑1 Hongo, Bunkyo‑ku, Tokyo, 113‑0033, Japan
  - Department of AI Systems Medicine, M&D Data Science Center, Tokyo Medical and Dental University, Tokyo, 113-8510, Japan
- Hiroyuki Kubota
  - Division of Integrated Omics, Medical Research Center for High Depth Omics, Medical Institute of Bioregulation, Kyushu University, 3-1-1 Maidashi, Higashi-Ku, Fukuoka, Fukuoka, 812-8582, Japan
- Hiromitsu Araki
  - Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka, 812-8582, Japan
- Fumihito Miura
  - Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka, 812-8582, Japan
- Takashi Ito
  - Department of Biochemistry, Kyushu University Graduate School of Medical Sciences, Fukuoka, 812-8582, Japan
- Shinya Kuroda
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, 7-3-1 Hongo, Bunkyo-Ku, Tokyo, 113-0033, Japan.
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa, Chiba, 277-8562, Japan.
  - Molecular Genetics Research Laboratory, Graduate School of Science, University of Tokyo, 7‑3‑1 Hongo, Bunkyo‑ku, Tokyo, 113‑0033, Japan.
4
Sajedi S, Ebrahimi G, Roudi R, Mehta I, Samimi H, Kazempour S, Zainulabadeen A, Docking TR, Arora SP, Cigarroa F, Seshadri S, Karsan A, Zare H. "iNETgrate": integrating DNA methylation and gene expression data in a single gene network. RESEARCH SQUARE 2023:rs.3.rs-3246325. [PMID: 37645739 PMCID: PMC10462231 DOI: 10.21203/rs.3.rs-3246325/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/31/2023]
Abstract
Integrating multi-omics data in one model can increase statistical power. However, designing such a model is challenging because different omics are measured at different levels. We developed the iNETgrate package (https://bioconductor.org/packages/iNETgrate/) that efficiently integrates transcriptome and DNA methylation data in a single gene network. Applying iNETgrate on five independent datasets improved prognostication compared to common clinical gold standards and a patient similarity network approach.
Affiliation(s)
- Sogand Sajedi
  - Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, Texas 78229, USA
  - Glenn Biggs Institute for Alzheimer’s & Neurodegenerative Diseases, San Antonio, Texas 78229, USA
- Ghazal Ebrahimi
  - Bioinformatics Program, the University of British Columbia, Vancouver, BC, Canada
- Raheleh Roudi
  - Department of Radiology, Stanford University School of Medicine, Stanford, California 94305, USA
- Isha Mehta
  - Department of Immunology, University of Pittsburgh, Pittsburgh, Pennsylvania 15213, USA
- Hanie Samimi
  - School of Architecture, University of Utah, Salt Lake City, Utah 84112, USA
- Shiva Kazempour
  - Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, Texas 78229, USA
  - Glenn Biggs Institute for Alzheimer’s & Neurodegenerative Diseases, San Antonio, Texas 78229, USA
- Aamir Zainulabadeen
  - Department of Computer Science, Princeton University, Princeton, New Jersey 08540, USA
- Thomas Roderick Docking
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, British Columbia, V5Z 1L3, Canada
- Sukeshi Patel Arora
  - Mays Cancer Center, The University of Texas Health Science Center, San Antonio, Texas 78229, USA
- Francisco Cigarroa
  - Malu and Carlos Alvarez Center for Transplantation, Hepatobiliary Surgery and Innovation, The University of Texas Health Science Center, San Antonio, Texas 78229, USA
- Sudha Seshadri
  - Glenn Biggs Institute for Alzheimer’s & Neurodegenerative Diseases, San Antonio, Texas 78229, USA
  - Department of Neurology, University of Texas, San Antonio, Texas 78229, USA
  - Department of Neurology, Boston University School of Medicine, Boston, Massachusetts 02139, USA
- Aly Karsan
  - Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Research Centre, Vancouver, British Columbia, V5Z 1L3, Canada
- Habil Zare
  - Department of Cell Systems & Anatomy, The University of Texas Health Science Center, San Antonio, Texas 78229, USA
  - Glenn Biggs Institute for Alzheimer’s & Neurodegenerative Diseases, San Antonio, Texas 78229, USA
5
Bogan SN, Strader ME, Hofmann GE. Associations between DNA methylation and gene regulation depend on chromatin accessibility during transgenerational plasticity. BMC Biol 2023; 21:149. [PMID: 37365578 DOI: 10.1186/s12915-023-01645-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/27/2022] [Accepted: 06/07/2023] [Indexed: 06/28/2023]
Abstract
BACKGROUND: Epigenetic processes are proposed to be a mechanism regulating gene expression during phenotypic plasticity. However, environmentally induced changes in DNA methylation exhibit little-to-no association with differential gene expression in metazoans at a transcriptome-wide level. It remains unexplored whether associations between environmentally induced differential methylation and expression are contingent upon other epigenomic processes such as chromatin accessibility. We quantified methylation and gene expression in larvae of the purple sea urchin Strongylocentrotus purpuratus exposed to different ecologically relevant conditions during gametogenesis (maternal conditioning) and modeled changes in gene expression and splicing resulting from maternal conditioning as functions of differential methylation, incorporating covariates for genomic features and chromatin accessibility. We detected significant interactions between differential methylation, chromatin accessibility, and genic feature type associated with differential expression and splicing. RESULTS: Differential gene body methylation had significantly stronger effects on expression among genes with poorly accessible transcriptional start sites, while baseline transcript abundance influenced the direction of this effect. Transcriptional responses to maternal conditioning were 4-13× more likely when accounting for interactions between methylation and chromatin accessibility, demonstrating that the relationship between differential methylation and gene regulation is partially explained by chromatin state. CONCLUSIONS: DNA methylation likely possesses multiple associations with gene regulation during transgenerational plasticity in S. purpuratus and potentially other metazoans, but its effects are dependent on chromatin accessibility and underlying genic features.
Affiliation(s)
- Samuel N Bogan
  - Department of Ecology, Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, USA.
- Marie E Strader
  - Department of Ecology, Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, USA
  - Department of Biology, Texas A&M University, College Station, USA
- Gretchen E Hofmann
  - Department of Ecology, Evolution and Marine Biology, University of California Santa Barbara, Santa Barbara, USA
6
Inference of epigenetic subnetworks by Bayesian regression with the incorporation of prior information. Sci Rep 2022; 12:20224. [PMID: 36418365 PMCID: PMC9684215 DOI: 10.1038/s41598-022-19879-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2020] [Accepted: 09/06/2022] [Indexed: 11/25/2022]
Abstract
Changes in gene expression are thought to play a crucial role in various types of cancer. With the advance of high-throughput experimental techniques, many genome-wide studies are underway to analyze the underlying mechanisms that may drive these changes. It has been observed that the changes could arise from altered DNA methylation. However, the degree to which epigenetic changes might cause differences in gene expression in cancer is currently unknown. By treating the change in gene expression as the response to altered DNA methylation, we introduce a novel analytical framework to identify epigenetic subnetworks in which the methylation status of a set of highly correlated genes is predictive of the expression of a set of genes. Highly correlated modules are detected as representatives of the regulatory scenario underlying gene expression and DNA methylation, and the dependency between DNA methylation and gene expression is then explored by a Bayesian regression model that incorporates a g-prior, followed by a strategy for optimal predictor subset selection. The subsequent network analysis indicates that the detected epigenetic subnetworks are highly biologically relevant and contain many verified epigenetic causal mechanisms. Moreover, a survival analysis indicates that they might be effective prognostic factors associated with patient survival time.
7
Tulsyan S, Aftab M, Sisodiya S, Khan A, Chikara A, Tanwar P, Hussain S. Molecular basis of epigenetic regulation in cancer diagnosis and treatment. Front Genet 2022; 13:885635. [PMID: 36092905 PMCID: PMC9449878 DOI: 10.3389/fgene.2022.885635] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/28/2022] [Accepted: 07/19/2022] [Indexed: 02/01/2023]
Abstract
Global cancer cases and mortality rates are increasing, which demands efficient biomarkers for accurate screening, detection, diagnosis, and prognosis. Recent studies have demonstrated that variations in epigenetic mechanisms such as aberrant promoter methylation, altered histone modification, and mutations in ATP-dependent chromatin remodelling complexes play an important role in the development of carcinogenic events. The influence of other epigenetic alterations in various cancers has been confirmed with evolving research and the emergence of high-throughput technologies. Therefore, alterations in epigenetic marks may have clinical utility as potential biomarkers for early cancer detection and diagnosis. In this review, the key epigenetic mechanism(s) and their deregulation in cancer etiology are outlined to decipher the future prospects in cancer therapeutics, including precision medicine. This review also attempts to highlight the gaps in epigenetic drug development, with emphasis on integrative analysis of epigenetic biomarkers to establish minimally invasive biomarkers with clinical applications.
Affiliation(s)
- Sonam Tulsyan
  - Division of Cellular and Molecular Diagnostics (Molecular Biology Group), ICMR-National Institute of Cancer Prevention and Research, Noida, India
- Mehreen Aftab
  - Division of Cellular and Molecular Diagnostics (Molecular Biology Group), ICMR-National Institute of Cancer Prevention and Research, Noida, India
- Sandeep Sisodiya
  - Division of Cellular and Molecular Diagnostics (Molecular Biology Group), ICMR-National Institute of Cancer Prevention and Research, Noida, India
  - Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
- Asiya Khan
  - Laboratory Oncology Unit, Dr. B. R. A. Institute Rotary Cancer Hospital, All India Institute of Medical Sciences, New Delhi, India
- Atul Chikara
  - Division of Cellular and Molecular Diagnostics (Molecular Biology Group), ICMR-National Institute of Cancer Prevention and Research, Noida, India
  - Symbiosis School of Biological Sciences, Symbiosis International (Deemed University), Pune, India
- Pranay Tanwar
  - Laboratory Oncology Unit, Dr. B. R. A. Institute Rotary Cancer Hospital, All India Institute of Medical Sciences, New Delhi, India
  - *Correspondence: Showket Hussain; Pranay Tanwar
- Showket Hussain
  - Division of Cellular and Molecular Diagnostics (Molecular Biology Group), ICMR-National Institute of Cancer Prevention and Research, Noida, India
  - *Correspondence: Showket Hussain; Pranay Tanwar
8
Epigenetic regulation of fetal brain development in pig. Gene 2022; 844:146823. [PMID: 35988784 DOI: 10.1016/j.gene.2022.146823] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 03/22/2022] [Revised: 07/27/2022] [Accepted: 08/15/2022] [Indexed: 02/01/2023]
Abstract
How fetal brain development is regulated at the molecular level is not well understood. Due to ethical challenges associated with research on the human fetus, large animals, particularly pigs, are increasingly used to study development and disorders of the fetal brain. The pig fetal brain grows rapidly during the last ~50 days before birth, a period that begins around day 60 (d60) of gestation. What regulates the onset of this accelerated growth is unknown. The current study tests the hypothesis that epigenetic alteration around d60 is involved in the onset of rapid growth of the pig fetal brain. To test this hypothesis, DNA methylation changes in the fetal brain were assessed genome-wide by Enzymatic Methyl-seq (EM-seq) during two gestational periods (GP): d45 vs. d60 (GP1) and d60 vs. d90 (GP2). The cytosine-guanine (CpG) methylation data were analyzed in an integrative manner with RNA-seq data generated from the same brain samples in our earlier study. A neural network-based modeling approach was implemented to learn changes in methylation patterns of the differentially expressed genes and then predict methylation in the brain genome-wide during rapid growth. This approach identified specific methylations that changed in a mutually informative manner during rapid growth of the fetal brain. These methylations were significantly overrepresented in specific genic as well as intergenic features, including CpG islands, introns, and untranslated regions. In addition, sex-biased methylation of known single nucleotide polymorphic sites was identified in the fetal brain during rapid growth.
9
Lombardo SD, Wangsaputra IF, Menche J, Stevens A. Network Approaches for Charting the Transcriptomic and Epigenetic Landscape of the Developmental Origins of Health and Disease. Genes (Basel) 2022; 13:764. [PMID: 35627149 PMCID: PMC9141211 DOI: 10.3390/genes13050764] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 03/04/2022] [Revised: 04/04/2022] [Accepted: 04/13/2022] [Indexed: 02/04/2023]
Abstract
The early developmental phase is of critical importance for human health and disease later in life. To decipher the molecular mechanisms at play, current biomedical research is increasingly relying on large quantities of diverse omics data. The integration and interpretation of the different datasets pose a critical challenge towards the holistic understanding of the complex biological processes that are involved in early development. In this review, we outline the major transcriptomic and epigenetic processes and the respective datasets that are most relevant for studying the periconceptional period. We cover both basic data processing and analysis steps, as well as more advanced data integration methods. A particular focus is given to network-based methods. Finally, we review the medical applications of such integrative analyses.
Affiliation(s)
- Salvo Danilo Lombardo
  - Max Perutz Labs, Department of Structural and Computational Biology, University of Vienna, 1030 Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1030 Vienna, Austria
- Ivan Fernando Wangsaputra
  - Maternal and Fetal Health Research Group, Division of Developmental Biology and Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9WL, UK
- Jörg Menche
  - Max Perutz Labs, Department of Structural and Computational Biology, University of Vienna, 1030 Vienna, Austria
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1030 Vienna, Austria
  - Faculty of Mathematics, University of Vienna, 1030 Vienna, Austria
- Adam Stevens
  - Maternal and Fetal Health Research Group, Division of Developmental Biology and Medicine, Faculty of Biology, Medicine and Health, University of Manchester, Manchester M13 9WL, UK
10
Maity AK, Stone TC, Ward V, Webster AP, Yang Z, Hogan A, McBain H, Duku M, Ho KMA, Wolfson P, Graham DG, Beck S, Teschendorff AE, Lovat LB. Novel epigenetic network biomarkers for early detection of esophageal cancer. Clin Epigenetics 2022; 14:23. [PMID: 35164838 PMCID: PMC8845366 DOI: 10.1186/s13148-022-01243-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 07/28/2021] [Accepted: 02/04/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND: Early detection of esophageal cancer is critical to improve survival. Whilst studies have identified biomarkers, their interpretation and validity are often confounded by cell-type heterogeneity. RESULTS: Here we applied systems-epigenomic and cell-type deconvolution algorithms to a discovery set encompassing RNA-Seq and DNA methylation data from esophageal adenocarcinoma (EAC) patients and matched normal-adjacent tissue, in order to identify robust biomarkers free from the confounding effect of cell-type heterogeneity. We identify 12 gene modules that are epigenetically deregulated in EAC, and are able to validate all 12 modules in 4 independent EAC cohorts. We demonstrate that the epigenetic deregulation is present in the epithelial compartment of EAC tissue. Using single-cell RNA-Seq data we show that one of these modules, a proto-cadherin module centered around CTNND2, is inactivated in Barrett's Esophagus, a precursor lesion to EAC. By measuring DNA methylation in saliva from EAC cases and controls, we identify a chemokine module centered around CCL20, whose methylation patterns in saliva correlate with EAC status. CONCLUSIONS: Given our observations that a CCL20 chemokine network is overactivated in EAC tissue and saliva from EAC patients, and that in independent studies CCL20 has been found to be overactivated in EAC tissue infected with F. nucleatum, a bacterium that normally inhabits the oral cavity, our results highlight the possibility of using DNA methylation (DNAm) measurements in saliva as a proxy for changes occurring in the esophageal epithelium. Both the CTNND2 and CCL20 modules represent novel, promising network biomarkers for EAC that merit further investigation.
Affiliation(s)
- Alok K Maity
  - CAS Key Lab of Computational Biology, Shanghai Institute for Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China
- Timothy C Stone
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Vanessa Ward
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Amy P Webster
  - University of Exeter Medical School, University of Exeter, Exeter, UK
- Zhen Yang
  - Key Laboratory of Medical Epigenetics and Metabolism, Institutes of Biomedical Sciences, Fudan University, Shanghai, 200032, China
- Aine Hogan
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Hazel McBain
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Margaraet Duku
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Kai Man Alexander Ho
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- Paul Wolfson
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
- David G Graham
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK
  - Division of GI Services, University College London Hospitals NHS Foundation Trust, 235 Euston Road, London, NW1 2BU, UK
- Stephan Beck
  - UCL Cancer Institute, University College London, Gower Street, London, WC1E 6BT, UK
- Andrew E Teschendorff
  - CAS Key Lab of Computational Biology, Shanghai Institute for Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031, China.
- Laurence B Lovat
  - Division of Surgery and Interventional Science, University College London, Gower Street, London, WC1E 6BT, UK.
  - Division of GI Services, University College London Hospitals NHS Foundation Trust, 235 Euston Road, London, NW1 2BU, UK.
11
Li C, Gao Z, Su B, Xu G, Lin X. Data analysis methods for defining biomarkers from omics data. Anal Bioanal Chem 2021; 414:235-250. [PMID: 34951658 DOI: 10.1007/s00216-021-03813-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/15/2021] [Revised: 11/26/2021] [Accepted: 11/29/2021] [Indexed: 02/01/2023]
Abstract
Omics mainly includes genomics, epigenomics, transcriptomics, proteomics and metabolomics. The rapid development of omics technology has opened up new ways to study disease diagnosis and prognosis and to define prospective information of complex diseases. Since omics data are usually large and complex, the method used to analyze the data and to define important information is crucial in omics study. In this review, we focus on advances in biomarker discovery methods based on omics data in the last decade, and categorize them as individual feature analysis, combinatorial feature analysis and network analysis. We also discuss the challenges and perspectives in this field.
Affiliation(s)
- Chao Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Zhenbo Gao
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Benzhe Su
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
| | - Guowang Xu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, 116023, Liaoning, China
| | - Xiaohui Lin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China.
| |
|
12
|
Arslan E, Schulz J, Rai K. Machine Learning in Epigenomics: Insights into Cancer Biology and Medicine. Biochim Biophys Acta Rev Cancer 2021; 1876:188588. [PMID: 34245839 PMCID: PMC8595561 DOI: 10.1016/j.bbcan.2021.188588] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 03/09/2021] [Revised: 05/29/2021] [Accepted: 07/02/2021] [Indexed: 02/01/2023]
Abstract
The recent deluge of genome-wide technologies for the mapping of the epigenome and resulting data in cancer samples has provided the opportunity for gaining insights into and understanding the roles of epigenetic processes in cancer. However, the complexity, high-dimensionality, sparsity, and noise associated with these data pose challenges for extensive integrative analyses. Machine Learning (ML) algorithms are particularly suited for epigenomic data analyses due to their flexibility and ability to learn underlying hidden structures. We will discuss four overlapping but distinct major categories under ML: dimensionality reduction, unsupervised methods, supervised methods, and deep learning (DL). We review the preferred use cases of these algorithms in analyses of cancer epigenomics data with the hope to provide an overview of how ML approaches can be used to explore fundamental questions on the roles of epigenome in cancer biology and medicine.
Affiliation(s)
- Emre Arslan
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Jonathan Schulz
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America
| | - Kunal Rai
- Department of Genomic Medicine, MD Anderson Cancer Center, Houston, TX 77030, United States of America.
| |
|
13
|
Dou Z, Ma X. Inferring Functional Epigenetic Modules by Integrative Analysis of Multiple Heterogeneous Networks. Front Genet 2021; 12:706952. [PMID: 34504516 PMCID: PMC8421682 DOI: 10.3389/fgene.2021.706952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/08/2021] [Accepted: 06/29/2021] [Indexed: 02/02/2023] Open
Abstract
Gene expression and methylation are critical biological processes for cells, and how to integrate these heterogeneous data, which is the foundation for revealing the underlying patterns of cancers, has been extensively investigated. The vast majority of current algorithms fuse gene methylation and expression into a single network, failing to fully explore their relations and heterogeneity. To resolve these problems, in this study we define an epigenetic module as a gene set whose members are co-methylated and co-expressed. To address the heterogeneity of the data, we construct separate gene co-expression and co-methylation networks. In this case, the epigenetic module is characterized as a common module in multiple networks. Then, a non-negative matrix factorization-based algorithm (called Ep-jNMF) that jointly clusters the co-expression and co-methylation networks is proposed for discovering the epigenetic modules. Ep-jNMF is more accurate than the baselines on the artificial data. Moreover, Ep-jNMF identifies more biologically meaningful modules, and these modules can predict cancer subtypes. These results indicate that Ep-jNMF is efficient for the integration of expression and methylation data.
Affiliation(s)
- Zengfa Dou
- The 20-th Research Institute, China Electronics Technology Group Corporation, Xi'an, China
| | - Xiaoke Ma
- School of Computer Science and Technology, Xidian University, Xi'an, China
| |
|
14
|
Wang Y, Xia Z, Deng J, Xie X, Gong M, Ma X. TLGP: a flexible transfer learning algorithm for gene prioritization based on heterogeneous source domain. BMC Bioinformatics 2021; 22:274. [PMID: 34433414 PMCID: PMC8386056 DOI: 10.1186/s12859-021-04190-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/15/2021] [Accepted: 05/12/2021] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND Gene prioritization (gene ranking) aims to obtain the centrality of genes, which is critical for cancer diagnosis and therapy since key genes correspond to biomarkers or drug targets. Great efforts have been devoted to the gene ranking problem by exploring the similarity between candidate and known disease-causing genes. However, when the number of disease-causing genes is limited, these methods are not applicable, largely because of their low accuracy. In practice, the number of known disease-causing genes for cancers, particularly rare cancers, is very limited. Therefore, there is a critical need to design effective and efficient algorithms for gene ranking with limited prior disease-causing genes. RESULTS In this study, we propose a transfer learning based algorithm for gene prioritization (called TLGP) in a cancer (target domain) without disease-causing genes by transferring knowledge from other cancers (source domain). The underlying assumption is that knowledge shared by similar cancers improves the accuracy of gene prioritization. Specifically, TLGP first quantifies the similarity between the target and source domain by calculating the affinity matrix for genes. Then, TLGP automatically learns a fusion network for the target cancer by fusing the affinity matrix, pathogenic genes and genomic data of source cancers. Finally, genes in the target cancer are prioritized. The experimental results indicate that the learnt fusion network is more reliable than the gene co-expression network, implying that transferring knowledge from other cancers improves the accuracy of network construction. Moreover, TLGP outperforms state-of-the-art approaches in terms of accuracy, with an improvement of at least 5%. CONCLUSION The proposed model and method provide an effective and efficient strategy for gene ranking by integrating genomic data from various cancers.
Affiliation(s)
- Yan Wang
- School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
- Department of Library, Xidian University, South TaiBai Road, Xi’an, China
| | - Zuheng Xia
- School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
| | - Jingjing Deng
- Department of Computer Science, Swansea University, Bay, UK
| | - Xianghua Xie
- Department of Computer Science, Swansea University, Bay, UK
| | - Maoguo Gong
- School of Electronic Engineering, Xidian University, South TaiBai Road, Xi’an, China
| | - Xiaoke Ma
- School of Computer Science and Technology, Xidian University, South TaiBai Road, Xi’an, China
| |
|
15
|
Mahapatra S, Bhuyan R, Das J, Swarnkar T. Integrated multiplex network based approach for hub gene identification in oral cancer. Heliyon 2021; 7:e07418. [PMID: 34258466 PMCID: PMC8258848 DOI: 10.1016/j.heliyon.2021.e07418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2020] [Revised: 01/27/2021] [Accepted: 06/23/2021] [Indexed: 02/01/2023] Open
Abstract
Background: The incidence of Oral Cancer (OC) is high in Asian countries, where it goes undetected at its early stage. The study of genetics, especially genetic networks, holds great promise in this endeavor. Hub genes in a genetic network are prominent in regulating the whole network structure of genes. Thus, identification of such genes related to specific cancer types can help in reducing the gap in OC prognosis. Methods: Traditional study of network biology is unable to decipher the inter-dependencies within and across diverse biological networks. A multiplex network provides a powerful representation of such systems and encodes much richer information than isolated networks. In this work, we focused on the entire multiplex structure of the genetic network integrating the gene expression profile and DNA methylation profile for OC. Further, hub genes were identified by considering their connectivity in the multiplex structure and in the respective protein-protein interaction (PPI) network as well. Results: 46 hub genes were inferred in our approach with a high prediction accuracy (96%), an outstanding Matthews correlation coefficient (93%) and significant biological implications. Among them, genes PIK3CG, PIK3R5, MYH7, CDC20 and CCL4 were differentially expressed and predominantly enriched in molecular cascades specific to OC. Conclusions: The hub genes identified in this work carry ontological signatures specific to cancer, which may further facilitate improved understanding of the tumorigenesis process and the underlying molecular events. The results indicate the effectiveness of our integrated multiplex network approach for hub gene identification. This work opens an innovative research route for multi-omics biological data analysis.
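For readers unfamiliar with the Matthews correlation coefficient reported in this abstract, a minimal sketch of its standard binary-classification formula follows. The function name and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: return 0.0 when a marginal total is zero,
    # because the coefficient is undefined in that case.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts (assumptions for illustration, not data from the study).
perfect = matthews_corrcoef(tp=10, tn=10, fp=0, fn=0)    # every call correct
inverted = matthews_corrcoef(tp=0, tn=0, fp=10, fn=10)   # every call wrong
mixed = matthews_corrcoef(tp=8, tn=9, fp=1, fn=2)        # a few errors
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when classes are imbalanced.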
Affiliation(s)
- S. Mahapatra
- Department of Computer Application, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
| | - R. Bhuyan
- Department of Oral Pathology & Microbiology, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
| | - J. Das
- Centre for Genomics & Biomedical Informatics, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
| | - T. Swarnkar
- Department of Computer Application, Siksha O Anusandhan Deemed to be University, Bhubaneswar, India
| |
|
16
|
Sarno F, Benincasa G, List M, Barabasi AL, Baumbach J, Ciardiello F, Filetti S, Glass K, Loscalzo J, Marchese C, Maron BA, Paci P, Parini P, Petrillo E, Silverman EK, Verrienti A, Altucci L, Napoli C. Clinical epigenetics settings for cancer and cardiovascular diseases: real-life applications of network medicine at the bedside. Clin Epigenetics 2021; 13:66. [PMID: 33785068 PMCID: PMC8010949 DOI: 10.1186/s13148-021-01047-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 13.0] [Received: 12/10/2020] [Accepted: 03/01/2021] [Indexed: 02/07/2023] Open
Abstract
Despite impressive efforts invested in epigenetic research in the last 50 years, clinical applications are still lacking. Only a few university hospital centers currently use epigenetic biomarkers at the bedside. Moreover, the overall concept of precision medicine is not widely recognized in routine medical practice and the reductionist approach remains predominant in treating patients affected by major diseases such as cancer and cardiovascular diseases. By its very nature, epigenetics is integrative of genetic networks. The study of epigenetic biomarkers has led to the identification of numerous drugs with an increasingly significant role in clinical therapy, especially for cancer patients. Here, we provide an overview of clinical epigenetics within the context of network analysis. We illustrate achievements to date and discuss how we can move from traditional medicine into the era of network medicine (NM), where pathway-informed molecular diagnostics will allow treatment selection following the paradigm of precision medicine.
Affiliation(s)
- Federica Sarno
- Department of Precision Medicine, University of Campania "Luigi Vanvitelli", Napoli, Italy
| | - Giuditta Benincasa
- Department of Advanced Medical and Surgical Sciences (DAMSS), University of Campania "Luigi Vanvitelli", Naples, Italy
| | - Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
| | - Albert-Lazlo Barabasi
- Network Science Institute and Department of Physics, Northeastern University, Boston, MA, USA
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Network and Data Science, Central European University, Budapest, Hungary
| | - Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich, Freising, Germany
- Department of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Chair of Computational Systems Biology, University of Hamburg, Notkestrasse 9, Hamburg, Germany
| | - Fortunato Ciardiello
- Department of Precision Medicine, University of Campania "Luigi Vanvitelli", Napoli, Italy
| | | | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Joseph Loscalzo
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Cinzia Marchese
- Department of Experimental Medicine, Sapienza University of Rome, Rome, Italy
| | - Bradley A Maron
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Paola Paci
- Department of Computer, Control, and Management Engineering, Sapienza University, Rome, Italy
| | - Paolo Parini
- Department of Laboratory Medicine and Department of Medicine, Karolinska Institute and Karolinska University Hospital, Stockholm, Sweden
| | - Enrico Petrillo
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Boston, MA, USA
| | - Edwin K Silverman
- Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
- Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA
| | - Antonella Verrienti
- Department of Translational and Precision Medicine, Sapienza University, Rome, Italy
| | - Lucia Altucci
- Department of Precision Medicine, University of Campania "Luigi Vanvitelli", Napoli, Italy.
| | - Claudio Napoli
- Department of Advanced Medical and Surgical Sciences (DAMSS), University of Campania "Luigi Vanvitelli", Naples, Italy
- Clinical Department of Internal Medicine and Specialistic Units, AOU, University of Campania "Luigi Vanvitelli", Naples, Italy
| |
|
17
|
Li D, Zhang S, Ma X. Dynamic Module Detection in Temporal Attributed Networks of cancers. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2021; 19:2219-2230. [PMID: 33780342 DOI: 10.1109/tcbb.2021.3069441] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 02/02/2023]
Abstract
Tracking the dynamic modules during cancer progression is essential for studying cancer pathogenesis, diagnosis and therapy. However, current algorithms only focus on detecting dynamic modules from temporal cancer networks without integrating the heterogeneous genomic data, thereby resulting in undesirable performance. To address this issue, a novel algorithm (TANMF) is proposed to detect dynamic modules in cancer temporal attributed networks, which integrates the temporal networks and gene attributes. To obtain the dynamic modules, the temporality and gene attributes are incorporated into an overall objective function, which transforms dynamic module detection into an optimization problem. TANMF jointly decomposes the snapshots at two subsequent time steps to obtain the latent features of dynamic modules, where the attributes are fused via regularization. Furthermore, an L1 constraint is imposed to improve the robustness. Experimental results demonstrate that TANMF is more accurate than state-of-the-art methods. When TANMF is applied to breast cancer data, the obtained dynamic modules are more enriched in known pathways and are associated with the survival time of patients. The proposed model and algorithm provide an effective way for the integrative analysis of heterogeneous omics.
|
18
|
Chen X, Ashoor H, Musich R, Wang J, Zhang M, Zhang C, Lu M, Li S. epihet for intra-tumoral epigenetic heterogeneity analysis and visualization. Sci Rep 2021; 11:376. [PMID: 33432081 PMCID: PMC7801679 DOI: 10.1038/s41598-020-79627-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/20/2019] [Accepted: 12/04/2020] [Indexed: 02/01/2023] Open
Abstract
Intra-tumoral epigenetic heterogeneity is an indicator of tumor population fitness and is linked to the deregulation of transcription. However, there is no published computational tool to automate the measurement of intra-tumoral epigenetic allelic heterogeneity. We developed an R/Bioconductor package, epihet, to calculate the intra-tumoral epigenetic heterogeneity and to perform differential epigenetic heterogeneity analysis. Furthermore, epihet can implement a biological network analysis workflow for transforming cancer-specific differential epigenetic heterogeneity loci into cancer-related biological function and clinical biomarkers. Finally, we demonstrated epihet utility on acute myeloid leukemia. We found statistically significant differential epigenetic heterogeneity (DEH) loci compared to normal controls and constructed co-epigenetic heterogeneity network and modules. epihet is available at https://bioconductor.org/packages/release/bioc/html/epihet.html .
Affiliation(s)
- Xiaowen Chen
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032-2374 USA
| | - Haitham Ashoor
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032-2374 USA
| | - Ryan Musich
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032-2374 USA
| | - Jiahui Wang
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032-2374 USA
| | - Mingsheng Zhang
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032-2374 USA
| | - Chao Zhang
- Weill Cornell Medicine, New York, NY USA
| | - Mingyang Lu
- The Jackson Laboratory for Mammalian Genetics, Bar Harbor, ME USA
| | - Sheng Li
- The Jackson Laboratory for Genomic Medicine, 10 Discovery Drive, Farmington, CT 06032-2374 USA
- The Jackson Laboratory Cancer Center, Bar Harbor, ME USA
- Department of Genetics and Genome Sciences, University of Connecticut School of Medicine, Farmington, CT USA
- Department of Computer Science and Engineering, University of Connecticut, Storrs, CT USA
| |
|
19
|
Mukhopadhyay S, Ghosh S, Das D, Arun P, Roy B, Biswas NK, Maitra A, Majumder PP. Application of Random Forest and data integration identifies three dysregulated genes and enrichment of Central Carbon Metabolism pathway in Oral Cancer. BMC Cancer 2020; 20:1219. [PMID: 33317464 PMCID: PMC7737291 DOI: 10.1186/s12885-020-07709-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/26/2020] [Accepted: 12/03/2020] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND Studies of epigenomic alterations associated with diseases primarily focus on methylation profiles of promoter regions of genes, but not of other genomic regions. In our past work (Das et al. 2019) on patients suffering from gingivo-buccal oral cancer - the most prevalent form of cancer among males in India - we have also focused on promoter methylation changes and resultant impact on transcription profiles. Here, we have investigated alterations in non-promoter (gene-body) methylation profiles and have carried out an integrative analysis of gene-body methylation and transcriptomic data of oral cancer patients. METHODS Tumor and adjacent normal tissue samples were collected from 40 patients. Data on methylation in the non-promoter (gene-body) regions of genes and transcriptome profiles were generated and analyzed. Because of the high dimensionality and highly correlated nature of these data, we have used Random Forest (RF) and other data-analytical methods. RESULTS Integrative analysis of non-promoter methylation and transcriptome data revealed significant methylation-driven alterations in some genes that also significantly impact their transcription levels. These changes result in enrichment of the Central Carbon Metabolism (CCM) pathway, primarily by dysregulation of (a) NTRK3, which plays a dual role as an oncogene and a tumor suppressor; (b) SLC7A5 (LAT1), which is a transporter dedicated to essential amino acids, and is overexpressed in cancer cells to meet the increased demand for nutrients that include glucose and essential amino acids; and (c) EGFR, which has been earlier implicated in progression, recurrence, and stemness of oral cancer, but we provide evidence of epigenetic impact on overexpression of this gene for the first time. CONCLUSIONS In rapidly dividing cancer cells, metabolic reprogramming from normal cells takes place to enable enhanced proliferation. Here, we have identified that among oral cancer patients, genes in the CCM pathway - which plays a fundamental role in metabolic reprogramming - are significantly dysregulated because of perturbation of methylation in non-promoter regions of the genome. This result complements our previous result that perturbation of promoter methylation results in significant changes in key genes that regulate the feedback process of DNA methylation for the maintenance of normal cell division.
Affiliation(s)
| | - Sahana Ghosh
- National Institute of Biomedical Genomics, Kalyani, 741251, India
| | - Debodipta Das
- National Institute of Biomedical Genomics, Kalyani, 741251, India
| | - P Arun
- Tata Medical Centre, Kolkata, India
| | - Bidyut Roy
- Indian Statistical Institute, Kolkata, India
| | - Nidhan K Biswas
- National Institute of Biomedical Genomics, Kalyani, 741251, India
| | - Arindam Maitra
- National Institute of Biomedical Genomics, Kalyani, 741251, India
| | - Partha P Majumder
- National Institute of Biomedical Genomics, Kalyani, 741251, India
- Indian Statistical Institute, Kolkata, India.
| |
|
20
|
Ma X, Zhang X, Luo J, Liang B, Peng J, Chen C, Guo H, Wang Q, Xing X, Deng Q, Huang H, Liao Q, Chen W, Hu Q, Yu D, Xiao Y. MiR-486-5p-directed MAGI1/Rap1/RASSF5 signaling pathway contributes to hydroquinone-induced inhibition of erythroid differentiation in K562 cells. Toxicol In Vitro 2020; 66:104830. [DOI: 10.1016/j.tiv.2020.104830] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/14/2019] [Revised: 02/25/2020] [Accepted: 03/16/2020] [Indexed: 02/01/2023]
|
21
|
Mallik S, Qin G, Jia P, Zhao Z. Molecular signatures identified by integrating gene expression and methylation in non-seminoma and seminoma of testicular germ cell tumours. Epigenetics 2020; 16:162-176. [PMID: 32615059 DOI: 10.1080/15592294.2020.1790108] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023] Open
Abstract
Testicular germ cell tumours (TGCTs) are the most common cancer in young male adults (aged 15 to 40). Unlike for most other cancer types, identification of molecular signatures in TGCT has rarely been reported. In this study, we developed a novel integrative analysis framework to identify co-methylated and co-expressed gene [mRNA and microRNA (miRNA)] modules in two TGCT subtypes: non-seminoma (NSE) and seminoma (SE). We first integrated DNA methylation and mRNA/miRNA expression data and then used a statistical method, CoMEx (Combined score of DNA Methylation and Expression), to assess differentially expressed and methylated (DEM) genes/miRNAs. Next, we identified co-methylation and co-expression modules by applying the WGCNA (Weighted Gene Correlation Network Analysis) tool to these DEM genes/miRNAs. The module with the highest average Pearson's Correlation Coefficient (PCC) after considering all pair-wise molecules (genes/miRNAs) included 91 molecules. By integrating both transcription factor and miRNA regulations, we constructed subtype-specific regulatory networks for NSE and SE. We identified four hub miRNAs (miR-182-5p, miR-520b, miR-520c-3p, and miR-7-5p), two hub TFs (MYC and SP1), and two hub genes (RECK and TERT) in the NSE-specific regulatory network, and two hub miRNAs (miR-182-5p and miR-338-3p), five hub TFs (ETS1, HIF1A, HNF1A, MYC, and SP1), and three hub genes (CDH1, CXCR4, and SNAI1) in the SE-specific regulatory network. One miRNA (miR-182-5p) and two TFs (MYC and SP1) were common hubs of NSE and SE. We further examined pathways enriched in these subtype-specific networks. Our study provides a comprehensive view of the molecular signatures and co-regulation in two TGCT subtypes.
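The module-ranking criterion mentioned in this abstract, the average Pearson correlation over all pairs of molecules in a module, can be sketched as follows. This is an illustrative sketch only: the function names, the toy profiles, and the use of signed (rather than absolute) correlations are assumptions, not details confirmed by the paper.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def average_pairwise_pcc(profiles):
    """Mean PCC over all unordered pairs of molecules in one module."""
    pairs = list(combinations(profiles.values(), 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


# Hypothetical modules with made-up expression profiles across four samples.
modules = {
    "module_a": {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 3, 5, 7]},
    "module_b": {"g1": [1, 2, 3, 4], "g2": [4, 3, 2, 1], "g3": [1, 2, 3, 4]},
}
scores = {name: average_pairwise_pcc(m) for name, m in modules.items()}
best_module = max(scores, key=scores.get)
```

In this toy input every pair in `module_a` is perfectly correlated, while `module_b` contains one anti-correlated profile, so `module_a` ranks first.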
Affiliation(s)
- Saurav Mallik
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Guimin Qin
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Peilin Jia
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, USA
| |
|
22
|
Ma X, Sun P, Gong M. An integrative framework of heterogeneous genomic data for cancer dynamic modules based on matrix decomposition. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 19:305-316. [PMID: 32750874 DOI: 10.1109/tcbb.2020.3004808] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/02/2023]
Abstract
Cancer progression is dynamic, and tracking dynamic modules is promising for cancer diagnosis and therapy. Accumulated genomic data provide an opportunity to investigate the underlying mechanisms of cancers. However, as far as we know, no algorithm has been designed for detecting dynamic modules by integrating heterogeneous omics data. To address this issue, we propose an integrative framework for dynamic module detection based on a regularized nonnegative matrix factorization method (DrNMF) that integrates gene expression and the protein interaction network. To remove the heterogeneity of genomic data, we divide the samples of expression profiles into groups to construct gene co-expression networks. To characterize the dynamics of modules, the temporal smoothness framework is adopted, in which the gene co-expression network at the previous stage and the protein interaction network are incorporated into the objective function of DrNMF via regularization. The experimental results demonstrate that DrNMF is superior to state-of-the-art methods in terms of accuracy. For breast cancer data, the obtained dynamic modules are more enriched in known pathways, and can be used to predict the stages of cancers and survival time of patients. The proposed model and algorithm provide an effective integrative analysis of heterogeneous genomic data for cancer progression.
|
23
|
Di Nanni N, Bersanelli M, Milanesi L, Mosca E. Network Diffusion Promotes the Integrative Analysis of Multiple Omics. Front Genet 2020; 11:106. [PMID: 32180795 PMCID: PMC7057719 DOI: 10.3389/fgene.2020.00106] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/31/2019] [Accepted: 01/29/2020] [Indexed: 02/01/2023] Open
Abstract
The development of integrative methods is one of the main challenges in bioinformatics. Network-based methods for the analysis of multiple gene-centered datasets take into account known and/or inferred relations between genes. In the last decades, the mathematical machinery of network diffusion (also referred to as network propagation) has been exploited in several network-based pipelines, thanks to its ability to amplify associations between genes that lie in network proximity. Indeed, network diffusion provides a quantitative estimation of network proximity between genes associated with one or more different data types, from simple binary vectors to real vectors. Therefore, this powerful data transformation method has also been increasingly used in integrative analyses of multiple collections of biological scores and/or one or more interaction networks. We present an overview of the state of the art of bioinformatics pipelines that use network diffusion processes for the integrative analysis of omics data. We discuss the fundamental ways in which network diffusion is exploited, open issues and potential developments in the field. Current trends suggest that network diffusion is a tool of broad utility in omics data analysis. It is reasonable to think that it will continue to be used and further refined as new data types arise (e.g. single cell datasets) and as the identification of system-level patterns is considered more and more important in omics data analysis.
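The diffusion idea summarized in this abstract can be sketched as a random walk with restart, one common formulation of network propagation. The sketch is illustrative only: the function name `network_diffusion`, the restart probability of 0.3, and the four-gene path graph are assumptions for the example and are not taken from the review.

```python
def network_diffusion(adjacency, seed_scores, restart=0.3, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph given as a dense matrix."""
    n = len(adjacency)
    total = sum(seed_scores)
    if total <= 0:
        raise ValueError("seed_scores must contain at least one positive value")
    # Column-normalise the adjacency matrix so each column sums to 1.
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    w = [[adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [s / total for s in seed_scores]  # normalised starting distribution
    p = p0[:]
    for _ in range(max_iter):
        # One step: spread scores to neighbours, then pull back towards the seeds.
        nxt = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * p0[i]
               for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Hypothetical path graph gene0 - gene1 - gene2 - gene3, with gene0 as the only seed.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = network_diffusion(adjacency, seed_scores=[1, 0, 0, 0])
```

In this toy graph the diffused score decays with distance from the seed, which is the sense in which diffusion quantifies network proximity: genes with no initial signal still receive a score that reflects how close they lie to genes that have one.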
Affiliation(s)
- Noemi Di Nanni
- Institute of Biomedical Technologies, National Research Council, Milan, Italy; Department of Industrial and Information Engineering, University of Pavia, Pavia, Italy
| | - Matteo Bersanelli
- Department of Physics and Astronomy, University of Bologna, Bologna, Italy; National Institute of Nuclear Physics (INFN), Bologna, Italy
| | - Luciano Milanesi
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
| | - Ettore Mosca
- Institute of Biomedical Technologies, National Research Council, Milan, Italy
| |
|
24
|
Sanchez R, Mackenzie SA. Integrative Network Analysis of Differentially Methylated and Expressed Genes for Biomarker Identification in Leukemia. Sci Rep 2020; 10:2123. [PMID: 32034170 PMCID: PMC7005804 DOI: 10.1038/s41598-020-58123-2] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/31/2019] [Accepted: 01/07/2020] [Indexed: 02/01/2023] Open
Abstract
Genome-wide DNA methylation and gene expression are commonly altered in pediatric acute lymphoblastic leukemia (PALL). Integrated network analysis of cytosine methylation and expression datasets has the potential to provide deeper insights into the complex disease states and their causes than individual disconnected analyses. With the purpose of identifying reliable cancer-associated methylation signal in gene regions from leukemia patients, we present an integrative network analysis of differentially methylated (DMGs) and differentially expressed genes (DEGs). The application of a novel signal detection-machine learning approach to methylation analysis of whole genome bisulfite sequencing (WGBS) data permitted a high level of methylation signal resolution in cancer-associated genes and pathways. This integrative network analysis approach revealed that gene expression and methylation consistently targeted the same gene pathways relevant to cancer: Pathways in cancer, Ras signaling pathway, PI3K-Akt signaling pathway, and Rap1 signaling pathway, among others. Detected gene hubs and hub sub-networks were integrated by signature loci associated with cancer that include, for example, NOTCH1, RAC1, PIK3CD, BCL2, and EGFR. Statistical analysis disclosed a stochastic deterministic relationship between methylation and gene expression within the set of genes simultaneously identified as DEGs and DMGs, where larger values of gene expression changes were probabilistically associated with larger values of methylation changes. Concordance analysis of the overlap between enriched pathways in DEG and DMG datasets revealed statistically significant agreement between gene expression and methylation changes. These results support the potential identification of reliable and stable methylation biomarkers at genes for cancer diagnosis and prognosis.
Affiliation(s)
- Robersy Sanchez
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA.
| | - Sally A Mackenzie
- Department of Biology, The Pennsylvania State University, University Park, PA, 16802, USA; Department of Plant Science, The Pennsylvania State University, University Park, PA, 16802, USA.
| |
|
25
|
Zhang Y, Kou C, Wang S, Zhang Y. Genome-wide Differential-based Analysis of the Relationship between DNA Methylation and Gene Expression in Cancer. Curr Bioinform 2019. [DOI: 10.2174/1574893614666190424160046] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 02/01/2023]
Abstract
Background: DNA methylation is an epigenetic modification that plays an important role in regulating gene expression. There is evidence that the hypermethylation of promoter regions always causes gene silencing. However, how the methylation patterns of other regions in the genome, such as gene body and 3’UTR, affect gene expression is unknown.
Objective: The study aimed to fully explore the relationship between DNA methylation and expression through genome-wide analysis, which is important for understanding the function of DNA methylation.
Method: In this paper, we develop a heuristic framework to analyze the relationship between the methylation change in different regions and the change in the corresponding gene expression, based on differential analysis.
Results: To understand the function of methylation in different genomic regions, a gene is divided into seven functional regions. By applying the method to five cancer datasets from the Synapse database, it was found that methylated regions with a significant difference between cases and controls were almost uniformly distributed in the seven regions of the genome. Also, the effect of DNA methylation in different regions on gene expression was different. For example, there was a higher percentage of positive relationships in 1stExon, gene body and 3’UTR than in TSS1500 and TSS200. The functional analysis of genes with a significant positive and negative correlation between DNA methylation and gene expression demonstrated the epigenetic mechanism of cancer-associated genes.
Conclusion: Differential-based analysis helps us to recognize the change in DNA methylation and how this change affects the change in gene expression. It provides a basis for further integrating gene expression and DNA methylation data to identify disease-associated biomarkers.
Affiliation(s)
- Yuanyuan Zhang
- School of information and control engineering, Qingdao University of Technology, Qingdao, Shandong, China
| | - Chuanhua Kou
- School of information and control engineering, Qingdao University of Technology, Qingdao, Shandong, China
| | - Shudong Wang
- College of Computer and Communication Engineering, China University of Petroleum (East China), Qingdao, Shandong, China
| | - Yulin Zhang
- College of Mathematics and Systems Science, Shandong University of Science and Technology, Qingdao, Shandong, China
| |
|
26
|
Epsi NJ, Panja S, Pine SR, Mitrofanova A. pathCHEMO, a generalizable computational framework uncovers molecular pathways of chemoresistance in lung adenocarcinoma. Commun Biol 2019; 2:334. [PMID: 31508508 PMCID: PMC6731276 DOI: 10.1038/s42003-019-0572-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/29/2019] [Accepted: 08/01/2019] [Indexed: 02/01/2023] Open
Abstract
Despite recent advances in discovering a wide array of novel chemotherapy agents, identification of patients with poor and favorable chemotherapy response prior to treatment administration remains a major challenge in clinical oncology. To tackle this challenge, we present a generalizable genome-wide computational framework pathCHEMO that uncovers interplay between transcriptomic and epigenomic mechanisms altered in biological pathways that govern chemotherapy response in cancer patients. Our approach is tested on patients with lung adenocarcinoma who received adjuvant standard-of-care doublet chemotherapy (i.e., carboplatin-paclitaxel), identifying seven molecular pathway markers of primary treatment response and demonstrating their ability to predict patients at risk of carboplatin-paclitaxel resistance in an independent patient cohort (log-rank p-value = 0.008, HR = 10). Furthermore, we extend our method to additional chemotherapy-regimens and cancer types to demonstrate its accuracy and generalizability. We propose that our model can be utilized to prioritize patients for specific chemotherapy-regimens as a part of treatment planning. Nusrat Epsi et al. present pathCHEMO, a computational framework for uncovering transcriptomic and epigenomic pathways of chemoresistance in cancer that has the potential to improve clinical decision-making. They apply pathCHEMO to lung adenocarcinoma data from public databases, and identify seven molecular pathways implicated in carboplatin-paclitaxel resistance.
Affiliation(s)
- Nusrat J Epsi
- Department of Health Informatics, Rutgers School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07107 USA
| | - Sukanya Panja
- Department of Health Informatics, Rutgers School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07107 USA
| | - Sharon R Pine
- Departments of Pharmacology and Medicine, Rutgers Cancer Institute of New Jersey, Robert Wood Johnson Medical School, New Brunswick, NJ 08901 USA
| | - Antonina Mitrofanova
- Department of Health Informatics, Rutgers School of Health Professions, Rutgers Biomedical and Health Sciences, Newark, NJ 07107 USA; Rutgers Cancer Institute of New Jersey, Rutgers, The State University of New Jersey, New Brunswick, NJ 08901 USA
| |
|
27
|
Sun S, Lee YR, Enfield B. Hemimethylation Patterns in Breast Cancer Cell Lines. Cancer Inform 2019; 18:1176935119872959. [PMID: 31496635 PMCID: PMC6716185 DOI: 10.1177/1176935119872959] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 08/05/2019] [Accepted: 08/05/2019] [Indexed: 02/01/2023] Open
Abstract
DNA methylation is an epigenetic event that involves adding a methyl group to the cytosine (C) site, especially the one that pairs with a guanine (G) site (ie, CG or CpG site), in a human genome. This event plays an important role in both cancerous and normal cell development. Previous studies often assume symmetric methylation on both DNA strands. However, asymmetric methylation, or hemimethylation (methylation that occurs only on 1 DNA strand), does exist and has been reported in several studies. Due to the limitation of previous DNA methylation sequencing technologies, researchers could only study hemimethylation on specific genes, but the overall genomic hemimethylation landscape remains relatively unexplored. With the development of advanced next-generation sequencing techniques, it is now possible to measure methylation levels on both forward and reverse strands at all CpG sites in an entire genome. Analyzing hemimethylation patterns may potentially reveal regions related to undergoing tumor growth. For our research, we first identify hemimethylated CpG sites in breast cancer cell lines using Wilcoxon signed rank tests. We then identify hemimethylation patterns by grouping consecutive hemimethylated CpG sites based on their methylation states, methylation "M" or unmethylation "U." These patterns include regular (or consecutive) hemimethylation clusters (eg, "MMM" on one strand and "UUU" on another strand) and polarity (or reverse) clusters (eg, "MU" on one strand and "UM" on another strand). Our results reveal that most hemimethylation clusters are the polarity type, and hemimethylation does occur across the entire genome with notably higher numbers in the breast cancer cell lines. The lengths or sizes of most hemimethylation clusters are very short, often less than 50 base pairs. 
After mapping hemimethylation clusters and sites to corresponding genes, we study the functions of these genes and find that several of the highly hemimethylated genes may influence tumor growth or suppression. These genes may also indicate a progressing transition to a new tumor stage.
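The grouping of consecutive hemimethylated CpG sites into regular and polarity clusters can be sketched as below. This is only an illustration of the idea in the abstract, not the authors' procedure: the function name, the `max_gap` spacing rule and the assumption that the reverse strand always carries the opposite state are hypothetical, and the Wilcoxon signed rank step that identifies the hemimethylated sites is assumed to have been run already.

```python
def hemimethylation_clusters(sites, max_gap=50):
    """Group hemimethylated CpG sites and label each group.

    sites:   list of (position, forward_state) sorted by position, where
             forward_state is "M" (methylated) or "U" (unmethylated); the
             reverse strand is assumed to carry the opposite state.
    max_gap: hypothetical maximum spacing (bp) between neighbouring sites
             that still belong to the same cluster.
    """
    groups, current = [], []
    for position, state in sites:
        # start a new group when the next site is too far away
        if current and position - current[-1][0] > max_gap:
            groups.append(current)
            current = []
        current.append((position, state))
    if current:
        groups.append(current)

    labelled = []
    for group in groups:
        if len(group) < 2:
            continue  # a single site is not a cluster
        forward = "".join(state for _, state in group)
        reverse = "".join("U" if state == "M" else "M" for state in forward)
        # same state along the strand: regular; mixed states: polarity
        kind = "regular" if len(set(forward)) == 1 else "polarity"
        labelled.append((group[0][0], group[-1][0], forward, reverse, kind))
    return labelled
```

With this sketch, three nearby sites that are all "M" on the forward strand form a regular cluster ("MMM" against "UUU"), while two nearby sites with states "M" then "U" form a polarity cluster ("MU" against "UM").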
Affiliation(s)
- Shuying Sun
- Department of Mathematics, Texas State University, San Marcos, TX, USA
| | - Yu Ri Lee
- Department of Mathematics, Texas State University, San Marcos, TX, USA
| | - Brittany Enfield
- Global Engineering Systems, Cypress Semiconductor, Austin, TX, USA
| |
|
28
|
Matsubara T, Ochiai T, Hayashida M, Akutsu T, Nacher JC. Convolutional neural network approach to lung cancer classification integrating protein interaction network and gene expression profiles. J Bioinform Comput Biol 2019; 17:1940007. [DOI: 10.1142/s0219720019400079] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 02/01/2023]
Abstract
Deep learning technologies are permeating every field from image and speech recognition to computational and systems biology. However, the application of convolutional neural networks (CNNs) to “omics” data poses some difficulties, such as the processing of complex network structures and their integration with transcriptome data. Here, we propose a CNN approach that combines spectral clustering information processing to classify lung cancer. The developed spectral-convolutional neural network based method succeeds in integrating protein interaction network data and gene expression profiles to classify lung cancer. The computational experiments suggest that, in terms of accuracy, the predictive performance of our proposed method was better than that of other machine learning methods such as SVM or Random Forest. Moreover, the computational results also indicate that the underlying protein network structure helps to improve the predictions. Data and CNN code can be downloaded from the link: https://sites.google.com/site/nacherlab/analysis
Affiliation(s)
- Teppei Matsubara
- Department of Information Science, Faculty of Science, Toho University, Funabashi, Chiba, Japan
| | - Tomoshiro Ochiai
- Department of Social Information Studies, Otsuma Women’s University, Tokyo, Japan
| | - Morihiro Hayashida
- Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, Shimane, Japan
| | - Tatsuya Akutsu
- Bioinformatics Center, Institute for Chemical Research, Kyoto University Uji, Japan
| | - Jose C. Nacher
- Department of Information Science, Faculty of Science, Toho University, Funabashi, Chiba, Japan
| |
|
29
|
Oulas A, Minadakis G, Zachariou M, Sokratous K, Bourdakou MM, Spyrou GM. Systems Bioinformatics: increasing precision of computational diagnostics and therapeutics through network-based approaches. Brief Bioinform 2019; 20:806-824. [PMID: 29186305 PMCID: PMC6585387 DOI: 10.1093/bib/bbx151] [Citation(s) in RCA: 69] [Impact Index Per Article: 13.8] [Received: 07/17/2017] [Revised: 10/17/2017] [Indexed: 02/01/2023] Open
Abstract
Systems Bioinformatics is a relatively new approach, which lies in the intersection of systems biology and classical bioinformatics. It focuses on integrating information across different levels using a bottom-up approach as in systems biology with a data-driven top-down approach as in bioinformatics. The advent of omics technologies has provided the stepping-stone for the emergence of Systems Bioinformatics. These technologies provide a spectrum of information ranging from genomics, transcriptomics and proteomics to epigenomics, pharmacogenomics, metagenomics and metabolomics. Systems Bioinformatics is the framework in which systems approaches are applied to such data, setting the level of resolution as well as the boundary of the system of interest and studying the emerging properties of the system as a whole rather than the sum of the properties derived from the system's individual components. A key approach in Systems Bioinformatics is the construction of multiple networks representing each level of the omics spectrum and their integration in a layered network that exchanges information within and between layers. Here, we provide evidence on how Systems Bioinformatics enhances computational therapeutics and diagnostics, hence paving the way to precision medicine. The aim of this review is to familiarize the reader with the emerging field of Systems Bioinformatics and to provide a comprehensive overview of its current state-of-the-art methods and technologies. Moreover, we provide examples of success stories and case studies that utilize such methods and tools to significantly advance research in the fields of systems biology and systems medicine.
Affiliation(s)
- Anastasis Oulas
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - George Minadakis
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Margarita Zachariou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Kleitos Sokratous
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - Marilena M Bourdakou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| | - George M Spyrou
- Bioinformatics European Research Area Chair, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
| |
|
30
|
Abstract
Background DNA methylation is an epigenetic event that may regulate gene expression. Because of this regulation role, aberrant DNA methylation is often associated with many diseases. Within-sample DNA co-methylation is the similarity of methylation in nearby cytosine sites of a chromosome. It is important to study co-methylation patterns. However, it is not well studied yet, and it is unclear to us what co-methylation patterns normal DNA samples have. Are the co-methylation patterns of the same tissue across several samples different? Are the co-methylation patterns of various tissues of the same sample different? To answer these questions, we conduct analyses using two sets of data: 3-sample-1-tissue (3S1T) and 1-sample-8-tissue (1S8T). Results To study the co-methylation patterns of the two datasets, 3S1T and 1S8T, we investigate the following questions: How often does one methylation state change to other methylation states and how is this change associated with chromosome distance? Based on the 3S1T data, we find there is not significant co-methylation difference among the same spleen tissues of three different samples. However, the analysis results of 1S8T data show that there were significant differences among eight tissues of one sample. For both 3S1T and 1S8T data, we find that the no/low methylation state A and high/full methylation state D tend to remain the same along a chromosome region. We also find that the low/partial methylation state B and partial/high methylation state C tend to change to higher methylation states along a chromosome. Finally, we find that lengths of most co-methylation regions are very short with only a few hundred base pairs. In fact, only a small proportion of methylated regions are longer than 1000 base pairs. Conclusions In this paper, we have addressed a few questions regarding within-sample co-methylation patterns in normal tissues. 
Our statistical analysis results and answers may help researchers to better understand the biological process of DNA methylation. This may pave the way to develop better analysis methods for future methylation research. Electronic supplementary material The online version of this article (10.1186/s13040-019-0198-8) contains supplementary material, which is available to authorized users.
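The question of how often one methylation state changes to another along a chromosome can be illustrated with a small transition tally. The sketch below is an assumption-laden example, not the paper's analysis: the cutoffs separating states A to D, the function names and the use of immediately adjacent sites (ignoring chromosome distance) are all hypothetical.

```python
from collections import Counter


def methylation_state(level, cutoffs=(0.25, 0.5, 0.75)):
    """Map a methylation level in [0, 1] to state A, B, C or D.

    The cutoffs are hypothetical; the abstract only names the states
    (A: no/low, B: low/partial, C: partial/high, D: high/full).
    """
    for label, cutoff in zip("ABC", cutoffs):
        if level < cutoff:
            return label
    return "D"


def transition_proportions(levels):
    """Proportion of each state-to-state change between neighbouring sites."""
    states = [methylation_state(level) for level in levels]
    pair_counts = Counter(zip(states, states[1:]))   # (from, to) pairs
    from_counts = Counter(states[:-1])               # how often each state is left
    return {pair: n / from_counts[pair[0]] for pair, n in pair_counts.items()}
```

Stratifying the same tally by the base-pair distance between neighbouring sites would be one way to relate state changes to chromosome distance, as the abstract discusses.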
|
31
|
Pacini C, Koziol MJ. Bioinformatics challenges and perspectives when studying the effect of epigenetic modifications on alternative splicing. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0073. [PMID: 29685977 PMCID: PMC5915717 DOI: 10.1098/rstb.2017.0073] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Accepted: 09/14/2017] [Indexed: 02/07/2023] Open
Abstract
It is widely known that epigenetic modifications are important in regulating transcription, but several have also been reported in alternative splicing. The regulation of pre-mRNA splicing is important to explain proteomic diversity and the misregulation of splicing has been implicated in many diseases. Here, we give a brief overview of the role of epigenetics in alternative splicing and disease. We then discuss the bioinformatics methods that can be used to model interactions between epigenetic marks and regulators of splicing. These models can be used to identify alternative splicing and epigenetic changes across different phenotypes. This article is part of a discussion meeting issue ‘Frontiers in epigenetic chemical biology’.
Affiliation(s)
- Clare Pacini
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
| | - Magdalena J Koziol
- Wellcome Trust Cancer Research UK Gurdon Institute, University of Cambridge, Tennis Court Road, Cambridge, CB2 1QN, UK; Department of Zoology, University of Cambridge, Downing Street, Cambridge, CB2 3EJ, UK
| |
|
32
|
Jiang Z, Cinti C, Taranta M, Mattioli E, Schena E, Singh S, Khurana R, Lattanzi G, Tsinoremas NF, Capobianco E. Network assessment of demethylation treatment in melanoma: Differential transcriptome-methylome and antigen profile signatures. PLoS One 2018; 13:e0206686. [PMID: 30485296 PMCID: PMC6261551 DOI: 10.1371/journal.pone.0206686] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/20/2018] [Accepted: 10/17/2018] [Indexed: 02/07/2023] Open
Abstract
Background In melanoma, like in other cancers, both genetic and epigenetic alterations underlie the metastatic process. These effects are usually measured by changes in both methylome and transcriptome profiles, whose cross-correlation remains uncertain. We aimed to assess at systems scale the significance of epigenetic treatment in melanoma cells with different metastatic potential. Methods and findings Treatment by DAC demethylation with 5-Aza-2’-deoxycytidine of two melanoma cell lines endowed with different metastatic potential, SKMEL-2 and HS294T, was performed and high-throughput coupled RNA-Seq and RRBS-Seq experiments delivered differential profiles (DiP) of both transcriptomes and methylomes. Methylation levels measured at both TSS and gene body were studied to inspect correlated patterns with wide-spectrum transcript abundance levels quantified in both protein coding and non-coding RNA (ncRNA) regions. The DiP were then mapped onto standard bio-annotation sources (pathways, biological processes) and network configurations were obtained. The prioritized associations for target identification purposes were expected to elucidate the reprogramming dynamics induced by the epigenetic therapy. The interactomic connectivity maps of each cell line were formed to support the analysis of epigenetically re-activated genes, i.e., those supposedly silenced by melanoma. In particular, modular protein interaction networks (PIN) were used, evidencing a limited number of shared annotations, with an example being MAPK13 (cascade of cellular responses evoked by extracellular stimuli). This gene is also a target associated with the PANDAR ncRNA, therapeutically relevant because of its aberrant expression observed in various cancers. Overall, the non-metastatic SKMEL-2 map reveals post-treatment re-activation of a richer pathway landscape, involving cadherins and integrins as signatures of cell adhesion and proliferation.
Relatively more lncRNAs were also annotated, indicating more complex regulation patterns in view of target identification. Finally, the antigen maps matched to DiP display other differential signatures with respect to the metastatic potential of the cell lines. In particular, as demethylated melanomas show connected targets that grow with the increased metastatic potential, also the potential target actionability seems to depend to some degree on the metastatic state. However, caution is required when assessing the direct influence of re-activated genes over the identified targets. In light of the stronger treatment effects observed in non-metastatic conditions, some limitations likely refer to in silico data integration tools and resources available for the analysis of tumor antigens. Conclusion Demethylation treatment strongly affects early melanoma progression by re-activating many genes. This evidence suggests that the efficacy of this type of therapeutic intervention is potentially high at the pre-metastatic stages. The biomarkers that can be assessed through antigens seem informative depending on the metastatic conditions, and networks help to elucidate the assessment of possible targets actionability.
Affiliation(s)
- Zhijie Jiang
- Center for Computational Science, University of Miami, Miami, FL, United States of America
| | | | | | - Elisabetta Mattioli
- CNR Institute of Molecular Genetics, Bologna, Italy
- IRCCS Rizzoli Orthopedic Institute, Bologna, Italy
| | - Elisa Schena
- CNR Institute of Molecular Genetics, Bologna, Italy
- Endocrinology Unit, Department of Medical & Surgical Sciences, Alma Mater Studiorum University of Bologna, S Orsola-Malpighi Hospital, Bologna, Italy
| | - Sakshi Singh
- Institute of Clinical Physiology, CNR, Siena, Italy
| | - Rimpi Khurana
- Center for Computational Science, University of Miami, Miami, FL, United States of America
| | - Giovanna Lattanzi
- CNR Institute of Molecular Genetics, Bologna, Italy
- IRCCS Rizzoli Orthopedic Institute, Bologna, Italy
| | - Nicholas F. Tsinoremas
- Center for Computational Science, University of Miami, Miami, FL, United States of America
- Department of Medicine, University of Miami, Miami, FL, United States of America
| | - Enrico Capobianco
- Center for Computational Science, University of Miami, Miami, FL, United States of America
| |
|
33
|
Li Q, Cassese A, Guindani M, Vannucci M. Bayesian negative binomial mixture regression models for the analysis of sequence count and methylation data. Biometrics 2018; 75:183-192. [DOI: 10.1111/biom.12962] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/01/2017] [Revised: 05/01/2018] [Accepted: 07/01/2018] [Indexed: 02/01/2023]
Affiliation(s)
- Qiwei Li
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, Texas, U.S.A
| | - Alberto Cassese
- Department of Methodology and Statistics, Faculty of Psychology and Neuroscience, Maastricht University, Maastricht, The Netherlands
| | - Michele Guindani
- Department of Statistics, University of California, Irvine, California, U.S.A
| | | |
|
34
|
Kakouri AC, Christodoulou CC, Zachariou M, Oulas A, Minadakis G, Demetriou CA, Votsi C, Zamba-Papanicolaou E, Christodoulou K, Spyrou GM. Revealing Clusters of Connected Pathways Through Multisource Data Integration in Huntington's Disease and Spastic Ataxia. IEEE J Biomed Health Inform 2018; 23:26-37. [PMID: 30176611 DOI: 10.1109/jbhi.2018.2865569] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 02/01/2023]
Abstract
The advancement of scientific and medical research over the past years has generated a wealth of experimental data from multiple technologies, including genomics, transcriptomics, proteomics, and other forms of -omics data, which are available for a number of diseases. The integration of such multisource data is a key component toward the success of precision medicine. In this paper, we are investigating a multisource data integration method developed by our group, regarding its ability to drive to clusters of connected pathways under two different approaches: first, a disease-centric approach, where we integrate data around a disease, and second, a gene-centric approach, where we integrate data around a gene. We have used as a paradigm for the first approach Huntington's disease (HD), a disease with a plethora of available data, whereas for the second approach the GBA2, a gene that is related to spastic ataxia (SA), a phenotype with sparse availability of data. Our paper shows that valuable information at the level of disease-related pathway clusters can be obtained for both HD and SA. New pathways that classical pathway analysis methods were unable to reveal, emerged as necessary "connectors" to build connected pathway stories formed as pathway clusters. The capability to integrate multisource molecular data, concluding to something more than the sum of the existing information, empowers precision and personalized medicine approaches.
|
35
|
Yuan L, Guo LH, Yuan CA, Zhang YH, Han K, Nandi A, Honig B, Huang DS. Integration of Multi-omics Data for Gene Regulatory Network Inference and Application to Breast Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 16:782-791. [PMID: 30137012 DOI: 10.1109/tcbb.2018.2866836] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Indexed: 06/08/2023]
Abstract
Underlying a cancer phenotype is a specific gene regulatory network that represents the complex regulatory relationships between genes. However, it remains a challenge to find cancer-related gene regulatory networks because of insufficient sample sizes and complex regulatory mechanisms in which a gene is influenced not only by other genes but also by other biological factors. The development of high-throughput technologies and the unprecedented wealth of multi-omics data give us a new opportunity to design machine learning methods to investigate the underlying gene regulatory network. In this paper, we propose an approach that uses biweight midcorrelation to measure the correlation between factors and nonconvex penalty based sparse regression for gene regulatory network inference (BMNPGRN). BMNPGRN incorporates multi-omics data (including DNA methylation and copy number variation) and their interactions in the gene regulatory network model. The experimental results on synthetic datasets show that BMNPGRN outperforms popular and state-of-the-art methods (including DCGRN, ARACNE and CLR) under false positive control. Furthermore, we applied BMNPGRN to breast cancer (BRCA) data from The Cancer Genome Atlas database and provided the resulting gene regulatory network.
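For readers unfamiliar with biweight midcorrelation, the following is a minimal sketch of the standard median-based definition. It illustrates only the correlation measure named in the abstract, not the BMNPGRN method itself; the function names are hypothetical.

```python
import numpy as np


def _biweight_normalised(values):
    """Median-centre a vector, down-weight outliers, scale to unit norm."""
    values = np.asarray(values, dtype=float)
    median = np.median(values)
    mad = np.median(np.abs(values - median))  # raw median absolute deviation
    if mad == 0:
        raise ValueError("MAD is zero; biweight midcorrelation is undefined")
    u = (values - median) / (9.0 * mad)
    # Tukey biweight: points with |u| >= 1 receive zero weight
    weights = (1 - u**2) ** 2 * (np.abs(u) < 1)
    centred = (values - median) * weights
    return centred / np.sqrt(np.sum(centred**2))


def bicor(x, y):
    """Biweight midcorrelation between two equal-length vectors."""
    return float(np.sum(_biweight_normalised(x) * _biweight_normalised(y)))
```

Because observations far from the median get zero weight, a single extreme sample that would flip the sign of the Pearson correlation has no influence here, which is a common reason for preferring this measure on noisy omics data.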
|
36
|
Klett H, Balavarca Y, Toth R, Gigic B, Habermann N, Scherer D, Schrotz-King P, Ulrich A, Schirmacher P, Herpel E, Brenner H, Ulrich CM, Michels KB, Busch H, Boerries M. Robust prediction of gene regulation in colorectal cancer tissues from DNA methylation profiles. Epigenetics 2018; 13:386-397. [PMID: 29697014 PMCID: PMC6140810 DOI: 10.1080/15592294.2018.1460034] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/18/2018] [Revised: 03/19/2018] [Accepted: 03/27/2018] [Indexed: 02/01/2023] Open
Abstract
DNA methylation is recognized as one of several epigenetic regulators of gene expression and as potential driver of carcinogenesis through gene-silencing of tumor suppressors and activation of oncogenes. However, abnormal methylation, even of promoter regions, does not necessarily alter gene expression levels, especially if the gene is already silenced, leaving the exact mechanisms of methylation unanswered. Using a large cohort of matching DNA methylation and gene expression samples of colorectal cancer (CRC; n = 77) and normal adjacent mucosa tissues (n = 108), we investigated the regulatory role of methylation on gene expression. We show that on a subset of genes enriched in common cancer pathways, methylation is significantly associated with gene regulation through gene-specific mechanisms. We built two classification models to infer gene regulation in CRC from methylation differences of tumor and normal tissues, taking into account both gene-silencing and gene-activation effects through hyper- and hypo-methylation of CpGs. The classification models result in high prediction performances in both training and independent CRC testing cohorts (0.92
Affiliation(s)
- Hagen Klett
- German Cancer Consortium (DKTK), Heidelberg, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine and Medical Center, University of Freiburg, Germany
| | - Yesilda Balavarca
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Reka Toth
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Biljana Gigic
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of General, Visceral and Transplantation Surgery, University Clinic Heidelberg, Heidelberg, Germany
| | - Nina Habermann
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Dominique Scherer
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany
| | - Petra Schrotz-King
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Alexis Ulrich
- Department of General, Visceral and Transplantation Surgery, University Clinic Heidelberg, Heidelberg, Germany
| | - Peter Schirmacher
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Institute of Pathology, University Clinic Heidelberg, Heidelberg, Germany
| | - Esther Herpel
- Institute of Pathology, University Clinic Heidelberg, Heidelberg, Germany
- Tissue Bank of the National Center for Tumor Diseases (NCT) Heidelberg, Germany
| | - Hermann Brenner
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Cornelia M. Ulrich
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
| | - Karin B. Michels
- Institute for Prevention and Cancer Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, CA, USA
| | - Hauke Busch
- Lübeck Institute of Experimental Dermatology and Institute of Cardiogenetics, University of Lübeck, Lübeck, Germany
| | - Melanie Boerries
- German Cancer Consortium (DKTK), Heidelberg, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine and Medical Center, University of Freiburg, Germany
| |
|
37
|
Ma X, Sun P, Zhang ZY. An Integrative Framework for Protein Interaction Network and Methylation Data to Discover Epigenetic Modules. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 16:1855-1866. [PMID: 29994031 DOI: 10.1109/tcbb.2018.2831666] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 02/02/2023]
Abstract
DNA methylation is a critical epigenetic modification that plays an important role in cancers. The available algorithms fail to fully characterize epigenetic modules. To address this issue, we first characterize the epigenetic module as a group of genes that are well connected in the protein interaction network and also co-methylated based on gene methylation profiles. The epigenetic module discovery problem is then transformed into an optimization problem, and a regularized nonnegative matrix factorization algorithm for methylation modules (RNMF-MM) is presented, where the co-methylation constraint is treated as a regularizer. Using artificial networks with known module structure, we demonstrate that the proposed algorithm outperforms state-of-the-art approaches in terms of accuracy. On the basis of breast cancer methylation data and a protein interaction network, the RNMF-MM algorithm discovers methylation modules that are significantly more enriched in known pathways than those obtained by other algorithms. These modules serve as biomarkers for predicting cancer stages and estimating survival time of patients. The proposed model and algorithm provide an effective way for the integrative analysis of protein interaction network and methylation data.
|
38
|
Zachariou M, Minadakis G, Oulas A, Afxenti S, Spyrou GM. Integrating multi-source information on a single network to detect disease-related clusters of molecular mechanisms. J Proteomics 2018; 188:15-29. [PMID: 29545169 DOI: 10.1016/j.jprot.2018.03.009] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 11/22/2017] [Revised: 02/27/2018] [Accepted: 03/05/2018] [Indexed: 02/08/2023]
Abstract
The abundance of available information for each disease from multiple sources (e.g. genetic, regulatory, metabolic, and protein-protein interaction) constitutes both an advantage and a challenge in identifying disease-specific underlying mechanisms. Integration of multi-source data is a rising topic and a great challenge in precision medicine and is crucial in enhancing disease understanding, identifying meaningful clusters of molecular mechanisms and increasing precision and personalisation towards the goal of Predictive, Preventive and Personalised Medicine (PPPM). The overall aim of this work was to develop a novel network-based integration methodology with the following characteristics: (i) maximise the number of data sources, (ii) utilise holistic approaches to integrate these sources, (iii) be simple, flexible and extendable, (iv) be conclusive. Here, we present the case of Alzheimer's disease as a paradigm for illustrating our novel approach. SIGNIFICANCE: In this work we present an integration methodology, which aggregates a large number of the available data sources and types by exploiting the holistic nature of network approaches. It is simple, flexible and extendable, generating solid conclusions regarding the molecular mechanisms that underlie the input data. We have illustrated the strength of our proposed methodology using Alzheimer's disease as a paradigm. This method is expected to serve as a stepping-stone for further development of integration methods of multi-source omic-data and to contribute to progress towards the goal of Predictive, Preventive and Personalised Medicine (PPPM). The output of this methodology may act as a reference map of implicated pathways in the disease under investigation, where pathways related to additional omics data from any kind of experiment may be projected. This will increase the precision in the understanding of the disease and may contribute to personalised approaches for patients with different disease-related pathway profiles, leading to a more precise, personalised and ideally preventive management of the disease.
Affiliation(s)
- Margarita Zachariou
- The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, P.O.Box 23462, 2370 Nicosia, Cyprus
| | - George Minadakis
- The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, P.O.Box 23462, 2370 Nicosia, Cyprus
| | - Anastasis Oulas
- The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, P.O.Box 23462, 2370 Nicosia, Cyprus
| | - Sotiroula Afxenti
- The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, P.O.Box 23462, 2370 Nicosia, Cyprus
| | - George M Spyrou
- The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, P.O.Box 23462, 2370 Nicosia, Cyprus.
| |
|
39
|
Gampenrieder SP, Rinnerthaler G, Hackl H, Pulverer W, Weinhaeusel A, Ilic S, Hufnagl C, Hauser-Kronberger C, Egle A, Risch A, Greil R. DNA Methylation Signatures Predicting Bevacizumab Efficacy in Metastatic Breast Cancer. Theranostics 2018; 8:2278-2288. [PMID: 29721079 PMCID: PMC5928889 DOI: 10.7150/thno.23544] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 10/29/2017] [Accepted: 12/08/2017] [Indexed: 02/01/2023] Open
Abstract
Background: Biomarkers predicting response to bevacizumab in breast cancer are still missing. Since epigenetic modifications can contribute to an aberrant regulation of angiogenesis and treatment resistance, we investigated the influence of DNA methylation patterns on bevacizumab efficacy. Methods: Genome-wide methylation profiling using the Illumina Infinium HumanMethylation450 BeadChip was performed in archival FFPE specimens of 36 patients with HER2-negative metastatic breast cancer treated with chemotherapy in combination with bevacizumab as first-line therapy (learning set). Based on objective response and progression-free survival (PFS) and considering ER expression, patients were divided into responders (R) and non-responders (NR). Significantly differentially methylated gene loci (CpGs) with a strong change in methylation levels (Δβ>0.15 or Δβ<-0.15) between R and NR were identified and further investigated in 80 bevacizumab-treated breast cancer patients (optimization set) and in 15 patients treated with chemotherapy alone (control set) using targeted deep amplicon bisulfite sequencing. Methylated gene loci were considered predictive if there was a significant association with outcome (PFS) in the optimization set but not in the control set using Spearman rank correlation, Cox regression, and logrank test. Results: Differentially methylated loci in 48 genes were identified, allowing a good separation between R and NR (odds ratio (OR) 101, p<0.0001). Methylation of at least one cytosine in 26 gene-regions was significantly associated with PFS in the optimization set, but not in the control set. Using information from the optimization set, the panel was reduced to a 9-gene signature, which could divide patients from the learning set into 2 clusters, thereby predicting response with an OR of 40 (p<0.001) and an AUC of 0.91 (LOOCV). A further restricted 3-gene methylation model showed a significant association of predicted responders with longer PFS in the learning and optimization sets, even in multivariate analysis, with an excellent and good separation of R and NR (AUC=0.94 and AUC=0.86, respectively). Conclusion: Both a 9-gene and a 3-gene methylation signature can discriminate between R and NR to a bevacizumab-based therapy in MBC and could help identify patients deriving greater benefit from bevacizumab.
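The effect-size filter quoted in the abstract above (Δβ>0.15 or Δβ<-0.15 between responders and non-responders) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `select_dm_cpgs`, the dictionary layout, and the CpG identifiers are hypothetical, Δβ is taken as the difference of group means, and the significance testing that the study also applied is deliberately left out.

```python
def select_dm_cpgs(beta_r, beta_nr, threshold=0.15):
    """Keep CpGs whose mean beta value differs by more than `threshold`.

    beta_r / beta_nr map a CpG identifier to a list of beta values (0..1)
    for responders and non-responders. Returns {cpg: delta_beta}, where
    delta_beta = mean(responders) - mean(non-responders).
    """
    selected = {}
    # Only CpGs measured in both groups can be compared.
    for cpg in sorted(beta_r.keys() & beta_nr.keys()):
        mean_r = sum(beta_r[cpg]) / len(beta_r[cpg])
        mean_nr = sum(beta_nr[cpg]) / len(beta_nr[cpg])
        delta = mean_r - mean_nr
        if abs(delta) > threshold:
            selected[cpg] = delta
    return selected


# Made-up beta values for three hypothetical CpG sites.
responders = {"cpg_a": [0.80, 0.70], "cpg_b": [0.40, 0.50], "cpg_c": [0.10, 0.20]}
non_responders = {"cpg_a": [0.50, 0.50], "cpg_b": [0.45, 0.50], "cpg_c": [0.60, 0.50]}
print(select_dm_cpgs(responders, non_responders))
```

With these toy values, `cpg_a` is hypermethylated and `cpg_c` hypomethylated in responders, while `cpg_b` falls inside the threshold and is dropped.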
|
40
|
Yan KK, Zhao H, Pang H. A comparison of graph- and kernel-based -omics data integration algorithms for classifying complex traits. BMC Bioinformatics 2017; 18:539. [PMID: 29212468 PMCID: PMC6389230 DOI: 10.1186/s12859-017-1982-4] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 06/05/2017] [Accepted: 11/26/2017] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND: High-throughput sequencing data are widely collected and analyzed in the study of complex diseases in quest of improving human health. Well-studied algorithms mostly deal with a single data source and cannot fully utilize the potential of these multi-omics data sources. In order to provide a holistic understanding of human health and diseases, it is necessary to integrate multiple data sources. Several algorithms have been proposed so far; however, a comprehensive comparison of data integration algorithms for classification of binary traits is currently lacking. RESULTS: In this paper, we focus on two common classes of integration algorithms: graph-based algorithms, which depict relationships with subjects denoted by nodes and relationships denoted by edges, and kernel-based algorithms, which can generate a classifier in feature space. Our paper provides a comprehensive comparison of their performance in terms of various measurements of classification accuracy and computation time. Seven different integration algorithms, including graph-based semi-supervised learning, graph sharpening integration, composite association network, Bayesian network, semi-definite programming-support vector machine (SDP-SVM), relevance vector machine (RVM) and Ada-boost relevance vector machine, are compared and evaluated with hypertension and two cancer data sets in our study. In general, kernel-based algorithms create more complex models and require longer computation time, but they tend to perform better than graph-based algorithms. Graph-based algorithms have the advantage of being faster computationally. CONCLUSIONS: The empirical results demonstrate that composite association network, relevance vector machine, and Ada-boost RVM are the better performers. We provide recommendations on how to choose an appropriate algorithm for integrating data from multiple sources.
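To make the graph-based family in the abstract above more concrete, here is a minimal sketch of graph-based semi-supervised learning on an integrated subject graph. It is an illustration only, assuming a simple unweighted average of per-omics similarity graphs and the standard label-propagation iteration f ← αSf + (1−α)y with S the symmetrically normalised adjacency; the function names and the toy graphs are hypothetical and do not reproduce any of the seven compared implementations.

```python
from math import sqrt


def average_graphs(graphs):
    """Combine several n x n subject-similarity matrices by simple averaging."""
    n = len(graphs[0])
    return [[sum(g[i][j] for g in graphs) / len(graphs) for j in range(n)]
            for i in range(n)]


def propagate_labels(w, y, alpha=0.8, iters=200):
    """Label propagation: f <- alpha * S f + (1 - alpha) * y.

    w is a symmetric nonnegative adjacency matrix; y holds +1 / -1 for
    labelled subjects and 0 for unlabelled ones. Returns one score per
    subject; the sign is the predicted class.
    """
    n = len(w)
    deg = [sum(row) for row in w]
    # Symmetric normalisation S = D^(-1/2) W D^(-1/2).
    s = [[w[i][j] / sqrt(deg[i] * deg[j]) if deg[i] > 0 and deg[j] > 0 else 0.0
          for j in range(n)] for i in range(n)]
    f = [float(v) for v in y]
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
             for i in range(n)]
    return f


def clique_pair(bridge):
    """Toy graph: subjects 0-2 and 3-5 form two cliques joined by edge 2-3."""
    g = [[0.0] * 6 for _ in range(6)]
    for block in ((0, 1, 2), (3, 4, 5)):
        for i in block:
            for j in block:
                if i != j:
                    g[i][j] = 1.0
    g[2][3] = g[3][2] = bridge
    return g


# Two made-up omics layers; only subjects 0 and 5 carry known labels.
combined = average_graphs([clique_pair(0.0), clique_pair(0.2)])
scores = propagate_labels(combined, [1, 0, 0, 0, 0, -1])
print([round(v, 3) for v in scores])
```

Averaging is the simplest possible integration rule; weighted combinations of the layers are a natural variation on the same sketch.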
Affiliation(s)
- Kang K Yan
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
| | - Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, CT, USA
| | - Herbert Pang
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
| |
|
41
|
Statistical and integrative system-level analysis of DNA methylation data. Nat Rev Genet 2017; 19:129-147. [PMID: 29129922 DOI: 10.1038/nrg.2017.86] [Citation(s) in RCA: 183] [Impact Index Per Article: 26.1] [Indexed: 02/01/2023]
Abstract
Epigenetics plays a key role in cellular development and function. Alterations to the epigenome are thought to capture and mediate the effects of genetic and environmental risk factors on complex disease. Currently, DNA methylation is the only epigenetic mark that can be measured reliably and genome-wide in large numbers of samples. This Review discusses some of the key statistical challenges and algorithms associated with drawing inferences from DNA methylation data, including cell-type heterogeneity, feature selection, reverse causation and system-level analyses that require integration with other data types such as gene expression, genotype, transcription factor binding and other epigenetic information.
|
42
|
Ma X, Yu L, Wang P, Yang X. Discovering DNA methylation patterns for long non-coding RNAs associated with cancer subtypes. Comput Biol Chem 2017; 69:164-170. [PMID: 28501295 DOI: 10.1016/j.compbiolchem.2017.03.014] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 03/24/2017] [Revised: 03/28/2017] [Accepted: 03/28/2017] [Indexed: 02/01/2023]
Abstract
Despite growing evidence that long non-coding ribonucleic acids (lncRNAs) are critical modulators for cancers, knowledge about the DNA methylation patterns of lncRNAs is quite limited. We develop a systematic analysis pipeline to discover DNA methylation patterns for lncRNAs across multiple cancer subtypes at the probe, gene and network levels. Using The Cancer Genome Atlas (TCGA) breast cancer methylation data, the pipeline discovers various DNA methylation patterns for lncRNAs across four major subtypes: luminal A, luminal B, HER2-enriched and basal-like. At the probe and gene levels, we find that both differentially methylated probes and lncRNAs are subtype specific, although the lncRNAs are not as specific as the probes. At the network level, the pipeline constructs a differential co-methylation lncRNA network for each subtype. Then, it identifies both subtype-specific and common lncRNA modules by simultaneously analyzing multiple networks. We show that the lncRNAs in subtype-specific and common modules differ greatly in terms of topological structure, sequence conservation and expression. Furthermore, the subtype-specific lncRNA modules serve as biomarkers that significantly improve the accuracy of breast cancer subtype prediction. Finally, the common lncRNA modules are associated with the survival time of patients, which is critical for cancer therapy.
Affiliation(s)
- Xiaoke Ma
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Road, Xi'an, Shaanxi, China; Xidian-Ningbo Information Technology Institute, Xidian University, No. 777 Zhongguanxi Road, Ningbo City, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Road, Xi'an, Shaanxi, China
| | - Peizhuo Wang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Road, Xi'an, Shaanxi, China
| | - Xiaofei Yang
- School of Computer Science and Technology, Xidian University, No.2 South Taibai Road, Xi'an, Shaanxi, China
| |
|