1
Guerin LN, Scott TJ, Yap JA, Johansson A, Puddu F, Charlesworth T, Yang Y, Simmons AJ, Lau KS, Ihrie RA, Hodges E. Temporally discordant chromatin accessibility and DNA demethylation define short and long-term enhancer regulation during cell fate specification. bioRxiv 2024:2024.08.27.609789. [PMID: 39253426 PMCID: PMC11383056 DOI: 10.1101/2024.08.27.609789] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/11/2024]
Abstract
Epigenetic mechanisms govern the transcriptional activity of lineage-specifying enhancers, but recent work challenges the dogma that joint chromatin accessibility and DNA demethylation are prerequisites for transcription. To understand this paradox, we established a highly resolved timeline of DNA demethylation, chromatin accessibility, and transcription factor occupancy during neural progenitor cell differentiation. We show that thousands of enhancers undergo rapid, transient accessibility changes associated with distinct periods of transcription factor expression. However, most DNA methylation changes are unidirectional and delayed relative to chromatin dynamics, creating transiently discordant epigenetic states. Genome-wide detection of 5-hydroxymethylcytosine further revealed that active demethylation begins ahead of chromatin and transcription factor activity, while enhancer hypomethylation persists long after these activities have dissipated. We demonstrate that these timepoint-specific methylation states predict past, present, and future chromatin accessibility using machine learning models. Thus, chromatin and DNA methylation collaborate on different timescales to mediate short and long-term enhancer regulation during cell fate specification.
Affiliation(s)
- Lindsey N. Guerin
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN, USA
- Timothy J. Scott
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN, USA
- Jacqueline A. Yap
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN, USA
- Fabio Puddu
  - biomodal, Chesterford Research Park, Cambridge, UK
- Yilin Yang
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Alan J. Simmons
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, TN, USA
- Ken S. Lau
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Epithelial Biology Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Program in Chemical and Physical Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
- Rebecca A. Ihrie
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Department of Neurological Surgery, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Brain Institute, Vanderbilt University School of Medicine, Nashville, TN, USA
- Emily Hodges
  - Department of Biochemistry, Vanderbilt University School of Medicine, Nashville, TN, USA
  - Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, TN, USA
  - Vanderbilt Genetics Institute, Vanderbilt University School of Medicine, Nashville, TN, USA
2
Labani M, Beheshti A, O’Brien TA. GENet: A Graph-Based Model Leveraging Histone Marks and Transcription Factors for Enhanced Gene Expression Prediction. Genes (Basel) 2024; 15:938. [PMID: 39062717 PMCID: PMC11275947 DOI: 10.3390/genes15070938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2024] [Revised: 07/16/2024] [Accepted: 07/17/2024] [Indexed: 07/28/2024] [Open Access]
Abstract
Understanding the regulatory mechanisms of gene expression is a crucial objective in genomics. Although the DNA sequence near the transcription start site (TSS) offers valuable insights, recent methods suggest that analyzing only the surrounding DNA may not suffice to accurately predict gene expression levels. We developed GENet (Gene Expression Network from Histone and Transcription Factor Integration), a novel approach that integrates essential regulatory signals from transcription factors and histone modifications into a graph-based model. GENet extends beyond simple DNA sequence analysis by incorporating additional layers of genetic control, which are vital for determining gene expression. Our method markedly enhances the prediction of mRNA levels compared to previous models that depend solely on DNA sequence data. The results underscore the significance of including comprehensive regulatory information in gene expression studies. GENet emerges as a promising tool for researchers, with potential applications extending from fundamental biological research to the development of medical therapies.
Affiliation(s)
- Mahdieh Labani
  - School of Computing, Macquarie University, Sydney 2109, Australia
- Amin Beheshti
  - School of Computing, Macquarie University, Sydney 2109, Australia
- Tracey A. O’Brien
  - School of Computing, Macquarie University, Sydney 2109, Australia
  - Cancer Institute NSW, Sydney 2065, Australia
  - School of Clinical Medicine, Medicine & Health, University of New South Wales (UNSW), Sydney 2052, Australia
3
Sakellaropoulos T, Do C, Jiang G, Cova G, Meyn P, Dimartino D, Ramaswami S, Heguy A, Tsirigos A, Skok JA. MethNet: a robust approach to identify regulatory hubs and their distal targets from cancer data. Nat Commun 2024; 15:6027. [PMID: 39025865 PMCID: PMC11258126 DOI: 10.1038/s41467-024-50380-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/19/2023] [Accepted: 07/09/2024] [Indexed: 07/20/2024] [Open Access]
Abstract
Aberrations in the capacity of DNA/chromatin modifiers and transcription factors to bind non-coding regions can lead to changes in gene regulation and impact disease phenotypes. However, identifying distal regulatory elements and connecting them with their target genes remains challenging. Here, we present MethNet, a pipeline that integrates large-scale DNA methylation and gene expression data across multiple cancers, to uncover cis regulatory elements (CREs) in a 1 Mb region around every promoter in the genome. MethNet identifies clusters of highly ranked CREs, referred to as 'hubs', which contribute to the regulation of multiple genes and significantly affect patient survival. Promoter-capture Hi-C confirmed that highly ranked associations involve physical interactions between CREs and their gene targets, and CRISPR interference based single-cell RNA Perturb-seq validated the functional impact of CREs. Thus, MethNet-identified CREs represent a valuable resource for unraveling complex mechanisms underlying gene expression, and for prioritizing the verification of predicted non-coding disease hotspots.
Affiliation(s)
- Theodore Sakellaropoulos
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Catherine Do
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Guimei Jiang
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Giulia Cova
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Peter Meyn
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Dacia Dimartino
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Sitharam Ramaswami
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Adriana Heguy
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Aristotelis Tsirigos
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
  - Applied Bioinformatics Laboratories, Office of Science & Research, NYU Grossman School of Medicine, New York, NY, USA
- Jane A Skok
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
4
Gonzalez-Avalos E, Onodera A, Samaniego-Castruita D, Rao A, Ay F. Predicting gene expression state and prioritizing putative enhancers using 5hmC signal. Genome Biol 2024; 25:142. [PMID: 38825692 PMCID: PMC11145787 DOI: 10.1186/s13059-024-03273-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 05/11/2024] [Indexed: 06/04/2024] [Open Access]
Abstract
BACKGROUND: Like its parent base 5-methylcytosine (5mC), 5-hydroxymethylcytosine (5hmC) is a direct epigenetic modification of cytosines in the context of CpG dinucleotides. 5hmC is the most abundant oxidized form of 5mC, generated through the action of TET dioxygenases at gene bodies of actively-transcribed genes and at active or lineage-specific enhancers. Although such enrichments are reported for 5hmC, to date, predictive models of gene expression state or putative regulatory regions for genes using 5hmC have not been developed.
RESULTS: Here, by using only 5hmC enrichment in genic regions and their vicinity, we develop neural network models that predict gene expression state across 49 cell types. We show that our deep neural network models distinguish high vs low expression state utilizing only 5hmC levels and these predictive models generalize to unseen cell types. Further, in order to leverage 5hmC signal in distal enhancers for expression prediction, we employ an Activity-by-Contact model and also develop a graph convolutional neural network model with both utilizing Hi-C data and 5hmC enrichment to prioritize enhancer-promoter links. These approaches identify known and novel putative enhancers for key genes in multiple immune cell subsets.
CONCLUSIONS: Our work highlights the importance of 5hmC in gene regulation through proximal and distal mechanisms and provides a framework to link it to genome function. With the recent advances in 6-letter DNA sequencing by short and long-read techniques, profiling of 5mC and 5hmC may be done routinely in the near future, hence, providing a broad range of applications for the methods developed here.
Affiliation(s)
- Edahi Gonzalez-Avalos
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Atsushi Onodera
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Department of Immunology, Graduate School of Medicine, Chiba University, Chiba, 260-8670, Japan
- Daniela Samaniego-Castruita
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Biological Sciences Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
- Anjana Rao
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Pharmacology, University of California San Diego, La Jolla, CA, 92093, USA
  - Sanford Consortium for Regenerative Medicine, La Jolla, CA, 92093, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
- Ferhat Ay
  - La Jolla Institute for Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California San Diego, La Jolla, CA, 92093, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, CA, 92093, USA
  - Department of Pediatrics, University of California San Diego, La Jolla, CA, 92093, USA
5
Roig-Genoves JV, García-Giménez JL, Mena-Molla S. A miRNA-based epigenetic molecular clock for biological skin-age prediction. Arch Dermatol Res 2024; 316:326. [PMID: 38822910 PMCID: PMC11144124 DOI: 10.1007/s00403-024-03129-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/25/2024] [Revised: 04/27/2024] [Accepted: 05/02/2024] [Indexed: 06/03/2024]
Abstract
Skin aging is one of the visible characteristics of the aging process in humans. In recent years, different biological clocks have been generated based on protein or epigenetic markers, but few have focused on biological age in the skin. Arresting the aging process, or even restoring an organism from an older to a younger stage, has been one of the main challenges in biomedical research over the last 20 years. We have implemented several machine learning models, including regression and classification algorithms, to create an epigenetic molecular clock based on miRNA expression profiles of healthy subjects to predict skin-related biological age. Our best models are capable of classifying skin samples according to age groups (18-28, 29-39, 40-50, 51-60, or 61-83 years old) with an accuracy of 80%, or of predicting age with a mean absolute error of 10.89 years, using the expression levels of 1856 unique miRNAs. Our results suggest that this kind of epigenetic clock is a promising tool with several applications in the pharmaco-cosmetic industry.
Affiliation(s)
- José Luis García-Giménez
  - Consortium Center for Biomedical Network Research on Rare Diseases (CIBERER), Institute of Health Carlos III, Valencia, 46010, Spain
  - INCLIVA Health Research Institute, INCLIVA, Valencia, 46010, Spain
  - EpiDisease S.L (Spin-off from the CIBER-ISCIII), Parc Científic de la Universitat de Valencia, Paterna, 46980, Spain
  - Department of Physiology, Faculty of Pharmacy, University of Valencia, Burjassot, 46100, Spain
- Salvador Mena-Molla
  - INCLIVA Health Research Institute, INCLIVA, Valencia, 46010, Spain
  - EpiDisease S.L (Spin-off from the CIBER-ISCIII), Parc Científic de la Universitat de Valencia, Paterna, 46980, Spain
  - Department of Physiology, Faculty of Pharmacy, University of Valencia, Burjassot, 46100, Spain
6
Okwori M, Eslami A. Feature engineering from meta-data for prediction of differentially expressed genes: An investigation of Mus musculus exposed to space-conditions. Comput Biol Chem 2024; 109:108026. [PMID: 38335853 DOI: 10.1016/j.compbiolchem.2024.108026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Revised: 12/29/2023] [Accepted: 02/02/2024] [Indexed: 02/12/2024]
Abstract
Transcription profiling is a key process that can reveal the biological mechanisms driving the response to various exposure conditions or gene perturbations. In this work, we investigate the prediction of genes that are differentially expressed (DEGs) under exposure to space conditions, using a set of diverse engineered features. To do this, we collected DEGs and non-differentially expressed genes (NDEGs) from Mus musculus-based experiments in the GeneLab database. We engineered a diverse set of features from factors reported in the literature to affect gene expression. An extreme gradient boosting (XGBoost) model was trained to predict whether a given gene would be differentially expressed at various levels of differential expression. The test results on a separate holdout dataset showed an area under the receiver operating characteristic curve (AUC) of 0.90 ± 0.07, averaged across the five selected percentages of the most and least differentially expressed genes. Subsequently, we investigated the impact of feature selection on prediction performance, both individually with a correlation-based feature-selection procedure and in groups with a combination procedure. The feature selection confirmed some known drivers of adaptation to radiation and highlighted some new transcription factors and micro RNAs (miRNAs). Finally, gene ontology (GO) analysis revealed biological processes that tend to have expression patterns most suitable for this approach. This work highlights the potential of detecting differentially expressed genes with a machine learning (ML) approach, and provides some evidence that gene expression changes are captured by a diverse feature set not related to the condition under study.
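For readers unfamiliar with the AUC metric reported in this abstract, the following is a minimal sketch of how an area under the ROC curve can be computed from labels and classifier scores. It uses the rank interpretation of AUC (the probability that a randomly chosen positive is scored above a randomly chosen negative). The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not the authors' code or data.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = differentially expressed, 0 = not.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; for large datasets a sort-based implementation would be the usual choice.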
Affiliation(s)
- Michael Okwori
  - Department of Electrical, Computer and Biomedical Engineering, Union College, Schenectady, 12308, NY, United States of America
- Ali Eslami
  - Department of Electrical and Computer Engineering, Wichita State University, Wichita, 67260, KS, United States of America
7
Sakellaropoulos T, Do C, Jiang G, Cova G, Meyn P, Dimartino D, Ramaswami S, Heguy A, Tsirigos A, Skok JA. MethNet: a robust approach to identify regulatory hubs and their distal targets in cancer. Research Square 2023:rs.3.rs-3150386. [PMID: 37577603 PMCID: PMC10418566 DOI: 10.21203/rs.3.rs-3150386/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
Aberrations in the capacity of DNA/chromatin modifiers and transcription factors to bind non-coding regions can lead to changes in gene regulation and impact disease phenotypes. However, identifying distal regulatory elements and connecting them with their target genes remains challenging. Here, we present MethNet, a pipeline that integrates large-scale DNA methylation and gene expression data across multiple cancers, to uncover novel cis regulatory elements (CREs) in a 1Mb region around every promoter in the genome. MethNet identifies clusters of highly ranked CREs, referred to as 'hubs', which contribute to the regulation of multiple genes and significantly affect patient survival. Promoter-capture Hi-C confirmed that highly ranked associations involve physical interactions between CREs and their gene targets, and CRISPRi based scRNA Perturb-seq validated the functional impact of CREs. Thus, MethNet-identified CREs represent a valuable resource for unraveling complex mechanisms underlying gene expression, and for prioritizing the verification of predicted non-coding disease hotspots.
Affiliation(s)
- Theodore Sakellaropoulos
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Catherine Do
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Guimei Jiang
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Giulia Cova
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
- Peter Meyn
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Dacia Dimartino
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Sitharam Ramaswami
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Adriana Heguy
  - Genome Technology Center, NYU Grossman School of Medicine, New York, NY, USA
- Aristotelis Tsirigos
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
  - Applied Bioinformatics Laboratories, Office of Science & Research, NYU Grossman School of Medicine, New York, NY, USA
- Jane A Skok
  - Department of Pathology, NYU Grossman School of Medicine, New York, NY, USA
  - Perlmutter Cancer Center, NYU Langone Health, New York, NY, USA
8
Ribeiro ML, Sánchez Vinces S, Mondragon L, Roué G. Epigenetic targets in B- and T-cell lymphomas: latest developments. Ther Adv Hematol 2023; 14:20406207231173485. [PMID: 37273421 PMCID: PMC10236259 DOI: 10.1177/20406207231173485] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2022] [Accepted: 04/17/2023] [Indexed: 06/06/2023] [Open Access]
Abstract
Non-Hodgkin's lymphomas (NHLs) comprise a diverse group of diseases, of either mature B-cell or T-cell derivation, characterized by heterogeneous molecular features and clinical manifestations. While most patients respond to standard chemotherapy, immunotherapy, radiation, and/or stem cell transplantation, relapsed and/or refractory cases still have a dismal outcome. Deep sequencing analyses have shown that epigenetic dysregulation, including mutations in epigenetic enzymes such as chromatin modifiers and DNA methyltransferases (DNMTs), is prevalent in both B-cell and T-cell lymphomas. Accordingly, over the past decade, a large number of epigenetic-modifying agents have been developed and introduced into the clinical management of these entities, and a few specific inhibitors have already been approved for clinical use. Here we summarize the main epigenetic alterations described in B- and T-NHL that supported the clinical development of a selected set of epidrugs in specific diseases, including inhibitors of DNMTs, histone deacetylases (HDACs), and bromodomain and extra-terminal motif proteins (BETs). Finally, we highlight the most promising future directions of research in this area, explaining how bioinformatics approaches can help to identify new epigenetic targets in B- and T-cell lymphoid neoplasms.
Affiliation(s)
- Marcelo Lima Ribeiro
  - Lymphoma Translational Group, Josep Carreras Leukaemia Research Institute, Badalona, Spain
  - Laboratory of Immunopharmacology and Molecular Biology, Sao Francisco University Medical School, Braganca Paulista, Brazil
- Salvador Sánchez Vinces
  - Laboratory of Immunopharmacology and Molecular Biology, Sao Francisco University Medical School, Braganca Paulista, Brazil
- Laura Mondragon
  - T Cell Lymphoma Group, Josep Carreras Leukaemia Research Institute, IJC. Ctra de Can Ruti, Camí de les Escoles s/n, 08916 Badalona, Barcelona, Spain
- Gael Roué
  - Lymphoma Translational Group, Josep Carreras Leukaemia Research Institute, IJC. Ctra de Can Ruti, Camí de les Escoles s/n, 08916 Badalona, Barcelona, Spain
9
Liang P, Chen J, Yao L, Hao Z, Chang Q. A Deep Learning Approach for Prognostic Evaluation of Lung Adenocarcinoma Based on Cuproptosis-Related Genes. Biomedicines 2023; 11:biomedicines11051479. [PMID: 37239150 DOI: 10.3390/biomedicines11051479] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 05/13/2023] [Accepted: 05/16/2023] [Indexed: 05/28/2023] [Open Access]
Abstract
Lung adenocarcinoma represents a significant global health challenge. Despite advances in diagnosis and treatment, the prognosis remains poor for many patients. In this study, we aimed to identify cuproptosis-related genes and to develop a deep neural network model to predict the prognosis of lung adenocarcinoma. We screened differentially expressed genes from The Cancer Genome Atlas data through differential analysis of cuproptosis-related genes. We then used this information to establish a prognostic model using a deep neural network, which we validated using data from the Gene Expression Omnibus. Our deep neural network model incorporated nine cuproptosis-related genes and achieved an area under the curve of 0.732 in the training set and 0.646 in the validation set. The model effectively distinguished between distinct risk groups, as evidenced by significant differences in survival curves (p < 0.001), and demonstrated significant independence as a standalone prognostic predictor (p < 0.001). Functional analysis revealed differences in cellular pathways, the immune microenvironment, and tumor mutation burden between the risk groups. Furthermore, our model provided personalized survival probability predictions with a concordance index of 0.795 and identified the drug candidate BMS-754807 as a potentially sensitive treatment option for lung adenocarcinoma. In summary, we presented a deep neural network prognostic model for lung adenocarcinoma, based on nine cuproptosis-related genes, which offers independent prognostic capabilities. This model can be used for personalized predictions of patient survival and the identification of potential therapeutic agents for lung adenocarcinoma, which may ultimately improve patient outcomes.
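As background for the concordance index quoted in this abstract, here is a minimal sketch of Harrell's C-index for right-censored survival data. It assumes that a higher risk score should correspond to shorter survival. The function name `concordance_index` and the toy follow-up data are hypothetical and are not taken from the paper.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  : observed follow-up time per patient
    events : 1 if the event (e.g. death) was observed, 0 if censored
    risks  : predicted risk score (higher = expected shorter survival)

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when patient i also has the higher predicted risk; ties in risk
    count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort of four patients.
times = [5, 8, 12, 20]        # months of follow-up
events = [1, 1, 0, 1]         # the third patient is censored
risks = [0.9, 0.4, 0.6, 0.1]  # model-predicted risk scores
c_index = concordance_index(times, events, risks)
print(f"C-index = {c_index:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale on which a reported concordance index can be read.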
Affiliation(s)
- Pengchen Liang
  - Shanghai Key Laboratory of Gastric Neoplasms, Department of Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200020, China
  - School of Microelectronics, Shanghai University, Shanghai 201800, China
- Jianguo Chen
  - School of Software Engineering, Sun Yat-sen University, Zhuhai 528478, China
- Lei Yao
  - School of Microelectronics, Shanghai University, Shanghai 201800, China
- Zezhou Hao
  - School of Health Science and Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
- Qing Chang
  - Shanghai Key Laboratory of Gastric Neoplasms, Department of Surgery, Shanghai Institute of Digestive Surgery, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200020, China
10
Chen Y, Xie M, Wen J. Predicting gene expression from histone modifications with self-attention based neural networks and transfer learning. Front Genet 2022; 13:1081842. [PMID: 36588793 PMCID: PMC9797047 DOI: 10.3389/fgene.2022.1081842] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/27/2022] [Accepted: 11/28/2022] [Indexed: 12/15/2022] [Open Access]
Abstract
It is well known that histone modifications play an important part in various chromatin-dependent processes such as DNA replication, repair, and transcription. The use of computational models to predict gene expression from histone modifications has been intensively studied. However, the accuracy of the proposed models still has room for improvement, especially in cross-cell-line gene expression prediction. In this work, we propose a new deep learning model, TransferChrome, to predict gene expression from histone modifications. The model uses a densely connected convolutional network to capture the features of histone modification data and uses self-attention layers to aggregate global features of the data. For cross-cell-line gene expression prediction, TransferChrome adopts transfer learning to improve prediction accuracy. We trained and tested our model on 56 different cell lines from the REMC database. The experimental results show that our model achieved an average Area Under the Curve (AUC) score of 84.79%. Compared to three state-of-the-art models, TransferChrome improves the prediction performance on most cell lines. The cross-cell-line experiments show that TransferChrome performs best and is an efficient model for predicting gene expression across cell lines.
11
Vekariya V, Passi K, Jain CK. Predicting liver cancer on epigenomics data using machine learning. Frontiers in Bioinformatics 2022; 2:954529. [PMID: 36304318 PMCID: PMC9580905 DOI: 10.3389/fbinf.2022.954529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2022] [Accepted: 09/05/2022] [Indexed: 11/20/2022] [Open Access]
Abstract
Epigenomics is the branch of biology concerned with phenotype modifications that do not involve any change in the DNA sequence of the cell. Epigenetic modifications change the properties of DNA, which can ultimately prevent certain DNA functions from being carried out. These alterations arise in cancer cells and can drive the disease. The liver is the metabolic cleansing center of the human body and the only organ that can regenerate itself, but liver cancer can stop the cleansing of the body. Machine learning techniques are used in this research to predict the gene expression of liver cells for liver hepatocellular carcinoma (LIHC), the third leading cause of cancer death, which affects five hundred thousand people per year. The data for LIHC include four different types, namely, methylation, histone, the human genome, and RNA sequences. The data were accessed from The Cancer Genome Atlas (TCGA) using open-source tools in the R programming language. The proposed method considers 1,000 features across the four types of data. Nine different feature selection methods were used and eight different classification methods were compared to select the best model over 5-fold cross-validation and different training-to-test ratios. The best model, obtained with 140 features selected by ReliefF and an XGBoost classifier, achieved an AUC of 1.0 and an accuracy of 99.67% in predicting liver cancer.
Affiliation(s)
- Vishalkumar Vekariya
  - School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada
- Kalpdrum Passi
  - School of Engineering and Computer Science, Laurentian University, Sudbury, ON, Canada
- Chakresh Kumar Jain
  - Department of Biotechnology, Jaypee Institute of Information Technology, Noida, India
12
Lung cancer prediction using multi-gene genetic programming by selecting automatic features from amino acid sequences. Comput Biol Chem 2022; 98:107638. [DOI: 10.1016/j.compbiolchem.2022.107638] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/03/2021] [Revised: 12/22/2021] [Accepted: 02/01/2022] [Indexed: 02/07/2023]
13
Zhang L, Yang Y, Chai L, Li Q, Liu J, Lin H, Liu L. A deep learning model to identify gene expression level using cobinding transcription factor signals. Brief Bioinform 2021; 23:6447678. [PMID: 34864886 DOI: 10.1093/bib/bbab501] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 08/23/2021] [Revised: 10/13/2021] [Accepted: 11/01/2021] [Indexed: 01/02/2023] [Open Access]
Abstract
Gene expression is directly controlled by transcription factors (TFs) acting in complex combinations. Systematically inferring how the cooperative binding of TFs drives gene activity remains challenging. Here, we quantitatively analyzed the correlation between TFs and surveyed the TF interaction networks associated with gene expression in the GM12878 and K562 cell lines. We identified six TF modules associated with gene expression in each cell line. Furthermore, based on the enrichment characteristics of the TFs in these modules around a target gene, a convolutional neural network model, called TFCNN, was constructed to identify gene expression level. The TFCNN model achieved good prediction performance for gene expression: the average area under the receiver operating characteristic curve (AUC) reached up to 0.975 and 0.976 in the GM12878 and K562 cell lines, respectively. By comparison, the TFCNN model outperformed prediction models based on SVM and LDA, because it could better extract the combinatorial interactions among TFs. Further analysis indicated that abundant binding of regulatory TFs dominates the expression of target genes, while cooperative interaction between TFs has a subtle regulatory effect. Gene expression could also be regulated by different TF combinations in a nonlinear way. These results help to decipher how TF combinations regulate gene expression.
Affiliation(s)
- Lirong Zhang
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Yanchao Yang
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Lu Chai
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Qianzhong Li
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Junjie Liu
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
- Hao Lin
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
- Li Liu
  - School of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China

14
Chenarani N, Emamjomeh A, Allahverdi A, Mirmostafa S, Afsharinia MH, Zahiri J. Bioinformatic tools for DNA methylation and histone modification: A survey. Genomics 2021; 113:1098-1113. [PMID: 33677056 DOI: 10.1016/j.ygeno.2021.03.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 05/22/2020] [Revised: 10/10/2020] [Accepted: 03/02/2021] [Indexed: 01/19/2023]
Abstract
Epigenetic inheritance occurs due to different mechanisms such as chromatin and histone modifications, DNA methylation and processes mediated by non-coding RNAs. It leads to changes in gene expressions and the emergence of new traits in different organisms in many diseases such as cancer. Recent advances in experimental methods led to the identification of epigenetic target sites in various organisms. Computational approaches have enabled us to analyze mass data produced by these methods. Next-generation sequencing (NGS) methods have been broadly used to identify these target sites and their patterns. By using these patterns, the emergence of diseases could be prognosticated. In this study, target site prediction tools for two major epigenetic mechanisms comprising histone modification and DNA methylation are reviewed. Publicly accessible databases are reviewed as well. Some suggestions regarding the state-of-the-art methods and databases have been made, including examining patterns of epigenetic changes that are important in epigenotypes detection.
Affiliation(s)
- Nasibeh Chenarani
  - Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
- Abbasali Emamjomeh
  - Department of Plant Breeding and Biotechnology (PBB), Faculty of Agriculture, University of Zabol, Zabol, Iran
  - Laboratory of Computational Biotechnology and Bioinformatics (CBB), Department of Bioinformatics, Faculty of Basic Sciences, University of Zabol, Zabol, Iran
- Abdollah Allahverdi
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- SeyedAli Mirmostafa
  - Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Mohammad Hossein Afsharinia
  - Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
- Javad Zahiri
  - Bioinformatics and Computational Omics Lab (BioCOOL), Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Tehran, Iran
  - Department of Neuroscience, University of California, San Diego, USA

15
Yan R, Fan C, Yin Z, Wang T, Chen X. Potential applications of deep learning in single-cell RNA sequencing analysis for cell therapy and regenerative medicine. Stem Cells 2021; 39:511-521. [PMID: 33587792 DOI: 10.1002/stem.3336] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 08/26/2020] [Accepted: 12/07/2020] [Indexed: 12/26/2022]
Abstract
When used in cell therapy and regenerative medicine strategies, stem cells have the potential to treat many previously incurable diseases. However, current methods of applying stem cells are underdeveloped, as these cells are used directly regardless of their culture medium and subgroup. For example, when using mesenchymal stem cells (MSCs) in cell therapy, researchers consider neither their source and culture method nor their application angle and function (soft tissue regeneration, hard tissue regeneration, suppression of immune function, or promotion of immune function). By combining machine learning methods (such as deep learning) with data sets obtained through single-cell RNA sequencing (scRNA-seq) technology, we can discover the hidden structure of these cells, predict their effects more accurately, and effectively use subpopulations with differentiation potential for stem cell therapy. scRNA-seq technology has changed the study of transcription, because it can profile gene expression at single-cell resolution. However, this powerful technology is sensitive to biological and technical noise. Subsequent data analysis can be computationally difficult because it involves tasks such as denoising single-cell data, reducing dimensionality, imputing missing values, and accounting for the zero-inflated nature of the data. In this review, we discuss how deep learning methods can be combined with scRNA-seq data for research, how to interpret scRNA-seq data in more depth, and how to improve the follow-up analysis of stem cells, identify potential subgroups, and promote the implementation of cell therapy and regenerative medicine measures.
Affiliation(s)
- Ruojin Yan
  - Dr. Li Dak Sum - Yip Yio Chin Center for Stem Cells and Regenerative Medicine and Department of Orthopedic Surgery of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Key Laboratory of Tissue Engineering and Regenerative Medicine of Zhejiang Province, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Department of Sports Medicine, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - China Orthopedic Regenerative Medicine Group (CORMed), Hangzhou, People's Republic of China
- Chunmei Fan
  - Dr. Li Dak Sum - Yip Yio Chin Center for Stem Cells and Regenerative Medicine and Department of Orthopedic Surgery of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Key Laboratory of Tissue Engineering and Regenerative Medicine of Zhejiang Province, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Department of Sports Medicine, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - China Orthopedic Regenerative Medicine Group (CORMed), Hangzhou, People's Republic of China
- Zi Yin
  - Dr. Li Dak Sum - Yip Yio Chin Center for Stem Cells and Regenerative Medicine and Department of Orthopedic Surgery of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Key Laboratory of Tissue Engineering and Regenerative Medicine of Zhejiang Province, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Department of Sports Medicine, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - China Orthopedic Regenerative Medicine Group (CORMed), Hangzhou, People's Republic of China
- Tingzhang Wang
  - Key Laboratory of Microbial Technology and Bioinformatics of Zhejiang Province, Hangzhou, People's Republic of China
  - NMPA Key laboratory for Testing and Risk Warning of Pharmaceutical Microbiology, Hangzhou, People's Republic of China
- Xiao Chen
  - Dr. Li Dak Sum - Yip Yio Chin Center for Stem Cells and Regenerative Medicine and Department of Orthopedic Surgery of The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Key Laboratory of Tissue Engineering and Regenerative Medicine of Zhejiang Province, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - Department of Sports Medicine, Zhejiang University School of Medicine, Hangzhou, People's Republic of China
  - China Orthopedic Regenerative Medicine Group (CORMed), Hangzhou, People's Republic of China

16
Lian Q, Wang B, Fan L, Sun J, Wang G, Zhang J. DNA methylation data-based molecular subtype classification and prediction in patients with gastric cancer. Cancer Cell Int 2020; 20:349. [PMID: 32742196 PMCID: PMC7388223 DOI: 10.1186/s12935-020-01253-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/29/2020] [Accepted: 05/10/2020] [Indexed: 01/29/2023] Open
Abstract
Background: Genetic and epigenetic alterations are closely correlated with carcinogenesis, and DNA methylation is one of the most frequent molecular events occurring early in this complicated process in gastric cancer (GC). Methods: In this study, 398 samples from The Cancer Genome Atlas (TCGA) database were analyzed to identify the specific DNA methylation sites that affect the prognosis of GC patients. The 23,588 selected CpGs that were markedly correlated with patient prognosis were used for consistent clustering of the samples into 6 subgroups, and the samples in each subgroup varied in terms of M, Stage, Grade, and Age. In addition, the levels of methylation sites in each subgroup were calculated, and 347 methylation sites (corresponding to 271 genes) were screened as the intrasubgroup-specific methylation sites. Signaling pathway enrichment analysis was then performed on the genes whose promoter regions contained these specific methylation sites. Results: The specific genes were enriched in biological pathways reported to be closely correlated with GC; moreover, the subsequent transcription factor enrichment analysis found that these genes were mainly enriched in the cell response to transcription factor B, regulation of MAPK signaling pathways, and regulation of cell proliferation and metastasis. Finally, a prognosis prediction model for GC patients was constructed using the Random Forest Classifier model, and the training set and test set data were used for independent verification and testing. Conclusions: This classification based on specific DNA methylation sites reflects the heterogeneity of GC tissues well, which contributes to developing individualized treatment and accurately predicting patient prognosis.
Affiliation(s)
- Qixin Lian
  - Oncology Department, First Affiliated Hospital of Jiamusi University, 154002 Qiqihar, Heilongjiang China
- Bo Wang
  - Oncology Department, First Affiliated Hospital of Jiamusi University, 154002 Qiqihar, Heilongjiang China
- Lijun Fan
  - Gastroenterology Department, The First Hospital of Qiqihar, The Affiliate Qiqihar Hospital of Southern Medical University, Longsha District, 30 of Park Road, Qiqihar, Heilongjiang 161005 China
- Junqiang Sun
  - Radiotherapy and Chemotherapy, The First Hospital of Dandong, Liaoning, 118000 China
- Guilai Wang
  - General Surgery, The First Hospital of Qiqihar, The Affiliate Qiqihar Hospital of Southern Medical University, Longsha District, 30 of Park Road, Qiqihar, Heilongjiang 161005 China
- Jidong Zhang
  - Gastroenterology Department, The First Hospital of Qiqihar, The Affiliate Qiqihar Hospital of Southern Medical University, Longsha District, 30 of Park Road, Qiqihar, Heilongjiang 161005 China

17
Wang Y, Franks JM, Whitfield ML, Cheng C. BioMethyl: an R package for biological interpretation of DNA methylation data. Bioinformatics 2020; 35:3635-3641. [PMID: 30799505 PMCID: PMC6761945 DOI: 10.1093/bioinformatics/btz137] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/24/2018] [Revised: 01/25/2019] [Accepted: 02/22/2019] [Indexed: 12/16/2022] Open
Abstract
Motivation: The accumulation of publicly available DNA methylation datasets has resulted in the need for tools to interpret the specific cellular phenotypes in bulk tissue data. Current approaches use either single differentially methylated CpG sites or differentially methylated regions that map to genes. However, these approaches may introduce biases in downstream analyses of biological interpretation, because of the variability in gene length. There is a lack of approaches to interpret DNA methylation effectively. Therefore, we have developed computational models to provide biological interpretation of relevant gene sets using DNA methylation data in the context of The Cancer Genome Atlas. Results: We illustrate that Biological interpretation of DNA Methylation (BioMethyl) utilizes the complete DNA methylation data for a given cancer type to reflect corresponding gene expression profiles and performs pathway enrichment analyses, providing unique biological insight. Using breast cancer as an example, BioMethyl shows high consistency in the identification of enriched biological pathways from DNA methylation data compared to the results calculated from RNA sequencing data. We find that 12 out of 14 pathways identified by BioMethyl are shared with those identified using RNA-seq data, with a Jaccard score of 0.8 for estrogen receptor (ER) positive samples. For ER negative samples, three pathways are shared between the two enrichments, with a slightly lower similarity (Jaccard score = 0.6). Using BioMethyl, we can successfully identify hidden biological pathways in DNA methylation data when the gene expression profile is lacking. Availability and implementation: The BioMethyl R package is freely available in the GitHub repository (https://github.com/yuewangpanda/BioMethyl). Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yue Wang
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Jennifer M Franks
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Michael L Whitfield
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
- Chao Cheng
  - Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA
  - Department of Biomedical Data Science, Geisel School of Medicine at Dartmouth, Lebanon, NH, USA
  - Norris Cotton Cancer Center, Lebanon, NH, USA
  - Department of Medicine, Baylor College of Medicine, Houston, TX, USA

18
Complex Network Characterization Using Graph Theory and Fractal Geometry: The Case Study of Lung Cancer DNA Sequences. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10093037] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 12/22/2022]
Abstract
This paper discusses an approach developed for exploiting the local elementary movements of evolution to study complex networks in terms of shared common embedding and, consequently, shared fractal properties. This approach can be useful for the analysis of lung cancer DNA sequences and their properties by using the concepts of graph theory and fractal geometry. The proposed method advances a renewed consideration of network complexity both on local and global scales. Several researchers have illustrated the advantages of fractal mathematics, as well as its applicability to lung cancer research. Nevertheless, many researchers and clinicians continue to be unaware of its potential. Therefore, this paper aims to examine the underlying assumptions of fractals and analyze the fractal dimension and related measurements for possible application to complex networks and, especially, to the lung cancer network. The strict relationship between the lung cancer network properties and the fractal dimension is proved. Results show that the fractal dimension decreases in the lung cancer network while the topological properties of the network increase in the lung cancer network. Finally, statistical and topological significance between the complexity of the network and lung cancer network is shown.

19
Rauschert S, Raubenheimer K, Melton PE, Huang RC. Machine learning and clinical epigenetics: a review of challenges for diagnosis and classification. Clin Epigenetics 2020; 12:51. [PMID: 32245523 PMCID: PMC7118917 DOI: 10.1186/s13148-020-00842-4] [Citation(s) in RCA: 96] [Impact Index Per Article: 24.0] [Received: 12/11/2019] [Accepted: 03/22/2020] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND: Machine learning is a sub-field of artificial intelligence, which utilises large data sets to make predictions for future events. Although most algorithms used in machine learning were developed as far back as the 1950s, the advent of big data in combination with dramatically increased computing power has spurred renewed interest in this technology over the last two decades. MAIN BODY: Within the medical field, machine learning is promising in the development of assistive clinical tools for detection of e.g. cancers and prediction of disease. Recent advances in deep learning technologies, a sub-discipline of machine learning that requires less user input but more data and processing power, have provided even greater promise in assisting physicians to achieve accurate diagnoses. Within the fields of genetics and its sub-field epigenetics, both prime examples of complex data, machine learning methods are on the rise, as the field of personalised medicine is aiming for treatment of the individual based on their genetic and epigenetic profiles. CONCLUSION: We now have an ever-growing number of reported epigenetic alterations in disease, and this offers a chance to increase sensitivity and specificity of future diagnostics and therapies. Currently, there are limited studies using machine learning applied to epigenetics. They pertain to a wide variety of disease states and have used mostly supervised machine learning methods.
Affiliation(s)
- S Rauschert
  - Telethon Kids Institute, University of Western Australia, Nedlands, Perth, Western Australia
- K Raubenheimer
  - School of Medicine, Notre Dame University, Fremantle, Western Australia
- P E Melton
  - Centre for Genetic Origins of Health and Disease, The University of Western Australia and Curtin University, Perth, Western Australia
  - School of Pharmacy and Biomedical Sciences, Curtin University, Bentley, Western Australia
  - Menzies Institute for Medical Research, University of Tasmania, Hobart, Tasmania, Australia
- R C Huang
  - Telethon Kids Institute, University of Western Australia, Nedlands, Perth, Western Australia

20
Seal DB, Das V, Goswami S, De RK. Estimating gene expression from DNA methylation and copy number variation: A deep learning regression model for multi-omics integration. Genomics 2020; 112:2833-2841. [PMID: 32234433 DOI: 10.1016/j.ygeno.2020.03.021] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/28/2019] [Revised: 03/17/2020] [Accepted: 03/22/2020] [Indexed: 12/21/2022]
Abstract
Gene expression analysis plays a significant role in providing molecular insights into cancer. Various genetic and epigenetic factors (studied together under multi-omics) affect gene expression, giving rise to cancer phenotypes. Recent growth in the understanding of multi-omics provides a resource for integration in interdisciplinary biology, since together these data can draw a comprehensive picture of an organism's developmental and disease biology in cancers. Such large-scale multi-omics data can be obtained from public consortia such as The Cancer Genome Atlas (TCGA) and several other platforms. Integrating multi-omics data from varied platforms is still challenging due to the high noise and sensitivity of the platforms used. Currently, a robust integrative predictive model to estimate gene expression from these genetic and epigenetic data is lacking. In this study, we have developed a deep learning-based predictive model using a Deep Denoising Auto-encoder (DDAE) and a Multi-layer Perceptron (MLP) that can quantitatively capture how genetic and epigenetic alterations correlate with the directionality of gene expression in liver hepatocellular carcinoma (LIHC). The DDAE has been trained to extract significant features from the input omics data to estimate gene expression. These features have then been used for back-propagation learning by the multilayer perceptron for the tasks of regression and classification. We have benchmarked the proposed model against state-of-the-art regression models. Finally, the deep learning-based integration model has been evaluated for its disease classification capability, where an accuracy of 95.1% has been obtained.
Affiliation(s)
- Dibyendu Bikash Seal
  - A. K. Choudhury School of Information Technology, University of Calcutta, JD-2, Sector III, Salt Lake City, Kolkata 700106, India
- Vivek Das
  - Novo Nordisk Research Center Seattle, Inc., 530 Fairview Ave N # 5000, Seattle, WA 98109, United States
- Saptarsi Goswami
  - Bangabasi Morning College, 35 Rajkumar Chakraborty Sarani, Scott Ln, Kolkata 700009, India
- Rajat K De
  - Machine Intelligence Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, Kolkata 700108, India

21
Csumita M, Csermely A, Horvath A, Nagy G, Monori F, Göczi L, Orbea HA, Reith W, Széles L. Specific enhancer selection by IRF3, IRF5 and IRF9 is determined by ISRE half-sites, 5' and 3' flanking bases, collaborating transcription factors and the chromatin environment in a combinatorial fashion. Nucleic Acids Res 2020; 48:589-604. [PMID: 31799619 PMCID: PMC6954429 DOI: 10.1093/nar/gkz1112] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/24/2019] [Revised: 10/22/2019] [Accepted: 11/12/2019] [Indexed: 12/28/2022] Open
Abstract
IRF3, IRF5 and IRF9 are transcription factors, which play distinct roles in the regulation of antiviral and inflammatory responses. The determinants that mediate IRF-specific enhancer selection are not fully understood. To uncover regions occupied predominantly by IRF3, IRF5 or IRF9, we performed ChIP-seq experiments in activated murine dendritic cells. The identified regions were analysed with respect to the enrichment of DNA motifs, the interferon-stimulated response element (ISRE) and ISRE half-site variants, and chromatin accessibility. Using a machine learning method, we investigated the predictability of IRF-dominance. We found that IRF5-dominant regions differed fundamentally from the IRF3- and IRF9-dominant regions: ISREs were rare, while the NFKB motif and special ISRE half-sites, such as 5'-GAGA-3' and 5'-GACA-3', were enriched. IRF3- and IRF9-dominant regions were characterized by the enriched ISRE motif and lower frequency of accessible chromatin. Enrichment analysis and the machine learning method uncovered the features that favour IRF3 or IRF9 dominancy (e.g. a tripartite form of ISRE and motifs for NF-κB for IRF3, and the GAS motif and certain ISRE variants for IRF9). This study contributes to our understanding of how IRF members, which bind overlapping sets of DNA sequences, can initiate signal-dependent responses without activating superfluous or harmful programmes.
Affiliation(s)
- Mária Csumita
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
- Attila Csermely
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
- Attila Horvath
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
- Gergely Nagy
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
- Fanny Monori
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
- Loránd Göczi
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
- Hans-Acha Orbea
  - Department of Biochemistry, University of Lausanne, CH-1066 Epalinges, Switzerland
- Walter Reith
  - Department of Pathology and Immunology, Faculty of Medicine, University of Geneva, Centre Médical Universitaire (CMU), CH-1211 Geneva, Switzerland
- Lajos Széles
  - Department of Biochemistry and Molecular Biology, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary
  - Department of Human Genetics, Faculty of Medicine, University of Debrecen, Debrecen H-4032, Hungary

22
Schmidt F, Schulz MH. On the problem of confounders in modeling gene expression. Bioinformatics 2019; 35:711-719. [PMID: 30084962 PMCID: PMC6530814 DOI: 10.1093/bioinformatics/bty674] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/18/2018] [Revised: 06/21/2018] [Accepted: 08/02/2018] [Indexed: 01/01/2023] Open
Abstract
Motivation: Modeling of Transcription Factor (TF) binding from both ChIP-seq and chromatin accessibility data has become prevalent in computational biology. Several models have been proposed to generate new hypotheses on transcriptional regulation. However, there is no distinct approach to derive TF binding scores from ChIP-seq and open chromatin experiments. Here, we review biases of various scoring approaches and their effects on the interpretation and reliability of predictive gene expression models. Results: We generated predictive models for gene expression using ChIP-seq and DNase1-seq data from DEEP and ENCODE. Via randomization experiments, we identified confounders in TF gene scores derived from both ChIP-seq and DNase1-seq data. We reviewed correction approaches for both data types, which reduced the influence of identified confounders without harm to model performance. Also, our analyses highlighted further quality control measures, in addition to model performance, that may help to assure model reliability and to avoid misinterpretation in future studies. Availability and implementation: The software used in this study is available online at https://github.com/SchulzLab/TEPIC. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Florian Schmidt
  - High-throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, Saarbrücken, Germany
  - Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany
  - Graduate School for Computer Science, Saarland Informatics Campus, Saarbrücken, Germany
- Marcel H Schulz
  - High-throughput Genomics and Systems Biology, Cluster of Excellence on Multimodal Computing and Interaction, Saarland Informatics Campus, Saarbrücken, Germany
  - Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarland Informatics Campus, Saarbrücken, Germany

23
Kober P, Boresowicz J, Rusetska N, Maksymowicz M, Paziewska A, Dąbrowska M, Kunicki J, Bonicki W, Ostrowski J, Siedlecki JA, Bujko M. The Role of Aberrant DNA Methylation in Misregulation of Gene Expression in Gonadotroph Nonfunctioning Pituitary Tumors. Cancers (Basel) 2019; 11:E1650. [PMID: 31731486 PMCID: PMC6895980 DOI: 10.3390/cancers11111650] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/05/2019] [Revised: 10/18/2019] [Accepted: 10/21/2019] [Indexed: 12/21/2022] Open
Abstract
Gonadotroph nonfunctioning pituitary adenomas (NFPAs) are common intracranial tumors, but the role of aberrant epigenetic regulation in their development remains poorly understood. In this study, we investigated the effect of impaired CpG methylation in NFPAs. We determined DNA methylation and transcriptomic profiles in 32 NFPAs and normal pituitary sections using methylation arrays and sequencing, respectively. Ten percent of differentially methylated CpGs were correlated with gene expression, and the affected genes are involved in a variety of tumorigenesis-related pathways. Different proportions of gene body and promoter region localization were observed in CpGs with negative and positive correlations between methylation and gene expression, and different proportions of CpGs were located in 'open sea' and 'shelf/shore' regions. The expression of ~8% of genes differentially expressed in NFPAs was related to aberrant methylation. Methylation levels of seven CpGs located in the regulatory regions of FAM163A, HIF3A and PRSS8 were determined by pyrosequencing, and gene expression was measured by qRT-PCR and immunohistochemistry in 83 independent NFPAs. The results clearly confirmed the negative correlation between methylation and gene expression for these genes. By identifying which aberrantly methylated CpGs affect gene expression in gonadotrophinomas, our data confirm the role of aberrant methylation in pathogenesis of gonadotroph NFPAs.
Affiliation(s)
- Paulina Kober
- Department of Molecular and Translational Oncology, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Joanna Boresowicz
- Department of Molecular and Translational Oncology, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Natalia Rusetska
- Department of Molecular and Translational Oncology, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Maria Maksymowicz
- Department of Pathology and Laboratory Diagnostics, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Agnieszka Paziewska
- Department of Genetics, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Department of Gastroenterology, Hepatology and Clinical Oncology, Medical Center for Postgraduate Education, 01-813 Warsaw, Poland
- Michalina Dąbrowska
- Department of Genetics, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Jacek Kunicki
- Department of Neurosurgery, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Wiesław Bonicki
- Department of Neurosurgery, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Jerzy Ostrowski
- Department of Genetics, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Department of Gastroenterology, Hepatology and Clinical Oncology, Medical Center for Postgraduate Education, 01-813 Warsaw, Poland
- Janusz A. Siedlecki
- Department of Molecular and Translational Oncology, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
- Mateusz Bujko
- Department of Molecular and Translational Oncology, Maria Skłodowska-Curie Institute—Oncology Center, 02-034 Warsaw, Poland
|
24
|
Nishimura T, Nakamura H, Végvári Á, Marko-Varga G, Furuya N, Saji H. Current status of clinical proteogenomics in lung cancer. Expert Rev Proteomics 2019; 16:761-772. [PMID: 31402712 DOI: 10.1080/14789450.2019.1654861] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 12/11/2022]
Abstract
Introduction: Lung cancer is the leading cause of cancer death worldwide. Proteogenomics, a way to integrate genomics, transcriptomics, and proteomics, has emerged as a way to understand molecular causes in cancer tumorigenesis. This understanding will help identify therapeutic targets that are urgently needed to improve individual patient outcomes. Areas covered: To explore underlying molecular mechanisms of lung cancer subtypes, several efforts have used proteogenomic approaches that integrate next generation sequencing (NGS) and mass spectrometry (MS)-based technologies. Expert opinion: A large-scale, MS-based, proteomic analysis, together with both NGS-based genomic data and clinicopathological information, will facilitate establishing extensive databases for lung cancer subtypes that can be used for further proteogenomic analyses. Proteogenomic strategies will further the understanding of how major driver mutations affect downstream molecular networks, resulting in lung cancer progression and malignancy, and how therapy-resistant cancers are molecularly structured. These strategies require advanced bioinformatics based on a dynamic theory of network systems, rather than statistics, to accurately identify mutant proteins and their affected key networks.
Affiliation(s)
- Toshihide Nishimura
- Department of Translational Medicine Informatics, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Haruhiko Nakamura
- Department of Translational Medicine Informatics, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Department of Chest Surgery, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Ákos Végvári
- Proteomics Biomedicum, Division of Physiological Chemistry I, Department of Medical Biochemistry & Biophysics (MBB), Karolinska Institutet, Solna, Sweden
- György Marko-Varga
- Clinical Protein Science & Imaging, Biomedical Centre, Department of Biomedical Engineering, Lund University, Lund, Sweden
- Section for Clinical Chemistry, Department of Translational Medicine, Lund University, Skåne University Hospital Malmö, Malmö, Sweden
- Naoki Furuya
- Department of Internal Medicine, Division of Respiratory Medicine, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
- Hisashi Saji
- Department of Chest Surgery, St. Marianna University School of Medicine, Kawasaki, Kanagawa, Japan
|
25
|
Onco-Multi-OMICS Approach: A New Frontier in Cancer Research. BIOMED RESEARCH INTERNATIONAL 2018; 2018:9836256. [PMID: 30402498 PMCID: PMC6192166 DOI: 10.1155/2018/9836256] [Citation(s) in RCA: 171] [Impact Index Per Article: 28.5] [Received: 07/04/2018] [Accepted: 09/06/2018] [Indexed: 02/07/2023]
Abstract
The acquisition of cancer hallmarks requires molecular alterations at multiple levels including genome, epigenome, transcriptome, proteome, and metabolome. In the past decade, numerous attempts have been made to untangle the molecular mechanisms of carcinogenesis involving single OMICS approaches such as scanning the genome for cancer-specific mutations and identifying altered epigenetic landscapes within cancer cells or by exploring the differential expression of mRNA and protein through transcriptomics and proteomics techniques, respectively. While these single-level OMICS approaches have contributed towards the identification of cancer-specific mutations, epigenetic alterations, and molecular subtyping of tumors based on gene/protein expression, they lack the resolving power to establish the causal relationship between molecular signatures and the phenotypic manifestation of cancer hallmarks. In contrast, multi-OMICS approaches involving the interrogation of cancer cells/tissues in multiple dimensions have the potential to uncover the intricate molecular mechanisms underlying different phenotypic manifestations of cancer hallmarks such as metastasis and angiogenesis. Moreover, multi-OMICS approaches can be used to dissect the cellular response to chemo- or immunotherapy as well as discover molecular candidates with diagnostic/prognostic value. In this review, we focused on the applications of different multi-OMICS approaches in the field of cancer research and discussed how these approaches are shaping the field of personalized oncomedicine. We have highlighted pioneering studies from “The Cancer Genome Atlas (TCGA)” consortium encompassing integrated OMICS analysis of over 11,000 tumors from the 33 most prevalent forms of cancer.
Accumulation of huge cancer-specific multi-OMICS data in repositories like TCGA provides a unique opportunity for the systems biology approach to tackle the complexity of cancer cells through the unification of experimental data and computational/mathematical models. In the future, systems-biology-based approaches are likely to predict the phenotypic changes of cancer cells upon chemo-/immunotherapy treatment. This review seeks to encourage investigators to bring these different approaches together for interrogating cancer at molecular, cellular, and systems levels.
|
26
|
Sekhon A, Singh R, Qi Y. DeepDiff: DEEP-learning for predicting DIFFerential gene expression from histone modifications. Bioinformatics 2018; 34:i891-i900. [DOI: 10.1093/bioinformatics/bty612] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Indexed: 12/29/2022]
Affiliation(s)
- Arshdeep Sekhon
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Ritambhara Singh
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
- Yanjun Qi
- Department of Computer Science, University of Virginia, Charlottesville, VA, USA
|
27
|
RNA-seq assistant: machine learning based methods to identify more transcriptional regulated genes. BMC Genomics 2018; 19:546. [PMID: 30029596 PMCID: PMC6053725 DOI: 10.1186/s12864-018-4932-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 11/23/2017] [Accepted: 07/08/2018] [Indexed: 02/02/2023]
Abstract
BACKGROUND: Although different quality controls have been applied at different stages of sample preparation and data analysis to ensure both reproducibility and reliability of RNA-seq results, there are still limitations and bias in the detectability of certain differentially expressed genes (DEGs). Whether the transcriptional dynamics of a gene can be captured accurately depends on the experimental design/operation and the subsequent data analysis processes. The workflow of subsequent data processing, such as read alignment, transcript quantification, normalization, and the statistical methods for ultimate identification of DEGs, can influence the accuracy and sensitivity of DEG analysis, producing a certain number of false positives or false negatives. Machine learning (ML) is a multidisciplinary field that employs computer science, artificial intelligence, computational statistics and information theory to construct algorithms that can learn from existing data sets and make predictions on new data sets. ML-based differential network analysis has been applied to predict stress-responsive genes through learning the patterns of 32 expression characteristics of known stress-related genes. In addition, epigenetic regulation plays critical roles in gene expression; therefore, DNA and histone methylation data have been shown to be powerful for ML-based prediction of gene expression in many systems, including lung cancer cells. It is therefore promising that ML-based methods could help to identify DEGs that are not identified by traditional RNA-seq methods. RESULTS: We identified the top 23 most informative features by assessing the performance of three different feature selection algorithms combined with five different classification methods on training and testing data sets. By comprehensive comparison, we found that the model based on InfoGain feature selection and Logistic Regression classification is powerful for DEG prediction.
Moreover, the power and performance of ML-based prediction were validated by prediction of ethylene-regulated gene expression followed by qRT-PCR. CONCLUSIONS: Our study shows that the combination of ML-based methods with RNA-seq greatly improves the sensitivity of DEG identification.
|
28
|
Klett H, Balavarca Y, Toth R, Gigic B, Habermann N, Scherer D, Schrotz-King P, Ulrich A, Schirmacher P, Herpel E, Brenner H, Ulrich CM, Michels KB, Busch H, Boerries M. Robust prediction of gene regulation in colorectal cancer tissues from DNA methylation profiles. Epigenetics 2018; 13:386-397. [PMID: 29697014 PMCID: PMC6140810 DOI: 10.1080/15592294.2018.1460034] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 01/18/2018] [Revised: 03/19/2018] [Accepted: 03/27/2018] [Indexed: 02/01/2023]
Abstract
DNA methylation is recognized as one of several epigenetic regulators of gene expression and as a potential driver of carcinogenesis through gene-silencing of tumor suppressors and activation of oncogenes. However, abnormal methylation, even of promoter regions, does not necessarily alter gene expression levels, especially if the gene is already silenced, leaving the exact mechanisms of methylation unanswered. Using a large cohort of matching DNA methylation and gene expression samples of colorectal cancer (CRC; n = 77) and normal adjacent mucosa tissues (n = 108), we investigated the regulatory role of methylation on gene expression. We show that on a subset of genes enriched in common cancer pathways, methylation is significantly associated with gene regulation through gene-specific mechanisms. We built two classification models to infer gene regulation in CRC from methylation differences of tumor and normal tissues, taking into account both gene-silencing and gene-activation effects through hyper- and hypo-methylation of CpGs. The classification models result in high prediction performances in both training and independent CRC testing cohorts (0.92
Affiliation(s)
- Hagen Klett
- German Cancer Consortium (DKTK), Heidelberg, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Yesilda Balavarca
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Reka Toth
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Biljana Gigic
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Department of General, Visceral and Transplantation Surgery, University Clinic Heidelberg, Heidelberg, Germany
- Nina Habermann
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Dominique Scherer
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Medical Biometry and Informatics, University of Heidelberg, Heidelberg, Germany
- Petra Schrotz-King
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Alexis Ulrich
- Department of General, Visceral and Transplantation Surgery, University Clinic Heidelberg, Heidelberg, Germany
- Peter Schirmacher
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Institute of Pathology, University Clinic Heidelberg, Heidelberg, Germany
- Esther Herpel
- Institute of Pathology, University Clinic Heidelberg, Heidelberg, Germany
- Tissue Bank of the National Center for Tumor Diseases (NCT) Heidelberg, Germany
- Hermann Brenner
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Division of Clinical Epidemiology and Aging Research, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Cornelia M. Ulrich
- German Cancer Consortium (DKTK), Heidelberg, Germany
- Division of Preventive Oncology, National Center for Tumor Diseases (NCT) and German Cancer Research Center (DKFZ), Heidelberg, Germany
- Huntsman Cancer Institute and Department of Population Health Sciences, University of Utah, Salt Lake City, UT, USA
- Karin B. Michels
- Institute for Prevention and Cancer Epidemiology, Faculty of Medicine and Medical Center, University of Freiburg, Germany
- Department of Epidemiology, Fielding School of Public Health, University of California, Los Angeles, CA, USA
- Hauke Busch
- Lübeck Institute of Experimental Dermatology and Institute of Cardiogenetics, University of Lübeck, Lübeck, Germany
- Melanie Boerries
- German Cancer Consortium (DKTK), Heidelberg, Germany
- German Cancer Research Center (DKFZ), Heidelberg, Germany
- Institute of Molecular Medicine and Cell Research, Faculty of Medicine and Medical Center, University of Freiburg, Germany
|
29
|
Kim M, Tagkopoulos I. Data integration and predictive modeling methods for multi-omics datasets. Mol Omics 2018; 14:8-25. [DOI: 10.1039/c7mo00051k] [Citation(s) in RCA: 56] [Impact Index Per Article: 9.3] [Indexed: 12/24/2022]
Abstract
We provide an overview of opportunities and challenges in multi-omics predictive analytics with particular emphasis on data integration and machine learning methods.
Affiliation(s)
- Minseung Kim
- Department of Computer Science, University of California, Davis, USA
- Genome Center
- Ilias Tagkopoulos
- Department of Computer Science, University of California, Davis, USA
- Genome Center
|
30
|
Lee G, Bang L, Kim SY, Kim D, Sohn KA. Identifying subtype-specific associations between gene expression and DNA methylation profiles in breast cancer. BMC Med Genomics 2017; 10:28. [PMID: 28589855 PMCID: PMC5461552 DOI: 10.1186/s12920-017-0268-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Indexed: 02/08/2023]
Abstract
BACKGROUND: Breast cancer is a complex disease in which different genomic patterns exist depending on subtype. Recent research shows that multiple subtypes of breast cancer occur at different rates and play a crucial role in planning treatment. To better understand the biological mechanisms underlying breast cancer subtypes, investigating the specific gene regulatory system of each subtype is desirable. METHODS: Gene expression, as an intermediate phenotype, is estimated based on methylation profiles to identify the impact of epigenomic features on transcriptomic changes in breast cancer. We propose a kernel weighted l1-regularized regression model to incorporate tumor subtype information and further reveal gene regulations affected by different breast cancer subtypes. For the proper control of subtype-specific estimation, samples from different breast cancer subtypes are learned at different rates based on target estimates. A Kolmogorov-Smirnov test is conducted to determine the learning rate of each sample from a different subtype. RESULTS: Genes that might be sensitive to breast cancer subtype show improved prediction when estimated using our proposed method. Compared with a standard method, overall performance is also enhanced by incorporating tumor subtypes. In addition, we identified subtype-specific network structures based on the associations between gene expression and DNA methylation. CONCLUSIONS: In this study, a kernel weighted lasso model is proposed for identifying subtype-specific associations between gene expression and DNA methylation profiles. Identification of subtype-specific gene expression associated with epigenomic changes might be helpful for better treatment planning and developing new therapies.
Affiliation(s)
- Garam Lee
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Lisa Bang
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- So Yeon Kim
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
- Dokyoon Kim
- Biomedical & Translational Informatics Institute, Geisinger Health System, Danville, PA, USA
- The Huck Institute of the Life Sciences, Pennsylvania State University, University Park, PA, USA
- Kyung-Ah Sohn
- Department of Software and Computer Engineering, Ajou University, Suwon, 16499, South Korea
|
31
|
Abstract
Understanding epigenetic processes holds immense promise for medical applications. Advances in Machine Learning (ML) are critical to realize this promise. Previous studies used epigenetic data sets associated with the germline transmission of epigenetic transgenerational inheritance of disease and novel ML approaches to predict genome-wide locations of critical epimutations. A combination of Active Learning (ACL) and Imbalanced Class Learning (ICL) was used to address past problems with ML to develop a more efficient feature selection process and address the imbalance problem in all genomic data sets. The power of this novel ML approach and our ability to predict epigenetic phenomena and associated disease is suggested. The current approach requires extensive computation of features over the genome. A promising new approach is to introduce Deep Learning (DL) for the generation and simultaneous computation of novel genomic features tuned to the classification task. This approach can be used with any genomic or biological data set applied to medicine. The application of molecular epigenetic data in advanced machine learning analysis to medicine is the focus of this review.
Affiliation(s)
- Lawrence B Holder
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
- M Muksitul Haque
- School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, USA
- Michael K Skinner
- Center for Reproductive Biology, School of Biological Sciences, Washington State University, Pullman, WA, USA
|
32
|
Analysis of Microarray Data on Gene Expression and Methylation to Identify Long Non-coding RNAs in Non-small Cell Lung Cancer. Sci Rep 2016; 6:37233. [PMID: 27849024 PMCID: PMC5110979 DOI: 10.1038/srep37233] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 08/16/2016] [Accepted: 10/26/2016] [Indexed: 12/28/2022]
Abstract
To identify which long non-coding RNAs (lncRNAs) are involved in non-small cell lung cancer (NSCLC), we analyzed microarray data on gene expression and methylation. Gene expression chip and HumanMethylation450BeadChip were used to interrogate genome-wide expression and methylation in tumor samples. Differential expression and methylation were analyzed by comparing tumors with adjacent non-tumor tissues. LncRNAs expressed differentially and correlated with coding genes and DNA methylation were validated in additional tumor samples using RT-qPCR and pyrosequencing. In vitro experiments were performed to evaluate lncRNA’s effects on tumor cells. We identified 8,500 lncRNAs expressed differentially between tumor and non-tumor tissues, of which 1,504 were correlated with mRNA expression. Two of the lncRNAs, LOC146880 and ENST00000439577, were positively correlated with expression of two cancer-related genes, KPNA2 and RCC2, respectively. High expression of LOC146880 and ENST00000439577 was also associated with poor survival. Analysis of lncRNA expression in relation to DNA methylation showed that LOC146880 expression was down-regulated by DNA methylation in its promoter. Lowering the expression of LOC146880 or ENST00000439577 in tumor cells could inhibit cell proliferation, invasion and migration. Analysis of microarray data on gene expression and methylation allows us to identify two lncRNAs, LOC146880 and ENST00000439577, which may promote the progression of NSCLC.
|
33
|
Modeling Gene Regulation in Liver Hepatocellular Carcinoma with Random Forests. BIOMED RESEARCH INTERNATIONAL 2016; 2016:1035945. [PMID: 27818995 PMCID: PMC5080476 DOI: 10.1155/2016/1035945] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/18/2016] [Accepted: 09/21/2016] [Indexed: 11/29/2022]
Abstract
Liver hepatocellular carcinoma (HCC) remains a leading cause of cancer-related death. Poor understanding of the mechanisms underlying HCC prevents early detection and leads to high mortality. We developed a random forest model that incorporates copy-number variation, DNA methylation, transcription factor, and microRNA binding information as features to predict gene expression in HCC. Our model achieved a highly significant correlation between predicted and measured expression of held-out genes. Furthermore, we identified potential regulators of gene expression in HCC. Many of these regulators have been previously found to be associated with cancer and are differentially expressed in HCC. We also evaluated our predicted target sets for these regulators by making comparison with experimental results. Lastly, we found that the transcription factor E2F6, one of the candidate regulators inferred by our model, is predictive of survival rate in HCC. Results of this study will provide directions for future prospective studies in HCC.
|
34
|
Angermueller C, Pärnamaa T, Parts L, Stegle O. Deep learning for computational biology. Mol Syst Biol 2016; 12:878. [PMID: 27474269 PMCID: PMC4965871 DOI: 10.15252/msb.20156651] [Citation(s) in RCA: 669] [Impact Index Per Article: 83.6] [Received: 04/11/2016] [Revised: 06/02/2016] [Accepted: 06/06/2016] [Indexed: 12/11/2022]
Abstract
Technological advances in genomics and imaging have led to an explosion of molecular and cellular profiling data from large numbers of samples. This rapid increase in biological data dimension and acquisition rate is challenging conventional analysis strategies. Modern machine learning methods, such as deep learning, promise to leverage very large data sets for finding hidden structure within them, and for making accurate predictions. In this review, we discuss applications of this new breed of analysis approaches in regulatory genomics and cellular imaging. We provide background of what deep learning is, and the settings in which it can be successfully applied to derive biological insights. In addition to presenting specific applications and providing tips for practical use, we also highlight possible pitfalls and limitations to guide computational biologists when and how to make the most use of this new technology.
Affiliation(s)
- Christof Angermueller
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Tanel Pärnamaa
- Department of Computer Science, University of Tartu, Tartu, Estonia
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Leopold Parts
- Department of Computer Science, University of Tartu, Tartu, Estonia
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
- Oliver Stegle
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK
|
35
|
Baur B, Bozdag S. A Feature Selection Algorithm to Compute Gene Centric Methylation from Probe Level Methylation Data. PLoS One 2016; 11:e0148977. [PMID: 26872146 PMCID: PMC4752315 DOI: 10.1371/journal.pone.0148977] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 09/14/2015] [Accepted: 01/26/2016] [Indexed: 02/07/2023]
Abstract
DNA methylation is an important epigenetic event that affects gene expression during development and various diseases such as cancer. Understanding the mechanism of action of DNA methylation is important for downstream analysis. In the Illumina Infinium HumanMethylation 450K array, there are tens of probes associated with each gene. Given methylation intensities of all these probes, it is necessary to compute which of these probes are most representative of the gene centric methylation level. In this study, we developed a feature selection algorithm based on sequential forward selection that utilized different classification methods to compute gene centric DNA methylation using probe level DNA methylation data. We compared our algorithm to other feature selection algorithms such as support vector machines with recursive feature elimination, genetic algorithms and ReliefF. We evaluated all methods based on the predictive power of selected probes on their mRNA expression levels and found that a K-Nearest Neighbors classification using the sequential forward selection algorithm performed better than other algorithms based on all metrics. We also observed that transcriptional activities of certain genes were more sensitive to DNA methylation changes than transcriptional activities of other genes. Our algorithm was able to predict the expression of those genes with high accuracy using only DNA methylation data. Our results also showed that those DNA methylation-sensitive genes were enriched in Gene Ontology terms related to the regulation of various biological processes.
Affiliation(s)
- Brittany Baur
- Department of Math, Statistics and Computer Science, Marquette University, Milwaukee, Wisconsin, United States of America
- Serdar Bozdag
- Department of Math, Statistics and Computer Science, Marquette University, Milwaukee, Wisconsin, United States of America
|
36
|
Ahmad A. Epigenetics in Personalized Management of Lung Cancer. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2016; 890:111-22. [PMID: 26703801 DOI: 10.1007/978-3-319-24932-2_6] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 02/07/2023]
Abstract
In the last several years, the focus on the origin and progression of human cancers has shifted from genetic to epigenetic regulation, with particular attention to methylation and acetylation events that have a profound effect on the eventual expression of oncogenes and the suppression of tumor suppressors. A few drugs targeting these epigenetic changes have already been approved for treatment, albeit not for lung cancer. With the recent advances in the push towards personalized therapy, questions have been asked about the possible targeting of epigenetic events for personalized lung cancer therapy. Some progress has been made but a lot needs to be done. In this chapter, a succinct review of these topics is provided.
Affiliation(s)
- Aamir Ahmad
- Karmanos Cancer Institute, Wayne State University, Detroit, MI, 48201, USA
|