2
Tian Q, Zou J, Tang J, Fang Y, Yu Z, Fan S. MRCNN: a deep learning model for regression of genome-wide DNA methylation. BMC Genomics 2019; 20:192. [PMID: 30967120 PMCID: PMC6457069 DOI: 10.1186/s12864-019-5488-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 12/19/2022]
Abstract
Background: Determination of genome-wide DNA methylation is significant for both basic research and drug development. As a key epigenetic modification, this biochemical process can modulate gene expression to influence cell differentiation, which can possibly lead to cancer. Due to the intricate biochemical mechanism of DNA methylation, obtaining a precise prediction is a considerable challenge. Existing approaches have yielded good predictions, but the methods either need to combine plenty of features and prerequisites or deal with only hypermethylation and hypomethylation. Results: In this paper, we propose a deep learning method for prediction of genome-wide DNA methylation, in which Methylation Regression is implemented by Convolutional Neural Networks (MRCNN). By minimizing a continuous loss function, experiments show that our model converges and is more precise than the state-of-the-art method (DeepCpG) according to the evaluation results. MRCNN also discovers de novo motifs by analysis of features from the training process. Conclusions: Genome-wide DNA methylation could be evaluated based on the corresponding local DNA sequences of target CpG loci. With the autonomous learning pattern of deep learning, MRCNN enables accurate predictions of genome-wide DNA methylation status without predefined features and discovers some de novo methylation-related motifs that match known motifs by extracting sequence patterns. Electronic supplementary material: The online version of this article (10.1186/s12864-019-5488-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qi Tian
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Jianxiao Zou
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Jianxiong Tang
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Yuan Fang
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Zhongli Yu
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Shicai Fan
- School of Automation Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, Sichuan, China
3
Ma B, Allard C, Bouchard L, Perron P, Mittleman MA, Hivert MF, Liang L. Locus-specific DNA methylation prediction in cord blood and placenta. Epigenetics 2019; 14:405-420. [PMID: 30885044 DOI: 10.1080/15592294.2019.1588685] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/27/2022]
Abstract
DNA methylation is known to be responsive to prenatal exposures, which may be a part of the mechanism linking early developmental exposures to future chronic diseases. Many studies use blood to measure DNA methylation, yet we know that DNA methylation is tissue specific. Placenta is central to fetal growth and development, but it is rarely feasible to collect this tissue in large epidemiological studies; on the other hand, cord blood samples are more accessible. In this study, based on paired samples of both placenta and cord blood tissues from 169 individuals, we investigated the methylation concordance between placenta and cord blood. We then employed a machine-learning-based model to predict locus-specific DNA methylation levels in placenta using DNA methylation levels in cord blood. We found that methylation correlation between placenta and cord blood is lower than for other tissue pairs, consistent with existing observations that placenta methylation has a distinct pattern. Nonetheless, there are still a number of CpG sites showing robust association between the two tissues. We built prediction models for placenta methylation based on cord blood data and documented a subset of 1,012 CpG sites with high correlation between measured and predicted placenta methylation levels. The resulting list of CpG sites and prediction models could help to reveal the loci where internal or external influences may affect DNA methylation in both placenta and cord blood, and provide reference data for predicting the effects on placenta in future studies even when the tissue is not available in an epidemiological study.
Affiliation(s)
- Baoshan Ma
- College of Information Science and Technology, Dalian Maritime University, Dalian, Liaoning Province, China
- Catherine Allard
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Quebec, Canada
- Luigi Bouchard
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Quebec, Canada
- Department of Biochemistry, Faculty of Medicine and Health Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- ECOGENE-21 Biocluster, CSSS de Chicoutimi, Chicoutimi, Quebec, Canada
- Patrice Perron
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Quebec, Canada
- Department of Medicine, Faculty of Medicine and Life Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Murray A Mittleman
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Cardiovascular Epidemiology Research Unit, Beth Israel Deaconess Medical Center, Boston, MA, USA
- Marie-France Hivert
- Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke, Sherbrooke, Quebec, Canada
- Department of Medicine, Faculty of Medicine and Life Sciences, Université de Sherbrooke, Sherbrooke, Quebec, Canada
- Department of Population Medicine, Harvard Pilgrim Health Care Institute, Harvard Medical School, Boston, MA, USA
- Diabetes Unit, Massachusetts General Hospital, Boston, MA, USA
- Liming Liang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
4
Pan G, Jiang L, Tang J, Guo F. A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties. Int J Mol Sci 2018; 19:ijms19020511. [PMID: 29419752 PMCID: PMC5855733 DOI: 10.3390/ijms19020511] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 12/29/2017] [Revised: 02/01/2018] [Accepted: 02/02/2018] [Indexed: 02/06/2023]
Abstract
DNA methylation is an important biochemical process, and it has a close connection with many types of cancer. Research about DNA methylation can help us to understand the regulation mechanism and epigenetic reprogramming. Therefore, it becomes very important to recognize the methylation sites in the DNA sequence. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. In order to accurately identify whether or not a nucleotide residue is methylated under the specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to do DNA methylation prediction. Five criteria are used to evaluate the prediction results of our method: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, we could reach 0.8632 on AUC, 0.8017 on ACC, 0.5558 on MCC, and 0.7268 on SN. Additionally, the best results on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511 by AUC, respectively. When compared with other outstanding methods, our method surpassed them on the accuracy of prediction. The improvement of AUC by our method compared to other methods was at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
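For readers unfamiliar with the evaluation criteria named in the abstract above, the following is a minimal sketch of how ACC, SN, specificity, and MCC are conventionally computed from binary confusion-matrix counts. It is not the authors' code: the function name `binary_metrics` and the example counts are illustrative assumptions, and AUC is omitted because it requires ranked prediction scores rather than counts.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration; not taken from the cited paper.
    """
    total = tp + fp + tn + fn
    acc = (tp + tn) / total
    # Sensitivity (SN): fraction of truly methylated sites that are recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of truly unmethylated sites that are recovered.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when any margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


if __name__ == "__main__":
    # Made-up counts, used only to exercise the function.
    print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

MCC is often reported alongside accuracy because it stays informative when methylated and unmethylated sites are imbalanced.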
Affiliation(s)
- Gaofeng Pan
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
- Tianjin University Institute of Computational Biology, Tianjin University, Tianjin 300350, China.
- Limin Jiang
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
- Tianjin University Institute of Computational Biology, Tianjin University, Tianjin 300350, China.
- Jijun Tang
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
- Tianjin University Institute of Computational Biology, Tianjin University, Tianjin 300350, China.
- Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA.
- Fei Guo
- School of Computer Science and Technology, Tianjin University, Tianjin 300350, China.
- Tianjin University Institute of Computational Biology, Tianjin University, Tianjin 300350, China.
5
Wu C, Yao S, Li X, Chen C, Hu X. Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human. Int J Mol Sci 2017; 18:E420. [PMID: 28212312 PMCID: PMC5343954 DOI: 10.3390/ijms18020420] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 01/03/2017] [Revised: 02/03/2017] [Accepted: 02/08/2017] [Indexed: 02/02/2023]
Abstract
DNA methylation plays a significant role in transcriptional regulation by repressing activity. Change of the DNA methylation level is an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can only assay a small proportion of CpG sites in the human genome, it is urgent to develop reliable computational models for predicting genome-wide DNA methylation. Here, we proposed a novel algorithm that accurately extracted sequence complexity features (seven features) and developed a support-vector-machine-based prediction model with integration of the reported DNA composition features (trinucleotide frequency and GC content, 65 features) by utilizing the methylation profiles of embryonic stem cells in human. The prediction results from 22 human chromosomes with size-varied windows showed that the 600-bp window achieved the best average accuracy of 94.7%. Moreover, comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data also demonstrated that our model has certain generalization ability. Finally, a statistical test of the experimental data and the predicted data on functional regions annotated by ChromHMM found that six out of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our novel model will be useful and reliable in predicting DNA methylation.
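The DNA composition features mentioned in the abstract above (trinucleotide frequency and GC content, 65 features) can be illustrated with a short sketch. This assumes the 65 features are the 64 possible trinucleotide frequencies plus one GC-content value, counted over overlapping positions on a single strand; the abstract does not spell out these details, so the function `composition_features` and its normalization are assumptions rather than the authors' implementation.

```python
from itertools import product

# All 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def composition_features(window):
    """Return 64 trinucleotide frequencies followed by GC content.

    Hypothetical helper: overlapping 3-mers on one strand, and 3-mers
    containing characters other than A/C/G/T (e.g. N) are skipped.
    """
    seq = window.upper()
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    n_valid = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            n_valid += 1
    freqs = [counts[t] / n_valid if n_valid else 0.0 for t in TRINUCLEOTIDES]
    gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
    return freqs + [gc]


if __name__ == "__main__":
    # Toy sequence; a real window would be, for example, 600 bp around a CpG site.
    features = composition_features("ACGTTGCACGCG")
    print(len(features))
```

A vector like this, concatenated with the sequence complexity features, would then be passed to a classifier such as a support vector machine.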
Affiliation(s)
- Chengchao Wu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
- Shixin Yao
- College of Science, Huazhong Agricultural University, Wuhan 430070, China.
- Xinghao Li
- College of Science, Huazhong Agricultural University, Wuhan 430070, China.
- Chujia Chen
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
- Xuehai Hu
- College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, China.
6
González C, Salces-Ortiz J, Calvo JH, Serrano MM. In silico analysis of regulatory and structural motifs of the ovine HSP90AA1 gene. Cell Stress Chaperones 2016; 21:415-27. [PMID: 26810179 PMCID: PMC4837184 DOI: 10.1007/s12192-016-0668-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/13/2015] [Revised: 01/02/2016] [Accepted: 01/06/2016] [Indexed: 01/21/2023]
Abstract
Gene promoters are essential regions of DNA where the transcriptional molecular machinery to produce RNA molecules is recruited. In this process, DNA epigenetic modifications can acquire a fundamental role in the regulation of gene expression. Recently, in a previous work of our group, functional features and DNA methylation involved in the ovine HSP90AA1 gene expression regulation have been observed. In this work, we report a combination of methylation analysis by bisulfite sequencing in several tissues and at different developmental stages together with in silico bioinformatic analysis of putative regulating factors in order to identify regulative mechanisms both at the promoter and gene body. Our results show a "hybrid structure" (TATA box + CpG island) of the ovine HSP90AA1 gene promoter both in somatic and non-differentiated germ tissues, revealing the ability of the HSP90AA1 gene to be regulated both in an inducible and constitutive fashion. In addition, in silico analysis showed that several putative alternative spliced regulatory motifs, exonic splicing enhancers (ESEs), and G-quadruplex secondary structures were somehow related to the DNA methylation pattern found. The results obtained here could help explain the differences in cell-type transcripts, tissue expression rate, and transcription silencing mechanisms found in this gene.
Affiliation(s)
- Jorge H Calvo
- Unidad de Tecnología en Producción Animal, CITA, 59059, Zaragoza, Spain
7
Predicting CpG methylation levels by integrating Infinium HumanMethylation450 BeadChip array data. Genomics 2016; 107:132-7. [DOI: 10.1016/j.ygeno.2016.02.005] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 12/21/2015] [Revised: 02/19/2016] [Accepted: 02/22/2016] [Indexed: 12/23/2022]
8
Ghorbani M, Themis M, Payne A. Genome wide classification and characterisation of CpG sites in cancer and normal cells. Comput Biol Med 2015; 68:57-66. [PMID: 26615449 DOI: 10.1016/j.compbiomed.2015.09.023] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/11/2015] [Revised: 09/16/2015] [Accepted: 09/29/2015] [Indexed: 11/30/2022]
Abstract
This study identifies common methylation patterns across different cancer types in an effort to identify common molecular events in diverse types of cancer cells and provides evidence for the sequence surrounding a CpG to influence its susceptibility to aberrant methylation. CpG sites throughout the genome were divided into four classes: sites that become either hypo- or hyper-methylated in a variety of cancers using all the freely available microarray data (HypoCancer and HyperCancer classes) and those found in a constant hypo-methylated (Never methylated class) or hyper-methylated (Always methylated class) state in both normal and cancer cells. Our data show that most CpG sites included in the HumanMethylation450K microarray remain unmethylated in normal and cancerous cells; however, certain sites in all the cancers investigated become specifically modified. More detailed analysis of the sites revealed that the majority of those in the Never methylated class were in CpG islands whereas those in the HyperCancer class were mostly associated with miRNA coding regions. The sites in the Hypermethylated class are associated with genes involved in initiating or maintaining the cancerous state, being enriched for processes involved in apoptosis, and with transcription factors predicted to bind to these genes linked to apoptosis and tumorigenesis (notably including E2F). Further, we show that more LINE elements are associated with the HypoCancer class and more Alu repeats are associated with the HyperCancer class. Motifs that classify the classes were identified to distinguish them based on the surrounding DNA sequence alone, and for the identification of DNA sequences that could render sites more prone to aberrant methylation in cancer cells. This provides evidence that the sequence surrounding a CpG site has an influence on whether a site is hypo- or hyper-methylated.
Affiliation(s)
- Mohammadmersad Ghorbani
- Department of Computer Science, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
- Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute
- Michael Themis
- Department of Biosciences, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
- Annette Payne
- Department of Computer Science, Brunel University, Uxbridge, Middlesex UB8 3PH, UK.
9
Cava C, Bertoli G, Castiglioni I. Integrating genetics and epigenetics in breast cancer: biological insights, experimental, computational methods and therapeutic potential. BMC Systems Biology 2015; 9:62. [PMID: 26391647 PMCID: PMC4578257 DOI: 10.1186/s12918-015-0211-x] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 05/14/2015] [Accepted: 09/15/2015] [Indexed: 12/11/2022]
Abstract
BACKGROUND Development of human cancer can proceed through the accumulation of different genetic changes affecting the structure and function of the genome. Combined analyses of molecular data at multiple levels, such as DNA copy-number alteration, mRNA and miRNA expression, can clarify biological functions and pathways deregulated in cancer. The integrative methods that are used to investigate these data involve different fields, including biology, bioinformatics, and statistics. RESULTS These methodologies are presented in this review, and their implementation in breast cancer is discussed with a focus on integration strategies. We report current applications, recent studies and interesting results leading to the identification of candidate biomarkers for diagnosis, prognosis, and therapy in breast cancer by using both individual and combined analyses. CONCLUSION This review presents a state of art of the role of different technologies in breast cancer based on the integration of genetics and epigenetics, and shares some issues related to the new opportunities and challenges offered by the application of such integrative approaches.
Affiliation(s)
- Claudia Cava
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Gloria Bertoli
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
- Isabella Castiglioni
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), Milan, Italy.
10
Zhang W, Spector TD, Deloukas P, Bell JT, Engelhardt BE. Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements. Genome Biol 2015; 16:14. [PMID: 25616342 PMCID: PMC4389802 DOI: 10.1186/s13059-015-0581-9] [Citation(s) in RCA: 125] [Impact Index Per Article: 13.9] [Received: 08/07/2013] [Accepted: 01/02/2015] [Indexed: 02/07/2023]
Abstract
BACKGROUND Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analyses, but current approaches tackle average methylation within a locus and are often limited to specific genomic regions. RESULTS We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict methylation levels at CpG site resolution using features including neighboring CpG site methylation levels and genomic distance, co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project. Our approach achieves 92% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs and is robust across platform and cell-type heterogeneity. Our classifier outperforms other types of classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation, CGIs, co-localized DNase I hypersensitive sites, transcription factor binding sites, and histone modifications were found to be most predictive of methylation levels. CONCLUSIONS Our observations of DNA methylation patterns led us to develop a classifier to predict DNA methylation levels at CpG site resolution with high accuracy. Furthermore, our method identified genomic features that interact with DNA methylation, suggesting mechanisms involved in DNA methylation modification and regulation, and linking diverse epigenetic processes.
Affiliation(s)
- Weiwei Zhang
- Department of Molecular Genetics and Microbiology, Duke University, Durham, NC, USA.
- Tim D Spector
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.
- Panos Deloukas
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, UK.
- Princess Al-Jawhara Al-Brahim Centre of Excellence in Research of Hereditary Disorders (PACER-HD), King Abdulaziz University, Jeddah, 21589, Saudi Arabia.
- Jordana T Bell
- Department of Twin Research and Genetic Epidemiology, King's College London, London, UK.
11
Wing MR, Ramezani A, Gill HS, Devaney JM, Raj DS. Epigenetics of progression of chronic kidney disease: fact or fantasy? Semin Nephrol 2014; 33:363-74. [PMID: 24011578 DOI: 10.1016/j.semnephrol.2013.05.008] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 01/16/2023]
Abstract
Epigenetic modifications are important in the normal functioning of the cell, from regulating dynamic expression of essential genes and associated proteins to repressing those that are unneeded. Epigenetic changes are essential for development and functioning of the kidney, and aberrant methylation, histone modifications, and expression of microRNA could lead to chronic kidney disease (CKD). Here, epigenetic modifications modulate transforming growth factor β signaling, inflammation, profibrotic genes, and the epithelial-to-mesenchymal transition, promoting renal fibrosis and progression of CKD. Identification of these epigenetic changes is important because they are potentially reversible and may serve as therapeutic targets in the future to prevent subsequent renal fibrosis and CKD. In this review we discuss the different types of epigenetic control, methods to study epigenetic modifications, and how epigenetics promotes progression of CKD.
Affiliation(s)
- Maria R Wing
- Division of Renal Disease and Hypertension, The George Washington University, Washington, DC
12
Comparative (computational) analysis of the DNA methylation status of trinucleotide repeat expansion diseases. J Nucleic Acids 2013; 2013:689798. [PMID: 24455203 PMCID: PMC3884633 DOI: 10.1155/2013/689798] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 07/03/2013] [Revised: 10/11/2013] [Accepted: 10/15/2013] [Indexed: 12/26/2022]
Abstract
Previous studies have examined DNA methylation in different trinucleotide repeat diseases. We have combined these data and used a pattern searching algorithm to identify motifs in the DNA surrounding aberrantly methylated CpGs found in the DNA of patients with one of the three trinucleotide repeat (TNR) expansion diseases: fragile X syndrome (FRAXA), myotonic dystrophy type I (DM1), or Friedreich's ataxia (FRDA). We examined sequences surrounding both the variably methylated (VM) CpGs, which are hypermethylated in patients compared with unaffected controls, and the nonvariably methylated CpGs, which remain either always methylated (AM) or never methylated (NM) in both patients and controls. Using the J48 algorithm of WEKA analysis, we identified that two patterns are all that is necessary to classify our three regions: CCGG∗, which is found in VM and not in AM regions, and AATT∗, which distinguishes between NM and VM + AM using proportional frequency. Furthermore, comparing our software with MEME software, we have demonstrated that our software identifies more patterns than MEME in these short DNA sequences. Thus, we present evidence that the DNA sequence surrounding a CpG can influence its susceptibility to be de novo methylated in a disease state associated with a trinucleotide repeat.
13
Zheng H, Wu H, Li J, Jiang SW. CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome. BMC Med Genomics 2013; 6 Suppl 1:S13. [PMID: 23369266 PMCID: PMC3552668 DOI: 10.1186/1755-8794-6-s1-s13] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 12/28/2022]
Abstract
DNA methylation is an inheritable chemical modification of cytosine, and represents one of the most important epigenetic events. Computational prediction of the DNA methylation status can be employed to speed up the genome-wide methylation profiling, and to identify the key features that are correlated with various methylation patterns. Here, we develop CpGIMethPred, the support vector machine-based models to predict the methylation status of the CpG islands in the human genome under normal conditions. The features for prediction include those that have been previously demonstrated effective (CpG island specific attributes, DNA sequence composition patterns, DNA structure patterns, distribution patterns of conserved transcription factor binding sites and conserved elements, and histone methylation status) as well as those that have not been extensively explored but are likely to contribute additional information from a biological point of view (nucleosome positioning propensities, gene functions, and histone acetylation status). Statistical tests are performed to identify the features that are significantly correlated with the methylation status of the CpG islands, and principal component analysis is then performed to decorrelate the selected features. Data from the Human Epigenome Project (HEP) are used to train, validate and test the predictive models. Specifically, the models are trained and validated by using the DNA methylation data obtained in the CD4 lymphocytes, and are then tested for generalizability using the DNA methylation data obtained in the other 11 normal tissues and cell types. 
Our experiments have shown that (1) an eight-dimensional feature space that is selected via the principal component analysis and that combines all categories of information is effective for predicting the CpG island methylation status, (2) by incorporating the information regarding the nucleosome positioning, gene functions, and histone acetylation, the models can achieve higher specificity and accuracy than the existing models while maintaining a comparable sensitivity measure, (3) the histone modification (methylation and acetylation) information contributes significantly to the prediction, without which the performance of the models deteriorates, and (4) the predictive models generalize well to different tissues and cell types. The developed program CpGIMethPred is freely available at http://users.ece.gatech.edu/~hzheng7/CGIMetPred.zip.
Affiliation(s)
- Hao Zheng
- School of Electrical and Computer Engineering, Georgia Institute of Technology, GA, USA
14
Deng H, Guo Y, Song H, Xiao B, Sun W, Liu Z, Yu X, Xia T, Cui L, Guo J. MicroRNA-195 and microRNA-378 mediate tumor growth suppression by epigenetical regulation in gastric cancer. Gene 2013; 518:351-9. [PMID: 23333942 DOI: 10.1016/j.gene.2012.12.103] [Citation(s) in RCA: 122] [Impact Index Per Article: 11.1] [Received: 09/06/2012] [Revised: 10/30/2012] [Accepted: 12/24/2012] [Indexed: 12/19/2022]
Abstract
The epigenetic regulation of microRNAs is one of several mechanisms underlying carcinogenesis. We found that microRNA-195 (miR-195) and microRNA-378 (miR-378) were significantly down-regulated in gastric cancer tissues and gastric cancer cell lines. The expression of miR-195 and miR-378 in gastric cancer cells was significantly restored by 5-aza-dC, a demethylation reagent. The low expression of miR-195 and miR-378 was closely related to the presence of promoter CpG island methylation. Treatment with miR-195/miR-378 mimics strikingly suppressed the growth of gastric cancer cells, whereas it promoted the growth of normal gastric epithelial cells. In contrast, administration of miR-195/miR-378 inhibitors significantly prevented the growth of normal gastric epithelial cells. Expression of cyclin-dependent kinase 6 and vascular endothelial growth factor was down-regulated by exogenous miR-195 and miR-378, respectively. In conclusion, miR-195 and miR-378 are abnormally expressed and epigenetically regulated in gastric cancer cell lines and tissues via the suppression of CDK6 and VEGF signaling, suggesting that miR-195 and miR-378 have tumor suppressor properties in gastric cancer.
Affiliation(s)
- Hongxia Deng
- Department of Biochemistry and Molecular Biology, and Zhejiang Provincial Key Laboratory of Pathophysiology, Ningbo University School of Medicine, Ningbo, 315211, China
15
Tan L, Shi YG. Tet family proteins and 5-hydroxymethylcytosine in development and disease. Development 2012; 139:1895-902. [PMID: 22569552 DOI: 10.1242/dev.070771] [Citation(s) in RCA: 272] [Impact Index Per Article: 22.7] [Indexed: 12/11/2022]
Abstract
Over the past few decades, DNA methylation at the 5-position of cytosine (5-methylcytosine, 5mC) has emerged as an important epigenetic modification that plays essential roles in development, aging and disease. However, the mechanisms controlling 5mC dynamics remain elusive. Recent studies have shown that ten-eleven translocation (Tet) proteins can catalyze 5mC oxidation and generate 5mC derivatives, including 5-hydroxymethylcytosine (5hmC). The exciting discovery of these novel 5mC derivatives has begun to shed light on the dynamic nature of 5mC, and emerging evidence has shown that Tet family proteins and 5hmC are involved in normal development as well as in many diseases. In this Primer we provide an overview of the role of Tet family proteins and 5hmC in development and cancer.
Affiliation(s)
- Li Tan
- Laboratory of Epigenetics, Institutes of Biomedical Sciences, Fudan University, Shanghai 200032, PR China
|
16
|
Linking the epigenome to the genome: correlation of different features to DNA methylation of CpG islands. PLoS One 2012; 7:e35327. [PMID: 22558141 PMCID: PMC3340366 DOI: 10.1371/journal.pone.0035327] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.1] [Received: 01/17/2012] [Accepted: 03/12/2012] [Indexed: 11/24/2022] Open
Abstract
DNA methylation of CpG islands plays a crucial role in the regulation of gene expression. More than half of all human promoters contain CpG islands with a tissue-specific methylation pattern in differentiated cells. Still today, the whole process of how DNA methyltransferases determine which region should be methylated is not completely revealed. There are many hypotheses of which genomic features are correlated to the epigenome that have not yet been evaluated. Furthermore, many explorative approaches of measuring DNA methylation are limited to a subset of the genome and thus, cannot be employed, e.g., for genome-wide biomarker prediction methods. In this study, we evaluated the correlation of genetic, epigenetic and hypothesis-driven features to DNA methylation of CpG islands. To this end, various binary classifiers were trained and evaluated by cross-validation on a dataset comprising DNA methylation data for 190 CpG islands in HEPG2, HEK293, fibroblasts and leukocytes. We achieved an accuracy of up to 91% with an MCC of 0.8 using ten-fold cross-validation and ten repetitions. With these models, we extended the existing dataset to the whole genome and thus, predicted the methylation landscape for the given cell types. The method used for these predictions is also validated on another external whole-genome dataset. Our results reveal features correlated to DNA methylation and confirm or disprove various hypotheses of DNA methylation related features. This study confirms correlations between DNA methylation and histone modifications, DNA structure, DNA sequence, genomic attributes and CpG island properties. Furthermore, the method has been validated on a genome-wide dataset from the ENCODE consortium. The developed software, as well as the predicted datasets and a web-service to compare methylation states of CpG islands are available at http://www.cogsys.cs.uni-tuebingen.de/software/dna-methylation/.
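The abstract above reports classifier quality as accuracy together with the Matthews correlation coefficient (MCC). As a minimal sketch of how those two numbers relate to a binary confusion matrix, the functions below compute both from raw counts. The function names and the counts in the usage note are hypothetical illustrations, not values or code from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from confusion-matrix counts; returns 0.0 when undefined."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0
```

For example, `matthews_corrcoef(45, 45, 5, 5)` describes a hypothetical balanced test set with 90% accuracy. Unlike accuracy, MCC stays near zero for a classifier that only predicts the majority class, which is why it is often reported alongside accuracy for methylated versus unmethylated CpG islands.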
|
17
|
Yang Y, Nephew K, Kim S. A novel k-mer mixture logistic regression for methylation susceptibility modeling of CpG dinucleotides in human gene promoters. BMC Bioinformatics 2012; 13 Suppl 3:S15. [PMID: 22536899 PMCID: PMC3311103 DOI: 10.1186/1471-2105-13-s3-s15] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 01/03/2023] Open
Abstract
Background DNA methylation is essential for normal development and differentiation and plays a crucial role in the development of nearly all types of cancer. Aberrant DNA methylation patterns, including genome-wide hypomethylation and region-specific hypermethylation, are frequently observed and contribute to the malignant phenotype. A number of studies have recently identified distinct features of genomic sequences that can be used for modeling specific DNA sequences that may be susceptible to aberrant CpG methylation in both cancer and normal cells. Although it is now possible, using next generation sequencing technologies, to assess human methylomes at base resolution, no reports currently exist on modeling cell type-specific DNA methylation susceptibility. Thus, we conducted a comprehensive modeling study of cell type-specific DNA methylation susceptibility at three different resolutions: CpG dinucleotides, CpG segments, and individual gene promoter regions. Results Using a k-mer mixture logistic regression model, we effectively modeled DNA methylation susceptibility across five different cell types. Further, at the segment level, we achieved up to 0.75 in AUC prediction accuracy in a 10-fold cross validation study using a mixture of k-mers. Conclusions The significance of these results is three fold: 1) this is the first report to indicate that CpG methylation susceptible "segments" exist; 2) our model demonstrates the significance of certain k-mers for the mixture model, potentially highlighting DNA sequence features (k-mers) of differentially methylated, promoter CpG island sequences across different tissue types; 3) as only 3 or 4 bp patterns had previously been used for modeling DNA methylation susceptibility, ours is the first demonstration that 6-mer modeling can be performed without loss of accuracy.
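The k-mer features mentioned in this abstract can be illustrated with a short sketch. The function below only turns a DNA sequence into a fixed-length vector of k-mer counts, which is the kind of input a logistic regression could consume; it is not the authors' mixture model, and the function name and the choice to skip k-mers containing non-ACGT characters are assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq: str, k: int) -> list[int]:
    """Count overlapping k-mers and return them in lexicographic ACGT order."""
    seq = seq.upper()
    observed = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Enumerate all 4**k possible k-mers so every sequence maps to the
    # same feature positions; k-mers with other symbols (e.g. N) are ignored.
    return [observed["".join(p)] for p in product("ACGT", repeat=k)]
```

With `k = 6`, as discussed in the abstract, each sequence becomes a vector of 4096 counts; with `k = 3` or `k = 4` the vector has 64 or 256 entries.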
|
18
|
Ali I, Seker H. A comparative study for characterisation and prediction of tissue-specific DNA methylation of CpG islands in chromosomes 6, 20 and 22. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2011; 2010:1832-5. [PMID: 21096144 DOI: 10.1109/iembs.2010.5626437] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
Advanced technology has enabled identification of tissue-specific methylated CpG islands of different human tissues. As methylation of CpG islands is involved in various biological phenomena and function of the DNA methylation is linked to various human diseases such as cancer, analysis of the CpG islands has become important and useful in characterising and modelling biological phenomena and understanding mechanism of such diseases. However, analysis of the data associated with the CpG islands is a quite new and challenging subject in bioinformatics, systems biology and epigenetics.
Affiliation(s)
- Isse Ali
- Bio-Health Informatics Research Group within the Centre for Computational Intelligence, Faculty of Technology, De Montfort University, Leicester, LE1 9BH, United Kingdom.
|
19
|
Hackenberg M, Barturen G, Carpena P, Luque-Escamilla PL, Previti C, Oliver JL. Prediction of CpG-island function: CpG clustering vs. sliding-window methods. BMC Genomics 2010; 11:327. [PMID: 20500903 PMCID: PMC2887419 DOI: 10.1186/1471-2164-11-327] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 01/26/2010] [Accepted: 05/26/2010] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Unmethylated stretches of CpG dinucleotides (CpG islands) are an outstanding property of mammal genomes. Conventionally, these regions are detected by sliding window approaches using %G + C, CpG observed/expected ratio and length thresholds as main parameters. Recently, clustering methods directly detect clusters of CpG dinucleotides as a statistical property of the genome sequence. RESULTS We compare sliding-window to clustering (i.e. CpGcluster) predictions by applying new ways to detect putative functionality of CpG islands. Analyzing the co-localization with several genomic regions as a function of window size vs. statistical significance (p-value), CpGcluster shows a higher overlap with promoter regions and highly conserved elements, at the same time showing less overlap with Alu retrotransposons. The major difference in the prediction was found for short islands (CpG islets), often exclusively predicted by CpGcluster. Many of these islets seem to be functional, as they are unmethylated, highly conserved and/or located within the promoter region. Finally, we show that window-based islands can spuriously overlap several, differentially regulated promoters as well as different methylation domains, which might indicate a wrong merge of several CpG islands into a single, very long island. The shorter CpGcluster islands seem to be much more specific when concerning the overlap with alternative transcription start sites or the detection of homogenous methylation domains. CONCLUSIONS The main difference between sliding-window approaches and clustering methods is the length of the predicted islands. Short islands, often differentially methylated, are almost exclusively predicted by CpGcluster. This suggests that CpGcluster may be the algorithm of choice to explore the function of these short, but putatively functional CpG islands.
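The sliding-window approach described in this abstract relies on three parameters: G + C content, the CpG observed/expected ratio, and a length threshold. A minimal sketch of that scan is given below. The default thresholds (window of 200 bp, G + C of at least 50%, observed/expected of at least 0.6) are commonly cited conventions assumed here for illustration; the abstract does not state which values were used, and the function names are hypothetical.

```python
def gc_fraction(window: str) -> float:
    """Fraction of bases in the window that are G or C."""
    return (window.count("G") + window.count("C")) / len(window)


def cpg_obs_exp(window: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    c, g = window.count("C"), window.count("G")
    if c == 0 or g == 0:
        return 0.0
    return window.count("CG") * len(window) / (c * g)


def candidate_windows(seq, size=200, step=1, min_gc=0.5, min_oe=0.6):
    """Return start positions of windows that pass both thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        if gc_fraction(window) >= min_gc and cpg_obs_exp(window) >= min_oe:
            hits.append(start)
    return hits
```

Overlapping or adjacent hits would normally be merged into a single island. That merging step is where, according to the abstract, window-based methods may join several distinct CpG islands into one very long island, whereas clustering methods such as CpGcluster work from the distances between CpG dinucleotides instead.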
Affiliation(s)
- Michael Hackenberg
- Dpto. de Genética, Facultad de Ciencias, Universidad de Granada, Campus de Fuentenueva s/n, 18071, Granada, Spain.
|
20
|
Shumay E, Fowler JS. Identification and characterization of putative methylation targets in the MAOA locus using bioinformatic approaches. Epigenetics 2010; 5:325-42. [PMID: 20421737 DOI: 10.4161/epi.5.4.11719] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/18/2022] Open
Abstract
Monoamine oxidase A (MAO A) is an enzyme that catalyzes the oxidation of neurotransmitter amines. A functional polymorphism in the human MAOA gene (high- and low-MAOA) has been associated with distinct behavioral phenotypes. To investigate directly the biological mechanism whereby this polymorphism influences brain function, we recently measured the activity of the MAO A enzyme in healthy volunteers. When we found no relationship between the individual's brain MAO A level and the MAOA genotype, we postulated that there are additional regulatory mechanisms that control MAOA expression. Given that DNA methylation is linked to the regulation of gene expression, we hypothesized that epigenetic mechanisms factor into MAOA expression. Our underlying assumption was that the differences in an individual's genotype play a key role in the epigenetic potential of the MAOA locus and, consequently, determine the individual's level of MAO A activity in the brain. As a first step towards experimental validation of the hypothesis, we performed a comprehensive bioinformatic analysis aiming to interrogate genomic features and attributes of the MAOA locus that might modulate its epigenetic sensitivity. Major findings of our analysis are the following: (1) the extended MAOA regulatory region contains two CpG islands (CGIs), one of which overlaps with the canonical MAOA promoter and the other is located further upstream; both CGIs exhibit sensitivity to differential methylation. (2) The uVNTR's effect on the MAOA's transcriptional activity might have an epigenetic nature: this polymorphic region resides within the MAOA's CGI and itself contains CpGs; thus, the number of repeating increments effectively changes the number of methylatable cytosines in the MAOA promoter.
An array of in silico analyses (the nucleosome positioning, the physical properties of the local DNA, the clustering of transcription-factor binding sites) together with experimental data on histone modifications and Pol 2 sites and data from the RefSeq mRNA library suggest that the MAOA gene might have an alternative promoter. Based on our findings, we propose a regulatory mechanism for the human MAOA according to which the MAOA expression in vivo is executed by the generation of tissue-specific transcripts initiated from the alternative promoters (both CGI-associated) where transcriptional activation of a particular promoter is under epigenetic control.
Affiliation(s)
- Elena Shumay
- Brookhaven National Laboratory, Medical Department, Upton, NY, USA.
|
21
|
Sun S, Yan PS, Huang THM, Lin S. Identifying differentially methylated genes using mixed effect and generalized least square models. BMC Bioinformatics 2009; 10:404. [PMID: 20003206 PMCID: PMC2800121 DOI: 10.1186/1471-2105-10-404] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/23/2009] [Accepted: 12/09/2009] [Indexed: 11/10/2022] Open
Abstract
Background DNA methylation plays an important role in the process of tumorigenesis. Identifying differentially methylated genes or CpG islands (CGIs) associated with genes between two tumor subtypes is thus an important biological question. The methylation status of all CGIs in the whole genome can be assayed with differential methylation hybridization (DMH) microarrays. However, patient samples or cell lines are heterogeneous, so their methylation pattern may be very different. In addition, neighboring probes at each CGI are correlated. How these factors affect the analysis of DMH data is unknown. Results We propose a new method for identifying differentially methylated (DM) genes by identifying the associated DM CGI(s). At each CGI, we implement four different mixed effect and generalized least square models to identify DM genes between two groups. We compare four models with a simple least square regression model to study the impact of incorporating random effects and correlations. Conclusions We demonstrate that the inclusion (or exclusion) of random effects and the choice of correlation structures can significantly affect the results of the data analysis. We also assess the false discovery rate of different models using CGIs associated with housekeeping genes.
Affiliation(s)
- Shuying Sun
- Case Comprehensive Cancer Center, Case Western Reserve University, Cleveland, Ohio 44106, USA.
|