1. Yassi M, Shams Davodly E, Hajebi Khaniki S, Kerachian MA. HBCR_DMR: A Hybrid Method Based on Beta-Binomial Bayesian Hierarchical Model and Combination of Ranking Method to Detect Differential Methylation Regions in Bisulfite Sequencing Data. J Pers Med 2024; 14:361. [PMID: 38672987 PMCID: PMC11051304 DOI: 10.3390/jpm14040361] [Received: 09/10/2023] [Revised: 10/20/2023] [Accepted: 01/09/2024] [Indexed: 04/28/2024]
Abstract
DNA methylation is a key epigenetic modification involved in gene regulation, contributing to both physiological and pathological conditions. Understanding its role requires a precise comparison of DNA methylation patterns between sample groups that represent distinct conditions. Computational analysis of differentially methylated regions (DMRs) can help uncover the relationships between methylation and these conditions. This paper describes HBCR_DMR, a hybrid method that combines a beta-binomial Bayesian hierarchical model with a combination of ranking methods. In the first stage, we model the true methylation proportions of CpG sites (CpGs) across replicates using a beta-binomial distribution parameterized by a group mean and a dispersion parameter. In the second stage, we select the CpG sites that distinguish the groups by their methylation status, using multiple ranking techniques. Finally, we combine the ranking lists of differentially methylated CpG sites through a voting system. Our analyses of simulated and real data show strong performance, with a sensitivity of 0.72, a specificity of 0.89, an F1 score of 0.76, an overall accuracy of 0.82 and an AUC of 0.94. These findings underscore HBCR_DMR's robust capacity to distinguish differentially methylated regions, confirming its utility as a valuable tool for DNA methylation analysis.
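The two statistical ingredients named in this abstract can be sketched in a few lines. The sketch assumes the common mean/dispersion parameterization of the beta-binomial (α + β = (1 − φ)/φ, so φ = 1/(α + β + 1)) and uses a Borda count as a stand-in for the unspecified voting system. The function names are hypothetical, and the priors of the hierarchical model are omitted, so this covers only the likelihood layer and the list aggregation, not HBCR_DMR itself.

```python
import math
from collections import defaultdict


def log_beta(a: float, b: float) -> float:
    """log B(a, b), computed with log-gamma for numerical stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k: int, n: int, mu: float, phi: float) -> float:
    """Log-probability of k methylated reads out of n at one CpG.

    mu: group mean methylation proportion, 0 < mu < 1
    phi: dispersion, 0 < phi < 1 (assumed phi = 1 / (alpha + beta + 1))
    """
    if not (0.0 < mu < 1.0 and 0.0 < phi < 1.0):
        raise ValueError("mu and phi must lie strictly between 0 and 1")
    total = (1.0 - phi) / phi  # alpha + beta
    alpha, beta = mu * total, (1.0 - mu) * total
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)


def group_loglik(replicates, mu, phi):
    """Sum the log-likelihood over replicates given as (methylated, total) pairs."""
    return sum(beta_binomial_logpmf(k, n, mu, phi) for k, n in replicates)


def borda_aggregate(rankings):
    """Combine ranked lists of CpG ids (best first) with a Borda count.

    A CpG at position i of a list of length m receives m - i points;
    a CpG absent from a list receives nothing from that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        m = len(ranking)
        for position, cpg in enumerate(ranking):
            scores[cpg] += m - position
    # Highest total score first; ties broken by id for reproducibility.
    return sorted(scores, key=lambda cpg: (-scores[cpg], cpg))
```

For example, `borda_aggregate([["a", "b", "c"], ["b", "a", "c"], ["a", "c", "b"]])` returns `["a", "b", "c"]`. Under this parameterization the variance is nμ(1 − μ)[1 + (n − 1)φ], so φ directly controls the extra-binomial spread between replicates.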
Affiliation(s)
- Maryam Yassi
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad 9184156815, Iran
- Department of Mathematics and Statistics, University of Otago, Dunedin 9054, New Zealand
- Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin 9054, New Zealand
- Ehsan Shams Davodly
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad 9184156815, Iran
- Saeedeh Hajebi Khaniki
- Student Research Committee, Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad 9177948564, Iran
- Mohammad Amin Kerachian
- Cancer Genetics Research Unit, Reza Radiotherapy and Oncology Center, Mashhad 9184156815, Iran
- Medical Genetics Research Center, Mashhad University of Medical Sciences, Mashhad 9177948564, Iran
- Department of Medical Genetics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad 9177948564, Iran
- Department of Chemistry and Biology, Toronto Metropolitan University, Toronto, ON M5B 2K3, Canada

2. Shokoohi F, Khaniki SH. Uncovering Alterations in Cancer Epigenetics via Trans-Dimensional Markov Chain Monte Carlo and Hidden Markov Models. bioRxiv: The Preprint Server for Biology 2023:2023.06.15.545168. [PMID: 37398181 PMCID: PMC10312753 DOI: 10.1101/2023.06.15.545168] [Indexed: 07/04/2023]
Abstract
Epigenetic alterations are key drivers in the development and progression of cancer. Identifying differentially methylated cytosines (DMCs) in cancer samples is a crucial step toward understanding these changes. In this paper, we propose DMCTHM, a trans-dimensional Markov chain Monte Carlo (TMCMC) approach that applies hidden Markov models (HMMs) with binomial emissions to bisulfite sequencing (BS-Seq) data to identify DMCs in cancer epigenetic studies. We introduce the Expander-Collider penalty to tackle under- and over-estimation in TMCMC-HMMs. We address all known challenges inherent in BS-Seq data by introducing novel approaches for capturing functional patterns and the autocorrelation structure of the data, and for handling missing values, multiple covariates, multiple comparisons and family-wise errors. We demonstrate the effectiveness of DMCTHM through comprehensive simulation studies, in which our proposed method outperforms other competing methods in identifying DMCs. Notably, with DMCTHM we uncovered new DMCs and genes in colorectal cancer that were significantly enriched in the Tp53 pathway.
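The core likelihood computation behind an HMM with binomial emissions is the forward algorithm, which sums over all hidden-state paths along the CpGs. The sketch below assumes a fixed number of states with known, strictly positive parameters; it does not implement DMCTHM's trans-dimensional moves (which change the number of states) or its Expander-Collider penalty, and all names are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial log-pmf of k methylated reads out of n, with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_loglik(obs, init, trans, emit_p):
    """Forward algorithm: log P(obs) for a discrete-state HMM.

    obs: (methylated, total) read counts per CpG, in genomic order
    init: initial state probabilities (all > 0)
    trans: trans[r][s] = P(state s at next CpG | state r), all > 0
    emit_p: per-state methylation probability of the binomial emission
    """
    states = range(len(init))
    k, n = obs[0]
    alpha = [math.log(init[s]) + log_binom_pmf(k, n, emit_p[s]) for s in states]
    for k, n in obs[1:]:
        # alpha[s] = log P(observations so far, current state = s)
        alpha = [
            logsumexp([alpha[r] + math.log(trans[r][s]) for r in states])
            + log_binom_pmf(k, n, emit_p[s])
            for s in states
        ]
    return logsumexp(alpha)
```

Working in log space keeps the recursion stable over long runs of CpGs, where the raw path probabilities would underflow.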
Affiliation(s)
- Farhad Shokoohi
- Department of Mathematical Sciences, University of Nevada-Las Vegas, Las Vegas, NV 89154, USA
- Saeedeh Hajebi Khaniki
- Department of Biostatistics, School of Health, Mashhad University of Medical Sciences, Mashhad, Iran

3. Malonzo MH, Lähdesmäki H. LuxHMM: DNA methylation analysis with genome segmentation via hidden Markov model. BMC Bioinformatics 2023; 24:58. [PMID: 36810075 PMCID: PMC9945676 DOI: 10.1186/s12859-023-05174-7] [Received: 11/05/2022] [Accepted: 02/06/2023] [Indexed: 02/23/2023]
Abstract
BACKGROUND DNA methylation plays an important role in the epigenetics of various biological processes, including many diseases. Although differential methylation of individual cytosines can be informative, the methylation of neighboring CpGs is typically correlated, so analysis of differentially methylated regions is often of more interest. RESULTS We have developed LuxHMM, a probabilistic method and software that uses a hidden Markov model (HMM) to segment the genome into regions and a Bayesian regression model, which allows handling of multiple covariates, to infer differential methylation of regions. Moreover, our model includes experimental parameters that describe the underlying biochemistry of bisulfite sequencing, and model inference is done using either variational inference, for efficient genome-scale analysis, or Hamiltonian Monte Carlo (HMC). CONCLUSIONS Analyses of real and simulated bisulfite sequencing data demonstrate the competitive performance of LuxHMM compared with other published differential methylation analysis methods.
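Segmenting CpGs into regions with an HMM is typically done by decoding the most likely hidden-state path and collapsing runs of equal states. The sketch below assumes a two-state (low/high methylation) HMM with binomial emissions and known, strictly positive parameters, and uses Viterbi decoding; LuxHMM's actual segmentation, its bisulfite-experiment parameters and its Bayesian regression step are not reproduced, and the function names are hypothetical.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial log-pmf of k methylated reads out of n, with 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def viterbi(obs, init, trans, emit_p):
    """Most likely hidden-state path for (methylated, total) counts per CpG."""
    states = range(len(init))
    k, n = obs[0]
    delta = [math.log(init[s]) + log_binom_pmf(k, n, emit_p[s]) for s in states]
    backpointers = []
    for k, n in obs[1:]:
        ptr, new_delta = [], []
        for s in states:
            # Best predecessor state r for reaching state s at this CpG.
            scores = [delta[r] + math.log(trans[r][s]) for r in states]
            best = max(states, key=scores.__getitem__)
            ptr.append(best)
            new_delta.append(scores[best] + log_binom_pmf(k, n, emit_p[s]))
        backpointers.append(ptr)
        delta = new_delta
    state = max(states, key=delta.__getitem__)
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segments(path):
    """Collapse a state path into (start_index, end_index_inclusive, state) runs."""
    runs, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            runs.append((start, i - 1, path[start]))
            start = i
    return runs
```

For six CpGs with counts `[(1, 10), (0, 10), (1, 10), (9, 10), (10, 10), (9, 10)]`, emission probabilities `[0.1, 0.9]` and sticky transitions `[[0.9, 0.1], [0.1, 0.9]]`, the decoded path is `[0, 0, 0, 1, 1, 1]`, giving the two regions `(0, 2, 0)` and `(3, 5, 1)`. In a full method, each segment would then be passed to the region-level test.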
Affiliation(s)
- Maia H. Malonzo
- Department of Computer Science, Aalto University, 00076 Espoo, Finland
- Harri Lähdesmäki
- Department of Computer Science, Aalto University, 00076 Espoo, Finland

4. Chen X, Luo J, Liu J, Chen T, Sun J, Zhang Y, Xi Q. Exploration of the Effect on Genome-Wide DNA Methylation by miR-143 Knock-Out in Mice Liver. Int J Mol Sci 2021; 22:13075. [PMID: 34884879 PMCID: PMC8658369 DOI: 10.3390/ijms222313075] [Received: 10/27/2021] [Revised: 11/30/2021] [Accepted: 12/01/2021] [Indexed: 12/12/2022]
Abstract
MiR-143 plays an important role in hepatocellular carcinoma and liver fibrosis by inhibiting hepatoma cell proliferation. DNA methyltransferase 3 alpha (DNMT3a), a target of miR-143, regulates the development of primary organic solid tumors through DNA methylation mechanisms. However, the effect of miR-143 on DNA methylation profiles in the liver is unclear. In this study, we used whole-genome bisulfite sequencing (WGBS) to detect differentially methylated regions (DMRs) associated with miR-143 and investigated the DMR-related genes and their enriched pathways. We found that methylated cytosines increased by 0.19% in the livers of miR-143 knock-out (KO) mice fed a high-fat diet (HFD), compared with the wild type (WT). Furthermore, compared with the WT group, the KO group showed lower CG methylation levels in CG islands (CGIs) and promoters, and hypermethylation in CGI shores, 5'UTRs, exons, introns, 3'UTRs and repeat regions. A total of 984 DMRs were identified between the WT and KO groups, consisting of 559 hypermethylated and 425 hypomethylated DMRs. DMR-related genes were enriched in metabolic pathways such as carbon metabolism (serine hydroxymethyltransferase 2 (Shmt2), acyl-Coenzyme A dehydrogenase medium chain (Acadm)), arginine and proline metabolism (spermine synthase (Sms), proline dehydrogenase (Prodh2)) and purine metabolism (phosphoribosyl pyrophosphate synthetase 2 (Prps2)). In summary, we are the first to report changes in whole-genome methylation levels in miR-143-null mouse liver using WGBS. These results provide an experimental basis for the clinical diagnosis and treatment of liver diseases and indicate that miR-143 may be a potential therapeutic target and biomarker for liver damage-associated diseases and hepatocellular carcinoma.
Affiliation(s)
- Yongliang Zhang
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, No. 483 Wushan Road, Guangzhou 510642, China
- Qianyun Xi
- Guangdong Provincial Key Laboratory of Animal Nutrition Control, National Engineering Research Center for Breeding Swine Industry, College of Animal Science, South China Agricultural University, No. 483 Wushan Road, Guangzhou 510642, China

5. Peng X, Li Y, Kong X, Zhu X, Ding X. Investigating Different DNA Methylation Patterns at the Resolution of Methylation Haplotypes. Front Genet 2021; 12:697279. [PMID: 34262601 PMCID: PMC8273290 DOI: 10.3389/fgene.2021.697279] [Received: 04/19/2021] [Accepted: 06/01/2021] [Indexed: 11/15/2022]
Abstract
Differences in DNA methylation patterns across tissues and cell types are considered one of the main reasons for tissue-specific gene expression. In recent years, many methods have been proposed to identify differentially methylated regions (DMRs) based on the mixture of methylation signals from homologous chromosomes. To investigate the possible influence of homologous chromosomes on methylation analysis, this paper proposes a method (MHap) to construct methylation haplotypes for homologous chromosomes in CpG-dense regions. Comparing the methylation consistency between homologous chromosomes in different cell types shows that the majority of paired methylation haplotypes derived from homologous chromosomes are consistent, while lower methylation consistency was observed in the breast cancer sample. In addition, the hypomethylation consistency of differentiated cells is higher than that of the corresponding undifferentiated stem cells. Furthermore, based on the methylation haplotypes constructed on homologous chromosomes, a method (MHap_DMR) is developed to identify DMRs between differentiated cells and the corresponding undifferentiated stem cells, or between the breast cancer sample and the normal breast sample. Comparing the methylation haplotype modes of DMRs in two cell types reveals the directions in which DNA methylation changes on homologous chromosomes during cell differentiation and cancerization. The code is available at: https://github.com/xqpeng/MHap_DMR.
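To make the notion of haplotype consistency concrete, the sketch below represents each methylation haplotype as a string of '1' (methylated) and '0' (unmethylated) calls over the same CpGs. The definitions used here (a pair is consistent if both haplotypes are identical; a pair counts toward hypomethylation consistency if both are identical and fully unmethylated) are illustrative assumptions, not necessarily those of MHap, and the function name is hypothetical.

```python
def haplotype_consistency(pairs):
    """Return (consistency, hypomethylation_consistency) for haplotype pairs.

    pairs: list of (hap_a, hap_b), each a string of '0'/'1' calls for the
           same CpGs on the two homologous chromosomes (assumed encoding).
    """
    if not pairs:
        raise ValueError("need at least one haplotype pair")
    consistent = hypo = 0
    for hap_a, hap_b in pairs:
        if len(hap_a) != len(hap_b):
            raise ValueError("haplotypes in a pair must cover the same CpGs")
        if hap_a == hap_b:
            consistent += 1
            # Identical and fully unmethylated on both homologs.
            if set(hap_a) <= {"0"}:
                hypo += 1
    return consistent / len(pairs), hypo / len(pairs)
```

For example, `haplotype_consistency([("000", "000"), ("110", "110"), ("101", "100"), ("000", "001")])` returns `(0.5, 0.25)`: two of the four pairs match, and one of those is fully unmethylated.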
Affiliation(s)
- Xiaoqing Peng
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Yiming Li
- Center for Medical Genetics & Hunan Key Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, China
- Xiangyan Kong
- School of Computer Science and Engineering, Central South University, Changsha, China
- Xiaoshu Zhu
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China
- Xiaojun Ding
- School of Computer Science and Engineering, Yulin Normal University, Yulin, China

6. Halla-Aho V, Lähdesmäki H. LuxUS: DNA methylation analysis using generalized linear mixed model with spatial correlation. Bioinformatics 2020; 36:4535-4543. [PMID: 32484876 PMCID: PMC7750928 DOI: 10.1093/bioinformatics/btaa539] [Received: 09/20/2019] [Revised: 05/05/2020] [Accepted: 05/27/2020] [Indexed: 11/19/2022]
Abstract
Motivation DNA methylation is an important epigenetic modification with multiple functions, and its connections to diseases have been extensively studied in recent years. It is known that DNA methylation levels of neighboring cytosines are correlated and that differential DNA methylation typically occurs in regions rather than at individual cytosines. Results We have developed LuxUS, a generalized linear mixed model that makes use of the correlation between neighboring cytosines to facilitate analysis of differential methylation. LuxUS implements a likelihood model for bisulfite sequencing data that accounts for experimental variation in the underlying biochemistry. LuxUS can model both binary and continuous covariates, and the mixed model formulation enables the inclusion of replicate and cytosine random effects. Spatial correlation is incorporated into the model through a correlation structure on the cytosine random effects. Simulation experiments show that using the spatial correlation increases the power of statistical tests for differential DNA methylation. Results with a real bisulfite sequencing dataset show that LuxUS is able to detect biologically significant differentially methylated cytosines. Availability and implementation The tool is available at https://github.com/hallav/LuxUS. Supplementary information Supplementary data are available at Bioinformatics online.
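A cytosine random effect with spatial correlation can be sketched as a multivariate normal whose covariance decays with genomic distance, mapped to methylation proportions through a logistic link. The squared-exponential kernel, the parameter names and the single intercept below are assumptions made for illustration; LuxUS's exact correlation structure, priors, covariates and bisulfite likelihood are described in the paper and are not reproduced here.

```python
import math
import random


def spatial_cov(positions, sigma2, length_scale):
    """Covariance of cytosine random effects that decays with genomic distance.

    Assumed kernel: sigma2 * exp(-((x_i - x_j) / length_scale) ** 2).
    """
    return [[sigma2 * math.exp(-((a - b) / length_scale) ** 2) for b in positions]
            for a in positions]


def cholesky(K):
    """Lower-triangular L with L L^T = K, for a symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def simulate_methylation(positions, intercept, sigma2, length_scale, rng, jitter=1e-6):
    """Draw spatially correlated cytosine effects and map them to proportions."""
    K = spatial_cov(positions, sigma2, length_scale)
    for i in range(len(K)):
        K[i][i] += jitter  # small ridge for numerical stability
    L = cholesky(K)
    z = [rng.gauss(0.0, 1.0) for _ in positions]
    effects = [sum(L[i][m] * z[m] for m in range(i + 1)) for i in range(len(positions))]
    # Logistic link from the linear predictor to a proportion in (0, 1).
    return [1.0 / (1.0 + math.exp(-(intercept + u))) for u in effects]
```

For instance, `simulate_methylation([0, 10, 20, 35], 0.5, 1.5, 40.0, random.Random(1))` returns four proportions in which nearby cytosines tend to move together; shrinking `length_scale` weakens that coupling.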
Affiliation(s)
- Viivi Halla-Aho
- Department of Computer Science, Aalto University, FI-00076 Aalto, Finland
- Harri Lähdesmäki
- Department of Computer Science, Aalto University, FI-00076 Aalto, Finland

7. Korthauer K, Chakraborty S, Benjamini Y, Irizarry RA. Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing. Biostatistics 2019; 20:367-383. [PMID: 29481604 PMCID: PMC6587918 DOI: 10.1093/biostatistics/kxy007] [Received: 08/31/2017] [Accepted: 01/21/2018] [Indexed: 12/22/2022]
Abstract
With recent advances in sequencing technology, it is now feasible to measure DNA methylation at tens of millions of sites across the entire genome. In most applications, biologists are interested in detecting differentially methylated regions, composed of multiple sites with differing methylation levels among populations. However, current computational approaches for detecting such regions do not provide accurate statistical inference. A major challenge in reporting uncertainty is that detecting these regions involves a genome-wide scan, which must be accounted for. A further challenge is that sample sizes are limited by the costs of the technology. We have developed a new approach that overcomes these challenges and rigorously assesses uncertainty for differentially methylated regions. Region-level statistics are obtained by fitting a generalized least squares regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions. We develop an inferential approach, based on a pooled null distribution, that can be implemented even when as few as two samples per population are available. Here, we demonstrate the advantages of our method using both experimental data and Monte Carlo simulation. We find that the new method improves the specificity and sensitivity of lists of regions and accurately controls the false discovery rate.
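The idea of a pooled null distribution can be illustrated by pooling region statistics computed under permuted sample labels across all regions, comparing each observed statistic against the whole pool, and then applying Benjamini-Hochberg adjustment. The sketch assumes the permutation statistics have already been computed and uses the common (count + 1)/(m + 1) empirical p-value; it omits the generalized least squares fit and the autoregressive error model, and the function names are hypothetical.

```python
import bisect


def pooled_null_pvalues(observed, null_stats):
    """Two-sided empirical p-values against a pooled null of region statistics.

    observed: region statistics from the real sample labels
    null_stats: statistics from permuted labels, pooled over all regions
    """
    null_abs = sorted(abs(x) for x in null_stats)
    m = len(null_abs)
    pvals = []
    for s in observed:
        # Number of pooled null statistics at least as extreme as |s|.
        count = m - bisect.bisect_left(null_abs, abs(s))
        pvals.append((count + 1) / (m + 1))
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals
```

For example, `benjamini_hochberg([0.01, 0.04, 0.03, 0.5])` gives about `[0.04, 0.0533, 0.0533, 0.5]`. Pooling across regions is what makes small designs workable: with two samples per group there are very few distinct label permutations, but each permutation contributes a statistic for every region.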
Affiliation(s)
- Keegan Korthauer
- Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA
- Sutirtha Chakraborty
- Novartis, Inorbit Mall Rd, Silpa Gram Craft Village, HITEC City, Hyderabad, Telangana, India
- Yuval Benjamini
- The Statistics Department, Hebrew University, Mount Scopus, Jerusalem, Israel
- Rafael A Irizarry
- Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA

8. Srivastava A, Karpievitch YV, Eichten SR, Borevitz JO, Lister R. HOME: a histogram based machine learning approach for effective identification of differentially methylated regions. BMC Bioinformatics 2019; 20:253. [PMID: 31096906 PMCID: PMC6521357 DOI: 10.1186/s12859-019-2845-y] [Received: 10/14/2018] [Accepted: 04/24/2019] [Indexed: 12/23/2022]
Abstract
Background The development of whole genome bisulfite sequencing has made it possible to identify methylation differences at single base resolution throughout an entire genome. However, a persistent challenge in DNA methylome analysis is the accurate identification of differentially methylated regions (DMRs) between samples. Sensitive and specific identification of DMRs among different conditions requires accurate and efficient algorithms, and while various tools have been developed to tackle this problem, they frequently suffer from inaccurate DMR boundary identification and high false positive rate. Results We present a novel Histogram Of MEthylation (HOME) based method that takes into account the inherent difference in the distribution of methylation levels between DMRs and non-DMRs to discriminate between the two using a Support Vector Machine. We show that generated features used by HOME are dataset-independent such that a classifier trained on, for example, a mouse methylome training set of regions of differentially accessible chromatin, can be applied to any other organism’s dataset and identify accurate DMRs. We demonstrate that DMRs identified by HOME exhibit higher association with biologically relevant genes, processes, and regulatory events compared to the existing methods. Moreover, HOME provides additional functionalities lacking in most of the current DMR finders such as DMR identification in non-CG context and time series analysis. HOME is freely available at https://github.com/ListerLab/HOME. Conclusion HOME produces more accurate DMRs than the current state-of-the-art methods on both simulated and biological datasets. The broad applicability of HOME to identify accurate DMRs in genomic data from any organism will have a significant impact upon expanding our knowledge of how DNA methylation dynamics affect cell development and differentiation. 
Electronic supplementary material The online version of this article (10.1186/s12859-019-2845-y) contains supplementary material, which is available to authorized users.
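HOME's central idea is to summarize each candidate region by a histogram of methylation differences and let a classifier separate DMRs from non-DMRs. The sketch below only builds a simple, unweighted feature vector of this kind from per-CpG differences between two groups in fixed, non-overlapping windows; the bin count, window scheme and normalization are assumptions, HOME's own weighting and windowing differ, and the classifier itself (for example, scikit-learn's `SVC`) is left out. The function names are hypothetical.

```python
def methylation_histogram(diffs, n_bins=10):
    """Normalized histogram of |methylation difference| values in [0, 1].

    diffs: per-CpG differences in methylation proportion between two groups
           within one candidate window.
    """
    counts = [0] * n_bins
    for d in diffs:
        a = min(abs(d), 1.0)
        counts[min(int(a * n_bins), n_bins - 1)] += 1  # 1.0 falls in the last bin
    total = len(diffs)
    return [c / total for c in counts] if total else [0.0] * n_bins


def window_features(positions, diffs, window, n_bins=10):
    """(window_start, histogram) pairs for fixed, non-overlapping genomic windows."""
    grouped = {}
    for pos, d in zip(positions, diffs):
        grouped.setdefault(pos // window, []).append(d)
    return [(w * window, methylation_histogram(v, n_bins))
            for w, v in sorted(grouped.items())]
```

For example, `window_features([5, 12, 150], [0.1, 0.9, 0.5], 100, n_bins=2)` returns `[(0, [0.5, 0.5]), (100, [0.0, 1.0])]`. Because each vector describes the shape of the difference distribution rather than raw coordinates, features of this kind do not depend on a particular genome, which is consistent with the transferability the abstract describes.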
Affiliation(s)
- Akanksha Srivastava
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia
- Yuliya V Karpievitch
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia; Harry Perkins Institute of Medical Research, Perth, Australia
- Steven R Eichten
- ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, Australia
- Justin O Borevitz
- ARC Centre of Excellence in Plant Energy Biology, The Australian National University, Canberra, Australia
- Ryan Lister
- ARC Centre of Excellence in Plant Energy Biology, The University of Western Australia, Perth, Australia; Harry Perkins Institute of Medical Research, Perth, Australia

9. Shafi A, Mitrea C, Nguyen T, Draghici S. A survey of the approaches for identifying differential methylation using bisulfite sequencing data. Brief Bioinform 2019; 19:737-753. [PMID: 28334228 DOI: 10.1093/bib/bbx013] [Received: 09/01/2016] [Indexed: 01/03/2023]
Abstract
DNA methylation is an important epigenetic mechanism that plays a crucial role in cellular regulatory systems. Recent advancements in sequencing technologies now enable us to generate high-throughput methylation data and to measure methylation up to single-base resolution. This wealth of data does not come without challenges, and one of the key challenges in DNA methylation studies is to identify the significant differences in the methylation levels of the base pairs across distinct biological conditions. Several computational methods have been developed to identify differential methylation using bisulfite sequencing data; however, there is no clear consensus among existing approaches. A comprehensive survey of these approaches would be of great benefit to potential users and researchers to get a complete picture of the available resources. In this article, we present a detailed survey of 22 such approaches focusing on their underlying statistical models, primary features, key advantages and major limitations. Importantly, the intrinsic drawbacks of the approaches pointed out in this survey could potentially be addressed by future research.
Affiliation(s)
- Adib Shafi
- Department of Computer Science, Wayne State University, USA
- Tin Nguyen
- Department of Computer Science, Wayne State University, USA
- Sorin Draghici
- Department of Computer Science, Wayne State University, USA; Department of Obstetrics and Gynecology, Wayne State University, USA

10. An SM, Kwon S, Hwang JH, Yu GE, Kang DG, Park DH, Kim TW, Park HC, Ha J, Kim CW. Hypomethylation in the promoter region of ZPBP as a potential litter size indicator in Berkshire pigs. Arch Anim Breed 2019; 62:69-76. [PMID: 31807615 PMCID: PMC6852858 DOI: 10.5194/aab-62-69-2019] [Received: 08/02/2018] [Accepted: 03/01/2019] [Indexed: 01/09/2023]
Abstract
In pigs, litter size is typically defined as the total number of piglets born (TNB) or the number of piglets born alive (NBA). Increasing pig litter size is of great economic interest as a means to increase productivity. The capacity of the uterus is a critical component of litter size and may play a central role in prolificacy. In this study, we investigated litter-size-related epigenetic markers in uterine tissue from Berkshire pigs in a smaller litter size group (SLG) and a larger litter size group (LLG) using genome-wide bisulfite sequencing (GWBS). A total of 3269 differentially methylated regions (DMRs) were identified: 1566 were hypermethylated and 1703 hypomethylated in LLG compared with SLG. The promoter region of the zona pellucida binding protein (ZPBP) gene was significantly hypomethylated in LLG, and ZPBP expression was significantly upregulated in uterine tissue. Thus, the methylation status of the ZPBP gene was identified as a potential indicator of litter size. Furthermore, using a porcine methylation-specific restriction enzyme polymerase chain reaction (PMP) assay on whole blood samples from 172 Berkshire sows, we verified its negative correlation with litter size traits (TNB and NBA), supporting its use as a blood-based biomarker. The results suggest that the methylation status of the ZPBP gene can serve as a valuable epigenetic biomarker for hyperprolific sows.
Affiliation(s)
- Sang Mi An
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Seulgi Kwon
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Jung Hye Hwang
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Go Eun Yu
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Deok Gyeong Kang
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Da Hye Park
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Tae Wan Kim
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Jeongim Ha
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea
- Chul Wook Kim
- Swine Science and Technology Center, Gyeongnam National University of Science & Technology, Jinju, 52725, South Korea

11. Guo W, Zhu P, Pellegrini M, Zhang MQ, Wang X, Ni Z. CGmapTools improves the precision of heterozygous SNV calls and supports allele-specific methylation detection and visualization in bisulfite-sequencing data. Bioinformatics 2018; 34:381-387. [PMID: 28968643 DOI: 10.1093/bioinformatics/btx595] [Received: 04/11/2017] [Accepted: 09/15/2017] [Indexed: 12/23/2022]
Abstract
Motivation DNA methylation is important for gene silencing and imprinting in both plants and animals. Recent advances in bisulfite sequencing allow detection of single nucleotide variations (SNVs) with high sensitivity, but accurately identifying heterozygous SNVs from partially C-to-T converted sequences remains challenging. Results We designed two methods, BayesWC and BinomWC, that substantially improved the precision of heterozygous SNV calls from ∼80% to 99% while retaining comparable recall. With these SNV calls, we provided functions for allele-specific DNA methylation (ASM) analysis and for visualizing the methylation status on reads. Applying ASM analysis to a previous dataset, we found that on average 1.5% of investigated regions showed allelic methylation; these regions were significantly enriched in transposable elements and likely to be shared within the same cell type. A dynamic fragment strategy was used for differentially methylated region (DMR) analysis in low-coverage data and was able to find DMRs related to key genes involved in tumorigenesis in a public cancer dataset. Finally, we integrated 40 applications into the software package CGmapTools to analyze DNA methylomes. This package uses CGmap as the format interface, defines binary formats to reduce file size and support fast data retrieval, and can be applied for context-wise, gene-wise, bin-wise, region-wise and sample-wise analyses and visualizations. Availability and implementation The CGmapTools software is freely available at https://cgmaptools.github.io/. Contact guoweilong@cau.edu.cn. Supplementary information Supplementary data are available at Bioinformatics online.
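Once heterozygous SNVs are called, one common way to test a region for allele-specific methylation is a 2 × 2 table of methylated versus unmethylated read counts on each allele, tested with Fisher's exact test. The sketch below is a generic standard-library implementation of that test, included as an assumption for illustration; it is not CGmapTools' code, and it is not necessarily the statistic CGmapTools uses. The function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2 x 2 table of read counts.

    table: [[meth_allele1, unmeth_allele1], [meth_allele2, unmeth_allele2]]
    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # P(top-left cell = x) given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties in probability are counted.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)
```

For example, `fisher_exact_two_sided([[3, 1], [1, 3]])` returns 34/70 (about 0.486), so three methylated reads on one allele against one on the other is far from significant at this depth.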
Affiliation(s)
- Weilong Guo
- State Key Laboratory for Agrobiotechnology, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China
- Ping Zhu
- State Key Laboratory of Experimental Hematology, Institute of Hematology and Blood Disease Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Tianjin 300020, China; BIOPIC, Peking-Tsinghua Center for Life Sciences, College of Life Sciences, Peking University, Beijing 100871, China
- Matteo Pellegrini
- Department of Molecular, Cell and Developmental Biology, University of California, Los Angeles, CA 90095, USA
- Michael Q Zhang
- Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX 75080, USA; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China
- Xiangfeng Wang
- Beijing Advanced Innovation Center for Food Nutrition and Human health, China Agricultural University, Beijing 100193, China
- Zhongfu Ni
- State Key Laboratory for Agrobiotechnology, Key Laboratory of Crop Heterosis and Utilization, Beijing Key Laboratory of Crop Genetic Improvement, China Agricultural University, Beijing 100193, China

12. Kwon S, An SM, Yu GE, Hwang JH, Park DH, Kang DG, Kim TW, Park HC, Ha J, Kim CW. A prognostic method for the litter size in Berkshire pigs based on DNA methylation of IGFBP4 gene. Canadian Journal of Animal Science 2018. [DOI: 10.1139/cjas-2017-0160] [Indexed: 11/22/2022]
Abstract
Litter size is an important trait in the pig industry, and considerable effort has therefore been devoted to improving it. DNA methylation is an essential epigenetic modification present in unique DNA sequences. Alterations in methylation can affect transcription and phenotypic variation. The purpose of this study was to investigate the effect of DNA methylation on litter size. Methylation-specific restriction enzymes are simple and useful tools for detecting DNA methylation status. We used HpaII and MspI, a pair of isoschizomers that share the same recognition site (CCGG) but differ in their sensitivity to CpG methylation. Insulin-like growth factor binding protein 4 (IGFBP4) is a key regulator of ovarian follicular development and fetal growth in eutherian mammals. In this study, we found by bisulfite sequencing that IGFBP4 was hypermethylated in the uterine tissue of a larger litter size group, and validated the positive relationship between the methylation status of IGFBP4 and the total number of piglets born using the porcine methylation-specific restriction enzyme polymerase chain reaction (PMP) assay. We suggest that the IGFBP4 gene can be used as a prognostic biomarker for hyperprolific sows and that the PMP assay is a useful tool for methylation status screening.
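The logic of the HpaII/MspI isoschizomer pair can be sketched as follows. Both enzymes recognize CCGG; HpaII is blocked when the internal CpG cytosine is methylated, whereas MspI still cuts in that case. In a methylation-sensitive restriction PCR, a HpaII-digested template therefore amplifies across a site only if the site was methylated, with MspI serving as a cut-everything control. The sketch considers only CpG methylation on one strand (not methylation of the outer cytosine, which can block MspI), the function names are hypothetical, and it is not the PMP assay protocol.

```python
def ccgg_sites(seq):
    """0-based start positions of every CCGG recognition site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 3) if seq[i:i + 4] == "CCGG"]


def digest_pattern(seq, methylated_c):
    """Predict HpaII and MspI cutting at each CCGG site.

    methylated_c: set of 0-based positions of methylated cytosines
                  (only the internal CpG cytosine of CCGG is considered).
    Returns (site_start, hpaii_cuts, mspi_cuts) tuples.
    """
    pattern = []
    for start in ccgg_sites(seq):
        internal_c = start + 1  # the C of the CpG inside C-CG-G
        hpaii_cuts = internal_c not in methylated_c
        mspi_cuts = True  # MspI tolerates internal CpG methylation
        pattern.append((start, hpaii_cuts, mspi_cuts))
    return pattern


def hpaii_pcr_amplifies(seq, methylated_c):
    """PCR across the fragment after HpaII digestion succeeds only if no site is cut."""
    return not any(hpaii for _, hpaii, _ in digest_pattern(seq, methylated_c))
```

For example, in `"AACCGGTTCCGGAA"` with only position 3 methylated, `digest_pattern` returns `[(2, False, True), (8, True, True)]`: HpaII still cuts the second site, so the amplicon is lost unless position 9 is methylated as well.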
Affiliation(s)
- Seulgi Kwon
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Sang Mi An
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Go Eun Yu
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Jung Hye Hwang
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Da Hye Park
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Deok Gyeong Kang
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Tae Wan Kim
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Hwa Chun Park
- Dasan Pig Breeding Co., Namwon-si 590-831, South Korea
- Jeongim Ha
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
- Chul Wook Kim
- Swine Science and Technology Center, Gyeongnam National University of Science and Technology, Jinju 52725, South Korea
| |
13
Li W, Li Q, Kang S, Same M, Zhou Y, Sun C, Liu CC, Matsuoka L, Sher L, Wong WH, Alber F, Zhou X. CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data. Nucleic Acids Res 2018; 46:e89. [PMID: 29897492 PMCID: PMC6125664 DOI: 10.1093/nar/gky423] [Citation(s) in RCA: 113] [Impact Index Per Article: 18.8] [Received: 04/15/2017] [Revised: 05/01/2018] [Accepted: 05/29/2018] [Indexed: 12/13/2022] Open
Abstract
The detection of tumor-derived cell-free DNA (cfDNA) in plasma is one of the most promising directions in cancer diagnosis. The major challenge in such an approach is identifying the tiny amount of tumor DNA among the total cfDNA in blood. Here we propose an ultrasensitive cancer detection method, termed 'CancerDetector', that uses the DNA methylation profiles of cfDNA. The key to our method is to probabilistically model the joint methylation states of multiple adjacent CpG sites on an individual sequencing read, exploiting the pervasive nature of DNA methylation for signal amplification. As a result, CancerDetector can sensitively identify trace amounts of tumor cfDNA in plasma at the level of individual reads. We evaluated CancerDetector on simulated data and showed high concordance between the predicted and true tumor fractions. Testing CancerDetector on real plasma data demonstrated its high sensitivity and specificity in detecting tumor cfDNA. In addition, the predicted tumor fraction was highly consistent with tumor size and survival outcome. Notably, all of these tests were performed on sequencing data at low to medium coverage (1× to 10×). CancerDetector therefore holds great potential for detecting cancer early and cost-effectively.
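To make the read-level idea concrete, here is a minimal sketch, not CancerDetector's published model. It assumes that per-CpG methylation probabilities for a tumor profile and a normal profile are already known, treats the CpGs on a read as conditionally independent given the read's origin (a simplification of joint modeling), and estimates the tumor fraction as the mixing weight of a two-component mixture fitted by expectation-maximization (EM). The function names and parameters are hypothetical.

```python
import math


def read_log_likelihood(read, probs, eps=1e-6):
    """Log-likelihood of one read under a per-CpG methylation profile.

    read:  0/1 calls (1 = methylated) at consecutive CpGs of one marker region.
    probs: methylation probability at each of those CpGs for one class.
    CpGs are treated as independent given the class, so the joint probability
    of the read's states is a product of Bernoulli terms (a simplification).
    """
    ll = 0.0
    for x, p in zip(read, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


def estimate_tumor_fraction(reads, tumor_probs, normal_probs, n_iter=200):
    """EM estimate of the fraction of reads that come from the tumor profile."""
    lik_t = [math.exp(read_log_likelihood(r, tumor_probs)) for r in reads]
    lik_n = [math.exp(read_log_likelihood(r, normal_probs)) for r in reads]
    theta = 0.5  # initial guess for the tumor fraction
    for _ in range(n_iter):
        # E-step: posterior probability that each read is tumor-derived
        post = [theta * t / (theta * t + (1.0 - theta) * n)
                for t, n in zip(lik_t, lik_n)]
        # M-step: the updated tumor fraction is the mean posterior
        theta = sum(post) / len(post)
    return theta


# Hypothetical toy data: 10 fully methylated and 90 fully unmethylated reads.
tumor = [0.9] * 5   # tumor profile: CpGs mostly methylated
normal = [0.1] * 5  # normal profile: CpGs mostly unmethylated
reads = [[1] * 5] * 10 + [[0] * 5] * 90
print(round(estimate_tumor_fraction(reads, tumor, normal), 3))
```

Because every CpG on a read contributes to one likelihood ratio, a read spanning several informative CpGs can be assigned to the tumor or normal profile with far more confidence than any single site allows; this is the signal-amplification argument in the abstract, shown here under the independence assumption.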
Affiliation(s)
- Wenyuan Li
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Qingjiao Li
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Shuli Kang
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Mary Same
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Yonggang Zhou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Carol Sun
- Oak Park High School, Oak Park, CA 91377, USA
- Chun-Chi Liu
- Institute of Genomics and Bioinformatics, National Chung Hsing University, Taiwan 40227, Republic of China
- Lea Matsuoka
- Division of Hepatobiliary Surgery & Liver Transplantation, Department of Surgery, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Linda Sher
- Department of Surgery, University of Southern California, Keck School of Medicine, Los Angeles, CA 90033, USA
- Wing Hung Wong
- Department of Statistics, Stanford University, Stanford, CA 94305, USA
- Department of Health Research & Policy, Stanford University, Stanford, CA 94305, USA
- Frank Alber
- Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Xianghong Jasmine Zhou
- Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Institute for Quantitative and Computational Biosciences, University of California at Los Angeles, Los Angeles, CA 90095, USA
14
Chen F, Zhang Q, Deng X, Zhang X, Chen C, Lv D, Li Y, Li D, Zhang Y, Li P, Diao Y, Kang L, Owen GI, Chen J, Li Z. Conflicts of CpG density and DNA methylation are proximally and distally involved in gene regulation in human and mouse tissues. Epigenetics 2018; 13:721-741. [PMID: 30009687 DOI: 10.1080/15592294.2018.1500057] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023] Open
Abstract
The relationship between CpG content and DNA methylation has attracted considerable interest in recent years. Direct and indirect methods have been developed to investigate their regulatory functions based on various hypotheses, large cohort studies, and meta-analyses. However, all of these analyses were performed at the level of CpG blocks, so the influence of finer genome structure has been neglected. Herein, we present a novel algorithm with base-pair resolution to systematically investigate the relationship between CpG content and DNA methylation. By introducing the concept of a 'complementary index', we examined the methylomes of 34 adult and 7 embryonic tissues and successfully fitted the relationship between DNA methylation and CpG density to a nonlinear mathematical model. A further algorithm was developed to locate the regions where CpG density does not match expectations from the model, termed 'conflict of gap' (COG) regions. Interestingly, COGs are highly concordant between human and mouse, and their distributions display a tissue-specific pattern. Based on COG methylation patterns, we correctly classified tissues according to their function or origin. We demonstrate that COGs identified by our method can reveal more and deeper information than traditional differentially methylated region (DMR) approaches. We also found that COGs located near the transcription start site (TSS) can determine which promoters are used to initiate gene transcription. Furthermore, COGs located far from the TSS behave like enhancers in terms of histone modification, sequence conservation, transcription factor binding, and DNase I hypersensitivity.
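As a rough illustration of flagging regions that disagree with a fitted density-methylation model, the sketch below works on precomputed per-window CpG densities and methylation levels and reports runs of consecutive windows whose observed methylation departs from the model's expectation. The abstract does not give the model's functional form, so the decreasing sigmoid `expected_methylation` and its parameters `k`, `d0`, and `delta` are hypothetical placeholders; the base-pair-resolution 'complementary index' of the published algorithm is not reproduced.

```python
import math


def expected_methylation(density, k=80.0, d0=0.05):
    """Hypothetical decreasing sigmoid: CpG-poor windows are expected to be
    highly methylated and CpG-dense windows lowly methylated. The functional
    form and the parameters k and d0 are illustrative assumptions."""
    return 1.0 / (1.0 + math.exp(k * (density - d0)))


def conflict_regions(densities, methylation, delta=0.4):
    """Return (start, end) window-index pairs (end exclusive) for runs of
    consecutive windows whose observed methylation differs from the model's
    expectation by more than delta."""
    if len(densities) != len(methylation):
        raise ValueError("densities and methylation must have the same length")
    regions = []
    start = None
    for i, (d, m) in enumerate(zip(densities, methylation)):
        conflict = abs(m - expected_methylation(d)) > delta
        if conflict and start is None:
            start = i  # open a new conflicting run
        elif not conflict and start is not None:
            regions.append((start, i))  # close the current run
            start = None
    if start is not None:
        regions.append((start, len(densities)))  # run reaches the last window
    return regions


# Hypothetical windows: window 1 is CpG-poor but unmethylated,
# window 3 is CpG-dense but methylated; both conflict with the model.
print(conflict_regions([0.01, 0.01, 0.10, 0.10, 0.01],
                       [0.9, 0.1, 0.1, 0.9, 0.85]))
```

Merging adjacent flagged windows into runs is one simple way to turn per-window residuals into regions; a real implementation would also need a principled residual threshold rather than the fixed `delta` assumed here.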
Affiliation(s)
- Fushun Chen
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Qingzheng Zhang
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Xiaodi Deng
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Xia Zhang
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Chengjun Chen
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Dekang Lv
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Yulong Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Dan Li
- The Second Hospital of Dalian Medical University, Dalian, China
- Yu Zhang
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Peiying Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Yunpeng Diao
- Department of Pharmacy, Dalian Medical University, Dalian, China
- Lan Kang
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
- Gareth I Owen
- Faculty of Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- Jun Chen
- The Second Hospital of Dalian Medical University, Dalian, China
- Zhiguang Li
- Center of Genome and Personalized Medicine, Institute of Cancer Stem Cell, Dalian Medical University, Dalian, China
15
Jenkinson G, Abante J, Feinberg AP, Goutsias J. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data. BMC Bioinformatics 2018; 19:87. [PMID: 29514626 PMCID: PMC5842653 DOI: 10.1186/s12859-018-2086-5] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 09/15/2017] [Accepted: 02/22/2018] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation, producing high-resolution genome-wide methylation profiles. Statistical modeling and analysis are employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. RESULTS We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by using a joint probability model that encapsulates all information available in WGBS methylation reads, and it produces accurate results even when applied to single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it uses the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Performance assessment of differential analysis, using simulated and real human lung normal/cancer data, demonstrates a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data.
CONCLUSIONS This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.
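The quantities named in this abstract can be illustrated on a toy region. The sketch below is a minimal sketch, assuming a homogeneous 1D Ising model with the same field `a` and coupling `c` at every CpG (the published model lets parameters vary along the genome) and brute-force enumeration of all 2^N states instead of the efficient computations a genome-scale implementation would need. It derives the distribution of the number of methylated CpGs, a Shannon entropy normalized here by log2(N+1) (the maximum over N+1 levels), and the Jensen-Shannon distance, i.e. the square root of the base-2 Jensen-Shannon divergence. All names and parameter values are hypothetical.

```python
import itertools
import math


def ising_level_distribution(n_cpg, a, c):
    """Distribution of the number of methylated CpGs under a homogeneous 1D
    Ising model. States x_i are in {-1, +1} (+1 = methylated) with
    unnormalized probability exp(a * sum x_i + c * sum x_i * x_{i+1}).
    Enumerates all 2**n_cpg states, so it is only suitable for small regions."""
    weights = [0.0] * (n_cpg + 1)  # index k = number of methylated CpGs
    for state in itertools.product((-1, 1), repeat=n_cpg):
        energy = a * sum(state) + c * sum(s * t for s, t in zip(state, state[1:]))
        k = sum(1 for s in state if s == 1)
        weights[k] += math.exp(energy)
    total = sum(weights)
    return [w / total for w in weights]


def normalized_entropy(p):
    """Shannon entropy of p in bits divided by log2(len(p)), giving [0, 1]."""
    h = -sum(q * math.log2(q) for q in p if q > 0)
    return h / math.log2(len(p))


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence in bits; y > 0 wherever x > 0 since y = m
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative round-off


reference = ising_level_distribution(8, a=0.0, c=0.0)  # independent CpGs
test = ising_level_distribution(8, a=1.0, c=0.5)       # biased and correlated
print(normalized_entropy(reference), normalized_entropy(test))
print(jensen_shannon_distance(reference, test))
```

With `a = 0` and `c = 0` the model reduces to independent fair coins, so the level distribution is Binomial(8, 0.5). Setting `c > 0` with `a = 0` shifts probability toward the fully methylated and fully unmethylated states, a correlation pattern that a marginal, per-CpG model cannot represent.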
Affiliation(s)
- Garrett Jenkinson
- Whitaker Biomedical Engineering Institute, Johns Hopkins University, Baltimore, MD, USA
- Center for Epigenetics, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Jordi Abante
- Whitaker Biomedical Engineering Institute, Johns Hopkins University, Baltimore, MD, USA
- Andrew P. Feinberg
- Center for Epigenetics, Johns Hopkins School of Medicine, Baltimore, MD, USA
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
- Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA
- John Goutsias
- Whitaker Biomedical Engineering Institute, Johns Hopkins University, Baltimore, MD, USA