1
Majumdar K, Silva R, Perry AS, Watson RW, Rau A, Jaffrezic F, Murphy TB, Gormley IC. A novel family of beta mixture models for the differential analysis of DNA methylation data: An application to prostate cancer. PLoS One 2024; 19:e0314014. [PMID: 39661598 PMCID: PMC11633993 DOI: 10.1371/journal.pone.0314014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Accepted: 11/04/2024] [Indexed: 12/13/2024]
Abstract
Identifying differentially methylated cytosine-guanine dinucleotide (CpG) sites between benign and tumour samples can assist in understanding disease. However, differential analysis of bounded DNA methylation data often requires data transformation, reducing biological interpretability. To address this, a family of beta mixture models (BMMs) is proposed that (i) objectively infers methylation state thresholds and (ii) identifies differentially methylated CpG sites (DMCs) given untransformed, beta-valued methylation data. The BMMs achieve this through model-based clustering of CpG sites and by employing parameter constraints, facilitating application to different study settings. Inference proceeds via an expectation-maximisation algorithm, with an approximate maximisation step providing tractability and computational feasibility. Performance of the BMMs is assessed through thorough simulation studies, and the BMMs are used for differential analyses of DNA methylation data from a prostate cancer study. Intuitive and biologically interpretable methylation state thresholds are inferred and DMCs are identified, including those related to genes such as GSTP1, RASSF1 and RARB, known for their role in prostate cancer development. Gene ontology analysis of the DMCs revealed significant enrichment in cancer-related pathways, demonstrating the utility of BMMs to reveal biologically relevant insights. An R package, betaclust, facilitates widespread use of BMMs.
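As a rough illustration of the idea in this abstract (clustering beta-valued methylation levels with a mixture of beta distributions fitted by expectation-maximisation), the following Python sketch may help. It is not the betaclust implementation: the function names are hypothetical, it handles a single sample with no parameter constraints, and the maximisation step is replaced by a weighted method-of-moments update, which is only one possible approximation and an assumption made here for brevity.

```python
import math

EPS = 1e-6  # keeps values strictly inside (0, 1) so the logs are finite


def beta_logpdf(x, a, b):
    """Log density of a Beta(a, b) distribution at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def fit_beta_mixture(values, k=3, n_iter=100):
    """Fit a k-component beta mixture by EM (hypothetical helper).

    Returns (weights, params, labels), where params[j] = (a_j, b_j)
    and labels[i] is the most probable component for values[i].
    """
    xs = [min(max(v, EPS), 1.0 - EPS) for v in values]
    n = len(xs)

    # Initialise with hard assignments by quantile, so component 0
    # starts on the lowest values and component k-1 on the highest.
    order = sorted(range(n), key=lambda i: xs[i])
    resp = [[0.0] * k for _ in range(n)]
    for rank, i in enumerate(order):
        resp[i][rank * k // n] = 1.0

    weights, params = [], []
    for _ in range(n_iter):
        # Approximate M-step: match each component's weighted mean and
        # variance to those of a beta distribution (method of moments).
        weights, params = [], []
        for j in range(k):
            w = max(sum(r[j] for r in resp), 1e-12)
            mean = sum(r[j] * x for r, x in zip(resp, xs)) / w
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, xs)) / w
            var = min(max(var, 1e-6), 0.99 * mean * (1.0 - mean))
            common = mean * (1.0 - mean) / var - 1.0
            params.append((mean * common, (1.0 - mean) * common))
            weights.append(w / n)

        # E-step: posterior probability of each component for each value,
        # computed in log space for numerical stability.
        for i, x in enumerate(xs):
            logs = [math.log(weights[j]) + beta_logpdf(x, *params[j])
                    for j in range(k)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp[i] = [p / total for p in probs]

    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, params, labels
```

With k = 3, the fitted components can be read as hypomethylated, hemimethylated and hypermethylated states, and the points where the most probable label changes along the unit interval act as data-driven thresholds between those states.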
Affiliation(s)
- Koyel Majumdar
  - School of Mathematics and Statistics, University College Dublin, Dublin, Ireland
- Romina Silva
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
- Antoinette Sabrina Perry
  - Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
  - School of Biology and Environmental Science, University College Dublin, Dublin, Ireland
- Ronald William Watson
  - School of Medicine, University College Dublin, Dublin, Ireland
  - Conway Institute of Biomedical and Biomolecular Research, University College Dublin, Dublin, Ireland
- Andrea Rau
  - INRAE, UMR1313 AgroParisTech, GABI, Université Paris-Saclay, Gif-sur-Yvette, France
- Florence Jaffrezic
  - INRAE, UMR1313 AgroParisTech, GABI, Université Paris-Saclay, Gif-sur-Yvette, France
2
Mouse4mC-BGRU: deep learning for predicting DNA N4-methylcytosine sites in mouse genome. Methods 2022; 204:258-262. [DOI: 10.1016/j.ymeth.2022.01.009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/14/2021] [Revised: 01/14/2022] [Accepted: 01/24/2022] [Indexed: 12/12/2022]
3
Damgacioglu H, Celik E, Celik N. Intra-Cluster Distance Minimization in DNA Methylation Analysis Using an Advanced Tabu-Based Iterative k-Medoids Clustering Algorithm (T-CLUST). IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1241-1252. [PMID: 30530337 DOI: 10.1109/tcbb.2018.2886006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 06/09/2023]
Abstract
Recent advances in DNA methylation profiling have paved the way for understanding the underlying epigenetic mechanisms of various diseases such as cancer. While conventional distance-based clustering algorithms (e.g., hierarchical and k-means clustering) have been heavily used in such profiling owing to their speed in conducting high-throughput analysis, these methods commonly converge to suboptimal solutions and/or trivial clusters due to their greedy search nature. Hence, methodologies are needed to improve the quality of clusters formed by these algorithms without sacrificing their speed. In this study, we introduce three related algorithms for a complete high-throughput methylation analysis: a variance-based dimension reduction algorithm to handle high dimensionality in the data, an outlier detection algorithm to identify the outliers of the data, and an advanced Tabu-based iterative k-medoids clustering algorithm (T-CLUST) to reduce the impact of initial solutions on the performance of the conventional k-medoids algorithm. The performance of the proposed algorithms is demonstrated on nine different real DNA methylation datasets obtained from the Gene Expression Omnibus DataSets database. The accuracy of the cluster identification obtained by our proposed algorithms is higher than that of hierarchical and k-means clustering, as well as the conventional methods. The algorithms are implemented in MATLAB, and available at: http://www.coe.miami.edu/simlab/tclust.html.
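For context, the conventional k-medoids baseline that this abstract seeks to improve can be sketched in a few lines of Python. This is not T-CLUST and includes no Tabu search; it is a plain alternating (assign, then re-centre) k-medoids with Euclidean distance, and the function name and the `init` and `seed` parameters are hypothetical choices made for illustration. The points are assumed to be distinct.

```python
import random


def k_medoids(points, k, init=None, seed=0, n_iter=100):
    """Plain alternating k-medoids (hypothetical helper, not T-CLUST).

    points: list of equal-length numeric tuples.
    init:   optional list of k starting medoid indices; if omitted,
            k distinct indices are drawn at random using `seed`.
    Returns (medoids, labels, cost), where medoids are indices into
    points and cost is the total distance of points to their medoid.
    """
    def dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

    rng = random.Random(seed)
    medoids = list(init) if init is not None else rng.sample(range(len(points)), k)

    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)

        # Update step: within each cluster, choose the member that
        # minimises the summed distance to all other members.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # keep the old medoid if its cluster is empty
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(dist(points[c], points[j]) for j in members))
            new_medoids.append(best)

        if new_medoids == medoids:  # converged: medoids no longer change
            break
        medoids = new_medoids

    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]]))
              for p in points]
    cost = sum(dist(p, points[medoids[lab]]) for p, lab in zip(points, labels))
    return medoids, labels, cost
```

Because the result depends on the starting medoids, running the function with different `seed` values and keeping the lowest `cost` is a simple way to see the sensitivity to initial solutions that the abstract describes.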
4
Omony J, Nussbaumer T, Gutzat R. DNA methylation analysis in plants: review of computational tools and future perspectives. Brief Bioinform 2020; 21:906-918. [PMID: 31220217 DOI: 10.1093/bib/bbz039] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/25/2018] [Revised: 02/28/2019] [Accepted: 03/12/2019] [Indexed: 12/12/2022]
Abstract
Genome-wide DNA methylation studies have quickly expanded due to advances in next-generation sequencing techniques along with a wealth of computational tools to analyze the data. Most of our knowledge about DNA methylation profiles, epigenetic heritability and the function of DNA methylation in plants derives from the model species Arabidopsis thaliana. There are increasingly many studies on DNA methylation in plants, uncovering methylation profiles and explaining variations in different plant tissues. Additionally, DNA methylation comparisons of different plant tissue types and dynamics during developmental processes are only slowly emerging but are crucial for understanding developmental and regulatory decisions. Translating this knowledge from plant model species to commercial crops could allow the establishment of new varieties with increased stress resilience and improved yield. In this review, we provide an overview of the most commonly applied bioinformatics tools for the analysis of DNA methylation data (particularly bisulfite sequencing data). The performances of a selection of the tools are analyzed for computational time and agreement in predicted methylated sites for A. thaliana, which has a smaller genome compared to the hexaploid bread wheat. The performance of the tools was benchmarked on five plant genomes. We give examples of applications of DNA methylation data analysis in crops (with a focus on cereals) and an outlook for future developments for DNA methylation status manipulations and data integration.
Affiliation(s)
- Jimmy Omony
  - Plant Genome and Systems Biology, Helmholtz Center Munich-German Research Center for Environmental Health, Neuherberg, Germany
- Thomas Nussbaumer
  - Institute of Network Biology, Department of Environmental Science, Helmholtz Center Munich, Neuherberg, Germany
  - Institute of Environmental Medicine, UNIKA-T, Technical University of Munich and Helmholtz Center Munich, Research Center for Environmental Health, Augsburg, Germany
  - CK CARE Christine Kühne Center for Allergy Research and Education, Davos, Switzerland
- Ruben Gutzat
  - Gregor Mendel Institute of Molecular Plant Biology, Austrian Academy of Sciences, Vienna BioCenter (VBC), Vienna, Austria
5
Li W, Li Q, Kang S, Same M, Zhou Y, Sun C, Liu CC, Matsuoka L, Sher L, Wong WH, Alber F, Zhou X. CancerDetector: ultrasensitive and non-invasive cancer detection at the resolution of individual reads using cell-free DNA methylation sequencing data. Nucleic Acids Res 2018; 46:e89. [PMID: 29897492 PMCID: PMC6125664 DOI: 10.1093/nar/gky423] [Citation(s) in RCA: 117] [Impact Index Per Article: 16.7] [Received: 04/15/2017] [Revised: 05/01/2018] [Accepted: 05/29/2018] [Indexed: 12/13/2022]
Abstract
The detection of tumor-derived cell-free DNA in plasma is one of the most promising directions in cancer diagnosis. The major challenge in such an approach is how to identify the tiny amount of tumor DNA among the total cell-free DNA in blood. Here we propose an ultrasensitive cancer detection method, termed 'CancerDetector', using the DNA methylation profiles of cell-free DNAs. The key to our method is to probabilistically model the joint methylation states of multiple adjacent CpG sites on an individual sequencing read, in order to exploit the pervasive nature of DNA methylation for signal amplification. Therefore, CancerDetector can sensitively identify a trace amount of tumor cfDNAs in plasma, at the level of individual reads. We evaluated CancerDetector on simulated data, and showed a high concordance between the predicted and true tumor fractions. Testing CancerDetector on real plasma data demonstrated its high sensitivity and specificity in detecting tumor cfDNAs. In addition, the predicted tumor fraction showed great consistency with tumor size and survival outcome. Note that all of these tests were performed on sequencing data at low to medium coverage (1× to 10×). Therefore, CancerDetector holds great potential to detect cancer early and cost-effectively.
Affiliation(s)
- Wenyuan Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Qingjiao Li
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Shuli Kang
  - Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Mary Same
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Yonggang Zhou
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
- Carol Sun
  - Oak Park High School, Oak Park, CA 91377, USA
- Chun-Chi Liu
  - Institute of Genomics and Bioinformatics, National Chung Hsing University, Taiwan 40227, Republic of China
- Lea Matsuoka
  - Division of Hepatobiliary Surgery & Liver Transplantation, Department of Surgery, Vanderbilt University Medical Center, Nashville, TN 37232, USA
- Linda Sher
  - Department of Surgery, University of Southern California, Keck School of Medicine, Los Angeles, Los Angeles, CA 90033, USA
- Wing Hung Wong
  - Department of Statistics, Stanford University, Stanford, CA 94305, USA
  - Department of Health Research & Policy, Stanford University, Stanford, CA 94305, USA
- Frank Alber
  - Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
- Xianghong Jasmine Zhou
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA 90095, USA
  - Institute for Quantitative and Computational Biosciences, University of California at Los Angeles, Los Angeles, CA 90095, USA
6
Zhou T, Erber L, Liu B, Gao Y, Ruan HB, Chen Y. Proteomic analysis reveals diverse proline hydroxylation-mediated oxygen-sensing cellular pathways in cancer cells. Oncotarget 2016; 7:79154-79169. [PMID: 27764789 PMCID: PMC5346705 DOI: 10.18632/oncotarget.12632] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 04/27/2016] [Accepted: 09/25/2016] [Indexed: 12/28/2022]
Abstract
Proline hydroxylation is a critical cellular mechanism regulating oxygen-response pathways in tumor initiation and progression. Yet, its substrate diversity and functions remain largely unknown. Here, we report a system-wide analysis to characterize proline hydroxylation substrates in cancer cells using an immunoaffinity-purification assisted proteomics strategy. We identified 562 sites from 272 proteins in HeLa cells. Bioinformatic analysis revealed that proline hydroxylation substrates are significantly enriched with mRNA processing and stress-response cellular pathways with canonical and diverse flanking sequence motifs. Structural analysis indicates a significant enrichment of proline hydroxylation participating in the secondary structure of substrate proteins. Our study identified and validated Brd4, a key transcription factor, as a novel proline hydroxylation substrate. Functional analysis showed that inhibition of the proline hydroxylation pathway significantly reduced the proline hydroxylation abundance on Brd4 and affected Brd4-mediated transcriptional activity as well as cell proliferation in AML cells. Taken together, our study identified a broad regulatory role of proline hydroxylation in cellular oxygen-sensing pathways and revealed potentially new targets that dynamically respond to the hypoxic microenvironment in tumor cells.
Affiliation(s)
- Tong Zhou
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
- Luke Erber
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
- Bing Liu
  - Department of Integrative Biology and Physiology, University of Minnesota Medical School, Minneapolis, MN 55455, USA
- Yankun Gao
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
- Hai-Bin Ruan
  - Department of Integrative Biology and Physiology, University of Minnesota Medical School, Minneapolis, MN 55455, USA
- Yue Chen
  - Department of Biochemistry, Molecular Biology and Biophysics, University of Minnesota at Twin Cities, Minneapolis, MN 55455, USA
7
Gregory KB, Momin AA, Coombes KR, Baladandayuthapani V. Latent Feature Decompositions for Integrative Analysis of Multi-Platform Genomic Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:984-994. [PMID: 26146492 PMCID: PMC4486317 DOI: 10.1109/tcbb.2014.2325035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 06/04/2023]
Abstract
Increased availability of multi-platform genomics data on matched samples has sparked research efforts to discover how diverse molecular features interact both within and between platforms. In addition, simultaneous measurements of genetic and epigenetic characteristics illuminate the roles their complex relationships play in disease progression and outcomes. However, integrative methods for diverse genomics data are faced with the challenges of ultra-high dimensionality and the existence of complex interactions both within and between platforms. We propose a novel modeling framework for integrative analysis based on decompositions of the large number of platform-specific features into a smaller number of latent features. Subsequently we build a predictive model for clinical outcomes accounting for both within- and between-platform interactions based on Bayesian model averaging procedures. Principal components, partial least squares and non-negative matrix factorization as well as sparse counterparts of each are used to define the latent features, and the performance of these decompositions is compared both on real and simulated data. The latent feature interactions are shown to preserve interactions between the original features and not only aid prediction but also allow explicit selection of outcome-related features. The methods are motivated by and applied to a glioblastoma multiforme data set from The Cancer Genome Atlas to predict patient survival times integrating gene expression, microRNA, copy number and methylation data. For the glioblastoma data, we find a high concordance between our selected prognostic genes and genes with known associations with glioblastoma. In addition, our model discovers several relevant cross-platform interactions such as copy number variation associated gene dosing and epigenetic regulation through promoter methylation. 
On simulated data, we show that our proposed method successfully incorporates interactions within and between genomic platforms to aid accurate prediction and variable selection. Our methods perform best when principal components are used to define the latent features.
Affiliation(s)
- Karl B. Gregory
  - PhD candidate in Department of Statistics, Texas A&M University, College Station, TX, 77843-3143, USA
- Amin A. Momin
  - Department of Clinical Cancer Prevention, The University of Texas MD Anderson Cancer Center, Houston, TX, 77230-1402, USA
- Kevin R. Coombes
  - Department of Biomedical Informatics, The Ohio State University Wexner Medical Center, USA
8
Saito Y, Tsuji J, Mituyama T. Bisulfighter: accurate detection of methylated cytosines and differentially methylated regions. Nucleic Acids Res 2014; 42:e45. [PMID: 24423865 PMCID: PMC3973284 DOI: 10.1093/nar/gkt1373] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.0] [Indexed: 11/30/2022]
Abstract
Analysis of bisulfite sequencing data usually requires two tasks: to call methylated cytosines (mCs) in a sample, and to detect differentially methylated regions (DMRs) between paired samples. Although numerous tools have been proposed for mC calling, methods for DMR detection have been largely limited. Here, we present Bisulfighter, a new software package for detecting mCs and DMRs from bisulfite sequencing data. Bisulfighter combines the LAST alignment tool for mC calling, and a novel framework for DMR detection based on hidden Markov models (HMMs). Unlike previous attempts that depend on empirical parameters, Bisulfighter can use the expectation-maximization algorithm for HMMs to adjust parameters for each data set. We conduct extensive experiments in which accuracy of mC calling and DMR detection is evaluated on simulated data with various mC contexts, read qualities, sequencing depths and DMR lengths, as well as on real data from a wide range of biological processes. We demonstrate that Bisulfighter consistently achieves better accuracy than other published tools, providing greater sensitivity for mCs with fewer false positives, more precise estimates of mC levels, more exact locations of DMRs and better agreement of DMRs with gene expression and DNase I hypersensitivity. The source code is available at http://epigenome.cbrc.jp/bisulfighter.
Affiliation(s)
- Yutaka Saito
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan
  - Japan Science and Technology Agency, CREST, 4-1-8 Honcho, Kawaguchi, Saitama 332-0012, Japan
  - Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, 55 Lake Avenue North, Worcester, MA 01655, USA