1
Shokoohi F, Stephens DA, Greenwood CMT. Identifying Differential Methylation in Cancer Epigenetics via a Bayesian Functional Regression Model. Biomolecules 2024; 14:639. [PMID: 38927043 PMCID: PMC11201607 DOI: 10.3390/biom14060639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/01/2024] [Revised: 05/20/2024] [Accepted: 05/20/2024] [Indexed: 06/28/2024]
Abstract
DNA methylation plays an essential role in regulating gene activity, modulating disease risk, and determining treatment response. We can obtain insight into methylation patterns at a single-nucleotide level via next-generation sequencing technologies. However, complex features inherent in the data obtained via these technologies pose challenges beyond the typical big data problems. Identifying differentially methylated cytosines (dmc) or regions is one such challenge. We have developed DMCFB, an efficient dmc identification method based on Bayesian functional regression, to tackle these challenges. Using simulations, we establish that DMCFB outperforms current methods and results in better smoothing and efficient imputation. We analyzed a dataset of patients with acute promyelocytic leukemia and control samples. With DMCFB, we discovered many new dmcs and, more importantly, exhibited enhanced consistency of differential methylation within islands and their adjacent shores. Additionally, we detected differential methylation at more of the binding sites of the fused gene involved in this cancer.
Affiliation(s)
- Farhad Shokoohi
  - Department of Mathematical Sciences, University of Nevada Las Vegas, Las Vegas, NV 89154, USA
- David A. Stephens
  - Department of Mathematics and Statistics, McGill University, Montreal, QC H3A 0B9, Canada
- Celia M. T. Greenwood
  - Lady Davis Institute for Medical Research, Montreal, QC H3T 1E2, Canada
  - Gerald Bronfman Department of Oncology, McGill University, Montreal, QC H4A 3T2, Canada
  - Department of Epidemiology, Biostatistics & Occupational Health, McGill University, Montreal, QC H3A 1G1, Canada
2
Wei S, Tao J, Xu J, Chen X, Wang Z, Zhang N, Zuo L, Jia Z, Chen H, Sun H, Yan Y, Zhang M, Lv H, Kong F, Duan L, Ma Y, Liao M, Xu L, Feng R, Liu G, Project TEWAS, Jiang Y. Ten Years of EWAS. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2021; 8:e2100727. [PMID: 34382344 PMCID: PMC8529436 DOI: 10.1002/advs.202100727] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/22/2021] [Revised: 05/11/2021] [Indexed: 06/13/2023]
Abstract
Epigenome-wide association studies (EWAS) have been applied to analyze DNA methylation variation in complex diseases for a decade, and the epigenome as a research target has gradually become a hot topic of current studies. DNA methylation microarrays and next-generation and third-generation sequencing technologies have provided a high-quality platform for EWAS. Here, the progress of EWAS research and its contributions to clinical applications are reviewed, and the achievements in four typical diseases are described. Finally, the challenges encountered by EWAS are presented, along with bold predictions for its future development.
Affiliation(s)
- Siyu Wei
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - The EWAS Project, Harbin, China
- Junxian Tao
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - The EWAS Project, Harbin, China
- Jing Xu
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - The EWAS Project, Harbin, China
- Xingyu Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Zhaoyang Wang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Nan Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Lijiao Zuo
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Zhe Jia
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Haiyan Chen
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hongmei Sun
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Yubo Yan
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Mingming Zhang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hongchao Lv
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Fanwu Kong
  - The EWAS Project, Harbin, China
  - Department of Nephrology, The Second Affiliated Hospital, Harbin Medical University, Harbin 150001, China
- Lian Duan
  - The EWAS Project, Harbin, China
  - The First Affiliated Hospital of Wenzhou Medical University, Wenzhou 325000, China
- Ye Ma
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - The EWAS Project, Harbin, China
- Mingzhi Liao
  - The EWAS Project, Harbin, China
  - College of Life Sciences, Northwest A&F University, Yangling, Shanxi 712100, China
- Liangde Xu
  - The EWAS Project, Harbin, China
  - School of Biomedical Engineering, Wenzhou Medical University, Wenzhou 325035, China
- Rennan Feng
  - The EWAS Project, Harbin, China
  - Department of Nutrition and Food Hygiene, Public Health College, Harbin Medical University, Harbin 150081, China
- Guiyou Liu
  - The EWAS Project, Harbin, China
  - Beijing Institute for Brain Disorders, Capital Medical University, Beijing 100069, China
- Yongshuai Jiang
  - College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
  - The EWAS Project, Harbin, China
3
Zemplenyi M, Meyer MJ, Cardenas A, Hivert MF, Rifas-Shiman SL, Gibson H, Kloog I, Schwartz J, Oken E, DeMeo DL, Gold DR, Coull BA. Function-on-function regression for the identification of epigenetic regions exhibiting windows of susceptibility to environmental exposures. Ann Appl Stat 2021; 15:1366-1385. [DOI: 10.1214/20-aoas1425] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/19/2022]
Affiliation(s)
- Michele Zemplenyi
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health
- Mark J. Meyer
  - Department of Mathematics and Statistics, Georgetown University
- Andres Cardenas
  - Division of Environmental Health Sciences, University of California, Berkeley
- Heike Gibson
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health
- Itai Kloog
  - Department of Geography and Environmental Development, Ben-Gurion University
- Joel Schwartz
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health
- Emily Oken
  - Department of Population Medicine, Harvard Medical School
- Dawn L. DeMeo
  - Center for Chest Diseases, Brigham and Women’s Hospital
- Diane R. Gold
  - Department of Environmental Health, Harvard T. H. Chan School of Public Health
- Brent A. Coull
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health
4
Denault WRP, Romanowska J, Haaland ØA, Lyle R, Taylor J, Xu Z, Lie RT, Gjessing HK, Jugessur A. Wavelet Screening identifies regions highly enriched for differentially methylated loci for orofacial clefts. NAR Genom Bioinform 2021; 3:lqab035. [PMID: 33987535 PMCID: PMC8092375 DOI: 10.1093/nargab/lqab035] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2020] [Revised: 04/05/2021] [Accepted: 04/16/2021] [Indexed: 12/04/2022]
Abstract
DNA methylation is the most widely studied epigenetic mark in humans and plays an essential role in normal biological processes as well as in disease development. More focus has recently been placed on understanding functional aspects of methylation, prompting the development of methods to investigate the relationship between heterogeneity in methylation patterns and disease risk. However, most of these methods are limited in that they use simplified models that may rely on arbitrarily chosen parameters, they can only detect differentially methylated regions (DMRs) one at a time, or they are computationally intensive. To address these shortcomings, we present a wavelet-based method called 'Wavelet Screening' (WS) that can perform an epigenome-wide association study (EWAS) of thousands of individuals on a single CPU in only a matter of hours. By detecting multiple DMRs located near each other, WS identifies more complex patterns that can differentiate between different methylation profiles. We performed an extensive set of simulations to demonstrate the robustness and high power of WS, before applying it to a previously published EWAS dataset of orofacial clefts (OFCs). WS identified 82 associated regions containing several known genes and loci for OFCs, while other findings are novel and warrant replication in other OFCs cohorts.
Affiliation(s)
- William R P Denault
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, 0473, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, 5006, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, 0473, Oslo, Norway
- Julia Romanowska
  - Department of Global Public Health and Primary Care, University of Bergen, 5006, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, 0473, Oslo, Norway
- Øystein A Haaland
  - Department of Global Public Health and Primary Care, University of Bergen, 5006, Bergen, Norway
- Robert Lyle
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, 0473, Oslo, Norway
  - Department of Medical Genetics, Oslo University Hospital, 0450, Oslo, Norway
- Jack A Taylor
  - Epidemiology Branch and Epigenetics and Stem Cell Biology Laboratory, National Institute of Environmental Health Sciences (NIH/NIEHS), 27709, Durham, North Carolina, USA
- Zongli Xu
  - Epidemiology Branch, National Institute of Environmental Health Sciences (NIH/NIEHS), 27709, Durham, North Carolina, USA
- Rolv T Lie
  - Department of Global Public Health and Primary Care, University of Bergen, 5006, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, 0473, Oslo, Norway
- Håkon K Gjessing
  - Department of Global Public Health and Primary Care, University of Bergen, 5006, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, 0473, Oslo, Norway
- Astanand Jugessur
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, 0473, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, 5006, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, 0473, Oslo, Norway
5
Denault WRP, Romanowska J, Helgeland Ø, Jacobsson B, Gjessing HK, Jugessur A. A fast wavelet-based functional association analysis replicates several susceptibility loci for birth weight in a Norwegian population. BMC Genomics 2021; 22:321. [PMID: 33932983 PMCID: PMC8088671 DOI: 10.1186/s12864-021-07582-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2020] [Accepted: 03/26/2021] [Indexed: 11/28/2022]
Abstract
Background: Birth weight (BW) is one of the most widely studied anthropometric traits in humans because of its role in various adult-onset diseases. The number of loci associated with BW has increased dramatically since the advent of whole-genome screening approaches such as genome-wide association studies (GWASes) and meta-analyses of GWASes (GWAMAs). To further contribute to elucidating the genetic architecture of BW, we analyzed a genotyped Norwegian dataset with information on child’s BW (N=9,063) using a slightly modified version of a wavelet-based method by Shim and Stephens (2015) called WaveQTL. Results: WaveQTL uses wavelet regression for regional testing and offers a more flexible functional modeling framework compared to conventional GWAS methods. To further improve WaveQTL, we added a novel feature termed “zooming strategy” to enhance the detection of associations in typically small regions. The modified WaveQTL replicated five out of the 133 loci previously identified by the largest GWAMA of BW to date by Warrington et al. (2019), even though our sample size was 26 times smaller than that study and 18 times smaller than the second largest GWAMA of BW by Horikoshi et al. (2016). In addition, the modified WaveQTL performed better in regions of high LD between SNPs. Conclusions: This study is the first adaptation of the original WaveQTL method to the analysis of genome-wide genotypic data. Our results highlight the utility of the modified WaveQTL as a complementary tool for identifying loci that might escape detection by conventional genome-wide screening methods due to power issues. An attractive application of the modified WaveQTL would be to select traits from various public GWAS repositories to investigate whether they might benefit from a second analysis. Supplementary Information: The online version contains supplementary material available at (10.1186/s12864-021-07582-6).
Affiliation(s)
- William R P Denault
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, Oslo, Norway
- Julia Romanowska
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, Oslo, Norway
- Øyvind Helgeland
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
  - KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway
- Bo Jacobsson
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Obstetrics and Gynecology, Institute of Clinical Sciences, Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden
- Håkon K Gjessing
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, Oslo, Norway
- Astanand Jugessur
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
  - Centre for Fertility and Health (CeFH), Norwegian Institute of Public Health, Oslo, Norway
6
Denault WRP, Jugessur A. Detecting differentially methylated regions using a fast wavelet-based approach to functional association analysis. BMC Bioinformatics 2021; 22:61. [PMID: 33568045 PMCID: PMC7876806 DOI: 10.1186/s12859-021-03979-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/29/2020] [Accepted: 01/27/2021] [Indexed: 11/10/2022]
Abstract
Background: We present here a computational shortcut to improve a powerful wavelet-based method by Shim and Stephens (Ann Appl Stat 9(2):665–686, 2015. 10.1214/14-AOAS776) called WaveQTL that was originally designed to identify DNase I hypersensitivity quantitative trait loci (dsQTL). Results: WaveQTL relies on permutations to evaluate the significance of an association. We applied a recent method by Zhou and Guan (J Am Stat Assoc 113(523):1362–1371, 2017. 10.1080/01621459.2017.1328361) to boost computational speed, which involves calculating the distribution of Bayes factors and estimating the significance of an association by simulations rather than permutations. We called this simulation-based approach “fast functional wavelet” (FFW), and tested it on a publicly available DNA methylation (DNAm) dataset on colorectal cancer. The simulations confirmed a substantial gain in computational speed compared to the permutation-based approach in WaveQTL. Furthermore, we show that FFW controls the type I error satisfactorily and has good power for detecting differentially methylated regions. Conclusions: Our approach has broad utility and can be applied to detect associations between different types of functions and phenotypes. As more and more DNAm datasets are being made available through public repositories, an attractive application of FFW would be to re-analyze these data and identify associations that might have been missed by previous efforts. The full R package for FFW is freely available at GitHub https://github.com/william-denault/ffw.
Affiliation(s)
- William R P Denault
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
- Astanand Jugessur
  - Department of Genetics and Bioinformatics, Norwegian Institute of Public Health, Oslo, Norway
  - Centre for Fertility and Health, Norwegian Institute of Public Health, Oslo, Norway
  - Department of Global Public Health and Primary Care, University of Bergen, Bergen, Norway
7
The progress on the estimation of DNA methylation level and the detection of abnormal methylation. QUANTITATIVE BIOLOGY 2021. [DOI: 10.15302/j-qb-022-0289] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/16/2022]
8
Cai J, Xu Y, Zhang W, Ding S, Sun Y, Lyu J, Duan M, Liu S, Huang L, Zhou F. A comprehensive comparison of residue-level methylation levels with the regression-based gene-level methylation estimations by ReGear. Brief Bioinform 2020; 22:5921981. [PMID: 33048108 DOI: 10.1093/bib/bbaa253] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/11/2020] [Revised: 08/10/2020] [Accepted: 09/08/2020] [Indexed: 02/07/2023]
Abstract
MOTIVATION: DNA methylation is a biological process that affects gene function without changing the underlying DNA sequence. The DNA methylation machinery usually attaches methyl groups to specific cytosine residues, which modifies the chromatin architecture. Such modifications in promoter regions can inactivate some tumor-suppressor genes. DNA methylation within the coding region may significantly reduce transcription elongation efficiency. Gene function may thus be tuned through the methylation of some cytosines. METHODS: This study hypothesizes that the overall methylation level across a gene may have a better association with sample labels, such as diseases, than the methylation of individual cytosines. The gene methylation level is formulated as a regression model using the methylation levels of all the cytosines within the gene. A comprehensive evaluation of various feature selection algorithms and classification algorithms is carried out between the gene-level and residue-level methylation levels. RESULTS: A comprehensive evaluation was conducted to compare the gene and cytosine methylation levels for their associations with the sample labels and their classification performance. Unsupervised clustering was also improved using the gene methylation levels. Some genes demonstrated statistically significant associations with the class label even when no residue-level methylation features had statistically significant associations with the class label. In summary, the trained gene methylation levels improved various methylome-based machine learning models. Both the methodological development of regression algorithms and the experimental validation of gene-level methylation biomarkers are worthy of further investigation in future studies. The source code, example data files and manual are available at http://www.healthinformaticslab.org/supp/.
9
Noh H, Choi T, Park J, Chung Y. Bayesian latent factor regression for multivariate functional data with variable selection. J Korean Stat Soc 2020. [DOI: 10.1007/s42952-019-00044-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/29/2022]
10
Korthauer K, Chakraborty S, Benjamini Y, Irizarry RA. Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing. Biostatistics 2019; 20:367-383. [PMID: 29481604 PMCID: PMC6587918 DOI: 10.1093/biostatistics/kxy007] [Citation(s) in RCA: 81] [Impact Index Per Article: 16.2] [Received: 08/31/2017] [Accepted: 01/21/2018] [Indexed: 12/22/2022]
Abstract
With recent advances in sequencing technology, it is now feasible to measure DNA methylation at tens of millions of sites across the entire genome. In most applications, biologists are interested in detecting differentially methylated regions, composed of multiple sites with differing methylation levels among populations. However, current computational approaches for detecting such regions do not provide accurate statistical inference. A major challenge in reporting uncertainty is that a genome-wide scan is involved in detecting these regions, which needs to be accounted for. A further challenge is that sample sizes are limited due to the costs associated with the technology. We have developed a new approach that overcomes these challenges and assesses uncertainty for differentially methylated regions in a rigorous manner. Region-level statistics are obtained by fitting a generalized least squares regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions. We develop an inferential approach, based on a pooled null distribution, that can be implemented even when as few as two samples per population are available. Here, we demonstrate the advantages of our method using both experimental data and Monte Carlo simulation. We find that the new method improves the specificity and sensitivity of lists of regions and accurately controls the false discovery rate.
Affiliation(s)
- Keegan Korthauer
  - Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA
- Sutirtha Chakraborty
  - Novartis, Inorbit Mall Rd, Silpa Gram Craft Village, HITEC City, Hyderabad, Telangana, India
- Yuval Benjamini
  - The Statistics Department, Hebrew University, Mount Scopus, Jerusalem, Israel
- Rafael A Irizarry
  - Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA
11
Shen L, Zhu J, Robert Li SY, Fan X. Detect differentially methylated regions using non-homogeneous hidden Markov model for methylation array data. Bioinformatics 2018; 33:3701-3708. [PMID: 29036320 DOI: 10.1093/bioinformatics/btx467] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 02/26/2017] [Accepted: 07/18/2017] [Indexed: 12/12/2022]
Abstract
Motivation: DNA methylation is an important epigenetic mechanism in gene regulation and the detection of differentially methylated regions (DMRs) is enthralling for many disease studies. There are several aspects that we can improve over existing DMR detection methods: (i) methylation statuses of nearby CpG sites are highly correlated, but this fact has seldom been modelled rigorously due to the uneven spacing; (ii) it is practically important to be able to handle both paired and unpaired samples; and (iii) the capability to detect DMRs from a single pair of samples is demanded. Results: We present DMRMark (DMR detection based on non-homogeneous hidden Markov model), a novel Bayesian framework for detecting DMRs from methylation array data. It combines the constrained Gaussian mixture model that incorporates the biological knowledge with the non-homogeneous hidden Markov model that models spatial correlation. Unlike existing methods, our DMR detection is achieved without predefined boundaries or decision windows. Furthermore, our method can detect DMRs from a single pair of samples and can also incorporate unpaired samples. Both simulation studies and real datasets from The Cancer Genome Atlas showed the significant improvement of DMRMark over other methods. Availability and implementation: DMRMark is freely available as an R package at the CRAN R package repository. Contact: xfan@cuhk.edu.hk. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Linghao Shen
  - Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong
- Jun Zhu
  - Department of Genetics and Genomic Sciences, Mount Sinai School of Medicine, New York University, New York, NY, USA
- Shuo-Yen Robert Li
  - University of Electronic Science and Technology of China, Sichuan, China
- Xiaodan Fan
  - Department of Statistics, The Chinese University of Hong Kong, Hong Kong
12
Zhu H, Caspers P, Morris JS, Wu X, Müller R. A Unified Analysis of Structured Sonar-terrain Data using Bayesian Functional Mixed Models. Technometrics 2018; 60:112-123. [PMID: 29749977 DOI: 10.1080/00401706.2016.1274681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/19/2022]
Abstract
Sonar emits pulses of sound and uses the reflected echoes to gain information about target objects. It offers a low cost, complementary sensing modality for small robotic platforms. While existing analytical approaches often assume independence across echoes, real sonar data can have more complicated structures due to device setup or experimental design. In this paper, we consider sonar echo data collected from multiple terrain substrates with a dual-channel sonar head. Our goals are to identify the differential sonar responses to terrains and study the effectiveness of this dual-channel design in discriminating targets. We describe a unified analytical framework that achieves these goals rigorously, simultaneously, and automatically. The analysis was done by treating the echo envelope signals as functional responses and the terrain/channel information as covariates in a functional regression setting. We adopt functional mixed models that facilitate the estimation of terrain and channel effects while capturing the complex hierarchical structure in data. This unified analytical framework incorporates both Gaussian models and robust models. We fit the models using a full Bayesian approach, which enables us to perform multiple inferential tasks under the same modeling framework, including selecting models, estimating the effects of interest, identifying significant local regions, discriminating terrain types, and describing the discriminatory power of local regions. Our analysis of the sonar-terrain data identifies time regions that reflect differential sonar responses to terrains. The discriminant analysis suggests that a multi- or dual-channel design achieves target identification performance comparable with or better than a single-channel design.
Affiliation(s)
- Hongxiao Zhu
  - Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Philip Caspers
  - Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
- Jeffrey S Morris
  - The University of Texas M.D. Anderson Cancer Center, Houston, TX 77230
- Xiaowei Wu
  - Department of Statistics, Virginia Tech, Blacksburg, VA 24061
- Rolf Müller
  - Department of Mechanical Engineering, Virginia Tech, Blacksburg, VA 24061
13
Tran H, Zhu H, Wu X, Kim G, Clarke CR, Larose H, Haak DC, Askew SD, Barney JN, Westwood JH, Zhang L. Identification of Differentially Methylated Sites with Weak Methylation Effects. Genes (Basel) 2018; 9:E75. [PMID: 29419727 PMCID: PMC5852571 DOI: 10.3390/genes9020075] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/02/2017] [Revised: 01/17/2018] [Accepted: 01/25/2018] [Indexed: 12/28/2022]
Abstract
Deoxyribonucleic acid (DNA) methylation is an epigenetic alteration crucial for regulating stress responses. Identifying large-scale DNA methylation at single nucleotide resolution is made possible by whole genome bisulfite sequencing. An essential task following the generation of bisulfite sequencing data is to detect differentially methylated cytosines (DMCs) among treatments. Most statistical methods for DMC detection do not consider the dependency of methylation patterns across the genome, thus possibly inflating type I error. Furthermore, small sample sizes and weak methylation effects among different phenotype categories make it difficult for these statistical methods to accurately detect DMCs. To address these issues, the wavelet-based functional mixed model (WFMM) was introduced to detect DMCs. To further examine the performance of WFMM in detecting weak differential methylation events, we used both simulated and empirical data and compared WFMM performance to that of a popular DMC detection tool, methylKit. Analyses of simulated data that replicated the effects of the herbicide glyphosate on DNA methylation in Arabidopsis thaliana show that WFMM results in higher sensitivity and specificity in detecting DMCs compared to methylKit, especially when the methylation differences among phenotype groups are small. Moreover, the performance of WFMM is robust with respect to small sample sizes, making it particularly attractive considering the current high costs of bisulfite sequencing. Analyses of empirical Arabidopsis thaliana data under varying glyphosate dosages and of monozygotic (MZ) twins who have different pain sensitivities (both datasets have weak methylation effects of <1%) show that WFMM can identify more relevant DMCs related to the phenotype of interest than methylKit. Differentially methylated regions (DMRs) are genomic regions with different DNA methylation status across biological samples.
DMRs and DMCs are essentially the same concept, with the only difference being how methylation information across the genome is summarized. If methylation levels are determined by grouping neighboring cytosine sites, then they are DMRs; if methylation levels are calculated based on single cytosines, they are DMCs.
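The distinction between the two summaries can be made concrete with a small sketch. This is only an illustration of site-level versus region-level summarization under simple assumptions, not the WFMM or methylKit procedure: the function names, the `max_gap` grouping rule, and the toy data layout (position mapped to a list of per-sample methylation proportions) are all hypothetical.

```python
from statistics import mean


def site_differences(group_a, group_b):
    """Per-cytosine difference in mean methylation (a DMC-style summary).

    Both arguments map a genomic position to a list of per-sample
    methylation proportions; this layout is an assumption for the sketch.
    """
    return {pos: mean(group_a[pos]) - mean(group_b[pos]) for pos in sorted(group_a)}


def group_sites(positions, max_gap):
    """Group positions into regions; a gap larger than max_gap starts a new region."""
    regions = []
    for pos in sorted(positions):
        if regions and pos - regions[-1][-1] <= max_gap:
            regions[-1].append(pos)
        else:
            regions.append([pos])
    return regions


def region_differences(group_a, group_b, max_gap):
    """Average the per-site differences within each region (a DMR-style summary)."""
    diffs = site_differences(group_a, group_b)
    return [
        ((region[0], region[-1]), mean([diffs[p] for p in region]))
        for region in group_sites(diffs, max_gap)
    ]


if __name__ == "__main__":
    # Toy data: three cytosines, two samples per group.
    a = {100: [0.8, 0.9], 120: [0.7, 0.9], 900: [0.2, 0.2]}
    b = {100: [0.4, 0.5], 120: [0.5, 0.5], 900: [0.2, 0.2]}
    print(site_differences(a, b))
    print(region_differences(a, b, max_gap=50))
```

The same per-site values feed both summaries; only the grouping step differs, which is the point the abstract makes. A real analysis would also need a statistical test and some handling of read coverage, neither of which is shown here.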
Affiliation(s)
- Hong Tran
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.
- Hongxiao Zhu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA.
- Xiaowei Wu
- Department of Statistics, Virginia Tech, Blacksburg, VA 24061, USA.
- Gunjune Kim
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
- Christopher R Clarke
- Genetic Improvement of Fruits and Vegetables Laboratory, United States Department of Agriculture, Agricultural Research Service, Beltsville, MD 20705, USA.
- Hailey Larose
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
- David C Haak
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
- Shawn D Askew
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
- Jacob N Barney
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
- James H Westwood
- Department of Plant Pathology, Physiology and Weed Science, Virginia Tech, Blacksburg, VA 24061, USA.
- Liqing Zhang
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA.
|
14
|
Morris JS, Baladandayuthapani V. Statistical Contributions to Bioinformatics: Design, Modeling, Structure Learning, and Integration. STAT MODEL 2017; 17:245-289. [PMID: 29129969 PMCID: PMC5679480 DOI: 10.1177/1471082x17698255] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/24/2022]
Abstract
The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of Bioinformatics has emerged to deal with these challenges, and is comprised of many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modeling, and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the key statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work based on the statistical principles elucidated in this article and utilizing all available information to uncover new biological insights.
Affiliation(s)
- Jeffrey S Morris
- Department of Biostatistics, The University of Texas M.D. Anderson Cancer Center, Houston, Texas, USA
|
15
|
Abstract
In this article, Greven and Scheipl describe an impressively general framework for performing functional regression that builds upon the generalized additive modeling framework. Over the past number of years, my collaborators and I have also been developing a general framework for functional regression, functional mixed models, which shares many similarities with this framework, but has many differences as well. In this discussion, I compare and contrast these two frameworks to illuminate the characteristics of each, highlighting their respective strengths and weaknesses and providing recommendations regarding the settings in which each approach might be preferable.
Affiliation(s)
- Jeffrey S Morris
- The University of Texas, MD Anderson Cancer Center, Unit 1411, PO Box 301402, Houston, TX 77230-1402
|
16
|
Chen DP, Lin YC, Fann CSJ. Methods for identifying differentially methylated regions for sequence- and array-based data. Brief Funct Genomics 2016; 15:485-490. [PMID: 27323952 DOI: 10.1093/bfgp/elw018] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 01/02/2023] Open
Abstract
DNA methylation is one of the most important epigenetic mechanisms, and participates in the pathogenic processes of many diseases. Differentially methylated regions (DMRs) in the genome have been reported and implicated in a number of different diseases, tissues and cell types, and are associated with gene expression levels. Therefore, identification of DMRs is one of the most critical and fundamental issues in dissecting disease etiologies. Based on bisulfite conversion, advances in sequence- and array-based technologies have helped investigators study genome-wide DNA methylation. Many methods have been developed to detect DMRs, and they have revolutionized our understanding of DNA methylation and provided new insights into its role in diverse biological functions. We comprehensively discuss various methods for detecting DMRs, organized by data and region type, together with their utility and limitations. We recommend applying several of the methods to the same data and region type to detect DMRs, because they could be complementary to one another.
|