1
Zhao K, Oualkacha K, Zeng Y, Shen C, Klein K, Lakhal-Chaieb L, Labbe A, Pastinen T, Hudson M, Colmegna I, Bernatsky S, Greenwood CMT. Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data. Stat Med 2024; 43:3899-3920. [PMID: 38932470 DOI: 10.1002/sim.10149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 04/13/2024] [Accepted: 06/07/2024] [Indexed: 06/28/2024]
Abstract
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
Affiliation(s)
- Kaiqiong Zhao
  - Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, Quebec, Canada
- Yixiao Zeng
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Cathy Shen
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Kathleen Klein
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Lajmi Lakhal-Chaieb
  - Département de Mathématiques et de Statistique, Université Laval, Quebec, Quebec, Canada
- Aurélie Labbe
  - Département de Sciences de la Décision, HEC Montréal, Montreal, Quebec, Canada
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy, Independence, Missouri, USA
- Marie Hudson
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
  - Department of Medicine, McGill University, Montreal, Quebec, Canada
- Inés Colmegna
  - Department of Medicine, McGill University, Montreal, Quebec, Canada
  - The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Sasha Bernatsky
  - Department of Medicine, McGill University, Montreal, Quebec, Canada
  - The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Celia M T Greenwood
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
  - Department of Human Genetics and Gerald Bronfman Department of Oncology, McGill University, Montreal, Quebec, Canada
2
Górczak K, Burzykowski T, Claesen J. A varying-coefficient model for the analysis of methylation sequencing data. Comput Biol Chem 2024; 111:108094. [PMID: 38781748 DOI: 10.1016/j.compbiolchem.2024.108094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2024] [Revised: 05/06/2024] [Accepted: 05/08/2024] [Indexed: 05/25/2024]
Abstract
DNA methylation is an important epigenetic modification involved in gene regulation. Advances in next-generation sequencing technology have enabled the retrieval of DNA methylation information at single-base resolution. However, due to the sequencing process and the limited amount of isolated DNA, DNA methylation data are often noisy and sparse, which complicates the identification of differentially methylated regions (DMRs), especially when few replicates are available. We present a varying-coefficient model for detecting DMRs by using single-base-resolved methylation information. The model simultaneously smooths the methylation profiles and allows detection of DMRs, while accounting for additional covariates. The proposed model takes into account possible overdispersion by using a beta-binomial distribution. The overdispersion itself can be modeled as a function of the genomic region and explanatory variables. We illustrate the properties of the proposed model by applying it to two real-life case studies.
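To make the overdispersion point in this abstract concrete, the following minimal sketch compares the variance of a methylated-read count under a plain binomial model with its variance under a beta-binomial model. The mean/intra-class-correlation parameterization (`p`, `rho`) and the function names are assumptions chosen for illustration; the paper's own parameterization, and its modeling of overdispersion as a function of genomic region and covariates, are not reproduced here.

```python
def binomial_variance(n: int, p: float) -> float:
    """Variance of the methylated-read count under a binomial model."""
    return n * p * (1.0 - p)


def beta_binomial_variance(n: int, p: float, rho: float) -> float:
    """Variance under a beta-binomial model with mean proportion p.

    rho is the intra-class correlation (assumed parameterization), with
    rho = 1 / (alpha + beta + 1) for Beta(alpha, beta) mixing.
    rho = 0 recovers the binomial variance.
    """
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)


# Hypothetical CpG site: read depth 20, methylation proportion 0.3.
depth, meth, rho = 20, 0.3, 0.1
var_bin = binomial_variance(depth, meth)
var_bb = beta_binomial_variance(depth, meth, rho)
print(f"binomial variance:      {var_bin:.2f}")
print(f"beta-binomial variance: {var_bb:.2f}")
print(f"inflation factor:       {var_bb / var_bin:.2f}")
```

The inflation factor `1 + (n - 1) * rho` grows with read depth, which is one reason ignoring overdispersion can be most misleading at well-covered sites.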
Affiliation(s)
- Katarzyna Górczak
  - Data Science Institute, Hasselt University, Belgium; Open Analytics NV, Antwerp, Belgium
- Tomasz Burzykowski
  - Data Science Institute, Hasselt University, Belgium; Department of Biostatistics and Medical Informatics, Medical University of Bialystok, Poland; International Drug Development Institute (IDDI), Belgium
- Jürgen Claesen
  - Data Science Institute, Hasselt University, Belgium; Department of Epidemiology and Data Science, Amsterdam UMC, VU Amsterdam, The Netherlands
3
Yu JCY, Zeng Y, Zhao K, Lu T, Oros Klein K, Colmegna I, Lora M, Bhatnagar SR, Leask A, Greenwood CMT, Hudson M. Novel insights into systemic sclerosis using a sensitive computational method to analyze whole-genome bisulfite sequencing data. Clin Epigenetics 2023; 15:96. [PMID: 37270501 DOI: 10.1186/s13148-023-01513-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/25/2022] [Accepted: 05/28/2023] [Indexed: 06/05/2023] Open
Abstract
BACKGROUND: Abnormal DNA methylation is thought to contribute to the onset and progression of systemic sclerosis. Currently, the most comprehensive assay for profiling DNA methylation is whole-genome bisulfite sequencing (WGBS), but its precision depends on read depth and it may be subject to sequencing errors. SOMNiBUS, a method for regional analysis, attempts to overcome some of these limitations. Using SOMNiBUS, we re-analyzed WGBS data previously analyzed using bumphunter, an approach that initially fits single CpG associations, to contrast DNA methylation estimates by both methods.
METHODS: Purified CD4+ T lymphocytes of 9 SSc and 4 control females were sequenced using WGBS. We separated the resulting sequencing data into regions with dense CpG data, and differentially methylated regions (DMRs) were inferred with the SOMNiBUS region-level test, adjusted for age. Pathway enrichment analysis was performed with Ingenuity Pathway Analysis (IPA). We compared the results obtained by SOMNiBUS and bumphunter.
RESULTS: Of 8268 CpG regions of ≥ 60 CpGs eligible for analysis with SOMNiBUS, we identified 131 DMRs and 125 differentially methylated genes (DMGs; p-values less than Bonferroni-corrected threshold of 6.05 × 10⁻⁶ controlling family-wise error rate at 0.05; 1.6% of the regions). In comparison, bumphunter identified 821,929 CpG regions, 599 DMRs (of which none had ≥ 60 CpGs) and 340 DMGs (q-value of 0.05; 0.04% of all regions). The top ranked gene identified by SOMNiBUS was FLT4, a lymphangiogenic orchestrator, and the top ranked gene on chromosome X was CHST7, known to catalyze the sulfation of glycosaminoglycans in the extracellular matrix. The top networks identified by IPA included connective tissue disorders.
CONCLUSIONS: SOMNiBUS is a complementary method of analyzing WGBS data that enhances biological insights into SSc and provides novel avenues of investigation into its pathogenesis.
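The Bonferroni threshold and the percentage of regions quoted in this abstract can be checked with simple arithmetic. The sketch below uses only the figures stated in the abstract (8268 eligible regions, a family-wise error rate of 0.05, 131 DMRs); the variable names are illustrative and not taken from the paper.

```python
# Figures taken from the abstract above.
n_regions = 8268   # CpG regions eligible for SOMNiBUS analysis
fwer = 0.05        # target family-wise error rate
n_dmrs = 131       # regions declared differentially methylated

# Bonferroni correction: divide the error rate by the number of tests.
threshold = fwer / n_regions
fraction_dmr = n_dmrs / n_regions

print(f"Bonferroni threshold: {threshold:.2e}")      # 6.05e-06
print(f"DMRs as share of regions: {fraction_dmr:.1%}")  # 1.6%
```

Both values agree with the abstract: a per-region threshold of about 6.05 × 10⁻⁶, and 131 of 8268 regions amounting to roughly 1.6%.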
Affiliation(s)
- Jeffrey C Y Yu
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
- Yixiao Zeng
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
- Kaiqiong Zhao
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
- Tianyuan Lu
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
- Kathleen Oros Klein
  - Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Côte Sainte Catherine, Montreal, H3T 1E2, Canada
- Inés Colmegna
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
  - Research Institute of the McGill University Health Center, Montreal, Canada
- Maximilien Lora
  - Research Institute of the McGill University Health Center, Montreal, Canada
- Celia M T Greenwood
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Côte Sainte Catherine, Montreal, H3T 1E2, Canada
- Marie Hudson
  - McGill University, 845 Sherbrooke St W, Montreal, H3A 0G4, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, 3755 Côte Sainte Catherine, Montreal, H3T 1E2, Canada
4
Zeng Y, Zhao K, Oros Klein K, Shao X, Fritzler MJ, Hudson M, Colmegna I, Pastinen T, Bernatsky S, Greenwood CMT. Thousands of CpGs Show DNA Methylation Differences in ACPA-Positive Individuals. Genes (Basel) 2021; 12:1349. [PMID: 34573331 PMCID: PMC8472734 DOI: 10.3390/genes12091349] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/24/2021] [Revised: 08/25/2021] [Accepted: 08/26/2021] [Indexed: 11/27/2022] Open
Abstract
High levels of anti-citrullinated protein antibodies (ACPA) are often observed prior to a diagnosis of rheumatoid arthritis (RA). We undertook a replication study to confirm CpG sites showing evidence of differential methylation in subjects positive vs. negative for ACPA, in a new subset of 112 individuals sampled from the population cohort and biobank CARTaGENE in Quebec, Canada. Targeted custom capture bisulfite sequencing was conducted at approximately 5.3 million CpGs located in regulatory or hypomethylated regions from whole blood; library and protocol improvements had been instituted between the original and this replication study, enabling better coverage and additional identification of differentially methylated regions (DMRs). Using binomial regression models, we identified 19,472 ACPA-associated differentially methylated cytosines (DMCs), of which 430 overlapped with the 1909 DMCs reported by the original study; 814 DMRs of relevance were clustered by grouping adjacent DMCs into regions. Furthermore, we performed an additional integrative analysis by looking at the DMRs that overlap with RA related loci published in the GWAS Catalog, and protein-coding genes associated with these DMRs were enriched in the biological process of cell adhesion and involved in immune-related pathways.
Affiliation(s)
- Yixiao Zeng
  - PhD Program in Quantitative Life Sciences, Interfaculty Studies, McGill University, Montréal, QC H3A 1E3, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
- Kaiqiong Zhao
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC H3A 1A2, Canada
- Kathleen Oros Klein
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
- Xiaojian Shao
  - Digital Technologies Research Centre, National Research Council Canada, Ottawa, ON K1A 0R6, Canada
- Marvin J. Fritzler
  - Cumming School of Medicine, University of Calgary, Calgary, AB T2N 1N4, Canada
- Marie Hudson
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
  - Department of Medicine, McGill University, Montréal, QC H4A 3J1, Canada
  - Division of Rheumatology, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
- Inés Colmegna
  - Department of Medicine, McGill University, Montréal, QC H4A 3J1, Canada
  - Division of Rheumatology, McGill University, Montréal, QC H3G 1A4, Canada
- Tomi Pastinen
  - Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
  - Center for Pediatric Genomic Medicine, Children’s Mercy, Kansas City, MO 64108, USA
- Sasha Bernatsky
  - Department of Medicine, McGill University, Montréal, QC H4A 3J1, Canada
  - Division of Rheumatology, McGill University, Montréal, QC H3G 1A4, Canada
- Celia M. T. Greenwood
  - PhD Program in Quantitative Life Sciences, Interfaculty Studies, McGill University, Montréal, QC H3A 1E3, Canada
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montréal, QC H3T 1E2, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montréal, QC H3A 1A2, Canada
  - Department of Human Genetics, McGill University, Montréal, QC H3A 0C7, Canada
  - Gerald Bronfman Department of Oncology, McGill University, Montréal, QC H4A 3T2, Canada
5
Zhao K, Oualkacha K, Lakhal-Chaieb L, Labbe A, Klein K, Ciampi A, Hudson M, Colmegna I, Pastinen T, Zhang T, Daley D, Greenwood CMT. A novel statistical method for modeling covariate effects in bisulfite sequencing derived measures of DNA methylation. Biometrics 2020; 77:424-438. [PMID: 32438470 PMCID: PMC8359306 DOI: 10.1111/biom.13307] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2019] [Revised: 02/28/2020] [Accepted: 05/08/2020] [Indexed: 01/24/2023]
Abstract
Identifying disease-associated changes in DNA methylation can help us gain a better understanding of disease etiology. Bisulfite sequencing allows the generation of high-throughput DNA methylation profiles at single-base resolution. However, optimally modeling and analyzing these sparse and discrete sequencing data is still very challenging due to variable read depth, missing data patterns, long-range correlations, data errors, and confounding from cell type mixtures. We propose a regression-based hierarchical model that allows covariate effects to vary smoothly along genomic positions, and we have built a specialized EM algorithm, which explicitly allows for experimental errors and cell type mixtures, to make inference about smooth covariate effects in the model. Simulations show that the proposed method provides accurate estimates of covariate effects and captures the major underlying methylation patterns with excellent power. We also apply our method to analyze data from rheumatoid arthritis patients and controls. The method has been implemented in the R package SOMNiBUS.
Affiliation(s)
- Kaiqiong Zhao
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
  - Lady Davis Institute for Medical Research, Montreal, QC, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
- Lajmi Lakhal-Chaieb
  - Département de Mathématiques et de Statistique, Université Laval, Quebec City, QC, Canada
- Aurélie Labbe
  - Département des Sciences de la Décision, HEC Montréal, Montreal, QC, Canada
- Kathleen Klein
  - Lady Davis Institute for Medical Research, Montreal, QC, Canada
- Antonio Ciampi
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
  - Lady Davis Institute for Medical Research, Montreal, QC, Canada
- Marie Hudson
  - Lady Davis Institute for Medical Research, Montreal, QC, Canada
  - Department of Medicine, McGill University, Montreal, QC, Canada
- Inés Colmegna
  - Department of Medicine, McGill University, Montreal, QC, Canada
  - The Research Institute of the McGill University Health Centre, Montreal, QC, Canada
- Tomi Pastinen
  - Center for Pediatric Genomic Medicine, Children's Mercy Kansas City, Kansas City, MO, USA
- Tieyuan Zhang
  - Department of Psychiatry, Douglas Mental Health University Institute, McGill University, Montreal, QC, Canada
- Denise Daley
  - The Centre for Heart Lung Innovation, and Department of Medicine, University of British Columbia, Vancouver, BC, Canada
- Celia M T Greenwood
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
  - Lady Davis Institute for Medical Research, Montreal, QC, Canada
  - Department of Human Genetics and Gerald Bronfman Department of Oncology, McGill University, Montreal, QC, Canada