1
Zhao K, Oualkacha K, Zeng Y, Shen C, Klein K, Lakhal-Chaieb L, Labbe A, Pastinen T, Hudson M, Colmegna I, Bernatsky S, Greenwood CMT. Addressing dispersion in mis-measured multivariate binomial outcomes: A novel statistical approach for detecting differentially methylated regions in bisulfite sequencing data. Stat Med 2024; 43:3899-3920. [PMID: 38932470 DOI: 10.1002/sim.10149] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 04/13/2024] [Accepted: 06/07/2024] [Indexed: 06/28/2024]
Abstract
Motivated by a DNA methylation application, this article addresses the problem of fitting and inferring a multivariate binomial regression model for outcomes that are contaminated by errors and exhibit extra-parametric variations, also known as dispersion. While dispersion in univariate binomial regression has been extensively studied, addressing dispersion in the context of multivariate outcomes remains a complex and relatively unexplored task. The complexity arises from a noteworthy data characteristic observed in our motivating dataset: non-constant yet correlated dispersion across outcomes. To address this challenge and account for possible measurement error, we propose a novel hierarchical quasi-binomial varying coefficient mixed model, which enables flexible dispersion patterns through a combination of additive and multiplicative dispersion components. To maximize the Laplace-approximated quasi-likelihood of our model, we further develop a specialized two-stage expectation-maximization (EM) algorithm, where a plug-in estimate for the multiplicative scale parameter enhances the speed and stability of the EM iterations. Simulations demonstrated that our approach yields accurate inference for smooth covariate effects and exhibits excellent power in detecting non-zero effects. Additionally, we applied our proposed method to investigate the association between DNA methylation, measured across the genome through targeted custom capture sequencing of whole blood, and levels of anti-citrullinated protein antibodies (ACPA), a preclinical marker for rheumatoid arthritis (RA) risk. Our analysis revealed 23 significant genes that potentially contribute to ACPA-related differential methylation, highlighting the relevance of cell signaling and collagen metabolism in RA. We implemented our method in the R Bioconductor package called "SOMNiBUS."
Affiliation(s)
- Kaiqiong Zhao
  - Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada
- Karim Oualkacha
  - Département de Mathématiques, Université du Québec à Montréal, Montreal, Quebec, Canada
- Yixiao Zeng
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Cathy Shen
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Kathleen Klein
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
- Lajmi Lakhal-Chaieb
  - Département de Mathématiques et de Statistique, Université Laval, Quebec, Quebec, Canada
- Aurélie Labbe
  - Département de Sciences de la Décision, HEC Montréal, Montreal, Quebec, Canada
- Tomi Pastinen
  - Genomic Medicine Center, Children's Mercy, Independence, Missouri, USA
- Marie Hudson
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
  - Department of Medicine, McGill University, Montreal, Quebec, Canada
- Inés Colmegna
  - Department of Medicine, McGill University, Montreal, Quebec, Canada
  - The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Sasha Bernatsky
  - Department of Medicine, McGill University, Montreal, Quebec, Canada
  - The Research Institute of the McGill University Health Centre, Montreal, Quebec, Canada
- Celia M T Greenwood
  - Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Quebec, Canada
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada
  - Department of Human Genetics and Gerald Bronfman Department of Oncology, McGill University, Montreal, Quebec, Canada
2
Vazzana KM, Musolf AM, Bailey-Wilson JE, Hiraki LT, Silverman ED, Scott C, Dalgard CL, Hasni S, Deng Z, Kaplan MJ, Lewandowski LB. Transmission disequilibrium analysis of whole genome data in childhood-onset systemic lupus erythematosus. Genes Immun 2023; 24:200-206. [PMID: 37488248 PMCID: PMC10529982 DOI: 10.1038/s41435-023-00214-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2023] [Revised: 06/23/2023] [Accepted: 07/06/2023] [Indexed: 07/26/2023]
Abstract
Childhood-onset systemic lupus erythematosus (cSLE) patients are unique, with hallmarks of Mendelian disorders (early-onset and severe disease) and thus are an ideal population for genetic investigation of SLE. In this study, we use the transmission disequilibrium test (TDT), a family-based genetic association analysis that employs robust methodology, to analyze whole genome sequencing data. We aim to identify novel genetic associations in an ancestrally diverse, international cSLE cohort. Forty-two cSLE patients and 84 unaffected parents from 3 countries underwent whole genome sequencing. First, we performed single nucleotide variant (SNV)-based TDT (common variants) using PLINK 1.9, and gene-based analyses (rare variants) using the Efficient and Parallelizable Association Container Toolbox (EPACTS) and rare variant TDT (rvTDT), which applies multiple gene-based burden tests adapted for TDT, including the burden of rare variants test. Applying the standard GWAS threshold (5.0 × 10⁻⁸) to common variants, our SNV-based analysis did not return any genome-wide significant SNVs. The rare variant gene-based TDT analysis identified many novel genes significantly enriched in cSLE patients, including HNRNPUL2, a DNA repair protein, and DNAH11, a ciliary movement protein, among others. Our approach identifies several novel SLE susceptibility genes in an ancestrally diverse childhood-onset lupus cohort.
Affiliation(s)
- Kathleen M Vazzana
  - Lupus Genomics and Global Health Disparities Unit, Systemic Autoimmunity Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
  - Arnold Palmer Hospital for Children, Orlando, FL, USA
- Anthony M Musolf
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, 22124, USA
- Joan E Bailey-Wilson
  - Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Baltimore, MD, 22124, USA
- Linda T Hiraki
  - Division of Rheumatology, Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Earl D Silverman
  - Division of Rheumatology, Department of Pediatrics, The Hospital for Sick Children, University of Toronto, Toronto, ON, Canada
- Christiaan Scott
  - Paediatric Rheumatology, Red Cross War Memorial Children's Hospital and University of Cape Town, Cape Town, South Africa
- Clifton L Dalgard
  - The American Genome Center, Department of Anatomy, Physiology & Genetics, Uniformed Services University, Bethesda, MD, USA
- Sarfaraz Hasni
  - Clinical Program, Systemic Autoimmunity Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Zuoming Deng
  - Biodata Mining and Discovery Section, Office of Science and Technology, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Mariana J Kaplan
  - Systemic Autoimmunity Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
- Laura B Lewandowski
  - Lupus Genomics and Global Health Disparities Unit, Systemic Autoimmunity Branch, National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, MD, USA
3
Battineni G, Hossain MA, Chintalapudi N, Amenta F. A Survey on the Role of Artificial Intelligence in Biobanking Studies: A Systematic Review. Diagnostics (Basel) 2022; 12:1179. [PMID: 35626333 PMCID: PMC9140088 DOI: 10.3390/diagnostics12051179] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/05/2022] [Revised: 05/02/2022] [Accepted: 05/06/2022] [Indexed: 02/04/2023]
Abstract
Introduction: In biobanks, participants' biological samples are stored for future research. The application of artificial intelligence (AI) involves the analysis of data and the prediction of pathological outcomes. In AI, models are used to diagnose diseases as well as to classify and predict disease risks. Our research systematically analyzed AI's role in the development of biobanks in the healthcare industry. Methods: The literature search was conducted using three digital reference databases, namely PubMed, CINAHL, and Web of Science (WoS). The systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) 2020 guidelines. The search terms included "biobanks", "AI", "machine learning", and "deep learning", as well as combinations such as "biobanks with AI", "deep learning in the biobanking field", and "recent advances in biobanking". Only English-language papers were included in the study, and the Newcastle-Ottawa scale (NOS) was used to assess the quality of the selected works. Only studies in the good-quality range (NOS ≥ 7) were considered for further review. Results: A literature analysis of the above entries resulted in 239 studies. Based on their relevance to the study's goal, research characteristics, and NOS criteria, we included 18 articles for review. In the last decade, biobanks and artificial intelligence have had a relatively large impact on the medical system. Interestingly, UK biobanks account for the highest percentage of high-quality works, followed by Qatar, South Korea, Singapore, Japan, and Denmark. Conclusions: Translational bioinformatics probably represents a future leader in precision medicine. AI and machine learning applications to biobanking research may contribute to the development of biobanks for the utility of health services and citizens.
Affiliation(s)
- Gopi Battineni
  - Clinical Research Centre, School of Medicinal and Health Products Sciences, University of Camerino, 62032 Camerino, Italy (M.A.H., N.C., F.A.)