101
Sun X, Bulekova K, Yang J, Lai M, Pitsillides AN, Liu X, Zhang Y, Guo X, Yong Q, Raffield LM, Rotter JI, Rich SS, Abecasis G, Carson AP, Vasan RS, Bis JC, Psaty BM, Boerwinkle E, Fitzpatrick AL, Satizabal CL, Arking DE, Ding J, Levy D, Liu C. Association analysis of mitochondrial DNA heteroplasmic variants: methods and application. medRxiv 2024:2024.01.12.24301233. [PMID: 38260412 PMCID: PMC10802757 DOI: 10.1101/2024.01.12.24301233] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
We rigorously assessed a comprehensive association testing framework for heteroplasmy, using both simulated and real-world data. This framework applied a variant allele fraction (VAF) threshold and multiple gene-based tests for robust identification and association testing of heteroplasmy. Our simulation studies demonstrated that gene-based tests maintained an appropriate type I error rate at α=0.001. Notably, when 5% or more heteroplasmic variants within a target region were linked to an outcome, burden-extension tests (including the adaptive burden test, variable threshold burden test, and z-score weighting burden test) outperformed the sequence kernel association test (SKAT) and the original burden test. Applying this framework, we conducted association analyses on whole-blood derived heteroplasmy in 17,507 individuals of African and European ancestries (31% of African ancestry, mean age of 62, with 58% women) with whole genome sequencing data. We performed both cohort- and ancestry-specific association analyses, followed by meta-analysis on both pooled samples and within each ancestry group. Our results suggest that mtDNA-encoded genes/regions are likely to exhibit varying rates in somatic aging, with notably strong associations observed between heteroplasmy in the RNR1 and RNR2 genes (p<0.001) and advanced age by the original burden test. In contrast, SKAT identified significant associations (p<0.001) between diabetes and the aggregated effects of heteroplasmy in several protein-coding genes. Further research is warranted to validate these findings. In summary, our proposed statistical framework represents a valuable tool for facilitating association testing of heteroplasmy with disease traits in large human populations.
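The contrast between burden-type tests and SKAT drawn in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `score_statistics`, the intercept-only null model, the equal weights, and the toy data are all assumptions made for illustration, and the statistics are left unnormalized (no p-values are computed). The sketch only shows that a burden statistic sums weighted per-variant scores before squaring, so opposite-direction effects cancel, whereas a SKAT-style statistic squares each weighted score before summing.

```python
def score_statistics(genotypes, phenotype, weights):
    """Return unnormalized (burden, skat) statistics for one variant set.

    genotypes: one row per individual, one column per variant
               (allele counts or heteroplasmy indicators).
    phenotype: one outcome value per individual.
    weights:   one weight per variant (hypothetical; equal weights below).
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    # Residuals under an intercept-only null model (assumption: no covariates).
    residuals = [y - mean_y for y in phenotype]
    # Per-variant score S_j = sum_i G_ij * (y_i - mean_y).
    scores = [
        sum(genotypes[i][j] * residuals[i] for i in range(n))
        for j in range(len(weights))
    ]
    # Burden: collapse the variants first, then square.
    burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    # SKAT-style: square each weighted score, then sum.
    skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    return burden, skat


# Toy data: four individuals, two variants, each carried by one individual.
G = [[1, 0], [0, 1], [0, 0], [0, 0]]

# Both carriers have a raised outcome: the effects point the same way.
print(score_statistics(G, [2, 2, 0, 0], [1, 1]))  # (4.0, 2.0)

# One carrier raised, one lowered: the burden statistic cancels to zero.
print(score_statistics(G, [2, 0, 1, 1], [1, 1]))  # (0.0, 2.0)
```

In practice both statistics are compared against their null distributions, which this sketch omits.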
Affiliation(s)
- Xianbang Sun
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Katia Bulekova
  - Research Computing Services, Boston University, Boston, MA 02215, USA
- Jian Yang
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Meng Lai
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Xue Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Yuankai Zhang
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
- Xiuqing Guo
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Qian Yong
  - Longitudinal Studies Section, Translational Gerontology Branch, NIA/NIH, Baltimore, MD 21224, USA
- Laura M. Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Jerome I. Rotter
  - The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Stephen S. Rich
  - Department of Public Health Services, Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Goncalo Abecasis
  - TOPMed Informatics Research Center, University of Michigan, Ann Arbor, MI 48109, USA
- April P. Carson
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Ramachandran S. Vasan
  - Sections of Preventive Medicine and Epidemiology, and Cardiovascular Medicine, Boston University School of Medicine, Boston, MA, 02118, USA
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
- Joshua C. Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA
- Bruce M. Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98101, USA
  - Departments of Epidemiology, and Health Services, University of Washington, Seattle, WA 98101, USA
- Eric Boerwinkle
  - Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Annette L. Fitzpatrick
  - Departments of Family Medicine, Epidemiology, and Global Health, University of Washington, Seattle, WA 98195, USA
- Claudia L. Satizabal
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
  - Glenn Biggs Institute for Alzheimer’s and Neurodegenerative Diseases, University of Texas Health Science Center at San Antonio, San Antonio, TX 78229, USA
- Dan E. Arking
  - McKusick-Nathans Institute, Department of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, MD 21205, USA
- Jun Ding
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27514, USA
- Daniel Levy
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA
  - Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Chunyu Liu
  - Department of Biostatistics, School of Public Health, Boston University, Boston, MA 02118, USA
  - Framingham Heart Study, NHLBI/NIH, Framingham, MA 01702, USA

102
Shrestha S, Wiener HW, Kajimoto H, Srinivasasainagendra V, Ledee D, Chowdhury S, Cui J, Chen JY, Beckley MA, Padilla LA, Dahdah N, Tiwari HK, Portman MA. Pharmacogenomics of intravenous immunoglobulin response in Kawasaki disease. Front Immunol 2024; 14:1287094. [PMID: 38259468 PMCID: PMC10800400 DOI: 10.3389/fimmu.2023.1287094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2023] [Accepted: 12/12/2023] [Indexed: 01/24/2024] Open
Abstract
Introduction: Kawasaki disease (KD) is a diffuse vasculitis in children. Response to high-dose intravenous gamma globulin (IVIG), the primary treatment, varies according to genetic background. We sought to identify genetic loci associated with treatment response using whole genome sequencing (WGS). Method: We performed WGS in 472 KD patients, comprising 305 IVIG responders and 167 non-responders defined by AHA clinical criteria. We fitted logistic regression models to test additive genetic effects in the entire cohort and in four subgroups defined by ancestry information markers (Whites, African Americans, Asians, and Hispanics). We performed functional mapping and annotation using FUMA to examine genetic variants that are potentially involved in IVIG non-response. Further, we conducted the SNP-set [Sequence] Kernel Association Test (SKAT) for all rare and common variants. Results: Of the 43,288,336 SNPs identified (23,660,970 in intergenic regions, 16,764,594 in introns and 556,814 in exons), the top ten hits associated with IVIG non-response were in the FANK1, MAP2K3:KCNJ12, CA10, FRG1DP, and CWH43 regions. When analyzed separately in ancestry-based racial subgroups, SNPs in several novel genes were associated. A total of 23 possible causal genes were pinpointed by positional and chromatin mapping. SKAT analysis demonstrated association in the entire MANIA2, EDN1, SFMBT2, and PPP2R5E genes and segments of the CSMD2, LINC01317, HIVEPI, HSP90AB1, and TTLL11 genes. Conclusions: This WGS study identified multiple predominantly novel, understudied genes associated with IVIG response. These data can inform understanding of KD pathogenesis and lay the groundwork for developing treatment response predictors.
Affiliation(s)
- Sadeep Shrestha
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Howard W. Wiener
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Hidemi Kajimoto
  - Division of Cardiology, Seattle Children’s and University of Washington Department of Pediatrics, Seattle, WA, United States
- Vinodh Srinivasasainagendra
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Dolena Ledee
  - Division of Cardiology, Seattle Children’s and University of Washington Department of Pediatrics, Seattle, WA, United States
- Sabrina Chowdhury
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Jinhong Cui
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Jake Y. Chen
  - Informatics Institute, School of Medicine, University of Alabama at Birmingham, Birmingham, AL, United States
- Mikayla A Beckley
  - Division of Cardiology, Seattle Children’s and University of Washington Department of Pediatrics, Seattle, WA, United States
- Luz A. Padilla
  - Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Nagib Dahdah
  - CHU Ste-Justine, Universite de Montreal, Montreal, QC, Canada
- Hemant K. Tiwari
  - Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, United States
- Michael A. Portman
  - Division of Cardiology, Seattle Children’s and University of Washington Department of Pediatrics, Seattle, WA, United States

103
Cao C, Shao M, Zuo C, Kwok D, Liu L, Ge Y, Zhang Z, Cui F, Chen M, Fan R, Ding Y, Jiang H, Wang G, Zou Q. RAVAR: a curated repository for rare variant-trait associations. Nucleic Acids Res 2024; 52:D990-D997. [PMID: 37831073 PMCID: PMC10767942 DOI: 10.1093/nar/gkad876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2023] [Revised: 09/20/2023] [Accepted: 09/28/2023] [Indexed: 10/14/2023] Open
Abstract
Rare variants contribute significantly to the genetic causes of complex traits, as they can have much larger effects than common variants and account for much of the missing heritability in genome-wide association studies. The emergence of UK Biobank scale datasets and accurate gene-level rare variant-trait association testing methods have dramatically increased the number of rare variant associations that have been detected. However, no systematic collection of these associations has been carried out to date, especially at the gene level. To address the issue, we present the Rare Variant Association Repository (RAVAR), a comprehensive collection of rare variant associations. RAVAR includes 95 047 high-quality rare variant associations (76 186 gene-level and 18 861 variant-level associations) for 4429 reported traits which are manually curated from 245 publications. RAVAR is the first resource to collect and curate published rare variant associations in an interactive web interface with integrated visualization, search, and download features. Detailed gene and SNP information are provided for each association, and users can conveniently search for related studies by exploring the EFO tree structure and interactive Manhattan plots. RAVAR could vastly improve the accessibility of rare variant studies. RAVAR is freely available for all users without login requirement at http://www.ravar.bio.
Affiliation(s)
- Chen Cao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Mengting Shao
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Chunman Zuo
  - Institute of Artificial Intelligence, Donghua University, Shanghai, China
- Devin Kwok
  - School of Computer Science, McGill University, Montreal, Canada
- Lin Liu
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Yuli Ge
  - Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Nanjing, China
- Zilong Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Feifei Cui
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Mingshuai Chen
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Rui Fan
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Hangjin Jiang
  - Center for Data Science, Zhejiang University, Hangzhou, China
- Guishen Wang
  - College of Computer Science and Engineering, Changchun University of Technology, Changchun, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China

104
He J, Antonyan L, Zhu H, Ardila K, Li Q, Enoma D, Zhang W, Liu A, Chekouo T, Cao B, MacDonald ME, Arnold PD, Long Q. A statistical method for image-mediated association studies discovers genes and pathways associated with four brain disorders. Am J Hum Genet 2024; 111:48-69. [PMID: 38118447 PMCID: PMC10806749 DOI: 10.1016/j.ajhg.2023.11.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2023] [Revised: 11/04/2023] [Accepted: 11/16/2023] [Indexed: 12/22/2023] Open
Abstract
Brain imaging and genomics are critical tools enabling characterization of the genetic basis of brain disorders. However, imaging large cohorts is expensive and may be unavailable for legacy datasets used for genome-wide association studies (GWASs). Using an integrated feature selection/aggregation model, we developed an image-mediated association study (IMAS), which utilizes borrowed imaging/genomics data to conduct association mapping in legacy GWAS cohorts. By leveraging the UK Biobank image-derived phenotypes (IDPs), the IMAS discovered genetic bases underlying four neuropsychiatric disorders and verified them by analyzing annotations, pathways, and expression quantitative trait loci (eQTLs). A cerebellar-mediated mechanism was identified to be common to the four disorders. Simulations show that, if the goal is identifying genetic risk, our IMAS is more powerful than a hypothetical protocol in which the imaging results were available in the GWAS dataset. This implies the feasibility of reanalyzing legacy GWAS datasets without conducting additional imaging, yielding cost savings for integrated analysis of genetics and imaging.
Affiliation(s)
- Jingni He
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Lilit Antonyan
  - Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Harold Zhu
  - Department of Biological Sciences, Faculty of Science, University of Calgary, Calgary, AB, Canada
- Karen Ardila
  - Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada
- Qing Li
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- David Enoma
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Andy Liu
  - Sir Winston Churchill High School, Calgary, AB, Canada; College of Letters and Science, University of California, Los Angeles, Los Angeles, CA, USA
- Thierry Chekouo
  - Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, AB, Canada; Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Bo Cao
  - Department of Psychiatry, Faculty of Medicine & Dentistry, University of Alberta, Edmonton, AB, Canada
- M Ethan MacDonald
  - The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Biomedical Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Electrical and Software Engineering, Schulich School of Engineering, University of Calgary, Calgary, AB, Canada; Department of Radiology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Paul D Arnold
  - Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Psychiatry, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada
- Quan Long
  - Department of Biochemistry and Molecular Biology, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Medical Genetics, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; The Mathison Centre for Mental Health Research & Education, Hotchkiss Brain Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Alberta Children's Hospital Research Institute, Cumming School of Medicine, University of Calgary, Calgary, AB, Canada; Department of Mathematics and Statistics, Faculty of Science, University of Calgary, Calgary, AB, Canada

105
Cruciani F, Aparo A, Brusini L, Combi C, Storti SF, Giugno R, Menegaz G, Boscolo Galazzo I. Identifying the joint signature of brain atrophy and gene variant scores in Alzheimer's Disease. J Biomed Inform 2024; 149:104569. [PMID: 38104851 DOI: 10.1016/j.jbi.2023.104569] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 02/22/2023] [Revised: 11/20/2023] [Accepted: 12/07/2023] [Indexed: 12/19/2023]
Abstract
The joint modeling of genetic data and brain imaging information allows for determining the pathophysiological pathways of neurodegenerative diseases such as Alzheimer's disease (AD). This task has typically been approached using mass-univariate methods that rely on a complete set of Single Nucleotide Polymorphisms (SNPs) to assess their association with selected image-derived phenotypes (IDPs). However, such methods are prone to multiple comparisons bias and, most importantly, fail to account for potential cross-feature interactions, resulting in insufficient detection of significant associations. Ways to overcome these limitations while reducing the number of traits aim at conveying genetic information at the gene level and capturing the integrated genetic effects of a set of genetic variants, rather than looking at each SNP individually. Their associations with brain IDPs are still largely unexplored in the current literature, though they can uncover new potential genetic determinants for brain modulations in the AD continuum. In this work, we explored an explainable multivariate model to analyze the genetic basis of the grey matter modulations, relying on the AD Neuroimaging Initiative (ADNI) phase 3 dataset. Cortical thicknesses and subcortical volumes derived from T1-weighted Magnetic Resonance were considered to describe the imaging phenotypes. At the same time, the genetic counterpart was represented by gene variant scores extracted by the Sequence Kernel Association Test (SKAT) filtering model. Moreover, transcriptomic analysis was carried out to assess the expression of the resulting genes in the main brain structures as a form of validation. Results highlighted meaningful genotype-phenotype interactions as defined by three latent components showing a significant difference in the projection scores between patients and controls. Among the significant associations, the model highlighted EPHX1 and BCAS1 gene variant scores involved in neurodegenerative and myelination processes, hence relevant for AD. In particular, the first was associated with decreased subcortical volumes and the second with decreased temporal lobe thickness. Notably, BCAS1 is particularly expressed in the dentate gyrus. Overall, the proposed approach allowed capturing genotype-phenotype interactions in a restricted study cohort that was confirmed by transcriptomic analysis, offering insights into the underlying mechanisms of neurodegeneration in AD in line with previous findings and suggesting new potential disease biomarkers.
Affiliation(s)
- Federica Cruciani
  - Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy
- Antonino Aparo
  - Department of Computer Science, University of Verona, Verona, Italy
- Lorenza Brusini
  - Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy
- Carlo Combi
  - Department of Computer Science, University of Verona, Verona, Italy
- Silvia F Storti
  - Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy
- Rosalba Giugno
  - Department of Computer Science, University of Verona, Verona, Italy
- Gloria Menegaz
  - Department of Engineering for Innovation Medicine, University of Verona, Verona, Italy

106
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. bioRxiv 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA

107
Sun R, Shi A, Lin X. Differences in set-based tests for sparse alternatives when testing sets of outcomes compared to sets of explanatory factors in genetic association studies. Biostatistics 2023; 25:171-187. [PMID: 36000269 PMCID: PMC10724113 DOI: 10.1093/biostatistics/kxac036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2022] [Revised: 07/15/2022] [Accepted: 08/07/2022] [Indexed: 01/11/2023] Open
Abstract
Set-based association tests are widely popular in genetic association settings for their ability to aggregate weak signals and reduce multiple testing burdens. In particular, a class of set-based tests including the Higher Criticism, Berk-Jones, and other statistics have recently been popularized for reaching a so-called detection boundary when signals are rare and weak. Such tests have been applied in two subtly different settings: (a) associating a genetic variant set with a single phenotype and (b) associating a single genetic variant with a phenotype set. A significant issue in practice is the choice of test, especially when deciding between innovated and generalized type methods for detection boundary tests. Conflicting guidance is present in the literature. This work describes how correlation structures generate marked differences in relative operating characteristics for settings (a) and (b). The implications for study design are significant. We also develop novel power bounds that facilitate the aforementioned calculations and allow for analysis of individual testing settings. In more concrete terms, our investigation is motivated by translational expression quantitative trait loci (eQTL) studies in lung cancer. These studies involve both testing for groups of variants associated with a single gene expression (multiple explanatory factors) and testing whether a single variant is associated with a group of gene expressions (multiple outcomes). Results are supported by a collection of simulation studies and illustrated through lung cancer eQTL examples.
Affiliation(s)
- Ryan Sun
  - Department of Biostatistics, University of Texas MD Anderson Cancer Center, 1515 Holcombe Boulevard, Houston, TX 77030, USA
- Andy Shi
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02215, USA
- Xihong Lin
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Avenue, Boston, MA 02215, USA

108
Khan A, Unlu G, Lin P, Liu Y, Kilic E, Kenny TC, Birsoy K, Gamazon ER. GeneMAP: A discovery platform for metabolic gene function. bioRxiv 2023:2023.12.07.570588. [PMID: 38106122 PMCID: PMC10723489 DOI: 10.1101/2023.12.07.570588] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2023]
Abstract
Organisms maintain metabolic homeostasis through the combined functions of small molecule transporters and enzymes. While many of the metabolic components have been well-established, a substantial number remains without identified physiological substrates. To bridge this gap, we have leveraged large-scale plasma metabolome genome-wide association studies (GWAS) to develop a multiomic Gene-Metabolite Associations Prediction (GeneMAP) discovery platform. GeneMAP can generate accurate predictions, even pinpointing genes that are distant from the variants implicated by GWAS. In particular, our work identified SLC25A48 as a genetic determinant of plasma choline levels. Mechanistically, SLC25A48 loss strongly impairs mitochondrial choline import and synthesis of its downstream metabolite, betaine. Rare variant testing and polygenic risk score analyses have elucidated choline-relevant phenomic consequences of SLC25A48 dysfunction. Altogether, our study proposes SLC25A48 as a mitochondrial choline transporter and provides a discovery platform for metabolic gene function.
109
Martino J, Liu Q, Vukojevic K, Ke J, Lim TY, Khan A, Gupta Y, Perez A, Yan Z, Milo Rasouly H, Vena N, Lippa N, Giordano JL, Saraga M, Saraga-Babic M, Westland R, Bodria M, Piaggio G, Bendapudi PK, Iglesias AD, Wapner RJ, Tasic V, Wang F, Ionita-Laza I, Ghiggeri GM, Kiryluk K, Sampogna RV, Mendelsohn CL, D'Agati VD, Gharavi AG, Sanna-Cherchi S. Mouse and human studies support DSTYK loss of function as a low-penetrance and variable expressivity risk factor for congenital urinary tract anomalies. Genet Med 2023; 25:100983. [PMID: 37746849 DOI: 10.1016/j.gim.2023.100983] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 09/14/2023] [Accepted: 09/17/2023] [Indexed: 09/26/2023] Open
Abstract
PURPOSE: Previous work identified rare variants in DSTYK associated with human congenital anomalies of the kidney and urinary tract (CAKUT). Here, we present a series of mouse and human studies to clarify the association, penetrance, and expressivity of DSTYK variants. METHODS: We phenotypically characterized Dstyk knockout mice of 3 separate inbred backgrounds and re-analyzed the original family segregating the DSTYK c.654+1G>A splice-site variant (referred to as "SSV" below). DSTYK loss of function (LOF) and SSVs were annotated in individuals with CAKUT, epilepsy, or amyotrophic lateral sclerosis vs controls. A phenome-wide association study analysis was also performed using United Kingdom Biobank (UKBB) data. RESULTS: Results demonstrate at least ∼20% to 25% penetrance of obstructive uropathy in C57BL/6J and FVB/NJ Dstyk-/- mice. Phenotypic penetrance increased to ∼40% in C3H/HeJ mutants, with mild-to-moderate severity. Re-analysis of the original family segregating the rare SSV showed low penetrance (43.8%) and no alternative genetic causes for CAKUT. The burden of DSTYK LOF variants showed significant excess for CAKUT and epilepsy vs controls, and an exploratory phenome-wide association study supported association with neurological disorders. CONCLUSION: These data support causality for DSTYK LOF variants and highlight the need for large-scale sequencing studies (here >200,000 cases) to accurately assess causality for genes and variants to lowly penetrant traits with common population prevalence.
Affiliation(s)
- Jeremiah Martino
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Qingxue Liu
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Katarina Vukojevic
- Department of Medicine, Columbia University Irving Medical Center, New York, NY; Department of Anatomy, Histology and Embryology, University of Split School of Medicine, Split, Croatia
| | - Juntao Ke
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Tze Y Lim
- Department of Medicine, Columbia University Irving Medical Center, New York, NY; Unit of Genomic Variability and Complex Diseases, Department of Medical Sciences, University of Turin, Turin, Italy
| | - Atlas Khan
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Yask Gupta
- Department of Medicine, Columbia University Irving Medical Center, New York, NY; Institute for Inflammation Medicine, University of Lubeck, Germany
| | - Alejandra Perez
- Department of Medicine, Columbia University Irving Medical Center, New York, NY; Department of Urology, Mount Sinai Medical Center, Miami, FL
| | - Zonghai Yan
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Hila Milo Rasouly
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Natalie Vena
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Natalie Lippa
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Jessica L Giordano
- Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY
| | - Marijan Saraga
- Department of Pediatrics, University Hospital of Split, Split, Croatia; School of Medicine, University of Split, Split, Croatia
| | - Mirna Saraga-Babic
- Department of Anatomy, Histology and Embryology, University of Split School of Medicine, Split, Croatia
| | - Rik Westland
- Department of Pediatric Nephrology, Emma Children's Hospital, University of Amsterdam, Meibergdreef 9, Amsterdam, The Netherlands
| | - Monica Bodria
- Division of Nephrology and Renal Transplantation, IRCCS Istituto Giannina Gaslini, Genoa, Italy; Laboratory on Molecular Nephrology, IRCCS Istituto Giannina Gaslini, Genoa, Italy
| | - Giorgio Piaggio
- Division of Nephrology and Renal Transplantation, IRCCS Istituto Giannina Gaslini, Genoa, Italy; Laboratory on Molecular Nephrology, IRCCS Istituto Giannina Gaslini, Genoa, Italy
| | - Pavan K Bendapudi
- Division of Hematology and Blood Transfusion Service, Massachusetts General Hospital, Boston, MA; Division of Hemostasis and Thrombosis, Beth Israel Deaconess Medical Center, Boston, MA; Harvard Medical School, Boston, MA
| | - Alejandro D Iglesias
- Department of Pediatrics, Columbia University Vagelos College of Physicians and Surgeons, New York, NY
| | - Ronald J Wapner
- Obstetrics and Gynecology, Columbia University Irving Medical Center, New York, NY
| | - Velibor Tasic
- Medical Faculty of Skopje, University Children's Hospital, Skopje, Macedonia
| | - Fan Wang
- Department of Biostatistics, Columbia University, New York, NY
| | | | - Gian Marco Ghiggeri
- Division of Nephrology and Renal Transplantation, IRCCS Istituto Giannina Gaslini, Genoa, Italy; Laboratory on Molecular Nephrology, IRCCS Istituto Giannina Gaslini, Genoa, Italy
| | - Krzysztof Kiryluk
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Rosemary V Sampogna
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Cathy L Mendelsohn
- Department of Urology, Columbia University Irving Medical Center, New York, NY; Department of Pathology and Cell Biology, Columbia University Irving Medical Center, New York, NY; Department of Genetics and Development, Columbia University Irving Medical Center, New York, NY; Columbia Stem Cell Initiative, Columbia University Irving Medical Center, New York, NY
| | - Vivette D D'Agati
- The Renal Pathology Laboratory of the Department of Pathology and Cell Biology, Columbia University, New York, NY
| | - Ali G Gharavi
- Department of Medicine, Columbia University Irving Medical Center, New York, NY
| | - Simone Sanna-Cherchi
- Department of Medicine, Columbia University Irving Medical Center, New York, NY.
| |
|
110
|
Chen H, Naseri A, Zhi D. FiMAP: A fast identity-by-descent mapping test for biobank-scale cohorts. PLoS Genet 2023; 19:e1011057. [PMID: 38039339 PMCID: PMC10718418 DOI: 10.1371/journal.pgen.1011057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2023] [Revised: 12/13/2023] [Accepted: 11/07/2023] [Indexed: 12/03/2023] Open
Abstract
Although genome-wide association studies (GWAS) have identified tens of thousands of genetic loci, the genetic architecture is still not fully understood for many complex traits. Most GWAS and sequencing association studies have focused on single nucleotide polymorphisms or copy number variations, including common and rare genetic variants. However, phased haplotype information is often ignored in GWAS or variant set tests for rare variants. Here we leverage the identity-by-descent (IBD) segments inferred from a random projection-based IBD detection algorithm in the mapping of genetic associations with complex traits, to develop a computationally efficient statistical test for IBD mapping in biobank-scale cohorts. We used sparse linear algebra and random matrix algorithms to speed up the computation, and a genome-wide IBD mapping scan of more than 400,000 samples finished within a few hours. Simulation studies showed that our new method had well-controlled type I error rates under the null hypothesis of no genetic association in large biobank-scale cohorts, and outperformed traditional GWAS single-variant tests when the causal variants were untyped and rare, or in the presence of haplotype effects. We also applied our method to IBD mapping of six anthropometric traits using the UK Biobank data and identified a total of 3,442 associations, 2,131 (62%) of which remained significant after conditioning on suggestive tag variants in the ±3 centimorgan flanking regions from GWAS.
Affiliation(s)
- Han Chen
- Human Genetics Center, Department of Epidemiology, School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
| | - Ardalan Naseri
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
| | - Degui Zhi
- Center for Artificial Intelligence and Genome Informatics, McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, United States of America
| |
|
111
|
Xie H, Cao X, Zhang S, Sha Q. Joint analysis of multiple phenotypes for extremely unbalanced case-control association studies using multi-layer network. Bioinformatics 2023; 39:btad707. [PMID: 37991852 PMCID: PMC10697735 DOI: 10.1093/bioinformatics/btad707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2023] [Revised: 09/29/2023] [Accepted: 11/21/2023] [Indexed: 11/24/2023] Open
Abstract
MOTIVATION Genome-wide association studies are an essential tool for analyzing associations between phenotypes and single nucleotide polymorphisms (SNPs). Most binary phenotypes in large biobanks are extremely unbalanced, which leads to inflated type I error rates for many widely used association tests for joint analysis of multiple phenotypes. In this article, we first propose a novel method to construct a Multi-Layer Network (MLN) using individuals with at least one case status among all phenotypes. Then, we introduce a computationally efficient community detection method to group phenotypes into disjoint clusters based on the MLN. Finally, we propose a novel approach, MLN with Omnibus (MLN-O), to jointly analyze the association between phenotypes and a SNP. MLN-O uses the score test to test the association between each merged phenotype in a cluster and a SNP, then uses the Omnibus test to obtain an overall test statistic to test the association between all phenotypes and a SNP. RESULTS We conduct extensive simulation studies to show that the proposed approach can control type I error rates and is more powerful than some existing methods. We also apply the proposed method to a real data set in the UK Biobank. Using phenotypes in Chapter XIII (Diseases of the musculoskeletal system and connective tissue) in the UK Biobank, we find that MLN-O identifies more significant SNPs than the other methods we compare against. AVAILABILITY AND IMPLEMENTATION https://github.com/Hongjing-Xie/Multi-Layer-Network-with-Omnibus-MLN-O.
Affiliation(s)
- Hongjing Xie
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
| | - Xuewei Cao
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, MI 49931, United States
| |
|
112
|
de Vries PS, Conomos MP, Singh K, Nicholson CJ, Jain D, Hasbani NR, Jiang W, Lee S, Lino Cardenas CL, Lutz SM, Wong D, Guo X, Yao J, Young EP, Tcheandjieu C, Hilliard AT, Bis JC, Bielak LF, Brown MR, Musharoff S, Clarke SL, Terry JG, Palmer ND, Yanek LR, Xu H, Heard-Costa N, Wessel J, Selvaraj MS, Li RH, Sun X, Turner AW, Stilp AM, Khan A, Newman AB, Rasheed A, Freedman BI, Kral BG, McHugh CP, Hodonsky C, Saleheen D, Herrington DM, Jacobs DR, Nickerson DA, Boerwinkle E, Wang FF, Heiss G, Jun G, Kinney GL, Sigurslid HH, Doddapaneni H, Hall IM, Bensenor IM, Broome J, Crapo JD, Wilson JG, Smith JA, Blangero J, Vargas JD, Mosquera JV, Smith JD, Viaud-Martinez KA, Ryan KA, Young KA, Taylor KD, Lange LA, Emery LS, Bittencourt MS, Budoff MJ, Montasser ME, Yu M, Mahaney MC, Mahamdeh MS, Fornage M, Franceschini N, Lotufo PA, Natarajan P, Wong Q, Mathias RA, Gibbs RA, Do R, Mehran R, Tracy RP, Kim RW, Nelson SC, Damrauer SM, Kardia SL, Rich SS, Fuster V, Napolioni V, Zhao W, Tian W, Yin X, Min YI, Manning AK, Peloso G, Kelly TN, O’Donnell CJ, Morrison AC, Curran JE, Zapol WM, Bowden DW, Becker LC, Correa A, Mitchell BD, Psaty BM, Carr JJ, Pereira AC, Assimes TL, Stitziel NO, Hokanson JE, Laurie CA, Rotter JI, Vasan RS, Post WS, Peyser PA, Miller CL, Malhotra R. Whole-genome sequencing uncovers two loci for coronary artery calcification and identifies ARSE as a regulator of vascular calcification. NATURE CARDIOVASCULAR RESEARCH 2023; 2:1159-1172. [PMID: 38817323 PMCID: PMC11138106 DOI: 10.1038/s44161-023-00375-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/13/2022] [Accepted: 10/25/2023] [Indexed: 06/01/2024]
Abstract
Coronary artery calcification (CAC) is a measure of atherosclerosis and a well-established predictor of coronary artery disease (CAD) events. Here we describe a genome-wide association study (GWAS) of CAC in 22,400 participants from multiple ancestral groups. We confirmed associations with four known loci and identified two additional loci associated with CAC (ARSE and MMP16), with evidence of significant associations in replication analyses for both novel loci. Functional assays of ARSE and MMP16 in human vascular smooth muscle cells (VSMCs) demonstrate that ARSE is a promoter of VSMC calcification and VSMC phenotype switching from a contractile to a calcifying or osteogenic phenotype. Furthermore, we show that the association of variants near ARSE with reduced CAC is likely explained by reduced ARSE expression with the G allele of enhancer variant rs5982944. Our study highlights ARSE as an important contributor to atherosclerotic vascular calcification, and a potential drug target for vascular calcific disease.
Affiliation(s)
- Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
| | - Matthew P. Conomos
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Kuldeep Singh
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Christopher J. Nicholson
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Deepti Jain
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Natalie R. Hasbani
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
| | - Wanlin Jiang
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Sujin Lee
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Christian L Lino Cardenas
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Sharon M. Lutz
- PRecisiOn Medicine Translational Research (PROMoTeR)
Center, Department of Population Medicine, Harvard Medical School and Harvard
Pilgrim Health Care Institute, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of
Public Health, Boston, MA, USA
| | - Doris Wong
- Center for Public Health Genomics, University of Virginia
School of Medicine, Charlottesville, VA, USA
| | - Xiuqing Guo
- The Institute for Translational Genomics and Population
Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical
Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Jie Yao
- The Institute for Translational Genomics and Population
Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical
Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Erica P. Young
- Cardiovascular Division, Department of Internal Medicine,
Washington University School of Medicine, St. Louis, MO, USA
| | - Catherine Tcheandjieu
- VA Palo Alto Healthcare System, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of
Medicine, Stanford, CA, USA
| | - Austin T. Hilliard
- VA Palo Alto Healthcare System, Palo Alto, CA, USA
- Palo Alto Veterans Institute for Research, Palo Alto, CA,
USA
| | - Joshua C. Bis
- Cardiovascular Health Research Unit, Department of
Medicine, University of Washington, Seattle, WA, USA
| | - Lawrence F. Bielak
- School of Public Health, Department of Epidemiology,
University of Michigan, Ann Arbor, MI, USA
| | - Michael R. Brown
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
| | - Shaila Musharoff
- VA Palo Alto Healthcare System, Palo Alto, CA, USA
- Department of Genetics, Stanford University School of
Medicine, Stanford, CA, USA
| | - Shoa L. Clarke
- VA Palo Alto Healthcare System, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of
Medicine, Stanford, CA, USA
| | - James G. Terry
- Department of Radiology, Vanderbilt Translational and
Clinical Cardiovascular Research Center, Vanderbilt University Medical Center,
Nashville, TN, USA
| | - Nicholette D. Palmer
- Department of Biochemistry, Wake Forest School of
Medicine, Winston-Salem, NC, USA
| | - Lisa R. Yanek
- Division of General Internal Medicine, Department of
Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Huichun Xu
- Division of Endocrinology, Diabetes and Nutrition,
Department of Medicine, University of Maryland School of Medicine, Baltimore, MD,
USA
| | - Nancy Heard-Costa
- Boston University School of Medicine, Boston, MA,
USA
- Boston University and National Heart, Lung, and Blood
Institute’s Framingham Heart Study, Framingham, MA, USA
| | - Jennifer Wessel
- Department of Epidemiology, Fairbanks School of Public
Health, Indiana University, Indianapolis, IN, USA
- Diabetes Translational Research Center, Indiana
University, Indianapolis, IN, USA
| | - Margaret Sunitha Selvaraj
- Cardiovascular Research Center and Center for Genomic
Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad
Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston,
MA, USA
| | - Rebecca H. Li
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Xiao Sun
- School of Public Health and Tropical Medicine, Department
of Epidemiology, Tulane University, New Orleans, LA, USA
- College of Medicine, Department of Medicine, Division of
Nephrology, University of Illinois Chicago, Chicago, IL, USA
| | - Adam W. Turner
- Center for Public Health Genomics, University of Virginia
School of Medicine, Charlottesville, VA, USA
| | - Adrienne M. Stilp
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Alyna Khan
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Anne B. Newman
- Department of Epidemiology, Graduate School of Public
Health, University of Pittsburgh, Pittsburgh, PA, USA
| | - Asif Rasheed
- Center For Non-Communicable Diseases, Karachi,
Pakistan
| | - Barry I Freedman
- Section on Nephrology, Department of Internal Medicine,
Wake Forest School of Medicine, Winston-Salem, NC, USA
| | - Brian G. Kral
- Division of Cardiology, Department of Medicine, Johns
Hopkins University School of Medicine, Baltimore, MD, USA
| | - Caitlin P. McHugh
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Chani Hodonsky
- Center for Public Health Genomics, University of Virginia
School of Medicine, Charlottesville, VA, USA
| | - Danish Saleheen
- Center For Non-Communicable Diseases, Karachi,
Pakistan
- Department of Medicine, Columbia University Irving
Medical Center, New York, NY, USA
- Department of Cardiology, Columbia University Irving
Medical Center, New York, NY, USA
| | - David M. Herrington
- Department of Internal Medicine, Section of
Cardiovascular Medicine, Wake Forest School of Medicine, Winston-Salem, NC,
USA
| | - David R. Jacobs
- Division of Epidemiology and Community Health, University
of Minnesota School of Public Health, Minneapolis, MN, USA
| | - Deborah A. Nickerson
- Department of Genome Sciences, University of Washington,
Seattle, WA, USA
- Northwest Genomics Center, University of Washington,
Seattle, WA, USA
| | - Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of
Medicine, Houston, TX, USA
| | - Fei Fei Wang
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Gerardo Heiss
- Department of Epidemiology, Gillings School of Global
Public Health, University of North Carolina, Chapel Hill, NC, USA
| | - Goo Jun
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
| | - Greg L. Kinney
- Department of Epidemiology, Colorado School of Public
Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Haakon H. Sigurslid
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | | | - Ira M. Hall
- Yale Center for Genomic Health, Yale School of Medicine,
New Haven, CT, USA
| | - Isabela M. Bensenor
- Center for Clinical and Epidemiological Research,
University Hospital, University of Sao Paulo Medical School, São Paulo, Brazil
| | - Jai Broome
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - James D. Crapo
- Department of Medicine, National Jewish Health, Denver,
CO, USA
| | - James G. Wilson
- Division of Cardiology, Beth Israel Deaconess Medical
Center, Boston, MA, USA
| | - Jennifer A. Smith
- School of Public Health, Department of Epidemiology,
University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research,
University of Michigan, Ann Arbor, MI, USA
| | - John Blangero
- Department of Human Genetics, University of Texas Rio
Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of
Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Jose D. Vargas
- Medstar Heart and Vascular Institute, Medstar Georgetown
University Hospital, Washington, DC, USA
| | - Jose Verdezoto Mosquera
- Center for Public Health Genomics, University of Virginia
School of Medicine, Charlottesville, VA, USA
| | - Joshua D. Smith
- Department of Genome Sciences, University of Washington,
Seattle, WA, USA
- Northwest Genomics Center, University of Washington,
Seattle, WA, USA
| | | | - Kathleen A. Ryan
- Division of Endocrinology, Diabetes and Nutrition,
Department of Medicine, University of Maryland School of Medicine, Baltimore, MD,
USA
| | - Kendra A. Young
- Department of Epidemiology, Colorado School of Public
Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Kent D. Taylor
- The Institute for Translational Genomics and Population
Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical
Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Leslie A. Lange
- Department of Medicine, University of Colorado Denver,
Anschutz Medical Campus, Aurora, CO, USA
| | - Leslie S. Emery
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Marcio S. Bittencourt
- Center for Clinical and Epidemiological Research,
University Hospital, University of Sao Paulo Medical School, São Paulo, Brazil
| | - Matthew J. Budoff
- Department of Medicine, The Lundquist Institute for
Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - May E. Montasser
- Division of Endocrinology, Diabetes and Nutrition,
Department of Medicine, University of Maryland School of Medicine, Baltimore, MD,
USA
| | - Miao Yu
- School of Public Health, Department of Epidemiology,
University of Michigan, Ann Arbor, MI, USA
| | - Michael C. Mahaney
- Department of Human Genetics, University of Texas Rio
Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of
Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Mohammed S Mahamdeh
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
- Institute of Molecular Medicine, McGovern Medical School,
The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Nora Franceschini
- Department of Epidemiology, Gillings School of Global
Public health, University of North Carolina, Chapel Hill, NC, USA
| | - Paulo A. Lotufo
- Center for Clinical and Epidemiological Research,
University Hospital, University of Sao Paulo Medical School, São Paulo, Brazil
| | - Pradeep Natarajan
- Cardiovascular Research Center and Center for Genomic
Medicine, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad
Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston,
MA, USA
| | - Quenna Wong
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Rasika A. Mathias
- Division of General Internal Medicine, Department of
Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Division of Allergy and Clinical Immunology, Department
of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of
Medicine, Houston, TX, USA
- Department of Molecular and Human Genetics, Baylor
College of Medicine, Houston, TX, USA
| | - Ron Do
- The Charles Bronfman Institute for Personalized Medicine,
Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Department of Genetics and Genomic Sciences, Icahn School
of Medicine at Mount Sinai, New York, NY, USA
| | - Roxana Mehran
- Icahn School of Medicine at Mount Sinai, New York, NY,
USA
| | - Russell P. Tracy
- Department of Pathology and Laboratory Medicine, Robert
Larner, M.D. College of Medicine, University of Vermont, Burlington, VT, USA
| | | | - Sarah C. Nelson
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Scott M. Damrauer
- Corporal Michael J. Crescenz VA Medical Center,
Philadelphia, PA, USA
- Department of Surgery, Perelman School of Medicine,
University of Pennsylvania, Philadelphia, PA, USA
| | - Sharon L.R. Kardia
- School of Public Health, Department of Epidemiology,
University of Michigan, Ann Arbor, MI, USA
| | - Stephen S. Rich
- Center for Public Health Genomics, University of Virginia
School of Medicine, Charlottesville, VA, USA
| | - Valentin Fuster
- Centro Nacional de Investigaciones Cardiovasculares
Carlos III, Madrid, Spain
- Mount Sinai Heart Center, New York, NY, USA
| | - Valerio Napolioni
- Genomic And Molecular Epidemiology (GAME) Lab, School of
Biosciences and Veterinary Medicine, University of Camerino, Camerino, Italy
| | - Wei Zhao
- School of Public Health, Department of Epidemiology,
University of Michigan, Ann Arbor, MI, USA
| | - Wenjie Tian
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| | - Xianyong Yin
- Department of Biostatistics and Center for Statistical
Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Yuan-I Min
- Jackson Heart Study, Department of Medicine, University
of Mississippi Medical Center, Jackson, MS, USA
| | - Alisa K. Manning
- Clinical and Translation Epidemiology Unit, Department of
Medicine, Massachusetts General Hospital, Boston, MA, USA
- Programs in Metabolism and Medical and Population
Genetics, Broad Institute, Cambridge, MA, USA
| | - Gina Peloso
- Department of Biostatistics, Boston University School of
Public Health, Boston, MA, USA
| | - Tanika N. Kelly
- College of Medicine, Department of Medicine, Division of
Nephrology, University of Illinois Chicago, Chicago, IL, USA
| | - Christopher J. O’Donnell
- VA Boston Healthcare System, Boston, MA, USA
- Department of Medicine, Brigham and Women’s Hospital,
Boston, MA, USA
| | - Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human
Genetics, and Environmental Sciences, School of Public Health, The University of
Texas Health Science Center at Houston, Houston, TX, USA
| | - Joanne E. Curran
- Department of Human Genetics, University of Texas Rio
Grande Valley School of Medicine, Brownsville, TX, USA
- South Texas Diabetes and Obesity Institute, University of
Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Warren M. Zapol
- Department of Anesthesia, Critical Care and Pain Medicine
at Massachusetts General Hospital, Boston, MA, USA
| | - Donald W. Bowden
- Department of Biochemistry, Wake Forest School of
Medicine, Winston-Salem, NC, USA
| | - Lewis C. Becker
- Division of Cardiology, Department of Medicine, Johns
Hopkins University School of Medicine, Baltimore, MD, USA
| | - Adolfo Correa
- Jackson Heart Study, Department of Medicine, University
of Mississippi Medical Center, Jackson, MS, USA
- Department of Population Health Science, University of
Mississippi Medical Center, Jackson, MS, USA
| | - Braxton D. Mitchell
- Division of Endocrinology, Diabetes and Nutrition,
Department of Medicine, University of Maryland School of Medicine, Baltimore, MD,
USA
- Geriatrics Research and Education Clinical Center,
Baltimore Veterans Administration Medical Center, Baltimore, MD, USA
| | - Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of
Medicine, University of Washington, Seattle, WA, USA
- Department of Epidemiology, University of Washington,
Seattle, WA, USA
- Department of Health Services, University of Washington,
Seattle, WA, USA
| | - John Jeffrey Carr
- Department of Radiology, Vanderbilt Translational and
Clinical Cardiovascular Research Center, Vanderbilt University Medical Center,
Nashville, TN, USA
| | - Alexandre C. Pereira
- Department of Genetics, Harvard Medical School, Boston,
MA, USA
- Laboratory of Genetics and Molecular Cardiology, Heart
Institute, University of São Paulo, São Paulo, Brazil
| | - Themistocles L. Assimes
- VA Palo Alto Healthcare System, Palo Alto, CA, USA
- Department of Medicine, Stanford University School of
Medicine, Stanford, CA, USA
| | - Nathan O. Stitziel
- Cardiovascular Division, Department of Internal Medicine,
Washington University School of Medicine, St. Louis, MO, USA
- Department of Genetics, Washington University School of
Medicine, St. Louis, MO, USA
- McDonnell Genome Institute, Washington University School
of Medicine, St. Louis, MO, USA
| | - John E. Hokanson
- Department of Epidemiology, Colorado School of Public
Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA
| | - Cecelia A. Laurie
- Genetic Analysis Center, Department of Biostatistics,
School of Public Health, University of Washington, Seattle, WA, USA
| | - Jerome I. Rotter
- The Institute for Translational Genomics and Population
Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical
Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Ramachandran S. Vasan
- Boston University and National Heart, Lung, and Blood
Institute’s Framingham Heart Study, Framingham, MA, USA
- Department of Medicine, Boston University School of
Medicine, Boston, MA, USA
- Department of Epidemiology, Boston University School of
Public Health, Boston, MA, USA
| | - Wendy S. Post
- Division of Cardiology, Department of Medicine, Johns
Hopkins University School of Medicine, Baltimore, MD, USA
| | - Patricia A. Peyser
- School of Public Health, Department of Epidemiology,
University of Michigan, Ann Arbor, MI, USA
| | - Clint L. Miller
- Center for Public Health Genomics, University of Virginia
School of Medicine, Charlottesville, VA, USA
| | - Rajeev Malhotra
- Cardiovascular Research Center, Division of Cardiology,
Department of Medicine, Massachusetts General Hospital, Harvard Medical School,
Boston, MA, USA
| |
|
113
|
Lopera-Maya EA, Li S, de Brouwer R, Nolte IM, van Breen J, Jongbloed JDH, Swertz MA, Snieder H, Franke L, Wijmenga C, de Boer RA, Deelen P, van der Zwaag PA, Sanna S. Phenotypic and Genetic Factors Associated with Absence of Cardiomyopathy Symptoms in PLN:c.40_42delAGA Carriers. J Cardiovasc Transl Res 2023; 16:1251-1266. [PMID: 36622581 PMCID: PMC10721704 DOI: 10.1007/s12265-022-10347-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 12/14/2022] [Indexed: 01/10/2023]
Abstract
The c.40_42delAGA variant in the phospholamban gene (PLN) has been associated with dilated and arrhythmogenic cardiomyopathy, with up to 70% of carriers experiencing a major cardiac event by age 70. However, there are carriers who remain asymptomatic at older ages. To understand the mechanisms behind this incomplete penetrance, we evaluated potential phenotypic and genetic modifiers in 74 PLN:c.40_42delAGA carriers identified in 36,339 participants of the Lifelines population cohort. Asymptomatic carriers (N = 48) showed shorter QRS duration (−5.73 ms, q value = 0.001) compared to asymptomatic non-carriers, an effect we could replicate in two different independent cohorts. Furthermore, symptomatic carriers showed a higher correlation (r_Pearson = 0.17) between polygenic predisposition to higher QRS (PGS_QRS) and QRS (p value = 1.98 × 10⁻⁸), suggesting that the effect of the genetic variation on cardiac rhythm might be increased in symptomatic carriers. Our results allow for improved clinical interpretation for asymptomatic carriers, while our approach could guide future studies on genetic diseases with incomplete penetrance.
Affiliation(s)
- Esteban A Lopera-Maya
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Shuang Li
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Remco de Brouwer
- Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Ilja M Nolte
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Justin van Breen
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Jan D H Jongbloed
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Morris A Swertz
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Harold Snieder
- Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Lude Franke
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Cisca Wijmenga
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Rudolf A de Boer
- Department of Cardiology, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Patrick Deelen
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Oncode Institute, Utrecht, Netherlands
| | - Paul A van der Zwaag
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
| | - Serena Sanna
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, Netherlands
- Institute for Genetic and Biomedical Research (IRGB), National Research Council (CNR), Cagliari, Italy
| |
|
114
|
Wang D, Perera D, He J, Cao C, Kossinna P, Li Q, Zhang W, Guo X, Platt A, Wu J, Zhang Q. cLD: Rare-variant linkage disequilibrium between genomic regions identifies novel genomic interactions. PLoS Genet 2023; 19:e1011074. [PMID: 38109434 PMCID: PMC10758262 DOI: 10.1371/journal.pgen.1011074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Revised: 01/01/2024] [Accepted: 11/20/2023] [Indexed: 12/20/2023] Open
Abstract
Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are only reliable for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures the rare-variant LD between genetic regions, which reflects more biological interactions between variants, in addition to lack of recombination. We derived the theoretical variance of cLD using delta methods to demonstrate its higher stability than LD for rare variants. This property is also verified by bootstrapped simulations using real data. In application, we find cLD reveals an increased genetic association between genes in 3D chromatin interactions, a phenomenon recently reported negatively by calculating standard LD between common variants. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.
Affiliation(s)
- Dinghao Wang
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
| | - Deshan Perera
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
| | - Jingni He
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
| | - Chen Cao
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
| | - Pathum Kossinna
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
| | - Qing Li
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
| | - William Zhang
- The Harker School, San Jose, California, United States of America
| | - Xingyi Guo
- Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
- Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
| | - Alexander Platt
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
| | - Jingjing Wu
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
| | - Qingrun Zhang
- Department of Mathematics and Statistics, University of Calgary, Calgary, Alberta, Canada
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, Alberta, Canada
- Alberta Children’s Hospital Research Institute, University of Calgary, Calgary, Alberta, Canada
| |
|
115
|
Chen Z, Liang H, Wei P. Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers. Genet Epidemiol 2023; 47:617-636. [PMID: 37822029 DOI: 10.1002/gepi.22537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 07/22/2023] [Accepted: 09/18/2023] [Indexed: 10/13/2023]
Abstract
Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple, low statistical power tests such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests based on the score statistic for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, adaptive sum of powered score (aSPU) and data-adaptive pathway-based (aSPUpath) tests, increase the power in genome-wide association studies (GWASs) with a single disease trait in a case-control study. We extend aSPU and aSPUpath to multi-traits, that is, somatic mutations of multiple genes in a cohort study, allowing extensive information aggregation at both SNP and gene levels. p-values from different parameters assuming varying genetic architecture are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, in comparison with some commonly used methods, our data-adaptive somatic mutations/germline variations tests can be applied to multiple germline SNPs/genes/pathways, and generally have much higher statistical powers while maintaining the appropriate type I error. The proposed tests are applied to a large-scale real-world International Cancer Genome Consortium whole genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations compared with the other existing methods on both gene and pathway levels. Our study has systematically identified the associations between various germline variations and somatic mutations across different cancer types, which potentially provides valuable utility for cancer risk prediction, prognosis, and therapeutics.
Affiliation(s)
- Zhongyuan Chen
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
| | - Han Liang
- Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, Texas, USA
| | - Peng Wei
- Department of Biostatistics, MD Anderson Cancer Center, Houston, Texas, USA
| |
|
116
|
Chen Y, Paramo MI, Zhang Y, Yao L, Shah SR, Jin Y, Zhang J, Pan X, Yu H. Finding Needles in the Haystack: Strategies for Uncovering Noncoding Regulatory Variants. Annu Rev Genet 2023; 57:201-222. [PMID: 37562413 DOI: 10.1146/annurev-genet-030723-120717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/12/2023]
Abstract
Despite accumulating evidence implicating noncoding variants in human diseases, unraveling their functionality remains a significant challenge. Systematic annotations of the regulatory landscape and the growth of sequence variant data sets have fueled the development of tools and methods to identify causal noncoding variants and evaluate their regulatory effects. Here, we review the latest advances in the field and discuss potential future research avenues to gain a more in-depth understanding of noncoding regulatory variants.
Affiliation(s)
- You Chen
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Mauricio I Paramo
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Yingying Zhang
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Li Yao
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
| | - Sagar R Shah
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Yiyang Jin
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Junke Zhang
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
| | - Xiuqi Pan
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, USA
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
| | - Haiyuan Yu
- Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, New York, USA
- Department of Computational Biology, Cornell University, Ithaca, New York, USA
| |
|
117
|
Liu Z, Xu J, Tan J, Li X, Zhang F, Ouyang W, Wang S, Huang Y, Li S, Pan X. Genetic overlap for ten cardiovascular diseases: A comprehensive gene-centric pleiotropic association analysis and Mendelian randomization study. iScience 2023; 26:108150. [PMID: 37908310 PMCID: PMC10613921 DOI: 10.1016/j.isci.2023.108150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Revised: 08/13/2023] [Accepted: 10/02/2023] [Indexed: 11/02/2023] Open
Abstract
Recent studies suggest that pleiotropic effects may explain the genetic architecture of cardiovascular diseases (CVDs). We conducted a comprehensive gene-centric pleiotropic association analysis for ten CVDs using genome-wide association study (GWAS) summary statistics to identify pleiotropic genes and pathways that may underlie multiple CVDs. We found shared genetic mechanisms underlying the pathophysiology of CVDs, with over two-thirds of the diseases exhibiting common genes and single-nucleotide polymorphisms (SNPs). Significant positive genetic correlations were observed in more than half of paired CVDs. Additionally, we investigated the pleiotropic genes shared between different CVDs, as well as their functional pathways and distribution in different tissues. Moreover, six hub genes, including ALDH2, XPO1, HSPA1L, ESR2, WDR12, and RAB1A, as well as 26 targeted potential drugs, were identified. Our study provides further evidence for the pleiotropic effects of genetic variants on CVDs and highlights the importance of considering pleiotropy in genetic association studies.
Affiliation(s)
- Zeye Liu
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Jing Xu
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
| | - Jiangshan Tan
- Key Laboratory of Pulmonary Vascular Medicine, National Clinical Research Center of Cardiovascular Diseases, State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing 100037, China
| | - Xiaofei Li
- Department of Cardiology, Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Fengwen Zhang
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Wenbin Ouyang
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Shouzheng Wang
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| | - Yuan Huang
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Pediatric Cardiac Surgery Center, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
| | - Shoujun Li
- State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Diseases, Pediatric Cardiac Surgery Center, Fuwai Hospital, Chinese Academy of Medical Sciences, and Peking Union Medical College, Beijing, China
| | - Xiangbin Pan
- Department of Structural Heart Disease, National Center for Cardiovascular Disease, China & Fuwai Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100037, China
- National Health Commission Key Laboratory of Cardiovascular Regeneration Medicine, Beijing 100037, China
- Key Laboratory of Innovative Cardiovascular Devices, Chinese Academy of Medical Sciences, Beijing 100037, China
- National Clinical Research Center for Cardiovascular Diseases, Fuwai Hospital, Chinese Academy of Medical Sciences, Beijing 100037, China
| |
|
118
|
Dichgans M, Malik R, Beaufort N, Tanaka K, Georgakis M, He Y, Koido M, Terao C, Anderson C, Kamatani Y. Genetically proxied HTRA1 protease activity and circulating levels independently predict risk of ischemic stroke and coronary artery disease. RESEARCH SQUARE 2023:rs.3.rs-3523612. [PMID: 37986915 PMCID: PMC10659557 DOI: 10.21203/rs.3.rs-3523612/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
HTRA1 has emerged as a major risk gene for stroke and cerebral small vessel disease with both rare and common variants contributing to disease risk. However, the precise mechanisms mediating this risk remain largely unknown as does the full spectrum of phenotypes associated with genetic variation in HTRA1 in the general population. Using a family-history informed approach, we first show that rare variants in HTRA1 are linked to ischemic stroke in 425,338 European individuals from the UK Biobank with replication in 143,149 individuals from the Biobank Japan. Integrating data from biochemical experiments on 76 mutations occurring in the UK Biobank, we next show that rare variants causing loss of protease function in vitro associate with ischemic stroke, coronary artery disease, and skeletal traits. In addition, a common causal variant (rs2672592) modulating circulating HTRA1 mRNA and protein levels enhances the risk of ischemic stroke, small vessel stroke, and coronary artery disease while lowering the risk of migraine and age-related macular dystrophy in GWAS and UK Biobank data from > 2,000,000 individuals. There was no evidence of an interaction between genetically proxied HTRA1 activity and levels. Our findings demonstrate a central role of HTRA1 for human disease including stroke and coronary artery disease and identify two independent mechanisms that might qualify as targets for future therapeutic interventions.
Affiliation(s)
- Masaru Koido
- Institute of Medical Science, The University of Tokyo
| |
|
119
|
Zheng D, Grandgenett PM, Zhang Q, Baine M, Shi Y, Du Q, Liang X, Wong J, Iqbal S, Preuss K, Kamal A, Yu H, Du H, Hollingsworth MA, Zhang C. radioGWAS: link radiome to genome to discover driver genes with somatic mutations for heterogeneous tumor image phenotype in pancreatic cancer. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.11.02.23297995. [PMID: 37961101 PMCID: PMC10635263 DOI: 10.1101/2023.11.02.23297995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Addressing the significant level of variability exhibited by pancreatic cancer necessitates the adoption of a systems biology approach that integrates molecular data, biological properties of the tumors, and clinical features of the patients. In this study, a comprehensive multi-omics methodology was employed to examine a distinctive patient dataset containing rapid autopsy tumor and normal tissue samples as well as longitudinal imaging with a focus on pancreatic cancer. By performing a whole exome sequencing analysis on tumor and normal tissues to identify somatic gene variants and a radiomics feature analysis on tumor CT images, the genome-wide association approach established a connection between pancreatic cancer driver genes and relevant radiomics features, enabling a thorough and quantitative assessment of the heterogeneity of pancreatic tumors. The significant association between sets of genes and radiomics features revealed the involvement of genes in shaping tumor morphological heterogeneity. Some results of the association established a connection between the molecular level mechanism and their outcomes at the level of tumor structural heterogeneity. Because tumor structure and tumor structural heterogeneity are related to the patients' overall survival, patients who had pancreatic cancer driver gene mutations with an association to a certain radiomics feature have been observed to experience worse survival rates than cases without these somatic mutations. Furthermore, the outcome of the association analysis has revealed potential gene mutations and radiomics feature candidates that warrant further investigation in future research endeavors.
|
120
|
Makarious MB, Lake J, Pitz V, Ye Fu A, Guidubaldi JL, Solsberg CW, Bandres-Ciga S, Leonard HL, Kim JJ, Billingsley KJ, Grenn FP, Jerez PA, Alvarado CX, Iwaki H, Ta M, Vitale D, Hernandez D, Torkamani A, Ryten M, Hardy J, Scholz SW, Traynor BJ, Dalgard CL, Ehrlich DJ, Tanaka T, Ferrucci L, Beach TG, Serrano GE, Real R, Morris HR, Ding J, Gibbs JR, Singleton AB, Nalls MA, Bhangale T, Blauwendraat C. Large-scale rare variant burden testing in Parkinson's disease. Brain 2023; 146:4622-4632. [PMID: 37348876 PMCID: PMC10629770 DOI: 10.1093/brain/awad214] [Citation(s) in RCA: 13] [Impact Index Per Article: 6.5] [Received: 09/20/2022] [Revised: 05/01/2023] [Accepted: 05/30/2023] [Indexed: 06/24/2023] Open
Abstract
Parkinson's disease has a large heritable component and genome-wide association studies have identified over 90 disease-associated common variants, providing deeper insights into the disease biology. However, there have not been large-scale rare variant analyses for Parkinson's disease. To address this gap, we investigated the rare genetic component of Parkinson's disease at minor allele frequencies <1%, using whole genome and whole exome sequencing data from 7184 Parkinson's disease cases, 6701 proxy cases and 51 650 healthy controls from the Accelerating Medicines Partnership Parkinson's disease (AMP-PD) initiative, the National Institutes of Health, the UK Biobank and Genentech. We performed burden test meta-analyses on small indels and single nucleotide protein-altering variants, prioritized based on their predicted functional impact. Our work identified several genes reaching exome-wide significance. Two of these genes, GBA1 and LRRK2, have variants that have been previously implicated as risk factors for Parkinson's disease, with some variants in LRRK2 resulting in monogenic forms of the disease. We identify potential novel risk associations for variants in B3GNT3, AUNIP, ADH5, TUBA1B, OR1G1, CAPN10 and TREML1 but were unable to replicate the observed associations across independent datasets. Of these, B3GNT3 and TREML1 could provide new evidence for the role of neuroinflammation in Parkinson's disease. To date, this is the largest analysis of rare genetic variants in Parkinson's disease.
Affiliation(s)
- Mary B Makarious
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- UCL Movement Disorders Centre, University College London, London WC1N 3BG, UK
| | - Julie Lake
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
| | - Vanessa Pitz
- Integrative Neurogenomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
| | - Allen Ye Fu
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Department of Cell Biology and Neuroscience, Rutgers University, Piscataway, NJ 08854, USA
| | - Joseph L Guidubaldi
- Integrative Neurogenomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
| | - Caroline Warly Solsberg
- Memory and Aging Center, Department of Neurology, University of California San Francisco, San Francisco, CA 94158, USA
- Pharmaceutical Sciences and Pharmacogenomics, University of California San Francisco, San Francisco, CA 94143, USA
| | - Sara Bandres-Ciga
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
| | - Hampton L Leonard
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Data Tecnica International, Washington, DC 20812, USA
| | - Jonggeol Jeffrey Kim
- Integrative Neurogenomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Preventive Neurology Unit, Wolfson Institute of Preventive Medicine, Queen Mary University of London, London EC1M 6BQ, UK
| | - Kimberley J Billingsley
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
| | - Francis P Grenn
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
| | - Pilar Alvarez Jerez
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
| | - Chelsea X Alvarado
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Data Tecnica International, Washington, DC 20812, USA
| | - Hirotaka Iwaki
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Data Tecnica International, Washington, DC 20812, USA
| | - Michael Ta
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Data Tecnica International, Washington, DC 20812, USA
| | - Dan Vitale
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Data Tecnica International, Washington, DC 20812, USA
| | - Dena Hernandez
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
| | - Ali Torkamani
- Department of Integrative Structural and Computational Biology, Scripps Research Institute, La Jolla, CA 92037, USA
| | - Mina Ryten
- NIHR Great Ormond Street Hospital Biomedical Research Centre, University College London, London WC1N 1EH, UK
- Department of Genetics and Genomic Medicine, Great Ormond Street Institute of Child Health, University College London, London WC1N 1EH, UK
| | - John Hardy
- UK Dementia Research Institute and Department of Neurodegenerative Disease and Reta Lila Weston Institute, UCL Queen Square Institute of Neurology and UCL Movement Disorders Centre, University College London, London WC1N 3BG, UK
- Institute for Advanced Study, The Hong Kong University of Science and Technology, Hong Kong SAR, China
| | - Sonja W Scholz
- Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, Bethesda, MD 20814, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD 21287, USA
| | - Bryan J Traynor
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD 21287, USA
- Clifton L Dalgard
- The American Genome Center, Uniformed Services University of the Health Sciences, Bethesda, MD 20814, USA
- Debra J Ehrlich
- Parkinson’s Disease Clinic, Office of the Clinical Director, National Institute of Neurological Disorders and Stroke, Bethesda, MD 20814, USA
- Toshiko Tanaka
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
- Luigi Ferrucci
- Translational Gerontology Branch, National Institute on Aging, NIH, Baltimore, MD 21224, USA
- Thomas G Beach
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ 85351, USA
- Geidy E Serrano
- Civin Laboratory for Neuropathology, Banner Sun Health Research Institute, Sun City, AZ 85351, USA
- Raquel Real
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- UCL Movement Disorders Centre, University College London, London WC1N 3BG, UK
- Huw R Morris
- Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology, London WC1N 3BG, UK
- UCL Movement Disorders Centre, University College London, London WC1N 3BG, UK
- Jinhui Ding
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- J Raphael Gibbs
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Andrew B Singleton
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Mike A Nalls
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
- Data Tecnica International, Washington, DC 20812, USA
- Tushar Bhangale
- Department of Human Genetics, Genentech, Inc., South San Francisco, CA 94080, USA
- Cornelis Blauwendraat
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Integrative Neurogenomics Unit, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20814, USA
- Center for Alzheimer's and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20814, USA
121
Li X, Chen H, Selvaraj MS, Van Buren E, Zhou H, Wang Y, Sun R, McCaw ZR, Yu Z, Arnett DK, Bis JC, Blangero J, Boerwinkle E, Bowden DW, Brody JA, Cade BE, Carson AP, Carlson JC, Chami N, Chen YDI, Curran JE, de Vries PS, Fornage M, Franceschini N, Freedman BI, Gu C, Heard-Costa NL, He J, Hou L, Hung YJ, Irvin MR, Kaplan RC, Kardia SL, Kelly T, Konigsberg I, Kooperberg C, Kral BG, Li C, Loos RJ, Mahaney MC, Martin LW, Mathias RA, Minster RL, Mitchell BD, Montasser ME, Morrison AC, Palmer ND, Peyser PA, Psaty BM, Raffield LM, Redline S, Reiner AP, Rich SS, Sitlani CM, Smith JA, Taylor KD, Tiwari H, Vasan RS, Wang Z, Yanek LR, Yu B, Rice KM, Rotter JI, Peloso GM, Natarajan P, Li Z, Liu Z, Lin X. A statistical framework for powerful multi-trait rare variant analysis in large-scale whole-genome sequencing studies. bioRxiv 2023:2023.10.30.564764. [PMID: 37961350 PMCID: PMC10634938 DOI: 10.1101/2023.10.30.564764] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Large-scale whole-genome sequencing (WGS) studies have improved our understanding of the contributions of coding and noncoding rare variants to complex human traits. Leveraging association effect sizes across multiple traits in WGS rare variant association analysis can improve statistical power over single-trait analysis, and also detect pleiotropic genes and regions. Existing multi-trait methods have limited ability to perform rare variant analysis of large-scale WGS data. We propose MultiSTAAR, a statistical framework and computationally-scalable analytical pipeline for functionally-informed multi-trait rare variant analysis in large-scale WGS studies. MultiSTAAR accounts for relatedness, population structure and correlation among phenotypes by jointly analyzing multiple traits, and further empowers rare variant association analysis by incorporating multiple functional annotations. We applied MultiSTAAR to jointly analyze three lipid traits (low-density lipoprotein cholesterol, high-density lipoprotein cholesterol and triglycerides) in 61,861 multi-ethnic samples from the Trans-Omics for Precision Medicine (TOPMed) Program. We discovered new associations with lipid traits missed by single-trait analysis, including rare variants within an enhancer of NIPSNAP3A and an intergenic region on chromosome 1.
Affiliation(s)
- Xihao Li
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Margaret Sunitha Selvaraj
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Eric Van Buren
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yuxuan Wang
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Zachary R. McCaw
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Zhi Yu
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Donna K. Arnett
- Provost Office, University of South Carolina, Columbia, SC, USA
- Joshua C. Bis
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W. Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jennifer A. Brody
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Brian E. Cade
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- April P. Carson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- Jenna C. Carlson
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Nathalie Chami
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yii-Der Ida Chen
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Joanne E. Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Paul S. de Vries
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Myriam Fornage
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Brown Foundation Institute of Molecular Medicine, McGovern Medical School, the University of Texas Health Science Center at Houston, Houston, TX, USA
- Nora Franceschini
- Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Barry I. Freedman
- Department of Internal Medicine, Nephrology, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Charles Gu
- Division of Biology & Biomedical Sciences, Washington University School of Medicine, St. Louis, MO, USA
- Nancy L. Heard-Costa
- Department of Neurology, Boston University Chobanian & Avedisian School of Medicine, Boston, MA, USA
- Framingham Heart Study, Framingham, MA, USA
- Jiang He
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
- Lifang Hou
- Department of Preventive Medicine, Northwestern University, Chicago, IL, USA
- Yi-Jen Hung
- Department of Internal Medicine, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan
- Marguerite R. Irvin
- Department of Epidemiology, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Robert C. Kaplan
- Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, USA
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Sharon L.R. Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Tanika Kelly
- Department of Medicine, Division of Nephrology, University of Illinois Chicago, Chicago, IL, USA
- Iain Konigsberg
- Department of Biomedical Informatics, University of Colorado, Aurora, CO, USA
- Charles Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Brian G. Kral
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Changwei Li
- Department of Epidemiology, Tulane University School of Public Health and Tropical Medicine, New Orleans, LA, USA
- Tulane University Translational Science Institute, New Orleans, LA, USA
- Ruth J.F. Loos
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
- Michael C. Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Lisa W. Martin
- George Washington University School of Medicine and Health Sciences, Washington, DC, USA
- Rasika A. Mathias
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Ryan L. Minster
- Department of Human Genetics and Department of Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
- Braxton D. Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- May E. Montasser
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Alanna C. Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Nicholette D. Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Patricia A. Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Bruce M. Psaty
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Department of Health Systems and Population Health, University of Washington, Seattle, WA, USA
- Laura M. Raffield
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women’s Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Alexander P. Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, USA
- Departments of Epidemiology, University of Washington, Seattle, WA, USA
- Stephen S. Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Colleen M. Sitlani
- Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA, USA
- Jennifer A. Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Kent D. Taylor
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Hemant Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
- Ramachandran S. Vasan
- Framingham Heart Study, Framingham, MA, USA
- Department of Quantitative and Qualitative Health Sciences, UT Health San Antonio School of Public Health, San Antonio, TX, USA
- Zhe Wang
- The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Lisa R. Yanek
- Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
- Bing Yu
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kenneth M. Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Jerome I. Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- Gina M. Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Pradeep Natarajan
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zhonghua Liu
- Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY, USA
- Xihong Lin
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
122
Dossa HRG, Bureau A, Maziade M, Lakhal-Chaieb L, Oualkacha K. A novel rare variants association test for binary traits in family-based designs via copulas. Stat Methods Med Res 2023; 32:2096-2122. [PMID: 37832140 PMCID: PMC10683345 DOI: 10.1177/09622802231197977] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/15/2023]
Abstract
With cost-effective whole-genome sequencing technology, more sophisticated statistical methods for testing genetic association with both rare and common variants are being investigated to identify the genetic variation between individuals. Several methods that group variants, also called gene-based approaches, have been developed. For instance, advanced extensions of the sequence kernel association test, which is a widely used variant-set test, have been proposed for unrelated samples and extended for family data. Family data have been shown to be powerful when analyzing rare variants. However, most such methods capture familial relatedness using a random effect component within the generalized linear mixed model framework. Therefore, there is a need to develop unified and flexible methods to study the association between a set of genetic variants and a trait, especially for a binary outcome. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models to capture familial dependence structure. In this work, we propose a flexible family-based association test for both rare and common variants in the presence of binary traits. The method, termed novel rare variant association test (NRVAT), uses a marginal logistic model and a Gaussian copula. The latter is employed to model the dependence between relatives. An analytic score-type test is derived. Through simulations, we show that our method can achieve greater power than existing approaches. The proposed model is applied to investigate the association between schizophrenia and bipolar disorder in a family-based cohort consisting of 17 extended families from Eastern Quebec.
Affiliation(s)
- Houssou R. G. Dossa
- Département de Mathématiques, Université du Québec à Montréal (UQAM) et, Québec, Canada
- Alexandre Bureau
- Département de Médecine Sociale et Préventive, Université Laval, Québec, Canada
- Centre de Recherche CERVO, Quebec, Canada
- Michel Maziade
- Centre de Recherche CERVO, Quebec, Canada
- Département de Psychiatrie et Neuroscience, Université Laval, Québec, Canada
- Lajmi Lakhal-Chaieb
- Département de Mathématiques et Statistique, Université Laval, Québec, Canada
- Karim Oualkacha
- Département de Mathématiques, Université du Québec à Montréal (UQAM) et, Québec, Canada
123
St-Pierre J, Oualkacha K. A copula-based set-variant association test for bivariate continuous, binary or mixed phenotypes. Int J Biostat 2023; 19:369-387. [PMID: 36279152 PMCID: PMC10644254 DOI: 10.1515/ijb-2022-0010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2022] [Revised: 05/26/2022] [Accepted: 08/23/2022] [Indexed: 11/15/2022]
Abstract
In genome-wide association studies (GWAS), researchers often deal with dichotomous and non-normally distributed traits, or a mixture of discrete and continuous traits. However, most current region-based methods rely on multivariate linear mixed models (mvLMMs) and assume a multivariate normal distribution for the phenotypes of interest. Hence, these methods are not applicable to disease or non-normally distributed traits. Therefore, there is a need to develop unified and flexible methods to study association between a set of (possibly rare) genetic variants and non-normal multivariate phenotypes. Copulas are multivariate distribution functions with uniform margins on the [0, 1] interval, and they provide suitable models to deal with non-normality of errors in multivariate association studies. We propose a novel unified and flexible copula-based multivariate association test (CBMAT) for discovering association between a genetic region and a bivariate continuous, binary or mixed phenotype. We also derive a data-driven analytic p-value procedure for the proposed region-based score-type test. Through simulation studies, we demonstrate that CBMAT has well-controlled type I error rates and higher power to detect associations compared with other existing methods, for discrete and non-normally distributed traits. Finally, we apply CBMAT to detect the association between two genes located on chromosome 11 and several lipid levels measured on 1477 subjects from the ALSPAC study.
Affiliation(s)
- Julien St-Pierre
- Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada
- Karim Oualkacha
- Département de Mathématiques, Université du Québec à Montréal, Montreal, QC, Canada
124
Rajabli F, Kunkle BW. Strategies in Aggregation Tests for Rare Variants. Curr Protoc 2023; 3:e931. [PMID: 37988228 DOI: 10.1002/cpz1.931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Genome-wide association studies (GWAS) successfully identified numerous common variants involved in complex diseases, but only limited heritability was explained by these findings. Advances in high-throughput sequencing technology made it possible to assess the contribution of rare variants in common diseases. However, the study of rare variants introduces challenges because of their low frequency. Well-established common variant methods were underpowered to identify rare variants in GWAS. To address this challenge, several new methods have been developed to examine the role of rare variants in complex diseases. These approaches are based on testing the aggregate effect of multiple rare variants in a predefined genetic region. Provided here is an overview of statistical approaches, together with protocols that explain step-by-step analysis with aggregation tests and offer hands-on experience using R scripts, in four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. Also explained are the concepts of rare variants, permutation tests, kernel methods, and genetic variant annotation. At the end we discuss relevant topics: bioinformatics tools for annotation, family-based design of rare-variant analysis, population stratification adjustment, and meta-analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
Affiliation(s)
- Farid Rajabli
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
- Brian W Kunkle
- Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
125
Gupta K, Wiener HW, Tiwari HK, Geisler WM. HLA-DQB1*06 and Select Neighboring HLA Variants Predict Chlamydia Reinfection Risk. Int J Mol Sci 2023; 24:15803. [PMID: 37958786 PMCID: PMC10647357 DOI: 10.3390/ijms242115803] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Revised: 10/25/2023] [Accepted: 10/27/2023] [Indexed: 11/15/2023] Open
Abstract
Associations of HLA class II alleles with genital chlamydial infection outcomes have been reported, especially HLA DQB1*06. However, the potential role of DQB1*06 in influencing reinfection risk has still not been established. The purpose of this study was to determine whether the association of DQB1*06 with chlamydia reinfection was impacted by any other nearby HLA class II variants that were also associated with reinfection. We used next-generation sequencing to map HLA class II variants spanning the HLA-DQ and -DR loci. DQB1*06 as well as DQB1*04 were confirmed as significant predictors of chlamydia reinfection, when controlling for age and percent African ancestry. SKAT analysis revealed one region each in DRB1, DRB5, DQA2, and three intergenic regions that had variants associated with reinfection. Further analyses of these variants revealed that rs112651494 within DRB5 and an intergenic SNP rs617058 in DRB1:DQA1 were significantly associated with reinfection, but this did not impact the significance of the association of DQB1*06 or DQB1*04 with reinfection.
Affiliation(s)
- Kanupriya Gupta
- Department of Medicine, Division of Infectious Diseases, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Howard W. Wiener
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Hemant K. Tiwari
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- William M. Geisler
- Department of Medicine, Division of Infectious Diseases, University of Alabama at Birmingham, Birmingham, AL 35294, USA
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL 35294, USA
126
Das Adhikari S, Cui Y, Wang J. BayesKAT: Bayesian Optimal Kernel-based Test for genetic association studies reveals joint genetic effects in complex diseases. bioRxiv 2023:2023.10.18.562824. [PMID: 37905124 PMCID: PMC10614916 DOI: 10.1101/2023.10.18.562824] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/02/2023]
Abstract
GWAS methods have identified individual SNPs significantly associated with specific phenotypes. Nonetheless, many complex diseases are polygenic and are controlled by multiple genetic variants that are usually non-linearly dependent. These genetic variants have weak marginal effects and remain undetected in GWAS analysis. Kernel-based tests (KBT), which evaluate the joint effect of a group of genetic variants, are therefore critical for complex disease analysis. However, the choice of kernel function in KBT can significantly influence type I error control and power, and selecting the optimal kernel remains a statistically challenging task. The few existing methods suffer from inflated type I errors, limited scalability, inferior power, or ambiguous conclusions. Here, we present a new Bayesian framework, BayesKAT (https://github.com/wangjr03/BayesKAT), which overcomes these kernel specification issues by selecting the optimal composite kernel adaptively from the data while testing genetic associations simultaneously. Furthermore, BayesKAT implements a scalable computational strategy to boost its applicability, especially for high-dimensional cases where other methods become less effective. Based on a series of performance comparisons using both simulated and real large-scale genetics data, BayesKAT outperforms the available methods in detecting complex group-level associations and controlling type I errors simultaneously. Applied to a variety of groups of functionally related genetic variants based on biological pathways, co-expression gene modules, and protein complexes, BayesKAT deciphers the complex genetic basis and provides mechanistic insights into human diseases.
127
Hai Y, Zhao W, Meng Q, Liu L, Wen Y. Bayesian linear mixed model with multiple random effects for family-based genetic studies. Front Genet 2023; 14:1267704. [PMID: 37928242 PMCID: PMC10620972 DOI: 10.3389/fgene.2023.1267704] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 09/25/2023] [Indexed: 11/07/2023] Open
Abstract
Motivation: Family-based study design is one of the popular designs used in genetic research, and the whole-genome sequencing data obtained from family-based studies offer many unique features for risk prediction studies. They can not only provide a more comprehensive view of many complex diseases, but also utilize information in the design to further improve the prediction accuracy. While promising, existing analytical methods often ignore the information embedded in the study design and overlook the predictive effects of rare variants, leading to a prediction model with sub-optimal performance. Results: We proposed a Bayesian linear mixed model for the prediction analysis of sequencing data obtained from family-based studies. Our method can not only capture predictive effects from both common and rare variants, but also easily accommodate various disease model assumptions. It uses information embedded in the study design to form surrogates, where the predictive effects from unmeasured/unknown genetic and environmental risk factors can be modelled. Through extensive simulation studies and the analysis of sequencing data obtained from the Michigan State University Twin Registry study, we have demonstrated that the proposed method outperforms commonly adopted techniques. Availability: R package is available at https://github.com/yhai943/FBLMM.
Affiliation(s)
- Yang Hai
- Department of Statistics, University of Auckland, Auckland, New Zealand
- Wenxuan Zhao
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Qingyu Meng
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Long Liu
- Department of Health Statistics, School of Public Health, Shanxi Medical University, Taiyuan, China
- Yalu Wen
- Department of Statistics, University of Auckland, Auckland, New Zealand
128
Falk I, Zhao M, Nait Saada J, Guo Q. Learning the kernel for rare variant genetic association test. Front Genet 2023; 14:1245238. [PMID: 37886683 PMCID: PMC10598548 DOI: 10.3389/fgene.2023.1245238] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/23/2023] [Accepted: 09/14/2023] [Indexed: 10/28/2023] Open
Abstract
Introduction: Compared to Genome-Wide Association Studies (GWAS) for common variants, single-marker association analysis for rare variants is underpowered. Set-based association analyses for rare variants are powerful tools that capture some of the missing heritability in trait association studies. Methods: We extend the convex-optimized SKAT (cSKAT) test set procedure, which learns the optimal convex combination of kernels from data, to the full Generalised Linear Model (GLM) setting with arbitrary non-genetic covariates. We call this extended cSKAT (ecSKAT) and show that the resulting optimization problem is a quadratic programming problem that can be solved with no additional cost compared to cSKAT. Results: We show that a modified objective is related to an upper bound for the p-value through a decreasing exponential term in the objective function, indicating that optimizing this objective function is a principled way of learning the combination of kernels. We evaluate the performance of the proposed method on continuous and binary traits using simulation studies and illustrate its application using UK Biobank Whole Exome Sequencing data on hand grip strength and systemic lupus erythematosus rare variant association analysis. Discussion: Our proposed ecSKAT method enables correcting for important confounders in association studies, such as age, sex or population structure, for both quantitative and binary traits. Simulation studies showed that ecSKAT can recover sensible weights and achieve higher power across different sample sizes and misspecification settings. Compared to the burden test and SKAT method, ecSKAT gives a lower p-value for the genes tested in both quantitative and binary traits in the UK Biobank cohort.
Affiliation(s)
- Isak Falk
- Department of Computer Science, University College London, London, United Kingdom
- Computational Statistics and Machine Learning, Italian Institute of Technology, Genoa, Italy
| | - Qi Guo
- BenevolentAI, London, United Kingdom
|
129
|
Pan R, Dickie EW, Hawco C, Reid N, Voineskos AN, Park JY. Spatial-extent inference for testing variance components in reliability and heritability studies. bioRxiv 2023:2023.04.19.537270. [PMID: 37131799 PMCID: PMC10153210 DOI: 10.1101/2023.04.19.537270] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 05/04/2023]
Abstract
Clusterwise inference is a popular approach in neuroimaging to increase sensitivity, but most existing methods are currently restricted to the General Linear Model (GLM) for testing mean parameters. Statistical methods for testing variance components, which are critical in neuroimaging studies that involve estimation of narrow-sense heritability or test-retest reliability, are underdeveloped due to methodological and computational challenges, which can lead to low power. We propose a fast and powerful test for variance components called CLEAN-V (CLEAN for testing Variance components). CLEAN-V models the global spatial dependence structure of imaging data and computes a locally powerful variance component test statistic by data-adaptively pooling neighborhood information. Correction for multiple comparisons is achieved by permutations to control the family-wise error rate (FWER). Through analysis of task-fMRI data from the Human Connectome Project across five tasks and comprehensive data-driven simulations, we show that CLEAN-V outperforms existing methods in detecting test-retest reliability and narrow-sense heritability with significantly improved power, with the detected areas aligning with activation maps. The computational efficiency of CLEAN-V also speaks to its practical utility, and it is available as an R package.
Affiliation(s)
- Ruyi Pan
- Department of Statistical Sciences, University of Toronto, Toronto, ON, M5G 1Z5, Canada
- The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
| | - Erin W. Dickie
- The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5T 1R8, Canada
| | - Colin Hawco
- The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5T 1R8, Canada
| | - Nancy Reid
- Department of Statistical Sciences, University of Toronto, Toronto, ON, M5G 1Z5, Canada
| | - Aristotle N. Voineskos
- The Centre for Addiction and Mental Health, Toronto, ON, M5T 1R8, Canada
- Department of Psychiatry, University of Toronto, Toronto, ON, M5T 1R8, Canada
| | - Jun Young Park
- Department of Statistical Sciences, University of Toronto, Toronto, ON, M5G 1Z5, Canada
- Department of Psychology, University of Toronto, Toronto, ON, M5G 1Z5, Canada
|
130
|
Lee WP, Wang H, Dombroski B, Cheng PL, Tucci A, Si YQ, Farrell J, Tzeng JY, Leung YY, Malamon J, Wang LS, Vardarajan B, Farrer L, Schellenberg G. Structural Variation Detection and Association Analysis of Whole-Genome-Sequence Data from 16,905 Alzheimer's Diseases Sequencing Project Subjects. Research Square 2023:rs.3.rs-3353179. [PMID: 37886469 PMCID: PMC10602095 DOI: 10.21203/rs.3.rs-3353179/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2023]
Abstract
Structural variations (SVs) are important contributors to the genetics of human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. We analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (N = 16,905) and identified 400,234 (168,223 high-quality) SVs. Laboratory validation yielded a sensitivity of 82% (85% for high-quality). We found a significant burden of deletions and duplications in AD cases, particularly for singletons and homozygous events. In AD genes, we observed ultra-rare SVs associated with the disease, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1. Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, exemplified by a 5 kb deletion in complete LD with rs143080277 in NCK2. We also identified 16 SVs associated with AD and 13 SVs linked to AD-related pathological/cognitive endophenotypes. This study highlights the pivotal role of SVs in shaping our understanding of AD genetics.
|
131
|
Norden-Krichmar TM, Rotroff D, Schwantes-An TH, Bataller R, Goldman D, Nagy LE, Liangpunsakul S. Genomic approaches to explore susceptibility and pathogenesis of alcohol use disorder and alcohol-associated liver disease. Hepatology 2023:01515467-990000000-00586. [PMID: 37796138 PMCID: PMC10985049 DOI: 10.1097/hep.0000000000000617] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Accepted: 08/13/2023] [Indexed: 10/06/2023]
Abstract
Excessive alcohol use is a major risk factor for the development of an alcohol use disorder (AUD) and contributes to a wide variety of other medical illnesses, including alcohol-associated liver disease (ALD). Both AUD and ALD are complex and causally interrelated diseases, and multiple factors other than alcohol consumption are implicated in the disease pathogenesis. While the underlying pathophysiology of AUD and ALD is complex, there is substantial evidence for a genetic susceptibility to both diseases. Current genome-wide association studies indicate that the genes associated with clinical AUD only poorly overlap with the genes identified for heavy drinking and, in turn, neither overlap with the genes identified for ALD. Uncovering the main genetic factors will enable us to identify molecular drivers underlying the pathogenesis, discover potential targets for therapy, and implement patient care early in disease progression. In this review, we describe multiple genomic approaches and their implications for investigating the susceptibility and pathogenesis of both AUD and ALD. We conclude with a discussion of the knowledge gaps and future research on genomic studies in these 2 diseases.
Affiliation(s)
| | - Daniel Rotroff
- Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, OH
| | - Tae-Hwi Schwantes-An
- Department of Medical & Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN
| | - Ramon Bataller
- Liver Unit, Institut of Digestive and Metabolic Diseases, Hospital Clinic, Barcelona, Spain
- Institut d’Investigacions Biomediques August Pi i Sunyer (IDIBAPS)
| | - David Goldman
- Laboratory of Neurogenetics and Office of the Clinical Director, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD
| | - Laura E. Nagy
- Center for Liver Disease Research, Department of Inflammation and Immunity, Cleveland Clinic, Cleveland, OH
- Gastroenterology and Hepatology, Cleveland Clinic, Cleveland, OH
- Department of Molecular Medicine, Case Western Reserve University, Cleveland, OH
| | - Suthat Liangpunsakul
- Division of Gastroenterology and Hepatology, Department of Medicine, Indianapolis, IN
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, IN
- Roudebush Veterans Administration Medical Center, Indianapolis, IN
|
132
|
Huang M, Lyu C, Liu N, Nembhard WN, Witte JS, Hobbs CA, Li M. A gene-based association test of interactions for maternal-fetal genotypes identifies genes associated with nonsyndromic congenital heart defects. Genet Epidemiol 2023; 47:475-495. [PMID: 37341229 DOI: 10.1002/gepi.22533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2023] [Revised: 04/13/2023] [Accepted: 06/02/2023] [Indexed: 06/22/2023]
Abstract
The risk of congenital heart defects (CHDs) may be influenced by maternal genes, fetal genes, and their interactions. Existing methods commonly test the effects of maternal and fetal variants one-at-a-time and may have reduced statistical power to detect genetic variants with low minor allele frequencies. In this article, we propose a gene-based association test of interactions for maternal-fetal genotypes (GATI-MFG) using a case-mother and control-mother design. GATI-MFG can integrate the effects of multiple variants within a gene or genomic region and evaluate the joint effect of maternal and fetal genotypes while allowing for their interactions. In simulation studies, GATI-MFG had improved statistical power over alternative methods, such as the single-variant test and functional data analysis (FDA) under various disease scenarios. We further applied GATI-MFG to a two-phase genome-wide association study of CHDs for the testing of both common variants and rare variants using 947 CHD case mother-infant pairs and 1306 control mother-infant pairs from the National Birth Defects Prevention Study (NBDPS). After Bonferroni adjustment for 23,035 genes, two genes on chromosome 17, TMEM107 (p = 1.64e-06) and CTC1 (p = 2.0e-06), were identified for significant association with CHD in common variants analysis. Gene TMEM107 regulates ciliogenesis and ciliary protein composition and was found to be associated with heterotaxy. Gene CTC1 plays an essential role in protecting telomeres from degradation, which was suggested to be associated with cardiogenesis. Overall, GATI-MFG outperformed the single-variant test and FDA in the simulations, and the results of application to NBDPS samples are consistent with existing literature supporting the association of TMEM107 and CTC1 with CHDs.
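The Bonferroni adjustment mentioned above can be checked with a few lines of arithmetic. The sketch assumes a family-wise significance level of 0.05, which the abstract does not state; the gene count and the two p-values are taken from the abstract, and the variable names are illustrative.

```python
n_genes = 23035
alpha = 0.05  # assumed family-wise level; not stated in the abstract

# Bonferroni: each gene is tested at alpha divided by the number of tests.
threshold = alpha / n_genes

reported = {"TMEM107": 1.64e-06, "CTC1": 2.0e-06}
significant = {gene: p for gene, p in reported.items() if p < threshold}

print(f"per-gene threshold: {threshold:.3e}")
print("genes passing:", sorted(significant))
```

Under that assumed level the per-gene threshold is about 2.17e-06, so both reported p-values fall below it.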
Affiliation(s)
- Manyan Huang
- Department of Epidemiology and Biostatistics, Indiana University Bloomington, Bloomington, Indiana, USA
| | - Chen Lyu
- Department of Population Health, New York University Grossman School of Medicine, New York City, New York, USA
| | - Nianjun Liu
- Department of Epidemiology and Biostatistics, Indiana University Bloomington, Bloomington, Indiana, USA
| | - Wendy N Nembhard
- Department of Epidemiology, University of Arkansas for Medical Sciences, Little Rock, Arkansas, USA
| | - John S Witte
- Department of Epidemiology and Population Health, Stanford University, Stanford, California, USA
- Department of Biomedical Data Sciences, Stanford University, Stanford, California, USA
| | - Charlotte A Hobbs
- Rady Children's Institute for Genomic Medicine, San Diego, California, USA
| | - Ming Li
- Department of Epidemiology and Biostatistics, Indiana University Bloomington, Bloomington, Indiana, USA
|
133
|
Zhu Y, Ryu S, Tare A, Barzilai N, Atzmon G, Suh Y. Targeted sequencing of the 9p21.3 region reveals association with reduced disease risks in Ashkenazi Jewish centenarians. Aging Cell 2023; 22:e13962. [PMID: 37605876 PMCID: PMC10577543 DOI: 10.1111/acel.13962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/05/2023] [Revised: 07/22/2023] [Accepted: 08/01/2023] [Indexed: 08/23/2023] Open
Abstract
Genome-wide association studies (GWAS) have pinpointed the chromosomal locus 9p21.3 as a genetic hotspot for various age-related disorders. Common genetic variants in this locus are linked to multiple traits, including coronary artery diseases, cancers, and diabetes. Centenarians are known for their reduced risk and delayed onset of these conditions. To investigate whether this evasion of disease risks involves diminished genetic risks in the 9p21.3 locus, we sequenced this region in an Ashkenazi Jewish centenarian cohort (centenarians: n = 450, healthy controls: n = 500). Risk alleles associated with cancers, glaucoma, CAD, and T2D showed a significant depletion in centenarians. Furthermore, the risk and non-risk genotypes are linked to two distinct low-frequency variant profiles, enriched in controls and centenarians, respectively. Our findings provide evidence that the extreme longevity cohort is associated with collectively lower risks of multiple age-related diseases in the 9p21.3 locus.
Affiliation(s)
- Yizhou Zhu
- Department of Obstetrics and Gynecology, Columbia University, New York City, New York, USA
| | - Seungjin Ryu
- Department of Pharmacology, College of Medicine, Hallym University, Chuncheon, Gangwon, Korea
| | - Archana Tare
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
| | - Nir Barzilai
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Institute for Aging Research, Albert Einstein College of Medicine, Bronx, New York, USA
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, USA
| | - Gil Atzmon
- Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
- Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, USA
- Department of Human Biology, Faculty of Natural Sciences, University of Haifa, Haifa, Israel
| | - Yousin Suh
- Department of Obstetrics and Gynecology, Columbia University, New York City, New York, USA
- Department of Genetics and Development, Columbia University, New York City, New York, USA
|
134
|
Jin X, Shi G. Cauchy combination methods for the detection of gene-environment interactions for rare variants related to quantitative phenotypes. Heredity (Edinb) 2023; 131:241-252. [PMID: 37481617 PMCID: PMC10539363 DOI: 10.1038/s41437-023-00640-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2022] [Revised: 07/09/2023] [Accepted: 07/12/2023] [Indexed: 07/24/2023] Open
Abstract
The characterization of gene-environment interactions (GEIs) can provide detailed insights into the biological mechanisms underlying complex diseases. Despite recent interest in GEIs for rare variants, published GEI tests are underpowered when only an extremely small proportion of the rare variants in a gene or region are causal. By extending the aggregated Cauchy association test (ACAT), we propose three GEI tests to address this issue: a Cauchy combination GEI test with fixed main effects (CCGEI-F), a Cauchy combination GEI test with random main effects (CCGEI-R), and an omnibus Cauchy combination GEI test (CCGEI-O). ACAT was applied to combine p values of single-variant GEI analyses to obtain CCGEI-F and CCGEI-R, and p values of multiple GEI tests were combined in CCGEI-O. In numerical simulations with small numbers of causal variants, CCGEI-F, CCGEI-R and CCGEI-O provided approximately 5% higher power than the existing GEI tests INT-FIX and INT-RAN, but only slightly higher power than the existing GEI test TOW-GE. For large numbers of causal variants, although CCGEI-F and CCGEI-R exhibited comparable or slightly lower power values than the competing tests, the results were still satisfactory. Among all simulation conditions evaluated, CCGEI-O provided significantly higher power than that of competing GEI tests. We further applied our GEI tests in genome-wide analyses of systolic blood pressure or diastolic blood pressure to detect gene-body mass index (BMI) interactions, using whole-exome sequencing data from UK Biobank. At a suggestive significance level of 1.0 × 10⁻⁴, KCNC4, GAR1, FAM120AOS and NT5C3B showed interactions with BMI by our GEI tests.
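For readers unfamiliar with the Cauchy combination rule that ACAT is built on, a minimal sketch follows. It shows only the generic step of combining a list of p-values; it is not the authors' CCGEI implementation. The function name and the default of equal weights are assumptions, and p-values of exactly 0 or 1 are not handled.

```python
import math


def cauchy_combine(p_values, weights=None):
    """Combine p-values with the Cauchy combination rule (ACAT-style)."""
    if weights is None:
        # Equal weights are an assumption made for this sketch.
        weights = [1.0] * len(p_values)
    total = sum(weights)
    # Map each p-value to a standard Cauchy variate; a small p gives a
    # large positive value, so one strong signal can dominate the sum.
    stat = sum(w * math.tan((0.5 - p) * math.pi)
               for w, p in zip(weights, p_values)) / total
    # Upper-tail probability of the standard Cauchy distribution.
    return 0.5 - math.atan(stat) / math.pi


# Hypothetical single-variant p-values: one strong signal among null results.
combined = cauchy_combine([1e-4, 0.5, 0.5])
```

With one p-value of 1e-4 and two uninformative p-values of 0.5, the combined p-value is close to 3e-4. This shows why the rule keeps power when only a few variants in a region carry signal.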
Affiliation(s)
- Xiaoqin Jin
- State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
| | - Gang Shi
- State Key Laboratory of Integrated Services Networks, Xidian University, 2 South Taibai Road, Xi'an, Shaanxi, 710071, China
|
135
|
Liang X, Sun H. Weighted Selection Probability to Prioritize Susceptible Rare Variants in Multi-Phenotype Association Studies with Application to a Soybean Genetic Data Set. J Comput Biol 2023; 30:1075-1088. [PMID: 37871292 DOI: 10.1089/cmb.2022.0487] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2023] Open
Abstract
Rare variant association studies with multiple traits or diseases have drawn a lot of attention since association signals of rare variants can be boosted if more than one phenotype outcome is associated with the same rare variants. Most of the existing statistical methods to identify rare variants associated with multiple phenotypes are based on a group test, where a pre-specified genetic region is tested one at a time. However, these methods are not designed to locate susceptible rare variants within the genetic region. In this article, we propose new statistical methods to prioritize rare variants within a genetic region when a group test for the genetic region identifies a statistical association with multiple phenotypes. It computes the weighted selection probability (WSP) of individual rare variants and ranks them from largest to smallest according to their WSP. In simulation studies, we demonstrated that the proposed method outperforms other statistical methods in terms of true positive selection, when multiple phenotypes are correlated with each other. We also applied it to our soybean single nucleotide polymorphism (SNP) data with 13 highly correlated amino acids, where we identified some potentially susceptible rare variants in chromosome 19.
Affiliation(s)
- Xianglong Liang
- Department of Statistics, Pusan National University, Busan, Korea
| | - Hokeun Sun
- Department of Statistics, Pusan National University, Busan, Korea
|
136
|
Chi J, Xu M, Sheng X, Zhou Y. Association detection between multiple traits and rare variants based on family data via a nonparametric method. PeerJ 2023; 11:e16040. [PMID: 37780393 PMCID: PMC10541022 DOI: 10.7717/peerj.16040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Accepted: 08/15/2023] [Indexed: 10/03/2023] Open
Abstract
Background: The rapid development of next-generation sequencing technologies allows researchers to analyze complex human diseases at the molecular level. It has been shown that rare variants, in addition to common variants, play important roles in human diseases. Thus, effective statistical methods are needed to test for associations between traits (e.g., diseases) and rare variants. Currently, more and more rare genetic variants are being detected throughout the human genome, which makes it possible to study rare variants. Yet complex diseases are usually measured in a variety of forms, such as binary, ordinal, quantitative, or some mixture of them. Therefore, the genetic mapping problem can be framed as detecting associations between multiple traits and multiple loci while adequately accounting for the correlation structure among the traits. Methods: In this article, we construct a new non-parametric statistic using the generalized Kendall's τ theory based on family data. The new test statistic has an asymptotic distribution and can be used to study the associations between multiple traits and rare variants, which broadens the ways to identify genetic factors of complex human diseases. Results: We apply our method (called Nonp-FAM) to analyze simulated data and GAW17 data, and conduct a comprehensive comparison with some existing methods. Experimental results show that the proposed family-based method is powerful and robust for testing associations between multiple traits and rare variants, even if the data have some population stratification effect.
Affiliation(s)
- Jinling Chi
- Department of Statistics, Heilongjiang University, Harbin, China
- School of Mathematics and Statistics, Xidian University, Xi’an, China
| | - Meijuan Xu
- Department of Statistics, Heilongjiang University, Harbin, China
| | - Xiaona Sheng
- School of Information Engineering, Harbin University, Harbin, China
| | - Ying Zhou
- Department of Statistics, Heilongjiang University, Harbin, China
|
137
|
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress made in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet, rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have been recently proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic loci) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigations.
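The collapsing step that defines the burden class can be sketched as follows. This is a generic, unweighted illustration for a case-control design, not a method taken from the review. The function names, the toy genotype matrix and the frequency cutoff of 0.2 (chosen only because the toy sample is tiny) are assumptions, and the alternate allele is assumed to be the minor allele.

```python
def burden_scores(genotypes, freq_cutoff):
    """Sum alternate-allele counts over rare variants for each individual."""
    n = len(genotypes)
    n_variants = len(genotypes[0])
    # Alternate-allele frequency per variant (diploid: 2n chromosomes).
    freqs = [sum(row[m] for row in genotypes) / (2 * n)
             for m in range(n_variants)]
    # Keep polymorphic variants whose frequency is below the cutoff.
    rare = [m for m, f in enumerate(freqs) if 0 < f < freq_cutoff]
    return [sum(row[m] for m in rare) for row in genotypes]


def carrier_table(scores, is_case):
    """Counts: (case carriers, case non-carriers,
    control carriers, control non-carriers)."""
    case_carrier = sum(1 for s, c in zip(scores, is_case) if c and s > 0)
    case_non = sum(1 for s, c in zip(scores, is_case) if c and s == 0)
    ctrl_carrier = sum(1 for s, c in zip(scores, is_case) if not c and s > 0)
    ctrl_non = sum(1 for s, c in zip(scores, is_case) if not c and s == 0)
    return (case_carrier, case_non, ctrl_carrier, ctrl_non)


# Toy data (hypothetical): 6 individuals x 4 variants, coded 0/1/2.
genotypes = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 2],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]
is_case = [True, True, False, True, False, False]

scores = burden_scores(genotypes, freq_cutoff=0.2)
table = carrier_table(scores, is_case)
```

The resulting 2x2 carrier table could then be passed to any standard test of association. The choice of test and of variant weights is where the burden and adaptive-burden classes differ.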
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
| | - Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
| | - Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
| | - Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
|
138
|
Hong H, Schulze KV, Copeland IE, Atyam M, Kamp K, Hanchard NA, Belmont J, Ringel-Kulka T, Heitkemper M, Shulman RJ. Genetic Variants in Carbohydrate Digestive Enzyme and Transport Genes Associated with Risk of Irritable Bowel Syndrome. medRxiv 2023:2023.09.20.23295800. [PMID: 37790351 PMCID: PMC10543038 DOI: 10.1101/2023.09.20.23295800] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/05/2023]
Abstract
Irritable Bowel Syndrome (IBS) is characterized by abdominal pain and alterations in bowel pattern, such as constipation (IBS-C), diarrhea (IBS-D), or mixed (IBS-M). Since malabsorption of ingested carbohydrates (CHO) can cause abdominal symptoms that closely mimic those of IBS, identifying genetic mutations in CHO digestive enzymes associated with IBS symptoms is critical to ascertain IBS pathophysiology. Through candidate gene association studies, we identify several common variants in TREH, SI, SLC5A1 and SLC2A5 that are associated with IBS symptoms. By investigating rare recessive Mendelian or oligogenic inheritance patterns, we identify case-exclusive rare deleterious variation in known disease genes (SI, LCT, ALDOB, and SLC5A1) as well as candidate disease genes (MGAM and SLC5A2), providing potential evidence of monogenic or oligogenic inheritance in a subset of IBS cases. Finally, our data highlight that moderate to severe IBS-associated gastrointestinal symptoms are often observed in IBS cases carrying one or more deleterious rare variants.
Affiliation(s)
- Hyejeong Hong
- Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing
| | - Ian E. Copeland
- Department of Molecular and Human Genetics, Baylor College of Medicine
| | - Manasa Atyam
- Department of Medicine, Baylor College of Medicine
| | - Kendra Kamp
- Department of Biobehavioral Nursing and Health Informatics, University of Washington School of Nursing
| | - Neil A. Hanchard
- Department of Molecular and Human Genetics, Baylor College of Medicine
| | - John Belmont
- Departments of Molecular and Human Genetics and Pediatrics, Baylor College of Medicine
| | - Tamar Ringel-Kulka
- Department of Maternal and Child Health, University of North Carolina at Chapel Hill Gillings School of Global Public Health
| | - Margaret Heitkemper
- Department of Biobehavioral Nursing and Health Informatics, University of Washington School of Nursing
| | - Robert J. Shulman
- Children’s Nutrition Research Center, Department of Pediatrics, Baylor College of Medicine
|
139
|
Wang H, Dombroski BA, Cheng PL, Tucci A, Si YQ, Farrell JJ, Tzeng JY, Leung YY, Malamon JS, Wang LS, Vardarajan BN, Farrer LA, Schellenberg GD, Lee WP. Structural Variation Detection and Association Analysis of Whole-Genome-Sequence Data from 16,905 Alzheimer's Diseases Sequencing Project Subjects. medRxiv 2023:2023.09.13.23295505. [PMID: 37745545 PMCID: PMC10516060 DOI: 10.1101/2023.09.13.23295505] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 09/26/2023]
Abstract
Structural variations (SVs) are important contributors to the genetics of numerous human diseases. However, their role in Alzheimer's disease (AD) remains largely unstudied due to challenges in accurately detecting SVs. Here, we analyzed whole-genome sequencing data from the Alzheimer's Disease Sequencing Project (ADSP, N=16,905 subjects) and identified 400,234 (168,223 high-quality) SVs. We found a significant burden of deletions and duplications in AD cases (OR=1.05, P=0.03), particularly for singletons (OR=1.12, P=0.0002) and homozygous events (OR=1.10, P<0.0004). In AD genes, the ultra-rare SVs, including protein-altering SVs in ABCA7, APP, PLCG2, and SORL1, were associated with AD (SKAT-O P=0.004). Twenty-one SVs are in linkage disequilibrium (LD) with known AD-risk variants, e.g., a deletion (chr2:105731359-105736864) in complete LD (R²=0.99) with rs143080277 (chr2:105749599) in NCK2. We also identified 16 SVs associated with AD and 13 SVs associated with AD-related pathological/cognitive endophenotypes. Our findings demonstrate the broad impact of SVs on AD genetics.
Affiliation(s)
- Hui Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Beth A Dombroski
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Po-Liang Cheng
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Albert Tucci
- Bioinformatics Research Center, North Carolina State University, NC 27695, USA
| | - Ya-Qin Si
- Bioinformatics Research Center, North Carolina State University, NC 27695, USA
| | - John J Farrell
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, MA 02118, USA
| | - Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, NC 27695, USA
| | - Yuk Yee Leung
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - John S Malamon
- Department of Surgery, School of Medicine, University of Colorado, CO 80045, USA
| | - Li-San Wang
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Badri N Vardarajan
- Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University, NY 10032, USA
- Department of Neurology, College of Physicians and Surgeons, Columbia University and the New York Presbyterian Hospital, NY 10032, USA
| | - Lindsay A Farrer
- Department of Medicine (Biomedical Genetics), Boston University School of Medicine, MA 02118, USA
- Department of Neurology, Boston University School of Medicine, MA 02118, USA
- Department of Ophthalmology, Boston University School of Medicine, MA 02118, USA
- Department of Biostatistics, Boston University School of Public Health, MA 02118, USA
- Department of Epidemiology, Boston University School of Public Health, MA 02118, USA
| | - Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
| | - Wan-Ping Lee
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
- Penn Neurodegeneration Genomics Center, Perelman School of Medicine, University of Pennsylvania, PA 19104, USA
|
140
|
Bass AJ, Bian S, Wingo AP, Wingo TS, Cutler DJ, Epstein MP. Identifying latent genetic interactions in genome-wide association studies using multiple traits. bioRxiv 2023:2023.09.11.557155. [PMID: 37745553 PMCID: PMC10515795 DOI: 10.1101/2023.09.11.557155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Genome-wide association studies of complex traits frequently find that SNP-based estimates of heritability are considerably smaller than estimates from classic family-based studies. This 'missing' heritability may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. To circumvent these challenges, we propose a new method to detect genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Our approach, Latent Interaction Testing (LIT), uses the observation that correlated traits with shared latent genetic interactions have trait variance and covariance patterns that differ by genotype. LIT examines the relationship between trait variance/covariance patterns and genotype using a flexible kernel-based framework that is computationally scalable for biobank-sized datasets with a large number of traits. We first use simulated data to demonstrate that LIT substantially increases power to detect latent genetic interactions compared to a trait-by-trait univariate method. We then apply LIT to four obesity-related traits in the UK Biobank and detect genetic variants with interactive effects near known obesity-related genes. Overall, we show that LIT, implemented in the R package lit, uses shared information across traits to improve detection of latent genetic interactions compared to standard approaches.
Affiliation(s)
- Andrew J. Bass
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Shijia Bian
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
- Aliza P. Wingo
- Department of Psychiatry, Emory University, Atlanta, GA 30322, USA
- Thomas S. Wingo
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
- Department of Neurology, Emory University, Atlanta, GA 30322, USA
- David J. Cutler
- Department of Human Genetics, Emory University, Atlanta, GA 30322, USA
|
141
|
Aldisi R, Hassanin E, Sivalingam S, Buness A, Klinkhammer H, Mayr A, Fröhlich H, Krawitz P, Maj C. Gene-based burden scores identify rare variant associations for 28 blood biomarkers. BMC Genom Data 2023; 24:50. [PMID: 37667186 PMCID: PMC10476296 DOI: 10.1186/s12863-023-01155-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 08/28/2023] [Indexed: 09/06/2023] Open
Abstract
BACKGROUND A relevant part of the genetic architecture of complex traits is still unknown, despite the discovery of many disease-associated common variants. Polygenic risk score (PRS) models are based on the evaluation of the additive effects attributable to common variants and have been successfully implemented to assess the genetic susceptibility for many phenotypes. In contrast, burden tests are often used to identify an enrichment of rare deleterious variants in specific genes. Both kinds of genetic contributions are typically analyzed independently. Many studies suggest that complex phenotypes are influenced by both low-effect common variants and high-effect rare deleterious variants. The aim of this paper is to integrate the effects of both common and rare functional variants for more comprehensive genetic risk modeling.
METHODS We developed a framework combining gene-based scores, based on the enrichment of rare functionally relevant variants, with genome-wide PRS based on common variants for association analysis and prediction models. We applied our framework to the UK Biobank dataset with genotyping and exome data and considered the levels of 28 blood biomarkers as target phenotypes. For each biomarker, an association analysis was performed on the full cohort using gene-based scores (GBS). The cohort was then split into 3 subsets for PRS construction and feature selection, predictive model training, and independent evaluation, respectively. Prediction models were generated including either PRS, GBS or both (combined).
RESULTS Association analyses of the cohort were able to detect significant genes that were previously known to be associated with different biomarkers. Interestingly, the analyses also revealed heterogeneous effect sizes and directionality, highlighting the complexity of blood biomarker regulation. However, the combined models for many biomarkers show little or no improvement in prediction accuracy compared to the PRS models.
CONCLUSION This study shows that rare variants play an important role in the genetic architecture of complex multifactorial traits such as blood biomarkers. However, while rare deleterious variants play a strong role at an individual level, our results indicate that classical common-variant-based PRS might be more informative for predicting genetic susceptibility at the population level.
Affiliation(s)
- Rana Aldisi
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany.
- Emadeldin Hassanin
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Luxembourg Center for Systems Biomedicine, University of Luxembourg, Esch-Sur-Alzette, Luxembourg
- Sugirthan Sivalingam
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Core Unit for Bioinformatics Analysis, University Hospital Bonn, Bonn, Germany
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
- Andreas Buness
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Core Unit for Bioinformatics Analysis, University Hospital Bonn, Bonn, Germany
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
- Hannah Klinkhammer
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
- Andreas Mayr
- Institute of Medical Biometry, Informatics and Epidemiology, University Hospital Bonn, Bonn, Germany
- Holger Fröhlich
- Fraunhofer Institute for Algorithms and Scientific Computing, Sankt Augustin, Germany
- Bonn-Aachen International Center for IT (b-it), University of Bonn, Bonn, Germany
- Peter Krawitz
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Carlo Maj
- Institute of Genomic Statistic and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Centre for Human Genetics, University of Marburg, Marburg, Germany
|
142
|
Bocher O, Marenne G, Génin E, Perdry H. Ravages: An R package for the simulation and analysis of rare variants in multicategory phenotypes. Genet Epidemiol 2023; 47:450-460. [PMID: 37158367 DOI: 10.1002/gepi.22529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2023] [Revised: 03/27/2023] [Accepted: 04/25/2023] [Indexed: 05/10/2023]
Abstract
Current software packages for the analysis and simulation of rare variants are only available for binary and continuous traits. Ravages provides solutions in a single R package to perform rare variant association tests for multicategory, binary and continuous phenotypes, to simulate datasets under different scenarios and to compute statistical power. Association tests can be run across the whole genome thanks to a C++ implementation of most of the functions, using either RAVA-FIRST, a recently developed strategy to filter and analyse genome-wide rare variants, or user-defined candidate regions. Ravages also includes a simulation module that generates genetic data for cases, who can be stratified into several subgroups, and for controls. Through comparisons with existing programmes, we show that Ravages complements existing tools and will be useful to study the genetic architecture of complex diseases. Ravages is available on CRAN at https://cran.r-project.org/web/packages/Ravages/ and maintained on GitHub at https://github.com/genostats/Ravages.
Affiliation(s)
- Ozvan Bocher
- Univ Brest, Inserm, EFS, UMR 1078, GGB, Brest, France
- Institute of Translational Genomics, Helmholtz Zentrum München, Munich, Germany
- Hervé Perdry
- CESP Inserm, U1018, UFR Médecine, Univ Paris-Sud, Université Paris-Saclay, Villejuif, France
|
143
|
Xu J, Xu W, Choi J, Brhane Y, Christiani DC, Kothari J, McKay J, Field JK, Davies MPA, Liu G, Amos CI, Hung RJ, Briollais L. Large-scale whole exome sequencing studies identify two genes, CTSL and APOE, associated with lung cancer. PLoS Genet 2023; 19:e1010902. [PMID: 37738239 PMCID: PMC10516417 DOI: 10.1371/journal.pgen.1010902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 08/07/2023] [Indexed: 09/24/2023] Open
Abstract
Common genetic variants associated with lung cancer have been well studied in the past decade. However, only 12.3% of the heritability has been explained by these variants. In this study, we investigate the contribution of rare variants (RVs) (minor allele frequency <0.01) to lung cancer through two large whole exome sequencing case-control studies. We first performed gene-based association tests using a novel Bayes Factor statistic in the International Lung Cancer Consortium, the discovery study (European, 1042 cases vs. 881 controls). The top genes identified were further assessed in the UK Biobank (European, 630 cases vs. 172,864 controls), the replication study. After controlling for the false discovery rate, we found two genes, CTSL and APOE, significantly associated with lung cancer in both studies. Single variant tests in UK Biobank identified 4 RVs (3 missense variants) in CTSL and 2 RVs (1 missense variant) in APOE strongly associated with lung cancer (OR between 2.0 and 139.0). The role of these genetic variants in the regulation of CTSL or APOE expression remains unclear. If such a role is established, this could have important therapeutic implications for lung cancer patients.
Affiliation(s)
- Jingxiong Xu
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Wei Xu
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Jiyeon Choi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, United States of America
- Yonathan Brhane
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- David C. Christiani
- T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- Jui Kothari
- Department of Environmental Health, T. H. Chan School of Public Health, Harvard University, Boston, Massachusetts, United States of America
- James McKay
- International Agency for Research on Cancer, Lyon, France
- John K. Field
- Department of Molecular and Clinical Cancer Medicine, The University of Liverpool, Liverpool, United Kingdom
- Michael P. A. Davies
- Department of Molecular and Clinical Cancer Medicine, The University of Liverpool, Liverpool, United Kingdom
- Geoffrey Liu
- Princess Margaret Cancer Center, University Health Network, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Christopher I. Amos
- Dan L. Duncan Comprehensive Cancer Center, Department of Medicine, Baylor College of Medicine, Houston, Texas, United States of America
- Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, Texas, United States of America
- Rayjean J. Hung
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
- Laurent Briollais
- Prosserman Centre for Population Health Research, Lunenfeld-Tanenbaum Research Institute, Sinai Health, Toronto, Ontario, Canada
- Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
|
144
|
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, there is no single test that fits all needs, each suffering from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests created after an in-depth study of the limitations of each test and their impact on the quality of results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole exome/genome sequencing association studies, is able to handle a wide range of association models, and provides researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
|
145
|
Babadi M, Fu JM, Lee SK, Smirnov AN, Gauthier LD, Walker M, Benjamin DI, Zhao X, Karczewski KJ, Wong I, Collins RL, Sanchis-Juan A, Brand H, Banks E, Talkowski ME. GATK-gCNV enables the discovery of rare copy number variants from exome sequencing data. Nat Genet 2023; 55:1589-1597. [PMID: 37604963 PMCID: PMC10904014 DOI: 10.1038/s41588-023-01449-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 08/30/2022] [Accepted: 06/16/2023] [Indexed: 08/23/2023]
Abstract
Copy number variants (CNVs) are major contributors to genetic diversity and disease. While standardized methods, such as the genome analysis toolkit (GATK), exist for detecting short variants, technical challenges have confounded uniform large-scale CNV analyses from whole-exome sequencing (WES) data. Given the profound impact of rare and de novo coding CNVs on genome organization and human disease, we developed GATK-gCNV, a flexible algorithm to discover rare CNVs from sequencing read-depth information, complete with open-source distribution via GATK. We benchmarked GATK-gCNV in 7,962 exomes from individuals in quartet families with matched genome sequencing and microarray data, finding up to 95% recall of rare coding CNVs at a resolution of more than two exons. We used GATK-gCNV to generate a reference catalog of rare coding CNVs in WES data from 197,306 individuals in the UK Biobank, and observed strong correlations between per-gene CNV rates and measures of mutational constraint, as well as rare CNV associations with multiple traits. In summary, GATK-gCNV is a tunable approach for sensitive and specific CNV discovery in WES data, with broad applications.
Affiliation(s)
- Mehrtash Babadi
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Jack M Fu
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Samuel K Lee
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Andrey N Smirnov
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Laura D Gauthier
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Mark Walker
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- David I Benjamin
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Xuefang Zhao
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Konrad J Karczewski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Isaac Wong
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Ryan L Collins
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Alba Sanchis-Juan
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Harrison Brand
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Eric Banks
- Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Michael E Talkowski
- Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
- Department of Neurology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA.
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
|
146
|
Jiang Z, Zhang H, Ahearn TU, Garcia-Closas M, Chatterjee N, Zhu H, Zhan X, Zhao N. The sequence kernel association test for multicategorical outcomes. Genet Epidemiol 2023; 47:432-449. [PMID: 37078108 DOI: 10.1002/gepi.22527] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2022] [Revised: 03/29/2023] [Accepted: 03/30/2023] [Indexed: 04/21/2023]
Abstract
Disease heterogeneity is ubiquitous in biomedical and clinical studies. In genetic studies, researchers are increasingly interested in understanding the distinct genetic underpinning of subtypes of diseases. However, existing set-based analysis methods for genome-wide association studies are either inadequate or inefficient for handling such multicategorical outcomes. In this paper, we propose a novel set-based association analysis method, SKAT-MC, the sequence kernel association test for multicategorical outcomes (nominal or ordinal), which jointly evaluates the relationship between a set of variants (common and rare) and disease subtypes. Through comprehensive simulation studies, we show that SKAT-MC effectively preserves the nominal type I error rate while substantially increasing the statistical power compared to existing methods under various scenarios. We applied SKAT-MC to the Polish breast cancer study (PBCS) and found that the gene FGFR2 was significantly associated with estrogen receptor (ER)+ and ER- breast cancer subtypes. We also investigated educational attainment using UK Biobank data (N = 127,127) with SKAT-MC and identified 21 significant genes in the genome. Consequently, SKAT-MC is a powerful and efficient analysis tool for genetic association studies with multicategorical outcomes. A freely distributed R package SKAT-MC can be accessed at https://github.com/Zhiwen-Owen-Jiang/SKATMC.
Affiliation(s)
- Zhiwen Jiang
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Haoyu Zhang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
- Thomas U Ahearn
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
- Montserrat Garcia-Closas
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
- Nilanjan Chatterjee
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
- Hongtu Zhu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
- Xiang Zhan
- Department of Biostatistics, Peking University, Beijing, China
- Ni Zhao
- Department of Biostatistics, Johns Hopkins University, Baltimore, Maryland, USA
|
147
|
Hu X, Jiang X, Li J, Zhao N, Gan H, Hu X, Li L, Liu X, Shan H, Bai Y, Pang P. Identification of potential genetic loci and polygenic risk model for Budd-Chiari syndrome in Chinese population. iScience 2023; 26:107287. [PMID: 37539039 PMCID: PMC10393737 DOI: 10.1016/j.isci.2023.107287] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 05/19/2023] [Accepted: 07/02/2023] [Indexed: 08/05/2023] Open
Abstract
Budd-Chiari syndrome (BCS) is characterized by hepatic venous outflow obstruction, posing life-threatening risks in severe cases. Reported risk factors include inherited and acquired hypercoagulable states or other predisposing factors. However, many patients have no identifiable etiology, and causes of BCS differ between the West and East. This study recruited 500 BCS patients and 696 normal individuals for whole-exome sequencing and developed a polygenic risk scoring (PRS) model using PLINK, LASSOSUM, BLUP, and BayesA methods. Risk factors for venous thromboembolism and vascular malformations were also assessed for BCS risk prediction. Ultimately, we discovered potential BCS risk mutations, such as rs1042331, and the optimal BayesA-generated PRS model presented an AUC >0.9 in the external replication cohort. This model provides particular insights into genetic risk differences between China and the West and suggests shared genetic risks among BCS, venous thromboembolism, and vascular malformations, offering different perspectives on BCS pathogenesis.
Affiliation(s)
- Xiaojun Hu
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xiaosen Jiang
- BGI-Shenzhen, Shenzhen, China
- College of Life Sciences, University of the Chinese Academy of Sciences, Beijing, China
- Jia Li
- BGI Genomics, BGI-Shenzhen, Shenzhen, China
- Hebei Industrial Technology Research Institute of Genomics in Maternal & Child Health, Shijiazhuang BGI Genomics Co., Ltd, Shijiazhuang, China
- Ni Zhao
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Hairun Gan
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xinyan Hu
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Luting Li
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Xingtao Liu
- Changfeng Hospital of Jinjiang District, Chengdu, China
- Hong Shan
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Pengfei Pang
- Center for Interventional Medicine, Fifth Affiliated Hospital of Sun Yat-sen University, Zhuhai, China
- Guangdong Provincial Key Laboratory of Biomedical Imaging, Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
- Guangdong Provincial Engineering Research Center of Molecular Imaging, Fifth Affiliated Hospital, Sun Yat-sen University, Zhuhai, China
|
148
|
Fu B, Pazokitoroudi A, Sudarshan M, Liu Z, Subramanian L, Sankararaman S. Fast kernel-based association testing of non-linear genetic effects for biobank-scale data. Nat Commun 2023; 14:4936. [PMID: 37582955 PMCID: PMC10427662 DOI: 10.1038/s41467-023-40346-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2022] [Accepted: 07/18/2023] [Indexed: 08/17/2023] Open
Abstract
Our knowledge of non-linear genetic effects on complex traits remains limited, in part due to the modest power to detect such effects. While kernel-based tests offer a versatile approach to test for non-linear relationships between sets of genetic variants and traits, current approaches cannot be applied to Biobank-scale datasets containing hundreds of thousands of individuals. We propose FastKAST, a kernel-based approach that can test for non-linear effects of a set of variants on a quantitative trait. FastKAST provides calibrated hypothesis tests while enabling analysis of Biobank-scale datasets with hundreds of thousands of unrelated individuals from a homogeneous population. We apply FastKAST to 53 quantitative traits measured across ≈300K unrelated white British individuals in the UK Biobank to detect sets of variants with non-linear effects at genome-wide significance.
Affiliation(s)
- Boyang Fu
- Department of Computer Science, UCLA, Los Angeles, CA, USA.
- Mukund Sudarshan
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Zhengtong Liu
- Department of Computer Science, UCLA, Los Angeles, CA, USA
- Lakshminarayanan Subramanian
- Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
- Department of Population Health, NYU Grossman School of Medicine, New York, NY, USA
- Sriram Sankararaman
- Department of Computer Science, UCLA, Los Angeles, CA, USA.
- Department of Human Genetics, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
- Department of Computational Medicine, David Geffen School of Medicine, UCLA, Los Angeles, CA, USA.
|
149
|
Stamp J, DenAdel A, Weinreich D, Crawford L. Leveraging the genetic correlation between traits improves the detection of epistasis in genome-wide association studies. G3 (BETHESDA, MD.) 2023; 13:jkad118. [PMID: 37243672 PMCID: PMC10484060 DOI: 10.1093/g3journal/jkad118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2023] [Revised: 01/11/2023] [Accepted: 05/23/2023] [Indexed: 05/29/2023]
Abstract
Epistasis, commonly defined as the interaction between genetic loci, is known to play an important role in the phenotypic variation of complex traits. As a result, many statistical methods have been developed to identify genetic variants that are involved in epistasis, and nearly all of these approaches carry out this task by analyzing one trait at a time. Previous studies have shown that jointly modeling multiple phenotypes can often dramatically increase statistical power for association mapping. In this study, we present the "multivariate MArginal ePIstasis Test" (mvMAPIT), a multioutcome generalization of a recently proposed epistatic detection method that seeks to detect marginal epistasis, that is, the combined pairwise interaction effects between a given variant and all other variants. By searching for marginal epistatic effects, one can identify genetic variants that are involved in epistasis without the need to identify the exact partners with which the variants interact, thus potentially alleviating much of the statistical and computational burden associated with conventional explicit search-based methods. Our proposed mvMAPIT builds upon this strategy by taking advantage of the correlation structure between traits to improve the identification of variants involved in epistasis. We formulate mvMAPIT as a multivariate linear mixed model and develop a multitrait variance component estimation algorithm for efficient parameter inference and P-value computation. Together with reasonable model approximations, our proposed approach is scalable to moderately sized genome-wide association studies. With simulations, we illustrate the benefits of mvMAPIT over univariate (or single-trait) epistatic mapping strategies. We also apply the mvMAPIT framework to protein sequence data from two broadly neutralizing anti-influenza antibodies and to approximately 2,000 heterogeneous stock mice from the Wellcome Trust Centre for Human Genetics. The mvMAPIT R package can be downloaded at https://github.com/lcrawlab/mvMAPIT.
Affiliation(s)
- Julian Stamp
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Alan DenAdel
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Daniel Weinreich
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Ecology, Evolution, and Organismal Biology, Brown University, Providence, RI 02906, USA
- Lorin Crawford
- Center for Computational Molecular Biology, Brown University, Providence, RI 02906, USA
- Department of Biostatistics, Brown University, Providence, RI 02903, USA
- Microsoft Research New England, Cambridge, MA 02142, USA
|
150
|
McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. An allelic-series rare-variant association test for candidate-gene discovery. Am J Hum Genet 2023; 110:1330-1342. [PMID: 37494930 PMCID: PMC10432147 DOI: 10.1016/j.ajhg.2023.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Revised: 06/30/2023] [Accepted: 07/01/2023] [Indexed: 07/28/2023] Open
Abstract
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
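The idea of a burden score whose weights increase with mutational severity can be sketched as follows. This is an illustration under stated assumptions, not the COAST implementation: the three severity categories, the weights (1, 2, 3), and both function names are hypothetical, and the association step is a plain least-squares slope rather than the published test.

```python
import numpy as np

def allelic_series_burden(genotypes, categories, weights=(1.0, 2.0, 3.0)):
    """Weighted burden per subject; the weight grows with the variant's severity category.

    genotypes:  (subjects x variants) array of minor-allele counts
    categories: per-variant severity index (0 = least severe), a hypothetical annotation
    weights:    one illustrative weight per severity category
    """
    per_variant_weight = np.asarray(weights)[np.asarray(categories)]
    return np.asarray(genotypes, dtype=float) @ per_variant_weight

def burden_slope(burden, phenotype):
    """Least-squares slope of the phenotype on the burden score."""
    slope, _intercept = np.polyfit(burden, phenotype, 1)
    return slope

genotypes = np.array([[1, 0, 0],
                      [0, 1, 0],
                      [0, 0, 1],
                      [1, 1, 1]])
categories = [0, 1, 2]  # one variant in each severity category
burden = allelic_series_burden(genotypes, categories)
print(burden)
```

With equal weights the score reduces to an ordinary count-based burden; the increasing weights encode the assumption that more severe variants have larger effects.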
Affiliation(s)
- Francesco Paolo Casale
- Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany; Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
|