1
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
2
Dai J, Chen K, Zhu Y, Xia L, Wang T, Yuan Z, Zeng P. Identifying risk loci for obsessive-compulsive disorder and shared genetic component with schizophrenia: A large-scale multi-trait association analysis with summary statistics. Prog Neuropsychopharmacol Biol Psychiatry 2024; 129:110906. [PMID: 38043635 DOI: 10.1016/j.pnpbp.2023.110906] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2023] [Revised: 11/26/2023] [Accepted: 11/28/2023] [Indexed: 12/05/2023]
Abstract
Due to limited samples, no genetic loci have been identified for obsessive-compulsive disorder (OCD) in genome-wide association studies. Additionally, although co-morbidities between OCD and schizophrenia (SCZ) were observed, their common genetic etiology was not completely known. Here, we conducted a comprehensive investigation regarding the genetic architecture of OCD and the common genetic foundation shared by OCD and SCZ using summary statistics data (2688 cases and 7037 controls for OCD; 53,386 cases and 77,258 controls for SCZ). We discovered significant genetic correlation between OCD and SCZ (r̂g = 0.296, P = 2.82 × 10⁻¹¹). We then performed two multi-trait association analyses to detect OCD-associated loci and colocalization analysis to detect causal variants. Parallel gene-level analyses were also implemented. We identified 323 OCD-relevant variants located within 12 loci, with four loci sharing the same causal variants between OCD and SCZ. Further, the gene-level analyses discovered 8 OCD-associated genes. Finally, multiple functional analyses at both SNP and gene levels showed that these genetic association signals had significant enrichments in the regions of left ventricle and anterior cingulate cortex, and suggested an important role of pathways involving regulation of telomere maintenance, histone phosphorylation, and GnRH secretion. Overall, this study identified new genetic loci for OCD and provided substantial evidence supporting common genetic foundation underlying OCD and SCZ. The findings advanced our understanding of genetic architecture and pathophysiology of OCD as well as shedding light on shared genetic etiology of the two disorders.
Affiliation(s)
- Jing Dai
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Keying Chen
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Yiyang Zhu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Lei Xia
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhongshang Yuan
- Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China; Institute for Medical Dataology, Cheeloo College of Medicine, Shandong University, Jinan, Shandong 250012, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China.
3
Chen K, Gao T, Liu Y, Zhu K, Wang T, Zeng P. Identifying risk loci for FTD and shared genetic component with ALS: A large-scale multitrait association analysis. Neurobiol Aging 2024; 134:28-39. [PMID: 37979250 DOI: 10.1016/j.neurobiolaging.2023.09.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/18/2023] [Accepted: 09/25/2023] [Indexed: 11/20/2023]
Abstract
Current genome-wide association studies of frontotemporal dementia (FTD) are underpowered due to limited samples. Further, common genetic etiologies between FTD and amyotrophic lateral sclerosis (ALS) remain unknown. Using the largest summary statistics of FTD (3526 cases and 9402 controls) and ALS (27,205 cases and 110,881 controls), we found a significant genetic correlation between them (r̂g = 0.637, P = 0.032) and identified 190 FTD-related variants within 5 loci (3p22.1, 5q35.1, 9p21.2, 19p13.11, and 20q13.13). Among these, ALS and FTD had causal variants in 9p21.2 and 19p13.11. Moreover, MOBP (3p22.1), C9orf72 (9p21.2), MOB3B (9p21.2), UNC13A (19p13.11), SLC9A8 (20q13.13), SNAI1 (20q13.13), and SPATA2 (20q13.13) were discovered by both SNP- and gene-level analyses, which together discovered 15 FTD-associated genes, with 10 not detected before (IFNK, RNF114, SLC9A8, SPATA2, SNAI1, SCFD1, POLDIP2, TMEM97, G2E3, and PIGW). Functional analyses showed these genes were enriched in heart left ventricle, kidney cortex, and some brain regions. Overall, this study provides insights into genetic determinants of FTD and shared genetic etiology underlying FTD and ALS.
Affiliation(s)
- Keying Chen
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Tongyu Gao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ying Liu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Kexuan Zhu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Biological Data Mining and Healthcare Transformation Innovation Engineering Research Center, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China.
4
Gao B, Zhou X. MESuSiE enables scalable and powerful multi-ancestry fine-mapping of causal variants in genome-wide association studies. Nat Genet 2024; 56:170-179. [PMID: 38168930 DOI: 10.1038/s41588-023-01604-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2022] [Accepted: 10/30/2023] [Indexed: 01/05/2024]
Abstract
Fine-mapping in genome-wide association studies attempts to identify causal SNPs from a set of candidate SNPs in a local genomic region of interest and is commonly performed in one genetic ancestry at a time. Here, we present multi-ancestry sum of the single effects model (MESuSiE), a probabilistic multi-ancestry fine-mapping method, to improve the accuracy and resolution of fine-mapping by leveraging association information across ancestries. MESuSiE uses summary statistics as input, accounts for the diverse linkage disequilibrium pattern observed in different ancestries, explicitly models both shared and ancestry-specific causal SNPs, and relies on a variational inference algorithm for scalable computation. We evaluated the performance of MESuSiE through comprehensive simulations and multi-ancestry fine-mapping of four lipid traits with both European and African samples. In the real data, MESuSiE improves fine-mapping resolution by 19.0% to 72.0% compared to existing approaches, is an order of magnitude faster, and captures and categorizes shared and ancestry-specific causal signals with enhanced functional enrichment.
Affiliation(s)
- Boran Gao
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA.
- Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA.
5
Qiao J, Wu Y, Zhang S, Xu Y, Zhang J, Zeng P, Wang T. Evaluating significance of European-associated index SNPs in the East Asian population for 31 complex phenotypes. BMC Genomics 2023; 24:324. [PMID: 37312035 DOI: 10.1186/s12864-023-09425-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/20/2022] [Accepted: 06/01/2023] [Indexed: 06/15/2023] Open
Abstract
BACKGROUND Genome-wide association studies (GWASs) have identified many single-nucleotide polymorphisms (SNPs) associated with complex phenotypes in the European (EUR) population; however, the extent to which EUR-associated SNPs can be generalized to other populations such as East Asian (EAS) is not clear. RESULTS By leveraging summary statistics of 31 phenotypes in the EUR and EAS populations, we first evaluated the difference in heritability between the two populations and calculated the trans-ethnic genetic correlation. We observed the heritability estimates of some phenotypes varied substantially across populations and 53.3% of trans-ethnic genetic correlations were significantly smaller than one. Next, we examined whether EUR-associated SNPs of these phenotypes could be identified in EAS using the trans-ethnic false discovery rate method while accounting for winner's curse for SNP effect in EUR and difference of sample sizes in EAS. We found on average 54.5% of EUR-associated SNPs were also significant in EAS. Furthermore, we discovered non-significant SNPs had higher effect heterogeneity, and significant SNPs showed more consistent linkage disequilibrium and allele frequency patterns between the two populations. We also demonstrated non-significant SNPs were more likely to undergo natural selection. CONCLUSIONS Our study revealed the extent to which EUR-associated SNPs could be significant in the EAS population and offered deep insights into the similarity and diversity of genetic architectures underlying phenotypes in distinct ancestral groups.
Affiliation(s)
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuxuan Wu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yue Xu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jinhui Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
6
Zhang J, Zhang S, Qiao J, Wang T, Zeng P. Similarity and diversity of genetic architecture for complex traits between East Asian and European populations. BMC Genomics 2023; 24:314. [PMID: 37308816 DOI: 10.1186/s12864-023-09434-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 06/07/2023] [Indexed: 06/14/2023] Open
Abstract
BACKGROUND Genome-wide association studies have detected a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits in diverse ancestral groups. However, the trans-ethnic similarity and diversity of genetic architecture is not well understood currently. RESULTS By leveraging summary statistics of 37 traits from East Asian (Nmax = 254,373) or European (Nmax = 693,529) populations, we first evaluated the trans-ethnic genetic correlation (ρg) and found substantial evidence of shared genetic overlap underlying these traits between the two populations, with ρg estimates ranging from 0.53 (se = 0.11) for adult-onset asthma to 0.98 (se = 0.17) for hemoglobin A1c. However, 88.9% of the genetic correlation estimates were significantly less than one, indicating potential heterogeneity in genetic effect across populations. We next identified common associated SNPs using the conjunction conditional false discovery rate method and observed that 21.7% of trait-associated SNPs can be identified simultaneously in both populations. Among these shared associated SNPs, 20.8% showed heterogeneous influence on traits between the two ancestral populations. Moreover, we demonstrated that population-common associated SNPs often exhibited more consistent linkage disequilibrium and allele frequency pattern across ancestral groups compared to population-specific or null ones. We also revealed that population-specific associated SNPs were much more likely to undergo natural selection compared to population-common associated SNPs. CONCLUSIONS Our study provides an in-depth understanding of similarity and diversity regarding genetic architecture for complex traits across diverse populations, and can assist in trans-ethnic association analysis, genetic risk prediction, and causal variant fine mapping.
Affiliation(s)
- Jinhui Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China.
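The abstract for this entry reports that most trans-ethnic genetic correlation estimates were significantly less than one. The abstract does not say which test was used; the following is a minimal sketch assuming a one-sided normal (Wald-type) test of H0: ρg = 1 against H1: ρg < 1. The function name `p_less_than_one` is hypothetical; the two input pairs are the estimates and standard errors quoted in the abstract.

```python
import math


def p_less_than_one(rho_hat: float, se: float) -> float:
    """One-sided P value for H0: rho_g = 1 vs H1: rho_g < 1.

    Assumes the estimate is approximately normal with the given
    standard error (an assumption of this sketch, not of the paper).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = (rho_hat - 1.0) / se
    # Lower-tail standard normal probability Phi(z)
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


# Estimates (with se) quoted in the abstract
for trait, rho_hat, se in [
    ("adult-onset asthma", 0.53, 0.11),
    ("hemoglobin A1c", 0.98, 0.17),
]:
    print(f"{trait}: P = {p_less_than_one(rho_hat, se):.3g}")
```

Under this assumed test, the asthma estimate lies several standard errors below one, whereas the hemoglobin A1c estimate is well within one standard error of it.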
7
Genetic correlation and gene-based pleiotropy analysis for four major neurodegenerative diseases with summary statistics. Neurobiol Aging 2023; 124:117-128. [PMID: 36740554 DOI: 10.1016/j.neurobiolaging.2022.12.012] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 11/10/2021] [Revised: 03/25/2022] [Accepted: 12/27/2022] [Indexed: 01/02/2023]
Abstract
Recent genome-wide association studies suggested shared genetic components between neurodegenerative diseases. However, pleiotropic association patterns among them remain poorly understood. We here analyzed 4 major neurodegenerative diseases including Alzheimer's disease (AD), Parkinson's disease (PD), frontotemporal dementia (FTD) and amyotrophic lateral sclerosis (ALS), and found suggestively positive genetic correlation. We next implemented a gene-centric pleiotropy analysis with a powerful method called PLACO and detected 280 pleiotropic associations (226 unique genes) with these diseases. Functional analyses demonstrated that these genes were enriched in the pancreas, liver, heart, blood, brain, and muscle tissues; and that 42 pleiotropic genes exhibited drug-gene interactions with 341 drugs. Using Mendelian randomization, we discovered that AD and PD can increase the risk of developing ALS, and that AD and ALS can also increase the risk of developing FTD, respectively. Overall, this study provides in-depth insights into shared genetic components and causal relationship among the 4 major neurodegenerative diseases, indicating genetic overlap and causality commonly drive their co-occurrence. It also has important implications on the etiology understanding, drug development and therapeutic targets for neurodegenerative diseases.
8
Qiao J, Shao Z, Wu Y, Zeng P, Wang T. Detecting associated genes for complex traits shared across East Asian and European populations under the framework of composite null hypothesis testing. J Transl Med 2022; 20:424. [PMID: 36138484 PMCID: PMC9503281 DOI: 10.1186/s12967-022-03637-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 05/08/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022]
Abstract
Background Detecting trans-ethnic common associated genetic loci can offer important insights into shared genetic components underlying complex diseases/traits across diverse continental populations. However, effective statistical methods for such a goal are currently lacking. Methods By leveraging summary statistics available from global-scale genome-wide association studies, we herein proposed a novel genetic overlap detection method called CONTO (COmposite Null hypothesis test for Trans-ethnic genetic Overlap) from the perspective of high-dimensional composite null hypothesis testing. Unlike previous studies which generally analyzed individual genetic variants, CONTO is a gene-centric method which focuses on a set of genetic variants located within a gene simultaneously and assesses their joint significance with the trait of interest. By borrowing the principle of the joint significance test (JST), CONTO takes the maximum P value of multiple associations as the significance measurement. Results Compared to JST, which is often overly conservative, CONTO is improved in two aspects: the construction of a three-component mixture null distribution and the adjustment for trans-ethnic genetic correlation. Consequently, CONTO corrects the conservativeness of JST with well-calibrated P values and is much more powerful, as validated by extensive simulation studies. We applied CONTO to discover common associated genes for 31 complex diseases/traits between the East Asian and European populations, and identified many shared trait-associated genes that had otherwise been missed by JST. We further revealed that population-common genes were generally more evolutionarily conserved than population-specific or null ones. Conclusion Overall, CONTO represents a powerful method for detecting common associated genes across diverse ancestral groups; our results provide important implications on the transferability of GWAS discoveries in one population to others.
Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03637-8.
Affiliation(s)
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuxuan Wu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
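The abstract for this entry describes the joint significance test (JST) as taking the maximum P value across associations. A minimal sketch of that baseline rule follows, assuming one gene-level P value per population; the function name `jst_p_value` and the example values are hypothetical. CONTO's three-component mixture null and its adjustment for trans-ethnic genetic correlation are not reproduced here.

```python
def jst_p_value(p_values):
    """Joint significance test: report the maximum P value.

    A gene is declared shared only if it is significant in every
    population, so the largest P value decides the outcome.
    """
    ps = list(p_values)
    if not ps:
        raise ValueError("need at least one P value")
    if any(not (0.0 <= p <= 1.0) for p in ps):
        raise ValueError("P values must lie in [0, 1]")
    return max(ps)


# Hypothetical gene-level P values in two populations
gene_p = {"EAS": 1e-8, "EUR": 0.03}
print(jst_p_value(gene_p.values()))  # the weaker signal (0.03) determines the result
```

Because a single weak signal dominates, the rule tends to be conservative, which is the behavior the abstract says CONTO is designed to correct.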
9
Zhang M, Qiao J, Zhang S, Zeng P. Exploring the association between birthweight and breast cancer using summary statistics from a perspective of genetic correlation, mediation, and causality. J Transl Med 2022; 20:227. [PMID: 35568861 PMCID: PMC9107660 DOI: 10.1186/s12967-022-03435-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2021] [Accepted: 04/04/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND Previous studies demonstrated a positive relationship between birthweight and breast cancer; however, inconsistent, sometimes even controversial, observations also emerged, and the nature of such relationship remains unknown. METHODS Using summary statistics of birthweight and breast cancer, we assessed the fetal/maternal-specific genetic correlation between them via LDSC and prioritized fetal/maternal-specific pleiotropic genes through MAIUP. Relying on summary statistics, we conducted Mendelian randomization (MR) to evaluate the fetal/maternal-specific origin of the causal relationship between birthweight, age of menarche, age at menopause and breast cancer. RESULTS With summary statistics we identified a positive genetic correlation between fetal-specific birthweight and breast cancer (rg = 0.123, P = 0.013) as well as a negative but non-significant correlation between maternal-specific birthweight and breast cancer (rg = −0.068, P = 0.206); and detected 84 pleiotropic genes shared by fetal-specific birthweight and breast cancer, and 49 shared by maternal-specific birthweight and breast cancer. We also revealed that fetal-specific birthweight indirectly influenced breast cancer risk in adulthood via the path of age of menarche or age at menopause in terms of MR-based mediation analysis. CONCLUSION This study reveals that shared genetic foundation and causal mediation commonly drive the connection between the two traits, and that fetal/maternal-specific birthweight plays substantially distinct roles in such relationship. However, our work offers little supportive evidence for the fetal origins hypothesis of breast cancer originating in utero.
Affiliation(s)
- Meng Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuo Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
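The abstract for this entry relies on Mendelian randomization with summary statistics but does not name the estimator. As an illustration only, here is a sketch of the widely used fixed-effect inverse-variance weighted (IVW) estimator, which may or may not be what the authors used. The function name `ivw_estimate` and all per-SNP numbers are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error.

    Equivalent to a weighted regression of SNP-outcome effects on
    SNP-exposure effects through the origin, with weights 1 / se_outcome^2.
    """
    if not beta_exposure or not (
        len(beta_exposure) == len(beta_outcome) == len(se_outcome)
    ):
        raise ValueError("inputs must be non-empty and of equal length")
    weights = [1.0 / (s * s) for s in se_outcome]
    num = sum(w * bx * by for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    den = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical per-SNP effects on exposure and outcome
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.05, 0.03]
se = [0.01, 0.02, 0.01]
estimate, std_err = ivw_estimate(bx, by, se)
print(f"IVW estimate = {estimate:.3f} (se = {std_err:.3f})")
```

In MR-based mediation analysis, estimates of this kind for the exposure-to-mediator and mediator-to-outcome paths are typically combined; that step is not shown.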
10
An integrated framework for local genetic correlation analysis. Nat Genet 2022; 54:274-282. [PMID: 35288712 DOI: 10.1038/s41588-022-01017-y] [Citation(s) in RCA: 108] [Impact Index Per Article: 54.0] [Received: 01/28/2021] [Accepted: 01/20/2022] [Indexed: 12/16/2022]
Abstract
Genetic correlation (rg) analysis is used to identify phenotypes that may have a shared genetic basis. Traditionally, rg is studied globally, considering only the average of the shared signal across the genome, although this approach may fail when the rg is confined to particular genomic regions or in opposing directions at different loci. Current tools for local rg analysis are restricted to analysis of two phenotypes. Here we introduce LAVA, an integrated framework for local rg analysis that, in addition to testing the standard bivariate local rgs between two phenotypes, can evaluate local heritabilities and analyze conditional genetic relations between several phenotypes using partial correlation and multiple regression. Applied to 25 behavioral and health phenotypes, we show considerable heterogeneity in the bivariate local rgs across the genome, which is often masked by the global rg patterns, and demonstrate how our conditional approaches can elucidate more complex, multivariate genetic relations.
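The abstract for this entry mentions partial correlation as one way to analyze conditional genetic relations among several phenotypes. The textbook first-order partial correlation formula illustrates the idea; this sketch is not LAVA's implementation, and the function name and input values are hypothetical.

```python
import math


def partial_correlation(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Correlation between x and y after conditioning on z.

    Standard first-order formula:
    (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))
    """
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    if denom == 0.0:
        raise ValueError("undefined when |r_xz| or |r_yz| equals 1")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical local genetic correlations at one locus
print(partial_correlation(r_xy=0.25, r_xz=0.5, r_yz=0.5))
```

With these hypothetical inputs the numerator is zero, so the apparent correlation between x and y is fully accounted for by their shared correlation with z.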
11
Wang T, Qiao J, Zhang S, Wei Y, Zeng P. Simultaneous test and estimation of total genetic effect in eQTL integrative analysis through mixed models. Brief Bioinform 2022; 23:6535679. [PMID: 35212359 DOI: 10.1093/bib/bbac038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/28/2021] [Revised: 01/22/2022] [Accepted: 02/07/2021] [Indexed: 11/14/2022] Open
Abstract
Integration of expression quantitative trait loci (eQTL) into genome-wide association studies (GWASs) is a promising way to reveal functional roles of associated single-nucleotide polymorphisms (SNPs) in complex phenotypes and has become an active research field in the post-GWAS era. However, how to efficiently incorporate eQTL mapping studies into GWAS for prioritization of causal genes remains elusive. We herein proposed a novel method termed Mixed transcriptome-wide association studies (TWAS) and mediated Variance estimation (MTV) by modeling the effects of cis-SNPs of a gene as a function of eQTL. MTV formulates the integrative method and TWAS within a unified framework via mixed models and therefore includes many prior methods/tests as special cases. We further justified MTV from another two statistical perspectives of mediation analysis and two-stage Mendelian randomization. Relative to existing methods, MTV is superior for pronounced features including the processing of direct effects of cis-SNPs on phenotypes, the powerful likelihood ratio test for assessment of joint effects of cis-SNPs and genetically regulated gene expression (GReX), two useful quantities to measure relative genetic contributions of GReX and cis-SNPs to phenotypic variance, and the computationally efficient parameter-expansion expectation-maximization algorithm. With extensive simulations, we identified that MTV correctly controlled the type I error in joint evaluation of the total genetic effect and proved more powerful to discover true association signals across various scenarios compared to existing methods. We finally applied MTV to 41 complex traits/diseases available from three GWASs and discovered many new associated genes that had otherwise been missed by existing methods. We also revealed that a small but substantial fraction of phenotypic variation was mediated by GReX.
Overall, MTV constructs a robust and realistic modeling foundation for integrative omics analysis and has the advantage of offering more attractive biological interpretations of GWAS results.
Affiliation(s)
- Ting Wang
- Department of Biostatistics at Xuzhou Medical University, China
- Jiahao Qiao
- Department of Biostatistics at Xuzhou Medical University, China
- Shuo Zhang
- Department of Biostatistics at Xuzhou Medical University, China
- Yongyue Wei
- Department of Biostatistics at Nanjing Medical University, China
- Ping Zeng
- Department of Biostatistics, Center for Medical Statistics and Data Analysis and Key Laboratory of Human Genetics and Environmental Medicine at Xuzhou Medical University, China
| |
Collapse
|
12
|
Sutton M, Sugier PE, Truong T, Liquet B. Leveraging pleiotropic association using sparse group variable selection in genomics data. BMC Med Res Methodol 2022; 22:9. [PMID: 34996381 PMCID: PMC8742466 DOI: 10.1186/s12874-021-01491-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2021] [Accepted: 12/03/2021] [Indexed: 12/04/2022] Open
Abstract
Background: Genome-wide association studies (GWAS) have identified genetic variants associated with multiple complex diseases. We can leverage this phenomenon, known as pleiotropy, to integrate multiple data sources in a joint analysis. Integrating additional information, such as gene pathway knowledge, can often improve statistical efficiency and biological interpretation. In this article, we propose statistical methods that incorporate both gene pathway and pleiotropy knowledge to increase statistical power and identify important risk variants affecting multiple traits. Methods: We propose novel feature selection methods for group variable selection in the multi-task regression problem. We develop penalised likelihood methods exploiting different penalties to induce structured sparsity at the gene (or pathway) and SNP level across all studies. We implement an alternating direction method of multipliers (ADMM) algorithm for our penalised regression methods. The performance of our approaches is compared to a subset-based meta-analysis approach on simulated data sets. A bootstrap sampling strategy is provided to explore the stability of the penalised methods. Results: Our methods are applied to identify potential pleiotropy in a joint analysis of thyroid and breast cancers. The methods were able to detect eleven potential pleiotropic SNPs and six pathways. A simulation study found that our method was able to detect more true signals than a popular competing method while retaining a similar false discovery rate. Conclusion: We developed feature selection methods for jointly analysing multiple logistic regression tasks where prior grouping knowledge is available. Our method performed well both in simulation studies and when applied to a real data analysis of multiple cancers.
Affiliation(s)
- Matthew Sutton: Queensland University of Technology Centre for Data Science, Brisbane, Australia
- Pierre-Emmanuel Sugier: Laboratoire De Mathématiques et de leurs Applications de PAU E2S UPPA, CNRS, Pau, France; University Paris-Saclay, UVSQ, Inserm, Gustave Roussy, CESP, Team "Exposome and Heredity", Villejuif, France
- Therese Truong: University Paris-Saclay, UVSQ, Inserm, Gustave Roussy, CESP, Team "Exposome and Heredity", Villejuif, France
- Benoit Liquet: Laboratoire De Mathématiques et de leurs Applications de PAU E2S UPPA, CNRS, Pau, France; Department of Mathematics and Statistics, Macquarie University, Sydney, Australia
|
13
|
Lu H, Qiao J, Shao Z, Wang T, Huang S, Zeng P. A comprehensive gene-centric pleiotropic association analysis for 14 psychiatric disorders with GWAS summary statistics. BMC Med 2021; 19:314. [PMID: 34895209 PMCID: PMC8667366 DOI: 10.1186/s12916-021-02186-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 08/16/2021] [Accepted: 11/10/2021] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND: Recent genome-wide association studies (GWASs) have revealed the polygenic nature of psychiatric disorders and discovered a few single-nucleotide polymorphisms (SNPs) associated with multiple psychiatric disorders. However, the extent and pattern of pleiotropy among distinct psychiatric disorders remain incompletely understood. METHODS: We analyzed 14 psychiatric disorders using summary statistics available from the largest GWASs to date. We first applied cross-trait linkage disequilibrium score regression (LDSC) to estimate the genetic correlation between disorders. Then, we performed a gene-based pleiotropy analysis by first aggregating a set of SNP-level associations into a single gene-level association signal using MAGMA. From a methodological perspective, we viewed the identification of pleiotropic associations across the entire genome as a high-dimensional problem of composite null hypothesis testing and utilized a novel method called PLACO for pleiotropy mapping. We ultimately implemented functional analysis for the identified pleiotropic genes and used Mendelian randomization to detect causal associations between these disorders. RESULTS: We confirmed extensive genetic correlation among psychiatric disorders, based on which these disorders can be grouped into three diverse categories. We detected a large number of pleiotropic genes, including 5884 associations and 2424 unique genes, and found that differentially expressed pleiotropic genes were significantly enriched in pancreas, liver, heart, and brain, and that the biological processes of these genes were remarkably enriched in regulating neurodevelopment, neurogenesis, and neuron differentiation, offering substantial evidence supporting the validity of the identified pleiotropic loci. We further demonstrated that, among all the identified pleiotropic genes, 342 unique ones were linked with 6353 drugs through drug-gene interactions, which can be classified into distinct types including inhibitor, agonist, blocker, antagonist, and modulator. We also revealed causal associations among psychiatric disorders, indicating that genetic overlap and causality commonly drove the observed co-existence of these disorders. CONCLUSIONS: Our study is among the first large-scale efforts to characterize gene-level pleiotropy among a greatly expanded set of psychiatric disorders and provides important insight into the shared genetic etiology underlying these disorders. The findings would inform psychiatric nosology, identify potential neurobiological mechanisms predisposing to specific clinical presentations, and pave the way to effective drug targets for clinical treatment.
Affiliation(s)
- Haojie Lu: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhonghe Shao: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
|
14
|
Song Y, Zhou X, Kang J, Aung MT, Zhang M, Zhao W, Needham BL, Kardia SLR, Liu Y, Meeker JD, Smith JA, Mukherjee B. Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects. J R Stat Soc Ser C Appl Stat 2021; 70:1391-1412. [PMID: 34887595 PMCID: PMC8653861 DOI: 10.1111/rssc.12518] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 01/18/2023]
Abstract
Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modeling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. The resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods in an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.
Affiliation(s)
- Yanyi Song: University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou: University of Michigan, Ann Arbor, MI, USA
- Jian Kang: University of Michigan, Ann Arbor, MI, USA
- Max T Aung: University of Michigan, Ann Arbor, MI, USA
- Min Zhang: University of Michigan, Ann Arbor, MI, USA
- Wei Zhao: University of Michigan, Ann Arbor, MI, USA
|
15
|
Ma Y, Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet 2021; 37:995-1011. [PMID: 34243982 PMCID: PMC8511058 DOI: 10.1016/j.tig.2021.06.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 15.7] [Received: 03/04/2021] [Revised: 05/31/2021] [Accepted: 06/03/2021] [Indexed: 01/03/2023]
Abstract
Accurate genetic prediction of complex traits can facilitate disease screening, improve early intervention, and aid in the development of personalized medicine. Genetic prediction of complex traits requires the development of statistical methods that can properly model polygenic architecture and construct a polygenic score (PGS). We present a comprehensive review of 46 methods for PGS construction. We connect the majority of these methods through a multiple linear regression framework which can be instrumental for understanding their prediction performance for traits with distinct genetic architectures. We discuss the practical considerations of PGS analysis as well as challenges and future directions of PGS method development. We hope our review serves as a useful reference both for statistical geneticists who develop PGS methods and for data analysts who perform PGS analysis.
Affiliation(s)
- Ying Ma: Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiang Zhou: Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
16
|
Chen H, Zhang J, Wang T, Zhang S, Lai Q, Huang S, Zeng P. Type 2 Diabetes Mellitus and Amyotrophic Lateral Sclerosis: Genetic Overlap, Causality, and Mediation. J Clin Endocrinol Metab 2021; 106:e4497-e4508. [PMID: 34171091 DOI: 10.1210/clinem/dgab465] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/12/2021] [Indexed: 12/21/2022]
Abstract
CONTEXT: Understanding the phenotypic connection between type 2 diabetes mellitus (T2D) and amyotrophic lateral sclerosis (ALS) can offer valuable insight into shared disease etiology and has important implications for drug repositioning and therapeutic intervention. OBJECTIVE: This work aims to disentangle the nature of the inverse relationship between T2D and ALS. METHODS: Using summary statistics of T2D (n = 898 130) and ALS (n = 80 610), we estimated the genetic correlation between them and prioritized pleiotropic genes through a multiple-tissue expression quantitative trait loci-weighted integrative analysis and the conjunction conditional false discovery rate (ccFDR) method. We implemented Mendelian randomization (MR) analyses to evaluate the causal relationship between the 2 diseases. A mediation analysis was performed to assess the mediating role of T2D in the pathway from T2D-related glycemic/anthropometric traits to ALS. RESULTS: We found supportive evidence of a common genetic foundation between T2D and ALS (rg = -0.223, P = .004) and identified 8 pleiotropic genes (ccFDR < 0.10). The MR analyses confirmed that T2D exhibited a neuroprotective effect on ALS, leading to an approximately 5% (95% CI, 0% ~ 9.6%, P = .038) reduction in disease risk. In contrast, no substantial evidence was observed that supported a causal influence of ALS on T2D. The mediation analysis revealed that T2D can also serve as an active mediator for several glycemic/anthropometric traits, including high-density lipoprotein cholesterol, overweight, body mass index, obesity class 1, and obesity class 2, with the mediation effect estimated to be 0.024, -0.022, -0.041, -0.016, and -0.012, respectively. CONCLUSION: We provide new evidence supporting the observed inverse link between T2D and ALS, and reveal that a shared genetic component and a causal association jointly drove such a relationship. We also demonstrate the mediating role of T2D in the pathway from T2D-related glycemic/anthropometric traits to ALS.
Affiliation(s)
- Haimiao Chen: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jinhui Zhang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuo Zhang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Qingwei Lai: Department of Neurology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuiping Huang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
|
17
|
Wang T, Lu H, Zeng P. Identifying pleiotropic genes for complex phenotypes with summary statistics from a perspective of composite null hypothesis testing. Brief Bioinform 2021; 23:6375058. [PMID: 34571531 DOI: 10.1093/bib/bbab389] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 07/05/2021] [Revised: 08/06/2021] [Accepted: 08/28/2021] [Indexed: 12/13/2022] Open
Abstract
Pleiotropy has important implications for the genetic connection among complex phenotypes and facilitates our understanding of disease etiology. Genome-wide association studies provide an unprecedented opportunity to detect pleiotropic associations; however, efficient pleiotropy test methods are still lacking. We here consider pleiotropy identification from the methodological perspective of high-dimensional composite null hypothesis testing and propose a powerful gene-based method called MAIUP. MAIUP is constructed based on the traditional intersection-union test with two sets of independent P-values as input and follows a novel idea that was originally proposed under the high-dimensional mediation analysis framework. The key improvement of MAIUP is that it takes the composite null nature of the pleiotropy test into account by fitting a three-component mixture null distribution, which can ultimately generate well-calibrated P-values for effective control of the family-wise error rate and false discovery rate. Another attractive advantage of MAIUP is its ability to effectively address the issue of overlapping subjects commonly encountered in association studies. Simulation studies demonstrate that, compared with other methods, only MAIUP maintains correct type I error control and has higher power across a wide range of scenarios. We apply MAIUP to detect shared associated genes among 14 psychiatric disorders with summary statistics and discover many new pleiotropic genes that would otherwise not be identified without accounting for the issue of composite null hypothesis testing. Functional and enrichment analyses offer additional evidence supporting the validity of these identified pleiotropic genes associated with psychiatric disorders. Overall, MAIUP represents an efficient method for pleiotropy identification.
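As a rough illustration of the traditional intersection-union test that the abstract names as MAIUP's starting point, the sketch below takes two sets of gene-level P-values and reports, for each gene, the larger of the two, so a gene counts as pleiotropic only when both traits reach significance. The function name and the toy P-values are hypothetical. The sketch does not attempt MAIUP's three-component mixture null calibration or its handling of overlapping subjects, neither of which is specified in the abstract.

```python
def intersection_union_p(p_trait1, p_trait2):
    """Classical max-P intersection-union test, one value per gene.

    Assumption: both lists hold P-values for the same genes in the same order.
    """
    if len(p_trait1) != len(p_trait2):
        raise ValueError("both traits need one P-value per gene")
    return [max(p1, p2) for p1, p2 in zip(p_trait1, p_trait2)]


# Toy, made-up gene-level P-values for two traits (not real data).
p_a = [0.001, 0.20, 0.0004]
p_b = [0.030, 0.0001, 0.0002]
iut = intersection_union_p(p_a, p_b)
```

The max-P statistic is known to be conservative when at most one of the two traits is truly associated, which is the composite-null calibration issue the abstract refers to.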
Affiliation(s)
- Ting Wang: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Haojie Lu: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng: Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China; Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China; Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
|
18
|
Zeng P, Shao Z, Zhou X. Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges. Comput Struct Biotechnol J 2021; 19:3209-3224. [PMID: 34141140 PMCID: PMC8187160 DOI: 10.1016/j.csbj.2021.05.042] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/10/2021] [Revised: 05/21/2021] [Accepted: 05/21/2021] [Indexed: 12/12/2022] Open
Abstract
Mediation analysis investigates the intermediate mechanism through which an exposure exerts its influence on the outcome of interest. Mediation analysis is becoming increasingly popular in high-throughput genomics studies, where a common goal is to identify molecular-level traits, such as gene expression or methylation, that actively mediate the genetic or environmental effects on the outcome. Mediation analysis in genomics studies is particularly challenging, however, owing to the large number of potential mediators measured in these studies as well as the composite null nature of the mediation effect hypothesis. Indeed, while the standard univariate and multivariate mediation methods are well established for analyzing one or multiple mediators, they are not well suited for genomics studies with a large number of mediators and often yield conservative p-values and limited power. Consequently, over the past few years many new high-dimensional mediation methods have been developed for analyzing the large number of potential mediators collected in high-throughput genomics studies. In this work, we present a thorough review of these important recent methodological advances in high-dimensional mediation analysis. Specifically, we describe in detail more than ten high-dimensional mediation methods, focusing on their motivations, basic modeling ideas, specific modeling assumptions, practical successes, methodological limitations, as well as future directions. We hope our review will serve as useful guidance for statisticians and computational biologists who develop methods of high-dimensional mediation analysis as well as for analysts who apply mediation methods to high-throughput genomics studies.
Affiliation(s)
- Ping Zeng: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Zhonghe Shao: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou: Department of Biostatistics, University of Michigan, Ann Arbor 48109, MI, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor 48109, MI, USA
|
19
|
Zeng P, Dai J, Jin S, Zhou X. Aggregating multiple expression prediction models improves the power of transcriptome-wide association studies. Hum Mol Genet 2021; 30:939-951. [PMID: 33615361 DOI: 10.1093/hmg/ddab056] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/23/2020] [Revised: 02/10/2021] [Accepted: 02/15/2021] [Indexed: 12/11/2022] Open
Abstract
Transcriptome-wide association study (TWAS) is an important integrative method for identifying genes that are causally associated with phenotypes. A key step of TWAS involves the construction of expression prediction models for every gene in turn using its cis-SNPs as predictors. Different TWAS methods rely on different models for gene expression prediction, and each such model makes a distinct modeling assumption that is often suitable for a particular genetic architecture underlying expression. However, the genetic architectures underlying gene expression vary across genes throughout the transcriptome. Consequently, different TWAS methods may be beneficial in detecting genes with distinct genetic architectures. Here, we develop a new method, HMAT, which aggregates TWAS association evidence obtained across multiple gene expression prediction models by leveraging the harmonic mean P-value combination strategy. Because each expression prediction model is suited to capture a particular genetic architecture, aggregating TWAS associations across prediction models as in HMAT improves accurate expression prediction and enables subsequent powerful TWAS analysis across the transcriptome. A key feature of HMAT is its ability to accommodate the correlations among different TWAS test statistics and produce calibrated P-values after aggregation. Through numerical simulations, we illustrated the advantage of HMAT over commonly used TWAS methods as well as ad hoc P-value combination rules such as Fisher's method. We also applied HMAT to analyze summary statistics of nine common diseases. In the real data applications, HMAT was on average 30.6% more powerful compared to the next best method, detecting many new disease-associated genes that were otherwise not identified by existing TWAS approaches. In conclusion, HMAT represents a flexible and powerful TWAS method that enjoys robust performance across a range of genetic architectures underlying gene expression.
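The harmonic mean P-value combination that the abstract says HMAT leverages can be sketched in a few lines. The example below computes only the raw weighted harmonic mean of several P-values for one gene. The function name, the equal default weights, and the toy P-values are assumptions made for illustration. The raw harmonic mean is only approximately a P-value, and the calibration HMAT uses to accommodate correlated TWAS statistics is not reproduced here.

```python
def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of P-values: sum(w) / sum(w / p)."""
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")
    if weights is None:
        weights = [1.0 / n] * n  # equal weights (an assumption)
    if len(weights) != n:
        raise ValueError("one weight per P-value is required")
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


# Toy P-values for one gene from three hypothetical expression prediction models.
hmp = harmonic_mean_p([0.01, 0.5, 0.5])
```

With equal weights the result is dominated by the smallest inputs, which is why a single well-suited prediction model can drive the combined evidence for a gene.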
Affiliation(s)
- Ping Zeng: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Jing Dai: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Siyi Jin: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xiang Zhou: Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA; Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
20
|
Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets. BMC Bioinformatics 2021; 22:50. [PMID: 33546598 PMCID: PMC7866684 DOI: 10.1186/s12859-021-03959-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most of the computational approaches developed for joint analysis are based on summary statistics, joint analysis based on individual-level data with consideration of confounding factors remains a challenge. RESULTS: In this study, we propose a method, called Coupled Mixed Model (CMM), that enables a joint GWAS analysis on two independently collected sets of GWAS data with different phenotypes. The CMM method does not require the data sets to have the same phenotypes, as it aims to infer the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses confounding due to population stratification, family structures, and cryptic relatedness, as well as confounding arising during data collection, such as batch effects, which frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer's disease and substance use disorder using datasets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM.
|
21
|
Chen H, Wang T, Yang J, Huang S, Zeng P. Improved Detection of Potentially Pleiotropic Genes in Coronary Artery Disease and Chronic Kidney Disease Using GWAS Summary Statistics. Front Genet 2020; 11:592461. [PMID: 33343632 PMCID: PMC7744760 DOI: 10.3389/fgene.2020.592461] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/07/2020] [Accepted: 11/17/2020] [Indexed: 12/24/2022] Open
Abstract
The coexistence of coronary artery disease (CAD) and chronic kidney disease (CKD) implies an overlapping genetic foundation. However, the common genetic determinants of the two diseases remain largely unknown. Relying on summary statistics publicly available from large-scale genome-wide association studies (n = 184,305 for CAD and n = 567,460 for CKD), we observed a significant positive genetic correlation between CAD and CKD (rg = 0.173, p = 0.024) via linkage disequilibrium score regression. Next, we implemented gene-based association analysis for each disease through MAGMA (Multi-marker Analysis of GenoMic Annotation) and detected 763 and 827 genes associated with CAD or CKD (FDR < 0.05). Among those, 72 genes were shared between the two diseases. Furthermore, by integrating the overlapping genetic information between CAD and CKD, we implemented two pleiotropy-informed informatics approaches, cFDR (conditional false discovery rate) and GPA (Genetic analysis incorporating Pleiotropy and Annotation), and identified 169 and 504 shared genes (FDR < 0.05), of which 121 genes were simultaneously discovered by cFDR and GPA. Importantly, we found 11 potentially new pleiotropic genes related to both CAD and CKD (i.e., ARHGEF19, RSG1, NDST2, CAMK2G, VCL, LRP10, RBM23, USP10, WNT9B, GOSR2, and RPRML). Five of the newly identified pleiotropic genes were further replicated in an additional CAD dataset available from UK Biobank. Our functional enrichment analysis showed that those pleiotropic genes were enriched in diverse relevant pathway processes, including quaternary ammonium group transmembrane transporter and dopamine transport. Overall, this study identifies common genetic architectures shared between CAD and CKD and will help to advance understanding of the molecular mechanisms underlying the comorbidity of the two diseases.
Affiliation(s)
- Haimiao Chen: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ting Wang: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Jinna Yang: Department of Infectious Diseases, People's Hospital of Zhuji, Shaoxing, China
- Shuiping Huang: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Ping Zeng: Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China; Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
|
22
|
Wang T, Tang Z, Yu X, Gao Y, Guan F, Li C, Huang S, Zheng J, Zeng P. Birth Weight and Stroke in Adult Life: Genetic Correlation and Causal Inference With Genome-Wide Association Data Sets. Front Neurosci 2020; 14:479. [PMID: 32595438 PMCID: PMC7301963 DOI: 10.3389/fnins.2020.00479] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/24/2019] [Accepted: 04/17/2020] [Indexed: 11/14/2022] Open
Abstract
Objective: Prior studies have shown that there is an inverse association between birth weight and stroke in adulthood; however, whether this association is causal remains unknown, and those studies cannot distinguish between the direct fetal effect and the indirect maternal effect. The aim of the study is to untangle this relationship using novel statistical genetic approaches. Methods: We first utilized linkage disequilibrium score regression (LDSC) and Genetic analysis incorporating Pleiotropy and Annotation (GPA) to estimate the overall genetic correlation between birth weight and stroke. Then, with a set of valid birth-weight instruments which had adjusted fetal and maternal effects, we performed a two-sample Mendelian randomization (MR) to evaluate its causal effect on stroke based on summary statistics from large-scale genome-wide association studies (GWAS) (n = 264,498 for birth weight and 446,696 for stroke). We further validated the MR results with extensive sensitivity analyses. Results: Both LDSC and GPA demonstrated significant evidence of a shared maternal genetic foundation between birth weight and stroke, with the genetic correlation estimated to be −0.176. However, no fetal genetic correlation between birth weight and stroke was detected. Furthermore, the inverse variance weighted MR demonstrated that the maternally causal effect of birth weight on stroke was 1.12 (95% confidence interval [CI] 1.00–1.27). The maternal ORs of birth weight on three subtypes of stroke, namely cardioembolic stroke (CES), large artery stroke (LAS) and small vessel stroke (SVS), were 1.16 (95% CI 0.93–1.43), 1.50 (95% CI 1.14–1.96) and 1.47 (95% CI 1.15–1.87), respectively. In contrast, no fetal causal associations were found between birth weight and stroke or the subtypes. Those results were robust against extensive sensitivity analyses, with Egger regression ruling out the possibility of pleiotropy and multivariable MR excluding the likelihood of confounding or mediation effects of other risk factors of stroke. Conclusion: This study provides empirically supportive evidence on the fetal developmental origins of stroke and its subtypes. However, further investigation is warranted to understand the pathophysiological role of low birth weight in developing stroke.
Affiliation(s)
- Ting Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Zaixiang Tang
  - Department of Biostatistics, School of Public Health, Medical College of Soochow University, Suzhou, China
- Xinghao Yu
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Yixing Gao
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Fengjun Guan
  - Department of Pediatrics, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Chengzong Li
  - Center of Stroke and Department of Cardiology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
- Shuiping Huang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
- Junnian Zheng
  - Cancer Institute, Xuzhou Medical University, Xuzhou, China
  - Center of Clinical Oncology, Affiliated Hospital of Xuzhou Medical University, Xuzhou, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Cancer Institute, Xuzhou Medical University, Xuzhou, China
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
|
23
|
Zeng P, Zhou X. Causal Association Between Birth Weight and Adult Diseases: Evidence From a Mendelian Randomization Analysis. Front Genet 2019; 10:618. [PMID: 31354785 PMCID: PMC6635582 DOI: 10.3389/fgene.2019.00618] [Citation(s) in RCA: 44] [Impact Index Per Article: 8.8] [Received: 02/12/2019] [Accepted: 06/13/2019] [Indexed: 01/07/2023]
Abstract
Purpose: Birth weight has a profound long-term impact on an individual’s predisposition to various diseases in adulthood, a hypothesis commonly referred to as the fetal origins of adult diseases. However, it is not fully clear to what extent this hypothesis holds, nor is it completely known which types of adult diseases are causally affected by birth weight. Materials and methods: Mendelian randomization using multiple genetic instruments associated with birth weight was performed to explore the causal relationship between birth weight and adult diseases. The causal relationship between birth weight and 21 adult diseases as well as 38 other complex traits was examined based on data collected from 37 large-scale genome-wide association studies with up to 340,000 individuals of European ancestry. Causal effects of birth weight were estimated using inverse-variance weighted methods. The identified causal relationships between birth weight and adult diseases were further validated through extensive sensitivity analyses, bias calculation, and simulations. Results: Among the 21 adult diseases, three were identified as inversely causally affected by birth weight after Bonferroni correction. The measurement unit of birth weight was defined as its standard deviation (i.e., 488 g), and a one-unit lower birth weight was causally related to an increased risk of coronary artery disease (CAD), myocardial infarction (MI), type 2 diabetes (T2D), and BMI-adjusted T2D, with estimated odds ratios of 1.34 [95% confidence interval (CI) 1.17–1.53], 1.30 (95% CI 1.13–1.51), 1.41 (95% CI 1.15–1.73), and 1.54 (95% CI 1.25–1.89), respectively. All these identified causal associations were robust across sensitivity analyses that guard against confounding due to pleiotropy or maternal effects as well as reverse causation. 
In addition, analysis of 38 additional complex traits did not identify candidate traits that may mediate the causal association between birth weight and CAD/MI/T2D. Conclusions: The results suggest that lower birth weight is causally associated with an increased risk of CAD, MI, and T2D in later life, supporting the fetal origins of adult diseases hypothesis.
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, Xuzhou Medical University, Xuzhou, Jiangsu, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States
|
24
|
Birth weight is not causally associated with adult asthma: results from instrumental variable analyses. Sci Rep 2019; 9:7647. [PMID: 31113992 PMCID: PMC6529425 DOI: 10.1038/s41598-019-44114-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/21/2018] [Accepted: 05/07/2019] [Indexed: 11/29/2022]
Abstract
The association between lower birth weight and childhood asthma is well established. However, it remains unclear whether the influence of lower birth weight on asthma persists into adulthood. We conducted a Mendelian randomization analysis to assess the causal effect of birth weight (~140,000 individuals) on the risk of adult asthma (~62,000 individuals). We estimated the causal effect of birth weight to be 1.00 (95% CI 0.98–1.03, p = 0.737) using the genetic risk score method. We did not observe a nonlinear relationship or a gender difference in the estimated causal effect. With the inverse-variance weighted method, the causal effect of birth weight on adult asthma was estimated to be 1.02 (95% CI 0.84–1.24, p = 0.813). Additionally, the iMAP method provided no additional genome-wide evidence supporting a causal effect of birth weight on adult asthma. Our results were robust in various sensitivity analyses, and MR-PRESSO and MR-Egger regression showed that instrument outliers and horizontal pleiotropy were unlikely to bias the results. Overall, our study provides no evidence for the fetal origins of diseases hypothesis for adult asthma, implying that the impact of birth weight on asthma during childhood and adolescence does not persist into adulthood and that previous findings may be biased by confounders.
|