1
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Affiliation(s)
- Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
2
Liao H, Xue H, Pan W. Inferring causal direction between two traits using R² with application to transcriptome-wide association studies. Am J Hum Genet 2024; 111:1782-1795. [PMID: 39053457 PMCID: PMC11339628 DOI: 10.1016/j.ajhg.2024.06.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 06/17/2024] [Accepted: 06/24/2024] [Indexed: 07/27/2024] Open
Abstract
In Mendelian randomization, two single SNP-trait correlation-based methods have been developed to infer the causal direction between an exposure (e.g., a gene) and an outcome (e.g., a trait): MR Steiger's method and its recent extension, Causal Direction-Ratio (CD-Ratio). Here we propose an approach based on R², the coefficient of determination, to combine information from multiple (possibly correlated) SNPs to simultaneously infer the presence and direction of a causal relationship between an exposure and an outcome. Our proposed method generalizes Steiger's method from using a single SNP to multiple SNPs as IVs. It is especially useful in transcriptome-wide association studies (TWASs) (and similar applications) with typically small sample sizes for gene expression (or another molecular trait) data, providing a more flexible and powerful approach to inferring causal directions. It can be applied to GWAS summary data with a reference panel. We also discuss the influence of invalid IVs and introduce a new approach called R2S to select and remove invalid IVs (if any) to enhance the robustness. We compared the performance of the proposed method with existing methods in simulations to demonstrate its advantages. We applied the methods to identify causal genes for high/low-density lipoprotein cholesterol (HDL/LDL) using the individual-level GTEx gene expression data and UK Biobank GWAS data. The proposed method was able to confirm some well-known causal genes while identifying some novel ones. Additionally, we illustrated an application of the proposed method to GWAS summary data to infer causal relationships between HDL/LDL and stroke/coronary artery disease (CAD).
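The single-SNP idea that this abstract generalizes can be illustrated with a minimal sketch. It is not the authors' multi-SNP R² method. It assumes the exposure and outcome associations come from independent samples, approximates the SNP-trait correlation from a Z-score through the identity r² = t² / (t² + n − 2), and compares the two correlations with a Fisher z-transform test. The function names `r_squared_from_z` and `steiger_direction` are hypothetical.

```python
import math


def r_squared_from_z(z, n):
    """Approximate variance explained by one SNP from its Z-score.

    Uses r^2 = t^2 / (t^2 + n - 2), treating the Z-score as a t-statistic
    (an assumption that is reasonable for large n).
    """
    return z * z / (z * z + n - 2)


def steiger_direction(z_exp, n_exp, z_out, n_out):
    """Compare SNP-exposure and SNP-outcome correlations (independent samples).

    Returns the suggested direction and a two-sided p-value for the
    difference between the two Fisher-transformed correlations.
    """
    r_exp = math.sqrt(r_squared_from_z(z_exp, n_exp))
    r_out = math.sqrt(r_squared_from_z(z_out, n_out))
    diff = math.atanh(r_exp) - math.atanh(r_out)
    se = math.sqrt(1.0 / (n_exp - 3) + 1.0 / (n_out - 3))
    stat = diff / se
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    direction = "exposure -> outcome" if stat > 0 else "outcome -> exposure"
    return direction, p_value


if __name__ == "__main__":
    # Illustrative numbers only, not taken from the paper.
    print(steiger_direction(z_exp=10.0, n_exp=1000, z_out=3.0, n_out=1000))
```

A SNP that explains clearly more variance in the exposure than in the outcome is taken as evidence for the exposure-to-outcome direction. Combining several correlated SNPs, as the abstract describes, additionally requires an LD reference panel and is not covered by this sketch.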
Affiliation(s)
- Huiling Liao
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Haoran Xue
- Department of Biostatistics, City University of Hong Kong, Kowloon, Hong Kong
- Wei Pan
- Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, MN, USA.
3
Zhai S, Guo B, Wu B, Mehrotra DV, Shen J. Integrating multiple traits for improving polygenic risk prediction in disease and pharmacogenomics GWAS. Brief Bioinform 2023:7169140. [PMID: 37200155 DOI: 10.1093/bib/bbad181] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/22/2023] [Revised: 03/30/2023] [Accepted: 04/21/2023] [Indexed: 05/20/2023] Open
Abstract
Polygenic risk score (PRS) has been recently developed for predicting complex traits and drug responses. It remains unknown whether multi-trait PRS (mtPRS) methods, by integrating information from multiple genetically correlated traits, can improve prediction accuracy and power for PRS analysis compared with single-trait PRS (stPRS) methods. In this paper, we first review commonly used mtPRS methods and find that they do not directly model the underlying genetic correlations among traits, which has been shown to be useful in guiding multi-trait association analysis in the literature. To overcome this limitation, we propose a mtPRS-PCA method to combine PRSs from multiple traits with weights obtained from performing principal component analysis (PCA) on the genetic correlation matrix. To accommodate various genetic architectures covering different effect directions, signal sparseness and across-trait correlation structures, we further propose an omnibus mtPRS method (mtPRS-O) by combining P values from mtPRS-PCA, mtPRS-ML (mtPRS based on machine learning) and stPRSs using Cauchy Combination Test. Our extensive simulation studies show that mtPRS-PCA outperforms other mtPRS methods in both disease and pharmacogenomics (PGx) genome-wide association studies (GWAS) contexts when traits are similarly correlated, with dense signal effects and in similar effect directions, and mtPRS-O is consistently superior to most other methods due to its robustness under various genetic architectures. We further apply mtPRS-PCA, mtPRS-O and other methods to PGx GWAS data from a randomized clinical trial in the cardiovascular domain and demonstrate performance improvement of mtPRS-PCA in both prediction accuracy and patient stratification as well as the robustness of mtPRS-O in PRS association test.
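The omnibus step mentioned in this abstract relies on the Cauchy Combination Test. The following is a minimal sketch of the standard form of that test, not the authors' mtPRS-O implementation. The function name `cauchy_combination` and the default equal weights are assumptions made for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the Cauchy combination test.

    Each p-value is mapped to a standard Cauchy variate with
    tan((0.5 - p) * pi). The weighted mean of these variates is again
    approximately standard Cauchy under the null, even when the
    p-values are correlated, so its tail probability is the combined p-value.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    total = sum(weights)
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values)) / total
    return 0.5 - math.atan(t) / math.pi


if __name__ == "__main__":
    # Illustrative p-values only.
    print(cauchy_combination([0.01, 0.02, 0.5]))
```

The combined value is dominated by the smallest p-values, which is why the test suits settings where only a few component tests carry signal. P-values extremely close to 0 or 1 make the tangent numerically unstable and would need special handling in practice.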
Affiliation(s)
- Song Zhai
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
- Bin Guo
- Data and Genome Science, Merck & Co., Inc., Cambridge, MA 02141, USA
- Baolin Wu
- Department of Epidemiology and Biostatistics, University of California Irvine, Irvine, CA 92697, USA
- Devan V Mehrotra
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., North Wales, PA 19454, USA
- Judong Shen
- Biostatistics and Research Decision Sciences, Merck & Co., Inc., Rahway, NJ 07065, USA
4
Meng W, Reel PS, Nangia C, Rajendrakumar AL, Hebert HL, Guo Q, Adams MJ, Zheng H, Lu ZH, Ray D, Colvin LA, Palmer CNA, McIntosh AM, Smith BH. A Meta-Analysis of the Genome-Wide Association Studies on Two Genetically Correlated Phenotypes Suggests Four New Risk Loci for Headaches. Phenomics (Cham, Switzerland) 2023; 3:64-76. [PMID: 36939796 PMCID: PMC9883337 DOI: 10.1007/s43657-022-00078-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2022] [Revised: 09/16/2022] [Accepted: 09/21/2022] [Indexed: 11/19/2022]
Abstract
Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood, although it has been suggested that self-reported headache and self-reported migraine are genetically correlated. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) on the self-reported headache phenotype from the UK Biobank and the self-reported migraine phenotype from 23andMe using the Unified Score-based Association Test (metaUSAT) software for genetically correlated phenotypes (N = 397,385). We identified 38 loci for headaches, of which 34 loci have been reported before and four loci were newly suggested. The LDL receptor related protein 1 (LRP1)-Signal Transducer and Activator of Transcription 6 (STAT6)-Short chain Dehydrogenase/Reductase family 9C member 7 (SDR9C7) region in chromosome 12 was the most significantly associated locus, with a leading p value of 1.24 × 10⁻⁶² for rs11172113. The One Cut homeobox 2 (ONECUT2) gene locus in chromosome 18 was the strongest signal among the four new loci, with a p value of 1.29 × 10⁻⁹ for rs673939. Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together in theory and in practice to boost study power to identify more variants for headaches. This study has paved the way for a large GWAS meta-analysis involving cohorts of different but genetically correlated headache phenotypes. Supplementary Information: The online version contains supplementary material available at 10.1007/s43657-022-00078-7.
Affiliation(s)
- Weihua Meng
- Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China, Ningbo, 315100 China
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Parminder S. Reel
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Charvi Nangia
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Aravind Lathika Rajendrakumar
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Harry L. Hebert
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Qian Guo
- Nottingham Ningbo China Beacons of Excellence Research and Innovation Institute, University of Nottingham Ningbo China, Ningbo, 315100 China
- Mark J. Adams
- Division of Psychiatry, Edinburgh Medical School, University of Edinburgh, Edinburgh, EH10 5HF UK
- Hua Zheng
- Department of Anaesthesiology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030 China
- Zen Haut Lu
- PAPRSB Institute of Health Sciences, Universiti Brunei Darussalam, Bandar Seri Begawan, BE1410 Brunei Darussalam
- Debashree Ray
- Department of Epidemiology, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
- Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD 21205 USA
- Lesley A. Colvin
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Colin N. A. Palmer
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
- Andrew M. McIntosh
- Division of Psychiatry, Edinburgh Medical School, University of Edinburgh, Edinburgh, EH10 5HF UK
- Blair H. Smith
- Division of Population Health and Genomics, Ninewells Hospital and Medical School, University of Dundee, Dundee, DD2 4BF UK
5
Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics 2023; 24:2. [PMID: 36597047 PMCID: PMC9811798 DOI: 10.1186/s12859-022-05114-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Gene-based association tests provide a useful alternative and complement to the usual single marker association tests, especially in genome-wide association studies (GWAS). The way variants in a gene are weighted plays an important role in the power of a gene-based association test. Appropriate weights can boost statistical power, especially when detecting genetic variants with weak effects on a trait. One major limitation of existing gene-based association tests lies in using weights that are predetermined biologically or empirically. This limitation often attenuates the power of a test. On the other hand, effect sizes or directions of causal genetic variants in real data are usually unknown, driving a need for a flexible yet robust methodology for gene-based association tests. Furthermore, access to individual-level data is often limited, while thousands of GWAS summary data sets are publicly and freely available. RESULTS To resolve these limitations, we propose a combination test named OWC, which is based on summary statistics from GWAS data. Several traditional methods, including the burden test, weighted sum of squared score test [SSU], weighted sum statistic [WSS], SNP-set Kernel Association Test [SKAT], and the score test, are special cases of OWC. To evaluate the performance of OWC, we perform extensive simulation studies. Results of simulation studies demonstrate that OWC outperforms several existing popular methods. We further show that OWC outperforms comparison methods in real-world data analyses using schizophrenia GWAS summary data and fasting glucose GWAS meta-analysis data. The proposed method is implemented in an R package available at https://github.com/Xuexia-Wang/OWC-R-package CONCLUSIONS We propose a novel gene-based association test that incorporates four different weighting schemes (two constant weights and two weights proportional to normal statistic Z) and includes several popular methods as its special cases. Results of the simulation studies and real data analyses illustrate that the proposed test, OWC, outperforms comparable methods in most scenarios. These results demonstrate that OWC is a useful tool that adapts to the underlying biological model for a disease by appropriately weighting genetic variants and combining well-known gene-based tests.
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Xiaoyu Liang
- Department of Epidemiology and Biostatistics, Michigan State University, 909 Wilson Rd Room B601, East Lansing, MI 48824 USA (GRID: grid.17088.36; ISNI: 0000 0001 2150 1785)
- Samantha Gonzales
- Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Jianguo Liu
- Department of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA (GRID: grid.266869.5; ISNI: 0000 0001 1008 957X)
- Xiaoyi Raymond Gao
- Department of Ophthalmology and Visual Science, Department of Biomedical Informatics, Division of Human Genetics, Ohio State University, 915 Olentangy River Road, Columbus, OH 43212 USA (GRID: grid.261331.4; ISNI: 0000 0001 2285 7943)
- Xuexia Wang
- Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, 11200 SW 8th street, Miami, FL 33174 USA (GRID: grid.65456.34; ISNI: 0000 0001 2110 1845)
6
Chen X, Zhang H, Liu M, Deng HW, Wu Z. Simultaneous detection of novel genes and SNPs by adaptive p-value combination. Front Genet 2022; 13:1009428. [DOI: 10.3389/fgene.2022.1009428] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 11/03/2022] [Indexed: 11/18/2022] Open
Abstract
Combining SNP p-values from GWAS summary data is a promising strategy for detecting novel genetic factors. Existing statistical methods for the p-value-based SNP-set testing confront two challenges. First, the statistical power of different methods depends on unknown patterns of genetic effects that could drastically vary over different SNP sets. Second, they do not identify which SNPs primarily contribute to the global association of the whole set. We propose a new signal-adaptive analysis pipeline to address these challenges using the omnibus thresholding Fisher’s method (oTFisher). The oTFisher remains robustly powerful over various patterns of genetic effects. Its adaptive thresholding can be applied to estimate important SNPs contributing to the overall significance of the given SNP set. We develop efficient calculation algorithms to control the type I error rate, which accounts for the linkage disequilibrium among SNPs. Extensive simulations show that the oTFisher has robustly high power and provides a higher balanced accuracy in screening SNPs than the traditional Bonferroni and FDR procedures. We applied the oTFisher to study the genetic association of genes and haplotype blocks of the bone density-related traits using the summary data of the Genetic Factors for Osteoporosis Consortium. The oTFisher identified more novel and literature-reported genetic factors than existing p-value combination methods. Relevant computation has been implemented into the R package TFisher to support similar data analysis.
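For orientation, the classical Fisher combination that thresholding methods build on can be sketched as follows. This is not the oTFisher procedure or the TFisher package API. The p-value formula assumes independent SNPs, whereas the abstract's method accounts for linkage disequilibrium. The soft-thresholding statistic shown is an assumed illustrative form, and both function names are hypothetical.

```python
import math


def fisher_combination(p_values):
    """Classical Fisher combined p-value, assuming independent p-values.

    The statistic -2 * sum(log p) follows a chi-square distribution with
    2k degrees of freedom, whose upper tail has a closed form for even df:
    exp(-x/2) * sum_{j<k} (x/2)^j / j!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals x / 2
    term, tail = 1.0, 0.0
    for j in range(k):
        tail += term
        term *= half_stat / (j + 1)
    return math.exp(-half_stat) * tail


def soft_threshold_statistic(p_values, tau):
    """Sum -2 * log(p / tau) over p-values at or below tau (assumed form).

    Only the statistic is computed; its null distribution is not
    chi-square and is not derived here.
    """
    return sum(-2.0 * math.log(p / tau) for p in p_values if p <= tau)


if __name__ == "__main__":
    # Illustrative p-values only.
    print(fisher_combination([0.01, 0.2, 0.6]))
    print(soft_threshold_statistic([0.01, 0.2, 0.6], tau=0.05))
```

Thresholding discards large p-values so that a few strong SNPs are not diluted by many null ones. The SNPs that pass the threshold are also natural candidates for the variants driving the set-level signal.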
7
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been performed to evaluate the effectiveness of these methods. RESULTS We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison of 22 commonly used summary-statistics-based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power performance under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and that score-based variance component tests (e.g., sequence kernel association test) were much more powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (the harmonic mean P value method and the aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can scale to biobank-scale data, although some might be relatively slow. CONCLUSION In conclusion, we hope that our findings can offer important guidance on how to choose appropriate multilocus association analysis methods in the post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/ .
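One of the two LD-free combination methods named in this abstract, the harmonic mean P value, is simple enough to sketch directly. This is an illustration under assumptions, not code from the MCA package. The function name `harmonic_mean_p` and the equal default weights are hypothetical choices.

```python
def harmonic_mean_p(p_values, weights=None):
    """Weighted harmonic mean of p-values.

    Returns sum(w) / sum(w / p). The raw harmonic mean is not itself an
    exactly calibrated p-value; a formal test compares it against a
    Landau-distribution-based threshold, which is not implemented here.
    """
    if weights is None:
        weights = [1.0] * len(p_values)
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


if __name__ == "__main__":
    # Illustrative p-values only.
    print(harmonic_mean_p([0.01, 1.0]))
```

Because the harmonic mean is pulled toward the smallest values, a single strong SNP in an otherwise null set still produces a small combined value. This matches the abstract's observation that such methods behave well under sparse genetic architectures.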
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuchen Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
8
Could routine forensic STR genotyping data leak personal phenotypic information? Forensic Sci Int 2022; 335:111311. [PMID: 35468577 DOI: 10.1016/j.forsciint.2022.111311] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2022] [Revised: 03/19/2022] [Accepted: 04/13/2022] [Indexed: 11/22/2022]
Abstract
The application of forensic genetic markers must comply with privacy rights and legal policies on the premise that the markers do not expose phenotypic information. The most widely used short tandem repeats (STRs) are generally viewed as 'junk' DNA because most STRs are located in non-coding regions and are therefore assumed not to reveal phenotypic traits. But with a deepening understanding of phenotypes and their underlying genetic structure, whether STRs could reflect any phenotypic information may need re-examining. Therefore, we performed the following analyses. First, we analyzed the association between 15 STRs and three facial characteristics (single or double eyelid, with or without epicanthus, unattached or attached earlobe) in 721 unrelated Han Chinese individuals. Then, we collected STR and geographic data for 27199 individuals from the literature to investigate the association between STRs and bio-geographic information, and predicted geographic information from STRs in an additional 1993 unrelated individuals. We found scarcely any association between STRs and the studied facial characteristics. Although allele 19 in D2S1338 and allele 18 in FGA (P = 0.0032 and P = 0.0030, respectively, after Bonferroni correction) showed statistical significance, the prediction effectiveness was very low. For the STRs and bio-geographic information, the principal component analysis showed that the first three components could explain 87.7% of the variance, but the prediction accuracy only reached 25.2%. We demonstrated that, because forensic phenotypes are usually complex traits, it is hardly possible to uncover phenotypic information by testing only dozens of STR loci.
9
Yang Y, Basu S, Zhang L. A Bayesian hierarchically structured prior for gene-based association testing with multiple traits in genome-wide association studies. Genet Epidemiol 2022; 46:63-72. [PMID: 34787916 PMCID: PMC8795481 DOI: 10.1002/gepi.22437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2021] [Revised: 09/28/2021] [Accepted: 10/18/2021] [Indexed: 02/03/2023]
Abstract
Although genome-wide association studies (GWAS) often collect data on multiple correlated traits for complex diseases, conventional gene-based analysis is usually univariate and therefore treats traits as uncorrelated. Multivariate analysis of multiple correlated traits can potentially increase the power to detect genes that affect some or all of these traits. In this study, we propose the multivariate hierarchically structured variable selection (HSVS-M) model, a flexible Bayesian model that tests the association of a gene with multiple correlated traits. With only summary statistics, HSVS-M can account for the correlations among genetic variants and among traits simultaneously and can also estimate the various directions and magnitudes of associations between a gene and multiple traits. Simulation studies show that HSVS-M substantially outperforms competing methods in various scenarios, particularly when variants in a gene are associated with a trait in similar directions and magnitudes. We applied HSVS-M to the summary statistics of a meta-analysis GWAS on four lipid traits from the Global Lipids Genetics Consortium and identified 15 genes that have also been confirmed as risk factors in previous studies.
Affiliation(s)
- Yi Yang
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Biostatistics, Columbia University, New York, NY 10032, USA
- Saonli Basu
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Lin Zhang
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
10
Coupled mixed model for joint genetic analysis of complex disorders with two independently collected data sets. BMC Bioinformatics 2021; 22:50. [PMID: 33546598 PMCID: PMC7866684 DOI: 10.1186/s12859-021-03959-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2020] [Accepted: 01/06/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the last decade, genome-wide association studies (GWASs) have contributed to decoding the human genome by uncovering many genetic variations associated with various diseases. Many follow-up investigations involve joint analysis of multiple independently generated GWAS data sets. While most of the computational approaches developed for joint analysis are based on summary statistics, joint analysis based on individual-level data with consideration of confounding factors remains a challenge. RESULTS In this study, we propose a method, called Coupled Mixed Model (CMM), that enables a joint GWAS analysis on two independently collected sets of GWAS data with different phenotypes. The CMM method does not require the data sets to have the same phenotypes, as it aims to infer the unknown phenotypes using a set of multivariate sparse mixed models. Moreover, CMM addresses the confounding variables due to population stratification, family structures, and cryptic relatedness, as well as those arising during data collection, such as batch effects that frequently appear in joint genetic studies. We evaluate the performance of CMM using simulation experiments. In real data analysis, we illustrate the utility of CMM by applying it to evaluate common genetic associations for Alzheimer's disease and substance use disorder using datasets independently collected for the two complex human disorders. Comparison of the results with those from previous experiments and analyses supports the utility of our method and provides new insights into the diseases. The software is available at https://github.com/HaohanWang/CMM .
11
Zhang J, Guo X, Gonzales S, Yang J, Wang X. TS: a powerful truncated test to detect novel disease associated genes using publicly available gWAS summary data. BMC Bioinformatics 2020; 21:172. [PMID: 32366212 PMCID: PMC7199321 DOI: 10.1186/s12859-020-3511-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2019] [Accepted: 04/23/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the last decade, a large number of common variants underlying complex diseases have been identified through genome-wide association studies (GWASs). Summary data from these GWASs are freely and publicly available. The summary data are usually obtained through single marker analysis. Gene-based analysis offers a useful alternative and complement to single marker analysis. Results from gene-level association tests can be more readily integrated with downstream functional and pathogenic investigations. Most existing gene-based methods fall into two categories: burden tests and quadratic tests. Burden tests are usually powerful when the directions of effects of causal variants are the same. However, they may lose statistical power when causal variants have different directions of effects. Quadratic tests are not affected by the directions of effects but can be less powerful due to issues such as the large number of degrees of freedom. These drawbacks of existing gene-based methods motivated us to develop a new powerful method to identify disease-associated genes using existing GWAS summary data. METHODS AND RESULTS In this paper, we propose a new truncated statistic method (TS) that uses truncation to find the genes that have a true contribution to the genetic association. Extensive simulation studies demonstrate that our proposed test outperforms other comparable tests. We applied TS and other comparable methods to the schizophrenia GWAS data and type 2 diabetes (T2D) GWAS meta-analysis summary data. TS identified more disease-associated genes than comparable methods. Many of the significant genes identified by TS may have important mechanisms relevant to the associated traits. TS is implemented in the C program TS, which is freely and publicly available online. CONCLUSIONS The proposed truncated statistic outperforms existing methods. It can be employed to detect novel trait-associated genes using GWAS summary data.
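The contrast this abstract draws between burden and quadratic tests can be made concrete with a small sketch based on per-SNP Z-scores. It is not the TS method. The LD matrix, the unit default weights, and the function names `burden_test` and `quadratic_statistic` are assumptions for illustration, and the null distribution of the quadratic statistic (a mixture of chi-squares under LD) is not computed.

```python
import math


def burden_test(z_scores, ld, weights=None):
    """Burden test from summary Z-scores.

    The weighted sum of Z-scores is scaled by sqrt(w' R w), where R is
    the LD (correlation) matrix, giving a standard normal statistic under
    the null. Returns the statistic and a two-sided p-value.
    """
    m = len(z_scores)
    if weights is None:
        weights = [1.0] * m
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    variance = sum(weights[i] * ld[i][j] * weights[j]
                   for i in range(m) for j in range(m))
    stat = numerator / math.sqrt(variance)
    p_value = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, p_value


def quadratic_statistic(z_scores):
    """Sum of squared Z-scores; insensitive to effect direction."""
    return sum(z * z for z in z_scores)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # two SNPs assumed to be in no LD
    print(burden_test([2.0, 2.0], identity))   # same direction: signal adds up
    print(burden_test([2.0, -2.0], identity))  # opposite directions: signal cancels
    print(quadratic_statistic([2.0, -2.0]))
```

With two equally strong SNPs of opposite sign, the burden statistic is exactly zero while the quadratic statistic is unchanged. This is the loss of power under mixed effect directions that the abstract describes.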
Affiliation(s)
- Jianjun Zhang
  - Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, 76203 TX USA
- Xuan Guo
  - Department of Computer Science and Engineering, University of North Texas, Discovery Park 3940 N. Elm, Denton, 76203 TX USA
- Samantha Gonzales
  - Department of Computer Science and Engineering, University of North Texas, Discovery Park 3940 N. Elm, Denton, 76203 TX USA
- Jingjing Yang
  - Center for Computational and Quantitative Genetics, Department of Human Genetics, School of Medicine, Emory University, Whitehead Biomedical Research Building, Suite 305K, Atlanta, 30322 GA USA
- Xuexia Wang
  - Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, 76203 TX USA
|
12
|
Maierhofer A, Flunkert J, Oshima J, Martin GM, Poot M, Nanda I, Dittrich M, Müller T, Haaf T. Epigenetic signatures of Werner syndrome occur early in life and are distinct from normal epigenetic aging processes. Aging Cell 2019; 18:e12995. [PMID: 31259468 PMCID: PMC6718529 DOI: 10.1111/acel.12995] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 02/12/2019] [Revised: 05/24/2019] [Accepted: 06/05/2019] [Indexed: 12/11/2022] Open
Abstract
Werner Syndrome (WS) is an adult-onset segmental progeroid syndrome. Bisulfite pyrosequencing of repetitive DNA families revealed comparable blood DNA methylation levels between classical (18 WRN-mutant) or atypical WS (3 LMNA-mutant and 3 POLD1-mutant) patients and age- and sex-matched controls. WS was not associated with either age-related accelerated global losses of ALU, LINE1, and α-satellite DNA methylations or gains of rDNA methylation. Single CpG methylation was analyzed with Infinium MethylationEPIC arrays. In a correspondence analysis, atypical WS samples clustered together with the controls and were clearly separated from classical WS, consistent with distinct epigenetic pathologies. In classical WS, we identified 659 differentially methylated regions (DMRs) comprising 3,656 CpG sites and 613 RefSeq genes. The top DMR was located in the HOXA4 promoter. Additional DMR genes included LMNA, POLD1, and 132 genes which have been reported to be differentially expressed in WRN-mutant/depleted cells. DMRs were enriched in genes with molecular functions linked to transcription factor activity and sequence-specific DNA binding to promoters transcribed by RNA polymerase II. We propose that transcriptional misregulation of downstream genes by the absence of WRN protein contributes to the variable premature aging phenotypes of WS. There were no CpG sites showing significant differences in DNA methylation changes with age between WS patients and controls. Genes with both WS- and age-related methylation changes exhibited a constant offset of methylation between WRN-mutant patients and controls across the entire analyzed age range. WS-specific epigenetic signatures occur early in life and do not simply reflect an acceleration of normal epigenetic aging processes.
Affiliation(s)
- Anna Maierhofer
  - Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- Julia Flunkert
  - Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- Junko Oshima
  - Department of Pathology, University of Washington, Seattle, Washington, USA
  - Department of Clinical Cell Biology and Medicine, Graduate School of Medicine, Chiba University, Chiba, Japan
- George M. Martin
  - Department of Pathology, University of Washington, Seattle, Washington, USA
- Martin Poot
  - Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- Indrajit Nanda
  - Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
- Marcus Dittrich
  - Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
  - Department of Bioinformatics, Julius Maximilians University, Würzburg, Germany
- Tobias Müller
  - Department of Bioinformatics, Julius Maximilians University, Würzburg, Germany
- Thomas Haaf
  - Institute of Human Genetics, Julius Maximilians University, Würzburg, Germany
|