1
|
Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min 2024; 17:31. [PMID: 39238044 PMCID: PMC11375927 DOI: 10.1186/s13040-024-00385-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2024] [Accepted: 08/27/2024] [Indexed: 09/07/2024] Open
Abstract
Genome-wide association studies (GWAS) have revolutionized our understanding of the genetic architecture of complex traits and diseases. GWAS summary statistics have become essential tools for various genetic analyses, including meta-analysis, fine-mapping, and risk prediction. However, the increasing number of GWAS summary statistics and the diversity of software tools available for their analysis can make it challenging for researchers to select the most appropriate tools for their specific needs. This systematic review aims to provide a comprehensive overview of the currently available software tools and databases for GWAS summary statistics analysis. We conducted a comprehensive literature search to identify relevant software tools and databases. We categorized the tools and databases by their functionality, including data management, quality control, single-trait analysis, and multiple-trait analysis. We also compared the tools and databases based on their features, limitations, and user-friendliness. Our review identified a total of 305 functioning software tools and databases dedicated to GWAS summary statistics, each with unique strengths and limitations. We provide descriptions of the key features of each tool and database, including their input/output formats, data types, and computational requirements. We also discuss the overall usability and applicability of each tool for different research scenarios. This comprehensive review will serve as a valuable resource for researchers who are interested in using GWAS summary statistics to investigate the genetic basis of complex traits and diseases. By providing a detailed overview of the available tools and databases, we aim to facilitate informed tool selection and maximize the effectiveness of GWAS summary statistics analysis.
Collapse
Affiliation(s)
| | - Pantelis G Bagos
- Department of Computer Science and Biomedical Informatics, University of Thessaly, 35131, Lamia, Greece.
| |
Collapse
|
2
|
Zhou F, Soremekun O, Chikowore T, Fatumo S, Barroso I, Morris AP, Asimit JL. Leveraging information between multiple population groups and traits improves fine-mapping resolution. Nat Commun 2023; 14:7279. [PMID: 37949886 PMCID: PMC10638399 DOI: 10.1038/s41467-023-43159-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/31/2023] [Accepted: 11/01/2023] [Indexed: 11/12/2023] Open
Abstract
Statistical fine-mapping helps to pinpoint likely causal variants underlying genetic association signals. Its resolution can be improved by (i) leveraging information between traits; and (ii) exploiting differences in linkage disequilibrium structure between diverse population groups. Using association summary statistics, MGflashfm jointly fine-maps signals from multiple traits and population groups; MGfm uses an analogous framework to analyse each trait separately. We also provide a practical approach to fine-mapping with out-of-sample reference panels. In simulation studies we show that MGflashfm and MGfm are well-calibrated and that the mean proportion of causal variants with PP > 0.80 is above 0.75 (MGflashfm) and 0.70 (MGfm). In our analysis of four lipids traits across five population groups, MGflashfm gives a median 99% credible set reduction of 10.5% over MGfm. MGflashfm and MGfm only require summary level data, making them very useful fine-mapping tools in consortia efforts where individual-level data cannot be shared.
Collapse
Affiliation(s)
- Feng Zhou
- MRC Biostatistics Unit, University of Cambridge, Cambridge, UK
| | - Opeyemi Soremekun
- The African Computational Genomic (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda
| | - Tinashe Chikowore
- Sydney Brenner Institute for Molecular Bioscience, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- MRC/Wits Developmental Pathways for Health Research Unit, Department of Paediatrics, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa
- Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Harvard Medical School, Boston, MA, USA
| | - Segun Fatumo
- The African Computational Genomic (TACG) Research Group, MRC/UVRI and LSHTM, Entebbe, Uganda
- Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, London, UK
| | - Inês Barroso
- Exeter Centre of Excellence for Diabetes Research (EXCEED), University of Exeter Medical School, Exeter, UK
| | - Andrew P Morris
- Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
| | | |
Collapse
|
3
|
Zhang J, Liang X, Gonzales S, Liu J, Gao XR, Wang X. A gene based combination test using GWAS summary data. BMC Bioinformatics 2023; 24:2. [PMID: 36597047 PMCID: PMC9811798 DOI: 10.1186/s12859-022-05114-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/02/2022] [Accepted: 12/13/2022] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND Gene-based association tests provide a useful alternative and complement to the usual single marker association tests, especially in genome-wide association studies (GWAS). The way of weighting for variants in a gene plays an important role in boosting the power of a gene-based association test. Appropriate weights can boost statistical power, especially when detecting genetic variants with weak effects on a trait. One major limitation of existing gene-based association tests lies in using weights that are predetermined biologically or empirically. This limitation often attenuates the power of a test. On another hand, effect sizes or directions of causal genetic variants in real data are usually unknown, driving a need for a flexible yet robust methodology of gene based association tests. Furthermore, access to individual-level data is often limited, while thousands of GWAS summary data are publicly and freely available. RESULTS To resolve these limitations, we propose a combination test named as OWC which is based on summary statistics from GWAS data. Several traditional methods including burden test, weighted sum of squared score test [SSU], weighted sum statistic [WSS], SNP-set Kernel Association Test [SKAT], and the score test are special cases of OWC. To evaluate the performance of OWC, we perform extensive simulation studies. Results of simulation studies demonstrate that OWC outperforms several existing popular methods. We further show that OWC outperforms comparison methods in real-world data analyses using schizophrenia GWAS summary data and a fasting glucose GWAS meta-analysis data. The proposed method is implemented in an R package available at https://github.com/Xuexia-Wang/OWC-R-package CONCLUSIONS: We propose a novel gene-based association test that incorporates four different weighting schemes (two constant weights and two weights proportional to normal statistic Z) and includes several popular methods as its special cases. Results of the simulation studies and real data analyses illustrate that the proposed test, OWC, outperforms comparable methods in most scenarios. These results demonstrate that OWC is a useful tool that adapts to the underlying biological model for a disease by weighting appropriately genetic variants and combination of well-known gene-based tests.
Collapse
Affiliation(s)
- Jianjun Zhang
- grid.266869.50000 0001 1008 957XDepartment of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA
| | - Xiaoyu Liang
- grid.17088.360000 0001 2150 1785Department of Epidemiology and Biostatistics, Michigan State University, 909 Wilson Rd Room B601, East Lansing, MI 48824 USA
| | - Samantha Gonzales
- grid.266869.50000 0001 1008 957XDepartment of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA
| | - Jianguo Liu
- grid.266869.50000 0001 1008 957XDepartment of Mathematics, University of North Texas, 225 Avenue E, Denton, TX 76201 USA
| | - Xiaoyi Raymond Gao
- grid.261331.40000 0001 2285 7943Department of Ophthalmology and Visual Science, Department of Biomedical informatics, Division of Human Genetics, Ohio State University, 915 Olentangy River Road, Columbus, OH 43212 USA
| | - Xuexia Wang
- grid.65456.340000 0001 2110 1845Department of Biostatistics, Robert Stempel College of Public Health and Social Work, Florida International University, 11200 SW 8th street, Miami, FL 33174 USA
| |
Collapse
|
4
|
Shao Z, Wang T, Qiao J, Zhang Y, Huang S, Zeng P. A comprehensive comparison of multilocus association methods with summary statistics in genome-wide association studies. BMC Bioinformatics 2022; 23:359. [PMID: 36042399 PMCID: PMC9429742 DOI: 10.1186/s12859-022-04897-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/27/2022] [Accepted: 08/22/2022] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Multilocus analysis on a set of single nucleotide polymorphisms (SNPs) pre-assigned within a gene constitutes a valuable complement to single-marker analysis by aggregating data on complex traits in a biologically meaningful way. However, despite the existence of a wide variety of SNP-set methods, few comprehensive comparison studies have been previously performed to evaluate the effectiveness of these methods. RESULTS We herein sought to fill this knowledge gap by conducting a comprehensive empirical comparison for 22 commonly-used summary-statistics based SNP-set methods. We showed that only seven methods could effectively control the type I error, and that these well-calibrated approaches had varying power performance under the simulation scenarios. Overall, we confirmed that the burden test was generally underpowered and score-based variance component tests (e.g., sequence kernel association test) were much powerful under the polygenic genetic architecture in both common and rare variant association analyses. We further revealed that two linkage-disequilibrium-free P value combination methods (e.g., harmonic mean P value method and aggregated Cauchy association test) behaved very well under the sparse genetic architecture in simulations and real-data applications to common and rare variant association analyses as well as in expression quantitative trait loci weighted integrative analysis. We also assessed the scalability of these approaches by recording computational time and found that all these methods can be scalable to biobank-scale data although some might be relatively slow. CONCLUSION In conclusion, we hope that our findings can offer an important guidance on how to choose appropriate multilocus association analysis methods in post-GWAS era. All the SNP-set methods are implemented in the R package called MCA, which is freely available at https://github.com/biostatpzeng/ .
Collapse
Affiliation(s)
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Yuchen Zhang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Shuiping Huang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
| | - Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China.
| |
Collapse
|
5
|
Veturi Y, Lucas A, Bradford Y, Hui D, Dudek S, Theusch E, Verma A, Miller JE, Kullo I, Hakonarson H, Sleiman P, Schaid D, Stein CM, Edwards DRV, Feng Q, Wei WQ, Medina MW, Krauss R, Hoffmann TJ, Risch N, Voight BF, Rader DJ, Ritchie MD. A unified framework identifies new links between plasma lipids and diseases from electronic medical records across large-scale cohorts. Nat Genet 2021; 53:972-981. [PMID: 34140684 PMCID: PMC8555954 DOI: 10.1038/s41588-021-00879-y] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/24/2020] [Accepted: 05/05/2021] [Indexed: 02/05/2023]
Abstract
Plasma lipids are known heritable risk factors for cardiovascular disease, but increasing evidence also supports shared genetics with diseases of other organ systems. We devised a comprehensive three-phase framework to identify new lipid-associated genes and study the relationships among lipids, genotypes, gene expression and hundreds of complex human diseases from the Electronic Medical Records and Genomics (347 traits) and the UK Biobank (549 traits). Aside from 67 new lipid-associated genes with strong replication, we found evidence for pleiotropic SNPs/genes between lipids and diseases across the phenome. These include discordant pleiotropy in the HLA region between lipids and multiple sclerosis and putative causal paths between triglycerides and gout, among several others. Our findings give insights into the genetic basis of the relationship between plasma lipids and diseases on a phenome-wide scale and can provide context for future prevention and treatment strategies.
Collapse
Affiliation(s)
- Yogasudha Veturi
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Anastasia Lucas
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Yuki Bradford
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Daniel Hui
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Scott Dudek
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Elizabeth Theusch
- Department of Pediatrics, University of California San Francisco, Oakland, CA, USA
| | - Anurag Verma
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jason E. Miller
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Iftikhar Kullo
- Division of Cardiovascular Diseases, Mayo Clinic, Rochester, MN, USA
| | - Hakon Hakonarson
- Center for Applied Genomics, Children’s Hospital of Philadelphia, PA, USA
| | - Patrick Sleiman
- Center for Applied Genomics, Children’s Hospital of Philadelphia, PA, USA
| | - Daniel Schaid
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
| | - Charles M. Stein
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Digna R. Velez Edwards
- Department of Biomedical Informatics in School of Medicine, Vanderbilt University, Nashville, TN, USA.,Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN, USA.,Division of Quantitative Science, Department of Obstetrics and Gynecology, Vanderbilt University Medical Center, Nashville, TN, USA
| | - QiPing Feng
- Division of Clinical Pharmacology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Wei-Qi Wei
- Department of Biomedical Informatics in School of Medicine, Vanderbilt University, Nashville, TN, USA
| | - Marisa W. Medina
- Department of Pediatrics, University of California San Francisco, Oakland, CA, USA
| | - Ronald Krauss
- Department of Pediatrics, University of California San Francisco, Oakland, CA, USA
| | - Thomas J. Hoffmann
- Institute for Human Genetics, and Department of Epidemiology & Biostatistics, University of California and San Francisco, San Francisco, CA, USA
| | - Neil Risch
- Institute for Human Genetics, and Department of Epidemiology & Biostatistics, University of California and San Francisco, San Francisco, CA, USA
| | - Benjamin F. Voight
- Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.,Institute of Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Daniel J. Rader
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.,Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Marylyn D. Ritchie
- Department of Genetics and Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.,
| |
Collapse
|
6
|
Lu H, Zhang J, Jiang Z, Zhang M, Wang T, Zhao H, Zeng P. Detection of Genetic Overlap Between Rheumatoid Arthritis and Systemic Lupus Erythematosus Using GWAS Summary Statistics. Front Genet 2021; 12:656545. [PMID: 33815486 PMCID: PMC8012913 DOI: 10.3389/fgene.2021.656545] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/22/2021] [Accepted: 03/01/2021] [Indexed: 01/04/2023] Open
Abstract
Background Clinical and epidemiological studies have suggested systemic lupus erythematosus (SLE) and rheumatoid arthritis (RA) are comorbidities and common genetic etiologies can partly explain such coexistence. However, shared genetic determinations underlying the two diseases remain largely unknown. Methods Our analysis relied on summary statistics available from genome-wide association studies of SLE (N = 23,210) and RA (N = 58,284). We first evaluated the genetic correlation between RA and SLE through the linkage disequilibrium score regression (LDSC). Then, we performed a multiple-tissue eQTL (expression quantitative trait loci) weighted integrative analysis for each of the two diseases and aggregated association evidence across these tissues via the recently proposed harmonic mean P-value (HMP) combination strategy, which can produce a single well-calibrated P-value for correlated test statistics. Afterwards, we conducted the pleiotropy-informed association using conjunction conditional FDR (ccFDR) to identify potential pleiotropic genes associated with both RA and SLE. Results We found there existed a significant positive genetic correlation (rg = 0.404, P = 6.01E-10) via LDSC between RA and SLE. Based on the multiple-tissue eQTL weighted integrative analysis and the HMP combination across various tissues, we discovered 14 potential pleiotropic genes by ccFDR, among which four were likely newly novel genes (i.e., INPP5B, OR5K2, RP11-2C24.5, and CTD-3105H18.4). The SNP effect sizes of these pleiotropic genes were typically positively dependent, with an average correlation of 0.579. Functionally, these genes were implicated in multiple auto-immune relevant pathways such as inositol phosphate metabolic process, membrane and glucagon signaling pathway. Conclusion This study reveals common genetic components between RA and SLE and provides candidate associated loci for understanding of molecular mechanism underlying the comorbidity of the two diseases.
Collapse
Affiliation(s)
- Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Zhou Jiang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Meng Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Huashuo Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| | - Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, China.,Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, China
| |
Collapse
|
7
|
Zhang J, Sha Q, Hao H, Zhang S, Gao XR, Wang X. Test Gene-Environment Interactions for Multiple Traits in Sequencing Association Studies. Hum Hered 2020; 84:170-196. [PMID: 32417835 PMCID: PMC7351593 DOI: 10.1159/000506008] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2019] [Accepted: 01/17/2020] [Indexed: 12/15/2022] Open
Abstract
MOTIVATION The risk of many complex diseases is determined by an interplay of genetic and environmental factors. The examination of gene-environment interactions (G×Es) for multiple traits can yield valuable insights about the etiology of the disease and increase power in detecting disease-associated genes. However, the methods for testing G×Es for multiple traits are very limited. METHOD We developed novel approaches to test G×Es for multiple traits in sequencing association studies. We first perform a transformation of multiple traits by using either principal component analysis or standardization analysis. Then, we detect the effects of G×Es using novel proposed tests: testing the effect of an optimally weighted combination of G×Es (TOW-GE) and/or variable weight TOW-GE (VW-TOW-GE). Finally, we employ Fisher's combination test to combine the p values. RESULTS Extensive simulation studies show that the type I error rates of the proposed methods are well controlled. Compared to the interaction sequence kernel association test (ISKAT), TOW-GE is more powerful when there are only rare risk and protective variants; VW-TOW-GE is more powerful when there are both rare and common variants. Both TOW-GE and VW-TOW-GE are robust to directions of effects of causal G×Es. Application to the COPDGene Study demonstrates that our proposed methods are very effective. CONCLUSIONS Our proposed methods are useful tools in the identification of G×Es for multiple traits. The proposed methods can be used not only to identify G×Es for common variants, but also for rare variants. Therefore, they can be employed in identifying G×Es in both genome-wide association studies and next-generation sequencing data analyses.
Collapse
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, Texas, USA
| | - Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Han Hao
- Department of Mathematics, University of North Texas, Denton, Texas, USA
| | - Shuanglin Zhang
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, USA
| | - Xiaoyi Raymond Gao
- Department of Ophthalmology and Visual Science, The Ohio State University, Columbus, Ohio, USA
- Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Division of Human Genetics, The Ohio State University, Columbus, Ohio, USA
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, Texas, USA,
| |
Collapse
|
8
|
Zhang J, Guo X, Gonzales S, Yang J, Wang X. TS: a powerful truncated test to detect novel disease associated genes using publicly available gWAS summary data. BMC Bioinformatics 2020; 21:172. [PMID: 32366212 PMCID: PMC7199321 DOI: 10.1186/s12859-020-3511-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2019] [Accepted: 04/23/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In the last decade, a large number of common variants underlying complex diseases have been identified through genome-wide association studies (GWASs). Summary data of the GWASs are freely and publicly available. The summary data is usually obtained through single marker analysis. Gene-based analysis offers a useful alternative and complement to single marker analysis. Results from gene level association tests can be more readily integrated with downstream functional and pathogenic investigations. Most existing gene-based methods fall into two categories: burden tests and quadratic tests. Burden tests are usually powerful when the directions of effects of causal variants are the same. However, they may suffer loss of statistical power when different directions of effects exist at the causal variants. The power of quadratic tests is not affected by the directions of effects but could be less powerful due to issues such as the large number of degree of freedoms. These drawbacks of existing gene based methods motivated us to develop a new powerful method to identify disease associated genes using existing GWAS summary data. METHODS AND RESULTS In this paper, we propose a new truncated statistic method (TS) by utilizing a truncated method to find the genes that have a true contribution to the genetic association. Extensive simulation studies demonstrate that our proposed test outperforms other comparable tests. We applied TS and other comparable methods to the schizophrenia GWAS data and type 2 diabetes (T2D) GWAS meta-analysis summary data. TS identified more disease associated genes than comparable methods. Many of the significant genes identified by TS may have important mechanisms relevant to the associated traits. TS is implemented in C program TS, which is freely and publicly available online. CONCLUSIONS The proposed truncated statistic outperforms existing methods. It can be employed to detect novel traits associated genes using GWAS summary data.
Collapse
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, 76203 TX USA
| | - Xuan Guo
- Department of Computer Science and Engineering, University of North Texas, Discovery Park 3940 N. Elm, Denton, 76203 TX USA
| | - Samantha Gonzales
- Department of Computer Science and Engineering, University of North Texas, Discovery Park 3940 N. Elm, Denton, 76203 TX USA
| | - Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics School of Medicine, Emory University, Whitehead Biomedical Research Building, Suite 305K, Atlanta, 30322 GA USA
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, 1155 Union Circle #311430, Denton, 76203 TX USA
| |
Collapse
|
9
|
Zhang J, Xie S, Gonzales S, Liu J, Wang X. A fast and powerful eQTL weighted method to detect genes associated with complex trait using GWAS summary data. Genet Epidemiol 2020; 44:550-563. [PMID: 32350919 DOI: 10.1002/gepi.22297] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2019] [Revised: 04/13/2020] [Accepted: 04/14/2020] [Indexed: 02/06/2023]
Abstract
Although genomewide association studies (GWASs) have identified many genetic variants underlying complex traits, a large fraction of heritability still remains unexplained. Integrative analysis that incorporates additional information, such as expression quantitativetrait locus (eQTL) data into sequencing studies (denoted as transcriptomewide association study [TWAS]), can aid the discovery of trait-associated genetic variants. However, general TWAS methods only incorporate one eQTL-derived weight (e.g., cis-effect), and thus can suffer a substantial loss of power when the single estimated cis-effect is not predictive for the effect size of a genetic variant or when there are estimation errors in the estimated cis-effect, or if the data are not consistent with the model assumption. In this study, we propose an omnibus test (OT) which utilizes a Cauchy association test to integrate association evidence demonstrated by three different traditional tests (burden test, quadratic test, and adaptive test) using GWAS summary data with multiple eQTL-derived weights. The p value of the proposed test can be calculated analytically, and thus it is fast and efficient. We applied our proposed test to two schizophrenia (SCZ) GWAS summary data sets and two lipids trait (HDL) GWAS summary data sets. Compared with the three traditional tests, our proposed OT can identify more trait-associated genes.
Collapse
Affiliation(s)
- Jianjun Zhang
- Department of Mathematics, University of North Texas, Denton, Texas
| | - Sicong Xie
- Beijing National Day School, Beijing, China
| | - Samantha Gonzales
- Department of Computer Science and Engineering, University of North Texas, Denton, Texas
| | - Jianguo Liu
- Department of Mathematics, University of North Texas, Denton, Texas
| | - Xuexia Wang
- Department of Mathematics, University of North Texas, Denton, Texas
| |
Collapse
|