1. Jurgens SJ, Wang X, Choi SH, Weng LC, Koyama S, Pirruccello JP, Nguyen T, Smadbeck P, Jang D, Chaffin M, Walsh R, Roselli C, Elliott AL, Wijdeveld LFJM, Biddinger KJ, Kany S, Rämö JT, Natarajan P, Aragam KG, Flannick J, Burtt NP, Bezzina CR, Lubitz SA, Lunetta KL, Ellinor PT. Rare coding variant analysis for human diseases across biobanks and ancestries. Nat Genet 2024; 56:1811-1820. [PMID: 39210047 DOI: 10.1038/s41588-024-01894-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 08/01/2024] [Indexed: 09/04/2024]
Abstract
Large-scale sequencing has enabled unparalleled opportunities to investigate the role of rare coding variation in human phenotypic variability. Here, we present a pan-ancestry analysis of sequencing data from three large biobanks, including the All of Us research program. Using mixed-effects models, we performed gene-based rare variant testing for 601 diseases across 748,879 individuals, including 155,236 with ancestry dissimilar to European. We identified 363 significant associations, which highlighted core genes for the human disease phenome and identified potential novel associations, including UBR3 for cardiometabolic disease and YLPM1 for psychiatric disease. Pan-ancestry burden testing represented an inclusive and useful approach for discovery in diverse datasets, although we also highlight the importance of ancestry-specific sensitivity analyses in this setting. Finally, we found that effect sizes for rare protein-disrupting variants were concordant between samples similar to European ancestry and other genetic ancestries (βDeming = 0.7-1.0). Our results have implications for multi-ancestry and cross-biobank approaches in sequencing association studies for human disease.
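The concordance figure quoted above (βDeming) is a Deming regression slope, which allows for estimation error in both sets of effect sizes rather than only in the response. A minimal sketch, assuming equal error variances in the two groups (`delta=1.0`) and made-up per-gene effect estimates; the function name and data are illustrative and are not taken from the authors' pipeline:

```python
import math


def deming_slope(x, y, delta=1.0):
    """Deming regression slope of y on x.

    delta is the assumed ratio var(error in y) / var(error in x);
    delta=1.0 corresponds to orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    # Sample variances and covariance
    sxx = sum((xi - mx) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - my) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("zero covariance: slope is undefined")
    d = syy - delta * sxx
    return (d + math.sqrt(d * d + 4.0 * delta * sxy * sxy)) / (2.0 * sxy)


# Hypothetical per-gene effect estimates in two ancestry groups (made-up numbers)
beta_group_a = [0.5, 1.0, 1.5, 2.0]
beta_group_b = [0.4, 0.8, 1.2, 1.6]
slope = deming_slope(beta_group_a, beta_group_b)
```

A slope near 1 would indicate similar effect sizes in both groups; values below 1 indicate smaller effects in the second group.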
Affiliation(s)
- Sean J Jurgens
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Experimental Cardiology, Heart Center, Amsterdam Cardiovascular Sciences, Heart Failure and Arrhythmias, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Xin Wang
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Seung Hoan Choi
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Lu-Chen Weng
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Satoshi Koyama
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- James P Pirruccello
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Division of Cardiology, University of California, San Francisco, CA, USA
- Trang Nguyen
  - Metabolism Program, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Patrick Smadbeck
  - Metabolism Program, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Dongkeun Jang
  - Metabolism Program, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Mark Chaffin
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Roddy Walsh
  - Department of Experimental Cardiology, Heart Center, Amsterdam Cardiovascular Sciences, Heart Failure and Arrhythmias, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
- Carolina Roselli
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Amanda L Elliott
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Department of Psychiatry and Center for Genomic Medicine, Psychiatric and Neurodevelopmental Genetics Unit, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Leonoor F J M Wijdeveld
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Physiology, Amsterdam UMC location VU, Amsterdam, The Netherlands
- Kiran J Biddinger
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Shinwan Kany
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Cardiology, University Heart and Vascular Center Hamburg-Eppendorf, Hamburg, Germany
  - German Center for Cardiovascular Research (DZHK), Partner Site Hamburg/Kiel/Lübeck, Hamburg, Germany
- Joel T Rämö
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Institute for Molecular Medicine Finland (FIMM), Helsinki Institute of Life Science (HiLIFE), University of Helsinki, Helsinki, Finland
- Pradeep Natarajan
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Center for Genomic Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Krishna G Aragam
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
- Jason Flannick
  - Metabolism Program, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA, USA
  - Department of Pediatrics, Harvard Medical School, Boston, MA, USA
- Noël P Burtt
  - Metabolism Program, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
  - Program in Medical and Population Genetics, The Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Connie R Bezzina
  - Department of Experimental Cardiology, Heart Center, Amsterdam Cardiovascular Sciences, Heart Failure and Arrhythmias, Amsterdam UMC location University of Amsterdam, Amsterdam, The Netherlands
- Steven A Lubitz
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
- Kathryn L Lunetta
  - Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
  - NHLBI and Boston University's Framingham Heart Study, Framingham, MA, USA
- Patrick T Ellinor
  - Cardiovascular Disease Initiative, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cardiovascular Research Center, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA
  - Demoulas Center for Cardiac Arrhythmias, Massachusetts General Hospital, Boston, MA, USA
2. Rajabli F, Kunkle BW. Strategies in Aggregation Tests for Rare Variants. Curr Protoc 2023; 3:e931. [PMID: 37988228 DOI: 10.1002/cpz1.931] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2023]
Abstract
Genome-wide association studies (GWAS) successfully identified numerous common variants involved in complex diseases, but only limited heritability was explained by these findings. Advances in high-throughput sequencing technology made it possible to assess the contribution of rare variants in common diseases. However, study of rare variants introduces challenges due to low frequency of rare variants. Well-established common variant methods were underpowered to identify the rare variants in GWAS. To address this challenge, several new methods have been developed to examine the role of rare variants in complex diseases. These approaches are based on testing the aggregate effect of multiple rare variants in a predefined genetic region. Provided here is an overview of statistical approaches and the protocols explaining step-by-step analysis of aggregation tests, with hands-on experience using R scripts, in four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. Also explained are the concepts of rare variants, permutation tests, kernel methods, and genetic variant annotation. At the end we discuss relevant topics of bioinformatics tools for annotation, family-based design of rare-variant analysis, population stratification adjustment, and meta-analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.
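To make the idea of testing the aggregate effect of rare variants in a region concrete, here is a minimal sketch of the simplest burden-type test: each individual is collapsed to carrier or non-carrier of any rare allele in the region, and carrier status is compared between cases and controls with Fisher's exact test. It is written in Python rather than the R used in the protocol, and the function names, the 1% frequency cutoff, and the toy genotypes are all assumptions made for illustration.

```python
from math import comb


def carrier_status(genotypes, mafs, maf_cutoff=0.01):
    """Return 1 per individual carrying >= 1 minor allele at any rare variant."""
    rare = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = prob(a)
    # Sum probabilities of all tables no more likely than the observed one
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def collapsing_burden_test(genotypes, phenotype, mafs, maf_cutoff=0.01):
    """Collapse rare variants to carrier status, then test against case status."""
    carriers = carrier_status(genotypes, mafs, maf_cutoff)
    a = sum(1 for c, y in zip(carriers, phenotype) if c and y)          # carrier cases
    b = sum(1 for c, y in zip(carriers, phenotype) if c and not y)      # carrier controls
    c_ = sum(1 for c, y in zip(carriers, phenotype) if not c and y)     # non-carrier cases
    d = sum(1 for c, y in zip(carriers, phenotype) if not c and not y)  # non-carrier controls
    return (a, b, c_, d), fisher_exact_two_sided(a, b, c_, d)


# Toy data: 8 individuals x 3 variants (minor-allele counts); the third variant is common
mafs = [0.005, 0.002, 0.2]
genotypes = [
    [1, 0, 0], [0, 1, 1], [1, 0, 2], [0, 0, 1],  # cases
    [0, 0, 0], [0, 0, 1], [0, 0, 2], [0, 0, 0],  # controls
]
phenotype = [1, 1, 1, 1, 0, 0, 0, 0]
table, p_value = collapsing_burden_test(genotypes, phenotype, mafs)
```

Collapsing works best when most rare variants in the region act in the same direction; variance-component tests such as SKAT are the usual alternative when effect directions are mixed.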
Affiliation(s)
- Farid Rajabli
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
- Brian W Kunkle
  - Dr. John T. Macdonald Foundation Department of Human Genetics, John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, USA
3. Quick C, Wen X, Abecasis G, Boehnke M, Kang HM. Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis. PLoS Genet 2020; 16:e1009060. [PMID: 33320851 PMCID: PMC7737906 DOI: 10.1371/journal.pgen.1009060] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/29/2019] [Accepted: 08/18/2020] [Indexed: 11/19/2022] Open
Abstract
Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.
Affiliation(s)
- Corbin Quick
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Xiaoquan Wen
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Gonçalo Abecasis
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
  - Regeneron Genetics Center, Regeneron Pharmaceuticals, Tarrytown, NY, USA
- Michael Boehnke
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
- Hyun Min Kang
  - Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, USA
4. Svishcheva GR, Belonogova NM, Zorkoltseva IV, Kirichenko AV, Axenovich TI. Gene-based association tests using GWAS summary statistics. Bioinformatics 2020; 35:3701-3708. [PMID: 30860568 DOI: 10.1093/bioinformatics/btz172] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Received: 11/15/2018] [Revised: 02/12/2019] [Accepted: 03/11/2019] [Indexed: 01/09/2023] Open
Abstract
MOTIVATION: A huge number of genome-wide association studies (GWAS) summary statistics freely available in databases provide a new material for gene-based association analysis aimed at identifying rare genetic variants. Only a few of the many popular gene-based methods developed for individual genotype and phenotype data are adapted for the practical use of the GWAS summary statistics as input. RESULTS: We analytically prove and numerically illustrate that all popular powerful methods developed for gene-based association analysis of individual phenotype and genotype data can be modified to utilize GWAS summary statistics. We have modified and implemented all of the popular methods, including burden and kernel machine-based tests, multiple and functional linear regression, principal components analysis and others, in the R package sumFREGAT. Using real summary statistics for coronary artery disease, we show that the new package is able to detect genes not found by the existing packages. AVAILABILITY AND IMPLEMENTATION: The R package sumFREGAT is freely and publicly available at: https://CRAN.R-project.org/package=sumFREGAT. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
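As an illustration of how a gene-based test can run on summary statistics alone, the sketch below computes a burden-type statistic from per-variant z-scores, a set of weights, and a linkage disequilibrium (LD) correlation matrix: Z = wᵀz / sqrt(wᵀRw), which is standard normal under the null. This is the textbook form of the statistic and is not the sumFREGAT API; the function names and numbers are assumptions made for the example.

```python
import math


def burden_z_from_summary(z, weights, ld):
    """Burden statistic from per-variant z-scores, weights and an LD correlation matrix."""
    m = len(z)
    if len(weights) != m or len(ld) != m or any(len(row) != m for row in ld):
        raise ValueError("z, weights and ld must have matching dimensions")
    numerator = sum(w * zi for w, zi in zip(weights, z))
    # Variance of the weighted sum of correlated z-scores: w' R w
    variance = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    if variance <= 0:
        raise ValueError("w' R w must be positive")
    return numerator / math.sqrt(variance)


def two_sided_p(z_stat):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z_stat) / math.sqrt(2.0))


# Hypothetical gene with two variants in moderate LD (made-up numbers)
z_scores = [2.0, 1.0]
weights = [1.0, 1.0]
ld_matrix = [[1.0, 0.5], [0.5, 1.0]]
z_burden = burden_z_from_summary(z_scores, weights, ld_matrix)
p_burden = two_sided_p(z_burden)
```

In practice the LD matrix usually comes from a reference panel matched to the study population, and mismatched LD is a common source of inflated statistics.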
Affiliation(s)
- Gulnara R Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Vavilov Institute of General Genetics, the Russian Academy of Sciences, Moscow, Russia
- Nadezhda M Belonogova
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Irina V Zorkoltseva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Anatoly V Kirichenko
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatiana I Axenovich
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
  - Department of Biotechnology, L.K. Ernst Federal Center for Animal Husbandry, Dubrovitsy, Russia
5. Statistical Method Based on Bayes-Type Empirical Score Test for Assessing Genetic Association with Multilocus Genotype Data. Int J Genomics 2020; 2020:4708152. [PMID: 32455126 PMCID: PMC7229558 DOI: 10.1155/2020/4708152] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 12/11/2019] [Accepted: 04/21/2020] [Indexed: 12/20/2022] Open
Abstract
Simultaneous testing of multiple genetic variants for association is widely recognized as a valuable complementary approach to single-marker tests. As such, principal component regression (PCR) has been found to have competitive power. We focus on exploring a robust test for an unknown genetic mode of all SNPs, an unknown Hardy-Weinberg equilibrium (HWE) in a population, and a large number of all SNPs. First, we propose a new global test by means of the use of codominant codes for all markers and PCR. The new global test is built on an empirical Bayes-type score statistic for testing marginal associations with each single marker. The new global test gains power by robustly exploiting the Hardy-Weinberg equilibrium in the control population and effectively using linkage disequilibrium among test markers. The new global test reduces to PCR when the genotype for each marker is coded as the number of minor alleles. This connection lends insight into the power of the new global test relative to PCR and some other popular multimarker test methods. Second, we propose a robust test method based on the new global test and the ordinary PCR test built on a prospective score statistic for testing marginal associations with each single marker when the genotype for each marker is coded as the number of minor alleles by taking the minimum p value of these two tests. Finally, through extensive simulation studies and analysis of the association between pancreatic cancer and some genes of interest, we show that the proposed robust test method has desirable power and can often identify association signals that may be missed by existing methods.
6. Khalique F, Khan SA, Butt WH, Matloob I. An Integrated Approach for Spatio-Temporal Cholera Disease Hotspot Relation Mining for Public Health Management in Punjab, Pakistan. Int J Environ Res Public Health 2020; 17:ijerph17113763. [PMID: 32466471 PMCID: PMC7312960 DOI: 10.3390/ijerph17113763] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 04/24/2020] [Revised: 05/14/2020] [Accepted: 05/18/2020] [Indexed: 12/13/2022]
Abstract
Public health management can generate actionable results when diseases are studied in context with other candidate factors contributing to disease dynamics. In order to fully understand the interdependent relationships of multiple geospatial features involved in disease dynamics, it is important to construct an effective representation model that is able to reveal the relationship patterns and trends. The purpose of this work is to combine disease incidence spatio-temporal data with other features of interest in a multivariate spatio-temporal model for investigating characteristic disease and feature patterns over identified hotspots. We present an integrated approach in the form of a disease management model for analyzing spatio-temporal dynamics of disease in connection with other determinants. Our approach aligns spatio-temporal profiles of disease with other driving factors in public health context to identify hotspots and patterns of disease and features of interest in the identified locations. We evaluate our model against cholera disease outbreaks from 2015–2019 in Punjab province of Pakistan. The experimental results showed that the presented model effectively addresses the complex dynamics of disease incidences in the presence of other features of interest over a geographic area representing populations and subpopulations during a given time. The presented methodology provides an effective mechanism for identifying disease hotspots in multiple dimensions and relation between the hotspots for cost-effective and optimal resource allocation as well as a sound reference for further predictive and forecasting analysis.
7. Yang T, Kim J, Wu C, Ma Y, Wei P, Pan W. An adaptive test for meta-analysis of rare variant association studies. Genet Epidemiol 2020; 44:104-116. [PMID: 31830326 PMCID: PMC6980317 DOI: 10.1002/gepi.22273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 08/15/2019] [Revised: 11/12/2019] [Accepted: 11/25/2019] [Indexed: 01/02/2023]
Abstract
Single genome-wide studies may be underpowered to detect trait-associated rare variants with moderate or weak effect sizes. As a viable alternative, meta-analysis is widely used to increase power by combining different studies. The power of meta-analysis critically depends on the underlying association patterns and heterogeneity levels, which are unknown and vary from locus to locus. However, existing methods mainly focus on one or only a few combinations of the association pattern and heterogeneity level, thus may lose power in many situations. To address this issue, we propose a general and unified framework by combining a class of tests including and beyond some existing ones, leading to high power across a wide range of scenarios. We demonstrate that the proposed test is more powerful than some existing methods in simulation studies, then show their performance with the NHLBI Exome-Sequencing Project (ESP) data. One gene (B4GALNT2) was found by our proposed test, but not by others, to be statistically significantly associated with plasma triglyceride. The signal was driven by African-ancestry subjects but it was previously reported to be associated with coronary artery disease among European-ancestry subjects. We implemented our method in an R package aSPUmeta, publicly available at https://github.com/ytzhong/metaRV and will be on CRAN soon.
Affiliation(s)
- Tianzhong Yang
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Junghi Kim
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Chong Wu
  - Department of Statistics, Florida State University, Tallahassee, FL, USA
- Yiding Ma
  - Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
  - Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
8. Povysil G, Petrovski S, Hostyk J, Aggarwal V, Allen AS, Goldstein DB. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nat Rev Genet 2019; 20:747-759. [PMID: 31605095 DOI: 10.1038/s41576-019-0177-4] [Citation(s) in RCA: 117] [Impact Index Per Article: 23.4] [Accepted: 09/06/2019] [Indexed: 12/11/2022]
Abstract
The first phase of genome-wide association studies (GWAS) assessed the role of common variation in human disease. Advances optimizing and economizing high-throughput sequencing have enabled a second phase of association studies that assess the contribution of rare variation to complex disease in all protein-coding genes. Unlike the early microarray-based studies, sequencing-based studies catalogue the full range of genetic variation, including the evolutionarily youngest forms. Although the experience with common variants helped establish relevant standards for genome-wide studies, the analysis of rare variation introduces several challenges that require novel analysis approaches.
Affiliation(s)
- Gundula Povysil
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Slavé Petrovski
  - Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK
  - Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia
- Joseph Hostyk
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Vimla Aggarwal
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
- Andrew S Allen
  - Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
- David B Goldstein
  - Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
9. Wang L, Lee S, Qiao D, Cho MH, Silverman EK, Lange C, Won S. metaFARVAT: An Efficient Tool for Meta-Analysis of Family-Based, Case-Control, and Population-Based Rare Variant Association Studies. Front Genet 2019; 10:572. [PMID: 31275357 PMCID: PMC6593391 DOI: 10.3389/fgene.2019.00572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2018] [Accepted: 05/31/2019] [Indexed: 11/13/2022] Open
Abstract
Family-based designs have been shown to be powerful in detecting the significant rare variants associated with human diseases. However, very few significant results have been found owing to relatively small sample sizes and the fact that statistical analyses often suffer from high false-negative error rates. These limitations can be avoided by combining results from multiple studies via meta-analysis. However, statistical methods for meta-analysis with rare variants are limited for family-based samples. In this report, we propose a tool for the meta-analysis of family-based rare variant associations, metaFARVAT. metaFARVAT is based on a quasi-likelihood score for each variant. These scores are combined to generate burden test, variable-threshold test, sequence kernel association test (SKAT), and optimal SKAT statistics. The proposed method tests homogeneous and heterogeneous effects of variants among different studies and can be applied to both quantitative and dichotomous phenotypes. Simulation results demonstrated the robustness and efficiency of the proposed method in different scenarios. By applying metaFARVAT to data from a family-based study and a case-control study, we identified a few promising candidate genes, including DLEC1, which is associated with chronic obstructive pulmonary disease.
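The general idea of combining per-variant scores across studies into a burden statistic can be sketched as follows. Each study contributes a vector of per-variant score statistics and their covariance matrix; under a homogeneous-effects assumption the weighted scores and their variances are summed across studies before forming a single Z statistic. This is a generic fixed-effect sketch and not metaFARVAT's quasi-likelihood implementation; the function names, weights, and numbers are assumptions made for illustration.

```python
import math


def burden_score(u, cov, weights):
    """Collapse per-variant scores u (with covariance cov) into a burden score and variance."""
    m = len(u)
    score = sum(weights[j] * u[j] for j in range(m))
    variance = sum(weights[i] * cov[i][j] * weights[j] for i in range(m) for j in range(m))
    return score, variance


def meta_burden_z(studies, weights):
    """Fixed-effect meta-analysis burden Z from a list of (scores, covariance) pairs."""
    total_score = 0.0
    total_variance = 0.0
    for u, cov in studies:
        score, variance = burden_score(u, cov, weights)
        total_score += score
        total_variance += variance
    if total_variance <= 0:
        raise ValueError("combined variance must be positive")
    return total_score / math.sqrt(total_variance)


# Two hypothetical studies of the same two-variant gene (made-up numbers)
study_a = ([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
study_b = ([0.5, 0.5], [[2.0, 0.0], [0.0, 2.0]])
z_meta = meta_burden_z([study_a, study_b], weights=[1.0, 1.0])
```

Summing scores assumes the variants act in the same direction in every study; tests that allow heterogeneous effects combine the study-level statistics differently.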
Affiliation(s)
- Longfei Wang
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
- Sungyoung Lee
  - Center for Precision Medicine, Seoul National University Hospital, Seoul, South Korea
- Dandi Qiao
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
- Michael H Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Edwin K Silverman
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, United States
- Christoph Lange
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, United States
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Sungho Won
  - Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, South Korea
  - Department of Public Health Sciences, Seoul National University, Seoul, South Korea
  - Institute of Health and Environment, Seoul National University, Seoul, South Korea
10. Weissenkampen JD, Jiang Y, Eckert S, Jiang B, Li B, Liu DJ. Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits. Curr Protoc Hum Genet 2019; 101:e83. [PMID: 30849219 PMCID: PMC6455968 DOI: 10.1002/cphg.83] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/19/2022]
Abstract
With the advent of Next Generation Sequencing (NGS) technologies, whole genome and whole exome DNA sequencing has become affordable for routine genetic studies. Coupled with improved genotyping arrays and genotype imputation methodologies, it is increasingly feasible to obtain rare genetic variant information in large datasets. Such datasets allow researchers to gain a more complete understanding of the genetic architecture of complex traits caused by rare variants. State-of-the-art statistical methods for the statistical genetics analysis of sequence-based association, including efficient algorithms for association analysis in biobank-scale datasets, gene-association tests, meta-analysis, fine mapping methods that integrate functional genomic datasets, and phenome-wide association studies (PheWAS), are reviewed here. These methods are expected to be highly useful for next generation statistical genetics analysis in the era of precision medicine. © 2019 by John Wiley & Sons, Inc.
Affiliation(s)
- Yu Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
- Scott Eckert
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
- Bibo Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
- Bingshan Li
  - Department of Molecular Physiology and Biophysics, Vanderbilt Genetics Institute, Vanderbilt University, Nashville, TN
- Dajiang J. Liu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey PA
11.
Abstract
Analysis of genomic data is often complicated by the presence of missing values, which may arise due to cost or other reasons. The prevailing approach of single imputation is generally invalid if the imputation model is misspecified. In this paper, we propose a robust score statistic based on imputed data for testing the association between a phenotype and a genomic variable with (partially) missing values. We fit a semiparametric regression model for the genomic variable against an arbitrary function of the linear predictor in the phenotype model and impute each missing value by its estimated posterior expectation. We show that the score statistic with such imputed values is asymptotically unbiased under general missing-data mechanisms, even when the imputation model is misspecified. We develop a spline-based method to estimate the semiparametric imputation model and derive the asymptotic distribution of the corresponding score statistic with a consistent variance estimator using sieve approximation theory and empirical process theory. The proposed test is computationally feasible regardless of the number of independent variables in the imputation model. We demonstrate the advantages of the proposed method over existing methods through extensive simulation studies and provide an application to a major cancer genomics study.
Affiliation(s)
- Kin Yau Wong
  - Department of Applied Mathematics, The Hong Kong Polytechnic University, Hong Kong
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
- D Y Lin
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599, USA
12. Chien LC, Chiu YF. General retrospective mega-analysis framework for rare variant association tests. Genet Epidemiol 2018; 42:621-635. [PMID: 30188589 DOI: 10.1002/gepi.22147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2017] [Revised: 06/05/2018] [Accepted: 06/05/2018] [Indexed: 11/09/2022]
Abstract
Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.
Affiliation(s)
- Li-Chu Chien
  - Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
- Yen-Feng Chiu
  - Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, ROC
|
13
|
Jiang Y, Chen S, McGuire D, Chen F, Liu M, Iacono WG, Hewitt JK, Hokanson JE, Krauter K, Laakso M, Li KW, Lutz SM, McGue M, Pandit A, Zajac GJM, Boehnke M, Abecasis GR, Vrieze SI, Zhan X, Jiang B, Liu DJ. Proper conditional analysis in the presence of missing data: Application to large scale meta-analysis of tobacco use phenotypes. PLoS Genet 2018; 14:e1007452. [PMID: 30016313 PMCID: PMC6063450 DOI: 10.1371/journal.pgen.1007452] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/11/2017] [Revised: 07/27/2018] [Accepted: 05/25/2018] [Indexed: 11/19/2022] Open
Abstract
Meta-analysis of genetic association studies increases sample size and the power for mapping complex traits. Existing methods are mostly developed for datasets without missing values, i.e. the summary association statistics are measured for all variants in contributing studies. In practice, genotype imputation is not always effective. This may be the case when targeted genotyping/sequencing assays are used or when the un-typed genetic variant is rare. Therefore, contributed summary statistics often contain missing values. Existing methods for imputing missing summary association statistics and using imputed values in meta-analysis, approximate conditional analysis, or simple strategies such as complete case analysis all have theoretical limitations. Applying these approaches can bias genetic effect estimates and lead to seriously inflated type-I or type-II errors in conditional analysis, which is a critical tool for identifying independently associated variants. To address this challenge and complement imputation methods, we developed a method to combine summary statistics across participating studies and consistently estimate joint effects, even when the contributed summary statistics contain large amounts of missing values. Based on this estimator, we proposed a score statistic called PCBS (partial correlation based score statistic) for conditional analysis of single-variant and gene-level associations. Through extensive analysis of simulated and real data, we showed that the new method produces well-calibrated type-I errors and is substantially more powerful than existing approaches. We applied the proposed approach to one of the largest meta-analyses to date for the cigarettes-per-day phenotype. Using the new method, we identified multiple novel independently associated variants at known loci for tobacco use, which were otherwise missed by alternative methods. Together, the phenotypic variance explained by these variants was 1.1%, improving that of previously reported associations by 71%. These findings illustrate the extent of locus allelic heterogeneity and can help pinpoint causal variants.
It is of great interest to estimate the joint effects of multiple variants from large scale meta-analyses, in order to fine-map causal variants and understand the genetic architecture for complex traits. The summary association statistics from participating studies in a meta-analysis often contain missing values at some variant sites, as the imputation methods may not work well and the variants with low imputation quality will be filtered out. Missingness is especially likely when the underlying genetic variant is rare or the participating studies use a targeted genotyping array that is not suitable for imputation. Existing methods for conditional meta-analysis do not properly handle missing data, and can incorrectly estimate correlations between score statistics. As a result, they can produce highly inflated type-I errors for conditional analysis, which will result in overestimated phenotypic variance explained and incorrect identification of causal variants. We systematically evaluated this bias and proposed a novel partial correlation based score statistic. The new statistic has valid type-I errors for conditional analysis and much higher power than the existing methods, even when the contributed summary statistics contain a large fraction of missing values. We expect this method to be highly useful in the sequencing age for complex trait genetics.
Affiliation(s)
- Yu Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Sai Chen
  - Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Daniel McGuire
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Fang Chen
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
- Mengzhen Liu
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- William G. Iacono
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- John K. Hewitt
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- John E. Hokanson
  - Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, Colorado, United States of America
- Kenneth Krauter
  - Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, Colorado, United States of America
- Markku Laakso
  - Institute of Clinical Medicine, Internal Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio, Finland
- Kevin W. Li
  - Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sharon M. Lutz
  - Department of Biostatistics and Informatics, University of Colorado, Anschutz Medical Campus, Aurora, Colorado, United States of America
- Matthew McGue
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Anita Pandit
  - Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Gregory J. M. Zajac
  - Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Michael Boehnke
  - Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Goncalo R. Abecasis
  - Center of Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Scott I. Vrieze
  - Department of Psychology, University of Minnesota, Minneapolis, Minnesota, United States of America
- Xiaowei Zhan
  - Department of Clinical Science, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Bibo Jiang
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
  - * E-mail: (DJL); (BJ)
- Dajiang J. Liu
  - Department of Public Health Sciences, Penn State College of Medicine, Hershey, Pennsylvania, United States of America
  - * E-mail: (DJL); (BJ)
|
14
|
Yang J, Chen S, Abecasis G. Improved score statistics for meta-analysis in single-variant and gene-level association studies. Genet Epidemiol 2018; 42:333-343. [PMID: 29696691 DOI: 10.1002/gepi.22123] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 11/06/2017] [Revised: 03/04/2018] [Accepted: 03/16/2018] [Indexed: 01/09/2023]
Abstract
Meta-analysis is now an essential tool for genetic association studies, allowing them to combine large studies and greatly accelerating the pace of genetic discovery. Although the standard meta-analysis methods perform equivalently to the more cumbersome joint analysis under ideal settings, they result in substantial power loss under unbalanced settings with various case-control ratios. Here, we investigate the power loss problem by the standard meta-analysis methods for unbalanced studies, and further propose novel meta-analysis methods performing equivalently to the joint analysis under both balanced and unbalanced settings. We derive improved meta-score-statistics that can accurately approximate the joint-score-statistics with combined individual-level data, for both linear and logistic regression models, with and without covariates. In addition, we propose a novel approach to adjust for population stratification by correcting for known population structures through minor allele frequencies. In the simulated gene-level association studies under unbalanced settings, our method recovered up to 85% power loss caused by the standard methods. We further showed the power gain of our methods in gene-level tests with 26 unbalanced studies of age-related macular degeneration. In addition, we took the meta-analysis of three unbalanced studies of type 2 diabetes as an example to discuss the challenges of meta-analyzing multi-ethnic samples. In summary, our improved meta-score-statistics with corrections for population stratification can be used to construct both single-variant and gene-level association studies, providing a useful framework for ensuring well-powered, convenient, cross-study analyses.
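For orientation, the standard score-statistic meta-analysis that this abstract treats as its baseline can be sketched as follows. This is a minimal illustration for a single variant, assuming each study reports a score statistic U_k and its variance V_k; it is not the improved statistic the authors propose, and the function name is hypothetical.

```python
import math


def meta_score_test(scores, variances):
    """Standard score-based meta-analysis for one variant (illustrative sketch).

    scores    -- per-study score statistics U_k
    variances -- per-study variances V_k of those scores
    Returns (z, p) where z = sum(U_k) / sqrt(sum(V_k)) and p is two-sided.
    """
    if not scores or len(scores) != len(variances):
        raise ValueError("need exactly one variance per score")
    total_score = sum(scores)
    total_var = sum(variances)
    if total_var <= 0:
        raise ValueError("total variance must be positive")
    z = total_score / math.sqrt(total_var)
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

Calling `meta_score_test([3.0, 1.0], [2.0, 2.0])` pools two hypothetical studies into a single z-score. The abstract's point is that this simple pooling can lose power when case-control ratios differ across studies.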
Affiliation(s)
- Jingjing Yang
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
  - Department of Human Genetics, Center for Computational and Quantitative Genetics, Emory University School of Medicine, Atlanta, Georgia, United States of America
- Sai Chen
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Gonçalo Abecasis
  - Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
|
15
|
Abstract
Meta-analysis is a statistical technique that is widely used for improving the power to detect associations, by synthesizing data from independent studies, and is extensively used in the genomic analyses of complex traits. Estimates from different studies are combined and the results effectively provide the power of a much larger study. Meta-analysis also has the potential of discovering heterogeneity in the effects among the different studies. This chapter provides an overview of the methods used for meta-analysis of common and rare single variants and also for gene/region-based analyses; common variants are mainly identified via genome-wide association studies (GWAS) and rare variants through various types of sequencing experiments.
Affiliation(s)
- Kyriaki Michailidou
  - Department of Electron Microscopy/Molecular Pathology, The Cyprus Institute of Neurology and Genetics, Nicosia, Cyprus
|
16
|
Lu W, Wang X, Zhan X, Gazdar A. Meta-analysis approaches to combine multiple gene set enrichment studies. Stat Med 2017; 37:659-672. [PMID: 29052247 DOI: 10.1002/sim.7540] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/12/2016] [Revised: 07/02/2017] [Accepted: 09/29/2017] [Indexed: 11/09/2022]
Abstract
In the field of gene set enrichment analysis (GSEA), meta-analysis has been used to integrate information from multiple studies to present a reliable summarization of the expanding volume of individual biomedical research, as well as improve the power of detecting essential gene sets involved in complex human diseases. However, the existing Meta-Analysis for Pathway Enrichment (MAPE) methods may be subject to power loss because of (1) using gross summary statistics for combining end results from component studies and (2) using enrichment scores whose distributions depend on the set sizes. In this paper, we adapt meta-analysis approaches recently developed for genome-wide association studies, which are based on fixed-effects (FE) and random-effects (RE) models, to integrate multiple GSEA studies. We further develop a mixed strategy via adaptive testing for choosing RE versus FE models to achieve greater statistical efficiency as well as flexibility. In addition, a size-adjusted enrichment score based on a one-sided Kolmogorov-Smirnov statistic is proposed to formally account for varying set sizes when testing multiple gene sets. Our methods tend to have much better performance than the MAPE methods and can be applied to both discrete and continuous phenotypes. Specifically, the performance of the adaptive testing method seems to be the most stable in general situations.
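The contrast between fixed-effects and random-effects pooling mentioned in this abstract can be illustrated with a short sketch. It assumes per-study effect estimates and standard errors, and it uses the textbook inverse-variance estimator together with the DerSimonian-Laird between-study variance. That choice is a stand-in for illustration: it is not the authors' adaptive test, nor necessarily the RE model they adapt. Function names are hypothetical.

```python
import math


def fixed_effect(betas, ses):
    """Inverse-variance fixed-effects pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)


def random_effects(betas, ses):
    """DerSimonian-Laird random-effects pooled estimate.

    Returns (beta, se, tau2), where tau2 is the estimated
    between-study variance (truncated at zero).
    """
    k = len(betas)
    weights = [1.0 / se ** 2 for se in ses]
    beta_fe, _ = fixed_effect(betas, ses)
    # Cochran's Q measures heterogeneity around the fixed-effects estimate.
    q = sum(w * (b - beta_fe) ** 2 for w, b in zip(weights, betas))
    c = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Re-weight each study by its total (within + between) variance.
    star = [1.0 / (se ** 2 + tau2) for se in ses]
    total = sum(star)
    beta = sum(w * b for w, b in zip(star, betas)) / total
    return beta, math.sqrt(1.0 / total), tau2
```

When the studies agree, tau2 is zero and the two estimators coincide. When they disagree, the random-effects standard error widens, which is the trade-off an adaptive choice between FE and RE has to weigh.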
Affiliation(s)
- Wentao Lu
  - Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA
- Xinlei Wang
  - Department of Statistical Science, Southern Methodist University, Dallas, TX 75275, USA
- Xiaowei Zhan
  - Quantitative Biomedical Research Center, Center for the Genetics of Host Defense, Department of Clinical Science, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Adi Gazdar
  - Department of Pathology, University of Texas Southwestern Medical Center, Dallas, TX 75235, USA
|
17
|
Tang ZZ, Bunn P, Tao R, Liu Z, Lin DY. PreMeta: a tool to facilitate meta-analysis of rare-variant associations. BMC Genomics 2017; 18:160. [PMID: 28196472 PMCID: PMC5310051 DOI: 10.1186/s12864-017-3573-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/12/2016] [Accepted: 02/09/2017] [Indexed: 11/10/2022] Open
Abstract
Background: Meta-analysis is essential to the discovery of rare variants that influence complex diseases and traits. Four major software packages, namely MASS, MetaSKAT, RAREMETAL, and seqMeta, have been developed to perform meta-analysis of rare-variant associations. These packages first generate summary statistics for each study and then perform the meta-analysis by combining the summary statistics. Because of incompatible file formats and non-equivalent summary statistics, the output files from the study-level analysis of one package cannot be directly used to perform meta-analysis in another package. Results: We developed a computationally efficient software program, PreMeta, to resolve the non-compatibility of the four software packages and to facilitate meta-analysis of large-scale sequencing studies in a consortium setting. PreMeta reformats the output files of study-level summary statistics generated by the four packages (text files produced by MASS and RAREMETAL, binary files produced by MetaSKAT, and R data files produced by seqMeta) and translates the summary statistics from one form to another, such that the summary statistics from any package can be used to perform meta-analysis in any other package. With this tool, consortium members are not required to use the same software for study-level analyses. In addition, PreMeta checks for allele mismatches, corrects summary statistics, and allows the rescaled inverse normal transformation to be performed at the meta-analysis stage by rescaling summary statistics. Conclusions: PreMeta processes summary statistics from the four packages to make them compatible and avoids the need to redo study-level analyses. PreMeta documentation and executable are available at: http://dlin.web.unc.edu/software/premeta.
Affiliation(s)
- Zheng-Zheng Tang
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, 37203, TN, USA
- Paul Bunn
  - Department of Biostatistics, University of North Carolina, Chapel Hill, 37203, NC, USA
- Ran Tao
  - Department of Biostatistics, University of North Carolina, Chapel Hill, 37203, NC, USA
- Zhouwen Liu
  - Department of Biostatistics, Vanderbilt University School of Medicine, Nashville, 37203, TN, USA
- Dan-Yu Lin
  - Department of Biostatistics, University of North Carolina, Chapel Hill, 37203, NC, USA
|
18
|
Discovery of rare variants for complex phenotypes. Hum Genet 2016; 135:625-34. [PMID: 27221085 DOI: 10.1007/s00439-016-1679-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 03/02/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.
|
19
|
Zhan X, Liu DJ. SEQMINER: An R-Package to Facilitate the Functional Interpretation of Sequence-Based Associations. Genet Epidemiol 2015; 39:619-23. [PMID: 26394715 PMCID: PMC4794281 DOI: 10.1002/gepi.21918] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 05/20/2015] [Revised: 07/01/2015] [Accepted: 07/17/2015] [Indexed: 11/23/2022]
Abstract
Next-generation sequencing has enabled the study of a comprehensive catalogue of genetic variants for their impact on various complex diseases. Numerous consortia studies of complex traits have publicly released their summary association statistics, which have become an invaluable resource for learning the underlying biology, understanding the genetic architecture, and guiding clinical translations. There is great interest in the field in developing novel statistical methods for analyzing and interpreting results from these genotype-phenotype association studies. One popular platform for method development and data analysis is R. In order to enable these analyses in R, it is necessary to develop packages that can efficiently query files of summary association statistics, explore the linkage disequilibrium structure between variants, and integrate various bioinformatics databases. The complexity and scale of sequence datasets and databases pose significant computational challenges for method developers. To address these challenges and facilitate method development, we developed the R package SEQMINER for annotating and querying files of sequence variants (e.g., VCF/BCF files) and summary association statistics (e.g., METAL/RAREMETAL files), and for integrating bioinformatics databases. SEQMINER provides an infrastructure where novel methods can be distributed and applied to analyzing sequence datasets in practice. We illustrate the performance of SEQMINER using datasets from the 1000 Genomes Project. We show that SEQMINER is highly efficient and easy to use. It will greatly accelerate the process of applying statistical innovations to analyze and interpret sequence-based associations. The R package, its source code and documentation are available from http://cran.r-project.org/web/packages/seqminer and http://seqminer.genomic.codes/.
Affiliation(s)
- Xiaowei Zhan
  - Department of Clinical Sciences, Quantitative Biomedical Research Center, Center for the Genetics of Host Defense, University of Texas Southwestern Medical Center, Dallas, Texas, United States of America
- Dajiang J Liu
  - Institute for Personalized Medicine, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, United States of America
  - Division of Biostatistics, Department of Public Health Sciences, College of Medicine, Pennsylvania State University, Hershey, Pennsylvania, United States of America
|
20
|
Tao R, Zeng D, Franceschini N, North KE, Boerwinkle E, Lin DY. Analysis of Sequence Data Under Multivariate Trait-Dependent Sampling. J Am Stat Assoc 2015; 110:560-572. [PMID: 26366025 DOI: 10.1080/01621459.2015.1008099] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 12/31/2022]
Abstract
High-throughput DNA sequencing allows for the genotyping of common and rare variants for genetic association studies. At the present time and for the foreseeable future, it is not economically feasible to sequence all individuals in a large cohort. A cost-effective strategy is to sequence those individuals with extreme values of a quantitative trait. We consider the design under which the sampling depends on multiple quantitative traits. Under such trait-dependent sampling, standard linear regression analysis can result in bias of parameter estimation, inflation of type I error, and loss of power. We construct a likelihood function that properly reflects the sampling mechanism and utilizes all available data. We implement a computationally efficient EM algorithm and establish the theoretical properties of the resulting maximum likelihood estimators. Our methods can be used to perform separate inference on each trait or simultaneous inference on multiple traits. We pay special attention to gene-level association tests for rare variants. We demonstrate the superiority of the proposed methods over standard linear regression through extensive simulation studies. We provide applications to the Cohorts for Heart and Aging Research in Genomic Epidemiology Targeted Sequencing Study and the National Heart, Lung, and Blood Institute Exome Sequencing Project.
Affiliation(s)
- Ran Tao
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
- Donglin Zeng
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
- Nora Franceschini
  - Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27599
- Kari E North
  - Department of Epidemiology, University of North Carolina, Chapel Hill, NC 27599
- Eric Boerwinkle
  - Human Genetics Center, University of Texas Health Science Center, Houston, TX 77030
- Dan-Yu Lin
  - Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27599
|
21
|
Meta-analysis for Discovering Rare-Variant Associations: Statistical Methods and Software Programs. Am J Hum Genet 2015; 97:35-53. [PMID: 26094574 DOI: 10.1016/j.ajhg.2015.05.001] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/05/2015] [Accepted: 05/01/2015] [Indexed: 01/01/2023] Open
Abstract
There is heightened interest in using next-generation sequencing technologies to identify rare variants that influence complex human diseases and traits. Meta-analysis is essential to this endeavor because large sample sizes are required for detecting associations with rare variants. In this article, we provide a comprehensive overview of statistical methods for meta-analysis of sequencing studies for discovering rare-variant associations. Specifically, we discuss the calculation of relevant summary statistics from participating studies, the construction of gene-level association tests, the choice of transformation for quantitative traits, the use of fixed-effects versus random-effects models, and the removal of shadow association signals through conditional analysis. We also show that meta-analysis based on properly calculated summary statistics is as powerful as joint analysis of individual-participant data. In addition, we demonstrate the performance of different meta-analysis methods by using both simulated and empirical data. We then compare four major software packages for meta-analysis of rare-variant associations-MASS, RAREMETAL, MetaSKAT, and seqMeta-in terms of the underlying statistical methodology, analysis pipeline, and software interface. Finally, we present PreMeta, a software interface that integrates the four meta-analysis packages and allows a consortium to combine otherwise incompatible summary statistics.
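The construction of a gene-level test from summary statistics, as described in this abstract, can be sketched for the simplest case, a weighted burden test. The sketch assumes each study supplies a vector of single-variant score statistics and their covariance matrix for the gene, and that these have already been summed across studies. The function name and the default equal weights are illustrative assumptions, not the interface of MASS, RAREMETAL, MetaSKAT, or seqMeta.

```python
import math


def burden_from_summary(scores, cov, weights=None):
    """Gene-level burden test from pooled summary statistics (sketch).

    scores  -- single-variant score statistics U (length m), summed over studies
    cov     -- m x m covariance matrix V of U, summed over studies
    weights -- per-variant weights w (defaults to 1 for every variant)
    Returns (stat, p) with stat = (w'U)^2 / (w'Vw), compared to chi-square(1).
    """
    m = len(scores)
    if weights is None:
        weights = [1.0] * m
    if len(weights) != m or len(cov) != m or any(len(row) != m for row in cov):
        raise ValueError("scores, weights and cov must have matching dimensions")
    u = sum(w * s for w, s in zip(weights, scores))
    v = sum(weights[i] * cov[i][j] * weights[j]
            for i in range(m) for j in range(m))
    if v <= 0:
        raise ValueError("weighted variance must be positive")
    stat = u * u / v
    # Survival function of a chi-square with 1 df, via the normal tail.
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p
```

Because the score vectors and covariance matrices add across independent studies, pooling them before calling the function gives a fixed-effects analysis. This additivity is why a meta-analysis based on properly calculated summary statistics can match a joint analysis of individual-level data.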
|
22
|
Genetic variation in uncontrolled childhood asthma despite ICS treatment. Pharmacogenomics J 2015; 16:158-63. [PMID: 25963336 DOI: 10.1038/tpj.2015.36] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 11/06/2014] [Revised: 03/02/2015] [Accepted: 03/26/2015] [Indexed: 11/08/2022]
Abstract
Genetic variation may partly explain asthma treatment response heterogeneity. We aimed to identify common and rare genetic variants associated with asthma that was not well controlled despite inhaled corticosteroid (ICS) treatment. Data from 110 children were collected in the Children Asthma Therapy Optimal trial. Associations of genetic variation with measures of lung function (FEV1%pred), airway hyperresponsiveness (AHR) to methacholine (Mch PD20) and treatment response outcomes were analyzed using the exome chip. The 17q12-21 locus (containing ORMDL3 and GSDMB) previously associated with childhood asthma was investigated separately. Single-nucleotide polymorphisms (SNPs) in the 17q12-21 locus were found nominally associated with the outcomes. The strongest association in this region was found for rs72821893 in KRT25 with FEV1%pred (P=3.75×10^-5), Mch PD20 (P=0.00095) and Mch PD20-based treatment outcome (P=0.006). No novel single SNPs or burden tests were significantly associated with the outcomes. The 17q12-21 region was associated with FEV1%pred and AHR, and additionally with ICS treatment response.
|
23
|
Wang Q, Lu Q, Zhao H. A review of study designs and statistical methods for genomic epidemiology studies using next generation sequencing. Front Genet 2015; 6:149. [PMID: 25941534 PMCID: PMC4403555 DOI: 10.3389/fgene.2015.00149] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/25/2015] [Accepted: 03/30/2015] [Indexed: 12/22/2022] Open
Abstract
Results from numerous linkage and association studies have greatly deepened scientists’ understanding of the genetic basis of many human diseases, yet some important questions remain unanswered. For example, although a large number of disease-associated loci have been identified from genome-wide association studies in the past 10 years, it is challenging to interpret these results as most disease-associated markers have no clear functional roles in disease etiology, and all the identified genomic factors only explain a small portion of disease heritability. With the help of next-generation sequencing (NGS), diverse types of genomic and epigenetic variations can be detected with high accuracy. More importantly, instead of using linkage disequilibrium to detect association signals based on a set of pre-set probes, NGS allows researchers to directly study all the variants in each individual, therefore promises opportunities for identifying functional variants and a more comprehensive dissection of disease heritability. Although the current scale of NGS studies is still limited due to the high cost, the success of several recent studies suggests the great potential for applying NGS in genomic epidemiology, especially as the cost of sequencing continues to drop. In this review, we discuss several pioneer applications of NGS, summarize scientific discoveries for rare and complex diseases, and compare various study designs including targeted sequencing and whole-genome sequencing using population-based and family-based cohorts. Finally, we highlight recent advancements in statistical methods proposed for sequencing analysis, including group-based association tests, meta-analysis techniques, and annotation tools for variant prioritization.
Affiliation(s)
- Qian Wang
  - Program of Computational Biology and Bioinformatics, Yale University New Haven, CT, USA
- Qiongshi Lu
  - Department of Biostatistics, Yale School of Public Health New Haven, CT, USA
- Hongyu Zhao
  - Program of Computational Biology and Bioinformatics, Yale University New Haven, CT, USA
  - Department of Biostatistics, Yale School of Public Health New Haven, CT, USA
  - Veterans Affairs Cooperative Studies Program Coordinating Center West Haven, CT, USA
|
24
|
Feng S, Pistis G, Zhang H, Zawistowski M, Mulas A, Zoledziewska M, Holmen OL, Busonero F, Sanna S, Hveem K, Willer C, Cucca F, Liu DJ, Abecasis GR. Methods for association analysis and meta-analysis of rare variants in families. Genet Epidemiol 2015; 39:227-38. [PMID: 25740221 DOI: 10.1002/gepi.21892] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 10/27/2014] [Revised: 01/03/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Advances in exome sequencing and the development of exome genotyping arrays are enabling explorations of association between rare coding variants and complex traits. To ensure power for these rare variant analyses, a variety of association tests that group variants by gene or functional unit have been proposed. Here, we extend these tests to family-based studies. We develop family-based burden tests, variable frequency threshold tests and sequence kernel association tests. Through simulations, we compare the performance of different tests. We describe situations where family-based studies provide greater power than studies of unrelated individuals to detect rare variants associated with moderate to large changes in trait values. Broadly speaking, we find that when sample sizes are limited and only a modest fraction of all trait-associated variants can be identified, family samples are more powerful. Finally, we illustrate our approach by analyzing the relationship between coding variants and levels of high-density lipoprotein (HDL) cholesterol in 11,556 individuals from the HUNT and SardiNIA studies, demonstrating association for coding variants in the APOC3, CETP, LIPC, LIPG, and LPL genes and illustrating the value of family samples, meta-analysis, and gene-level tests. Our methods are implemented in freely available C++ code.
Affiliation(s)
- Shuang Feng
- Department of Biostatistics, Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
|
25
|
Vaitsiakhovich T, Drichel D, Herold C, Lacour A, Becker T. METAINTER: meta-analysis of multiple regression models in genome-wide association studies. Bioinformatics 2014; 31:151-7. [PMID: 25252781 DOI: 10.1093/bioinformatics/btu629] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/30/2022]
Abstract
MOTIVATION: Meta-analysis of summary statistics is an essential approach to guarantee the success of genome-wide association studies (GWAS). Application of the fixed or random effects model to single-marker association tests is a standard practice. More complex methods of meta-analysis involving multiple parameters have not been used frequently, a gap that could be explained by the lack of a respective meta-analysis pipeline. Meta-analysis based on combining p-values can be applied to any association test. However, to be powerful, meta-analysis methods for high-dimensional models should incorporate additional information such as study-specific properties of parameter estimates, their effect directions, standard errors and covariance structure. RESULTS: We adapted the 'method for the synthesis of linear regression slopes', recently proposed in the educational sciences, to the case of multiple logistic regression, and implemented it in a meta-analysis tool called METAINTER. The software handles models with an arbitrary number of parameters, and can directly be applied to analyze the results of single-SNP tests, global haplotype tests, tests for and under gene-gene or gene-environment interaction. Via simulations for two-SNP (single-nucleotide polymorphism) models we have shown that the proposed meta-analysis method has the correct type I error rate. Moreover, power estimates come close to those of the joint analysis of the entire sample. We conducted a real data analysis of six GWAS of type 2 diabetes, available from dbGaP (http://www.ncbi.nlm.nih.gov/gap). For each study, a genome-wide interaction analysis of all SNP pairs was performed by logistic regression tests. The results were then meta-analyzed with METAINTER. AVAILABILITY: The software is freely available and distributed under the conditions specified on http://metainter.meb.uni-bonn.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tatsiana Vaitsiakhovich
- Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn and German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, D-53105 Bonn, Germany
- Dmitriy Drichel
- Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn and German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, D-53105 Bonn, Germany
- Christine Herold
- Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn and German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, D-53105 Bonn, Germany
- André Lacour
- Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn and German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, D-53105 Bonn, Germany
- Tim Becker
- Institute for Medical Biometry, Informatics and Epidemiology, University of Bonn and German Center for Neurodegenerative Diseases (DZNE), Sigmund-Freud-Str. 25, D-53105 Bonn, Germany
|
26
|
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 658] [Impact Index Per Article: 65.8] [Received: 01/02/2014] [Indexed: 12/30/2022] Open
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
|
27
|
Feng S, Liu D, Zhan X, Wing MK, Abecasis GR. RAREMETAL: fast and powerful meta-analysis for rare variants. Bioinformatics 2014; 30:2828-9. [PMID: 24894501 PMCID: PMC4173011 DOI: 10.1093/bioinformatics/btu367] [Citation(s) in RCA: 91] [Impact Index Per Article: 9.1] [Indexed: 01/09/2023] Open
Abstract
Summary: RAREMETAL is a computationally efficient tool for meta-analysis of rare variants genotyped using sequencing or arrays. RAREMETAL facilitates analyses of individual studies, accommodates a variety of input file formats, handles related and unrelated individuals, executes both single variant and burden tests and performs conditional association analyses. Availability and implementation: http://genome.sph.umich.edu/wiki/RAREMETAL for executables, source code, documentation and tutorial. Contact: sfengsph@umich.edu or goncalo@umich.edu
Affiliation(s)
- Shuang Feng
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Dajiang Liu
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Xiaowei Zhan
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Mary Kate Wing
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
- Gonçalo R Abecasis
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
|
28
|
Tang ZZ, Lin DY. Meta-analysis of sequencing studies with heterogeneous genetic associations. Genet Epidemiol 2014; 38:389-401. [PMID: 24799183 DOI: 10.1002/gepi.21798] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 12/18/2013] [Revised: 02/05/2014] [Accepted: 02/06/2014] [Indexed: 01/06/2023]
Abstract
Recent advances in sequencing technologies have made it possible to explore the influence of rare variants on complex diseases and traits. Meta-analysis is essential to this exploration because large sample sizes are required to detect rare variants. Several methods are available to conduct meta-analysis for rare variants under fixed-effects models, which assume that the genetic effects are the same across all studies. In practice, genetic associations are likely to be heterogeneous among studies because of differences in population composition, environmental factors, phenotype and genotype measurements, or analysis method. We propose random-effects models which allow the genetic effects to vary among studies and develop the corresponding meta-analysis methods for gene-level association tests. Our methods take score statistics, rather than individual participant data, as input and thus can accommodate any study designs and any phenotypes. We produce the random-effects versions of all commonly used gene-level association tests, including burden, variable threshold, and variance-component tests. We demonstrate through extensive simulation studies that our random-effects tests are substantially more powerful than the fixed-effects tests in the presence of moderate and high between-study heterogeneity and achieve similar power to the latter when the heterogeneity is low. The usefulness of the proposed methods is further illustrated with data from National Heart, Lung, and Blood Institute Exome Sequencing Project (NHLBI ESP). The relevant software is freely available.
Affiliation(s)
- Zheng-Zheng Tang
- Department of Biostatistics, University of North Carolina, Chapel Hill, North Carolina, United States of America
|
29
|
Liu DJ, Peloso GM, Zhan X, Holmen OL, Zawistowski M, Feng S, Nikpay M, Auer PL, Goel A, Zhang H, Peters U, Farrall M, Orho-Melander M, Kooperberg C, McPherson R, Watkins H, Willer CJ, Hveem K, Melander O, Kathiresan S, Abecasis GR. Meta-analysis of gene-level tests for rare variant association. Nat Genet 2014; 46:200-4. [PMID: 24336170 PMCID: PMC3939031 DOI: 10.1038/ng.2852] [Citation(s) in RCA: 144] [Impact Index Per Article: 14.4] [Received: 03/18/2013] [Accepted: 11/20/2013] [Indexed: 12/14/2022]
Abstract
The majority of reported complex disease associations for common genetic variants have been identified through meta-analysis, a powerful approach that enables the use of large sample sizes while protecting against common artifacts due to population structure and repeated small-sample analyses sharing individual-level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the focus of analysis. Here we propose and evaluate new approaches for performing meta-analysis of rare variant association tests, including burden tests, weighted burden tests, variable-threshold tests and tests that allow variants with opposite effects to be grouped together. We show that our approach retains useful features from single-variant meta-analysis approaches and demonstrate its use in a study of blood lipid levels in ∼18,500 individuals genotyped with exome arrays.
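As a rough illustration of the kind of calculation this abstract describes, the sketch below combines per-study score statistics into a single fixed-effects burden test for one gene. It is a minimal sketch under stated assumptions and not the authors' implementation: the function name `burden_meta`, the input layout, and the toy numbers are all hypothetical, and it assumes each study shares a score vector and its covariance matrix for the same ordered set of variants.

```python
import math

def burden_meta(scores, covs, weights):
    """Fixed-effects burden meta-analysis from per-study score statistics.

    scores:  list of per-study score vectors U_k (one entry per variant)
    covs:    list of per-study covariance matrices V_k (lists of rows)
    weights: per-variant weights w (all ones gives an unweighted burden test)
    Returns (chi-square statistic with 1 df, p-value).
    Hypothetical helper written for illustration only.
    """
    m = len(weights)
    # Sum score vectors and covariance matrices across studies.
    u = [sum(s[j] for s in scores) for j in range(m)]
    v = [[sum(c[i][j] for c in covs) for j in range(m)] for i in range(m)]
    # Collapse the variants into one weighted burden score and its variance.
    num = sum(weights[j] * u[j] for j in range(m))
    var = sum(weights[i] * v[i][j] * weights[j]
              for i in range(m) for j in range(m))
    stat = num * num / var
    # Upper tail of a 1-df chi-square: P(X > t) = erfc(sqrt(t / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))

# Toy inputs for two studies and two variants (made up for illustration).
stat, p = burden_meta(
    scores=[[1.0, 2.0], [2.0, 1.0]],
    covs=[[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
    weights=[1.0, 1.0],
)
```

Because only summed score statistics and covariances enter the calculation, no individual-level data need to be shared between studies, which is the property the abstract highlights for meta-analysis.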
Affiliation(s)
- Dajiang J. Liu
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Gina M. Peloso
- Broad Institute of Harvard and MIT, Cambridge, MA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
- Xiaowei Zhan
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Oddgeir L. Holmen
- Department of Public Health and General Practice, Norwegian University of Science and Technology, Trondheim 7489, Norway
- St. Olav Hospital, Trondheim University Hospital, Trondheim, Norway
- Matthew Zawistowski
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Shuang Feng
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Majid Nikpay
- University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Paul L. Auer
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
- School of Public Health, University of Wisconsin-Milwaukee
- Anuj Goel
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
- Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- He Zhang
- Division of Cardiology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109
- Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
- Department of Epidemiology, University of Washington School of Public Health, Seattle, WA
- Martin Farrall
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
- Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- Marju Orho-Melander
- Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- Department of Clinical Sciences, Lund University, Malmö, Sweden
- Charles Kooperberg
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
- Department of Biostatistics, University of Washington School of Public Health, Seattle, WA
- Ruth McPherson
- University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Hugh Watkins
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
- Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- Cristen J. Willer
- Division of Cardiology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109
- Kristian Hveem
- Department of Public Health and General Practice, Norwegian University of Science and Technology, Trondheim 7489, Norway
- Levanger Hospital, Levanger, Norway
- Olle Melander
- Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- Department of Clinical Sciences, Lund University, Malmö, Sweden
- Sekar Kathiresan
- Broad Institute of Harvard and MIT, Cambridge, MA
- Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
- Harvard Medical School, Cambridge, MA
- Gonçalo R. Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
|
30
|
Panoutsopoulou K, Tachmazidou I, Zeggini E. In search of low-frequency and rare variants affecting complex traits. Hum Mol Genet 2013; 22:R16-21. [PMID: 23922232 PMCID: PMC3782074 DOI: 10.1093/hmg/ddt376] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Indexed: 11/13/2022] Open
Abstract
The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common and rare variants. Targeted genotyping arrays and next-generation sequencing technologies at the whole-genome (WGS) and whole-exome (WES) scales are increasingly employed to access sequence variation across the full minor allele frequency (MAF) spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate, and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve, with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.
Affiliation(s)
- Eleftheria Zeggini
- To whom correspondence should be addressed at: Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. Tel: +44-1223496868; Fax: +44-1223496826.
|