301
Martín JE, Bossini-Castillo L, Martín J. Unraveling the genetic component of systemic sclerosis. Hum Genet 2012; 131:1023-37. [PMID: 22218928 DOI: 10.1007/s00439-011-1137-z] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Received: 10/24/2011] [Accepted: 12/21/2011] [Indexed: 02/07/2023]
Abstract
Systemic sclerosis (SSc) is a severe connective tissue disorder characterized by extensive fibrosis, vascular damage, and autoimmune events. In recent years, the number of genetic markers convincingly associated with SSc has increased exponentially. In this report, we aim to offer an updated review of the classical and novel genetic associations with SSc, analyzing the firmest and replicated signals within HLA and non-HLA genes, identified by both candidate gene and genome-wide association (GWA) studies. We also provide insight into the future perspectives and approaches that might shed more light on the complex genetic background underlying SSc. Despite the remarkable advances in SSc genetics during the last decade, new genetic technologies such as next generation sequencing (NGS), together with deep phenotyping of the study cohorts, are imperative to fully characterize the genetic component of this disease.
Affiliation(s)
- José Ezequiel Martín
- Instituto de Parasitología y Biomedicina López-Neyra, IPBLN-CSIC, Consejo Superior de Investigaciones Científicas, Parque Tecnológico Ciencias de la Salud, Avenida del Conocimiento, 18100-Armilla, Granada, Spain
302
Birnbaum RY, Hayashi G, Cohen I, Poon A, Chen H, Lam ET, Kwok PY, Birk OS, Liao W. Association analysis identifies ZNF750 regulatory variants in psoriasis. BMC Med Genet 2011; 12:167. [PMID: 22185198 PMCID: PMC3274454 DOI: 10.1186/1471-2350-12-167] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 06/08/2011] [Accepted: 12/20/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mutations in the ZNF750 promoter and coding regions have been previously associated with Mendelian forms of psoriasis and psoriasiform dermatitis. ZNF750 encodes a putative zinc finger transcription factor that is highly expressed in keratinocytes and represents a candidate psoriasis gene. METHODS We examined whether ZNF750 variants were associated with psoriasis in a large case-control population. We sequenced the promoter and exon regions of ZNF750 in 716 Caucasian psoriasis cases and 397 Caucasian controls. RESULTS We identified a total of 47 variants, including 38 rare variants of which 35 were novel. Association testing identified two ZNF750 haplotypes associated with psoriasis (p < 0.05). We also identified an excess of rare promoter and 5' untranslated region (UTR) variants in psoriasis cases compared to controls (p = 0.041), whereas there was no significant difference in the number of rare coding and rare 3' UTR variants. Using a promoter functional assay in stimulated human primary keratinocytes, we showed that four ZNF750 promoter and 5' UTR variants displayed a 35-55% reduction of ZNF750 promoter activity, consistent with the promoter activity reduction seen in a Mendelian psoriasis family with a ZNF750 promoter variant. However, the rare promoter and 5' UTR variants identified in this study did not strictly segregate with the psoriasis phenotype within families. CONCLUSIONS Two haplotypes of ZNF750 and rare 5' regulatory variants of ZNF750 were found to be associated with psoriasis. These rare 5' regulatory variants, though not causal, might serve as genetic modifiers of psoriasis.
Affiliation(s)
- Ramon Y Birnbaum
- Department of Dermatology, University of California, San Francisco, California, USA
303
Genetic influences in childhood obesity: recent progress and recommendations for experimental designs. Int J Obes (Lond) 2011; 36:479-84. [PMID: 22158269 DOI: 10.1038/ijo.2011.236] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 01/08/2023]
Abstract
The increasing prevalence of pediatric obesity around the world has become an area of scientific interest because of public health concern. Although body weight might be heavily influenced by an individual's behavior from the early stages of the lifespan, epidemiological research highlights the contribution of genetic influences to variation in fat accumulation and thus body composition. Results from genome-wide association studies and candidate gene approaches have identified specific regions across the human genome influencing obesity-related phenotypes. The scientific literature supports the view that, at the conceptual level, scientists understand that genes and environments do not act independently but rather synergistically, and that such interaction might be responsible for differences within and among populations. However, there is still limited understanding of the genetic and environmental factors influencing fat accumulation and deposition among different populations, which highlights the need for innovative experimental designs, improved body composition measures and appropriate statistical methodology.
304
Khetarpal SA, Edmondson AC, Raghavan A, Neeli H, Jin W, Badellino KO, Demissie S, Manning AK, DerOhannessian SL, Wolfe ML, Cupples LA, Li M, Kathiresan S, Rader DJ. Mining the LIPG allelic spectrum reveals the contribution of rare and common regulatory variants to HDL cholesterol. PLoS Genet 2011; 7:e1002393. [PMID: 22174694 PMCID: PMC3234219 DOI: 10.1371/journal.pgen.1002393] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 06/21/2011] [Accepted: 10/07/2011] [Indexed: 11/18/2022]
Abstract
Genome-wide association studies (GWAS) have successfully identified loci associated with quantitative traits, such as blood lipids. Deep resequencing studies are being utilized to catalogue the allelic spectrum at GWAS loci. The goal of these studies is to identify causative variants and missing heritability, including heritability due to low frequency and rare alleles with large phenotypic impact. Whereas rare variant efforts have primarily focused on nonsynonymous coding variants, we hypothesized that noncoding variants in these loci are also functionally important. Using the HDL-C gene LIPG as an example, we explored the effect of regulatory variants identified through resequencing of subjects at HDL-C extremes on gene expression, protein levels, and phenotype. Resequencing a portion of the LIPG promoter and 5' UTR in human subjects with extreme HDL-C, we identified several rare variants in individuals from both extremes. Luciferase reporter assays were used to measure the effect of these rare variants on LIPG expression. Variants conferring opposing effects on gene expression were enriched in opposite extremes of the phenotypic distribution. Minor alleles of a common regulatory haplotype and noncoding GWAS SNPs were associated with reduced plasma levels of the LIPG gene product endothelial lipase (EL), consistent with its role in HDL-C catabolism. Additionally, we found that a common nonfunctional coding variant associated with HDL-C (rs2000813) is in linkage disequilibrium with a 5' UTR variant (rs34474737) that decreases LIPG promoter activity. We attribute the observed association of the coding variant with plasma EL levels and HDL-C to the gene regulatory role of rs34474737. Taken together, the findings show that both rare and common noncoding regulatory variants are important contributors to the allelic spectrum in complex trait loci.
Affiliation(s)
- Sumeet A. Khetarpal
- Institute for Translational Medicine and Therapeutics, Institute for Diabetes, Obesity, and Metabolism, and Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Andrew C. Edmondson
- Institute for Translational Medicine and Therapeutics, Institute for Diabetes, Obesity, and Metabolism, and Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Avanthi Raghavan
- Institute for Translational Medicine and Therapeutics, Institute for Diabetes, Obesity, and Metabolism, and Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Hemanth Neeli
- Section of Hospital Medicine, Temple University Hospital, Philadelphia, Pennsylvania, United States of America
- Weijun Jin
- Department of Cell Biology, State University of New York Downstate Medical Center, Brooklyn, New York, United States of America
- Karen O. Badellino
- University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, United States of America
- Serkalem Demissie
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, Massachusetts, United States of America
- Alisa K. Manning
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
- Stephanie L. DerOhannessian
- Institute for Translational Medicine and Therapeutics, Institute for Diabetes, Obesity, and Metabolism, and Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Megan L. Wolfe
- Institute for Translational Medicine and Therapeutics, Institute for Diabetes, Obesity, and Metabolism, and Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- L. Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
- Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, Massachusetts, United States of America
- Mingyao Li
- Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
- Sekar Kathiresan
- Cardiovascular Research Center and Center for Human Genetic Research, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
- Daniel J. Rader
- Institute for Translational Medicine and Therapeutics, Institute for Diabetes, Obesity, and Metabolism, and Cardiovascular Institute, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America
305
Ramsey LB, Bruun GH, Yang W, Treviño LR, Vattathil S, Scheet P, Cheng C, Rosner GL, Giacomini KM, Fan Y, Sparreboom A, Mikkelsen TS, Corydon TJ, Pui CH, Evans WE, Relling MV. Rare versus common variants in pharmacogenetics: SLCO1B1 variation and methotrexate disposition. Genome Res 2011; 22:1-8. [PMID: 22147369 DOI: 10.1101/gr.129668.111] [Citation(s) in RCA: 209] [Impact Index Per Article: 14.9] [Indexed: 02/07/2023]
Abstract
Methotrexate is used to treat autoimmune diseases and malignancies, including acute lymphoblastic leukemia (ALL). Inter-individual variation in clearance of methotrexate results in heterogeneous systemic exposure, clinical efficacy, and toxicity. In a genome-wide association study of children with ALL, we identified SLCO1B1 as harboring multiple common polymorphisms associated with methotrexate clearance. The extent of influence of rare versus common variants on pharmacogenomic phenotypes remains largely unexplored. We tested the hypothesis that rare variants in SLCO1B1 could affect methotrexate clearance and compared the influence of common versus rare variants in addition to clinical covariates on clearance. From deep resequencing of SLCO1B1 exons in 699 children, we identified 93 SNPs, 15 of which were non-synonymous (NS). Three of these NS SNPs were common, with a minor allele frequency (MAF) >5%, one had low frequency (MAF 1%-5%), and 11 were rare (MAF <1%). NS SNPs (common or rare) predicted to be functionally damaging were more likely to be found among patients with the lowest methotrexate clearance than patients with high clearance. We verified lower function in vitro of four SLCO1B1 haplotypes that were associated with reduced methotrexate clearance. In a multivariate stepwise regression analysis adjusting for other genetic and non-genetic covariates, SLCO1B1 variants accounted for 10.7% of the population variability in clearance. Of that variability, common NS variants accounted for the majority, but rare damaging NS variants constituted 17.8% of SLCO1B1's effects (1.9% of total variation) and had larger effect sizes than common NS variants. Our results show that rare variants are likely to have an important effect on pharmacogenetic phenotypes.
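The percentages quoted in this abstract can be checked with simple arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, the MAF bins follow the cut-offs stated in the abstract (common >5%, low frequency 1%-5%, rare <1%), and the treatment of values lying exactly on a boundary is an assumption.

```python
# Illustrative check of figures quoted in the abstract; all names are hypothetical.

def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency (MAF) using the abstract's cut-offs.

    Assumption: values exactly at 0.01 or 0.05 fall in the low-frequency bin.
    """
    if maf > 0.05:
        return "common"
    if maf >= 0.01:
        return "low_frequency"
    return "rare"

# Counts of non-synonymous SNPs per frequency class, as reported (3 + 1 + 11 = 15).
ns_counts = {"common": 3, "low_frequency": 1, "rare": 11}

slco1b1_share = 0.107       # SLCO1B1 variants: 10.7% of variability in clearance
rare_share_within = 0.178   # rare damaging NS variants: 17.8% of SLCO1B1's effect

# Share of total variability attributable to rare damaging NS variants.
rare_share_of_total = slco1b1_share * rare_share_within
print(f"{rare_share_of_total:.1%}")  # prints 1.9%
```

The product matches the "1.9% of total variation" reported in the abstract.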
Affiliation(s)
- Laura B Ramsey
- Pharmaceutical Sciences Department, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
306
Mechanic LE, Chen HS, Amos CI, Chatterjee N, Cox NJ, Divi RL, Fan R, Harris EL, Jacobs K, Kraft P, Leal SM, McAllister K, Moore JH, Paltoo DN, Province MA, Ramos EM, Ritchie MD, Roeder K, Schaid DJ, Stephens M, Thomas DC, Weinberg CR, Witte JS, Zhang S, Zöllner S, Feuer EJ, Gillanders EM. Next generation analytic tools for large scale genetic epidemiology studies of complex diseases. Genet Epidemiol 2011; 36:22-35. [PMID: 22147673 DOI: 10.1002/gepi.20652] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.4] [Indexed: 01/19/2023]
Abstract
Over the past several years, genome-wide association studies (GWAS) have succeeded in identifying hundreds of genetic markers associated with common diseases. However, most of these markers confer relatively small increments of risk and explain only a small proportion of familial clustering. To identify obstacles to future progress in genetic epidemiology research and provide recommendations to NIH for overcoming these barriers, the National Cancer Institute sponsored a workshop entitled "Next Generation Analytic Tools for Large-Scale Genetic Epidemiology Studies of Complex Diseases" on September 15-16, 2010. The goal of the workshop was to facilitate discussions on (1) statistical strategies and methods to efficiently identify genetic and environmental factors contributing to the risk of complex disease; and (2) how to develop, apply, and evaluate these strategies for the design, analysis, and interpretation of large-scale complex disease association studies in order to guide NIH in setting the future agenda in this area of research. The workshop was organized as a series of short presentations covering scientific (gene-gene and gene-environment interaction, complex phenotypes, and rare variants and next generation sequencing) and methodological (simulation modeling and computational resources and data management) topic areas. Specific needs to advance the field were identified during each session and are summarized.
Affiliation(s)
- Leah E Mechanic
- Epidemiology and Genetics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, NIH, Bethesda, Maryland 20892, USA.
307
Johansen CT, Wang J, McIntyre AD, Martins RA, Ban MR, Lanktree MB, Huff MW, Péterfy M, Mehrabian M, Lusis AJ, Kathiresan S, Anand SS, Yusuf S, Lee AH, Glimcher LH, Cao H, Hegele RA. Excess of rare variants in non-genome-wide association study candidate genes in patients with hypertriglyceridemia. Circ Cardiovasc Genet 2011; 5:66-72. [PMID: 22135386 DOI: 10.1161/circgenetics.111.960864] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Indexed: 01/14/2023]
Abstract
BACKGROUND Rare variant accumulation studies can implicate genes in disease susceptibility when a significant burden is observed in patients versus control subjects. Such analyses might be particularly useful for candidate genes that are selected based on experiments other than genome-wide association studies (GWAS). We sought to determine whether rare variants in non-GWAS candidate genes identified from mouse models and human mendelian syndromes of hypertriglyceridemia (HTG) accumulate in patients with polygenic adult-onset HTG. METHODS AND RESULTS We resequenced protein coding regions of 3 genes with established roles (APOC2, GPIHBP1, LMF1) and 2 genes recently implicated (CREB3L3 and ZHX3) in TG metabolism. We identified 41 distinct heterozygous rare variants, including 29 singleton variants, in the combined sample; in total, we observed 47 rare variants in 413 HTG patients versus 16 in 324 control subjects (odds ratio=2.3; P=0.0050). Post hoc assessment of genetic burden in individual genes using 3 different tests suggested that the genetic burden was most prominent in the established genes LMF1 and APOC2, and also in the recently identified CREB3L3 gene. CONCLUSIONS These extensive resequencing studies show a significant accumulation of rare genetic variants in non-GWAS candidate genes among patients with polygenic HTG, and indicate the importance of testing specific hypotheses in large-scale resequencing studies.
Affiliation(s)
- Christopher T Johansen
- Department of Biochemistry, Robarts Research Institute, University of Western Ontario, London, ON, Canada
308
Mangold E, Ludwig KU, Nöthen MM. Breakthroughs in the genetics of orofacial clefting. Trends Mol Med 2011; 17:725-33. [PMID: 21885341 DOI: 10.1016/j.molmed.2011.07.007] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.9] [Received: 05/18/2011] [Revised: 07/21/2011] [Accepted: 07/21/2011] [Indexed: 01/03/2023]
309
Yi N, Liu N, Zhi D, Li J. Hierarchical generalized linear models for multiple groups of rare and common variants: jointly estimating group and individual-variant effects. PLoS Genet 2011; 7:e1002382. [PMID: 22144906 PMCID: PMC3228815 DOI: 10.1371/journal.pgen.1002382] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Received: 06/23/2011] [Accepted: 09/29/2011] [Indexed: 12/19/2022]
Abstract
Complex diseases and traits are likely influenced by many common and rare genetic variants and environmental factors. Detecting disease susceptibility variants is a challenging task, especially when their frequencies are low and/or their effects are small or moderate. We propose here a comprehensive hierarchical generalized linear model framework for simultaneously analyzing multiple groups of rare and common variants and relevant covariates. The proposed hierarchical generalized linear models introduce a group effect and a genetic score (i.e., a linear combination of main-effect predictors for genetic variants) for each group of variants, and jointly they estimate the group effects and the weights of the genetic scores. This framework includes various previous methods as special cases, and it can effectively deal with both risk and protective variants in a group and can simultaneously estimate the cumulative contribution of multiple variants and their relative importance. Our computational strategy is based on extending the standard procedure for fitting generalized linear models in the statistical software R to the proposed hierarchical models, leading to the development of stable and flexible tools. The methods are illustrated with sequence data in gene ANGPTL4 from the Dallas Heart Study. The performance of the proposed procedures is further assessed via simulation studies. The methods are implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/).
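The structure described in this abstract (a group effect multiplied by a genetic score for each group of variants) can be written out as a sketch. The notation below is assumed for illustration and is not taken from the paper: $y_i$ is the phenotype of individual $i$, $\mathbf{x}_i$ the covariates, $z_{ijk}$ the coded genotype of variant $j$ in group $k$, and $g$ the link function.

```latex
\begin{aligned}
g\big(\mathbb{E}[y_i]\big) &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}
  + \sum_{k=1}^{K} \gamma_k \, s_{ik}, \\
s_{ik} &= \sum_{j=1}^{J_k} w_{kj} \, z_{ijk}.
\end{aligned}
```

In this sketch $\gamma_k$ plays the role of the group effect, $s_{ik}$ the genetic score, and $w_{kj}$ the variant weights estimated jointly with $\gamma_k$. Weights of mixed sign would accommodate both risk and protective variants within a group, and fixing every $w_{kj}$ to 1 would reduce the score to a simple minor-allele count for the group.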
Affiliation(s)
- Nengjun Yi
- Department of Biostatistics, Section on Statistical Genetics, University of Alabama at Birmingham, Birmingham, Alabama, USA.
310
Abstract
Advances in genotyping and sequencing technologies have revolutionized the genetics of complex disease by locating rare and common variants that influence an individual's risk for diseases, such as diabetes, cancers, and psychiatric disorders. However, capitalizing on these data for prevention and therapies requires the identification of causal alleles and a mechanistic understanding of how these variants contribute to the disease. After discussing the strategies currently used to map variants for complex diseases, this Primer explores how variants may be prioritized for follow-up functional studies and the challenges and approaches for assessing the contributions of rare and common variants to disease phenotypes.
311
Zhang Q, Chung D, Kraja A, Borecki II, Province MA. Methods for adjusting population structure and familial relatedness in association test for collective effect of multiple rare variants on quantitative traits. BMC Proc 2011; 5 Suppl 9:S35. [PMID: 22373066 PMCID: PMC3287871 DOI: 10.1186/1753-6561-5-s9-s35] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Because of the low frequency of rare genetic variants in observed data, the statistical power to detect their associations with target traits is usually low. The collapsing test of the collective effect of multiple rare variants is an important and useful strategy to increase power; in addition, family data may be enriched with causal rare variants and therefore provide extra power. However, when family data are used, both population structure and familial relatedness need to be adjusted for to avoid inflation of false positives. Using a unified mixed linear model and family data, we compared six methods to detect the association between multiple rare variants and quantitative traits. Through the analysis of 200 replications of the quantitative trait Q2 from the Genetic Analysis Workshop 17 data set simulated for 697 subjects from 8 extended families, and based on quantile-quantile plots under the null and receiver operating characteristic curves, we compared the false-positive rate and power of these methods. We observed that adjusting for pedigree-based kinship gives the best control of the false-positive rate, whereas adjusting for marker-based identity by state performs slightly better in terms of power. An adjustment based on a principal components analysis slightly improves the false-positive rate and power. Taking into account type I error, power, and computational efficiency, we find that adjusting for pedigree-based kinship seems to be a good choice for the collective test of association between multiple rare variants and quantitative traits using family data.
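The collapsing idea mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' mixed-model method: it sums rare minor alleles per individual into a burden score and fits an ordinary least-squares slope, with no adjustment for population structure or familial relatedness (the adjustment that the paper is actually about). The function names, the toy data, and the MAF threshold are all hypothetical.

```python
from typing import List, Sequence

def minor_allele_freqs(genotypes: Sequence[Sequence[int]]) -> List[float]:
    """MAF per variant from a matrix of minor-allele counts (0, 1 or 2)."""
    n = len(genotypes)
    n_variants = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]

def collapse_rare(genotypes: Sequence[Sequence[int]], maf_threshold: float) -> List[int]:
    """Burden score: number of minor alleles carried at variants below the threshold."""
    mafs = minor_allele_freqs(genotypes)
    rare = [j for j, f in enumerate(mafs) if 0 < f < maf_threshold]
    return [sum(row[j] for j in rare) for row in genotypes]

def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on x (unadjusted; ignores relatedness)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("burden score has no variation")
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

# Toy data: 6 individuals x 3 variants; the third variant is common.
genotypes = [
    [1, 0, 1],
    [0, 0, 1],
    [0, 0, 2],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
]
trait = [2.0, 1.0, 1.2, 2.2, 0.9, 0.9]

# Threshold chosen for the toy sample size only, not a recommended value.
burden = collapse_rare(genotypes, maf_threshold=0.1)
effect = ols_slope(burden, trait)
```

With family data, the slope would instead be estimated in a mixed model with a kinship or identity-by-state covariance term, which is the comparison the abstract reports.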
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, 4444 Forest Park Boulevard, St. Louis, MO 63108, USA.
312
Abstract
A number of studies have been conducted to investigate the predictive value of common genetic variants for complex diseases. To date, these studies have generally shown that common variants have no appreciable added predictive value over classical risk factors. New sequencing technology has enhanced the ability to identify rare variants that may have larger functional effects than common variants. One would expect rare variants to improve the discrimination power for disease risk by permitting more detailed quantification of genetic risk. Using the Genetic Analysis Workshop 17 simulated data sets for unrelated individuals, we evaluate the predictive value of rare variants by comparing prediction models built using the support vector machine algorithm with or without rare variants. Empirical results suggest that rare variants have appreciable effects on disease risk prediction.
Affiliation(s)
- Chengqing Wu
- Department of Epidemiology and Public Health, Yale University, 60 College Street, New Haven, CT 06510, USA.
313
Alfonso de Almeida MA, Vançan Russo Horimoto AR, Lopes de Oliveira PS, Krieger JE, da Costa Pereira A. Different approaches for dealing with rare variants in family-based genetic studies: application of a Genetic Analysis Workshop 17 problem. BMC Proc 2011; 5 Suppl 9:S78. [PMID: 22373261 PMCID: PMC3287918 DOI: 10.1186/1753-6561-5-s9-s78] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/09/2023]
Abstract
Rare variants are becoming the new candidates in the search for genetic variants that predispose individuals to a phenotype of interest. Their low prevalence in a population requires the development of dedicated detection and analytical methods. A family-based approach could greatly enhance their detection and interpretation because rare variants are nearly family specific. In this report, we test several distinct approaches for analyzing the information provided by rare and common variants and how they can be effectively used to pinpoint putative candidate genes for follow-up studies. The analyses were performed on the mini-exome data set provided by Genetic Analysis Workshop 17. Eight approaches were tested, four using the trait's heritability estimates and four using QTDT models. The sensitivity, specificity, and positive and negative predictive values of these methods were compared in light of the simulation parameters. Our results highlight important limitations of current methods for dealing with rare and common variants: all methods showed reduced specificity and were consequently prone to false-positive associations. Methods analyzing common-variant information showed enhanced sensitivity compared with rare-variant methods. Furthermore, our limited knowledge of the use of biological databases for gene annotations, possibly for use as covariates in regression models, imposes a barrier to further research.
Affiliation(s)
- Marcio Augusto Alfonso de Almeida
- Laboratory of Genetic and Molecular Cardiology, Heart Institute, University of Sao Paulo Medical School, Av. Dr. Eneas C Aguiar, 44-10 andar, São Paulo 05403-000, Brazil.
314
Abstract
The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.
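The first step described in this abstract, ranking markers by their F-statistic, can be sketched as follows. The sketch assumes that "F-statistic" means a one-way ANOVA F comparing trait values across genotype groups; the abstract does not spell this out. The sliced inverse regression step is not shown, and the function names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def f_statistic(genotype: Sequence[int], trait: Sequence[float]) -> float:
    """One-way ANOVA F of the trait across genotype groups (assumed form)."""
    groups: Dict[int, List[float]] = {}
    for g, y in zip(genotype, trait):
        groups.setdefault(g, []).append(y)
    k, n = len(groups), len(trait)
    if k < 2 or n <= k:
        return 0.0  # marker is monomorphic or too few observations
    grand = sum(trait) / n
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    ss_between = sum(len(v) * (means[g] - grand) ** 2 for g, v in groups.items())
    ss_within = sum((y - means[g]) ** 2 for g, v in groups.items() for y in v)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def rank_markers(markers: Dict[str, Sequence[int]],
                 trait: Sequence[float]) -> List[Tuple[str, float]]:
    """Markers sorted from largest to smallest F-statistic."""
    scored = [(name, f_statistic(g, trait)) for name, g in markers.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy data: marker A tracks the trait closely, marker B does not.
trait = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]
markers = {"A": [0, 0, 0, 1, 1, 1], "B": [0, 1, 0, 1, 0, 1]}
ranked = rank_markers(markers, trait)
```

In the procedure the abstract outlines, only the top-ranked markers from a list like `ranked` would be passed on to the sliced inverse regression stage.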
315
Abstract
Next-generation sequencing allows for a new focus on rare variant density for conducting analyses of association to disease and for narrowing down the genomic regions that show evidence of functionality. In this study we use the 1000 Genomes Project pilot data as distributed by Genetic Analysis Workshop 17 to compare rare variant densities across seven populations. We made the comparisons using regressions of rare variants on total variant counts per gene for each population and Tajima's D values calculated for each gene in each population, using data on 3,205 genes. We found that the populations clustered by continent for both the regression slopes and Tajima's D values, with the African populations (Yoruba and Luhya) showing the highest density of rare variants, followed by the Asian populations (Han and Denver Chinese followed by the Japanese) and the European populations (CEPH [European-descent] and Tuscan) with the lowest densities. These significant differences in rare variant densities across populations seem to translate to measures of the rare variant density more commonly used in rare variant association analyses, suggesting the need to adjust for ancestry in such analyses. The selection signal was high for AHNAK, HLA-A, RANBP2, and RGPD4, among others. RANBP2 and RGPD4 showed a marked difference in rare variant density and potential selection between the Luhya and the other populations. This may suggest that differences between populations should be considered when delimiting genomic regions according to functionality and that these differences can create potential for disease heterogeneity.
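For readers unfamiliar with the statistic used in this abstract, the sketch below computes Tajima's D from its standard definition, given the number of sequences, the number of segregating sites, and the mean pairwise difference. It is an illustration only: the function names and inputs are hypothetical, and the abstract does not describe how the authors computed their per-gene values.

```python
import math
from typing import Tuple

def harmonic_sums(n: int) -> Tuple[float, float]:
    """Return a1 = sum(1/i) and a2 = sum(1/i^2) for i = 1 .. n-1."""
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    return a1, a2

def tajimas_d(n: int, seg_sites: int, pi: float) -> float:
    """Tajima's D for n sequences, seg_sites segregating sites, mean pairwise difference pi."""
    if n < 4:
        raise ValueError("need at least 4 sequences for a non-zero variance term")
    if seg_sites < 1:
        raise ValueError("need at least one segregating site")
    a1, a2 = harmonic_sums(n)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    theta_w = seg_sites / a1  # Watterson's estimator
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - theta_w) / math.sqrt(variance)

# Hypothetical gene: 10 sequences, 10 segregating sites, low pairwise diversity.
d_rare_excess = tajimas_d(10, 10, 2.0)
```

A negative D, as in this toy example, indicates an excess of rare variants relative to neutral expectation, which is why the statistic serves as a per-gene measure of rare variant density and a possible selection signal.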
Affiliation(s)
- Paola Raska
- Department of Epidemiology and Biostatistics, Case Western Reserve University, 10900 Euclid Ave., Cleveland, OH 44106, USA.
316
Thalamuthu A, Zhao J, Keong GTH, Kondragunta V, Mukhopadhyay I. Association tests for rare and common variants based on genotypic and phenotypic measures of similarity between individuals. BMC Proc 2011; 5 Suppl 9:S89. [PMID: 22373048 PMCID: PMC3287930 DOI: 10.1186/1753-6561-5-s9-s89] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Genome-wide association studies have helped us identify thousands of common variants associated with several widespread complex diseases. However, for most traits, these variants account for only a small fraction of phenotypic variance or heritability. Next-generation sequencing technologies are being used to identify additional rare variants hypothesized to have higher effect sizes than the already identified common variants, and to contribute significantly to the fraction of heritability that is still unexplained. Several pooling strategies have been proposed to test the joint association of multiple rare variants, because testing them individually may not be optimal. Within a gene or genomic region, if there are both rare and common variants, testing their joint association may be desirable to determine their synergistic effects. We propose new methods to test the joint association of several rare and common variants with binary and quantitative traits. Our association test for quantitative traits is based on genotypic and phenotypic measures of similarity between pairs of individuals. For the binary trait or case-control samples, we recently proposed an association test based on the genotypic similarity between individuals. Here, we develop a modified version of this test for rare variants. Our tests can be used for samples taken from multiple subpopulations. The power of our test statistics for case-control samples and quantitative traits was evaluated using the GAW17 simulated data sets. Type I error rates for the proposed tests are well controlled. Our tests are able to identify some of the important causal genes in the GAW17 simulated data sets.
Affiliation(s)
- Anbupalam Thalamuthu
- Human Genetics, 60 Biopolis Street 02-01, Genome Institute of Singapore, Singapore 138672.
|
317
|
Zhang L, Pei YF, Hai R, Lin Y, Deng HW. Testing rare variants for association with diseases: a Bayesian marker selection approach. Ann Hum Genet 2011; 76:74-85. [PMID: 22034989 DOI: 10.1111/j.1469-1809.2011.00684.x] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/11/2023]
Abstract
It has been a research focus to uncover the genetic determination of complex diseases caused by rare variants. As the vast majority of genomic variants represent background variation, highlighting potentially causal mutations through a weighting scheme is critical to the success of association studies aimed at identifying rare variants. In this study, we propose a novel Bayesian marker selection approach to perform a weighting-based association test. In this approach, an individual association signal and its direction are used to weight variants. In addition, the predicted biological function of variants is taken as prior information to direct the selection of likely causal variants. Simulation studies show that the proposed method has improved power over several existing methods in certain conditions. Analyses of two empirical datasets demonstrate its applicability.
Affiliation(s)
- Lei Zhang
- Center of System Biomedical Sciences, University of Shanghai for Science and Technology, P. R. China
|
318
|
Flaherty P, Natsoulis G, Muralidharan O, Winters M, Buenrostro J, Bell J, Brown S, Holodniy M, Zhang N, Ji HP. Ultrasensitive detection of rare mutations using next-generation targeted resequencing. Nucleic Acids Res 2011; 40:e2. [PMID: 22013163 PMCID: PMC3245950 DOI: 10.1093/nar/gkr861] [Citation(s) in RCA: 103] [Impact Index Per Article: 7.4] [Indexed: 12/18/2022]
Abstract
With next-generation DNA sequencing technologies, one can interrogate a specific genomic region of interest at very high depth of coverage and identify less prevalent, rare mutations in heterogeneous clinical samples. However, the mutation detection levels are limited by the error rate of the sequencing technology as well as by the availability of variant-calling algorithms with high statistical power and low false positive rates. We demonstrate that we can robustly detect mutations at 0.1% fractional representation. This represents accurate detection of one mutant per every 1000 wild-type alleles. To achieve this sensitive level of mutation detection, we integrate a high accuracy indexing strategy and reference replication for estimating sequencing error variance. We employ a statistical model to estimate the error rate at each position of the reference and to quantify the fraction of variant base in the sample. Our method is highly specific (99%) and sensitive (100%) when applied to a known 0.1% sample fraction admixture of two synthetic DNA samples to validate our method. As a clinical application of this method, we analyzed nine clinical samples of H1N1 influenza A and detected an oseltamivir (antiviral therapy) resistance mutation in the H1N1 neuraminidase gene at a sample fraction of 0.18%.
Affiliation(s)
- Patrick Flaherty
- Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA
|
319
|
Powers S, Gopalakrishnan S, Tintle N. Assessing the impact of non-differential genotyping errors on rare variant tests of association. Hum Hered 2011; 72:153-60. [PMID: 22004945 DOI: 10.1159/000332222] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/16/2011] [Accepted: 08/24/2011] [Indexed: 11/19/2022]
Abstract
BACKGROUND/AIMS We aim to quantify the effect of non-differential genotyping errors on the power of rare variant tests and identify those situations when genotyping errors are most harmful. METHODS We simulated genotype and phenotype data for a range of sample sizes, minor allele frequencies, disease relative risks and numbers of rare variants. Genotype errors were then simulated using five different error models covering a wide range of error rates. RESULTS Even at very low error rates, misclassifying a common homozygote as a heterozygote translates into a substantial loss of power, a result that is exacerbated even further as the minor allele frequency decreases. While the power loss from heterozygote to common homozygote errors tends to be smaller for a given error rate, in practice heterozygote to homozygote errors are more frequent and, thus, will have measurable impact on power. CONCLUSION Error rates from genotype-calling technology for next-generation sequencing data suggest that substantial power loss may be seen when applying current rare variant tests of association to called genotypes.
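The reported sensitivity to common-homozygote-to-heterozygote errors can be illustrated with a small calculation. The sketch below is not the authors' simulation; it assumes Hardy-Weinberg proportions and a single hypothetical error type, and the function name and parameters are illustrative only. It computes the share of called heterozygotes that are genuine, which shrinks as the minor allele frequency falls.

```python
def true_heterozygote_fraction(maf, hom_to_het_error):
    """Fraction of called heterozygotes that are real carriers.

    Assumptions (illustrative, not from the paper): genotypes follow
    Hardy-Weinberg proportions, and the only error is a common
    homozygote miscalled as a heterozygote with the given probability.
    """
    true_het = 2 * maf * (1 - maf)          # real heterozygote frequency
    common_hom = (1 - maf) ** 2             # common homozygote frequency
    false_het = common_hom * hom_to_het_error  # miscalled homozygotes
    return true_het / (true_het + false_het)


if __name__ == "__main__":
    # Same error rate, decreasing minor allele frequency.
    for maf in (0.05, 0.01, 0.001):
        print(maf, true_heterozygote_fraction(maf, 0.01))
```

With a fixed error rate, false heterozygotes make up a growing share of apparent carriers as the variant becomes rarer, which is one intuition for the power loss described in the abstract.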
Affiliation(s)
- Scott Powers
- Department of Statistics and Operations Research, University of North Carolina, Chapel Hill, NC, USA
|
320
|
Abstract
Exome sequencing - the targeted sequencing of the subset of the human genome that is protein coding - is a powerful and cost-effective new tool for dissecting the genetic basis of diseases and traits that have proved to be intractable to conventional gene-discovery strategies. Over the past 2 years, experimental and analytical approaches relating to exome sequencing have established a rich framework for discovering the genes underlying unsolved Mendelian disorders. Additionally, exome sequencing is being adapted to explore the extent to which rare alleles explain the heritability of complex diseases and health-related traits. These advances also set the stage for applying exome and whole-genome sequencing to facilitate clinical diagnosis and personalized disease-risk profiling.
|
321
|
Evans D, Aberle J, Beil FU. The relative importance of common and rare genetic variants in the development of hypertriglyceridemia. Expert Rev Cardiovasc Ther 2011; 9:637-44. [PMID: 21615327 DOI: 10.1586/erc.11.53] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
Plasma lipid levels are a complex trait with a genetic and an environmental component. There are two models for the genetic basis of complex traits: the common-disease common-variant hypothesis, in which susceptibility is due to variants occurring at relatively high frequency but low effect size; and the common-disease rare-variant hypothesis, where disease is due to multiple rare variants each occurring at low frequency but with high effect size. Genome-wide association studies have identified a number of common variants associated with plasma lipid levels. However, they account for only a proportion of the genetic variance. Resequencing studies are revealing the importance of rare variants in accounting for the missing variance. Next-generation sequencing will allow the relative importance of the two hypotheses to be assessed.
Affiliation(s)
- David Evans
- Endokrinologie und Stoffwechsel, Medizinische Klinik III, Zentrum für Innere Medizin, Universitätsklinikum Hamburg-Eppendorf, Martinistrasse 52, 20246 Hamburg, Germany.
|
322
|
Li D, Lewinger JP, Gauderman WJ, Murcray CE, Conti D. Using extreme phenotype sampling to identify the rare causal variants of quantitative traits in association studies. Genet Epidemiol 2011; 35:790-9. [PMID: 21922541 DOI: 10.1002/gepi.20628] [Citation(s) in RCA: 105] [Impact Index Per Article: 7.5] [Received: 01/03/2011] [Revised: 07/17/2011] [Accepted: 07/22/2011] [Indexed: 12/11/2022]
Abstract
Variants identified in recent genome-wide association studies based on the common-disease common-variant hypothesis are far from fully explaining the hereditability of complex traits. Rare variants may, in part, explain some of the missing hereditability. Here, we explored the advantage of the extreme phenotype sampling in rare-variant analysis and refined this design framework for future large-scale association studies on quantitative traits. We first proposed a power calculation approach for a likelihood-based analysis method. We then used this approach to demonstrate the potential advantages of extreme phenotype sampling for rare variants. Next, we discussed how this design can influence future sequencing-based association studies from a cost-efficiency (with the phenotyping cost included) perspective. Moreover, we discussed the potential of a two-stage design with the extreme sample as the first stage and the remaining nonextreme subjects as the second stage. We demonstrated that this two-stage design is a cost-efficient alternative to the one-stage cross-sectional design or traditional two-stage design. We then discussed the analysis strategies for this extreme two-stage design and proposed a corresponding design optimization procedure. To address many practical concerns, for example measurement error or phenotypic heterogeneity at the very extremes, we examined an approach in which individuals with very extreme phenotypes are discarded. We demonstrated that even with a substantial proportion of these extreme individuals discarded, an extreme-based sampling can still be more efficient. Finally, we expanded the current analysis and design framework to accommodate the CMC approach where multiple rare-variants in the same gene region are analyzed jointly.
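The sampling idea in this abstract, including the variant in which the most extreme individuals are discarded, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name, the `tail_fraction` and `discard_fraction` parameters, and the symmetric treatment of both tails are all assumptions.

```python
def extreme_sample(phenotypes, tail_fraction, discard_fraction=0.0):
    """Pick the low and high tails of a quantitative trait.

    tail_fraction: share of the cohort kept from each tail.
    discard_fraction: share of the cohort dropped from each end first,
    mimicking the removal of very extreme (possibly mismeasured) values.
    Both parameters are hypothetical names used for this sketch.
    """
    ordered = sorted(phenotypes)
    n = len(ordered)
    n_discard = int(n * discard_fraction)
    n_tail = int(n * tail_fraction)
    # Drop the most extreme values at both ends before selecting tails.
    trimmed = ordered[n_discard:n - n_discard]
    low = trimmed[:n_tail]
    high = trimmed[len(trimmed) - n_tail:]
    return low, high


if __name__ == "__main__":
    trait = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(extreme_sample(trait, 0.2))
    print(extreme_sample(trait, 0.2, discard_fraction=0.1))
```

Only the selected individuals would be sequenced in the first stage; the remaining subjects would form the second stage of the two-stage design the abstract describes.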
Affiliation(s)
- Dalin Li
- Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
|
323
|
Stitziel NO, Kiezun A, Sunyaev S. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biol 2011; 12:227. [PMID: 21920052 PMCID: PMC3308043 DOI: 10.1186/gb-2011-12-9-227] [Citation(s) in RCA: 99] [Impact Index Per Article: 7.1] [Indexed: 01/30/2023]
Abstract
New sequencing technology has enabled the identification of thousands of single nucleotide polymorphisms in the exome, and many computational and statistical approaches to identify disease-association signals have emerged.
Affiliation(s)
- Nathan O Stitziel
- Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Harvard Medical School, 75 Francis Street, Boston, MA 02115, USA
|
324
|
Abstract
The successes of genome-wide association (GWA) studies have mainly come from studies performed in populations of European descent. Since complex traits are characterized by marked genetic heterogeneity, the findings so far may provide an incomplete picture of the genetic architecture of complex traits. However, the recent GWA studies performed on East Asian populations now allow us to globally assess the heterogeneity of association signals between populations of European ancestry and East Asians, and the possible obstacles for multi-ethnic GWA studies. We focused on four different traits that represent a broad range of complex phenotypes, which have been studied in both Europeans and East Asians: type 2 diabetes, systemic lupus erythematosus, ulcerative colitis and height. For each trait, we observed that most of the risk loci identified in East Asians were shared with Europeans. However, we also observed that a significant part of the association signals at these shared loci seems to be independent between populations. This suggests that disease aetiology is common between populations, but that risk variants are often population specific. These variants could be truly population specific and result from natural selection, genetic drift and recent mutations, or they could be spurious, caused by the limitations of the method of analysis employed in the GWA studies. We therefore propose a three-stage framework for multi-ethnic GWA analyses, starting with the commonly used single-nucleotide polymorphism-based analysis, and followed by a gene-based approach and a pathway-based analysis, which will take into account the heterogeneity of association between populations at different levels.
Affiliation(s)
- Jingyuan Fu
- Department of Genetics, University Medical Centre and University of Groningen, Groningen, The Netherlands
|
325
|
Quintana MA, Berstein JL, Thomas DC, Conti DV. Incorporating model uncertainty in detecting rare variants: the Bayesian risk index. Genet Epidemiol 2011; 35:638-49. [PMID: 22009789 DOI: 10.1002/gepi.20613] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 02/14/2011] [Revised: 03/16/2011] [Accepted: 06/12/2011] [Indexed: 01/27/2023]
Abstract
We are interested in investigating the involvement of multiple rare variants within a given region by conducting analyses of individual regions with two goals: (1) to determine if regional rare variation in aggregate is associated with risk; and (2) conditional upon the region being associated, to identify specific genetic variants within the region that are driving the association. In particular, we seek a formal integrated analysis that achieves both of our goals. For rare variants with low minor allele frequencies, there is very little power to statistically test the null hypothesis of equal allele or genotype counts for each variant. Thus, genetic association studies are often limited to detecting association within a subset of the common genetic markers. However, it is very likely that associations exist for the rare variants that may not be captured by the set of common markers. Our framework aims at constructing a risk index based on multiple rare variants within a region. Our analytical strategy is novel in that we use a Bayesian approach to incorporate model uncertainty in the selection of variants to include in the index as well as the direction of the associated effects. Additionally, the approach allows for inference at both the group and variant-specific levels. Using a set of simulations, we show that our methodology has added power over other popular rare variant methods to detect global associations. In addition, we apply the approach to sequence data from the WECARE Study of second primary breast cancers.
Affiliation(s)
- Melanie A Quintana
- Department of Preventive Medicine, Division of Biostatistics, University of Southern California, Los Angeles, California, USA.
|
326
|
Study designs for identification of rare disease variants in complex diseases: the utility of family-based designs. Genetics 2011; 189:1061-8. [PMID: 21840850 DOI: 10.1534/genetics.111.131813] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022]
Abstract
The recent progress in sequencing technologies makes possible large-scale medical sequencing efforts to assess the importance of rare variants in complex diseases. The results of such efforts depend heavily on the use of efficient study designs and analytical methods. We introduce here a unified framework for association testing of rare variants in family-based designs or designs based on unselected affected individuals. This framework allows us to quantify the enrichment in rare disease variants in families containing multiple affected individuals and to investigate the optimal design of studies aiming to identify rare disease variants in complex traits. We show that for many complex diseases with small values for the overall sibling recurrence risk ratio, such as Alzheimer's disease and most cancers, sequencing affected individuals with a positive family history of the disease can be extremely advantageous for identifying rare disease variants. In contrast, for complex diseases with large values of the sibling recurrence risk ratio, sequencing unselected affected individuals may be preferable.
|
327
|
Feng BJ, Tavtigian SV, Southey MC, Goldgar DE. Design considerations for massively parallel sequencing studies of complex human disease. PLoS One 2011; 6:e23221. [PMID: 21850262 PMCID: PMC3151293 DOI: 10.1371/journal.pone.0023221] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Received: 03/25/2011] [Accepted: 07/14/2011] [Indexed: 12/24/2022]
Abstract
Massively Parallel Sequencing (MPS) allows sequencing of entire exomes and genomes to now be done at reasonable cost, and its utility for identifying genes responsible for rare Mendelian disorders has been demonstrated. However, for a complex disease, study designs need to accommodate substantial degrees of locus, allelic, and phenotypic heterogeneity, as well as complex relationships between genotype and phenotype. Such considerations include careful selection of samples for sequencing and a well-developed strategy for identifying the few "true" disease susceptibility genes from among the many irrelevant genes that will be found to harbor rare variants. To examine these issues we have performed simulation-based analyses in order to compare several strategies for MPS sequencing in complex disease. Factors examined include genetic architecture, sample size, number and relationship of individuals selected for sequencing, and a variety of filters based on variant type, multiple observations of genes and concordance of genetic variants within pedigrees. A two-stage design was assumed where genes from the MPS analysis of high-risk families are evaluated in a secondary screening phase of a larger set of probands with more modest family histories. Designs were evaluated using a cost function that assumes the cost of sequencing the whole exome is 400 times that of sequencing a single candidate gene. Results indicate that while requiring variants to be identified in multiple pedigrees and/or in multiple individuals in the same pedigree are effective strategies for reducing false positives, there is a danger of over-filtering so that most true susceptibility genes are missed. In most cases, sequencing more than two individuals per pedigree results in reduced power without any benefit in terms of reduced overall cost. Further, our results suggest that although no single strategy is optimal, simulations can provide important guidelines for study design.
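The cost assumption stated in this abstract (a whole exome costs 400 times a single candidate gene) can be turned into a simple two-stage cost sketch. The exact cost function used in the paper is not given here, so the form below is an assumption: stage one sequences exomes, stage two sequences each retained candidate gene in each proband. All names are illustrative.

```python
EXOME_TO_GENE_COST_RATIO = 400  # ratio stated in the abstract


def two_stage_cost(n_exomes, n_candidate_genes, n_probands, gene_cost=1.0):
    """Total cost in units of one candidate-gene sequencing run.

    Assumed form (not confirmed by the source): stage one is whole-exome
    sequencing of selected individuals; stage two screens every retained
    candidate gene in every proband.
    """
    stage_one = n_exomes * EXOME_TO_GENE_COST_RATIO * gene_cost
    stage_two = n_candidate_genes * n_probands * gene_cost
    return stage_one + stage_two


if __name__ == "__main__":
    # 10 exomes, then 5 candidate genes screened in 100 probands.
    print(two_stage_cost(10, 5, 100))
```

Under this form, stricter filtering lowers the stage-two term by reducing the number of candidate genes, which is the trade-off against over-filtering that the abstract warns about.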
Affiliation(s)
- Bing-Jian Feng
- Department of Dermatology, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
- Sean V. Tavtigian
- Huntsman Cancer Institute and Department of Oncological Sciences, University of Utah, Salt Lake City, Utah, United States of America
- Melissa C. Southey
- Department of Pathology, University of Melbourne, Melbourne, Victoria, Australia
- David E. Goldgar
- Department of Dermatology, University of Utah School of Medicine, Salt Lake City, Utah, United States of America
|
328
|
Torkamani A, Scott-Van Zeeland AA, Topol EJ, Schork NJ. Annotating individual human genomes. Genomics 2011; 98:233-41. [PMID: 21839162 DOI: 10.1016/j.ygeno.2011.07.006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 05/25/2011] [Accepted: 07/26/2011] [Indexed: 02/03/2023]
Abstract
Advances in DNA sequencing technologies have made it possible to rapidly, accurately and affordably sequence entire individual human genomes. As impressive as this ability seems, however, it will not likely amount to much if one cannot extract meaningful information from individual sequence data. Annotating variations within individual genomes and providing information about their biological or phenotypic impact will thus be crucially important in moving individual sequencing projects forward, especially in the context of the clinical use of sequence information. In this paper we consider the various ways in which one might annotate individual sequence variations and point out limitations in the available methods for doing so. It is arguable that, in the foreseeable future, DNA sequencing of individual genomes will become routine for clinical, research, forensic, and personal purposes. We therefore also consider directions and areas for further research in annotating genomic variants.
|
329
|
Integrating Rare-Variant Testing, Function Prediction, and Gene Network in Composite Resequencing-Based Genome-Wide Association Studies (CR-GWAS). G3-GENES GENOMES GENETICS 2011; 1:233-43. [PMID: 22384334 PMCID: PMC3276137 DOI: 10.1534/g3.111.000364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 05/19/2011] [Accepted: 07/05/2011] [Indexed: 01/08/2023]
Abstract
High-density array-based genome-wide association studies (GWAS) are complemented by exome sequencing and whole-genome resequencing-based association studies. Here we present a composite resequencing-based genome-wide association study (CR-GWAS) strategy that systematically exploits collective biological information and analytical tools for a robust analysis. We showcased the utility of this strategy by using Arabidopsis (Arabidopsis thaliana) resequencing data. Bioinformatic predictions of biological function alteration at each locus were integrated into the process of association testing of both common and rare variants for complex traits with a suite of statistics. Significant signals were then filtered with a priori candidate loci generated from genome database and gene network models to obtain a posteriori candidate loci. A probabilistic gene network (AraNet) that interrogates network neighborhoods of genes was then used to expand the filtering power to examine the significant testing signals. Using this strategy, we confirmed the known true positives and identified several new promising associations. Promising genes (AP1, FCA, FRI, FLC, FLM, SPL5, FY, and DCL2) were shown to control for flowering time through either common variants or rare variants within a diverse set of Arabidopsis accessions. Although many of these candidate genes were cloned earlier with mutational studies, identifying their allele variation contribution to overall phenotypic variation among diverse natural accessions is critical. Our rare allele testing established a greater number of connections than previous analyses in which this issue was not addressed. More importantly, our results demonstrated the potential of integrating various biological, statistical, and bioinformatic tools into complex trait dissection.
|
330
|
Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol 2011; 35:606-19. [PMID: 21769936 DOI: 10.1002/gepi.20609] [Citation(s) in RCA: 188] [Impact Index Per Article: 13.4] [Received: 11/30/2010] [Revised: 03/23/2011] [Accepted: 06/03/2011] [Indexed: 01/31/2023]
Abstract
In anticipation of the availability of next-generation sequencing data, there is increasing interest in investigating association between complex traits and rare variants (RVs). In contrast to association studies for common variants (CVs), due to the low frequencies of RVs, common wisdom suggests that existing statistical tests for CVs might not work, motivating the recent development of several new tests for analyzing RVs, most of which are based on the idea of pooling/collapsing RVs. However, there is a lack of evaluations of, and thus guidance on the use of, existing tests. Here we provide a comprehensive comparison of various statistical tests using simulated data. We consider both independent and correlated rare mutations, and representative tests for both CVs and RVs. As expected, if there are no or few non-causal (i.e. neutral or non-associated) RVs in a locus of interest while the effects of causal RVs on the trait are all (or mostly) in the same direction (i.e. either protective or deleterious, but not both), then the simple pooled association tests (without selecting RVs and their association directions) and a new test called kernel-based adaptive clustering (KBAC) perform similarly and are most powerful; KBAC is more robust than simple pooled association tests in the presence of non-causal RVs; however, as the number of non-causal RVs increases and/or in the presence of opposite association directions, the winners are two methods originally proposed for CVs and a new test called C-alpha test proposed for RVs, each of which can be regarded as testing on a variance component in a random-effects model. Interestingly, several methods based on sequential model selection (i.e. selecting causal RVs and their association directions), including two new methods proposed here, perform robustly and often have statistical power between those of the above two classes.
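Why simple pooling loses power when effects run in opposite directions can be seen in a toy collapsing calculation. The sketch below is not any of the tests compared in the paper; it is a minimal carrier-collapsing illustration with made-up genotypes, and the function names are hypothetical.

```python
def carries_any_rare_allele(genotype_row):
    """Collapse one individual's rare-variant genotypes to a 0/1 carrier flag."""
    return any(count > 0 for count in genotype_row)


def pooled_carrier_difference(cases, controls):
    """Difference in collapsed carrier rate between cases and controls.

    Each argument is a list of individuals; each individual is a list of
    minor-allele counts, one per rare variant in the region.
    """
    case_rate = sum(carries_any_rare_allele(row) for row in cases) / len(cases)
    control_rate = sum(carries_any_rare_allele(row) for row in controls) / len(controls)
    return case_rate - control_rate


if __name__ == "__main__":
    # Both variants deleterious: carriers concentrate in cases.
    same_direction = pooled_carrier_difference(
        [[1, 0], [0, 1], [0, 0], [0, 0]],
        [[0, 0], [0, 0], [0, 0], [0, 0]],
    )
    # Variant 1 deleterious, variant 2 protective: the pooled signal cancels.
    opposite_direction = pooled_carrier_difference(
        [[1, 0], [1, 0], [0, 0], [0, 0]],
        [[0, 1], [0, 1], [0, 0], [0, 0]],
    )
    print(same_direction, opposite_direction)
```

In the second data set each variant differs clearly between groups, yet the collapsed carrier rates are identical, which is the situation in which variance-component style tests are reported to do better.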
Affiliation(s)
- Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455-0392, USA
|
331
|
Gordon D, Finch SJ, De La Vega FM, De La Vega F. A new expectation-maximization statistical test for case-control association studies considering rare variants obtained by high-throughput sequencing. Hum Hered 2011; 71:113-25. [PMID: 21734402 DOI: 10.1159/000325590] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 01/06/2023]
Abstract
Genome-wide association studies (GWAS) have been successful in identifying common genetic variation reproducibly associated with disease. However, most associated variants confer very small risk and after meta-analysis of large cohorts a large fraction of expected heritability still remains unexplained. A possible explanation is that rare variants currently undetected by GWAS with SNP arrays could contribute a large fraction of risk when present in cases. This concept has spurred great interest in exploring the role of rare variants in disease. As the cost of sequencing continues to plummet, it is becoming feasible to directly sequence case-control samples for testing disease association including rare variants. We have developed a test statistic that allows for association testing among cases and controls using data directly from sequencing reads. In addition, our method allows for random errors in reads. We determine the probability of a true genotype call based on the observed base pair reads using the expectation-maximization algorithm. We apply the SumStat procedure to obtain a single statistic for a group of multiple rare variant loci. We document the validity of our method through simulations. Our results suggest that our statistic maintains the correct type I error rate, even in the presence of differential misclassification for sequence reads, and that it has good power under a number of scenarios. Finally, our SumStat results show power at least as good as the maximum single locus results.
Affiliation(s)
- Derek Gordon
- Department of Genetics, Rutgers University, Piscataway, N.J., USA
|
332
|
Kim SY, Lohmueller KE, Albrechtsen A, Li Y, Korneliussen T, Tian G, Grarup N, Jiang T, Andersen G, Witte D, Jorgensen T, Hansen T, Pedersen O, Wang J, Nielsen R. Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics 2011; 12:231. [PMID: 21663684 PMCID: PMC3212839 DOI: 10.1186/1471-2105-12-231] [Citation(s) in RCA: 127] [Impact Index Per Article: 9.1] [Received: 02/24/2011] [Accepted: 06/11/2011] [Indexed: 11/10/2022]
Abstract
BACKGROUND Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost effective approach is to use medium or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies is associated with substantial statistical uncertainty because of varying coverage and high error rates. RESULTS We evaluate a new maximum likelihood method for estimating allele frequencies in low and medium coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of: (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data. CONCLUSIONS Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low to medium coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
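The idea of integrating over genotype uncertainty instead of calling genotypes first can be sketched with a small expectation-maximization loop. This is an illustration only and not the authors' implementation: it assumes Hardy-Weinberg proportions, takes per-individual genotype likelihoods as given inputs, and uses hypothetical names and defaults throughout.

```python
def estimate_allele_frequency(genotype_likelihoods, start=0.2, n_iter=200, tol=1e-10):
    """Maximum-likelihood alternate-allele frequency via EM.

    genotype_likelihoods: one (L0, L1, L2) tuple per individual, where Lg is
    the probability of that individual's reads given g alternate alleles.
    Assumes Hardy-Weinberg genotype proportions (an assumption of this sketch).
    """
    freq = start
    for _ in range(n_iter):
        # Keep the frequency inside (0, 1) so every prior stays positive.
        freq = min(max(freq, 1e-6), 1 - 1e-6)
        expected_alt = 0.0
        for l0, l1, l2 in genotype_likelihoods:
            # E-step: posterior genotype weights under the current frequency.
            w0 = (1 - freq) ** 2 * l0
            w1 = 2 * freq * (1 - freq) * l1
            w2 = freq ** 2 * l2
            expected_alt += (w1 + 2 * w2) / (w0 + w1 + w2)
        # M-step: expected alternate alleles over total alleles.
        new_freq = expected_alt / (2 * len(genotype_likelihoods))
        if abs(new_freq - freq) < tol:
            return new_freq
        freq = new_freq
    return freq


if __name__ == "__main__":
    # Ambiguous low-coverage style likelihoods for four individuals.
    data = [(0.9, 0.1, 0.0), (0.4, 0.6, 0.0), (0.0, 0.5, 0.5), (1.0, 0.0, 0.0)]
    print(estimate_allele_frequency(data))
```

No individual is ever assigned a hard genotype; each contributes its expected allele count, so low-confidence individuals are down-weighted rather than filtered out.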
Affiliation(s)
- Su Yeon Kim
- Departments of Integrative Biology and Statistics, UC Berkeley, Berkeley, CA 94720, USA.
|
333
|
Abstract
Association mapping has successfully identified common SNPs associated with many diseases. However, the inability of this class of variation to account for most of the supposed heritability has led to a renewed interest in methods - primarily linkage analysis - to detect rare variants. Family designs allow for control of population stratification, investigations of questions such as parent-of-origin effects and other applications that are imperfectly or not readily addressed in case-control association studies. This article guides readers through the interface between linkage and association analysis, reviews the new methodologies and provides useful guidelines for applications. Just as effective SNP-genotyping tools helped to realize the potential of association studies, next-generation sequencing tools will benefit genetic studies by improving the power of family-based approaches.
|
334
|
Shi G, Rao DC. Optimum designs for next-generation sequencing to discover rare variants for common complex disease. Genet Epidemiol 2011; 35:572-9. [PMID: 21618604 DOI: 10.1002/gepi.20597] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 01/05/2011] [Revised: 04/21/2011] [Accepted: 04/25/2011] [Indexed: 01/11/2023]
Abstract
Recent advances in next-generation sequencing technologies make it affordable to search for rare and functional variants for common complex diseases systematically. We investigated strategies for enriching rare variants in the samples selected for sequencing so as to optimize the power for their discovery. In particular, we investigated the roles of alternative sources of enrichment in families through computer simulations. We showed that linkage information, extreme phenotype, and nonrandom ascertainment, such as multiply affected families, constitute different sources for enriching rare and functional variants in a sequencing study design. Linkage is well known to have limited power for detecting small genetic effects, and hence is not considered to be a powerful tool for discovering variants for common complex diseases. However, those families with some degree of family-specific linkage evidence provide an effective sampling strategy to sub-select the most linkage-informative families for sequencing. Compared with selecting subjects with extreme phenotypes, linkage evidence performs better with larger families, while the extreme-phenotype method is more efficient with smaller families. Families with multiple affected siblings were found to provide the largest enrichment of rare variants. Finally, we showed that combined strategies, such as selecting linkage-informative families from multiply affected families, provide much higher enrichment of rare functional variants than either strategy alone.
Affiliation(s)
- Gang Shi
- Division of Biostatistics and Department of Genetics, School of Medicine, Washington University in St. Louis, 660 South Euclid Avenue, St. Louis, MO 63110-1093, USA.
|
335
|
Feng T, Elston RC, Zhu X. Detecting rare and common variants for complex traits: sibpair and odds ratio weighted sum statistics (SPWSS, ORWSS). Genet Epidemiol 2011; 35:398-409. [PMID: 21594893 DOI: 10.1002/gepi.20588] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 01/28/2011] [Revised: 03/25/2011] [Accepted: 03/30/2011] [Indexed: 01/04/2023]
Abstract
It is generally known that risk variants segregate together with a disease within families, but this information has not been used in the existing statistical methods for detecting rare variants. Here we introduce two weighted sum statistics that can apply to either genome-wide association data or resequencing data for identifying rare disease variants: weights calculated based on sibpairs and odds ratios, respectively. We evaluated the two methods via extensive simulations under different disease models. We compared the proposed methods with the weighted sum statistic (WSS) proposed by Madsen and Browning, keeping the same genotyping or resequencing cost. Our methods clearly demonstrate more statistical power than the WSS. In addition, we found that using sibpair information can increase power over using only unrelated samples by more than 40%. We applied our methods to the Framingham Heart Study (FHS) and Wellcome Trust Case Control Consortium (WTCCC) hypertension datasets. Although we did not identify any genes as reaching a genome-wide significance level, we found variants in the candidate gene angiotensinogen significantly associated with hypertension at P = 6.9 × 10⁻⁴, whereas the most significant single SNP association evidence is P = 0.063. We further applied the odds ratio weighted method to the IFIH1 gene for type-1 diabetes in the WTCCC data. Our method yielded a P-value of 4.82 × 10⁻⁴, much more significant than that obtained by haplotype-based methods. We demonstrated that family data are extremely informative in searching for rare variants underlying complex traits, and the odds ratio weighted sum statistic is more efficient than currently existing methods.
Affiliation(s)
- Tao Feng
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA
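For context, the Madsen and Browning weighted sum statistic (WSS) that the abstract above uses as its comparator can be sketched as follows. This is not the sibpair or odds ratio weighting proposed in the paper. It follows the commonly cited description of the WSS: variants are down-weighted by a frequency estimated in unaffected individuals, each individual receives a weighted burden score, and the rank sum of affected individuals is standardized by permuting case/control labels. The function names, the use of midranks for ties, and the assumption of complete genotype data are choices made for this sketch.

```python
import math
import random


def wss_rank_sum(genotypes, affected):
    """Rank sum of affected individuals under frequency-based variant weights.

    genotypes: list of rows, one per individual, of minor allele counts (0/1/2).
    affected:  list of booleans, True for cases.
    """
    n = len(genotypes)
    n_unaff = sum(1 for a in affected if not a)
    scores = [0.0] * n
    for j in range(len(genotypes[0])):
        # Variant frequency estimated in unaffected individuals, with a
        # pseudocount so the weight is never zero.
        m_unaff = sum(row[j] for row, a in zip(genotypes, affected) if not a)
        q = (m_unaff + 1) / (2 * n_unaff + 2)
        w = math.sqrt(n * q * (1 - q))
        for i, row in enumerate(genotypes):
            scores[i] += row[j] / w
    # Rank all individuals by score; ties share a midrank (sketch assumption).
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return sum(r for r, a in zip(ranks, affected) if a)


def wss_z_score(genotypes, affected, n_perm=1000, seed=0):
    """Standardize the observed rank sum against label permutations."""
    rng = random.Random(seed)
    observed = wss_rank_sum(genotypes, affected)
    labels = list(affected)
    null = []
    for _ in range(n_perm):
        rng.shuffle(labels)
        # Weights are recomputed inside wss_rank_sum for each permutation.
        null.append(wss_rank_sum(genotypes, labels))
    mean = sum(null) / n_perm
    sd = math.sqrt(sum((x - mean) ** 2 for x in null) / (n_perm - 1))
    return (observed - mean) / sd
```

A large positive z indicates that affected individuals carry more, or rarer, minor alleles than expected under random labels.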
|
336
|
Johansen CT, Wang J, Hegele RA. Bias due to selection of rare variants using frequency in controls. Nat Genet 2011. [DOI: 10.1038/ng.817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
|
337
|
Abstract
Genome-wide association studies (GWAS) have become the primary approach for identifying genes with common variants influencing complex diseases. Despite considerable progress, the common variations identified by GWAS account for only a small fraction of disease heritability and are unlikely to explain the majority of phenotypic variations of common diseases. A potential source of the missing heritability is the contribution of rare variants. Next-generation sequencing technologies will detect millions of novel rare variants, but these technologies have three defining features: identification of a large number of rare variants, a high proportion of sequence errors, and a large proportion of missing data. These features raise challenges for testing the association of rare variants with phenotypes of interest. In this study, we use a genome continuum model and functional principal components as a general principle for developing novel and powerful association analysis methods designed for resequencing data. We use simulations to calculate the type I error rates and the power of nine alternative statistics: two functional principal component analysis (FPCA)-based statistics, the multivariate principal component analysis (MPCA)-based statistic, the weighted sum (WSS), the variable-threshold (VT) method, the generalized T², the collapsing method, the CMC method, and individual tests. We also examined the impact of sequence errors on their type I error rates. Finally, we apply the nine statistics to the published resequencing data set from ANGPTL4 in the Dallas Heart Study. We report that FPCA-based statistics have a higher power to detect association of rare variants and a stronger ability to filter sequence errors than the other seven methods.
Affiliation(s)
- Li Luo
- Human Genetics Center, University of Texas School of Public Health, Houston, TX 77030, USA
|
338
|
Pan W, Shen X. Adaptive tests for association analysis of rare variants. Genet Epidemiol 2011; 35:381-8. [PMID: 21520272 DOI: 10.1002/gepi.20586] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.9] [Received: 01/05/2011] [Revised: 03/03/2011] [Accepted: 03/21/2011] [Indexed: 01/30/2023]
Abstract
In anticipation of the availability of next-generation sequencing data, there has been increasing interest in association analysis of rare variants (RVs). Owing to the extremely low frequency of a RV, single variant-based analysis and many existing tests developed for common variants may not be suitable. Hence, it is of interest to develop powerful statistical tests to assess association between complex traits and RVs with sequence data. Recently, a pooled association test based on variable thresholds (VT) was proposed and shown to be more powerful than some existing tests (Price et al. [2010] Am J Hum Genet 86:832-838). In this study, we generalize the VT test of Price et al. in several aspects. We propose a general class of adaptive tests that covers the VT test of Price et al. as a special case. In particular, we show that some of our proposed adaptive tests may substantially improve the power over the pooled association tests, including the VT test of Price et al., especially so in the presence of many neutral RVs and/or of causal RVs with opposite association directions, in which cases most of the existing pooled association tests suffer from significant loss of power. Our proposed tests are also general and flexible with the ability to incorporate weights on RVs and to adjust for covariates.
Affiliation(s)
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455–0392, USA.
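The variable-threshold (VT) test of Price et al. that the abstract above generalizes can be sketched in simplified form. This is a minimal illustration of the baseline VT idea only, not of the adaptive tests proposed in the paper: for each candidate minor allele frequency threshold, variants at or below the threshold are pooled into a burden score, a z-like statistic is computed against the centered phenotype, the maximum over thresholds is taken, and significance is assessed by permuting phenotypes. Unit variant weights, no covariates, a one-sided alternative, and the function names are all assumptions of this sketch.

```python
import math
import random


def vt_statistic(genotypes, phenotype):
    """Maximum over frequency thresholds of a pooled burden statistic.

    genotypes: list of rows, one per individual, of minor allele counts (0/1/2).
    phenotype: list of numbers (e.g. 1 for case, 0 for control, or a trait).
    """
    n = len(phenotype)
    m = len(genotypes[0])
    mean_y = sum(phenotype) / n
    centered = [y - mean_y for y in phenotype]
    maf = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    best = float("-inf")
    # Each distinct observed nonzero frequency is a candidate threshold.
    for t in sorted({f for f in maf if f > 0}):
        keep = [j for j in range(m) if maf[j] <= t]
        burden = [sum(row[j] for j in keep) for row in genotypes]
        denom = math.sqrt(sum(b * b for b in burden))
        z = sum(b * c for b, c in zip(burden, centered)) / denom
        best = max(best, z)
    return best


def vt_permutation_pvalue(genotypes, phenotype, n_perm=1000, seed=0):
    """One-sided permutation p-value for the maximized statistic."""
    rng = random.Random(seed)
    observed = vt_statistic(genotypes, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if vt_statistic(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Because the maximum is recomputed inside every permutation, the p-value already accounts for searching over thresholds. A statistic of this one-sided form loses power when causal variants act in opposite directions, which is one of the settings the abstract singles out.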
|
339
|
Voxelwise gene-wide association study (vGeneWAS): multivariate gene-based association testing in 731 elderly subjects. Neuroimage 2011; 56:1875-91. [PMID: 21497199 DOI: 10.1016/j.neuroimage.2011.03.077] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Received: 10/20/2010] [Revised: 02/19/2011] [Accepted: 03/28/2011] [Indexed: 12/18/2022] Open
Abstract
Imaging traits provide a powerful and biologically relevant substrate to examine the influence of genetics on the brain. Interest in genome-wide, brain-wide search for influential genetic variants is growing, but has mainly focused on univariate, SNP-based association tests. Moving to gene-based multivariate statistics, we can test the combined effect of multiple genetic variants in a single test statistic. Multivariate models can reduce the number of statistical tests in gene-wide or genome-wide scans and may discover gene effects undetectable with SNP-based methods. Here we present a gene-based method for associating the joint effect of single nucleotide polymorphisms (SNPs) in 18,044 genes across 31,662 voxels of the whole brain in 731 elderly subjects (mean age: 75.56±6.82SD years; 430 males) from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Structural MRI scans were analyzed using tensor-based morphometry (TBM) to compute 3D maps of regional brain volume differences compared to an average template image based on healthy elderly subjects. Using the voxel-level volume difference values as the phenotype, we selected the most significantly associated gene (out of 18,044) at each voxel across the brain. No genes identified were significant after correction for multiple comparisons, but several known candidates were re-identified, as were other genes highly relevant to brain function. GAB2, which has been previously associated with late-onset AD, was identified as the top gene in this study, suggesting the validity of the approach. This multivariate, gene-based voxelwise association study offers a novel framework to detect genetic influences on the brain.
|
340
|
Affiliation(s)
- Brian J. Morris
- From the Basic & Clinical Genomics Laboratory, School of Medical Sciences and Bosch Institute, The University of Sydney, Sydney, Australia
|
341
|
Nieduszynski CA, Liti G. From sequence to function: Insights from natural variation in budding yeasts. Biochim Biophys Acta Gen Subj 2011; 1810:959-66. [PMID: 21320572 PMCID: PMC3271348 DOI: 10.1016/j.bbagen.2011.02.004] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Received: 11/17/2010] [Revised: 02/03/2011] [Accepted: 02/08/2011] [Indexed: 12/18/2022]
Abstract
Background: Natural variation offers a powerful approach for assigning function to DNA sequence, a pressing challenge in the age of high throughput sequencing technologies. Scope of Review: Here we review comparative genomic approaches that are bridging the sequence–function and genotype–phenotype gaps. Reverse genomic approaches aim to analyse sequence to assign function, whereas forward genomic approaches start from a phenotype and aim to identify the underlying genotype responsible. Major Conclusions: Comparative genomic approaches, pioneered in budding yeasts, have resulted in dramatic improvements in our understanding of the function of both genes and regulatory sequences. Analogous studies in other systems, including humans, demonstrate the ubiquity of comparative genomic approaches. Recently, forward genomic approaches, exploiting natural variation within yeast populations, have started to offer powerful insights into how genotype influences phenotype and even the ability to predict phenotypes. General Significance: Comparative genomic experiments are defining the fundamental rules that govern complex traits in natural populations from yeast to humans. This article is part of a Special Issue entitled Systems Biology of Microorganisms.
|
342
|
Tavtigian SV, Hashibe M, Thomas A. Tests of association for rare variants: case control mutation screening. Nat Rev Genet 2011; 12:224. [PMID: 21283087 DOI: 10.1038/nrg2867-c1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
|
343
|
Longmate JA, Larson GP, Krontiris TG, Sommer SS. Three ways of combining genotyping and resequencing in case-control association studies. PLoS One 2010; 5:e14318. [PMID: 21187953 PMCID: PMC3004857 DOI: 10.1371/journal.pone.0014318] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/22/2010] [Accepted: 11/15/2010] [Indexed: 11/18/2022] Open
Abstract
We describe three statistical results that we have found to be useful in case-control genetic association testing. All three involve combining the discovery of novel genetic variants, usually by sequencing, with genotyping methods that recognize previously discovered variants. We first consider expanding the list of known variants by concentrating variant-discovery in cases. Although the naive inclusion of cases-only sequencing data would create a bias, we show that some sequencing data may be retained, even if controls are not sequenced. Furthermore, for alleles of intermediate frequency, cases-only sequencing with bias-correction entails little if any loss of power, compared to dividing the same sequencing effort among cases and controls. Secondly, we investigate more strongly focused variant discovery to obtain a greater enrichment for disease-related variants. We show how case status, family history, and marker sharing enrich the discovery set by increments that are multiplicative with penetrance, enabling the preferential discovery of high-penetrance variants. A third result applies when sequencing is the primary means of counting alleles in both cases and controls, but a supplementary pooled genotyping sample is used to identify the variants that are very rare. We show that this raises no validity issues, and we evaluate a less expensive and more adaptive approach to judging rarity, based on group-specific variants. We demonstrate the important and unusual caveat that this method requires equal sample sizes for validity. These three results can be used to more efficiently detect the association of rare genetic variants with disease.
Affiliation(s)
- Jeffrey A Longmate
- Division of Biostatistics, City of Hope, Duarte, California, United States of America.
|