701
Lange LA, Hu Y, Zhang H, Xue C, Schmidt EM, Tang ZZ, Bizon C, Lange EM, Smith JD, Turner EH, Jun G, Kang HM, Peloso G, Auer P, Li KP, Flannick J, Zhang J, Fuchsberger C, Gaulton K, Lindgren C, Locke A, Manning A, Sim X, Rivas MA, Holmen OL, Gottesman O, Lu Y, Ruderfer D, Stahl EA, Duan Q, Li Y, Durda P, Jiao S, Isaacs A, Hofman A, Bis JC, Correa A, Griswold ME, Jakobsdottir J, Smith AV, Schreiner PJ, Feitosa MF, Zhang Q, Huffman JE, Crosby J, Wassel CL, Do R, Franceschini N, Martin LW, Robinson JG, Assimes TL, Crosslin DR, Rosenthal EA, Tsai M, Rieder MJ, Farlow DN, Folsom AR, Lumley T, Fox ER, Carlson CS, Peters U, Jackson RD, van Duijn CM, Uitterlinden AG, Levy D, Rotter JI, Taylor HA, Gudnason V, Siscovick DS, Fornage M, Borecki IB, Hayward C, Rudan I, Chen YE, Bottinger EP, Loos RJF, Sætrom P, Hveem K, Boehnke M, Groop L, McCarthy M, Meitinger T, Ballantyne CM, Gabriel SB, O'Donnell CJ, Post WS, North KE, Reiner AP, Boerwinkle E, Psaty BM, Altshuler D, Kathiresan S, Lin DY, Jarvik GP, Cupples LA, Kooperberg C, Wilson JG, Nickerson DA, Abecasis GR, Rich SS, Tracy RP, Willer CJ. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol. Am J Hum Genet 2014; 94:233-45. [PMID: 24507775 PMCID: PMC3928660 DOI: 10.1016/j.ajhg.2014.01.010] [Citation(s) in RCA: 167] [Impact Index Per Article: 16.7] [Received: 11/06/2013] [Accepted: 01/14/2014] [Indexed: 10/25/2022]
Abstract
Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments.
Affiliation(s)
- Leslie A Lange
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Youna Hu
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- He Zhang
  - Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Chenyi Xue
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Ellen M Schmidt
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Zheng-Zheng Tang
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Chris Bizon
  - Renaissance Computing Institute, Chapel Hill, NC 27517, USA
- Ethan M Lange
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Joshua D Smith
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Emily H Turner
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Goo Jun
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Hyun Min Kang
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Gina Peloso
  - Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA
- Paul Auer
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA; School of Public Health, University of Wisconsin - Milwaukee, Milwaukee, WI 53201, USA
- Kuo-Ping Li
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Jason Flannick
  - Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA
- Ji Zhang
  - Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Kyle Gaulton
  - Wellcome Trust Centre for Human Genetics, University of Oxford, OX1 2JD Oxford, UK
- Cecilia Lindgren
  - Wellcome Trust Centre for Human Genetics, University of Oxford, OX1 2JD Oxford, UK
- Adam Locke
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Alisa Manning
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; School of Public Health, University of Wisconsin - Milwaukee, Milwaukee, WI 53201, USA; Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Genetics, Harvard Medical School, Boston, MA 02138, USA
- Xueling Sim
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Manuel A Rivas
  - Wellcome Trust Centre for Human Genetics, University of Oxford, OX1 2JD Oxford, UK
- Oddgeir L Holmen
  - HUNT Research Center, Department of Public Health, Norwegian University of Science and Technology, 7600 Levanger, Norway
- Omri Gottesman
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Yingchang Lu
  - The Genetics of Obesity and Related Metabolic Traits Program, The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Douglas Ruderfer
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Eli A Stahl
  - Division of Psychiatric Genomics, Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Qing Duan
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Yun Li
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Peter Durda
  - Department of Pathology, University of Vermont, Colchester, VT 05446, USA
- Shuo Jiao
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Aaron Isaacs
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, 3015 DR Rotterdam, the Netherlands
- Albert Hofman
  - Department of Epidemiology, Erasmus University Medical Center, 3000 DR Rotterdam, the Netherlands
- Joshua C Bis
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Adolfo Correa
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Michael E Griswold
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Albert V Smith
  - Icelandic Heart Association, IS-201 Kopavogur, Iceland; University of Iceland, 101 Reykjavik, Iceland
- Pamela J Schreiner
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN 55454, USA
- Mary F Feitosa
  - Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Qunyuan Zhang
  - Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Jennifer E Huffman
  - Medical Research Center for Human Genetics, Medical Research Center Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, UK
- Jacy Crosby
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Christina L Wassel
  - Department of Epidemiology, University of Pittsburgh, Pittsburgh, PA 15261, USA
- Ron Do
  - Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA
- Nora Franceschini
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Lisa W Martin
  - Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC 20037, USA
- Jennifer G Robinson
  - Departments of Epidemiology and Medicine, University of Iowa, Iowa City, IA 52242, USA
- David R Crosslin
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Elisabeth A Rosenthal
  - Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Michael Tsai
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN 55454, USA
- Mark J Rieder
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Aaron R Folsom
  - Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, Minneapolis, MN 55454, USA
- Thomas Lumley
  - Department of Biostatistics, University of Washington, Seattle, WA 98195, USA; Department of Statistics, University of Auckland, Auckland 1142, New Zealand
- Ervin R Fox
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Christopher S Carlson
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- Rebecca D Jackson
  - Division of Endocrinology, Ohio State University, Columbus, OH 43210, USA
- Cornelia M van Duijn
  - Genetic Epidemiology Unit, Department of Epidemiology, Erasmus University Medical Center, 3015 DR Rotterdam, the Netherlands
- André G Uitterlinden
  - Department of Internal Medicine, Erasmus University Medical Center, 3000 DR Rotterdam, the Netherlands
- Daniel Levy
  - Center for Population Studies, National Heart, Lung, and Blood Institute, Framingham, MA 01702, USA; Framingham Heart Study, National Heart, Lung, and Blood Institute, Framingham, MA 01702, USA
- Jerome I Rotter
  - Institute for Translational Genomics and Population Sciences, Los Angeles BioMedical Research Institute, and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Herman A Taylor
  - Department of Medicine, University of Mississippi Medical Center, Jackson, MS 39216, USA; Tougaloo College, Jackson, MS 39174, USA; Jackson State University, Jackson, MS 39217, USA
- Vilmundur Gudnason
  - Icelandic Heart Association, IS-201 Kopavogur, Iceland; University of Iceland, 101 Reykjavik, Iceland
- David S Siscovick
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA; Department of Medicine, University of Washington Medical Center, Seattle, WA 98195, USA
- Myriam Fornage
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Ingrid B Borecki
  - Division of Statistical Genomics, Department of Genetics, Washington University School of Medicine, St. Louis, MO 63110, USA
- Caroline Hayward
  - Medical Research Center for Human Genetics, Medical Research Center Institute of Genetics and Molecular Medicine, University of Edinburgh, EH4 2XU Edinburgh, UK
- Igor Rudan
  - Centre for Population Health Sciences, Medical School, University of Edinburgh, EH8 9YL Edinburgh, UK
- Y Eugene Chen
  - Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA
- Erwin P Bottinger
  - The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Ruth J F Loos
  - The Genetics of Obesity and Related Metabolic Traits Program, The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Pål Sætrom
  - Department of Computer and Information Science, Norwegian University of Science and Technology, 7491 Trondheim, Norway; Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, 7489 Trondheim, Norway
- Kristian Hveem
  - HUNT Research Center, Department of Public Health, Norwegian University of Science and Technology, 7600 Levanger, Norway
- Michael Boehnke
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Leif Groop
  - Department of Clinical Sciences, Diabetes, and Endocrinology, Lund University, Skåne University Hospital, 221 00 Malmö, Sweden; Glostrup Research Institute, Glostrup University Hospital, 2600 Glostrup, Denmark
- Mark McCarthy
  - Oxford Centre for Diabetes, Endocrinology, and Metabolism and Oxford National Institute for Health Research Biomedical Research Centre, University of Oxford, Churchill Hospital, OX1 2JD Oxford, UK
- Thomas Meitinger
  - Institute of Human Genetics, Helmholtz Center Munich, German Research Center for Environmental Health, 85764 Neuherberg, Germany; Institute of Human Genetics, Technical University of Munich, 85764 Neuherberg, Germany
- Christie M Ballantyne
  - Baylor College of Medicine, Houston, TX 77030, USA; Houston Methodist DeBakey Heart and Vascular Center, Houston, TX 77030, USA
- Stacey B Gabriel
  - Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA
- Christopher J O'Donnell
  - Center for Population Studies, National Heart, Lung, and Blood Institute, Framingham, MA 01702, USA; Cardiology Division, Massachusetts General Hospital, Boston, MA 02114, USA
- Wendy S Post
  - Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Kari E North
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Alexander P Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA 98195, USA
- Eric Boerwinkle
  - Human Genetics Center, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
- Bruce M Psaty
  - Cardiovascular Health Research Unit, Department of Medicine, University of Washington, Seattle, WA 98195, USA; Department of Epidemiology, University of Washington, Seattle, WA 98195, USA; Department of Medicine, University of Washington Medical Center, Seattle, WA 98195, USA; Group Health Research Institute, Group Health Cooperative, Seattle, WA 98195, USA
- David Altshuler
  - Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Department of Molecular Biology, Massachusetts General Hospital, Boston, MA 02114, USA; Department of Genetics, Harvard Medical School, Boston, MA 02138, USA
- Sekar Kathiresan
  - Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA; Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02141, USA; Cardiology Division, Massachusetts General Hospital, Boston, MA 02114, USA
- Dan-Yu Lin
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Gail P Jarvik
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- L Adrienne Cupples
  - Center for Population Studies, National Heart, Lung, and Blood Institute, Framingham, MA 01702, USA; Department of Biostatistics, Boston University School of Public Health, Boston, MA 02215, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
- James G Wilson
  - Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS 39216, USA
- Deborah A Nickerson
  - Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
- Goncalo R Abecasis
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Stephen S Rich
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22908, USA
- Russell P Tracy
  - Department of Pathology, University of Vermont, Colchester, VT 05446, USA; Department of Biochemistry, University of Vermont, Burlington, VT 05405, USA
- Cristen J Willer
  - Division of Cardiovascular Medicine, Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
702
Sha Q, Zhang S. A novel test for testing the optimally weighted combination of rare and common variants based on data of parents and affected children. Genet Epidemiol 2014; 38:135-43. [PMID: 24382753 PMCID: PMC4162402 DOI: 10.1002/gepi.21787] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/23/2013] [Revised: 10/28/2013] [Accepted: 12/02/2013] [Indexed: 11/10/2022]
Abstract
With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur when there is a population structure. For rare variants, this problem can be more serious, because the spectrum of rare variation can be very different in diverse populations, as well as the current nonexistence of methods to control for population stratification in population-based rare variant associations. A solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region, and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification although population-based association tests can be seriously confounded by population stratification. The results of power comparisons show that the power of TOW-PAC increases with an increase of the number of affected children in each family and TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
Affiliation(s)
- Qiuying Sha
  - Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
703
Liu DJ, Peloso GM, Zhan X, Holmen OL, Zawistowski M, Feng S, Nikpay M, Auer PL, Goel A, Zhang H, Peters U, Farrall M, Orho-Melander M, Kooperberg C, McPherson R, Watkins H, Willer CJ, Hveem K, Melander O, Kathiresan S, Abecasis GR. Meta-analysis of gene-level tests for rare variant association. Nat Genet 2014; 46:200-4. [PMID: 24336170 PMCID: PMC3939031 DOI: 10.1038/ng.2852] [Citation(s) in RCA: 144] [Impact Index Per Article: 14.4] [Received: 03/18/2013] [Accepted: 11/20/2013] [Indexed: 12/14/2022]
Abstract
The majority of reported complex disease associations for common genetic variants have been identified through meta-analysis, a powerful approach that enables the use of large sample sizes while protecting against common artifacts due to population structure and repeated small-sample analyses sharing individual-level data. As the focus of genetic association studies shifts to rare variants, genes and other functional units are becoming the focus of analysis. Here we propose and evaluate new approaches for performing meta-analysis of rare variant association tests, including burden tests, weighted burden tests, variable-threshold tests and tests that allow variants with opposite effects to be grouped together. We show that our approach retains useful features from single-variant meta-analysis approaches and demonstrate its use in a study of blood lipid levels in ∼18,500 individuals genotyped with exome arrays.
Affiliation(s)
- Dajiang J. Liu
  - Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Gina M. Peloso
  - Broad Institute of Harvard and MIT, Cambridge, MA
  - Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
- Xiaowei Zhan
  - Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Oddgeir L. Holmen
  - Department of Public Health and General Practice, Norwegian University of Science and Technology, Trondheim 7489, Norway
  - St. Olav Hospital, Trondheim University Hospital, Trondheim, Norway
- Matthew Zawistowski
  - Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Shuang Feng
  - Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Majid Nikpay
  - University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Paul L. Auer
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
  - School of Public Health, University of Wisconsin-Milwaukee
- Anuj Goel
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
  - Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- He Zhang
  - Division of Cardiology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109
- Ulrike Peters
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
  - Department of Epidemiology, University of Washington School of Public Health, Seattle, WA
- Martin Farrall
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
  - Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- Marju Orho-Melander
  - Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
  - Department of Clinical Sciences, Lund University, Malmö, Sweden
- Charles Kooperberg
  - Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle WA 98109, USA
  - Department of Biostatistics, University of Washington School of Public Health, Seattle, WA
- Ruth McPherson
  - University of Ottawa Heart Institute, Ottawa, Ontario, Canada
- Hugh Watkins
  - Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, United Kingdom
  - Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
- Cristen J. Willer
  - Division of Cardiology, Department of Internal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109
  - Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109
- Kristian Hveem
  - Department of Public Health and General Practice, Norwegian University of Science and Technology, Trondheim 7489, Norway
  - Levanger Hospital, Levanger, Norway
- Olle Melander
  - Department of Cardiovascular Medicine, University of Oxford, Oxford, UK
  - Department of Clinical Sciences, Lund University, Malmö, Sweden
- Sekar Kathiresan
  - Broad Institute of Harvard and MIT, Cambridge, MA
  - Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA
  - Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA
  - Harvard Medical School, Cambridge, MA
- Gonçalo R. Abecasis
  - Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
704
Xu C, Ciampi A, Greenwood CMT. Exploring the potential benefits of stratified false discovery rates for region-based testing of association with rare genetic variation. Front Genet 2014; 5:11. [PMID: 24523729 PMCID: PMC3905218 DOI: 10.3389/fgene.2014.00011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 08/19/2013] [Accepted: 01/13/2014] [Indexed: 01/13/2023]
Abstract
When analyzing the data that arises from exome or whole-genome sequencing studies, window-based tests, (i.e., tests that jointly analyze all genetic data in a small genomic region), are very popular. However, power is known to be quite low for finding associations with phenotypes using these tests, and therefore a variety of analytic strategies may be employed to potentially improve power. Using sequencing data of all of chromosome 3 from an interim release of data on 2432 individuals from the UK10K project, we simulated phenotypes associated with rare genetic variation, and used the results to explore the window-based test power. We asked two specific questions: firstly, whether there could be substantial benefits associated with incorporating information from external annotation on the genetic variants, and secondly whether the false discovery rate (FDRs) would be a useful metric for assessing significance. Although, as expected, there are benefits to using additional information (such as annotation) when it is associated with causality, we confirmed the general pattern of low sensitivity and power for window-based tests. For our chosen example, even when power is high to detect some of the associations, many of the regions containing causal variants are not detectable, despite using lax significance thresholds and optimal analytic methods. Furthermore, our estimated FDR values tended to be much smaller than the true FDRs. Long-range correlations between variants—due to linkage disequilibrium—likely explain some of this bias. A more sophisticated approach to using the annotation information may improve power, however, many causal variants of realistic effect sizes may simply be undetectable, at least with this sample size. Perhaps annotation information could assist in distinguishing windows containing causal variants from windows that are merely correlated with causal variants.
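As a rough illustration of the stratified FDR idea discussed in this abstract, the sketch below computes q-values separately within strata (for example, windows grouped by annotation) and compares them with pooled q-values. The use of the Benjamini-Hochberg procedure, the function names `bh_qvalues` and `stratified_bh`, and the toy p-values are all assumptions made for illustration; the abstract does not state which FDR estimator the authors used.

```python
import numpy as np

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for one set of tests."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Raw BH ratio p_(k) * m / k for the k-th smallest p-value.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q

def stratified_bh(pvals, strata):
    """Apply BH separately within each stratum (hypothetical helper)."""
    p = np.asarray(pvals, dtype=float)
    s = np.asarray(strata)
    q = np.empty_like(p)
    for label in np.unique(s):
        mask = s == label
        q[mask] = bh_qvalues(p[mask])
    return q

# Toy example: four windows, two annotation strata (made-up values).
pvals = [0.01, 0.04, 0.03, 0.5]
strata = ["a", "a", "b", "b"]
pooled = bh_qvalues(pvals)
stratified = stratified_bh(pvals, strata)
```

In this toy case the stratum enriched for small p-values receives smaller q-values than under pooling, which is the intended benefit of stratification when the annotation is informative.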
Affiliation(s)
- Changjiang Xu
  - Lady Davis Institute for Medical Research, Jewish General Hospital Montreal, QC, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University Montreal, QC, Canada
- Antonio Ciampi
  - Department of Epidemiology, Biostatistics and Occupational Health, McGill University Montreal, QC, Canada
- Celia M T Greenwood
  - Lady Davis Institute for Medical Research, Jewish General Hospital Montreal, QC, Canada; Department of Epidemiology, Biostatistics and Occupational Health, McGill University Montreal, QC, Canada; Departments of Oncology and Human Genetics, McGill University Montreal, QC, Canada
705
Zhang Y, Long J, Lu W, Shu XO, Cai Q, Zheng Y, Li C, Li B, Gao YT, Zheng W. Rare coding variants and breast cancer risk: evaluation of susceptibility Loci identified in genome-wide association studies. Cancer Epidemiol Biomarkers Prev 2014; 23:622-8. [PMID: 24470074 DOI: 10.1158/1055-9965.epi-13-1043] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Indexed: 12/31/2022]
Abstract
BACKGROUND: To date, common genetic variants in approximately 70 loci have been identified for breast cancer via genome-wide association studies (GWAS). It is unknown whether rare variants in these loci are also associated with breast cancer risk. METHODS: We investigated rare missense/nonsense variants with minor allele frequency (MAF) ≤5% located in flanking 500 kb of each of the index single-nucleotide polymorphism (SNP) in 67 GWAS loci. Included in the study were 3,472 cases and 3,595 controls from the Shanghai Breast Cancer Study. Both single marker and gene-based analyses were conducted to investigate the associations. RESULTS: Single marker analyses identified 38 missense variants being associated with breast cancer risk at P < 0.05 after adjusting for the index SNP. SNP rs146217902 in the EDEM1 gene and rs200340088 in the EFEMP2 gene were only observed in 8 cases (P = 0.004 for both). SNP rs200995432 in the EFEMP2 gene was associated with increased risk with an OR of 6.2 [95% confidence interval (CI), 1.4-27.6; P = 6.2 × 10⁻³]. SNP rs80358978 in the BRCA2 gene was associated with 16.5-fold elevated risk (95% CI, 2.2-124.5; P = 2.2 × 10⁻⁴). Gene-based analyses suggested eight genes associated with breast cancer risk at P < 0.05, including the EFEMP2 gene (P = 0.002) and the FBXO18 gene (P = 0.008). CONCLUSION: Our results identified associations of several rare coding variants neighboring common GWAS loci with breast cancer risk. Further investigation of these rare variants and genes would help to understand the biologic mechanisms underlying the associations. IMPACT: Independent studies with larger sample size are warranted to clarify the relationship between these rare variants and breast cancer risk.
Affiliation(s)
- Yanfeng Zhang
  - Authors' Affiliations: Division of Epidemiology, Department of Medicine, Vanderbilt Epidemiology Center, Vanderbilt-Ingram Cancer Center; Department of Biostatistics; Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University School of Medicine, Nashville, Tennessee; Shanghai Center for Disease Control and Prevention; and Department of Epidemiology, Shanghai Cancer Institute, Shanghai, China
|
706
|
Chen H, Lumley T, Brody J, Heard-Costa NL, Fox CS, Cupples LA, Dupuis J. Sequence kernel association test for survival traits. Genet Epidemiol 2014; 38:191-7. [PMID: 24464521 DOI: 10.1002/gepi.21791] [Citation(s) in RCA: 54] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/31/2013] [Revised: 12/20/2013] [Accepted: 12/21/2013] [Indexed: 11/11/2022]
Abstract
Rare variant tests have been of great interest in testing genetic associations with diseases and disease-related quantitative traits in recent years. Among these tests, the sequence kernel association test (SKAT) is an omnibus test for effects of rare genetic variants, in a linear or logistic regression framework. It is often described as a variance component test treating the genotypic effects as random. When the linear kernel is used, its test statistic can be expressed as a weighted sum of single-marker score test statistics. In this paper, we extend the test to survival phenotypes in a Cox regression framework. Because of the anticonservative small-sample performance of the score test in a Cox model, we substitute signed square-root likelihood ratio statistics for the score statistics, and confirm that the small-sample control of type I error is greatly improved. This test can also be applied in meta-analysis. We show in our simulation studies that this test has superior statistical power except in a few specific scenarios, as compared to burden tests in a Cox model. We also present results in an application to time-to-obesity using genotypes from Framingham Heart Study SNP Health Association Resource.
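The identity mentioned in this abstract, that the linear-kernel SKAT statistic equals a weighted sum of squared single-marker score statistics, can be checked numerically. The following is a minimal sketch for a continuous trait with an intercept-only null model; the function names, the unscaled form of the statistic, and the weights are illustrative assumptions, not the authors' implementation, and no p-value is computed.

```python
import numpy as np

def skat_q_weighted_sum(y, G, weights):
    """Q as a weighted sum of squared single-marker score statistics.

    Assumes an intercept-only null model (residuals = y - mean(y)).
    """
    resid = y - y.mean()
    scores = G.T @ resid                  # S_j = sum_i g_ij * (y_i - mean)
    return float(np.sum((weights * scores) ** 2))

def skat_q_kernel(y, G, weights):
    """Q computed from the weighted linear kernel K = G W^2 G^T."""
    resid = y - y.mean()
    K = G @ np.diag(weights ** 2) @ G.T   # n x n kernel matrix
    return float(resid @ K @ resid)
```

Both functions return the same value up to floating-point error; the first form avoids building the n × n kernel matrix.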
Affiliation(s)
- Han Chen
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
707
|
Li B, Liu DJ, Leal SM. Identifying rare variants associated with complex traits via sequencing. Curr Protoc Hum Genet 2014; Chapter 1:Unit 1.26. [PMID: 23853079 DOI: 10.1002/0471142905.hg0126s78] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Although genome-wide association studies have been successful in detecting associations with common variants, there is currently an increasing interest in identifying low-frequency and rare variants associated with complex traits. Next-generation sequencing technologies make it feasible to survey the full spectrum of genetic variation in coding regions or the entire genome. However, association analysis for rare variants is challenging, and traditional methods are ineffective because of the low frequency of rare variants coupled with allelic heterogeneity. Recently a battery of new statistical methods has been proposed for identifying rare variants associated with complex traits. These methods test for associations by aggregating multiple rare variants across a gene or a genomic region or among a group of variants in the genome. In this unit, we describe key concepts for rare variant association for complex traits, survey some of the recent methods, discuss their statistical power under various scenarios, and provide practical guidance on analyzing next-generation sequencing data for identifying rare variants associated with complex traits.
Affiliation(s)
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA
|
708
|
Won S, Kim Y, Lange C. On rare-variant analysis in population-based designs: decomposing the likelihood to two informative components. Hum Hered 2014; 76:76-85. [PMID: 24434864 DOI: 10.1159/000357643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2012] [Accepted: 11/29/2013] [Indexed: 11/19/2022] Open
Abstract
Various analytical approaches have been suggested for the characterization of rare variants. One main approach is to collapse the genetic information of rare variants in a region and to construct an overall test statistic. Here, we proposed a new approach based on collapsed genotype scores. By utilizing the information of the association signal that is ignored in collapsing methods, i.e. the configuration of rare alleles, we constructed a more powerful test and compared it with existing rare-variant approaches. With extensive simulation studies, we showed that our method performs better than existing approaches, and we applied our method to a sequencing study of nonsyndromic cleft lip illustrating the practical advantages of the proposed method.
Affiliation(s)
- Sungho Won
- Department of Applied Statistics, Chung-Ang University, Seoul, Korea
|
709
|
Ghosh A, Hartge P, Kraft P, Joshi AD, Ziegler RG, Barrdahl M, Chanock SJ, Wacholder S, Chatterjee N. Leveraging family history in population-based case-control association studies. Genet Epidemiol 2014; 38:114-22. [PMID: 24408355 DOI: 10.1002/gepi.21785] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2013] [Revised: 11/16/2013] [Accepted: 12/02/2013] [Indexed: 12/28/2022]
Abstract
Population-based epidemiologic studies often gather information from study participants on disease history among their family members. Although investigators widely recognize that family history will be associated with genotypes of the participants at disease susceptibility loci, they commonly ignore such information in primary genetic association analyses. In this report, we propose a simple approach to association testing by incorporating family history information as a "phenotype." We account for the expected attenuation in strength of association of the genotype of study participants with family history under Mendelian transmission. The proposed analysis can be performed using standard statistical software adopting either a meta- or pooled-analysis framework. Re-analysis of a total of 115 known susceptibility single-nucleotide polymorphisms, discovered through genome-wide association studies for several disease traits, indicates that incorporation of family history information can increase efficiency by as much as 40%. Efficiency gain depends on the type of design used for conducting the primary study, extent of family history, and accuracy and completeness of reporting.
Affiliation(s)
- Arpita Ghosh
- Public Health Foundation of India, New Delhi, India
|
710
|
Abstract
Moving from a traditional medical model of treating pathologies to an individualized predictive and preventive model of personalized medicine promises to reduce the healthcare cost on an overburdened and overwhelmed system. Next-generation sequencing (NGS) has the potential to accelerate the early detection of disorders and the identification of pharmacogenetic markers to customize treatments. This review explains the historical facts that led to the development of NGS along with the strengths and weaknesses of NGS, with a special emphasis on the analytical aspects used to process NGS data. There are solutions to all the steps necessary for performing NGS in the clinical context, and the majority of them are very efficient, but some crucial steps in the process need immediate attention.
Affiliation(s)
- Manuel L. Gonzalez-Garay
- Center for Molecular Imaging, Division of Genomics & Bioinformatics, The Brown Foundation Institute of Molecular Medicine, University of Texas Health Science Center at Houston, Houston, TX 77030, USA
|
711
|
Larson NB, Schaid DJ. Regularized rare variant enrichment analysis for case-control exome sequencing data. Genet Epidemiol 2013; 38:104-13. [PMID: 24382715 DOI: 10.1002/gepi.21783] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/25/2013] [Revised: 11/04/2013] [Accepted: 12/02/2013] [Indexed: 11/09/2022]
Abstract
Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research.
Affiliation(s)
- Nicholas B Larson
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
|
712
|
Thomas DC, Yang Z, Yang F. Two-phase and family-based designs for next-generation sequencing studies. Front Genet 2013; 4:276. [PMID: 24379824 PMCID: PMC3861783 DOI: 10.3389/fgene.2013.00276] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2013] [Accepted: 11/19/2013] [Indexed: 12/21/2022] Open
Abstract
The cost of next-generation sequencing is now approaching that of early GWAS panels but is still out of reach for large epidemiologic studies, and the millions of rare variants expected pose challenges for distinguishing causal from non-causal variants. We review two types of designs for sequencing studies: two-phase designs for targeted follow-up of genomewide association studies using unrelated individuals; and family-based designs exploiting co-segregation for prioritizing variants and genes. Two-phase designs subsample subjects for sequencing from a larger case-control study jointly on the basis of their disease and carrier status; the discovered variants are then tested for association in the parent study. The analysis combines the full sequence data from the substudy with the more limited SNP data from the main study. We discuss various methods for selecting this subset of variants and describe the expected yield of true positive associations in the context of an on-going study of second breast cancers following radiotherapy. While the sharing of variants within families means that family-based designs are less efficient for discovery than sequencing unrelated individuals, the ability to exploit co-segregation of variants with disease within families helps distinguish causal from non-causal ones. Furthermore, by enriching for family history, the yield of causal variants can be improved and use of identity-by-descent information improves imputation of genotypes for other family members. We compare the relative efficiency of these designs with those using unrelated individuals for discovering and prioritizing variants or genes for testing association in larger studies. While associations can be tested with single variants, power is low for rare ones. Recent generalizations of burden or kernel tests for gene-level associations to family-based data are appealing. These approaches are illustrated in the context of a family-based study of colorectal cancer.
Affiliation(s)
- Duncan C Thomas
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Zhao Yang
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
- Fan Yang
- Department of Preventive Medicine, University of Southern California Los Angeles, CA, USA
|
713
|
Wang X, Lee S, Zhu X, Redline S, Lin X. GEE-based SNP set association test for continuous and discrete traits in family-based association studies. Genet Epidemiol 2013; 37:778-86. [PMID: 24166731 PMCID: PMC4007511 DOI: 10.1002/gepi.21763] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2013] [Revised: 08/17/2013] [Accepted: 09/10/2013] [Indexed: 12/17/2022]
Abstract
Family-based genetic association studies of related individuals provide opportunities to detect genetic variants that complement studies of unrelated individuals. Most statistical methods for family association studies for common variants are single marker based, which test one SNP at a time. In this paper, we consider testing the effect of an SNP set, e.g., SNPs in a gene, in family studies, for both continuous and discrete traits. Specifically, we propose a generalized estimating equations (GEEs) based kernel association test, a variance component based testing method, to test for the association between a phenotype and multiple variants in an SNP set jointly using family samples. The proposed approach allows for both continuous and discrete traits, where the correlation among family members is taken into account through the use of an empirical covariance estimator. We derive the theoretical distribution of the proposed statistic under the null and develop analytical methods to calculate the P-values. We also propose an efficient resampling method for correcting for small sample size bias in family studies. The proposed method allows for easily incorporating covariates and SNP-SNP interactions. Simulation studies show that the proposed method properly controls for type I error rates under both random and ascertained sampling schemes in family studies. We demonstrate through simulation studies that our approach has superior performance for association mapping compared to the single marker based minimum P-value GEE test for an SNP-set effect over a range of scenarios. We illustrate the application of the proposed method using data from the Cleveland Family GWAS Study.
Affiliation(s)
- Xuefeng Wang
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
- Seunggeun Lee
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
- Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA 44106
- Susan Redline
- Department of Medicine, Brigham and Women’s Hospital and Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA
- Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA 02115
|
714
|
Di Camillo B, Sambo F, Toffolo G, Cobelli C. ABACUS: an entropy-based cumulative bivariate statistic robust to rare variants and different direction of genotype effect. Bioinformatics 2013; 30:384-91. [PMID: 24292361 DOI: 10.1093/bioinformatics/btt697] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/17/2023]
Abstract
MOTIVATION In the past years, both sequencing and microarray have been widely used to search for relations between genetic variations and predisposition to complex pathologies such as diabetes or neurological disorders. These studies, however, have been able to explain only a small fraction of disease heritability, possibly because complex pathologies cannot be referred to few dysfunctional genes, but are rather heterogeneous and multicausal, as a result of a combination of rare and common variants possibly impairing multiple regulatory pathways. Rare variants, though, are difficult to detect, especially when the effects of causal variants are in different directions, i.e. with protective and detrimental effects. RESULTS Here, we propose ABACUS, an Algorithm based on a BivAriate CUmulative Statistic to identify single nucleotide polymorphisms (SNPs) significantly associated with a disease within predefined sets of SNPs such as pathways or genomic regions. ABACUS is robust to the concurrent presence of SNPs with protective and detrimental effects and of common and rare variants; moreover, it is powerful even when few SNPs in the SNP-set are associated with the phenotype. We assessed ABACUS performance on simulated and real data and compared it with three state-of-the-art methods. When ABACUS was applied to type 1 and 2 diabetes data, besides observing a wide overlap with already known associations, we found a number of biologically sound pathways, which might shed light on diabetes mechanism and etiology. AVAILABILITY AND IMPLEMENTATION ABACUS is available at http://www.dei.unipd.it/~dicamill/pagine/Software.html.
Affiliation(s)
- Barbara Di Camillo
- Department of Information Engineering, University of Padova, via Gradenigo 6B, 35131 Padova, Italy
|
715
|
Koboldt DC, Steinberg KM, Larson DE, Wilson RK, Mardis ER. The next-generation sequencing revolution and its impact on genomics. Cell 2013; 155:27-38. [PMID: 24074859 DOI: 10.1016/j.cell.2013.09.006] [Citation(s) in RCA: 613] [Impact Index Per Article: 55.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2013] [Indexed: 02/07/2023]
Abstract
Genomics is a relatively new scientific discipline, having DNA sequencing as its core technology. As technology has improved the cost and scale of genome characterization over sequencing's 40-year history, the scope of inquiry has commensurately broadened. Massively parallel sequencing has proven revolutionary, shifting the paradigm of genomics to address biological questions at a genome-wide scale. Sequencing now empowers clinical diagnostics and other aspects of medical care, including disease risk, therapeutic identification, and prenatal testing. This Review explores the current state of genomics in the massively parallel sequencing era.
Affiliation(s)
- Daniel C Koboldt
- The Genome Institute, School of Medicine, Washington University, St. Louis, MO 63108, USA
|
716
|
Saad M, Wijsman EM. Power of family-based association designs to detect rare variants in large pedigrees using imputed genotypes. Genet Epidemiol 2013; 38:1-9. [PMID: 24243664 DOI: 10.1002/gepi.21776] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2013] [Revised: 09/30/2013] [Accepted: 10/15/2013] [Indexed: 01/09/2023]
Abstract
Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with current availability of next-generation sequencing. Family-based designs are well suited for discovery of rare variants, with large and carefully selected pedigrees enriching for multiple copies of such variants. However, sequencing a large number of samples is still prohibitive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which is able to efficiently handle large pedigrees and accurately impute rare variants. We used burden and kernel association tests, famWS and famSKAT, both of which account for family relationships; famSKAT additionally accounts for heterogeneity of allelic effects. We simulated pedigree sequence data and compared the power of association tests for pseudosequence data, a subset of sequence data used for imputation, and all subjects sequenced. We also compared, within the pseudosequence data, the power of association tests using best-guess genotypes and allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that the use of allelic dosages results in much higher power than use of best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
Affiliation(s)
- Mohamad Saad
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America; Department of Biostatistics, University of Washington, Seattle, Washington, United States of America
|
717
|
Landrum MJ, Lee JM, Riley GR, Jang W, Rubinstein WS, Church DM, Maglott DR. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res 2013; 42:D980-5. [PMID: 24234437 PMCID: PMC3965032 DOI: 10.1093/nar/gkt1113] [Citation(s) in RCA: 1934] [Impact Index Per Article: 175.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022] Open
Abstract
ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) provides a freely available archive of reports of relationships among medically important variants and phenotypes. ClinVar accessions submissions reporting human variation, interpretations of the relationship of that variation to human health and the evidence supporting each interpretation. The database is tightly coupled with dbSNP and dbVar, which maintain information about the location of variation on human assemblies. ClinVar is also based on the phenotypic descriptions maintained in MedGen (http://www.ncbi.nlm.nih.gov/medgen). Each ClinVar record represents the submitter, the variation and the phenotype, i.e. the unit that is assigned an accession of the format SCV000000000.0. The submitter can update the submission at any time, in which case a new version is assigned. To facilitate evaluation of the medical importance of each variant, ClinVar aggregates submissions with the same variation/phenotype combination, adds value from other NCBI databases, assigns a distinct accession of the format RCV000000000.0 and reports if there are conflicting clinical interpretations. Data in ClinVar are available in multiple formats, including html, download as XML, VCF or tab-delimited subsets. Data from ClinVar are provided as annotation tracks on genomic RefSeqs and are used in tools such as Variation Reporter (http://www.ncbi.nlm.nih.gov/variation/tools/reporter), which reports what is known about variation based on user-supplied locations.
Affiliation(s)
- Melissa J Landrum
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA
|
718
|
Schaid DJ, Sinnwell JP, McDonnell SK, Thibodeau SN. Detecting genomic clustering of risk variants from sequence data: cases versus controls. Hum Genet 2013; 132:1301-9. [PMID: 23842950 PMCID: PMC3797865 DOI: 10.1007/s00439-013-1335-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2013] [Accepted: 07/02/2013] [Indexed: 02/02/2023]
Abstract
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango's statistic, to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios.
Affiliation(s)
- Daniel J Schaid
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
|
719
|
|
720
|
Disruption of the non-canonical Wnt gene PRICKLE2 leads to autism-like behaviors with evidence for hippocampal synaptic dysfunction. Mol Psychiatry 2013; 18:1077-89. [PMID: 23711981 PMCID: PMC4163749 DOI: 10.1038/mp.2013.71] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 11/20/2012] [Revised: 04/08/2013] [Accepted: 04/19/2013] [Indexed: 12/30/2022]
Abstract
Autism spectrum disorders (ASDs) have been suggested to arise from abnormalities in the canonical and non-canonical Wnt signaling pathways. However, a direct connection between a human variant in a Wnt pathway gene and ASD-relevant brain pathology has not been established. Prickle2 (Pk2) is a post-synaptic non-canonical Wnt signaling protein shown to interact with post-synaptic density 95 (PSD-95). Here, we show that mice with disruption in Prickle2 display behavioral abnormalities including altered social interaction, learning abnormalities and behavioral inflexibility. Prickle2 disruption in mouse hippocampal neurons led to reductions in dendrite branching, synapse number and PSD size. Consistent with these findings, Prickle2 null neurons show decreased frequency and size of spontaneous miniature synaptic currents. These behavioral and physiological abnormalities in Prickle2 disrupted mice are consistent with ASD-like phenotypes present in other mouse models of ASDs. In 384 individuals with autism, we identified two with distinct, heterozygous, rare, non-synonymous PRICKLE2 variants (p.E8Q and p.V153I) that were shared by their affected siblings and inherited paternally. Unlike wild-type PRICKLE2, the PRICKLE2 variants found in ASD patients exhibit deficits in morphological and electrophysiological assays. These data suggest that these PRICKLE2 variants cause a critical loss of PRICKLE2 function. The data presented here provide new insight into the biological roles of Prickle2, its behavioral importance, and suggest disruptions in non-canonical Wnt genes such as PRICKLE2 may contribute to synaptic abnormalities underlying ASDs.
|
721
|
Wang S, Yang Z, Ma JZ, Payne TJ, Li MD. Introduction to deep sequencing and its application to drug addiction research with a focus on rare variants. Mol Neurobiol 2013; 49:601-14. [PMID: 23990377 DOI: 10.1007/s12035-013-8541-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2013] [Accepted: 08/16/2013] [Indexed: 11/30/2022]
Abstract
Through linkage analysis, candidate gene approach, and genome-wide association studies (GWAS), many genetic susceptibility factors for substance dependence have been discovered such as the aldehyde dehydrogenase 2 gene (ALDH2) for alcohol dependence (AD) and nicotinic acetylcholine receptor (nAChR) subunit variants on chromosomes 8 and 15 for nicotine dependence (ND). However, these confirmed genetic factors contribute only a small portion of the heritability responsible for each addiction. Among many potential factors, rare variants in those identified and unidentified susceptibility genes are supposed to contribute greatly to the missing heritability. Several studies focusing on rare variants have been conducted by taking advantage of next-generation sequencing technologies, which revealed that some rare variants of nAChR subunits are associated with ND in both genetic and functional studies. However, these studies investigated variants for only a small number of genes and need to be expanded to broad regions/genes in a larger population. This review presents an update on recently developed methods for rare-variant identification and association analysis and on studies focused on rare-variant discovery and function related to addictions.
Affiliation(s)
- Shaolin Wang
- Department of Psychiatry & Neurobiology Science, University of Virginia, 1670 Discovery Drive, Suite 110, Charlottesville, VA, 22911, USA
|
722
|
Panoutsopoulou K, Tachmazidou I, Zeggini E. In search of low-frequency and rare variants affecting complex traits. Hum Mol Genet 2013; 22:R16-21. [PMID: 23922232 PMCID: PMC3782074 DOI: 10.1093/hmg/ddt376] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common frequency and rare variants. Targeted genotyping arrays and next-generation sequencing technologies at the whole-genome sequencing (WGS) and whole-exome scales (WES) are increasingly employed to access sequence variation across the full minor allele frequency (MAF) spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.
Affiliation(s)
- Eleftheria Zeggini
- To whom correspondence should be addressed at: Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. Tel: +44-1223496868; Fax: +44-1223496826;
|
723
|
Lee S, Teslovich T, Boehnke M, Lin X. General framework for meta-analysis of rare variants in sequencing association studies. Am J Hum Genet 2013; 93:42-53. [PMID: 23768515 PMCID: PMC3710762 DOI: 10.1016/j.ajhg.2013.05.010] [Citation(s) in RCA: 169] [Impact Index Per Article: 15.4] [Received: 02/04/2013] [Revised: 04/19/2013] [Accepted: 05/14/2013] [Indexed: 12/22/2022] Open
Abstract
We propose a general statistical framework for meta-analysis of gene- or region-based multimarker rare variant association tests in sequencing association studies. In genome-wide association studies, single-marker meta-analysis has been widely used to increase statistical power by combining results via regression coefficients and standard errors from different studies. In analysis of rare variants in sequencing studies, region-based multimarker tests are often used to increase power. We propose meta-analysis methods for commonly used gene- or region-based rare variants tests, such as burden tests and variance component tests. Because estimation of regression coefficients of individual rare variants is often unstable or not feasible, the proposed method avoids this difficulty by calculating score statistics instead that only require fitting the null model for each study and then aggregating these score statistics across studies. Our proposed meta-analysis rare variant association tests are conducted based on study-specific summary statistics, specifically score statistics for each variant and between-variant covariance-type (linkage disequilibrium) relationship statistics for each gene or region. The proposed methods are able to incorporate different levels of heterogeneity of genetic effects across studies and are applicable to meta-analysis of multiple ancestry groups. We show that the proposed methods are essentially as powerful as joint analysis by directly pooling individual level genotype data. We conduct extensive simulations to evaluate the performance of our methods by varying levels of heterogeneity across studies, and we apply the proposed methods to meta-analysis of rare variant effects in a multicohort study of the genetics of blood lipid levels.
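The aggregation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes homogeneous genetic effects across studies, a simple weighted burden test, and hypothetical names (`meta_burden_test`, `scores`, `covs`, `weights`). Each study contributes only a per-variant score vector and a between-variant covariance matrix, both obtained from fitting its null model.

```python
import math
import numpy as np

def meta_burden_test(scores, covs, weights):
    """Combine per-study summary statistics into one gene-level burden test.

    scores  : list of length-m score vectors, one per study (hypothetical input)
    covs    : list of m x m between-variant covariance matrices, one per study
    weights : length-m variant weights shared by all studies (assumption)
    """
    w = np.asarray(weights, dtype=float)
    # Sum score statistics and covariance matrices across studies
    # (this assumes the same effect in every study).
    s_total = np.sum([np.asarray(s, dtype=float) for s in scores], axis=0)
    v_total = np.sum([np.asarray(v, dtype=float) for v in covs], axis=0)
    # Collapse the variants into a single weighted burden score and its variance.
    burden_score = float(w @ s_total)
    burden_var = float(w @ v_total @ w)
    chi2 = burden_score ** 2 / burden_var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Because only the score vectors and covariance matrices are exchanged, no per-variant regression coefficients need to be estimated. A variance component test would use the same summaries but needs a different null distribution, which this sketch does not cover.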
Affiliation(s)
- Seunggeun Lee
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
- Tanya M. Teslovich
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
|
724
|
Ayers KL, Cordell HJ. Identification of grouped rare and common variants via penalized logistic regression. Genet Epidemiol 2013; 37:592-602. [PMID: 23836590 PMCID: PMC3842118 DOI: 10.1002/gepi.21746] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 12/20/2012] [Revised: 05/24/2013] [Accepted: 05/24/2013] [Indexed: 11/09/2022]
Abstract
In spite of the success of genome-wide association studies in finding many common variants associated with disease, these variants seem to explain only a small proportion of the estimated heritability. Data collection has turned toward exome and whole genome sequencing, but it is well known that single marker methods frequently used for common variants have low power to detect rare variants associated with disease, even with very large sample sizes. In response, a variety of methods have been developed that attempt to cluster rare variants so that they may gather strength from one another under the premise that there may be multiple causal variants within a gene. Most of these methods group variants by gene or proximity, and test one gene or marker window at a time. We propose a penalized regression method (PeRC) that analyzes all genes at once, allowing grouping of all (rare and common) variants within a gene, along with subgrouping of the rare variants, thus borrowing strength from both rare and common variants within the same gene. The method can incorporate either a burden-based weighting of the rare variants or one in which the weights are data driven. In simulations, our method performs favorably when compared to many previously proposed approaches, including its predecessor, the sparse group lasso [Friedman et al., 2010].
Affiliation(s)
- Kristin L Ayers
- Institute of Genetic Medicine, Newcastle University, Newcastle upon Tyne NE1 3BZ, United Kingdom.
|
725
|
Ionita-Laza I, Lee S, Makarov V, Buxbaum J, Lin X. Sequence kernel association tests for the combined effect of rare and common variants. Am J Hum Genet 2013; 92:841-53. [PMID: 23684009 DOI: 10.1016/j.ajhg.2013.04.015] [Citation(s) in RCA: 332] [Impact Index Per Article: 30.2] [Received: 01/28/2013] [Revised: 03/20/2013] [Accepted: 04/18/2013] [Indexed: 01/08/2023] Open
Abstract
Recent developments in sequencing technologies have made it possible to uncover both rare and common genetic variants. Genome-wide association studies (GWASs) can test for the effect of common variants, whereas sequence-based association studies can evaluate the cumulative effect of both rare and common variants on disease risk. Many groupwise association tests, including burden tests and variance-component tests, have been proposed for this purpose. Although such tests do not exclude common variants from their evaluation, they focus mostly on testing the effect of rare variants by upweighting rare-variant effects and downweighting common-variant effects and can therefore lose substantial power when both rare and common genetic variants in a region influence trait susceptibility. There is increasing evidence that the allelic spectrum of risk variants at a given locus might include novel, rare, low-frequency, and common genetic variants. Here, we introduce several sequence kernel association tests to evaluate the cumulative effect of rare and common variants. The proposed tests are computationally efficient and are applicable to both binary and continuous traits. Furthermore, they can readily combine GWAS and whole-exome-sequencing data on the same individuals, when available, and are also applicable to deep-resequencing data of GWAS loci. We evaluate these tests on data simulated under comprehensive scenarios and show that compared with the most commonly used tests, including the burden and variance-component tests, they can achieve substantial increases in power. We next show applications to sequencing studies for Crohn disease and autism spectrum disorders. The proposed tests have been incorporated into the software package SKAT.
|
726
|
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet 2013; 4:92. [PMID: 23750167 PMCID: PMC3667386 DOI: 10.3389/fgene.2013.00092] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 02/10/2013] [Accepted: 05/04/2013] [Indexed: 02/03/2023] Open
Abstract
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Affiliation(s)
- Armand Valsesia
- Genetics Core, Nestlé Institute of Health Sciences Lausanne, Switzerland
|
727
|
Evangelou E, Ioannidis JPA. Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet 2013; 14:379-89. [PMID: 23657481 DOI: 10.1038/nrg3472] [Citation(s) in RCA: 400] [Impact Index Per Article: 36.4] [Indexed: 12/15/2022]
Abstract
Meta-analysis of genome-wide association studies (GWASs) has become a popular method for discovering genetic risk variants. Here, we overview both widely applied and newer statistical methods for GWAS meta-analysis, including issues of interpretation and assessment of sources of heterogeneity. We also discuss extensions of these meta-analysis methods to complex data. Where possible, we provide guidelines for researchers who are planning to use these methods. Furthermore, we address special issues that may arise for meta-analysis of sequencing data and rare variants. Finally, we discuss challenges and solutions surrounding the goals of making meta-analysis data publicly available and building powerful consortia.
Affiliation(s)
- Evangelos Evangelou
- Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina 45110, Greece
|
728
|
Schaid DJ, McDonnell SK, Sinnwell JP, Thibodeau SN. Multiple genetic variant association testing by collapsing and kernel methods with pedigree or population structured data. Genet Epidemiol 2013; 37:409-18. [PMID: 23650101 DOI: 10.1002/gepi.21727] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.6] [Received: 01/17/2013] [Revised: 03/11/2013] [Accepted: 04/01/2013] [Indexed: 11/11/2022]
Abstract
Searching for rare genetic variants associated with complex diseases can be facilitated by enriching for diseased carriers of rare variants by sampling cases from pedigrees enriched for disease, possibly with related or unrelated controls. This strategy, however, complicates analyses because of shared genetic ancestry, as well as linkage disequilibrium among genetic markers. To overcome these problems, we developed broad classes of "burden" statistics and kernel statistics, extending commonly used methods for unrelated case-control data to allow for known pedigree relationships, for autosomes and the X chromosome. Furthermore, by replacing pedigree-based genetic correlation matrices with estimates of genetic relationships based on large-scale genomic data, our methods can be used to account for population-structured data. By simulations, we show that the type I error rates of our developed methods are near the asymptotic nominal levels, allowing rapid computation of P-values. Our simulations also show that a linear weighted kernel statistic is generally more powerful than a weighted "burden" statistic. Because the proposed statistics are rapid to compute, they can be readily used for large-scale screening of the association of genomic sequence data with disease status.
Affiliation(s)
- Daniel J Schaid
- Department of Health Sciences Research, Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, Minnesota 55905, USA.
|
729
|
Listgarten J, Lippert C, Kang EY, Xiang J, Kadie CM, Heckerman D. A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics 2013; 29:1526-33. [PMID: 23599503 PMCID: PMC3673214 DOI: 10.1093/bioinformatics/btt177] [Citation(s) in RCA: 60] [Impact Index Per Article: 5.5] [Indexed: 12/22/2022]
Abstract
MOTIVATION: Approaches for testing sets of variants, such as a set of rare or common variants within a gene or pathway, for association with complex traits are important. In particular, set tests allow for aggregation of weak signal within a set, can capture interplay among variants and reduce the burden of multiple hypothesis testing. Until now, these approaches did not address confounding by family relatedness and population structure, a problem that is becoming more important as larger datasets are used to increase power.
RESULTS: We introduce a new approach for set tests that handles confounders. Our model is based on the linear mixed model and uses two random effects: one to capture the set association signal and one to capture confounders. We also introduce a computational speedup for two random-effects models that makes this approach feasible even for extremely large cohorts. Using this model with both the likelihood ratio test and score test, we find that the former yields more power while controlling type I error. Application of our approach to richly structured Genetic Analysis Workshop 14 data demonstrates that our method successfully corrects for population structure and family relatedness, whereas application of our method to a 15 000 individual Crohn's disease case-control cohort demonstrates that it additionally recovers genes not recoverable by univariate analysis.
AVAILABILITY: A Python-based library implementing our approach is available at http://mscompbio.codeplex.com.
|
730
|
Thomas DC. Some surprising twists on the road to discovering the contribution of rare variants to complex diseases. Hum Hered 2013; 74:113-7. [PMID: 23594489 DOI: 10.1159/000347020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
|
731
|
Abstract
The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.
|
732
|
Wu MC, Maity A, Lee S, Simmons EM, Harmon QE, Lin X, Engel SM, Molldrem JJ, Armistead PM. Kernel machine SNP-set testing under multiple candidate kernels. Genet Epidemiol 2013; 37:267-75. [PMID: 23471868 PMCID: PMC3769109 DOI: 10.1002/gepi.21715] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 10/04/2012] [Revised: 01/15/2013] [Accepted: 02/05/2013] [Indexed: 11/10/2022]
Abstract
Joint testing for the cumulative effect of multiple single-nucleotide polymorphisms grouped on the basis of prior biological knowledge has become a popular and powerful strategy for the analysis of large-scale genetic association studies. The kernel machine (KM)-testing framework is a useful approach that has been proposed for testing associations between multiple genetic variants and many different types of complex traits by comparing pairwise similarity in phenotype between subjects to pairwise similarity in genotype, with similarity in genotype defined via a kernel function. An advantage of the KM framework is its flexibility: choosing different kernel functions allows for different assumptions concerning the underlying model and can allow for improved power. In practice, it is difficult to know which kernel to use a priori because this depends on the unknown underlying trait architecture and selecting the kernel which gives the lowest P-value can lead to inflated type I error. Therefore, we propose practical strategies for KM testing when multiple candidate kernels are present based on constructing composite kernels and based on efficient perturbation procedures. We demonstrate through simulations and real data applications that the procedures protect the type I error rate and can lead to substantially improved power over poor choices of kernels and only modest differences in power vs. using the best candidate kernel.
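To make the role of the kernel concrete, the following minimal sketch computes a kernel machine score statistic for a continuous trait under two candidate kernels. It is illustrative only and is not the composite-kernel or perturbation procedure proposed in the paper: the function names are hypothetical, the null model is assumed to contain only an intercept, and no P-value is computed (that would need the null distribution of the quadratic form).

```python
import numpy as np

def linear_kernel(G):
    """Linear kernel: genotype similarity as an inner product."""
    return G @ G.T

def quadratic_kernel(G):
    """Quadratic kernel: also captures pairwise interaction terms."""
    return (1.0 + G @ G.T) ** 2

def km_score_statistic(y, G, kernel=linear_kernel):
    """Quadratic-form statistic Q = r' K r for an n x m genotype matrix G.

    r holds residuals from an intercept-only null model (an assumption made
    here for brevity; covariates would normally be regressed out first).
    """
    y = np.asarray(y, dtype=float)
    G = np.asarray(G, dtype=float)
    resid = y - y.mean()
    K = kernel(G)  # n x n matrix of pairwise genotype similarity
    return float(resid @ K @ resid)
```

Calling `km_score_statistic(y, G, kernel=quadratic_kernel)` on the same data generally gives a different statistic than the linear kernel. Picking whichever kernel yields the smaller P-value after the fact is the practice the abstract warns can inflate type I error.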
Affiliation(s)
- Michael C Wu
- Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7420, USA.
|
733
|
Okser S, Pahikkala T, Aittokallio T. Genetic variants and their interactions in disease risk prediction - machine learning and network perspectives. BioData Min 2013; 6:5. [PMID: 23448398 PMCID: PMC3606427 DOI: 10.1186/1756-0381-6-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 10/05/2012] [Accepted: 02/11/2013] [Indexed: 12/31/2022] Open
Abstract
A central challenge in systems biology and medical genetics is to understand how interactions among genetic loci contribute to complex phenotypic traits and human diseases. While most studies have so far relied on statistical modeling and association testing procedures, machine learning and predictive modeling approaches are increasingly being applied to mining genotype-phenotype relationships, also among those associations that do not necessarily meet statistical significance at the level of individual variants, yet still contributing to the combined predictive power at the level of variant panels. Network-based analysis of genetic variants and their interaction partners is another emerging trend by which to explore how sub-network level features contribute to complex disease processes and related phenotypes. In this review, we describe the basic concepts and algorithms behind machine learning-based genetic feature selection approaches, their potential benefits and limitations in genome-wide setting, and how physical or genetic interaction networks could be used as a priori information for providing improved predictive power and mechanistic insights into the disease networks. These developments are geared toward explaining a part of the missing heritability, and when combined with individual genomic profiling, such systems medicine approaches may also provide a principled means for tailoring personalized treatment strategies in the future.
|
734
|
Wagner MJ. Rare-variant genome-wide association studies: a new frontier in genetic analysis of complex traits. Pharmacogenomics 2013; 14:413-24. [DOI: 10.2217/pgs.13.36] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Indexed: 12/12/2022] Open
Abstract
Genome-wide association studies have, in the last few years, identified thousands of common genetic variants associated with common complex traits and diseases, implicating many genes not previously known to be involved in the biology of those traits. However, these variants have so far explained little of the population variance in trait values or disease susceptibility. As large-scale genome sequencing efforts have revealed the extent of genetic variation at the low end of the frequency range in human populations, the effects of rare variants have been proposed as an explanation of the ‘missing genetic variance.’ Improved technologies for genotyping rare variants, including inexpensive whole-genome and whole-exome sequencing and rare-variant genotyping chips, coupled with novel analytical methods, are making genome-wide scans for the effects of rare variants possible, and seem likely to usher in a new era in the genetic analysis of complex traits.
Affiliation(s)
- Michael J Wagner
- Institute for Pharmacogenomics & Individualized Therapy, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7361, USA
|
735
|
Barnett IJ, Lee S, Lin X. Detecting rare variant effects using extreme phenotype sampling in sequencing association studies. Genet Epidemiol 2012. [PMID: 23184518 DOI: 10.1002/gepi.21699] [Citation(s) in RCA: 107] [Impact Index Per Article: 8.9] [Indexed: 12/19/2022]
Abstract
In the increasing number of sequencing studies aimed at identifying rare variants associated with complex traits, the power of the test can be improved by guided sampling procedures. We confirm both analytically and numerically that sampling individuals with extreme phenotypes can enrich the presence of causal rare variants and can therefore lead to an increase in power compared to random sampling. Although application of traditional rare variant association tests to these extreme phenotype samples requires dichotomizing the continuous phenotypes before analysis, the dichotomization procedure can decrease the power by reducing the information in the phenotypes. To avoid this, we propose a novel statistical method based on the optimal Sequence Kernel Association Test that allows us to test for rare variant effects using continuous phenotypes in the analysis of extreme phenotype samples. The increase in power of this method is demonstrated through simulation of a wide range of scenarios as well as in the triglyceride data of the Dallas Heart Study.
Affiliation(s)
- Ian J Barnett
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
|