201
Pérez-Núñez I, Pérez-Castrillón JL, Zarrabeitia MT, García-Ibarbia C, Martínez-Calvo L, Olmos JM, Briongos LS, Riancho J, Camarero V, Muñoz Vives JM, Cruz R, Riancho JA. Exon array analysis reveals genetic heterogeneity in atypical femoral fractures. A pilot study. Mol Cell Biochem 2015; 409:45-50. [DOI: 10.1007/s11010-015-2510-3] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 05/03/2015] [Accepted: 07/04/2015] [Indexed: 10/23/2022]
202
Basmanav FB, Forstner AJ, Fier H, Herms S, Meier S, Degenhardt F, Hoffmann P, Barth S, Fricker N, Strohmaier J, Witt SH, Ludwig M, Schmael C, Moebus S, Maier W, Mössner R, Rujescu D, Rietschel M, Lange C, Nöthen MM, Cichon S. Investigation of the role of TCF4 rare sequence variants in schizophrenia. Am J Med Genet B Neuropsychiatr Genet 2015; 168B:354-62. [PMID: 26010163 DOI: 10.1002/ajmg.b.32318] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 06/03/2014] [Accepted: 04/13/2015] [Indexed: 12/20/2022]
Abstract
Transcription factor 4 (TCF4) is one of the most robust of all reported schizophrenia risk loci and is supported by several genetic and functional lines of evidence. While numerous studies have implicated common genetic variation at TCF4 in schizophrenia risk, the role of rare, small-sized variants at this locus, such as single nucleotide variants and short indels that are below the resolution of chip-based arrays, requires further exploration. The aim of the present study was to investigate the association between rare TCF4 sequence variants and schizophrenia. Exon-targeted resequencing was performed in 190 German schizophrenia patients. Six rare variants at the coding exons and flanking sequences of the TCF4 gene were identified, including two missense variants and one splice site variant. These six variants were then pooled with nine additional rare variants identified in 379 European participants of the 1000 Genomes Project, and all 15 variants were genotyped in an independent German sample (n = 1,808 patients; n = 2,261 controls). These data were then analyzed using six statistical methods developed for the association analysis of rare variants. No significant association (P < 0.05) was found. However, the results from our association and power analyses suggest that further research into the possible involvement of rare TCF4 sequence variants in schizophrenia risk is warranted, using larger cohorts with higher statistical power to identify rare variant associations.
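As a rough illustration of what a rare-variant association analysis of this kind can look like, the sketch below collapses the rare variants into a per-person carrier indicator and compares carrier counts between cases and controls with a one-sided Fisher exact test. This is a generic burden-style test chosen for illustration and is not claimed to be one of the six methods used in the study; the function names and the genotype layout (one list of minor-allele counts per person) are assumptions.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls, columns are carriers/non-carriers.
    Returns P(carriers among cases >= a) under the hypergeometric null.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # math.comb returns 0 when k > n, so impossible tables contribute nothing.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / total


def carrier_burden_test(case_genotypes, control_genotypes):
    """Collapse rare variants to carrier status, then test cases vs controls.

    Each genotype row holds one person's minor-allele counts at the rare
    variants of interest (hypothetical layout, for illustration only).
    """
    case_carriers = sum(1 for g in case_genotypes if any(g))
    control_carriers = sum(1 for g in control_genotypes if any(g))
    return fisher_exact_greater(
        case_carriers,
        len(case_genotypes) - case_carriers,
        control_carriers,
        len(control_genotypes) - control_carriers,
    )
```

Collapsing to carrier status trades sensitivity to individual variants for power when several rare variants act in the same direction; tests that allow opposite effect directions would need a different statistic.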
Affiliation(s)
- F Buket Basmanav
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Andreas J Forstner
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Heide Fier
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Department of Genomic Mathematics, University of Bonn, Bonn, Germany
- Stefan Herms
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Division of Medical Genetics, University Hospital Basel and Department of Biomedicine, University of Basel, Basel, Switzerland
- Sandra Meier
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- National Center for Register-Based Research, Aarhus University, Aarhus, Denmark
- Franziska Degenhardt
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Per Hoffmann
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Division of Medical Genetics, University Hospital Basel and Department of Biomedicine, University of Basel, Basel, Switzerland
- Institute of Neuroscience and Medicine INM-1, Research Center Juelich, Juelich, Germany
- Sandra Barth
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Nadine Fricker
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Jana Strohmaier
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Stephanie H Witt
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Michael Ludwig
- Department of Clinical Chemistry and Clinical Pharmacology, University of Bonn, Bonn, Germany
- Christine Schmael
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Susanne Moebus
- Centre of Urban Epidemiology, Institute of Medical Informatics, Biometry and Epidemiology, Essen, Germany
- Wolfgang Maier
- Department of Psychiatry and Psychotherapy, University of Bonn, Bonn, Germany
- Rainald Mössner
- Department of Psychiatry and Psychotherapy, University of Bonn, Bonn, Germany
- Department of Psychiatry, University of Tübingen
- Dan Rujescu
- Department of Psychiatry, University of Halle, Halle, Germany
- Marcella Rietschel
- Department of Genetic Epidemiology in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim/Heidelberg University, Mannheim, Germany
- Christoph Lange
- Department of Genomic Mathematics, University of Bonn, Bonn, Germany
- German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
- Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
- Markus M Nöthen
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Sven Cichon
- Institute of Human Genetics, University of Bonn, Bonn, Germany
- Department of Genomics, Life and Brain Center, Bonn, Germany
- Division of Medical Genetics, University Hospital Basel and Department of Biomedicine, University of Basel, Basel, Switzerland
- Institute of Neuroscience and Medicine INM-1, Research Center Juelich, Juelich, Germany
203
Xu H, Zhang H, Yang W, Yadav R, Morrison AC, Qian M, Devidas M, Liu Y, Perez-Andreu V, Zhao X, Gastier-Foster JM, Lupo PJ, Neale G, Raetz E, Larsen E, Bowman WP, Carroll WL, Winick N, Williams R, Hansen T, Holm JC, Mardis E, Fulton R, Pui CH, Zhang J, Mullighan CG, Evans WE, Hunger SP, Gupta R, Schmiegelow K, Loh ML, Relling MV, Yang JJ. Inherited coding variants at the CDKN2A locus influence susceptibility to acute lymphoblastic leukaemia in children. Nat Commun 2015; 6:7553. [PMID: 26104880 PMCID: PMC4544058 DOI: 10.1038/ncomms8553] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.1] [Received: 01/06/2015] [Accepted: 05/20/2015] [Indexed: 02/05/2023]
Abstract
There is increasing evidence from genome-wide association studies for a strong inherited genetic basis of susceptibility to acute lymphoblastic leukaemia (ALL) in children, yet the effects of protein-coding variants on ALL risk have not been systematically evaluated. Here we show a missense variant in CDKN2A associated with the development of ALL at genome-wide significance (rs3731249, P = 9.4 × 10⁻²³, odds ratio = 2.23). Functional studies indicate that this hypomorphic variant results in reduced tumour suppressor function of p16INK4A, increases the susceptibility to leukaemic transformation of haematopoietic progenitor cells, and is preferentially retained in ALL tumour cells. Resequencing the CDKN2A-CDKN2B locus in 2,407 childhood ALL cases reveals 19 additional putative functional germline variants. These results provide direct functional evidence for the influence of inherited genetic variation on ALL risk, highlighting the important and complex roles of CDKN2A-CDKN2B tumour suppressors in leukaemogenesis.
Affiliation(s)
- Heng Xu
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Department of Laboratory Medicine, National Key Laboratory of Biotherapy/Collaborative Innovation Center of Biotherapy, and Cancer Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, China
- Hui Zhang
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Department of Pediatrics, The first affiliated hospital of Guangzhou Medical University, Guangzhou, Guangdong 510120, China
- Wenjian Yang
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Rachita Yadav
- Centre for Biological Sequence Analysis, The Technical University of Denmark, Kgs, Lyngby DK-2800, Denmark
- Alanna C. Morrison
- Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, University of Texas Health Science Center, Houston, Texas 77030, USA
- Maoxiang Qian
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Meenakshi Devidas
- Department of Biostatistics, Epidemiology and Health Policy Research, College of Medicine, University of Florida, Gainesville, Florida 32610, USA
- Yu Liu
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Virginia Perez-Andreu
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Xujie Zhao
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Julie M. Gastier-Foster
- Department of Pathology and Laboratory Medicine, Nationwide Children's Hospital, and Departments of Pathology and Pediatrics, Ohio State University College of Medicine, Columbus, Ohio 43205, USA
- Philip J. Lupo
- Department of Pediatrics, Texas Children's Cancer Center, Baylor College of Medicine, Houston, Texas 77030, USA
- Geoff Neale
- Hartwell Center for Bioinformatics & Biotechnology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Elizabeth Raetz
- Huntsman Cancer Institute, The University of Utah, Salt Lake City, Utah 84112, USA
- Eric Larsen
- Maine Children's Cancer Program, Scarborough, Maine 04074, USA
- W. Paul Bowman
- Cook Children's Medical Center, Ft. Worth, Texas 38754, USA
- William L. Carroll
- Pediatric Oncology, Cancer Institute New York University, New York City, New York 10016, USA
- Naomi Winick
- Pediatric Hematology/Oncology, University of Texas Southwestern Medical Center, Dallas, Texas 75235, USA
- Torben Hansen
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen DK-2200, Denmark
- Jens-Christian Holm
- Department of Pediatrics, The Children's Obesity Clinic, Copenhagen University Hospital Holbaek, Holbaek DK-4300, Denmark
- Elaine Mardis
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Robert Fulton
- McDonnell Genome Institute, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Ching-Hon Pui
- Hematological Malignancies Program, Comprehensive Cancer Center, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Department of Oncology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Jinghui Zhang
- Department of Computational Biology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Charles G. Mullighan
- Hematological Malignancies Program, Comprehensive Cancer Center, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Department of Pathology, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- William E. Evans
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Hematological Malignancies Program, Comprehensive Cancer Center, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Stephen P. Hunger
- Division of Oncology and Center for Childhood Cancer Research, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania 19104, USA
- Ramneek Gupta
- Centre for Biological Sequence Analysis, The Technical University of Denmark, Kgs, Lyngby DK-2800, Denmark
- Kjeld Schmiegelow
- Department of Paediatrics and Adolescent Medicine, The Juliane Marie Centre, The University Hospital Rigshospitalet, and the Institute of Clinical Medicine, Faculty of Health, University of Copenhagen, Copenhagen DK-2100, Denmark
- Mignon L. Loh
- Department of Pediatrics, Benioff Children's Hospital and the Helen Diller Family Comprehensive Cancer Center, University of California at San Francisco, San Francisco, California 94115, USA
- Mary V. Relling
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Hematological Malignancies Program, Comprehensive Cancer Center, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Jun J. Yang
- Department of Pharmaceutical Sciences, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
- Hematological Malignancies Program, Comprehensive Cancer Center, St. Jude Children's Research Hospital, Memphis, Tennessee 38105, USA
204
Genetic variants in the ADAMTS13 and SUPT3H genes are associated with ADAMTS13 activity. Blood 2015; 125:3949-55. [DOI: 10.1182/blood-2015-02-629865] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 02/25/2015] [Accepted: 04/27/2015] [Indexed: 12/21/2022]
Abstract
Key Points
We identify rs41314453 as the strongest genetic predictor of ADAMTS13 activity, associated with a decrease of >20%. We present evidence of further independent associations with a common variant in SUPT3H, as well as 5 variants at the ADAMTS13 locus.
205
Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation. Am J Hum Genet 2015; 96:926-37. [PMID: 26027497 DOI: 10.1016/j.ajhg.2015.04.018] [Citation(s) in RCA: 102] [Impact Index Per Article: 10.2] [Received: 11/15/2014] [Accepted: 04/29/2015] [Indexed: 11/20/2022]
Abstract
Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05×. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation method will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
206
Leveraging Identity-by-Descent for Accurate Genotype Inference in Family Sequencing Data. PLoS Genet 2015; 11:e1005271. [PMID: 26043085 PMCID: PMC4456389 DOI: 10.1371/journal.pgen.1005271] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/28/2014] [Accepted: 05/12/2015] [Indexed: 12/23/2022]
Abstract
Sequencing family DNA samples provides an attractive alternative to population-based designs to identify rare variants associated with human disease due to the enrichment of causal variants in pedigrees. Previous studies showed that genotype calling accuracy can be improved by modeling family relatedness compared to standard calling algorithms. Current family-based variant calling methods use sequencing data on single variants and ignore the identity-by-descent (IBD) sharing along the genome. In this study we describe a new computational framework to accurately estimate the IBD sharing from the sequencing data, and to utilize the inferred IBD among family members to jointly call genotypes in pedigrees. Through simulations and application to real data, we showed that IBD can be reliably estimated across the genome, even at very low coverage (e.g. 2X), and genotype accuracy can be dramatically improved. Moreover, the improvement is more pronounced for variants with low frequencies, especially at low to intermediate coverage (e.g. 10X to 20X), making our approach effective in studying rare variants in cost-effective whole genome sequencing in pedigrees. We hope that our tool is useful to the research community for identifying rare variants for human disease through family-based sequencing.
To identify disease variants that occur less frequently in the population, sequencing families in which multiple individuals are affected is more powerful due to the enrichment of causal variants. An important step in such studies is to infer individual genotypes from sequencing data. Existing methods do not utilize full familial transmission information and therefore result in reduced accuracy of inferred genotypes. In this study we describe a new method that infers shared genetic material among family members and then incorporates the shared genomic information in a novel algorithm that can accurately infer genotypes. Our method is particularly advantageous when inferring low frequency variants with fewer sequence data, making it effective in analyzing genome-wide sequence data. We implemented the algorithm in a computationally efficient tool to facilitate cost-effective sequencing in families for identifying disease genetic variants.
207
Lin KH, Zöllner S. Robust and Powerful Affected Sibpair Test for Rare Variant Association. Genet Epidemiol 2015; 39:325-33. [PMID: 25966809 DOI: 10.1002/gepi.21903] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/05/2014] [Revised: 03/25/2015] [Accepted: 04/01/2015] [Indexed: 11/09/2022]
Abstract
Advances in DNA sequencing technology facilitate investigating the impact of rare variants on complex diseases. However, using a conventional case-control design, large samples are needed to capture enough rare variants to achieve sufficient power for testing the association between suspected loci and complex diseases. In such large samples, population stratification may easily cause spurious signals. One approach to overcome stratification is to use a family-based design. For rare variants, this strategy is especially appropriate, as power can be increased considerably by analyzing cases with affected relatives. We propose a novel framework for association testing in affected sibpairs by comparing the allele count of rare variants on chromosome regions shared identical by descent to the allele count of rare variants on nonshared chromosome regions, referred to as test for rare variant association with family-based internal control (TRAFIC). This design is generally robust to population stratification as cases and controls are matched within each sibpair. We evaluate the power analytically using a general model for the effect size of rare variants. For the same number of genotyped people, TRAFIC shows superior power over the conventional case-control study for variants with summed risk allele frequency f < 0.05; this power advantage is even more substantial when considering allelic heterogeneity. For complex models of gene-gene interaction, this power advantage depends on the direction of interaction and overall heritability. In sum, we introduce a new method for analyzing rare variants in affected sibpairs that is robust to population stratification, and provide freely available software.
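The shared-versus-nonshared comparison described in this abstract can be sketched as a simple two-proportion test. The code below is a minimal illustration under stated assumptions, not the published TRAFIC statistic: it assumes IBD states are known without error, counts distinct chromosomes per sibpair as (shared, non-shared) = (0, 4), (1, 2) or (2, 0) for IBD 0, 1 or 2, and treats each chromosome region as either carrying or not carrying a rare allele. All names are hypothetical.

```python
from math import erf, sqrt

# Distinct chromosomes contributed by one sibpair at a locus, by IBD state:
# (shared chromosomes, non-shared chromosomes). Assumes IBD is known exactly.
CHROMS_BY_IBD = {0: (0, 4), 1: (1, 2), 2: (2, 0)}


def chromosome_totals(ibd_states):
    """Total shared and non-shared chromosomes over a list of sibpair IBD states."""
    shared = sum(CHROMS_BY_IBD[s][0] for s in ibd_states)
    nonshared = sum(CHROMS_BY_IBD[s][1] for s in ibd_states)
    return shared, nonshared


def shared_vs_nonshared_z(carriers_shared, n_shared, carriers_nonshared, n_nonshared):
    """Two-proportion z-test: are rare alleles enriched on IBD-shared chromosomes?

    Returns (z, one-sided p-value) for the alternative that the carrier
    proportion is higher on shared than on non-shared chromosomes.
    """
    p_shared = carriers_shared / n_shared
    p_nonshared = carriers_nonshared / n_nonshared
    pooled = (carriers_shared + carriers_nonshared) / (n_shared + n_nonshared)
    se = sqrt(pooled * (1 - pooled) * (1 / n_shared + 1 / n_nonshared))
    if se == 0:
        # No variation at all (no carriers, or only carriers): nothing to test.
        return 0.0, 0.5
    z = (p_shared - p_nonshared) / se
    p_value = 0.5 * (1 - erf(z / sqrt(2)))  # upper tail of the standard normal
    return z, p_value
```

Because shared and non-shared chromosomes come from the same sibpairs, both groups share the same ancestry, which is the intuition behind the robustness to population stratification stated in the abstract.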
Affiliation(s)
- Keng-Han Lin
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Sebastian Zöllner
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
- Center for Statistical Genetics, University of Michigan, Ann Arbor, Michigan, United States of America
- Department of Psychiatry, University of Michigan, Ann Arbor, Michigan, United States of America
208
Affiliation(s)
- Carlos Cruchaga
- Department of Psychiatry, Washington University, St Louis, Missouri 63110, USA
- Hope Center Program on Protein Aggregation and Neurodegeneration, Washington University St Louis, Missouri 63110, USA
- Alison M Goate
- Department of Psychiatry, Washington University, St Louis, Missouri 63110, USA
- Hope Center Program on Protein Aggregation and Neurodegeneration, Washington University St Louis, Missouri 63110, USA
209
Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders. Eur J Hum Genet 2015; 24:113-9. [PMID: 25898925 DOI: 10.1038/ejhg.2015.68] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/31/2014] [Revised: 03/01/2015] [Accepted: 03/10/2015] [Indexed: 01/18/2023]
Abstract
Recent breakthroughs in exome-sequencing technology have made possible the identification of many causal variants of monogenic disorders. Although extremely powerful when closely related individuals (eg, child and parents) are simultaneously sequenced, sequencing of a single case is often unsuccessful due to the large number of variants that need to be followed up for functional validation. Many approaches filter out common variants above a given frequency threshold (eg, 1%), and then prioritize the remaining variants according to their functional, structural and conservation properties. Here we present methods that leverage the genetic structure across different populations to improve filtering performance while accounting for the finite sample size of the reference panels. We show that leveraging genetic structure reduces the number of variants that need to be followed up by 16% in simulations and by up to 38% in empirical data of 20 exomes from individuals with monogenic disorders for which the causal variants are known.
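The frequency-threshold filtering mentioned above, extended to several reference populations with an allowance for finite panel size, might look like the sketch below. It is an illustrative assumption rather than the method of the paper: a variant is discarded only when the Wilson lower confidence bound of its allele frequency exceeds the threshold in at least one population, so that a small panel with a noisy estimate does not remove a candidate on its own. The function names, the 1.96 normal quantile, and the example counts are all hypothetical.

```python
from math import sqrt


def wilson_lower_bound(alt_count, n_chrom, z=1.96):
    """Lower end of the Wilson score interval for an allele frequency."""
    if n_chrom == 0:
        return 0.0
    p = alt_count / n_chrom
    denom = 1 + z * z / n_chrom
    centre = p + z * z / (2 * n_chrom)
    margin = z * sqrt(p * (1 - p) / n_chrom + z * z / (4 * n_chrom * n_chrom))
    return max(0.0, (centre - margin) / denom)


def is_common(panel_counts, threshold=0.01):
    """True if the variant is confidently above the threshold in any population.

    panel_counts maps a population label to
    (alternate allele count, number of chromosomes sampled).
    """
    return any(wilson_lower_bound(alt, n) > threshold
               for alt, n in panel_counts.values())


def filter_candidates(variants, threshold=0.01):
    """Keep the variants that are not confidently common in any population."""
    return [name for name, counts in variants.items()
            if not is_common(counts, threshold)]


# Hypothetical reference-panel counts for three variants.
example = {
    "v1": {"EUR": (0, 1000), "AFR": (150, 1000)},  # common in one population
    "v2": {"EUR": (1, 1000), "AFR": (0, 1000)},    # rare everywhere
    "v3": {"EUR": (1, 40)},                        # 2.5% observed, tiny panel
}
kept = filter_candidates(example)
```

Checking every population separately is what allows a variant that is rare in one ancestry group but common in another to be filtered, which a single pooled frequency could miss.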
210
A statistical approach for rare-variant association testing in affected sibships. Am J Hum Genet 2015; 96:543-54. [PMID: 25799106 DOI: 10.1016/j.ajhg.2015.01.020] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 10/08/2014] [Accepted: 01/30/2015] [Indexed: 11/21/2022]
Abstract
Sequencing and exome-chip technologies have motivated development of novel statistical tests to identify rare genetic variation that influences complex diseases. Although many rare-variant association tests exist for case-control or cross-sectional studies, far fewer methods exist for testing association in families. This is unfortunate, because cosegregation of rare variation and disease status in families can amplify association signals for rare variants. Many researchers have begun sequencing (or genotyping via exome chips) familial samples that were either recently collected or previously collected for linkage studies. Because many linkage studies of complex diseases sampled affected sibships, we propose a strategy for association testing of rare variants for use in this study design. The logic behind our approach is that rare susceptibility variants should be found more often on regions shared identical by descent by affected sibling pairs than on regions not shared identical by descent. We propose both burden and variance-component tests of rare variation that are applicable to affected sibships of arbitrary size and that do not require genotype information from unaffected siblings or independent controls. Our approaches are robust to population stratification and produce analytic p values, thereby enabling our approach to scale easily to genome-wide studies of rare variation. We illustrate our methods by using simulated data and exome chip data from sibships ascertained for hypertension collected as part of the Genetic Epidemiology Network of Arteriopathy (GENOA) study.
211
Farlow JL, Lin H, Sauerbeck L, Lai D, Koller DL, Pugh E, Hetrick K, Ling H, Kleinloog R, van der Vlies P, Deelen P, Swertz MA, Verweij BH, Regli L, Rinkel GJE, Ruigrok YM, Doheny K, Liu Y, Broderick J, Foroud T. Lessons learned from whole exome sequencing in multiplex families affected by a complex genetic disorder, intracranial aneurysm. PLoS One 2015; 10:e0121104. [PMID: 25803036 PMCID: PMC4372548 DOI: 10.1371/journal.pone.0121104] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 08/06/2014] [Accepted: 02/10/2015] [Indexed: 12/30/2022]
Abstract
Genetic risk factors for intracranial aneurysm (IA) are not yet fully understood. Genomewide association studies have been successful at identifying common variants; however, the role of rare variation in IA susceptibility has not been fully explored. In this study, we report the use of whole exome sequencing (WES) in seven densely-affected families (45 individuals) recruited as part of the Familial Intracranial Aneurysm study. WES variants were prioritized by functional prediction, frequency, predicted pathogenicity, and segregation within families. Using these criteria, 68 variants in 68 genes were prioritized across the seven families. Of the genes that were expressed in IA tissue, one gene (TMEM132B) was differentially expressed in aneurysmal samples (n=44) as compared to control samples (n=16) (false discovery rate adjusted p-value=0.023). We demonstrate that sequencing of densely affected families permits exploration of the role of rare variants in a relatively common disease such as IA, although there are important study design considerations for applying sequencing to complex disorders. In this study, we explore methods of WES variant prioritization, including the incorporation of unaffected individuals, multipoint linkage analysis, biological pathway information, and transcriptome profiling. Further studies are needed to validate and characterize the set of variants and genes identified in this study.
Affiliation(s)
- Janice L. Farlow
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Hai Lin
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Laura Sauerbeck
- Department of Neurology and Rehabilitation Medicine, University of Cincinnati School of Medicine, Cincinnati, Ohio, United States of America
| | - Dongbing Lai
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Daniel L. Koller
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Elizabeth Pugh
- Center for Inherited Disease Research, Johns Hopkins University; Baltimore, Maryland, United States of America
| | - Kurt Hetrick
- Center for Inherited Disease Research, Johns Hopkins University; Baltimore, Maryland, United States of America
| | - Hua Ling
- Center for Inherited Disease Research, Johns Hopkins University; Baltimore, Maryland, United States of America
| | - Rachel Kleinloog
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Pieter van der Vlies
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Patrick Deelen
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Morris A. Swertz
- Department of Genetics, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
- Genomics Coordination Center, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands
| | - Bon H. Verweij
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Luca Regli
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
- Department of Neurosurgery, University Hospital Zurich, Zurich, Switzerland
| | - Gabriel J. E. Rinkel
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Ynte M. Ruigrok
- Department of Neurology and Neurosurgery, Brain Center Rudolf Magnus, University Medical Center Utrecht, Utrecht, the Netherlands
| | - Kimberly Doheny
- Center for Inherited Disease Research, Johns Hopkins University; Baltimore, Maryland, United States of America
| | - Yunlong Liu
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
| | - Joseph Broderick
- Department of Neurology and Rehabilitation Medicine, University of Cincinnati School of Medicine, Cincinnati, Ohio, United States of America
| | - Tatiana Foroud
- Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America
|
212
|
Wang X, Zhang S, Li Y, Li M, Sha Q. A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 2015; 39:294-305. [PMID: 25758547 DOI: 10.1002/gepi.21894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/04/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we proposed a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA) in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up-weights rare variants and those variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association, and allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing dataset. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.
Affiliation(s)
- Xuexia Wang
- Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
|
213
|
Villanueva P, Nudel R, Hoischen A, Fernández MA, Simpson NH, Gilissen C, Reader RH, Jara L, Echeverry MM, Francks C, Baird G, Conti-Ramsden G, O’Hare A, Bolton PF, Hennessy ER, Palomino H, Carvajal-Carmona L, Veltman JA, Cazier JB, De Barbieri Z, Fisher SE, Newbury DF. Exome sequencing in an admixed isolated population indicates NFXL1 variants confer a risk for specific language impairment. PLoS Genet 2015; 11:e1004925. [PMID: 25781923 PMCID: PMC4363375 DOI: 10.1371/journal.pgen.1004925] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 06/03/2014] [Accepted: 11/25/2014] [Indexed: 11/06/2022] Open
Abstract
Children affected by Specific Language Impairment (SLI) fail to acquire age appropriate language skills despite adequate intelligence and opportunity. SLI is highly heritable, but the understanding of underlying genetic mechanisms has proved challenging. In this study, we use molecular genetic techniques to investigate an admixed isolated founder population from the Robinson Crusoe Island (Chile), who are affected by a high incidence of SLI, increasing the power to discover contributory genetic factors. We utilize exome sequencing in selected individuals from this population to identify eight coding variants that are of putative significance. We then apply association analyses across the wider population to highlight a single rare coding variant (rs144169475, Minor Allele Frequency of 4.1% in admixed South American populations) in the NFXL1 gene that confers a nonsynonymous change (N150K) and is significantly associated with language impairment in the Robinson Crusoe population (p = 2.04 × 10⁻⁴, 8 variants tested). Subsequent sequencing of NFXL1 in 117 UK SLI cases identified four individuals with heterozygous variants predicted to be of functional consequence. We conclude that coding variants within NFXL1 confer an increased risk of SLI within a complex genetic model.
Affiliation(s)
- Pía Villanueva
- Human Genetics Program, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, University of Chile, Santiago, Chile
- School of Speech and Hearing Therapy, Faculty of Medicine, University of Chile, Santiago, Chile
- Department of Child and Dental Maxillary Orthopedics, Faculty of Dentistry, University of Chile, Santiago, Chile
- Doctoral Program of Psychology, Graduate School, University of Granada, Granada, Spain
| | - Ron Nudel
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Alexander Hoischen
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences and Donders Centre for Neuroscience, Radboud University Medical Center, Nijmegen, the Netherlands
| | | | - Nuala H. Simpson
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Christian Gilissen
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences and Donders Centre for Neuroscience, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Rose H. Reader
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Lillian Jara
- Human Genetics Program, Institute of Biomedical Sciences (ICBM), Faculty of Medicine, University of Chile, Santiago, Chile
| | - Maria Magdalena Echeverry
- Grupo de Citogenetica, Filogenia y Evolucion de las Poblaciones, Facultades de Ciencias y de Ciencias de la Salud, Universidad del Tolima, Ibague, Colombia
| | - Clyde Francks
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Gillian Baird
- Newcomen Centre, the Evelina Children’s Hospital, London, United Kingdom
| | - Gina Conti-Ramsden
- School of Psychological Sciences, University of Manchester, Manchester, United Kingdom
| | - Anne O’Hare
- Department of Reproductive and Developmental Sciences, University of Edinburgh, Edinburgh, United Kingdom
| | - Patrick F. Bolton
- Departments of Child & Adolescent Psychiatry & Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, King’s College London, London, United Kingdom
| | | | | | - Hernán Palomino
- Department of Child and Dental Maxillary Orthopedics, Faculty of Dentistry, University of Chile, Santiago, Chile
| | - Luis Carvajal-Carmona
- Grupo de Citogenetica, Filogenia y Evolucion de las Poblaciones, Facultades de Ciencias y de Ciencias de la Salud, Universidad del Tolima, Ibague, Colombia
- UC Davis Genome Center, Department of Biochemistry and Molecular Medicine, School of Medicine, University of California Davis, Davis, California, United States of America
| | - Joris A. Veltman
- Department of Human Genetics, Radboud Institute for Molecular Life Sciences and Donders Centre for Neuroscience, Radboud University Medical Center, Nijmegen, the Netherlands
| | - Jean-Baptiste Cazier
- Department of Oncology, University of Oxford, Oxford, United Kingdom
- Centre for Computational Biology, University of Birmingham, Edgbaston, United Kingdom
| | - Zulema De Barbieri
- School of Speech and Hearing Therapy, Faculty of Medicine, University of Chile, Santiago, Chile
| | - Simon E. Fisher
- Max Planck Institute for Psycholinguistics, Nijmegen, the Netherlands
- Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands
| | - Dianne F. Newbury
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
- St Johns College, University of Oxford, Oxford, United Kingdom
|
214
|
Abstract
Genome-wide association studies (GWASs) have successfully uncovered thousands of robust associations between common variants and complex traits and diseases. Despite these successes, much of the heritability of these traits remains unexplained. Because low-frequency and rare variants are not tagged by conventional genome-wide genotyping arrays, they may represent an important and understudied component of complex trait genetics. In contrast to common variant GWASs, there are many different types of study designs, assays and analytic techniques that can be utilized for rare variant association studies (RVASs). In this review, we briefly present the different technologies available to identify rare genetic variants, including novel exome arrays. We also compare the different study designs for RVASs and argue that the best design will likely be phenotype-dependent. We discuss the main analytical issues relevant to RVASs, including the different statistical methods that can be used to test genetic associations with rare variants and the various bioinformatic approaches to predicting in silico biological functions for variants. Finally, we describe recent rare variant association findings, highlighting the unexpected conclusion that most rare variants have modest-to-small effect sizes on phenotypic variation. This observation has major implications for our understanding of the genetic architecture of complex traits in the context of the unexplained heritability challenge.
Affiliation(s)
- Paul L Auer
- School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI 53201-0413 USA
| | - Guillaume Lettre
- Montreal Heart Institute and Université de Montréal, Montreal, Quebec H1T 1C8 Canada
|
215
|
Pirie A, Wood A, Lush M, Tyrer J, Pharoah PDP. The effect of rare variants on inflation of the test statistics in case-control analyses. BMC Bioinformatics 2015; 16:53. [PMID: 25888290 PMCID: PMC4339749 DOI: 10.1186/s12859-015-0496-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/05/2014] [Accepted: 02/12/2015] [Indexed: 02/02/2023] Open
Abstract
BACKGROUND: The detection of bias due to cryptic population structure is an important step in the evaluation of findings of genetic association studies. The standard method of measuring this bias in a genetic association study is to compare the observed median association test statistic to the expected median test statistic. This ratio is inflated in the presence of cryptic population structure. However, inflation may also be caused by the properties of the association test itself, particularly in the analysis of rare variants. Using simulated data, we compared the properties of the three most commonly used association tests (the likelihood ratio test, the Wald test and the score test) when testing rare variants for association. RESULTS: We found evidence of inflation in the median test statistics of the likelihood ratio and score tests for tests of variants with fewer than 20 heterozygotes across the sample, regardless of the total sample size. The test statistics for the Wald test were under-inflated at the median for variants below the same minor allele frequency. CONCLUSIONS: In a genetic association study, if a substantial proportion of the genetic variants tested have rare minor allele frequencies, the properties of the association test may mask the presence or absence of bias due to population structure. The use of either the likelihood ratio test or the score test is likely to lead to inflation in the median test statistic in the absence of population structure. In contrast, the use of the Wald test is likely to result in under-inflation of the median test statistic, which may mask the presence of population structure.
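The ratio described in this abstract (observed median test statistic over expected median) can be illustrated with a minimal sketch. It assumes each association test yields a statistic that follows a chi-square distribution with 1 degree of freedom under the null; the function name, the constant name and the example values are hypothetical and do not come from the paper.

```python
from statistics import median

# Approximate median of a chi-square distribution with 1 degree of freedom.
# Assumption: every test statistic passed in is a 1-df chi-square statistic.
EXPECTED_MEDIAN_CHI2_1DF = 0.4549364


def inflation_factor(chi2_stats):
    """Return the observed median statistic divided by the expected median."""
    if not chi2_stats:
        raise ValueError("need at least one test statistic")
    return median(chi2_stats) / EXPECTED_MEDIAN_CHI2_1DF


# Hypothetical statistics, chosen so the median equals the expected median.
example_stats = [0.10, 0.30, 0.4549364, 0.90, 2.50]
example_lambda = inflation_factor(example_stats)
```

A ratio near 1 suggests no inflation at the median. As the abstract notes, a departure from 1 can reflect the behaviour of the test on rare variants as well as population structure, so the value alone does not identify the cause.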
Affiliation(s)
- Ailith Pirie
- Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
| | - Angela Wood
- Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
| | - Michael Lush
- Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
| | - Jonathan Tyrer
- Department of Oncology, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
| | - Paul D P Pharoah
- Department of Public Health and Primary Care, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
- Department of Oncology, Strangeways Research Laboratory, University of Cambridge, 2 Worts' Causeway, Cambridge, CB1 8RN, UK.
|
216
|
Chen R, Wei Q, Zhan X, Zhong X, Sutcliffe JS, Cox NJ, Cook EH, Li C, Chen W, Li B. A haplotype-based framework for group-wise transmission/disequilibrium tests for rare variant association analysis. Bioinformatics 2015; 31:1452-9. [PMID: 25568282 DOI: 10.1093/bioinformatics/btu860] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 09/12/2014] [Accepted: 12/23/2014] [Indexed: 12/30/2022]
Abstract
MOTIVATION A major focus of current sequencing studies for human genetics is to identify rare variants associated with complex diseases. Aside from reduced power of detecting associated rare variants, controlling for population stratification is particularly challenging for rare variants. Transmission/disequilibrium tests (TDT) based on family designs are robust to population stratification and admixture, and therefore provide an effective approach to rare variant association studies to eliminate spurious associations. To increase power of rare variant association analysis, gene-based collapsing methods have become standard approaches for analyzing rare variants. Existing methods that extend this strategy to rare variants in families usually combine TDT statistics at individual variants and therefore lack the flexibility of incorporating other genetic models. RESULTS In this study, we describe a haplotype-based framework for group-wise TDT (gTDT) that is flexible enough to encompass a variety of genetic models such as additive, dominant and compound heterozygous (CH) (i.e. recessive) models as well as other complex interactions. Unlike existing methods, gTDT constructs haplotypes by transmission when possible and inherently takes into account the linkage disequilibrium among variants. Through extensive simulations we showed that type I error was correctly controlled for rare variants under all models investigated, and this remained true in the presence of population stratification. Under a variety of genetic models, gTDT showed increased power compared with the single marker TDT. Application of gTDT to an autism exome sequencing dataset of 118 trios identified potentially interesting candidate genes with CH rare variants. AVAILABILITY AND IMPLEMENTATION We implemented gTDT in C++ and the source code and the detailed usage are available on the authors' website (https://medschool.vanderbilt.edu/cgg).
CONTACT bingshan.li@vanderbilt.edu or wei.chen@chp.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rui Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Qiang Wei
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Xiaowei Zhan
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Xue Zhong
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - James S Sutcliffe
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Nancy J Cox
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Edwin H Cook
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Chun Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Wei Chen
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Bingshan Li
- Department of Molecular Physiology and Biophysics, Vanderbilt University, TN, 37221, USA, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA, Center for Quantitative Sciences, Vanderbilt University, TN, 37221, USA, Department of Medicine, University of Chicago, Chicago, IL, USA, Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA, Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA and Department of Pediatrics, University of Pittsburgh, Pittsburgh, PA, USA
|
217
|
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022] Open
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
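The core idea stated in this abstract, combining per-site P-values while discarding large ones, can be sketched as a truncated Fisher-style statistic. This is a deliberately simplified illustration, not the ADA procedure itself: it assumes a single fixed truncation threshold and omits the permutation step that the abstract says is needed to assess significance. The function name and the default threshold are hypothetical.

```python
import math


def truncated_fisher_statistic(p_values, threshold=0.10):
    """Sum -log(p) over the per-site P-values at or below the threshold.

    Sites with larger P-values (more likely neutral) are left out.
    Assumption: one fixed threshold; significance of the resulting
    statistic would still have to be assessed separately.
    """
    kept = [p for p in p_values if 0.0 < p <= threshold]
    return -sum(math.log(p) for p in kept)


# Hypothetical per-site P-values for one gene; the 0.50 site is discarded.
example_statistic = truncated_fisher_statistic([0.01, 0.50, 0.10])
```

Larger values of the statistic indicate stronger combined evidence among the retained sites. Because the retained set depends on the data, the statistic has no simple reference distribution, which is consistent with the abstract's note that a permutation procedure is required.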
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
|
218
|
Zhang Y, Pan W. Principal component regression and linear mixed model in association analysis of structured samples: competitors or complements? Genet Epidemiol 2014; 39:149-55. [PMID: 25536929 DOI: 10.1002/gepi.21879] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 09/16/2014] [Revised: 11/11/2014] [Accepted: 11/11/2014] [Indexed: 11/10/2022]
Abstract
Genome-wide association studies (GWAS) have been established as a major tool to identify genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structures, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among many methods for adjusting for population structures, two approaches stand out: one is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular due to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM) that has emerged recently as perhaps the most flexible and effective, especially for samples with complex structures as in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; such an approximation depends on the number of the top principal components (PCs) used, the choice of which is often difficult in practice. Hence, in the presence of population structure, the LMM appears to outperform the PCR method. However, due to the different treatments of fixed vs. random effects in the two approaches, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may be able to implicitly and effectively adjust for the confounder whereas the LMM cannot. Accordingly, to adjust for both population structures and nongenetic confounders, we propose a hybrid method combining the use and, thus, strengths of PCR and LMM. We use real genotype data and simulated phenotypes to confirm the above points, and establish the superior performance of the hybrid method across all scenarios.
Affiliation(s)
- Yiwei Zhang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America
|
219
|
O'Connor TD, Fu W, Mychaleckyj JC, Logsdon B, Auer P, Carlson CS, Leal SM, Smith JD, Rieder MJ, Bamshad MJ, Nickerson DA, Akey JM. Rare variation facilitates inferences of fine-scale population structure in humans. Mol Biol Evol 2014; 32:653-60. [PMID: 25415970 PMCID: PMC4327153 DOI: 10.1093/molbev/msu326] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 12/24/2022] Open
Abstract
Understanding the genetic structure of human populations has important implications for the design and interpretation of disease mapping studies and reconstructing human evolutionary history. To date, inferences of human population structure have primarily been made with common variants. However, recent large-scale resequencing studies have shown an abundance of rare variation in humans, which may be particularly useful for making inferences of fine-scale population structure. To this end, we used an information theory framework and extensive coalescent simulations to rigorously quantify the informativeness of rare and common variation to detect signatures of fine-scale population structure. We show that rare variation affords unique insights into patterns of recent population structure. Furthermore, to empirically assess our theoretical findings, we analyzed high-coverage exome sequences in 6,515 European and African American individuals. As predicted, rare variants are more informative than common polymorphisms in revealing a distinct cluster of European–American individuals, and subsequent analyses demonstrate that these individuals are likely of Ashkenazi Jewish ancestry. Our results provide new insights into the population structure using rare variation, which will be an important factor to account for in rare variant association studies.
Affiliation(s)
- Timothy D O'Connor
- Institute for Genome Sciences, University of Maryland School of Medicine Program in Personalized and Genomic Medicine, University of Maryland School of Medicine
| | - Wenqing Fu
- Department of Genome Sciences, University of Washington, Seattle
| | | | | | - Josyf C Mychaleckyj
- Department of Public Health Sciences, University of Virginia School of Medicine
| | - Benjamin Logsdon
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Paul Auer
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA Biostatistics: University of Wisconsin-Milwaukee, School of Public Health
| | - Christopher S Carlson
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA
| | - Suzanne M Leal
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX
| | - Joshua D Smith
- Department of Genome Sciences, University of Washington, Seattle
| | - Mark J Rieder
- Department of Genome Sciences, University of Washington, Seattle
| | - Michael J Bamshad
- Department of Genome Sciences, University of Washington, Seattle Department of Pediatrics, University of Washington, Seattle
| | | | - Joshua M Akey
- Department of Genome Sciences, University of Washington, Seattle
|
220
|
Yu J, Wu H, Wen Y, Liu Y, Zhou T, Ni B, Lin Y, Dong J, Zhou Z, Hu Z, Guo X, Sha J, Tong C. Identification of seven genes essential for male fertility through a genome-wide association study of non-obstructive azoospermia and RNA interference-mediated large-scale functional screening in Drosophila. Hum Mol Genet 2014; 24:1493-503. [DOI: 10.1093/hmg/ddu557] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 12/24/2022] Open
|
221
|
Satten GA, Biswas S, Papachristou C, Turkmen A, König IR. Population-based association and gene by environment interactions in Genetic Analysis Workshop 18. Genet Epidemiol 2014; 38 Suppl 1:S49-56. [PMID: 25112188 DOI: 10.1002/gepi.21825] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/09/2023]
Abstract
In the past decade, genome-wide association studies have been successful in identifying genetic loci that play a role in many complex diseases. Despite this, it has become clear that for many traits, investigation of single common variants does not give a complete picture of the genetic contribution to the phenotype. Therefore a number of new approaches are currently being investigated to further the search for susceptibility loci or regions. We summarize the contributions to Genetic Analysis Workshop 18 (GAW18) that concern this search using methods for population-based association analysis. Many of the members of our GAW18 working group made use of data types that have only recently become available through the use of next-generation sequencing technologies, with many focusing on the investigation of rare variants instead of or in combination with common variants. Some contributors used a haplotype-based approach, which to date has been used relatively infrequently but may become more important for analyzing rare variant association data. Others analyzed gene-gene or gene-environment interactions, where novel statistical approaches were needed to make the best use of the available information without requiring an excessive computational burden. GAW18 provided participants with the chance to make use of state-of-the-art data, statistical techniques, and technology. We report here some of the experiences and conclusions that were reached by workshop participants who analyzed the GAW18 data as a population-based association study.
Affiliation(s)
- Glen A Satten
- Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
|
222
|
Zhang W, Meehan J, Su Z, Ng HW, Shu M, Luo H, Ge W, Perkins R, Tong W, Hong H. Whole genome sequencing of 35 individuals provides insights into the genetic architecture of Korean population. BMC Bioinformatics 2014; 15 Suppl 11:S6. [PMID: 25350283 PMCID: PMC4251052 DOI: 10.1186/1471-2105-15-s11-s6] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/27/2023] Open
Abstract
Background: Due to a significant decline in the costs associated with next-generation sequencing, it has become possible to decipher the genetic architecture of a population by sequencing a large number of individuals to a deep coverage. The Korean Personal Genomes Project (KPGP) recently sequenced 35 Korean genomes at high coverage using the Illumina HiSeq platform and made the deep sequencing data publicly available, providing the scientific community opportunities to decipher the genetic architecture of the Korean population. Methods: In this study, we used two single nucleotide variant (SNV) calling pipelines: mapping the raw reads obtained from whole genome sequencing of 35 Korean individuals in KPGP using BWA and SOAP2, followed by SNV calling using SAMtools and SOAPsnp, respectively. The consensus SNVs obtained from the two SNV pipelines were used to represent the SNVs of the Korean population. We compared these SNVs to those from 17 other populations provided by the HapMap consortium and the 1000 Genomes Project (1KGP) and identified SNVs that were only present in the Korean population. We studied the mutation spectrum and analyzed the genes of non-synonymous SNVs only detected in the Korean population. Results: We detected a total of 8,555,726 SNVs in the 35 Korean individuals and identified 1,213,613 SNVs detected in at least one Korean individual (SNV-1) and 12,640 in all 35 Korean individuals (SNV-35) but not in 17 other populations. In contrast with the SNVs common to other populations in HapMap and 1KGP, the Korean-only SNVs had high percentages of non-silent variants, emphasizing the unique roles of these Korean-only SNVs in the Korean population. Specifically, we identified 8,361 non-synonymous Korean-only SNVs, of which 58 SNVs existed in all 35 Korean individuals. The 5,754 genes of non-synonymous Korean-only SNVs were highly enriched in some metabolic pathways. We found that adhesion is the top disease term associated with SNV-1 and Nelson syndrome is the only disease term associated with SNV-35. We found that a significant number of Korean-only SNVs are in genes that are associated with the drug term of adenosine. Conclusion: We identified the SNVs that were found in the Korean population but not seen in other populations, and explored the corresponding genes and pathways as well as the associated disease terms and drug terms. The results expand our knowledge of the genetic architecture of the Korean population, which will benefit the implementation of personalized medicine for the Korean population.
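The consensus-and-filter step described in the Methods of this abstract can be pictured as plain set operations. The sketch below is illustrative only: it assumes each call set has already been parsed into (chromosome, position, reference, alternate) records, and the `SNV` type and both function names are hypothetical rather than part of the KPGP pipelines or of any of the tools named above.

```python
from typing import Iterable, NamedTuple, Set


class SNV(NamedTuple):
    """Hypothetical minimal record for one single nucleotide variant."""
    chrom: str
    pos: int
    ref: str
    alt: str


def consensus_snvs(calls_a: Iterable[SNV], calls_b: Iterable[SNV]) -> Set[SNV]:
    """Keep only SNVs reported by both calling pipelines."""
    return set(calls_a) & set(calls_b)


def population_specific(consensus: Set[SNV], reference_panel: Iterable[SNV]) -> Set[SNV]:
    """Drop every consensus SNV that also appears in the reference populations."""
    return consensus - set(reference_panel)
```

Matching on the full (chromosome, position, reference, alternate) tuple is a design choice of this sketch; a real comparison would also need consistent genome builds and allele normalization across call sets.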
|
223
|
Tantoso E, Wong LP, Li B, Saw WY, Xu W, Little P, Ong RTH, Teo YY. Evaluating the coverage and potential of imputing the exome microarray with next-generation imputation using the 1000 Genomes Project. PLoS One 2014; 9:e106681. [PMID: 25203698 PMCID: PMC4159276 DOI: 10.1371/journal.pone.0106681] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 05/31/2014] [Accepted: 07/29/2014] [Indexed: 11/18/2022] Open
Abstract
Next-generation genotyping microarrays have been designed with insights from large-scale sequencing of exomes and whole genomes. The exome genotyping arrays promise to query the functional regions of the human genome at a fraction of the sequencing cost, thus allowing a large number of samples to be genotyped. However, two pertinent questions exist: first, how representative is the content of the exome chip for populations not involved in the design of the chip; second, can the content of the exome chip be imputed with the reference data from the 1000 Genomes Project (1KGP). By deep whole-genome sequencing of two Asian populations that are not part of the 1KGP, comprising 96 Southeast Asian Malays and 36 South Asian Indians for which the same samples have also been genotyped on both the Illumina 2.5 M and exome microarrays, we discovered that the exome chip is a poor representation of exonic content in our two populations. However, up to 94.1% of the variants on the exome chip that are polymorphic in our populations can be confidently imputed with existing non-exome-centric microarrays using the 1KGP panel. The coverage further increases if there exists population-specific reference data from whole-genome sequencing. There is thus limited gain in using the exome chip for populations not involved in the microarray design. Instead, for the same cost of genotyping 2,000 samples on the exome chip, performing whole-genome sequencing of at least 35 samples in that population to complement the 1KGP may yield a higher coverage of the exonic content from imputation.
Affiliation(s)
- Erwin Tantoso
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Lai-Ping Wong
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Bowen Li
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Woei-Yuh Saw
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Wenting Xu
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Peter Little
- Life Sciences Institute, National University of Singapore, Singapore, Singapore
| | - Rick Twee-Hee Ong
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Yik-Ying Teo
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
- Life Sciences Institute, National University of Singapore, Singapore, Singapore
- Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore
- NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, Singapore
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
|
224
|
Cao S, Qin H, Deng HW, Wang YP. A unified sparse representation for sequence variant identification for complex traits. Genet Epidemiol 2014; 38:671-9. [PMID: 25195875 DOI: 10.1002/gepi.21849] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 03/14/2014] [Revised: 06/23/2014] [Accepted: 07/16/2014] [Indexed: 12/25/2022]
Abstract
Joint adjustment of cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in many existing sparse regression models. We developed a unified sparse regression (USR) to incorporate prior information and jointly adjust for cryptic relatedness, population structure, and other environmental covariates. Our USR models cryptic relatedness as a random effect and population structure as a fixed effect, and utilizes weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, our USR algorithm can discover more true causal variants and maintain a lower false discovery rate than do several commonly used feature selection methods. It can handle both rare and common variants simultaneously. Applying our USR algorithm to DNA sequence data of Mexican Americans from GAW18, we replicated three hypertension pathways, demonstrating the effectiveness in identifying susceptibility genetic variants.
Affiliation(s)
- Shaolong Cao
- Department of Biomedical Engineering, Tulane University, New Orleans, Louisiana, United States of America; Center for Bioinformatics and Genomics, Tulane University, New Orleans, Louisiana, United States of America
|
225
|
Fine-scale human genetic structure in Western France. Eur J Hum Genet 2014; 23:831-6. [PMID: 25182131 DOI: 10.1038/ejhg.2014.175] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 10/08/2013] [Revised: 07/21/2014] [Accepted: 07/30/2014] [Indexed: 11/08/2022] Open
Abstract
The difficulties arising from association analysis with rare variants underline the importance of suitable reference population cohorts, which integrate detailed spatial information. We analyzed a sample of 1684 individuals from Western France, who were genotyped at genome-wide level, from two cohorts D.E.S.I.R and CavsGen. We found that fine-scale population structure occurs at the scale of Western France, with distinct admixture proportions for individuals originating from the Brittany Region and the Vendée Department. Genetic differentiation increases with distance at a high rate in these two parts of Northwestern France and linkage disequilibrium is higher in Brittany suggesting a lower effective population size. When looking for genomic regions informative about Breton origin, we found two prominent associated regions that include the lactase region and the HLA complex. For both the lactase and the HLA regions, there is a low differentiation between Bretons and Irish, and this is also found at the genome-wide level. At a more refined scale, and within the Pays de la Loire Region, we also found evidence of fine-scale population structure, although principal component analysis showed that individuals from different departments cannot be confidently discriminated. Because of the evidence for fine-scale genetic structure in Western France, we anticipate that rare and geographically localized variants will be identified in future full-sequence analyses.
|
226
|
Abstract
In humans, most of the genetic variation is rare and often population-specific. Whereas the role of rare genetic variants in familial monogenic diseases is firmly established, we are only now starting to explore the contribution of this class of genetic variation to human common diseases and other complex traits. Such large-scale experiments are possible due to the development of next-generation DNA sequencing. Early findings suggested that rare and low-frequency coding variation might have a large effect on human phenotypes (eg, PCSK9 missense variants on low-density lipoprotein-cholesterol and coronary heart diseases). This observation sparked excitement in prognostic and diagnostic medicine, as well as in genetics-driven strategies to develop new drugs. In this review, I describe results and present initial conclusions regarding some of the recent rare and low-frequency variant discoveries. We can already assume that most phenotype-associated rare and low-frequency variants have modest-to-weak phenotypical effect. Thus, we will need large cohorts to identify them, as for common variants in genome-wide association studies. As we expand the list of associated rare and low-frequency variants, we can also better recognise the current limitations: we need to develop better statistical methods to optimally test association with rare variants, including non-coding variation, and to account for potential confounders such as population stratification.
Affiliation(s)
- Guillaume Lettre
- Montreal Heart Institute, Montreal, Quebec, Canada; Faculty of Medicine, Department of Medicine, Université de Montréal, Montreal, Quebec, Canada
|
227
|
MacArthur DG, Manolio TA, Dimmock DP, Rehm HL, Shendure J, Abecasis GR, Adams DR, Altman RB, Antonarakis SE, Ashley EA, Barrett JC, Biesecker LG, Conrad DF, Cooper GM, Cox NJ, Daly MJ, Gerstein MB, Goldstein DB, Hirschhorn JN, Leal SM, Pennacchio LA, Stamatoyannopoulos JA, Sunyaev SR, Valle D, Voight BF, Winckler W, Gunter C. Guidelines for investigating causality of sequence variants in human disease. Nature 2014; 508:469-76. [PMID: 24759409 PMCID: PMC4180223 DOI: 10.1038/nature13127] [Citation(s) in RCA: 951] [Impact Index Per Article: 86.5] [Received: 06/24/2013] [Accepted: 02/05/2014] [Indexed: 11/26/2022]
Abstract
The discovery of rare genetic variants is accelerating, and clear guidelines for distinguishing disease-causing sequence variants from the many potentially functional variants present in any human genome are urgently needed. Without rigorous standards we risk an acceleration of false-positive reports of causality, which would impede the translation of genomic research findings into the clinical diagnostic setting and hinder biological understanding of disease. Here we discuss the key challenges of assessing sequence variants in human disease, integrating both gene-level and variant-level support for causality. We propose guidelines for summarizing confidence in variant pathogenicity and highlight several areas that require further resource development.
Affiliation(s)
- D G MacArthur
- [1] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
| | - T A Manolio
- Division of Genomic Medicine, National Human Genome Research Institute, Bethesda, Maryland 20892, USA
| | - D P Dimmock
- Division of Genetics, Department of Pediatrics, Medical College of Wisconsin, Milwaukee, Wisconsin 53226, USA
| | - H L Rehm
- [1] Laboratory for Molecular Medicine, Partners Healthcare Center for Personalized Genetic Medicine, Cambridge, Massachusetts 02139, USA [2] Department of Pathology, Harvard Medical School, Boston, Massachusetts 02115, USA
| | - J Shendure
- Department of Genome Sciences, University of Washington, Seattle, Washington 98115, USA
| | - G R Abecasis
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan 48109, USA
| | - D R Adams
- [1] NIH Undiagnosed Diseases Program, National Institutes of Health Office of Rare Diseases Research and National Human Genome Research Institute, Bethesda, Maryland 20892, USA [2] Office of the Clinical Director, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
| | - R B Altman
- Departments of Bioengineering & Genetics, Stanford University, Stanford, California 94305, USA
| | - S E Antonarakis
- [1] Department of Genetic Medicine, University of Geneva Medical School, 1211 Geneva, Switzerland [2] iGE3 Institute of Genetics and Genomics of Geneva, 1211 Geneva, Switzerland
| | - E A Ashley
- Center for Inherited Cardiovascular Disease, Stanford University School of Medicine, Stanford, California 94305, USA
| | - J C Barrett
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK
| | - L G Biesecker
- Genetic Disease Research Branch, National Human Genome Research Institute, NIH, Bethesda, Maryland 20892, USA
| | - D F Conrad
- Departments of Genetics, Pathology and Immunology, Washington University School of Medicine, St Louis, Missouri 63110, USA
| | - G M Cooper
- HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, Alabama 35806, USA
| | - N J Cox
- Section of Genetic Medicine, Department of Medicine, University of Chicago, Chicago, Illinois 60637, USA
| | - M J Daly
- [1] Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts 02114, USA [2] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA
| | - M B Gerstein
- [1] Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA [2] Departments of Computer Science, Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
| | - D B Goldstein
- Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina 27708, USA
| | - J N Hirschhorn
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA [2] Divisions of Genetics and Endocrinology, Children's Hospital, Boston, Massachusetts 02115, USA
| | - S M Leal
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas 77030, USA
| | - L A Pennacchio
- [1] Genomics Division, MS 84-171, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA [2] US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
| | - J A Stamatoyannopoulos
- Department of Genome Sciences, University of Washington, 1705 Northeast Pacific Street, Seattle, Washington 98195, USA
| | - S R Sunyaev
- [1] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Boston, Massachusetts 02115, USA [2] Harvard Medical School, Boston, Massachusetts 02115, USA
| | - D Valle
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland 21287, USA
| | - B F Voight
- Department of Pharmacology and Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania 19104, USA
| | - W Winckler
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts 02142, USA [2] Next Generation Diagnostics, Novartis Institutes for BioMedical Research, Cambridge, Massachusetts, USA
| | - C Gunter
- [1] HudsonAlpha Institute for Biotechnology, 601 Genome Way, Huntsville, Alabama 35806, USA [2] Marcus Autism Center, Children's Healthcare of Atlanta, Atlanta, Georgia 30329, USA
|
228
|
Abstract
Large whole-genome sequencing projects have provided access to much rare variation in human populations, which is highly informative about population structure and recent demography. Here, we show how the age of rare variants can be estimated from patterns of haplotype sharing and how these ages can be related to historical relationships between populations. We investigate the distribution of the age of variants occurring exactly twice (f₂ variants) in a worldwide sample sequenced by the 1000 Genomes Project, revealing enormous variation across populations. The median age of haplotypes carrying f₂ variants is 50 to 160 generations across populations within Europe or Asia, and 170 to 320 generations within Africa. Haplotypes shared between continents are much older, with median ages for haplotypes shared between Europe and Asia ranging from 320 to 670 generations. The distribution of the ages of f₂ haplotypes is informative about their demography, revealing recent bottlenecks, ancient splits, and more modern connections between populations. We see the effect of selection in the observation that functional variants are significantly younger than nonfunctional variants of the same frequency. This approach is relatively insensitive to mutation rate and complements other nonparametric methods for demographic inference.
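As a small illustration of the starting point of such an analysis, the sketch below finds variants carried by exactly two haplotypes in a sample and tallies which population pairs share them. It covers only this identification step, not the age estimation from haplotype sharing that the abstract describes. The 0/1 haplotype matrix layout, the function names, and the population labels are assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def find_f2_variants(haplotypes: Sequence[Sequence[int]]) -> List[Tuple[int, Tuple[int, int]]]:
    """Return (site index, (carrier_a, carrier_b)) for each site whose
    alternate allele (coded 1) occurs on exactly two haplotypes."""
    if not haplotypes:
        return []
    n_sites = len(haplotypes[0])
    found = []
    for site in range(n_sites):
        carriers = [h for h, hap in enumerate(haplotypes) if hap[site] == 1]
        if len(carriers) == 2:
            found.append((site, (carriers[0], carriers[1])))
    return found


def sharing_by_population(f2: Sequence[Tuple[int, Tuple[int, int]]],
                          populations: Sequence[str]) -> Counter:
    """Count doubleton variants per unordered pair of population labels."""
    counts: Counter = Counter()
    for _site, (i, j) in f2:
        pair = tuple(sorted((populations[i], populations[j])))
        counts[pair] += 1
    return counts
```

Sorting the two labels makes the pair unordered, so a variant shared between a European and an Asian haplotype is counted under a single key regardless of carrier order.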
Affiliation(s)
- Iain Mathieson
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
| | - Gil McVean
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
|
229
|
Jiang Y, Conneely KN, Epstein MP. Flexible and robust methods for rare-variant testing of quantitative traits in trios and nuclear families. Genet Epidemiol 2014; 38:542-51. [PMID: 25044337 DOI: 10.1002/gepi.21839] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 03/24/2014] [Revised: 05/21/2014] [Accepted: 05/29/2014] [Indexed: 11/07/2022]
Abstract
Most rare-variant association tests for complex traits are applicable only to population-based or case-control resequencing studies. There are fewer rare-variant association tests for family-based resequencing studies, which is unfortunate because pedigrees possess many attractive characteristics for such analyses. Family-based studies can be more powerful than their population-based counterparts due to increased genetic load and further enable the implementation of rare-variant association tests that, by design, are robust to confounding due to population stratification. With this in mind, we propose a rare-variant association test for quantitative traits in families; this test integrates the QTDT approach of Abecasis et al. into the kernel-based SNP association test KMFAM of Schifano et al. The resulting within-family test enjoys the many benefits of the kernel framework for rare-variant association testing, including rapid evaluation of P-values and preservation of power when a region harbors rare causal variation that acts in different directions on phenotype. Additionally, by design, this within-family test is robust to confounding due to population stratification. Although within-family association tests are generally less powerful than their counterparts that use all genetic information, we show that we can recover much of this power (while still ensuring robustness to population stratification) using a straightforward screening procedure. Our method accommodates covariates and allows for missing parental genotype data, and we have written software implementing the approach in R for public use.
Affiliation(s)
- Yunxuan Jiang
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, United States of America
|
230
|
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 689] [Impact Index Per Article: 62.6] [Received: 01/02/2014] [Indexed: 12/30/2022] Open
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
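To make the idea of a burden test concrete, the following is a minimal sketch of the collapsing step only: rare variants in a region are summed into one score per sample, and the mean score is compared between cases and controls. It is not taken from the review. The function names, the genotype coding (0, 1 or 2 copies of the minor allele), and the frequency threshold are assumptions for illustration, and a real analysis would add a proper significance test and covariate adjustment.

```python
from typing import List, Sequence


def burden_scores(genotypes: Sequence[Sequence[int]],
                  maf_threshold: float = 0.01) -> List[int]:
    """Sum minor-allele counts over rare variants for each sample.

    genotypes[s][v] is the minor-allele count (0, 1 or 2) of sample s at
    variant v. A variant counts as rare when its sample allele frequency
    is above 0 and below maf_threshold.
    """
    n_samples = len(genotypes)
    if n_samples == 0:
        return []
    n_variants = len(genotypes[0])
    rare = []
    for v in range(n_variants):
        allele_count = sum(sample[v] for sample in genotypes)
        freq = allele_count / (2 * n_samples)
        if 0 < freq < maf_threshold:
            rare.append(v)
    return [sum(sample[v] for v in rare) for sample in genotypes]


def mean_burden_difference(scores: Sequence[float],
                           is_case: Sequence[bool]) -> float:
    """Mean burden in cases minus mean burden in controls."""
    cases = [s for s, c in zip(scores, is_case) if c]
    controls = [s for s, c in zip(scores, is_case) if not c]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    return sum(cases) / len(cases) - sum(controls) / len(controls)
```

Collapsing in this way implicitly treats all rare variants in the region as acting in the same direction, which is the kind of assumption that distinguishes burden tests from variance-component tests.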
|
231
|
Wang Y, McKay JD, Rafnar T, Wang Z, Timofeeva M, Broderick P, Zong X, Laplana M, Wei Y, Han Y, Lloyd A, Delahaye-Sourdeix M, Chubb D, Gaborieau V, Wheeler W, Chatterjee N, Thorleifsson G, Sulem P, Liu G, Kaaks R, Henrion M, Kinnersley B, Vallée M, LeCalvez-Kelm F, Stevens VL, Gapstur SM, Chen WV, Zaridze D, Szeszenia-Dabrowska N, Lissowska J, Rudnai P, Fabianova E, Mates D, Bencko V, Foretova L, Janout V, Krokan HE, Gabrielsen ME, Skorpen F, Vatten L, Njølstad I, Chen C, Goodman G, Benhamou S, Vooder T, Valk K, Nelis M, Metspalu A, Lener M, Lubiński J, Johansson M, Vineis P, Agudo A, Clavel-Chapelon F, Bueno-de-Mesquita H, Trichopoulos D, Khaw KT, Johansson M, Weiderpass E, Tjønneland A, Riboli E, Lathrop M, Scelo G, Albanes D, Caporaso NE, Ye Y, Gu J, Wu X, Spitz MR, Dienemann H, Rosenberger A, Su L, Matakidou A, Eisen T, Stefansson K, Risch A, Chanock SJ, Christiani DC, Hung RJ, Brennan P, Landi MT, Houlston RS, Amos CI. Rare variants of large effect in BRCA2 and CHEK2 affect risk of lung cancer. Nat Genet 2014; 46:736-41. [PMID: 24880342 PMCID: PMC4074058 DOI: 10.1038/ng.3002] [Citation(s) in RCA: 337] [Impact Index Per Article: 30.6] [Received: 09/25/2013] [Accepted: 05/08/2014] [Indexed: 12/16/2022]
Abstract
We conducted imputation to the 1000 Genomes Project of four genome-wide association studies of lung cancer in populations of European ancestry (11,348 cases and 15,861 controls) and genotyped an additional 10,246 cases and 38,295 controls for follow-up. We identified large-effect genome-wide associations for squamous lung cancer with the rare variants BRCA2 p.Lys3326X (rs11571833, odds ratio (OR) = 2.47, P = 4.74 × 10⁻²⁰) and CHEK2 p.Ile157Thr (rs17879961, OR = 0.38, P = 1.27 × 10⁻¹³). We also showed an association between common variation at 3q28 (TP63, rs13314271, OR = 1.13, P = 7.22 × 10⁻¹⁰) and lung adenocarcinoma that had been previously reported only in Asians. These findings provide further evidence for inherited genetic susceptibility to lung cancer and its biological basis. Additionally, our analysis demonstrates that imputation can identify rare disease-causing variants with substantive effects on cancer risk from preexisting genome-wide association study data.
Affiliation(s)
- Yufei Wang
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | - James D. McKay
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - Thorunn Rafnar
- deCODE genetics/Amgen, Sturlugata 8, 101 Reykjavik, Iceland
| | - Zhaoming Wang
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892-9769, USA
| | - Maria Timofeeva
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - Peter Broderick
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | - Xuchen Zong
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Canada
| | - Marina Laplana
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), Heidelberg, Germany
| | - Yongyue Wei
- Department of Environmental Health, Harvard School of Public Health, Boston, MA, 617-432-1641, USA
| | - Younghun Han
- Center for Genomic Medicine Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, 46 Centerra Parkway, Suite 330, Lebanon, NH 03766
| | - Amy Lloyd
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | | | - Daniel Chubb
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | - Valerie Gaborieau
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - William Wheeler
- Information Management Services, Inc., Rockville, MD 20852, USA
| | - Nilanjan Chatterjee
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892-9769, USA
| | | | - Patrick Sulem
- deCODE genetics/Amgen, Sturlugata 8, 101 Reykjavik, Iceland
| | - Geoffrey Liu
- Princess Margaret Hospital, University Health Network, Toronto, Canada
| | - Rudolf Kaaks
- Division of Cancer Epidemiology, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Translational Lung Research Center Heidelberg (TLRC-H), Member of the German Center for Lung Research (DZL), Heidelberg, Germany
| | - Marc Henrion
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | - Ben Kinnersley
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | - Maxime Vallée
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | | | - Victoria L. Stevens
- Epidemiology Research Program, American Cancer Society, Atlanta, GA, 30301, USA
| | - Susan M. Gapstur
- Epidemiology Research Program, American Cancer Society, Atlanta, GA, 30301, USA
| | - Wei V. Chen
- Department of Genetics, U.T. M.D. Anderson Cancer Center, Houston, TX 77030
| | - David Zaridze
- Institute of Carcinogenesis, Russian N.N. Blokhin Cancer Research Centre, 115478 Moscow, Russia
| | | | - Jolanta Lissowska
- The M. Sklodowska-Curie Memorial Cancer Center and Institute of Oncology, Warsaw 02781, Poland
| | - Peter Rudnai
- National Institute of Environmental Health, Budapest 1097, Hungary
| | - Eleonora Fabianova
- Regional Authority of Public Health, Banská Bystrica 97556, Slovak Republic
| | - Dana Mates
- National Institute of Public Health, Bucharest 050463, Romania
| | - Vladimir Bencko
- 1st Faculty of Medicine, Institute of Hygiene and Epidemiology, Charles University in Prague, 12800 Prague 2, Czech Republic
| | - Lenka Foretova
- Department of Cancer Epidemiology and Genetics, Masaryk Memorial Cancer Institute, Brno 65653, Czech Republic
| | | | - Hans E. Krokan
- Department of Cancer Research and Molecular Medicine, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim 7489, Norway
| | - Maiken Elvestad Gabrielsen
- Department of Cancer Research and Molecular Medicine, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim 7489, Norway
| | - Frank Skorpen
- Department of Laboratory Medicine, Children’s and Women’s Health, Faculty of Medicine
| | - Lars Vatten
- Department of Public Health and General Practice, Faculty of Medicine, Norwegian University of Science and Technology, Trondheim 7489, Norway
| | - Inger Njølstad
- Department of Community Medicine, University of Tromso, Tromso 9037, Norway
| | - Chu Chen
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | - Gary Goodman
- Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA
| | | | - Tonu Vooder
- Institute of Molecular and Cell Biology, University of Tartu, Tartu 51010, Estonia
| | - Kristjan Valk
- Competence Centre on Reproductive Medicine and Biology, 50410 Tartu, Estonia
| | - Mari Nelis
- Estonian Genome Center, Institute of Molecular and Cell Biology, Tartu 51010, Estonia
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
| | - Andres Metspalu
- Estonian Genome Center, Institute of Molecular and Cell Biology, Tartu 51010, Estonia
| | - Marcin Lener
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
| | - Jan Lubiński
- Department of Genetics and Pathology, International Hereditary Cancer Center, Pomeranian Medical University, Szczecin, Poland
| | - Mattias Johansson
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - Paolo Vineis
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London, UK
- HuGeF Foundation, Torino, Italy
| | - Antonio Agudo
- Unit of Nutrition, Environment and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology, Barcelona, Spain
| | - Francoise Clavel-Chapelon
- INSERM, Centre for research in Epidemiology and Population Health (CESP), U1018, Nutrition, Hormones and Women’s Health team, F-94805, Villejuif, France
- Université Paris Sud, UMRS 1018, F-94805, Villejuif, France
- IGR, F-94805, Villejuif, France
| | - H. Bas Bueno-de-Mesquita
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London, UK
- National Institute for Public Health and the Environment (RIVM), Bilthoven, The Netherlands
- Department of Gastroenterology and Hepatology, University Medical Centre, Utrecht, The Netherlands
| | - Dimitrios Trichopoulos
- Department of Epidemiology, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA 02115, USA
- Bureau of Epidemiologic Research, Academy of Athens, 23 Alexandroupoleos Street, Athens, GR-115 27, Greece
- Hellenic Health Foundation, 13 Kaisareias Street, Athens, GR-115 27, Greece
| | - Kay-Tee Khaw
- University of Cambridge School of Clinical Medicine, Clinical Gerontology Unit Box 251, Addenbrooke’s Hospital, Cambridge CB2 2QQ, UK
| | - Mikael Johansson
- Department of Radiation Sciences, Umeå University, SE-901 87 Umeå, Sweden
| | - Elisabete Weiderpass
- Department of Community Medicine, Faculty of Health Sciences, University of Tromsø, Tromsø, Norway
- Department of Research, Cancer Registry of Norway, Oslo, Norway
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Samfundet Folkhälsan, Helsinki, Finland
| | - Anne Tjønneland
- Danish Cancer Society Research Center, Strandboulevarden 49, DK 2100 Copenhagen Ø, Denmark
| | - Elio Riboli
- Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London, UK
| | - Mark Lathrop
- Centre d’Etude du Polymorphisme Humain (CEPH), Paris 75010, France
| | - Ghislaine Scelo
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - Demetrius Albanes
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892-9769, USA
| | - Neil E. Caporaso
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892-9769, USA
| | - Yuanqing Ye
- Department of Epidemiology, U.T. M.D. Anderson Cancer Center, Houston, TX 77030, USA
| | - Jian Gu
- Department of Epidemiology, U.T. M.D. Anderson Cancer Center, Houston, TX 77030, USA
| | - Xifeng Wu
- Department of Epidemiology, U.T. M.D. Anderson Cancer Center, Houston, TX 77030, USA
| | - Margaret R. Spitz
- Dan L. Duncan Cancer Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Hendrik Dienemann
- Translational Lung Research Center Heidelberg (TLRC-H), Member of the German Center for Lung Research (DZL), Heidelberg, Germany
- Department of Thoracic Surgery, Thoraxklinik at University Hospital Heidelberg, Heidelberg, Germany
| | - Albert Rosenberger
- Department of Genetic Epidemiology, University of Göttingen, Göttingen, Germany
| | - Li Su
- Department of Environmental Health, Harvard School of Public Health, Boston, MA, 617-432-1641, USA
| | - Athena Matakidou
- Cancer Research UK Cambridge Institute, Li Ka Shing Centre, Cambridge, CB2 0RE, UK
| | - Timothy Eisen
- Department of Oncology, Cambridge University Hospitals NHS Foundation Trust, Cambridge, UK
- Addenbrooke’s Hospital, Cambridge Biomedical Campus, Hill’s Road Cambridge CB2 0QQ, UK
| | | | - Angela Risch
- Division of Epigenomics and Cancer Risk Factors, German Cancer Research Center (DKFZ), Heidelberg, Germany
- Translational Lung Research Center Heidelberg (TLRC-H), Member of the German Center for Lung Research (DZL), Heidelberg, Germany
| | - Stephen J. Chanock
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892-9769, USA
| | - David C. Christiani
- Department of Environmental Health, Harvard School of Public Health, Boston, MA, 617-432-1641, USA
| | - Rayjean J. Hung
- Lunenfeld-Tanenbaum Research Institute of Mount Sinai Hospital, Toronto, Canada
| | - Paul Brennan
- International Agency for Research on Cancer (IARC/WHO), Lyon, France
| | - Maria Teresa Landi
- Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, DHHS, Bethesda, MD 20892-9769, USA
| | - Richard S. Houlston
- Division of Genetics and Epidemiology, Institute of Cancer Research, Sutton, Surrey, SM2 5NG, UK
| | - Christopher I. Amos
- Center for Genomic Medicine Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, 46 Centerra Parkway, Suite 330, Lebanon, NH 03766
|
232
|
Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nat Genet 2014; 46:818-25. [PMID: 24974849 DOI: 10.1038/ng.3021] [Citation(s) in RCA: 494] [Impact Index Per Article: 44.9] [Received: 10/10/2013] [Accepted: 06/06/2014] [Indexed: 12/16/2022]
Abstract
Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring families and constructed a haplotype map of 20.4 million single-nucleotide variants and 1.2 million insertions and deletions. The intermediate coverage (∼13×) and trio design enabled extensive characterization of structural variation, including midsize events (30-500 bp) previously poorly catalogued and de novo mutations. We demonstrate that the quality of the haplotypes boosts imputation accuracy in independent samples, especially for lower frequency alleles. Population genetic analyses demonstrate fine-scale structure across the country and support multiple ancient migrations, consistent with historical changes in sea level and flooding. The GoNL Project illustrates how single-population whole-genome sequencing can provide detailed characterization of genetic variation and may guide the design of future population studies.
|
233
|
Abstract
Inferring population genetic structure from large-scale genotyping of single-nucleotide polymorphisms or variants is an important technique for studying the history and distribution of extant human populations, and it is also an important tool for adjusting tests of association. However, the structures inferred depend on the minor allele frequency of the variants, which matters particularly when considering the phenotypic association of rare variants. Using the Genetic Analysis Workshop 18 data set for 142 unrelated individuals, which includes genotypes for many rare variants, we study the following hypothesis: the difference in detected structure is the result of a "scale" effect; that is, rare variants are likely to be shared only locally (smaller scale), while common variants can be spread over longer distances. The result is similar to that of kernel principal component analysis as the bandwidth of the kernel is changed. We show how different structures become evident as we consider rare or common variants.
Affiliation(s)
- Omar De la Cruz
- Department of Epidemiology and Biostatistics, Case Western Reserve University School of Medicine, 10900 Euclid Ave, Cleveland, OH 44106, USA
| | - Paola Raska
- Department of Epidemiology and Biostatistics, Case Western Reserve University School of Medicine, 10900 Euclid Ave, Cleveland, OH 44106, USA
|
234
|
Gompert Z, Lucas LK, Buerkle CA, Forister ML, Fordyce JA, Nice CC. Admixture and the organization of genetic diversity in a butterfly species complex revealed through common and rare genetic variants. Mol Ecol 2014; 23:4555-73. [DOI: 10.1111/mec.12811] [Citation(s) in RCA: 146] [Impact Index Per Article: 13.3] [Received: 11/08/2013] [Revised: 04/27/2014] [Accepted: 04/29/2014] [Indexed: 12/16/2022]
Affiliation(s)
| | - Lauren K. Lucas
- Department of Biology; Utah State University; Logan UT 84322 USA
- Department of Biology; Texas State University; San Marcos TX 78666 USA
| | - C. Alex Buerkle
- Department of Botany and Program in Ecology; University of Wyoming; Laramie WY 82071 USA
| | | | - James A. Fordyce
- Department of Ecology & Evolutionary Biology; University of Tennessee; Knoxville TN 37996 USA
| | - Chris C. Nice
- Department of Biology; Texas State University; San Marcos TX 78666 USA
|
235
|
Morris DW, Pearson RD, Cormican P, Kenny EM, O'Dushlaine CT, Perreault LPL, Giannoulatou E, Tropea D, Maher BS, Wormley B, Kelleher E, Fahey C, Molinos I, Bellini S, Pirinen M, Strange A, Freeman C, Thiselton DL, Elves RL, Regan R, Ennis S, Dinan TG, McDonald C, Murphy KC, O'Callaghan E, Waddington JL, Walsh D, O'Donovan M, Grozeva D, Craddock N, Stone J, Scolnick E, Purcell S, Sklar P, Coe B, Eichler EE, Ophoff R, Buizer J, Szatkiewicz J, Hultman C, Sullivan P, Gurling H, Mcquillin A, St Clair D, Rees E, Kirov G, Walters J, Blackwood D, Johnstone M, Donohoe G, O'Neill FA, Kendler KS, Gill M, Riley BP, Spencer CCA, Corvin A. An inherited duplication at the gene p21 Protein-Activated Kinase 7 (PAK7) is a risk factor for psychosis. Hum Mol Genet 2014; 23:3316-26. [PMID: 24474471 PMCID: PMC4030770 DOI: 10.1093/hmg/ddu025] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 09/16/2013] [Revised: 12/24/2013] [Accepted: 01/20/2014] [Indexed: 12/14/2022]
Abstract
Identifying rare, highly penetrant risk mutations may be an important step in dissecting the molecular etiology of schizophrenia. We conducted a gene-based analysis of large (>100 kb), rare copy-number variants (CNVs) in the Wellcome Trust Case Control Consortium 2 (WTCCC2) schizophrenia sample of 1564 cases and 1748 controls all from Ireland, and further extended the analysis to include an additional 5196 UK controls. We found association with duplications at chr20p12.2 (P = 0.007) and evidence of replication in large independent European schizophrenia (P = 0.052) and UK bipolar disorder case-control cohorts (P = 0.047). A combined analysis of Irish/UK subjects including additional psychosis cases (schizophrenia and bipolar disorder) identified 22 carriers in 11 707 cases and 10 carriers in 21 204 controls [meta-analysis Cochran-Mantel-Haenszel P-value = 2 × 10^-4; odds ratio (OR) = 11.3, 95% CI = 3.7, ∞]. Nineteen of the 22 cases and 8 of the 10 controls carried duplications starting at 9.68 Mb with similar breakpoints across samples. By haplotype analysis and sequencing, we identified a tandem ~149 kb duplication overlapping the gene p21 Protein-Activated Kinase 7 (PAK7, also called PAK5) which was in linkage disequilibrium with local haplotypes (P = 2.5 × 10^-21), indicative of a single ancestral duplication event. We confirmed the breakpoints in 8/8 carriers tested and found co-segregation of the duplication with illness in two additional family members of one of the affected probands. We demonstrate that PAK7 is developmentally co-expressed with another known psychosis risk gene (DISC1), suggesting a potential molecular mechanism involving aberrant synapse development and plasticity.
Affiliation(s)
- Derek W Morris
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Richard D Pearson
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Paul Cormican
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Elaine M Kenny
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Colm T O'Dushlaine
- Broad Institute and Center for Human Genetics Research of Massachusetts General Hospital, Boston, MA 02142, USA
| | - Louis-Philippe Lemieux Perreault
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
- Montreal Heart Institute, Université de Montréal, Montréal, Québec H1T 1C8, Canada
| | - Eleni Giannoulatou
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Daniela Tropea
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Brion S Maher
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Brandon Wormley
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Eric Kelleher
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Ciara Fahey
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Ines Molinos
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Stefania Bellini
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Matti Pirinen
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Amy Strange
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Colin Freeman
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Dawn L Thiselton
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Rachel L Elves
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Regina Regan
- School of Medicine and Medical Science, University College Dublin, Ireland
| | - Sean Ennis
- School of Medicine and Medical Science, University College Dublin, Ireland
| | - Timothy G Dinan
- Department of Psychiatry, University College Cork, Cork, Ireland
| | - Colm McDonald
- Department of Psychiatry, National University of Ireland, Galway, University Road, Galway, Ireland
| | - Kieran C Murphy
- Department of Psychiatry, RCSI Education and Research Centre, Beaumont Hospital, Dublin 9, Ireland
| | - Eadbhard O'Callaghan
- DETECT Early Intervention in Psychosis Services, Dun Laoghaire, Co. Dublin, Ireland
| | - John L Waddington
- Molecular and Cellular Therapeutics, Royal College of Surgeons in Ireland, Dublin 2, Ireland
| | - Dermot Walsh
- Health Research Board, 73 Lower Baggot St, Dublin 2, Ireland
| | - Michael O'Donovan
- MRC Centre for Neuropsychiatric Genetics and Genomics, and Neuroscience and Mental Health Research Institute, Cardiff University, Heath Park, Cardiff CF4 4XN, UK
| | - Detelina Grozeva
- MRC Centre for Neuropsychiatric Genetics and Genomics, and Neuroscience and Mental Health Research Institute, Cardiff University, Heath Park, Cardiff CF4 4XN, UK
| | - Nick Craddock
- MRC Centre for Neuropsychiatric Genetics and Genomics, and Neuroscience and Mental Health Research Institute, Cardiff University, Heath Park, Cardiff CF4 4XN, UK
| | - Jennifer Stone
- Broad Institute and Center for Human Genetics Research of Massachusetts General Hospital, Boston, MA 02142, USA
| | - Ed Scolnick
- Broad Institute and Center for Human Genetics Research of Massachusetts General Hospital, Boston, MA 02142, USA
| | - Shaun Purcell
- Broad Institute and Center for Human Genetics Research of Massachusetts General Hospital, Boston, MA 02142, USA
- The Mount Sinai Hospital, New York, NY 10029, USA
| | - Pamela Sklar
- Broad Institute and Center for Human Genetics Research of Massachusetts General Hospital, Boston, MA 02142, USA
- The Mount Sinai Hospital, New York, NY 10029, USA
| | - Bradley Coe
- University of Washington School of Medicine, Howard Hughes Medical Institute, Seattle, WA 98195, USA
| | - Evan E Eichler
- University of Washington School of Medicine, Howard Hughes Medical Institute, Seattle, WA 98195, USA
| | - Roel Ophoff
- Department of Human Genetics, UCLA School of Medicine, Los Angeles, CA 90095, USA
| | - Jacobine Buizer
- Rudolf Magnus Institute, University of Utrecht, 3584 CG Utrecht, Netherlands
| | - Jin Szatkiewicz
- University of North Carolina, Chapel Hill, NC 27599-7264, USA
| | - Christina Hultman
- Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, SE-171 77 Stockholm, Sweden
| | | | - Hugh Gurling
- Molecular Psychiatry Laboratory, Mental Health Sciences Unit, University College London, London WC1E 6BT, UK
| | - Andrew Mcquillin
- Molecular Psychiatry Laboratory, Mental Health Sciences Unit, University College London, London WC1E 6BT, UK
| | - David St Clair
- Institute of Medical Sciences, University of Aberdeen, Foresterhill, Aberdeen AB25 2ZD, UK
| | - Elliott Rees
- MRC Centre for Neuropsychiatric Genetics and Genomics, and Neuroscience and Mental Health Research Institute, Cardiff University, Heath Park, Cardiff CF4 4XN, UK
| | - George Kirov
- MRC Centre for Neuropsychiatric Genetics and Genomics, and Neuroscience and Mental Health Research Institute, Cardiff University, Heath Park, Cardiff CF4 4XN, UK
| | - James Walters
- MRC Centre for Neuropsychiatric Genetics and Genomics, and Neuroscience and Mental Health Research Institute, Cardiff University, Heath Park, Cardiff CF4 4XN, UK
| | - Douglas Blackwood
- Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK
| | - Mandy Johnstone
- Division of Psychiatry, University of Edinburgh, Royal Edinburgh Hospital, Edinburgh EH10 5HF, UK
| | - Gary Donohoe
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Francis A O'Neill
- Department of Psychiatry, Queen's University, Belfast BT7 1NN, Northern Ireland
| | - Kenneth S Kendler
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Michael Gill
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
| | - Brien P Riley
- Departments of Psychiatry and Human Genetics, Virginia Institute of Psychiatric and Behavioral Genetics, Virginia Commonwealth University, Richmond, VA 23284, USA
| | - Chris C A Spencer
- Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
| | - Aiden Corvin
- Department of Psychiatry and Neuropsychiatric Genetics Research Group, Institute of Molecular Medicine, Trinity College Dublin, Dublin 2, Ireland
|
236
|
Zhou J, Tantoso E, Wong LP, Ong RTH, Bei JX, Li Y, Liu J, Khor CC, Teo YY. iCall: a genotype-calling algorithm for rare, low-frequency and common variants on the Illumina exome array. Bioinformatics 2014; 30:1714-20. [PMID: 24567545 DOI: 10.1093/bioinformatics/btu107] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Next-generation genotyping microarrays have been designed with insights from the 1000 Genomes Project and whole-exome sequencing studies. These arrays additionally include variants that are typically present at lower frequencies. Determining the genotypes of these variants from hybridization intensities is challenging because there is less support for locating the minor alleles when the allele counts are low. Existing algorithms are mainly designed for calling common variants and are notorious for failing to generate accurate calls for low-frequency and rare variants. Here, we introduce a new calling algorithm, iCall, to call genotypes for variants across the whole spectrum of allele frequencies. RESULTS: We benchmarked iCall against four of the most commonly used algorithms, GenCall, optiCall, illuminus and GenoSNP, as well as a post-processing caller, zCall, that adopts a two-stage calling design. Normalized hybridization intensities for 12 370 individuals genotyped on the Illumina HumanExome BeadChip were considered, of which 81 individuals were also whole-genome sequenced. The sequence calls were used to benchmark the accuracy of the genotype calling, and our comparisons indicated that iCall outperforms all four single-stage calling algorithms in terms of call rates and concordance, particularly in the calling accuracy of minor alleles, which is the principal concern for rare and low-frequency variants. Applying zCall to post-process the output from iCall also produced marginally improved performance relative to the combination of zCall and GenCall. AVAILABILITY AND IMPLEMENTATION: iCall is implemented in C++ for use on Linux operating systems and is available for download at http://www.statgen.nus.edu.sg/~software/icall.html.
Affiliation(s)
- Jin Zhou
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Erwin Tantoso
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Lai-Ping Wong
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Rick Twee-Hee Ong
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Jin-Xin Bei
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Yi Li
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Jianjun Liu
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Chiea-Chuen Khor
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
| | - Yik-Ying Teo
- Department of Statistics and Applied Probability, Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Genome Institute of Singapore, Singapore, NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore and Life Sciences Institute, National University of Singapore, Singapore
|
237
|
Utilizing population controls in rare-variant case-parent association tests. Am J Hum Genet 2014; 94:845-53. [PMID: 24836453 DOI: 10.1016/j.ajhg.2014.04.014] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 12/31/2013] [Accepted: 04/24/2014] [Indexed: 01/10/2023]
Abstract
There is great interest in detecting associations between human traits and rare genetic variation. To address the low power implicit in single-locus tests of rare genetic variants, many rare-variant association approaches attempt to accumulate information across a gene, often by taking linear combinations of single-locus contributions to a statistic. Using the right linear combination is key: an optimal test will up-weight true causal variants, down-weight neutral variants, and correctly assign the direction of effect for causal variants. Here, we propose a procedure that exploits data from population controls to estimate the linear combination to be used in a case-parent trio rare-variant association test. Specifically, we estimate the linear combination by comparing population control allele frequencies with allele frequencies in the parents of affected offspring. These estimates are then used to construct a rare-variant transmission disequilibrium test (rvTDT) in the case-parent data. Because the rvTDT is conditional on the parents' data, using parental data in estimating the linear combination does not affect the validity or asymptotic distribution of the rvTDT. By using simulation, we show that our new population-control-based rvTDT can dramatically improve power over rvTDTs that do not use population control information across a wide variety of genetic architectures. It also remains valid under population stratification. We apply the approach to a cohort of epileptic encephalopathy (EE) trios and find that dominant (or additive) inherited rare variants are unlikely to play a substantial role within EE genes previously identified through de novo mutation studies.
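The idea of combining single-locus contributions with signed weights can be illustrated with a minimal sketch. This is not the authors' rvTDT estimator: the function name `weighted_tdt_z`, its inputs, and the simplifying assumption that variants contribute independently (so that each heterozygous-parent transmission is a fair coin flip under the null) are all illustrative assumptions. The weights are taken as given; in the abstract's setting they would be estimated from population-control versus parental allele frequencies.

```python
import math

def weighted_tdt_z(transmitted, untransmitted, weights):
    """Weighted-sum TDT z-score across the variants of one gene (illustrative only).

    transmitted[j]   : count of minor-allele transmissions from heterozygous parents at variant j
    untransmitted[j] : count of non-transmissions at variant j
    weights[j]       : signed weight for variant j (hypothetical; assumed estimated elsewhere)

    Under the null, T_j - U_j has mean 0 and variance T_j + U_j.
    Variants are treated as independent, which is a simplification.
    """
    rows = list(zip(weights, transmitted, untransmitted))
    numerator = sum(w * (t - u) for w, t, u in rows)
    variance = sum(w * w * (t + u) for w, t, u in rows)
    if variance == 0:
        return 0.0  # no informative transmissions
    return numerator / math.sqrt(variance)

# Variant 0 is over-transmitted, variant 1 is under-transmitted.
# Equal weights let the two signals partly cancel; signed weights do not.
z_equal = weighted_tdt_z([6, 1], [2, 3], [1.0, 1.0])
z_signed = weighted_tdt_z([6, 1], [2, 3], [1.0, -1.0])
print(z_equal, z_signed)
```

The comparison of `z_equal` and `z_signed` shows why assigning the correct direction of effect matters: a protective variant given a positive weight dilutes the signal from a risk variant in the same gene.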
|
238
|
Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 2014; 15:335-46. [PMID: 24739678 DOI: 10.1038/nrg3706] [Citation(s) in RCA: 377] [Impact Index Per Article: 34.3] [Indexed: 12/13/2022]
Abstract
Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
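The trade-off between a stringent threshold and power can be made concrete with a small sketch. It assumes a two-sided z-test whose statistic is normally distributed with unit variance and mean equal to a noncentrality parameter; the function name `two_sided_power` and the example values (a noncentrality of 5, and the conventional genome-wide threshold of 5e-8) are illustrative choices, not figures taken from the review.

```python
from statistics import NormalDist

def two_sided_power(ncp, alpha):
    """Power of a two-sided z-test with noncentrality `ncp` at significance level `alpha`.

    Assumes the test statistic is N(ncp, 1); rejects when |Z| exceeds the
    critical value for the given alpha.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1 - alpha / 2)
    upper_tail = 1 - std.cdf(z_crit - ncp)   # P(Z > z_crit)
    lower_tail = std.cdf(-z_crit - ncp)      # P(Z < -z_crit)
    return upper_tail + lower_tail

# Same effect, two thresholds: a nominal 0.05 and a genome-wide 5e-8.
for alpha in (0.05, 5e-8):
    print(alpha, round(two_sided_power(5.0, alpha), 3))
```

With the noncentrality held fixed, tightening `alpha` lowers the power, which is why large samples (a larger noncentrality) are needed once the threshold is corrected for many tests.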
Affiliation(s)
- Pak C Sham
- Centre for Genomic Sciences, Jockey Club Building for Interdisciplinary Research; State Key Laboratory of Brain and Cognitive Sciences, and Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
| | - Shaun M Purcell
- [1] Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York 10029-6574, USA. [2] Center for Human Genetic Research, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
|
239
|
King CR, Nicolae DL. GWAS to Sequencing: Divergence in Study Design and Analysis. Genes (Basel) 2014; 5:460-76. [PMID: 24879455 PMCID: PMC4094943 DOI: 10.3390/genes5020460] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 12/27/2013] [Revised: 05/13/2014] [Accepted: 05/15/2014] [Indexed: 12/03/2022]
Abstract
The success of genome-wide association studies (GWAS) in uncovering genetic risk factors for complex traits has generated great promise for the complete data generated by sequencing. The bumpy transition from GWAS to whole-exome or whole-genome association studies (WGAS) based on sequencing investigations has highlighted important differences in analysis and interpretation. We show how the loss in power due to the allele frequency spectrum targeted by sequencing is difficult to compensate for with realistic effect sizes and point to study designs that may help. We discuss several issues in interpreting the results, including a special case of the winner's curse. Extrapolation and prediction using rare SNPs is complex, because of the selective ascertainment of SNPs in case-control studies and the low amount of information at each SNP, and naive procedures are biased under the alternative. We also discuss the challenges in tuning gene-based tests and accounting for multiple testing when genes have very different sets of SNPs. The examples we emphasize in this paper highlight the difficult road we must travel for a two-letter switch.
Affiliation(s)
| | - Dan L Nicolae
- Departments of Medicine, Statistics, and Human Genetics, University of Chicago, Chicago, IL 60637, USA.
|
240
|
Vrieze SI, Feng S, Miller MB, Hicks BM, Pankratz N, Abecasis GR, Iacono WG, McGue M. Rare nonsynonymous exonic variants in addiction and behavioral disinhibition. Biol Psychiatry 2014; 75:783-9. [PMID: 24094508 PMCID: PMC3975816 DOI: 10.1016/j.biopsych.2013.08.027] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 05/17/2013] [Revised: 08/02/2013] [Accepted: 08/26/2013] [Indexed: 10/26/2022]
Abstract
BACKGROUND: Substance use is heritable, but few common genetic variants have been associated with these behaviors. Rare nonsynonymous exonic variants can now be efficiently genotyped, allowing exome-wide association tests. We identified and tested 111,592 nonsynonymous exonic variants for association with behavioral disinhibition and the use/misuse of nicotine, alcohol, and illicit drugs. METHODS: Comprehensive genotyping of exonic variation combined with single-variant and gene-based tests of association was conducted in 7181 individuals; 172 candidate addiction genes were evaluated in greater detail. We also evaluated the aggregate effects of nonsynonymous variants on these phenotypes using Genome-wide Complex Trait Analysis. RESULTS: No variant or gene was significantly associated with any phenotype. No association was found for any of the 172 candidate genes, even at reduced significance thresholds. All nonsynonymous variants jointly accounted for 35% of the heritability in illicit drug use and, when combined with common variants from a genome-wide array, accounted for 84% of the heritability. CONCLUSIONS: Rare nonsynonymous variants may be important in etiology of illicit drug use, but detection of individual variants will require very large samples.
Affiliation(s)
- Scott I Vrieze
- Center for Statistical Genetics (SIV, SF, GRA), Department of Biostatistics, University of Michigan, Ann Arbor, Michigan.
| | - Shuang Feng
- Center for Statistical Genetics (SIV, SF, GRA), Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
| | - Michael B Miller
- Department of Psychology (MBM, WGI, MM), University of Minnesota, Minneapolis, Minnesota
| | - Brian M Hicks
- Department of Psychiatry (BMH), University of Michigan, Ann Arbor, Michigan
| | - Nathan Pankratz
- Department of Laboratory Medicine and Pathology (NP), University of Minnesota, Minneapolis, Minnesota
| | - Gonçalo R Abecasis
- Center for Statistical Genetics (SIV, SF, GRA), Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
| | - William G Iacono
- Department of Psychology (MBM, WGI, MM), University of Minnesota, Minneapolis, Minnesota
| | - Matt McGue
- Department of Psychology (MBM, WGI, MM), University of Minnesota, Minneapolis, Minnesota
|
241
|
Zakharov S, Wang X, Liu J, Teo YY. Improving power for robust trans-ethnic meta-analysis of rare and low-frequency variants with a partitioning approach. Eur J Hum Genet 2014; 23:238-44. [PMID: 24801758 DOI: 10.1038/ejhg.2014.78] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/05/2013] [Revised: 02/20/2014] [Accepted: 04/04/2014] [Indexed: 01/06/2023]
Abstract
While genome-wide association studies have discovered numerous bona fide variants that are associated with common diseases and complex traits, these variants tend to be common in the population and explain only a small proportion of the phenotype variance. The search for the missing heritability has thus switched to rare and low-frequency variants, defined as <5% in the population, but which are expected to have a bigger impact on phenotypic outcomes. The rarer nature of these variants, coupled with the curse of testing multiple variants across the genome, means that large sample sizes will still be required despite the assumption of bigger effect sizes. Combining data from multiple studies in a meta-analysis will continue to be the natural approach in boosting sample sizes. However, the population genetics of rare variants suggests that allelic and effect size heterogeneity across populations of different ancestries is likely to pose a greater challenge to trans-ethnic meta-analysis of rare variants than to similar analyses of common variants. Here, we introduce a novel method to perform trans-ethnic meta-analysis of rare and low-frequency variants. The approach is centered on partitioning the studies into distinct clusters using local inference of genomic similarity between population groups, with the aim to minimize both the number of clusters and between-study heterogeneity in each cluster. Through a series of simulations, we show that our approach either performs similarly to or outperforms conventional and recently introduced meta-analysis strategies, particularly in the presence of allelic heterogeneity.
Affiliation(s)
- Sergii Zakharov
- [1] Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore [2] Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | - Xu Wang
- Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore
| | - Jianjun Liu
- Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore
| | - Yik-Ying Teo
- [1] Saw Swee Hock School of Public Health, National University of Singapore, Singapore, Singapore [2] Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore, Singapore [3] Department of Statistics and Applied Probability, National University of Singapore, Singapore, Singapore [4] NUS Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore, Singapore [5] Life Sciences Institute, National University of Singapore, Singapore, Singapore
|
242
|
Gupta AR, Pirruccello M, Cheng F, Kang HJ, Fernandez TV, Baskin JM, Choi M, Liu L, Ercan-Sencicek AG, Murdoch JD, Klei L, Neale BM, Franjic D, Daly MJ, Lifton RP, De Camilli P, Zhao H, Sestan N, State MW. Rare deleterious mutations of the gene EFR3A in autism spectrum disorders. Mol Autism 2014; 5:31. [PMID: 24860643 PMCID: PMC4032628 DOI: 10.1186/2040-2392-5-31] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 11/22/2013] [Accepted: 03/28/2014] [Indexed: 12/31/2022]
Abstract
BACKGROUND: Whole-exome sequencing studies in autism spectrum disorder (ASD) have identified de novo mutations in novel candidate genes, including the synaptic gene Eighty-five Requiring 3A (EFR3A). EFR3A is a critical component of a protein complex required for the synthesis of the phosphoinositide PtdIns4P, which has a variety of functions at the neural synapse. We hypothesized that deleterious mutations in EFR3A would be significantly associated with ASD. METHODS: We conducted a large case/control association study by deep resequencing and analysis of whole-exome data for coding and splice site variants in EFR3A. We determined the potential impact of these variants on protein structure and function by a variety of conservation measures and analysis of the Saccharomyces cerevisiae Efr3 crystal structure. We also analyzed the expression pattern of EFR3A in human brain tissue. RESULTS: Rare nonsynonymous mutations in EFR3A were more common among cases (16 / 2,196 = 0.73%) than matched controls (12 / 3,389 = 0.35%) and were statistically more common at conserved nucleotides based on an experiment-wide significance threshold (P = 0.0077, permutation test). Crystal structure analysis revealed that mutations likely to be deleterious were also statistically more common in cases than controls (P = 0.017, Fisher exact test). Furthermore, EFR3A is expressed in cortical neurons, including pyramidal neurons, during human fetal brain development in a pattern consistent with ASD-related genes, and it is strongly co-expressed (P < 2.2 × 10^-16, Wilcoxon test) with a module of genes significantly associated with ASD. CONCLUSIONS: Rare deleterious mutations in EFR3A were found to be associated with ASD using an experiment-wide significance threshold. Synaptic phosphoinositide metabolism has been strongly implicated in syndromic forms of ASD. These data for EFR3A strengthen the evidence for the involvement of this pathway in idiopathic autism.
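The carrier frequencies quoted in the results of this abstract (16 of 2,196 cases, 12 of 3,389 controls) can be rechecked with simple arithmetic. The following is a minimal sketch, not taken from the paper: the helper name `carrier_rate` is hypothetical, only the four counts come from the abstract, and the crude ratio it prints is not one of the reported test statistics (the permutation and Fisher exact tests in the abstract apply to other subsets of variants).

```python
def carrier_rate(carriers: int, total: int) -> float:
    """Return the percentage of individuals who carry a variant."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * carriers / total


# Counts as quoted in the abstract (rare nonsynonymous EFR3A mutations).
case_rate = carrier_rate(16, 2196)
control_rate = carrier_rate(12, 3389)

# Crude ratio of the two frequencies; an illustration only,
# not an odds ratio and not a significance test.
rate_ratio = case_rate / control_rate

print(f"cases:    {case_rate:.2f}%")     # 0.73%
print(f"controls: {control_rate:.2f}%")  # 0.35%
print(f"ratio:    {rate_ratio:.2f}")     # about 2.06
```

The two percentages agree with the values printed in the abstract, which suggests the counts survived extraction intact.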
Affiliation(s)
- Abha R Gupta
- Department of Pediatrics and Child Study Center, Yale School of Medicine, New Haven, CT 06520, USA
| | | | - Feng Cheng
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA; College of Pharmacy, University of South Florida, Tampa, FL 33612, USA
| | - Hyo Jung Kang
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA; Department of Life Science, Chung-Ang University, Seoul, Korea
| | - Thomas V Fernandez
- Department of Psychiatry and Child Study Center, Yale School of Medicine, New Haven, CT 06520, USA
| | - Jeremy M Baskin
- Department of Cell Biology, Howard Hughes Medical Institute, Program in Cellular Neuroscience Neurodegeneration and Repair, Yale School of Medicine, New Haven, CT 06520, USA
| | - Murim Choi
- Department of Genetics, Howard Hughes Medical Institute, Yale School of Medicine, New Haven, CT 06520, USA
| | - Li Liu
- Department of Statistics, Carnegie Mellon University, Pittsburgh, PA 15213, USA
| | | | - John D Murdoch
- Program on Neurogenetics, Child Study Center, Department of Psychiatry, Department of Genetics, Yale School of Medicine, New Haven, CT 06520, USA
| | - Lambertus Klei
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA 15213, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
| | - Daniel Franjic
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA 02142, USA
| | - Richard P Lifton
- Department of Genetics, Howard Hughes Medical Institute, Yale School of Medicine, New Haven, CT 06520, USA
| | - Pietro De Camilli
- Department of Cell Biology, Howard Hughes Medical Institute, Program in Cellular Neuroscience Neurodegeneration and Repair, Yale School of Medicine, New Haven, CT 06520, USA
| | - Hongyu Zhao
- Departments of Biostatistics and Genetics, Yale School of Medicine, New Haven, CT 06520, USA
| | - Nenad Sestan
- Department of Neurobiology, Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT 06520, USA
| | - Matthew W State
- Department of Psychiatry, University of California San Francisco, San Francisco, CA 94143, USA
|
243
|
Lin WY. Association testing of clustered rare causal variants in case-control studies. PLoS One 2014; 9:e94337. [PMID: 24736372 PMCID: PMC3988195 DOI: 10.1371/journal.pone.0094337] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 01/29/2014] [Accepted: 03/12/2014] [Indexed: 11/18/2022]
Abstract
Biological evidence suggests that multiple causal variants in a gene may cluster physically. Variants within the same protein functional domain or gene regulatory element would locate in close proximity on the DNA sequence. However, spatial information of variants is usually not used in current rare variant association analyses. We here propose a clustering method (abbreviated as "CLUSTER"), which is extended from the adaptive combination of P-values. Our method combines the association signals of variants that are more likely to be causal. Furthermore, the statistic incorporates the spatial information of variants. With extensive simulations, we show that our method outperforms several commonly-used methods in many scenarios. To demonstrate its use in real data analyses, we also apply this CLUSTER test to the Dallas Heart Study data. CLUSTER is among the best methods when the effects of causal variants are all in the same direction. As variants located in close proximity are more likely to have similar impact on disease risk, CLUSTER is recommended for association testing of clustered rare causal variants in case-control studies.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
|
244
|
Weeke P, Muhammad R, Delaney JT, Shaffer C, Mosley JD, Blair M, Short L, Stubblefield T, Roden DM, Darbar D. Whole-exome sequencing in familial atrial fibrillation. Eur Heart J 2014; 35:2477-83. [PMID: 24727801 DOI: 10.1093/eurheartj/ehu156] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Indexed: 12/27/2022]
Abstract
AIMS: Positional cloning and candidate gene approaches have shown that atrial fibrillation (AF) is a complex disease with familial aggregation. Here, we employed whole-exome sequencing (WES) in AF kindreds to identify variants associated with familial AF. METHODS AND RESULTS: WES was performed on 18 individuals in six modestly sized familial AF kindreds. After filtering very rare variants by multiple metrics, we identified 39 very rare and potentially pathogenic variants [minor allele frequency (MAF) ≤0.04%] in genes not previously associated with AF. Despite stringent filtering, more than one very rare variant was identified in 5/6 of the kindreds, whereas no plausible variants contributing to familial AF were found in 1/6 of the kindreds. Two candidate AF variants in the calcium channel subunit genes (CACNB2 and CACNA2D4) were identified in two separate families using expression data and predicted function. CONCLUSION: By coupling family data with exome sequencing, we identified multiple very rare potentially pathogenic variants in five of six families, suggestive of a complex disease mechanism, whereas none were identified in the remaining AF pedigree. This study highlights some important limitations and challenges associated with performing WES in AF, including the importance of having large well-curated multi-generational pedigrees, the issue of potential AF misclassification, and limitations of WES technology when applied to a complex disease.
Affiliation(s)
- Peter Weeke
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA; Department of Cardiology, Copenhagen University Hospital, Gentofte, Denmark
| | - Raafia Muhammad
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Jessica T Delaney
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Christian Shaffer
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Jonathan D Mosley
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Marcia Blair
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Laura Short
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Tanya Stubblefield
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA
| | - Dan M Roden
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA; Division of Cardiovascular Medicine, Vanderbilt University School of Medicine, 2215B Garland Avenue, Room 1285A MRB IV, Nashville 37323-6602, TN, USA
| | - Dawood Darbar
- Division of Clinical Pharmacology, Vanderbilt University, Nashville, TN, USA; Division of Cardiovascular Medicine, Vanderbilt University School of Medicine, 2215B Garland Avenue, Room 1285A MRB IV, Nashville 37323-6602, TN, USA
|
245
|
He L, Sillanpää MJ, Ripatti S, Pitkäniemi J. Bayesian Latent Variable Collapsing Model for Detecting Rare Variant Interaction Effect in Twin Study. Genet Epidemiol 2014; 38:310-24. [DOI: 10.1002/gepi.21804] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/13/2013] [Revised: 02/28/2014] [Accepted: 02/28/2014] [Indexed: 12/12/2022]
Affiliation(s)
- Liang He
- Department of Public Health; Hjelt Institute; University of Helsinki; Finland
| | - Mikko J. Sillanpää
- Department of Mathematical Sciences; University of Oulu; Oulu Finland
- Department of Biology and Biocenter Oulu; University of Oulu; Oulu Finland
| | - Samuli Ripatti
- Department of Public Health; Hjelt Institute; University of Helsinki; Finland
- Institute for Molecular Medicine Finland FIMM; University of Helsinki; Finland
- Human Genetics; Wellcome Trust Sanger Institute; United Kingdom
| | - Janne Pitkäniemi
- Department of Public Health; Hjelt Institute; University of Helsinki; Finland
- Finnish Cancer Registry; Institute for Statistical and Epidemiological Cancer Research; Helsinki Finland
|
246
|
Pyhäjärvi T, Hufford MB, Mezmouk S, Ross-Ibarra J. Complex patterns of local adaptation in teosinte. Genome Biol Evol 2014; 5:1594-609. [PMID: 23902747 PMCID: PMC3787665 DOI: 10.1093/gbe/evt109] [Citation(s) in RCA: 115] [Impact Index Per Article: 10.5] [Indexed: 01/22/2023]
Abstract
Populations of widely distributed species encounter and must adapt to local environmental conditions. However, comprehensive characterization of the genetic basis of adaptation is demanding, requiring genome-wide genotype data, multiple sampled populations, and an understanding of population structure and potential selection pressures. Here, we used single-nucleotide polymorphism genotyping and data on numerous environmental variables to describe the genetic basis of local adaptation in 21 populations of teosinte, the wild ancestor of maize. We found complex hierarchical genetic structure created by altitude, dispersal events, and admixture among subspecies, which complicated identification of locally beneficial alleles. Patterns of linkage disequilibrium revealed four large putative inversion polymorphisms showing clinal patterns of frequency. Population differentiation and environmental correlations suggest that both inversions and intergenic polymorphisms are involved in local adaptation.
Affiliation(s)
- Tanja Pyhäjärvi
- Department of Plant Sciences, University of California, Davis
|
247
|
Ling Y, Jin Z, Su M, Zhong J, Zhao Y, Yu J, Wu J, Xiao J. VCGDB: a dynamic genome database of the Chinese population. BMC Genomics 2014; 15:265. [PMID: 24708222 PMCID: PMC4028056 DOI: 10.1186/1471-2164-15-265] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 09/22/2013] [Accepted: 03/28/2014] [Indexed: 12/18/2022]
Abstract
Background: The data released by the 1000 Genomes Project contain an increasing number of genome sequences from different nations and populations with a large number of genetic variations. As a result, the focus of human genome studies is changing from single and static to complex and dynamic. The currently available human reference genome (GRCh37) is based on sequencing data from 13 anonymous Caucasian volunteers, which might limit the scope of genomics, transcriptomics, epigenetics, and genome wide association studies. Description: We used the massive amount of sequencing data published by the 1000 Genomes Project Consortium to construct the Virtual Chinese Genome Database (VCGDB), a dynamic genome database of the Chinese population based on the whole genome sequencing data of 194 individuals. VCGDB provides dynamic genomic information, which contains 35 million single nucleotide variations (SNVs), 0.5 million insertions/deletions (indels), and 29 million rare variations, together with genomic annotation information. VCGDB also provides a highly interactive user-friendly virtual Chinese genome browser (VCGBrowser) with functions like seamless zooming and real-time searching. In addition, we have established three population-specific consensus Chinese reference genomes that are compatible with mainstream alignment software. Conclusions: VCGDB offers a feasible strategy for processing big data to keep pace with the biological data explosion by providing a robust resource for genomics studies; in particular, studies aimed at finding regions of the genome associated with diseases.
Affiliation(s)
| | | | | | | | | | - Jun Yu
- CAS Key Laboratory of Genome Sciences and Information, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
|
248
|
Wang C, Zhan X, Bragg-Gresham J, Kang HM, Stambolian D, Chew EY, Branham KE, Heckenlively J, Fulton R, Wilson RK, Mardis ER, Lin X, Swaroop A, Zöllner S, Abecasis GR. Ancestry estimation and control of population stratification for sequence-based association studies. Nat Genet 2014; 46:409-15. [PMID: 24633160 PMCID: PMC4084909 DOI: 10.1038/ng.2924] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Received: 08/08/2013] [Accepted: 02/21/2014] [Indexed: 12/15/2022]
Abstract
Estimating individual ancestry is important in genetic association studies where population structure leads to false positive signals, although assigning ancestry remains challenging with targeted sequence data. We propose a new method for the accurate estimation of individual genetic ancestry, based on direct analysis of off-target sequence reads, and implement our method in the publicly available LASER software. We validate the method using simulated and empirical data and show that the method can accurately infer worldwide continental ancestry when used with sequencing data sets with whole-genome shotgun coverage as low as 0.001×. For estimates of fine-scale ancestry within Europe, the method performs well with coverage of 0.1×. On an even finer scale, the method improves discrimination between exome-sequenced study participants originating from different provinces within Finland. Finally, we show that our method can be used to improve case-control matching in genetic association studies and to reduce the risk of spurious findings due to population structure.
Affiliation(s)
- Chaolong Wang
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
| | - Xiaowei Zhan
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
| | - Jennifer Bragg-Gresham
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
| | - Hyun Min Kang
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
| | - Dwight Stambolian
- Department of Ophthalmology, University of Pennsylvania Medical School, Philadelphia, PA 19104
| | - Emily Y. Chew
- Division of Epidemiology and Clinical Research, National Eye Institute, Bethesda, MD 20892
| | - Kari E. Branham
- Department of Ophthalmology, University of Michigan Kellogg Eye Center, Ann Arbor, MI 48105
| | - John Heckenlively
- Department of Ophthalmology, University of Michigan Kellogg Eye Center, Ann Arbor, MI 48105
| | | | - Robert Fulton
- The Genome Institute, Washington University School of Medicine, St. Louis, MO 63108
| | - Richard K. Wilson
- The Genome Institute, Washington University School of Medicine, St. Louis, MO 63108
| | - Elaine R. Mardis
- The Genome Institute, Washington University School of Medicine, St. Louis, MO 63108
| | - Xihong Lin
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115
| | - Anand Swaroop
- Neurobiology-Neurodegeneration & Repair Laboratory, National Eye Institute, Bethesda, MD 20892
| | - Sebastian Zöllner
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
- Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI 48109
| | - Gonçalo R. Abecasis
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI 48109
|
249
|
Pulit SL, Leusink M, Menelaou A, de Bakker PIW. Association claims in the sequencing era. Genes (Basel) 2014; 5:196-213. [PMID: 24705293 PMCID: PMC3978519 DOI: 10.3390/genes5010196] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/26/2013] [Revised: 02/24/2014] [Accepted: 02/24/2014] [Indexed: 12/13/2022]
Abstract
Since the completion of the Human Genome Project, the field of human genetics has been in great flux, largely due to technological advances in studying DNA sequence variation. Although community-wide adoption of statistical standards was key to the success of genome-wide association studies, similar standards have not yet been globally applied to the processing and interpretation of sequencing data. It has proven particularly challenging to pinpoint unequivocally disease variants in sequencing studies of polygenic traits. Here, we comment on a number of factors that may contribute to irreproducible claims of association in scientific literature and discuss possible steps that we can take towards cultural change.
Affiliation(s)
- Sara L Pulit
- Department of Medical Genetics, Institute for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
| | - Maarten Leusink
- Department of Medical Genetics, Institute for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
| | - Androniki Menelaou
- Department of Medical Genetics, Institute for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
| | - Paul I W de Bakker
- Department of Medical Genetics, Institute for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
|
250
|
Flannick J, Thorleifsson G, Beer NL, Jacobs SBR, Grarup N, Burtt NP, Mahajan A, Fuchsberger C, Atzmon G, Benediktsson R, Blangero J, Bowden DW, Brandslund I, Brosnan J, Burslem F, Chambers J, Cho YS, Christensen C, Douglas DA, Duggirala R, Dymek Z, Farjoun Y, Fennell T, Fontanillas P, Forsén T, Gabriel S, Glaser B, Gudbjartsson DF, Hanis C, Hansen T, Hreidarsson AB, Hveem K, Ingelsson E, Isomaa B, Johansson S, Jørgensen T, Jørgensen ME, Kathiresan S, Kong A, Kooner J, Kravic J, Laakso M, Lee JY, Lind L, Lindgren CM, Linneberg A, Masson G, Meitinger T, Mohlke KL, Molven A, Morris AP, Potluri S, Rauramaa R, Ribel-Madsen R, Richard AM, Rolph T, Salomaa V, Segrè AV, Skärstrand H, Steinthorsdottir V, Stringham HM, Sulem P, Tai ES, Teo YY, Teslovich T, Thorsteinsdottir U, Trimmer JK, Tuomi T, Tuomilehto J, Vaziri-Sani F, Voight BF, Wilson JG, Boehnke M, McCarthy MI, Njølstad PR, Pedersen O, Groop L, Cox DR, Stefansson K, Altshuler D. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat Genet 2014; 46:357-63. [PMID: 24584071 DOI: 10.1038/ng.2915] [Citation(s) in RCA: 361] [Impact Index Per Article: 32.8] [Received: 09/18/2013] [Accepted: 02/10/2014] [Indexed: 02/07/2023]
Abstract
Loss-of-function mutations protective against human disease provide in vivo validation of therapeutic targets, but none have yet been described for type 2 diabetes (T2D). Through sequencing or genotyping of ~150,000 individuals across 5 ancestry groups, we identified 12 rare protein-truncating variants in SLC30A8, which encodes an islet zinc transporter (ZnT8) and harbors a common variant (p.Trp325Arg) associated with T2D risk and glucose and proinsulin levels. Collectively, carriers of protein-truncating variants had 65% reduced T2D risk (P = 1.7 × 10^-6), and non-diabetic Icelandic carriers of a frameshift variant (p.Lys34Serfs*50) demonstrated reduced glucose levels (-0.17 s.d., P = 4.6 × 10^-4). The two most common protein-truncating variants (p.Arg138* and p.Lys34Serfs*50) individually associate with T2D protection and encode unstable ZnT8 proteins. Previous functional study of SLC30A8 suggested that reduced zinc transport increases T2D risk, and phenotypic heterogeneity was observed in mouse Slc30a8 knockouts. In contrast, loss-of-function mutations in humans provide strong evidence that SLC30A8 haploinsufficiency protects against T2D, suggesting ZnT8 inhibition as a therapeutic strategy in T2D prevention.
Affiliation(s)
- Jason Flannick
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA. [3] Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts, USA
| | | | - Nicola L Beer
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK
| | - Suzanne B R Jacobs
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Niels Grarup
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Noël P Burtt
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Anubha Mahajan
- Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Christian Fuchsberger
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Gil Atzmon
- [1] Department of Medicine, Albert Einstein College of Medicine, Bronx, New York, USA. [2] Department of Genetics, Albert Einstein College of Medicine, Bronx, New York, USA
| | - Rafn Benediktsson
- Department of Endocrinology and Metabolism, Landspitali University Hospital, Reykjavik, Iceland
| | - John Blangero
- Department of Genetics, Texas Biomedical Research Institute, San Antonio, Texas, USA
| | - Don W Bowden
- [1] Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA. [2] Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA. [3] Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA. [4] Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, North Carolina, USA
| | - Ivan Brandslund
- [1] Department of Clinical Biochemistry, Vejle Hospital, Vejle, Denmark. [2] Institute of Regional Health Research, University of Southern Denmark, Odense, Denmark
| | - Julia Brosnan
- Cardiovascular & Metabolic Diseases Research Unit, Pfizer, Inc., Cambridge, Massachusetts, USA
| | - Frank Burslem
- Cardiovascular and Metabolic Diseases Practice, Prescient Life Sciences, London, UK
| | - John Chambers
- [1] Department of Epidemiology and Biostatistics, Imperial College London, London, UK. [2] Imperial College Healthcare National Health Service (NHS) Trust, London, UK. [3] Ealing Hospital NHS Trust, Middlesex, UK
| | - Yoon Shin Cho
- Department of Biomedical Science, Hallym University, Chuncheon, Korea
| | - Cramer Christensen
- Department of Internal Medicine and Endocrinology, Vejle Hospital, Vejle, Denmark
| | - Desirée A Douglas
- Unit of Diabetes and Celiac Diseases, Department of Clinical Sciences, Lund University, Malmö, Sweden
| | - Zachary Dymek
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Yossi Farjoun
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Timothy Fennell
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Pierre Fontanillas
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Tom Forsén
- [1] Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland. [2] Diabetes Care Unit, Vaasa Health Care Centre, Vaasa, Finland
| | - Stacey Gabriel
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA
| | - Benjamin Glaser
- [1] Endocrinology and Metabolism Service, Hadassah-Hebrew University Medical Center, Jerusalem, Israel. [2] Israel Diabetes Research Group (IDRG), Holon, Israel
| | - Craig Hanis
- Human Genetics Center, University of Texas Health Science Center at Houston, Houston, Texas, USA
| | - Torben Hansen
- [1] Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark. [2] Faculty of Health Sciences, University of Southern Denmark, Odense, Denmark
| | - Astradur B Hreidarsson
- Department of Endocrinology and Metabolism, Landspitali University Hospital, Reykjavik, Iceland
| | - Kristian Hveem
- Department of Public Health, Faculty of Medicine, Norwegian University of Science and Technology, Levanger, Norway
| | - Erik Ingelsson
- [1] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. [2] Molecular Epidemiology and Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Bo Isomaa
- [1] Folkhalsan Research Centre, Helsinki, Finland. [2] Department of Social Services and Health Care, Jakobstad, Finland
| | - Stefan Johansson
- [1] KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway. [2] Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway. [3] Department of Biomedicine, University of Bergen, Bergen, Norway
| | - Torben Jørgensen
- [1] Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark. [2] Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark. [3] Faculty of Medicine, University of Aalborg, Aalborg, Denmark
| | - Sekar Kathiresan
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA. [3] Cardiovascular Research Center, Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts, USA. [4] Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA
| | - Jaspal Kooner
- [1] Imperial College Healthcare National Health Service (NHS) Trust, London, UK. [2] Ealing Hospital NHS Trust, Middlesex, UK. [3] National Heart and Lung Institute (NHLI), Imperial College London, Hammersmith Hospital, London, UK
| | - Jasmina Kravic
- Department of Clinical Sciences, Diabetes and Endocrinology, Lund University Diabetes Centre, Malmö, Sweden
| | - Markku Laakso
- Department of Medicine, University of Eastern Finland, Kuopio Campus and Kuopio University Hospital, Kuopio, Finland
| | - Jong-Young Lee
- Center for Genome Science, Korea National Institute of Health, Osong Health Technology, Chungcheongbuk-do, Korea
| | - Lars Lind
- Department of Medical Sciences, Uppsala University, Uppsala, Sweden
| | - Cecilia M Lindgren
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
| | - Allan Linneberg
- [1] Research Centre for Prevention and Health, Glostrup University Hospital, Glostrup, Denmark. [2] Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark. [3] Department of Clinical Experimental Research, Glostrup University Hospital, Glostrup, Denmark
| | - Thomas Meitinger
- Institute of Human Genetics, Technical University Munich, Munich, Germany
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, USA
| | - Anders Molven
- [1] KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway. [2] Gade Laboratory for Pathology, Department of Clinical Medicine, University of Bergen, Bergen, Norway. [3] Department of Pathology, Haukeland University Hospital, Bergen, Norway
| | - Andrew P Morris
- [1] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. [2] Department of Biostatistics, University of Liverpool, Liverpool, UK
| | - Shobha Potluri
- Applied Quantitative Genotherapeutics, Pfizer, Inc., South San Francisco, California, USA
| | - Rainer Rauramaa
- [1] Kuopio Research Institute of Exercise Medicine, Kuopio, Finland. [2] Department of Clinical Physiology and Nuclear Medicine, Kuopio University Hospital, Kuopio, Finland
| | - Rasmus Ribel-Madsen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Ann-Marie Richard
- Cardiovascular & Metabolic Diseases Research Unit, Pfizer, Inc., Cambridge, Massachusetts, USA
| | - Tim Rolph
- Cardiovascular & Metabolic Diseases Research Unit, Pfizer, Inc., Cambridge, Massachusetts, USA
| | - Veikko Salomaa
- National Institute for Health and Welfare (THL), Helsinki, Finland
| | - Ayellet V Segrè
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA
| | - Hanna Skärstrand
- Unit of Diabetes and Celiac Diseases, Department of Clinical Sciences, Lund University, Malmö, Sweden
| | - Heather M Stringham
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - E Shyong Tai
- [1] Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, Singapore. [2] Department of Medicine, National University of Singapore, National University Health System, Singapore. [3] Duke-National University of Singapore Graduate Medical School, Singapore
| | - Yik Ying Teo
- [1] Saw Swee Hock School of Public Health, National University of Singapore, National University Health System, Singapore. [2] Centre for Molecular Epidemiology, National University of Singapore, Singapore. [3] Genome Institute of Singapore, Agency for Science, Technology and Research, Singapore. [4] Graduate School for Integrative Science and Engineering, National University of Singapore, Singapore. [5] Department of Statistics and Applied Probability, National University of Singapore, Singapore
| | - Tanya Teslovich
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Unnur Thorsteinsdottir
- [1] deCODE Genetics/Amgen, Inc., Reykjavik, Iceland. [2] Faculty of Medicine, University of Iceland, Reykjavík, Iceland
| | - Jeff K Trimmer
- Cardiovascular & Metabolic Diseases Research Unit, Pfizer, Inc., Cambridge, Massachusetts, USA
| | - Tiinamaija Tuomi
- [1] Department of General Practice and Primary Health Care, University of Helsinki, Helsinki, Finland. [2] Folkhalsan Research Centre, Helsinki, Finland
| | - Jaakko Tuomilehto
- [1] Centre for Vascular Prevention, Danube-University Krems, Krems, Austria. [2] Diabetes Prevention Unit, National Institute for Health and Welfare, Helsinki, Finland. [3] Diabetes Research Group, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Fariba Vaziri-Sani
- Unit of Diabetes and Celiac Diseases, Department of Clinical Sciences, Lund University, Malmö, Sweden
| | - Benjamin F Voight
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Department of Pharmacology, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA. [3] Department of Genetics, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
| | - James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, Mississippi, USA
| | - Michael Boehnke
- Center for Statistical Genetics, Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA
| | - Mark I McCarthy
- [1] Oxford Centre for Diabetes, Endocrinology and Metabolism, University of Oxford, Oxford, UK. [2] Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK. [3] Oxford National Institute for Health Research (NIHR) Biomedical Research Centre, Churchill Hospital, Oxford, UK
| | - Pål R Njølstad
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] KG Jebsen Center for Diabetes Research, Department of Clinical Science, University of Bergen, Bergen, Norway. [3] Department of Pediatrics, Haukeland University Hospital, Bergen, Norway
| | - Oluf Pedersen
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Leif Groop
- [1] Department of Clinical Sciences, Diabetes and Endocrinology, Lund University Diabetes Centre, Malmö, Sweden. [2] Finnish Institute for Molecular Medicine (FIMM), Helsinki University, Helsinki, Finland
| | - David R Cox
- Applied Quantitative Genotherapeutics, Pfizer, Inc., South San Francisco, California, USA
| | - Kari Stefansson
- [1] deCODE Genetics/Amgen, Inc., Reykjavik, Iceland. [2] Faculty of Medicine, University of Iceland, Reykjavík, Iceland
| | - David Altshuler
- [1] Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, USA. [2] Department of Molecular Biology, Massachusetts General Hospital, Boston, Massachusetts, USA. [3] Diabetes Unit, Massachusetts General Hospital, Boston, Massachusetts, USA. [4] Center for Human Genetic Research, Massachusetts General Hospital, Boston, Massachusetts, USA. [5] Cardiovascular Research Center, Cardiology Division, Massachusetts General Hospital, Boston, Massachusetts, USA. [6] Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA. [7] Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|