101
|
Vélez JI, Lopera F, Sepulveda-Falla D, Patel HR, Johar AS, Chuah A, Tobón C, Rivera D, Villegas A, Cai Y, Peng K, Arkell R, Castellanos FX, Andrews SJ, Silva Lara MF, Creagh PK, Easteal S, de Leon J, Wong ML, Licinio J, Mastronardi CA, Arcos-Burgos M. APOE*E2 allele delays age of onset in PSEN1 E280A Alzheimer's disease. Mol Psychiatry 2016; 21:916-24. [PMID: 26619808 PMCID: PMC5414071 DOI: 10.1038/mp.2015.177] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Received: 04/20/2015] [Revised: 10/07/2015] [Accepted: 10/14/2015] [Indexed: 01/10/2023]
Abstract
Alzheimer's disease (AD) age of onset (ADAOO) varies greatly between individuals, with unique causal mutations suggesting the role of modifying genetic and environmental interactions. We analyzed ~50 000 common and rare functional genomic variants from 71 individuals of the 'Paisa' pedigree, the world's largest pedigree segregating a severe form of early-onset AD, who were affected carriers of the fully penetrant E280A mutation in the presenilin-1 (PSEN1) gene. Affected carriers with ages at the extremes of the ADAOO distribution (30s-70s age range) were selected, and linear mixed-effects models were used to build single-locus regression models outlining the ADAOO. We identified the rs7412 (APOE*E2 allele) as a whole exome-wide ADAOO modifier that delays ADAOO by ~12 years (β=11.74, 95% confidence interval (CI): 8.07-15.41, P=6.31 × 10^-8, PFDR=2.48 × 10^-3). Subsequently, to evaluate comprehensively the APOE (apolipoprotein E) haplotype variants (E1/E2/E3/E4), the markers rs7412 and rs429358 were genotyped in 93 AD affected carriers of the E280A mutation. We found that the APOE*E2 allele, and not APOE*E4, modifies ADAOO in carriers of the E280A mutation (β=8.24, 95% CI: 4.45-12.01, P=3.84 × 10^-5). Exploratory linear mixed-effects multilocus analysis suggested that other functional variants harbored in genes involved in cell proliferation, protein degradation, apoptotic and immune dysregulation processes (i.e., GPR20, TRIM22, FCRL5, AOAH, PINLYP, IFI16, RC3H1 and DFNA5) might interact with the APOE*E2 allele. Interestingly, suggestive evidence as an ADAOO modifier was found for one of these variants (GPR20) in a set of patients with sporadic AD from the Paisa genetic isolate. This is the first study demonstrating that the APOE*E2 allele modifies the natural history of AD typified by the age of onset in E280A mutation carriers. To the best of our knowledge, this is the largest analyzed sample of patients with a unique mutation sharing uniform environment.
Formal replication of our results in other populations and in other forms of AD will be crucial for prediction, follow-up and presumably developing new therapeutic strategies for patients either at risk or affected by AD.
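The abstract above derives APOE haplotypes (E1/E2/E3/E4) from the two markers rs429358 and rs7412. As a minimal sketch of that lookup, assuming phased forward-strand alleles are already available for each chromosome, the mapping can be written as below. The function names are hypothetical and are not taken from the study; unphased double heterozygotes (which could be E2/E4 or E1/E3) are deliberately not handled here.

```python
# Forward-strand alleles (rs429358, rs7412) that define each APOE haplotype.
APOE_HAPLOTYPES = {
    ("T", "T"): "E2",
    ("T", "C"): "E3",
    ("C", "C"): "E4",
    ("C", "T"): "E1",  # very rare haplotype
}


def apoe_haplotype(rs429358, rs7412):
    """Return the APOE haplotype label for one phased chromosome."""
    return APOE_HAPLOTYPES[(rs429358.upper(), rs7412.upper())]


def apoe_diplotype(hap_a, hap_b):
    """Combine two phased (rs429358, rs7412) pairs into a label such as 'E3/E4'."""
    labels = sorted(apoe_haplotype(*hap) for hap in (hap_a, hap_b))
    return "/".join(labels)


def carries_e2(hap_a, hap_b):
    """True if at least one chromosome carries the E2 haplotype."""
    return "E2" in apoe_diplotype(hap_a, hap_b).split("/")
```

For example, `apoe_diplotype(("C", "C"), ("T", "C"))` gives `"E3/E4"`, and `carries_e2` could serve as the binary covariate in an age-of-onset regression of the kind the abstract describes.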
Affiliation(s)
- J I Vélez
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia; Neuroscience Research Group, University of Antioquia, Medellín, Colombia
- F Lopera
- Neuroscience Research Group, University of Antioquia, Medellín, Colombia
- D Sepulveda-Falla
- Neuroscience Research Group, University of Antioquia, Medellín, Colombia; Institute of Neuropathology, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
- H R Patel
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- A S Johar
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- A Chuah
- Genome Discovery Unit, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- C Tobón
- Neuroscience Research Group, University of Antioquia, Medellín, Colombia
- D Rivera
- Neuroscience Research Group, University of Antioquia, Medellín, Colombia
- A Villegas
- Neuroscience Research Group, University of Antioquia, Medellín, Colombia
- Y Cai
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- K Peng
- Biomolecular Resource Facility, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- R Arkell
- Early Mammalian Development Laboratory, Research School of Biology, The Australian National University, Canberra, ACT, Australia
- F X Castellanos
- NYU Child Study Center, NYU Langone Medical Center, New York, NY, USA; Nathan Kline Institute for Psychiatric Research, Orangeburg, NY, USA
- S J Andrews
- Genome Diversity and Health Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- M F Silva Lara
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- P K Creagh
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- S Easteal
- Genome Diversity and Health Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia
- J de Leon
- Mental Health Research Center at Eastern State Hospital, University of Kentucky, Lexington, KY, USA
- M L Wong
- South Australian Health and Medical Research Institute and Department of Psychiatry, School of Medicine, Flinders University, Adelaide, SA, Australia
- J Licinio
- South Australian Health and Medical Research Institute and Department of Psychiatry, School of Medicine, Flinders University, Adelaide, SA, Australia
- C A Mastronardi
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia; South Australian Health and Medical Research Institute and Department of Psychiatry, School of Medicine, Flinders University, Adelaide, SA, Australia
- M Arcos-Burgos
- Genomics and Predictive Medicine Group, Department of Genome Sciences, John Curtin School of Medical Research, The Australian National University, Canberra, ACT, Australia; Neuroscience Research Group, University of Antioquia, Medellín, Colombia
|
102
|
Cuyvers E, Sleegers K. Genetic variations underlying Alzheimer's disease: evidence from genome-wide association studies and beyond. Lancet Neurol 2016; 15:857-868. [DOI: 10.1016/s1474-4422(16)00127-7] [Citation(s) in RCA: 146] [Impact Index Per Article: 16.2] [Received: 10/19/2015] [Revised: 03/07/2016] [Accepted: 03/10/2016] [Indexed: 12/20/2022]
|
103
|
Discovery of rare variants for complex phenotypes. Hum Genet 2016; 135:625-34. [PMID: 27221085 DOI: 10.1007/s00439-016-1679-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/02/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.
|
104
|
Abstract
Over the past few years, interest in the identification of rare variants that influence human phenotype has led to the development of many statistical methods for testing for association between sets of rare variants and binary or quantitative traits. Here, I review some of the most important ideas that underlie these methods and the most relevant issues when choosing a method for analysis. In addition to the tests for association, I review crucial issues in performing a rare variant study, from experimental design to interpretation and validation. I also discuss the many challenges of these studies, some of their limitations, and future research directions.
Affiliation(s)
- Dan L Nicolae
- Departments of Medicine and Statistics, University of Chicago, Chicago, Illinois 60637
|
105
|
Olgiati S, Quadri M, Bonifati V. Genetics of movement disorders in the next-generation sequencing era. Mov Disord 2016; 31:458-70. [DOI: 10.1002/mds.26521] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 09/08/2015] [Accepted: 11/29/2015] [Indexed: 12/15/2022]
Affiliation(s)
- Simone Olgiati
- Department of Clinical Genetics, Erasmus MC, Rotterdam, The Netherlands
- Marialuisa Quadri
- Department of Clinical Genetics, Erasmus MC, Rotterdam, The Netherlands
- Vincenzo Bonifati
- Department of Clinical Genetics, Erasmus MC, Rotterdam, The Netherlands
|
106
|
Abstract
Empirical studies and evolutionary theory support a role for rare variants in the etiology of complex traits. Given this motivation and increasing affordability of whole-exome and whole-genome sequencing, methods for rare variant association have been an active area of research for the past decade. Here, we provide a survey of the current literature and developments from the Genetics Analysis Workshop 19 (GAW19) Collapsing Rare Variants working group. In particular, we present the generalized linear regression framework and associated score statistic for the 2 major types of methods: burden and variance components methods. We further show that by simply modifying weights within these frameworks we arrive at many of the popular existing methods, for example, the cohort allelic sums test and sequence kernel association test. Meta-analysis techniques are also described. Next, we describe the 6 contributions from the GAW19 Collapsing Rare Variants working group. These included development of new methods, such as a retrospective likelihood for family data, a method using genomic structure to compare cases and controls, a haplotype-based meta-analysis, and a permutation-based method for combining different statistical tests. In addition, one contribution compared a mega-analysis of family-based and population-based data to meta-analysis. Finally, the power of existing family-based methods for binary traits was compared. We conclude with suggestions for open research questions.
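The contrast between burden and variance-component methods in the abstract above can be illustrated with a small sketch. It assumes a quantitative trait with no covariates and uses the commonly cited forms of the two statistics: per-variant scores S_j = Σ_i G_ij (y_i − ȳ), a burden statistic (Σ_j w_j S_j)², and a SKAT-style statistic Σ_j w_j² S_j². These forms, the function names, and the toy data are illustrative assumptions rather than the working group's code, and no p-values are computed.

```python
def score_vector(genotypes, phenotype):
    """Per-variant score S_j = sum_i G_ij * (y_i - mean(y))."""
    mean_y = sum(phenotype) / len(phenotype)
    residuals = [y - mean_y for y in phenotype]
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * r for row, r in zip(genotypes, residuals))
        for j in range(n_variants)
    ]


def burden_statistic(scores, weights):
    """Burden form: weighted scores are summed first, then squared."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2


def variance_component_statistic(scores, weights):
    """SKAT-style form: weighted scores are squared first, then summed."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Toy data: variant 0 raises the trait, variant 1 lowers it by the same amount.
G = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
y = [2, 2, -2, -2, 0, 0]
S = score_vector(G, y)
```

With equal weights, the opposite-direction scores cancel in the burden statistic but not in the variance-component statistic, which is the usual intuition for why the latter retains power when effects differ in sign.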
Affiliation(s)
- Stephanie A Santorico
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
- Audrey E Hendricks
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
|
107
|
Zhou YJ, Wang Y, Chen LL. Detecting the Common and Individual Effects of Rare Variants on Quantitative Traits by Using Extreme Phenotype Sampling. Genes (Basel) 2016; 7:genes7010002. [PMID: 26784232 PMCID: PMC4728382 DOI: 10.3390/genes7010002] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 09/29/2015] [Revised: 12/21/2015] [Accepted: 01/05/2016] [Indexed: 12/19/2022]
Abstract
Next-generation sequencing technology has made it possible to detect rare genetic variants associated with complex human traits. In recent literature, various methods specifically designed for rare variants are proposed. These tests can be broadly classified into burden and nonburden tests. In this paper, we take advantage of the burden and nonburden tests, and consider the common effect and the individual deviations from the common effect. To achieve robustness, we use two methods of combining p-values, Fisher's method and the minimum-p method. In rare variant association studies, to improve the power of the tests, we explore the advantage of the extreme phenotype sampling. At first, we dichotomize the continuous phenotypes before analysis, and the two extremes are treated as two different groups representing a dichotomous phenotype. We next compare the powers of several methods based on extreme phenotype sampling and random sampling. Extensive simulation studies show that our proposed methods by using extreme phenotype sampling are the most powerful or very close to the most powerful one in various settings of true models when the same sample size is used.
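The two p-value combination rules named in the abstract above, Fisher's method and the minimum-p method, can be sketched as follows. This is a minimal illustration that assumes the component p-values are independent, which may not hold for a burden and a nonburden test computed on the same data; the function names are hypothetical and the paper's own procedure may calibrate the combined statistic differently.

```python
import math


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under H0.

    For even degrees of freedom the chi-square survival function has the
    closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, so no SciPy is needed.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # equals statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


def min_p_combine(pvalues):
    """Minimum-p method: the smallest p-value, corrected for taking a minimum
    over k independent tests (Sidak-type correction)."""
    k = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** k
```

Fisher's method tends to reward several moderately small p-values, whereas the minimum-p rule is driven by the single strongest signal, which is why using both can be more robust across different true models.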
Affiliation(s)
- Ya-Jing Zhou
- Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China.
- Yong Wang
- Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- Li-Li Chen
- Department of Mathematics, School of Science, Harbin Institute of Technology, Harbin 150001, China.
- School of Mathematical Sciences, Heilongjiang University, Harbin 150080, China.
|
108
|
Jiang W, Yu W. Power estimation and sample size determination for replication studies of genome-wide association studies. BMC Genomics 2016; 17 Suppl 1:3. [PMID: 26818952 PMCID: PMC4895704 DOI: 10.1186/s12864-015-2296-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/25/2022]
Abstract
Background Replication study is a commonly used verification method to filter out false positives in genome-wide association studies (GWAS). If an association can be confirmed in a replication study, it will have a high confidence to be true positive. To design a replication study, traditional approaches calculate power by treating replication study as another independent primary study. These approaches do not use the information given by primary study. Besides, they need to specify a minimum detectable effect size, which may be subjective. One may think to replace the minimum effect size with the observed effect sizes in the power calculation. However, this approach will make the designed replication study underpowered since we are only interested in the positive associations from the primary study and the problem of the “winner’s curse” will occur. Results An Empirical Bayes (EB) based method is proposed to estimate the power of replication study for each association. The corresponding credible interval is estimated in the proposed approach. Simulation experiments show that our method is better than other plug-in based estimators in terms of overcoming the winner’s curse and providing higher estimation accuracy. The coverage probability of given credible interval is well-calibrated in the simulation experiments. Weighted average method is used to estimate the average power of all underlying true associations. This is used to determine the sample size of replication study. Sample sizes are estimated on 6 diseases from Wellcome Trust Case Control Consortium (WTCCC) using our method. They are higher than sample sizes estimated by plugging observed effect sizes in power calculation. Conclusions Our new method can objectively determine replication study’s sample size by using information extracted from primary study. Also the winner’s curse is alleviated. Thus, it is a better choice when designing replication studies of GWAS. 
The R-package is available at: http://bioinformatics.ust.hk/RPower.html.
Affiliation(s)
- Wei Jiang
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
- Weichuan Yu
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
|
109
|
Butkiewicz M, Bush WS. In Silico Functional Annotation of Genomic Variation. Current Protocols in Human Genetics 2016; 88:6.15.1-6.15.17. [PMID: 26724722 PMCID: PMC4722816 DOI: 10.1002/0471142905.hg0615s88] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 12/15/2022]
Abstract
This unit describes the concepts and practical techniques for annotating genomic variants in the human genome to estimate their functional significance. With the rapid increase of available whole exome and whole genome sequencing information for human studies, annotation techniques have become progressively more important for highlighting and prioritizing nucleotide variants and their potential impact on genes and other genetic constructs. Here, we present an overview of different types of variant annotation approaches and elaborate on their foundations, assumptions, and the downstream consequences of their use. Computational approaches and tools to assign annotations and to identify variants are reviewed. Further, the general philosophy of assigning potential function to a genetic change within the biological context of a disease is discussed.
Affiliation(s)
- Mariusz Butkiewicz
- Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio
- William S Bush
- Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio
|
110
|
Kim YJ, Lee J, Kim BJ, Park T. A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data. BMC Genomics 2015; 16:1109. [PMID: 26715385 PMCID: PMC4696174 DOI: 10.1186/s12864-015-2192-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 03/18/2015] [Accepted: 11/03/2015] [Indexed: 02/07/2023]
Abstract
Background Rare variants have gathered increasing attention as a possible alternative source of missing heritability. Since next generation sequencing technology is not yet cost-effective for large-scale genomic studies, a widely used alternative approach is imputation. However, the imputation approach may be limited by the low accuracy of the imputed rare variants. To improve imputation accuracy of rare variants, various approaches have been suggested, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and using local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly utilize reference panels, imputation accuracy of rare variants can also be increased by using exome chips containing rare variants. The exome chip contains 250 K rare variants selected from the discovered variants of about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, the combined approach using a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants. Results In this study, we describe a combined imputation which uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach was demonstrated using a reference panel of 848 samples constructed using exome sequencing data from the T2D-GENES consortium and 5,349 sample genotype panels consisting of an exome chip and SNP chip. As a result, the combined approach increased imputation quality up to 11 %, and genomic coverage for rare variants up to 117.7 % (MAF < 1 %), compared to imputation using the SNP chip alone. Also, we investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. 
The best performing approach was the combination of the study specific reference panel and the genotype panel of combined data. Conclusions Our study demonstrates that combined datasets, including SNP chips and exome chips, enhances both the imputation quality and genomic coverage of rare variants.
Affiliation(s)
- Young Jin Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, South Korea; Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
- Juyoung Lee
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
- Bong-Jo Kim
- Division of Structural and Functional Genomics, Center for Genome Science, Korean National Institute of Health, Osong, Chungchungbuk-do, 363-951, South Korea.
- Taesung Park
- Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, 151-742, South Korea; Department of Statistics, Seoul National University, San 56-1, Shilim-dong, Kwanak-gu, Seoul, 151-742, South Korea.
|
111
|
Mensah-Ablorh A, Lindstrom S, Haiman CA, Henderson BE, Marchand LL, Lee S, Stram DO, Eliassen AH, Price A, Kraft P. Meta-Analysis of Rare Variant Association Tests in Multiethnic Populations. Genet Epidemiol 2015; 40:57-65. [PMID: 26639010 DOI: 10.1002/gepi.21939] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 02/22/2015] [Revised: 09/15/2015] [Accepted: 09/19/2015] [Indexed: 12/30/2022]
Abstract
Several methods have been proposed to increase power in rare variant association testing by aggregating information from individual rare variants (MAF < 0.005). However, how to best combine rare variants across multiple ethnicities and the relative performance of designs using different ethnic sampling fractions remains unknown. In this study, we compare the performance of several statistical approaches for assessing rare variant associations across multiple ethnicities. We also explore how different ethnic sampling fractions perform, including single-ethnicity studies and studies that sample up to four ethnicities. We conducted simulations based on targeted sequencing data from 4,611 women in four ethnicities (African, European, Japanese American, and Latina). As with single-ethnicity studies, burden tests had greater power when all causal rare variants were deleterious, and variance component-based tests had greater power when some causal rare variants were deleterious and some were protective. Multiethnic studies had greater power than single-ethnicity studies at many loci, with inclusion of African Americans providing the largest impact. On average, studies including African Americans had as much as 20% greater power than equivalently sized studies without African Americans. This suggests that association studies between rare variants and complex disease should consider including subjects from multiple ethnicities, with preference given to genetically diverse groups.
Affiliation(s)
- Akweley Mensah-Ablorh
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Sara Lindstrom
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Christopher A Haiman
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Brian E Henderson
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- Loic Le Marchand
- Epidemiology Program, University of Hawaii Cancer Research Center, Honolulu, Hawaii, United States of America
- Seunggeun Lee
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, Michigan, United States of America
- Daniel O Stram
- Department of Preventive Medicine, Keck School of Medicine and Norris Comprehensive Cancer Center, University of Southern California, Los Angeles, California, United States of America
- A Heather Eliassen
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Channing Division of Network Medicine, Brigham & Women's Hospital, Boston, Massachusetts, United States of America
- Alkes Price
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
- Peter Kraft
- Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America; Program in Genetic Epidemiology and Statistical Genetics, Harvard School of Public Health, Boston, Massachusetts, United States of America; Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
|
112
|
Yazdani A, Yazdani A, Boerwinkle E. Rare variants analysis using penalization methods for whole genome sequence data. BMC Bioinformatics 2015; 16:405. [PMID: 26637205 PMCID: PMC4670502 DOI: 10.1186/s12859-015-0825-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/27/2015] [Accepted: 11/11/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND Availability of affordable and accessible whole genome sequencing for biomedical applications poses a number of statistical challenges and opportunities, particularly related to the analysis of rare variants and sparseness of the data. Although efforts have been devoted to address these challenges, the performance of statistical methods for rare variants analysis still needs further consideration. RESULT We introduce a new approach that applies restricted principal component analysis with convex penalization and then selects the best predictors of a phenotype by a concave penalized regression model, while estimating the impact of each genomic region on the phenotype. Using simulated data, we show that the proposed method maintains good power for association testing while keeping the false discovery rate low under a variety of genetic architectures. Illustrative data analyses reveal encouraging results for this method in comparison with other commonly applied methods for rare variants analysis. CONCLUSION By taking into account linkage disequilibrium and sparseness of the data, the proposed method improves power and controls the false discovery rate compared to other commonly applied methods for rare variant analyses.
Affiliation(s)
- Akram Yazdani
- Human Genetics Center, University of Texas Health Science Center at Houston, TX, USA.
- Azam Yazdani
- Human Genetics Center, University of Texas Health Science Center at Houston, TX, USA.
- Eric Boerwinkle
- Human Genetics Center, University of Texas Health Science Center at Houston, TX, USA; Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA.
|
113
|
Zhu N, Heinrich V, Dickhaus T, Hecht J, Robinson PN, Mundlos S, Kamphans T, Krawitz PM. Strategies to improve the performance of rare variant association studies by optimizing the selection of controls. Bioinformatics 2015; 31:3577-83. [PMID: 26249812 DOI: 10.1093/bioinformatics/btv457] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2015] [Accepted: 07/30/2015] [Indexed: 11/12/2022]
Abstract
MOTIVATION When analyzing a case group of patients with ultra-rare disorders the ethnicities are often diverse and the data quality might vary. The population substructure in the case group as well as the heterogeneous data quality can cause substantial inflation of test statistics and result in spurious associations in case-control studies if not properly adjusted for. Existing techniques to correct for confounding effects were especially developed for common variants and are not applicable to rare variants. RESULTS We analyzed strategies to select suitable controls for cases that are based on similarity metrics that vary in their weighting schemes. We simulated different disease entities on real exome data and show that a similarity-based selection scheme can help to reduce false positive associations and to optimize the performance of the statistical tests. Especially when data quality as well as ethnicities vary a lot in the case group, a matching approach that puts more weight on rare variants shows the best performance. We reanalyzed collections of unrelated patients with Kabuki make-up syndrome, Hyperphosphatasia with Mental Retardation syndrome and Catel-Manzke syndrome for which the disease genes were recently described. We show that rare variant association tests are more sensitive and specific in identifying the disease gene than intersection filters and should thus be considered as a favorable approach in analyzing even small patient cohorts. AVAILABILITY AND IMPLEMENTATION Datasets used in our analysis are available at ftp://ftp.1000genomes.ebi.ac.uk./vol1/ftp/ CONTACT : peter.krawitz@charite.de SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Na Zhu
- Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany
- Verena Heinrich
- Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany
- Thorsten Dickhaus
- Institute for Statistics, University of Bremen, 28344 Bremen, Germany
- Jochen Hecht
- Berlin-Brandenburg Center for Regenerative Therapies (BCRT), 13353 Berlin, Germany
- Peter N Robinson
- Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany
- Stefan Mundlos
- Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany and
- Peter M Krawitz
- Institute of Medical Genetics and Human Genetics, Charité Universitätsmedizin Berlin, 13353 Berlin, Germany, Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany and
|
114
Sugino S, Bortz BJ, Vaida S, Karamchandani K, Janicki PK. Peripartum Anesthetic Management and Genomic Analysis of Rare Variants in a Patient with Familial Pulmonary Fibrosis. A A Case Rep 2015; 5:169-72. [PMID: 26576048 DOI: 10.1213/xaa.0000000000000198] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/16/2022]
Abstract
A 29-year-old patient, 32 weeks' pregnant, with a history of familial interstitial fibrosis, was treated for acute hypoxemia after admission to the intensive care unit. Within 48 hours, this was followed by an emergent cesarean delivery, under general anesthesia, due to acute respiratory failure. Successful perinatal obstetric and anesthetic management resulted in the delivery of a baby and recovery of the mother. Subsequent genomic analysis using next-generation sequencing of the patient's entire exome revealed that she was a carrier of a deleterious mutation in the TERT gene, previously associated with the hereditary forms of interstitial fibrosis.
Affiliation(s)
- Shigekazu Sugino
  - Department of Anesthesiology, Penn State Hershey Medical Center, Hershey, Pennsylvania
115
Li B, Wang GT, Leal SM. Generation of sequence-based data for pedigree-segregating Mendelian or Complex traits. Bioinformatics 2015; 31:3706-8. [PMID: 26177964 DOI: 10.1093/bioinformatics/btv412] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/10/2015] [Accepted: 07/07/2015] [Indexed: 01/05/2023]
Abstract
MOTIVATION: There is great interest in analyzing next generation sequence data that has been generated for pedigrees. However, unlike for population-based data there are only a limited number of rare variant methods to analyze pedigree data. One limitation is the ability to evaluate type I and II errors for family-based methods, due to lack of software that can simulate realistic sequence data for pedigrees. SUMMARY: We developed RarePedSim (Rare-variant Pedigree-based Simulator), a program to simulate region/gene-level genotype and phenotype data for complex and Mendelian traits for any given pedigree structure. Using a genetic model, sequence variant data can be generated either conditionally or unconditionally on pedigree members' qualitative or quantitative phenotypes. Additionally, qualitative or quantitative traits can be generated conditional on variant data. Sequence data can either be simulated using realistic population demographic models or obtained from sequence-based studies. Variant sites can be annotated with positions, allele frequencies and functionality. For rare variants, RarePedSim is the only program that can efficiently generate both genotypes and phenotypes, regardless of pedigree structure. Data generated by RarePedSim are in standard Linkage file (.ped) and Variant Call (.vcf) formats, ready to be used for a variety of purposes, including evaluation of type I error and power, for association methods including mixed models and linkage analysis methods. AVAILABILITY AND IMPLEMENTATION: bioinformatics.org/simped/rare CONTACT: sleal@bcm.edu.
Affiliation(s)
- Biao Li
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Gao T Wang
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
- Suzanne M Leal
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA
116
Abstract
Much of cancer genetics research has focused on the identification of the most-important somatic mutations ('major drivers') that cause tumour growth. However, many mutations found in cancer might not be major drivers or 'passenger' mutations, but instead might have relatively weak tumour-promoting effects. Our aim is to highlight the existence of these mutations (termed 'mini drivers' herein), as multiple mini-driver mutations might substitute for a major-driver change, especially in the presence of genomic instability or high mutagen exposure. The mini-driver model has clinical implications: for example, the effects of therapeutically targeting such genes may be limited. However, the main importance of the model lies in helping to provide a complete understanding of tumorigenesis, especially as we anticipate that an increasing number of mini-driver mutations will be found by cancer genome sequencing.
Affiliation(s)
- Francesc Castro-Giner
  - Molecular and Population Genetics Laboratory, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
- Peter Ratcliffe
  - Henry Wellcome Building for Molecular Physiology, Nuffield Department of Clinical Medicine, University of Oxford, Roosevelt Drive, Oxford OX3 7BN, UK
- Ian Tomlinson
  - Molecular and Population Genetics Laboratory, Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford OX3 7BN, UK
117
Detecting association of rare and common variants by adaptive combination of P-values. Genet Res (Camb) 2015; 97:e20. [PMID: 26440553 DOI: 10.1017/s0016672315000208] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) can detect common variants associated with diseases. Next generation sequencing technology has made it possible to detect rare variants. Most association tests, including burden tests and nonburden tests, mainly target rare variants by upweighting rare variant effects and downweighting common variant effects. But there is increasing evidence that complex diseases are caused by both common and rare variants. In this paper, we extend the ADA method (adaptive combination of P-values; Lin et al., 2014), which was designed for rare variants only, and propose an RC-ADA method (common and rare variants by adaptive combination of P-values). Our proposed method combines the per-site P-values with weights based on minor allele frequencies (MAFs). The RC-ADA is robust to the directions of effects of causal variants and to the inclusion of a high proportion of neutral variants. The performance of the RC-ADA method is compared with several other association methods. Extensive simulation studies show that the RC-ADA method is more powerful than other association methods over a wide range of models.
118
Tzeng JY, Magnusson PKE, Sullivan PF, Szatkiewicz JP. A New Method for Detecting Associations with Rare Copy-Number Variants. PLoS Genet 2015; 11:e1005403. [PMID: 26431523 PMCID: PMC4592002 DOI: 10.1371/journal.pgen.1005403] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 01/29/2015] [Accepted: 06/30/2015] [Indexed: 01/31/2023]
Abstract
Copy number variants (CNVs) play an important role in the etiology of many diseases such as cancers and psychiatric disorders. Due to a modest marginal effect size or the rarity of the CNVs, collapsing rare CNVs together and collectively evaluating their effect serves as a key approach to evaluating the collective effect of rare CNVs on disease risk. While a plethora of powerful collapsing methods are available for sequence variants (e.g., SNPs) in association analysis, these methods cannot be directly applied to rare CNVs due to the CNV-specific challenges, i.e., the multi-faceted nature of CNV polymorphisms (e.g., CNVs vary in size, type, dosage, and details of gene disruption), and etiological heterogeneity (e.g., heterogeneous effects of duplications and deletions that occur within a locus or in different loci). Existing CNV collapsing analysis methods (a.k.a. the burden test) tend to have suboptimal performance due to the fact that these methods often ignore heterogeneity and evaluate only the marginal effects of a CNV feature. We introduce CCRET, a random effects test for collapsing rare CNVs when searching for disease associations. CCRET is applicable to variants measured on a multi-categorical scale, collectively modeling the effects of multiple CNV features, and is robust to etiological heterogeneity. Multiple confounders can be simultaneously corrected. To evaluate the performance of CCRET, we conducted extensive simulations and analyzed large-scale schizophrenia datasets. We show that CCRET has powerful and robust performance under multiple types of etiological heterogeneity, and has performance comparable to or better than existing methods when there is no heterogeneity.
Affiliation(s)
- Jung-Ying Tzeng
  - Department of Statistics and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
  - Department of Statistics, National Cheng-Kung University, Tainan, Taiwan
- Patrik K. E. Magnusson
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Patrick F. Sullivan
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
  - Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden
- Jin P. Szatkiewicz
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
119
Wang C, Kao WH, Hsiao CK. Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies. PLoS One 2015; 10:e0135918. [PMID: 26302001 PMCID: PMC4547758 DOI: 10.1371/journal.pone.0135918] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/28/2014] [Accepted: 07/28/2015] [Indexed: 11/27/2022]
Abstract
The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity in statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on this distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test, HDAT, to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and the SNPs under consideration need not be located in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. Compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, employing the clustering algorithm before testing a large set of data helps narrow down the genetic regions harboring susceptibility markers.
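The comparison described in this abstract, similarity between individuals of different disease status versus similarity within the same status, can be illustrated with a minimal Python sketch. This is not the HDAT statistic itself, which the abstract does not specify; the genotype strings, the 0/1/2 coding, and the helper names `hamming` and `mean_pairwise_distance` are assumptions made for illustration only.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length genotype strings differ."""
    if len(a) != len(b):
        raise ValueError("genotype strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_distance(group_a, group_b=None):
    """Mean Hamming distance within one group, or between two groups."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = [(a, b) for a in group_a for b in group_b]
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Hypothetical genotypes coded as minor-allele counts (0, 1, 2) at five SNPs.
cases = ["01210", "01200", "02210"]
controls = ["00010", "10000", "00011"]

within_cases = mean_pairwise_distance(cases)
within_controls = mean_pairwise_distance(controls)
between = mean_pairwise_distance(cases, controls)

print(f"within cases:    {within_cases:.3f}")
print(f"within controls: {within_controls:.3f}")
print(f"between groups:  {between:.3f}")
```

In this toy data the between-group distance exceeds both within-group distances, which is the kind of contrast a distance-based association test would then assess for significance.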
Affiliation(s)
- Charlotte Wang
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
- Wen-Hsin Kao
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
- Chuhsing Kate Hsiao
  - Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, 100, Taiwan
  - Bioinformatics and Biostatistics Core, Division of Genomic Medicine, Research Center for Medical Excellence, National Taiwan University, Taipei, 100, Taiwan
  - Department of Public Health, National Taiwan University, Taipei, 100, Taiwan
120
Pendergrass SA, Verma A, Okula A, Hall MA, Crawford DC, Ritchie MD. Phenome-Wide Association Studies: Embracing Complexity for Discovery. Hum Hered 2015. [PMID: 26201697 DOI: 10.1159/000381851] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 11/19/2022]
Abstract
The inherent complexity of biological systems can be leveraged for a greater understanding of the impact of genetic architecture on outcomes, traits, and pharmacological response. The genome-wide association study (GWAS) approach has well-developed methods and relatively straight-forward methodologies; however, the bigger picture of the impact of genetic architecture on phenotypic outcome still remains to be elucidated even with an ever-growing number of GWAS performed. Greater consideration of the complexity of biological processes, using more data from the phenome, exposome, and diverse -omic resources, including considering the interplay of pleiotropy and genetic interactions, may provide additional leverage for making the most of the incredible wealth of information available for study. Here, we describe how incorporating greater complexity into analyses through the use of additional phenotypic data and widespread deployment of phenome-wide association studies may provide new insights into genetic factors influencing diseases, traits, and pharmacological response.
Affiliation(s)
- Sarah A Pendergrass
  - Biomedical and Translational Informatics Program, Geisinger Health System, Danville, Pa., USA
121
Svishcheva GR, Belonogova NM, Axenovich TI. Region-Based Association Test for Familial Data under Functional Linear Models. PLoS One 2015; 10:e0128999. [PMID: 26111046 PMCID: PMC4481467 DOI: 10.1371/journal.pone.0128999] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 10/27/2014] [Accepted: 05/04/2015] [Indexed: 12/22/2022]
Abstract
Region-based association analysis is a more powerful tool for gene mapping than testing of individual genetic variants, particularly for rare genetic variants. The most powerful methods for regional mapping are based on the functional data analysis approach, which assumes that the regional genome of an individual may be considered as a continuous stochastic function that contains information about both linkage and linkage disequilibrium. Here, we extend this powerful approach, earlier applied only to independent samples, to samples of related individuals. To this end, we additionally include random polygenic effects in the functional linear model used for testing association between quantitative traits and multiple genetic variants in the region. We compare the statistical power of different methods using Genetic Analysis Workshop 17 mini-exome family data and a wide range of simulation scenarios. Our method increases the power of regional association analysis of quantitative traits compared with burden-based and kernel-based methods for the majority of the scenarios. In addition, we estimate the statistical power of our method using regions with a small number of genetic variants, and show that our method retains its advantage over burden-based and kernel-based methods in this case as well. The new method is implemented as the R-function 'famFLM' using two types of basis functions: the B-spline and Fourier bases. We compare the properties of the new method using models that differ from each other in the type of their function basis. The models based on the Fourier basis functions have an advantage in terms of speed and power over the models that use the B-spline basis functions and those that combine B-spline and Fourier basis functions. The 'famFLM' function is distributed under the GPLv3 license and is freely available at http://mga.bionet.nsc.ru/soft/famFLM/.
Affiliation(s)
- Gulnara R. Svishcheva
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Nadezhda M. Belonogova
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
- Tatiana I. Axenovich
  - Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, Russia
  - Department of Natural Sciences, Novosibirsk State University, Novosibirsk, Russia
122
Zeng P, Wang T. Detecting the Genomic Signature of Divergent Selection in Presence of Gene Flow. Curr Genomics 2015; 16:203-12. [PMID: 26069460 PMCID: PMC4460224 DOI: 10.2174/1389202916666150313230943] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/16/2014] [Revised: 02/23/2015] [Accepted: 03/09/2015] [Indexed: 11/22/2022]
Abstract
In this paper the detection of rare-variant associations with continuous phenotypes of interest is investigated via the likelihood-ratio based variance component test under the framework of linear mixed models. The hypothesis testing is challenging and nonstandard, since under the null the variance component is located on the boundary of its parameter space. In this situation the usual asymptotic chi-square distribution of the likelihood ratio statistic does not necessarily hold. To circumvent the derivation of the null distribution we resort to the bootstrap method because it is generically applicable and easy to implement. Both parametric and nonparametric bootstrap likelihood ratio tests are studied. Numerical studies are implemented to evaluate the performance of the proposed bootstrap likelihood ratio test and to compare it with some existing methods for the identification of rare variants. To reduce the computational time of the bootstrap likelihood ratio test we propose an effective mixture approximation for the bootstrap null distribution. The GAW17 data is used to illustrate the proposed test.
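To make the boundary problem concrete, the sketch below contrasts the naive chi-square(1) p-value with the classical reference approximation for a single variance component tested on the boundary: an equal mixture of a point mass at zero and a chi-square(1) distribution. This is a textbook approximation that assumes independent observations; it is shown only for orientation and is not the mixture approximation proposed in the abstract, which states that the usual asymptotics need not hold in this setting. The function names and the statistic value 3.84 are hypothetical.

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Uses P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)) for standard normal Z.
    """
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt_pvalue(lrt_stat):
    """P-value under a 50:50 mixture of a point mass at 0 and chi-square(1)."""
    if lrt_stat <= 0:
        return 1.0
    return 0.5 * chi2_1_sf(lrt_stat)


stat = 3.84  # hypothetical observed likelihood ratio statistic
naive_p = chi2_1_sf(stat)              # ignores the boundary constraint
mixture_p = boundary_lrt_pvalue(stat)  # halves the tail probability

print(f"naive chi-square(1) p-value: {naive_p:.4f}")
print(f"50:50 mixture p-value:       {mixture_p:.4f}")
```

The naive p-value is twice the mixture p-value, so ignoring the boundary makes the test conservative under this reference approximation; a bootstrap null distribution avoids committing to either form.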
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, and Center of Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
- Ting Wang
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
123
Ying D, Sham PC, Smith DK, Zhang L, Lau YL, Yang W. HaploShare: identification of extended haplotypes shared by cases and evaluation against controls. Genome Biol 2015; 16:92. [PMID: 25956955 PMCID: PMC4432975 DOI: 10.1186/s13059-015-0662-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/02/2013] [Accepted: 04/24/2015] [Indexed: 11/11/2022]
Abstract
Recent founder mutations may play important roles in complex diseases and Mendelian disorders. Detecting shared haplotypes that are identical by descent (IBD) could facilitate discovery of these mutations. Several programs address this, but are usually limited to detecting pair-wise shared haplotypes and not providing a comparison of cases and controls. We present a novel algorithm and software package, HaploShare, which detects extended haplotypes that are shared by multiple individuals, and allows comparisons between cases and controls. Testing on simulated and real cases demonstrated significant improvements in detection power and reduction of false positive rate by HaploShare relative to other programs.
Affiliation(s)
- Dingge Ying
  - Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong
  - Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- Pak Chung Sham
  - Department of Psychiatry, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
- David Keith Smith
  - State Key Laboratory for Emerging Infectious Diseases, The University of Hong Kong, Pokfulam, Hong Kong
- Lu Zhang
  - Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong
- Yu Lung Lau
  - Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong
- Wanling Yang
  - Department of Paediatrics and Adolescent Medicine, LKS Faculty of Medicine, The University of Hong Kong, 21 Sassoon Rd., Pokfulam, Hong Kong
  - Centre for Genomic Sciences, LKS Faculty of Medicine, The University of Hong Kong, Pokfulam, Hong Kong
124
Gaye A, Burton TWY, Burton PR. ESPRESSO: taking into account assessment errors on outcome and exposures in power analysis for association studies. Bioinformatics 2015; 31:2691-6. [PMID: 25908791 PMCID: PMC4528636 DOI: 10.1093/bioinformatics/btv219] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 08/18/2014] [Accepted: 04/19/2015] [Indexed: 12/20/2022]
Abstract
Motivation: Very large studies are required to provide sufficiently big sample sizes for adequately powered association analyses. This can be an expensive undertaking and it is important that an accurate sample size is identified. For more realistic sample size calculation and power analysis, the impact of unmeasured aetiological determinants and the quality of measurement of both outcome and explanatory variables should be taken into account. Conventional methods to analyse power use closed-form solutions that are not flexible enough to cater for all of these elements easily. They often result in a potentially substantial overestimation of the actual power. Results: In this article, we describe the Estimating Sample-size and Power in R by Exploring Simulated Study Outcomes tool that allows assessment errors in power calculation under various biomedical scenarios to be incorporated. We also report a real world analysis where we used this tool to answer an important strategic question for an existing cohort. Availability and implementation: The software is available for online calculation and downloads at http://espresso-research.org. The code is freely available at https://github.com/ESPRESSO-research. Contact: louqman@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Amadou Gaye
  - School of Social and Community Medicine, University of Bristol, UK
- Paul R Burton
  - School of Social and Community Medicine, University of Bristol, UK
125
Zeng P, Zhao Y, Li H, Wang T, Chen F. Permutation-based variance component test in generalized linear mixed model with application to multilocus genetic association study. BMC Med Res Methodol 2015; 15:37. [PMID: 25897803 PMCID: PMC4410500 DOI: 10.1186/s12874-015-0030-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 12/04/2014] [Accepted: 04/07/2015] [Indexed: 11/29/2022]
Abstract
Background: In many medical studies the likelihood ratio test (LRT) has been widely applied to examine whether the random effects variance component is zero within the mixed effects models framework, whereas little work on the likelihood-ratio based variance component test has been done in generalized linear mixed models (GLMM), where the response is discrete and the log-likelihood cannot be computed exactly. Before applying the LRT for variance component in GLMM, several difficulties need to be overcome, including the computation of the log-likelihood, the parameter estimation and the derivation of the null distribution for the LRT statistic. Methods: To overcome these problems, in this paper we make use of the penalized quasi-likelihood algorithm and calculate the LRT statistic based on the resulting working response and the quasi-likelihood. The permutation procedure is used to obtain the null distribution of the LRT statistic. We evaluate the permutation-based LRT via simulations and compare it with the score-based variance component test and the tests based on the mixture of chi-square distributions. Finally we apply the permutation-based LRT to multilocus association analysis in the case–control study, where the problem can be investigated under the framework of logistic mixed effects model. Results: The simulations show that the permutation-based LRT can effectively control the type I error rate, while the score test is sometimes slightly conservative and the tests based on mixtures cannot maintain the type I error rate. Our studies also show that the permutation-based LRT has higher power than these existing tests and still maintains a reasonably high power even when the random effects do not follow a normal distribution. The application to GAW17 data also demonstrates that the proposed LRT has a higher probability to identify the association signals than the score test and the tests based on mixtures.
Conclusions: In the present paper the permutation-based LRT was developed for variance component in GLMM. The LRT outperforms existing tests and has a reasonably higher power under various scenarios; additionally, it is conceptually simple and easy to implement.
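The permutation procedure mentioned in this abstract follows a general pattern that can be sketched independently of the test statistic: shuffle the phenotype labels, recompute the statistic, and count how often the permuted value is at least as large as the observed one. The Python sketch below uses a simple absolute difference in group means as a stand-in statistic; it is not the penalized quasi-likelihood LRT of the paper, and the data, the function names `permutation_pvalue` and `mean_difference`, and the choice of 999 permutations are assumptions for illustration.

```python
import random


def permutation_pvalue(statistic, labels, data, n_perm=999, seed=0):
    """Estimate a p-value by permuting labels and recomputing the statistic."""
    rng = random.Random(seed)
    observed = statistic(labels, data)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled, data) >= observed:
            exceed += 1
    # Adding 1 to numerator and denominator counts the observed labelling itself
    # and keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


def mean_difference(labels, scores):
    """Stand-in statistic: absolute difference in mean score, cases vs controls."""
    cases = [s for l, s in zip(labels, scores) if l == 1]
    controls = [s for l, s in zip(labels, scores) if l == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))


# Hypothetical data: 6 cases (label 1) and 6 controls (label 0) with a
# per-subject multilocus score, e.g. a count of minor alleles in a region.
labels = [1] * 6 + [0] * 6
scores = [3, 4, 2, 5, 3, 4, 0, 1, 0, 1, 2, 0]

p = permutation_pvalue(mean_difference, labels, scores)
print(f"permutation p-value: {p:.3f}")
```

Because the null distribution is built from the data themselves, the same wrapper works for any statistic whose distribution is invariant to relabelling under the null, which is why permutation is attractive when an analytic null distribution is hard to derive.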
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, People's Republic of China
  - Department of Epidemiology and Biostatistics, Center of Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical College, Xuzhou, 221004, Jiangsu, People's Republic of China
- Yang Zhao
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, People's Republic of China
- Hongliang Li
  - Center for Disease Control and Prevention of Pudong New Area, Pudong New Area, Shanghai, 200136, People's Republic of China
- Ting Wang
  - Department of Epidemiology and Biostatistics, Center of Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical College, Xuzhou, 221004, Jiangsu, People's Republic of China
- Feng Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, 211166, Jiangsu, People's Republic of China
126
Zhang Q. Associating rare genetic variants with human diseases. Front Genet 2015; 6:133. [PMID: 25904936 PMCID: PMC4389536 DOI: 10.3389/fgene.2015.00133] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/15/2015] [Accepted: 03/19/2015] [Indexed: 11/20/2022]
Affiliation(s)
- Qunyuan Zhang
  - Division of Statistical Genomics, Washington University School of Medicine, St. Louis, MO, USA
127
Zeng P, Zhao Y, Liu J, Liu L, Zhang L, Wang T, Huang S, Chen F. Likelihood ratio tests in rare variant detection for continuous phenotypes. Ann Hum Genet 2015; 78:320-32. [PMID: 25117149 DOI: 10.1111/ahg.12071] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 12/27/2013] [Accepted: 04/22/2014] [Indexed: 12/30/2022]
Abstract
It is believed that rare variants play an important role in human phenotypes; however, the detection of rare variants is extremely challenging due to their very low minor allele frequency. In this paper, the likelihood ratio test (LRT) and restricted likelihood ratio test (ReLRT) are proposed to test the association of rare variants based on the linear mixed effects model, where a group of rare variants are treated as random effects. Like the sequence kernel association test (SKAT), a state-of-the-art method for rare variant detection, LRT and ReLRT can effectively overcome the problem of directionality of effect inherent in the burden test in practice. By taking full advantage of the spectral decomposition, exact finite sample null distributions for LRT and ReLRT are obtained by simulation. We perform extensive numerical studies to evaluate the performance of LRT and ReLRT, and compare them with the burden test, SKAT and SKAT-O. The simulations have shown that LRT and ReLRT can correctly control the type I error, and that this control is robust to the weights chosen and the number of rare variants under study. LRT and ReLRT behave similarly to the burden test when all the causal rare variants share the same direction of effect, and outperform SKAT across various situations. When both positive and negative effects exist, LRT and ReLRT suffer only small power reductions compared to the other two competing methods; in this case, an additional finding from our simulations is that SKAT-O is no longer the optimal test, and its power is even lower than that of SKAT. The exome sequencing SNP data from Genetic Analysis Workshop 17 were employed to illustrate the proposed methods, and interesting results are described.
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, 211166, P. R. China
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, 221004, P. R. China
128
Strike LT, Couvy-Duchesne B, Hansell NK, Cuellar-Partida G, Medland SE, Wright MJ. Genetics and Brain Morphology. Neuropsychol Rev 2015; 25:63-96. [DOI: 10.1007/s11065-015-9281-1] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 10/31/2014] [Accepted: 02/08/2015] [Indexed: 12/17/2022]
129
|
Breheny P. The group exponential lasso for bi-level variable selection. Biometrics 2015; 71:731-40. [DOI: 10.1111/biom.12300] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/01/2014] [Revised: 11/01/2014] [Accepted: 02/01/2015] [Indexed: 11/29/2022]
Affiliation(s)
- Patrick Breheny
- Department of Biostatistics; University of Iowa; 145 N. Riverside Dr., N336 CPHB Iowa City, Iowa 52242 U.S.A
|
130
|
Wang X, Zhang S, Li Y, Li M, Sha Q. A powerful approach to test an optimally weighted combination of rare variants in admixed populations. Genet Epidemiol 2015; 39:294-305. [PMID: 25758547 DOI: 10.1002/gepi.21894] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2014] [Revised: 01/09/2015] [Accepted: 01/26/2015] [Indexed: 11/09/2022]
Abstract
Population stratification has long been recognized as an issue in genetic association studies because unrecognized population stratification can lead to both false-positive and false-negative findings and can obscure true association signals if not appropriately corrected. This issue can be even worse in rare variant association analyses because rare variants often demonstrate stronger and potentially different patterns of stratification than common variants. To correct for population stratification in genetic association studies, we propose a novel method to Test the effect of an Optimally Weighted combination of variants in Admixed populations (TOWA), in which the analytically derived optimal weights can be calculated from existing phenotype and genotype data. TOWA up-weights rare variants and those variants that have strong associations with the phenotype. Additionally, it can adjust for the direction of the association, and allows for local ancestry differences among study subjects. Extensive simulations show that the type I error rate of TOWA is under control in the presence of population stratification and that it is more powerful than existing methods. We have also applied TOWA to a real sequencing data set. Our simulation studies as well as real data analysis results indicate that TOWA is a useful tool for rare variant association analyses in admixed populations.
Affiliation(s)
- Xuexia Wang
- Joseph J. Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin, United States of America
|
131
|
Li Z, Huang Y, Li H, Hu J, Liu X, Jiang T, Sun G, Tang A, Sun X, Qian W, Zeng Y, Xie J, Zhao W, Xu Y, He T, Dong C, Liu Q, Mou L, Lu J, Lin Z, Wu S, Gao S, Guo G, Feng Q, Li Y, Zhang X, Wang J, Yang H, Wang J, Xiong C, Cai Z, Gui Y. Excess of rare variants in genes that are key epigenetic regulators of spermatogenesis in the patients with non-obstructive azoospermia. Sci Rep 2015; 5:8785. [PMID: 25739334 PMCID: PMC4350091 DOI: 10.1038/srep08785] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2014] [Accepted: 02/04/2015] [Indexed: 01/08/2023] Open
Abstract
Non-obstructive azoospermia (NOA), a severe form of male infertility, is often suspected to be linked to currently undefined genetic abnormalities. To explore the genetic basis of this condition, we successfully sequenced ~650 infertility-related genes in 757 NOA patients and 709 fertile males. We evaluated the contributions of rare variants to the etiology of NOA by identifying individual genes showing nominal associations and testing the genetic burden of a given biological process as a whole. We found a significant excess of rare, non-silent variants in genes that are key epigenetic regulators of spermatogenesis, such as BRWD1, DNMT1, DNMT3B, RNF17, UBR2, USP1 and USP26, in NOA patients (P = 5.5 × 10(-7)), corresponding to a carrier frequency of 22.5% of patients and 13.7% of controls (P = 1.4 × 10(-5)). An accumulation of low-frequency variants was also identified in additional epigenetic genes (BRDT and MTHFR). Our study suggests potential associations between genetic defects in genes that are epigenetic regulators and spermatogenic failure in humans.
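The carrier-frequency comparison reported in this abstract can be approximated with a pooled two-proportion z-test. This is only a sketch: the abstract does not say which test the authors used, and the carrier counts below are reconstructed by rounding the reported percentages against the reported sample sizes, which is an assumption. The function name is hypothetical.

```python
import math


def two_proportion_z_test(carriers_a, n_a, carriers_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided P-value)."""
    p_a = carriers_a / n_a
    p_b = carriers_b / n_b
    pooled = (carriers_a + carriers_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Sample sizes are from the abstract; counts are ASSUMED from the percentages.
n_patients, n_controls = 757, 709
carriers_patients = round(0.225 * n_patients)
carriers_controls = round(0.137 * n_controls)

z, p_value = two_proportion_z_test(
    carriers_patients, n_patients, carriers_controls, n_controls
)
print(f"z = {z:.2f}, two-sided P = {p_value:.2e}")
```

Under these assumptions the result falls in the same order of magnitude as the reported P = 1.4 × 10(-5), although agreement with the authors' exact procedure should not be inferred from that.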
Affiliation(s)
- Zesong Li
- [1] Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Institute of Urology, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China [2] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [3] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Yi Huang
- [1] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [2] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Honggang Li
- Family Planning Research Institute/The Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Xiao Liu
- BGI-Shenzhen, Shenzhen 518083, China
- Tao Jiang
- BGI-Shenzhen, Shenzhen 518083, China
- Aifa Tang
- [1] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [2] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Xiaojuan Sun
- [1] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [2] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Weiping Qian
- The Center of Reproductive Medicine, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Yong Zeng
- Shenzhen Key Laboratory of Reproductive Immunology for Peri-implantation, Shenzhen Zhongshan Urology Hospital, Shenzhen 518045, China
- Jun Xie
- Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Institute of Urology, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Wei Zhao
- BGI-Shenzhen, Shenzhen 518083, China
- Yu Xu
- BGI-Shenzhen, Shenzhen 518083, China
- Qunlong Liu
- The Center of Reproductive Medicine, Peking University Shenzhen Hospital, Shenzhen 518036, China
- Lisha Mou
- [1] Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Institute of Urology, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China [2] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [3] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Jingxiao Lu
- [1] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [2] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Zheguang Lin
- Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Institute of Urology, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Song Wu
- [1] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [2] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Jun Wang
- BGI-Shenzhen, Shenzhen 518083, China
- Jian Wang
- BGI-Shenzhen, Shenzhen 518083, China
- Chengliang Xiong
- Family Planning Research Institute/The Center of Reproductive Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Zhiming Cai
- [1] Shenzhen Key Laboratory of Genitourinary Cancer, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China [2] National-Regional Engineering Laboratory for Clinical Application of Cancer Genomics, Shenzhen Second People's Hospital, the First Affiliated Hospital of Shenzhen University, Shenzhen 518035, China
- Yaoting Gui
- Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Institute of Urology, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
|
132
|
Pham PH, Shipman WJ, Erikson GA, Schork NJ, Torkamani A. Scripps Genome ADVISER: Annotation and Distributed Variant Interpretation SERver. PLoS One 2015; 10:e0116815. [PMID: 25706643 PMCID: PMC4338027 DOI: 10.1371/journal.pone.0116815] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2014] [Accepted: 12/01/2014] [Indexed: 12/31/2022] Open
Abstract
Interpretation of human genomes is a major challenge. We present the Scripps Genome ADVISER (SG-ADVISER) suite, which aims to fill the gap between data generation and genome interpretation by performing holistic, in-depth, annotations and functional predictions on all variant types and effects. The SG-ADVISER suite includes a de-identification tool, a variant annotation web-server, and a user interface for inheritance and annotation-based filtration. SG-ADVISER allows users with no bioinformatics expertise to manipulate large volumes of variant data with ease--without the need to download large reference databases, install software, or use a command line interface. SG-ADVISER is freely available at genomics.scripps.edu/ADVISER.
Affiliation(s)
- Phillip H. Pham
- Cypher Genomics, Inc., La Jolla, CA 92037, United States of America
- William J. Shipman
- Scripps Health, La Jolla, CA 92037, United States of America
- The Scripps Translational Science Institute, La Jolla, CA 92037, United States of America
- Galina A. Erikson
- Scripps Health, La Jolla, CA 92037, United States of America
- The Scripps Translational Science Institute, La Jolla, CA 92037, United States of America
- Nicholas J. Schork
- Scripps Health, La Jolla, CA 92037, United States of America
- The Scripps Translational Science Institute, La Jolla, CA 92037, United States of America
- The Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA 92037, United States of America
- Cypher Genomics, Inc., La Jolla, CA 92037, United States of America
- Ali Torkamani
- Scripps Health, La Jolla, CA 92037, United States of America
- The Scripps Translational Science Institute, La Jolla, CA 92037, United States of America
- The Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA 92037, United States of America
- Cypher Genomics, Inc., La Jolla, CA 92037, United States of America
|
133
|
Abstract
Genome-wide association studies (GWASs) have successfully uncovered thousands of robust associations between common variants and complex traits and diseases. Despite these successes, much of the heritability of these traits remains unexplained. Because low-frequency and rare variants are not tagged by conventional genome-wide genotyping arrays, they may represent an important and understudied component of complex trait genetics. In contrast to common variant GWASs, there are many different types of study designs, assays and analytic techniques that can be utilized for rare variant association studies (RVASs). In this review, we briefly present the different technologies available to identify rare genetic variants, including novel exome arrays. We also compare the different study designs for RVASs and argue that the best design will likely be phenotype-dependent. We discuss the main analytical issues relevant to RVASs, including the different statistical methods that can be used to test genetic associations with rare variants and the various bioinformatic approaches to predicting in silico biological functions for variants. Finally, we describe recent rare variant association findings, highlighting the unexpected conclusion that most rare variants have modest-to-small effect sizes on phenotypic variation. This observation has major implications for our understanding of the genetic architecture of complex traits in the context of the unexplained heritability challenge.
Affiliation(s)
- Paul L Auer
- School of Public Health, University of Wisconsin-Milwaukee, Milwaukee, WI 53201-0413 USA
- Guillaume Lettre
- Montreal Heart Institute and Université de Montréal, Montreal, Quebec H1T 1C8 Canada
|
134
|
Garner C. Confounded by sequencing depth in association studies of rare alleles. Genet Epidemiol 2011; 35:261-8. [PMID: 21328616 DOI: 10.1002/gepi.20574] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2010] [Accepted: 01/12/2011] [Indexed: 11/12/2022]
Abstract
Next-generation DNA sequencing technologies are facilitating large-scale association studies of rare genetic variants. The depth of the sequence read coverage is an important experimental variable in the next-generation technologies and it is a major determinant of the quality of genotype calls generated from sequence data. When case and control samples are sequenced separately or in different proportions across batches, they are unlikely to be matched on sequencing read depth and a differential misclassification of genotypes can result, causing confounding and an increased false-positive rate. Data from Pilot Study 3 of the 1000 Genomes project was used to demonstrate that a difference between the mean sequencing read depth of case and control samples can result in false-positive association for rare and uncommon variants, even when the mean coverage depth exceeds 30× in both groups. The degree of the confounding and inflation in the false-positive rate depended on the extent to which the mean depth was different in the case and control groups. A logistic regression model was used to test for association between case-control status and the cumulative number of alleles in a collapsed set of rare and uncommon variants. Including each individual's mean sequence read depth across the variant sites in the logistic regression model nearly eliminated the confounding effect and the inflated false-positive rate. Furthermore, accounting for the potential error by modeling the probability of the heterozygote genotype calls in the regression analysis had a relatively minor but beneficial effect on the statistical results.
Affiliation(s)
- Chad Garner
- Department of Epidemiology, University of California, Irvine, CA 92697-3905, USA.
|
135
|
Kotze MJ, Lückhoff HK, Peeters AV, Baatjes K, Schoeman M, van der Merwe L, Grant KA, Fisher LR, van der Merwe N, Pretorius J, van Velden DP, Myburgh EJ, Pienaar FM, van Rensburg SJ, Yako YY, September AV, Moremi KE, Cronje FJ, Tiffin N, Bouwens CSH, Bezuidenhout J, Apffelstaedt JP, Hough FS, Erasmus RT, Schneider JW. Genomic medicine and risk prediction across the disease spectrum. Crit Rev Clin Lab Sci 2015; 52:120-37. [PMID: 25597499 DOI: 10.3109/10408363.2014.997930] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
Genomic medicine is based on the knowledge that virtually every medical condition, disease susceptibility or response to treatment is caused, regulated or influenced by genes. Genetic testing may therefore add value across the disease spectrum, ranging from single-gene disorders with a Mendelian inheritance pattern to complex multi-factorial diseases. The critical factors for genomic risk prediction are to determine: (1) where the genomic footprint of a particular susceptibility or dysfunction resides within this continuum, and (2) to what extent the genetic determinants are modified by environmental exposures. Regarding the small subset of highly penetrant monogenic disorders, a positive family history and early disease onset are mostly sufficient to determine the appropriateness of genetic testing in the index case and to inform pre-symptomatic diagnosis in at-risk family members. In more prevalent polygenic non-communicable diseases (NCDs), the use of appropriate eligibility criteria is required to ensure a balance between benefit and risk. An additional screening step may therefore be necessary to identify individuals most likely to benefit from genetic testing. This need provided the stimulus for the development of a pathology-supported genetic testing (PSGT) service as a new model for the translational implementation of genomic medicine in clinical practice. PSGT is linked to the establishment of a research database proven to be an invaluable resource for the validation of novel and previously described gene-disease associations replicated in the South African population for a broad range of NCDs associated with increased cardio-metabolic risk. The clinical importance of inquiry concerning family history in determining eligibility for personalized genotyping was supported beyond its current limited role in diagnosing or screening for monogenic subtypes of NCDs. 
With the recent introduction of advanced microarray-based breast cancer subtyping, genetic testing has extended beyond the genome of the host to also include tumor gene expression profiling for chemotherapy selection. The decreasing cost of next generation sequencing over recent years, together with improvement of both laboratory and computational protocols, enables the mapping of rare genetic disorders and discovery of shared genetic risk factors as novel therapeutic targets across diagnostic boundaries. This article reviews the challenges, successes, increasing inter-disciplinary integration and evolving strategies for extending PSGT towards exome and whole genome sequencing (WGS) within a dynamic framework. Specific points of overlap are highlighted between the application of PSGT and exome or WGS, as the next logical step in genetically uncharacterized patients for whom a particular disease pattern and/or therapeutic failure are not adequately accounted for during the PSGT pre-screen. Discrepancies between different next generation sequencing platforms and low concordance among variant-calling pipelines caution against offering exome or WGS as a stand-alone diagnostic approach. The public reference human genome sequence (hg19) contains minor alleles at more than 1 million loci and variant calling using an advanced major allele reference genome sequence is crucial to ensure data integrity. Understanding that genomic risk prediction is not deterministic but rather probabilistic provides the opportunity for disease prevention and targeted treatment in a way that is unique to each individual patient.
Affiliation(s)
- Maritha J Kotze
- Division of Anatomical Pathology, Department of Pathology, Faculty of Medicine and Health Sciences, Stellenbosch University , Cape Town , South Africa
|
136
|
Morris BJ. Renin, genes, microRNAs, and renal mechanisms involved in hypertension. Hypertension 2015; 65:956-62. [PMID: 25601934 DOI: 10.1161/hypertensionaha.114.04366] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2014] [Accepted: 12/23/2014] [Indexed: 12/20/2022]
Affiliation(s)
- Brian J Morris
- Basic & Clinical Genomics Laboratory, School of Medical Sciences and Bosch Institute, The University of Sydney, Sydney, New South Wales, Australia
|
137
|
Castellanos-Rizaldos E, Richardson K, Lin R, Wu G, Makrigiorgos MG. Single-tube, highly parallel mutation enrichment in cancer gene panels by use of temperature-tolerant COLD-PCR. Clin Chem 2015; 61:267-77. [PMID: 25297854 PMCID: PMC4281501 DOI: 10.1373/clinchem.2014.228361] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/19/2023]
Abstract
BACKGROUND Multiplexed detection of low-level mutations presents a technical challenge for many technologies, including cancer gene panels used for targeted-resequencing. Analysis of mutations below approximately 2%-5% abundance in tumors with heterogeneity, samples with stromal contamination, or biofluids is problematic owing to increased noise from sequencing errors. Technologies that reduce noise via deep sequencing unavoidably reduce throughput and increase cost. Here we provide proof of principle that coamplification at lower denaturation temperature (COLD)-PCR technology enables multiplex low-level mutation detection in cancer gene panels while retaining throughput. METHODS We have developed a multiplex temperature-tolerant COLD-PCR (fast-TT-COLD-PCR) approach that uses cancer gene panels developed for massively parallel sequencing. After multiplex preamplification from genomic DNA, we attach tails to all amplicons and perform fast-TT-COLD-PCR. This approach gradually increases denaturation temperatures in a step-wise fashion, such that all possible denaturation temperatures are encompassed. By introducing modified nucleotides, fast-COLD-PCR is adapted to enrich for melting temperature (Tm)-increasing mutations over all amplicons, in a single tube. Therefore, in separate reactions, both Tm-decreasing and Tm-increasing mutations are enriched. RESULTS Using custom-made and commercial gene panels containing 8, 50, 190, or 16 000 amplicons, we demonstrate that fast-TT-COLD-PCR enriches mutations on all examined targets simultaneously. Incorporation of deoxyinosine triphosphate (dITP)/2,6-diaminopurine triphosphate (dDTP) in place of deoxyguanosine triphosphate (dGTP)/deoxyadenosine triphosphate (dATP) enables enrichment of Tm-increasing mutations. Serial dilution experiments demonstrate a limit of detection of approximately 0.01%-0.1% mutation abundance by use of Ion-Torrent and 0.1%-0.3% by use of Sanger sequencing. 
CONCLUSIONS Fast-TT-COLD-PCR improves the limit of detection of cancer gene panels by enabling mutation enrichment in multiplex, single-tube reactions. This novel adaptation of COLD-PCR converts subclonal mutations to clonal, thereby facilitating detection and subsequent mutation sequencing.
Affiliation(s)
- Rui Lin
- Transgenomic Inc., Omaha, NE
- Mike G Makrigiorgos
- Division of DNA Repair and Genome Stability and Division of Medical Physics and Biophysics, Department of Radiation Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA
|
138
|
A Genomic Data Fusion Framework to Exploit Rare and Common Variants for Association Discovery. Artif Intell Med 2015. [DOI: 10.1007/978-3-319-19551-3_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
|
139
|
Lin WY. Adaptive combination of P-values for family-based association testing with sequence data. PLoS One 2014; 9:e115971. [PMID: 25541952 PMCID: PMC4277421 DOI: 10.1371/journal.pone.0115971] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2014] [Accepted: 12/01/2014] [Indexed: 12/24/2022] Open
Abstract
Family-based study design will play a key role in identifying rare causal variants, because rare causal variants can be enriched in families with multiple affected subjects. Furthermore, different from population-based studies, family studies are robust to bias induced by population substructure. It is well known that rare causal variants are difficult to detect from single-locus tests. Therefore, burden tests and non-burden tests have been developed, by combining signals of multiple variants in a chromosomal region or a functional unit. This inevitably incorporates some neutral variants into the test statistics, which can dilute the power of statistical methods. To guard against the noise caused by neutral variants, we here propose an 'adaptive combination of P-values method' (abbreviated as 'ADA'). This method combines per-site P-values of variants that are more likely to be causal. Variants with large P-values (which are more likely to be neutral variants) are discarded from the combined statistic. In addition to performing extensive simulation studies, we applied these tests to the Genetic Analysis Workshop 17 data sets, where real sequence data were generated according to the 1000 Genomes Project. Compared with some existing methods, ADA is more robust to the inclusion of neutral variants. This is a merit especially when dichotomous traits are analyzed. However, there are some limitations for ADA. First, it is more computationally intensive. Second, pedigree structures and founders' sequence data are required for the permutation procedure. Third, unrelated controls cannot be included. We here show that, for family-based studies, the application of ADA is limited to dichotomous trait analyses with full pedigree information.
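The core idea stated in this abstract, discarding variants with large per-site P-values before combining the rest, can be sketched as a truncated Fisher-type statistic. This is a simplified illustration rather than the ADA procedure itself: the single fixed threshold, the `-log(p)` combination, the function names and the hand-written null statistics are all assumptions, and the abstract indicates that the real method relies on a permutation procedure using pedigree structure.

```python
import math


def truncated_log_stat(pvalues, threshold):
    """Sum -log(p) over per-site P-values at or below `threshold`.

    Larger P-values (more likely neutral variants) are discarded.
    """
    return sum(-math.log(p) for p in pvalues if p <= threshold)


def empirical_pvalue(observed, null_stats):
    """Share of null statistics at least as large as the observed one.

    The +1 terms keep the estimate away from exactly zero.
    """
    exceed = sum(1 for s in null_stats if s >= observed)
    return (exceed + 1) / (len(null_stats) + 1)


# Hypothetical per-site P-values for one gene or region.
per_site = [0.001, 0.02, 0.6, 0.9]
stat = truncated_log_stat(per_site, threshold=0.05)

# Hypothetical statistics from permuted data sets (placeholders only).
null_stats = [1.0, 2.0, 12.0, 3.0]
p_emp = empirical_pvalue(stat, null_stats)
print(stat, p_emp)
```

In practice the null statistics would come from permutations that respect the family structure, which is why the abstract lists pedigree and founder sequence data as requirements.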
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
|
140
|
Norden-Krichmar TM, Gizer IR, Wilhelmsen KC, Schork NJ, Ehlers CL. Protective variant associated with alcohol dependence in a Mexican American cohort. BMC Med Genet 2014; 15:136. [PMID: 25527893 PMCID: PMC4337107 DOI: 10.1186/s12881-014-0136-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 07/18/2014] [Accepted: 12/08/2014] [Indexed: 01/11/2023]
Abstract
Background: Mexican Americans, particularly those born in the United States, are at greater risk for alcohol-associated morbidity and mortality. The present study sought to investigate whether specific genetic variants may be associated with alcohol use disorder phenotypes in a select population of Mexican American young adults. Methods: The study evaluated a cohort of 427 Mexican American men (n = 171) and women (n = 256) aged 18-30 years. Information on alcohol dependence was obtained through interview using the Semi-Structured Assessment for the Genetics of Alcoholism (SSAGA). For all subjects, DNA was extracted from blood samples, followed by genotyping using an Affymetrix Axiom Exome1A chip. Results: A protective variant (rs991316) located downstream from the ADH7 (alcohol dehydrogenase 7) gene showed suggestive significance in association with alcohol dependence symptom counts derived from DSM-III-R and DSM-IV criteria, as well as with clustered alcohol dependence symptoms. Additional linkage analysis suggested that nearby variants in linkage disequilibrium with rs991316 were not responsible for the observed association with the alcohol dependence phenotypes in this study. Conclusions: ADH7 has been shown to have a protective role against alcohol dependence in previous studies involving other ethnicities, but has not been reported for Mexican Americans. These results suggest that variants near ADH7 may play a role in protection from alcohol dependence in this Mexican American cohort.
|
141
|
Xu M, Wang HZ, Guo W, Qin H, Shugart YY. Family-based tests applied to extended pedigrees identify rare variants related to hypertension. BMC Proc 2014; 8:S31. [PMID: 25519318 PMCID: PMC4143699 DOI: 10.1186/1753-6561-8-s1-s31] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The application of family-based tests to whole-genome sequenced data provides a new window on the role of rare variant alleles in the etiology of disease. By applying family-based tests to these data, we can now identify rare variants associated with disease. Approaches for common variants, by contrast, require large sample sizes for power, and are powerless when faced with rare variants. When we tested Yip et al's 2011 family-based association tests for rare variants on pedigrees from the Genetic Analysis Workshop 18, we found that weighted collapsing methods generally have more power than unweighted methods, but are more prone to type I errors. We then evaluated a sliding window modification of the weighted family-based association tests for rare variants method. Although this modification inflates the rate of false positives, it significantly increases the power of family-based association tests for rare variants to identify causal rare variants.
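The contrast between weighted and unweighted collapsing drawn in this abstract can be sketched as follows. The abstract does not state which weights were used, so the frequency-based weight `1 / sqrt(q * (1 - q))` below is an assumption chosen for illustration, as are the toy genotypes and function names. The sketch also ignores family structure entirely, which the family-based tests in the abstract do account for.

```python
import math


def minor_allele_frequencies(genotypes):
    """Minor allele frequency per variant from 0/1/2 allele counts."""
    n = len(genotypes)
    n_variants = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / (2 * n) for j in range(n_variants)]


def frequency_weights(mafs):
    """ASSUMED weighting: rarer variants get larger weights.

    Monomorphic variants (q == 0) must be filtered out beforehand.
    """
    return [1.0 / math.sqrt(q * (1.0 - q)) for q in mafs]


def collapse(genotypes, weights):
    """Collapse each individual's genotypes into one weighted score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


# Hypothetical genotypes: 4 individuals (rows) by 2 rare variants (columns).
genotypes = [[1, 0], [0, 0], [0, 1], [0, 1]]

mafs = minor_allele_frequencies(genotypes)
weights = frequency_weights(mafs)
unweighted = collapse(genotypes, [1.0, 1.0])
weighted = collapse(genotypes, weights)
print(mafs, unweighted, weighted)
```

Here the carrier of the rarer variant receives a higher weighted score than carriers of the more frequent one, even though all three carriers have the same unweighted score.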
Affiliation(s)
- Mengyuan Xu
- Division of Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Building 35, Room 3A 1000, 35 Convent Drive, Bethesda, MD 20892, USA
- Harold Z Wang
- Division of Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Building 35, Room 3A 1000, 35 Convent Drive, Bethesda, MD 20892, USA
- Wei Guo
- Division of Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Building 35, Room 3A 1000, 35 Convent Drive, Bethesda, MD 20892, USA
- Haide Qin
- Division of Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Building 35, Room 3A 1000, 35 Convent Drive, Bethesda, MD 20892, USA
- Yin Y Shugart
- Division of Intramural Research Program, National Institute of Mental Health, National Institutes of Health, Building 35, Room 3A 1000, 35 Convent Drive, Bethesda, MD 20892, USA
|
142
|
Li C, Yang C, Chen M, Chen X, Hou L, Zhao H. Adjustment of familial relatedness in association test for rare variants. BMC Proc 2014; 8:S39. [PMID: 25519384 PMCID: PMC4143885 DOI: 10.1186/1753-6561-8-s1-s39] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open
Abstract
High-throughput sequencing technology allows researchers to test associations between phenotypes and all the variants identified throughout the genome, and is especially useful for analyzing rare variants. However, the statistical power to identify phenotype-associated rare variants is very low with typical genome-wide association studies because of their low allele frequencies among unrelated individuals. In contrast, a family-based design may have more power because rare variants are more likely to be enriched in families than among unrelated individuals. Regardless, an analysis of family-based association studies needs to account appropriately for relatedness between family members. We analyzed the observed quantitative trait systolic blood pressure as well as the simulated Q1 data in the Genetic Analysis Workshop 18 data set using 4 tests: (a) a single-variant test, (b) a collapsing test, (c) a single-variant test where familial relatedness was accounted for, and (d) a collapsing test where familial relatedness was accounted for. We then compared the results of the 4 methods and observed that adjusting for familial relatedness could appropriately control the false-positive rate while maintaining reasonable power to detect several strongly associated variants/genes.
Affiliation(s)
- Cong Li
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Can Yang
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06520, USA
- Mengjie Chen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Xiaowei Chen
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520, USA
- Lin Hou
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06520, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, 60 College Street, New Haven, CT 06520, USA
|
143
|
Spataro N, Calafell F, Cervera-Carles L, Casals F, Pagonabarraga J, Pascual-Sedano B, Campolongo A, Kulisevsky J, Lleó A, Navarro A, Clarimón J, Bosch E. Mendelian genes for Parkinson's disease contribute to the sporadic forms of the disease. Hum Mol Genet 2014; 24:2023-34. [PMID: 25504046 DOI: 10.1093/hmg/ddu616] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 12/30/2022] Open
Affiliation(s)
- Nino Spataro
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Francesc Calafell
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Laura Cervera-Carles
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Ferran Casals
  - Genomics Core Facility, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain
- Javier Pagonabarraga
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Berta Pascual-Sedano
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Antònia Campolongo
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Jaime Kulisevsky
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain, Health Sciences Department, Universitat Oberta de Catalunya, Catalonia, Spain
- Alberto Lleó
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Arcadi Navarro
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain, National Institute for Bioinformatics (INB), Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain and Center for Genomic Regulation (CRG), Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain
- Jordi Clarimón
  - Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain, Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Elena Bosch
  - Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
|
144
|
Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study. J Biomed Res 2014; 29:285-97. [PMID: 26243515 PMCID: PMC4547377 DOI: 10.7555/jbr.29.20140007] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 01/15/2014] [Revised: 06/07/2014] [Accepted: 09/27/2014] [Indexed: 12/19/2022] Open
Abstract
In the past few years, genome-wide association studies (GWAS) have achieved great success in identifying genetic susceptibility loci underlying many complex diseases and traits. The findings provide important genetic insights into the pathogenesis of diseases. In this paper, we present an overview of widely used approaches and strategies for the analysis of GWAS and offer general considerations for dealing with GWAS data. The issues of data quality control, population structure, association analysis, multiple comparison, and visual presentation of GWAS results are discussed; other advanced topics, including missing heritability, meta-analysis, set-based association analysis, copy number variation analysis, and GWAS cohort analysis, are also briefly introduced.
Affiliation(s)
- Ping Zeng
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
  - Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu 221004, China
- Yang Zhao
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Cheng Qian
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liwei Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Ruyang Zhang
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jianwei Gou
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jin Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liya Liu
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Feng Chen
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
|
145
|
Lu M, Lee HS, Hadley D, Huang JZ, Qian X. Logistic Principal Component Analysis for Rare Variants in Gene-Environment Interaction Analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:1020-1028. [PMID: 26357039 DOI: 10.1109/tcbb.2014.2322371] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Low minor allele frequency (MAF) and weak individual effects make genome-wide association studies (GWAS) of rare variant single nucleotide polymorphisms (SNPs) more difficult with conventional statistical methods. Collapsing, which aggregates the rare variant effects belonging to the same gene, is the most common way to enhance the detection of rare variant effects in association analyses with a given trait. In this paper, we propose a novel framework of MAF-based logistic principal component analysis (MLPCA) to derive aggregated statistics by explicitly modeling the correlation between rare variant SNP data, which are categorical. The aggregated statistics derived by MLPCA can then be tested as a surrogate variable in regression models to detect gene-environment interaction from rare variants. In addition, MLPCA searches for the optimal linear combination from the best subset of rare variants, chosen according to MAF, that has the maximum association with the given trait. We compared the power of our MLPCA-based methods with four existing collapsing methods in gene-environment interaction association analysis using both our simulated data set and Genetic Analysis Workshop 17 (GAW17) data. Our experimental results demonstrate that MLPCA on two forms of genotype data representation achieves higher statistical power than the existing methods and can be further improved by introducing an appropriate sparsity penalty. The performance improvement of our MLPCA-based methods results from deriving aggregated statistics by explicitly modeling categorical SNP data and from searching for the maximally associated subset of SNPs for collapsing, which helps better capture the combined effect of individual rare variants and their interaction with environmental factors.
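As a rough illustration of the collapsing idea mentioned in this abstract (not of MLPCA itself), the following minimal Python sketch reduces the rare variants in one gene to a single carrier indicator per individual. The function name, the 1% MAF cutoff, and the toy genotype matrix are all assumptions made for illustration; none of them comes from the cited paper.

```python
def collapse_rare_variants(genotypes, mafs, maf_threshold=0.01):
    """Collapse rare variants into one 0/1 carrier indicator per individual.

    genotypes: one row per individual, holding minor-allele counts (0, 1, 2)
               for each variant in the gene (hypothetical toy encoding).
    mafs:      minor allele frequency of each variant, in the same order.
    """
    # Keep only the variants at or below the (assumed) rarity cutoff.
    rare = [j for j, maf in enumerate(mafs) if maf <= maf_threshold]
    # An individual is a carrier if any rare variant has a minor allele.
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


# Toy data: 4 individuals, 3 variants; the second variant is common.
genotypes = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 2, 0],
    [0, 1, 0],
]
mafs = [0.005, 0.20, 0.008]

collapsed = collapse_rare_variants(genotypes, mafs)
print(collapsed)  # [1, 0, 1, 0]
```

The resulting indicator can be used as a single predictor in a regression model in place of the individual variants. Weighted variants of this scheme replace the 0/1 indicator with a MAF-weighted sum.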
|
146
|
Forstner AJ, Basmanav FB, Mattheisen M, Böhmer AC, Hollegaard MV, Janson E, Strengman E, Priebe L, Degenhardt F, Hoffmann P, Herms S, Maier W, Mössner R, Rujescu D, Ophoff RA, Moebus S, Mortensen PB, Børglum AD, Hougaard DM, Frank J, Witt SH, Rietschel M, Zimmer A, Nöthen MM, Miró X, Cichon S. Investigation of the involvement of MIR185 and its target genes in the development of schizophrenia. J Psychiatry Neurosci 2014; 39:386-96. [PMID: 24936775 PMCID: PMC4214873 DOI: 10.1503/jpn.130189] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND Schizophrenia is a complex neuropsychiatric disorder of unclear etiology. The strongest known genetic risk factor is the 22q11.2 microdeletion. Research has yet to confirm which genes within the deletion region are implicated in schizophrenia. The minimal 1.5 megabase deletion contains MIR185, which encodes microRNA 185. METHODS We determined miR-185 expression in embryonic and adult mouse brains. Common and rare variants at this locus were then investigated using a human genetics approach. First, we performed gene-based analyses for MIR185 common variants and target genes using Psychiatric Genomics Consortium genome-wide association data. Second, MIR185 was resequenced in German patients (n = 1000) and controls (n = 500). We followed up promising variants by genotyping an additional European sample (patients, n = 3598; controls, n = 4082). RESULTS In situ hybridization in mice revealed miR-185 expression in brain regions implicated in schizophrenia. Gene-based tests revealed association between common variants in 3 MIR185 target genes (ATAT1, SH3PXD2A, NTRK3) and schizophrenia. Further analyses in mice revealed overlapping expression patterns for these target genes and miR-185. Resequencing identified 2 rare patient-specific novel variants flanking MIR185. However, follow-up genotyping provided no further evidence of their involvement in schizophrenia. LIMITATIONS Power to detect rare variant associations was limited. CONCLUSION Human genetic analyses generated no evidence of the involvement of MIR185 in schizophrenia. However, the expression patterns of miR-185 and its target genes in mice, and the genetic association results for the 3 target genes, suggest that further research into the involvement of miR-185 and its downstream pathways in schizophrenia is warranted.
Affiliation(s)
- Markus M. Nöthen
  - Correspondence to: M.M. Nöthen, Institute of Human Genetics, University of Bonn, Sigmund-Freud-Str. 25, 53127 Bonn, Germany
|
147
|
Saad M, Wijsman EM. Combining family- and population-based imputation data for association analysis of rare and common variants in large pedigrees. Genet Epidemiol 2014; 38:579-90. [PMID: 25132070 PMCID: PMC4190076 DOI: 10.1002/gepi.21844] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 02/28/2014] [Revised: 05/24/2014] [Accepted: 06/27/2014] [Indexed: 12/27/2022]
Abstract
In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.
Affiliation(s)
- Mohamad Saad
  - Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
- Ellen M. Wijsman
  - Division of Medical Genetics, Department of Medicine; and Department of Biostatistics, University of Washington, Seattle, WA 98195, USA
|
148
|
Sung YJ, Korthauer KD, Swartz MD, Engelman CD. Methods for collapsing multiple rare variants in whole-genome sequence data. Genet Epidemiol 2014; 38 Suppl 1:S13-20. [PMID: 25112183 DOI: 10.1002/gepi.21820] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
Genetic Analysis Workshop 18 provided whole-genome sequence data in a pedigree-based sample and longitudinal phenotype data for hypertension and related traits, presenting an excellent opportunity for evaluating analysis choices. We summarize the nine contributions to the working group on collapsing methods, which evaluated various approaches for the analysis of multiple rare variants. One contributor defined a variant prioritization scheme, whereas the remaining eight contributors evaluated statistical methods for association analysis. Six contributors chose the gene as the genomic region for collapsing variants, whereas three contributors chose nonoverlapping sliding windows across the entire genome. Statistical methods spanned most of the published methods, including well-established burden tests, variance-components-type tests, and recently developed hybrid approaches. Lesser known methods, such as functional principal components analysis, higher criticism, and homozygosity association, and some newly introduced methods were also used. We found that performance of these methods depended on the characteristics of the genomic region, such as effect size and direction of variants under consideration. Except for MAP4 and FLT3, the performance of all statistical methods to identify rare causal variants was disappointingly poor, providing overall power almost identical to the type I error. This poor performance may have arisen from a combination of (1) small sample size, (2) small effects of most of the causal variants, explaining a small fraction of variance, (3) use of incomplete annotation information, and (4) linkage disequilibrium between causal variants in a gene and noncausal variants in nearby genes. Our findings demonstrate challenges in analyzing rare variants identified from sequence data.
Affiliation(s)
- Yun Ju Sung
  - Division of Biostatistics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
149
|
Kim JH, Song P, Lim H, Lee JH, Lee JH, Park SA. Gene-based rare allele analysis identified a risk gene of Alzheimer's disease. PLoS One 2014; 9:e107983. [PMID: 25329708 PMCID: PMC4203677 DOI: 10.1371/journal.pone.0107983] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 04/07/2014] [Accepted: 08/25/2014] [Indexed: 12/17/2022] Open
Abstract
Alzheimer’s disease (AD) has a strong propensity to run in families. However, the known risk genes excluding APOE are not clinically useful. In various complex diseases, gene studies have targeted rare alleles for unsolved heritability. Our study aims to elucidate previously unknown risk genes for AD by targeting rare alleles. We used data from five publicly available genetic studies from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and the database of Genotypes and Phenotypes (dbGaP). A total of 4,171 cases and 9,358 controls were included. The genotype information of rare alleles was imputed using 1,000 genomes. We performed gene-based analysis of rare alleles (minor allele frequency ≤ 3%). The genome-wide significance level was defined as meta P < 1.8×10⁻⁶ (0.05/number of genes in human genome = 0.05/28,517). ZNF628, which is located at chromosome 19q13.42, showed a genome-wide significant association with AD. The association of ZNF628 with AD was not dependent on APOE ε4. APOE and TREM2 were also significantly associated with AD, although not at genome-wide significance levels. Other genes identified by targeting common alleles could not be replicated in our gene-based rare allele analysis. We identified that rare variants in ZNF628 are associated with AD. The protein encoded by ZNF628 is known as a transcription factor. Furthermore, the associations of APOE and TREM2 with AD were highly significant, even in gene-based rare allele analysis, which implies that further deep sequencing of these genes is required in AD heritability studies.
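The genome-wide significance level quoted in this abstract is a Bonferroni-style correction: the nominal 0.05 level divided by the 28,517 genes tested. The short Python sketch below reproduces that arithmetic; the variable and function names are illustrative assumptions, and only the two numbers come from the abstract.

```python
ALPHA = 0.05        # nominal significance level stated in the abstract
N_GENES = 28_517    # number of genes stated in the abstract

# Bonferroni-style per-gene threshold: alpha divided by the number of tests.
threshold = ALPHA / N_GENES
print(f"{threshold:.2e}")  # 1.75e-06, which the abstract rounds to 1.8e-6


def is_genome_wide_significant(p_value, cutoff=threshold):
    """Return True if a gene-level P value falls below the corrected cutoff."""
    return p_value < cutoff
```

The exact quotient is slightly below the rounded 1.8×10⁻⁶ reported, so a gene-level P value between the two would be classified differently depending on which cutoff is applied.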
Affiliation(s)
- Jong Hun Kim
  - Department of Neurology, Dementia Center, Stroke Center, Ilsan hospital, National Health Insurance Service, Goyang-shi, South Korea
- Pamela Song
  - Department of Neurology, Inje University Ilsan Paik Hospital, Goyang-shi, South Korea
- Hyunsun Lim
  - Clinical Research Management Team, Ilsan hospital, National Health Insurance Service, Goyang-shi, South Korea
- Jae-Hyung Lee
  - Department of Life and Nanopharmaceutical Sciences and Department of Maxillofacial Biomedical Engineering, School of Dentistry, Kyung Hee University, Seoul, South Korea
- Jun Hong Lee
  - Department of Neurology, Dementia Center, Stroke Center, Ilsan hospital, National Health Insurance Service, Goyang-shi, South Korea
- Sun Ah Park
  - Department of Neurology, Soonchunhyang University Bucheon Hospital, Bucheon-shi, South Korea
|
150
|
Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian Generalized Low Rank Regression Models for Neuroimaging Phenotypes and Genetic Markers. J Am Stat Assoc 2014. [DOI: 10.1080/01621459.2014.923775] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 10/24/2022]
|