101
Abstract
Rare variants may explain part of the heritability missing from current genome-wide association studies. Many gene-based rare-variant analysis approaches proposed in recent years are aimed at population-based samples, although analysis strategies for family-based samples are clearly warranted because the family-based design has the potential to enrich for rare causal variants. We recently developed the generalized least squares sequence kernel association test (GLS-SKAT) approach for rare-variant analysis in family samples, in which the kinship matrix computed from high-dimensional genetic data is used to decorrelate the family structure. We then applied the SKAT-O approach for gene- or region-based inference in the decorrelated data. In this study, we applied the GLS-SKAT method to the systolic blood pressure data in the simulated family sample distributed by the Genetic Analysis Workshop 18. We compared the GLS-SKAT approach to the rare-variant analysis approach implemented in family-based association test-v1 and demonstrated that the GLS-SKAT approach provides superior power and good control of the type I error rate.
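As a rough illustration of the decorrelation idea described in this abstract, the sketch below whitens a phenotype vector using a kinship matrix. It assumes a phenotype covariance of the form 2·σg²·K + σe²·I with known variance components; that covariance form, the numeric values, and the helper names (`cholesky`, `forward_solve`, `decorrelate`) are assumptions made for illustration and are not taken from the GLS-SKAT paper.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor L of a symmetric positive-definite matrix (a = L L^T)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L x = b for lower-triangular L by forward substitution."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x


def decorrelate(y, kinship, sigma_g2, sigma_e2):
    """Return (L^-1 y, L), where L L^T = 2*sigma_g2*K + sigma_e2*I (assumed covariance)."""
    n = len(y)
    sigma = [
        [2.0 * sigma_g2 * kinship[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    L = cholesky(sigma)
    return forward_solve(L, y), L


# Hypothetical example: two full siblings plus one unrelated individual.
K = [[0.5, 0.25, 0.0], [0.25, 0.5, 0.0], [0.0, 0.0, 0.5]]
y_star, _ = decorrelate([1.0, 2.0, 3.0], K, sigma_g2=0.6, sigma_e2=0.4)
```

Under these assumptions the transformed phenotypes have identity covariance, so a test designed for unrelated samples can be run on them; covariates and genotypes would be transformed with the same factor. For real data a numerical library routine would replace the hand-written Cholesky factorization.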
Affiliation(s)
- Dalin Li
- Medical Genetics Institute, Cedars-Sinai Medical Center, Los Angeles, CA 90048, USA; David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
- Jerome I Rotter
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA; Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
- Xiuqing Guo
- David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA; Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute and Department of Pediatrics, Harbor-UCLA Medical Center, Torrance, CA 90502, USA
102
Spataro N, Calafell F, Cervera-Carles L, Casals F, Pagonabarraga J, Pascual-Sedano B, Campolongo A, Kulisevsky J, Lleó A, Navarro A, Clarimón J, Bosch E. Mendelian genes for Parkinson's disease contribute to the sporadic forms of the disease. Hum Mol Genet 2014; 24:2023-34. [PMID: 25504046 DOI: 10.1093/hmg/ddu616] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/30/2022] Open
Affiliation(s)
- Nino Spataro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Francesc Calafell
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
- Laura Cervera-Carles
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Ferran Casals
- Genomics Core Facility, Universitat Pompeu Fabra, Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain
- Javier Pagonabarraga
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Berta Pascual-Sedano
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Antònia Campolongo
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Jaime Kulisevsky
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain; Health Sciences Department, Universitat Oberta de Catalunya, Catalonia, Spain
- Alberto Lleó
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Arcadi Navarro
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain; National Institute for Bioinformatics (INB), Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain; Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain; Center for Genomic Regulation (CRG), Barcelona Biomedical Research Park (PRBB), 08003 Barcelona, Spain
- Jordi Clarimón
- Department of Neurology, Institut d'Investigacions Biomèdiques Sant Pau-Hospital de Sant Pau, Universitat Autònoma de Barcelona, 08025 Barcelona, Spain; Center for Networking Biomedical Research in Neurodegenerative Diseases (CIBERNED), Madrid, Spain
- Elena Bosch
- Institute of Evolutionary Biology (CSIC-UPF), Department of Experimental and Health Sciences, Universitat Pompeu Fabra, 08003 Barcelona, Spain
103
Chhibber A, Kroetz DL, Tantisira KG, McGeachie M, Cheng C, Plenge R, Stahl E, Sadee W, Ritchie MD, Pendergrass SA. Genomic architecture of pharmacological efficacy and adverse events. Pharmacogenomics 2014; 15:2025-48. [PMID: 25521360 PMCID: PMC4308414 DOI: 10.2217/pgs.14.144] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022] Open
Abstract
The pharmacokinetic and pharmacodynamic disciplines address pharmacological traits, including efficacy and adverse events. Pharmacogenomics studies have identified pervasive genetic effects on treatment outcomes, resulting in the development of genetic biomarkers for optimization of drug therapy. Pharmacogenomics-based tests are already being applied in clinical decision making. However, despite substantial progress in identifying the genetic etiology of pharmacological response, current biomarker panels still largely rely on single gene tests with a large portion of the genetic effects remaining to be discovered. Future research must account for the combined effects of multiple genetic variants, incorporate pathway-based approaches, explore gene-gene interactions and nonprotein coding functional genetic variants, extend studies across ancestral populations, and prioritize laboratory characterization of molecular mechanisms. Because genetic factors can play a key role in drug response, accurate biomarker tests capturing the main genetic factors determining treatment outcomes have substantial potential for improving individual clinical care.
Affiliation(s)
- Aparna Chhibber
- Department of Bioengineering & Therapeutic Sciences, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
- Deanna L Kroetz
- Department of Bioengineering & Therapeutic Sciences, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
- Kelan G Tantisira
- Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Michael McGeachie
- Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Cheng Cheng
- Department of Biostatistics, St Jude Children's Research Hospital, Memphis, TN, USA
- Robert Plenge
- Division of Rheumatology, Immunology & Allergy, Division of Genetics, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Eli Stahl
- Department of Genetics & Genomic Sciences, Mount Sinai Hospital, New York, NY, USA
- Wolfgang Sadee
- Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Marylyn D Ritchie
- Department of Biochemistry & Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16801, USA
- Sarah A Pendergrass
- Department of Biochemistry & Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16801, USA
104
Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study. J Biomed Res 2014; 29:285-97. [PMID: 26243515 PMCID: PMC4547377 DOI: 10.7555/jbr.29.20140007] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 01/15/2014] [Revised: 06/07/2014] [Accepted: 09/27/2014] [Indexed: 12/19/2022] Open
Abstract
In the past few years, genome-wide association studies (GWAS) have had great success in identifying genetic susceptibility loci underlying many complex diseases and traits. The findings provide important genetic insights into the pathogenesis of diseases. In this paper, we present an overview of widely used approaches and strategies for the analysis of GWAS and offer general considerations for dealing with GWAS data. Issues regarding data quality control, population structure, association analysis, multiple comparison and visual presentation of GWAS results are discussed; other advanced topics, including missing heritability, meta-analysis, set-based association analysis, copy number variation analysis and GWAS cohort analysis, are also briefly introduced.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China; Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu 221004, China
- Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Cheng Qian
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Ruyang Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jianwei Gou
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Jin Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Liya Liu
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
- Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu 211166, China
105
Fan R, Wang Y, Mills JL, Carter TC, Lobach I, Wilson AF, Bailey-Wilson JE, Weeks DE, Xiong M. Generalized functional linear models for gene-based case-control association studies. Genet Epidemiol 2014; 38:622-637. [PMID: 25203683 PMCID: PMC4189986 DOI: 10.1002/gepi.21840] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 01/14/2014] [Revised: 04/29/2014] [Accepted: 05/28/2014] [Indexed: 01/23/2023]
Abstract
Using functional data analysis techniques, we developed generalized functional linear models for testing association between a dichotomous trait and multiple genetic variants in a genetic region while adjusting for covariates. Both fixed and mixed effect models are developed and compared. Extensive simulations show that Rao's efficient score tests of the fixed effect models are very conservative, since they generate lower type I errors than nominal levels, whereas global tests of the mixed effect models generate accurate type I errors. Furthermore, we found that Rao's efficient score test statistics of the fixed effect models have higher power than the sequence kernel association test (SKAT) and its optimal unified version (SKAT-O) in most cases when the causal variants include both rare and common variants. When the causal variants are all rare (i.e., minor allele frequencies less than 0.03), Rao's efficient score test statistics and the global tests have similar or slightly lower power than SKAT and SKAT-O. In practice, it is not known whether rare variants or common variants in a gene region are disease related. All we can assume is that a combination of rare and common variants influences disease susceptibility. Thus, the improved performance of our models when the causal variants include both rare and common variants shows that the proposed models can be very useful in dissecting complex traits. We compare the performance of our methods with SKAT and SKAT-O on real neural tube defects and Hirschsprung's disease datasets. Rao's efficient score test statistics and the global tests are more sensitive than SKAT and SKAT-O in the real data analysis. Our methods can be used in either gene-disease genome-wide/exome-wide association studies or candidate gene analyses.
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD 20852
- Yifan Wang
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD 20852
- James L. Mills
- Epidemiology Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD 20852
- Tonia C. Carter
- Center for Human Genetics, Marshfield Clinic, Marshfield, WI 54449
- Iryna Lobach
- Department of Neurology, School of Medicine, University of California, San Francisco, CA 94185
- Alexander F. Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892
- Joan E. Bailey-Wilson
- Statistical Genetics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892
- Daniel E. Weeks
- Departments of Human Genetics and Biostatistics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261
- Momiao Xiong
- Human Genetics Center, University of Texas - Houston, P.O. Box 20334, Houston, Texas 77225
106
Chen H, Malzahn D, Balliu B, Li C, Bailey JN. Testing genetic association with rare and common variants in family data. Genet Epidemiol 2014; 38 Suppl 1:S37-43. [PMID: 25112186 DOI: 10.1002/gepi.21823] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/25/2022]
Abstract
With the advance of next-generation sequencing technologies in recent years, rare genetic variant data have now become available for genetic epidemiology studies. For family samples, however, only a few statistical methods for association analysis of rare genetic variants have been developed. Rare variant approaches are of great interest, particularly for family data, because samples enriched for trait-relevant variants can be ascertained and rare variants are putatively enriched through segregation. To facilitate the evaluation of existing and new rare variant testing approaches for analyzing family data, Genetic Analysis Workshop 18 (GAW18) provided genotype and next-generation sequencing data and longitudinal blood pressure traits from extended pedigrees of Mexican American families from the San Antonio Family Study. Our GAW18 group members analyzed real and simulated phenotype data from GAW18 by using generalized linear mixed-effects models or principal components to adjust for familial correlation or by testing binary traits using a correction factor for familial effects. With one exception, approaches dealt with the extended pedigrees in their original state using information based on the kinship matrix or alternative genetic similarity measures. For simulated data our group demonstrated that the family-based kernel machine score test is superior in power to family-based single-marker or burden tests, except in a few specific scenarios. For real data three contributions identified significant associations. They substantially reduced the number of tests before performing the association analysis. We conclude from our real data analyses that further development of strategies for targeted testing or more focused screening of genetic variants is strongly desirable.
Affiliation(s)
- Han Chen
- Department of Biostatistics, Boston University School of Public Health, Boston, Massachusetts, United States of America
107
Wen SH, Yeh JI. Cohen's h for detection of disease association with rare genetic variants. BMC Genomics 2014; 15:875. [PMID: 25294186 PMCID: PMC4198687 DOI: 10.1186/1471-2164-15-875] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/20/2014] [Accepted: 10/03/2014] [Indexed: 11/16/2022] Open
Abstract
Background: The power of genome-wide association studies begins to decline when the minor allele frequency (MAF) is below 0.05. Here, we propose the use of Cohen’s h for detecting disease-associated rare variants. The variance-stabilizing effect of the arcsine square root transformation of MAFs used to generate Cohen’s h contributes to the statistical power for rare-variant analysis. We re-analyzed published datasets, one microarray-based and one sequencing-based, and used simulation to compare the performance of Cohen’s h with the risk difference (RD) and odds ratio (OR). Results: The analysis showed that the type 1 error rate of Cohen’s h was as expected and that Cohen’s h and RD were both less biased and had higher power than OR. The advantage of Cohen’s h was more pronounced when the MAF was less than 0.01. Conclusions: Cohen’s h can increase the power to find genetic associations between rare variants and diseases, especially when the MAF is less than 0.01.
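For reference, Cohen's h in its standard form is the difference between two arcsine-square-root-transformed proportions, h = 2·arcsin(√p1) − 2·arcsin(√p2). The sketch below applies that standard definition to case and control MAFs; the function name and the example frequencies are hypothetical and are not taken from the paper.

```python
import math


def cohens_h(p_case, p_control):
    """Cohen's h: difference of arcsine-square-root-transformed proportions."""
    return 2.0 * math.asin(math.sqrt(p_case)) - 2.0 * math.asin(math.sqrt(p_control))


# Same risk difference (0.005) at two hypothetical allele-frequency levels.
h_rare = cohens_h(0.010, 0.005)    # rare variant
h_common = cohens_h(0.060, 0.055)  # more common variant
```

With these inputs the risk difference is identical in both comparisons, but `h_rare` is larger than `h_common`, which illustrates why the transformation gives more weight to differences between small proportions.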
Affiliation(s)
- Jih-I Yeh
- Department of Molecular Biology and Human Genetics, Tzu-Chi University, 701, Sec 3, Chung-Yang Rd, Hualien 97004, Taiwan
108
Bao R, Huang L, Andrade J, Tan W, Kibbe WA, Jiang H, Feng G. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Inform 2014; 13:67-82. [PMID: 25288881 PMCID: PMC4179624 DOI: 10.4137/cin.s13779] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 04/22/2014] [Revised: 07/06/2014] [Accepted: 07/07/2014] [Indexed: 12/21/2022] Open
Abstract
The advent of next-generation sequencing technologies has greatly promoted advances in the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, where the coding region of the genome is captured and sequenced at a deep level, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, pre-processing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status and solutions for big data management, analysis, and summarization in the field of bioinformatics.
Affiliation(s)
- Riyue Bao
- Center for Research Informatics, The University of Chicago, Chicago, IL, USA
- Lei Huang
- Center for Research Informatics, The University of Chicago, Chicago, IL, USA
- Jorge Andrade
- Center for Research Informatics, The University of Chicago, Chicago, IL, USA
- Wei Tan
- IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA
- Warren A Kibbe
- Biomedical Informatics Center (NUBIC), Clinical and Translational Sciences Institute (NUCATS), Northwestern University, Chicago, IL, USA
- Hongmei Jiang
- Department of Statistics, Northwestern University, Evanston, IL, USA
- Gang Feng
- Biomedical Informatics Center (NUBIC), Clinical and Translational Sciences Institute (NUCATS), Northwestern University, Chicago, IL, USA
109
Dering C, König IR, Ramsey LB, Relling MV, Yang W, Ziegler A. A comprehensive evaluation of collapsing methods using simulated and real data: excellent annotation of functionality and large sample sizes required. Front Genet 2014; 5:323. [PMID: 25309579 PMCID: PMC4164031 DOI: 10.3389/fgene.2014.00323] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/01/2014] [Accepted: 08/28/2014] [Indexed: 01/23/2023] Open
Abstract
The advent of next-generation sequencing (NGS) technologies enabled the investigation of the rare variant-common disease hypothesis in unrelated individuals, even at the genome-wide level. Analysis of this hypothesis requires tailored statistical methods because single-marker tests fail on rare variants. An entire class of statistical methods collapses, that is, aggregates, rare variants from a genomic region of interest (ROI). In an extensive simulation study using data from the Genetic Analysis Workshop 17, we compared the performance of 15 collapsing methods across a variety of pre-defined ROIs that differed in minor allele frequency thresholds and functionality. Findings of the simulation study were additionally confirmed in a real data set investigating the association between methotrexate clearance and the SLCO1B1 gene in patients with acute lymphoblastic leukemia. Our analyses showed substantially inflated type I error levels for many of the proposed collapsing methods. Only four approaches yielded valid type I errors in all considered scenarios. None of the statistical tests was able to detect true associations over a substantial proportion of replicates in the simulated data. Detailed annotation of the functionality of variants is crucial for detecting true associations. These findings were confirmed in the analysis of the real data. Recent theoretical work showed that large power is achieved in gene-based analyses only if large sample sizes are available and a substantial proportion of causal rare variants is present in the gene-based analysis. Many of the investigated statistical approaches use permutation, which incurs high computational cost. There is a clear need for valid, powerful and fast-to-calculate test statistics for studies investigating rare variants.
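To make the idea of collapsing concrete, the sketch below aggregates the rare variants in one region into a per-individual burden count and a carrier indicator. It is a generic burden-style illustration, not any specific one of the 15 methods evaluated in the study; the function name, the sample-based MAF estimate, and the threshold value are assumptions for illustration.

```python
def collapse_rare_variants(genotypes, maf_threshold=0.01):
    """Collapse rare variants in a region.

    genotypes: list of rows (one per individual), each a list of minor-allele
               counts (0, 1 or 2), one per variant in the region.
    Returns (rare_variant_indices, burden_per_individual, carrier_indicator).
    """
    n = len(genotypes)
    m = len(genotypes[0])
    # Sample minor allele frequency of each variant (2 alleles per individual).
    mafs = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    # Keep polymorphic variants below the frequency threshold.
    rare = [j for j, f in enumerate(mafs) if 0 < f < maf_threshold]
    # Burden: number of rare minor alleles carried by each individual.
    burden = [sum(row[j] for j in rare) for row in genotypes]
    # Indicator: 1 if the individual carries at least one rare allele.
    carrier = [int(b > 0) for b in burden]
    return rare, burden, carrier


# Toy region: 4 individuals, 3 variants (hypothetical data, loose threshold).
toy = [[1, 1, 0], [0, 2, 0], [0, 1, 1], [0, 0, 0]]
rare_idx, burden, carrier = collapse_rare_variants(toy, maf_threshold=0.2)
```

The burden or carrier vector would then be tested against the phenotype in place of the individual variants. Which variants enter the collapse (frequency threshold, functional annotation) is exactly the ROI definition whose importance the abstract emphasizes.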
Affiliation(s)
- Carmen Dering
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany
- Laura B Ramsey
- Pharmaceutical Department, St. Jude Children's Research Hospital, Memphis, TN, USA
- Mary V Relling
- Pharmaceutical Department, St. Jude Children's Research Hospital, Memphis, TN, USA
- Wenjian Yang
- Pharmaceutical Department, St. Jude Children's Research Hospital, Memphis, TN, USA
- Andreas Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Lübeck, Germany; Zentrum für Klinische Studien, Universität zu Lübeck, Lübeck, Germany; School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban, South Africa
110
Abstract
Rare genetic variants have recently been studied for genome-wide associations with human complex diseases. Existing rare variant methods are based on the hypothesis-testing framework that predefined variant sets need to be tested separately. The power of those methods is contingent upon accurate selection of variants for testing, and frequently, common variants are left out for separate testing. In this article, we present a novel Bayesian method for simultaneous testing of all genome-wide variants across the whole frequency range. The method allows for much more flexible grouping of variants and dynamically combines them for joint testing. The method accounts for correlation among variant sets, such that only direct associations with the disease are reported, whereas indirect associations due to linkage disequilibrium are not. Consequently, the method can obtain much improved power and flexibility and simultaneously pinpoint multiple disease variants with high resolution. Additional covariates of categorical, discrete, and continuous values can also be added. We compared our method with seven existing categories of approaches for rare variant mapping. We demonstrate that our method achieves similar power to the best methods available to date when testing very rare variants in small SNP sets. When moderately rare or common variants are included, or when testing a large collection of variants, however, our method significantly outperforms all existing methods evaluated in this study. We further demonstrate the power and the usage of our method in a whole-genome resequencing study of type 1 diabetes.
111
Lin YC, Hsieh AR, Hsiao CL, Wu SJ, Wang HM, Lian IB, Fann CSJ. Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model. J Biomed Sci 2014; 21:88. [PMID: 25175702 PMCID: PMC4428531 DOI: 10.1186/s12929-014-0088-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/25/2014] [Accepted: 08/21/2014] [Indexed: 01/06/2023] Open
Abstract
BACKGROUND: Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson's disease remains unknown, suggesting that many more risk loci are yet to be identified. Rare variants have become important in disease association studies for explaining missing heritability. Methods for detecting this type of association require prior knowledge of candidate genes and combine variants within the region. These methods may suffer from power loss in situations with many neutral variants or causal variants with opposite effects. RESULTS: We propose a method capable of scanning genetic variants to identify the region most likely to harbour a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant based on our scoring system and uses aggregate scores to identify the region with disease association. We evaluate performance by simulation based on 1000 Genomes sequencing data and compare with three commonly used methods. We use a Parkinson's disease case-control dataset as a model to demonstrate the application of our method. Our method has better power than CMC and WSS and similar power to SKAT-O, with well-controlled type I error, under simulation based on 1000 Genomes sequencing data. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson's disease (p = 0.005). We further identify associations with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson's disease. CONCLUSIONS: Our method is time efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a chromosome to a disease-associated region. Using Parkinson's disease as a model, our method not only confirms association for a known gene but also identifies two genes previously found by other studies. In spite of many existing methods, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.
Affiliation(s)
- Ying-Chao Lin
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ai-Ru Hsieh
- Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan
- Ching-Lin Hsiao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Shang-Jung Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Hui-Min Wang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
- Ie-Bin Lian
- Graduate Institute of Statistics and Information Science, National Changhua University of Education, Changhua, Taiwan
- Cathy S J Fann
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan; Institute of Public Health, National Yang-Ming University, Taipei, Taiwan
112
Zhang Q, Wang L, Koboldt D, Borecki IB, Province MA. Adjusting family relatedness in data-driven burden test of rare variants. Genet Epidemiol 2014; 38:722-7. [PMID: 25169066 DOI: 10.1002/gepi.21848] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/28/2014] [Revised: 07/01/2014] [Accepted: 07/16/2014] [Indexed: 11/08/2022]
Abstract
Family data represent a rich resource for detecting association between rare variants (RVs) and human traits. However, most RV association analysis methods developed in recent years are data-driven burden tests which can adaptively learn weights from data but require permutation to evaluate significance, thus are not readily applicable to family data, because random permutation will destroy family structure. Direct application of these methods to family data may result in a significant inflation of false positives. To overcome this issue, we have developed a generalized, weighted sum mixed model (WSMM), and corresponding computational techniques that can incorporate family information into data-driven burden tests, and allow adaptive and efficient permutation test in family data. Using simulated and real datasets, we demonstrate that the WSMM method can be used to appropriately adjust for genetic relatedness among family members and has a good control for the inflation of false positives. We compare WSMM with a nondata-driven, family-based Sequence Kernel Association Test (famSKAT), showing that WSMM has significantly higher power in some cases. WSMM provides a generalized, flexible framework for adapting different data-driven burden tests to analyze data with any family structures, and it can be extended to binary and time-to-onset traits, with or without covariates.
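As a rough illustration of what adjusting for relatedness can look like numerically, the sketch below decorrelates a phenotype vector with the Cholesky factor of an assumed covariance matrix. This is a generic whitening step, not the WSMM computation itself, which the abstract does not spell out. The covariance values, the heritability of 0.5, and the helper names `cholesky` and `forward_solve` are all hypothetical.

```python
import math


def cholesky(matrix):
    """Return the lower-triangular factor L with L @ L.T == matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def forward_solve(lower, rhs):
    """Solve lower @ x = rhs for x by forward substitution."""
    x = []
    for i, row in enumerate(lower):
        acc = sum(row[k] * x[k] for k in range(i))
        x.append((rhs[i] - acc) / row[i])
    return x


# Hypothetical covariance for two full siblings plus one unrelated person.
# Assumed model: heritability * (2 * kinship) + (1 - heritability) * identity,
# with heritability 0.5 and 2 * kinship = 0.5 between full siblings.
covariance = [
    [1.0, 0.25, 0.0],
    [0.25, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
phenotype = [1.2, 0.8, -0.5]  # made-up trait values

lower = cholesky(covariance)
decorrelated = forward_solve(lower, phenotype)
```

Under this assumed model the transformed values have identity covariance, so they can be shuffled without ignoring the sibling correlation; the unrelated person's value passes through unchanged.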
Affiliation(s)
- Qunyuan Zhang
- Division of Statistical Genomics, Washington University School of Medicine, St. Louis, Missouri, United States of America
|
113
|
Zhang Y, Xu Z, Shen X, Pan W. Testing for association with multiple traits in generalized estimation equations, with application to neuroimaging data. Neuroimage 2014; 96:309-25. [PMID: 24704269 PMCID: PMC4043944 DOI: 10.1016/j.neuroimage.2014.03.061] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 11/20/2013] [Revised: 02/14/2014] [Accepted: 03/23/2014] [Indexed: 11/17/2022] Open
Abstract
There is an increasing need to develop and apply powerful statistical tests to detect multiple traits-single locus associations, as arising from neuroimaging genetics and other studies. For example, in the Alzheimer's Disease Neuroimaging Initiative (ADNI), in addition to genome-wide single nucleotide polymorphisms (SNPs), thousands of neuroimaging and neuropsychological phenotypes as intermediate phenotypes for Alzheimer's disease, have been collected. Although some classic methods like MANOVA and newly proposed methods may be applied, they have their own limitations. For example, MANOVA cannot be applied to binary and other discrete traits. In addition, the relationships among these methods are not well understood. Importantly, since these tests are not data adaptive, depending on the unknown association patterns among multiple traits and between multiple traits and a locus, these tests may or may not be powerful. In this paper we propose a class of data-adaptive weights and the corresponding weighted tests in the general framework of generalized estimation equations (GEE). A highly adaptive test is proposed to select the most powerful one from this class of the weighted tests so that it can maintain high power across a wide range of situations. Our proposed tests are applicable to various types of traits with or without covariates. Importantly, we also analytically show relationships among some existing and our proposed tests, indicating that many existing tests are special cases of our proposed tests. Extensive simulation studies were conducted to compare and contrast the power properties of various existing and our new methods. Finally, we applied the methods to an ADNI dataset to illustrate the performance of the methods. We conclude with the recommendation for the use of the GEE-based Score test and our proposed adaptive test for their high and complementary performance.
Affiliation(s)
- Yiwei Zhang
- Division of Biostatistics, School of Public Health, Minneapolis, MN 55455, USA
| | - Zhiyuan Xu
- Division of Biostatistics, School of Public Health, Minneapolis, MN 55455, USA
| | - Xiaotong Shen
- School of Statistics, University of Minnesota, Minneapolis, MN 55455, USA
| | - Wei Pan
- Division of Biostatistics, School of Public Health, Minneapolis, MN 55455, USA.
|
114
|
Lippert C, Xiang J, Horta D, Widmer C, Kadie C, Heckerman D, Listgarten J. Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics 2014; 30:3206-14. [PMID: 25075117 PMCID: PMC4221116 DOI: 10.1093/bioinformatics/btu504] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
Motivation: Set-based variance component tests have been identified as a way to increase power in association studies by aggregating weak individual effects. However, the choice of test statistic has been largely ignored even though it may play an important role in obtaining optimal power. We compared a standard statistical test—a score test—with a recently developed likelihood ratio (LR) test. Further, when correction for hidden structure is needed, or gene–gene interactions are sought, state-of-the art algorithms for both the score and LR tests can be computationally impractical. Thus we develop new computationally efficient methods. Results: After reviewing theoretical differences in performance between the score and LR tests, we find empirically on real data that the LR test generally has more power. In particular, on 15 of 17 real datasets, the LR test yielded at least as many associations as the score test—up to 23 more associations—whereas the score test yielded at most one more association than the LR test in the two remaining datasets. On synthetic data, we find that the LR test yielded up to 12% more associations, consistent with our results on real data, but also observe a regime of extremely small signal where the score test yielded up to 25% more associations than the LR test, consistent with theory. Finally, our computational speedups now enable (i) efficient LR testing when the background kernel is full rank, and (ii) efficient score testing when the background kernel changes with each test, as for gene–gene interaction tests. The latter yielded a factor of 2000 speedup on a cohort of size 13 500. Availability: Software available at http://research.microsoft.com/en-us/um/redmond/projects/MSCompBio/Fastlmm/. Contact:heckerma@microsoft.com Supplementary information:Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christoph Lippert
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
| | - Jing Xiang
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
| | - Danilo Horta
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
| | - Christian Widmer
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
| | - Carl Kadie
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
| | - David Heckerman
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
| | - Jennifer Listgarten
- eScience Research Group, Microsoft Research, Los Angeles, CA, 90024 and eScience Research Group, Microsoft Research, Redmond, WA, 98052, USA
|
115
|
Guo W, Shugart YY. The power comparison of the haplotype-based collapsing tests and the variant-based collapsing tests for detecting rare variants in pedigrees. BMC Genomics 2014; 15:632. [PMID: 25070353 PMCID: PMC4131059 DOI: 10.1186/1471-2164-15-632] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/18/2013] [Accepted: 07/18/2014] [Indexed: 11/20/2022] Open
Abstract
Background Both common and rare genetic variants have been shown to contribute to the etiology of complex diseases. Recent genome-wide association studies (GWAS) have successfully investigated how common variants contribute to the genetic factors associated with common human diseases. However, understanding the impact of rare variants, which are abundant in the human population (one in every 17 bases), remains challenging. A number of statistical tests have been developed to analyze collapsed rare variants identified by association tests. Here, we propose a haplotype-based approach. This work is inspired by an existing statistical framework of the pedigree disequilibrium test (PDT), which uses genetic data to assess the effects of variants in general pedigrees. We aim to compare the performance between the haplotype-based approach and the rare variant-based approach for detecting rare causal variants in pedigrees. Results Extensive simulations in the sequencing setting were carried out to evaluate and compare the haplotype-based approach with the rare variant methods that drew on a more conventional collapsing strategy. As assessed through a variety of scenarios, the haplotype-based pedigree tests had enhanced statistical power compared with the rare variants based pedigree tests when the disease of interest was mainly caused by rare haplotypes (with multiple rare alleles), and vice versa when disease was caused by rare variants acting independently. For most other situations, when disease was caused both by haplotypes with multiple rare alleles and by rare variants with similar effects, these two approaches provided similar power in testing for association. Conclusions The haplotype-based approach was designed to assess the role of rare and potentially causal haplotypes. The proposed rare variants-based pedigree tests were designed to assess the role of rare and potentially causal variants. This study clearly documented the situations under which either method performs better than the other. All tests have been implemented in a software package named rvHPDT, which was submitted to the Comprehensive R Archive Network (CRAN) for general use.
Affiliation(s)
| | - Yin Yao Shugart
- Division of Intramural Research Program, National Institute of Mental Health, National Institutes of Health, 35 Convent Drive, Bethesda, MD 20892, USA.
|
116
|
Sha Q, Zhang S. A rare variant association test based on combinations of single-variant tests. Genet Epidemiol 2014; 38:494-501. [PMID: 25065727 DOI: 10.1002/gepi.21834] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 02/17/2014] [Revised: 04/17/2014] [Accepted: 05/19/2014] [Indexed: 01/22/2023]
Abstract
Next generation sequencing technologies make direct testing of rare variant associations possible. However, the development of powerful statistical methods for rare variant association studies is still underway. Most existing methods are burden and quadratic tests. Recent studies show that the performance of both burden and quadratic tests depends strongly upon the underlying assumption and that no test demonstrates consistently acceptable power. Thus, combined tests that pool information from the burden and quadratic tests have been proposed recently. However, results from recent studies (including this study) show that there exist tests that can outperform both burden and quadratic tests. In this article, we propose three classes of tests that include tests outperforming both burden and quadratic tests. Then, we propose the optimal combination of single-variant tests (OCST) by combining information from tests of the three classes. We use extensive simulation studies to compare the performance of OCST with that of burden, quadratic and optimal single-variant tests. Our results show that OCST either is the most powerful test or has similar power to the most powerful test. We also compare the performance of OCST with that of the two existing combined tests. Our results show that OCST has better power than the two combined tests.
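The contrast between burden and quadratic tests can be sketched with toy numbers. The example below is a minimal sketch and not the OCST procedure: it uses the textbook forms of the two statistics (a squared weighted sum versus a weighted sum of squares of per-variant score statistics), and the score values, weights, and function names are made up for illustration. It returns unnormalized statistics only; turning them into p-values needs a null distribution that is not shown here.

```python
def burden_statistic(scores, weights):
    """Squared weighted sum of per-variant score statistics."""
    total = sum(w * s for w, s in zip(weights, scores))
    return total ** 2


def quadratic_statistic(scores, weights):
    """Weighted sum of squared per-variant score statistics."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))


# Hypothetical per-variant score statistics for one gene.
same_direction = [1.5, 2.0, 1.0, 1.5]     # all variants raise the trait
mixed_direction = [1.5, -2.0, 1.0, -1.5]  # effects in opposite directions
weights = [1.0, 1.0, 1.0, 1.0]            # equal weights, an assumption

burden_same = burden_statistic(same_direction, weights)
burden_mixed = burden_statistic(mixed_direction, weights)
quad_same = quadratic_statistic(same_direction, weights)
quad_mixed = quadratic_statistic(mixed_direction, weights)
```

With these inputs the burden statistic drops sharply when effect directions are mixed, because positive and negative scores cancel in the sum, while the quadratic statistic is unchanged. This is the dependence on the underlying assumption that the abstract refers to.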
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
117
|
Chen H, Meigs JB, Dupuis J. Incorporating gene-environment interaction in testing for association with rare genetic variants. Hum Hered 2014; 78:81-90. [PMID: 25060534 PMCID: PMC4169076 DOI: 10.1159/000363347] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Received: 02/04/2014] [Accepted: 05/03/2014] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES The incorporation of gene-environment interactions could improve the ability to detect genetic associations with complex traits. For common genetic variants, single-marker interaction tests and joint tests of genetic main effects and gene-environment interaction have been well-established and used to identify novel association loci for complex diseases and continuous traits. For rare genetic variants, however, single-marker tests are severely underpowered due to the low minor allele frequency, and only a few gene-environment interaction tests have been developed. We aimed at developing powerful and computationally efficient tests for gene-environment interaction with rare variants. METHODS In this paper, we propose interaction and joint tests for testing gene-environment interaction of rare genetic variants. Our approach is a generalization of existing gene-environment interaction tests for multiple genetic variants under certain conditions. RESULTS We show in our simulation studies that our interaction and joint tests have correct type I errors, and that the joint test is a powerful approach for testing genetic association, allowing for gene-environment interaction. We also illustrate our approach in a real data example from the Framingham Heart Study. CONCLUSION Our approach can be applied to both binary and continuous traits, it is powerful and computationally efficient.
Affiliation(s)
- Han Chen
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard School of Public Health, Boston, MA, USA
| | - James B Meigs
- General Medicine Division, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
| | - Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- The National Heart, Lung and Blood Institute’s Framingham Heart Study, Framingham, MA, USA
|
118
|
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 658] [Impact Index Per Article: 65.8] [Received: 01/02/2014] [Indexed: 12/30/2022] Open
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
|
119
|
Huang W, Massouras A, Inoue Y, Peiffer J, Ràmia M, Tarone AM, Turlapati L, Zichner T, Zhu D, Lyman RF, Magwire MM, Blankenburg K, Carbone MA, Chang K, Ellis LL, Fernandez S, Han Y, Highnam G, Hjelmen CE, Jack JR, Javaid M, Jayaseelan J, Kalra D, Lee S, Lewis L, Munidasa M, Ongeri F, Patel S, Perales L, Perez A, Pu L, Rollmann SM, Ruth R, Saada N, Warner C, Williams A, Wu YQ, Yamamoto A, Zhang Y, Zhu Y, Anholt RR, Korbel JO, Mittelman D, Muzny DM, Gibbs RA, Barbadilla A, Johnston JS, Stone EA, Richards S, Deplancke B, Mackay TF. Natural variation in genome architecture among 205 Drosophila melanogaster Genetic Reference Panel lines. Genome Res 2014; 24:1193-208. [PMID: 24714809 PMCID: PMC4079974 DOI: 10.1101/gr.171546.113] [Citation(s) in RCA: 415] [Impact Index Per Article: 41.5] [Received: 12/20/2013] [Accepted: 04/01/2014] [Indexed: 12/30/2022]
Abstract
The Drosophila melanogaster Genetic Reference Panel (DGRP) is a community resource of 205 sequenced inbred lines, derived to improve our understanding of the effects of naturally occurring genetic variation on molecular and organismal phenotypes. We used an integrated genotyping strategy to identify 4,853,802 single nucleotide polymorphisms (SNPs) and 1,296,080 non-SNP variants. Our molecular population genomic analyses show higher deletion than insertion mutation rates and stronger purifying selection on deletions. Weaker selection on insertions than deletions is consistent with our observed distribution of genome size determined by flow cytometry, which is skewed toward larger genomes. Insertion/deletion and single nucleotide polymorphisms are positively correlated with each other and with local recombination, suggesting that their nonrandom distributions are due to hitchhiking and background selection. Our cytogenetic analysis identified 16 polymorphic inversions in the DGRP. Common inverted and standard karyotypes are genetically divergent and account for most of the variation in relatedness among the DGRP lines. Intriguingly, variation in genome size and many quantitative traits are significantly associated with inversions. Approximately 50% of the DGRP lines are infected with Wolbachia, and four lines have germline insertions of Wolbachia sequences, but effects of Wolbachia infection on quantitative traits are rarely significant. The DGRP complements ongoing efforts to functionally annotate the Drosophila genome. Indeed, 15% of all D. melanogaster genes segregate for potentially damaged proteins in the DGRP, and genome-wide analyses of quantitative traits identify novel candidate genes. The DGRP lines, sequence data, genotypes, quality scores, phenotypes, and analysis and visualization tools are publicly available.
Affiliation(s)
- Wen Huang
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Andreas Massouras
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Yutaka Inoue
- Center for Education in Liberal Arts and Sciences, Osaka University, Osaka-fu, 560-0043 Japan
| | - Jason Peiffer
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Miquel Ràmia
- Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
| | - Aaron M. Tarone
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
| | - Lavanya Turlapati
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Thomas Zichner
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
| | - Dianhui Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Richard F. Lyman
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Michael M. Magwire
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Kerstin Blankenburg
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Mary Anna Carbone
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Kyle Chang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Lisa L. Ellis
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
| | - Sonia Fernandez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Yi Han
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Gareth Highnam
- Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - Carl E. Hjelmen
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
| | - John R. Jack
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Mehwish Javaid
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Joy Jayaseelan
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Sandy Lee
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Lora Lewis
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Mala Munidasa
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Fiona Ongeri
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Shohba Patel
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Lora Perales
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Agapito Perez
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - LingLing Pu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Stephanie M. Rollmann
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Robert Ruth
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Nehad Saada
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Crystal Warner
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Aneisa Williams
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Yuan-Qing Wu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Akihiko Yamamoto
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Yiqing Zhang
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Yiming Zhu
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Robert R.H. Anholt
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Jan O. Korbel
- Genome Biology Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
| | - David Mittelman
- Virginia Tech Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, Virginia 24061, USA
| | - Donna M. Muzny
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Antonio Barbadilla
- Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina (IBB), Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
| | - J. Spencer Johnston
- Department of Entomology, Texas A&M University, College Station, Texas 77843, USA
| | - Eric A. Stone
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
| | - Stephen Richards
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas 77030 USA
| | - Bart Deplancke
- Laboratory of Systems Biology and Genetics, Institute of Bioengineering, School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Trudy F.C. Mackay
- Department of Biological Sciences, North Carolina State University, Raleigh, North Carolina 27595, USA
|
120
|
Abstract
Although many genetic factors have been successfully identified for human diseases in genome-wide association studies, genes discovered to date only account for a small proportion of overall genetic contributions to many complex traits. Association studies have difficulty in detecting the remaining true genetic variants that are either common variants with weak allelic effects, or rare variants that have strong allelic effects but are weakly associated at the population level. In this work, we applied a goodness-of-fit test for detecting sets of common and rare variants associated with quantitative or binary traits by using whole genome sequencing data. This test has been proved optimal for detecting weak and sparse signals in the literature, which fits the requirements for targeting the genetic components of missing heritability. Furthermore, this p value-combining method allows one to incorporate different data and/or research results for meta-analysis. The method was used to simultaneously analyse the whole genome sequencing and genome-wide association studies data of Genetic Analysis Workshop 18 for detecting true genetic variants. The results show that goodness-of-fit test is comparable or better than the influential sequence kernel association test in many cases.
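The abstract does not say which goodness-of-fit statistic was used. As one well-known example of a goodness-of-fit test on p-values aimed at weak and sparse signals, the sketch below computes a simplified higher criticism statistic. Choosing higher criticism here is an assumption made for illustration, and the p-values and function name are hypothetical.

```python
import math


def higher_criticism(p_values):
    """Maximum standardized gap between the empirical and uniform CDFs.

    Simplified form: every ordered p-value strictly between 0 and 1 is
    considered, with no restriction to the smaller half of the p-values.
    """
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    best = float("-inf")
    for i, p in enumerate(p_sorted, start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


# Hypothetical variant-level p-values for one region.
null_like = [0.1, 0.3, 0.5, 0.7, 0.9]       # roughly uniform
one_signal = [0.0001, 0.3, 0.5, 0.7, 0.9]   # a single strong variant

hc_null = higher_criticism(null_like)
hc_signal = higher_criticism(one_signal)
```

The statistic stays small when the p-values look uniform and becomes large when even one p-value is far smaller than uniformity would predict, which is why this family of tests suits sparse signals. Calibrating it into a region-level p-value would need simulation or permutation, which is omitted here.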
Affiliation(s)
- Li Yang
- Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
| | - Jing Xuan
- Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
| | - Zheyang Wu
- Department of Mathematical Sciences, Worcester Polytechnic Institute, 100 Institute Road, Worcester, MA 01609-2280, USA
|
121
|
Feng T, Zhu X. Whole genome sequencing data from pedigrees suggests linkage disequilibrium among rare variants created by population admixture. BMC Proc 2014; 8:S44. [PMID: 25519326 PMCID: PMC4143626 DOI: 10.1186/1753-6561-8-s1-s44] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 11/21/2022] Open
Abstract
Next-generation sequencing technologies have been designed to discover rare and de novo variants and are an important tool for identifying rare disease variants. Many statistical methods have been developed to test, using next-generation sequencing data, for rare variants that are associated with a trait. However, many of these methods make assumptions that rare variants are in linkage equilibrium in a gene. In this report, we studied whether transmitted or untransmitted haplotypes carry an excess of rare variants using the whole genome sequencing data of 15 large Mexican American pedigrees provided by the Genetic Analysis Workshop 18. We observed that an excess of rare variants are carried on either transmitted or nontransmitted haplotypes from parents to offspring. Further analyses suggest that such nonrandom associations among rare variants can be attributed to population admixture and single-nucleotide variant calling errors. Our results have significant implications for rare variant association studies, especially those conducted in admixed populations.
Affiliation(s)
- Tao Feng
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
| | - Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
|
122
|
Moutsianas L, Morris AP. Methodology for the analysis of rare genetic variation in genome-wide association and re-sequencing studies of complex human traits. Brief Funct Genomics 2014; 13:362-70. [PMID: 24916163 PMCID: PMC4168660 DOI: 10.1093/bfgp/elu012] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Indexed: 12/14/2022] Open
Abstract
Genome-wide association studies have been successful in identifying common variants that impact complex human traits and diseases. However, despite this success, the joint effects of these variants explain only a small proportion of the genetic variance in these phenotypes, leading to speculation that rare genetic variation might account for much of the ‘missing heritability’. Consequently, there has been an exciting period of research and development into the methodology for the analysis of rare genetic variants, typically by considering their joint effects on complex traits within the same functional unit or genomic region. In this review, we describe a general framework for modelling the joint effects of rare genetic variants on complex traits in association studies of unrelated individuals. We summarise a range of widely used association tests that have been developed from this model and provide an overview of the relative performance of these approaches from published simulation studies.
|
123
|
Abstract
Big Data bring new opportunities to modern society and challenges to data scientists. On one hand, Big Data hold great promises for discovering subtle population patterns and heterogeneities that are not possible with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottleneck, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinguished and require new computational and statistical paradigm. This article gives overviews on the salient features of Big Data and how these features impact on paradigm change on statistical and computational methods as well as computing architectures. We also provide various new perspectives on the Big Data analysis and computation. In particular, we emphasize on the viability of the sparsest solution in high-confidence set and point out that exogeneous assumptions in most statistical methods for Big Data can not be validated due to incidental endogeneity. They can lead to wrong statistical inferences and consequently wrong scientific conclusions.
Affiliation(s)
- Jianqing Fan
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
- Fang Han
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA
- Han Liu
  - Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA
124
Yan Q, Tiwari HK, Yi N, Lin WY, Gao G, Lou XY, Cui X, Liu N. Kernel-machine testing coupled with a rank-truncation method for genetic pathway analysis. Genet Epidemiol 2014; 38:447-56. [PMID: 24849109 DOI: 10.1002/gepi.21813] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 10/31/2013] [Revised: 04/09/2014] [Accepted: 04/10/2014] [Indexed: 01/09/2023]
Abstract
Traditional genome-wide association studies (GWASs) usually focus on single-marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers the hierarchical structure of biological pathways, genes, and markers and therefore provides additional insights into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single-nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme that allows genetic variants across the whole allelic frequency spectrum to be analyzed together without any form of frequency cutoff for defining rare variants. The proposed approach is flexible. It is applicable to both binary and continuous traits, and incorporating covariates is easy. Furthermore, it can be readily applied to GWAS data, exome-sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from the Wellcome Trust Case Control Consortium (WTCCC) to show its utility.
Affiliation(s)
- Qi Yan
  - Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
125
Fan R, Wang Y, Mills JL, Wilson AF, Bailey-Wilson JE, Xiong M. Functional linear models for association analysis of quantitative traits. Genet Epidemiol 2014; 37:726-42. [PMID: 24130119 DOI: 10.1002/gepi.21757] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 04/24/2013] [Revised: 07/15/2013] [Accepted: 08/14/2013] [Indexed: 12/19/2022]
Abstract
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F-distributed tests of the proposed fixed effect functional linear models have higher power than that of sequence kernel association test (SKAT) and its optimal unified test (SKAT-O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to its optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT-O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.
Affiliation(s)
- Ruzong Fan
  - Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, Maryland, United States of America
126
Abstract
This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although its application can be much broader to other types of traits, common variants (CVs), and gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can dramatically drop as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally, we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with some existing tests, demonstrating their potential usefulness.
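The abstract above names the SPU family but does not spell out the statistic. The following is a minimal sketch, assuming the commonly stated form SPU(γ) = Σ_j U_j^γ on the score vector U (with SPU(∞) taken as max_j |U_j|), no covariates, and permutation-based p-values. The function names `score_vector`, `spu`, and `aspu`, the default set of powers, and the number of permutations are illustrative choices, not taken from the article.

```python
import random


def score_vector(y, genotypes):
    """Score vector under the null with no covariates: U_j = sum_i (y_i - mean(y)) * x_ij."""
    y_bar = sum(y) / len(y)
    n_variants = len(genotypes[0])
    return [
        sum((yi - y_bar) * row[j] for yi, row in zip(y, genotypes))
        for j in range(n_variants)
    ]


def spu(u, gamma):
    """Sum of powered scores; gamma=1 is a burden-type sum test, gamma=2 is the SSU test."""
    if gamma == float("inf"):
        return max(abs(v) for v in u)
    return sum(v ** gamma for v in u)


def aspu(y, genotypes, gammas=(1, 2, 3, 4, float("inf")), n_perm=200, seed=0):
    """Permutation p-values for each SPU(gamma) and for the adaptive min-p combination."""
    rng = random.Random(seed)
    observed = [spu(score_vector(y, genotypes), g) for g in gammas]

    # Null statistics obtained by permuting the trait values.
    null = []
    for _ in range(n_perm):
        y_perm = list(y)
        rng.shuffle(y_perm)
        u = score_vector(y_perm, genotypes)
        null.append([spu(u, g) for g in gammas])

    def p_values(stats, reference):
        # Two-sided comparison on |statistic|, with the usual +1 correction.
        return [
            (1 + sum(abs(ref[k]) >= abs(stats[k]) for ref in reference))
            / (1 + len(reference))
            for k in range(len(gammas))
        ]

    p_obs = p_values(observed, null)
    t_obs = min(p_obs)

    # Null distribution of the min-p statistic: each permutation is scored
    # against all the other permutations.
    t_null = [
        min(p_values(stats, null[:b] + null[b + 1:]))
        for b, stats in enumerate(null)
    ]
    p_adaptive = (1 + sum(t <= t_obs for t in t_null)) / (1 + n_perm)
    return p_obs, p_adaptive
```

Calling `aspu(y, genotypes)` with a list of trait values and a list of per-subject genotype rows returns one p-value per power plus the adaptive p-value. Reusing the same permutations for both layers keeps the cost at one set of permuted statistics, at the price of an O(B²) comparison step.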
127
Kim S, Pan W, Shen X. Penalized regression approaches to testing for quantitative trait-rare variant association. Front Genet 2014; 5:121. [PMID: 24860593 PMCID: PMC4026747 DOI: 10.3389/fgene.2014.00121] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/01/2014] [Accepted: 04/18/2014] [Indexed: 11/13/2022] Open
Abstract
In statistical data analysis, penalized regression is considered an attractive approach for its ability to perform variable selection and parameter estimation simultaneously. Although penalized regression methods have shown many advantages in variable selection and outcome prediction over other approaches for high-dimensional data, there is a relative paucity of literature on their applications to hypothesis testing, e.g., in genetic association analysis. In this study, we apply several new penalized regression methods with a novel penalty, called the Truncated L1-penalty (TLP) (Shen et al., 2012), for either variable selection, or both variable selection and parameter grouping, in a data-adaptive way to test for association between a quantitative trait and a group of rare variants. The performance of the new methods is compared with some existing tests, including some recently proposed global tests and penalized regression-based methods, via simulations and an application to the real sequence data of the Genetic Analysis Workshop 17 (GAW17). Although our proposed penalized methods can improve over some existing penalized methods, often they do not outperform some existing global association tests. Some possible problems with utilizing penalized regression methods in genetic hypothesis testing are discussed. Given the capability of penalized regression in selecting causal variants and its sometimes promising performance, further studies are warranted.
Affiliation(s)
- Sunkyung Kim
  - Division of Biostatistics, School of Public Health, University of Minnesota Minneapolis, MN, USA
- Wei Pan
  - Division of Biostatistics, School of Public Health, University of Minnesota Minneapolis, MN, USA
- Xiaotong Shen
  - School of Statistics, University of Minnesota Minneapolis, MN, USA
128
Logsdon BA, Dai JY, Auer PL, Johnsen JM, Ganesh SK, Smith NL, Wilson JG, Tracy RP, Lange LA, Jiao S, Rich SS, Lettre G, Carlson CS, Jackson RD, O'Donnell CJ, Wurfel MM, Nickerson DA, Tang H, Reiner AP, Kooperberg C. A variational Bayes discrete mixture test for rare variant association. Genet Epidemiol 2014; 38:21-30. [PMID: 24482836 DOI: 10.1002/gepi.21772] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/20/2023]
Abstract
Recently, many statistical methods have been proposed to test for associations between rare genetic variants and complex traits. Most of these methods test for association by aggregating genetic variations within a predefined region, such as a gene. Although there is evidence that "aggregate" tests are more powerful than the single marker test, these tests generally ignore neutral variants and therefore are unable to identify specific variants driving the association with phenotype. We propose a novel aggregate rare-variant test that explicitly models a fraction of variants as neutral, tests associations at the gene-level, and infers the rare-variants driving the association. Simulations show that in the practical scenario where there are many variants within a given region of the genome with only a fraction causal our approach has greater power compared to other popular tests such as the Sequence Kernel Association Test (SKAT), the Weighted Sum Statistic (WSS), and the collapsing method of Morris and Zeggini (MZ). Our algorithm leverages a fast variational Bayes approximate inference methodology to scale to exome-wide analyses, a significant computational advantage over exact inference model selection methodologies. To demonstrate the efficacy of our methodology we test for associations between von Willebrand Factor (VWF) levels and VWF missense rare-variants imputed from the National Heart, Lung, and Blood Institute's Exome Sequencing project into 2,487 African Americans within the VWF gene. Our method suggests that a relatively small fraction (~10%) of the imputed rare missense variants within VWF are strongly associated with lower VWF levels in African Americans.
129
Owen LA, Morrison MA, Ahn J, Woo SJ, Sato H, Robinson R, Morgan DJ, Zacharaki F, Simeonova M, Uehara H, Chakravarthy U, Hogg RE, Ambati BK, Kotoula M, Baehr W, Haider NB, Silvestri G, Miller JW, Tsironi EE, Farrer LA, Kim IK, Park KH, DeAngelis MM. FLT1 genetic variation predisposes to neovascular AMD in ethnically diverse populations and alters systemic FLT1 expression. Invest Ophthalmol Vis Sci 2014; 55:3543-54. [PMID: 24812550 DOI: 10.1167/iovs.14-14047] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/21/2022] Open
Abstract
PURPOSE Current understanding of the genetic risk factors for age-related macular degeneration (AMD) is not sufficiently predictive of the clinical course. The VEGF pathway is a key therapeutic target for treatment of neovascular AMD; however, risk attributable to genetic variation within pathway genes is unclear. We sought to identify single nucleotide polymorphisms (SNPs) associated with AMD within the VEGF pathway. METHODS Using a tagSNP, direct sequencing and meta-analysis approach within four ethnically diverse cohorts, we identified genetic risk present in FLT1, though not within other VEGF pathway genes KDR, VEGFA, or VASH1. We used ChIP and ELISA in functional analysis. RESULTS The FLT1 SNPs rs9943922, rs9508034, rs2281827, rs7324510, and rs9513115 were significantly associated with increased risk of neovascular AMD. Each association was more significant after meta-analysis than in any one of the four cohorts. All associations were novel, within noncoding regions of FLT1 that do not tag for coding variants in linkage disequilibrium. Analysis of soluble FLT1 demonstrated higher expression in unaffected individuals homozygous for the FLT1 risk alleles rs9943922 (P = 0.0086) and rs7324510 (P = 0.0057). In silico analysis suggests that these variants change predicted splice sites and RNA secondary structure, and have been identified in other neovascular pathologies. These data were supported further by murine chromatin immunoprecipitation demonstrating that FLT1 is a target of Nr2e3, a nuclear receptor gene implicated in regulating an AMD pathway. CONCLUSIONS Although exact variant functions are not known, these data demonstrate relevancy across ethnically diverse genetic backgrounds within our study and, therefore, hold potential for global efficacy.
Affiliation(s)
- Leah A Owen
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Margaux A Morrison
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Jeeyun Ahn
  - Department of Ophthalmology, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Se Joon Woo
  - Department of Ophthalmology, Seoul National University Bundang Hospital, Seoungnam, Republic of Korea
- Hajime Sato
  - Department of Ophthalmology, Tohoku University Graduate School of Medicine, Aoba-ku, Sendai, Japan
- Rosann Robinson
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Denise J Morgan
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Fani Zacharaki
  - Department of Ophthalmology, University of Thessaly, School of Medicine, Larissa, Greece
- Marina Simeonova
  - Retina Service, Massachusetts Eye and Ear, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
- Hironori Uehara
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Usha Chakravarthy
  - Centre for Experimental Medicine, Queen's University, Belfast, United Kingdom
- Ruth E Hogg
  - Centre for Experimental Medicine, Queen's University, Belfast, United Kingdom
- Balamurali K Ambati
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Maria Kotoula
  - Department of Ophthalmology, University of Thessaly, School of Medicine, Larissa, Greece
- Wolfgang Baehr
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
- Neena B Haider
  - Schepens Eye Research Institute, Harvard Medical School, Boston, Massachusetts, United States
- Giuliana Silvestri
  - Centre for Experimental Medicine, Queen's University, Belfast, United Kingdom
- Joan W Miller
  - Retina Service, Massachusetts Eye and Ear, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
- Evangelia E Tsironi
  - Department of Ophthalmology, University of Thessaly, School of Medicine, Larissa, Greece
- Lindsay A Farrer
  - Departments of Medicine (Biomedical Genetics), Ophthalmology, Neurology, Epidemiology, and Biostatistics, Boston University Schools of Medicine and Public Health, Boston, Massachusetts, United States
- Ivana K Kim
  - Retina Service, Massachusetts Eye and Ear, Department of Ophthalmology, Harvard Medical School, Boston, Massachusetts, United States
- Kyu Hyung Park
  - Department of Ophthalmology, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul, Republic of Korea
- Margaret M DeAngelis
  - Department of Ophthalmology and Visual Sciences, University of Utah, Salt Lake City, Utah, United States
130
Derkach A, Lawless JF, Sun L. Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results. Stat Sci 2014. [DOI: 10.1214/13-sts456] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Indexed: 11/19/2022]
131
Wang GT, Peng B, Leal SM. Variant association tools for quality control and analysis of large-scale sequence and genotyping array data. Am J Hum Genet 2014; 94:770-83. [PMID: 24791902 PMCID: PMC4067555 DOI: 10.1016/j.ajhg.2014.04.004] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 02/27/2014] [Accepted: 04/03/2014] [Indexed: 12/14/2022] Open
Abstract
Currently there is great interest in detecting associations between complex traits and rare variants. In this report, we describe Variant Association Tools (VAT) and the VAT pipeline, which implements best practices for rare-variant association studies. Highlights of VAT include variant-site and call-level quality control (QC), summary statistics, phenotype- and genotype-based sample selection, variant annotation, selection of variants for association analysis, and a collection of rare-variant association methods for analyzing qualitative and quantitative traits. The association testing framework for VAT is regression based, which readily allows for flexible construction of association models with multiple covariates and weighting themes based on allele frequencies or predicted functionality. Additionally, pathway analyses, conditional analyses, and analyses of gene-gene and gene-environment interactions can be performed. VAT is capable of rapidly scanning through data by using multi-process computation, adaptive permutation, and simultaneously conducting association analysis via multiple methods. Results are available in text or graphic file formats and additionally can be output to relational databases for further annotation and filtering. An interface to R language also facilitates user implementation of novel association methods. The VAT's data QC and association-analysis pipeline can be applied to sequence, imputed, and genotyping array, e.g., "exome chip," data, providing a reliable and reproducible computational environment in which to analyze small- to large-scale studies with data from the latest genotyping and sequencing technologies. Application of the VAT pipeline is demonstrated through analysis of data from the 1000 Genomes project.
Affiliation(s)
- Gao T Wang
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
- Bo Peng
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Suzanne M Leal
  - Center for Statistical Genetics, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
132
Sun H, Wang S. A power set-based statistical selection procedure to locate susceptible rare variants associated with complex traits with sequencing data. Bioinformatics 2014; 30:2317-23. [PMID: 24755303 DOI: 10.1093/bioinformatics/btu207] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023] Open
Abstract
MOTIVATION Existing association methods for rare variants from sequencing data have focused on aggregating variants in a gene or a genetic region because analysing individual rare variants is underpowered. However, these existing rare variant detection methods are not able to identify which of the rare variants in a gene or a genetic region are associated with the complex diseases or traits. Once phenotypic associations of a gene or a genetic region are identified, the natural next step in the association study with sequencing data is to locate the susceptible rare variants within the gene or the genetic region. RESULTS In this article, we propose a power set-based statistical selection procedure that is able to identify the locations of the potentially susceptible rare variants within a disease-related gene or a genetic region. The selection performance of the proposed selection procedure was evaluated through simulation studies, where we demonstrated its feasibility and superior power over several comparable existing methods. In particular, the proposed method is able to handle mixed effects, when both risk and protective variants are present in a gene or a genetic region. The proposed selection procedure was also applied to the sequence data on the ANGPTL gene family from the Dallas Heart Study to identify potentially susceptible rare variants within the trait-related genes. AVAILABILITY AND IMPLEMENTATION An R package 'rvsel' can be downloaded from http://www.columbia.edu/~sw2206/ and http://statsun.pusan.ac.kr.
Affiliation(s)
- Hokeun Sun
  - Department of Statistics, Pusan National University, Pusan 609-735, Korea and Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA
- Shuang Wang
  - Department of Statistics, Pusan National University, Pusan 609-735, Korea and Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, NY 10032, USA
133
Lin WY. Association testing of clustered rare causal variants in case-control studies. PLoS One 2014; 9:e94337. [PMID: 24736372 PMCID: PMC3988195 DOI: 10.1371/journal.pone.0094337] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 01/29/2014] [Accepted: 03/12/2014] [Indexed: 11/18/2022] Open
Abstract
Biological evidence suggests that multiple causal variants in a gene may cluster physically. Variants within the same protein functional domain or gene regulatory element would locate in close proximity on the DNA sequence. However, spatial information of variants is usually not used in current rare variant association analyses. We here propose a clustering method (abbreviated as "CLUSTER"), which is extended from the adaptive combination of P-values. Our method combines the association signals of variants that are more likely to be causal. Furthermore, the statistic incorporates the spatial information of variants. With extensive simulations, we show that our method outperforms several commonly-used methods in many scenarios. To demonstrate its use in real data analyses, we also apply this CLUSTER test to the Dallas Heart Study data. CLUSTER is among the best methods when the effects of causal variants are all in the same direction. As variants located in close proximity are more likely to have similar impact on disease risk, CLUSTER is recommended for association testing of clustered rare causal variants in case-control studies.
Affiliation(s)
- Wan-Yu Lin
  - Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
134
He L, Sillanpää MJ, Ripatti S, Pitkäniemi J. Bayesian Latent Variable Collapsing Model for Detecting Rare Variant Interaction Effect in Twin Study. Genet Epidemiol 2014; 38:310-24. [DOI: 10.1002/gepi.21804] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/13/2013] [Revised: 02/28/2014] [Accepted: 02/28/2014] [Indexed: 12/12/2022]
Affiliation(s)
- Liang He
  - Department of Public Health; Hjelt Institute; University of Helsinki; Finland
- Mikko J. Sillanpää
  - Department of Mathematical Sciences; University of Oulu; Oulu Finland
  - Department of Biology and Biocenter Oulu; University of Oulu; Oulu Finland
- Samuli Ripatti
  - Department of Public Health; Hjelt Institute; University of Helsinki; Finland
  - Institute for Molecular Medicine Finland FIMM; University of Helsinki; Finland
  - Human Genetics; Wellcome Trust Sanger Institute; United Kingdom
- Janne Pitkäniemi
  - Department of Public Health; Hjelt Institute; University of Helsinki; Finland
  - Finnish Cancer Registry; Institute for Statistical and Epidemiological Cancer Research; Helsinki Finland
135
Cook K, Benitez A, Fu C, Tintle N. Evaluating the impact of genotype errors on rare variant tests of association. Front Genet 2014; 5:62. [PMID: 24744770 PMCID: PMC3978329 DOI: 10.3389/fgene.2014.00062] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 10/02/2013] [Accepted: 03/11/2014] [Indexed: 01/23/2023] Open
Abstract
The new class of rare variant tests has usually been evaluated assuming perfect genotype information. In reality, rare variant genotypes may be incorrect, and so rare variant tests should be robust to imperfect data. Errors and uncertainty in SNP genotyping are already known to dramatically impact statistical power for single marker tests on common variants and, in some cases, inflate the type I error rate. Recent results show that uncertainty in genotype calls derived from sequencing reads is dependent on several factors, including read depth, calling algorithm, number of alleles present in the sample, and the frequency at which an allele segregates in the population. We have recently proposed a general framework for the evaluation and investigation of rare variant tests of association, classifying most rare variant tests into one of two broad categories (length or joint tests). We use this framework to relate factors affecting genotype uncertainty to the power and type I error rate of rare variant tests. We find that non-differential genotype errors (an error process that occurs independent of phenotype) decrease power, with larger decreases for extremely rare variants, and for the common homozygote to heterozygote error. Differential genotype errors (an error process that is associated with phenotype status) lead to inflated type I error rates, which are more likely to occur at sites with more common homozygote to heterozygote errors than vice versa. Finally, our work suggests that certain rare variant tests and study designs may be more robust to the inclusion of genotype errors. Further work is needed to directly integrate genotype calling algorithm decisions, study costs and test statistic choices to provide comprehensive design and analysis advice which appropriately accounts for the impact of genotype errors.
Affiliation(s)
- Kaitlyn Cook
  - Department of Mathematics, Carleton College Northfield, MN, USA
- Alejandra Benitez
  - Department of Applied Mathematics, Brown University Providence, RI, USA
- Casey Fu
  - Department of Mathematics, Massachusetts Institute of Technology Boston, MA, USA
- Nathan Tintle
  - Department of Mathematics, Statistics and Computer Science, Dordt College Sioux Center, IA, USA
136
Li M, He Z, Zhang M, Zhan X, Wei C, Elston RC, Lu Q. A generalized genetic random field method for the genetic association analysis of sequencing data. Genet Epidemiol 2014; 38:242-53. [PMID: 24482034 PMCID: PMC5241166 DOI: 10.1002/gepi.21790] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 09/10/2013] [Revised: 11/28/2013] [Accepted: 12/21/2013] [Indexed: 01/23/2023]
Abstract
With the advance of high-throughput sequencing technologies, it has become feasible to investigate the influence of the entire spectrum of sequencing variations on complex human diseases. Although association studies utilizing the new sequencing technologies hold great promise to unravel novel genetic variants, especially rare genetic variants that contribute to human diseases, the statistical analysis of high-dimensional sequencing data remains a challenge. Advanced analytical methods are in great need to facilitate high-dimensional sequencing data analyses. In this article, we propose a generalized genetic random field (GGRF) method for association analyses of sequencing data. Like other similarity-based methods (e.g., SIMreg and SKAT), the new method has the advantages of avoiding the need to specify thresholds for rare variants and allowing for testing multiple variants acting in different directions and magnitude of effects. The method is built on the generalized estimating equation framework and thus accommodates a variety of disease phenotypes (e.g., quantitative and binary phenotypes). Moreover, it has a nice asymptotic property, and can be applied to small-scale sequencing data without need for small-sample adjustment. Through simulations, we demonstrate that the proposed GGRF attains an improved or comparable power over a commonly used method, SKAT, under various disease scenarios, especially when rare variants play a significant role in disease etiology. We further illustrate GGRF with an application to a real dataset from the Dallas Heart Study. By using GGRF, we were able to detect the association of two candidate genes, ANGPTL3 and ANGPTL4, with serum triglyceride.
Collapse
Affiliation(s)
- Ming Li
- Division of Biostatistics, Department of Pediatrics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States of America
| | - Zihuai He
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Min Zhang
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Xiaowei Zhan
- Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, United States of America
| | - Changshuai Wei
- Department of Epidemiology and Biostatics, Michigan State University, East Lansing, Michigan, United States of America
| | - Robert C. Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, Ohio, United States of America
| | - Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, Michigan, United States of America
|
137
|
Zeng P, Zhao Y, Zhang L, Huang S, Chen F. Rare variants detection with kernel machine learning based on likelihood ratio test. PLoS One 2014; 9:e93355. [PMID: 24675868 PMCID: PMC3968153 DOI: 10.1371/journal.pone.0093355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 11/16/2013] [Accepted: 03/03/2014] [Indexed: 11/18/2022] Open
Abstract
This paper mainly utilizes likelihood-based tests to detect rare variants associated with a continuous phenotype under the framework of kernel machine learning. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated. The relationship between the kernel machine learning and the mixed effects model is discussed. By using the eigenvalue representation of LRT and ReLRT, their exact finite sample distributions are obtained in a simulation manner. Numerical studies are performed to evaluate the performance of the proposed approaches under the contexts of standard mixed effects model and kernel machine learning. The results have shown that the LRT and ReLRT can control the type I error correctly at the given α level. The LRT and ReLRT consistently outperform the SKAT, regardless of the sample size and the proportion of the negative causal rare variants, and suffer from fewer power reductions compared to the SKAT when both positive and negative effects of rare variants are present. The LRT and ReLRT performed under the context of kernel machine learning have slightly higher powers than those performed under the context of standard mixed effects model. We use the Genetic Analysis Workshop 17 exome sequencing SNP data as an illustrative example. Some interesting results are observed from the analysis. Finally, we give the discussion.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, China
| | - Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, China
| | - Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
|
138
|
Test of rare variant association based on affected sib-pairs. Eur J Hum Genet 2014; 23:229-37. [PMID: 24667785 DOI: 10.1038/ejhg.2014.43] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 07/07/2013] [Revised: 11/06/2013] [Accepted: 12/30/2013] [Indexed: 11/08/2022] Open
Abstract
With the development of sequencing techniques, there is increasing interest to detect associations between rare variants and complex traits. Quite a few statistical methods to detect associations between rare variants and complex traits have been developed for unrelated individuals. Statistical methods for detecting rare variant associations under family-based designs have not received as much attention as methods for unrelated individuals. Recent studies show that rare disease variants will be enriched in family data and thus family-based designs may improve power to detect rare variant associations. In this article, we propose a novel test to test association between the optimally weighted combination of variants and trait of interests for affected sib-pairs. The optimal weights are analytically derived and can be calculated from sampled genotypes and phenotypes. Based on the optimal weights, the proposed method is robust to the directions of the effects of causal variants and is less affected by neutral variants than existing methods are. Our simulation results show that, in all the cases, the proposed method is substantially more powerful than existing methods based on unrelated individuals and existing methods based on affected sib-pairs.
|
139
|
Lee IH, Lee K, Hsing M, Choe Y, Park JH, Kim SH, Bohn JM, Neu MB, Hwang KB, Green RC, Kohane IS, Kong SW. Prioritizing disease-linked variants, genes, and pathways with an interactive whole-genome analysis pipeline. Hum Mutat 2014; 35:537-47. [PMID: 24478219 DOI: 10.1002/humu.22520] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 10/04/2013] [Accepted: 01/23/2014] [Indexed: 01/02/2023]
Abstract
Whole-genome sequencing (WGS) studies are uncovering disease-associated variants in both rare and nonrare diseases. Utilizing next-generation sequencing for WGS requires a series of computational methods for alignment, variant detection, and annotation, and the accuracy and reproducibility of annotation results are essential for clinical implementation. However, annotating WGS with up-to-date genomic information is still challenging for biomedical researchers. Here, we present one of the fastest and most scalable annotation, filtering, and analysis pipelines, gNOME, to prioritize phenotype-associated variants while minimizing false-positive findings. The intuitive graphical user interface of gNOME facilitates the selection of phenotype-associated variants, and the result summaries are provided at variant, gene, and genome levels. Moreover, the enrichment results of specific variants, genes, and gene sets between two groups, or compared with population-scale WGS datasets that are already integrated in the pipeline, can help the interpretation. We found a small number of discordant results between annotation software tools, in part due to different reporting strategies for variants with complex impacts. Using two published whole-exome datasets of uveal melanoma and bladder cancer, we demonstrated gNOME's accuracy of variant annotation and the enrichment of loss-of-function variants in known cancer pathways. The gNOME Web server and source code are freely available to the academic community (http://gnome.tchlab.org).
Affiliation(s)
- In-Hee Lee
- Children's Hospital Informatics Program at the Harvard-MIT Division of Health Sciences and Technology, Department of Medicine, Boston Children's Hospital, Boston, Massachusetts, 02115
|
140
|
Sha Q, Zhang S. A novel test for testing the optimally weighted combination of rare and common variants based on data of parents and affected children. Genet Epidemiol 2014; 38:135-43. [PMID: 24382753 PMCID: PMC4162402 DOI: 10.1002/gepi.21787] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/23/2013] [Revised: 10/28/2013] [Accepted: 12/02/2013] [Indexed: 11/10/2022]
Abstract
With the development of sequencing technologies, the direct testing of rare variant associations has become possible. Many statistical methods for detecting associations between rare variants and complex diseases have recently been developed, most of which are population-based methods for unrelated individuals. A limitation of population-based methods is that spurious associations can occur when there is a population structure. For rare variants, this problem can be more serious, because the spectrum of rare variation can be very different in diverse populations and because there are currently no methods to control for population stratification in population-based rare variant association tests. A solution to the problem of population stratification is to use family-based association tests, which use family members to control for population stratification. In this article, we propose a novel test for Testing the Optimally Weighted combination of variants based on data of Parents and Affected Children (TOW-PAC). TOW-PAC is a family-based association test that tests the combined effect of rare and common variants in a genomic region, and is robust to the directions of the effects of causal variants. Simulation studies confirm that, for rare variant associations, family-based association tests are robust to population stratification, whereas population-based association tests can be seriously confounded by it. The results of power comparisons show that the power of TOW-PAC increases with the number of affected children in each family, and that TOW-PAC based on multiple affected children per family is more powerful than TOW based on unrelated individuals.
Affiliation(s)
- Qiuying Sha
- Department of Mathematical Sciences, Michigan Technological University, Houghton, Michigan, United States of America
|
141
|
Li B, Liu DJ, Leal SM. Identifying rare variants associated with complex traits via sequencing. Curr Protoc Hum Genet 2014; Chapter 1:Unit 1.26. [PMID: 23853079 DOI: 10.1002/0471142905.hg0126s78] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 12/12/2022]
Abstract
Although genome-wide association studies have been successful in detecting associations with common variants, there is currently an increasing interest in identifying low-frequency and rare variants associated with complex traits. Next-generation sequencing technologies make it feasible to survey the full spectrum of genetic variation in coding regions or the entire genome. However, association analysis for rare variants is challenging, and traditional methods are ineffective because of the low frequency of rare variants coupled with allelic heterogeneity. Recently, a battery of new statistical methods has been proposed for identifying rare variants associated with complex traits. These methods test for associations by aggregating multiple rare variants across a gene or a genomic region or among a group of variants in the genome. In this unit, we describe key concepts for rare variant association for complex traits, survey some of the recent methods, discuss their statistical power under various scenarios, and provide practical guidance on analyzing next-generation sequencing data for identifying rare variants associated with complex traits.
Affiliation(s)
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA
|
142
|
Turkmen AS, Lin S. Blocking approach for identification of rare variants in family-based association studies. PLoS One 2014; 9:e86126. [PMID: 24465912 PMCID: PMC3900483 DOI: 10.1371/journal.pone.0086126] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 07/30/2013] [Accepted: 12/09/2013] [Indexed: 01/14/2023] Open
Abstract
With the advent of next-generation sequencing technology, rare variant association analysis is increasingly being conducted to identify genetic variants associated with complex traits. In recent years, significant effort has been devoted to develop powerful statistical methods to test such associations for population-based designs. However, there has been relatively little development for family-based designs although family data have been shown to be more powerful to detect rare variants. This study introduces a blocking approach that extends two popular family-based common variant association tests to rare variants association studies. Several options are considered to partition a genomic region (gene) into "independent" blocks by which information from SNVs is aggregated within a block and an overall test statistic for the entire genomic region is calculated by combining information across these blocks. The proposed methodology allows different variants to have different directions (risk or protective) and specification of minor allele frequency threshold is not needed. We carried out a simulation to verify the validity of the method by showing that type I error is well under control when the underlying null hypothesis and the assumption of independence across blocks are satisfied. Further, data from the Genetic Analysis Workshop [Formula: see text] are utilized to illustrate the feasibility and performance of the proposed methodology in a realistic setting.
Affiliation(s)
- Asuman S Turkmen
- Statistics Department, The Ohio State University, Columbus, Ohio, United States of America ; Statistics Department, The Ohio State University, Newark, Ohio, United States of America
| | - Shili Lin
- Statistics Department, The Ohio State University, Columbus, Ohio, United States of America
|
143
|
Zakharov S, Teoh GHK, Salim A, Thalamuthu A. A method to incorporate prior information into score test for genetic association studies. BMC Bioinformatics 2014; 15:24. [PMID: 24450486 PMCID: PMC3904928 DOI: 10.1186/1471-2105-15-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2012] [Accepted: 01/17/2014] [Indexed: 12/13/2022] Open
Abstract
Background: The interest of the scientific community in investigating the impact of rare variants on complex traits has stimulated the development of novel statistical methodologies for association studies. The fact that many of the recently proposed methods for association studies suffer from low power to identify a genetic association motivates the incorporation of prior knowledge into statistical tests. Results: In this article we propose a methodology to incorporate prior information into the region-based score test. Within our framework prior information is used to partition variants within a region into several groups, following which asymptotically independent group statistics are constructed and then combined into a global test statistic. Under the null hypothesis the distribution of our test statistic has lower degrees of freedom compared with those of the region-based score statistic. Theoretical power comparison, population genetics simulations and results from analysis of the GAW17 sequencing data set suggest that under some scenarios our method may perform as well as or outperform the score test and other competing methods. Conclusions: An approach which uses prior information to improve the power of the region-based score test is proposed. Theoretical power comparison, population genetics simulations and the results of GAW17 data analysis showed that for some scenarios the power of our method is on a level with or higher than that of the score test and other methods.
Affiliation(s)
- Sergii Zakharov
- Human Genetics, Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore.
|
144
|
Won S, Kim Y, Lange C. On rare-variant analysis in population-based designs: decomposing the likelihood to two informative components. Hum Hered 2014; 76:76-85. [PMID: 24434864 DOI: 10.1159/000357643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2012] [Accepted: 11/29/2013] [Indexed: 11/19/2022] Open
Abstract
Various analytical approaches have been suggested for the characterization of rare variants. One main approach is to collapse the genetic information of rare variants in a region and to construct an overall test statistic. Here, we proposed a new approach based on collapsed genotype scores. By utilizing the information of the association signal that is ignored in collapsing methods, i.e. the configuration of rare alleles, we constructed a more powerful test and compared it with existing rare-variant approaches. With extensive simulation studies, we showed that our method performs better than existing approaches, and we applied our method to a sequencing study of nonsyndromic cleft lip illustrating the practical advantages of the proposed method.
Affiliation(s)
- Sungho Won
- Department of Applied Statistics, Chung-Ang University, Seoul, Korea
|
145
|
Rare variant association testing by adaptive combination of P-values. PLoS One 2014; 9:e85728. [PMID: 24454922 PMCID: PMC3893264 DOI: 10.1371/journal.pone.0085728] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/23/2013] [Accepted: 12/02/2013] [Indexed: 01/21/2023] Open
Abstract
With the development of next-generation sequencing technology, there is a great demand for powerful statistical methods to detect rare variants (minor allele frequencies (MAFs)<1%) associated with diseases. Testing for each variant site individually is known to be underpowered, and therefore many methods have been proposed to test for the association of a group of variants with phenotypes, by pooling signals of the variants in a chromosomal region. However, this pooling strategy inevitably leads to the inclusion of a large proportion of neutral variants, which may compromise the power of association tests. To address this issue, we extend the -MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675–685) and propose an approach (named ‘adaptive combination of P-values for rare variant association testing’, abbreviated as ‘ADA’) that adaptively combines per-site P-values with the weights based on MAFs. Before combining P-values, we first imposed a truncation threshold upon the per-site P-values, to guard against the noise caused by the inclusion of neutral variants. This ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.
|
146
|
Ghosh A, Hartge P, Kraft P, Joshi AD, Ziegler RG, Barrdahl M, Chanock SJ, Wacholder S, Chatterjee N. Leveraging family history in population-based case-control association studies. Genet Epidemiol 2014; 38:114-22. [PMID: 24408355 DOI: 10.1002/gepi.21785] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 07/10/2013] [Revised: 11/16/2013] [Accepted: 12/02/2013] [Indexed: 12/28/2022]
Abstract
Population-based epidemiologic studies often gather information from study participants on disease history among their family members. Although investigators widely recognize that family history will be associated with genotypes of the participants at disease susceptibility loci, they commonly ignore such information in primary genetic association analyses. In this report, we propose a simple approach to association testing by incorporating family history information as a "phenotype." We account for the expected attenuation in strength of association of the genotype of study participants with family history under Mendelian transmission. The proposed analysis can be performed using standard statistical software adopting either a meta- or pooled-analysis framework. Re-analysis of a total of 115 known susceptibility single-nucleotide polymorphisms, discovered through genome-wide association studies for several disease traits, indicates that incorporation of family history information can increase efficiency by as much as 40%. Efficiency gain depends on the type of design used for conducting the primary study, extent of family history, and accuracy and completeness of reporting.
Affiliation(s)
- Arpita Ghosh
- Public Health Foundation of India, New Delhi, India
|
147
|
Tang H, Jin X, Li Y, Jiang H, Tang X, Yang X, Cheng H, Qiu Y, Chen G, Mei J, Zhou F, Wu R, Zuo X, Zhang Y, Zheng X, Cai Q, Yin X, Quan C, Shao H, Cui Y, Tian F, Zhao X, Liu H, Xiao F, Xu F, Han J, Shi D, Zhang A, Zhou C, Li Q, Fan X, Lin L, Tian H, Wang Z, Fu H, Wang F, Yang B, Huang S, Liang B, Xie X, Ren Y, Gu Q, Wen G, Sun Y, Wu X, Dang L, Xia M, Shan J, Li T, Yang L, Zhang X, Li Y, He C, Xu A, Wei L, Zhao X, Gao X, Xu J, Zhang F, Zhang J, Li Y, Sun L, Liu J, Chen R, Yang S, Wang J, Zhang X. A large-scale screen for coding variants predisposing to psoriasis. Nat Genet 2014; 46:45-50. [PMID: 24212883 DOI: 10.1038/ng.2827] [Citation(s) in RCA: 166] [Impact Index Per Article: 16.6] [Received: 06/07/2013] [Accepted: 10/17/2013] [Indexed: 12/13/2022]
Abstract
To explore the contribution of functional coding variants to psoriasis, we analyzed nonsynonymous single-nucleotide variants (SNVs) across the genome by exome sequencing in 781 psoriasis cases and 676 controls and through follow-up validation in 1,326 candidate genes by targeted sequencing in 9,946 psoriasis cases and 9,906 controls from the Chinese population. We discovered two independent missense SNVs in IL23R and GJB2 of low frequency and five common missense SNVs in LCE3D, ERAP1, CARD14 and ZNF816A associated with psoriasis at genome-wide significance. Rare missense SNVs in FUT2 and TARBP1 were also observed with suggestive evidence of association. Single-variant and gene-based association analyses of nonsynonymous SNVs did not identify newly associated genes for psoriasis in the regions subjected to targeted resequencing. This suggests that coding variants in the 1,326 targeted genes contribute only a limited fraction of the overall genetic risk for psoriasis.
Affiliation(s)
- Huayang Tang
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Xin Jin
- [1] BGI-Shenzhen, Shenzhen, China. [2] School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
| | - Yang Li
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Hui Jiang
- BGI-Shenzhen, Shenzhen, China
| | - Xianfa Tang
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Xu Yang
- BGI-Shenzhen, Shenzhen, China
| | - Hui Cheng
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Ying Qiu
- Department of Dermatology, Jining No. 1 People's Hospital, Jining, Shandong, China
| | - Gang Chen
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Fusheng Zhou
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Xianbo Zuo
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Xiaodong Zheng
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Qi Cai
- Department of Dermatology, Second Hospital, Chengdu, China
| | - Xianyong Yin
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Cheng Quan
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Yong Cui
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Fangzhen Tian
- Department of Dermatology, Jining No. 1 People's Hospital, Jining, Shandong, China
| | | | - Hong Liu
- Shandong Provincial Institute of Dermatology and Venereology, Shandong Academy of Medical Science, Jinan, China
| | - Fengli Xiao
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Jianwen Han
- Department of Dermatology, Affiliated Hospital of Inner Mongolia Medical College, Huhehot, China
| | - Dongmei Shi
- Department of Dermatology, Jining No. 1 People's Hospital, Jining, Shandong, China
| | - Anping Zhang
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Cheng Zhou
- Department of Dermatology, Peking University People's Hospital, Beijing, China
| | | | - Xing Fan
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Hongqing Tian
- Shandong Provincial Institute of Dermatology and Venereology, Shandong Academy of Medical Science, Jinan, China
| | - Zaixing Wang
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Fang Wang
- Department of Dermatology, Peking University People's Hospital, Beijing, China
| | - Baoqi Yang
- Shandong Provincial Institute of Dermatology and Venereology, Shandong Academy of Medical Science, Jinan, China
| | | | - Bo Liang
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Yunqing Ren
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | | | - Guangdong Wen
- Department of Dermatology, Peking University People's Hospital, Beijing, China
| | - Yulin Sun
- State Key Laboratory of Molecular Oncology, Cancer Institute & Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College and Center of Basic Medical Sciences, Navy General Hospital, Beijing, China
| | | | - Lin Dang
- Department of Dermatology, Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Min Xia
- BGI-Shenzhen, Shenzhen, China
| | - Junjun Shan
- Department of Dermatology, Third People's Hospital of Hangzhou, Hangzhou, China
| | - Tianhang Li
- Department of Dermatology, Jining No. 1 People's Hospital, Jining, Shandong, China
| | | | - Xiuyun Zhang
- Department of Dermatology, Jining No. 1 People's Hospital, Jining, Shandong, China
| | - Yuzhen Li
- Department of Dermatology, Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Chundi He
- Department of Dermatology, No. 1 Hospital of China Medical University, Shenyang, China
| | - Aie Xu
- Department of Dermatology, Third People's Hospital of Hangzhou, Hangzhou, China
| | - Liping Wei
- School of Life Sciences, Peking University, Beijing, China
| | - Xiaohang Zhao
- State Key Laboratory of Molecular Oncology, Cancer Institute & Hospital, Chinese Academy of Medical Sciences & Peking Union Medical College and Center of Basic Medical Sciences, Navy General Hospital, Beijing, China
| | - Xinghua Gao
- Department of Dermatology, No. 1 Hospital of China Medical University, Shenyang, China
| | - Jinhua Xu
- Department of Dermatology, Huashan Hospital of Fudan University, Shanghai, China
| | - Furen Zhang
- Shandong Provincial Institute of Dermatology and Venereology, Shandong Academy of Medical Science, Jinan, China
| | - Jianzhong Zhang
- Department of Dermatology, Peking University People's Hospital, Beijing, China
| | | | - Liangdan Sun
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Jianjun Liu
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Runsheng Chen
- Institute of Biophysics of the Chinese Academy of Sciences, Beijing, China
| | - Sen Yang
- Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China
| | - Jun Wang
- [1] BGI-Shenzhen, Shenzhen, China. [2] Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark. [3] Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Xuejun Zhang
- [1] Department of Dermatology, First Affiliated Hospital, Anhui Medical University, Hefei, China. [2] Department of Dermatology, Huashan Hospital of Fudan University, Shanghai, China
|
148
|
Fan R, Lo SH. A robust model-free approach for rare variants association studies incorporating gene-gene and gene-environmental interactions. PLoS One 2013; 8:e83057. [PMID: 24358248 PMCID: PMC3866272 DOI: 10.1371/journal.pone.0083057] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/24/2013] [Accepted: 10/30/2013] [Indexed: 11/19/2022] Open
Abstract
Recently, more and more evidence suggests that rare variants with much lower minor allele frequencies play significant roles in disease etiology. Advances in next-generation sequencing technologies will lead to many more rare variants association studies. Several statistical methods have been proposed to assess the effect of rare variants by aggregating information from multiple loci across a genetic region and testing the association between the phenotype and aggregated genotype. One limitation of existing methods is that they only look into the marginal effects of rare variants but do not systematically take into account effects due to interactions among rare variants and between rare variants and environmental factors. In this article, we propose the summation of partition approach (SPA), a robust model-free method that is designed specifically for detecting both marginal effects and effects due to gene-gene (G×G) and gene-environmental (G×E) interactions for rare variants association studies. SPA has three advantages. First, it accounts for the interaction information and gains considerable power in the presence of unknown and complicated G×G or G×E interactions. Second, it does not sacrifice the marginal detection power; when rare variants only have marginal effects, it is comparable with the most competitive method in the current literature. Third, it is easy to extend and can incorporate more complex interactions; other practitioners and scientists can tailor the procedure to fit their own studies. Our simulation studies show that SPA is considerably more powerful than many existing methods in the presence of G×G and G×E interactions.
Affiliation(s)
- Ruixue Fan
- Department of Statistics, Columbia University, New York, New York, United States of America
| | - Shaw-Hwa Lo
- Department of Statistics, Columbia University, New York, New York, United States of America
|
149
|
Cheng KF, Lee JY, Zheng W, Li C. A powerful association test of multiple genetic variants using a random-effects model. Stat Med 2013; 33:1816-27. [PMID: 24338936 DOI: 10.1002/sim.6068] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 02/19/2012] [Revised: 11/09/2013] [Accepted: 11/19/2013] [Indexed: 01/26/2023]
Abstract
There is an emerging interest in sequencing-based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants with or without weighting. Recently, a variance-component score test [sequence kernel association test (SKAT)] was proposed to address the limitations of collapsing methods. Although SKAT was shown to outperform most of the alternative tests, its applications and power might be restricted and influenced by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T_REM, is derived from a random-effects model and allows for missing genotypes, and the choice of weighting function is not required when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the numbers of non-causal rare variants and/or causal common variants. The simulation results showed that T_REM was a valid test and less sensitive to the inclusion of non-causal rare variants and/or low-effect common variants or to the presence of missing genotypes. When the effects were more consistent in the same direction, T_REM also had better power performance. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants at the FGFR2 gene were detected by T_REM and SKAT, but T_REM produced more consistent results for different sets of rare and common variants.
Affiliation(s)
- K F Cheng
- Biostatistics Center and Department of Public Health, Taipei Medical University, Taiwan
|
150
|
Freytag S, Bickeböller H. Comparison of three summary statistics for ranking genes in genome-wide association studies. Stat Med 2013; 33:1828-41. [PMID: 24323702 DOI: 10.1002/sim.6063] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/12/2013] [Revised: 09/11/2013] [Accepted: 11/18/2013] [Indexed: 01/30/2023]
Abstract
Problems associated with insufficient power have haunted the analysis of genome-wide association studies and are likely to be the main challenge for the analysis of next-generation sequencing data. Ranking genes according to their strength of association with the investigated phenotype is one solution. To obtain rankings for genes, researchers can draw from a wide range of statistics summarizing the relationships between variants mapped to a gene and the phenotype. Hence, it is of interest to explore the performance of these statistics in the context of rankings. To this end, we conducted a simulation study (limited to genes of equal sizes) of three different summary statistics, examining their ability to rank genes in a meaningful order. The weighted sum of squared marginal score test (Pan, 2009), the RareCover algorithm (Bhatia et al., 2010) and the elastic net regularization (Zou and Hastie, 2005) were chosen because they can handle common as well as rare variants. The test based on the score statistic outperformed the other two methods in almost all investigated scenarios. It was the only measure to consistently detect genes with interacting causal variants. However, the RareCover algorithm proved better than the weighted sum of squared marginal score test at identifying genes that include causal variants with small effect sizes and low minor allele frequency. The performance of the elastic net regularization was unimpressive for all but the simplest scenarios.
Affiliation(s)
- Saskia Freytag
- Institute of Genetic Epidemiology, University of Göttingen, Humboldtallee 32, Medical School, 37073 Göttingen, Germany
|