151
Abstract
In humans, most of the genetic variation is rare and often population-specific. Whereas the role of rare genetic variants in familial monogenic diseases is firmly established, we are only now starting to explore the contribution of this class of genetic variation to human common diseases and other complex traits. Such large-scale experiments are possible due to the development of next-generation DNA sequencing. Early findings suggested that rare and low-frequency coding variation might have a large effect on human phenotypes (eg, PCSK9 missense variants on low-density lipoprotein-cholesterol and coronary heart diseases). This observation sparked excitement in prognostic and diagnostic medicine, as well as in genetics-driven strategies to develop new drugs. In this review, I describe results and present initial conclusions regarding some of the recent rare and low-frequency variant discoveries. We can already assume that most phenotype-associated rare and low-frequency variants have modest-to-weak phenotypical effect. Thus, we will need large cohorts to identify them, as for common variants in genome-wide association studies. As we expand the list of associated rare and low-frequency variants, we can also better recognise the current limitations: we need to develop better statistical methods to optimally test association with rare variants, including non-coding variation, and to account for potential confounders such as population stratification.
Affiliation(s)
- Guillaume Lettre
- Montreal Heart Institute, Montreal, Quebec, Canada; Faculty of Medicine, Department of Medicine, Université de Montréal, Montreal, Quebec, Canada
152
Lin YC, Hsieh AR, Hsiao CL, Wu SJ, Wang HM, Lian IB, Fann CSJ. Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model. J Biomed Sci 2014; 21:88. [PMID: 25175702 PMCID: PMC4428531 DOI: 10.1186/s12929-014-0088-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 07/25/2014] [Accepted: 08/21/2014] [Indexed: 01/06/2023]
Abstract
BACKGROUND Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson's disease remains unknown, suggesting that many more risk loci are yet to be identified. Rare variants have become important in disease association studies for explaining missing heritability. Methods for detecting this type of association require prior knowledge of candidate genes and combine variants within the region. These methods may suffer from power loss in situations with many neutral variants or causal variants with opposite effects. RESULTS We propose a method capable of scanning genetic variants to identify the region most likely to harbour a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant based on our scoring system. It uses aggregate scores to identify the region with disease association. We evaluate performance by simulation based on 1000 Genomes sequencing data and compare with three commonly used methods. We use a Parkinson's disease case-control dataset as a model to demonstrate the application of our method. Our method has better power than CMC and WSS and similar power to SKAT-O with well-controlled type I error under simulation based on 1000 Genomes sequencing data. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson's disease (p = 0.005). We further identify association with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson's disease. CONCLUSIONS Our method is time efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a chromosome to a disease associated region. Using Parkinson's disease as a model, our method not only confirms association for a known gene but also identifies two genes previously found by other studies. In spite of many existing methods, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.
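The abstract above contrasts its scoring method with approaches that combine variants within a region. As a rough illustration of that family only, the sketch below collapses rare variants into a per-individual carrier indicator and compares carrier counts between cases and controls with a two-sided Fisher exact test. It is a simplified burden-style test, not the authors' scoring system and not a full CMC, WSS or SKAT-O implementation. The function names, the 0.01 frequency threshold and the toy genotypes are all assumptions made for the example.

```python
from math import comb


def collapse_carriers(genotypes, mafs, maf_threshold=0.01):
    """Return 1 for each individual carrying a minor allele at any rare variant, else 0.

    genotypes: one row per individual, one minor-allele count (0, 1 or 2) per variant.
    mafs: minor allele frequency of each variant (assumed known in advance).
    """
    rare = [j for j, maf in enumerate(mafs) if maf < maf_threshold]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def burden_test(genotypes, is_case, mafs, maf_threshold=0.01):
    """Compare rare-variant carrier status between cases and controls."""
    carriers = collapse_carriers(genotypes, mafs, maf_threshold)
    a = sum(1 for c, y in zip(carriers, is_case) if c and y)          # case carriers
    b = sum(1 for c, y in zip(carriers, is_case) if not c and y)      # case non-carriers
    c_ = sum(1 for c, y in zip(carriers, is_case) if c and not y)     # control carriers
    d = sum(1 for c, y in zip(carriers, is_case) if not c and not y)  # control non-carriers
    return fisher_exact_two_sided(a, b, c_, d)


# Toy data (hypothetical): 4 cases then 4 controls, 3 variants; the middle one is common.
toy_mafs = [0.005, 0.2, 0.002]
toy_genotypes = [
    [1, 0, 0], [0, 1, 1], [1, 2, 0], [0, 0, 1],  # cases
    [0, 1, 0], [0, 2, 0], [0, 0, 0], [0, 1, 0],  # controls
]
toy_is_case = [True] * 4 + [False] * 4
toy_p = burden_test(toy_genotypes, toy_is_case, toy_mafs)
```

Because every rare variant is collapsed into a single indicator, a test of this kind loses power when many collapsed variants are neutral or when causal variants act in opposite directions, which is the limitation the abstract describes.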
Affiliation(s)
- Ying-Chao Lin
- Institute of Biomedical Informatics, National Yang-Ming University, Taipei, Taiwan; Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Ai-Ru Hsieh
- Graduate Institute of Biostatistics, China Medical University, Taichung, Taiwan.
- Ching-Lin Hsiao
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Shang-Jung Wu
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Hui-Min Wang
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan.
- Ie-Bin Lian
- Graduate Institute of Statistics and Information Science, National Changhua University of Education, Changhua, Taiwan.
- Cathy S J Fann
- Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan; Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan; Institute of Public Health, National Yang-Ming University, Taipei, Taiwan.
153
Sambo F, Malovini A, Sandholm N, Stavarachi M, Forsblom C, Mäkinen VP, Harjutsalo V, Lithovius R, Gordin D, Parkkonen M, Saraheimo M, Thorn LM, Tolonen N, Wadén J, He B, Osterholm AM, Tuomilehto J, Lajer M, Salem RM, McKnight AJ, Tarnow L, Panduru NM, Barbarini N, Di Camillo B, Toffolo GM, Tryggvason K, Bellazzi R, Cobelli C, Groop PH. Novel genetic susceptibility loci for diabetic end-stage renal disease identified through robust naive Bayes classification. Diabetologia 2014; 57:1611-22. [PMID: 24871321 DOI: 10.1007/s00125-014-3256-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 12/10/2013] [Accepted: 04/11/2014] [Indexed: 10/25/2022]
Abstract
AIMS/HYPOTHESIS Diabetic nephropathy is a major diabetic complication, and diabetes is the leading cause of end-stage renal disease (ESRD). Family studies suggest a hereditary component for diabetic nephropathy. However, only a few genes have been associated with diabetic nephropathy or ESRD in diabetic patients. Our aim was to detect novel genetic variants associated with diabetic nephropathy and ESRD. METHODS We exploited a novel algorithm, 'Bag of Naive Bayes', whose marker selection strategy is complementary to that of conventional genome-wide association models based on univariate association tests. The analysis was performed on a genome-wide association study of 3,464 patients with type 1 diabetes from the Finnish Diabetic Nephropathy (FinnDiane) Study and subsequently replicated with 4,263 type 1 diabetes patients from the Steno Diabetes Centre, the All Ireland-Warren 3-Genetics of Kidneys in Diabetes UK collection (UK-Republic of Ireland) and the Genetics of Kidneys in Diabetes US Study (GoKinD US). RESULTS Five genetic loci (WNT4/ZBTB40-rs12137135, RGMA/MCTP2-rs17709344, MAPRE1P2-rs1670754, SEMA6D/SLC24A5-rs12917114 and SIK1-rs2838302) were associated with ESRD in the FinnDiane study. An association between ESRD and rs17709344, tagging the previously identified rs12437854 and located between the RGMA and MCTP2 genes, was replicated in independent case-control cohorts. rs12917114 near SEMA6D was associated with ESRD in the replication cohorts under the genotypic model (p < 0.05), and rs12137135 upstream of WNT4 was associated with ESRD in Steno. CONCLUSIONS/INTERPRETATION This study supports the previously identified findings on the RGMA/MCTP2 region and suggests novel susceptibility loci for ESRD. This highlights the importance of applying complementary statistical methods to detect novel genetic variants in diabetic nephropathy and, in general, in complex diseases.
Affiliation(s)
- Francesco Sambo
- Department of Information Engineering, University of Padova, Padova, Italy
154
Marttinen P, Pirinen M, Sarin AP, Gillberg J, Kettunen J, Surakka I, Kangas AJ, Soininen P, O'Reilly P, Kaakinen M, Kähönen M, Lehtimäki T, Ala-Korpela M, Raitakari OT, Salomaa V, Järvelin MR, Ripatti S, Kaski S. Assessing multivariate gene-metabolome associations with rare variants using Bayesian reduced rank regression. Bioinformatics 2014; 30:2026-34. [PMID: 24665129 PMCID: PMC4080737 DOI: 10.1093/bioinformatics/btu140] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 11/07/2013] [Revised: 02/27/2014] [Accepted: 03/04/2014] [Indexed: 01/31/2023]
Abstract
MOTIVATION A typical genome-wide association study searches for associations between single nucleotide polymorphisms (SNPs) and a univariate phenotype. However, there is growing interest in investigating associations between genomics data and multivariate phenotypes, for example, in gene expression or metabolomics studies. A common approach is to perform a univariate test between each genotype-phenotype pair, and then to apply a stringent significance cutoff to account for the large number of tests performed. However, this approach has limited ability to uncover dependencies involving multiple variables. Another trend in current genetics is the investigation of the impact of rare variants on the phenotype, where the standard methods often fail owing to lack of power when the minor allele is present in only a limited number of individuals. RESULTS We propose a new statistical approach based on Bayesian reduced rank regression to assess the impact of multiple SNPs on a high-dimensional phenotype. Because of the method's ability to combine information over multiple SNPs and phenotypes, it is particularly suitable for detecting associations involving rare variants. We demonstrate the potential of our method and compare it with alternatives using the Northern Finland Birth Cohort with 4702 individuals, for whom genome-wide SNP data along with lipoprotein profiles comprising 74 traits are available. We discovered two genes (XRCC4 and MTHFD2L) without previously reported associations, which replicated in a combined analysis of two additional cohorts: 2390 individuals from the Cardiovascular Risk in Young Finns study and 3659 individuals from the FINRISK study. AVAILABILITY AND IMPLEMENTATION R-code freely available for download at http://users.ics.aalto.fi/pemartti/gene_metabolome/.
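To make the "stringent significance cutoff" for pairwise univariate testing concrete, the sketch below computes a Bonferroni-style per-test threshold when every SNP is tested against every trait. Bonferroni is only one possible correction and is an assumption here, since the abstract does not name the cutoff used. The 74 traits come from the abstract, while the SNP count is a hypothetical placeholder.

```python
def bonferroni_threshold(n_snps, n_traits, alpha=0.05):
    """Per-test p-value cutoff when every SNP is tested against every trait."""
    n_tests = n_snps * n_traits  # one univariate test per genotype-phenotype pair
    return alpha / n_tests


# 74 lipoprotein traits (from the abstract); 500,000 SNPs is a hypothetical count.
threshold = bonferroni_threshold(n_snps=500_000, n_traits=74)
```

The cutoff shrinks in proportion to the number of SNP-trait pairs, which is why pairwise testing loses power for rare variants and high-dimensional phenotypes.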
Affiliation(s)
- Pekka Marttinen
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, FinlandDepartment of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable 
Dise
| | - Matti Pirinen
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| | - Antti-Pekka Sarin
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, FinlandDepartment of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable 
Dise
| | - Jussi Gillberg
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| | - Johannes Kettunen
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, FinlandDepartment of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable 
Dise
| | - Ida Surakka
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, FinlandDepartment of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable 
Dise
| | - Antti J Kangas
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| | - Pasi Soininen
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, FinlandDepartment of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable 
Dise
| | - Paul O'Reilly
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| | - Marika Kaakinen
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, FinlandDepartment of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable 
Dise
| | - Mika Kähönen
- Department of Information and Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, Esbo, Finland, Center for Communicable Disease Dynamics, Harvard School of Public Health, Boston, MA, USA Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Unit of Public Health Genomics, National Institute for Health and Welfare, Helsinki, Computational Medicine, Institute of Health Sciences, University of Oulu and Oulu University Hospital, Oulu, NMR Metabolomics Laboratory, School of Pharmacy, University of Eastern Finland, Kuopio, Finland, Department of Epidemiology and Biostatistics, MRC Health Protection, Agency (HPA) Centre for Environment and Health, School of Public Health, Imperial College, London, UK, Institute of Health Sciences, Biocenter Oulu, University of Oulu, Oulu, Department of Clinical Physiology, Tampere University Hospital and University of Tampere, Department of Clinical Chemistry, Fimlab Laboratories, University of Tampere School of Medicine, Tampere, Finland, Computational Medicine, School of Social and Community Medicine and the Medical Research Council Integrative Epidemiology Unit, University of Bristol, Bristol, UK, Department of Clinical Physiology and Nuclear Medicine, Research Centre of Applied and Preventive Cardiovascular Medicine, University of Turku and Turku University Hospital, Turku, Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Unit of Primary Care, Oulu University Hospital, Department of Children and Young People and Families, National Institute for Health and Welfare, Oulu, Finland, Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK, Hjelt Institute and Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Helsinki, Finland
| | - Terho Lehtimäki
| | - Mika Ala-Korpela
| | - Olli T Raitakari
| | - Veikko Salomaa
| | - Marjo-Riitta Järvelin
| | - Samuli Ripatti
| | - Samuel Kaski
|
155
|
Abstract
The use of genetically isolated populations can empower next-generation association studies. In this review, we discuss the advantages of this approach and review study design and analytical considerations of genetic association studies focusing on isolates. We cite successful examples of using population isolates in association studies and outline potential ways forward.
|
156
|
Zhao SD, Cai TT, Li H. More powerful genetic association testing via a new statistical framework for integrative genomics. Biometrics 2014; 70:881-90. [PMID: 24975802 DOI: 10.1111/biom.12206] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 05/01/2013] [Revised: 05/01/2014] [Accepted: 05/01/2014] [Indexed: 11/30/2022]
Abstract
Integrative genomics offers a promising approach to more powerful genetic association studies. The hope is that combining outcome and genotype data with other types of genomic information can lead to more powerful SNP detection. We present a new association test based on a statistical model that explicitly assumes that genetic variations affect the outcome through perturbing gene expression levels. It is shown analytically that the proposed approach can have more power to detect SNPs that are associated with the outcome through transcriptional regulation, compared to tests using the outcome and genotype data alone, and simulations show that our method is relatively robust to misspecification. We also provide a strategy for applying our approach to high-dimensional genomic data. We use this strategy to identify a potentially new association between a SNP and a yeast cell's response to the natural product tomatidine, which standard association analysis did not detect.
Affiliation(s)
- Sihai D Zhao
- Department of Statistics, University of Illinois at Urbana-Champaign, Champaign, Illinois 61820, U.S.A
|
157
|
Mosley JD, Van Driest SL, Weeke PE, Delaney JT, Wells QS, Bastarache L, Roden DM, Denny JC. Integrating EMR-linked and in vivo functional genetic data to identify new genotype-phenotype associations. PLoS One 2014; 9:e100322. [PMID: 24949630 PMCID: PMC4065041 DOI: 10.1371/journal.pone.0100322] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 02/17/2014] [Accepted: 05/25/2014] [Indexed: 12/31/2022]
Abstract
The coupling of electronic medical records (EMR) with genetic data has created the potential for implementing reverse genetic approaches in humans, whereby the function of a gene is inferred from the shared pattern of morbidity among homozygotes of a genetic variant. We explored the feasibility of this approach to identify phenotypes associated with low frequency variants using Vanderbilt's EMR-based BioVU resource. We analyzed 1,658 low frequency non-synonymous SNPs (nsSNPs) with a minor allele frequency (MAF) <10% collected on 8,546 subjects. For each nsSNP, we identified diagnoses shared by at least 2 minor allele homozygotes and with an association p<0.05. The diagnoses were reviewed by a clinician to ascertain whether they may share a common mechanistic basis. While a number of biologically compelling clinical patterns of association were observed, the frequency of these associations was identical to that observed using genotype-permuted data sets, indicating that the associations were likely due to chance. To refine our analysis, we then restricted it to 711 nsSNPs in genes with phenotypes in the Online Mendelian Inheritance in Man (OMIM) or knock-out mouse phenotype databases. An initial comparison of the EMR diagnoses to the known in vivo functions of the gene identified 25 candidate nsSNPs, 19 of which had significant genotype-phenotype associations when tested using matched controls. Twelve of the 19 nsSNP associations were confirmed by a detailed record review. Four of the 12 nsSNP-phenotype associations were successfully replicated in an independent data set: thrombosis (F5, rs6031), seizures/convulsions (GPR98, rs13157270), macular degeneration (CNGB3, rs3735972), and GI bleeding (HGFAC, rs16844401). These analyses demonstrate the feasibility and challenges of using reverse genetics approaches to identify novel gene-phenotype associations in human subjects using low frequency variants. As increasing amounts of rare variant data are generated from modern genotyping and sequencing platforms, model organism data may be an important tool to enable discovery.
Affiliation(s)
- Jonathan D. Mosley
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Sara L. Van Driest
- Department of Pediatrics, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Peter E. Weeke
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Jessica T. Delaney
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Quinn S. Wells
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Lisa Bastarache
- Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Dan M. Roden
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Josh C. Denny
- Department of Medicine, Vanderbilt University, Nashville, Tennessee, United States of America
- Biomedical Informatics, Vanderbilt University, Nashville, Tennessee, United States of America
|
158
|
Gui H, Bao JY, Tang CSM, So MT, Ngo DN, Tran AQ, Bui DH, Pham DH, Nguyen TL, Tong A, Lok S, Sham PC, Tam PKH, Cherny SS, Garcia-Barcelo MM. Targeted next-generation sequencing on Hirschsprung disease: a pilot study exploits DNA pooling. Ann Hum Genet 2014; 78:381-7. [PMID: 24947032 DOI: 10.1111/ahg.12076] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/06/2014] [Accepted: 05/07/2014] [Indexed: 12/11/2022]
Abstract
To adopt an efficient approach for identifying rare variants possibly related to Hirschsprung disease (HSCR), a pilot study was set up to evaluate the performance of a newly designed protocol for next-generation targeted resequencing. In total, 20 Chinese HSCR patients and 20 Chinese sex-matched individuals without HSCR were included, for which coding sequences (CDS) of 62 genes known to be in signaling pathways relevant to enteric nervous system development were selected for capture and sequencing. Blood DNAs from eight pools of five cases or controls were enriched by PCR-based RainDance technology (RDT) and then sequenced on a 454 FLX platform. As technical validation, five patients from case Pool-3 were also independently enriched by RDT, indexed with barcodes and sequenced with sufficient coverage. Assessment of CDS single nucleotide variants showed that DNA pooling performed well (specificity/sensitivity at 98.4%/83.7%) at the common variant level, but relatively worse (specificity/sensitivity at 65.5%/61.3%) at the rare variant level. Further Sanger sequencing validated only five of 12 rare damaging variants likely involved in HSCR. Hence, further improvement in variant detection and sequencing technology is needed to realize the potential of DNA pooling for large-scale resequencing projects.
Affiliation(s)
- Hongsheng Gui
- Department of Surgery, The University of Hong Kong, Hong Kong, SAR, China; Department of Psychiatry, The University of Hong Kong, Hong Kong, SAR, China
|
159
|
Mallaney C, Sung YJ. Rare variant analysis of blood pressure phenotypes in the Genetic Analysis Workshop 18 whole genome sequencing data using sequence kernel association test. BMC Proc 2014; 8:S10. [PMID: 25519353 PMCID: PMC4143707 DOI: 10.1186/1753-6561-8-s1-s10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022]
Abstract
The sequence kernel association test (SKAT) has become one of the most commonly used nonburden tests for analyzing rare variants. Performance of burden tests depends on the weighting of rare and common variants when collapsing them in a genomic region. Using the systolic and diastolic blood pressure phenotypes of 142 unrelated individuals in the Genetic Analysis Workshop 18 data, we investigated whether performance of SKAT also depends on the weighting scheme. We analyzed the entire sequencing data for all 200 replications using 3 weighting schemes: equal weighting, Madsen-Browning weighting, and SKAT default linear weighting. We considered two options: all single-nucleotide polymorphisms (SNPs) and only low-frequency SNPs. The SKAT default weighting scheme (which heavily downweights common variants) performed better for the genes in which causal SNPs are mostly rare. This SKAT default weighting scheme behaved similarly to other weighting schemes after eliminating all common SNPs. In contrast, the equal weighting scheme performed the best for MAP4 and FLT3, both of which included a common variant with a large effect. However, SKAT with all 3 weighting schemes performed poorly. Overall power across all causal genes was about 0.05, which was almost identical to the type I error rate. This poor performance is partly due to a small sample size because of the need to analyze only unrelated individuals. Because half of the causal SNPs were not found in the annotation file based on the 1000 Genomes Project, we suspect that performance was also affected by our use of incomplete annotation information.
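The three weighting schemes compared in this abstract can be sketched as functions of minor allele frequency (MAF). This is an illustrative sketch, not the authors' code. The function names are hypothetical, the Madsen-Browning weight is simplified to 1/sqrt(q(1-q)) (the published weight also involves the sample size and estimates q in unaffected individuals), and the SKAT default is assumed to be the Beta(1, 25) density evaluated at the MAF, as in the defaults of the SKAT R package.

```python
import math


def equal_weight(maf):
    """Every variant contributes equally, regardless of frequency."""
    return 1.0


def madsen_browning_weight(maf):
    """Simplified Madsen-Browning weight: 1 / sqrt(q * (1 - q)).

    Assumption: sample size and control-only frequency estimation are omitted.
    """
    return 1.0 / math.sqrt(maf * (1.0 - maf))


def skat_beta_weight(maf, a=1.0, b=25.0):
    """Beta(a, b) density at the MAF; a=1, b=25 is assumed as the SKAT default."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    log_dens = log_norm + (a - 1.0) * math.log(maf) + (b - 1.0) * math.log(1.0 - maf)
    return math.exp(log_dens)


if __name__ == "__main__":
    # Compare how strongly each scheme favors a rare variant over a common one.
    for maf in (0.01, 0.05, 0.30):
        print(maf, equal_weight(maf), madsen_browning_weight(maf), skat_beta_weight(maf))
```

Under these assumptions the Beta(1, 25) weight falls off much faster with MAF than the Madsen-Browning weight, which is consistent with the abstract's remark that the default scheme heavily downweights common variants.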
Affiliation(s)
- Cates Mallaney
- Division of Biostatistics, Washington University in St. Louis, School of Medicine, St. Louis, MO 63110, USA
| | - Yun Ju Sung
- Division of Biostatistics, Washington University in St. Louis, School of Medicine, St. Louis, MO 63110, USA
|
160
|
Nalpathamkalam T, Derkach A, Paterson AD, Merico D. Genetic Analysis Workshop 18 single-nucleotide variant prioritization based on protein impact, sequence conservation, and gene annotation. BMC Proc 2014; 8:S11. [PMID: 25519362 PMCID: PMC4143669 DOI: 10.1186/1753-6561-8-s1-s11] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/07/2023]
Abstract
Grouping variants based on gene mapping can augment the power of rare variant association tests. Weighting or sorting variants based on their expected functional impact can provide additional benefit. We defined groups of prioritized variants based on systematic annotation of Genetic Analysis Workshop 18 (GAW18) single-nucleotide variants; we focused on variants detected by whole genome sequencing, specifically on the high-quality subset presented in the genotype files. First, we divided variants into coding and noncoding. Coding variants are fewer than 1% of the total and are more likely to have a biological effect than noncoding variants. Coding variants were further stratified into protein changing and protein damaging groups based on the effect on protein amino acid sequence. In particular, missense variants predicted to be damaging, splice-site alterations, and stop gains were assigned to the protein damaging category. The impact of noncoding variants is more difficult to predict. We decided to rely solely on conservation: we combined (a) the mammalian phastCons Conserved Element and (b) the PhyloP score, which identify conserved intervals and the single-nucleotide position, respectively. This reduced the noncoding variants to a number comparable to coding variants. Finally, using gene structure definition from the widely used RefSeq database, we mapped variants to genes to support association tests that require collapsing rare variants to genes. Companion GAW18 papers used these variant priority groups and gene mapping; one of these papers specifically found evidence of stronger association signal for protein damaging variants.
Affiliation(s)
- Thomas Nalpathamkalam
- The Centre for Applied Genomics, The Hospital for Sick Children, 101 College Street, M5G 1L7 Toronto, ON, Canada ; Program in Genetics and Genome Biology, The Hospital for Sick Children, 101 College Street, M5G 1L7 Toronto, ON, Canada
| | - Andriy Derkach
- Department of Statistics, University of Toronto, 100 St. George St., M5S 3G3 Toronto, ON, Canada
| | - Andrew D Paterson
- Program in Genetics and Genome Biology, The Hospital for Sick Children, 101 College Street, M5G 1L7 Toronto, ON, Canada ; Division of Biostatistics, Dalla Lana School of Public Health, 155 College Street, University of Toronto, M5T 3M7 Toronto, ON, Canada
| | - Daniele Merico
- The Centre for Applied Genomics, The Hospital for Sick Children, 101 College Street, M5G 1L7 Toronto, ON, Canada ; Program in Genetics and Genome Biology, The Hospital for Sick Children, 101 College Street, M5G 1L7 Toronto, ON, Canada
|
161
|
Feng T, Zhu X. Whole genome sequencing data from pedigrees suggests linkage disequilibrium among rare variants created by population admixture. BMC Proc 2014; 8:S44. [PMID: 25519326 PMCID: PMC4143626 DOI: 10.1186/1753-6561-8-s1-s44] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Next-generation sequencing technologies have been designed to discover rare and de novo variants and are an important tool for identifying rare disease variants. Many statistical methods have been developed to test, using next-generation sequencing data, for rare variants that are associated with a trait. However, many of these methods make assumptions that rare variants are in linkage equilibrium in a gene. In this report, we studied whether transmitted or untransmitted haplotypes carry an excess of rare variants using the whole genome sequencing data of 15 large Mexican American pedigrees provided by the Genetic Analysis Workshop 18. We observed that an excess of rare variants are carried on either transmitted or nontransmitted haplotypes from parents to offspring. Further analyses suggest that such nonrandom associations among rare variants can be attributed to population admixture and single-nucleotide variant calling errors. Our results have significant implications for rare variant association studies, especially those conducted in admixed populations.
Affiliation(s)
- Tao Feng
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
| | - Xiaofeng Zhu
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Wolstein Research Building, 2103 Cornell Road, Cleveland, OH 44106, USA
|
162
|
Abstract
Genome-wide association studies have successfully identified common variants that are associated with complex diseases. However, the majority of genetic variants contributing to disease susceptibility are yet to be discovered. It is now widely believed that multiple rare variants are likely to be associated with complex diseases. Using custom-made chips or next-generation sequencing to uncover the effects of rare variants on disease can be very expensive with current technology. Consequently, many researchers use genotype imputation to predict the genotypes at rare variants that are not directly genotyped in the study sample. One important question in genotype imputation is how to choose a reference panel that will produce high imputation accuracy in a population of interest. Using whole genome sequence data from the Genetic Analysis Workshop 18 data set, this report compares genotype imputation accuracy among reference panels representing different degrees of genetic similarity to a study sample of admixed Mexican Americans. Results show that a reference panel that closely matches the ancestry of the study population can increase imputation accuracy, but it can also result in more missing genotype calls. Having a larger reference panel can reduce imputation error and missing genotypes, but the improvement may be limited. We also find that, for the admixed study sample, the simple selection of a single best reference panel among HapMap African, European, or Asian populations is not appropriate. The composite reference panel combining all available reference data should be used.
Affiliation(s)
- Guan-Hua Huang
- Institute of Statistics, National Chiao Tung University, 1001 University Road, Hsinchu 30010, Taiwan
| | - Yi-Chi Tseng
- Institute of Statistics, National Chiao Tung University, 1001 University Road, Hsinchu 30010, Taiwan
|
163
|
Dering C, Schillert A, König IR, Ziegler A. A comparison of two collapsing methods in different approaches. BMC Proc 2014; 8:S8. [PMID: 25519408 PMCID: PMC4143760 DOI: 10.1186/1753-6561-8-s1-s8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
Abstract
Sequencing technologies have enabled the investigation of whole genomes of many individuals in parallel. Studies have shown that the joint consideration of multiple rare variants may explain a relevant proportion of the genetic basis for disease so that grouping of rare variants, termed collapsing, can enrich the association signal. Following this assumption, we investigate the type I error and the power of two proposed collapsing methods (combined multivariate and collapsing method and the functional principal component analysis [FPCA]-based statistic) using the case-control data provided for the Genetic Analysis Workshop 18 with knowledge of the true model. Variants with a minor allele frequency (MAF) of 0.05 or less were collapsed per gene for combined multivariate and collapsing. Neither of the methods detected any of the truly associated genes reliably. Although combined multivariate and collapsing identified one gene with a power of 0.66, it had an unacceptably high false-positive rate of 75%. In contrast, FPCA covered the type I error level well but at the cost of low power. A strict filtering of variants by small MAF might lead to a better performance of the collapsing methods. Furthermore, the inclusion of information on functionality of the variants could be helpful.
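The per-gene collapsing step described in this abstract (variants with MAF of 0.05 or less collapsed per gene) can be sketched as follows. This is a minimal illustration of the collapsing step only, not the authors' pipeline: the function name, the genotype coding (0, 1, or 2 minor alleles per variant), and the toy data are assumptions, and the full combined multivariate and collapsing method goes on to test the collapsed indicator jointly with common variants, which is not shown here.

```python
def collapse_rare(genotypes, mafs, maf_cutoff=0.05):
    """Return one 0/1 indicator per individual for a single gene.

    genotypes: list of rows, one per individual, each a list of minor-allele
               counts (0, 1, or 2) for the variants in the gene (assumed coding).
    mafs:      minor allele frequency of each variant, in the same column order.
    The indicator is 1 if the individual carries a minor allele at any
    variant whose MAF is at or below the cutoff.
    """
    rare_columns = [j for j, q in enumerate(mafs) if q <= maf_cutoff]
    return [1 if any(row[j] > 0 for j in rare_columns) else 0 for row in genotypes]


if __name__ == "__main__":
    # Toy data: four individuals, three variants; only the first two are rare.
    toy_genotypes = [
        [0, 0, 1],
        [0, 1, 0],
        [0, 0, 0],
        [1, 0, 2],
    ]
    toy_mafs = [0.01, 0.03, 0.30]
    print(collapse_rare(toy_genotypes, toy_mafs))
```

Lowering `maf_cutoff` in this sketch corresponds to the stricter MAF filtering that the abstract suggests might improve performance.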
Affiliation(s)
- Carmen Dering
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Haus. 24, 23562 Lübeck, Germany
- Arne Schillert
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Haus. 24, 23562 Lübeck, Germany
- Inke R König
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Haus. 24, 23562 Lübeck, Germany
- Andreas Ziegler
- Institut für Medizinische Biometrie und Statistik, Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Ratzeburger Allee 160, Haus. 24, 23562 Lübeck, Germany; Zentrum für Klinische Studien, Universität zu Lübeck, Ratzeburger Allee 160, Haus. 2, 23562 Lübeck, Germany
164
Sham PC, Purcell SM. Statistical power and significance testing in large-scale genetic studies. Nat Rev Genet 2014; 15:335-46. [PMID: 24739678 DOI: 10.1038/nrg3706] [Citation(s) in RCA: 377] [Impact Index Per Article: 34.3] [Indexed: 12/13/2022]
Abstract
Significance testing was developed as an objective method for summarizing statistical evidence for a hypothesis. It has been widely adopted in genetic studies, including genome-wide association studies and, more recently, exome sequencing studies. However, significance testing in both genome-wide and exome-wide studies must adopt stringent significance thresholds to allow for multiple testing, and it is useful only when studies have adequate statistical power, which depends on the characteristics of the phenotype and the putative genetic variant, as well as the study design. Here, we review the principles and applications of significance testing and power calculation, including recently proposed gene-based tests for rare variants.
Affiliation(s)
- Pak C Sham
- Centre for Genomic Sciences, Jockey Club Building for Interdisciplinary Research; State Key Laboratory of Brain and Cognitive Sciences, and Department of Psychiatry, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong SAR, China
- Shaun M Purcell
- [1] Center for Statistical Genetics, Icahn School of Medicine at Mount Sinai, New York 10029-6574, USA. [2] Center for Human Genetic Research, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA
165
Scott-Van Zeeland AA, Bloss CS, Tewhey R, Bansal V, Torkamani A, Libiger O, Duvvuri V, Wineinger N, Galvez L, Darst BF, Smith EN, Carson A, Pham P, Phillips T, Villarasa N, Tisch R, Zhang G, Levy S, Murray S, Chen W, Srinivasan S, Berenson G, Brandt H, Crawford S, Crow S, Fichter MM, Halmi KA, Johnson C, Kaplan AS, La Via M, Mitchell JE, Strober M, Rotondo A, Treasure J, Woodside DB, Bulik CM, Keel P, Klump KL, Lilenfeld L, Plotnicov K, Topol EJ, Shih PB, Magistretti P, Bergen AW, Berrettini W, Kaye W, Schork NJ. Evidence for the role of EPHX2 gene variants in anorexia nervosa. Mol Psychiatry 2014; 19:724-32. [PMID: 23999524 PMCID: PMC3852189 DOI: 10.1038/mp.2013.91] [Citation(s) in RCA: 59] [Impact Index Per Article: 5.4] [Received: 02/11/2013] [Revised: 06/19/2013] [Accepted: 06/24/2013] [Indexed: 01/08/2023]
Abstract
Anorexia nervosa (AN) and related eating disorders are complex, multifactorial neuropsychiatric conditions with likely rare and common genetic and environmental determinants. To identify genetic variants associated with AN, we pursued a series of sequencing and genotyping studies focusing on the coding regions and upstream sequence of 152 candidate genes in a total of 1205 AN cases and 1948 controls. We identified individual variant associations in the Estrogen Receptor-β (ESR2) gene, as well as a set of rare and common variants in the Epoxide Hydrolase 2 (EPHX2) gene, in an initial sequencing study of 261 early-onset severe AN cases and 73 controls (P=0.0004). The association of EPHX2 variants was further delineated in: (1) a pooling-based replication study involving an additional 500 AN patients and 500 controls (replication set P=0.00000016); (2) single-locus studies in a cohort of 386 previously genotyped broadly defined AN cases and 295 female population controls from the Bogalusa Heart Study (BHS) and a cohort of 58 individuals with self-reported eating disturbances and 851 controls (combined smallest single locus P<0.01). As EPHX2 is known to influence cholesterol metabolism, and AN is often associated with elevated cholesterol levels, we also investigated the association of EPHX2 variants and longitudinal body mass index (BMI) and cholesterol in BHS female and male subjects (N=229) and found evidence for a modifying effect of a subset of variants on the relationship between cholesterol and BMI (P<0.01). These findings suggest a novel association of gene variants within EPHX2 with susceptibility to AN and provide a foundation for future study of this important yet poorly understood condition.
Affiliation(s)
- A A Scott-Van Zeeland
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- C S Bloss
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- R Tewhey
- Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
- V Bansal
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- A Torkamani
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
- O Libiger
- The Scripps Translational Science Institute, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
- V Duvvuri
- Department of Pediatrics, The University of California, San Diego, La Jolla, CA, USA
- N Wineinger
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- L Galvez
- The Scripps Translational Science Institute, La Jolla, CA, USA
- B F Darst
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- E N Smith
- Department of Pediatrics, The University of California, San Diego, La Jolla, CA, USA
- A Carson
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- P Pham
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- T Phillips
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- N Villarasa
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- R Tisch
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- G Zhang
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA
- S Levy
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
- S Murray
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
- W Chen
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- S Srinivasan
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- G Berenson
- Department of Epidemiology, Tulane University, New Orleans, LA, USA
- H Brandt
- Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
- S Crawford
- Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
- S Crow
- Department of Psychiatry, University of Minnesota, Minneapolis, MN, USA
- M M Fichter
- Roseneck Hospital for Behavioral Medicine, Prien, Germany
- K A Halmi
- Eating Disorder Research Program Weill Cornell Medical College, White Plains, NY, USA
- C Johnson
- Eating Recovery Center, Denver, CO, USA
- A S Kaplan
- Center for Addiction and Mental Health, Toronto, ON, Canada; Department of Psychiatry, Toronto General Hospital, University Health Network, Toronto, ON, Canada; Department of Psychiatry, University of Toronto, Toronto, ON, Canada
- M La Via
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- J E Mitchell
- Neuropsychiatric Research Institute, Fargo, ND, USA; Department of Clinical Neuroscience, University of North Dakota School of Medicine and Health Sciences, Grand Forks, ND, USA
- M Strober
- Department of Psychiatry and Biobehavioral Sciences, David Geffen School of Medicine, University of California at Los Angeles, Los Angeles, CA, USA
- A Rotondo
- Department of Psychiatry, Neurobiology, Pharmacology, and Biotechnology, University of Pisa, Pisa, Italy
- J Treasure
- Department of Academic Psychiatry, Bermondsey Wing Guys Hospital, University of London, London, UK
- D B Woodside
- Department of Psychiatry, Toronto General Hospital, University Health Network, Toronto, ON, Canada; Department of Psychiatry, University of Toronto, Toronto, ON, Canada; Department of Psychology, Florida State University, Tallahassee, FL, USA
- C M Bulik
- Department of Psychiatry, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA; Department of Nutrition, University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- P Keel
- Department of Psychology, Florida State University, Tallahassee, FL, USA
- K L Klump
- Department of Psychology, Michigan State University, East Lansing, MI, USA
- L Lilenfeld
- Clinical Psychology Program, American School of Professional Psychology at Argosy University, Washington, DC, USA
- K Plotnicov
- Department of Psychiatry, University of Pittsburgh, Pittsburgh, PA, USA
- E J Topol
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA
- P B Shih
- Department of Pediatrics, The University of California, San Diego, La Jolla, CA, USA
- P Magistretti
- Laboratory of Neuroenergetics and Cellular Dynamics, The University of Lausanne, Lausanne, Switzerland
- A W Bergen
- Center for Health Sciences, SRI International, Menlo Park, CA, USA
- W Berrettini
- Department of Psychiatry, The University of Pennsylvania, Philadelphia, PA, USA
- W Kaye
- Department of Pediatrics, The University of California, San Diego, La Jolla, CA, USA
- N J Schork
- The Scripps Translational Science Institute, La Jolla, CA, USA; Scripps Health, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 3344 N Torrey Pines Court, Room 306, La Jolla, CA 92037, USA
166
Liu JZ, Anderson CA. Genetic studies of Crohn's disease: past, present and future. Best Pract Res Clin Gastroenterol 2014; 28:373-86. [PMID: 24913378 PMCID: PMC4075408 DOI: 10.1016/j.bpg.2014.04.009] [Citation(s) in RCA: 70] [Impact Index Per Article: 6.4] [Received: 02/24/2014] [Revised: 04/14/2014] [Accepted: 04/24/2014] [Indexed: 01/31/2023]
Abstract
The exact aetiology of Crohn's disease is unknown, though it is clear from early epidemiological studies that a combination of genetic and environmental risk factors contributes to an individual's disease susceptibility. Here, we review the history of gene-mapping studies of Crohn's disease, from the linkage-based studies that first implicated the NOD2 locus, through to modern-day genome-wide association studies that have discovered over 140 loci associated with Crohn's disease and yielded novel insights into the biological pathways underlying pathogenesis. We describe on-going and future gene-mapping studies that utilise next generation sequencing technology to pinpoint causal variants and identify rare genetic variation underlying Crohn's disease risk. We comment on the utility of genetic markers for predicting an individual's disease risk and discuss their potential for identifying novel drug targets and influencing disease management. Finally, we describe how these studies have shaped and continue to shape our understanding of the genetic architecture of Crohn's disease.
Affiliation(s)
- Jimmy Z Liu
- The Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK
167
Yan Q, Tiwari HK, Yi N, Lin WY, Gao G, Lou XY, Cui X, Liu N. Kernel-machine testing coupled with a rank-truncation method for genetic pathway analysis. Genet Epidemiol 2014; 38:447-56. [PMID: 24849109 DOI: 10.1002/gepi.21813] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 10/31/2013] [Revised: 04/09/2014] [Accepted: 04/10/2014] [Indexed: 01/09/2023]
Abstract
Traditional genome-wide association studies (GWASs) usually focus on single-marker analysis, which only assesses marginal effects. Pathway analysis, on the other hand, considers the hierarchical structure of pathways, genes, and markers and therefore provides additional insights into the genetic architecture underlying complex diseases. Recently, a number of methods for pathway analysis have been proposed to assess the significance of a biological pathway from a collection of single-nucleotide polymorphisms. In this study, we propose a novel approach for pathway analysis that assesses the effects of genes using the sequence kernel association test and the effects of pathways using an extended adaptive rank truncated product statistic. It has been increasingly recognized that complex diseases are caused by both common and rare variants. We propose a new weighting scheme that allows genetic variants across the whole allelic frequency spectrum to be analyzed together without any form of frequency cutoff for defining rare variants. The proposed approach is flexible. It is applicable to both binary and continuous traits, and incorporating covariates is easy. Furthermore, it can be readily applied to GWAS data, exome-sequencing data, and deep resequencing data. We evaluate the new approach on data simulated under comprehensive scenarios and show that it has the highest power in most of the scenarios while maintaining the correct type I error rate. We also apply our proposed methodology to data from a study of the association between bipolar disorder and candidate pathways from the Wellcome Trust Case Control Consortium (WTCCC) to show its utility.
Affiliation(s)
- Qi Yan
- Department of Biostatistics, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
168
Stade B, Seelow D, Thomsen I, Krawczak M, Franke A. GrabBlur--a framework to facilitate the secure exchange of whole-exome and -genome SNV data using VCF files. BMC Genomics 2014; 15 Suppl 4:S8. [PMID: 25055742 PMCID: PMC4083413 DOI: 10.1186/1471-2164-15-s4-s8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/03/2022] Open
Abstract
Background Next Generation Sequencing (NGS) of whole exomes or genomes is increasingly being used in human genetic research and diagnostics. Sharing NGS data with third parties can help physicians and researchers to identify causative or predisposing mutations for a specific sample of interest more efficiently. In many cases, however, the exchange of such data may collide with data privacy regulations. GrabBlur is a newly developed tool to aggregate and share NGS-derived single nucleotide variant (SNV) data in a public database, keeping individual samples unidentifiable. In contrast to other currently existing SNV databases, GrabBlur includes phenotypic information and contact details of the submitter of a given database entry. By means of GrabBlur human geneticists can securely and easily share SNV data from resequencing projects. GrabBlur can ease the interpretation of SNV data by offering basic annotations, genotype frequencies and in particular phenotypic information - given that this information was shared - for the SNV of interest. Tool description GrabBlur facilitates the combination of phenotypic and NGS data (VCF files) via a local interface or command line operations. Data submissions may include HPO (Human Phenotype Ontology) terms, other trait descriptions, NGS technology information and the identity of the submitter. Most of this information is optional and its provision at the discretion of the submitter. Upon initial intake, GrabBlur merges and aggregates all sample-specific data. If a certain SNV is rare, the sample-specific information is replaced with the submitter identity. Generally, all data in GrabBlur are highly aggregated so that they can be shared with others while ensuring maximum privacy. Thus, it is impossible to reconstruct complete exomes or genomes from the database or to re-identify single individuals. 
After the individual information has been sufficiently "blurred", the data can be uploaded into a publicly accessible domain where aggregated genotypes are provided alongside phenotypic information. A web interface allows querying the database and extracting gene-wise SNV information. If an interesting SNV is found, the interrogator can get in contact with the submitter to exchange further information on the carrier and clarify, for example, whether the latter's phenotype matches the phenotype of their own patient.
169
Fan R, Wang Y, Mills JL, Wilson AF, Bailey-Wilson JE, Xiong M. Functional linear models for association analysis of quantitative traits. Genet Epidemiol 2014; 37:726-42. [PMID: 24130119 DOI: 10.1002/gepi.21757] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.5] [Received: 04/24/2013] [Revised: 07/15/2013] [Accepted: 08/14/2013] [Indexed: 12/19/2022]
Abstract
Functional linear models are developed in this paper for testing associations between quantitative traits and genetic variants, which can be rare variants or common variants or the combination of the two. By treating multiple genetic variants of an individual in a human population as a realization of a stochastic process, the genome of an individual in a chromosome region is a continuum of sequence data rather than discrete observations. The genome of an individual is viewed as a stochastic function that contains both linkage and linkage disequilibrium (LD) information of the genetic markers. By using techniques of functional data analysis, both fixed and mixed effect functional linear models are built to test the association between quantitative traits and genetic variants adjusting for covariates. After extensive simulation analysis, it is shown that the F-distributed tests of the proposed fixed effect functional linear models have higher power than the sequence kernel association test (SKAT) and its optimal unified test (SKAT-O) for three scenarios in most cases: (1) the causal variants are all rare, (2) the causal variants are both rare and common, and (3) the causal variants are common. The superior performance of the fixed effect functional linear models is most likely due to their optimal utilization of both genetic linkage and LD information of multiple genetic variants in a genome and similarity among different individuals, while SKAT and SKAT-O only model the similarities and pairwise LD but do not model linkage and higher order LD information sufficiently. In addition, the proposed fixed effect models generate accurate type I error rates in simulation studies. We also show that the functional kernel score tests of the proposed mixed effect functional linear models are preferable in candidate gene analysis and small sample problems. The methods are applied to analyze three biochemical traits in data from the Trinity Students Study.
Affiliation(s)
- Ruzong Fan
- Biostatistics and Bioinformatics Branch, Division of Intramural Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Rockville, Maryland, United States of America
170
Abstract
This article focuses on conducting global testing for association between a binary trait and a set of rare variants (RVs), although its application can be much broader to other types of traits, common variants (CVs), and gene set or pathway analysis. We show that many of the existing tests have deteriorating performance in the presence of many nonassociated RVs: their power can dramatically drop as the proportion of nonassociated RVs in the group to be tested increases. We propose a class of so-called sum of powered score (SPU) tests, each of which is based on the score vector from a general regression model and hence can deal with different types of traits and adjust for covariates, e.g., principal components accounting for population stratification. The SPU tests generalize the sum test, a representative burden test based on pooling or collapsing genotypes of RVs, and a sum of squared score (SSU) test that is closely related to several other powerful variance component tests; a previous study (Basu and Pan 2011) has demonstrated good performance of one, but not both, of the Sum and SSU tests in many situations. The SPU tests are versatile in the sense that one of them is often powerful, although its identity varies with the unknown true association parameters. We propose an adaptive SPU (aSPU) test to approximate the most powerful SPU test for a given scenario, consequently maintaining high power and being highly adaptive across various scenarios. We conducted extensive simulations to show superior performance of the aSPU test over several state-of-the-art association tests in the presence of many nonassociated RVs. Finally, we applied the SPU and aSPU tests to the GAW17 mini-exome sequence data to compare their practical performance with some existing tests, demonstrating their potential usefulness.
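The relationship between the Sum, SSU, and SPU statistics can be made concrete with a small sketch. It assumes the simplest setting, a binary trait with no covariates, where the score for variant j is U_j = Σ_i (Y_i − Ȳ) X_ij, and takes SPU(γ) = Σ_j U_j^γ, so that γ = 1 gives a burden-type sum statistic and γ = 2 a sum of squared scores. The abstract does not spell out these formulas, so this form, the permutation scheme, and all function names are illustrative assumptions; the adaptive aSPU step (taking the smallest p-value over several γ and recalibrating it) is not implemented here.

```python
import random


def score_vector(y, X):
    """Score U_j = sum_i (y_i - mean(y)) * X[i][j] for each variant j."""
    y_mean = sum(y) / len(y)
    n_variants = len(X[0])
    return [
        sum((yi - y_mean) * row[j] for yi, row in zip(y, X))
        for j in range(n_variants)
    ]


def spu(U, gamma):
    """Sum of powered scores: gamma=1 is burden-like, gamma=2 is SSU-like."""
    return sum(u ** gamma for u in U)


def permutation_p_value(y, X, gamma, n_perm=999, seed=0):
    """Permutation p-value for |SPU(gamma)|, shuffling trait labels."""
    rng = random.Random(seed)
    observed = abs(spu(score_vector(y, X), gamma))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(spu(score_vector(y_perm, X), gamma)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

With odd γ the signs of the scores are kept, so effects in opposite directions cancel; with even γ they do not, which is one way to see why no single γ is best across scenarios.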
171
Vrieze SI, Feng S, Miller MB, Hicks BM, Pankratz N, Abecasis GR, Iacono WG, McGue M. Rare nonsynonymous exonic variants in addiction and behavioral disinhibition. Biol Psychiatry 2014; 75:783-9. [PMID: 24094508 PMCID: PMC3975816 DOI: 10.1016/j.biopsych.2013.08.027] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 05/17/2013] [Revised: 08/02/2013] [Accepted: 08/26/2013] [Indexed: 10/26/2022]
Abstract
BACKGROUND Substance use is heritable, but few common genetic variants have been associated with these behaviors. Rare nonsynonymous exonic variants can now be efficiently genotyped, allowing exome-wide association tests. We identified and tested 111,592 nonsynonymous exonic variants for association with behavioral disinhibition and the use/misuse of nicotine, alcohol, and illicit drugs. METHODS Comprehensive genotyping of exonic variation combined with single-variant and gene-based tests of association was conducted in 7181 individuals; 172 candidate addiction genes were evaluated in greater detail. We also evaluated the aggregate effects of nonsynonymous variants on these phenotypes using Genome-wide Complex Trait Analysis. RESULTS No variant or gene was significantly associated with any phenotype. No association was found for any of the 172 candidate genes, even at reduced significance thresholds. All nonsynonymous variants jointly accounted for 35% of the heritability in illicit drug use and, when combined with common variants from a genome-wide array, accounted for 84% of the heritability. CONCLUSIONS Rare nonsynonymous variants may be important in etiology of illicit drug use, but detection of individual variants will require very large samples.
Affiliation(s)
- Scott I Vrieze
- Center for Statistical Genetics (SIV, SF, GRA), Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Shuang Feng
- Center for Statistical Genetics (SIV, SF, GRA), Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- Michael B Miller
- Department of Psychology (MBM, WGI, MM), University of Minnesota, Minneapolis, Minnesota
- Brian M Hicks
- Department of Psychiatry (BMH), University of Michigan, Ann Arbor, Michigan
- Nathan Pankratz
- Department of Laboratory Medicine and Pathology (NP), University of Minnesota, Minneapolis, Minnesota
- Gonçalo R Abecasis
- Center for Statistical Genetics (SIV, SF, GRA), Department of Biostatistics, University of Michigan, Ann Arbor, Michigan
- William G Iacono
- Department of Psychology (MBM, WGI, MM), University of Minnesota, Minneapolis, Minnesota
- Matt McGue
- Department of Psychology (MBM, WGI, MM), University of Minnesota, Minneapolis, Minnesota
172
Peterson RE, Maes HH, Lin P, Kramer JR, Hesselbrock VM, Bauer LO, Nurnberger JI, Edenberg HJ, Dick DM, Webb BT. On the association of common and rare genetic variation influencing body mass index: a combined SNP and CNV analysis. BMC Genomics 2014; 15:368. [PMID: 24884913 PMCID: PMC4035084 DOI: 10.1186/1471-2164-15-368] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 06/12/2013] [Accepted: 04/27/2014] [Indexed: 12/18/2022] Open
Abstract
Background As the architecture of complex traits incorporates a widening spectrum of genetic variation, analyses integrating common and rare variation are needed. Body mass index (BMI) represents a model trait, since common variation shows robust association but accounts for a fraction of the heritability. A combined analysis of single nucleotide polymorphisms (SNP) and copy number variation (CNV) was performed using 1850 European and 498 African-Americans from the Study of Addiction: Genetics and Environment. Genetic risk sum scores (GRSS) were constructed using 32 BMI-validated SNPs and aggregate-risk methods were compared: count versus weighted and proxy versus imputation. Results The weighted SNP-GRSS constructed from imputed probabilities of risk alleles performed best and was highly associated with BMI (p = 4.3 × 10^-16), accounting for 3% of the phenotypic variance. In addition to BMI-validated SNPs, common and rare BMI/obesity-associated CNVs were identified from the literature. Of the 84 CNVs previously reported, only 21-kilobase deletions on 16p12.3 showed evidence for association with BMI (p = 0.003, frequency = 16.9%), with two CNVs nominally associated with class II obesity, 1p36.1 duplications (OR = 3.1, p = 0.009, frequency 1.2%) and 5q13.2 deletions (OR = 1.5, p = 0.048, frequency 7.7%). All other CNVs, individually and in aggregate, were not associated with BMI or obesity. The combined model, including covariates, SNP-GRSS, and 16p12.3 deletion, accounted for 11.5% of phenotypic variance in BMI (3.2% from genetic effects). Models significantly predicted obesity classification with maximum discriminative ability for morbid obesity (p = 3.15 × 10^-18). Conclusion Results show that incorporating validated effect sizes and allelic probabilities improves prediction algorithms. Although rare CNVs did not account for significant phenotypic variation, results provide a framework for integrated analyses.
Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-368) contains supplementary material, which is available to authorized users.
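The distinction between count and weighted risk scores in the abstract above can be sketched as follows. This is a generic illustration, not the authors' pipeline: the function names are hypothetical, the effect sizes in any usage would come from published estimates that are not given here, and the expected dosage is assumed to be computed from imputed genotype probabilities as P(heterozygous) + 2 × P(homozygous for the risk allele).

```python
def expected_dosage(p_het, p_hom_risk):
    """Expected number of risk alleles from imputed genotype probabilities."""
    return p_het + 2.0 * p_hom_risk


def count_score(dosages):
    """Count-based score: total (expected) number of risk alleles."""
    return sum(dosages)


def weighted_score(dosages, effect_sizes):
    """Weighted score: each SNP's dosage multiplied by its effect size."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need one effect size per SNP")
    return sum(beta * d for beta, d in zip(effect_sizes, dosages))
```

The count score treats every risk allele as equally important, whereas the weighted score lets SNPs with larger validated effects contribute more, which matches the comparison the abstract reports.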
Affiliation(s)
- Roseann E Peterson
- Virginia Institute for Psychiatric and Behavioral Genetics, Department of Human and Molecular Genetics, School of Medicine, Virginia Commonwealth University, Biotech I, 800 E, Leigh Street, Richmond, VA 23298-0126, USA.
173
King EG, Sanderson BJ, McNeil CL, Long AD, Macdonald SJ. Genetic dissection of the Drosophila melanogaster female head transcriptome reveals widespread allelic heterogeneity. PLoS Genet 2014; 10:e1004322. [PMID: 24810915 PMCID: PMC4014434 DOI: 10.1371/journal.pgen.1004322] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.2] [Received: 10/25/2013] [Accepted: 03/10/2014] [Indexed: 12/01/2022] Open
Abstract
Modern genetic mapping is plagued by the “missing heritability” problem, which refers to the discordance between the estimated heritabilities of quantitative traits and the variance accounted for by mapped causative variants. One major potential explanation for the missing heritability is allelic heterogeneity, in which there are multiple causative variants at each causative gene with only a fraction having been identified. The majority of genome-wide association studies (GWAS) implicitly assume that a single SNP can explain all the variance for a causative locus. However, if allelic heterogeneity is prevalent, a substantial amount of genetic variance will remain unexplained. In this paper, we take a haplotype-based mapping approach and quantify the number of alleles segregating at each locus using a large set of 7922 eQTL contributing to regulatory variation in the Drosophila melanogaster female head. Not only does this study provide a comprehensive eQTL map for a major community genetic resource, the Drosophila Synthetic Population Resource, but it also provides a direct test of the allelic heterogeneity hypothesis. We find that 95% of cis-eQTLs and 78% of trans-eQTLs are due to multiple alleles, demonstrating that allelic heterogeneity is widespread in Drosophila eQTL. Allelic heterogeneity likely contributes significantly to the missing heritability problem common in GWAS studies. For traits with complex genetic inheritance it has generally proven very difficult to identify the majority of the specific causative variants involved. A range of hypotheses have been put forward to explain this so-called “missing heritability”. One idea—allelic heterogeneity, where genes each harbor multiple different causative variants—has received little attention, because it is difficult to detect with most genetic mapping designs. 
Here we make use of a panel of Drosophila melanogaster lines derived from multiple founders, allowing us to directly test for the presence of multiple alleles at a large set of genetic loci influencing gene expression. We find that the vast majority of loci harbor more than two functional alleles, demonstrating extensive allelic heterogeneity at the level of gene expression and suggesting that such heterogeneity is an important factor determining the genetic basis of complex trait variation in general.
Affiliation(s)
- Elizabeth G. King
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
| | - Brian J. Sanderson
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
| | - Casey L. McNeil
- Department of Biology, Newman University, Wichita, Kansas, United States of America
| | - Anthony D. Long
- Department of Ecology and Evolutionary Biology, University of California Irvine, Irvine, California, United States of America
| | - Stuart J. Macdonald
- Department of Molecular Biosciences, University of Kansas, Lawrence, Kansas, United States of America
|
174
|
Zhang F, Boerwinkle E, Xiong M. Epistasis analysis for quantitative traits by functional regression model. Genome Res 2014; 24:989-98. [PMID: 24803592 PMCID: PMC4032862 DOI: 10.1101/gr.161760.113] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 02/07/2023]
Abstract
The critical barrier in interaction analysis for rare variants is that most traditional statistical methods for testing interactions were originally designed for testing the interaction between common variants and are difficult to apply to rare variants because of their prohibitive computational time and poor ability. The great challenges for successful detection of interactions with next-generation sequencing (NGS) data are (1) lack of methods for interaction analysis with rare variants, (2) severe multiple testing, and (3) time-consuming computations. To meet these challenges, we shift the paradigm of interaction analysis between two loci to interaction analysis between two sets of loci or genomic regions and collectively test interactions between all possible pairs of SNPs within two genomic regions. In other words, we take a genome region as a basic unit of interaction analysis and use high-dimensional data reduction and functional data analysis techniques to develop a novel functional regression model to collectively test interactions between all possible pairs of single nucleotide polymorphisms (SNPs) within two genome regions. By intensive simulations, we demonstrate that the functional regression models for interaction analysis of the quantitative trait have the correct type 1 error rates and a much better ability to detect interactions than the current pairwise interaction analysis. The proposed method was applied to exome sequence data from the NHLBI’s Exome Sequencing Project (ESP) and CHARGE-S study. We discovered 27 pairs of genes showing significant interactions after applying the Bonferroni correction (P-values < 4.58 × 10^-10) in the ESP, and 11 were replicated in the CHARGE-S study.
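As a rough illustration of the Bonferroni threshold quoted in this abstract, the sketch below back-calculates how many tests a per-test cutoff of 4.58 × 10^-10 would correspond to. The family-wise level of 0.05 is an assumption (the abstract does not state it), and the helper names are hypothetical. Under that assumption the cutoff corresponds to about 1.1 × 10^8 tests, which is roughly the number of unordered pairs among just under fifteen thousand genes; the paper's actual test count is not given here.

```python
import math

ALPHA = 0.05          # ASSUMPTION: family-wise error rate, not stated in the abstract
THRESHOLD = 4.58e-10  # per-test P-value cutoff reported in the abstract


def implied_tests(alpha: float, threshold: float) -> int:
    """Number of tests m for which alpha / m equals the per-test threshold."""
    return round(alpha / threshold)


def genes_for_pairs(n_pairs: int) -> float:
    """Solve g * (g - 1) / 2 = n_pairs for g (positive root of the quadratic)."""
    return (1 + math.sqrt(1 + 8 * n_pairs)) / 2


m = implied_tests(ALPHA, THRESHOLD)
print("implied number of tests:", m)
print("implied number of genes if all pairs were tested:", round(genes_for_pairs(m)))
```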
Affiliation(s)
- Futao Zhang
- Institute of Bioinformatics, Zhejiang University, Hangzhou, Zhejiang 310058, China; Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas 77030, USA
| | - Eric Boerwinkle
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas 77030, USA
| | - Momiao Xiong
- Human Genetics Center, Division of Biostatistics, The University of Texas School of Public Health, Houston, Texas 77030, USA
|
175
|
Derkach A, Lawless JF, Sun L. Pooled Association Tests for Rare Genetic Variants: A Review and Some New Results. Stat Sci 2014. [DOI: 10.1214/13-sts456] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 11/19/2022]
|
176
|
Lin WY. Association testing of clustered rare causal variants in case-control studies. PLoS One 2014; 9:e94337. [PMID: 24736372 PMCID: PMC3988195 DOI: 10.1371/journal.pone.0094337] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 01/29/2014] [Accepted: 03/12/2014] [Indexed: 11/18/2022] Open
Abstract
Biological evidence suggests that multiple causal variants in a gene may cluster physically. Variants within the same protein functional domain or gene regulatory element would locate in close proximity on the DNA sequence. However, spatial information of variants is usually not used in current rare variant association analyses. We here propose a clustering method (abbreviated as "CLUSTER"), which is extended from the adaptive combination of P-values. Our method combines the association signals of variants that are more likely to be causal. Furthermore, the statistic incorporates the spatial information of variants. With extensive simulations, we show that our method outperforms several commonly-used methods in many scenarios. To demonstrate its use in real data analyses, we also apply this CLUSTER test to the Dallas Heart Study data. CLUSTER is among the best methods when the effects of causal variants are all in the same direction. As variants located in close proximity are more likely to have similar impact on disease risk, CLUSTER is recommended for association testing of clustered rare causal variants in case-control studies.
Affiliation(s)
- Wan-Yu Lin
- Institute of Epidemiology and Preventive Medicine, College of Public Health, National Taiwan University, Taipei, Taiwan
|
177
|
Bodian DL, McCutcheon JN, Kothiyal P, Huddleston KC, Iyer RK, Vockley JG, Niederhuber JE. Germline variation in cancer-susceptibility genes in a healthy, ancestrally diverse cohort: implications for individual genome sequencing. PLoS One 2014; 9:e94554. [PMID: 24728327 PMCID: PMC3984285 DOI: 10.1371/journal.pone.0094554] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Received: 09/25/2013] [Accepted: 02/17/2014] [Indexed: 01/05/2023] Open
Abstract
Technological advances coupled with decreasing costs are bringing whole genome and whole exome sequencing closer to routine clinical use. One of the hurdles to clinical implementation is the high number of variants of unknown significance. For cancer-susceptibility genes, the difficulty in interpreting the clinical relevance of the genomic variants is compounded by the fact that most of what is known about these variants comes from the study of highly selected populations, such as cancer patients or individuals with a family history of cancer. The genetic variation in known cancer-susceptibility genes in the general population has not been well characterized to date. To address this gap, we profiled the nonsynonymous genomic variation in 158 genes causally implicated in carcinogenesis using high-quality whole genome sequences from an ancestrally diverse cohort of 681 healthy individuals. We found that all individuals carry multiple variants that may impact cancer susceptibility, with an average of 68 variants per individual. Of the 2,688 allelic variants identified within the cohort, most are very rare, with 75% found in only 1 or 2 individuals in our population. Allele frequencies vary between ancestral groups, and there are 21 variants for which the minor allele in one population is the major allele in another. Detailed analysis of a selected subset of 5 clinically important cancer genes, BRCA1, BRCA2, KRAS, TP53, and PTEN, highlights differences between germline variants and reported somatic mutations. The dataset can serve as a resource of genetic variation in cancer-susceptibility genes in 6 ancestry groups, an important foundation for the interpretation of cancer risk from personal genome sequences.
Affiliation(s)
- Dale L. Bodian
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Justine N. McCutcheon
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Prachi Kothiyal
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Kathi C. Huddleston
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Ramaswamy K. Iyer
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - Joseph G. Vockley
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
| | - John E. Niederhuber
- Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia, United States of America
|
178
|
Nishino J, Sugiyama M, Nishida N, Tokunaga K, Mizokami M, Mano S. The interaction of a single-nucleotide polymorphism with age on response to interferon-α and ribavirin therapy in female patients with hepatitis C infection. J Med Virol 2014; 86:1130-3. [PMID: 24692042 DOI: 10.1002/jmv.23939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/07/2014] [Indexed: 11/08/2022]
Abstract
Older female patients exhibit a poor response to the current standard treatment for hepatitis C, interferon-α and ribavirin (PEG-IFN-α/RBV). In this study, we reported that the combination of age and the genotype of a novel SNP can predict response to standard treatment (P = 7.31 × 10^-8). The model incorporating genotype of the novel SNP, rs1287948, predicts response more accurately (AUC = 0.934; 95% CI = 0.881-0.988) in women as compared with the model using age and the previously identified SNP, rs8099917.
Affiliation(s)
- Jo Nishino
- Center for Information Biology, National Institute of Genetics, Mishima, Japan
|
179
|
Cook K, Benitez A, Fu C, Tintle N. Evaluating the impact of genotype errors on rare variant tests of association. Front Genet 2014; 5:62. [PMID: 24744770 PMCID: PMC3978329 DOI: 10.3389/fgene.2014.00062] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 10/02/2013] [Accepted: 03/11/2014] [Indexed: 01/23/2023] Open
Abstract
The new class of rare variant tests has usually been evaluated assuming perfect genotype information. In reality, rare variant genotypes may be incorrect, and so rare variant tests should be robust to imperfect data. Errors and uncertainty in SNP genotyping are already known to dramatically impact statistical power for single marker tests on common variants and, in some cases, inflate the type I error rate. Recent results show that uncertainty in genotype calls derived from sequencing reads is dependent on several factors, including read depth, calling algorithm, number of alleles present in the sample, and the frequency at which an allele segregates in the population. We have recently proposed a general framework for the evaluation and investigation of rare variant tests of association, classifying most rare variant tests into one of two broad categories (length or joint tests). We use this framework to relate factors affecting genotype uncertainty to the power and type I error rate of rare variant tests. We find that non-differential genotype errors (an error process that occurs independent of phenotype) decrease power, with larger decreases for extremely rare variants, and for the common homozygote to heterozygote error. Differential genotype errors (an error process that is associated with phenotype status) lead to inflated type I error rates, which are more likely to occur at sites with more common homozygote to heterozygote errors than vice versa. Finally, our work suggests that certain rare variant tests and study designs may be more robust to the inclusion of genotype errors. Further work is needed to directly integrate genotype calling algorithm decisions, study costs and test statistic choices to provide comprehensive design and analysis advice which appropriately accounts for the impact of genotype errors.
Affiliation(s)
- Kaitlyn Cook
- Department of Mathematics, Carleton College Northfield, MN, USA
| | - Alejandra Benitez
- Department of Applied Mathematics, Brown University Providence, RI, USA
| | - Casey Fu
- Department of Mathematics, Massachusetts Institute of Technology Boston, MA, USA
| | - Nathan Tintle
- Department of Mathematics, Statistics and Computer Science, Dordt College Sioux Center, IA, USA
|
180
|
Zeng P, Zhao Y, Zhang L, Huang S, Chen F. Rare variants detection with kernel machine learning based on likelihood ratio test. PLoS One 2014; 9:e93355. [PMID: 24675868 PMCID: PMC3968153 DOI: 10.1371/journal.pone.0093355] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 11/16/2013] [Accepted: 03/03/2014] [Indexed: 11/18/2022] Open
Abstract
This paper mainly utilizes likelihood-based tests to detect rare variants associated with a continuous phenotype under the framework of kernel machine learning. Both the likelihood ratio test (LRT) and the restricted likelihood ratio test (ReLRT) are investigated. The relationship between the kernel machine learning and the mixed effects model is discussed. By using the eigenvalue representation of LRT and ReLRT, their exact finite sample distributions are obtained in a simulation manner. Numerical studies are performed to evaluate the performance of the proposed approaches under the contexts of standard mixed effects model and kernel machine learning. The results have shown that the LRT and ReLRT can control the type I error correctly at the given α level. The LRT and ReLRT consistently outperform the SKAT, regardless of the sample size and the proportion of the negative causal rare variants, and suffer from fewer power reductions compared to the SKAT when both positive and negative effects of rare variants are present. The LRT and ReLRT performed under the context of kernel machine learning have slightly higher powers than those performed under the context of standard mixed effects model. We use the Genetic Analysis Workshop 17 exome sequencing SNP data as an illustrative example. Some interesting results are observed from the analysis. Finally, we give the discussion.
Affiliation(s)
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, China
| | - Yang Zhao
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Liwei Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
| | - Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical College, Xuzhou, Jiangsu, China
| | - Feng Chen
- Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing, Jiangsu, China
|
181
|
Nievergelt CM, Wineinger NE, Libiger O, Pham P, Zhang G, Baker DG, Schork NJ. Chip-based direct genotyping of coding variants in genome wide association studies: utility, issues and prospects. Gene 2014; 540:104-9. [PMID: 24521671 DOI: 10.1016/j.gene.2014.01.069] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/14/2013] [Revised: 01/20/2014] [Accepted: 01/23/2014] [Indexed: 11/19/2022]
Abstract
There is considerable debate about the most efficient way to interrogate rare coding variants in association studies. The options include direct genotyping of specific known coding variants in genes or, alternatively, sequencing across the entire exome to capture known as well as novel variants. Each strategy has advantages and disadvantages, but the availability of cost-efficient exome arrays has made the former appealing. Here we consider the utility of a direct genotyping chip, the Illumina HumanExome array (HE), by evaluating its content based on: 1. functionality; and 2. amenability to imputation. We explored these issues by genotyping a large, ethnically diverse cohort on the HumanOmniExpressExome array (HOEE) which combines the HE with content from the GWAS array (HOE). We find that the use of the HE is likely to be a cost-effective way of expanding GWAS, but does have some drawbacks that deserve consideration when planning studies.
Affiliation(s)
- Caroline M Nievergelt
- Department of Psychiatry, University of California, San Diego; VA Center of Excellence for Stress and Mental Health, VA San Diego.
| | - Nathan E Wineinger
- Scripps Genomic Medicine, Scripps Health; The Scripps Translational Science Institute, The Scripps Research Institute
| | - Ondrej Libiger
- The Scripps Translational Science Institute, The Scripps Research Institute
| | | | | | - Dewleen G Baker
- Department of Psychiatry, University of California, San Diego; VA Center of Excellence for Stress and Mental Health, VA San Diego
|
182
|
Li B, Liu DJ, Leal SM. Identifying rare variants associated with complex traits via sequencing. Curr Protoc Hum Genet 2014; Chapter 1:Unit 1.26. [PMID: 23853079 DOI: 10.1002/0471142905.hg0126s78] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 12/12/2022]
Abstract
Although genome-wide association studies have been successful in detecting associations with common variants, there is currently an increasing interest in identifying low-frequency and rare variants associated with complex traits. Next-generation sequencing technologies make it feasible to survey the full spectrum of genetic variation in coding regions or the entire genome. The association analysis for rare variants is challenging, however, and traditional methods are ineffective because of the low frequency of rare variants coupled with allelic heterogeneity. Recently a battery of new statistical methods has been proposed for identifying rare variants associated with complex traits. These methods test for associations by aggregating multiple rare variants across a gene or a genomic region or among a group of variants in the genome. In this unit, we describe key concepts for rare variant association for complex traits, survey some of the recent methods, discuss their statistical power under various scenarios, and provide practical guidance on analyzing next-generation sequencing data for identifying rare variants associated with complex traits.
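The aggregation idea described in this abstract can be sketched with a simple collapsing score: each individual is coded 1 if they carry at least one rare allele in the region and 0 otherwise. This is one of the simplest burden-style codings and is not necessarily a method covered in the unit; the 1% frequency cutoff, the function names, and the toy data are all assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def collapse_carriers(genotypes: Sequence[Sequence[int]],
                      mafs: Sequence[float],
                      maf_cutoff: float = 0.01) -> List[int]:
    """Return 1 for each individual carrying at least one rare minor allele, else 0.

    genotypes[i][j] is the minor-allele count (0, 1 or 2) of individual i at variant j.
    Variants with MAF at or above maf_cutoff are ignored (cutoff is an assumption).
    """
    rare = [j for j, f in enumerate(mafs) if f < maf_cutoff]
    return [int(any(row[j] > 0 for j in rare)) for row in genotypes]


def carrier_rates(carriers: Sequence[int],
                  is_case: Sequence[bool]) -> Tuple[float, float]:
    """Proportion of carriers among cases and among controls."""
    cases = [c for c, y in zip(carriers, is_case) if y]
    controls = [c for c, y in zip(carriers, is_case) if not y]
    return sum(cases) / len(cases), sum(controls) / len(controls)


# Toy data: 6 individuals, 3 variants; the second variant is common and is skipped.
toy_genotypes = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 2, 0], [0, 0, 0], [0, 1, 0]]
toy_mafs = [0.005, 0.2, 0.008]
toy_is_case = [True, True, True, False, False, False]

toy_carriers = collapse_carriers(toy_genotypes, toy_mafs)
print(toy_carriers)
print(carrier_rates(toy_carriers, toy_is_case))
```

A formal test would then compare the carrier proportions between cases and controls; that step is left out of this sketch.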
Affiliation(s)
- Bingshan Li
- Department of Molecular Physiology and Biophysics, Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, USA
|
183
|
Zakharov S, Teoh GHK, Salim A, Thalamuthu A. A method to incorporate prior information into score test for genetic association studies. BMC Bioinformatics 2014; 15:24. [PMID: 24450486 PMCID: PMC3904928 DOI: 10.1186/1471-2105-15-24] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/03/2012] [Accepted: 01/17/2014] [Indexed: 12/13/2022] Open
Abstract
Background: The interest of the scientific community in investigating the impact of rare variants on complex traits has stimulated the development of novel statistical methodologies for association studies. The fact that many of the recently proposed methods for association studies suffer from low power to identify a genetic association motivates the incorporation of prior knowledge into statistical tests.
Results: In this article we propose a methodology to incorporate prior information into the region-based score test. Within our framework prior information is used to partition variants within a region into several groups, following which asymptotically independent group statistics are constructed and then combined into a global test statistic. Under the null hypothesis the distribution of our test statistic has lower degrees of freedom compared with those of the region-based score statistic. Theoretical power comparison, population genetics simulations and results from analysis of the GAW17 sequencing data set suggest that under some scenarios our method may perform as well as or outperform the score test and other competing methods.
Conclusions: An approach which uses prior information to improve the power of the region-based score test is proposed. Theoretical power comparison, population genetics simulations and the results of GAW17 data analysis showed that for some scenarios power of our method is on the level with or higher than those of the score test and other methods.
Affiliation(s)
- Sergii Zakharov
- Human Genetics, Genome Institute of Singapore, 60 Biopolis Street, #02-01 Genome, Singapore 138672, Singapore.
|
184
|
Won S, Kim Y, Lange C. On rare-variant analysis in population-based designs: decomposing the likelihood to two informative components. Hum Hered 2014; 76:76-85. [PMID: 24434864 DOI: 10.1159/000357643] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2012] [Accepted: 11/29/2013] [Indexed: 11/19/2022] Open
Abstract
Various analytical approaches have been suggested for the characterization of rare variants. One main approach is to collapse the genetic information of rare variants in a region and to construct an overall test statistic. Here, we proposed a new approach based on collapsed genotype scores. By utilizing the information of the association signal that is ignored in collapsing methods, i.e. the configuration of rare alleles, we constructed a more powerful test and compared it with existing rare-variant approaches. With extensive simulation studies, we showed that our method performs better than existing approaches, and we applied our method to a sequencing study of nonsyndromic cleft lip illustrating the practical advantages of the proposed method.
Affiliation(s)
- Sungho Won
- Department of Applied Statistics, Chung-Ang University, Seoul, Korea
|
185
|
Rare variant association testing by adaptive combination of P-values. PLoS One 2014; 9:e85728. [PMID: 24454922 PMCID: PMC3893264 DOI: 10.1371/journal.pone.0085728] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 08/23/2013] [Accepted: 12/02/2013] [Indexed: 01/21/2023] Open
Abstract
With the development of next-generation sequencing technology, there is a great demand for powerful statistical methods to detect rare variants (minor allele frequencies (MAFs)<1%) associated with diseases. Testing for each variant site individually is known to be underpowered, and therefore many methods have been proposed to test for the association of a group of variants with phenotypes, by pooling signals of the variants in a chromosomal region. However, this pooling strategy inevitably leads to the inclusion of a large proportion of neutral variants, which may compromise the power of association tests. To address this issue, we extend the -MidP method (Cheung et al., 2012, Genet Epidemiol 36: 675–685) and propose an approach (named ‘adaptive combination of P-values for rare variant association testing’, abbreviated as ‘ADA’) that adaptively combines per-site P-values with the weights based on MAFs. Before combining P-values, we first imposed a truncation threshold upon the per-site P-values, to guard against the noise caused by the inclusion of neutral variants. This ADA method is shown to outperform popular burden tests and non-burden tests under many scenarios. ADA is recommended for next-generation sequencing data analysis where many neutral variants may be included in a functional region.
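The two ingredients named in this abstract, truncating per-site P-values at a threshold and weighting the survivors by MAF, can be sketched as follows. This is only an illustration of the general idea, not the ADA statistic itself: the weight formula, the default threshold, and the function name are assumptions, and the way the paper chooses thresholds adaptively and assesses significance is not reproduced.

```python
import math
from typing import Sequence


def truncated_weighted_stat(pvals: Sequence[float],
                            mafs: Sequence[float],
                            threshold: float = 0.10) -> float:
    """Sum of -w_i * log(p_i) over sites whose P-value is below the threshold.

    Sites with p_i >= threshold contribute nothing, which limits the noise
    from variants that are likely neutral.
    """
    total = 0.0
    for p, f in zip(pvals, mafs):
        if p < threshold:
            # ASSUMPTION: illustrative weight that up-weights rarer variants;
            # the paper's actual MAF-based weights may differ.
            w = 1.0 / math.sqrt(f * (1.0 - f))
            total += -w * math.log(p)
    return total


# Toy per-site P-values and minor allele frequencies (made up for illustration).
toy_pvals = [0.01, 0.50, 0.04]
toy_mafs = [0.005, 0.004, 0.009]

print(truncated_weighted_stat(toy_pvals, toy_mafs, threshold=0.10))
print(truncated_weighted_stat(toy_pvals, toy_mafs, threshold=0.02))
```

In practice the null distribution of such a statistic is usually obtained by permuting phenotype labels, since the truncation makes a closed-form reference distribution awkward.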
|
186
|
Abstract
Genome-wide association studies (GWAS) are a powerful tool for investigators to examine the human genome to detect genetic risk factors, reveal the genetic architecture of diseases and open up new opportunities for treatment and prevention. However, despite their successes, GWAS have not been able to identify genetic loci that are effective classifiers of disease, limiting their value for genetic testing. This chapter highlights the challenges that lie ahead for GWAS in better identifying disease risk predictors, and how we may address them. In this regard, we review basic concepts regarding GWAS, the technologies used for capturing genetic variation, the missing heritability problem, the need for efficient study design especially for replication efforts, reducing the bias introduced into a dataset, and how to utilize new resources available, such as electronic medical records. We also look to what lies ahead for the field, and the approaches that can be taken to realize the full potential of GWAS.
Affiliation(s)
- Rishika De
- Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH, USA
|
187
|
Zhu H, Khondker Z, Lu Z, Ibrahim JG. Bayesian Generalized Low Rank Regression Models for Neuroimaging Phenotypes and Genetic Markers. J Am Stat Assoc 2014; 109:997-990. [PMID: 25349462 PMCID: PMC4208701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
We propose a Bayesian generalized low rank regression model (GLRR) for the analysis of both high-dimensional responses and covariates. This development is motivated by performing searches for associations between genetic variants and brain imaging phenotypes. GLRR integrates a low rank matrix to approximate the high-dimensional regression coefficient matrix of GLRR and a dynamic factor model to model the high-dimensional covariance matrix of brain imaging phenotypes. Local hypothesis testing is developed to identify significant covariates on high-dimensional responses. Posterior computation proceeds via an efficient Markov chain Monte Carlo algorithm. A simulation study is performed to evaluate the finite sample performance of GLRR and its comparison with several competing approaches. We apply GLRR to investigate the impact of 1,071 SNPs in the top 40 genes reported by the AlzGene database on the volumes of 93 regions of interest (ROI) obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
Affiliation(s)
- Hongtu Zhu
- H. Zhu is Professor of Biostatistics, Z. Khondker was a Ph.D. student under the supervision of Drs. Ibrahim and Zhu, Z. Lu was a postdoctoral fellow under the supervision of Dr. Zhu, and J. G. Ibrahim is Alumni Distinguished Professor of Biostatistics, Department of Biostatistics, University of North Carolina at Chapel Hill, NC 27599-7420
| | - Zakaria Khondker
| | - Zhaohua Lu
| | - Joseph G Ibrahim
|
188
|
Lange K, Papp JC, Sinsheimer JS, Sobel EM. Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data. Annual Review of Statistics and Its Application 2014; 1:279-300. [PMID: 24955378 PMCID: PMC4062304 DOI: 10.1146/annurev-statistics-022513-115638] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 05/30/2023]
Abstract
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.
Affiliation(s)
- Kenneth Lange
- Depts of Biomathematics, Human Genetics, and Statistics, UCLA
| | | | - Janet S. Sinsheimer
- Depts of Biomathematics, Human Genetics, Statistics, and Biostatistics, UCLA
|
189
|
Gupta PK, Kulwal PL, Jaiswal V. Association mapping in crop plants: opportunities and challenges. Advances in Genetics 2014; 85:109-47. [PMID: 24880734 DOI: 10.1016/b978-0-12-800271-1.00002-0] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.3] [Indexed: 01/22/2023]
Abstract
The research area of association mapping (AM) is currently receiving major attention for genetic studies of quantitative traits in all major crops. However, the level of success and utility of AM achieved for crop improvement is not comparable to that in the area of human health care for diagnosis of complex human diseases. These AM studies in plants, as in humans, became possible due to the availability of DNA-based molecular markers and a variety of sophisticated statistical tools that are evolving on a regular basis. In this chapter, we first briefly review the significance of a variety of populations that are used in AM studies, then briefly describe the molecular markers and high-throughput genotyping strategies, and finally describe the approaches used for AM studies. The major part of the chapter is, however, devoted to analysis of reasons why the results of AM have been underutilized in plant breeding. We also examine the opportunities available and challenges faced while using AM for crop improvement programs. This includes a detailed discussion of the issues that have plagued AM studies, and the solutions that have become available to deal with these issues, so that in future, the results of AM studies may prove increasingly fruitful for crop improvement programs.
Affiliation(s)
- Pushpendra K Gupta
- Department of Genetics and Plant Breeding, Ch. Charan Singh University, Meerut, UP, India
- Pawan L Kulwal
- State Level Biotechnology Centre, Mahatma Phule Agricultural University, Rahuri, MS, India
- Vandana Jaiswal
- Department of Genetics and Plant Breeding, Ch. Charan Singh University, Meerut, UP, India
|
190
|
Larson NB, Schaid DJ. Regularized rare variant enrichment analysis for case-control exome sequencing data. Genet Epidemiol 2013; 38:104-13. [PMID: 24382715 DOI: 10.1002/gepi.21783] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 07/25/2013] [Revised: 11/04/2013] [Accepted: 12/02/2013] [Indexed: 11/09/2022]
Abstract
Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research.
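The gene-based LASSO idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a simple burden score (the count of rare alleles per gene) as the aggregation measure, and it fits a linear LASSO by coordinate descent to a 0/1 case status rather than a penalized logistic model. The function names, toy genotypes, and penalty value are all hypothetical.

```python
def burden_scores(genotypes, gene_of_variant, genes):
    """Sum rare-allele counts per gene for every subject (one row per subject)."""
    scores = []
    for row in genotypes:
        per_gene = {g: 0.0 for g in genes}
        for count, gene in zip(row, gene_of_variant):
            per_gene[gene] += count
        scores.append([per_gene[g] for g in genes])
    return scores


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the LASSO coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n) * ||y - b0 - X b||^2 + lam * ||b||_1 (b0 unpenalised)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                others = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
        # Refit the unpenalised intercept after each sweep.
        intercept = sum(
            y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)
        ) / n
    return intercept, beta


# Hypothetical toy data: 8 subjects, 4 rare variants in 2 genes.
gene_of_variant = ["GENE_A", "GENE_A", "GENE_B", "GENE_B"]
genes = ["GENE_A", "GENE_B"]
genotypes = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
case_status = [1, 1, 1, 0, 0, 0, 1, 0]

X = burden_scores(genotypes, gene_of_variant, genes)
intercept, beta = lasso_coordinate_descent(X, case_status, lam=0.1)
selected = [g for g, b in zip(genes, beta) if b != 0.0]
print(dict(zip(genes, beta)), selected)
```

Because all genes enter one penalized model, a gene whose burden carries no signal after accounting for the others is shrunk to exactly zero, which is the contrast with marginal gene-level testing that the abstract draws.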
Affiliation(s)
- Nicholas B Larson
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, United States of America
|
191
|
Cheng KF, Lee JY, Zheng W, Li C. A powerful association test of multiple genetic variants using a random-effects model. Stat Med 2013; 33:1816-27. [PMID: 24338936 DOI: 10.1002/sim.6068] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 02/19/2012] [Revised: 11/09/2013] [Accepted: 11/19/2013] [Indexed: 01/26/2023]
Abstract
There is an emerging interest in sequencing-based association studies of multiple rare variants. Most association tests suggested in the literature involve collapsing rare variants with or without weighting. Recently, a variance-component score test [sequence kernel association test (SKAT)] was proposed to address the limitations of the collapsing method. Although SKAT was shown to outperform most of the alternative tests, its applications and power might be restricted and influenced by missing genotypes. In this paper, we suggest a new method based on testing whether the fraction of causal variants in a region is zero. The new association test, T_REM, is derived from a random-effects model and allows for missing genotypes, and the choice of weighting function is not required when common and rare variants are analyzed simultaneously. We performed simulations to study the type I error rates and power of four competing tests under various conditions on the sample size, genotype missing rate, variant frequency, effect directionality, and the number of non-causal rare variants and/or causal common variants. The simulation results showed that T_REM was a valid test and less sensitive to the inclusion of non-causal rare variants and/or low-effect common variants or to the presence of missing genotypes. When the effects were more consistent in the same direction, T_REM also had better power performance. Finally, an application to the Shanghai Breast Cancer Study showed that rare causal variants at the FGFR2 gene were detected by T_REM and SKAT, but T_REM produced more consistent results for different sets of rare and common variants.
Affiliation(s)
- K F Cheng
- Biostatistics Center and Department of Public Health, Taipei Medical University, Taiwan
|
192
|
Konczal M, Koteja P, Stuglik MT, Radwan J, Babik W. Accuracy of allele frequency estimation using pooled RNA-Seq. Mol Ecol Resour 2013; 14:381-92. [PMID: 24119300 DOI: 10.1111/1755-0998.12186] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.1] [Received: 06/21/2013] [Revised: 09/30/2013] [Accepted: 10/06/2013] [Indexed: 11/28/2022]
Abstract
For nonmodel organisms, genome-wide information that describes functionally relevant variation may be obtained by RNA-Seq following de novo transcriptome assembly. While sequencing has become relatively inexpensive, the preparation of a large number of sequencing libraries remains prohibitively expensive for population genetic analyses of nonmodel species. Pooling samples may then be an attractive alternative. To test whether pooled RNA-Seq accurately predicts true allele frequencies, we analysed the liver transcriptomes of 10 bank voles. Each sample was sequenced both as an individually barcoded library and as a part of a pool. Equal amounts of total RNA from each vole were pooled prior to mRNA selection and library construction. Reads were mapped onto the de novo assembled reference transcriptome. High-quality genotypes for individual voles, determined for 23,682 SNPs, provided information on 'true' allele frequencies; allele frequencies estimated from the pool were then compared with these values. 'True' frequencies and those estimated from the pool were highly correlated. Mean relative estimation error was 21% and did not depend on expression level. However, we also observed a minor effect of interindividual variation in gene expression and allele-specific gene expression influencing allele frequency estimation accuracy. Moreover, we observed a strong negative relationship between minor allele frequency and relative estimation error. Our results indicate that pooled RNA-Seq exhibits accuracy comparable with pooled genome resequencing, but variation in expression level between individuals should be assessed and accounted for. This should help in taking into account the difference in accuracy between conservatively expressed transcripts and those which are variable in expression level.
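The comparison described above can be illustrated with a short sketch. It assumes the 'true' frequency is the alternate-allele count over 2n diploid genotypes, the pooled estimate is the fraction of reads carrying the alternate allele, and relative error is |estimate - truth| / truth; the abstract does not spell out the exact error definition, so that formula and all numbers below are assumptions for illustration.

```python
def true_allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1 or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def pooled_allele_frequency(alt_reads, total_reads):
    """Alternate-allele frequency estimated from read counts in the pool."""
    return alt_reads / total_reads


def relative_error(estimate, truth):
    """Absolute deviation of the estimate, expressed as a fraction of the truth."""
    return abs(estimate - truth) / truth


# Hypothetical SNP: 10 individually genotyped animals and one pooled library.
individual_genotypes = [0, 1, 0, 0, 2, 0, 1, 0, 0, 0]
truth = true_allele_frequency(individual_genotypes)
estimate = pooled_allele_frequency(alt_reads=30, total_reads=120)
error = relative_error(estimate, truth)
print(truth, estimate, error)
```

Dividing by the true frequency is one reason rare alleles would show larger relative errors: the same absolute read-count noise is scaled by a smaller denominator, consistent with the negative relationship the authors report.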
Affiliation(s)
- M Konczal
- Institute of Environmental Sciences, Jagiellonian University, Gronostajowa 7, 30-387, Kraków, Poland
|
193
|
An application and empirical comparison of statistical analysis methods for associating rare variants to a complex phenotype. Pacific Symposium on Biocomputing 2013. [PMID: 21121035 DOI: 10.1142/9789814335058_0009] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8]
Abstract
The contribution of collections of rare sequence variations (or 'variants') to phenotypic expression has begun to receive considerable attention within the biomedical research community. However, the best way to capture the effects of rare variants in relevant statistical analysis models is an open question. In this paper we describe the application of a number of statistical methods for testing associations between rare variants in two genes and obesity. We consider the relative merits of the different methods as well as important implementation details, such as the leveraging of genomic annotations and determining p-values.
|
194
|
Schaid DJ, Sinnwell JP, McDonnell SK, Thibodeau SN. Detecting genomic clustering of risk variants from sequence data: cases versus controls. Hum Genet 2013; 132:1301-9. [PMID: 23842950 PMCID: PMC3797865 DOI: 10.1007/s00439-013-1335-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 04/03/2013] [Accepted: 07/02/2013] [Indexed: 02/02/2023]
Abstract
As the ability to measure dense genetic markers approaches the limit of the DNA sequence itself, taking advantage of possible clustering of genetic variants in, and around, a gene would benefit genetic association analyses, and likely provide biological insights. The greatest benefit might be realized when multiple rare variants cluster in a functional region. Several statistical tests have been developed, one of which is based on the popular Kulldorff scan statistic for spatial clustering of disease. We extended another popular spatial clustering method, Tango's statistic, to genomic sequence data. An advantage of Tango's method is that it is rapid to compute, and when a single test statistic is computed, its distribution is well approximated by a scaled χ² distribution, making computation of p values very rapid. We compared the Type-I error rates and power of several clustering statistics, as well as the omnibus sequence kernel association test. Although our version of Tango's statistic, which we call the "Kernel Distance" statistic, took approximately half as long to compute as the Kulldorff scan statistic, it had slightly less power than the scan statistic. Our results showed that the Ionita-Laza version of Kulldorff's scan statistic had the greatest power over a range of clustering scenarios.
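A Tango-type clustering statistic has the quadratic form C = (r - p)ᵀ A (r - p), where r and p are two vectors of proportions over locations and A measures how close the locations are. The sketch below applies that form to variant positions, assuming r and p are the shares of case and control carriers at each position and A has entries exp(-|x_i - x_j| / tau). These choices, the function name, and the toy counts are illustrative assumptions; the abstract does not give the exact kernel or weighting used for the "Kernel Distance" statistic.

```python
import math


def tango_type_statistic(positions, case_counts, control_counts, tau):
    """Quadratic form (r - p)^T A (r - p) with an exponential closeness kernel."""
    n_case = sum(case_counts)
    n_ctrl = sum(control_counts)
    # Difference between the case and control carrier distributions.
    diff = [c / n_case - k / n_ctrl for c, k in zip(case_counts, control_counts)]
    stat = 0.0
    for i, xi in enumerate(positions):
        for j, xj in enumerate(positions):
            closeness = math.exp(-abs(xi - xj) / tau)
            stat += diff[i] * closeness * diff[j]
    return stat


# Hypothetical region: two neighbouring variants enriched in cases, one distant
# variant enriched in controls.
positions = [100, 150, 5000]
case_counts = [4, 5, 1]
control_counts = [1, 1, 8]
stat = tango_type_statistic(positions, case_counts, control_counts, tau=200.0)
print(stat)
```

Nearby variants with an excess in the same direction reinforce each other through the off-diagonal kernel terms, which is what lets the statistic respond to clustering rather than only to per-variant differences.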
Affiliation(s)
- Daniel J Schaid
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
|
195
|
Winham SJ, Biernacka JM. Gene-environment interactions in genome-wide association studies: current approaches and new directions. J Child Psychol Psychiatry 2013; 54:1120-34. [PMID: 23808649 PMCID: PMC3829379 DOI: 10.1111/jcpp.12114] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.8] [Accepted: 05/03/2013] [Indexed: 01/20/2023]
Abstract
BACKGROUND Complex psychiatric traits have long been thought to be the result of a combination of genetic and environmental factors, and gene-environment interactions are thought to play a crucial role in behavioral phenotypes and the susceptibility and progression of psychiatric disorders. Candidate gene studies to investigate hypothesized gene-environment interactions are now fairly common in human genetic research, and with the shift toward genome-wide association studies, genome-wide gene-environment interaction studies are beginning to emerge. METHODS We summarize the basic ideas behind gene-environment interaction, and provide an overview of possible study designs and traditional analysis methods in the context of genome-wide analysis. We then discuss novel approaches beyond the traditional strategy of analyzing the interaction between the environmental factor and each polymorphism individually. RESULTS Two-step filtering approaches that reduce the number of polymorphisms tested for interactions can substantially increase the power of genome-wide gene-environment studies. New analytical methods including data-mining approaches, and gene-level and pathway-level analyses, also have the capacity to improve our understanding of how complex genetic and environmental factors interact to influence psychologic and psychiatric traits. Such methods, however, have not yet been utilized much in behavioral and mental health research. CONCLUSIONS Although methods to investigate gene-environment interactions are available, there is a need for further development and extension of these methods to identify gene-environment interactions in the context of genome-wide association studies. These novel approaches need to be applied in studies of psychology and psychiatry.
Affiliation(s)
- Stacey J Winham
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905
- Joanna M. Biernacka
- Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester MN 55905
- Department of Psychiatry and Psychology, Mayo Clinic, Rochester MN 55905
|
196
|
Epstein RJ. Has discovery-based cancer research been a bust? Clin Transl Oncol 2013; 15:865-70. [PMID: 24002944 DOI: 10.1007/s12094-013-1071-8] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/03/2013] [Accepted: 06/18/2013] [Indexed: 12/11/2022]
Abstract
The completion of the human genome sequence sparked optimism about prospects for new anticancer drug development, but clinical progress over the last decade has proven slower than expected. Here it is proposed that unrealistically high expectations of first-generation discovery-based diagnostics have contributed to this problem. Hypothesis-based single-molecule tests (e.g., mutation screening of KRAS, EGFR, BRAF or KIT genes) continue to change clinical practice incrementally, whereas first-generation multiplex assays (such as gene expression profiling and proteomics) have identified few high-impact therapeutic targets despite numerous correlations with prognosis. To move forward, second-generation multiplex diagnostics should be based not on statistical patterns/associations alone, but on clinically interpretable ('high-signal-to-noise') data such as change-of-function mutations, gene amplifications, recurrent chromosomal anomalies, and abnormal phosphorylation profiles of ERK or mTOR signaling cascades.
Affiliation(s)
- R J Epstein
- Department of Oncology, Clinical Cancer Informatics & Research Centre, The Kinghorn Cancer Centre, Sydney, Australia
|
197
|
Johnson BA, Seneviratne C, Wang XQ, Ait-Daoud N, Li MD. Determination of genotype combinations that can predict the outcome of the treatment of alcohol dependence using the 5-HT(3) antagonist ondansetron. Am J Psychiatry 2013; 170:1020-31. [PMID: 23897038 PMCID: PMC3809153 DOI: 10.1176/appi.ajp.2013.12091163] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Indexed: 12/18/2022]
Abstract
OBJECTIVE The authors previously reported that the 5'-HTTLPR-LL and rs1042173-TT (SLC6A4-LL/TT) genotypes in the serotonin transporter gene predicted a significant reduction in the severity of alcohol consumption among alcoholics receiving the 5-HT3 antagonist ondansetron. In this study, they explored additional markers of ondansetron treatment response in alcoholics by examining polymorphisms in the HTR3A and HTR3B genes, which directly regulate the function and binding of 5-HT3 receptors to ondansetron. METHOD The authors genotyped one rare and 18 common single-nucleotide polymorphisms in HTR3A and HTR3B in the same sample that they genotyped for SLC6A4-LL/TT in the previous randomized, double-blind, 11-week clinical trial. Participants were 283 European Americans who received oral ondansetron (4 μg/kg of body weight twice daily) or placebo along with weekly cognitive-behavioral therapy. Associations of individual and combined genotypes with treatment response on drinking outcomes were analyzed. RESULTS Individuals carrying one or more of genotypes rs1150226-AG and rs1176713-GG in HTR3A and rs17614942-AC in HTR3B showed a significant overall mean difference between ondansetron and placebo in drinks per drinking day (-2.50; effect size=0.867), percentage of heavy drinking days (-20.58%; effect size=0.780), and percentage of days abstinent (18.18%; effect size=0.683). Combining these HTR3A/HTR3B and SLC6A4-LL/TT genotypes increased the target cohort from approaching 20% (identified in the previous study) to 34%. CONCLUSIONS The authors present initial evidence suggesting that a combined five-marker genotype panel can be used to predict the outcome of treatment of alcohol dependence with ondansetron. Additional, larger pharmacogenetic studies would help to validate these results.
Affiliation(s)
- Bankole A. Johnson
- Department of Psychiatry and Neurobehavioral Sciences University of Virginia, Charlottesville, Virginia, USA
- Chamindi Seneviratne
- Department of Psychiatry and Neurobehavioral Sciences University of Virginia, Charlottesville, Virginia, USA
- Xin-Qun Wang
- Department of Public Health Sciences, University of Virginia, Charlottesville, Virginia, USA
- Nassima Ait-Daoud
- Department of Psychiatry and Neurobehavioral Sciences University of Virginia, Charlottesville, Virginia, USA
- Ming D. Li
- Department of Psychiatry and Neurobehavioral Sciences University of Virginia, Charlottesville, Virginia, USA
|
198
|
Cardinale CJ, Wei Z, Panossian S, Wang F, Kim CE, Mentch FD, Chiavacci RM, Kachelries KE, Pandey R, Grant SFA, Baldassano RN, Hakonarson H. Targeted resequencing identifies defective variants of decoy receptor 3 in pediatric-onset inflammatory bowel disease. Genes Immun 2013; 14:447-52. [DOI: 10.1038/gene.2013.43] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/11/2013] [Accepted: 07/19/2013] [Indexed: 12/14/2022]
|
199
|
Combined genotype and haplotype tests for region-based association studies. BMC Genomics 2013; 14:569. [PMID: 23964661 PMCID: PMC3852120 DOI: 10.1186/1471-2164-14-569] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 04/09/2013] [Accepted: 08/13/2013] [Indexed: 12/13/2022] Open
Abstract
Background Although single-SNP analysis has proven to be useful in identifying many disease-associated loci, region-based analysis has several advantages. Empirically, it has been shown that region-based genotype and haplotype approaches may possess much higher power than single-SNP statistical tests. Both high-quality haplotypes and genotypes may be available for analysis given the development of next-generation sequencing technologies and haplotype assembly algorithms. Results As it is generally unknown whether genotypes or haplotypes are more relevant for identifying an association, we propose to use both of them with the purpose of preserving high power under both genotype and haplotype disease scenarios. We suggest two approaches for a combined association test and investigate the performance of these two approaches based on a theoretical model, population genetics simulations and analysis of a real data set. Conclusions Based on a theoretical model, population genetics simulations and analysis of a central corneal thickness (CCT) Genome Wide Association Study (GWAS) data set, we have shown that a combined genotype and haplotype approach has a high potential utility for applications in association studies.
|
200
|
He X, Sanders SJ, Liu L, De Rubeis S, Lim ET, Sutcliffe JS, Schellenberg GD, Gibbs RA, Daly MJ, Buxbaum JD, State MW, Devlin B, Roeder K. Integrated model of de novo and inherited genetic variants yields greater power to identify risk genes. PLoS Genet 2013; 9:e1003671. [PMID: 23966865 PMCID: PMC3744441 DOI: 10.1371/journal.pgen.1003671] [Citation(s) in RCA: 188] [Impact Index Per Article: 15.7] [Received: 01/22/2013] [Accepted: 06/10/2013] [Indexed: 01/31/2023] Open
Abstract
De novo mutations affect risk for many diseases and disorders, especially those with early-onset. An example is autism spectrum disorders (ASD). Four recent whole-exome sequencing (WES) studies of ASD families revealed a handful of novel risk genes, based on independent de novo loss-of-function (LoF) mutations falling in the same gene, and found that de novo LoF mutations occurred at a twofold higher rate than expected by chance. However successful these studies were, they used only a small fraction of the data, excluding other types of de novo mutations and inherited rare variants. Moreover, such analyses cannot readily incorporate data from case-control studies. An important research challenge in gene discovery, therefore, is to develop statistical methods that accommodate a broader class of rare variation. We develop methods that can incorporate WES data regarding de novo mutations, inherited variants present, and variants identified within cases and controls. TADA, for Transmission And De novo Association, integrates these data by a gene-based likelihood model involving parameters for allele frequencies and gene-specific penetrances. Inference is based on a Hierarchical Bayes strategy that borrows information across all genes to infer parameters that would be difficult to estimate for individual genes. In addition to theoretical development we validated TADA using realistic simulations mimicking rare, large-effect mutations affecting risk for ASD and show it has dramatically better power than other common methods of analysis. Thus TADA's integration of various kinds of WES data can be a highly effective means of identifying novel risk genes. Indeed, application of TADA to WES data from subjects with ASD and their families, as well as from a study of ASD subjects and controls, revealed several novel and promising ASD candidate genes with strong statistical support. 
The genetic underpinnings of autism spectrum disorder (ASD) have proven difficult to determine, despite a wealth of evidence for genetic causes and ongoing effort to identify genes. Recently investigators sequenced the coding regions of the genomes from ASD children along with their unaffected parents (ASD trios) and identified numerous new candidate genes by pinpointing spontaneously occurring (de novo) mutations in the affected offspring. A gene with a severe (de novo) mutation observed in more than one individual is immediately implicated in ASD; however, the majority of severe mutations are observed only once per gene. These genes create a short list of candidates, and our results suggest about 50% are true risk genes. To strengthen our inferences, we develop a novel statistical method (TADA) that utilizes inherited variation transmitted to affected offspring in conjunction with (de novo) mutations to identify risk genes. Through simulations we show that TADA dramatically increases power. We apply this approach to nearly 1000 ASD trios and 2000 subjects from a case-control study and identify several promising genes. Through simulations and application we show that TADA's integration of sequencing data can be a highly effective means of identifying risk genes.
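To see why a gene hit by de novo LoF mutations in more than one individual stands out, a simple Poisson calculation helps. The sketch below is not TADA; it is a much simpler baseline that assumes the number of de novo LoF mutations in a gene across N trios follows a Poisson distribution with mean 2·N·μ, where μ is a per-gene, per-chromosome LoF mutation rate. The rate and cohort size used here are hypothetical values chosen only for illustration.

```python
import math


def prob_at_least(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - below


n_trios = 1000              # hypothetical cohort size
lof_rate_per_gene = 2e-6    # hypothetical per-chromosome, per-generation LoF rate

# Each child carries two copies of the gene, hence the factor of 2.
expected = 2 * n_trios * lof_rate_per_gene
p_recurrent = prob_at_least(2, expected)
print(expected, p_recurrent)
```

Under these assumed values a recurrent hit in a given gene is very unlikely by chance, whereas a single hit is not, which is the gap that a model borrowing information from inherited and case-control variation is meant to fill.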
Affiliation(s)
- Xin He
- Lane Center of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Stephan J. Sanders
- Departments of Psychiatry and Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Li Liu
- Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Silvia De Rubeis
- Seaver Autism Center for Research and Treatment, Icahn Mount Sinai School of Medicine, New York, New York, United States of America
- Department of Psychiatry, Icahn Mount Sinai School of Medicine, New York, New York, United States of America
- Elaine T. Lim
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- James S. Sutcliffe
- Vanderbilt Brain Institute, Departments of Molecular Physiology & Biophysics and Psychiatry, Vanderbilt University, Nashville, Tennessee, United States of America
- Gerard D. Schellenberg
- Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- Richard A. Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas, United States of America
- Mark J. Daly
- Analytic and Translational Genetics Unit, Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
- Joseph D. Buxbaum
- Seaver Autism Center for Research and Treatment, Icahn Mount Sinai School of Medicine, New York, New York, United States of America
- Department of Psychiatry, Icahn Mount Sinai School of Medicine, New York, New York, United States of America
- Department of Genetics and Genomic Sciences, Icahn Mount Sinai School of Medicine, New York, New York, United States of America
- Friedman Brain Institute, Icahn Mount Sinai School of Medicine, New York, New York, United States of America
- Matthew W. State
- Departments of Psychiatry and Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Bernie Devlin
- Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Kathryn Roeder
- Lane Center of Computational Biology, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
- Department of Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania, United States of America
|