1
Liu H, Zhang H. Powerful Rare-Variant Association Analysis of Secondary Phenotypes. Genet Epidemiol 2025; 49:e22589. [PMID: 39350332 DOI: 10.1002/gepi.22589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Revised: 06/24/2024] [Accepted: 09/02/2024] [Indexed: 12/20/2024]
Abstract
Most genome-wide association studies are based on case-control designs, which provide abundant resources for secondary phenotype analyses. However, such studies suffer from biased sampling of primary phenotypes, and traditional statistical methods can yield seriously distorted results when they are applied to secondary phenotypes without accounting for the biased sampling mechanism. To our knowledge, there are no statistical methods specifically tailored for rare-variant association analysis with secondary phenotypes. In this article, we propose two novel joint test statistics for identifying secondary-phenotype-associated rare variants, based on prospective likelihood and retrospective likelihood, respectively. We also exploit the assumption of gene-environment independence in the retrospective likelihood to improve statistical power, and adopt a two-step strategy to balance statistical power and robustness. Simulations and a real-data application are conducted to demonstrate the superior performance of our proposed methods.
Affiliation(s)
- Hanyun Liu
- School of Management, University of Science and Technology of China, Hefei, China
- Hong Zhang
- School of Management, University of Science and Technology of China, Hefei, China
2
Henarejos-Castillo I, Sanz FJ, Solana-Manrique C, Sebastian-Leon P, Medina I, Remohi J, Paricio N, Diaz-Gimeno P. Whole-exome sequencing and Drosophila modelling reveal mutated genes and pathways contributing to human ovarian failure. Reprod Biol Endocrinol 2024; 22:153. [PMID: 39633407 PMCID: PMC11616368 DOI: 10.1186/s12958-024-01325-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 11/24/2024] [Indexed: 12/07/2024] Open
Abstract
BACKGROUND Ovarian failure (OF) is a multifactorial, complex disease affecting up to 1% of women under 40 years of age. With 90% of patients diagnosed with idiopathic OF, the underlying molecular mechanisms remain unknown, making it difficult to personalize treatments for these patients in the clinical setting. Studying the presence and/or accumulation of SNVs at the gene/pathway levels will help describe novel genes and characterize disrupted biological pathways linked with ovarian failure.
METHODS An ad hoc case-control SNV screening, conducted from 2020 to 2023, used WES data (150 VCF files) from Spanish IVF patients with (n = 118) and without (n = 32) OF (< 40 years of age; mean BMI 22.78), along with gnomAD (n = 38,947) and IGSR (n = 1,271; 258 European female VCF) data as pseudo-control female populations. SNVs were prioritized according to their predicted deleteriousness, frequency in genomic databases, and proportional differences across populations. A burden test was performed to reveal genes with a higher presence of SNVs in the OF cohort than in the control and pseudo-control groups. Systematic in-silico analyses were performed to assess the potential disruptions caused by the mutated genes in relevant biological pathways. Finally, genes with orthologues in Drosophila melanogaster were considered to experimentally validate the potential impediments to ovarian function and reproductive potential.
RESULTS Eighteen genes had a higher presence of SNVs in the OF population (FDR < 0.05). AK2, CDC27, CFTR, CTBP2, KMT2C, and MTCH2 were associated with OF for the first time, and their silenced/knockout forms reduced fertility in Drosophila. We also predicted the disruption of 29 sub-pathways across four signalling pathways (FDR < 0.05). These sub-pathways included the metaphase to anaphase transition during oocyte meiosis, inflammatory processes related to necroptosis, DNA mismatch repair systems and the MAPK signalling cascade.
CONCLUSIONS This study sheds light on the underlying molecular mechanisms of OF, providing novel associations for six genes and OF-related infertility, setting a foundation for further biomarker development, and improving precision medicine in infertility.
Affiliation(s)
- Ismael Henarejos-Castillo
- IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
- Department of Pediatrics, Obstetrics and Gynaecology, University of Valencia, Av. Blasco Ibáñez 15, Valencia, 46010, Spain
- Francisco José Sanz
- IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
- Department of Genetics, Biotechnology and Biomedicine Institute (BioTecMed), University of Valencia, C. Dr. Moliner, 50, Burjassot, 46100, Spain
- Cristina Solana-Manrique
- Department of Genetics, Biotechnology and Biomedicine Institute (BioTecMed), University of Valencia, C. Dr. Moliner, 50, Burjassot, 46100, Spain
- Department of Physiotherapy, Faculty of Health Sciences, European University of Valencia, Passeig de l'Albereda, 7, Valencia, 46010, Spain
- Patricia Sebastian-Leon
- IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
- Ignacio Medina
- High-Performance Computing Service, University of Cambridge, 7 JJ Thomson Ave, Cambridge, CB3 0RB, UK
- José Remohi
- IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
- Department of Pediatrics, Obstetrics and Gynaecology, University of Valencia, Av. Blasco Ibáñez 15, Valencia, 46010, Spain
- Nuria Paricio
- Department of Genetics, Biotechnology and Biomedicine Institute (BioTecMed), University of Valencia, C. Dr. Moliner, 50, Burjassot, 46100, Spain
- Patricia Diaz-Gimeno
- IVI-RMA Global Research Alliance, IVI Foundation, Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Valencia, 46026, Spain
- Department of Genomic & Systems Reproductive Medicine, IVI Foundation, Valencia, Spain - Instituto de Investigación Sanitaria La Fe (IIS La Fe), Av. Fernando Abril Martorell 106, Torre A, Planta 1ª, Valencia, 46026, Spain
3
Chen JH, Landback P, Arsala D, Guzzetta A, Xia S, Atlas J, Sosa D, Zhang YE, Cheng J, Shen B, Long M. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. bioRxiv 2024:2023.11.14.567139. [PMID: 38045239 PMCID: PMC10690195 DOI: 10.1101/2023.11.14.567139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are genetic novelties pivotal in mammalian evolution. However, their phenotypic impacts and evolutionary patterns over time remain elusive in humans due to the technical and ethical complexities of functional studies. Integrating gene age dating with Mendelian disease phenotyping, our research shows a gradual rise in disease gene proportion as gene age increases. Logistic regression modeling indicates that this increase in older genes may be related to their longer sequence lengths and higher burdens of deleterious de novo germline variants (DNVs). We also find a steady integration of new genes with biomedical phenotypes into the human genome over macroevolutionary timescales (~0.07% per million years). Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures across gene ages. Notably, young genes show significant enrichment in diseases related to the male reproductive system, indicating strong sexual selection. Young genes also exhibit disease-related functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, musculoskeletal phenotypes, and color vision. We further reveal a logistic growth pattern of pleiotropy over evolutionary time, indicating a diminishing marginal growth of new functions for older genes due to intensifying selective constraints over time. We propose a "pleiotropy-barrier" model that delineates higher potentials for phenotypic innovation in young genes compared to older genes, a process that is subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
Affiliation(s)
- Jian-Hai Chen
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Patrick Landback
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Deanna Arsala
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Alexander Guzzetta
- Department of Pathology, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Shengqian Xia
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Jared Atlas
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Dylan Sosa
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
- Yong E. Zhang
- Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China
- Jingqiu Cheng
- Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Bairong Shen
- Institutes for Systems Genetics, West China University Hospital, Chengdu 610041, China
- Manyuan Long
- Department of Ecology and Evolution, The University of Chicago, 1101 E 57th Street, Chicago, IL 60637
4
Li X, Pura J, Allen A, Owzar K, Lu J, Harms M, Xie J. DYNATE: Localizing rare-variant association regions via multiple testing embedded in an aggregation tree. Genet Epidemiol 2024; 48:42-55. [PMID: 38014869 PMCID: PMC10842871 DOI: 10.1002/gepi.22542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/22/2023] [Revised: 10/09/2023] [Accepted: 10/26/2023] [Indexed: 11/29/2023]
Abstract
Rare-variant (RV) genetic association studies enable researchers to uncover the variation in phenotypic traits left unexplained by common variation. Traditional single-variant analysis lacks power; thus, researchers have developed various methods to aggregate the effects of RVs across genomic regions to study their collective impact. Some existing methods utilize a static delineation of genomic regions, often resulting in suboptimal effect aggregation, because neutral subregions within the test region attenuate the signal. Other methods use varying windows to search for signals but often return long regions containing many neutral RVs. To pinpoint short genomic regions enriched for disease-associated RVs, we developed a novel method, DYNamic Aggregation TEsting (DYNATE). DYNATE dynamically and hierarchically aggregates smaller genomic regions into larger ones and performs multiple testing for disease associations with a controlled weighted false discovery rate. DYNATE's main advantage lies in its strong ability to identify short genomic regions highly enriched for disease-associated RVs. Extensive numerical simulations demonstrate the superior performance of DYNATE under various scenarios compared with existing methods. We applied DYNATE to an amyotrophic lateral sclerosis study and identified a new gene, EPG5, harboring possibly pathogenic mutations.
Affiliation(s)
- Xuechan Li
- Novartis Pharmaceuticals Corporation, Basel, Switzerland
- Andrew Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Kouros Owzar
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Jianfeng Lu
- Department of Mathematics, Duke University, Durham, North Carolina, USA
- Matthew Harms
- Department of Neurology, Columbia University, Broadway, New York, USA
- Jichun Xie
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
- Department of Mathematics, Duke University, Durham, North Carolina, USA
5
Chen J. Evolutionarily new genes in humans with disease phenotypes reveal functional enrichment patterns shaped by adaptive innovation and sexual selection. Research Square 2023:rs.3.rs-3632644. [PMID: 38045389 PMCID: PMC10690325 DOI: 10.21203/rs.3.rs-3632644/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
New genes (or young genes) are structural novelties pivotal in mammalian evolution. Their phenotypic impact on humans, however, remains elusive due to the technical and ethical complexities of functional studies. By combining gene age dating with Mendelian disease phenotyping, our research reveals that new genes associated with disease phenotypes steadily integrate into the human genome at a rate of ~0.07% every million years over macroevolutionary timescales. Despite this stable pace, we observe distinct patterns in phenotypic enrichment, pleiotropy, and selective pressures between young and old genes. Notably, young genes show significant enrichment in the male reproductive system, indicating strong sexual selection. Young genes also exhibit functions in tissues and systems potentially linked to human phenotypic innovations, such as increased brain size, bipedal locomotion, and color vision. Our findings further reveal increasing levels of pleiotropy over evolutionary time, accompanied by stronger selective constraints. We propose a "pleiotropy-barrier" model that delineates different potentials for phenotypic innovation between young and older genes subject to natural selection. Our study demonstrates that evolutionarily new genes are critical in influencing human reproductive evolution and adaptive phenotypic innovations driven by sexual and natural selection, with low pleiotropy as a selective advantage.
6
Boutry S, Helaers R, Lenaerts T, Vikkula M. Rare variant association on unrelated individuals in case-control studies using aggregation tests: existing methods and current limitations. Brief Bioinform 2023; 24:bbad412. [PMID: 37974506 DOI: 10.1093/bib/bbad412] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 10/14/2023] [Accepted: 10/28/2023] [Indexed: 11/19/2023] Open
Abstract
Over the past years, progress in next-generation sequencing technologies and bioinformatics has sparked a surge in association studies. In particular, genome-wide association studies (GWASs) have demonstrated their effectiveness in identifying disease associations with common genetic variants. Yet rare variants can contribute to additional disease risk or trait heterogeneity. Because GWASs are underpowered for detecting association with such variants, numerous statistical methods have recently been proposed. Aggregation tests collapse multiple rare variants within a genetic region (e.g. gene, gene set, genomic loci) to test for association. An increasing number of studies using such methods have successfully identified trait-associated rare variants and led to a better understanding of the underlying disease mechanism. In this review, we compare existing aggregation tests, their statistical features and scope of application, splitting them into the five classical classes: burden, adaptive burden, variance-component, omnibus and other. Finally, we describe some limitations of current aggregation tests, highlighting potential directions for further investigation.
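To make the collapsing idea concrete, the following is a minimal sketch of the simplest class named above, a burden-style test. It is an illustration only and is not taken from the review: the function names (`carrier_flags`, `fisher_one_sided`, `burden_test`) and the toy genotype matrices are hypothetical, and the one-sided Fisher exact test on carrier counts is an assumed choice of statistic.

```python
from math import comb


def carrier_flags(genotypes):
    """Collapse each individual's rare-allele counts across a region into a 0/1 carrier flag."""
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def fisher_one_sided(a, b, c, d):
    """Return P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]].

    Rows are cases/controls; columns are carriers/non-carriers.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    upper = min(n_cases, n_carriers)
    numerator = sum(
        comb(n_cases, k) * comb(n_total - n_cases, n_carriers - k)
        for k in range(a, upper + 1)
    )
    return numerator / comb(n_total, n_carriers)


def burden_test(case_genotypes, control_genotypes):
    """Test whether carriers of any rare variant in the region are enriched among cases."""
    case_flags = carrier_flags(case_genotypes)
    control_flags = carrier_flags(control_genotypes)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return fisher_one_sided(a, b, c, d)


# Toy data (hypothetical): rows are individuals, columns are rare variants in one gene.
cases = [[1, 0], [0, 1], [2, 0]]
controls = [[0, 0], [0, 0], [0, 0]]
p_value = burden_test(cases, controls)
print(p_value)
```

Collapsing to a carrier flag assumes all variants in the region act in the same direction; the variance-component and omnibus classes listed above relax that assumption.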
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, 1050 Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, 1050 Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Avenue Hippocrate 74 (+5) bte B1.74.06, 1200 Brussels, Belgium
- WELBIO department, WEL Research Institute, avenue Pasteur, 6, 1300 Wavre, Belgium
7
Boutry S, Helaers R, Lenaerts T, Vikkula M. Excalibur: A new ensemble method based on an optimal combination of aggregation tests for rare-variant association testing for sequencing data. PLoS Comput Biol 2023; 19:e1011488. [PMID: 37708232 PMCID: PMC10522036 DOI: 10.1371/journal.pcbi.1011488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2023] [Revised: 09/26/2023] [Accepted: 09/04/2023] [Indexed: 09/16/2023] Open
Abstract
The development of high-throughput next-generation sequencing technologies and large-scale genetic association studies has produced numerous advances in the biostatistics field. Various aggregation tests, i.e. statistical methods that analyze associations of a trait with multiple markers within a genomic region, have produced a variety of novel discoveries. Notwithstanding their usefulness, no single test fits all needs, and each suffers from specific drawbacks. Selecting the right aggregation test, while considering an unknown underlying genetic model of the disease, remains an important challenge. Here we propose a new ensemble method, called Excalibur, based on an optimal combination of 36 aggregation tests, created after an in-depth study of the limitations of each test and their impact on the quality of results. Our findings demonstrate the ability of our method to control type I error and illustrate that it offers the best average power across all scenarios. The proposed method allows for novel advances in whole exome/genome sequencing association studies: it can handle a wide range of association models, providing researchers with an optimal aggregation analysis for the genetic regions of interest.
Affiliation(s)
- Simon Boutry
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Raphaël Helaers
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- Tom Lenaerts
- Interuniversity Institute of Bioinformatics in Brussels, Université Libre de Bruxelles-Vrije Universiteit Brussels, Brussels, Belgium
- Machine Learning Group, Université Libre de Bruxelles, Brussels, Belgium
- Artificial Intelligence laboratory, Vrije Universiteit Brussel, Brussels, Belgium
- Miikka Vikkula
- Human Molecular Genetics, de Duve Institute, University of Louvain, Brussels, Belgium
- WELBIO department, WEL Research Institute, Wavre, Belgium
8
Aborageh M, Krawitz P, Fröhlich H. Genetics in Parkinson's disease: From better disease understanding to machine learning based precision medicine. Frontiers in Molecular Medicine 2022; 2:933383. [PMID: 39086979 PMCID: PMC11285583 DOI: 10.3389/fmmed.2022.933383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 08/30/2022] [Indexed: 08/02/2024]
Abstract
Parkinson's Disease (PD) is a neurodegenerative disorder with highly heterogeneous phenotypes. Accordingly, it has been challenging to robustly identify genetic factors associated with disease risk, prognosis and therapy response via genome-wide association studies (GWAS). In this review we first provide an overview of existing statistical methods to detect associations between genetic variants and the disease phenotypes in existing PD GWAS. Secondly, we discuss the potential of machine learning approaches to better quantify disease phenotypes and to move beyond disease understanding towards a better-personalized treatment of the disease.
Affiliation(s)
- Mohamed Aborageh
- Bonn-Aachen International Center for Information Technology (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Peter Krawitz
- Institute for Genomic Statistics and Bioinformatics, University Hospital Bonn, Bonn, Germany
- Holger Fröhlich
- Bonn-Aachen International Center for Information Technology (B-IT), Rheinische Friedrich-Wilhelms-Universität Bonn, Bonn, Germany
- Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin, Germany
9
Charon C, Allodji R, Meyer V, Deleuze JF. Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023] Open
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence and retain enough SNVs, we propose here a two-step filtering procedure that allows less stringent filtering both prior to imputation and post-imputation, in order to increase the number of very rare and rare variants compared to conservative filtration methods.
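The two-step idea can be sketched as two lenient filters applied in sequence. This is a hypothetical illustration, not the authors' pipeline: the `Variant` record, the call-rate metric, and the 0.90 call-rate threshold are assumptions; only the 0.3 quality score comes from the range quoted in the abstract. Imputation itself is not performed here, and the quality scores are supplied as if an imputation tool had already produced them.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    variant_id: str
    call_rate: float           # fraction of samples genotyped (assumed pre-imputation QC metric)
    imputation_quality: float  # quality score reported after imputation


def pre_filter(variants, min_call_rate=0.90):
    """Step 1: lenient QC before imputation (threshold is an assumed value)."""
    return [v for v in variants if v.call_rate >= min_call_rate]


def post_filter(variants, min_quality=0.3):
    """Step 2: lenient quality-score cut-off after imputation."""
    return [v for v in variants if v.imputation_quality >= min_quality]


# Toy variants (hypothetical values).
variants = [
    Variant("rs_a", call_rate=0.99, imputation_quality=0.85),
    Variant("rs_b", call_rate=0.92, imputation_quality=0.35),
    Variant("rs_c", call_rate=0.80, imputation_quality=0.90),
    Variant("rs_d", call_rate=0.97, imputation_quality=0.10),
]

kept = post_filter(pre_filter(variants))
kept_ids = [v.variant_id for v in kept]
print(kept_ids)
```

Raising `min_quality` to 0.8 in this toy set would drop `rs_b` as well, which mirrors the trade-off the abstract describes between confidence and the number of retained variants.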
Affiliation(s)
- Céline Charon
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Rodrigue Allodji
- Radiation Epidemiology Group CESP, Inserm Unit 1018, Gustave Roussy Université Paris Saclay, 114 rue Edouard Vaillant, Villejuif, 94805, France
- Vincent Meyer
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
- Jean-François Deleuze
- CEA Paris-Saclay, Institut François Jacob, Centre National de Recherche en Génomique Humaine, 2 rue Gaston Crémieux, Evry, 91057, France
10
Dong G, Wendl MC, Zhang B, Ding L, Huang KL. AeQTL: eQTL analysis using region-based aggregation of rare genomic variants. Pacific Symposium on Biocomputing 2021; 26:172-183. [PMID: 33691015 PMCID: PMC8050802] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Concurrently available genomic and transcriptomic data from large cohorts provide opportunities to discover expression quantitative trait loci (eQTLs), genetic variants associated with gene expression changes. However, the statistical power for detecting rare-variant eQTLs is often limited, and most existing eQTL tools are not compatible with sequence variant file formats. We have developed AeQTL (Aggregated eQTL), a software tool that performs eQTL analysis on variants aggregated according to user-specified regions and is designed to accommodate standard genomic files. AeQTL consistently yielded similar or higher power for identifying rare-variant eQTLs than single-variant tests. Using AeQTL, we discovered that aggregated rare germline truncations in cis exomic regions are significantly associated with the expression of BRCA1 and SLC25A39 in breast tumors. In a somatic mutation pan-cancer analysis, aggregated mutations predicted to be missense versus truncations were differentially associated with expression of cancer driver genes, and somatic truncation eQTLs were further identified as a new multi-omic classifier of oncogenes versus tumor-suppressor genes. AeQTL is easy to use and customize, allowing broad application for discovering rare variants, including coding and noncoding variants, associated with gene expression. AeQTL is implemented in Python and the source code is freely available at https://github.com/Huang-lab/AeQTL under the MIT license.
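The region-based aggregation described above can be illustrated with a small sketch. It is not AeQTL's implementation or API: the function names (`aggregate_region`, `ols_fit`) and the toy data are hypothetical, and a plain least-squares fit of expression on a 0/1 region indicator is an assumed, simplified stand-in for the tool's regression model.

```python
def aggregate_region(genotypes):
    """Collapse per-variant allele counts into one 0/1 indicator per sample for a region."""
    return [1 if any(count > 0 for count in row) else 0 for row in genotypes]


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares; return (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("no variation in the aggregated indicator; the region cannot be tested")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data (hypothetical): four samples, two rare variants in one cis region.
genotypes = [[0, 0], [1, 0], [0, 0], [0, 2]]
expression = [5.0, 3.0, 5.0, 3.0]

indicator = aggregate_region(genotypes)
slope, intercept = ols_fit(indicator, expression)
print(indicator, slope, intercept)
```

Here the slope estimates the expression difference between samples that carry any variant in the region and those that carry none; a real analysis would also adjust for covariates and report a test statistic.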
Affiliation(s)
- Guanlan Dong
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA
- Michael C. Wendl
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Bin Zhang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
- Li Ding
- Department of Medicine, McDonnell Genome Institute, Washington University in St. Louis, St. Louis, MO 63108, USA
- Kuan-lin Huang
- Department of Genetics and Genomic Sciences, Center for Transformative Disease Modeling, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA (corresponding author)
11
Swietlik EM, Prapa M, Martin JM, Pandya D, Auckland K, Morrell NW, Gräf S. 'There and Back Again'-Forward Genetics and Reverse Phenotyping in Pulmonary Arterial Hypertension. Genes (Basel) 2020; 11:E1408. [PMID: 33256119 PMCID: PMC7760524 DOI: 10.3390/genes11121408] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 10/26/2020] [Revised: 11/17/2020] [Accepted: 11/23/2020] [Indexed: 02/07/2023] Open
Abstract
Although the invention of right heart catheterisation in the 1950s enabled accurate clinical diagnosis of pulmonary arterial hypertension (PAH), it was not until 2000 that the landmark discovery of the causative role of bone morphogenetic protein receptor type II (BMPR2) mutations shed new light on the pathogenesis of PAH. Since then, several genes have been discovered, which now account for around 25% of cases with the clinical diagnosis of idiopathic PAH. Despite the ongoing efforts, in the majority of patients the cause of the disease remains elusive, a phenomenon often referred to as "missing heritability". In this review, we discuss research approaches to uncover the genetic architecture of PAH, starting with forward phenotyping, which in a research setting should focus on stable intermediate phenotypes, then forward and reverse genetics, and finally reverse phenotyping. We then discuss potential sources of "missing heritability" and how functional genomics and multi-omics methods are employed to tackle this problem.
Affiliation(s)
- Emilia M. Swietlik
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Royal Papworth Hospital NHS Foundation Trust, Cambridge CB2 0AY, UK
- Addenbrooke’s Hospital NHS Foundation Trust, Cambridge CB2 0QQ, UK
- Matina Prapa
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Addenbrooke’s Hospital NHS Foundation Trust, Cambridge CB2 0QQ, UK
- Jennifer M. Martin
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Divya Pandya
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Kathryn Auckland
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Nicholas W. Morrell
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Royal Papworth Hospital NHS Foundation Trust, Cambridge CB2 0AY, UK
- Addenbrooke’s Hospital NHS Foundation Trust, Cambridge CB2 0QQ, UK
- NIHR BioResource for Translational Research, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Stefan Gräf
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- NIHR BioResource for Translational Research, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge CB2 0PT, UK
12
Swietlik EM, Gräf S, Morrell NW. The role of genomics and genetics in pulmonary arterial hypertension. Glob Cardiol Sci Pract 2020; 2020:e202013. [PMID: 33150157 PMCID: PMC7590931 DOI: 10.21542/gcsp.2020.13] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 12/20/2022] Open
Affiliation(s)
- Emilia M Swietlik
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Addenbrooke's Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Royal Papworth Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Stefan Gräf
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Department of Haematology, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
- NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Nicholas W Morrell
- Department of Medicine, University of Cambridge, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Addenbrooke's Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- Royal Papworth Hospital NHS Foundation Trust, Cambridge Biomedical Campus, Cambridge, United Kingdom
- NIHR BioResource for Translational Research, Cambridge Biomedical Campus, Cambridge, United Kingdom
| |
|
13
|
Povysil G, Petrovski S, Hostyk J, Aggarwal V, Allen AS, Goldstein DB. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nat Rev Genet 2019; 20:747-759. [PMID: 31605095 DOI: 10.1038/s41576-019-0177-4] [Citation(s) in RCA: 117] [Impact Index Per Article: 19.5] [Accepted: 09/06/2019] [Indexed: 12/11/2022]
Abstract
The first phase of genome-wide association studies (GWAS) assessed the role of common variation in human disease. Advances optimizing and economizing high-throughput sequencing have enabled a second phase of association studies that assess the contribution of rare variation to complex disease in all protein-coding genes. Unlike the early microarray-based studies, sequencing-based studies catalogue the full range of genetic variation, including the evolutionarily youngest forms. Although the experience with common variants helped establish relevant standards for genome-wide studies, the analysis of rare variation introduces several challenges that require novel analysis approaches.
Affiliation(s)
- Gundula Povysil
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
| | - Slavé Petrovski
- Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK; Department of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia
| | - Joseph Hostyk
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
| | - Vimla Aggarwal
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA
| | - Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, NC, USA
| | - David B Goldstein
- Institute for Genomic Medicine, Columbia University Irving Medical Center, Columbia University, New York, NY, USA.
| |
|
14
|
Yang T, Chen H, Tang H, Li D, Wei P. A powerful and data-adaptive test for rare-variant-based gene-environment interaction analysis. Stat Med 2019; 38:1230-1244. [PMID: 30460711 PMCID: PMC6399020 DOI: 10.1002/sim.8037] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 02/26/2018] [Revised: 10/17/2018] [Accepted: 10/22/2018] [Indexed: 12/20/2022]
Abstract
As whole-exome/genome sequencing data become increasingly available in genetic epidemiology research consortia, there is emerging interest in testing the interactions between rare genetic variants and environmental exposures that modify the risk of complex diseases. However, testing rare-variant-based gene-by-environment interactions (GxE) is more challenging than testing the genetic main effects due to the difficulty in correctly estimating the latter under the null hypothesis of no GxE effects and the presence of neutral variants. In response, we have developed a family of powerful and data-adaptive GxE tests, called "aGE" tests, in the framework of the adaptive powered score test, originally proposed for testing the genetic main effects. Using extensive simulations, we show that aGE tests can control the type I error rate in the presence of a large number of neutral variants or a nonlinear environmental main effect, and the power is more resilient to the inclusion of neutral variants than that of existing methods. We demonstrate the performance of the proposed aGE tests using Pancreatic Cancer Case-Control Consortium Exome Chip data. An R package "aGE" is available at http://github.com/ytzhong/projects/.
Affiliation(s)
- Tianzhong Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, TX 77030, USA
| | - Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, TX 77030, USA
- Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
| | - Hongwei Tang
- Departments of Gastrointestinal Medical Oncology and Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Donghui Li
- Departments of Gastrointestinal Medical Oncology and Epidemiology, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| | - Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX 77030, USA
| |
|
15
|
Guinot F, Szafranski M, Ambroise C, Samson F. Learning the optimal scale for GWAS through hierarchical SNP aggregation. BMC Bioinformatics 2018; 19:459. [PMID: 30497371 PMCID: PMC6267789 DOI: 10.1186/s12859-018-2475-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 09/29/2017] [Accepted: 11/09/2018] [Indexed: 11/16/2022] Open
Abstract
Background: Genome-Wide Association Studies (GWAS) seek to identify causal genomic variants associated with rare human diseases. The classical statistical approach for detecting these variants is based on univariate hypothesis testing, with healthy individuals being tested against affected individuals at each locus. Given that an individual’s genotype is characterized by up to one million SNPs, this approach lacks precision, since it may yield a large number of false positives that can lead to erroneous conclusions about genetic associations with the disease. One way to improve the detection of true genetic associations is to reduce the number of hypotheses to be tested by grouping SNPs.
Results: We propose a dimension-reduction approach which can be applied in the context of GWAS by making use of the haplotype structure of the human genome. We compare our method with standard univariate and group-based approaches on both synthetic and real GWAS data.
Conclusion: We show that reducing the dimension of the predictor matrix by aggregating SNPs gives a greater precision in the detection of associations between the phenotype and genomic regions.
Affiliation(s)
- Florent Guinot
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France; BIOptimize, Reims, 51000, France
| | - Marie Szafranski
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France
| | - Christophe Ambroise
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France; UMR MIA-Paris - AgroParisTech, INRA, Université Paris-Saclay, Paris, 75005, France
| | - Franck Samson
- UMR 8071 LaMME - UEVE, CNRS, ENSIIE, USC INRA, 23 bd de France, Evry, 91000, France
| |
|
16
|
Abstract
While genome-wide association studies have been very successful in identifying associations of common genetic variants with many different traits, the rarer frequency spectrum of the genome has not yet been comprehensively explored. Technological developments increasingly lift restrictions to access rare genetic variation. Dense reference panels enable improved genotype imputation for rarer variants in studies using DNA microarrays. Moreover, the decreasing cost of next generation sequencing makes whole exome and genome sequencing increasingly affordable for large samples. Large-scale efforts based on sequencing, such as ExAC, 100,000 Genomes, and TopMed, are likely to significantly advance this field. The main challenge in evaluating complex trait associations of rare variants is statistical power. The choice of population should be considered carefully because allele frequencies and linkage disequilibrium structure differ between populations. Genetically isolated populations can have favorable genomic characteristics for the study of rare variants. One strategy to increase power is to assess the combined effect of multiple rare variants within a region, known as aggregate testing. A range of methods have been developed for this. Model performance depends on the genetic architecture of the region of interest.
Affiliation(s)
- Karoline Kuchenbaecker
- Wellcome Trust Sanger Institute, Cambridge, UK; University College London, London, UK
| | - Emil Vincent Rosenbaum Appel
- Novo Nordisk Foundation Center for Basic Metabolic Research, Section for Metabolic Genetics, Faculty of Health Sciences, University of Copenhagen, Copenhagen, Denmark
| |
|
17
|
Luo Y, Maity A, Wu MC, Smith C, Duan Q, Li Y, Tzeng JY. On the substructure controls in rare variant analysis: Principal components or variance components? Genet Epidemiol 2017; 42:276-287. [PMID: 29280188 DOI: 10.1002/gepi.22102] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/10/2017] [Revised: 10/07/2017] [Accepted: 10/19/2017] [Indexed: 11/09/2022]
Abstract
Recent studies showed that population substructure (PS) can have a more complex impact on rare variant tests and that similarity-based collapsing tests (e.g., SKAT) may suffer more severely from PS than burden-based tests. In this work, we evaluate the performance of SKAT coupled with principal components (PC) or variance components (VC) based PS correction methods. We consider confounding effects caused by PS including stratified populations, admixed populations, and spatially distributed nongenetic risk; we investigate which types of variants (e.g., common, less frequent, rare, or all variants) should be used to effectively control for confounding effects. We found that (i) PC-based methods can account for confounding effects in most scenarios except for admixture, although the number of sufficient PCs depends on the PS complexity and the type of variants used. (ii) PCs based on all variants (i.e., common + less frequent + rare) tend to require equal or fewer sufficient PCs and often achieve higher power than PCs based on other variant types. (iii) VC-based methods can effectively adjust for confounding in all scenarios (even for admixture), though the type of variants that should be used to construct VC may vary. (iv) VC based on all variants works consistently in all scenarios, though its power may sometimes be lower than VC based on other variant types. Given that the best-performing method and which variants to use depend on the underlying unknown confounding mechanisms, a robust strategy is to perform SKAT analyses using VC-based methods based on all variants.
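To make the PC-based correction mentioned in this abstract concrete, here is a minimal Python sketch. It is a generic illustration and not the authors' pipeline: the function names `top_principal_components` and `residualize`, the simulated genotypes, and the choice of two PCs are all assumptions made for the example. It computes PC scores from a centered genotype matrix and regresses them out of a quantitative phenotype, so that a downstream test would operate on the residuals. The VC-based alternative is not shown.

```python
import numpy as np

def top_principal_components(G, k):
    """Return scores of the first k principal components of a genotype matrix.

    G is an (n_samples x n_variants) array of minor-allele counts.
    """
    G = np.asarray(G, dtype=float)
    centered = G - G.mean(axis=0)            # center each variant column
    U, s, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :k] * s[:k]                  # n_samples x k matrix of PC scores

def residualize(y, covariates):
    """Regress y on an intercept plus covariates and return the residuals."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Hypothetical toy data: 20 samples, 6 variants, simulated purely for illustration.
rng = np.random.default_rng(0)
G = rng.binomial(2, 0.2, size=(20, 6))
y = rng.normal(size=20)

pcs = top_principal_components(G, k=2)       # k=2 is an arbitrary choice
resid = residualize(y, pcs)                  # phenotype adjusted for the PCs
```

By construction the residuals are uncorrelated with the PCs that were regressed out; how many PCs are sufficient is exactly the question the abstract says depends on the substructure and the variants used.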
Affiliation(s)
- Yiwen Luo
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Arnab Maity
- Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Michael C Wu
- Fred Hutchinson Cancer Research Center, Seattle, Washington, United States of America
| | - Chris Smith
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America
| | - Qing Duan
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Yun Li
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America; Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Jung-Ying Tzeng
- Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, North Carolina State University, Raleigh, North Carolina, United States of America; Department of Statistics, National Cheng-Kung University, Tainan, Taiwan; Institute of Epidemiology and Preventive Medicine, National Taiwan University, Taipei, Taiwan
| |
|
18
|
Goldfeder RL, Wall DP, Khoury MJ, Ioannidis JPA, Ashley EA. Human Genome Sequencing at the Population Scale: A Primer on High-Throughput DNA Sequencing and Analysis. Am J Epidemiol 2017; 186:1000-1009. [PMID: 29040395 DOI: 10.1093/aje/kww224] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Received: 06/17/2016] [Accepted: 12/02/2016] [Indexed: 12/30/2022] Open
Abstract
Most human diseases have underlying genetic causes. To better understand the impact of genes on disease and its implications for medicine and public health, researchers have pursued methods for determining the sequences of individual genes, then all genes, and now complete human genomes. Massively parallel high-throughput sequencing technology, where DNA is sheared into smaller pieces, sequenced, and then computationally reordered and analyzed, enables fast and affordable sequencing of full human genomes. As the price of sequencing continues to decline, more and more individuals are having their genomes sequenced. This may facilitate better population-level disease subtyping and characterization, as well as individual-level diagnosis and personalized treatment and prevention plans. In this review, we describe several massively parallel high-throughput DNA sequencing technologies and their associated strengths, limitations, and error modes, with a focus on applications in epidemiologic research and precision medicine. We detail the methods used to computationally process and interpret sequence data to inform medical or preventative action.
|
19
|
Chien LC, Bowden DW, Chiu YF. Region-based association tests for sequencing data on survival traits. Genet Epidemiol 2017; 41:511-522. [DOI: 10.1002/gepi.22054] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/09/2016] [Revised: 03/27/2017] [Accepted: 03/27/2017] [Indexed: 11/07/2022]
Affiliation(s)
- Li-Chu Chien
- Center for Fundamental Science; Kaohsiung Medical University; Kaohsiung Taiwan
| | - Donald W. Bowden
- Center for Diabetes Research, Wake Forest School of Medicine; Winston-Salem North Carolina United States of America
- Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine; Winston-Salem North Carolina United States of America
- Department of Biochemistry; Wake Forest School of Medicine; Winston-Salem North Carolina United States of America
| | - Yen-Feng Chiu
- Institute of Population Health Sciences; National Health Research Institutes; Miaoli Taiwan
| |
|
20
|
Abstract
Despite thousands of genetic loci identified to date, a large proportion of genetic variation predisposing to complex disease and traits remains unaccounted for. Advances in sequencing technology enable focused explorations on the contribution of low-frequency and rare variants to human traits. Here we review experimental approaches and current knowledge on the contribution of these genetic variants in complex disease and discuss challenges and opportunities for personalised medicine.
Affiliation(s)
- Lorenzo Bomba
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
| | - Klaudia Walter
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK
| | - Nicole Soranzo
- Human Genetics, Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1HH, UK; Department of Haematology, University of Cambridge, Hills Rd, Cambridge, CB2 0AH, UK; The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics at the University of Cambridge, University of Cambridge, Strangeways Research Laboratory, Wort's Causeway, Cambridge, CB1 8RN, UK
| |
|
21
|
Xiao Y, Liu H, Wu L, Warburton M, Yan J. Genome-wide Association Studies in Maize: Praise and Stargaze. MOLECULAR PLANT 2017; 10:359-374. [PMID: 28039028 DOI: 10.1016/j.molp.2016.12.008] [Citation(s) in RCA: 225] [Impact Index Per Article: 28.1] [Received: 09/10/2016] [Revised: 12/02/2016] [Accepted: 12/20/2016] [Indexed: 05/18/2023]
Abstract
Genome-wide association study (GWAS) has become a widely accepted strategy for decoding genotype-phenotype associations in many species thanks to advances in next-generation sequencing (NGS) technologies. Maize is an ideal crop for GWAS and significant progress has been made in the last decade. This review summarizes current GWAS efforts in maize functional genomics research and discusses future prospects in the omics era. The general goal of GWAS is to link genotypic variations to corresponding differences in phenotype using the most appropriate statistical model in a given population. The current review also presents perspectives for optimizing GWAS design and analysis. GWAS analysis of data from RNA, protein, and metabolite-based omics studies is discussed, along with new models and new population designs that will identify causes of phenotypic variation that have been hidden to date. The joint and continuous efforts of the whole community will enhance our understanding of maize quantitative traits and boost crop molecular breeding designs.
Affiliation(s)
- Yingjie Xiao
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
| | - Haijun Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China
| | - Liuji Wu
- Synergetic Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450002, China
| | - Marilyn Warburton
- United States Department of Agriculture, Agricultural Research Service, Corn Host Plant Resistance Research Unit, Box 9555, MS 39762, Mississippi, USA
| | - Jianbing Yan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan 430070, China.
| |
|
22
|
Rytova AI, Khlebus EY, Shevtsov AE, Kutsenko VA, Shcherbakova NV, Zharikova AA, Ershova AI, Kiseleva AV, Boytsov SA, Yarovaya EB, Meshkov AN. Modern probabilistic and statistical approaches to search for nucleotide sequence options associated with integrated diseases. RUSS J GENET+ 2017. [DOI: 10.1134/s1022795417100088] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
|
23
|
Petersen BS, Fredrich B, Hoeppner MP, Ellinghaus D, Franke A. Opportunities and challenges of whole-genome and -exome sequencing. BMC Genet 2017; 18:14. [PMID: 28193154 PMCID: PMC5307692 DOI: 10.1186/s12863-017-0479-5] [Citation(s) in RCA: 137] [Impact Index Per Article: 17.1] [Received: 09/27/2016] [Accepted: 01/26/2017] [Indexed: 01/08/2023] Open
Abstract
Recent advances in the development of sequencing technologies provide researchers with unprecedented possibilities for genetic analyses. In this review, we will discuss the history of genetic studies and the progress driven by next-generation sequencing (NGS), using complex inflammatory bowel diseases as an example. We focus on the opportunities, but also challenges that researchers are facing when working with NGS data to unravel the genetic causes underlying diseases.
Affiliation(s)
| | - Broder Fredrich
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany
| | - Marc P Hoeppner
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany
| | - David Ellinghaus
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany
| | - Andre Franke
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany.
| |
|
24
|
Moore CCB, Basile AO, Wallace JR, Frase AT, Ritchie MD. A biologically informed method for detecting rare variant associations. BioData Min 2016; 9:27. [PMID: 27582876 PMCID: PMC5006419 DOI: 10.1186/s13040-016-0107-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/04/2016] [Accepted: 06/18/2016] [Indexed: 11/29/2022] Open
Abstract
Background: BioBin is a bioinformatics software package developed to automate the process of binning rare variants into groups for statistical association analysis using a biological knowledge-driven framework. BioBin collapses variants into biological features such as genes, pathways, evolutionarily conserved regions (ECRs), protein families, regulatory regions, and others based on user-designated parameters. BioBin provides the infrastructure to create complex and interesting hypotheses in an automated fashion, thereby circumventing the necessity for advanced and time-consuming scripting.
Purpose of the study: In this manuscript, we describe the software package for BioBin, along with type I error and power simulations to demonstrate the strengths and various customizable features and analysis options of this variant binning tool.
Results: Simulation testing highlights the utility of BioBin as a fast, comprehensive and expandable tool for the biologically-inspired binning and analysis of low-frequency variants in sequence data.
Conclusions and potential implications: The BioBin software package has the capability to transform and streamline the analysis pipelines for researchers analyzing rare variants. This automated bioinformatics tool minimizes the manual effort of creating genomic regions for binning such that time can be spent on the much more interesting task of statistical analyses. This software package is open source and freely available from http://ritchielab.com/software/biobin-download.
Affiliation(s)
| | - Anna Okula Basile
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA
| | - John Robert Wallace
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
| | - Alex Thomas Frase
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
| | - Marylyn DeRiggi Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
| |
|
25
|
Discovery of rare variants for complex phenotypes. Hum Genet 2016; 135:625-34. [PMID: 27221085 DOI: 10.1007/s00439-016-1679-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/02/2016] [Accepted: 04/28/2016] [Indexed: 12/27/2022]
Abstract
With the rise of sequencing technologies, it is now feasible to assess the role rare variants play in the genetic contribution to complex trait variation. While some of the earlier targeted sequencing studies successfully identified rare variants of large effect, unbiased gene discovery using exome sequencing has experienced limited success for complex traits. Nevertheless, rare variant association studies have demonstrated that rare variants do contribute to phenotypic variability, but sample sizes will likely have to be even larger than those of common variant association studies to be powered for the detection of genes and loci. Large-scale sequencing efforts of tens of thousands of individuals, such as the UK10K Project and aggregation efforts such as the Exome Aggregation Consortium, have made great strides in advancing our knowledge of the landscape of rare variation, but there remain many considerations when studying rare variation in the context of complex traits. We discuss these considerations in this review, presenting a broad range of topics at a high level as an introduction to rare variant analysis in complex traits including the issues of power, study design, sample ascertainment, de novo variation, and statistical testing approaches. Ultimately, as sequencing costs continue to decline, larger sequencing studies will yield clearer insights into the biological consequence of rare mutations and may reveal which genes play a role in the etiology of complex traits.
|
26
|
Abstract
Over the past few years, interest in the identification of rare variants that influence human phenotype has led to the development of many statistical methods for testing for association between sets of rare variants and binary or quantitative traits. Here, I review some of the most important ideas that underlie these methods and the most relevant issues when choosing a method for analysis. In addition to the tests for association, I review crucial issues in performing a rare variant study, from experimental design to interpretation and validation. I also discuss the many challenges of these studies, some of their limitations, and future research directions.
Affiliation(s)
- Dan L Nicolae
- Departments of Medicine and Statistics, University of Chicago, Chicago, Illinois 60637
| |
|
27
|
Abstract
Empirical studies and evolutionary theory support a role for rare variants in the etiology of complex traits. Given this motivation and increasing affordability of whole-exome and whole-genome sequencing, methods for rare variant association have been an active area of research for the past decade. Here, we provide a survey of the current literature and developments from the Genetics Analysis Workshop 19 (GAW19) Collapsing Rare Variants working group. In particular, we present the generalized linear regression framework and associated score statistic for the 2 major types of methods: burden and variance components methods. We further show that by simply modifying weights within these frameworks we arrive at many of the popular existing methods, for example, the cohort allelic sums test and sequence kernel association test. Meta-analysis techniques are also described. Next, we describe the 6 contributions from the GAW19 Collapsing Rare Variants working group. These included development of new methods, such as a retrospective likelihood for family data, a method using genomic structure to compare cases and controls, a haplotype-based meta-analysis, and a permutation-based method for combining different statistical tests. In addition, one contribution compared a mega-analysis of family-based and population-based data to meta-analysis. Finally, the power of existing family-based methods for binary traits was compared. We conclude with suggestions for open research questions.
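The two families of statistics named in this abstract can be sketched in a few lines of Python. This is a hedged illustration only: it assumes an intercept-only null model, uses equal weights chosen for the example, and returns raw statistics rather than p-values (the reference distributions are not implemented). The function names and the toy data are hypothetical and do not come from the cited work.

```python
import numpy as np

def score_vector(G, y):
    """Per-variant score U_j = sum_i G_ij * (y_i - mean(y)).

    Assumes a null model containing only an intercept (no covariates).
    """
    G = np.asarray(G, dtype=float)
    y = np.asarray(y, dtype=float)
    return G.T @ (y - y.mean())

def burden_statistic(G, y, w):
    """Burden-type statistic: square of the weighted SUM of variant scores."""
    U = score_vector(G, y)
    return float(np.dot(w, U) ** 2)

def variance_component_statistic(G, y, w):
    """Variance-component-type statistic: weighted sum of SQUARED variant scores."""
    U = score_vector(G, y)
    return float(np.sum((np.asarray(w, dtype=float) * U) ** 2))

# Hypothetical toy data: 6 individuals, 3 rare variants, binary trait.
G = np.array([[1, 0, 0],
              [0, 1, 0],
              [0, 0, 0],
              [1, 0, 1],
              [0, 0, 0],
              [0, 0, 0]])
y = np.array([1, 0, 0, 1, 0, 0])
w = np.ones(3)  # equal weights, purely for illustration

q_burden = burden_statistic(G, y, w)
q_vc = variance_component_statistic(G, y, w)
```

In this toy example the second variant has a negative score, which partly cancels the others inside the burden sum but still contributes positively to the variance-component statistic. Changing `w` is the only thing needed to move between weighting schemes within each family.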
Affiliation(s)
- Stephanie A Santorico
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
| | - Audrey E Hendricks
- Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA.
| |
|
28
|
Chien LC, Hsu FC, Bowden DW, Chiu YF. Generalization of Rare Variant Association Tests for Longitudinal Family Studies. Genet Epidemiol 2016; 40:101-12. [PMID: 26783077 DOI: 10.1002/gepi.21951] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 03/12/2015] [Revised: 11/19/2015] [Accepted: 11/19/2015] [Indexed: 11/06/2022]
Abstract
Given the functional relevance of many rare variants, their identification is frequently critical for dissecting disease etiology. Functional variants are likely to be aggregated in family studies enriched with affected members, and this aggregation increases the statistical power to detect rare variants associated with a trait of interest. Longitudinal family studies provide additional information for identifying genetic and environmental factors associated with disease over time. However, methods to analyze rare variants in longitudinal family data remain fairly limited. These methods should be capable of accounting for different sources of correlations and handling large amounts of sequencing data efficiently. To identify rare variants associated with a phenotype in longitudinal family studies, we extended pedigree-based burden (BT) and kernel (KS) association tests to genetic longitudinal studies. Generalized estimating equation (GEE) approaches were used to generalize the pedigree-based BT and KS to multiple correlated phenotypes under the generalized linear model framework, adjusting for fixed effects of confounding factors. These tests accounted for complex correlations between repeated measures of the same phenotype (serial correlations) and between individuals in the same family (familial correlations). We conducted comprehensive simulation studies to compare the proposed tests with mixed-effects models and marginal models, using GEEs under various configurations. When the proposed tests were applied to data from the Diabetes Heart Study, we found exome variants of POMGNT1 and JAK1 genes were associated with type 2 diabetes.
Affiliation(s)
- Li-Chu Chien
- Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
| | - Fang-Chi Hsu
- Department of Biostatistical Sciences, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
| | - Donald W Bowden
- Center for Diabetes Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America; Center for Genomics and Personalized Medicine Research, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America; Department of Biochemistry, Wake Forest School of Medicine, Winston-Salem, North Carolina, United States of America
| | - Yen-Feng Chiu
- Institute of Population Health Sciences, National Health Research Institutes, Miaoli, Taiwan
| |
|
29
|
Basile AO, Wallace JR, Peissig P, McCarty CA, Brilliant M, Ritchie MD. Knowledge driven binning and PheWAS analysis in Marshfield Personalized Medicine Research Project using BioBin. Pac Symp Biocomput 2016; 21:249-260. [PMID: 26776191 PMCID: PMC4824557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Next-generation sequencing technology has presented an opportunity for rare variant discovery and association of these variants with disease. To address the challenges of rare variant analysis, multiple statistical methods have been developed for combining rare variants to increase statistical power for detecting associations. BioBin is an automated tool that expands on collapsing/binning methods by performing multi-level variant aggregation with a flexible, biologically informed binning strategy using an internal biorepository, the Library of Knowledge (LOKI). The databases within LOKI provide variant details, regional annotations and pathway interactions, which can be used to generate bins of biologically related variants, thereby increasing the power of any subsequent statistical test. In this study, we expand the framework of BioBin to incorporate statistical tests, including a dispersion-based test, SKAT, thereby providing the option of performing a unified collapsing and statistical rare variant analysis in one tool. Extensive simulation studies performed on gene-coding regions showed a Bin-KAT analysis to have greater power than BioBin-regression in all simulated conditions, including variants influencing the phenotype in the same direction, a scenario where burden tests often retain greater power. The use of Madsen-Browning variant weighting increased power in the burden analysis to a level comparable with that of Bin-KAT, but overall Bin-KAT retained equivalent or higher power under all conditions. Bin-KAT was applied to a study of 82 pharmacogenes sequenced in the Marshfield Personalized Medicine Research Project (PMRP). We looked for association of these genes with 9 different phenotypes extracted from the electronic health record. This study demonstrates that Bin-KAT is a powerful tool for the identification of genes harboring low frequency variants for complex phenotypes.
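The Madsen-Browning weighting mentioned in this abstract can be illustrated with a minimal sketch. It is not BioBin's implementation: the function names are hypothetical, and the smoothed allele frequency is computed here from whatever individuals are passed in, whereas the original scheme estimates it from unaffected individuals only (a caller could pass controls alone to mimic that).

```python
import math

def madsen_browning_weights(genotypes):
    """Return one weight per variant; rarer variants receive larger weights.

    genotypes: list of per-individual lists of minor-allele counts (0, 1 or 2).
    Assumption: frequencies are estimated from all rows supplied by the caller.
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    weights = []
    for j in range(n_variants):
        minor = sum(row[j] for row in genotypes)
        q = (minor + 1) / (2 * n + 2)  # smoothed frequency, never exactly 0 or 1
        weights.append(1.0 / math.sqrt(n * q * (1 - q)))
    return weights

def burden_scores(genotypes, weights):
    """Collapse each individual's variants in a bin into one weighted score."""
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]
```

The per-individual scores would then be carried into a regression or rank-based test; that step is omitted here.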
Affiliation(s)
- Anna O Basile
- Department of Biochemistry, Microbiology and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
|
30
|
Coombes B, Basu S, Guha S, Schork N. Weighted Score Tests Implementing Model-Averaging Schemes in Detection of Rare Variants in Case-Control Studies. PLoS One 2015; 10:e0139355. [PMID: 26436424 PMCID: PMC4593572 DOI: 10.1371/journal.pone.0139355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/11/2015] [Accepted: 09/11/2015] [Indexed: 12/04/2022] Open
Abstract
Multi-locus effect modeling is a powerful approach for detection of genes influencing a complex disease. Especially for rare variants, we need to analyze multiple variants together to achieve adequate power for detection. In this paper, we propose several parsimonious branching model techniques to assess the joint effect of a group of rare variants in a case-control study. These models implement a data reduction strategy within a likelihood framework and use a weighted score test to assess the statistical significance of the effect of the group of variants on the disease. The primary advantage of the proposed approach is that it performs model-averaging over a substantially smaller set of models supported by the data and thus gains power to detect multi-locus effects. We illustrate these proposed approaches on simulated and real data and study their performance compared to several existing rare variant detection approaches. The primary goal of this paper is to assess whether there is any gain in power to detect association by averaging over a number of models instead of selecting the best model. Extensive simulations and a real data application demonstrate the advantage of the proposed approach in the presence of causal variants with opposite directional effects along with a moderate number of null variants in linkage disequilibrium.
Affiliation(s)
- Brandon Coombes
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States of America
| | - Saonli Basu
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States of America
| | - Sharmistha Guha
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, United States of America
| | - Nicholas Schork
- J. Craig Venter Institute, La Jolla, CA, United States of America
|
31
|
Schmidt EM, Willer CJ. Insights into blood lipids from rare variant discovery. Curr Opin Genet Dev 2015; 33:25-31. [PMID: 26241468 DOI: 10.1016/j.gde.2015.06.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 01/19/2015] [Revised: 06/19/2015] [Accepted: 06/22/2015] [Indexed: 12/18/2022]
Abstract
Large-scale genome-wide screens have discovered over 160 common variants associated with plasma lipids, which are risk factors often linked to heart disease. A large fraction of lipid heritability remains unexplained, and it is hypothesized that rare variants of functional consequence may account for some of the missing heritability. Finding lipid-associated variants that occur less frequently in the human population poses a challenge, primarily due to lack of power and the difficulty of identifying and testing them. Interrogation of the protein-coding regions of the genome using array and sequencing techniques has led to important discoveries of rare variants that affect lipid levels and related disease risk. Here, we summarize the latest methods and findings that contribute to our current understanding of rare variant lipid genetics.
Affiliation(s)
- Ellen M Schmidt
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Cristen J Willer
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI 48109, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109, USA.
|
32
|
Lee S, Abecasis G, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet 2014; 95:5-23. [PMID: 24995866 DOI: 10.1016/j.ajhg.2014.06.009] [Citation(s) in RCA: 689] [Impact Index Per Article: 62.6] [Received: 01/02/2014] [Indexed: 12/30/2022] Open
Abstract
Despite the extensive discovery of trait- and disease-associated common variants, much of the genetic contribution to complex traits remains unexplained. Rare variants can explain additional disease risk or trait variability. An increasing number of studies are underway to identify trait- and disease-associated rare variants. In this review, we provide an overview of statistical issues in rare-variant association studies with a focus on study designs and statistical tests. We present the design and analysis pipeline of rare-variant studies and review cost-effective sequencing designs and genotyping platforms. We compare various gene- or region-based association tests, including burden tests, variance-component tests, and combined omnibus tests, in terms of their assumptions and performance. Also discussed are the related topics of meta-analysis, population-stratification adjustment, genotype imputation, follow-up studies, and heritability due to rare variants. We provide guidelines for analysis and discuss some of the challenges inherent in these studies and future research directions.
|
33
|
Wain LV, Sayers I, Soler Artigas M, Portelli MA, Zeggini E, Obeidat M, Sin DD, Bossé Y, Nickle D, Brandsma CA, Malarstig A, Vangjeli C, Jelinsky SA, John S, Kilty I, McKeever T, Shrine NRG, Cook JP, Patel S, Spector TD, Hollox EJ, Hall IP, Tobin MD. Whole exome re-sequencing implicates CCDC38 and cilia structure and function in resistance to smoking related airflow obstruction. PLoS Genet 2014; 10:e1004314. [PMID: 24786987 PMCID: PMC4006731 DOI: 10.1371/journal.pgen.1004314] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Received: 12/04/2013] [Accepted: 03/06/2014] [Indexed: 11/19/2022] Open
Abstract
Chronic obstructive pulmonary disease (COPD) is a leading cause of global morbidity and mortality and, whilst smoking remains the single most important risk factor, COPD risk is heritable. Of 26 independent genomic regions showing association with lung function in genome-wide association studies, eleven have been reported to show association with airflow obstruction. Although the main risk factor for COPD is smoking, some individuals are observed to have a high forced expired volume in 1 second (FEV1) despite many years of heavy smoking. We hypothesised that these "resistant smokers" may harbour variants which protect against lung function decline caused by smoking and provide insight into the genetic determinants of lung health. We undertook whole exome re-sequencing of 100 heavy smokers who had healthy lung function given their age, sex, height and smoking history and applied three complementary approaches to explore the genetic architecture of smoking resistance. Firstly, we identified novel functional variants in the "resistant smokers" and looked for enrichment of these novel variants within biological pathways. Secondly, we undertook association testing of all exonic variants individually with two independent control sets. Thirdly, we undertook gene-based association testing of all exonic variants. Our strongest signal of association with smoking resistance for a non-synonymous SNP was for rs10859974 (P = 2.34 × 10⁻⁴) in CCDC38, a gene which has previously been reported to show association with FEV1/FVC, and we demonstrate moderate expression of CCDC38 in bronchial epithelial cells. We identified an enrichment of novel putatively functional variants in genes related to cilia structure and function in resistant smokers. Ciliary function abnormalities are known to be associated with both smoking and reduced mucociliary clearance in patients with COPD. We suggest that genetic influences on the development or function of cilia in the bronchial epithelium may affect growth of cilia or the extent of damage caused by tobacco smoke.
Affiliation(s)
- Louise V. Wain
- University of Leicester, Department of Health Sciences, Leicester, United Kingdom
| | - Ian Sayers
- Division of Respiratory Medicine, University of Nottingham, Queen's Medical Centre, Nottingham, United Kingdom
| | - María Soler Artigas
- University of Leicester, Department of Health Sciences, Leicester, United Kingdom
| | - Michael A. Portelli
- Division of Respiratory Medicine, University of Nottingham, Queen's Medical Centre, Nottingham, United Kingdom
| | | | - Ma'en Obeidat
- University of British Columbia Centre for Heart Lung Innovation, St. Paul's Hospital, Vancouver, Canada
| | - Don D. Sin
- University of British Columbia Centre for Heart Lung Innovation, St. Paul's Hospital, Vancouver, Canada
| | - Yohan Bossé
- Institut universitaire de cardiologie et de pneumologie de Québec, Department of Molecular Medicine, Laval University, Québec, Canada
| | - David Nickle
- Merck Research Laboratories, Boston, Massachusetts, United States of America
- Merck, Rahway, New Jersey, United States of America
| | - Corry-Anke Brandsma
- University of Groningen, University Medical Center Groningen, Department of Pathology and Medical Biology, GRIAC Research Institute, Groningen, The Netherlands
| | | | | | - Scott A. Jelinsky
- Pfizer Worldwide R&D, Cambridge, Massachusetts, United States of America
| | - Sally John
- Pfizer Worldwide R&D, Cambridge, Massachusetts, United States of America
| | - Iain Kilty
- Pfizer Worldwide R&D, Cambridge, Massachusetts, United States of America
| | - Tricia McKeever
- School of Community Health Sciences, University of Nottingham, Nottingham, United Kingdom
| | - Nick R. G. Shrine
- University of Leicester, Department of Health Sciences, Leicester, United Kingdom
| | - James P. Cook
- University of Leicester, Department of Health Sciences, Leicester, United Kingdom
| | - Shrina Patel
- Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom
| | - Tim D. Spector
- Department of Twin Research and Genetic Epidemiology, King's College London, London, United Kingdom
| | - Edward J. Hollox
- University of Leicester, Department of Genetics, Leicester, United Kingdom
| | - Ian P. Hall
- Division of Respiratory Medicine, University of Nottingham, Queen's Medical Centre, Nottingham, United Kingdom
| | - Martin D. Tobin
- University of Leicester, Department of Health Sciences, Leicester, United Kingdom
- National Institute for Health Research (NIHR) Leicester Respiratory Biomedical Research Unit, Glenfield Hospital, Leicester, United Kingdom
|
34
|
Lange K, Papp JC, Sinsheimer JS, Sobel EM. Next Generation Statistical Genetics: Modeling, Penalization, and Optimization in High-Dimensional Data. Annu Rev Stat Appl 2014; 1:279-300. [PMID: 24955378 PMCID: PMC4062304 DOI: 10.1146/annurev-statistics-022513-115638] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Indexed: 05/30/2023]
Abstract
Statistical genetics is undergoing the same transition to big data that all branches of applied statistics are experiencing. With the advent of inexpensive DNA sequencing, the transition is only accelerating. This brief review highlights some modern techniques with recent successes in statistical genetics. These include: (a) lasso penalized regression and association mapping, (b) ethnic admixture estimation, (c) matrix completion for genotype and sequence data, (d) the fused lasso and copy number variation, (e) haplotyping, (f) estimation of relatedness, (g) variance components models, and (h) rare variant testing. For more than a century, genetics has been both a driver and beneficiary of statistical theory and practice. This symbiotic relationship will persist for the foreseeable future.
Affiliation(s)
- Kenneth Lange
- Depts of Biomathematics, Human Genetics, and Statistics, UCLA
| | | | - Janet S. Sinsheimer
- Depts of Biomathematics, Human Genetics, Statistics, and Biostatistics, UCLA
|
35
|
Panoutsopoulou K, Tachmazidou I, Zeggini E. In search of low-frequency and rare variants affecting complex traits. Hum Mol Genet 2013; 22:R16-21. [PMID: 23922232 PMCID: PMC3782074 DOI: 10.1093/hmg/ddt376] [Citation(s) in RCA: 64] [Impact Index Per Article: 5.3] [Indexed: 11/13/2022] Open
Abstract
The allelic architecture of complex traits is likely to be underpinned by a combination of multiple common-frequency and rare variants. Targeted genotyping arrays and next-generation sequencing technologies at the whole-genome (WGS) and whole-exome (WES) scales are increasingly employed to access sequence variation across the full minor allele frequency (MAF) spectrum. Different study design strategies that make use of diverse technologies, imputation and sample selection approaches are an active target of development and evaluation efforts. Initial insights into the contribution of rare variants in common diseases and medically relevant quantitative traits point to low-frequency and rare alleles acting either independently or in aggregate and in several cases alongside common variants. Studies conducted in population isolates have been successful in detecting rare variant associations with complex phenotypes. Statistical methodologies that enable the joint analysis of rare variants across regions of the genome continue to evolve, with current efforts focusing on incorporating information such as functional annotation, and on the meta-analysis of these burden tests. In addition, population stratification, defining genome-wide statistical significance thresholds and the design of appropriate replication experiments constitute important considerations for the powerful analysis and interpretation of rare variant association studies. Progress in addressing these emerging challenges and the accrual of sufficiently large data sets are poised to help the field of complex trait genetics enter a promising era of discovery.
Affiliation(s)
| | | | - Eleftheria Zeggini
- To whom correspondence should be addressed at: Wellcome Trust Sanger Institute, The Morgan Building, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1HH, UK. Tel: +44-1223496868; Fax: +44-1223496826;
|
36
|
Zuo L, Wang KS, Zhang XY, Li CSR, Zhang F, Wang X, Chen W, Gao G, Zhang H, Krystal JH, Luo X. Rare SERINC2 variants are specific for alcohol dependence in individuals of European descent. Pharmacogenet Genomics 2013; 23:395-402. [PMID: 23778322 PMCID: PMC4287355 DOI: 10.1097/fpc.0b013e328362f9f2] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Indexed: 01/24/2023]
Abstract
OBJECTIVES: We have previously reported a top-ranked risk gene [i.e., serine incorporator 2 gene (SERINC2)] for alcohol dependence in individuals of European descent by analyzing the common variants in a genome-wide association study. In the present study, we comprehensively examined the rare variants [minor allele frequency (MAF) < 0.05] in the NKAIN1-SERINC2 region to confirm our previous finding. MATERIALS AND METHODS: A discovery sample (1409 European-American patients with alcohol dependence and 1518 European-American controls) and a replication sample (6438 European-Australian family participants with 1645 alcohol-dependent probands) were subjected to an association analysis. A total of 39,903 individuals from 19 other cohorts with 11 different neuropsychiatric and neurological disorders served as contrast groups. The entire NKAIN1-SERINC2 region was imputed in all cohorts using the same reference panels of genotypes that included rare variants from the whole-genome sequencing data. We stringently cleaned the phenotype and genotype data, and obtained a total of about 220 single-nucleotide polymorphisms in individuals of European descent and about 450 single-nucleotide polymorphisms in the individuals of African descent with 0 < MAF < 0.05. RESULTS: Using a weighted regression analysis implemented in the program SCORE-Seq, we found a rare variant constellation across the entire NKAIN1-SERINC2 region that was associated with alcohol dependence in European-Americans (Fp: overall, P = 1.8 × 10⁻⁴; VT: overall, P = 1.4 × 10⁻⁴; Collapsing, P = 6.5 × 10⁻⁵) and European-Australians (Fp: overall, P = 0.028; Collapsing, P = 0.025), but not in African-Americans, and not associated with any other disorder examined. Association signals in this region came mainly from SERINC2, a gene that codes for an activity-regulated protein expressed in the brain that incorporates serine into lipids. In addition, 26 individual rare variants were nominally associated with alcohol dependence in European-Americans (P < 0.05). The associations of five of these rare variants that lay within SERINC2 showed region-wide significance (P < α = 0.0006) and 25 associations survived correction for a false discovery rate (q < 0.05). The associations of two rare variants at SERINC2 were replicated in European-Australians (P < 0.05). CONCLUSION: We concluded that SERINC2 was a replicable and significant risk gene specific for alcohol dependence in individuals of European descent.
Affiliation(s)
- Lingjun Zuo
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
| | - Ke-Sheng Wang
- Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, TN, USA
| | - Xiang-Yang Zhang
- Menninger Department of Psychiatry and Behavioral Sciences, Baylor College of Medicine, Houston, Texas, USA
| | - Chiang-Shan R. Li
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
| | - Fengyu Zhang
- Lieber Institute for Brain Development, Johns Hopkins University Medical Campus, Baltimore, MD, USA
| | - Xiaoping Wang
- Department of Neurology, First People's Hospital, Shanghai Jiaotong University, Shanghai, China
| | - Wenan Chen
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
| | - Guimin Gao
- Department of Biostatistics, Virginia Commonwealth University, Richmond, VA, USA
| | - Heping Zhang
- Department of Biostatistics, Yale University School of Epidemiology and Public Health, New Haven, CT, USA
| | - John H. Krystal
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
- Psychiatry Services, Yale-New Haven Hospital, New Haven, CT
- VA Alcohol Research Center, VA Connecticut Healthcare System, West Haven, CT
| | - Xingguang Luo
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
|
37
|
Valsesia A, Macé A, Jacquemont S, Beckmann JS, Kutalik Z. The Growing Importance of CNVs: New Insights for Detection and Clinical Interpretation. Front Genet 2013; 4:92. [PMID: 23750167 PMCID: PMC3667386 DOI: 10.3389/fgene.2013.00092] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 02/10/2013] [Accepted: 05/04/2013] [Indexed: 02/03/2023] Open
Abstract
Differences between genomes can be due to single nucleotide variants, translocations, inversions, and copy number variants (CNVs, gain or loss of DNA). The latter can range from sub-microscopic events to complete chromosomal aneuploidies. Small CNVs are often benign but those larger than 500 kb are strongly associated with morbid consequences such as developmental disorders and cancer. Detecting CNVs within and between populations is essential to better understand the plasticity of our genome and to elucidate its possible contribution to disease. Hence there is a need for better-tailored and more robust tools for the detection and genome-wide analyses of CNVs. While a link between a given CNV and a disease may have often been established, the relative CNV contribution to disease progression and impact on drug response is not necessarily understood. In this review we discuss the progress, challenges, and limitations that occur at different stages of CNV analysis from the detection (using DNA microarrays and next-generation sequencing) and identification of recurrent CNVs to the association with phenotypes. We emphasize the importance of germline CNVs and propose strategies to aid clinicians to better interpret structural variations and assess their clinical implications.
Affiliation(s)
- Armand Valsesia
- Genetics Core, Nestlé Institute of Health Sciences Lausanne, Switzerland
|
38
|
Evangelou E, Ioannidis JPA. Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet 2013; 14:379-89. [PMID: 23657481 DOI: 10.1038/nrg3472] [Citation(s) in RCA: 404] [Impact Index Per Article: 33.7] [Indexed: 12/15/2022]
Abstract
Meta-analysis of genome-wide association studies (GWASs) has become a popular method for discovering genetic risk variants. Here, we overview both widely applied and newer statistical methods for GWAS meta-analysis, including issues of interpretation and assessment of sources of heterogeneity. We also discuss extensions of these meta-analysis methods to complex data. Where possible, we provide guidelines for researchers who are planning to use these methods. Furthermore, we address special issues that may arise for meta-analysis of sequencing data and rare variants. Finally, we discuss challenges and solutions surrounding the goals of making meta-analysis data publicly available and building powerful consortia.
Affiliation(s)
- Evangelos Evangelou
- Clinical and Molecular Epidemiology Unit, Department of Hygiene and Epidemiology, University of Ioannina Medical School, Ioannina 45110, Greece
|
39
|
Abstract
The role of rare variants has become a focus in the search for association with complex traits. Imputation is a powerful and cost-efficient tool to access variants that have not been directly typed, but there are several challenges when imputing rare variants, most notably reference panel selection. Extensions to rare variant association tests to incorporate genotype uncertainty from imputation are discussed, as well as the use of imputed low-frequency and rare variants in the study of population isolates.
|
40
|
Tachmazidou I, Morris A, Zeggini E. Rare variant association testing for next-generation sequencing data via hierarchical clustering. Hum Hered 2013; 74:165-71. [PMID: 23594494 PMCID: PMC3668801 DOI: 10.1159/000346022] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/19/2022] Open
Abstract
OBJECTIVES: It is thought that a proportion of the genetic susceptibility to complex diseases is due to low-frequency and rare variants. Next-generation sequencing in large populations facilitates the detection of rare variant associations to disease risk. In order to achieve adequate power to detect association at low-frequency and rare variants, locus-specific statistical methods are being developed that combine information across variants within a functional unit and test for association with this enriched signal through so-called burden tests. METHODS: We propose a hierarchical clustering approach and a similarity kernel-based association test for continuous phenotypes. This method clusters individuals into groups, within which samples are assumed to be genetically similar, and subsequently tests the group effects among the different clusters. RESULTS: The power of this approach is comparable to that of collapsing methods when causal variants have the same direction of effect, but its power is significantly higher compared to burden tests when both protective and risk variants are present in the region of interest. Overall, we observe that the Sequence Kernel Association Test (SKAT) is the most powerful approach under the allelic architectures considered. CONCLUSIONS: In our overall comparison, we find the analytical framework within which SKAT operates to yield higher power and to control type I error appropriately.
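The contrast drawn in this abstract between burden tests and kernel tests when protective and risk variants coexist can be seen in a small sketch. It is a simplified illustration, not the authors' clustering method and not the SKAT software: the function names are hypothetical, covariates are ignored, and only unscaled statistics are computed (no null distribution or p-value).

```python
def variant_scores(genotypes, phenotype):
    """Per-variant score: sum over individuals of genotype * centered phenotype."""
    mean_y = sum(phenotype) / len(phenotype)
    resid = [y - mean_y for y in phenotype]
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * r for row, r in zip(genotypes, resid))
        for j in range(n_variants)
    ]

def burden_statistic(scores, weights):
    """Burden-type statistic: square of the weighted sum, so opposite signs cancel."""
    return sum(w * s for w, s in zip(weights, scores)) ** 2

def kernel_statistic(scores, weights):
    """Kernel-type statistic (linear weighted kernel): sum of squares, sign-free."""
    return sum((w * s) ** 2 for w, s in zip(weights, scores))

# Toy data: carriers of variant 0 have high values, carriers of variant 1 low values.
genotypes = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0]]
phenotype = [2.0, 2.0, -2.0, -2.0, 0.0, 0.0]
scores = variant_scores(genotypes, phenotype)
print(burden_statistic(scores, [1.0, 1.0]))  # 0.0: the two effects cancel
print(kernel_statistic(scores, [1.0, 1.0]))  # 32.0: both effects contribute
```

With effects in the same direction the two statistics would both be large, which matches the comparable power reported for that scenario.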
|
41
|
Zuo L, Zhang H, Malison RT, Li CSR, Zhang XY, Wang F, Lu L, Lu L, Wang X, Krystal JH, Zhang F, Deng HW, Luo X. Rare ADH variant constellations are specific for alcohol dependence. Alcohol Alcohol 2012; 48:9-14. [PMID: 23019235 DOI: 10.1093/alcalc/ags104] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
AIMS: Some of the well-known functional alcohol dehydrogenase (ADH) gene variants (e.g. ADH1B*2, ADH1B*3 and ADH1C*2) that significantly affect the risk of alcohol dependence are rare variants in most populations. In the present study, we comprehensively examined the associations between rare ADH variants [minor allele frequency (MAF) < 0.05] and alcohol dependence, with several other neuropsychiatric and neurological disorders as reference. METHODS: A total of 49,358 subjects in 22 independent cohorts with 11 different neuropsychiatric and neurological disorders were analyzed, including 3 cohorts with alcohol dependence. The entire ADH gene cluster (ADH7-ADH1C-ADH1B-ADH1A-ADH6-ADH4-ADH5 at Chr4) was imputed in all samples using the same reference panels that included whole-genome sequencing data. We stringently cleaned the phenotype and genotype data to obtain a total of 870 single nucleotide polymorphisms with 0 < MAF < 0.05 for association analysis. RESULTS: We found that a rare variant constellation across the entire ADH gene cluster was significantly associated with alcohol dependence in European-Americans (Fp1: simulated global P = 0.045), European-Australians (Fp5: global P = 0.027; collapsing: P = 0.038) and African-Americans (Fp5: global P = 0.050; collapsing: P = 0.038), but not with any other neuropsychiatric disease. Association signals in this region came principally from ADH6, ADH7, ADH1B and ADH1C. In particular, a rare ADH6 variant constellation showed a replicable association with alcohol dependence across these three independent cohorts. No individual rare variants were statistically significantly associated with any disease examined after group- and region-wide correction for multiple comparisons. CONCLUSION: We conclude that rare ADH variants are specific for alcohol dependence. The ADH gene cluster may harbor a causal variant(s) for alcohol dependence.
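The selection of polymorphic variants with 0 < MAF < 0.05 described in the methods can be sketched as follows. This is only an illustration under simplifying assumptions: biallelic SNPs coded as 0/1/2 allele counts, no missing genotypes, and hypothetical function names; it does not reproduce the authors' cleaning pipeline.

```python
def minor_allele_frequency(allele_counts):
    """MAF of one biallelic SNP from per-individual allele counts (0, 1 or 2)."""
    freq = sum(allele_counts) / (2 * len(allele_counts))
    return min(freq, 1 - freq)  # fold so the rarer allele is always reported

def rare_variant_indices(genotypes, upper=0.05):
    """Column indices of variants that are polymorphic and rarer than `upper`."""
    n_variants = len(genotypes[0])
    keep = []
    for j in range(n_variants):
        maf = minor_allele_frequency([row[j] for row in genotypes])
        if 0 < maf < upper:  # excludes monomorphic sites and common variants
            keep.append(j)
    return keep
```

The strict lower bound matters: monomorphic sites carry no information for association testing and would otherwise pass a simple "MAF below 0.05" filter.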
Affiliation(s)
- Lingjun Zuo
- Department of Psychiatry, Yale University School of Medicine, West Haven, CT 06516, USA
|
42
|
Mägi R, Asimit JL, Day-Williams AG, Zeggini E, Morris AP. Genome-wide association analysis of imputed rare variants: application to seven common complex diseases. Genet Epidemiol 2012; 36:785-96. [PMID: 22951892 PMCID: PMC3569874 DOI: 10.1002/gepi.21675] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 04/16/2012] [Revised: 07/23/2012] [Accepted: 07/27/2012] [Indexed: 12/21/2022]
Abstract
Genome-wide association studies have been successful in identifying loci contributing effects to a range of complex human traits. The majority of reproducible associations within these loci are with common variants, each of modest effect, which together explain only a small proportion of heritability. It has been suggested that much of the unexplained genetic component of complex traits can thus be attributed to rare variation. However, genome-wide association study genotyping chips have been designed primarily to capture common variation, and thus are underpowered to detect the effects of rare variants. Nevertheless, we demonstrate here, by simulation, that imputation from an existing scaffold of genome-wide genotype data up to high-density reference panels has the potential to identify rare variant associations with complex traits, without the need for costly re-sequencing experiments. By application of this approach to genome-wide association studies of seven common complex diseases, imputed up to publicly available reference panels, we identify genome-wide significant evidence of rare variant association in PRDM10 with coronary artery disease and multiple genes in the major histocompatibility complex (MHC) with type 1 diabetes. The results of our analyses highlight that genome-wide association studies have the potential to offer an exciting opportunity for gene discovery through association with rare variants, conceivably leading to substantial advancements in our understanding of the genetic architecture underlying complex human traits.
Affiliation(s)
- Reedik Mägi
- Estonian Genome Centre, University of Tartu, Tartu, Estonia
|