101
Kim HJ, Yu Z, Lawson A, Zhao H, Chung D. Improving SNP prioritization and pleiotropic architecture estimation by incorporating prior knowledge using graph-GPA. Bioinformatics 2018; 34:2139-2141. [PMID: 29432514 DOI: 10.1093/bioinformatics/bty061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 07/19/2017] [Accepted: 02/06/2018] [Indexed: 12/17/2022] Open
Abstract
Summary: Integration of genetic studies for multiple phenotypes is a powerful approach to improving the identification of genetic variants associated with complex traits. Although it has been shown that leveraging shared genetic basis among phenotypes, namely pleiotropy, can increase statistical power to identify risk variants, it remains challenging to effectively integrate genome-wide association study (GWAS) datasets for a large number of phenotypes. We previously developed graph-GPA, a Bayesian hierarchical model that integrates multiple GWAS datasets to boost statistical power for the identification of risk variants and to estimate pleiotropic architecture within a unified framework. Here we propose a novel improvement of graph-GPA which incorporates external knowledge about phenotype-phenotype relationship to guide the estimation of genetic correlation and the association mapping. The application of graph-GPA to GWAS datasets for 12 complex diseases with a prior disease graph obtained from a text mining of biomedical literature illustrates its power to improve the identification of risk genetic variants and to facilitate understanding of genetic relationship among complex diseases.
Availability and implementation: graph-GPA is implemented as an R package 'GGPA', which is publicly available at http://dongjunchung.github.io/GGPA/. DDNet, a web interface to query diseases of interest and download a prior disease graph obtained from a text mining of biomedical literature, is publicly available at http://www.chunglab.io/ddnet/.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hang J Kim
  - Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH, USA
- Zhenning Yu
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Andrew Lawson
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA; Department of Genetics, Yale School of Medicine, New Haven, CT, USA
- Dongjun Chung
  - Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29425, USA
102
Anderson D, Lassmann T. A phenotype centric benchmark of variant prioritisation tools. NPJ Genom Med 2018; 3:5. [PMID: 29423277 PMCID: PMC5799157 DOI: 10.1038/s41525-018-0044-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 09/28/2017] [Revised: 01/09/2018] [Accepted: 01/10/2018] [Indexed: 01/08/2023] Open
Abstract
Next generation sequencing is a standard tool used in clinical diagnostics. In Mendelian diseases the challenge is to discover the single etiological variant among thousands of benign or functionally unrelated variants. After calling variants from aligned sequencing reads, variant prioritisation tools are used to examine the conservation or potential functional consequences of variants. We hypothesised that the performance of variant prioritisation tools may vary by disease phenotype. To test this we created benchmark data sets for variants associated with different disease phenotypes. We found that performance of 24 tested tools is highly variable and differs by disease phenotype. The task of identifying a causative variant amongst a large number of benign variants is challenging for all tools, highlighting the need for further development in the field. Based on our observations, we recommend use of five top performers found in this study (FATHMM, M-CAP, MetaLR, MetaSVM and VEST3). In addition we provide tables indicating which analytical approach works best in which disease context. Variant prioritisation tools are best suited to investigate variants associated with well-studied genetic diseases, as these variants are more readily available during algorithm development than variants associated with rare diseases. We anticipate that further development into disease focussed tools will lead to significant improvements.
Affiliation(s)
- Denise Anderson
  - Telethon Kids Institute, The University of Western Australia, Subiaco, WA 6008 Australia
- Timo Lassmann
  - Telethon Kids Institute, The University of Western Australia, Subiaco, WA 6008 Australia
103
Li J, Shi L, Zhang K, Zhang Y, Hu S, Zhao T, Teng H, Li X, Jiang Y, Ji L, Sun Z. VarCards: an integrated genetic and clinical database for coding variants in the human genome. Nucleic Acids Res 2018; 46:D1039-D1048. [PMID: 29112736 PMCID: PMC5753295 DOI: 10.1093/nar/gkx1039] [Citation(s) in RCA: 142] [Impact Index Per Article: 23.7] [Received: 08/15/2017] [Revised: 10/16/2017] [Accepted: 10/18/2017] [Indexed: 12/24/2022] Open
Abstract
A growing number of genomic tools and databases have been developed to facilitate the interpretation of genomic variants, particularly in coding regions. However, these tools are separately available in different online websites or databases, making it challenging for general clinicians, geneticists and biologists to obtain first-hand information regarding particular variants and genes of interest. Starting with coding regions and splice sites, we artificially generated all possible single nucleotide variants (n = 110 154 363) and cataloged all reported insertions and deletions (n = 1 223 370). We then annotated these variants with respect to functional consequences from more than 60 genomic data sources to develop a database, named VarCards (http://varcards.biols.ac.cn/), by which users can conveniently search, browse and annotate the variant- and gene-level implications of given variants, including the following information: (i) functional effects; (ii) functional consequences through different in silico algorithms; (iii) allele frequencies in different populations; (iv) disease- and phenotype-related knowledge; (v) general meaningful gene-level information; and (vi) drug-gene interactions. As a case study, we successfully employed VarCards in interpretation of de novo mutations in autism spectrum disorders. In conclusion, VarCards provides an intuitive interface of necessary information for researchers to prioritize candidate variations and genes.
Affiliation(s)
- Jinchen Li
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
  - National Clinical Research Center for Geriatric Disorders, Xiangya Hospital, Central South University, Changsha, Hunan 410078, China
  - Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410078, China
- Leisheng Shi
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- Kun Zhang
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- Yi Zhang
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- Shanshan Hu
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- Tingting Zhao
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- Huajing Teng
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Xianfeng Li
  - Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410078, China
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
- Yi Jiang
  - Laboratory of Medical Genetics, School of Life Sciences, Central South University, Changsha, Hunan 410078, China
- Liying Ji
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
- Zhongsheng Sun
  - Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou, Zhejiang 325025, China
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
104
Hao X, Zeng P, Zhang S, Zhou X. Identifying and exploiting trait-relevant tissues with multiple functional annotations in genome-wide association studies. PLoS Genet 2018; 14:e1007186. [PMID: 29377896 PMCID: PMC5805369 DOI: 10.1371/journal.pgen.1007186] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 09/29/2017] [Revised: 02/08/2018] [Accepted: 01/04/2018] [Indexed: 12/18/2022] Open
Abstract
Genome-wide association studies (GWASs) have identified many disease associated loci, the majority of which have unknown biological functions. Understanding the mechanism underlying trait associations requires identifying trait-relevant tissues and investigating associations in a trait-specific fashion. Here, we extend the widely used linear mixed model to incorporate multiple SNP functional annotations from omics studies with GWAS summary statistics to facilitate the identification of trait-relevant tissues, with which to further construct powerful association tests. Specifically, we rely on a generalized estimating equation based algorithm for parameter inference, a mixture modeling framework for trait-tissue relevance classification, and a weighted sequence kernel association test constructed based on the identified trait-relevant tissues for powerful association analysis. We refer to our analytic procedure as the Scalable Multiple Annotation integration for trait-Relevant Tissue identification and usage (SMART). With extensive simulations, we show how our method can make use of multiple complementary annotations to improve the accuracy for identifying trait-relevant tissues. In addition, our procedure allows us to make use of the inferred trait-relevant tissues, for the first time, to construct more powerful SNP set tests. We apply our method for an in-depth analysis of 43 traits from 28 GWASs using tissue-specific annotations in 105 tissues derived from ENCODE and Roadmap. Our results reveal new trait-tissue relevance, pinpoint important annotations that are informative of trait-tissue relationship, and illustrate how we can use the inferred trait-relevant tissues to construct more powerful association tests in the Wellcome trust case control consortium study.
Affiliation(s)
- Xingjie Hao
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei, China
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
- Ping Zeng
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
- Shujun Zhang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan, Hubei, China
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI, United States of America
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI, United States of America
105
Lee PH, Lee C, Li X, Wee B, Dwivedi T, Daly M. Principles and methods of in-silico prioritization of non-coding regulatory variants. Hum Genet 2018; 137:15-30. [PMID: 29288389 PMCID: PMC5892192 DOI: 10.1007/s00439-017-1861-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 10/24/2017] [Accepted: 12/14/2017] [Indexed: 12/13/2022]
Abstract
Over a decade of genome-wide association studies have made great strides toward the detection of genes and genetic mechanisms underlying complex traits. However, the majority of associated loci reside in non-coding regions that are functionally uncharacterized in general. Now, the availability of large-scale tissue and cell type-specific transcriptome and epigenome data enables us to elucidate how non-coding genetic variants can affect gene expression and are associated with phenotypic changes. Here, we provide an overview of this emerging field in human genomics, summarizing available data resources and state-of-the-art analytic methods to facilitate in-silico prioritization of non-coding regulatory mutations. We also highlight the limitations of current approaches and discuss the direction of much-needed future research.
Affiliation(s)
- Phil H Lee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Christian Lee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Department of Life Sciences, Harvard University, Cambridge, MA, USA
- Xihao Li
  - Quantitative Genomics Program, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Brian Wee
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
- Tushar Dwivedi
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, USA
- Mark Daly
  - Center for Genomic Medicine, Massachusetts General Hospital and Harvard Medical School, Simches Research Building, 185 Cambridge St, Boston, MA, 02114, USA
  - Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
106
Lu Q, Li B, Ou D, Erlendsdottir M, Powles RL, Jiang T, Hu Y, Chang D, Jin C, Dai W, He Q, Liu Z, Mukherjee S, Crane PK, Zhao H. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics. Am J Hum Genet 2017; 101:939-964. [PMID: 29220677 PMCID: PMC5812911 DOI: 10.1016/j.ajhg.2017.11.001] [Citation(s) in RCA: 111] [Impact Index Per Article: 15.9] [Received: 06/14/2017] [Accepted: 10/25/2017] [Indexed: 02/08/2023] Open
Abstract
Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (Ntotal≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS.
Affiliation(s)
- Qiongshi Lu
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
- Boyang Li
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
- Derek Ou
  - Yale School of Medicine, New Haven, CT 06510, USA
- Ryan L Powles
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, USA
- Yiming Hu
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
- David Chang
  - Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, USA
- Wei Dai
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
- Qidu He
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Zefeng Liu
  - Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
- Shubhabrata Mukherjee
  - Division of General Internal Medicine, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Paul K Crane
  - Division of General Internal Medicine, Department of Medicine, University of Washington, Seattle, WA 98195, USA
- Hongyu Zhao
  - Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA; Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT 06510, USA; VA Cooperative Studies Program Coordinating Center, West Haven, CT 06516, USA
107
Aston E, Channon A, Belavkin RV, Gifford DR, Krašovec R, Knight CG. Critical Mutation Rate has an Exponential Dependence on Population Size for Eukaryotic-length Genomes with Crossover. Sci Rep 2017; 7:15519. [PMID: 29138394 PMCID: PMC5686101 DOI: 10.1038/s41598-017-14628-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 05/30/2017] [Accepted: 10/02/2017] [Indexed: 01/22/2023] Open
Abstract
The critical mutation rate (CMR) determines the shift between survival-of-the-fittest and survival of individuals with greater mutational robustness ("flattest"). We identify an inverse relationship between CMR and sequence length in an in silico system with a two-peak fitness landscape; CMR decreases to no more than five orders of magnitude above estimates of eukaryotic per base mutation rate. We confirm the CMR reduces exponentially at low population sizes, irrespective of peak radius and distance, and increases with the number of genetic crossovers. We also identify an inverse relationship between CMR and the number of genes, confirming that, for a similar number of genes to that for the plant Arabidopsis thaliana (25,000), the CMR is close to its known wild-type mutation rate; mutation rates for additional organisms were also found to be within one order of magnitude of the CMR. This is the first time such a simulation model has been assigned input and produced output within range for a given biological organism. The decrease in CMR with population size previously observed is maintained; there is potential for the model to influence understanding of populations undergoing bottleneck, stress, and conservation strategy for populations near extinction.
Affiliation(s)
- Elizabeth Aston
  - School of Computing and Mathematics, Keele University, Keele, Staffordshire, UK
- Alastair Channon
  - School of Computing and Mathematics, Keele University, Keele, Staffordshire, UK
- Roman V Belavkin
  - School of Engineering and Information Sciences, Middlesex University, London, UK
- Danna R Gifford
  - Faculty of Science and Engineering, The University of Manchester, Manchester, UK
- Rok Krašovec
  - Faculty of Science and Engineering, The University of Manchester, Manchester, UK
- Christopher G Knight
  - Faculty of Science and Engineering, The University of Manchester, Manchester, UK
108
Chen L, Qin ZS. Using DIVAN to assess disease/trait-associated single nucleotide variants in genome-wide scale. BMC Res Notes 2017; 10:530. [PMID: 29084591 PMCID: PMC5663107 DOI: 10.1186/s13104-017-2851-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/02/2017] [Accepted: 10/23/2017] [Indexed: 01/01/2023] Open
Abstract
OBJECTIVE: The majority of sequence variants identified by genome-wide association studies (GWASs) fall outside of the protein-coding regions. Unlike coding variants, it is challenging to connect these noncoding variants to the pathophysiology of complex diseases/traits due to the lack of functional annotations in the non-coding regions. To overcome this, by leveraging the rich collection of genomic and epigenomic profiles, we have developed DIVAN, or Disease/trait-specific Variant ANnotation, which enables the assignment of a measurement (D-score) for each base of the human genome in a disease/trait-specific manner. To facilitate the utilization of DIVAN, we pre-computed D-scores for every base of the human genome (hg19) for 45 different diseases/traits.
RESULTS: In this work, we present a detailed protocol on how to utilize the DIVAN software toolkit to retrieve D-scores either by variant identifiers or by genomic regions for a disease/trait of interest. We also demonstrate the utilities of the D-scores using real data examples. We believe that the pre-computed D-scores for 45 diseases/traits are a useful resource to follow up on the discoveries made by GWASs, and the DIVAN software toolkit provides a convenient way to access this resource. DIVAN is freely available at https://sites.google.com/site/emorydivan/software.
Affiliation(s)
- Li Chen
  - Department of Health Outcomes Research and Policy, Harrison School of Pharmacy, Auburn University, Auburn, AL, 36849, USA
- Zhaohui S Qin
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, USA; Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, 30322, USA
109
Eriguchi Y, Kuwabara H, Inai A, Kawakubo Y, Nishimura F, Kakiuchi C, Tochigi M, Ohashi J, Aoki N, Kato K, Ishiura H, Mitsui J, Tsuji S, Doi K, Yoshimura J, Morishita S, Shimada T, Furukawa M, Umekage T, Sasaki T, Kasai K, Kano Y. Identification of candidate genes involved in the etiology of sporadic Tourette syndrome by exome sequencing. Am J Med Genet B Neuropsychiatr Genet 2017; 174:712-723. [PMID: 28608572 DOI: 10.1002/ajmg.b.32559] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 10/03/2016] [Accepted: 05/15/2017] [Indexed: 01/01/2023]
Abstract
Tourette Syndrome (TS) is a neurodevelopmental disorder characterized by chronic motor and vocal tics. Although there is a large genetic contribution, the genetic architecture of TS remains unclear. Exome sequencing has successfully revealed the contribution of de novo mutations in sporadic cases with neuropsychiatric disorders such as autism and schizophrenia. Here, using exome sequencing, we investigated de novo mutations in individuals with sporadic TS to identify novel risk loci and elucidate the genetic background of TS. Exome analysis was conducted for sporadic TS cases: nine trio families and one quartet family with concordant twins were investigated. Missense mutations were evaluated using functional prediction algorithms, and their population frequencies were calculated based on three public databases. Gene expression patterns in the brain were analyzed using the BrainSpan Developmental Transcriptome. Thirty de novo mutations, including four synonymous and four missense mutations, were identified. Among the missense mutations, one in the rapamycin-insensitive companion of mammalian target of rapamycin (RICTOR)-coding gene (rs140964083: G > A, found in one proband) was predicted to be hazardous. In the three public databases analyzed, variants in the same SNP locus were absent, and variants in the same gene were either absent or present at an extremely low frequency (3/5,008), indicating the rarity of hazardous RICTOR mutations in the general population. The de novo variant of RICTOR may be implicated in the development of sporadic TS, and RICTOR is a novel candidate factor for TS etiology.
Affiliation(s)
- Yosuke Eriguchi
  - Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Department of Neuropsychiatry, Sakura Hospital, Aomori, Japan
- Hitoshi Kuwabara
  - Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Disability Services Office, The University of Tokyo, Tokyo, Japan
- Aya Inai
  - Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yuki Kawakubo
  - Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Fumichika Nishimura
  - Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Chihiro Kakiuchi
  - Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Mamoru Tochigi
  - Department of Neuropsychiatry, Teikyo University School of Medicine, Tokyo, Japan
- Jun Ohashi
  - Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Tokyo, Japan
- Naoto Aoki
  - Department of Neuropsychiatry, Sakura Hospital, Aomori, Japan
- Kayoko Kato
  - Department of Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Hiroyuki Ishiura
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Jun Mitsui
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Shoji Tsuji
  - Department of Neurology, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan; Medical Genome Center, The University of Tokyo Hospital, The University of Tokyo, Tokyo, Japan
- Koichiro Doi
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Jun Yoshimura
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Shinichi Morishita
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Chiba, Japan
- Takafumi Shimada
  - Division for Counseling and Support, The University of Tokyo, Tokyo, Japan
- Masaomi Furukawa
  - Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Tadashi Umekage
  - Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Tsukasa Sasaki
  - Department of Physical and Health Education, Graduate School of Education, The University of Tokyo, Tokyo, Japan
- Kiyoto Kasai
  - Department of Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
- Yukiko Kano
  - Department of Child Neuropsychiatry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
110
Rath M, Jenssen SE, Schwefel K, Spiegler S, Kleimeier D, Sperling C, Kaderali L, Felbor U. High-throughput sequencing of the entire genomic regions of CCM1/KRIT1, CCM2 and CCM3/PDCD10 to search for pathogenic deep-intronic splice mutations in cerebral cavernous malformations. Eur J Med Genet 2017. [DOI: 10.1016/j.ejmg.2017.06.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 01/06/2023]
111
Lu Q, Powles RL, Abdallah S, Ou D, Wang Q, Hu Y, Lu Y, Liu W, Li B, Mukherjee S, Crane PK, Zhao H. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer's disease. PLoS Genet 2017; 13:e1006933. [PMID: 28742084 PMCID: PMC5546707 DOI: 10.1371/journal.pgen.1006933] [Citation(s) in RCA: 67] [Impact Index Per Article: 9.6] [Received: 03/20/2017] [Revised: 08/07/2017] [Accepted: 07/18/2017] [Indexed: 12/31/2022] Open
Abstract
Continuing efforts from large international consortia have made genome-wide epigenomic and transcriptomic annotation data publicly available for a variety of cell and tissue types. However, synthesis of these datasets into effective summary metrics to characterize the functional non-coding genome remains a challenge. Here, we present GenoSkyline-Plus, an extension of our previous work through integration of an expanded set of epigenomic and transcriptomic annotations to produce high-resolution, single tissue annotations. After validating our annotations with a catalog of tissue-specific non-coding elements previously identified in the literature, we apply our method using data from 127 different cell and tissue types to present an atlas of heritability enrichment across 45 different GWAS traits. We show that broader organ system categories (e.g. immune system) increase statistical power in identifying biologically relevant tissue types for complex diseases while annotations of individual cell types (e.g. monocytes or B-cells) provide deeper insights into disease etiology. Additionally, we use our GenoSkyline-Plus annotations in an in-depth case study of late-onset Alzheimer's disease (LOAD). Our analyses suggest a strong connection between LOAD heritability and genetic variants contained in regions of the genome functional in monocytes. Furthermore, we show that LOAD shares a similar localization of SNPs to monocyte-functional regions with Parkinson's disease. Overall, we demonstrate that integrated genome annotations at the single tissue level provide a valuable tool for understanding the etiology of complex human diseases. Our GenoSkyline-Plus annotations are freely available at http://genocanyon.med.yale.edu/GenoSkyline.
Affiliation(s)
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Ryan L. Powles
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - Sarah Abdallah
- Yale University School of Medicine, New Haven, Connecticut, United States of America
| | - Derek Ou
- Yale University School of Medicine, New Haven, Connecticut, United States of America
| | - Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - Yiming Hu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Yisi Lu
- Department of Immunobiology, Yale University School of Medicine, New Haven, Connecticut, United States of America
| | - Wei Liu
- School of Life Sciences, Peking University, Beijing, China
| | - Boyang Li
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Shubhabrata Mukherjee
- Division of General Internal Medicine, Department of Medicine, University of Washington, Seattle, Washington, United States of America
| | - Paul K. Crane
- Division of General Internal Medicine, Department of Medicine, University of Washington, Seattle, Washington, United States of America
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- VA Cooperative Studies Program Coordinating Center, West Haven, Connecticut, United States of America
| |
|
112
|
Hu Y, Lu Q, Liu W, Zhang Y, Li M, Zhao H. Joint modeling of genetically correlated diseases and functional annotations increases accuracy of polygenic risk prediction. PLoS Genet 2017; 13:e1006836. [PMID: 28598966 PMCID: PMC5482506 DOI: 10.1371/journal.pgen.1006836] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Received: 11/08/2016] [Revised: 06/23/2017] [Accepted: 05/23/2017] [Indexed: 12/25/2022] Open
Abstract
Accurate prediction of disease risk based on genetic factors is an important goal in human genetics research and precision medicine. Advanced prediction models will lead to more effective disease prevention and treatment strategies. Despite the identification of thousands of disease-associated genetic variants through genome-wide association studies (GWAS) in the past decade, accuracy of genetic risk prediction remains moderate for most diseases, which is largely due to the challenges in both identifying all the functionally relevant variants and accurately estimating their effect sizes. In this work, we introduce PleioPred, a principled framework that leverages pleiotropy and functional annotations in genetic risk prediction for complex diseases. PleioPred uses GWAS summary statistics as its input, and jointly models multiple genetically correlated diseases and a variety of external information including linkage disequilibrium and diverse functional annotations to increase the accuracy of risk prediction. Through comprehensive simulations and real data analyses on Crohn’s disease, celiac disease and type-II diabetes, we demonstrate that our approach can substantially increase the accuracy of polygenic risk prediction and risk population stratification, i.e. PleioPred can significantly better separate type-II diabetes patients with early and late onset ages, illustrating its potential clinical application. Furthermore, we show that the increment in prediction accuracy is significantly correlated with the genetic correlation between the predicted and jointly modeled diseases.
Genetic risk prediction plays a significant role in precision medicine. Accurate prediction models could have great impact on disease prevention and treatment strategies. However, prediction accuracies for most complex diseases remain moderate mainly due to the challenges in identifying and quantifying the effects of genetic variants from millions of markers, limited access to individual-level genotype data, and lack of efficient computational methods. Up to now, most methods have been focused on predicting disease risk using data from a single trait. With the discovery of genetic correlations among many complex diseases, incorporating data of genetically correlated diseases could have the potential to increase prediction accuracy. Current statistical methods are not able to fully exploit the richness of these kinds of data to take into account the shared genetic architecture. To make use of commonly available GWAS summary statistics, we propose a novel method to address these challenges by jointly modeling genetically correlated diseases and integrating genomic functional annotations. We demonstrate the substantial improvement in accuracy in both extensive simulation studies and real data analysis of Crohn’s disease, celiac disease and type-II diabetes. Furthermore, we show that the increment in prediction accuracy is significantly correlated with the genetic correlation between the predicted and jointly modeled diseases.
Affiliation(s)
- Yiming Hu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Wei Liu
- Peking University, Beijing, China
| | - Yuhua Zhang
- Shanghai Jiao Tong University, Shanghai, China
| | - Mo Li
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Clinical Epidemiology Research Center (CERC), Veterans Affairs (VA) Cooperative Studies Program, VA Connecticut Healthcare System, West Haven, Connecticut, United States of America
| |
|
113
|
Leveraging functional annotations in genetic risk prediction for human complex diseases. PLoS Comput Biol 2017; 13:e1005589. [PMID: 28594818 PMCID: PMC5481142 DOI: 10.1371/journal.pcbi.1005589] [Citation(s) in RCA: 102] [Impact Index Per Article: 14.6] [Received: 10/21/2016] [Revised: 06/22/2017] [Accepted: 05/19/2017] [Indexed: 12/25/2022] Open
Abstract
Genetic risk prediction is an important goal in human genetics research and precision medicine. Accurate prediction models will have great impacts on both disease prevention and early treatment strategies. Despite the identification of thousands of disease-associated genetic variants through genome wide association studies (GWAS), genetic risk prediction accuracy remains moderate for most diseases, which is largely due to the challenges in both identifying all the functionally relevant variants and accurately estimating their effect sizes in the presence of linkage disequilibrium. In this paper, we introduce AnnoPred, a principled framework that leverages diverse types of genomic and epigenomic functional annotations in genetic risk prediction for complex diseases. AnnoPred is trained using GWAS summary statistics in a Bayesian framework in which we explicitly model various functional annotations and allow for linkage disequilibrium estimated from reference genotype data. Compared with state-of-the-art risk prediction methods, AnnoPred achieves consistently improved prediction accuracy in both extensive simulations and real data.
|
114
|
Qamra A, Xing M, Padmanabhan N, Kwok JJT, Zhang S, Xu C, Leong YS, Lee Lim AP, Tang Q, Ooi WF, Suling Lin J, Nandi T, Yao X, Ong X, Lee M, Tay ST, Keng ATL, Gondo Santoso E, Ng CCY, Ng A, Jusakul A, Smoot D, Ashktorab H, Rha SY, Yeoh KG, Peng Yong W, Chow PK, Chan WH, Ong HS, Soo KC, Kim KM, Wong WK, Rozen SG, Teh BT, Kappei D, Lee J, Connolly J, Tan P. Epigenomic Promoter Alterations Amplify Gene Isoform and Immunogenic Diversity in Gastric Adenocarcinoma. Cancer Discov 2017; 7:630-651. [DOI: 10.1158/2159-8290.cd-16-1022] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 09/13/2016] [Revised: 10/27/2016] [Accepted: 03/16/2017] [Indexed: 01/08/2023]
|
115
|
Chung D, Kim HJ, Zhao H. graph-GPA: A graphical model for prioritizing GWAS results and investigating pleiotropic architecture. PLoS Comput Biol 2017; 13:e1005388. [PMID: 28212402 PMCID: PMC5347371 DOI: 10.1371/journal.pcbi.1005388] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/17/2016] [Revised: 03/06/2017] [Accepted: 01/28/2017] [Indexed: 02/06/2023] Open
Abstract
Genome-wide association studies (GWAS) have identified tens of thousands of genetic variants associated with hundreds of phenotypes and diseases, which have provided clinical and medical benefits to patients with novel biomarkers and therapeutic targets. However, identification of risk variants associated with complex diseases remains challenging as they are often affected by many genetic variants with small or moderate effects. There has been accumulating evidence suggesting that different complex traits share common risk basis, namely pleiotropy. Recently, several statistical methods have been developed to improve statistical power to identify risk variants for complex traits through a joint analysis of multiple GWAS datasets by leveraging pleiotropy. While these methods were shown to improve statistical power for association mapping compared to separate analyses, they are still limited in the number of phenotypes that can be integrated. In order to address this challenge, in this paper, we propose a novel statistical framework, graph-GPA, to integrate a large number of GWAS datasets for multiple phenotypes using a hidden Markov random field approach. Application of graph-GPA to a joint analysis of GWAS datasets for 12 phenotypes shows that graph-GPA improves statistical power to identify risk variants compared to statistical methods based on a smaller number of GWAS datasets. In addition, graph-GPA also promotes better understanding of genetic mechanisms shared among phenotypes, which can potentially be useful for the development of improved diagnosis and therapeutics. The R implementation of graph-GPA is currently available at https://dongjunchung.github.io/GGPA/.
Recently, there has been accumulating evidence suggesting pleiotropy, i.e., genetic components shared across multiple phenotypes. Incorporation of pleiotropy in genetic analysis might improve statistical power to identify risk-associated genetic variants. Several statistical approaches have been proposed to utilize pleiotropy for association mapping but they are currently still limited to a relatively small number of phenotypes, e.g., a pair of phenotypes. This restricts potential gain in statistical power in association mapping and investigation of pleiotropic structure among a large number of phenotypes. In order to address this challenge, in this paper, we propose graph-GPA, a novel statistical framework to integrate a large number of phenotypes using a hidden Markov random field architecture. Application of the proposed statistical method to GWAS datasets for 12 phenotypes showed that graph-GPA not only provides a parsimonious representation of the genetic relationship among these phenotypes, but also identifies a significantly larger number of novel genetic variants that are potentially functional. We believe that this novel approach might help investigation of common etiology and improvement of diagnosis and therapeutics.
Affiliation(s)
- Dongjun Chung
- Department of Public Health Sciences, Medical University of South Carolina, Charleston, South Carolina, United States of America
| | - Hang J. Kim
- Department of Mathematical Sciences, University of Cincinnati, Cincinnati, Ohio, United States of America
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Department of Genetics, Yale School of Medicine, New Haven, Connecticut, United States of America
- VA Cooperative Studies Program Coordinating Center, West Haven, Connecticut, United States of America
| |
|
116
|
Chen L, Jin P, Qin ZS. DIVAN: accurate identification of non-coding disease-specific risk variants using multi-omics profiles. Genome Biol 2016; 17:252. [PMID: 27923386 PMCID: PMC5139035 DOI: 10.1186/s13059-016-1112-z] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 10/10/2016] [Accepted: 11/18/2016] [Indexed: 12/22/2022] Open
Abstract
Understanding the link between non-coding sequence variants, identified in genome-wide association studies, and the pathophysiology of complex diseases remains challenging due to a lack of annotations in non-coding regions. To overcome this, we developed DIVAN, a novel feature selection and ensemble learning framework, which identifies disease-specific risk variants by leveraging a comprehensive collection of genome-wide epigenomic profiles across cell types and factors, along with other static genomic features. DIVAN accurately and robustly recognizes non-coding disease-specific risk variants under multiple testing scenarios; among all the features, histone marks, especially those marks associated with repressed chromatin, are often more informative than others.
Affiliation(s)
- Li Chen
- Department of Mathematics and Computer Science, Emory University, Atlanta, GA, 30322, USA
| | - Peng Jin
- Department of Human Genetics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| | - Zhaohui S Qin
- Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA, 30322, USA
- Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, GA, 30322, USA
| |
|
117
|
Lu Q, Jin C, Sun J, Bowler R, Kechris K, Kaminski N, Zhao H. Post-GWAS Prioritization Through Data Integration Provides Novel Insights on Chronic Obstructive Pulmonary Disease. Statistics in Biosciences 2016; 2016:1-17. [PMID: 27812370 PMCID: PMC5087812 DOI: 10.1007/s12561-016-9151-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/27/2016] [Revised: 05/14/2016] [Accepted: 05/24/2016] [Indexed: 01/24/2023]
Abstract
Rich collections of genomic and epigenomic annotations, the availability of large population cohorts for genome-wide association studies (GWAS), and advancements in data integration techniques provide an unprecedented opportunity to accelerate discoveries in complex disease studies through integrative analyses. In this paper, we apply a variety of approaches to integrate GWAS summary statistics of chronic obstructive pulmonary disease (COPD) with functional annotations to illustrate how data integration could help researchers understand complex human diseases. We show that incorporating functional annotations can better prioritize GWAS signals at both the global and the local levels. Signal prioritization on severe COPD GWAS reveals multiple potential risk loci that are linked with pulmonary functions. Enrichment analysis provides novel insights on the pathogenesis of COPD and hints at the existence of genetic contributions to muscle dysfunction and chronic lung inflammation, two symptoms that are often co-morbid with COPD. Our results suggest that rich signals for COPD genetics are still buried under the Bonferroni-corrected genome-wide significance threshold. Many more biological findings are expected to emerge as more samples are recruited for COPD studies.
Affiliation(s)
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | | | - Jiehuan Sun
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Russell Bowler
- National Jewish Health, Department of Medicine, Denver, CO, USA
| | - Katerina Kechris
- Department of Biostatistics and Informatics, University of Colorado Denver, Denver, CO, USA
| | - Naftali Kaminski
- Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- VA Cooperative Studies Program Coordinating Center, West Haven, CT, USA
| |
|
118
|
Lu Q, Powles RL, Wang Q, He BJ, Zhao H. Integrative Tissue-Specific Functional Annotations in the Human Genome Provide Novel Insights on Many Complex Traits and Improve Signal Prioritization in Genome Wide Association Studies. PLoS Genet 2016; 12:e1005947. [PMID: 27058395 PMCID: PMC4825932 DOI: 10.1371/journal.pgen.1005947] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 11/13/2015] [Accepted: 03/01/2016] [Indexed: 12/20/2022] Open
Abstract
Extensive efforts have been made to understand genomic function through both experimental and computational approaches, yet proper annotation still remains challenging, especially in non-coding regions. In this manuscript, we introduce GenoSkyline, an unsupervised learning framework to predict tissue-specific functional regions through integrating high-throughput epigenetic annotations. GenoSkyline successfully identified a variety of non-coding regulatory machinery including enhancers, regulatory miRNA, and hypomethylated transposable elements in extensive case studies. Integrative analysis of GenoSkyline annotations and results from genome-wide association studies (GWAS) led to novel biological insights on the etiologies of a number of human complex traits. We also explored using tissue-specific functional annotations to prioritize GWAS signals and predict relevant tissue types for each risk locus. Brain and blood-specific annotations led to better prioritization performance for schizophrenia than standard GWAS p-values and non-tissue-specific annotations. As for coronary artery disease, heart-specific functional regions were highly enriched for GWAS signals, but previously identified risk loci were found to be most functional in other tissues, suggesting a substantial proportion of still undetected heart-related loci. In summary, GenoSkyline annotations can guide genetic studies at multiple resolutions and provide valuable insights in understanding complex diseases. GenoSkyline is available at http://genocanyon.med.yale.edu/GenoSkyline.
Affiliation(s)
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
| | - Ryan Lee Powles
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - Qian Wang
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| | - Beixin Julie He
- Division of Cardiovascular Medicine, Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut, United States of America
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut, United States of America
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
| |
|
119
|
Xu Z, Pan W. Binomial Mixture Model Based Association Testing to Account for Genetic Heterogeneity for GWAS. Genet Epidemiol 2016; 40:202-9. [PMID: 26916514 PMCID: PMC4814320 DOI: 10.1002/gepi.21954] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/01/2015] [Revised: 11/20/2015] [Accepted: 12/14/2015] [Indexed: 11/09/2022]
Abstract
Genome-wide association studies (GWAS) have confirmed the ubiquitous existence of genetic heterogeneity for common disease: multiple common genetic variants have been identified to be associated, while many more are yet expected to be uncovered. However, the single SNP (single-nucleotide polymorphism) based trend test (or its variants) that has been dominantly used in GWAS is based on contrasting the allele frequency difference between the case and control groups, completely ignoring possible genetic heterogeneity. In spite of the widely accepted notion of genetic heterogeneity, we are not aware of any previous attempt to apply genetic heterogeneity motivated methods in GWAS. Here, to explicitly account for unknown genetic heterogeneity, we applied a mixture model based single-SNP test to the Wellcome Trust Case Control Consortium (WTCCC) GWAS data with traits of Crohn's disease, bipolar disease, coronary artery disease, and type 2 diabetes, identifying much larger numbers of significant SNPs and risk loci for each trait than those of the popular trend test, demonstrating potential power gain of the mixture model based test.
Affiliation(s)
- Zhiyuan Xu
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| | - Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, United States of America
| |
|
120
|
Lu Q, Yao X, Hu Y, Zhao H. GenoWAP: GWAS signal prioritization through integrated analysis of genomic functional annotation. Bioinformatics 2015; 32:542-8. [PMID: 26504140 DOI: 10.1093/bioinformatics/btv610] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 05/18/2015] [Accepted: 10/16/2015] [Indexed: 12/29/2022] Open
Abstract
MOTIVATION Genome-wide association study (GWAS) has been a great success in the past decade. However, significant challenges still remain in both identifying new risk loci and interpreting results. Bonferroni-corrected significance level is known to be conservative, leading to insufficient statistical power when the effect size is moderate at risk locus. Complex structure of linkage disequilibrium also makes it challenging to separate causal variants from nonfunctional ones in large haplotype blocks. Under such circumstances, a computational approach that may increase signal replication rate and identify potential functional sites among correlated markers is urgently needed. RESULTS We describe GenoWAP, a GWAS signal prioritization method that integrates genomic functional annotation and GWAS test statistics. The effectiveness of GenoWAP is demonstrated through its applications to Crohn's disease and schizophrenia using the largest studies available, where highly ranked loci show substantially stronger signals in the whole dataset after prioritization based on a subset of samples. At the single nucleotide polymorphism (SNP) level, top ranked SNPs after prioritization have both higher replication rates and consistently stronger enrichment of eQTLs. Within each risk locus, GenoWAP may be able to distinguish functional sites from groups of correlated SNPs. AVAILABILITY AND IMPLEMENTATION GenoWAP is freely available on the web at http://genocanyon.med.yale.edu/GenoWAP.
Affiliation(s)
- Qiongshi Lu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | | | - Yiming Hu
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
| | - Hongyu Zhao
- Department of Biostatistics, Yale School of Public Health, New Haven, CT, USA
- Program of Computational Biology and Bioinformatics, Yale University, New Haven, CT, USA
- VA Cooperative Studies Program Coordinating Center, West Haven, CT, USA
| |
|