1
Verma SS, Guare L, Ehsan S, Gastounioti A, Scales G, Ritchie MD, Kontos D, McCarthy AM. Genome-Wide Association Study of Breast Density among Women of African Ancestry. Cancers (Basel) 2023; 15:2776. [PMID: 37345113 DOI: 10.3390/cancers15102776] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 05/03/2023] [Accepted: 05/11/2023] [Indexed: 06/23/2023]
Abstract
Breast density, the amount of fibroglandular versus fatty tissue in the breast, is a strong breast cancer risk factor. Understanding genetic factors associated with breast density may help clarify the mechanisms by which breast density increases cancer risk. To date, 50 genetic loci have been associated with breast density; however, these studies were performed predominantly in populations of European ancestry. We used a cohort of women aged 40-85 years who underwent screening mammography and had genetic information available from the Penn Medicine BioBank to conduct a Genome-Wide Association Study (GWAS) of breast density among 1323 women of African ancestry. For each mammogram, the publicly available "LIBRA" software was used to quantify dense area and area percent density. We identified 34 significant loci associated with dense area and area percent density, with the strongest signals in GACAT3, CTNNA3, HSD17B6, UGDH, TAAR8, ARHGAP10, BOD1L2, and NR3C2. There was significant overlap between previously identified breast cancer SNPs and SNPs associated with breast density. Our results highlight the importance of breast density GWAS in diverse populations, including African ancestry populations. These results may provide novel insights into genetic factors associated with breast density and help elucidate the mechanisms by which density increases breast cancer risk.
Affiliation(s)
- Shefali Setia Verma
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Lindsay Guare
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Sarah Ehsan
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Aimilia Gastounioti
- Washington University School of Medicine in St. Louis, St. Louis, MO 63130, USA
- Marylyn D Ritchie
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Despina Kontos
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Anne Marie McCarthy
- Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
2
Chattopadhyay A, Shih CY, Hsu YC, Juang JMJ, Chuang EY, Lu TP. CLIN_SKAT: an R package to conduct association analysis using functionally relevant variants. BMC Bioinformatics 2022; 23:441. [PMID: 36274122 PMCID: PMC9590128 DOI: 10.1186/s12859-022-04987-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2022] [Accepted: 10/16/2022] [Indexed: 12/03/2022]
Abstract
Background The availability of next-generation sequencing data allows low-frequency and rare variants to be studied through strategies other than the commonly used genome-wide association studies (GWAS). Rare variants are important keys to explaining the heritability of complex diseases that remains unexplained by common variants, whose effect sizes are low. However, analysis strategies struggle to keep up with the huge amount of available data, creating a bottleneck. This study describes CLIN_SKAT, an R package that provides users with an easily implemented analysis pipeline with the goal of (i) extracting clinically relevant variants (both rare and common), followed by (ii) gene-based association analysis by grouping the selected variants.
Results CLIN_SKAT offers four simple functions that can be used to obtain clinically relevant variants, map them to genes or gene sets, calculate weights from global healthy populations, and conduct weighted case–control analysis. CLIN_SKAT adds pre-analysis steps and customizable features that make the SKAT results more clinically meaningful. It also offers several plotting functions for visualizing and interpreting the analysis results. CLIN_SKAT is available on Windows/Linux/macOS and runs on R version 4.0.4 or later. It can be freely downloaded from https://github.com/ShihChingYu/CLIN_SKAT, installed through devtools::install_github("ShihChingYu/CLIN_SKAT", force=T), and executed by loading the package into R using library(CLIN_SKAT). All outputs (tabular and graphical) can be downloaded in simple, publishable formats.
Conclusions Statistical association analysis is often underpowered because of small sample sizes and the large number of variants to be tested, which limits the detection of causal variants. Therefore, retaining a subset of biologically meaningful variants seems to be a more effective strategy for identifying explainable associations while reducing the degrees of freedom. CLIN_SKAT offers users a one-stop R package that identifies disease risk variants with improved power via a series of tailor-made procedures that allow dimension reduction, by retaining functionally relevant variants, and incorporate ethnicity-based priors. Furthermore, it eliminates the need for high computational resources and bioinformatics expertise. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04987-2.
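CLIN_SKAT calculates variant weights from healthy-population allele frequencies and then runs a weighted case–control test. As a rough illustration of that idea only (this is not the package's API and not SKAT's variance-component kernel test), the Python sketch below assumes the Beta(1, 25) density on minor allele frequency (MAF), the default weighting commonly used with SKAT, and applies a simpler weighted burden test with a permutation p-value. All function names are hypothetical.

```python
import random
from math import exp, lgamma, log
from statistics import mean


def beta_weight(maf: float, a: float = 1.0, b: float = 25.0) -> float:
    """Beta(a, b) density at the MAF; rarer variants receive larger weights."""
    if not 0.0 < maf < 1.0:
        raise ValueError("MAF must lie strictly between 0 and 1")
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1.0) * log(maf) + (b - 1.0) * log(1.0 - maf))


def weighted_burden(genotypes: list[list[int]], mafs: list[float]) -> list[float]:
    """Per-individual score: sum over variants of weight * minor-allele count (0, 1 or 2)."""
    weights = [beta_weight(m) for m in mafs]
    return [sum(w * g for w, g in zip(weights, row)) for row in genotypes]


def burden_permutation_test(scores, is_case, n_perm=1000, seed=0):
    """Mean burden difference (cases - controls) with a two-sided permutation p-value."""
    def mean_diff(labels):
        cases = [s for s, c in zip(scores, labels) if c]
        controls = [s for s, c in zip(scores, labels) if not c]
        return mean(cases) - mean(controls)

    observed = mean_diff(is_case)
    rng = random.Random(seed)
    labels = list(is_case)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # shuffling keeps the numbers of cases and controls fixed
        if abs(mean_diff(labels)) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

In practice the MAFs would come from a reference population (the abstract mentions "global healthy populations"); the weighting makes a single very rare allele contribute far more to an individual's score than a common one.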
3
Kumar A, Sandhu N, Kumar P, Pruthi G, Singh J, Kaur S, Chhuneja P. Genome-wide identification and in silico analysis of NPF, NRT2, CLC and SLAC1/SLAH nitrate transporters in hexaploid wheat (Triticum aestivum). Sci Rep 2022; 12:11227. [PMID: 35781289 PMCID: PMC9250930 DOI: 10.1038/s41598-022-15202-w] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/15/2021] [Accepted: 06/20/2022] [Indexed: 11/09/2022]
Abstract
Nitrogen transport, mediated by specialized transmembrane proteins, is one of the most important processes in plants. Plants have two main systems for taking up nitrogen from the soil and transporting it within the plant: a low-affinity transport system and a high-affinity transport system. Nitrate transporters are of special interest in cereal crops because large amounts of money are spent on N fertilizers every year to enhance crop productivity. To date, four gene families of nitrate transporter proteins have been reported in plants: NPF (nitrate transporter 1/peptide transporter family), NRT2 (nitrate transporter 2 family), CLC (chloride channel family), and SLAC/SLAH (slow anion channel-associated homologues). In our study, in silico mining of nitrate transporter genes was carried out, along with detailed structural, phylogenetic, and expression analyses. A total of 412 nitrate transporter genes were identified in the hexaploid wheat genome using HMMER-based homology searches in IWGSC RefSeq v2.0. Of these, 20 genes were root-specific, 11 leaf/shoot-specific, and 17 grain/spike-specific. The identification of nitrate transporter genes in close proximity to 67 previously identified marker-trait associations for nitrogen use efficiency-related traits in a nested synthetic hexaploid wheat introgression library indicates the robustness of the reported transporter genes. Detailed analysis of the crosstalk between the genome and proteome, together with validation of the identified putative candidate genes through expression and gene editing studies, may lay the foundation for improving the nitrogen use efficiency of cereal crops.
Affiliation(s)
- Aman Kumar
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
- Nitika Sandhu
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
- Pankaj Kumar
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
- Gomsie Pruthi
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
- Jasneet Singh
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
- Satinder Kaur
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
- Parveen Chhuneja
- School of Agricultural Biotechnology, Punjab Agricultural University, Ludhiana, Punjab, India
4
Slim L, Chatelain C, Foucauld HD, Azencott CA. A systematic analysis of gene-gene interaction in multiple sclerosis. BMC Med Genomics 2022; 15:100. [PMID: 35501860 PMCID: PMC9063218 DOI: 10.1186/s12920-022-01247-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/30/2021] [Accepted: 03/28/2022] [Indexed: 12/03/2022]
Abstract
Background For the most part, genome-wide association studies (GWAS) have only partially explained the heritability of complex diseases. One of their limitations is that they assume independent contributions of individual variants to the phenotype. Many tools have therefore been developed to investigate interactions between distant loci, or epistasis. Among them, the recently proposed EpiGWAS models the interactions between a target variant and the rest of the genome. However, applying this approach to study interactions along all genes of a disease map is not straightforward. Here, we propose a pipeline to that effect, which we illustrate by investigating a multiple sclerosis GWAS dataset from the Wellcome Trust Case Control Consortium 2 through 19 disease maps from the MetaCore pathway database. Results For each disease map, we build an epistatic network by connecting the genes that are deemed to interact. These networks tend to be connected, complementary to the disease maps, and contain hubs. In addition, we report 4 epistatic gene pairs involving missense variants and 25 gene pairs with a deleterious epistatic effect mediated by eQTLs. Among these, we highlight the interactions of GLI-1 with SUFU and of IP10 with NF-κB, as both match known biological interactions. The latter pair is particularly promising for therapeutic development, as both genes have known inhibitors. Conclusions Our study showcases the ability of EpiGWAS to uncover biologically interpretable epistatic interactions that are potentially actionable for the development of combination therapy.
Supplementary Information The online version contains supplementary material available at 10.1186/s12920-022-01247-3.
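The network step described above (connect genes deemed to interact, then check connectivity and look for hubs) can be sketched in a few lines of Python. This is a minimal illustration, assuming the interacting gene pairs have already been produced by an epistasis test such as EpiGWAS; the hub cut-off `min_degree` is a hypothetical choice, since the abstract does not define what counts as a hub.

```python
from collections import defaultdict, deque


def build_network(gene_pairs):
    """Undirected adjacency sets from (gene, gene) pairs judged to interact."""
    adj = defaultdict(set)
    for a, b in gene_pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def is_connected(adj):
    """Breadth-first search from an arbitrary gene; connected if every gene is reached."""
    if not adj:
        return True
    start = next(iter(adj))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in adj[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adj)


def hubs(adj, min_degree):
    """Genes with at least min_degree interaction partners (hypothetical hub cut-off)."""
    return sorted(g for g, partners in adj.items() if len(partners) >= min_degree)
```

For example, `hubs(build_network([("G1", "G2"), ("G1", "G3"), ("G1", "G4")]), 3)` flags the placeholder gene `G1` as the only hub.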
Affiliation(s)
- Lotfi Slim
- CBIO, MINES ParisTech, PSL Research University, 75006, Paris, France; Translational Sciences, SANOFI R&D, 91385, Chilly-Mazarin, France; NVIDIA Corporation, Santa Clara, 95051, USA
- Chloé-Agathe Azencott
- CBIO, MINES ParisTech, PSL Research University, 75006, Paris, France; Institut Curie, PSL Research University, 75005, Paris, France; U900, Inserm, 75005, Paris, France
5
Coltelli L, Allegrini G, Orlandi P, Finale C, Fontana A, Masini LC, Scalese M, Arrighi G, Barletta MT, De Maio E, Banchi M, Fini E, Guidi P, Frenzilli G, Donati S, Giovannelli S, Tanganelli L, Salvadori B, Livi L, Meattini I, Pazzagli I, Di Lieto M, Pistelli M, Casadei V, Ferro A, Cupini S, Orlandi F, Francesca D, Lorenzini G, Barellini L, Falcone A, Cosimi A, Bocci G. A pharmacogenetic interaction analysis of bevacizumab with paclitaxel in advanced breast cancer patients. NPJ Breast Cancer 2022; 8:33. [PMID: 35314692 PMCID: PMC8938486 DOI: 10.1038/s41523-022-00400-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2021] [Accepted: 02/07/2022] [Indexed: 11/18/2022]
Abstract
This study investigated pharmacogenetic interactions among VEGF-A, VEGFR-2, IL-8, HIF-1α, EPAS-1, and TSP-1 SNPs and their role in progression-free survival (PFS) in metastatic breast cancer (MBC) patients treated with bevacizumab plus first-line paclitaxel or with paclitaxel alone. Analyses were performed on germline DNA, and SNPs were analyzed by real-time PCR. The multifactor dimensionality reduction (MDR) methodology was applied to investigate interactions between SNPs. The present study was an exploratory, ambidirectional cohort study: 307 patients from 11 oncology units were evaluated retrospectively from 2009 to 2016 and then followed prospectively (NCT01935102). Two hundred and fifteen patients were treated with paclitaxel and bevacizumab, whereas 92 patients received paclitaxel alone. In the bevacizumab plus paclitaxel group, the MDR software identified two pharmacogenetic interaction profiles consisting of combinations of specific VEGF-A rs833061 and VEGFR-2 rs1870377 genotypes. Median PFS was 16.8 months for the favorable genetic profile vs. 10.6 months for the unfavorable genetic profile (p = 0.0011). The Cox proportional hazards model showed an adjusted hazard ratio of 0.64 (95% CI, 0.5–0.9; p = 0.004). Median overall survival (OS) was 39.6 months for the favorable genetic profile vs. 28 months for the unfavorable genetic profile (p = 0.0103). The Cox proportional hazards model revealed an adjusted hazard ratio of 0.71 (95% CI, 0.5–1.01; p = 0.058). In the 92 patients treated with paclitaxel alone, the favorable genetic profile showed no effect compared with the unfavorable genetic profile on either PFS (p = 0.509) or OS (p = 0.732). The pharmacogenetic statistical interaction between VEGF-A rs833061 and VEGFR-2 rs1870377 genotypes may identify a population of bevacizumab-treated patients with better PFS.
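The core step of MDR pools multi-locus genotype combinations into a single high-risk/low-risk attribute. A minimal Python sketch of that step for two SNPs is shown below, assuming a binary case/control outcome (the abstract does not say how PFS was dichotomized for MDR) and omitting the cross-validation and permutation testing that full MDR implementations use. Function names are hypothetical.

```python
from collections import Counter


def mdr_high_risk_cells(geno_a, geno_b, is_case):
    """Label a two-locus genotype cell high risk if its case:control ratio exceeds the overall ratio."""
    cases, controls = Counter(), Counter()
    for a, b, c in zip(geno_a, geno_b, is_case):
        (cases if c else controls)[(a, b)] += 1
    n_cases, n_controls = sum(cases.values()), sum(controls.values())
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need at least one case and one control")
    threshold = n_cases / n_controls
    high = set()
    for cell in set(cases) | set(controls):
        n_ca, n_co = cases[cell], controls[cell]
        # A cell with cases but no controls has an unbounded ratio, so it is high risk.
        if n_co == 0 or n_ca / n_co > threshold:
            high.add(cell)
    return high


def balanced_accuracy(geno_a, geno_b, is_case, high):
    """Evaluate the pooled high/low-risk attribute as a classifier of case status."""
    tp = fn = tn = fp = 0
    for a, b, c in zip(geno_a, geno_b, is_case):
        predicted = (a, b) in high
        if c:
            tp += predicted
            fn += not predicted
        else:
            fp += predicted
            tn += not predicted
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Genotypes are coded as 0/1/2 minor-allele counts per SNP. Balanced accuracy is used because it is not distorted when cases and controls are unequal in number.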
6
Duroux D, Climente-González H, Azencott CA, Van Steen K. Interpretable network-guided epistasis detection. Gigascience 2022; 11:6521880. [PMID: 35134928 PMCID: PMC8848319 DOI: 10.1093/gigascience/giab093] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/04/2021] [Revised: 10/12/2021] [Accepted: 12/13/2021] [Indexed: 11/15/2022]
Abstract
Background Detecting epistatic interactions at the gene level is essential to understanding the biological mechanisms of complex diseases. Unfortunately, genome-wide interaction association studies involve many statistical challenges that make such detection hard. We propose a multi-step protocol for epistasis detection along the edges of a gene-gene co-function network. Such an approach reduces the number of tests performed and provides interpretable interactions while keeping the type I error controlled. Yet mapping gene interactions into testable single-nucleotide polymorphism (SNP)-interaction hypotheses, as well as computing gene pair association scores from SNP pair scores, is not trivial. Results Here we compare 3 SNP-gene mappings (positional overlap, expression quantitative trait loci, and proximity in 3D structure) and use the adaptive truncated product method to compute gene pair scores. This method is non-parametric, does not require a known null distribution, and is fast to compute. We apply multiple variants of this protocol to a genome-wide association study dataset on inflammatory bowel disease. Different configurations produced different results, highlighting that various mechanisms are implicated in inflammatory bowel disease, while at the same time, the results overlapped with known disease characteristics. Importantly, the results of the proposed pipeline also differ from those of a conventional approach in which no network is used, showing the potential for additional discoveries when prior biological knowledge is incorporated into epistasis detection.
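The adaptive truncated product method combines the SNP-pair p-values of a gene pair by multiplying only those below a threshold τ, trying several thresholds and calibrating the best one empirically. The Python sketch below is a simplified illustration of that idea, not the authors' implementation: it assumes the caller supplies null p-value sets (for example, from phenotype permutations), since the abstract does not describe how the null is generated, and it uses all observed and null sets as the empirical reference at both levels.

```python
from math import log


def truncated_log_product(pvals, tau):
    """-log of the truncated product: sum of -log(p) over p-values <= tau (0 if none qualify)."""
    return sum(-log(p) for p in pvals if p <= tau)


def adaptive_tpm(observed, null_sets, taus):
    """Adaptive truncated product p-value, calibrated with caller-supplied null p-value sets.

    observed: SNP-pair p-values (all > 0) for one gene pair.
    null_sets: p-value lists for the same gene pair under the null (assumed given).
    taus: candidate truncation thresholds.
    """
    all_sets = [observed] + list(null_sets)
    n = len(all_sets)
    # stats[k][t]: statistic of set k at threshold taus[t]; larger means more significant.
    stats = [[truncated_log_product(s, t) for t in taus] for s in all_sets]

    def min_p(k):
        # Empirical p-value of set k at each threshold, minimised over thresholds.
        return min(
            sum(stats[j][t] >= stats[k][t] for j in range(n)) / n
            for t in range(len(taus))
        )

    min_ps = [min_p(k) for k in range(n)]
    # Second level: how often a set's best p-value is at least as small as the observed one.
    return sum(mp <= min_ps[0] for mp in min_ps) / n
```

Reusing the same null sets for both levels avoids a second, nested round of permutations, which is what keeps this kind of adaptive procedure cheap to compute.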
Affiliation(s)
- Diane Duroux
- BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium
- Héctor Climente-González
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France; High-Dimensional Statistical Modeling Team, RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
- Chloé-Agathe Azencott
- Institut Curie, PSL Research University, F-75005 Paris, France; INSERM, U900, F-75005 Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, 75006 Paris, France
- Kristel Van Steen
- BIO3 - Systems Genetics, GIGA-R Medical Genomics, University of Liège, 4000 Liège, Belgium; BIO3 - Systems Medicine, Department of Human Genetics, KU Leuven, 3000 Leuven, Belgium
7
Investigation of gene-gene interactions in cardiac traits and serum fatty acid levels in the LURIC Health Study. PLoS One 2020; 15:e0238304. [PMID: 32915819 PMCID: PMC7485803 DOI: 10.1371/journal.pone.0238304] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/13/2019] [Accepted: 08/13/2020] [Indexed: 01/25/2023]
Abstract
Epistasis analysis elucidates the effects of gene-gene interactions (G×G) between multiple loci on complex traits. However, the large computational demands and the heavy multiple testing burden impede the discovery of such interactions. Here, we illustrate the use of two methods, main effect filtering based on individual GWAS results and biological knowledge-based modeling through the Biofilter software, to reduce the number of interactions tested among single nucleotide polymorphisms (SNPs) for 15 cardiac-related traits and 14 fatty acids. We performed interaction analyses using the two filtering methods, adjusting for age, sex, body mass index (BMI), waist-hip ratio, and the first three principal components of the genetic data, among 2,824 samples from the Ludwigshafen Risk and Cardiovascular (LURIC) Health Study. Using Biofilter, one interaction nearly reached Bonferroni significance: an interaction between rs7735781 in XRCC4 and rs10804247 in XRCC5 for venous thrombosis, with a Bonferroni-adjusted likelihood ratio test (LRT) p = 0.0627. Main effect filtering identified a total of 57 interactions at Bonferroni-adjusted LRT p < 0.05: 10 for the cardiac traits and 47 for the fatty acids. For the cardiac traits, the top interaction, between rs1383819 in SNTG1 and rs1493939 (138 kb 5’ of SAMD12), was associated with a history of arterial hypertension (Bonferroni-adjusted LRT p = 0.0228). For the fatty acids, the top interaction, between rs4839193 in KCND3 and rs10829717 in LOC107984002, was associated with 9-trans 12-trans octadecanoic acid, an omega-6 trans fatty acid (Bonferroni-adjusted LRT p = 2.28×10⁻⁵). The inflation factor of the interaction models under the different filtering methods was evaluated using the standard median approach and a linear regression approach. Overall, we applied filtering approaches to identify numerous genetic interactions related to cardiac outcomes that are potential targets for therapy.
The approaches described offer ways to detect epistasis in complex traits and to improve precision medicine.
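Two of the statistical steps mentioned above, Bonferroni adjustment and the median-based inflation factor, are short enough to sketch directly. The Python below is a minimal illustration using only the standard library; it assumes each p-value comes from a 1-degree-of-freedom chi-square test (an assumption for the interaction LRTs, whose degrees of freedom are not stated in the abstract) and does not cover the linear regression approach.

```python
from statistics import NormalDist, median

# Median of a 1-df chi-square: if Z ~ N(0, 1), P(Z**2 <= q) = 0.5 gives q = Phi^-1(0.75)**2 (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def inflation_factor_median(pvals):
    """Median-based lambda, treating each p-value as coming from a 1-df chi-square test."""
    if any(not 0.0 < p <= 1.0 for p in pvals):
        raise ValueError("p-values must lie in (0, 1]")
    # Two-sided p -> squared z-score; inv_cdf(p / 2) keeps precision for very small p.
    chi2 = [NormalDist().inv_cdf(p / 2) ** 2 for p in pvals]
    return median(chi2) / CHI2_1DF_MEDIAN
```

A lambda close to 1 indicates that the bulk of the test statistics behave as expected under the null; values well above 1 suggest inflation from confounding or model misspecification.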
8
Basile AO, Byrska-Bishop M, Wallace J, Frase AT, Ritchie MD. Novel features and enhancements in BioBin, a tool for the biologically inspired binning and association analysis of rare variants. Bioinformatics 2018; 34:527-529. [PMID: 28968757 PMCID: PMC5860358 DOI: 10.1093/bioinformatics/btx559] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/26/2017] [Accepted: 09/13/2017] [Indexed: 11/27/2022]
Abstract
Motivation BioBin is an automated bioinformatics tool for the multi-level biological binning of sequence variants. Herein, we present a significant update to BioBin that expands the software to facilitate comprehensive rare variant analysis and incorporates novel features and analysis enhancements. Results In BioBin 2.3, we extend our software tool by implementing statistical association testing, updating the binning algorithm, and incorporating novel analysis features, providing a robust, highly customizable, and unified rare variant analysis tool. Availability and implementation The BioBin software package is open source and freely available to users at http://www.ritchielab.com/software/biobin-download. Supplementary information Supplementary data are available at Bioinformatics online.
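Biological binning collapses many rare variants into one feature per biological unit so that they can be tested together. The Python sketch below shows the simplest version of that idea, per-gene bins of minor-allele counts. It is not BioBin's algorithm, which draws on a knowledge base and supports multiple bin levels (for example, pathways and regions); the gene annotation, the function name, and the `maf_threshold` default are assumptions for illustration.

```python
from collections import defaultdict


def bin_rare_variants(variants, genotypes, maf_threshold=0.05):
    """Collapse rare variants into per-gene bins of minor-allele counts.

    variants: (variant_id, gene, maf) tuples; the gene annotation is assumed to be given.
    genotypes: variant_id -> minor-allele counts (0, 1, 2), one entry per individual.
    maf_threshold: hypothetical rarity cut-off; variants at or above it are skipped.
    """
    if not genotypes:
        return {}
    n_individuals = len(next(iter(genotypes.values())))
    bins = defaultdict(lambda: [0] * n_individuals)
    for variant_id, gene, maf in variants:
        if maf >= maf_threshold:
            continue  # common variant: left out of the rare-variant bin
        for i, count in enumerate(genotypes[variant_id]):
            bins[gene][i] += count
    return dict(bins)
```

Each bin then behaves like a single aggregated genotype, which is what allows a burden-style association test on variants that are individually too rare to test.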
Affiliation(s)
- Anna O Basile
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Marta Byrska-Bishop
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- John Wallace
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- Alexander T Frase
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17822 USA
9
Manduchi E, Williams SM, Chesi A, Johnson ME, Wells AD, Grant SFA, Moore JH. Leveraging epigenomics and contactomics data to investigate SNP pairs in GWAS. Hum Genet 2018; 137:413-425. [PMID: 29797095 PMCID: PMC5996751 DOI: 10.1007/s00439-018-1893-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/06/2018] [Accepted: 05/20/2018] [Indexed: 12/29/2022]
Abstract
Although genome-wide association studies (GWAS) have led to many valuable insights into the genetic bases of common diseases over the past decade, the issue of missing heritability has surfaced, as the main-effect genetic variants discovered to date do not account for much of a trait's predicted genetic component. We present a workflow, integrating epigenomics and topologically associating domain data, aimed at discovering trait-associated SNP pairs from GWAS in which neither SNP achieved genome-wide significance on its own. Each analyzed SNP pair consists of one SNP in a putative active enhancer and another SNP in a putative physically interacting gene promoter in a trait-relevant tissue. As a proof-of-principle case study, we used this approach to identify focused collections of SNP pairs, which we analyzed in three independent type 2 diabetes (T2D) GWAS. This approach led us to discover 35 significant SNP pairs, encompassing both novel signals and signals for which we have found orthogonal support from other sources. Nine of these pairs are consistent with eQTL results, two are consistent with our own capture C experiments, and seven involve signals supported by recent T2D literature.
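The pair-construction rule described above (one SNP in a putative enhancer, the other in the promoter of a gene that the enhancer physically contacts) can be sketched as a simple filter. The Python below is a minimal illustration under stated assumptions: enhancer and promoter intervals and the enhancer-gene contacts are taken as given inputs, the labels and SNP ids are placeholders, and the linear scan would normally be replaced by an interval index for genome-scale data.

```python
from itertools import product


def snps_in(intervals, snps):
    """Map each labelled interval (chrom, start, end), inclusive, to the SNP ids inside it."""
    return {
        label: [sid for sid, (c, pos) in snps.items() if c == chrom and start <= pos <= end]
        for label, (chrom, start, end) in intervals.items()
    }


def candidate_snp_pairs(enhancers, promoters, contacts, snps):
    """Enhancer-SNP x promoter-SNP pairs for each (enhancer, gene) contact."""
    enh_snps = snps_in(enhancers, snps)
    prom_snps = snps_in(promoters, snps)
    pairs = set()
    for enhancer, gene in contacts:
        for a, b in product(enh_snps.get(enhancer, []), prom_snps.get(gene, [])):
            if a != b:
                pairs.add((a, b))
    return sorted(pairs)
```

Only the pairs returned by such a filter would then be tested for interaction, which is how the workflow keeps the multiple testing burden far below that of an exhaustive pairwise scan.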
Affiliation(s)
- Elisabetta Manduchi
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Scott M Williams
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, USA
- Alessandra Chesi
- Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Matthew E Johnson
- Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Andrew D Wells
- Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Struan F A Grant
- Division of Human Genetics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Center for Spatial and Functional Genomics, The Children's Hospital of Philadelphia, Philadelphia, PA, USA
- Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA, USA
10
Ritchie MD, Van Steen K. The search for gene-gene interactions in genome-wide association studies: challenges in abundance of methods, practical considerations, and biological interpretation. Ann Transl Med 2018; 6:157. [PMID: 29862246 DOI: 10.21037/atm.2018.04.05] [Citation(s) in RCA: 58] [Impact Index Per Article: 9.7] [Indexed: 12/18/2022]
Abstract
One of the primary goals in this era of precision medicine is to understand the biology of human diseases and their treatment, so that each patient receives the best possible treatment for their disease based on their genetic background and environmental exposures. One way to work toward this goal is to identify the environmental exposures and genetic variants that are relevant to each disease in question, as well as the complex interplay between genes and environment. Genome-wide association studies (GWAS) have allowed for a greater understanding of the genetic component of many complex traits. However, these genetic effects are largely small, and thus our ability to use GWAS findings for precision medicine is limited. As more GWAS have been performed, many researchers have begun to look beyond common single nucleotide polymorphisms (SNPs) and additive genetic models to explore alternative heritable components of complex traits, including rare variants, structural variants, epigenetics, and genetic interactions. While genetic interactions could plausibly explain some of the heritability that has not yet been identified, especially considering the genetic interactions identified in model organisms and our understanding of biological complexity, there are still significant challenges and considerations in identifying them. Broadly, these can be summarized in three categories: abundance of methods, practical considerations, and biological interpretation. In this review, we will discuss these important elements in the search for genetic interactions along with some potential solutions. While genetic interactions are theoretically understood to be important for complex human disease, the body of evidence supporting this component of the genetic architecture of complex human traits is still building.
Our hope is that more sophisticated modeling approaches and more robust computational techniques will enable the community to identify these important genetic interactions and improve our ability to implement precision medicine in the future.
Affiliation(s)
- Marylyn D Ritchie
- Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Kristel Van Steen
- WELBIO, GIGA-R Medical Genomics Unit - BIO3, University of Liège, Liège, Belgium; Department of Human Genetics, University of Leuven, Leuven, Belgium
11
Verma SS, Josyula N, Verma A, Zhang X, Veturi Y, Dewey FE, Hartzel DN, Lavage DR, Leader J, Ritchie MD, Pendergrass SA. Rare variants in drug target genes contributing to complex diseases, phenome-wide. Sci Rep 2018; 8:4624. [PMID: 29545597 PMCID: PMC5854600 DOI: 10.1038/s41598-018-22834-4] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 11/22/2017] [Accepted: 03/01/2018] [Indexed: 12/30/2022]
Abstract
The DrugBank database includes approximately 800 genes that are well-characterized drug targets. This list of genes is a useful resource for association testing. For example, loss-of-function (LOF) genetic variation has the potential to mimic the effect of drugs, and high-impact variation in these genes can affect downstream traits. Identifying novel associations between genetic variation in these genes and a range of diseases can also uncover new uses for the drugs that target them. Phenome-wide association studies (PheWAS) have been successful in identifying genetic associations across hundreds of thousands of diseases. We conducted a novel gene-based PheWAS to test the effect of rare variants in DrugBank genes, evaluating associations between these genes and more than 500 quantitative and dichotomous phenotypes. We used whole exome sequencing data from 38,568 samples in the Geisinger MyCode Community Health Initiative. We compared the results obtained when rare variants were binned using various filters based on potential functional impact. We identified multiple novel associations, and the majority of the significant associations were driven by functionally annotated variation. Overall, this study provides a broad exploration of rare variant associations within functionally relevant genes across a wide range of diagnoses.
Affiliation(s)
- Shefali Setia Verma
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Navya Josyula
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
- Anurag Verma
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Xinyuan Zhang
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Yogasudha Veturi
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Dustin N Hartzel
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Daniel R Lavage
- Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Joe Leader
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA; Phenomic Analytics and Clinical Data Core, Geisinger, Danville, PA, USA
- Marylyn D Ritchie
- Perelman School of Medicine, Department of Genetics, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger, Danville, PA, 17221, USA
12
Manduchi E, Chesi A, Hall MA, Grant SFA, Moore JH. Leveraging putative enhancer-promoter interactions to investigate two-way epistasis in Type 2 Diabetes GWAS. Pac Symp Biocomput 2018; 23:548-558. [PMID: 29218913 PMCID: PMC5728670] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
We used evidence for enhancer-promoter interactions from functional genomics data to build biological filters that narrow the search space for two-way single nucleotide polymorphism (SNP) interactions in type 2 diabetes (T2D) genome-wide association studies (GWAS). This led to the identification of a reproducible, statistically significant SNP pair associated with T2D. More functional genomics data are being generated that can help identify potentially interacting enhancer-promoter pairs in a larger collection of tissues and cell types. As a result, this approach has implications for investigating epistasis from GWAS in general.
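The biological filter described above can be sketched as follows. The sketch makes two assumptions: enhancer and promoter regions are given as simple (chromosome, start, end) intervals, and enhancer-promoter links come from some functional genomics source. Only SNP pairs that span a linked enhancer and promoter are kept for interaction testing. All names and coordinates are hypothetical; this is not the authors' code.

```python
from itertools import combinations, product


def snps_in(intervals, snps):
    """Map each interval ID to the set of SNP IDs whose (chrom, pos) fall inside it."""
    return {
        iid: {s for s, (c, p) in snps.items() if c == chrom and start <= p <= end}
        for iid, (chrom, start, end) in intervals.items()
    }


def enhancer_promoter_pairs(snps, enhancers, promoters, links):
    """SNP pairs with one SNP in an enhancer and the other in a promoter linked to it."""
    in_enh = snps_in(enhancers, snps)
    in_pro = snps_in(promoters, snps)
    pairs = set()
    for enh_id, pro_id in links:
        for s1, s2 in product(in_enh.get(enh_id, set()), in_pro.get(pro_id, set())):
            if s1 != s2:
                pairs.add(tuple(sorted((s1, s2))))
    return pairs


# Hypothetical SNP positions, regulatory intervals, and one enhancer-promoter link
snps = {"rs1": ("chr1", 150), "rs2": ("chr1", 5050),
        "rs3": ("chr2", 300), "rs4": ("chr1", 9000)}
enhancers = {"E1": ("chr1", 100, 200)}
promoters = {"P1": ("chr1", 5000, 5100), "P2": ("chr2", 250, 350)}
links = [("E1", "P1")]

kept = enhancer_promoter_pairs(snps, enhancers, promoters, links)
exhaustive = list(combinations(snps, 2))
print(len(kept), "of", len(exhaustive), "pairs kept")
```

rs3 lies in promoter P2, but P2 is not linked to any enhancer, so no pair involving rs3 is kept.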
Affiliation(s)
- Elisabetta Manduchi
- Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, 3700 Hamilton Walk, Philadelphia, PA, 19104, USA; Division of Human Genetics and Endocrinology, The Children's Hospital of Philadelphia, 3615 Civic Center Boulevard, Philadelphia, PA 19104, USA
13
Kim D, Li R, Lucas A, Verma SS, Dudek SM, Ritchie MD. Using knowledge-driven genomic interactions for multi-omics data analysis: metadimensional models for predicting clinical outcomes in ovarian carcinoma. J Am Med Inform Assoc 2017; 24:577-587. [PMID: 28040685 DOI: 10.1093/jamia/ocw165] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 03/13/2016] [Accepted: 12/02/2016] [Indexed: 02/07/2023]
Abstract
Because tumors are heterogeneous, cancer patients often have different molecular signatures even when they share clinical features such as histology. To address this variability, we previously developed an approach that incorporates prior biological knowledge to identify knowledge-driven genomic interactions associated with outcomes of interest. However, no systematic approach has been proposed to identify interaction models between pathways based on multi-omics data. Here we propose such a framework, called metadimensional knowledge-driven genomic interactions (MKGIs). To test its utility, we applied it to an ovarian cancer dataset with multi-omics profiles from The Cancer Genome Atlas to predict grade, stage, and survival outcome. Each knowledge-driven genomic interaction model, built on a different genomic dataset, contained a different set of pathway features. This suggests that each genomic data type may contribute to outcomes in ovarian cancer via a different pathway. In addition, MKGI models significantly outperformed the single knowledge-driven genomic interaction model. The MKGI models revealed many interactions between pathways associated with outcomes, including the mitogen-activated protein kinase (MAPK) and gonadotropin-releasing hormone (GnRH) signaling pathways, which are known to play important roles in cancer pathogenesis. Incorporating biological knowledge into models built on multi-omics data can improve diagnosis and prognosis while providing better interpretability. Thus, characterizing variability in molecular signatures through these interactions between pathways may lead to better diagnostic and treatment strategies for precision medicine.
Affiliation(s)
- Dokyoon Kim
- Biomedical and Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA; Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Ruowang Li
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Anastasia Lucas
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Shefali S Verma
- Biomedical and Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA
- Scott M Dudek
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA; Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
14
Hall MA, Wallace J, Lucas A, Kim D, Basile AO, Verma SS, McCarty CA, Brilliant MH, Peissig PL, Kitchner TE, Verma A, Pendergrass SA, Dudek SM, Moore JH, Ritchie MD. PLATO software provides analytic framework for investigating complexity beyond genome-wide association studies. Nat Commun 2017; 8:1167. [PMID: 29079728 PMCID: PMC5660079 DOI: 10.1038/s41467-017-00802-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 04/28/2016] [Accepted: 07/28/2017] [Indexed: 12/22/2022]
Abstract
Genome-wide, imputed, sequence, and structural data are now available for exceedingly large sample sizes. The needs for data management, handling population structure and related samples, and performing associations have largely been met. However, the infrastructure to support analyses involving complexity beyond genome-wide association studies is not standardized or centralized. We provide the PLatform for the Analysis, Translation, and Organization of large-scale data (PLATO), a software tool equipped to handle multi-omic data for hundreds of thousands of samples to explore complexity using genetic interactions, environment-wide association studies and gene–environment interactions, phenome-wide association studies, as well as copy number and rare variant analyses. Using the data from the Marshfield Personalized Medicine Research Project, a site in the electronic Medical Records and Genomics Network, we apply each feature of PLATO to type 2 diabetes and demonstrate how PLATO can be used to uncover the complex etiology of common traits. Centralized infrastructure to support analyses involving complexity beyond genome-wide association studies is broadly needed. Here, Ritchie and colleagues develop PLATO, a software tool to process and integrate various methods for this task.
Affiliation(s)
- Molly A Hall
- Institute for Biomedical Informatics, Departments of Genetics and Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- John Wallace
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA
- Anastasia Lucas
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA
- Dokyoon Kim
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA
- Anna O Basile
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Shefali S Verma
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA; Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Peggy L Peissig
- Marshfield Clinic Research Institute, Marshfield, WI, 54449, USA
- Anurag Verma
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA; Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, PA, 16802, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA
- Scott M Dudek
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA
- Jason H Moore
- Institute for Biomedical Informatics, Departments of Genetics and Biostatistics and Epidemiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics Institute, Geisinger Health System, Danville, PA, 17821, USA; Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, PA, 16802, USA
15
McAllister K, Mechanic LE, Amos C, Aschard H, Blair IA, Chatterjee N, Conti D, Gauderman WJ, Hsu L, Hutter CM, Jankowska MM, Kerr J, Kraft P, Montgomery SB, Mukherjee B, Papanicolaou GJ, Patel CJ, Ritchie MD, Ritz BR, Thomas DC, Wei P, Witte JS. Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases. Am J Epidemiol 2017; 186:753-761. [PMID: 28978193 PMCID: PMC5860428 DOI: 10.1093/aje/kwx227] [Citation(s) in RCA: 106] [Impact Index Per Article: 15.1] [Received: 11/07/2016] [Revised: 03/14/2017] [Accepted: 03/16/2017] [Indexed: 12/25/2022]
Abstract
Recently, many new approaches, study designs, and statistical and analytical methods have emerged for studying gene-environment interactions (G×Es) in large-scale studies of human populations. The field offers opportunities in two areas in particular. The first is the incorporation of -omics and next-generation sequencing data. The second is continual improvement in measures of the environmental exposures implicated in complex disease outcomes. The National Institute of Environmental Health Sciences and the National Cancer Institute held a workshop, "Current Challenges and New Opportunities for Gene-Environment Interaction Studies of Complex Diseases," on October 17-18, 2014, in conjunction with the annual American Society of Human Genetics meeting. Participants explored new approaches and tools developed in recent years for G×E discovery. This paper highlights current and critical issues and themes in G×E research that need additional consideration, including improved analytical methods, environmental exposure assessment, and the incorporation of functional data and annotations.
Affiliation(s)
- Leah E. Mechanic
- Correspondence to Dr. Leah E. Mechanic, Genomic Epidemiology Branch, Epidemiology and Genomics Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, 9609 Medical Center Drive, Room 4E104, MSC 9763, Bethesda, MD 20892 (e-mail: )
16
Ritchie MD, Davis JR, Aschard H, Battle A, Conti D, Du M, Eskin E, Fallin MD, Hsu L, Kraft P, Moore JH, Pierce BL, Bien SA, Thomas DC, Wei P, Montgomery SB. Incorporation of Biological Knowledge Into the Study of Gene-Environment Interactions. Am J Epidemiol 2017; 186:771-777. [PMID: 28978191 PMCID: PMC5860556 DOI: 10.1093/aje/kwx229] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 11/07/2016] [Revised: 04/07/2017] [Accepted: 04/10/2017] [Indexed: 12/12/2022]
Abstract
A growing knowledge base of genetic and environmental information has greatly enabled the study of disease risk factors. However, the computational complexity and statistical burden of testing all variants by all environments has required novel study designs and hypothesis-driven approaches. We discuss how incorporating biological knowledge from model organisms, functional genomics, and integrative approaches can empower the discovery of novel gene-environment interactions and discuss specific methodological considerations with each approach. We consider specific examples where the application of these approaches has uncovered effects of gene-environment interactions relevant to drug response and immunity, and we highlight how such improvements enable a greater understanding of the pathogenesis of disease and the realization of precision medicine.
Affiliation(s)
- Marylyn D. Ritchie
- Correspondence to Dr. Stephen B. Montgomery, Departments of Genetics and Pathology, Stanford University School of Medicine, Stanford, CA 94305 (e-mail: ); or Dr. Marylyn D. Ritchie, Geisinger Health System, 205 Hood Center for Health Research, Center Street, Danville, PA 17821 (e-mail: )
- Stephen B. Montgomery
- Correspondence to Dr. Stephen B. Montgomery, Departments of Genetics and Pathology, Stanford University School of Medicine, Stanford, CA 94305 (e-mail: ); or Dr. Marylyn D. Ritchie, Geisinger Health System, 205 Hood Center for Health Research, Center Street, Danville, PA 17821 (e-mail: )
17
Holzinger ER, Verma SS, Moore CB, Hall M, De R, Gilbert-Diamond D, Lanktree MB, Pankratz N, Amuzu A, Burt A, Dale C, Dudek S, Furlong CE, Gaunt TR, Kim DS, Riess H, Sivapalaratnam S, Tragante V, van Iperen EP, Brautbar A, Carrell DS, Crosslin DR, Jarvik GP, Kuivaniemi H, Kullo IJ, Larson EB, Rasmussen-Torvik LJ, Tromp G, Baumert J, Cruickshanks KJ, Farrall M, Hingorani AD, Hovingh GK, Kleber ME, Klein BE, Klein R, Koenig W, Lange LA, März W, North KE, Charlotte Onland-Moret N, Reiner AP, Talmud PJ, van der Schouw YT, Wilson JG, Kivimaki M, Kumari M, Moore JH, Drenos F, Asselbergs FW, Keating BJ, Ritchie MD. Discovery and replication of SNP-SNP interactions for quantitative lipid traits in over 60,000 individuals. BioData Min 2017; 10:25. [PMID: 28770004 PMCID: PMC5525436 DOI: 10.1186/s13040-017-0145-5] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 01/02/2017] [Accepted: 07/12/2017] [Indexed: 12/01/2022]
Abstract
BACKGROUND The genetic etiology of human lipid quantitative traits is not fully elucidated, and interactions between variants may play a role. We performed a gene-centric interaction study of four lipid traits: low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), total cholesterol (TC), and triglycerides (TG). RESULTS Our analysis had two phases. The discovery phase used a merged dataset of five cohorts (n = 12,853 to n = 16,849, depending on the lipid phenotype). The replication phase used ten independent cohorts totaling up to 36,938 additional samples. Filters are often applied before interaction testing to reduce the multiple-testing burden of testing all pairwise interactions. We used two filters: (1) a filter that tested only single nucleotide polymorphisms (SNPs) with a main-effect p < 0.001 in a previous association study, and (2) a filter that tested only interactions identified by Biofilter 2.0. Pairwise models that reached an interaction significance level of p < 0.001 in the discovery dataset were tested for replication. We identified thirteen SNP-SNP models that were significant in more than one replication cohort after accounting for multiple testing. CONCLUSIONS These results may provide novel insights into the genetic etiology of lipid levels. Furthermore, we developed a pipeline to perform a computationally efficient interaction analysis with multi-cohort replication.
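The two filtering strategies described above can be sketched minimally as follows, assuming single-SNP p-values and SNP-to-gene assignments are already available. `supported_gene_pairs` is a hypothetical stand-in for Biofilter 2.0 output and does not reflect Biofilter's interface; all gene names are placeholders. The point is the reduction in tests. An exhaustive scan of 1,000 SNPs would require 499,500 pairwise tests. Filtering instead restricts testing to a much smaller candidate set, with a correspondingly less stringent Bonferroni threshold.

```python
from itertools import combinations


def main_effect_filter(main_p, threshold=1e-3):
    """SNPs whose single-SNP (main effect) p-value is below `threshold`."""
    return sorted(snp for snp, p in main_p.items() if p < threshold)


def knowledge_filter(snps, gene_of, supported_gene_pairs):
    """SNP pairs whose genes form a pair listed in a prior-knowledge source.

    `supported_gene_pairs` is a hypothetical stand-in for Biofilter output.
    """
    return [
        (a, b) for a, b in combinations(sorted(snps), 2)
        if frozenset((gene_of[a], gene_of[b])) in supported_gene_pairs
    ]


def bonferroni(n_tests, alpha=0.05):
    """Per-test significance threshold when `n_tests` tests are performed."""
    return alpha / n_tests


# Hypothetical main-effect p-values and SNP-to-gene assignments
main_p = {"rs1": 1e-5, "rs2": 5e-4, "rs3": 0.2, "rs4": 2e-4, "rs5": 0.01}
gene_of = {"rs1": "GENE_A", "rs2": "GENE_B", "rs3": "GENE_C",
           "rs4": "GENE_D", "rs5": "GENE_E"}
supported = {frozenset(("GENE_A", "GENE_C"))}

all_pairs = list(combinations(sorted(main_p), 2))
me_pairs = list(combinations(main_effect_filter(main_p), 2))
kb_pairs = knowledge_filter(main_p, gene_of, supported)
for name, pairs in [("exhaustive", all_pairs), ("main effect", me_pairs),
                    ("knowledge", kb_pairs)]:
    print(f"{name}: {len(pairs)} tests, per-test alpha {bonferroni(len(pairs)):.4f}")
```

The two filters select different pairs: the knowledge-based filter can retain a SNP (rs3 here) with no detectable main effect.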
Affiliation(s)
- Emily R. Holzinger
- Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institute for General Medical Sciences, National Institutes of Health, Baltimore, MD USA
- Shefali S. Verma
- The Center for Systems Genomics, The Pennsylvania State University, University Park, State College, PA USA
- Molly Hall
- The Center for Systems Genomics, The Pennsylvania State University, University Park, State College, PA USA
- Rishika De
- Department of Genetics, Geisel School of Medicine at Dartmouth, Hanover, NH USA
- Nathan Pankratz
- Department of Lab Medicine and Pathology, University of Minnesota, Minneapolis, MN USA
- Amber Burt
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Caroline Dale
- London School of Hygiene and Tropical Medicine, London, UK
- Scott Dudek
- The Center for Systems Genomics, The Pennsylvania State University, University Park, State College, PA USA
- Clement E. Furlong
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Tom R. Gaunt
- MRC Integrative Epidemiology Unit, University of Bristol, Oakfield House, Oakfield Grove, Bristol, UK
- Daniel Seung Kim
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Helene Riess
- Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Vinicius Tragante
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht, The Netherlands
- Department of Medical Genetics, Biomedical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
- Erik P.A. van Iperen
- Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, The Netherlands
- Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, Amsterdam, The Netherlands
- Ariel Brautbar
- Department of Medical Genetics, Marshfield Clinic, Marshfield, WI USA
- David S. Carrell
- Group Health Research Institute, Group Health Cooperative, Seattle, WA USA
- David R. Crosslin
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Gail P. Jarvik
- Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, WA USA
- Helena Kuivaniemi
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Stellenbosch University, Tygerberg, South Africa
- Eric B. Larson
- Group Health Research Institute, Group Health Cooperative, Seattle, WA USA
- Laura J. Rasmussen-Torvik
- Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, Chicago, IL USA
- Gerard Tromp
- Division of Molecular Biology and Human Genetics, Department of Biomedical Sciences, Stellenbosch University, Tygerberg, South Africa
- Jens Baumert
- Institute of Epidemiology II, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Karen J. Cruickshanks
- Department of Population Health Sciences, Department of Ophthalmology and Visual Sciences, University of Wisconsin-Madison, Madison, WI USA
- Martin Farrall
- Department of Cardiovascular Medicine, The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, UK
- Aroon D. Hingorani
- Department of Epidemiology and Public Health, UCL Institute of Epidemiology & Health Care, University College London, London, UK
- G. K. Hovingh
- Department of Vascular Medicine, Academic Medical Center, Amsterdam, The Netherlands
- Marcus E. Kleber
- Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Barbara E. Klein
- Department of Population Health Sciences, Department of Ophthalmology and Visual Sciences, University of Wisconsin-Madison, Madison, WI USA
- Ronald Klein
- Department of Population Health Sciences, Department of Ophthalmology and Visual Sciences, University of Wisconsin-Madison, Madison, WI USA
- Wolfgang Koenig
- Department of Internal Medicine II – Cardiology, University of Ulm Medical Centre, Ulm, Germany
- Leslie A. Lange
- Department of Genetics, University of North Carolina School of Medicine at Chapel Hill, Chapel Hill, NC USA
- Winfried März
- Vth Department of Medicine, Medical Faculty Mannheim, Heidelberg University, Heidelberg, Germany
- Synlab Academy, Synlab Services GmbH, Mannheim, Germany
- Kari E. North
- Department of Epidemiology, School of Public Health, University of North Carolina at Chapel Hill, Chapel Hill, NC USA
- N. Charlotte Onland-Moret
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- Alex P. Reiner
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA USA
- Philippa J. Talmud
- MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Yvonne T. van der Schouw
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht, The Netherlands
- James G. Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS USA
- Mika Kivimaki
- Department of Epidemiology and Public Health, UCL Institute of Epidemiology & Health Care, University College London, London, UK
- Meena Kumari
- Department of Epidemiology and Public Health, UCL Institute of Epidemiology & Health Care, University College London, London, UK
- ISER, University of Essex, Essex, UK
- Jason H. Moore
- Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA USA
- Fotios Drenos
- MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol, UK
- Centre of Cardiovascular Genetics, Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, UK
- Folkert W. Asselbergs
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Utrecht, The Netherlands
- Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, The Netherlands
- Centre of Cardiovascular Genetics, Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, London, UK
- Brendan J. Keating
- Division of Genetics, The Children’s Hospital of Philadelphia, Philadelphia, PA USA
- Division of Transplantation, Department of Surgery, University of Pennsylvania, Philadelphia, PA USA
- Marylyn D. Ritchie
- Biomedical and Translational Informatics, Geisinger Clinic, Danville, PA USA
18
Identifying gene-gene interactions that are highly associated with four quantitative lipid traits across multiple cohorts. Hum Genet 2016; 136:165-178. [PMID: 27848076 DOI: 10.1007/s00439-016-1738-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 05/25/2016] [Accepted: 10/07/2016] [Indexed: 10/20/2022]
Abstract
Genetic loci explain only 25-30% of the heritability observed in plasma lipid traits. Epistasis, or gene-gene interaction, may contribute to a portion of this missing heritability. We used genetic data on 24,837 individuals from five NHLBI cohorts. We combined the quantitative multifactor dimensionality reduction (QMDR) algorithm with two SNP-filtering methods to exhaustively search for SNP-SNP interactions associated with HDL cholesterol (HDL-C), LDL cholesterol (LDL-C), total cholesterol (TC), and triglycerides (TG). SNPs were filtered either on the strength of their independent effects (main effect filter) or on the prior knowledge supporting a given interaction (Biofilter). After the main effect filter, QMDR identified 20 SNP-SNP models associated with HDL-C, 6 associated with LDL-C, 3 associated with TC, and 10 associated with TG (permutation P value < 0.05). With Biofilter, we identified 2 SNP-SNP models associated with HDL-C, 3 associated with LDL-C, 1 associated with TC, and 8 associated with TG (permutation P value < 0.05). In an independent dataset of 7502 individuals from the eMERGE network, we replicated 14 of the interactions identified after main effect filtering: 11 for HDL-C, 1 for LDL-C, and 2 for TG. We also replicated 23 of the interactions found to be associated with TG after applying Biofilter. Prior knowledge supports a possible role for these interactions in the genetic etiology of lipid traits. This study also presents a computationally efficient pipeline for analyzing data from large genotyping arrays and detecting SNP-SNP interactions that are not primarily driven by strong main effects.
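The core idea of QMDR can be sketched as follows. Individuals in two-locus genotype cells whose trait mean is at or above the overall mean go into a "high" group; the rest go into a "low" group. The SNP pair is scored by a t-statistic between the two groups, and significance comes from permuting the trait. This is a simplified reading of the method, not the published QMDR implementation. It rests on three assumptions: Welch's form of the t-statistic, no cross-validation, and genotypes coded 0/1/2.

```python
import random
from collections import defaultdict
from math import sqrt
from statistics import mean, variance


def qmdr_statistic(g1, g2, y):
    """QMDR-style statistic for one SNP pair (genotypes coded 0/1/2).

    Each two-locus genotype cell is labelled 'high' if its mean trait value is
    at least the overall mean, otherwise 'low'; the statistic is a Welch
    t-statistic comparing pooled 'high' vs pooled 'low' individuals.
    """
    overall = mean(y)
    cells = defaultdict(list)
    for a, b, value in zip(g1, g2, y):
        cells[(a, b)].append(value)
    high, low = [], []
    for values in cells.values():
        (high if mean(values) >= overall else low).extend(values)
    if len(high) < 2 or len(low) < 2:
        return 0.0  # the model cannot split the sample into two usable groups
    se = sqrt(variance(high) / len(high) + variance(low) / len(low))
    return (mean(high) - mean(low)) / se if se > 0 else float("inf")


def permutation_p(g1, g2, y, n_perm=1000, seed=0):
    """Empirical p-value: shuffling the trait breaks any genotype-trait link."""
    rng = random.Random(seed)
    observed = qmdr_statistic(g1, g2, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if qmdr_statistic(g1, g2, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: the trait is elevated only for the (2, 2) genotype combination
g1, g2, y = [], [], []
for a in range(3):
    for b in range(3):
        for noise in (0.0, 0.1, 0.2, 0.3):
            g1.append(a)
            g2.append(b)
            y.append((10.0 if (a, b) == (2, 2) else 0.0) + noise)

t, p = permutation_p(g1, g2, y, n_perm=200)
print(f"t = {t:.1f}, permutation p = {p:.4f}")
```

Because the statistic pools cells by their trait mean, a purely interactive effect confined to a single genotype combination is still detected, even though neither SNP alone would separate the groups well.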
19
Moore CCB, Basile AO, Wallace JR, Frase AT, Ritchie MD. A biologically informed method for detecting rare variant associations. BioData Min 2016; 9:27. [PMID: 27582876 PMCID: PMC5006419 DOI: 10.1186/s13040-016-0107-3] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 01/04/2016] [Accepted: 06/18/2016] [Indexed: 11/29/2022]
Abstract
BACKGROUND BioBin is a bioinformatics software package developed to automate the binning of rare variants into groups for statistical association analysis, using a biological knowledge-driven framework. Based on user-designated parameters, BioBin collapses variants into biological features such as genes, pathways, evolutionarily conserved regions (ECRs), protein families, regulatory regions, and others. BioBin provides the infrastructure to create complex and interesting hypotheses in an automated fashion, avoiding the need for advanced and time-consuming scripting. PURPOSE OF THE STUDY In this manuscript, we describe the BioBin software package. We also present type I error and power simulations that demonstrate the strengths, customizable features, and analysis options of this variant binning tool. RESULTS Simulation testing highlights the utility of BioBin as a fast, comprehensive, and expandable tool for the biologically inspired binning and analysis of low-frequency variants in sequence data. CONCLUSIONS AND POTENTIAL IMPLICATIONS The BioBin software package can transform and streamline the analysis pipelines of researchers analyzing rare variants. This automated bioinformatics tool minimizes the manual effort of creating genomic regions for binning, so that time can be spent on the more interesting task of statistical analysis. This software package is open source and freely available from http://ritchielab.com/software/biobin-download.
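Knowledge-driven binning in the spirit of BioBin can be sketched minimally as follows. The sketch assumes each feature (a gene, a pathway, or any other region set) is given as a list of genomic intervals. It also assumes each variant carries a cohort MAF and per-sample minor-allele counts. A variant may fall into several overlapping features, such as a gene and a pathway containing it. The data layout and function names are hypothetical and do not reflect BioBin's actual code or input formats.

```python
from collections import defaultdict


def bin_rare_variants(variants, features, maf_max=0.05):
    """Collapse rare variants into features.

    variants: list of dicts with keys "chrom", "pos", "maf", and "genotypes"
              (sample ID -> minor allele count 0/1/2)
    features: dict mapping feature ID -> list of (chrom, start, end) intervals
    Returns feature ID -> {sample ID: summed minor-allele count in that bin}.
    """
    bins = defaultdict(lambda: defaultdict(int))
    for v in variants:
        if v["maf"] >= maf_max:
            continue  # common variant: excluded from rare-variant bins
        for fid, intervals in features.items():
            if any(v["chrom"] == c and s <= v["pos"] <= e for c, s, e in intervals):
                for sample, count in v["genotypes"].items():
                    bins[fid][sample] += count
    return {fid: dict(counts) for fid, counts in bins.items()}


# Hypothetical features: two genes and a pathway made of both gene regions
features = {
    "GENE_X": [("chr1", 100, 200)],
    "GENE_Y": [("chr1", 300, 400)],
    "PATHWAY_P": [("chr1", 100, 200), ("chr1", 300, 400)],
}
variants = [
    {"chrom": "chr1", "pos": 150, "maf": 0.01, "genotypes": {"s1": 1, "s2": 0}},
    {"chrom": "chr1", "pos": 350, "maf": 0.02, "genotypes": {"s1": 0, "s2": 2}},
    {"chrom": "chr1", "pos": 160, "maf": 0.30, "genotypes": {"s1": 2, "s2": 2}},  # common
    {"chrom": "chr2", "pos": 150, "maf": 0.01, "genotypes": {"s1": 1}},  # outside all features
]
result = bin_rare_variants(variants, features)
print(result)
```

Each bin's per-sample counts can then serve as a single predictor in a downstream regression, which is what turns many individually rare variants into one testable unit.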
Affiliation(s)
- Anna Okula Basile
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA
- John Robert Wallace
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
- Alex Thomas Frase
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
- Marylyn DeRiggi Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802 USA
- Biomedical and Translational Informatics, Geisinger Health System, Danville, PA 17821 USA
20
Phenome-Wide Association Study to Explore Relationships between Immune System Related Genetic Loci and Complex Traits and Diseases. PLoS One 2016; 11:e0160573. [PMID: 27508393 PMCID: PMC4980020 DOI: 10.1371/journal.pone.0160573] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 04/19/2016] [Accepted: 07/16/2016] [Indexed: 12/21/2022]
Abstract
We performed a Phenome-Wide Association Study (PheWAS) to identify interrelationships between the genetic architecture of the immune system and a wide array of phenotypes from two de-identified electronic health record (EHR) biorepositories. We selected variants within genes encoding critical factors in the immune system and variants with known associations with autoimmunity. To define case/control status for EHR diagnoses, we used International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes. These came from 3,024 Geisinger Clinic MyCode® subjects (470 diagnoses) and 2,899 Vanderbilt University Medical Center BioVU biorepository subjects (380 diagnoses). A pooled analysis of the two data sets was also carried out for the results that replicated across them. We identified new associations with potential biological relevance, including SNPs in tumor necrosis factor (TNF) and ankyrin-related genes associated with acute and chronic sinusitis and acute respiratory tract infection. The two most significant associations were:

- the C6orf10 SNP rs6910071 with “rheumatoid arthritis” (ICD-9 code category 714) (pMETAL = 2.58 × 10^−9);
- the ATN1 SNP rs2239167 with “diabetes mellitus, type 2” (ICD-9 code category 250) (pMETAL = 6.39 × 10^−9).

This study highlights the utility of using PheWAS in conjunction with EHRs to discover new genotype-phenotype associations for immune system-related genetic loci.
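The pooled analysis mentioned above combined results from the two biorepositories, and the abstract reports METAL p-values. The code below shows one way to combine per-site effect estimates: a fixed-effect, inverse-variance weighted meta-analysis. METAL supports this standard-error-based scheme as well as a sample-size-weighted z-score scheme. The abstract does not say which one was used, so the choice here is an assumption. Effects are also assumed to be log-odds ratios from per-site logistic regressions, and the input values are hypothetical.

```python
from math import erfc, sqrt


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance meta-analysis of per-study effect estimates.

    Returns (pooled beta, pooled standard error, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    p = erfc(abs(z) / sqrt(2.0))  # equals 2 * (1 - Phi(|z|)) for a standard normal
    return beta, se, z, p


# Hypothetical log-odds ratios and standard errors from two sites
beta, se, z, p = ivw_meta([0.2, 0.4], [0.1, 0.1])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

With equal standard errors, the pooled estimate is the simple average of the two site estimates. Its standard error shrinks by a factor of √2 relative to either site alone, which is why pooling can push a replicated signal past a stricter significance threshold.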
21
Butkiewicz M, Cooke Bailey JN, Frase A, Dudek S, Yaspan BL, Ritchie MD, Pendergrass SA, Haines JL. Pathway analysis by randomization incorporating structure-PARIS: an update. Bioinformatics 2016; 32:2361-3. [PMID: 27153576 DOI: 10.1093/bioinformatics/btw130] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/18/2015] [Accepted: 03/03/2016] [Indexed: 01/11/2023]
Abstract
MOTIVATION We present an update to the pathway enrichment analysis tool 'Pathway Analysis by Randomization Incorporating Structure (PARIS)' that determines aggregated association signals generated from genome-wide association study results. Pathway-based analyses highlight biological pathways associated with phenotypes. PARIS uses a unique permutation strategy to evaluate the genomic structure of interrogated pathways, through permutation testing of genomic features, thus eliminating many of the over-testing concerns arising with other pathway analysis approaches. RESULTS We have updated PARIS to incorporate expanded pathway definitions through the incorporation of new expert knowledge from multiple database sources, through customized user provided pathways, and other improvements in user flexibility and functionality. AVAILABILITY AND IMPLEMENTATION PARIS is freely available to all users at https://ritchielab.psu.edu/software/paris-download CONTACT jnc43@case.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
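The permutation idea behind PARIS can be sketched as follows. First, count the pathway's features that contain a significant SNP. Then compare that count with random "pathways" drawn from the genome with the same feature structure. Here "structure" is simplified to the number of SNPs per feature, for example an LD block versus a single SNP. That simplification, the data layout, and the function name are assumptions rather than PARIS's actual implementation.

```python
import random
from collections import Counter, defaultdict


def pathway_permutation_p(pathway, feature_size, significant, n_perm=1000, seed=0):
    """Empirical enrichment p-value for significant features in a pathway.

    pathway: list of feature IDs in the pathway
    feature_size: genome-wide map feature ID -> number of SNPs in the feature
    significant: set of feature IDs containing at least one significant SNP
    """
    rng = random.Random(seed)
    pool = defaultdict(list)  # genome-wide features grouped by size
    for feature, size in feature_size.items():
        pool[size].append(feature)
    needed = Counter(feature_size[f] for f in pathway)  # size profile to preserve
    observed = sum(f in significant for f in pathway)
    hits = 0
    for _ in range(n_perm):
        # random pathway with the same number of features of each size
        draw = []
        for size, count in needed.items():
            draw.extend(rng.sample(pool[size], count))
        if sum(f in significant for f in draw) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical genome: 50 single-SNP features and 50 three-SNP LD blocks
feature_size = {f"f{i}": (1 if i < 50 else 3) for i in range(100)}
pathway = ["f0", "f1", "f2", "f3", "f50", "f51", "f52", "f53"]
significant = set(pathway)

observed, p = pathway_permutation_p(pathway, feature_size, significant, n_perm=200)
print(observed, p)
```

Sampling random features of matching size, rather than shuffling SNP labels, keeps large LD blocks from inflating a pathway's apparent signal. This is the kind of over-testing concern the abstract says a structure-aware permutation addresses.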
Affiliation(s)
- Mariusz Butkiewicz
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Jessica N Cooke Bailey
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Alex Frase
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, PA, USA
- Scott Dudek
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, PA, USA
- Brian L Yaspan
- Department of Human Genetics, Genentech, Inc, South San Francisco, CA, USA
- Marylyn D Ritchie
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, PA, USA
- Sarah A Pendergrass
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, PA, USA
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
22
Hohman TJ, Bush WS, Jiang L, Brown-Gentry KD, Torstenson ES, Dudek SM, Mukherjee S, Naj A, Kunkle BW, Ritchie MD, Martin ER, Schellenberg GD, Mayeux R, Farrer LA, Pericak-Vance MA, Haines JL, Thornton-Wells TA. Discovery of gene-gene interactions across multiple independent data sets of late onset Alzheimer disease from the Alzheimer Disease Genetics Consortium. Neurobiol Aging 2016; 38:141-150. [PMID: 26827652 PMCID: PMC4735733 DOI: 10.1016/j.neurobiolaging.2015.10.031] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 03/30/2015] [Revised: 10/28/2015] [Accepted: 10/28/2015] [Indexed: 12/20/2022]
Abstract
Late-onset Alzheimer disease (AD) has a complex genetic etiology involving locus heterogeneity, polygenic inheritance, and gene-gene interactions; however, the investigation of interactions in recent genome-wide association studies has been limited. We used a biological knowledge-driven approach to evaluate gene-gene interactions for consistency across 13 data sets from the Alzheimer Disease Genetics Consortium. Fifteen single nucleotide polymorphism (SNP)-SNP pairs within 3 gene-gene combinations were identified: SIRT1 × ABCB1, PSAP × PEBP4, and GRIN2B × ADRA1A. In addition, we extend a previously identified RYR3 × CACNA1C interaction from an endophenotype analysis. Finally, post hoc gene expression analyses of the implicated SNPs further implicate SIRT1 and ABCB1, and also implicate CDH23, which was most recently identified as an AD risk locus in an epigenetic analysis of AD. The observed interactions highlight ways in which genotypic variation related to disease may depend on the genetic context in which it occurs. Further, our results highlight the utility of evaluating genetic interactions to explain additional variance in AD risk and to identify novel molecular mechanisms of AD pathogenesis.
Affiliation(s)
- Timothy J Hohman
- Vanderbilt Memory & Alzheimer's Center, Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, USA
- William S Bush
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Lan Jiang
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Eric S Torstenson
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Scott M Dudek
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
- Adam Naj
- Department of Biostatistics and Epidemiology, University of Pennsylvania, Philadelphia, PA, USA
- Brian W Kunkle
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
- Eden R Martin
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA; Department of Public Health Sciences, Miller School of Medicine, University of Miami, Miami, FL, USA
- Gerard D Schellenberg
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA
- Richard Mayeux
- Gertrude H. Sergievsky Center, Department of Neurology and the Taub Institute for Research on Alzheimer's Disease and the Aging Brain, College of Physicians and Surgeons, Columbia University, New York, NY, USA
- Lindsay A Farrer
- Department of Medicine (Biomedical Genetics), Boston University, Boston, MA, USA; Department of Neurology, Boston University, Boston, MA, USA; Department of Ophthalmology, Boston University, Boston, MA, USA; Department of Epidemiology, Boston University, Boston, MA, USA; Department of Biostatistics, Boston University, Boston, MA, USA
- Margaret A Pericak-Vance
- Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genomics, Miller School of Medicine, University of Miami, Miami, FL, USA; Department of Neurology, Miller School of Medicine, University of Miami, Miami, FL, USA
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Tricia A Thornton-Wells
- Vanderbilt Genetics Institute, Department of Molecular Physiology & Biophysics, Vanderbilt University Medical Center, Nashville, TN, USA
23
Basile AO, Wallace JR, Peissig P, McCarty CA, Brilliant M, Ritchie MD. Knowledge Driven Binning and PheWAS Analysis in Marshfield Personalized Medicine Research Project Using BioBin. Pac Symp Biocomput 2016; 21:249-260. [PMID: 26776191 PMCID: PMC4824557] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Next-generation sequencing technology has presented an opportunity for rare variant discovery and for the association of these variants with disease. To address the challenges of rare variant analysis, multiple statistical methods have been developed that combine rare variants to increase statistical power for detecting associations. BioBin is an automated tool that expands on collapsing/binning methods by performing multi-level variant aggregation with a flexible, biologically informed binning strategy using an internal biorepository, the Library of Knowledge (LOKI). The databases within LOKI provide variant details, regional annotations and pathway interactions, which can be used to generate bins of biologically related variants, thereby increasing the power of any subsequent statistical test. In this study, we expand the BioBin framework to incorporate statistical tests, including a dispersion-based test, SKAT, thereby providing the option of performing a unified collapsing and statistical rare variant analysis in one tool. Extensive simulation studies performed on gene-coding regions showed Bin-KAT to have greater power than BioBin-regression in all simulated conditions, including when variants influence the phenotype in the same direction, a scenario where burden tests often retain greater power. Madsen-Browning variant weighting increased the power of the burden analysis to a level comparable with Bin-KAT, but overall Bin-KAT retained equivalent or higher power under all conditions. Bin-KAT was applied to a study of 82 pharmacogenes sequenced in the Marshfield Personalized Medicine Research Project (PMRP). We tested these genes for association with 9 different phenotypes extracted from the electronic health record. This study demonstrates that Bin-KAT is a powerful tool for identifying genes harboring low-frequency variants for complex phenotypes.
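To show how Madsen-Browning weighting up-weights rare alleles within a bin, here is a minimal pure-Python sketch. It follows the weighting described by Madsen and Browning (2009): each variant's frequency is estimated in unaffected individuals with a pseudo-count, and each allele is divided by a frequency-based weight. It is not BioBin's implementation. The input layout (one row per individual, minor-allele counts 0/1/2 per variant) and the function name are assumptions.

```python
import math


def madsen_browning_scores(genotypes, is_case):
    """Per-individual weighted burden scores for one bin of variants (sketch).

    genotypes: list of rows, one per individual, each a list of minor-allele
               counts (0, 1 or 2) for every variant in the bin.
    is_case:   list of booleans, True for affected individuals.
    """
    n_total = len(genotypes)
    controls = [row for row, case in zip(genotypes, is_case) if not case]
    n_controls = len(controls)
    n_variants = len(genotypes[0])

    weights = []
    for v in range(n_variants):
        minor_in_controls = sum(row[v] for row in controls)
        # Allele frequency estimated in unaffected individuals, with a pseudo-count.
        q = (minor_in_controls + 1) / (2 * n_controls + 2)
        weights.append(math.sqrt(n_total * q * (1 - q)))

    # Rarer variants get smaller weights, so each of their alleles adds more.
    return [sum(row[v] / weights[v] for v in range(n_variants)) for row in genotypes]
```

In the original method the scores are then compared between cases and controls with a rank-based test assessed by permutation; that step is omitted here.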
Affiliation(s)
- Anna O Basile
- Department of Biochemistry, Microbiology and Molecular Biology, The Pennsylvania State University, University Park, PA, USA
24
Butkiewicz M, Bush WS. In Silico Functional Annotation of Genomic Variation. Curr Protoc Hum Genet 2016; 88:6.15.1-6.15.17. [PMID: 26724722 PMCID: PMC4722816 DOI: 10.1002/0471142905.hg0615s88] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
This unit describes the concepts and practical techniques for annotating genomic variants in the human genome to estimate their functional significance. With the rapid increase of available whole exome and whole genome sequencing information for human studies, annotation techniques have become progressively more important for highlighting and prioritizing nucleotide variants and their potential impact on genes and other genetic constructs. Here, we present an overview of different types of variant annotation approaches and elaborate on their foundations, assumptions, and the downstream consequences of their use. Computational approaches and tools to assign annotations and to identify variants are reviewed. Further, the general philosophy of assigning potential function to a genetic change within the biological context of a disease is discussed.
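The most basic positional annotation assigns a variant to the gene or genes it falls in, or otherwise to its nearest gene. The sketch below illustrates this under assumptions that do not come from the unit. The `Gene` record, the 1-based inclusive coordinates and the toy gene names are hypothetical, and a linear scan stands in for the indexed interval lookups that real annotation tools use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int  # 1-based, inclusive (assumed coordinate convention)
    end: int    # 1-based, inclusive


def annotate_variant(chrom, pos, genes):
    """Return ('genic', overlapping genes) or ('intergenic', [nearest gene]).

    A naive linear scan for illustration only.
    """
    same_chrom = [g for g in genes if g.chrom == chrom]
    overlapping = [g.symbol for g in same_chrom if g.start <= pos <= g.end]
    if overlapping:
        return ("genic", sorted(overlapping))
    if not same_chrom:
        return ("intergenic", [])
    # Distance to a gene is measured to its closest boundary.
    nearest = min(same_chrom, key=lambda g: min(abs(pos - g.start), abs(pos - g.end)))
    return ("intergenic", [nearest.symbol])
```

Positional labels like these are usually only the first layer. Consequence prediction, conservation and regulatory annotations are then layered on top to prioritize variants.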
Affiliation(s)
- Mariusz Butkiewicz
- Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio
- William S Bush
- Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio
25
Verma SS, Frase AT, Verma A, Pendergrass SA, Mahony S, Haas DW, Ritchie MD. Phenome-Wide Interaction Study (PheWIS) in AIDS Clinical Trials Group Data (ACTG). Pac Symp Biocomput 2016; 21:57-68. [PMID: 26776173 PMCID: PMC4722952] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Association studies have shown, and continue to show, substantial success in identifying links between single nucleotide polymorphisms (SNPs) and phenotypes. These studies are also believed to provide insights toward the identification of new drug targets and therapies. Despite this success, challenges remain in applying and prioritizing these associations based on available biological knowledge. Along with single variant association analysis, genetic interactions also play an important role in uncovering the etiology and progression of complex traits. For gene-gene interaction analysis, selecting which variants to test remains a challenge, because epistatic interactions must be sought among the very large number of variants available in high-throughput, genome-wide datasets. Therefore, in this study, we propose a pipeline to identify interactions among genetic variants that are associated with multiple phenotypes, by prioritizing previously published results from main effect association analyses (genome-wide and phenome-wide association analyses) based on a priori biological knowledge in AIDS Clinical Trials Group (ACTG) data. We prioritized and filtered variants by using the results of a previously published single variant PheWAS and then applying biological information from the Roadmap Epigenomics project. We removed variants in low functional activity regions based on chromatin state annotations and then conducted an exhaustive pairwise interaction search using linear regression analysis. We performed this analysis in two independent pre-treatment clinical trial datasets from ACTG to allow for both discovery and replication.
Using a regression framework, we observed 50,798 associations that replicate at a p-value threshold of 0.01 for 26 phenotypes. Among these, 2,176 associations for 212 unique SNPs for the fasting blood glucose phenotype reach Bonferroni significance, and an additional 9,970 interactions for the high-density lipoprotein (HDL) and fasting blood glucose phenotypes (12,146 associations in total) reach FDR significance. We conclude that this method of prioritizing variants to look for epistatic interactions can be used extensively for generating hypotheses in genome-wide and phenome-wide interaction analyses. This original Phenome-wide Interaction Study (PheWIS) can be further applied to patients enrolled in randomized clinical trials to establish the relationship between a patient's response to a particular drug therapy and non-linear combinations of variants that might affect the outcome.
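A single test from an exhaustive pairwise interaction search of this kind can be sketched with NumPy. The sketch fits a linear model with both SNPs and their product term, then reads off the interaction coefficient and its t-statistic. The additive genotype coding, the absence of covariates and the function name `interaction_test` are assumptions for illustration; this is not the PheWIS pipeline itself.

```python
import numpy as np


def interaction_test(snp_a, snp_b, phenotype):
    """Fit y = b0 + b1*a + b2*b + b3*(a*b) by ordinary least squares.

    Returns the interaction coefficient b3 and its t-statistic. Genotypes are
    assumed to be additively coded minor-allele counts (0, 1, 2).
    """
    a = np.asarray(snp_a, dtype=float)
    b = np.asarray(snp_b, dtype=float)
    y = np.asarray(phenotype, dtype=float)
    X = np.column_stack([np.ones_like(a), a, b, a * b])

    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    if rank < X.shape[1]:
        raise ValueError("design matrix is rank deficient")
    residuals = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = residuals @ residuals / dof          # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)         # covariance of the coefficients
    return beta[3], beta[3] / np.sqrt(cov[3, 3])
```

In a full scan this function would run once per prioritized SNP pair and phenotype. With m pair tests, a Bonferroni correction compares each p-value against α/m, while FDR control (for example Benjamini-Hochberg) is less stringent, which is why more interactions pass the FDR criterion than the Bonferroni one.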
Affiliation(s)
- Shefali S Verma
- Center for Systems Genomics, The Pennsylvania State University, University Park, PA 16802, USA
26
Kim D, Lucas A, Glessner J, Verma SS, Bradford Y, Li R, Frase AT, Hakonarson H, Peissig P, Brilliant M, Ritchie MD. Biofilter as a Functional Annotation Pipeline for Common and Rare Copy Number Burden. Pac Symp Biocomput 2016; 21:357-368. [PMID: 26776200 PMCID: PMC4722964] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Recent studies on copy number variation (CNV) have suggested that an increasing burden of CNVs is associated with susceptibility or resistance to disease. A large number of genes or genomic loci contribute to complex diseases such as autism. Thus, total genomic copy number burden, as an accumulation of copy number change, is a meaningful measure of genomic instability for identifying associations between global genetic effects and phenotypes of interest. However, no systematic annotation pipeline has been developed to interpret the biological meaning of genome-wide accumulated copy number change associated with a phenotype of interest. In this study, we develop a comprehensive and systematic pipeline for annotating copy number variants into genes/genomic regions, and subsequently into pathways and other gene groups, using Biofilter, a bioinformatics tool that aggregates over a dozen publicly available databases of prior biological knowledge. Next, we conduct enrichment tests of biologically defined groupings of CNVs, including genes, pathways, Gene Ontology terms, or protein families. We applied the proposed pipeline to a CNV dataset from the Marshfield Clinic Personalized Medicine Research Project (PMRP) for a quantitative trait derived from the electronic health record: total cholesterol. We identified several significant pathways, such as the toll-like receptor signaling pathway and the hepatitis C pathway; Gene Ontology (GO) terms for nucleoside triphosphatase (NTPase) activity and response to virus; and protein families such as cell morphogenesis, all associated with the total cholesterol phenotype based on CNV profiles (permutation p-value < 0.01). The rationale of copy number burden analysis is that the more numerous and the larger the copy number changes, the more likely it is that one or more target genes influencing disease risk and phenotypic severity will be affected.
Thus, our study suggests that, by grouping CNVs into biological knowledge groups such as pathways, the proposed enrichment pipeline could improve the interpretability of copy number burden analyses in which hundreds of loci or genes contribute to disease susceptibility. This CNV annotation pipeline with Biofilter can be used for CNV data from any genotyping or sequencing platform and to explore CNV enrichment for any trait or phenotype. Biofilter continues to be a powerful bioinformatics tool for annotating, filtering, and constructing biologically informed models for association analysis, now including copy number variants.
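The first two steps of a copy number burden analysis are summarizing each sample's burden and mapping CNVs onto genes. Both can be sketched in plain Python. This is a hypothetical illustration, not Biofilter: the tuple layout, the 1-based inclusive coordinates and the toy gene names are assumptions. Biofilter itself draws gene coordinates and higher-level groupings (pathways, GO terms, protein families) from its knowledge databases.

```python
from collections import Counter


def cnv_burden(cnvs):
    """Per-sample copy number burden: (number of CNVs, total bases affected).

    cnvs: iterable of (sample_id, chrom, start, end) tuples with 1-based,
          inclusive coordinates (an assumed convention).
    """
    count = Counter()
    total_bp = Counter()
    for sample, _chrom, start, end in cnvs:
        count[sample] += 1
        total_bp[sample] += end - start + 1
    return {s: (count[s], total_bp[s]) for s in count}


def genes_hit(cnvs, genes):
    """Count the samples carrying at least one CNV that overlaps each gene.

    genes: dict mapping gene symbol -> (chrom, start, end).
    """
    carriers = {symbol: set() for symbol in genes}
    for sample, chrom, start, end in cnvs:
        for symbol, (g_chrom, g_start, g_end) in genes.items():
            # Two closed intervals overlap unless one ends before the other starts.
            if chrom == g_chrom and start <= g_end and g_start <= end:
                carriers[symbol].add(sample)
    return {symbol: len(samples) for symbol, samples in carriers.items()}
```

Gene-level carrier counts like these are the inputs that would then be rolled up into pathway or GO groupings and tested for enrichment, for example by permutation.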
Affiliation(s)
- Dokyoon Kim
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Anastasia Lucas
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Joseph Glessner
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Shefali S. Verma
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Yuki Bradford
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Ruowang Li
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Alex T. Frase
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Hakon Hakonarson
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
- Peggy Peissig
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA
- Murray Brilliant
- Biomedical Informatics Research Center, Marshfield Clinic Research Foundation, Marshfield, Wisconsin, USA
- Marylyn D. Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania, USA
- Biomedical & Translational Informatics, Geisinger Health System, Danville, Pennsylvania, USA
27
De R, Verma SS, Drenos F, Holzinger ER, Holmes MV, Hall MA, Crosslin DR, Carrell DS, Hakonarson H, Jarvik G, Larson E, Pacheco JA, Rasmussen-Torvik LJ, Moore CB, Asselbergs FW, Moore JH, Ritchie MD, Keating BJ, Gilbert-Diamond D. Identifying gene-gene interactions that are highly associated with Body Mass Index using Quantitative Multifactor Dimensionality Reduction (QMDR). BioData Min 2015; 8:41. [PMID: 26674805 PMCID: PMC4678717 DOI: 10.1186/s13040-015-0074-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/12/2015] [Accepted: 12/04/2015] [Indexed: 11/22/2022]
Abstract
Background Despite heritability estimates of 40–70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI)-associated loci identified so far. Epistasis, or gene-gene interaction, is a plausible source of part of the missing heritability of BMI. Methods Using genotypic data from 18,686 individuals across five study cohorts (ARIC, CARDIA, FHS, CHS, MESA), we filtered SNPs (single nucleotide polymorphisms) using two parallel approaches: either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in the context of BMI. Filtered SNPs were then analyzed for interactions highly associated with BMI using QMDR (Quantitative Multifactor Dimensionality Reduction), a nonparametric, genetic model-free method that detects non-linear interactions associated with a quantitative trait. Results We identified seven novel epistatic models with a Bonferroni-corrected p-value of association < 0.1. Prior experimental evidence helps explain the plausible biological interactions highlighted within our results and their relationship with obesity. We identified interactions between genes involved in mitochondrial dysfunction (POLG2), cholesterol metabolism (SOAT2), lipid metabolism (CYP11B2), cell adhesion (EZR), cell proliferation (MAP2K5), and insulin resistance (IGF1R). Moreover, we found an 8.8% increase in the variance in BMI explained by these seven SNP-SNP interactions, beyond what is explained by the main effects of an index FTO SNP and the SNPs within these interactions. We also replicated one of these interactions, and 58 proxy SNP-SNP models representing it, in an independent dataset from the eMERGE study. Conclusion This study highlights a novel approach for discovering gene-gene interactions by combining methods such as QMDR with traditional statistics.
Electronic supplementary material The online version of this article (doi:10.1186/s13040-015-0074-0) contains supplementary material, which is available to authorized users.
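The central QMDR step for one SNP pair can be sketched as follows. Individuals from genotype cells whose mean trait value is at or above the overall mean are pooled into a "high" group, the rest into a "low" group, and the two groups are compared with a t-statistic. This sketch omits the cross-validation and permutation testing used in the full method. The choice of a Welch (unequal-variance) t-statistic and the name `qmdr_statistic` are assumptions.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def qmdr_statistic(snp_a, snp_b, trait):
    """Core QMDR step for one SNP pair (no cross-validation, sketch only)."""
    overall = mean(trait)
    cells = defaultdict(list)
    for ga, gb, y in zip(snp_a, snp_b, trait):
        cells[(ga, gb)].append(y)

    high, low = [], []
    for values in cells.values():
        # Label the whole two-locus genotype cell by comparing its mean to the overall mean.
        (high if mean(values) >= overall else low).extend(values)
    if len(high) < 2 or len(low) < 2:
        return 0.0  # one group is (nearly) empty, so there is nothing to compare

    se = math.sqrt(variance(high) / len(high) + variance(low) / len(low))
    if se == 0:
        return math.inf  # both groups are constant but differ
    return (mean(high) - mean(low)) / se
```

Because cells are labelled by their own means, the method needs no assumed genetic model. A pattern in which the trait is elevated only for particular genotype combinations, with little marginal effect of either SNP, is still captured.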
Affiliation(s)
- Rishika De
- Computational Genetics Laboratory, Department of Genetics, Geisel School of Medicine at Dartmouth, Dartmouth-Hitchcock Medical Center, 706 Rubin Building, HB7937, One Medical Center Dr, Lebanon, NH 03756 USA
- Shefali S Verma
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- Fotios Drenos
- Centre for Cardiovascular Genetics, Institute of Cardiovascular Science, Faculty of Population Health Sciences, University College London, 5 University Street, London, WC1E 6JF UK
- Emily R Holzinger
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- Michael V Holmes
- Division of Transplant Surgery, Perelman School of Medicine, University of Pennsylvania, 3400 Spruce Street, 2 Dulles Pvln, Philadelphia, PA 19104 USA
- Molly A Hall
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- David R Crosslin
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065 USA
- David S Carrell
- Group Health Research Institute, Metropolitan Park East, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101-1448 USA
- Hakon Hakonarson
- The Joseph Stokes Jr. Research Institute, The Children's Hospital of Philadelphia, Office 1016 Abramson Building, Room 1216E, 3615 Civic Center Blvd, Philadelphia, PA 19104 USA
- Gail Jarvik
- Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065 USA; Division of Medical Genetics, Department of Medicine, University of Washington, Health Sciences Building, K-253B, Medical Genetics, Box 357720, Seattle, WA 98195-7720 USA
- Eric Larson
- Group Health Research Institute, Metropolitan Park East, 1730 Minor Avenue, Suite 1600, Seattle, WA 98101-1448 USA
- Jennifer A Pacheco
- Center for Genetic Medicine, Northwestern University Feinberg School of Medicine, 303 E. Superior Street, Lurie 7-125, Chicago, IL 60611 USA
- Laura J Rasmussen-Torvik
- Department of Preventive Medicine, Northwestern University, Feinberg School of Medicine, 680 N Lake Shore Drive, Suite 1400, Chicago, IL 60611 USA
- Carrie B Moore
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA; Center for Human Genetics Research, Vanderbilt University School of Medicine, 519 Light Hall, Nashville, TN 37232 USA
- Folkert W Asselbergs
- Department of Cardiology, Division Heart and Lungs, University Medical Center Utrecht, Room E03.511, P.O. Box 85500, 3508 GA Utrecht, The Netherlands; Institute of Cardiovascular Science, University College London, London, UK; Durrer Center for Cardiogenetic Research, ICIN-Netherlands Heart Institute, Utrecht, The Netherlands
- Jason H Moore
- Institute for Biomedical Informatics, The Perelman School of Medicine, University of Pennsylvania, 1418 Blockley Hall, 423 Guardian Drive, Philadelphia, PA 19104-6021 USA
- Marylyn D Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, 512 Wartik Laboratory, The Pennsylvania State University, University Park, PA 16802 USA
- Brendan J Keating
- The Joseph Stokes Jr. Research Institute, The Children's Hospital of Philadelphia, Office 1016 Abramson Building, Room 1216E, 3615 Civic Center Blvd, Philadelphia, PA 19104 USA; University Medical Center Utrecht, Utrecht, The Netherlands
- Diane Gilbert-Diamond
- Institute for Quantitative Biomedical Sciences at Dartmouth, Hanover, NH USA; Department of Epidemiology, Geisel School of Medicine at Dartmouth, One Medical Center Drive, 7927 Rubin Building, Lebanon, NH 03756 USA
28
Niel C, Sinoquet C, Dina C, Rocheleau G. A survey about methods dedicated to epistasis detection. Front Genet 2015; 6:285. [PMID: 26442103 PMCID: PMC4564769 DOI: 10.3389/fgene.2015.00285] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 05/13/2015] [Accepted: 08/27/2015] [Indexed: 12/25/2022]
Abstract
During the past decade, findings of genome-wide association studies (GWAS) have improved our knowledge and understanding of disease genetics. To date, thousands of SNPs have been associated with diseases and other complex traits. Statistical analysis typically looks for association between a phenotype and each SNP individually via single-locus tests. However, geneticists acknowledge that this is an oversimplified approach to the complexity of the underlying biological mechanisms. Interaction between SNPs, namely epistasis, must be considered. Unfortunately, epistasis detection raises analytic challenges, since analyzing every SNP combination is currently impractical at a genome-wide scale. In this review, we present the main strategies recently proposed to detect epistatic interactions, along with their operating principles. Some of these methods are exhaustive, such as multifactor dimensionality reduction, likelihood ratio-based tests or receiver operating characteristic curve analysis; others are non-exhaustive, such as machine learning techniques (random forests, Bayesian networks) or combinatorial optimization approaches (ant colony optimization, computational evolution system).
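The scale problem behind exhaustive epistasis detection comes down to arithmetic. The number of SNP pairs grows quadratically, and the number of triples cubically, with the number of SNPs, and a Bonferroni threshold shrinks accordingly. The sketch below uses Python's `math.comb`. The 500,000-SNP array size is an illustrative assumption, not a figure from the review. For 500,000 SNPs there are about 1.25 × 10^11 pairs, giving a Bonferroni threshold near 4 × 10^-13 at α = 0.05.

```python
import math


def exhaustive_scan_cost(n_snps, order=2, alpha=0.05):
    """Number of SNP combinations of a given order and the matching Bonferroni threshold."""
    n_tests = math.comb(n_snps, order)
    return n_tests, alpha / n_tests


if __name__ == "__main__":
    # 500,000 SNPs is an illustrative genotyping-array size.
    for k in (2, 3):
        n_tests, threshold = exhaustive_scan_cost(500_000, order=k)
        print(f"order {k}: {n_tests:.3e} tests, Bonferroni threshold {threshold:.1e}")
```

This growth is why the non-exhaustive strategies surveyed (machine learning and combinatorial optimization), as well as knowledge-driven pre-filtering, are attractive at genome-wide scale.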
Affiliation(s)
- Clément Niel
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, Ecole Polytechnique de l'Université de Nantes, Nantes, France
- Christine Sinoquet
- Computer Science Institute of Nantes-Atlantic (Lina), Centre National de la Recherche Scientifique UMR 6241, University of Nantes, Nantes, France
- Christian Dina
- Institut du Thorax, Institut National de la Santé et de la Recherche Médicale UMR 1087, Centre National de la Recherche Scientifique UMR 6291, University of Nantes, Nantes, France
- Ghislain Rocheleau
- European Genomic Institute for Diabetes FR3508, Centre National de la Recherche Scientifique UMR 8199, Lille 2 University, Lille, France
29
Pendergrass SA, Verma A, Okula A, Hall MA, Crawford DC, Ritchie MD. Phenome-Wide Association Studies: Embracing Complexity for Discovery. Hum Hered 2015. [PMID: 26201697 DOI: 10.1159/000381851] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 11/19/2022]
Abstract
The inherent complexity of biological systems can be leveraged for a greater understanding of the impact of genetic architecture on outcomes, traits, and pharmacological response. The genome-wide association study (GWAS) approach has well-developed and relatively straightforward methodologies; however, the bigger picture of how genetic architecture affects phenotypic outcome remains to be elucidated, even with an ever-growing number of GWAS performed. Greater consideration of the complexity of biological processes, using more data from the phenome, exposome, and diverse -omic resources, and including the interplay of pleiotropy and genetic interactions, may provide additional leverage for making the most of the incredible wealth of information available for study. Here, we describe how incorporating greater complexity into analyses, through the use of additional phenotypic data and the widespread deployment of phenome-wide association studies, may provide new insights into genetic factors influencing diseases, traits, and pharmacological response.
Affiliation(s)
- Sarah A Pendergrass
- Biomedical and Translational Informatics Program, Geisinger Health System, Danville, Pa., USA
30
Hall MA, Verma SS, Wallace J, Lucas A, Berg RL, Connolly J, Crawford DC, Crosslin DR, de Andrade M, Doheny KF, Haines JL, Harley JB, Jarvik GP, Kitchner T, Kuivaniemi H, Larson EB, Carrell DS, Tromp G, Vrabec TR, Pendergrass SA, McCarty CA, Ritchie MD. Biology-Driven Gene-Gene Interaction Analysis of Age-Related Cataract in the eMERGE Network. Genet Epidemiol 2015; 39:376-84. [PMID: 25982363 PMCID: PMC4550090 DOI: 10.1002/gepi.21902] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 12/05/2014] [Revised: 02/27/2015] [Accepted: 03/13/2015] [Indexed: 01/19/2023]
Abstract
Bioinformatics approaches to examining gene-gene models provide a means to discover interactions between multiple genes that underlie complex disease. Extensive computational demands and the need to adjust for multiple testing make uncovering genetic interactions a challenge. Here, we address these issues using our knowledge-driven filtering method, Biofilter, to identify putative single nucleotide polymorphism (SNP) interaction models for cataract susceptibility, thereby reducing the number of models for analysis. Models were evaluated using logistic regression in 3,377 European Americans (1,185 controls, 2,192 cases) from the Marshfield Clinic, a study site of the Electronic Medical Records and Genomics (eMERGE) Network. All statistically significant models from the Marshfield Clinic were then evaluated in an independent dataset of 4,311 individuals (742 controls, 3,569 cases) drawn from additional eMERGE study sites: Mayo Clinic, Group Health/University of Washington, Vanderbilt University Medical Center, and Geisinger Health System. Eighty-three SNP-SNP models replicated in the independent dataset at likelihood ratio test P < 0.05. Among the most significant replicating models was rs12597188 (intron of CDH1)-rs11564445 (intron of CTNNB1). These genes are known to be involved in processes including cell-to-cell adhesion signaling, cell-cell junction organization, and cell-cell communication. Further Biofilter analysis of all replicating models revealed a number of common functions among the genes harboring the 83 replicating SNP-SNP models, including signal transduction and the PI3K-Akt signaling pathway. These findings demonstrate the utility of Biofilter as a biology-driven method applicable to any genome-wide association study dataset.
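Biofilter's role here is to restrict the search to SNP pairs whose genes are linked by prior biological knowledge, and a small sketch can illustrate that idea. The data structures, the "number of supporting sources" threshold, and the gene and SNP names are hypothetical. Biofilter's actual model-building logic and knowledge sources are not reproduced.

```python
from itertools import product


def knowledge_driven_models(gene_pair_sources, snps_by_gene, min_sources=2):
    """Generate SNP-SNP models only for gene pairs supported by prior knowledge.

    gene_pair_sources: dict mapping (gene1, gene2) -> number of knowledge
                       sources (e.g. pathway databases) linking the two genes.
    snps_by_gene:      dict mapping gene -> list of SNP IDs mapped to it.
    Returns a sorted list of unique (snp1, snp2) pairs.
    """
    models = set()
    for (g1, g2), n_sources in gene_pair_sources.items():
        if n_sources < min_sources:
            continue  # too little prior support; skip this gene pair
        for s1, s2 in product(snps_by_gene.get(g1, []), snps_by_gene.get(g2, [])):
            if s1 != s2:
                models.add(tuple(sorted((s1, s2))))
    return sorted(models)
```

Each surviving SNP-SNP model would then be fitted with logistic regression and compared against the main-effects model with a likelihood ratio test, as described above. Raising `min_sources` trades sensitivity for a smaller multiple-testing burden.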
Affiliation(s)
- Molly A Hall
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Shefali S Verma
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- John Wallace
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Anastasia Lucas
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Richard L Berg
- Marshfield Clinic, Marshfield, Wisconsin, United States of America
- John Connolly
- Center for Applied Genomics, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America
- Dana C Crawford
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
- David R Crosslin
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America
- Kimberly F Doheny
- Center for Inherited Disease Research, IGM, Johns Hopkins University SOM, Baltimore, Maryland, United States of America
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, United States of America
- John B Harley
- Department of Pediatrics, Cincinnati Children's Hospital, University of Cincinnati, Cincinnati, Ohio, United States of America
- Gail P Jarvik
- Department of Genome Sciences, University of Washington, Seattle, Washington, United States of America; Division of Medical Genetics, Department of Medicine, University of Washington, Seattle, Washington, United States of America
- Terrie Kitchner
- Marshfield Clinic, Marshfield, Wisconsin, United States of America
- Helena Kuivaniemi
- Geisinger Health System, Danville, Pennsylvania, United States of America
- Eric B Larson
- Group Health Research Institute, Seattle, Washington, United States of America
- David S Carrell
- Group Health Research Institute, Seattle, Washington, United States of America
- Gerard Tromp
- Geisinger Health System, Danville, Pennsylvania, United States of America
- Tamara R Vrabec
- Geisinger Health System, Danville, Pennsylvania, United States of America
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Pennsylvania State University, University Park, Pennsylvania, United States of America; Geisinger Health System, Danville, Pennsylvania, United States of America
31
Hu T, Darabos C, Cricco ME, Kong E, Moore JH. Genome-wide genetic interaction analysis of glaucoma using expert knowledge derived from human phenotype networks. Pac Symp Biocomput 2015; 20:207-18. [PMID: 25592582 PMCID: PMC4299930] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
The large volume of GWAS data poses great computational challenges for analyzing genetic interactions associated with common human diseases. We propose a computational framework for characterizing epistatic interactions among large sets of genetic attributes in GWAS data. We build the human phenotype network (HPN) and focus on a disease of interest. In this study, we use the GLAUGEN glaucoma GWAS dataset and apply the HPN as a biological knowledge-based filter to prioritize genetic variants. Then, we use the statistical epistasis network (SEN) to identify a significant connected network of pairwise epistatic interactions among the prioritized SNPs. These interactions highlight the complex genetic basis of glaucoma. Furthermore, we identify key SNPs by quantifying structural network characteristics. Through functional annotation of these key SNPs using Biofilter, a software tool that accesses multiple publicly available human genetic data sources, we find supporting biomedical evidence linking glaucoma to an array of genetic diseases, providing proof of concept. We conclude by suggesting hypotheses for a better understanding of the disease.
Affiliation(s)
- Maria E. Cricco
- Institute for the Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755, U.S.A.
- Emily Kong
- Institute for the Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755, U.S.A.
- Jason H. Moore
- Institute for the Quantitative Biomedical Sciences, Geisel School of Medicine, Dartmouth College, Hanover, NH 03755, U.S.A.
32
Moore CB, Verma A, Pendergrass S, Verma SS, Johnson DH, Daar ES, Gulick RM, Haubrich R, Robbins GK, Ritchie MD, Haas DW. Phenome-wide Association Study Relating Pretreatment Laboratory Parameters With Human Genetic Variants in AIDS Clinical Trials Group Protocols. Open Forum Infect Dis 2015; 2:ofu113. [PMID: 25884002 PMCID: PMC4396430 DOI: 10.1093/ofid/ofu113] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 08/19/2014] [Accepted: 12/02/2014] [Indexed: 01/11/2023]
Abstract
Background. Phenome-Wide Association Studies (PheWAS) identify genetic associations across multiple phenotypes. Clinical trials offer opportunities for PheWAS to identify pharmacogenomic associations. We describe the first PheWAS to use genome-wide genotypic data and to utilize human immunodeficiency virus (HIV) clinical trials data. As proof-of-concept, we focused on baseline laboratory phenotypes from antiretroviral therapy-naive individuals. Methods. Data from 4 AIDS Clinical Trials Group (ACTG) studies were split into 2 datasets: Dataset I (1181 individuals from protocol A5202) and Dataset II (1366 from protocols A5095, ACTG 384, and A5142). Final analyses involved 2547 individuals and 5 954 294 imputed polymorphisms. We calculated comprehensive associations between these polymorphisms and 27 baseline laboratory phenotypes. Results. A total of 10 584 (0.17%) polymorphisms had associations with P < .01 in both datasets and with the same direction of association. Twenty polymorphisms replicated associations with identical or related phenotypes reported in the Catalog of Published Genome-Wide Association Studies, including several not previously reported in HIV-positive cohorts. We also identified several possibly novel associations. Conclusions. These analyses define PheWAS properties and principles with baseline laboratory data from HIV clinical trials. This approach may be useful for evaluating on-treatment HIV clinical trials data for associations with various clinical phenotypes.
Affiliation(s)
- Carrie B. Moore
- Vanderbilt University School of Medicine, Nashville, Tennessee
- The Center for Systems Genomics, The Pennsylvania State University, University Park
- Anurag Verma
- The Center for Systems Genomics, The Pennsylvania State University, University Park
- Sarah Pendergrass
- The Center for Systems Genomics, The Pennsylvania State University, University Park
- Shefali S. Verma
- The Center for Systems Genomics, The Pennsylvania State University, University Park
- Eric S. Daar
- Los Angeles Biomed Research Institute at Harbor-UCLA Medical Center, Torrance, California
- Marylyn D. Ritchie
- The Center for Systems Genomics, The Pennsylvania State University, University Park
- David W. Haas
- Vanderbilt University School of Medicine, Nashville, Tennessee
33
Kim D, Li R, Dudek SM, Wallace JR, Ritchie MD. Binning somatic mutations based on biological knowledge for predicting survival: an application in renal cell carcinoma. Pac Symp Biocomput 2015:96-107. [PMID: 25592572 PMCID: PMC4299944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Large-scale whole exome and genome sequencing of hundreds to thousands of patients has mapped the landscape of somatic genomic alterations in many cancer types, helping to distinguish driver mutations from passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to tumor heterogeneity, somatic mutation profiles are exceptionally sparse, whereas other types of genomic data, such as miRNA or gene expression, are far more complete, with quantitative values measured for all genomic features in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow for the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, we propose a new approach for binning somatic mutations based on existing biological knowledge. Using a renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden, binned by pathways, protein families, evolutionarily conserved regions, and regulatory regions, that are associated with survival. Given the heterogeneity of cancer, a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improving prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.
34
Hall MA, Verma A, Brown-Gentry KD, Goodloe R, Boston J, Wilson S, McClellan B, Sutcliffe C, Dilks HH, Gillani NB, Jin H, Mayo P, Allen M, Schnetz-Boutaud N, Crawford DC, Ritchie MD, Pendergrass SA. Detection of pleiotropy through a Phenome-wide association study (PheWAS) of epidemiologic data as part of the Environmental Architecture for Genes Linked to Environment (EAGLE) study. PLoS Genet 2014; 10:e1004678. [PMID: 25474351 PMCID: PMC4256091 DOI: 10.1371/journal.pgen.1004678] [Citation(s) in RCA: 51] [Impact Index Per Article: 5.1] [Received: 03/28/2014] [Accepted: 08/16/2014] [Indexed: 12/19/2022]
Abstract
We performed a Phenome-wide association study (PheWAS) utilizing diverse genotypic and phenotypic data existing across multiple populations in the National Health and Nutrition Examination Surveys (NHANES), conducted by the Centers for Disease Control and Prevention (CDC), and accessed by the Epidemiological Architecture for Genes Linked to Environment (EAGLE) study. We calculated comprehensive tests of association in Genetic NHANES using 80 SNPs and 1,008 phenotypes (grouped into 184 phenotype classes), stratified by race-ethnicity. Genetic NHANES includes three surveys (NHANES III, 1999-2000, and 2001-2002) and three race-ethnicities: non-Hispanic whites (n = 6,634), non-Hispanic blacks (n = 3,458), and Mexican Americans (n = 3,950). We identified 69 PheWAS associations replicating across surveys for the same SNP, phenotype-class, direction of effect, and race-ethnicity at p<0.01, allele frequency >0.01, and sample size >200. Of these 69 PheWAS associations, 39 replicated previously reported SNP-phenotype associations, 9 were related to previously reported associations, and 21 were novel associations. Fourteen results had the same direction of effect across more than one race-ethnicity: one result was novel, 11 replicated previously reported associations, and two were related to previously reported results. Thirteen SNPs showed evidence of pleiotropy. We further explored results with gene-based biological networks, contrasting the direction of effect for pleiotropic associations across phenotypes. One PheWAS result was ABCG2 missense SNP rs2231142, associated with uric acid levels in both non-Hispanic whites and Mexican Americans, protoporphyrin levels in non-Hispanic whites and Mexican Americans, and blood pressure levels in Mexican Americans. 
Another example was SNP rs1800588 near LIPC, significantly associated with the novel phenotypes of folate levels (Mexican Americans), vitamin E levels (non-Hispanic whites) and triglyceride levels (non-Hispanic whites), and replication for cholesterol levels. The results of this PheWAS show the utility of this approach for exposing more of the complex genetic architecture underlying multiple traits, through generating novel hypotheses for future research.
Affiliation(s)
- Molly A. Hall
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Anurag Verma
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Kristin D. Brown-Gentry
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Robert Goodloe
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Jonathan Boston
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Sarah Wilson
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Bob McClellan
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Cara Sutcliffe
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Holly H. Dilks
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Nila B. Gillani
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Hailing Jin
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Ping Mayo
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Melissa Allen
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Nathalie Schnetz-Boutaud
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Dana C. Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee, United States of America
- Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, Tennessee, United States of America
- Marylyn D. Ritchie
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
- Sarah A. Pendergrass
- Center for Systems Genomics, Department of Biochemistry and Molecular Biology, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, Pennsylvania, United States of America
35
Chhibber A, Kroetz DL, Tantisira KG, McGeachie M, Cheng C, Plenge R, Stahl E, Sadee W, Ritchie MD, Pendergrass SA. Genomic architecture of pharmacological efficacy and adverse events. Pharmacogenomics 2014; 15:2025-48. [PMID: 25521360 PMCID: PMC4308414 DOI: 10.2217/pgs.14.144] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 12/22/2022]
Abstract
The pharmacokinetic and pharmacodynamic disciplines address pharmacological traits, including efficacy and adverse events. Pharmacogenomics studies have identified pervasive genetic effects on treatment outcomes, resulting in the development of genetic biomarkers for optimization of drug therapy. Pharmacogenomics-based tests are already being applied in clinical decision making. However, despite substantial progress in identifying the genetic etiology of pharmacological response, current biomarker panels still largely rely on single gene tests with a large portion of the genetic effects remaining to be discovered. Future research must account for the combined effects of multiple genetic variants, incorporate pathway-based approaches, explore gene-gene interactions and nonprotein coding functional genetic variants, extend studies across ancestral populations, and prioritize laboratory characterization of molecular mechanisms. Because genetic factors can play a key role in drug response, accurate biomarker tests capturing the main genetic factors determining treatment outcomes have substantial potential for improving individual clinical care.
Affiliation(s)
- Aparna Chhibber
- Department of Bioengineering & Therapeutic Sciences, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
- Deanna L Kroetz
- Department of Bioengineering & Therapeutic Sciences, Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA, USA
- Kelan G Tantisira
- Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Michael McGeachie
- Department of Medicine, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Cheng Cheng
- Department of Biostatistics, St Jude Children's Research Hospital, Memphis, TN, USA
- Robert Plenge
- Division of Rheumatology, Immunology & Allergy, Division of Genetics, Brigham & Women's Hospital, Harvard Medical School, Cambridge, MA, USA
- Eli Stahl
- Department of Genetics & Genomic Sciences, Mount Sinai Hospital, New York, NY, USA
- Wolfgang Sadee
- Center for Pharmacogenomics, College of Medicine, The Ohio State University, Columbus, OH, USA
- Marylyn D Ritchie
- Department of Biochemistry & Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16801, USA
- Sarah A Pendergrass
- Department of Biochemistry & Molecular Biology, Center for Systems Genomics, Eberly College of Science, The Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16801, USA
36
Kim D, Li R, Dudek SM, Frase AT, Pendergrass SA, Ritchie MD. Knowledge-driven genomic interactions: an application in ovarian cancer. BioData Min 2014; 7:20. [PMID: 25214892 PMCID: PMC4161273 DOI: 10.1186/1756-0381-7-20] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 03/21/2014] [Accepted: 08/28/2014] [Indexed: 12/11/2022]
Abstract
Background. Effective prediction of cancer clinical outcomes, which can help in understanding the mechanisms of various types of cancer, has been pursued using molecular data such as gene expression profiles, an approach that promises better diagnostics and support for further therapies. However, clinical outcome prediction based on gene expression profiles varies between independent data sets. Further, single-gene expression outcome prediction is limited for cancer evaluation, since genes do not act in isolation but interact with other genes in complex signaling or regulatory networks. In addition, since pathways are likely to cooperate, it would be desirable to incorporate expert knowledge to combine pathways in a useful and informative manner. Methods. We therefore propose a novel approach for identifying knowledge-driven genomic interactions and apply it to discover models associated with cancer clinical phenotypes using grammatical evolution neural networks (GENN). To demonstrate the utility of the proposed approach, ovarian cancer data from The Cancer Genome Atlas (TCGA) were used to predict clinical stage as a pilot project. Results. We identified knowledge-driven genomic interactions associated with cancer stage not only within single knowledge bases, such as sources of pathway-pathway interactions, but also across different sets of knowledge bases, such as pathway-protein family interactions, by integrating different types of information. Notably, an integration model drawing on different sources of biological knowledge achieved 78.82% balanced accuracy and outperformed the top models using gene expression or single knowledge-based data types alone. Furthermore, the results from the models are more interpretable because they are framed in the context of specific biological pathways or other expert knowledge.
Conclusions. The success of this pilot study will allow us to pursue further identification of models predictive of clinical cancer survival and recurrence. Understanding tumorigenesis and progression in ovarian cancer through a global view of interactions within and between different biological knowledge sources has the potential to provide more effective screening strategies and therapeutic targets for many types of cancer.
Affiliation(s)
- Dokyoon Kim
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Ruowang Li
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Scott M Dudek
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Alex T Frase
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Sarah A Pendergrass
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Center for Systems Genomics, Pennsylvania State University, University Park, Pennsylvania, USA
37
Crawford DC, Crosslin DR, Tromp G, Kullo IJ, Kuivaniemi H, Hayes MG, Denny JC, Bush WS, Haines JL, Roden DM, McCarty CA, Jarvik GP, Ritchie MD. eMERGEing progress in genomics-the first seven years. Front Genet 2014; 5:184. [PMID: 24987407 PMCID: PMC4060012 DOI: 10.3389/fgene.2014.00184] [Citation(s) in RCA: 67] [Impact Index Per Article: 6.7] [Received: 03/31/2014] [Accepted: 05/30/2014] [Indexed: 12/15/2022]
Abstract
The electronic MEdical Records & GEnomics (eMERGE) network was established in 2007 by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) in part to explore the utility of electronic medical records (EMRs) in genome science. The initial focus was on discovery primarily using the genome-wide association paradigm, but more recently, the network has begun evaluating mechanisms to implement new genomic information coupled to clinical decision support into EMRs. Herein, we describe this evolution including the development of the individual and merged eMERGE genomic datasets, the contribution the network has made toward genomic discovery and human health, and the steps taken toward the next generation genotype-phenotype association studies and clinical implementation.
Affiliation(s)
- Dana C Crawford
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA; Department of Molecular Physiology and Biophysics, Vanderbilt University, Nashville, TN, USA
- David R Crosslin
- Medical Genetics, Department of Medicine, School of Medicine, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Gerard Tromp
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- Iftikhar J Kullo
- Division of Cardiovascular Diseases and the Gonda Vascular Center, Mayo Clinic, Rochester, MN, USA
- Helena Kuivaniemi
- The Sigfried and Janet Weis Center for Research, Geisinger Health System, Danville, PA, USA
- M Geoffrey Hayes
- Division of Endocrinology, Metabolism, and Molecular Medicine, Department of Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Joshua C Denny
- Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA; Department of Medicine, Vanderbilt University, Nashville, TN, USA
- William S Bush
- Center for Human Genetics Research, Vanderbilt University, Nashville, TN, USA; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
- Jonathan L Haines
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA; Institute for Computational Biology, Case Western Reserve University, Cleveland, OH, USA
- Dan M Roden
- Department of Medicine, Vanderbilt University, Nashville, TN, USA; Department of Pharmacology, Vanderbilt University, Nashville, TN, USA
- Gail P Jarvik
- Medical Genetics, Department of Medicine, School of Medicine, University of Washington, Seattle, WA, USA; Department of Genome Sciences, University of Washington, Seattle, WA, USA
- Marylyn D Ritchie
- Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, USA; Center for Systems Genomics, Pennsylvania State University, University Park, PA, USA
38
Sun X, Lu Q, Mukherjee S, Crane PK, Elston R, Ritchie MD. Analysis pipeline for the epistasis search - statistical versus biological filtering. Front Genet 2014; 5:106. [PMID: 24817878 PMCID: PMC4012196 DOI: 10.3389/fgene.2014.00106] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 01/11/2014] [Accepted: 04/10/2014] [Indexed: 12/15/2022]
Abstract
Gene-gene interactions may contribute to the genetic variation underlying complex traits but have not always been taken fully into account. Statistical analyses that consider gene-gene interaction may increase the power of detecting associations, especially for low-marginal-effect markers, and may explain in part the "missing heritability." Detecting pair-wise and higher-order interactions genome-wide requires enormous computational power. Filtering pipelines increase the computational speed by limiting the number of tests performed. We summarize existing filtering approaches to detect epistasis, after distinguishing the purposes that lead us to search for epistasis. Statistical filtering includes quality control on the basis of single-marker statistics, to avoid analyzing poor-quality and uninformative data, and limits the search space for finding interactions. Biological filtering includes targeting specific pathways and integrating various databases based on known biological and metabolic pathways, gene function ontology, and protein-protein interactions. It is increasingly possible to target single-nucleotide polymorphisms that have defined effects on gene expression even though they do not lie within protein-coding genes. Filtering can improve the power of an interaction association study, but it also increases the chance of missing important findings.
Affiliation(s)
- Xiangqing Sun
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Qing Lu
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA
- Paul K. Crane
- Department of Medicine, University of Washington, Seattle, WA, USA
- Robert Elston
- Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
- Marylyn D. Ritchie
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA, USA