1
Tang D, Freudenberg J, Dahl A. Factorizing polygenic epistasis improves prediction and uncovers biological pathways in complex traits. Am J Hum Genet 2023; 110:1875-1887. [PMID: 37922884 PMCID: PMC10645564 DOI: 10.1016/j.ajhg.2023.10.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 06/05/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 11/07/2023]
Abstract
Epistasis is central in many domains of biology, but it has not yet been proven useful for understanding the etiology of complex traits. This is partly because complex-trait epistasis involves polygenic interactions that are poorly captured in current models. To address this gap, we developed a model called Epistasis Factor Analysis (EFA). EFA assumes that polygenic epistasis can be factorized into interactions between a few epistasis factors (EFs), which represent latent polygenic components of the observed complex trait. The statistical goals of EFA are to improve polygenic prediction and to increase power to detect epistasis, while the biological goal is to unravel genetic effects into more-homogeneous units. We mathematically characterize EFA and use simulations to show that EFA outperforms current epistasis models when its assumptions approximately hold. Applied to predicting yeast growth rates, EFA outperforms the additive model for several traits with large epistasis heritability and uniformly outperforms the standard epistasis model. We replicate these prediction improvements in a second dataset. We then apply EFA to four previously characterized traits in the UK Biobank and find statistically significant epistasis in all four, including two that are robust to scale transformation. Moreover, we find that the inferred EFs partly recover pre-defined biological pathways for two of the traits. Our results demonstrate that more realistic models can identify biologically and statistically meaningful epistasis in complex traits, indicating that epistasis has potential for precision medicine and characterizing the biology underlying GWAS results.
Affiliation(s)
- David Tang
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Program in Bioinformatics and Integrative Genomics, Harvard Medical School, Boston, MA, USA.
- Jerome Freudenberg
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA; Bioinformatics Interdepartmental Program, University of California Los Angeles, Los Angeles, CA, USA
- Andy Dahl
- Section of Genetic Medicine, University of Chicago, Chicago, IL, USA.
2
Bai H, Naj AC, Benchek P, Dumitrescu L, Hohman T, Hamilton-Nelson K, Kallianpur AR, Griswold AJ, Vardarajan B, Martin ER, Beecham GW, Below JE, Schellenberg G, Mayeux R, Farrer L, Pericak-Vance MA, Haines JL, Bush WS. A haptoglobin (HP) structural variant alters the effect of APOE alleles on Alzheimer's disease. Alzheimers Dement 2023; 19:4886-4895. [PMID: 37051669 DOI: 10.1002/alz.13050] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/02/2022] [Revised: 01/19/2023] [Accepted: 01/20/2023] [Indexed: 04/14/2023]
Abstract
BACKGROUND: Haptoglobin (HP) is an antioxidant of apolipoprotein E (APOE), and previous reports have shown HP binds with APOE and amyloid beta (Aβ) to aid its clearance. A common structural variant of the HP gene distinguishes it into two alleles: HP1 and HP2. METHODS: HP genotypes were imputed in 29 cohorts from the Alzheimer's Disease Genetics Consortium (N = 20,512). Associations between the HP polymorphism and Alzheimer's disease (AD) risk and age of onset through APOE interactions were investigated using regression models. RESULTS: The HP polymorphism significantly impacts AD risk in European-descent individuals (and in meta-analysis with African-descent individuals) by modifying both the protective effect of APOE ε2 and the detrimental effect of APOE ε4. The effect is particularly significant among APOE ε4 carriers. DISCUSSION: The effect modification of APOE by HP suggests adjustment and/or stratification by HP genotype is warranted when APOE risk is considered. Our findings also provided directions for further investigations on potential mechanisms behind this association.
Affiliation(s)
- Haimeng Bai
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Systems Biology and Bioinformatics, Department of Nutrition, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Adam C Naj
- Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Penelope Benchek
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Logan Dumitrescu
- Vanderbilt Memory & Alzheimer's Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Timothy Hohman
- Vanderbilt Memory & Alzheimer's Center, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Kara Hamilton-Nelson
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, USA
- Asha R Kallianpur
- Genomic Medicine Institute, Lerner Research Institute, Cleveland Clinic Foundation, Cleveland, Ohio, USA
- Department of Molecular Medicine, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- Anthony J Griswold
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, USA
- Badri Vardarajan
- Department of Neurology, The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University, New York, New York, USA
- Eden R Martin
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, USA
- Gary W Beecham
- John P. Hussman Institute for Human Genomics, University of Miami, Miami, Florida, USA
- Jennifer E Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Gerard Schellenberg
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, University of Pennsylvania Perelman School of Medicine, Philadelphia, Pennsylvania, USA
- Richard Mayeux
- Department of Neurology, The Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University, New York, New York, USA
- Lindsay Farrer
- Departments of Medicine (Biomedical Genetics), Neurology, Ophthalmology, Biostatistics, and Epidemiology, Boston University Schools of Medicine and Public Health, Boston, Massachusetts, USA
- Jonathan L Haines
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
- William S Bush
- Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, USA
- Department of Population & Quantitative Health Sciences, Case Western Reserve University School of Medicine, Cleveland, Ohio, USA
3
Zeng L, Moser S, Mirza-Schreiber N, Lamina C, Coassin S, Nelson CP, Annilo T, Franzén O, Kleber ME, Mack S, Andlauer TFM, Jiang B, Stiller B, Li L, Willenborg C, Munz M, Kessler T, Kastrati A, Laugwitz KL, Erdmann J, Moebus S, Nöthen MM, Peters A, Strauch K, Müller-Nurasyid M, Gieger C, Meitinger T, Steinhagen-Thiessen E, März W, Metspalu A, Björkegren JLM, Samani NJ, Kronenberg F, Müller-Myhsok B, Schunkert H. Cis-epistasis at the LPA locus and risk of cardiovascular diseases. Cardiovasc Res 2022; 118:1088-1102. [PMID: 33878186 PMCID: PMC8930071 DOI: 10.1093/cvr/cvab136] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 07/23/2020] [Accepted: 04/16/2021] [Indexed: 12/28/2022]
Abstract
AIMS: Coronary artery disease (CAD) has a strong genetic predisposition. However, despite substantial discoveries made by genome-wide association studies (GWAS), a large proportion of heritability awaits identification. Non-additive genetic effects might be responsible for part of the unaccounted genetic variance. Here, we attempted a proof-of-concept study to identify non-additive genetic effects, namely epistatic interactions, associated with CAD. METHODS AND RESULTS: We tested for epistatic interactions in 10 CAD case-control studies and UK Biobank, focusing on 8068 SNPs at 56 loci with known associations with CAD risk. We identified a SNP pair located in cis at the LPA locus, rs1800769 and rs9458001, to be jointly associated with risk for CAD [odds ratio (OR) = 1.37, P = 1.07 × 10⁻¹¹], peripheral arterial disease (OR = 1.22, P = 2.32 × 10⁻⁴), aortic stenosis (OR = 1.47, P = 6.95 × 10⁻⁷), hepatic lipoprotein(a) (Lp(a)) transcript levels (beta = 0.39, P = 1.41 × 10⁻⁸), and Lp(a) serum levels (beta = 0.58, P = 8.7 × 10⁻³²), while individual SNPs displayed no association. Further exploration of the LPA locus revealed a strong dependency of these associations on a rare variant, rs140570886, that was previously associated with Lp(a) levels. We confirmed increased CAD risk for individuals heterozygous (relative OR = 1.46, P = 9.97 × 10⁻³²) and homozygous (relative OR = 1.77, P = 0.09) for the minor allele of rs140570886. Using forward model selection, we also show that epistatic interactions between rs140570886, rs9458001, and rs1800769 modulate the effects of the rs140570886 risk allele. CONCLUSIONS: These results demonstrate the feasibility of a large-scale knowledge-based epistasis scan and provide rare evidence of an epistatic interaction in a complex human disease. We were directed to a variant (rs140570886) influencing risk through additive genetic as well as epistatic effects. In summary, this study provides deeper insights into the genetic architecture of a locus important for cardiovascular diseases.
Affiliation(s)
- Lingyao Zeng
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Sylvain Moser
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- International Max Planck Research School for Translational Psychiatry (IMPRS-TP), Munich 80804, Germany
- Nazanin Mirza-Schreiber
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Institute of Neurogenomics, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Claudia Lamina
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
- Stefan Coassin
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
- Christopher P Nelson
- Department of Cardiovascular Sciences, University of Leicester, BHF Cardiovascular Research Centre, Glenfield Hospital, Groby Rd, Leicester LE3 9QP, UK
- NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, UK
- Tarmo Annilo
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
- Oscar Franzén
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre, Karolinska Institutet, Huddinge, 14186 Stockholm, Sweden
- Marcus E Kleber
- Medizinische Klinik V (Nephrologie, Hypertensiologie, Rheumatologie, Endokrinologie, Diabetologie), Medizinische Fakultät Mannheim der Universität Heidelberg, 69120 Heidelberg, Germany
- Salome Mack
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
- Till F M Andlauer
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Department of Neurology, Klinikum rechts der Isar, School of Medicine, Technical University of Munich, 81675 Munich, Germany
- Beibei Jiang
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Barbara Stiller
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Ling Li
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Christina Willenborg
- Institute for Cardiogenetics and University Heart Center Luebeck, University of Lübeck, 23562 Lübeck, Germany
- Matthias Munz
- Institute for Cardiogenetics and University Heart Center Luebeck, University of Lübeck, 23562 Lübeck, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
- Charité – University Medicine Berlin, Corporate Member of Freie Universität Berlin, Humboldt-Universität zu Berlin, and Berlin Institute of Health, Institute for Dental and Craniofacial Sciences, Department of Periodontology and Synoptic Dentistry, 14197 Berlin, Germany
- Thorsten Kessler
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Munich Heart Alliance, 80636 Munich, Germany
- Adnan Kastrati
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Munich Heart Alliance, 80636 Munich, Germany
- Karl-Ludwig Laugwitz
- Medizinische Klinik, Klinikum rechts der Isar, Technische Universität München, 81675 Munich, Germany
- Jeanette Erdmann
- Institute for Cardiogenetics and University Heart Center Luebeck, University of Lübeck, 23562 Lübeck, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
- Susanne Moebus
- Institute for Medical Informatics, Biometry and Epidemiology, University Hospital Essen, 45147 Essen, Germany
- Centre for Urbane Epidemiology, University Hospital Essen, 45147 Essen, Germany
- Markus M Nöthen
- Institute of Human Genetics, University of Bonn School of Medicine & University Hospital Bonn, 53012 Bonn, Germany
- Annette Peters
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- IBE, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
- Konstantin Strauch
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- IBE, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, 55101 Mainz, Germany
- Martina Müller-Nurasyid
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- IBE, Faculty of Medicine, LMU Munich, 81377 Munich, Germany
- Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center, Johannes Gutenberg University, 55101 Mainz, Germany
- Department of Internal Medicine I (Cardiology), Hospital of the Ludwig-Maximilians-University (LMU) Munich, 81377 Munich, Germany
- Christian Gieger
- Institute of Genetic Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, 85764 Neuherberg, Germany
- Institute of Epidemiology II, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Thomas Meitinger
- Institute of Human Genetics, Helmholtz Zentrum München, 85764 Neuherberg, Germany
- Winfried März
- Medizinische Klinik V (Nephrologie, Hypertensiologie, Rheumatologie, Endokrinologie, Diabetologie), Medizinische Fakultät Mannheim der Universität Heidelberg, 69120 Heidelberg, Germany
- Synlab Akademie, Synlab Holding Deutschland GmbH, Mannheim und Augsburg, 86156 Augsburg, Germany
- Andres Metspalu
- Estonian Genome Center, Institute of Genomics, University of Tartu, 51010 Tartu, Estonia
- Institute of Molecular and Cell Biology, University of Tartu, 51010 Tartu, Estonia
- Johan L M Björkegren
- Department of Genetics and Genomic Sciences and Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, One Gustave L. Levy Place, New York, NY 10029, USA
- Integrated Cardio Metabolic Centre, Karolinska Institutet, Huddinge, 14186 Stockholm, Sweden
- Nilesh J Samani
- Department of Cardiovascular Sciences, University of Leicester, BHF Cardiovascular Research Centre, Glenfield Hospital, Groby Rd, Leicester LE3 9QP, UK
- NIHR Leicester Biomedical Research Centre, Glenfield Hospital, Leicester LE3 9QP, UK
- Florian Kronenberg
- Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Innsbruck 6020, Austria
- Bertram Müller-Myhsok
- Department of Translational Research in Psychiatry, Max Planck Institute of Psychiatry, 80804 Munich, Germany
- Munich Cluster of Systems Biology, SyNergy, 81377 Munich, Germany
- Department of Health Data Science, University of Liverpool, Liverpool L69 3BX, UK
- Heribert Schunkert
- Deutsches Herzzentrum München, Klinik für Herz- und Kreislauferkrankungen, Technische Universität München, 80636 Munich, Germany
- Deutsches Zentrum für Herz- und Kreislauf-Forschung (DZHK), Partner Site Hamburg/Lübeck/Kiel, 23562 Lübeck, Germany
4
Blumenthal DB, Baumbach J, Hoffmann M, Kacprowski T, List M. A framework for modeling epistatic interaction. Bioinformatics 2021; 37:1708-1716. [PMID: 33252645 DOI: 10.1093/bioinformatics/btaa990] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/07/2020] [Revised: 10/21/2020] [Accepted: 11/16/2020] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Recently, various tools for detecting single nucleotide polymorphisms (SNPs) involved in epistasis have been developed. However, no studies evaluate the employed statistical epistasis models, such as the χ²-test or quadratic regression, independently of the tools that use them. Such an independent evaluation is crucial for developing improved epistasis detection tools, because it allows one to decide whether a tool's performance should be attributed to the epistasis model or to the optimization strategy run on top of it. RESULTS: We present a protocol for evaluating epistasis models independently of the tools they are used in and generalize existing models designed for dichotomous phenotypes to the categorical and quantitative case. In addition, we propose a new model which scores candidate SNP sets by computing maximum likelihood distributions for the observed phenotypes in the cells of their penetrance tables. Extensive experiments show that the proposed maximum likelihood model outperforms three widely used epistasis models in most cases. The experiments also provide valuable insights into the properties of existing models, for instance, that quadratic regression performs particularly well on instances with quantitative phenotypes. AVAILABILITY AND IMPLEMENTATION: The evaluation protocol and all compared models are implemented in C++ and are supported under Linux and macOS. They are available at https://github.com/baumbachlab/genepiseeker/, along with test datasets and scripts to reproduce the experiments. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
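As a rough illustration of the kind of statistical epistasis model this abstract refers to, the sketch below scores a SNP pair with a Pearson χ² test of independence on its penetrance table (joint genotypes by case/control status). It is not the GenEpiSeeker implementation, which is written in C++; the function name `chi2_pair_score` and the toy genotypes are assumptions made purely for this example.

```python
from collections import Counter


def chi2_pair_score(snp_a, snp_b, phenotype):
    """Pearson chi-square statistic for a SNP pair against a binary phenotype.

    snp_a, snp_b: genotypes coded 0/1/2, one entry per individual.
    phenotype: 0 (control) or 1 (case), one entry per individual.
    Rows of the table are the observed joint genotypes; columns are phenotypes.
    """
    n = len(phenotype)
    rows = Counter(zip(snp_a, snp_b))                   # joint genotype totals
    cols = Counter(phenotype)                           # case/control totals
    cells = Counter(zip(zip(snp_a, snp_b), phenotype))  # observed cell counts
    stat = 0.0
    for geno, row_total in rows.items():
        for pheno, col_total in cols.items():
            expected = row_total * col_total / n
            observed = cells.get((geno, pheno), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical toy data with an XOR-like pattern: neither SNP is
# associated with the phenotype alone, but the pair is.
toy_a = [0, 0, 1, 1] * 5
toy_b = [0, 1, 0, 1] * 5
toy_y = [0, 1, 1, 0] * 5

pair_score = chi2_pair_score(toy_a, toy_b, toy_y)
single_score = chi2_pair_score(toy_a, [0] * len(toy_a), toy_y)
print(pair_score, single_score)
```

Passing a constant second SNP, as in `single_score`, reduces the table to a single-SNP test, which makes the contrast between marginal and joint association easy to see on the toy data.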
Affiliation(s)
- David B Blumenthal
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Jan Baumbach
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Markus Hoffmann
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Tim Kacprowski
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
- Markus List
- Chair of Experimental Bioinformatics, TUM School of Life Sciences, Technical University of Munich, 85354 Freising, Germany
5
Garcia JA, Lohmueller KE. Negative linkage disequilibrium between amino acid changing variants reveals interference among deleterious mutations in the human genome. PLoS Genet 2021; 17:e1009676. [PMID: 34319975 PMCID: PMC8351996 DOI: 10.1371/journal.pgen.1009676] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 03/02/2020] [Revised: 08/09/2021] [Accepted: 06/22/2021] [Indexed: 11/18/2022]
Abstract
Evolutionary forces like Hill-Robertson interference and negative epistasis can lead to deleterious mutations being found on distinct haplotypes. However, the extent to which these forces depend on the selection and dominance coefficients of deleterious mutations and shape genome-wide patterns of linkage disequilibrium (LD) in natural populations with complex demographic histories has not been tested. In this study, we first used forward-in-time simulations to predict how negative selection impacts LD. Under models where deleterious mutations have additive effects on fitness, deleterious variants less than 10 kb apart tend to be carried on different haplotypes relative to pairs of synonymous SNPs. In contrast, for recessive mutations, there is no consistent ordering of how selection coefficients affect LD decay, due to the complex interplay of different evolutionary effects. We then examined empirical data of modern humans from the 1000 Genomes Project. LD between derived alleles at nonsynonymous SNPs is lower compared to pairs of derived synonymous variants, suggesting that nonsynonymous derived alleles tend to occur on different haplotypes more than synonymous variants. This result holds when controlling for potential confounding factors by matching SNPs for frequency in the sample (allele count), physical distance, magnitude of background selection, and genetic distance between pairs of variants. Lastly, we introduce a new statistic HR(j) which allows us to detect interference using unphased genotypes. Application of this approach to high-coverage human genome sequences confirms our finding that nonsynonymous derived alleles tend to be located on different haplotypes more often than are synonymous derived alleles. Our findings suggest that interference may play a pervasive role in shaping patterns of LD between deleterious variants in the human genome, and consequently influences genome-wide patterns of LD.
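To make the notion of negative LD concrete, here is a minimal sketch that computes the standard LD coefficient D and r² for two biallelic sites from phased haplotypes. It illustrates only the textbook definition, not the HR(j) statistic introduced in the paper, which works on unphased genotypes; the function name and the toy haplotypes are assumptions for this example.

```python
def linkage_disequilibrium(haplotypes):
    """Return (D, r2) for two biallelic sites.

    haplotypes: list of (a, b) pairs, one per phased haplotype,
    where 1 marks the derived allele and 0 the ancestral allele.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # derived frequency at site A
    p_b = sum(b for _, b in haplotypes) / n   # derived frequency at site B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # D = P(AB) - P(A)P(B)
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical toy samples: derived alleles never share a haplotype
# (repulsion) versus always share one (coupling).
repulsion = [(1, 0)] * 2 + [(0, 1)] * 2 + [(0, 0)] * 4
coupling = [(1, 1)] * 2 + [(0, 0)] * 6

d_rep, r2_rep = linkage_disequilibrium(repulsion)
d_cpl, r2_cpl = linkage_disequilibrium(coupling)
print(d_rep, d_cpl)
```

A negative D, as in the repulsion sample, corresponds to the pattern the abstract describes, in which derived alleles tend to sit on different haplotypes.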
Affiliation(s)
- Jesse A. Garcia
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Kirk E. Lohmueller
- Interdepartmental Program in Bioinformatics, University of California, Los Angeles, California, United States of America
- Department of Ecology and Evolutionary Biology, University of California, Los Angeles, California, United States of America
- Department of Human Genetics, David Geffen School of Medicine, University of California, Los Angeles, California, United States of America
6
Donlon TA, Chen R, Masaki KH, Willcox DC, Allsopp RC, Willcox BJ, Morris BJ. Association of growth hormone receptor gene variant with longevity in men is due to amelioration of increased mortality risk from hypertension. Aging (Albany NY) 2021; 13:14745-14767. [PMID: 34074802 PMCID: PMC8221335 DOI: 10.18632/aging.203133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/19/2021] [Accepted: 05/20/2021] [Indexed: 12/17/2022]
Abstract
The single nucleotide polymorphism (SNP) rs4130113 of the growth hormone receptor gene (GHR) is associated with longevity. Here we explored whether longevity-associated genotypes protect against mortality in all individuals, or only in individuals with aging-related diseases. Rs4130113 genotypes were tested for association with mortality in 3,557 elderly American men of Japanese ancestry. At baseline (1991–1993), 1,000 had diabetes, 730 had coronary heart disease (CHD), 1,901 had hypertension, 485 had cancer, and 919 lacked these diseases. The men were followed from baseline until Dec 31, 2019 or death (mean 10.8 ± 6.5 SD years, range 0.01–28.8 years; 99.0% deceased by that date). In a heterozygote disadvantage model, longevity-associated genotypes were associated with significantly lower mortality risk in individuals having hypertension (covariate-adjusted hazard ratio [HR] 0.83; 95% CI: 0.76–0.93; p = 4.3 × 10⁻⁴). But in individuals with diabetes, CHD, and cancer there was no genotypic difference in lifespan. As expected, normotensive men outlived men with hypertension (p = 0.036). There was no effect, however, of genotypic difference on lifespan in normotensive men (p = 0.11). We found that SNP rs4130113 potentially influenced the binding of transcription factors E2A, MYF, NRSF, TAL1, and TCF12 so as to alter GHR expression. We propose that in individuals with hypertension, longevity-associated genetic variation in GHR enhances cell resilience mechanisms to help protect against cellular stress caused by hypertension. As a result, hypertension-affected men who possess the longevity-associated genetic variant of GHR live as long as normotensive men.
Affiliation(s)
- Timothy A Donlon
- Department of Research, Kuakini Medical Center, Honolulu, HI 96817, USA; Department of Cell and Molecular Biology, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96813, USA; Department of Pathology, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96813, USA
- Randi Chen
- Department of Research, Kuakini Medical Center, Honolulu, HI 96817, USA
- Kamal H Masaki
- Department of Research, Kuakini Medical Center, Honolulu, HI 96817, USA; Department of Geriatric Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96817, USA
- D Craig Willcox
- Department of Research, Kuakini Medical Center, Honolulu, HI 96817, USA; Department of Human Welfare, Okinawa International University, Ginowan, Okinawa 901-2701, Japan
- Richard C Allsopp
- Institute for Biogenesis Research, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96822, USA
- Bradley J Willcox
- Department of Research, Kuakini Medical Center, Honolulu, HI 96817, USA; Department of Geriatric Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96817, USA
- Brian J Morris
- Department of Research, Kuakini Medical Center, Honolulu, HI 96817, USA; Department of Geriatric Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96817, USA; School of Medical Sciences, University of Sydney, Sydney, New South Wales 2006, Australia
7
A differential evolution based feature combination selection algorithm for high-dimensional data. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2020.08.081] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/19/2022]
8
Yan KK, Zhao H, Wu JT, Pang H. An enhanced machine learning tool for cis-eQTL mapping with regularization and confounder adjustments. Genet Epidemiol 2020; 44:798-810. [PMID: 32700329 PMCID: PMC7875251 DOI: 10.1002/gepi.22341] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2020] [Revised: 07/07/2020] [Accepted: 07/07/2020] [Indexed: 11/07/2022]
Abstract
Many expression quantitative trait loci (eQTL) studies have been conducted to investigate the biological effects of variants in gene regulation. However, these eQTL studies may suffer from low or moderate statistical power and overly conservative false-discovery rate control. In practice, most algorithms for eQTL identification do not model the joint effects of multiple genetic variants with weak or moderate influence. Here we present a novel machine-learning algorithm, the lasso least-squares kernel machine (LSKM-LASSO), that models the association between multiple genetic variants and phenotypic traits simultaneously in the presence of nongenetic and genetic confounding. With a more general and flexible framework for the estimation of genetic confounding, LSKM-LASSO is able to provide a more accurate evaluation of the joint effects of multiple genetic variants. Our simulations demonstrate that our approach outperforms three state-of-the-art alternatives in terms of eQTL identification and phenotype prediction. We then apply our method to genotype and gene expression data of 11 tissues obtained from the Genotype-Tissue Expression project. Our algorithm was able to identify more genes with eQTL than other algorithms. By incorporating a regularization term and combining it with a least-squares kernel machine, LSKM-LASSO provides a powerful tool for eQTL mapping and phenotype prediction.
Affiliation(s)
- Kang K. Yan
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Hongyu Zhao
- Department of Biostatistics, Yale University, New Haven, Connecticut
- Joseph T. Wu
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
- Herbert Pang
- School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China
9
Rau CD, Gonzales NM, Bloom JS, Park D, Ayroles J, Palmer AA, Lusis AJ, Zaitlen N. Modeling epistasis in mice and yeast using the proportion of two or more distinct genetic backgrounds: Evidence for "polygenic epistasis". PLoS Genet 2020; 16:e1009165. [PMID: 33104702 PMCID: PMC7644088 DOI: 10.1371/journal.pgen.1009165] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/21/2020] [Revised: 11/05/2020] [Accepted: 10/02/2020] [Indexed: 12/22/2022]
Abstract
Background The majority of quantitative genetic models used to map complex traits assume that alleles have similar effects across all individuals. Significant evidence suggests, however, that epistatic interactions modulate the impact of many alleles. Nevertheless, identifying epistatic interactions remains computationally and statistically challenging. In this work, we address some of these challenges by developing a statistical test for polygenic epistasis that determines whether the effect of an allele is altered by the global genetic ancestry proportion from distinct progenitors. Results We applied our method to data from mice and yeast. For the mice, we observed 49 significant genotype-by-ancestry interaction associations across 14 phenotypes as well as over 1,400 Bonferroni-corrected genotype-by-ancestry interaction associations for mouse gene expression data. For the yeast, we observed 92 significant genotype-by-ancestry interactions across 38 phenotypes. Given this evidence of epistasis, we test for and observe evidence of rapid selection pressure on ancestry-specific polymorphisms within one of the cohorts, consistent with epistatic selection. Conclusions Unlike our prior work in human populations, we observe widespread evidence of ancestry-modified SNP effects, perhaps reflecting the greater divergence present in crosses using mice and yeast.
Many statistical tests that link genetic markers in the genome to differences in traits rely on the assumption that the same polymorphism will have identical effects in different individuals. However, there is substantial evidence indicating that this is not the case. Epistasis is the phenomenon in which multiple polymorphisms interact with one another to amplify or negate each other’s effects on a trait. We hypothesized that individual SNP effects could be changed in a polygenic manner, such that the proportion of genetic ancestry, rather than specific markers, might be used to capture epistatic interactions. Motivated by this possibility, we develop a new statistical test that allows us to examine the genome to identify polymorphisms which have different effects depending on the ancestral makeup of each individual. We use our test in two different populations of inbred mice and a yeast panel and demonstrate that these sorts of variable-effect polymorphisms exist in 14 different physical traits in mice and 38 phenotypes in yeast as well as in murine gene expression. We use the term “polygenic epistasis” to distinguish these interactions from the more conventional two- or multi-locus interactions.
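As an illustration of what a genotype-by-ancestry interaction test can look like, the following is a minimal sketch that assumes a plain ordinary least-squares model, y = b0 + b1·g + b2·θ + b3·g·θ, where g is the genotype and θ is the global ancestry proportion. The function names `solve` and `interaction_test` are hypothetical, and the sketch omits any correction for relatedness or population structure that the published method may include.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def interaction_test(y, g, theta):
    """Fit y = b0 + b1*g + b2*theta + b3*g*theta by least squares.

    Returns (b3, t) where t is the t-statistic of the interaction term b3.
    Assumption: independent samples and homoscedastic noise.
    """
    rows = [[1.0, gi, ti, gi * ti] for gi, ti in zip(g, theta)]
    p = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    sigma2 = rss / (len(y) - p)
    # Last column of (X'X)^-1 gives the variance factor for b3.
    inv_col = solve(xtx, [0.0, 0.0, 0.0, 1.0])
    se = math.sqrt(sigma2 * inv_col[3])
    return beta[3], beta[3] / se
```

In a genome-wide scan the statistic would be computed per marker and the resulting p-values corrected for multiple testing, for example with the Bonferroni correction mentioned in the abstract.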
Affiliation(s)
- Christoph D. Rau
- Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States of America
| | - Natalia M. Gonzales
- Department of Human Genetics, University of Chicago, Chicago, IL, United States of America
| | - Joshua S. Bloom
- Department of Human Genetics, UCLA, Los Angeles, CA, United States of America
| | - Danny Park
- Department of Medicine, UCSF, San Francisco, CA, United States of America
| | - Julien Ayroles
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ, United States of America
| | - Abraham A. Palmer
- Department of Psychiatry, and Institute for Genomic Medicine, UCSD, San Diego, CA, United States of America
| | - Aldons J. Lusis
- Department of Human Genetics, UCLA, Los Angeles, CA, United States of America
| | - Noah Zaitlen
- Department of Neurology, UCLA, Los Angeles, CA, United States of America
|
10
|
Kang J, Coates JT, Strawderman RL, Rosenstein BS, Kerns SL. Genomics models in radiotherapy: From mechanistic to machine learning. Med Phys 2020; 47:e203-e217. [PMID: 32418335 PMCID: PMC8725063 DOI: 10.1002/mp.13751] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 04/17/2019] [Revised: 06/28/2019] [Accepted: 07/17/2019] [Indexed: 12/28/2022] Open
Abstract
Machine learning (ML) provides a broad framework for addressing high-dimensional prediction problems in classification and regression. While ML is often applied for imaging problems in medical physics, there are many efforts to apply these principles to biological data toward questions of radiation biology. Here, we provide a review of radiogenomics modeling frameworks and efforts toward genomically guided radiotherapy. We first discuss medical oncology efforts to develop precision biomarkers. We next discuss similar efforts to create clinical assays for normal tissue or tumor radiosensitivity. We then discuss modeling frameworks for radiosensitivity and the evolution of ML to create predictive models for radiogenomics.
Affiliation(s)
- John Kang
- Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY 14642, USA
| | - James T. Coates
- CRUK/MRC Oxford Institute for Radiation Oncology, University of Oxford, Oxford OX3 7DQ, UK
| | - Robert L. Strawderman
- Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA
| | - Barry S. Rosenstein
- Department of Radiation Oncology and the Department of Genetics and Genomic Sciences, Icahn Institute for Data Science and Genomic Technology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sarah L. Kerns
- Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY 14642, USA
|
11
|
Xu L, Wu J, Lu W, Yang C, Liu H. Application of the Albumin-Bilirubin Grade in Predicting the Prognosis of Patients With Hepatocellular Carcinoma: A Systematic Review and Meta-Analysis. Transplant Proc 2019; 51:3338-3346. [PMID: 31732203 DOI: 10.1016/j.transproceed.2019.08.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/24/2019] [Accepted: 08/30/2019] [Indexed: 02/05/2023]
Abstract
BACKGROUND The albumin-bilirubin (ALBI) grade has performed as well as the Child-Pugh (C-P) grade in predicting overall survival (OS) of patients with hepatocellular carcinoma (HCC). However, available published results of the ALBI grade in predicting the prognosis of HCC are still limited. The goal of this study is to perform a systematic review and meta-analysis of the available data to comprehensively evaluate the ALBI grade in predicting OS of patients with HCC. METHODS Multiple databases were systematically searched for eligible studies. Studies analyzing the relationship between the ALBI grade and survival outcome were identified. Hazard ratio (HR) with 95% confidence interval (CI) was calculated to assess the risk. All statistical analyses were conducted by R version 3.3.1 (The R Foundation for Statistical Computing, Vienna, Austria). RESULTS A total of 8 studies were enrolled in the meta-analysis. The pooled estimates demonstrated a significant relationship between elevated ALBI grade and inferior OS in patients with HCC (grade 1 vs 2: HR = 1.71, 95% CI: 1.52-1.92; grade 1 vs 3: HR = 3.81, 95% CI: 2.75-5.29). In addition, the same tendency was observed when performing subgroup analysis, including treatment strategies (surgical resection, transcatheter arterial chemoembolization, radiofrequency ablation, and sorafenib) and study regions (Japan, Europe, China, and the USA). Moreover, the ALBI grade was able to classify patients with C-P grade A into 2 distinct prognostic cohorts, ALBI grade 1 and ALBI grade 2, with distinguishing survival outcomes (surgical resection: grade 1 vs 2: HR = 1.74, 95% CI: 1.55-2.06, P < .001; sorafenib: grade 1 vs 2: HR = 1.54, 95% CI: 1.30-1.82, P < .001). CONCLUSION The ALBI grade has the potential to become an independent prognostic factor in patients with HCC. More well-designed studies should be performed to evaluate the ALBI grade as a complementary prognostic tool to current staging systems in routine clinical practice.
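Pooled hazard ratios of the kind reported above are commonly obtained by inverse-variance weighting on the log scale. The sketch below assumes a fixed-effect model and recovers each study's standard error from its 95% CI. The abstract does not say whether a fixed-effect or random-effects model was used, so this is an illustration of the general technique, not the authors' analysis. The function name `pool_hazard_ratios` is hypothetical.

```python
import math

def pool_hazard_ratios(studies, z=1.96):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    studies: list of (hr, ci_low, ci_high) tuples with 95% confidence limits.
    Returns the pooled (hr, ci_low, ci_high).
    """
    num = 0.0
    den = 0.0
    for hr, lo, hi in studies:
        # The CI width on the log scale is 2 * z * SE.
        se = (math.log(hi) - math.log(lo)) / (2 * z)
        weight = 1.0 / se ** 2
        num += weight * math.log(hr)
        den += weight
    log_hr = num / den
    se_pooled = math.sqrt(1.0 / den)
    return (math.exp(log_hr),
            math.exp(log_hr - z * se_pooled),
            math.exp(log_hr + z * se_pooled))
```

Pooling two studies with identical estimates leaves the hazard ratio unchanged and narrows the confidence interval, which is the expected behaviour of inverse-variance weighting.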
Affiliation(s)
- Lin Xu
- Laboratory of Liver Surgery, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
| | - Jing Wu
- Integrated TCM and Western Medicine Department, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
| | - Wenzhu Lu
- Integrated TCM and Western Medicine Department, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
| | - Chunmei Yang
- Integrated TCM and Western Medicine Department, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China
| | - Hong Liu
- Integrated TCM and Western Medicine Department, West China Hospital, Sichuan University, Chengdu, Sichuan, People's Republic of China.
|
12
|
Petty LE, Highland HM, Gamazon ER, Hu H, Karhade M, Chen HH, de Vries PS, Grove ML, Aguilar D, Bell GI, Huff CD, Hanis CL, Doddapaneni H, Munzy DM, Gibbs RA, Ma J, Parra EJ, Cruz M, Valladares-Salgado A, Arking DE, Barbeira A, Im HK, Morrison AC, Boerwinkle E, Below JE. Functionally oriented analysis of cardiometabolic traits in a trans-ethnic sample. Hum Mol Genet 2019; 28:1212-1224. [PMID: 30624610 PMCID: PMC6423424 DOI: 10.1093/hmg/ddy435] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/24/2018] [Revised: 11/13/2018] [Accepted: 11/20/2018] [Indexed: 01/02/2023] Open
Abstract
Interpretation of genetic association results is difficult because signals often lack biological context. To generate hypotheses of the functional genetic etiology of complex cardiometabolic traits, we estimated the genetically determined component of gene expression from common variants using PrediXcan (1) and determined genes with differential predicted expression by trait. PrediXcan imputes tissue-specific expression levels from genetic variation using variant-level effect on gene expression in transcriptome data. To explore the value of imputed genetically regulated gene expression (GReX) models across different ancestral populations, we evaluated imputed expression levels for predictive accuracy genome-wide in RNA sequence data in samples drawn from European-ancestry and African-ancestry populations and identified substantial predictive power using European-derived models in a non-European target population. We then tested the association of GReX on 15 cardiometabolic traits including blood lipid levels, body mass index, height, blood pressure, fasting glucose and insulin, RR interval, fibrinogen level, factor VII level and white blood cell and platelet counts in 15 755 individuals across three ancestry groups, resulting in 20 novel gene-phenotype associations reaching experiment-wide significance across ancestries. In addition, we identified 18 significant novel gene-phenotype associations in our ancestry-specific analyses. Top associations were assessed for additional support via query of S-PrediXcan (2) results derived from publicly available genome-wide association studies summary data. Collectively, these findings illustrate the utility of transcriptome-based imputation models for discovery of cardiometabolic effect genes in a diverse dataset.
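The imputation step described above amounts to a weighted sum of allele dosages. The sketch below is a simplified illustration under stated assumptions: the function name `impute_expression`, the variant identifiers, and the rule that a missing variant contributes zero are all hypothetical, and this is not the PrediXcan software interface.

```python
def impute_expression(weights, dosages):
    """Predict genetically regulated expression for one gene in one individual.

    weights: dict mapping variant id -> effect of the variant on expression
             (assumed to come from a model trained on reference transcriptome data).
    dosages: dict mapping variant id -> alternate-allele dosage (0 to 2).
    Variants absent from `dosages` are treated as dosage 0 (an assumption).
    """
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())

# Hypothetical weights and dosages, for illustration only.
example_weights = {"variant_a": 0.5, "variant_b": -0.2}
example_dosages = {"variant_a": 2, "variant_b": 1, "variant_c": 2}
predicted = impute_expression(example_weights, example_dosages)
```

The predicted values would then be tested for association with each trait in place of individual variant genotypes.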
Affiliation(s)
- Lauren E Petty
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.,Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Heather M Highland
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA.,Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA
| | - Eric R Gamazon
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.,Clare Hall, University of Cambridge, Cambridge, UK
| | - Hao Hu
- Department of Epidemiology, MD Anderson Cancer Center, Houston, TX, USA
| | - Mandar Karhade
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Hung-Hsin Chen
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.,Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Paul S de Vries
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Megan L Grove
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - David Aguilar
- Department of Cardiology, Baylor College of Medicine Houston, TX, USA
| | - Graeme I Bell
- Departments of Medicine and Human Genetics, The University of Chicago, Chicago, IL, USA
| | - Chad D Huff
- Department of Epidemiology, MD Anderson Cancer Center, Houston, TX, USA
| | - Craig L Hanis
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | | | - Donna M Munzy
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Richard A Gibbs
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
| | - Jianzhong Ma
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Esteban J Parra
- Department of Anthropology, University of Toronto at Mississauga, Mississauga, Ontario, Canada
| | - Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, IMSS, Mexico City, Mexico
| | - Adan Valladares-Salgado
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, IMSS, Mexico City, Mexico
| | - Dan E Arking
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Alvaro Barbeira
- Section of Genetic Medicine, Department of Medicine, University of Chicago, IL, USA
| | - Hae Kyung Im
- Section of Genetic Medicine, Department of Medicine, University of Chicago, IL, USA
| | - Alanna C Morrison
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Eric Boerwinkle
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Jennifer E Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA.,Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
|
13
|
Chen HH, Petty LE, Bush W, Naj AC, Below JE. GWAS and Beyond: Using Omics Approaches to Interpret SNP Associations. Current Genetic Medicine Reports 2019; 7:30-40. [PMID: 33312764 PMCID: PMC7731888 DOI: 10.1007/s40142-019-0159-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
PURPOSE OF REVIEW Neurodegenerative diseases, neuropsychiatric disorders, and related traits have highly complex etiologies but are also highly heritable and identifying the causal genes and biological pathways underlying these traits may advance the development of treatments and preventive strategies. While many genome-wide association studies (GWAS) have successfully identified variants contributing to polygenic neurodegenerative and neuropsychiatric phenotypes including Alzheimer's disease (AD), schizophrenia (SCZ), and bipolar disorder (BPD) amongst others, interpreting the biological roles of significantly-associated variants in the genetic architecture of these traits remains a significant challenge. Here we review several 'omics' approaches which attempt to bridge the gap from associated genetic variants to phenotype by helping define the functional roles of GWAS loci in the development of neuropsychiatric disorders and traits. RECENT FINDINGS Several common 'omics' approaches have been applied to examine neuropsychiatric traits, such as nearest-gene mapping, trans-ethnic fine mapping, annotation enrichment analysis, transcriptomic analysis, and pathway analysis, and each of these approaches has strengths and limitations in providing insight into biological mechanisms. One popular emerging method is the examination of tissue-specific genetically-regulated gene expression (GReX), which aggregates the genetic variants' effects at the gene-level. Furthermore, proteomic, metabolomic, and microbiomic studies and phenome-wide association studies will further enhance our understanding of neuropsychiatric traits. SUMMARY GWAS has been applied to neuropsychiatric traits for a decade, but our understanding about the biological function of identified variants remains limited. 
Today, technological advancements have created analytical approaches for integrating transcriptomics, metabolomics, proteomics, pharmacology and toxicology as tools for understanding the functional roles of genetic variants. These data, as well as the broader clinical information provided by electronic health records, can provide additional insight and complement genomic analyses.
Affiliation(s)
- Hung-Hsin Chen
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - Lauren E. Petty
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
| | - William Bush
- Institute for Computational Biology, Department of Population and Quantitative Health Sciences, School of Medicine, Case Western Reserve University, Cleveland, OH, USA
| | - Adam C. Naj
- Department of Biostatistics, Epidemiology, and Informatics; Department of Pathology and Laboratory Medicine; Center for Clinical Epidemiology and Biostatistics; Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
| | - Jennifer E. Below
- Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, USA
|
14
|
Guan B, Zhao Y. Self-Adjusting Ant Colony Optimization Based on Information Entropy for Detecting Epistatic Interactions. Genes (Basel) 2019; 10:genes10020114. [PMID: 30717303 PMCID: PMC6409693 DOI: 10.3390/genes10020114] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/14/2018] [Revised: 01/21/2019] [Accepted: 01/28/2019] [Indexed: 12/15/2022] Open
Abstract
The epistatic interactions of single nucleotide polymorphisms (SNPs) are considered to be an important factor in determining the susceptibility of individuals to complex diseases. Although many methods have been proposed to detect such interactions, the development of detection algorithms is still ongoing because of the computational burden in large-scale association studies. In this paper, to deal with the intensive computing problem of detecting epistatic interactions in large-scale datasets, a self-adjusting ant colony optimization based on information entropy (IEACO) is proposed. The algorithm automatically adjusts its path selection strategy according to the real-time information entropy. The performance of IEACO is compared with that of ant colony optimization (ACO), AntEpiSeeker, AntMiner, and epiACO on a set of simulated datasets and a real genome-wide dataset. The results of extensive experiments show that the proposed method is superior to the other methods.
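The quantity that such a self-adjusting scheme monitors can be illustrated with the Shannon entropy of the selection probabilities implied by the pheromone levels. The sketch below is an assumption-laden illustration, not the IEACO algorithm itself: the function name `normalized_entropy` and the normalization by log(n) are choices made here for clarity.

```python
import math

def normalized_entropy(pheromone):
    """Shannon entropy of pheromone-proportional selection probabilities.

    pheromone: list of non-negative pheromone levels, one per candidate SNP.
    Returns a value in [0, 1]: 1 when all candidates are equally likely
    (pure exploration), 0 when one candidate holds all the pheromone.
    """
    if len(pheromone) < 2:
        return 0.0
    total = sum(pheromone)
    probs = [p / total for p in pheromone if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(pheromone))
```

A high value suggests the colony is still exploring broadly, while a low value suggests it has converged on a few candidates. How the published method maps entropy to a path selection strategy is not stated in the abstract.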
Affiliation(s)
- Boxin Guan
- Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China.
| | - Yuhai Zhao
- Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, and School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China.
|
15
|
Arabnejad M, Dawkins BA, Bush WS, White BC, Harkness AR, McKinney BA. Transition-transversion encoding and genetic relationship metric in ReliefF feature selection improves pathway enrichment in GWAS. BioData Min 2018; 11:23. [PMID: 30410580 PMCID: PMC6215626 DOI: 10.1186/s13040-018-0186-4] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 07/08/2018] [Accepted: 10/22/2018] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND ReliefF is a nearest-neighbor based feature selection algorithm that efficiently detects variants that are important due to statistical interactions or epistasis. For categorical predictors, like genotypes, the standard metric used in ReliefF has been a simple (binary) mismatch difference. In this study, we develop new metrics of varying complexity that incorporate allele sharing, adjustment for allele frequency heterogeneity via the genetic relationship matrix (GRM), and physicochemical differences of variants via a new transition/transversion encoding. METHODS We introduce a new two-dimensional transition/transversion genotype encoding for ReliefF, and we implement three ReliefF attribute metrics: (1) genotype mismatch (GM), which is the ReliefF standard, (2) allele mismatch (AM), which accounts for heterozygous differences and has not been used previously in ReliefF, and (3) the new transition/transversion metric. We incorporate these attribute metrics into the ReliefF nearest neighbor calculation with a Manhattan metric, and we introduce GRM as a new ReliefF nearest-neighbor metric to adjust for allele frequency heterogeneity. RESULTS We apply ReliefF with each metric to a GWAS of major depressive disorder and compare the detection of genes in pathways implicated in depression, including Axon Guidance, Neuronal System, and G Protein-Coupled Receptor Signaling. We also compare with detection by Random Forest and Lasso as well as random/null selection to assess pathway size bias. CONCLUSIONS Our results suggest that using more genetically motivated encodings, such as transition/transversion, and metrics that adjust for allele frequency heterogeneity, such as GRM, leads to ReliefF attribute scores with improved pathway enrichment.
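The difference between the genotype mismatch and allele mismatch metrics can be sketched as follows, assuming genotypes coded as minor-allele counts 0, 1, or 2. The exact definitions, in particular the scaling of the allele mismatch to the range 0 to 1, are an assumption made for illustration, and the function names are hypothetical.

```python
def genotype_mismatch(a, b):
    """Binary difference: 0 if the two genotypes are identical, 1 otherwise."""
    return 0.0 if a == b else 1.0

def allele_mismatch(a, b):
    """Fraction of alleles that differ, so a heterozygote sits halfway
    between the two homozygotes (assumed scaling: |a - b| / 2)."""
    return abs(a - b) / 2.0

def manhattan_distance(x, y, diff):
    """Sum of per-variant differences between two individuals,
    as used to find nearest neighbors."""
    return sum(diff(a, b) for a, b in zip(x, y))

# Two hypothetical individuals genotyped at four variants.
person_1 = [0, 1, 2, 2]
person_2 = [0, 2, 0, 2]
gm_distance = manhattan_distance(person_1, person_2, genotype_mismatch)
am_distance = manhattan_distance(person_1, person_2, allele_mismatch)
```

Under the genotype mismatch the second and third variants each add 1 to the distance, whereas under the allele mismatch the heterozygous difference at the second variant adds only 0.5.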
Affiliation(s)
- M. Arabnejad
- Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA
| | - B. A. Dawkins
- Department of Mathematics, The University of Tulsa, Tulsa, OK 74104 USA
| | - W. S. Bush
- Institute for Computational Biology, Case Western Reserve University, 2103 Cornell Road, Cleveland, OH 44106 USA
| | - B. C. White
- Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA
| | - A. R. Harkness
- Department of Psychology, The University of Tulsa, Tulsa, OK 74104 USA
| | - B. A. McKinney
- Tandy School of Computer Science, The University of Tulsa, 800 S. Tucker Dr, Tulsa, OK 74104 USA
- Department of Mathematics, The University of Tulsa, Tulsa, OK 74104 USA
|
16
|
Campbell RF, McGrath PT, Paaby AB. Analysis of Epistasis in Natural Traits Using Model Organisms. Trends Genet 2018; 34:883-898. [PMID: 30166071 PMCID: PMC6541385 DOI: 10.1016/j.tig.2018.08.002] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/17/2018] [Revised: 06/06/2018] [Accepted: 08/03/2018] [Indexed: 12/16/2022]
Abstract
The ability to detect and understand epistasis in natural populations is important for understanding how biological traits are influenced by genetic variation. However, identification and characterization of epistasis in natural populations remain difficult due to statistical issues that arise as a result of multiple comparisons, and the fact that most genetic variants segregate at low allele frequencies. In this review, we discuss how model organisms may be used to manipulate genotypic combinations to power the detection of epistasis as well as test interactions between specific genes. Findings from a number of species indicate that statistical epistasis is pervasive between natural genetic variants. However, the properties of experimental systems that enable analysis of epistasis also constrain extrapolation of these results back into natural populations.
Affiliation(s)
- Richard F Campbell
- Department of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332 USA
| | - Patrick T McGrath
- Department of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332 USA; Department of Physics, Georgia Institute of Technology, Atlanta, GA, 30332 USA.
| | - Annalise B Paaby
- Department of Biological Sciences, Georgia Institute of Technology, Atlanta, GA, 30332 USA
|
17
|
Albert FW, Bloom JS, Siegel J, Day L, Kruglyak L. Genetics of trans-regulatory variation in gene expression. eLife 2018; 7:e35471. [PMID: 30014850 PMCID: PMC6072440 DOI: 10.7554/elife.35471] [Citation(s) in RCA: 95] [Impact Index Per Article: 15.8] [Received: 01/28/2018] [Accepted: 06/30/2018] [Indexed: 12/02/2022] Open
Abstract
Heritable variation in gene expression forms a crucial bridge between genomic variation and the biology of many traits. However, most expression quantitative trait loci (eQTLs) remain unidentified. We mapped eQTLs by transcriptome sequencing in 1012 yeast segregants. The resulting eQTLs accounted for over 70% of the heritability of mRNA levels, allowing comprehensive dissection of regulatory variation. Most genes had multiple eQTLs. Most expression variation arose from trans-acting eQTLs distant from their target genes. Nearly all trans-eQTLs clustered at 102 hotspot locations, some of which influenced the expression of thousands of genes. Fine-mapped hotspot regions were enriched for transcription factor genes. While most genes had a local eQTL, most of these had no detectable effects on the expression of other genes in trans. Hundreds of non-additive genetic interactions accounted for small fractions of expression variation. These results reveal the complexity of genetic influences on transcriptome variation in unprecedented depth and detail.
Affiliation(s)
- Frank Wolfgang Albert
- Department of Genetics, Cell Biology and Development, University of Minnesota, Minneapolis, United States
| | - Joshua S Bloom
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
| | - Jake Siegel
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
| | - Laura Day
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
| | - Leonid Kruglyak
- Department of Human Genetics, University of California, Los Angeles, Los Angeles, United States
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, United States
- Howard Hughes Medical Institute, Los Angeles, United States
|
18
|
Zekavat SM, Ruotsalainen S, Handsaker RE, Alver M, Bloom J, Poterba T, Seed C, Ernst J, Chaffin M, Engreitz J, Peloso GM, Manichaikul A, Yang C, Ryan KA, Fu M, Johnson WC, Tsai M, Budoff M, Vasan RS, Cupples LA, Rotter JI, Rich SS, Post W, Mitchell BD, Correa A, Metspalu A, Wilson JG, Salomaa V, Kellis M, Daly MJ, Neale BM, McCarroll S, Surakka I, Esko T, Ganna A, Ripatti S, Kathiresan S, Natarajan P. Deep coverage whole genome sequences and plasma lipoprotein(a) in individuals of European and African ancestries. Nat Commun 2018; 9:2606. [PMID: 29973585 PMCID: PMC6031652 DOI: 10.1038/s41467-018-04668-w] [Citation(s) in RCA: 66] [Impact Index Per Article: 11.0] [Received: 12/08/2017] [Accepted: 05/15/2018] [Indexed: 02/06/2023] Open
Abstract
Lipoprotein(a), Lp(a), is a modified low-density lipoprotein particle that contains apolipoprotein(a), encoded by LPA, and is a highly heritable, causal risk factor for cardiovascular diseases that varies in concentrations across ancestries. Here, we use deep-coverage whole genome sequencing in 8392 individuals of European and African ancestry to discover and interpret both single-nucleotide variants and copy number (CN) variation associated with Lp(a). We observe that Europeans and Africans each have several unique genetic determinants of Lp(a). The common variant rs12740374 associated with Lp(a) cholesterol is an eQTL for SORT1 and independent of LDL cholesterol. Observed associations of aggregates of rare non-coding variants are largely explained by LPA structural variation, namely the LPA kringle IV 2 (KIV2)-CN. Finally, we find that LPA risk genotypes confer greater relative risk for incident atherosclerotic cardiovascular diseases compared to directly measured Lp(a), and are significantly associated with measures of subclinical atherosclerosis in African Americans.
Affiliation(s)
- Seyedeh M Zekavat
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Yale School of Medicine, New Haven, CT, 06510, USA
- Department of Computational Biology & Bioinformatics, Yale University, New Haven, CT, 06510, USA
| | - Sanni Ruotsalainen
- Institute for Molecular Medicine, University of Helsinki, Helsinki, Finland
| | - Robert E Handsaker
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Maris Alver
- Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Estonian Genome Center, Tallinn, Estonia
| | - Jonathan Bloom
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Boston, MA, 02142, USA
| | - Timothy Poterba
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Boston, MA, 02142, USA
| | - Cotton Seed
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Boston, MA, 02142, USA
| | - Jason Ernst
- Department of Biological Chemistry, University of California, Los Angeles, Los Angeles, CA, 90095, USA
| | - Mark Chaffin
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Jesse Engreitz
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
| | - Ani Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22904, USA
| | - Chaojie Yang
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22904, USA
| | - Kathleen A Ryan
- Program in Personalized and Genomic Medicine, Division of Endocrinology, Diabetes & Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Mao Fu
- Program in Personalized and Genomic Medicine, Division of Endocrinology, Diabetes & Nutrition, Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - W Craig Johnson
- Department of Biostatistics, School of Public Health and Community Medicine, University of Washington, Seattle, WA, 98195, USA
| | - Michael Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Matthew Budoff
- Division of Cardiology, Harbor-UCLA Medical Center, Los Angeles Biomedical Research Institute, Los Angeles, CA, 90509, USA
| | - Ramachandran S Vasan
- NHLBI Framingham Heart Study, Framingham, MA, 20892, USA
- Sections of Preventive medicine and Epidemiology, and cardiovascular medicine, Departments of Medicine and Epidemiology, Boston university Schools of Medicine and Public health, Boston, MA, 02118, USA
| | - L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, 02118, USA
- NHLBI Framingham Heart Study, Framingham, MA, 20892, USA
| | - Jerome I Rotter
- Departments of Pediatrics and Medicine, The Institute for Translational Genomics and Population Sciences, Los Angeles Biomedical Research Institute, Harbor-UCLA Medical Center, Torrance, CA, 90509, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22904, USA
| | - Wendy Post
- Division of Cardiology, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, 21201, USA
| | - Adolfo Correa
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, 39216, USA
| | | | - James G Wilson
- Department of Medicine, University of Mississippi Medical Center, Jackson, MS, 39216, USA
| | - Veikko Salomaa
- National Institute for Health and Welfare, Helsinki, Finland
| | - Manolis Kellis
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, 32 Vassar St, Cambridge, MA, 02139, USA
| | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Boston, MA, 02142, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Boston, MA, 02142, USA
| | - Steven McCarroll
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Department of Genetics, Harvard Medical School, Boston, MA, 02115, USA
| | - Ida Surakka
- Institute for Molecular Medicine, University of Helsinki, Helsinki, Finland
| | - Tonu Esko
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Estonian Genome Center, Tallinn, Estonia
| | - Andrea Ganna
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Analytic and Translational Genetics Unit, Boston, MA, 02142, USA
| | - Samuli Ripatti
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA
- Institute for Molecular Medicine, University of Helsinki, Helsinki, Finland
- Department of Public Health, Faculty of Medicine, University of Helsinki, Helsinki, Finland
| | - Sekar Kathiresan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA.
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA.
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA.
| | - Pradeep Natarajan
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA, 02142, USA.
- Department of Medicine, Harvard Medical School, Boston, MA, 02115, USA.
- Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, 02114, USA.
- Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, 02114, USA.
|
19
|
Kang J, Rancati T, Lee S, Oh JH, Kerns SL, Scott JG, Schwartz R, Kim S, Rosenstein BS. Machine Learning and Radiogenomics: Lessons Learned and Future Directions. Front Oncol 2018; 8:228. [PMID: 29977864 PMCID: PMC6021505 DOI: 10.3389/fonc.2018.00228] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 02/16/2018] [Accepted: 06/04/2018] [Indexed: 12/25/2022] Open
Abstract
Due to the rapid increase in the availability of patient data, there is significant interest in precision medicine that could facilitate the development of a personalized treatment plan for each patient on an individual basis. Radiation oncology is particularly suited for predictive machine learning (ML) models due to the enormous amount of diagnostic data used as input and therapeutic data generated as output. An emerging field in precision radiation oncology that can take advantage of ML approaches is radiogenomics, which is the study of the impact of genomic variations on the sensitivity of normal and tumor tissue to radiation. Currently, patients undergoing radiotherapy are treated using uniform dose constraints specific to the tumor and surrounding normal tissues. This is suboptimal in many ways. First, the dose that can be delivered to the target volume may be insufficient for control but is constrained by the surrounding normal tissue, as dose escalation can lead to significant morbidity and rare. Second, two patients with nearly identical dose distributions can have substantially different acute and late toxicities, resulting in lengthy treatment breaks and suboptimal control, or chronic morbidities leading to poor quality of life. Despite significant advances in radiogenomics, the magnitude of the genetic contribution to radiation response far exceeds our current understanding of individual risk variants. In the field of genomics, ML methods are being used to extract harder-to-detect knowledge, but these methods have yet to fully penetrate radiogenomics. Hence, the goal of this publication is to provide an overview of ML as it applies to radiogenomics. We begin with a brief history of radiogenomics and its relationship to precision medicine. We then introduce ML and compare it to statistical hypothesis testing to reflect on shared lessons and to avoid common pitfalls. Current ML approaches to genome-wide association studies are examined. 
The application of ML specifically to radiogenomics is next presented. We end with important lessons for the proper integration of ML into radiogenomics.
Affiliation(s)
- John Kang
- Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY, United States
| | - Tiziana Rancati
- Prostate Cancer Program, Fondazione IRCCS Istituto Nazionale dei Tumori, Milan, Italy
| | - Sangkyu Lee
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Jung Hun Oh
- Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, United States
| | - Sarah L. Kerns
- Department of Radiation Oncology, University of Rochester Medical Center, Rochester, NY, United States
| | - Jacob G. Scott
- Department of Translational Hematology and Oncology Research, Cleveland Clinic, Cleveland, OH, United States
- Department of Radiation Oncology, Cleveland Clinic, Cleveland, OH, United States
| | - Russell Schwartz
- Computational Biology Department, Carnegie Mellon School of Computer Science, Pittsburgh, PA, United States
- Department of Biological Sciences, Carnegie Mellon University, Pittsburgh, PA, United States
| | - Seyoung Kim
- Computational Biology Department, Carnegie Mellon School of Computer Science, Pittsburgh, PA, United States
| | - Barry S. Rosenstein
- Department of Radiation Oncology, Icahn School of Medicine at Mount Sinai, New York, NY, United States
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
|
20
|
Weissbrod O, Rothschild D, Barkan E, Segal E. Host genetics and microbiome associations through the lens of genome wide association studies. Curr Opin Microbiol 2018; 44:9-19. [PMID: 29909175 DOI: 10.1016/j.mib.2018.05.003] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 01/25/2018] [Revised: 03/15/2018] [Accepted: 05/25/2018] [Indexed: 12/22/2022]
Abstract
Recent studies indicate that the gut microbiome is partially heritable, motivating the need to investigate microbiome-host genome associations via microbial genome-wide association studies (mGWAS). Existing mGWAS demonstrate that microbiome-host genotype associations are typically weak and are spread across multiple variants, similar to associations often observed in genome-wide association studies (GWAS) of complex traits. Here we reconsider mGWAS by viewing them through the lens of GWAS, and demonstrate that there are striking similarities between the challenges and pitfalls faced by the two study designs. We further advocate the mGWAS community to adopt three key lessons learned over the history of GWAS: firstly, adopting uniform data and reporting formats to facilitate replication and meta-analysis efforts; secondly, enforcing stringent statistical criteria to reduce the number of false positive findings; and thirdly, considering the microbiome and the host genome as distinct entities, rather than studying different taxa and single nucleotide polymorphism (SNPs) separately. Finally, we anticipate that mGWAS sample sizes will have to increase by orders of magnitude to reproducibly associate the host genome with the gut microbiome.
Affiliation(s)
- Omer Weissbrod
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Daphna Rothschild
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Elad Barkan
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
| | - Eran Segal
- Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel; Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel.
|
21
|
Brown R, Kichaev G, Mancuso N, Boocock J, Pasaniuc B. Enhanced methods to detect haplotypic effects on gene expression. Bioinformatics 2018; 33:2307-2313. [PMID: 28369161 DOI: 10.1093/bioinformatics/btx142] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 05/19/2016] [Accepted: 03/20/2017] [Indexed: 12/26/2022] Open
Abstract
Motivation: Expression quantitative trait loci (eQTLs), genetic variants associated with gene expression levels, are identified in eQTL mapping studies. Such studies typically test for an association between single nucleotide polymorphisms (SNPs) and expression under an additive model, which ignores interaction and haplotypic effects. Mismatches between the model tested and the underlying genetic architecture can lead to a loss of association power. Here we introduce a new haplotype-based test for eQTL studies that looks for haplotypic effects on expression levels. Our test is motivated by compound heterozygous architectures, a common disease model for recessive monogenic disorders, where two different alleles can have the same effect on a gene's function.
Results: When the underlying true causal architecture for a simulated gene is a compound heterozygote, our method is better able to capture the signal than the marginal SNP method. When the underlying model is a single SNP, there is no difference in the power of our method relative to the marginal SNP method. We apply our method to empirical gene expression data measured in 373 European individuals from the GEUVADIS study and find 29 more eGenes (genes with at least one association) than the standard marginal SNP method. Furthermore, in 974 of the 3529 total eGenes, our haplotype-based method results in a stronger association signal than the standard marginal SNP method. This demonstrates our method both increases power over the standard method and provides evidence of haplotypic architectures regulating gene expression.
Availability and Implementation: http://bogdan.bioinformatics.ucla.edu/software/.
Contact: rob.brown@ucla.edu or pasaniuc@ucla.edu.
Affiliation(s)
- Robert Brown
- Bioinformatics IDP, University of California Los Angeles, Los Angeles, CA, USA
| | - Gleb Kichaev
- Bioinformatics IDP, University of California Los Angeles, Los Angeles, CA, USA
| | | | - James Boocock
- Bioinformatics IDP, University of California Los Angeles, Los Angeles, CA, USA
| | - Bogdan Pasaniuc
- Bioinformatics IDP, University of California Los Angeles, Los Angeles, CA, USA; Department of Pathology and Laboratory Medicine; Department of Human Genetics, Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA
|
22
|
Mohammadi P, Castel SE, Brown AA, Lappalainen T. Quantifying the regulatory effect size of cis-acting genetic variation using allelic fold change. Genome Res 2017; 27:1872-1884. [PMID: 29021289 PMCID: PMC5668944 DOI: 10.1101/gr.216747.116] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 09/30/2016] [Accepted: 06/05/2017] [Indexed: 12/11/2022]
Abstract
Mapping cis-acting expression quantitative trait loci (cis-eQTL) has become a popular approach for characterizing proximal genetic regulatory variants. In this paper, we describe and characterize log allelic fold change (aFC), the magnitude of expression change associated with a given genetic variant, as a biologically interpretable unit for quantifying the effect size of cis-eQTLs and a mathematically convenient approach for systematic modeling of cis-regulation. This measure is mathematically independent from expression level and allele frequency, additive, applicable to multiallelic variants, and generalizable to multiple independent variants. We provide efficient tools and guidelines for estimating aFC from both eQTL and allelic expression data sets and apply it to Genotype Tissue Expression (GTEx) data. We show that aFC estimates independently derived from eQTL and allelic expression data are highly consistent, and identify technical and biological correlates of eQTL effect size. We generalize aFC to analyze genes with two eQTLs in GTEx and show that in nearly all cases the two eQTLs act independently in regulating gene expression. In summary, aFC is a solid measure of cis-regulatory effect size that allows quantitative interpretation of cellular regulatory events from population data, and it is a valuable approach for investigating novel aspects of eQTL data sets.
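The quantity described in this abstract can be illustrated with a minimal sketch. It assumes that aFC is expressed as the log2 ratio of expression from the alternative-allele haplotype to expression from the reference-allele haplotype, and that total expression for a genotype is the sum of its two haplotype contributions. The function names (`log_afc`, `expected_expression`, `combined_afc`) and the genotype model are illustrative assumptions, not the authors' released tools.

```python
import math


def log_afc(alt_expr: float, ref_expr: float) -> float:
    """log2 ratio of alt-haplotype to ref-haplotype expression (assumed definition)."""
    if alt_expr <= 0 or ref_expr <= 0:
        raise ValueError("haplotype expression must be positive")
    return math.log2(alt_expr / ref_expr)


def expected_expression(ref_expr: float, afc: float, n_alt: int) -> float:
    """Expected total expression for a genotype carrying n_alt alternative alleles.

    Assumption: each haplotype contributes independently, so the total is the
    sum of (2 - n_alt) reference haplotypes and n_alt alternative haplotypes.
    """
    if n_alt not in (0, 1, 2):
        raise ValueError("n_alt must be 0, 1, or 2")
    fold = 2.0 ** afc  # fold change of the alternative haplotype
    return (2 - n_alt) * ref_expr + n_alt * fold * ref_expr


def combined_afc(afcs: list[float]) -> float:
    """Independent variants on one haplotype: log fold changes add (assumed)."""
    return sum(afcs)


if __name__ == "__main__":
    s = log_afc(alt_expr=20.0, ref_expr=10.0)
    for n_alt in (0, 1, 2):
        print(n_alt, expected_expression(10.0, s, n_alt))
```

Working on the log scale is what makes the measure additive across independent variants: fold changes multiply, so their logarithms sum.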
Affiliation(s)
- Pejman Mohammadi
- New York Genome Center, New York, New York 10013, USA
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
| | - Stephane E Castel
- New York Genome Center, New York, New York 10013, USA
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
| | - Andrew A Brown
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, 1211, Switzerland
- Institute for Genetics and Genomics in Geneva (iGE3), University of Geneva, Geneva, 1211, Switzerland
- Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland
| | - Tuuli Lappalainen
- New York Genome Center, New York, New York 10013, USA
- Department of Systems Biology, Columbia University, New York, New York 10032, USA
|
23
|
Chen A, Liu Y, Williams SM, Morris N, Buchner DA. Widespread epistasis regulates glucose homeostasis and gene expression. PLoS Genet 2017; 13:e1007025. [PMID: 28961251 PMCID: PMC5636166 DOI: 10.1371/journal.pgen.1007025] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 05/05/2017] [Revised: 10/11/2017] [Accepted: 09/17/2017] [Indexed: 02/07/2023] Open
Abstract
The relative contributions of additive versus non-additive interactions in the regulation of complex traits remain controversial. This may be in part because large-scale epistasis has traditionally been difficult to detect in complex, multi-cellular organisms. We hypothesized that it would be easier to detect interactions using mouse chromosome substitution strains that simultaneously incorporate allelic variation in many genes on a controlled genetic background. Analyzing metabolic traits and gene expression levels in the offspring of a series of crosses between mouse chromosome substitution strains demonstrated that inter-chromosomal epistasis was a dominant feature of these complex traits. Epistasis typically accounted for a larger proportion of the heritable effects than those due solely to additive effects. These epistatic interactions typically resulted in trait values returning to the levels of the parental CSS host strain. Due to the large epistatic effects, analyses that did not account for interactions consistently underestimated the true effect sizes due to allelic variation or failed to detect the loci controlling trait variation. These studies demonstrate that epistatic interactions are a common feature of complex traits and thus identifying these interactions is key to understanding their genetic regulation.
Most complex traits and diseases are regulated by the combined influence of multiple genetic variants. However, it remains controversial whether these genetic variants independently influence complex traits, and therefore the impact of each variant could be simply added together (additivity), or whether the variants work together to influence trait variation, in which case the combined impact of multiple variants would differ from the summed impact of each individual variant (epistasis). In this study in mice, we discovered that the genetic regulation of blood sugar levels and gene expression in the liver were predominantly controlled by non-additive interactions, whereas body weight was predominantly controlled by additive interactions. Remarkably, the expression level of nearly 25% of all genes in the liver was controlled by non-additive interactions. The non-additive interactions typically acted to return trait values to the levels detected in control mice, thus contributing to a reduction in trait variation. We also demonstrated that not accounting for non-additive interactions significantly underestimated the phenotypic effect of a genetic variant on a particular genetic background, suggesting that many previously identified risk loci may have significantly larger effects on disease susceptibility in a subset of individuals. These studies highlight the importance of understanding interactions between genetic variants to better understand disease risk and personalize clinical care.
Affiliation(s)
- Anlu Chen
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States of America
| | - Yang Liu
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States of America
| | - Scott M. Williams
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
| | - Nathan Morris
- Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, OH, United States of America
| | - David A. Buchner
- Department of Biochemistry, Case Western Reserve University, Cleveland, OH, United States of America
- Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, OH, United States of America
|
24
|
Liu J, Yu G, Jiang Y, Wang J. HiSeeker: Detecting High-Order SNP Interactions Based on Pairwise SNP Combinations. Genes (Basel) 2017; 8:genes8060153. [PMID: 28561745 PMCID: PMC5485517 DOI: 10.3390/genes8060153] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 03/31/2017] [Revised: 05/06/2017] [Accepted: 05/25/2017] [Indexed: 01/27/2023] Open
Abstract
Detecting single nucleotide polymorphisms’ (SNPs) interaction is one of the most popular approaches for explaining the missing heritability of common complex diseases in genome-wide association studies. Many methods have been proposed for SNP interaction detection, but most of them only focus on pairwise interactions and ignore high-order ones, which may also contribute to complex traits. Existing methods for high-order interaction detection can hardly handle genome-wide data and suffer from low detection power, due to the exponential growth of search space. In this paper, we proposed a flexible two-stage approach (called HiSeeker) to detect high-order interactions. In the screening stage, HiSeeker employs the chi-squared test and logistic regression model to efficiently obtain candidate pairwise combinations, which have intermediate or significant associations with the phenotype for interaction detection. In the search stage, two different strategies (exhaustive search and ant colony optimization-based search) are utilized to detect high-order interactions from candidate combinations. The experimental results on simulated datasets demonstrate that HiSeeker can more efficiently and effectively detect high-order interactions than related representative algorithms. On two real case-control datasets, HiSeeker also detects several significant high-order interactions, whose individual SNPs and pairwise interactions have no strong main effects or pairwise interaction effects, and these high-order interactions can hardly be identified by related algorithms.
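The screening stage described in this abstract can be sketched roughly as follows. This is a simplified illustration, not HiSeeker's implementation: it covers only the chi-squared part (the logistic regression step is omitted), it assumes genotypes coded 0/1/2 and a binary case-control phenotype, and it keeps a pair when its Pearson chi-squared statistic reaches a single user-chosen threshold. The names `pair_chi2` and `screen_pairs` and the threshold rule are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def pair_chi2(g1, g2, y):
    """Pearson chi-squared statistic for genotype-combination x phenotype counts."""
    n = len(y)
    cell = Counter(((a, b), c) for a, b, c in zip(g1, g2, y))
    row_tot = Counter((a, b) for a, b in zip(g1, g2))  # observed genotype combos
    col_tot = Counter(y)                               # phenotype classes
    stat = 0.0
    for combo, r in row_tot.items():
        for cls, c in col_tot.items():
            expected = r * c / n
            observed = cell.get((combo, cls), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def screen_pairs(genotypes, y, threshold):
    """Return (snp1, snp2, statistic) for pairs whose statistic >= threshold.

    `genotypes` maps a SNP name to its list of per-sample genotype codes.
    """
    kept = []
    for s1, s2 in combinations(sorted(genotypes), 2):
        stat = pair_chi2(genotypes[s1], genotypes[s2], y)
        if stat >= threshold:
            kept.append((s1, s2, stat))
    # Strongest candidates first; these would feed the later search stage.
    return sorted(kept, key=lambda t: -t[2])


if __name__ == "__main__":
    toy = {"a": [0, 0, 1, 1], "b": [0, 0, 0, 0], "c": [0, 1, 0, 1]}
    print(screen_pairs(toy, y=[0, 0, 1, 1], threshold=3.0))
```

Restricting the expensive high-order search to pairs that survive a cheap screen is what keeps the search space manageable, at the cost of missing interactions whose constituent pairs show no detectable signal.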
Affiliation(s)
- Jie Liu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
| | - Yuan Jiang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
|
25
|
A fast algorithm for Bayesian multi-locus model in genome-wide association studies. Mol Genet Genomics 2017; 292:923-934. [DOI: 10.1007/s00438-017-1322-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/21/2016] [Accepted: 04/18/2017] [Indexed: 12/27/2022]
|
26
|
Liu X, Finucane HK, Gusev A, Bhatia G, Gazal S, O’Connor L, Bulik-Sullivan B, Wright FA, Sullivan PF, Neale BM, Price AL. Functional Architectures of Local and Distal Regulation of Gene Expression in Multiple Human Tissues. Am J Hum Genet 2017; 100:605-616. [PMID: 28343628 DOI: 10.1016/j.ajhg.2017.03.002] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Received: 11/29/2016] [Accepted: 02/24/2017] [Indexed: 12/12/2022] Open
Abstract
Genetic variants that modulate gene expression levels play an important role in the etiology of human diseases and complex traits. Although large-scale eQTL mapping studies routinely identify many local eQTLs, the molecular mechanisms by which genetic variants regulate expression remain unclear, particularly for distal eQTLs, which these studies are not well powered to detect. Here, we leveraged all variants (not just those that pass stringent significance thresholds) to analyze the functional architecture of local and distal regulation of gene expression in 15 human tissues by employing an extension of stratified LD-score regression that produces robust results in simulations. The top enriched functional categories in local regulation of peripheral-blood gene expression included coding regions (11.41×), conserved regions (4.67×), and four histone marks (p < 5 × 10⁻⁵ for all enrichments); local enrichments were similar across the 15 tissues. We also observed substantial enrichments for distal regulation of peripheral-blood gene expression: coding regions (4.47×), conserved regions (4.51×), and two histone marks (p < 3 × 10⁻⁷ for all enrichments). Analyses of the genetic correlation of gene expression across tissues confirmed that local regulation of gene expression is largely shared across tissues but that distal regulation is highly tissue specific. Our results elucidate the functional components of the genetic architecture of local and distal regulation of gene expression.
|