1
Xu G, Amei A, Wu W, Liu Y, Shen L, Oh EC, Wang Z. Retrospective varying coefficient association analysis of longitudinal binary traits: application to the identification of genetic loci associated with hypertension. Ann Appl Stat 2024; 18:487-505. [PMID: 38577266 PMCID: PMC10994004 DOI: 10.1214/23-aoas1798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/06/2024]
Abstract
Many genetic studies contain rich information on longitudinal phenotypes that require powerful analytical tools for optimal analysis. Genetic analysis of longitudinal data that incorporates temporal variation is important for understanding the genetic architecture and biological variation of complex diseases. Most of the existing methods assume that the contribution of genetic variants is constant over time and fail to capture the dynamic pattern of disease progression. However, the relative influence of genetic variants on complex traits fluctuates over time. In this study, we propose a retrospective varying coefficient mixed model association test, RVMMAT, to detect time-varying genetic effect on longitudinal binary traits. We model dynamic genetic effect using smoothing splines, estimate model parameters by maximizing a double penalized quasi-likelihood function, design a joint test using a Cauchy combination method, and evaluate statistical significance via a retrospective approach to achieve robustness to model misspecification. Through simulations we illustrated that the retrospective varying-coefficient test was robust to model misspecification under different ascertainment schemes and gained power over the association methods assuming constant genetic effect. We applied RVMMAT to a genome-wide association analysis of longitudinal measure of hypertension in the Multi-Ethnic Study of Atherosclerosis. Pathway analysis identified two important pathways related to G-protein signaling and DNA damage. Our results demonstrated that RVMMAT could detect biologically relevant loci and pathways in a genome scan and provided insight into the genetic architecture of hypertension.
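The Cauchy combination step mentioned in the abstract can be illustrated with a minimal sketch. This shows only the generic Cauchy combination rule for merging several p-values into one; the function name, the equal default weights, and the example inputs are assumptions for illustration and are not taken from RVMMAT itself. Inputs are assumed to lie strictly between 0 and 1.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the generic Cauchy combination rule.

    Hypothetical helper for illustration; not the RVMMAT implementation.
    Assumes every p-value lies strictly between 0 and 1.
    """
    if weights is None:
        # Assumption: equal weights that sum to one.
        weights = [1.0 / len(p_values)] * len(p_values)
    # Map each p-value to a Cauchy-scale score; small p-values give large scores.
    t = sum(w * math.tan((0.5 - p) * math.pi) for w, p in zip(weights, p_values))
    # Convert the weighted sum back to a p-value using the standard Cauchy tail.
    return 0.5 - math.atan(t) / math.pi


# Example: one strong signal combined with one null-looking result.
combined = cauchy_combination([0.01, 0.5])
```

A single small p-value dominates the weighted sum, which is why this kind of rule is often used to build a joint test from component tests.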
Affiliation(s)
- Gang Xu
  - Department of Mathematical Sciences, University of Nevada
- Amei Amei
  - Department of Mathematical Sciences, University of Nevada
- Weimiao Wu
  - Department of Biostatistics, Yale School of Public Health
- Yunqing Liu
  - Department of Biostatistics, Yale School of Public Health
- Linchuan Shen
  - Department of Mathematical Sciences, University of Nevada
- Edwin C. Oh
  - Department of Internal Medicine, University of Nevada School of Medicine
- Zuoheng Wang
  - Department of Biostatistics, Yale School of Public Health
2
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. bioRxiv 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, NY 10591, USA
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
  - Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
3
Mbatchou J, Abney M, McPeek MS. BRASS: Permutation methods for binary traits in genetic association studies with structured samples. PLoS Genet 2023; 19:e1011020. [PMID: 37934792 PMCID: PMC10656004 DOI: 10.1371/journal.pgen.1011020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/05/2023] [Revised: 11/17/2023] [Accepted: 10/16/2023] [Indexed: 11/09/2023]
Abstract
In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the distribution of the test statistic is unknown or not well-approximated. This commonly arises, e.g., in tests of gene-set, pathway or genome-wide significance, or when the statistic is formed by machine learning or data adaptive methods. Existing applications include eQTL mapping, association testing with rare variants, inclusion of admixed individuals in genetic association analysis, and epistasis detection among many others. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary traits, for use in association mapping in structured samples. In addition to modeling structure in the sample, BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. In simulation studies, we compare BRASS to other permutation and resampling-based methods in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, we demonstrate the superior control of type 1 error by BRASS compared to the other 6 methods considered. We apply BRASS to assess genome-wide significance for association analyses in domestic dog for elbow dysplasia (ED) and idiopathic epilepsy (IE). For both traits we detect previously identified associations, and in addition, for ED, we detect significant association with a SNP on chromosome 35 that was not detected by previous analyses, demonstrating the potential of the method.
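For orientation, the naive permutation approach that the abstract contrasts with BRASS can be sketched as follows. This is not BRASS: it shuffles phenotype labels freely, which is only reasonable when individuals are exchangeable (no population structure or relatedness). The function name, the test statistic (absolute difference in mean genotype between cases and controls), and the toy data are all assumptions chosen for illustration.

```python
import random


def naive_permutation_pvalue(genotypes, phenotypes, n_perm=999, seed=0):
    """Naive permutation p-value for a single-marker binary-trait test.

    Hypothetical illustration only. It assumes exchangeable individuals,
    which is exactly the assumption that fails in structured samples.
    """
    rng = random.Random(seed)

    def statistic(labels):
        # Absolute difference in mean genotype between cases (1) and controls (0).
        cases = [g for g, y in zip(genotypes, labels) if y == 1]
        controls = [g for g, y in zip(genotypes, labels) if y == 0]
        return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

    observed = statistic(phenotypes)
    shuffled = list(phenotypes)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if statistic(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Toy data (made up): all cases carry two copies of the allele, controls carry none.
toy_genotypes = [2] * 10 + [0] * 10
toy_phenotypes = [1] * 10 + [0] * 10
p_value = naive_permutation_pvalue(toy_genotypes, toy_phenotypes)
```

In a sample with relatedness or population structure, the shuffled labels no longer follow the null distribution of the data, which is the failure mode the abstract describes.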
Affiliation(s)
- Joelle Mbatchou
  - Regeneron Genetics Center, Tarrytown, New York, United States of America
  - Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
- Mark Abney
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Mary Sara McPeek
  - Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
  - Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
4
Jin Y, Li D, Liu M, Cui Z, Sun D, Li C, Zhang A, Cao H, Ruan Y. Genome-Wide Association Study Identified Novel SNPs Associated with Chlorophyll Content in Maize. Genes (Basel) 2023; 14:genes14051010. [PMID: 37239370 DOI: 10.3390/genes14051010] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/26/2023] [Revised: 04/24/2023] [Accepted: 04/26/2023] [Indexed: 05/28/2023]
Abstract
Chlorophyll is an essential component that captures light energy to drive photosynthesis. Chlorophyll content can affect photosynthetic activity and thus yield. Therefore, mining candidate genes of chlorophyll content will help increase maize production. Here, we performed a genome-wide association study (GWAS) on chlorophyll content and its dynamic changes in 378 maize inbred lines with extensive natural variation. Our phenotypic assessment showed that chlorophyll content and its dynamic changes were natural variations with a moderate genetic level of 0.66/0.67. A total of 19 single-nucleotide polymorphisms (SNPs) were found associated with 76 candidate genes, of which one SNP, 2376873-7-G, co-localized in chlorophyll content and area under the chlorophyll content curve (AUCCC). Zm00001d026568 and Zm00001d026569 were highly associated with SNP 2376873-7-G and encoded pentatricopeptide repeat-containing protein and chloroplastic palmitoyl-acyl carrier protein thioesterase, respectively. As expected, higher expression levels of these two genes are associated with higher chlorophyll contents. These results provide a certain experimental basis for discovering the candidate genes of chlorophyll content and finally provide new insights for cultivating high-yield and excellent maize suitable for planting environment.
Affiliation(s)
- Yueting Jin
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Dan Li
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Meiling Liu
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Zhenhai Cui
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
  - Key Laboratory of Soybean Molecular Design Breeding, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun 130102, China
- Daqiu Sun
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Cong Li
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Ao Zhang
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Huiying Cao
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
- Yanye Ruan
  - College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China
  - Liaoning Province Research Center of Plant Genetic Engineering Technology, Shenyang Key Laboratory of Maize Genomic Selection Breeding, Shenyang 110866, China
5
Ballinger ML, Pattnaik S, Mundra PA, Zaheed M, Rath E, Priestley P, Baber J, Ray-Coquard I, Isambert N, Causeret S, van der Graaf WTA, Puri A, Duffaud F, Le Cesne A, Seddon B, Chandrasekar C, Schiffman JD, Brohl AS, James PA, Kurtz JE, Penel N, Myklebost O, Meza-Zepeda LA, Pickett H, Kansara M, Waddell N, Kondrashova O, Pearson JV, Barbour AP, Li S, Nguyen TL, Fatkin D, Graham RM, Giannoulatou E, Green MJ, Kaplan W, Ravishankar S, Copty J, Powell JE, Cuppen E, van Eijk K, Veldink J, Ahn JH, Kim JE, Randall RL, Tucker K, Judson I, Sarin R, Ludwig T, Genin E, Deleuze JF, Haber M, Marshall G, Cairns MJ, Blay JY, Thomas DM, Tattersall M, Neuhaus S, Lewis C, Tucker K, Carey-Smith R, Wood D, Porceddu S, Dickinson I, Thorne H, James P, Ray-Coquard I, Blay JY, Cassier P, Le Cesne A, Duffaud F, Penel N, Isambert N, Kurtz JE, Puri A, Sarin R, Ahn JH, Kim JE, Ward I, Judson I, van der Graaf W, Seddon B, Chandrasekar C, Rickar R, Hennig I, Schiffman J, Randall RL, Silvestri A, Zaratzian A, Tayao M, Walwyn K, Niedermayr E, Mang D, Clark R, Thorpe T, MacDonald J, Riddell K, Mar J, Fennelly V, Wicht A, Zielony B, Galligan E, Glavich G, Stoeckert J, Williams L, Djandjgava L, Buettner I, Osinki C, Stephens S, Rogasik M, Bouclier L, Girodet M, Charreton A, Fayet Y, Crasto S, Sandupatla B, Yoon Y, Je N, Thompson L, Fowler T, Johnson B, Petrikova G, Hambridge T, Hutchins A, Bottero D, Scanlon D, Stokes-Denson J, Génin E, Campion D, Dartigues JF, Deleuze JF, Lambert JC, Redon R, Ludwig T, Grenier-Boley B, Letort S, Lindenbaum P, Meyer V, Quenez O, Dina C, Bellenguez C, Le Clézio CC, Giemza J, Chatel S, Férec C, Le Marec H, Letenneur L, Nicolas G, Rouault K. Heritable defects in telomere and mitotic function selectively predispose to sarcomas. Science 2023; 379:253-260. 
[PMID: 36656928 DOI: 10.1126/science.abj4784] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 06/08/2021] [Accepted: 11/16/2022] [Indexed: 01/20/2023]
Abstract
Cancer genetics has to date focused on epithelial malignancies, identifying multiple histotype-specific pathways underlying cancer susceptibility. Sarcomas are rare malignancies predominantly derived from embryonic mesoderm. To identify pathways specific to mesenchymal cancers, we performed whole-genome germline sequencing on 1644 sporadic cases and 3205 matched healthy elderly controls. Using an extreme phenotype design, a combined rare-variant burden and ontologic analysis identified two sarcoma-specific pathways involved in mitotic and telomere functions. Variants in centrosome genes are linked to malignant peripheral nerve sheath and gastrointestinal stromal tumors, whereas heritable defects in the shelterin complex link susceptibility to sarcoma, melanoma, and thyroid cancers. These studies indicate a specific role for heritable defects in mitotic and telomere biology in risk of sarcomas.
Affiliation(s)
- Mandy L Ballinger
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
- Swetansu Pattnaik
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
- Piyushkumar A Mundra
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
- Milita Zaheed
  - Hereditary Cancer Centre, Prince of Wales Hospital, Sydney 2031, Australia
- Emma Rath
  - Garvan Institute of Medical Research, Sydney 2010, Australia
- Peter Priestley
  - Hartwig Medical Foundation, 1098 XH Amsterdam, Netherlands
  - Hartwig Medical Foundation Australia, Sydney 2000, Australia
- Jonathan Baber
  - Hartwig Medical Foundation, 1098 XH Amsterdam, Netherlands
  - Hartwig Medical Foundation Australia, Sydney 2000, Australia
- Isabelle Ray-Coquard
  - Department of Adult Medical Oncology, Centre Leon Berard, University Claude Bernard, 69373 Lyon, France
- Ajay Puri
  - Department of Orthopedic Oncology, Tata Memorial Hospital, Mumbai, Maharashtra 400012, India
- Beatrice Seddon
  - Sarcoma Unit, University College Hospital, London NW1 2BU, UK
- Joshua D Schiffman
  - Division of Pediatric Hematology/Oncology, Department of Pediatrics, Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84112, USA
- Andrew S Brohl
  - Sarcoma Department, Moffitt Cancer Center, Tampa, FL 33612, USA
- Paul A James
  - The Sir Peter MacCallum Department of Oncology, University of Melbourne, Melbourne 3010, Australia
  - Parkville Familial Cancer Centre, Peter MacCallum Cancer Centre and Royal Melbourne Hospital, Melbourne 3000, Australia
- Ola Myklebost
  - Western Norway Familial Cancer Centre, Haukeland University Hospital, 5021 Bergen, Norway
  - Department of Clinical Science, University of Bergen, 5007 Bergen, Norway
  - Institute for Cancer Research, Oslo University Hospital, N-0424 Oslo, Norway
- Hilda Pickett
  - Children's Medical Research Institute, The University of Sydney, Westmead 2145, Australia
- Maya Kansara
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
- Nicola Waddell
  - QIMR Berghofer Medical Research Institute, Brisbane 4006, Australia
- Olga Kondrashova
  - QIMR Berghofer Medical Research Institute, Brisbane 4006, Australia
- John V Pearson
  - QIMR Berghofer Medical Research Institute, Brisbane 4006, Australia
- Andrew P Barbour
  - Faculty of Medicine, The University of Queensland, Brisbane 4072, Australia
- Shuai Li
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne 3010, Australia
  - Centre for Cancer Genetic Epidemiology, Department of Public Health and Primary Care, University of Cambridge, Cambridge CB1 8RN, UK
  - Precision Medicine, School of Clinical Sciences at Monash Health, Monash University, Clayton 3800, Australia
  - Murdoch Children's Research Institute, Royal Children's Hospital, Parkville 3051, Australia
- Tuong L Nguyen
  - Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, Faculty of Medicine, Dentistry and Health Sciences, University of Melbourne, Melbourne 3010, Australia
- Diane Fatkin
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
  - Molecular Cardiology Division, Victor Chang Cardiac Research Institute, Darlinghurst 2010, Australia
  - Cardiology Department, St Vincent's Hospital, Sydney 2010, Australia
- Robert M Graham
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
  - Molecular Cardiology Division, Victor Chang Cardiac Research Institute, Darlinghurst 2010, Australia
- Eleni Giannoulatou
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
  - Computational Genomics Division, Victor Chang Cardiac Research Institute, Sydney 2010, Australia
- Melissa J Green
  - School of Psychiatry, University of New South Wales, Sydney 2052, Australia
  - Neuroscience Research Australia, Sydney 2031, Australia
- Warren Kaplan
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
- Joseph Copty
  - Garvan Institute of Medical Research, Sydney 2010, Australia
- Joseph E Powell
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - UNSW Cellular Genomics Futures Institute, University of New South Wales, Sydney 2052, Australia
- Edwin Cuppen
  - Hartwig Medical Foundation, 1098 XH Amsterdam, Netherlands
- Kristel van Eijk
  - Department of Neurology, University Medical Centre Utrecht Brain Center, Utrecht University, 3584 CX Utrecht, Netherlands
- Jan Veldink
  - Department of Neurology, University Medical Centre Utrecht Brain Center, Utrecht University, 3584 CX Utrecht, Netherlands
- Jin-Hee Ahn
  - Department of Oncology, Asan Medical Centre, Seoul 05505, South Korea
- Jeong Eun Kim
  - Department of Oncology, Asan Medical Centre, Seoul 05505, South Korea
- R Lor Randall
  - Department of Orthopaedic Surgery, University of California, Davis Health, Sacramento, CA 95817, USA
- Kathy Tucker
  - Hereditary Cancer Centre, Prince of Wales Hospital, Sydney 2031, Australia
- Ian Judson
  - Sarcoma Unit, The Royal Marsden NHS Foundation Trust, London SW3 6JJ, UK
- Rajiv Sarin
  - Cancer Genetics Unit, ACTREC, Tata Memorial Centre, Mumbai, Maharashtra 410210, India
- Thomas Ludwig
  - Université de Brest, Inserm, EFS, UMR 1078, GGB, CHU de Brest, 29200 Brest, France
- Emmanuelle Genin
  - Université de Brest, Inserm, EFS, UMR 1078, GGB, CHU de Brest, 29200 Brest, France
- Jean-Francois Deleuze
  - Centre National de Recherche en Génomique Humaine, Institut de Génomique, 91057 Evry, France
- Michelle Haber
  - Children's Cancer Institute, Lowy Cancer Research Centre, University of New South Wales, Kensington 2033, Australia
- Glenn Marshall
  - Children's Cancer Institute, Lowy Cancer Research Centre, University of New South Wales, Kensington 2033, Australia
  - Kids Cancer Centre, Sydney Children's Hospital, Randwick 2031, Australia
- Murray J Cairns
  - School of Biomedical Sciences and Pharmacy, The University of Newcastle, Callaghan 2308, Australia
  - Centre for Brain and Mental Health Research, The Hunter Medical Research Institute, Newcastle 2305, Australia
- Jean-Yves Blay
  - Department of Adult Medical Oncology, Centre Leon Berard, University Claude Bernard, 69373 Lyon, France
- David M Thomas
  - Garvan Institute of Medical Research, Sydney 2010, Australia
  - St Vincent's Clinical School, University of New South Wales, Sydney 2010, Australia
6
Calbet‐Llopart N, Combalia M, Kiroglu A, Potrony M, Tell‐Martí G, Combalia A, Brugues A, Podlipnik S, Carrera C, Puig S, Malvehy J, Puig‐Butillé JA. Common genetic variants associated with melanoma risk or naevus count in patients with wildtype MC1R melanoma. Br J Dermatol 2022; 187:753-764. [PMID: 35701387 PMCID: PMC9804579 DOI: 10.1111/bjd.21707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2022] [Revised: 06/07/2022] [Accepted: 06/11/2022] [Indexed: 01/05/2023]
Abstract
BACKGROUND Hypomorphic MC1R variants are the most prevalent genetic determinants of melanoma risk in the white population. However, the genetic background of patients with wildtype (WT) MC1R melanoma is poorly studied. OBJECTIVES To analyse the role of candidate common genetic variants on the melanoma risk and naevus count in Spanish patients with WT MC1R melanoma. METHODS We examined 753 individuals with WT MC1R from Spain (497 patients and 256 controls). We used OpenArray reverse-transcriptase polymerase chain reaction to genotype a panel of 221 common genetic variants involved in melanoma, naevogenesis, hormonal pathways and proinflammatory pathways. Genetic models were tested using multivariate logistic regression models. Nonparametric multifactor dimensionality reduction (MDR) was used to detect gene-gene interactions within each biological subgroup of variants. RESULTS We found that variant rs12913832 in the HERC2 gene, which is associated with blue eye colour, increased melanoma risk in individuals with WT MC1R [odds ratio (OR) 1·97, 95% confidence interval (CI) 1·48-2·63; adjusted P < 0·001; corrected P < 0·001]. We also observed a trend between the rs3798577 variant in the oestrogen receptor alpha gene (ESR1) and a lower naevus count, which was restricted to female patients with WT MC1R (OR 0·51, 95% CI 0·33-0·79; adjusted P = 0·002; corrected P = 0·11). This sex-dependent association was statistically significant in a larger cohort of patients with melanoma regardless of their MC1R status (n = 1497; OR 0·71, 95% CI 0·57-0·88; adjusted P = 0·002), reinforcing the hypothesis of an association between hormonal pathways and susceptibility to melanocytic proliferation. Last, the MDR analysis revealed four genetic combinations associated with melanoma risk or naevus count in patients with WT MC1R. 
CONCLUSIONS Our data suggest that epistatic interaction among common variants related to melanocyte biology or proinflammatory pathways might influence melanocytic proliferation in individuals with WT MC1R.
What is already known about this topic? Genetic variants in the MC1R gene are the most prevalent melanoma genetic risk factor in the white population. Still, 20-40% of cases of melanoma occur in individuals with wildtype MC1R. Multiple genetic variants have a pleiotropic effect in melanoma and naevogenesis. Additional variants in unexplored pathways might also have a role in melanocytic proliferation in these patients. Epidemiological evidence suggests an association of melanocytic proliferation with hormonal pathways and proinflammatory pathways.
What does this study add? Variant rs12913832 in the HERC2 gene, which is associated with blue eye colour, increases the melanoma risk in individuals with wildtype MC1R. Variant rs3798577 in the oestrogen receptor gene is associated with naevus count regardless of the MC1R status in female patients with melanoma. We report epistatic interactions among common genetic variants with a role in modulating the risk of melanoma or the number of naevi in individuals with wildtype MC1R.
What is the translational message? We report a potential role of hormonal signalling pathways in melanocytic proliferation, providing a basis for better understanding of sex-based differences observed at the epidemiological level. We show that gene-gene interactions among common genetic variants might be responsible for an increased risk for melanoma development in individuals with a low-risk phenotype, such as darkly pigmented hair and skin.
Affiliation(s)
- Neus Calbet‐Llopart
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Marc Combalia
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Anil Kiroglu
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Miriam Potrony
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
  - Biochemistry and Molecular Genetics Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Gemma Tell‐Martí
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Andrea Combalia
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Albert Brugues
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Sebastian Podlipnik
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
- Cristina Carrera
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Susana Puig
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Josep Malvehy
  - Dermatology Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
- Joan Anton Puig‐Butillé
  - Centro de Investigación Biomédica en Red de Enfermedades Raras (CIBERER), Instituto de Salud Carlos III, Barcelona, Spain
  - Molecular Biology CORE, Biochemistry and Molecular Genetics Department, Melanoma Group, Hospital Clínic de Barcelona, IDIBAPS, University of Barcelona, Barcelona, Spain
7
Onifade M, Roy-Gagnon MH, Parent MÉ, Burkett KM. Comparison of mixed model based approaches for correcting for population substructure with application to extreme phenotype sampling. BMC Genomics 2022; 23:98. [PMID: 35120458 PMCID: PMC8815214 DOI: 10.1186/s12864-022-08297-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2021] [Accepted: 01/06/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Mixed models are used to correct for confounding due to population stratification and hidden relatedness in genome-wide association studies. This class of models includes linear mixed models and generalized linear mixed models. Existing mixed model approaches to correct for population substructure have been previously investigated with both continuous and case-control response variables. However, they have not been investigated in the context of extreme phenotype sampling (EPS), where genetic covariates are only collected on samples having extreme response variable values. In this work, we compare the performance of existing binary trait mixed model approaches (GMMAT, LEAP and CARAT) on EPS data. Since linear mixed models are commonly used even with binary traits, we also evaluate the performance of a popular linear mixed model implementation (GEMMA). RESULTS We used simulation studies to estimate the type I error rate and power of all approaches assuming a population with substructure. Our simulation results show that for a common candidate variant, both LEAP and GMMAT control the type I error rate while CARAT’s rate remains inflated. We applied all methods to a real dataset from a Québec, Canada, case-control study that is known to have population substructure. We observe similar type I error control with the analysis on the Québec dataset. For rare variants, the false positive rate remains inflated even after correction with mixed model approaches. For methods that control the type I error rate, the estimated power is comparable. CONCLUSIONS The methods compared in this study differ in their type I error control. Therefore, when data are from an EPS study, care should be taken to ensure that the models underlying the methodology are suitable to the sampling strategy and to the minor allele frequency of the candidate SNPs.
Affiliation(s)
- Maryam Onifade
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada
| | | | - Marie-Élise Parent
- Centre Armand-Frappier Santé Biotechnologie, Institut national de la recherche scientifique, Université du Québec, Laval, Canada
| | - Kelly M Burkett
- Department of Mathematics and Statistics, University of Ottawa, Ottawa, Canada.
|
8
|
Novel directions in data pre-processing and genome-wide association study (GWAS) methodologies to overcome ongoing challenges. Informatics in Medicine Unlocked 2021. [DOI: 10.1016/j.imu.2021.100586] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
|
9
|
Wu W, Wang Z, Xu K, Zhang X, Amei A, Gelernter J, Zhao H, Justice AC, Wang Z. Retrospective Association Analysis of Longitudinal Binary Traits Identifies Important Loci and Pathways in Cocaine Use. Genetics 2019; 213:1225-1236. [PMID: 31591132 PMCID: PMC6893384 DOI: 10.1534/genetics.119.302598] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 09/03/2019] [Accepted: 10/04/2019] [Indexed: 12/15/2022]
Abstract
Longitudinal phenotypes have been increasingly available in genome-wide association studies (GWAS) and electronic health record-based studies for identification of genetic variants that influence complex traits over time. For longitudinal binary data, there remain significant challenges in gene mapping, including misspecification of the model for phenotype distribution due to ascertainment. Here, we propose L-BRAT (Longitudinal Binary-trait Retrospective Association Test), a retrospective, generalized estimating equation-based method for genetic association analysis of longitudinal binary outcomes. We also develop RGMMAT, a retrospective, generalized linear mixed model-based association test. Both tests are retrospective score approaches in which genotypes are treated as random conditional on phenotype and covariates. They allow both static and time-varying covariates to be included in the analysis. Through simulations, we illustrated that retrospective association tests are robust to ascertainment and other types of phenotype model misspecification, and gain power over previous association methods. We applied L-BRAT and RGMMAT to a genome-wide association analysis of repeated measures of cocaine use in a longitudinal cohort. Pathway analysis implicated association with opioid signaling and axonal guidance signaling pathways. Lastly, we replicated important pathways in an independent cocaine dependence case-control GWAS. Our results illustrate that L-BRAT is able to detect important loci and pathways in a genome scan and to provide insights into genetic architecture of cocaine use.
Affiliation(s)
- Weimiao Wu: Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Zhong Wang: Baker Institute for Animal Health, Cornell University, Ithaca, New York 14850
- Ke Xu: Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511; VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Xinyu Zhang: Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511; VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Amei Amei: Department of Mathematical Sciences, University of Nevada, Las Vegas, Nevada 89154
- Joel Gelernter: Department of Psychiatry, Yale School of Medicine, New Haven, Connecticut 06511; VA Connecticut Healthcare System, West Haven, Connecticut 06516
- Hongyu Zhao: Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
- Amy C Justice: VA Connecticut Healthcare System, West Haven, Connecticut 06516; Department of Internal Medicine, Yale School of Medicine, New Haven, Connecticut 06511
- Zuoheng Wang: Department of Biostatistics, Yale School of Public Health, New Haven, Connecticut 06520
|
10
|
Manduchi E, Orzechowski PR, Ritchie MD, Moore JH. Exploration of a diversity of computational and statistical measures of association for genome-wide genetic studies. BioData Min 2019; 12:14. [PMID: 31320928 PMCID: PMC6617598 DOI: 10.1186/s13040-019-0201-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/02/2019] [Accepted: 06/14/2019] [Indexed: 01/03/2023]
Abstract
Background The principal line of investigation in Genome Wide Association Studies (GWAS) is the identification of main effects, that is individual Single Nucleotide Polymorphisms (SNPs) which are associated with the trait of interest, independent of other factors. A variety of methods have been proposed to this end, mostly statistical in nature and differing in assumptions and type of model employed. Moreover, for a given model, there may be multiple choices for the SNP genotype encoding. As an alternative to statistical methods, machine learning methods are often applicable. Typically, for a given GWAS, a single approach is selected and utilized to identify potential SNPs of interest. Even when multiple GWAS are combined through meta-analyses within a consortium, each GWAS is typically analyzed with a single approach and the resulting summary statistics are then utilized in meta-analyses. Results In this work we use as case studies a Type 2 Diabetes (T2D) and a breast cancer GWAS to explore a diversity of applicable approaches spanning different methods and encoding choices. We assess similarity of these approaches based on the derived ranked lists of SNPs and, for each GWAS, we identify a subset of representative approaches that we use as an ensemble to derive a union list of top SNPs. Among these are SNPs which are identified by multiple approaches as well as several SNPs identified by only one or a few of the less frequently used approaches. The latter include SNPs from established loci and SNPs which have other supporting lines of evidence in terms of their potential relevance to the traits. Conclusions Not every main effect analysis method is suitable for every GWAS, but for each GWAS there are typically multiple applicable methods and encoding options. 
We suggest a workflow for a single GWAS, extensible to multiple GWAS from consortia, where representative approaches are selected among a pool of suitable options, to yield a more comprehensive set of SNPs, potentially including SNPs that would typically be missed with the most popular analyses, but that could provide additional valuable insights for follow-up. Electronic supplementary material The online version of this article (10.1186/s13040-019-0201-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Elisabetta Manduchi: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Patryk R Orzechowski: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Marylyn D Ritchie: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA; Department of Genetics, University of Pennsylvania, Philadelphia, PA, USA
- Jason H Moore: Institute for Biomedical Informatics, University of Pennsylvania, Philadelphia, PA, USA; Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
|
11
|
Wang C, Deng S, Sun L, Li L, Hu YQ. A nonparametric test for association with multiple loci in the retrospective case-control study. Stat Methods Med Res 2019; 29:589-602. [PMID: 30987531 DOI: 10.1177/0962280219842892] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/16/2022]
Abstract
The genome-wide association studies aim at identifying common or rare variants associated with common diseases and explaining more heritability. It is well known that common diseases are influenced by multiple single nucleotide polymorphisms (SNPs) that are usually correlated in location or function. In order to powerfully detect association signals, it is highly desirable to take account of correlations or linkage disequilibrium (LD) information among multiple SNPs in testing for association. In this article, we propose a test SLIDE that depicts the difference of the average multi-locus genotypes between cases and controls and derive its variance-covariance matrix in the retrospective design. This matrix is composed of the pairwise LD between SNPs. Thus SLIDE can borrow the strength from an external database in the population of interest with a few thousands to hundreds of thousands individuals to improve the power for detecting association. Extensive simulations show that SLIDE has apparent superiority over the existing methods, especially in the situation involving both common and rare variants, both protective and deleterious variants. Furthermore, the efficiency of the proposed method is demonstrated in the application to the data from the Wellcome Trust Case Control Consortium.
Affiliation(s)
- Chan Wang: State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China; Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, NY, USA
- Shufang Deng: State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
- Leiming Sun: State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
- Liming Li: State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China
- Yue-Qing Hu: State Key Laboratory of Genetic Engineering, School of Life Sciences, Institute of Biostatistics, Fudan University, Shanghai, China; Shanghai Center for Mathematical Sciences, Fudan University, Shanghai, China
|
12
|
Chien LC, Chiu YF. General retrospective mega-analysis framework for rare variant association tests. Genet Epidemiol 2018; 42:621-635. [PMID: 30188589 DOI: 10.1002/gepi.22147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2017] [Revised: 06/05/2018] [Accepted: 06/05/2018] [Indexed: 11/09/2022]
Abstract
Here, we describe a retrospective mega-analysis framework for gene- or region-based multimarker rare variant association tests. Our proposed mega-analysis association tests allow investigators to combine longitudinal and cross-sectional family- and/or population-based studies. This framework can be applied to a continuous, categorical, or survival trait. In addition to autosomal variants, the tests can be applied to conduct mega-analyses on X-chromosome variants. Tests were built on study-specific region- or gene-level quasiscore statistics and, therefore, do not require estimates of effects of individual rare variants. We used the generalized estimating equation approach to account for complex multiple correlation structures between family members, repeated measurements, and genetic markers. While accounting for multilevel correlations and heterogeneity across studies, the test statistics were computationally efficient and feasible for large-scale sequencing studies. The retrospective aspect of association tests helps alleviate bias due to phenotype-related sampling and type I errors due to misspecification of phenotypic distribution. We evaluated our developed mega-analysis methods through comprehensive simulations with varying sample sizes, covariates, population stratification structures, and study designs across multiple studies. To illustrate application of the proposed framework, we conducted a mega-association analysis combining a longitudinal family study and a cross-sectional case-control study from Genetic Analysis Workshop 19.
Affiliation(s)
- Li-Chu Chien: Center for Fundamental Science, Kaohsiung Medical University, Kaohsiung, Taiwan, ROC
- Yen-Feng Chiu: Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, ROC
|
13
|
Two-way mixed-effects methods for joint association analysis using both host and pathogen genomes. Proc Natl Acad Sci U S A 2018; 115:E5440-E5449. [PMID: 29848634 DOI: 10.1073/pnas.1710980115] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Indexed: 01/11/2023]
Abstract
Infectious diseases are often affected by specific pairings of hosts and pathogens and therefore by both of their genomes. The integration of a pair of genomes into genome-wide association mapping can provide an exquisitely detailed view of the genetic landscape of complex traits. We present a statistical method, ATOMM (Analysis with a Two-Organism Mixed Model), that maps a trait of interest to a pair of genomes simultaneously; this method makes use of whole-genome sequence data for both host and pathogen organisms. ATOMM uses a two-way mixed-effect model to test for genetic associations and cross-species genetic interactions while accounting for sample structure including interactions between the genetic backgrounds of the two organisms. We demonstrate the applicability of ATOMM to a joint association study of quantitative disease resistance (QDR) in the Arabidopsis thaliana-Xanthomonas arboricola pathosystem. Our method uncovers a clear host-strain specificity in QDR and provides a powerful approach to identify genetic variants on both genomes that contribute to phenotypic variation.
|
14
|
Weissbrod O, Rahmani E, Schweiger R, Rosset S, Halperin E. Association testing of bisulfite-sequencing methylation data via a Laplace approximation. Bioinformatics 2018; 33:i325-i332. [PMID: 28881982 PMCID: PMC5870555 DOI: 10.1093/bioinformatics/btx248] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022]
Abstract
Motivation Epigenome-wide association studies can provide novel insights into the regulation of genes involved in traits and diseases. The rapid emergence of bisulfite-sequencing technologies enables performing such genome-wide studies at the resolution of single nucleotides. However, analysis of data produced by bisulfite-sequencing poses statistical challenges owing to low and uneven sequencing depth, as well as the presence of confounding factors. The recently introduced Mixed model Association for Count data via data AUgmentation (MACAU) can address these challenges via a generalized linear mixed model when confounding can be encoded via a single variance component. However, MACAU cannot be used in the presence of multiple variance components. Additionally, MACAU uses a computationally expensive Markov Chain Monte Carlo (MCMC) procedure, which cannot directly approximate the model likelihood. Results We present a new method, Mixed model Association via a Laplace ApproXimation (MALAX), that is more computationally efficient than MACAU and allows to model multiple variance components. MALAX uses a Laplace approximation rather than MCMC based approximations, which enables to directly approximate the model likelihood. Through an extensive analysis of simulated and real data, we demonstrate that MALAX successfully addresses statistical challenges introduced by bisulfite-sequencing while controlling for complex sources of confounding, and can be over 50% faster than the state of the art. Availability and Implementation The full source code of MALAX is available at https://github.com/omerwe/MALAX. Supplementary information Supplementary data are available at Bioinformatics online.
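For orientation, the generic Laplace approximation that the abstract refers to can be written as follows. This is the textbook form, not the specific derivation used in MALAX; the symbols h, u, d and the choice of h as a log joint density are assumptions made for illustration.

```latex
% Generic Laplace approximation of a d-dimensional integral.
% \hat{u} is the mode of h, and H is the Hessian of h evaluated at \hat{u}.
\int_{\mathbb{R}^d} e^{h(u)}\,du
  \;\approx\;
  e^{h(\hat{u})}\,(2\pi)^{d/2}\,\bigl|-H(\hat{u})\bigr|^{-1/2},
\qquad
\hat{u} = \arg\max_u h(u),
\quad
H(\hat{u}) = \nabla^2 h(u)\big|_{u=\hat{u}}.

% In a generalized linear mixed model one would take
% h(u) = \log p(y \mid u) + \log p(u),
% so the left-hand side is the marginal likelihood p(y).
```

The approximation replaces an integral over the random effects with one optimization and one determinant, which is why it gives a direct likelihood value where sampling-based procedures do not.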
Affiliation(s)
- Omer Weissbrod: Statistics Department, Tel Aviv University, Tel Aviv, Israel; Computer Science Department, Technion - Israel Institute of Technology, Haifa, Israel
- Elior Rahmani: Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Regev Schweiger: Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Saharon Rosset: Statistics Department, Tel Aviv University, Tel Aviv, Israel
- Eran Halperin: Computer Science Department, University of California Los Angeles, Los Angeles, CA, USA; Department of Anesthesiology and Perioperative Medicine, University of California Los Angeles, Los Angeles, CA, USA
|
15
|
Wu X, McPeek MS. L-GATOR: Genetic Association Testing for a Longitudinally Measured Quantitative Trait in Samples with Related Individuals. Am J Hum Genet 2018; 102:574-591. [PMID: 29625022 DOI: 10.1016/j.ajhg.2018.02.016] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/21/2017] [Accepted: 02/20/2018] [Indexed: 01/11/2023]
Abstract
In complex-trait mapping, when each subject has multiple measurements of a quantitative trait over time, power for detecting genetic association can be gained by the inclusion of all measurements and not just single time points or averages in the analysis. To increase power and control type 1 error, one should account for dependence among observations for a single individual as well as dependence between observations of related individuals if they are present in the sample. We propose L-GATOR, a retrospective, mixed-effects method for association mapping of longitudinally measured traits in samples with related individuals. L-GATOR allows arbitrary time points for different individuals, incorporates both time-varying and static covariates, and properly addresses various types of dependence. In simulations, we show that L-GATOR outperforms existing prospective methods in terms of both type 1 error and power when there is phenotype model misspecification or missing data. Compared with the previously proposed longGWAS method, L-GATOR was more than ten times faster for association testing in our simulations and almost 100 times faster for parameter estimation. L-GATOR is applicable to essentially arbitrary combinations of related and unrelated individuals, including small families as well as large, complex pedigrees. We apply the method to data from the Framingham Heart Study to identify association between longitudinal systolic blood pressure measurements and genome-wide SNPs. Of the smallest p values, one-third occur in or near genes that have been previously identified as associated with pulse pressure (such as PIK3CG) and systolic and diastolic blood pressure (such as C10orf107), showing that L-GATOR is able to prioritize relevant loci in a genome screen.
|
16
|
Lloyd-Jones LR, Robinson MR, Yang J, Visscher PM. Transformation of Summary Statistics from Linear Mixed Model Association on All-or-None Traits to Odds Ratio. Genetics 2018; 208:1397-1408. [PMID: 29429966 PMCID: PMC5887138 DOI: 10.1534/genetics.117.300360] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 10/05/2017] [Accepted: 01/25/2018] [Indexed: 12/15/2022]
Abstract
Genome-wide association studies (GWAS) have identified thousands of loci that are robustly associated with complex diseases. The use of linear mixed model (LMM) methodology for GWAS is becoming more prevalent due to its ability to control for population structure and cryptic relatedness and to increase power. The odds ratio (OR) is a common measure of the association of a disease with an exposure (e.g., a genetic variant) and is readily available from logistic regression. However, when the LMM is applied to all-or-none traits it provides estimates of genetic effects on the observed 0-1 scale, a different scale to that in logistic regression. This limits the comparability of results across studies, for example in a meta-analysis, and makes the interpretation of the magnitude of an effect from an LMM GWAS difficult. In this study, we derived transformations from the genetic effects estimated under the LMM to the OR that only rely on summary statistics. To test the proposed transformations, we used real genotypes from two large, publicly available data sets to simulate all-or-none phenotypes for a set of scenarios that differ in underlying model, disease prevalence, and heritability. Furthermore, we applied these transformations to GWAS summary statistics for type 2 diabetes generated from 108,042 individuals in the UK Biobank. In both simulation and real-data application, we observed very high concordance between the transformed OR from the LMM and either the simulated truth or estimates from logistic regression. The transformations derived and validated in this study improve the comparability of results from prospective and already performed LMM GWAS on complex diseases by providing a reliable transformation to a common comparative scale for the genetic effects.
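To make the scale difference concrete, here is a minimal sketch of a commonly used first-order conversion from an observed-scale (0-1) effect to a log odds ratio, dividing by k(1 - k), where k is the fraction of cases in the sample. It is not the set of transformations derived in this paper; the function and parameter names are assumptions for illustration.

```python
import math


def lmm_beta_to_odds_ratio(beta_lmm, case_fraction):
    """Approximate OR from an observed-scale effect: log(OR) ~ beta / (k * (1 - k))."""
    if not 0.0 < case_fraction < 1.0:
        raise ValueError("case_fraction must lie strictly between 0 and 1")
    # Rescale the 0-1 scale effect to the log-odds scale (first-order approximation).
    log_or = beta_lmm / (case_fraction * (1.0 - case_fraction))
    return math.exp(log_or)


# Illustration with made-up inputs: effect 0.01 on the 0-1 scale, 25% cases.
print(lmm_beta_to_odds_ratio(0.01, 0.25))
```

The approximation is expected to be reasonable only for small effects; larger effects or extreme case fractions are where a more careful transformation matters.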
Affiliation(s)
- Luke R Lloyd-Jones: Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia
- Matthew R Robinson: Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia; Department of Computational Biology, University of Lausanne, CH-1015, Switzerland
- Jian Yang: Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia; Queensland Brain Institute, University of Queensland, Brisbane 4072, Australia
- Peter M Visscher: Institute for Molecular Bioscience, University of Queensland, Brisbane 4072, Australia; Queensland Brain Institute, University of Queensland, Brisbane 4072, Australia
|
17
|
McClure KA, Gardner KM, Douglas GM, Song J, Forney CF, DeLong J, Fan L, Du L, Toivonen PMA, Somers DJ, Rajcan I, Myles S. A Genome-Wide Association Study of Apple Quality and Scab Resistance. The Plant Genome 2018; 11:170075. [PMID: 29505632 DOI: 10.3835/plantgenome2017.08.0075] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Indexed: 05/17/2023]
Abstract
The apple (Malus × domestica Borkh.) is an economically and culturally important crop grown worldwide. Growers of this long-lived perennial must produce fruit of adequate quality while also combatting abiotic and biotic stress. Traditional apple breeding can take up to 20 yr from initial cross to commercial release, but genomics-assisted breeding can help accelerate this process. To advance genomics-assisted breeding in apple, we performed genome-wide association studies (GWAS) and genomic prediction in a collection of 172 apple accessions by linking over 55,000 single nucleotide polymorphisms (SNPs) with 10 phenotypes collected over 2 yr. Genome-wide association studies revealed several known loci for skin color, harvest date and firmness at harvest. Several significant GWAS associations were detected for resistance to a major fungal pathogen, apple scab (Venturia inaequalis [Cke.] Wint.), but we demonstrate that these hits likely represent a single ancestral source. Using genomic prediction, we show that most phenotypes are sufficiently predictable using genome-wide SNPs to be candidates for genomic selection. Finally, we detect a signal for firmness retention after storage on chromosome 10 and show that it may not stem from variation in PG1, a gene repeatedly identified in bi-parental mapping studies and widely believed to underlie a major QTL for firmness on chromosome 10. We provide evidence that this major QTL is more likely due to variation in a neighboring ethylene response factor (ERF) gene. The present study showcases the superior mapping resolution of GWAS compared to bi-parental linkage mapping by identifying a novel candidate gene underlying a well-studied, major QTL involved in apple firmness.
|
18
|
Analysis of the human monocyte-derived macrophage transcriptome and response to lipopolysaccharide provides new insights into genetic aetiology of inflammatory bowel disease. PLoS Genet 2017; 13:e1006641. [PMID: 28263993 PMCID: PMC5358891 DOI: 10.1371/journal.pgen.1006641] [Citation(s) in RCA: 81] [Impact Index Per Article: 11.6] [Received: 04/27/2016] [Revised: 03/20/2017] [Accepted: 02/17/2017] [Indexed: 12/15/2022]
Abstract
The FANTOM5 consortium utilised cap analysis of gene expression (CAGE) to provide an unprecedented insight into transcriptional regulation in human cells and tissues. In the current study, we have used CAGE-based transcriptional profiling on an extended dense time course of the response of human monocyte-derived macrophages grown in macrophage colony-stimulating factor (CSF1) to bacterial lipopolysaccharide (LPS). We propose that this system provides a model for the differentiation and adaptation of monocytes entering the intestinal lamina propria. The response to LPS is shown to be a cascade of successive waves of transient gene expression extending over at least 48 hours, with hundreds of positive and negative regulatory loops. Promoter analysis using motif activity response analysis (MARA) identified some of the transcription factors likely to be responsible for the temporal profile of transcriptional activation. Each LPS-inducible locus was associated with multiple inducible enhancers, and in each case, transient eRNA transcription at multiple sites detected by CAGE preceded the appearance of promoter-associated transcripts. LPS-inducible long non-coding RNAs were commonly associated with clusters of inducible enhancers. We used these data to re-examine the hundreds of loci associated with susceptibility to inflammatory bowel disease (IBD) in genome-wide association studies. Loci associated with IBD were strongly and specifically (relative to rheumatoid arthritis and unrelated traits) enriched for promoters that were regulated in monocyte differentiation or activation. Amongst previously-identified IBD susceptibility loci, the vast majority contained at least one promoter that was regulated in CSF1-dependent monocyte-macrophage transitions and/or in response to LPS. 
On this basis, we concluded that IBD loci are strongly-enriched for monocyte-specific genes, and identified at least 134 additional candidate genes associated with IBD susceptibility from reanalysis of published GWA studies. We propose that dysregulation of monocyte adaptation to the environment of the gastrointestinal mucosa is the key process leading to inflammatory bowel disease. Macrophages are immune cells that form the first line of defense against pathogens, but also mediate tissue damage in inflammatory disease. Macrophages initiate inflammation by recognising and responding to components of bacterial cells. Macrophages of the wall of the gut are constantly replenished from the blood. Upon entering the intestine, newly-arrived cells modulate their response to stimuli derived from the bacteria in the wall of the gut. This process fails in chronic inflammatory bowel diseases (IBD). Both the major forms of IBD, Crohn’s disease and ulcerative colitis, run in families. The inheritance is complex, involving more than 200 different regions of the genome. We hypothesised that the genetic risk of IBD is associated specifically with altered regulation of genes that control the development of macrophages. In this study, we used the comprehensive transcriptome dataset produced by the FANTOM5 consortium to identify the sets of promoters and enhancers that are involved in adaptation of macrophages to the gut wall, their response to bacterial stimuli, and how their functions are integrated. A reanalysis of published genome-wide association data based upon regulated genes in monocytes as candidates strongly supports the view that susceptibility to IBD arises from a primary defect in macrophage differentiation.
|
19
|
Hayeck TJ, Loh PR, Pollack S, Gusev A, Patterson N, Zaitlen NA, Price AL. Mixed Model Association with Family-Biased Case-Control Ascertainment. Am J Hum Genet 2017; 100:31-39. [PMID: 28017371 DOI: 10.1016/j.ajhg.2016.11.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 03/29/2016] [Accepted: 11/08/2016] [Indexed: 01/06/2023]
Abstract
Mixed models have become the tool of choice for genetic association studies; however, standard mixed model methods may be poorly calibrated or underpowered under family sampling bias and/or case-control ascertainment. Previously, we introduced a liability threshold-based mixed model association statistic (LTMLM) to address case-control ascertainment in unrelated samples. Here, we consider family-biased case-control ascertainment, where case and control subjects are ascertained non-randomly with respect to family relatedness. Previous work has shown that this type of ascertainment can severely bias heritability estimates; we show here that it also impacts mixed model association statistics. We introduce a family-based association statistic (LT-Fam) that is robust to this problem. Similar to LTMLM, LT-Fam is computed from posterior mean liabilities (PML) under a liability threshold model; however, LT-Fam uses published narrow-sense heritability estimates to avoid the problem of biased heritability estimation, enabling correct calibration. In simulations with family-biased case-control ascertainment, LT-Fam was correctly calibrated (average χ2 = 1.00-1.02 for null SNPs), whereas the Armitage trend test (ATT), standard mixed model association (MLM), and case-control retrospective association test (CARAT) were mis-calibrated (e.g., average χ2 = 0.50-1.22 for MLM, 0.89-2.65 for CARAT). LT-Fam also attained higher power than other methods in some settings. In 1,259 type 2 diabetes-affected case subjects and 5,765 control subjects from the CARe cohort, downsampled to induce family-biased ascertainment, LT-Fam was correctly calibrated whereas ATT, MLM, and CARAT were again mis-calibrated. Our results highlight the importance of modeling family sampling bias in case-control datasets with related samples.
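The calibration figures quoted in this abstract are average χ2 values over null SNPs, which should be close to 1 for a correctly calibrated 1-d.f. test. A minimal sketch of that check follows, together with the related genomic-control ratio based on the median. The function name and the simulated statistics are assumptions for illustration and are unrelated to the study's data.

```python
import random
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def calibration_summary(chi2_stats):
    """Return (mean chi-square, median-based lambda) for 1-d.f. test statistics."""
    mean_chi2 = statistics.fmean(chi2_stats)
    lambda_gc = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    return mean_chi2, lambda_gc


# Illustration only: squared standard normal draws are chi-square(1) under the null.
rng = random.Random(1)
null_stats = [rng.gauss(0.0, 1.0) ** 2 for _ in range(50000)]
mean_chi2, lambda_gc = calibration_summary(null_stats)
print(f"mean chi2 = {mean_chi2:.3f}, lambda = {lambda_gc:.3f}")
```

Values well below 1 suggest a deflated (conservative) statistic and values well above 1 suggest inflation, which is how ranges such as 0.50-1.22 in the abstract are read.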
|
20
|
Zhong S, Jiang D, McPeek MS. CERAMIC: Case-Control Association Testing in Samples with Related Individuals, Based on Retrospective Mixed Model Analysis with Adjustment for Covariates. PLoS Genet 2016; 12:e1006329. [PMID: 27695091 PMCID: PMC5047592 DOI: 10.1371/journal.pgen.1006329] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/09/2016] [Accepted: 08/29/2016] [Indexed: 12/15/2022]
Abstract
We consider the problem of genetic association testing of a binary trait in a sample that contains related individuals, where we adjust for relevant covariates and allow for missing data. We propose CERAMIC, an estimating equation approach that can be viewed as a hybrid of logistic regression and linear mixed-effects model (LMM) approaches. CERAMIC extends the recently proposed CARAT method to allow samples with related individuals and to incorporate partially missing data. In simulations, we show that CERAMIC outperforms existing LMM and generalized LMM approaches, maintaining high power and correct type 1 error across a wider range of scenarios. CERAMIC results in a particularly large power increase over existing methods when the sample includes related individuals with some missing data (e.g., when some individuals with phenotype and covariate information have missing genotype), because CERAMIC is able to make use of the relationship information to incorporate partially missing data in the analysis while correcting for dependence. Because CERAMIC is based on a retrospective analysis, it is robust to misspecification of the phenotype model, resulting in better control of type 1 error and higher power than that of prospective methods, such as GMMAT, when the phenotype model is misspecified. CERAMIC is computationally efficient for genomewide analysis in samples of related individuals of almost any configuration, including small families, unrelated individuals and even large, complex pedigrees. We apply CERAMIC to data on type 2 diabetes (T2D) from the Framingham Heart Study. In a genome scan, 9 of the 10 smallest CERAMIC p-values occur in or near either known T2D susceptibility loci or plausible candidates, verifying that CERAMIC is able to home in on the important loci in a genome scan.
Affiliation(s)
- Sheng Zhong
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Duo Jiang
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Mary Sara McPeek
- Department of Statistics, University of Chicago, Chicago, Illinois, United States of America
- Department of Human Genetics, University of Chicago, Chicago, Illinois, United States of America
|
21
|
Jiang D, Mbatchou J, McPeek MS. Retrospective Association Analysis of Binary Traits: Overcoming Some Limitations of the Additive Polygenic Model. Hum Hered 2016; 80:187-95. [PMID: 27576759 DOI: 10.1159/000446957] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 12/14/2022]
Abstract
Case-control genetic association analysis is an extremely common tool in human complex trait mapping. From a statistical point of view, the analysis of binary traits poses somewhat different challenges from the analysis of quantitative traits. Desirable features of a binary trait mapping approach would include (1) phenotype modeled as binary, with appropriate dependence between the mean and variance; (2) appropriate correction for relevant covariates; (3) appropriate correction for sample structure of various types, including related individuals, admixture and other types of population structure; (4) both fast and accurate computations; (5) robustness to ascertainment and other types of phenotype model misspecification, and (6) ability to leverage partially missing data to increase power. We review these challenges and argue, both theoretically and in simulations, for the value of retrospective association analysis as a way to overcome some of the limitations of the phenotype model, including model misspecification due to ascertainment. We give an overview of two recent retrospective methods, CARAT and CERAMIC, that are designed to meet criteria 1-6.
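The idea of retrospective association analysis mentioned above can be illustrated with a toy score test. The sketch below is not the CARAT or CERAMIC estimator; it is a minimal illustration under stated assumptions: phenotype residuals come from an ordinary logistic regression without the genotype, and significance is assessed by treating the genotype as random with null variance proportional to a relatedness matrix `phi`. The function name `retrospective_score_test`, the variance estimator, and the simulated data are all hypothetical choices made for this example.

```python
import math
import numpy as np

def retrospective_score_test(y, X, g, phi, n_iter=25):
    """Toy retrospective score test for a single variant (illustrative only).

    y   : (n,) binary phenotype (0/1)
    X   : (n, p) covariates, first column an intercept
    g   : (n,) genotype coded 0/1/2
    phi : (n, n) relatedness matrix (identity for unrelated individuals)
    """
    n = len(y)
    # Fit the null logistic model (no genotype) by Newton-Raphson / IRLS.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))
        w = mu * (1.0 - mu)
        beta += np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (y - mu))
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    r = y - mu  # phenotype residuals; they sum to ~0 because of the intercept

    # Assumed null model for the genotype, conditional on y and X:
    # E[g] = constant, Var(g) = sigma2 * phi.
    ones = np.ones(n)
    mean_g = (ones @ np.linalg.solve(phi, g)) / (ones @ np.linalg.solve(phi, ones))
    d = g - mean_g
    sigma2 = (d @ np.linalg.solve(phi, d)) / (n - 1)

    # Score statistic: squared residual-genotype covariance over its
    # variance under the genotype model (not the phenotype model).
    stat = (r @ g) ** 2 / (sigma2 * (r @ phi @ r))
    pval = math.erfc(math.sqrt(stat / 2.0))  # chi-square tail, 1 df
    return stat, pval

# Simulated example with unrelated individuals and no true association.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.binomial(1, 0.4, size=n).astype(float)
g = rng.binomial(2, 0.3, size=n).astype(float)
phi = np.eye(n)
stat, pval = retrospective_score_test(y, X, g, phi)
```

Because the denominator depends only on the assumed genotype covariance, the null distribution of the statistic does not rely on the logistic model for the phenotype being correct, which is the intuition behind the robustness to phenotype model misspecification described in the abstract.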
|