1
Mbatchou J, McPeek MS. JASPER: Fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. Am J Hum Genet 2024; 111:1750-1769. [PMID: 39025064 DOI: 10.1016/j.ajhg.2024.06.010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 06/19/2024] [Accepted: 06/20/2024] [Indexed: 07/20/2024] Open
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction, and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks, or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture, and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits, and microbiome abundances. It allows for covariates, ascertainment, and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, most of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
- Regeneron Genetics Center, Tarrytown, NY 10591, USA; Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA; Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
2
Yu Z, Farage G, Williams RW, Broman KW, Sen Ś. BulkLMM: Real-time genome scans for multiple quantitative traits using linear mixed models. bioRxiv 2023:2023.12.20.572698. [PMID: 38187625 PMCID: PMC10769382 DOI: 10.1101/2023.12.20.572698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Genetic studies often collect data using high-throughput phenotyping. That has led to the need for fast genome-wide scans for a large number of traits using linear mixed models (LMMs). Computing the scans one by one on each trait is time-consuming. We have developed BulkLMM, a set of new algorithms for performing genome scans on a large number of quantitative traits using LMMs, which speeds up the computation by orders of magnitude compared to one-trait-at-a-time scans. On a mouse BXD liver proteome dataset with more than 35,000 traits and 7,000 markers, BulkLMM completed in a few seconds. We use vectorized, multi-threaded operations and regularization to improve optimization, and numerical approximations to speed up the computations. Our software implementation in the Julia programming language also provides permutation testing for LMMs and is available at https://github.com/senresearch/BulkLMM.jl.
Affiliation(s)
- Zifan Yu
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Gregory Farage
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Robert W Williams
- Department of Genetics, Genomics and Informatics, University of Tennessee Health Science Center, Memphis, TN 38163, USA
- Karl W Broman
- Department of Biostatistics and Medical Informatics, University of Wisconsin - Madison, Madison, WI 53706, USA
- Śaunak Sen
- Department of Preventive Medicine, University of Tennessee Health Science Center, Memphis, TN 38163, USA
3
Mbatchou J, McPeek MS. JASPER: fast, powerful, multitrait association testing in structured samples gives insight on pleiotropy in gene expression. bioRxiv 2023:2023.12.18.571948. [PMID: 38187553 PMCID: PMC10769254 DOI: 10.1101/2023.12.18.571948] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/09/2024]
Abstract
Joint association analysis of multiple traits with multiple genetic variants can provide insight into genetic architecture and pleiotropy, improve trait prediction and increase power for detecting association. Furthermore, some traits are naturally high-dimensional, e.g., images, networks or longitudinally measured traits. Assessing significance for multitrait genetic association can be challenging, especially when the sample has population sub-structure and/or related individuals. Failure to adequately adjust for sample structure can lead to power loss and inflated type 1 error, and commonly used methods for assessing significance can work poorly with a large number of traits or be computationally slow. We developed JASPER, a fast, powerful, robust method for assessing significance of multitrait association with a set of genetic variants, in samples that have population sub-structure, admixture and/or relatedness. In simulations, JASPER has higher power, better type 1 error control, and faster computation than existing methods, with the power and speed advantage of JASPER increasing with the number of traits. JASPER is potentially applicable to a wide range of association testing applications, including for multiple disease traits, expression traits, image-derived traits and microbiome abundances. It allows for covariates, ascertainment and rare variants and is robust to phenotype model misspecification. We apply JASPER to analyze gene expression in the Framingham Heart Study, where, compared to alternative approaches, JASPER finds more significant associations, including several that indicate pleiotropic effects, some of which replicate previous results, while others have not previously been reported. Our results demonstrate the promise of JASPER for powerful multitrait analysis in structured samples.
Affiliation(s)
- Joelle Mbatchou
- Regeneron Genetics Center, Tarrytown, NY 10591, USA
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Mary Sara McPeek
- Department of Statistics, The University of Chicago, Chicago, IL 60637, USA
- Department of Human Genetics, The University of Chicago, Chicago, IL 60637, USA
4
Mbatchou J, Abney M, McPeek MS. BRASS: Permutation methods for binary traits in genetic association studies with structured samples. PLoS Genet 2023; 19:e1011020. [PMID: 37934792 PMCID: PMC10656004 DOI: 10.1371/journal.pgen.1011020] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/05/2023] [Revised: 11/17/2023] [Accepted: 10/16/2023] [Indexed: 11/09/2023] Open
Abstract
In genetic association analysis of complex traits, permutation testing can be a valuable tool for assessing significance when the distribution of the test statistic is unknown or not well-approximated. This commonly arises, e.g., in tests of gene-set, pathway or genome-wide significance, or when the statistic is formed by machine learning or data adaptive methods. Existing applications include eQTL mapping, association testing with rare variants, inclusion of admixed individuals in genetic association analysis, and epistasis detection among many others. For genetic association testing in samples with population structure and/or relatedness, use of naive permutation can lead to inflated type 1 error. To address this in quantitative traits, the MVNpermute method was developed. However, for association mapping of a binary trait, the relationship between the mean and variance makes both naive permutation and the MVNpermute method invalid. We propose BRASS, a permutation method for binary traits, for use in association mapping in structured samples. In addition to modeling structure in the sample, BRASS allows for covariates, ascertainment and simultaneous testing of multiple markers, and it accommodates a wide range of test statistics. In simulation studies, we compare BRASS to other permutation and resampling-based methods in a range of scenarios that include population structure, familial relatedness, ascertainment and phenotype model misspecification. In these settings, we demonstrate the superior control of type 1 error by BRASS compared to the other 6 methods considered. We apply BRASS to assess genome-wide significance for association analyses in domestic dog for elbow dysplasia (ED) and idiopathic epilepsy (IE). For both traits we detect previously identified associations, and in addition, for ED, we detect significant association with a SNP on chromosome 35 that was not detected by previous analyses, demonstrating the potential of the method.
Affiliation(s)
- Joelle Mbatchou
- Regeneron Genetics Center, Tarrytown, New York, United States of America
- Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
- Mark Abney
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
- Mary Sara McPeek
- Department of Statistics, The University of Chicago, Chicago, Illinois, United States of America
- Department of Human Genetics, The University of Chicago, Chicago, Illinois, United States of America
5
Wright KM, Deighan AG, Di Francesco A, Freund A, Jojic V, Churchill GA, Raj A. Age and diet shape the genetic architecture of body weight in diversity outbred mice. eLife 2022; 11:64329. [PMID: 35838135 PMCID: PMC9286741 DOI: 10.7554/elife.64329] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/26/2020] [Accepted: 05/20/2022] [Indexed: 12/26/2022] Open
Abstract
Understanding how genetic variation shapes a complex trait relies on accurately quantifying both the additive genetic and genotype–environment interaction effects in an age-dependent manner. We used a linear mixed model to quantify diet-dependent genetic contributions to body weight measured through adulthood in diversity outbred female mice under five diets. We observed that heritability of body weight declined with age under all diets, except the 40% calorie restriction diet. We identified 14 loci with age-dependent associations and 19 loci with age- and diet-dependent associations, with many diet-dependent loci previously linked to neurological function and behavior in mice or humans. We found their allelic effects to be dynamic with respect to genomic background, age, and diet, identifying several loci where distinct alleles affect body weight at different ages. These results enable us to more fully understand and predict the effectiveness of dietary intervention on overall health throughout age in distinct genetic backgrounds.
Body weight is one trait influenced by genes, age and environmental factors. Both internal and external environmental pressures are known to affect genetic variation over time. However, it is largely unknown how all factors – including age – interact to shape metabolism and body weight. Wright et al. set out to quantify the interactions between genes and diet in ageing mice and found that the effect of genetics on mouse body weight changes with age.
In the experiments, Wright et al. weighed 960 female mice with diverse genetic backgrounds, from two months of age into adulthood. The animals were randomized to different diets at six months of age. Some mice had unlimited food access, others received 20% or 40% fewer calories than a typical mouse diet, and some fasted one or two days per week.
Variations in their genetic background explained about 80% of differences in mice’s weight, but the influence of genetics relative to non-genetic factors decreased as they aged. Mice on the 40% calorie restriction diet were an exception to this rule and genetics accounted for 80% of their weight throughout adulthood, likely due to reduced influence from diet and reduced interactions between diet and genes. Several genes involved in metabolism, neurological function, or behavior were associated with mouse weight. The experiments highlight the importance of considering interactions between genetics, environment, and age in determining complex traits like body weight. The results and the approaches used by Wright et al. may help other scientists learn more about how the genetic predisposition to disease changes with environmental stimuli and age.
Affiliation(s)
- Kevin M Wright
- Calico Life Sciences LLC, South San Francisco, United States
- Adam Freund
- Calico Life Sciences LLC, South San Francisco, United States
- Vladimir Jojic
- Calico Life Sciences LLC, South San Francisco, United States
- Anil Raj
- Calico Life Sciences LLC, South San Francisco, United States
6
Epstein B, Burghardt LT, Heath KD, Grillo MA, Kostanecki A, Hämälä T, Young ND, Tiffin P. Combining GWAS and population genomic analyses to characterize coevolution in a legume-rhizobia symbiosis. Mol Ecol 2022. [PMID: 35793264 DOI: 10.1111/mec.16602] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/04/2022] [Revised: 06/03/2022] [Accepted: 07/04/2022] [Indexed: 11/28/2022]
Abstract
The mutualism between legumes and rhizobia is clearly the product of past coevolution. However, the nature of ongoing evolution between these partners is less clear. To characterize the nature of recent coevolution between legumes and rhizobia, we used population genomic analysis to characterize selection on functionally annotated symbiosis genes as well as on symbiosis gene candidates identified through a two-species association analysis. For the association analysis, we inoculated each of 202 accessions of the legume host Medicago truncatula with a community of 88 Sinorhizobia (Ensifer) meliloti strains. Multistrain inoculation, which better reflects the ecological reality of rhizobial selection in nature than single-strain inoculation, allows strains to compete for nodulation opportunities and host resources and for hosts to preferentially form nodules and provide resources to some strains. We found extensive host by symbiont, that is, genotype-by-genotype, effects on rhizobial fitness and some annotated rhizobial genes bear signatures of recent positive selection. However, neither genes responsible for this variation nor annotated host symbiosis genes are enriched for signatures of either positive or balancing selection. This result suggests that stabilizing selection dominates selection acting on symbiotic traits and that variation in these traits is under mutation-selection balance. Consistent with the lack of positive selection acting on host genes, we found that among-host variation in growth was similar whether plants were grown with rhizobia or N-fertilizer, suggesting that the symbiosis may not be a major driver of variation in plant growth in multistrain contexts.
Affiliation(s)
- Brendan Epstein
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, USA
- Liana T Burghardt
- Department of Plant Sciences, The Pennsylvania State University, University Park, Pennsylvania, USA
- Katy D Heath
- Department of Plant Biology, University of Illinois, Urbana, Illinois, USA; Carl R. Woese Institute for Genomic Biology, University of Illinois, Urbana, Illinois, USA
- Michael A Grillo
- Department of Biology, Loyola University Chicago, Chicago, Illinois, USA
- Adam Kostanecki
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, USA
- Tuomas Hämälä
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, USA; School of Life Sciences, University of Nottingham, Nottingham, UK
- Nevin D Young
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, USA; Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota, USA
- Peter Tiffin
- Department of Plant and Microbial Biology, University of Minnesota, St. Paul, Minnesota, USA
7
Savriama Y, Tautz D. Testing the accuracy of 3D automatic landmarking via genome-wide association studies. G3 (Bethesda) 2022; 12:jkab443. [PMID: 35100368 PMCID: PMC9210295 DOI: 10.1093/g3journal/jkab443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2021] [Accepted: 12/16/2021] [Indexed: 11/13/2022]
Abstract
Various advances in 3D automatic phenotyping and landmark-based geometric morphometric methods have been made. While it is generally accepted that automatic landmarking compromises the capture of the biological variation, no studies have directly tested the actual impact of such landmarking approaches in analyses requiring a large number of specimens and for which the precision of phenotyping is crucial to extract an actual biological signal adequately. Here, we use a recently developed 3D atlas-based automatic landmarking method to test its accuracy in detecting QTLs associated with craniofacial development of the house mouse skull and lower jaws for a large number of specimens (circa 700) that were previously phenotyped via a semiautomatic landmarking method complemented with manual adjustment. We compare both landmarking methods with univariate and multivariate mapping of the skull and the lower jaws. We find that most significant SNPs and QTLs are not recovered based on the data derived from the automatic landmarking method. Our results thus confirm the notion that information is lost in the automated landmarking procedure although somewhat dependent on the analyzed structure. The automatic method seems to capture certain types of structures slightly better, such as lower jaws whose shape is almost entirely summarized by its outline and could be assimilated as a 2D flat object. By contrast, the more apparent 3D features exhibited by a structure such as the skull are not adequately captured by the automatic method. We conclude that using 3D atlas-based automatic landmarking methods requires careful consideration of the experimental question.
Affiliation(s)
- Yoland Savriama
- Department Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
- Diethard Tautz
- Department Evolutionary Genetics, Max-Planck Institute for Evolutionary Biology, 24306 Plön, Germany
8
Zou J, Gopalakrishnan S, Parker CC, Nicod J, Mott R, Cai N, Lionikas A, Davies RW, Palmer AA, Flint J. Analysis of independent cohorts of outbred CFW mice reveals novel loci for behavioral and physiological traits and identifies factors determining reproducibility. G3 (Bethesda) 2022; 12:jkab394. [PMID: 34791208 PMCID: PMC8728023 DOI: 10.1093/g3journal/jkab394] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/21/2021] [Accepted: 10/17/2021] [Indexed: 12/12/2022]
Abstract
Combining samples for genetic association is standard practice in human genetic analysis of complex traits, but is rarely undertaken in rodent genetics. Here, using 23 phenotypes and genotypes from two independent laboratories, we obtained a sample size of 3076 commercially available outbred mice and identified 70 loci, more than double the number of loci identified in the component studies. Fine-mapping in the combined sample reduced the number of likely causal variants, with a median reduction in set size of 51%, and indicated novel gene associations, including Pnpo, Ttll6, and GM11545 with bone mineral density, and Psmb9 with weight. However, replication at a nominal threshold of 0.05 between the two component studies was low, with less than one-third of loci identified in one study replicated in the second. In addition to overestimates in the effect size in the discovery sample (Winner's Curse), we also found that heterogeneity between studies explained the poor replication, but the contribution of these two factors varied among traits. Leveraging these observations, we integrated information about replication rates, study-specific heterogeneity, and Winner's Curse corrected estimates of power to assign variants to one of four confidence levels. Our approach addresses concerns about reproducibility and demonstrates how to obtain robust results from mapping complex traits in any genome-wide association study.
Affiliation(s)
- Jennifer Zou
- Department of Computer Science, University of California, Los Angeles, CA 90024, USA
- Shyam Gopalakrishnan
- Faculty of Health and Medical Sciences, GLOBE Institute, University of Copenhagen, Copenhagen DK-1353, Denmark
- Clarissa C Parker
- Department of Psychology and Program in Neuroscience, Middlebury College, Middlebury, VT 05753, USA
- Richard Mott
- UCL Department of Genetics, Evolution & Environment, UCL Genetics Institute, London WC1E 6BT, UK
- Na Cai
- Helmholtz Zentrum Muenchen, Helmholtz Pioneer Campus, Neuherberg 85764, Germany
- Arimantas Lionikas
- School of Medicine, Medical Sciences and Nutrition, College of Life Sciences and Medicine, University of Aberdeen, Aberdeen AB24 3FX, UK
- Robert W Davies
- Department of Statistics, University of Oxford, Oxford OX1 2JD, UK
- Abraham A Palmer
- Department of Psychiatry, University of California San Diego, La Jolla, CA 92093, USA
- Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Jonathan Flint
- Department of Biobehavioral Sciences, University of California, Los Angeles, CA 90024, USA
9
Andersen EC, Rockman MV. Natural genetic variation as a tool for discovery in Caenorhabditis nematodes. Genetics 2022; 220:iyab156. [PMID: 35134197 PMCID: PMC8733454 DOI: 10.1093/genetics/iyab156] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 06/20/2021] [Accepted: 09/11/2021] [Indexed: 11/12/2022] Open
Abstract
Over the last 20 years, studies of Caenorhabditis elegans natural diversity have demonstrated the power of quantitative genetic approaches to reveal the evolutionary, ecological, and genetic factors that shape traits. These studies complement the use of the laboratory-adapted strain N2 and enable additional discoveries not possible using only one genetic background. In this chapter, we describe how to perform quantitative genetic studies in Caenorhabditis, with an emphasis on C. elegans. These approaches use correlations between genotype and phenotype across populations of genetically diverse individuals to discover the genetic causes of phenotypic variation. We present methods that use linkage, near-isogenic lines, association, and bulk-segregant mapping, and we describe the advantages and disadvantages of each approach. The power of C. elegans quantitative genetic mapping is best shown in the ability to connect phenotypic differences to specific genes and variants. We will present methods to narrow genomic regions to candidate genes and then tests to identify the gene or variant involved in a quantitative trait. The same features that make C. elegans a preeminent experimental model animal contribute to its exceptional value as a tool to understand natural phenotypic variation.
Affiliation(s)
- Erik C Andersen
- Department of Molecular Biosciences, Northwestern University, Evanston, IL 60201, USA
- Matthew V Rockman
- Department of Biology and Center for Genomics & Systems Biology, New York University, New York, NY 10003, USA
10
Asif H, Alliey-Rodriguez N, Keedy S, Tamminga CA, Sweeney JA, Pearlson G, Clementz BA, Keshavan MS, Buckley P, Liu C, Neale B, Gershon ES. GWAS significance thresholds for deep phenotyping studies can depend upon minor allele frequencies and sample size. Mol Psychiatry 2021; 26:2048-2055. [PMID: 32066829 PMCID: PMC7429341 DOI: 10.1038/s41380-020-0670-3] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 05/28/2019] [Revised: 01/28/2020] [Accepted: 01/29/2020] [Indexed: 02/01/2023]
Abstract
An important issue affecting genome-wide association studies with deep phenotyping (multiple correlated phenotypes) is determining the suitable family-wise significance threshold. Straightforward family-wise correction (Bonferroni) of p < 0.05 for 4.3 million genotypes and 335 phenotypes would give a threshold of p < 3.46E-11. This would be too conservative because it assumes all tests are independent. The effective number of tests, both phenotypic and genotypic, must be adjusted for the correlations between them. Spectral decomposition of the phenotype matrix and LD-based correction of the number of tested SNPs are currently used to determine an effective number of tests. In this paper, we compare these calculated estimates with permutation-determined family-wise significance thresholds. Permutations are performed by shuffling individual IDs of the genotype vector for this dataset, to preserve correlation of phenotypes. Our results demonstrate that the permutation threshold is influenced by minor allele frequency (MAF) of the SNPs, and by the number of individuals tested. For the more common SNPs (MAF > 0.1), the permutation family-wise threshold was in close agreement with spectral decomposition methods. However, for less common SNPs (0.05 < MAF ≤ 0.1), the permutation threshold calculated over all SNPs was off by orders of magnitude. This applies to the number of individuals studied (here 777) but not to very much larger numbers. Based on these findings, we propose that the threshold to find a particular level of family-wise significance may need to be established using separate permutations of the actual data for several MAF bins.
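The Bonferroni arithmetic and the ID-shuffling step described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, and the genotype count is the rounded "4.3 million" from the abstract, so the result (about 3.47E-11) differs slightly from the reported 3.46E-11, presumably because the exact number of genotypes was used there.

```python
import random


def bonferroni_threshold(alpha, n_genotypes, n_phenotypes):
    """Family-wise threshold that treats every genotype-phenotype test as independent."""
    return alpha / (n_genotypes * n_phenotypes)


def shuffle_genotype_ids(genotype_rows, rng):
    """Reassign genotype rows to individuals at random (hypothetical helper).

    Phenotype rows are not touched, so correlations among phenotypes are preserved.
    The input list is copied, not modified in place.
    """
    shuffled = list(genotype_rows)
    rng.shuffle(shuffled)
    return shuffled


# Rounded counts taken from the abstract: 4.3 million genotypes, 335 phenotypes.
threshold = bonferroni_threshold(0.05, 4_300_000, 335)
print(f"{threshold:.2e}")  # prints 3.47e-11
```

In a full permutation analysis, the shuffle would be repeated many times, the smallest p-value over all tests recorded for each shuffle, and the lower 5% quantile of those minima taken as the family-wise threshold; that loop is omitted here because it depends on the association test used.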
Affiliation(s)
- Huma Asif
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room R016, Chicago, IL, 60637, USA
- Ney Alliey-Rodriguez
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room R016, Chicago, IL, 60637, USA
- Sarah Keedy
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room R016, Chicago, IL, 60637, USA
- Carol A Tamminga
- Department of Psychiatry, University of Texas Southwestern Medical Center, Dallas, TX, USA
- John A Sweeney
- Department of Psychiatry, University of Cincinnati, Cincinnati, OH, USA
- Godfrey Pearlson
- Departments of Psychiatry & Neuroscience, Yale University, New Haven, CT, USA
- Brett A Clementz
- Department of Psychology, University of Georgia, Athens, GA, USA
- Chunyu Liu
- Department of Psychiatry, SUNY Upstate Medical University, Binghamton, NY, USA
- Elliot S Gershon
- Department of Psychiatry and Behavioral Neurosciences, University of Chicago, 924 East 57th Street Room R016, Chicago, IL, 60637, USA; Department of Human Genetics, University of Chicago, 924 East 57th Street Room R016, Chicago, IL, 60637, USA
11
Mueller JC, Carrete M, Boerno S, Kuhl H, Tella JL, Kempenaers B. Genes acting in synapses and neuron projections are early targets of selection during urban colonization. Mol Ecol 2020; 29:3403-3412. [DOI: 10.1111/mec.15451] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 08/20/2019] [Accepted: 04/08/2020] [Indexed: 02/06/2023]
Affiliation(s)
- Jakob C. Mueller
- Department of Behavioural Ecology & Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
- Martina Carrete
- Department of Conservation Biology, Estación Biológica de Doñana – CSIC, Sevilla, Spain
- Department of Physical, Chemical and Natural Systems, University Pablo de Olavide, Sevilla, Spain
- Stefan Boerno
- Sequencing Core Facility, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Heiner Kuhl
- Sequencing Core Facility, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Department of Ecophysiology and Aquaculture, Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany
- José L. Tella
- Department of Conservation Biology, Estación Biológica de Doñana – CSIC, Sevilla, Spain
- Bart Kempenaers
- Department of Behavioural Ecology & Evolutionary Genetics, Max Planck Institute for Ornithology, Seewiesen, Germany
12
Powell DL, García-Olazábal M, Keegan M, Reilly P, Du K, Díaz-Loyo AP, Banerjee S, Blakkan D, Reich D, Andolfatto P, Rosenthal GG, Schartl M, Schumer M. Natural hybridization reveals incompatible alleles that cause melanoma in swordtail fish. Science 2020; 368:731-736. [PMID: 32409469 PMCID: PMC8074799 DOI: 10.1126/science.aba5216] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 12/12/2019] [Accepted: 03/27/2020] [Indexed: 12/21/2022]
Abstract
The establishment of reproductive barriers between populations can fuel the evolution of new species. A genetic framework for this process posits that "incompatible" interactions between genes can evolve that result in reduced survival or reproduction in hybrids. However, progress has been slow in identifying individual genes that underlie hybrid incompatibilities. We used a combination of approaches to map the genes that drive the development of an incompatibility that causes melanoma in swordtail fish hybrids. One of the genes involved in this incompatibility also causes melanoma in hybrids between distantly related species. Moreover, this melanoma reduces survival in the wild, likely because of progressive degradation of the fin. This work identifies genes underlying a vertebrate hybrid incompatibility and provides a glimpse into the action of these genes in natural hybrid populations.
Affiliation(s)
- Daniel L Powell
- Department of Biology, Stanford University and Howard Hughes Medical Institute, Stanford, CA, USA.
- Centro de Investigaciones Científicas de las Huastecas "Aguazarca", A.C., Calnali, Hidalgo, Mexico
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Mateo García-Olazábal
- Centro de Investigaciones Científicas de las Huastecas "Aguazarca", A.C., Calnali, Hidalgo, Mexico
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Patrick Reilly
- Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA
| | - Kang Du
- Developmental Biochemistry, Biocenter, University of Würzburg, Würzburg, Bavaria, Germany
| | - Alejandra P Díaz-Loyo
- Laboratorio de Ecología de la Conducta, Instituto de Fisiología, Benemérita Universidad Autónoma de Puebla, Puebla, Mexico
| | - Shreya Banerjee
- Department of Biology, Stanford University and Howard Hughes Medical Institute, Stanford, CA, USA
| | - Danielle Blakkan
- Department of Biology, Stanford University and Howard Hughes Medical Institute, Stanford, CA, USA
| | - David Reich
- Department of Genetics, Harvard Medical School, Howard Hughes Medical Institute, and the Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Department of Human Evolutionary Biology, Harvard University, Cambridge, MA 02138, USA
| | - Peter Andolfatto
- Department of Biological Sciences, Columbia University, New York, NY, USA
| | - Gil G Rosenthal
- Centro de Investigaciones Científicas de las Huastecas "Aguazarca", A.C., Calnali, Hidalgo, Mexico
- Department of Biology, Texas A&M University, College Station, TX, USA
| | - Manfred Schartl
- Centro de Investigaciones Científicas de las Huastecas "Aguazarca", A.C., Calnali, Hidalgo, Mexico
- Department of Biology, Texas A&M University, College Station, TX, USA
- Developmental Biochemistry, Biocenter, University of Würzburg, Würzburg, Bavaria, Germany
- Hagler Institute for Advanced Study, Texas A&M University, College Station, TX, USA
- Xiphophorus Genetic Stock Center, Texas State University San Marcos, San Marcos, TX, USA
| | - Molly Schumer
- Department of Biology, Stanford University and Howard Hughes Medical Institute, Stanford, CA, USA.
|
13
|
Identifying genetic variants underlying phenotypic variation in plants without complete genomes. Nat Genet 2020; 52:534-540. [PMID: 32284578 PMCID: PMC7610390 DOI: 10.1038/s41588-020-0612-7] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 10/31/2019] [Accepted: 03/10/2020] [Indexed: 12/11/2022]
Abstract
Structural variants and presence/absence polymorphisms are common in plant genomes, yet they are routinely overlooked in genome-wide association studies (GWAS). Here, we expand the type of genetic variants detected in GWAS to include major deletions, insertions and rearrangements. We first use raw sequencing data directly to derive short sequences, k-mers, that mark a broad range of polymorphisms independently of a reference genome. We then link k-mers associated with phenotypes to specific genomic regions. Using this approach, we reanalyzed 2,000 traits in Arabidopsis thaliana, tomato and maize populations. Associations identified with k-mers recapitulate those found with SNPs, but with stronger statistical support. Importantly, we discovered new associations with structural variants and with regions missing from reference genomes. Our results demonstrate the power of performing GWAS before linking sequence reads to specific genomic regions, which allows the detection of a wider range of genetic variants responsible for phenotypic variation.
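As a minimal sketch of the reference-free idea in this abstract, the following Python counts canonical k-mers directly from reads and records which samples carry each one. The function names, the toy reads, and the choice of k = 5 are illustrative assumptions and are not taken from the authors' pipeline.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)


def count_kmers(reads, k):
    """Count canonical k-mers over all reads, skipping windows with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def presence_absence(samples, k, min_count=1):
    """Map each k-mer to the set of samples in which it occurs at least min_count times."""
    table = {}
    for name, reads in samples.items():
        for kmer, n in count_kmers(reads, k).items():
            if n >= min_count:
                table.setdefault(kmer, set()).add(name)
    return table


# Hypothetical toy reads for two lines; real input would be raw sequencing data.
samples = {
    "line_A": ["ACGTACGTAC", "TTGACCA"],
    "line_B": ["ACGTACGAAC"],
}
table = presence_absence(samples, k=5)
# k-mers that are not shared by every sample are the candidates for association testing.
variable = {kmer for kmer, names in table.items() if len(names) < len(samples)}
```

In this sketch the presence/absence pattern of each variable k-mer would play the role that a SNP genotype plays in a conventional GWAS; mapping associated k-mers back to genomic regions is a separate step not shown here.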
|
14
|
Extreme genetic signatures of local adaptation during Lotus japonicus colonization of Japan. Nat Commun 2020; 11:253. [PMID: 31937774 PMCID: PMC6959357 DOI: 10.1038/s41467-019-14213-y] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 12/14/2018] [Accepted: 12/20/2019] [Indexed: 11/09/2022] Open
Abstract
Colonization of new habitats is expected to require genetic adaptations to overcome environmental challenges. Here, we use full genome re-sequencing and extensive common garden experiments to investigate demographic and selective processes associated with colonization of Japan by Lotus japonicus over the past ~20,000 years. Based on patterns of genomic variation, we infer the details of the colonization process where L. japonicus gradually spread from subtropical conditions to much colder climates in northern Japan. We identify genomic regions with extreme genetic differentiation between northern and southern subpopulations and perform population structure-corrected association mapping of phenotypic traits measured in a common garden. Comparing the results of these analyses, we find that signatures of extreme subpopulation differentiation overlap strongly with phenotype association signals for overwintering and flowering time traits. Our results provide evidence that these traits were direct targets of selection during colonization and point to associated candidate genes. Local adaptation contributes to plant colonization across extreme environmental gradients. Here, the authors reconstruct the colonization history of Lotus japonicus in Japan and identify extreme genetic signatures of local adaptation to a cold climate using genome resequencing and common garden experiments.
|
15
|
Dahl A, Nguyen K, Cai N, Gandal MJ, Flint J, Zaitlen N. A Robust Method Uncovers Significant Context-Specific Heritability in Diverse Complex Traits. Am J Hum Genet 2020; 106:71-91. [PMID: 31901249 PMCID: PMC7042488 DOI: 10.1016/j.ajhg.2019.11.015] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 06/20/2019] [Accepted: 11/26/2019] [Indexed: 02/08/2023] Open
Abstract
Gene-environment interactions (GxE) can be fundamental in applications ranging from functional genomics to precision medicine and are a conjectured source of substantial heritability. However, unbiased methods to profile GxE genome-wide are nascent and, as we show, cannot accommodate general environment variables, modest sample sizes, heterogeneous noise, and binary traits. To address this gap, we propose a simple, unifying mixed model for gene-environment interaction (GxEMM). In simulations and theory, we show that GxEMM can dramatically improve estimates and eliminate false positives when the assumptions of existing methods fail. We apply GxEMM to a range of human and model organism datasets and find broad evidence of context-specific genetic effects, including GxSex, GxAdversity, and GxDisease interactions across thousands of clinical and molecular phenotypes. Overall, GxEMM is broadly applicable for testing and quantifying polygenic interactions, which can be useful for explaining heritability and invaluable for determining biologically relevant environments.
Affiliation(s)
- Andy Dahl
- Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA.
| | - Khiem Nguyen
- Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA
| | - Na Cai
- Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK; European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
| | - Michael J Gandal
- Department of Psychiatry, Semel Institute, University of California, Los Angeles, Los Angeles, CA 90095, USA
| | - Jonathan Flint
- Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, Los Angeles, CA 90095, USA
| | - Noah Zaitlen
- Department of Neurology, University of California Los Angeles, Los Angeles, CA 90095, USA; Department of Medicine, University of California San Francisco, San Francisco, CA 94158, USA.
|
16
|
de Jong M, Tavares H, Pasam RK, Butler R, Ward S, George G, Melnyk CW, Challis R, Kover PX, Leyser O. Natural variation in Arabidopsis shoot branching plasticity in response to nitrate supply affects fitness. PLoS Genet 2019; 15:e1008366. [PMID: 31539368 PMCID: PMC6774567 DOI: 10.1371/journal.pgen.1008366] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 04/23/2019] [Revised: 10/02/2019] [Accepted: 08/09/2019] [Indexed: 12/20/2022] Open
Abstract
The capacity of organisms to tune their development in response to environmental cues is pervasive in nature. This phenotypic plasticity is particularly striking in plants, enabled by their modular and continuous development. A good example is the activation of lateral shoot branches in Arabidopsis, which develop from axillary meristems at the base of leaves. The activity and elongation of lateral shoots depend on the integration of many signals both external (e.g. light, nutrient supply) and internal (e.g. the phytohormones auxin, strigolactone and cytokinin). Here, we characterise natural variation in plasticity of shoot branching in response to nitrate supply using two diverse panels of Arabidopsis lines. We find extensive variation in nitrate sensitivity across these lines, suggesting a genetic basis for variation in branching plasticity. High plasticity is associated with extreme branching phenotypes such that lines with the most branches on high nitrate have the fewest under nitrate deficient conditions. Conversely, low plasticity is associated with a constitutively moderate level of branching. Furthermore, variation in plasticity is associated with alternative life histories with the low plasticity lines flowering significantly earlier than high plasticity lines. In Arabidopsis, branching is highly correlated with fruit yield, and thus low plasticity lines produce more fruit than high plasticity lines under nitrate deficient conditions, whereas highly plastic lines produce more fruit under high nitrate conditions. Low and high plasticity, associated with early and late flowering respectively, can therefore be interpreted as alternative escape vs mitigate strategies to low N environments. The genetic architecture of these traits appears to be highly complex, with only a small proportion of the estimated genetic variance detected in association mapping.
Affiliation(s)
- Maaike de Jong
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
- Department of Biology, University of York, York, United Kingdom
| | - Hugo Tavares
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - Raj K. Pasam
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - Rebecca Butler
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - Sally Ward
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
- Department of Biology, University of York, York, United Kingdom
| | - Gilu George
- Department of Biology, University of York, York, United Kingdom
| | - Charles W. Melnyk
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
| | - Richard Challis
- Department of Biology, University of York, York, United Kingdom
| | - Paula X. Kover
- Department of Biology and Biochemistry, University of Bath, Claverton Down, Bath, United Kingdom
| | - Ottoline Leyser
- Sainsbury Laboratory, University of Cambridge, Cambridge, United Kingdom
- Department of Biology, University of York, York, United Kingdom
|
17
|
Corty RW, Valdar W. QTL Mapping on a Background of Variance Heterogeneity. G3 (Bethesda, Md.) 2018; 8:3767-3782. [PMID: 30389794 DOI: 10.1101/276980] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/28/2023]
Abstract
Standard QTL mapping procedures seek to identify genetic loci affecting the phenotypic mean while assuming that all individuals have the same residual variance. But when the residual variance differs systematically between groups, perhaps due to a genetic or environmental factor, such standard procedures can falter: in testing for QTL associations, they attribute too much weight to observations that are noisy and too little to those that are precise, resulting in reduced power and increased susceptibility to false positives. The negative effects of such "background variance heterogeneity" (BVH) on standard QTL mapping have received little attention until now, although the subject is closely related to work on the detection of variance-controlling genes. Here we use simulation to examine how BVH affects power and false positive rate for detecting QTL affecting the mean (mQTL), the variance (vQTL), or both (mvQTL). We compare linear regression for mQTL and Levene's test for vQTL, with tests more recently developed, including tests based on the double generalized linear model (DGLM), which can model BVH explicitly. We show that, when used in conjunction with a suitable permutation procedure, the DGLM-based tests accurately control false positive rate and are more powerful than the other tests. We also find that some adverse effects of BVH can be mitigated by applying a rank inverse normal transform. We apply our novel approach, which we term "mean-variance QTL mapping", to publicly available data on a mouse backcross and, after accommodating BVH driven by sire, detect a new mQTL for bodyweight.
Affiliation(s)
- Robert W Corty
- Department of Genetics
- Bioinformatics and Computational Biology Curriculum
| | - William Valdar
- Department of Genetics
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC
|
18
|
Corty RW, Valdar W. QTL Mapping on a Background of Variance Heterogeneity. G3 (Bethesda, Md.) 2018; 8:3767-3782. [PMID: 30389794 PMCID: PMC6288843 DOI: 10.1534/g3.118.200790] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/27/2018] [Accepted: 10/28/2018] [Indexed: 12/21/2022]
Abstract
Standard QTL mapping procedures seek to identify genetic loci affecting the phenotypic mean while assuming that all individuals have the same residual variance. But when the residual variance differs systematically between groups, perhaps due to a genetic or environmental factor, such standard procedures can falter: in testing for QTL associations, they attribute too much weight to observations that are noisy and too little to those that are precise, resulting in reduced power and increased susceptibility to false positives. The negative effects of such "background variance heterogeneity" (BVH) on standard QTL mapping have received little attention until now, although the subject is closely related to work on the detection of variance-controlling genes. Here we use simulation to examine how BVH affects power and false positive rate for detecting QTL affecting the mean (mQTL), the variance (vQTL), or both (mvQTL). We compare linear regression for mQTL and Levene's test for vQTL, with tests more recently developed, including tests based on the double generalized linear model (DGLM), which can model BVH explicitly. We show that, when used in conjunction with a suitable permutation procedure, the DGLM-based tests accurately control false positive rate and are more powerful than the other tests. We also find that some adverse effects of BVH can be mitigated by applying a rank inverse normal transform. We apply our novel approach, which we term "mean-variance QTL mapping", to publicly available data on a mouse backcross and, after accommodating BVH driven by sire, detect a new mQTL for bodyweight.
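The rank inverse normal transform mentioned in this abstract can be sketched as follows. This is a generic version using only the Python standard library; the function name and the Blom offset c = 0.375 are assumptions for illustration, and the abstract does not state which variant the authors used.

```python
from statistics import NormalDist


def rank_inverse_normal(values, c=0.375):
    """Rank-based inverse normal transform with average ranks for ties.

    c = 0.375 is the Blom offset (an assumption here; other offsets are in use).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Extend j to cover the whole group of tied values starting at position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    standard_normal = NormalDist()
    # Map each rank to a quantile strictly inside (0, 1), then to a normal score.
    return [standard_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


# Hypothetical phenotype values with one tie.
scores = rank_inverse_normal([12.1, 3.4, 7.7, 7.7, 250.0])
```

The transform replaces each phenotype by a normal score that depends only on its rank, so a single extreme value such as 250.0 above no longer dominates a downstream regression.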
Affiliation(s)
- Robert W Corty
- Department of Genetics
- Bioinformatics and Computational Biology Curriculum
| | - William Valdar
- Department of Genetics
- Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC
|
19
|
Zhang T, Sun L. Beyond the traditional simulation design for evaluating type 1 error control: From the "theoretical" null to "empirical" null. Genet Epidemiol 2018; 43:166-179. [PMID: 30478944 PMCID: PMC6518945 DOI: 10.1002/gepi.22172] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/04/2018] [Revised: 09/10/2018] [Accepted: 09/21/2018] [Indexed: 01/25/2023]
Abstract
When evaluating a newly developed statistical test, an important step is to check its type 1 error (T1E) control using simulations. This is often achieved by the standard simulation design S0 under the so-called "theoretical" null of no association. In practice, whole-genome association analyses scan through a large number of genetic markers ($G$s) for the ones associated with an outcome of interest ($Y$), where $Y$ comes from an alternative while the majority of $G$s are not associated with $Y$; the $Y$-$G$ relationships are under the "empirical" null. This reality can be better represented by two other simulation designs, where design S1.1 simulates $Y$ from an alternative model based on $G$, then evaluates its association with independently generated $G_{\text{new}}$; while design S1.2 evaluates the association between permuted $Y$ and $G$. More than a decade ago, Efron (2004) noted the important distinction between the "theoretical" and "empirical" null in false discovery rate control. Using scale tests for variance heterogeneity, direct univariate, and multivariate interaction tests as examples, here we show that not all null simulation designs are equal. In examining the accuracy of a likelihood ratio test, while simulation design S0 suggested that the method was accurate, designs S1.1 and S1.2 revealed an increased empirical T1E rate when the test is applied in a real-data setting. The inflation becomes more severe at the tail and does not diminish as sample size increases. This is an important observation that calls for new practices for methods evaluation and T1E control interpretation.
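The three null designs named in this abstract can be sketched as data generators. The sketch below assumes a simple additive mean effect of the genotype on the outcome with standard normal noise; the abstract's own examples concern variance-heterogeneity and interaction tests, so the trait model, the function names, and all parameter values here are illustrative assumptions only.

```python
import random


def simulate_genotypes(n, maf, rng):
    """Draw additive genotype codes (0, 1, 2) under Hardy-Weinberg proportions."""
    return [(rng.random() < maf) + (rng.random() < maf) for _ in range(n)]


def simulate_trait(genotypes, beta, rng):
    """Assumed trait model: additive mean effect beta plus standard normal noise."""
    return [beta * g + rng.gauss(0.0, 1.0) for g in genotypes]


def design_s0(n, maf, rng):
    """'Theoretical' null: Y is generated with no genetic effect at all."""
    g = simulate_genotypes(n, maf, rng)
    return simulate_trait(g, 0.0, rng), g


def design_s1_1(n, maf, beta, rng):
    """'Empirical' null: Y depends on G but is paired with an independent G_new."""
    g = simulate_genotypes(n, maf, rng)
    y = simulate_trait(g, beta, rng)
    g_new = simulate_genotypes(n, maf, rng)
    return y, g_new


def design_s1_2(n, maf, beta, rng):
    """'Empirical' null: Y depends on G, then Y is permuted before pairing with G."""
    g = simulate_genotypes(n, maf, rng)
    y = simulate_trait(g, beta, rng)
    y_permuted = y[:]
    rng.shuffle(y_permuted)
    return y_permuted, g


rng = random.Random(2018)  # fixed seed so the sketch is reproducible
y0, g0 = design_s0(500, 0.3, rng)
y1, g1 = design_s1_1(500, 0.3, 0.5, rng)
y2, g2 = design_s1_2(500, 0.3, 0.5, rng)
```

Each returned pair would then be passed to the test under evaluation, and the fraction of replicates with a p-value below the nominal level gives the empirical T1E rate for that design.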
Affiliation(s)
- Ting Zhang
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
| | - Lei Sun
- Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; Department of Statistical Sciences, Faculty of Arts and Science, University of Toronto, Toronto, Ontario, Canada
|
20
|
Kemppainen P, Husby A. Accounting for heteroscedasticity and censoring in chromosome partitioning analyses. Evol Lett 2018; 2:599-609. [PMID: 30564443 PMCID: PMC6292708 DOI: 10.1002/evl3.88] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/06/2018] [Revised: 10/07/2018] [Accepted: 10/10/2018] [Indexed: 01/02/2023] Open
Abstract
A fundamental assumption in quantitative genetics is that traits are controlled by many loci of small effect. Using genomic data, this assumption can be tested using chromosome partitioning analyses, where the proportion of genetic variance for a trait explained by each chromosome ($h^2_c$) is regressed on its size. However, as $h^2_c$ estimates are necessarily positive (censoring) and the variance increases with chromosome size (heteroscedasticity), two fundamental assumptions of ordinary least squares (OLS) regression are violated. Using simulated and empirical data we demonstrate that these violations lead to incorrect inference of genetic architecture. The degree of bias depends mainly on the number of chromosomes and their size distribution and is therefore specific to the species; using published data across many different species we estimate that not accounting for this effect overall resulted in 28% false positives. We introduce a new and computationally efficient resampling method that corrects for inflation caused by heteroscedasticity and censoring and that works under a large range of dataset sizes and genetic architectures in empirical datasets. Our new method substantially improves the robustness of inferences from chromosome partitioning analyses.
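For orientation, the naive chromosome partitioning regression that this abstract argues against can be sketched as a plain OLS fit of per-chromosome heritability on chromosome size. This shows only the baseline analysis, not the authors' resampling correction; the function name and the toy numbers are hypothetical.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical chromosome sizes (Mb) and raw per-chromosome variance estimates.
chromosome_size = [195.0, 150.0, 110.0, 75.0, 40.0, 20.0]
raw_estimate = [0.09, 0.05, 0.06, 0.01, -0.01, 0.02]

# Estimates constrained to be non-negative, which is the censoring the abstract refers to.
censored_estimate = [max(0.0, h) for h in raw_estimate]

slope, intercept = ols_slope_intercept(chromosome_size, censored_estimate)
```

A positive slope is conventionally read as support for a polygenic architecture; the abstract's point is that censoring and size-dependent variance can produce such a slope spuriously, so the OLS result should not be taken at face value.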
Affiliation(s)
- Petri Kemppainen
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, 00014 Helsinki, Finland
| | - Arild Husby
- Organismal and Evolutionary Biology Research Programme, University of Helsinki, 00014 Helsinki, Finland; Department of Ecology and Genetics, Uppsala University, 75236 Uppsala, Sweden
|
21
|
Ganjgahi H, Winkler AM, Glahn DC, Blangero J, Donohue B, Kochunov P, Nichols TE. Fast and powerful genome wide association of dense genetic data with high dimensional imaging phenotypes. Nat Commun 2018; 9:3254. [PMID: 30108209 PMCID: PMC6092439 DOI: 10.1038/s41467-018-05444-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 09/09/2017] [Accepted: 07/09/2018] [Indexed: 01/05/2023] Open
Abstract
Genome wide association (GWA) analysis of brain imaging phenotypes can advance our understanding of the genetic basis of normal and disorder-related variation in the brain. GWA approaches typically use linear mixed effect models to account for non-independence amongst subjects due to factors such as family relatedness and population structure. The use of these models with high-dimensional imaging phenotypes presents enormous challenges in terms of computational intensity and the need to account for multiple testing in both the imaging and genetic domain. Here we present a method that makes mixed models practical with high-dimensional traits by a combination of a transformation applied to the data and model, and the use of a non-iterative variance component estimator. With such speed enhancements permutation tests are feasible, which allows inference on powerful spatial tests like the cluster size statistic.
Affiliation(s)
- Habib Ganjgahi
- Department of Statistics, University of Oxford, Oxford, UK
- Medical Research Council Harwell Institute, Harwell, UK
| | - Anderson M Winkler
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK
- Big Data Analytics Group, Hospital Israelita Albert Einstein, São Paulo, SP, Brazil
| | - David C Glahn
- Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA
| | - John Blangero
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, Brownsville, TX, USA
| | - Brian Donohue
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Peter Kochunov
- Maryland Psychiatric Research Center, Department of Psychiatry, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Thomas E Nichols
- Wellcome Centre for Integrative Neuroimaging, University of Oxford, Oxford, UK.
- Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, UK.
- Department of Statistics, University of Warwick, Coventry, UK.
|
22
|
Bonnet A, Lévy‐Leduc C, Gassiat E, Toro R, Bourgeron T. Improving heritability estimation by a variable selection approach in sparse high dimensional linear mixed models. J R Stat Soc Ser C Appl Stat 2018. [DOI: 10.1111/rssc.12261] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Affiliation(s)
- Anna Bonnet
- AgroParisTech and Université Paris‐Saclay, Paris, France
|
23
|
Noble LM, Chelo I, Guzella T, Afonso B, Riccardi DD, Ammerman P, Dayarian A, Carvalho S, Crist A, Pino-Querido A, Shraiman B, Rockman MV, Teotónio H. Polygenicity and Epistasis Underlie Fitness-Proximal Traits in the Caenorhabditis elegans Multiparental Experimental Evolution (CeMEE) Panel. Genetics 2017; 207:1663-1685. [PMID: 29066469 PMCID: PMC5714472 DOI: 10.1534/genetics.117.300406] [Citation(s) in RCA: 44] [Impact Index Per Article: 6.3] [Received: 04/04/2017] [Accepted: 10/10/2017] [Indexed: 01/27/2023] Open
Abstract
Understanding the genetic basis of complex traits remains a major challenge in biology. Polygenicity, phenotypic plasticity, and epistasis contribute to phenotypic variance in ways that are rarely clear. This uncertainty can be problematic for estimating heritability, for predicting individual phenotypes from genomic data, and for parameterizing models of phenotypic evolution. Here, we report an advanced recombinant inbred line (RIL) quantitative trait locus mapping panel for the hermaphroditic nematode Caenorhabditis elegans, the C. elegans multiparental experimental evolution (CeMEE) panel. The CeMEE panel, comprising 507 RILs at present, was created by hybridization of 16 wild isolates, experimental evolution for 140-190 generations, and inbreeding by selfing for 13-16 generations. The panel contains 22% of single-nucleotide polymorphisms known to segregate in natural populations, and complements existing C. elegans mapping resources by providing fine resolution and high nucleotide diversity across > 95% of the genome. We apply it to study the genetic basis of two fitness components, fertility and hermaphrodite body size at time of reproduction, with high broad-sense heritability in the CeMEE. While simulations show that we should detect common alleles with additive effects as small as 5%, at gene-level resolution, the genetic architectures of these traits do not feature such alleles. We instead find that a significant fraction of trait variance, approaching 40% for fertility, can be explained by sign epistasis with main effects below the detection limit. In congruence, phenotype prediction from genomic similarity, while generally poor ([Formula: see text]), requires modeling epistasis for optimal accuracy, with most variance attributed to the rapidly evolving chromosome arms.
Affiliation(s)
- Luke M Noble
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Ivo Chelo
- Instituto Gulbenkian de Ciência, P-2781-901 Oeiras, Portugal
| | - Thiago Guzella
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
| | - Bruno Afonso
- Instituto Gulbenkian de Ciência, P-2781-901 Oeiras, Portugal
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
| | - David D Riccardi
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Patrick Ammerman
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Adel Dayarian
- Kavli Institute for Theoretical Physics, University of California, Santa Barbara, California 93106
| | - Sara Carvalho
- Instituto Gulbenkian de Ciência, P-2781-901 Oeiras, Portugal
| | - Anna Crist
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
| | - Boris Shraiman
- Kavli Institute for Theoretical Physics, University of California, Santa Barbara, California 93106
- Department of Physics, University of California, Santa Barbara, California 93106
| | - Matthew V Rockman
- Center for Genomics and Systems Biology, Department of Biology, New York University, New York 10003
| | - Henrique Teotónio
- Institut de Biologie, École Normale Supérieure, Centre National de la Recherche Scientifique (CNRS) UMR 8197, Institut National de la Santé et de la Recherche Médicale (INSERM) U1024, F-75005 Paris, France
|
24
|
Sun S, Hood M, Scott L, Peng Q, Mukherjee S, Tung J, Zhou X. Differential expression analysis for RNAseq using Poisson mixed models. Nucleic Acids Res 2017; 45:e106. [PMID: 28369632 PMCID: PMC5499851 DOI: 10.1093/nar/gkx204] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 07/26/2016] [Revised: 03/02/2017] [Accepted: 03/17/2017] [Indexed: 12/13/2022] Open
Abstract
Identifying differentially expressed (DE) genes from RNA sequencing (RNAseq) studies is among the most common analyses in genomics. However, RNAseq DE analysis presents several statistical and computational challenges, including over-dispersed read counts and, in some settings, sample non-independence. Previous count-based methods rely on simple hierarchical Poisson models (e.g. negative binomial) to model independent over-dispersion, but do not account for sample non-independence due to relatedness, population structure and/or hidden confounders. Here, we present a Poisson mixed model with two random effects terms that account for both independent over-dispersion and sample non-independence. We also develop a scalable sampling-based inference algorithm using a latent variable representation of the Poisson distribution. With simulations, we show that our method properly controls for type I error and is generally more powerful than other widely used approaches, except in small samples (n < 15) with other unfavorable properties (e.g. small effect sizes). We also apply our method to three real datasets that contain related individuals, population stratification or hidden confounders. Our results show that our method increases power in all three datasets compared to other approaches, though the power gain is smallest in the smallest sample (n = 6). Our method is implemented in MACAU, freely available at www.xzlab.org/software.html.
Affiliation(s)
- Shiquan Sun
  - Systems Engineering Institute, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P.R. China
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Michelle Hood
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
- Laura Scott
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
- Qinke Peng
  - Systems Engineering Institute, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, P.R. China
- Sayan Mukherjee
  - Departments of Statistical Science, Mathematics, and Computer Science, Duke University, Durham, NC 27708, USA
- Jenny Tung
  - Departments of Evolutionary Anthropology and Biology, Duke University, Durham, NC 27708, USA
  - Duke University Population Research Institute, Duke University, Durham, NC 27708, USA
- Xiang Zhou
  - Department of Biostatistics, University of Michigan, Ann Arbor, MI 48109, USA
  - Center for Statistical Genetics, University of Michigan, Ann Arbor, MI 48109, USA
|
25
|
Dennis J, Medina-Rivera A, Truong V, Antounians L, Zwingerman N, Carrasco G, Strug L, Wells P, Trégouët DA, Morange PE, Wilson MD, Gagnon F. Leveraging cell type specific regulatory regions to detect SNPs associated with tissue factor pathway inhibitor plasma levels. Genet Epidemiol 2017; 41:455-466. [PMID: 28421636 DOI: 10.1002/gepi.22049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/19/2016] [Revised: 03/07/2017] [Accepted: 03/14/2017] [Indexed: 11/10/2022]
Abstract
Tissue factor pathway inhibitor (TFPI) regulates the formation of intravascular blood clots, which manifest clinically as ischemic heart disease, ischemic stroke, and venous thromboembolism (VTE). TFPI plasma levels are heritable, but the genetics underlying TFPI plasma level variability are poorly understood. Herein we report the first genome-wide association scan (GWAS) of TFPI plasma levels, conducted in 251 individuals from five extended French-Canadian families ascertained on VTE. To improve discovery, we also applied a hypothesis-driven (HD) GWAS approach that prioritized single nucleotide polymorphisms (SNPs) in (1) hemostasis pathway genes, and (2) vascular endothelial cell (EC) regulatory regions, which are among the highest expressers of TFPI. Our GWAS identified 131 SNPs with suggestive evidence of association (P-value < 5 × 10⁻⁸), but no SNPs reached the genome-wide threshold for statistical significance. Hemostasis pathway genes were not enriched for TFPI plasma level associated SNPs (global hypothesis test P-value = 0.147), but EC regulatory regions contained more TFPI plasma level associated SNPs than expected by chance (global hypothesis test P-value = 0.046). We therefore stratified our genome-wide SNPs, prioritizing those in EC regulatory regions via stratified false discovery rate (sFDR) control, and reranked the SNPs by q-value. The minimum q-value was 0.27, and the top-ranked SNPs did not show association evidence in the MARTHA replication sample of 1,033 unrelated VTE cases. Although this study did not result in new loci for TFPI, our work lays out a strategy to utilize epigenomic data in prioritization schemes for future GWAS.
Affiliation(s)
- Jessica Dennis
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Alejandra Medina-Rivera
  - Program in Genetics and Genome Biology, the Hospital for Sick Children, Toronto, Canada
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
- Vinh Truong
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Lina Antounians
  - Program in Genetics and Genome Biology, the Hospital for Sick Children, Toronto, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
- Nora Zwingerman
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Giovana Carrasco
  - Laboratorio Internacional de Investigación sobre el Genoma Humano, Universidad Nacional Autónoma de México, Juriquilla, Querétaro, México
- Lisa Strug
  - Program in Genetics and Genome Biology, the Hospital for Sick Children, Toronto, Canada
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
- Phil Wells
  - Ottawa Hospital Research Institute, Ottawa, Canada
- David-Alexandre Trégouët
  - Sorbonne Universités, UPMC Univ Paris 06, Paris, France
  - INSERM, UMR_S 1166, Paris, France
  - ICAN Institute for Cardiometabolism and Nutrition, Paris, France
- Pierre-Emmanuel Morange
  - INSERM, UMR_S 1062, Marseille, France
  - Inra, UMR_INRA 1260, Marseille, France
  - Aix Marseille Université, Marseille, France
- Michael D Wilson
  - Program in Genetics and Genome Biology, the Hospital for Sick Children, Toronto, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, Canada
  - Heart & Stroke Richard Lewar Centre of Excellence in Cardiovascular Research, Toronto, Canada
- France Gagnon
  - Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
|
26
|
Lutz SM, Fingerlin TE, Hokanson JE, Lange C. A general approach to testing for pleiotropy with rare and common variants. Genet Epidemiol 2017; 41:163-170. [PMID: 27900789 PMCID: PMC5472207 DOI: 10.1002/gepi.22011] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 01/19/2016] [Revised: 08/01/2016] [Accepted: 09/19/2016] [Indexed: 12/22/2022]
Abstract
Through genome-wide association studies, numerous genes have been shown to be associated with multiple phenotypes. To determine the overlap of genetic susceptibility of correlated phenotypes, one can apply multivariate regression or dimension reduction techniques, such as principal components analysis, and test for the association with the principal components of the phenotypes rather than the individual phenotypes. However, as these approaches test whether there is a genetic effect for at least one of the phenotypes, a significant test result does not necessarily imply pleiotropy. Recently, a method called Pleiotropy Estimation and Test Bootstrap (PET-B) has been proposed to specifically test for pleiotropy (i.e., that two normally distributed phenotypes are both associated with the single nucleotide polymorphism of interest). Although the method examines the genetic overlap between the two quantitative phenotypes, the extension to binary phenotypes, three or more phenotypes, and rare variants is not straightforward. We provide two approaches to formally test this pleiotropic relationship in multiple scenarios. These approaches depend on permuting the phenotypes of interest and comparing the set of observed P-values to the set of permuted P-values in relation to the origin (e.g., a vector of zeros) either using the Hausdorff metric or a cutoff-based approach. These approaches are appropriate for categorical and quantitative phenotypes, more than two phenotypes, common variants and rare variants. We evaluate these approaches under various simulation scenarios and apply them to the COPDGene study, a case-control study of chronic obstructive pulmonary disease in current and former smokers.
Affiliation(s)
- Sharon M Lutz
  - Department of Biostatistics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
- Tasha E Fingerlin
  - Department of Biostatistics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
  - Center for Genes, Environment, and Health, National Jewish Health, Denver, CO, USA
- John E Hokanson
  - Department of Epidemiology, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA
- Christoph Lange
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
|
27
|
Soave D, Sun L. A generalized Levene's scale test for variance heterogeneity in the presence of sample correlation and group uncertainty. Biometrics 2017; 73:960-971. [DOI: 10.1111/biom.12651] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 06/01/2016] [Revised: 12/01/2016] [Accepted: 12/01/2016] [Indexed: 10/20/2022]
Affiliation(s)
- David Soave
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto; Toronto, Ontario M5T 3M7 Canada
  - Program in Genetics and Genome Biology, Research Institute, The Hospital for Sick Children; Toronto, Ontario M5G 0A4 Canada
- Lei Sun
  - Department of Statistical Sciences, University of Toronto; Toronto, Ontario M5S 3G3 Canada
  - Division of Biostatistics, Dalla Lana School of Public Health, University of Toronto; Toronto, Ontario M5T 3M7 Canada
|
28
|
Parker CC, Gopalakrishnan S, Carbonetto P, Gonzales NM, Leung E, Park YJ, Aryee E, Davis J, Blizard DA, Ackert-Bicknell CL, Lionikas A, Pritchard JK, Palmer AA. Genome-wide association study of behavioral, physiological and gene expression traits in outbred CFW mice. Nat Genet 2016; 48:919-26. [PMID: 27376237 PMCID: PMC4963286 DOI: 10.1038/ng.3609] [Citation(s) in RCA: 70] [Impact Index Per Article: 8.8] [Received: 02/01/2016] [Accepted: 06/08/2016] [Indexed: 12/15/2022]
Abstract
Although mice are the most widely used mammalian model organism, genetic studies have suffered from limited mapping resolution due to extensive linkage disequilibrium (LD) that is characteristic of crosses among inbred strains. Carworth Farms White (CFW) mice are a commercially available outbred mouse population that exhibit rapid LD decay in comparison to other available mouse populations. We performed a genome-wide association study (GWAS) of behavioral, physiological and gene expression phenotypes using 1,200 male CFW mice. We used genotyping by sequencing (GBS) to obtain genotypes at 92,734 SNPs. We also measured gene expression using RNA sequencing in three brain regions. Our study identified numerous behavioral, physiological and expression quantitative trait loci (QTLs). We integrated the behavioral QTL and eQTL results to implicate specific genes, including Azi2 in sensitivity to methamphetamine and Zmynd11 in anxiety-like behavior. The combination of CFW mice, GBS and RNA sequencing constitutes a powerful approach to GWAS in mice.
Affiliation(s)
- Clarissa C. Parker
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Psychology, Middlebury College, Middlebury, VT 05753, USA
  - Program in Neuroscience, Middlebury College, Middlebury, VT 05753, USA
- Shyam Gopalakrishnan
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Museum of Natural History, Copenhagen University, Copenhagen, Denmark
- Peter Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - AncestryDNA, San Francisco, CA 94105, USA
- Emily Leung
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Yeonhee J Park
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Emmanuel Aryee
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- Joe Davis
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
- David A. Blizard
  - Department of Biobehavioral Health, Pennsylvania State University, University Park, PA 16802, USA
- Cheryl L. Ackert-Bicknell
  - Center for Musculoskeletal Research, University of Rochester, Rochester, NY 14624, USA
  - Department of Orthopaedics and Rehabilitation, University of Rochester, Rochester, NY 14624, USA
- Arimantas Lionikas
  - School of Medicine, Medical Sciences and Nutrition, University of Aberdeen, Foresterhill, Aberdeen, Scotland, UK
- Jonathan K. Pritchard
  - Department of Genetics, Stanford University, Palo Alto, CA 94305, USA
  - Department of Biology, Stanford University, Palo Alto, CA 94305, USA
  - Howard Hughes Medical Institute, Stanford University, Palo Alto, CA 94305, USA
- Abraham A. Palmer
  - Department of Human Genetics, University of Chicago, Chicago, IL 60637, USA
  - Department of Psychiatry and Behavioral Neuroscience, University of Chicago, Chicago, IL 60637, USA
  - Department of Psychiatry, University of California San Diego, La Jolla, CA 92103, USA
  - Institute for Genomic Medicine, University of California San Diego, La Jolla, CA 92103, USA
|
29
|
Alonso-Blanco C, Andrade J, Becker C, Bemm F, Bergelson J, Borgwardt KM, Cao J, Chae E, Dezwaan TM, Ding W, Ecker JR, Exposito-Alonso M, Farlow A, Fitz J, Gan X, Grimm DG, Hancock AM, Henz SR, Holm S, Horton M, Jarsulic M, Kerstetter RA, Korte A, Korte P, Lanz C, Lee CR, Meng D, Michael TP, Mott R, Muliyati NW, Nägele T, Nagler M, Nizhynska V, Nordborg M, Novikova PY, Picó FX, Platzer A, Rabanal FA, Rodriguez A, Rowan BA, Salomé PA, Schmid KJ, Schmitz RJ, Seren Ü, Sperone FG, Sudkamp M, Svardal H, Tanzer MM, Todd D, Volchenboum SL, Wang C, Wang G, Wang X, Weckwerth W, Weigel D, Zhou X. 1,135 Genomes Reveal the Global Pattern of Polymorphism in Arabidopsis thaliana. Cell 2016; 166:481-491. [PMID: 27293186 PMCID: PMC4949382 DOI: 10.1016/j.cell.2016.05.063] [Citation(s) in RCA: 736] [Impact Index Per Article: 92.0] [Received: 12/18/2015] [Revised: 04/20/2016] [Accepted: 05/17/2016] [Indexed: 12/30/2022]
Abstract
Arabidopsis thaliana serves as a model organism for the study of fundamental physiological, cellular, and molecular processes. It has also greatly advanced our understanding of intraspecific genome variation. We present a detailed map of variation in 1,135 high-quality re-sequenced natural inbred lines representing the native Eurasian and North African range and recently colonized North America. We identify relict populations that continue to inhabit ancestral habitats, primarily in the Iberian Peninsula. They have mixed with a lineage that has spread to northern latitudes from an unknown glacial refugium and is now found in a much broader spectrum of habitats. Insights into the history of the species and the fine-scale distribution of genetic diversity provide the basis for full exploitation of A. thaliana natural variation through integration of genomes and epigenomes with molecular and non-molecular phenotypes.
|
30
|
Joo JWJ, Hormozdiari F, Han B, Eskin E. Multiple testing correction in linear mixed models. Genome Biol 2016; 17:62. [PMID: 27039378 PMCID: PMC4818520 DOI: 10.1186/s13059-016-0903-6] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 09/09/2015] [Accepted: 02/17/2016] [Indexed: 08/30/2023] [Open Access]
Abstract
BACKGROUND: Multiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM. RESULTS: We were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach. CONCLUSIONS: We provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data.
Affiliation(s)
- Jong Wha J Joo
  - Bioinformatics IDP, University of California, Los Angeles, CA, USA
- Farhad Hormozdiari
  - Computer Science Department, University of California, Los Angeles, CA, USA
- Buhm Han
  - Department of Convergence Medicine, University of Ulsan College of Medicine & Asan Institute for Life Sciences, Asan Medical Center, Seoul, 138-736, Republic of Korea
- Eleazar Eskin
  - Computer Science Department, University of California, Los Angeles, CA, USA
  - Department of Human Genetics, University of California, Los Angeles, CA, USA
|
31
|
Cusanovich DA, Caliskan M, Billstrand C, Michelini K, Chavarria C, De Leon S, Mitrano A, Lewellyn N, Elias JA, Chupp GL, Lang RM, Shah SJ, Decara JM, Gilad Y, Ober C. Integrated analyses of gene expression and genetic association studies in a founder population. Hum Mol Genet 2016; 25:2104-2112. [PMID: 26931462 PMCID: PMC5062579 DOI: 10.1093/hmg/ddw061] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 08/15/2015] [Accepted: 02/21/2016] [Indexed: 12/17/2022] [Open Access]
Abstract
Genome-wide association studies (GWASs) have become a standard tool for dissecting genetic contributions to disease risk. However, these studies typically require extraordinarily large sample sizes to be adequately powered. Strategies that incorporate functional information alongside genetic associations have proved successful in increasing GWAS power. Following this paradigm, we present the results of 20 different genetic association studies for quantitative traits related to complex diseases, conducted in the Hutterites of South Dakota. To boost the power of these association studies, we collected RNA-sequencing data from lymphoblastoid cell lines for 431 Hutterite individuals. We then used Sherlock, a tool that integrates GWAS and expression quantitative trait locus (eQTL) data, to identify weak GWAS signals that are also supported by eQTL data. Using this approach, we found novel associations with quantitative phenotypes related to cardiovascular disease, including carotid intima-media thickness, left atrial volume index, monocyte count and serum YKL-40 levels.
Affiliation(s)
- Jack A Elias
  - Division of Biology and Medicine, Brown University, Providence, RI 02912, USA
- Geoffrey L Chupp
  - Pulmonary and Critical Care, Yale School of Medicine, New Haven, CT 06519, USA
- Roberto M Lang
  - Department of Medicine, Section of Cardiology, University of Chicago, Chicago, IL 60637, USA
- Sanjiv J Shah
  - Department of Medicine, Section of Cardiology, University of Chicago, Chicago, IL 60637, USA
- Jeanne M Decara
  - Department of Medicine, Section of Cardiology, University of Chicago, Chicago, IL 60637, USA
|
32
|
Abstract
Empirical studies and evolutionary theory support a role for rare variants in the etiology of complex traits. Given this motivation and increasing affordability of whole-exome and whole-genome sequencing, methods for rare variant association have been an active area of research for the past decade. Here, we provide a survey of the current literature and developments from the Genetics Analysis Workshop 19 (GAW19) Collapsing Rare Variants working group. In particular, we present the generalized linear regression framework and associated score statistic for the 2 major types of methods: burden and variance components methods. We further show that by simply modifying weights within these frameworks we arrive at many of the popular existing methods, for example, the cohort allelic sums test and sequence kernel association test. Meta-analysis techniques are also described. Next, we describe the 6 contributions from the GAW19 Collapsing Rare Variants working group. These included development of new methods, such as a retrospective likelihood for family data, a method using genomic structure to compare cases and controls, a haplotype-based meta-analysis, and a permutation-based method for combining different statistical tests. In addition, one contribution compared a mega-analysis of family-based and population-based data to meta-analysis. Finally, the power of existing family-based methods for binary traits was compared. We conclude with suggestions for open research questions.
Affiliation(s)
- Stephanie A Santorico
  - Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA
- Audrey E Hendricks
  - Department of Mathematical and Statistical Sciences, University of Colorado Denver, Denver, CO, 80217-3364, USA
|
33
|
Sittig LJ, Carbonetto P, Engel KA, Krauss KS, Palmer AA. Integration of genome-wide association and extant brain expression QTL identifies candidate genes influencing prepulse inhibition in inbred F1 mice. Genes Brain Behav 2016; 15:260-70. [PMID: 26482417 DOI: 10.1111/gbb.12262] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 08/21/2015] [Revised: 10/13/2015] [Accepted: 10/15/2015] [Indexed: 12/12/2022]
Abstract
Genetic association mapping in structured populations of model organisms can offer a fruitful complement to human genetic studies by generating new biological hypotheses about complex traits. Here we investigated prepulse inhibition (PPI), a measure of sensorimotor gating that is disrupted in a number of psychiatric disorders. To identify genes that influence PPI, we constructed a panel of half-sibs by crossing 30 females from common inbred mouse strains with inbred C57BL/6J males to create male and female F1 offspring. We used publicly available single nucleotide polymorphism (SNP) genotype data from these inbred strains to perform a genome-wide association scan using a dense panel of over 150,000 SNPs in a combined sample of 604 mice representing 30 distinct F1 genotypes. We identified two independent PPI-associated loci on Chromosomes 2 and 7, each of which explained 12-14% of the variance in PPI. Searches of available databases did not identify any plausible causative coding polymorphisms within these loci. However, previously collected expression quantitative trait locus (eQTL) data from hippocampus and striatum indicated that the SNPs on Chromosomes 2 and 7 that showed the strongest association with PPI were also strongly associated with expression of several transcripts, some of which have been implicated in human psychiatric disorders. This integrative approach successfully identified a focused set of genes which can be prioritized for follow-up studies. More broadly, our results show that F1 crosses among common inbred strains can be used in combination with other informatics and expression datasets to identify candidate genes for complex behavioral traits.
Affiliation(s)
- L J Sittig
  - Department of Human Genetics, University of Chicago, Chicago, IL
- P Carbonetto
  - Department of Human Genetics, University of Chicago, Chicago, IL
- K A Engel
  - Department of Human Genetics, University of Chicago, Chicago, IL
- K S Krauss
  - Department of Human Genetics, University of Chicago, Chicago, IL
- A A Palmer
  - Department of Human Genetics, University of Chicago, Chicago, IL
  - Department of Psychiatry, University of California San Diego, San Diego, CA, USA
|
34
|
Mapping of Craniofacial Traits in Outbred Mice Identifies Major Developmental Genes Involved in Shape Determination. PLoS Genet 2015; 11:e1005607. [PMID: 26523602 PMCID: PMC4629907 DOI: 10.1371/journal.pgen.1005607] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 07/10/2015] [Accepted: 09/24/2015] [Indexed: 02/05/2023] [Open Access]
Abstract
The vertebrate cranium is a prime example of the high evolvability of complex traits. While evidence of genes and developmental pathways underlying craniofacial shape determination is accumulating, we are still far from understanding how such variation at the genetic level is translated into craniofacial shape variation. Here we used 3D geometric morphometrics to map genes involved in shape determination in a population of outbred mice (Carworth Farms White, or CFW). We defined shape traits via principal component analysis of 3D skull and mandible measurements. We mapped genetic loci associated with shape traits at ~80,000 candidate single nucleotide polymorphisms in ~700 male mice. We found that craniofacial shape and size are highly heritable, polygenic traits. Despite the polygenic nature of the traits, we identified 17 loci that explain variation in skull shape, and 8 loci associated with variation in mandible shape. Together, the associated variants account for 11.4% of skull and 4.4% of mandible shape variation; however, the total additive genetic variance associated with phenotypic variation was estimated at ~45%. Candidate genes within the associated loci have known roles in craniofacial development; this includes 6 transcription factors and several regulators of bone developmental pathways. One gene, Mn1, has an unusually large effect on shape variation in our study. A knockout of this gene was previously shown to negatively affect the development of membranous bones of the cranial skeleton, and evolutionary analysis shows that the gene has arisen at the base of the bony vertebrates (Euteleostomi), where the ossified head first appeared. Therefore, Mn1 emerges as a key gene for both skull formation and within-population shape variation. Our study shows that it is possible to identify important developmental genes through genome-wide mapping of high-dimensional shape features in an outbred population.
Formation of the face, mandible, and skull is determined in part by genetic factors, but the relationship between genetic variation and craniofacial development is not well understood. We demonstrate how recent advances in mouse genomics and statistical methods can be used to identify genes involved in craniofacial development. We use outbred mice together with a dense panel of genetic markers to identify genetic loci affecting craniofacial shape. Some of the loci we identify are also known from past studies to contribute to craniofacial development and bone formation. For example, the top candidate gene identified in this study, Mn1, is a gene that appeared at a time when animals started to form bony skulls, suggesting that it may be a key gene in this evolutionary innovation. This further suggests that Mn1 and other genes involved in head formation are also responsible for more fine-grained regulation of its shape. Our results confirm that the outbred mouse population used in this study is suitable to identify single genetic factors even under conditions where many genes cooperate to generate a complex phenotype.
|