126
|
Abstract
Genome-wide variation data with millions of genetic markers have become commonplace. However, the potential for interpretation and application of these data for clinical assessment of outcomes of interest, and prediction of disease risk, is currently not fully realized. Many common complex diseases now have numerous, well-established risk loci and likely harbor many genetic determinants with effects too small to be detected at genome-wide levels of statistical significance. A simple and intuitive approach for converting genetic data to a predictive measure of disease susceptibility is to aggregate the effects of these loci into a single measure, the genetic risk score. Here, we describe some common methods and software packages for calculating genetic risk scores and polygenic risk scores, with focus on studies of common complex diseases. We review the basic information needed, as well as important considerations for constructing genetic risk scores, including specific requirements for phenotypic and genetic data, and limitations in their application. © 2019 by John Wiley & Sons, Inc.
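The aggregation step this abstract describes can be illustrated with a minimal sketch. It assumes an additive model in which each locus contributes its risk-allele dosage (0, 1 or 2) multiplied by a per-locus weight such as a log odds ratio. The function name, locus identifiers and weights below are hypothetical and are not taken from the article.

```python
def genetic_risk_score(dosages, weights=None):
    """Sum risk-allele dosages across loci, optionally weighted by effect size.

    dosages: dict mapping a locus ID to the number of risk alleles (0, 1 or 2).
    weights: dict mapping a locus ID to an effect size (e.g. a log odds ratio).
             If omitted, every locus gets weight 1.0 (a simple allele count).
    """
    if weights is None:
        weights = {locus: 1.0 for locus in dosages}
    missing = set(dosages) - set(weights)
    if missing:
        raise KeyError(f"no weight for loci: {sorted(missing)}")
    return sum(weights[locus] * dose for locus, dose in dosages.items())


# Hypothetical individual and hypothetical effect sizes, for illustration only.
person = {"rs_a": 2, "rs_b": 0, "rs_c": 1}
log_odds = {"rs_a": 0.10, "rs_b": 0.25, "rs_c": 0.05}

unweighted = genetic_risk_score(person)          # counts risk alleles
weighted = genetic_risk_score(person, log_odds)  # weights alleles by effect size
```

The unweighted form treats all loci as equally informative; the weighted form lets loci with larger estimated effects contribute more to the score.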
|
127
|
Karavani E, Zuk O, Zeevi D, Barzilai N, Stefanis NC, Hatzimanolis A, Smyrnis N, Avramopoulos D, Kruglyak L, Atzmon G, Lam M, Lencz T, Carmi S. Screening Human Embryos for Polygenic Traits Has Limited Utility. Cell 2019; 179:1424-1435.e8. [PMID: 31761530 PMCID: PMC6957074 DOI: 10.1016/j.cell.2019.10.033] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 05/21/2019] [Revised: 09/11/2019] [Accepted: 10/25/2019] [Indexed: 12/19/2022]
Abstract
The increasing proportion of variance in human complex traits explained by polygenic scores, along with progress in preimplantation genetic diagnosis, suggests the possibility of screening embryos for traits such as height or cognitive ability. However, the expected outcomes of embryo screening are unclear, which undermines discussion of associated ethical concerns. Here, we use theory, simulations, and real data to evaluate the potential gain of embryo screening, defined as the difference in trait value between the top-scoring embryo and the average embryo. The gain increases very slowly with the number of embryos but more rapidly with the variance explained by the score. Given current technology, the average gain due to screening would be ≈2.5 cm for height and ≈2.5 IQ points for cognitive ability. These mean values are accompanied by wide prediction intervals, and indeed, in large nuclear families, the majority of children top-scoring for height are not the tallest.
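The qualitative behaviour of the gain (slow growth with the number of embryos, faster growth with variance explained) can be explored with a small Monte Carlo sketch. It rests on a simplifying assumption that is not stated in the abstract: sibling embryo scores are drawn independently from a normal distribution whose variance is half the population variance of the score, which is the usual within-family share under random mating. The function name and parameter values are hypothetical, and the numbers it produces are not the article's estimates.

```python
import random
import statistics


def mean_gain(n_embryos, r2, trait_sd, n_families=5000, seed=0):
    """Average (top score - mean score) across simulated families.

    r2: fraction of trait variance explained by the polygenic score.
    trait_sd: population standard deviation of the trait (e.g. in cm).
    Assumes within-family score variance = r2 * trait_sd**2 / 2.
    """
    rng = random.Random(seed)
    within_sd = trait_sd * (r2 / 2) ** 0.5
    gains = []
    for _ in range(n_families):
        scores = [rng.gauss(0.0, within_sd) for _ in range(n_embryos)]
        gains.append(max(scores) - statistics.fmean(scores))
    return statistics.fmean(gains)


# Compare the effect of more embryos against the effect of a better score.
gain_5 = mean_gain(n_embryos=5, r2=0.25, trait_sd=6.0)
gain_10 = mean_gain(n_embryos=10, r2=0.25, trait_sd=6.0)
gain_5_better_score = mean_gain(n_embryos=5, r2=0.50, trait_sd=6.0)
```

Under these assumptions the gain scales with the square root of the variance explained, while doubling the number of embryos only moves the expected maximum of the sample slightly.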
|
128
|
Horreo JL, Suarez T, Fitze PS. Reversals in complex traits uncovered as reticulation events: Lessons from the evolution of parity-mode, chromosome morphology, and maternal resource transfer. Journal of Experimental Zoology Part B-Molecular and Developmental Evolution 2019; 334:5-13. [PMID: 31650690 DOI: 10.1002/jez.b.22912] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/15/2018] [Revised: 05/18/2019] [Accepted: 10/02/2019] [Indexed: 11/08/2022]
Abstract
Complex traits include, among many others, the evolution of eyes, wings, body forms, reproductive modes, human intelligence, social behavior, diseases, and chromosome morphology. Dollo's law states that the evolution of complex traits is irreversible. However, potential exceptions have been proposed. Here, we investigated whether reticulation, a simple and elegant means by which complex characters may be reacquired, could account for suggested reversals in the evolution of complex characters using two datasets with sufficient genetic coverage and a total of five potential reversals. Our analyses uncovered a potential reversal in the evolution of parity mode and a potential reversal in the evolution of placentotrophy of fish (Cyprinodontiformes) as reticulation events. Moreover, in a reptile that exhibits a potential reversal from viviparity to oviparity (Zootoca vivipara), reticulation provided the most parsimonious explanation for sex chromosome evolution. Therefore, three of the five studied potential reversals were unraveled as reticulation events. This constitutes the first evidence that accounting for reticulation can fundamentally influence the interpretation of the evolution of complex traits, that testing for reticulation is crucial for obtaining robust phylogenies, and that complex ancestral characters may be reacquired through hybridization with a lineage that still exhibits the trait. Hybridization, rather than reappearance of ancestral traits by means of small evolutionary steps, may thus account for suggested exceptions to Dollo's law. Consequently, ruling out reticulation is required to claim the evolutionary reversal of complex characters and potential exceptions to Dollo's rule.
|
129
|
Bloom JS, Boocock J, Treusch S, Sadhu MJ, Day L, Oates-Barker H, Kruglyak L. Rare variants contribute disproportionately to quantitative trait variation in yeast. eLife 2019; 8:49212. [PMID: 31647408 PMCID: PMC6892613 DOI: 10.7554/elife.49212] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 06/11/2019] [Accepted: 10/23/2019] [Indexed: 11/24/2022] Open
Abstract
How variants with different frequencies contribute to trait variation is a central question in genetics. We use a unique model system to disentangle the contributions of common and rare variants to quantitative traits. We generated ~14,000 progeny from crosses among 16 diverse yeast strains and identified thousands of quantitative trait loci (QTLs) for 38 traits. We combined our results with sequencing data for 1011 yeast isolates to show that rare variants make a disproportionate contribution to trait variation. Evolutionary analyses revealed that this contribution is driven by rare variants that arose recently, and that negative selection has shaped the relationship between variant frequency and effect size. We leveraged the structure of the crosses to resolve hundreds of QTLs to single genes. These results refine our understanding of trait variation at the population level and suggest that studies of rare variants are a fertile ground for discovery of genetic effects.
|
130
|
Raudsepp T, Finno CJ, Bellone RR, Petersen JL. Ten years of the horse reference genome: insights into equine biology, domestication and population dynamics in the post-genome era. Anim Genet 2019; 50:569-597. [PMID: 31568563 PMCID: PMC6825885 DOI: 10.1111/age.12857] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Accepted: 08/09/2019] [Indexed: 12/14/2022]
Abstract
The horse reference genome from the Thoroughbred mare Twilight has been available for a decade and, together with advances in genomics technologies, has led to unparalleled developments in equine genomics. At the core of this progress is the continuing improvement of the quality, contiguity and completeness of the reference genome, and its functional annotation. Recent achievements include the release of the next version of the reference genome (EquCab3.0) and generation of a reference sequence for the Y chromosome. Horse satellite‐free centromeres provide unique models for mammalian centromere research. Despite extremely low genetic diversity of the Y chromosome, it has been possible to trace patrilines of breeds and pedigrees and show that Y variation was lost in the past approximately 2300 years owing to selective breeding. The high‐quality reference genome has led to the development of three different SNP arrays and WGSs of almost 2000 modern individual horses. The collection of WGS of hundreds of ancient horses is unique and not available for any other domestic species. These tools and resources have led to global population studies dissecting the natural history of the species and genetic makeup and ancestry of modern breeds. Most importantly, the available tools and resources, together with the discovery of functional elements, are dissecting molecular causes of a growing number of Mendelian and complex traits. The improved understanding of molecular underpinnings of various traits continues to benefit the health and performance of the horse while also serving as a model for complex disease across species.
|
131
|
Harel T, Peshes-Yaloz N, Bacharach E, Gat-Viks I. Predicting Phenotypic Diversity from Molecular and Genetic Data. Genetics 2019; 213:297-311. [PMID: 31352366 PMCID: PMC6727812 DOI: 10.1534/genetics.119.302463] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/20/2018] [Accepted: 07/04/2019] [Indexed: 01/03/2023] Open
Abstract
Despite the importance of complex phenotypes, an in-depth understanding of the combined molecular and genetic effects on a phenotype has yet to be achieved. Here, we introduce InPhenotype, a novel computational approach for complex phenotype prediction, where gene-expression data and genotyping data are integrated to yield quantitative predictions of complex physiological traits. Unlike existing computational methods, InPhenotype makes it possible to model potential regulatory interactions between gene expression and genomic loci without compromising the continuous nature of the molecular data. We applied InPhenotype to synthetic data, exemplifying its utility for different data parameters, as well as its superiority compared to current methods in both prediction quality and the ability to detect regulatory interactions of genes and genomic loci. Finally, we show that InPhenotype can provide biological insights into both mouse and yeast datasets.
|
132
|
Abu‐Toamih Atamni HJ, Iraqi FA. Efficient protocols and methods for high-throughput utilization of the Collaborative Cross mouse model for dissecting the genetic basis of complex traits. Animal Model Exp Med 2019; 2:137-149. [PMID: 31773089 PMCID: PMC6762040 DOI: 10.1002/ame2.12074] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 05/16/2019] [Accepted: 05/23/2019] [Indexed: 12/25/2022] Open
Abstract
The Collaborative Cross (CC) mouse model is a next-generation mouse genetic reference population (GRP) designed for high-resolution quantitative trait loci (QTL) mapping of complex traits during health and disease. The CC lines were generated from reciprocal crosses of eight divergent mouse founder strains composed of five classical and three wild-derived strains. Complex traits are defined as those controlled by variation within multiple genes and by gene-environment interactions. In this article, we present a variety of protocols and results from studies of the host response to infectious and chronic diseases, including type 2 diabetes and metabolic diseases, body composition, immune response, colorectal cancer, susceptibility to Aspergillus fumigatus, Klebsiella pneumoniae, Pseudomonas aeruginosa, sepsis, and mixed infections of Porphyromonas gingivalis and Fusobacterium nucleatum, which were conducted at our laboratory using the CC mouse population. These traits are observed at multiple levels of the body systems, including metabolism, body weight, immune profile, and susceptibility or resistance to the development and progress of infectious or chronic diseases. Herein, we present full protocols and step-by-step methods, implemented in our laboratory, for the phenotypic and genotypic characterization of the different CC lines and for mapping the genes underlying the host response to these infections and chronic diseases. The CC mouse model is a unique and powerful GRP for dissecting the host genetic architectures underlying complex traits, including chronic and infectious diseases.
|
133
|
Yoshida GM, Lhorente JP, Correa K, Soto J, Salas D, Yáñez JM. Genome-Wide Association Study and Cost-Efficient Genomic Predictions for Growth and Fillet Yield in Nile Tilapia (Oreochromis niloticus). G3 (Bethesda, Md.) 2019; 9:2597-2607. [PMID: 31171566 PMCID: PMC6686944 DOI: 10.1534/g3.119.400116] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Received: 03/09/2019] [Accepted: 06/05/2019] [Indexed: 12/16/2022]
Abstract
Fillet yield (FY) and harvest weight (HW) are economically important traits in Nile tilapia production. Genetic improvement of these traits, especially FY, is lacking, due to the absence of efficient methods to measure the traits without sacrificing fish and the use of information from relatives for selection. However, genomic information could be used by genomic selection to improve traits that are difficult to measure directly in selection candidates, as in the case of FY. The objectives of this study were: (i) to perform genome-wide association studies (GWAS) to dissect the genetic architecture of FY and HW, (ii) to evaluate the accuracy of genotype imputation and (iii) to assess the accuracy of genomic selection using true and imputed low-density (LD) single nucleotide polymorphism (SNP) panels to determine a cost-effective strategy for practical implementation of genomic information in tilapia breeding programs. The data set consisted of 5,866 phenotyped animals and 1,238 genotyped animals (108 parents and 1,130 offspring) using a 50K SNP panel. The GWAS were performed using all genotyped and phenotyped animals. Genotype imputation was performed from LD panels (LD0.5K, LD1K and LD3K) to the high-density (HD) panel, using information from parents and 20% of offspring in the reference set and the remaining 80% in the validation set. In addition, we tested the accuracy of genomic selection using true and imputed genotypes, comparing the accuracy obtained from pedigree-based best linear unbiased prediction (PBLUP) and genomic predictions. The results from GWAS support the polygenic nature of FY and HW. The accuracy of imputation ranged from 0.90 to 0.98 for LD0.5K and LD3K, respectively. The accuracy of genomic prediction outperformed the estimated breeding value from PBLUP. The use of imputation for genomic selection resulted in an increased relative accuracy independent of the trait and LD panel analyzed.
The present results suggest that genotype imputation could be a cost-effective strategy for genomic selection in Nile tilapia breeding programs.
|
134
|
Truong DT, Adams AK, Paniagua S, Frijters JC, Boada R, Hill DE, Lovett MW, Mahone EM, Willcutt EG, Wolf M, Defries JC, Gialluisi A, Francks C, Fisher SE, Olson RK, Pennington BF, Smith SD, Bosson-Heenan J, Gruen JR. Multivariate genome-wide association study of rapid automatised naming and rapid alternating stimulus in Hispanic American and African-American youth. J Med Genet 2019; 56:557-566. [PMID: 30995994 PMCID: PMC6678051 DOI: 10.1136/jmedgenet-2018-105874] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 11/13/2018] [Revised: 03/12/2019] [Accepted: 03/19/2019] [Indexed: 11/23/2022]
Abstract
BACKGROUND Rapid automatised naming (RAN) and rapid alternating stimulus (RAS) are reliable predictors of reading disability. The underlying biology of reading disability is poorly understood. However, the high correlation among RAN, RAS and reading could be attributable to shared genetic factors that contribute to common biological mechanisms. OBJECTIVE To identify shared genetic factors that contribute to RAN and RAS performance using a multivariate approach. METHODS We conducted a multivariate genome-wide association analysis of RAN Objects, RAN Letters and RAS Letters/Numbers in a sample of 1331 Hispanic American and African-American youth. Follow-up neuroimaging genetic analysis of cortical regions associated with reading ability in an independent sample and epigenetic examination of extant data predicting tissue-specific functionality in the brain were also conducted. RESULTS Genome-wide significant effects were observed at rs1555839 (p=4.03×10⁻⁸) and replicated in an independent sample of 318 children of European ancestry. Epigenetic analysis and chromatin state models of the implicated 70 kb region of 10q23.31 support active transcription of the gene RNLS in the brain, which encodes a catecholamine metabolising protein. Chromatin contact maps of adult hippocampal tissue indicate a potential enhancer-promoter interaction regulating RNLS expression. Neuroimaging genetic analysis in an independent, multiethnic sample (n=690) showed that rs1555839 is associated with structural variation in the right inferior parietal lobule. CONCLUSION This study provides support for a novel trait locus at chromosome 10q23.31 and proposes a potential gene-brain-behaviour relationship for targeted future functional analysis to understand underlying biological mechanisms for reading disability.
|
135
|
Deng WQ, Mao S, Kalnapenkis A, Esko T, Mägi R, Paré G, Sun L. Analytical strategies to include the X-chromosome in variance heterogeneity analyses: Evidence for trait-specific polygenic variance structure. Genet Epidemiol 2019; 43:815-830. [PMID: 31332826 DOI: 10.1002/gepi.22247] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/13/2019] [Revised: 06/07/2019] [Accepted: 06/13/2019] [Indexed: 12/12/2022]
Abstract
Genotype-stratified variance of a quantitative trait could differ in the presence of gene-gene or gene-environment interactions. Genetic markers associated with phenotypic variance are thus considered promising candidates for follow-up interaction or joint location-scale analyses. However, as in studies of main effects, the X-chromosome is routinely excluded from "whole-genome" scans due to analytical challenges. Specifically, as males carry only one copy of the X-chromosome, the inherent sex-genotype dependency could bias the trait-genotype association, through sexual dimorphism in quantitative traits with sex-specific means or variances. Here we investigate phenotypic variance heterogeneity associated with X-chromosome single nucleotide polymorphisms (SNPs) and propose valid and powerful strategies. Among those, a generalized Levene's test has adequate power and remains robust to sexual dimorphism. An alternative approach is a sex-stratified analysis but at the cost of slightly reduced power and modeling flexibility. We applied both methods to an Estonian study of gene expression quantitative trait loci (eQTL; n = 841), and two complex trait studies of height, hip, and waist circumferences, and body mass index from Multi-Ethnic Study of Atherosclerosis (MESA; n = 2,073) and UK Biobank (UKB; n = 327,393). Consistent with previous eQTL findings on mean, we found some but no conclusive evidence for cis regulators being enriched for variance association. SNP rs2681646 is associated with variance of waist circumference (p = 9.5E-07) at X-chromosome-wide significance in UKB, with a suggestive female-specific effect in MESA (p = 0.048). Collectively, an enrichment analysis using permutated UKB (p < 0.1) and MESA (p < 0.01) datasets, suggests a possible polygenic structure for the variance of human height.
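For orientation, the idea of testing whether trait variance differs by genotype can be sketched with the classic median-centred Levene statistic (the Brown-Forsythe variant). This is not the generalized test proposed in the article: it ignores sex and the X-chromosome dosage issues the abstract raises, and the genotype groups and trait values below are hypothetical.

```python
import statistics


def brown_forsythe_f(groups):
    """Return the Brown-Forsythe F statistic for equality of variances.

    groups: list of lists, one list of trait values per genotype group.
    Under the null, the statistic is referred to an F distribution with
    (k - 1, N - k) degrees of freedom, where k is the number of groups
    and N the total sample size.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its group median.
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(y - med) for y in g])
    z_means = [statistics.fmean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical trait values for two genotype groups with different spread.
f_unequal = brown_forsythe_f([[1, 2, 3], [10, 20, 30]])
# Same spread, different means: no evidence of variance heterogeneity.
f_equal = brown_forsythe_f([[1, 2, 3], [11, 12, 13]])
```

Because the statistic is built from deviations around each group's own median, a shift in the group mean alone does not inflate it, which is why the second call returns zero.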
|
136
|
Huan T, Mendelson M, Joehanes R, Yao C, Liu C, Song C, Bhattacharya A, Rong J, Tanriverdi K, Keefe J, Murabito JM, Courchesne P, Larson MG, Freedman JE, Levy D. Epigenome-wide association study of DNA methylation and microRNA expression highlights novel pathways for human complex traits. Epigenetics 2019; 15:183-198. [PMID: 31282290 DOI: 10.1080/15592294.2019.1640547] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 12/13/2022] Open
Abstract
DNA methylation (DNAm) and microRNAs (miRNAs) have been implicated in a wide range of human diseases. While often studied in isolation, DNAm and miRNAs are not independent. We analyzed associations of expression of 283 miRNAs with DNAm at >400K CpG sites in whole blood obtained from 3565 individuals and identified 227 CpGs at which differential methylation was associated with the expression of 40 nearby miRNAs (cis-miR-eQTMs) at FDR<0.01, including 91 independent CpG sites at r² < 0.2. cis-miR-eQTMs were enriched for CpGs in promoter and polycomb-repressed state regions, and 60% were inversely associated with miRNA expression. Bidirectional Mendelian randomization (MR) analysis further identified 58 cis-miR-eQTM CpG-miRNA pairs where DNAm changes appeared to drive miRNA expression changes and opposite directional effects were unlikely. Integration of genetic variants in joint analyses revealed an average partial r² between cis-miR-eQTM CpGs and miRNAs of 2% after conditioning on site-specific genetic variation, suggesting that DNAm is an important epigenetic regulator of miRNA expression. Finally, two-step MR analysis was performed to identify putatively causal CpGs driving miRNA expression in relation to human complex traits. We found that an imprinted region on 14q32 that was previously identified in relation to age at menarche is enriched with cis-miR-eQTMs. Nine CpGs and three miRNAs at this locus tested causal for age at menarche, reflecting novel epigenetic-driven molecular pathways underlying this complex trait. Our study sheds light on the joint genetic and epigenetic regulation of miRNA expression and provides insights into the relations of miRNAs to their targets and to complex phenotypes.
|
137
|
Brieger K, Zajac GJM, Pandit A, Foerster JR, Li KW, Annis AC, Schmidt EM, Clark CP, McMorrow K, Zhou W, Yang J, Kwong AM, Boughton AP, Wu J, Scheller C, Parikh T, de la Vega A, Brazel DM, Frieser M, Rea-Sandin G, Fritsche LG, Vrieze SI, Abecasis GR. Genes for Good: Engaging the Public in Genetics Research via Social Media. Am J Hum Genet 2019; 105:65-77. [PMID: 31204010 PMCID: PMC6612519 DOI: 10.1016/j.ajhg.2019.05.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/27/2018] [Accepted: 05/08/2019] [Indexed: 01/06/2023] Open
Abstract
The Genes for Good study uses social media to engage a large, diverse participant pool in genetics research and education. Health history and daily tracking surveys are administered through a Facebook application, and participants who complete a minimum number of surveys are mailed a saliva sample kit ("spit kit") to collect DNA for genotyping. As of March 2019, we engaged >80,000 individuals, sent spit kits to >32,000 individuals who met minimum participation requirements, and collected >27,000 spit kits. Participants come from all 50 states and include a diversity of ancestral backgrounds. Rates of important chronic health indicators are consistent with those estimated for the general U.S. population using more traditional study designs. However, our sample is younger and contains a greater percentage of females than the general population. As one means of verifying data quality, we have replicated genome-wide association studies (GWASs) for exemplar traits, such as asthma, diabetes, body mass index (BMI), and pigmentation. The flexible framework of the web application makes it relatively simple to add new questionnaires and for other researchers to collaborate. We anticipate that the study sample will continue to grow and that future analyses may further capitalize on the strengths of the longitudinal data in combination with genetic information.
|
138
|
Skelly DA, Raghupathy N, Robledo RF, Graber JH, Chesler EJ. Reference Trait Analysis Reveals Correlations Between Gene Expression and Quantitative Traits in Disjoint Samples. Genetics 2019; 212:919-929. [PMID: 31113812 PMCID: PMC6614885 DOI: 10.1534/genetics.118.301865] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 12/11/2018] [Accepted: 05/14/2019] [Indexed: 12/21/2022] Open
Abstract
Systems genetic analysis of complex traits involves the integrated analysis of genetic, genomic, and disease-related measures. However, these data are often collected separately across multiple study populations, rendering direct correlation of molecular features to complex traits impossible. Recent transcriptome-wide association studies (TWAS) have harnessed gene expression quantitative trait loci (eQTL) to associate unmeasured gene expression with a complex trait in genotyped individuals, but this approach relies primarily on strong eQTL. We propose a simple and powerful alternative strategy for correlating independently obtained sets of complex traits and molecular features. In contrast to TWAS, our approach gains precision by correlating complex traits through a common set of continuous phenotypes instead of genetic predictors, and can identify transcript-trait correlations for which the regulation is not genetic. In our approach, a set of multiple quantitative "reference" traits is measured across all individuals, while measures of the complex trait of interest and transcriptional profiles are obtained in disjoint subsamples. A conventional multivariate statistical method, canonical correlation analysis, is used to relate the reference traits and traits of interest to identify gene expression correlates. We evaluate power and sample size requirements of this methodology, as well as performance relative to other methods, via extensive simulation and analysis of a behavioral genetics experiment in 258 Diversity Outbred mice involving two independent sets of anxiety-related behaviors and hippocampal gene expression. After splitting the data set and hiding one set of anxiety-related traits in half the samples, we identified transcripts correlated with the hidden traits using the other set of anxiety-related traits and exploiting the highest canonical correlation (R = 0.69) between the trait data sets. 
We demonstrate that this approach outperforms TWAS in identifying associated transcripts. Together, these results demonstrate the validity, reliability, and power of reference trait analysis for identifying relations between complex traits and their molecular substrates.
|
139
|
Kikas T, Rull K, Beaumont RN, Freathy RM, Laan M. The Effect of Genetic Variation on the Placental Transcriptome in Humans. Front Genet 2019; 10:550. [PMID: 31244887 PMCID: PMC6581026 DOI: 10.3389/fgene.2019.00550] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/20/2019] [Accepted: 05/24/2019] [Indexed: 12/22/2022] Open
Abstract
Knowledge of the genetic variants shaping the human placental transcriptome is limited, and they are not cataloged in the Genotype-Tissue Expression project. So far, only one whole genome analysis of placental expression quantitative trait loci (eQTLs) has been published, by Peng et al. (2017), with no external independent validation. We report the second study on the landscape of placental eQTLs. The study aimed to generate a high-confidence list of placental cis-eQTLs and to investigate their potential functional implications. Analysis of cis-eQTLs (±100 kbp from the gene) utilized 40 placental RNA sequencing and respective whole genome genotyping datasets. The identified 199 placental cis-eSNPs represented 88 independent eQTL signals (FDR < 5%). The most significant placental eQTLs (FDR < 10⁻⁵) modulated the expression of ribosomal protein RPL9, transcription factor ZSCAN9 and aminopeptidase ERAP2. The analysis confirmed 50 eSNP-eGene pairs reported by Peng et al. (2017), which can thus be claimed as robust placental eQTL signals. The study also identified 13 novel placental eGenes. Among these, ZSCAN9 is modulated by several eSNPs (experimentally validated: rs1150707) that have also been shown to affect the methylation level of genes variably escaping X-chromosomal inactivation. The identified 63 placental eGenes exhibited mostly mixed or ubiquitous expression. Functional enrichment analysis highlighted 35 Gene Ontology categories with the top-ranking pathways "ruffle membrane" (FDR = 1.81 × 10⁻²), contributing to the formation of motile cell surface, and "ATPase activity, coupled" (FDR = 2.88 × 10⁻²), critical for membrane transport. Placental eGenes were also significantly enriched in pathways implicated in development, signaling and immune function. However, this study was not able to confirm a significant overrepresentation of genome-wide association study top hits among the placental eSNPs and eGenes, as reported by Peng et al. (2017).
The identified eSNPs were further analyzed in association with newborn and pregnancy traits. In the discovery step, a suggestive association was detected between the eQTL of ALPG (rs11678251) and reduced placental, newborn and infant weight. Meta-analysis across the REPROMETA, HAPPY PREGNANCY and ALSPAC cohorts (n = 6830) did not replicate these findings. In summary, the study emphasizes the role of genetic variation in driving the transcriptome profile of the human placenta and the importance of further exploring its functional implications.
|
140
|
Fine RS, Pers TH, Amariuta T, Raychaudhuri S, Hirschhorn JN. Benchmarker: An Unbiased, Association-Data-Driven Strategy to Evaluate Gene Prioritization Algorithms. Am J Hum Genet 2019; 104:1025-1039. [PMID: 31056107 PMCID: PMC6556976 DOI: 10.1016/j.ajhg.2019.03.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/14/2018] [Accepted: 03/28/2019] [Indexed: 01/17/2023] Open
Abstract
Genome-wide association studies (GWASs) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical, currently missing capability is to objectively compare performance of such algorithms. Typical comparisons rely on "gold standard" genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares performance of similarity-based prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to 20 well-powered GWASs and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression; genes prioritized based on gene sets had higher per-SNP heritability than those prioritized based on gene expression. Additionally, in a direct comparison of three methods, DEPICT and MAGMA outperformed NetWAS. We also evaluated combinations of methods; our results indicated that combining data sources and algorithms can help prioritize higher-quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any similarity-based method that provides genome-wide prioritization of genes, variants, or gene sets and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.
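The leave-one-chromosome-out splitting mentioned in this abstract can be illustrated with a minimal sketch. It shows only how genes might be partitioned into folds by chromosome; the prioritization step and the stratified LD score regression are not reproduced. The function name, gene labels and chromosome assignments are hypothetical.

```python
from collections import defaultdict


def leave_one_chromosome_out(gene_to_chrom):
    """Yield (held_out_chrom, training_genes, held_out_genes) for each chromosome.

    gene_to_chrom: dict mapping a gene identifier to its chromosome label.
    Each fold withholds every gene on one chromosome, so a method trained on
    the remaining genes is evaluated on data it never saw.
    """
    by_chrom = defaultdict(list)
    for gene, chrom in gene_to_chrom.items():
        by_chrom[chrom].append(gene)
    for held_out in sorted(by_chrom):
        training = [
            gene
            for chrom in sorted(by_chrom)
            if chrom != held_out
            for gene in by_chrom[chrom]
        ]
        yield held_out, training, by_chrom[held_out]


# Hypothetical gene-to-chromosome assignments, for illustration only.
genes = {"geneA": "chr1", "geneB": "chr1", "geneC": "chr2", "geneD": "chr3"}
folds = list(leave_one_chromosome_out(genes))
```

Splitting by chromosome rather than by random gene subsets keeps physically linked genes in the same fold, which limits leakage of association signal between training and evaluation sets.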
|
141
|
Saul MC, Philip VM, Reinholdt LG, Chesler EJ. High-Diversity Mouse Populations for Complex Traits. Trends Genet 2019; 35:501-514. [PMID: 31133439 DOI: 10.1016/j.tig.2019.04.003] [Citation(s) in RCA: 84] [Impact Index Per Article: 16.8] [Received: 02/19/2019] [Revised: 04/19/2019] [Accepted: 04/22/2019] [Indexed: 12/21/2022]
Abstract
Contemporary mouse genetic reference populations are a powerful platform to discover complex disease mechanisms. Advanced high-diversity mouse populations include the Collaborative Cross (CC) strains, Diversity Outbred (DO) stock, and their isogenic founder strains. When used in systems genetics and integrative genomics analyses, these populations efficiently harness known genetic variation for precise and contextualized identification of complex disease mechanisms. Extensive genetic, genomic, and phenotypic data are already available for these high-diversity mouse populations, and a growing suite of data analysis tools has been developed to support research on diverse mice. This integrated resource can be used to discover and evaluate disease mechanisms relevant across species.
|
142
|
Jakobson CM, Jarosz DF. Molecular Origins of Complex Heritability in Natural Genotype-to-Phenotype Relationships. Cell Syst 2019; 8:363-379.e3. [PMID: 31054809 PMCID: PMC6560647 DOI: 10.1016/j.cels.2019.04.002] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/24/2019] [Revised: 03/25/2019] [Accepted: 04/05/2019] [Indexed: 01/09/2023]
Abstract
The statistical complexity of heredity has long been evident, but its molecular origins remain elusive. To investigate, we charted 90 comprehensive genotype-to-phenotype maps in a large population of wild diploid yeast. In contrast to long-standing assumptions, all types of genetic variation contributed similarly to phenotype. Causal synonymous and regulatory variants exhibited distinct molecular signatures, as did nonlinearities in heterozygote fitness that likely contribute to hybrid vigor. Highly pleiotropic variants altered disordered sequences within signaling hubs, and their effects correlated across environments, even when antagonistic, suggesting that large fitness gains bring concomitant costs. Natural genetic networks defined by the causal loci differed from those determined by precise gene deletions or protein-protein interactions. Finally, we found that traits that would appear omnigenic in less powered studies do in fact have finite genetic determinants. Integrating these molecular principles will be crucial as genome reading and writing become routine in research, industry, and medicine.
|
143
|
Major M, Freund MK, Burch KS, Mancuso N, Ng M, Furniss D, Pasaniuc B, Ophoff RA. Integrative analysis of Dupuytren's disease identifies novel risk locus and reveals a shared genetic etiology with BMI. Genet Epidemiol 2019; 43:629-645. [PMID: 31087417 PMCID: PMC6699495 DOI: 10.1002/gepi.22209] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/04/2019] [Revised: 04/04/2019] [Accepted: 04/19/2019] [Indexed: 12/26/2022]
Abstract
Dupuytren's disease is a common inherited tissue‐specific fibrotic disorder, characterized by progressive and irreversible fibroblastic proliferation affecting the palmar fascia of the hand. Although genome‐wide association studies (GWAS) have identified 24 genomic regions associated with Dupuytren's risk, the biological mechanisms driving signal at these regions remain elusive. We identify potential biological mechanisms for Dupuytren's disease by integrating the most recent, largest GWAS (3,871 cases and 4,686 controls) with eQTLs (47 tissue panels from five consortia, total n = 3,975) to perform a transcriptome‐wide association study. We identify 43 tissue‐specific gene associations with Dupuytren's risk, including one in a novel risk region. We also estimate the genome‐wide genetic correlation between Dupuytren's disease and 45 complex traits and find significant genetic correlations between Dupuytren's disease and body mass index (BMI), type II diabetes, triglycerides, and high‐density lipoprotein (HDL), suggesting a shared genetic etiology between these traits. We further examine local genetic correlation to identify 8 and 3 novel regions significantly correlated with BMI and HDL, respectively. Our results are consistent with previous epidemiological findings showing that lower BMI increases risk for Dupuytren's disease. These 12 novel risk regions provide new insight into the biological mechanisms of Dupuytren's disease and serve as a starting point for functional validation.
|
144
|
Liu X, Li YI, Pritchard JK. Trans Effects on Gene Expression Can Drive Omnigenic Inheritance. Cell 2019; 177:1022-1034.e6. [PMID: 31051098 PMCID: PMC6553491 DOI: 10.1016/j.cell.2019.04.014] [Citation(s) in RCA: 264] [Impact Index Per Article: 52.8] [Received: 09/20/2018] [Revised: 12/18/2018] [Accepted: 04/07/2019] [Indexed: 01/02/2023]
Abstract
Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans. We propose that most heritability is driven by weak trans-eQTL SNPs, whose effects are mediated through peripheral genes to impact the expression of core genes. In particular, if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified such that nearly all of the genetic variance is driven by weak trans effects. Thus, our model proposes a framework for understanding key features of the architecture of complex traits.
|
145
|
Chen E, Huang X, Tian Z, Wing RA, Han B. The Genomics of Oryza Species Provides Insights into Rice Domestication and Heterosis. Annu Rev Plant Biol 2019; 70:639-665. [PMID: 31035826 DOI: 10.1146/annurev-arplant-050718-100320] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Indexed: 05/20/2023]
Abstract
Here, we review recent progress in genetic and genomic studies of the diversity of Oryza species. In recent years, unlocking the genetic diversity of Oryza species has provided insights into the genomics of rice domestication, heterosis, and complex traits. Genome sequencing and analysis of numerous wild rice (Oryza rufipogon) and Asian cultivated rice (Oryza sativa) accessions have enabled the identification of genome-wide signatures of rice domestication and the unlocking of the origin of Asian cultivated rice. Moreover, similar studies on genome variations of African rice (Oryza glaberrima) cultivars and their closely related wild progenitor Oryza barthii accessions have provided strong evidence to support a theory of independent domestication in African rice. Integrated genomic approaches have efficiently investigated many heterotic loci in hybrid rice underlying yield heterosis advantages and revealed the genomic architecture of rice heterosis. We conclude that in-depth unlocking of genetic variations among Oryza species will further enhance rice breeding.
|
146
|
Abstract
IgA nephropathy (IgAN) represents a genetically complex multifactorial trait. Its prevalence and clinical features vary geographically, and the disease has a range of clinical presentations that suggest multiple subtypes. Although familial aggregation of IgAN has been reported and prior linkage studies have highlighted significant locus heterogeneity, specific genetic variants underlying familial IgAN have not yet been defined. Population-based genome-wide association studies (GWAS) have discovered nearly 20 IgAN risk loci, providing novel insights into disease epidemiology and molecular mechanisms, shifting old paradigms of the disease pathogenesis. Follow-up fine-mapping studies have identified specific causal variants, and genotype-phenotype correlation studies have begun to delineate clinical consequences of GWAS risk alleles. The association between IgAN and galactose-deficient IgA1 (Gd-IgA1), a validated serum biomarker of IgAN, presented another avenue for genetic discovery because elevated serum levels of Gd-IgA1 are highly heritable. Recent GWAS for serum Gd-IgA1 levels provided novel insights into genetic regulation of this trait, but the genetic link between Gd-IgA1 and IgAN has not yet been established. In this review, we discuss these developments in the broader context of modern genetic approaches to complex traits, and provide our perspective on the critical challenges that need to be addressed to advance the field.
|
147
|
Mikhaylova AV, Thornton TA. Accuracy of Gene Expression Prediction From Genotype Data With PrediXcan Varies Across and Within Continental Populations. Front Genet 2019; 10:261. [PMID: 31001318 PMCID: PMC6456650 DOI: 10.3389/fgene.2019.00261] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 11/30/2018] [Accepted: 03/08/2019] [Indexed: 01/08/2023]
Abstract
Using genetic data to predict gene expression has garnered significant attention in recent years. PrediXcan has become one of the most widely used gene-based methods for testing associations between predicted gene expression values and a phenotype, which has facilitated novel insights into the relationship between complex traits and the component of gene expression that can be attributed to genetic variation. The gene expression prediction models for PrediXcan were developed using supervised machine learning methods and training data from the Depression Genes and Networks (DGN) study and the Genotype-Tissue Expression (GTEx) project, where the majority of subjects are of European descent. Many genetic studies, however, include samples from multi-ethnic populations, and in this paper we evaluate the accuracy of PrediXcan for predicting gene expression in diverse populations. Using transcriptomic data from the GEUVADIS (Genetic European Variation in Disease) RNA sequencing project and whole genome sequencing data from the 1000 Genomes project, we evaluate and compare the predictive performance of PrediXcan in an African population (Yoruban) and four European ancestry populations for thousands of genes. We evaluate a range of models from the PrediXcan weight databases and use Pearson's correlation coefficient to assess gene expression prediction accuracy with PrediXcan. From our evaluation, we find that the predictive performance of PrediXcan varies substantially among populations from different continents (F-test p-value < 2.2 × 10⁻¹⁶), where prediction accuracy is lower in the Yoruban population from West Africa compared to the European-ancestry populations. Moreover, not only do we find differences in predictive performance between populations from different continents, we also find highly significant differences in prediction accuracy among the four European ancestry populations considered (F-test p-value < 2.2 × 10⁻¹⁶). Finally, while there is variability in prediction accuracy across different PrediXcan weight databases, we also find consistency in the qualitative performance of PrediXcan for the five populations considered, with the African ancestry population having the lowest accuracy across databases.
|
148
|
Sheftel H, Szekely P, Mayo A, Sella G, Alon U. Evolutionary trade-offs and the structure of polymorphisms. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0105. [PMID: 29632259 DOI: 10.1098/rstb.2017.0105] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 12/30/2017] [Indexed: 12/15/2022]
Abstract
Populations of organisms show genetic differences called polymorphisms. Understanding the effects of polymorphisms is important for biology and medicine. Here, we ask which polymorphisms occur at high frequency when organisms evolve under trade-offs between multiple tasks. Multiple tasks present a problem, because it is not possible to be optimal at all tasks simultaneously and hence compromises are necessary. Recent work indicates that trade-offs lead to a simple geometry of phenotypes in the space of traits: phenotypes fall on the Pareto front, which is shaped as a polytope: a line, triangle, tetrahedron, etc. The vertices of these polytopes are the optimal phenotypes for a single task. Up to now, work on this Pareto approach has not considered its genetic underpinnings. Here, we address this by asking how the polymorphism structure of a population is affected by evolution under trade-offs. We simulate a multi-task selection scenario, in which the population evolves to the Pareto front: the line segment between two archetypes or the triangle between three archetypes. We find that polymorphisms that become prevalent in the population have pleiotropic phenotypic effects that align with the Pareto front. Similarly, epistatic effects between prevalent polymorphisms are parallel to the front. Alignment with the front occurs also for asexual mating. Alignment is reduced when drift or linkage is strong, and is replaced by a more complex structure in which many perpendicular allele effects cancel out. Aligned polymorphism structure allows mating to produce offspring that stand a good chance of being optimal multi-taskers in at least one of the locales available to the species. This article is part of the theme issue 'Self-organization in cell biology'.
|
149
|
Weissenkampen JD, Jiang Y, Eckert S, Jiang B, Li B, Liu DJ. Methods for the Analysis and Interpretation for Rare Variants Associated with Complex Traits. Curr Protoc Hum Genet 2019; 101:e83. [PMID: 30849219 DOI: 10.1002/cphg.83] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/19/2022]
Abstract
With the advent of Next Generation Sequencing (NGS) technologies, whole genome and whole exome DNA sequencing has become affordable for routine genetic studies. Coupled with improved genotyping arrays and genotype imputation methodologies, it is increasingly feasible to obtain rare genetic variant information in large datasets. Such datasets allow researchers to gain a more complete understanding of the genetic architecture of complex traits caused by rare variants. State-of-the-art statistical methods for the statistical genetics analysis of sequence-based association, including efficient algorithms for association analysis in biobank-scale datasets, gene-association tests, meta-analysis, fine mapping methods that integrate functional genomic datasets, and phenome-wide association studies (PheWAS), are reviewed here. These methods are expected to be highly useful for next generation statistical genetics analysis in the era of precision medicine. © 2019 by John Wiley & Sons, Inc.
|
150
|
De La Torre AR, Puiu D, Crepeau MW, Stevens K, Salzberg SL, Langley CH, Neale DB. Genomic architecture of complex traits in loblolly pine. New Phytol 2019; 221:1789-1801. [PMID: 30318590 DOI: 10.1111/nph.15535] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 07/25/2018] [Accepted: 10/06/2018] [Indexed: 05/02/2023]
Abstract
Dissecting the genetic and genomic architecture of complex traits is essential to understand the forces maintaining the variation in phenotypic traits of ecological and economic importance. Whole-genome resequencing data were used to generate high-resolution polymorphic single nucleotide polymorphism (SNP) markers and genotype individuals from common gardens across the loblolly pine (Pinus taeda) natural range. Genome-wide associations were tested with a large phenotypic dataset comprising 409 variables including morphological traits (height, diameter, carbon isotope discrimination, pitch canker resistance), and molecular traits such as metabolites and expression of xylem development genes. Our study identified 2335 new SNP × trait associations for the species, with many SNPs located in physical clusters in the genome of the species; and the genomic location of hotspots for metabolic × genotype associations. We found a highly polygenic basis of quantitative inheritance, with significant differences in number, effect size, genomic location and frequency of alleles contributing to variation in phenotypes in the different traits. While mutation-selection balance might be shaping the genetic variation in metabolic traits, balancing selection is more likely to shape the variation in expression of xylem development genes. Our work contributes to the study of complex traits in nonmodel plant species by identifying associations at a whole-genome level.
|