101
|
Oyelami FO, Zhao Q, Xu Z, Zhang Z, Sun H, Zhang Z, Ma P, Wang Q, Pan Y. Haplotype Block Analysis Reveals Candidate Genes and QTLs for Meat Quality and Disease Resistance in Chinese Jiangquhai Pig Breed. Front Genet 2020; 11:752. [PMID: 33101353 PMCID: PMC7498712 DOI: 10.3389/fgene.2020.00752] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/01/2019] [Accepted: 06/23/2020] [Indexed: 11/19/2022] Open
Abstract
The Jiangquhai (JQ) pig breed is one of the most widely recognized pig populations in China due to its unique and dominant characteristics. In this study, we examined the extent of linkage disequilibrium (LD) and the haplotype block structure of the JQ pig breed, and scanned the blocks for possible genes underlying important QTLs that could either be responsible for some adaptive features in these pigs or might have undergone some selection pressure. We compared some of our results with other Chinese and Western pig breeds. The results show that the JQ breed had the highest total block length (349.73 Mb ≈ 15% of its genome), and the coverage rate of blocks in most of its chromosomes was larger than those of other breeds except for Sus scrofa chromosome 4 (SSC4), SSC6, SSC7, SSC8, SSC10, SSC12, SSC13, SSC14, SSC17, SSC18, and SSCX. Moreover, the JQ breed had more SNPs that were clustered into haplotype blocks than the other breeds examined in this study. Our shared and unique haplotype block analysis revealed that the Hongdenglong (HD) breed had the lowest percentage of shared haplotype blocks while the Shanzhu (SZ) breed had the highest. We found that the JQ breed had an average r² > 0.2 at SNP distances of 10–20 kb and concluded that about 120,000–240,000 SNPs would be needed for a successful GWAS in the breed. Finally, we detected a total of 88 genes harbored by selected haplotype blocks in the JQ breed, of which only 4 were significantly enriched (p-value ≤ 0.05). These genes were significantly enriched in 2 GO terms (p-value < 0.01), and 2 KEGG pathways (p-value < 0.02). Most of these enriched genes were related to health. Also, most of the overlapping QTLs detected in the haplotype blocks were related to meat and carcass quality, as well as health, with a few of them relating to reproduction and production. These results provide insights into the genetic architecture of some adaptive and meat quality traits observed in the JQ pig breed and also reveal the pattern of LD in the genome of the pig. Our results provide significant guidance for improving the statistical power of GWAS and optimizing the conservation strategy for the JQ pig breed.
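The marker-count range quoted in this abstract is consistent with dividing the genome length by the SNP spacing at which useful LD persists. The sketch below illustrates that arithmetic only; the genome length of roughly 2.4 Gb is an assumption that the abstract does not state, and the function name is illustrative.

```python
def snps_needed(genome_length_bp: int, spacing_bp: int) -> int:
    """Number of evenly spaced SNPs needed to tile a genome at a given spacing."""
    return genome_length_bp // spacing_bp


# Assumed pig genome length (~2.4 Gb); this value is NOT given in the abstract.
GENOME_LENGTH_BP = 2_400_000_000

low = snps_needed(GENOME_LENGTH_BP, 20_000)   # one SNP every 20 kb
high = snps_needed(GENOME_LENGTH_BP, 10_000)  # one SNP every 10 kb
print(low, high)
```

Under that assumed genome length, the two spacings bracket the 120,000–240,000 range reported above.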
|
102
|
The Role of Noncoding Variants in Heritable Disease. Trends Genet 2020; 36:880-891. [PMID: 32741549 DOI: 10.1016/j.tig.2020.07.004] [Citation(s) in RCA: 55] [Impact Index Per Article: 13.8] [Received: 05/12/2020] [Revised: 06/30/2020] [Accepted: 07/02/2020] [Indexed: 12/26/2022]
Abstract
The genetic basis of disease has largely focused on coding regions. However, it has become clear that a large proportion of the noncoding genome is functional and harbors genetic variants that contribute to disease etiology. Here, we review recent examples of inherited noncoding alterations that are responsible for Mendelian disorders or act to influence complex traits. We explore both rare and common genetic variants and discuss the wide range of mechanisms by which they affect gene regulation to promote disease. We also debate the challenges and progress associated with identifying and interpreting the functional and clinical significance of genetic variation in the context of the noncoding regulatory landscape.
|
103
|
Lefebvre M, Bruel AL, Tisserant E, Bourgon N, Duffourd Y, Collardeau-Frachon S, Attie-Bitach T, Kuentz P, Assoum M, Schaefer E, El Chehadeh S, Antal MC, Kremer V, Girard-Lemaitre F, Mandel JL, Lehalle D, Nambot S, Jean-Marçais N, Houcinat N, Moutton S, Marle N, Lambert L, Jonveaux P, Foliguet B, Mazutti JP, Gaillard D, Alanio E, Poirisier C, Lebre AS, Aubert-Lenoir M, Arbez-Gindre F, Odent S, Quélin C, Loget P, Fradin M, Willems M, Bigi N, Perez MJ, Blesson S, Francannet C, Beaufrere AM, Patrier-Sallebert S, Guerrot AM, Goldenberg A, Brehin AC, Lespinasse J, Touraine R, Capri Y, Saint-Frison MH, Laurent N, Philippe C, Tran Mau-Them F, Thevenon J, Faivre L, Thauvin-Robinet C, Vitobello A. Genotype-first in a cohort of 95 fetuses with multiple congenital abnormalities: when exome sequencing reveals unexpected fetal phenotype-genotype correlations. J Med Genet 2020; 58:400-413. [PMID: 32732226 DOI: 10.1136/jmedgenet-2020-106867] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 02/24/2020] [Revised: 05/04/2020] [Accepted: 05/21/2020] [Indexed: 11/03/2022]
Abstract
PURPOSE Molecular diagnosis based on singleton exome sequencing (sES) is particularly challenging in fetuses with multiple congenital abnormalities (MCA). Indeed, some studies reveal a diagnostic yield of about 20%, far lower than in live birth individuals showing developmental abnormalities (30%), suggesting that standard analyses, based on the correlation between clinical hallmarks described in postnatal syndromic presentations and genotype, may underestimate the impact of the genetic variants identified in fetal analyses. METHODS We performed sES in 95 fetuses with MCA. Blind to phenotype, we applied a genotype-first approach consisting of combined analyses based on variant annotation and bioinformatics predictions followed by reverse phenotyping. Initially applied to OMIM-morbid genes, analyses were then extended to all genes. We complemented our approach by using reverse phenotyping, variant segregation analysis, bibliographic search and data sharing in order to establish the clinical significance of the prioritised variants. RESULTS sES rapidly identified a causal variant in 24/95 fetuses (25%), variants of unknown significance in OMIM genes in 8/95 fetuses (8%) and six novel candidate genes in 6/95 fetuses (6%). CONCLUSIONS This method, based on a genotype-first approach followed by reverse phenotyping, shed light on unexpected fetal phenotype-genotype correlations, emphasising the relevance of prenatal studies to reveal extreme clinical presentations associated with well-known Mendelian disorders.
|
104
|
Frank DN, Giese APJ, Hafren L, Bootpetch TC, Yarza TKL, Steritz MJ, Pedro M, Labra PJ, Daly KA, Tantoco MLC, Szeremeta W, Reyes-Quintos MRT, Ahankoob N, Llanes EGDV, Pine HS, Yousaf S, Ir D, Einarsdottir E, de la Cruz RAR, Lee NR, Nonato RMA, Robertson CE, Ong KMC, Magno JPM, Chiong ANE, Espiritu-Chiong MC, San Agustin ML, Cruz TLG, Abes GT, Bamshad MJ, Cutiongco-de la Paz EM, Kere J, Nickerson DA, Mohlke KL, Riazuddin S, Chan A, Mattila PS, Leal SM, Ryan AF, Ahmed ZM, Chonmaitree T, Sale MM, Chiong CM, Santos-Cortez RLP. Otitis media susceptibility and shifts in the head and neck microbiome due to SPINK5 variants. J Med Genet 2020; 58:442-452. [PMID: 32709676 DOI: 10.1136/jmedgenet-2020-106844] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/07/2020] [Revised: 05/06/2020] [Accepted: 05/24/2020] [Indexed: 12/16/2022]
Abstract
BACKGROUND Otitis media (OM) susceptibility has significant heritability; however, the role of rare variants in OM is mostly unknown. Our goal is to identify novel rare variants that confer OM susceptibility. METHODS We performed exome and Sanger sequencing of >1000 DNA samples from 551 multiethnic families with OM and unrelated individuals, RNA-sequencing and microbiome sequencing and analyses of swabs from the outer ear, middle ear, nasopharynx and oral cavity. We also examined protein localisation and gene expression in infected and healthy middle ear tissues. RESULTS A large, intermarried pedigree that includes 81 OM-affected and 53 unaffected individuals cosegregates two known rare A2ML1 variants, a common FUT2 variant and a rare, novel pathogenic variant c.1682A>G (p.Glu561Gly) within SPINK5 (LOD=4.09). Carriage of the SPINK5 missense variant resulted in increased relative abundance of Microbacteriaceae in the middle ear, along with occurrence of Microbacteriaceae in the outer ear and oral cavity but not the nasopharynx. Eight additional novel SPINK5 variants were identified in 12 families and individuals with OM. A role for SPINK5 in OM susceptibility is further supported by lower RNA counts in variant carriers, strong SPINK5 localisation in outer ear skin, faint localisation to middle ear mucosa and eardrum and increased SPINK5 expression in human cholesteatoma. CONCLUSION SPINK5 variants confer susceptibility to non-syndromic OM. These variants potentially contribute to middle ear pathology through breakdown of mucosal and epithelial barriers, immunodeficiency such as poor vaccination response, alteration of head and neck microbiota and facilitation of entry of opportunistic pathogens into the middle ear.
|
105
|
Fryett JJ, Morris AP, Cordell HJ. Investigation of prediction accuracy and the impact of sample size, ancestry, and tissue in transcriptome-wide association studies. Genet Epidemiol 2020; 44:425-441. [PMID: 32190932 PMCID: PMC8641384 DOI: 10.1002/gepi.22290] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/08/2019] [Revised: 02/05/2020] [Accepted: 03/06/2020] [Indexed: 01/14/2023]
Abstract
In transcriptome-wide association studies (TWAS), gene expression values are predicted using genotype data and tested for association with a phenotype. The power of this approach to detect associations relies, at least in part, on the accuracy of the prediction. Here we compare the prediction accuracy of six different methods (LASSO, Ridge regression, Elastic net, Best Linear Unbiased Predictor, Bayesian Sparse Linear Mixed Model, and Random Forests) by performing cross-validation using data from the Geuvadis Project. We also examine prediction accuracy (a) at different sample sizes, (b) when ancestry of the prediction model training and testing populations is different, and (c) when the tissue used to train the model is different from the tissue to be predicted. We find that, for most genes, the expression cannot be accurately predicted, but in general sparse statistical models tend to outperform polygenic models at prediction. Average prediction accuracy is reduced when the model training set size is reduced or when predicting across ancestries and is marginally reduced when predicting across tissues. We conclude that using sparse statistical models and the development of large reference panels across multiple ethnicities and tissues will lead to better prediction of gene expression, and thus may improve TWAS power.
|
106
|
Zarubin M, Yakhnenko A, Kravchenko E. Transcriptome analysis of Drosophila melanogaster laboratory strains of different geographical origin after long-term laboratory maintenance. Ecol Evol 2020; 10:7082-7093. [PMID: 32760513 PMCID: PMC7391317 DOI: 10.1002/ece3.6410] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/25/2020] [Revised: 04/28/2020] [Accepted: 05/03/2020] [Indexed: 01/18/2023] Open
Abstract
Positive selection may be the main factor of the between-population divergence in gene expression. Expression profiles of two Drosophila melanogaster laboratory strains of different geographical origin and long-term laboratory maintenance were analyzed using microchip arrays encompassing probes for 18,500 transcripts. The Russian strain D18 and the North American strain Canton-S were compared. A set of 223 known or putative genes demonstrated significant changes in expression levels between these strains. Differentially expressed genes (DEG) were enriched in response to DDT (p = .0014), proteolysis (p = 2.285E-5), transmembrane transport (p = 1.03E-4), carbohydrate metabolic process (p = .0317), protein homotetramerization (p = .0444), and antibacterial humoral response (p = 425E-4). Expression in a subset of genes from different categories was verified by qRT-PCR. Analysis of transcript abundance between the Canton-S and D18 strains allowed us to select several genes and estimate their participation in latitude adaptation. Expression of the selected genes was analyzed in five D. melanogaster lines of different geographic origins by qRT-PCR, and we found two candidate genes that may be associated with latitude adaptation in adult flies: smp-30 and Cda9. It is quite possible that several alleles of these genes may be important for insect survival in the environments of global warming. It is interesting that a number of genes involved in local adaptation demonstrate expression levels appropriate to their geographical origin even after decades of laboratory maintenance.
|
107
|
Sadler B, Haller G, Antunes L, Nikolov M, Amarillo I, Coe B, Dobbs MB, Gurnett CA. Rare and de novo duplications containing SHOX in clubfoot. J Med Genet 2020; 57:851-857. [PMID: 32518174 PMCID: PMC7688552 DOI: 10.1136/jmedgenet-2020-106842] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 01/07/2020] [Revised: 03/04/2020] [Accepted: 03/05/2020] [Indexed: 11/12/2022]
Abstract
Introduction Congenital clubfoot is a common birth defect that affects at least 0.1% of all births. Nearly 25% of cases are familial and the remainder are sporadic. Copy number variants (CNVs) involving transcriptional regulators of limb development, including PITX1 and TBX4, have previously been shown to cause familial clubfoot, but much of the heritability remains unexplained. Methods Exome sequence data from 816 unrelated clubfoot cases and 2645 in-house controls were analysed using coverage data to identify rare CNVs. The precise size and location of duplications were then determined using high-density Affymetrix Cytoscan chromosomal microarray (CMA). Segregation in families and de novo status were determined using quantitative PCR. Results Chromosome Xp22.33 duplications involving SHOX were identified in 1.1% of cases (9/816) compared with 0.07% of in-house controls (2/2645) (p=7.98×10⁻⁵, OR=14.57) and 0.27% (38/13592) of Atherosclerosis Risk in Communities/the Wellcome Trust Case Control Consortium 2 controls (p=0.001, OR=3.97). CMA validation confirmed an overlapping 180.28 kb duplicated region that included SHOX exons as well as downstream non-coding regions. In four of six sporadic cases where DNA was available for unaffected parents, the duplication was de novo. The probability of four de novo mutations in SHOX by chance in a cohort of 450 sporadic clubfoot cases is 5.4×10⁻¹⁰. Conclusions Microduplications of the pseudoautosomal chromosome Xp22.33 region (PAR1) containing SHOX and downstream enhancer elements occur in ~1% of patients with clubfoot. SHOX and regulatory regions have previously been implicated in skeletal dysplasia as well as idiopathic short stature, but have not yet been reported in clubfoot. SHOX duplications likely contribute to clubfoot pathogenesis by altering early limb development.
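The carrier counts in the Results can be turned into odds ratios with a simple 2x2 calculation. The sketch below uses the cross-product (sample) odds ratio, which is an assumption: the abstract does not say which estimator the authors used, and an exact-test conditional estimate can differ slightly from this one, so the values need not match the reported ORs to the second decimal. The function name is illustrative.

```python
def odds_ratio(case_carriers: int, case_total: int,
               control_carriers: int, control_total: int) -> float:
    """Cross-product (sample) odds ratio from carrier counts in cases and controls."""
    case_noncarriers = case_total - case_carriers
    control_noncarriers = control_total - control_carriers
    return (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)


# Counts taken from the abstract: 9/816 cases, 2/2645 in-house controls,
# 38/13592 external controls.
or_inhouse = odds_ratio(9, 816, 2, 2645)
or_external = odds_ratio(9, 816, 38, 13592)
print(round(or_inhouse, 2), round(or_external, 2))
```

Both estimates land close to the reported values; any small gap for the in-house comparison is what one would expect if a different estimator was used.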
|
108
|
Shi H, Burch KS, Johnson R, Freund MK, Kichaev G, Mancuso N, Manuel AM, Dong N, Pasaniuc B. Localizing Components of Shared Transethnic Genetic Architecture of Complex Traits from GWAS Summary Data. Am J Hum Genet 2020; 106:805-817. [PMID: 32442408 PMCID: PMC7273527 DOI: 10.1016/j.ajhg.2020.04.012] [Citation(s) in RCA: 45] [Impact Index Per Article: 11.3] [Received: 12/13/2019] [Accepted: 04/20/2020] [Indexed: 12/19/2022] Open
Abstract
Despite strong transethnic genetic correlations reported in the literature for many complex traits, the non-transferability of polygenic risk scores across populations suggests the presence of population-specific components of genetic architecture. We propose an approach that models GWAS summary data for one trait in two populations to estimate genome-wide proportions of population-specific/shared causal SNPs. In simulations across various genetic architectures, we show that our approach yields approximately unbiased estimates with in-sample LD and slight upward-bias with out-of-sample LD. We analyze nine complex traits in individuals of East Asian and European ancestry, restricting to common SNPs (MAF > 5%), and find that most common causal SNPs are shared by both populations. Using the genome-wide estimates as priors in an empirical Bayes framework, we perform fine-mapping and observe that high-posterior SNPs (for both the population-specific and shared causal configurations) have highly correlated effects in East Asians and Europeans. In population-specific GWAS risk regions, we observe a 2.8× enrichment of shared high-posterior SNPs, suggesting that population-specific GWAS risk regions harbor shared causal SNPs that are undetected in the other GWASs due to differences in LD, allele frequencies, and/or sample size. Finally, we report enrichments of shared high-posterior SNPs in 53 tissue-specific functional categories and find evidence that SNP-heritability enrichments are driven largely by many low-effect common SNPs.
|
109
|
Cohorts. Twin Res Hum Genet 2020; 23:114-115. [PMID: 32450941 DOI: 10.1017/thg.2020.33] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
Cohort studies are essential for conducting large studies of multiple exposures and outcomes in humans. Recently, the ability to combine data from multiple cohorts in, for example, meta-analyses, and the willingness in the genetics community to collaborate to enable replication studies has led to many new insights into the genetic and environmental determinants of human health and behaviors. The contribution of Professor Nicholas Martin to the development of cohort studies, particularly of twin and twin-family studies, over a period of several decades is reviewed. He has contributed to the development and use of both Australian and international resources. The contributions of Australian twin studies to genomewide association projects are multiple, and across multiple domains, from biomarkers, lifestyle and behavior to disorders and disease.
|
110
|
The SNP-Based Heritability - A Commentary on Yang et al. (2010). Twin Res Hum Genet 2020; 23:118-119. [PMID: 32423524 DOI: 10.1017/thg.2020.25] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
Abstract
I write this commentary as a part of a special issue published in this journal to celebrate Nick Martin's contribution to the field of human genetics. In this commentary, I briefly describe the background of the Yang et al. (2010) study and show some of the unpublished details of this study, its contribution to tackling the missing heritability problem and Nick's contribution to the work.
|
111
|
Barua A, Mikheyev AS. Toxin expression in snake venom evolves rapidly with constant shifts in evolutionary rates. Proc Biol Sci 2020; 287:20200613. [PMID: 32345154 PMCID: PMC7282918 DOI: 10.1098/rspb.2020.0613] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 03/17/2020] [Accepted: 03/30/2020] [Indexed: 12/21/2022] Open
Abstract
Key innovations provide ecological opportunity by enabling access to new resources, colonization of new environments, and are associated with adaptive radiation. The most well-known pattern associated with adaptive radiation is an early burst of phenotypic diversification. Venoms facilitate prey capture and are widely believed to be key innovations leading to adaptive radiation. However, few studies have estimated their evolutionary rate dynamics. Here, we test for patterns of adaptive evolution in venom gene expression data from 52 venomous snake species. By identifying shifts in tempo and mode of evolution along with models of phenotypic evolution, we show that snake venom exhibits the macroevolutionary dynamics expected of key innovations. Namely, all toxin families undergo shifts in their rates of evolution, likely in response to changes in adaptive optima. Furthermore, we show that rapid-pulsed evolution modelled as a Lévy process better fits snake venom evolution than conventional early burst or Ornstein-Uhlenbeck models. While our results support the idea of snake venom being a key innovation, the innovation of venom chemistry lacks clear mechanisms that would lead to reproductive isolation and thus adaptive radiation. Therefore, the extent to which venom directly influences the diversification process is still a matter of contention.
|
112
|
Men M, Wang X, Wu J, Zeng W, Jiang F, Zheng R, Li JD. Prevalence and associated phenotypes of DUSP6, IL17RD and SPRY4 variants in a large Chinese cohort with isolated hypogonadotropic hypogonadism. J Med Genet 2020; 58:66-72. [PMID: 32389901 DOI: 10.1136/jmedgenet-2019-106786] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/17/2019] [Revised: 02/13/2020] [Accepted: 03/09/2020] [Indexed: 01/12/2023]
Abstract
BACKGROUND FGF8-FGFR1 signalling is involved in multiple biological processes, while impairment of this signalling is one of the main reasons for isolated hypogonadotropic hypogonadism (IHH). Recently, several negative modulators of FGF8-FGFR1 signalling were also found to be involved in IHH, including DUSP6, IL17RD, SPRY2 and SPRY4. The aim of this study was to investigate the genotypic and phenotypic spectra of these genes in a large cohort of Chinese patients with IHH. METHODS A total of 196 patients with IHH were enrolled in this study. Whole-exome sequencing was performed to identify variants, which were verified by PCR and Sanger sequencing. RESULTS Four heterozygous DUSP6 variants (p.S157I, p.R83Q, p.P188L and p.N355I) were found in six patients. Cryptorchidism, dental agenesis, syndactyly and blue colour blindness were commonly observed in patients with DUSP6 mutations. Six heterozygous IL17RD variants (p.P191L, p.G35V, p.S671L, p.A221T, p.I329M and p.I329V) were found in seven patients. Segregation analysis indicated that 100% (5/5) of probands inherited the IL17RD variants from their unaffected parents, and oligogenicity was found in 4/7 patients. One rare SPRY4 variant (p.T68S) was found in a female patient with Kallmann syndrome who also carried a PLXNA1 mutation. CONCLUSION Our study greatly enriched the genotypic and phenotypic spectra of DUSP6, IL17RD and SPRY4 in IHH. Mutations in DUSP6 alone seem sufficient to cause IHH in an autosomal dominant manner, whereas IL17RD or SPRY4 mutations may cause IHH phenotypes in synergy with variants in other IHH-associated genes.
|
113
|
Yang S, Zhou X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. Am J Hum Genet 2020; 106:679-693. [PMID: 32330416 PMCID: PMC7212266 DOI: 10.1016/j.ajhg.2020.03.013] [Citation(s) in RCA: 53] [Impact Index Per Article: 13.3] [Received: 12/12/2019] [Accepted: 03/30/2020] [Indexed: 01/24/2023] Open
Abstract
Accurate construction of polygenic scores (PGS) can enable early diagnosis of diseases and facilitate the development of personalized medicine. Accurate PGS construction requires prediction models that are both adaptive to different genetic architectures and scalable to biobank scale datasets with millions of individuals and tens of millions of genetic variants. Here, we develop such a method called Deterministic Bayesian Sparse Linear Mixed Model (DBSLMM). DBSLMM relies on a flexible modeling assumption on the effect size distribution to achieve robust and accurate prediction performance across a range of genetic architectures. DBSLMM also relies on a simple deterministic search algorithm to yield an approximate analytic estimation solution using summary statistics only. The deterministic search algorithm, when paired with further algebraic innovations, results in substantial computational savings. With simulations, we show that DBSLMM achieves scalable and accurate prediction performance across a range of realistic genetic architectures. We then apply DBSLMM to analyze 25 traits in UK Biobank. For these traits, compared to existing approaches, DBSLMM achieves an average of 2.03%-101.09% accuracy gain in internal cross-validations. In external validations on two separate datasets, including one from BioBank Japan, DBSLMM achieves an average of 14.74%-522.74% accuracy gain. In these real data applications, DBSLMM is 1.03-28.11 times faster and uses only 7.4%-24.8% of physical memory as compared to other multiple regression-based PGS methods. Overall, DBSLMM represents an accurate and scalable method for constructing PGS in biobank scale datasets.
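For readers unfamiliar with the term, a polygenic score is generically a weighted sum of an individual's allele dosages, with per-variant effect sizes as weights. The sketch below shows only that generic final step; it is not DBSLMM, which concerns how the effect sizes are estimated from summary statistics. The function name, effect sizes, and dosages are hypothetical values chosen for illustration.

```python
def polygenic_score(dosages: list[int], effect_sizes: list[float]) -> float:
    """Weighted sum of allele dosages (0, 1, or 2 copies) by per-variant effect size."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))


# Hypothetical effect sizes for three variants and one individual's dosages.
effect_sizes = [0.5, -0.25, 0.125]
individual = [2, 1, 0]

score = polygenic_score(individual, effect_sizes)
print(score)  # 2*0.5 + 1*(-0.25) + 0*0.125 = 0.75
```

Methods such as the one described above differ in how the weights are obtained, not in this scoring step.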
|
114
|
Yu C, Ni G, van der Werf J, Lee SH. Detecting Genotype-Population Interaction Effects by Ancestry Principal Components. Front Genet 2020; 11:379. [PMID: 32373165 PMCID: PMC7186421 DOI: 10.3389/fgene.2020.00379] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/04/2019] [Accepted: 03/27/2020] [Indexed: 01/22/2023] Open
Abstract
Heterogeneity in the phenotypic mean and variance across populations is often observed for complex traits. One way to understand heterogeneous phenotypes lies in uncovering heterogeneity in genetic effects. Previous studies on genetic heterogeneity across populations were typically based on discrete groups in populations stratified by different countries or cohorts, which ignored the difference of population characteristics for the individuals within each group and resulted in loss of information. Here, we introduce a novel concept of genotype-by-population (G × P) interaction where population is defined by the first and second ancestry principal components (PCs), which are less likely to be confounded with country/cohort-specific factors. We applied a reaction norm model fitting each of 70 complex traits with significant SNP-heritability and the PCs as covariates to examine G × P interactions across diverse populations including white British and other white Europeans from the UK Biobank (N = 22,229). Our results demonstrated a significant population genetic heterogeneity for behavioral traits such as age at first sexual intercourse and academic qualification. Our approach may shed light on the latent genetic architecture of complex traits that underlies the modulation of genetic effects across different populations.
|
115
|
Soller M, Abu-Toamih Atamni HJ, Binenbaum I, Chatziioannou A, Iraqi FA. Designing a QTL Mapping Study for Implementation in the Realized Collaborative Cross Genetic Reference Population. Curr Protoc Mouse Biol 2020; 9:e66. [PMID: 31756057 DOI: 10.1002/cpmo.66] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 12/27/2022]
Abstract
The Collaborative Cross (CC) mouse resource is a next-generation mouse genetic reference population (GRP) designed for high-resolution mapping of quantitative trait loci (QTL) of large effect affecting complex traits during health and disease. The CC resource consists of a set of 72 recombinant inbred lines (RILs) generated by reciprocal crossing of five classical and three wild-derived mouse founder strains. Complex traits are controlled by variations within multiple genes and environmental factors, and their mutual interactions. These traits are observed at multiple levels of the animals' systems, including metabolism, body weight, immune profile, and susceptibility or resistance to the development and progress of infectious or chronic diseases. Herein, we present general guidelines for design of QTL mapping experiments using the CC resource, along with full step-by-step protocols and methods that were implemented in our lab for the phenotypic and genotypic characterization of the different CC lines, for mapping the genes underlying host response to infectious and chronic diseases. © 2019 by John Wiley & Sons, Inc. Basic Protocol 1: CC lines for whole body mass index (BMI). Basic Protocol 2: A detailed assessment of the power to detect effect sizes based on the number of lines used, and the number of replicates per line. Basic Protocol 3: Obtaining power for QTL with given target effect by interpolating in Table 1 of Keele et al. (2019).
|
116
|
Bobbili DR, Banda P, Krüger R, May P. Excess of singleton loss-of-function variants in Parkinson's disease contributes to genetic risk. J Med Genet 2020; 57:617-623. [PMID: 32054687 PMCID: PMC7476273 DOI: 10.1136/jmedgenet-2019-106316] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 05/26/2019] [Revised: 12/04/2019] [Accepted: 01/20/2020] [Indexed: 12/21/2022]
Abstract
BACKGROUND Parkinson's disease (PD) is a neurodegenerative disorder with complex genetic architecture. Besides rare mutations in high-risk genes related to monogenic familial forms of PD, multiple variants associated with sporadic PD were discovered via association studies. METHODS We studied the whole-exome sequencing data of 340 PD cases and 146 ethnically matched controls from the Parkinson's Progression Markers Initiative (PPMI) and performed burden analysis for different rare variant classes. Disease prediction models were built based on clinical, non-clinical and genetic features, including both common and rare variants, and two machine learning methods. RESULTS We observed a significant exome-wide burden of singleton loss-of-function variants (corrected p=0.037). Overall, no exome-wide burden of rare amino acid changing variants was detected. Finally, we built a disease prediction model combining singleton loss-of-function variants, a polygenic risk score based on common variants, and family history of PD as features and reached an area under the curve of 0.703 (95% CI 0.698 to 0.708). By incorporating a rare variant feature, our model increased the performance of the state-of-the-art classification model for the PPMI dataset, which reached an area under the curve of 0.639 based on common variants alone. CONCLUSION This study highlights the contribution of singleton loss-of-function variants to the complex genetics of PD and shows that disease risk prediction models combining singleton and common variants can improve on models built solely on common variants.
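The area under the curve quoted in the Results can be read as the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen control. The sketch below computes that quantity by direct pairwise comparison; the function name and the risk scores are hypothetical and are not taken from the PPMI data or from the authors' models.

```python
def auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Fraction of case/control pairs in which the case scores higher (ties count half)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for three cases and three controls.
cases = [0.9, 0.7, 0.4]
controls = [0.6, 0.4, 0.2]

value = auc(cases, controls)
print(round(value, 3))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which gives a scale for comparing the 0.703 and 0.639 figures reported above.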
|
117
|
Zingaretti LM, Gezan SA, Ferrão LFV, Osorio LF, Monfort A, Muñoz PR, Whitaker VM, Pérez-Enciso M. Exploring Deep Learning for Complex Trait Genomic Prediction in Polyploid Outcrossing Species. Front Plant Sci 2020; 11:25. [PMID: 32117371 PMCID: PMC7015897 DOI: 10.3389/fpls.2020.00025] [Citation(s) in RCA: 56] [Impact Index Per Article: 14.0] [Received: 10/22/2019] [Accepted: 01/10/2020] [Indexed: 05/21/2023]
Abstract
Genomic prediction (GP) is the procedure whereby the genetic merits of untested candidates are predicted using genome wide marker information. Although numerous examples of GP exist in plants and animals, applications to polyploid organisms are still scarce, partly due to limited genome resources and the complexity of this system. Deep learning (DL) techniques comprise a heterogeneous collection of machine learning algorithms that have excelled at many prediction tasks. A potential advantage of DL for GP over standard linear model methods is that DL can potentially take into account all genetic interactions, including dominance and epistasis, which are expected to be of special relevance in most polyploids. In this study, we evaluated the predictive accuracy of linear and DL techniques in two important small fruits or berries: strawberry and blueberry. The two datasets contained a total of 1,358 allopolyploid strawberry (2n=8x=112) and 1,802 autopolyploid blueberry (2n=4x=48) individuals, genotyped for 9,908 and 73,045 single nucleotide polymorphism (SNP) markers, respectively, and phenotyped for five agronomic traits each. DL depends on numerous parameters that influence performance and optimizing hyperparameter values can be a critical step. Here we show that interactions between hyperparameter combinations should be expected and that the number of convolutional filters and regularization in the first layers can have an important effect on model performance. In terms of genomic prediction, we did not find an advantage of DL over linear model methods, except when the epistasis component was important. Linear Bayesian models were better than convolutional neural networks for the full additive architecture, whereas the opposite was observed under strong epistasis. However, by using a parameterization capable of taking into account these non-linear effects, Bayesian linear models can match or exceed the predictive accuracy of DL. 
A semiautomatic implementation of the DL pipeline is available at https://github.com/lauzingaretti/deepGP/.
|
118
|
Gianola D, Fernando RL. A Multiple-Trait Bayesian Lasso for Genome-Enabled Analysis and Prediction of Complex Traits. Genetics 2020; 214:305-331. [PMID: 31879318 PMCID: PMC7017027 DOI: 10.1534/genetics.119.302934] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 07/05/2019] [Accepted: 12/20/2019] [Indexed: 12/21/2022] Open
Abstract
A multiple-trait Bayesian LASSO (MBL) for genome-based analysis and prediction of quantitative traits is presented and applied to two real data sets. The data-generating model is a multivariate linear Bayesian regression on a possibly huge number of molecular markers, with a Gaussian residual distribution posed. Each (one per marker) of the [Formula: see text] vectors of regression coefficients (T: number of traits) is assigned the same T-variate Laplace prior distribution, with a null mean vector and unknown scale matrix Σ. The multivariate prior reduces to that of the standard univariate Bayesian LASSO when [Formula: see text]. The covariance matrix of the residual distribution is assigned a multivariate Jeffreys prior, and Σ is given an inverse-Wishart prior. The unknown quantities in the model are learned using a Markov chain Monte Carlo sampling scheme constructed using a scale-mixture of normal distributions representation. MBL is demonstrated in a bivariate context employing two publicly available data sets using a bivariate genomic best linear unbiased prediction model (GBLUP) for benchmarking results. The first data set is one where wheat grain yields in two different environments are treated as distinct traits. The second data set comes from genotyped Pinus trees, with each individual measured for two traits: rust bin and gall volume. In MBL, the bivariate marker effects are shrunk differentially, i.e., "short" vectors are more strongly shrunk toward the origin than in GBLUP; conversely, "long" vectors are shrunk less. A predictive comparison was carried out as well in wheat, where the comparators of MBL were bivariate GBLUP and bivariate Bayes Cπ, a variable selection procedure. A training-testing layout was used, with 100 random reconstructions of training and testing sets. For the wheat data, all methods produced similar predictions. In Pinus, MBL gave better predictions than either a Bayesian bivariate GBLUP or the single trait Bayesian LASSO. 
MBL has been implemented in the Julia language package JWAS, and is now available for the scientific community to explore with different traits, species, and environments. It is well known that there is no universally best prediction machine, and MBL represents a new resource in the armamentarium for genome-enabled analysis and prediction of complex traits.
|
119
|
Williams-Simon PA, Ganesan M, King EG. Learning to collaborate: bringing together behavior and quantitative genomics. J Neurogenet 2020; 34:28-35. [PMID: 31920134 DOI: 10.1080/01677063.2019.1710145] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/25/2022]
Abstract
The genetic basis of complex traits such as learning and memory has been well studied over the decades. Through those groundbreaking findings, we now have a better understanding of some of the genes and pathways that are involved in learning and/or memory. However, few of these findings identified the naturally segregating variants that influence learning and/or memory within populations. In this special issue honoring the legacy of Troy Zars, we review some of the traditional approaches that have been used to elucidate the genetic basis of learning and/or memory, specifically in fruit flies. We highlight some of his contributions to the field, and specifically describe his vision to bring together behavior and quantitative genomics with the aim of expanding our knowledge of the genetic basis of both learning and memory. Finally, we present some of our recent work in this area using a multiparental population (MPP) as a case study and describe the potential of this approach to advance our understanding of neurogenetics.
|
120
|
Broekema RV, Bakker OB, Jonkers IH. A practical view of fine-mapping and gene prioritization in the post-genome-wide association era. Open Biol 2020; 10:190221. [PMID: 31937202 PMCID: PMC7014684 DOI: 10.1098/rsob.190221] [Citation(s) in RCA: 62] [Impact Index Per Article: 15.5] [Received: 09/13/2019] [Accepted: 12/05/2019] [Indexed: 12/17/2022] Open
Abstract
Over the past 15 years, genome-wide association studies (GWASs) have enabled the systematic identification of genetic loci associated with traits and diseases. However, due to resolution issues and methodological limitations, the true causal variants and genes associated with traits remain difficult to identify. In this post-GWAS era, many biological and computational fine-mapping approaches now aim to solve these issues. Here, we review fine-mapping and gene prioritization approaches that, when combined, will improve the understanding of the underlying mechanisms of complex traits and diseases. Fine-mapping of genetic variants has become increasingly sophisticated: initially, variants were simply overlapped with functional elements, but now the impact of variants on regulatory activity and direct variant-gene 3D interactions can be identified. Moreover, gene manipulation by CRISPR/Cas9, the identification of expression quantitative trait loci and the use of co-expression networks have all increased our understanding of the genes and pathways affected by GWAS loci. However, despite this progress, limitations including the lack of cell-type- and disease-specific data and the ever-increasing complexity of polygenic models of traits pose serious challenges. Indeed, the combination of fine-mapping and gene prioritization by statistical, functional and population-based strategies will be necessary to truly understand how GWAS loci contribute to complex traits and diseases.
|
122
|
Lucas AM, Palmiero NE, McGuigan J, Passero K, Zhou J, Orie D, Ritchie MD, Hall MA. CLARITE Facilitates the Quality Control and Analysis Process for EWAS of Metabolic-Related Traits. Front Genet 2019; 10:1240. [PMID: 31921293 PMCID: PMC6930237 DOI: 10.3389/fgene.2019.01240] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/29/2018] [Accepted: 11/08/2019] [Indexed: 02/03/2023] Open
Abstract
While genome-wide association studies are an established method of identifying genetic variants associated with disease, environment-wide association studies (EWAS) highlight the contribution of nongenetic components to complex phenotypes. However, the lack of high-throughput quality control (QC) pipelines for EWAS data lends itself to analysis plans where the data are cleaned after a first-pass analysis, which can lead to bias, or are cleaned manually, which is arduous and susceptible to user error. We offer novel software, CLeaning to Analysis: Reproducibility-based Interface for Traits and Exposures (CLARITE), as a tool to efficiently clean environmental data, perform regression analysis, and visualize results on a single platform through user-guided automation. It exists as both an R package and a Python package. Though CLARITE focuses on EWAS, it is intended to also improve the QC process for phenotypes and clinical lab measures for a variety of downstream analyses, including phenome-wide association studies and gene-environment interaction studies. With the goal of demonstrating the utility of CLARITE, we performed a novel EWAS in the National Health and Nutrition Examination Survey (NHANES) (N overall Discovery = 9063, N overall Replication = 9874) for body mass index (BMI) and over 300 environment variables post-QC, adjusting for sex, age, race, socioeconomic status, and survey year. The analysis used survey weights along with cluster and strata information in order to account for the complex survey design. Sixteen BMI results replicated at a Bonferroni corrected p < 0.05. The top replicating results were serum levels of γ-tocopherol (vitamin E) (Discovery Bonferroni p: 8.67 × 10⁻¹², Replication Bonferroni p: 2.70 × 10⁻⁹) and iron (Discovery Bonferroni p: 1.09 × 10⁻⁸, Replication Bonferroni p: 1.73 × 10⁻¹⁰). Results of this EWAS are important to consider for metabolic trait analysis, as BMI is tightly associated with these phenotypes. 
As such, exposures predictive of BMI may be useful for covariate and/or interaction assessment of metabolic-related traits. CLARITE allows improved data quality for EWAS, gene-environment interactions, and phenome-wide association studies by establishing a high-throughput quality control infrastructure. Thus, CLARITE is recommended for studying the environmental factors underlying complex disease.
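The replication criterion in this abstract (a Bonferroni corrected p < 0.05 in both the discovery and the replication set) can be illustrated with a minimal Python sketch. This is not CLARITE's API; the function names and the choice to correct each set by its own number of tests are assumptions made only for illustration.

```python
def bonferroni_adjust(pvalues):
    """Multiply each raw p-value by the number of tests, capping at 1.0."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def replicated(discovery, replication, alpha=0.05):
    """Return variables whose Bonferroni-adjusted p is below alpha in both sets.

    `discovery` and `replication` are hypothetical dicts mapping a variable
    name to its raw p-value; both must contain the same variable names.
    Assumption: each set is corrected by its own number of tests.
    """
    names = list(discovery)
    disc_adj = bonferroni_adjust([discovery[n] for n in names])
    rep_adj = bonferroni_adjust([replication[n] for n in names])
    return [n for n, d, r in zip(names, disc_adj, rep_adj)
            if d < alpha and r < alpha]
```

A variable is reported only when it passes the corrected threshold in both sets, which is why a strong discovery signal alone is not counted as a replicated result.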
|
123
|
Privé F, Vilhjálmsson BJ, Aschard H, Blum MGB. Making the Most of Clumping and Thresholding for Polygenic Scores. Am J Hum Genet 2019; 105:1213-1221. [PMID: 31761295 PMCID: PMC6904799 DOI: 10.1016/j.ajhg.2019.11.001] [Citation(s) in RCA: 79] [Impact Index Per Article: 15.8] [Received: 07/09/2019] [Accepted: 10/28/2019] [Indexed: 12/19/2022] Open
Abstract
Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK biobank data and find that SCT substantially improves prediction accuracy with an average AUC increase of 0.035 over standard C+T.
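For readers unfamiliar with C+T, the basic idea can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: it uses only two hyper-parameters (a p value threshold and an LD r² threshold), computes LD across all variant pairs rather than within a window, and the function and parameter names are hypothetical.

```python
import numpy as np


def clump_and_threshold(betas, pvals, genotypes, p_threshold=1e-3, r2_threshold=0.2):
    """Simplified C+T: return kept variant indices and per-individual scores.

    betas, pvals: 1-D arrays of GWAS effect sizes and p-values (one per variant).
    genotypes: 2-D array, individuals x variants, allele counts (0/1/2).
    Assumption: no variant column is constant, so correlations are defined.
    """
    # Thresholding: keep variants with p <= p_threshold, most significant first.
    candidates = [j for j in np.argsort(pvals) if pvals[j] <= p_threshold]

    # Clumping: drop a variant if it is in LD (r^2 >= r2_threshold) with any
    # more significant variant that has already been kept.
    kept = []
    for j in candidates:
        in_ld = any(
            np.corrcoef(genotypes[:, j], genotypes[:, k])[0, 1] ** 2 >= r2_threshold
            for k in kept
        )
        if not in_ld:
            kept.append(j)

    kept = np.array(kept, dtype=int)
    # Polygenic score: effect-size-weighted sum of allele counts.
    scores = genotypes[:, kept] @ betas[kept]
    return kept, scores
```

Running this function over a grid of `p_threshold` and `r2_threshold` values would yield one score per grid point; the stacking step described in the abstract then combines such scores with a penalized regression instead of selecting a single best one.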
|
124
|
Pallares LF. Searching for solutions to the missing heritability problem. eLife 2019; 8:53018. [PMID: 31799931 PMCID: PMC6892610 DOI: 10.7554/elife.53018] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/02/2019] [Accepted: 12/02/2019] [Indexed: 01/04/2023] Open
Abstract
Rare genetic variants in yeast explain a large amount of phenotypic variation in a complex trait like growth.
|
125
|
Hayes BJ, Daetwyler HD. 1000 Bull Genomes Project to Map Simple and Complex Genetic Traits in Cattle: Applications and Outcomes. Annu Rev Anim Biosci 2019; 7:89-102. [PMID: 30508490 DOI: 10.1146/annurev-animal-020518-115024] [Citation(s) in RCA: 184] [Impact Index Per Article: 36.8] [Indexed: 11/09/2022]
Abstract
The 1000 Bull Genomes Project is a collection of whole-genome sequences from 2,703 individuals capturing a significant proportion of the world's cattle diversity. So far, 84 million single-nucleotide polymorphisms (SNPs) and 2.5 million small insertion deletions have been identified in the collection, a very high level of genetic diversity. The project has greatly accelerated the identification of deleterious mutations for a range of genetic diseases, as well as for embryonic lethals. The rate of identification of causal mutations for complex traits has been slower, reflecting the typically small effect size of these mutations and the fact that many are likely in as-yet-unannotated regulatory regions. Both the deleterious mutations that have been identified and the mutations associated with complex trait variation have been included in low-cost SNP array designs, and these arrays are being genotyped in tens of thousands of dairy and beef cattle, enabling management of deleterious mutations in these populations as well as genomic selection.
|