1
Momin MM, Zhou X, Hyppönen E, Benyamin B, Lee SH. Cross-ancestry genetic architecture and prediction for cholesterol traits. Hum Genet 2024; 143:635-648. [PMID: 38536467 DOI: 10.1007/s00439-024-02660-7] [Received: 06/30/2023] [Accepted: 02/13/2024] [Indexed: 05/18/2024]
Abstract
While cholesterol is essential, a high cholesterol level is associated with an increased risk of cardiovascular diseases. Genome-wide association studies (GWASs) have proven successful in identifying genetic variants that are linked to cholesterol levels, predominantly in white European populations. However, the extent to which genetic effects on cholesterol vary across different ancestries remains largely unexplored. Here, we estimate cross-ancestry genetic correlation to address how genetic effects are shared across ancestries. We find significant genetic heterogeneity between ancestries for cholesterol traits. Furthermore, we demonstrate that single nucleotide polymorphisms (SNPs) with concordant effects across ancestries for cholesterol are more frequently found in regulatory regions than in other genomic regions. Indeed, the positive genetic covariance between ancestries is mostly driven by the effects of the concordant SNPs, whereas the genetic heterogeneity is attributed to the discordant SNPs. We also show that the predictive ability of the concordant SNPs is significantly higher than that of the discordant SNPs in cross-ancestry polygenic prediction. The list of concordant SNPs for cholesterol is available in the GWAS Catalog. These findings are relevant to understanding the genetic architecture shared across ancestries and contribute to the development of clinical strategies for polygenic prediction of cholesterol in cross-ancestry settings.
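To illustrate only the concordant/discordant distinction, the sketch below splits SNPs by whether their estimated effects have the same sign in two ancestries and computes a plain Pearson correlation of effects within each set. It is a minimal sketch under stated assumptions: "concordant" is taken to mean same-sign estimates, the effect sizes are hypothetical, and `classify_concordance` and `pearson` are illustrative helpers, not the authors' estimator of cross-ancestry genetic correlation.

```python
import math

def classify_concordance(beta_a, beta_b):
    """Return (concordant, discordant) SNP indices.

    Assumption: 'concordant' means the two ancestry-specific effect
    estimates share a sign; SNPs with a zero estimate are skipped.
    """
    concordant, discordant = [], []
    for i, (a, b) in enumerate(zip(beta_a, beta_b)):
        if a * b > 0:
            concordant.append(i)
        elif a * b < 0:
            discordant.append(i)
    return concordant, discordant

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-SNP effect estimates in two ancestry groups.
beta_eur = [0.10, -0.05, 0.08, 0.02, -0.03]
beta_afr = [0.09, -0.04, -0.06, 0.03, 0.02]

conc, disc = classify_concordance(beta_eur, beta_afr)
r_conc = pearson([beta_eur[i] for i in conc], [beta_afr[i] for i in conc])
r_disc = pearson([beta_eur[i] for i in disc], [beta_afr[i] for i in disc])
print(conc, disc, round(r_conc, 3), round(r_disc, 3))
```

With only two discordant SNPs the correlation is trivially -1; a real analysis would use many SNPs and account for sampling error in the estimates, which this sketch ignores.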
Affiliation(s)
- Md Moksedul Momin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University (CVASU), Khulshi, Chattogram, 4225, Bangladesh
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Xuan Zhou
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Elina Hyppönen
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Clinical and Health Sciences, University of South Australia, Adelaide, SA, Australia
- Beben Benyamin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
2
Zhang S, Jiang Z, Zeng P. Incorporating genetic similarity of auxiliary samples into eGene identification under the transfer learning framework. J Transl Med 2024; 22:258. [PMID: 38461317 PMCID: PMC10924384 DOI: 10.1186/s12967-024-05053-6] [Received: 01/27/2023] [Accepted: 03/01/2024] [Indexed: 03/11/2024]
Abstract
BACKGROUND The term eGene refers to a gene whose expression level is affected by at least one independent expression quantitative trait locus (eQTL). Identifying eQTLs and eGenes is both theoretically and empirically important in genomic studies. However, standard eGene detection methods generally focus on individual cis-variants and cannot efficiently transfer useful knowledge from auxiliary samples to target studies. METHODS We propose a multilocus eGene identification method, TLegene, which integrates shared genetic similarity information from auxiliary studies under the statistical framework of transfer learning. We apply TLegene to eGene identification in ten TCGA cancers that have an explicitly relevant tissue in the GTEx project, learning the genetic effects of variants in TCGA from GTEx. We also apply TLegene to the Geuvadis project to evaluate its usefulness in non-cancer studies. RESULTS We observed substantial correlation of cis-variant genetic effects between TCGA and GTEx for a large number of genes. Furthermore, consistent with our simulation results, TLegene was more powerful than existing methods and identified 169 distinct candidate eGenes, many more than an approach that did not consider knowledge transfer across target and auxiliary studies. Previous studies and functional enrichment analyses provided empirical evidence supporting the associations of the discovered eGenes, which also showed evidence of allelic heterogeneity of gene expression. In addition, TLegene identified more eGenes in Geuvadis and revealed that these eGenes were mainly enriched in EBV-transformed lymphocytes. CONCLUSION Overall, TLegene is a flexible and powerful statistical method for eGene identification through transfer learning of genetic similarity shared across auxiliary and target studies.
Affiliation(s)
- Shuo Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhou Jiang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Xuzhou Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Jiangsu Engineering Research Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
3
Kim MS, Kim HJ, Jin HJ. Genetic association between ADRB2 rs1042713 and elite athletic performances in the Korean population. Gene 2024; 896:148037. [PMID: 38036078 DOI: 10.1016/j.gene.2023.148037] [Received: 10/06/2023] [Revised: 11/15/2023] [Accepted: 11/27/2023] [Indexed: 12/02/2023]
Abstract
Athletic performance is a multifactorial trait influenced by environmental and genetic factors. Previous studies have identified various genes associated with athletic performance, including the β2-adrenergic receptor (ADRB2) gene, which has been consistently linked with elite athletic performance in diverse populations. The ADRB2 gene plays a key role in various biological systems, including cardiovascular, pulmonary, metabolic, and musculoskeletal functions, and acts by interacting with adrenaline. In particular, the ADRB2 rs1042713 (A > G) polymorphism has been associated with cardiovascular and respiratory functions, and its association with athletic performance has also been reported. We therefore conducted a case-control study of the ADRB2 rs1042713 polymorphism in 150 elite athletes, 116 college athletes, and 145 controls (control I) from the Korean population. Genotypes were determined by PCR-RFLP. We found significant differences in genotype (p = 0.005) and allele (p = 0.002) frequency distributions between elite athletes and control II (control I plus college athletes). The ADRB2 rs1042713 G/G genotype [odds ratio (OR) 2.42, 95% CI 1.384-4.235, p = 0.002] and the G allele (OR 1.58, 95% CI 1.184-2.098, p = 0.002) were significantly associated with elite athletic performance. Additionally, we observed a gender-specific association with elite athletic performance in females (p = 0.0002 and p = 0.0002, respectively). In conclusion, our results suggest that the ADRB2 rs1042713 polymorphism may be associated with elite athletic performance in the Korean population. Validating these findings will require additional studies with larger samples, including elite athletes from various sports and of diverse ethnic origins.
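The reported odds ratios come from 2×2 tables of genotype or allele counts in cases and controls. The sketch below computes an odds ratio with a 95% confidence interval using Woolf's (log) method. The counts are hypothetical because the abstract does not report them, and the abstract does not state which interval method the authors used, so Woolf's method is an assumption here.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Woolf 95% CI for a 2x2 table.

    a: cases carrying the risk allele/genotype,    b: cases without it,
    c: controls carrying the risk allele/genotype, d: controls without it.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: Woolf's method needs a continuity correction")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper

# Hypothetical allele counts (G vs. A) in cases and controls.
or_, lower, upper = odds_ratio_ci(a=60, b=40, c=40, d=60)
print(f"OR = {or_:.2f}, 95% CI {lower:.3f}-{upper:.3f}")
```

Because the interval is symmetric on the log scale, the product of its limits equals the squared odds ratio, which is a quick sanity check on any reported Woolf interval.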
Affiliation(s)
- Min Seo Kim
  - Department of Biological Sciences, College of Science & Technology, Dankook University, Cheonan, South Korea
- Hyung Jun Kim
  - Department of Biological Sciences, College of Science & Technology, Dankook University, Cheonan, South Korea
- Han Jun Jin
  - Department of Biological Sciences, College of Science & Technology, Dankook University, Cheonan, South Korea
4
Sun Q, Rowland BT, Chen J, Mikhaylova AV, Avery C, Peters U, Lundin J, Matise T, Buyske S, Tao R, Mathias RA, Reiner AP, Auer PL, Cox NJ, Kooperberg C, Thornton TA, Raffield LM, Li Y. Improving polygenic risk prediction in admixed populations by explicitly modeling ancestral-differential effects via GAUDI. Nat Commun 2024; 15:1016. [PMID: 38310129 PMCID: PMC10838303 DOI: 10.1038/s41467-024-45135-z] [Received: 10/07/2022] [Accepted: 01/16/2024] [Indexed: 02/05/2024]
Abstract
Polygenic risk scores (PRS) have shown success in clinical settings, but most PRS methods focus only on participants with distinct primary continental ancestry and do not accommodate recently admixed individuals, whose genomes have mosaic continental ancestry across segments. Here, we develop GAUDI, a novel penalized-regression-based method specifically designed for admixed individuals. GAUDI explicitly models ancestry-differential effects while borrowing information across segments with shared ancestry in admixed genomes. Through comprehensive simulations and real data analyses, we demonstrate marked advantages of GAUDI over other methods for traits whose associated variants exhibit ancestry-differential effects. Using data from the Women's Health Initiative study, we show that GAUDI improves PRS prediction of white blood cell count and C-reactive protein in African Americans by > 64% compared with alternative methods, and even outperforms PRS-CSx with large European GWAS in some scenarios. We believe GAUDI will be a valuable tool to mitigate disparities in PRS performance in admixed individuals.
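The idea of ancestry-differential effects can be illustrated at the scoring step. The sketch below assumes each SNP's allele dosage has already been split by local ancestry (African vs. European segments) and that ancestry-specific weights are given. How GAUDI estimates those weights by penalized regression is not reproduced, and the function name, inputs, and numbers are hypothetical.

```python
def local_ancestry_prs(dosage_afr, dosage_eur, beta_afr, beta_eur):
    """Polygenic score with ancestry-specific SNP weights (hypothetical helper).

    dosage_afr[j]: effect-allele count at SNP j on haplotypes of African local ancestry
    dosage_eur[j]: effect-allele count at SNP j on haplotypes of European local ancestry
    The two dosages at one SNP sum to at most 2 for a diploid genome.
    """
    n = len(dosage_afr)
    if not (len(dosage_eur) == len(beta_afr) == len(beta_eur) == n):
        raise ValueError("all inputs need one entry per SNP")
    score = 0.0
    for g_afr, g_eur, b_afr, b_eur in zip(dosage_afr, dosage_eur, beta_afr, beta_eur):
        if g_afr + g_eur > 2:
            raise ValueError("more than two alleles at one SNP")
        # Each allele is weighted by the effect estimated for its local ancestry.
        score += b_afr * g_afr + b_eur * g_eur
    return score

score = local_ancestry_prs(
    dosage_afr=[1, 0, 2], dosage_eur=[1, 1, 0],
    beta_afr=[0.20, 0.10, -0.05], beta_eur=[0.10, 0.30, 0.00],
)
print(round(score, 3))
```

A conventional PRS would apply a single weight per SNP regardless of which ancestry segment carries the allele; the split here is what lets effects differ by ancestry.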
Affiliation(s)
- Quan Sun
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Bryce T Rowland
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Jiawen Chen
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Anna V Mikhaylova
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Christy Avery
  - Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Ulrike Peters
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Jessica Lundin
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Tara Matise
  - Department of Genetics, Rutgers University, New Brunswick, NJ, 08901, USA
- Steve Buyske
  - Department of Statistics, Rutgers University, New Brunswick, NJ, 08901, USA
- Ran Tao
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
  - Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Rasika A Mathias
  - Department of Medicine, Johns Hopkins University, Baltimore, MD, 21287, USA
- Alexander P Reiner
  - Department of Epidemiology, University of Washington, Seattle, WA, 98195, USA
- Paul L Auer
  - Division of Biostatistics, Institute for Health and Equity, and Cancer Center, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
- Nancy J Cox
  - Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
  - Division of Genetic Medicine, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Charles Kooperberg
  - Division of Public Health Sciences, Fred Hutchinson Cancer Center, Seattle, WA, 98109, USA
- Timothy A Thornton
  - Department of Biostatistics, University of Washington, Seattle, WA, 98195, USA
- Laura M Raffield
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
- Yun Li
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
  - Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, NC, 27599, USA
5
Bocher O, Gilly A, Park YC, Zeggini E, Morris AP. Bridging the diversity gap: Analytical and study design considerations for improving the accuracy of trans-ancestry genetic prediction. HGG Adv 2023; 4:100214. [PMID: 37448981 PMCID: PMC10336686 DOI: 10.1016/j.xhgg.2023.100214] [Received: 01/27/2023] [Accepted: 06/12/2023] [Indexed: 07/18/2023]
Abstract
Genetic prediction of common complex disease risk is an essential component of precision medicine. Currently, genome-wide association studies (GWASs) are mostly composed of European-ancestry samples, and the resulting polygenic scores (PGSs) have been shown to transfer poorly to other ancestries, partly because allelic effects are heterogeneous between populations. Fixed-effects (FETA) and random-effects (RETA) trans-ancestry meta-analyses do not model such ancestry-related heterogeneity, while ancestry-specific (AS) scores may suffer from low power due to small sample sizes. In contrast, trans-ancestry meta-regression (TAMR) builds ancestry-aware PGSs that account for more complex trans-ancestry architectures. Here, we examine the predictive performance of these four PGSs under multiple genetic architectures and ancestry configurations. We show that the predictive performance of FETA and RETA is strongly affected by cross-ancestry genetic heterogeneity, while AS PGS performance decreases in under-represented target populations. TAMR PGS is also affected by heterogeneity but maintains good predictive performance in most situations, especially in ancestry-diverse scenarios. In simulations of human complex traits, TAMR scores currently explain 25% more phenotypic variance than AS scores for triglyceride levels and 33% more than FETA for type 2 diabetes in most non-European populations. Importantly, a high proportion of non-European-ancestry individuals is needed to reach prediction accuracy in those populations comparable to that observed in European-ancestry studies. Our results highlight the need to rebalance the ancestral composition of GWAS to enable accurate prediction in non-European-ancestry groups, and demonstrate the relevance of meta-regression approaches for compensating for some of the current population biases in GWAS.
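For reference, a fixed-effects (FETA-style) combination pools ancestry-specific estimates by inverse-variance weighting, and Cochran's Q measures how much those estimates disagree. The sketch below shows this standard calculation for a single SNP with hypothetical estimates. It is not TAMR, and the helper name is illustrative.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis with Cochran's Q.

    betas, ses: per-ancestry effect estimates and standard errors for one SNP.
    Q is compared with a chi-squared distribution on len(betas) - 1 df.
    """
    weights = [1.0 / se ** 2 for se in ses]
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q

# Hypothetical estimates in two ancestry groups with equal precision.
beta, se, q = fixed_effects_meta(betas=[0.10, 0.30], ses=[0.05, 0.05])
print(round(beta, 3), round(se, 4), round(q, 2))
```

Here the pooled effect sits halfway between two clearly different estimates, and Q = 8 on 1 df is well above the 5% critical value of about 3.84, flagging exactly the heterogeneity that a fixed-effects score ignores.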
Affiliation(s)
- Eleftheria Zeggini
  - ITG, Helmholtz Zentrum München, Munich, Germany
  - Technical University of Munich, Munich, Germany
  - Klinikum Rechts der Isar, Munich, Germany
- Andrew P. Morris
  - ITG, Helmholtz Zentrum München, Munich, Germany
  - Centre for Genetics and Genomics Versus Arthritis, Centre for Musculoskeletal Research, University of Manchester, Manchester, UK
6
Qiao J, Wu Y, Zhang S, Xu Y, Zhang J, Zeng P, Wang T. Evaluating significance of European-associated index SNPs in the East Asian population for 31 complex phenotypes. BMC Genomics 2023; 24:324. [PMID: 37312035 DOI: 10.1186/s12864-023-09425-y] [Received: 05/20/2022] [Accepted: 06/01/2023] [Indexed: 06/15/2023]
Abstract
BACKGROUND Genome-wide association studies (GWASs) have identified many single-nucleotide polymorphisms (SNPs) associated with complex phenotypes in the European (EUR) population; however, the extent to which EUR-associated SNPs generalize to other populations, such as East Asian (EAS), is not clear. RESULTS Leveraging summary statistics of 31 phenotypes in the EUR and EAS populations, we first evaluated the difference in heritability between the two populations and calculated the trans-ethnic genetic correlation. We observed that heritability estimates for some phenotypes varied substantially across populations and that 53.3% of trans-ethnic genetic correlations were significantly smaller than one. Next, we examined whether EUR-associated SNPs for these phenotypes could be identified in EAS using the trans-ethnic false discovery rate method, while accounting for the winner's curse on SNP effects in EUR and for differences in sample size in EAS. On average, 54.5% of EUR-associated SNPs were also significant in EAS. Furthermore, non-significant SNPs had higher effect heterogeneity, whereas significant SNPs showed more consistent linkage disequilibrium and allele frequency patterns between the two populations. We also showed that non-significant SNPs were more likely to have undergone natural selection. CONCLUSIONS Our study revealed the extent to which EUR-associated SNPs are significant in the EAS population and offers insights into the similarity and diversity of genetic architectures underlying phenotypes in distinct ancestral groups.
Affiliation(s)
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuxuan Wu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuo Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yue Xu
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jinhui Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
7
Zhang J, Zhang S, Qiao J, Wang T, Zeng P. Similarity and diversity of genetic architecture for complex traits between East Asian and European populations. BMC Genomics 2023; 24:314. [PMID: 37308816 DOI: 10.1186/s12864-023-09434-x] [Received: 01/18/2023] [Accepted: 06/07/2023] [Indexed: 06/14/2023]
Abstract
BACKGROUND Genome-wide association studies have detected a large number of single-nucleotide polymorphisms (SNPs) associated with complex traits in diverse ancestral groups. However, the trans-ethnic similarity and diversity of genetic architecture are currently not well understood. RESULTS Leveraging summary statistics of 37 traits from East Asian (Nmax = 254,373) and European (Nmax = 693,529) populations, we first evaluated the trans-ethnic genetic correlation (ρg) and found substantial evidence of shared genetic overlap underlying these traits between the two populations, with estimated ρg ranging from 0.53 (se = 0.11) for adult-onset asthma to 0.98 (se = 0.17) for hemoglobin A1c. However, 88.9% of the genetic correlation estimates were significantly less than one, indicating potential heterogeneity in genetic effects across populations. We next identified common associated SNPs using the conjunction conditional false discovery rate method and observed that 21.7% of trait-associated SNPs could be identified simultaneously in both populations. Among these shared associated SNPs, 20.8% showed heterogeneous effects on traits between the two ancestral populations. Moreover, population-common associated SNPs often exhibited more consistent linkage disequilibrium and allele frequency patterns across ancestral groups than population-specific or null SNPs. We also found that population-specific associated SNPs were much more likely to have undergone natural selection than population-common associated SNPs. CONCLUSIONS Our study provides an in-depth understanding of the similarity and diversity of genetic architecture for complex traits across diverse populations and can assist in trans-ethnic association analysis, genetic risk prediction, and causal variant fine mapping.
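The statement that an estimate is "significantly less than one" can be made concrete with a one-sided Wald test, z = (1 - ρg estimate) / se. The sketch below applies it to the two extreme estimates quoted in the abstract. It assumes the estimate is approximately normally distributed, and the authors' exact testing procedure may differ; `p_less_than_one` is an illustrative helper.

```python
import math

def p_less_than_one(rho_hat, se):
    """One-sided P value for H0: rho_g = 1 vs. H1: rho_g < 1 (normal approximation)."""
    z = (1.0 - rho_hat) / se
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z > z) for a standard normal Z

p_asthma = p_less_than_one(0.53, 0.11)  # adult-onset asthma, values from the abstract
p_hba1c = p_less_than_one(0.98, 0.17)   # hemoglobin A1c, values from the abstract
print(f"{p_asthma:.2e} {p_hba1c:.2f}")
```

Under this approximation the asthma estimate is clearly below one, while the hemoglobin A1c estimate cannot be distinguished from one, consistent with the abstract reporting that most, but not all, estimates were significantly below one.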
Affiliation(s)
- Jinhui Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Shuo Zhang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Jiahao Qiao
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ting Wang
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ping Zeng
  - Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
8
Momin MM, Shin J, Lee S, Truong B, Benyamin B, Lee SH. A method for an unbiased estimate of cross-ancestry genetic correlation using individual-level data. Nat Commun 2023; 14:722. [PMID: 36759513 PMCID: PMC9911789 DOI: 10.1038/s41467-023-36281-x] [Received: 09/29/2021] [Accepted: 01/24/2023] [Indexed: 02/11/2023]
Abstract
Cross-ancestry genetic correlation is an important parameter for understanding the genetic relationship between two ancestry groups. However, existing methods cannot properly account for ancestry-specific genetic architecture, which is diverse across ancestries, and therefore produce biased estimates of cross-ancestry genetic correlation. Here, we present a method to construct a genomic relationship matrix (GRM) that correctly accounts for the relationship between ancestry-specific allele frequencies and ancestry-specific allelic effects. Through comprehensive simulations, we show that the proposed method outperforms existing methods in estimating SNP-based heritability and cross-ancestry genetic correlation. We further apply the method to anthropometric and other complex traits across ancestry groups in the UK Biobank data. For obesity, the estimated genetic correlation between African and European ancestry cohorts is significantly different from unity, suggesting that obesity is genetically heterogeneous between these two ancestries.
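A genomic relationship matrix is commonly built from standardized genotypes as A = ZZᵀ/M, where each SNP is centered and scaled by its allele frequency. The sketch below standardizes each individual's genotypes with allele frequencies estimated within that individual's own ancestry group, one simple way to make a GRM ancestry-aware. This is an illustrative assumption, not the authors' GRM, which per the abstract also accounts for the relationship between ancestry-specific allele frequencies and allelic effects; the function name and toy data are hypothetical.

```python
def ancestry_specific_grm(genotypes, groups):
    """GRM from genotypes standardized within ancestry groups (illustrative only).

    genotypes: list of individuals, each a list of 0/1/2 allele counts per SNP
    groups:    ancestry label for each individual (same order as genotypes)
    Returns an n x n matrix A with A[i][k] = sum_j z_ij * z_kj / M.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Allele frequency of the counted allele, estimated separately per group.
    freqs = {}
    for label in set(groups):
        rows = [genotypes[i] for i in range(n) if groups[i] == label]
        freqs[label] = [sum(r[j] for r in rows) / (2 * len(rows)) for j in range(m)]
    # Standardize: z_ij = (x_ij - 2p_j) / sqrt(2p_j(1 - p_j)); monomorphic SNPs give 0.
    z = []
    for i in range(n):
        p = freqs[groups[i]]
        row = []
        for j in range(m):
            var = 2 * p[j] * (1 - p[j])
            row.append((genotypes[i][j] - 2 * p[j]) / var ** 0.5 if var > 0 else 0.0)
        z.append(row)
    return [[sum(z[i][j] * z[k][j] for j in range(m)) / m for k in range(n)]
            for i in range(n)]

# Hypothetical toy data: four individuals, three SNPs, two ancestry groups.
geno = [[0, 1, 2], [2, 1, 0], [1, 0, 1], [1, 2, 1]]
anc = ["EUR", "EUR", "AFR", "AFR"]
A = ancestry_specific_grm(geno, anc)
for row in A:
    print([round(v, 3) for v in row])
```

Estimating heritability or cross-ancestry genetic correlation would require fitting a mixed model (for example, by REML) on top of such a matrix, which this sketch does not attempt.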
Affiliation(s)
- Md Moksedul Momin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - Department of Genetics and Animal Breeding, Faculty of Veterinary Medicine, Chattogram Veterinary and Animal Sciences University (CVASU), Khulshi, Chattogram, 4225, Bangladesh
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- Jisu Shin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA
  - Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
- Soohyun Lee
  - Division of Animal Breeding and Genetics, National Institute of Animal Science (NIAS), Cheonan, South Korea
- Buu Truong
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
- Beben Benyamin
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
- S Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute (SAHMRI), University of South Australia, Adelaide, SA, 5000, Australia
9
Ogawa S, Taniguchi Y, Watanabe T, Iwaisaki H. Fitting Genomic Prediction Models with Different Marker Effects among Prefectures to Carcass Traits in Japanese Black Cattle. Genes (Basel) 2022; 14:24. [PMID: 36672767 PMCID: PMC9859149 DOI: 10.3390/genes14010024] [Received: 11/30/2022] [Revised: 12/16/2022] [Accepted: 12/20/2022] [Indexed: 12/25/2022]
Abstract
We fitted statistical models assuming single-nucleotide polymorphism (SNP) marker effects that differ among fattened steers marketed into different prefectures to records of cold carcass weight (CW) and marbling score (MS) for 1036, 733, and 279 Japanese Black fattened steers marketed into Tottori, Hiroshima, and Hyogo prefectures in Japan, respectively. Genotype data on 33,059 SNPs were used. Five models were fitted: only SNP effects common to all steers (model 1); common effects plus SNP effects differing between steers marketed into Hyogo prefecture and the others (model 2); only SNP effects differing between Hyogo steers and the others (model 3); common effects plus SNP effects specific to each prefecture (model 4); and only effects specific to each prefecture (model 5). For both traits, all other models gave slightly lower residual variance estimates than model 1. Estimated genetic correlations among prefectures in models 2 and 4 ranged from 0.53 to 0.71, all below 0.8. These results may suggest that SNP effects differ among prefectures to some degree, although the current results should be interpreted with care.
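In models with common plus group-specific SNP effects (models 2 and 4), a genetic correlation between groups follows from the variance components. Assuming each group's SNP effect is b_k = c + s_k, with common effects c of variance σ²_c and mutually independent specific effects s_k of variance σ²_k, the correlation between groups k and l is σ²_c / sqrt((σ²_c + σ²_k)(σ²_c + σ²_l)). The sketch below evaluates that formula with hypothetical variances; the abstract does not say exactly how the authors derived their estimates.

```python
import math

def between_group_rg(var_common, var_specific_k, var_specific_l):
    """Genetic correlation implied by a common + group-specific SNP-effect model.

    Assumes b_k = c + s_k with c, s_k, s_l mutually independent, so
    cov(b_k, b_l) = var_common and var(b_k) = var_common + var_specific_k.
    """
    return var_common / math.sqrt(
        (var_common + var_specific_k) * (var_common + var_specific_l)
    )

print(round(between_group_rg(0.6, 0.4, 0.4), 3))  # equal specific variances
print(round(between_group_rg(0.6, 0.2, 0.5), 3))  # unequal specific variances
```

With equal specific variances the correlation reduces to σ²_c/(σ²_c + σ²_s), so under this model a value below 0.8 means that group-specific effects account for more than 20% of each group's SNP-effect variance.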
Affiliation(s)
- Shinichiro Ogawa
  - Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
  - Division of Meat Animal and Poultry Research, Institute of Livestock and Grassland Science, Tsukuba 305-0901, Japan
- Yukio Taniguchi
  - Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
- Toshio Watanabe
  - National Livestock Breeding Center, Fukushima 961-8511, Japan
  - Maebashi Institute of Animal Science, Livestock Improvement Association of Japan, Inc., Maebashi 371-0121, Japan
- Hiroaki Iwaisaki
  - Graduate School of Agriculture, Kyoto University, Kyoto 606-8502, Japan
  - Sado Island Center for Ecological Sustainability, Niigata University, Niigata 952-0103, Japan
10
Rooney TE, Kunze KH, Sorrells ME. Genome-wide marker effect heterogeneity is associated with a large effect dormancy locus in winter malting barley. Plant Genome 2022; 15:e20247. [PMID: 35971877 DOI: 10.1002/tpg2.20247] [Received: 03/14/2022] [Accepted: 06/20/2022] [Indexed: 06/15/2023]
Abstract
Prediction of trait values in plant breeding populations typically relies on assumptions of marker effect homogeneity across populations. For germination traits in winter malting barley (Hordeum vulgare L.), we present evidence that a single causative large-effect gene in the Seed dormancy 1 region on Chromosome 5H, HvAlaAT1 (Qsd1), leads to genome-wide heterogeneity in estimated marker effects between groups of otherwise related individuals carrying different Qsd1 alleles. This heterogeneity reduced prediction accuracy across alleles when a model was trained on individuals carrying either both alleles or only one allele. Several genomic prediction models were tested to increase prediction accuracy within the Qsd1 allele groups. Small gains (5-12%) in prediction accuracy were realized using structured genomic best linear unbiased predictor models when information about the Qsd1 allele was used to stratify the population. We concluded that a single large-effect locus can lead to heterogeneous marker effects within the same breeding family. Variance partitioning based on large-effect loci can inform best practices in designing genomic prediction models; however, there are likely few cases in which doing so is practical. For malting barley, if germination traits are highly associated with malting quality traits, similar steps should be considered for malting quality trait prediction.
Affiliation(s)
- Travis E Rooney
- Plant Breeding and Genetics Section, School of Integrative Plant Sciences, Cornell Univ., Ithaca, NY, 14853, USA
- Karl H Kunze
- Plant Breeding and Genetics Section, School of Integrative Plant Sciences, Cornell Univ., Ithaca, NY, 14853, USA
- Mark E Sorrells
- Plant Breeding and Genetics Section, School of Integrative Plant Sciences, Cornell Univ., Ithaca, NY, 14853, USA
11
Qiao J, Shao Z, Wu Y, Zeng P, Wang T. Detecting associated genes for complex traits shared across East Asian and European populations under the framework of composite null hypothesis testing. J Transl Med 2022; 20:424. [PMID: 36138484 PMCID: PMC9503281 DOI: 10.1186/s12967-022-03637-8] [Received: 05/08/2022] [Accepted: 09/12/2022] [Indexed: 11/21/2022]
Abstract
Background Detecting trans-ethnic common associated genetic loci can offer important insights into shared genetic components underlying complex diseases/traits across diverse continental populations. However, effective statistical methods for such a goal are currently lacking. Methods By leveraging summary statistics available from global-scale genome-wide association studies, we herein proposed a novel genetic overlap detection method called CONTO (COmposite Null hypothesis test for Trans-ethnic genetic Overlap) from the perspective of high-dimensional composite null hypothesis testing. Unlike previous studies, which generally analyzed individual genetic variants, CONTO is a gene-centric method that considers the set of genetic variants located within a gene simultaneously and assesses their joint significance with the trait of interest. Borrowing the principle of the joint significance test (JST), CONTO takes the maximum P value of multiple associations as the significance measure. Results Compared to JST, which is often overly conservative, CONTO is improved in two respects: the construction of a three-component mixture null distribution and the adjustment for trans-ethnic genetic correlation. Consequently, CONTO corrects the conservativeness of JST with well-calibrated P values and is much more powerful, as validated by extensive simulation studies. We applied CONTO to discover common associated genes for 31 complex diseases/traits between the East Asian and European populations, and identified many shared trait-associated genes that had otherwise been missed by JST. We further revealed that population-common genes were generally more evolutionarily conserved than population-specific or null ones. Conclusion Overall, CONTO represents a powerful method for detecting common associated genes across diverse ancestral groups; our results have important implications for the transferability of GWAS discoveries in one population to others.
Supplementary Information The online version contains supplementary material available at 10.1186/s12967-022-03637-8.
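The joint significance test (JST) that CONTO builds on calls a gene shared only if it is associated in both populations, so the gene's combined P value is the larger of its two population-specific P values. A minimal Python sketch of that baseline max-P idea, assuming gene-level P values for each population are already available; it does not implement CONTO's three-component mixture null or its genetic-correlation adjustment, and the function names, gene labels, and threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def jst_max_p(p_pop1: float, p_pop2: float) -> float:
    """Joint significance test (JST): a gene counts as shared only if it is
    associated in both populations, so its combined P value is the larger one."""
    for p in (p_pop1, p_pop2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("P values must lie in [0, 1]")
    return max(p_pop1, p_pop2)


def shared_genes(pvals: Dict[str, Tuple[float, float]], alpha: float) -> List[str]:
    """Genes whose max-P falls below alpha, sorted by name."""
    return sorted(g for g, (p1, p2) in pvals.items() if jst_max_p(p1, p2) < alpha)


if __name__ == "__main__":
    # Hypothetical gene-level P values: (East Asian, European).
    example = {
        "GENE_A": (1e-8, 3e-7),  # associated in both populations
        "GENE_B": (1e-9, 0.40),  # associated in one population only
        "GENE_C": (0.20, 0.30),  # associated in neither
    }
    print(shared_genes(example, alpha=2.5e-6))  # ['GENE_A']
```

When neither population carries a signal, the maximum of two independent uniform P values falls below a threshold t with probability t squared rather than t, which is one way to see why a plain max-P test is conservative.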
Affiliation(s)
- Jiahao Qiao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Zhonghe Shao
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Yuxuan Wu
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Human Genetics and Environmental Medicine, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Key Laboratory of Environment and Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Engineering Research Innovation Center of Biological Data Mining and Healthcare Transformation, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
- Department of Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
12
Ju D, Hui D, Hammond DA, Wonkam A, Tishkoff SA. Importance of Including Non-European Populations in Large Human Genetic Studies to Enhance Precision Medicine. Annu Rev Biomed Data Sci 2022; 5:321-339. [PMID: 35576557 PMCID: PMC9904154 DOI: 10.1146/annurev-biodatasci-122220-112550] [Indexed: 01/09/2023]
Abstract
One goal of genomic medicine is to uncover an individual's genetic risk for disease, which generally requires data connecting genotype to phenotype, as done in genome-wide association studies (GWAS). While there may be clinical promise in employing prediction tools such as polygenic risk scores (PRS), individuals of non-European ancestry currently may not reap the benefits of genomic medicine because of their underrepresentation in large-scale genetics studies. Here, we discuss why this inequity poses a problem for genomic medicine and the reasons for the low transferability of PRS across populations. We also survey the ancestry representation of published GWAS and investigate how estimates of ancestry diversity in GWAS participants might be biased. We highlight the importance of expanding genetic research in Africa, one of the most underrepresented regions in human genomics research, and discuss issues of ethics, resources, and technology for equitable advancement of genomic medicine.
Affiliation(s)
- Dan Ju
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Daniel Hui
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Graduate Program in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Dorothy A Hammond
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Penn Center for Global Genomics & Health Equity, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Ambroise Wonkam
- Division of Human Genetics, Department of Medicine, Faculty of Health Sciences, University of Cape Town, Cape Town, South Africa
- Department of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland, USA
- Sarah A Tishkoff
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
- Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
13
Yair S, Coop G. Population differentiation of polygenic score predictions under stabilizing selection. Philos Trans R Soc Lond B Biol Sci 2022; 377:20200416. [PMID: 35430887 PMCID: PMC9014188 DOI: 10.1098/rstb.2020.0416] [Received: 09/06/2021] [Accepted: 03/08/2022] [Indexed: 12/15/2022]
Abstract
Given the many small-effect loci uncovered by genome-wide association studies (GWAS), polygenic scores have become central to genomic medicine, and have found application in diverse settings including evolutionary studies of adaptation. Despite their promise, polygenic scores have been found to suffer from limited portability across human populations. This at first seems in conflict with the observation that most common genetic variation is shared among populations. We investigate one potential cause of this discrepancy: stabilizing selection on complex traits. Counterintuitively, while stabilizing selection constrains phenotypic evolution, it accelerates the loss and fixation of alleles underlying trait variation within populations (GWAS loci). Thus even when populations share an optimum phenotype, stabilizing selection erodes the variance contributed by their shared GWAS loci, such that predictions from GWAS in one population explain less of the phenotypic variation in another. We develop theory to quantify how stabilizing selection is expected to reduce the prediction accuracy of polygenic scores in populations not represented in GWAS samples. In addition, we find that polygenic scores can substantially overstate average genetic differences of phenotypes among populations. We emphasize stabilizing selection around a common optimum as a useful null model to connect patterns of allele frequency and polygenic score differentiation. This article is part of the theme issue 'Celebrating 50 years since Lewontin's apportionment of human diversity'.
Affiliation(s)
- Sivan Yair
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
- Graham Coop
- Center for Population Biology and Department of Evolution and Ecology, University of California, Davis, CA 95616, USA
14
Mathieson I. The omnigenic model and polygenic prediction of complex traits. Am J Hum Genet 2021; 108:1558-1563. [PMID: 34331855 PMCID: PMC8456163 DOI: 10.1016/j.ajhg.2021.07.003] [Received: 05/31/2021] [Accepted: 07/08/2021] [Indexed: 12/16/2022]
Abstract
The omnigenic model was proposed as a framework to understand the highly polygenic architecture of complex traits revealed by genome-wide association studies (GWASs). I argue that this model also explains recent observations about cross-population genetic effects, specifically the low transferability of polygenic scores and the lack of clear evidence for polygenic selection. In particular, the omnigenic model explains why the effects of most GWAS variants vary between populations. This interpretation has several consequences for the evolutionary interpretation and practical use of GWAS summary statistics and polygenic scores. First, some polygenic scores may be applicable only in populations of the same ancestry and environment as the discovery population. Second, most GWAS associations will have differing effects between populations and are unlikely to be robust clinical targets. Finally, it may not always be possible to detect polygenic selection from population genetic data. These considerations make it difficult to interpret the clinical and evolutionary meanings of polygenic scores without an explicit model of genetic architecture.
Affiliation(s)
- Iain Mathieson
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
15
Lanca C, Kassam I, Patasova K, Foo LL, Li J, Ang M, Hoang QV, Teo YY, Hysi PG, Saw SM. New Polygenic Risk Score to Predict High Myopia in Singapore Chinese Children. Transl Vis Sci Technol 2021; 10:26. [PMID: 34319387 PMCID: PMC8322707 DOI: 10.1167/tvst.10.8.26] [Indexed: 12/13/2022]
Abstract
Purpose The purpose of this study was to develop an Asian polygenic risk score (PRS) to predict high myopia (HM) in Chinese children in the Singapore Cohort of Risk factors for Myopia (SCORM) cohort. Methods We included children followed from 6 to 11 years old until their teenage years (12–18 years old). Cycloplegic autorefraction, ultrasound biometry, genotypes (Illumina HumanHap 550 or 550 Duo BeadArrays), demographics, and environmental factor data were obtained. The PRS was generated from the Consortium for Refractive Error and Myopia genome-wide association study (n = 542,934) and the Strabismus, Amblyopia, and Refractive Error in Singapore children Study (n = 500). The Growing Up in Singapore Towards healthy Outcomes Cohort study (n = 339) was the replication cohort. The outcome was teenage HM (≤ −5.00 D), with predictive performance assessed using the area under the curve (AUC). Results Mean baseline age ± SD was 7.85 ± 0.84 years (n = 1004), and 571 children attended the teenage visit; 23.3% had HM. In multivariate analysis, the PRS was associated with a myopic spherical equivalent, with an incremental R2 of 0.041 (95% confidence interval [CI] = 0.010, 0.073; P < 0.001). The AUC for HM was higher (P = 0.02) for the model with the PRS (0.77 [95% CI = 0.71, 0.83]) than for the model without it (0.72 [95% CI = 0.65, 0.78]). Children in the top 25% of PRS risk had a 2.34-fold greater risk of HM (95% CI = 1.53, 3.55; P < 0.001). Conclusions The new Asian PRS improved the predictive performance to detect children at risk of HM. Translational Relevance Clinicians may use the PRS with other predictive factors to identify high-risk children and guide interventions to reduce the risk of HM later in life.
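The AUC used above to compare models can be read as the probability that a randomly chosen child who developed HM received a higher predicted risk than a randomly chosen child who did not. A minimal O(n^2) Python sketch of that rank-based (Mann-Whitney) definition, assuming binary outcome labels (1 = HM) and one predicted risk per child; it does not reproduce the study's models or the test used to compare the two AUCs, and the example risks are hypothetical.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(score of a random case > score of a random control),
    counting ties as one half (the Mann-Whitney form of the AUC)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Hypothetical predicted risks for four children; 1 = high myopia as a teenager.
    risk = [0.9, 0.8, 0.4, 0.3]
    high_myopia = [1, 0, 1, 0]
    print(auc_mann_whitney(risk, high_myopia))  # 0.75
```

An AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which gives a scale for the reported 0.72 and 0.77.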
Affiliation(s)
- Carla Lanca
- Singapore Eye Research Institute, Singapore
- Comprehensive Health Research Center (CHRC), Escola Nacional de Saúde Pública, Universidade Nova de Lisboa, Lisboa, Portugal
- Escola Superior de Tecnologia da Saúde de Lisboa (ESTeSL), Instituto Politécnico de Lisboa, Lisboa, Portugal
- Irfahan Kassam
- Saw Swee Hock School of Public Health, National University of Singapore
- Life Sciences Institute, National University of Singapore
- Karina Patasova
- Section of Ophthalmology, School of Life Course Sciences, King's College London, United Kingdom
- Department of Twin Research and Genetic Epidemiology, School of Life Course Sciences, King's College London, UK
- Li-Lian Foo
- Singapore Eye Research Institute, Singapore
- Singapore National Eye Centre, Singapore
- Duke-NUS Medical School, Singapore
- Jonathan Li
- Department of Ophthalmology, University of California San Francisco, San Francisco, CA, USA
- Marcus Ang
- Singapore Eye Research Institute, Singapore
- Singapore National Eye Centre, Singapore
- Duke-NUS Medical School, Singapore
- Quan V Hoang
- Singapore Eye Research Institute, Singapore
- Singapore National Eye Centre, Singapore
- Duke-NUS Medical School, Singapore
- Department of Ophthalmology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore
- Yik-Ying Teo
- Saw Swee Hock School of Public Health, National University of Singapore
- Pirro G Hysi
- Section of Ophthalmology, School of Life Course Sciences, King's College London, United Kingdom
- Department of Twin Research and Genetic Epidemiology, School of Life Course Sciences, King's College London, UK
- UCL Great Ormond Street Hospital Institute of Child Health, University College London, UK
- Seang-Mei Saw
- Singapore Eye Research Institute, Singapore
- Saw Swee Hock School of Public Health, National University of Singapore
- Duke-NUS Medical School, Singapore
16
Lu H, Wang T, Zhang J, Zhang S, Huang S, Zeng P. Evaluating marginal genetic correlation of associated loci for complex diseases and traits between European and East Asian populations. Hum Genet 2021; 140:1285-1297. [PMID: 34091770 DOI: 10.1007/s00439-021-02299-8] [Received: 12/10/2020] [Accepted: 05/31/2021] [Indexed: 12/14/2022]
Abstract
Genome-wide association studies (GWASs) have successfully identified a large number of single-nucleotide polymorphisms associated with many complex phenotypes in diverse populations. However, a comprehensive understanding of the genetic correlation of associated loci of phenotypes across populations remains lacking, and the extent to which associations discovered in one population can be generalized to other populations or utilized for trans-ethnic genetic prediction is also unclear. By leveraging summary statistics, we proposed MAGIC to evaluate the trans-ethnic marginal genetic correlation (rm) of per-allele effect sizes for associated SNPs (P < 5E-8) under the framework of measurement error models. We confirmed the methodological advantage of MAGIC over general approaches through simulations and demonstrated its utility by analyzing 34 GWAS summary statistics of phenotypes from the East Asian (Nmax = 254,373) and European (Nmax = 1,220,901) populations. Among these phenotypes, rm was estimated to range from 0.584 (se = 0.140) for breast cancer to 0.949 (se = 0.035) for age of menarche, with an average of 0.835 (se = 0.045). We also found that trans-ethnic genetic prediction accuracy for phenotypes in the target population became substantially lower when using associated SNPs identified in non-target populations, indicating that associations discovered in one population cannot simply be generalized to another population and that the accuracy of trans-ethnic phenotype prediction is generally unsatisfactory. Overall, our study provides in-depth insight into trans-ethnic genetic correlation and prediction for complex phenotypes across diverse populations.
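Under a classical measurement-error model, each estimated per-allele effect equals the true effect plus sampling noise with a known standard error, so the raw correlation of estimated effects between two populations is attenuated toward zero. A minimal method-of-moments sketch of that correction in Python, assuming independent (non-overlapping) samples so that sampling errors are uncorrelated across populations; it illustrates the general measurement-error idea and is not the MAGIC estimator itself. The simulated effects and all function names are hypothetical.

```python
import numpy as np


def corrected_effect_correlation(b1, se1, b2, se2) -> float:
    """Correlation of true per-allele effects between two populations, estimated
    from noisy GWAS estimates by removing the average sampling variance.

    Assumes b_hat = b + e with Var(e) = se**2 and sampling errors independent
    between populations (non-overlapping samples)."""
    b1, se1, b2, se2 = (np.asarray(x, dtype=float) for x in (b1, se1, b2, se2))
    cov = np.cov(b1, b2, ddof=1)[0, 1]             # independent errors add no covariance
    var1 = np.var(b1, ddof=1) - np.mean(se1 ** 2)  # true-effect variance, population 1
    var2 = np.var(b2, ddof=1) - np.mean(se2 ** 2)  # true-effect variance, population 2
    if var1 <= 0 or var2 <= 0:
        raise ValueError("sampling noise exceeds the observed effect-size variance")
    return float(cov / np.sqrt(var1 * var2))


def simulate(n=20_000, rho=0.8, se=0.5, seed=1):
    """Hypothetical SNP effects with true cross-population correlation rho."""
    rng = np.random.default_rng(seed)
    true = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    b1 = true[:, 0] + rng.normal(0.0, se, size=n)
    b2 = true[:, 1] + rng.normal(0.0, se, size=n)
    s = np.full(n, se)
    return b1, s, b2, s


if __name__ == "__main__":
    b1, se1, b2, se2 = simulate()
    print("naive r:    ", round(float(np.corrcoef(b1, b2)[0, 1]), 3))  # attenuated toward 0
    print("corrected r:", round(corrected_effect_correlation(b1, se1, b2, se2), 3))
```

With unit true-effect variances and se = 0.5, the naive correlation is shrunk by a factor of 1/(1 + 0.25), so a true value of 0.8 appears as roughly 0.64, while the corrected estimate should land near 0.8. Restricting to genome-wide significant SNPs, as the abstract describes, also introduces selection (winner's curse) effects that this sketch ignores.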
Affiliation(s)
- Haojie Lu
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ting Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Jinhui Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuo Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Shuiping Huang
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Ping Zeng
- Department of Epidemiology and Biostatistics, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
- Center for Medical Statistics and Data Analysis, School of Public Health, Xuzhou Medical University, Xuzhou, 221004, Jiangsu, China
17
Askland KD, Strong D, Wright MN, Moore JH. The Translational Machine: A novel machine-learning approach to illuminate complex genetic architectures. Genet Epidemiol 2021; 45:485-536. [PMID: 33942369 DOI: 10.1002/gepi.22383] [Received: 10/23/2020] [Revised: 03/05/2021] [Accepted: 03/23/2021] [Indexed: 11/08/2022]
Abstract
The Translational Machine (TM) is a machine learning (ML)-based analytic pipeline that translates genotypic/variant call data into biologically contextualized features that richly characterize complex variant architectures and permit greater interpretability and biological replication. It also reduces potentially confounding effects of population substructure on outcome prediction. The TM consists of three main components. First, replicable but flexible feature engineering procedures translate genome-scale data into biologically informative features that appropriately contextualize simple variant calls/genotypes within biological and functional contexts. Second, model-free, nonparametric ML-based feature filtering procedures empirically reduce dimensionality and noise of both original genotype calls and engineered features. Third, a powerful ML algorithm for feature selection is used to differentiate risk variant contributions across variant frequency and functional prediction spectra. The TM simultaneously evaluates potential contributions of variants operative under polygenic and heterogeneous models of genetic architecture. Our TM enables integration of biological information (e.g., genomic annotations) within conceptual frameworks akin to geneset-/pathways-based and collapsing methods, but overcomes some of these methods' limitations. The full TM pipeline is executed in R. Our approach and initial findings from its application to a whole-exome schizophrenia case-control data set are presented. These TM procedures extend the findings of the primary investigation and yield novel results.
Affiliation(s)
- Kathleen D Askland
- Waypoint Centre for Mental Health Care Penetanguishene, University of Toronto, Toronto, Ontario, Canada
- David Strong
- Department of Family Medicine and Public Health, University of California San Diego, San Diego, California, USA
- Marvin N Wright
- Department Biometry and Data Management, Leibniz Institute for Prevention Research and Epidemiology - BIPS GmbH, Germany
- Jason H Moore
- Department of Biostatistics, Epidemiology, & Informatics, The Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
18
Lopez-Cruz M, de Los Campos G. Optimal breeding-value prediction using a sparse selection index. Genetics 2021; 218:iyab030. [PMID: 33748861 PMCID: PMC8128408 DOI: 10.1093/genetics/iyab030] [Received: 09/30/2020] [Accepted: 02/13/2021] [Indexed: 02/06/2023]
Abstract
Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-Best Linear Unbiased Predictor (G-BLUP) (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in 10 different environments) that the SSI can achieve significant (anywhere between 5 and 10%) gains in prediction accuracy relative to the G-BLUP.
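The abstract's claim that G-BLUP is the λ = 0 case can be made concrete with a penalized-quadratic view of the index. A minimal Python sketch, assuming that the index weights for one prediction individual minimize 0.5*b'Pb - g'b + λ*||b||_1, where P = G_tt + θI is the scaled phenotypic covariance among training individuals, g holds the genomic relationships between the target and the training set, and θ is a fixed variance ratio; with λ = 0 the solution is P^{-1}g, the G-BLUP index weights. The coordinate-descent solver, the toy genotypes, and all names are assumptions for illustration and are not the authors' software.

```python
import numpy as np


def soft_threshold(x: float, lam: float) -> float:
    """Shrink x toward zero by lam (the proximal step for an L1 penalty)."""
    return float(np.sign(x) * max(abs(x) - lam, 0.0))


def ssi_weights(P: np.ndarray, g: np.ndarray, lam: float,
                tol: float = 1e-10, max_iter: int = 10_000) -> np.ndarray:
    """Coordinate descent for  min_b 0.5*b'Pb - g'b + lam*||b||_1.

    P: scaled phenotypic covariance among training individuals, G_tt + theta*I.
    g: genomic relationships between one target individual and the training set.
    With lam = 0 the solution is P^{-1} g, i.e. the G-BLUP index weights.
    """
    P = np.asarray(P, dtype=float)
    g = np.asarray(g, dtype=float)
    beta = np.zeros_like(g)
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(g.size):
            # Partial residual for coordinate j (excludes beta[j]'s own term).
            r_j = g[j] - P[j] @ beta + P[j, j] * beta[j]
            new = soft_threshold(r_j, lam) / P[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def toy_problem(seed: int = 0, n: int = 30, p: int = 200,
                n_train: int = 25, theta: float = 1.0):
    """Hypothetical data: random 0/1/2 genotypes -> G, then P and g for one target."""
    rng = np.random.default_rng(seed)
    X = rng.integers(0, 3, size=(n, p)).astype(float)
    X -= X.mean(axis=0)                  # centre each marker
    G = X @ X.T / p                      # simple genomic relationship matrix
    P = G[:n_train, :n_train] + theta * np.eye(n_train)
    g = G[:n_train, n_train]             # target = first non-training individual
    return P, g


if __name__ == "__main__":
    P, g = toy_problem()
    dense = ssi_weights(P, g, lam=0.0)
    print("max |SSI(lam=0) - G-BLUP|:", np.max(np.abs(dense - np.linalg.solve(P, g))))
    for lam in (0.05, 0.2):
        w = ssi_weights(P, g, lam)
        print(f"lam={lam}: {np.count_nonzero(w)} of {w.size} training points kept")
```

The prediction for the target individual would then be the weighted sum of centred training phenotypes, w @ y_train; increasing λ drives more weights exactly to zero, which is how a sparse index selects a subset of support points per individual.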
Affiliation(s)
- Marco Lopez-Cruz
- Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
- Gustavo de Los Campos
- Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI 48824, USA
- Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
- Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, MI 48824, USA
19
Guo J, Bakshi A, Wang Y, Jiang L, Yengo L, Goddard ME, Visscher PM, Yang J. Quantifying genetic heterogeneity between continental populations for human height and body mass index. Sci Rep 2021; 11:5240. [PMID: 33664403 PMCID: PMC7933291 DOI: 10.1038/s41598-021-84739-z] [Received: 11/28/2019] [Accepted: 02/22/2021] [Indexed: 12/26/2022]
Abstract
Genome-wide association studies (GWAS) in samples of European ancestry have identified thousands of genetic variants associated with complex traits in humans. However, it remains largely unclear whether these associations can be used in non-European populations. Here, we seek to quantify the proportion of genetic variation for a complex trait shared between continental populations. We estimated the between-population correlation of genetic effects at all SNPs ($r_{g}$) or genome-wide significant SNPs ($r_{g(GWS)}$) for height and body mass index (BMI) in samples of European (EUR; $n = 49{,}839$) and African (AFR; $n = 17{,}426$) ancestry. The $\hat{r}_{g}$ between EUR and AFR was 0.75 (s.e. = 0.035) for height and 0.68 (s.e. = 0.062) for BMI, and the corresponding $\hat{r}_{g(GWS)}$ was 0.82 (s.e. = 0.030) for height and 0.87 (s.e. = 0.064) for BMI, suggesting that a large proportion of GWAS findings discovered in Europeans are likely applicable to non-Europeans for height and BMI. There was no evidence that $\hat{r}_{g}$ differs in SNP groups with different levels of between-population difference in allele frequency or linkage disequilibrium, which, however, can be due to the lack of power.
Affiliation(s)
- Jing Guo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Human Genetics, Wellcome Sanger Institute, Hinxton, CB10 1SA, UK
- Andrew Bakshi
- Monash Partners Comprehensive Cancer Consortium, Monash Biomedicine Discovery Institute Cancer Program, Prostate Cancer Research Group, Department of Anatomy and Developmental Biology, Monash University, Clayton, VIC, 3800, Australia
- Queensland Brain Institute, The University of Queensland, Brisbane, QLD, 4072, Australia
- Ying Wang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Longda Jiang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Loic Yengo
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Michael E Goddard
- Faculty of Veterinary and Agricultural Science, University of Melbourne, Parkville, VIC, Australia
- Biosciences Research Division, Department of Economic Development, Jobs, Transport and Resources, Bundoora, VIC, Australia
- Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- Jian Yang
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, QLD, 4072, Australia
- School of Life Sciences, Westlake University, Hangzhou, 310024, Zhejiang, China
- Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, 310024, Zhejiang, China
20
Kosińska-Selbi B, Suchocki T, Egger-Danner C, Schwarzenbacher H, Frąszczak M, Szyda J. Exploring the Potential Genetic Heterogeneity in the Incidence of Hoof Disorders in Austrian Fleckvieh and Braunvieh Cattle. Front Genet 2020; 11:577116. [PMID: 33281874 PMCID: PMC7705352 DOI: 10.3389/fgene.2020.577116] [Received: 06/28/2020] [Accepted: 10/21/2020] [Indexed: 11/13/2022]
Abstract
Genetic heterogeneity denotes the situation when different genetic architectures underlying diverse populations result in the same phenotype. In this study, we explore the genetic background underlying differences in the incidence of hoof disorders between Braunvieh and Fleckvieh cattle in the context of genetic heterogeneity between the breeds. Despite the potentially higher power of testing afforded by a twice-as-large sample size, none of the SNPs was significantly associated with the total number of hoof disorders in Fleckvieh, while 15 SNPs were significant in Braunvieh. The most promising candidate genes in Braunvieh were as follows: CBLB on BTA1, which causes arthritis in rats; CAV2 on BTA4, which affects skeletal muscles in mice; PTHLH on BTA5, which causes disease phenotypes related to the skeleton in humans, mice, and zebrafish; and SORCS2 on BTA6, which causes decreased susceptibility to injury in mice. Some of the significant SNPs (BTA1, BTA4, BTA5, BTA13, and BTA16) revealed allelic heterogeneity, i.e., different allele frequencies between Fleckvieh and Braunvieh. Some of the significant regions (BTA1, BTA5, BTA13, and BTA16) correlated with inter-breed differences in linkage disequilibrium (LD) structure and may thus represent false-positive heterogeneity. However, positions on BTA6 (SORCS2), BTA14, and BTA24 mark Braunvieh-specific regions. We hypothesize that the observed genetic heterogeneity of hoof disorders is a by-product of the different selection goals defined for the analyzed breeds: toward dairy production in Braunvieh and toward beef production in Fleckvieh. Based on the current dataset, it is not possible to unequivocally confirm or exclude the hypothesis of genetic heterogeneity in the susceptibility to hoof disorders between Fleckvieh and Braunvieh. The main reason is that the potential heterogeneity was explored through SNP-phenotype associations rather than through causal mutations, owing to the limited SNP density of the SNP chip.
Arguments against genetic heterogeneity include limited power to detect true associations, as well as differences in the length of LD blocks and in linkage phase between breeds. On the other hand, the different selection goals defined for the analyzed breeds, together with the absence of systematic, genome-wide differences in LD structure between the breeds, favor the heterogeneity hypothesis in some smaller genomic regions.
Affiliation(s)
- Barbara Kosińska-Selbi
- Biostatistic Group, Department of Genetics, Wrocław University of Environmental and Life Sciences, Wrocław, Poland
- Tomasz Suchocki
- Biostatistic Group, Department of Genetics, Wrocław University of Environmental and Life Sciences, Wrocław, Poland
- National Research Institute of Animal Production, Balice, Poland
- Magdalena Frąszczak
- Biostatistic Group, Department of Genetics, Wrocław University of Environmental and Life Sciences, Wrocław, Poland
- Joanna Szyda
- Biostatistic Group, Department of Genetics, Wrocław University of Environmental and Life Sciences, Wrocław, Poland
- National Research Institute of Animal Production, Balice, Poland
21
Abstract
Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level, and provide a potential route to the use of genetic data in personalized medical care. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. To address this question, we investigate the performance of PRS for height in cohorts with admixed African and European ancestry, allowing us to evaluate ancestry-related differences in PRS predictive accuracy while controlling for environment and cohort differences. We first show that the predictive accuracy of height PRS increases linearly with European ancestry and is partially explained by European ancestry segments of the admixed genomes. We show that recombination rate, differences in allele frequencies, and differences in marginal effect sizes across ancestries all contribute to the decrease in predictive power, but none of these effects explain the decrease on its own. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts.
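The final point, combining scores built from ancestry-specific effect sizes, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' method: the genotypes, the two sets of per-SNP effect sizes (labelled EUR and AFR here), and the phenotype are hypothetical toy values, and the combination weights are fit by ordinary least squares with an intercept, which is only one possible way to form the linear combination.

```python
# Hypothetical toy data: allele dosages (0, 1 or 2) at three SNPs per person,
# and two hypothetical sets of per-SNP effect sizes from ancestry-specific GWAS.
BETA_EUR = [0.2, -0.1, 0.4]
BETA_AFR = [0.1, 0.3, -0.2]
GENOTYPES = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 2], [0, 0, 1]]


def prs(dosages, betas):
    """Polygenic score: sum over SNPs of allele dosage times effect size."""
    return sum(d * b for d, b in zip(dosages, betas))


def mean(xs):
    return sum(xs) / len(xs)


def fit_two_scores(y, s1, s2):
    """Ordinary least squares for y ~ w0 + w1*s1 + w2*s2; returns (w0, w1, w2)."""
    my, m1, m2 = mean(y), mean(s1), mean(s2)
    # Centred sums of squares and cross-products (the 2x2 normal equations).
    s11 = sum((a - m1) ** 2 for a in s1)
    s22 = sum((b - m2) ** 2 for b in s2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    s1y = sum((a - m1) * (t - my) for a, t in zip(s1, y))
    s2y = sum((b - m2) * (t - my) for b, t in zip(s2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("scores are collinear; weights are not identifiable")
    w1 = (s22 * s1y - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    return my - w1 * m1 - w2 * m2, w1, w2


score_eur = [prs(g, BETA_EUR) for g in GENOTYPES]
score_afr = [prs(g, BETA_AFR) for g in GENOTYPES]
# Illustrative phenotype built as an exact linear combination of the two scores,
# so the fitted weights can be checked against known values.
phenotype = [1.0 + 0.5 * a + 2.0 * b for a, b in zip(score_eur, score_afr)]
weights = fit_two_scores(phenotype, score_eur, score_afr)
print(weights)  # approximately (1.0, 0.5, 2.0)
```

With real data the phenotype would not be an exact combination, and the fitted weights would indicate how much each ancestry-specific score contributes to prediction in the target cohort.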
22
Rio S, Moreau L, Charcosset A, Mary-Huard T. Accounting for Group-Specific Allele Effects and Admixture in Genomic Predictions: Theory and Experimental Evaluation in Maize. Genetics 2020; 216:27-41. [PMID: 32680885 PMCID: PMC7463286 DOI: 10.1534/genetics.120.303278] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 05/19/2020] [Accepted: 07/10/2020] [Indexed: 02/01/2023]
Abstract
Populations structured into genetic groups may display group-specific linkage disequilibrium, mutations, and/or interactions between quantitative trait loci and the genetic background. These factors lead to heterogeneous marker effects affecting the efficiency of genomic prediction, especially for admixed individuals. Such individuals have a genome that is a mosaic of chromosome blocks of different origins, and they may be of interest for combining favorable group-specific characteristics. We developed two genomic prediction models adapted to the prediction of admixed individuals in the presence of heterogeneous marker effects: multigroup admixed genomic best linear unbiased prediction random individual (MAGBLUP-RI), which models the ancestry of alleles; and multigroup admixed genomic best linear unbiased prediction random allele effect (MAGBLUP-RAE), which models group-specific distributions of allele effects. MAGBLUP-RI can estimate the segregation variance generated by admixture, while MAGBLUP-RAE can disentangle the variability due to main allele effects from the variability due to group-specific deviation allele effects. Both models were evaluated for genomic prediction accuracy using a maize panel including lines from the Dent and Flint groups, along with admixed individuals. On simulated traits, both models improved genomic prediction accuracy compared with standard GBLUP models. For real traits, a clear gain was observed at low marker densities, whereas it became limited at high marker densities. The value of including admixed individuals in multigroup training sets was confirmed using simulated traits but was variable using real traits. Both MAGBLUP models and admixed individuals are of interest whenever group-specific SNP allele effects exist.
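The idea of group-specific allele effects in an admixed genome can be illustrated with a deliberately simplified sketch. It is not MAGBLUP-RI or MAGBLUP-RAE, which treat effects as random in a mixed model and estimate them from data; here the main effects and group deviations are hypothetical fixed values, and the ancestry (Dent or Flint) of each chromosome segment is assumed to be known. The sketch only shows how each allele's effect can be decomposed into a main effect plus a deviation specific to the group its segment descends from.

```python
# Hypothetical toy parameters for three loci (not estimates from the study).
MAIN_EFFECT = [0.5, -0.2, 0.1]
GROUP_DEVIATION = {
    "Dent": [0.1, 0.0, -0.05],
    "Flint": [-0.3, 0.05, 0.05],
}


def genetic_value(haplotypes, use_group_effects=True):
    """Sum allele effects over loci for one diploid individual.

    haplotypes[l] is a pair of (allele, origin) tuples for locus l, where
    allele is 1 if the counted allele is carried (else 0) and origin is the
    genetic group the chromosome segment descends from (assumed known).
    """
    total = 0.0
    for locus, pair in enumerate(haplotypes):
        for allele, origin in pair:
            effect = MAIN_EFFECT[locus]
            if use_group_effects:
                # Group-specific allele effect = main effect + group deviation.
                effect += GROUP_DEVIATION[origin][locus]
            total += allele * effect
    return total


# A hypothetical admixed individual: one haplotype of Dent origin, one of Flint.
admixed = [
    ((1, "Dent"), (1, "Flint")),
    ((0, "Dent"), (1, "Flint")),
    ((1, "Dent"), (0, "Flint")),
]
print(genetic_value(admixed))         # with group-specific deviations
print(genetic_value(admixed, False))  # assuming homogeneous allele effects
```

Comparing the two calls shows how ignoring group-specific deviations changes the predicted value of the same admixed genome, which is the situation in which the abstract reports MAGBLUP-type models to be useful.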
Affiliation(s)
- Simon Rio
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Laurence Moreau
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Alain Charcosset
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- Tristan Mary-Huard
- Université Paris-Saclay, INRAE, CNRS, AgroParisTech, GQE-Le Moulon, 91190 Gif-sur-Yvette, France
- MIA, INRAE, AgroParisTech, Université Paris-Saclay, 75005 Paris, France
23
Funkhouser SA, Vazquez AI, Steibel JP, Ernst CW, de Los Campos G. Deciphering Sex-Specific Genetic Architectures Using Local Bayesian Regressions. Genetics 2020; 215:231-241. [PMID: 32198180 PMCID: PMC7198271 DOI: 10.1534/genetics.120.303120] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/22/2019] [Accepted: 03/01/2020] [Indexed: 11/18/2022]
Abstract
Many complex human traits exhibit differences between sexes. While numerous factors likely contribute to this phenomenon, growing evidence from genome-wide studies suggests a partial explanation: that males and females from the same population possess differing genetic architectures. Despite this, mapping gene-by-sex (G×S) interactions remains a challenge, likely because the magnitude of such interactions is typically exceedingly small; traditional genome-wide association techniques may be underpowered to detect them, partly because of the burden of multiple-testing correction. Here, we developed a local Bayesian regression (LBR) method to estimate sex-specific SNP marker effects after fully accounting for local linkage disequilibrium (LD) patterns. This enabled us to infer sex-specific effects and G×S interactions either at the single-SNP level or by aggregating the effects of multiple SNPs to make inferences at the level of small LD-based regions. Using simulations in which there was imperfect LD between SNPs and causal variants, we showed that aggregating sex-specific marker effects with LBR provides improved power and resolution to detect G×S interactions over traditional single-SNP-based tests. When using LBR to analyze traits from the UK Biobank, we detected a relatively large G×S interaction impacting bone mineral density within ABO, and replicated many previously detected large-magnitude G×S interactions impacting waist-to-hip ratio. We also discovered many new G×S interactions impacting traits such as height and body mass index (BMI) within regions of the genome where both male- and female-specific effects explain a small proportion of phenotypic variance (R² < 1 × 10⁻⁴) but are enriched in known expression quantitative trait loci.
Affiliation(s)
- Scott A Funkhouser
- Institute for Behavioral Genetics, The University of Colorado, Boulder, Colorado 80309
- Genetics Graduate Program, Michigan State University, East Lansing, Michigan 48824
- Ana I Vazquez
- Departments of Epidemiology and Biostatistics and Statistics and Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan, 48824
- Juan P Steibel
- Department of Animal Science, Michigan State University, East Lansing, Michigan, 48824
- Catherine W Ernst
- Department of Animal Science, Michigan State University, East Lansing, Michigan, 48824
- Gustavo de Los Campos
- Departments of Epidemiology and Biostatistics and Statistics and Probability, Institute for Quantitative Health Science and Engineering, Michigan State University, East Lansing, Michigan, 48824
24
Melamud E, Taylor DL, Sethi A, Cule M, Baryshnikova A, Saleheen D, van Bruggen N, FitzGerald GA. The promise and reality of therapeutic discovery from large cohorts. J Clin Invest 2020; 130:575-581. [PMID: 31929188 PMCID: PMC6994121 DOI: 10.1172/jci129196] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 01/08/2023]
Abstract
Technological advances in rapid data acquisition have transformed medical biology into a data-mining field, where new data sets are routinely dissected and analyzed by statistical models of ever-increasing complexity. Many hypotheses can be generated and tested within a single large data set, and even small effects can be statistically discriminated from a sea of noise. The development of therapeutic interventions, on the other hand, moves at a much slower pace. Interventions are evaluated in carefully randomized, well-controlled experiments with explicitly stated outcomes, the principal mechanism by which a single hypothesis is tested. In this paradigm, only a small fraction of interventions can be tested, and an even smaller fraction are ultimately deemed therapeutically successful. In this Review, we propose strategies to leverage large-cohort data to inform the selection of targets and the design of randomized trials of novel therapeutics. Ultimately, the incorporation of big data and experimental medicine approaches should aim to reduce the failure rate of clinical trials as well as expedite and lower the cost of drug development.
Affiliation(s)
- Eugene Melamud
- Calico Life Sciences LLC, South San Francisco, California, USA
- Anurag Sethi
- Calico Life Sciences LLC, South San Francisco, California, USA
- Madeleine Cule
- Calico Life Sciences LLC, South San Francisco, California, USA
- Garret A. FitzGerald
- Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
25
Affiliation(s)
- Yixin Wang
- Department of Statistics, Columbia University, New York, NY
- David M. Blei
- Department of Statistics, Columbia University, New York, NY
- Department of Computer Science, Columbia University, New York, NY