1
Atanda SA, Bandillo N. Genomic-inferred cross-selection methods for multi-trait improvement in a recurrent selection breeding program. Plant Methods 2024; 20:133. [PMID: 39218896 PMCID: PMC11367796 DOI: 10.1186/s13007-024-01258-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Accepted: 08/05/2024] [Indexed: 09/04/2024]
Abstract
The major drawback to implementing genomic selection in a breeding program is the long-term decrease in additive genetic variance, which is the trade-off for rapid genetic improvement in the short term. Balancing the increase in genetic gain with the retention of additive genetic variance requires careful optimization of this trade-off. In this study, we proposed an integrated index selection approach within the genomic-inferred cross-selection (GCS) framework to maximize genetic gain across multiple traits. With this method, we identified optimal crosses that simultaneously maximize progeny performance and maintain genetic variance for multiple traits. Using a stochastic simulation of a recurrent breeding program over a 40-year period, we evaluated different GCS methods along with other factors, such as the number of parents, crosses, and progeny per cross, that influence genetic gain in a pulse crop breeding program. Across all breeding scenarios, the posterior mean variance consistently enhances genetic gain when compared to other methods, such as the usefulness criterion, optimal haploid value, mean genomic estimated breeding value, and mean index selection value of the superior parents. In addition, we provide a detailed strategy to optimize the number of parents, crosses, and progeny per cross that can potentially maximize short- and long-term genetic gain in a public breeding program.
Affiliation(s)
- Sikiru Adeniyi Atanda
  - Agricultural Data Analytics Unit, North Dakota State University, Fargo, ND, 58105-6050, USA
- Nonoy Bandillo
  - Department of Plant Sciences, North Dakota State University, Fargo, ND, 58108-6050, USA

2
Mujica PC, Martinez V. A purebred South American breed showing high effective population size and independent breed ancestry: The Chilean Terrier. Anim Genet 2023; 54:772-785. [PMID: 37778752 DOI: 10.1111/age.13359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Revised: 08/31/2023] [Accepted: 09/09/2023] [Indexed: 10/03/2023]
Abstract
The Chilean Terrier is a known breed in Chile that has not been genetically assessed despite its distinctive color patterns, agility, and hardiness across the diversity of climates encountered within the Chilean landscape. The population structure and its relatedness with other breeds, as well as the actual origin of the breed, remain unknown. We estimated several population parameters using samples from individuals representing the distribution of the Chilean Terrier across the country. Using the Illumina HD canine genotyping array, we computed the effective population size (Ne), individual inbreeding, and relatedness to evaluate the genetic diversity of the breed. The results show that linkage disequilibrium was relatively low and decayed rapidly; in fact, Ne was very high when compared to other breeds, and similar to other American indigenous breeds (such as the Chihuahua, with values of Ne near 500). These results are in line with the low estimates of genomic inbreeding and relatedness and the relatively large number of effective chromosome segments (Me = 2467) obtained using the properties of the genomic relationship matrix. Between-population analysis (cross-population extended haplotype homozygosity, di) with other breeds such as the Jack Russell Terrier, the Peruvian-Inca Orchid, and the Chihuahua suggested that candidate regions harboring FGF5, PAX3, and ASIP probably explain some morphological traits, such as the distinctive color pattern characteristic of the breed. When considering Admixture estimates and phylogenetic analysis, together with other breeds of American and European origin, the Chilean Terrier does not have a recent European ancestry. Overall, the results suggest that the breed has evolved independently in Chile from other terrier breeds, from an unknown European terrier ancestor.
Affiliation(s)
- Paola C Mujica
  - FAVET-INBIOGEN Laboratory, Faculty of Veterinary Sciences, Universidad de Chile, Santiago, Chile
- Víctor Martinez
  - FAVET-INBIOGEN Laboratory, Faculty of Veterinary Sciences, Universidad de Chile, Santiago, Chile

3
Ding Y, Hou K, Xu Z, Pimplaskar A, Petter E, Boulier K, Privé F, Vilhjálmsson BJ, Olde Loohuis LM, Pasaniuc B. Polygenic scoring accuracy varies across the genetic ancestry continuum. Nature 2023; 618:774-781. [PMID: 37198491 PMCID: PMC10284707 DOI: 10.1038/s41586-023-06079-4] [Citation(s) in RCA: 79] [Impact Index Per Article: 79.0] [Received: 09/28/2022] [Accepted: 04/12/2023] [Indexed: 05/19/2023]
Abstract
Polygenic scores (PGSs) have limited portability across different groupings of individuals (for example, by genetic ancestries and/or social determinants of health), preventing their equitable use [1-3]. PGS portability has typically been assessed using a single aggregate population-level statistic (for example, R²) [4], ignoring inter-individual variation within the population. Here, using a large and diverse Los Angeles biobank [5] (ATLAS, n = 36,778) along with the UK Biobank [6] (UKBB, n = 487,409), we show that PGS accuracy decreases individual-to-individual along the continuum of genetic ancestries [7] in all considered populations, even within traditionally labelled 'homogeneous' genetic ancestries. The decreasing trend is well captured by a continuous measure of genetic distance (GD) from the PGS training data: Pearson correlation of -0.95 between GD and PGS accuracy averaged across 84 traits. When applying PGS models trained on individuals labelled as white British in the UKBB to individuals with European ancestries in ATLAS, individuals in the furthest GD decile have 14% lower accuracy relative to the closest decile; notably, the closest GD decile of individuals with Hispanic Latino American ancestries show similar PGS performance to the furthest GD decile of individuals with European ancestries. GD is significantly correlated with PGS estimates themselves for 82 of 84 traits, further emphasizing the importance of incorporating the continuum of genetic ancestries in PGS interpretation. Our results highlight the need to move away from discrete genetic ancestry clusters towards the continuum of genetic ancestries when considering PGSs.
Affiliation(s)
- Yi Ding
  - Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Kangcheng Hou
  - Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ziqi Xu
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Aditya Pimplaskar
  - Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Ella Petter
  - Department of Computer Science, UCLA, Los Angeles, CA, USA
- Kristin Boulier
  - Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
- Florian Privé
  - National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
- Bjarni J Vilhjálmsson
  - National Centre for Register-based Research, Aarhus University, Aarhus, Denmark
  - Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
  - Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute, Cambridge, MA, USA
- Loes M Olde Loohuis
  - Center for Neurobehavioral Genetics, Semel Institute for Neuroscience and Human Behavior, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
- Bogdan Pasaniuc
  - Bioinformatics Interdepartmental Program, UCLA, Los Angeles, CA, USA
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Department of Pathology and Laboratory Medicine, David Geffen School of Medicine at UCLA, Los Angeles, CA, USA
  - Institute for Precision Health, UCLA, Los Angeles, CA, USA

4
Hao X, Liang A, Plastow G, Zhang C, Wang Z, Liu J, Salzano A, Gasparrini B, Campanile G, Zhang S, Yang L. An Integrative Genomic Prediction Approach for Predicting Buffalo Milk Traits by Incorporating Related Cattle QTLs. Genes (Basel) 2022; 13:genes13081430. [PMID: 36011341 PMCID: PMC9408041 DOI: 10.3390/genes13081430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2022] [Revised: 08/08/2022] [Accepted: 08/09/2022] [Indexed: 11/16/2022]
Abstract
Background: The 90K Axiom Buffalo SNP Array is expected to improve and speed up various genomic analyses for the buffalo (Bubalus bubalis). Genomic prediction is an effective approach in animal breeding to improve selection and reduce costs. Because buffalo genome research lags behind that of the cow and production records are limited, genomic prediction performance is expected to be relatively poor. To improve genomic prediction in buffalo, we introduced a new approach (pGBLUP) for genomic prediction of six buffalo milk traits that incorporates QTL information from cattle milk traits.
Results: In simulations, the pGBLUP could outperform BayesR and the GBLUP if the prior biological information (i.e., the known causal loci) was appropriate; otherwise, it performed slightly worse than BayesR and equal to or better than the GBLUP. In real data, the heritability of the buffalo genomic region corresponding to the cattle milk trait QTLs was enriched (fold of enrichment > 1) in four buffalo milk traits (FY270, MY270, PY270, and PM) when the EBV was used as the response variable. The DEBV as the response variable yielded more reliable genomic predictions than the traditional EBV, as has been shown by previous research. The performance of the three approaches (GBLUP, BayesR, and pGBLUP) did not vary greatly in this study, probably due to the limited sample size, incomplete prior biological information, and less artificial selection in buffalo.
Conclusions: To our knowledge, this study is the first to apply genomic prediction to buffalo by incorporating prior biological information. The genomic prediction of buffalo traits can be further improved with a larger sample size, higher-density SNP chips, and more precise prior biological information.
Affiliation(s)
- Xingjie Hao
  - Department of Epidemiology and Biostatistics, School of Public Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
  - Correspondence: (X.H.); (L.Y.)
- Aixin Liang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Graham Plastow
  - Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Chunyan Zhang
  - Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Zhiquan Wang
  - Livestock Gentec Center, Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB T6G 2C8, Canada
- Jiajia Liu
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Angela Salzano
  - Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Bianca Gasparrini
  - Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Giuseppe Campanile
  - Department of Veterinary Medicine and Animal Productions, University of Naples “Federico II”, 80137 Naples, Italy
- Shujun Zhang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
- Liguo Yang
  - Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction of Ministry of Education, Huazhong Agricultural University, Wuhan 430070, China
  - Correspondence: (X.H.); (L.Y.)

5
Elsen JM. Genomic Prediction of Complex Traits, Principles, Overview of Factors Affecting the Reliability of Genomic Prediction, and Algebra of the Reliability. Methods Mol Biol 2022; 2467:45-76. [PMID: 35451772 DOI: 10.1007/978-1-0716-2205-6_2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
The quality of the predictions of genetic values based on the genotyping of neutral markers (GEBVs) is key information for deciding whether or not to implement genomic selection. This quality depends on the share of the genetic variability captured by the markers and on the precision of the estimates of their effects. Selection index theory provides the framework for evaluating the accuracy of GEBVs once the information has been gathered, with the genomic relationship matrix (GRM) playing a central role. When this accuracy must be known a priori, the theory of quantitative genetics gives clues for calculating the expectation of this GRM. This chapter makes a critical inventory of the methods developed to calculate these accuracies a posteriori and a priori. The most significant factors affecting this accuracy are described (size of the reference population, number of markers, linkage disequilibrium, heritability).
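As an illustration of how the factors listed in this abstract interact, the sketch below evaluates one commonly cited a priori approximation, r = sqrt(N h² / (N h² + Me)). The formula is not quoted in the abstract and is used here as an assumption; the function name, argument names and example values are hypothetical.

```python
from math import sqrt


def expected_accuracy(n_ref, h2, me):
    """Approximate a priori accuracy of genomic prediction.

    Assumed formula (not taken from the abstract above):
        r = sqrt(N * h2 / (N * h2 + Me))
    n_ref: size of the reference population (N)
    h2:    heritability of the trait, between 0 and 1
    me:    effective number of independent chromosome segments
    """
    if n_ref < 0 or not 0.0 <= h2 <= 1.0 or me <= 0:
        raise ValueError("need n_ref >= 0, 0 <= h2 <= 1 and me > 0")
    signal = n_ref * h2
    return sqrt(signal / (signal + me))


# Accuracy rises with reference population size for fixed h2 and Me.
for n in (500, 2000, 8000):
    print(n, round(expected_accuracy(n, h2=0.3, me=1000), 3))
```

Under this approximation, accuracy grows with the reference population size and heritability and shrinks as the number of independent segments increases.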
Affiliation(s)
- Jean-Michel Elsen
  - GenPhySE, Université de Toulouse, INRAE, ENVT, Castanet Tolosan, France

6
Zhou X, Lee SH. An integrative analysis of genomic and exposomic data for complex traits and phenotypic prediction. Sci Rep 2021; 11:21495. [PMID: 34728654 PMCID: PMC8564528 DOI: 10.1038/s41598-021-00427-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/11/2021] [Accepted: 10/12/2021] [Indexed: 12/18/2022]
Abstract
Complementary to the genome, the concept of the exposome has been proposed to capture the totality of human environmental exposures. While there has been some recent progress on the construction of the exposome, few tools exist that can integrate the genome and exposome for complex trait analyses. Here we propose a linear mixed model approach to bridge this gap, which jointly models the random effects of the two omics layers on phenotypes of complex traits. We illustrate our approach using traits from the UK Biobank (e.g., BMI and height for N ~ 35,000) with a small fraction of the exposome that comprises 28 lifestyle factors. The joint model of the genome and exposome explains substantially more phenotypic variance and significantly improves phenotypic prediction accuracy, compared to the model based on the genome alone. The additional phenotypic variance captured by the exposome includes its additive effects as well as non-additive effects such as genome-exposome (gxe) and exposome-exposome (exe) interactions. For example, 19% of the variation in BMI is explained by additive effects of the genome, while an additional 7.2% is explained by additive effects of the exposome, 1.9% by exe interactions and 4.5% by gxe interactions. Correspondingly, the prediction accuracy for BMI, computed using Pearson's correlation between the observed and predicted phenotypes, improves from 0.15 (based on the genome alone) to 0.35 (based on the genome and exposome). We also show, using established theories, that integrating genomic and exposomic data can be an effective way of attaining a clinically meaningful level of prediction accuracy for disease traits. In conclusion, the genomic and exposomic effects can contribute to phenotypic variation via their latent relationships, i.e. genome-exposome correlation, and gxe and exe interactions, and modelling these effects has the potential to improve phenotypic prediction accuracy and thus holds great promise for future clinical practice.
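The accuracy measure named in this abstract, Pearson's correlation between observed and predicted phenotypes, can be sketched as follows. This is a minimal illustration in plain Python; the function name and the toy numbers are hypothetical and are not data from the study.

```python
from math import sqrt


def prediction_accuracy(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares of the centred values.
    s_op = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    if s_oo == 0 or s_pp == 0:
        raise ValueError("correlation is undefined when one input is constant")
    return s_op / sqrt(s_oo * s_pp)


# Toy phenotypes (hypothetical values) and predictions from some model.
observed_bmi = [22.1, 27.4, 31.0, 24.8, 29.3]
predicted_bmi = [24.0, 25.5, 28.9, 25.1, 26.7]
print(round(prediction_accuracy(observed_bmi, predicted_bmi), 3))
```

Comparing this value for two models evaluated on the same held-out individuals gives the kind of contrast the abstract reports between the genome-only and the joint model.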
Affiliation(s)
- Xuan Zhou
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia
- S Hong Lee
  - Australian Centre for Precision Health, University of South Australia, Adelaide, SA, 5000, Australia
  - UniSA Allied Health and Human Performance, University of South Australia, Adelaide, SA, 5000, Australia
  - South Australian Health and Medical Research Institute, Adelaide, SA, 5000, Australia

7
Rabier C, Grusea S. Prediction in high-dimensional linear models and application to genomic selection under imperfect linkage disequilibrium. J R Stat Soc Ser C Appl Stat 2021. [DOI: 10.1111/rssc.12496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Affiliation(s)
- Charles-Elie Rabier
  - ISE-M, UMR 5554, CNRS, IRD, Université de Montpellier, France
  - IMAG, UMR 5149, CNRS, Université de Montpellier, France
  - LIRMM, UMR 5506, CNRS, Université de Montpellier, France
- Simona Grusea
  - Institut de Mathématiques de Toulouse, Université de Toulouse, INSA de Toulouse, France

8
Akbarzadeh M, Dehkordi SR, Roudbar MA, Sargolzaei M, Guity K, Sedaghati-Khayat B, Riahi P, Azizi F, Daneshpour MS. GWAS findings improved genomic prediction accuracy of lipid profile traits: Tehran Cardiometabolic Genetic Study. Sci Rep 2021; 11:5780. [PMID: 33707626 PMCID: PMC7952573 DOI: 10.1038/s41598-021-85203-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 10/05/2020] [Accepted: 02/26/2021] [Indexed: 12/15/2022]
Abstract
In recent decades, ongoing GWAS findings have led to novel therapeutic developments, whole-genome risk prediction in particular. Here, we proposed a method that integrates the traditional genomic best linear unbiased prediction (gBLUP) approach with GWAS information to boost genetic prediction accuracy and gene-based heritability estimation. This study was conducted in the framework of the Tehran Cardio-metabolic Genetic Study (TCGS), containing 14,827 individuals and 649,932 SNP markers. Five SNP subsets were selected based on GWAS results: the top 1%, 5%, 10%, and 50% significant SNPs, and SNPs reported as associated in previous studies. Furthermore, for each of the five subsets, we drew a random subset of the same size. Prediction accuracy for lipid profile traits was assessed with the gBLUP method using tenfold cross-validation repeated 10 times. Our results revealed that genetic prediction based on selected subsets of SNPs obtained from the dataset outperformed the subsets from previously reported SNPs. The selected SNP subsets gave more accurate predictions than the full SNP set and much more accurate predictions than the randomly selected SNPs. Also, the common SNPs that captured the most prediction accuracy in the selected sets captured the highest gene-based heritability. However, it should be kept in mind that a small number of SNPs obtained from GWAS results could capture a highly notable proportion of variance and prediction accuracy.
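The tenfold, 10-repeat cross-validation scheme mentioned in this abstract can be sketched as an index generator. This is only an illustration of the splitting step, assuming simple random (unstratified) folds; the function name, the seed and the toy sample size are hypothetical, and the gBLUP fit itself is not shown.

```python
import random


def repeated_kfold(n_samples, k=10, repeats=10, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold CV."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    rng = random.Random(seed)
    for rep in range(repeats):
        # Reshuffle all individuals at the start of every repeat.
        order = list(range(n_samples))
        rng.shuffle(order)
        for fold in range(k):
            # Every k-th shuffled individual goes into this fold's test set.
            test_idx = order[fold::k]
            test_set = set(test_idx)
            train_idx = [i for i in order if i not in test_set]
            yield rep, fold, train_idx, test_idx


# 50 toy individuals: 10 repeats x 10 folds = 100 train/test splits.
splits = list(repeated_kfold(n_samples=50, k=10, repeats=10))
print(len(splits))
```

In each split, a model would be trained on `train_idx` and its predictions for `test_idx` compared with the observed phenotypes; averaging over all folds and repeats gives the reported accuracy.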
Affiliation(s)
- Mahdi Akbarzadeh
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Saeid Rasekhi Dehkordi
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Mahmoud Amiri Roudbar
  - Department of Animal Science, Safiabad-Dezful Agricultural and Natural Resources Research and Education Center, Agricultural Research, Education & Extension Organization (AREEO), Dezful, Iran
- Mehdi Sargolzaei
  - Department of Pathobiology, Ontario Veterinary College, University of Guelph, Guelph, Canada
  - Select Sires Inc., Plain City, USA
- Kamran Guity
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Bahareh Sedaghati-Khayat
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Parisa Riahi
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran
- Fereidoun Azizi
  - Endocrine Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Maryam S Daneshpour
  - Cellular and Molecular Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, PO Box 19195-4763, Tehran, Iran

9
Marjanovic J, Calus MPL. Factors affecting accuracy of estimated effective number of chromosome segments for numerically small breeds. J Anim Breed Genet 2021; 138:151-160. [PMID: 33040409 PMCID: PMC7891385 DOI: 10.1111/jbg.12512] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/02/2020] [Revised: 08/25/2020] [Accepted: 09/12/2020] [Indexed: 11/28/2022]
Abstract
For numerically small breeds, obtaining a sufficiently large breed-specific reference population for genomic prediction is challenging or simply not possible, but may be overcome by adding individuals from another breed. To prioritize among available breeds, the effective number of chromosome segments (Me) can be used as an indicator of relatedness between individuals from different breeds. The Me is also an important parameter in determining the accuracy of genomic prediction. The Me can be estimated both within a population and between two populations or breeds, as the reciprocal of the variance of genomic relationships. However, the threshold for the number of individuals needed to accurately estimate within- or between-population Me is currently unknown. It is also unknown if a discrepancy in the number of genotyped individuals in two breeds affects the estimates of Me between populations. In this study, we conducted a simulation that mimics current domestic cattle populations in order to investigate how estimated Me is affected by the number of genotyped individuals, single-nucleotide polymorphism (SNP) density and pedigree availability. Our results show that a small sample of 10 genotyped individuals may result in substantial over- or underestimation of Me. While estimates of within-population Me were hardly affected by SNP density, between-population Me values were highly dependent on the number of available SNPs, with higher SNP densities being able to detect more independent chromosome segments. When subtracting pedigree from genomic relationships before computing Me, estimates of within-population Me were three to four times higher than estimates with genotypes only; however, between-population Me estimates remained the same. For accurate estimation of within- and between-population Me, at least 50 individuals should be genotyped per population. Estimates of within-population Me were highly affected by whether pedigree was used or not. For within-population Me, even the smallest SNP density (~11k) resulted in accurate representation of family relationships in the population; however, for between-population Me, many more markers are needed to capture all independent segments.
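The estimator described in this abstract, Me as the reciprocal of the variance of genomic relationships, can be sketched as below. The sketch assumes that a genomic relationship matrix is already available as a nested list and that only off-diagonal relationships within one population are used; the function name and the toy matrix are hypothetical, and the pedigree-corrected variant mentioned in the abstract is not shown.

```python
from statistics import pvariance


def effective_segments(grm):
    """Estimate Me as 1 / Var(off-diagonal genomic relationships).

    grm: square, symmetric genomic relationship matrix (list of lists).
    Assumption: the population variance of the upper-triangle
    relationships is used; other conventions exist.
    """
    n = len(grm)
    if n < 3 or any(len(row) != n for row in grm):
        raise ValueError("need a square matrix for at least 3 individuals")
    off_diag = [grm[i][j] for i in range(n) for j in range(i + 1, n)]
    var = pvariance(off_diag)
    if var == 0:
        raise ValueError("relationships have zero variance; Me is undefined")
    return 1.0 / var


# Toy relationship matrix for 4 individuals (hypothetical values).
toy_grm = [
    [1.00, 0.02, -0.02, 0.00],
    [0.02, 1.00, 0.02, -0.02],
    [-0.02, 0.02, 1.00, 0.00],
    [0.00, -0.02, 0.00, 1.00],
]
me = effective_segments(toy_grm)
print(round(me))
```

With only a handful of individuals the variance is estimated from very few relationships, which is consistent with the abstract's warning that small samples can substantially over- or underestimate Me.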
Affiliation(s)
- Jovana Marjanovic
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands
- Mario P. L. Calus
  - Animal Breeding and Genomics, Wageningen University & Research, Wageningen, The Netherlands

10
Xiang R, MacLeod IM, Daetwyler HD, de Jong G, O’Connor E, Schrooten C, Chamberlain AJ, Goddard ME. Genome-wide fine-mapping identifies pleiotropic and functional variants that predict many traits across global cattle populations. Nat Commun 2021; 12:860. [PMID: 33558518 PMCID: PMC7870883 DOI: 10.1038/s41467-021-21001-0] [Citation(s) in RCA: 40] [Impact Index Per Article: 13.3] [Received: 02/18/2020] [Accepted: 11/23/2020] [Indexed: 02/08/2023]
Abstract
The difficulty in finding causative mutations has hampered their use in genomic prediction. Here, we present a methodology to fine-map potentially causal variants genome-wide by integrating the functional, evolutionary and pleiotropic information of variants using GWAS, variant clustering and Bayesian mixture models. Our analysis of 17 million sequence variants in 44,000+ Australian dairy cattle for 34 traits suggests, on average, one pleiotropic QTL existing in each 50 kb chromosome-segment. We selected a set of 80k variants representing potentially causal variants within each chromosome segment to develop a bovine XT-50K genotyping array. The custom array contains many pleiotropic variants with biological functions, including splicing QTLs and variants at conserved sites across 100 vertebrate species. This biology-informed custom array outperformed the standard array in predicting genetic value of multiple traits across populations in independent datasets of 90,000+ dairy cattle from the USA, Australia and New Zealand.
Affiliation(s)
- Ruidong Xiang
  - Faculty of Veterinary and Agricultural Science, The University of Melbourne, Parkville, VIC, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Iona M. MacLeod
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Amanda J. Chamberlain
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia
- Michael E. Goddard
  - Faculty of Veterinary and Agricultural Science, The University of Melbourne, Parkville, VIC, Australia
  - Agriculture Victoria, AgriBio, Centre for AgriBiosciences, Bundoora, VIC, Australia

11
Atanda SA, Olsen M, Burgueño J, Crossa J, Dzidzienyo D, Beyene Y, Gowda M, Dreher K, Zhang X, Prasanna BM, Tongoona P, Danquah EY, Olaoye G, Robbins KR. Maximizing efficiency of genomic selection in CIMMYT's tropical maize breeding program. Theor Appl Genet 2021; 134:279-294. [PMID: 33037897 PMCID: PMC7813723 DOI: 10.1007/s00122-020-03696-9] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 04/19/2020] [Accepted: 09/23/2020] [Indexed: 06/01/2023]
Abstract
Historical data from breeding programs can be efficiently used to improve genomic selection accuracy, especially when the training set is optimized to subset individuals most informative of the target testing set. The current strategy for large-scale implementation of genomic selection (GS) at the International Maize and Wheat Improvement Center (CIMMYT) global maize breeding program has been to train models using information from full-sibs in a "test-half-predict-half approach." Although effective, this approach has limitations, as it requires large full-sib populations and limits the ability to shorten variety testing and breeding cycle times. The primary objective of this study was to identify optimal experimental and training set designs to maximize prediction accuracy of GS in CIMMYT's maize breeding programs. Training set (TS) design strategies were evaluated to determine the most efficient use of phenotypic data collected on relatives for genomic prediction (GP) using datasets containing 849 (DS1) and 1389 (DS2) DH-lines evaluated as testcrosses in 2017 and 2018, respectively. Our results show there is merit in the use of multiple bi-parental populations as TS when selected using algorithms to maximize relatedness between the training and prediction sets. In a breeding program where relevant past breeding information is not readily available, the phenotyping expenditure can be spread across connected bi-parental populations by phenotyping only a small number of lines from each population. This significantly improves prediction accuracy compared to within-population prediction, especially when the TS for within full-sib prediction is small. Finally, we demonstrate that prediction accuracy in either sparse testing or "test-half-predict-half" can further be improved by optimizing which lines are planted for phenotyping and which lines are to be only genotyped for advancement based on GP.
Collapse
Affiliation(s)
- Sikiru Adeniyi Atanda
- West Africa Center for Crop Improvement (WACCI), University of Ghana, Accra, Ghana
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, USA
| | - Michael Olsen
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya.
| | - Juan Burgueño
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Jose Crossa
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Daniel Dzidzienyo
- West Africa Center for Crop Improvement (WACCI), University of Ghana, Accra, Ghana
| | - Yoseph Beyene
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
| | - Manje Gowda
- International Maize and Wheat Improvement Center (CIMMYT), Nairobi, Kenya
| | - Kate Dreher
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | - Xuecai Zhang
- International Maize and Wheat Improvement Center (CIMMYT), Texcoco, Mexico
| | | | - Pangirayi Tongoona
- West Africa Center for Crop Improvement (WACCI), University of Ghana, Accra, Ghana
| | | | - Gbadebo Olaoye
- Agronomy Department, University of Ilorin, Ilorin, Nigeria
| | - Kelly R Robbins
- Section of Plant Breeding and Genetics, School of Integrative Plant Sciences, Cornell University, Ithaca, NY, USA.
|
12
|
Folkersen L, Pain O, Ingason A, Werge T, Lewis CM, Austin J. Impute.me: An Open-Source, Non-profit Tool for Using Data From Direct-to-Consumer Genetic Testing to Calculate and Interpret Polygenic Risk Scores. Front Genet 2020; 11:578. [PMID: 32714365 PMCID: PMC7340159 DOI: 10.3389/fgene.2020.00578] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 11/29/2019] [Accepted: 05/11/2020] [Indexed: 01/07/2023] Open
Abstract
To date, interpretation of genomic information has focused on single variants conferring disease risk, but most disorders of major public concern have a polygenic architecture. Polygenic risk scores (PRSs) give a single measure of disease liability by summarizing disease risk across hundreds of thousands of genetic variants. They can be calculated in any genome-wide genotype data-source, using a prediction model based on genome-wide summary statistics from external studies. As genome-wide association studies increase in power, the predictive ability for disease risk will also increase. Although PRSs are unlikely ever to be fully diagnostic, they may give valuable medical information for risk stratification, prognosis, or treatment response prediction. Public engagement is therefore becoming important on the potential use and acceptability of PRSs. However, the current public perception of genetics is that it provides "yes/no" answers about the presence/absence of a condition, or the potential for developing a condition, which is not the case for common, complex disorders with polygenic architecture. Meanwhile, unregulated third-party applications are being developed to satisfy consumer demand for information on the impact of lower-risk variants on common diseases that are highly polygenic. Often, applications report results from single-nucleotide polymorphisms (SNPs) and disregard effect size, which is highly inappropriate for common, complex disorders where everybody carries risk variants. Tools are therefore needed to communicate our understanding of genetic vulnerability as a continuous trait, where a genetic liability confers risk for disease. Impute.me is one such tool, whose focus is on education and information on common, complex disorders with polygenic architecture. Its research-focused open-source website allows users to upload consumer genetics data to obtain PRSs, with results reported on a population-level normal distribution. 
Diseases can only be browsed by International Classification of Diseases, 10th Revision (ICD-10) chapter-location or alphabetically, thus prompting the user to consider genetic risk scores in a medical context of relevance to the individual. Here, we present an overview of the implementation of the impute.me site, along with analysis of typical usage patterns, which may advance public perception of genomic risk and precision medicine.
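As a rough illustration of the scoring idea summarized in this abstract, the sketch below computes a polygenic risk score as an effect-size-weighted sum of risk-allele counts and then places that score on a normal population distribution. It is not Impute.me's code: the function names, the effect sizes, and the population mean and standard deviation are hypothetical placeholders chosen for the example.

```python
import math

def polygenic_score(dosages, effect_sizes):
    """Weighted sum of risk-allele counts (0, 1 or 2) times per-SNP effect sizes."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("dosages and effect_sizes must have the same length")
    return sum(d * b for d, b in zip(dosages, effect_sizes))

def population_percentile(score, pop_mean, pop_sd):
    """Percentile of a score under a normal distribution with the given mean and SD."""
    if pop_sd <= 0:
        raise ValueError("pop_sd must be positive")
    z = (score - pop_mean) / pop_sd
    # Normal CDF expressed through the error function, scaled to 0-100.
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical example: three SNPs with made-up log-odds effect sizes.
score = polygenic_score([0, 1, 2], [0.10, -0.05, 0.20])
percentile = population_percentile(score, pop_mean=0.35, pop_sd=0.25)
```

Reporting a percentile rather than a yes/no call reflects the abstract's point that liability for common, complex disorders is continuous.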
Affiliation(s)
- Lasse Folkersen
- Institute of Biological Psychiatry, Mental Health Centre Sankt Hans, Copenhagen, Denmark
| | - Oliver Pain
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, United Kingdom
| | - Andrés Ingason
- Institute of Biological Psychiatry, Mental Health Centre Sankt Hans, Copenhagen, Denmark
| | - Thomas Werge
- Institute of Biological Psychiatry, Mental Health Centre Sankt Hans, Copenhagen, Denmark
| | - Cathryn M. Lewis
- Social, Genetic and Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King’s College London, London, United Kingdom
- Department of Medical & Molecular Genetics, Faculty of Life Sciences & Medicine, King’s College London, London, United Kingdom
| | - Jehannine Austin
- Department of Psychiatry, University of British Columbia, Vancouver, BC, Canada
- Department of Medical Genetics, University of British Columbia, Vancouver, BC, Canada
|
13
|
Efficient polygenic risk scores for biobank scale data by exploiting phenotypes from inferred relatives. Nat Commun 2020; 11:3074. [PMID: 32555176 PMCID: PMC7299943 DOI: 10.1038/s41467-020-16829-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 01/07/2020] [Accepted: 05/25/2020] [Indexed: 01/06/2023] Open
Abstract
Polygenic risk scores are emerging as a potentially powerful tool to predict future phenotypes of target individuals, typically using unrelated individuals, thereby devaluing information from relatives. Here, for 50 traits from the UK Biobank data, we show that a design of 5,000 individuals with first-degree relatives of target individuals can achieve a prediction accuracy similar to that of around 220,000 unrelated individuals (mean prediction accuracy = 0.26 vs. 0.24, mean fold-change = 1.06 (95% CI: 0.99-1.13), P-value = 0.08), despite a 44-fold difference in sample size. For lifestyle traits, the prediction accuracy with 5,000 individuals including first-degree relatives of target individuals is significantly higher than that with 220,000 unrelated individuals (mean prediction accuracy = 0.22 vs. 0.16, mean fold-change = 1.40 (1.17-1.62), P-value = 0.025). Our findings suggest that polygenic prediction integrating family information may help to accelerate precision health and clinical intervention. Genetic data from large cohorts of unrelated individuals can be used to create polygenic risk scores, which could be used to predict individual risk of developing a specific disease. Here the authors show that smaller cohorts of related individuals can provide similarly powerful predictive ability.
|
14
|
Raymond B, Wientjes YCJ, Bouwman AC, Schrooten C, Veerkamp RF. A deterministic equation to predict the accuracy of multi-population genomic prediction with multiple genomic relationship matrices. Genet Sel Evol 2020; 52:21. [PMID: 32345213 PMCID: PMC7189707 DOI: 10.1186/s12711-020-00540-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 09/11/2019] [Accepted: 04/14/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A multi-population genomic prediction (GP) model in which important pre-selected single nucleotide polymorphisms (SNPs) are differentially weighted (MPMG) has been shown to result in better prediction accuracy than a multi-population, single genomic relationship matrix ([Formula: see text]) GP model (MPSG) in which all SNPs are weighted equally. Our objective was to underpin theoretically the advantages and limits of the MPMG model over the MPSG model, by deriving and validating a deterministic prediction equation for its accuracy. METHODS Using selection index theory, we derived an equation to predict the accuracy of estimated total genomic values of selection candidates from population [Formula: see text] ([Formula: see text]), when individuals from two populations, [Formula: see text] and [Formula: see text], are combined in the training population and two [Formula: see text], made respectively from pre-selected and remaining SNPs, are fitted simultaneously in MPMG. We used simulations to validate the prediction equation in scenarios that differed in the level of genetic correlation between populations, heritability, and proportion of genetic variance explained by the pre-selected SNPs. Empirical accuracy of the MPMG model in each scenario was calculated and compared to the predicted accuracy from the equation. RESULTS In general, the derived prediction equation resulted in accurate predictions of [Formula: see text] for the scenarios evaluated. Using the prediction equation, we showed that an important advantage of the MPMG model over the MPSG model is its ability to benefit from the small number of independent chromosome segments ([Formula: see text]) due to the pre-selected SNPs, both within and across populations, whereas for the MPSG model, there is only a single value for [Formula: see text], calculated based on all SNPs, which is very large. 
However, this advantage is dependent on the pre-selected SNPs that explain some proportion of the total genetic variance for the trait. CONCLUSIONS We developed an equation that gives insight into why, and under which conditions the MPMG outperforms the MPSG model for GP. The equation can be used as a deterministic tool to assess the potential benefit of combining information from different populations, e.g., different breeds or lines for GP in livestock or plants, or different groups of people based on their ethnic background for prediction of disease risk scores.
Affiliation(s)
- Biaty Raymond
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands; Biometris, Wageningen University and Research, 6700 AA, Wageningen, The Netherlands.
| | - Yvonne C J Wientjes
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
| | - Aniek C Bouwman
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
| | | | - Roel F Veerkamp
- Animal Breeding and Genomics, Wageningen University and Research, P.O. Box 338, 6700 AH, Wageningen, The Netherlands
|
15
|
Cheesman R, Coleman J, Rayner C, Purves KL, Morneau-Vaillancourt G, Glanville K, Choi SW, Breen G, Eley TC. Familial Influences on Neuroticism and Education in the UK Biobank. Behav Genet 2020; 50:84-93. [PMID: 31802328 PMCID: PMC7028797 DOI: 10.1007/s10519-019-09984-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 03/21/2019] [Accepted: 11/20/2019] [Indexed: 01/22/2023]
Abstract
Genome-wide studies often exclude family members, even though they are a valuable source of information. We identified parent-offspring pairs, siblings and couples in the UK Biobank and implemented a family-based DNA-derived heritability method to capture additional genetic effects and multiple sources of environmental influence on neuroticism and years of education. Compared to estimates from unrelated individuals, total heritability increased from 10 to 27% and from 17 to 56% for neuroticism and education respectively by including family-based genetic effects. We detected no family environmental influences on neuroticism. The couple similarity variance component explained 35% of the variation in years of education, probably reflecting assortative mating. Overall, our genetic and environmental estimates closely replicate previous findings from an independent sample. However, more research is required to dissect contributions to the additional heritability by rare and structural genetic effects, assortative mating, and residual environmental confounding. The latter is especially relevant for years of education, a highly socially contingent variable, for which our heritability estimate is at the upper end of twin estimates in the literature. Family-based genetic effects could be harnessed to improve polygenic prediction.
Affiliation(s)
- R Cheesman
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK.
| | - J Coleman
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Trust, London, UK
| | - C Rayner
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - K L Purves
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - G Morneau-Vaillancourt
- Research Unit on Child Psychosocial Maladjustment, Laval University, Quebec City, Canada
| | - K Glanville
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - S W Choi
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK
| | - G Breen
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Trust, London, UK
| | - T C Eley
- Social Genetic & Developmental Psychiatry Centre, Institute of Psychiatry, Psychology & Neuroscience, King's College London, 16 de Crespigny Park, Denmark Hill, London, SE5 8AF, UK.
- NIHR Biomedical Research Centre for Mental Health, South London and Maudsley NHS Trust, London, UK.
|
16
|
Silva RMO, Evenhuis JP, Vallejo RL, Gao G, Martin KE, Leeds TD, Palti Y, Lourenco DAL. Whole-genome mapping of quantitative trait loci and accuracy of genomic predictions for resistance to columnaris disease in two rainbow trout breeding populations. Genet Sel Evol 2019; 51:42. [PMID: 31387519 PMCID: PMC6683352 DOI: 10.1186/s12711-019-0484-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/11/2018] [Accepted: 07/30/2019] [Indexed: 01/09/2023] Open
Abstract
Background Columnaris disease (CD) is an emerging problem for the rainbow trout aquaculture industry in the US. The objectives of this study were to: (1) identify common genomic regions that explain a large proportion of the additive genetic variance for resistance to CD in two rainbow trout (Oncorhynchus mykiss) populations; and (2) estimate the gains in prediction accuracy when genomic information is used to evaluate the genetic potential of survival to columnaris infection in each population. Methods Two aquaculture populations were investigated: the National Center for Cool and Cold Water Aquaculture (NCCCWA) odd-year line and the Troutlodge, Inc., May odd-year (TLUM) nucleus breeding population. Fish that survived to 21 days post-immersion challenge were recorded as resistant. Single nucleotide polymorphism (SNP) genotypes were available for 1185 and 1137 fish from NCCCWA and TLUM, respectively. SNP effects and variances were estimated using the weighted single-step genomic best linear unbiased prediction (BLUP) for genome-wide association. Genomic regions that explained more than 1% of the additive genetic variance were considered to be associated with resistance to CD. Predictive ability was calculated in a fivefold cross-validation scheme and using a linear regression method. Results Validation on adjusted phenotypes provided a prediction accuracy close to zero, due to the binary nature of the trait. Using breeding values computed from the complete data as benchmark improved prediction accuracy of genomic models by about 40% compared to the pedigree-based BLUP. Fourteen windows located on six chromosomes were associated with resistance to CD in the NCCCWA population, of which two windows on chromosome Omy 17 jointly explained more than 10% of the additive genetic variance. Twenty-six windows located on 13 chromosomes were associated with resistance to CD in the TLUM population. 
Only four associated genomic regions overlapped with quantitative trait loci (QTL) between both populations. Conclusions Our results suggest that genome-wide selection for resistance to CD in rainbow trout has greater potential than selection for a few target genomic regions that were found to be associated to resistance to CD due to the polygenic architecture of this trait, and because the QTL associated with resistance to CD are not sufficiently informative for selection decisions across populations. Electronic supplementary material The online version of this article (10.1186/s12711-019-0484-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Rafael M O Silva
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, 11861 Leetown Road, Leetown, WV, 25430, USA; Department of Animal and Dairy Science, University of Georgia, Athens, 425 River Road, Athens, GA, 30602, USA; Zoetis, Sao Paulo, Sao Paulo, 04711-130, Brazil
| | - Jason P Evenhuis
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, 11861 Leetown Road, Leetown, WV, 25430, USA
| | - Roger L Vallejo
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, 11861 Leetown Road, Leetown, WV, 25430, USA
| | - Guangtu Gao
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, 11861 Leetown Road, Leetown, WV, 25430, USA
| | - Kyle E Martin
- Troutlodge, Inc., P.O. Box 1290, Sumner, WA, 98390, USA
| | - Tim D Leeds
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, 11861 Leetown Road, Leetown, WV, 25430, USA
| | - Yniv Palti
- National Center for Cool and Cold Water Aquaculture, Agricultural Research Service, United States Department of Agriculture, 11861 Leetown Road, Leetown, WV, 25430, USA.
| | - Daniela A L Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, 425 River Road, Athens, GA, 30602, USA
|
17
|
Psychiatric Polygenic Risk Scores as Predictor for Attention Deficit/Hyperactivity Disorder and Autism Spectrum Disorder in a Clinical Child and Adolescent Sample. Behav Genet 2019; 50:203-212. [PMID: 31346826 PMCID: PMC7355275 DOI: 10.1007/s10519-019-09965-8] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 01/11/2019] [Accepted: 07/10/2019] [Indexed: 12/31/2022]
Abstract
Neurodevelopmental disorders such as attention deficit/hyperactivity disorder (ADHD) and autism spectrum disorder (ASD) are highly heritable and influenced by many single nucleotide polymorphisms (SNPs). SNPs can be used to calculate individual polygenic risk scores (PRS) for a disorder. We aim to explore the association between the PRS for ADHD, ASD and for Schizophrenia (SCZ), and ADHD and ASD diagnoses in a clinical child and adolescent population. Based on the most recent genome wide association studies of ADHD, ASD and SCZ, PRS of each disorder were calculated for individuals of a clinical child and adolescent target sample (N = 688) and for adult controls (N = 943). We tested with logistic regression analyses for an association with (1) a single diagnosis of ADHD (N = 280), (2) a single diagnosis of ASD (N = 295), and (3) combining the two diagnoses, thus subjects with either ASD, ADHD or both (N = 688). Our results showed a significant association of the ADHD PRS with ADHD status (OR 1.6, P = 1.39 × 10⁻⁷) and with the combined ADHD/ASD status (OR 1.36, P = 1.211 × 10⁻⁵), but not with ASD status (OR 1.14, P = 1). No associations for the ASD and SCZ PRS were observed. In sum, the PRS of ADHD is significantly associated with the combined ADHD/ASD status. Yet, this association is primarily driven by ADHD status, suggesting disorder specific genetic effects of the ADHD PRS.
|
18
|
Iqbal A, Choi TJ, Kim YS, Lee YM, Zahangir Alam M, Jung JH, Choe HS, Kim JJ. Comparison of genomic predictions for carcass and reproduction traits in Berkshire, Duroc and Yorkshire populations in Korea. ASIAN-AUSTRALASIAN JOURNAL OF ANIMAL SCIENCES 2019; 32:1657-1663. [PMID: 31480201 PMCID: PMC6817783 DOI: 10.5713/ajas.18.0672] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 09/07/2018] [Accepted: 06/02/2019] [Indexed: 11/27/2022]
Abstract
Objective A genome-based best linear unbiased prediction (GBLUP) method was applied to evaluate accuracies of genomic estimated breeding value (GEBV) of carcass and reproductive traits in Berkshire, Duroc and Yorkshire populations in Korean swine breeding farms. Methods The data comprised a total of 1,870, 696, and 1,723 genotyped pigs belonging to Berkshire, Duroc and Yorkshire breeds, respectively. Reference populations for carcass traits consisted of 888 Berkshire, 466 Duroc, and 1,208 Yorkshire pigs, and those for reproductive traits comprised 210, 154, and 890 dams for the respective breeds. The carcass traits analyzed were backfat thickness (BFT) and carcass weight (CWT), and the reproductive traits were total number born (TNB) and number born alive (NBA). For each trait, GEBV accuracies were evaluated with a GEBV BLUP model and realized GEBVs. Results The accuracies under the GBLUP model for BFT and CWT ranged from 0.33 to 0.72 and from 0.33 to 0.63, respectively. For NBA and TNB, the model accuracies ranged from 0.32 to 0.54 and from 0.39 to 0.56, respectively. The realized accuracy estimates for BFT and CWT ranged from 0.30 to 0.46 and from 0.09 to 0.27, respectively, and from 0.50 to 0.70 and from 0.70 to 0.87 for NBA and TNB, respectively. For the carcass traits, the GEBV accuracies under the GBLUP model were higher than the realized GEBV accuracies across the breed populations, while for reproductive traits the realized accuracies were higher than the model-based GEBV accuracies. Conclusion The genomic prediction accuracy increased with reference population size and heritability of the trait. The GEBV accuracies were also influenced by GEBV estimation method, such that careful selection of animals based on the estimated GEBVs is needed. GEBV accuracy will increase with a larger reference population, which would be more beneficial for traits with low heritability such as reproductive traits.
Affiliation(s)
- Asif Iqbal
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
| | - Tae-Jeong Choi
- Swine Science Division, National Institute of Animal Science, RDA, Wanju 55365, Korea
| | - You-Sam Kim
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
| | - Yun-Mi Lee
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
| | - M Zahangir Alam
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
| | | | - Ho-Sung Choe
- Department of Animal Biotechnology, Chonbuk National University, Jeonju 54896, Korea
| | - Jong-Joo Kim
- Department of Biotechnology, Yeungnam University, Gyeongsan 38541, Korea
|
19
|
Barría A, Christensen KA, Yoshida G, Jedlicki A, Leong JS, Rondeau EB, Lhorente JP, Koop BF, Davidson WS, Yáñez JM. Whole Genome Linkage Disequilibrium and Effective Population Size in a Coho Salmon (Oncorhynchus kisutch) Breeding Population Using a High-Density SNP Array. Front Genet 2019; 10:498. [PMID: 31191613 PMCID: PMC6539196 DOI: 10.3389/fgene.2019.00498] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 05/13/2018] [Accepted: 05/07/2019] [Indexed: 12/19/2022] Open
Abstract
The estimation of linkage disequilibrium between molecular markers within a population is critical when establishing the minimum number of markers required for association studies, genomic selection, and inferring historical events influencing different populations. This work aimed to evaluate the extent and decay of linkage disequilibrium in a coho salmon breeding population using a high-density SNP array. Linkage disequilibrium was estimated between a total of 93,502 SNPs found in 64 individuals (33 dams and 31 sires) from the breeding population. The markers encompass all 30 coho salmon chromosomes and comprise 1,684.62 Mb of the genome. The average density of markers per chromosome ranged from 48.31 to 66 per 1 Mb. The minor allele frequency averaged 0.26 (with a range from 0.22 to 0.27). The overall average linkage disequilibrium among SNP pairs, measured as r², was 0.10. The average r² value decreased with increasing physical distance, with values ranging from 0.21 to 0.07 at a distance lower than 1 kb and up to 10 Mb, respectively. An r² threshold of 0.2 was reached at a distance of approximately 40 kb. Chromosomes Okis05, Okis15 and Okis28 showed high levels of linkage disequilibrium (>0.20 at distances lower than 1 Mb). Average r² values were lower than 0.15 for all chromosomes at distances greater than 4 Mb. An effective population size of 43 was estimated for the population 10 generations ago, and 325, for 139 generations ago. Based on the effective number of chromosome segments, we suggest that at least 74,000 SNPs would be necessary for an association mapping study and genomic predictions. Therefore, the SNP panel used allowed us to capture high-resolution information in the farmed coho salmon population. Furthermore, based on the contemporary Ne, a new mate allocation strategy is suggested to increase the effective population size.
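For readers unfamiliar with the pairwise linkage disequilibrium statistic reported here, the sketch below computes it as the squared Pearson correlation between allele counts at two SNPs. This genotype-based form is a common approximation when phase is unknown; the abstract does not say which software or estimator the authors used, so the function name and the toy genotypes are illustrative assumptions only.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between allele counts (0, 1, 2) at two SNPs."""
    n = len(geno_a)
    if n != len(geno_b) or n < 2:
        raise ValueError("need two genotype vectors of equal length >= 2")
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)

# Toy genotypes for six individuals (made up for illustration).
snp1 = [0, 1, 2, 1, 0, 2]
snp2 = [2, 1, 0, 1, 2, 0]   # mirrors snp1, so the two SNPs are in complete LD
r2_linked = ld_r2(snp1, snp2)
r2_unlinked = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Averaging such pairwise values within bins of physical distance gives the kind of decay curve the abstract describes.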
Affiliation(s)
- Agustín Barría
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
| | - Kris A Christensen
- Department of Biology, Centre for Biomedical Research, University of Victoria, Victoria, BC, Canada
| | - Grazyella Yoshida
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
| | - Ana Jedlicki
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile
| | - Jong S Leong
- Department of Biology, Centre for Biomedical Research, University of Victoria, Victoria, BC, Canada
| | - Eric B Rondeau
- Department of Biology, Centre for Biomedical Research, University of Victoria, Victoria, BC, Canada
| | | | - Ben F Koop
- Department of Biology, Centre for Biomedical Research, University of Victoria, Victoria, BC, Canada
| | - William S Davidson
- Department of Molecular Biology and Biochemistry, Simon Fraser University, Burnaby, BC, Canada
| | - José M Yáñez
- Facultad de Ciencias Veterinarias y Pecuarias, Universidad de Chile, Santiago, Chile; Nucleo Milenio INVASAL, Concepcion, Chile
|
20
|
Wray NR, Kemper KE, Hayes BJ, Goddard ME, Visscher PM. Complex Trait Prediction from Genome Data: Contrasting EBV in Livestock to PRS in Humans: Genomic Prediction. Genetics 2019; 211:1131-1141. [PMID: 30967442 PMCID: PMC6456317 DOI: 10.1534/genetics.119.301859] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 12/05/2018] [Accepted: 01/20/2019] [Indexed: 12/20/2022] Open
Abstract
In this Review, we focus on the similarity of the concepts underlying prediction of estimated breeding values (EBVs) in livestock and polygenic risk scores (PRS) in humans. Our research spans both fields and so we recognize factors that are very obvious for those in one field, but less so for those in the other. Differences in family size between species is the wedge that drives the different viewpoints and approaches. Large family size achievable in nonhuman species accompanied by selection generates a smaller effective population size, increased linkage disequilibrium and a higher average genetic relationship between individuals within a population. In human genetic analyses, we select individuals unrelated in the classical sense (coefficient of relationship <0.05) to estimate heritability captured by common SNPs. In livestock data, all animals within a breed are to some extent "related," and so it is not possible to select unrelated individuals and retain a data set of sufficient size to analyze. These differences directly or indirectly impact the way data analyses are undertaken. In livestock, genetic segregation variance exposed through samplings of parental genomes within families is directly observable and taken for granted. In humans, this genomic variation is under-recognized for its contribution to variation in polygenic risk of common disease, in both those with and without family history of disease. We explore the equation that predicts the expected proportion of variance explained using PRS, and quantify how GWAS sample size is the key factor for maximizing accuracy of prediction in both humans and livestock. Last, we bring together the concepts discussed to address some frequently asked questions.
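The abstract does not reproduce the equation it refers to. As a hedged sketch of why training sample size matters, the code below uses one widely cited approximation for genomic prediction accuracy, in which the squared accuracy is N·h² / (N·h² + Me), and multiplies it by h² to get the expected proportion of phenotypic variance explained. This may differ in detail from the form discussed in the Review; the function name and the example values of N, h² and Me (the effective number of independent chromosome segments) are assumptions for illustration.

```python
def expected_variance_explained(n_samples, h2, m_effective):
    """Approximate expected R^2 of a genomic predictor in the phenotype.

    Uses r^2 = N*h2 / (N*h2 + Me) for the squared accuracy of predicting
    genetic values, then scales by h2 to express it on the phenotype scale.
    """
    if n_samples <= 0 or m_effective <= 0:
        raise ValueError("n_samples and m_effective must be positive")
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    squared_accuracy = (n_samples * h2) / (n_samples * h2 + m_effective)
    return h2 * squared_accuracy

# Hypothetical values: the same trait with a small and a large training sample.
r2_small = expected_variance_explained(10_000, 0.5, 5_000)
r2_large = expected_variance_explained(1_000_000, 0.5, 5_000)
```

Under this approximation the variance explained approaches h² as N grows, and a smaller Me (as in livestock populations with small effective size) reaches a given accuracy with far fewer records.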
Affiliation(s)
- Naomi R Wray
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4067, Australia
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4067, Australia
| | - Kathryn E Kemper
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4067, Australia
| | - Benjamin J Hayes
- Queensland Alliance for Agriculture and Food Innovation, Centre for Animal Science, The University of Queensland, St Lucia, Queensland 4072, Australia
| | - Michael E Goddard
- AgriBio, Centre for AgriBioscience, Department of Economic Development, Jobs, Transport and Resources, Bundoora, Victoria, Australia
- Faculty of Land and Food Resources, University of Melbourne, Parkville, Victoria, Australia
| | - Peter M Visscher
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Queensland 4067, Australia
- Queensland Brain Institute, The University of Queensland, Brisbane, Queensland 4067, Australia
|
21
|
Porto A, Peralta JM, Blackburn NB, Blangero J. Reliability of genomic predictions of complex human phenotypes. BMC Proc 2018; 12:51. [PMID: 30275897 PMCID: PMC6157117 DOI: 10.1186/s12919-018-0138-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/13/2022] Open
Abstract
Genome-wide association studies have helped us identify a wealth of genetic variants associated with complex human phenotypes. Because most variants explain a small portion of the total phenotypic variation, however, marker-based studies remain limited in their ability to predict such phenotypes. Here, we show how modern statistical genetic techniques borrowed from animal breeding can be employed to increase the accuracy of genomic prediction of complex phenotypes and the power of genetic mapping studies. Specifically, using the triglyceride data of the GAW20 data set, we apply genomic-best linear unbiased prediction (G-BLUP) methods to obtain empirical genetic values (EGVs) for each triglyceride phenotype and each individual. We then study 2 different factors that influence the prediction accuracy of G-BLUP for the analysis of human data: (a) the choice of kinship matrix, and (b) the overall level of relatedness. The resulting genetic values represent the total genetic component for the phenotype of interest and can be used to represent a trait without its environmental component. Finally, using empirical data, we demonstrate how this method can be used to increase the power of genetic mapping studies. In sum, our results show that dense genome-wide data can be used in a wider scope than previously anticipated.
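Since the abstract highlights the choice of kinship matrix as a factor in G-BLUP accuracy, the sketch below builds one common marker-based option, a genomic relationship matrix of the form G = ZZ' / (2 Σ p(1 − p)), where Z holds allele counts centered by twice the allele frequency. The abstract does not state which matrices the authors compared, so this particular construction, the function name and the toy genotypes are assumptions for illustration.

```python
def genomic_relationship_matrix(genotypes):
    """Marker-based relationship matrix G = Z Z' / (2 * sum(p * (1 - p))).

    genotypes: list of rows (one per individual) of allele counts 0, 1 or 2.
    Allele frequencies p are estimated from the sample itself.
    """
    n = len(genotypes)
    m = len(genotypes[0])
    if any(len(row) != m for row in genotypes):
        raise ValueError("all individuals must have the same number of markers")
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers are monomorphic; G is undefined")
    # Center each allele count by its expectation 2p.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(m)] for row in genotypes]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]

# Toy data: three individuals, two markers (made up for illustration).
G = genomic_relationship_matrix([[0, 2], [1, 1], [2, 0]])
```

In a G-BLUP analysis a matrix like this takes the place of the pedigree-based kinship matrix in the mixed model, which is one way the relatedness structure of the sample enters the predictions.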
Affiliation(s)
- Arthur Porto
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, One West University Blvd. Modular Building #100, Brownsville, TX 78250 USA
- Centre for Ecological and Evolutionary Synthesis, University of Oslo, Blindernveien 31, Oslo, Norway
| | - Juan M. Peralta
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, One West University Blvd. Modular Building #100, Brownsville, TX 78250 USA
| | - Nicholas B. Blackburn
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, One West University Blvd. Modular Building #100, Brownsville, TX 78250 USA
- Menzies Institute for Medical Research, University of Tasmania, 17, Liverpool St., Hobart, TAS Australia
| | - John Blangero
- South Texas Diabetes and Obesity Institute, University of Texas Rio Grande Valley School of Medicine, One West University Blvd. Modular Building #100, Brownsville, TX 78250 USA
|
22
|
Ni G, Moser G, Wray NR, Lee SH, Ripke S, Neale BM, Corvin A, Walters JT, Farh KH, Holmans PA, Lee P, Bulik-Sullivan B, Collier DA, Huang H, Pers TH, Agartz I, Agerbo E, Albus M, Alexander M, Amin F, Bacanu SA, Begemann M, Belliveau RA, Bene J, Bergen SE, Bevilacqua E, Bigdeli TB, Black DW, Bruggeman R, Buccola NG, Buckner RL, Byerley W, Cahn W, Cai G, Campion D, Cantor RM, Carr VJ, Carrera N, Catts SV, Chambert KD, Chan RC, Chen RY, Chen EY, Cheng W, Cheung EF, Chong SA, Cloninger CR, Cohen D, Cohen N, Cormican P, Craddock N, Crowley JJ, Curtis D, Davidson M, Davis KL, Degenhardt F, Del Favero J, Demontis D, Dikeos D, Dinan T, Djurovic S, Donohoe G, Drapeau E, Duan J, Dudbridge F, Durmishi N, Eichhammer P, Eriksson J, Escott-Price V, Essioux L, Fanous AH, Farrell MS, Frank J, Franke L, Freedman R, Freimer NB, Friedl M, Friedman JI, Fromer M, Genovese G, Georgieva L, Giegling I, Giusti-Rodríguez P, Godard S, Goldstein JI, Golimbet V, Gopal S, Gratten J, de Haan L, Hammer C, Hamshere ML, Hansen M, Hansen T, Haroutunian V, Hartmann AM, Henskens FA, Herms S, Hirschhorn JN, Hoffmann P, Hofman A, Hollegaard MV, Hougaard DM, Ikeda M, Joa I, Juliá A, Kahn RS, Kalaydjieva L, Karachanak-Yankova S, Karjalainen J, Kavanagh D, Keller MC, Kennedy JL, Khrunin A, Kim Y, Klovins J, Knowles JA, Konte B, Kucinskas V, Kucinskiene ZA, Kuzelova-Ptackova H, Kähler AK, Laurent C, Keong JLC, Legge SE, Lerer B, Li M, Li T, Liang KY, Lieberman J, Limborska S, Loughland CM, Lubinski J, Lönnqvist J, Macek M, Magnusson PK, Maher BS, Maier W, Mallet J, Marsal S, Mattheisen M, Mattingsda M, McCarley RW, McDonald C, McIntosh AM, Meier S, Meijer CJ, Melegh B, Melle I, Mesholam-Gately RI, Metspalu A, Michie PT, Milani L, Milanova V, Mokrab Y, Morris DW, Mors O, Murphy KC, Murray RM, Myin-Germeys I, Müller-Myhsok B, Nelis M, Nenadic I, Nertney DA, Nestadt G, Nicodemus KK, Nikitina-Zake L, Nisenbaum L, Nordin A, O’Callaghan E, O’Dushlaine C, O’Neill FA, Oh SY, Olinc A, Olsen L, Van Os J, Pantelis C, 
Papadimitriou GN, Papio S, Parkhomenko E, Pato MT, Paunio T, Pejovic-Milovancevic M, Perkins DO, Pietiläinenl O, Pimm J, Pocklington AJ, Powell J, Price A, Pulver AE, Purcell SM, Quested D, Rasmussen HB, Reichenberg A, Reimers MA, Richards AL, Roffman JL, Roussos P, Ruderfer DM, Salomaa V, Sanders AR, Schall U, Schubert CR, Schulze TG, Schwab SG, Scolnick EM, Scott RJ, Seidman LJ, Shi J, Sigurdsson E, Silagadze T, Silverman JM, Sim K, Slominsky P, Smoller JW, So HC, Spencer CC, Stah EA, Stefansson H, Steinberg S, Stogmann E, Straub RE, Strengman E, Strohmaier J, Stroup TS, Subramaniam M, Suvisaari J, Svrakic DM, Szatkiewicz JP, Söderman E, Thirumalai S, Toncheva D, Tosato S, Veijola J, Waddington J, Walsh D, Wang D, Wang Q, Webb BT, Weiser M, Wildenauer DB, Williams NM, Williams S, Witt SH, Wolen AR, Wong EH, Wormley BK, Xi HS, Zai CC, Zheng X, Zimprich F, Stefansson K, Visscher PM, Adolfsson R, Andreassen OA, Blackwood DH, Bramon E, Buxbaum JD, Børglum AD, Cichon S, Darvasi A, Domenici E, Ehrenreich H, Esko T, Gejman PV, Gill M, Gurling H, Hultman CM, Iwata N, Jablensky AV, Jönsson EG, Kendler KS, Kirov G, Knight J, Lencz T, Levinson DF, Li QS, Liu J, Malhotra AK, McCarrol SA, McQuillin A, Moran JL, Mortensen PB, Mowry BJ, Nöthen MM, Ophoff RA, Owen MJ, Palotie A, Pato CN, Petryshen TL, Posthuma D, Rietsche M, Riley BP, Rujescu D, Sham PC, Sklar P, St Clair D, Weinberger DR, Wendland JR, Werge T, Daly MJ, Sullivan PF, O’Donovan MC. Estimation of Genetic Correlation via Linkage Disequilibrium Score Regression and Genomic Restricted Maximum Likelihood. Am J Hum Genet 2018; 102:1185-1194. [PMID: 29754766 PMCID: PMC5993419 DOI: 10.1016/j.ajhg.2018.03.021] [Citation(s) in RCA: 100] [Impact Index Per Article: 16.7] [Received: 10/04/2017] [Accepted: 03/20/2018] [Indexed: 10/16/2022]
Abstract
Genetic correlation is a key population parameter that describes the shared genetic architecture of complex traits and diseases. It can be estimated by current state-of-the-art methods, i.e., linkage disequilibrium score regression (LDSC) and genomic restricted maximum likelihood (GREML). The massively reduced computing burden of LDSC compared to GREML makes it an attractive tool, although the accuracy (i.e., magnitude of standard errors) of LDSC estimates has not been thoroughly studied. In simulation, we show that the accuracy of GREML is generally higher than that of LDSC. When there is genetic heterogeneity between the actual sample and reference data from which LD scores are estimated, the accuracy of LDSC decreases further. In real data analyses estimating the genetic correlation between schizophrenia (SCZ) and body mass index, we show that GREML estimates based on ∼150,000 individuals give a higher accuracy than LDSC estimates based on ∼400,000 individuals (from combined meta-data). A GREML genomic partitioning analysis reveals that the genetic correlation between SCZ and height is significantly negative for regulatory regions, which whole-genome or LDSC approaches have less power to detect. We conclude that LDSC estimates should be carefully interpreted as there can be uncertainty about homogeneity among combined meta-datasets. We suggest that any interesting findings from massive LDSC analysis for a large number of complex traits should be followed up, where possible, with more detailed analyses with GREML methods, even if sample sizes are smaller.
|
23
|
Abstract
Genomic prediction has the potential to contribute to precision medicine. However, to date, the utility of such predictors is limited due to low accuracy for most traits. Here, theory and a simulation study are used to demonstrate that widespread pleiotropy among phenotypes can be utilised to improve genomic risk prediction. We show how a genetic predictor can be created as a weighted index that combines published genome-wide association study (GWAS) summary statistics across many different traits. We apply this framework to predict risk of schizophrenia and bipolar disorder in the Psychiatric Genomics Consortium data, finding substantial heterogeneity in prediction accuracy increases across cohorts. For six additional phenotypes in the UK Biobank data, we find increases in prediction accuracy ranging from 0.7% for height to 47% for type 2 diabetes, when using a multi-trait predictor that combines published summary statistics from multiple traits, as compared to a predictor based only on one trait.
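The idea of a weighted index over several single-trait predictors can be illustrated with a small sketch. This is a swapped-in technique, not the method of the abstract: the weights here come from ordinary least squares on individual-level training data, whereas the abstract describes weights derived from published summary statistics. All data are simulated and the name `index_weights` is hypothetical.

```python
import numpy as np

def index_weights(trait_predictors, target):
    """Least-squares weights for combining per-trait predictors into one index."""
    centred = trait_predictors - trait_predictors.mean(axis=0)
    w, *_ = np.linalg.lstsq(centred, target - target.mean(), rcond=None)
    return w

# Simulated example: three single-trait predictors for 200 individuals.
rng = np.random.default_rng(1)
predictors = rng.normal(size=(200, 3))
true_w = np.array([0.6, 0.3, 0.0])              # third trait carries no information
target = predictors @ true_w + rng.normal(scale=0.01, size=200)

weights = index_weights(predictors, target)
index = predictors @ weights                     # multi-trait predictor per individual
```

A trait that shares no signal with the target receives a weight near zero, so adding it to the index neither helps nor hurts much in this idealised setting.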
|
24
|
Turley P, Walters RK, Maghzian O, Okbay A, Lee JJ, Fontana MA, Nguyen-Viet TA, Wedow R, Zacher M, Furlotte NA, Magnusson P, Oskarsson S, Johannesson M, Visscher PM, Laibson D, Cesarini D, Neale BM, Benjamin DJ. Multi-trait analysis of genome-wide association summary statistics using MTAG. Nat Genet 2018; 50:229-237. [PMID: 29292387 PMCID: PMC5805593 DOI: 10.1038/s41588-017-0009-4] [Citation(s) in RCA: 556] [Impact Index Per Article: 92.7] [Received: 03/20/2017] [Accepted: 11/06/2017] [Indexed: 12/28/2022]
Abstract
We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (N_eff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.
Affiliation(s)
- Patrick Turley
- Broad Institute, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Cambridge, MA, USA.
| | - Raymond K Walters
- Broad Institute, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Cambridge, MA, USA
| | - Omeed Maghzian
- Department of Economics, Harvard University, Cambridge, MA, USA
| | - Aysu Okbay
- Department of Complex Trait Genetics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
| | - James J Lee
- Department of Psychology, University of Minnesota, Minneapolis, MN, USA
| | - Tuan Anh Nguyen-Viet
- Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA
| | - Robbee Wedow
- Institute for Behavioral Genetics, University of Colorado Boulder, Boulder, CO, USA
- Institute of Behavioral Science, University of Colorado Boulder, Boulder, CO, USA
- Department of Sociology, University of Colorado Boulder, Boulder, CO, USA
| | - Meghan Zacher
- Department of Sociology, Harvard University, Cambridge, MA, USA
| | - Patrik Magnusson
- Institutionen för Medicinsk Epidemiologi och Biostatistik, Karolinska Institutet, Stockholm, Sweden
| | - Sven Oskarsson
- Department of Government, Uppsala Universitet, Uppsala, Sweden
| | - Magnus Johannesson
- Department of Economics, Stockholm School of Economics, Stockholm, Sweden
| | - Peter M Visscher
- Institute for Molecular Bioscience, University of Queensland, Brisbane, Queensland, Australia
- Queensland Brain Institute, University of Queensland, Brisbane, Queensland, Australia
| | - David Laibson
- Department of Economics, Harvard University, Cambridge, MA, USA
- National Bureau of Economic Research, Cambridge, MA, USA
| | - David Cesarini
- National Bureau of Economic Research, Cambridge, MA, USA.
- Department of Economics and Center for Experimental Social Science, New York University, New York, NY, USA.
- Institutet för Näringslivsforskning, Stockholm, Sweden.
| | - Benjamin M Neale
- Broad Institute, Cambridge, MA, USA.
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Cambridge, MA, USA.
| | - Daniel J Benjamin
- Center for Economic and Social Research, University of Southern California, Los Angeles, CA, USA.
- National Bureau of Economic Research, Cambridge, MA, USA.
- Department of Economics, University of Southern California, Los Angeles, CA, USA.
|
25
|
Estimation of genomic prediction accuracy from reference populations with varying degrees of relationship. PLoS One 2017; 12:e0189775. [PMID: 29267328 PMCID: PMC5739427 DOI: 10.1371/journal.pone.0189775] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 08/25/2017] [Accepted: 11/09/2017] [Indexed: 01/07/2023]
Abstract
Genomic prediction is emerging in a wide range of fields including animal and plant breeding, risk prediction in human precision medicine and forensics. It is desirable to establish a theoretical framework for genomic prediction accuracy when the reference data consists of information sources with varying degrees of relationship to the target individuals. A reference set can contain both close and distant relatives as well as ‘unrelated’ individuals from the wider population in the genomic prediction. The various sources of information were modeled as different populations with different effective population sizes (Ne). Both the effective number of chromosome segments (Me) and Ne are considered to be a function of the data used for prediction. We validate our theory with analyses of simulated as well as real data, and illustrate that the variation in genomic relationships with the target is a predictor of the information content of the reference set. With a similar amount of data available for each source, we show that close relatives can have a substantially larger effect on genomic prediction accuracy than less closely related individuals. We also illustrate that when prediction relies on closer relatives, there is less improvement in prediction accuracy with an increase in training data or marker panel density. We release software that can estimate the expected prediction accuracy and power when combining different reference sources with various degrees of relationship to the target, which is useful when planning genomic prediction (before or after collecting data) in animal, plant and human genetics.
|
26
|
Raoul J, Swan AA, Elsen JM. Using a very low-density SNP panel for genomic selection in a breeding program for sheep. Genet Sel Evol 2017; 49:76. [PMID: 29065868 PMCID: PMC5655911 DOI: 10.1186/s12711-017-0351-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 05/19/2017] [Accepted: 10/17/2017] [Indexed: 01/11/2023]
Abstract
Background: Building an efficient reference population for genomic selection is an issue when the recorded population is small and phenotypes are poorly informed, which is often the case in sheep breeding programs. Using stochastic simulation, we evaluated a genomic design based on a reference population with medium-density genotypes [around 45 K single nucleotide polymorphisms (SNPs)] of dams that were imputed from very low-density genotypes (≤ 1000 SNPs). Methods: A population under selection for a maternal trait was simulated using real genotypes. Genetic gains realized from classical selection and genomic selection designs were compared. Genomic selection scenarios that differed in reference population structure (whether or not dams were included in the reference) and genotype quality (medium-density or imputed to medium-density from very low-density) were evaluated. Results: The genomic design increased genetic gain by 26% when the reference population was based on sire medium-density genotypes and by 54% when the reference population included both sire and dam medium-density genotypes. When medium-density genotypes of male candidates and dams were replaced by imputed genotypes from very low-density SNP genotypes (1000 SNPs), the increase in gain was 22% for the sire reference population and 42% for the sire and dam reference population. The rate of increase in inbreeding was lower (from −20 to −34%) for the genomic design than for the classical design regardless of the genomic scenario. Conclusions: We show that very low-density genotypes of male candidates and dams combined with an imputation process result in a substantial increase in genetic gain for small sheep breeding programs.
Affiliation(s)
- Jérôme Raoul
- Institut de l'Elevage, Castanet-Tolosan, France
- GenPhySE, INRA, Castanet-Tolosan, France
| | - Andrew A Swan
- Animal Genetics and Breeding Unit, University of New England, Armidale, Australia
|
27
|
Improving Disease Prediction by Incorporating Family Disease History in Risk Prediction Models with Large-Scale Genetic Data. Genetics 2017; 207:1147-1155. [PMID: 28899997 DOI: 10.1534/genetics.117.300283] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/08/2017] [Accepted: 08/31/2017] [Indexed: 12/30/2022]
Abstract
Despite the many successes of genome-wide association studies (GWAS), the known susceptibility variants identified by GWAS have modest effect sizes, leading to notable skepticism about the effectiveness of building a risk prediction model from large-scale genetic data. However, in contrast to genetic variants, the family history of diseases has been largely accepted as an important risk factor in clinical diagnosis and risk prediction. Nevertheless, the complicated structures of the family history of diseases have limited their application in clinical practice. Here, we developed a new method that enables incorporation of the general family history of diseases with a liability threshold model, and propose a new analysis strategy for risk prediction with penalized regression analysis that incorporates both large numbers of genetic variants and clinical risk factors. Application of our model to type 2 diabetes in the Korean population (1846 cases and 1846 controls) demonstrated that single-nucleotide polymorphisms accounted for 32.5% of the variation explained by the predicted risk scores in the test data set, and incorporation of family history led to an additional 6.3% improvement in prediction. Our results illustrate that family medical history provides valuable information on the variation of complex diseases and improves prediction performance.
|
28
|
Non-parametric genetic prediction of complex traits with latent Dirichlet process regression models. Nat Commun 2017; 8:456. [PMID: 28878256 PMCID: PMC5587666 DOI: 10.1038/s41467-017-00470-2] [Citation(s) in RCA: 71] [Impact Index Per Article: 10.1] [Received: 11/26/2016] [Accepted: 06/30/2017] [Indexed: 01/03/2023]
Abstract
Using genotype data to perform accurate genetic prediction of complex traits can facilitate genomic selection in animal and plant breeding programs, and can aid in the development of personalized medicine in humans. Because most complex traits have a polygenic architecture, accurate genetic prediction often requires modeling all genetic variants together via polygenic methods. Here, we develop such a polygenic method, which we refer to as the latent Dirichlet process regression model. Dirichlet process regression is non-parametric in nature, relies on the Dirichlet process to flexibly and adaptively model the effect size distribution, and thus enjoys robust prediction performance across a broad spectrum of genetic architectures. We compare Dirichlet process regression with several commonly used prediction methods with simulations. We further apply Dirichlet process regression to predict gene expression, to conduct PrediXcan-based gene set tests, to perform genomic selection of four traits in two species, and to predict eight complex traits in a human cohort. Genetic prediction of complex traits with polygenic architecture has wide application from animal breeding to disease prevention. Here, Zeng and Zhou develop a non-parametric genetic prediction method based on latent Dirichlet Process regression models.
|
29
|
Chen GB, Lee SH, Montgomery GW, Wray NR, Visscher PM, Gearry RB, Lawrance IC, Andrews JM, Bampton P, Mahy G, Bell S, Walsh A, Connor S, Sparrow M, Bowdler LM, Simms LA, Krishnaprasad K, Radford-Smith GL, Moser G. Performance of risk prediction for inflammatory bowel disease based on genotyping platform and genomic risk score method. BMC MEDICAL GENETICS 2017; 18:94. [PMID: 28851283 PMCID: PMC5576242 DOI: 10.1186/s12881-017-0451-2] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 05/08/2016] [Accepted: 08/14/2017] [Indexed: 12/11/2022]
Abstract
Background: Predicting risk of disease from genotypes is being increasingly proposed for a variety of diagnostic and prognostic purposes. Genome-wide association studies (GWAS) have identified a large number of genome-wide significant susceptibility loci for Crohn’s disease (CD) and ulcerative colitis (UC), two subtypes of inflammatory bowel disease (IBD). Recent studies have demonstrated that including only loci that are significantly associated with disease in the prediction model has low predictive power and that power can substantially be improved using a polygenic approach. Methods: We performed a comprehensive analysis of risk prediction models using large case-control cohorts genotyped for 909,763 GWAS SNPs or 123,437 SNPs on the custom designed Immunochip using four prediction methods (polygenic score, best linear genomic prediction, elastic-net regularization and a Bayesian mixture model). We used the area under the curve (AUC) to assess prediction performance for discovery populations with different sample sizes and number of SNPs within cross-validation. Results: On average, the Bayesian mixture approach had the best prediction performance. Using cross-validation we found little difference in prediction performance between GWAS and Immunochip, despite the GWAS array providing a 10 times larger effective genome-wide coverage. The prediction performance using Immunochip is largely due to the power of the initial GWAS for its marker selection and its low cost that enabled larger sample sizes. The predictive ability of the genomic risk score based on Immunochip was replicated in external data, with AUC of 0.75 for CD and 0.70 for UC. CD patients with higher risk scores demonstrated clinical characteristics typically associated with a more severe disease course including ileal location and earlier age at diagnosis.
Conclusions: Our analyses demonstrate that the power of genomic risk prediction for IBD is mainly due to strongly associated SNPs with considerable effect sizes. Additional SNPs that are only tagged by high-density GWAS arrays and low-frequency or rare variants over-represented in the high-density region on the Immunochip contribute little to prediction accuracy. Although a quantitative assessment of IBD risk for an individual is not currently possible, we show sufficient power of genomic risk scores to stratify IBD risk among individuals at diagnosis.
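The AUC metric used in this abstract can be made concrete with a short sketch. It uses the rank interpretation of the AUC: the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name `auc` and the toy scores are assumptions for illustration and are unrelated to the study's data.

```python
def auc(scores, labels):
    """AUC as the fraction of case-control pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))

# Toy example: two cases (label 1) and two controls (label 0).
toy_auc = auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])   # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation; the pairwise loop is quadratic, so a rank-based formula would be preferable for large cohorts.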
Affiliation(s)
- Guo-Bo Chen
- Queensland Brain Institute, The University of Queensland, Brisbane, Australia
| | - Sang Hong Lee
- Queensland Brain Institute, The University of Queensland, Brisbane, Australia
- School of Environmental and Rural Science, The University of New England, Armidale, Australia
| | - Grant W Montgomery
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Naomi R Wray
- Queensland Brain Institute, The University of Queensland, Brisbane, Australia
| | - Peter M Visscher
- Queensland Brain Institute, The University of Queensland, Brisbane, Australia
- University of Queensland Diamantina Institute, Translational Research Institute, The University of Queensland, Brisbane, Australia
| | - Richard B Gearry
- Department of Medicine, University of Otago, Christchurch, New Zealand
- Department of Gastroenterology, Christchurch Hospital, Christchurch, New Zealand
| | - Ian C Lawrance
- Harry Perkins Institute of Medical Research, School of Medicine and Pharmacology, University of Western Australia, Murdoch, Australia
- Centre for Inflammatory Bowel Diseases, Saint John of God Hospital, Subiaco, Australia
| | - Jane M Andrews
- Inflammatory Bowel Disease Service, Department of Gastroenterology and Hepatology, Royal Adelaide Hospital, School of Medicine, University of Adelaide, Adelaide, Australia
| | - Peter Bampton
- Department of Gastroenterology and Hepatology, Flinders Medical Centre, Adelaide, Australia
| | - Gillian Mahy
- Department of Gastroenterology, Townsville Hospital, Townsville, Australia
| | - Sally Bell
- Department of Gastroenterology, St Vincent's Hospital, Melbourne, Australia
| | - Alissa Walsh
- Department of Gastroenterology and Hepatology, St Vincent's Hospital, Sydney, Australia
| | - Susan Connor
- Department of Gastroenterology and Hepatology, Liverpool Hospital, Sydney, Australia
- University of NSW, Sydney, Australia
| | - Miles Sparrow
- Department of Gastroenterology, Alfred Health, Melbourne, Australia
| | - Lisa M Bowdler
- Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia
| | - Lisa A Simms
- Inflammatory Bowel Disease Research Group, Immunology Division, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Krupa Krishnaprasad
- Inflammatory Bowel Disease Research Group, Immunology Division, QIMR Berghofer Medical Research Institute, Brisbane, Australia
| | - Graham L Radford-Smith
- School of Medicine, The University of Queensland, Brisbane, Australia
- Inflammatory Bowel Disease Research Group, Immunology Division, QIMR Berghofer Medical Research Institute, Brisbane, Australia
- Department of Gastroenterology, Royal Brisbane and Women's Hospital, Brisbane, Australia
| | - Gerhard Moser
- Queensland Brain Institute, The University of Queensland, Brisbane, Australia.
|
30
|
Li H, Su G, Jiang L, Bao Z. An efficient unified model for genome-wide association studies and genomic selection. Genet Sel Evol 2017; 49:64. [PMID: 28836943 PMCID: PMC5569572 DOI: 10.1186/s12711-017-0338-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 02/12/2017] [Accepted: 08/07/2017] [Indexed: 11/10/2022]
Abstract
Background: A quantitative trait is controlled both by major variants with large genetic effects and by minor variants with small effects. Genome-wide association studies (GWAS) are an efficient approach to identify quantitative trait loci (QTL), and genomic selection (GS) with high-density single nucleotide polymorphisms (SNPs) can achieve higher accuracy of estimated breeding values than conventional best linear unbiased prediction (BLUP). GWAS and GS address different aspects of quantitative traits, but, as statistical models, they are quite similar in their description of the genetic mechanisms that underlie quantitative traits. Methods: Here, we propose a stepwise linear regression mixed model (StepLMM) to unify GWAS and GS in a single statistical model. First, the variance components of the genomic-BLUP (GBLUP) model are estimated. Then, in the SNP selection step, the linear mixed model (LMM) for GWAS is equivalently transformed into a simple linear regression to improve computation speed, and the most significant SNP is selected and included into the evaluation model. In the SNP dropping step, the SNPs in the evaluation model are tested according to the standard errors of their estimated effects. If non-significant SNPs are present, the least significant one is dropped from the model and variance components are re-estimated. We used extended Bayesian information criteria (eBIC) to evaluate the model optimization, i.e. the model with the smallest eBIC is the final one and includes only significant SNPs. Results: We simulated scenarios with different heritabilities with 100 QTL. StepLMM estimated heritability accurately and mapped QTL precisely. Genomic prediction accuracy was much higher with StepLMM than with GBLUP. The comparison of StepLMM with other GWAS and GS methods based on a dataset from the 16th QTLMAS Workshop showed that StepLMM had medium mapping power, the lowest rate of false positives for QTL mapping, and the highest accuracy for genomic prediction. Conclusions: StepLMM is a combination of GWAS and GBLUP. GWAS and GBLUP are beneficial to each other in a single statistical model: GWAS improves genomic prediction accuracy, while GBLUP increases mapping precision and decreases the rate of false positives of GWAS. StepLMM has a high performance in both GWAS and GS and is feasible for agricultural breeding programs and human genetic studies.
Affiliation(s)
- Hengde Li
- Ministry of Agriculture Key Laboratory of Aquatic Genomics, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Center for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, 100141, China.
| | - Guosheng Su
- Center for Quantitative Genetics and Genomics, Department of Molecular Biology and Genetics, Aarhus University, 8830, Tjele, Denmark
| | - Li Jiang
- Ministry of Agriculture Key Laboratory of Aquatic Genomics, CAFS Key Laboratory of Aquatic Genomics and Beijing Key Laboratory of Fishery Biotechnology, Center for Applied Aquatic Genomics, Chinese Academy of Fishery Sciences, Beijing, 100141, China
| | - Zhenmin Bao
- College of Marine Life, Ocean University of China, Qingdao, 266003, China.
|