1
Xu C, Ganesh SK, Zhou X. mtPGS: Leverage multiple correlated traits for accurate polygenic score construction. Am J Hum Genet 2023; 110:1673-1689. [PMID: 37716346 PMCID: PMC10577082 DOI: 10.1016/j.ajhg.2023.08.016] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 04/04/2023] [Revised: 08/18/2023] [Accepted: 08/27/2023] [Indexed: 09/18/2023] Open
Abstract
Accurate polygenic scores (PGSs) facilitate the genetic prediction of complex traits and aid in the development of personalized medicine. Here, we develop a statistical method called multi-trait assisted PGS (mtPGS), which can construct accurate PGSs for a target trait of interest by leveraging multiple traits relevant to the target trait. Specifically, mtPGS borrows SNP effect size similarity information between the target trait and its relevant traits to improve the effect size estimation on the target trait, thus achieving accurate PGSs. In the process, mtPGS flexibly models the shared genetic architecture between the target and the relevant traits to achieve robust performance, while explicitly accounting for the environmental covariance among them to accommodate different study designs with various sample overlap patterns. In addition, mtPGS uses only summary statistics as input and relies on a deterministic algorithm with several algebraic techniques for scalable computation. We evaluate the performance of mtPGS through comprehensive simulations and applications to 25 traits in the UK Biobank, where in the real data mtPGS achieves an average of 0.90%-52.91% accuracy gain compared to the state-of-the-art PGS methods. Overall, mtPGS represents an accurate, fast, and robust solution for PGS construction in biobank-scale datasets.
Affiliation(s)
- Chang Xu
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
- Santhi K Ganesh
- Department of Internal Medicine, Division of Cardiovascular Medicine, University of Michigan, Ann Arbor, MI, USA; Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
- Xiang Zhou
- Department of Biostatistics, University of Michigan School of Public Health, Ann Arbor, MI, USA; Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI, USA
2
Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Holl J, Herring WO, Misztal I, Lourenco D. Using pre-selected variants from large-scale whole-genome sequence data for single-step genomic predictions in pigs. Genet Sel Evol 2023; 55:55. [PMID: 37495982 PMCID: PMC10373252 DOI: 10.1186/s12711-023-00831-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/19/2022] [Accepted: 07/18/2023] [Indexed: 07/28/2023] Open
Abstract
BACKGROUND Whole-genome sequence (WGS) data harbor causative variants that may not be present in standard single nucleotide polymorphism (SNP) chip data. The objective of this study was to investigate the impact of using preselected variants from WGS for single-step genomic predictions in maternal and terminal pig lines with up to 1.8k sequenced and 104k sequence imputed animals per line.
METHODS Two maternal and four terminal lines were investigated for eight and seven traits, respectively. The number of sequenced animals ranged from 1365 to 1491 for the maternal lines and 381 to 1865 for the terminal lines. Imputation to sequence occurred within each line for 66k to 76k animals for the maternal lines and 29k to 104k animals for the terminal lines. Two preselected SNP sets were generated based on a genome-wide association study (GWAS). Top40k included the SNPs with the lowest p-value in each of the 40k genomic windows, and ChipPlusSign included significant variants integrated into the porcine SNP chip used for routine genotyping. We compared the performance of single-step genomic predictions between using preselected SNP sets assuming equal or different variances and the standard porcine SNP chip.
RESULTS In the maternal lines, ChipPlusSign and Top40k showed an average increase in accuracy of 0.6 and 4.9%, respectively, compared to the regular porcine SNP chip. The greatest increase was obtained with Top40k, particularly for fertility traits, for which the initial accuracy based on the standard SNP chip was low. However, in the terminal lines, Top40k resulted in an average loss of accuracy of 1%. ChipPlusSign provided a positive, although small, gain in accuracy (0.9%). Assigning different variances for the SNPs slightly improved accuracies when using variances obtained from BayesR. However, increases were inconsistent across the lines and traits.
CONCLUSIONS The benefit of using sequence data depends on the line, the size of the genotyped population, and how the WGS variants are preselected. When WGS data are available on hundreds of thousands of animals, using sequence data presents an advantage but this remains limited in pigs.
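The Top40k preselection described in the abstract above (keep the SNP with the lowest GWAS p-value in each genomic window) can be illustrated with a minimal Python sketch. The function name, the fixed-width base-pair windows, and the single-chromosome input are assumptions made for illustration; the abstract does not say how the 40k windows were defined.

```python
def select_top_snp_per_window(positions, p_values, window_size):
    """Return sorted indices of the lowest-p-value SNP in each window.

    Assumption (not from the abstract): windows are fixed-width intervals
    of `window_size` base pairs on a single chromosome.
    """
    if len(positions) != len(p_values):
        raise ValueError("positions and p_values must have the same length")
    if window_size <= 0:
        raise ValueError("window_size must be positive")

    best = {}  # window index -> index of the best SNP seen so far
    for idx, (pos, p) in enumerate(zip(positions, p_values)):
        window = pos // window_size
        # Keep the first SNP seen in a window, or replace it if this one
        # has a strictly smaller p-value.
        if window not in best or p < p_values[best[window]]:
            best[window] = idx
    return sorted(best.values())


# Toy example: five SNPs falling into three 1,000-bp windows.
positions = [100, 900, 1500, 1800, 5200]
p_values = [0.04, 0.001, 0.2, 0.03, 0.5]
selected = select_top_snp_per_window(positions, p_values, 1000)
```

Windows that contain no SNP simply contribute nothing, so the number of selected SNPs can be smaller than the number of windows.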
Affiliation(s)
- Sungbong Jang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA.
- Roger Ros-Freixedes
- Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
- John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Easter Bush, Midlothian, Scotland, UK
- Ching-Yi Chen
- The Pig Improvement Company, Genus Plc, Hendersonville, TN, USA
- Justin Holl
- The Pig Improvement Company, Genus Plc, Hendersonville, TN, USA
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, 30602, USA
3
Jang S, Ros-Freixedes R, Hickey JM, Chen CY, Herring WO, Holl J, Misztal I, Lourenco D. Multi-line ssGBLUP evaluation using preselected markers from whole-genome sequence data in pigs. Front Genet 2023; 14:1163626. [PMID: 37252662 PMCID: PMC10213539 DOI: 10.3389/fgene.2023.1163626] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/11/2023] [Accepted: 05/03/2023] [Indexed: 05/31/2023] Open
Abstract
Genomic evaluations in pigs could benefit from using multi-line data along with whole-genome sequencing (WGS) if the data are large enough to represent the variability across populations. The objective of this study was to investigate strategies to combine large-scale data from different terminal pig lines in a multi-line genomic evaluation (MLE) through single-step GBLUP (ssGBLUP) models while including variants preselected from whole-genome sequence (WGS) data. We investigated single-line and multi-line evaluations for five traits recorded in three terminal lines. The number of sequenced animals in each line ranged from 731 to 1,865, with 60k to 104k imputed to WGS. Unknown parent groups (UPG) and metafounders (MF) were explored to account for genetic differences among the lines and improve the compatibility between pedigree and genomic relationships in the MLE. Sequence variants were preselected based on multi-line genome-wide association studies (GWAS) or linkage disequilibrium (LD) pruning. These preselected variant sets were used for ssGBLUP predictions without and with weights from BayesR, and the performances were compared to that of a commercial porcine single-nucleotide polymorphisms (SNP) chip. Using UPG and MF in MLE showed small to no gain in prediction accuracy (up to 0.02), depending on the lines and traits, compared to the single-line genomic evaluation (SLE). Likewise, adding selected variants from the GWAS to the commercial SNP chip resulted in a maximum increase of 0.02 in the prediction accuracy, only for average daily feed intake in the most numerous lines. In addition, no benefits were observed when using preselected sequence variants in multi-line genomic predictions. Weights from BayesR did not help improve the performance of ssGBLUP. This study revealed limited benefits of using preselected whole-genome sequence variants for multi-line genomic predictions, even when tens of thousands of animals had imputed sequence data. 
Correctly accounting for line differences with UPG or MF in MLE is essential to obtain predictions similar to SLE; however, the only observed benefit of an MLE is to have comparable predictions across lines. Further investigation into the amount of data and novel methods to preselect whole-genome causative variants in combined populations would be of significant interest.
Affiliation(s)
- Sungbong Jang
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
- Roger Ros-Freixedes
- Departament de Ciència Animal, Universitat de Lleida-Agrotecnio-CERCA Center, Lleida, Spain
- John M Hickey
- The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Edinburgh, Scotland, United Kingdom
- Ching-Yi Chen
- The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
- William O Herring
- The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
- Justin Holl
- The Pig Improvement Company, Genus plc, Hendersonville, TN, United States
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA, United States
4
5
Mancin E, Mota LFM, Tuliozi B, Verdiglione R, Mantovani R, Sartori C. Improvement of Genomic Predictions in Small Breeds by Construction of Genomic Relationship Matrix Through Variable Selection. Front Genet 2022; 13:814264. [PMID: 35664297 PMCID: PMC9158133 DOI: 10.3389/fgene.2022.814264] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/22/2022] [Indexed: 11/13/2022] Open
Abstract
Genomic selection has been increasingly implemented in the animal breeding industry, and it is becoming a routine method in many livestock breeding contexts. However, its use is still limited in several small-population local breeds, which are, nonetheless, an important source of genetic variability of great economic value. A major roadblock for their genomic selection is accuracy when population size is limited: to improve breeding value accuracy, variable selection models that assume heterogenous variance have been proposed over the last few years. However, while these models might outperform traditional and genomic predictions in terms of accuracy, they also carry a proportional increase of breeding value bias and dispersion. These mutual increases are especially striking when genomic selection is performed with a low number of phenotypes and high shrinkage value—which is precisely the situation that happens with small local breeds. In our study, we tested several alternative methods to improve the accuracy of genomic selection in a small population. First, we investigated the impact of using only a subset of informative markers regarding prediction accuracy, bias, and dispersion. We used different algorithms to select them, such as recursive feature eliminations, penalized regression, and XGBoost. We compared our results with the predictions of pedigree-based BLUP, single-step genomic BLUP, and weighted single-step genomic BLUP in different simulated populations obtained by combining various parameters in terms of number of QTLs and effective population size. We also investigated these approaches on a real data set belonging to the small local Rendena breed. Our results show that the accuracy of GBLUP in small-sized populations increased when performed with SNPs selected via variable selection methods both in simulated and real data sets. 
In addition, the use of variable selection models—especially those using XGBoost—in our real data set did not impact bias and the dispersion of estimated breeding values. We have discussed possible explanations for our results and how our study can help estimate breeding values for future genomic selection in small breeds.
Affiliation(s)
- Enrico Mancin
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Lucio Flavio Macedo Mota
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Beniamino Tuliozi
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Rina Verdiglione
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Roberto Mantovani
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
- Cristina Sartori
- Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padua, Legnaro, Italy
6
Cesarani A, Masuda Y, Tsuruta S, Nicolazzi EL, VanRaden PM, Lourenco D, Misztal I. Genomic predictions for yield traits in US Holsteins with unknown parent groups. J Dairy Sci 2021; 104:5843-5853. [PMID: 33663836 DOI: 10.3168/jds.2020-19789] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/14/2020] [Accepted: 12/18/2020] [Indexed: 11/19/2022]
Abstract
The objective of this study was to assess the reliability and bias of estimated breeding values (EBV) from traditional BLUP with unknown parent groups (UPG), genomic EBV (GEBV) from single-step genomic BLUP (ssGBLUP) with UPG for the pedigree relationship matrix (A) only (SS_UPG), and GEBV from ssGBLUP with UPG for both A and the relationship matrix among genotyped animals (A22; SS_UPG2) using 6 large phenotype-pedigree truncated Holstein data sets. The complete data included 80 million records for milk, fat, and protein yields from 31 million cows recorded since 1980. Phenotype-pedigree truncation scenarios included truncation of phenotypes for cows recorded before 1990 and 2000 combined with truncation of pedigree information after 2 or 3 ancestral generations. A total of 861,525 genotyped bulls with progeny and cows with phenotypic records were used in the analyses. Reliability and bias (inflation/deflation) of GEBV were obtained for 2,710 bulls based on deregressed proofs, and on 381,779 cows born after 2014 based on predictivity (adjusted cow phenotypes). The BLUP reliabilities for young bulls varied from 0.29 to 0.30 across traits and were unaffected by data truncation and number of generations in the pedigree. Reliabilities ranged from 0.54 to 0.69 for SS_UPG and were slightly affected by phenotype-pedigree truncation. Reliabilities ranged from 0.69 to 0.73 for SS_UPG2 and were unaffected by phenotype-pedigree truncation. The regression coefficient of bull deregressed proofs on (G)EBV (i.e., GEBV and EBV) ranged from 0.86 to 0.90 for BLUP, from 0.77 to 0.94 for SS_UPG, and was 1.00 ± 0.03 for SS_UPG2. Cow predictivity ranged from 0.22 to 0.28 for BLUP, 0.48 to 0.51 for SS_UPG, and 0.51 to 0.54 for SS_UPG2. The highest cow predictivities for BLUP were obtained with the most extreme truncation, whereas for SS_UPG2, cow predictivities were also unaffected by phenotype-pedigree truncations. 
The regression coefficient of cow predictivities on (G)EBV was 1.02 ± 0.02 for SS_UPG2 with the most extreme truncation, which indicated the least biased predictions. Computations with the complete data set took 17 h with BLUP, 58 h with SS_UPG, and 23 h with SS_UPG2. The same computations with the most extreme phenotype-pedigree truncation took 7, 36, and 15 h, respectively. The SS_UPG2 converged in fewer rounds than BLUP, whereas SS_UPG took up to twice as many rounds. Thus, the ssGBLUP with UPG assigned to both A and A22 provided accurate and unbiased evaluations, regardless of phenotype-pedigree truncation scenario. Old phenotypes (before 2000 in this data set) did not affect the reliability of predictions for young selection candidates, especially in SS_UPG2.
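The bias (inflation/deflation) measure used in the abstract above is a regression coefficient of a benchmark, such as deregressed proofs, on (G)EBV, where a slope near 1 indicates little inflation or deflation. The following is a minimal Python sketch of that slope as an ordinary least-squares coefficient; the function name and the toy numbers are hypothetical, and the study's actual validation pipeline is not reproduced here.

```python
def regression_slope(benchmark, predictions):
    """OLS slope of `benchmark` (e.g. deregressed proofs) on `predictions`
    (e.g. GEBV): cov(benchmark, predictions) / var(predictions)."""
    if len(benchmark) != len(predictions):
        raise ValueError("inputs must have the same length")
    n = len(predictions)
    if n < 2:
        raise ValueError("need at least two observations")

    mean_x = sum(predictions) / n
    mean_y = sum(benchmark) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(predictions, benchmark))
    sxx = sum((x - mean_x) ** 2 for x in predictions)
    if sxx == 0.0:
        raise ValueError("predictions have zero variance")
    return sxy / sxx


# Hypothetical toy data: the benchmark varies twice as much as the
# predictions, so the slope is 2 (predictions are deflated).
gebv = [1.0, 2.0, 3.0, 4.0]
benchmark = [2.0, 4.0, 6.0, 8.0]
slope = regression_slope(benchmark, gebv)
```

A slope below 1 would instead point to inflated predictions, which is the direction reported for BLUP and SS_UPG in the abstract.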
Affiliation(s)
- A Cesarani
- Department of Animal and Dairy Science, University of Georgia, Athens 30602.
- Y Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- S Tsuruta
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- P M VanRaden
- Animal Genomics and Improvement Laboratory, Agricultural Research Service, USDA, Beltsville, MD 20705-2350
- D Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
- I Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens 30602
7
Lourenco D, Legarra A, Tsuruta S, Masuda Y, Aguilar I, Misztal I. Single-Step Genomic Evaluations from Theory to Practice: Using SNP Chips and Sequence Data in BLUPF90. Genes (Basel) 2020; 11:E790. [PMID: 32674271 PMCID: PMC7397237 DOI: 10.3390/genes11070790] [Citation(s) in RCA: 66] [Impact Index Per Article: 13.2] [Received: 06/19/2020] [Revised: 07/03/2020] [Accepted: 07/06/2020] [Indexed: 11/16/2022] Open
Abstract
Single-step genomic evaluation became a standard procedure in livestock breeding, and the main reason is the ability to combine all pedigree, phenotypes, and genotypes available into one single evaluation, without the need of post-analysis processing. Therefore, the incorporation of data on genotyped and non-genotyped animals in this method is straightforward. Since 2009, two main implementations of single-step were proposed. One is called single-step genomic best linear unbiased prediction (ssGBLUP) and uses single nucleotide polymorphism (SNP) to construct the genomic relationship matrix; the other is the single-step Bayesian regression (ssBR), which is a marker effect model. Under the same assumptions, both models are equivalent. In this review, we focus solely on ssGBLUP. The implementation of ssGBLUP into the BLUPF90 software suite was done in 2009, and since then, several changes were made to make ssGBLUP flexible to any model, number of traits, number of phenotypes, and number of genotyped animals. Single-step GBLUP from the BLUPF90 software suite has been used for genomic evaluations worldwide. In this review, we will show theoretical developments and numerical examples of ssGBLUP using SNP data from regular chips to sequence data.
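As the abstract above notes, ssGBLUP builds a genomic relationship matrix from SNP genotypes. One widely used construction (VanRaden's first method) centers each SNP by twice its allele frequency and scales by 2Σp(1−p). The following is a minimal Python sketch under that assumption; the function name is hypothetical, allele frequencies are estimated from the sample itself, and the blending, tuning, and scaling steps that production software may apply are deliberately left out.

```python
def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)) for genotypes coded 0/1/2.

    `genotypes` is a list of rows (one per animal), each a list of SNP
    allele counts. Allele frequencies p_j are estimated from the sample
    (an assumption made for this sketch).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    if any(len(row) != n_snps for row in genotypes):
        raise ValueError("all animals must have the same number of SNPs")

    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic")

    # Z: genotypes centered by twice the allele frequency.
    centered = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)]
                for row in genotypes]
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centered]
            for zi in centered]


# Toy example: three animals genotyped at three SNPs.
G = genomic_relationship_matrix([[0, 1, 2],
                                 [1, 1, 0],
                                 [2, 1, 1]])
```

The matrix is symmetric by construction, and because each SNP column is centered with sample frequencies, every row of G sums to zero in this sketch.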
Affiliation(s)
- Daniela Lourenco
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Andres Legarra
- Institut National de la Recherche Agronomique, UMR1388 GenPhySE, 31326 Castanet Tolosan, France
- Shogo Tsuruta
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Yutaka Masuda
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
- Ignacio Aguilar
- Instituto Nacional de Investigación Agropecuaria (INIA), 11500 Montevideo, Uruguay
- Ignacy Misztal
- Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA