1. Vinje H, Brustad HK, Heggli A, Sevillano CA, Van Son M, Gangsei LE. Classification of breed combinations for slaughter pigs based on genotypes-modeling DNA samples of crossbreeds as fuzzy sets from purebred founders. Front Genet 2023; 14:1289130. [PMID: 38116292; PMCID: PMC10729766; DOI: 10.3389/fgene.2023.1289130] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Accepted: 11/07/2023] [Indexed: 12/21/2023] [Open access]
Abstract
In pig production, the production animals are generally three- or four-way crossbreeds. Reliable information regarding the breed of origin of slaughtered pigs is useful, even a prerequisite, for a number of purposes, e.g., evaluating potential breed effects on carcass grading. Genetic data from slaughtered pigs can easily be extracted and used for crossbreed classification. In the current study, four classification methods, namely, random forest (RF), ADMIXTURE, partial least squares regression (PLSR), and partial least squares together with quadratic discriminant analysis (PLS-QDA) were evaluated on simulated (n = 7,500) genomic data of crossbreeds. The derivation of the theory behind PLS-QDA is a major part of the current study, whereas RF and ADMIXTURE are known and well-described in the literature. Classification success (CS) rate, square loss (SL), and Kullback-Leibler (KL) divergence loss for the simulated data were used to compare methods. Overall, PLS-QDA performed best with 99%/0.0018/0.002 (CS/SL/KL) vs. 97%/0.0084/0.051, 97%/0.0087/0.0623, and 17%/0.068/0.39 for PLSR, ADMIXTURE, and RF, respectively. PLS-QDA and ADMIXTURE, as the most relevant methods, were used on a real dataset (n = 1,013) from Norway where the two largest classes contained 532 and 192 (PLS-QDA), and 531 and 193 (ADMIXTURE) individuals, respectively. These two classes were expected to be dominating a priori. The Bayesian nature of PLS-QDA enables inclusion of desirable features such as a separate class "unknown breed combination" and informative priors for crossbreeds, making this a preferable method for the classification of breed combination in the industry.
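The three comparison metrics named in the abstract (CS, SL, KL) can be illustrated with a minimal sketch. It assumes each animal has a true and a predicted probability vector over breed combinations, and that per-animal losses are averaged. The function names and the exact loss definitions are assumptions for illustration and may differ from those used in the paper.

```python
import math

def square_loss(p, q):
    # Sum of squared differences between true (p) and predicted (q) vectors.
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    # Kullback-Leibler divergence KL(p || q); terms with p_i = 0 contribute 0.
    # eps guards against log of zero when q_i = 0 (an assumption of this sketch).
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def classification_success(p, q):
    # True when the most probable class agrees between truth and prediction.
    best_true = max(range(len(p)), key=p.__getitem__)
    best_pred = max(range(len(q)), key=q.__getitem__)
    return best_true == best_pred

def evaluate(truth, pred):
    # Average the three metrics over all animals.
    n = len(truth)
    cs = sum(classification_success(p, q) for p, q in zip(truth, pred)) / n
    sl = sum(square_loss(p, q) for p, q in zip(truth, pred)) / n
    kl = sum(kl_divergence(p, q) for p, q in zip(truth, pred)) / n
    return cs, sl, kl

if __name__ == "__main__":
    truth = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    pred = [[0.9, 0.1, 0.0], [0.6, 0.4, 0.0]]
    print(evaluate(truth, pred))
```

A higher CS and lower SL and KL indicate better agreement, which matches the direction of the comparison reported in the abstract.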
Affiliation(s)
- H. Vinje
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
- H. K. Brustad
  - Oslo Center of Biostatistics and Epidemiology, Oslo University Hospital, Oslo, Norway
- A. Heggli
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
  - Animalia AS, Oslo, Norway
- L. E. Gangsei
  - Faculty of Chemistry, Biotechnology and Food Science, Norwegian University of Life Sciences, Ås, Norway
  - Animalia AS, Oslo, Norway
2. Gao Z, Zhang Y, Li Z, Zeng Q, Yang F, Song Y, Song Y, He J. Genomic breed composition of Ningxiang pig via different SNP panels. J Anim Physiol Anim Nutr (Berl) 2021; 106:783-791. [PMID: 34260785; DOI: 10.1111/jpn.13603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2021] [Revised: 06/17/2021] [Accepted: 06/21/2021] [Indexed: 11/30/2022]
Abstract
The genomic breed composition (GBC) reflects the genetic relationship between an individual animal and its ancestral breeds in composite or hybrid breeds, and it estimates the genomic contribution of each ancestral breed to the genome of each individual animal. Using genomic SNP information to estimate the GBC of Ningxiang pigs is of great significance. GBC has been widely and effectively used in cattle, but there is almost no experience of using it in Chinese endemic pig breeds. High-density SNP panels are expensive, but costs can be reduced by deploying a relatively small number of highly informative SNPs scattered evenly across the genome. Moreover, the impact of the low-density SNP selection strategy on estimating the GBC of individual animals has not been fully explained. Using SNP data from different databases and organizations, we established reference (N = 2015) and verification (N = 302) data sets. Twelve SNP panels (four sizes: 500, 1K, 5K, and 10K) were built from the SNPs in the reference data by three selection methods (uniform distribution, maximized Euclidean distance (MED), and random distribution). For each panel, the GBC of Ningxiang pigs in the reference data set was estimated. Then, by combining Shannon entropy with the GBC results, the optimal panel (the 10K SNP panel constructed by the MED method) was selected to estimate the GBC of the verification Ningxiang pigs. This analysis identified 230 individuals as purebred Ningxiang pigs; the remaining 72 impure individuals contained 6.44% blood related to Rongchang pigs and 4.09% to Bamaxiang pigs. Finally, the genetic structure of the verification population was analyzed by combining the GBC results with multi-dimensional scaling (MDS) analysis and hierarchical cluster analysis. These results showed that (a) GBC could accurately identify purebred Ningxiang pigs and calculate the genomic contribution of each breed to each hybrid animal, and (b) GBC could be used to analyze population genetic structure and understand the genetic background of Ningxiang pigs. Such findings highlight opportunities to better protect and identify other endangered local breeds in China facing the same situation as the Ningxiang pig, and provide more accurate, economical, and efficient technical support for GBC estimation in breeding work.
Affiliation(s)
- Zhendong Gao
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Yuebo Zhang
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Zhi Li
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Qinhua Zeng
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Fang Yang
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Yuexiang Song
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Yukun Song
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
- Jun He
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, China
3. Estimating breed composition for pigs: A case study focused on Mangalitsa pigs and two methods. Livest Sci 2021. [DOI: 10.1016/j.livsci.2021.104398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
4. Li Z, Wu XL, Guo W, He J, Li H, Rosa GJM, Gianola D, Tait RG, Parham J, Genho J, Schultz T, Bauck S. Estimation of genomic breed composition of individual animals in composite beef cattle. Anim Genet 2020; 51:457-460. [PMID: 32239777; DOI: 10.1111/age.12928] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Accepted: 02/10/2020] [Indexed: 02/01/2023]
Abstract
Three statistical models (an admixture model, linear regression, and ridge-regression BLUP) and two strategies for selecting SNP panels (uniformly spaced vs. maximum Euclidean distance of SNP allele frequencies between ancestral breeds) were compared for estimating genomic-estimated breed composition (GBC) in Brangus and Santa Gertrudis cattle. Animals were genotyped with a GeneSeek Genomic Profiler bovine low-density version 4 SNP chip. The estimated GBC was consistent among the uniformly spaced SNP panels, and values were similar among the three models. However, estimated GBC varied considerably among the three models when using fewer than 10 000 SNPs that maximized the Euclidean distance of allele frequencies between the ancestral breeds. The admixture model performed most consistently across various SNP panel sizes. For the other two models, stabilized estimates were obtained with an SNP panel size of 20 000 SNPs or more. Based on the uniformly spaced 20K SNP panel, the estimated GBC was 69.8-70.5% Angus and 29.5-30.2% Brahman for Brangus, and 63.9-65.3% Shorthorn and 34.7-36.1% Brahman for Santa Gertrudis. The estimated GBC of ancestries for Santa Gertrudis roughly agreed with the pedigree-expected values. However, the estimated GBC in Brangus showed a considerably larger Angus composition than the pedigree-expected value (62.5%). The elevated Angus composition in the Brangus could be due to the mixture of some 1/2 Ultrablack animals (Brangus × Angus). Another reason could be the consequences of selection in Brangus cattle for phenotypes where the Angus breed has advantages.
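The two panel-selection strategies compared in this abstract can be sketched as follows. This is a minimal illustration, assuming a table of reference-allele frequencies with one row per SNP (ordered by genomic position) and one column per ancestral breed. The helper names `med_score`, `select_med`, and `select_uniform` are hypothetical, and the scoring shown (Euclidean distance over all breed pairs) is one plausible reading of "maximum Euclidean distance", not necessarily the authors' exact criterion.

```python
import math
from itertools import combinations

def med_score(freqs):
    # Euclidean distance of one SNP's allele frequencies across all breed pairs.
    return math.sqrt(sum((a - b) ** 2 for a, b in combinations(freqs, 2)))

def select_med(freq_table, k):
    # Keep the k SNPs whose allele frequencies differ most between breeds.
    ranked = sorted(range(len(freq_table)),
                    key=lambda i: med_score(freq_table[i]),
                    reverse=True)
    return sorted(ranked[:k])

def select_uniform(n_snps, k):
    # Keep k SNPs at evenly spaced positions in the ordered SNP list.
    step = n_snps / k
    return [int(i * step) for i in range(k)]

if __name__ == "__main__":
    # Rows are SNPs, columns are two ancestral breeds (made-up frequencies).
    freq_table = [[0.1, 0.9], [0.5, 0.5], [0.2, 0.3], [0.0, 1.0]]
    print(select_med(freq_table, 2))
    print(select_uniform(10, 5))
```

The distance-based panel concentrates on highly differentiated markers, while the uniform panel ignores allele frequencies entirely, which is one way to see why estimates from the two strategies can diverge at small panel sizes.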
Affiliation(s)
- Z Li
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
  - Department of Animal Science, University of Wyoming, Laramie, WY, 82071, USA
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, Hunan, 410128, China
- X-L Wu
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
  - Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- W Guo
  - Department of Animal Science, University of Wyoming, Laramie, WY, 82071, USA
- J He
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
  - College of Animal Science and Technology, Hunan Agricultural University, Changsha, Hunan, 410128, China
- H Li
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
  - Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- G J M Rosa
  - Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- D Gianola
  - Department of Animal Sciences, University of Wisconsin, Madison, WI, 53706, USA
- R G Tait
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- J Parham
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- J Genho
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- T Schultz
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
- S Bauck
  - Biostatistics and Bioinformatics, Neogen GeneSeek, Lincoln, NE, 68504, USA
5. Martínez CA, Khare K, Elzo MA. BIBI: Bayesian inference of breed composition. J Anim Breed Genet 2017; 135:54-61. [PMID: 29164684; DOI: 10.1111/jbg.12305] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 04/18/2017] [Accepted: 10/24/2017] [Indexed: 11/28/2022]
Abstract
The aim of this paper was to develop statistical models to estimate individual breed composition based on the previously proposed idea of regressing discrete random variables corresponding to counts of reference alleles of biallelic molecular markers located across the genome on the allele frequencies of each marker in the pure (base) breeds. Some of the existing regression-based methods do not guarantee that estimators of breed composition will lie in the appropriate parameter space, and none of them account for uncertainty about allele frequencies in the pure breeds, that is, uncertainty about the design matrix. To overcome these limitations, we proposed two Bayesian generalized linear models. For each individual, both models assume that the counts of the reference allele at each marker locus follow independent Binomial distributions, use the logit link, and place a Dirichlet prior over the vector of regression coefficients (which corresponds to breed composition). This prior guarantees that point estimators of breed composition, such as the posterior mean, lie in the appropriate space. The difference between these models is that the model termed BIBI does not account for uncertainty about the design matrix, while the model termed BIBI2 accounts for such uncertainty by assigning independent Beta priors to the entries of this matrix. We applied these models to a data set from the University of Florida's multibreed Angus-Brahman population. Posterior means were used as point estimators of breed composition. In addition, the ordinary least squares estimator proposed by Kuehn et al. (OLSK) was also computed. BIBI and BIBI2 estimated breed composition more accurately than OLSK, and BIBI2 had a 7.69% improvement in accuracy as compared to BIBI.
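The regression idea that this abstract builds on, and the parameter-space problem it mentions, can be illustrated with a deliberately simplified two-breed sketch. It assumes the expected reference-allele count at marker j is 2(b·f_Aj + (1 − b)·f_Bj), where b is the fraction of breed A, and solves for b by ordinary least squares. This is neither the BIBI model nor the OLSK estimator, and the function names are hypothetical.

```python
def ols_two_breed_fraction(counts, freq_a, freq_b):
    # Least-squares estimate of b in: count_j / 2 - f_Bj = b * (f_Aj - f_Bj).
    num = 0.0
    den = 0.0
    for y, fa, fb in zip(counts, freq_a, freq_b):
        d = fa - fb
        num += d * (y / 2.0 - fb)
        den += d * d
    if den == 0.0:
        raise ValueError("breeds have identical allele frequencies at all markers")
    return num / den

def clip_to_unit_interval(b):
    # Ad hoc repair that forces the estimate back into [0, 1].
    return min(1.0, max(0.0, b))

if __name__ == "__main__":
    freq_a = [0.9, 0.1, 0.8, 0.2]   # made-up breed A allele frequencies
    freq_b = [0.1, 0.9, 0.2, 0.8]   # made-up breed B allele frequencies
    counts = [2, 0, 2, 0]           # genotype of one animal (reference-allele counts)
    b = ols_two_breed_fraction(counts, freq_a, freq_b)
    print(b, clip_to_unit_interval(b))
```

With these made-up inputs the unconstrained estimate exceeds 1, which is the kind of out-of-range result that a Dirichlet prior over the composition vector rules out by construction.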
Affiliation(s)
- C A Martínez
  - Department of Animal Sciences, University of Florida, Gainesville, FL, USA
- K Khare
  - Department of Statistics, University of Florida, Gainesville, FL, USA
- M A Elzo
  - Department of Animal Sciences, University of Florida, Gainesville, FL, USA
6. Funkhouser SA, Bates RO, Ernst CW, Newcom D, Steibel JP. Estimation of genome-wide and locus-specific breed composition in pigs. Transl Anim Sci 2017; 1:36-44. [PMID: 32704628; PMCID: PMC7235465; DOI: 10.2527/tas2016.0003] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/27/2016] [Accepted: 10/18/2016] [Indexed: 11/26/2022] [Open access]
Abstract
Advances in pig genomic technologies enable implementation of new methods to estimate breed composition, allowing innovative and efficient ways to evaluate and ensure breed and line background. Existing methods to test for homozygosity at key loci involve test mating the animal in question and observing phenotypic patterns among offspring, requiring extensive resources. In this study, whole-genome pig DNA microarray data from over 8,000 SNPs were used to profile the composition of U.S. registered purebred pigs using a refined linear regression method that enhances the interpretation of coefficients. In a simulation analysis, a strong correlation between true and estimated breed composition was observed (R² = 0.94). Applying these methods to 930 Yorkshire animals registered with the National Swine Registry, 95% were estimated to have a “genome-wide” Yorkshire breed composition of at least 0.825 or 82.5%, with similar performance for evaluating datasets of registered Duroc (n = 88), Landrace (n = 129), and Hampshire (n = 17) breeds. We also developed new methods to evaluate locus-based breed probabilities. Such methods have been applied to multi-locus SNP genotypes flanking the KIT gene, known to predominantly control coat color, thereby inferring the probability that an animal has haplotypes in the KIT region that are predominant in white breeds. These methods have been adopted by the National Swine Registry as a means to identify purebred Yorkshire animals.
Affiliation(s)
- Scott A Funkhouser
  - Genetics Graduate Program, Michigan State University, East Lansing 48824
- Ronald O Bates
  - Department of Animal Science, Michigan State University, East Lansing 48824
- Catherine W Ernst
  - Department of Animal Science, Michigan State University, East Lansing 48824
- Doug Newcom
  - National Swine Registry, West Lafayette, IN 47906
- Juan Pedro Steibel
  - Department of Animal Science, Michigan State University, East Lansing 48824