Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Gonen S, Ros-Freixedes R, Battagin M, Gorjanc G, Hickey JM. A method for the allocation of sequencing resources in genotyped livestock populations. Genet Sel Evol 2017;49:47. [PMID: 28521728 PMCID: PMC5437657 DOI: 10.1186/s12711-017-0322-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 08/10/2016] [Accepted: 05/12/2017] [Indexed: 11/18/2022]

Cited by Other Article(s)

Zheng M, Liao J, Li Z, Xu Z, Jiang Z, Tan L, Fu R, Xu H, Li Z, Zhang X, Nie Q. Evaluation of the selection of key individuals for genotype imputation in Chinese yellow-feathered chicken. Poult Sci 2023;102:102901. [PMID: 37499612 PMCID: PMC10393784 DOI: 10.1016/j.psj.2023.102901] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/09/2023] [Revised: 06/02/2023] [Accepted: 06/24/2023] [Indexed: 07/29/2023]

Affiliation(s)

Ming Zheng, Jiahao Liao, Zhuohang Li, Liangtian Tan, Rong Fu, Haiping Xu, Zhenhui Li, Xiquan Zhang, Qinghua Nie: Lingnan Guangdong Laboratory of Modern Agriculture, College of Animal Science, South China Agricultural University, Guangzhou 510642, Guangdong, China; Guangdong Provincial Key Lab of Agro-Animal Genomics and Molecular Breeding, and Key Laboratory of Chicken Genetics, Breeding and Reproduction, Ministry of Agriculture, Guangzhou 510642, Guangdong, China; State Key Laboratory of Livestock and Poultry Breeding, South China Agricultural University, Guangzhou 510642, Guangdong, China.
Zhenqiang Xu, Ziqin Jiang: Guangdong Wens Nanfang Poultry Breeding Co., Ltd., Xinxing 527439, China.

Ros-Freixedes R, Johnsson M, Whalen A, Chen CY, Valente BD, Herring WO, Gorjanc G, Hickey JM. Genomic prediction with whole-genome sequence data in intensely selected pig lines. Genet Sel Evol 2022;54:65. [PMID: 36153511 PMCID: PMC9509613 DOI: 10.1186/s12711-022-00756-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/28/2022] [Accepted: 09/05/2022] [Indexed: 12/03/2022]

Abstract

Background

Early simulations indicated that whole-genome sequence data (WGS) could improve the accuracy of genomic predictions within and across breeds. However, empirical results have been ambiguous so far. Large datasets that capture most of the genomic diversity in a population must be assembled so that allele substitution effects are estimated with high accuracy. The objectives of this study were to use a large pig dataset from seven intensely selected lines to assess the benefits of using WGS for genomic prediction compared to using commercial marker arrays and to identify scenarios in which WGS provides the largest advantage.

Methods

We sequenced 6931 individuals from seven commercial pig lines with different numerical sizes. Genotypes of 32.8 million variants were imputed for 396,100 individuals (17,224 to 104,661 per line). We used BayesR to perform genomic prediction for eight complex traits. Genomic predictions were performed using either data from a standard marker array or variants preselected from WGS based on association tests.

Results

The accuracies of genomic predictions based on preselected WGS variants were not robust across traits and lines and the improvements in prediction accuracy that we achieved so far with WGS compared to standard marker arrays were generally small. The most favourable results for WGS were obtained when the largest training sets were available and standard marker arrays were augmented with preselected variants with statistically significant associations to the trait. With this method and training sets of around 80k individuals, the accuracy of within-line genomic predictions was on average improved by 0.025. With multi-line training sets, improvements of 0.04 compared to marker arrays could be expected.

Conclusions

Our results showed that WGS has limited potential to improve the accuracy of genomic predictions compared to marker arrays in intensely selected pig lines. Thus, although we expect that larger improvements in accuracy from the use of WGS are possible with a combination of larger training sets and optimised pipelines for generating and analysing such datasets, the use of WGS in the current implementations of genomic prediction should be carefully evaluated against the cost of large-scale WGS data on a case-by-case basis.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12711-022-00756-0.

Rare and population-specific functional variation across pig lines. Genet Sel Evol 2022;54:39. [PMID: 35659233 PMCID: PMC9164375 DOI: 10.1186/s12711-022-00732-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/31/2022] [Accepted: 05/17/2022] [Indexed: 01/09/2023]

Abstract

BACKGROUND

It is expected that functional, mainly missense and loss-of-function (LOF), and regulatory variants are responsible for most phenotypic differences between breeds and genetic lines of livestock species that have undergone diverse selection histories. However, there is still limited knowledge about the existing missense and LOF variation in commercial livestock populations, in particular regarding population-specific variation and how it can affect applications such as across-breed genomic prediction.

METHODS

We re-sequenced the whole genome of 7848 individuals from nine commercial pig lines (average sequencing coverage: 4.1×) and imputed whole-genome genotypes for 440,610 pedigree-related individuals. The called variants were categorized according to predicted functional annotation (from LOF to intergenic) and prevalence level (number of lines in which the variant segregated; from private to widespread). Variants in each category were examined in terms of their distribution along the genome, alternative allele frequency, per-site Wright's fixation index (F_ST), individual load, and association to production traits.

RESULTS

Of the 46 million called variants, 28% were private (called in only one line) and 21% were widespread (called in all nine lines). Genomic regions with a low recombination rate were enriched with private variants. Low-prevalence variants (called in one or a few lines only) were enriched for lower allele frequencies, lower F_ST, and putatively functional and regulatory roles (including LOF and deleterious missense variants). On average, individuals carried fewer private deleterious missense alleles than expected compared to alleles with other predicted consequences. Only a small subset of the low-prevalence variants had intermediate allele frequencies and explained small fractions of phenotypic variance (up to 3.2%) of production traits. The significant low-prevalence variants had higher per-site F_ST than the non-significant ones. These associated low-prevalence variants were tagged by other more widespread variants in high linkage disequilibrium, including intergenic variants.

CONCLUSIONS

Most low-prevalence variants have low minor allele frequencies and only a small subset of low-prevalence variants contributed detectable fractions of phenotypic variance of production traits. Accounting for low-prevalence variants is therefore unlikely to noticeably benefit across-breed analyses, such as the prediction of genomic breeding values in a population using reference populations of a different genetic background.

Dauben CM, Große-Brinkhaus C, Heuß EM, Henne H, Tholen E. Comparison of the choice of animals for re-sequencing in two maternal pig lines. Genet Sel Evol 2022;54:16. [PMID: 35183111 PMCID: PMC8858453 DOI: 10.1186/s12711-022-00706-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2021] [Accepted: 01/31/2022] [Indexed: 11/10/2022]

Abstract

Next-generation sequencing is a promising approach for the detection of causal variants within previously identified quantitative trait loci. Because of the costs of re-sequencing experiments, this application is currently mainly restricted to subsets of animals from already genotyped populations. Imputation from a lower to a higher marker density could represent a useful complementary approach. An analysis of the literature shows that several strategies are available to select animals for re-sequencing. This study demonstrates an animal selection workflow under practical conditions. Our approach considers different data sources and limited resources such as budget and availability of sampling material. The workflow combines previously described approaches and makes use of genotype and pedigree information from a Landrace and Large White population. Genotypes were phased and haplotypes were accurately estimated with AlphaPhase. Then, AlphaSeqOpt was used to optimize selection of animals for re-sequencing, reflecting the existing diversity of haplotypes. AlphaSeqOpt and ENDOG were used to select individuals based on pedigree information and by taking into account key animals that represent the genetic diversity of the populations. After the best selection criteria were determined, a subset of 57 animals was selected for subsequent re-sequencing. To evaluate the advantage of this procedure, imputation accuracy was assessed by setting a set of single nucleotide polymorphism (SNP) chip genotypes to missing. Accuracy values were compared to those of alternative selection scenarios and the results showed the clear benefits of a targeted selection within this practice-driven approach. Imputation of low-frequency markers in particular benefits from the combined approach described here. Accuracy was increased by up to 12% compared to a randomized or exclusively haplotype-based selection of sequencing candidates.

Cheng H, Xu K, Li J, Abraham KJ. Optimizing Sequencing Resources in Genotyped Livestock Populations Using Linear Programming. Front Genet 2021;12:740340. [PMID: 34745214 PMCID: PMC8570094 DOI: 10.3389/fgene.2021.740340] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2021] [Accepted: 09/20/2021] [Indexed: 11/13/2022]

da Silva ÉDB, Xavier A, Faria MV. Impact of Genomic Prediction Model, Selection Intensity, and Breeding Strategy on the Long-Term Genetic Gain and Genetic Erosion in Soybean Breeding. Front Genet 2021;12:637133. [PMID: 34539725 PMCID: PMC8440908 DOI: 10.3389/fgene.2021.637133] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 12/02/2020] [Accepted: 08/05/2021] [Indexed: 11/21/2022]

Abstract

Genomic-assisted breeding has become an important tool in soybean breeding. However, the impact of different genomic selection (GS) approaches on short- and long-term gains is not well understood. Such gains are conditional on the breeding design and may vary with a combination of the prediction model, family size, selection strategies, and selection intensity. To address these open questions, we evaluated various scenarios through a simulated closed soybean breeding program over 200 breeding cycles. Genomic prediction was performed using genomic best linear unbiased prediction (GBLUP), Bayesian methods, and random forest, benchmarked against selection on phenotypic values, true breeding values (TBV), and random selection. Breeding strategies included selections within family (WF), across family (AF), and within pre-selected families (WPSF), with selection intensities of 2.5, 5.0, 7.5, and 10.0%. Selections were performed at the F4 generation, where individuals were phenotyped and genotyped with a 6K single nucleotide polymorphism (SNP) array. Initial genetic parameters for the simulation were estimated from the SoyNAM population. WF selections provided the most significant long-term genetic gains. GBLUP and Bayesian methods outperformed random forest and provided most of the genetic gains within the first 100 generations, being outperformed by phenotypic selection after generation 100. All methods provided similar performances under WPSF selections. A faster decay in genetic variance was observed when individuals were selected AF and WPSF, as 80% of the genetic variance was depleted within 28-58 cycles, whereas WF selections preserved the variance up to cycle 184. Surprisingly, the selection intensity had less impact on long-term gains than did the breeding strategies. The study supports that genetic gains can be optimized in the long term with specific combinations of prediction models, family size, selection strategies, and selection intensity. 
A combination of strategies may be necessary for balancing the short-, medium-, and long-term genetic gains in breeding programs while preserving the genetic variance.

Akdemir D, Rio S, Isidro y Sánchez J. TrainSel: An R Package for Selection of Training Populations. Front Genet 2021;12:655287. [PMID: 34025720 PMCID: PMC8138169 DOI: 10.3389/fgene.2021.655287] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 01/18/2021] [Accepted: 03/31/2021] [Indexed: 01/01/2023]

Nosková A, Bhati M, Kadri NK, Crysnanto D, Neuenschwander S, Hofer A, Pausch H. Characterization of a haplotype-reference panel for genotyping by low-pass sequencing in Swiss Large White pigs. BMC Genomics 2021;22:290. [PMID: 33882824 PMCID: PMC8061004 DOI: 10.1186/s12864-021-07610-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 03/11/2021] [Accepted: 04/13/2021] [Indexed: 12/30/2022]

Abstract

BACKGROUND

The key-ancestor approach has been frequently applied to prioritize individuals for whole-genome sequencing based on their marginal genetic contribution to current populations. Using this approach, we selected 70 key ancestors from two lines of the Swiss Large White breed that have been selected divergently for fertility and fattening traits and sequenced their genomes with short paired-end reads.

RESULTS

Using pedigree records, we estimated the effective population size of the dam and sire line to be 72 and 44, respectively. In order to assess sequence variation in both lines, we sequenced the genomes of 70 boars at an average coverage of 16.69-fold. The boars explained 87.95 and 95.35% of the genetic diversity of the breeding populations of the dam and sire line, respectively. Reference-guided variant discovery using the GATK revealed 26,862,369 polymorphic sites. Principal component, admixture and fixation index (F_ST) analyses indicated considerable genetic differentiation between the lines. Genomic inbreeding quantified using runs of homozygosity was higher in the sire line than in the dam line (0.28 vs 0.26). Using two complementary approaches, we detected 51 signatures of selection. However, only six signatures of selection overlapped between both lines. We used the sequenced haplotypes of the 70 key ancestors as a reference panel to call 22,618,811 genotypes in 175 pigs that had been sequenced at very low coverage (1.11-fold) using the GLIMPSE software. The genotype concordance, non-reference sensitivity and non-reference discrepancy between the genotypes thus inferred and those called with the Illumina PorcineSNP60 BeadChip were 97.60, 98.73 and 3.24%, respectively. The low-pass sequencing-derived genomic relationship coefficients were highly correlated (r > 0.99) with those obtained from microarray genotyping.

CONCLUSIONS

We assessed genetic diversity within and between two lines of the Swiss Large White pig breed. Our analyses revealed considerable differentiation, even though the split into two populations occurred only a few generations ago. The sequenced haplotypes of the key ancestor animals enabled us to implement genotyping by low-pass sequencing, which offers an intriguing cost-effective approach to increase the variant density over current array-based genotyping by more than 350-fold.

Ros-Freixedes R, Whalen A, Gorjanc G, Mileham AJ, Hickey JM. Evaluation of sequencing strategies for whole-genome imputation with hybrid peeling. Genet Sel Evol 2020;52:18. [PMID: 32248818 PMCID: PMC7132986 DOI: 10.1186/s12711-020-00537-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 10/29/2019] [Accepted: 03/27/2020] [Indexed: 11/26/2022]

Abstract

BACKGROUND

For assembling large whole-genome sequence datasets for routine use in research and breeding, the sequencing strategy should be adapted to the methods that will be used later for variant discovery and imputation. In this study, we used simulation to explore the impact that the sequencing strategy and level of sequencing investment have on the overall accuracy of imputation using hybrid peeling, a pedigree-based imputation method that is well suited for large livestock populations.

METHODS

We simulated marker array and whole-genome sequence data for 15 populations with simulated or real pedigrees that had different structures. In these populations, we evaluated the effect on imputation accuracy of seven methods for selecting which individuals to sequence, the generation of the pedigree to which the sequenced individuals belonged, the use of variable or uniform coverage, and the trade-off between the number of sequenced individuals and their sequencing coverage. For each population, we considered four levels of investment in sequencing that were proportional to the size of the population.

RESULTS

Imputation accuracy depended greatly on pedigree depth. The distribution of the sequenced individuals across the generations of the pedigree underlay the performance of the different methods used to select individuals to sequence and it was critical for achieving high imputation accuracy in both early and late generations. Imputation accuracy was highest with a uniform coverage across the sequenced individuals of 2× rather than variable coverage. An investment equivalent to the cost of sequencing 2% of the population at 2× provided high imputation accuracy. The gain in imputation accuracy from additional investment decreased with larger populations and higher levels of investment. However, to achieve the same imputation accuracy, a proportionally greater investment must be used in the smaller populations compared to the larger ones.

CONCLUSIONS

Suitable sequencing strategies for subsequent imputation with hybrid peeling involve sequencing ~2% of the population at a uniform coverage of 2×, distributed preferably across all generations of the pedigree, except for the few earliest generations that lack genotyped ancestors. Such sequencing strategies are beneficial for generating whole-genome sequence data in populations with deep pedigrees of closely related individuals.
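The trade-off between the number of sequenced individuals and their coverage, at a fixed level of investment, can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes that cost scales with the product of the number of individuals and their coverage (ignoring per-sample costs such as library preparation), and the function names and the population size of 10,000 are hypothetical.

```python
def coverage_budget(population_size, fraction=0.02, coverage=2.0):
    """Total coverage units (individuals x coverage) bought by sequencing
    `fraction` of the population at `coverage`.
    Assumption: cost is proportional to individuals x coverage."""
    return population_size * fraction * coverage


def individuals_at(budget, coverage):
    """Number of individuals that the same budget covers at a uniform coverage."""
    return int(budget // coverage)


# Hypothetical population of 10,000 animals, investment of "2% at 2x".
budget = coverage_budget(10_000)
for cov in (1.0, 2.0, 4.0, 15.0):
    print(f"{cov:>4}x -> {individuals_at(budget, cov)} individuals")
```

Under this simplified cost model, halving the coverage doubles the number of individuals that can be sequenced, which is the kind of trade-off the abstract evaluates.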

Ros-Freixedes R, Whalen A, Chen CY, Gorjanc G, Herring WO, Mileham AJ, Hickey JM. Accuracy of whole-genome sequence imputation using hybrid peeling in large pedigreed livestock populations. Genet Sel Evol 2020;52:17. [PMID: 32248811 PMCID: PMC7132992 DOI: 10.1186/s12711-020-00536-8] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 09/16/2019] [Accepted: 03/27/2020] [Indexed: 12/30/2022]

Abstract

BACKGROUND

The coupling of appropriate sequencing strategies and imputation methods is critical for assembling large whole-genome sequence datasets from livestock populations for research and breeding. In this paper, we describe and validate the coupling of a sequencing strategy with the imputation method hybrid peeling in real animal breeding settings.

METHODS

We used data from four pig populations of different size (18,349 to 107,815 individuals) that were widely genotyped at densities between 15,000 and 75,000 markers genome-wide. Around 2% of the individuals in each population were sequenced (most of them at 1× or 2× and 37-92 individuals per population, totalling 284, at 15-30×). We imputed whole-genome sequence data with hybrid peeling. We evaluated the imputation accuracy by removing the sequence data of the 284 individuals with high coverage, using a leave-one-out design. We simulated data that mimicked the sequencing strategy used in the real populations to quantify the factors that affected the individual-wise and variant-wise imputation accuracies using regression trees.

RESULTS

Imputation accuracy was high for the majority of individuals in all four populations (median individual-wise dosage correlation: 0.97). Imputation accuracy was lower for individuals in the earliest generations of each population than for the rest, due to the lack of marker array data for themselves and their ancestors. The main factors that determined the individual-wise imputation accuracy were the genotyping status, the availability of marker array data for immediate ancestors, and the degree of connectedness to the rest of the population, but sequencing coverage of the relatives had no effect. The main factors that determined variant-wise imputation accuracy were the minor allele frequency and the number of individuals with sequencing coverage at each variant site. Results were validated with the empirical observations.
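As an illustration of the individual-wise accuracy measure mentioned above, the following sketch computes, for each individual, the Pearson correlation between true genotypes (coded 0, 1, 2) and imputed dosages across variants, and then takes the median. The abstract does not give the exact definition used by the authors (for example, whether dosages are corrected for allele frequency), so the plain correlation, the function names, and the toy data are all assumptions.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Plain Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def individual_wise_accuracy(true_geno, imputed_dosage):
    """Correlation across variants, computed separately for each individual."""
    return {ind: pearson(true_geno[ind], imputed_dosage[ind]) for ind in true_geno}


# Toy data (hypothetical): two individuals, four variants each.
true_geno = {"ind1": [0, 1, 2, 1], "ind2": [2, 0, 1, 0]}
imputed = {"ind1": [0, 1, 2, 1], "ind2": [1.8, 0.1, 0.7, 0.4]}

acc = individual_wise_accuracy(true_geno, imputed)
print(acc, median(acc.values()))
```

A variant-wise accuracy could be obtained in the same way by correlating across individuals at each variant instead of across variants within each individual.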

CONCLUSIONS

We demonstrate that the coupling of an appropriate sequencing strategy and hybrid peeling is a powerful strategy for generating whole-genome sequence data with high accuracy in large pedigreed populations where only a small fraction of individuals (2%) had been sequenced, mostly at low coverage. This is a critical step for the successful implementation of whole-genome sequence data for genomic prediction and fine-mapping of causal variants.

Butty AM, Sargolzaei M, Miglior F, Stothard P, Schenkel FS, Gredler-Grandl B, Baes CF. Optimizing Selection of the Reference Population for Genotype Imputation From Array to Sequence Variants. Front Genet 2019;10:510. [PMID: 31214246 PMCID: PMC6554347 DOI: 10.3389/fgene.2019.00510] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 01/15/2019] [Accepted: 05/10/2019] [Indexed: 11/29/2022]

Abstract

Imputation of high-density genotypes to whole-genome sequences (WGS) is a cost-effective method to increase the density of available markers within a population. Imputed genotypes have been successfully used for genomic selection and discovery of variants associated with traits of interest for the population. To allow for the use of imputed genotypes for genomic analyses, accuracy of imputation must be high. Accuracy of imputation is influenced by multiple factors, such as size and composition of the reference group, and the allele frequency of variants included. Understanding the use of imputed WGSs prior to the generation of the reference population is important, as accurate imputation might be more focused, for instance, on common or on rare variants. The aim of this study was to present and evaluate new methods to select animals for sequencing relying on a previously genotyped population. The Genetic Diversity Index method optimizes the number of unique haplotypes in the future reference population, while the Highly Segregating Haplotype selection method targets haplotype alleles found throughout the majority of the population of interest. First the WGSs of a dairy cattle population were simulated. The simulated sequences mimicked the linkage disequilibrium level and the variants’ frequency distribution observed in currently available Holstein sequences. Then, reference populations of different sizes, in which animals were selected using both novel methods proposed here as well as two other methods presented in previous studies, were created. Finally, accuracies of imputation obtained with different reference populations were compared against each other. The novel methods were found to have overall accuracies of imputation of more than 0.85. Accuracies of imputation of rare variants reached values above 0.50. 
In conclusion, if imputed sequences are to be used for discovery of novel associations between variants and traits of interest in the population, animals carrying novel information should be selected and, consequently, the Genetic Diversity Index method proposed here may be used. If sequences are to be used to impute the overall genotyped population, a reference population consisting of carriers of common haplotypes selected using the proposed Highly Segregating Haplotype method is recommended.

Johnsson M, Ros-Freixedes R, Gorjanc G, Campbell MA, Naswa S, Kelly K, Lightner J, Rounsley S, Hickey JM. Sequence variation, evolutionary constraint, and selection at the CD163 gene in pigs. Genet Sel Evol 2018;50:69. [PMID: 30572815 PMCID: PMC6302423 DOI: 10.1186/s12711-018-0440-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/22/2018] [Accepted: 12/16/2018] [Indexed: 11/24/2022]

Whalen A, Ros-Freixedes R, Wilson DL, Gorjanc G, Hickey JM. Hybrid peeling for fast and accurate calling, phasing, and imputation with sequence data of any coverage in pedigrees. Genet Sel Evol 2018;50:67. [PMID: 30563452 PMCID: PMC6299538 DOI: 10.1186/s12711-018-0438-2] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Received: 04/03/2018] [Accepted: 12/11/2018] [Indexed: 12/31/2022]

Ros-Freixedes R, Battagin M, Johnsson M, Gorjanc G, Mileham AJ, Rounsley SD, Hickey JM. Impact of index hopping and bias towards the reference allele on accuracy of genotype calls from low-coverage sequencing. Genet Sel Evol 2018;50:64. [PMID: 30545283 PMCID: PMC6293637 DOI: 10.1186/s12711-018-0436-4] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 06/28/2018] [Accepted: 11/30/2018] [Indexed: 12/17/2022]

A method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals. Genet Sel Evol 2017;49:78. [PMID: 29070022 PMCID: PMC5655873 DOI: 10.1186/s12711-017-0353-y] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.0] [Received: 04/03/2017] [Accepted: 10/18/2017] [Indexed: 11/25/2022]

Abstract

Background

This paper describes a heuristic method for allocating low-coverage sequencing resources by targeting haplotypes rather than individuals. Low-coverage sequencing assembles high-coverage sequence information for every individual by accumulating data from the genome segments that they share with many other individuals into consensus haplotypes. Deriving the consensus haplotypes accurately is critical for achieving a high phasing and imputation accuracy. In order to enable accurate phasing and imputation of sequence information for the whole population, we allocate the available sequencing resources among individuals with existing phased genomic data by targeting the sequencing coverage of their haplotypes.

Results

Our method, called AlphaSeqOpt, prioritizes haplotypes using a score function that is based on the frequency of the haplotypes in the sequencing set relative to the target coverage. AlphaSeqOpt has two steps: (1) selection of an initial set of individuals by iteratively choosing the individuals that have the maximum score conditional on the current set, and (2) refinement of the set through several rounds of exchanges of individuals. AlphaSeqOpt is very effective for distributing a fixed amount of sequencing resources evenly across haplotypes, which results in a reduction of the proportion of haplotypes that are sequenced below the target coverage. AlphaSeqOpt can provide a greater proportion of haplotypes sequenced at the target coverage by sequencing fewer individuals, as compared with other methods that use a score function based on haplotype frequencies in the population. A refinement of the initially selected set can provide a larger, more diverse set with more unique individuals, which is beneficial in the context of low-coverage sequencing. We extend the method with an approach for filtering rare haplotypes based on their flanking haplotypes, so that only those that are likely to derive from a recombination event are targeted.
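The two-step structure described above (greedy selection followed by rounds of exchanges) can be sketched as follows. This is not the AlphaSeqOpt implementation or its score function: the score used here (haplotype counts in the selected set, capped at the target) is a simplified stand-in chosen for illustration, each selected individual is assumed to add one unit of coverage to every haplotype it carries, and all names and the toy data are hypothetical.

```python
from collections import Counter


def set_score(selected, carriers, target):
    """Hypothetical score: coverage per haplotype, capped at the target,
    so coverage beyond the target earns nothing."""
    counts = Counter(h for ind in selected for h in carriers[ind])
    return sum(min(c, target) for c in counts.values())


def greedy_select(carriers, n_select, target):
    """Step 1: repeatedly add the individual with the best score
    conditional on the current set (ties broken alphabetically)."""
    selected, remaining = [], set(carriers)
    while len(selected) < n_select and remaining:
        best = max(sorted(remaining),
                   key=lambda ind: set_score(selected + [ind], carriers, target))
        selected.append(best)
        remaining.remove(best)
    return selected


def refine(selected, carriers, target, max_rounds=10):
    """Step 2: exchange selected for unselected individuals while the score improves."""
    selected = list(selected)
    current = set_score(selected, carriers, target)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(selected)):
            for cand in sorted(set(carriers) - set(selected)):
                trial = selected[:i] + [cand] + selected[i + 1:]
                score = set_score(trial, carriers, target)
                if score > current:
                    selected, current, improved = trial, score, True
        if not improved:
            break
    return selected


# Toy data (hypothetical): haplotypes carried by each individual.
carriers = {"A": ["h1", "h2"], "B": ["h1", "h2"],
            "C": ["h3", "h4"], "D": ["h1", "h3"]}
initial = greedy_select(carriers, n_select=2, target=1)
final = refine(initial, carriers, target=1)
print(initial, final, set_score(final, carriers, target=1))
```

Capping the contribution of each haplotype at the target is what steers the selection away from redundant carriers of haplotypes that are already covered, which mirrors the stated aim of spreading a fixed sequencing effort evenly across haplotypes.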

Conclusions

We present a method for allocating sequencing resources so that a greater proportion of haplotypes are sequenced at a coverage that is sufficiently high for population-based imputation with low-coverage sequencing. The haplotype score function, the refinement step, and the new approach for filtering rare haplotypes make AlphaSeqOpt more effective for that purpose than previously reported methods for reducing sequencing redundancy.

Electronic supplementary material

The online version of this article (doi:10.1186/s12711-017-0353-y) contains supplementary material, which is available to authorized users.
