1
Clouard C, Nettelblad C. Genotyping of SNPs in bread wheat at reduced cost from pooled experiments and imputation. TAG. Theoretical and Applied Genetics. Theoretische und Angewandte Genetik 2024; 137:26. [PMID: 38243086 PMCID: PMC10799138 DOI: 10.1007/s00122-023-04533-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/15/2023] [Accepted: 12/19/2023] [Indexed: 01/21/2024]
Abstract
KEY MESSAGE: Pooling and imputation are computational methods that can be combined to achieve cost-effective and accurate high-density genotyping of both common and rare variants, as demonstrated in a MAGIC wheat population.
The plant breeding industry has shown growing interest in using the genotype data of relevant markers to select new competitive varieties. The selection usually benefits from large amounts of marker data, and it is therefore crucial to have data collection methods that are both cost-effective and reliable. Computational methods such as genotype imputation have been proposed earlier in several plant science studies for addressing the cost challenge. Genotype imputation methods have, however, been used more frequently and investigated more extensively in human genetics research. The various algorithms that exist have shown lower accuracy at inferring the genotype of genetic variants occurring at low frequency, even though these rare variants can have great significance and impact in the genetic studies that underlie selection. In contrast, pooling is a technique that can efficiently identify low-frequency items in a population, and it has been successfully used for detecting the samples that carry rare variants in a population. In this study, we propose to combine pooling and imputation and demonstrate this by simulating a hypothetical microarray for genotyping a population of recombinant inbred lines in a cost-effective and accurate manner, even for rare variants. We show that with an adequate imputation model, it is feasible to accurately predict the individual genotypes at lower cost than sample-wise genotyping, and in a time-effective manner. Moreover, we provide code resources for reproducing the results presented in this study in the form of a containerized workflow.
Affiliation(s)
- Camille Clouard
  - Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, 75237, Uppsala, Sweden
- Carl Nettelblad
  - Division of Scientific Computing, Department of Information Technology, Uppsala University, Lägerhyddsvägen 1, 75237, Uppsala, Sweden
  - SciLifeLab, Science for Life Laboratory, Husargatan 3, 75237, Uppsala, Sweden
2
Campos-Martin R, Schmickler S, Goel M, Schneeberger K, Tresch A. Reliable genotyping of recombinant genomes using a robust hidden Markov model. Plant Physiology 2023; 192:821-836. [PMID: 36946207 PMCID: PMC10231367 DOI: 10.1093/plphys/kiad191] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/19/2022] [Revised: 01/20/2023] [Accepted: 01/27/2023] [Indexed: 06/01/2023]
Abstract
Meiotic recombination is an essential mechanism during sexual reproduction and includes the exchange of chromosome segments between homologous chromosomes. New allelic combinations are transmitted to the new generation, introducing novel genetic variation in the offspring genomes. With the improvement of high-throughput whole-genome sequencing technologies, large numbers of recombinant individuals can now be sequenced with low sequencing depth at low costs, necessitating computational methods for reconstructing their haplotypes. The main challenge is the uncertainty in haplotype calling that arises from the low information content of a single genomic position. Straightforward sliding window-based approaches are difficult to tune and fail to place recombination breakpoints precisely. Hidden Markov model (HMM)-based approaches, on the other hand, tend to over-segment the genome. Here, we present RTIGER, an HMM-based model that exploits in a mathematically precise way the fact that true chromosome segments typically have a certain minimum length. We further separate the task of identifying the correct haplotype sequence from the accurate placement of haplotype borders, thereby maximizing the accuracy of border positions. By comparing segmentations based on simulated data with known underlying haplotypes, we highlight the reasons for RTIGER outperforming traditional segmentation approaches. We then analyze the meiotic recombination pattern of segregants of 2 Arabidopsis (Arabidopsis thaliana) accessions and a previously described hyper-recombining mutant. RTIGER is available as an R package with an efficient Julia implementation of the core algorithm.
Affiliation(s)
- Rafael Campos-Martin
  - Faculty of Medicine, University Hospital Cologne, Cologne 50937, Germany
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne 50829, Germany
  - Division of Neurogenetics and Molecular Psychiatry, Department of Psychiatry and Psychotherapy, University of Cologne, Medical Faculty, Cologne 50937, Germany
- Sophia Schmickler
  - Faculty of Medicine, University Hospital Cologne, Cologne 50937, Germany
- Manish Goel
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne 50829, Germany
  - Faculty for Biology, LMU Munich, Planegg-Martinsried 82152, Germany
- Korbinian Schneeberger
  - Department of Chromosome Biology, Max Planck Institute for Plant Breeding Research, Cologne 50829, Germany
  - Faculty for Biology, LMU Munich, Planegg-Martinsried 82152, Germany
  - Cluster of Excellence on Plant Sciences, Heinrich-Heine University, Düsseldorf 40225, Germany
- Achim Tresch
  - Faculty of Medicine, University Hospital Cologne, Cologne 50937, Germany
  - CECAD, University of Cologne, Cologne 50931, Germany
  - Center for Data and Simulation Science, University of Cologne, Cologne 50931, Germany
3
Song L, Endelman JB. Using haplotype and QTL analysis to fix favorable alleles in diploid potato breeding. The Plant Genome 2023:e20339. [PMID: 37063052 DOI: 10.1002/tpg2.20339] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2022] [Revised: 02/16/2023] [Accepted: 03/16/2023] [Indexed: 06/19/2023]
Abstract
At present, the potato (Solanum tuberosum L.) of international commerce is autotetraploid, and the complexity of this genetic system creates limitations for breeding. Diploid potato breeding has long been used for population improvement, and because of an improved understanding of the genetics of gametophytic self-incompatibility, there is now sustained interest in the development of uniform F1 hybrid varieties based on inbred parents. We report here on the use of haplotype and quantitative trait locus (QTL) analysis in a modified backcrossing (BC) scheme, using primary dihaploids of S. tuberosum as the recurrent parental background. In Cycle 1, we selected XD3-36, a self-fertile F2 individual homozygous for the self-compatibility gene Sli (S-locus inhibitor). Signatures of gametic and zygotic selection were observed at multiple loci in the F2 generation, including Sli. In the BC1 cycle, an F1 population derived from XD3-36 showed a bimodal response for vine maturity, which led to the identification of late versus early alleles in XD3-36 for the gene CDF1 (Cycling DOF Factor 1). Greenhouse phenotypes and haplotype analysis were used to select a vigorous and self-fertile F2 individual with 43% homozygosity, including for Sli and the early-maturing allele CDF1.3. Partially inbred lines from the BC1 and BC2 cycles have been used to initiate new cycles of selection, with the goal of reaching higher homozygosity while maintaining plant vigor, fertility, and yield.
Affiliation(s)
- Lin Song
  - Department of Horticulture, University of Wisconsin-Madison, Madison, Wisconsin, USA
- Jeffrey B Endelman
  - Department of Horticulture, University of Wisconsin-Madison, Madison, Wisconsin, USA
4
Vreeburg SME, Auxier B, Jacobs B, Bourke PM, van den Heuvel J, Zwaan BJ, Aanen DK. A genetic linkage map and improved genome assembly of the termite symbiont Termitomyces cryptogamus. BMC Genomics 2023; 24:123. [PMID: 36927388 PMCID: PMC10021994 DOI: 10.1186/s12864-023-09210-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2021] [Accepted: 02/27/2023] [Indexed: 03/18/2023] Open
Abstract
BACKGROUND: The termite-fungus symbiosis is an ancient stable mutualism of two partners that reproduce and disperse independently. With the founding of each termite colony the symbiotic association must be re-established with a new fungus partner. Complementarity in the ability to break down plant substrate may help to stabilize this symbiosis despite horizontal symbiont transmission. An alternative, non-exclusive, hypothesis is that a reduced rate of evolution may contribute to stabilize the symbiosis, the so-called Red King Effect.
METHODS: To explore this concept, we produced the first linkage map of a species of Termitomyces, using genotyping by sequencing (GBS) of 88 homokaryotic offspring. We constructed a highly contiguous genome assembly using PacBio data and a de-novo evidence-based annotation. This improved genome assembly and linkage map allowed for examination of the recombination landscape and its potential effect on the mutualistic lifestyle.
RESULTS: Our linkage map resulted in a genome-wide recombination rate of 22 cM/Mb, lower than that of other related fungi. However, the total map length of 1370 cM was similar to that of other related fungi.
CONCLUSIONS: The apparently decreased rate of recombination is primarily due to genome expansion of islands of gene-poor repetitive sequences. This study highlights the importance of inclusion of genomic context in cross-species comparisons of recombination rate.
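The two figures reported in the results (a rate of 22 cM/Mb and a map length of 1370 cM) are linked by simple arithmetic. The sketch below assumes the genome-wide rate is the total genetic map length divided by the physical length covered by the map. The function names are illustrative, and the physical length it derives is only what the two quoted numbers imply, not a value stated in the abstract.

```python
def recombination_rate(map_length_cm: float, physical_length_mb: float) -> float:
    """Genome-wide recombination rate in cM/Mb (map length / physical length)."""
    if physical_length_mb <= 0:
        raise ValueError("physical length must be positive")
    return map_length_cm / physical_length_mb


def implied_physical_length(map_length_cm: float, rate_cm_per_mb: float) -> float:
    """Physical length in Mb implied by a map length and a genome-wide rate."""
    if rate_cm_per_mb <= 0:
        raise ValueError("rate must be positive")
    return map_length_cm / rate_cm_per_mb


# Figures quoted in the abstract.
MAP_LENGTH_CM = 1370.0
RATE_CM_PER_MB = 22.0

# Assumption: both figures refer to the same set of mapped sequence.
implied_mb = implied_physical_length(MAP_LENGTH_CM, RATE_CM_PER_MB)
print(f"Implied physical length: {implied_mb:.1f} Mb")

# Same map length over a 20% larger genome gives a proportionally lower rate.
print(f"Rate after 20% expansion: {recombination_rate(MAP_LENGTH_CM, implied_mb * 1.2):.1f} cM/Mb")
```

This makes the abstract's conclusion concrete: with map length held roughly constant, expanding the genome with repetitive sequence lowers the per-megabase rate without any change in the number of crossovers.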
Affiliation(s)
- Sabine M E Vreeburg
  - Laboratory of Genetics, Wageningen University & Research, Wageningen, the Netherlands
- Ben Auxier
  - Laboratory of Genetics, Wageningen University & Research, Wageningen, the Netherlands
- Bas Jacobs
  - Laboratory of Genetics, Wageningen University & Research, Wageningen, the Netherlands
  - Biometris, Wageningen University & Research, Wageningen, the Netherlands
- Peter M Bourke
  - Plant Breeding, Wageningen University & Research, Wageningen, the Netherlands
- Joost van den Heuvel
  - Laboratory of Genetics, Wageningen University & Research, Wageningen, the Netherlands
- Bas J Zwaan
  - Laboratory of Genetics, Wageningen University & Research, Wageningen, the Netherlands
- Duur K Aanen
  - Laboratory of Genetics, Wageningen University & Research, Wageningen, the Netherlands
5
Niehoff T, Pook T, Gholami M, Beissinger T. Imputation of low-density marker chip data in plant breeding: Evaluation of methods based on sugar beet. The Plant Genome 2022; 15:e20257. [PMID: 36258672 DOI: 10.1002/tpg2.20257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2022] [Accepted: 08/02/2022] [Indexed: 06/16/2023]
Abstract
Low-density genotyping followed by imputation reduces genotyping costs while still providing high-density marker information. An increased marker density has the potential to improve the outcome of all applications that are based on genomic data. This study investigates techniques for 1k to 20k genomic marker imputation for plant breeding programs with sugar beet (Beta vulgaris L. ssp. vulgaris) as an example crop, where these are realistic marker numbers for modern breeding applications. The generally accepted 'gold standard' for imputation, Beagle 5.1, was compared with the recently developed software AlphaPlantImpute2 which is designed specifically for plant breeding. For Beagle 5.1 and AlphaPlantImpute2, the imputation strategy as well as the imputation parameters were optimized in this study. We found that the imputation accuracy of Beagle could be tremendously improved (0.22 to 0.67) by tuning parameters, mainly by lowering the values for the parameter for the effective population size and increasing the number of iterations performed. Separating the phasing and imputation steps also improved accuracies when optimized parameters were used (0.67 to 0.82). We also found that the imputation accuracy of Beagle decreased when more low-density lines were included for imputation. AlphaPlantImpute2 produced very high accuracies without optimization (0.89) and was generally less responsive to optimization. Overall, AlphaPlantImpute2 performed relatively better for imputation whereas Beagle was better for phasing. Combining both tools yielded the highest accuracies.
Affiliation(s)
- Tobias Niehoff
  - Animal Breeding and Genomics, Wageningen Univ. & Research, Postbox 338, 6700AH, Wageningen, The Netherlands
  - Dep. of Crop Sciences, Division of Plant Breeding Methodology, Univ. of Göttingen, Göttingen, 37075, Germany
- Torsten Pook
  - Animal Breeding and Genomics, Wageningen Univ. & Research, Postbox 338, 6700AH, Wageningen, The Netherlands
  - Dep. of Animal Sciences, Animal Breeding and Genetics Group, Univ. of Göttingen, Göttingen, 37075, Germany
  - Center for Integrated Breeding Research, Univ. of Göttingen, Göttingen, 37075, Germany
- Mahmood Gholami
  - RD-SBCE-BTA, KWS SAAT SE & Co. KGaA, Grimsehlstr. 31, Einbeck, 37574, Germany
- Timothy Beissinger
  - Dep. of Crop Sciences, Division of Plant Breeding Methodology, Univ. of Göttingen, Göttingen, 37075, Germany
  - Center for Integrated Breeding Research, Univ. of Göttingen, Göttingen, 37075, Germany
6
González-Castro M, Cardoso YP, Hughes LC, Ortí G. Hybridization is strongly constrained by salinity during secondary contact between silverside fishes (Odontesthes, Atheriniformes). Heredity (Edinb) 2022; 129:233-243. [PMID: 35821279 PMCID: PMC9519950 DOI: 10.1038/s41437-022-00555-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2021] [Revised: 06/28/2022] [Accepted: 06/28/2022] [Indexed: 11/08/2022] Open
Abstract
This study investigates a contact zone between two silverside fish species (marine Odontesthes argentinensis and freshwater O. bonariensis) in the estuarine Mar Chiquita lagoon along the Atlantic coast in Argentina (MChL), in which intermediate morphs had been reported. It has been suggested that admixture and introgression occur in MChL between these two species, but direct genetic evidence is lacking. Leveraging samples collected over several years (n = 676), we document the spatial distribution of both species and intermediate morphs within this habitat and collect landmark-based morphometric and multilocus genetic data (9876 loci for n = 110 individuals) to test the hypothesis of hybridization. Our analysis unambiguously characterizes intermediate morphs as F1 or F2 hybrids. We show that the low frequency of hybrid individuals in MChL may be explained by uneven abundance of parental species, which in turn are strongly affected by water salinity, limiting the size of the contact zone. Although hybrids seem to be fertile, their fitness may be reduced by external and intrinsic factors that may limit their success and suggest that this is an unstable hybrid zone. Genetic distinctiveness of both parental species is strongly supported by genome-wide data, explaining a known pattern of mitonuclear discordance as a consequence of hybridization followed by mitochondrial introgression. A clear signature of population genetic structure was detected in O. argentinensis, distinguishing MChL residents from marine populations of this species, that also was supported by distinctive morphometric features among these groups. Previous hypotheses of speciation in these fishes are discussed in the light of the new findings.
Affiliation(s)
- Mariano González-Castro
  - Grupo de Biotaxonomía Morfológica y molecular de peces, IIMyC-CONICET, Universidad Nacional de Mar del Plata, Mar del Plata, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Yamila P Cardoso
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
  - Laboratorio de Sistemática y Biología Evolutiva, Facultad de Ciencias Naturales y Museo, Universidad Nacional de La Plata, La Plata, Argentina
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
- Lily C Hughes
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
- Guillermo Ortí
  - Department of Biological Sciences, George Washington University, Washington, DC, USA
  - Department of Vertebrate Zoology, National Museum of Natural History, Smithsonian Institution, Washington, DC, USA
7
Barnard-Kubow KB, Becker D, Murray CS, Porter R, Gutierrez G, Erickson P, Nunez JCB, Voss E, Suryamohan K, Ratan A, Beckerman A, Bergland AO. Genetic Variation in Reproductive Investment Across an Ephemerality Gradient in Daphnia pulex. Mol Biol Evol 2022; 39:msac121. [PMID: 35642301 PMCID: PMC9198359 DOI: 10.1093/molbev/msac121] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/27/2022] Open
Abstract
Species across the tree of life can switch between asexual and sexual reproduction. In facultatively sexual species, the ability to switch between reproductive modes is often environmentally dependent and subject to local adaptation. However, the ecological and evolutionary factors that influence the maintenance and turnover of polymorphism associated with facultative sex remain unclear. We studied the ecological and evolutionary dynamics of reproductive investment in the facultatively sexual model species, Daphnia pulex. We found that patterns of clonal diversity, but not genetic diversity varied among ponds consistent with the predicted relationship between ephemerality and clonal structure. Reconstruction of a multi-year pedigree demonstrated the coexistence of clones that differ in their investment into male production. Mapping of quantitative variation in male production using lab-generated and field-collected individuals identified multiple putative quantitative trait loci (QTL) underlying this trait, and we identified a plausible candidate gene. The evolutionary history of these QTL suggests that they are relatively young, and male limitation in this system is a rapidly evolving trait. Our work highlights the dynamic nature of the genetic structure and composition of facultative sex across space and time and suggests that quantitative genetic variation in reproductive strategy can undergo rapid evolutionary turnover.
Affiliation(s)
- Karen B Barnard-Kubow
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
  - Department of Biology, James Madison University, Harrisonburg, VA, USA
- Dörthe Becker
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
  - School of Biosciences, Ecology and Evolutionary Biology, University of Sheffield, Sheffield, UK
  - Department of Biology, University of Marburg, Marburg, Germany
- Connor S Murray
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
- Robert Porter
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
- Grace Gutierrez
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
- Joaquin C B Nunez
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
- Erin Voss
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
  - Department of Integrative Biology, UC Berkeley, Berkeley, CA, USA
- Aakrosh Ratan
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
  - Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Andrew Beckerman
  - School of Biosciences, Ecology and Evolutionary Biology, University of Sheffield, Sheffield, UK
- Alan O Bergland
  - Department of Biology, University of Virginia, Charlottesville, VA, USA
8
Genotyping, the Usefulness of Imputation to Increase SNP Density, and Imputation Methods and Tools. Methods in Molecular Biology (Clifton, N.J.) 2022; 2467:113-138. [PMID: 35451774 DOI: 10.1007/978-1-0716-2205-6_4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Imputation has become a standard practice in modern genetic research to increase genome coverage and improve accuracy of genomic selection and genome-wide association study as a large number of samples can be genotyped at lower density (and lower cost) and, imputed up to denser marker panels or to sequence level, using information from a limited reference population. Most genotype imputation algorithms use information from relatives and population linkage disequilibrium. A number of software for imputation have been developed originally for human genetics and, more recently, for animal and plant genetics considering pedigree information and very sparse SNP arrays or genotyping-by-sequencing data. In comparison to human populations, the population structures in farmed species and their limited effective sizes allow to accurately impute high-density genotypes or sequences from very low-density SNP panels and a limited set of reference individuals. Whatever the imputation method, the imputation accuracy, measured by the correct imputation rate or the correlation between true and imputed genotypes, increased with the increasing relatedness of the individual to be imputed with its denser genotyped ancestors and as its own genotype density increased. Increasing the imputation accuracy pushes up the genomic selection accuracy whatever the genomic evaluation method. Given the marker densities, the most important factors affecting imputation accuracy are clearly the size of the reference population and the relationship between individuals in the reference and target populations.
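The two accuracy measures named in this abstract, the correct imputation rate and the correlation between true and imputed genotypes, can be sketched as follows. This is a minimal illustration that assumes genotypes are coded as allele dosages 0, 1 or 2 (a common convention, not something the abstract specifies). The function names and example vectors are made up for the illustration.

```python
from math import sqrt


def concordance_rate(true_gt, imputed_gt):
    """Fraction of genotypes imputed exactly right (correct imputation rate)."""
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(t == i for t, i in zip(true_gt, imputed_gt))
    return matches / len(true_gt)


def genotype_correlation(true_gt, imputed_gt):
    """Pearson correlation between true and imputed dosages.

    Returns NaN when either vector has zero variance (e.g. a monomorphic marker).
    """
    if len(true_gt) != len(imputed_gt) or not true_gt:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(true_gt)
    mean_t = sum(true_gt) / n
    mean_i = sum(imputed_gt) / n
    cov = sum((t - mean_t) * (i - mean_i) for t, i in zip(true_gt, imputed_gt))
    var_t = sum((t - mean_t) ** 2 for t in true_gt)
    var_i = sum((i - mean_i) ** 2 for i in imputed_gt)
    if var_t == 0 or var_i == 0:
        return float("nan")
    return cov / sqrt(var_t * var_i)


# Hypothetical dosages at one marker for six individuals.
true_dosage = [0, 1, 2, 2, 0, 1]
imputed_dosage = [0, 1, 2, 1, 0, 1]

print(concordance_rate(true_dosage, imputed_dosage))
print(genotype_correlation(true_dosage, imputed_dosage))
```

The two measures can disagree for rare variants: a marker where nearly every individual carries the major allele can show a high correct imputation rate even when the few minor-allele carriers are imputed wrongly, whereas the correlation is more sensitive to those errors.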
9
Broman KW. A generic hidden Markov model for multiparent populations. G3 Genes|Genomes|Genetics 2022; 12:6429279. [PMID: 34791211 PMCID: PMC9210298 DOI: 10.1093/g3journal/jkab396] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/15/2021] [Accepted: 11/08/2021] [Indexed: 11/12/2022]
Abstract
A common step in the analysis of multiparent populations (MPPs) is genotype reconstruction: identifying the founder origin of haplotypes from dense marker data. This process often makes use of a probability model for the pattern of founder alleles along chromosomes, including the relative frequency of founder alleles and the probability of exchanges among them, which depend on a model for meiotic recombination and on the mating design for the population. While the precise experimental design used to generate the population may be used to derive a precise characterization of the model for exchanges among founder alleles, this can be tedious, particularly given the great variety of experimental designs that have been proposed. We describe an approximate model that can be applied for a variety of MPPs. We have implemented the approach in the R/qtl2 software, and we illustrate its use in applications to publicly available data on Diversity Outbred and Collaborative Cross mice.
Affiliation(s)
- Karl W Broman
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, WI 53706, USA
10
Teng J, Zhao C, Wang D, Chen Z, Tang H, Li J, Mei C, Yang Z, Ning C, Zhang Q. Assessment of the performance of different imputation methods for low-coverage sequencing in Holstein cattle. J Dairy Sci 2022; 105:3355-3366. [DOI: 10.3168/jds.2021-21360] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 10/01/2021] [Accepted: 12/13/2021] [Indexed: 12/27/2022]
11
Weller CA, Tilk S, Rajpurohit S, Bergland AO. Accurate, ultra-low coverage genome reconstruction and association studies in Hybrid Swarm mapping populations. G3 Genes|Genomes|Genetics 2021; 11:6156828. [PMID: 33677482 PMCID: PMC8759814 DOI: 10.1093/g3journal/jkab062] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/11/2020] [Accepted: 02/19/2021] [Indexed: 11/27/2022]
Abstract
Genetic association studies seek to uncover the link between genotype and phenotype, and often utilize inbred reference panels as a replicable source of genetic variation. However, inbred reference panels can differ substantially from wild populations in their genotypic distribution, patterns of linkage disequilibrium, and nucleotide diversity. As a result, associations discovered using inbred reference panels may not reflect the genetic basis of phenotypic variation in natural populations. To address this problem, we evaluated a mapping population design where dozens to hundreds of inbred lines are outbred for a few generations, which we call the Hybrid Swarm. The Hybrid Swarm approach has likely remained underutilized relative to pre-sequenced inbred lines due to the costs of genome-wide genotyping. To reduce sequencing costs and make the Hybrid Swarm approach feasible, we developed a computational pipeline that reconstructs accurate whole genomes from ultra-low-coverage (0.05X) sequence data in Hybrid Swarm populations derived from ancestors with phased haplotypes. We evaluate reconstructions using genetic variation from the Drosophila Genetic Reference Panel as well as variation from neutral simulations. We compared the power and precision of Genome-Wide Association Studies using the Hybrid Swarm, inbred lines, recombinant inbred lines (RILs), and highly outbred populations across a range of allele frequencies, effect sizes, and genetic architectures. Our simulations show that these different mapping panels vary in their power and precision, largely depending on the architecture of the trait. The Hybrid Swarm and RILs outperform inbred lines for quantitative traits, but not for monogenic ones. Taken together, our results demonstrate the feasibility of the Hybrid Swarm as a cost-effective method of fine-scale genetic mapping.
Affiliation(s)
- Cory A Weller
  - Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
- Susanne Tilk
  - Department of Biology, Stanford University, Stanford, CA 94305, USA
- Subhash Rajpurohit
  - Department of Biological and Life Sciences, Ahmedabad University, Ahmedabad 380009, India
- Alan O Bergland
  - Department of Biology, University of Virginia, Charlottesville, VA 22904, USA
12
Davies RW, Kucka M, Su D, Shi S, Flanagan M, Cunniff CM, Chan YF, Myers S. Rapid genotype imputation from sequence with reference panels. Nat Genet 2021; 53:1104-1111. [PMID: 34083788 PMCID: PMC7611184 DOI: 10.1038/s41588-021-00877-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 07/14/2020] [Accepted: 04/23/2021] [Indexed: 12/30/2022]
Abstract
Inexpensive genotyping methods are essential to modern genomics. Here we present QUILT, which performs diploid genotype imputation using low-coverage whole genome sequence data. QUILT employs Gibbs sampling to partition reads into maternal and paternal sets, facilitating rapid haploid imputation using large reference panels. We show this partitioning to be accurate over many megabases, enabling highly accurate imputation close to theoretical limits and outperforming existing methods. Moreover, QUILT can impute accurately using diverse technologies, including using long reads from Oxford Nanopore Technologies, and a novel form of low-cost barcoded Illumina sequencing called haplotagging, with the latter showing improved accuracy at low coverages. Relative to DNA genotyping microarrays, QUILT offers improved accuracy at reduced cost, particularly for diverse populations that are traditionally underserved in modern genomic analyses, with accuracy nearly doubling at rare SNPs. Finally, QUILT can accurately impute (4-digit) HLA types, the first such method from low-coverage sequence data.
Affiliation(s)
- Marek Kucka
  - Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Dingwen Su
  - Friedrich Miescher Laboratory of the Max Planck Society, Tübingen, Germany
- Sinan Shi
  - Department of Statistics, University of Oxford, Oxford, UK
- Maeve Flanagan
  - Department of Pediatrics, Weill Cornell Medical College, New York, NY, USA
- Simon Myers
  - Department of Statistics, University of Oxford, Oxford, UK
  - The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK
13
Chen M, Fan W, Ji F, Hua H, Liu J, Yan M, Ma Q, Fan J, Wang Q, Zhang S, Liu G, Sun Z, Tian C, Zhao F, Zheng J, Zhang Q, Chen J, Qiu J, Wei X, Chen Z, Zhang P, Pei D, Yang J, Huang X. Genome-wide identification of agronomically important genes in outcrossing crops using OutcrossSeq. Molecular Plant 2021; 14:556-570. [PMID: 33429094 DOI: 10.1016/j.molp.2021.01.003] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 09/22/2020] [Revised: 12/07/2020] [Accepted: 01/06/2021] [Indexed: 05/27/2023]
Abstract
Many important crops (e.g., tuber, root, and tree crops) are cross-pollinating. For these crops, no inbred lines are available for genetic study and breeding because they are self-incompatible, clonally propagated, or have a long generation time, making the identification of agronomically important genes difficult, particularly in crops with a complex autopolyploid genome. In this study, we developed a method, OutcrossSeq, for mapping agronomically important loci in outcrossing crops based on whole-genome low-coverage resequencing of a large genetic population, and designed three computation algorithms in OutcrossSeq for different types of outcrossing populations. We applied OutcrossSeq to a tuberous root crop (sweet potato, autopolyploid), a tree crop (walnut tree, highly heterozygous diploid), and hybrid crops (double-cross populations) to generate high-density genotype maps for the outcrossing populations, which enable precise identification of genomic loci underlying important agronomic traits. Candidate causative genes at these loci were detected based on functional clues. Taken together, our results indicate that OutcrossSeq is a robust and powerful method for identifying agronomically important genes in heterozygous species, including polyploids, in a cost-efficient way. The OutcrossSeq software and its instruction manual are available for downloading at www.xhhuanglab.cn/tool/OutcrossSeq.html.
Affiliation(s)
- Mengjiao Chen
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Weijuan Fan
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Feiyang Ji
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Hua Hua
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Jie Liu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Mengxiao Yan
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Qingguo Ma
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Jiongjiong Fan
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Qin Wang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Shufeng Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Guiling Liu
  - Tai'an Academy of Agricultural Sciences, Tai'an 271000, Shandong, China
- Zhe Sun
  - Tai'an Academy of Agricultural Sciences, Tai'an 271000, Shandong, China
- Changgeng Tian
  - Tai'an Academy of Agricultural Sciences, Tai'an 271000, Shandong, China
- Fengling Zhao
  - Tai'an Academy of Agricultural Sciences, Tai'an 271000, Shandong, China
- Jianli Zheng
  - Tai'an Academy of Agricultural Sciences, Tai'an 271000, Shandong, China
- Qi Zhang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Jiaxin Chen
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Jie Qiu
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Xin Wei
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
- Ziru Chen
  - National Genomics Data Center, Bio-Med Big Data Center, CAS Key Laboratory of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences, Shanghai 200031, China
- Peng Zhang
  - CAS Center for Excellence of Molecular Plant Sciences, Institute of Plant Physiology and Ecology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200233, China
- Dong Pei
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of the State Forestry and Grassland Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Jun Yang
  - Shanghai Key Laboratory of Plant Functional Genomics and Resources, Shanghai Chenshan Plant Science Research Center, Chinese Academy of Sciences, Shanghai Chenshan Botanical Garden, Shanghai 201602, China
- Xuehui Huang
  - Shanghai Key Laboratory of Plant Molecular Sciences, College of Life Sciences, Shanghai Normal University, Shanghai 200234, China
|
14
|
Impact of pre- and post-variant filtration strategies on imputation. Sci Rep 2021; 11:6214. [PMID: 33737531 PMCID: PMC7973508 DOI: 10.1038/s41598-021-85333-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/14/2020] [Accepted: 02/22/2021] [Indexed: 01/04/2023]
Abstract
Quality control (QC) methods for genome-wide association studies and fine mapping are commonly used for imputation; however, they result in the loss of many single nucleotide polymorphisms (SNPs). To investigate the consequences of filtration on imputation, we studied the direct effects on the number of markers, their allele frequencies, imputation quality scores and post-filtration events. We pre-phased 1031 genotyped individuals from diverse ethnicities and compared the imputed variants to 1089 NCBI recorded individuals for additional validation. Without QC-based variant pre-filtration, we observed no impairment in the imputation of SNPs that failed QC, whereas with pre-filtration there was an overall loss of information. Significant differences between frequencies with and without pre-filtration were found only in the range of very rare (5E-04-1E-03) and rare variants (1E-03-5E-03) (p < 1E-04). Increasing the post-filtration imputation quality score from 0.3 to 0.8 reduced the number of single nucleotide variants (SNVs) with frequency < 0.001 by 2.5-fold, with or without QC pre-filtration, and halved the number of very rare variants (5E-04). Thus, to maintain confidence and retain enough SNVs, we propose here a two-step filtering procedure that applies less stringent filtering both prior to imputation and post-imputation in order to increase the number of very rare and rare variants compared to conservative filtration methods.
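The effect of the post-imputation quality threshold described in this abstract can be illustrated with a minimal sketch. The `Variant` record, its field names, the toy values, and both helper functions are hypothetical and are not taken from the study; only the thresholds (0.3 and 0.8) and the frequency bins come from the abstract, and treating the bin edges as half-open intervals is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    vid: str      # hypothetical variant identifier
    maf: float    # minor allele frequency after imputation
    info: float   # imputation quality score in [0, 1]


def frequency_bin(maf: float) -> str:
    """Assign a variant to the frequency bins quoted in the abstract."""
    if 5e-4 <= maf < 1e-3:
        return "very rare"
    if 1e-3 <= maf < 5e-3:
        return "rare"
    return "other"


def post_filter(variants, min_info):
    """Keep only variants whose imputation quality reaches the threshold."""
    return [v for v in variants if v.info >= min_info]


# Toy data, invented for illustration only.
toy = [
    Variant("v1", 6e-4, 0.35),
    Variant("v2", 8e-4, 0.85),
    Variant("v3", 2e-3, 0.50),
    Variant("v4", 0.20, 0.95),
]

lenient = post_filter(toy, 0.3)   # less stringent post-imputation filter
strict = post_filter(toy, 0.8)    # conservative post-imputation filter

for label, kept in (("info >= 0.3", lenient), ("info >= 0.8", strict)):
    bins = [frequency_bin(v.maf) for v in kept]
    print(label, "keeps", len(kept), "variants:", bins)
```

In this toy set the stricter threshold drops one very rare and one rare variant, which mirrors the direction of the loss the abstract reports, not its magnitude.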
|
15
|
Scott MF, Ladejobi O, Amer S, Bentley AR, Biernaskie J, Boden SA, Clark M, Dell'Acqua M, Dixon LE, Filippi CV, Fradgley N, Gardner KA, Mackay IJ, O'Sullivan D, Percival-Alwyn L, Roorkiwal M, Singh RK, Thudi M, Varshney RK, Venturini L, Whan A, Cockram J, Mott R. Multi-parent populations in crops: a toolbox integrating genomics and genetic mapping with breeding. Heredity (Edinb) 2020; 125:396-416. [PMID: 32616877 PMCID: PMC7784848 DOI: 10.1038/s41437-020-0336-6] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Received: 01/27/2020] [Revised: 06/16/2020] [Accepted: 06/16/2020] [Indexed: 11/21/2022]
Abstract
Crop populations derived from experimental crosses enable the genetic dissection of complex traits and support modern plant breeding. Among these, multi-parent populations now play a central role. By mixing and recombining the genomes of multiple founders, multi-parent populations combine many commonly sought beneficial properties of genetic mapping populations. For example, they have high power and resolution for mapping quantitative trait loci, high genetic diversity and minimal population structure. Many multi-parent populations have been constructed in crop species, and their inbred germplasm and associated phenotypic and genotypic data serve as enduring resources. Their utility has grown from being a tool for mapping quantitative trait loci to a means of providing germplasm for breeding programmes. Genomics approaches, including de novo genome assemblies and gene annotations for the population founders, have allowed the imputation of rich sequence information into the descendant population, expanding the breadth of research and breeding applications of multi-parent populations. Here, we report recent successes from multi-parent populations in crops. We also propose an ideal genotypic, phenotypic and germplasm 'package' that multi-parent populations should feature to optimise their use as powerful community resources for crop research, development and breeding.
Affiliation(s)
- Samer Amer
  - University of Reading, Reading, RG6 6AH, UK
  - Faculty of Agriculture, Alexandria University, Alexandria, 23714, Egypt
- Alison R Bentley
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Jay Biernaskie
  - Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK
- Scott A Boden
  - School of Agriculture, Food and Wine, University of Adelaide, Glen Osmond, SA, 5064, Australia
- Laura E Dixon
  - Faculty of Biological Sciences, University of Leeds, Leeds, LS2 9JT, UK
- Carla V Filippi
  - Instituto de Agrobiotecnología y Biología Molecular (IABIMO), INTA-CONICET, Nicolas Repetto y Los Reseros s/n, 1686, Hurlingham, Buenos Aires, Argentina
- Nick Fradgley
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Keith A Gardner
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Ian J Mackay
  - SRUC, West Mains Road, Kings Buildings, Edinburgh, EH9 3JG, UK
- Manish Roorkiwal
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rakesh Kumar Singh
  - International Center for Biosaline Agriculture, Academic City, Dubai, United Arab Emirates
- Mahendar Thudi
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Rajeev Kumar Varshney
  - Center of Excellence in Genomics and Systems Biology, International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), Hyderabad, India
- Alex Whan
  - CSIRO, GPO Box 1700, Canberra, ACT, 2601, Australia
- James Cockram
  - The John Bingham Laboratory, NIAB, 93 Lawrence Weaver Road, Cambridge, CB3 0LE, UK
- Richard Mott
  - UCL Genetics Institute, Gower Street, London, WC1E 6BT, UK
|
16
|
Whalen A, Gorjanc G, Hickey JM. AlphaFamImpute: high-accuracy imputation in full-sib families from genotype-by-sequencing data. Bioinformatics 2020; 36:4369-4371. [PMID: 32467963 PMCID: PMC7520044 DOI: 10.1093/bioinformatics/btaa499] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/10/2019] [Revised: 04/22/2020] [Accepted: 05/25/2020] [Indexed: 12/12/2022]
Abstract
SUMMARY AlphaFamImpute is an imputation package for calling, phasing and imputing genome-wide genotypes in outbred full-sib families from single nucleotide polymorphism (SNP) array and genotype-by-sequencing (GBS) data. GBS data are increasingly being used to genotype individuals, especially when SNP arrays do not exist for a population of interest. Low-coverage GBS produces data with a large number of missing or incorrect naïve genotype calls, which can be improved by identifying shared haplotype segments between full-sib individuals. Here, we present AlphaFamImpute, an algorithm specifically designed to exploit the genetic structure of full-sib families. It performs imputation using a two-step approach. In the first step, it phases and imputes parental genotypes based on the segregation states of their offspring (i.e. which pair of parental haplotypes the offspring inherited). In the second step, it phases and imputes the offspring genotypes by detecting which haplotype segments the offspring inherited from their parents. With a series of simulations, we find that AlphaFamImpute obtains high-accuracy genotypes, even when the parents are not genotyped and individuals are sequenced at <1x coverage. AVAILABILITY AND IMPLEMENTATION AlphaFamImpute is available as a Python package from the AlphaGenes website http://www.AlphaGenes.roslin.ed.ac.uk/AlphaFamImpute. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Andrew Whalen
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian EH25 9RG, UK
- Gregor Gorjanc
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian EH25 9RG, UK
- John M Hickey
  - The Roslin Institute and Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian EH25 9RG, UK
|
17
|
Zan Y, Payen T, Lillie M, Honaker CF, Siegel PB, Carlborg Ö. Genotyping by low-coverage whole-genome sequencing in intercross pedigrees from outbred founders: a cost-efficient approach. Genet Sel Evol 2019; 51:44. [PMID: 31412777 PMCID: PMC6694510 DOI: 10.1186/s12711-019-0487-1] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 09/19/2018] [Accepted: 08/07/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Experimental intercrosses between outbred founder populations are powerful resources for mapping loci that contribute to complex traits, i.e. quantitative trait loci (QTL). Here, we present an approach and its accompanying software for high-resolution reconstruction of founder mosaic genotypes in the intercross offspring from such populations using whole-genome high-coverage sequence data on founder individuals (~30×) and very low-coverage sequence data on intercross individuals (<0.5×). Sets of founder-line informative markers were selected for each full-sib family and used to infer the founder mosaic genotypes of the intercross individuals. The application of this approach and the quality of the estimated genome-wide genotypes are illustrated in a large F2 pedigree between two divergently selected lines of chickens. RESULTS We describe how we obtained whole-genome genotype data for hundreds of individuals in a cost- and time-efficient manner by using a Tn5-based library preparation protocol and an imputation algorithm that was optimized for this application. In total, 7.6 million markers segregated in this pedigree and, within each full-sib family, between 10.0 and 13.7% of these were fully informative, i.e. fixed for alternative alleles in the founders from the divergent lines, and were used for reconstruction of the offspring mosaic genotypes. The genotypes that were estimated based on the low-coverage sequence data were highly consistent (>95% agreement) with those obtained using individual single nucleotide polymorphism (SNP) genotyping. The estimated resolution of the inferred recombination breakpoints was relatively high, with 50% of them being defined on regions shorter than 10 kb. CONCLUSIONS A method and software for inferring founder mosaic genotypes in intercross offspring from low-coverage whole-genome sequencing in pedigrees from heterozygous founders are described. They provide high-quality, high-resolution genotypes in a time- and cost-efficient manner. The software is freely available at https://github.com/CarlborgGenomics/Stripes.
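The notion of a "fully informative" marker used in this abstract (fixed for alternative alleles in the founders from the two divergent lines) can be made concrete with a short sketch. This is not the Stripes implementation: the function name, the dosage encoding (0, 1, 2 copies of the alternate allele), and the toy marker table are all assumptions made for illustration.

```python
def is_fully_informative(line_a_dosages, line_b_dosages):
    """Return True if every founder of one line is homozygous for one allele
    and every founder of the other line is homozygous for the other allele.

    Dosages are counts of the alternate allele (0, 1 or 2); this encoding
    is an assumption of the sketch.
    """
    a, b = set(line_a_dosages), set(line_b_dosages)
    return (a == {0} and b == {2}) or (a == {2} and b == {0})


# Hypothetical founder genotypes for one full-sib family:
# marker -> (dosages in line A founders, dosages in line B founders)
markers = {
    "m1": ([0, 0], [2, 2]),   # fixed for alternative alleles
    "m2": ([0, 1], [2, 2]),   # a heterozygous founder, not fully informative
    "m3": ([2], [0]),         # fixed for alternative alleles, reversed
    "m4": ([0], [0]),         # monomorphic across lines
}

informative = [m for m, (a, b) in markers.items() if is_fully_informative(a, b)]
print(informative)
```

Only markers that pass this test let an offspring allele be assigned unambiguously to one founder line, which is why the abstract restricts mosaic reconstruction to them.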
Affiliation(s)
- Yanjun Zan
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Thibaut Payen
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Mette Lillie
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Christa F Honaker
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Paul B Siegel
  - Department of Animal and Poultry Sciences, Virginia Polytechnic Institute and State University, Blacksburg, VA, USA
- Örjan Carlborg
  - Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
|
18
|
Zheng C, Boer MP, van Eeuwijk FA. Construction of Genetic Linkage Maps in Multiparental Populations. Genetics 2019; 212:1031-1044. [PMID: 31182487 PMCID: PMC6707453 DOI: 10.1534/genetics.119.302229] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 04/19/2019] [Accepted: 06/04/2019] [Indexed: 11/18/2022]
Abstract
Construction of genetic linkage maps has become a routine step for mapping quantitative trait loci (QTL), particularly in animal and plant breeding populations. Many multiparental populations have recently been produced to increase genetic diversity and QTL mapping resolution. However, few software packages are available for map construction in these populations. In this paper, we build a general framework for the construction of genetic linkage maps from genotypic data in diploid populations, including bi- and multiparental populations, cross-pollinated (CP) populations, and breeding pedigrees. The framework is implemented as an automatic pipeline called magicMap, where the maximum multilocus likelihood approach utilizes genotypic information efficiently. We evaluate magicMap by extensive simulations and eight real datasets: one biparental, one CP, four multiparent advanced generation intercross (MAGIC), and two nested association mapping (NAM) populations, the number of markers ranging from a few hundred to tens of thousands. Not only is magicMap the only software capable of accommodating all of these designs, it is more accurate and robust to missing genotypes and genotyping errors than commonly used packages.
Affiliation(s)
- Chaozhi Zheng
  - Biometris, Wageningen University and Research, 6700 AA, The Netherlands
- Martin P Boer
  - Biometris, Wageningen University and Research, 6700 AA, The Netherlands
|
19
|
Malmberg MM, Barbulescu DM, Drayton MC, Shinozuka M, Thakur P, Ogaji YO, Spangenberg GC, Daetwyler HD, Cogan NOI. Evaluation and Recommendations for Routine Genotyping Using Skim Whole Genome Re-sequencing in Canola. Frontiers in Plant Science 2018; 9:1809. [PMID: 30581450 PMCID: PMC6292936 DOI: 10.3389/fpls.2018.01809] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 07/09/2018] [Accepted: 11/21/2018] [Indexed: 05/25/2023]
Abstract
Whole genome sequencing offers genome wide, unbiased markers, and inexpensive library preparation. With the cost of sequencing decreasing rapidly, many plant genomes of modest size are amenable to skim whole genome resequencing (skim WGR). The use of skim WGR in diverse sample sets without the use of imputation was evaluated in silico in 149 canola samples representative of global diversity. Fastq files with an average of 10x coverage of the reference genome were used to generate skim samples representing 0.25x, 0.5x, 1x, 2x, 3x, 4x, and 5x sequencing coverage. Applying a pre-defined list of SNPs versus de novo SNP discovery was evaluated. As skim WGR is expected to result in some degree of insufficient allele sampling, all skim coverage levels were filtered at a range of minimum read depths from a relaxed minimum read depth of 2 to a stringent read depth of 5, resulting in 28 list-based SNP sets. As a broad recommendation, genotyping pre-defined SNPs between 1x and 2x coverage with relatively stringent depth filtering is appropriate for a diverse sample set of canola due to a balance between marker number, sufficient accuracy, and sequencing cost, but depends on the intended application. This was experimentally examined in two sample sets with different genetic backgrounds: 1x coverage of 1,590 individuals from 84 Australian spring type four-parent crosses aimed at maximizing diversity as well as one commercial F1 hybrid, and 2x coverage of 379 doubled haploids (DHs) derived from a subset of the four-parent crosses. To determine optimal coverage in a simpler genetic background, the DH sample sequence coverage was further down sampled in silico. The flexible and cost-effective nature of the protocol makes it highly applicable across a range of species and purposes.
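The count of 28 list-based SNP sets follows from crossing the seven skim coverage levels with the minimum read depths. The sketch below assumes the depth thresholds are the integers 2 through 5, which is an inference from the stated total rather than something the abstract spells out; the `filter_calls` helper and the toy depths are hypothetical.

```python
from itertools import product

# Skim coverage levels listed in the abstract.
coverages = [0.25, 0.5, 1, 2, 3, 4, 5]
# Minimum read depths, assumed to be the integers from 2 (relaxed) to 5 (stringent).
min_depths = [2, 3, 4, 5]

# One SNP set per (coverage, minimum depth) combination.
snp_sets = list(product(coverages, min_depths))
print(len(snp_sets))  # 7 coverage levels x 4 depth filters


def filter_calls(site_depths, min_depth):
    """Keep the sites whose read depth reaches the minimum (hypothetical helper)."""
    return [site for site, depth in site_depths.items() if depth >= min_depth]


# Toy read depths at three predefined SNP sites in one skim sample.
toy_depths = {"snp1": 1, "snp2": 2, "snp3": 6}
relaxed = filter_calls(toy_depths, 2)
stringent = filter_calls(toy_depths, 5)
print(relaxed, stringent)
```

The toy call shows the trade-off the abstract describes: a stricter depth filter leaves fewer markers, each supported by more reads.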
Affiliation(s)
- M. Michelle Malmberg
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Michelle C. Drayton
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Maiko Shinozuka
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Preeti Thakur
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- Yvonne O. Ogaji
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
- German C. Spangenberg
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Hans D. Daetwyler
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
- Noel O. I. Cogan
  - Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora, VIC, Australia
|
20
|
Recursive Algorithms for Modeling Genomic Ancestral Origins in a Fixed Pedigree. G3: Genes, Genomes, Genetics 2018; 8:3231-3245. [PMID: 30068523 PMCID: PMC6169389 DOI: 10.1534/g3.118.200340] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022]
Abstract
The study of gene flow in pedigrees is of strong interest for the development of quantitative trait loci (QTL) mapping methods in multiparental populations. We developed a Markovian framework for modeling ancestral origins along two homologous chromosomes within individuals in fixed pedigrees. A highly beneficial property of our method is that the size of the state space depends linearly or quadratically on the number of pedigree founders, whereas it increases exponentially with pedigree size in alternative methods. To calculate the parameter values of the Markov process, we describe two novel recursive algorithms that differ with respect to whether the pedigree founders are assumed to be exchangeable. Our algorithms apply equally to autosomes and sex chromosomes, another desirable feature of our approach. We tested the accuracy of the algorithms by a million simulations on a pedigree. We demonstrated two applications of the recursive algorithms in multiparental populations: designing a breeding scheme for maximizing the overall density of recombination breakpoints and thus the QTL mapping resolution, and incorporating pedigree information into hidden Markov models in ancestral inference from genotypic data; the conditional probabilities and the recombination breakpoint data resulting from ancestral inference can facilitate follow-up QTL mapping. The results show that the generality of the recursive algorithms can greatly increase the application range of genetic analysis such as ancestral inference in multiparental populations.
|