1
Wang C, Liu L, Yin M, Eller F, Brix H, Wang T, Salojärvi J, Guo W. Genome-wide analysis tracks the emergence of intraspecific polyploids in Phragmites australis. NPJ BIODIVERSITY 2024; 3:29. [PMID: 39354055 PMCID: PMC11445247 DOI: 10.1038/s44185-024-00060-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Accepted: 08/29/2024] [Indexed: 10/03/2024]
Abstract
Polyploidization plays an important role in plant speciation and adaptation. To address the role of polyploidization in grass diversification, we studied Phragmites australis, an invasive species with intraspecific variation in chromosome numbers ranging from 2n = 36 to 144. We combined ploidy estimation, phylogeny, population genetics and model simulations to investigate the evolution of P. australis. Using restriction site-associated DNA sequencing (RAD-seq), we conducted a genome-wide analysis of 88 individuals sourced from diverse populations worldwide, revealing six distinct intraspecific lineages with extensive genetic admixture. Each lineage was characterized by a specific ploidy level, predominantly tetraploid or octoploid, indicative of multiple independent polyploidization events. The population size of each lineage has declined moderately over its history while remaining large, except for the North American native and the US Land types, which experienced constant population size contraction throughout their history. Our investigation did not identify a direct association between polyploidization events and grass invasions. Nonetheless, we observed octoploid and hexaploid lineages at contact zones in Romania, Hungary, and South Africa, possibly due to genomic conflicts arising from allotetraploid parental lineages.
Affiliation(s)
- Cui Wang
  - Institute of Ecology and Biodiversity, School of Life Sciences, Shandong University, Qingdao, China
  - Shandong Provincial Engineering and Technology Research Center for Vegetation Ecology, Shandong University, Qingdao, China
  - Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Viikinkaari 1, Biocentre 3, Helsinki, Finland
- Lele Liu
  - Institute of Ecology and Biodiversity, School of Life Sciences, Shandong University, Qingdao, China
  - Shandong Provincial Engineering and Technology Research Center for Vegetation Ecology, Shandong University, Qingdao, China
- Meiqi Yin
  - Institute of Ecology and Biodiversity, School of Life Sciences, Shandong University, Qingdao, China
  - Shandong Provincial Engineering and Technology Research Center for Vegetation Ecology, Shandong University, Qingdao, China
- Hans Brix
  - Department of Biology, Aarhus University, Aarhus, Denmark
- Tong Wang
  - College of Landscape Architecture and Forestry, Qingdao Agricultural University, Qingdao, China
- Jarkko Salojärvi
  - Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Viikinkaari 1, Biocentre 3, Helsinki, Finland
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
- Weihua Guo
  - Institute of Ecology and Biodiversity, School of Life Sciences, Shandong University, Qingdao, China
  - Shandong Provincial Engineering and Technology Research Center for Vegetation Ecology, Shandong University, Qingdao, China
2
Phillips AR. Variant calling in polyploids for population and quantitative genetics. APPLICATIONS IN PLANT SCIENCES 2024; 12:e11607. [PMID: 39184203 PMCID: PMC11342233 DOI: 10.1002/aps3.11607] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2023] [Revised: 03/03/2024] [Accepted: 04/10/2024] [Indexed: 08/27/2024]
Abstract
Advancements in genome assembly and sequencing technology have made whole genome sequence (WGS) data and reference genomes accessible to study polyploid species. Compared to popular reduced-representation sequencing approaches, the genome-wide coverage and greater marker density provided by WGS data can greatly improve our understanding of polyploid species and polyploid biology. However, biological features that make polyploid species interesting also pose challenges in read mapping, variant identification, and genotype estimation. Accounting for characteristics in variant calling like allelic dosage uncertainty, homology between subgenomes, and variance in chromosome inheritance mode can reduce errors. Here, I discuss the challenges of variant calling in polyploid WGS data and discuss where potential solutions can be integrated into a standard variant calling pipeline.
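The allelic dosage uncertainty mentioned above can be illustrated with a minimal sketch. It assumes a biallelic site, independent reads, and a symmetric per-read miscall rate; the function name `dosage_likelihoods` and the `error` default are illustrative choices, not taken from the article or from any particular variant caller.

```python
from math import comb

def dosage_likelihoods(ref_reads, alt_reads, ploidy, error=0.01):
    """Normalised likelihood of each alternate-allele dosage (0..ploidy).

    Assumes a binomial read-sampling model with a symmetric miscall rate.
    """
    n = ref_reads + alt_reads
    likes = []
    for dose in range(ploidy + 1):
        frac = dose / ploidy
        # Chance that a single read shows the alternate allele.
        p_alt = frac * (1 - error) + (1 - frac) * error
        likes.append(comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt)**ref_reads)
    total = sum(likes)
    return [value / total for value in likes]

# Same 3:1 read ratio at two depths in a tetraploid.
low = dosage_likelihoods(ref_reads=6, alt_reads=2, ploidy=4)
high = dosage_likelihoods(ref_reads=60, alt_reads=20, ploidy=4)
print([round(x, 3) for x in low])
print([round(x, 3) for x in high])
```

Under these assumptions both depths favour a dosage of one, but the low-depth call leaves substantial weight on a dosage of two, which is the kind of uncertainty a polyploid-aware pipeline has to carry forward.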
Affiliation(s)
- Alyssa R. Phillips
  - Department of Evolution and Ecology, University of California, Davis, Davis, California 95616, USA
3
Transcriptome Analysis Reveals Potential Mechanism in Storage Protein Trafficking within Developing Grains of Common Wheat. Int J Mol Sci 2022; 23:ijms232314851. [PMID: 36499182 PMCID: PMC9738083 DOI: 10.3390/ijms232314851] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 10/07/2022] [Accepted: 10/18/2022] [Indexed: 12/03/2022]
Abstract
Gluten proteins are the major storage protein fraction in the mature wheat grain. They are restricted to the starchy endosperm, which defines the viscoelastic properties of wheat dough. The synthesis of these storage proteins is controlled by the endoplasmic reticulum (ER) and is directed into the vacuole via the Golgi apparatus. In the present study, transcriptome analysis was used to explore the potential mechanism within critical stages of grain development of wheat cultivar "Shaannong 33" and its sister line used as the control (CK). Samples were collected at 10, 14, 20, and 30 days after anthesis (DPA) for transcriptomic analysis. The comparative transcriptome analysis identified a total of 18,875 differentially expressed genes (DEGs) across the four comparisons (T10 vs. CK10, T14 vs. CK14, T20 vs. CK20, and T30 vs. CK30), including 2824 up-regulated and 5423 down-regulated genes in T30 vs. CK30. Further, the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment highlighted the maximum number of genes regulating protein processing in the ER during the grain enlargement stages (10-20 DPA). In addition, KEGG database analysis reported 1362 and 788 DEGs involved in translation, ribosomal structure, biogenesis, flavonoid biosynthesis pathway and intracellular trafficking, secretion, and vesicular transport through the protein processing within ER pathway (ko04141). Notably, consistent with the higher expression of intercellular storage protein trafficking genes at the initial 10 DPA, there was relatively low expression at later stages. Expression levels of nine randomly selected genes were verified by qRT-PCR and were consistent with the transcriptome data. These data suggested that the initial stages of "cell division" played a significant role in protein quality control within the ER, thus maintaining the protein quality characteristics at grain maturity. Furthermore, our data suggested that the protein synthesis, folding, and trafficking pathways directed by a different number of genes during the grain enlargement stage contributed to the observed high-quality characteristics of gluten protein in Shaannong 33 (Triticum aestivum L.).
4
Naranjo-Ortiz MA, Molina M, Fuentes D, Mixão V, Gabaldón T. Karyon: a computational framework for the diagnosis of hybrids, aneuploids, and other nonstandard architectures in genome assemblies. Gigascience 2022; 11:6751106. [PMID: 36205401 PMCID: PMC9540331 DOI: 10.1093/gigascience/giac088] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/22/2021] [Revised: 11/23/2021] [Accepted: 08/24/2022] [Indexed: 12/22/2022]
Abstract
BACKGROUND: Recent technological developments have made genome sequencing and assembly highly accessible and widely used. However, the presence in sequenced organisms of certain genomic features such as high heterozygosity, polyploidy, aneuploidy, heterokaryosis, or extreme compositional biases can challenge current standard assembly procedures and result in highly fragmented assemblies. Hence, we hypothesized that genome databases must contain a nonnegligible fraction of low-quality assemblies that result from such intrinsic genomic factors. FINDINGS: Here we present Karyon, a Python-based toolkit that uses raw sequencing data and de novo genome assembly to assess several parameters and generate informative plots to assist in the identification of noncanonical genomic traits. Karyon includes automated de novo genome assembly and variant calling pipelines. We tested Karyon by diagnosing 35 highly fragmented publicly available assemblies from 19 different Mucorales (Fungi) species. CONCLUSIONS: Our results show that 10 (28.57%) of the assemblies presented signs of unusual genomic configurations, suggesting that these are common, at least for some lineages within the Fungi.
Affiliation(s)
- Miguel A Naranjo-Ortiz
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Health and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Biology Department, Clark University, Worcester, MA 01610, USA
  - Naturhistoriskmuseum, University of Oslo, Oslo 0562, Norway
- Manu Molina
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Health and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Life Sciences Department, Barcelona Supercomputing Centre (BSC-CNS), Barcelona 08034, Spain
- Diego Fuentes
  - Life Sciences Department, Barcelona Supercomputing Centre (BSC-CNS), Barcelona 08034, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Verónica Mixão
  - Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona 08003, Spain
  - Health and Life Sciences, Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
  - Life Sciences Department, Barcelona Supercomputing Centre (BSC-CNS), Barcelona 08034, Spain
  - Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute of Science and Technology, Barcelona 08028, Spain
- Toni Gabaldón
  - Correspondence address. Toni Gabaldón, Plaça Eusebi Güell, 1-3, Barcelona 08034, Spain. E-mail:
5
Margarido GRA, Correr FH, Furtado A, Botha FC, Henry RJ. Limited allele-specific gene expression in highly polyploid sugarcane. Genome Res 2022; 32:297-308. [PMID: 34949669 PMCID: PMC8805727 DOI: 10.1101/gr.275904.121] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 06/17/2021] [Accepted: 12/19/2021] [Indexed: 12/04/2022]
Abstract
Polyploidy is widespread in plants, allowing the different copies of genes to be expressed differently in a tissue-specific or developmentally specific way. This allele-specific expression (ASE) has been widely reported, but the proportion and nature of genes showing this characteristic have not been well defined. We now report an analysis of the frequency and patterns of ASE at the whole-genome level in the highly polyploid sugarcane genome. Very high depth whole-genome sequencing and RNA sequencing revealed strong correlations between allelic proportions in the genome and in expressed sequences. This level of sequencing allowed discrimination of each of the possible allele doses in this 12-ploid genome. Most genes were expressed in direct proportion to the frequency of the allele in the genome with examples of polymorphisms being found with every possible discrete level of dose from 1:11 for single-copy alleles to 12:0 for monomorphic sites. The rarer cases of ASE were more frequent in the expression of defense-response genes, as well as in some processes related to the biosynthesis of cell walls. ASE was more common in genes with variants that resulted in significant disruption of function. The low level of ASE may reflect the recent origin of polyploid hybrid sugarcane. Much of the ASE present can be attributed to strong selection for resistance to diseases in both nature and domestication.
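The comparison described above, expression in proportion to genomic allele dose versus allele-specific expression, can be sketched as a test of RNA read counts against the genomic dose fraction. This is an illustrative sketch only: the exact two-sided binomial test, the helper name `binom_two_sided_p`, and the read counts are assumptions, not the authors' pipeline.

```python
from math import comb

PLOIDY = 12  # the 12-ploid genome described in the abstract

def binom_two_sided_p(k, n, p):
    """Exact two-sided binomial p-value: total probability of all
    outcomes that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-9)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))

# Allele fractions expected for polymorphic doses 1:11 through 11:1.
expected_fractions = [dose / PLOIDY for dose in range(1, PLOIDY)]

# Hypothetical site: the variant allele sits on 3 of 12 genomic copies.
genomic_fraction = 3 / PLOIDY

# 50 of 200 RNA reads carry the allele: matches the genomic dose.
p_proportional = binom_two_sided_p(k=50, n=200, p=genomic_fraction)
# 110 of 200 RNA reads carry the allele: far above the genomic dose.
p_skewed = binom_two_sided_p(k=110, n=200, p=genomic_fraction)

print(p_proportional, p_skewed)
```

A large p-value is compatible with dose-proportional expression, while a very small one would flag the site as a candidate for allele-specific expression; real analyses would also need to handle overdispersion and multiple testing.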
Affiliation(s)
- Gabriel Rodrigues Alves Margarido
  - Department of Genetics, University of São Paulo, "Luiz de Queiroz" College of Agriculture, Piracicaba 13418-900, Brazil
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane 4072, Australia
- Fernando Henrique Correr
  - Department of Genetics, University of São Paulo, "Luiz de Queiroz" College of Agriculture, Piracicaba 13418-900, Brazil
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane 4072, Australia
- Agnelo Furtado
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane 4072, Australia
- Frederik C Botha
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane 4072, Australia
- Robert James Henry
  - Queensland Alliance for Agriculture and Food Innovation, University of Queensland, Brisbane 4072, Australia
6
Wang L, Yang J, Zhang H, Tao Q, Zhang Y, Dang Z, Zhang F, Luo Z. Sequence coverage required for accurate genotyping by sequencing in polyploid species. Mol Ecol Resour 2021; 22:1417-1426. [PMID: 34826191 DOI: 10.1111/1755-0998.13558] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/11/2021] [Revised: 11/12/2021] [Accepted: 11/15/2021] [Indexed: 11/29/2022]
Abstract
Polyploidy plays an important role in the evolution of eukaryotes, especially for flowering plants. Many ecologically or agronomically important plant and crop species are polyploids, including sycamore maple (tetraploid) and the world's second and third largest food crops, wheat (hexaploid) and potato (tetraploid), as are economically important aquaculture animals such as Atlantic salmon and trout. Next-generation sequencing data make it possible to assign a genotype at a sequence variant site, known as genotyping by sequencing (GBS). GBS has stimulated enormous interest in population-based genomics studies in almost all diploid and many polyploid organisms. DNA sequence polymorphisms are codominant and thus fully informative about the underlying genotype at the polymorphic site, making GBS a straightforward task in diploids. However, sequence data may often be uninformative in polyploid species, making GBS a far more challenging task in polyploids. This paper presents novel and rigorous statistical methods for predicting the number of sequence reads needed to ensure accurate GBS at a polymorphic site revealed by the reads in polyploids. It shows that a dozen reads can ensure a probability of 95% of recovering all constituent alleles of any tetraploid genotype, but several hundred reads are needed to uncover the genotype accurately with a probability of 90%, contradicting the proposition in the literature that GBS can rely on low-coverage sequence data. The theoretical prediction was tested using RAD-seq data from tetraploid potato cultivars. The paper provides polyploid experimentalists with theoretical guides and methods for designing and conducting their sequence-based studies.
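The first part of the read-depth question above, how many reads are needed merely to see every allele of a genotype at least once, can be sketched with an inclusion-exclusion calculation. The model is a deliberate simplification (independent reads, no sequencing error, each read drawn in proportion to allele dosage), and the names `prob_all_alleles_seen` and `min_reads` are illustrative; it is not the authors' statistical method and is not expected to reproduce their figures.

```python
from itertools import combinations

def prob_all_alleles_seen(allele_fractions, n_reads):
    """P(every allele appears at least once among n_reads),
    computed by inclusion-exclusion over sets of missing alleles."""
    alleles = range(len(allele_fractions))
    total = 0.0
    for size in range(len(allele_fractions) + 1):
        for missing in combinations(alleles, size):
            # Probability that a single read avoids all alleles in `missing`.
            p_rest = 1.0 - sum(allele_fractions[i] for i in missing)
            total += (-1) ** size * max(p_rest, 0.0) ** n_reads
    return total

def min_reads(allele_fractions, target=0.95):
    """Smallest read count reaching the target probability."""
    n = 1
    while prob_all_alleles_seen(allele_fractions, n) < target:
        n += 1
    return n

simplex = (0.25, 0.75)          # tetraploid genotype with dosages 1 and 3
four_alleles = (0.25,) * 4      # tetraploid genotype with four distinct alleles

print(min_reads(simplex))       # → 11
print(min_reads(four_alleles))  # → 16
```

Seeing every allele is a much weaker requirement than estimating dosage correctly, which is consistent with the abstract's point that accurate genotype calls need far more reads than allele recovery does.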
Affiliation(s)
- Lin Wang
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Jixuan Yang
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Hong Zhang
  - Department of Statistics and Finance, University of Science and Technology of China, Hefei, China
- Qin Tao
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Yuxin Zhang
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Zhenyu Dang
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Fengjun Zhang
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
- Zewei Luo
  - Laboratory of Population and Quantitative Genetics, Institute of Biostatistics, School of Life Sciences, Fudan University, Shanghai, China
  - School of Biosciences, University of Birmingham, Birmingham, UK
7
Khalil AIS, Chattopadhyay A, Sanyal A. Analysis of Aneuploidy Spectrum From Whole-Genome Sequencing Provides Rapid Assessment of Clonal Variation Within Established Cancer Cell Lines. Cancer Inform 2021; 20:11769351211049236. [PMID: 34671179 PMCID: PMC8521761 DOI: 10.1177/11769351211049236] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/22/2021] [Accepted: 09/02/2021] [Indexed: 11/16/2022]
Abstract
BACKGROUND: The revolution in next-generation sequencing (NGS) technology has allowed easy access and sharing of high-throughput sequencing datasets of cancer cell lines and their integrative analyses. However, long-term passaging and culture conditions introduce high levels of genomic and phenotypic diversity in established cell lines, resulting in strain differences. Thus, clonal variation in cultured cell lines with respect to the reference standard is a major barrier in systems biology data analyses. Therefore, there is a pressing need for a fast and entry-level assessment of clonal variations within cell lines using their high-throughput sequencing data. RESULTS: We developed a Python-based tool, AStra, for de novo estimation of genome-wide segmental aneuploidy to measure and visually interpret strain-level similarities or differences of cancer cell lines from whole-genome sequencing (WGS). We demonstrated that the aneuploidy spectrum can capture the genetic variations in 27 strains of the MCF7 breast cancer cell line collected from different laboratories. Performance evaluation of AStra using several cancer sequencing datasets revealed that cancer cell lines exhibit distinct aneuploidy spectra which reflect their previously reported karyotypic observations. Similarly, AStra successfully identified large-scale DNA copy number variations (CNVs) artificially introduced in simulated WGS datasets. CONCLUSIONS: AStra provides an analytical and visualization platform for rapid and easy comparison between different strains or between cell lines based on their aneuploidy spectra, using only the raw BAM files representing mapped reads. We recommend AStra for rapid first-pass quality assessment of cancer cell lines before integrating scientific datasets that employ deep sequencing. AStra is open-source software available at https://github.com/AISKhalil/AStra.
Affiliation(s)
- Anupam Chattopadhyay
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
- Amartya Sanyal
  - School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
8
Chen J, Leach L, Yang J, Zhang F, Tao Q, Dang Z, Chen Y, Luo Z. A tetrasomic inheritance model and likelihood-based method for mapping quantitative trait loci in autotetraploid species. THE NEW PHYTOLOGIST 2021; 230:387-398. [PMID: 31913501 PMCID: PMC7984458 DOI: 10.1111/nph.16413] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/23/2019] [Accepted: 12/20/2019] [Indexed: 06/10/2023]
Abstract
Dissecting the genetic architecture of quantitative traits in autotetraploid species is a methodologically challenging task, but a pivotally important goal for breeding globally important food crops, including potato and blueberry, and ornamental species such as rose. Mapping quantitative trait loci (QTLs) is now a routine practice in diploid species but is far less advanced in autotetraploids, largely due to a lack of analytical methods that account for the complexities of tetrasomic inheritance. We present a novel likelihood-based method for QTL mapping in outbred segregating populations of autotetraploid species. The method accounts properly for sophisticated features of gene segregation and recombination in an autotetraploid meiosis. It may model and analyse molecular marker data with or without allele dosage information, such as that from microarray or sequencing experiments. The method developed outperforms existing bivalent-based methods, which may fail to model and analyse the full spectrum of experimental data, in the statistical power of QTL detection, and accuracy of QTL location, as demonstrated by an intensive simulation study and analysis of data sets collected from a segregating population of potato (Solanum tuberosum). The study enables QTL mapping analysis to be conducted in autotetraploid species under a rigorous tetrasomic inheritance model.
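As background to the tetrasomic inheritance mentioned above, the simplest textbook case can be enumerated directly. The sketch assumes random chromosome segregation with no double reduction, so each gamete receives two distinct homologues chosen at random from the four; the function name `gamete_frequencies` is illustrative. The likelihood model in the article covers more complex segregation and recombination than this.

```python
from collections import Counter
from itertools import combinations

def gamete_frequencies(genotype):
    """Diploid gamete frequencies for a four-homologue genotype such as
    'AAaa', assuming random chromosome segregation (no double reduction)."""
    if len(genotype) != 4:
        raise ValueError("expected exactly four homologues, e.g. 'AAaa'")
    # Each unordered pair of distinct homologues is an equally likely gamete.
    counts = Counter("".join(sorted(pair)) for pair in combinations(genotype, 2))
    total = sum(counts.values())
    return {gamete: count / total for gamete, count in counts.items()}

duplex = gamete_frequencies("AAaa")   # AA : Aa : aa in a 1 : 4 : 1 ratio
simplex = gamete_frequencies("Aaaa")  # Aa : aa in a 1 : 1 ratio
print(duplex)
print(simplex)
```

The 1:4:1 ratio for a duplex parent, in contrast to the 1:2:1 ratio of a diploid cross, is one reason marker dosage has to be modelled explicitly when mapping QTLs in autotetraploids.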
Affiliation(s)
- Jing Chen
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT, UK
- Lindsey Leach
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT, UK
- Jixuan Yang
  - Institute of Biostatistics, Fudan University, Shanghai 200433, China
- Fengjun Zhang
  - Institute of Biostatistics, Fudan University, Shanghai 200433, China
  - Qinghai Academy of Agricultural and Forestry Sciences, Xining, Qinghai 810016, China
- Qin Tao
  - Institute of Biostatistics, Fudan University, Shanghai 200433, China
- Zhenyu Dang
  - Institute of Biostatistics, Fudan University, Shanghai 200433, China
- Yue Chen
  - Institute of Biostatistics, Fudan University, Shanghai 200433, China
- Zewei Luo
  - School of Biosciences, The University of Birmingham, Birmingham B15 2TT, UK
  - Institute of Biostatistics, Fudan University, Shanghai 200433, China
9
Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris. G3-GENES GENOMES GENETICS 2019; 9:3409-3421. [PMID: 31427456 PMCID: PMC6778806 DOI: 10.1534/g3.119.400357] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 12/28/2022]
Abstract
Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications, have not been fully investigated. Here, we assessed three assembly strategies for short-read data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.
10
Viruel J, Conejero M, Hidalgo O, Pokorny L, Powell RF, Forest F, Kantar MB, Soto Gomez M, Graham SW, Gravendeel B, Wilkin P, Leitch IJ. A Target Capture-Based Method to Estimate Ploidy From Herbarium Specimens. FRONTIERS IN PLANT SCIENCE 2019; 10:937. [PMID: 31396248 PMCID: PMC6667659 DOI: 10.3389/fpls.2019.00937] [Citation(s) in RCA: 29] [Impact Index Per Article: 5.8] [Received: 04/20/2019] [Accepted: 07/04/2019] [Indexed: 05/24/2023]
Abstract
Whole genome duplication (WGD) events are common in many plant lineages, but the ploidy status and possible occurrence of intraspecific ploidy variation are unknown for most species. Standard methods for ploidy determination are chromosome counting and flow cytometry approaches. While flow cytometry approaches typically use fresh tissue, an increasing number of studies have shown that recently dried specimens can be used to yield ploidy data. Recent studies have started to explore whether high-throughput sequencing (HTS) data can be used to assess ploidy levels by analyzing allelic frequencies from single copy nuclear genes. Here, we compare different approaches using a range of yam (Dioscorea) tissues of varying ages, drying methods and quality, including herbarium tissue. Our aims were to: (1) explore the limits of flow cytometry in estimating ploidy level from dried samples, including herbarium vouchers collected between 1831 and 2011, and (2) optimize an HTS-based method to estimate ploidy by considering allelic frequencies from nuclear genes obtained using a target-capture method. We show that, although flow cytometry can be used to estimate ploidy levels from herbarium specimens collected up to fifteen years ago, the success rate is low (5.9%). We validated our HTS-based estimates of ploidy using 260 genes by benchmarking with dried samples of species of known ploidy (Dioscorea alata, D. communis, and D. sylvatica). Subsequently, we applied the method to the 85 herbarium samples analyzed with flow cytometry and obtained results for 91.7% of them, comprising species across the phylogenetic tree of Dioscorea. We also explored the limits of using this HTS-based approach for identifying high ploidy levels in herbarium material and the effects of heterozygosity and sequence coverage. Overall, we demonstrated that ploidy diversity within and between species may be ascertained from historical collections, allowing the determination of polyploidization events from samples collected up to two centuries ago. This approach has the potential to provide insights into the drivers and dynamics of ploidy level changes during plant evolution and crop domestication.
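The idea of inferring ploidy from allelic frequencies can be illustrated with a small sketch: at heterozygous sites, allele ratios are expected near 1/2 in diploids, near 1/3 and 2/3 in triploids, and near 1/4, 1/2 and 3/4 in tetraploids. The scoring rule below (mean squared distance to the nearest expected ratio), the function names, and the candidate ploidies are all assumptions made for illustration, not the authors' method.

```python
def expected_ratios(ploidy):
    """Allele ratios expected at heterozygous sites for a given ploidy."""
    return [k / ploidy for k in range(1, ploidy)]

def fit_score(observed, ploidy):
    """Mean squared distance from each observed ratio to the nearest
    expected ratio; lower means a better fit."""
    expected = expected_ratios(ploidy)
    return sum(min((o - e) ** 2 for e in expected) for o in observed) / len(observed)

def best_ploidy(observed, candidates=(2, 3, 4)):
    """Candidate ploidy with the lowest score; ties go to the lowest ploidy."""
    return min(candidates, key=lambda p: fit_score(observed, p))

# Hypothetical allele ratios at heterozygous sites in three samples.
diploid_like = [0.5, 0.5, 0.5]
triploid_like = [0.33, 0.67, 0.34, 0.66]
tetraploid_like = [0.25, 0.5, 0.75, 0.26, 0.74]

print(best_ploidy(diploid_like), best_ploidy(triploid_like), best_ploidy(tetraploid_like))
```

Because the tetraploid ratios include 1/2, noisy diploid data can fit a tetraploid model equally well or better, so a practical method needs a penalty for higher ploidy or a proper mixture model, along with enough coverage and heterozygous sites, as the abstract notes.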
Affiliation(s)
- Juan Viruel
  - Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Oriane Hidalgo
  - Royal Botanic Gardens, Kew, Richmond, United Kingdom
  - Laboratori de Botànica, Facultat de Farmàcia i Ciències de l’Alimentació, Universitat de Barcelona, Barcelona, Spain
- Lisa Pokorny
  - Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Félix Forest
  - Royal Botanic Gardens, Kew, Richmond, United Kingdom
- Michael B. Kantar
  - Department of Tropical Plant and Soil Sciences, University of Hawai’i at Mânoa, Honolulu, HI, United States
- Marybel Soto Gomez
  - Department of Botany, University of British Columbia, Vancouver, BC, Canada
  - UBC Botanical Garden & Centre for Plant Research, University of British Columbia, Vancouver, BC, Canada
- Sean W. Graham
  - Department of Botany, University of British Columbia, Vancouver, BC, Canada
  - UBC Botanical Garden & Centre for Plant Research, University of British Columbia, Vancouver, BC, Canada
- Barbara Gravendeel
  - Naturalis Biodiversity Center, Endless Forms, Leiden, Netherlands
  - Institute of Biology Leiden, Leiden University, Leiden, Netherlands
  - Science and Technology Faculty, University of Applied Sciences Leiden, Leiden, Netherlands
- Paul Wilkin
  - Royal Botanic Gardens, Kew, Richmond, United Kingdom
11
Ferreira RCU, Lara LADC, Chiari L, Barrios SCL, do Valle CB, Valério JR, Torres FZV, Garcia AAF, de Souza AP. Genetic Mapping With Allele Dosage Information in Tetraploid Urochloa decumbens (Stapf) R. D. Webster Reveals Insights Into Spittlebug (Notozulia entreriana Berg) Resistance. FRONTIERS IN PLANT SCIENCE 2019; 10:92. [PMID: 30873183 PMCID: PMC6401981 DOI: 10.3389/fpls.2019.00092] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/09/2018] [Accepted: 01/21/2019] [Indexed: 05/08/2023]
Abstract
Urochloa decumbens (Stapf) R. D. Webster is one of the most important African forage grasses in Brazilian beef production. Currently available genetic-genomic resources for this species are restricted mainly due to polyploidy and apomixis. Therefore, crucial genomic-molecular studies such as the construction of genetic maps and the mapping of quantitative trait loci (QTLs) are very challenging and consequently affect the advancement of molecular breeding. The objectives of this work were to (i) construct an integrated U. decumbens genetic map for a full-sibling progeny using GBS-based markers with allele dosage information, (ii) detect QTLs for spittlebug (Notozulia entreriana) resistance, and (iii) seek putative candidate genes involved in defense against biotic stresses. We used the Setaria viridis genome as a reference to align GBS reads and selected 4,240 high-quality SNP markers with allele dosage information. Of these markers, 1,000 were distributed throughout nine homologous groups with a cumulative map length of 1,335.09 cM and an average marker density of 1.33 cM. We detected QTLs for resistance to spittlebug, an important pasture insect pest, that explained between 4.66 and 6.24% of the phenotypic variation. These QTLs are in regions containing putative candidate genes related to defense against biotic stresses. Because this is the first genetic map with SNP autotetraploid dosage data and QTL detection in U. decumbens, it will be useful for future evolutionary studies, genome assembly, and other QTL analyses in Urochloa spp. Moreover, the results might facilitate the isolation of spittlebug-related candidate genes and help clarify the mechanism of spittlebug resistance. These approaches will improve selection efficiency and accuracy in U. decumbens molecular breeding and shorten the breeding cycle.
Affiliation(s)
| | | | - Lucimara Chiari
- Embrapa Beef Cattle, Brazilian Agricultural Research Corporation, Campo Grande, Brazil
- José Raul Valério
- Embrapa Beef Cattle, Brazilian Agricultural Research Corporation, Campo Grande, Brazil
- Anete Pereira de Souza
- Center for Molecular Biology and Genetic Engineering, University of Campinas, Campinas, Brazil
- Plant Biology Department, Biology Institute, University of Campinas, Campinas, Brazil
- *Correspondence: Anete Pereira de Souza,
|
12
|
Tran HT, Ramaraj T, Furtado A, Lee LS, Henry RJ. Use of a draft genome of coffee (Coffea arabica) to identify SNPs associated with caffeine content. PLANT BIOTECHNOLOGY JOURNAL 2018; 16:1756-1766. [PMID: 29509991 PMCID: PMC6131422 DOI: 10.1111/pbi.12912] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 12/08/2017] [Revised: 02/20/2018] [Accepted: 02/24/2018] [Indexed: 05/21/2023]
Abstract
Arabica coffee (Coffea arabica) has a small gene pool limiting genetic improvement. Selection for caffeine content within this gene pool would be assisted by identification of the genes controlling this important trait. Sequencing of DNA bulks from 18 genotypes with extreme high- or low-caffeine content from a population of 232 genotypes was used to identify linked polymorphisms. To obtain a reference genome, a whole genome assembly of arabica coffee (variety K7) was achieved by sequencing using short read (Illumina) and long-read (PacBio) technology. Assembly was performed using a range of assembly tools resulting in 76 409 scaffolds with a scaffold N50 of 54 544 bp and a total scaffold length of 1448 Mb. Validation of the genome assembly using different tools showed high completeness of the genome. More than 99% of transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. The assembled genome annotated using AUGUSTUS yielded 99 829 gene models. Using the draft arabica genome as reference in mapping and variant calling allowed the detection of 1444 nonsynonymous single nucleotide polymorphisms (SNPs) associated with caffeine content. Based on Kyoto Encyclopaedia of Genes and Genomes pathway-based analysis, 65 caffeine-associated SNPs were discovered, among which 11 SNPs were associated with genes encoding enzymes involved in the conversion of substrates, which participate in the caffeine biosynthesis pathways. This analysis demonstrated the complex genetic control of this key trait in coffee.
Affiliation(s)
- Hue T.M. Tran
- Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of Queensland, St Lucia, Qld, Australia
- Western Highlands Agriculture & Forestry Science Institute (WASI), Buon Ma Thuot, Vietnam
- Agnelo Furtado
- Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of Queensland, St Lucia, Qld, Australia
- Leonard Slade Lee
- Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of Queensland, St Lucia, Qld, Australia
- Robert J. Henry
- Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of Queensland, St Lucia, Qld, Australia
|
13
|
Thirugnanasambandam PP, Hoang NV, Henry RJ. The Challenge of Analyzing the Sugarcane Genome. FRONTIERS IN PLANT SCIENCE 2018; 9:616. [PMID: 29868072 PMCID: PMC5961476 DOI: 10.3389/fpls.2018.00616] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Received: 01/30/2018] [Accepted: 04/18/2018] [Indexed: 05/04/2023]
Abstract
Reference genome sequences have become key platforms for genetics and breeding of the major crop species. Sugarcane is probably the largest crop produced in the world (in weight of crop harvested) but lacks a reference genome sequence. Sugarcane has one of the most complex genomes in crop plants due to the extreme level of polyploidy. The genome of modern sugarcane hybrids includes sub-genomes from two progenitors Saccharum officinarum and S. spontaneum with some chromosomes resulting from recombination between these sub-genomes. Advancing DNA sequencing technologies and strategies for genome assembly are making the sugarcane genome more tractable. Advances in long read sequencing have allowed the generation of a more complete set of sugarcane gene transcripts. This is supporting transcript profiling in genetic research. The progenitor genomes are being sequenced. A monoploid coverage of the hybrid genome has been obtained by sequencing BAC clones that cover the gene space of the closely related sorghum genome. The complete polyploid genome is now being sequenced and assembled. The emerging genome will allow comparison of related genomes and increase understanding of the functioning of this polyploidy system. Sugarcane breeding for traditional sugar and new energy and biomaterial uses will be enhanced by the availability of these genomic resources.
Affiliation(s)
- Prathima P. Thirugnanasambandam
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD, Australia
- ICAR - Sugarcane Breeding Institute, Coimbatore, India
- Nam V. Hoang
- College of Agriculture and Forestry, Hue University, Hue, Vietnam
- Robert J. Henry
- Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD, Australia
|
14
|
nQuire: a statistical framework for ploidy estimation using next generation sequencing. BMC Bioinformatics 2018; 19:122. [PMID: 29618319 PMCID: PMC5885312 DOI: 10.1186/s12859-018-2128-z] [Citation(s) in RCA: 82] [Impact Index Per Article: 13.7] [Received: 12/14/2017] [Accepted: 03/22/2018] [Indexed: 11/30/2022]
Abstract
Background Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly - without measuring DNA content - from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. Results We demonstrate the utility of nQuire analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the Baker’s yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. Conclusions nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at https://github.com/clwgg/nQuire under the MIT license.
|
15
|
Weiß CL, Pais M, Cano LM, Kamoun S, Burbano HA. nQuire: a statistical framework for ploidy estimation using next generation sequencing. BMC Bioinformatics 2018; 19:122. [PMID: 29618319 DOI: 10.1101/143537] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 12/14/2017] [Accepted: 03/22/2018] [Indexed: 05/27/2023]
Abstract
BACKGROUND Intraspecific variation in ploidy occurs in a wide range of species including pathogenic and nonpathogenic eukaryotes such as yeasts and oomycetes. Ploidy can be inferred indirectly - without measuring DNA content - from experiments using next-generation sequencing (NGS). We present nQuire, a statistical framework that distinguishes between diploids, triploids and tetraploids using NGS. The command-line tool models the distribution of base frequencies at variable sites using a Gaussian Mixture Model, and uses maximum likelihood to select the most plausible ploidy model. nQuire handles large genomes at high coverage efficiently and uses standard input file formats. RESULTS We demonstrate the utility of nQuire analyzing individual samples of the pathogenic oomycete Phytophthora infestans and the Baker's yeast Saccharomyces cerevisiae. Using these organisms we show the dependence between reliability of the ploidy assignment and sequencing depth. Additionally, we employ normalized maximized log-likelihoods generated by nQuire to ascertain ploidy level in a population of samples with ploidy heterogeneity. Using these normalized values we cluster samples in three dimensions using multivariate Gaussian mixtures. The cluster assignments retrieved from a S. cerevisiae population recovered the true ploidy level in over 96% of samples. Finally, we show that nQuire can be used regionally to identify chromosomal aneuploidies. CONCLUSIONS nQuire provides a statistical framework to study organisms with intraspecific variation in ploidy. nQuire is likely to be useful in epidemiological studies of pathogens, artificial selection experiments, and for historical or ancient samples where intact nuclei are not preserved. It is implemented as a stand-alone Linux command line tool in the C programming language and is available at https://github.com/clwgg/nQuire under the MIT license.
Affiliation(s)
- Clemens L Weiß
- Research Group for Ancient Genomics and Evolution, Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tuebingen, Germany
- Liliana M Cano
- The Sainsbury Laboratory, Norwich, UK
- Department of Plant Pathology, Indian River Research and Education Center, University of Florida, Fort Pierce, USA
- Hernán A Burbano
- Research Group for Ancient Genomics and Evolution, Department of Molecular Biology, Max Planck Institute for Developmental Biology, Tuebingen, Germany.
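The model-selection idea summarized in the nQuire abstract above (Gaussian mixtures over base frequencies at variable sites, with the ploidy chosen by maximum likelihood) can be illustrated with a minimal Python sketch. This is not nQuire's code, which is written in C; it assumes biallelic sites, equal mixture weights, and a fixed standard deviation of 0.05, all of which are illustrative simplifications. The names `PLOIDY_MEANS`, `log_likelihood`, `call_ploidy`, and `simulate` are hypothetical.

```python
import math
import random

# Expected base-frequency peaks at biallelic heterozygous sites for each ploidy.
PLOIDY_MEANS = {
    "diploid": [0.5],
    "triploid": [1 / 3, 2 / 3],
    "tetraploid": [0.25, 0.5, 0.75],
}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def log_likelihood(freqs, means, sd=0.05):
    """Log-likelihood of the frequencies under an equal-weight Gaussian mixture.

    Assumption: the standard deviation and weights are fixed, not estimated.
    """
    weight = 1.0 / len(means)
    total = 0.0
    for x in freqs:
        density = sum(weight * normal_pdf(x, m, sd) for m in means)
        total += math.log(max(density, 1e-300))  # guard against log(0)
    return total


def call_ploidy(freqs, sd=0.05):
    """Return the ploidy model with the highest log-likelihood, plus all scores."""
    scores = {
        name: log_likelihood(freqs, means, sd)
        for name, means in PLOIDY_MEANS.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


def simulate(means, n_sites=300, sd=0.05, seed=1):
    """Draw synthetic base frequencies around the given peaks (toy data only)."""
    rng = random.Random(seed)
    return [
        min(max(rng.gauss(rng.choice(means), sd), 0.01), 0.99)
        for _ in range(n_sites)
    ]


if __name__ == "__main__":
    for name, means in PLOIDY_MEANS.items():
        best, _ = call_ploidy(simulate(means))
        print(f"simulated {name}: called {best}")
```

In real data the frequencies would come from read counts at variable sites, and sequencing depth limits how sharply the peaks separate, which is the dependence on coverage that the abstract reports.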
|
16
|
Augusto Corrêa Dos Santos R, Goldman GH, Riaño-Pachón DM. ploidyNGS: visually exploring ploidy with Next Generation Sequencing data. Bioinformatics 2018; 33:2575-2576. [PMID: 28383704 DOI: 10.1093/bioinformatics/btx204] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 11/11/2016] [Accepted: 04/04/2017] [Indexed: 11/12/2022]
Abstract
Summary ploidyNGS is a model-free, open source tool to visualize and explore ploidy levels in a newly sequenced genome, exploiting short read data. We tested ploidyNGS using both simulated and real NGS data of the model yeast Saccharomyces cerevisiae. ploidyNGS allows the identification of the ploidy level of a newly sequenced genome in a visual way. Availability and Implementation ploidyNGS is available under the GNU General Public License (GPL) at https://github.com/diriano/ploidyNGS. ploidyNGS is implemented in Python and R. Contact diriano@gmail.com.
Affiliation(s)
- Gustavo Henrique Goldman
- Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Preto
- Diego Mauricio Riaño-Pachón
- Centro Nacional de Pesquisa em Energia e Materiais (CNPEM), Campinas; Laboratório de Biologia de Sistemas Regulatórios, Departamento de Bioquímica, Instituto de Química, Universidade de São Paulo, São Paulo, Brazil
|
17
|
West PT, Probst AJ, Grigoriev IV, Thomas BC, Banfield JF. Genome-reconstruction for eukaryotes from complex natural microbial communities. Genome Res 2018; 28:569-580. [PMID: 29496730 PMCID: PMC5880246 DOI: 10.1101/gr.228429.117] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 07/31/2017] [Accepted: 02/27/2018] [Indexed: 11/24/2022]
Abstract
Microbial eukaryotes are integral components of natural microbial communities, and their inclusion is critical for many ecosystem studies, yet the majority of published metagenome analyses ignore eukaryotes. In order to include eukaryotes in environmental studies, we propose a method to recover eukaryotic genomes from complex metagenomic samples. A key step for genome recovery is separation of eukaryotic and prokaryotic fragments. We developed a k-mer-based strategy, EukRep, for eukaryotic sequence identification and applied it to environmental samples to show that it enables genome recovery, genome completeness evaluation, and prediction of metabolic potential. We used this approach to test the effect of addition of organic carbon on a geyser-associated microbial community and detected a substantial change of the community metabolism, with selection against almost all candidate phyla bacteria and archaea and for eukaryotes. Near complete genomes were reconstructed for three fungi placed within the Eurotiomycetes and an arthropod. While carbon fixation and sulfur oxidation were important functions in the geyser community prior to carbon addition, the organic carbon-impacted community showed enrichment for secreted proteases, secreted lipases, cellulose targeting CAZymes, and methanol oxidation. We demonstrate the broader utility of EukRep by reconstructing and evaluating relatively high-quality fungal, protist, and rotifer genomes from complex environmental samples. This approach opens the way for cultivation-independent analyses of whole microbial communities.
Affiliation(s)
- Patrick T West
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA
- Alexander J Probst
- Department of Earth and Planetary Science, University of California, Berkeley, California 94709, USA
- Igor V Grigoriev
- Department of Plant and Microbial Biology, University of California, Berkeley, California 94720, USA; US Department of Energy Joint Genome Institute, Walnut Creek, California 94598, USA
- Brian C Thomas
- Department of Earth and Planetary Science, University of California, Berkeley, California 94709, USA
- Jillian F Banfield
- Department of Earth and Planetary Science, University of California, Berkeley, California 94709, USA; Department of Environmental Science, Policy, and Management, University of California, Berkeley, California 94720, USA; Earth Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
|
18
|
Kyriakidou M, Tai HH, Anglin NL, Ellis D, Strömvik MV. Current Strategies of Polyploid Plant Genome Sequence Assembly. FRONTIERS IN PLANT SCIENCE 2018; 9:1660. [PMID: 30519250 PMCID: PMC6258962 DOI: 10.3389/fpls.2018.01660] [Citation(s) in RCA: 109] [Impact Index Per Article: 18.2] [Received: 04/13/2018] [Accepted: 10/25/2018] [Indexed: 05/14/2023]
Abstract
Polyploidy or duplication of an entire genome occurs in the majority of angiosperms. The understanding of polyploid genomes is important for the improvement of those crops, which humans rely on for sustenance and basic nutrition. As climate change continues to pose a potential threat to agricultural production, there will increasingly be a demand for plant cultivars that can resist biotic and abiotic stresses and also provide needed and improved nutrition. In the past decade, Next Generation Sequencing (NGS) has fundamentally changed the genomics landscape by providing tools for the exploration of polyploid genomes. Here, we review the challenges of the assembly of polyploid plant genomes, and also present recent advances in genomic resources and functional tools in molecular genetics and breeding. As genomes of diploid and less heterozygous progenitor species are increasingly available, we discuss the lack of complexity of these currently available reference genomes as they relate to polyploid crops. Finally, we review recent approaches of haplotyping by phasing and the impact of third generation technologies on polyploid plant genome assembly.
Affiliation(s)
- Maria Kyriakidou
- Department of Plant Science, McGill University, Montreal, QC, Canada
- Helen H. Tai
- Fredericton Research and Development Centre, Agriculture and Agri-Food Canada, Fredericton, NB, Canada
- Martina V. Strömvik
- Department of Plant Science, McGill University, Montreal, QC, Canada
- *Correspondence: Martina V. Strömvik
|
19
|
Gompert Z, Mock KE. Detection of individual ploidy levels with genotyping‐by‐sequencing (GBS) analysis. Mol Ecol Resour 2017; 17:1156-1167. [DOI: 10.1111/1755-0998.12657] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 08/18/2016] [Revised: 01/23/2017] [Accepted: 01/25/2017] [Indexed: 11/29/2022]
Affiliation(s)
- Zachariah Gompert
- Department of Biology and the Ecology Center, Utah State University, 5305 Old Main Hill, Logan, UT 84322‐5305, USA
- Karen E. Mock
- Wildland Resources Department and the Ecology Center, Utah State University, Logan, UT 84322, USA
|
20
|
Balsalobre TWA, da Silva Pereira G, Margarido GRA, Gazaffi R, Barreto FZ, Anoni CO, Cardoso-Silva CB, Costa EA, Mancini MC, Hoffmann HP, de Souza AP, Garcia AAF, Carneiro MS. GBS-based single dosage markers for linkage and QTL mapping allow gene mining for yield-related traits in sugarcane. BMC Genomics 2017; 18:72. [PMID: 28077090 PMCID: PMC5225503 DOI: 10.1186/s12864-016-3383-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Received: 08/31/2016] [Accepted: 12/07/2016] [Indexed: 01/01/2023]
Abstract
BACKGROUND Sugarcane (Saccharum spp.) is predominantly an autopolyploid plant with a variable ploidy level, frequent aneuploidy and a large genome that hampers investigation of its organization. Genetic architecture studies are important for identifying genomic regions associated with traits of interest. However, due to the genetic complexity of sugarcane, the practical applications of genomic tools have been notably delayed in this crop, in contrast to other crops that have already advanced to marker-assisted selection (MAS) and genomic selection. High-throughput next-generation sequencing (NGS) technologies have opened new opportunities for discovering molecular markers, especially single nucleotide polymorphisms (SNPs) and insertion-deletion (indels), at the genome-wide level. The objectives of this study were to (i) establish a pipeline for identifying variants from genotyping-by-sequencing (GBS) data in sugarcane, (ii) construct an integrated genetic map with GBS-based markers plus target region amplification polymorphisms and microsatellites, (iii) detect QTLs related to yield component traits, and (iv) perform annotation of the sequences that originated the associated markers with mapped QTLs to search putative candidate genes. RESULTS We used four pseudo-references to align the GBS reads. Depending on the reference, from 3,433 to 15,906 high-quality markers were discovered, and half of them segregated as single-dose markers (SDMs) on average. In addition to 7,049 non-redundant SDMs from GBS, 629 gel-based markers were used in a subsequent linkage analysis. Of 7,678 SDMs, 993 were mapped. These markers were distributed throughout 223 linkage groups, which were clustered in 18 homo(eo)logous groups (HGs), with a cumulative map length of 3,682.04 cM and an average marker density of 3.70 cM. We performed QTL mapping of four traits and found seven QTLs. Our results suggest the presence of a stable QTL across locations. 
Furthermore, QTLs for soluble solid content (BRIX) and fiber content (FIB) traits had markers linked to putative candidate genes. CONCLUSIONS This study is the first to report the use of GBS for large-scale variant discovery and genotyping of a mapping population in sugarcane, providing several insights regarding the use of NGS data in a polyploid, non-model species. The use of GBS generated a large number of markers and still enabled ploidy and allelic dosage estimation. Moreover, we were able to identify seven QTLs, two of which had great potential for validation and future use for molecular breeding in sugarcane.
Affiliation(s)
- Thiago Willian Almeida Balsalobre
- Departamento de Biotecnologia e Produção Vegetal e Animal, Centro de Ciências Agrárias, Universidade Federal de São Carlos, Rodovia Anhanguera, Km 174, Araras, CEP 13600-970 São Paulo Brazil
- Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Avenida Monteiro Lobato 255, Campinas, CEP 13083-862 São Paulo Brazil
- Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, Avenida Candido Rondon 400, Campinas, CEP 13083-875 São Paulo Brazil
- Guilherme da Silva Pereira
- Departamento de Genética, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Avenida Pádua Dias 11, Piracicaba, CEP 13418-900 São Paulo Brazil
- Gabriel Rodrigues Alves Margarido
- Departamento de Genética, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Avenida Pádua Dias 11, Piracicaba, CEP 13418-900 São Paulo Brazil
- Rodrigo Gazaffi
- Departamento de Biotecnologia e Produção Vegetal e Animal, Centro de Ciências Agrárias, Universidade Federal de São Carlos, Rodovia Anhanguera, Km 174, Araras, CEP 13600-970 São Paulo Brazil
- Fernanda Zatti Barreto
- Departamento de Biotecnologia e Produção Vegetal e Animal, Centro de Ciências Agrárias, Universidade Federal de São Carlos, Rodovia Anhanguera, Km 174, Araras, CEP 13600-970 São Paulo Brazil
- Carina Oliveira Anoni
- Departamento de Genética, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Avenida Pádua Dias 11, Piracicaba, CEP 13418-900 São Paulo Brazil
- Cláudio Benício Cardoso-Silva
- Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Avenida Monteiro Lobato 255, Campinas, CEP 13083-862 São Paulo Brazil
- Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, Avenida Candido Rondon 400, Campinas, CEP 13083-875 São Paulo Brazil
- Estela Araújo Costa
- Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Avenida Monteiro Lobato 255, Campinas, CEP 13083-862 São Paulo Brazil
- Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, Avenida Candido Rondon 400, Campinas, CEP 13083-875 São Paulo Brazil
- Melina Cristina Mancini
- Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Avenida Monteiro Lobato 255, Campinas, CEP 13083-862 São Paulo Brazil
- Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, Avenida Candido Rondon 400, Campinas, CEP 13083-875 São Paulo Brazil
- Hermann Paulo Hoffmann
- Departamento de Biotecnologia e Produção Vegetal e Animal, Centro de Ciências Agrárias, Universidade Federal de São Carlos, Rodovia Anhanguera, Km 174, Araras, CEP 13600-970 São Paulo Brazil
- Anete Pereira de Souza
- Departamento de Biologia Vegetal, Instituto de Biologia, Universidade Estadual de Campinas, Avenida Monteiro Lobato 255, Campinas, CEP 13083-862 São Paulo Brazil
- Centro de Biologia Molecular e Engenharia Genética, Universidade Estadual de Campinas, Avenida Candido Rondon 400, Campinas, CEP 13083-875 São Paulo Brazil
- Antonio Augusto Franco Garcia
- Departamento de Genética, Escola Superior de Agricultura Luiz de Queiroz, Universidade de São Paulo, Avenida Pádua Dias 11, Piracicaba, CEP 13418-900 São Paulo Brazil
- Monalisa Sampaio Carneiro
- Departamento de Biotecnologia e Produção Vegetal e Animal, Centro de Ciências Agrárias, Universidade Federal de São Carlos, Rodovia Anhanguera, Km 174, Araras, CEP 13600-970 São Paulo Brazil
|