1
Aalborg T, Sverrisdóttir E, Kristensen HT, Nielsen KL. The effect of marker types and density on genomic prediction and GWAS of key performance traits in tetraploid potato. FRONTIERS IN PLANT SCIENCE 2024; 15:1340189. [PMID: 38525152 PMCID: PMC10957621 DOI: 10.3389/fpls.2024.1340189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 02/14/2024] [Indexed: 03/26/2024]
Abstract
Genomic prediction and genome-wide association studies are increasingly used to identify QTLs for key performance traits in potato and to support potato breeding through genomic selection. Elite cultivars are tetraploid and highly heterozygous, but they also share many common ancestors and generation-spanning inbreeding events resulting from the clonal propagation of potatoes through seed potatoes. Consequently, many SNP markers are not in a 1:1 relationship with a single allele variant but are shared across several alleles that might exert varying effects on a given trait. The impact of such redundant "diluted" predictors on the statistical models underpinning genome-wide association studies (GWAS) and genomic prediction has scarcely been evaluated, despite its potential effect on model accuracy and performance. We evaluated the impact of marker location, marker type, and marker density on the genomic prediction and GWAS of five key performance traits in tetraploid potato (chipping quality, dry matter content, length/width ratio, senescence, and yield). A 762-offspring panel from a diallel cross of 18 elite cultivars was genotyped by sequencing, and markers were annotated against a reference genome. Genomic prediction models (GBLUP) were trained on four marker subsets [non-synonymous (29,553 SNPs), synonymous (31,229), non-coding (32,388), and a combination], and their robustness to marker reduction was investigated. Single-marker regression GWAS was performed for each trait and marker subset. The best cross-validated prediction correlation coefficients were 0.54, 0.75, 0.49, 0.35, and 0.28 for chipping quality, dry matter content, length/width ratio, senescence, and yield, respectively. Predictive abilities were similar across all marker types; only non-synonymous variants improved the predictive ability for yield, by 16%. The response to marker reduction depended on the trait rather than on the marker type. Traits with high predictive abilities, e.g., dry matter content, reached a plateau with fewer markers than traits with intermediate to low correlations, such as yield. The predictions were unbiased across all traits, marker types, and marker densities >100 SNPs. Our results suggest that using non-synonymous variants does not enhance the performance of genomic prediction for most traits. The major known QTLs were identified by GWAS and were reproducible across exonic and whole-genome variant sets for dry matter content, length/width ratio, and senescence. In contrast, minor QTL detection was marker-type dependent.
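The abstract reports predictive ability as a cross-validated correlation, checks predictions for bias, and runs single-marker regression GWAS. The minimal sketch below illustrates those three quantities in plain Python; it is not the authors' pipeline and does not fit GBLUP itself. Assumptions: genotypes are coded as tetraploid allele dosages 0-4, the GWAS model has no covariates or kinship correction, only a t-statistic (no p-value) is reported, and bias is read, by one common convention, from the slope of observed on predicted values (a slope near 1 suggests unbiased predictions). All data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between observed and predicted values ("predictive ability")."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def single_marker_scan(dosages, phenotype):
    """Fit y = a + b * dosage separately for each marker.

    dosages: one list of allele dosages (0-4, assumed tetraploid coding) per marker.
    Returns (b, t_statistic) per marker, or None for monomorphic markers.
    """
    n = len(phenotype)
    results = []
    for marker in dosages:
        md = mean(marker)
        sxx = sum((d - md) ** 2 for d in marker)
        if sxx == 0:  # no genotypic variation: the regression is undefined
            results.append(None)
            continue
        b = ols_slope(marker, phenotype)
        a = mean(phenotype) - b * md
        rss = sum((y - (a + b * d)) ** 2 for d, y in zip(marker, phenotype))
        se = sqrt(rss / (n - 2) / sxx)
        results.append((b, b / se if se > 0 else float("inf")))
    return results


# Hypothetical toy data (not from the study).
phenotype = [1.0, 2.1, 2.9, 4.2, 5.0, 0.9, 2.0, 3.1]
markers = [[0, 1, 2, 3, 4, 0, 1, 2], [1] * 8]
scan = single_marker_scan(markers, phenotype)

observed = [2.0, 4.0, 6.0, 8.0]
predicted = [1.0, 2.0, 3.0, 4.0]
ability = pearson_r(observed, predicted)
bias_slope = ols_slope(predicted, observed)  # 1.0 would indicate no bias
```

Here the perfectly correlated but inflated predictions give an ability of 1.0 yet a bias slope of 2.0, which is why the two checks are reported separately.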
Affiliation(s)
- Trine Aalborg
- Department of Chemistry and Bioscience, Aalborg University, Aalborg, Denmark
2
Hartig N, Seibt KM, Heitkam T. How to start a LINE: 5' switching rejuvenates LINE retrotransposons in tobacco and related Nicotiana species. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2023. [PMID: 36965091 DOI: 10.1111/tpj.16208] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2022] [Revised: 02/10/2023] [Accepted: 02/19/2023] [Indexed: 06/18/2023]
Abstract
In contrast to their conserved mammalian counterparts, plant long interspersed nuclear elements (LINEs) are highly variable and split into many low-copy families. Curiously, LINE families from the retrotransposable element (RTE) clade retain stronger sequence conservation and hence reach higher copy numbers. The cause of this RTE-typical property is not yet understood, but understanding it would help clarify why some transposable elements are removed quickly, whereas others persist in plant genomes. Here, we present a detailed study of RTE LINE structure, diversity and evolution in plants. We argue that the nightshade family is the ideal taxon in which to follow the evolutionary trajectories of RTE LINEs, given their high abundance, recent activity and partnership with non-autonomous elements. Using bioinformatic, cytogenetic and molecular approaches, we detect 4029 full-length RTE LINEs across the Solanaceae. We finely characterize and manually curate a core group of 458 full-length LINEs in allotetraploid tobacco, show an integration event after polyploidization and trace hybridization through the RTE LINE composition of the parental genomes. Finally, we reveal the role of the untranslated regions (UTRs) as the cause of the unique RTE LINE amplification and evolution pattern in plants. On the one hand, we detected a highly conserved motif in the 3' UTR, suggesting strong selective constraints acting on the RTE terminus. On the other hand, we observed successive rounds of 5' UTR cycling, which constantly rejuvenate the promoter sequences. This interplay between exchangeable promoters and a conserved LINE body and 3' UTR likely allows RTE LINEs to persist and thrive in plant genomes.
Affiliation(s)
- Nora Hartig
- Faculty of Botany, Technische Universität Dresden, 01069, Dresden, Germany
- Kathrin M Seibt
- Faculty of Botany, Technische Universität Dresden, 01069, Dresden, Germany
- Tony Heitkam
- Faculty of Botany, Technische Universität Dresden, 01069, Dresden, Germany
3
DeepPlnc: Bi-modal deep learning for highly accurate plant lncRNA discovery. Genomics 2022; 114:110443. [PMID: 35931273 DOI: 10.1016/j.ygeno.2022.110443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 06/27/2022] [Accepted: 07/29/2022] [Indexed: 11/24/2022]
Abstract
We present DeepPlnc, a bi-modal, CNN-based deep-learning system that identifies plant lncRNAs with high accuracy using both sequence and structural properties. Unlike most existing software, it works accurately even when sequence boundaries are ambiguous or sequences are incomplete. It scored consistently high on performance metrics, exceeding 98% accuracy when tested across a large number of validated instances. In multiple benchmarks it consistently outperformed all the compared tools, maintaining a highly significant lead of 2.5-4.6% over the second-best tool (p-value << 0.01). DeepPlnc was also used to annotate a de novo assembled transcriptome of a Himalayan species, where it again proved better suited for genome and transcriptome annotation than the existing tools. DeepPlnc is freely available as a web server and stand-alone program at https://scbb.ihbt.res.in/DeepPlnc/.
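Benchmark claims such as ">98% accuracy" are normally derived from a confusion matrix of true and false positives and negatives. The sketch below computes accuracy together with a few metrics commonly reported alongside it (sensitivity, specificity and the Matthews correlation coefficient). Which metrics beyond accuracy DeepPlnc was evaluated on is not stated in this abstract, and the counts in the example are hypothetical.

```python
from math import sqrt


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    Positives are assumed to be lncRNAs and negatives non-lncRNA transcripts.
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical counts for 1,000 test transcripts (500 lncRNAs, 500 others).
metrics = classification_metrics(tp=490, fp=5, tn=495, fn=10)
```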
4
Gantuz M, Morales A, Bertoldi MV, Ibañez VN, Duarte PF, Marfil CF, Masuelli RW. Hybridization and polyploidization effects on LTR-retrotransposon activation in potato genome. JOURNAL OF PLANT RESEARCH 2022; 135:81-92. [PMID: 34674075 DOI: 10.1007/s10265-021-01354-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 06/06/2021] [Accepted: 10/13/2021] [Indexed: 06/13/2023]
Abstract
Hybridization and polyploidization are major forces in plant evolution, and potato is no exception. It has been proposed that the proliferation of long terminal repeat retrotransposons (LTR-RTs) is related to genome reorganization caused by hybridization and/or polyploidization. The main purpose of the present work was to evaluate the effect of interspecific hybridization and polyploidization on the activation of LTR-RTs. Using the S-SAP (sequence-specific amplified polymorphism) technique and normalized copy number determination by qPCR, we evaluated the proliferation of putatively active LTR-RTs in a diploid hybrid between the cultivated potato Solanum tuberosum and the wild diploid potato species S. kurtzianum, in allotetraploid lines derived from this interspecific hybrid, and in S. kurtzianum autotetraploid lines (ktz-autotetraploid). Twenty-nine LTR-RT copies were activated in the hybrid and were present in the allotetraploid lines. Major LTR-RT activity was detected in Copia-27, Copia-12, Copia-14 and Gypsy-22. According to our results, LTR-RT copies were activated principally in the hybrid; there was no activation in the allotetraploid lines, and only one copy was activated in the autotetraploid.
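The abstract mentions normalized copy number determination by qPCR without giving the formula. One common approach, assumed here rather than taken from the paper, is the comparative 2^-ΔΔCt method: the Ct of the target (an LTR-RT family) is normalized to a single-copy reference gene and compared with a calibrator sample, for example a parental genotype. The function name, the calibrator choice and the assumption of 100% amplification efficiency (a base of 2) are illustrative.

```python
def relative_copy_number(ct_target, ct_ref, ct_target_cal, ct_ref_cal, base=2.0):
    """Comparative Ct (2^-ΔΔCt) estimate of target copy number relative to a calibrator.

    ct_target, ct_ref: Ct values of the LTR-RT target and a single-copy reference gene
    in the sample of interest (e.g. a hybrid); *_cal: the same for the calibrator.
    base=2.0 assumes perfect doubling of product in every PCR cycle.
    """
    delta_sample = ct_target - ct_ref
    delta_cal = ct_target_cal - ct_ref_cal
    return base ** -(delta_sample - delta_cal)


# Hypothetical Ct values: the target amplifies one cycle earlier in the hybrid,
# so its copy number is estimated at about twice that of the calibrator.
fold = relative_copy_number(ct_target=20.0, ct_ref=25.0, ct_target_cal=21.0, ct_ref_cal=25.0)
```

Because Ct values scale with the logarithm of starting template, each cycle of difference corresponds to roughly a two-fold change in copy number under this efficiency assumption.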
Affiliation(s)
- Magdalena Gantuz
- Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza, Consejo Nacional de Investigaciones Científicas y Técnicas (IBAM-CONICET), Universidad Nacional de Cuyo, A. Brown 500 (M5528AHB) Chacras de Coria, Mendoza, Argentina
- Andrés Morales
- Instituto Nacional de Tecnología Agropecuaria (INTA), Luján de Cuyo, Mendoza, Argentina
- María Victoria Bertoldi
- Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza, Consejo Nacional de Investigaciones Científicas y Técnicas (IBAM-CONICET), Universidad Nacional de Cuyo, A. Brown 500 (M5528AHB) Chacras de Coria, Mendoza, Argentina
- Verónica Noé Ibañez
- Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza, Consejo Nacional de Investigaciones Científicas y Técnicas (IBAM-CONICET), Universidad Nacional de Cuyo, A. Brown 500 (M5528AHB) Chacras de Coria, Mendoza, Argentina
- Paola Fernanda Duarte
- Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza, Consejo Nacional de Investigaciones Científicas y Técnicas (IBAM-CONICET), Universidad Nacional de Cuyo, A. Brown 500 (M5528AHB) Chacras de Coria, Mendoza, Argentina
- Carlos Federico Marfil
- Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza, Consejo Nacional de Investigaciones Científicas y Técnicas (IBAM-CONICET), Universidad Nacional de Cuyo, A. Brown 500 (M5528AHB) Chacras de Coria, Mendoza, Argentina
- Ricardo Williams Masuelli
- Facultad de Ciencias Agrarias, Instituto de Biología Agrícola de Mendoza, Consejo Nacional de Investigaciones Científicas y Técnicas (IBAM-CONICET), Universidad Nacional de Cuyo, A. Brown 500 (M5528AHB) Chacras de Coria, Mendoza, Argentina
5
Fuentes RR, de Ridder D, van Dijk ADJ, Peters SA. Domestication shapes recombination patterns in tomato. Mol Biol Evol 2021; 39:6379725. [PMID: 34597400 PMCID: PMC8763028 DOI: 10.1093/molbev/msab287] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 12/15/2022]
Abstract
Meiotic recombination is a biological process of key importance in breeding, as it generates genetic diversity and enables the development of novel or agronomically relevant haplotypes. In crop tomato, recombination is curtailed, as manifested by linkage disequilibrium that decays over a longer distance and by reduced diversity compared with wild relatives. Here, we compared domesticated and wild populations of tomato and found an overall conserved recombination landscape, with local changes in effective recombination rate in specific genomic regions. We also studied the dynamics of recombination hotspots resulting from domestication and found that loss of such hotspots is associated with selective sweeps, most notably in the pericentromeric heterochromatin. We detected footprints of genetic changes and structural variants, including some associated with transposable elements, that are linked with hotspot divergence during domestication; these likely cause fine-scale alterations to recombination patterns and result in linkage drag.
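Linkage disequilibrium decay is typically summarized as r² between pairs of markers plotted against their physical distance. As an illustration only (the effective recombination rates in this study rely on more elaborate population-genetic models), the sketch below computes r² for two biallelic loci from phased haplotypes; the input format, with alleles coded 0/1 as tuples, is an assumption.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (allele_locus1, allele_locus2) tuples, alleles coded 0/1,
    one tuple per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n  # joint haplotype frequency
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either locus is monomorphic; 0.0 is returned in that case.
    return d * d / denom if denom else 0.0


# Complete LD versus independent loci (hypothetical haplotypes).
r2_linked = ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)])
r2_unlinked = ld_r2([(0, 0), (0, 1), (1, 0), (1, 1)])
```

Computing this value for many marker pairs and averaging it in distance bins gives the decay curve; slower decay, as reported for crop tomato, means r² stays high over longer distances.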
Affiliation(s)
- Roven Rommel Fuentes
- Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, Wageningen, 6708 PB The Netherlands
- Dick de Ridder
- Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, Wageningen, 6708 PB The Netherlands
- Aalt D J van Dijk
- Bioinformatics Group, Wageningen University and Research, Droevendaalsesteeg 1, Wageningen, 6708 PB The Netherlands
- Sander A Peters
- Applied Bioinformatics, Wageningen Plant Research, Wageningen University and Research, Droevendaalsesteeg 1, Wageningen, 6708 PB, The Netherlands
6
Zavallo D, Crescente JM, Gantuz M, Leone M, Vanzetti LS, Masuelli RW, Asurmendi S. Genomic re-assessment of the transposable element landscape of the potato genome. PLANT CELL REPORTS 2020; 39:1161-1174. [PMID: 32435866 DOI: 10.1007/s00299-020-02554-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/21/2020] [Accepted: 05/07/2020] [Indexed: 05/14/2023]
Abstract
We provide a comprehensive and reliable potato TE landscape, based on a wide variety of identification tools and integrative approaches, producing clear and ready-to-use outputs for the scientific community. Transposable elements (TEs) are DNA sequences with the ability to self-replicate and move throughout the host genome. TEs are major drivers of stress response and genome evolution. Given their significance, the development of clear and efficient TE annotation pipelines has become essential for many species. The latest de novo TE discovery tools, along with available TEs from Repbase and sRNA-seq data, allowed us to perform reliable detection, classification and annotation of potato TEs through an open-source and freely available pipeline (https://github.com/DiegoZavallo/TE_Discovery). Using a variety of tools, approaches and rules, we were able to provide a clearly annotated landscape of characterized TEs. Additionally, we described the distribution of the different types of TEs across the genome, where LTRs and MITEs show a clear clustering pattern in pericentromeric and subtelomeric/telomeric regions, respectively. Finally, we analyzed the insertion age and distribution of LTR retrotransposon families, which display a distinct pattern between the two major superfamilies: older Gypsy elements were concentrated around heterochromatic regions, whereas younger Copia elements were located predominantly in euchromatic regions. Overall, we delivered not only reliable, ready-to-use potato TE annotation files, but also all the steps necessary to perform de novo detection in other species.
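The abstract reports insertion ages for LTR retrotransposon families without giving the formula. A widely used approach, assumed here, dates an insertion from the divergence between its two LTRs, which are identical at the moment of insertion: T = K / 2r, where K is the Kimura two-parameter distance between the aligned LTRs and r a per-site, per-year substitution rate. The default rate of 1.3e-8 below is a value often applied to plant genomes and is an assumption, not a figure from this paper; the function names are hypothetical.

```python
from math import log

PURINES = set("AG")


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned sequences of equal length.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if (a in PURINES) == (b in PURINES):
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * log(1 - 2 * p - q) - 0.25 * log(1 - 2 * q)


def ltr_insertion_age(ltr5, ltr3, rate=1.3e-8):
    """Insertion age in years, T = K / (2 * rate); rate is an assumed substitution rate."""
    return k2p_distance(ltr5, ltr3) / (2 * rate)


# Hypothetical 100-bp LTR pair differing by a single A->G transition.
ltr = "ACGT" * 25
mutated = "G" + ltr[1:]
age = ltr_insertion_age(ltr, mutated)
```

The factor of 2 reflects that both LTR copies accumulate substitutions independently after insertion; the resulting age scales inversely with the assumed rate.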
Affiliation(s)
- Diego Zavallo
- Instituto de Agrobiotecnología y Biología Molecular (IABIMO), Instituto Nacional de Tecnología Agropecuaria (INTA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Los Reseros y Nicolas Repeto, Hurlingham, Argentina
- Juan Manuel Crescente
- Grupo Biotecnología y Recursos Genéticos, EEA INTA Marcos Juárez, Ruta 12 Km 3, 2580, Marcos Juárez, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Magdalena Gantuz
- Instituto de Biología Agrícola de Mendoza (IBAM), Facultad de Ciencias Agrarias (FCA), CONICET-UNCuyo, Almirante Brown 500, M5528AHB, Chacras de Coria, Mendoza, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Melisa Leone
- Instituto de Agrobiotecnología y Biología Molecular (IABIMO), Instituto Nacional de Tecnología Agropecuaria (INTA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Los Reseros y Nicolas Repeto, Hurlingham, Argentina
- Agencia Nacional de Promoción Científica y Tecnológica (ANPCyT), Buenos Aires, Argentina
- Leonardo Sebastian Vanzetti
- Grupo Biotecnología y Recursos Genéticos, EEA INTA Marcos Juárez, Ruta 12 Km 3, 2580, Marcos Juárez, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Ricardo Williams Masuelli
- Instituto de Biología Agrícola de Mendoza (IBAM), Facultad de Ciencias Agrarias (FCA), CONICET-UNCuyo, Almirante Brown 500, M5528AHB, Chacras de Coria, Mendoza, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Sebastian Asurmendi
- Instituto de Agrobiotecnología y Biología Molecular (IABIMO), Instituto Nacional de Tecnología Agropecuaria (INTA), Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Los Reseros y Nicolas Repeto, Hurlingham, Argentina
7
Tunjić Cvitanić M, Vojvoda Zeljko T, Pasantes JJ, García-Souto D, Gržan T, Despot-Slade E, Plohl M, Šatović E. Sequence Composition Underlying Centromeric and Heterochromatic Genome Compartments of the Pacific Oyster Crassostrea gigas. Genes (Basel) 2020; 11:genes11060695. [PMID: 32599860 PMCID: PMC7348941 DOI: 10.3390/genes11060695] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 05/21/2020] [Revised: 06/10/2020] [Accepted: 06/22/2020] [Indexed: 02/07/2023]
Abstract
Genome segments enriched in repetitive sequences still present a challenge and are omitted from genome assemblies. For that reason, the exact composition of the DNA sequences underlying heterochromatic regions and active centromeres remains unexplored for many organisms. The centromere is a crucial region of eukaryotic chromosomes, responsible for the accurate segregation of genetic material. The typical landmark of centromeric chromatin is the rapidly evolving histone H3 variant CenH3, while DNA sequences packed in constitutive heterochromatin are associated with H3K9me3-modified histones. In the Pacific oyster Crassostrea gigas, we identified its centromeric histone variant, Cg-CenH3, which shows stage-specific distribution in gonadal cells. To investigate the DNA composition of genomic regions associated with these two specific chromatin types, we employed chromatin immunoprecipitation followed by high-throughput next-generation sequencing of Cg-CenH3- and H3K9me3-associated sequences. CenH3-associated sequences were assigned to six groups of repetitive elements, whereas H3K9me3-associated ones were assigned to only three. The CenH3-associated sequences indicate a lack of uniformity in the chromosomal distribution of the sequences that build the centromeres; at the same time, these sequences are also dispersed throughout the genome. As predicted, the heterochromatin of C. gigas exhibited general paucity and limited chromosomal localization, with H3K9me3-associated sequences predominantly consisting of DNA transposons.
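The abstract assigns ChIP-seq reads to groups of repetitive elements. A simple way to express how strongly each group is associated with a chromatin mark, assumed here and not necessarily the authors' procedure, is the ratio of its library-size-normalized read share in the ChIP sample to its share in an input (non-immunoprecipitated) control; ratios above 1 suggest enrichment. The function name, repeat-group names and read counts are hypothetical.

```python
def chip_enrichment(chip_counts, input_counts):
    """Per-repeat-group enrichment: (ChIP share of reads) / (input share of reads).

    chip_counts, input_counts: dicts mapping repeat group -> number of assigned reads.
    Groups with no input reads are skipped to avoid division by zero.
    """
    chip_total = sum(chip_counts.values())
    input_total = sum(input_counts.values())
    ratios = {}
    for group, chip_reads in chip_counts.items():
        input_reads = input_counts.get(group, 0)
        if input_reads == 0:
            continue
        # Cross-multiplied form of (chip_reads / chip_total) / (input_reads / input_total).
        ratios[group] = (chip_reads * input_total) / (input_reads * chip_total)
    return ratios


# Hypothetical read counts for a CenH3 ChIP library and its input control.
ratios = chip_enrichment(
    {"satellite_1": 600, "dna_transposon": 300, "other": 100},
    {"satellite_1": 100, "dna_transposon": 300, "other": 600},
)
```

Normalizing by library size matters because ChIP and input libraries are rarely sequenced to the same depth.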
Affiliation(s)
- Monika Tunjić Cvitanić
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
- Tanja Vojvoda Zeljko
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
- Juan J. Pasantes
- Departamento de Bioquímica, Xenética e Inmunoloxía, Centro de Investigación Mariña (CIM), Universidade de Vigo, 36310 Vigo, Spain
- Daniel García-Souto
- Departamento de Bioquímica, Xenética e Inmunoloxía, Centro de Investigación Mariña (CIM), Universidade de Vigo, 36310 Vigo, Spain
- Department of Zoology, Genetics and Physical Anthropology, Universidade de Santiago de Compostela, Praza do Obradoiro, 0, 15705 Santiago de Compostela, Spain
- Cancer, Ageing and Somatic Mutation, Wellcome Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK
- Tena Gržan
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
- Evelin Despot-Slade
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
- Miroslav Plohl
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
- Corresponding author
- Eva Šatović
- Division of Molecular Biology, Ruđer Bošković Institute, Bijenička cesta 54, 10000 Zagreb, Croatia
- Corresponding author
8
Wang Z, Baulcombe DC. Transposon age and non-CG methylation. Nat Commun 2020; 11:1221. [PMID: 32144266 PMCID: PMC7060349 DOI: 10.1038/s41467-020-14995-6] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 01/28/2020] [Accepted: 02/11/2020] [Indexed: 11/09/2022]
Abstract
Silencing of transposable elements (TEs) is established by small RNA-directed DNA methylation (RdDM). Maintenance of silencing is then based on a combination of RdDM and RNA-independent mechanisms involving DNA methyltransferase MET1 and chromodomain DNA methyltransferases (CMTs). According to this model, the involvement of RdDM should decrease with TE age, but here we show a different pattern in tomato and Arabidopsis. In these species, the CMTs silence long terminal repeat (LTR) transposons in the distal chromatin that are younger than those affected by RdDM. To account for these findings, we propose that, after establishment of primary RdDM as in the original model, there is an RNA-independent maintenance phase involving CMTs, followed by secondary RdDM. This progression of epigenetic silencing in the gene-rich distal chromatin is likely to influence the transcriptome either in cis or in trans, depending on whether the mechanisms are RNA-dependent or -independent. RNA-directed DNA methylation (RdDM) is thought to silence newly inserted transposable elements (TEs), with RNA-independent mechanisms becoming more prominent as TEs age. Here, the authors show that RdDM continues to silence the oldest intact distal TEs in tomato and Arabidopsis, suggesting a second, later phase of RdDM.
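The distinction between CG and non-CG methylation rests on the sequence context of each cytosine: CG, CHG or CHH, where H is A, C or T. The sketch below classifies the cytosines on one strand of a sequence. The function name is hypothetical; only the given strand is scanned (cytosines on the opposite strand would require scanning the reverse complement), and cytosines whose context runs off the end of the sequence or contains an ambiguous base are skipped.

```python
def cytosine_contexts(seq):
    """Return (position, context) for each classifiable C on the given strand."""
    seq = seq.upper()
    contexts = []
    for i, base in enumerate(seq):
        if base != "C" or i + 1 >= len(seq):
            continue
        if seq[i + 1] == "G":
            contexts.append((i, "CG"))
        elif seq[i + 1] in "ACT" and i + 2 < len(seq) and seq[i + 2] in "ACGT":
            # Non-CG contexts: CHG if the base two positions downstream is G, else CHH.
            contexts.append((i, "CHG" if seq[i + 2] == "G" else "CHH"))
    return contexts


example = cytosine_contexts("CGACAGCTT")  # one cytosine of each context
```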
Affiliation(s)
- Zhengming Wang
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
- David C Baulcombe
- Department of Plant Sciences, University of Cambridge, Cambridge, CB2 3EA, UK
9
Grehl C, Wagner M, Lemnian I, Glaser B, Grosse I. Performance of Mapping Approaches for Whole-Genome Bisulfite Sequencing Data in Crop Plants. FRONTIERS IN PLANT SCIENCE 2020; 11:176. [PMID: 32256504 PMCID: PMC7093021 DOI: 10.3389/fpls.2020.00176] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 10/12/2019] [Accepted: 02/05/2020] [Indexed: 05/09/2023]
Abstract
DNA methylation is involved in many different biological processes in the development and well-being of crop plants, such as transposon activation, heterosis, environment-dependent transcriptome plasticity, aging, and many diseases. Whole-genome bisulfite sequencing is an excellent technology for detecting and quantifying DNA methylation patterns in a wide variety of species, but optimized data analysis pipelines exist only for a small number of species and are missing for many important crop plants. This is especially important because most existing benchmark studies have been performed on mammals with hardly any repetitive elements and without CHG and CHH methylation. Pipelines for the analysis of whole-genome bisulfite sequencing data usually consist of four steps: read trimming, read mapping, quantification of methylation levels, and prediction of differentially methylated regions (DMRs). Here we focus on read mapping, which is challenging because unmethylated cytosines are converted to uracil during bisulfite treatment and to thymine during the subsequent polymerase chain reaction, so read mappers must be capable of dealing with this cytosine/thymine polymorphism. Several read mappers with different strengths and weaknesses have been developed in recent years, but their performance has not been critically evaluated. Here, we compare eight read mappers (Bismark, BismarkBwt2, BSMAP, BS-Seeker2, Bwameth, GEM3, Segemehl, and GSNAP) to assess the impact of the read-mapping results on the prediction of DMRs. We used simulated data generated from the genomes of Arabidopsis thaliana, Brassica napus, Glycine max, Solanum tuberosum, and Zea mays; monitored the effects of the bisulfite conversion rate, the sequencing error rate, the maximum number of allowed mismatches, and the genome structure and size; and calculated precision, number of uniquely mapped reads, distribution of the mapped reads, run time, and memory consumption as features for benchmarking the eight read mappers. Furthermore, we validated our findings using real-world data of Glycine max and showed the influence of the mapping step on DMR calling in WGBS pipelines. We found that the conversion rate had only a minor impact on the mapping quality and the number of uniquely mapped reads, whereas the error rate and the maximum number of allowed mismatches had a strong impact and led to differences in the performance of the eight read mappers. In conclusion, we recommend BSMAP, which needs the shortest run time and yields the highest precision, and Bismark, which requires the smallest amount of memory and combines precision with high numbers of uniquely mapped reads.
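The cytosine/thymine problem described above is often handled by comparing sequences in a reduced three-letter alphabet: every C is converted to T in both the read and the reference before matching, and methylation is then read from the original bases. The toy sketch below applies this idea to the top strand only, with exact matching and a brute-force scan. It illustrates the principle and is not the algorithm of any of the eight benchmarked mappers, which also handle the G-to-A converted strand, mismatches and genome-scale indexing. All function names are hypothetical.

```python
def c_to_t(seq):
    """In silico bisulfite conversion of the top strand: every C becomes T."""
    return seq.upper().replace("C", "T")


def map_read(read, reference):
    """Start positions where the read matches the reference in C-to-T converted space."""
    conv_read, conv_ref = c_to_t(read), c_to_t(reference)
    k = len(conv_read)
    return [i for i in range(len(conv_ref) - k + 1) if conv_ref[i:i + k] == conv_read]


def methylation_calls(read, ref_window):
    """At each reference C: read C = methylated (protected), read T = unmethylated."""
    calls = []
    for i, (r, g) in enumerate(zip(read.upper(), ref_window.upper())):
        if g == "C" and r in "CT":
            calls.append((i, r == "C"))
    return calls


def methylation_level(calls):
    """Fraction of methylated calls, or None if there are no informative cytosines."""
    return sum(m for _, m in calls) / len(calls) if calls else None


reference = "TTACGCATGCAA"
read = "ACGTAT"  # from position 2: first C protected (methylated), second C converted to T
hits = map_read(read, reference)
calls = methylation_calls(read, reference[hits[0]:hits[0] + len(read)])
level = methylation_level(calls)
```

A plain aligner would count the converted C as a mismatch; the converted comparison removes that penalty at the cost of reduced sequence complexity, which is one reason repetitive crop genomes make bisulfite mapping harder.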
Affiliation(s)
- Claudius Grehl
- Institute of Computer Science, Bioinformatics, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 1, Halle (Saale), Germany
- Institute of Agronomy and Nutritional Sciences, Soil Biogeochemistry, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 3, Halle (Saale), Germany
- Corresponding author
- Marc Wagner
- Institute of Mathematics and Informatics, Freie Universität Berlin, Berlin, Germany
- Ioana Lemnian
- Institute of Computer Science, Bioinformatics, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 1, Halle (Saale), Germany
- Institute of Human Genetics, Martin Luther University Halle-Wittenberg, Halle (Saale), Germany
- Bruno Glaser
- Institute of Agronomy and Nutritional Sciences, Soil Biogeochemistry, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 3, Halle (Saale), Germany
- Ivo Grosse
- Institute of Computer Science, Bioinformatics, Martin Luther University Halle–Wittenberg, Von Seckendorff-Platz 1, Halle (Saale), Germany
- Bioinformatics Unit, German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany
10
Gao D, Chu Y, Xia H, Xu C, Heyduk K, Abernathy B, Ozias-Akins P, Leebens-Mack JH, Jackson SA. Horizontal Transfer of Non-LTR Retrotransposons from Arthropods to Flowering Plants. Mol Biol Evol 2019; 35:354-364. [PMID: 29069493 PMCID: PMC5850137 DOI: 10.1093/molbev/msx275] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Indexed: 12/15/2022]
Abstract
Although lateral movements of transposons across families and even phyla within multicellular eukaryotic kingdoms have been found, little is known about transposon transfer between the kingdoms Animalia and Plantae. We discovered a novel non-LTR retrotransposon, AdLINE3, in a wild peanut species. Sequence comparisons and phylogenetic analyses indicated that AdLINE3 is a member of the RTE clade, which was originally identified in a nematode and is rarely reported in plants. We identified RTE elements in 82 plants, spanning angiosperms to algae, including recently active elements in some flowering plants. RTE elements in flowering plants were likely derived from a single family, which we refer to as An-RTE. Interestingly, An-RTEs show significant DNA sequence identity with non-LTR retroelements from 42 animals belonging to four phyla. Moreover, the sequence identity of RTEs between two arthropods and two plants was higher than that of homologous genes. Phylogenetic and evolutionary analyses of RTEs from both animals and plants suggest that the An-RTE family was likely transferred horizontally into angiosperms from an ancient aphid(s) or ancestral arthropod(s). Notably, some An-RTEs were recruited as coding sequences of functional genes participating in metabolic or other biochemical processes in plants. This is the first potential example of horizontal transfer of transposons between animals and flowering plants. Our findings help to understand exchanges of genetic material between the kingdoms Animalia and Plantae and suggest that arthropods likely impacted plant genome evolution.
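Comparisons such as "the sequence identity of RTEs ... was higher than that of homologous genes" depend on how identity is computed from an alignment. One simple definition, assumed here (others divide by the full alignment length or by the length of the shorter sequence), counts identical positions among aligned columns that contain no gap. The function name and sequences are hypothetical.

```python
def percent_identity(aln1, aln2, gap="-"):
    """Percent identity over gap-free columns of two pre-aligned sequences."""
    if len(aln1) != len(aln2):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aln1.upper(), aln2.upper()) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(a == b for a, b in columns)
    return 100.0 * identical / len(columns)


identity = percent_identity("ACGT-A", "ACGTTG")  # 4 identical of 5 gap-free columns -> 80.0
```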
Affiliation(s)
- Dongying Gao
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA
- Ye Chu
- Department of Horticulture, University of Georgia, Tifton, GA
- Han Xia
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA
- Biotechnology Research Center, Shandong Academy of Agricultural Sciences, Jinan, Shandong, China
- Chunming Xu
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA
- Karolina Heyduk
- Department of Plant Biology, University of Georgia, Athens, GA
- Brian Abernathy
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA
- Scott A Jackson
- Center for Applied Genetic Technologies, University of Georgia, Athens, GA
11
Bastien M, Boudhrioua C, Fortin G, Belzile F. Exploring the potential and limitations of genotyping-by-sequencing for SNP discovery and genotyping in tetraploid potato. Genome 2018; 61:449-456. [PMID: 29688035 DOI: 10.1139/gen-2017-0236] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Indexed: 11/22/2022]
Abstract
Genotyping-by-sequencing (GBS) potentially offers a cost-effective alternative for SNP discovery and genotyping. Here, we report an exploration of GBS in tetraploid potato. Both ApeKI and PstI/MspI enzymes were used for library preparation on eight diverse potato genotypes. ApeKI yielded more markers than PstI/MspI but provided lower read coverage per marker, resulting in more missing data and limiting effective genotyping to the diploid mode. We then assessed the accuracy of these SNPs by comparison with SolCAP data (5824 data points in diploid mode and 3243 data points in tetraploid mode) and found that the match rates between genotype calls were 90.4% and 81.3%, respectively. Imputation of missing data did not prove very accurate because of incomplete haplotype discovery, suggesting caution in setting the allowance for missing data. To further assess the quality of GBS-derived data, a genome-wide association analysis was performed for flower color on 318 clones (with ApeKI). A strong association signal was obtained on chromosome 2, with the most significant SNP located in the middle of the dihydroflavonol 4-reductase (DFR) gene. We conclude that an appropriate choice of enzyme for GBS library preparation makes it possible to obtain high-quality SNPs in potato and will be helpful for marker-assisted genomics.
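The match rates above compare genotype calls from two platforms at shared data points. The sketch below shows one way such a concordance could be computed, skipping points that are missing on either side, together with a hypothetical helper that collapses tetraploid allele dosages (0-4) into diploid-style classes (0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternative). The dosage coding, the collapsing rule and the call vectors are assumptions for illustration, not the procedure of the study.

```python
def to_diploid_mode(dosage):
    """Collapse a tetraploid dosage (0-4) into 0/1/2; None (missing) stays None."""
    if dosage is None:
        return None
    if dosage == 0:
        return 0
    if dosage == 4:
        return 2
    return 1  # simplex, duplex and triplex dosages are all heterozygous


def match_rate(calls_a, calls_b):
    """Percentage of identical calls among data points genotyped on both platforms."""
    shared = [(a, b) for a, b in zip(calls_a, calls_b) if a is not None and b is not None]
    if not shared:
        return None
    return 100.0 * sum(a == b for a, b in shared) / len(shared)


gbs_calls = [0, 1, 2, None, 4]    # hypothetical GBS dosages (None = missing)
array_calls = [0, 1, 3, 2, 4]     # hypothetical SNP-array dosages
tetraploid_rate = match_rate(gbs_calls, array_calls)
diploid_rate = match_rate([to_diploid_mode(d) for d in gbs_calls],
                          [to_diploid_mode(d) for d in array_calls])
```

In this toy case the duplex/triplex disagreement counts as a mismatch in tetraploid mode but disappears once both calls are collapsed to "heterozygous", which is one reason concordance tends to be higher in diploid mode.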
Affiliation(s)
- Maxime Bastien
- Département de phytologie and Institut de biologie intégrative et des systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- Chiheb Boudhrioua
- Département de phytologie and Institut de biologie intégrative et des systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- Gabrielle Fortin
- Département de phytologie and Institut de biologie intégrative et des systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
- François Belzile
- Département de phytologie and Institut de biologie intégrative et des systèmes (IBIS), Université Laval, Quebec City, QC G1V 0A6, Canada
12
Xiong W, Dooner HK, Du C. Rolling-circle amplification of centromeric Helitrons in plant genomes. THE PLANT JOURNAL : FOR CELL AND MOLECULAR BIOLOGY 2016; 88:1038-1045. [PMID: 27553634 DOI: 10.1111/tpj.13314] [Received: 06/30/2016] [Revised: 08/22/2016] [Accepted: 08/23/2016]
Abstract
The unusual eukaryotic Helitron transposons can readily capture host sequences and are, thus, evolutionarily important. They are presumed to amplify by rolling-circle replication (RCR) because some elements encode predicted proteins homologous to prokaryotic RCR transposases. In support of this replication mechanism, it was recently shown that transposition of a bat Helitron generates covalently closed circular intermediates. Another strong prediction is that RCR should generate tandem Helitron concatemers, yet almost all Helitrons identified to date occur as solo elements in the genome. To investigate alternative modes of Helitron organization in present-day genomes, we applied the novel computational tool HelitronScanner to 27 plant genomes and uncovered numerous tandem arrays of partially decayed, truncated Helitrons in all of them. Strikingly, most of these Helitron tandem arrays are interspersed with other repeats in centromeres. Many of these arrays have multiple Helitron 5' ends but a single 3' end. The number of repeats in any one array can range from a handful to several hundred. We propose here an RCR model that conforms to the present Helitron landscape of plant genomes. Our study provides strong evidence that plant Helitrons amplify by RCR and that the tandemly arrayed replication products accumulate mostly in centromeres.
Affiliation(s)
- Wenwei Xiong
- Department of Biology, Montclair State University, Montclair, NJ, 07043, USA
- Hugo K Dooner
- Waksman Institute, Rutgers, the State University of New Jersey, Piscataway, NJ, 08854, USA
- Department of Plant Biology, Rutgers, the State University of New Jersey, New Brunswick, NJ, 08901, USA
- Chunguang Du
- Department of Biology, Montclair State University, Montclair, NJ, 07043, USA
13
Jouffroy O, Saha S, Mueller L, Quesneville H, Maumus F. Comprehensive repeatome annotation reveals strong potential impact of repetitive elements on tomato ripening. BMC Genomics 2016; 17:624. [PMID: 27519651 PMCID: PMC4981986 DOI: 10.1186/s12864-016-2980-z] [Received: 04/15/2016] [Accepted: 07/28/2016]
Abstract
BACKGROUND Plant genomes are populated by different types of repetitive elements, including transposable elements (TEs) and simple sequence repeats (SSRs), that can have a strong impact on genome size and dynamics as well as on the regulation of gene transcription. At least two-thirds of the tomato genome is composed of repeats. While their bulk impact on genome organization has recently been revealed by whole-genome assembly, their influence on tomato biology and phenotype remains largely unaddressed. More specifically, the effects and roles of DNA repeats in the maturation of fleshy fruits, a complex process of key agro-economic interest, still need to be investigated comprehensively, and tomato is arguably an excellent model for such a study.
RESULTS We have performed a comprehensive annotation of the tomato repeatome to explore its potential impact on tomato genome composition and gene transcription. Our results show that the tomato genome can be fractionated into three compartments with different gene and repeat densities, each compartment presenting contrasting repeat and gene composition, repeat-gene associations, and gene transcription levels. In the context of fruit ripening, we found that repeats are present in the majority of differentially methylated regions (DMRs) and that thousands of repeat-associated DMRs are found in gene proximity, including hundreds near genes that are differentially regulated. Furthermore, we found that repeats are also present in the proximity of binding sites of the key ripening protein RIN. We also observed that some repeat families are present at unexpectedly high frequency in the proximity of genes that are differentially expressed during tomato ripening.
CONCLUSION Altogether, our study emphasizes the fractionation of the tomato genome as defined by repeat content and enables further characterization of the specificities of each genomic compartment. Additionally, our results show strong associations between differentially regulated genes, differentially methylated regions, and repeats, suggesting a potential adaptive function of repeats in tomato ripening. Our work therefore provides significant perspectives for understanding the impact of repeats on the maturation of fleshy fruits.
Affiliation(s)
- Surya Saha
- Boyce Thompson Institute, Ithaca, NY, 14853, USA
- Lukas Mueller
- Boyce Thompson Institute, Ithaca, NY, 14853, USA
- Department of Plant Breeding, Cornell University, Ithaca, NY, 14853, USA
- Florian Maumus
- URGI, INRA, Université Paris-Saclay, 78026, Versailles, France