1. Geethanjali S, Kadirvel P, Anumalla M, Hemanth Sadhana N, Annamalai A, Ali J. Streamlining of Simple Sequence Repeat Data Mining Methodologies and Pipelines for Crop Scanning. Plants (Basel, Switzerland) 2024; 13:2619. [PMID: 39339594 PMCID: PMC11435353 DOI: 10.3390/plants13182619] [Received: 06/24/2024] [Revised: 08/18/2024] [Accepted: 08/29/2024] [Indexed: 09/30/2024]
Abstract
Genetic markers are powerful tools for understanding genetic diversity and the molecular basis of traits, ushering in a new era of molecular breeding in crops. Over the past 50 years, DNA markers have rapidly changed, moving from hybridization-based and second-generation-based to sequence-based markers. Simple sequence repeats (SSRs) are the ideal markers in plant breeding, and they have numerous desirable properties, including their repeatability, codominance, multi-allelic nature, and locus specificity. They can be generated from any species, which requires prior sequence knowledge. SSRs may serve as evolutionary tuning knobs, allowing for rapid identification and adaptation to new circumstances. The evaluations published thus far have mostly ignored SSR polymorphism and gene evolution due to a lack of data regarding the precise placements of SSRs on chromosomes. However, NGS technologies have made it possible to produce high-throughput SSRs for any species using massive volumes of genomic sequence data that can be generated fast and at a minimal cost. Though SNP markers are gradually replacing the erstwhile DNA marker systems, SSRs remain the markers of choice in orphan crops due to the lack of genomic resources at the reference level and their adaptability to resource-limited labor. Several bioinformatic approaches and tools have evolved to handle genomic sequences to identify SSRs and generate primers for genotyping applications in plant breeding projects. This paper includes the currently available methodologies for producing SSR markers, genomic resource databases, and computational tools/pipelines for SSR data mining and primer generation. This review aims to provide a 'one-stop shop' of information to help each new user carefully select tools for identifying and utilizing SSRs in genetic research and breeding programs.
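As an illustration of the kind of SSR mining the review surveys, the following is a minimal Python sketch that scans a DNA string for perfect microsatellites with a regular expression. The function name `find_ssrs` and the minimum-repeat thresholds are assumptions chosen for illustration; they are not taken from the review or from any specific tool it covers, and dedicated tools additionally handle compound and imperfect repeats and primer design.

```python
import re

# Hypothetical minimum repeat counts per motif length (1-6 bp).
# These thresholds are illustrative assumptions, not values from the review.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect SSRs.

    Coordinates are 0-based and half-open. Only A/C/G/T motifs are considered.
    """
    seq = sequence.upper()
    hits = []
    for motif_len, min_rep in sorted(min_repeats.items()):
        # One motif copy captured, followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AAAA", "ATAT"),
            # so each locus is reported once, under its shortest motif.
            redundant = any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            )
            if redundant:
                continue
            copies = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 8 + "CCGT" + "AAG" * 5 + "TT"
    for start, end, motif, copies in find_ssrs(example):
        print(start, end, motif, copies)
```

On the example sequence, this reports an (AT)8 repeat at positions 3-19 and an (AAG)5 repeat at positions 23-38. The flanking sequence on either side of each hit is what a primer-design step would then work from.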
Affiliation(s)
- Subramaniam Geethanjali
  - Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Palchamy Kadirvel
  - Crop Improvement Section, ICAR-Indian Institute of Oilseeds Research, Rajendranagar, Hyderabad 500030, India
- Mahender Anumalla
  - Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
  - IRRI South Asia Hub, Patancheru, Hyderabad 502324, India
- Nithyananth Hemanth Sadhana
  - Department of Plant Biotechnology, Centre for Plant Molecular Biology and Biotechnology, Tamil Nadu Agricultural University, Coimbatore 641003, India
- Anandan Annamalai
  - Indian Council of Agricultural Research (ICAR), Indian Institute of Seed Science, Bengaluru 560065, India
- Jauhar Ali
  - Rice Breeding Innovation Platform, International Rice Research Institute (IRRI), Los Baños 4031, Laguna, Philippines
2. Sierra P, Durbin R. Identification of transposable element families from pangenome polymorphisms. Mob DNA 2024; 15:13. [PMID: 38926873 PMCID: PMC11202377 DOI: 10.1186/s13100-024-00323-y] [Received: 04/05/2024] [Accepted: 06/13/2024] [Indexed: 06/28/2024]
Abstract
BACKGROUND Transposable Elements (TEs) are segments of DNA, typically a few hundred base pairs up to several tens of thousands bases long, that have the ability to generate new copies of themselves in the genome. Most existing methods used to identify TEs in a newly sequenced genome are based on their repetitive character, together with detection based on homology and structural features. As new high quality assemblies become more common, including the availability of multiple independent assemblies from the same species, an alternative strategy for identification of TE families becomes possible in which we focus on the polymorphism at insertion sites caused by TE mobility. RESULTS We develop the idea of using the structural polymorphisms found in pangenomes to create a library of the TE families recently active in a species, or in a closely related group of species. We present a tool, pantera, that achieves this task, and illustrate its use both on species with well-curated libraries, and on new assemblies. CONCLUSIONS Our results show that pantera is sensitive and accurate, tending to correctly identify complete elements with precise boundaries, and is particularly well suited to detect larger, low copy number TEs that are often undetected with existing de novo methods.
Affiliation(s)
- Pío Sierra
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
- Richard Durbin
  - Department of Genetics, University of Cambridge, Cambridge, CB2 3EH, UK
3. Peona V, Martelossi J, Almojil D, Bocharkina J, Brännström I, Brown M, Cang A, Carrasco-Valenzuela T, DeVries J, Doellman M, Elsner D, Espíndola-Hernández P, Montoya GF, Gaspar B, Zagorski D, Hałakuc P, Ivanovska B, Laumer C, Lehmann R, Boštjančić LL, Mashoodh R, Mazzoleni S, Mouton A, Nilsson MA, Pei Y, Potente G, Provataris P, Pardos-Blas JR, Raut R, Sbaffi T, Schwarz F, Stapley J, Stevens L, Sultana N, Symonova R, Tahami MS, Urzì A, Yang H, Yusuf A, Pecoraro C, Suh A. Teaching transposon classification as a means to crowd source the curation of repeat annotation - a tardigrade perspective. Mob DNA 2024; 15:10. [PMID: 38711146 DOI: 10.1186/s13100-024-00319-8] [Received: 01/22/2024] [Accepted: 04/09/2024] [Indexed: 05/08/2024]
Abstract
BACKGROUND The advancement of sequencing technologies results in the rapid release of hundreds of new genome assemblies a year providing unprecedented resources for the study of genome evolution. Within this context, the significance of in-depth analyses of repetitive elements, transposable elements (TEs) in particular, is increasingly recognized in understanding genome evolution. Despite the plethora of available bioinformatic tools for identifying and annotating TEs, the phylogenetic distance of the target species from a curated and classified database of repetitive element sequences constrains any automated annotation effort. Moreover, manual curation of raw repeat libraries is deemed essential due to the frequent incompleteness of automatically generated consensus sequences. RESULTS Here, we present an example of a crowd-sourcing effort aimed at curating and annotating TE libraries of two non-model species built around a collaborative, peer-reviewed teaching process. Manual curation and classification are time-consuming processes that offer limited short-term academic rewards and are typically confined to a few research groups where methods are taught through hands-on experience. Crowd-sourcing efforts could therefore offer a significant opportunity to bridge the gap between learning the methods of curation effectively and empowering the scientific community with high-quality, reusable repeat libraries. CONCLUSIONS The collaborative manual curation of TEs from two tardigrade species, for which there were no TE libraries available, resulted in the successful characterization of hundreds of new and diverse TEs in a reasonable time frame. Our crowd-sourcing setting can be used as a teaching reference guide for similar projects: A hidden treasure awaits discovery within non-model organisms.
Affiliation(s)
- Valentina Peona
  - Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-752 36, Sweden
  - Swiss Ornithological Institute Vogelwarte, Sempach, CH-6204, Switzerland
  - Department of Bioinformatics and Genetics, Swedish Natural History Museum, Stockholm, Sweden
- Jacopo Martelossi
  - Department of Biological Geological and Environmental Science, University of Bologna, Via Selmi 3, Bologna, 40126, Italy
- Dareen Almojil
  - New York University Abu Dhabi, Saadiyat Island, United Arab Emirates
- Ioana Brännström
  - Natural History Museum, Oslo University, Oslo, Norway
  - Department of Ecology and Genetics, Uppsala University, Uppsala, Sweden
- Max Brown
  - Anglia Ruskin University, East Rd, Cambridge, CB1 1PT, UK
- Tomàs Carrasco-Valenzuela
  - Evolutionary Genetics Department, Leibniz Institute for Zoo and Wildlife Research, 10315, Berlin, Germany
  - Berlin Center for Genomics in Biodiversity Research, 14195, Berlin, Germany
- Jon DeVries
  - Reed College, Portland, OR, United States of America
- Meredith Doellman
  - Department of Ecology and Evolution, The University of Chicago, Chicago, IL, 60637, USA
  - Department of Biological Sciences, University of Notre Dame, Notre Dame, IN, 46556, USA
- Daniel Elsner
  - Evolutionary Biology & Ecology, University of Freiburg, Freiburg, Germany
- Pamela Espíndola-Hernández
  - Research Unit Comparative Microbiome Analysis (COMI), Helmholtz Zentrum München, Ingolstädter Landstraße 1, D-85764, Neuherberg, Germany
- Bence Gaspar
  - Institute of Evolution and Ecology, University of Tuebingen, Tuebingen, Germany
- Danijela Zagorski
  - Institute of Botany, Czech Academy of Sciences, Průhonice, Czech Republic
- Paweł Hałakuc
  - Institute of Evolutionary Biology, Faculty of Biology, Biological and Chemical Research Centre, University of Warsaw, Warsaw, Poland
- Beti Ivanovska
  - Institute of Genetics and Biotechnology, Hungarian University of Agriculture and Life Sciences, Budapest, Hungary
- Robert Lehmann
  - Biological and Environmental Science and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Ljudevit Luka Boštjančić
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
- Rahia Mashoodh
  - Department of Genetics, Environment & Evolution, Centre for Biodiversity & Environment Research, University College London, London, UK
- Sofia Mazzoleni
  - Department of Ecology, Faculty of Science, Charles University, Prague, Czech Republic
- Alice Mouton
  - INBIOS-Conservation Genetic Lab, University of Liege, Liege, Belgium
- Maria Anna Nilsson
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Senckenberganlage 25, 60325, Frankfurt, Germany
- Yifan Pei
  - Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-752 36, Sweden
  - Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 127, 53113, Bonn, Germany
- Giacomo Potente
  - Department of Systematic and Evolutionary Botany, University of Zurich, Zurich, Switzerland
- Panagiotis Provataris
  - German Cancer Research Center, NGS Core Facility, DKFZ-ZMBH Alliance, 69120, Heidelberg, Germany
- José Ramón Pardos-Blas
  - Departamento de Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales (MNCN-CSIC), José Gutiérrez Abascal 2, Madrid, 28006, Spain
- Ravindra Raut
  - Department of Biotechnology, National Institute of Technology Durgapur, Durgapur, India
- Tomasa Sbaffi
  - Molecular Ecology Group (MEG), National Research Council of Italy - Water Research Institute (CNR-IRSA), Verbania, Italy
- Florian Schwarz
  - Eurofins Genomics Europe Pharma and Diagnostics Products & Services Sales GmbH, Ebersberg, Germany
- Jessica Stapley
  - Plant Pathology Group, Institute of Integrative Biology, ETH Zurich, Zurich, Switzerland
- Lewis Stevens
  - Tree of Life, Wellcome Sanger Institute, Cambridge, CB10 1SA, UK
- Nusrat Sultana
  - Department of Botany, Jagannath University, Dhaka, 1100, Bangladesh
- Radka Symonova
  - Institute of Hydrobiology, Biology Centre of the Czech Academy of Sciences, České Budějovice, Czech Republic
- Mohadeseh S Tahami
  - Department of Biological and Environmental Science, University of Jyväskylä, P.O. Box 35, Jyväskylä, 40014, Finland
- Alice Urzì
  - Centogene GmbH, Am Strande 7, 18055, Rostock, Germany
- Heidi Yang
  - Department of Ecology & Evolutionary Biology, University of California, Los Angeles, Los Angeles, CA, United States of America
- Abdullah Yusuf
  - Zell- und Molekularbiologie der Pflanzen, Technische Universität Dresden, Dresden, Germany
- Alexander Suh
  - Department of Organismal Biology - Systematic Biology, Evolutionary Biology Centre, Uppsala University, Uppsala, SE-752 36, Sweden
  - School of Biological Sciences, University of East Anglia, Norwich Research Park, Norwich, NR4 7TU, UK
  - Present address: Centre for Molecular Biodiversity Research, Leibniz Institute for the Analysis of Biodiversity Change, Adenauerallee 160, 53113, Bonn, Germany
4. Choudalakis M, Bashtrykov P, Jeltsch A. RepEnTools: an automated repeat enrichment analysis package for ChIP-seq data reveals hUHRF1 Tandem-Tudor domain enrichment in young repeats. Mob DNA 2024; 15:6. [PMID: 38570859 PMCID: PMC10988844 DOI: 10.1186/s13100-024-00315-y] [Received: 11/03/2023] [Accepted: 03/05/2024] [Indexed: 04/05/2024]
Abstract
BACKGROUND Repeat elements (REs) play important roles for cell function in health and disease. However, RE enrichment analysis in short-read high-throughput sequencing (HTS) data, such as ChIP-seq, is a challenging task. RESULTS Here, we present RepEnTools, a software package for genome-wide RE enrichment analysis of ChIP-seq and similar chromatin pulldown experiments. Our analysis package bundles together various software with carefully chosen and validated settings to provide a complete solution for RE analysis, starting from raw input files to tabular and graphical outputs. RepEnTools implementations are easily accessible even with minimal IT skills (Galaxy/UNIX). To demonstrate the performance of RepEnTools, we analysed chromatin pulldown data by the human UHRF1 TTD protein domain and discovered enrichment of TTD binding on young primate and hominid specific polymorphic repeats (SVA, L1PA1/L1HS) overlapping known enhancers and decorated with H3K4me1-K9me2/3 modifications. We corroborated these new bioinformatic findings with experimental data by qPCR assays using newly developed primate and hominid specific qPCR assays which complement similar research tools. Finally, we analysed mouse UHRF1 ChIP-seq data with RepEnTools and showed that the endogenous mUHRF1 protein colocalizes with H3K4me1-H3K9me3 on promoters of REs which were silenced by UHRF1. These new data suggest a functional role for UHRF1 in silencing of REs that is mediated by TTD binding to the H3K4me1-K9me3 double mark and conserved in two mammalian species. CONCLUSIONS RepEnTools improves the previously available programmes for RE enrichment analysis in chromatin pulldown studies by leveraging new tools, enhancing accessibility and adding some key functions. RepEnTools can analyse RE enrichment rapidly, efficiently, and accurately, providing the community with an up-to-date, reliable and accessible tool for this important type of analysis.
Affiliation(s)
- Michel Choudalakis
  - Department of Biochemistry, Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569, Stuttgart, Germany
- Pavel Bashtrykov
  - Department of Biochemistry, Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569, Stuttgart, Germany
- Albert Jeltsch
  - Department of Biochemistry, Institute of Biochemistry and Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569, Stuttgart, Germany
5. Garcia S, Kovarik A, Maiwald S, Mann L, Schmidt N, Pascual-Díaz JP, Vitales D, Weber B, Heitkam T. The Dynamic Interplay Between Ribosomal DNA and Transposable Elements: A Perspective From Genomics and Cytogenetics. Mol Biol Evol 2024; 41:msae025. [PMID: 38306580 PMCID: PMC10946416 DOI: 10.1093/molbev/msae025] [Received: 06/12/2023] [Revised: 12/06/2023] [Accepted: 01/29/2024] [Indexed: 02/04/2024]
Abstract
Although both are salient features of genomes, at first glance ribosomal DNAs and transposable elements are genetic elements with not much in common: whereas ribosomal DNAs are mainly viewed as housekeeping genes that uphold all prime genome functions, transposable elements are generally portrayed as selfish and disruptive. These opposing characteristics are also mirrored in other attributes: organization in tandem (ribosomal DNAs) versus organization in a dispersed manner (transposable elements); evolution in a concerted manner (ribosomal DNAs) versus evolution by diversification (transposable elements); and activity that prolongs genomic stability (ribosomal DNAs) versus activity that shortens it (transposable elements). Re-visiting relevant instances in which ribosomal DNA-transposable element interactions have been reported, we note that both repeat types share at least four structural and functional hallmarks: (1) they are repetitive DNAs that shape genomes in evolutionary timescales, (2) they exchange structural motifs and can enter co-evolution processes, (3) they are tightly controlled genomic stress sensors playing key roles in senescence/aging, and (4) they share common epigenetic marks such as DNA methylation and histone modification. Here, we give an overview of the structural, functional, and evolutionary characteristics of both ribosomal DNAs and transposable elements, discuss their roles and interactions, and highlight trends and future directions as we move forward in understanding ribosomal DNA-transposable element associations.
Affiliation(s)
- Sònia Garcia
  - Institut Botànic de Barcelona (IBB), CSIC-CMCNB, 08038 Barcelona, Catalonia, Spain
- Ales Kovarik
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, 61265 Brno, Czech Republic
- Sophie Maiwald
  - Faculty of Biology, Technische Universität Dresden, D-01069 Dresden, Germany
- Ludwig Mann
  - Faculty of Biology, Technische Universität Dresden, D-01069 Dresden, Germany
- Nicola Schmidt
  - Faculty of Biology, Technische Universität Dresden, D-01069 Dresden, Germany
- Daniel Vitales
  - Institut Botànic de Barcelona (IBB), CSIC-CMCNB, 08038 Barcelona, Catalonia, Spain
  - Laboratori de Botànica–Unitat Associada CSIC, Facultat de Farmàcia i Ciències de l’Alimentació, Universitat de Barcelona, 08028 Barcelona, Catalonia, Spain
- Beatrice Weber
  - Faculty of Biology, Technische Universität Dresden, D-01069 Dresden, Germany
- Tony Heitkam
  - Faculty of Biology, Technische Universität Dresden, D-01069 Dresden, Germany
  - Institute of Biology, NAWI Graz, Karl-Franzens-Universität, A-8010 Graz, Austria
6. Loreto ELS, Melo ESD, Wallau GL, Gomes TMFF. The good, the bad and the ugly of transposable elements annotation tools. Genet Mol Biol 2024; 46:e20230138. [PMID: 38373163 PMCID: PMC10876081 DOI: 10.1590/1678-4685-gmb-2023-0138] [Received: 05/09/2023] [Accepted: 11/26/2023] [Indexed: 02/21/2024]
Abstract
Transposable elements are repetitive and mobile DNA segments that can be found in virtually all organisms investigated to date. Their complex structure and variable nature are particularly challenging from the genomic annotation point of view. Many softwares have been developed to automate and facilitate TEs annotation at the genomic level, but they are highly heterogeneous regarding documentation, usability and methods. In this review, we revisited the existing software for TE genomic annotation, concentrating on the most often used ones, the methodologies they apply, and usability. Building on the state of the art of TE annotation software we propose best practices and highlight the strengths and weaknesses from the available solutions.
Affiliation(s)
- Elgion L S Loreto
  - Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Genética e Biologia Molecular, Porto Alegre, RS, Brazil
  - Universidade Federal de Santa Maria, Departamento de Bioquímica e Biologia Molecular, Santa Maria, RS, Brazil
- Elverson S de Melo
  - Fundação Oswaldo Cruz, Instituto Aggeu Magalhães, Departamento de Entomologia, Recife, PE, Brazil
- Gabriel L Wallau
  - Fundação Oswaldo Cruz, Instituto Aggeu Magalhães, Departamento de Entomologia, Recife, PE, Brazil
- Tiago M F F Gomes
  - Universidade Federal do Rio Grande do Sul, Programa de Pós-Graduação em Genética e Biologia Molecular, Porto Alegre, RS, Brazil
7. Maiwald S, Mann L, Garcia S, Heitkam T. Evolving Together: Cassandra Retrotransposons Gradually Mirror Promoter Mutations of the 5S rRNA Genes. Mol Biol Evol 2024; 41:msae010. [PMID: 38262464 PMCID: PMC10853983 DOI: 10.1093/molbev/msae010] [Received: 07/21/2023] [Revised: 10/26/2023] [Accepted: 12/11/2023] [Indexed: 01/25/2024]
Abstract
The 5S rRNA genes are among the most conserved nucleotide sequences across all species. Similar to the 5S preservation we observe the occurrence of 5S-related nonautonomous retrotransposons, so-called Cassandras. Cassandras harbor highly conserved 5S rDNA-related sequences within their long terminal repeats, advantageously providing them with the 5S internal promoter. However, the dynamics of Cassandra retrotransposon evolution in the context of 5S rRNA gene sequence information and structural arrangement are still unclear, especially: (1) do we observe repeated or gradual domestication of the highly conserved 5S promoter by Cassandras and (2) do changes in 5S organization such as in the linked 35S-5S rDNA arrangements impact Cassandra evolution? Here, we show evidence for gradual co-evolution of Cassandra sequences with their corresponding 5S rDNAs. To follow the impact of 5S rDNA variability on Cassandra TEs, we investigate the Asteraceae family where highly variable 5S rDNAs, including 5S promoter shifts and both linked and separated 35S-5S rDNA arrangements have been reported. Cassandras within the Asteraceae mirror 5S rDNA promoter mutations of their host genome, likely as an adaptation to the host's specific 5S transcription factors and hence compensating for evolutionary changes in the 5S rDNA sequence. Changes in the 5S rDNA sequence and in Cassandras seem uncorrelated with linked/separated rDNA arrangements. We place all these observations into the context of angiosperm 5S rDNA-Cassandra evolution, discuss Cassandra's origin hypotheses (single or multiple) and Cassandra's possible impact on rDNA and plant genome organization, giving new insights into the interplay of ribosomal genes and transposable elements.
Affiliation(s)
- Sophie Maiwald
  - Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
- Ludwig Mann
  - Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
- Sònia Garcia
  - Institut Botànic de Barcelona, IBB (CSIC-MCNB), 08038 Barcelona, Catalonia, Spain
- Tony Heitkam
  - Faculty of Biology, Technische Universität Dresden, 01069 Dresden, Germany
  - Institute of Biology, NAWI Graz, Karl-Franzens-Universität, 8010 Graz, Austria
8. Mandal AK. Recent insights into crosstalk between genetic parasites and their host genome. Brief Funct Genomics 2024; 23:15-23. [PMID: 36307128 PMCID: PMC10799329 DOI: 10.1093/bfgp/elac032] [Received: 08/04/2022] [Revised: 09/14/2022] [Accepted: 09/21/2022] [Indexed: 01/21/2024]
Abstract
The bulk of higher order organismal genomes is comprised of transposable element (TE) copies, i.e. genetic parasites. The host-parasite relation is multi-faceted, varying across genomic region (genic versus intergenic), life-cycle stages, tissue-type and of course in health versus pathological state. The reach of functional genomics though, in investigating genotype-to-phenotype relations, has been limited when TEs are involved. The aim of this review is to highlight recent progress made in understanding how TE origin biochemical activity interacts with the central dogma stages of the host genome. Such interaction can also bring about modulation of the immune context and this could have important repercussions in disease state where immunity has a role to play. Thus, the review is to instigate ideas and action points around identifying evolutionary adaptations that the host genome and the genetic parasite have evolved and why they could be relevant.
Affiliation(s)
- Amit K Mandal
  - Corresponding author: A.K. Mandal, Nuffield Department of Surgical Sciences (NDS), University of Oxford, Old Road Campus Research building (ORCRB), Oxford OX3 7DQ, UK. Tel: +44 (0)1865 617123; Fax: +44 (0)1865 768876; E-mail:
9. Gao D. Introduction of Plant Transposon Annotation for Beginners. Biology 2023; 12:1468. [PMID: 38132293 PMCID: PMC10741241 DOI: 10.3390/biology12121468] [Received: 11/07/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 12/23/2023]
Abstract
Transposons are mobile DNA sequences that contribute large fractions of many plant genomes. They provide exclusive resources for tracking gene and genome evolution and for developing molecular tools for basic and applied research. Despite extensive efforts, it is still challenging to accurately annotate transposons, especially for beginners, as transposon prediction requires necessary expertise in both transposon biology and bioinformatics. Moreover, the complexity of plant genomes and the dynamic evolution of transposons also bring difficulties for genome-wide transposon discovery. This review summarizes the three major strategies for transposon detection including repeat-based, structure-based, and homology-based annotation, and introduces the transposon superfamilies identified in plants thus far, and some related bioinformatics resources for detecting plant transposons. Furthermore, it describes transposon classification and explains why the terms 'autonomous' and 'non-autonomous' cannot be used to classify the superfamilies of transposons. Lastly, this review also discusses how to identify misannotated transposons and improve the quality of the transposon database. This review provides helpful information about plant transposons and a beginner's guide on annotating these repetitive sequences.
Affiliation(s)
- Dongying Gao
  - Small Grains and Potato Germplasm Research Unit, USDA-ARS, Aberdeen, ID 83210, USA
10. Sproul JS, Hotaling S, Heckenhauer J, Powell A, Marshall D, Larracuente AM, Kelley JL, Pauls SU, Frandsen PB. Analyses of 600+ insect genomes reveal repetitive element dynamics and highlight biodiversity-scale repeat annotation challenges. Genome Res 2023; 33:1708-1717. [PMID: 37739812 PMCID: PMC10691545 DOI: 10.1101/gr.277387.122] [Received: 10/06/2022] [Accepted: 09/20/2023] [Indexed: 09/24/2023]
Abstract
Repetitive elements (REs) are integral to the composition, structure, and function of eukaryotic genomes, yet remain understudied in most taxonomic groups. We investigated REs across 601 insect species and report wide variation in RE dynamics across groups. Analysis of associations between REs and protein-coding genes revealed dynamic evolution at the interface between REs and coding regions across insects, including notably elevated RE-gene associations in lineages with abundant long interspersed nuclear elements (LINEs). We leveraged this large, empirical data set to quantify impacts of long-read technology on RE detection and investigate fundamental challenges to RE annotation in diverse groups. In long-read assemblies, we detected ∼36% more REs than short-read assemblies, with long terminal repeats (LTRs) showing 162% increased detection, whereas DNA transposons and LINEs showed less respective technology-related bias. In most insect lineages, 25%-85% of repetitive sequences were "unclassified" following automated annotation, compared with only ∼13% in Drosophila species. Although the diversity of available insect genomes has rapidly expanded, we show the rate of community contributions to RE databases has not kept pace, preventing efficient annotation and high-resolution study of REs in most groups. We highlight the tremendous opportunity and need for the biodiversity genomics field to embrace REs and suggest collective steps for making progress toward this goal.
Affiliation(s)
- John S Sproul
- Department of Biology, Brigham Young University, Provo, Utah 84602, USA;
- Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
- Department of Biology, University of Rochester, Rochester, New York 14627, USA
| | - Scott Hotaling
- School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
- Department of Watershed Sciences, Utah State University, Logan, Utah 84322, USA
| | - Jacqueline Heckenhauer
- LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
| | - Ashlyn Powell
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
| | - Dez Marshall
- Department of Biology, University of Nebraska Omaha, Omaha, Nebraska 68182, USA
| | | | - Joanna L Kelley
- School of Biological Sciences, Washington State University, Pullman, Washington 99163, USA
- Department of Ecology and Evolutionary Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA
| | - Steffen U Pauls
- LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Senckenberg Research Institute and Natural History Museum Frankfurt, 60325 Frankfurt, Germany
- Department of Insect Biotechnology, Justus-Liebig-University Gießen, 35392 Gießen, Germany
| | - Paul B Frandsen
- LOEWE Center for Translational Biodiversity Genomics (LOEWE-TBG), 60325 Frankfurt, Germany
- Department of Plant and Wildlife Sciences, Brigham Young University, Provo, Utah 84602, USA
- Data Science Lab, Smithsonian Institution, Washington, District of Columbia 20560, USA
11
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023; 14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023] Open
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
Affiliation(s)
- Jingxuan Chen
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Shunhua Han
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- David J. Garfinkel
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA USA
- Casey M. Bergman
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Department of Genetics, University of Georgia, Athens, GA USA
12
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.02.13.528343. [PMID: 36824955 PMCID: PMC9948991 DOI: 10.1101/2023.02.13.528343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast. 
CONCLUSION McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.
Affiliation(s)
- Jingxuan Chen
- Institute of Bioinformatics, University of Georgia, Athens, GA
- Shunhua Han
- Institute of Bioinformatics, University of Georgia, Athens, GA
- David J. Garfinkel
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA
- Casey M. Bergman
- Institute of Bioinformatics, University of Georgia, Athens, GA
- Department of Genetics, University of Georgia, Athens, GA
13
Rodriguez F, Arkhipova IR. An Overview of Best Practices for Transposable Element Identification, Classification, and Annotation in Eukaryotic Genomes. Methods Mol Biol 2023; 2607:1-23. [PMID: 36449155 PMCID: PMC10149145 DOI: 10.1007/978-1-0716-2883-6_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2022]
Abstract
Transposable elements (TEs) exert an increasingly diverse spectrum of influences on eukaryotic genome structure, function, and evolution. A deluge of genomic, transcriptomic, and proteomic data provides the foundation for turning essentially any non-model eukaryotic species into an emerging model to study any and all aspects of organismal biology, ultimately shaping future directions for biomedical, environmental, and biodiversity research. However, identification and annotation of the mobile genome component still lags behind the standards accepted for host gene annotation. To achieve the objective of providing every genome project with a comprehensive description of its mobilome component in addition to the standard genic and transcriptomic datasets, each step of TE identification, classification, and annotation should be focused on improving TE boundary designation, reducing identification error rates, and providing accurate information on the type and integrity of TE insertions. Here, we offer practical advice for generating TE models in de novo assemblies for non-model organisms, provide step-by-step instructions to guide inexperienced TE annotators through some of the commonly utilized TE analysis pipelines, and entertain suggestions for tool improvement which could be implemented by interested developers.
Affiliation(s)
- Fernando Rodriguez
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, USA
- Irina R Arkhipova
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, MA, USA
14
Goubert C, Craig RJ, Bilat AF, Peona V, Vogan AA, Protasio AV. A beginner's guide to manual curation of transposable elements. Mob DNA 2022; 13:7. [PMID: 35354491 PMCID: PMC8969392 DOI: 10.1186/s13100-021-00259-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Received: 11/01/2021] [Accepted: 12/17/2021] [Indexed: 12/11/2022] Open
Abstract
BACKGROUND In the study of transposable elements (TEs), the generation of a high-confidence set of consensus sequences that represent the diversity of TEs found in a given genome is a key step in the path to investigate these fascinating genomic elements. Many algorithms and pipelines are available to automatically identify putative TE families present in a genome. Despite the availability of these valuable resources, producing a library of high-quality full-length TE consensus sequences largely remains a process of manual curation. This know-how is often passed on from mentor to mentee within research groups, making it difficult for those outside the field to access this highly specialised skill. RESULTS Our manuscript attempts to fill this gap by providing a set of detailed computer protocols, software recommendations and video tutorials for those aiming to manually curate TEs. Detailed step-by-step protocols, aimed at the complete beginner, are presented in the Supplementary Methods. CONCLUSIONS The proposed set of programs and tools presented here will make the process of manual curation achievable and amenable to all researchers, especially those new to the field of TEs.
Affiliation(s)
- Clement Goubert
- Canadian Center for Computational Genomics, McGill University, Montreal, Québec Canada
- Department of Human Genetics, McGill University, Montreal, Québec Canada
- Rory J. Craig
- Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, EH9 3FL UK
- Agustin F. Bilat
- Departamento de Genética, Facultad de Medicina, Universidad de la República, Montevideo, Uruguay
- Valentina Peona
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Aaron A. Vogan
- Department of Organismal Biology, Uppsala University, Norbyvägen 18D, 752 36 Uppsala, Sweden
- Anna V. Protasio
- Department of Pathology, Tennis Court Road, Cambridge, CB1 2PQ UK
- Christ’s College, St Andrews Street, Cambridge, CB2 3BU UK