1
Teterina AA, Willis JH, Baer CF, Phillips PC. Pervasive conservation of intron number and other genetic elements revealed by a chromosome-level genomic assembly of the hyper-polymorphic nematode Caenorhabditis brenneri. bioRxiv 2024:2024.06.25.600681. [PMID: 38979286 PMCID: PMC11230420 DOI: 10.1101/2024.06.25.600681] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/10/2024]
Abstract
With within-species genetic diversity estimates that span the gamut of that seen across the entirety of animals, the Caenorhabditis genus of nematodes holds unique potential to provide insights into how population size and reproductive strategies influence gene and genome organization and evolution. Our study focuses on Caenorhabditis brenneri, currently known as one of the most genetically diverse nematodes within its genus and metazoan phyla. Here, we present a high-quality gapless genome assembly and annotation for C. brenneri, revealing a common nematode chromosome arrangement characterized by gene-dense central regions and repeat-rich peripheral parts. Comparison of C. brenneri with other nematodes from the 'Elegans' group revealed conserved macrosynteny but a lack of microsynteny, characterized by frequent rearrangements and low correlation of orthogroup sizes, indicative of high rates of gene turnover. We also assessed genome organization within corresponding syntenic blocks in selfing and outcrossing species, affirming that selfing species predominantly experience loss of both genes and intergenic DNA. Comparison of gene structures revealed a strikingly small number of shared introns across species, yet consistent distributions of intron number and length, regardless of population size or reproductive mode, suggesting that their evolutionary dynamics are primarily reflective of functional constraints. Our study provides valuable insights into genome evolution and expands the nematode genome resources with the highly genetically diverse C. brenneri, facilitating research into various aspects of nematode biology and evolutionary processes.
Affiliation(s)
- Anastasia A Teterina
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
- Center of Parasitology, Severtsov Institute of Ecology and Evolution RAS, Moscow, Russia
| | - John H Willis
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
| | - Charles F Baer
- Department of Biology, University of Florida, Gainesville, USA
| | - Patrick C Phillips
- Institute of Ecology and Evolution, University of Oregon, Eugene, OR, USA
2
Minnick MF. Functional Roles and Genomic Impact of Miniature Inverted-Repeat Transposable Elements (MITEs) in Prokaryotes. Genes (Basel) 2024; 15:328. [PMID: 38540387 PMCID: PMC10969869 DOI: 10.3390/genes15030328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2024] [Revised: 02/27/2024] [Accepted: 03/01/2024] [Indexed: 06/14/2024] [Open Access]
Abstract
Prokaryotic genomes are dynamic tapestries that are strongly influenced by mobile genetic elements (MGEs), including transposons (Tn's), plasmids, and bacteriophages. Of these, miniature inverted-repeat transposable elements (MITEs) are undoubtedly the least studied MGEs in bacteria and archaea. This review explores the diversity and distribution of MITEs in prokaryotes and describes what is known about their functional roles in the host and involvement in genomic plasticity and evolution.
Affiliation(s)
- Michael F Minnick
- Program in Cellular, Molecular and Microbial Biology, Division of Biological Sciences, University of Montana, Missoula, MT 59812, USA
3
Arnqvist G, Westerberg I, Galbraith J, Sayadi A, Scofield DG, Olsen RA, Immonen E, Bonath F, Ewels P, Suh A. A chromosome-level assembly of the seed beetle Callosobruchus maculatus genome with annotation of its repetitive elements. G3 (Bethesda) 2024; 14:jkad266. [PMID: 38092066 PMCID: PMC10849321 DOI: 10.1093/g3journal/jkad266] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2023] [Accepted: 10/30/2023] [Indexed: 02/09/2024]
Abstract
Callosobruchus maculatus is a major agricultural pest of legume crops worldwide and an established model system in ecology and evolution. Yet, current molecular biological resources for this species are limited. Here, we employ Hi-C sequencing to generate a greatly improved genome assembly and we annotate its repetitive elements in a dedicated in-depth effort where we manually curate and classify the most abundant unclassified repeat subfamilies. We present a scaffolded chromosome-level assembly, which is 1.01 Gb in total length with 86% being contained within the 9 autosomes and the X chromosome. Repetitive sequences accounted for 70% of the total assembly. DNA transposons covered 18% of the genome, with the most abundant superfamily being Tc1-Mariner (9.75% of the genome). This new chromosome-level genome assembly of C. maculatus will enable future genetic and evolutionary studies not only of this important species but of beetles more generally.
Affiliation(s)
- Göran Arnqvist
- Animal Ecology, Department of Ecology and Genetics, Uppsala University, Uppsala SE75236, Sweden
| | - Ivar Westerberg
- Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala SE75236, Sweden
- Department of Ecology, Environment and Plant Sciences, Stockholm University, Stockholm SE10691, Sweden
| | - James Galbraith
- School of Biological Sciences, University of Adelaide, Adelaide 5005, Australia
- Faculty of Environment, Science and Economy, University of Exeter, Cornwall TR10 9FE, UK
| | - Ahmed Sayadi
- Rheumatology, Department of Medical Sciences, Uppsala University, Uppsala SE75236, Sweden
| | - Douglas G Scofield
- Evolutionary Biology, Department of Ecology and Genetics, Uppsala University, Uppsala SE75236, Sweden
- Uppsala Multidisciplinary Center for Advanced Computational Science, Uppsala University, Uppsala SE75236, Sweden
| | - Remi-André Olsen
- Science for Life Laboratory, Department of Biochemistry and Biophysics, Stockholm University, Stockholm SE10691, Sweden
| | - Elina Immonen
- Evolutionary Biology, Department of Ecology and Genetics, Uppsala University, Uppsala SE75236, Sweden
| | - Franziska Bonath
- Science for Life Laboratory, Department of Molecular Biosciences, The Wenner-Gren Institute, Stockholm University, Stockholm SE10691, Sweden
| | - Alexander Suh
- Systematic Biology, Department of Organismal Biology, Uppsala University, Uppsala SE75236, Sweden
4
Gao D. Introduction of Plant Transposon Annotation for Beginners. Biology (Basel) 2023; 12:1468. [PMID: 38132293 PMCID: PMC10741241 DOI: 10.3390/biology12121468] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/07/2023] [Revised: 11/21/2023] [Accepted: 11/23/2023] [Indexed: 12/23/2023]
Abstract
Transposons are mobile DNA sequences that contribute large fractions of many plant genomes. They provide exclusive resources for tracking gene and genome evolution and for developing molecular tools for basic and applied research. Despite extensive efforts, it is still challenging to accurately annotate transposons, especially for beginners, as transposon prediction requires necessary expertise in both transposon biology and bioinformatics. Moreover, the complexity of plant genomes and the dynamic evolution of transposons also bring difficulties for genome-wide transposon discovery. This review summarizes the three major strategies for transposon detection including repeat-based, structure-based, and homology-based annotation, and introduces the transposon superfamilies identified in plants thus far, and some related bioinformatics resources for detecting plant transposons. Furthermore, it describes transposon classification and explains why the terms 'autonomous' and 'non-autonomous' cannot be used to classify the superfamilies of transposons. Lastly, this review also discusses how to identify misannotated transposons and improve the quality of the transposon database. This review provides helpful information about plant transposons and a beginner's guide on annotating these repetitive sequences.
Affiliation(s)
- Dongying Gao
- Small Grains and Potato Germplasm Research Unit, USDA-ARS, Aberdeen, ID 83210, USA
5
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] [Open Access]
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarize the definition, arrangement, and structural characteristics of repeats. We also introduce the diverse biological functions of repeats and review existing methods for automatic repeat detection, classification, and masking. Finally, we analyze the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Wufei Zhu
- Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
| | - Juexiao Zhou
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Haoyang Li
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xiaopeng Xu
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Bin Zhang
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia.
6
Ashwood LM, Elnahriry KA, Stewart ZK, Shafee T, Naseem MU, Szanto TG, van der Burg CA, Smith HL, Surm JM, Undheim EAB, Madio B, Hamilton BR, Guo S, Wai DCC, Coyne VL, Phillips MJ, Dudley KJ, Hurwood DA, Panyi G, King GF, Pavasovic A, Norton RS, Prentis PJ. Genomic, functional and structural analyses elucidate evolutionary innovation within the sea anemone 8 toxin family. BMC Biol 2023; 21:121. [PMID: 37226201 DOI: 10.1186/s12915-023-01617-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/08/2023] [Accepted: 05/09/2023] [Indexed: 05/26/2023] [Open Access]
Abstract
BACKGROUND The ShK toxin from Stichodactyla helianthus has established the therapeutic potential of sea anemone venom peptides, but many lineage-specific toxin families in Actiniarians remain uncharacterised. One such peptide family, sea anemone 8 (SA8), is present in all five sea anemone superfamilies. We explored the genomic arrangement and evolution of the SA8 gene family in Actinia tenebrosa and Telmatactis stephensoni, characterised the expression patterns of SA8 sequences, and examined the structure and function of SA8 from the venom of T. stephensoni. RESULTS We identified ten SA8-family genes in two clusters and six SA8-family genes in five clusters for T. stephensoni and A. tenebrosa, respectively. Nine SA8 T. stephensoni genes were found in a single cluster, and an SA8 peptide encoded by an inverted SA8 gene from this cluster was recruited to venom. We show that SA8 genes in both species are expressed in a tissue-specific manner and the inverted SA8 gene has a unique tissue distribution. While the functional activity of the SA8 putative toxin encoded by the inverted gene was inconclusive, its tissue localisation is similar to toxins used for predator deterrence. We demonstrate that, although mature SA8 putative toxins have similar cysteine spacing to ShK, SA8 peptides are distinct from ShK peptides based on structure and disulfide connectivity. CONCLUSIONS Our results provide the first demonstration that SA8 is a unique gene family in Actiniarians, evolving through a variety of structural changes including tandem and proximal gene duplication and an inversion event that together allowed SA8 to be recruited into the venom of T. stephensoni.
Affiliation(s)
- Lauren M Ashwood
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia.
- Cancer Program, QIMR Berghofer Medical Research Institute, Brisbane, QLD, 4006, Australia.
| | - Khaled A Elnahriry
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, 3052, Australia
| | - Zachary K Stewart
- Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Thomas Shafee
- Department of Animal Plant & Soil Sciences, La Trobe University, Melbourne, Australia
- Swinburne University of Technology, Melbourne, VIC, Australia
| | - Muhammad Umair Naseem
- Department of Biophysics and Cell Biology, Faculty of Medicine, University of Debrecen, 4032, Debrecen, Hungary
| | - Tibor G Szanto
- Department of Biophysics and Cell Biology, Faculty of Medicine, University of Debrecen, 4032, Debrecen, Hungary
| | - Chloé A van der Burg
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, 9016, New Zealand
| | - Hayden L Smith
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Joachim M Surm
- Department of Ecology, Evolution and Behavior, Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, 9190401, Jerusalem, Israel
| | - Eivind A B Undheim
- Department of Biosciences, Centre for Ecological and Evolutionary Synthesis, University of Oslo, Blindern, PO Box 1066, 0316, Oslo, Norway
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Bruno Madio
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Brett R Hamilton
- Centre for Advanced Imaging, The University of Queensland, St Lucia, QLD, 4072, Australia
- Centre for Microscopy and Microanalysis, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Shaodong Guo
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Dorothy C C Wai
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, 3052, Australia
| | - Victoria L Coyne
- Research Infrastructure, Central Analytical Research Facility, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Matthew J Phillips
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Kevin J Dudley
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Research Infrastructure, Central Analytical Research Facility, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - David A Hurwood
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Gyorgy Panyi
- Department of Biophysics and Cell Biology, Faculty of Medicine, University of Debrecen, 4032, Debrecen, Hungary
| | - Glenn F King
- Institute for Molecular Bioscience, The University of Queensland, St Lucia, QLD, 4072, Australia
- ARC Centre for Innovations in Peptide and Protein Science, The University of Queensland, St Lucia, QLD, 4072, Australia
| | - Ana Pavasovic
- School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, Brisbane, QLD, 4000, Australia
| | - Raymond S Norton
- Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, 3052, Australia
- ARC Centre for Fragment-Based Design, Monash University, Parkville, VIC, 3052, Australia
| | - Peter J Prentis
- School of Biology and Environmental Science, Faculty of Science, Queensland University of Technology, Brisbane, QLD, 4000, Australia
- Centre for Agriculture and the Bioeconomy, Queensland University of Technology, Brisbane, QLD, 4000, Australia
7
Riehl K, Riccio C, Miska EA, Hemberg M. TransposonUltimate: software for transposon classification, annotation and detection. Nucleic Acids Res 2022; 50:e64. [PMID: 35234904 PMCID: PMC9226531 DOI: 10.1093/nar/gkac136] [Citation(s) in RCA: 26] [Impact Index Per Article: 13.0] [Received: 06/09/2021] [Revised: 02/09/2022] [Accepted: 02/14/2022] [Indexed: 12/17/2022] [Open Access]
Abstract
Most genomes harbor a large number of transposons, and they play an important role in evolution and gene regulation. They are also of interest to clinicians as they are involved in several diseases, including cancer and neurodegeneration. Although several methods for transposon identification are available, they are often highly specialised towards specific tasks or classes of transposons, and they lack common standards such as a unified taxonomy scheme and output file format. We present TransposonUltimate, a powerful bundle of three modules for transposon classification, annotation, and detection of transposition events. TransposonUltimate comes as a Conda package under the GPL-3.0 licence, is well documented, and is easy to install through https://github.com/DerKevinRiehl/TransposonUltimate. We benchmark the classification module on the large TransposonDB covering 891,051 sequences to demonstrate that it outperforms the currently best existing solutions. The annotation and detection modules combine sixteen existing software tools, and we illustrate their use by annotating the Caenorhabditis elegans, Rhizophagus irregularis and Oryza sativa subsp. japonica genomes. Finally, we use the detection module to discover 29,554 transposition events in the genomes of 20 wild-type strains of C. elegans. Databases, assemblies, annotations and further findings can be downloaded from https://doi.org/10.5281/zenodo.5518085.
Affiliation(s)
- Kevin Riehl
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
| | - Cristian Riccio
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
| | - Eric A Miska
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
| | - Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02215, USA
8
Storer JM, Hubley R, Rosen J, Smit AFA. Methodologies for the De novo Discovery of Transposable Element Families. Genes (Basel) 2022; 13:709. [PMID: 35456515 PMCID: PMC9025800 DOI: 10.3390/genes13040709] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/23/2022] [Revised: 04/14/2022] [Accepted: 04/15/2022] [Indexed: 02/07/2023] [Open Access]
Abstract
The discovery and characterization of transposable element (TE) families are crucial tasks in the process of genome annotation. Careful curation of TE libraries for each organism is necessary as each has been exposed to a unique and often complex set of TE families. De novo methods have been developed; however, a fully automated and accurate approach to the development of complete libraries remains elusive. In this review, we cover established methods and recent developments in de novo TE analysis. We also present various methodologies used to assess these tools and discuss opportunities for further advancement of the field.
Affiliation(s)
- Arian F. A. Smit
- Institute for Systems Biology, Seattle, WA 98109, USA; (J.M.S.); (R.H.); (J.R.)
9
Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some families of TEs, from a set of sequences known to belong to the family. A second approach, the linguistic (or syntactic) approach, allows biologists themselves to describe and check the validity of models of their favorite repeat family.
10
Zeng C, Takeda A, Sekine K, Osato N, Fukunaga T, Hamada M. Bioinformatics Approaches for Determining the Functional Impact of Repetitive Elements on Non-coding RNAs. Methods Mol Biol 2022; 2509:315-340. [PMID: 35796972 DOI: 10.1007/978-1-0716-2380-0_19] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
With a large number of annotated non-coding RNAs (ncRNAs), repetitive sequences are found to constitute functional components (termed as repetitive elements) in ncRNAs that perform specific biological functions. Bioinformatics analysis is a powerful tool for improving our understanding of the role of repetitive elements in ncRNAs. This chapter summarizes recent findings that reveal the role of repetitive elements in ncRNAs. Furthermore, relevant bioinformatics approaches are systematically reviewed, which promises to provide valuable resources for studying the functional impact of repetitive elements on ncRNAs.
Affiliation(s)
- Chao Zeng
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
| | - Atsushi Takeda
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Kotaro Sekine
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Naoki Osato
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan
| | - Tsukasa Fukunaga
- Waseda Institute for Advanced Study, Waseda University, Tokyo, Japan
| | - Michiaki Hamada
- Faculty of Science and Engineering, Waseda University, Tokyo, Japan.
- AIST-Waseda University Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), Tokyo, Japan.
11
Hammond-Kosack MC, King R, Kanyuka K, Hammond-Kosack KE. Exploring the diversity of promoter and 5'UTR sequences in ancestral, historic and modern wheat. Plant Biotechnol J 2021; 19:2469-2487. [PMID: 34289221 PMCID: PMC8633512 DOI: 10.1111/pbi.13672] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/07/2021] [Revised: 06/15/2021] [Accepted: 07/08/2021] [Indexed: 05/25/2023]
Abstract
A data set of promoter and 5'UTR sequences of homoeo-alleles of 459 wheat genes that contribute to agriculturally important traits in 95 ancestral and commercial wheat cultivars is presented here. The high-stringency myBaits technology used made individual capture of homoeo-allele promoters possible, which is reported here for the first time. Promoters of most genes are remarkably conserved across the 83 hexaploid cultivars used with <7 haplotypes per promoter and 21% being identical to the reference Chinese Spring. InDels and many high-confidence SNPs are located within predicted plant transcription factor binding sites, potentially changing gene expression. Most haplotypes found in the Watkins landraces and a few haplotypes found in Triticum monococcum, germplasms hitherto not thought to have been used in modern wheat breeding, are already found in many commercial hexaploid wheats. The full data set which is useful for genomic and gene function studies and wheat breeding is available at https://rrescloud.rothamsted.ac.uk/index.php/s/DMCFDu5iAGTl50u/authenticate.
Affiliation(s)
- Robert King
- Department of Computational and Analytical Sciences, Rothamsted Research, Harpenden, UK
| | - Kostya Kanyuka
- Department of Biointeractions and Crop Protection, Rothamsted Research, Harpenden, UK
12
Liao X, Li M, Hu K, Wu FX, Gao X, Wang J. A sensitive repeat identification framework based on short and long reads. Nucleic Acids Res 2021; 49:e100. [PMID: 34214175 PMCID: PMC8464074 DOI: 10.1093/nar/gkab563] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 03/07/2020] [Revised: 06/08/2021] [Accepted: 06/18/2021] [Indexed: 12/11/2022] [Open Access]
Abstract
Numerous studies have shown that repetitive regions in genomes play indispensable roles in the evolution, inheritance and variation of living organisms. However, most existing methods cannot achieve satisfactory performance on identifying repeats in terms of both accuracy and size, since NGS reads are too short to identify long repeats whereas SMS (Single Molecule Sequencing) long reads have high error rates. In this study, we present a novel identification framework, LongRepMarker, based on global de novo assembly and k-mer based multiple sequence alignment for precisely marking long repeats in genomes. The major characteristics of LongRepMarker are as follows: (i) by introducing barcode linked reads and SMS long reads to assist the assembly of all short paired-end reads, it can identify the repeats to a greater extent; (ii) by finding the overlap sequences between assemblies or chromosomes, it locates the repeats faster and more accurately; (iii) by using the multi-alignment unique k-mers rather than the high frequency k-mers to identify repeats in overlap sequences, it can obtain the repeats more comprehensively and stably; (iv) by applying the parallel alignment model based on the multi-alignment unique k-mers, the efficiency of data processing can be greatly optimized; and (v) by taking the corresponding identification strategies, structural variations that occur between repeats can be identified. Comprehensive experimental results show that LongRepMarker can achieve more satisfactory results than the existing de novo detection methods (https://github.com/BioinformaticsCSU/LongRepMarker).
Affiliation(s)
- Xingyu Liao
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Min Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
| | - Kang Hu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
| | - Fang-Xiang Wu
- Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N5A9, Canada
| | - Xin Gao
- Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Jianxin Wang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha 410083, P.R. China
13
Lin L, Sharma A, Yu Q. Recent amplification of microsatellite-associated miniature inverted-repeat transposable elements in the pineapple genome. BMC Plant Biol 2021; 21:424. [PMID: 34537020 PMCID: PMC8449440 DOI: 10.1186/s12870-021-03194-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/21/2021] [Accepted: 08/09/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND Miniature inverted-repeat transposable elements (MITEs) are non-autonomous DNA transposable elements that play important roles in genome organization and evolution. Genome-wide identification and characterization of MITEs provide essential information for understanding genome structure and evolution. RESULTS We performed genome-wide identification and characterization of MITEs in the pineapple genome. The top two MITE families, accounting for 29.39% of the total MITEs and 3.86% of the pineapple genome, have an insertion preference for (TA)n dinucleotide microsatellite regions. We therefore named these MITEs A. comosus microsatellite-associated MITEs (Ac-mMITEs). The two Ac-mMITE families, Ac-mMITE-1 and Ac-mMITE-2, shared sequence similarity in the terminal inverted repeat (TIR) regions, suggesting that these two Ac-mMITE families might be derived from common or closely related autonomous elements. The Ac-mMITEs are frequently clustered via adjacent insertions. Among the 21,994 full-length Ac-mMITEs, 46.1% were present in clusters. By analyzing the Ac-mMITEs without (TA)n microsatellite flanking sequences, we found that Ac-mMITEs were likely derived from Mutator-like DNA transposons. Ac-mMITEs showed highly polymorphic insertion sites between cultivated pineapples and their wild relatives. To better understand the evolutionary history of Ac-mMITEs, we filtered and performed comparative analysis on the two distinct groups of Ac-mMITEs, microsatellite-targeting MITEs (mt-MITEs) that are flanked by dinucleotide microsatellites on both sides and mutator-like MITEs (ml-MITEs) that contain 9/10 bp TSDs. Epigenetic analysis revealed a lower level of host-induced silencing on the mt-MITEs in comparison to the ml-MITEs, which partially explained the significantly higher abundance of mt-MITEs in the pineapple genome. 
The mt-MITEs and ml-MITEs exhibited differential insertion preference to gene-related regions and RNA-seq analysis revealed their differential influences on expression regulation of nearby genes. CONCLUSIONS Ac-mMITEs are the most abundant MITEs in the pineapple genome and they were likely derived from Mutator-like DNA transposon. Preferential insertion in (TA) n microsatellite regions of Ac-mMITEs occurred recently and is likely the result of damage-limiting strategy adapted by Ac-mMITEs during co-evolution with their host. Insertion in (TA) n microsatellite regions might also have promoted the amplification of mt-MITEs. In addition, mt-MITEs showed no or negligible impact on nearby gene expression, which may help them escape genome control and lead to their amplification.
Affiliation(s)
- Lianyu Lin
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, TX, 75252, USA
- College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, Fujian, China
| | - Anupma Sharma
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, TX, 75252, USA
| | - Qingyi Yu
- Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, TX, 75252, USA.
| |
|
14
|
Farhat S, Le P, Kayal E, Noel B, Bigeard E, Corre E, Maumus F, Florent I, Alberti A, Aury JM, Barbeyron T, Cai R, Da Silva C, Istace B, Labadie K, Marie D, Mercier J, Rukwavu T, Szymczak J, Tonon T, Alves-de-Souza C, Rouzé P, Van de Peer Y, Wincker P, Rombauts S, Porcel BM, Guillou L. Rapid protein evolution, organellar reductions, and invasive intronic elements in the marine aerobic parasite dinoflagellate Amoebophrya spp. BMC Biol 2021; 19:1. [PMID: 33407428 PMCID: PMC7789003 DOI: 10.1186/s12915-020-00927-9] [Citation(s) in RCA: 36] [Impact Index Per Article: 12.0] [Received: 05/24/2020] [Accepted: 11/12/2020] [Indexed: 12/28/2022]
Abstract
BACKGROUND Dinoflagellates are aquatic protists particularly widespread in the oceans worldwide. Some are responsible for toxic blooms while others live in symbiotic relationships, either as mutualistic symbionts in corals or as parasites infecting other protists and animals. Dinoflagellates harbor atypically large genomes (~ 3 to 250 Gb), with gene organization and gene expression patterns very different from closely related apicomplexan parasites. Here we sequenced and analyzed the genomes of two early-diverging and co-occurring parasitic dinoflagellate Amoebophrya strains, to shed light on the emergence of such atypical genomic features, dinoflagellate evolution, and host specialization. RESULTS We sequenced, assembled, and annotated high-quality genomes for two Amoebophrya strains (A25 and A120), using a combination of Illumina paired-end short-read and Oxford Nanopore Technology (ONT) MinION long-read sequencing approaches. We found that a small number of transposable elements, short introns and intergenic regions, and a limited number of gene families together contribute to the compactness of the Amoebophrya genomes, a feature potentially linked with parasitism. While the majority of Amoebophrya proteins (63.7% of A25 and 59.3% of A120) had no functional assignment, we found many orthologs shared with Dinophyceae. Our analyses revealed a strong tendency for genes to be encoded in unidirectional clusters and high levels of synteny conservation between the two genomes despite low interspecific protein sequence similarity, suggesting rapid protein evolution. Most strikingly, we identified a large portion of non-canonical introns, including repeated introns, displaying a broad variability of associated splicing motifs never observed among eukaryotes. Those introner elements appear to have the capacity to spread over their respective genomes in a manner similar to transposable elements. 
Finally, we confirmed the reduction of organelles observed in Amoebophrya spp., i.e., loss of the plastid, potential loss of a mitochondrial genome and functions. CONCLUSION These results expand the range of atypical genome features found in basal dinoflagellates and raise questions regarding speciation and the evolutionary mechanisms at play while parasitism was selected for in this particular unicellular lineage.
Affiliation(s)
- Sarah Farhat
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
- School of Marine and Atmospheric Sciences, Stony Brook University, Stony Brook, New York, 11794, USA
| | - Phuong Le
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Ehsan Kayal
- Sorbonne Université, CNRS, FR2424, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
| | - Benjamin Noel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Estelle Bigeard
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Erwan Corre
- Sorbonne Université, CNRS, FR2424, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
| | - Florian Maumus
- URGI, INRA, Université Paris-Saclay, 78026, Versailles, France
| | - Isabelle Florent
- Unité Molécules de Communication et Adaptation des Microorganismes (MCAM, UMR7245), Muséum national d'Histoire naturelle, CNRS, CP 52, 57 rue Cuvier, 75005, Paris, France
| | - Adriana Alberti
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Jean-Marc Aury
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Tristan Barbeyron
- Sorbonne Université, CNRS, UMR 8227, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
| | - Ruibo Cai
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Corinne Da Silva
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Benjamin Istace
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Karine Labadie
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Dominique Marie
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Jonathan Mercier
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Tsinda Rukwavu
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Jeremy Szymczak
- Sorbonne Université, CNRS, FR2424, Station Biologique de Roscoff, Place Georges Teissier, 29680, Roscoff, France
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France
| | - Thierry Tonon
- Centre for Novel Agricultural Products, Department of Biology, University of York, Heslington, York, YO10 5DD, UK
| | - Catharina Alves-de-Souza
- Algal Resources Collection, MARBIONC, Center for Marine Sciences, University of North Carolina Wilmington, 5600 Marvin K. Moss Lane, Wilmington, NC, 28409, USA
| | - Pierre Rouzé
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Yves Van de Peer
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
- Department of Biochemistry, Genetics and Microbiology, University of Pretoria, Pretoria, South Africa
| | - Patrick Wincker
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France
| | - Stephane Rombauts
- Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- VIB Center for Plant Systems Biology, Ghent, Belgium
| | - Betina M Porcel
- Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ. Evry, Université Paris-Saclay, 91057, Evry, France.
| | - Laure Guillou
- Sorbonne Université, CNRS, UMR7144 Adaptation et Diversité en Milieu Marin, Ecology of Marine Plankton (ECOMAP), Station Biologique de Roscoff SBR, 29680, Roscoff, France.
| |
|
15
|
Molin WT, Yaguchi A, Blenner M, Saski CA. The EccDNA Replicon: A Heritable, Extranuclear Vehicle That Enables Gene Amplification and Glyphosate Resistance in Amaranthus palmeri. THE PLANT CELL 2020; 32:2132-2140. [PMID: 32327538 PMCID: PMC7346551 DOI: 10.1105/tpc.20.00099] [Citation(s) in RCA: 43] [Impact Index Per Article: 10.8] [Received: 02/12/2020] [Revised: 04/09/2020] [Accepted: 04/21/2020] [Indexed: 05/10/2023]
Abstract
Gene copy number variation is a predominant mechanism used by organisms to respond to selective pressures from the environment. This often results in unbalanced structural variations that perpetuate as adaptations to sustain life. However, the underlying mechanisms that give rise to gene proliferation are poorly understood. Here, we show a unique result of genomic plasticity in Amaranthus palmeri: a massive, ∼400-kb extrachromosomal circular DNA (eccDNA) that harbors the 5-ENOYLPYRUVYLSHIKIMATE-3-PHOSPHATE SYNTHASE (EPSPS) gene and 58 other genes whose encoded functions traverse detoxification, replication, recombination, transposition, tethering, and transport. Gene expression analysis under glyphosate stress showed transcription of 41 of these 59 genes, with high expression of EPSPS, as well as genes coding for aminotransferases, zinc finger proteins, and several uncharacterized proteins. The genomic architecture of the eccDNA replicon is composed of a complex arrangement of repeat sequences and mobile genetic elements interspersed among arrays of clustered palindromes that may be crucial for stability, DNA duplication and tethering, and/or a means of nuclear integration of the adjacent and intervening sequences. Comparative analysis of orthologous genes in grain amaranth (Amaranthus hypochondriacus) and waterhemp (Amaranthus tuberculatus) suggests that higher order chromatin interactions contribute to the genomic origins of the A. palmeri eccDNA replicon structure.
Affiliation(s)
- William T Molin
- Crop Protection Systems Research Unit, U.S. Department of Agriculture-Agricultural Research Service, Stoneville, Mississippi 38776
| | - Allison Yaguchi
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, South Carolina 29634
| | - Mark Blenner
- Department of Chemical and Biomolecular Engineering, Clemson University, Clemson, South Carolina 29634
| | - Christopher A Saski
- Department of Plant and Environmental Sciences, Clemson University, Clemson, South Carolina 29634
| |
|
16
|
Yan H, Bombarely A, Li S. DeepTE: a computational method for de novo classification of transposons with convolutional neural network. Bioinformatics 2020; 36:4269-4275. [DOI: 10.1093/bioinformatics/btaa519] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 01/09/2020] [Revised: 04/12/2020] [Accepted: 05/12/2020] [Indexed: 01/23/2023]
Abstract
Motivation
Transposable elements (TEs) classification is an essential step to decode their roles in genome evolution. With a large number of genomes from non-model species becoming available, accurate and efficient TE classification has emerged as a new challenge in genomic sequence analysis.
Results
We developed a novel tool, DeepTE, which classifies unknown TEs using convolutional neural networks (CNNs). DeepTE transferred sequences into input vectors based on k-mer counts. A tree structured classification process was used where eight models were trained to classify TEs into super families and orders. DeepTE also detected domains inside TEs to correct false classification. An additional model was trained to distinguish between non-TEs and TEs in plants. Given unclassified TEs of different species, DeepTE can classify TEs into seven orders, which include 15, 24 and 16 super families in plants, metazoans and fungi, respectively. In several benchmarking tests, DeepTE outperformed other existing tools for TE classification. In conclusion, DeepTE successfully leverages CNN for TE classification, and can be used to precisely classify TEs in newly sequenced eukaryotic genomes.
Availability and implementation
DeepTE is accessible at https://github.com/LiLabAtVT/DeepTE.
Supplementary information
Supplementary data are available at Bioinformatics online.
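The k-mer-count encoding mentioned in the Results above can be illustrated with a minimal sketch. This is not DeepTE's code: the function name `kmer_count_vector`, the default `k=3`, the ACGT-only alphabet, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration.

```python
from itertools import product


def kmer_count_vector(seq, k=3):
    """Count k-mers of `seq` in a fixed lexicographic order over A, C, G, T.

    Hypothetical helper for illustration only; it is not part of DeepTE.
    """
    # Enumerate all 4**k possible k-mers so every sequence maps to a
    # vector of the same length, whatever its own length is.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}

    counts = [0] * len(kmers)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # assumption: skip windows with N etc.
            counts[position] += 1
    return counts


if __name__ == "__main__":
    vector = kmer_count_vector("ACGTAC", k=2)
    print(len(vector), sum(vector))
```

A fixed-length vector of this kind is what lets sequences of different lengths be fed to the same classifier; in practice the counts would probably be normalized before use, which this sketch leaves out.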
Affiliation(s)
- Haidong Yan
- School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24061, USA
| | - Aureliano Bombarely
- School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24061, USA
- Department of Life Sciences, University of Milan, Milan 20122, Italy
| | - Song Li
- School of Plant and Environmental Sciences (SPES), Virginia Tech, Blacksburg, VA 24061, USA
- Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24061, USA
| |
|
17
|
Teterina AA, Willis JH, Phillips PC. Chromosome-Level Assembly of the Caenorhabditis remanei Genome Reveals Conserved Patterns of Nematode Genome Organization. Genetics 2020; 214:769-780. [PMID: 32111628 PMCID: PMC7153949 DOI: 10.1534/genetics.119.303018] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 12/30/2019] [Accepted: 02/24/2020] [Indexed: 12/23/2022]
Abstract
The nematode Caenorhabditis elegans is one of the key model systems in biology, including possessing the first fully assembled animal genome. Whereas C. elegans is a self-reproducing hermaphrodite with fairly limited within-population variation, its relative C. remanei is an outcrossing species with much more extensive genetic variation, making it an ideal parallel model system for evolutionary genetic investigations. Here, we greatly improve on previous assemblies by generating a chromosome-level assembly of the entire C. remanei genome (124.8 Mb of total size) using long-read sequencing and chromatin conformation capture data. Like other fully assembled genomes in the genus, we find that the C. remanei genome displays a high degree of synteny with C. elegans despite multiple within-chromosome rearrangements. Both genomes have high gene density in central regions of chromosomes relative to chromosome ends and the opposite pattern for the accumulation of repetitive elements. C. elegans and C. remanei also show similar patterns of interchromosome interactions, with the central regions of chromosomes appearing to interact with one another more than the distal ends. The new C. remanei genome presented here greatly augments the use of the Caenorhabditis as a platform for comparative genomics and serves as a basis for molecular population genetics within this highly diverse species.
Affiliation(s)
- Anastasia A Teterina
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
- Center of Parasitology, A.N. Severtsov Institute of Ecology and Evolution, Russian Academy of Sciences, Moscow 117071, Russia
| | - John H Willis
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
| | - Patrick C Phillips
- Institute of Ecology and Evolution, University of Oregon, Eugene, Oregon 97403
| |
|
18
|
Miniature inverted-repeat transposable elements (MITEs), derived insertional polymorphism as a tool of marker systems for molecular plant breeding. Mol Biol Rep 2020; 47:3155-3167. [PMID: 32162128 DOI: 10.1007/s11033-020-05365-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/18/2019] [Accepted: 02/29/2020] [Indexed: 12/20/2022]
Abstract
Plant molecular breeding is expected to give significant gains in cultivar development through the development and utilization of suitable molecular marker systems for genetic diversity analysis, rapid DNA fingerprinting, identification of true hybrids, trait mapping and marker-assisted selection. Transposable elements (TEs) are the most abundant component of a genome and are being used as genetic markers in plant molecular breeding. Here, we review the highly abundant class II DNA TEs called "miniature inverted-repeat transposable elements" (MITEs). MITEs are ubiquitous, short, non-autonomous DNA transposable elements that tend to insert into genes and genic regions, which has paved the way for the development of functional DNA marker systems in plant genomes. This review summarises the characteristics of MITEs, the principles and methodologies for developing MITE-based DNA markers, bioinformatics tools and resources for plant MITE discovery, and their utilization in crop improvement.
|
19
|
Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 2019. [PMID: 31843001 DOI: 10.1101/657890v1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 05/11/2023]
Abstract
BACKGROUND Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
Affiliation(s)
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA
| | - Weija Su
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, 50011, USA
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697, USA
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
| | - Jireh R A Agda
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | - Adam J Hellinga
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | | | - Tyler A Elliott
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, N1G 2W1, Canada
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, 11724, USA
- USDA-ARS NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY, 14853, USA
| | - Thomas Peterson
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA, 50011, USA
| | - Ning Jiang
- Department of Horticulture, Michigan State University, East Lansing, MI, 48824, USA.
| | - Candice N Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN, 55108, USA.
| | - Matthew B Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA, 50011, USA.
| |
|
20
|
Ou S, Su W, Liao Y, Chougule K, Agda JRA, Hellinga AJ, Lugo CSB, Elliott TA, Ware D, Peterson T, Jiang N, Hirsch CN, Hufford MB. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol 2019; 20:275. [PMID: 31843001 PMCID: PMC6913007 DOI: 10.1186/s13059-019-1905-y] [Citation(s) in RCA: 473] [Impact Index Per Article: 94.6] [Received: 05/24/2019] [Accepted: 11/28/2019] [Indexed: 11/10/2022]
Abstract
BACKGROUND Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. RESULTS We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. CONCLUSIONS The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
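The performance metrics listed in the abstract above (sensitivity, specificity, accuracy, precision, FDR, F1) follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name `annotation_metrics` is hypothetical, and whether the counts are base pairs or whole elements is an assumption left to the caller. It also assumes every denominator is non-zero.

```python
def annotation_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, for example in
    base pairs of annotated sequence (an assumption of this sketch).
    """
    return {
        "sensitivity": tp / (tp + fn),            # share of real TEs recovered
        "specificity": tn / (tn + fp),            # share of non-TE correctly left out
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": tp / (tp + fp),              # share of calls that are real
        "fdr": fp / (tp + fp),                    # equals 1 - precision
        "f1": 2 * tp / (2 * tp + fp + fn),        # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    for name, value in annotation_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Reporting several of these together matters for genome annotation because non-TE sequence usually dominates, so accuracy and specificity alone can look high even when many elements are missed.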
Affiliation(s)
- Shujun Ou
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
| | - Weija Su
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
| | - Yi Liao
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697 USA
| | - Kapeel Chougule
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA
| | - Jireh R. A. Agda
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1 Canada
| | - Adam J. Hellinga
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1 Canada
| | | | - Tyler A. Elliott
- Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario N1G 2W1 Canada
| | - Doreen Ware
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724 USA
- USDA-ARS NEA Robert W. Holley Center for Agriculture and Health, Cornell University, Ithaca, NY 14853 USA
| | - Thomas Peterson
- Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011 USA
| | - Ning Jiang
- Department of Horticulture, Michigan State University, East Lansing, MI 48824 USA
| | - Candice N. Hirsch
- Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN 55108 USA
| | - Matthew B. Hufford
- Department of Ecology, Evolution, and Organismal Biology, Iowa State University, Ames, IA 50011 USA
| |
|
21
|
Surm JM, Stewart ZK, Papanicolaou A, Pavasovic A, Prentis PJ. The draft genome of Actinia tenebrosa reveals insights into toxin evolution. Ecol Evol 2019; 9:11314-11328. [PMID: 31641475 PMCID: PMC6802032 DOI: 10.1002/ece3.5633] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 02/07/2019] [Revised: 08/06/2019] [Accepted: 08/12/2019] [Indexed: 12/17/2022]
Abstract
Sea anemones have a wide array of toxic compounds (peptide toxins found in their venom) which have potential uses as therapeutics. To date, the majority of studies characterizing toxins in sea anemones have been restricted to species from the superfamily, Actinioidea. No highly complete draft genomes are currently available for this superfamily, however, highlighting our limited understanding of the genes encoding toxins in this important group. Here we have sequenced, assembled, and annotated a draft genome for Actinia tenebrosa. The genome is estimated to be approximately 255 megabases, with 31,556 protein-coding genes. Quality metrics revealed that this draft genome matches the quality and completeness of other model cnidarian genomes, including Nematostella, Hydra, and Acropora. Phylogenomic analyses revealed strong conservation of the Cnidaria and Hexacorallia core-gene set. However, we found that lineage-specific gene families have undergone significant expansion events compared with shared gene families. Enrichment analysis performed for both gene ontologies, and protein domains revealed that genes encoding toxins contribute to a significant proportion of the lineage-specific genes and gene families. The results make clear that the draft genome of A. tenebrosa will provide insight into the evolution of toxins and lineage-specific genes, and provide an important resource for the discovery of novel biological compounds.
Affiliation(s)
- Joachim M. Surm
- Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Kelvin Grove, Qld, Australia
- Institute of Health and Biomedical Innovation, Queensland University of Technology, Kelvin Grove, Qld, Australia
| | - Zachary K. Stewart
- Science and Engineering Faculty, School of Earth, Environmental and Biological Sciences, Queensland University of Technology, Brisbane, Qld, Australia
- Institute for Future Environments, Queensland University of Technology, Brisbane, Qld, Australia
| | | | - Ana Pavasovic
- Faculty of Health, School of Biomedical Sciences, Queensland University of Technology, Kelvin Grove, Qld, Australia
| | - Peter J. Prentis
- Science and Engineering Faculty, School of Earth, Environmental and Biological Sciences, Queensland University of Technology, Brisbane, Qld, Australia
- Institute for Future Environments, Queensland University of Technology, Brisbane, Qld, Australia
| |
|
22
|
Shi J, Liang C. Generic Repeat Finder: A High-Sensitivity Tool for Genome-Wide De Novo Repeat Detection. PLANT PHYSIOLOGY 2019; 180:1803-1815. [PMID: 31152127 PMCID: PMC6670090 DOI: 10.1104/pp.19.00386] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 03/28/2019] [Accepted: 05/17/2019] [Indexed: 05/25/2023]
Abstract
Comprehensive and accurate annotation of the repeatome, including transposons, is critical for deepening our understanding of repeat origins, biogenesis, regulatory mechanisms, and roles. Here, we developed Generic Repeat Finder (GRF), a tool for genome-wide repeat detection based on fast, exhaustive numerical calculation algorithms integrated with optimized dynamic programming strategies. GRF sensitively identifies terminal inverted repeats (TIRs), terminal direct repeats (TDRs), and interspersed repeats that bear both inverted and direct repeats. GRF also detects DNA or RNA transposable elements characterized by these repeats in plant and animal genomes. For TIRs and TDRs, GRF identifies spacers in the middle and mismatches/insertions or deletions in terminal repeats, showing their alignment or base-pairing information. GRF helps improve the annotation for various DNA transposons and retrotransposons, such as miniature inverted-repeat transposable elements (MITEs), long terminal repeat (LTR) retrotransposons, and non-LTR retrotransposons, including long interspersed nuclear elements and short interspersed nuclear elements in plants. We used GRF to perform TIR/TDR, interspersed-repeat, and MITE detection in several species, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and mouse (Mus musculus). As a generic bioinformatics tool in repeat finding implemented as a parallelized C++ program, GRF was faster and more sensitive than the existing inverted repeat/MITE detection tools based on numerical approaches (i.e. detectIR and detectMITE) in Arabidopsis and mouse. GRF is more sensitive than Inverted Repeat Finder in TIR detection, LTR_FINDER in short TDR detection (≤1,000 nt), and phRAIDER in interspersed repeat detection in Arabidopsis and rice. GRF is open source and available from GitHub.
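To make the notion of a terminal inverted repeat in the abstract above concrete, here is a deliberately naive sketch. It is not GRF's algorithm (which the abstract describes as numerical calculation combined with dynamic programming, and which also handles insertions and deletions); the function name `tir_length` and the mismatch-only model are assumptions for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tir_length(element, max_mismatches=0):
    """Length of the terminal inverted repeat of a candidate element.

    Walks inward from both ends, pairing base i from the 5' end with the
    complement of base i from the 3' end. Substitutions are tolerated up
    to `max_mismatches`; indels are not modelled (assumption of this sketch).
    """
    seq = element.upper()
    mismatches = 0
    length = 0
    for i in range(len(seq) // 2):
        left = seq[i]
        right = seq[len(seq) - 1 - i]
        if COMPLEMENT.get(right) == left:
            length = i + 1  # extend the TIR to the last pairing position
        else:
            mismatches += 1
            if mismatches > max_mismatches:
                break
    return length


if __name__ == "__main__":
    # 5-nt arm CAGTG, 6-nt spacer, then the reverse complement CACTG.
    print(tir_length("CAGTGAAACCCCACTG"))
```

The returned length always ends on a pairing position, so tolerated mismatches inside the arms are counted but a trailing mismatch never extends the repeat.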
Affiliation(s)
- Jieming Shi
  - Department of Biology, Miami University, Oxford, Ohio 45056
- Chun Liang
  - Department of Biology, Miami University, Oxford, Ohio 45056
  - Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio 45056
|
23
|
Ahmadzadeh V, Farajnia S, Baghban R, Rahbarnia L, Zarredar H. CRISPR-Cas system: Toward a more efficient technology for genome editing and beyond. J Cell Biochem 2019; 120:16379-16392. [PMID: 31219653 DOI: 10.1002/jcb.29140] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/12/2019] [Accepted: 05/07/2019] [Indexed: 12/26/2022]
Abstract
Genome engineering technology is of great interest for biomedical research because it enables scientists to make specific manipulations in the DNA sequence. Early methods for introducing double-stranded DNA breaks relied on protein-based systems. These platforms have enabled fascinating advances, but all are costly and time-consuming to engineer, which has prevented them from gaining high-throughput applications. The CRISPR-Cas9 system, co-opted from bacteria, has generated considerable excitement in gene targeting. In this review, we describe gene targeting techniques with an emphasis on recent strategies to improve the specificities of CRISPR-Cas systems for nuclease and non-nuclease applications.
Affiliation(s)
- Vahideh Ahmadzadeh
  - Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Safar Farajnia
  - Drug Applied Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
  - Biotechnology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Roghayyeh Baghban
  - Biotechnology Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Leila Rahbarnia
  - Infectious and Tropical Diseases Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
- Habib Zarredar
  - Tuberculosis and Lung Disease Research Center, Tabriz University of Medical Sciences, Tabriz, Iran
|
24
|
Su W, Gu X, Peterson T. TIR-Learner, a New Ensemble Method for TIR Transposable Element Annotation, Provides Evidence for Abundant New Transposable Elements in the Maize Genome. MOLECULAR PLANT 2019; 12:447-460. [PMID: 30802553 DOI: 10.1016/j.molp.2019.02.008] [Citation(s) in RCA: 53] [Impact Index Per Article: 10.6] [Received: 09/02/2018] [Revised: 02/19/2019] [Accepted: 02/19/2019] [Indexed: 05/21/2023]
Abstract
Transposable elements (TEs) make up a large and rapidly evolving proportion of plant genomes. Among Class II DNA TEs, TIR elements are flanked by characteristic terminal inverted repeat sequences (TIRs). TIR TEs may play important roles in genome evolution, including generating allelic diversity, inducing structural variation, and regulating gene expression. However, TIR TE identification and annotation have been hampered by the lack of effective tools, resulting in erroneous TE annotations and a significant underestimation of the proportion of TIR elements in the maize genome. This problem has largely limited our understanding of the impact of TIR elements on plant genome structure and evolution. In this paper, we propose a new method of TIR element detection and annotation. This new pipeline combines the advantages of current homology-based annotation methods with powerful de novo machine-learning approaches, resulting in greatly increased efficiency and accuracy of TIR element annotation. The results show that the copy number and genome proportion of TIR elements in maize are much larger than those in current annotations. In addition, the distribution of some TIR superfamily elements is reduced in centromeric and pericentromeric positions, while others do not show a similar bias. Finally, the incorporation of machine-learning techniques has enabled the identification of large numbers of new DTA (hAT) family elements, which have all the hallmarks of bona fide TEs yet which lack high homology with currently known DTA elements. Together, these results provide new tools for TE research and new insight into the impact of TIR elements on maize genome diversity.
Affiliation(s)
- Weijia Su
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
- Xun Gu
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
- Thomas Peterson
  - Department of Genetics, Development and Cell Biology, Iowa State University, Ames, IA 50011-3260, USA
  - Department of Agronomy, Iowa State University, Ames, IA 50011-3260, USA
|
25
|
MiteFinderII: a novel tool to identify miniature inverted-repeat transposable elements hidden in eukaryotic genomes. BMC Med Genomics 2018; 11:101. [PMID: 30453969 PMCID: PMC6245586 DOI: 10.1186/s12920-018-0418-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/15/2022]
Abstract
Background: Miniature inverted-repeat transposable element (MITE) is a type of class II non-autonomous transposable element playing a crucial role in the process of evolution in biology. There is an urgent need to develop bioinformatics tools to effectively identify MITEs on a whole genome-wide scale. However, most existing tools have limited ability to handle large eukaryotic genomes. Methods: In this paper, we propose a novel tool, MiteFinderII, adapted from our previous algorithm MiteFinder, to efficiently detect MITEs from genomic sequences. It has six major steps: (1) construction of a k-mer index and search for inverted repeats; (2) filtration of inverted repeats with low complexity; (3) merger of inverted repeats; (4) filtration of candidates with low score; (5) selection of final MITE sequences; (6) selection of representative sequences. Results: To test the performance, MiteFinderII and three other existing algorithms were applied to identify MITEs on the whole genome of Oryza sativa. Results suggest that MiteFinderII outperforms existing popular tools in terms of both specificity and recall. Additionally, it is much faster and more memory-efficient than other tools. Conclusion: MiteFinderII is an accurate and effective tool to detect MITEs hidden in eukaryotic genomes. The source code is freely accessible at the website: https://github.com/screamer/miteFinder.
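Step (1) of the pipeline described in this abstract, indexing k-mers and looking for inverted repeats, can be illustrated with a small Python sketch. This is not MiteFinderII's implementation: the function names, the default k, and the length bounds are assumptions chosen for illustration, and exact k-mer matches stand in for the scored, mismatch-tolerant candidates a real tool would produce.

```python
from collections import defaultdict

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_inverted_repeat_seeds(genome, k=10, min_len=50, max_len=800):
    """Find exact k-mer pairs that could seed terminal inverted repeats.

    Hypothetical helper: returns sorted (start, end, kmer) tuples where
    genome[start:start + k] == kmer and genome[end - k:end] is its
    reverse complement, with min_len <= end - start <= max_len.
    """
    # Build the k-mer index: k-mer -> list of start positions.
    index = defaultdict(list)
    for i in range(len(genome) - k + 1):
        index[genome[i:i + k]].append(i)

    seeds = []
    for kmer, starts in index.items():
        # .get() avoids adding keys to the defaultdict while iterating.
        partners = index.get(reverse_complement(kmer))
        if not partners:
            continue
        for left in starts:
            for right in partners:
                span = right + k - left
                # Require a non-overlapping downstream partner in range.
                if right >= left + k and min_len <= span <= max_len:
                    seeds.append((left, right + k, kmer))
    return sorted(seeds)


if __name__ == "__main__":
    arm = "ACGGTCAAGT"
    toy = "GG" + arm + "T" * 40 + reverse_complement(arm) + "GG"
    for start, end, kmer in find_inverted_repeat_seeds(toy):
        print(start, end, kmer)
```

In a fuller pipeline these seeds would then be extended, filtered for low complexity, merged, and scored, as the remaining steps of the abstract describe.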
|
26
|
Crescente JM, Zavallo D, Helguera M, Vanzetti LS. MITE Tracker: an accurate approach to identify miniature inverted-repeat transposable elements in large genomes. BMC Bioinformatics 2018; 19:348. [PMID: 30285604 PMCID: PMC6171319 DOI: 10.1186/s12859-018-2376-y] [Citation(s) in RCA: 64] [Impact Index Per Article: 10.7] [Received: 07/12/2018] [Accepted: 09/18/2018] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Miniature inverted-repeat transposable elements (MITEs) are short, non-autonomous class II transposable elements present in a high number of conserved copies in eukaryote genomes. An accurate identification of these elements can help to shed light on the mechanisms controlling genome evolution and gene regulation. The structure and distribution of these elements are well-defined and therefore computational approaches can be used to identify MITE sequences. RESULTS: Here we describe MITE Tracker, a novel, open source software program that finds and classifies MITEs using an efficient alignment strategy to retrieve nearby inverted-repeat sequences from large genomes. This program groups them into high sequence homology families using a fast clustering algorithm and finally keeps only those elements that were likely transposed from different genomic locations because of their low scoring flanking sequence alignment. CONCLUSIONS: Many programs have been proposed to find MITEs hidden in genomes. However, none of them are able to process large-scale genomes such as that of bread wheat. Furthermore, in many cases the existing methods exhibit high false-positive rates (or miss rates). The rice genome was used as reference to compare MITE Tracker against known tools. Our method turned out to be the most reliable in our tests. Indeed, it revealed more known elements, presented the lowest false-positive number and was the only program able to run with the bread wheat genome as input. In wheat, MITE Tracker discovered 6013 MITE families and allowed the first structural exploration of MITEs in the complete bread wheat genome.
Affiliation(s)
- Juan Manuel Crescente
  - Grupo Biotecnología y Recursos Genéticos, EEA INTA Marcos Juárez, Ruta 12 km 3, 2580, Marcos Juárez, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
- Diego Zavallo
  - Instituto de Biotecnología, CNIA, Instituto Nacional de Tecnología Agropecuaria (INTA) Castelar, Los Reseros y Nicolas Repeto, Hurlingham, Buenos Aires, Argentina
- Marcelo Helguera
  - Grupo Biotecnología y Recursos Genéticos, EEA INTA Marcos Juárez, Ruta 12 km 3, 2580, Marcos Juárez, Argentina
- Leonardo Sebastián Vanzetti
  - Grupo Biotecnología y Recursos Genéticos, EEA INTA Marcos Juárez, Ruta 12 km 3, 2580, Marcos Juárez, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina
|
27
|
Ou S, Jiang N. LTR_retriever: A Highly Accurate and Sensitive Program for Identification of Long Terminal Repeat Retrotransposons. PLANT PHYSIOLOGY 2018; 176:1410-1422. [PMID: 29233850 PMCID: PMC5813529 DOI: 10.1104/pp.17.01310] [Citation(s) in RCA: 552] [Impact Index Per Article: 92.0] [Received: 09/13/2017] [Accepted: 12/10/2017] [Indexed: 05/18/2023]
Abstract
Long terminal repeat retrotransposons (LTR-RTs) are prevalent in plant genomes. The identification of LTR-RTs is critical for achieving high-quality gene annotation. Based on the well-conserved structure, multiple programs were developed for the de novo identification of LTR-RTs; however, these programs are associated with low specificity and high false discovery rates. Here, we report LTR_retriever, a multithreading-empowered Perl program that identifies LTR-RTs and generates high-quality LTR libraries from genomic sequences. LTR_retriever demonstrated significant improvements by achieving high levels of sensitivity (91%), specificity (97%), accuracy (96%), and precision (90%) in rice (Oryza sativa). LTR_retriever is also compatible with long sequencing reads. With 40k self-corrected PacBio reads equivalent to 4.5× genome coverage in Arabidopsis (Arabidopsis thaliana), the constructed LTR library showed excellent sensitivity and specificity. In addition to canonical LTR-RTs with 5'-TG…CA-3' termini, LTR_retriever also identifies noncanonical LTR-RTs (non-TGCA), which have been largely ignored in genome-wide studies. We identified seven types of noncanonical LTRs from 42 out of 50 plant genomes. The majority of noncanonical LTRs are Copia elements, in which the LTR is four times shorter than that of other Copia elements, which may be a result of their target specificity. Strikingly, non-TGCA Copia elements are often located in genic regions and preferentially insert nearby or within genes, indicating their impact on the evolution of genes and their potential as mutagenesis tools.
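The four benchmark figures quoted in this abstract (sensitivity, specificity, accuracy, precision) follow the usual confusion-matrix definitions. The Python sketch below shows how such values are computed in general; the function name and the counts in the usage example are hypothetical and are not taken from the LTR_retriever evaluation.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from raw counts.

    tp/fp/tn/fn are true-positive, false-positive, true-negative and
    false-negative counts (for repeat annotation these could be, for
    example, base pairs; that choice is an assumption of this sketch).
    """
    return {
        # Fraction of real positives that were recovered.
        "sensitivity": tp / (tp + fn),
        # Fraction of real negatives correctly left unannotated.
        "specificity": tn / (tn + fp),
        # Fraction of all calls, positive or negative, that were right.
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        # Fraction of positive calls that were real positives.
        "precision": tp / (tp + fp),
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the formulas.
    for name, value in classification_metrics(90, 10, 890, 10).items():
        print(f"{name}: {value:.3f}")
```

Reporting all four together matters for repeat annotation because the negative class (non-repeat sequence) is often much larger than the positive class, so a high accuracy alone can mask poor precision.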
Affiliation(s)
- Shujun Ou
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
- Ning Jiang
  - Department of Horticulture, Michigan State University, East Lansing, Michigan 48824
|
28
|
Modulating signaling networks by CRISPR/Cas9-mediated transposable element insertion. Curr Genet 2017; 64:405-412. [DOI: 10.1007/s00294-017-0765-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/04/2017] [Revised: 10/01/2017] [Accepted: 10/09/2017] [Indexed: 12/11/2022]
|
29
|
Ge R, Mai G, Zhang R, Wu X, Wu Q, Zhou F. MUSTv2: An Improved De Novo Detection Program for Recently Active Miniature Inverted Repeat Transposable Elements (MITEs). J Integr Bioinform 2017; 14:/j/jib.ahead-of-print/jib-2017-0029/jib-2017-0029.xml. [PMID: 28796642 PMCID: PMC6042816 DOI: 10.1515/jib-2017-0029] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 04/06/2017] [Accepted: 05/08/2017] [Indexed: 11/15/2022]
Abstract
Background: Miniature inverted repeat transposable element (MITE) is a short transposable element, carrying no protein-coding regions. However, its high proliferation rate and sequence-specific insertion preference render it a good genetic tool for both natural evolution and experimental insertion mutagenesis. Recently active MITE copies are those with clear signals of Terminal Inverted Repeats (TIRs) and Direct Repeats (DRs) that were recently translocated into their current sites. Their proliferation ability renders them good candidates for the investigation of genomic evolution. Results: This study optimizes the C++ code and running pipeline of the MITE Uncovering SysTem (MUST) so that no prior knowledge of MITEs is required from users, and the current version, MUSTv2, shows significantly increased detection accuracy for recently active MITEs, compared with similar programs. The running speed is also significantly increased compared with MUSTv1. We prepared a benchmark dataset, a simulated genome with 150 MITE copies, for interested researchers. Conclusions: MUSTv2 represents an accurate detection program of recently active MITE copies, which is complementary to the existing template-based MITE mapping programs. We believe that the release of MUSTv2 will greatly facilitate genome annotation and structural analysis for bioOMIC big data researchers.
|
30
|
Guo C, Spinelli M, Ye C, Li QQ, Liang C. Genome-Wide Comparative Analysis of Miniature Inverted Repeat Transposable Elements in 19 Arabidopsis thaliana Ecotype Accessions. Sci Rep 2017; 7:2634. [PMID: 28572566 PMCID: PMC5454002 DOI: 10.1038/s41598-017-02855-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 08/23/2016] [Accepted: 04/20/2017] [Indexed: 01/03/2023]
Abstract
Miniature inverted repeat transposable elements (MITEs) are prevalent in eukaryotic genomes. They are known to critically influence the process of genome evolution and play a role in gene regulation. In the first study to concentrate on the transposition activities of MITEs among different ecotype accessions within a species, we conducted a genome-wide comparative analysis by characterizing and comparing MITEs in 19 Arabidopsis thaliana accessions. A total of 343,485 putative MITE sequences, including canonical, diverse and partial ones, were delineated from all 19 accessions. Within the entire population of MITE sequences, 80.7% were previously unclassified MITEs, demonstrating a different genomic distribution and functionality compared to the classified MITEs. The interactions between MITEs and homologous genes across 19 accessions provided a fine source for analyzing MITE transposition activities and their impacts on genome evolution. Moreover, a significant proportion of MITEs were located in the last exon of genes besides the ordinary intron locality, thus potentially modifying the end of genes. Finally, analysis of the impact of MITEs on gene expression suggests that migrations of MITEs have no detectable effect on the expression level for host genes across accessions.
Affiliation(s)
- Cheng Guo
  - Department of Biology, Miami University, Oxford, OH, 45056, USA
- Congting Ye
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
- Qingshun Q Li
  - Key Laboratory of the Ministry of Education for Coastal and Wetland Ecosystems, College of the Environment and Ecology, Xiamen University, Xiamen, Fujian, 361102, China
  - Graduate College of Biomedical Sciences, Western University of Health Sciences, Pomona, CA, 91766, USA
- Chun Liang
  - Department of Biology, Miami University, Oxford, OH, 45056, USA
|