Reference Citation Analysis

For: Robb SM, Lu L, Valencia E, Burnette JM 3rd, Okumoto Y, Wessler SR, Stajich JE. The use of RelocaTE and unassembled short reads to produce high-resolution snapshots of transposable element generated diversity in rice. G3 (Bethesda) 2013;3:949-57. [PMID: 23576519 DOI: 10.1534/g3.112.005348] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 01/18/2023]

Cited by Other Article(s)

Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023;14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023] Open

Abstract

BACKGROUND

Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors.

RESULTS

We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.

CONCLUSION

McClintock (https://github.com/bergmanlab/mcclintock/) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
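
The abstract above ranks detectors by recall and precision on simulated insertions. As a rough illustration only (this is not McClintock's evaluation code; the function name `evaluate_calls`, the tuple layout, and the 5 bp matching window are all assumptions made for this sketch), such a comparison could look like:

```python
from bisect import bisect_left


def evaluate_calls(truth, calls, window=5):
    """Return (precision, recall) for predicted insertion sites.

    truth, calls: iterables of (chromosome, position, te_family) tuples.
    A call is a true positive when a not-yet-matched simulated site of the
    same family lies within `window` bp on the same chromosome.
    The window size is a hypothetical choice, not a published setting.
    """
    # Group simulated positions by (chromosome, family) and sort them.
    remaining = {}
    for chrom, pos, fam in truth:
        remaining.setdefault((chrom, fam), []).append(pos)
    for positions in remaining.values():
        positions.sort()
    n_truth = sum(len(v) for v in remaining.values())

    true_pos = 0
    n_calls = 0
    for chrom, pos, fam in calls:
        n_calls += 1
        positions = remaining.get((chrom, fam), [])
        # First simulated site at or after the left edge of the window.
        i = bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            positions.pop(i)  # each simulated site can be matched only once
            true_pos += 1

    precision = true_pos / n_calls if n_calls else 0.0
    recall = true_pos / n_truth if n_truth else 0.0
    return precision, recall
```

Matching each simulated site at most once keeps duplicate calls at the same locus from inflating precision; a stricter or looser window changes how much positional imprecision is tolerated.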

Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of short-read transposable element detectors and species-wide data mining of insertion patterns in yeast. bioRxiv 2023:2023.02.13.528343. [PMID: 36824955 PMCID: PMC9948991 DOI: 10.1101/2023.02.13.528343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]

Abstract

Background

Results

We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast.

Conclusion

Yan H, Haak DC, Li S, Huang L, Bombarely A. Exploring transposable element-based markers to identify allelic variations underlying agronomic traits in rice. Plant Commun 2022;3:100270. [PMID: 35576152 PMCID: PMC9251385 DOI: 10.1016/j.xplc.2021.100270] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/26/2021] [Revised: 10/29/2021] [Accepted: 12/16/2021] [Indexed: 06/10/2023]

Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022;2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]

Han S, Basting PJ, Dias GB, Luhur A, Zelhof AC, Bergman CM. Transposable element profiles reveal cell line identity and loss of heterozygosity in Drosophila cell culture. Genetics 2021;219:6321957. [PMID: 34849875 PMCID: PMC8633141 DOI: 10.1093/genetics/iyab113] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/01/2021] [Accepted: 07/01/2021] [Indexed: 11/28/2022] Open

Genomic diversity generated by a transposable element burst in a rice recombinant inbred population. Proc Natl Acad Sci U S A 2020;117:26288-26297. [PMID: 33020276 PMCID: PMC7584900 DOI: 10.1073/pnas.2015736117] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023] Open

Bogaerts-Márquez M, Barrón MG, Fiston-Lavier AS, Vendrell-Mir P, Castanera R, Casacuberta JM, González J. T-lex3: an accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data. Bioinformatics 2020;36:1191-1197. [PMID: 31580402 PMCID: PMC7703783 DOI: 10.1093/bioinformatics/btz727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Revised: 09/16/2019] [Accepted: 09/25/2019] [Indexed: 12/22/2022] Open

Nandety RS, Serrani-Yarce JC, Gill US, Oh S, Lee H, Zhang X, Dai X, Zhang W, Krom N, Wen J, Zhao PX, Mysore KS. Insertional mutagenesis of Brachypodium distachyon using the Tnt1 retrotransposable element. Plant J 2020;103:1924-1936. [PMID: 32410353 PMCID: PMC7496502 DOI: 10.1111/tpj.14813] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/26/2019] [Revised: 04/29/2020] [Accepted: 05/05/2020] [Indexed: 06/11/2023]

Miniature inverted-repeat transposable elements (MITEs), derived insertional polymorphism as a tool of marker systems for molecular plant breeding. Mol Biol Rep 2020;47:3155-3167. [PMID: 32162128 DOI: 10.1007/s11033-020-05365-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/18/2019] [Accepted: 02/29/2020] [Indexed: 12/20/2022]

Macko-Podgórni A, Stelmach K, Kwolek K, Grzebelus D. Stowaway miniature inverted repeat transposable elements are important agents driving recent genomic diversity in wild and cultivated carrot. Mob DNA 2019;10:47. [PMID: 31798695 PMCID: PMC6881990 DOI: 10.1186/s13100-019-0190-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/23/2019] [Accepted: 11/21/2019] [Indexed: 01/02/2023] Open

Abstract

BACKGROUND

Miniature inverted repeat transposable elements (MITEs) are small non-autonomous DNA transposons that are ubiquitous in plant genomes, and are mobilised by their autonomous relatives. Stowaway MITEs are derived from and mobilised by elements from the mariner superfamily. Those elements constitute a significant portion of the carrot genome; however the variation caused by Daucus carota Stowaway MITEs (DcStos), their association with genes and their putative impact on genome evolution has not been comprehensively analysed.

RESULTS

Fourteen families of Stowaway elements (DcStos) occupy about 0.5% of the carrot genome. We systematically analysed 31 genomes of wild and cultivated Daucus carota, yielding 18.5 thousand copies of these elements, showing remarkable insertion site polymorphism. DcSto element demography differed based on the origin of the host populations, and corresponded with the four major groups of D. carota: wild European, wild Asian, eastern cultivated and western cultivated. The DcSto elements were associated with genes, and most frequently occurred in 5' and 3' untranslated regions (UTRs). Individual families differed in their propensity to reside in particular segments of genes. Most importantly, DcSto copies in the 2 kb regions up- and downstream of genes were more frequently associated with open reading frames encoding transcription factors, suggesting their possible functional impact. More than 1.5% of all DcSto insertion sites in different host genomes contained different copies in exactly the same position, indicating the existence of insertional hotspots. The DcSto7b family was much more polymorphic than the other families in cultivated carrot. A line of evidence pointed at its activity in the course of carrot domestication, and identified Dcmar1 as an active carrot mariner element and a possible source of the transposition machinery for DcSto7b.

CONCLUSION

Stowaway MITEs have made a substantial contribution to the structural and functional variability of the carrot genome.

Bae J, Lee KW, Islam MN, Yim HS, Park H, Rho M. iMGEins: detecting novel mobile genetic elements inserted in individual genomes. BMC Genomics 2018;19:944. [PMID: 30563451 PMCID: PMC6299635 DOI: 10.1186/s12864-018-5290-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/03/2018] [Accepted: 11/20/2018] [Indexed: 11/10/2022] Open

Serrato-Capuchina A, Matute DR. The Role of Transposable Elements in Speciation. Genes (Basel) 2018;9:E254. [PMID: 29762547 PMCID: PMC5977194 DOI: 10.3390/genes9050254] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 01/30/2018] [Revised: 04/26/2018] [Accepted: 04/26/2018] [Indexed: 01/20/2023] Open

Tracking the genome-wide outcomes of a transposable element burst over decades of amplification. Proc Natl Acad Sci U S A 2017;114:E10550-E10559. [PMID: 29158416 PMCID: PMC5724284 DOI: 10.1073/pnas.1716459114] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 12/30/2022] Open

Abstract

Rice (Oryza sativa) has a unique combination of attributes that made it an ideal host to track the natural behavior of very active transposable elements (TEs) over generations. In this study, we have exploited its small genome and propagation by self or sibling pollination to identify and characterize two strain pairs, EG4/HEG4 and A119/A123, undergoing bursts of the nonautonomous miniature inverted repeat transposable element mPing. Comparative sequence analyses of these strains have advanced our understanding of (i) factors that contribute to sustaining a TE burst for decades, (ii) features that distinguish a natural TE burst from bursts in cell culture or mutant backgrounds, and (iii) the extent to which TEs can rapidly diversify the genome of an inbred organism.

To understand the success strategies of transposable elements (TEs) that attain high copy numbers, we analyzed two pairs of rice (Oryza sativa) strains, EG4/HEG4 and A119/A123, undergoing decades of rapid amplification (bursts) of the class 2 autonomous Ping element and the nonautonomous miniature inverted repeat transposable element (MITE) mPing. Comparative analyses of whole-genome sequences of the two strain pairs validated that each pair has been maintained for decades as inbreds since divergence from their respective last common ancestor. Strains EG4 and HEG4 differ by fewer than 160 SNPs and a total of 264 new mPing insertions. Similarly, strains A119 and A123 exhibited about half as many SNPs (277) as new mPing insertions (518). Examination of all other potentially active TEs in these genomes revealed only a single new insertion out of ∼40,000 loci surveyed. The virtual absence of any new TE insertions in these strains outside the mPing bursts demonstrates that the Ping/mPing family gradually attains high copy numbers by maintaining activity and evading host detection for dozens of generations. Evasion is possible because host recognition of mPing sequences appears to have no impact on initiation or maintenance of the burst. Ping is actively transcribed, and both Ping and mPing can transpose despite methylation of terminal sequences. This finding suggests that an important feature of MITE success is that host recognition does not lead to the silencing of the source of transposase.

McClintock: An Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data. G3 (Bethesda) 2017. [PMID: 28637810 PMCID: PMC5555480 DOI: 10.1534/g3.117.043893] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/31/2022]

Zhang S, Kelleher ES. Targeted identification of TE insertions in a Drosophila genome through hemi-specific PCR. Mob DNA 2017;8:10. [PMID: 28775768 PMCID: PMC5534036 DOI: 10.1186/s13100-017-0092-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/14/2017] [Accepted: 07/10/2017] [Indexed: 11/10/2022] Open

Treiber CD, Waddell S. Resolving the prevalence of somatic transposition in Drosophila. eLife 2017;6. [PMID: 28742021 PMCID: PMC5553932 DOI: 10.7554/elife.28297] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 05/02/2017] [Accepted: 07/21/2017] [Indexed: 11/13/2022] Open

Chen J, Wrightsman TR, Wessler SR, Stajich JE. RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing. PeerJ 2017;5:e2942. [PMID: 28149701 PMCID: PMC5274521 DOI: 10.7717/peerj.2942] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 09/20/2016] [Accepted: 12/26/2016] [Indexed: 12/26/2022] Open

Abstract

Background

Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation is underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools.

Methods

We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects target site duplication (TSD) of TE insertions allowing it to report TE polymorphism loci with single base pair precision.

Results and Discussion

The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with a false positive rate below 0.1% using 10-fold genome coverage resequencing data. RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.
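
The Methods paragraph above describes using TE-containing reads as seeds and detecting the target site duplication (TSD) to place an insertion at single base pair resolution. The toy sketch below illustrates that idea only; it is not RelocaTE2's implementation. All function names are hypothetical, flanks are placed with exact string search instead of a real aligner, and the TE terminus seed length and maximum TSD length are assumed values.

```python
def left_flank(read, te, seed=6):
    """Genomic sequence preceding the 5' end of the TE in a junction read."""
    i = read.find(te[:seed])
    return read[:i] if i > 0 else None


def right_flank(read, te, seed=6):
    """Genomic sequence following the 3' end of the TE in a junction read."""
    i = read.find(te[-seed:])
    if i == -1 or i + seed >= len(read):
        return None
    return read[i + seed:]


def call_insertion(reference, te, left_read, right_read, max_tsd=20):
    """Return (tsd_start, tsd_end, tsd_sequence), 0-based inclusive, or None.

    The upstream flank ends with the TSD and the downstream flank starts
    with it, so on the reference the two flanks overlap by the TSD length.
    """
    up = left_flank(left_read, te)
    down = right_flank(right_read, te)
    if not up or not down:
        return None
    up_start = reference.find(up)      # toy stand-in for read mapping
    down_start = reference.find(down)
    if up_start == -1 or down_start == -1:
        return None
    up_end = up_start + len(up) - 1    # last base of the upstream flank
    tsd_len = up_end - down_start + 1  # overlap of the two flanks
    if tsd_len <= 0 or tsd_len > max_tsd:
        return None
    return down_start, up_end, reference[down_start:up_end + 1]


# Toy data: a 3 bp TSD ("TAA") at reference positions 8-10.
reference = "ACGTTGCATAAGGCTTACCGA"
te = "GGCCAGTCACAATGG"                    # made-up element sequence
left_read = reference[3:11] + te[:6]      # flank + TSD, then TE 5' end
right_read = te[-6:] + reference[8:16]    # TE 3' end, then TSD + flank
result = call_insertion(reference, te, left_read, right_read)
```

Because the duplicated target site appears on both sides of the element, the overlap between the two mapped flanks gives both the insertion coordinate and the TSD sequence in one step.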

Kang H, Zhu D, Lin R, Opiyo SO, Jiang N, Shiu SH, Wang GL. A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads. DNA Res 2016;23:241-51. [PMID: 27098848 PMCID: PMC4909310 DOI: 10.1093/dnares/dsw011] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/16/2015] [Accepted: 02/21/2016] [Indexed: 11/16/2022] Open

Ecovoiu AA, Ghionoiu IC, Ciuca AM, Ratiu AC. Genome ARTIST: a robust, high-accuracy aligner tool for mapping transposon insertions and self-insertions. Mob DNA 2016;7:3. [PMID: 26855675 PMCID: PMC4744444 DOI: 10.1186/s13100-016-0061-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/02/2015] [Accepted: 01/19/2016] [Indexed: 01/16/2023] Open

Abstract

Background

A critical topic of insertional mutagenesis experiments performed on model organisms is mapping the hits of artificial transposons (ATs) at nucleotide level accuracy. Mapping errors may occur when sequencing artifacts or mutations such as single nucleotide polymorphisms (SNPs) and small indels are present very close to the junction between a genomic sequence and a transposon inverted repeat (TIR). Another particular issue in insertional mutagenesis is the mapping of transposon self-insertions and, to the best of our knowledge, there is no publicly available mapping tool designed to analyze such molecular events.

Results

We developed Genome ARTIST, a pairwise gapped aligner tool which addresses both issues by means of an original, robust mapping strategy. Genome ARTIST is not designed to use next-generation sequencing (NGS) data but to analyze AT insertions obtained in small to medium-scale mutagenesis experiments. Genome ARTIST employs a heuristic approach to find DNA sequence similarities and harnesses a multi-step implementation of a Smith-Waterman adapted algorithm to compute the mapping alignments. The experience is enhanced by easily customizable parameters and a user-friendly interface that describes the genomic landscape surrounding the insertion. Genome ARTIST is functional with many genomes of bacteria and eukaryotes available in Ensembl and GenBank repositories. Our tool specifically harnesses the sequence annotation data provided by FlyBase for Drosophila melanogaster (the fruit fly), which enables mapping of insertions relative to various genomic features such as natural transposons. Genome ARTIST was tested against other alignment tools using relevant query sequences derived from the D. melanogaster and Mus musculus (mouse) genomes. Real and simulated query sequences were also compared, revealing that Genome ARTIST is a very robust solution for mapping transposon insertions.

Conclusions

Genome ARTIST is a stand-alone user-friendly application, designed for high-accuracy mapping of transposon insertions and self-insertions. The tool is also useful for routine aligning assessments like detection of SNPs or checking the specificity of primers and probes. Genome ARTIST is open-source software and is available for download at www.genomeartist.ro and at GitHub (https://github.com/genomeartist/genomeartist).

Electronic supplementary material

The online version of this article (doi:10.1186/s13100-016-0061-0) contains supplementary material, which is available to authorized users.
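
The abstract above mentions a multi-step, adapted Smith-Waterman algorithm. For orientation, here is the plain textbook Smith-Waterman local alignment with a linear gap penalty; it is not Genome ARTIST's adapted variant, and the scoring values are arbitrary assumptions chosen for the sketch.

```python
def smith_waterman(query, target, match=2, mismatch=-1, gap=-2):
    """Textbook local alignment; returns (score, aligned_query, aligned_target)."""
    rows, cols = len(query) + 1, len(target) + 1
    # H[i][j] = best score of a local alignment ending at query[i-1], target[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + sub, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    aligned_q, aligned_t = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if query[i - 1] == target[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            aligned_q.append(query[i - 1])
            aligned_t.append(target[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_t.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_t.append(target[j - 1])
            j -= 1
    return best, "".join(reversed(aligned_q)), "".join(reversed(aligned_t))
```

Local alignment suits junction mapping because only part of a query (the genomic flank or the transposon end) is expected to match a given reference sequence.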

Nicolas J, Peterlongo P, Tempel S. Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2016;1374:293-337. [PMID: 26519414 DOI: 10.1007/978-1-4939-3167-5_17] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 06/05/2023]

Abstract

Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.

Ewing AD. Transposable element detection from whole genome sequence data. Mob DNA 2015;6:24. [PMID: 26719777 PMCID: PMC4696183 DOI: 10.1186/s13100-015-0055-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 09/15/2015] [Accepted: 12/21/2015] [Indexed: 11/25/2022] Open

Hawkey J, Hamidian M, Wick RR, Edwards DJ, Billman-Jacobe H, Hall RM, Holt KE. ISMapper: identifying transposase insertion sites in bacterial genomes from short read sequence data. BMC Genomics 2015;16:667. [PMID: 26336060 PMCID: PMC4558774 DOI: 10.1186/s12864-015-1860-2] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 03/09/2015] [Accepted: 08/18/2015] [Indexed: 11/23/2022] Open

Abstract

Background

Insertion sequences (IS) are small transposable elements, commonly found in bacterial genomes. Identifying the location of IS in bacterial genomes can be useful for a variety of purposes including epidemiological tracking and predicting antibiotic resistance. However, IS are commonly present in multiple copies in a single genome, which complicates genome assembly and the identification of IS insertion sites. Here we present ISMapper, a mapping-based tool for identification of the site and orientation of IS insertions in bacterial genomes, directly from paired-end short read data.

Results

ISMapper was validated using three types of short read data: (i) simulated reads from a variety of species, (ii) Illumina reads from 5 isolates for which finished genome sequences were available for comparison, and (iii) Illumina reads from 7 Acinetobacter baumannii isolates for which predicted IS locations were tested using PCR. A total of 20 genomes, including 13 species and 32 distinct IS, were used for validation. ISMapper correctly identified 97% of known IS insertions in the analysis of simulated reads, and 98% in real Illumina reads. Subsampling of real Illumina reads to lower depths indicated ISMapper was able to correctly detect insertions for average genome-wide read depths >20x, although read depths >50x were required to obtain confident calls that were highly supported by evidence from reads. All ISAba1 insertions identified by ISMapper in the A. baumannii genomes were confirmed by PCR. In each A. baumannii genome, ISMapper successfully identified an IS insertion upstream of the ampC beta-lactamase that could explain phenotypic resistance to third-generation cephalosporins. The utility of ISMapper was further demonstrated by profiling genome-wide IS6110 insertions in 138 publicly available Mycobacterium tuberculosis genomes, revealing lineage-specific insertions and multiple insertion hotspots.

Conclusions

ISMapper provides a rapid and robust method for identifying IS insertion sites directly from short read data, with a high degree of accuracy demonstrated across a wide range of bacteria.

Jiang C, Chen C, Huang Z, Liu R, Verdier J. ITIS, a bioinformatics tool for accurate identification of transposon insertion sites using next-generation sequencing data. BMC Bioinformatics 2015;16:72. [PMID: 25887332 PMCID: PMC4351942 DOI: 10.1186/s12859-015-0507-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 11/06/2014] [Accepted: 02/20/2015] [Indexed: 08/30/2023] Open

Abstract

Background

Transposable elements constitute an important part of the genome and are essential in adaptive mechanisms. Transposition events associated with phenotypic changes occur naturally or are induced in insertional mutant populations. Transposon mutagenesis results in multiple random insertions, and recovery of most or all of the insertions is critical for forward genetic studies. Using genome next-generation sequencing data and an appropriate bioinformatics tool, it is possible to accurately identify transposon insertion sites, which could provide candidate causal mutations for desired phenotypes for further functional validation.

Results

We developed a novel bioinformatics tool, ITIS (Identification of Transposon Insertion Sites), for localizing transposon insertion sites within a genome. It takes next-generation genome re-sequencing data (NGS data), transposon sequence, and reference genome sequence as input, and generates a list of highly reliable candidate insertion sites as well as zygosity information of each insertion. Using a simulated dataset and a case study based on an insertional mutant line from Medicago truncatula, we showed that ITIS performed better in terms of sensitivity and specificity than other similar algorithms such as RelocaTE, RetroSeq, TEMP and TIF. With the case study data, we demonstrated the efficiency of ITIS by validating the presence and zygosity of predicted insertion sites of the Tnt1 transposon within a complex plant system, M. truncatula.

Conclusion

This study showed that ITIS is a robust and powerful tool for forward genetic studies in identifying transposable element insertions causing phenotypes. ITIS is suitable in various systems such as cell culture, bacteria, yeast, insect, mammal and plant.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-015-0507-2) contains supplementary material, which is available to authorized users.
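
The abstract above states that ITIS reports zygosity for each candidate insertion but does not say how. One common way to approach this, shown here purely as an assumption and not as ITIS's actual method, is to compare reads supporting the insertion junction with reads spanning the intact reference site. The function name and both thresholds are hypothetical.

```python
def classify_zygosity(insertion_reads, reference_reads, min_reads=3, hom_fraction=0.9):
    """Classify an insertion site from read counts.

    insertion_reads: reads spanning a genome/transposon junction at the site.
    reference_reads: reads spanning the uninterrupted reference site.
    Thresholds are illustrative assumptions, not published defaults.
    """
    if insertion_reads < min_reads:
        return "no_call"  # too little evidence for the insertion itself
    fraction = insertion_reads / (insertion_reads + reference_reads)
    return "homozygous" if fraction >= hom_fraction else "heterozygous"
```

For example, a site with 20 junction reads and no reference-spanning reads would be called homozygous under these assumed thresholds, while 10 junction reads against 9 reference reads would be called heterozygous.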

Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genet 2015;11:e1004915. [PMID: 25569788 PMCID: PMC4287451 DOI: 10.1371/journal.pgen.1004915] [Citation(s) in RCA: 245] [Impact Index Per Article: 27.2] [Received: 08/14/2014] [Accepted: 11/24/2014] [Indexed: 11/25/2022] Open

Abstract

Transposable elements (TEs) account for a large portion of the genome in many eukaryotic species. Despite their reputation as “junk” DNA or genomic parasites deleterious for the host, TEs have complex interactions with host genes and the potential to contribute to regulatory variation in gene expression. It has been hypothesized that TEs and genes they insert near may be transcriptionally activated in response to stress conditions. The maize genome, with many different types of TEs interspersed with genes, provides an ideal system to study the genome-wide influence of TEs on gene regulation. To analyze the magnitude of the TE effect on gene expression response to environmental changes, we profiled gene and TE transcript levels in maize seedlings exposed to a number of abiotic stresses. Many genes exhibit up- or down-regulation in response to these stress conditions. The analysis of TE families inserted within upstream regions of up-regulated genes revealed that between four and nine different TE families are associated with up-regulated gene expression in each of these stress conditions, affecting up to 20% of the genes up-regulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress. Expression of many of these same TE families also responds to the same stress conditions. The analysis of the stress-induced transcripts and proximity of the transposon to the gene suggests that these TEs may provide local enhancer activities that stimulate stress-responsive gene expression. Our data on allelic variation for insertions of several of these TEs show strong correlation between the presence of TE insertions and stress-responsive up-regulation of gene expression. Our findings suggest that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize.

Transposable elements are mobile DNA elements that are a prevalent component of many eukaryotic genomes. While transposable elements can often have deleterious effects through insertions into protein-coding genes, they may also contribute to regulatory variation of gene expression. There are a handful of examples in which specific transposon insertions contribute to regulatory variation of nearby genes, particularly in response to environmental stress. We sought to understand the genome-wide influence of transposable elements on gene expression responses to abiotic stress in maize, a plant with many families of transposable elements located between genes. Our analysis suggests that a small number of maize transposable element families may contribute to the response of nearby genes to abiotic stress by providing stress-responsive enhancer-like functions. The specific insertions of transposable elements are often polymorphic within a species. Our data demonstrate that allelic variation for insertions of the transposable elements associated with stress-responsive expression can contribute to variation in the regulation of nearby genes. Thus, novel insertions of transposable elements provide a potential mechanism for genes to acquire cis-regulatory influences that could contribute to heritable variation for stress response.

Fiston-Lavier AS, Barrón MG, Petrov DA, González J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic Acids Res 2014;43:e22. [PMID: 25510498 PMCID: PMC4344482 DOI: 10.1093/nar/gku1250] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 12/04/2022] Open

Gilly A, Etcheverry M, Madoui MA, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V, Aury JM. TE-Tracker: systematic identification of transposition events through whole-genome resequencing. BMC Bioinformatics 2014;15:377. [PMID: 25408240 PMCID: PMC4279814 DOI: 10.1186/s12859-014-0377-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 07/09/2014] [Accepted: 11/05/2014] [Indexed: 11/10/2022] Open

Abstract

Background

Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in the cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation sequencing have been described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements.

Results

We present TE-Tracker, a general and accurate computational method for the de novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.
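For readers comparing the two metrics named in this abstract, the following is a minimal sketch of how positive predictive value and sensitivity are conventionally computed from a set of predicted insertion sites and a set of known true sites. It is not TE-Tracker's own evaluation code: the function names, the exact-match comparison of sites, and the example coordinates are all assumptions made for illustration.

```python
from typing import Hashable, Set, Tuple


def ppv(true_pos: int, false_pos: int) -> float:
    """Positive predictive value: fraction of reported calls that are real."""
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Sensitivity (recall): fraction of real events that were reported."""
    real = true_pos + false_neg
    return true_pos / real if real else 0.0


def evaluate(predicted: Set[Hashable], truth: Set[Hashable]) -> Tuple[float, float]:
    """Return (PPV, sensitivity) for exact-match site comparison.

    Exact matching is a simplifying assumption; real benchmarks usually
    allow a positional tolerance around each insertion site.
    """
    tp = len(predicted & truth)   # called and real
    fp = len(predicted - truth)   # called but not real
    fn = len(truth - predicted)   # real but missed
    return ppv(tp, fp), sensitivity(tp, fn)


if __name__ == "__main__":
    # Hypothetical (chromosome, position) sites, for illustration only.
    calls = {("chr1", 100), ("chr1", 200), ("chr2", 50)}
    known = {("chr1", 100), ("chr2", 50), ("chr3", 7), ("chr4", 9)}
    print(evaluate(calls, known))
```

A tool that reports few sites can score a high PPV with low sensitivity, and a tool that reports many can show the reverse, which is why the abstract gives both figures.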

Conclusions

We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-014-0377-z) contains supplementary material, which is available to authorized users.

Affiliation(s)

Arthur Gilly: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France; Current address: The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
Mathilde Etcheverry: Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France.
Mohammed-Amin Madoui: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
Julie Guy: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
Leandro Quadrana: Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France.
Adriana Alberti: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
Antoine Martin: Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France; Current address: Technische Universität Dresden, Institute of Botany, Plant Cell and Molecular Biology, D-01062, Dresden, Germany.
Tony Heitkam: Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France; Current address: Laboratoire de Biochimie et Physiologie Moléculaire des Plantes, Institut de Biologie Intégrative des Plantes 'Claude Grignon', UMR CNRS/INRA/SupAgro/UM2, Place Viala, 34060, Montpellier, Cedex, France.
Stefan Engelen: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
Karine Labadie: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
Jeremie Le Pen: Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France; Current address: Gurdon Institute and Department of Biochemistry, University of Cambridge, The Henry Wellcome Building of Cancer and Developmental Biology, Tennis Court Rd, Cambridge, CB2 1QN, UK.
Patrick Wincker: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
Vincent Colot: Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France.
Jean-Marc Aury: Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.

Barrón MG, Fiston-Lavier AS, Petrov DA, González J. Population genomics of transposable elements in Drosophila. Annu Rev Genet 2014;48:561-81. [PMID: 25292358 DOI: 10.1146/annurev-genet-120213-092359] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Indexed: 11/09/2022]

McCoy RC, Taylor RW, Blauwkamp TA, Kelley JL, Kertesz M, Pushkarev D, Petrov DA, Fiston-Lavier AS. Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS One 2014;9:e106689. [PMID: 25188499 PMCID: PMC4154752 DOI: 10.1371/journal.pone.0106689] [Citation(s) in RCA: 158] [Impact Index Per Article: 15.8] [Received: 06/17/2014] [Accepted: 07/24/2014] [Indexed: 11/18/2022] Open

Tobias PA, Guest DI. Tree immunity: growing old without antibodies. Trends Plant Sci 2014;19:367-70. [PMID: 24556378 DOI: 10.1016/j.tplants.2014.01.011] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 09/06/2013] [Revised: 01/18/2014] [Accepted: 01/21/2014] [Indexed: 05/04/2023]

Vitte C, Fustier MA, Alix K, Tenaillon MI. The bright side of transposons in crop evolution. Brief Funct Genomics 2014;13:276-95. [PMID: 24681749 DOI: 10.1093/bfgp/elu002] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Indexed: 11/13/2022] Open

Nakagome M, Solovieva E, Takahashi A, Yasue H, Hirochika H, Miyao A. Transposon Insertion Finder (TIF): a novel program for detection of de novo transpositions of transposable elements. BMC Bioinformatics 2014;15:71. [PMID: 24629057 PMCID: PMC4004357 DOI: 10.1186/1471-2105-15-71] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 09/05/2013] [Accepted: 03/06/2014] [Indexed: 11/10/2022] Open