1
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023; 14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023]
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
Affiliation(s)
- Jingxuan Chen
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Shunhua Han
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- David J. Garfinkel
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA USA
- Casey M. Bergman
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Department of Genetics, University of Georgia, Athens, GA USA
2
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of short-read transposable element detectors and species-wide data mining of insertion patterns in yeast. bioRxiv 2023:2023.02.13.528343. [PMID: 36824955 PMCID: PMC9948991 DOI: 10.1101/2023.02.13.528343] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/17/2023]
Abstract
Background Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute or evaluate multiple TE insertion detectors. Results We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast.
Conclusion McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.
3
Yan H, Haak DC, Li S, Huang L, Bombarely A. Exploring transposable element-based markers to identify allelic variations underlying agronomic traits in rice. Plant Communications 2022; 3:100270. [PMID: 35576152 PMCID: PMC9251385 DOI: 10.1016/j.xplc.2021.100270] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/26/2021] [Revised: 10/29/2021] [Accepted: 12/16/2021] [Indexed: 06/10/2023]
Abstract
Transposable elements (TEs) are a major force in the production of new alleles during domestication; nevertheless, their use in association studies has been limited because of their complexity. We have developed a TE genotyping pipeline (TEmarker) and applied it to whole-genome genome-wide association study (GWAS) data from 176 Oryza sativa subsp. japonica accessions to identify genetic elements associated with specific agronomic traits. TE markers recovered a large proportion (69%) of single-nucleotide polymorphism (SNP)-based GWAS peaks, and these TE peaks retained ca. 25% of the SNPs. The use of TEs in GWASs may reduce false positives associated with linkage disequilibrium (LD) among SNP markers. A genome scan revealed positive selection on TEs associated with agronomic traits. We found several cases of insertion and deletion variants that potentially resulted from the direct action of TEs, including an allele of LOC_Os11g08410 associated with plant height and panicle length traits. Together, these findings reveal the utility of TE markers for connecting genotype to phenotype and suggest a potential role for TEs in influencing phenotypic variations in rice that impact agronomic traits.
Affiliation(s)
- Haidong Yan
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- David C Haak
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA; Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24061, USA
- Song Li
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA; Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24061, USA
- Linkai Huang
- Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu 611130, China
- Aureliano Bombarely
- Department of Bioscience, Università degli Studi di Milano (UNIMI), 20133 Milano, Italy; Instituto de Biología Molecular y Celular de Plantas (IBMCP), UPV-CSIC, 46022 Valencia, Spain.
4
Finding and Characterizing Repeats in Plant Genomes. Methods in Molecular Biology (Clifton, N.J.) 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some families of TEs, from a set of sequences known to belong to this family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and check their validity.
5
Han S, Basting PJ, Dias GB, Luhur A, Zelhof AC, Bergman CM. Transposable element profiles reveal cell line identity and loss of heterozygosity in Drosophila cell culture. Genetics 2021; 219:6321957. [PMID: 34849875 PMCID: PMC8633141 DOI: 10.1093/genetics/iyab113] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/01/2021] [Accepted: 07/01/2021] [Indexed: 11/28/2022]
Abstract
Cell culture systems allow key insights into biological mechanisms yet suffer from irreproducible outcomes in part because of cross-contamination or mislabeling of cell lines. Cell line misidentification can be mitigated by the use of genotyping protocols, which have been developed for human cell lines but are lacking for many important model species. Here, we leverage the classical observation that transposable elements (TEs) proliferate in cultured Drosophila cells to demonstrate that genome-wide TE insertion profiles can reveal the identity and provenance of Drosophila cell lines. We identify multiple cases where TE profiles clarify the origin of Drosophila cell lines (Sg4, mbn2, and OSS_E) relative to published reports, and also provide evidence that insertions from only a subset of long-terminal repeat retrotransposon families are necessary to mark Drosophila cell line identity. We also develop a new bioinformatics approach to detect TE insertions and estimate intra-sample allele frequencies in legacy whole-genome sequencing data (called ngs_te_mapper2), which revealed loss of heterozygosity as a mechanism shaping the unique TE profiles that identify Drosophila cell lines. Our work contributes to the general understanding of the forces impacting metazoan genomes as they evolve in cell culture and paves the way for high-throughput protocols that use TE insertions to authenticate cell lines in Drosophila and other organisms.
Affiliation(s)
- Shunhua Han
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Preston J Basting
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA
- Guilherme B Dias
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; Department of Genetics, University of Georgia, Athens, GA 30602, USA
- Arthur Luhur
- Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA; Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Andrew C Zelhof
- Drosophila Genomics Resource Center, Indiana University, Bloomington, IN 47405, USA; Department of Biology, Indiana University, Bloomington, IN 47405, USA
- Casey M Bergman
- Department of Genetics and Institute of Bioinformatics, University of Georgia, Athens, GA 30602, USA; Department of Genetics, University of Georgia, Athens, GA 30602, USA
6
Genomic diversity generated by a transposable element burst in a rice recombinant inbred population. Proc Natl Acad Sci U S A 2020; 117:26288-26297. [PMID: 33020276 PMCID: PMC7584900 DOI: 10.1073/pnas.2015736117] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023]
Abstract
Genomes of all characterized higher eukaryotes harbor examples of transposable element (TE) bursts: the rapid amplification of TE copies throughout a genome. Despite their prevalence, understanding how bursts diversify genomes requires the characterization of actively transposing TEs before insertion sites and structural rearrangements have been obscured by selection acting over evolutionary time. In this study, rice recombinant inbred lines (RILs), generated by crossing a bursting accession and the reference Nipponbare accession, were exploited to characterize the spread of the very active Ping/mPing family through a small population and the resulting impact on genome diversity. Comparative sequence analysis of 272 individuals led to the identification of over 14,000 new insertions of the mPing miniature inverted-repeat transposable element (MITE), with no evidence for silencing of the transposase-encoding Ping element. In addition to new insertions, Ping-encoded transposase was found to preferentially catalyze the excision of mPing loci tightly linked to a second mPing insertion. Similarly, structural variations, including deletion of rice exons or regulatory regions, were enriched for those with break points at one or both ends of linked mPing elements. Taken together, these results indicate that structural variations are generated during a TE burst as transposase catalyzes both the high copy numbers needed to distribute linked elements throughout the genome and the DNA cuts at the TE ends known to dramatically increase the frequency of recombination.
7
Bogaerts-Márquez M, Barrón MG, Fiston-Lavier AS, Vendrell-Mir P, Castanera R, Casacuberta JM, González J. T-lex3: an accurate tool to genotype and estimate population frequencies of transposable elements using the latest short-read whole genome sequencing data. Bioinformatics 2020; 36:1191-1197. [PMID: 31580402 PMCID: PMC7703783 DOI: 10.1093/bioinformatics/btz727] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/28/2019] [Revised: 09/16/2019] [Accepted: 09/25/2019] [Indexed: 12/22/2022]
Abstract
Motivation Transposable elements (TEs) constitute a significant proportion of the majority of genomes sequenced to date. TEs are responsible for a considerable fraction of the genetic variation within and among species. Accurate genotyping of TEs in genomes is therefore crucial for a complete identification of the genetic differences among individuals, populations and species. Results In this work, we present a new version of T-lex, a computational pipeline that accurately genotypes and estimates the population frequencies of reference TE insertions using short-read high-throughput sequencing data. In this new version, we have re-designed the T-lex algorithm to integrate the BWA-MEM short-read aligner, which is one of the most accurate short-read mappers and can be launched on longer short-reads (e.g. reads >150 bp). We have added new filtering steps to increase the accuracy of the genotyping, and new parameters that allow the user to control both the minimum and maximum number of reads, and the minimum number of strains to genotype a TE insertion. We also showed for the first time that T-lex3 provides accurate TE calls in a plant genome. Availability and implementation To test the accuracy of T-lex3, we called 1630 individual TE insertions in Drosophila melanogaster, 1600 individual TE insertions in humans, and 3067 individual TE insertions in the rice genome. We showed that this new version of T-lex is a broadly applicable and accurate tool for genotyping and estimating TE frequencies in organisms with different genome sizes and different TE contents. T-lex3 is available at Github: https://github.com/GonzalezLab/T-lex3. Supplementary information Supplementary data are available at Bioinformatics online.
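The read-count and strain-count thresholds mentioned in this abstract can be illustrated with a small sketch. This is not T-lex3's implementation: the function name `te_frequency`, the data layout, and the default thresholds are all hypothetical, chosen only to show how such filters might gate a population frequency estimate.

```python
from typing import Dict, Optional, Tuple

def te_frequency(
    calls: Dict[str, Tuple[str, int]],
    min_reads: int = 3,      # hypothetical default, not a T-lex3 value
    max_reads: int = 90,     # hypothetical default, not a T-lex3 value
    min_strains: int = 2,    # hypothetical default, not a T-lex3 value
) -> Optional[float]:
    """Estimate the population frequency of one TE insertion.

    `calls` maps a strain name to (genotype, supporting_read_count),
    where genotype is "present" or "absent". Calls whose read count
    falls outside [min_reads, max_reads] are discarded. If fewer than
    `min_strains` usable calls remain, no estimate is returned.
    """
    usable = [
        genotype
        for genotype, n_reads in calls.values()
        if genotype in ("present", "absent") and min_reads <= n_reads <= max_reads
    ]
    if len(usable) < min_strains:
        return None
    return usable.count("present") / len(usable)

example_calls = {
    "strain_1": ("present", 10),
    "strain_2": ("absent", 12),
    "strain_3": ("present", 1),   # too few reads, filtered out
    "strain_4": ("present", 20),
}
print(te_frequency(example_calls))
```

In this toy input, one call is dropped by the read filter, so the frequency is computed from the three remaining strains.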
Affiliation(s)
- María Bogaerts-Márquez
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain
- Maite G Barrón
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain
- Anna-Sophie Fiston-Lavier
- Institut des Sciences de l'Evolution de Montpellier (UMR 5554, CNRS-UM-IRD-EPHE), 11 Université de Montpellier, Place Eugène Bataillon, Montpellier, France
- Pol Vendrell-Mir
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Raúl Castanera
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Josep M Casacuberta
- Center for Research in Agricultural Genomics, CRAG (CSIC-IRTA-UAB-UB), Campus UAB, Cerdanyola del Vallès, Barcelona, Spain
- Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Paseo Maritimo Barceloneta 37-49, Barcelona, Spain
8
Nandety RS, Serrani‐Yarce JC, Gill US, Oh S, Lee H, Zhang X, Dai X, Zhang W, Krom N, Wen J, Zhao PX, Mysore KS. Insertional mutagenesis of Brachypodium distachyon using the Tnt1 retrotransposable element. The Plant Journal: for Cell and Molecular Biology 2020; 103:1924-1936. [PMID: 32410353 PMCID: PMC7496502 DOI: 10.1111/tpj.14813] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 09/26/2019] [Revised: 04/29/2020] [Accepted: 05/05/2020] [Indexed: 06/11/2023]
Abstract
Brachypodium distachyon is an annual C3 grass used as a monocot model system in functional genomics research. Insertional mutagenesis is a powerful tool for both forward and reverse genetics studies. In this study, we explored the possibility of using the tobacco retrotransposon Tnt1 to create a transposon-based insertion mutant population in B. distachyon. We developed transgenic B. distachyon plants expressing Tnt1 (R0) and in the subsequent regenerants (R1) we observed that Tnt1 actively transposed during somatic embryogenesis, generating an average of 6.37 insertions per line in a population of 19 independent R1 regenerant plants analyzed. In seed-derived progeny of R1 plants, Tnt1 segregated in a Mendelian ratio of 3:1 and no new Tnt1 transposition was observed. A total of 126 flanking sequence tags (FSTs) were recovered from the analyzed R0 and R1 lines. Analysis of the FSTs showed a uniform pattern of insertion in all the chromosomes (1-5) without any preference for a particular chromosome region. Considering the average length of a gene transcript to be 3.37 kb, we estimated that 29,613 lines are required to achieve a 90% probability of tagging a given gene in the B. distachyon genome using the Tnt1-based mutagenesis approach. Our results show the possibility of using Tnt1 to achieve near-saturation mutagenesis in B. distachyon, which will aid in functional genomics studies of other C3 grasses.
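The abstract does not state how the number of lines was derived. A commonly used relation for random insertional mutagenesis is P = 1 − (1 − x/G)^n, where x is the target gene length, G the genome size, and n the number of independent insertions. The sketch below applies that relation under stated assumptions: the formula itself, uniform random insertion, and a genome size of about 272 Mb are not given in the abstract, so the output should only be expected to land in the same range as the reported figure, not to match it.

```python
import math

def lines_for_tagging(
    prob: float,
    gene_len_bp: float,
    genome_size_bp: float,
    insertions_per_line: float,
) -> float:
    """Number of lines needed to hit a given gene with probability `prob`.

    Assumes insertions land uniformly at random across the genome, so
    that P = 1 - (1 - x/G)^n for n independent insertions.
    """
    per_insertion_hit = gene_len_bp / genome_size_bp
    insertions_needed = math.log(1.0 - prob) / math.log(1.0 - per_insertion_hit)
    return insertions_needed / insertions_per_line

# Gene length and insertions per line come from the abstract; the genome
# size is an assumption made for this illustration only.
ASSUMED_GENOME_SIZE_BP = 272_000_000
estimate = lines_for_tagging(0.90, 3_370, ASSUMED_GENOME_SIZE_BP, 6.37)
print(round(estimate))
```

The estimate is sensitive to the assumed genome size and to whether every insertion in a line is counted as independent, which may explain differences from the published value.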
Affiliation(s)
- Juan C. Serrani‐Yarce
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Present address: Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- Upinder S. Gill
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Present address: Department of Plant Pathology, North Dakota State University, Fargo, ND 58102, USA
- Sunhee Oh
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Hee‐Kyung Lee
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Xinji Zhang
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Xinbin Dai
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Wenchao Zhang
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Nick Krom
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Jiangqi Wen
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
- Patrick X. Zhao
- Noble Research Institute LLC, 2510 Sam Noble Parkway, Ardmore, OK 73401, USA
9
Miniature inverted-repeat transposable elements (MITEs), derived insertional polymorphism as a tool of marker systems for molecular plant breeding. Mol Biol Rep 2020; 47:3155-3167. [PMID: 32162128 DOI: 10.1007/s11033-020-05365-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/18/2019] [Accepted: 02/29/2020] [Indexed: 12/20/2022]
Abstract
Plant molecular breeding is expected to give significant gains in cultivar development through the development and utilization of suitable molecular marker systems for genetic diversity analysis, rapid DNA fingerprinting, identification of true hybrids, trait mapping and marker-assisted selection. Transposable elements (TEs) are the most abundant component of a genome and are being used as genetic markers in plant molecular breeding. Here, we review the highly abundant class II DNA TEs called "miniature inverted-repeat transposable elements" (MITEs). MITEs are ubiquitous, short, non-autonomous DNA transposable elements whose tendency to insert into genes and genic regions has paved the way for the development of functional DNA marker systems in plant genomes. This review summarises the characteristics of MITEs, the principles and methodologies for developing MITE-based DNA markers, bioinformatics tools and resources for plant MITE discovery, and their utilization in crop improvement.
10
Macko-Podgórni A, Stelmach K, Kwolek K, Grzebelus D. Stowaway miniature inverted repeat transposable elements are important agents driving recent genomic diversity in wild and cultivated carrot. Mob DNA 2019; 10:47. [PMID: 31798695 PMCID: PMC6881990 DOI: 10.1186/s13100-019-0190-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/23/2019] [Accepted: 11/21/2019] [Indexed: 01/02/2023]
Abstract
BACKGROUND Miniature inverted repeat transposable elements (MITEs) are small non-autonomous DNA transposons that are ubiquitous in plant genomes, and are mobilised by their autonomous relatives. Stowaway MITEs are derived from and mobilised by elements from the mariner superfamily. Those elements constitute a significant portion of the carrot genome; however the variation caused by Daucus carota Stowaway MITEs (DcStos), their association with genes and their putative impact on genome evolution has not been comprehensively analysed. RESULTS Fourteen families of Stowaway elements DcStos occupy about 0.5% of the carrot genome. We systematically analysed 31 genomes of wild and cultivated Daucus carota, yielding 18.5 thousand copies of these elements, showing remarkable insertion site polymorphism. DcSto element demography differed based on the origin of the host populations, and corresponded with the four major groups of D. carota, wild European, wild Asian, eastern cultivated and western cultivated. The DcStos elements were associated with genes, and most frequently occurred in 5' and 3' untranslated regions (UTRs). Individual families differed in their propensity to reside in particular segments of genes. Most importantly, DcSto copies in the 2 kb regions up- and downstream of genes were more frequently associated with open reading frames encoding transcription factors, suggesting their possible functional impact. More than 1.5% of all DcSto insertion sites in different host genomes contained different copies in exactly the same position, indicating the existence of insertional hotspots. The DcSto7b family was much more polymorphic than the other families in cultivated carrot. A line of evidence pointed at its activity in the course of carrot domestication, and identified Dcmar1 as an active carrot mariner element and a possible source of the transposition machinery for DcSto7b. 
CONCLUSION Stowaway MITEs have made a substantial contribution to the structural and functional variability of the carrot genome.
Affiliation(s)
- Alicja Macko-Podgórni
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
- Katarzyna Stelmach
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
- Kornelia Kwolek
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
- Dariusz Grzebelus
- Institute of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, 31425 Krakow, Poland
11
Bae J, Lee KW, Islam MN, Yim HS, Park H, Rho M. iMGEins: detecting novel mobile genetic elements inserted in individual genomes. BMC Genomics 2018; 19:944. [PMID: 30563451 PMCID: PMC6299635 DOI: 10.1186/s12864-018-5290-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 10/03/2018] [Accepted: 11/20/2018] [Indexed: 11/10/2022]
Abstract
Background Recent advances in sequencing technology have allowed us to investigate personal genomes to find structural variations, which have been studied extensively to identify their association with the physiology of diseases such as cancer. In particular, mobile genetic elements (MGEs) are one of the major constituents of the human genomes, and cause genome instability by insertion, mutation, and rearrangement. Result We have developed a new program, iMGEins, to identify such novel MGEs by using sequencing reads of individual genomes, and to explore the breakpoints with the supporting reads and MGEs detected. iMGEins is the first MGE detection program that integrates three algorithmic components: discordant read-pair mapping, split-read mapping, and insertion sequence assembly. Our evaluation results showed its outstanding performance in detecting novel MGEs from simulated genomes, as well as real personal genomes. In detail, the average recall and precision rates of iMGEins are 96.67 and 100%, respectively, which are the highest among the programs compared. In the testing with real human genomes of the NA12878 sample, iMGEins shows the highest accuracy in detecting MGEs within 20 bp proximity of the breakpoints annotated. Conclusion In order to study the dynamics of MGEs in individual genomes, iMGEins was developed to accurately detect breakpoints and report inserted MGEs. Compared with other programs, iMGEins has valuable features of identifying novel MGEs and assembling the MGEs inserted. Electronic supplementary material The online version of this article (10.1186/s12864-018-5290-9) contains supplementary material, which is available to authorized users.
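Recall and precision against annotated breakpoints, with a tolerance such as the 20 bp proximity mentioned above, can be computed along the following lines. This is a minimal sketch rather than the evaluation code used in the paper: it assumes breakpoints on a single chromosome given as integer coordinates, and the function name `precision_recall` and the one-to-one matching rule are choices made for this illustration.

```python
from typing import Iterable, Tuple

def precision_recall(
    predicted: Iterable[int],
    truth: Iterable[int],
    window: int = 20,
) -> Tuple[float, float]:
    """Score predicted breakpoints against annotated ones.

    A prediction counts as a true positive if it lies within `window`
    bp of an annotated breakpoint that has not already been matched.
    Both lists are sorted and walked with two pointers, which gives a
    maximum one-to-one matching for positions on a line.
    """
    pred = sorted(predicted)
    true = sorted(truth)
    matched = 0
    i = j = 0
    while i < len(pred) and j < len(true):
        diff = pred[i] - true[j]
        if abs(diff) <= window:
            matched += 1      # pair them up and move past both
            i += 1
            j += 1
        elif diff < 0:
            i += 1            # prediction too far left: false positive
        else:
            j += 1            # annotation too far left: missed insertion
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(true) if true else 0.0
    return precision, recall

# Toy coordinates, invented for this example.
p, r = precision_recall([100, 505, 9000], [110, 500, 2000, 3000])
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of three predictions fall within 20 bp of an annotated site, and two of four annotated sites are recovered.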
Affiliation(s)
- Junwoo Bae
- Department of Electronics and Computer Engineering, Hanyang University, Seoul, Korea
- Kyeong Won Lee
- Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology, Ansan, Korea
- Mohammad Nazrul Islam
- Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology, Ansan, Korea; Department of Marine Biotechnology, Korea University of Science and Technology, Daejeon, Korea; Department of Biotechnology, Sher-e-Bangla Agricultural University, Dhaka, 1207, Bangladesh
- Hyung-Soon Yim
- Marine Biotechnology Research Center, Korea Institute of Ocean Science and Technology, Ansan, Korea; Department of Marine Biotechnology, Korea University of Science and Technology, Daejeon, Korea
- Heejin Park
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea; Department of Biomedical Informatics, Hanyang University, Seoul, Korea.
- Mina Rho
- Department of Computer Science and Engineering, Hanyang University, Seoul, Korea; Department of Biomedical Informatics, Hanyang University, Seoul, Korea.
12
Serrato-Capuchina A, Matute DR. The Role of Transposable Elements in Speciation. Genes (Basel) 2018; 9:E254. [PMID: 29762547 PMCID: PMC5977194 DOI: 10.3390/genes9050254] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 01/30/2018] [Revised: 04/26/2018] [Accepted: 04/26/2018] [Indexed: 01/20/2023]
Abstract
Understanding the phenotypic and molecular mechanisms that contribute to genetic diversity between and within species is fundamental in studying the evolution of species. In particular, identifying the interspecific differences that lead to the reduction or even cessation of gene flow between nascent species is one of the main goals of speciation genetic research. Transposable elements (TEs) are DNA sequences with the ability to move within genomes. TEs are ubiquitous throughout eukaryotic genomes and have been shown to alter regulatory networks, gene expression, and to rearrange genomes as a result of their transposition. However, no systematic effort has evaluated the role of TEs in speciation. We compiled the evidence for TEs as potential causes of reproductive isolation across a diversity of taxa. We find that TEs are often associated with hybrid defects that might preclude the fusion between species, but that the involvement of TEs in other barriers to gene flow different from postzygotic isolation is still relatively unknown. Finally, we list a series of guides and research avenues to disentangle the effects of TEs on the origin of new species.
Affiliation(s)
- Antonio Serrato-Capuchina
- Biology Department, Genome Sciences Building, University of North Carolina, Chapel Hill, NC 27514, USA.
| | - Daniel R Matute
- Biology Department, Genome Sciences Building, University of North Carolina, Chapel Hill, NC 27514, USA.
| |
|
13
|
Tracking the genome-wide outcomes of a transposable element burst over decades of amplification. Proc Natl Acad Sci U S A 2017; 114:E10550-E10559. [PMID: 29158416 PMCID: PMC5724284 DOI: 10.1073/pnas.1716459114] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.7] [Indexed: 12/30/2022] Open
Abstract
Rice (Oryza sativa) has a unique combination of attributes that made it an ideal host to track the natural behavior of very active transposable elements (TEs) over generations. In this study, we have exploited its small genome and propagation by self or sibling pollination to identify and characterize two strain pairs, EG4/HEG4 and A119/A123, undergoing bursts of the nonautonomous miniature inverted repeat transposable element mPing. Comparative sequence analyses of these strains have advanced our understanding of (i) factors that contribute to sustaining a TE burst for decades, (ii) features that distinguish a natural TE burst from bursts in cell culture or mutant backgrounds, and (iii) the extent to which TEs can rapidly diversify the genome of an inbred organism. To understand the success strategies of transposable elements (TEs) that attain high copy numbers, we analyzed two pairs of rice (Oryza sativa) strains, EG4/HEG4 and A119/A123, undergoing decades of rapid amplification (bursts) of the class 2 autonomous Ping element and the nonautonomous miniature inverted repeat transposable element (MITE) mPing. Comparative analyses of whole-genome sequences of the two strain pairs validated that each pair has been maintained for decades as inbreds since divergence from their respective last common ancestor. Strains EG4 and HEG4 differ by fewer than 160 SNPs and a total of 264 new mPing insertions. Similarly, strains A119 and A123 exhibited about half as many SNPs (277) as new mPing insertions (518). Examination of all other potentially active TEs in these genomes revealed only a single new insertion out of ∼40,000 loci surveyed. The virtual absence of any new TE insertions in these strains outside the mPing bursts demonstrates that the Ping/mPing family gradually attains high copy numbers by maintaining activity and evading host detection for dozens of generations. 
Evasion is possible because host recognition of mPing sequences appears to have no impact on initiation or maintenance of the burst. Ping is actively transcribed, and both Ping and mPing can transpose despite methylation of terminal sequences. This finding suggests that an important feature of MITE success is that host recognition does not lead to the silencing of the source of transposase.
|
14
|
McClintock: An Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data. G3-GENES GENOMES GENETICS 2017. [PMID: 28637810 PMCID: PMC5555480 DOI: 10.1534/g3.117.043893] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/31/2022]
Abstract
Transposable element (TE) insertions are among the most challenging types of variants to detect in genomic data because of their repetitive nature and complex mechanisms of replication. Nevertheless, the recent availability of large resequencing data sets has spurred the development of many new methods to detect TE insertions in whole-genome shotgun sequences. Here we report an integrated bioinformatics pipeline for the detection of TE insertions in whole-genome shotgun data, called McClintock (https://github.com/bergmanlab/mcclintock), which automatically runs and standardizes output for multiple TE detection methods. We demonstrate the utility of McClintock by evaluating six TE detection methods using simulated and real genome data from the model microbial eukaryote, Saccharomyces cerevisiae. We find substantial variation among McClintock component methods in their ability to detect nonreference TEs in the yeast genome, but show that nonreference TEs at nearly all biologically realistic locations can be detected in simulated data by combining multiple methods that use split-read and read-pair evidence. In general, our results reveal that split-read methods detect fewer nonreference TE insertions than read-pair methods, but generally have much higher positional accuracy. Analysis of a large sample of real yeast genomes reveals that most McClintock component methods can recover known aspects of TE biology in yeast such as the transpositional activity status of families, target preferences, and target site duplication structure, albeit with varying levels of accuracy. Our work provides a general framework for integrating and analyzing results from multiple TE detection methods, as well as useful guidance for researchers studying TEs in yeast resequencing data.
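The trade-off described in this abstract (read-pair methods recover more insertions, split-read methods place them more precisely) can be illustrated with a small scoring sketch. This is not McClintock's evaluation code: the `Site` tuple, the `recall_precision` helper, the window sizes, and all coordinates are hypothetical and invented for illustration only.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int      # predicted or simulated insertion coordinate
    family: str   # TE family name


def is_match(call: Site, truth: Site, window: int) -> bool:
    """A call matches a true site if chromosome and family agree and the
    predicted position lies within `window` bp of the true position."""
    return (call.chrom == truth.chrom
            and call.family == truth.family
            and abs(call.pos - truth.pos) <= window)


def recall_precision(calls, truths, window):
    """Recall: fraction of true sites recovered by at least one call.
    Precision: fraction of calls that match at least one true site."""
    found = sum(any(is_match(c, t, window) for c in calls) for t in truths)
    correct = sum(any(is_match(c, t, window) for t in truths) for c in calls)
    recall = found / len(truths) if truths else 0.0
    precision = correct / len(calls) if calls else 0.0
    return recall, precision


# Hypothetical simulated insertions and calls (made-up values).
truth = [Site("chrI", 1000, "Ty1"), Site("chrII", 5000, "Ty2")]
split_read = [Site("chrI", 1000, "Ty1")]                      # few calls, exact
read_pair = [Site("chrI", 1040, "Ty1"), Site("chrII", 5060, "Ty2"),
             Site("chrIII", 700, "Ty3")]                      # more calls, fuzzy

for name, calls in [("split-read", split_read),
                    ("read-pair", read_pair),
                    ("combined", split_read + read_pair)]:
    for window in (0, 100):
        r, p = recall_precision(calls, truth, window)
        print(f"{name:10s} window={window:3d}  recall={r:.2f}  precision={p:.2f}")
```

With these toy inputs, the read-pair calls only count as correct once a tolerance window is allowed, while pooling both call sets recovers every simulated site at the cost of some precision.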
|
15
|
Zhang S, Kelleher ES. Targeted identification of TE insertions in a Drosophila genome through hemi-specific PCR. Mob DNA 2017; 8:10. [PMID: 28775768 PMCID: PMC5534036 DOI: 10.1186/s13100-017-0092-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/14/2017] [Accepted: 07/10/2017] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transposable elements (TEs) are major components of eukaryotic genomes and drivers of genome evolution, producing intraspecific polymorphism and interspecific differences through mobilization and non-homologous recombination. TE insertion sites are often highly variable within species, creating a need for targeted genome re-sequencing (TGS) methods to identify TE insertion sites. METHODS We present a hemi-specific PCR approach for TGS of P-elements in Drosophila genomes on the Illumina platform. We also present a computational framework for identifying new insertions from TGS reads. Finally, we describe a new method for estimating the frequency of TE insertions from WGS data, which is based on precise insertion sites provided by TGS annotations. RESULTS By comparing our results to TE annotations based on whole genome re-sequencing (WGS) data for the same Drosophila melanogaster strain, we demonstrate that TGS is powerful for identifying true insertions, even in repeat-rich heterochromatic regions. We also demonstrate that TGS offers enhanced annotation of precise insertion sites, which facilitates estimation of TE insertion frequency. CONCLUSIONS TGS by hemi-specific PCR is a powerful approach for identifying TE insertions of particular TE families in species with a high-quality reference genome, at greatly reduced cost as compared to WGS. It may therefore be ideal for population genomic studies of particular TE families. Additionally, TGS and WGS can be used as complementary approaches, with TGS annotations identifying more annotated insertions with greater precision for a target TE family, and WGS data allowing for estimates of TE insertion frequencies, and a broader picture of the location of non-target TEs across the genome.
Affiliation(s)
- Shuo Zhang
- Department of Biology and Biochemistry, University of Houston, 3455 Cullen Blvd. Suite 342, Houston, TX 77204 USA
| | - Erin S. Kelleher
- Department of Biology and Biochemistry, University of Houston, 3455 Cullen Blvd. Suite 342, Houston, TX 77204 USA
| |
|
16
|
Treiber CD, Waddell S. Resolving the prevalence of somatic transposition in Drosophila. eLife 2017; 6. [PMID: 28742021 PMCID: PMC5553932 DOI: 10.7554/elife.28297] [Citation(s) in RCA: 49] [Impact Index Per Article: 7.0] [Received: 05/02/2017] [Accepted: 07/21/2017] [Indexed: 11/13/2022] Open
Abstract
Somatic transposition in mammals and insects could increase cellular diversity, and neural mobilization has been implicated in age-dependent decline. To understand the impact of transposition in somatic cells it is essential to reliably measure the frequency and map locations of new insertions. Here we identified thousands of putative somatic transposon insertions in neurons from individual Drosophila melanogaster using whole-genome sequencing. However, the number of de novo insertions did not correlate with transposon expression or fly age. Analysing our data with exons as 'immobile genetic elements' revealed a similar frequency of unexpected exon translocations. A new sequencing strategy that recovers transposon:chromosome junction information revealed that most putative de novo transposon and exon insertions likely result from unavoidable chimeric artefacts. Reanalysis of other published data suggests similar artefacts are often mistaken for genuine somatic transposition. We conclude that somatic transposition is less prevalent in Drosophila than previously envisaged.
Affiliation(s)
- Christoph D Treiber
- Centre for Neural Circuits and Behaviour, The University of Oxford, Oxford, United Kingdom
| | - Scott Waddell
- Centre for Neural Circuits and Behaviour, The University of Oxford, Oxford, United Kingdom
| |
|
17
|
Chen J, Wrightsman TR, Wessler SR, Stajich JE. RelocaTE2: a high resolution transposable element insertion site mapping tool for population resequencing. PeerJ 2017; 5:e2942. [PMID: 28149701 PMCID: PMC5274521 DOI: 10.7717/peerj.2942] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 09/20/2016] [Accepted: 12/26/2016] [Indexed: 12/26/2022] Open
Abstract
Background Transposable element (TE) polymorphisms are important components of population genetic variation. The functional impacts of TEs in gene regulation and generating genetic diversity have been observed in multiple species, but the frequency and magnitude of TE variation is underappreciated. Inexpensive and deep sequencing technology has made it affordable to apply population genetic methods to whole genomes with methods that identify single nucleotide and insertion/deletion polymorphisms. However, identifying TE polymorphisms, particularly transposition events or non-reference insertion sites, can be challenging due to the repetitive nature of these sequences, which hampers both the sensitivity and specificity of analysis tools. Methods We have developed the tool RelocaTE2 for identification of TE insertion sites at high sensitivity and specificity. RelocaTE2 searches for known TE sequences in whole genome sequencing reads from second generation sequencing platforms such as Illumina. These sequence reads are used as seeds to pinpoint chromosome locations where TEs have transposed. RelocaTE2 detects target site duplication (TSD) of TE insertions, allowing it to report TE polymorphism loci with single base pair precision. Results and Discussion The performance of RelocaTE2 is evaluated using both simulated and real sequence data. RelocaTE2 demonstrates a high level of sensitivity and specificity, particularly when the sequence coverage is not shallow. In comparison to other tools tested, RelocaTE2 achieves the best balance between sensitivity and specificity. In particular, RelocaTE2 performs best in prediction of TSDs for TE insertions. Even in highly repetitive regions, such as those tested on rice chromosome 4, RelocaTE2 is able to report up to 95% of simulated TE insertions with less than 0.1% false positive rate using 10-fold genome coverage resequencing data. 
RelocaTE2 provides a robust solution to identify TE insertion sites and can be incorporated into analysis workflows in support of describing the complete genotype from light coverage genome sequencing.
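The idea that a target site duplication pins an insertion to single-base precision can be sketched as follows. This is a minimal illustration of the general principle, not RelocaTE2's implementation: the function `infer_tsd`, its parameters, and the toy reference sequence are all hypothetical. It assumes that genomic flanks of junction reads have already been aligned, and that because the target site is duplicated on both sides of the insertion, the upstream and downstream flanks overlap on the reference by exactly the TSD.

```python
from typing import Optional, Tuple


def infer_tsd(left_flank_end: int,
              right_flank_start: int,
              reference: str) -> Optional[Tuple[int, int, str]]:
    """Infer a TSD from two junction-read flanks (1-based, inclusive).

    left_flank_end:    last reference base covered by the genomic flank
                       that precedes the TE in the sequenced sample.
    right_flank_start: first reference base covered by the genomic flank
                       that follows the TE in the sequenced sample.
    Returns (start, end, sequence) of the overlap, or None if the flanks
    do not overlap (no TSD can be inferred from this pair).
    """
    if right_flank_start > left_flank_end:
        return None
    start, end = right_flank_start, left_flank_end
    return start, end, reference[start - 1:end]


# Toy 20 bp reference; flanks overlap on positions 8..12.
reference = "ACGTTGCATGGATCCAATGC"
print(infer_tsd(left_flank_end=12, right_flank_start=8, reference=reference))
```

In this toy case the overlap spans five reference bases, so the insertion is bounded to that interval rather than to a window of tens or hundreds of bases.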
Affiliation(s)
- Jinfeng Chen
- Department of Plant Pathology & Microbiology, University of California, Riverside, CA, United States; Institute for Integrative Genome Biology, University of California, Riverside, CA, United States; Department of Botany and Plant Sciences, University of California, Riverside, CA, United States
| | - Travis R Wrightsman
- Department of Botany and Plant Sciences, University of California, Riverside, CA, United States
| | - Susan R Wessler
- Institute for Integrative Genome Biology, University of California, Riverside, CA, United States; Department of Botany and Plant Sciences, University of California, Riverside, CA, United States
| | - Jason E Stajich
- Department of Plant Pathology & Microbiology, University of California, Riverside, CA, United States; Institute for Integrative Genome Biology, University of California, Riverside, CA, United States
| |
|
18
|
Kang H, Zhu D, Lin R, Opiyo SO, Jiang N, Shiu SH, Wang GL. A novel method for identifying polymorphic transposable elements via scanning of high-throughput short reads. DNA Res 2016; 23:241-51. [PMID: 27098848 PMCID: PMC4909310 DOI: 10.1093/dnares/dsw011] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 11/16/2015] [Accepted: 02/21/2016] [Indexed: 11/16/2022] Open
Abstract
Identification of polymorphic transposable elements (TEs) is important because TE polymorphism creates genetic diversity and influences the function of genes in the host genome. However, de novo scanning of polymorphic TEs remains a challenge. Here, we report a novel computational method, called PTEMD (polymorphic TEs and their movement detection), for de novo discovery of genome-wide polymorphic TEs. PTEMD searches for highly identical sequences using read-supported breakpoint evidence. Using PTEMD, we identified 14 polymorphic TE families (905 sequences) in the rice blast fungus Magnaporthe oryzae, and 68 (10,618 sequences) in maize. We validated one polymorphic TE family experimentally, MoTE-1; all MoTE-1 family members are located in different genomic loci in the three tested isolates. We found that 57.1% (8 of 14) of the PTEMD-detected polymorphic TE families in M. oryzae are active. Furthermore, our data indicate that there are more polymorphic DNA transposons in maize than polymorphic retrotransposons, despite the fact that retrotransposons occupy the largest fraction of genomic mass. We demonstrated that PTEMD is an effective tool for identifying polymorphic TEs in M. oryzae and maize genomes. PTEMD and the genome-wide polymorphic TEs in M. oryzae and maize are publicly available at http://www.kanglab.cn/blast/PTEMD_V1.02.htm.
Affiliation(s)
- Houxiang Kang
- State Key Laboratory for Biology of Plant Diseases and Insect Pest, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China
| | - Dan Zhu
- State Key Laboratory for Biology of Plant Diseases and Insect Pest, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Department of Agronomy, Hunan Agricultural University, Changsha, Hunan 410128, China
| | - Runmao Lin
- Department of Plant Pathology, Institute of Vegetables and flowers, Chinese Academy of Agriculture Science, Beijing 100081, China
| | - Stephen Obol Opiyo
- Molecular and Cellular Imaging Center - Columbus, Ohio Agricultural Research and Development Center, Columbus, OH 43210, USA
| | - Ning Jiang
- Department of Horticulture, Michigan State University, 1066 Bogue Street, East Lansing, MI 48823, USA
| | - Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, MI 48823, USA
| | - Guo-Liang Wang
- State Key Laboratory for Biology of Plant Diseases and Insect Pest, Institute of Plant Protection, Chinese Academy of Agricultural Sciences, Beijing 100193, China; Department of Plant Pathology, Ohio State University, Columbus, OH 43210, USA
| |
|
19
|
Ecovoiu AA, Ghionoiu IC, Ciuca AM, Ratiu AC. Genome ARTIST: a robust, high-accuracy aligner tool for mapping transposon insertions and self-insertions. Mob DNA 2016; 7:3. [PMID: 26855675 PMCID: PMC4744444 DOI: 10.1186/s13100-016-0061-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 11/02/2015] [Accepted: 01/19/2016] [Indexed: 01/16/2023] Open
Abstract
Background A critical topic of insertional mutagenesis experiments performed on model organisms is mapping the hits of artificial transposons (ATs) at nucleotide-level accuracy. Mapping errors may occur when sequencing artifacts or mutations such as single nucleotide polymorphisms (SNPs) and small indels are present very close to the junction between a genomic sequence and a transposon inverted repeat (TIR). Another particular item of insertional mutagenesis is mapping of the transposon self-insertions and, to the best of our knowledge, there is no publicly available mapping tool designed to analyze such molecular events. Results We developed Genome ARTIST, a pairwise gapped aligner tool which works out both issues by means of an original, robust mapping strategy. Genome ARTIST is not designed to use next-generation sequencing (NGS) data but to analyze AT insertions obtained in small to medium-scale mutagenesis experiments. Genome ARTIST employs a heuristic approach to find DNA sequence similarities and harnesses a multi-step implementation of a Smith-Waterman adapted algorithm to compute the mapping alignments. The experience is enhanced by easily customizable parameters and a user-friendly interface that describes the genomic landscape surrounding the insertion. Genome ARTIST is functional with many genomes of bacteria and eukaryotes available in Ensembl and GenBank repositories. Our tool specifically harnesses the sequence annotation data provided by FlyBase for Drosophila melanogaster (the fruit fly), which enables mapping of insertions relative to various genomic features such as natural transposons. Genome ARTIST was tested against other alignment tools using relevant query sequences derived from the D. melanogaster and Mus musculus (mouse) genomes. Real and simulated query sequences were also comparatively analyzed, revealing that Genome ARTIST is a very robust solution for mapping transposon insertions. 
Conclusions Genome ARTIST is a stand-alone user-friendly application, designed for high-accuracy mapping of transposon insertions and self-insertions. The tool is also useful for routine aligning assessments like detection of SNPs or checking the specificity of primers and probes. Genome ARTIST is open-source software and is available for download at www.genomeartist.ro and at GitHub (https://github.com/genomeartist/genomeartist).
Affiliation(s)
- Alexandru Al Ecovoiu
- Department of Genetics, Faculty of Biology, University of Bucharest, Bucharest, Romania
| | | | | | - Attila Cristian Ratiu
- Department of Genetics, Faculty of Biology, University of Bucharest, Bucharest, Romania
| |
|
20
|
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of available software that can help biologists to look for these repeats and check some hypothetical models intended to characterize their structures. Since transposable elements are a major source of repeats in plants, many methods have been used or developed for this large class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided a whole section on this topic as well as a selection of the main existing software. In order to better understand how they work and how repeats may be efficiently found in genomes, it is necessary to look at the technical issues involved in the large-scale search of these structures. Indeed, it may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of the search for repeats and more complex patterns. The second section introduces the key concepts that are useful for understanding the current state of the art in playing with words, applied to genomic sequences. This can be seen as the first stage of a very general approach called linguistic analysis that is interested in the analysis of natural or artificial texts. Words, the lexical level, correspond to simple repeated entities in texts or strings. In fact, biologists need to represent more complex entities where a repeat family is built on more abstract structures, including direct or inverted small repeats, motifs, composition constraints as well as ordering and distance constraints between these elementary blocks. In terms of linguistics, this corresponds to the syntactic level of a language. The last section introduces concepts and practical tools that can be used to reach this syntactic level in biological sequence analysis.
Affiliation(s)
- Jacques Nicolas
- Dyliss Team, Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France.
| | - Pierre Peterlongo
- Irisa/Inria Centre de Rennes Bretagne Atlantique, Campus de Beaulieu, 35510, Rennes cedex, France
| | - Sébastien Tempel
- LCB, CNRS UMR 7283, 31 Chemin Joseph Aiguier, 13402, Marseille cedex 20, France
| |
|
21
|
Ewing AD. Transposable element detection from whole genome sequence data. Mob DNA 2015; 6:24. [PMID: 26719777 PMCID: PMC4696183 DOI: 10.1186/s13100-015-0055-3] [Citation(s) in RCA: 123] [Impact Index Per Article: 13.7] [Received: 09/15/2015] [Accepted: 12/21/2015] [Indexed: 11/25/2022] Open
Abstract
The number of software tools available for detecting transposable element insertions from whole genome sequence data has been increasing steadily throughout the last ~5 years. Some of these methods have unique features suiting them for particular use cases, but in general they follow one or more of a common set of approaches. Here, detection and filtering approaches are reviewed in the light of transposable element biology and the current state of whole genome sequencing. We demonstrate that the current state-of-the-art methods still do not produce highly concordant results and provide resources to assist future development in transposable element detection methods.
Affiliation(s)
- Adam D Ewing
- Mater Research Institute - University of Queensland, 37 Kent St Level 4, Woolloongabba, QLD 4102 Australia
| |
|
22
|
Hawkey J, Hamidian M, Wick RR, Edwards DJ, Billman-Jacobe H, Hall RM, Holt KE. ISMapper: identifying transposase insertion sites in bacterial genomes from short read sequence data. BMC Genomics 2015; 16:667. [PMID: 26336060 PMCID: PMC4558774 DOI: 10.1186/s12864-015-1860-2] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 03/09/2015] [Accepted: 08/18/2015] [Indexed: 11/23/2022] Open
Abstract
Background Insertion sequences (IS) are small transposable elements, commonly found in bacterial genomes. Identifying the location of IS in bacterial genomes can be useful for a variety of purposes including epidemiological tracking and predicting antibiotic resistance. However, IS are commonly present in multiple copies in a single genome, which complicates genome assembly and the identification of IS insertion sites. Here we present ISMapper, a mapping-based tool for identification of the site and orientation of IS insertions in bacterial genomes, directly from paired-end short read data. Results ISMapper was validated using three types of short read data: (i) simulated reads from a variety of species, (ii) Illumina reads from 5 isolates for which finished genome sequences were available for comparison, and (iii) Illumina reads from 7 Acinetobacter baumannii isolates for which predicted IS locations were tested using PCR. A total of 20 genomes, including 13 species and 32 distinct IS, were used for validation. ISMapper correctly identified 97% of known IS insertions in the analysis of simulated reads, and 98% in real Illumina reads. Subsampling of real Illumina reads to lower depths indicated ISMapper was able to correctly detect insertions for average genome-wide read depths >20x, although read depths >50x were required to obtain confident calls that were highly supported by evidence from reads. All ISAba1 insertions identified by ISMapper in the A. baumannii genomes were confirmed by PCR. In each A. baumannii genome, ISMapper successfully identified an IS insertion upstream of the ampC beta-lactamase that could explain phenotypic resistance to third-generation cephalosporins. The utility of ISMapper was further demonstrated by profiling genome-wide IS6110 insertions in 138 publicly available Mycobacterium tuberculosis genomes, revealing lineage-specific insertions and multiple insertion hotspots. 
Conclusions ISMapper provides a rapid and robust method for identifying IS insertion sites directly from short read data, with a high degree of accuracy demonstrated across a wide range of bacteria.
Affiliation(s)
- Jane Hawkey
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia; Faculty of Veterinary and Agricultural Science, The University of Melbourne, Parkville, VIC, 3010, Australia.
| | - Mohammad Hamidian
- School of Molecular Bioscience, The University of Sydney, Sydney, 2006, Australia.
| | - Ryan R Wick
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia.
| | - David J Edwards
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia.
| | - Helen Billman-Jacobe
- Faculty of Veterinary and Agricultural Science, The University of Melbourne, Parkville, VIC, 3010, Australia.
| | - Ruth M Hall
- School of Molecular Bioscience, The University of Sydney, Sydney, 2006, Australia.
| | - Kathryn E Holt
- Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, 3010, Australia.
| |
|
23
|
Jiang C, Chen C, Huang Z, Liu R, Verdier J. ITIS, a bioinformatics tool for accurate identification of transposon insertion sites using next-generation sequencing data. BMC Bioinformatics 2015; 16:72. [PMID: 25887332 PMCID: PMC4351942 DOI: 10.1186/s12859-015-0507-2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Received: 11/06/2014] [Accepted: 02/20/2015] [Indexed: 08/30/2023] Open
Abstract
Background Transposable elements constitute an important part of the genome and are essential in adaptive mechanisms. Transposition events associated with phenotypic changes occur naturally or are induced in insertional mutant populations. Transposon mutagenesis results in multiple random insertions, and recovery of most or all of the insertions is critical for forward genetics studies. Using genome next-generation sequencing data and an appropriate bioinformatics tool, it is possible to accurately identify transposon insertion sites, which could provide candidate causal mutations for desired phenotypes for further functional validation. Results We developed a novel bioinformatics tool, ITIS (Identification of Transposon Insertion Sites), for localizing transposon insertion sites within a genome. It takes next-generation genome re-sequencing data (NGS data), transposon sequence, and reference genome sequence as input, and generates a list of highly reliable candidate insertion sites as well as zygosity information for each insertion. Using a simulated dataset and a case study based on an insertional mutant line from Medicago truncatula, we showed that ITIS performed better in terms of sensitivity and specificity than other similar algorithms such as RelocaTE, RetroSeq, TEMP and TIF. With the case study data, we demonstrated the efficiency of ITIS by validating the presence and zygosity of predicted insertion sites of the Tnt1 transposon within a complex plant system, M. truncatula. Conclusion This study showed that ITIS is a robust and powerful tool for forward genetic studies in identifying transposable element insertions causing phenotypes. ITIS is suitable for various systems such as cell culture, bacteria, yeast, insect, mammal and plant.
Affiliation(s)
- Chuan Jiang
- Shanghai Center for Plant Stress Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 201602, China; University of Chinese Academy of Sciences, Beijing, 100039, China.
| | - Chao Chen
- Shanghai Center for Plant Stress Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 201602, China; University of Chinese Academy of Sciences, Beijing, 100039, China.
| | - Ziyue Huang
- Shanghai Center for Plant Stress Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 201602, China.
| | - Renyi Liu
- Shanghai Center for Plant Stress Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 201602, China.
| | - Jerome Verdier
- Shanghai Center for Plant Stress Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai, 201602, China.
| |
|
24
|
Transposable elements contribute to activation of maize genes in response to abiotic stress. PLoS Genet 2015; 11:e1004915. [PMID: 25569788 PMCID: PMC4287451 DOI: 10.1371/journal.pgen.1004915] [Citation(s) in RCA: 245] [Impact Index Per Article: 27.2] [Received: 08/14/2014] [Accepted: 11/24/2014] [Indexed: 11/25/2022] Open
Abstract
Transposable elements (TEs) account for a large portion of the genome in many eukaryotic species. Despite their reputation as “junk” DNA or genomic parasites deleterious for the host, TEs have complex interactions with host genes and the potential to contribute to regulatory variation in gene expression. It has been hypothesized that TEs and genes they insert near may be transcriptionally activated in response to stress conditions. The maize genome, with many different types of TEs interspersed with genes, provides an ideal system to study the genome-wide influence of TEs on gene regulation. To analyze the magnitude of the TE effect on gene expression response to environmental changes, we profiled gene and TE transcript levels in maize seedlings exposed to a number of abiotic stresses. Many genes exhibit up- or down-regulation in response to these stress conditions. The analysis of TE families inserted within upstream regions of up-regulated genes revealed that between four and nine different TE families are associated with up-regulated gene expression in each of these stress conditions, affecting up to 20% of the genes up-regulated in response to abiotic stress, and as many as 33% of genes that are only expressed in response to stress. Expression of many of these same TE families also responds to the same stress conditions. The analysis of the stress-induced transcripts and proximity of the transposon to the gene suggests that these TEs may provide local enhancer activities that stimulate stress-responsive gene expression. Our data on allelic variation for insertions of several of these TEs show strong correlation between the presence of TE insertions and stress-responsive up-regulation of gene expression. Our findings suggest that TEs provide an important source of allelic regulatory variation in gene response to abiotic stress in maize.
Transposable elements are mobile DNA elements that are a prevalent component of many eukaryotic genomes. While transposable elements can often have deleterious effects through insertions into protein-coding genes, they may also contribute to regulatory variation of gene expression. There are a handful of examples in which specific transposon insertions contribute to regulatory variation of nearby genes, particularly in response to environmental stress. We sought to understand the genome-wide influence of transposable elements on gene expression responses to abiotic stress in maize, a plant with many families of transposable elements located in between genes. Our analysis suggests that a small number of maize transposable element families may contribute to the response of nearby genes to abiotic stress by providing stress-responsive enhancer-like functions. The specific insertions of transposable elements are often polymorphic within a species. Our data demonstrate that allelic variation for insertions of the transposable elements associated with stress-responsive expression can contribute to variation in the regulation of nearby genes. Thus novel insertions of transposable elements provide a potential mechanism for genes to acquire cis-regulatory influences that could contribute to heritable variation for stress response.
25
Fiston-Lavier AS, Barrón MG, Petrov DA, González J. T-lex2: genotyping, frequency estimation and re-annotation of transposable elements using single or pooled next-generation sequencing data. Nucleic Acids Res 2014; 43:e22. [PMID: 25510498 PMCID: PMC4344482 DOI: 10.1093/nar/gku1250] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 12/04/2022] Open
Abstract
Transposable elements (TEs) constitute the most active, diverse and ancient component in a broad range of genomes. Complete understanding of genome function and evolution cannot be achieved without a thorough understanding of TE impact and biology. However, in-depth analysis of TEs still represents a challenge due to the repetitive nature of these genomic entities. In this work, we present a broadly applicable and flexible tool: T-lex2. T-lex2 is the only available software that allows routine, automatic and accurate genotyping of individual TE insertions and estimation of their population frequencies both using individual strain and pooled next-generation sequencing data. Furthermore, T-lex2 also assesses the quality of the calls, allowing the identification of mis-annotated TEs and providing the necessary information to re-annotate them. The flexible and customizable design of T-lex2 allows running it in any genome and for any type of TE insertion. Here, we tested the fidelity of T-lex2 using the fly and human genomes. Overall, T-lex2 represents a significant improvement in our ability to analyze the contribution of TEs to genome function and evolution as well as learning about the biology of TEs. T-lex2 is freely available online at http://sourceforge.net/projects/tlex.
Affiliation(s)
- Anna-Sophie Fiston-Lavier
- Department of Biology, Stanford University, Stanford, CA 94305-5020, USA; Institut des Sciences de l'Evolution de Montpellier (ISEM), UMR5554 CNRS-Université Montpellier 2, France
- Maite G Barrón
- Genomics, Bioinformatics and Evolution Group, Institut de Biotecnologia i de Biomedicina - IBB/Department of Genetics and Microbiology, Campus Universitat Autònoma de Barcelona, Bellaterra 08193, Spain; Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
- Dmitri A Petrov
- Department of Biology, Stanford University, Stanford, CA 94305-5020, USA
- Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Barcelona 08003, Spain
26
Gilly A, Etcheverry M, Madoui MA, Guy J, Quadrana L, Alberti A, Martin A, Heitkam T, Engelen S, Labadie K, Le Pen J, Wincker P, Colot V, Aury JM. TE-Tracker: systematic identification of transposition events through whole-genome resequencing. BMC Bioinformatics 2014; 15:377. [PMID: 25408240 PMCID: PMC4279814 DOI: 10.1186/s12859-014-0377-z] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 07/09/2014] [Accepted: 11/05/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Transposable elements (TEs) are DNA sequences that are able to move from their location in the genome by cutting or copying themselves to another locus. As such, they are increasingly recognized as impacting all aspects of genome function. With the dramatic reduction in cost of DNA sequencing, it is now possible to resequence whole genomes in order to systematically characterize novel TE mobilization in a particular individual. However, this task is made difficult by the inherently repetitive nature of TE sequences, which in some eukaryotes compose over half of the genome sequence. Currently, only a few software tools dedicated to the detection of TE mobilization using next-generation-sequencing are described in the literature. They often target specific TEs for which annotation is available, and are only able to identify families of closely related TEs, rather than individual elements. RESULTS We present TE-Tracker, a general and accurate computational method for the de-novo detection of germ line TE mobilization from re-sequenced genomes, as well as the identification of both their source and destination sequences. We compare our method with the two classes of existing software: specialized TE-detection tools and generic structural variant (SV) detection tools. We show that TE-Tracker, while working independently of any prior annotation, bridges the gap between these two approaches in terms of detection power. Indeed, its positive predictive value (PPV) is comparable to that of dedicated TE software while its sensitivity is typical of a generic SV detection tool. TE-Tracker demonstrates the benefit of adopting an annotation-independent, de novo approach for the detection of TE mobilization events. We use TE-Tracker to provide a comprehensive view of transposition events induced by loss of DNA methylation in Arabidopsis. TE-Tracker is freely available at http://www.genoscope.cns.fr/TE-Tracker.
CONCLUSIONS We show that TE-Tracker accurately detects both the source and destination of novel transposition events in re-sequenced genomes. Moreover, TE-Tracker is able to detect all potential donor sequences for a given insertion, and can identify the correct one among them. Furthermore, TE-Tracker produces significantly fewer false positives than common SV detection programs, thus greatly facilitating the detection and analysis of TE mobilization events.
Affiliation(s)
- Arthur Gilly
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France; Current address: The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.
- Mathilde Etcheverry
- Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France.
- Mohammed-Amin Madoui
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
- Julie Guy
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
- Leandro Quadrana
- Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France.
- Adriana Alberti
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
- Antoine Martin
- Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France; Current address: Technische Universität Dresden, Institute of Botany, Plant Cell and Molecular Biology, D-01062, Dresden, Germany.
- Tony Heitkam
- Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France; Current address: Laboratoire de Biochimie et Physiologie Moléculaire des Plantes, Institut de Biologie Intégrative des Plantes 'Claude Grignon', UMR CNRS/INRA/SupAgro/UM2, Place Viala, 34060, Montpellier, Cedex, France.
- Stefan Engelen
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
- Karine Labadie
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
- Jeremie Le Pen
- Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France; Current address: Gurdon Institute and Department of Biochemistry, University of Cambridge, The Henry Wellcome Building of Cancer and Developmental Biology, Tennis Court Rd, Cambridge, CB2 1QN, UK.
- Patrick Wincker
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
- Vincent Colot
- Institut de Biologie de l'Ecole Normale Supérieure, F-75230, Paris, Cedex 05, France; Centre National de la Recherche Scientifique (CNRS), UMR 8197, F-75230, Paris, Cedex 05, France; Institut national de la santé et de la recherche médicale (INSERM), U1024, F-75230, Paris, Cedex 05, France.
- Jean-Marc Aury
- Commissariat a l'Energie Atomique (CEA), Institut de Genomique (IG), Genoscope, 2 rue Gaston Crémieux, BP5706, 91057, Evry, France; Centre National de Recherche Scientifique (CNRS), UMR 8030, CP5706, Evry, France; Universite d'Evry, UMR 8030, CP5706, Evry, France.
27
Barrón MG, Fiston-Lavier AS, Petrov DA, González J. Population genomics of transposable elements in Drosophila. Annu Rev Genet 2014; 48:561-81. [PMID: 25292358 DOI: 10.1146/annurev-genet-120213-092359] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Indexed: 11/09/2022]
Abstract
Studies of the population dynamics of transposable elements (TEs) in Drosophila melanogaster indicate that consistent forces are affecting TEs independently of their modes of transposition and regulation. New sequencing technologies enable biologists to sample genomes at an unprecedented scale in order to quantify genome-wide polymorphism for annotated and novel TE insertions. In this review, we first present new insights gleaned from high-throughput data for population genomics studies of D. melanogaster. We then consider the latest population genomics models for TE evolution and present examples of functional evidence revealed by genome-wide studies of TE population dynamics in D. melanogaster. Although most of the TE insertions are deleterious or neutral, some TE insertions increase the fitness of the individual that carries them and play a role in genome adaptation.
Affiliation(s)
- Maite G Barrón
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Barcelona, Spain 08003
28
McCoy RC, Taylor RW, Blauwkamp TA, Kelley JL, Kertesz M, Pushkarev D, Petrov DA, Fiston-Lavier AS. Illumina TruSeq synthetic long-reads empower de novo assembly and resolve complex, highly-repetitive transposable elements. PLoS One 2014; 9:e106689. [PMID: 25188499 PMCID: PMC4154752 DOI: 10.1371/journal.pone.0106689] [Citation(s) in RCA: 158] [Impact Index Per Article: 15.8] [Received: 06/17/2014] [Accepted: 07/24/2014] [Indexed: 11/18/2022] Open
Abstract
High-throughput DNA sequencing technologies have revolutionized genomic analysis, including the de novo assembly of whole genomes. Nevertheless, assembly of complex genomes remains challenging, in part due to the presence of dispersed repeats which introduce ambiguity during genome reconstruction. Transposable elements (TEs) can be particularly problematic, especially for TE families exhibiting high sequence identity, high copy number, or complex genomic arrangements. While TEs strongly affect genome function and evolution, most current de novo assembly approaches cannot resolve long, identical, and abundant families of TEs. Here, we applied a novel Illumina technology called TruSeq synthetic long-reads, which are generated through highly-parallel library preparation and local assembly of short read data and which achieve lengths of 1.5–18.5 Kbp with an extremely low error rate (0.03% per base). To test the utility of this technology, we sequenced and assembled the genome of the model organism Drosophila melanogaster (reference genome strain y; cn, bw, sp) achieving an N50 contig size of 69.7 Kbp and covering 96.9% of the euchromatic chromosome arms of the current reference genome. TruSeq synthetic long-read technology enables placement of individual TE copies in their proper genomic locations as well as accurate reconstruction of TE sequences. We entirely recovered and accurately placed 4,229 (77.8%) of the 5,434 annotated transposable elements with perfect identity to the current reference genome. As TEs are ubiquitous features of genomes of many species, TruSeq synthetic long-reads, and likely other methods that generate long-reads, offer a powerful approach to improve de novo assemblies of whole genomes.
Affiliation(s)
- Rajiv C. McCoy
- Department of Biology, Stanford University, Stanford, California, United States of America
- Ryan W. Taylor
- Department of Biology, Stanford University, Stanford, California, United States of America
- Joanna L. Kelley
- School of Biological Sciences, Washington State University, Pullman, Washington, United States of America
- Michael Kertesz
- Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Dmitry Pushkarev
- Department of Physics, Stanford University, Stanford, California, United States of America
- Dmitri A. Petrov
- Department of Biology, Stanford University, Stanford, California, United States of America
- Anna-Sophie Fiston-Lavier
- Department of Biology, Stanford University, Stanford, California, United States of America
- Institut des Sciences de l'Evolution-Montpellier, Montpellier, France
29
Tobias PA, Guest DI. Tree immunity: growing old without antibodies. Trends Plant Sci 2014; 19:367-70. [PMID: 24556378 DOI: 10.1016/j.tplants.2014.01.011] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 09/06/2013] [Revised: 01/18/2014] [Accepted: 01/21/2014] [Indexed: 05/04/2023]
Abstract
Perennial plants need to cope with changing environments and pathogens over their lifespan. Infections are compartmentalised by localised physiological responses, and multiple apical meristems enable repair and regrowth, but genes are another crucial component in the perception and response to pathogens. In this opinion article we suggest that the mechanism for dynamic pathogen-specific recognition in long-lived plants could be explained by extending our current understanding of plant defence genes. We propose that, in addition to physiological responses, tree defence uses a three-pronged genomic approach involving: (i) gene numbers, (ii) genomic architecture, and (iii) mutation loads accumulated over long lifespans.
Affiliation(s)
- Peri A Tobias
- Department of Plant and Food Sciences, Faculty of Agriculture and Environment, University of Sydney, Biomedical Building C81, 1 Central Avenue, Australian Technology Park, Eveleigh, NSW 2015, Australia.
- David I Guest
- Department of Plant and Food Sciences, Faculty of Agriculture and Environment, University of Sydney, Biomedical Building C81, 1 Central Avenue, Australian Technology Park, Eveleigh, NSW 2015, Australia.
30
Vitte C, Fustier MA, Alix K, Tenaillon MI. The bright side of transposons in crop evolution. Brief Funct Genomics 2014; 13:276-95. [PMID: 24681749 DOI: 10.1093/bfgp/elu002] [Citation(s) in RCA: 72] [Impact Index Per Article: 7.2] [Indexed: 11/13/2022] Open
Abstract
The past decades have revealed an unexpected yet prominent role of so-called 'junk DNA' in the regulation of gene expression, thereby challenging our view of the mechanisms underlying phenotypic evolution. In particular, several mechanisms through which transposable elements (TEs) participate in functional genome diversity have been depicted, bringing to light the 'TEs bright side'. However, the relative contribution of those mechanisms and, more generally, the importance of TE-based polymorphisms on past and present phenotypic variation in crop species remain poorly understood. Here, we review current knowledge on both issues, and discuss how analyses of massively parallel sequencing data combined with statistical methodologies and functional validations will help unravel the impact of TEs on crop evolution in the near future.
31
Nakagome M, Solovieva E, Takahashi A, Yasue H, Hirochika H, Miyao A. Transposon Insertion Finder (TIF): a novel program for detection of de novo transpositions of transposable elements. BMC Bioinformatics 2014; 15:71. [PMID: 24629057 PMCID: PMC4004357 DOI: 10.1186/1471-2105-15-71] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.9] [Received: 09/05/2013] [Accepted: 03/06/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Detecting transposition events of transposable elements (TEs) from short reads produced by next-generation sequencing (NGS) has been difficult because TE sequences are themselves repetitive, which makes it hard for NGS alignment programs to locate their insertions. We have developed a program with a new algorithm to detect transpositions from NGS data. RESULTS During tool development, we used NGS data from derivative lines (ttm2 and ttm5) of japonica rice cv. Nipponbare, regenerated through cell culture. The new program, called Transposon Insertion Finder (TIF), was applied to detect de novo transpositions of Tos17 in the regenerated lines. TIF searched 300 million reads of a line within 20 min, identifying 4 and 12 de novo transpositions in the ttm2 and ttm5 lines, respectively. All of the transpositions were confirmed by PCR/electrophoresis and sequencing. Using the program, we also detected new P-element insertions from NGS data of Drosophila melanogaster. CONCLUSION TIF can find transpositions of any element, provided that target site duplications (TSDs) are generated by its transposition.
Affiliation(s)
- Akio Miyao
- Agrogenomics Research Center, National Institute of Agrobiological Sciences, 2-1-2, Kannondai, Tsukuba, Ibaraki 305-8602, Japan.