1
Yin Z, Yang Q, Shen D, Liu J, Huang W, Dou D. Online data resource for exploring transposon insertion polymorphisms in public soybean germplasm accessions. Plant Physiol 2023; 193:1036-1044. [PMID: 37399251 DOI: 10.1093/plphys/kiad386] [Received: 05/30/2023] [Revised: 05/30/2023] [Accepted: 06/11/2023] [Indexed: 07/05/2023]
Abstract
Soybean (Glycine max L. Merrill) is one of the most important economic crops. A large and growing number of whole-genome resequencing datasets have been generated for exploring genetic diversity and mining important quantitative trait loci. Most genome-wide association studies have focused on single-nucleotide polymorphisms and short insertions and deletions. Nevertheless, structural variants, mainly caused by transposable element (TE) mobilization, have not been fully considered. To fill this gap, we uniformly processed the publicly available whole-genome resequencing data from 5,521 soybean germplasm accessions and built an online database named Soybean Transposon Insertion Polymorphisms Database (SoyTIPdb) (https://biotec.njau.edu.cn/soytipdb). The collected germplasm accessions were derived from more than 45 countries and 160 regions, representing the most comprehensive genetic diversity of soybean. SoyTIPdb implements easy-to-use query, analysis, and browse functions to help users understand and find meaningful structural variations arising from TE insertions. In conclusion, SoyTIPdb is a valuable data resource that will help soybean breeders and researchers take advantage of the whole-genome sequencing datasets available in public repositories.
Affiliation(s)
- Zhiyuan Yin
- Department of Plant Pathology, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Qingjie Yang
- Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Danyu Shen
- Department of Plant Pathology, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Jinding Liu
- Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Wen Huang
- Department of Animal Science, Michigan State University, East Lansing, MI 48824, USA
- Daolong Dou
- Department of Plant Pathology, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
- Bioinformatics Center, Academy for Advanced Interdisciplinary Studies, Nanjing Agricultural University, Nanjing, Jiangsu 210095, China
2
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of transposable element detectors with McClintock 2 guides accurate inference of Ty insertion patterns in yeast. Mob DNA 2023; 14:8. [PMID: 37452430 PMCID: PMC10347736 DOI: 10.1186/s13100-023-00296-4] [Received: 03/21/2023] [Accepted: 06/09/2023] [Indexed: 07/18/2023]
Abstract
BACKGROUND Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute, or evaluate multiple TE insertion detectors. RESULTS We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide consistent estimates of ∼50 non-reference TE insertions per strain and that Ty2 has the highest number of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes. Finally, we show that best-in-class predictors for yeast applied to resequencing data have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences revealed previously for experimentally-induced Ty1 insertions to spontaneous insertions for other copia-superfamily retrotransposons in yeast.
CONCLUSION McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors in other species.
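The recall and precision comparison described in this abstract can be illustrated with a minimal Python sketch of scoring predicted non-reference insertion sites against simulated ones. It assumes insertions are (chromosome, position, TE family) tuples and that a prediction counts as correct when it lies within a hypothetical tolerance window (5 bp here, an arbitrary choice) of a not-yet-matched simulated insertion of the same family on the same chromosome. This is not McClintock's own evaluation code, and the coordinates in the usage example are made up.

```python
from typing import List, Tuple

Insertion = Tuple[str, int, str]  # (chromosome, position, TE family)


def score_predictions(truth: List[Insertion], predicted: List[Insertion],
                      window: int = 5) -> Tuple[float, float]:
    """Return (recall, precision) using greedy one-to-one matching.

    A prediction is a true positive if it lies within `window` bp of a
    not-yet-matched true insertion of the same family on the same chromosome.
    """
    matched = [False] * len(truth)
    true_positives = 0
    for chrom, pos, family in predicted:
        for i, (t_chrom, t_pos, t_family) in enumerate(truth):
            if (not matched[i] and chrom == t_chrom and family == t_family
                    and abs(pos - t_pos) <= window):
                matched[i] = True  # each true insertion can be matched once
                true_positives += 1
                break
    recall = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return recall, precision


# Usage example with hypothetical coordinates (not data from the study).
truth = [("chrI", 1000, "Ty1"), ("chrII", 5000, "Ty2"), ("chrIV", 800, "Ty4")]
predicted = [("chrI", 1003, "Ty1"), ("chrII", 5100, "Ty2"),
             ("chrIV", 800, "Ty4"), ("chrV", 42, "Ty1")]
print(score_predictions(truth, predicted))  # (0.6666666666666666, 0.5)
```

Greedy matching keeps the sketch short; a stricter evaluation might use optimal bipartite matching so that one prediction cannot "steal" a true site better matched by another.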
Affiliation(s)
- Jingxuan Chen
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Shunhua Han
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- David J. Garfinkel
- Department of Biochemistry and Molecular Biology, University of Georgia, Athens, GA USA
- Casey M. Bergman
- Institute of Bioinformatics, University of Georgia, Athens, GA USA
- Department of Genetics, University of Georgia, Athens, GA USA
3
Hays M, Schwartz K, Schmidtke DT, Aggeli D, Sherlock G. Paths to adaptation under fluctuating nitrogen starvation: The spectrum of adaptive mutations in Saccharomyces cerevisiae is shaped by retrotransposons and microhomology-mediated recombination. PLoS Genet 2023; 19:e1010747. [PMID: 37192196 PMCID: PMC10218751 DOI: 10.1371/journal.pgen.1010747] [Received: 02/27/2023] [Revised: 05/26/2023] [Accepted: 04/14/2023] [Indexed: 05/18/2023]
Abstract
There are many mechanisms that give rise to genomic change: while point mutations are often emphasized in genomic analyses, evolution acts upon many other types of genetic changes that can result in less subtle perturbations. Changes in chromosome structure, DNA copy number, and novel transposon insertions all create large genomic changes, which can have correspondingly large impacts on phenotypes and fitness. In this study we investigate the spectrum of adaptive mutations that arise in a population under consistently fluctuating nitrogen conditions. We specifically contrast these adaptive alleles and the mutational mechanisms that create them, with mechanisms of adaptation under batch glucose limitation and constant selection in low, non-fluctuating nitrogen conditions to address if and how selection dynamics influence the molecular mechanisms of evolutionary adaptation. We observe that retrotransposon activity accounts for a substantial number of adaptive events, along with microhomology-mediated mechanisms of insertion, deletion, and gene conversion. In addition to loss of function alleles, which are often exploited in genetic screens, we identify putative gain of function alleles and alleles acting through as-of-yet unclear mechanisms. Taken together, our findings emphasize that how selection (fluctuating vs. non-fluctuating) is applied also shapes adaptation, just as the selective pressure (nitrogen vs. glucose) does itself. Fluctuating environments can activate different mutational mechanisms, shaping adaptive events accordingly. Experimental evolution, which allows a wider array of adaptive events to be assessed, is thus a complementary approach to both classical genetic screens and natural variation studies to characterize the genotype-to-phenotype-to-fitness map.
Affiliation(s)
- Michelle Hays
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Katja Schwartz
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Danica T. Schmidtke
- Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, California, United States of America
- Dimitra Aggeli
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Gavin Sherlock
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
4
Chen J, Basting PJ, Han S, Garfinkel DJ, Bergman CM. Reproducible evaluation of short-read transposable element detectors and species-wide data mining of insertion patterns in yeast. bioRxiv 2023:2023.02.13.528343. [PMID: 36824955 PMCID: PMC9948991 DOI: 10.1101/2023.02.13.528343] [Indexed: 02/17/2023]
Abstract
Background Many computational methods have been developed to detect non-reference transposable element (TE) insertions using short-read whole genome sequencing data. The diversity and complexity of such methods often present challenges to new users seeking to reproducibly install, execute or evaluate multiple TE insertion detectors. Results We previously developed the McClintock meta-pipeline to facilitate the installation, execution, and evaluation of six first-generation short-read TE detectors. Here, we report a completely re-implemented version of McClintock written in Python using Snakemake and Conda that improves its installation, error handling, speed, stability, and extensibility. McClintock 2 now includes 12 short-read TE detectors, auxiliary pre-processing and analysis modules, interactive HTML reports, and a simulation framework to reproducibly evaluate the accuracy of component TE detectors. When applied to the model microbial eukaryote Saccharomyces cerevisiae, we find substantial variation in the ability of McClintock 2 components to identify the precise locations of non-reference TE insertions, with RelocaTE2 showing the highest recall and precision in simulated data. We find that RelocaTE2, TEMP, TEMP2 and TEBreak provide a consistent and biologically meaningful view of non-reference TE insertions in a species-wide panel of ∼1000 yeast genomes, as evaluated by coverage-based abundance estimates and expected patterns of tRNA promoter targeting. Finally, we show that best-in-class predictors for yeast have sufficient resolution to reveal a dyad pattern of integration in nucleosome-bound regions upstream of yeast tRNA genes for Ty1, Ty2, and Ty4, allowing us to extend knowledge about fine-scale target preferences first revealed experimentally for Ty1 to natural insertions and related copia-superfamily retrotransposons in yeast.
Conclusion McClintock ( https://github.com/bergmanlab/mcclintock/ ) provides a user-friendly pipeline for the identification of TEs in short-read WGS data using multiple TE detectors, which should benefit researchers studying TE insertion variation in a wide range of different organisms. Application of the improved McClintock system to simulated and empirical yeast genome data reveals best-in-class methods and novel biological insights for one of the most widely-studied model eukaryotes and provides a paradigm for evaluating and selecting non-reference TE detectors for other species.
5
Bajus M, Macko-Podgórni A, Grzebelus D, Baránek M. A review of strategies used to identify transposition events in plant genomes. Front Plant Sci 2022; 13:1080993. [PMID: 36531345 PMCID: PMC9751208 DOI: 10.3389/fpls.2022.1080993] [Received: 10/26/2022] [Accepted: 11/17/2022] [Indexed: 06/17/2023]
Abstract
Transposable elements (TEs) were initially considered redundant and dubbed 'junk DNA'. More recently, however, they have been recognized as an essential element of genome plasticity. In nature, they frequently become active upon exposure of the host to stress conditions. Even though most transposition events are neutral or even deleterious, occasionally they may be beneficial, generating genetic novelty that improves the fitness of the host. Hence, TE mobilization may promote adaptability and, in the long run, act as a significant evolutionary force. There are many examples of TE insertions resulting in increased tolerance to stresses or in novel crop features that appeal to consumers. TE-driven de novo variability could therefore possibly be utilized for crop improvement. However, to systematically study the mechanisms of TE/host interactions, suitable tools are needed to globally monitor any ongoing TE mobilization. With the development of powerful new technologies, new high-throughput strategies for studying TE dynamics are emerging. Here, we present currently available methods applied to monitor the activity of TEs in plants. We classify them on the basis of their operational principles, the position of target molecules in the process of transposition, and their ability to capture real cases of actively transposing elements. Their possible theoretical and practical drawbacks are also discussed. Finally, conceivable strategies and combinations of methods resulting in improved performance are proposed.
Affiliation(s)
- Marko Bajus
- Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czechia
- Alicja Macko-Podgórni
- Department of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, Kraków, Poland
- Dariusz Grzebelus
- Department of Plant Biology and Biotechnology, Faculty of Biotechnology and Horticulture, University of Agriculture in Krakow, Kraków, Poland
- Miroslav Baránek
- Mendeleum—Institute of Genetics, Faculty of Horticulture, Mendel University in Brno, Lednice, Czechia
6
Monden Y, Tanaka H, Funakoshi R, Sunayama S, Yabe K, Kimoto E, Matsumiya K, Yoshikawa T. Comprehensive survey of transposon mPing insertion sites and transcriptome analysis for identifying candidate genes controlling high protein content of rice. Front Plant Sci 2022; 13:969582. [PMID: 36119631 PMCID: PMC9479144 DOI: 10.3389/fpls.2022.969582] [Received: 06/15/2022] [Accepted: 08/05/2022] [Indexed: 06/15/2023]
Abstract
Rice is the most important crop species in the world, being the staple food of more than 80% of people in Asia. About 80% of rice grain is composed of carbohydrates (starch), with a protein content as low as 7-8%. Therefore, increasing the protein content of rice offers a way to create a stable protein source that helps alleviate malnutrition and health problems worldwide. We detected two rice lines with significantly higher protein content (HP5-7 and HP7-5) in the EG4 population. The EG4 strain of rice is a unique material in that the transposon mPing has high transpositional activity and high copy numbers under natural conditions. Previous research indicated that mPing is abundant in gene-rich euchromatic regions, suggesting that mPing amplification should create new allelic variants, novel regulatory networks, and phenotypic changes in the EG4 population. Here, we aimed to identify the candidate genes and/or mPing insertion sites causing high protein content by comprehensively identifying the mPing insertion sites and carrying out an RNA-seq-based transcriptome analysis. Using next-generation sequencing (NGS)-based methods, ca. 570 mPing insertion sites were identified per line in the EG4 population. Our results also indicated that mPing apparently prefers to insert near genes; in total, 38 genes were found to contain mPing insertions in the HP lines, of which 21 and 17 were specific to HP5-7 and HP7-5, respectively. Transcriptome analysis revealed that most of the genes related to protein synthesis (encoding glutelin, prolamin, and globulin) were up-regulated in HP lines relative to the control line. Interestingly, the differentially expressed gene (DEG) analysis revealed that the expression levels of many genes related to photosynthesis decreased in both HP lines; this suggests the amount of starch may have decreased, indirectly contributing to the increased protein content.
The high-protein lines studied here are expected to contribute to the development of high protein-content rice by introducing valuable phenotypic traits such as high and stable yield, disease resistance, and abundant nutrients.
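The conclusion that mPing prefers to insert near genes rests on classifying each insertion site relative to gene annotations. A minimal Python sketch of that classification step follows; it assumes genes on a single chromosome given as 1-based inclusive (start, end) intervals and a hypothetical 1 kb flanking window, which is an illustrative choice rather than a threshold taken from the study. It does not reproduce the authors' pipeline or test for enrichment.

```python
from typing import Dict, List, Tuple

Gene = Tuple[str, int, int]  # (gene_id, start, end), 1-based inclusive, one chromosome


def classify_insertions(sites: List[int], genes: List[Gene],
                        flank: int = 1000) -> Dict[int, str]:
    """Label each insertion position as 'genic', 'flanking', or 'intergenic'.

    'flanking' means within `flank` bp of a gene boundary (assumed window);
    a site inside any gene is 'genic', which takes priority.
    """
    labels: Dict[int, str] = {}
    for site in sites:
        label = "intergenic"
        for _gene_id, start, end in genes:
            if start <= site <= end:
                label = "genic"
                break  # genic overrides any flanking hit
            if start - flank <= site <= end + flank:
                label = "flanking"
        labels[site] = label
    return labels


# Usage example with hypothetical gene models and insertion sites.
genes = [("geneA", 10_000, 12_000), ("geneB", 50_000, 53_000)]
sites = [11_500, 9_400, 30_000, 53_800]
print(classify_insertions(sites, genes))
# {11500: 'genic', 9400: 'flanking', 30000: 'intergenic', 53800: 'flanking'}
```

For genome-scale data, sorting genes and using interval lookups (or an interval tree) would replace the nested loop.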
Affiliation(s)
- Yuki Monden
- Graduate School of Environmental and Life Science, Okayama University, Okayama, Japan
- Hirona Tanaka
- Faculty of Agriculture, Okayama University, Okayama, Japan
- Kiyotaka Yabe
- Faculty of Agriculture, Kyoto University, Kyoto, Japan
- Eri Kimoto
- Graduate School of Agriculture, Kyoto University, Kyoto, Japan
7
Cai X, Lin R, Liang J, King GJ, Wu J, Wang X. Transposable element insertion: a hidden major source of domesticated phenotypic variation in Brassica rapa. Plant Biotechnol J 2022; 20:1298-1310. [PMID: 35278263 PMCID: PMC9241368 DOI: 10.1111/pbi.13807] [Received: 12/27/2021] [Revised: 02/16/2022] [Accepted: 03/01/2022] [Indexed: 05/20/2023]
Abstract
Transposable elements (TEs) are prevalent in plant genomes. However, studies on their impact on phenotypic evolution in crop plants are relatively rare, because systematically identifying TE insertions within a species has been a challenge. Here, we present a novel approach for uncovering TE insertion polymorphisms (TIPs) using pan-genome analysis combined with population-scale resequencing, and we apply this pipeline to retrieve TIPs in a Brassica rapa germplasm collection. We found that 23% of genes within the reference Chiifu-401-42 genome harbored TIPs. TIPs tended to have large transcriptional effects, including modifying gene expression levels and altering gene structure by introducing new introns. Among 524 diverse accessions, TIPs broadly influenced trait-related genes and played a crucial role in the domestication of B. rapa morphotypes. For example, four specific TIP-containing genes were identified as candidates potentially associated with various climatic conditions, promoting the formation of diverse vegetable crops in B. rapa. Our work reveals the hitherto hidden TIPs implicated in agronomic traits and highlights their widespread utility in studies of crop domestication.
Affiliation(s)
- Xu Cai
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Runmao Lin
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Jianli Liang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Graham J. King
- Southern Cross Plant Science, Southern Cross University, Lismore, NSW, Australia
- Jian Wu
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
- Xiaowu Wang
- Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing, China
8
Yan H, Haak DC, Li S, Huang L, Bombarely A. Exploring transposable element-based markers to identify allelic variations underlying agronomic traits in rice. Plant Commun 2022; 3:100270. [PMID: 35576152 PMCID: PMC9251385 DOI: 10.1016/j.xplc.2021.100270] [Received: 02/26/2021] [Revised: 10/29/2021] [Accepted: 12/16/2021] [Indexed: 06/10/2023]
Abstract
Transposable elements (TEs) are a major force in the production of new alleles during domestication; nevertheless, their use in association studies has been limited because of their complexity. We have developed a TE genotyping pipeline (TEmarker) and applied it to whole-genome genome-wide association study (GWAS) data from 176 Oryza sativa subsp. japonica accessions to identify genetic elements associated with specific agronomic traits. TE markers recovered a large proportion (69%) of single-nucleotide polymorphism (SNP)-based GWAS peaks, and these TE peaks retained ca. 25% of the SNPs. The use of TEs in GWASs may reduce false positives associated with linkage disequilibrium (LD) among SNP markers. A genome scan revealed positive selection on TEs associated with agronomic traits. We found several cases of insertion and deletion variants that potentially resulted from the direct action of TEs, including an allele of LOC_Os11g08410 associated with plant height and panicle length traits. Together, these findings reveal the utility of TE markers for connecting genotype to phenotype and suggest a potential role for TEs in influencing phenotypic variations in rice that impact agronomic traits.
Affiliation(s)
- Haidong Yan
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- David C Haak
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA; Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24061, USA
- Song Li
- School of Plant and Environmental Sciences, Virginia Tech, Blacksburg, VA 24061, USA; Graduate Program in Genetics, Bioinformatics and Computational Biology (GBCB), Virginia Tech, Blacksburg, VA 24061, USA
- Linkai Huang
- Department of Grassland Science, Animal Science and Technology College, Sichuan Agricultural University, Chengdu 611130, China
- Aureliano Bombarely
- Department of Bioscience, Università degli Studi di Milano (UNIMI), 20133 Milano, Italy; Instituto de Biología Molecular y Celular de Plantas (IBMCP), UPV-CSIC, 46022 Valencia, Spain
9
Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists automatically scan sequence data for these repeats or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats, and we provide two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field, so the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the machine learning approach, which seeks to build predictors automatically for some TE families from a set of sequences known to belong to each family. A second approach, the linguistic (or syntactic) approach, allows biologists themselves to describe and check the validity of models of their favorite repeat family.
10
Song R, Wang Z, Wang H, Zhang H, Wang X, Nguyen H, Holding D, Yu B, Clemente T, Jia S, Zhang C. InMut-finder: a software tool for insertion identification in mutagenesis using Nanopore long reads. BMC Genomics 2021; 22:908. [PMID: 34923956 PMCID: PMC8684674 DOI: 10.1186/s12864-021-08206-9] [Received: 08/13/2021] [Accepted: 11/24/2021] [Indexed: 11/24/2022]
Abstract
Background Biological mutagens that insert sequences, such as transposons, play a crucial role in linking observed phenotype and genotype in reverse genetic studies. For this reason, accurate and efficient software tools for identifying insertion sites from sequencing reads are needed. Results We developed a bioinformatics tool, InMut-Finder (a Finder for genome-wide Insertions in Mutagenesis), that identifies insertions based on target sequences and flanking sequences from long reads, such as those produced by Oxford Nanopore sequencing. InMut-Finder successfully identified >100 insertion sites in Medicago truncatula and soybean mutants based on sequencing reads of whole-genome DNA or enriched insertion-site DNA fragments. Insertion sites discovered by InMut-Finder were validated by PCR experiments. Conclusion InMut-Finder is a comprehensive and powerful tool for automated insertion detection from Nanopore long reads. Its simplicity, efficiency, and flexibility make it a valuable tool for functional genomics and for forward and reverse genetics. InMut-Finder is implemented in Perl, R, and shell scripts, which are OS-independent. The source code and instructions are available at https://github.com/jsg200830/InMut-Finder. Supplementary Information The online version contains supplementary material available at 10.1186/s12864-021-08206-9.
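The core idea of using a known inserted sequence (the "target sequence") to pull out the genomic flanking sequence from a long read can be sketched in a few lines of Python. This is a simplified illustration, not InMut-Finder's Perl/R/shell implementation: it assumes an exact match to a hypothetical insertion-end sequence, searches both the read and its reverse complement, and returns the bases immediately 5' of each hit. Real Nanopore reads contain sequencing errors, so practical tools rely on alignment rather than exact string matching.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_flanks(read: str, insert_end: str,
                   flank_len: int = 20) -> List[Tuple[str, str]]:
    """Return (strand, flank) pairs for exact hits of `insert_end`.

    The flank is the `flank_len` bases immediately 5' of the insertion end,
    reported in the orientation in which `insert_end` was found.
    """
    hits: List[Tuple[str, str]] = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        start = seq.find(insert_end)
        while start != -1:
            if start >= flank_len:  # skip hits without enough upstream sequence
                hits.append((strand, seq[start - flank_len:start]))
            start = seq.find(insert_end, start + 1)
    return hits


# Usage example: a hypothetical read carrying a genomic flank then the insertion end.
flank = "ACGTTGCAAGGCTTACCGAT"
insert_end = "GGCCAGTCACAATGG"  # hypothetical insertion terminus
read = "TTTT" + flank + insert_end + "AAAA"
print(extract_flanks(read, insert_end))                      # [('+', 'ACGTTGCAAGGCTTACCGAT')]
print(extract_flanks(reverse_complement(read), insert_end))  # [('-', 'ACGTTGCAAGGCTTACCGAT')]
```

The extracted flanks would then be mapped to the reference genome to obtain insertion coordinates, a step omitted here.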
Affiliation(s)
- Rui Song
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
- Ziyao Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
- Hui Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
- Han Zhang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
- Xuemeng Wang
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
- Hanh Nguyen
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
- David Holding
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA; Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA
- Bin Yu
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; School of Biological Sciences, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
- Tom Clemente
- Department of Agronomy and Horticulture, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
- Shangang Jia
- College of Grassland Science and Technology, China Agricultural University, Beijing, 100193, China
- Chi Zhang
- Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE, 68588, USA; School of Biological Sciences, Center for Plant Science Innovation, Beadle Center for Biotechnology, University of Nebraska, Lincoln, NE, 68588, USA
11
Yu T, Huang X, Dou S, Tang X, Luo S, Theurkauf WE, Lu J, Weng Z. A benchmark and an algorithm for detecting germline transposon insertions and measuring de novo transposon insertion frequencies. Nucleic Acids Res 2021; 49:e44. [PMID: 33511407 PMCID: PMC8096211 DOI: 10.1093/nar/gkab010] [Received: 06/23/2020] [Revised: 12/28/2020] [Accepted: 01/06/2021] [Indexed: 02/01/2023]
Abstract
Transposons are genomic parasites, and their new insertions can cause instability and spur the evolution of their host genomes. Rapid accumulation of short-read whole-genome sequencing data provides a great opportunity for studying new transposon insertions and their impacts on the host genome. Although many algorithms are available for detecting transposon insertions, the task remains challenging and existing tools are not designed for identifying de novo insertions. Here, we present a new benchmark fly dataset based on PacBio long-read sequencing and a new method TEMP2 for detecting germline insertions and measuring de novo ‘singleton’ insertion frequencies in eukaryotic genomes. TEMP2 achieves high sensitivity and precision for detecting germline insertions when compared with existing tools using both simulated data in fly and experimental data in fly and human. Furthermore, TEMP2 can accurately assess the frequencies of de novo transposon insertions even with high levels of chimeric reads in simulated datasets; such chimeric reads often occur during the construction of short-read sequencing libraries. By applying TEMP2 to published data on hybrid dysgenic flies inflicted by de-repressed P-elements, we confirmed the continuous new insertions of P-elements in dysgenic offspring before they regain piRNAs for P-element repression. TEMP2 is freely available at Github: https://github.com/weng-lab/TEMP2.
Affiliation(s)
- Tianxiong Yu
- Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Xiao Huang
- Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China
- Shengqian Dou
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Xiaolu Tang
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Shiqi Luo
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- William E Theurkauf
- Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, MA 01605, USA
- Jian Lu
- State Key Laboratory of Protein and Plant Gene Research, Center for Bioinformatics, School of Life Sciences and Peking-Tsinghua Center for Life Sciences, Peking University, Beijing 100871, China
- Zhiping Weng
- Department of Thoracic Surgery, Clinical Translational Research Center, Shanghai Pulmonary Hospital, The School of Life Sciences and Technology, Tongji University, Shanghai 200092, China; Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA 01605, USA
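The abstract separates germline insertions from de novo ‘singleton’ insertions. The intuition behind a per-genome singleton frequency can be sketched in Python under stated assumptions; this is not TEMP2's implementation, and the function name is hypothetical. If a pooled sample contains many haploid genomes and each de novo insertion is private to one of them, a given insertion is usually seen in at most one read, so the number of singleton-supported insertions is roughly (de novo insertions per haploid genome) × (mean coverage); dividing by coverage gives a per-genome rate. Chimeric reads, which the abstract names as a confounder, would inflate the singleton count and are not modelled here.

```python
from typing import Sequence


def singletons_per_haploid_genome(support_counts: Sequence[int], mean_coverage: float) -> float:
    """Hypothetical estimator of de novo insertions per haploid genome.

    support_counts: number of supporting reads for each candidate insertion.
    mean_coverage: average sequencing depth of the sample (reads per site).
    Insertions supported by exactly one read are treated as singletons; with
    coverage c, roughly rate * c singletons are expected, so rate ~ singletons / c.
    Chimeric-read artifacts are NOT filtered out in this sketch.
    """
    if mean_coverage <= 0:
        raise ValueError("mean_coverage must be positive")
    singletons = sum(1 for n in support_counts if n == 1)
    return singletons / mean_coverage


# Illustrative numbers only: six candidate insertions, three seen in one read, 30x depth.
print(singletons_per_haploid_genome([1, 14, 1, 22, 1, 9], 30.0))  # 0.1
```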
12
Chen P, Zhang J. Asexual Experimental Evolution of Yeast Does Not Curtail Transposable Elements. Mol Biol Evol 2021; 38:2831-2842. [PMID: 33720342 PMCID: PMC8233515 DOI: 10.1093/molbev/msab073] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Abstract
Compared with asexual reproduction, sex facilitates the transmission of transposable elements (TEs) from one genome to another, but boosts the efficacy of selection against deleterious TEs. Thus, theoretically, it is unclear whether sex has a positive net effect on TE proliferation. An empirical study concluded that sex is at the root of the evolutionary success of TEs because the yeast TE load was found to decrease rapidly in approximately 1,000 generations of asexual but not sexual experimental evolution. However, this finding contradicts the maintenance of TEs in natural yeast populations, where sexual reproduction occurs extremely infrequently. Here, we show that the purported TE load reduction during asexual experimental evolution is likely an artifact of low genomic sequencing coverage. We observe stable TE loads in both sexual and asexual experimental evolution from multiple yeast data sets with sufficient coverage. To understand the evolutionary dynamics of yeast TEs, we turn to asexual mutation accumulation lines that have been under virtually no selection. We find that both TE transposition and excision rates per generation, but not their difference, tend to be higher in environments where yeast grows more slowly. However, the transposition rate is not significantly higher than the excision rate and the variance of the TE number among natural strains is close to its neutral expectation, suggesting that selection against TEs is at best weak in yeast. We conclude that the yeast TE load is maintained largely by a transposition–excision balance and that the influence of sex remains unclear.
Affiliation(s)
- Piaopiao Chen
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, USA
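The comparison of transposition and excision rates described in the abstract can be illustrated with a standard exact test for two Poisson counts. This is a sketch under stated assumptions, not necessarily the test the authors used: if T transposition events and E excision events are counted over the same number of copy-generations, then under equal rates and conditional on n = T + E, T follows Binomial(n, 0.5), and the one-sided p-value for "transposition exceeds excision" is P(X ≥ T). The function name and the event counts are hypothetical.

```python
from math import comb


def transposition_excess_pvalue(transpositions: int, excisions: int) -> float:
    """One-sided exact p-value for 'transposition rate > excision rate'.

    Assumes both counts are Poisson with the same exposure (copy-generations).
    Under equal rates, conditional on n = T + E, T ~ Binomial(n, 0.5),
    so the p-value is the upper tail P(X >= T).
    """
    n = transpositions + excisions
    if n == 0:
        return 1.0
    upper_tail = sum(comb(n, k) for k in range(transpositions, n + 1))
    return upper_tail / 2 ** n


# Hypothetical event counts from mutation-accumulation lines.
print(round(transposition_excess_pvalue(12, 8), 4))  # 0.2517
```

With 12 transpositions against 8 excisions the p-value is about 0.25, so counts of this size would not show a significant excess of transposition. Conditioning on the total removes the unknown common rate, which is why only the two counts are needed.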
13
Genomic diversity generated by a transposable element burst in a rice recombinant inbred population. Proc Natl Acad Sci U S A 2020; 117:26288-26297. [PMID: 33020276 PMCID: PMC7584900 DOI: 10.1073/pnas.2015736117] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 01/26/2023]
Abstract
Genomes of all characterized higher eukaryotes harbor examples of transposable element (TE) bursts: the rapid amplification of TE copies throughout a genome. Despite their prevalence, understanding how bursts diversify genomes requires the characterization of actively transposing TEs before insertion sites and structural rearrangements have been obscured by selection acting over evolutionary time. In this study, rice recombinant inbred lines (RILs), generated by crossing a bursting accession and the reference Nipponbare accession, were exploited to characterize the spread of the very active Ping/mPing family through a small population and the resulting impact on genome diversity. Comparative sequence analysis of 272 individuals led to the identification of over 14,000 new insertions of the mPing miniature inverted-repeat transposable element (MITE), with no evidence for silencing of the transposase-encoding Ping element. In addition to new insertions, Ping-encoded transposase was found to preferentially catalyze the excision of mPing loci tightly linked to a second mPing insertion. Similarly, structural variations, including deletion of rice exons or regulatory regions, were enriched for those with break points at one or both ends of linked mPing elements. Taken together, these results indicate that structural variations are generated during a TE burst as transposase catalyzes both the high copy numbers needed to distribute linked elements throughout the genome and the DNA cuts at the TE ends known to dramatically increase the frequency of recombination.
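Identifying mPing loci "tightly linked to a second mPing insertion" amounts to a neighbour search along each chromosome. A minimal Python sketch, assuming insertions are supplied as (chromosome, position) tuples; the 5 kb linkage window and the function name are hypothetical choices for illustration, not the criteria used in the study.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable


def linked_pairs(
    insertions: Iterable[tuple[str, int]], max_distance: int = 5_000
) -> list[tuple[str, int, int]]:
    """Return (chrom, pos_a, pos_b) for neighbouring insertions within max_distance bp."""
    by_chrom: dict[str, list[int]] = defaultdict(list)
    for chrom, pos in insertions:
        by_chrom[chrom].append(pos)
    pairs = []
    for chrom in sorted(by_chrom):
        positions = sorted(by_chrom[chrom])
        # Only consecutive positions are compared, so each insertion is paired
        # with its nearest neighbours; non-adjacent members of a cluster are not.
        for a, b in zip(positions, positions[1:]):
            if b - a <= max_distance:
                pairs.append((chrom, a, b))
    return pairs


# Made-up insertion coordinates.
calls = [("Chr1", 10_000), ("Chr1", 12_500), ("Chr1", 40_000), ("Chr2", 500), ("Chr2", 4_800)]
print(linked_pairs(calls))  # [('Chr1', 10000, 12500), ('Chr2', 500, 4800)]
```

Sorting per chromosome keeps the search at O(n log n) instead of comparing every pair among thousands of insertions; the trade-off is that a dense cluster of three or more elements is reported as a chain of adjacent pairs.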
14
Goubert C, Thomas J, Payer LM, Kidd JM, Feusier J, Watkins WS, Burns KH, Jorde LB, Feschotte C. TypeTE: a tool to genotype mobile element insertions from whole genome resequencing data. Nucleic Acids Res 2020; 48:e36. [PMID: 32067044 PMCID: PMC7102983 DOI: 10.1093/nar/gkaa074] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 11/20/2019] [Revised: 01/08/2020] [Accepted: 02/11/2020] [Indexed: 12/12/2022]
Abstract
Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline, TypeTE, which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotypes at >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and is a valuable addition to the population genomics toolbox.
Affiliation(s)
- Clément Goubert
- Department of Molecular Biology and Genetics, 215 Tower Rd, Cornell University, Ithaca, NY 14853, USA
- Jainy Thomas
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Lindsay M Payer
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Jeffrey M Kidd
- Department of Human Genetics, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Julie Feusier
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- W Scott Watkins
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Kathleen H Burns
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Lynn B Jorde
- Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT 84112, USA
- Cédric Feschotte
- Department of Molecular Biology and Genetics, 215 Tower Rd, Cornell University, Ithaca, NY 14853, USA
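The genotype-likelihood step of the abstract (re-mapping reads to reconstructed presence and absence alleles) can be illustrated with the standard biallelic read model used by many genotypers. This is a simplified Python sketch, not TypeTE's code: it assumes each read has already been assigned to the presence or absence allele, that reads are independent, and that one per-read error rate ε applies. For a diploid genotype with g presence alleles (g = 0, 1, 2), a read supports presence with probability p_g = (g/2)(1 − ε) + (1 − g/2)ε. Here 0/0 denotes two absence alleles and 1/1 two presence alleles; the function name and the default ε are hypothetical.

```python
from __future__ import annotations

from math import log10


def genotype_log10_likelihoods(
    presence_reads: int, absence_reads: int, error_rate: float = 0.01
) -> dict[str, float]:
    """log10 P(reads | genotype) for a diploid, biallelic insertion.

    The binomial coefficient is omitted: it is identical for every genotype
    and does not change their ranking.
    """
    if not 0.0 < error_rate < 0.5:
        raise ValueError("error_rate must be in (0, 0.5)")
    likelihoods = {}
    for genotype, presence_copies in (("0/0", 0), ("0/1", 1), ("1/1", 2)):
        frac = presence_copies / 2
        # Probability that a single read supports the presence allele.
        p_presence = frac * (1 - error_rate) + (1 - frac) * error_rate
        likelihoods[genotype] = presence_reads * log10(p_presence) + absence_reads * log10(1 - p_presence)
    return likelihoods


lk = genotype_log10_likelihoods(presence_reads=6, absence_reads=5)
print(max(lk, key=lk.get))  # 0/1
```

Working in log10 space avoids underflow at high depth and matches the scale commonly used for genotype likelihoods in variant files.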
15
Akakpo R, Carpentier MC, Ie Hsing Y, Panaud O. The impact of transposable elements on the structure, evolution and function of the rice genome. THE NEW PHYTOLOGIST 2020; 226:44-49. [PMID: 31797393 DOI: 10.1111/nph.16356] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 09/02/2019] [Accepted: 11/05/2019] [Indexed: 06/10/2023]
Abstract
Transposable elements (TEs) are ubiquitous in plants and are the primary genomic component of the majority of taxa. Knowledge of their impact on the structure, function and evolution of plant genomes is therefore a priority in the field of genomics. Rice, as one of the most prevalent crops for food security worldwide, has been subjected to intense research efforts over recent decades. Consequently, a considerable amount of genomic resources has been generated and made freely available to the scientific community. These can be exploited both to improve our understanding of some basic aspects of genome biology of this species and to develop new concepts for crop improvement. In this review, we describe the current knowledge on how TEs have shaped rice chromosomes and propose a new strategy based on a genome-wide association study (GWAS) to address the important question of their functional impact on this crop.
Affiliation(s)
- Roland Akakpo
- Laboratoire Génome et Développement des Plantes, UMR 5096 CNRS/UPVD, Université de Perpignan Via Domitia, 52 Avenue Paul Alduy, 66860, Perpignan Cedex, France
- Marie-Christine Carpentier
- Laboratoire Génome et Développement des Plantes, UMR 5096 CNRS/UPVD, Université de Perpignan Via Domitia, 52 Avenue Paul Alduy, 66860, Perpignan Cedex, France
- Yue Ie Hsing
- Institute of Plant and Microbial Biology, Academia Sinica, 128, Section 2, Yien-chu-yuan Road, Nankang, 115, Taipei, Taiwan
- Olivier Panaud
- Laboratoire Génome et Développement des Plantes, UMR 5096 CNRS/UPVD, Université de Perpignan Via Domitia, 52 Avenue Paul Alduy, 66860, Perpignan Cedex, France
- Institut Universitaire de France, 1 Rue Descartes, 75231, Paris Cedex 05, France
16
Uzunović J, Josephs EB, Stinchcombe JR, Wright SI. Transposable Elements Are Important Contributors to Standing Variation in Gene Expression in Capsella grandiflora. Mol Biol Evol 2019; 36:1734-1745. [PMID: 31028401 DOI: 10.1093/molbev/msz098] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Indexed: 02/07/2023]
Abstract
Transposable elements (TEs) make up a significant portion of eukaryotic genomes and are important drivers of genome evolution. However, the extent to which TEs affect gene expression variation on a genome-wide scale in comparison with other types of variants is still unclear. We characterized TE insertion polymorphisms and their association with gene expression in 124 whole-genome sequences from a single population of Capsella grandiflora, and contrasted this with the effects of single nucleotide polymorphisms (SNPs). Population frequency of insertions was negatively correlated with distance to genes, as well as density of conserved noncoding elements, suggesting that the negative effects of TEs on gene regulation are important in limiting their abundance. Rare TE variants strongly influence gene expression variation, predominantly through downregulation. In contrast, rare SNPs contribute equally to up- and down-regulation, but have a weaker individual effect than TEs. An expression quantitative trait loci (eQTL) analysis shows that a greater proportion of common TEs are eQTLs as opposed to common SNPs, and a third of the genes with TE eQTLs do not have SNP eQTLs. In contrast with rare TE insertions, common insertions are more likely to increase expression, consistent with recent models of cis-regulatory evolution favoring enhancer alleles. Taken together, these results imply that TEs are a significant contributor to gene expression variation and are individually more likely than rare SNPs to cause extreme changes in gene expression.
Affiliation(s)
- Jasmina Uzunović
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Emily B Josephs
- Department of Plant Biology, Michigan State University, East Lansing, MI
- John R Stinchcombe
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada; Koffler Scientific Reserve, University of Toronto, Toronto, Ontario, Canada
- Stephen I Wright
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada; Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, Ontario, Canada
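A single-marker test of whether a TE insertion is associated with the expression of a nearby gene can be sketched as a least-squares regression of expression on TE dosage (0, 1 or 2 copies). This is a minimal Python illustration under stated assumptions (one gene, one insertion, no covariates or population-structure correction, made-up values) and not the eQTL model used in the study; the function name is hypothetical. A negative slope corresponds to the downregulation the abstract reports for most rare TE variants.

```python
from __future__ import annotations

import math
from typing import Sequence


def single_marker_association(genotypes: Sequence[float], expression: Sequence[float]) -> tuple[float, float]:
    """Return (slope, t statistic) from OLS regression of expression on TE dosage."""
    n = len(genotypes)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_g = sum(genotypes) / n
    mean_e = sum(expression) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0:
        raise ValueError("TE genotype is monomorphic in this sample")
    sxy = sum((g - mean_g) * (e - mean_e) for g, e in zip(genotypes, expression))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_g
    rss = sum((e - intercept - slope * g) ** 2 for g, e in zip(genotypes, expression))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    # A perfect fit (se == 0) gives an infinite t with the sign of the slope.
    t = slope / se if se > 0 else math.copysign(math.inf, slope)
    return slope, t


# Made-up data: TE dosage per individual and expression of a nearby gene.
dosage = [0, 0, 0, 0, 1, 1, 2, 0]
expr = [10.1, 9.8, 10.4, 10.0, 7.9, 8.3, 5.2, 9.9]
slope, t = single_marker_association(dosage, expr)
print(f"slope={slope:.2f}, t={t:.1f}")
```

A p-value would come from a t distribution with n − 2 degrees of freedom; it is omitted here to keep the sketch dependent only on the standard library.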
17
Vendrell-Mir P, Barteri F, Merenciano M, González J, Casacuberta JM, Castanera R. A benchmark of transposon insertion detection tools using real data. Mob DNA 2019; 10:53. [PMID: 31892957 PMCID: PMC6937713 DOI: 10.1186/s13100-019-0197-9] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 06/28/2019] [Accepted: 12/17/2019] [Indexed: 02/01/2023]
Abstract
Background: Transposable elements (TEs) are an important source of genomic variability in eukaryotic genomes. Their activity impacts genome architecture and gene expression and can lead to drastic phenotypic changes. Therefore, identifying TE polymorphisms is key to better understanding the link between genotype and phenotype. However, most genotype-to-phenotype analyses have concentrated on single nucleotide polymorphisms, as they are easier to detect reliably using short-read data. Many bioinformatic tools have been developed to identify transposon insertions from resequencing data using short reads. Nevertheless, the performance of most of these tools has been tested using simulated insertions, which do not accurately reproduce the complexity of natural insertions. Results: We have overcome this limitation by building a dataset of insertions from the comparison of two high-quality rice genomes, followed by extensive manual curation. This dataset contains validated insertions of two very different types of TEs, LTR-retrotransposons and MITEs. Using this dataset, we have benchmarked the sensitivity and precision of 12 commonly used tools, and our results suggest that in general their sensitivity was previously overestimated when using simulated data. Our results also show that increasing coverage leads to better sensitivity but at a cost in precision. Moreover, we found important differences in tool performance, with some tools performing better on a specific type of TEs. We have also used two sets of experimentally validated insertions in Drosophila and humans and show that this trend is maintained in genomes of different size and complexity. Conclusions: We discuss the possible choice of tools depending on the goals of the study and show that the appropriate combination of tools could be an option for most approaches, increasing the sensitivity while maintaining a good precision.
Affiliation(s)
- Pol Vendrell-Mir
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
- Fabio Barteri
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
- Miriam Merenciano
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Passeig Maritim Barceloneta 37-49, 08003 Barcelona, Spain
- Josefa González
- Institute of Evolutionary Biology (CSIC-Universitat Pompeu Fabra), Passeig Maritim Barceloneta 37-49, 08003 Barcelona, Spain
- Josep M Casacuberta
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
- Raúl Castanera
- Centre for Research in Agricultural Genomics CSIC-IRTA-UAB-UB, Campus UAB, Edifici CRAG, Bellaterra, 08193 Barcelona, Spain
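Sensitivity and precision against a curated set of insertions depend on how a predicted call is matched to a true insertion. A minimal Python sketch, assuming calls and curated insertions are (chromosome, position) tuples and that a prediction within a hypothetical 100 bp window of a not-yet-matched true insertion counts as a true positive (TP); the window, the greedy one-to-one matching and the function name are illustrative choices, not necessarily those of the benchmark. Sensitivity = TP / (number of true insertions); precision = TP / (number of predictions).

```python
from __future__ import annotations

from collections import defaultdict
from typing import Sequence, Tuple

Call = Tuple[str, int]  # (chromosome, position)


def evaluate_calls(predicted: Sequence[Call], truth: Sequence[Call], window: int = 100) -> tuple[float, float]:
    """Return (sensitivity, precision) using greedy one-to-one matching within `window` bp."""
    truth_by_chrom: dict[str, list[int]] = defaultdict(list)
    for chrom, pos in truth:
        truth_by_chrom[chrom].append(pos)
    matched: set[tuple[str, int]] = set()  # (chrom, index) of true insertions already used
    true_positives = 0
    for chrom, pos in sorted(predicted):
        best = None  # (index, distance) of the closest unmatched true insertion
        for i, true_pos in enumerate(truth_by_chrom.get(chrom, [])):
            dist = abs(true_pos - pos)
            if (chrom, i) not in matched and dist <= window and (best is None or dist < best[1]):
                best = (i, dist)
        if best is not None:
            matched.add((chrom, best[0]))
            true_positives += 1
    sensitivity = true_positives / len(truth) if truth else 0.0
    precision = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, precision


# Made-up curated set and calls from two hypothetical tools.
truth = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
tool_a = [("chr1", 1020), ("chr2", 900)]
tool_b = [("chr1", 4990), ("chr2", 310)]
print(evaluate_calls(tool_a, truth))
print(evaluate_calls(tool_b, truth))
print(evaluate_calls(sorted(set(tool_a) | set(tool_b)), truth))  # (1.0, 0.75)
```

Taking the union of the two tools raises sensitivity to 1.0 while precision drops to 0.75, mirroring the sensitivity/precision trade-off the abstract describes for tool combinations. A naive union of exact coordinates can also contain near-duplicate calls of one insertion, which this one-to-one matching would count as false positives.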
18
Bourgeois Y, Boissinot S. On the Population Dynamics of Junk: A Review on the Population Genomics of Transposable Elements. Genes (Basel) 2019; 10:419. [PMID: 31151307 PMCID: PMC6627506 DOI: 10.3390/genes10060419] [Citation(s) in RCA: 67] [Impact Index Per Article: 13.4] [Received: 04/04/2019] [Revised: 05/05/2019] [Accepted: 05/21/2019] [Indexed: 01/18/2023]
Abstract
Transposable elements (TEs) play an important role in shaping genomic organization and structure, and may cause dramatic changes in phenotypes. Despite the genetic load they may impose on their host and their importance in microevolutionary processes such as adaptation and speciation, the number of population genetics studies focused on TEs has been rather limited so far compared to single nucleotide polymorphisms (SNPs). Here, we review the current knowledge about the dynamics of transposable elements at recent evolutionary time scales, and discuss the mechanisms that condition their abundance and frequency. We first discuss non-adaptive mechanisms such as purifying selection and the variable rates of transposition and elimination, then focus on positive and balancing selection, and finally consider the potential role of TEs in causing genomic incompatibilities and eventually speciation. We also suggest possible ways to better model TE dynamics in a population genomics context by incorporating recent advances in TEs into the rich information provided by SNPs about the demography, selection, and intrinsic properties of genomes.
Affiliation(s)
- Yann Bourgeois
- New York University Abu Dhabi, P.O. 129188, Saadiyat Island, Abu Dhabi, United Arab Emirates.
- Stéphane Boissinot
- New York University Abu Dhabi, P.O. 129188, Saadiyat Island, Abu Dhabi, United Arab Emirates.
19
Tracking the origin of two genetic components associated with transposable element bursts in domesticated rice. Nat Commun 2019; 10:641. [PMID: 30733435 PMCID: PMC6367367 DOI: 10.1038/s41467-019-08451-3] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 09/13/2018] [Accepted: 01/09/2019] [Indexed: 11/08/2022]
Abstract
Transposable elements (TEs) shape genome evolution through periodic bursts of amplification. In this study, prior knowledge of the mPing/Ping/Pong TE family is exploited to track their copy numbers and distribution in genome sequences from 3,000 accessions of domesticated Oryza sativa (rice) and the wild progenitor Oryza rufipogon. We find that mPing bursts are restricted to recent domestication and are likely due to the accumulation of two TE components, Ping16A and Ping16A_Stow, that appear to be critical for mPing hyperactivity. Ping16A is a variant of the autonomous element with reduced activity, as shown in a yeast transposition assay. Transposition of Ping16A into a Stowaway element generated Ping16A_Stow, the only Ping locus shared by all bursting accessions, which we show correlates with high mPing copy numbers. Finally, we show that sustained activity of the mPing/Ping family in domesticated rice, not the loss of epigenetic regulation, produced the components necessary for mPing bursts.
20
McClintock: An Integrated Pipeline for Detecting Transposable Element Insertions in Whole-Genome Shotgun Sequencing Data. G3-GENES GENOMES GENETICS 2017. [PMID: 28637810 PMCID: PMC5555480 DOI: 10.1534/g3.117.043893] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.9] [Indexed: 12/31/2022]
Abstract
Transposable element (TE) insertions are among the most challenging types of variants to detect in genomic data because of their repetitive nature and complex mechanisms of replication. Nevertheless, the recent availability of large resequencing data sets has spurred the development of many new methods to detect TE insertions in whole-genome shotgun sequences. Here we report an integrated bioinformatics pipeline for the detection of TE insertions in whole-genome shotgun data, called McClintock (https://github.com/bergmanlab/mcclintock), which automatically runs and standardizes output for multiple TE detection methods. We demonstrate the utility of McClintock by evaluating six TE detection methods using simulated and real genome data from the model microbial eukaryote, Saccharomyces cerevisiae. We find substantial variation among McClintock component methods in their ability to detect nonreference TEs in the yeast genome, but show that nonreference TEs at nearly all biologically realistic locations can be detected in simulated data by combining multiple methods that use split-read and read-pair evidence. In general, our results reveal that split-read methods detect fewer nonreference TE insertions than read-pair methods, but have much higher positional accuracy. Analysis of a large sample of real yeast genomes reveals that most McClintock component methods can recover known aspects of TE biology in yeast such as the transpositional activity status of families, target preferences, and target site duplication structure, albeit with varying levels of accuracy. Our work provides a general framework for integrating and analyzing results from multiple TE detection methods, as well as useful guidance for researchers studying TEs in yeast resequencing data.
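Integrating the standardized outputs of several detection methods requires deciding when calls from different methods describe the same insertion. A minimal Python sketch, not McClintock's own logic: it assumes each method's output has been reduced to (chromosome, position, TE family) records, chains records of the same family whose positions lie within a hypothetical 50 bp of the previous record in the cluster (single linkage), and reports which methods support each cluster. The function name, method names and calls are placeholders.

```python
from __future__ import annotations

from collections import defaultdict


def merge_calls(
    calls_by_method: dict[str, list[tuple[str, int, str]]], window: int = 50
) -> list[tuple[str, int, int, str, list[str]]]:
    """Cluster calls of the same TE family whose positions chain within `window` bp.

    Returns (chromosome, start, end, family, supporting_methods) per cluster.
    """
    grouped: dict[tuple[str, str], list[tuple[int, str]]] = defaultdict(list)
    for method, calls in calls_by_method.items():
        for chrom, pos, family in calls:
            grouped[(chrom, family)].append((pos, method))
    clusters = []
    for (chrom, family), records in sorted(grouped.items()):
        records.sort()
        start = end = records[0][0]
        methods = {records[0][1]}
        for pos, method in records[1:]:
            if pos - end <= window:
                # Single linkage: extend the current cluster to this call.
                end = pos
                methods.add(method)
            else:
                clusters.append((chrom, start, end, family, sorted(methods)))
                start = end = pos
                methods = {method}
        clusters.append((chrom, start, end, family, sorted(methods)))
    return clusters


# Placeholder method names and made-up yeast calls.
calls = {
    "method_a": [("chrIV", 1000, "Ty1"), ("chrXII", 8000, "Ty2")],
    "method_b": [("chrIV", 1010, "Ty1")],
    "method_c": [("chrIV", 1030, "Ty1"), ("chrIV", 60000, "Ty3")],
}
merged = merge_calls(calls)
consensus = [c for c in merged if len(c[4]) >= 2]  # supported by at least two methods
print(consensus)  # [('chrIV', 1000, 1030, 'Ty1', ['method_a', 'method_b', 'method_c'])]
```

Requiring support from several methods trades sensitivity for precision; keeping every cluster instead reflects the abstract's observation that combining split-read and read-pair methods recovers more nonreference insertions.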
21
Zhang S, Kelleher ES. Targeted identification of TE insertions in a Drosophila genome through hemi-specific PCR. Mob DNA 2017; 8:10. [PMID: 28775768 PMCID: PMC5534036 DOI: 10.1186/s13100-017-0092-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/14/2017] [Accepted: 07/10/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Transposable elements (TEs) are major components of eukaryotic genomes and drivers of genome evolution, producing intraspecific polymorphism and interspecific differences through mobilization and non-homologous recombination. TE insertion sites are often highly variable within species, creating a need for targeted genome re-sequencing (TGS) methods to identify TE insertion sites. METHODS: We present a hemi-specific PCR approach for TGS of P-elements in Drosophila genomes on the Illumina platform. We also present a computational framework for identifying new insertions from TGS reads. Finally, we describe a new method for estimating the frequency of TE insertions from WGS data, which is based on precise insertion sites provided by TGS annotations. RESULTS: By comparing our results to TE annotations based on whole genome re-sequencing (WGS) data for the same Drosophila melanogaster strain, we demonstrate that TGS is powerful for identifying true insertions, even in repeat-rich heterochromatic regions. We also demonstrate that TGS offers enhanced annotation of precise insertion sites, which facilitates estimation of TE insertion frequency. CONCLUSIONS: TGS by hemi-specific PCR is a powerful approach for identifying TE insertions of particular TE families in species with a high-quality reference genome, at greatly reduced cost as compared to WGS. It may therefore be ideal for population genomic studies of particular TE families. Additionally, TGS and WGS can be used as complementary approaches, with TGS annotations identifying more annotated insertions with greater precision for a target TE family, and WGS data allowing for estimates of TE insertion frequencies and a broader picture of the location of non-target TEs across the genome.
Affiliation(s)
- Shuo Zhang
- Department of Biology and Biochemistry, University of Houston, 3455 Cullen Blvd. Suite 342, Houston, TX 77204 USA
- Erin S. Kelleher
- Department of Biology and Biochemistry, University of Houston, 3455 Cullen Blvd. Suite 342, Houston, TX 77204 USA
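Once TGS has pinned down a precise insertion site, its frequency in WGS data can be illustrated by comparing reads that support the TE-genome junctions with reads that span the empty site. A minimal Python sketch, assuming these read counts have already been extracted for each annotated site; the ratio below is a common approximation and not necessarily the estimator the authors describe, and the function and site names are hypothetical. One simple convention, assumed here, is to average the two junction counts so that the insertion allele and the empty allele are each sampled at a single point.

```python
from typing import Optional


def insertion_frequency(left_junction_reads: int, right_junction_reads: int, spanning_reads: int) -> Optional[float]:
    """Estimated fraction of sampled chromosomes carrying the insertion at one site.

    left/right_junction_reads: reads supporting each TE-genome junction.
    spanning_reads: reads crossing the site with no insertion (empty allele).
    Returns None when the site has no informative reads.
    """
    junction = (left_junction_reads + right_junction_reads) / 2  # average of both ends
    total = junction + spanning_reads
    if total == 0:
        return None
    return junction / total


# Hypothetical P-element sites: (left junction, right junction, spanning) read counts.
sites = {"site_1": (18, 18, 2), "site_2": (3, 3, 27)}
for name, counts in sites.items():
    print(name, insertion_frequency(*counts))
```

Averaging rather than summing the junction counts avoids doubling the weight of the insertion allele, which would otherwise bias frequencies upward at every site.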