1
Garza AB, Lerat E, Girgis HZ. Look4LTRs: a Long terminal repeat retrotransposon detection tool capable of cross species studies and discovering recently nested repeats. Mob DNA 2024; 15:8. [PMID: 38627766 PMCID: PMC11020628 DOI: 10.1186/s13100-024-00317-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Accepted: 03/08/2024] [Indexed: 04/20/2024] Open
Abstract
Plant genomes include large numbers of transposable elements. One particular type of these elements is flanked by two Long Terminal Repeats (LTRs) and can translocate using RNA. Such elements are known as LTR-retrotransposons; they are the most abundant type of transposons in plant genomes. They have many important functions involving gene regulation and the rise of new genes and pseudogenes in response to severe stress. Additionally, LTR-retrotransposons have several applications in biotechnology. Due to the abundance and the importance of LTR-retrotransposons, multiple computational tools have been developed for their detection. However, none of these tools takes advantage of the availability of related genomes; they process one chromosome at a time. Further, recently nested LTR-retrotransposons (multiple elements of the same family inserted into each other) cannot be annotated accurately, or at all, by the currently available tools. Motivated to overcome these two limitations, we built Look4LTRs, which can annotate LTR-retrotransposons in multiple related genomes simultaneously and discover recently nested elements. The methodology of Look4LTRs depends on techniques imported from the signal-processing field, graph algorithms, and machine learning with a minimal use of alignment algorithms. Four plant genomes were used in developing Look4LTRs and eight plant genomes for evaluating it in comparison with three related tools. Look4LTRs is the fastest while maintaining better or comparable F1 scores (the harmonic mean of recall and precision) to those obtained by the other tools. Our results demonstrate the added benefit of annotating LTR-retrotransposons in multiple related genomes simultaneously and the ability to discover recently nested elements. Expert manual examination of six elements not included in the ground truth revealed that three elements belong to known families and two elements are likely from new families. With respect to examining recently nested LTR-retrotransposons, three out of five were confirmed to be valid elements. With its speed, accuracy, and novel features, Look4LTRs represents a true advancement in the annotation of LTR-retrotransposons, opening the door to many studies focused on understanding their functions in plants.
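The F1 score mentioned in the abstract can be illustrated with a minimal sketch. The function name `f1_score` and the counts used below are hypothetical and are not taken from the paper; the sketch only shows the standard harmonic-mean formula applied to true-positive, false-positive, and false-negative counts.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the harmonic mean of precision and recall (0.0 when undefined)."""
    predicted = true_positives + false_positives  # elements the tool reported
    actual = true_positives + false_negatives     # elements in the ground truth
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(f1_score(true_positives=8, false_positives=2, false_negatives=2))
```

Because the harmonic mean is pulled toward the smaller of the two values, a tool cannot reach a high F1 score by maximizing recall at the expense of precision, or the reverse.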
Affiliation(s)
- Anthony B Garza
  - Bioinformatics Toolsmith Laboratory, Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, Texas, USA
- Emmanuelle Lerat
  - The Biometrics and Evolutionary Biology Laboratory, University Lyon 1, Lyon, France
- Hani Z Girgis
  - Bioinformatics Toolsmith Laboratory, Department of Electrical Engineering and Computer Science, Texas A&M University-Kingsville, Kingsville, Texas, USA
2
Feldmeyer B, Bornberg-Bauer E, Dohmen E, Fouks B, Heckenhauer J, Huylmans AK, Jones ARC, Stolle E, Harrison MC. Comparative Evolutionary Genomics in Insects. Methods Mol Biol 2024; 2802:473-514. [PMID: 38819569 DOI: 10.1007/978-1-0716-3838-5_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/01/2024]
Abstract
Genome sequencing quality, in terms of both read length and accuracy, is constantly improving. By combining long-read sequencing technologies with various scaffolding techniques, chromosome-level genome assemblies are now achievable at an affordable price for non-model organisms. Insects represent an exciting taxon for studying the genomic underpinnings of evolutionary innovations, due to ancient origins, immense species-richness, and broad phenotypic diversity. Here we summarize some of the most important methods for carrying out a comparative genomics study on insects. We describe available tools and offer concrete tips on all stages of such an endeavor from DNA extraction through genome sequencing, annotation, and several evolutionary analyses. Along the way we describe important insect-specific aspects, such as DNA extraction difficulties or gene families that are particularly difficult to annotate, and offer solutions. We describe results from several examples of comparative genomics analyses on insects to illustrate the fascinating questions that can now be addressed in this new age of genomics research.
Affiliation(s)
- Barbara Feldmeyer
  - Senckenberg Biodiversity and Climate Research Centre (SBiK-F), Molecular Ecology, Frankfurt, Germany
- Erich Bornberg-Bauer
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
  - Department of Protein Evolution, Max Planck Institute for Developmental Biology, Tübingen, Germany
- Elias Dohmen
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Bertrand Fouks
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Jacqueline Heckenhauer
  - LOEWE Centre for Translational Biodiversity Genomics (LOEWE-TBG), Frankfurt, Germany
  - Department of Terrestrial Zoology, Senckenberg Research Institute and Natural History Museum Frankfurt, Frankfurt, Germany
- Ann Kathrin Huylmans
  - Institute of Organismic and Molecular Evolution, Johannes Gutenberg University, Mainz, Germany
- Alun R C Jones
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
- Eckart Stolle
  - Museum Koenig, Leibniz Institute for the Analysis of Biodiversity Change (LIB), Bonn, Germany
- Mark C Harrison
  - Institute for Evolution and Biodiversity, University of Münster, Münster, Germany
3
Ouadi S, Sierro N, Kessler F, Ivanov NV. Chromosome-scale assemblies of S. malaccense, S. aqueum, S. jambos, and S. syzygioides provide insights into the evolution of Syzygium genomes. Front Plant Sci 2023; 14:1248780. [PMID: 37868305 PMCID: PMC10587690 DOI: 10.3389/fpls.2023.1248780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/27/2023] [Accepted: 08/28/2023] [Indexed: 10/24/2023]
Abstract
Syzygium is a large and diverse tree genus in the Myrtaceae family. Genome assemblies for clove (Syzygium aromaticum, 370 Mb) and sea apple (Syzygium grande, 405 Mb) provided the first insights into the genomic features and evolution of the Syzygium genus. Here, we present additional de novo chromosome-scale genome assemblies for Syzygium malaccense, Syzygium aqueum, Syzygium jambos, and Syzygium syzygioides. Genome profiling analyses show that S. malaccense, like S. aromaticum and S. grande, is diploid (2n = 2x = 22), while the S. aqueum, S. jambos, and S. syzygioides specimens are autotetraploid (2n = 4x = 44). The genome assemblies of S. malaccense (430 Mb), S. aqueum (392 Mb), S. jambos (426 Mb), and S. syzygioides (431 Mb) are highly complete (BUSCO scores of 98%). Comparative genomics analyses showed conserved organization of the 11 chromosomes with S. aromaticum and S. grande, and revealed species-specific evolutionary dynamics of the long terminal repeat retrotransposon elements belonging to the Gypsy and Copia lineages. This set of Syzygium genomes is a valuable resource for future structural and functional comparative genomic studies on Myrtaceae species.
Affiliation(s)
- Sonia Ouadi
  - Faculty of Sciences, Laboratory of Plant Physiology, University of Neuchâtel, Neuchâtel, Switzerland
  - Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Nicolas Sierro
  - Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
- Felix Kessler
  - Faculty of Sciences, Laboratory of Plant Physiology, University of Neuchâtel, Neuchâtel, Switzerland
- Nikolai V Ivanov
  - Faculty of Sciences, Laboratory of Plant Physiology, University of Neuchâtel, Neuchâtel, Switzerland
  - Philip Morris International R&D, Philip Morris Products S.A., Neuchâtel, Switzerland
4
Liao X, Zhu W, Zhou J, Li H, Xu X, Zhang B, Gao X. Repetitive DNA sequence detection and its role in the human genome. Commun Biol 2023; 6:954. [PMID: 37726397 PMCID: PMC10509279 DOI: 10.1038/s42003-023-05322-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/29/2023] [Accepted: 09/04/2023] [Indexed: 09/21/2023] Open
Abstract
Repetitive DNA sequences play critical roles in driving evolution, inducing variation, and regulating gene expression. In this review, we summarize the definition, arrangement, and structural characteristics of repeats. We also introduce the diverse biological functions of repeats and review existing methods for automatic repeat detection, classification, and masking. Finally, we analyze the type, structure, and regulation of repeats in the human genome and their role in the induction of complex diseases. We believe that this review will facilitate a comprehensive understanding of repeats and provide guidance for repeat annotation and in-depth exploration of their association with human diseases.
Affiliation(s)
- Xingyu Liao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Wufei Zhu
  - Department of Endocrinology, Yichang Central People's Hospital, The First College of Clinical Medical Science, China Three Gorges University, 443000, Yichang, P.R. China
- Juexiao Zhou
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Haoyang Li
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xiaopeng Xu
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Bin Zhang
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
- Xin Gao
  - Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology (KAUST), Thuwal, 23955, Saudi Arabia
5
Voelker WG, Krishnan K, Chougule K, Alexander LC, Lu Z, Olson A, Ware D, Songsomboon K, Ponce C, Brenton ZW, Boatwright JL, Cooper EA. Ten new high-quality genome assemblies for diverse bioenergy sorghum genotypes. Front Plant Sci 2023; 13:1040909. [PMID: 36684744 PMCID: PMC9846640 DOI: 10.3389/fpls.2022.1040909] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2022] [Accepted: 12/09/2022] [Indexed: 06/17/2023]
Abstract
INTRODUCTION: Sorghum (Sorghum bicolor (L.) Moench) is an agriculturally and economically important staple crop that has immense potential as a bioenergy feedstock due to its relatively high productivity on marginal lands. To capitalize on and further improve sorghum as a potential source of sustainable biofuel, it is essential to understand the genomic mechanisms underlying complex traits related to yield, composition, and environmental adaptations.
METHODS: Expanding on a recently developed mapping population, we generated de novo genome assemblies for 10 parental genotypes from this population and identified a comprehensive set of over 24 thousand large structural variants (SVs) and over 10.5 million single nucleotide polymorphisms (SNPs).
RESULTS: We show that SVs and nonsynonymous SNPs are enriched in different gene categories, emphasizing the need for long read sequencing in crop species to identify novel variation. Furthermore, we highlight SVs and SNPs occurring in genes and pathways with known associations to critical bioenergy-related phenotypes and characterize the landscape of genetic differences between sweet and cellulosic genotypes.
DISCUSSION: These resources can be integrated into both ongoing and future mapping and trait discovery for sorghum and its myriad uses including food, feed, bioenergy, and increasingly as a carbon dioxide removal mechanism.
Affiliation(s)
- William G. Voelker
  - Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
  - North Carolina Research Campus, Kannapolis, NC, United States
- Krittika Krishnan
  - Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
  - North Carolina Research Campus, Kannapolis, NC, United States
- Kapeel Chougule
  - Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
- Louie C. Alexander
  - Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
  - North Carolina Research Campus, Kannapolis, NC, United States
- Zhenyuan Lu
  - Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
- Andrew Olson
  - Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
- Doreen Ware
  - Cold Spring Harbor Research Laboratory, Cold Spring Harbor, NY, United States
  - United States Department of Agriculture - Agricultural Research Service in the North Atlantic Area (USDA-ARS NAA), Robert W. Holley Center for Agriculture and Health, Ithaca, NY, United States
- Kittikun Songsomboon
  - Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
  - North Carolina Research Campus, Kannapolis, NC, United States
- Cristian Ponce
  - Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
  - North Carolina Research Campus, Kannapolis, NC, United States
- Zachary W. Brenton
  - Carolina Seed Systems, Darlington, SC, United States
  - Advanced Plant Technology, Clemson University, Clemson, SC, United States
- J. Lucas Boatwright
  - Advanced Plant Technology, Clemson University, Clemson, SC, United States
  - Dept. of Plant and Environmental Sciences, Clemson University, Clemson, SC, United States
- Elizabeth A. Cooper
  - Dept. of Bioinformatics & Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States
  - North Carolina Research Campus, Kannapolis, NC, United States
6
Lexa M, Cechova M, Nguyen SH, Jedlicka P, Tokan V, Kubat Z, Hobza R, Kejnovsky E. HiC-TE: a computational pipeline for Hi-C data analysis to study the role of repeat family interactions in the genome 3D organization. Bioinformatics 2022; 38:4030-4032. [PMID: 35781332 DOI: 10.1093/bioinformatics/btac442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2022] [Revised: 06/14/2022] [Accepted: 06/30/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION: The role of repetitive DNA in the 3D organization of the interphase nucleus is a subject of intensive study. In studies of 3D nucleus organization, mutual contacts of various loci can be identified by Hi-C sequencing. Typical analyses use binning of read pairs by location to reduce noise. We use binning by repeat families instead to make similar conclusions about repeat regions.
RESULTS: To achieve this, we combined Hi-C data, reference genome data and tools for repeat analysis into a Nextflow pipeline identifying and quantifying the contacts of specific repeat families. As an output, our pipeline produces heatmaps showing contact frequency and circular diagrams visualizing repeat contact localization. Using our pipeline with tomato data, we revealed the preferential homotypic interactions of ribosomal DNA, centromeric satellites and some LTR retrotransposon families and, as expected, little contact between organellar and nuclear DNA elements. While the pipeline can be applied to any eukaryotic genome, results in plants provide better coverage, since the built-in TE-greedy-nester software only detects tandems and LTR retrotransposons. Other repeats can be fed via GFF3 files. This pipeline represents a novel and reproducible way to analyze the role of repetitive elements in the 3D organization of genomes.
AVAILABILITY AND IMPLEMENTATION: https://gitlab.fi.muni.cz/lexa/hic-te/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Matej Lexa
  - Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic
- Monika Cechova
  - Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic
- Son Hoang Nguyen
  - Faculty of Informatics, Masaryk University, 60200 Brno, Czech Republic
- Pavel Jedlicka
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic
- Viktor Tokan
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic
- Zdenek Kubat
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic
- Roman Hobza
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic
- Eduard Kejnovsky
  - Department of Plant Developmental Genetics, Institute of Biophysics of the Czech Academy of Sciences, 61200 Brno, Czech Republic
7
Orozco-Arias S, Candamil-Cortes MS, Jaimes PA, Valencia-Castrillon E, Tabares-Soto R, Isaza G, Guyot R. Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning. J Integr Bioinform 2022; 19:jib-2021-0036. [PMID: 35822734 PMCID: PMC9521825 DOI: 10.1515/jib-2021-0036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 06/10/2022] [Indexed: 11/19/2022] Open
Abstract
Transposable elements are mobile sequences that can move and insert themselves into chromosomes. They are activated by internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task to understand key aspects of organisms such as phenotype variability, species evolution, and genome size, among others. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA information. To annotate these elements, a reference library is usually created and then curated to eliminate TE fragments and false positives, after which the elements are annotated in the genome using the homology method. However, the curation process can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics tools. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, where bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Paula A Jaimes
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
- Reinel Tabares-Soto
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
- Gustavo Isaza
  - Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Romain Guyot
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
  - Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
8
Ouadi S, Sierro N, Goepfert S, Bovet L, Glauser G, Vallat A, Peitsch MC, Kessler F, Ivanov NV. The clove (Syzygium aromaticum) genome provides insights into the eugenol biosynthesis pathway. Commun Biol 2022; 5:684. [PMID: 35810198 PMCID: PMC9271057 DOI: 10.1038/s42003-022-03618-z] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/19/2020] [Accepted: 06/22/2022] [Indexed: 11/09/2022] Open
Abstract
The clove (Syzygium aromaticum) is an important tropical spice crop in global trade. Evolving environmental pressures necessitate modern characterization and selection techniques that are currently inaccessible to clove growers owing to the scarcity of genomic and genetic information. Here, we present a 370-Mb high-quality chromosome-scale genome assembly for clove. Comparative genomic analysis between S. aromaticum and Eucalyptus grandis—both species of the Myrtaceae family—reveals good genome structure conservation and intrachromosomal rearrangements on seven of the eleven chromosomes. We report genes that belong to families involved in the biosynthesis of eugenol, the major bioactive component of clove products. On the basis of our transcriptomic and metabolomic findings, we propose a hypothetical scenario in which eugenol acetate plays a key role in high eugenol accumulation in clove leaves and buds. The clove genome is a new contribution to omics resources for the Myrtaceae family and an important tool for clove research. A newly assembled clove genome is compared with E. grandis to investigate genome evolution between the two genera of the Myrtaceae family, and putative genes involved in the biosynthesis of eugenol are identified through transcriptomics and metabolomics.
Affiliation(s)
- Sonia Ouadi
  - Faculty of Sciences, Laboratory of Plant Physiology, University of Neuchâtel, 2000, Neuchâtel, Switzerland
  - PMI R&D, Philip Morris Products S. A, Quai Jeanrenaud 5, CH-2000, Neuchâtel, Switzerland
- Nicolas Sierro
  - PMI R&D, Philip Morris Products S. A, Quai Jeanrenaud 5, CH-2000, Neuchâtel, Switzerland
- Simon Goepfert
  - PMI R&D, Philip Morris Products S. A, Quai Jeanrenaud 5, CH-2000, Neuchâtel, Switzerland
- Lucien Bovet
  - PMI R&D, Philip Morris Products S. A, Quai Jeanrenaud 5, CH-2000, Neuchâtel, Switzerland
- Gaetan Glauser
  - Faculty of Sciences, Neuchâtel Platform of Analytical Chemistry, University of Neuchâtel, 2000, Neuchâtel, Switzerland
- Armelle Vallat
  - Faculty of Sciences, Neuchâtel Platform of Analytical Chemistry, University of Neuchâtel, 2000, Neuchâtel, Switzerland
- Manuel C Peitsch
  - PMI R&D, Philip Morris Products S. A, Quai Jeanrenaud 5, CH-2000, Neuchâtel, Switzerland
- Felix Kessler
  - Faculty of Sciences, Laboratory of Plant Physiology, University of Neuchâtel, 2000, Neuchâtel, Switzerland
- Nikolai V Ivanov
  - PMI R&D, Philip Morris Products S. A, Quai Jeanrenaud 5, CH-2000, Neuchâtel, Switzerland
9
Hu Y, Wu X, Jin G, Peng J, Leng R, Li L, Gui D, Fan C, Zhang C. Rapid Genome Evolution and Adaptation of Thlaspi arvense Mediated by Recurrent RNA-Based and Tandem Gene Duplications. Front Plant Sci 2021; 12:772655. [PMID: 35058947 PMCID: PMC8764390 DOI: 10.3389/fpls.2021.772655] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/22/2021] [Accepted: 11/09/2021] [Indexed: 05/21/2023]
Abstract
Retrotransposons are the most abundant group of transposable elements (TEs) in plants, providing an extraordinarily versatile source of genetic variation. Thlaspi arvense, a close relative of the model plant Arabidopsis thaliana with worldwide distribution, thrives from sea level to above 4,000 m elevation in the Qinghai-Tibet Plateau (QTP), China. Its strong adaptability renders it an ideal model system for studying plant adaptation in extreme environments. However, how retrotransposons affect T. arvense genome evolution and adaptation is largely unknown. We report a high-quality chromosome-scale genome assembly of T. arvense with a scaffold N50 of 59.10 Mb. Long terminal repeat retrotransposons (LTR-RTs) account for 56.94% of the genome assembly, and the Gypsy superfamily is the most abundant TE group. The amplification of LTR-RTs in the last six million years primarily contributed to the genome size expansion in T. arvense. We identified 351 retrogenes and 303 genes flanked by LTRs. A comparative analysis showed that orthogroups containing those retrogenes and genes flanked by LTRs have a higher percentage of significantly expanded orthogroups (SEOs), and these SEOs possess more recent tandem duplicated genes. Together, these results indicate that RNA-based gene duplication (retroduplication) accelerated the subsequent tandem duplication of homologous genes, resulting in family expansions. These expanded gene families were implicated in plant growth, development, and stress responses, and were one of the pivotal factors in T. arvense's adaptation to the harsh environment in the QTP regions. In conclusion, the high-quality assembly of the T. arvense genome provides insights into the retroduplication-mediated mechanism of plant adaptation to extreme environments.
Affiliation(s)
- Yanting Hu
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - University of Chinese Academy of Sciences, Beijing, China
- Xiaopei Wu
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - University of Chinese Academy of Sciences, Beijing, China
- Guihua Jin
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - University of Chinese Academy of Sciences, Beijing, China
- Junchu Peng
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - Key Laboratory for Plant Diversity and Biogeography of East Asia, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- Rong Leng
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - University of Chinese Academy of Sciences, Beijing, China
- Ling Li
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - University of Chinese Academy of Sciences, Beijing, China
- Daping Gui
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
- Chuanzhu Fan
  - Department of Biological Sciences, Wayne State University, Detroit, MI, United States
  - Chuanzhu Fan,
- Chengjun Zhang
  - Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - Haiyan Engineering & Technology Center, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, China
  - *Correspondence: Chengjun Zhang,