1
Mokhtar MM, Alsamman AM, El Allali A. PlantLTRdb: An interactive database for 195 plant species LTR-retrotransposons. Front Plant Sci 2023; 14:1134627. [PMID: 36950350 PMCID: PMC10025401 DOI: 10.3389/fpls.2023.1134627] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/30/2022] [Accepted: 02/16/2023] [Indexed: 05/29/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a large group of transposable elements that replicate through an RNA intermediate and alter genome structure. The activities of LTR-RTs in plant genomes provide helpful information about genome evolution and gene function. LTR-RTs near or within genes can directly alter gene function. This work introduces PlantLTRdb, an intact LTR-RT database for 195 plant species. Using homology- and de novo structure-based methods, a total of 150.18 Gbp representing 3,079,469 pseudomolecules/scaffolds were analyzed to identify, characterize, and annotate LTR-RTs, estimate insertion ages, detect LTR-RT-gene chimeras, and determine nearby genes. Accordingly, 520,194 intact LTR-RTs were discovered, including 29,462 autonomous and 490,732 nonautonomous LTR-RTs. The autonomous LTR-RTs included 10,286 Gypsy and 19,176 Copia, while the nonautonomous were divided into 224,906 Gypsy, 218,414 Copia, 1,768 BARE-2, 3,147 TR-GAG and 42,497 unknown. Analysis of the identified LTR-RTs located within genes showed that a total of 36,236 LTR-RTs were LTR-RT-gene chimeras and 11,619 LTR-RTs were within pseudo-genes. In addition, 50,026 genes are within 1 kbp of LTR-RTs, and 250,587 are at a distance of 1 to 10 kbp from LTR-RTs. PlantLTRdb allows researchers to search, visualize, BLAST and analyze plant LTR-RTs. PlantLTRdb can contribute to the understanding of structural variations, genome organization, functional genomics, and the development of LTR-RT target markers for molecular plant breeding. PlantLTRdb is available at https://bioinformatics.um6p.ma/PlantLTRdb.
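The distance categories mentioned in the abstract (LTR-RTs overlapping genes, genes within 1 kbp, and genes 1 to 10 kbp away) can be illustrated with a minimal Python sketch. This is not PlantLTRdb's implementation: the function names, the interval representation (1-based inclusive coordinates on a single sequence), and the bin labels are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 1-based inclusive, start <= end.
Interval = Tuple[int, int]


def gap(a: Interval, b: Interval) -> int:
    """Return the distance in bp between two intervals (0 if they overlap)."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def classify_gene(gene: Interval, ltr_rts: List[Interval]) -> str:
    """Bin a gene by its distance to the nearest LTR-RT.

    The bin names are illustrative labels, not database fields.
    """
    # With no LTR-RTs on the sequence, treat the gene as distal.
    nearest = min((gap(gene, ltr) for ltr in ltr_rts), default=float("inf"))
    if nearest == 0:
        return "overlapping"
    if nearest <= 1_000:
        return "within_1kbp"
    if nearest <= 10_000:
        return "1_to_10kbp"
    return "distal"


if __name__ == "__main__":
    ltr_rts = [(700, 900), (50_000, 60_000)]
    print(classify_gene((100, 200), ltr_rts))
```

A brute-force scan over all LTR-RTs is used here for clarity; at genome scale a sorted list with binary search or an interval tree would be the more usual choice.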
2
Finding and Characterizing Repeats in Plant Genomes. Methods Mol Biol 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some TE families, from a set of sequences known to belong to a given family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and to check the validity of those models.
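The indexing concept mentioned in the abstract can be illustrated with a minimal k-mer index, one of the simplest ways to locate exact repeated substrings. This sketch is not taken from the chapter; the function names and the choice of a plain dictionary are assumptions for illustration, and real tools rely on far more compact structures.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(seq: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in seq to the list of 0-based positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def repeated_kmers(index: Dict[str, List[int]], min_count: int = 2) -> Dict[str, List[int]]:
    """Keep only the k-mers that occur at least min_count times."""
    return {kmer: pos for kmer, pos in index.items() if len(pos) >= min_count}


if __name__ == "__main__":
    idx = build_kmer_index("ACGTACGTTT", k=4)
    # Print each exactly repeated 4-mer with its start positions.
    for kmer, positions in repeated_kmers(idx).items():
        print(kmer, positions)
```

Querying the index for a k-mer is a single dictionary lookup, which is the property that makes indexing attractive when the same sequence is searched many times.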
3
Abstract
Advances in genomic sequencing have recently offered vast opportunities for biological exploration, unraveling the evolution and improving our understanding of Earth's biodiversity. Due to distinct plant species characteristics in terms of genome size, ploidy and heterozygosity, transposable elements (TEs) are common characteristics of many genomes. TEs are ubiquitous and dispersed repetitive DNA sequences that frequently impact the evolution and composition of the genome, mainly due to their redundancy and rearrangements. For this study, we provided an atlas of TE data by employing an easy-to-use portal (APTE website). To our knowledge, this is the most extensive and standardized analysis of TEs in plant genomes. We evaluated 67 plant genomes assembled at chromosome scale, recovering a total of 49,802,023 TE records, representing a total of 47,992,091,043 base pairs (bp), or ~47.62% of the total genomic space. We observed that new types of TEs were identified and annotated compared to other data repositories. By establishing a standardized catalog of TE annotation on 67 genomes, new hypotheses, exploration of TE data and their influences on the genomes may allow a better understanding of their function and processes. All original code and an example of how we developed the TE annotation strategy are available on GitHub (Extended data).
Affiliation(s)
- Daniel Longhi Fernandes Pedro
  - Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
- Tharcisio Soares Amorim
  - Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
- Alessandro Varani
  - Department of Agricultural and Environmental Biotechnology, School of Agricultural and Veterinary Sciences, São Paulo State University (UNESP), Jaboticabal, São Paulo, 14884-900, Brazil
- Romain Guyot
  - Institut de Recherche pour le Développement, IRD, University of Montpellier, Montpellier, France
  - Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales, Colombia
- Douglas Silva Domingues
  - Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
  - Group of Genomics and Transcriptomes in Plants, Institute of Biosciences, São Paulo State University (UNESP), Rio Claro, São Paulo, 13506-900, Brazil
- Alexandre Rossi Paschoal
  - Department of Computer Science; Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná (UTFPR), Cornélio Procópio, Paraná, 86300000, Brazil
4
Mokhtar MM, Alsamman AM, Abd-Elhalim HM, El Allali A. CicerSpTEdb: A web-based database for high-resolution genome-wide identification of transposable elements in Cicer species. PLoS One 2021; 16:e0259540. [PMID: 34762703 PMCID: PMC8584679 DOI: 10.1371/journal.pone.0259540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/23/2021] [Accepted: 10/20/2021] [Indexed: 11/19/2022]
Abstract
Recently, Cicer species have experienced increased research interest due to their economic importance, especially in genetics, genomics, and crop improvement. The Cicer arietinum, Cicer reticulatum, and Cicer echinospermum genomes have been sequenced and provide valuable resources for trait improvement. Since the publication of the chickpea draft genome, progress has been made in genome assembly, functional annotation, and identification of polymorphic markers. However, work is still needed to identify transposable elements (TEs) and make them available for researchers. In this paper, we present CicerSpTEdb, a comprehensive TE database for Cicer species that aims to improve our understanding of the organization and structural variations of the chickpea genome. Using structure- and homology-based methods, 3942 C. echinospermum, 3579 C. reticulatum, and 2240 C. arietinum TEs were identified. Comparisons between Cicer species indicate that C. echinospermum has the highest number of LTR-RT and hAT TEs. C. reticulatum has more Mutator, PIF Harbinger, Tc1 Mariner, and CACTA TEs, while C. arietinum has the highest number of Helitrons. CicerSpTEdb enables users to search and visualize TEs by location and download their results. The database will provide a powerful resource that can assist in developing TE target markers for molecular breeding and answer related biological questions. Database URL: http://cicersptedb.easyomics.org/index.php.
Affiliation(s)
- Morad M. Mokhtar
  - African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
  - * E-mail: (AEA); (MMM)
- Haytham M. Abd-Elhalim
  - Agricultural Genetic Engineering Research Institute, Agricultural Research Center, Giza, Egypt
- Achraf El Allali
  - African Genome Center, Mohammed VI Polytechnic University, Ben Guerir, Morocco
  - * E-mail: (AEA); (MMM)
5
A Practical Guide on Computational Tools and Databases for Transposable Elements in Plants. Methods Mol Biol 2021. [PMID: 33900590 DOI: 10.1007/978-1-0716-1134-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
In the age of big data, obtaining precise information about the research topic of interest is extremely important. With this in mind, this chapter provides a practical guide to computational tools and databases for transposable elements (TEs) in plants. The text is organized in three sections: (1) a discussion of tools and databases on this theme; (2) a hands-on demonstration of how to use a few of them; (3) an exploratory data analysis of public TE data. Finally, we present in depth the main challenges and possible solutions to improve resources and tools.
6
O'Neill K, Brocks D, Hammell MG. Mobile genomics: tools and techniques for tackling transposons. Philos Trans R Soc Lond B Biol Sci 2020; 375:20190345. [PMID: 32075565 PMCID: PMC7061981 DOI: 10.1098/rstb.2019.0345] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Accepted: 11/20/2019] [Indexed: 12/22/2022]
Abstract
Next-generation sequencing approaches have fundamentally changed the types of questions that can be asked about gene function and regulation. With the goal of approaching truly genome-wide quantifications of all the interaction partners and downstream effects of particular genes, these quantitative assays have allowed for an unprecedented level of detail in exploring biological interactions. However, many challenges remain in our ability to accurately describe and quantify the interactions that take place in those hard to reach and extremely repetitive regions of our genome comprised mostly of transposable elements (TEs). Tools dedicated to TE-derived sequences have lagged behind, making the inclusion of these sequences in genome-wide analyses difficult. Recent improvements, both computational and experimental, allow for the better inclusion of TE sequences in genomic assays and a renewed appreciation for the importance of TE biology. This review will discuss the recent improvements that have been made in the computational analysis of TE-derived sequences as well as the areas where such analysis still proves difficult. This article is part of a discussion meeting issue 'Crossroads between transposons and gene regulation'.
Affiliation(s)
- Kathryn O'Neill
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
- David Brocks
  - Department of Computer Science and Applied Mathematics, The Weizmann Institute of Science, Rehovot, Israel
- Molly Gale Hammell
  - Simons Center for Quantitative Biology, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
7
Weighill D, Tschaplinski TJ, Tuskan GA, Jacobson D. Data Integration in Poplar: 'Omics Layers and Integration Strategies. Front Genet 2019; 10:874. [PMID: 31608114 PMCID: PMC6773870 DOI: 10.3389/fgene.2019.00874] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/23/2019] [Accepted: 08/20/2019] [Indexed: 12/20/2022]
Abstract
Populus trichocarpa is an important biofuel feedstock that has been the target of extensive research and is emerging as a model organism for plants, especially woody perennials. This research has generated several large ‘omics datasets. However, only a few studies in Populus have attempted to integrate various data types. This review will summarize various ‘omics data layers, focusing on their application in Populus species. Subsequently, network and signal processing techniques for the integration and analysis of these data types will be discussed, with particular reference to examples in Populus.
Affiliation(s)
- Deborah Weighill
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Timothy J Tschaplinski
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Gerald A Tuskan
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
- Daniel Jacobson
  - The Bredesen Center for Interdisciplinary Research and Graduate Education, University of Tennessee, Knoxville, Knoxville, TN, United States
  - Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN, United States
8
Yi F, Ling J, Xiao Y, Zhang H, Ouyang F, Wang J. ConTEdb: a comprehensive database of transposable elements in conifers. Database (Oxford) 2018; 2018:5255192. [PMID: 30576494 PMCID: PMC6301336 DOI: 10.1093/database/bay131] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 06/02/2018] [Revised: 10/24/2018] [Accepted: 11/26/2018] [Indexed: 11/14/2022]
Abstract
Conifers are the largest and most ubiquitous group of gymnosperms and have great ecological and economic importance. However, their huge and complex genomes have hindered the sequencing and mining of conifer genomes. In this study, we identified 413 423 transposable elements (TEs) from Picea abies, Picea glauca and Pinus taeda using a combination of multiple approaches and classified them into 11 133 families. A comprehensive web-based database, ConTEdb, was constructed to serve researchers. ConTEdb enables users to browse, retrieve and download the TE sequences from the database. Several analysis tools are integrated into ConTEdb to help users mine the TE data easily and effectively. In summary, ConTEdb provides a platform to study TE biology and functional genomics in conifers.
Affiliation(s)
- Fei Yi
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
  - College of Biological and Pharmaceutical Sciences, Three Gorges University, Yichang, China
- Juanjuan Ling
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Yao Xiao
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Hanguo Zhang
  - State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China
- Fangqun Ouyang
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Junhui Wang
  - State Key Laboratory of Tree Genetics and Breeding, Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China