1
Mokhtar MM, Alsamman AM, El Allali A. PlantLTRdb: An interactive database for 195 plant species LTR-retrotransposons. Frontiers in Plant Science 2023; 14:1134627. [PMID: 36950350 PMCID: PMC10025401 DOI: 10.3389/fpls.2023.1134627] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 12/30/2022] [Accepted: 02/16/2023] [Indexed: 05/29/2023]
Abstract
LTR-retrotransposons (LTR-RTs) are a large group of transposable elements that replicate through an RNA intermediate and alter genome structure. The activities of LTR-RTs in plant genomes provide helpful information about genome evolution and gene function. LTR-RTs near or within genes can directly alter gene function. This work introduces PlantLTRdb, an intact LTR-RT database for 195 plant species. Using homology- and de novo structure-based methods, a total of 150.18 Gbp representing 3,079,469 pseudomolecules/scaffolds were analyzed to identify, characterize, and annotate LTR-RTs, estimate insertion ages, detect LTR-RT-gene chimeras, and determine nearby genes. Accordingly, 520,194 intact LTR-RTs were discovered, including 29,462 autonomous and 490,732 nonautonomous LTR-RTs. The autonomous LTR-RTs included 10,286 Gypsy and 19,176 Copia, while the nonautonomous were divided into 224,906 Gypsy, 218,414 Copia, 1,768 BARE-2, 3,147 TR-GAG and 42,497 unknown. Analysis of the identified LTR-RTs located within genes showed that a total of 36,236 LTR-RTs were LTR-RT-gene chimeras and 11,619 LTR-RTs were within pseudo-genes. In addition, 50,026 genes are within 1 kbp of LTR-RTs, and 250,587 are at a distance of 1 to 10 kbp from LTR-RTs. PlantLTRdb allows researchers to search, visualize, BLAST and analyze plant LTR-RTs. PlantLTRdb can contribute to the understanding of structural variations, genome organization, functional genomics, and the development of LTR-RT target markers for molecular plant breeding. PlantLTRdb is available at https://bioinformatics.um6p.ma/PlantLTRdb.
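The distance bins mentioned in the abstract (within a gene, within 1 kbp, 1 to 10 kbp) can be illustrated with a minimal sketch. The function names, the closed-interval coordinate convention, and the brute-force search below are assumptions made for illustration; the abstract does not describe how PlantLTRdb computes these distances.

```python
def interval_distance(a, b):
    """Gap in bp between two closed intervals (start, end); 0 if they overlap."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def proximity_class(gene, ltr_rts):
    """Bin a gene by its distance to the nearest LTR-RT on the same sequence.

    The bin labels are hypothetical; the thresholds mirror the abstract.
    """
    if not ltr_rts:
        return "distal"
    nearest = min(interval_distance(gene, element) for element in ltr_rts)
    if nearest == 0:
        return "overlapping"
    if nearest <= 1_000:
        return "within_1kbp"
    if nearest <= 10_000:
        return "1_to_10kbp"
    return "distal"


if __name__ == "__main__":
    elements = [(5_000, 9_000)]
    for gene in [(6_000, 7_000), (9_500, 9_900), (12_000, 13_000)]:
        print(gene, proximity_class(gene, elements))
```

For genome-scale data, a sorted index or an interval tree would replace the linear scan; the brute-force version is kept here only for readability.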
2
Riehl K, Riccio C, Miska EA, Hemberg M. TransposonUltimate: software for transposon classification, annotation and detection. Nucleic Acids Res 2022; 50:e64. [PMID: 35234904 PMCID: PMC9226531 DOI: 10.1093/nar/gkac136] [Citation(s) in RCA: 30] [Impact Index Per Article: 10.0] [Received: 06/09/2021] [Revised: 02/09/2022] [Accepted: 02/14/2022] [Indexed: 12/17/2022]
Abstract
Most genomes harbor a large number of transposons, and they play an important role in evolution and gene regulation. They are also of interest to clinicians as they are involved in several diseases, including cancer and neurodegeneration. Although several methods for transposon identification are available, they are often highly specialised towards specific tasks or classes of transposons, and they lack common standards such as a unified taxonomy scheme and output file format. We present TransposonUltimate, a powerful bundle of three modules for transposon classification, annotation, and detection of transposition events. TransposonUltimate comes as a Conda package under the GPL-3.0 licence, is well documented, and is easy to install through https://github.com/DerKevinRiehl/TransposonUltimate. We benchmark the classification module on the large TransposonDB covering 891,051 sequences to demonstrate that it outperforms the currently best existing solutions. The annotation and detection modules combine sixteen existing software tools, and we illustrate their use by annotating Caenorhabditis elegans, Rhizophagus irregularis and Oryza sativa subsp. japonica genomes. Finally, we use the detection module to discover 29 554 transposition events in the genomes of 20 wild type strains of C. elegans. Databases, assemblies, annotations and further findings can be downloaded from https://doi.org/10.5281/zenodo.5518085.
Affiliation(s)
- Kevin Riehl
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Cristian Riccio
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Eric A Miska
- Gurdon Institute, University of Cambridge, Cambridge CB2 1QN, UK
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Department of Genetics, University of Cambridge, Downing Street, Cambridge CB2 3EH, UK
- Martin Hemberg
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women’s Hospital, 75 Francis Street, Boston, MA 02215, USA
3
Zhou P, Zhang X, Ma X, Yue J, Liao Z, Ming R. Methylation related genes affect sex differentiation in dioecious and gynodioecious papaya. Horticulture Research 2022; 9:uhab065. [PMID: 35048102 PMCID: PMC8935930 DOI: 10.1093/hr/uhab065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Accepted: 11/25/2021] [Indexed: 06/14/2023]
Abstract
Morphological, genic and epigenetic differences often exist between the separate sexes of dioecious and trioecious plants. However, the connections and relationships among them in different breeding systems are still unclear. Papaya has three sex types, which are genetically determined and epigenetically regulated, and was chosen as a model to study sex differentiation. Bisulfite sequencing of genomic DNA extracted from early-stage flowers revealed sex-specific genomic methylation landscapes and seasonal methylome reprogramming processes in dioecious and gynodioecious papaya grown in spring and summer. Extensive methylation of the sex-determining region (SDR) was the distinguishing epigenetic characteristic of the nascent XY sex chromosomes in papaya. Seasonal methylome reprogramming of early-stage flowers was detected in both dioecy and gynodioecy systems, resulting from alterations in the transcriptional expression patterns of methylation-modification-related and chromatin-remodeling-related genes, particularly genes involved in active demethylation. Genes involved in the phytohormone signal transduction pathway in male flowers played an important role in the formation of male-specific characteristics. These findings enhance the understanding of the genetic and epigenetic contributions to sex differentiation and the complexity of sex chromosome evolution in trioecious plants.
Affiliation(s)
- Ping Zhou
- Fruit Research Institute, Fujian Academy of Agricultural Sciences, Fuzhou 350013, Fujian, China
- Xiaodan Zhang
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
- Xinyi Ma
- FAFU and UIUC Joint Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Jingjing Yue
- FAFU and UIUC Joint Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Zhenyang Liao
- FAFU and UIUC Joint Center for Genomics and Biotechnology, Fujian Provincial Key Laboratory of Haixia Applied Plant Systems Biology, Fujian Agriculture and Forestry University, Fuzhou 350002, Fujian, China
- Ray Ming
- Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA
4
Finding and Characterizing Repeats in Plant Genomes. Methods in Molecular Biology (Clifton, N.J.) 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter proposes a guided tour of the available software that can help biologists to scan automatically for these repeats in sequence data or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats and we have provided two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. It may be hard to keep up with the profusion of proposals in this dynamic field and the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We present the Machine Learning approach first, seeking to build predictors automatically for some families of TEs from a set of sequences known to belong to the family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and check their validity.
5
Oliveira LS, Patera AC, Domingues DS, Sanches DS, Lopes FM, Bugatti PH, Saito PTM, Maracaja-Coutinho V, Durham AM, Paschoal AR. Computational Analysis of Transposable Elements and CircRNAs in Plants. Methods in Molecular Biology (Clifton, N.J.) 2022; 2362:147-172. [PMID: 34195962 DOI: 10.1007/978-1-0716-1645-1_9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/13/2022]
Abstract
This chapter provides two main contributions: (1) a description of computational tools and databases used to identify and analyze transposable elements (TEs) and circRNAs in plants; and (2) data analysis on public TE and circRNA data. Our goal is to highlight the primary information available in the literature on circular noncoding RNAs and transposable elements in plants. The exploratory analysis performed on publicly available circRNA and TE data supports a discussion of four sequence features. Finally, we investigate the circRNA:TE association in plants using the model organism Arabidopsis thaliana.
Affiliation(s)
- Liliane Santana Oliveira
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil; Embrapa Soja, Londrina, Paraná, Brazil
- Andressa Caroline Patera
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Douglas Silva Domingues
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil; Group of Genomics and Transcriptomes in Plants, Instituto de Biociências de Rio Claro, Universidade Estadual Paulista (UNESP), Rio Claro, SP, Brazil
- Danilo Sipoli Sanches
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Fabricio Martins Lopes
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Pedro Henrique Bugatti
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Priscila Tiemi Maeda Saito
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil
- Vinicius Maracaja-Coutinho
- Centro de Modelamiento Molecular, Biofísica y Bioinformática-CM2B2, Facultad de Ciencias Quimicas y Farmaceuticas, Universidad de Chile, Santiago, Chile
- Alan Mitchell Durham
- Department of Computer Science, Instituto de Matemática e Estatística, Universidade de São Paulo (USP), Cidade Universitária, SP, Brazil
- Alexandre Rossi Paschoal
- Department of Computer Science, Federal University of Technology-Paraná (UTFPR), Cornélio Procópio, PR, Brazil.
6
Zhou SS, Yan XM, Zhang KF, Liu H, Xu J, Nie S, Jia KH, Jiao SQ, Zhao W, Zhao YJ, Porth I, El Kassaby YA, Wang T, Mao JF. A comprehensive annotation dataset of intact LTR retrotransposons of 300 plant genomes. Sci Data 2021; 8:174. [PMID: 34267227 PMCID: PMC8282616 DOI: 10.1038/s41597-021-00968-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/23/2021] [Accepted: 06/07/2021] [Indexed: 12/11/2022]
Abstract
LTR retrotransposons (LTR-RTs) are ubiquitous and represent the dominant repeat element in plant genomes, playing important roles in functional variation, genome plasticity and evolution. With the advent of new sequencing technologies, a growing number of whole-genome sequences have been made publicly available, making it possible to carry out systematic analyses of LTR-RTs. However, a comprehensive and unified annotation of LTR-RTs in plant groups is still lacking. Here, we constructed a plant intact LTR-RTs dataset, which is designed to classify and annotate intact LTR-RTs with a standardized procedure. The dataset currently comprises a total of 2,593,685 intact LTR-RTs from genomes of 300 plant species representing 93 families of 46 orders. The dataset is accompanied by sequence, diverse structural and functional annotation, age determination and classification information associated with the LTR-RTs. This dataset will contribute valuable resources for investigating the evolutionary dynamics and functional implications of LTR-RTs in plant genomes.
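The abstract mentions age determination without giving the procedure. A widely used approach for intact LTR-RTs dates an insertion from the divergence between its two LTRs, which are identical at insertion time: T = K / (2r), with K the substitutions per site and r the substitution rate. The sketch below assumes pre-aligned LTRs, a Jukes-Cantor correction, and a rate of 1.3e-8 substitutions per site per year (a value often used for grasses); none of these choices is confirmed by the abstract.

```python
import math


def jukes_cantor_distance(ltr5, ltr3):
    """Jukes-Cantor corrected distance K between two aligned LTR sequences.

    Columns containing gaps or ambiguous bases are skipped.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTR sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a != b:
                mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_age_years(k, rate=1.3e-8):
    """Insertion age T = K / (2r); the default rate is an assumed value."""
    return k / (2.0 * rate)


if __name__ == "__main__":
    five_prime = "A" * 100
    three_prime = "A" * 98 + "CG"  # 2 mismatches in 100 aligned positions
    k = jukes_cantor_distance(five_prime, three_prime)
    print(f"K = {k:.5f}, age = {insertion_age_years(k):.0f} years")
```

The factor of 2 reflects that both LTRs accumulate substitutions independently after insertion.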
Affiliation(s)
- Shan-Shan Zhou
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Xue-Mei Yan
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Kai-Fu Zhang
- College of Big data and Intelligent Engineering, Southwest Forestry University, Yunnan, 650224, China
- Hui Liu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Jie Xu
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Shuai Nie
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Kai-Hua Jia
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Si-Qian Jiao
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- Wei Zhao
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China
- You-Jie Zhao
- College of Big data and Intelligent Engineering, Southwest Forestry University, Yunnan, 650224, China
- Ilga Porth
- Départment des Sciences du Bois et de la Forêt, Faculté de Foresterie, de Géographie et Géomatique, Université Laval Québec, Québec, QC, G1V 0A6, Canada
- Yousry A El Kassaby
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada
- Tongli Wang
- Department of Forest and Conservation Sciences, Faculty of Forestry, The University of British Columbia, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada
- Jian-Feng Mao
- Beijing Advanced Innovation Center for Tree Breeding by Molecular Design, National Engineering Laboratory for Tree Breeding, Key Laboratory of Genetics and Breeding in Forest Trees and Ornamental Plants, Ministry of Education, College of Biological Sciences and Technology, Beijing Forestry University, Beijing, 100083, China.
7
A Practical Guide on Computational Tools and Databases for Transposable Elements in Plants. Methods Mol Biol 2021. [PMID: 33900590 DOI: 10.1007/978-1-0716-1134-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
In the age of big data, obtaining precise information about the research topic of interest is extremely important. With this in mind, this chapter provides a practical guide to computational tools and databases for transposable elements (TEs) in plants. The text is organized in three sections: (1) a discussion of tools and databases on this theme; (2) a hands-on guide to using a few of them; (3) an exploratory data analysis of public TE data. Finally, we discuss in depth the main challenges and possible solutions to improve resources and tools.
8
da Cruz MHP, Domingues DS, Saito PTM, Paschoal AR, Bugatti PH. TERL: classification of transposable elements by convolutional neural networks. Brief Bioinform 2020; 22:5900933. [PMID: 34020551 DOI: 10.1093/bib/bbaa185] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 05/19/2020] [Revised: 07/07/2020] [Accepted: 07/20/2020] [Indexed: 11/12/2022]
Abstract
Transposable elements (TEs) are the most represented sequences occurring in eukaryotic genomes. Few methods provide the classification of these sequences into deeper levels, such as superfamily level, which could provide useful and detailed information about these sequences. Most methods that classify TE sequences use handcrafted features such as k-mers and homology-based search, which could be inefficient for classifying non-homologous sequences. Here we propose an approach, called transposable elements representation learner (TERL), that preprocesses and transforms one-dimensional sequences into two-dimensional space data (i.e., image-like data of the sequences) and feeds it to deep convolutional neural networks. This classification method tries to learn the best representation of the input data to classify it correctly. We have conducted six experiments to test the performance of TERL against other methods. Our approach obtained macro mean accuracies and F1-score of 96.4% and 85.8% for superfamilies and 95.7% and 91.5% for the order sequences from RepBase, respectively. We have also obtained macro mean accuracies and F1-score of 95.0% and 70.6% for sequences from seven databases into superfamily level and 89.3% and 73.9% for the order level, respectively. We surpassed the accuracy, recall and specificity obtained by other methods in the experiment classifying order-level sequences from seven databases, and TERL ran far faster than any other method in all experiments. Therefore, TERL can learn how to predict any hierarchical level of the TEs classification system and is about 20 times and three orders of magnitude faster than TEclass and PASTEC, respectively. TERL is available at https://github.com/muriloHoracio/TERL. Contact: murilocruz@alunos.utfpr.edu.br.
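To make the idea of turning a one-dimensional sequence into image-like two-dimensional data concrete, here is a minimal sketch using one-hot encoding. The abstract does not specify the encoding, so the one-hot scheme, the fixed `width` with zero padding, and the function name are assumptions for illustration rather than TERL's actual preprocessing.

```python
ALPHABET = "ACGT"  # one matrix row per nucleotide


def one_hot_matrix(sequence, width):
    """Encode a DNA sequence as a len(ALPHABET) x width matrix of 0/1 values.

    Sequences longer than `width` are truncated; shorter ones are padded with
    all-zero columns. Ambiguous bases such as N also become all-zero columns.
    """
    matrix = [[0] * width for _ in ALPHABET]
    for column, base in enumerate(sequence.upper()[:width]):
        row = ALPHABET.find(base)
        if row >= 0:
            matrix[row][column] = 1
    return matrix


if __name__ == "__main__":
    for label, row in zip(ALPHABET, one_hot_matrix("ACGTN", 6)):
        print(label, row)
```

A fixed width gives every sequence the same shape, which is what a convolutional network expects for batched input.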
Affiliation(s)
- Murilo Horacio Pereira da Cruz
- Federal University of Technology - Parana (UTFPR), Brazil; Bioinformatics Graduation Program (PPGBIOINFO), Department of Computer Science, Federal University of Technology - Parana (UTFPR), Brazil
- Douglas Silva Domingues
- São Paulo State University at Botucatu, Brazil; University of São Paulo, Brazil; Department of Biodiversity, São Paulo State University at Rio Claro, Brazil
- Priscila Tiemi Maeda Saito
- Euripides Soares da Rocha University of Marilia, Brazil; University of São Paulo (ICMC-USP), Brazil; University of Campinas (IC-UNICAMP), Brazil; Department of Computing, Federal University of Technology - Parana (UTFPR), Brazil
- Pedro Henrique Bugatti
- Euripides Soares da Rocha University of Marilia, Brazil; University of São Paulo (ICMC-USP), Brazil; Department of Computing, Federal University of Technology - Parana (UTFPR), Brazil
9
Eshaghi M, Shiran B, Fallahi H, Ravash R, Đeri BB. Identification of genes involved in steroid alkaloid biosynthesis in Fritillaria imperialis via de novo transcriptomics. Genomics 2019; 111:1360-1372. [DOI: 10.1016/j.ygeno.2018.09.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 05/11/2018] [Revised: 08/15/2018] [Accepted: 09/14/2018] [Indexed: 01/22/2023]
10
Weilguny L, Kofler R. DeviaTE: Assembly-free analysis and visualization of mobile genetic element composition. Mol Ecol Resour 2019; 19:1346-1354. [PMID: 31056858 PMCID: PMC6791034 DOI: 10.1111/1755-0998.13030] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 11/14/2018] [Revised: 04/19/2019] [Accepted: 04/22/2019] [Indexed: 11/28/2022]
Abstract
Transposable elements (TEs) are selfish DNA sequences that multiply within host genomes. They are present in most species investigated so far at varying degrees of abundance and sequence diversity. The TE composition may not only vary between but also within species and could have important biological implications. Variation in prevalence among populations may for example indicate a recent TE invasion, whereas sequence variation could indicate the presence of hyperactive or inactive forms. Gaining unbiased estimates of TE composition is thus vital for understanding the evolutionary dynamics of transposons. To this end, we developed DeviaTE, a tool to analyse and visualize TE abundance using Illumina or Sanger sequencing reads. Our tool requires sequencing reads of one or more samples (tissue, individual or population) and consensus sequences of TEs. It generates a table and a visual representation of TE composition. This allows for an intuitive assessment of coverage, sequence divergence, segregating SNPs and indels, as well as the presence of internal and terminal deletions. By contrasting the coverage between TEs and single copy genes, DeviaTE derives unbiased estimates of TE abundance. We show that naive approaches, which do not consider regions spanned by internal deletions, may substantially underestimate TE abundance. Using published data we demonstrate that DeviaTE can be used to study the TE composition within samples, identify clinal variation in TEs, compare TE diversity among species, and monitor TE invasions. Finally we present careful validations with publicly available and simulated data. DeviaTE is implemented in Python and distributed under the GPLv3 (https://github.com/W-L/deviaTE).
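The normalization described in the abstract, contrasting TE coverage with single-copy gene coverage, can be sketched as a simple ratio. The function name, the input layout (per-base coverage lists), and the way deletion-spanning reads are added back are assumptions for illustration; they are not taken from the DeviaTE code.

```python
def normalized_te_abundance(te_coverage, single_copy_coverage, deletion_spanning=None):
    """Estimate TE copies per haploid genome as a coverage ratio.

    te_coverage: per-base read coverage along the TE consensus.
    single_copy_coverage: per-base coverage pooled over single-copy genes.
    deletion_spanning: optional per-base count of reads that span an internal
        deletion at that position (hypothetical input); adding it back avoids
        undercounting copies that carry the deletion.
    """
    if not te_coverage or not single_copy_coverage:
        raise ValueError("coverage lists must not be empty")
    if deletion_spanning is None:
        deletion_spanning = [0] * len(te_coverage)
    if len(deletion_spanning) != len(te_coverage):
        raise ValueError("deletion_spanning must match te_coverage in length")
    corrected = [c + d for c, d in zip(te_coverage, deletion_spanning)]
    te_mean = sum(corrected) / len(corrected)
    baseline = sum(single_copy_coverage) / len(single_copy_coverage)
    if baseline == 0:
        raise ValueError("single-copy baseline coverage is zero")
    return te_mean / baseline


if __name__ == "__main__":
    te = [100, 100, 20, 20, 100, 100]      # coverage dips inside a deletion
    spanning = [0, 0, 80, 80, 0, 0]        # reads bridging the deletion
    genes = [10, 10, 10, 10]
    print("naive:", normalized_te_abundance(te, genes))
    print("corrected:", normalized_te_abundance(te, genes, spanning))
```

In this toy input the naive ratio is lower than the corrected one, which matches the abstract's point that ignoring regions spanned by internal deletions can underestimate abundance.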
Affiliation(s)
- Lukas Weilguny
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
- Robert Kofler
- Institut für Populationsgenetik, Vetmeduni Vienna, Wien, Austria
11
da Cruz MHP, Saito PTM, Paschoal AR, Bugatti PH. Classification of Transposable Elements by Convolutional Neural Networks. Artificial Intelligence and Soft Computing 2019. [DOI: 10.1007/978-3-030-20915-5_15] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/23/2022]
12
Identification of transposons near predicted lncRNA and mRNA pools of Prunus mume using an integrative transposable element database constructed from Rosaceae plant genomes. Mol Genet Genomics 2018; 293:1301-1316. [PMID: 29804262 DOI: 10.1007/s00438-018-1449-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/15/2017] [Accepted: 05/17/2018] [Indexed: 12/30/2022]
Abstract
This study focused on the construction of a database of transposable elements (TEs) from Rosaceae plants, the third most economically important plant family in temperate regions, and its transcriptomics applications. The evolutionary effects of TEs on gene regulation have been explored, and TE insertions can be the molecular bases of changes in gene structure and function. However, a specific Rosaceae plant TE database (RPTEdb) is lacking. The genomes of several Rosaceae plants have been sequenced, providing the opportunity to mine TE data at a whole-genome level. Therefore, we constructed the RPTEdb, a collective and comprehensive database of 19,596 annotated TEs in the genomes of Rosaceae plants using previously described identification and annotation methods and published genome sequences. The user-friendly web-based database provides access to research tools through hyperlinks, including Browse, TE tree, tools, JBrowse, and search sections, and through the inputting of sequences on the main webpage. Next, we performed one advanced application in which TEs near predicted long non-coding RNA (lncRNA) and mRNA domains within white and red petal-tissue transcriptomes of Prunus mume 'Fuban Tiaozhi' were identified, revealing 16 TEs that overlapped or were near 16 differentially expressed lncRNA domains, and 54 TEs that overlapped or were near 54 differentially expressed mRNA domains, and the TEs' possible functions were also discussed. We believe that the RPTEdb will contribute to the understanding of TE roles in the structural, functional and evolutionary dynamics of Rosaceae plant genomes.
13
A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection. mSphere 2018; 3:mSphere00069-18. [PMID: 29564396 PMCID: PMC5853486 DOI: 10.1128/mspheredirect.00069-18] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 02/06/2018] [Accepted: 02/16/2018] [Indexed: 12/20/2022]
Abstract
To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. 
The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
14
Ding CJ, Liang LX, Diao S, Su XH, Zhang BY. Genome-wide analysis of day/night DNA methylation differences in Populus nigra. PLoS One 2018; 13:e0190299. [PMID: 29293569 PMCID: PMC5749751 DOI: 10.1371/journal.pone.0190299] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/26/2017] [Accepted: 12/12/2017] [Indexed: 12/19/2022]
Abstract
DNA methylation is an important mechanism of epigenetic modification. Methylation changes during stress responses and developmental processes have been well studied; however, their role in plant adaptation to the day/night cycle is poorly understood. In this study, we detected global methylation patterns in leaves of the black poplar Populus nigra ‘N46’ at 8:00 and 24:00 by methylated DNA immunoprecipitation sequencing (MeDIP-seq). We found 10,027 and 10,242 genes to be methylated in the 8:00 and 24:00 samples, respectively. The methylated genes appeared to be involved in multiple biological processes, molecular functions, and cellular components, suggesting important roles for DNA methylation in poplar cells. Comparing the 8:00 and 24:00 samples, only 440 differentially methylated regions (DMRs) overlapped with genic regions, including 193 hyper- and 247 hypo-methylated DMRs, and may influence the expression of 137 downstream genes. Most hyper-methylated genes were associated with transferase activity, kinase activity, and phosphotransferase activity, whereas most hypo-methylated genes were associated with protein binding, ATP binding, and adenyl ribonucleotide binding, suggesting that different biological processes were activated during the day and night. Our results indicated that methylated genes were prevalent in the poplar genome, but that only a few of these participated in diurnal gene expression regulation.
Affiliation(s)
- Chang-Jun Ding
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
  - Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Li-Xiong Liang
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
  - Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Shu Diao
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
  - Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Xiao-Hua Su
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
  - Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
- Bing-Yu Zhang
  - State Key Laboratory of Tree Genetics and Breeding, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
  - Key Laboratory of Tree Breeding and Cultivation of State Forestry Administration, Research Institute of Forestry, Chinese Academy of Forestry, Beijing, China
|
15
|
Xu Z, Liu J, Ni W, Peng Z, Guo Y, Ye W, Huang F, Zhang X, Xu P, Guo Q, Shen X, Du J. GrTEdb: the first web-based database of transposable elements in cotton (Gossypium raimondii). DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2017; 2017:3084694. [PMID: 28365739 PMCID: PMC5467567 DOI: 10.1093/database/bax013] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/27/2016] [Accepted: 01/13/2017] [Indexed: 11/21/2022]
Abstract
Although several diploid and tetraploid Gossypium species genomes have been sequenced, a well-annotated web-based transposable element (TE) database is lacking. To better understand the roles of TEs in the structural, functional and evolutionary dynamics of the cotton genome, a comprehensive, specific, and user-friendly web-based database, the Gossypium raimondii transposable elements database (GrTEdb), was constructed. A total of 14 332 TEs were structurally annotated and clearly categorized in the G. raimondii genome, and these elements were classified into seven distinct superfamilies based on the order of protein-coding domains, structures and/or sequence similarity, including 2929 Copia-like elements, 10 368 Gypsy-like elements, 299 L1, 12 Mutators, 435 PIF-Harbingers, 275 CACTAs and 14 Helitrons. In addition, web-based sequence browsing, searching, downloading and BLAST tools were implemented to help users easily and effectively annotate TEs or TE fragments in genomic sequences from G. raimondii and other closely related Gossypium species. GrTEdb provides resources and information related to TEs in G. raimondii, and will facilitate gene and genome analyses within or across Gossypium species, evaluation of the impact of TEs on their host genomes, and investigation of potential interactions between TEs and protein-coding genes in Gossypium species. Database URL: http://www.grtedb.org/
Affiliation(s)
- Zhenzhen Xu
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Jing Liu
  - Provincial Key Laboratory of Agrobiology, The Institute of Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Wanchao Ni
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Zhen Peng
  - Provincial Key Laboratory of Agrobiology, The Institute of Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Yue Guo
  - Provincial Key Laboratory of Agrobiology, The Institute of Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Wuwei Ye
  - State Key Laboratory of Cotton Biology, The Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, China
- Fang Huang
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Xianggui Zhang
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Peng Xu
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Qi Guo
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Xinlian Shen
  - Key Laboratory of Cotton and Rapeseed (Nanjing), The Institute of Industrial Crops, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
- Jianchang Du
  - Provincial Key Laboratory of Agrobiology, The Institute of Biotechnology, Jiangsu Academy of Agricultural Sciences, Nanjing 210014, China
|
16
|
Puterova J, Razumova O, Martinek T, Alexandrov O, Divashuk M, Kubat Z, Hobza R, Karlov G, Kejnovsky E. Satellite DNA and Transposable Elements in Seabuckthorn (Hippophae rhamnoides), a Dioecious Plant with Small Y and Large X Chromosomes. Genome Biol Evol 2017; 9:197-212. [PMID: 28057732 PMCID: PMC5381607 DOI: 10.1093/gbe/evw303] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Accepted: 01/03/2017] [Indexed: 01/05/2023]
Abstract
Seabuckthorn (Hippophae rhamnoides) is a dioecious shrub commonly used in the pharmaceutical, cosmetic, and environmental industry as a source of oil, minerals and vitamins. In this study, we analyzed the transposable elements and satellites in its genome. We carried out Illumina DNA sequencing and reconstructed the main repetitive DNA sequences. For data analysis, we developed a new bioinformatics approach for advanced satellite DNA analysis and showed that about 25% of the genome consists of satellite DNA and about 24% is formed of transposable elements, dominated by Ty3/Gypsy and Ty1/Copia LTR retrotransposons. FISH mapping revealed X chromosome-accumulated, Y chromosome-specific or both sex chromosomes-accumulated satellites but most satellites were found on autosomes. Transposable elements were located mostly in the subtelomeres of all chromosomes. The 5S rDNA and 45S rDNA were localized on one autosomal locus each. Although we demonstrated the small size of the Y chromosome of the seabuckthorn and accumulated satellite DNA there, we were unable to estimate the age and extent of the Y chromosome degeneration. Analysis of dioecious relatives such as Shepherdia would shed more light on the evolution of these sex chromosomes.
Affiliation(s)
- Janka Puterova
  - Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Olga Razumova
  - Centre for Molecular Biotechnology, Russian State Agrarian University – Moscow Timiryazev Agricultural Academy, Moscow, Russia
- Tomas Martinek
  - Department of Information Systems, Faculty of Information Technology, Brno University of Technology, Brno, Czech Republic
- Oleg Alexandrov
  - Centre for Molecular Biotechnology, Russian State Agrarian University – Moscow Timiryazev Agricultural Academy, Moscow, Russia
- Mikhail Divashuk
  - Centre for Molecular Biotechnology, Russian State Agrarian University – Moscow Timiryazev Agricultural Academy, Moscow, Russia
- Zdenek Kubat
  - Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic
- Roman Hobza
  - Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic
  - Institute of Experimental Botany, Center of the Region Haná for Biotechnological and Agricultural Research, Olomouc, Czech Republic
- Gennady Karlov
  - Centre for Molecular Biotechnology, Russian State Agrarian University – Moscow Timiryazev Agricultural Academy, Moscow, Russia
  - All-Russia Research Institute of Agricultural Biotechnology, Moscow, Russia
- Eduard Kejnovsky
  - Department of Plant Developmental Genetics, Institute of Biophysics, Academy of Sciences of the Czech Republic, Brno, Czech Republic
|