1
The Dynamism of Transposon Methylation for Plant Development and Stress Adaptation. Int J Mol Sci 2021; 22:ijms222111387. [PMID: 34768817 PMCID: PMC8583499 DOI: 10.3390/ijms222111387] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 07/30/2021] [Revised: 10/13/2021] [Accepted: 10/19/2021] [Indexed: 02/06/2023]
Abstract
Plant development processes are regulated by epigenetic alterations that shape nuclear structure, gene expression, and phenotypic plasticity; these alterations can provide the plant with protection from environmental stresses. During plant growth and development, these processes play a significant role in regulating gene expression to remodel chromatin structure. These epigenetic alterations are mainly regulated by transposable elements (TEs) whose abundance in plant genomes results in their interaction with genomes. Thus, TEs are the main source of epigenetic changes and form a substantial part of the plant genome. Furthermore, TEs can be activated under stress conditions, and activated elements cause mutagenic effects and substantial genetic variability. This introduces novel gene functions and structural variation in the insertion sites and primarily contributes to epigenetic modifications. Altogether, these modifications indirectly or directly provide the ability to withstand environmental stresses. In recent years, many studies have shown that TE methylation plays a major role in the evolution of the plant genome through epigenetic processes that regulate gene imprinting, thereby upholding genome stability. The induced genetic rearrangements and insertions of mobile genetic elements in regions of active euchromatin contribute to genome alteration, leading to genomic stress. These TE-mediated epigenetic modifications lead to phenotypic diversity, genetic variation, and environmental stress tolerance. Thus, TE methylation is essential for plant evolution and stress adaptation, and TEs hold a strategically important position in the plant genome. High-throughput techniques have greatly advanced the understanding of TE-mediated gene expression and its associations with genome methylation and suggest that controlled mobilization of TEs could be used for crop breeding. However, the development of applications in this area has been limited, and an integrated view of TE function and subsequent processes is lacking. In this review, we explore the enormous diversity and likely functions of the TE repertoire in adaptive evolution and discuss some recent examples of how TEs impact gene expression in plant development and stress adaptation.
2
Orozco-Arias S, Candamil-Cortés MS, Jaimes PA, Piña JS, Tabares-Soto R, Guyot R, Isaza G. K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes. PeerJ 2021; 9:e11456. [PMID: 34055489 PMCID: PMC8140598 DOI: 10.7717/peerj.11456] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/17/2021] [Accepted: 04/24/2021] [Indexed: 12/15/2022]
Abstract
Every day more plant genomes are available in public databases, and additional massive sequencing projects (i.e., that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Despite the availability of several bioinformatic tools that follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used Machine Learning algorithms based on k-mer counts to distinguish LTR retrotransposons from other genomic sequences and to classify them into lineages/families with an F1-Score of 95%, contributing to the development of an alignment-free and automatic method to analyze these sequences.
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Paula A Jaimes
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Johan S Piña
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Romain Guyot
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
  - Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
- Gustavo Isaza
  - Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
3
Orozco-Arias S, Jaimes PA, Candamil MS, Jiménez-Varón CF, Tabares-Soto R, Isaza G, Guyot R. InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning. Genes (Basel) 2021; 12:genes12020190. [PMID: 33525408 PMCID: PMC7910972 DOI: 10.3390/genes12020190] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/30/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/04/2022]
Abstract
Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represent a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are also associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among others. Several sequence databases of plant LTR retrotransposons are available for public access, such as PGSB and RepetDB, or restricted access such as Repbase. Although these databases are useful to identify LTR-RTs in new genomes by similarity, the elements of these databases are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset composed of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (i.e., one fully connected and one convolutional) for the rapid classification of these elements. In lineage-level classification approaches, we obtain up to 98% performance, indicated by the F1-score, precision and recall scores.
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, 170002 Manizales, Colombia
  - Correspondence: (S.O.-A.); (R.G.)
- Paula A. Jaimes
  - Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Mariana S. Candamil
  - Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Reinel Tabares-Soto
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Gustavo Isaza
  - Department of Systems and Informatics, Universidad de Caldas, 170002 Manizales, Colombia
- Romain Guyot
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
  - Institut de Recherche pour le Développement, CIRAD, University of Montpellier, 34394 Montpellier, France
  - Correspondence: (S.O.-A.); (R.G.)
4
Comparative Study of Pine Reference Genomes Reveals Transposable Element Interconnected Gene Networks. Genes (Basel) 2020; 11:genes11101216. [PMID: 33081418 PMCID: PMC7602945 DOI: 10.3390/genes11101216] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 07/23/2020] [Revised: 10/11/2020] [Accepted: 10/13/2020] [Indexed: 12/13/2022]
Abstract
Sequencing the giga-genomes of several pine species has enabled comparative genomic analyses of these outcrossing tree species. Previous studies have revealed the wide distribution and extraordinary diversity of transposable elements (TEs) that occupy the large intergenic spaces in conifer genomes. In this study, we analyzed the distribution of TEs in gene regions of the assembled genomes of Pinus taeda and Pinus lambertiana using high-performance computing resources. The quality of draft genomes and the genome annotation have significant consequences for the investigation of TEs and these aspects are discussed. Several TE families frequently inserted into genes or their flanks were identified in both species’ genomes. Potentially important sequence motifs were identified in TEs that could bind additional regulatory factors, promoting gene network formation with faster or enhanced transcription initiation. Node genes that contain many TEs were observed in multiple potential transposable element-associated networks. This study demonstrated the increased accumulation of TEs in the introns of stress-responsive genes of pines and suggests the possibility of rewiring them into responsive networks and sub-networks interconnected with node genes containing multiple TEs. Many such regulatory influences could lead to the adaptive environmental response clines that are characteristic of naturally spread pine populations.
5
Orozco-Arias S, Isaza G, Guyot R, Tabares-Soto R. A systematic review of the application of machine learning in the detection and classification of transposable elements. PeerJ 2019; 7:e8311. [PMID: 31976169 PMCID: PMC6967008 DOI: 10.7717/peerj.8311] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/05/2019] [Accepted: 11/28/2019] [Indexed: 12/16/2022]
Abstract
Background: Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable results on different types of TEs. Machine learning (ML) techniques can automatically extract hidden patterns and novel information from labeled or non-labeled data and have been applied to solving several scientific problems. Methodology: We followed the Systematic Literature Review (SLR) process, applying the six stages of its review protocol, but added a preliminary stage that aims to detect the need for a review. Then search equations were formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer research questions. Results: Several ML approaches have already been tested on other bioinformatics problems with promising results, yet there are few algorithms and architectures available in the literature focused specifically on TEs, despite TEs representing the majority of the nuclear DNA of many organisms. Only 35 articles were found and categorized as relevant in TE or related fields. Conclusions: ML is a powerful tool that can be used to address many problems. Although ML techniques have been used widely in other biological tasks, their utilization in TE analyses is still limited. Following the SLR, it was possible to notice that the use of ML for TE analyses (detection and classification) is an open problem, and this new field of research is growing in interest.
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Gustavo Isaza
  - Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Romain Guyot
  - Institut de Recherche pour le Développement, CIRAD, University of Montpellier, Montpellier, France
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
  - Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
6
Orozco-Arias S, Isaza G, Guyot R. Retrotransposons in Plant Genomes: Structure, Identification, and Classification through Bioinformatics and Machine Learning. Int J Mol Sci 2019; 20:E3837. [PMID: 31390781 PMCID: PMC6696364 DOI: 10.3390/ijms20153837] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 06/21/2019] [Revised: 07/31/2019] [Accepted: 08/02/2019] [Indexed: 01/26/2023]
Abstract
Transposable elements (TEs) are genomic units able to move within the genome of virtually all organisms. Due to their natural repetitive numbers and their high structural diversity, the identification and classification of TEs remain a challenge in sequenced genomes. Although TEs were initially regarded as "junk DNA", it has been demonstrated that they play key roles in chromosome structures, gene expression, and regulation, as well as adaptation and evolution. A highly reliable annotation of these elements is, therefore, crucial to better understand genome functions and their evolution. To date, much bioinformatics software has been developed to address TE detection and classification processes, but many problematic aspects remain, such as the reliability, precision, and speed of the analyses. Machine learning and deep learning are algorithms that can make automatic predictions and decisions in a wide variety of scientific applications. They have been tested in bioinformatics and, more specifically, for TE classification, with encouraging results. In this review, we will discuss important aspects of TEs, such as their structure, importance in the evolution and architecture of the host, and their current classifications and nomenclatures. We will also address current methods and their limitations in identifying and classifying TEs.
Affiliation(s)
- Simon Orozco-Arias
  - Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170001, Colombia
  - Department of Systems and Informatics, Universidad de Caldas, Manizales 170001, Colombia
- Gustavo Isaza
  - Department of Systems and Informatics, Universidad de Caldas, Manizales 170001, Colombia
- Romain Guyot
  - Department of Electronics and Automatization, Universidad Autónoma de Manizales, Manizales 170001, Colombia
  - Institut de Recherche pour le Développement, CIRAD, University Montpellier, 34000 Montpellier, France
7
Mlinarec J, Franjević D, Harapin J, Besendorfer V. The impact of the Tekay chromoviral elements on genome organisation and evolution of Anemone s.l. (Ranunculaceae). PLANT BIOLOGY (STUTTGART, GERMANY) 2016; 18:332-347. [PMID: 26370195 DOI: 10.1111/plb.12393] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 07/14/2015] [Accepted: 09/10/2015] [Indexed: 06/05/2023]
Abstract
We studied the highly abundant chromoviral Tekay clade in species from three sister genera - Anemone, Pulsatilla and Hepatica (Ranunculaceae). With this clade, we performed a concomitant survey of its phylogenetic diversity, chromosomal organisation and transcriptional activity in Anemone s.l. in order to investigate dynamics of the Tekay elements at a finer scale than previously achieved in this or any other flowering clade. The phylogenetic tree built from Tekay sequences conformed to expected evolutionary relationships of the species; exceptions being A. nemorosa and A. sylvestris, which appeared more closely related than expected, and we invoke hybridisation events to explain the observed topology. The separation of elements into six clusters could be explained by episodic bursts of activity since divergence from a common ancestor at different points in their respective evolutionary histories. In Anemone s.l. the Tekay elements do not have a preferential position on chromosomes, i.e. they can: (i) occupy a centromeric/pericentromeric position; (ii) occupy an interstitial position in DAPI-positive AT-rich heterochromatic regions; (iii) be dispersed throughout chromosomes; or even (iv) be absent from large heterochromatic blocks. Widespread transcriptional activity of the Tekay elements in Anemone s.l. taxa indicates that some copies of Tekay elements could still be active in this plant group, contributing to genome evolution and speciation within Anemone s.l. Identification of Tekay elements in Anemone s.l. provides valuable information for understanding how different localisation patterns might help to facilitate plant genome organisation in a structural and functional manner.
Affiliation(s)
- J Mlinarec
  - Division of Biology, Department of Molecular Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
- D Franjević
  - Division of Biology, Zoology Department, Faculty of Science, University of Zagreb, Zagreb, Croatia
- J Harapin
  - Division of Biology, Department of Molecular Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
- V Besendorfer
  - Division of Biology, Department of Molecular Biology, Faculty of Science, University of Zagreb, Zagreb, Croatia
8
Marcon HS, Domingues DS, Silva JC, Borges RJ, Matioli FF, Fontes MRDM, Marino CL. Transcriptionally active LTR retrotransposons in Eucalyptus genus are differentially expressed and insertionally polymorphic. BMC PLANT BIOLOGY 2015; 15:198. [PMID: 26268941 PMCID: PMC4535378 DOI: 10.1186/s12870-015-0550-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/23/2015] [Accepted: 06/12/2015] [Indexed: 06/01/2023]
Abstract
BACKGROUND: In the Eucalyptus genus, studies on genome composition and transposable elements (TEs) are particularly scarce. Nearly half of the recently released Eucalyptus grandis genome is composed of retrotransposons, and these data provide an important opportunity to understand TE dynamics in the Eucalyptus genome and transcriptome. RESULTS: We characterized nine families of transcriptionally active LTR retrotransposons from the Copia and Gypsy superfamilies in the Eucalyptus grandis genome, and we depicted genomic distribution and copy number in two Eucalyptus species. We also evaluated genomic polymorphism and transcriptional profile in three organs of five Eucalyptus species. We observed contrasting genomic and transcriptional behavior in the same family among different species. RLC_egMax_1 was the most prevalent family and RLC_egAngela_1 was the family with the lowest copy number. Most families of both superfamilies have insertions that occurred <3 million years ago, except one Copia family, RLC_egBianca_1. Protein theoretical models suggest different properties between Copia and Gypsy domains. IRAP and REMAP markers suggested genomic polymorphisms among Eucalyptus species. Using EST analysis and qRT-PCRs, we observed transcriptional activity in several tissues and in all evaluated species. In some families, osmotic stress increases transcript values. CONCLUSION: Our strategy was successful in isolating transcriptionally active retrotransposons in Eucalyptus, and each family has a particular genomic and transcriptional pattern. Overall, our results show that retrotransposon activity has differentially affected genome and transcriptome among Eucalyptus species.
Affiliation(s)
- Helena Sanches Marcon
  - Departamento de Genética, Instituto de Biociências, Universidade Estadual Paulista - UNESP, Botucatu, Brazil
  - Programa de Pós-graduação em Ciências Biológicas (Genética), Universidade Estadual Paulista - UNESP, Botucatu, Brazil
- Douglas Silva Domingues
  - Programa de Pós-graduação em Ciências Biológicas (Genética), Universidade Estadual Paulista - UNESP, Botucatu, Brazil
  - Departamento de Botânica, Instituto de Biociências, Universidade Estadual Paulista - UNESP, Rio Claro, Brazil
- Juliana Costa Silva
  - Plant Biotechnology Laboratory, Instituto Agronômico do Paraná - IAPAR, Londrina, Brazil
- Rafael Junqueira Borges
  - Programa de Pós-graduação em Ciências Biológicas (Genética), Universidade Estadual Paulista - UNESP, Botucatu, Brazil
  - Departamento de Física e Biofísica, Instituto de Biociências, Universidade Estadual Paulista - UNESP, Botucatu, Brazil and INCTTOX-CNPq, Brazil
- Fábio Filippi Matioli
  - Departamento de Física e Biofísica, Instituto de Biociências, Universidade Estadual Paulista - UNESP, Botucatu, Brazil and INCTTOX-CNPq, Brazil
- Marcos Roberto de Mattos Fontes
  - Programa de Pós-graduação em Ciências Biológicas (Genética), Universidade Estadual Paulista - UNESP, Botucatu, Brazil
  - Departamento de Física e Biofísica, Instituto de Biociências, Universidade Estadual Paulista - UNESP, Botucatu, Brazil and INCTTOX-CNPq, Brazil
- Celso Luis Marino
  - Departamento de Genética, Instituto de Biociências, Universidade Estadual Paulista - UNESP, Botucatu, Brazil
  - Programa de Pós-graduação em Ciências Biológicas (Genética), Universidade Estadual Paulista - UNESP, Botucatu, Brazil
  - Instituto de Biotecnologia da UNESP - IBTEC, Botucatu, Brazil
9
Lupin Allergy: Uncovering Structural Features and Epitopes of β-conglutin Proteins in Lupinus Angustifolius L. with a Focus on Cross-allergenic Reactivity to Peanut and Other Legumes. BIOINFORMATICS AND BIOMEDICAL ENGINEERING 2015. [DOI: 10.1007/978-3-319-16483-0_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/06/2023]
10
Gao D, Abernathy B, Rohksar D, Schmutz J, Jackson SA. Annotation and sequence diversity of transposable elements in common bean (Phaseolus vulgaris). FRONTIERS IN PLANT SCIENCE 2014; 5:339. [PMID: 25071814 PMCID: PMC4093653 DOI: 10.3389/fpls.2014.00339] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 06/03/2014] [Accepted: 06/25/2014] [Indexed: 05/21/2023]
Abstract
Common bean (Phaseolus vulgaris) is an important legume crop grown and consumed worldwide. With the availability of the common bean genome sequence, the next challenge is to annotate the genome and characterize functional DNA elements. Transposable elements (TEs) are the most abundant component of plant genomes and can dramatically affect genome evolution and genetic variation. Thus, it is pivotal to identify TEs in the common bean genome. In this study, we performed a genome-wide transposon annotation in common bean using a combination of homology and sequence structure-based methods. We developed a 2.12-Mb transposon database which includes 791 representative transposon sequences and is available upon request or from www.phytozome.org. Of note, nearly all transposons in the database are previously unrecognized TEs. More than 5,000 transposon-related expressed sequence tags (ESTs) were detected, which indicates that some transposons may be transcriptionally active. Two Ty1-copia retrotransposon families were found to encode the envelope-like protein, which has rarely been identified in plant genomes. Also, we identified an extra open reading frame (ORF) termed ORF2 from 15 Ty3-gypsy families that was located between the ORF encoding the retrotransposase and the 3'LTR. The ORF2 was in opposite transcriptional orientation to the retrotransposase. Sequence homology searches and phylogenetic analysis suggested that the ORF2 may have an ancient origin, but its function is not clear. These transposon data provide a useful resource for understanding genome organization and evolution and may be used to identify active TEs for developing a transposon-tagging system in common bean and other related genomes.
Affiliation(s)
- Dongying Gao
  - Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA
- Brian Abernathy
  - Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA
- Daniel Rohksar
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
- Jeremy Schmutz
  - US Department of Energy Joint Genome Institute, Walnut Creek, CA, USA
  - HudsonAlpha Institute of Biotechnology, Huntsville, AL, USA
- Scott A. Jackson
  - Center for Applied Genetic Technologies, University of Georgia, Athens, GA, USA
  - *Correspondence: Scott A. Jackson, Center for Applied Genetic Technologies, University of Georgia, 111 Riverbend Road, Athens, GA 30602, USA e-mail: