1
Roy N, Kabir AH, Zahan N, Mouna ST, Chakravarty S, Rahman AH, Bayzid MS. Genome wide association studies on seven yield-related traits of 183 rice varieties in Bangladesh. Plant Direct 2024; 8:e593. [PMID: 38887667 PMCID: PMC11182691 DOI: 10.1002/pld3.593] [Received: 08/08/2023] [Revised: 03/26/2024] [Accepted: 05/02/2024] [Indexed: 06/20/2024]
Abstract
Rice genetic diversity is regulated by multiple genes and depends largely on various environmental factors. Uncovering the genetic variations associated with diversity in rice populations is key to breeding stable, high-yielding rice varieties. We performed genome-wide association studies (GWASs) on seven yield-related rice traits (grain length, grain width, grain weight, panicle length, leaf length, leaf width, and leaf angle) in a population of 183 rice landraces from Bangladesh. Our GWASs reveal various chromosomal regions and candidate genes associated with different traits in Bangladeshi rice varieties. Notably, chromosome 10 was repeatedly implicated in all three grain-shape-related traits (grain length, grain width, and grain weight), indicating a pivotal role in shaping rice grain morphology. Our study also highlights the involvement of transposon gene families across these three traits. Among the leaf-related traits, chromosome 10 was found to harbor regions significantly associated with leaf length and leaf width. These association results support previous findings and provide additional insights into the genetic diversity of rice. This is the first known GWAS of yield-related traits in Oryza sativa varieties available in Bangladesh, the fourth-largest rice-producing country. We believe this study will accelerate rice genetics research and the breeding of stable, high-yielding rice in Bangladesh.
Affiliation(s)
- Nilanjan Roy
  - Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh
  - Molecular, Cellular, and Developmental Biology, University of Kansas, Lawrence, Kansas, USA
- Acramul Haque Kabir
  - Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh
  - Department of Biomedical Engineering, University of Utah, Salt Lake City, Utah, USA
- Nourin Zahan
  - Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh
- Shahba Tasmiya Mouna
  - Department of Biomedical Engineering, Military Institute of Science and Technology, Dhaka, Bangladesh
- Sakshar Chakravarty
  - Department of Computer Science and Engineering, University of California, Riverside, California, USA
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Atif Hasan Rahman
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
- Md. Shamsuzzoha Bayzid
  - Department of Computer Science and Engineering, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
2
Yu Z, Li J, Wang H, Ping B, Li X, Liu Z, Guo B, Yu Q, Zou Y, Sun Y, Ma F, Zhao T. Transposable elements in Rosaceae: insights into genome evolution, expression dynamics, and syntenic gene regulation. Hortic Res 2024; 11:uhae118. [PMID: 38919560 PMCID: PMC11197308 DOI: 10.1093/hr/uhae118] [Received: 12/14/2023] [Accepted: 04/17/2024] [Indexed: 06/27/2024]
Abstract
Transposable elements (TEs) exert significant influence on plant genomic structure and gene expression. Here, we explored TE-related aspects across 14 Rosaceae genomes, investigating genomic distribution, transposition activity, expression patterns, and nearby differentially expressed genes (DEGs). The analyses revealed distinct long terminal repeat retrotransposon (LTR-RT) evolutionary patterns, reflecting varied genome size changes among nine species over the past million years. Over the past 2.5 million years, Rubus idaeus showed a transposition rate twice that of Fragaria vesca, while Pyrus bretschneideri displayed significantly faster transposition than Crataegus pinnatifida. Genes adjacent to recent TE insertions were linked to stress resistance, whereas those near older insertions were functionally enriched in morphogenesis, enzyme activity, and metabolic processes. Expression analysis revealed diverse responses of LTR-RTs to internal or external conditions. Furthermore, we identified 3695 pairs of syntenic DEGs proximal to TEs in Malus domestica cv. 'Gala' and M. domestica (GDDH13), suggesting that TE insertions may contribute to varietal trait differences between these apple varieties. Our study of representative Rosaceae species underscores the pivotal role of TEs in plant genome evolution within this diverse family. It elucidates how these elements regulate syntenic DEGs on a genome-wide scale, offering insights into Rosaceae-specific genomic evolution.
Affiliation(s)
- Ze Yu
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Jiale Li
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Hanyu Wang
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Boya Ping
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Xinchu Li
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Zhiguang Liu
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Bocheng Guo
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Qiaoming Yu
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yangjun Zou
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Yaqiang Sun
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Fengwang Ma
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
- Tao Zhao
  - State Key Laboratory for Crop Stress Resistance and High-Efficiency Production/Shaanxi Key Laboratory of Apple, College of Horticulture, Northwest A&F University, Yangling, Shaanxi 712100, China
3
Schmidt N, Sielemann K, Breitenbach S, Fuchs J, Pucker B, Weisshaar B, Holtgräwe D, Heitkam T. Repeat turnover meets stable chromosomes: repetitive DNA sequences mark speciation and gene pool boundaries in sugar beet and wild beets. Plant J 2024; 118:171-190. [PMID: 38128038 DOI: 10.1111/tpj.16599] [Received: 10/09/2023] [Revised: 12/05/2023] [Accepted: 12/08/2023] [Indexed: 12/23/2023]
Abstract
Sugar beet and its wild relatives share a base chromosome number of nine and similar chromosome morphologies. Yet interspecific breeding is impeded by chromosome and sequence divergence that is still not fully understood. Because repetitive DNAs are among the fastest-evolving parts of the genome, we investigated whether repeatome innovations and losses are linked to chromosomal differentiation and speciation. We traced genome- and chromosome-wide evolution across 13 beet species comprising all sections of the genera Beta and Patellifolia. For this, we combined short- and long-read sequencing, flow cytometry, and cytogenetics to build a comprehensive framework that spans the complete scale from DNA to chromosome to genome. Genome sizes and repeat profiles reflect the separation into three gene pools with contrasting evolutionary patterns. Among all repeats, satellite DNAs harbor the most genomic variability, leading to fundamentally different centromere architectures, ranging from chromosomal uniformity in Beta and Patellifolia to the formation of patchwork chromosomes in Corollinae/Nanae. We show that repetitive DNAs drive the genome expansions and contractions across the beet genera, providing insights into the genomic underpinnings of beet speciation. Satellite DNAs in particular vary considerably between beet genomes, leading to the evolution of distinct chromosomal setups in the three gene pools that likely contribute to the barriers in beet breeding. Thus, with their isokaryotypic chromosome sets, beet genomes present an ideal system for studying the link between repeats, genomic variability, and chromosomal differentiation, and they provide a theoretical foundation for understanding barriers in any crop breeding effort.
Affiliation(s)
- Nicola Schmidt
  - Faculty of Biology, Technische Universität Dresden, 01069, Dresden, Germany
- Katharina Sielemann
  - Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany
  - Graduate School DILS, Bielefeld Institute for Bioinformatics Infrastructure (BIBI), Bielefeld University, 33615, Bielefeld, Germany
- Sarah Breitenbach
  - Faculty of Biology, Technische Universität Dresden, 01069, Dresden, Germany
- Jörg Fuchs
  - Leibniz Institute of Plant Genetics and Crop Plant Research (IPK) Gatersleben, 06466, Stadt Seeland, Germany
- Boas Pucker
  - Plant Biotechnology and Bioinformatics, Institute of Plant Biology & Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, 38106, Braunschweig, Germany
- Bernd Weisshaar
  - Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany
- Daniela Holtgräwe
  - Genetics and Genomics of Plants, Center for Biotechnology (CeBiTec) & Faculty of Biology, Bielefeld University, 33615, Bielefeld, Germany
- Tony Heitkam
  - Faculty of Biology, Technische Universität Dresden, 01069, Dresden, Germany
  - Institute of Biology, NAWI Graz, Karl-Franzens-Universität, A-8010 Graz, Graz, Austria
4
Chen Z, Ain NU, Zhao Q, Zhang X. From tradition to innovation: conventional and deep learning frameworks in genome annotation. Brief Bioinform 2024; 25:bbae138. [PMID: 38581418 PMCID: PMC10998533 DOI: 10.1093/bib/bbae138] [Received: 12/01/2023] [Revised: 03/08/2024] [Accepted: 03/10/2024] [Indexed: 04/08/2024]
Abstract
Following the milestone success of the Human Genome Project, the Encyclopedia of DNA Elements (ENCODE) initiative was launched in 2003 to uncover the numerous functional elements within the genome. This endeavor coincided with the emergence of many novel technologies, which supplied vast amounts of whole-genome sequences and high-throughput data such as ChIP-Seq and RNA-Seq. Extracting biologically meaningful information from these massive datasets has become a critical aspect of many recent studies, particularly in annotating and predicting the functions of unknown genes. The core idea behind genome annotation is to identify genes and other functional elements within the genome sequence and infer their biological functions. Traditional wet-lab methods still require extensive effort for functional verification. However, early bioinformatics algorithms and software relied primarily on shallow learning techniques, which limited their ability to characterize data and learn features. With the widespread adoption of RNA-Seq technology, biologists began to harness the potential of machine learning and deep learning approaches for gene structure prediction and functional annotation. In this context, we review both conventional methods and contemporary deep learning frameworks and highlight novel perspectives on the challenges that arise during annotation, underscoring the dynamic nature of this evolving scientific landscape.
Affiliation(s)
- Zhaojia Chen
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
  - College of Biomedical Engineering, Taiyuan University of Technology, Jinzhong 030600, China
- Noor ul Ain
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
- Qian Zhao
  - State Key Laboratory for Ecological Pest Control of Fujian/Taiwan Crops and College of Life Science, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
- Xingtan Zhang
  - National Key Laboratory for Tropical Crop Breeding, Shenzhen Branch, Guangdong Laboratory for Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, Guangzhou 518120, China
5
Hassan AH, Mokhtar MM, El Allali A. Transposable elements: multifunctional players in the plant genome. Front Plant Sci 2024; 14:1330127. [PMID: 38239225 PMCID: PMC10794571 DOI: 10.3389/fpls.2023.1330127] [Received: 10/30/2023] [Accepted: 12/06/2023] [Indexed: 01/22/2024]
Abstract
Transposable elements (TEs) are indispensable components of eukaryotic genomes that play diverse roles in gene regulation, recombination, and environmental adaptation. Their ability to mobilize within the genome leads to changes in gene expression and DNA structure. TEs serve as valuable markers for genetic and evolutionary studies and facilitate genetic mapping and phylogenetic analysis. They also provide insight into how organisms adapt to a changing environment by promoting gene rearrangements that create new gene combinations. These repetitive sequences significantly affect genome structure, function, and evolution. This review takes a comprehensive look at TEs and their applications in biotechnology, particularly in plant biology, where they are now considered "genomic gold" because of their extensive functionality. The article addresses various aspects of TEs in plant development, including their structure, epigenetic regulation, and evolutionary patterns, as well as their use in gene editing and as plant molecular markers. The goal is to systematically understand TEs and shed light on their diverse roles in plant biology.
Affiliation(s)
- Asmaa H. Hassan
  - Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco
  - Agricultural Genetic Engineering Research Institute, Agriculture Research Center, Giza, Egypt
- Morad M. Mokhtar
  - Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco
  - Agricultural Genetic Engineering Research Institute, Agriculture Research Center, Giza, Egypt
- Achraf El Allali
  - Bioinformatics Laboratory, College of Computing, Mohammed VI Polytechnic University, Ben Guerir, Morocco
6
Sato R, Kondo Y, Agarie S. The first released available genome of the common ice plant (Mesembryanthemum crystallinum L.) extended the research region on salt tolerance, C3-CAM photosynthetic conversion, and halophilism. F1000Res 2024; 12:448. [PMID: 38618020 PMCID: PMC11016173 DOI: 10.12688/f1000research.129958.3] [Accepted: 01/03/2024] [Indexed: 04/16/2024]
Abstract
Background: The common ice plant (Mesembryanthemum crystallinum L.) is an annual herb of the genus Mesembryanthemum in the family Aizoaceae, native to southern Africa.
Methods: We performed paired-end shotgun genome sequencing on the Illumina platform to determine the genome sequence of the ice plant. We assembled the whole-genome sequence using the genome assemblers "ALGA" and "Redundans" and then released it as publicly available genomic information. Finally, we estimated potential genomic functions mainly by homology search.
Results: A draft genome with a total length of 286 Mb, corresponding to 79.2% of the estimated genome size (361 Mb) and consisting of 49,782 contigs, was generated. It encompassed 93.49% of the genes of terrestrial higher plants, 99.5% of the ice plant transcriptome, and 100% of known DNA sequences. In addition, 110.9 Mb (38.8%) of repetitive sequences and untranslated regions, 971 tRNA loci, and 100 miRNA loci were identified, and their effects on stress tolerance and photosynthesis were investigated. Molecular phylogenetic analysis based on ribosomal DNA from 26 plant species revealed genetic similarity between the ice plant and poplar, both of which are salt tolerant. Overall, 35,702 protein-coding regions were identified in the genome, of which 56.05% to 82.59% were annotated and subjected to domain searches and gene ontology (GO) analyses; these analyses found that 18 GO terms stood out among five plant species. These terms were related to biological defense, growth, reproduction, transcription, post-transcription, and intermembrane transportation, and can be regarded as one of the fundamental results obtained from the ice plant genome.
Conclusions: The information characterized here is useful for elucidating the mechanism of growth promotion under salinity and the reversible conversion of the photosynthetic type from C3 to Crassulacean Acid Metabolism (CAM).
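The percentages in the Results can be cross-checked from the raw sizes. The short Python sketch below does that arithmetic; it assumes, as the numbers suggest but the abstract does not state, that the 38.8% repeat figure is relative to the 286 Mb assembly rather than to the 361 Mb estimated genome size. The constant names are illustrative.

```python
# Sizes quoted in the Results (Mb = megabases); constant names are illustrative.
ASSEMBLY_MB = 286.0          # total length of the draft assembly
ESTIMATED_GENOME_MB = 361.0  # estimated genome size
REPEATS_UTR_MB = 110.9       # repetitive sequences and untranslated regions


def percent(part: float, whole: float) -> float:
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Fraction of the estimated genome captured by the assembly (abstract: 79.2%).
assembly_completeness = percent(ASSEMBLY_MB, ESTIMATED_GENOME_MB)
# Repeats/UTRs relative to the assembly, an assumed denominator (abstract: 38.8%).
repeat_fraction = percent(REPEATS_UTR_MB, ASSEMBLY_MB)

print(f"assembly covers {assembly_completeness}% of the estimated genome")
print(f"repeats and UTRs make up {repeat_fraction}% of the assembly")
```

Dividing 110.9 Mb by the 361 Mb estimate instead gives about 30.7%, which is why the assembly is the more plausible denominator for the reported 38.8%.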
Affiliation(s)
- Ryoma Sato
  - Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Yuri Kondo
  - Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Sakae Agarie
  - Faculty of Agriculture, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
7
Lei L, Gordon SP, Liu L, Sade N, Lovell JT, Rubio Wilhelmi MDM, Singan V, Sreedasyam A, Hestrin R, Phillips J, Hernandez BT, Barry K, Shu S, Jenkins J, Schmutz J, Goodstein DM, Thilmony R, Blumwald E, Vogel JP. The reference genome and abiotic stress responses of the model perennial grass Brachypodium sylvaticum. G3 (Bethesda) 2023; 14:jkad245. [PMID: 37883711 PMCID: PMC10755203 DOI: 10.1093/g3journal/jkad245] [Received: 06/26/2023] [Revised: 09/12/2023] [Accepted: 09/28/2023] [Indexed: 10/28/2023]
Abstract
Perennial grasses are important forage crops and emerging biomass crops, and they have the potential to be more sustainable grain crops. However, most perennial grass crops are difficult experimental subjects because of their large size, complex genetics, and/or recalcitrance to transformation. Thus, a tractable model perennial grass could be used to make rapid discoveries that can be translated to perennial grass crops. Brachypodium sylvaticum has the potential to serve as such a model because of its small size, rapid generation time, simple genetics, and transformability. Here, we provide a high-quality genome assembly and annotation for B. sylvaticum, an essential resource for a modern model system. In addition, we conducted transcriptomic studies under four abiotic stresses (water, heat, salt, and freezing). Our results indicate that crowns are more responsive to freezing than leaves, which may help them overwinter. We observed extensive transcriptional responses with varying temporal dynamics to all abiotic stresses, including classic heat-responsive genes. These results can be used to form testable hypotheses about how perennial grasses respond to these stresses. Taken together, these results will allow B. sylvaticum to serve as a truly tractable perennial model system.
Affiliation(s)
- Li Lei
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Sean P Gordon
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Lifeng Liu
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nir Sade
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
  - School of Plant Sciences and Food Security, Tel Aviv University, Tel Aviv 69978, Israel
- John T Lovell
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Vasanth Singan
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Avinash Sreedasyam
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Rachel Hestrin
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jeremy Phillips
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Bryan T Hernandez
  - Crop Improvement and Genetics Research Unit, USDA-ARS Western Regional Research Center, Albany, CA 94710, USA
- Kerrie Barry
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Shengqiang Shu
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jerry Jenkins
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- Jeremy Schmutz
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
  - Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, AL 35806, USA
- David M Goodstein
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Roger Thilmony
  - Crop Improvement and Genetics Research Unit, USDA-ARS Western Regional Research Center, Albany, CA 94710, USA
- Eduardo Blumwald
  - Department of Plant Sciences, University of California, Davis, CA 95616, USA
- John P Vogel
  - U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
8
Gultyaev AP, Koster C, van Batenburg DC, Sistermans T, van Belle N, Vijfvinkel D, Roussis A. Conserved structured domains in plant non-coding RNA enod40, their evolution and recruitment of sequences from transposable elements. NAR Genom Bioinform 2023; 5:lqad091. [PMID: 37850034 PMCID: PMC10578108 DOI: 10.1093/nargab/lqad091] [Received: 04/28/2023] [Revised: 07/22/2023] [Accepted: 09/22/2023] [Indexed: 10/19/2023]
Abstract
The plant long noncoding RNA enod40 is involved in regulating symbiotic associations with bacteria, in particular in the nitrogen-fixing root nodules of legumes, and with fungi in the phosphate-acquiring arbuscular mycorrhizae formed by various plants. The presence of enod40 genes in plants that do not form such symbioses indicates that it has other roles in cell physiology. The molecular mechanisms of enod40 RNA function are poorly understood. Enod40 RNAs form several structured domains, conserved to different extents. Because of relatively low sequence similarity, identifying enod40 sequences in plant genomes is not straightforward, and many enod40 genes remain unannotated even in complete genomes. Here, we used comparative structure analysis and sequence similarity searches to locate enod40 genes and determine enod40 RNA structures in nitrogen-fixing clade plants and in grasses. The structures combine conserved features with considerable diversity of structural elements, including insertions of structured domain modules originating from transposable elements. Remarkably, these insertions contain sequences similar to tandem repeats, and several stem-loops are homologous to microRNA precursors.
Affiliation(s)
- Alexander P Gultyaev
  - Leiden Institute of Advanced Computer Science, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
  - Department of Viroscience, Erasmus Medical Center, PO Box 2040, 3000 CA Rotterdam, The Netherlands
- Celine Koster
  - Life Science & Technology Honours College, Leiden University, PO Box 9502, 2300 RA Leiden, The Netherlands
  - Amsterdam University Medical Center, Department of Human Genetics, section Ophthalmogenetics, Location AMC, Meibergdreef 9, Amsterdam, The Netherlands
- Diederik Cames van Batenburg
  - Leiden Institute of Advanced Computer Science, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
  - CareRate, Unit E1.165, Stationsplein 45, 3013 AK Rotterdam, The Netherlands
- Tom Sistermans
  - Leiden Institute of Advanced Computer Science, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
  - Institute of Organismic and Molecular Evolution, Johannes Gutenberg University Mainz, 55128 Mainz, Germany
- Niels van Belle
  - Leiden Institute of Advanced Computer Science, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
- Daan Vijfvinkel
  - Leiden Institute of Advanced Computer Science, Leiden University, PO Box 9512, 2300 RA Leiden, The Netherlands
- Andreas Roussis
  - National & Kapodistrian University of Athens, Faculty of Biology, Section of Botany, Group Molecular Plant Physiology, Panepistimiopolis - Zografou - Athens, 15784, Greece
9
Varghese R, Cherukuri AK, Doddrell NH, Doss CGP, Simkin AJ, Ramamoorthy S. Machine learning in photosynthesis: Prospects on sustainable crop development. Plant Sci 2023; 335:111795. [PMID: 37473784 DOI: 10.1016/j.plantsci.2023.111795] [Received: 05/03/2023] [Revised: 07/10/2023] [Accepted: 07/13/2023] [Indexed: 07/22/2023]
Abstract
Improving photosynthesis is a promising avenue to increase food security. Studying photosynthetic traits with the aim of improving efficiency has been one of many strategies to increase crop yield, but analyzing large datasets presents an ongoing challenge. Machine learning (ML) is a ubiquitous tool that enables more elaborate data analysis. Here we review the application of ML in various domains of photosynthetic research, as well as in photosynthetic pigment studies. We highlight how various ML algorithms could be used to correlate hyperspectral data with photosynthetic parameters to improve crop yield. We also propose strategies for employing ML in photosynthetic pigment research to further increase crop yield.
Affiliation(s)
- Ressin Varghese
  - School of Bio Sciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Aswani Kumar Cherukuri
  - School of Information Technology and Engineering, VIT University, Vellore 632014, Tamil Nadu, India
- C George Priya Doss
  - School of Bio Sciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
- Andrew J Simkin
  - School of Biosciences, University of Kent, Canterbury CT2 7NJ, UK
  - School of Life Sciences, University of Essex, Wivenhoe Park, Colchester CO4 3SQ, UK
- Siva Ramamoorthy
  - School of Bio Sciences and Technology, VIT University, Vellore 632014, Tamil Nadu, India
10
Orozco-Arias S, Lopez-Murillo LH, Piña JS, Valencia-Castrillon E, Tabares-Soto R, Castillo-Ossa L, Isaza G, Guyot R. Genomic object detection: An improved approach for transposable elements detection and classification using convolutional neural networks. PLoS One 2023; 18:e0291925. [PMID: 37733731 PMCID: PMC10513252 DOI: 10.1371/journal.pone.0291925] [Received: 05/26/2023] [Accepted: 09/10/2023] [Indexed: 09/23/2023]
Abstract
Analysis of eukaryotic genomes requires the detection and classification of transposable elements (TEs), a crucial but complex and time-consuming task. To improve the performance of tools that accomplish these tasks, machine learning (ML) approaches that leverage computing resources such as GPUs (graphics processing units) and multiple CPU (central processing unit) cores have been adopted. However, until now, ML techniques have mostly been limited to the classification of TEs. Herein, a detection-classification strategy based on convolutional neural networks, named YORO, is adapted from computer vision (YOLO) to genomics. This approach enables the detection of genomic objects by predicting their position, length, and class in large DNA sequences such as fully sequenced genomes. As a proof of concept, the internal protein-coding domains of LTR-retrotransposons are used to train the proposed neural network. Performance was measured using precision, recall, accuracy, F1-score, execution times, and time ratios, as well as several graphical representations. These promising results open the door to a new generation of deep learning tools for genomics. The YORO architecture is available at https://github.com/simonorozcoarias/YORO.
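As an illustration of how precision, recall, and F1-score can be computed for predictions that carry both a position and a class, the Python sketch below matches predicted intervals to annotated ones. It is not the YORO evaluation code: the Feature type, the greedy matching, the 1-D intersection-over-union criterion, the 0.5 threshold, and the domain labels are all assumptions borrowed from common object-detection practice, and accuracy is not computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical genomic object: half-open interval [start, end) plus a class label."""
    start: int
    end: int
    label: str


def interval_iou(a: Feature, b: Feature) -> float:
    """1-D intersection-over-union of two half-open intervals."""
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - overlap
    return overlap / union if union > 0 else 0.0


def detection_scores(truth: list[Feature], predicted: list[Feature],
                     iou_threshold: float = 0.5) -> tuple[float, float, float]:
    """Greedily match predictions to unused true features; return (precision, recall, F1).

    A prediction is a true positive only if its label matches and its IoU with
    the chosen true feature is at least iou_threshold (an assumed criterion).
    """
    unmatched = list(truth)
    true_positives = 0
    for pred in predicted:
        best, best_iou = None, iou_threshold
        for candidate in unmatched:
            iou = interval_iou(pred, candidate)
            if candidate.label == pred.label and iou >= best_iou:
                best, best_iou = candidate, iou
        if best is not None:
            unmatched.remove(best)  # each true feature can be matched only once
            true_positives += 1
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical annotations: GAG, INT (integrase), and RT (reverse transcriptase) domains.
truth = [Feature(100, 200, "GAG"), Feature(300, 400, "RT")]
predicted = [Feature(110, 200, "GAG"), Feature(300, 390, "INT"), Feature(600, 650, "RT")]
precision, recall, f1 = detection_scores(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because labels must match, a correctly placed but misclassified domain (the INT prediction above) counts both as a false positive and, through the unmatched RT annotation, as a false negative. Raising iou_threshold makes the required boundary agreement stricter.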
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
- Center for Technology Development Bioprocess and Agroindustry Plant, Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Johan S. Piña
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
- Reinel Tabares-Soto
- Center for Technology Development Bioprocess and Agroindustry Plant, Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
- Luis Castillo-Ossa
- Center for Technology Development Bioprocess and Agroindustry Plant, Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Gustavo Isaza
- Center for Technology Development Bioprocess and Agroindustry Plant, Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
- Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
11
Yang Y, Wen X, Wu Z, Wang K, Zhu Y. Large-scale long terminal repeat insertions produced a significant set of novel transcripts in cotton. SCIENCE CHINA. LIFE SCIENCES 2023; 66:1711-1724. [PMID: 37079218 DOI: 10.1007/s11427-022-2341-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2022] [Accepted: 04/03/2023] [Indexed: 04/21/2023]
Abstract
Genomic analysis has revealed that the 1,637-Mb Gossypium arboreum genome contains approximately 81% transposable elements (TEs), while only 57% of the 735-Mb G. raimondii genome is occupied by TEs. In this study, we investigated whether there were unknown transcripts associated with TEs or TE fragments and, if so, how these new transcripts evolved and were regulated. As sequencing depth increased from 4 to 100 G, a total of 10,284 novel intergenic transcripts (intergenic genes) were discovered. On average, approximately 84% of these intergenic transcripts possibly overlapped with long terminal repeat (LTR) insertions in otherwise untranscribed intergenic regions, and they were expressed at relatively low levels. Most of these intergenic transcripts possessed no transcription activation markers, whereas the majority of regular genic genes possessed at least one such marker. Genes without transcription activation markers had their +1 and -1 nucleosomes positioned more closely (only (117 ± 1.4) bp apart), whereas much wider spacing (approximately (403.5 ± 46.0) bp) was detected for genes with the activation markers. The analysis of 183 previously assembled genomes across three different kingdoms demonstrated systematically that the number of intergenic transcripts in a given genome correlated positively with its LTR content. Evolutionary analysis revealed that genic genes originated during one of the whole-genome duplication events, around 137.7 million years ago (MYA) for all eudicot genomes or 13.7 MYA for the Gossypium family, whereas the intergenic transcripts evolved around 1.6 MYA as a result of the last LTR insertion. The characterization of these low-transcribed intergenic transcripts can facilitate our understanding of the potential biological roles played by LTRs during speciation and diversification.
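The abstract reports a positive correlation between a genome's LTR content and its number of intergenic transcripts but does not name the statistic. As an assumption for illustration only, the sketch below computes Pearson's correlation coefficient r between two equal-length series (for example, LTR fraction per genome and intergenic transcript count per genome); any values passed to it here are toy numbers, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

A value near +1 indicates that genomes with more LTR sequence tend to have more intergenic transcripts; a rank-based statistic such as Spearman's rho would be a reasonable alternative if the relationship is monotonic but not linear.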
Affiliation(s)
- Yan Yang
- Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China
- Xingpeng Wen
- Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China
- College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Zhiguo Wu
- College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Kun Wang
- College of Life Sciences, Wuhan University, Wuhan, 430072, China
- Yuxian Zhu
- Institute for Advanced Studies, Wuhan University, Wuhan, 430072, China.
- College of Life Sciences, Wuhan University, Wuhan, 430072, China.
- Hubei Hongshan Laboratory, Wuhan, 430072, China.
- TaiKang Center for Life and Medical Sciences, RNA Institute, Renmin Hospital, Wuhan University, Wuhan, 430072, China.
12
Piña JS, Orozco-Arias S, Tobón-Orozco N, Camargo-Forero L, Tabares-Soto R, Guyot R. G-SAIP: Graphical Sequence Alignment Through Parallel Programming in the Post-Genomic Era. Evol Bioinform Online 2023; 19:11769343221150585. [PMID: 36703866 PMCID: PMC9871978 DOI: 10.1177/11769343221150585] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2022] [Accepted: 12/23/2022] [Indexed: 01/22/2023]
Abstract
A common task in bioinformatics is to compare DNA sequences to identify similarities between organisms at the sequence level. One approach to such comparisons is the dot-plot, a two-dimensional graphical representation used to analyze DNA or protein alignments. Dot-plot alignment software existed before the sequencing revolution, but it remains limited when dealing with large sequences, resulting in very long execution times. High-Performance Computing (HPC) techniques have been successfully used in many applications to reduce computing times, but so far very few applications for graphical sequence alignment using HPC have been reported. Here, we present G-SAIP (Graphical Sequence Alignment in Parallel), a software tool capable of spawning multiple distributed processes on CPUs over a supercomputing infrastructure, speeding up dot-plot generation by up to 1.68× compared with the current fastest tools. This improves efficiency for comparative structural genomic analysis, phylogenetics (given the benefits of pairwise alignments for comparisons between genomes), repetitive structure identification, and assembly quality checking.
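A dot-plot marks a point (i, j) wherever a window starting at position i of one sequence matches a window starting at position j of the other; repeats show up as off-diagonal lines. The minimal sketch below assumes exact k-mer matches (real tools typically use windows with a mismatch threshold) and splits the rows of the plot into independent chunks, which is the kind of decomposition that lets work be spread across processes. The chunks run serially here, and the function names and defaults (`k=4`, `n_chunks=4`) are hypothetical, not G-SAIP's implementation.

```python
from collections import defaultdict


def kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of seq to the list of positions where it starts."""
    index = defaultdict(list)
    for j in range(len(seq) - k + 1):
        index[seq[j:j + k]].append(j)
    return index


def dotplot_rows(seq_x: str, index_y, k: int, row_start: int, row_end: int):
    """Dots (i, j) for windows of seq_x starting in [row_start, row_end)."""
    dots = []
    for i in range(row_start, min(row_end, len(seq_x) - k + 1)):
        for j in index_y.get(seq_x[i:i + k], ()):
            dots.append((i, j))
    return dots


def dotplot(seq_x: str, seq_y: str, k: int = 4, n_chunks: int = 4):
    """Exact k-mer dot-plot of seq_x (rows) against seq_y (columns)."""
    index_y = kmer_index(seq_y, k)
    n_rows = max(0, len(seq_x) - k + 1)
    step = max(1, -(-n_rows // n_chunks))  # ceiling division
    chunks = [(s, s + step) for s in range(0, n_rows, step)]
    # Each chunk is independent: a parallel version could hand chunks to
    # separate worker processes and concatenate the results in order.
    dots = []
    for start, end in chunks:
        dots.extend(dotplot_rows(seq_x, index_y, k, start, end))
    return dots
```

Comparing a sequence against itself, e.g. `dotplot("ACGTACGT", "ACGTACGT")`, yields the main diagonal plus the off-diagonal points (0, 4) and (4, 0) that reveal the tandem repeat of `ACGT`.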
Affiliation(s)
- Johan S. Piña
- Department of Data Science, People Contact, Manizales, Caldas, Colombia
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Johan S. Piña, Department of Computer Science, Universidad Autónoma de Manizales, Antigua estación del ferrocarril, Manizales, Caldas 170004, Colombia
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Nicolas Tobón-Orozco
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Institut de Recherche pour le Développement, CIRAD, University of Montpellier, Montpellier, France
13
Ramakrishnan M, Papolu PK, Mullasseri S, Zhou M, Sharma A, Ahmad Z, Satheesh V, Kalendar R, Wei Q. The role of LTR retrotransposons in plant genetic engineering: how to control their transposition in the genome. PLANT CELL REPORTS 2023; 42:3-15. [PMID: 36401648 DOI: 10.1007/s00299-022-02945-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/27/2022] [Accepted: 10/23/2022] [Indexed: 06/16/2023]
Abstract
We briefly discuss how the similarity of LTR retrotransposons to retroviruses offers an opportunity to develop a genetic engineering tool that exploits intragenic elements in the plant genome for plant genetic improvement. Long terminal repeat (LTR) retrotransposons are very similar to retroviruses but are not infectious. While spreading between its host cells, a retrovirus inserts a DNA copy of its genome into the cells. The ability of retroviruses to cause infection with genome integration allows genes to be delivered to cells and tissues. Retrovirus vectors are, however, specific to animals and insects and are thus not relevant to plant genetic engineering. Nevertheless, the similarity of LTR retrotransposons to retroviruses is an opportunity to explore the former as a tool for genetic engineering. Although recent long-read sequencing technologies have advanced our knowledge of transposable elements (TEs), it is still not possible to control TE integration or to direct it to specific genomic locations. Using existing intragenic elements to achieve the desired genome composition is better than using artificial constructs such as vectors, but it is not yet clear how to control the process. Moreover, most LTR retrotransposons are inactive and unable to produce complete proteins. They are also highly mutable. In addition, it is impossible to find a fully active copy of an LTR retrotransposon among thousands of its own copies. Theoretically, if these elements could be directly controlled and turned on or off using certain epigenetic mechanisms (induced by stress or infection), LTR retrotransposons could become a valuable genetic engineering tool based on intragenic elements in the plant genome. In this review, recent developments in uncovering the nature of LTR retrotransposons and the possibility of using these intragenic elements as a tool for plant genetic engineering are briefly discussed.
Affiliation(s)
- Muthusamy Ramakrishnan
- Co-Innovation Center for Sustainable Forestry in Southern China, Bamboo Research Institute, Key Laboratory of National Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Biology and the Environment, Nanjing Forestry University, Nanjing, 210037, Jiangsu, China
- Pradeep K Papolu
- State Key Laboratory of Subtropical Silviculture, Institute of Bamboo Research, Zhejiang A&F University, Lin'an, Hangzhou, 311300, Zhejiang, China
- Sileesh Mullasseri
- Department of Zoology, St. Albert's College (Autonomous), Kochi, 682018, Kerala, India
- Mingbing Zhou
- State Key Laboratory of Subtropical Silviculture, Institute of Bamboo Research, Zhejiang A&F University, Lin'an, Hangzhou, 311300, Zhejiang, China
- Zhejiang Provincial Collaborative Innovation Center for Bamboo Resources and High-Efficiency Utilization, Zhejiang A&F University, Lin'an, Hangzhou, 311300, Zhejiang, China
- Anket Sharma
- State Key Laboratory of Subtropical Silviculture, Institute of Bamboo Research, Zhejiang A&F University, Lin'an, Hangzhou, 311300, Zhejiang, China
- Department of Plant Science and Landscape Architecture, University of Maryland, College Park, USA
- Zishan Ahmad
- Co-Innovation Center for Sustainable Forestry in Southern China, Bamboo Research Institute, Key Laboratory of National Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Biology and the Environment, Nanjing Forestry University, Nanjing, 210037, Jiangsu, China
- Viswanathan Satheesh
- Shanghai Center for Plant Stress Biology, CAS Center for Excellence in Molecular Plant Sciences, Chinese Academy of Sciences, Shanghai, 200032, China
- Ruslan Kalendar
- Helsinki Institute of Life Science HiLIFE, University of Helsinki, Biocenter 3, Viikinkaari 1, FI-00014, Helsinki, Finland.
- Institute of Plant Biology and Biotechnology (IPBB), Timiryazev Street 45, 050040, Almaty, Kazakhstan.
- Qiang Wei
- Co-Innovation Center for Sustainable Forestry in Southern China, Bamboo Research Institute, Key Laboratory of National Forestry and Grassland Administration on Subtropical Forest Biodiversity Conservation, College of Biology and the Environment, Nanjing Forestry University, Nanjing, 210037, Jiangsu, China.
14
Orozco-Arias S, Gaviria-Orrego S, Tabares-Soto R, Isaza G, Guyot R. InpactorDB: A Plant LTR Retrotransposon Reference Library. Methods Mol Biol 2023; 2703:31-44. [PMID: 37646935 DOI: 10.1007/978-1-0716-3389-2_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/01/2023]
Abstract
LTR retrotransposons (LTR-RT) are major components of plant genomes. These transposable elements participate in the structure and evolution of genes and genomes through their mobility and their copy number amplification. For example, they are commonly used as evolutionary markers in genetic, genomic, and cytogenetic approaches. However, the plant research community is faced with the near absence of free availability of full-length, curated, and lineage-level classified LTR retrotransposon reference sequences. In this chapter, we will introduce InpactorDB, an LTR retrotransposon sequence database of 181 plant species representing 98 plant families for a total of 67,241 non-redundant elements. We will introduce how to use newly sequenced genomes to identify and classify LTR-RTs in a similar way with a standardized procedure using the Inpactor tool. InpactorDB is freely available at https://inpactordb.github.io .
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Simon Gaviria-Orrego
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia.
- Institut de Recherche pour le Développement, CIRAD, University of Montpellier, Montpellier, France.
15
Orozco-Arias S, Humberto Lopez-Murillo L, Candamil-Cortés MS, Arias M, Jaimes PA, Rossi Paschoal A, Tabares-Soto R, Isaza G, Guyot R. Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes. Brief Bioinform 2022; 24:6887110. [PMID: 36502372 PMCID: PMC9851300 DOI: 10.1093/bib/bbac511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Revised: 10/13/2022] [Accepted: 10/26/2022] [Indexed: 12/14/2022]
Abstract
LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Their characterization is of great importance to understand their dynamics. However, the identification and classification of these elements remains a challenge today. Moreover, current software can be relatively slow (from hours to days), sometimes involve a lot of manual work and do not reach satisfactory levels in terms of precision and sensitivity. Here we present Inpactor2, an accurate and fast application that creates LTR-retrotransposon reference libraries in a very short time. Inpactor2 takes an assembled genome as input and follows a hybrid approach (deep learning and structure-based) to detect elements, filter partial sequences and finally classify intact sequences into superfamilies and, as very few tools do, into lineages. This tool takes advantage of multi-core and GPU architectures to decrease execution times. Using the rice genome, Inpactor2 showed a run time of 5 minutes (faster than other tools) and has the best accuracy and F1-Score of the tools tested here, also having the second best accuracy and specificity only surpassed by EDTA, but achieving 28% higher sensitivity. For large genomes, Inpactor2 is up to seven times faster than other available bioinformatics tools.
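The comparison above relies on the standard binary-classification metrics. As a reference sketch (the function name and the example counts are hypothetical; the tool's own evaluation pipeline may differ), they can be computed from the counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN):

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics; undefined ratios are reported as 0.0."""
    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)      # fraction of predicted elements that are real
    sensitivity = ratio(tp, tp + fn)    # a.k.a. recall: fraction of real elements found
    specificity = ratio(tn, tn + fp)    # fraction of negatives correctly rejected
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }
```

For example, `binary_metrics(tp=80, fp=20, tn=90, fn=10)` gives an accuracy of 0.85 and an F1-score of 160/190 ≈ 0.842. Note that F1 ignores true negatives, which is why a tool can rank differently on F1 than on accuracy or specificity.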
Affiliation(s)
- Simon Orozco-Arias
- Corresponding authors. Simon Orozco-Arias, Computer Science Department, Universidad Autónoma de Manizales, Antigua Estación del Ferrocarril, Manizales, Colombia. Tel.: +57(606)8727272 - 8727709 Ext 102; Alexandre Rossi Paschoal, Department of Computer Science, Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Paraná, 86300-000, Brazil. Tel.: +433133-3790; Gustavo Isaza, Systems and Informatics Department, Center for Technology Development - Bioprocess and Agro-industry Plant, Universidad de Caldas, St 65 #26-10, Manizales, Colombia. Tel.: +57(606)8781500 ext 13146; Romain Guyot, IRD, 911 Av. Agropolis, 34394 Montpellier, France. Tel.: +334674160000.
- Maradey Arias
- Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia
- Paula A Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia
- Alexandre Rossi Paschoal
- Department of Computer Science, Bioinformatics and Pattern Recognition Group, Graduation Program in Bioinformatics, Federal University of Technology - Paraná, UTFPR, Cornélio Procópio, Paraná, 86300-000, Brazil (corresponding author)
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, 170001, Caldas, Colombia
- Gustavo Isaza
- Systems and Informatics Department, Center for Technology Development - Bioprocess and Agro-industry Plant, Universidad de Caldas, St 65 #26-10, Manizales, Colombia (corresponding author)
- Romain Guyot
- IRD, 911 Av. Agropolis, 34394 Montpellier, France (corresponding author)
16
Genome-Wide Comparison of Structural Variations and Transposon Alterations in Soybean Cultivars Induced by Spaceflight. Int J Mol Sci 2022; 23:ijms232213721. [PMID: 36430198 PMCID: PMC9696660 DOI: 10.3390/ijms232213721] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/30/2022] [Revised: 10/31/2022] [Accepted: 11/01/2022] [Indexed: 11/09/2022]
Abstract
Space mutation causes genetic and phenotypic changes in biological materials. Transposon activation is an adaptive mechanism for organisms to cope with changes in the external environment, such as space mutation. Although transposon alterations have been widely reported in diverse plant species, few studies have assessed the global transposon alterations in plants exposed to the space environment. In this study, for the first time, the effects of transposon alterations in soybean caused by space mutation were considered. A new vegetable soybean variety, 'Zhexian 9' (Z9), derived from space mutation treatment of 'Taiwan 75' (T75), was genetically analyzed. Comparative analyses of these two soybean genomes uncovered surprising structural differences, especially with respect to translocation breakends, deletions, and inversions. In total, 12,028 structural variations (SVs) and 29,063 transposable elements (TEs) between T75 and Z9 were detected. In addition, 1336 potential genes were variable between T75 and Z9 in terms of SVs and TEs. These differential genes were enriched in functions such as defense response, cell wall-related processes, epigenetics, auxin metabolism and transport, signal transduction, and especially methylation, which implied that regulation of epigenetic mechanisms and TE activity are important in the space environment. These results are helpful for understanding the role of TEs in response to the space environment and provide a theoretical basis for the selection of wild plant materials suitable for space breeding.
17
Orozco-Arias S, Candamil-Cortes MS, Jaimes PA, Valencia-Castrillon E, Tabares-Soto R, Isaza G, Guyot R. Automatic curation of LTR retrotransposon libraries from plant genomes through machine learning. J Integr Bioinform 2022; 19:jib-2021-0036. [PMID: 35822734 PMCID: PMC9521825 DOI: 10.1515/jib-2021-0036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2021] [Accepted: 06/10/2022] [Indexed: 11/19/2022]
Abstract
Transposable elements are mobile sequences that can move and insert themselves into chromosomes; they are activated by internal or external stimuli, giving the organism the ability to adapt to the environment. Annotating transposable elements in genomic data is currently considered a crucial task for understanding key aspects of organisms such as phenotype variability, species evolution, and genome size. Because of the way they replicate, LTR retrotransposons are the most common transposable elements in plants, accounting in some cases for up to 80% of all DNA. To annotate these elements, a reference library is usually created and then curated to eliminate TE fragments and false positives, after which the elements are annotated in the genome by homology. However, the curation process can take weeks and requires extensive manual work and the execution of multiple time-consuming bioinformatics programs. Here, we propose a machine learning-based approach to perform this process automatically on plant genomes, obtaining up to 91.18% F1-score. This approach was tested with four plant species, obtaining up to 93.6% F1-score (Oryza granulata) in only 22.61 s, whereas bioinformatics methods took approximately 6 h. This acceleration demonstrates that the ML-based approach is efficient and could be used in massive sequencing projects.
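A machine-learning classifier needs every candidate sequence turned into a fixed-length numeric vector. One common encoding, assumed here for illustration and not necessarily the feature set used in this study, is the relative frequency of each k-mer over the alphabet ACGT. The function name and the default `k=2` are hypothetical.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> list[float]:
    """Relative frequencies of all 4**k k-mers over ACGT, in lexicographic order.

    Windows containing N or other ambiguity codes are skipped; lowercase
    (soft-masked) bases are treated like uppercase ones.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    position = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        idx = position.get(seq[i:i + k])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]
```

Because every sequence maps to a vector of the same length (16 values for k = 2, 64 for k = 3), candidates of very different lengths can be fed to the same classifier, which would then label each candidate as an intact element to keep or a fragment/false positive to discard.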
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Paula A Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
- Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
18
de Souza TB, Parteka LM, de Assis R, Vanzela ALL. Diversity of the repetitive DNA fraction in Cestrum, the genus with the largest genomes within Solanaceae. Mol Biol Rep 2022; 49:8785-8799. [PMID: 35809181 DOI: 10.1007/s11033-022-07728-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2021] [Accepted: 06/17/2022] [Indexed: 10/17/2022]
Abstract
BACKGROUND Cestrum species have large genomes (2C ≈ 24 pg), a high occurrence of B chromosomes, and great diversity in heterochromatin bands. Despite this diversity, karyotypes maintain the chromosome number 2n = 16 (except when they carry B chromosomes) and a relative similarity in chromosome morphology and symmetry. To deepen our knowledge of Cestrum genome composition, low-coverage sequencing data of C. strigilatum and C. elegans were compared, including cytogenomic analyses of seven species. METHODS AND RESULTS Bioinformatics analyses showed retrotransposons comprising more than 70% of the repetitive fraction, followed by DNA transposons (~17%), but FISH assays using retrotransposon probes revealed inconspicuous and scattered signals. The four satellite DNA families analyzed here represented approximately 2.48% of the C. strigilatum dataset, and these sequences were used as probes in FISH assays. Hybridization signals colocalized with all AT- and GC-rich sequences associated with heterochromatin, including AT-rich Cold-Sensitive Regions (CSRs). Although satellite probes hybridized in almost all tested species, a satDNA family named CsSat49 was highlighted because it predominates in centromeric regions. CONCLUSIONS The data suggest that the satDNA fraction is conserved in the genus, although the number of FISH signals varies between karyotypes. Except for the absence of FISH signals with probes CsSat1 and CsSat72 in two species, the satellites occurred in species of different phylogenetic clades. Some satDNA sequences were detected in the B chromosomes, indicating that the B chromosomes are rich in sequences pre-existing in the chromosomes of the A complement. This comparative study provides an important advance in knowledge of genome organization and heterochromatin composition in Cestrum, especially regarding the distribution of satellite fractions between species and their importance for B chromosome composition.
Affiliation(s)
- Thaíssa Boldieri de Souza
- Laboratório de Citogenética e Diversidade Vegetal, Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil
- Programa de Pós-graduação em Genética e Biologia Molecular, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil
- Letícia Maria Parteka
- Laboratório de Citogenética e Diversidade Vegetal, Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil
- Programa de Pós-graduação em Genética e Biologia Molecular, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil
- Rafael de Assis
- Laboratório de Citogenética e Diversidade Vegetal, Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil
- Programa de Pós-graduação em Genética e Biologia Molecular, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil
- André Luís Laforga Vanzela
- Laboratório de Citogenética e Diversidade Vegetal, Departamento de Biologia Geral, Centro de Ciências Biológicas, Universidade Estadual de Londrina, Londrina, Paraná, 86097-570, Brazil.
19
Specificities and Dynamics of Transposable Elements in Land Plants. BIOLOGY 2022; 11:biology11040488. [PMID: 35453688 PMCID: PMC9033089 DOI: 10.3390/biology11040488] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 02/11/2022] [Revised: 03/10/2022] [Accepted: 03/18/2022] [Indexed: 01/27/2023]
Abstract
Simple Summary: Transposable elements are dynamic components of plant genomes and display a high diversity of lineages and distributions as the result of evolutionary driving forces and overlapping mechanisms of genetic and epigenetic regulation. They are now regarded as major contributors to genome evolution and function, and as important regulators of endogenous gene expression. In this review, we survey recent progress and current challenges in the identification and classification of transposon lineages in complex plant genomes, highlighting the molecular specificities that may explain the expansion and diversification of mobile genetic elements in land plants.
Abstract: Transposable elements (TEs) are important components of most plant genomes. These mobile repetitive sequences are highly diverse in terms of abundance, structure, transposition mechanisms, activity, and insertion specificities across plant species. This review surveys the different mechanisms that may explain the variability of TE patterns in land plants, highlighting the tight connection between TE dynamics and host genome specificities, and their co-evolution to face and adapt to a changing environment. We present the current TE classification in land plants and describe the different levels of genetic and epigenetic control originating from the plant, the TE itself, or external environmental factors. Such overlapping mechanisms of TE regulation might be responsible for the high diversity and dynamics of plant TEs observed in nature.
20
Finding and Characterizing Repeats in Plant Genomes. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2022; 2443:327-385. [PMID: 35037215 DOI: 10.1007/978-1-0716-2067-0_18] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/01/2023]
Abstract
Plant genomes contain a particularly high proportion of repeated structures of various types. This chapter offers a guided tour of the available software that can help biologists scan sequence data automatically for these repeats or check hypothetical models intended to characterize their structures. Since transposable elements (TEs) are a major source of repeats in plants, many methods have been used or developed for this broad class of sequences. They are representative of the range of tools available for other classes of repeats, and we provide two sections on this topic (for the analysis of genomes or directly of sequenced reads), as well as a selection of the main existing software. Because it may be hard to keep up with the profusion of proposals in this dynamic field, the rest of the chapter is devoted to the foundations of an efficient search for repeats and more complex patterns. We first introduce the key concepts of the art of indexing and of mapping or querying sequences. We end the chapter with the more prospective issue of building models of repeat families. We first present the Machine Learning approach, which seeks to build predictors automatically for some TE families from a set of sequences known to belong to each family. A second approach, the linguistic (or syntactic) approach, allows biologists to describe models of their favorite repeat family themselves and check their validity.
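To make the idea of indexing and querying concrete, the sketch below builds a suffix array (the start positions of all suffixes of a text, sorted lexicographically) and uses two binary searches to find every occurrence of a pattern. The naive construction shown is far too slow for real genomes, where linear-time construction algorithms or FM-index variants are used instead; the function names are illustrative only.

```python
def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of text, in lexicographic order.

    Naive O(n^2 log n) construction: fine for illustration, not for genomes.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in text, via binary search on the suffix array."""
    m = len(pattern)
    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Matching suffixes form one contiguous block of the suffix array.
    return sorted(sa[start:lo])
```

Because all suffixes that begin with the pattern are adjacent in sorted order, a query costs O(m log n) comparisons regardless of how many times the pattern occurs, which is what makes such indexes attractive for finding highly repeated sequences.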
21
Ayala-Usma DA, Cárdenas M, Guyot R, Mares MCD, Bernal A, Muñoz AR, Restrepo S. A whole genome duplication drives the genome evolution of Phytophthora betacei, a closely related species to Phytophthora infestans. BMC Genomics 2021; 22:795. [PMID: 34740326 PMCID: PMC8571832 DOI: 10.1186/s12864-021-08079-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/24/2021] [Accepted: 09/27/2021] [Indexed: 11/14/2022]
Abstract
Background: Pathogens of the genus Phytophthora are the etiological agents of many devastating diseases in several high-value crops and forestry species such as potato, tomato, cocoa, and oak, among many others. Phytophthora betacei is a recently described species that causes late blight almost exclusively in tree tomatoes, and it is closely related to Phytophthora infestans, which causes the disease in potato crops and other Solanaceae. This study reports the assembly and annotation of the genomes of P. betacei P8084, the first of its species, and P. infestans RC1-10, a Colombian strain from the EC-1 lineage, using long-read SMRT sequencing technology. Results: Our results show that P. betacei has the largest sequenced genome in the Phytophthora genus so far, at 270 Mb. A moderate transposable element invasion and a whole genome duplication likely explain its genome size expansion relative to P. infestans, whereas P. infestans RC1-10 has expanded its genome through the activity of transposable elements. The high diversity and abundance (in terms of copy number) of classified and unclassified transposable elements in P. infestans RC1-10 relative to P. betacei bear testimony to the power of long-read technologies to discover novel repetitive elements in the genomes of organisms. Our data also support the phylogenetic placement of P. betacei as a standalone species and as a sister group of P. infestans. Finally, we found no evidence that the genome of P. betacei P8084 follows the same gene-dense/gene-sparse architecture proposed for P. infestans and other filamentous plant pathogens. Conclusions: This study provides the first genome-wide picture of P. betacei and expands the genomic resources available for P. infestans. It contributes to the understanding of the genome biology and evolutionary history of Phytophthora species belonging to subclade 1c.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-021-08079-y.
Affiliation(s)
- David A Ayala-Usma
- Research Group in Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia; Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia; Laboratory of Mycology and Plant Pathology (LAMFU), Department of Chemical and Food Engineering, Universidad de Los Andes, Bogotá, Colombia
- Martha Cárdenas
- Laboratory of Mycology and Plant Pathology (LAMFU), Department of Chemical and Food Engineering, Universidad de Los Andes, Bogotá, Colombia
- Romain Guyot
- Institut de Recherche pour le Développement, CIRAD, Université de Montpellier, 34394 Montpellier, France; Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Colombia
- Maryam Chaib De Mares
- Research Group in Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia; Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia
- Adriana Bernal
- Laboratory of Molecular Interactions of Agricultural Microbes (LIMMA), Department of Biological Sciences, Universidad de Los Andes, Bogotá, Colombia
- Alejandro Reyes Muñoz
- Research Group in Computational Biology and Microbial Ecology, Department of Biological Sciences, Universidad de los Andes, Bogotá, Colombia; Max Planck Tandem Group in Computational Biology, Universidad de los Andes, Bogotá, Colombia; The Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, St Louis, MO 63108, USA
- Silvia Restrepo
- Laboratory of Mycology and Plant Pathology (LAMFU), Department of Chemical and Food Engineering, Universidad de Los Andes, Bogotá, Colombia
22
Viviani A, Ventimiglia M, Fambrini M, Vangelisti A, Mascagni F, Pugliesi C, Usai G. Impact of transposable elements on the evolution of complex living systems and their epigenetic control. Biosystems 2021; 210:104566. [PMID: 34718084 DOI: 10.1016/j.biosystems.2021.104566] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/01/2021] [Revised: 10/21/2021] [Accepted: 10/21/2021] [Indexed: 10/20/2022]
Abstract
Transposable elements (TEs) contribute to genomic innovations, as well as genome instability, across a wide variety of species. Popular designations such as 'selfish DNA' and 'junk DNA,' common in the 1980s, may be either inaccurate or misleading, while a more enlightened view of the TE-host relationship covers a range from parasitism to mutualism. Both plant and animal hosts have evolved epigenetic mechanisms to reduce the impact of TEs, both by directly silencing them and by reducing their ability to transpose in the genome. However, TEs have also been co-opted by both plant and animal genomes to perform a variety of physiological functions, ranging from TE-derived proteins acting directly in normal biological functions to innovations in transcription factor activity and also influencing gene expression. Their presence, in fact, can affect a range of features at genome, phenotype, and population levels. The impact TEs have had on evolution is multifaceted, and many aspects still remain unexplored. In this review, the epigenetic control of TEs is contextualized according to the evolution of complex living systems.
Affiliation(s)
- Ambra Viviani
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
- Maria Ventimiglia
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
- Marco Fambrini
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
- Alberto Vangelisti
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
- Flavia Mascagni
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
- Claudio Pugliesi
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
- Gabriele Usai
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Via del Borghetto 80, 56124 Pisa, Italy
23
Costa ZP, Varani AM, Cauz-Santos LA, Sader MA, Giopatto HA, Zirpoli B, Callot C, Cauet S, Marande W, Souza Cardoso JL, Pinheiro DG, Kitajima JP, Dornelas MC, Harand AP, Berges H, Monteiro-Vitorello CB, Carneiro Vieira ML. A genome sequence resource for the genus Passiflora, the genome of the wild diploid species Passiflora organensis. THE PLANT GENOME 2021; 14:e20117. [PMID: 34296827 DOI: 10.1002/tpg2.20117] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/17/2021] [Accepted: 05/09/2021] [Indexed: 06/13/2023]
Abstract
The genus Passiflora comprises a large group of plants popularly known as passionfruit, much appreciated for their exotic flowers and edible fruits. The species (∼500) are morphologically variable (e.g., growth habit, size, and color of flowers) and are adapted to distinct tropical ecosystems. In this study, we generated the genome of the wild diploid species Passiflora organensis Gardner by adopting a hybrid assembly approach. Passiflora organensis has a small genome of 259 Mbp and a heterozygosity rate of 81%, consistent with its reproductive system. Most of the genome sequences could be integrated into its chromosomes with cytogenomic markers (satellite DNA) as references. The repeated sequences accounted for 58.55% of the total DNA analyzed, and the Tekay lineage was the prevalent retrotransposon. In total, 25,327 coding genes were predicted. Passiflora organensis retains 5,609 singletons and 15,671 gene families. We focused on the genes potentially involved in the locus determining self-incompatibility and the MADS-box gene family, allowing us to infer expansions and contractions within specific subfamilies. Finally, we recovered the organellar DNA. Structural rearrangements and two mitoviruses, besides relics of other mobile elements, were found in the chloroplast and mt-DNA molecules, respectively. This study presents the first draft genome assembly of a wild Passiflora species, providing a valuable sequence resource for genomic and evolutionary studies on the genus, and support for breeding cropped passionfruit species.
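A repeat fraction such as the 58.55% reported here depends on not double-counting overlapping or nested annotations. A minimal Python sketch, assuming repeat hits are given as hypothetical half-open, 0-based `(start, end)` intervals on a single sequence (the authors' actual pipeline is not described in this abstract), merges overlaps before summing covered bases:

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def repeat_fraction(repeat_hits, genome_length):
    """Fraction of a sequence covered by at least one repeat annotation."""
    covered = sum(end - start for start, end in merge_intervals(repeat_hits))
    return covered / genome_length


if __name__ == "__main__":
    hits = [(0, 100), (50, 150), (300, 400)]  # hypothetical annotations
    print(repeat_fraction(hits, genome_length=1000))
```

Summing the raw hit lengths here would give 300 bp instead of the 250 bp actually covered, which is why merging comes first.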
Affiliation(s)
- Zirlane Portugal Costa
- Dep. de Genética, Escola Superior de Agricultura "Luiz de Queiroz", Univ. de São Paulo, Piracicaba, 13418-900, Brazil
- Alessandro Mello Varani
- Dep. de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Univ. Estadual Paulista, Jaboticabal, 14884-900, Brazil
- Luiz Augusto Cauz-Santos
- Dep. de Genética, Escola Superior de Agricultura "Luiz de Queiroz", Univ. de São Paulo, Piracicaba, 13418-900, Brazil
- Present address: Dep. of Botany and Biodiversity Research, Univ. of Vienna, Vienna, 1030, Austria
- Helena Augusto Giopatto
- Dep. de Biologia Vegetal, Instituto de Biologia, Univ. Estadual de Campinas, Campinas, 13083-862, Brazil
- Bruna Zirpoli
- Dep. de Botânica, Univ. Federal de Pernambuco, Recife, 50670-901, Brazil
- Caroline Callot
- Institut National de la Recherche Agronomique, Centre National de Ressources Génomique Végétales, Castanet-Tolosan, 31326, France
- Stephane Cauet
- Institut National de la Recherche Agronomique, Centre National de Ressources Génomique Végétales, Castanet-Tolosan, 31326, France
- Willian Marande
- Institut National de la Recherche Agronomique, Centre National de Ressources Génomique Végétales, Castanet-Tolosan, 31326, France
- Jessica Luana Souza Cardoso
- Dep. de Genética, Escola Superior de Agricultura "Luiz de Queiroz", Univ. de São Paulo, Piracicaba, 13418-900, Brazil
- Daniel Guariz Pinheiro
- Dep. de Tecnologia, Faculdade de Ciências Agrárias e Veterinárias, Univ. Estadual Paulista, Jaboticabal, 14884-900, Brazil
- Marcelo Carnier Dornelas
- Dep. de Biologia Vegetal, Instituto de Biologia, Univ. Estadual de Campinas, Campinas, 13083-862, Brazil
- Helene Berges
- Institut National de la Recherche Agronomique, Centre National de Ressources Génomique Végétales, Castanet-Tolosan, 31326, France
- Maria Lucia Carneiro Vieira
- Dep. de Genética, Escola Superior de Agricultura "Luiz de Queiroz", Univ. de São Paulo, Piracicaba, 13418-900, Brazil
24
The Dynamism of Transposon Methylation for Plant Development and Stress Adaptation. Int J Mol Sci 2021; 22:ijms222111387. [PMID: 34768817 PMCID: PMC8583499 DOI: 10.3390/ijms222111387] [Citation(s) in RCA: 29] [Impact Index Per Article: 9.7] [Received: 07/30/2021] [Revised: 10/13/2021] [Accepted: 10/19/2021] [Indexed: 02/06/2023]
Abstract
Plant development processes are regulated by epigenetic alterations that shape nuclear structure, gene expression, and phenotypic plasticity; these alterations can protect the plant from environmental stresses. During plant growth and development, these processes play a significant role in regulating gene expression by remodeling chromatin structure. These epigenetic alterations are mainly regulated by transposable elements (TEs), whose abundance in plant genomes leads to extensive interactions with the host genome. Thus, TEs are the main source of epigenetic changes and form a substantial part of the plant genome. Furthermore, TEs can be activated under stress conditions, and activated elements cause mutagenic effects and substantial genetic variability. This introduces novel gene functions and structural variation at the insertion sites and primarily contributes to epigenetic modifications. Altogether, these modifications directly or indirectly provide the ability to withstand environmental stresses. In recent years, many studies have shown that TE methylation plays a major role in the evolution of the plant genome through epigenetic processes that regulate gene imprinting, thereby upholding genome stability. The induced genetic rearrangements and insertions of mobile genetic elements in regions of active euchromatin contribute to genome alteration, leading to genomic stress. These TE-mediated epigenetic modifications lead to phenotypic diversity, genetic variation, and environmental stress tolerance. Thus, TE methylation is essential for plant evolution and stress adaptation, and TEs hold a strategically relevant position in the plant genome. High-throughput techniques have greatly advanced the understanding of TE-mediated gene expression and its associations with genome methylation, and suggest that controlled mobilization of TEs could be used for crop breeding.
However, the development of applications in this area has been limited, and an integrated view of TE function and subsequent processes is lacking. In this review, we explore the enormous diversity and likely functions of the TE repertoire in adaptive evolution and discuss some recent examples of how TEs impact gene expression in plant development and stress adaptation.
25
Targeted designing functional markers revealed the role of retrotransposon derived miRNAs as mobile epigenetic regulators in adaptation responses of pistachio. Sci Rep 2021; 11:19751. [PMID: 34611187 PMCID: PMC8492636 DOI: 10.1038/s41598-021-98402-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2021] [Accepted: 09/06/2021] [Indexed: 02/08/2023]
Abstract
We developed novel miRNA-based markers from salt-responsive miRNA sequences to detect polymorphisms in miRNA sequences and locations. Validation of 76 combined miRNA + miRNA and miRNA + ISSR markers in three extreme pistachio populations identified three selected markers that could link the salt tolerance phenotype to genotype and that divided pistachio genotypes and Pistacia species into three clusters. Besides performing more efficiently, this novel functional marker system is more polymorphic than previous miRNA-based marker systems. The functional importance of the target genes of the five miRNAs that make up the three selected markers, in regulating genes such as ECA2, ALA10, PFK, PHT1;4, PTR3, KUP2, GRAS, TCP, bHLH, PHD finger, and PLATZ as well as genes involved in developmental, signaling, and biosynthetic processes, shows that polymorphisms associated with these selected miRNAs can make a significant phenotypic difference between salt-sensitive and salt-tolerant pistachio genotypes. Sequencing of selected bands showed the presence of conserved miRNAs in the mitochondrial genome. A further notable finding is that the PCR product sequences of two selected markers were annotated as Gypsy and Copia retrotransposable elements. Transposition of retrotransposons carrying related miRNAs can affect the regulatory activity of these molecules by increasing miRNA copy number and changing their location between nuclear and organellar genomes. These findings point to a crucial role of retrotransposon-derived miRNAs as mobile epigenetic regulators between intracellular genomes in regulating salt stress responses and in creating new, tolerant phenotypes adapted to environmental conditions.
26
Wang X, Chen Z, Murani E, D'Alessandro E, An Y, Chen C, Li K, Galeano G, Wimmers K, Song C. A 192 bp ERV fragment insertion in the first intron of porcine TLR6 may act as an enhancer associated with the increased expressions of TLR6 and TLR1. Mob DNA 2021; 12:20. [PMID: 34407874 PMCID: PMC8375133 DOI: 10.1186/s13100-021-00248-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 11/08/2020] [Accepted: 07/23/2021] [Indexed: 12/20/2022]
Abstract
Background: Toll-like receptors (TLRs) play important roles in building innate immune responses and inducing adaptive immune responses. Associations of TLR gene polymorphisms with disease susceptibility, which are the basis of molecular breeding for disease-resistant animals, have been reported extensively. Retrotransposon insertion polymorphisms (RIPs), a recently developed type of molecular marker, have great potential in population genetics and quantitative trait locus mapping. In this study, bioinformatic prediction combined with PCR-based amplification was employed to screen for RIPs in porcine TLR genes. Their population distribution was examined, and for one RIP the impact on gene activity and phenotype was further evaluated. Results: Five RIPs were identified, located in the 3' flank of TLR3, the 5' flank of TLR5, intron 1 of TLR6, intron 1 of TLR7, and the 3' flank of TLR8, respectively. These RIPs were detected in different breeds, with an uneven distribution among them. A dual luciferase activity assay showed that a 192 bp endogenous retrovirus (ERV) in intron 1 of TLR6 acts as an enhancer, increasing the activities of the putative TLR6 promoter and two mini-promoters. Furthermore, real-time quantitative polymerase chain reaction (qPCR) analysis revealed a significant association (p < 0.05) of the ERV insertion with increased mRNA expression of TLR6, the neighboring gene TLR1, and genes downstream in the TLR signaling pathway such as MyD88 (Myeloid differentiation factor 88), Rac1 (Rac family small GTPase 1), TIRAP (TIR domain containing adaptor protein), and Tollip (Toll interacting protein), as well as the inflammatory factors IL6 (Interleukin 6), IL8 (Interleukin 8), and TNFα (Tumor necrosis factor alpha) in tissues of 30-day-old piglets. In addition, serum IL6 and TNFα concentrations were also significantly upregulated by the ERV insertion (p < 0.05). Conclusions: A total of five RIPs were identified in five different TLR loci.
The 192 bp ERV insertion in the first intron of TLR6 was associated with higher expression of TLR6, TLR1, and several genes downstream in the signaling cascade. Thus, the ERV insertion may act as an enhancer affecting regulation of the TLR signaling pathways and could potentially be applied in breeding disease-resistant animals. Supplementary Information: The online version contains supplementary material available at 10.1186/s13100-021-00248-w.
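The abstract reports qPCR-based expression differences but does not say how relative expression was computed. One widely used option is the Livak 2^-ΔΔCt method, where ΔΔCt = (Ct_target - Ct_reference) in the test group minus the same difference in the control group. The Python sketch below assumes roughly 100% amplification efficiency for both target and reference genes; the function name and example Ct values are hypothetical, and this is not necessarily the authors' method.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change by the 2^-ddCt method (assumes ~100% PCR efficiency)."""
    # Normalize the target gene to a reference gene within each group.
    delta_test = ct_target_test - ct_ref_test
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Compare the normalized values between groups; lower Ct means more template.
    delta_delta = delta_test - delta_ctrl
    return 2 ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values, e.g. insertion carriers (test) vs non-carriers (control).
    print(relative_expression(22.0, 18.0, 24.0, 18.0))
```

With these hypothetical values the target amplifies two cycles earlier in the test group after normalization, which corresponds to a four-fold increase.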
Affiliation(s)
- XiaoYan Wang
- College of Animal Science & Technology, Yangzhou University, Yangzhou, 225009, Jiangsu, China
- Zixuan Chen
- College of Animal Science & Technology, Yangzhou University, Yangzhou, 225009, Jiangsu, China
- Eduard Murani
- Leibniz Institute for Farm Animal Biology (FBN), 18196, Dummerstorf, Germany
- Enrico D'Alessandro
- Department of Veterinary Science, Unit of Animal Production, University of Messina, 98168, Messina, Italy
- Yalong An
- College of Animal Science & Technology, Yangzhou University, Yangzhou, 225009, Jiangsu, China
- Cai Chen
- College of Animal Science & Technology, Yangzhou University, Yangzhou, 225009, Jiangsu, China
- Kui Li
- Institute of Animal Science, Chinese Academy of Agricultural Sciences, 100081, Beijing, China
- Grazia Galeano
- Department of Veterinary Science, Unit of Animal Production, University of Messina, 98168, Messina, Italy
- Klaus Wimmers
- Leibniz Institute for Farm Animal Biology (FBN), 18196, Dummerstorf, Germany
- Chengyi Song
- College of Animal Science & Technology, Yangzhou University, Yangzhou, 225009, Jiangsu, China
27
Kšiňan S, Ďurišová Ľ, Eliáš P. Genome size estimation of Cotoneaster species (Rosaceae) from the Western Carpathians. Biologia (Bratisl) 2021. [DOI: 10.1007/s11756-021-00772-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/01/2022]
28
Orozco-Arias S, Candamil-Cortés MS, Jaimes PA, Piña JS, Tabares-Soto R, Guyot R, Isaza G. K-mer-based machine learning method to classify LTR-retrotransposons in plant genomes. PeerJ 2021; 9:e11456. [PMID: 34055489 PMCID: PMC8140598 DOI: 10.7717/peerj.11456] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/17/2021] [Accepted: 04/24/2021] [Indexed: 12/15/2022]
Abstract
Every day, more plant genomes become available in public databases, and additional massive sequencing projects (i.e., projects that aim to sequence thousands of individuals) are formulated and released. Nevertheless, there are not enough automatic tools to analyze this large amount of genomic information. LTR retrotransposons are the most frequent repetitive sequences in plant genomes; however, their detection and classification are commonly performed using semi-automatic and time-consuming programs. Although several bioinformatic tools follow different approaches to detect and classify them, none of these tools can individually obtain accurate results. Here, we used machine learning algorithms based on k-mer counts to distinguish LTR retrotransposons from other genomic sequences and to classify them into lineages/families with an F1-score of 95%, contributing to the development of an alignment-free, automatic method for analyzing these sequences.
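The general idea of k-mer-based, alignment-free classification can be sketched briefly. The Python example below turns each sequence into a normalized k-mer frequency vector and assigns it to the nearest class centroid. The nearest-centroid rule is a deliberately simple stand-in chosen for illustration, not one of the algorithms evaluated by the authors; the function names, k = 3, and the toy labels are assumptions.

```python
import math
from collections import Counter
from itertools import product


def kmer_vector(seq, k=3):
    """Normalized frequencies of all 4**k DNA k-mers, in a fixed lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers) or 1  # k-mers containing N etc. are ignored
    return [counts[km] / total for km in kmers]


def train_centroids(labelled_seqs, k=3):
    """Average the k-mer vectors of each class (e.g. each lineage) into a centroid."""
    sums, n = {}, Counter()
    for seq, label in labelled_seqs:
        vec = kmer_vector(seq.upper(), k)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        n[label] += 1
    return {label: [v / n[label] for v in acc] for label, acc in sums.items()}


def classify(seq, centroids, k=3):
    """Assign seq to the class whose centroid is closest in Euclidean distance."""
    vec = kmer_vector(seq.upper(), k)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


if __name__ == "__main__":
    training = [("AAAAATAAAA", "A/T-rich"), ("AATAAAAAAT", "A/T-rich"),
                ("GCGCGGCGCC", "G/C-rich"), ("GGCGCGCCGC", "G/C-rich")]
    print(classify("AAAATAAAAA", train_centroids(training)))
```

Because every sequence maps to a vector of the same length regardless of its own length, no alignment is needed; real classifiers add feature scaling and far more expressive models.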
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia; Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Paula A Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Johan S Piña
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia; Institut de Recherche pour le Développement, CIRAD, Univ. Montpellier, Montpellier, France
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
29
Suvorova YM, Kamionskaya AM, Korotkov EV. Search for SINE repeats in the rice genome using correlation-based position weight matrices. BMC Bioinformatics 2021; 22:42. [PMID: 33530928 PMCID: PMC7852121 DOI: 10.1186/s12859-021-03977-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2020] [Accepted: 01/21/2021] [Indexed: 11/21/2022]
Abstract
Background: Transposable elements (TEs) constitute a significant part of eukaryotic genomes. Short interspersed nuclear elements (SINEs) are non-autonomous TEs that are widely represented in mammalian genomes and also found in plants. After insertion at a new position in the genome, TEs quickly accumulate mutations, which complicates their identification and annotation by modern bioinformatics methods. In this study, we searched for highly divergent SINE copies in the genome of rice (Oryza sativa subsp. japonica) using the Highly Divergent Repeat Search Method (HDRSM). Results: The HDRSM considers correlations of neighboring symbols to construct a position weight matrix (PWM) for a SINE family, which is then used to search for new copies. To evaluate the accuracy of the method and compare it with the RepeatMasker program, we generated a set of SINE copies containing nucleotide substitutions and indels and inserted them into an artificial chromosome for analysis. The HDRSM performed better both in the number of inserted repeats identified and in the accuracy of determining their boundaries. A search for copies of 39 SINE families in the rice genome produced 14,030 hits, of which 5704 were not detected by RepeatMasker. Conclusions: The HDRSM can find divergent SINE copies, correctly determine their boundaries, and offer a high level of statistical significance. We also found that RepeatMasker finds relatively short copies of the SINE families with higher similarity, whereas the HDRSM finds more diverged copies. To obtain a comprehensive profile of SINE distribution in the genome, combined application of the HDRSM and RepeatMasker is recommended.
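To make the idea of a PWM that accounts for neighboring symbols concrete, the Python sketch below builds a first-order (dinucleotide) matrix from gap-free aligned copies, scoring each pair of adjacent bases at each position against a uniform 1/16 background, and then slides it along a sequence. It is a simplified illustration under stated assumptions (no indels, uniform background, pseudocount 0.5), not the HDRSM itself, which also handles divergent copies and assesses statistical significance.

```python
import math
from itertools import product

PAIRS = ["".join(p) for p in product("ACGT", repeat=2)]


def build_dinucleotide_pwm(aligned_copies, pseudocount=0.5):
    """Column j holds log2-odds scores for the base pair at positions (j, j + 1)."""
    length = len(aligned_copies[0])
    pwm = []
    for i in range(1, length):
        counts = {pair: pseudocount for pair in PAIRS}
        for seq in aligned_copies:
            pair = seq[i - 1:i + 1]
            if pair in counts:  # skip pairs containing N or other symbols
                counts[pair] += 1
        total = sum(counts.values())
        pwm.append({pair: math.log2((c / total) / (1 / 16)) for pair, c in counts.items()})
    return pwm


def scan(sequence, pwm):
    """Score every window of len(pwm) + 1 bases; return (best_start, best_score)."""
    width = len(pwm) + 1
    best = (None, float("-inf"))
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Unknown pairs fall back to the lowest score in that column.
        score = sum(col.get(window[j:j + 2], min(col.values())) for j, col in enumerate(pwm))
        if score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    pwm = build_dinucleotide_pwm(["ACGTAC", "ACGTAC", "ACGAAC"])  # toy family
    print(scan("TTTTACGTACTTTT", pwm))
```

Scoring base pairs rather than single bases lets the matrix capture dependencies between adjacent positions, which a classical mononucleotide PWM cannot represent.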
Affiliation(s)
- Yulia M Suvorova
- Research Center of Biotechnology of the Russian Academy of Sciences, 60 let Oktjabrja pr-t, 7, bld. 1, Moscow, Russia
- Anastasia M Kamionskaya
- Research Center of Biotechnology of the Russian Academy of Sciences, 60 let Oktjabrja pr-t, 7, bld. 1, Moscow, Russia
- Eugene V Korotkov
- Research Center of Biotechnology of the Russian Academy of Sciences, 60 let Oktjabrja pr-t, 7, bld. 1, Moscow, Russia
30
Orozco-Arias S, Jaimes PA, Candamil MS, Jiménez-Varón CF, Tabares-Soto R, Isaza G, Guyot R. InpactorDB: A Classified Lineage-Level Plant LTR Retrotransposon Reference Library for Free-Alignment Methods Based on Machine Learning. Genes (Basel) 2021; 12:genes12020190. [PMID: 33525408 PMCID: PMC7910972 DOI: 10.3390/genes12020190] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 12/30/2020] [Revised: 01/21/2021] [Accepted: 01/22/2021] [Indexed: 12/04/2022]
Abstract
Long terminal repeat (LTR) retrotransposons are mobile elements that constitute the major fraction of most plant genomes. The identification and annotation of these elements via bioinformatics approaches represents a major challenge in the era of massive plant genome sequencing. In addition to their involvement in genome size variation, LTR retrotransposons are associated with the function and structure of different chromosomal regions and can alter the function of coding regions, among other effects. Several sequence databases of plant LTR retrotransposons are available, either publicly (such as PGSB and RepetDB) or with restricted access (such as Repbase). Although these databases are useful for identifying LTR-RTs in new genomes by similarity, their elements are not fully classified to the lineage (also called family) level. Here, we present InpactorDB, a semi-curated dataset of 130,439 elements from 195 plant genomes (belonging to 108 plant species) classified to the lineage level. This dataset has been used to train two deep neural networks (one fully connected and one convolutional) for the rapid classification of these elements. For lineage-level classification, we obtain up to 98% performance as measured by F1-score, precision, and recall.
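The F1-score, precision, and recall quoted here follow standard definitions: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 is their harmonic mean. The Python sketch below computes them per lineage and macro-averages F1 across lineages. Whether the authors used macro, weighted, or another averaging is not stated in the abstract, so the averaging choice here is an assumption, as are the example lineage labels.

```python
def per_class_scores(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(per_class_scores(y_true, y_pred, lab)[2] for lab in labels) / len(labels)


if __name__ == "__main__":
    truth = ["Tekay", "Tekay", "Ale", "Ale"]      # hypothetical lineage labels
    predicted = ["Tekay", "Ale", "Ale", "Ale"]
    print(macro_f1(truth, predicted))
```

Macro averaging gives every lineage equal weight, so rare lineages influence the score as much as abundant ones; a weighted average would instead favor the abundant lineages.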
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Department of Systems and Informatics, Universidad de Caldas, 170002 Manizales, Colombia
- Correspondence: S.O.-A.; R.G.
- Paula A. Jaimes
- Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Mariana S. Candamil
- Department of Computer Science, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, 170002 Manizales, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, 170002 Manizales, Colombia
- Institut de Recherche pour le Développement, CIRAD, University of Montpellier, 34394 Montpellier, France
- Correspondence: S.O.-A.; R.G.
31
Fambrini M, Usai G, Vangelisti A, Mascagni F, Pugliesi C. The plastic genome: The impact of transposable elements on gene functionality and genomic structural variations. Genesis 2020; 58:e23399. [DOI: 10.1002/dvg.23399] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/14/2020] [Revised: 11/07/2020] [Accepted: 11/10/2020] [Indexed: 12/15/2022]
Affiliation(s)
- Marco Fambrini
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Pisa, Italy
- Gabriele Usai
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Pisa, Italy
- Alberto Vangelisti
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Pisa, Italy
- Flavia Mascagni
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Pisa, Italy
- Claudio Pugliesi
- Department of Agriculture, Food and Environment (DAFE), University of Pisa, Pisa, Italy
32
Orozco-Arias S, Tobon-Orozco N, Piña JS, Jiménez-Varón CF, Tabares-Soto R, Guyot R. TIP_finder: An HPC Software to Detect Transposable Element Insertion Polymorphisms in Large Genomic Datasets. BIOLOGY 2020; 9:biology9090281. [PMID: 32917036 PMCID: PMC7563458 DOI: 10.3390/biology9090281] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/27/2020] [Revised: 09/01/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022]
Abstract
Transposable elements (TEs) are non-static genomic units capable of moving from one chromosomal location to another. Their insertion polymorphisms may be beneficial, e.g., by creating new gene functions, or deleterious in eukaryotes, e.g., by contributing to different types of cancer in humans. A particular type of TE, the LTR retrotransposons, comprises almost 8% of the human genome. Among LTR retrotransposons, human endogenous retroviruses (HERVs) bear structural and functional similarities to retroviruses. Several tools allow the detection of transposon insertion polymorphisms (TIPs) but fail to efficiently analyze large genomes or large datasets. Here, we developed a computational tool, named TIP_finder, able to detect mobile element insertions in very large genomes through high-performance computing (HPC) and parallel programming, using discordant read-pair analysis. TIP_finder inputs are (i) short paired-end reads, such as those obtained by Illumina sequencing, (ii) a chromosome-level reference genome sequence, and (iii) a database of consensus TE sequences. The proposed HPC strategy adds scalability and provides a useful tool for analyzing huge genomic datasets in a reasonable running time. Compared with the fastest available algorithms, TIP_finder accelerates TIP detection by up to 55 times on breast cancer datasets and 46 times on cancer-free datasets. TIP_finder applies a validated strategy to find TIPs, accelerates the process through HPC, and addresses runtime issues for large-scale analyses in the post-genomic era.
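The discordant read-pair principle behind tools like TIP_finder can be illustrated without any HPC machinery. In the hypothetical Python sketch below, each read pair is reduced to a tuple `(chrom, pos, mate_target)`: one mate is anchored at `chrom:pos` on the reference, and `mate_target` names where its mate aligned. Pairs whose mate hits a TE consensus are grouped by chromosome and TE, and anchors lying within a fixed window of one another are clustered into candidate insertion sites. The input format, window size, and support threshold are assumptions; real tools parse alignment files and also check read orientation and mapping quality.

```python
from collections import defaultdict


def find_tip_candidates(pairs, te_names, window=500, min_support=3):
    """Cluster reference anchors of pairs whose mate aligns to a TE consensus.

    Returns sorted tuples (chrom, first_anchor, last_anchor, te_name, n_pairs).
    """
    anchors = defaultdict(list)  # (chrom, te_name) -> anchor positions
    for chrom, pos, mate_target in pairs:
        if mate_target in te_names:  # discordant: mate lies in a TE, not nearby
            anchors[(chrom, mate_target)].append(pos)

    candidates = []
    for (chrom, te), positions in anchors.items():
        positions.sort()
        cluster = [positions[0]]
        # A trailing None flushes the last cluster.
        for p in positions[1:] + [None]:
            if p is not None and p - cluster[-1] <= window:
                cluster.append(p)
                continue
            if len(cluster) >= min_support:
                candidates.append((chrom, cluster[0], cluster[-1], te, len(cluster)))
            if p is not None:
                cluster = [p]
    return sorted(candidates)


if __name__ == "__main__":
    reads = [("chr1", 1000, "Tekay"), ("chr1", 1100, "Tekay"), ("chr1", 1250, "Tekay"),
             ("chr1", 9000, "Tekay"), ("chr2", 500, "chr2"), ("chr1", 1200, "Copia1")]
    print(find_tip_candidates(reads, te_names={"Tekay", "Copia1"}))
```

Clusters are independent across chromosomes and TE families, which is what makes this step straightforward to parallelize.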
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales 170002, Colombia
- Correspondence: (S.O.-A.); (R.G.)
- Nicolas Tobon-Orozco
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Johan S. Piña
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Romain Guyot
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales 170002, Colombia
- Institut de Recherche pour le Développement (IRD), CIRAD, Université de Montpellier, 34394 Montpellier, France
- Correspondence: (S.O.-A.); (R.G.)
33
Bellinger MR, Paudel R, Starnes S, Kambic L, Kantar MB, Wolfgruber T, Lamour K, Geib S, Sim S, Miyasaka SC, Helmkampf M, Shintaku M. Taro Genome Assembly and Linkage Map Reveal QTLs for Resistance to Taro Leaf Blight. G3 (BETHESDA, MD.) 2020; 10:2763-2775. [PMID: 32546503 PMCID: PMC7407455 DOI: 10.1534/g3.120.401367] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/09/2020] [Accepted: 06/08/2020] [Indexed: 02/06/2023]
Abstract
Taro (Colocasia esculenta) is a food staple widely cultivated in the humid tropics of Asia, Africa, the Pacific and the Caribbean. One of the greatest threats to taro production is Taro Leaf Blight (TLB), caused by the oomycete pathogen Phytophthora colocasiae. Here we describe a de novo taro genome assembly and use it to analyze sequence data from a TLB-resistant mapping population. The genome was assembled from linked-read sequences (10x Genomics; ∼60x coverage), then gap-filled and scaffolded with contigs assembled from Oxford Nanopore Technology long reads and with linkage map results. The haploid assembly was 2.45 Gb in total, with a maximum contig length of 38 Mb and a scaffold N50 of 317,420 bp. A comparison of family-level (Araceae) genome features reveals the repeat content of taro to be 82%, >3.5x greater than that of great duckweed (Spirodela polyrhiza), 23%. Both genomes recovered a similar percentage of Benchmarking Universal Single-copy Orthologs, 80% and 84%, based on a 3,236-gene database for monocot plants. More nucleotide-binding leucine-rich repeat disease resistance genes were present in the taro genome than in the duckweed genome, ∼391 vs. ∼70 (∼182 and ∼46 complete). The mapping population data revealed 16 major linkage groups with 520 markers, and 10 quantitative trait loci (QTL) significantly associated with TLB resistance. The taro genome sequence enhances our understanding of resistance to TLB and provides markers that may accelerate breeding programs. This genome project may provide a template for developing genomic resources in other understudied plant species.
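Scaffold N50, one of the contiguity statistics reported above, is the length L such that scaffolds of length L or longer together cover at least half of the assembly. The following minimal Python sketch computes it. The list of scaffold lengths is hypothetical and is not the taro data.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths (e.g. scaffolds, in bp)."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    covered = 0
    # Walk from the longest sequence down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


# Hypothetical scaffold lengths in bp (not taro data).
print(n50([100, 80, 60, 40, 20]))  # 80
```

For the toy lengths, the two longest scaffolds (100 + 80 = 180) are the first to reach half of the 300 total, so N50 is 80. The repeat comparison above rests on similar arithmetic: 82 / 23 ≈ 3.57, hence ">3.5x".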
Affiliation(s)
- Roshan Paudel
- University of Hawaii at Manoa, Department of Tropical Plant and Soil Sciences, Honolulu, Hawaii
- Steven Starnes
- University of Hawaii at Hilo, College of Agriculture, Forestry and Natural Resource Management, Hilo, Hawaii
- Lukas Kambic
- University of Hawaii at Hilo, College of Agriculture, Forestry and Natural Resource Management, Hilo, Hawaii
- Michael B Kantar
- University of Hawaii at Manoa, Department of Tropical Plant and Soil Sciences, Honolulu, Hawaii
- Thomas Wolfgruber
- University of Hawaii at Manoa, Department of Tropical Plant and Soil Sciences, Honolulu, Hawaii
- Kurt Lamour
- University of Tennessee at Knoxville, Department of Entomology and Plant Pathology, Knoxville, Tennessee
- Scott Geib
- United States Department of Agriculture-Agricultural Research Service, Hilo, Hawaii
- Sheina Sim
- United States Department of Agriculture-Agricultural Research Service, Hilo, Hawaii
- Susan C Miyasaka
- University of Hawaii at Manoa, Department of Tropical Plant and Soil Sciences, Honolulu, Hawaii
- Martin Helmkampf
- University of Hawaii at Hilo, Department of Biology, Hilo, Hawaii
- Michael Shintaku
- University of Hawaii at Hilo, College of Agriculture, Forestry and Natural Resource Management, Hilo, Hawaii
34
Measuring Performance Metrics of Machine Learning Algorithms for Detecting and Classifying Transposable Elements. Processes (Basel) 2020. [DOI: 10.3390/pr8060638] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Indexed: 12/13/2022]
Abstract
Because of the promising results obtained by machine learning (ML) approaches in several fields, ML is increasingly used to solve problems in bioinformatics. In genomics, a current challenge is detecting and classifying transposable elements (TEs), because the bioinformatics methods for doing so involve tedious tasks. Thus, ML was recently evaluated on TE datasets, demonstrating better results than bioinformatics applications. A crucial step for ML approaches is the selection of metrics that measure the realistic performance of algorithms. Each metric has specific characteristics and measures properties that may differ from the predicted results. Although measures are most commonly compared through empirical analysis, a non-result-based methodology called measure invariance properties has been proposed. These properties are determined by whether a given measure changes its value under certain modifications of the confusion matrix, which yields comparative parameters that are independent of the datasets. Measure invariance properties make metrics more or less informative, particularly on unbalanced, monomodal, or multimodal negative class datasets and for real or simulated datasets. Although several studies have applied ML to detect and classify TEs, no work has evaluated performance metrics in TE tasks. Here, we analyzed 26 different metrics used in binary, multiclass, and hierarchical classification, drawing on bibliographic sources, together with their invariance properties. Then, we corroborated our findings using freely available TE datasets and commonly used ML algorithms. Based on our analysis, the most suitable metrics for TE tasks must be stable even with highly unbalanced datasets, multimodal negative classes, and training datasets with errors or outliers.
Based on these parameters, we conclude that the F1-score and the area under the precision-recall curve are the most informative metrics since they are calculated based on other metrics, providing insight into the development of an ML application.
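One invariance property can be checked directly. F1 = 2TP / (2TP + FP + FN) does not involve true negatives (TN), so it stays the same when the negative class grows, whereas accuracy does not. The following Python sketch uses hypothetical confusion-matrix counts to show accuracy inflating on a highly unbalanced dataset while F1 stays fixed. It illustrates the concept only and does not reproduce the paper's 26-metric analysis.

```python
def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn, tn):
    # tn is accepted for a uniform signature but deliberately unused:
    # F1 depends only on tp, fp and fn.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical counts: same positives, small vs. very large negative class.
balanced = dict(tp=40, fp=10, fn=10, tn=40)
unbalanced = dict(tp=40, fp=10, fn=10, tn=9940)

for name, cm in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, round(accuracy(**cm), 3), round(f1_score(**cm), 3))
# balanced 0.8 0.8
# unbalanced 0.998 0.8
```

Precision, TP / (TP + FP), and recall, TP / (TP + FN), are also TN-invariant. This property fits the abstract's preference for the F1-score and the area under the precision-recall curve on unbalanced TE datasets.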
35
de Assis R, Baba VY, Cintra LA, Gonçalves LSA, Rodrigues R, Vanzela ALL. Genome relationships and LTR-retrotransposon diversity in three cultivated Capsicum L. (Solanaceae) species. BMC Genomics 2020; 21:237. [PMID: 32183698 PMCID: PMC7076952 DOI: 10.1186/s12864-020-6618-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 12/06/2019] [Accepted: 02/24/2020] [Indexed: 01/08/2023]
Abstract
Background Plant genomes are rich in repetitive sequences, and transposable elements (TEs) are the most abundant of them. This mobile fraction can be divided into Class I (retrotransposons) and Class II (transposons). Retrotransposons, which transpose via an RNA intermediate and accumulate in a “copy-and-paste” manner, were screened in three pepper genomes (Solanaceae). The present study aimed to understand the genome relationships among Capsicum annuum, C. chinense, and C. baccatum, based on a comparative analysis of the function, diversity and chromosome distribution of TE lineages in the Capsicum karyotypes. Owing to the great commercial importance of pepper as a fresh product, as a spice or as an ornamental plant, these genomes have been widely sequenced, and all of the assemblies are available in the SolGenomics group. These sequences were used to compare all repetitive fractions from a cytogenomic point of view. Results The qualitative and quantitative analysis of LTR-retrotransposon (LTR-RT) families was contrasted with molecular cytogenetic data, and the results showed a stronger genome similarity between C. annuum and C. chinense than between either of them and C. baccatum. The Gypsy superfamily is more abundant than Copia, especially for Tekay/Del lineage members, which are highly represented in C. annuum and C. chinense. On the other hand, C. baccatum accumulates more Athila/Tat sequences. The FISH results showed retrotransposons differentially scattered along chromosomes, except for CRM lineage sequences, which mainly accumulate proximally in association with heterochromatin bands. Conclusions The results confirm a close genomic relationship between C. annuum and C. chinense in comparison to C. baccatum. Centromeric GC-rich bands may be associated with the accumulation regions of CRM elements, whereas terminal and subterminal AT- and GC-rich bands do not correspond to the accumulation of retrotransposons in the three Capsicum species tested.
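The Gypsy/Copia split mentioned above rests on a well-known structural difference. In Ty1/Copia elements the integrase (INT) domain precedes reverse transcriptase (RT) in the pol region, whereas in Ty3/Gypsy elements it follows RT and RNase H. The following minimal Python sketch applies that rule. It assumes each element has already been annotated with an ordered list of domain names; the labels and example elements are hypothetical. Assigning lineages such as Tekay/Del, Athila/Tat or CRM needs further evidence, such as similarity to reference protein domains. That step is not shown, and the sketch is not necessarily the authors' pipeline.

```python
from collections import Counter


def superfamily(domains):
    """Classify an LTR retrotransposon by the order of its INT and RT domains."""
    if "INT" not in domains or "RT" not in domains:
        return "unclassified"
    # Copia: INT upstream of RT; Gypsy: INT downstream of RT.
    return "Copia" if domains.index("INT") < domains.index("RT") else "Gypsy"


# Hypothetical domain annotations, listed 5' to 3' ("CHD" = chromodomain label).
elements = [
    ["GAG", "PROT", "RT", "RNaseH", "INT"],
    ["GAG", "PROT", "INT", "RT", "RNaseH"],
    ["GAG", "PROT", "RT", "RNaseH", "INT", "CHD"],
    ["GAG", "RT"],
]
print(Counter(superfamily(e) for e in elements))
# Counter({'Gypsy': 2, 'Copia': 1, 'unclassified': 1})
```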
Affiliation(s)
- Rafael de Assis
- Laboratório de Citogenética e Diversidade Vegetal, Universidade Estadual de Londrina, 86057-970, Londrina, Paraná, Brazil
- Viviane Yumi Baba
- Departamento de Agronomia, Universidade Estadual de Londrina, 86057-970, Londrina, Paraná, Brazil
- Leonardo Adabo Cintra
- Laboratório de Citogenética e Diversidade Vegetal, Universidade Estadual de Londrina, 86057-970, Londrina, Paraná, Brazil
- Rosana Rodrigues
- Laboratório de Melhoramento Genético Vegetal, Universidade Estadual do Norte Fluminense Darcy Ribeiro, Campos dos Goytacazes, Rio de Janeiro, 28013-602, Brazil
- André Luís Laforga Vanzela
- Laboratório de Citogenética e Diversidade Vegetal, Universidade Estadual de Londrina, 86057-970, Londrina, Paraná, Brazil
36
Tabares-Soto R, Orozco-Arias S, Romero-Cano V, Segovia Bucheli V, Rodríguez-Sotelo JL, Jiménez-Varón CF. A comparative study of machine learning and deep learning algorithms to classify cancer types based on microarray gene expression data. PeerJ Comput Sci 2020; 6:e270. [PMID: 33816921 PMCID: PMC7924492 DOI: 10.7717/peerj-cs.270] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 09/17/2019] [Accepted: 03/08/2020] [Indexed: 05/06/2023]
Abstract
Cancer classification is a topic of major interest in medicine since it allows accurate and efficient diagnosis and facilitates successful outcomes in medical treatment. Previous studies have classified human tumors using large-scale RNA profiling and supervised Machine Learning (ML) algorithms to construct a molecular-based classification of carcinoma cells from breast, bladder, adenocarcinoma, colorectal, gastroesophageal, kidney, liver, lung, ovarian, pancreas, and prostate tumors. These datasets are collectively known as the 11_tumor database. Although this database has been used in several works in the ML field, no comparative studies of different algorithms can be found in the literature. Meanwhile, advances in both hardware and software technologies have fostered considerable improvements in the precision of ML-based solutions, such as Deep Learning (DL). In this study, we compare the most widely used algorithms in classical ML and DL to classify the tumors described in the 11_tumor database. We obtained tumor identification accuracies between 90.6% (Logistic Regression) and 94.43% (Convolutional Neural Networks) using k-fold cross-validation. We also show how a tuning process may or may not significantly improve the algorithms' accuracies. Our results demonstrate an efficient and accurate classification method based on gene expression (microarray data) and ML/DL algorithms, which facilitates tumor type prediction in a multi-cancer-type scenario.
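The k-fold cross-validation used to report these accuracies can be sketched in plain Python. The classifier is a nearest-centroid model, chosen only because it fits in a few lines. It is a stand-in, not one of the algorithms compared in the study, and the two-class toy data are hypothetical. The split is a plain shuffled k-fold, not a stratified one.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for k shuffled, non-overlapping folds."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def fit_centroids(X, y):
    """Mean feature vector per class label."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for d, v in enumerate(xi):
            acc[d] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict(centroids, x):
    """Label of the centroid closest to x (squared Euclidean distance)."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Mean test-fold accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical, well-separated two-class toy data.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
y = ["A"] * 5 + ["B"] * 5
print(cross_val_accuracy(X, y, k=5))  # 1.0
```

In practice, a library routine such as scikit-learn's `StratifiedKFold` would usually be preferred. It keeps class proportions similar across folds, which matters when several tumor classes differ in size.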
Affiliation(s)
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Victor Romero-Cano
- Department of Automatics and Electronics, Universidad Autónoma de Occidente, Cali, Valle del Cauca, Colombia
- Vanesa Segovia Bucheli
- İzmir International Biomedicine and Genome Institute, Dokuz Eylül University, Izmir, Turkey
- José Luis Rodríguez-Sotelo
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
37
Development and Deployment of High-Throughput Retrotransposon-Based Markers Reveal Genetic Diversity and Population Structure of Asian Bamboo. FORESTS 2019. [DOI: 10.3390/f11010031] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 02/07/2023]
Abstract
Bamboo, a non-timber grass species known for its exceptionally fast growth, is a commercially viable crop. Long terminal repeat (LTR) retrotransposons, the main class I mobile genetic elements in plant genomes, are highly abundant (46%) in bamboo, contributing to genome diversity. They play significant roles in the regulation of gene expression, chromosome size and structure, and genome integrity. Because of their random insertion behavior, the spacing between retrotransposons can vary significantly among bamboo genotypes. Capitalizing on this feature, inter-retrotransposon amplified polymorphism (IRAP) is a high-throughput marker system for studying the genetic diversity of plant species. To date, no transposon-based markers have been reported from the bamboo genome, and in particular IRAP markers have not been used to study its genetic diversity. The Phyllostachys genus of Asian bamboo is the largest in the Bambusoideae subfamily and has great economic importance. We report a structure-based analysis of the bamboo genome for the LTR-retrotransposon superfamilies Ty3-gypsy and Ty1-copia, which revealed a total of 98,850 retrotransposons with intact LTR sequences at both ends. When these were grouped into 64,281 clusters/scaffolds using the CD-HIT-EST software, only 13 clusters of retroelements contained more than 30 LTR sequences, with at least one copy having all intact protein domains, such as gag and polyprotein. A total of 16 IRAP primers were synthesized based on the high copy numbers of conserved LTR sequences. A study using these IRAP markers on the genetic diversity and population structure of 58 Asian bamboo accessions belonging to the genus Phyllostachys revealed 3340 amplicons with an average of 98% polymorphism. The bamboo accessions were collected from nine different provinces of China, as well as from Italy and America.
A three-phase approach using hierarchical clustering, principal components and a model-based population structure analysis divided the bamboo accessions into four sub-populations, PhSP1, PhSP2, PhSP3 and PhSP4. All three analyses showed significant consensus on the sub-population assignments. Further, all the sub-populations revealed admixture of alleles. The analysis of molecular variance (AMOVA) among the sub-populations revealed higher intra-population genetic variation (75%) than inter-population variation. The results suggest that Phyllostachys bamboos are not highly diversified evolutionarily, although geographic speciation could have occurred at a limited level. This study highlights the usability of IRAP markers in determining the inter-species variability of Asian bamboos.
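IRAP amplicons are dominant markers, usually scored as band presence (1) or absence (0) per accession. The following sketch works under that assumption with a hypothetical three-accession matrix. It computes two things: the percentage of polymorphic amplicons (those present in some but not all accessions) and a pairwise Jaccard distance matrix, one common input for hierarchical clustering of such data. The abstract does not state which similarity coefficient the authors used, so Jaccard is an illustrative choice.

```python
from itertools import combinations


def percent_polymorphic(bands):
    """bands[i][j] is 1 if accession i shows amplicon j, else 0."""
    n_loci = len(bands[0])
    polymorphic = sum(1 for j in range(n_loci) if len({row[j] for row in bands}) > 1)
    return 100.0 * polymorphic / n_loci


def jaccard_distance(a, b):
    """1 - (shared bands / bands present in either accession)."""
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - shared / either if either else 0.0


def distance_matrix(bands):
    """Symmetric matrix of pairwise Jaccard distances between accessions."""
    n = len(bands)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = jaccard_distance(bands[i], bands[j])
    return d


# Hypothetical band matrix: 3 accessions x 4 amplicons.
bands = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 0, 0, 1],
]
print(percent_polymorphic(bands))    # 50.0
print(distance_matrix(bands)[0][1])  # 0.5
```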
38
Orozco-Arias S, Isaza G, Guyot R, Tabares-Soto R. A systematic review of the application of machine learning in the detection and classification of transposable elements. PeerJ 2019; 7:e8311. [PMID: 31976169 PMCID: PMC6967008 DOI: 10.7717/peerj.8311] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 08/05/2019] [Accepted: 11/28/2019] [Indexed: 12/16/2022]
Abstract
Background Transposable elements (TEs) constitute the most common repeated sequences in eukaryotic genomes. Recent studies have demonstrated their deep impact on species diversity, adaptation to the environment and diseases. Although there are many conventional bioinformatics algorithms for detecting and classifying TEs, none have achieved reliable results on different types of TEs. Machine learning (ML) techniques can automatically extract hidden patterns and novel information from labeled or unlabeled data and have been applied to solving several scientific problems. Methodology We followed the Systematic Literature Review (SLR) process, applying the six stages of its review protocol, but added a preliminary stage that aims to detect the need for a review. Then search equations were formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer the research questions. Results Several ML approaches have already been tested on other bioinformatics problems with promising results, yet few algorithms and architectures in the literature focus specifically on TEs, despite TEs representing the majority of the nuclear DNA of many organisms. Only 35 articles were found and categorized as relevant in TE or related fields. Conclusions ML is a powerful tool that can be used to address many problems. Although ML techniques have been used widely in other biological tasks, their utilization in TE analyses is still limited. The SLR showed that the use of ML for TE analyses (detection and classification) is an open problem, and that this new field of research is attracting growing interest.
Affiliation(s)
- Simon Orozco-Arias
- Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Gustavo Isaza
- Department of Systems and Informatics, Universidad de Caldas, Manizales, Caldas, Colombia
- Romain Guyot
- Institut de Recherche pour le Développement, CIRAD, University of Montpellier, Montpellier, France
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia
- Reinel Tabares-Soto
- Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia