1. Žárský V, Karnkowska A, Boscaro V, Trznadel M, Whelan TA, Hiltunen-Thorén M, Onut-Brännström I, Abbott CL, Fast NM, Burki F, Keeling PJ. Contrasting outcomes of genome reduction in mikrocytids and microsporidians. BMC Biol 2023; 21:137. [PMID: 37280585 DOI: 10.1186/s12915-023-01635-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 03/01/2023] [Accepted: 05/26/2023] [Indexed: 06/08/2023] Open
Abstract
BACKGROUND: Intracellular symbionts often undergo genome reduction, losing both coding and non-coding DNA in a process that ultimately produces small, gene-dense genomes with few genes. Among eukaryotes, an extreme example is found in microsporidians, which are anaerobic, obligate intracellular parasites related to fungi that have the smallest nuclear genomes known (except for the relic nucleomorphs of some secondary plastids). Mikrocytids are superficially similar to microsporidians: they are also small, reduced, obligate parasites; however, as they belong to a very different branch of the tree of eukaryotes, the rhizarians, such similarities must have evolved in parallel. Since little genomic data are available from mikrocytids, we assembled a draft genome of the type species, Mikrocytos mackini, and compared the genomic architecture and content of microsporidians and mikrocytids to identify common characteristics of reduction and possible convergent evolution.
RESULTS: At the coarsest level, the genome of M. mackini does not exhibit signs of extreme genome reduction; at 49.7 Mbp with 14,372 genes, the assembly is much larger and more gene-rich than those of microsporidians. However, much of the genomic sequence and most (8075) of the protein-coding genes code for transposons, and may not contribute much of functional relevance to the parasite. Indeed, the energy and carbon metabolism of M. mackini share several similarities with those of microsporidians. Overall, the predicted proteome involved in cellular functions is quite reduced and gene sequences are extremely divergent. Microsporidians and mikrocytids also share highly reduced spliceosomes that have retained a strikingly similar subset of proteins despite having reduced independently. In contrast, the spliceosomal introns in mikrocytids are very different from those of microsporidians in that they are numerous, conserved in sequence, and constrained to an exceptionally narrow size range (all 16 or 17 nucleotides long) at the shortest extreme of known intron lengths.
CONCLUSIONS: Nuclear genome reduction has taken place many times and has proceeded along different routes in different lineages. Mikrocytids show a mix of similarities and differences with other extreme cases, including uncoupling the actual size of a genome from its functional reduction.
Affiliation(s)
- Vojtěch Žárský
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Anna Karnkowska
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
  - Institute of Evolutionary Biology, Faculty of Biology, University of Warsaw, 02-089, Warsaw, Poland
- Vittorio Boscaro
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Morelia Trznadel
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Thomas A Whelan
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Markus Hiltunen-Thorén
  - Department of Organismal Biology, Uppsala University, Norbyv. 18D, 752 36, Uppsala, Sweden
  - Department of Ecology, Environment and Plant Sciences, Stockholm University, SE-106 91, Stockholm, Sweden
- Ioana Onut-Brännström
  - Department of Organismal Biology, Uppsala University, Norbyv. 18D, 752 36, Uppsala, Sweden
  - Department of Ecology and Genetics, Uppsala University, 752 36, Uppsala, Sweden
  - Natural History Museum, University of Oslo, 0562, Oslo, Norway
- Cathryn L Abbott
  - Pacific Biological Station, Fisheries and Oceans Canada, Nanaimo, BC, V9T 6N7, Canada
- Naomi M Fast
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada
- Fabien Burki
  - Department of Organismal Biology, Uppsala University, Norbyv. 18D, 752 36, Uppsala, Sweden
- Patrick J Keeling
  - Department of Botany, University of British Columbia, V6T 1Z4, Vancouver, 3529-6270 University Boulevard, BC, Canada

2. Arnaiz O, Van Dijk E, Bétermier M, Lhuillier-Akakpo M, de Vanssay A, Duharcourt S, Sallet E, Gouzy J, Sperling L. Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression. BMC Genomics 2017; 18:483. [PMID: 28651633 PMCID: PMC5485702 DOI: 10.1186/s12864-017-3887-z] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Received: 02/20/2017] [Accepted: 06/21/2017] [Indexed: 12/22/2022] Open
Abstract
Background: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage.
Results: We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that, in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource.
Conclusions: We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3′ and 5′ UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB (http://paramecium.i2bc.paris-saclay.fr). TrUC software is freely distributed under a GNU GPL v3 licence (https://github.com/oarnaiz/TrUC).
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-3887-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Olivier Arnaiz
  - Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette CEDEX, France
- Erwin Van Dijk
  - Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette CEDEX, France
- Mireille Bétermier
  - Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette CEDEX, France
- Maoussi Lhuillier-Akakpo
  - Institut Jacques Monod, CNRS, UMR 7592, Université Paris Diderot, Sorbonne Paris Cité, F-75205, Paris, France
  - Current address: IRCM, CEA, INSERM UMR 967, Université Paris Diderot, Université Paris-Saclay, 92265, Fontenay-aux-Roses CEDEX, France
- Augustin de Vanssay
  - Institut Jacques Monod, CNRS, UMR 7592, Université Paris Diderot, Sorbonne Paris Cité, F-75205, Paris, France
- Sandra Duharcourt
  - Institut Jacques Monod, CNRS, UMR 7592, Université Paris Diderot, Sorbonne Paris Cité, F-75205, Paris, France
- Erika Sallet
  - LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
- Jérôme Gouzy
  - LIPM, Université de Toulouse, INRA, CNRS, Castanet-Tolosan, France
- Linda Sperling
  - Institute for Integrative Biology of the Cell (I2BC), CNRS, CEA, Univ. Paris-Sud, Université Paris-Saclay, 91198, Gif-sur-Yvette CEDEX, France

3. Lozada-Chávez I, Stadler PF, Prohaska SJ. "Hypothesis for the modern RNA world": a pervasive non-coding RNA-based genetic regulation is a prerequisite for the emergence of multicellular complexity. Orig Life Evol Biosph 2011; 41:587-607. [PMID: 22322874 DOI: 10.1007/s11084-011-9262-1] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 07/29/2011] [Accepted: 12/12/2011] [Indexed: 02/06/2023]
Abstract
The transitions to multicellularity mark the most pivotal and distinctive events in life's history on Earth. Although several transitions to "simple" multicellularity (SM) have been recorded in both bacterial and eukaryotic clades, transitions to complex multicellularity (CM) have only happened a few times in eukaryotes. A large number of cell types (associated with large body size), increased energy consumption per gene expressed, and an increment of non-protein-coding DNA positively correlate with CM. These three factors can indeed be understood as the causes and consequences of the regulation of gene expression. Here, we discuss how a vast expansion of non-protein-coding RNA (ncRNAs) regulators rather than large numbers of novel protein regulators can easily contribute to the emergence of CM. We also propose that the evolutionary advantage of RNA-based gene regulation derives from the robustness of the RNA structure that makes it easy to combine genetic drift with functional exploration. We describe a model which aims to explain how the evolutionary dynamic of ncRNAs becomes dominated by the accessibility of advantageous mutations to innovate regulation in complex multicellular organisms. The information and models discussed here outline the hypothesis that pervasive ncRNA-based regulatory systems, only capable of being expanded and explored in higher eukaryotes, are prerequisite to complex multicellularity. Thereby, regulatory RNA molecules in Eukarya have allowed intensification of morphological complexity by stabilizing critical phenotypes and controlling developmental precision. Although the origin of RNA on early Earth is still controversial, it is becoming clear that once RNA emerged into a protocellular system, its relevance within the evolution of biological systems has been greater than we previously thought.
Affiliation(s)
- Irma Lozada-Chávez
  - Computational EvoDevo Group, University of Leipzig, Härtelstrasse 16-18, 04107, Leipzig, Germany

4. Andersen KL, Nielsen H. Experimental identification and analysis of macronuclear non-coding RNAs from the ciliate Tetrahymena thermophila. Nucleic Acids Res 2011; 40:1267-81. [PMID: 21967850 PMCID: PMC3273799 DOI: 10.1093/nar/gkr792] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/14/2022] Open
Abstract
The ciliate Tetrahymena thermophila is an important eukaryotic model organism that has been used in pioneering studies of general phenomena, such as ribozymes, telomeres, chromatin structure and genome reorganization. Recent work has shown that Tetrahymena has many classes of small RNA molecules expressed during vegetative growth or sexual reorganization. In order to get an overview of medium-sized (40-500 nt) RNAs expressed from the Tetrahymena genome, we created a size-fractionated cDNA library from macronuclear RNA and analyzed 80 RNAs, most of which were previously unknown. The most abundant class was small nucleolar RNAs (snoRNAs), many of which are formed by an unusual maturation pathway. The modifications guided by the snoRNAs were analyzed bioinformatically and experimentally and many Tetrahymena-specific modifications were found, including several in an essential, but not conserved domain of ribosomal RNA. Of particular interest, we detected two methylations in the 5'-end of U6 small nuclear RNA (snRNA) that has an unusual structure in Tetrahymena. Further, we found a candidate for the first U8 outside metazoans, and an unusual U14 candidate. In addition, a number of candidates for new non-coding RNAs were characterized by expression analysis at different growth conditions.
Affiliation(s)
- Kasper L Andersen
  - Department of Cellular and Molecular Medicine and Center for Non-coding RNA in Technology and Health, The Panum Institute, University of Copenhagen, 3 Blegdamsvej, DK-2200N, Denmark

5. Cruz JA, Westhof E. Identification and annotation of noncoding RNAs in Saccharomycotina. C R Biol 2011; 334:671-8. [PMID: 21819949 DOI: 10.1016/j.crvi.2011.05.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 11/08/2010] [Accepted: 03/23/2011] [Indexed: 11/16/2022]
Abstract
The importance of ncRNAs in biological processes makes their annotation an essential component of any genome-sequencing project. The identification of ncRNAs in genomes requires specific expertise and tools that are distinct from the traditional protein gene annotation tools. Here, we describe the assembly of two automatic annotation pipelines, integrating publicly available tools, for homology and de novo ncRNA search in genomes. We applied both pipelines to 10 Saccharomycotina genomes and were able to find and annotate 693 ncRNA genes, corresponding to 81% of the ncRNAs expected for those genomes assuming the number of ncRNAs in Saccharomyces cerevisiae (86) as a reference. Several new ncRNAs, not yet known in the Saccharomycotina clade, were also detected. The results show the feasibility of automatic search for ncRNAs in full genomes and the utility of such approaches in large multi-genome sequencing and annotation projects.
Affiliation(s)
- José Almeida Cruz
  - Architecture et Réactivité de l'ARN, Institut de Biologie Moléculaire et Cellulaire du CNRS, Université de Strasbourg, 15 rue René-Descartes, 67084 Strasbourg cedex, France

6. Sperling L. Remembrance of things past retrieved from the Paramecium genome. Res Microbiol 2011; 162:587-97. [DOI: 10.1016/j.resmic.2011.02.012] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 01/12/2011] [Accepted: 02/17/2011] [Indexed: 11/30/2022]

7. Jung S, Swart EC, Minx PJ, Magrini V, Mardis ER, Landweber LF, Eddy SR. Exploiting Oxytricha trifallax nanochromosomes to screen for non-coding RNA genes. Nucleic Acids Res 2011; 39:7529-47. [PMID: 21715380 PMCID: PMC3177221 DOI: 10.1093/nar/gkr501] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 01/16/2023] Open
Abstract
We took advantage of the unusual genomic organization of the ciliate Oxytricha trifallax to screen for eukaryotic non-coding RNA (ncRNA) genes. Ciliates have two types of nuclei: a germ line micronucleus that is usually transcriptionally inactive, and a somatic macronucleus that contains a reduced, fragmented and rearranged genome that expresses all genes required for growth and asexual reproduction. In some ciliates including Oxytricha, the macronuclear genome is particularly extreme, consisting of thousands of tiny 'nanochromosomes', each of which usually contains only a single gene. Because the organism itself identifies and isolates most of its genes on single-gene nanochromosomes, nanochromosome structure could facilitate the discovery of unusual genes or gene classes, such as ncRNA genes. Using a draft Oxytricha genome assembly and a custom-written protein-coding genefinding program, we identified a subset of nanochromosomes that lack any detectable protein-coding gene, thereby strongly enriching for nanochromosomes that carry ncRNA genes. We found only a small proportion of non-coding nanochromosomes, suggesting that Oxytricha has few independent ncRNA genes besides homologs of already known RNAs. Other than new members of known ncRNA classes including C/D and H/ACA snoRNAs, our screen identified one new family of small RNA genes, named the Arisong RNAs, which share some of the features of small nuclear RNAs.
Affiliation(s)
- Seolkyoung Jung
  - Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn VA 20147, USA

8. Chen CJ, Zhou H, Chen YQ, Qu LH, Gautheret D. Plant noncoding RNA gene discovery by "single-genome comparative genomics". RNA 2011; 17:390-400. [PMID: 21220549 PMCID: PMC3039139 DOI: 10.1261/rna.2426511] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/05/2023]
Abstract
Plant genomes have undergone multiple rounds of duplications that contributed massively to the growth of gene families. The structure of resulting families has been studied in depth for protein-coding genes. However, little is known about the impact of duplications on noncoding RNA (ncRNA) genes. Here we perform a systematic analysis of duplicated regions in the rice genome in search of such ncRNA repeats. We observe that, just like their protein counterparts, most ncRNA genes have undergone multiple duplications that left visible sequence conservation footprints. The extent of ncRNA gene duplication in plants is such that these sequence footprints can be exploited for the discovery of novel ncRNA gene families on a large scale. We developed an SVM model that is able to retrieve likely ncRNA candidates among the 100,000+ repeat families in the rice genome, with a reasonably low false-positive discovery rate. Among the nearly 4000 ncRNA families predicted by this means, only 90 correspond to putative snoRNA or miRNA families. About half of the remaining families are classified as structured RNAs. New candidate ncRNAs are particularly enriched in UTR and intronic regions. Interestingly, 89% of the putative ncRNA families do not produce a detectable signal when their sequences are compared to another grass genome such as maize. Our results show that a large fraction of rice ncRNA genes are present in multiple copies and are species-specific or of recent origin. Intragenome comparison is a unique and potent source for the computational annotation of this major class of ncRNA.
Affiliation(s)
- Chong-Jian Chen
  - Institut de Génétique et Microbiologie, CNRS/UMR 8621, Université Paris Sud, Orsay, France

9. Doniger T, Katz R, Wachtel C, Michaeli S, Unger R. A comparative genome-wide study of ncRNAs in trypanosomatids. BMC Genomics 2010; 11:615. [PMID: 21050447 PMCID: PMC3091756 DOI: 10.1186/1471-2164-11-615] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/25/2010] [Accepted: 11/04/2010] [Indexed: 01/18/2023] Open
Abstract
Background: Recent studies have provided extensive evidence for multitudes of non-coding RNA (ncRNA) transcripts in a wide range of eukaryotic genomes. ncRNAs are emerging as key players in multiple layers of cellular regulation. With the availability of many whole genome sequences, comparative analysis has become a powerful tool to identify ncRNA molecules. In this study, we performed a systematic genome-wide in silico screen to search for novel small ncRNAs in the genome of Trypanosoma brucei using techniques of comparative genomics.
Results: In this study, we identified by comparative genomics, and validated by experimental analysis, several novel ncRNAs that are conserved across multiple trypanosomatid genomes. When tested on known ncRNAs, our procedure was capable of finding almost half of the known repertoire through homology over six genomes, and about two-thirds of the known sequences were found in at least four genomes. After filtering, 72 conserved unannotated sequences in at least four genomes were found, 29 of which, ranging in size from 30 to 392 nts, were conserved in all six genomes. Fifty of the 72 candidates in the final set were chosen for experimental validation. Eighteen of the 50 (36%) were shown to be expressed, and for 11 of them a distinct expression product was detected, suggesting that they are short ncRNAs. Using functional experimental assays, five of the candidates were shown to be novel H/ACA and C/D snoRNAs; these included three sequences that appear as singletons in the genome, unlike previously identified snoRNA molecules that are found in clusters. The other candidates appear to be novel ncRNA molecules, and their function is, as yet, unknown.
Conclusions: Using comparative genomic techniques, we predicted 72 sequences as ncRNA candidates in T. brucei. The expression of 50 candidates was tested in laboratory experiments. This resulted in the discovery of 11 novel short ncRNAs in procyclic stage T. brucei, which have homologues in the other trypanosomatids. A few of these molecules are snoRNAs, but most of them are novel ncRNA molecules. Based on this study, our analysis suggests that the total number of ncRNAs in trypanosomatids is in the range of several hundred.
Affiliation(s)
- Tirza Doniger
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan 52900, Israel

10. Arnaiz O, Sperling L. ParameciumDB in 2011: new tools and new data for functional and comparative genomics of the model ciliate Paramecium tetraurelia. Nucleic Acids Res 2010; 39:D632-6. [PMID: 20952411 PMCID: PMC3013783 DOI: 10.1093/nar/gkq918] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Indexed: 12/15/2022] Open
Abstract
ParameciumDB is a community model organism database built with the GMOD toolkit to integrate the genome and biology of the ciliate Paramecium tetraurelia. Over the last four years, post-genomic data from proteome and transcriptome studies has been incorporated along with predicted orthologs in 33 species, annotations from the community and publications from the scientific literature. Available tools include BioMart for complex queries, GBrowse2 for genome browsing, the Apollo genome editor for expert curation of gene models, a Blast server, a motif finder, and a wiki for protocols, nomenclature guidelines and other documentation. In-house tools have been developed for ontology browsing and evaluation of off-target RNAi matches. Now ready for next-generation deep sequencing data and the genomes of other Paramecium species, this open-access resource is available at http://paramecium.cgm.cnrs-gif.fr.
Affiliation(s)
- Olivier Arnaiz
  - Centre de Génétique Moléculaire, CNRS FRE3144, Avenue de la Terrasse, 91198 Gif-sur-Yvette, France