1
Hoffmann A, Fallmann J, Vilardo E, Mörl M, Stadler PF, Amman F. Accurate mapping of tRNA reads. Bioinformatics 2019; 34:1116-1124. [PMID: 29228294 DOI: 10.1093/bioinformatics/btx756] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 06/07/2017] [Accepted: 12/07/2017] [Indexed: 11/12/2022]
Abstract
Motivation: Many repetitive DNA elements are transcribed at appreciable expression levels. Mapping the corresponding RNA sequencing reads back to a reference genome is, however, a notoriously difficult and error-prone task. This is particularly true if chemical modifications introduce systematic mismatches while the genomic loci are only approximately identical, as in the case of tRNAs. Results: We therefore developed a dedicated mapping strategy to handle RNA-seq reads that map to tRNAs. It relies on a modified target genome in which known tRNA loci are masked and intronless tRNA precursor sequences are instead appended as artificial 'chromosomes'. In a first pass, reads that overlap the boundaries of mature tRNAs are extracted. In the second pass, the remaining reads are mapped to a tRNA-masked target that is augmented by representative mature tRNA sequences. Using both simulated and real-life data, we show that our best-practice workflow removes most of the mapping artefacts introduced by simpler mapping schemes and makes it possible to reliably identify many chemical tRNA modifications in generic small RNA-seq data. On simulated data, the FDR is only 2%. We find compelling evidence for tissue-specific differences in tRNA modification patterns. Availability and implementation: The workflow is available both as a bash script and as a Galaxy workflow from https://github.com/AnneHoffmann/tRNA-read-mapping. Contact: fabian@tbi.univie.ac.at. Supplementary information: Supplementary data are available at Bioinformatics online.
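The target-genome construction described in the abstract (mask known tRNA loci, then append precursor sequences as artificial 'chromosomes') can be illustrated with a minimal Python sketch. This is not the authors' bash or Galaxy workflow: the function name `mask_and_append`, the toy sequences, and the 0-based, end-exclusive coordinate convention are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical locus type: (chromosome, start, end), 0-based, end-exclusive.
Locus = Tuple[str, int, int]


def mask_and_append(genome: Dict[str, str],
                    trna_loci: List[Locus],
                    precursors: Dict[str, str]) -> Dict[str, str]:
    """Return a new target in which tRNA loci are replaced by 'N' and
    precursor sequences are added as extra entries."""
    masked = {name: list(seq) for name, seq in genome.items()}
    for chrom, start, end in trna_loci:
        if not 0 <= start <= end <= len(masked[chrom]):
            raise ValueError(f"locus out of range: {chrom}:{start}-{end}")
        # Overwrite the locus with N so reads can no longer align there.
        masked[chrom][start:end] = "N" * (end - start)
    target = {name: "".join(chars) for name, chars in masked.items()}
    for name, seq in precursors.items():
        if name in target:
            raise ValueError(f"sequence name already in use: {name}")
        # Each precursor becomes its own artificial 'chromosome'.
        target[name] = seq
    return target


# Toy data, invented for this example.
toy_genome = {"chr1": "ACGTACGTACGT"}
toy_loci = [("chr1", 4, 8)]
toy_precursors = {"tRNA_pre_1": "GGACGTCC"}

target = mask_and_append(toy_genome, toy_loci, toy_precursors)
print(target["chr1"])   # ACGTNNNNACGT
print(sorted(target))   # ['chr1', 'tRNA_pre_1']
```

Reads would then be aligned against `target` with whatever mapper the workflow uses; the two-pass read extraction itself is outside the scope of this sketch.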
Affiliation(s)
- Anne Hoffmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, D-04107 Leipzig, Germany
- Jörg Fallmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, D-04107 Leipzig, Germany
- Elisa Vilardo
- Center for Anatomy and Cell Biology, Medical University of Vienna, Austria
- Mario Mörl
- Institute for Biochemistry, Leipzig University, D-04103 Leipzig, Germany
- Peter F Stadler
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, D-04107 Leipzig, Germany; German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Competence Center for Scalable Data Services and Solutions, and Leipzig Research Center for Civilization Diseases, Leipzig University, D-04107 Leipzig, Germany; Max Planck Institute for Mathematics in the Sciences, D-04103 Leipzig, Germany; Fraunhofer Institute for Cell Therapy and Immunology, D-04103 Leipzig, Germany; Center for RNA in Technology and Health, University of Copenhagen, Frederiksberg C, Denmark; Santa Fe Institute, Santa Fe, NM 87501, USA; Department of Theoretical Chemistry of the University of Vienna, A-1090 Vienna, Austria
- Fabian Amman
- Department of Theoretical Chemistry of the University of Vienna, A-1090 Vienna, Austria; Department of Chromosome Biology of the University of Vienna, A-1030 Vienna, Austria
2
SMORE: Synteny Modulator of Repetitive Elements. Life (Basel) 2017; 7:life7040042. [PMID: 29088079 PMCID: PMC5745555 DOI: 10.3390/life7040042] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/28/2017] [Revised: 10/27/2017] [Accepted: 10/28/2017] [Indexed: 12/19/2022]
Abstract
Several families of multicopy genes, such as transfer ribonucleic acids (tRNAs) and ribosomal RNAs (rRNAs), are subject to concerted evolution, an effect that keeps sequences of paralogous genes effectively identical. Under these circumstances, it is impossible to distinguish orthologs from paralogs on the basis of sequence similarity alone. Synteny, the preservation of relative genomic locations, nevertheless remains informative for disambiguating evolutionary relationships in this situation. In this contribution, we describe an automatic pipeline for the evolutionary analysis of such cases that uses genome-wide alignments as a starting point to assign orthology relationships determined by synteny. The evolution of tRNAs in primates and the history of the Y RNA family in vertebrates and nematodes are used to showcase the method. The pipeline is freely available.
3
Ganie SA, Debnath AB, Gumi AM, Mondal TK. Comprehensive survey and evolutionary analysis of genome-wide miRNA genes from ten diploid Oryza species. BMC Genomics 2017; 18:711. [PMID: 28893199 PMCID: PMC5594537 DOI: 10.1186/s12864-017-4089-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/28/2016] [Accepted: 08/25/2017] [Indexed: 12/19/2022]
Abstract
BACKGROUND: MicroRNAs (miRNAs) are non-coding RNAs that play versatile roles in post-transcriptional gene regulation. Although much is known about their biogenesis and their role in gene regulation, very little is known about their evolutionary relationships among closely related species. RESULT: All the orthologous miRNA genes of Oryza sativa (japonica) from 10 different Oryza species were identified, and the evolutionary changes among these genes were analysed. Significant differences in the expansion of miRNA gene families were observed across the Oryza species. Analysis of the nucleotide substitution rates indicated that the mature sequences have the lowest substitution rates among the different regions of miRNA genes, and much lower substitution rates than all protein-coding genes across the Oryza species. Transposons were also found to contribute to the evolution of miRNA genes. Non-neutral selection was observed at 80 different miRNA loci across Oryza species, which were estimated to have lost ~87% of their sequence diversity during domestication. The phylogenetic analysis revealed that O. longistaminata diverged first among the AA-genomes, whereas O. brachyantha and O. punctata appeared as the eminent out-groups. The miR1861 family was organised into nine distinct compact clusters in the studied Oryza species except O. brachyantha. Further, the expression analysis showed that 11 salt-responsive miRNAs were differentially regulated between O. coarctata and O. glaberrima. CONCLUSION: Our study describes the evolutionary dynamics of the miRNA genes of 10 different Oryza species, which will support further investigation of the structural and functional organization of miRNA genes in Oryza species.
Affiliation(s)
- Showkat Ahmad Ganie
- Division of Genomic Resources, National Bureau of Plant Genetic Resources, Pusa, IARI Campus, New Delhi, 110012, India
- Ananda Bhusan Debnath
- Division of Genomic Resources, National Bureau of Plant Genetic Resources, Pusa, IARI Campus, New Delhi, 110012, India
- Abubakar Mohammad Gumi
- Division of Genomic Resources, National Bureau of Plant Genetic Resources, Pusa, IARI Campus, New Delhi, 110012, India
- Tapan Kumar Mondal
- Division of Genomic Resources, National Bureau of Plant Genetic Resources, Pusa, IARI Campus, New Delhi, 110012, India
4
Velandia-Huerto CA, Berkemer SJ, Hoffmann A, Retzlaff N, Romero Marroquín LC, Hernández-Rosales M, Stadler PF, Bermúdez-Santana CI. Orthologs, turn-over, and remolding of tRNAs in primates and fruit flies. BMC Genomics 2016; 17:617. [PMID: 27515907 PMCID: PMC4981973 DOI: 10.1186/s12864-016-2927-4] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 02/10/2016] [Accepted: 07/11/2016] [Indexed: 12/26/2022]
Abstract
Background: Transfer RNAs (tRNAs) are ubiquitous in all living organisms. They implement the genetic code, so that most genomes contain distinct tRNAs for almost all 61 codons. They behave similarly to mobile elements and proliferate in genomes, spawning both local and non-local copies. Most tRNA families are therefore typically present as multicopy genes. The members of the individual tRNA families evolve under concerted or rapid birth-death evolution, so that paralogous copies maintain almost identical sequences over long evolutionary time-scales. To a good approximation these are functionally equivalent. Individual tRNA copies are thus evolutionarily unstable and easily turn into pseudogenes and disappear. This leads to a rapid turnover of tRNAs and often large differences in the tRNA complements of closely related species. Since tRNA paralogs are not distinguished by sequence, common methods cannot be used to establish orthology between tRNA genes. Results: In this contribution we introduce a general framework to distinguish orthologs and paralogs in gene families that are subject to concerted evolution. It is based on the use of uniquely aligned adjacent sequence elements as anchors to establish syntenic conservation of sequence intervals. In practice, anchors and intervals can be extracted from genome-wide multiple sequence alignments. Syntenic clusters of concertedly evolving genes of different families can then be subdivided by list alignments, usually leading to small clusters of candidate co-orthologs. On the basis of recent advances in phylogenetic combinatorics, these candidate clusters can be further processed by cograph editing to recover their duplication histories. We developed a workflow that can be conceptualized as stepwise refinement of a graph of homologous genes. We apply this analysis strategy with different types of synteny anchors to investigate the evolution of tRNAs in primates and fruit flies. We identified a large number of tRNA remolding events concentrated at the tips of the phylogeny. With one notable exception, none of the phylogenetically old tRNA remoldings changes the isoacceptor class. Conclusions: Gene families evolving under concerted evolution are not amenable to classical phylogenetic analyses, since paralogs maintain identical, species-specific sequences, precluding the estimation of correct gene trees from sequence differences. This leaves conservation of syntenic arrangements with respect to "anchor elements" that are not subject to concerted evolution as the only viable source of phylogenetic information. We have demonstrated here that a purely synteny-based analysis of tRNA gene histories is indeed feasible. Although the choice of synteny anchors influences the resolution, in particular when tight gene clusters are present, and the quality of sequence alignments, genome assemblies, and genome rearrangements limits the scope of the analysis, largely coherent results can be obtained for tRNAs. In particular, we conclude that a large fraction of the tRNAs are recent copies. This proliferation is compensated by rapid pseudogenization, as exemplified by many very recent alloacceptor remoldings. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-2927-4) contains supplementary material, which is available to authorized users.
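The anchor idea in the Results can be sketched in a few lines of Python: genes that fall between the same pair of uniquely aligned anchors in two species are grouped as candidate co-orthologs. This is only a toy illustration, not the published workflow. It assumes a single chromosome, anchors that are collinear (the i-th anchor in species A is aligned to the i-th anchor in species B), and invented names and coordinates; list alignment and cograph editing are not covered.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def interval_index(anchors: List[int], position: int) -> int:
    """Index of the anchor interval containing `position`.

    `anchors` must be sorted; index k means 'between anchor k-1 and anchor k'.
    """
    return bisect_right(anchors, position)


def candidate_clusters(anchors_a: List[int], genes_a: Dict[str, int],
                       anchors_b: List[int], genes_b: Dict[str, int]
                       ) -> Dict[int, Tuple[List[str], List[str]]]:
    """Group genes of two species by shared anchor interval."""
    if len(anchors_a) != len(anchors_b):
        raise ValueError("anchors must be paired one-to-one")
    clusters = defaultdict(lambda: ([], []))
    for name, pos in genes_a.items():
        clusters[interval_index(anchors_a, pos)][0].append(name)
    for name, pos in genes_b.items():
        clusters[interval_index(anchors_b, pos)][1].append(name)
    # Keep only intervals that contain genes from both species.
    return {idx: pair for idx, pair in sorted(clusters.items())
            if pair[0] and pair[1]}


# Toy data, invented for this example.
anchors_a = [100, 500, 900]
anchors_b = [120, 450, 1000]
genes_a = {"a_tRNA1": 200, "a_tRNA2": 300, "a_tRNA3": 950, "a_tRNA4": 600}
genes_b = {"b_tRNA1": 250, "b_tRNA2": 1100}

clusters = candidate_clusters(anchors_a, genes_a, anchors_b, genes_b)
print(clusters)
# {1: (['a_tRNA1', 'a_tRNA2'], ['b_tRNA1']), 3: (['a_tRNA3'], ['b_tRNA2'])}
```

In this toy case `a_tRNA4` has no partner in its interval and is dropped, which is the kind of species-specific copy that a rapid-turnover scenario would produce.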
Affiliation(s)
- Cristian A Velandia-Huerto
- Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
- Sarah J Berkemer
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Anne Hoffmann
- Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Nancy Retzlaff
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany
- Liliana C Romero Marroquín
- Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
- Maribel Hernández-Rosales
- CONACYT - Instituto de Matemáticas, UNAM Juriquilla, Av. Juriquilla #3001, Santiago de Querétaro, MX-76230, QRO, México
- Peter F Stadler
- Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103, Germany; Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16-18, D-04107 Leipzig, Germany; Fraunhofer Institut for Cell Therapy and Immunology, Perlickstraße 1, Leipzig, D-04103, Germany; Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090, Austria; Center for non-coding RNA in Technology and Health, Grønegårdsvej 3, Frederiksberg C, DK-1870, Denmark; Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM 87501, USA
- Clara I Bermúdez-Santana
- Biology Department, Universidad Nacional de Colombia, Carrera 45 # 26-85, Edif. Uriel Gutiérrez, Bogotá, D.C, Colombia
5
Jovelin R, Cutter AD. Hitting two birds with one stone: The unforeseen consequences of nested gene knockouts in Caenorhabditis elegans. WORM 2016; 5:e1156835. [PMID: 27386165 DOI: 10.1080/21624054.2016.1156835] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/26/2016] [Accepted: 02/16/2016] [Indexed: 12/19/2022]
Abstract
Nested genes represent an intriguing form of non-random genomic organization in which the boundaries of one gene are fully contained within another, longer host gene. The C. elegans genome contains over 10,000 nested genes, 92% of which are ncRNAs, which occur inside 16% of the protein-coding gene complement. Host genes are longer than non-host coding genes, owing to their longer and more numerous introns. For nearly all of these host genes, indel alleles are available that simultaneously alter the nested gene, raising the possibility that nested gene disruption contributes to phenotypes that might be attributed to the host gene. Such dual-knockouts could represent a source of misinterpretation about host gene function. Dual-knockouts might also provide a novel source of synthetic phenotypes that reveal the functional effects of ncRNA genes, whereby the host gene disruption acts as a perturbed genetic background to help unmask ncRNA phenotypes.
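The definition of a nested gene given above (boundaries fully contained within another, longer host gene) amounts to an interval-containment test. The following Python sketch shows that test on invented coordinates. It assumes all genes lie on one chromosome, ignores strand and intron structure, and uses a simple quadratic scan; it is not the procedure used in the paper.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def nested_pairs(genes: List[Gene]) -> List[Tuple[str, str]]:
    """Return (nested, host) name pairs where the nested gene lies fully
    inside a strictly longer host gene."""
    pairs = []
    for inner in genes:
        for host in genes:
            if inner is host:
                continue
            contained = host.start <= inner.start and inner.end <= host.end
            shorter = (inner.end - inner.start) < (host.end - host.start)
            if contained and shorter:
                pairs.append((inner.name, host.name))
    return pairs


# Toy annotation, invented for this example.
toy_genes = [
    Gene("host1", 100, 1000),
    Gene("nc1", 200, 300),
    Gene("g2", 900, 1200),   # overlaps host1 but is not contained in it
    Gene("nc2", 950, 1000),  # contained in both host1 and g2
]

pairs = nested_pairs(toy_genes)
print(pairs)
# [('nc1', 'host1'), ('nc2', 'host1'), ('nc2', 'g2')]
```

A deletion allele spanning positions 950 to 1000 in this toy annotation would affect `nc2` together with both of its hosts, which is the dual-knockout situation the abstract warns about.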
Affiliation(s)
- Richard Jovelin
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada; Informatics and Bio-Computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
- Asher D Cutter
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada