Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Engelhardt J, Stadler PF. Evolution of the unspliced transcriptome. BMC Evol Biol 2015;15:166. [PMID: 26289325 PMCID: PMC4546029 DOI: 10.1186/s12862-015-0437-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2015] [Accepted: 07/29/2015] [Indexed: 12/17/2022] Open

For:	Engelhardt J, Stadler PF. Evolution of the unspliced transcriptome. BMC Evol Biol 2015;15:166. [PMID: 26289325 PMCID: PMC4546029 DOI: 10.1186/s12862-015-0437-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2015] [Accepted: 07/29/2015] [Indexed: 12/17/2022] Open

Number

Cited by Other Article(s)

Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2024;2802:347-393. [PMID: 38819565 DOI: 10.1007/978-1-0716-3838-5_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2024]

Kirsch R, Seemann SE, Ruzzo WL, Cohen SM, Stadler PF, Gorodkin J. Identification and characterization of novel conserved RNA structures in Drosophila. BMC Genomics 2018;19:899. [PMID: 30537930 PMCID: PMC6288889 DOI: 10.1186/s12864-018-5234-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/22/2018] [Accepted: 11/08/2018] [Indexed: 12/20/2022] Open

Abstract

BACKGROUND

Comparative genomics approaches have facilitated the discovery of many novel non-coding and structured RNAs (ncRNAs). The increasing availability of related genomes now makes it possible to systematically search for compensatory base changes - and thus for conserved secondary structures - even in genomic regions that are poorly alignable in the primary sequence. The wealth of available transcriptome data can add valuable insight into expression and possible function for new ncRNA candidates. Earlier work identifying ncRNAs in Drosophila melanogaster made use of sequence-based alignments and employed a sliding window approach, inevitably biasing identification toward RNAs encoded in the more conserved parts of the genome.

RESULTS

To search for conserved RNA structures (CRSs) that may not be highly conserved in sequence and to assess the expression of CRSs, we conducted a genome-wide structural alignment screen of 27 insect genomes including D. melanogaster and integrated this with an extensive set of tiling array data. The structural alignment screen revealed ∼30,000 novel candidate CRSs at an estimated false discovery rate of less than 10%. With more than one quarter of all individual CRS motifs showing sequence identities below 60%, the predicted CRSs largely complement the findings of sliding window approaches applied previously. While a sixth of the CRSs were ubiquitously expressed, we found that most were expressed in specific developmental stages or cell lines. Notably, most statistically significant enrichment of CRSs were observed in pupae, mainly in exons of untranslated regions, promotors, enhancers, and long ncRNAs. Interestingly, cell lines were found to express a different set of CRSs than were found in vivo. Only a small fraction of intergenic CRSs were co-expressed with the adjacent protein coding genes, which suggests that most intergenic CRSs are independent genetic units.

CONCLUSIONS

This study provides a more comprehensive view of the ncRNA transcriptome in fly as well as evidence for differential expression of CRSs during development and in cell lines.

Collapse

Affiliation(s)

Rebecca Kirsch Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, D-04107 Germany
Stefan E. Seemann Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
Walter L. Ruzzo Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark School of Computer Science and Engineering, University of Washington, Box 352350, Seattle, 98195-2350 WA USA Department of Genome Sciences, University of Washington, Box 355065, Seattle, 98195-5065 WA USA Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, 98109-1024 WA USA
Stephen M. Cohen Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, Copenhagen N, DK-2200 Denmark
Peter F. Stadler Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, D-04107 Germany Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103 Germany Faculdad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, COL-111321 D.C. Colombia Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090 Austria Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501 USA
Jan Gorodkin Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark

Collapse

Backofen R, Gorodkin J, Hofacker IL, Stadler PF. Comparative RNA Genomics. Methods Mol Biol 2018;1704:363-400. [PMID: 29277874 DOI: 10.1007/978-1-4939-7463-4_14] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022]

Schneider HW, Raiol T, Brigido MM, Walter MEMT, Stadler PF. A Support Vector Machine based method to distinguish long non-coding RNAs from protein coding transcripts. BMC Genomics 2017;18:804. [PMID: 29047334 PMCID: PMC5648457 DOI: 10.1186/s12864-017-4178-4] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2017] [Accepted: 10/05/2017] [Indexed: 12/31/2022] Open

Abstract

Background

In recent years, a rapidly increasing number of RNA transcripts has been generated by thousands of sequencing projects around the world, creating enormous volumes of transcript data to be analyzed. An important problem to be addressed when analyzing this data is distinguishing between long non-coding RNAs (lncRNAs) and protein coding transcripts (PCTs). Thus, we present a Support Vector Machine (SVM) based method to distinguish lncRNAs from PCTs, using features based on frequencies of nucleotide patterns and ORF lengths, in transcripts.

Methods

The proposed method is based on SVM and uses the first ORF relative length and frequencies of nucleotide patterns selected by PCA as features. FASTA files were used as input to calculate all possible features. These features were divided in two sets: (i) 336 frequencies of nucleotide patterns; and (ii) 4 features derived from ORFs. PCA were applied to the first set to identify 6 groups of frequencies that could most contribute to the distinction. Twenty-four experiments using the 6 groups from the first set and the features from the second set where built to create the best model to distinguish lncRNAs from PCTs.

Results

This method was trained and tested with human (Homo sapiens), mouse (Mus musculus) and zebrafish (Danio rerio) data, achieving 98.21%, 98.03% and 96.09%, accuracy, respectively. Our method was compared to other tools available in the literature (CPAT, CPC, iSeeRNA, lncRNApred, lncRScan-SVM and FEELnc), and showed an improvement in accuracy by ≈3.00%. In addition, to validate our model, the mouse data was classified with the human model, and vice-versa, achieving ≈97.80% accuracy in both cases, showing that the model is not overfit. The SVM models were validated with data from rat (Rattus norvegicus), pig (Sus scrofa) and fruit fly (Drosophila melanogaster), and obtained more than 84.00% accuracy in all these organisms. Our results also showed that 81.2% of human pseudogenes and 91.7% of mouse pseudogenes were classified as non-coding. Moreover, our method was capable of re-annotating two uncharacterized sequences of Swiss-Prot database with high probability of being lncRNAs. Finally, in order to use the method to annotate transcripts derived from RNA-seq, previously identified lncRNAs of human, gorilla (Gorilla gorilla) and rhesus macaque (Macaca mulatta) were analyzed, having successfully classified 98.62%, 80.8% and 91.9%, respectively.

Conclusions

The SVM method proposed in this work presents high performance to distinguish lncRNAs from PCTs, as shown in the results. To build the model, besides using features known in the literature regarding ORFs, we used PCA to identify features among nucleotide pattern frequencies that contribute the most in distinguishing lncRNAs from PCTs, in reference data sets. Interestingly, models created with two evolutionary distant species could distinguish lncRNAs of even more distant species.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-4178-4) contains supplementary material, which is available to authorized users.

Collapse

Rare Splice Variants in Long Non-Coding RNAs. Noncoding RNA 2017;3:ncrna3030023. [PMID: 29657294 PMCID: PMC5831916 DOI: 10.3390/ncrna3030023] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2017] [Revised: 06/24/2017] [Accepted: 06/29/2017] [Indexed: 12/12/2022] Open

HIPSTR and thousands of lncRNAs are heterogeneously expressed in human embryos, primordial germ cells and stable cell lines. Sci Rep 2016;6:32753. [PMID: 27605307 PMCID: PMC5015059 DOI: 10.1038/srep32753] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2016] [Accepted: 08/11/2016] [Indexed: 01/02/2023] Open

Valadkhan S, Valencia-Hipólito A. lncRNAs in Stress Response. Curr Top Microbiol Immunol 2016;394:203-36. [PMID: 26658944 PMCID: PMC11293874 DOI: 10.1007/82_2015_489] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]