1. Gadekar V, Munk AW, Miladi M, Junge A, Backofen R, Seemann S, Gorodkin J. Clusters of mammalian conserved RNA structures in UTRs associate with RBP binding sites. NAR Genom Bioinform 2024; 6:lqae089. [PMID: 39131818 PMCID: PMC11310781 DOI: 10.1093/nargab/lqae089] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/19/2023] [Revised: 06/26/2024] [Accepted: 07/16/2024] [Indexed: 08/13/2024]
Abstract
RNA secondary structures play essential roles in the formation of the tertiary structure and function of a transcript. Recent genome-wide studies highlight significant potential for RNA structures in the mammalian genome. However, a major challenge is assigning functional roles to these structured RNAs. In this study, we conduct a guilt-by-association analysis of clusters of computationally predicted conserved RNA structures (CRSs) in human untranslated regions (UTRs) to associate them with gene functions. We filtered a broad pool of ∼500 000 human CRSs for UTR overlap, resulting in 4734 and 24 754 CRSs from the 5' and 3' UTRs of protein-coding genes, respectively. We separately clustered these CRSs for both sets using RNAscClust, obtaining 793 and 2403 clusters, each containing an average of five CRSs per cluster. We identified overrepresented binding sites for 60 and 43 RNA-binding proteins co-localizing with the clustered CRSs. Furthermore, 104 and 441 clusters from the 5' and 3' UTRs, respectively, showed enrichment for various Gene Ontology terms, including biological processes such as 'signal transduction' and 'nervous system development', molecular functions such as 'transferase activity' and cellular components such as 'synapse'. Our study shows that significant functional insights can be gained by clustering RNA structures based on their structural characteristics.
Affiliation(s)
- Veerendra P Gadekar
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
  - Centre for Integrative Biology and Systems Medicine (IBSE), IIT Madras, Chennai, India
  - Robert Bosch Centre for Data Science and Artificial Intelligence (RBCDSAI), IIT Madras, Chennai, India
- Alexander Welford Munk
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Milad Miladi
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Alexander Junge
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Rolf Backofen
  - Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Stefan E Seemann
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Ridebanevej 9, 1870 Frederiksberg, Denmark
  - Department of Veterinary and Animal Sciences, Faculty of Health and Medical Sciences, University of Copenhagen, Frederiksberg, 1870 Frederiksberg, Denmark

2. Grigorashvili EI, Chervontseva ZS, Gelfand MS. Predicting RNA secondary structure by a neural network: what features may be learned? PeerJ 2022; 10:e14335. [PMID: 36530406 PMCID: PMC9756865 DOI: 10.7717/peerj.14335] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2022] [Accepted: 10/12/2022] [Indexed: 12/14/2022]
Abstract
Deep learning is a class of machine learning techniques capable of creating internal representations of data without explicit preprogramming. Hence, in addition to practical applications, it is of interest to analyze what features of biological data may be learned by such models. Here, we describe PredPair, a deep learning neural network trained to predict base pairs in RNA structure from sequence alone, without any incorporated prior knowledge, such as the stacking energies or possible spatial structures. PredPair learned the Watson-Crick and wobble base-pairing rules and created an internal representation of the stacking energies and helices. Application to independent experimental (DMS-Seq) data on nucleotide accessibility in mRNA showed that the nucleotides predicted as paired indeed tend to be involved in the RNA structure. The performance of the constructed model was comparable with the state-of-the-art method based on the thermodynamic approach, but with a higher false-positive rate. On the other hand, it successfully predicted pseudoknots. t-SNE clusters of embeddings of RNA sequences created by PredPair tend to contain embeddings from particular Rfam families, supporting the predictions of PredPair being in line with biological classification.
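The Watson-Crick and wobble pairing rules named in this abstract can be illustrated with a short sketch. This is not PredPair's code and says nothing about how the network represents the rules internally; the function names are hypothetical, and the structure is assumed to be given in pseudoknot-free dot-bracket notation.

```python
# Canonical RNA base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(a, b):
    """Return True if bases a and b form a Watson-Crick or wobble pair."""
    return (a.upper(), b.upper()) in CANONICAL_PAIRS


def paired_positions(structure):
    """Return sorted (i, j) index pairs from a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


def noncanonical_pairs(sequence, structure):
    """List the paired positions whose bases violate the canonical rules."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return [
        (i, j)
        for i, j in paired_positions(structure)
        if not can_pair(sequence[i], sequence[j])
    ]
```

For example, `noncanonical_pairs("GGAAUC", "((..))")` returns an empty list, because the hairpin closes with one G-C pair and one G-U wobble pair.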
Affiliation(s)
- Mikhail S. Gelfand
  - Center of Molecular and Cellular Biology, Skolkovo Institute of Science and Technology, Moscow, Russia
  - Institute of Information Transmission Problems, Moscow, Russia

3. Seemann SE, Mirza AH, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Workman CT, Pociot F, Tommerup N, Gorodkin J, Ruzzo WL. OUP accepted manuscript. Nucleic Acids Res 2022; 50:2452-2463. [PMID: 35188540 PMCID: PMC8934657 DOI: 10.1093/nar/gkac067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2021] [Revised: 01/07/2022] [Accepted: 01/25/2022] [Indexed: 12/01/2022]
Abstract
Accelerated evolution of any portion of the genome is of significant interest, potentially signaling positive selection of phenotypic traits and adaptation. Accelerated evolution remains understudied for structured RNAs, despite the fact that an RNA’s structure is often key to its function. RNA structures are typically characterized by compensatory (structure-preserving) basepair changes that are unexpected given the underlying sequence variation, i.e., they have evolved through negative selection on structure. We address the question of how fast the primary sequence of an RNA can change through evolution while conserving its structure. Specifically, we consider predicted and known structures in vertebrate genomes. After careful control of false discovery rates, we obtain 13 de novo structures (and three known Rfam structures) that we predict to have rapidly evolving sequences—defined as structures where the primary sequences of human and mouse have diverged at least twice as fast (1.5 times for Rfam) as nearby neutrally evolving sequences. Two of the three known structures function in translation inhibition related to infection and immune response. We conclude that rapid sequence divergence does not preclude RNA structure conservation in vertebrates, although these events are relatively rare.
Affiliation(s)
- Aashiq H Mirza
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Claus H Bang-Berthelsen
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - National Food Institute, Technical University of Denmark, Kgs. Lyngby, Denmark
- Christian Garde
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
- Christopher T Workman
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Center for Biological Sequence Analysis, Technical University of Denmark, Denmark
- Flemming Pociot
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Steno Diabetes Center Copenhagen, Gentofte, Denmark
- Niels Tommerup
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Department of Cellular and Molecular Medicine (ICMM), University of Copenhagen, Denmark
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Department of Veterinary and Animal Sciences, University of Copenhagen, Denmark
- Walter L Ruzzo
  - Center for non-coding RNA in Technology and Health (RTH), University of Copenhagen, Denmark
  - Computer Science and Engineering and Genome Sciences, University of Washington, USA
  - Fred Hutchinson Cancer Research Center, Seattle, USA

4. 3D Modeling of Non-coding RNA Interactions. Advances in Experimental Medicine and Biology 2022; 1385:281-317. [DOI: 10.1007/978-3-031-08356-3_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]

5. Vahed M, Vahed M, Garmire LX. BML: a versatile web server for bipartite motif discovery. Brief Bioinform 2021; 23:6490318. [PMID: 34974623 PMCID: PMC8769915 DOI: 10.1093/bib/bbab536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/18/2021] [Revised: 11/18/2021] [Accepted: 11/19/2021] [Indexed: 11/28/2022]
Abstract
Motif discovery and characterization are important for gene regulation analysis. The lack of intuitive and integrative web servers impedes the effective use of motifs. Most motif discovery web tools are either not designed for non-expert users or lack optimization steps when used with default settings. Here we describe bipartite motifs learning (BML), a parameter-free web server that provides a user-friendly portal for online discovery and analysis of sequence motifs, using high-throughput sequencing data as the input. BML utilizes both a position weight matrix and a dinucleotide weight matrix, the latter of which enables the expression of the interdependencies of neighboring bases. When input parameters concerning the motifs are given, BML achieves significantly higher accuracy than other available tools for motif finding. When no parameters are given by non-expert users, BML, unlike other tools, employs a learning method to identify motifs automatically and achieves accuracy comparable to the scenario where the parameters are set. The BML web server is freely available at http://motif.t-ridership.com/ (https://github.com/Mohammad-Vahed/BML).
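The difference between a position weight matrix and a dinucleotide weight matrix can be made concrete with a toy sketch. It is not BML's implementation: the function names and the probabilities below are invented for illustration, and the example assumes a two-base motif whose observed sites are mostly "AC" or "CA".

```python
import math

BASES = "ACGT"


def pwm_log_score(seq, pwm):
    """Log-likelihood under a position weight matrix (positions independent)."""
    return sum(math.log(pwm[i][base]) for i, base in enumerate(seq))


def dwm_log_score(seq, dwm):
    """Log-likelihood under a dinucleotide weight matrix (adjacent bases scored jointly)."""
    return sum(math.log(dwm[i][seq[i:i + 2]]) for i in range(len(seq) - 1))


# Toy, made-up probabilities: each position is mostly A or C ...
toy_pwm = [{"A": 0.49, "C": 0.49, "G": 0.01, "T": 0.01} for _ in range(2)]

# ... but as a pair, only "AC" and "CA" are common (the 16 entries sum to 1).
toy_dwm = [{a + b: 0.01 for a in BASES for b in BASES}]
toy_dwm[0].update({"AC": 0.43, "CA": 0.43})

for site in ("AC", "AA"):
    print(site, pwm_log_score(site, toy_pwm), dwm_log_score(site, toy_dwm))
```

Under the toy position weight matrix, "AC" and "AA" receive the same score because each position is scored on its own; the toy dinucleotide matrix scores "AA" lower because it captures the dependency between the neighboring bases.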
Affiliation(s)
- Mohammad Vahed
  - Department of Pathology & Laboratory Medicine, David Geffen School of Medicine, University of California Los Angeles (UCLA), California, USA
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48105, USA
- Majid Vahed
  - Pharmaceutical Sciences Research Center, Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Lana X Garmire
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, 48105, USA

6. Zhang Y, Zhang L, Wang Y, Ding H, Xue S, Qi H, Li P. MicroRNAs or Long Noncoding RNAs in Diagnosis and Prognosis of Coronary Artery Disease. Aging Dis 2019; 10:353-366. [PMID: 31011482 PMCID: PMC6457061 DOI: 10.14336/ad.2018.0617] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 02/12/2018] [Accepted: 06/17/2018] [Indexed: 12/14/2022]
Abstract
Coronary artery disease (CAD) is the result of atherosclerotic plaque development in the wall of the coronary arteries. The underlying mechanism involves atherosclerosis of the arteries of the heart which is a relatively complex process comprising several steps. In CAD, atherosclerosis induces functional and structural changes. The pathogenesis of CAD results from various changes in and interactions between multiple cell types in the artery walls; these changes mainly include endothelial cell (EC) dysfunction, vascular smooth muscle cell (SMC) alteration, lipid deposition and macrophage activation. Various blood markers associated with an increased risk for cardiovascular endpoints have been identified; however, few have yet been shown to have a diagnostic impact or important clinical implications that would affect patient management. Noncoding RNAs, especially microRNAs (miRNAs) and long noncoding RNAs (lncRNAs), can be stable in plasma and other body fluids and could therefore serve as biomarkers for some diseases. Many studies have shown that some miRNAs and lncRNAs play key roles in heart and vascular development and in cardiac pathophysiology. Thus, we summarize here the latest research progress, focusing on the molecular mechanism of miRNAs and lncRNAs in CAD, with the intent of seeking new targets for the treatment of heart disease.
Affiliation(s)
- Yuan Zhang
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China
- Lei Zhang
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China
- Yu Wang
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China
- Han Ding
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China
- Sheng Xue
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China
- Hongzhao Qi
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China
- Peifeng Li
  - Institute for Translational Medicine, Qingdao University, Deng Zhou Road 38, Qingdao 266021, China

7. Kirsch R, Seemann SE, Ruzzo WL, Cohen SM, Stadler PF, Gorodkin J. Identification and characterization of novel conserved RNA structures in Drosophila. BMC Genomics 2018; 19:899. [PMID: 30537930 PMCID: PMC6288889 DOI: 10.1186/s12864-018-5234-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/22/2018] [Accepted: 11/08/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND: Comparative genomics approaches have facilitated the discovery of many novel non-coding and structured RNAs (ncRNAs). The increasing availability of related genomes now makes it possible to systematically search for compensatory base changes, and thus for conserved secondary structures, even in genomic regions that are poorly alignable in the primary sequence. The wealth of available transcriptome data can add valuable insight into expression and possible function for new ncRNA candidates. Earlier work identifying ncRNAs in Drosophila melanogaster made use of sequence-based alignments and employed a sliding window approach, inevitably biasing identification toward RNAs encoded in the more conserved parts of the genome.
RESULTS: To search for conserved RNA structures (CRSs) that may not be highly conserved in sequence and to assess the expression of CRSs, we conducted a genome-wide structural alignment screen of 27 insect genomes including D. melanogaster and integrated this with an extensive set of tiling array data. The structural alignment screen revealed ∼30,000 novel candidate CRSs at an estimated false discovery rate of less than 10%. With more than one quarter of all individual CRS motifs showing sequence identities below 60%, the predicted CRSs largely complement the findings of sliding window approaches applied previously. While a sixth of the CRSs were ubiquitously expressed, we found that most were expressed in specific developmental stages or cell lines. Notably, the most statistically significant enrichment of CRSs was observed in pupae, mainly in exons of untranslated regions, promoters, enhancers, and long ncRNAs. Interestingly, cell lines were found to express a different set of CRSs than were found in vivo. Only a small fraction of intergenic CRSs were co-expressed with the adjacent protein coding genes, which suggests that most intergenic CRSs are independent genetic units.
CONCLUSIONS: This study provides a more comprehensive view of the ncRNA transcriptome in fly as well as evidence for differential expression of CRSs during development and in cell lines.
Affiliation(s)
- Rebecca Kirsch
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
  - Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, D-04107 Germany
- Stefan E. Seemann
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
  - Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
- Walter L. Ruzzo
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
  - School of Computer Science and Engineering, University of Washington, Box 352350, Seattle, 98195-2350 WA USA
  - Department of Genome Sciences, University of Washington, Box 355065, Seattle, 98195-5065 WA USA
  - Fred Hutchinson Cancer Research Center, 1100 Fairview Ave. N., Seattle, 98109-1024 WA USA
- Stephen M. Cohen
  - Department of Cellular and Molecular Medicine, University of Copenhagen, Blegdamsvej 3, Copenhagen N, DK-2200 Denmark
- Peter F. Stadler
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
  - Bioinformatics Group, Department of Computer Science, and Interdisciplinary Center for Bioinformatics, Universität Leipzig, Härtelstraße 16–18, Leipzig, D-04107 Germany
  - Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, Leipzig, D-04103 Germany
  - Faculdad de Ciencias, Universidad Nacional de Colombia, Sede Bogotá, Ciudad Universitaria, Bogotá, COL-111321 D.C. Colombia
  - Department of Theoretical Chemistry, University of Vienna, Währinger Straße 17, Vienna, A-1090 Austria
  - Santa Fe Institute, 1399 Hyde Park Rd., Santa Fe, NM87501 USA
- Jan Gorodkin
  - Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark
  - Department of Veterinary and Animal Science, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg C, DK-1870 Denmark

8. Liu Z, Li C, Li X, Yao Y, Ni W, Zhang X, Cao Y, Hazi W, Wang D, Quan R, Yu S, Wu Y, Niu S, Cui Y, Khan Y, Hu S. Expression profiles of microRNAs in skeletal muscle of sheep by deep sequencing. Asian-Australasian Journal of Animal Sciences 2018; 32:757-766. [PMID: 30477295 PMCID: PMC6498074 DOI: 10.5713/ajas.18.0473] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 06/22/2018] [Accepted: 11/05/2018] [Indexed: 11/27/2022]
Abstract
Objective: MicroRNAs are a class of endogenous small regulatory RNAs that regulate cell proliferation, differentiation and apoptosis. Recent miRNA studies have mainly focused on mice, humans and pigs. However, studies on miRNAs in the skeletal muscle of sheep are not comprehensive.
Methods: RNA-seq technology was used to perform genomic analysis of miRNAs in prenatal and postnatal skeletal muscle of sheep. Target genes were predicted using miRanda software and miRNA-mRNA interactions were verified by quantitative real-time polymerase chain reaction. To further investigate the function of miRNAs, candidate target genes were subjected to gene ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis.
Results: A total of 1,086 known miRNAs and 40 new candidate miRNAs were detected in prenatal and postnatal skeletal muscle of sheep. In addition, 345 miRNAs (151 up-regulated, 94 down-regulated) were differentially expressed. Moreover, miRanda software was used to predict target genes of the miRNAs, resulting in a total of 2,833 predicted targets; notably, miR-381 targeted multiple muscle-related mRNAs. Furthermore, GO and KEGG pathway analysis confirmed that the target genes of the miRNAs were involved in the development of skeletal muscles.
Conclusion: This study supplements the miRNA database of sheep, which provides valuable information for further study of the biological function of miRNAs in sheep skeletal muscle.
Affiliation(s)
- Zhijin Liu
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Cunyuan Li
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Xiaoyue Li
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Yang Yao
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Wei Ni
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Xiangyu Zhang
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Yang Cao
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Wureli Hazi
  - College of Animal Science and Technology, Shihezi University, Shihezi, Xinjiang, 832003, China
- Dawei Wang
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Renzhe Quan
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Shuting Yu
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Yuyu Wu
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Songmin Niu
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Yulong Cui
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Yaseen Khan
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China
- Shengwei Hu
  - College of Life Sciences, Shihezi University, Shihezi, Xinjiang 832003, China

9. Fallmann J, Will S, Engelhardt J, Grüning B, Backofen R, Stadler PF. Recent advances in RNA folding. J Biotechnol 2017; 261:97-104. [DOI: 10.1016/j.jbiotec.2017.07.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Received: 02/16/2017] [Revised: 07/02/2017] [Accepted: 07/04/2017] [Indexed: 12/23/2022]

10. van Son M, Kent MP, Grove H, Agarwal R, Hamland H, Lien S, Grindflek E. Fine mapping of a QTL affecting levels of skatole on pig chromosome 7. BMC Genet 2017; 18:85. [PMID: 29020941 PMCID: PMC5637327 DOI: 10.1186/s12863-017-0549-8] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 02/08/2017] [Accepted: 09/11/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Previous studies in the Norwegian pig breeds Landrace and Duroc have revealed a QTL for levels of skatole located in the region 74.7-80.5 Mb on SSC7. Skatole is one of the main components causing boar taint, which gives an undesirable smell and taste to the pig meat when heated. Surgical castration of boars is a common practice to reduce the risk of boar taint; however, selection for boars genetically predisposed to low levels of taint would help eliminate the need for castration and be advantageous for both economic and welfare reasons. In order to identify the causal mutation(s) for the QTL and/or identify genetic markers for selection purposes, we performed a fine mapping of the SSC7 skatole QTL region.
RESULTS: A dense set of markers on SSC7 was obtained by whole genome re-sequencing of 24 Norwegian Landrace and 23 Duroc boars. Subsets of 126 and 157 SNPs were used for association analyses in Landrace and Duroc, respectively. Significant single markers associated with skatole spanned a large 4.4 Mb region from 75.9-80.3 Mb in Landrace, with the highest test scores found in a region between the genes NOVA1 and TGM1 (p < 0.001). The same QTL was detected in Duroc, although less significant, with associated SNPs spanning a 1.2 Mb region from 78.9-80.1 Mb (p < 0.01). The highest test scores in Duroc were found in genes of the granzyme family (GZMB and GZMH-like) and STXBP6. Haplotypes associated with levels of skatole were identified in Landrace but not in Duroc, and a haplotype block was found to explain 2.3% of the phenotypic variation for skatole. The SNPs in this region were not associated with levels of sex steroids.
CONCLUSIONS: Fine mapping of a QTL for skatole on SSC7 confirmed associations of this region with skatole levels in pigs. The QTL region was narrowed down to 4.4 Mb in Landrace and haplotypes explaining 2.3% of the phenotypic variance for skatole levels were identified. Results confirmed that sex steroids are not affected by this QTL region, making these markers attractive for selection against boar taint.
Affiliation(s)
- Maren van Son
  - Topigs Norsvin, Storhamargata 44, 2317, Hamar, Norway
- Matthew P Kent
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, P. O. Box 5003, 1432, Ås, Norway
- Harald Grove
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, P. O. Box 5003, 1432, Ås, Norway
- Rahul Agarwal
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, P. O. Box 5003, 1432, Ås, Norway
- Hanne Hamland
  - Topigs Norsvin, Storhamargata 44, 2317, Hamar, Norway
- Sigbjørn Lien
  - Centre for Integrative Genetics (CIGENE), Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, P. O. Box 5003, 1432, Ås, Norway
- Eli Grindflek
  - Topigs Norsvin, Storhamargata 44, 2317, Hamar, Norway

11.
Abstract
Protein-coding RNAs represent only a small fraction of the transcriptional output in higher eukaryotes. The remaining RNA species encompass a broad range of molecular functions and regulatory roles, a consequence of the structural polyvalence of RNA polymers. Although several classes of small noncoding RNAs are relatively well characterized, the accessibility of affordable high-throughput sequencing is generating a wealth of novel, unannotated transcripts, especially long noncoding RNAs (lncRNAs) that are derived from genomic regions that are antisense, intronic, intergenic, and overlapping protein-coding loci. Parsing and characterizing the functions of noncoding RNAs, lncRNAs in particular, is one of the great challenges of modern genome biology. Here we discuss concepts and computational methods for the identification of structural domains in lncRNAs from genomic and transcriptomic data. In the first part, we briefly review how to identify RNA structural motifs in individual lncRNAs. In the second part, we describe how to leverage the evolutionary dynamics of structured RNAs in a computationally efficient screen to detect putative functional lncRNA motifs using comparative genomics.
Affiliation(s)
- Martin A Smith
  - RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia
  - St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia
- John S Mattick
  - RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria St, Darlinghurst, NSW, 2010, Australia
  - St-Vincent's Clinical School, Faculty of Medicine, UNSW Australia, Sydney, NSW, 2052, Australia

12. Lagarde J, Uszczynska-Ratajczak B, Santoyo-Lopez J, Gonzalez JM, Tapanari E, Mudge JM, Steward CA, Wilming L, Tanzer A, Howald C, Chrast J, Vela-Boza A, Rueda A, Lopez-Domingo FJ, Dopazo J, Reymond A, Guigó R, Harrow J. Extension of human lncRNA transcripts by RACE coupled with long-read high-throughput sequencing (RACE-Seq). Nat Commun 2016; 7:12339. [PMID: 27531712 PMCID: PMC4992054 DOI: 10.1038/ncomms12339] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 04/20/2016] [Accepted: 06/23/2016] [Indexed: 12/22/2022]
Abstract
Long non-coding RNAs (lncRNAs) constitute a large, yet mostly uncharacterized fraction of the mammalian transcriptome. Such characterization requires a comprehensive, high-quality annotation of their gene structure and boundaries, which is currently lacking. Here we describe RACE-Seq, an experimental workflow designed to address this based on RACE (rapid amplification of cDNA ends) and long-read RNA sequencing. We apply RACE-Seq to 398 human lncRNA genes in seven tissues, leading to the discovery of 2,556 on-target, novel transcripts. About 60% of the targeted loci are extended in either 5′ or 3′, often reaching genomic hallmarks of gene boundaries. Analysis of the novel transcripts suggests that lncRNAs are as long, have as many exons and undergo as much alternative splicing as protein-coding genes, contrary to current assumptions. Overall, we show that RACE-Seq is an effective tool to annotate an organism's deep transcriptome, and compares favourably to other targeted sequencing techniques. Long non-coding RNAs are increasingly recognised to be important factors in regulating cellular processes and comprise a large fraction of the transcriptome; however, most are uncharacterised. Here the authors present RACE-Seq, a tool to improve and extend the annotation of low-expression transcripts.
Collapse
Affiliation(s)
- Julien Lagarde
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Barbara Uszczynska-Ratajczak
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | | | | | - Electra Tapanari
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
| | - Jonathan M Mudge
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
| | - Charles A Steward
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
| | - Laurens Wilming
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
| | - Andrea Tanzer
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Cédric Howald
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Jacqueline Chrast
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Alicia Vela-Boza
- Genomics and Bioinformatics Platform of Andalusia (GBPA), 41092 Seville, Spain.,Roche Diagnostics, 08174 Sant Cugat Del Vallès, Barcelona, Spain
| | - Antonio Rueda
- Genomics and Bioinformatics Platform of Andalusia (GBPA), 41092 Seville, Spain
| | | | - Joaquin Dopazo
- Genomics and Bioinformatics Platform of Andalusia (GBPA), 41092 Seville, Spain.,Computational Genomics Department, Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain.,Functional Genomics Node (INB), Centro de Investigación Príncipe Felipe, 46012 Valencia, Spain
| | - Alexandre Reymond
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
| | - Roderic Guigó
- Centre for Genomic Regulation (CRG), Barcelona Institute of Science and Technology (BIST), Dr Aiguader 88, 08003 Barcelona, Spain.,Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Jennifer Harrow
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1HH, UK
| |
|
13
|
Nitsche A, Stadler PF. Evolutionary clues in lncRNAs. WILEY INTERDISCIPLINARY REVIEWS-RNA 2016; 8. [PMID: 27436689 DOI: 10.1002/wrna.1376] [Citation(s) in RCA: 43] [Impact Index Per Article: 5.4] [Received: 02/24/2016] [Revised: 06/06/2016] [Accepted: 06/09/2016] [Indexed: 12/13/2022]
Abstract
The diversity of long non-coding RNAs (lncRNAs) in the human transcriptome is in stark contrast to the sparse exploration of their functions concomitant with their conservation and evolution. The pervasive transcription of the largely non-coding human genome makes the evolutionary age and conservation patterns of lncRNAs a topic of interest. Yet this is a fairly unexplored field, and these properties are not as easy to determine as for protein-coding genes. Although there are a few experimentally studied cases, which are conserved at the sequence level, most lncRNAs exhibit weak or untraceable primary sequence conservation. Recent studies shed light on the interspecies conservation of secondary structures among lncRNA homologs by using diverse computational methods. This highlights the importance of structure on functionality of lncRNAs as opposed to the poor impact of primary sequence changes. Further clues in the evolution of lncRNAs are given by selective constraints on non-coding gene structures (e.g., promoters or splice sites) as well as the conservation of prevalent spatio-temporal expression patterns. However, a rapid evolutionary turnover is observable throughout the heterogeneous group of lncRNAs. This still gives rise to questions about its functional meaning. WIREs RNA 2017, 8:e1376. doi: 10.1002/wrna.1376
Affiliation(s)
- Anne Nitsche
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany.,Institute de Biologie Moléculaire et Cellulaire, Université de Strasbourg, Cedex, France
| | - Peter F Stadler
- Bioinformatics Group, Department of Computer Science, University Leipzig, Leipzig, Germany.,Interdisciplinary Center for Bioinformatics, University Leipzig, Leipzig, Germany.,Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany.,Department of Diagnostics, Fraunhofer Institute for Cell Therapy and Immunology - IZI, Leipzig, Germany.,Center for Non-Coding RNA in Technology and Health, University of Copenhagen, Frederiksberg, Denmark.,Department of Theoretical Chemistry, University of Vienna, Wien, Austria.,Santa Fe Institute, Santa Fe, NM, USA
| |
|
14
|
Vinogradova SV, Sutormin RA, Mironov AA, Soldatov RA. Probing-directed identification of novel structured RNAs. RNA Biol 2016; 13:232-42. [PMID: 26732206 DOI: 10.1080/15476286.2015.1132140] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 12/25/2022] Open
Abstract
Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure, thus it is important to determine the structure as well as to scan genomes for structured elements. State of the art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use 2 major types of information inferred from a sequence: thermodynamic stability of an RNA structure and evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes indicating their propensity to be in a paired or unpaired state. There exist several strategies to integrate probing data into RNA secondary structure prediction algorithms that substantially improve the prediction quality. However, whether and how probing data could contribute to detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.
Affiliation(s)
- Svetlana V Vinogradova
- a Department of Bioengineering and Bioinformatics , Lomonosov Moscow State University, 1-73 Vorobievy Gory , Moscow , 119991 , Russia.,b Institute for Information Transmission Problems, Russian Academy of Sciences, 19 Bolshoi Karetnyi per , Moscow , 127994 , Russia
| | - Roman A Sutormin
- a Department of Bioengineering and Bioinformatics , Lomonosov Moscow State University, 1-73 Vorobievy Gory , Moscow , 119991 , Russia.,c Lawrence Berkeley National Laboratory , Berkeley , 94710 , CA , USA
| | - Andrey A Mironov
- a Department of Bioengineering and Bioinformatics , Lomonosov Moscow State University, 1-73 Vorobievy Gory , Moscow , 119991 , Russia.,b Institute for Information Transmission Problems, Russian Academy of Sciences, 19 Bolshoi Karetnyi per , Moscow , 127994 , Russia
| | - Ruslan A Soldatov
- a Department of Bioengineering and Bioinformatics , Lomonosov Moscow State University, 1-73 Vorobievy Gory , Moscow , 119991 , Russia.,b Institute for Information Transmission Problems, Russian Academy of Sciences, 19 Bolshoi Karetnyi per , Moscow , 127994 , Russia
| |
|
15
|
Abstract
Genomic studies have greatly expanded our knowledge of structural non-coding RNAs (ncRNAs). These RNAs fold into characteristic secondary structures and perform specific structure-dependent biological functions. Hence RNA secondary structure prediction is one of the most well studied problems in computational RNA biology. Comparative sequence analysis is one of the more reliable RNA structure prediction approaches as it exploits information of multiple related sequences to infer the consensus secondary structure. This class of methods essentially learns a global secondary structure from the input sequences. In this paper, we consider the more general problem of unearthing common local secondary-structure-based patterns from a set of related sequences. The input sequences could, for example, correspond to 3′ or 5′ untranslated regions of a set of orthologous genes, and the unearthed local patterns could correspond to regulatory motifs found in these regions. These sequences could also correspond to in vitro selected RNA, genomic segments housing ncRNA genes from the same family and so on. Here, we give a detailed review of the various computational techniques proposed in the literature attempting to solve this general motif discovery problem. We also give empirical comparisons of some of the current state-of-the-art methods and point out future directions of research.
Affiliation(s)
- Avinash Achar
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
| | - Pål Sætrom
- Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway.
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway.
| |
|
16
|
Jalali S, Kapoor S, Sivadas A, Bhartiya D, Scaria V. Computational approaches towards understanding human long non-coding RNA biology. Bioinformatics 2015; 31:2241-51. [DOI: 10.1093/bioinformatics/btv148] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 10/16/2014] [Accepted: 03/10/2015] [Indexed: 12/18/2022] Open
|
17
|
Yip DKS, Pang IK, Yip KY. Systematic exploration of autonomous modules in noisy microRNA-target networks for testing the generality of the ceRNA hypothesis. BMC Genomics 2014; 15:1178. [PMID: 25539629 PMCID: PMC4367885 DOI: 10.1186/1471-2164-15-1178] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 11/18/2014] [Accepted: 12/11/2014] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND In the competing endogenous RNA (ceRNA) hypothesis, different transcripts communicate through a competition for their common targeting microRNAs (miRNAs). Individual examples have clearly shown the functional importance of ceRNA in gene regulation and cancer biology. It remains unclear to what extent gene expression levels are regulated by ceRNA in general. One major hurdle to studying this problem is the intertwined connections in miRNA-target networks, which makes it difficult to isolate the effects of individual miRNAs. RESULTS Here we propose computational methods for decomposing a complex miRNA-target network into largely autonomous modules called microRNA-target biclusters (MTBs). Each MTB contains a relatively small number of densely connected miRNAs and mRNAs with few connections to other miRNAs and mRNAs. Each MTB can thus be individually analyzed with minimal crosstalk with other MTBs. Our approach differs from previous methods for finding modules in miRNA-target networks by not making any pre-assumptions about expression patterns, thereby providing objective information for testing the ceRNA hypothesis. We show that the expression levels of miRNAs and mRNAs in an MTB are significantly more anti-correlated than random miRNA-mRNA pairs and other validated and predicted miRNA-target pairs, demonstrating the biological relevance of MTBs. We further show that there is widespread correlation of expression between mRNAs in same MTBs under a wide variety of parameter settings, and the correlation remains even when co-regulatory effects are controlled for, which suggests potential widespread expression buffering between these mRNAs, which is consistent with the ceRNA hypothesis. Lastly, we also propose a potential use of MTBs in functional annotation of miRNAs. CONCLUSIONS MTBs can be used to help identify autonomous miRNA-target modules for testing the generality of the ceRNA hypothesis experimentally. 
The identified modules can also be used to test other properties of miRNA-target networks in general.
Affiliation(s)
- Danny Kit-Sang Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Iris K Pang
- School of Life Sciences, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| | - Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
- CUHK-BGI Innovation Institute of Trans-omics, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong
| |
|
18
|
Interconversion between parallel and antiparallel conformations of a 4H RNA junction in domain 3 of foot-and-mouth disease virus IRES captured by dynamics simulations. Biophys J 2014; 106:447-58. [PMID: 24461020 DOI: 10.1016/j.bpj.2013.12.008] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 10/07/2013] [Revised: 11/23/2013] [Accepted: 12/03/2013] [Indexed: 01/31/2023] Open
Abstract
RNA junctions are common secondary structural elements present in a wide range of RNA species. They play crucial roles in directing the overall folding of RNA molecules as well as in a variety of biological functions. In particular, there has been great interest in the dynamics of RNA junctions, including conformational pathways of fully base-paired 4-way (4H) RNA junctions. In such constructs, all nucleotides participate in one of the four double-stranded stem regions, with no connecting loops. Dynamical aspects of these 4H RNAs are interesting because frequent interchanges between parallel and antiparallel conformations are thought to occur without binding of other factors. Gel electrophoresis and single-molecule fluorescence resonance energy transfer experiments have suggested two possible pathways: one involves a helical rearrangement via disruption of coaxial stacking, and the other occurs by a rotation between the helical axes of coaxially stacked conformers. Employing molecular dynamics simulations, we explore this conformational variability in a 4H junction derived from domain 3 of the foot-and-mouth disease virus internal ribosome entry site (IRES); this junction contains highly conserved motifs for RNA-RNA and RNA-protein interactions, important for IRES activity. Our simulations capture transitions of the 4H junction between parallel and antiparallel conformations. The interconversion is virtually barrier-free and occurs via a rotation between the axes of coaxially stacked helices with a transient perpendicular intermediate. We characterize this transition, with various interhelical orientations, by pseudodihedral angle and interhelical distance measures. The high flexibility of the junction, as also demonstrated experimentally, is suitable for IRES activity. 
Because foot-and-mouth disease virus IRES structure depends on long-range interactions involving domain 3, the perpendicular intermediate, which maintains coaxial stacking of helices and thereby consensus primary and secondary structure information, may be beneficial for guiding the overall organization of the RNA system in domain 3.
|
19
|
Lertampaiporn S, Thammarongtham C, Nukoolkit C, Kaewkamnerdpong B, Ruengjitchatchawalya M. Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm. Nucleic Acids Res 2014; 42:e93. [PMID: 24771344 PMCID: PMC4066759 DOI: 10.1093/nar/gku325] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Received: 02/10/2014] [Revised: 04/02/2014] [Accepted: 04/07/2014] [Indexed: 12/13/2022] Open
Abstract
To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a new proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features-structure, sequence, modularity, structural robustness and coding potential-to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacteria genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm.
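The accuracy, sensitivity and specificity quoted in this abstract follow the standard confusion-matrix definitions. As a minimal sketch (the function name and the example counts are illustrative assumptions, not values from the paper), they can be computed as:

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: true ncRNAs predicted as ncRNA / missed.
    tn/fp: non-ncRNAs predicted as non-ncRNA / wrongly called ncRNA.
    """
    positives = tp + fn
    negatives = tn + fp
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    return {
        "accuracy": (tp + tn) / (positives + negatives),
        "sensitivity": tp / positives,   # fraction of true ncRNAs recovered
        "specificity": tn / negatives,   # fraction of non-ncRNAs rejected
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=90, fp=20, tn=80, fn=10)
```

On a balanced test set, accuracy is the mean of sensitivity and specificity, which is consistent with the three percentages reported above.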
Affiliation(s)
- Supatcha Lertampaiporn
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
| | - Chinae Thammarongtham
- Biochemical Engineering and Pilot Plant Research and Development Unit, National Center for Genetic Engineering and Biotechnology at King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
| | - Chakarida Nukoolkit
- School of Information Technology, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
| | - Boonserm Kaewkamnerdpong
- Biological Engineering Program, Faculty of Engineering, King Mongkut's University of Technology Thonburi, 126 Pracha Uthit Rd, Bangmod, Thung Khru, Bangkok 10140, Thailand
| | - Marasri Ruengjitchatchawalya
- Biotechnology Program, School of Bioresources and Technology, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand.,Bioinformatics and Systems Biology Program, King Mongkut's University of Technology Thonburi (Bang Khun Thian Campus), 49 Soi Thian Thale 25, Bang Khun Thian Chai Thale Rd, Tha Kham, Bangkok 10150, Thailand
| |
|
20
|
Warris S, Boymans S, Muiser I, Noback M, Krijnen W, Nap JP. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences. BMC Res Notes 2014; 7:34. [PMID: 24418292 PMCID: PMC3895842 DOI: 10.1186/1756-0500-7-34] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/06/2013] [Accepted: 01/07/2014] [Indexed: 11/29/2022] Open
Abstract
Background: Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be sufficient to discriminate miRNAs from other sequences, the MFE-based P-value should be added to the parameters considered when selecting potential miRNA candidates for experimental verification.
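The idea in this abstract (look up a pre-computed normal distribution of random-sequence MFEs, interpolate it to the candidate's composition, then read off a P-value) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the one-dimensional grid keyed by GC fraction, the function names, and all numbers are hypothetical.

```python
import math
from bisect import bisect_right


def interpolate_normal(grid, x):
    """Linearly interpolate (mean, sd) at x.

    grid: list of (x_i, mean_i, sd_i) tuples sorted by strictly increasing x_i,
          e.g. pre-computed MFE statistics per GC fraction (assumed layout).
    """
    xs = [g[0] for g in grid]
    if len(grid) < 2 or not xs[0] <= x <= xs[-1]:
        raise ValueError("x outside the pre-computed range")
    # Index of the right-hand grid point of the interval containing x.
    i = max(1, min(bisect_right(xs, x), len(xs) - 1))
    x0, m0, s0 = grid[i - 1]
    x1, m1, s1 = grid[i]
    t = (x - x0) / (x1 - x0)
    return m0 + t * (m1 - m0), s0 + t * (s1 - s0)


def mfe_p_value(mfe, mean, sd):
    """Lower-tail normal probability P(random MFE <= observed MFE)."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    z = (mfe - mean) / sd
    return 0.5 * math.erfc(-z / math.sqrt(2))


# Hypothetical pre-computed statistics (kcal/mol); placeholders, not real data.
grid = [(0.4, -20.0, 4.0), (0.6, -30.0, 6.0)]
mean, sd = interpolate_normal(grid, 0.5)
p = mfe_p_value(-35.0, mean, sd)  # small p: MFE unusually low for its composition
```

Because only a table lookup and one `erfc` call are needed per candidate, no random sequences have to be folded at screening time, which is the source of the speedup the abstract describes.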
Affiliation(s)
| | | | | | | | | | - Jan-Peter Nap
- Expertise Centre ALIFE, Institute for Life Science & Technology, Hanze University of Applied Sciences, Groningen, The Netherlands.
| |
|
21
|
Abstract
De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.
Affiliation(s)
- Walter L Ruzzo
- Fred Hutchinson Cancer Research Center, Seattle, WA, 98109, USA
| | | |
|
22
|
Abstract
The last decade has seen tremendous effort committed to the annotation of the human genome sequence, most notably perhaps in the form of the ENCODE project. One of the major findings of ENCODE, and other genome analysis projects, is that the human transcriptome is far larger and more complex than previously thought. This complexity manifests, for example, as alternative splicing within protein-coding genes, as well as in the discovery of thousands of long noncoding RNAs. It is also possible that significant numbers of human transcripts have not yet been described by annotation projects, while existing transcript models are frequently incomplete. The question as to what proportion of this complexity is truly functional remains open, however, and this ambiguity presents a serious challenge to genome scientists. In this article, we will discuss the current state of human transcriptome annotation, drawing on our experience gained in generating the GENCODE gene annotation set. We highlight the gaps in our knowledge of transcript functionality that remain, and consider the potential computational and experimental strategies that can be used to help close them. We propose that an understanding of the true overlap between transcriptional complexity and functionality will not be gained in the short term. However, significant steps toward obtaining this knowledge can now be taken by using an integrated strategy, combining all of the experimental resources at our disposal.
Affiliation(s)
- Jonathan M Mudge
- Department of Informatics, Wellcome Trust Sanger Institute, Hinxton CB10 1SA, United Kingdom
| | | | | |
|
23
|
Smith MA, Gesell T, Stadler PF, Mattick JS. Widespread purifying selection on RNA structure in mammals. Nucleic Acids Res 2013; 41:8220-36. [PMID: 23847102 PMCID: PMC3783177 DOI: 10.1093/nar/gkt596] [Citation(s) in RCA: 130] [Impact Index Per Article: 11.8] [Received: 01/30/2013] [Revised: 05/29/2013] [Accepted: 06/16/2013] [Indexed: 12/14/2022] Open
Abstract
Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibrating the precision of genome-wide scans for consensus RNA structure prediction. The benchmarking data obtained from two refined structure prediction algorithms, RNAz and SISSIz, were then analyzed to fine-tune the parameters of an optimized workflow for genomic sliding window screens. When applied to consistency-based multiple genome alignments of 35 mammals, our approach confidently identifies >4 million evolutionarily constrained RNA structures using a conservative sensitivity threshold that entails historically low false discovery rates for such analyses (5-22%). These predictions comprise 13.6% of the human genome, 88% of which fall outside any known sequence-constrained element, suggesting that a large proportion of the mammalian genome is functional. As an example, our findings identify both known and novel conserved RNA structure motifs in the long noncoding RNA MALAT1. This study provides an extensive set of functional transcriptomic annotations that will assist researchers in uncovering the precise mechanisms underlying the developmental ontologies of higher eukaryotes.
Affiliation(s)
- Martin A. Smith
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
| | - Tanja Gesell
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
| | - Peter F. Stadler
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
| | - John S. Mattick
- RNA Biology and Plasticity Laboratory, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, Sydney, NSW 2010 Australia, Genomics and Computational Biology Division, Institute for Molecular Bioscience, 306 Carmody Rd, University of Queensland, Brisbane, 4067 Australia, Department of Structural and Computational Biology; and Center for Integrative Bioinformatics Vienna (CIBIV), Max F. Perutz Laboratories (MFPL), University of Vienna, Medical University of Vienna, Dr. Bohr-Gasse 9, A-1030 Vienna, Austria, Bioinformatics Group, Department of Computer Science; and Interdisciplinary Center for Bioinformatics, University of Leipzig, Härtelstrasse 16–18, D-04107 Leipzig, Germany, Max Planck Institute for Mathematics in the Sciences, Inselstraße 22, D-04103 Leipzig, Germany, Center for Non-coding RNA in Technology and Health, Department of Basic Veterinary and Animal Sciences, Faculty of Life Sciences University of Copenhagen, Grønnegårdsvej 3, 1870 Frederiksberg C Denmark, Santa Fe Institute, 1399 Hyde Park Rd, Santa Fe, NM 87501, USA and St Vincent’s Clinical School, University of New South Wales, Level 5, de Lacy, Victoria St, St Vincent's Hospital, Sydney, NSW 2010 Australia
| |
|
24
|
Laing C, Jung S, Kim N, Elmetwaly S, Zahran M, Schlick T. Predicting helical topologies in RNA junctions as tree graphs. PLoS One 2013; 8:e71947. [PMID: 23991010 PMCID: PMC3753280 DOI: 10.1371/journal.pone.0071947] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Received: 05/01/2013] [Accepted: 07/05/2013] [Indexed: 01/11/2023] Open
Abstract
RNA molecules are important cellular components involved in many fundamental biological processes. Understanding the mechanisms behind their functions requires knowledge of their tertiary structures. Though computational RNA folding approaches exist, they often require manual manipulation and expert intuition; predicting global long-range tertiary contacts remains challenging. Here we develop a computational approach and associated program module (RNAJAG) to predict helical arrangements/topologies in RNA junctions. Our method has two components: junction topology prediction and graph modeling. First, junction topologies are determined by a data mining approach from a given secondary structure of the target RNAs; second, the predicted topology is used to construct a tree graph consistent with geometric preferences analyzed from solved RNAs. The predicted graphs, which model the helical arrangements of RNA junctions for a large set of 200 junctions using a cross validation procedure, yield fairly good representations compared to the helical configurations in native RNAs, and can be further used to develop all-atom models as we show for two examples. Because junctions are among the most complex structural elements in RNA, this work advances folding structure prediction methods of large RNAs. The RNAJAG module is available to academic users upon request.
Affiliation(s)
- Christian Laing
- Department of Biology, Wilkes University, Wilkes-Barre, Pennsylvania, United States of America
- Department of Mathematics and Computer Science, Wilkes University, Wilkes-Barre, Pennsylvania, United States of America
- Segun Jung
- Department of Chemistry, New York University, New York, United States of America
- Namhee Kim
- Department of Chemistry, New York University, New York, United States of America
- Shereef Elmetwaly
- Department of Chemistry, New York University, New York, United States of America
- Mai Zahran
- Department of Chemistry, New York University, New York, United States of America
- Tamar Schlick
- Department of Chemistry, New York University, New York, United States of America
- Courant Institute of Mathematical Sciences, New York University, New York, United States of America
|
25
|
McGraw S, Shojaei Saadi HA, Robert C. Meeting the methodological challenges in molecular mapping of the embryonic epigenome. Mol Hum Reprod 2013; 19:809-27. [PMID: 23783346 DOI: 10.1093/molehr/gat046] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 12/27/2022]
Abstract
The past decade of life sciences research has been driven by progress in genomics. Many voices are already proclaiming the post-genomics era, in which phenomena other than sequence polymorphism influence gene expression and also explain complex phenotypes. One of these burgeoning fields is the study of the epigenome. Although the mechanisms by which chromatin structure and reorganization as well as cytosine methylation influence gene expression are not fully understood, they are being invoked to explain the now-accepted long-term impact of the environment on gene expression, which appears to be a factor in the development of numerous diseases. Such studies are particularly relevant in early embryonic development, during which waves of epigenetic reprogramming are known to have profound impacts. Since gametes and zygotes are in the process of resetting the genome in order to create embryonic stem cells that will each differentiate to create one of many specific tissue types, this phase of life is now viewed as a window of susceptibility to epigenetic reprogramming errors. Epigenetics could explain the influence of factors such as the nutritional/metabolic status of the mother or the artificial environment of assisted reproductive technologies. However, the peculiar nature of early embryos in addition to their scarcity poses numerous technological challenges that are slowly being overcome. The principal aim of this article is to review the suitability of various current and emerging technological platforms to study the oocyte and early embryonic epigenome, with emphasis on DNA methylation. Furthermore, the constraint of sample size, inherent to the study of preimplantation embryo development, is put in perspective with the various molecular platforms described.
Affiliation(s)
- Serge McGraw
- Department of Human Genetics, Montreal Children's Hospital Research Institute, McGill University, Montréal, QC H3Z 2Z3, Canada
|
26
|
Abstract
Recent genome-wide computational screens that search for conservation of RNA secondary structure in whole-genome alignments (WGAs) have predicted thousands of structural noncoding RNAs (ncRNAs). The sensitivity of such approaches, however, is limited, due to their reliance on sequence-based whole-genome aligners, which regularly misalign structural ncRNAs. This suggests that many more structural ncRNAs may remain undetected. Structure-based alignment, which could increase the sensitivity, has been prohibitive for genome-wide screens due to its extreme computational costs. Breaking this barrier, we present the pipeline REAPR (RE-Alignment for Prediction of structural ncRNA), which efficiently realigns whole genomes based on RNA sequence and structure, thus allowing us to boost the performance of de novo ncRNA predictors, such as RNAz. Key to the pipeline's efficiency is the development of a novel banding technique for multiple RNA alignment. REAPR significantly outperforms the widely used predictors RNAz and EvoFold in genome-wide screens; in direct comparison to the most recent RNAz screen on D. melanogaster, REAPR predicts twice as many high-confidence ncRNA candidates. Moreover, modENCODE RNA-seq experiments confirm a substantial number of its predictions as transcripts. REAPR's advancement of de novo structural characterization of ncRNAs complements the identification of transcripts from rapidly accumulating RNA-seq data.
Affiliation(s)
- Sebastian Will
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
|
27
|
Podolska A, Anthon C, Bak M, Tommerup N, Skovgaard K, Heegaard PM, Gorodkin J, Cirera S, Fredholm M. Profiling microRNAs in lung tissue from pigs infected with Actinobacillus pleuropneumoniae. BMC Genomics 2012; 13:459. [PMID: 22953717 PMCID: PMC3465251 DOI: 10.1186/1471-2164-13-459] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 01/16/2012] [Accepted: 08/29/2012] [Indexed: 12/25/2022]
Abstract
Background: MicroRNAs (miRNAs) are a class of non-protein-coding genes that play a crucial regulatory role in mammalian development and disease. Whereas a large number of miRNAs have been annotated at the structural level in recent years, functional annotation is sparse. Actinobacillus pleuropneumoniae (APP) causes serious lung infections in pigs. Severe damage to the lungs, in many cases deadly, is caused by toxins released by the bacterium and to some degree by host-mediated tissue damage. However, understanding of the role of microRNAs in the course of this infectious disease in pigs is still very limited. Results: In this study, the RNA extracted from visually unaffected and necrotic tissue from pigs infected with Actinobacillus pleuropneumoniae was subjected to small RNA deep sequencing. We identified 169 conserved and 11 candidate novel microRNAs in the pig. Of these, 17 were significantly up-regulated in the necrotic sample and 12 were down-regulated. The expression analysis of a number of candidates revealed microRNAs of potential importance in the innate immune response. MiR-155, a known key player in inflammation, was found expressed in both samples. Moreover, miR-664-5p, miR-451 and miR-15a appear as very promising candidates for microRNAs involved in response to pathogen infection. Conclusions: This is the first study revealing significant differences in composition and expression profiles of miRNAs in lungs infected with a bacterial pathogen. Our results extend annotation of microRNA in pig and provide insight into the role of a number of microRNAs in regulation of bacteria-induced immune and inflammatory response in porcine lung.
Affiliation(s)
- Agnieszka Podolska
- Department of Veterinary Clinical and Animal Sciences, Section of Anatomy, Cell Biology, Genetics and Bioinformatics, University of Copenhagen, Faculty of Health and Medical Sciences, Copenhagen, Denmark
|
28
|
Wenzel A, Akbasli E, Gorodkin J. RIsearch: fast RNA-RNA interaction search using a simplified nearest-neighbor energy model. Bioinformatics 2012; 28:2738-46. [PMID: 22923300 PMCID: PMC3476332 DOI: 10.1093/bioinformatics/bts519] [Citation(s) in RCA: 69] [Impact Index Per Article: 5.8] [Indexed: 12/30/2022]
Abstract
Motivation: Regulatory, non-coding RNAs often function by forming a duplex with other RNAs. It is therefore of interest to predict putative RNA–RNA duplexes in silico on a genome-wide scale. Current computational methods for predicting these interactions range from fast complementary-based searches to those that take intramolecular binding into account. Together these methods constitute a trade-off between speed and accuracy, while leaving room for improvement within the context of genome-wide screens. A fast pre-filtering of putative duplexes would therefore be desirable. Results: We present RIsearch, an implementation of a simplified Turner energy model for fast computation of hybridization, which significantly reduces runtime while maintaining accuracy. Its time complexity for sequences of lengths m and n is O(m·n) with a much smaller pre-factor than other tools. We show that this energy model is an accurate approximation of the full energy model for near-complementary RNA–RNA duplexes. RIsearch uses a Smith–Waterman-like algorithm using a dinucleotide scoring matrix which approximates the Turner nearest-neighbor energies. We show in benchmarks that we achieve a speed improvement of at least 2.4× compared with RNAplex, the currently fastest method for searching near-complementary regions. RIsearch shows a prediction accuracy similar to RNAplex on two datasets of known bacterial short RNA (sRNA)–messenger RNA (mRNA) and eukaryotic microRNA (miRNA)–mRNA interactions. Using RIsearch as a pre-filter in genome-wide screens reduces the number of binding site candidates reported by miRNA target prediction programs, such as TargetScanS and miRanda, by up to 70%. Likewise, substantial filtering was performed on bacterial RNA–RNA interaction data. Availability: The source code for RIsearch is available at: http://rth.dk/resources/risearch. Contact: gorodkin@rth.dk Supplementary information: Supplementary data are available at Bioinformatics online.
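The Smith–Waterman-like search described in this abstract can be illustrated with a minimal sketch. It is not the RIsearch implementation: the scores in `PAIR_SCORE`, `MISMATCH` and `GAP` are hypothetical toy values, and the sketch scores single base pairs rather than the dinucleotide (stacking) terms that approximate the Turner nearest-neighbor energies. It only shows the shape of a quadratic-time local dynamic program over a query and a reversed target.

```python
from typing import Dict, Tuple

# Hypothetical toy scores (NOT Turner energies): higher means a more stable pair.
PAIR_SCORE: Dict[Tuple[str, str], int] = {
    ("G", "C"): 3, ("C", "G"): 3,
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "U"): 1, ("U", "G"): 1,
}
MISMATCH = -3  # assumed penalty for a non-pairing position
GAP = -4       # assumed penalty for a bulge (unpaired base on one strand)


def best_duplex_score(query: str, target: str) -> int:
    """Return the best local hybridization score of two RNAs given 5'->3'."""
    # Duplexes are antiparallel, so the query is aligned to the reversed target.
    rev_target = target[::-1]
    n = len(rev_target)
    prev = [0] * (n + 1)
    best = 0
    for i in range(1, len(query) + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            pair = PAIR_SCORE.get((query[i - 1], rev_target[j - 1]), MISMATCH)
            # Local alignment recurrence: restart at 0, extend, or open a bulge.
            curr[j] = max(0, prev[j - 1] + pair, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best
```

Under these assumed scores, a perfectly complementary pair such as `best_duplex_score("GGGAAA", "UUUCCC")` accumulates three G-C and three A-U pairs, while two sequences that cannot pair score zero. Keeping only two rows of the table holds memory linear in the target length.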
Affiliation(s)
- Anne Wenzel
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, DK-1870 Frederiksberg, Denmark
|
29
|
Enfield KSS, Pikor LA, Martinez VD, Lam WL. Mechanistic Roles of Noncoding RNAs in Lung Cancer Biology and Their Clinical Implications. Genetics Research International 2012; 2012:737416. [PMID: 22852089 PMCID: PMC3407615 DOI: 10.1155/2012/737416] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 09/16/2011] [Accepted: 03/08/2012] [Indexed: 01/07/2023]
Abstract
Lung cancer biology has traditionally focused on genomic and epigenomic deregulation of protein-coding genes to identify oncogenes and tumor suppressors as diagnostic and therapeutic targets. Another important layer of cancer biology has emerged in the form of noncoding RNAs (ncRNAs), which are major regulators of key cellular processes such as proliferation, RNA splicing, gene regulation, and apoptosis. In the past decade, microRNAs (miRNAs) have moved to the forefront of ncRNA cancer research, while the role of long noncoding RNAs (lncRNAs) is emerging. Here we review the mechanisms by which miRNAs and lncRNAs are deregulated in lung cancer, the technologies that can be applied to detect such alterations, and the clinical potential of these RNA species. An improved comprehension of lung cancer biology will come through the understanding of the interplay between deregulation of non-coding RNAs, the protein-coding genes they regulate, and how these interactions influence cellular networks and signalling pathways.
Affiliation(s)
- Katey S. S. Enfield
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada V5Z 1L3
- Larissa A. Pikor
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada V5Z 1L3
- Victor D. Martinez
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada V6T 2B5
- Wan L. Lam
- British Columbia Cancer Research Center, Vancouver, BC, Canada V5Z 1L3
- Interdisciplinary Oncology Program, University of British Columbia, Vancouver, BC, Canada V5Z 1L3
- Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, BC, Canada V6T 2B5
|
30
|
Seemann SE, Sunkin SM, Hawrylycz MJ, Ruzzo WL, Gorodkin J. Transcripts with in silico predicted RNA structure are enriched everywhere in the mouse brain. BMC Genomics 2012; 13:214. [PMID: 22651826 PMCID: PMC3464589 DOI: 10.1186/1471-2164-13-214] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 09/09/2011] [Accepted: 05/31/2012] [Indexed: 01/24/2023]
Abstract
Background: Post-transcriptional control of gene expression is mostly conducted by specific elements in untranslated regions (UTRs) of mRNAs, in collaboration with specific binding proteins and RNAs. In several well characterized cases, these RNA elements are known to form stable secondary structures. RNA secondary structures also may have major functional implications for long noncoding RNAs (lncRNAs). Recent transcriptional data has indicated the importance of lncRNAs in brain development and function. However, no methodical efforts to investigate this have been undertaken. Here, we aim to systematically analyze the potential for RNA structure in brain-expressed transcripts. Results: By comprehensive spatial expression analysis of the adult mouse in situ hybridization data of the Allen Mouse Brain Atlas, we show that transcripts (coding as well as non-coding) associated with in silico predicted structured probes are highly and significantly enriched in almost all analyzed brain regions. Functional implications of these RNA structures and their role in the brain are discussed in detail along with specific examples. We observe that mRNAs with a structure prediction in their UTRs are enriched for binding, transport and localization gene ontology categories. In addition, after manual examination we observe agreement between RNA binding protein interaction sites near the 3’ UTR structures and correlated expression patterns. Conclusions: Our results show a potential use for RNA structures in expressed coding as well as noncoding transcripts in the adult mouse brain, and describe the role of structured RNAs in the context of intracellular signaling pathways and regulatory networks. Based on this data we hypothesize that RNA structure is widely involved in transcriptional and translational regulatory mechanisms in the brain and ultimately plays a role in brain function.
Affiliation(s)
- Stefan E Seemann
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark
|
31
|
Minocherhomji S, Seemann S, Mang Y, El-Schich Z, Bak M, Hansen C, Papadopoulos N, Josefsen K, Nielsen H, Gorodkin J, Tommerup N, Silahtaroglu A. Sequence and expression analysis of gaps in human chromosome 20. Nucleic Acids Res 2012; 40:6660-72. [PMID: 22510267 PMCID: PMC3413113 DOI: 10.1093/nar/gks302] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/18/2022]
Abstract
The finished human genome-assemblies comprise several hundred un-sequenced euchromatic gaps, which may be rich in long polypurine/polypyrimidine stretches. Human chromosome 20 (chr 20) currently has three unfinished gaps remaining on its q-arm. All three gaps are within gene-dense regions and/or overlap disease-associated loci, including the DLGAP4 locus. In this study, we sequenced ∼99% of all three unfinished gaps on human chr 20, determined their complete genomic sizes and assessed epigenetic profiles using a combination of Sanger sequencing, mate pair paired-end high-throughput sequencing and chromatin, methylation and expression analyses. We found histone 3 trimethylated at Lysine 27 to be distributed across all three gaps in immortalized B-lymphocytes. In one gap, five novel CpG islands were predominantly hypermethylated in genomic DNA from peripheral blood lymphocytes and human cerebellum. One of these CpG islands was differentially methylated and paternally hypermethylated. We found all chr 20 gaps to comprise structured non-coding RNAs (ncRNAs) and to be conserved in primates. We verified expression for 13 candidate ncRNAs, some of which showed tissue specificity. Four ncRNAs expressed within the gap at DLGAP4 show elevated expression in the human brain. Our data suggest that unfinished human genome gaps are likely to comprise numerous functional elements.
Affiliation(s)
- Sheroy Minocherhomji
- Wilhelm Johannsen Centre for Functional Genome Research, University of Copenhagen, Blegdamsvej 3B, DK-2200 Copenhagen N, Denmark
|
32
|
Lange SJ, Maticzka D, Möhl M, Gagnon JN, Brown CM, Backofen R. Global or local? Predicting secondary structure and accessibility in mRNAs. Nucleic Acids Res 2012; 40:5215-26. [PMID: 22373926 PMCID: PMC3384308 DOI: 10.1093/nar/gks181] [Citation(s) in RCA: 116] [Impact Index Per Article: 9.7] [Indexed: 12/23/2022]
Abstract
Determining the structural properties of mRNA is key to understanding vital post-transcriptional processes. As experimental data on mRNA structure are scarce, accurate structure prediction is required to characterize RNA regulatory mechanisms. Although various structure prediction approaches are available, it is often unclear which to choose and how to set their parameters. Furthermore, no standard measure to compare predictions of local structure exists. We assessed the performance of different methods using two types of data: transcriptome-wide enzymatic probing information and a large, curated set of cis-regulatory elements. To compare the approaches, we introduced structure accuracy, a measure that is applicable to both global and local methods. Our results showed that local folding was more accurate than the classic global approach. We investigated how the locality parameters, maximum base pair span and window size, influenced the prediction performance. A span of 150 provided a reasonable balance between maximizing the number of accurately predicted base pairs, while minimizing effects of incorrect long-range predictions. We characterized the error at artificial sequence ends, which we reduced by setting the window size sufficiently greater than the maximum span. Our method, LocalFold, diminished all border effects and produced the most robust performance.
Affiliation(s)
- Sita J Lange
- Department of Computer Science and Centre for Biological Signalling Studies (BIOSS), Albert-Ludwigs-Universität Freiburg, Germany
|
33
|
Langenberger D, Pundhir S, Ekstrøm CT, Stadler PF, Hoffmann S, Gorodkin J. deepBlockAlign: a tool for aligning RNA-seq profiles of read block patterns. Bioinformatics 2011; 28:17-24. [PMID: 22053076 PMCID: PMC3244762 DOI: 10.1093/bioinformatics/btr598] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022]
Abstract
Motivation: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example. Results: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads. Availability: The program deepBlockAlign is available as source code from http://rth.dk/resources/dba/. Contact: gorodkin@rth.dk; studla@bioinf.uni-leipzig.de Supplementary information: Supplementary data are available at Bioinformatics online.
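The preprocessing step described in this abstract (overlapping reads merged into blocks, then closely spaced blocks combined into block groups) can be sketched as plain interval merging. This is an illustrative sketch only, not the deepBlockAlign code: the function names, the inclusive `(start, end)` coordinate convention and the `max_gap` threshold are assumptions made here for the example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one strand, end inclusive (assumed)


def merge_reads_into_blocks(reads: List[Interval]) -> List[Interval]:
    """Merge overlapping mapped reads into blocks, returned sorted by start."""
    blocks: List[Interval] = []
    for start, end in sorted(reads):
        if blocks and start <= blocks[-1][1]:
            # Read overlaps the current block: extend the block's end if needed.
            blocks[-1] = (blocks[-1][0], max(blocks[-1][1], end))
        else:
            blocks.append((start, end))
    return blocks


def group_blocks(blocks: List[Interval], max_gap: int) -> List[List[Interval]]:
    """Combine sorted blocks separated by at most max_gap into block groups."""
    groups: List[List[Interval]] = []
    for block in blocks:
        if groups and block[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(block)
        else:
            groups.append([block])
    return groups


# Hypothetical read coordinates, for illustration only.
reads = [(100, 121), (102, 123), (140, 160), (141, 161), (500, 520)]
blocks = merge_reads_into_blocks(reads)
groups = group_blocks(blocks, max_gap=30)
```

With these example reads, the two blocks near position 100 to 161 fall into one block group and the distant block at 500 forms its own group. `group_blocks` expects its input already sorted, which `merge_reads_into_blocks` guarantees.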
Affiliation(s)
- David Langenberger
- Bioinformatics Group, Department of Computer Science, Interdisciplinary Center for Bioinformatics, Universität Leipzig, Philipp-Rosenthal-Strasse 27, D-04107 Leipzig, Germany
|