51
|
Jacobsen MJ, Havgaard JH, Mentzel CMJ, Sørensen PM, Pundhir S, Anthon C, Karlskov-Mortensen P, Bruun CS, Cirera S, Gorodkin J, Jørgensen CB, Barrès R, Fredholm M. P2019 Adipocyte gene expression and DNA methylation patterns differ significantly between lean and obese pigs. J Anim Sci 2016. [DOI: 10.2527/jas2016.94supplement446x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2022] Open
|
52
|
Seemann SE, Anthon C, Palasca O, Gorodkin J. Quality Assessment of Domesticated Animal Genome Assemblies. Bioinform Biol Insights 2016; 9:49-58. [PMID: 27279738 PMCID: PMC4898645 DOI: 10.4137/bbi.s29333] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 10/19/2015] [Revised: 05/02/2016] [Accepted: 05/03/2016] [Indexed: 11/07/2022] Open
Abstract
The era of high-throughput sequencing has made it relatively simple to sequence genomes and transcriptomes of individuals from many species. In order to analyze the resulting sequencing data, high-quality reference genome assemblies are required. However, this is still a major challenge, and many domesticated animal genomes still need to be sequenced deeper in order to produce high-quality assemblies. Meanwhile, ironically, the volume of RNAseq and other next-generation data produced frequently far exceeds that of the genomic sequence. Furthermore, basic comparative analysis is often affected by the lack of genomic sequence. Herein, we quantify the quality of the genome assemblies of 20 domesticated animals and related species by assessing a range of measurable parameters, and we show that there is a positive correlation between the fraction of mappable reads from RNAseq data and genome assembly quality. We rank the genomes by their assembly quality and discuss the implications for genotype analyses.
|
53
|
Sundfeld D, Havgaard JH, de Melo ACMA, Gorodkin J. Foldalign 2.5: multithreaded implementation for pairwise structural RNA alignment. Bioinformatics 2015; 32:1238-40. [PMID: 26704597 PMCID: PMC4824132 DOI: 10.1093/bioinformatics/btv748] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 07/23/2015] [Accepted: 12/16/2015] [Indexed: 11/13/2022] Open
Abstract
Motivation: Structured RNAs can be hard to search for as they often are not well conserved in their primary structure and are local in their genomic or transcriptomic context. Thus, the need for tools which in particular can make local structural alignments of RNAs is only increasing. Results: To meet the demand for both large-scale screens and hands-on analysis through web servers, we present a new multithreaded version of Foldalign. We substantially improve execution time while maintaining all previous functionalities, including carrying out local structural alignments of sequences with low similarity. Furthermore, the improvements allow for comparing longer RNAs and increasing the sequence length. For example, for lengths in the range 2000–6000 nucleotides, execution time improves by up to a factor of five. Availability and implementation: The Foldalign software and the web server are available at http://rth.dk/resources/foldalign. Contact: gorodkin@rth.dk. Supplementary information: Supplementary data are available at Bioinformatics online.
|
54
|
Sükösd Z, Andersen ES, Seemann SE, Jensen MK, Hansen M, Gorodkin J, Kjems J. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain. Nucleic Acids Res 2015; 43:10168-79. [PMID: 26476446 PMCID: PMC4666355 DOI: 10.1093/nar/gkv1039] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/04/2015] [Accepted: 09/30/2015] [Indexed: 11/30/2022] Open
Abstract
A distance-constrained secondary structural model of the ≈10 kb RNA genome of HIV-1 has been predicted, but higher-order structures involving long-distance interactions are currently unknown. We present the first global RNA secondary structure model for the HIV-1 genome, which integrates both comparative structure analysis and information from experimental data in a full-length prediction without distance constraints. Besides recovering known structural elements, we predict several novel structural elements that are conserved in HIV-1 evolution. Our results also indicate that the structure of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping protein-coding regions, the COS is supported by a particularly high frequency of compensatory base changes, suggesting functional importance for this element. This new structural element potentially organizes the whole genome into three major domains protruding from a conserved core structure with potential roles in replication and evolution for the virus.
|
55
|
Hecker N, Christensen-Dalsgaard M, Seemann SE, Havgaard JH, Stadler PF, Hofacker IL, Nielsen H, Gorodkin J. Optimizing RNA structures by sequence extensions using RNAcop. Nucleic Acids Res 2015; 43:8135-45. [PMID: 26283181 PMCID: PMC4787817 DOI: 10.1093/nar/gkv813] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/17/2015] [Revised: 07/28/2015] [Accepted: 07/30/2015] [Indexed: 12/26/2022] Open
Abstract
A key aspect of RNA secondary structure prediction is the identification of novel functional elements. This is a challenging task because these elements typically are embedded in longer transcripts where the borders between the element and flanking regions have to be defined. The flanking sequences impact the folding of the functional elements both at the level of computational analyses and when the element is extracted as a transcript for experimental analysis. Here, we analyze how different flanking region lengths impact folding into a constrained structure by computing probabilities of folding for different sizes of flanking regions. Our method, RNAcop (RNA context optimization by probability), is tested on known and de novo predicted structures. In vitro experiments support the computational analysis and suggest that for a number of structures, choosing proper lengths of flanking regions is critical. RNAcop is available as a web server and as stand-alone software via http://rth.dk/resources/rnacop.
|
56
|
Mentzel CMJ, Anthon C, Jacobsen MJ, Karlskov-Mortensen P, Bruun CS, Jørgensen CB, Gorodkin J, Cirera S, Fredholm M. Gender and Obesity Specific MicroRNA Expression in Adipose Tissue from Lean and Obese Pigs. PLoS One 2015; 10:e0131650. [PMID: 26222688 PMCID: PMC4519260 DOI: 10.1371/journal.pone.0131650] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Received: 01/12/2015] [Accepted: 06/05/2015] [Indexed: 02/06/2023] Open
Abstract
Obesity is a complex condition that increases the risk of life-threatening diseases such as cardiovascular disease and diabetes. Studying the gene regulation of obesity is important for understanding the molecular mechanisms behind obesity-derived diseases and may lead to better intervention and treatment plans. MicroRNAs (miRNAs) are short non-coding RNAs that regulate target mRNAs by binding to their 3'UTR. They are involved in numerous biological processes and diseases, including obesity. In this study we use a mixed-breed pig model designed for obesity studies to investigate differentially expressed miRNAs in subcutaneous adipose tissue by RNA sequencing (RNAseq). Both male and female pigs are included to explore gender differences. The RNAseq study shows that the most highly expressed miRNAs are in accordance with comparable studies in pigs and humans. A total of six miRNAs are differentially expressed in subcutaneous adipose tissue between the lean and obese groups of pigs, and in addition gender-specific significant differential expression is observed for a number of miRNAs. The differentially expressed miRNAs have been verified using qPCR. The results of these studies in general confirm the trends found by RNAseq. Mir-9 and mir-124a are significantly differentially expressed with large fold changes in subcutaneous adipose tissue between lean and obese pigs. Mir-9 is more highly expressed in the obese pigs with a fold change of 10 and a p-value < 0.001. Mir-124a is more highly expressed in the obese pigs with a fold change of 114 and a p-value < 0.001. In addition, mir-124a is expressed at significantly higher levels in abdominal adipose tissue in male pigs with a fold change of 119 and a p-value < 0.05. Both miRNAs are also expressed at significantly higher levels in the liver of obese male pigs, where mir-124a has a fold change of 12 and mir-9 has a fold change of 1.6, both with p-values < 0.05.
|
57
|
Pundhir S, Gorodkin J. Differential and coherent processing patterns from small RNAs. Sci Rep 2015; 5:12062. [PMID: 26166713 PMCID: PMC4499813 DOI: 10.1038/srep12062] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 02/25/2015] [Accepted: 06/16/2015] [Indexed: 12/16/2022] Open
Abstract
Post-transcriptional processing events related to short RNAs are often reflected in their read profile patterns emerging from high-throughput sequencing data. MicroRNA arm switching across different tissues is a well-known example of what we define as differential processing. Here, short RNAs from the nine cell lines of the ENCODE project, irrespective of their annotation status, were analyzed for genomic loci representing differential or coherent processing. We observed differential processing predominantly in RNAs annotated as miRNA, snoRNA or tRNA. Four out of five known cases of differentially processed miRNAs that were in the input dataset were recovered and several novel cases were discovered. In contrast to differential processing, coherent processing is widespread in both annotated and unannotated regions. While the annotated loci predominantly consist of ~24 nt short RNAs, the unannotated loci predominantly consist of ~17 nt short RNAs. Furthermore, these ~17 nt short RNAs are significantly enriched for overlap with transcription start sites and DNase I hypersensitive sites (p-value < 0.01), which are characteristic features of transcription initiation RNAs. We discuss how the computational pipeline developed in this study has the potential to be applied to other forms of RNA-seq data for further transcriptome-wide studies of differential and coherent processing.
|
58
|
Stewart JB, Alaei-Mahabadi B, Sabarinathan R, Samuelsson T, Gorodkin J, Gustafsson CM, Larsson E. Simultaneous DNA and RNA Mapping of Somatic Mitochondrial Mutations across Diverse Human Cancers. PLoS Genet 2015; 11:e1005333. [PMID: 26125550 PMCID: PMC4488357 DOI: 10.1371/journal.pgen.1005333] [Citation(s) in RCA: 91] [Impact Index Per Article: 10.1] [Received: 03/25/2015] [Accepted: 06/03/2015] [Indexed: 12/30/2022] Open
Abstract
Somatic mutations in the nuclear genome are required for tumor formation, but the functional consequences of somatic mitochondrial DNA (mtDNA) mutations are less understood. Here we identify somatic mtDNA mutations across 527 tumors and 14 cancer types, using an approach that takes advantage of evidence from both genomic and transcriptomic sequencing. We find that there is selective pressure against deleterious coding mutations, supporting that functional mitochondria are required in tumor cells, and also observe a strong mutational strand bias, compatible with endogenous replication-coupled errors as the major source of mutations. Interestingly, while allelic ratios in general were consistent in RNA compared to DNA, some mutations in tRNAs displayed strong allelic imbalances caused by accumulation of unprocessed tRNA precursors. The effect was explained by altered secondary structure, demonstrating that correct tRNA folding is a major determinant for processing of polycistronic mitochondrial transcripts. Additionally, the data suggest that tRNA clusters are preferably processed in the 3′ to 5′ direction. Our study gives insights into mtDNA function in cancer and answers questions regarding mitochondrial tRNA biogenesis that are difficult to address in controlled experimental systems.

According to the so-called “tRNA punctuation model”, tRNA processing is key to generating all mature mitochondrial mRNAs. However, the process is difficult to study in vivo, since standard tools for genetic manipulation are not applicable to mitochondria. Here, we circumvent this problem by using a large compendium of naturally occurring genetic perturbations, derived from human tumor sequencing data. We identify somatic mitochondrial mutations across hundreds of human tumors using an approach that simultaneously takes advantage of both genomic and transcriptomic sequencing. This enables us to compare the allele frequency in DNA and RNA for each mutation. Our data reveal that some mutations in mitochondrial tRNAs are associated with strong accumulation of immature tRNA precursors, indicative of impaired tRNA maturation. We find that intact tRNA secondary structure is a major requirement for correct maturation, and that mutations affecting tRNA folding can impair maturation of not only the affected tRNA, but also neighboring gene transcripts. Mutations in mitochondrial tRNAs underlie a range of disease conditions, and our findings may help to explain why mutations in the same tRNA can present different phenotypes. Our results additionally support that there is selective pressure against mutations affecting oxidative phosphorylation, showing that functional mitochondria are required in many tumor cells.
|
59
|
Pundhir S, Poirazi P, Gorodkin J. Emerging applications of read profiles towards the functional annotation of the genome. Front Genet 2015; 6:188. [PMID: 26042150 PMCID: PMC4437211 DOI: 10.3389/fgene.2015.00188] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 03/31/2015] [Accepted: 05/06/2015] [Indexed: 12/21/2022] Open
Abstract
Functional annotation of the genome is important to understand the phenotypic complexity of various species. The road toward functional annotation involves several challenges ranging from experiments on individual molecules to large-scale analysis of high-throughput sequencing (HTS) data. HTS data is typically a result of the protocol designed to address specific research questions. The sequencing results in reads which, when mapped to a reference genome, often form distinct patterns (read profiles). Interpretation of these read profiles is essential for their analysis in relation to the research question addressed. Several strategies have been employed at varying levels of abstraction, ranging from a somewhat ad hoc to a more systematic analysis of read profiles. These include methods which can compare read profiles, e.g., from direct (non-sequence based) alignments to classification of patterns into functional groups. In this review, we highlight the emerging applications of read profiles for the annotation of non-coding RNA and cis-regulatory elements (CREs) such as enhancers and promoters. We also discuss the biological rationale behind their formation.
|
60
|
Mirza AH, Berthelsen CH, Seemann SE, Pan X, Frederiksen KS, Vilien M, Gorodkin J, Pociot F. Transcriptomic landscape of lncRNAs in inflammatory bowel disease. Genome Med 2015; 7:39. [PMID: 25991924 PMCID: PMC4437449 DOI: 10.1186/s13073-015-0162-2] [Citation(s) in RCA: 133] [Impact Index Per Article: 14.8] [Received: 11/28/2014] [Accepted: 04/09/2015] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND Inflammatory bowel disease (IBD) is a complex multi-factorial inflammatory disease with Crohn's disease (CD) and ulcerative colitis (UC) being the two most common forms. A number of transcriptional profiling studies have provided compelling evidence that describe the role of protein-coding genes and microRNAs in modulating the immune responses in IBD. METHODS In the present study, we performed a genome-wide transcriptome profiling of lncRNAs and protein-coding genes in 96 colon pinch biopsies (inflamed and non-inflamed) extracted from multiple colonic locations from 45 patients (CD = 13, UC = 20, controls = 12) using an expression microarray platform. RESULTS In our study, we identified widespread dysregulation of lncRNAs and protein-coding genes in both inflamed and non-inflamed CD and UC compared to the healthy controls. In cases of inflamed CD and UC, we identified 438 and 745 differentially expressed lncRNAs, respectively, while in cases of the non-inflamed CD and UC, we identified 12 and 19 differentially expressed lncRNAs, respectively. We also observed significant enrichment (P-value <0.001, Pearson's Chi-squared test) for 96 differentially expressed lncRNAs and 154 protein-coding genes within the IBD susceptibility loci. Furthermore, we found strong positive expression correlations for the intersecting and cis-neighboring differentially expressed IBD loci-associated lncRNA-protein-coding gene pairs. The functional annotation analysis of differentially expressed genes revealed their involvement in the immune response, pro-inflammatory cytokine activity and MHC protein complex. CONCLUSIONS The lncRNA expression profiling in both inflamed and non-inflamed CD and UC successfully stratified IBD patients from the healthy controls. Taken together, the identified lncRNA transcriptional signature along with clinically relevant parameters suggest their potential as biomarkers in IBD.
|
61
|
Birkedal U, Christensen-Dalsgaard M, Krogh N, Sabarinathan R, Gorodkin J, Nielsen H. Profiling of ribose methylations in RNA by high-throughput sequencing. Angew Chem Int Ed Engl 2014; 54:451-5. [PMID: 25417815 DOI: 10.1002/anie.201408362] [Citation(s) in RCA: 125] [Impact Index Per Article: 12.5] [Received: 08/19/2014] [Indexed: 11/07/2022]
Abstract
Ribose methylations are the most abundant chemical modifications of ribosomal RNA and are critical for ribosome assembly and fidelity of translation. Many aspects of ribose methylations have been difficult to study due to lack of efficient mapping methods. Here, we present a sequencing-based method (RiboMeth-seq) and its application to yeast ribosomes, presently the best-studied eukaryotic model system. We demonstrate detection of the known as well as new modifications, reveal partial modifications and unexpected communication between modification events, and determine the order of modification at several sites during ribosome biogenesis. Surprisingly, the method also provides information on a subset of other modifications. Hence, RiboMeth-seq enables a detailed evaluation of the importance of RNA modifications in the cell's most sophisticated molecular machine. RiboMeth-seq can be adapted to other RNA classes, for example, mRNA, to reveal new biology involving RNA modifications.
|
62
|
Birkedal U, Christensen-Dalsgaard M, Krogh N, Sabarinathan R, Gorodkin J, Nielsen H. Profiling of Ribose Methylations in RNA by High-Throughput Sequencing. Angew Chem Int Ed Engl 2014. [DOI: 10.1002/ange.201408362] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.8] [Indexed: 11/07/2022]
|
63
|
Anthon C, Tafer H, Havgaard JH, Thomsen B, Hedegaard J, Seemann SE, Pundhir S, Kehr S, Bartschat S, Nielsen M, Nielsen RO, Fredholm M, Stadler PF, Gorodkin J. Structured RNAs and synteny regions in the pig genome. BMC Genomics 2014; 15:459. [PMID: 24917120 PMCID: PMC4124155 DOI: 10.1186/1471-2164-15-459] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 05/28/2013] [Accepted: 05/02/2014] [Indexed: 11/25/2022] Open
Abstract
Background: Annotating mammalian genomes for noncoding RNAs (ncRNAs) is nontrivial since far from all ncRNAs are known and the computational models are resource demanding. Currently, the human genome holds the best mammalian ncRNA annotation, a result of numerous efforts by several groups. However, a more direct strategy is desired for the increasing number of sequenced mammalian genomes of which some, such as the pig, are relevant as disease models and production animals. Results: We present a comprehensive annotation of structured RNAs in the pig genome. Combining sequence and structure similarity search as well as class specific methods, we obtained a conservative set with a total of 3,391 structured RNA loci of which 1,011 and 2,314, respectively, hold strong sequence and structure similarity to structured RNAs in existing databases. The RNA loci cover 139 cis-regulatory element loci, 58 lncRNA loci, 11 conflicts of annotation, and 3,183 ncRNA genes. The ncRNA genes comprise 359 miRNAs, 8 ribozymes, 185 rRNAs, 638 snoRNAs, 1,030 snRNAs, 810 tRNAs and 153 ncRNA genes not belonging to the aforementioned classes. When running the pipeline on a local shuffled version of the genome, we obtained no matches at the highest confidence level. Additional analysis of RNA-seq data from a pooled library from 10 different pig tissues added another 165 miRNA loci, yielding an overall annotation of 3,556 structured RNA loci. This annotation represents our best effort at making an automated annotation. To further enhance the reliability, 571 of the 3,556 structured RNAs were manually curated by methods depending on the RNA class while 1,581 were declared as pseudogenes. We further created a multiple alignment of pig against 20 representative vertebrates, from which RNAz predicted 83,859 de novo RNA loci with conserved RNA structures. 528 of the RNAz predictions overlapped with the homology based annotation or novel miRNAs. We further present a substantial synteny analysis which includes 1,004 lineage specific de novo RNA loci and 4 ncRNA loci in the known annotation specific for Laurasiatheria (pig, cow, dolphin, horse, cat, dog, hedgehog). Conclusions: We have obtained one of the most comprehensive annotations for structured ncRNAs of a mammalian genome, which is likely to play central roles in both health modelling and production. The core annotation is available in Ensembl 70 and the complete annotation is available at http://rth.dk/resources/rnannotator/susscr102/version1.02. Electronic supplementary material: The online version of this article (doi:10.1186/1471-2164-15-459) contains supplementary material, which is available to authorized users.
|
64
|
Pundhir S, Hannibal TD, Bang-Berthelsen CH, Wegener AMK, Pociot F, Holmberg D, Gorodkin J. Spatially conserved regulatory elements identified within human and mouse Cd247 gene using high-throughput sequencing data from the ENCODE project. Gene 2014; 545:80-7. [PMID: 24797614 DOI: 10.1016/j.gene.2014.05.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/23/2013] [Revised: 03/31/2014] [Accepted: 05/01/2014] [Indexed: 10/25/2022]
Abstract
The Cd247 gene encodes a transmembrane protein important for the expression and assembly of the TCR/CD3 complex on the surface of T lymphocytes. Down-regulation of CD247 has functional consequences in systemic autoimmunity and has been shown to be associated with Type 1 Diabetes (T1D) in the NOD mouse. In this study, we have utilized the wealth of high-throughput sequencing data produced during the Encyclopedia of DNA Elements (ENCODE) project to identify spatially conserved regulatory elements within the Cd247 gene from human and mouse. We show the presence of two transcription factor binding sites, supported by histone marks and ChIP-seq data, that specifically have features of an enhancer and a promoter, respectively. We also identified a putative long non-coding RNA from the characteristically long first intron of the Cd247 gene. The long non-coding RNA annotation is supported by manual annotations from the GENCODE project in human and our expression quantification analysis performed in NOD and B6 mice using qRT-PCR. Furthermore, 17 of the 23 SNPs already known to be implicated in T1D were observed within the long non-coding RNA region in mouse. The spatially conserved regulatory elements identified in this study have the potential to enrich our understanding of the role of the Cd247 gene in autoimmune diabetes.
|
65
|
Sabarinathan R, Wenzel A, Novotny P, Tang X, Kalari KR, Gorodkin J. Transcriptome-wide analysis of UTRs in non-small cell lung cancer reveals cancer-related genes with SNV-induced changes on RNA secondary structure and miRNA target sites. PLoS One 2014; 9:e82699. [PMID: 24416147 PMCID: PMC3885406 DOI: 10.1371/journal.pone.0082699] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 08/02/2013] [Accepted: 10/26/2013] [Indexed: 01/08/2023] Open
Abstract
Traditional mutation assessment methods generally focus on predicting disruptive changes in protein-coding regions rather than non-coding regulatory regions like untranslated regions (UTRs) of mRNAs. The UTRs, however, are known to have many sequence and structural motifs that can regulate translational and transcriptional efficiency and stability of mRNAs through interaction with RNA-binding proteins and other non-coding RNAs like microRNAs (miRNAs). In a recent study, transcriptomes of tumor cells harboring mutant and wild-type KRAS (V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog) genes in patients with non-small cell lung cancer (NSCLC) have been sequenced to identify single nucleotide variations (SNVs). About 40% of the total SNVs (73,717) identified were mapped to UTRs, but omitted in the previous analysis. To meet this obvious demand for analysis of the UTRs, we designed a comprehensive pipeline to predict the effect of SNVs on two major regulatory elements, secondary structure and miRNA target sites. Out of 29,290 SNVs in 6462 genes, we predict 472 SNVs (in 408 genes) affecting local RNA secondary structure, 490 SNVs (in 447 genes) affecting miRNA target sites and 48 that do both. Together these disruptive SNVs were present in 803 different genes, out of which 188 (23.4%) were previously known to be cancer-associated. Notably, this ratio is significantly higher (one-sided Fisher's exact test p-value = 0.032) than the ratio (20.8%) of known cancer-associated genes (n = 1347) in our initial data set (n = 6462). Network analysis shows that the genes harboring disruptive SNVs were involved in molecular mechanisms of cancer, and the signaling pathways of LPS-stimulated MAPK, IL-6, iNOS, EIF2 and mTOR. In conclusion, we have found hundreds of SNVs which are highly disruptive with respect to changes in the secondary structure and miRNA target sites within UTRs. These changes hold the potential to alter the expression of known cancer genes or genes linked to cancer-associated pathways.
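The enrichment comparison quoted in the abstract above (188 of 803 genes versus 1347 of 6462 genes) can be illustrated with a small sketch. The abstract does not spell out the contingency table behind its one-sided Fisher's exact test, so the setup here is an assumption: the 803 genes are treated as a subset drawn from the 6462 background genes, which makes the one-sided test equivalent to a hypergeometric upper tail. The function and variable names are hypothetical, and the resulting p-value is not claimed to reproduce the published one.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n items are drawn without replacement from N items,
    K of which are 'successes'. Equivalent to a one-sided Fisher's exact test
    on the corresponding 2x2 table (assumed setup, see lead-in)."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total  # exact integers; Python's big-int division yields a float

# Counts taken from the abstract
N_GENES = 6462            # genes in the initial data set
N_CANCER = 1347           # known cancer-associated genes among them
N_DISRUPTED = 803         # genes carrying predicted disruptive SNVs
N_DISRUPTED_CANCER = 188  # of those, known cancer-associated

frac_disrupted = N_DISRUPTED_CANCER / N_DISRUPTED
frac_background = N_CANCER / N_GENES
p_value = hypergeom_upper_tail(N_DISRUPTED_CANCER, N_CANCER, N_DISRUPTED, N_GENES)

print(f"cancer-associated among disrupted genes: {frac_disrupted:.1%}")
print(f"cancer-associated in background:         {frac_background:.1%}")
print(f"one-sided p-value (assumed table):       {p_value:.3f}")
```

Exact integer arithmetic via `math.comb` avoids the overflow that floating-point binomial coefficients would hit at these sizes; a statistics library would be the more usual choice in practice.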
|
66
|
Abstract
De novo discovery of "motifs" capturing the commonalities among related noncoding structured RNAs is among the most difficult problems in computational biology. This chapter outlines the challenges presented by this problem, together with some approaches towards solving them, with an emphasis on an approach based on the CMfinder program as a case study. Applications to genomic screens for novel de novo structured ncRNAs, including structured RNA elements in untranslated portions of protein-coding genes, are presented.
|
67
|
Mørk S, Pletscher-Frankild S, Palleja Caro A, Gorodkin J, Jensen LJ. Protein-driven inference of miRNA-disease associations. Bioinformatics 2013; 30:392-7. [PMID: 24273243 PMCID: PMC3904518 DOI: 10.1093/bioinformatics/btt677] [Citation(s) in RCA: 145] [Impact Index Per Article: 13.2] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION MicroRNAs (miRNAs) are a highly abundant class of non-coding RNA genes involved in cellular regulation and thus also diseases. Despite miRNAs being important disease factors, miRNA-disease associations remain low in number and of variable reliability. Furthermore, existing databases and prediction methods do not explicitly facilitate forming hypotheses about the possible molecular causes of the association, thereby making the path to experimental follow-up longer. RESULTS Here we present miRPD in which miRNA-Protein-Disease associations are explicitly inferred. Besides linking miRNAs to diseases, it directly suggests the underlying proteins involved, which can be used to form hypotheses that can be experimentally tested. The inference of miRNAs and diseases is made by coupling known and predicted miRNA-protein associations with protein-disease associations text mined from the literature. We present scoring schemes that allow us to rank miRNA-disease associations inferred from both curated and predicted miRNA targets by reliability and thereby to create high- and medium-confidence sets of associations. Analyzing these, we find statistically significant enrichment for proteins involved in pathways related to cancer and type I diabetes mellitus, suggesting either a literature bias or a genuine biological trend. We show by example how the associations can be used to extract proteins for disease hypothesis. AVAILABILITY AND IMPLEMENTATION All datasets, software and a searchable Web site are available at http://mirpd.jensenlab.org.
|
68
|
Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J. RNAsnp: efficient detection of local RNA secondary structure changes induced by SNPs. Hum Mutat 2013; 34:546-56. [PMID: 23315997 PMCID: PMC3708107 DOI: 10.1002/humu.22273] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 08/17/2012] [Accepted: 12/18/2012] [Indexed: 02/05/2023]
Abstract
Structural characteristics are essential for the functioning of many noncoding RNAs and cis-regulatory elements of mRNAs. SNPs may disrupt these structures, interfere with their molecular function, and hence cause a phenotypic effect. RNA folding algorithms can provide detailed insights into structural effects of SNPs. The global measures employed so far suffer from limited accuracy of folding programs on large RNAs and are computationally too demanding for genome-wide applications. Here, we present a strategy that focuses on the local regions of maximal structural change between mutant and wild-type. These local regions are approximated in a “screening mode” that is intended for genome-wide applications. Furthermore, localized regions are identified as those with maximal discrepancy. The mutation effects are quantified in terms of empirical P values. To this end, the RNAsnp software uses extensive precomputed tables of the distribution of SNP effects as function of length and GC content. RNAsnp thus achieves both a noise reduction and speed-up of several orders of magnitude over shuffling-based approaches. On a data set comprising 501 SNPs associated with human-inherited diseases, we predict 54 to have significant local structural effect in the untranslated region of mRNAs. RNAsnp is available at http://rth.dk/resources/rnasnp.
|
69
|
Theis C, Höner Zu Siederdissen C, Hofacker IL, Gorodkin J. Automated identification of RNA 3D modules with discriminative power in RNA structural alignments. Nucleic Acids Res 2013; 41:9999-10009. [PMID: 24005040 PMCID: PMC3905863 DOI: 10.1093/nar/gkt795] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Indexed: 12/03/2022]
Abstract
Recent progress in predicting RNA structure is moving towards filling the ‘gap’ in 2D RNA structure prediction where, for example, predicted internal loops often form non-canonical base pairs. This is increasingly recognized with the steady increase of known RNA 3D modules. There is a general interest in matching structural modules known from one molecule to other molecules for which the 3D structure is not known yet. We have created a pipeline, metaRNAmodules, which completely automates the extraction of putative modules from the FR3D database and the mapping of such modules to Rfam alignments to obtain comparative evidence. Subsequently, the modules, initially represented by a graph, are turned into models for the RMDetect program, which allows their discriminative power to be tested using real and randomized Rfam alignments. An initial extraction of 22 495 3D modules in all PDB files results in 977 internal loop and 17 hairpin modules with clear discriminatory power. Many of these modules describe only minor variants of each other. Indeed, mapping of the modules onto Rfam families results in 35 unique locations in 11 different families. The metaRNAmodules pipeline source for the internal loop modules is available at http://rth.dk/resources/mrm.
|
70
|
Pundhir S, Gorodkin J. MicroRNA discovery by similarity search to a database of RNA-seq profiles. Front Genet 2013; 4:133. [PMID: 23874353 PMCID: PMC3708161 DOI: 10.3389/fgene.2013.00133] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 02/01/2013] [Accepted: 06/21/2013] [Indexed: 01/01/2023]
Abstract
In silico search for microRNAs (miRNAs) has been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with the analysis of RNA-seq profiles, in which miRNAs typically leave the Drosha/Dicer fingerprint of one or two ~22 nt blocks of reads corresponding to the mature and star miRNA. To complement the previous methods, we present a study in which we systematically exploit these patterns of read profiles. We created two datasets comprising 2540 and 4795 read profiles obtained after preprocessing short RNA-seq data from miRBase and ENCODE, respectively. Out of 4795 ENCODE read profiles, 1361 are annotated as non-coding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), we align ncRNA read profiles from ENCODE against the miRBase read profiles (cleaned for "self-matches") and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews Correlation Coefficient (MCC) of 0.8 and an area under the curve of 0.93. Based on the dba score cut-off of 0.7, at which we observed the maximum MCC of 0.8, we predict 523 novel miRNA candidates. An additional RNA secondary structure analysis reveals that 42 of the candidates overlap with predicted conserved secondary structure. Further analysis reveals that the 523 miRNA candidates are located in genomic regions with MAF block (UCSC) fragmentation and poor sequence conservation, which in part might explain why they have been overlooked in previous efforts. We further analyzed known human and mouse miRNA read profiles and found two distinct classes: the first containing two blocks and the second containing >2 blocks of reads. The latter class also holds read profiles that have a less well defined arrangement of reads than the former class.
On comparison of miRNA read profiles from plants and animals, we observed kingdom-specific read profiles that differ from each other in both length and distribution of reads within the read profiles. All the data, as well as a server to search miRBase read profiles by uploading a BED file, are available at http://rth.dk/resources/mirdba.
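The selection of a score cut-off by maximum Matthews Correlation Coefficient, as mentioned in this abstract, can be sketched in a few lines. The MCC formula is the standard one; the scan over candidate cut-offs, the function names and the toy scores are assumptions for illustration and do not reproduce the authors' evaluation or their dba scores.

```python
from math import sqrt


def mcc_at_cutoff(scores, labels, cutoff):
    """MCC when every item with score >= cutoff is predicted positive."""
    tp = fp = tn = fn = 0
    for score, is_positive in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_positive:
            tp += 1
        elif predicted and not is_positive:
            fp += 1
        elif not predicted and is_positive:
            fn += 1
        else:
            tn += 1
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when a row or column of the table is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_cutoff(scores, labels):
    """Try every observed score as a cut-off; return (max MCC, cut-off)."""
    return max((mcc_at_cutoff(scores, labels, c), c) for c in set(scores))


# Hypothetical alignment scores and miRNA/non-miRNA labels.
toy_scores = [0.95, 0.85, 0.65, 0.45, 0.35, 0.15]
toy_labels = [True, True, True, False, False, False]
print(best_cutoff(toy_scores, toy_labels))
```

MCC uses all four cells of the confusion matrix, which makes it a reasonable single-number criterion when the positive class is a minority, as with 285 miRNAs among 1361 annotated ncRNA profiles.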
|
71
|
Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J. The RNAsnp web server: predicting SNP effects on local RNA secondary structure. Nucleic Acids Res 2013; 41:W475-9. [PMID: 23630321 PMCID: PMC3977658 DOI: 10.1093/nar/gkt291] [Citation(s) in RCA: 105] [Impact Index Per Article: 9.5] [Indexed: 11/13/2022]
Abstract
The function of many non-coding RNA genes and cis-regulatory elements of messenger RNA largely depends on the structure, which is in turn determined by their sequence. Single nucleotide polymorphisms (SNPs) and other mutations may disrupt the RNA structure, interfere with the molecular function and hence cause a phenotypic effect. RNAsnp is an efficient method to predict the effect of SNPs on local RNA secondary structure based on the RNA folding algorithms implemented in the Vienna RNA package. The SNP effects are quantified in terms of empirical P-values, which, for computational efficiency, are derived from extensive pre-computed tables of distributions of substitution effects as a function of gene length and GC content. Here, we present a web service that not only provides an interface for RNAsnp but also features a graphical output representation. In addition, the web server is connected to a local mirror of the UCSC genome browser database that enables the users to select the genomic sequences for analysis and visualize the results directly in the UCSC genome browser. The RNAsnp web server is freely available at: http://rth.dk/resources/rnasnp/.
|
72
|
Alquezar-Planas DE, Mourier T, Bruhn CAW, Hansen AJ, Vitcetz SN, Mørk S, Gorodkin J, Nielsen HA, Guo Y, Sethuraman A, Paxinos EE, Shan T, Delwart EL, Nielsen LP. Discovery of a divergent HPIV4 from respiratory secretions using second and third generation metagenomic sequencing. Sci Rep 2013; 3:2468. [PMID: 24002378 PMCID: PMC3760282 DOI: 10.1038/srep02468] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 03/27/2013] [Accepted: 07/26/2013] [Indexed: 11/13/2022]
Abstract
Molecular detection of viruses has been aided by high-throughput sequencing, permitting the genomic characterization of emerging strains. In this study, we comprehensively screened 500 respiratory secretions from children with upper and/or lower respiratory tract infections for viral pathogens. The viruses detected are described, including a divergent human parainfluenza virus type 4 from GS FLX pyrosequencing of 92 specimens. Complete full-genome characterization of the virus followed, using Single Molecule, Real-Time (SMRT) sequencing. Subsequent "primer walking" combined with Sanger sequencing validated the RS platform's utility in viral sequencing from complex clinical samples. Comparative genomics reveals that the divergent strain clusters with the only completely sequenced HPIV4a subtype. However, it also exhibits various structural features present in one of the HPIV4b reference strains, opening questions regarding the lifecycle of these viruses and the evolutionary relationships among them. Clinical data from patients infected with the strain, as well as viral prevalence estimates using real-time PCR, are also described.
|
73
|
Havgaard J, Kaur S, Gorodkin J. Comparative ncRNA gene and structure prediction using Foldalign and FoldalignM. Curr Protoc Bioinformatics 2012; Chapter 12:12.11.1-12.11.15. [PMID: 22948726 DOI: 10.1002/0471250953.bi1211s39] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/07/2022]
Abstract
This unit describes how to use Foldalign and FoldalignM to make structural alignments of non-protein-coding-RNA (ncRNA). These tools can be used to find new ncRNAs, to find the structure of novel ncRNAs, and to improve alignments for known ncRNAs.
|
74
|
Groenen MAM, Archibald AL, Uenishi H, Tuggle CK, Takeuchi Y, Rothschild MF, Rogel-Gaillard C, Park C, Milan D, Megens HJ, Li S, Larkin DM, Kim H, Frantz LAF, Caccamo M, Ahn H, Aken BL, Anselmo A, Anthon C, Auvil L, Badaoui B, Beattie CW, Bendixen C, Berman D, Blecha F, Blomberg J, Bolund L, Bosse M, Botti S, Bujie Z, Bystrom M, Capitanu B, Carvalho-Silva D, Chardon P, Chen C, Cheng R, Choi SH, Chow W, Clark RC, Clee C, Crooijmans RPMA, Dawson HD, Dehais P, De Sapio F, Dibbits B, Drou N, Du ZQ, Eversole K, Fadista J, Fairley S, Faraut T, Faulkner GJ, Fowler KE, Fredholm M, Fritz E, Gilbert JGR, Giuffra E, Gorodkin J, Griffin DK, Harrow JL, Hayward A, Howe K, Hu ZL, Humphray SJ, Hunt T, Hornshøj H, Jeon JT, Jern P, Jones M, Jurka J, Kanamori H, Kapetanovic R, Kim J, Kim JH, Kim KW, Kim TH, Larson G, Lee K, Lee KT, Leggett R, Lewin HA, Li Y, Liu W, Loveland JE, Lu Y, Lunney JK, Ma J, Madsen O, Mann K, Matthews L, McLaren S, Morozumi T, Murtaugh MP, Narayan J, Nguyen DT, Ni P, Oh SJ, Onteru S, Panitz F, Park EW, Park HS, Pascal G, Paudel Y, Perez-Enciso M, Ramirez-Gonzalez R, Reecy JM, Rodriguez-Zas S, Rohrer GA, Rund L, Sang Y, Schachtschneider K, Schraiber JG, Schwartz J, Scobie L, Scott C, Searle S, Servin B, Southey BR, Sperber G, Stadler P, Sweedler JV, Tafer H, Thomsen B, Wali R, Wang J, Wang J, White S, Xu X, Yerle M, Zhang G, Zhang J, Zhang J, Zhao S, Rogers J, Churcher C, Schook LB. Analyses of pig genomes provide insight into porcine demography and evolution. Nature 2012; 491:393-8. [PMID: 23151582 PMCID: PMC3566564 DOI: 10.1038/nature11622] [Citation(s) in RCA: 947] [Impact Index Per Article: 78.9] [Received: 06/16/2012] [Accepted: 09/27/2012] [Indexed: 01/03/2023]
Abstract
For 10,000 years pigs and humans have shared a close and complex relationship. From domestication to modern breeding practices, humans have shaped the genomes of domestic pigs. Here we present the assembly and analysis of the genome sequence of a female domestic Duroc pig (Sus scrofa) and a comparison with the genomes of wild and domestic pigs from Europe and Asia. Wild pigs emerged in South East Asia and subsequently spread across Eurasia. Our results reveal a deep phylogenetic split between European and Asian wild boars ∼1 million years ago, and a selective sweep analysis indicates selection on genes involved in RNA processing and regulation. Genes associated with immune response and olfaction exhibit fast evolution. Pigs have the largest repertoire of functional olfactory receptor genes, reflecting the importance of smell in this scavenging animal. The pig genome sequence provides an important resource for further improvements of this important livestock species, and our identification of many putative disease-causing variants extends the potential of the pig as a biomedical model.
|
75
|
Podolska A, Anthon C, Bak M, Tommerup N, Skovgaard K, Heegaard PM, Gorodkin J, Cirera S, Fredholm M. Profiling microRNAs in lung tissue from pigs infected with Actinobacillus pleuropneumoniae. BMC Genomics 2012; 13:459. [PMID: 22953717 PMCID: PMC3465251 DOI: 10.1186/1471-2164-13-459] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 01/16/2012] [Accepted: 08/29/2012] [Indexed: 12/25/2022]
Abstract
Background: MicroRNAs (miRNAs) are a class of non-protein-coding genes that play a crucial regulatory role in mammalian development and disease. Whereas a large number of miRNAs have been annotated at the structural level in recent years, functional annotation is sparse. Actinobacillus pleuropneumoniae (APP) causes serious lung infections in pigs. Severe damage to the lungs, in many cases deadly, is caused by toxins released by the bacterium and to some degree by host-mediated tissue damage. However, understanding of the role of microRNAs in the course of this infectious disease in pigs is still very limited. Results: In this study, the RNA extracted from visually unaffected and necrotic tissue from pigs infected with Actinobacillus pleuropneumoniae was subjected to small RNA deep sequencing. We identified 169 conserved and 11 candidate novel microRNAs in the pig. Of these, 17 were significantly up-regulated in the necrotic sample and 12 were down-regulated. The expression analysis of a number of candidates revealed microRNAs of potential importance in the innate immune response. MiR-155, a known key player in inflammation, was found expressed in both samples. Moreover, miR-664-5p, miR-451 and miR-15a appear as very promising candidates for microRNAs involved in response to pathogen infection. Conclusions: This is the first study revealing significant differences in composition and expression profiles of miRNAs in lungs infected with a bacterial pathogen. Our results extend annotation of microRNA in pig and provide insight into the role of a number of microRNAs in regulation of bacteria-induced immune and inflammatory response in porcine lung.
|