26. Doncheva NT, Palasca O, Yarani R, Litman T, Anthon C, Groenen MAM, Stadler PF, Pociot F, Jensen LJ, Gorodkin J. Human pathways in animal models: possibilities and limitations. Nucleic Acids Res 2021; 49:1859-1871. [PMID: 33524155 PMCID: PMC7913694 DOI: 10.1093/nar/gkab012] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 09/08/2020] [Revised: 12/08/2020] [Accepted: 01/07/2021] [Indexed: 12/20/2022]
Abstract
Animal models are crucial for advancing our knowledge about the molecular pathways involved in human diseases. However, it remains unclear to what extent tissue expression of pathways in healthy individuals is conserved between species. In addition, organism-specific information on pathways in animal models is often lacking. Within these limitations, we explore the possibilities that arise from publicly available data for the animal models mouse, rat, and pig. We approximate animal pathway activity by integrating the human counterparts of curated pathways with tissue expression data from the models. Specifically, we compare whether the animal orthologs of the human genes are expressed in the same tissue. This is complicated by the lower coverage and poorer quality of data in rat and pig as compared to mouse. Despite that, from 203 human KEGG pathways and the seven tissues with the best experimental coverage, we identify 95 distinct pathways for which the tissue expression in one animal model agrees better with human than in the others. Our systematic pathway-tissue comparison between human and three animal models points to specific similarities with human and to distinct differences among the animal models, thereby suggesting the most suitable organism for modeling a human pathway or tissue.
27. Geissler AS, Anthon C, Alkan F, González-Tortuero E, Poulsen LD, Kallehauge TB, Breüner A, Seemann SE, Vinther J, Gorodkin J. BSGatlas: a unified Bacillus subtilis genome and transcriptome annotation atlas with enhanced information access. Microb Genom 2021; 7:000524. [PMID: 33539279 PMCID: PMC8208703 DOI: 10.1099/mgen.0.000524] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/29/2020] [Accepted: 01/11/2021] [Indexed: 12/26/2022]
Abstract
A large part of our current understanding of gene regulation in Gram-positive bacteria is based on Bacillus subtilis, as it is one of the best-studied bacterial model systems. The rapidly growing body of data on its molecular and genomic biology is distributed across multiple annotation resources. Consequently, the interpretation of data from further B. subtilis experiments becomes increasingly challenging in both small- and large-scale analyses. Additionally, B. subtilis annotation of structured RNA and non-coding RNA (ncRNA), as well as of the operon structure, still lags behind the annotation of the coding sequences. To address these challenges, we created the B. subtilis genome atlas, BSGatlas, which integrates and unifies multiple existing annotation resources. Compared to any of the individual resources, the BSGatlas contains twice as many ncRNAs, while improving the positional annotation for 70% of the ncRNAs. Furthermore, we combined known transcription start and termination sites with lists of known co-transcribed gene sets to create a comprehensive transcript map. The combination with transcription start/termination site annotations resulted in 717 new sets of co-transcribed genes and 5335 untranslated regions (UTRs). In comparison to existing resources, the number of 5' and 3' UTRs increased nearly fivefold, and the number of internal UTRs doubled. The transcript map is organized in 2266 operons, which provide transcriptional annotation for 92% of all genes in the genome, compared to at most 82% by previous resources. We predicted an off-target-aware genome-wide library of CRISPR-Cas9 guide RNAs, which we also linked to polycistronic operons. We provide the BSGatlas in multiple forms: as a website (https://rth.dk/resources/bsgatlas/), as an annotation hub for display in the UCSC genome browser, as supplementary tables, and in standardized GFF3 format, which can be used in large-scale -omics studies.
By complementing existing resources, the BSGatlas supports analyses of the B. subtilis genome and its molecular biology with respect to not only non-coding genes but also genome-wide transcriptional relationships of all genes.
28. Sweeney BA, Petrov AI, Ribas CE, Finn RD, Bateman A, Szymanski M, Karlowski WM, Seemann SE, Gorodkin J, Cannone JJ, Gutell RR, Kay S, Marygold S, dos Santos G, Frankish A, Mudge JM, Barshir R, Fishilevich S, Chan PP, Lowe TM, Seal R, Bruford E, Panni S, Porras P, Karagkouni D, Hatzigeorgiou AG, Ma L, Zhang Z, Volders PJ, Mestdagh P, Griffiths-Jones S, Fromm B, Peterson KJ, Kalvari I, Nawrocki EP, Petrov AS, Weng S, Bouchard-Bourelle P, Scott M, Lui LM, Hoksza D, Lovering RC, Kramarz B, Mani P, Ramachandran S, Weinberg Z. RNAcentral 2021: secondary structure integration, improved sequence search and new member databases. Nucleic Acids Res 2021; 49:D212-D220. [PMID: 33106848 PMCID: PMC7779037 DOI: 10.1093/nar/gkaa921] [Citation(s) in RCA: 124] [Impact Index Per Article: 41.3] [Received: 09/15/2020] [Accepted: 10/05/2020] [Indexed: 12/16/2022]
Abstract
RNAcentral is a comprehensive database of non-coding RNA (ncRNA) sequences that provides a single access point to 44 RNA resources and >18 million ncRNA sequences from a wide range of organisms and RNA types. RNAcentral now also includes secondary (2D) structure information for >13 million sequences, making RNAcentral the world's largest RNA 2D structure database. The 2D diagrams are displayed using R2DT, a new 2D structure visualization method that uses consistent, reproducible and recognizable layouts for related RNAs. The sequence similarity search has been updated with a faster interface featuring facets for filtering search results by RNA type, organism, source database or any keyword. This sequence search tool is available as a reusable web component, and has been integrated into several RNAcentral member databases, including Rfam, miRBase and snoDB. To allow for a more fine-grained assignment of RNA types and subtypes, all RNAcentral sequences have been annotated with Sequence Ontology terms. The RNAcentral database continues to grow and provide a central data resource for the RNA community. RNAcentral is freely available at https://rnacentral.org.
29. Creutzburg SCA, Wu WY, Mohanraju P, Swartjes T, Alkan F, Gorodkin J, Staals RHJ, van der Oost J. Good guide, bad guide: spacer sequence-dependent cleavage efficiency of Cas12a. Nucleic Acids Res 2020; 48:3228-3243. [PMID: 31989168 PMCID: PMC7102956 DOI: 10.1093/nar/gkz1240] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Received: 01/18/2019] [Revised: 12/19/2019] [Accepted: 01/24/2020] [Indexed: 12/26/2022]
Abstract
Genome editing has recently undergone a revolutionary development with the introduction of CRISPR-Cas technology. The programmable CRISPR-associated Cas9 and Cas12a nucleases generate specific dsDNA breaks in the genome, after which host DNA-repair mechanisms can be manipulated to implement the desired edit. Despite this spectacular progress, the efficiency of Cas9/Cas12a-based engineering can still be improved. Here, we address the variation in guide-dependent efficiency of Cas12a and set out to reveal the molecular basis of this phenomenon. We established a sensitive and robust in vivo targeting assay based on loss of a target plasmid encoding the red fluorescent protein (mRFP). Our results suggest that folding of both the precursor guide (pre-crRNA) and the mature guide (crRNA) has a major influence on Cas12a activity. In particular, base pairing of the direct repeat with anything other than itself was found to be detrimental to the activity of Cas12a. Furthermore, we describe different approaches to minimize base-pairing interactions between the direct repeat and the variable part of the guide. We show that the design of the 3' end of the guide, which is not involved in target strand base pairing, can substantially improve the guide's targeting potential and hence its genome editing efficiency.
30. Jacobsen MJ, Havgaard JH, Anthon C, Mentzel CMJ, Cirera S, Krogh PM, Pundhir S, Karlskov-Mortensen P, Bruun CS, Lesnik P, Guerin M, Gorodkin J, Jørgensen CB, Fredholm M, Barrès R. Epigenetic and Transcriptomic Characterization of Pure Adipocyte Fractions From Obese Pigs Identifies Candidate Pathways Controlling Metabolism. Front Genet 2019; 10:1268. [PMID: 31921306 PMCID: PMC6927937 DOI: 10.3389/fgene.2019.01268] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/19/2019] [Accepted: 11/18/2019] [Indexed: 12/11/2022]
Abstract
Reprogramming of adipocyte function in obesity is implicated in metabolic disorders such as type 2 diabetes. Here, we used the pig, an animal model sharing many physiological and pathophysiological similarities with humans, to perform in-depth epigenomic and transcriptomic characterization of pure adipocyte fractions. Using a combined DNA methylation capture sequencing and reduced representation bisulfite sequencing (RRBS) strategy in 11 lean and 12 obese pigs, we identified 3529 differentially methylated regions (DMRs) located in close proximity to, or within, genes in the adipocytes. By sequencing the transcriptome from the same fraction of isolated adipocytes, we identified 276 differentially expressed transcripts with at least one DMR. These transcripts were over-represented in gene pathways related to MAPK, metabolic and insulin signaling. Using a candidate gene approach, we further characterized 13 genes potentially regulated by DNA methylation and identified putative transcription factor binding sites that could be affected by the differential methylation in obesity. Our data constitute a valuable resource for further investigations aiming to delineate the epigenetic etiology of metabolic disorders.
31. Zaucker A, Nagorska A, Kumari P, Hecker N, Wang Y, Huang S, Cooper L, Sivashanmugam L, VijayKumar S, Brosens J, Gorodkin J, Sampath K. Translational co-regulation of a ligand and inhibitor by a conserved RNA element. Nucleic Acids Res 2018; 46:104-119. [PMID: 29059375 PMCID: PMC5758872 DOI: 10.1093/nar/gkx938] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/29/2017] [Accepted: 10/03/2017] [Indexed: 12/20/2022]
Abstract
In many organisms, transcriptional and post-transcriptional regulation of components of pathways or processes has been reported. However, to date, there are few reports of translational co-regulation of multiple components of a developmental signaling pathway. Here, we show that an RNA element that we previously identified as a dorsal localization element (DLE) in the 3'UTR of zebrafish nodal-related1/squint (ndr1/sqt) ligand mRNA is shared by the related ligand nodal-related2/cyclops (ndr2/cyc) and by the nodal inhibitors lefty1 (lft1) and lefty2 (lft2) mRNAs. We investigated the activity of the DLEs through functional assays in live zebrafish embryos. The lft1 DLE localizes fluorescently labeled RNA similarly to the ndr1/sqt DLE. Like the ndr1/sqt 3'UTR, the lft1 and lft2 3'UTRs are bound by the RNA-binding protein (RBP) and translational repressor Y-box binding protein 1 (Ybx1), whereas deletions in the DLE abolish binding to Ybx1. Analysis of zebrafish ybx1 mutants shows that Ybx1 represses lefty1 translation in embryos. CRISPR/Cas9-mediated inactivation of human YBX1 also results in human NODAL translational de-repression, suggesting broader conservation of the DLE RNA element/Ybx1 RBP module in regulation of Nodal signaling. Our findings demonstrate translational co-regulation of components of a signaling pathway by an RBP and an RNA element conserved in both sequence and structure, revealing a 'translational regulon'.
32. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data. J Proteome Res 2019. [PMID: 30450911 DOI: 10.1101/438192] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 05/01/2023]
Abstract
Protein networks have become a popular tool for analyzing and visualizing the often long lists of proteins or genes obtained from proteomics and other high-throughput technologies. One of the most popular sources of such networks is the STRING database, which provides protein networks for more than 2000 organisms, including both physical interactions from experimental data and functional associations from curated pathways, automatic text mining, and prediction methods. However, its web interface is mainly intended for inspection of small networks and their underlying evidence. The Cytoscape software, on the other hand, is much better suited for working with large networks and offers greater flexibility in terms of network analysis, import, and visualization of additional data. To include both resources in the same workflow, we created stringApp, a Cytoscape app that makes it easy to import STRING networks into Cytoscape, retains the appearance and many of the features of STRING, and integrates data from associated databases. Here, we introduce many of the stringApp features and show how they can be used to carry out complex network analysis and visualization tasks on a typical proteomics data set, all through the Cytoscape user interface. stringApp is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/stringapp .
33. Eiberg H, Mikkelsen AF, Bak M, Tommerup N, Lund AM, Wenzel A, Sabarinathan R, Gorodkin J, Bang-Berthelsen CH, Hansen L. A splice-site variant in the lncRNA gene RP1-140A9.1 cosegregates in the large Volkmann cataract family. Mol Vis 2019; 25:1-11. [PMID: 30820140 PMCID: PMC6377377] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2018] [Accepted: 01/17/2019] [Indexed: 11/12/2022]
Abstract
Purpose To identify the mutation for Volkmann cataract (CTRCT8) at 1p36.33. Methods The genes in the candidate region 1p36.33 were Sanger and parallel deep sequenced, and informative single nucleotide polymorphisms (SNPs) were identified for linkage analysis. Expression analysis of the candidate gene with reverse transcription polymerase chain reaction (RT-PCR) was performed using RNA from different human tissues. Quantitative reverse transcription polymerase chain reaction (qRT-PCR) analysis of the GNB1 gene was performed in affected and healthy individuals. Bioinformatic analysis of the linkage regions, including the candidate gene, was performed. Results Linkage analysis of the 1p36.33 CCV locus applying new marker systems obtained with Sanger and deep sequencing reduced the candidate locus from 2.1 Mb to 0.389 Mb, flanked by the markers STS-22AC and rs549772338, and resulted in a logarithm of the odds (LOD) score of Z = 21.67. The identified mutation, rs763295804, affects the donor splice site in the long non-coding RNA gene RP1-140A9.1 (ENSG00000231050). The gene, including its splice-site junctions, is conserved in primates but not in other mammalian genomes, and two alternative transcripts were shown with RT-PCR. One of these transcripts represented a lens cell-specific transcript. Meta-analysis of Cross-Linking Immunoprecipitation sequencing (CLIP-Seq) data suggested that the RNA-binding protein (RBP) eIF4AIII is an active counterpart of RP1-140A9.1, and several miRNA and transcription factor binding sites were predicted in the proximity of the mutation. ENCODE DNase I hypersensitivity and histone methylation and acetylation data suggest that the genomic region may have regulatory functions. Conclusions The mutation in RP1-140A9.1 points to this long non-coding RNA as the candidate cataract gene associated with the autosomal dominant inherited congenital cataract in CCV. The mutation has the potential to disrupt exon/intron splicing of both transcripts of RP1-140A9.1.
Sanger and massive deep resequencing of the linkage region failed to identify alternative candidates suggesting the mutation in RP1-140A9.1 is causative for the CCV phenotype.
34. Kirsch R, Seemann SE, Ruzzo WL, Cohen SM, Stadler PF, Gorodkin J. Identification and characterization of novel conserved RNA structures in Drosophila. BMC Genomics 2018; 19:899. [PMID: 30537930 PMCID: PMC6288889 DOI: 10.1186/s12864-018-5234-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 03/22/2018] [Accepted: 11/08/2018] [Indexed: 12/20/2022]
Abstract
BACKGROUND Comparative genomics approaches have facilitated the discovery of many novel non-coding and structured RNAs (ncRNAs). The increasing availability of related genomes now makes it possible to systematically search for compensatory base changes, and thus for conserved secondary structures, even in genomic regions that are poorly alignable in the primary sequence. The wealth of available transcriptome data can add valuable insight into expression and possible function for new ncRNA candidates. Earlier work identifying ncRNAs in Drosophila melanogaster made use of sequence-based alignments and employed a sliding window approach, inevitably biasing identification toward RNAs encoded in the more conserved parts of the genome. RESULTS To search for conserved RNA structures (CRSs) that may not be highly conserved in sequence and to assess the expression of CRSs, we conducted a genome-wide structural alignment screen of 27 insect genomes including D. melanogaster and integrated this with an extensive set of tiling array data. The structural alignment screen revealed ∼30,000 novel candidate CRSs at an estimated false discovery rate of less than 10%. With more than one quarter of all individual CRS motifs showing sequence identities below 60%, the predicted CRSs largely complement the findings of sliding window approaches applied previously. While a sixth of the CRSs were ubiquitously expressed, we found that most were expressed in specific developmental stages or cell lines. Notably, the most statistically significant enrichments of CRSs were observed in pupae, mainly in exons of untranslated regions, promoters, enhancers, and long ncRNAs. Interestingly, cell lines were found to express a different set of CRSs than was found in vivo. Only a small fraction of intergenic CRSs were co-expressed with the adjacent protein-coding genes, which suggests that most intergenic CRSs are independent genetic units.
CONCLUSIONS This study provides a more comprehensive view of the ncRNA transcriptome in fly as well as evidence for differential expression of CRSs during development and in cell lines.
35. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: Network Analysis and Visualization of Proteomics Data. J Proteome Res 2018; 18:623-632. [PMID: 30450911 DOI: 10.1021/acs.jproteome.8b00702] [Citation(s) in RCA: 1049] [Impact Index Per Article: 174.8] [Indexed: 12/11/2022]
Abstract
Protein networks have become a popular tool for analyzing and visualizing the often long lists of proteins or genes obtained from proteomics and other high-throughput technologies. One of the most popular sources of such networks is the STRING database, which provides protein networks for more than 2000 organisms, including both physical interactions from experimental data and functional associations from curated pathways, automatic text mining, and prediction methods. However, its web interface is mainly intended for inspection of small networks and their underlying evidence. The Cytoscape software, on the other hand, is much better suited for working with large networks and offers greater flexibility in terms of network analysis, import, and visualization of additional data. To include both resources in the same workflow, we created stringApp, a Cytoscape app that makes it easy to import STRING networks into Cytoscape, retains the appearance and many of the features of STRING, and integrates data from associated databases. Here, we introduce many of the stringApp features and show how they can be used to carry out complex network analysis and visualization tasks on a typical proteomics data set, all through the Cytoscape user interface. stringApp is freely available from the Cytoscape app store: http://apps.cytoscape.org/apps/stringapp .
36. Alkan F, Wenzel A, Anthon C, Havgaard JH, Gorodkin J. CRISPR-Cas9 off-targeting assessment with nucleic acid duplex energy parameters. Genome Biol 2018; 19:177. [PMID: 30367669 PMCID: PMC6203265 DOI: 10.1186/s13059-018-1534-x] [Citation(s) in RCA: 86] [Impact Index Per Article: 14.3] [Received: 05/05/2018] [Accepted: 09/11/2018] [Indexed: 12/19/2022]
Abstract
BACKGROUND Recent experimental studies of CRISPR-Cas9 systems have shown that off-target binding and cleavage are a concern for the system and that this is highly dependent on the selected guide RNA (gRNA) design. Computational predictions of off-targets have been proposed as an attractive and more feasible alternative to tedious experimental efforts. However, accurate scoring of the large number of putative off-targets plays a key role in the success of computational off-targeting assessment. RESULTS We present an approximate binding energy model for the Cas9-gRNA-DNA complex, which systematically combines the energy parameters obtained for RNA-RNA, DNA-DNA, and RNA-DNA duplexes. Based on this model, two novel off-target assessment methods for gRNA selection in CRISPR-Cas9 applications are introduced: CRISPRoff to assign confidence scores to predicted off-targets and CRISPRspec to measure the specificity of the gRNA. We benchmark the methods against current state-of-the-art methods and show that both are in better agreement with experimental results. Furthermore, we show significant evidence supporting the inverse relationship between the on-target cleavage efficiency and specificity of the system, in which the introduced binding energies are key components. CONCLUSIONS The impact of the binding energies provides a direction for further studies of off-targeting mechanisms. The performance of CRISPRoff and CRISPRspec enables more accurate off-target evaluation for gRNA selection prior to any CRISPR-Cas9 genome-editing application. For given gRNA sequences or all potential gRNAs in a given target region, CRISPRoff-based off-target predictions and CRISPRspec-based specificity evaluations can be carried out through our webserver at https://rth.dk/resources/crispr/ .
37. Palasca O, Santos A, Stolte C, Gorodkin J, Jensen LJ. TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database (Oxford) 2018; 2018:4851151. [PMID: 29617745 PMCID: PMC5808782 DOI: 10.1093/database/bay003] [Citation(s) in RCA: 115] [Impact Index Per Article: 19.2] [Received: 04/28/2017] [Accepted: 01/04/2018] [Indexed: 11/13/2022]
Abstract
Physiological and molecular similarities between organisms make it possible to translate findings from simpler experimental systems (model organisms) into more complex ones, such as human. This translation facilitates the understanding of biological processes under normal or disease conditions. Researchers aiming to identify the similarities and differences between organisms at the molecular level need resources collecting multi-organism tissue expression data. We have developed a database of gene–tissue associations in human, mouse, rat and pig by integrating multiple sources of evidence: transcriptomics covering all four species and proteomics (human only), manually curated and mined from the scientific literature. Through a scoring scheme, these associations are made comparable across all sources of evidence and across organisms. Furthermore, the scoring produces a confidence score assigned to each of the associations. The TISSUES database (version 2.0) is publicly accessible through a user-friendly web interface and as part of the STRING app for Cytoscape. In addition, we analyzed the agreement between datasets, across and within organisms, and found that the agreement is mainly affected by the quality of the datasets rather than by the technologies used or the organisms compared. Database URL: http://tissues.jensenlab.org/
38. Pan X, Jensen LJ, Gorodkin J. Inferring disease-associated long non-coding RNAs using genome-wide tissue expression profiles. Bioinformatics 2018; 35:1494-1502. [DOI: 10.1093/bioinformatics/bty859] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 04/30/2018] [Revised: 08/28/2018] [Accepted: 10/04/2018] [Indexed: 11/13/2022]
39. Eichenlaub T, Villadsen R, Freitas FCP, Andrejeva D, Aldana BI, Nguyen HT, Petersen OW, Gorodkin J, Herranz H, Cohen SM. Warburg Effect Metabolism Drives Neoplasia in a Drosophila Genetic Model of Epithelial Cancer. Curr Biol 2018; 28:3220-3228.e6. [PMID: 30293715 DOI: 10.1016/j.cub.2018.08.035] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Received: 11/20/2017] [Revised: 05/21/2018] [Accepted: 08/15/2018] [Indexed: 01/08/2023]
Abstract
Cancers develop in a complex mutational landscape. Genetic models of tumor formation have been used to explore how combinations of mutations cooperate to promote tumor formation in vivo. Here, we identify lactate dehydrogenase (LDH), a key enzyme in Warburg effect metabolism, as a cooperating factor that is both necessary and sufficient for epidermal growth factor receptor (EGFR)-driven epithelial neoplasia and metastasis in a Drosophila model. LDH is upregulated during the transition from hyperplasia to neoplasia, and neoplasia is prevented by LDH depletion. Elevated LDH is sufficient to drive this transition. Notably, genetic alterations that increase glucose flux, or a high-sugar diet, are also sufficient to promote EGFR-driven neoplasia, and this depends on LDH activity. We provide evidence that increased LDHA expression promotes a transformed phenotype in a human primary breast cell culture model. Furthermore, analysis of publicly available cancer data showed evidence of synergy between elevated EGFR and LDHA activity linked to poor clinical outcome in a number of human cancers. Altered metabolism has generally been assumed to be an enabling feature that accelerates cancer cell proliferation. Our findings provide evidence that sugar metabolism may have a more profound role in driving neoplasia than previously appreciated.
40. Pan X, Wenzel A, Jensen LJ, Gorodkin J. Genome-wide identification of clusters of predicted microRNA binding sites as microRNA sponge candidates. PLoS One 2018; 13:e0202369. [PMID: 30142196 PMCID: PMC6108476 DOI: 10.1371/journal.pone.0202369] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/30/2018] [Accepted: 08/01/2018] [Indexed: 12/21/2022]
Abstract
The number of discovered natural miRNA sponges in plants, viruses, and mammals is increasing steadily. Some sponges, like ciRS-7 for miR-7, contain multiple nearby miRNA binding sites. We hypothesize that such clusters of miRNA binding sites on the genome can function together as a sponge. No systematic effort has yet been made to search for clusters of miRNA targets. Here, to our knowledge, we make the first genome-wide target site predictions for clusters of mature human miRNAs. For each miRNA, we predict the target sites on a genome-wide scale, build a graph with edge weights based on the pairwise distances between sites, and apply Markov clustering to identify genomic regions with high binding site density. Significant clusters are then extracted based on the cluster size difference between real and shuffled genomes, with shuffling that preserves local properties such as the GC content. We then use conservation and binding energy to filter a final set of miRNA target site clusters, or sponge candidates. Our pipeline predicts 3673 sponge candidates for 1250 miRNAs, including the experimentally verified miR-7 sponge ciRS-7. In addition, we explicitly point to 19 high-confidence candidates overlapping annotated genomic sequence. The full list of candidates is freely available at http://rth.dk/resources/mirnasponge, where detailed properties of individual candidates, such as alignment details, conservation, accessibility and target profiles, can be explored, which facilitates selection of sponge candidates for further context-specific analysis.
41. Brogaard L, Larsen LE, Heegaard PMH, Anthon C, Gorodkin J, Dürrwald R, Skovgaard K. IFN-λ and microRNAs are important modulators of the pulmonary innate immune response against influenza A (H1N2) infection in pigs. PLoS One 2018; 13:e0194765. [PMID: 29677213 PMCID: PMC5909910 DOI: 10.1371/journal.pone.0194765] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/28/2017] [Accepted: 02/02/2018] [Indexed: 11/19/2022]
Abstract
The innate immune system is paramount in the response to and clearance of influenza A virus (IAV) infection in non-immune individuals. Known factors include type I and III interferons and antiviral pathogen recognition receptors, and the cascades of antiviral and pro- and anti-inflammatory gene expression they induce. MicroRNAs (miRNAs) are increasingly recognized to participate in post-transcriptional modulation of these responses, but the temporal dynamics of how these players of the antiviral innate immune response collaborate to combat infection remain poorly characterized. We quantified the expression of miRNAs and protein-coding genes in the lungs of pigs 1, 3, and 14 days after challenge with swine IAV (H1N2). Through RT-qPCR we observed a 400-fold relative increase in IFN-λ3 gene expression on day 1 after challenge, and a strong interferon-mediated antiviral response was observed on days 1 and 3, accompanied by up-regulation of genes related to the pro-inflammatory response and apoptosis. Using small RNA sequencing and qPCR validation, we found 27 miRNAs that were differentially expressed after challenge, with the highest number of regulated miRNAs observed on day 3. In contrast, the number of protein-coding genes found to be regulated due to IAV infection peaked on day 1. Pulmonary miRNAs may thus be aimed at fine-tuning the initial rapid inflammatory response after IAV infection. Specifically, we found five miRNAs (ssc-miR-15a, ssc-miR-18a, ssc-miR-21, ssc-miR-29b, and hsa-miR-590-3p; four known porcine miRNAs and one novel porcine miRNA candidate) to be potential modulators of viral pathogen recognition and apoptosis. A total of 11 miRNAs remained differentially expressed 14 days after challenge, at which point the infection had cleared.
In conclusion, the results suggested a role for miRNAs both during acute infection as well as later, with the potential to influence lung homeostasis and susceptibility to secondary infections in the lungs of pigs after IAV infection.
42
Abstract
Over the last two decades it has become clear that RNA is much more than just a boring intermediate in protein expression. Ancient RNAs still appear in core information metabolism and comprise a surprisingly large component of bacterial gene regulation. A common theme of these mostly small RNAs is their reliance on conserved secondary structures. Large-scale sequencing projects, on the other hand, have profoundly changed our understanding of eukaryotic genomes. These pervasively transcribed genomes give rise to a plethora of large and evolutionarily extremely flexible noncoding RNAs that exert a vastly diverse array of molecular functions. In this chapter we provide a (necessarily incomplete) overview of the current state of comparative analysis of noncoding RNAs, emphasizing computational approaches as a means to gain a global picture of the modern RNA world.
43
Palasca O, Santos A, Stolte C, Gorodkin J, Jensen LJ. TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database (Oxford) 2018; 2018:4939216. [PMID: 30403794 PMCID: PMC5855096 DOI: 10.1093/database/bay028] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 11/13/2022]
44
Kato Y, Gorodkin J, Havgaard JH. Alignment-free comparative genomic screen for structured RNAs using coarse-grained secondary structure dot plots. BMC Genomics 2017; 18:935. [PMID: 29197323 PMCID: PMC5712110 DOI: 10.1186/s12864-017-4309-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/30/2017] [Accepted: 11/15/2017] [Indexed: 01/01/2023]
Abstract
Background: Structured non-coding RNAs play many different roles in cells, but annotation of these RNAs is lacking even for the human genome. Currently available computational tools are either too computationally heavy for full genomic screens or rely on pre-aligned sequences. Methods: Here we present a fast and efficient method, DotcodeR, for detecting structurally similar RNAs in genomic sequences by comparing their corresponding coarse-grained secondary structure dot plots at the string level. This allows us to perform an all-against-all scan of all window pairs from two genomes without alignment. Results: Our computational experiments with simulated data and real chromosomes demonstrate that the presented method has good sensitivity. Conclusions: DotcodeR can be useful as a pre-filter in a genomic comparative scan for structured RNAs. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4309-y) contains supplementary material, which is available to authorized users.
45
Miladi M, Junge A, Costa F, Seemann SE, Havgaard JH, Gorodkin J, Backofen R. RNAscClust: clustering RNA sequences using structure conservation and graph based motifs. Bioinformatics 2017; 33:2089-2096. [PMID: 28334186 PMCID: PMC5870858 DOI: 10.1093/bioinformatics/btx114] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/06/2016] [Revised: 12/22/2016] [Accepted: 02/23/2017] [Indexed: 12/22/2022]
Abstract
MOTIVATION: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take into account the compensatory base pair changes obtained from structure conservation in orthologous sequences. RESULTS: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. For a set of multiple structural alignments of RNA sequences, each containing a paralog sequence included in a structural alignment of its orthologs, RNAscClust computes minimum free-energy structures for each sequence using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments. AVAILABILITY AND IMPLEMENTATION: RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust. CONTACT: gorodkin@rth.dk or backofen@informatik.uni-freiburg.de. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
46
Seemann SE, Mirza AH, Hansen C, Bang-Berthelsen CH, Garde C, Christensen-Dalsgaard M, Torarinsson E, Yao Z, Workman CT, Pociot F, Nielsen H, Tommerup N, Ruzzo WL, Gorodkin J. The identification and functional annotation of RNA structures conserved in vertebrates. Genome Res 2017; 27:1371-1383. [PMID: 28487280 PMCID: PMC5538553 DOI: 10.1101/gr.208652.116] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 04/19/2016] [Accepted: 05/04/2017] [Indexed: 01/15/2023]
Abstract
Structured elements of RNA molecules are essential in, e.g., RNA stabilization, localization, and protein interaction, and their conservation across species suggests a common functional role. We computationally screened vertebrate genomes for conserved RNA structures (CRSs), leveraging structure-based, rather than sequence-based, alignments. After careful correction for sequence identity and GC content, we predict ∼516,000 human genomic regions containing CRSs. We find that a substantial fraction of human–mouse CRS regions (1) colocalize consistently with binding sites of the same RNA binding proteins (RBPs) or (2) are transcribed in corresponding tissues. Additionally, a CaptureSeq experiment revealed expression of many of our CRS regions in human fetal brain, including 662 novel ones. For selected human and mouse candidate pairs, qRT-PCR and in vitro RNA structure probing supported both shared expression and shared structure despite low abundance and low sequence identity. About 30,000 CRS regions are located near coding or long noncoding RNA genes or within enhancers. Structured (CRS overlapping) enhancer RNAs and extended 3′ ends have significantly increased expression levels over their nonstructured counterparts. Our findings of transcribed uncharacterized regulatory regions that contain CRSs support their RNA-mediated functionality.
47
Alkan F, Wenzel A, Palasca O, Kerpedjiev P, Rudebeck A, Stadler PF, Hofacker IL, Gorodkin J. RIsearch2: suffix array-based large-scale prediction of RNA-RNA interactions and siRNA off-targets. Nucleic Acids Res 2017; 45:e60. [PMID: 28108657 PMCID: PMC5416843 DOI: 10.1093/nar/gkw1325] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.9] [Received: 11/21/2016] [Accepted: 12/19/2016] [Indexed: 12/28/2022]
Abstract
Intermolecular interactions of ncRNAs are at the core of gene regulation events, and identifying the full map of these interactions is crucial for ncRNA functional studies. It is known that RNA-RNA interactions are built up by complementary base pairing between the interacting RNAs, and that a high level of complementarity between two RNA sequences is a powerful predictor of such interactions. Here, we present RIsearch2, a large-scale RNA-RNA interaction prediction tool that enables quick localization of potential near-complementary RNA-RNA interactions between given query and target sequences. In contrast to previous heuristics, which either search for exact matches while including G-U wobble pairs or employ simplified energy models, we present a novel approach using a single integrated seed-and-extend framework based on suffix arrays. RIsearch2 enables fast discovery of candidate RNA-RNA interactions on a genome- or transcriptome-wide scale. We furthermore present an siRNA off-target discovery pipeline that not only predicts the off-target transcripts but also computes the off-targeting potential of a given siRNA. This is achieved by combining genome-wide RIsearch2 predictions with target site accessibilities and transcript abundance estimates. We show that this pipeline accurately predicts siRNA off-target interactions and enables comparison of off-targeting potential between different siRNA designs. RIsearch2 and the siRNA off-target discovery pipeline are available as stand-alone software packages from http://rth.dk/resources/risearch.
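The abstract names a suffix-array-based seed-and-extend framework without giving its details. As a rough illustration of the seed step only, the Python sketch below indexes a target with a suffix array and looks up every length-k window of the query's reverse complement, so that each hit marks a perfectly complementary, antiparallel stretch. This is not RIsearch2's implementation: the function names are hypothetical, the suffix array is built naively, only Watson-Crick pairs are considered (no G-U wobble), and extension and energy scoring are omitted.

```python
from bisect import bisect_left

# Watson-Crick complements only; this toy sketch ignores G-U wobble pairs.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}
SENTINEL = "~"  # sorts after every base in A/C/G/U


def build_suffix_array(text: str) -> list[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction for illustration; real tools use much faster algorithms."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def reverse_complement(rna: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_seeds(query: str, target: str, k: int) -> list[tuple[int, int]]:
    """Return sorted (query_start, target_start) pairs where query[q:q+k]
    pairs perfectly, antiparallel, with target[t:t+k]."""
    sa = build_suffix_array(target)
    suffixes = [target[i:] for i in sa]  # already sorted, so bisect can search them
    rc = reverse_complement(query)
    seeds = []
    for j in range(len(rc) - k + 1):
        pattern = rc[j:j + k]
        # All suffixes that start with `pattern` form one contiguous block.
        lo = bisect_left(suffixes, pattern)
        hi = bisect_left(suffixes, pattern + SENTINEL)
        for t in sa[lo:hi]:
            # rc[j:j+k] is the reverse complement of query[len(query)-j-k : len(query)-j].
            seeds.append((len(query) - j - k, t))
    return sorted(seeds)
```

For example, `find_seeds("UAGCC", "AAAGGCUAAA", 3)` returns `[(0, 5), (1, 4), (2, 3)]`: query positions 0-2, 1-3, and 2-4 each pair with a three-nucleotide stretch of the target. A full seed-and-extend tool would then extend such seeds and score them with an energy model.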
48
Junge A, Zandi R, Havgaard JH, Gorodkin J, Cowland JB. Assessing the miRNA sponge potential of RUNX1T1 in t(8;21) acute myeloid leukemia. Gene 2017; 615:35-40. [PMID: 28322996 DOI: 10.1016/j.gene.2017.03.015] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 01/09/2017] [Revised: 03/01/2017] [Accepted: 03/15/2017] [Indexed: 01/08/2023]
Abstract
t(8;21) acute myeloid leukemia (AML) is characterized by a translocation between chromosomes 8 and 21 and formation of a distinctive RUNX1-RUNX1T1 fusion transcript. This translocation places RUNX1T1 under control of the RUNX1 promoter, leading to a pronounced upregulation of RUNX1T1 transcripts in t(8;21) AML compared to normal hematopoietic cells. We investigated the role of highly upregulated RUNX1T1 under the hypothesis that it acts as a competing endogenous RNA (ceRNA), titrating microRNAs (miRNAs) away from their target transcripts and thus contributing to AML formation. Using publicly available t(8;21) AML RNA-Seq and miRNA-Seq data from The Cancer Genome Atlas (TCGA) project, we obtained a network of 605 genes that may act as ceRNAs competing for miRNAs with the suggested RUNX1T1 miRNA sponge. Among the 605 ceRNA candidates, 121 have previously been implicated in cancer development. Players in the integrin, cadherin, and Wnt signaling pathways affected by the RUNX1T1 sponge were overrepresented. Finally, among a set of 21 high-interest RUNX1T1 ceRNAs, we found multiple genes that have previously been linked to AML formation. In conclusion, our study offers a novel look at the role of the RUNX1-RUNX1T1 fusion transcript in t(8;21) AML beyond previously investigated genetic and epigenetic aberrations.
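The abstract does not state how the 605 ceRNA candidates were selected. One widely used way to test the ceRNA idea, swapped in here purely for illustration and not necessarily the authors' criterion, is to ask whether a gene shares more miRNA regulators with the putative sponge than expected by chance, using a one-sided hypergeometric test. The Python sketch below assumes miRNA target sets are already available as Python sets; the function names, thresholds, and toy data are all hypothetical.

```python
from math import comb


def shared_mirna_pvalue(sponge: set[str], gene: set[str], total: int) -> float:
    """One-sided hypergeometric P(overlap >= observed).

    Null model: out of `total` miRNAs, the gene's len(gene) regulators are drawn
    at random; count how many of them also target the sponge."""
    observed = len(sponge & gene)
    n_sponge, n_gene = len(sponge), len(gene)
    tail = sum(
        comb(n_sponge, i) * comb(total - n_sponge, n_gene - i)
        for i in range(observed, min(n_sponge, n_gene) + 1)
    )
    return tail / comb(total, n_gene)


def cerna_candidates(sponge, gene_targets, total, alpha=0.05, min_shared=1):
    """Genes whose miRNA overlap with the sponge is significant at `alpha`,
    sorted by p-value (no multiple-testing correction in this sketch)."""
    hits = {}
    for gene, mirnas in gene_targets.items():
        if len(sponge & mirnas) < min_shared:
            continue
        p = shared_mirna_pvalue(sponge, mirnas, total)
        if p <= alpha:
            hits[gene] = p
    return dict(sorted(hits.items(), key=lambda item: item[1]))


# Toy example with hypothetical miRNA ("m1".."m5") and gene ("G1".."G3") names.
sponge = {"m1", "m2", "m3"}
genes = {"G1": {"m1", "m2", "m3"}, "G2": {"m4"}, "G3": {"m1", "m5"}}
print(cerna_candidates(sponge, genes, total=10))
```

With a universe of 10 miRNAs, G1 (sharing all three of the sponge's miRNAs) gets p = 1/120 and is kept, G3 (sharing one of its two) gets p = 24/45 and is dropped, and G2 shares none. A real analysis would also need the target sets themselves and a correction for testing many genes.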
49
Junge A, Refsgaard JC, Garde C, Pan X, Santos A, Alkan F, Anthon C, von Mering C, Workman CT, Jensen LJ, Gorodkin J. RAIN: RNA-protein Association and Interaction Networks. Database (Oxford) 2017; 2017:baw167. [PMID: 28077569 PMCID: PMC5225963 DOI: 10.1093/database/baw167] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Revised: 11/18/2016] [Accepted: 12/05/2016] [Indexed: 12/11/2022]
Abstract
Protein association networks can be inferred from a range of resources including experimental data, literature mining and computational predictions. These types of evidence are emerging for non-coding RNAs (ncRNAs) as well. However, integration of ncRNAs into protein association networks is challenging due to data heterogeneity. Here, we present a database of ncRNA-RNA and ncRNA-protein interactions and its integration with the STRING database of protein-protein interactions. These ncRNA associations cover four organisms and have been established from curated examples, experimental data, interaction predictions and automatic literature mining. RAIN uses an integrative scoring scheme to assign a confidence score to each interaction. We demonstrate that RAIN outperforms the underlying microRNA-target predictions in inferring ncRNA interactions. RAIN can be operated through an easily accessible web interface and all interaction data can be downloaded. Database URL: http://rth.dk/resources/rain.
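The abstract says only that RAIN assigns each interaction a confidence score through an integrative scheme. Because RAIN is integrated with STRING, one plausible reading, which is an assumption here and not something the abstract states, is a STRING-style combination: independent evidence channels are merged as 1 - ∏(1 - s_i) after a chance-level prior has been subtracted from each channel score, and the prior is added back at the end. The Python sketch below illustrates only that arithmetic; the function names and the prior value used in the example are hypothetical.

```python
from math import prod


def remove_prior(score: float, prior: float) -> float:
    """Rescale a channel score so chance-level evidence maps to 0 (clipped at 0)."""
    return max(0.0, (score - prior) / (1.0 - prior))


def combine_scores(channel_scores: list[float], prior: float = 0.0) -> float:
    """Combine independent evidence channels (noisy-OR), then add the prior back.

    Each channel score is treated as the probability that the interaction is
    real; the combined score is 1 minus the probability that every channel is
    wrong."""
    all_wrong = prod(1.0 - remove_prior(s, prior) for s in channel_scores)
    combined = 1.0 - all_wrong
    return combined + prior * (1.0 - combined)
```

Two channels that each give 0.5 combine to 0.75, so independent, moderately confident evidence raises the overall score above either channel alone; with a hypothetical prior of 0.1, a channel scoring below the prior contributes nothing and the result falls back to 0.1.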
50
Mentzel CMJ, Alkan F, Keinicke H, Jacobsen MJ, Gorodkin J, Fredholm M, Cirera S. Joint Profiling of miRNAs and mRNAs Reveals miRNA Mediated Gene Regulation in the Göttingen Minipig Obesity Model. PLoS One 2016; 11:e0167285. [PMID: 27902747 PMCID: PMC5130236 DOI: 10.1371/journal.pone.0167285] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 09/21/2016] [Accepted: 11/13/2016] [Indexed: 12/23/2022]
Abstract
Obesity and its comorbidities are an increasing challenge for both affected individuals and health care systems worldwide. In obese individuals, perturbed expression of both protein-coding genes and microRNAs (miRNAs) is seen in obesity-relevant tissues (i.e., adipose tissue, liver, and skeletal muscle). miRNAs are small non-coding RNA molecules with important regulatory roles in a wide range of biological processes, including obesity. Rodents are widely used animal models for human diseases, including obesity. However, not all rodent research is applicable to human health or disease. Pigs, in contrast, are emerging as an excellent animal model for obesity studies because of their similarity to humans in metabolism, digestive tract, and genetics. The Göttingen minipig is a small, easy-to-handle pig breed that has been extensively used for modeling human obesity because of its capacity to develop severe obesity when fed ad libitum. The aim of this study was to identify differentially expressed protein-coding genes and miRNAs in a Göttingen minipig obesity model. Liver, skeletal muscle, and abdominal adipose tissue were sampled from 7 lean and 7 obese minipigs. Differential gene expression was investigated using high-throughput quantitative real-time PCR (qPCR) on 90 mRNAs and 72 miRNAs. The results revealed de-regulation of several obesity- and inflammation-relevant protein-coding genes and miRNAs in all tissues examined. Many genes that are known to be de-regulated in obese humans were confirmed in the obese minipigs, and several of these genes have target sites for miRNAs expressed in the opposite direction to the gene, confirming miRNA-mediated regulation in obesity. These results confirm the translational value of the pig for human obesity studies.