31151
|
Kukurba KR, Zhang R, Li X, Smith KS, Knowles DA, How Tan M, Piskol R, Lek M, Snyder M, MacArthur DG, Li JB, Montgomery SB. Allelic expression of deleterious protein-coding variants across human tissues. PLoS Genet 2014; 10:e1004304. [PMID: 24786518 PMCID: PMC4006732 DOI: 10.1371/journal.pgen.1004304] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Received: 08/20/2013] [Accepted: 02/27/2014] [Indexed: 11/19/2022] Open
Abstract
Personal exome and genome sequencing provides access to loss-of-function and rare deleterious alleles whose interpretation is expected to provide insight into individual disease burden. However, for each allele, accurate interpretation of its effect will depend on both its penetrance and the trait's expressivity. In this regard, an important factor that can modify the effect of a pathogenic coding allele is its level of expression, a factor which itself characteristically changes across tissues. To better inform the degree to which pathogenic alleles can be modified by expression level across multiple tissues, we have conducted exome, RNA and deep, targeted allele-specific expression (ASE) sequencing in ten tissues obtained from a single individual. By combining such data, we report the impact of rare and common loss-of-function variants on allelic expression, exposing stronger allelic bias for rare stop-gain variants and informing the extent to which rare deleterious coding alleles are consistently expressed across tissues. This study demonstrates the potential importance of transcriptome data to the interpretation of pathogenic protein-coding variants.
Gene expression is a fundamental cellular process that contributes to phenotypic diversity. Gene expression can vary between alleles of an individual through differences in genomic imprinting or cis-acting regulatory variation. Distinguishing allelic activity is important for informing the abundance of altered mRNA and protein products. Advances in sequencing technologies allow us to quantify patterns of allele-specific expression (ASE) in different individuals and cell-types. Previous studies have identified patterns of ASE across human populations for single cell-types; however, the degree of tissue-specificity of ASE has not been deeply characterized.
In this study, we compare patterns of ASE across multiple tissues from a single individual using whole transcriptome sequencing (RNA-Seq) and a targeted, high-resolution assay (mmPCR-Seq). We detect patterns of ASE for rare deleterious and loss-of-function protein-coding variants, informing the frequency at which allelic expression could modify the functional impact of personal deleterious protein-coding variants across tissues. We demonstrate that these interactions occur for one third of such variants; however, large direction flips in allelic expression are infrequent.
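As a rough illustration of what quantifying allele-specific expression at a single heterozygous site can involve, the sketch below computes a reference-allele ratio and a two-sided exact binomial p-value against a balanced 50:50 expectation. The abstract does not state which statistical model the authors used, so the binomial test, the function names, and the read counts here are assumptions chosen for illustration only.

```python
from math import exp, lgamma, log


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads at a heterozygous site that carry the reference allele."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / total


def binomial_two_sided_p(ref_reads: int, alt_reads: int, p_null: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance.

    Sums the probability of every outcome that is no more likely than the
    observed one under the null. Assumes 0 < p_null < 1.
    """
    n = ref_reads + alt_reads
    log_p, log_q = log(p_null), log(1.0 - p_null)
    # Work in log space so that deep read coverage does not overflow.
    pmf = [
        exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1) + k * log_p + (n - k) * log_q)
        for k in range(n + 1)
    ]
    observed = pmf[ref_reads]
    # Small relative tolerance so symmetric outcomes are treated as ties.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-7)))


# Hypothetical read counts, for illustration only.
ratio = allelic_ratio(30, 10)
p_balanced = binomial_two_sided_p(5, 5)
p_skewed = binomial_two_sided_p(10, 0)
```

In practice a reference-mapping bias may shift the null away from 0.5, which is why `p_null` is left as a parameter rather than fixed.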
Affiliation(s)
- Kimberly R. Kukurba
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Rui Zhang
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Xin Li
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Kevin S. Smith
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - David A. Knowles
- Department of Computer Science, Stanford University School of Medicine, Stanford, California, United States of America
| | - Meng How Tan
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Robert Piskol
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Monkol Lek
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| | - Michael Snyder
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
| | - Daniel G. MacArthur
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, Massachusetts, United States of America
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, Massachusetts, United States of America
| | - Jin Billy Li
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- * E-mail: (JBL); (SBM)
| | - Stephen B. Montgomery
- Department of Pathology, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Genetics, Stanford University School of Medicine, Stanford, California, United States of America
- Department of Computer Science, Stanford University School of Medicine, Stanford, California, United States of America
- * E-mail: (JBL); (SBM)
|
31152
|
Ramundo S, Casero D, Mühlhaus T, Hemme D, Sommer F, Crèvecoeur M, Rahire M, Schroda M, Rusch J, Goodenough U, Pellegrini M, Perez-Perez ME, Crespo JL, Schaad O, Civic N, Rochaix JD. Conditional Depletion of the Chlamydomonas Chloroplast ClpP Protease Activates Nuclear Genes Involved in Autophagy and Plastid Protein Quality Control. THE PLANT CELL 2014; 26:2201-2222. [PMID: 24879428 PMCID: PMC4079378 DOI: 10.1105/tpc.114.124842] [Citation(s) in RCA: 92] [Impact Index Per Article: 8.4] [Received: 03/02/2014] [Revised: 04/16/2014] [Accepted: 05/09/2014] [Indexed: 05/14/2023]
Abstract
Plastid protein homeostasis is critical during chloroplast biogenesis and responses to changes in environmental conditions. Proteases and molecular chaperones involved in plastid protein quality control are encoded by the nucleus except for the catalytic subunit of ClpP, an evolutionarily conserved serine protease. Unlike its Escherichia coli ortholog, this chloroplast protease is essential for cell viability. To study its function, we used a recently developed system of repressible chloroplast gene expression in the alga Chlamydomonas reinhardtii. Using this repressible system, we have shown that a selective gradual depletion of ClpP leads to alteration of chloroplast morphology, causes formation of vesicles, and induces extensive cytoplasmic vacuolization that is reminiscent of autophagy. Analysis of the transcriptome and proteome during ClpP depletion revealed a set of proteins that are more abundant at the protein level, but not at the RNA level. These proteins may comprise some of the ClpP substrates. Moreover, the specific increase in accumulation, both at the RNA and protein level, of small heat shock proteins, chaperones, proteases, and proteins involved in thylakoid maintenance upon perturbation of plastid protein homeostasis suggests the existence of a chloroplast-to-nucleus signaling pathway involved in organelle quality control. We suggest that this represents a chloroplast unfolded protein response that is conceptually similar to that observed in the endoplasmic reticulum and in mitochondria.
Affiliation(s)
- Silvia Ramundo
- Departments of Molecular Biology and Plant Biology, University of Geneva, 1211 Geneva, Switzerland
| | - David Casero
- Institute for Genomics and Proteomics, University of California, Los Angeles, California 90095
| | - Timo Mühlhaus
- Max Planck Institute of Molecular Plant Physiology, D-14476 Potsdam-Golm Germany
| | - Dorothea Hemme
- Max Planck Institute of Molecular Plant Physiology, D-14476 Potsdam-Golm Germany
| | - Frederik Sommer
- Max Planck Institute of Molecular Plant Physiology, D-14476 Potsdam-Golm Germany
| | - Michèle Crèvecoeur
- Departments of Molecular Biology and Plant Biology, University of Geneva, 1211 Geneva, Switzerland
| | - Michèle Rahire
- Departments of Molecular Biology and Plant Biology, University of Geneva, 1211 Geneva, Switzerland
| | - Michael Schroda
- Max Planck Institute of Molecular Plant Physiology, D-14476 Potsdam-Golm Germany
| | - Jannette Rusch
- Department of Biology, Washington University, St. Louis, Missouri 63130
| | - Ursula Goodenough
- Department of Biology, Washington University, St. Louis, Missouri 63130
| | - Matteo Pellegrini
- Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, California 90095
| | - Maria Esther Perez-Perez
- Instituto de Bioquimica Vegetal y Fotosintesis, Consejo Superior de Investigaciones Cientificas, Universidad de Sevilla, 41092 Sevilla, Spain
| | - José Luis Crespo
- Instituto de Bioquimica Vegetal y Fotosintesis, Consejo Superior de Investigaciones Cientificas, Universidad de Sevilla, 41092 Sevilla, Spain
| | - Olivier Schaad
- Genomics Platform, University of Geneva, 1211 Geneva, Switzerland; Department of Biochemistry, University of Geneva, 1211 Geneva, Switzerland
| | - Natacha Civic
- Genomics Platform, University of Geneva, 1211 Geneva, Switzerland
| | - Jean David Rochaix
- Departments of Molecular Biology and Plant Biology, University of Geneva, 1211 Geneva, Switzerland
|
31153
|
Sreedharan VT, Schultheiss SJ, Jean G, Kahles A, Bohnert R, Drewe P, Mudrakarta P, Görnitz N, Zeller G, Rätsch G. Oqtans: the RNA-seq workbench in the cloud for complete and reproducible quantitative transcriptome analysis. Bioinformatics 2014; 30:1300-1. [PMID: 24413671 PMCID: PMC3998122 DOI: 10.1093/bioinformatics/btt731] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 05/03/2013] [Revised: 11/09/2013] [Accepted: 12/13/2013] [Indexed: 11/17/2022] Open
Abstract
We present Oqtans, an open-source workbench for quantitative transcriptome analysis that is integrated in Galaxy. Its distinguishing features include customizable computational workflows and a modular pipeline architecture that facilitates comparative assessment of tool and data quality. Oqtans integrates an assortment of machine learning-powered tools into Galaxy, which show superior or equal performance to state-of-the-art tools. Implemented tools comprise a complete transcriptome analysis workflow: short-read alignment, transcript identification/quantification and differential expression analysis. Oqtans and Galaxy facilitate persistent storage, data exchange and documentation of intermediate results and analysis workflows. We illustrate how Oqtans aids the interpretation of data from different experiments in easy-to-understand use cases. Users can easily create their own workflows and extend Oqtans by integrating specific tools. Oqtans is available as (i) a cloud machine image with a demo instance at cloud.oqtans.org, (ii) a public Galaxy instance at galaxy.cbio.mskcc.org, (iii) a git repository containing all installed software (oqtans.org/git), most of which is also available from (iv) the Galaxy Toolshed and (v) a share string to use along with Galaxy CloudMan.
Affiliation(s)
- Vipin T Sreedharan
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, New York, NY, USA, Machine Learning in Biology Group, Friedrich Miescher Laboratory, Tübingen, Germany, LINA, Combinatorics and Bioinformatics Group, University of Nantes, Nantes, France, Machine Learning/Intelligent Data Analysis Group, Technical University, Berlin, Germany and Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
|
31154
|
Brown SD, Warren RL, Gibb EA, Martin SD, Spinelli JJ, Nelson BH, Holt RA. Neo-antigens predicted by tumor genome meta-analysis correlate with increased patient survival. Genome Res 2014; 24:743-50. [PMID: 24782321 PMCID: PMC4009604 DOI: 10.1101/gr.165985.113] [Citation(s) in RCA: 467] [Impact Index Per Article: 42.5] [Indexed: 12/11/2022]
Abstract
Somatic missense mutations can initiate tumorigenesis and, conversely, anti-tumor cytotoxic T cell (CTL) responses. Tumor genome analysis has revealed extreme heterogeneity among tumor missense mutation profiles, but their relevance to tumor immunology and patient outcomes has awaited comprehensive evaluation. Here, for 515 patients from six tumor sites, we used RNA-seq data from The Cancer Genome Atlas to identify mutations that are predicted to be immunogenic in that they yielded mutational epitopes presented by the MHC proteins encoded by each patient’s autologous HLA-A alleles. Mutational epitopes were associated with increased patient survival. Moreover, the corresponding tumors had higher CTL content, inferred from CD8A gene expression, and elevated expression of the CTL exhaustion markers PDCD1 and CTLA4. Mutational epitopes were very scarce in tumors without evidence of CTL infiltration. These findings suggest that the abundance of predicted immunogenic mutations may be useful for identifying patients likely to benefit from checkpoint blockade and related immunotherapies.
Affiliation(s)
- Scott D Brown
- Canada's Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia V5Z 1L3, Canada
|
31155
|
Mutated tumor alleles are expressed according to their DNA frequency. Sci Rep 2014; 4:4743. [PMID: 24752137 PMCID: PMC3994436 DOI: 10.1038/srep04743] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Received: 11/29/2013] [Accepted: 03/31/2014] [Indexed: 01/23/2023] Open
Abstract
The transcription of tumor mutations from DNA into RNA has implications for biology, epigenetics and clinical practice. It is not clear if mutations are in general transcribed and, if so, at what proportion to the wild-type allele. Here, we examined the correlation between DNA mutation allele frequency and RNA mutation allele frequency. We sequenced the exome and transcriptome of tumor cell lines with large copy number variations, identified heterozygous single nucleotide mutations and absolute DNA copy number, and determined the corresponding DNA and RNA mutation allele fraction. We found that 99% of the DNA mutations in expressed genes are expressed as RNA. Moreover, we found a high correlation between the DNA and RNA mutation allele frequency. Exceptions are mutations that cause premature termination codons and therefore activate nonsense-mediated decay. Beyond this, we did not find evidence of any wide-scale mechanism, such as allele-specific epigenetic silencing, preferentially promoting mutated or wild-type alleles. In conclusion, our data strongly suggest that genes are equally transcribed from all alleles, mutated and wild-type, and thus transcribed in proportion to their DNA allele frequency.
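The comparison described in this abstract, DNA versus RNA mutation allele fraction, can be sketched as follows. This is a minimal illustration under stated assumptions: the read counts are invented, the helper names are hypothetical, and Pearson correlation is used only as a generic measure since the abstract does not specify which correlation statistic was reported.

```python
from math import sqrt


def allele_fraction(mutant_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that carry the mutant allele."""
    if total_reads <= 0:
        raise ValueError("site has no read coverage")
    return mutant_reads / total_reads


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical (mutant, total) read counts per mutation; not data from the paper.
dna_counts = [(30, 100), (50, 100), (66, 100), (25, 100)]
rna_counts = [(28, 90), (55, 110), (130, 200), (20, 80)]

dna_af = [allele_fraction(m, t) for m, t in dna_counts]
rna_af = [allele_fraction(m, t) for m, t in rna_counts]
r = pearson(dna_af, rna_af)
```

A mutation introducing a premature termination codon would be expected to fall below this trend, since nonsense-mediated decay lowers its RNA allele fraction relative to its DNA allele fraction.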
|
31156
|
Schurch NJ, Cole C, Sherstnev A, Song J, Duc C, Storey KG, McLean WHI, Brown SJ, Simpson GG, Barton GJ. Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-Seq and ESTs. PLoS One 2014; 9:e94270. [PMID: 24722185 PMCID: PMC3983147 DOI: 10.1371/journal.pone.0094270] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 10/30/2013] [Accepted: 03/13/2014] [Indexed: 11/23/2022] Open
Abstract
The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct and complete annotation, in addition to the underlying genomic sequence, is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3′ untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3′ polyadenylation sites to within +/− 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3′ UTR re-annotation (including extension of one 3′ UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data.
Affiliation(s)
- Nicholas J. Schurch
- Division of Computational Biology, University of Dundee, Dundee, United Kingdom
- Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, United Kingdom
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, United Kingdom
| | - Christian Cole
- Division of Computational Biology, University of Dundee, Dundee, United Kingdom
- Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, United Kingdom
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, United Kingdom
| | - Alexander Sherstnev
- Division of Computational Biology, University of Dundee, Dundee, United Kingdom
| | - Junfang Song
- Division of Cell and Developmental Biology, University of Dundee, Dundee, United Kingdom
| | - Céline Duc
- Division of Plant Sciences, University of Dundee, Dundee, United Kingdom
| | - Kate G. Storey
- Division of Cell and Developmental Biology, University of Dundee, Dundee, United Kingdom
| | - W. H. Irwin McLean
- Centre for Dermatology and Genetic Medicine, University of Dundee, Dundee, United Kingdom
| | - Sara J. Brown
- Centre for Dermatology and Genetic Medicine, University of Dundee, Dundee, United Kingdom
| | - Gordon G. Simpson
- Division of Plant Sciences, University of Dundee, Dundee, United Kingdom
- Cell and Molecular Sciences, The James Hutton Institute, Dundee, United Kingdom
| | - Geoffrey J. Barton
- Division of Computational Biology, University of Dundee, Dundee, United Kingdom
- Division of Biological Chemistry and Drug Discovery, University of Dundee, Dundee, United Kingdom
- Centre for Gene Regulation and Expression, University of Dundee, Dundee, United Kingdom
- * E-mail:
|
31157
|
Weissmueller S, Manchado E, Saborowski M, Morris JP, Wagenblast E, Davis CA, Moon SH, Pfister NT, Tschaharganeh DF, Kitzing T, Aust D, Markert EK, Wu J, Grimmond SM, Pilarsky C, Prives C, Biankin AV, Lowe SW. Mutant p53 drives pancreatic cancer metastasis through cell-autonomous PDGF receptor β signaling. Cell 2014; 157:382-394. [PMID: 24725405 PMCID: PMC4001090 DOI: 10.1016/j.cell.2014.01.066] [Citation(s) in RCA: 402] [Impact Index Per Article: 36.5] [Received: 11/12/2013] [Revised: 12/17/2013] [Accepted: 01/23/2014] [Indexed: 12/14/2022]
Abstract
Missense mutations in the p53 tumor suppressor inactivate its antiproliferative properties but can also promote metastasis through a gain-of-function activity. We show that sustained expression of mutant p53 is required to maintain the prometastatic phenotype of a murine model of pancreatic cancer, a highly metastatic disease that frequently displays p53 mutations. Transcriptional profiling and functional screening identified the platelet-derived growth factor receptor β (PDGFRβ) as both necessary and sufficient to mediate these effects. Mutant p53 induced PDGFRβ through a cell-autonomous mechanism involving inhibition of a p73/NF-Y complex that represses PDGFRβ expression in p53-deficient, noninvasive cells. Blocking PDGFRβ signaling by RNA interference or by small molecule inhibitors prevented pancreatic cancer cell invasion in vitro and metastasis formation in vivo. Finally, high PDGFRβ expression correlates with poor disease-free survival in pancreatic, colon, and ovarian cancer patients, implicating PDGFRβ as a prognostic marker and possible target for attenuating metastasis in p53 mutant tumors.
Affiliation(s)
- Susann Weissmueller
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
| | - Eusebio Manchado
- Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
| | - Michael Saborowski
- Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
| | - John P Morris
- Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
| | - Elvin Wagenblast
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Carrie A Davis
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA
| | - Sung-Hwan Moon
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Neil T Pfister
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Darjus F Tschaharganeh
- Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
| | - Thomas Kitzing
- Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA
| | - Daniela Aust
- Department of Visceral, Thoracic and Vascular Surgery, Technical University of Dresden, 01062 Dresden, Germany
| | - Elke K Markert
- The Simons Center for Systems Biology, Institute for Advanced Study, Princeton, NJ 08540, USA
| | - Jianmin Wu
- The Kinghorn Cancer Centre, Cancer Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia; St Vincent's Clinical School, University of New South Wales, Sydney, NSW 2010, Australia
| | - Sean M Grimmond
- Queensland Centre for Medical Genomics, Institute for Molecular Bioscience, University of Queensland, Santa Lucia 4072, Australia; Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Scotland G61 1BD, UK
| | - Christian Pilarsky
- Department of Visceral, Thoracic and Vascular Surgery, Technical University of Dresden, 01062 Dresden, Germany
| | - Carol Prives
- Department of Biological Sciences, Columbia University, New York, NY 10027, USA
| | - Andrew V Biankin
- The Kinghorn Cancer Centre, Cancer Division, Garvan Institute of Medical Research, Sydney NSW 2010, Australia; Wolfson Wohl Cancer Research Centre, Institute of Cancer Sciences, University of Glasgow, Scotland G61 1BD, UK
| | - Scott W Lowe
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY 11724, USA; Department of Cancer Biology and Genetics, Memorial Sloan-Kettering Cancer Center, New York, NY 10065, USA; Howard Hughes Medical Institute, New York, NY 10065, USA.
|
31158
|
Honeyman JN, Simon EP, Robine N, Chiaroni-Clarke R, Darcy DG, Lim IIP, Gleason CE, Murphy JM, Rosenberg BR, Teegan L, Takacs CN, Botero S, Belote R, Germer S, Emde AK, Vacic V, Bhanot U, LaQuaglia MP, Simon SM. Detection of a recurrent DNAJB1-PRKACA chimeric transcript in fibrolamellar hepatocellular carcinoma. Science 2014; 343:1010-4. [PMID: 24578576 DOI: 10.1126/science.1249484] [Citation(s) in RCA: 348] [Impact Index Per Article: 31.6] [Indexed: 12/17/2022]
Abstract
Fibrolamellar hepatocellular carcinoma (FL-HCC) is a rare liver tumor affecting adolescents and young adults with no history of primary liver disease or cirrhosis. We identified a chimeric transcript that is expressed in FL-HCC but not in adjacent normal liver and that arises as the result of a ~400-kilobase deletion on chromosome 19. The chimeric RNA is predicted to code for a protein containing the amino-terminal domain of DNAJB1, a homolog of the molecular chaperone DNAJ, fused in frame with PRKACA, the catalytic domain of protein kinase A. Immunoprecipitation and Western blot analyses confirmed that the chimeric protein is expressed in tumor tissue, and a cell culture assay indicated that it retains kinase activity. Evidence supporting the presence of the DNAJB1-PRKACA chimeric transcript in 100% of the FL-HCCs examined (15/15) suggests that this genetic alteration contributes to tumor pathogenesis.
Affiliation(s)
- Joshua N Honeyman
- Laboratory of Cellular Biophysics, Rockefeller University, 1230 York Avenue, New York, NY 10065, USA
|
31159
|
Abstract
RNA sequencing (RNAseq) samples the majority of expressed genes infrequently, owing to the large size, complex splicing and wide dynamic range of eukaryotic transcriptomes. This results in sparse sequencing coverage that can hinder robust isoform assembly and quantification. RNA capture sequencing (CaptureSeq) addresses this challenge by using oligonucleotide probes to capture selected genes or regions of interest for targeted sequencing. Targeted RNAseq provides enhanced coverage for sensitive gene discovery, robust transcript assembly and accurate gene quantification. Here we describe a detailed protocol for all stages of RNA CaptureSeq, from initial probe design considerations and capture of targeted genes to final assembly and quantification of captured transcripts. Initial probe design and final analysis can take less than 1 d, whereas the central experimental capture stage requires ∼7 d.
|
31160
|
Robles-Espinoza CD, Adams DJ. Cross-species analysis of mouse and human cancer genomes. Cold Spring Harb Protoc 2014; 2014:350-8. [PMID: 24173316 DOI: 10.1101/pdb.top078824] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/30/2022]
Abstract
Fundamental advances in our understanding of the human cancer genome have been made over the last five years, driven largely by the development of next-generation sequencing (NGS) technologies. Here we will discuss the tools and technologies that have been used to profile human tumors, how they may be applied to the analysis of the mouse cancer genome, and the results thus far. In addition to mutations that disrupt cancer genes, NGS is also being applied to the analysis of the transcriptome of cancers, and, through the use of techniques such as ChIP-Seq, the protein-DNA landscape is also being revealed. Gaining a comprehensive picture of the mouse cancer genome, at the DNA level and through the analysis of the transcriptome and regulatory landscape, will allow us to "biofilter" for driver genes in more complex human cancers and represents a critical test to determine which mouse cancer models are faithful genetic surrogates of the human disease.
|
31161
|
Mensaert K, Denil S, Trooskens G, Van Criekinge W, Thas O, De Meyer T. Next-generation technologies and data analytical approaches for epigenomics. ENVIRONMENTAL AND MOLECULAR MUTAGENESIS 2014; 55:155-70. [PMID: 24327356 DOI: 10.1002/em.21841] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 08/30/2013] [Revised: 11/27/2013] [Accepted: 11/27/2013] [Indexed: 05/18/2023]
Abstract
Epigenetics refers to the collection of heritable features that modulate the genome-environment interaction without being encoded in the actual DNA sequence. While being mitotically and sometimes even meiotically transmitted, epigenetic traits often demonstrate extensive flexibility. This allows cells to acquire diverse gene expression patterns during differentiation, but also to adapt to a changing environment. However, epigenetic alterations are not always beneficial to the organism, as they are, for example, frequently identified in human diseases such as cancer. Accurate and cost-efficient genome-scale profiling of epigenetic features is thus of major importance to pinpoint these "epimutations," for example, to monitor the epigenetic impact of environmental exposure. Over the last decade, the field of epigenetics has been revolutionized by several innovative "epigenomics" technologies exactly addressing this need. In this review, we discuss and compare widely used next-generation methods to assess DNA methylation and hydroxymethylation, noncoding RNA expression, histone modifications, and nucleosome positioning. Although recent methods are typically based on "second-generation" sequencing, we also pay attention to still commonly used array- and PCR-based methods, and look forward to the additional advantages of single-molecule sequencing. As the current bottleneck in epigenomics research is the analysis rather than generation of data, the basic difficulties and problem-solving strategies regarding data preprocessing and statistical analysis are introduced for the different technologies. Finally, we also consider the complications associated with epigenomic studies of species with yet unsequenced genomes and possible solutions.
Affiliation(s)
- Klaas Mensaert
- Department of Mathematical Modelling, Statistics and Bioinformatics, Faculty of Bioscience Engineering, Ghent University, Ghent, Belgium
|
31162
|
Abstract
DNA methylation is a dynamic process through which specific chromatin modifications can be stably transmitted from parent to daughter cells. A large body of work has suggested that DNA methylation influences gene expression by silencing gene promoters. However, these conclusions were drawn from data focused mostly on promoter regions. Regarding the entire genome, it is unclear how methylation and gene transcription patterns are related during vertebrate development. To identify the genome-wide distribution of CpG methylation, we created a series of high-resolution methylome maps of Danio rerio embryos during development and in mature, differentiated tissues. We found that embryonic and terminal tissues have unique methylation signatures in CpG islands and repetitive sequences. Fully differentiated tissues have increased CpG and LTR methylation and decreased SINE methylation relative to embryonic tissues. Unsupervised clustering analyses reveal that the embryonic and terminal tissues can be classified solely by their methylation patterning. Novel analyses also identify a previously undescribed genome-wide exon methylation signature. We also compared whole genome methylation with genome-wide mRNA expression levels using publicly available RNA-seq datasets. These comparisons revealed previously unrecognized relationships between gene expression, alternative splicing, and exon methylation. Surprisingly, we found that exonic methylation is a better predictor of mRNA expression level than promoter methylation. We also found that transcriptionally skipped exons have significantly less methylation than retained exons. Our integrative analyses reveal a highly complex interplay between gene expression, alternative splicing, development, and methylation patterning in zebrafish.
|
31163
|
Nance T, Smith KS, Anaya V, Richardson R, Ho L, Pala M, Mostafavi S, Battle A, Feghali-Bostwick C, Rosen G, Montgomery SB. Transcriptome analysis reveals differential splicing events in IPF lung tissue. PLoS One 2014; 9:e92111. [PMID: 24647608 PMCID: PMC3960165 DOI: 10.1371/journal.pone.0092111] [Citation(s) in RCA: 62] [Impact Index Per Article: 5.6] [Received: 11/29/2013] [Accepted: 02/18/2014] [Indexed: 12/22/2022] Open
Abstract
Idiopathic pulmonary fibrosis (IPF) is a complex disease in which a multitude of proteins and networks are disrupted. Interrogation of the transcriptome through RNA sequencing (RNA-Seq) enables the determination of genes whose differential expression is most significant in IPF, as well as the detection of alternative splicing events which are not easily observed with traditional microarray experiments. We sequenced messenger RNA from 8 IPF lung samples and 7 healthy controls on an Illumina HiSeq 2000, and found evidence for substantial differential gene expression and differential splicing. 873 genes were differentially expressed in IPF (FDR<5%), and 440 unique genes had significant differential splicing events in at least one exonic region (FDR<5%). We used qPCR to validate the differential exon usage in the second and third most significant exonic regions, in the genes COL6A3 (RNA-Seq adjusted pval = 7.18e-10) and POSTN (RNA-Seq adjusted pval = 2.06e-09), which encode the extracellular matrix proteins collagen alpha-3(VI) and periostin. The increased gene-level expression of periostin has been associated with IPF and its clinical progression, but its differential splicing has not been studied in the context of this disease. Our results suggest that alternative splicing of these and other genes may be involved in the pathogenesis of IPF. We have developed an interactive web application which allows users to explore the results of our RNA-Seq experiment, as well as those of two previously published microarray experiments, and we hope that this will serve as a resource for future investigations of gene regulation in IPF.
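The FDR<5% thresholds quoted above imply some multiple-testing adjustment of per-gene or per-exon p-values. The abstract does not name the procedure, so the following is only a minimal sketch assuming the widely used Benjamini-Hochberg adjustment; the function name and the p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five exonic regions (illustration only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.6]
adj = benjamini_hochberg(pvals)
# Call a region significant when its adjusted p-value is below 5%.
significant = [p < 0.05 for p in adj]
```

With an adjustment of this kind, a raw p-value just under 0.05 can fail the threshold once the number of tests is taken into account, which is why adjusted values are the ones reported.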
Affiliation(s)
- Tracy Nance
- Department of Pathology, Stanford University, Stanford, California, United States of America
- * E-mail: (TN); (GR); (SBM)
- Kevin S. Smith
- Department of Pathology, Stanford University, Stanford, California, United States of America
- Vanessa Anaya
- Department of Pathology, Stanford University, Stanford, California, United States of America
- Rhea Richardson
- Department of Pathology, Stanford University, Stanford, California, United States of America
- Lawrence Ho
- Division of Pulmonary and Critical Care Medicine, University of Washington, Seattle, Washington, United States of America
- Mauro Pala
- Department of Pathology, Stanford University, Stanford, California, United States of America
- Sara Mostafavi
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Alexis Battle
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Carol Feghali-Bostwick
- Division of Pulmonary, Allergy, and Critical Care Medicine, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania, United States of America
- Glenn Rosen
- Department of Medicine, Division of Pulmonary and Critical Care Medicine, Stanford University, Stanford, California, United States of America
- * E-mail: (TN); (GR); (SBM)
- Stephen B. Montgomery
- Department of Pathology, Stanford University, Stanford, California, United States of America
- * E-mail: (TN); (GR); (SBM)
|
31164
|
Cartography of neurexin alternative splicing mapped by single-molecule long-read mRNA sequencing. Proc Natl Acad Sci U S A 2014; 111:E1291-9. [PMID: 24639501 DOI: 10.1073/pnas.1403244111] [Citation(s) in RCA: 236] [Impact Index Per Article: 21.5] [Indexed: 11/18/2022] Open
Abstract
Neurexins are evolutionarily conserved presynaptic cell-adhesion molecules that are essential for normal synapse formation and synaptic transmission. Indirect evidence has indicated that extensive alternative splicing of neurexin mRNAs may produce hundreds if not thousands of neurexin isoforms, but no direct evidence for such diversity has been available. Here we use unbiased long-read sequencing of full-length neurexin (Nrxn)1α, Nrxn1β, Nrxn2β, Nrxn3α, and Nrxn3β mRNAs to systematically assess how many sites of alternative splicing are used in neurexins with a significant frequency, and whether alternative splicing events at these sites are independent of each other. In sequencing more than 25,000 full-length mRNAs, we identified a novel, abundantly used alternatively spliced exon of Nrxn1α and Nrxn3α (referred to as alternatively spliced sequence 6) that encodes a 9-residue insertion in the flexible hinge region between the fifth LNS (laminin-α, neurexin, sex hormone-binding globulin) domain and the third EGF-like sequence. In addition, we observed several larger-scale events of alternative splicing that deleted multiple domains and were much less frequent than the canonical six sites of alternative splicing in neurexins. All of the six canonical events of alternative splicing appear to be independent of each other, suggesting that neurexins may exhibit an even larger isoform diversity than previously envisioned and comprise thousands of variants. Our data are consistent with the notion that α-neurexins represent extracellular protein-interaction scaffolds in which different LNS and EGF domains mediate distinct interactions that affect diverse functions and are independently regulated by independent events of alternative splicing.
|
31165
|
Otto C, Stadler PF, Hoffmann S. Lacking alignments? The next-generation sequencing mapper segemehl revisited. Bioinformatics 2014; 30:1837-43. [PMID: 24626854 DOI: 10.1093/bioinformatics/btu146] [Citation(s) in RCA: 58] [Impact Index Per Article: 5.3] [Indexed: 01/13/2023]
Abstract
MOTIVATION Next-generation sequencing has become an important tool in molecular biology. Various protocols to investigate genomic, transcriptomic and epigenomic features across virtually all species and tissues have been devised. For most of these experiments, one of the first crucial steps of bioinformatic analysis is the mapping of reads to reference genomes. RESULTS Here, we present thorough benchmarks of our read aligner segemehl in comparison with other state-of-the-art methods. Furthermore, we introduce the tool lack to rescue unmapped RNA-seq reads which works in conjunction with segemehl and many other frequently used split-read aligners. AVAILABILITY lack is distributed together with segemehl and freely available at www.bioinf.uni-leipzig.de/Software/segemehl/.
Affiliation(s)
- Christian Otto
- Transcriptome Bioinformatics Junior Research Group, LIFE-Leipzig Research Center for Civilization Diseases, Interdisciplinary Center for Bioinformatics, Bioinformatics Group, Department of Computer Science, University Leipzig, RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany, Santa Fe Institute, Santa Fe, New Mexico, USA, Department of Theoretical Chemistry, University of Vienna, Austria, Max-Planck-Institute for Mathematics in Sciences, Leipzig, Germany and Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark
- Peter F Stadler
- Transcriptome Bioinformatics Junior Research Group, LIFE-Leipzig Research Center for Civilization Diseases, Interdisciplinary Center for Bioinformatics, Bioinformatics Group, Department of Computer Science, University Leipzig, RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany, Santa Fe Institute, Santa Fe, New Mexico, USA, Department of Theoretical Chemistry, University of Vienna, Austria, Max-Planck-Institute for Mathematics in Sciences, Leipzig, Germany and Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark
- Steve Hoffmann
- Transcriptome Bioinformatics Junior Research Group, LIFE-Leipzig Research Center for Civilization Diseases, Interdisciplinary Center for Bioinformatics, Bioinformatics Group, Department of Computer Science, University Leipzig, RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig, Germany, Santa Fe Institute, Santa Fe, New Mexico, USA, Department of Theoretical Chemistry, University of Vienna, Austria, Max-Planck-Institute for Mathematics in Sciences, Leipzig, Germany and Center for non-coding RNA in Technology and Health, University of Copenhagen, Denmark
|
31166
|
Barsi JC, Tu Q, Davidson EH. General approach for in vivo recovery of cell type-specific effector gene sets. Genome Res 2014; 24:860-8. [PMID: 24604781 PMCID: PMC4009615 DOI: 10.1101/gr.167668.113] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.5] [Indexed: 01/16/2023]
Abstract
Differentially expressed, cell type-specific effector gene sets hold the key to multiple important problems in biology, from theoretical aspects of developmental gene regulatory networks (GRNs) to various practical applications. Although individual cell types of interest have been recovered by various methods and analyzed, systematic recovery of multiple cell type-specific gene sets from whole developing organisms has remained problematic. Here we describe a general methodology using the sea urchin embryo, a material of choice because of the large-scale GRNs already solved for this model system. This method utilizes the regulatory states expressed by given cells of the embryo to define cell type and includes a fluorescence activated cell sorting (FACS) procedure that results in no perturbation of transcript representation. We have extensively validated the method by spatial and qualitative analyses of the transcriptome expressed in isolated embryonic skeletogenic cells and as a consequence, generated a prototypical cell type-specific transcriptome database.
Affiliation(s)
- Julius C Barsi
- Division of Biology, California Institute of Technology, Pasadena, California 91125, USA
|
31167
|
Gatto A, Torroja-Fungairiño C, Mazzarotto F, Cook SA, Barton PJR, Sánchez-Cabo F, Lara-Pezzi E. FineSplice, enhanced splice junction detection and quantification: a novel pipeline based on the assessment of diverse RNA-Seq alignment solutions. Nucleic Acids Res 2014; 42:e71. [PMID: 24574529 PMCID: PMC4005686 DOI: 10.1093/nar/gku166] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 01/26/2023] Open
Abstract
Alternative splicing is the main mechanism governing protein diversity. The recent developments in RNA-Seq technology have enabled the study of the global impact and regulation of this biological process. However, the lack of standardized protocols constitutes a major bottleneck in the analysis of alternative splicing. This is particularly important for the identification of exon–exon junctions, which is a critical step in any analysis workflow. Here we performed a systematic benchmarking of alignment tools to dissect the impact of design and method on the mapping, detection and quantification of splice junctions from multi-exon reads. Accordingly, we devised a novel pipeline based on TopHat2 combined with a splice junction detection algorithm, which we have named FineSplice. FineSplice allows effective elimination of spurious junction hits arising from artefactual alignments, achieving up to 99% precision in both real and simulated data sets and yielding superior F1 scores under most tested conditions. The proposed strategy conjugates an efficient mapping solution with a semi-supervised anomaly detection scheme to filter out false positives and allows reliable estimation of expressed junctions from the alignment output. Ultimately this provides more accurate information to identify meaningful splicing patterns. FineSplice is freely available at https://sourceforge.net/p/finesplice/.
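The precision and F1 figures quoted above can be read against a simple set comparison between predicted and reference junctions. The sketch below is illustrative only: it is not FineSplice code, and the junction coordinates and the function name are hypothetical.

```python
def junction_scores(predicted, truth):
    """Return (precision, recall, F1) for two collections of splice junctions."""
    predicted, truth = set(predicted), set(truth)
    true_pos = len(predicted & truth)
    # Precision: fraction of predicted junctions that are real.
    precision = true_pos / len(predicted) if predicted else 0.0
    # Recall: fraction of real junctions that were recovered.
    recall = true_pos / len(truth) if truth else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical junctions as (chromosome, donor, acceptor) tuples.
truth = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150), ("chr2", 500, 900)}
predicted = {("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 150), ("chr3", 10, 20)}
precision, recall, f1 = junction_scores(predicted, truth)
```

Filtering spurious junctions raises precision; F1 shows whether that gain comes at the cost of discarding true junctions.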
Affiliation(s)
- Alberto Gatto
- Cardiovascular Development and Repair Department, Centro Nacional de Investigaciones Cardiovasculares, Madrid, 28029, Spain, Bioinformatics Unit, Centro Nacional de Investigaciones Cardiovasculares, Madrid, 28029, Spain, National Heart and Lung Institute, Imperial College London, London SW7 2AZ, UK, Cardiovascular Biomedical Research Unit, NIHR Royal Brompton and Harefield NHS Foundation Trust, London SW3 6NP, UK, Department of Cardiology, National Heart Centre Singapore, Singapore 168752, Singapore and Cardiovascular and Metabolic Disorders Program, Duke-NUS Graduate Medical School, Singapore 169857, Singapore
|
31168
|
Hoffmann S, Otto C, Doose G, Tanzer A, Langenberger D, Christ S, Kunz M, Holdt LM, Teupser D, Hackermüller J, Stadler PF. A multi-split mapping algorithm for circular RNA, splicing, trans-splicing and fusion detection. Genome Biol 2014; 15:R34. [PMID: 24512684 PMCID: PMC4056463 DOI: 10.1186/gb-2014-15-2-r34] [Citation(s) in RCA: 204] [Impact Index Per Article: 18.5] [Received: 04/22/2013] [Accepted: 02/10/2014] [Indexed: 11/25/2022] Open
Abstract
Numerous high-throughput sequencing studies have focused on detecting conventionally spliced mRNAs in RNA-seq data. However, non-standard RNAs arising through gene fusion, circularization or trans-splicing are often neglected. We introduce a novel, unbiased algorithm to detect splice junctions from single-end cDNA sequences. In contrast to other methods, our approach accommodates multi-junction structures. Our method compares favorably with competing tools for conventionally spliced mRNAs and, with a gain of up to 40% of recall, systematically outperforms them on reads with multiple splits, trans-splicing and circular products. The algorithm is integrated into our mapping tool segemehl (http://www.bioinf.uni-leipzig.de/Software/segemehl/).
Affiliation(s)
- Steve Hoffmann
- Junior Research Group Transcriptome Bioinformatics, Leipzig University, Haertelstrasse 16-18, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, University Leipzig, Haertelstrasse 16-18, Leipzig, Germany
- LIFE Research Center for Civilization Diseases, Leipzig University
- Christian Otto
- Junior Research Group Transcriptome Bioinformatics, Leipzig University, Haertelstrasse 16-18, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, University Leipzig, Haertelstrasse 16-18, Leipzig, Germany
- LIFE Research Center for Civilization Diseases, Leipzig University
- Gero Doose
- Junior Research Group Transcriptome Bioinformatics, Leipzig University, Haertelstrasse 16-18, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, University Leipzig, Haertelstrasse 16-18, Leipzig, Germany
- LIFE Research Center for Civilization Diseases, Leipzig University
- Andrea Tanzer
- Department of Theoretical Chemistry, University of Vienna, Währinger Strasse 17, Vienna, Austria
- David Langenberger
- Junior Research Group Transcriptome Bioinformatics, Leipzig University, Haertelstrasse 16-18, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, University Leipzig, Haertelstrasse 16-18, Leipzig, Germany
- LIFE Research Center for Civilization Diseases, Leipzig University
- Sabina Christ
- RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology – IZI, Perlickstrasse 1, Leipzig, Germany
- Manfred Kunz
- Department of Dermatology, Venerology and Allergology, Leipzig University, Philipp-Rosenthal-Strasse 23, Leipzig, Germany
- Lesca M Holdt
- LIFE Research Center for Civilization Diseases, Leipzig University
- Institute of Laboratory Medicine, Ludwig Maximilian University, Marchioninistrasse 15, Munich, Germany
- Daniel Teupser
- LIFE Research Center for Civilization Diseases, Leipzig University
- Institute of Laboratory Medicine, Ludwig Maximilian University, Marchioninistrasse 15, Munich, Germany
- Jörg Hackermüller
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, University Leipzig, Haertelstrasse 16-18, Leipzig, Germany
- RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology – IZI, Perlickstrasse 1, Leipzig, Germany
- Young Investigators Group Bioinformatics and Transcriptomics, Department of Proteomics, Helmholtz Centre for Environmental Research – UFZ, Permoserstrasse 15, Leipzig, Germany
- Peter F Stadler
- Junior Research Group Transcriptome Bioinformatics, Leipzig University, Haertelstrasse 16-18, Leipzig, Germany
- Interdisciplinary Center for Bioinformatics and Bioinformatics Group, University Leipzig, Haertelstrasse 16-18, Leipzig, Germany
- LIFE Research Center for Civilization Diseases, Leipzig University
- Department of Theoretical Chemistry, University of Vienna, Währinger Strasse 17, Vienna, Austria
- Max Planck Institute for Mathematics in the Sciences, Inselstrasse 22, Leipzig, Germany
- Center for non-coding RNA in Technology and Health, University of Copenhagen, Grønnegårdsvej 3, Frederiksberg, Denmark
- Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM, USA
|
31169
|
Ning L, Liu G, Li G, Hou Y, Tong Y, He J. Current challenges in the bioinformatics of single cell genomics. Front Oncol 2014; 4:7. [PMID: 24478987 PMCID: PMC3902584 DOI: 10.3389/fonc.2014.00007] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 10/30/2013] [Accepted: 01/12/2014] [Indexed: 11/13/2022] Open
Abstract
Single cell genomics is a rapidly growing field with many new techniques emerging in the past few years. However, few bioinformatics tools specific for single cell genomics analysis are available. Single cell DNA/RNA sequencing data usually have low genome coverage and high amplification bias, which makes bioinformatics analysis challenging. Many current bioinformatics tools developed for bulk cell sequencing do not work well with single cell sequencing data. Here, we summarize current challenges in the bioinformatics analysis of single cell genomic DNA sequencing and single cell transcriptomes. These challenges include calling copy number variations, identifying mutated genes in tumor samples, reconstructing cell lineages, recovering low abundant transcripts, and improving the accuracy of quantitative analysis of transcripts. Development in single cell genomics bioinformatics analysis will promote the application of this technology to basic biology and medical research.
Affiliation(s)
- Luwen Ning
- Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Yin Tong
- Department of Biology, South University of Science and Technology of China, Shenzhen, China
- Jiankui He
- Department of Biology, South University of Science and Technology of China, Shenzhen, China
|
31170
|
Deng Q, Ramsköld D, Reinius B, Sandberg R. Single-cell RNA-seq reveals dynamic, random monoallelic gene expression in mammalian cells. Science 2014; 343:193-6. [PMID: 24408435 DOI: 10.1126/science.1245316] [Citation(s) in RCA: 883] [Impact Index Per Article: 80.3] [Indexed: 12/27/2022]
Abstract
Expression from both alleles is generally observed in analyses of diploid cell populations, but studies addressing allelic expression patterns genome-wide in single cells are lacking. Here, we present global analyses of allelic expression across individual cells of mouse preimplantation embryos of mixed background (CAST/EiJ × C57BL/6J). We discovered abundant (12 to 24%) monoallelic expression of autosomal genes and that expression of the two alleles occurs independently. The monoallelic expression appeared random and dynamic because there was considerable variation among closely related embryonic cells. Similar patterns of monoallelic expression were observed in mature cells. Our allelic expression analysis also demonstrates the de novo inactivation of the paternal X chromosome. We conclude that independent and stochastic allelic transcription generates abundant random monoallelic expression in the mammalian cell.
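To make the notion of a monoallelic call concrete, the sketch below labels one gene in one cell from its allele-resolved read counts. The rule, the `min_reads` cutoff, the labels and the counts are all assumptions chosen for illustration; the abstract does not describe the classification criteria actually used in the study.

```python
def classify_allelic_expression(allele_a_reads, allele_b_reads, min_reads=5):
    """Label a gene in one cell from allele-resolved read counts.

    Hypothetical rule: a gene needs at least `min_reads` informative reads
    to be called, and is monoallelic when every read comes from one allele.
    """
    total = allele_a_reads + allele_b_reads
    if total < min_reads:
        return "not_detected"
    if allele_b_reads == 0:
        return "monoallelic_a"
    if allele_a_reads == 0:
        return "monoallelic_b"
    return "biallelic"


# Hypothetical (allele A, allele B) read counts for four genes in a single cell.
counts = {"gene1": (20, 0), "gene2": (0, 12), "gene3": (9, 11), "gene4": (2, 1)}
calls = {gene: classify_allelic_expression(a, b) for gene, (a, b) in counts.items()}
# Fraction of detected genes that were called monoallelic in this cell.
detected = [c for c in calls.values() if c != "not_detected"]
monoallelic_fraction = sum(c.startswith("monoallelic") for c in detected) / len(detected)
```

A rule this strict would be sensitive to sequencing depth and transcript dropout in single cells, so any real analysis would need to account for technical loss of one allele.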
Affiliation(s)
- Qiaolin Deng
- Ludwig Institute for Cancer Research, Box 240, 171 77 Stockholm, Sweden
|
31171
|
Schueler M, Munschauer M, Gregersen LH, Finzel A, Loewer A, Chen W, Landthaler M, Dieterich C. Differential protein occupancy profiling of the mRNA transcriptome. Genome Biol 2014; 15:R15. [PMID: 24417896 PMCID: PMC4056462 DOI: 10.1186/gb-2014-15-1-r15] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.7] [Received: 08/13/2013] [Accepted: 01/13/2014] [Indexed: 12/16/2022] Open
Abstract
Background RNA-binding proteins (RBPs) mediate mRNA biogenesis, translation and decay. We recently developed an approach to profile transcriptome-wide RBP contacts on polyadenylated transcripts by next-generation sequencing. A comparison of such profiles from different biological conditions has the power to unravel dynamic changes in protein-contacted cis-regulatory mRNA regions without a priori knowledge of the regulatory protein component. Results We compared protein occupancy profiles of polyadenylated transcripts in MCF7 and HEK293 cells. Briefly, we developed a bioinformatics workflow to identify differential crosslinking sites in cDNA reads of 4-thiouridine crosslinked polyadenylated RNA samples. We identified 30,000 differential crosslinking sites between MCF7 and HEK293 cells at an estimated false discovery rate of 10%. 73% of all reported differential protein-RNA contact sites cannot be explained by local changes in exon usage as indicated by complementary RNA-seq data. The majority of differentially crosslinked positions are located in 3′ UTRs, show distinct secondary-structure characteristics and overlap with binding sites of known RBPs, such as ELAVL1. Importantly, mRNA transcripts with the most significant occupancy changes show elongated mRNA half-lives in MCF7 cells. Conclusions We present a global comparison of protein occupancy profiles from different cell types, and provide evidence for altered mRNA metabolism as a result of differential protein-RNA contacts. Additionally, we introduce POPPI, a bioinformatics workflow for the analysis of protein occupancy profiling experiments. Our work demonstrates the value of protein occupancy profiling for assessing cis-regulatory RNA sequence space and its dynamics in growth, development and disease.
|
31172
|
Klein HU, Schäfer M, Porse BT, Hasemann MS, Ickstadt K, Dugas M. Integrative analysis of histone ChIP-seq and transcription data using Bayesian mixture models. Bioinformatics 2014; 30:1154-1162. [PMID: 24403540 DOI: 10.1093/bioinformatics/btu003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 08/20/2013] [Accepted: 12/30/2013] [Indexed: 01/08/2023]
Abstract
MOTIVATION Histone modifications are a key epigenetic mechanism to activate or repress the transcription of genes. Datasets of matched transcription data and histone modification data obtained by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel bioinformatics approach to detect genes that show different transcript abundances between two conditions putatively caused by alterations in histone modification. RESULTS We introduce a correlation measure for integrative analysis of ChIP-seq and gene transcription data measured by RNA sequencing or microarrays and demonstrate that a proper normalization of ChIP-seq data is crucial. We suggest applying Bayesian mixture models of different types of distributions to further study the distribution of the correlation measure. The implicit classification of the mixture models is used to detect genes with differences between two conditions in both gene transcription and histone modification. The method is applied to different datasets, and its superiority to a naive separate analysis of both data types is demonstrated. AVAILABILITY AND IMPLEMENTATION R/Bioconductor package epigenomix. CONTACT h.klein@uni-muenster.de Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hans-Ulrich Klein
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Martin Schäfer
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Bo T Porse
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Marie S Hasemann
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Katja Ickstadt
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
| | - Martin Dugas
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
|
31173
|
Zheng CL, Kawane S, Bottomly D, Wilmot B. Analysis considerations for utilizing RNA-Seq to characterize the brain transcriptome. International Review of Neurobiology 2014; 116:21-54. [PMID: 25172470 DOI: 10.1016/b978-0-12-801105-8.00002-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/01/2023]
Abstract
RNA-Seq allows one to examine not only gene expression but also expression of noncoding RNAs, alternative splicing, and allele-specific expression. With this increased sensitivity and dynamic range come computational and statistical considerations, which depend strongly on the biological question being asked. We highlight these to provide an overview of their importance and the impact they can have on downstream interpretation of the brain transcriptome.
Affiliation(s)
- Christina L Zheng
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA; Knight Cancer Institute, Oregon Health and Science University, Portland, Oregon, USA.
| | - Sunita Kawane
- Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
| | - Daniel Bottomly
- Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
| | - Beth Wilmot
- Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, Oregon, USA; Clinical & Translational Research Institute, Oregon Health and Science University, Portland, Oregon, USA
|
31174
|
Kleinman CL, Gerges N, Papillon-Cavanagh S, Sin-Chan P, Pramatarova A, Quang DAK, Adoue V, Busche S, Caron M, Djambazian H, Bemmo A, Fontebasso AM, Spence T, Schwartzentruber J, Albrecht S, Hauser P, Garami M, Klekner A, Bognar L, Montes JL, Staffa A, Montpetit A, Berube P, Zakrzewska M, Zakrzewski K, Liberski PP, Dong Z, Siegel PM, Duchaine T, Perotti C, Fleming A, Faury D, Remke M, Gallo M, Dirks P, Taylor MD, Sladek R, Pastinen T, Chan JA, Huang A, Majewski J, Jabado N. Fusion of TTYH1 with the C19MC microRNA cluster drives expression of a brain-specific DNMT3B isoform in the embryonal brain tumor ETMR. Nat Genet 2014; 46:39-44. [PMID: 24316981 DOI: 10.1038/ng.2849] [Citation(s) in RCA: 139] [Impact Index Per Article: 12.6] [Received: 06/15/2013] [Accepted: 11/13/2013] [Indexed: 12/20/2022]
Abstract
Embryonal tumors with multilayered rosettes (ETMRs) are rare, deadly pediatric brain tumors characterized by high-level amplification of the microRNA cluster C19MC. We performed integrated genetic and epigenetic analyses of 12 ETMR samples and identified, in all cases, C19MC fusions to TTYH1 driving expression of the microRNAs. ETMR tumors, cell lines and xenografts showed a specific DNA methylation pattern distinct from those of other tumors and normal tissues. We detected extreme overexpression of a previously uncharacterized isoform of DNMT3B originating at an alternative promoter that is active only in the first weeks of neural tube development. Transcriptional and immunohistochemical analyses suggest that C19MC-dependent DNMT3B deregulation is mediated by RBL2, a known repressor of DNMT3B. Transfection with individual C19MC microRNAs resulted in DNMT3B upregulation and RBL2 downregulation in cultured cells. Our data suggest a potential oncogenic re-engagement of an early developmental program in ETMR via epigenetic alteration mediated by an embryonic, brain-specific DNMT3B isoform.
Affiliation(s)
- Claudia L Kleinman
- [1] McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada. [2] Department of Human Genetics, McGill University, Montreal, Quebec, Canada. [3]
| | - Noha Gerges
- [1] Department of Human Genetics, McGill University, Montreal, Quebec, Canada. [2]
| | | | - Patrick Sin-Chan
- Division of Hematology-Oncology, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Albena Pramatarova
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | | | - Véronique Adoue
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Stephan Busche
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Maxime Caron
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Haig Djambazian
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Amandine Bemmo
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Adam M Fontebasso
- Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada
| | - Tara Spence
- Division of Hematology-Oncology, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | | | - Steffen Albrecht
- Department of Pathology, McGill University Health Centre, Montreal, Quebec, Canada
| | - Peter Hauser
- Second Department of Paediatrics, Semmelweis University, Budapest, Hungary
| | - Miklos Garami
- Second Department of Paediatrics, Semmelweis University, Budapest, Hungary
| | - Almos Klekner
- Department of Neurosurgery, Medical and Health Science Center, University of Debrecen, Debrecen, Hungary
| | - Laszlo Bognar
- Department of Neurosurgery, Medical and Health Science Center, University of Debrecen, Debrecen, Hungary
| | - Jose-Luis Montes
- Division of Neurosurgery, Department of Surgery, Montreal Children's Hospital, McGill University Health Centre, Montreal, Quebec, Canada
| | - Alfredo Staffa
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Alexandre Montpetit
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Pierre Berube
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Magdalena Zakrzewska
- Department of Molecular Pathology and Neuropathology, Medical University of Lodz, Lodz, Poland
| | - Krzysztof Zakrzewski
- Department of Neurosurgery, Polish Mother's Memorial Hospital Research Institute, Lodz, Poland
| | - Pawel P Liberski
- Department of Molecular Pathology and Neuropathology, Medical University of Lodz, Lodz, Poland
| | - Zhifeng Dong
- Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, Canada
| | - Peter M Siegel
- Rosalind and Morris Goodman Cancer Research Centre, McGill University, Montreal, Quebec, Canada
| | - Thomas Duchaine
- Department of Biochemistry, McGill University, Montreal, Quebec, Canada
| | - Christian Perotti
- Department of Pathology & Laboratory Medicine, University of Calgary, Calgary, Alberta, Canada
| | - Adam Fleming
- Division of Pediatric Hematology-Oncology, Department of Pediatrics, McGill University and the McGill University Health Centre Research Institute, Montreal, Quebec, Canada
| | - Damien Faury
- Division of Pediatric Hematology-Oncology, Department of Pediatrics, McGill University and the McGill University Health Centre Research Institute, Montreal, Quebec, Canada
| | - Marc Remke
- Division of Neurosurgery, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Marco Gallo
- Division of Neurosurgery, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Peter Dirks
- Division of Neurosurgery, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Michael D Taylor
- Division of Neurosurgery, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada
| | - Robert Sladek
- [1] McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada. [2] Department of Human Genetics, McGill University, Montreal, Quebec, Canada
| | - Tomi Pastinen
- McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada
| | - Jennifer A Chan
- Department of Pathology & Laboratory Medicine, University of Calgary, Calgary, Alberta, Canada
| | - Annie Huang
- [1] Division of Hematology-Oncology, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Ontario, Canada. [2] Program in Cell Biology, Arthur & Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Department of Pediatrics, University of Toronto, Toronto, Ontario, Canada. [3]
| | - Jacek Majewski
- [1] McGill University and Génome Québec Innovation Centre, Montreal, Quebec, Canada. [2] Department of Human Genetics, McGill University, Montreal, Quebec, Canada. [3]
| | - Nada Jabado
- [1] Department of Human Genetics, McGill University, Montreal, Quebec, Canada. [2] Division of Experimental Medicine, McGill University, Montreal, Quebec, Canada. [3]
|
31175
|
Alamancos GP, Agirre E, Eyras E. Methods to study splicing from high-throughput RNA sequencing data. Methods Mol Biol 2014; 1126:357-97. [PMID: 24549677 DOI: 10.1007/978-1-62703-980-2_26] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Indexed: 12/02/2022]
Abstract
The development of novel high-throughput sequencing (HTS) methods for RNA (RNA-Seq) has provided a very powerful means to study splicing under multiple conditions at unprecedented depth. However, the complexity of the information to be analyzed has turned this into a challenging task. In the last few years, a plethora of tools have been developed, allowing researchers to process RNA-Seq data to study the expression of isoforms and splicing events, and their relative changes under different conditions. We provide an overview of the methods available to study splicing from short RNA-Seq data, which could serve as an entry point for users who need to decide on a suitable tool for a specific analysis. We also propose a classification of the tools according to the operations they perform, to facilitate the comparison and choice of methods.
Affiliation(s)
- Gael P Alamancos
- Computational Genomics, Universitat Pompeu Fabra, Barcelona, Spain
|
31176
|
Rehrauer H, Opitz L, Tan G, Sieverling L, Schlapbach R. Blind spots of quantitative RNA-seq: the limits for assessing abundance, differential expression, and isoform switching. BMC Bioinformatics 2013; 14:370. [PMID: 24365034 PMCID: PMC3879183 DOI: 10.1186/1471-2105-14-370] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 07/31/2013] [Accepted: 12/19/2013] [Indexed: 11/10/2022] Open
Abstract
Background RNA-seq is now widely used to quantitatively assess gene expression, expression differences and isoform switching, and promises to deliver results for the entire transcriptome. However, whether the transcriptional state of a gene can be captured accurately depends critically on library preparation, read alignment, expression estimation and the tests for differential expression and isoform switching. Comparisons are available for the individual steps, but there is not yet a systematic investigation of which specific genes are impacted by biases throughout the entire analysis workflow. It is especially unclear whether, for a given gene, expression changes and isoform switches can be detected with current methods and protocols. Results For the human genes, we report their detectability under various conditions using different approaches. Overall, we find that the input material has the biggest influence and may, depending on the protocol and RNA degradation, already exhibit strong length-dependent over- and underrepresentation of transcripts. For 50% of the isoforms, the alignment step aligns up to 99% of the reads correctly; only in the presence of transcript modifications do mainly short isoforms have a low alignment rate. In our dataset, we found that, depending on the aligner and the input material used, the expression estimates of up to 93% of the genes are accurate within a factor of two, with the deviations being due to ambiguous alignments. Detection of differential expression using a negative-binomial count model works reliably for our simulated data but is dependent on the count accuracy. Interestingly, using the fold-change instead of the p-value as a score for differential expression yields the same performance in the situation of three replicates and a true change of two-fold. Isoform switching is harder to detect, and for at least 109 genes the isoform differences evade detection independent of the method used.
Conclusions RNA-seq is a reliable tool, but the repetitive nature of the human genome makes the origin of the reads ambiguous and limits the detectability for certain genes. RNA-seq does not represent isoforms equally well independent of their size, which may range from ~200 nt to ~100,000 nt. Researchers are advised to verify that their target genes do not have extreme properties with respect to repeated regions, GC content, and isoform length and complexity.
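The "accurate within a factor of two" criterion in the Results can be made concrete with a short sketch. This is only an illustration under stated assumptions: the function name `fraction_within_factor`, the toy gene names and the expression values are hypothetical and are not taken from the paper.

```python
import math

def fraction_within_factor(true_expr, est_expr, factor=2.0):
    """Fraction of expressed genes whose estimate lies within `factor` of the truth.

    Both arguments map gene name -> expression value (hypothetical toy input).
    """
    log_limit = math.log2(factor)
    hits = 0
    total = 0
    for gene, truth in true_expr.items():
        if truth <= 0:
            continue  # ratio is undefined for genes with no true expression
        total += 1
        est = est_expr.get(gene, 0.0)
        # An estimate of zero can never be within a finite factor of the truth.
        if est > 0 and abs(math.log2(est / truth)) <= log_limit:
            hits += 1
    return hits / total if total else float("nan")

# Toy data (assumed values, for illustration only).
truth = {"geneA": 100.0, "geneB": 50.0, "geneC": 10.0, "geneD": 8.0}
estimate = {"geneA": 120.0, "geneB": 20.0, "geneC": 19.0, "geneD": 0.0}
print(fraction_within_factor(truth, estimate))  # 0.5
```

Working on the log2 scale treats over- and underestimation symmetrically, which matches the usual reading of "within a factor of two".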
Affiliation(s)
- Hubert Rehrauer
- Functional Genomics Center Zurich, University of Zurich/ETH, Zurich, Switzerland.
|
31177
|
Steijger T, Abril JF, Engström PG, Kokocinski F, Hubbard TJ, Guigó R, Harrow J, Bertone P. Assessment of transcript reconstruction methods for RNA-seq. Nat Methods 2013; 10:1177-84. [PMID: 24185837 PMCID: PMC3851240 DOI: 10.1038/nmeth.2714] [Citation(s) in RCA: 471] [Impact Index Per Article: 39.3] [Received: 03/31/2013] [Accepted: 09/23/2013] [Indexed: 11/09/2022]
Abstract
We evaluated 25 protocol variants of 14 independent computational methods for exon identification, transcript reconstruction and expression-level quantification from RNA-seq data. Our results show that most algorithms are able to identify discrete transcript components with high success rates but that assembly of complete isoform structures poses a major challenge even when all constituent elements are identified. Expression-level estimates also varied widely across methods, even when based on similar transcript models. Consequently, the complexity of higher eukaryotic genomes imposes severe limitations on transcript recall and splice product discrimination that are likely to remain limiting factors for the analysis of current-generation RNA-seq data.
Affiliation(s)
- Tamara Steijger
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | - Josep F Abril
- Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Barcelona, Spain
| | - Pär G Engström
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
| | | | | | - Roderic Guigó
- Center for Genomic Regulation, Barcelona, Spain
- Universitat Pompeu Fabra, Barcelona, Spain
| | | | - Paul Bertone
- European Molecular Biology Laboratory, European Bioinformatics Institute, Cambridge, UK
- Genome Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Developmental Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany
- Wellcome Trust - Medical Research Council Cambridge Stem Cell Institute, University of Cambridge, Cambridge, UK
|
31178
|
Antonescu CR, Sung YS, Chen CL, Zhang L, Chen HW, Singer S, Agaram NP, Sboner A, Fletcher CD. Novel ZC3H7B-BCOR, MEAF6-PHF1, and EPC1-PHF1 fusions in ossifying fibromyxoid tumors--molecular characterization shows genetic overlap with endometrial stromal sarcoma. Genes Chromosomes Cancer 2013; 53:183-93. [PMID: 24285434 DOI: 10.1002/gcc.22132] [Citation(s) in RCA: 127] [Impact Index Per Article: 10.6] [Received: 10/15/2013] [Accepted: 11/07/2013] [Indexed: 12/17/2022] Open
Abstract
PHF1 gene rearrangements have recently been described in around 50% of ossifying fibromyxoid tumors (OFMT), including benign and malignant cases, with a small subset showing EP400-PHF1 fusions. In the remaining cases no alternative gene fusions have been identified. PHF1-negative OFMT, especially if lacking S100 protein staining or peripheral ossification, are difficult to diagnose and distinguish from other soft tissue mimics. In seeking more comprehensive molecular characterization, we investigated a large cohort of 39 OFMT of various anatomic sites, immunoprofiles and grades of malignancy. Tumors were screened for PHF1 and EP400 rearrangements by FISH. RNA sequencing was performed in two index cases (OFMT1, OFMT3), negative for EP400-PHF1 fusions, followed by data analysis with FusionSeq, a modular computational tool developed to discover gene fusions from paired-end RNA-seq data. Two novel fusions were identified: ZC3H7B-BCOR in OFMT1 and MEAF6-PHF1 in OFMT3. After being validated by FISH and RT-PCR, these abnormalities were screened for in the remaining cases. With these additional gene fusions, 33/39 (85%) of OFMTs demonstrated recurrent gene rearrangements, which can be used as molecular markers in challenging cases. The most common abnormality is PHF1 gene rearrangement (80%), being present in benign, atypical and malignant lesions, with fusion to EP400 in 44% of cases. ZC3H7B-BCOR and MEAF6-PHF1 fusions occurred predominantly in S100 protein-negative and malignant OFMT. As similar gene fusions were reported in endometrial stromal sarcomas, we screened for potential gene abnormalities in JAZF1 and EPC1 by FISH and found two additional cases with EPC1-PHF1 fusions.
|
31179
|
The human cap-binding complex is functionally connected to the nuclear RNA exosome. Nat Struct Mol Biol 2013; 20:1367-76. [PMID: 24270879 PMCID: PMC3923317 DOI: 10.1038/nsmb.2703] [Citation(s) in RCA: 167] [Impact Index Per Article: 13.9] [Received: 07/02/2013] [Accepted: 10/04/2013] [Indexed: 11/29/2022]
Abstract
Nuclear processing and quality control of eukaryotic RNA is mediated by the RNA exosome, which is regulated by accessory factors. However, the mechanism of exosome recruitment to its ribonucleoprotein (RNP) targets remains poorly understood. Here we disclose a physical link between the human exosome and the cap-binding complex (CBC). The CBC associates with the ARS2 protein to form CBC-ARS2 (CBCA), and then further connects, together with the ZC3H18 protein, to the nuclear exosome targeting (NEXT) complex, forming CBC-NEXT (CBCN). RNA immunoprecipitation using CBCN factors, as well as the analysis of combinatorial depletion of CBCN and exosome components, underscores the functional relevance of CBC-exosome bridging at the level of target RNA. Specifically, CBCA suppresses read-through products of several RNA families by promoting their transcriptional termination. We suggest that the RNP 5′ cap links transcription termination to exosomal RNA degradation via CBCN.
|
31180
|
Batut P, Gingeras TR. RAMPAGE: promoter activity profiling by paired-end sequencing of 5'-complete cDNAs. Curr Protoc Mol Biol 2013; 104:Unit 25B.11. [PMID: 24510412 DOI: 10.1002/0471142727.mb25b11s104] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.7] [Indexed: 11/09/2022]
Abstract
RNA annotation and mapping of promoters for analysis of gene expression (RAMPAGE) is a method that harnesses highly specific sequencing of 5'-complete complementary DNAs to identify transcription start sites (TSSs) genome-wide. Although TSS mapping has historically relied on detection of 5'-complete cDNAs, current genome-wide approaches typically have limited specificity and provide only scarce information regarding transcript structure. RAMPAGE allows for highly stringent selection of 5'-complete molecules, thus allowing base-resolution TSS identification with a high signal-to-noise ratio. Paired-end sequencing of medium-length cDNAs yields transcript structure information that is essential to interpreting the relationship of TSSs to annotated genes and transcripts. As opposed to standard RNA-seq, RAMPAGE explicitly yields accurate and highly reproducible expression level estimates for individual promoters. Moreover, this approach offers a streamlined 2- to 3-day protocol that is optimized for extensive sample multiplexing, and is therefore adapted for large-scale projects. This method has been applied successfully to human and Drosophila samples, and in principle should be applicable to any eukaryotic system.
Affiliation(s)
- Philippe Batut
- Watson School of Biological Sciences, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York
|
31181
|
Systematic evaluation of spliced alignment programs for RNA-seq data. Nat Methods 2013; 10:1185-91. [PMID: 24185836 PMCID: PMC4018468 DOI: 10.1038/nmeth.2722] [Citation(s) in RCA: 363] [Impact Index Per Article: 30.3] [Received: 03/31/2013] [Accepted: 09/10/2013] [Indexed: 01/29/2023]
Abstract
Authors compare RNA-seq aligners on mouse and human data sets using benchmarks such as alignment yield, splice junction accuracy and suitability for transcript reconstruction. The work highlights the strength of each program and discusses outstanding needs in RNA-seq analysis. High-throughput RNA sequencing is an increasingly accessible method for studying gene structure and activity on a genome-wide scale. A critical step in RNA-seq data analysis is the alignment of partial transcript reads to a reference genome sequence. To assess the performance of current mapping software, we invited developers of RNA-seq aligners to process four large human and mouse RNA-seq data sets. In total, we compared 26 mapping protocols based on 11 programs and pipelines and found major performance differences between methods on numerous benchmarks, including alignment yield, basewise accuracy, mismatch and gap placement, exon junction discovery and suitability of alignments for transcript reconstruction. We observed concordant results on real and simulated RNA-seq data, confirming the relevance of the metrics employed. Future developments in RNA-seq alignment methods would benefit from improved placement of multimapped reads, balanced utilization of existing gene annotation and a reduced false discovery rate for splice junctions.
|
31182
|
Borozan I, Watt SN, Ferretti V. Evaluation of alignment algorithms for discovery and identification of pathogens using RNA-Seq. PLoS One 2013; 8:e76935. [PMID: 24204709 PMCID: PMC3813700 DOI: 10.1371/journal.pone.0076935] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Received: 05/28/2013] [Accepted: 09/04/2013] [Indexed: 01/02/2023] Open
Abstract
Next-generation sequencing technologies provide an unparalleled opportunity for the characterization and discovery of known and novel viruses. Because viruses are known to have the highest mutation rates when compared to eukaryotic and bacterial organisms, we assess the extent to which eleven well-known alignment algorithms (BLAST, BLAT, BWA, BWA-SW, BWA-MEM, BFAST, Bowtie2, Novoalign, GSNAP, SHRiMP2 and STAR) can be used for characterizing mutated and non-mutated viral sequences, including those that exhibit RNA splicing, in transcriptome samples. To evaluate aligners objectively we developed a realistic RNA-Seq simulation and evaluation framework (RiSER) and propose a new combined score to rank aligners for viral characterization in terms of their precision, sensitivity and alignment accuracy. We used RiSER to simulate both human and viral read sequences and suggest the best set of aligners for viral sequence characterization in human transcriptome samples. Our results show that significant and substantial differences exist between aligners and that a digital-subtraction-based viral identification framework can and should use different aligners for different parts of the process. We determine the extent to which mutated viral sequences can be effectively characterized and show that more sensitive aligners such as BLAST, BFAST, SHRiMP2, BWA-SW and GSNAP can accurately characterize substantially divergent viral sequences with up to 15% overall sequence mutation rate. We believe that the results presented here will be useful to researchers choosing aligners for viral sequence characterization using next-generation sequencing data.
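The three quantities the abstract names (precision, sensitivity and alignment accuracy) can be computed from simulated reads whose true origin is known. The sketch below is a hypothetical illustration only: the data layout, the position tolerance `tol` and the function name `aligner_metrics` are assumptions, and the abstract does not say how RiSER combines the three values into its score, so no combined score is attempted here.

```python
def aligner_metrics(truth, viral_calls, tol=5):
    """Precision, sensitivity and positional accuracy for viral read assignment.

    truth:       read_id -> (is_viral, true_position)   (simulated ground truth)
    viral_calls: read_id -> reported position, for reads the aligner placed
                 on the viral genome
    tol:         allowed offset in bases for a placement to count as accurate
    """
    tp = [r for r in viral_calls if truth[r][0]]          # viral reads called viral
    fp = [r for r in viral_calls if not truth[r][0]]      # human reads called viral
    fn = [r for r, (is_viral, _) in truth.items()
          if is_viral and r not in viral_calls]           # viral reads missed
    well_placed = [r for r in tp if abs(viral_calls[r] - truth[r][1]) <= tol]

    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "precision": ratio(len(tp), len(tp) + len(fp)),
        "sensitivity": ratio(len(tp), len(tp) + len(fn)),
        "alignment_accuracy": ratio(len(well_placed), len(tp)),
    }

# Toy example (assumed values): three viral reads and two human reads.
truth = {"r1": (True, 100), "r2": (True, 250), "r3": (True, 400),
         "r4": (False, 0), "r5": (False, 0)}
viral_calls = {"r1": 100, "r2": 260, "r4": 30}
metrics = aligner_metrics(truth, viral_calls)
print(metrics)
```

In this toy case two of three viral reads are recovered, one human read is misassigned, and one of the two recovered reads lands within the tolerance.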
Affiliation(s)
- Ivan Borozan
- Informatics and Bio-computing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Stuart N. Watt
- Informatics and Bio-computing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
| | - Vincent Ferretti
- Informatics and Bio-computing, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
|
31183
|
Differential L1 regulation in pluripotent stem cells of humans and apes. Nature 2013; 503:525-529. [PMID: 24153179 PMCID: PMC4064720 DOI: 10.1038/nature12686] [Citation(s) in RCA: 174] [Impact Index Per Article: 14.5] [Received: 10/03/2012] [Accepted: 09/17/2013] [Indexed: 12/11/2022]
Abstract
Identifying cellular and molecular differences between human and non-human primates (NHPs) is essential to the basic understanding of the evolution and diversity of our own species. Until now, preserved tissues have been the main source for most comparative studies between humans, chimpanzees (Pan troglodytes) and bonobos (Pan paniscus) [1,2]. However, these tissue samples do not fairly represent the distinctive traits of live cell behavior and are not amenable to genetic manipulation. We hypothesized that induced pluripotent stem cells (iPSCs) could be a unique biological resource to elucidate relevant phenotypical differences between human and NHPs and that those differences could have potential adaptation and speciation value. Here, we describe the generation and initial characterization of iPSCs from chimpanzees and bonobos as novel tools to explore factors that have contributed to great ape evolution. Comparative gene expression analysis of human and NHP iPSCs revealed differences in the regulation of Long Interspersed Nuclear Element-1 (LINE-1 or L1) transposons. A force of change in mammalian evolution, L1 elements are retrotransposons that have remained active during primate evolution [3-5]. Decreased levels of L1 restricting factors APOBEC3B (A3B) [6] and PIWIL2 [7] in NHP iPSCs correlated with increased L1 mobility and endogenous L1 mRNA levels. Moreover, results from manipulation of A3B and PIWIL2 levels in iPSCs supported a causal inverse relationship between levels of these proteins and L1 retrotransposition. Finally, we found increased copy numbers of species-specific L1 elements in the genome of chimpanzees compared to humans, supporting the idea that increased L1 mobility in NHPs is not limited to iPSCs in culture and may have also occurred in the germline or embryonic cells developmentally upstream to germline specification during primate evolution. We propose that differences in L1 mobility may have differentially shaped the genomes of humans and NHPs and could have ongoing adaptive significance.
|
31184
|
Strbenac D, Armstrong NJ, Yang JYH. Detection and classification of peaks in 5' cap RNA sequencing data. BMC Genomics 2013; 14 Suppl 5:S9. [PMID: 24564843 PMCID: PMC3852351 DOI: 10.1186/1471-2164-14-s5-s9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 02/01/2023] Open
Abstract
BACKGROUND The large-scale sequencing of 5' cap enriched cDNA promises to reveal the diversity of transcription initiation across entire genomes. The process of transcription is noisy, and there is often no single, exact start site. This creates the need for a fast and simple method of identifying transcription start peaks based on this type of data. Due to both biological and technical noise, many of the peaks seen are not real transcription initiation events. Classification of the observed peaks is an essential filtering step in the discovery of genuine initiation locations. RESULTS We develop a two-stage approach consisting of a fast and simple algorithm based on a sliding window with Poisson null distribution for detecting the genomic locations of peaks, followed by a linear support vector machine classifier to distinguish between peaks which represent the initiation of transcription and peaks that do not. Comparison of classification performance to the best existing method based on whole genome segmentation showed comparable precision and improved recall. Internal features, which are intrinsic to the data and require no further experiments, had high precision and recall rates. Addition of pooled external data or matched RNA sequencing data resulted in gains of recall with equivalent precision. CONCLUSIONS The Poisson sliding window model is an effective and fast way of taking the peak neighbourhood into account, and finding statistically significant peaks over a range of transcript expression values. It is orders of magnitude faster than doing whole genome segmentation. The support vector classification scheme has better precision and recall than existing methods. Integrating additional datasets is shown to provide minor gains in recall, in comparison to using only the cap-sequencing data.
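The first stage described in the Results, a sliding window tested against a Poisson null, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the window size, the significance threshold `alpha`, the use of the overall mean as the background rate, and the function names are all assumptions, and the second stage (the support vector machine classifier) is not shown.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF up to k - 1."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def call_peaks(counts, window=5, alpha=1e-3):
    """Return (start, end, p_value) for windows with more read starts than
    expected under a Poisson null. The background rate is assumed to be the
    mean count per position across the whole region."""
    n = len(counts)
    if n < window:
        return []
    lam = sum(counts) / n * window  # expected reads per window under the null
    peaks = []
    win_sum = sum(counts[:window])
    for start in range(n - window + 1):
        if start > 0:
            # Slide by one position: add the entering count, drop the leaving one.
            win_sum += counts[start + window - 1] - counts[start - 1]
        p = poisson_sf(win_sum, lam)
        if p < alpha:
            peaks.append((start, start + window, p))
    return peaks

# Toy 5' read-start counts with one sharp cluster (assumed values).
counts = [0] * 20 + [1, 30, 40, 25, 2] + [0] * 20
starts = [s for s, _, _ in call_peaks(counts)]
print(starts)  # [17, 18, 19, 20, 21, 22, 23]
```

The running window sum keeps the scan linear in the number of positions, which is in the spirit of the speed advantage the abstract reports over whole genome segmentation. Overlapping significant windows would normally be merged into one peak before classification.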
|
31185
|
Behr J, Kahles A, Zhong Y, Sreedharan VT, Drewe P, Rätsch G. MITIE: Simultaneous RNA-Seq-based transcript identification and quantification in multiple samples. Bioinformatics 2013; 29:2529-38. [PMID: 23980025 PMCID: PMC3789545 DOI: 10.1093/bioinformatics/btt442] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 10/14/2012] [Revised: 07/19/2013] [Accepted: 07/29/2013] [Indexed: 02/07/2023] Open
Abstract
MOTIVATION High-throughput sequencing of mRNA (RNA-Seq) has led to tremendous improvements in the detection of expressed genes and reconstruction of RNA transcripts. However, the extensive dynamic range of gene expression, technical limitations and biases, as well as the observed complexity of the transcriptional landscape, pose profound computational challenges for transcriptome reconstruction. RESULTS We present the novel framework MITIE (Mixed Integer Transcript IdEntification) for simultaneous transcript reconstruction and quantification. We define a likelihood function based on the negative binomial distribution, use a regularization approach to select a few transcripts collectively explaining the observed read data and show how to find the optimal solution using Mixed Integer Programming. MITIE can (i) take advantage of known transcripts, (ii) reconstruct and quantify transcripts simultaneously in multiple samples, and (iii) resolve the location of multi-mapping reads. It is designed for genome- and assembly-based transcriptome reconstruction. We present an extensive study based on realistic simulated RNA-Seq data. When compared with state-of-the-art approaches, MITIE proves to be significantly more sensitive and overall more accurate. Moreover, MITIE yields substantial performance gains when used with multiple samples. We applied our system to 38 Drosophila melanogaster modENCODE RNA-Seq libraries and estimated the sensitivity of reconstructing omitted transcript annotations and the specificity with respect to annotated transcripts. Our results corroborate that a well-motivated objective paired with appropriate optimization techniques leads to significant improvements over the state-of-the-art in transcriptome reconstruction. AVAILABILITY MITIE is implemented in C++ and is available from http://bioweb.me/mitie under the GPL license.
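The idea of a negative binomial likelihood combined with a penalty that favours few transcripts can be illustrated with a toy objective. This sketch is not MITIE's formulation: the segment-level counts, the dispersion parameter `r`, the per-transcript `penalty`, and the exhaustive grid search (used here as a stand-in for a Mixed Integer Programming solver) are all assumptions chosen for a small runnable example.

```python
import math
from itertools import product

def nb_logpmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and
    size r (variance = mu + mu**2 / r)."""
    if mu <= 0:
        return 0.0 if k == 0 else float("-inf")
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))

def penalized_loss(obs, transcripts, abund, r=50.0, penalty=2.0):
    """Negative NB log-likelihood of segment counts plus a fixed cost for each
    transcript with non-zero abundance."""
    loss = 0.0
    for seg, k in enumerate(obs):
        # Expected count in a segment = summed abundance of transcripts covering it.
        mu = sum(a * t[seg] for a, t in zip(abund, transcripts))
        loss -= nb_logpmf(k, mu, r)
    return loss + penalty * sum(1 for a in abund if a > 0)

def best_abundances(obs, transcripts, levels=(0, 10, 20, 40, 60)):
    """Exhaustive search over a small abundance grid (stand-in for a MIP solver)."""
    return min(product(levels, repeat=len(transcripts)),
               key=lambda ab: penalized_loss(obs, transcripts, ab))

# Three candidate transcripts over three segments (1 = segment included).
transcripts = [(1, 1, 1), (1, 0, 1), (1, 1, 0)]
obs = [60, 20, 60]  # toy observed counts per segment
print(best_abundances(obs, transcripts))  # (20, 40, 0)
```

Here the full-length transcript and the exon-skipping transcript together explain the counts exactly, so the search leaves the third candidate at zero. Exhaustive search grows exponentially with the number of candidates, which is why a real implementation would rely on an integer programming solver.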
Affiliation(s)
- Jonas Behr
- Computational Biology Center, Sloan-Kettering Institute, 1275 York Avenue, New York, NY 10065, USA and Friedrich Miescher Laboratory, Max Planck Society, Spemannstr. 39, 72076 Tübingen, Germany
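The idea summarized in the MITIE abstract above (a negative binomial likelihood plus a penalty that favours explaining the reads with few transcripts) can be illustrated with a small toy sketch. This is not MITIE's implementation: the function names, the segment-level count model, the dispersion `r`, the penalty weight `lam`, and the toy data are all assumptions made for illustration, and an exhaustive search over a small abundance grid stands in for the mixed integer programming solver.

```python
import itertools
import math


def nb_log_pmf(k, mu, r):
    """Log-probability of count k under a negative binomial with mean mu and size r."""
    mu = max(mu, 1e-9)  # avoid log(0) when no transcript covers a segment
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def penalized_objective(abundances, transcripts, observed, r, lam):
    """Negative log-likelihood plus lam times the number of transcripts in use."""
    expected = [
        sum(a * t[s] for a, t in zip(abundances, transcripts))
        for s in range(len(observed))
    ]
    log_lik = sum(nb_log_pmf(k, mu, r) for k, mu in zip(observed, expected))
    n_used = sum(1 for a in abundances if a > 0)
    return -log_lik + lam * n_used


def best_assignment(transcripts, observed, grid, r=50.0, lam=1.0):
    """Exhaustive search over a small abundance grid (assumed stand-in for a MIP solver)."""
    return min(
        itertools.product(grid, repeat=len(transcripts)),
        key=lambda ab: penalized_objective(ab, transcripts, observed, r, lam),
    )


# Hypothetical toy locus: three segments, three candidate transcripts.
# Each transcript is a 0/1 vector marking which segments it covers.
transcripts = [(1, 1, 0), (0, 1, 1), (1, 1, 1)]
observed = [20, 30, 10]          # read counts per segment (made up)
grid = (0, 10, 20, 30, 40)       # candidate abundance levels (made up)

print(best_assignment(transcripts, observed, grid))
```

In this toy setting the search returns abundances of 20 and 10 for the first two transcripts and 0 for the third, so the penalty drops the candidate that is not needed to explain the counts. A real solver would replace the grid enumeration, which grows exponentially with the number of candidate transcripts.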
|
31186
|
Sensitive detection of viral transcripts in human tumor transcriptomes. PLoS Comput Biol 2013; 9:e1003228. [PMID: 24098097 PMCID: PMC3789765 DOI: 10.1371/journal.pcbi.1003228] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Received: 12/18/2012] [Accepted: 06/04/2013] [Indexed: 02/07/2023]
Abstract
In excess of % of human cancer incidents have a viral cofactor. Epidemiological studies of idiopathic human cancers indicate that additional tumor viruses remain to be discovered. Recent advances in sequencing technology have enabled systematic screenings of human tumor transcriptomes for viral transcripts. However, technical problems such as low abundances of viral transcripts in large volumes of sequencing data, viral sequence divergence, and homology between viral and human factors significantly confound identification of tumor viruses. We have developed a novel computational approach for detecting viral transcripts in human cancers that takes the aforementioned confounding factors into account and is applicable to a wide variety of viruses and tumors. We apply the approach to conducting the first systematic search for viruses in neuroblastoma, the most common cancer in infancy. The diverse clinical progression of this disease as well as related epidemiological and virological findings are highly suggestive of a pathogenic cofactor. However, a viral etiology of neuroblastoma is currently contested. We mapped transcriptomes of neuroblastoma as well as positive and negative controls to the human and all known viral genomes in order to detect both known and unknown viruses. Analysis of controls, comparisons with related methods, and statistical estimates demonstrate the high sensitivity of our approach. Detailed investigation of putative viral transcripts within neuroblastoma samples did not provide evidence for the existence of any known human viruses. Likewise, de-novo assembly and analysis of chimeric transcripts did not result in expression signatures associated with novel human pathogens. While confounding factors such as sample dilution or viral clearance in progressed tumors may mask viral cofactors in the data, in principle, this is rendered less likely by the high sensitivity of our approach and the number of biological replicates analyzed. 
Therefore, our results suggest that frequent viral cofactors of metastatic neuroblastoma are unlikely.

Many human cancers are caused by infections with tumor viruses and identification of these pathogens is considered a critical contribution to cancer prevention. Deep sequencing enables us to systematically investigate viral nucleotide signatures in order to either verify or exclude the existence of viruses in idiopathic human cancers. We have developed Virana, a novel computational approach for identifying tumor viruses in human cancers that is applicable to a wide variety of tumors and viruses. Virana firstly addresses several important biological confounding factors that may hinder successful detection of these pathogens. We applied our approach in the first systematic search for cancer-causing viruses in metastatic neuroblastoma, the most common form of cancer in infancy. Although the heterogeneous clinical progression of this disease as well as epidemiological and virological findings are suggestive of a pathogenic cofactor, the viral etiology of neuroblastoma is currently contested. We conducted an analysis of experimental controls, comparisons with related approaches, as well as statistical analyses in order to validate our method. In spite of the high sensitivity of our approach, analyses of neuroblastoma transcriptomes did not provide evidence for the existence of any known or unknown human viruses. Our results therefore suggest that frequent viral cofactors of metastatic neuroblastoma are unlikely.
|
31187
|
Picelli S, Björklund ÅK, Faridani OR, Sagasser S, Winberg G, Sandberg R. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods 2013; 10:1096-8. [PMID: 24056875 DOI: 10.1038/nmeth.2639] [Citation(s) in RCA: 1650] [Impact Index Per Article: 137.5] [Received: 05/06/2013] [Accepted: 07/17/2013] [Indexed: 12/25/2022]
Abstract
Single-cell gene expression analyses hold promise for characterizing cellular heterogeneity, but current methods compromise on either the coverage, the sensitivity or the throughput. Here, we introduce Smart-seq2 with improved reverse transcription, template switching and preamplification to increase both yield and length of cDNA libraries generated from individual cells. Smart-seq2 transcriptome libraries have improved detection, coverage, bias and accuracy compared to Smart-seq libraries and are generated with off-the-shelf reagents at lower cost.
|
31188
|
Epstein-Barr virus and human herpesvirus 6 detection in a non-Hodgkin's diffuse large B-cell lymphoma cohort by using RNA sequencing. J Virol 2013; 87:13059-62. [PMID: 24049168 DOI: 10.1128/jvi.02380-13] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/21/2022]
Abstract
Comprehensive virome analysis of RNA sequence (RNA-seq) data sets from 118 non-Hodgkin's B-cell lymphomas revealed a small subset that is positive for Epstein-Barr virus (EBV) or human herpesvirus 6B (HHV-6B), with one coinfection. EBV transcriptome analysis revealed expression of the latency genes RPMS1, LMP1, and LMP2, with one sample additionally showing a high level of early lytic expression and another sample showing a high level of EBNA2 expression. HHV-6B transcriptome analysis revealed that the majority of genes were transcribed.
|
31189
|
Florea LD, Salzberg SL. Genome-guided transcriptome assembly in the age of next-generation sequencing. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1234-1240. [PMID: 24524156 PMCID: PMC4086730 DOI: 10.1109/tcbb.2013.140] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 06/03/2023]
Abstract
Next-generation sequencing technologies provide unprecedented power to explore the repertoire of genes and their alternative splice variants, collectively defining the transcriptome of a species in great detail. However, assembling the short reads into full-length gene and transcript models presents significant computational challenges. We review current algorithms for assembling transcripts and genes from next-generation sequencing reads aligned to a reference genome, and lay out areas for future improvements.
|
31190
|
Ilott NE, Ponting CP. Predicting long non-coding RNAs using RNA sequencing. Methods 2013; 63:50-9. [PMID: 23541739 DOI: 10.1016/j.ymeth.2013.03.019] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 01/30/2013] [Revised: 03/12/2013] [Accepted: 03/19/2013] [Indexed: 02/01/2023]
Abstract
The advent of next-generation sequencing, and in particular RNA-sequencing (RNA-seq), technologies has expanded our knowledge of the transcriptional capacity of human and other animal genomes. In particular, recent RNA-seq studies have revealed that transcription is widespread across the mammalian genome, resulting in a large increase in the number of putative transcripts from both within, and intervening between, known protein-coding genes. Long transcripts that appear to lack protein-coding potential (long non-coding RNAs, lncRNAs) have been the focus of much recent research, in part owing to observations of their cell-type and developmental time-point restricted expression patterns. A variety of sequencing protocols are currently available for identifying lncRNAs including RNA polymerase II occupancy, chromatin state maps and - the focus of this review - deep RNA sequencing. In addition, there are numerous analytical methods available for mapping reads and assembling transcript models that predict the presence and structure of lncRNAs from RNA-seq data. Here we review current methods for identifying lncRNAs using large-scale sequencing data from RNA-seq experiments and highlight analytical considerations that are required when undertaking such projects.
Affiliation(s)
- Nicholas E Ilott
- CGAT, MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, UK.
|
31191
|
Spicuglia S, Maqbool MA, Puthier D, Andrau JC. An update on recent methods applied for deciphering the diversity of the noncoding RNA genome structure and function. Methods 2013; 63:3-17. [DOI: 10.1016/j.ymeth.2013.04.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/04/2013] [Revised: 04/02/2013] [Accepted: 04/04/2013] [Indexed: 12/17/2022]
|
31193
|
Pandey RV, Schlötterer C. DistMap: a toolkit for distributed short read mapping on a Hadoop cluster. PLoS One 2013; 8:e72614. [PMID: 24009693 PMCID: PMC3751911 DOI: 10.1371/journal.pone.0072614] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 04/18/2013] [Accepted: 07/12/2013] [Indexed: 02/05/2023]
Abstract
With the rapid and steady increase of next generation sequencing data output, the mapping of short reads has become a major data analysis bottleneck. On a single computer, it can take several days to map the vast quantity of reads produced from a single Illumina HiSeq lane. In an attempt to ameliorate this bottleneck we present a new tool, DistMap - a modular, scalable and integrated workflow to map reads in the Hadoop distributed computing framework. DistMap is easy to use, currently supports nine different short read mapping tools and can be run on all Unix-based operating systems. It accepts reads in FASTQ format as input and provides mapped reads in a SAM/BAM format. DistMap supports both paired-end and single-end reads thereby allowing the mapping of read data produced by different sequencing platforms. DistMap is available from http://code.google.com/p/distmap/
Affiliation(s)
- Ram Vinay Pandey
- Institut für Populationsgenetik, Vetmeduni Vienna, Vienna, Austria
|
31194
|
Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Nat Protoc 2013; 8:1765-86. [PMID: 23975260 DOI: 10.1038/nprot.2013.099] [Citation(s) in RCA: 840] [Impact Index Per Article: 70.0] [Indexed: 01/16/2023]
Abstract
RNA sequencing (RNA-seq) has been rapidly adopted for the profiling of transcriptomes in many areas of biology, including studies into gene regulation, development and disease. Of particular interest is the discovery of differentially expressed genes across different conditions (e.g., tissues, perturbations) while optionally adjusting for other systematic factors that affect the data-collection process. There are a number of subtle yet crucial aspects of these analyses, such as read counting, appropriate treatment of biological variability, quality control checks and appropriate setup of statistical modeling. Several variations have been presented in the literature, and there is a need for guidance on current best practices. This protocol presents a state-of-the-art computational and statistical RNA-seq differential expression analysis workflow largely based on the free open-source R language and Bioconductor software and, in particular, on two widely used tools, DESeq and edgeR. Hands-on time for typical small experiments (e.g., 4-10 samples) can be <1 h, with computation time <1 d using a standard desktop PC.
|
31195
|
Trabzuni D, Ryten M, Emmett W, Ramasamy A, Lackner KJ, Zeller T, Walker R, Smith C, Lewis PA, Mamais A, de Silva R, Vandrovcova J, Hernandez D, Nalls MA, Sharma M, Garnier S, Lesage S, Simon-Sanchez J, Gasser T, Heutink P, Brice A, Singleton A, Cai H, Schadt E, Wood NW, Bandopadhyay R, Weale ME, Hardy J, Plagnol V. Fine-mapping, gene expression and splicing analysis of the disease associated LRRK2 locus. PLoS One 2013; 8:e70724. [PMID: 23967090 PMCID: PMC3742662 DOI: 10.1371/journal.pone.0070724] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Received: 09/15/2012] [Accepted: 06/23/2013] [Indexed: 12/04/2022]
Abstract
Association studies have identified several signals at the LRRK2 locus for Parkinson's disease (PD), Crohn's disease (CD) and leprosy. However, little is known about the molecular mechanisms mediating these effects. To further characterize this locus, we fine-mapped the risk association in 5,802 PD and 5,556 controls using a dense genotyping array (ImmunoChip). Using samples from 134 post-mortem control adult human brains (UK Human Brain Expression Consortium), where up to ten brain regions were available per individual, we studied the regional variation, splicing and regulation of LRRK2. We found convincing evidence for a common variant PD association located outside of the LRRK2 protein coding region (rs117762348, A>G, P = 2.56×10^-8, case/control MAF 0.083/0.074, odds ratio 0.86 for the minor allele with 95% confidence interval [0.80-0.91]). We show that mRNA expression levels are highest in cortical regions and lowest in cerebellum. We find an exon quantitative trait locus (QTL) in brain samples that localizes to exons 32-33 and investigate the molecular basis of this eQTL using RNA-Seq data in n = 8 brain samples. The genotype underlying this eQTL is in strong linkage disequilibrium with the CD associated non-synonymous SNP rs3761863 (M2397T). We found two additional QTLs in liver and monocyte samples but none of these explained the common variant PD association at rs117762348. Our results characterize the LRRK2 locus, and highlight the importance and difficulties of fine-mapping and integration of multiple datasets to delineate pathogenic variants and thus develop an understanding of disease mechanisms.
Affiliation(s)
- Daniah Trabzuni
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh, Saudi Arabia
- Mina Ryten
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Warren Emmett
- University College London Genetics Institute, University College London, London, United Kingdom
- Adaikalavan Ramasamy
- Department of Medical and Molecular Genetics, King's College London, Guy's Hospital, London, United Kingdom
- Karl J. Lackner
- Institute of Clinical Chemistry and Laboratory Medicine, University Medical Centre Mainz, Mainz, Germany
- Tanja Zeller
- University Heart Center Hamburg, Clinic for General and Interventional Cardiology, Hamburg, Germany
- Robert Walker
- MRC Sudden Death Brain Bank Project, University of Edinburgh, Department of Neuropathology, Edinburgh, Scotland, United Kingdom
- Colin Smith
- MRC Sudden Death Brain Bank Project, University of Edinburgh, Department of Neuropathology, Edinburgh, Scotland, United Kingdom
- Patrick A. Lewis
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- School of Pharmacy, University of Reading, Whiteknights, Reading, United Kingdom
- Adamantios Mamais
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Reta Lila Weston Institute of Neurological Studies, London, United Kingdom
- Rohan de Silva
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Reta Lila Weston Institute of Neurological Studies, London, United Kingdom
- Jana Vandrovcova
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Reta Lila Weston Institute of Neurological Studies, London, United Kingdom
- Dena Hernandez
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, United States of America
- Michael A. Nalls
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, United States of America
- Manu Sharma
- Division of Neurodegenerative Disorders, Hertie Institute for Clinical Brain Research, University of Tubingen, Tubingen, Germany
- Sophie Garnier
- Pierre and Marie Curie University, Institut National de la Santé et de la Recherche Médicale UMRS 937, Paris, France
- Suzanne Lesage
- CRICM, University Pierre et Marie Curie, Institut National de la Santé et de la Recherche Médicale UMRS 975, CNRS UMR 7225, Hospital Pitié-Salpêtrière, Paris, France
- Javier Simon-Sanchez
- Department of Clinical Genetics, Section of Medical Genomics, VU University Medical Centre, Amsterdam, The Netherlands
- Thomas Gasser
- Division of Neurodegenerative Disorders, Hertie Institute for Clinical Brain Research, University of Tubingen, Tubingen, Germany
- Peter Heutink
- Department of Clinical Genetics, Section of Medical Genomics, VU University Medical Centre, Amsterdam, The Netherlands
- Alexis Brice
- CRICM, University Pierre et Marie Curie, Institut National de la Santé et de la Recherche Médicale UMRS 975, CNRS UMR 7225, Hospital Pitié-Salpêtrière, Paris, France
- Andrew Singleton
- Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, United States of America
- Huaibin Cai
- Unit of Transgenesis, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, Maryland, United States of America
- Eric Schadt
- Institute for Genomics and Multiscale Biology, Mount Sinai School of Medicine, New York, New York, United States of America
- Nicholas W. Wood
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Rina Bandopadhyay
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Reta Lila Weston Institute of Neurological Studies, London, United Kingdom
- Michael E. Weale
- Department of Medical and Molecular Genetics, King's College London, Guy's Hospital, London, United Kingdom
- John Hardy
- Department of Molecular Neuroscience, UCL Institute of Neurology, Queen Square, London, United Kingdom
- Reta Lila Weston Institute of Neurological Studies, London, United Kingdom
- Vincent Plagnol
- University College London Genetics Institute, University College London, London, United Kingdom
|
31196
|
Kang BH, Jensen KJ, Hatch JA, Janes KA. Simultaneous profiling of 194 distinct receptor transcripts in human cells. Sci Signal 2013; 6:rs13. [PMID: 23921087 DOI: 10.1126/scisignal.2003624] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 12/19/2022]
Abstract
Many signal transduction cascades are initiated by transmembrane receptors with the presence or absence and abundance of receptors dictating cellular responsiveness. We provide a validated array of quantitative reverse transcription polymerase chain reaction (qRT-PCR) reagents for high-throughput profiling of the presence and relative abundance of transcripts for 194 transmembrane receptors in the human genome. We found that the qRT-PCR array had greater sensitivity and specificity for the detected receptor transcript profiles compared to conventional oligonucleotide microarrays or exon microarrays. The qRT-PCR array also distinguished functional receptor presence versus absence more accurately than deep sequencing of adenylated RNA species by RNA sequencing (RNA-seq). By applying qRT-PCR-based receptor transcript profiling to 40 human cell lines representing four main tissues (pancreas, skin, breast, and colon), we identified clusters of cell lines with enhanced signaling capabilities and revealed a role for receptor silencing in defining tissue lineage. Ectopic expression of the interleukin-10 (IL-10) receptor-encoding gene IL10RA in melanoma cells engaged an IL-10 autocrine loop not otherwise present in this cell type, which altered signaling, gene expression, and cellular responses to proinflammatory stimuli. Our array provides a rapid, inexpensive, and convenient means for assigning a receptor signature to any human cell or tissue type.
Affiliation(s)
- Byong H Kang
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA 22908, USA
|
31197
|
Dorff KC, Chambwe N, Zeno Z, Simi M, Shaknovich R, Campagne F. GobyWeb: simplified management and analysis of gene expression and DNA methylation sequencing data. PLoS One 2013; 8:e69666. [PMID: 23936070 PMCID: PMC3720652 DOI: 10.1371/journal.pone.0069666] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 03/05/2013] [Accepted: 06/11/2013] [Indexed: 01/04/2023]
Abstract
We present GobyWeb, a web-based system that facilitates the management and analysis of high-throughput sequencing (HTS) projects. The software provides integrated support for a broad set of HTS analyses and offers a simple plugin extension mechanism. Analyses currently supported include quantification of gene expression for messenger and small RNA sequencing, estimation of DNA methylation (i.e., reduced bisulfite sequencing and whole genome methyl-seq), or the detection of pathogens in sequenced data. In contrast to previous analysis pipelines developed for analysis of HTS data, GobyWeb requires significantly less storage space, runs analyses efficiently on a parallel grid, scales gracefully to process tens or hundreds of multi-gigabyte samples, yet can be used effectively by researchers who are comfortable using a web browser. We conducted performance evaluations of the software and found it to either outperform or have similar performance to analysis programs developed for specialized analyses of HTS data. We found that most biologists who took a one-hour GobyWeb training session were readily able to analyze RNA-Seq data with state of the art analysis tools. GobyWeb can be obtained at http://gobyweb.campagnelab.org and is freely available for non-commercial use. GobyWeb plugins are distributed in source code and licensed under the open source LGPL3 license to facilitate code inspection, reuse and independent extensions http://github.com/CampagneLaboratory/gobyweb2-plugins.
Affiliation(s)
- Kevin C. Dorff
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, New York, United States of America
- Nyasha Chambwe
- Department of Physiology and Biophysics, The Weill Cornell Medical College, New York, New York, United States of America
- Tri-Institutional Training Program in Computational Biology and Medicine, The Weill Cornell Medical College, New York, New York, United States of America
- Zachary Zeno
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, New York, United States of America
- Manuele Simi
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, New York, United States of America
- Rita Shaknovich
- Department of Pathology and Department of Medicine; The Weill Cornell Medical College, New York, New York, United States of America
- Fabien Campagne
- The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, The Weill Cornell Medical College, New York, New York, United States of America
- Department of Physiology and Biophysics, The Weill Cornell Medical College, New York, New York, United States of America
|
31198
|
Polyadenylation site-induced decay of upstream transcripts enforces promoter directionality. Nat Struct Mol Biol 2013; 20:923-8. [PMID: 23851456 DOI: 10.1038/nsmb.2640] [Citation(s) in RCA: 205] [Impact Index Per Article: 17.1] [Received: 04/08/2013] [Accepted: 06/26/2013] [Indexed: 12/19/2022]
Abstract
Active human promoters produce promoter-upstream transcripts (PROMPTs). Why these RNAs are coupled to decay, whereas their neighboring promoter-downstream mRNAs are not, is unknown. Here high-throughput sequencing demonstrates that PROMPTs generally initiate in the antisense direction closely upstream of the transcription start sites (TSSs) of their associated genes. PROMPT TSSs share features with mRNA-producing TSSs, including stalled RNA polymerase II (RNAPII) and the production of small TSS-associated RNAs. Notably, motif analyses around PROMPT 3' ends reveal polyadenylation (pA)-like signals. Mutagenesis studies demonstrate that PROMPT pA signals are functional but linked to RNA degradation. Moreover, pA signals are under-represented in promoter-downstream versus promoter-upstream regions, thus allowing for more efficient RNAPII progress in the sense direction from gene promoters. We conclude that asymmetric sequence distribution around human gene promoters serves to provide a directional RNA output from an otherwise bidirectional transcription process.
|
31199
|
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013. [PMID: 23618408 DOI: 10.1101/000851] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/16/2023]
Abstract
TopHat is a popular spliced aligner for RNA-sequence (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available at http://ccb.jhu.edu/software/tophat.
|
31200
|
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R, Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 2013. [PMID: 23618408 DOI: 10.1186/gb-2013-14-4-r36] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Indexed: 12/15/2022]
|