551
|
Empirical insights into the stochasticity of small RNA sequencing. Sci Rep 2016; 6:24061. [PMID: 27052356 PMCID: PMC4823707 DOI: 10.1038/srep24061] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 01/05/2016] [Accepted: 03/21/2016] [Indexed: 11/08/2022] Open
Abstract
The choice of stochasticity distribution for modeling the noise distribution is a fundamental assumption for the analysis of sequencing data and consequently is critical for the accurate assessment of biological heterogeneity and differential expression. The stochasticity of RNA sequencing has been assumed to follow Poisson distributions. We collected microRNA sequencing data and observed that its stochasticity is better approximated by gamma distributions, likely because of the stochastic nature of exponential PCR amplification. We validated our findings with two independent datasets, one for microRNA sequencing and another for RNA sequencing. Motivated by the gamma distributed stochasticity, we provided a simple method for the analysis of RNA sequencing data and showed its superiority to three existing methods for differential expression analysis using three data examples of technical replicate data and biological replicate data.
|
552
|
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 2016. [PMID: 27043002 DOI: 10.1038/nbt.3519] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
We present kallisto, an RNA-seq quantification program that is two orders of magnitude faster than previous approaches and achieves similar accuracy. Kallisto pseudoaligns reads to a reference, producing a list of transcripts that are compatible with each read while avoiding alignment of individual bases. We use kallisto to analyze 30 million unaligned paired-end RNA-seq reads in <10 min on a standard laptop computer. This removes a major computational bottleneck in RNA-seq analysis.
Affiliation(s)
- Nicolas L Bray
- Innovative Genomics Initiative, University of California, Berkeley, California, USA
- Harold Pimentel
- Department of Computer Science, University of California, Berkeley, California, USA
- Páll Melsted
- Faculty of Industrial Engineering, Mechanical Engineering and Computer Science, University of Iceland, Reykjavik, Iceland
- Lior Pachter
- Department of Computer Science, University of California, Berkeley, California, USA; Department of Mathematics, University of California, Berkeley, California, USA; Department of Molecular & Cell Biology, University of California, Berkeley, California, USA
|
553
|
Bray NL, Pimentel H, Melsted P, Pachter L. Near-optimal probabilistic RNA-seq quantification. Nat Biotechnol 2016; 34:525-7. [PMID: 27043002 DOI: 10.1038/nbt.3519] [Citation(s) in RCA: 5321] [Impact Index Per Article: 665.1] [Received: 10/15/2015] [Accepted: 02/25/2016] [Indexed: 12/18/2022]
|
554
|
Castells Domingo X, Ferrer-Font L, Davila M, Candiota AP, Simões RV, Fernández-Coello A, Gabarrós A, Boluda S, Barceló A, Ariño J, Arús C. Improving Ribosomal RNA Integrity in Surgically Resected Human Brain Tumor Biopsies. Biopreserv Biobank 2016; 14:156-64. [DOI: 10.1089/bio.2015.0086] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/15/2022] Open
Affiliation(s)
- Xavier Castells Domingo
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Institut de Biotecnologia i de Biomedicina & Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Laura Ferrer-Font
- Institut de Biotecnologia i de Biomedicina & Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Grup d'Aplicacions Biomèdiques de la RMN (GABRMN), Departament de Bioquímica i Biologia Molecular, Facultat de Biociències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Cerdanyola del Vallès, Spain
- Myriam Davila
- Grup d'Aplicacions Biomèdiques de la RMN (GABRMN), Departament de Bioquímica i Biologia Molecular, Facultat de Biociències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Ana Paula Candiota
- Institut de Biotecnologia i de Biomedicina & Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Grup d'Aplicacions Biomèdiques de la RMN (GABRMN), Departament de Bioquímica i Biologia Molecular, Facultat de Biociències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Cerdanyola del Vallès, Spain
- Rui V. Simões
- Grup d'Aplicacions Biomèdiques de la RMN (GABRMN), Departament de Bioquímica i Biologia Molecular, Facultat de Biociències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Alejandro Fernández-Coello
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Cerdanyola del Vallès, Spain
- Departament de Neurocirurgia, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), Hospital Universitari de Bellvitge, L'Hospitalet de Llobregat, Spain
- Andreu Gabarrós
- Departament de Neurocirurgia, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), Hospital Universitari de Bellvitge, L'Hospitalet de Llobregat, Spain
- Susana Boluda
- Institut de Neuropatologia, Servei d'Anatomia Patològica, Institut d'Investigació Biomèdica de Bellvitge (IDIBELL), Hospital Universitari de Bellvitge, L'Hospitalet de Llobregat, Spain
- Anna Barceló
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Joaquín Ariño
- Servei de Genòmica i Bioinformàtica, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Institut de Biotecnologia i de Biomedicina & Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Carles Arús
- Institut de Biotecnologia i de Biomedicina & Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Grup d'Aplicacions Biomèdiques de la RMN (GABRMN), Departament de Bioquímica i Biologia Molecular, Facultat de Biociències, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Centro de Investigación Biomédica en Red en Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Cerdanyola del Vallès, Spain
|
555
|
Nottingham RM, Wu DC, Qin Y, Yao J, Hunicke-Smith S, Lambowitz AM. RNA-seq of human reference RNA samples using a thermostable group II intron reverse transcriptase. RNA (New York, N.Y.) 2016; 22:597-613. [PMID: 26826130 PMCID: PMC4793214 DOI: 10.1261/rna.055558.115] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 11/30/2015] [Accepted: 12/30/2015] [Indexed: 05/20/2023]
Abstract
Next-generation RNA sequencing (RNA-seq) has revolutionized our ability to analyze transcriptomes. Current RNA-seq methods are highly reproducible, but each has biases resulting from different modes of RNA sample preparation, reverse transcription, and adapter addition, leading to variability between methods. Moreover, the transcriptome cannot be profiled comprehensively because highly structured RNAs, such as tRNAs and snoRNAs, are refractory to conventional RNA-seq methods. Recently, we developed a new method for strand-specific RNA-seq using thermostable group II intron reverse transcriptases (TGIRTs). TGIRT enzymes have higher processivity and fidelity than conventional retroviral reverse transcriptases plus a novel template-switching activity that enables RNA-seq adapter addition during cDNA synthesis without using RNA ligase. Here, we obtained TGIRT-seq data sets for well-characterized human RNA reference samples and compared them to previous data sets obtained for these RNAs by the Illumina TruSeq v2 and v3 methods. We find that TGIRT-seq recapitulates the relative abundance of human transcripts and RNA spike-ins in ribo-depleted, fragmented RNA samples comparably to non-strand-specific TruSeq v2 and better than strand-specific TruSeq v3. Moreover, TGIRT-seq is more strand specific than TruSeq v3 and eliminates sampling biases from random hexamer priming, which are inherent to TruSeq. The TGIRT-seq data sets also show more uniform 5' to 3' gene coverage and identify more splice junctions, particularly near the 5' ends of mRNAs, than do the TruSeq data sets. Finally, TGIRT-seq enables the simultaneous profiling of mRNAs and lncRNAs in the same RNA-seq experiment as structured small ncRNAs, including tRNAs, which are essentially absent with TruSeq.
Affiliation(s)
- Ryan M Nottingham
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, 78712, USA
- Douglas C Wu
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, 78712, USA
- Yidan Qin
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, 78712, USA
- Jun Yao
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, 78712, USA
- Scott Hunicke-Smith
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, 78712, USA
- Alan M Lambowitz
- Institute for Cellular and Molecular Biology, University of Texas at Austin, Austin, Texas, 78712, USA; Department of Molecular Biosciences, University of Texas at Austin, Austin, Texas, 78712, USA
|
556
|
Schulze S, Schleicher J, Guthke R, Linde J. How to Predict Molecular Interactions between Species? Front Microbiol 2016; 7:442. [PMID: 27065992 PMCID: PMC4814556 DOI: 10.3389/fmicb.2016.00442] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Received: 12/09/2015] [Accepted: 03/18/2016] [Indexed: 12/21/2022] Open
Abstract
Organisms constantly interact with other species through physical contact which leads to changes on the molecular level, for example the transcriptome. These changes can be monitored for all genes, with the help of high-throughput experiments such as RNA-seq or microarrays. The adaptation of the gene expression to environmental changes within cells is mediated through complex gene regulatory networks. Often, our knowledge of these networks is incomplete. Network inference predicts gene regulatory interactions based on transcriptome data. An emerging application of high-throughput transcriptome studies are dual transcriptomics experiments. Here, the transcriptome of two or more interacting species is measured simultaneously. Based on a dual RNA-seq data set of murine dendritic cells infected with the fungal pathogen Candida albicans, the software tool NetGenerator was applied to predict an inter-species gene regulatory network. To promote further investigations of molecular inter-species interactions, we recently discussed dual RNA-seq experiments for host-pathogen interactions and extended the applied tool NetGenerator (Schulze et al., 2015). The updated version of NetGenerator makes use of measurement variances in the algorithmic procedure and accepts gene expression time series data with missing values. Additionally, we tested multiple modeling scenarios regarding the stimuli functions of the gene regulatory network. Here, we summarize the work by Schulze et al. (2015) and put it into a broader context. We review various studies making use of the dual transcriptomics approach to investigate the molecular basis of interacting species. Besides the application to host-pathogen interactions, dual transcriptomics data are also utilized to study mutualistic and commensalistic interactions. Furthermore, we give a short introduction into additional approaches for the prediction of gene regulatory networks and discuss their application to dual transcriptomics data. 
We conclude that the application of network inference on dual-transcriptomics data is a promising approach to predict molecular inter-species interactions.
Affiliation(s)
- Sylvie Schulze
- Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
- Jana Schleicher
- Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
- Reinhard Guthke
- Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
- Jörg Linde
- Research Group Systems Biology and Bioinformatics, Leibniz-Institute for Natural Product Research and Infection Biology - Hans-Knöll-Institute Jena, Germany
|
557
|
Sutherland JJ, Jolly RA, Goldstein KM, Stevens JL. Assessing Concordance of Drug-Induced Transcriptional Response in Rodent Liver and Cultured Hepatocytes. PLoS Comput Biol 2016; 12:e1004847. [PMID: 27028627 PMCID: PMC4814051 DOI: 10.1371/journal.pcbi.1004847] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 11/18/2015] [Accepted: 03/03/2016] [Indexed: 12/13/2022] Open
Abstract
The effects of drugs, disease and other perturbations on mRNA levels are studied using gene expression microarrays or RNA-seq, with the goal of understanding molecular effects arising from the perturbation. Previous comparisons of reproducibility across laboratories have been limited in scale and focused on a single model. The use of model systems, such as cultured primary cells or cancer cell lines, assumes that mechanistic insights derived from the models would have been observed via in vivo studies. We examined the concordance of compound-induced transcriptional changes using data from several sources: rat liver and rat primary hepatocytes (RPH) from Drug Matrix (DM) and open TG-GATEs (TG), human primary hepatocytes (HPH) from TG, and mouse liver / HepG2 results from the Gene Expression Omnibus (GEO) repository. Gene expression changes for treatments were normalized to controls and analyzed with three methods: 1) gene level for 9071 high expression genes in rat liver, 2) gene set analysis (GSA) using canonical pathways and gene ontology sets, 3) weighted gene co-expression network analysis (WGCNA). Co-expression networks performed better than genes or GSA when comparing treatment effects within rat liver and rat vs. mouse liver. Genes and modules performed similarly at Connectivity Map-style analyses, where success at identifying similar treatments among a collection of reference profiles is the goal. Comparisons between rat liver and RPH, and those between RPH, HPH and HepG2 cells reveal lower concordance for all methods. We observe that the baseline state of untreated cultured cells relative to untreated rat liver shows striking similarity with toxicant-exposed cells in vivo, indicating that gross systems level perturbation in the underlying networks in culture may contribute to the low concordance.
Gene expression studies in model systems are widely used for understanding the mechanism of drugs and other perturbations in biological systems. Other researchers have examined the reproducibility of microarray studies between laboratories, or comparing microarrays and/or RNA sequencing. However, no large scale studies have compared results from protocols which differ in minor details, or results generated in vivo vs. in vitro culture systems thought to serve as useful models. The rat liver is by far the most extensively studied model evaluating effects of drugs and other perturbations, and existing data allowed us to assess the level of concordance between rat liver and rat primary hepatocytes cultured in collagen-coated plates (i.e. “flat” culture) for hundreds of drugs. We found that the mouse liver serves as a better model of the rat liver than do rat primary hepatocytes, even after allowing for differences due to pharmacokinetics. The low concordance observed between rat liver and rat hepatocytes suggests that validating the utility of ‘omics data generated on emerging cell culture approaches (e.g. “organ-on-a-chip”, 3D-printed tissues) using rat cells and comparison to the rat liver may be necessary in order to gain confidence that these approaches substantially improve on traditional culture models of human cells.
Affiliation(s)
- Jeffrey J. Sutherland
- Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana, United States of America
- * E-mail: (JJS); (JLS)
- Robert A. Jolly
- Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana, United States of America
- Keith M. Goldstein
- Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana, United States of America
- James L. Stevens
- Lilly Research Laboratories, Eli Lilly and Company, Indianapolis, Indiana, United States of America
- * E-mail: (JJS); (JLS)
|
558
|
Byron SA, Van Keuren-Jensen KR, Engelthaler DM, Carpten JD, Craig DW. Translating RNA sequencing into clinical diagnostics: opportunities and challenges. Nat Rev Genet 2016; 17:257-71. [PMID: 26996076 PMCID: PMC7097555 DOI: 10.1038/nrg.2016.10] [Citation(s) in RCA: 452] [Impact Index Per Article: 56.5] [Indexed: 02/07/2023]
Abstract
RNA-based measurements have the potential for application across diverse areas of human health, including disease diagnosis, prognosis and therapeutic selection. Current clinical applications include infectious diseases, cancer, transplant medicine and fetal monitoring. RNA sequencing (RNA-seq) allows for the detection of a wide variety of RNA species, including mRNA, non-coding RNA, pathogen RNA, chimeric gene fusions, transcript isoforms and splice variants, and provides the capability to quantify known, pre-defined RNA species and rare RNA transcript variants within a sample. In addition to differential expression and detection of novel transcripts, RNA-seq also supports the detection of mutations and germline variation for hundreds to thousands of expressed genetic variants, facilitating assessment of allele-specific expression of these variants. Circulating RNAs and small regulatory RNAs, such as microRNAs, are very stable. These RNA species are vigorously being tested for their potential as biomarkers. However, there are currently few agreed upon methods for isolation or quantitative measurements and a current lack of quality controls that can be used to test platform accuracy and sample preparation quality. Analytical, bioinformatic and regulatory challenges exist, and ongoing efforts toward the establishment of benchmark standards, assay optimization for clinical conditions and demonstration of assay reproducibility are required to expand the clinical utility of RNA-seq.
RNA sequencing (RNA-seq) is a powerful approach for comprehensive analyses of transcriptomes. This Review describes the widespread potential applications of RNA-seq in clinical medicine, such as detecting disease-associated mutations and gene expression disruptions, as well as characteristic non-coding RNAs, circulating extracellular RNAs or pathogen RNAs. The authors also highlight the challenges in adopting RNA-seq routinely into clinical practice. With the emergence of RNA sequencing (RNA-seq) technologies, RNA-based biomolecules hold expanded promise for their diagnostic, prognostic and therapeutic applicability in various diseases, including cancers and infectious diseases. Detection of gene fusions and differential expression of known disease-causing transcripts by RNA-seq represent some of the most immediate opportunities. However, it is the diversity of RNA species detected through RNA-seq that holds new promise for the multi-faceted clinical applicability of RNA-based measures, including the potential of extracellular RNAs as non-invasive diagnostic indicators of disease. Ongoing efforts towards the establishment of benchmark standards, assay optimization for clinical conditions and demonstration of assay reproducibility are required to expand the clinical utility of RNA-seq.
Affiliation(s)
- Sara A Byron
- Center for Translational Innovation, Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
- David M Engelthaler
- Pathogen Genomics Division, Translational Genomics Research Institute, Flagstaff, Arizona 86001, USA
- John D Carpten
- Integrated Cancer Genomics Division, Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
- David W Craig
- Neurogenomics Division, Translational Genomics Research Institute, Phoenix, Arizona 85004, USA
|
559
|
Xu J, Gong B, Wu L, Thakkar S, Hong H, Tong W. Comprehensive Assessments of RNA-seq by the SEQC Consortium: FDA-Led Efforts Advance Precision Medicine. Pharmaceutics 2016; 8:E8. [PMID: 26999190 PMCID: PMC4810084 DOI: 10.3390/pharmaceutics8010008] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 01/12/2016] [Revised: 03/08/2016] [Accepted: 03/10/2016] [Indexed: 01/22/2023] Open
Abstract
Studies on gene expression in response to therapy have led to the discovery of pharmacogenomics biomarkers and advances in precision medicine. Whole transcriptome sequencing (RNA-seq) is an emerging tool for profiling gene expression and has received wide adoption in the biomedical research community. However, its value in regulatory decision making requires rigorous assessment and consensus between various stakeholders, including the research community, regulatory agencies, and industry. The FDA-led SEquencing Quality Control (SEQC) consortium has made considerable progress in this direction, and is the subject of this review. Specifically, three RNA-seq platforms (Illumina HiSeq, Life Technologies SOLiD, and Roche 454) were extensively evaluated at multiple sites to assess cross-site and cross-platform reproducibility. The results demonstrated that relative gene expression measurements were consistently comparable across labs and platforms, but not so for the measurement of absolute expression levels. As part of the quality evaluation several studies were included to evaluate the utility of RNA-seq in clinical settings and safety assessment. The neuroblastoma study profiled tumor samples from 498 pediatric neuroblastoma patients by both microarray and RNA-seq. RNA-seq offers more utilities than microarray in determining the transcriptomic characteristics of cancer. However, RNA-seq and microarray-based models were comparable in clinical endpoint prediction, even when including additional features unique to RNA-seq beyond gene expression. The toxicogenomics study compared microarray and RNA-seq profiles of the liver samples from rats exposed to 27 different chemicals representing multiple toxicity modes of action. Cross-platform concordance was dependent on chemical treatment and transcript abundance. 
Though both RNA-seq and microarray are suitable for developing gene expression based predictive models with comparable prediction performance, RNA-seq offers advantages over microarray in profiling genes with low expression. The rat BodyMap study provided a comprehensive rat transcriptomic body map by performing RNA-Seq on 320 samples from 11 organs in either sex of juvenile, adolescent, adult and aged Fischer 344 rats. Lastly, the transferability study demonstrated that signature genes of predictive models are reciprocally transferable between microarray and RNA-seq data for model development using a comprehensive approach with two large clinical data sets. This result suggests continued usefulness of legacy microarray data in the coming RNA-seq era. In conclusion, the SEQC project enhances our understanding of RNA-seq and provides valuable guidelines for RNA-seq based clinical application and safety evaluation to advance precision medicine.
Affiliation(s)
- Joshua Xu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Binsheng Gong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Leihong Wu
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Shraddha Thakkar
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Huixiao Hong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
- Weida Tong
- Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, USA.
|
560
|
Abstract
The epigenetic modifications are organized in patterns determining the functional properties of the underlying genome. Such patterns, typically measured by ChIP-seq assays of histone modifications, can be combined and translated into musical scores, summarizing multiple signals into a single waveform. As music is recognized as a universal way to convey meaningful information, we wanted to investigate properties of music obtained by sonification of ChIP-seq data. We show that the music produced by such quantitative signals is perceived by human listeners as more pleasant than that produced from randomized signals. Moreover, the waveform can be analyzed to predict phenotypic properties, such as differential gene expression.
Affiliation(s)
- Davide Cittaro
- Center for Translational Genomics and Bioinformatics, San Raffaele Hospital, via Olgettina 58, 20138 Milano, Italy
- Dejan Lazarevic
- Center for Translational Genomics and Bioinformatics, San Raffaele Hospital, via Olgettina 58, 20138 Milano, Italy
- Paolo Provero
- Center for Translational Genomics and Bioinformatics, San Raffaele Hospital, via Olgettina 58, 20138 Milano, Italy; Department of Molecular Biotechnology and Life Sciences, University of Turin, via Nizza 52, 10126 Torino, Italy
|
561
|
Inoue M, Shima Y, Miyabayashi K, Tokunaga K, Sato T, Baba T, Ohkawa Y, Akiyama H, Suyama M, Morohashi KI. Isolation and Characterization of Fetal Leydig Progenitor Cells of Male Mice. Endocrinology 2016; 157:1222-33. [PMID: 26697723 DOI: 10.1210/en.2015-1773] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Indexed: 11/19/2022]
Abstract
Fetal and adult Leydig cells develop in mammalian prenatal and postnatal testes, respectively. In mice, fetal Leydig cells (FLCs) emerge in the interstitial space of the testis at embryonic day 12.5 and thereafter increase in number, possibly through differentiation from progenitor cells. However, the progenitor cells have not yet been identified. Previously, we established transgenic mice in which FLCs are labeled strongly with enhanced green fluorescent protein (EGFP). Interestingly, fluorescence-activated cell sorting provided us with weakly EGFP-labeled cells as well as strongly EGFP-labeled FLCs. In vitro reconstruction of fetal testes demonstrated that weakly EGFP-labeled cells contain FLC progenitors. Transcriptome from the 2 cell populations revealed, as expected, marked differences in the expression of genes required for growth factor/receptor signaling and steroidogenesis. In addition, genes for energy metabolisms such as glycolytic pathways and the citrate cycle were activated in strongly EGFP-labeled cells, suggesting that metabolism is activated during FLC differentiation.
Affiliation(s)
- Miki Inoue
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
- Yuichi Shima
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Kanako Miyabayashi
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Kaori Tokunaga
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Tetsuya Sato
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Takashi Baba
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Yasuyuki Ohkawa
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Haruhiko Akiyama
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Mikita Suyama
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| | - Ken-ichirou Morohashi
- Division of Molecular Life Science (M.I., Y.S., T.B., K.-i.M.), Graduate School of Systems Life Science; Department of Molecular Biology (Y.S., K.M., K.T., T.B., K.-i.M.), Graduate School of Medical Sciences; Division of Bioinformatics (T.S., M.S.), Medical Institute of Bioregulation; and Department of Advanced Medical Initiatives (Y.O.), Japan Science and Technology Agency-Core Research for Evolutional Science and Technology, Faculty of Medicine, Kyushu University, Higashi-ku, Fukuoka 812-8582, Japan; and Department of Orthopaedics (H.A.), Gifu University Graduate School of Medicine, Gifu 501-1194, Japan
| |
|
562
|
Escoffier J, Lee HC, Yassine S, Zouari R, Martinez G, Karaouzène T, Coutton C, Kherraf ZE, Halouani L, Triki C, Nef S, Thierry-Mieg N, Savinov SN, Fissore R, Ray PF, Arnoult C. Homozygous mutation of PLCZ1 leads to defective human oocyte activation and infertility that is not rescued by the WW-binding protein PAWP. Hum Mol Genet 2016; 25:878-91. [PMID: 26721930 PMCID: PMC4754041 DOI: 10.1093/hmg/ddv617] [Citation(s) in RCA: 103] [Impact Index Per Article: 12.9] [Received: 11/06/2015] [Revised: 12/06/2015] [Accepted: 12/17/2015] [Indexed: 11/13/2022] Open
Abstract
In mammals, sperm-oocyte fusion initiates Ca²⁺ oscillations leading to a series of events called oocyte activation, which is the first stage of embryo development. Ca²⁺ signaling is elicited by the delivery of an oocyte-activating factor by the sperm. A sperm-specific phospholipase C (PLCZ1) has emerged as the likely candidate to induce oocyte activation. Recently, PAWP, a sperm-borne tryptophan domain-binding protein encoded by WBP2NL, was proposed to serve the same purpose. Here, we studied two infertile brothers exhibiting normal sperm morphology but complete fertilization failure after intracytoplasmic sperm injection. Whole-exome sequencing revealed a homozygous missense mutation in PLCZ1 (c.1465A>T; p.Ile489Phe), converting Ile 489 into Phe. We show that the mutation is deleterious, leading to the absence of the protein in sperm, mislocalization of the protein when injected into mouse GV and MII oocytes, highly abnormal Ca²⁺ transients and early embryonic arrest. Altogether, these alterations are consistent with the inability of our patients' sperm to induce oocyte activation and initiate embryo development. In contrast, no deleterious variants were identified in WBP2NL, and PAWP presented normal expression and localization. Overall, we demonstrate that in humans the absence of PLCZ1 alone is sufficient to prevent oocyte activation, irrespective of the presence of PAWP. Additionally, this is the first mutation located in the C2 domain of PLCZ1, a domain involved in targeting proteins to cell membranes. This opens the door to structure-function studies to identify the conserved amino acids of the C2 domain that regulate the targeting of PLCZ1 and its selectivity for its lipid substrate(s).
Affiliation(s)
- Jessica Escoffier
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
| | | | - Sandra Yassine
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Institut Albert Bonniot, INSERM U823, La Tronche F-38700, France
| | - Raoudha Zouari
- Polyclinique les Jasmins, Centre d'Aide Médicale à la Procréation, Centre Urbain Nord, 1003 Tunis, Tunisia
| | - Guillaume Martinez
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Institut Albert Bonniot, INSERM U823, La Tronche F-38700, France
| | - Thomas Karaouzène
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Institut Albert Bonniot, INSERM U823, La Tronche F-38700, France
| | - Charles Coutton
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, CHU de Grenoble, UF de Génétique Chromosomique, Grenoble F-38000, France
| | - Zine-Eddine Kherraf
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Institut Albert Bonniot, INSERM U823, La Tronche F-38700, France
| | - Lazhar Halouani
- Polyclinique les Jasmins, Centre d'Aide Médicale à la Procréation, Centre Urbain Nord, 1003 Tunis, Tunisia
| | - Chema Triki
- Clinique Hannibal, Centre d'AMP, les berges du lac, 1053 Tunis, Tunisia
| | - Serge Nef
- Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
| | - Nicolas Thierry-Mieg
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Laboratoire TIMC-IMAG, UMR CNRS 5525, Grenoble F-38000, France
| | - Sergey N Savinov
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, MA 01003, USA
| | | | - Pierre F Ray
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Institut Albert Bonniot, INSERM U823, La Tronche F-38700, France, CHU de Grenoble, UF de Biochimie et Génétique Moléculaire, Grenoble F-38000, France
| | - Christophe Arnoult
- Université Grenoble Alpes, Grenoble, F-38000, Grenoble, France, Institut Albert Bonniot, INSERM U823, La Tronche F-38700, France
| |
|
563
|
Zhang J, Wei Z. An empirical Bayes change-point model for identifying 3' and 5' alternative splicing by next-generation RNA sequencing. Bioinformatics 2016; 32:1823-31. [PMID: 26873932 DOI: 10.1093/bioinformatics/btw060] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 09/28/2015] [Accepted: 01/19/2016] [Indexed: 01/08/2023] Open
Abstract
MOTIVATION: Next-generation RNA sequencing (RNA-seq) has been widely used to investigate alternative isoform regulation. Alternative 3′ splice site (SS) and 5′ SS events account for more than 30% of all alternative splicing (AS) events in higher eukaryotes. Recent studies have revealed that they play important roles in building complex organisms and have a critical impact on biological functions, which could cause disease. Quite a few analytical methods have been developed to facilitate alternative 3′ SS and 5′ SS studies using RNA-seq data. However, these methods have various limitations and their performance may be further improved. RESULTS: We propose an empirical Bayes change-point model to identify alternative 3′ SS and 5′ SS. Compared with previous methods, our approach has several unique merits. First, our model does not rely on annotation information. Instead, it provides for the first time a systematic framework to integrate various information when available, in particular the useful junction read information, in order to obtain better performance. Second, we utilize an empirical Bayes model to efficiently pool information across genes to improve detection efficiency. Third, we provide a flexible testing framework in which the user can choose to address different levels of questions, namely, whether alternative 3′ SS or 5′ SS happens, and/or where it happens. Simulation studies and real data application have demonstrated that our method is powerful and accurate. AVAILABILITY AND IMPLEMENTATION: The software is implemented in Java and can be freely downloaded from http://ebchangepoint.sourceforge.net/ CONTACT: zhiwei@njit.edu.
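As a rough illustration of the change-point idea (not the authors' empirical Bayes model, whose details are not given in the abstract), an alternative splice site can be pictured as a position where read coverage along an exon shifts abruptly. The Python sketch below uses a plain least-squares single change-point search on a made-up coverage vector; the function name and the data are hypothetical.

```python
def best_change_point(coverage):
    """Return (index, sse) for the split that minimizes within-segment squared error.

    `index` is the first position of the right-hand segment. This is a plain
    least-squares search, swapped in for illustration only.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions to place a change point")
    best = None
    for k in range(1, n):
        left, right = coverage[:k], coverage[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((x - mean_left) ** 2 for x in left)
               + sum((x - mean_right) ** 2 for x in right))
        if best is None or sse < best[1]:
            best = (k, sse)
    return best


# Hypothetical per-base coverage with a jump between positions 3 and 4.
toy_coverage = [10, 11, 9, 10, 30, 31, 29, 30]
change_index, change_sse = best_change_point(toy_coverage)
print(change_index, change_sse)
```

A model like the one described in the abstract would additionally pool information across genes and use junction reads; this sketch only locates the most likely single shift in one coverage profile.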
Affiliation(s)
- Jie Zhang
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
| | - Zhi Wei
- Department of Computer Science, New Jersey Institute of Technology, Newark, NJ 07102, USA
| |
|
564
|
Wang L, Nie J, Sicotte H, Li Y, Eckel-Passow JE, Dasari S, Vedell PT, Barman P, Wang L, Weinshiboum R, Jen J, Huang H, Kohli M, Kocher JPA. Measure transcript integrity using RNA-seq data. BMC Bioinformatics 2016; 17:58. [PMID: 26842848 PMCID: PMC4739097 DOI: 10.1186/s12859-016-0922-z] [Citation(s) in RCA: 149] [Impact Index Per Article: 18.6] [Received: 10/02/2015] [Accepted: 01/29/2016] [Indexed: 11/21/2022] Open
Abstract
Background: Stored biological samples with pathology information and medical records are invaluable resources for translational medical research. However, RNAs extracted from the archived clinical tissues are often substantially degraded. RNA degradation distorts the RNA-seq read coverage in a gene-specific manner, and has profound influences on whole-genome gene expression profiling. Result: We developed the transcript integrity number (TIN) to measure RNA degradation. When applied to 3 independent RNA-seq datasets, we demonstrated TIN is a reliable and sensitive measure of the RNA degradation at both transcript and sample level. Through comparing 10 prostate cancer clinical samples with lower RNA integrity to 10 samples with higher RNA quality, we demonstrated that calibrating gene expression counts with TIN scores could effectively neutralize RNA degradation effects by reducing false positives and recovering biologically meaningful pathways. When further evaluating the performance of TIN correction using spike-in transcripts in RNA-seq data generated from the Sequencing Quality Control consortium, we found TIN adjustment had better control of false positives and false negatives (sensitivity = 0.89, specificity = 0.91, accuracy = 0.90), as compared to gene expression analysis results without TIN correction (sensitivity = 0.98, specificity = 0.50, accuracy = 0.86). Conclusion: TIN is a reliable measurement of RNA integrity and a valuable approach used to neutralize in vitro RNA degradation effect and improve differential gene expression analysis. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0922-z) contains supplementary material, which is available to authorized users.
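The sensitivity, specificity and accuracy figures quoted above are linked by a simple identity: accuracy is the average of sensitivity and specificity weighted by the fraction of truly positive cases. The Python sketch below checks that the reported numbers are mutually consistent. The positive fraction is not stated in the abstract; it is inferred here from the uncorrected figures, so it is an assumption and only approximate given the rounding of the published values.

```python
def accuracy(sensitivity, specificity, positive_fraction):
    """Accuracy as a mix of sensitivity and specificity weighted by class balance."""
    return sensitivity * positive_fraction + specificity * (1.0 - positive_fraction)


def implied_positive_fraction(sensitivity, specificity, acc):
    """Solve acc = sensitivity * p + specificity * (1 - p) for p."""
    return (acc - specificity) / (sensitivity - specificity)


# Figures reported without TIN correction: infer the share of true positives.
p_positive = implied_positive_fraction(0.98, 0.50, 0.86)

# Apply the same (inferred) share to the figures reported with TIN correction.
acc_with_tin = accuracy(0.89, 0.91, p_positive)

print(f"implied positive fraction: {p_positive:.2f}")
print(f"accuracy with TIN correction: {acc_with_tin:.3f}")
```

Under this reading, roughly three quarters of the evaluated cases are true positives, and the corrected analysis gives up some sensitivity in exchange for a much higher specificity while overall accuracy rises to about 0.90.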
Affiliation(s)
- Liguo Wang
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Jinfu Nie
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Hugues Sicotte
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Ying Li
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | | | - Surendra Dasari
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Peter T Vedell
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Poulami Barman
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Liewei Wang
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Richard Weinshiboum
- Department of Molecular Pharmacology and Experimental Therapeutics, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Jin Jen
- Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Haojie Huang
- Department of Biochemistry and Molecular Biology, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Manish Kohli
- Department of Oncology, Mayo Clinic, Rochester, MN, 55905, USA.
| | - Jean-Pierre A Kocher
- Division of Biomedical Statistics and Informatics, Mayo Clinic, Rochester, MN, 55905, USA.
| |
|
565
|
Wu PY, Wang MD. The Selection of Quantification Pipelines for Illumina RNA-seq Data Using a Subsampling Approach. ... IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS. IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2016; 2016:78-81. [PMID: 28133637 DOI: 10.1109/bhi.2016.7455839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/07/2022]
Abstract
RNA sequencing (RNA-seq) is a widely applied technology for extracting gene and transcript expression from biological samples. Given numerous quantification pipelines for RNA-seq data, one fundamental challenge is to identify a pipeline that can produce the most accurate estimates of gene and/or transcript expression. Exploring all available pipelines requires extensive computational resources. Therefore, we propose a subsampling approach that can speed up pipeline evaluation and selection for a given RNA-seq dataset. We applied our approach to one simulated and two real RNA-seq datasets and found that expression estimates derived from subsampled data are close surrogates for those derived from original data. In addition, the ranking of quantification pipelines based on the subsampled data was highly concordant with that based on the original data. Therefore, we conclude that subsampling is a valid approach to facilitating efficient quantification pipeline selection using RNA-seq data.
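The first claim above, that estimates from subsampled reads can stand in for estimates from the full data, can be pictured with a toy example. The Python sketch below is an illustration under stated assumptions rather than the authors' procedure: reads are represented simply as gene labels, "quantification" is plain counting, the 10% fraction is arbitrary, and all names and data are hypothetical. It draws a random subsample and compares per-gene counts with the full data using a Spearman rank correlation.

```python
import random
from collections import Counter


def subsample_reads(reads, fraction, seed=0):
    """Draw a random subset holding `fraction` of the reads, without replacement."""
    rng = random.Random(seed)
    k = max(1, round(len(reads) * fraction))
    return rng.sample(reads, k)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical dataset: 20 genes whose read counts grow linearly with gene index.
genes = [f"gene{i}" for i in range(20)]
reads = [g for i, g in enumerate(genes) for _ in range(500 * (i + 1))]

sub_reads = subsample_reads(reads, fraction=0.1)
full_counts = Counter(reads)
sub_counts = Counter(sub_reads)
rho = spearman([full_counts[g] for g in genes], [sub_counts[g] for g in genes])
print(f"Spearman correlation, full vs 10% subsample: {rho:.3f}")
```

The same comparison could be repeated per pipeline, ranking pipelines by an accuracy score on the subsample and on the full data, to probe the second claim about concordant pipeline rankings.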
|
566
|
Tong L, Yang C, Wu PY, Wang MD. Evaluating the impact of sequencing error correction for RNA-seq data with ERCC RNA spike-in controls. ... IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS. IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS 2016; 2016:74-77. [PMID: 27532064 DOI: 10.1109/bhi.2016.7455838] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 11/11/2022]
Abstract
Sequencing errors are a major issue for several next-generation sequencing-based applications such as de novo assembly and single nucleotide polymorphism detection. Several error-correction methods have been developed to improve raw data quality. However, error-correction performance is hard to evaluate because of the lack of a ground truth. In this study, we propose a novel approach that uses ERCC RNA spike-in controls as the ground truth to facilitate error-correction performance evaluation. After aligning raw and corrected RNA-seq data, we characterized the quality of reads by three metrics: mismatch patterns (e.g., the substitution rate of A to C) of reads aligned with one mismatch, mismatch patterns of reads aligned with two mismatches, and the percentage increase of reads aligned to the reference. We observed that the mismatch patterns for reads aligned with one mismatch are significantly correlated between ERCC spike-ins and real RNA samples. Based on these observations, we conclude that ERCC spike-ins can serve as ground truths for error correction beyond their previous applications for validation of dynamic range and fold-change response. Also, the mismatch patterns for ERCC reads aligned with one mismatch can serve as a novel and reliable metric to evaluate the performance of error-correction tools.
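To make the "mismatch pattern" metric concrete, the Python sketch below tallies substitution types among reads that align with exactly one mismatch and converts them to rates. It is a minimal sketch under stated assumptions: alignments are given as hypothetical (reference, read) string pairs of equal length with no indels, and the function name is illustrative rather than taken from the paper.

```python
from collections import Counter


def mismatch_pattern(aligned_pairs, n_mismatches=1):
    """Substitution rates among pairs that differ at exactly `n_mismatches` positions.

    Each pair is (reference_segment, read) of equal length; keys look like "A>C",
    meaning reference base A was read as C.
    """
    counts = Counter()
    for ref, read in aligned_pairs:
        if len(ref) != len(read):
            continue  # indels are out of scope for this sketch
        diffs = [(r, q) for r, q in zip(ref, read) if r != q]
        if len(diffs) == n_mismatches:
            counts.update(diffs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {f"{r}>{q}": c / total for (r, q), c in counts.items()}


# Hypothetical alignments: three have one mismatch, one is perfect, one has two.
toy_pairs = [
    ("ACGT", "ACGA"),  # T>A
    ("ACGT", "CCGT"),  # A>C
    ("ACGT", "ACGT"),  # perfect match, ignored
    ("ACGT", "CCGA"),  # two mismatches, ignored when n_mismatches=1
    ("AAAA", "CAAA"),  # A>C
]
pattern = mismatch_pattern(toy_pairs)
print(pattern)
```

Computing this pattern separately for spike-in reads and for sample reads, before and after correction, gives vectors that can then be compared, for example by correlation.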
Affiliation(s)
- Li Tong
- Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - Cheng Yang
- Dept. of Biomedical Engineering, Peking University, No.5 Yiheyuan Road Haidian District, Beijing, P.R. China 100871
| | - Po-Yen Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
| | - May D Wang
- Dept. of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| |
|
567
|
Owens NDL, Blitz IL, Lane MA, Patrushev I, Overton JD, Gilchrist MJ, Cho KWY, Khokha MK. Measuring Absolute RNA Copy Numbers at High Temporal Resolution Reveals Transcriptome Kinetics in Development. Cell Rep 2016; 14:632-647. [PMID: 26774488 PMCID: PMC4731879 DOI: 10.1016/j.celrep.2015.12.050] [Citation(s) in RCA: 115] [Impact Index Per Article: 14.4] [Received: 06/25/2015] [Revised: 11/02/2015] [Accepted: 12/07/2015] [Indexed: 01/19/2023] Open
Abstract
Transcript regulation is essential for cell function, and misregulation can lead to disease. Despite technologies to survey the transcriptome, we lack a comprehensive understanding of transcript kinetics, which limits quantitative biology. This is an acute challenge in embryonic development, where rapid changes in gene expression dictate cell fate decisions. By ultra-high-frequency sampling of Xenopus embryos and absolute normalization of sequence reads, we present smooth gene expression trajectories in absolute transcript numbers. During a developmental period approximating the first 8 weeks of human gestation, transcript kinetics vary by eight orders of magnitude. Ordering genes by expression dynamics, we find that "temporal synexpression" predicts common gene function. Remarkably, a single parameter, the characteristic timescale, can classify transcript kinetics globally and distinguish genes regulating development from those involved in cellular metabolism. Overall, our analysis provides unprecedented insight into the reorganization of maternal and embryonic transcripts and redefines our ability to perform quantitative biology.
Affiliation(s)
- Nick D L Owens
- The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway Mill Hill, London NW7 1AA, UK
| | - Ira L Blitz
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697 USA
| | - Maura A Lane
- Program in Vertebrate Developmental Biology, Department of Pediatrics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA; Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
| | - Ilya Patrushev
- The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway Mill Hill, London NW7 1AA, UK
| | - John D Overton
- Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA; Yale Center for Genome Analysis, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA
| | - Michael J Gilchrist
- The Francis Crick Institute, Mill Hill Laboratory, The Ridgeway Mill Hill, London NW7 1AA, UK.
| | - Ken W Y Cho
- Department of Developmental and Cell Biology, University of California, Irvine, CA 92697 USA.
| | - Mustafa K Khokha
- Program in Vertebrate Developmental Biology, Department of Pediatrics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA; Department of Genetics, Yale University School of Medicine, 333 Cedar Street, New Haven, CT 06520, USA.
| |
|
568
|
Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, Mortazavi A. A survey of best practices for RNA-seq data analysis. Genome Biol 2016; 17:13. [PMID: 26813401 PMCID: PMC4728800 DOI: 10.1186/s13059-016-0881-8] [Citation(s) in RCA: 1384] [Impact Index Per Article: 173.0] [Indexed: 12/27/2022] Open
Abstract
RNA-sequencing (RNA-seq) has a wide variety of applications, but no single analysis pipeline can be used in all cases. We review all of the major steps in RNA-seq data analysis, including experimental design, quality control, read alignment, quantification of gene and transcript levels, visualization, differential gene expression, alternative splicing, functional analysis, gene fusion detection and eQTL mapping. We highlight the challenges associated with each step. We discuss the analysis of small RNAs and the integration of RNA-seq with other functional genomics techniques. Finally, we discuss the outlook for novel technologies that are changing the state of the art in transcriptomics.
Affiliation(s)
- Ana Conesa
- Institute for Food and Agricultural Sciences, Department of Microbiology and Cell Science, University of Florida, Gainesville, FL, 32603, USA; Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain.
| | - Pedro Madrigal
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK; Wellcome Trust-Medical Research Council Cambridge Stem Cell Institute, Anne McLaren Laboratory for Regenerative Medicine, Department of Surgery, University of Cambridge, Cambridge, CB2 0SZ, UK.
| | - Sonia Tarazona
- Centro de Investigación Príncipe Felipe, Genomics of Gene Expression Laboratory, 46012, Valencia, Spain; Department of Applied Statistics, Operations Research and Quality, Universidad Politécnica de Valencia, 46020, Valencia, Spain
| | - David Gomez-Cabrero
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, Karolinska University Hospital, 171 77, Stockholm, Sweden; Center for Molecular Medicine, Karolinska Institutet, 17177, Stockholm, Sweden; Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176, Stockholm, Sweden; Science for Life Laboratory, 17121, Solna, Sweden
| | - Alejandra Cervera
- Systems Biology Laboratory, Institute of Biomedicine and Genome-Scale Biology Research Program, University of Helsinki, 00014, Helsinki, Finland
| | - Andrew McPherson
- School of Computing Science, Simon Fraser University, Burnaby, V5A 1S6, BC, Canada
| | - Michał Wojciech Szcześniak
- Department of Bioinformatics, Institute of Molecular Biology and Biotechnology, Adam Mickiewicz University in Poznań, 61-614, Poznań, Poland
| | - Daniel J Gaffney
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
| | - Laura L Elo
- Turku Centre for Biotechnology, University of Turku and Åbo Akademi University, FI-20520, Turku, Finland
| | - Xuegong Zhang
- Key Lab of Bioinformatics/Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing, 100084, China; School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Ali Mortazavi
- Department of Developmental and Cell Biology, University of California, Irvine, Irvine, CA, 92697-2300, USA; Center for Complex Biological Systems, University of California, Irvine, Irvine, CA, 92697, USA.
| |
|
569
|
Soneson C, Matthes KL, Nowicka M, Law CW, Robinson MD. Isoform prefiltering improves performance of count-based methods for analysis of differential transcript usage. Genome Biol 2016; 17:12. [PMID: 26813113 PMCID: PMC4729156 DOI: 10.1186/s13059-015-0862-3] [Citation(s) in RCA: 78] [Impact Index Per Article: 9.8] [Received: 10/11/2015] [Accepted: 12/29/2015] [Indexed: 01/08/2023] Open
Abstract
Background: RNA-seq has been a boon to the quantitative analysis of transcriptomes. A notable application is the detection of changes in transcript usage between experimental conditions. For example, discovery of pathological alternative splicing may allow the development of new treatments or better management of patients. From an analysis perspective, there are several ways to approach RNA-seq data to unravel differential transcript usage, such as annotation-based exon-level counting, differential analysis of the percentage spliced in, or quantitative analysis of assembled transcripts. The goal of this research is to compare and contrast current state-of-the-art methods, and to suggest improvements to commonly used workflows. Results: We assess the performance of representative workflows using synthetic data and explore the effect of using non-standard counting bin definitions as input to DEXSeq, a state-of-the-art inference engine. Although the canonical counting provided the best results overall, several non-canonical approaches were as good or better in specific aspects and most counting approaches outperformed the evaluated event- and assembly-based methods. We show that an incomplete annotation catalog can have a detrimental effect on the ability to detect differential transcript usage in transcriptomes with few isoforms per gene and that isoform-level prefiltering can considerably improve false discovery rate control. Conclusion: Count-based methods generally perform well in the detection of differential transcript usage. Controlling the false discovery rate at the imposed threshold is difficult, particularly in complex organisms, but can be improved by prefiltering the annotation catalog. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0862-3) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Charlotte Soneson
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland.
| | - Katarina L Matthes
- Division of Chronic Disease Epidemiology, Epidemiology, Biostatistics and Prevention Institute (EPBI), University of Zurich, Hirschengraben 84, Zurich, 8001, Switzerland; Cancer Registry Zurich and Zug, University Hospital Zurich, Vogelsangstrasse 10, Zurich, 8091, Switzerland.
| | - Malgorzata Nowicka
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland.
| | - Charity W Law
- Molecular Medicine Division, Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, 3052, Australia.
| | - Mark D Robinson
- Institute of Molecular Life Sciences, University of Zurich, Winterthurerstrasse 190, Zurich, 8057, Switzerland; SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, 8057, Switzerland.
| |
|
570
|
Confounding Factors in the Transcriptome Analysis of an In-Vivo Exposure Experiment. PLoS One 2016; 11:e0145252. [PMID: 26789003 PMCID: PMC4720430 DOI: 10.1371/journal.pone.0145252] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 08/25/2015] [Accepted: 11/30/2015] [Indexed: 11/19/2022] Open
Abstract
Confounding factors: In transcriptomics experimentation, confounding factors frequently exist alongside the intended experimental factors and can severely influence the outcome of a transcriptome analysis. Confounding factors are regularly discussed in methodological literature, but their actual, practical impact on the outcome and interpretation of transcriptomics experiments is, to our knowledge, not documented. For instance, in-vivo experimental factors such as Individual, Sample-Composition and Time-of-Day are potentially formidable confounding factors. To study these confounding factors, we designed an extensive in-vivo transcriptome experiment (n = 264) with UVR exposure of murine skin containing six consecutive samples from each individual mouse (n = 64). Analysis Approach: Evaluation of the confounding factors Sample-Composition, Time-of-Day, Handling-Stress, and Individual-Mouse resulted in the identification of many genes that were affected by them. These genes sometimes showed over 30-fold expression differences. The most prominent confounding factor was Sample-Composition, caused by mouse-dependent skin composition differences, sampling variation and/or influx/efflux of mobile cells. Although we can only evaluate these effects for genes known to be expressed in a cell-type-specific manner in our complex heterogeneous samples, it is clear that the observed variations also affect the cumulative expression levels of many other non-cell-type-specific genes. ANOVA: ANOVA can only attempt to neutralize the effects of the well-defined confounding factors, such as Individual-Mouse, on the experimental factors UV-Dose and Recovery-Time. Also, by definition, ANOVA only yields reproducible gene-expression differences, but we found that these differences were very small compared to the fold changes induced by the confounding factors, questioning the biological relevance of these ANOVA-detected differences. Furthermore, it turned out that many of the differentially expressed genes found by ANOVA were also present in the gene clusters associated with the confounding factors. Conclusion: Hence our overall conclusion is that confounding factors have a major impact on the outcome of in-vivo transcriptomics experiments. Thus the set-up, analysis, and interpretation of such experiments should be approached with the utmost prudence.
|
571
|
Amorim-Vaz S, Sanglard D. Novel Approaches for Fungal Transcriptomics from Host Samples. Front Microbiol 2016; 6:1571. [PMID: 26834721 PMCID: PMC4717316 DOI: 10.3389/fmicb.2015.01571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/22/2015] [Accepted: 12/28/2015] [Indexed: 11/13/2022] Open
Abstract
Candida albicans adaptation to the host requires a profound reprogramming of the fungal transcriptome as compared to in vitro laboratory conditions. Detailed knowledge of the C. albicans transcriptome during the infection process is necessary to understand which fungal genes are important for host adaptation. Such genes could be considered potential targets for antifungal therapy. The acquisition of the C. albicans transcriptome is, however, technically challenging due to the low proportion of fungal RNA in host tissues. Two emerging technologies were used recently to circumvent this problem. One consists of the detection of low-abundance fungal RNA using capture and reporter gene probes, followed by emission and quantification of the resulting fluorescent signals (NanoString). The other is based first on the capture of fungal RNA by short biotinylated oligonucleotide baits covering the C. albicans ORFome, permitting fungal RNA purification. Next, the enriched fungal RNA is amplified and subjected to RNA sequencing (RNA-seq). Here we detail these two transcriptome approaches and discuss their advantages, limitations and future perspectives in microbial transcriptomics from host material.
Affiliation(s)
- Sara Amorim-Vaz
  - Institute of Microbiology, University Hospital Center, University of Lausanne, Lausanne, Switzerland
- Dominique Sanglard
  - Institute of Microbiology, University Hospital Center, University of Lausanne, Lausanne, Switzerland
|
572
|
Lyu Y, Li Q. A semi-parametric statistical model for integrating gene expression profiles across different platforms. BMC Bioinformatics 2016; 17 Suppl 1:5. [PMID: 26818110 PMCID: PMC4895261 DOI: 10.1186/s12859-015-0847-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Determining differentially expressed genes (DEGs) between biological samples is key to understanding how genotype gives rise to phenotype. RNA-seq and microarrays are the two main technologies for profiling gene expression levels. However, considerable discrepancy has been found between DEGs detected using the two technologies. Integrating data across these two platforms has the potential to improve the power and reliability of DEG detection. METHODS: We propose a rank-based semi-parametric model to determine DEGs using information across different sources and apply it to the integration of RNA-seq and microarray data. By incorporating both the significance of differential expression and the consistency across platforms, our method effectively detects DEGs with moderate but consistent signals. We demonstrate the effectiveness of our method using simulation studies, MAQC/SEQC data and a synthetic microRNA dataset. CONCLUSIONS: Our integration method is not only robust to noise and heterogeneity in the data, but also adaptive to the structure of the data. In our simulations and real data studies, our approach shows higher discriminative power and identifies more biologically relevant DEGs than eBayes, DESeq and some commonly used meta-analysis methods.
Affiliation(s)
- Yafei Lyu
  - The Huck Institute of Life Science, Pennsylvania State University, University Park, PA, 16802, USA.
- Qunhua Li
  - Department of Statistics, Pennsylvania State University, University Park, PA, 16802, USA.
|
573
|
Abstract
Despite the enormous medical impact of cancers and intensive study of their biology, detailed characterization of tumor growth and development remains elusive. This difficulty occurs in large part because of enormous heterogeneity in the molecular mechanisms of cancer progression, both tumor-to-tumor and cell-to-cell in single tumors. Advances in genomic technologies, especially at the single-cell level, are improving the situation, but these approaches are held back by limitations of the biotechnologies for gathering genomic data from heterogeneous cell populations and the computational methods for making sense of those data. One popular way to gain the advantages of whole-genome methods without the cost of single-cell genomics has been the use of computational deconvolution (unmixing) methods to reconstruct clonal heterogeneity from bulk genomic data. These methods, too, are limited by the difficulty of inferring genomic profiles of rare or subtly varying clonal subpopulations from bulk data, a problem that can be computationally reduced to that of reconstructing the geometry of point clouds of tumor samples in a genome space. Here, we present a new method to improve that reconstruction by better identifying subspaces corresponding to tumors produced from mixtures of distinct combinations of clonal subpopulations. We develop a nonparametric clustering method based on medoidshift clustering for identifying subgroups of tumors expected to correspond to distinct trajectories of evolutionary progression. We show on synthetic and real tumor copy-number data that this new method substantially improves our ability to resolve discrete tumor subgroups, a key step in the process of accurately deconvolving tumor genomic data and inferring clonal heterogeneity from bulk data.
Affiliation(s)
- Theodore Roman
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA; Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
- Lu Xie
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA; Joint Carnegie Mellon/University of Pittsburgh Ph.D. Program in Computational Biology, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA.
- Russell Schwartz
  - Computational Biology Department, School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, 15213, PA, USA; Department of Biological Sciences, Mellon College of Science, Carnegie Mellon University, 4400 Fifth Avenue, Pittsburgh, 15213, PA, USA.
|
574
|
Kreutz C. New Concepts for Evaluating the Performance of Computational Methods. (The author acknowledges financial support by the German Ministry of Education and Research (BMBF) via e:Bio Grant No. 031L0080.) IFAC-PapersOnLine 2016. [DOI: 10.1016/j.ifacol.2016.12.104] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 10/20/2022]
|
575
|
Li W, Turner A, Aggarwal P, Matter A, Storvick E, Arnett DK, Broeckel U. Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis. BMC Genomics 2015; 16:1069. [PMID: 26673413 PMCID: PMC4681149 DOI: 10.1186/s12864-015-2270-1] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 06/08/2015] [Accepted: 12/03/2015] [Indexed: 12/29/2022] Open
Abstract
Background: Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications caused by non-specific mapping of short reads. The Ion AmpliSeq™ Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Results: Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, Matthews correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate that AmpliSeq is a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variation. Conclusions: Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2270-1) contains supplementary material, which is available to authorized users.
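Pearson's r, the Matthews correlation coefficient and RMSD, three of the statistics named in the Results, can be illustrated with a minimal Python sketch. The function names and toy inputs are hypothetical and are not taken from the study; the code only shows the standard textbook definitions, assuming log2 fold changes from two platforms and a confusion matrix of differential-expression calls.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def rmsd(x, y):
    """Root-mean-square deviation between paired values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical log2 fold changes for the same genes on two platforms.
platform_a = [1.0, 2.0, 3.0, 4.0]
platform_b = [1.5, 2.5, 3.5, 4.5]

print("Pearson r:", pearson_r(platform_a, platform_b))
print("RMSD:", rmsd(platform_a, platform_b))
# Hypothetical counts of differential-expression calls against a reference.
print("MCC:", mcc(tp=40, tn=45, fp=5, fn=10))
```

The toy data differ by a constant offset, so the correlation is perfect while the RMSD is not zero; this is why a concordance study may report both.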
Affiliation(s)
- Wenli Li
  - Department of Pediatrics, Section of Genomic Pediatrics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA.
- Amy Turner
  - Department of Pediatrics, Section of Genomic Pediatrics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA.
- Praful Aggarwal
  - Department of Pediatrics, Section of Genomic Pediatrics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA.
- Andrea Matter
  - Department of Pediatrics, Section of Genomic Pediatrics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA.
- Erin Storvick
  - Department of Pediatrics, Section of Genomic Pediatrics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA.
- Donna K Arnett
  - Department of Epidemiology, University of Alabama at Birmingham, 1530 3rd Avenue South, Birmingham, AL, 35294, USA.
- Ulrich Broeckel
  - Department of Pediatrics, Section of Genomic Pediatrics, Medical College of Wisconsin, 8701 Watertown Plank Road, Milwaukee, WI, 53226, USA.
|
576
|
Epigenomic Reprogramming of Adult Cardiomyocyte-Derived Cardiac Progenitor Cells. Sci Rep 2015; 5:17686. [PMID: 26657817 PMCID: PMC4677315 DOI: 10.1038/srep17686] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 03/13/2015] [Accepted: 10/14/2015] [Indexed: 01/01/2023] Open
Abstract
It has been believed that mammalian adult cardiomyocytes (ACMs) are terminally differentiated and unable to proliferate. Recently, using a bi-transgenic ACM fate mapping mouse model and an in vitro culture system, we demonstrated that adult mouse cardiomyocytes were able to dedifferentiate into cardiac progenitor-like cells (CPCs). However, little is known about the molecular basis of their intrinsic cellular plasticity. Here we integrate single-cell transcriptome and whole-genome DNA methylation analyses to unravel the molecular mechanisms underlying the dedifferentiation and cell cycle reentry of mouse ACMs. Compared to parental cardiomyocytes, dedifferentiated mouse cardiomyocyte-derived CPCs (mCPCs) display epigenomic reprogramming with many differentially methylated regions, both hypermethylated and hypomethylated, across the entire genome. Correlating well with the methylome, our transcriptomic data showed that the genes encoding cardiac structure and function proteins are remarkably down-regulated in mCPCs, while those for cell cycle, proliferation, and stemness are significantly up-regulated. In addition, implantation of mCPCs into infarcted mouse myocardium improves cardiac function with augmented left ventricular ejection fraction. Our study demonstrates that the cellular plasticity of mammalian cardiomyocytes is the result of a well-orchestrated epigenomic reprogramming and a subsequent global transcriptomic alteration.
|
577
|
Genomic Discoveries and Personalized Medicine in Neurological Diseases. Pharmaceutics 2015; 7:542-53. [PMID: 26690205 PMCID: PMC4695833 DOI: 10.3390/pharmaceutics7040542] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 09/23/2015] [Revised: 11/30/2015] [Accepted: 12/02/2015] [Indexed: 12/22/2022] Open
Abstract
In the past decades, we have witnessed dramatic changes in clinical diagnoses and treatments due to the revolutions in genomics and personalized medicine. Undoubtedly, we have also met many challenges in using these advanced technologies in drug discovery and development. In this review, we describe when genomic information is applied in personal healthcare in general. We illustrate some case examples of genomic discoveries and promising personalized medicine applications, in the area of neurological disease in particular. Available data suggest that individual genomics can be applied to better treat patients in the near future.
|
578
|
Gonadal transcriptomic analysis and differentially expressed genes in the testis and ovary of the Pacific white shrimp (Litopenaeus vannamei). BMC Genomics 2015; 16:1006. [PMID: 26607692 PMCID: PMC4659196 DOI: 10.1186/s12864-015-2219-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Received: 07/22/2015] [Accepted: 11/16/2015] [Indexed: 01/15/2023] Open
Abstract
Background: The Pacific white shrimp (Litopenaeus vannamei) is the world's most prevalent cultured crustacean species. However, the supply of high-quality broodstock is limited, and baseline information on its reproductive activity and on the molecular aspects of gonad development is scarce. In this study, we performed transcriptome sequencing on the gonads of adult male and female L. vannamei to identify sex-related genes. Results: A total of 25.16 gigabases (Gb) of sequences were generated from four L. vannamei gonadal tissue libraries. After quality control, 24.11 Gb of clean reads were selected from the gonadal libraries. De novo assembly of all the clean reads generated a total of 65,218 unigenes with a mean size of 1021 bp and an N50 of 2000 bp. A search of all unigenes against the Nr, SwissProt, KEGG, COG and NT databases resulted in 26,482, 23,062, 20,659, 11,935 and 14,626 annotations, respectively, providing a total of 30,304 annotated unigenes. Among the annotated unigenes, 12,320 were assigned to gene ontology categories and 20,659 were mapped to 258 KEGG pathways. By comparing the ovary and testis libraries, 19,279 testicular up-regulated and 3,529 ovarian up-regulated unigenes were identified. Enrichment analysis of differentially expressed unigenes resulted in 1060 significantly enriched GO terms and 34 significantly enriched KEGG pathways. Nine ovary-specific, 6 testis-specific, 45 testicular up-regulated and 39 ovarian up-regulated unigenes were then confirmed by semi-quantitative PCR and quantitative real-time PCR. In addition, using all unigenes as a reference, a total of 13,233 simple sequence repeats (SSRs) were identified in 10,411 unigene sequences. Conclusions: The present study depicts the first large-scale RNA sequencing of shrimp gonads. We have identified many important sex-related functional genes, GO terms and pathways, all of which will facilitate future research into the reproductive biology of shrimp. We expect that the SSRs detected in this study can be used as genetic markers for germplasm evaluation of breeding and imported populations.
Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2219-4) contains supplementary material, which is available to authorized users.
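The N50 statistic quoted for the assembly can be illustrated with a short Python sketch. It uses the common definition (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases); the function name and the toy lengths are hypothetical and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sequences are visited from longest to shortest; the N50 is the length
    at which the running total first reaches half of the assembly size.
    """
    if not lengths:
        raise ValueError("N50 is undefined for an empty assembly")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe test for running >= total / 2
            return length

# Hypothetical unigene lengths in bp (not data from the study).
toy_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print("N50:", n50(toy_lengths))
```

Because N50 is weighted by sequence length, it can be much larger than the mean length, as in the abstract, where the mean is 1021 bp and the N50 is 2000 bp.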
|
579
|
Liu Y, Jing R, Xu J, Liu K, Xue J, Wen Z, Li M. Comparative analysis of oncogenes identified by microarray and RNA-sequencing as biomarkers for clinical prognosis. Biomark Med 2015; 9:1067-78. [DOI: 10.2217/bmm.15.97] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022] Open
Abstract
Aims: Although RNA-sequencing has been widely used to identify differentially expressed genes (DEGs) as biomarkers to guide therapeutic treatment, it is necessary to investigate the concordance of DEGs identified by microarray and RNA-sequencing for clinical prognosis. Material & methods: Using The Cancer Genome Atlas data sets, we thoroughly investigated the concordance of DEGs identified from microarray and RNA-sequencing data and their molecular functions. Results: The DEGs identified by both technologies averaged ~98.6% overlap. The cancer-related gene sets were significantly enriched with the DEGs and were consistent between the two technologies. Conclusions: The high consistency of DEGs in their regulation directionality and molecular functions indicated good reproducibility between microarray and RNA-sequencing in identifying potential oncogenes for clinical prognosis.
Affiliation(s)
- Yuan Liu
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Runyu Jing
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Junmei Xu
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Keqin Liu
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Jiwei Xue
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Zhining Wen
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
- Menglong Li
  - College of Chemistry, Sichuan University, Chengdu 610064, PR China
|
580
|
Bioinformatic and Statistical Analysis of Adaptive Immune Repertoires. Trends Immunol 2015; 36:738-749. [DOI: 10.1016/j.it.2015.09.006] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Received: 08/25/2015] [Revised: 09/15/2015] [Accepted: 09/15/2015] [Indexed: 01/16/2023]
|
581
|
Liu Y, Ai N, Liao J, Fan X. Transcriptomics: a sword to cut the Gordian knot of traditional Chinese medicine. Biomark Med 2015; 9:1201-13. [DOI: 10.2217/bmm.15.91] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Indexed: 01/18/2023] Open
Abstract
The systemic effects of traditional Chinese medicine (TCM) have seemed to be a Gordian knot, impossible to untie for decades. The advent of transcriptomics provides a useful sword to cut the knot and shed some light on complex bioprocesses and the intrinsic connections among them. Here, we revisit studies on TCM ZHENGs using this approach, highlight its applications in elucidating the potential scientific basis of ZHENG and investigating mechanisms of action for TCM formulas, and demonstrate its unique role in novel TCM drug design and discovery through detection of active ingredients from TCM and study of TCM compatibility theory. The limitations and future perspectives of transcriptomics approaches to TCM study are also discussed.
Affiliation(s)
- Yufeng Liu
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Ni Ai
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Jie Liao
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
- Xiaohui Fan
  - Pharmaceutical Informatics Institute, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
|
582
|
Weber APM. Discovering New Biology through Sequencing of RNA. Plant Physiol 2015; 169:1524-31. [PMID: 26353759 PMCID: PMC4634082 DOI: 10.1104/pp.15.01081] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 08/13/2015] [Accepted: 09/09/2015] [Indexed: 05/08/2023]
Abstract
Sequencing of RNA (RNA-Seq) was invented approximately one decade ago and has since revolutionized biological research. This update provides a brief historical perspective on the development of RNA-Seq and then focuses on the application of RNA-Seq in qualitative and quantitative analyses of transcriptomes. Particular emphasis is given to aspects of data analysis. Since the wet-lab and data analysis aspects of RNA-Seq are still rapidly evolving and novel applications are continuously reported, a printed review will be rapidly outdated and can only serve to provide some examples and general guidelines for planning and conducting RNA-Seq studies. Hence, selected references to frequently updated online resources are given.
Affiliation(s)
- Andreas P M Weber
  - Institute of Plant Biochemistry, Cluster of Excellence on Plant Science, Heinrich-Heine-Universität, D-40231 Duesseldorf, Germany
|
583
|
Veeneman BA, Shukla S, Dhanasekaran SM, Chinnaiyan AM, Nesvizhskii AI. Two-pass alignment improves novel splice junction quantification. Bioinformatics 2015; 32:43-9. [PMID: 26519505 DOI: 10.1093/bioinformatics/btv642] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 07/01/2015] [Accepted: 10/27/2015] [Indexed: 11/15/2022] Open
Abstract
MOTIVATION Discovery of novel splicing from RNA sequence data remains a critical and exciting focus of transcriptomics, but reduced alignment power impedes expression quantification of novel splice junctions. RESULTS Here, we profile performance characteristics of two-pass alignment, which separates splice junction discovery from quantification. Per sample, across a variety of transcriptome sequencing datasets, two-pass alignment improved quantification of at least 94% of simulated novel splice junctions, and provided as much as 1.7-fold deeper median read depth over those splice junctions. We further demonstrate that two-pass alignment works by increasing alignment of reads to splice junctions by short lengths, and that potential alignment errors are readily identifiable by simple classification. Taken together, two-pass alignment promises to advance quantification and discovery of novel splicing events. CONTACT arul@med.umich.edu, nesvi@med.umich.edu AVAILABILITY AND IMPLEMENTATION Two-pass alignment was implemented here as sequential alignment, genome indexing, and re-alignment steps with STAR. Full parameters are provided in Supplementary Table 2. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Brendan A Veeneman
  - Department of Computational Medicine and Bioinformatics, Michigan Center for Translational Pathology
- Sudhanshu Shukla
  - Michigan Center for Translational Pathology, Department of Pathology
- Arul M Chinnaiyan
  - Department of Computational Medicine and Bioinformatics, Michigan Center for Translational Pathology, Department of Pathology, Department of Urology and Howard Hughes Medical Institute, University of Michigan Medical School, Ann Arbor, Michigan 48109, USA
- Alexey I Nesvizhskii
  - Department of Computational Medicine and Bioinformatics, Michigan Center for Translational Pathology, Department of Pathology
|
584
|
Zheng Y, Qing T, Song Y, Zhu J, Yu Y, Shi W, Pusztai L, Shi L. Standardization efforts enabling next-generation sequencing and microarray based biomarkers for precision medicine. Biomark Med 2015; 9:1265-72. [PMID: 26502353 DOI: 10.2217/bmm.15.99] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023] Open
Abstract
Microarrays and next-generation sequencing technologies have been increasingly employed in biomedical research. However, before they can be reliably used as clinical biomarker tests, standardization and quality control measures need to be developed to ensure their analytical validity. This review summarizes community-wide efforts such as the MicroArray and Sequencing Quality Control (MAQC/SEQC) project which have identified factors influencing the performance of these technologies. Consequently, consensus-based standards and well-documented best practices have been developed to improve the quality of scientific research, and reference materials and reference datasets have been made available for evaluating the technical proficiency in future studies. These efforts have built the foundation on which the translational application of genomics based technologies can help realize precision medicine.
Affiliation(s)
- Yuanting Zheng
  - Center for Pharmacogenomics & Department of Clinical Pharmacy, School of Pharmacy, Fudan University, Shanghai, China
- Tao Qing
  - Center for Pharmacogenomics & Department of Clinical Pharmacy, School of Pharmacy, Fudan University, Shanghai, China
- Yunjie Song
  - Center for Pharmacogenomics & Department of Clinical Pharmacy, School of Pharmacy, Fudan University, Shanghai, China
- Jinhang Zhu
  - Center for Pharmacogenomics & Department of Clinical Pharmacy, School of Pharmacy, Fudan University, Shanghai, China
- Ying Yu
  - Collaborative Innovation Center for Genetics & Development, State Key Laboratory of Genetic Engineering & MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
- Weiwei Shi
  - Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Lajos Pusztai
  - Breast Medical Oncology, Yale Cancer Center, Yale School of Medicine, New Haven, CT, USA
- Leming Shi
  - Center for Pharmacogenomics & Department of Clinical Pharmacy, School of Pharmacy, Fudan University, Shanghai, China; Collaborative Innovation Center for Genetics & Development, State Key Laboratory of Genetic Engineering & MOE Key Laboratory of Contemporary Anthropology, School of Life Sciences, Fudan University, Shanghai, China
|
585
|
Aviner R, Shenoy A, Elroy-Stein O, Geiger T. Uncovering Hidden Layers of Cell Cycle Regulation through Integrative Multi-omic Analysis. PLoS Genet 2015; 11:e1005554. [PMID: 26439921 PMCID: PMC4595013 DOI: 10.1371/journal.pgen.1005554] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Received: 03/03/2015] [Accepted: 09/08/2015] [Indexed: 11/24/2022] Open
Abstract
Studying the complex relationship between transcription, translation and protein degradation is essential to our understanding of biological processes in health and disease. The limited correlations observed between mRNA and protein abundance suggest pervasive regulation of post-transcriptional steps and support the importance of profiling mRNA levels in parallel to protein synthesis and degradation rates. In this work, we applied an integrative multi-omic approach to study gene expression along the mammalian cell cycle through side-by-side analysis of mRNA, translation and protein levels. Our analysis sheds new light on the significant contribution of both protein synthesis and degradation to the variance in protein expression. Furthermore, we find that translation regulation plays an important role at S-phase, while progression through mitosis is predominantly controlled by changes in either mRNA levels or protein stability. Specific molecular functions are found to be co-regulated and share similar patterns of mRNA, translation and protein expression along the cell cycle. Notably, these include genes and entire pathways not previously implicated in cell cycle progression, demonstrating the potential of this approach to identify novel regulatory mechanisms beyond those revealed by traditional expression profiling. Through this three-level analysis, we characterize different mechanisms of gene expression, discover new cycling gene products and highlight the importance and utility of combining datasets generated using different techniques that monitor distinct steps of gene expression. How the genetic program of a cell unfolds to execute complex functions depends on a dynamic interplay between multiple steps that include transcription of DNA into mRNA, translation of mRNA into protein and post-translational degradation of mature proteins. 
Profiling of gene expression is traditionally based on measurements of steady-state mRNA levels, but recent studies have shown that mRNA and protein levels are highly discordant, suggesting that post-transcriptional mechanisms play a dominant role in modulating protein abundance. Here we combine measurements of mRNA, translation and protein across the mammalian cell cycle to uncover the hidden complexity of cell cycle regulation. Using this approach, we gain insights into the dynamics of protein synthesis and degradation and identify new genes and functions that cycle through cell division by periodic changes in translation or degradation rates. Integrative multi-omic analyses combining information on the transcriptome, translatome and proteome hold great promise for providing transformative biological insights in a variety of model systems.
Affiliation(s)
- Ranen Aviner
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
- Anjana Shenoy
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
- Orna Elroy-Stein
  - Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel
  - * E-mail: (OES); (TG)
- Tamar Geiger
  - Department of Human Molecular Genetics and Biochemistry, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
  - * E-mail: (OES); (TG)
|
586
|
Gant T. The importance of data quality to enhance the impact of omics sciences. Toxicol Lett 2015. [DOI: 10.1016/j.toxlet.2015.08.097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
587
|
Harreither E, Hackl M, Pichler J, Shridhar S, Auer N, Łabaj PP, Scheideler M, Karbiener M, Grillari J, Kreil DP, Borth N. Microarray profiling of preselected CHO host cell subclones identifies gene expression patterns associated with increased production capacity. Biotechnol J 2015; 10:1625-38. [PMID: 26315449 DOI: 10.1002/biot.201400857] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2014] [Revised: 06/22/2015] [Accepted: 08/21/2015] [Indexed: 01/02/2023]
Abstract
Over the last three decades, product yields from CHO cells have increased dramatically, yet specific productivity (qP) remains a limiting factor. In a previous study, using repeated cell-sorting, we have established different host cell subclones that show superior transient qP over their respective parental cell lines (CHO-K1, CHO-S). The transcriptome of the resulting six cell lines in different biological states (untransfected, mock transfected, plasmid transfected) was first explored by hierarchical clustering and indicated that gene activity associated with increased qP did not stem from a certain cellular state but seemed to be inherent for a high qP host line. We then performed a novel gene regression analysis identifying drivers for an increase in qP. Genes significantly implicated were first systematically tested for enrichment of GO terms using a Bayesian approach incorporating the hierarchical structure of the GO term tree. Results indicated that specific cellular components such as nucleus, ER, and Golgi are relevant for cellular productivity. This was complemented by targeted GSA that tested functionally homogeneous, manually curated subsets of KEGG pathways known to be involved in transcription, translation, and protein processing. Significantly implicated pathways included mRNA surveillance, proteasome, protein processing in the ER and SNARE interactions in vesicular transport.
Affiliation(s)
- Eva Harreither
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Matthias Hackl
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Johannes Pichler
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Smriti Shridhar
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Paweł P Łabaj
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Marcel Scheideler
- RNA Biology Group, Institute for Genomics and Bioinformatics, Graz University of Technology, Graz, Austria
| | - Michael Karbiener
- RNA Biology Group, Institute for Genomics and Bioinformatics, Graz University of Technology, Graz, Austria
| | - Johannes Grillari
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - David P Kreil
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria
| | - Nicole Borth
- Department of Biotechnology, University of Natural Resources and Life Sciences, Vienna, Austria; ACIB GmbH, Graz, Austria.
| |
|
588
|
RNA Enrichment Method for Quantitative Transcriptional Analysis of Pathogens In Vivo Applied to the Fungus Candida albicans. mBio 2015; 6:e00942-15. [PMID: 26396240 PMCID: PMC4600103 DOI: 10.1128/mbio.00942-15] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
In vivo transcriptional analyses of microbial pathogens are often hampered by low proportions of pathogen biomass in host organs, hindering coverage of the full pathogen transcriptome. We aimed to address the transcriptome profiles of Candida albicans, the most prevalent fungal pathogen in systemically infected immunocompromised patients, during systemic infection in different hosts. We developed a strategy for high-resolution quantitative analysis of the C. albicans transcriptome directly from early and late stages of systemic infection in two different host models, mouse and the insect Galleria mellonella. Our results show that transcriptome sequencing (RNA-seq) libraries were enriched for fungal transcripts up to 1,600-fold using biotinylated bait probes to capture C. albicans sequences. This enrichment biased the read counts of only ~3% of the genes, which can be identified and removed based on a priori criteria. This allowed an unprecedented resolution of the C. albicans transcriptome in vivo, with detection of over 86% of its genes. The transcriptional response of the fungus was surprisingly similar during infection of the two hosts and at the two time points, although some host- and time point-specific genes could be identified. Genes that were highly induced during infection were involved, for instance, in stress response, adhesion, iron acquisition, and biofilm formation. Of the in vivo-regulated genes, 10% are still of unknown function, and their future study will be of great interest. The fungal RNA enrichment procedure used here will enable better characterization of the C. albicans response in infected hosts and may be applied to other microbial pathogens. IMPORTANCE Understanding the mechanisms utilized by pathogens to infect and cause disease in their hosts is crucial for rational drug development. Transcriptomic studies may help investigations of these mechanisms by determining which genes are expressed specifically during infection. 
This task has been difficult so far, since the proportion of microbial biomass in infected tissues is often extremely low, thus limiting the depth of sequencing and comprehensive transcriptome analysis. Here, we adapted a technology to capture and enrich C. albicans RNA, which was next used for deep RNA sequencing directly from infected tissues from two different host organisms. The high-resolution transcriptome revealed a large number of genes that were so far unknown to participate in infection, which will likely constitute a focus of study in the future. More importantly, this method may be adapted to perform transcript profiling of any other microbes during host infection or colonization.
|
589
|
MetaRNA-Seq: An Interactive Tool to Browse and Annotate Metadata from RNA-Seq Studies. BIOMED RESEARCH INTERNATIONAL 2015; 2015:318064. [PMID: 26380270 PMCID: PMC4561952 DOI: 10.1155/2015/318064] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/26/2014] [Revised: 04/10/2015] [Accepted: 04/11/2015] [Indexed: 11/17/2022]
Abstract
The number of RNA-Seq studies has grown in recent years. The design of RNA-Seq studies varies from very simple (e.g., two-condition case-control) to very complicated (e.g., time series involving multiple samples at each time point with separate drug treatments). Most of these publicly available RNA-Seq studies are deposited in NCBI databases, but their metadata are scattered throughout four different databases: Sequence Read Archive (SRA), BioSample, BioProject, and Gene Expression Omnibus (GEO). Although the NCBI web interface is able to provide all of the metadata information, it often requires significant effort to retrieve study- or project-level information by traversing through multiple hyperlinks and going to another page. Moreover, project- and study-level metadata lack manual or automatic curation by categories, such as disease type, time series, case-control, or replicate type, which are vital to comprehending any RNA-Seq study. Here we describe "MetaRNA-Seq," a new tool for interactively browsing, searching, and annotating RNA-Seq metadata with the capability of semiautomatic curation at the study level.
|
590
|
Yu J, Cliften PF, Juehne TI, Sinnwell TM, Sawyer CS, Sharma M, Lutz A, Tycksen E, Johnson MR, Minton MR, Klotz ET, Schriefer AE, Yang W, Heinz ME, Crosby SD, Head RD. Multi-platform assessment of transcriptional profiling technologies utilizing a precise probe mapping methodology. BMC Genomics 2015; 16:710. [PMID: 26385698 PMCID: PMC4575490 DOI: 10.1186/s12864-015-1913-6] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2015] [Accepted: 09/09/2015] [Indexed: 12/05/2022] Open
Abstract
Background The arrival of RNA-seq as a high-throughput method competitive with the established microarray technologies has necessarily driven a need for comparative evaluation. To date, cross-platform comparisons of these technologies have been relatively few in number of platforms analyzed and were typically gene name annotation oriented. Here, we present a more extensive and yet precise assessment to elucidate differences and similarities in performance of numerous aspects including dynamic range, fidelity of raw signal and fold-change with sample titration, and concordance with qRT-PCR (TaqMan). To ensure that these results were not confounded by incompatible comparisons, we introduce the concept of probe mapping directed “transcript pattern”. A transcript pattern identifies probe(set)s across platforms that target a common set of transcripts for a specific gene. Thus, three levels of data were examined: entire data sets, data derived from a subset of 15,442 RefSeq genes common across platforms, and data derived from the transcript pattern defined subset of 7,034 RefSeq genes. Results In general, there were substantial core similarities between all 6 platforms evaluated; but, to varying degrees, the two RNA-seq protocols outperformed three of the four microarray platforms in most categories. Notably, a fourth microarray platform, Agilent with a modified protocol, was comparable, or marginally superior, to the RNA-seq protocols within these same assessments, especially with regard to fold-change evaluation. Furthermore, these 3 platforms (Agilent and two RNA-seq methods) demonstrated over 80% fold-change concordance with the gold standard qRT-PCR (TaqMan). Conclusions This study suggests that microarrays can perform on nearly equal footing with RNA-seq, in certain key features, specifically when the dynamic range is comparable. 
Furthermore, the concept of a transcript pattern has been introduced that may minimize potential confounding factors of multi-platform comparison and may be useful for similar evaluations.
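The fold-change concordance figure quoted in this abstract can be illustrated with a minimal sketch. The abstract does not define how concordance was computed, so the definition below (the fraction of genes whose log2 fold-change has the same sign on a platform and in the qRT-PCR reference) is an assumption, and the function name, the `min_abs_log2fc` threshold, and the input values are hypothetical.

```python
import numpy as np

def fold_change_concordance(log2fc_platform, log2fc_reference, min_abs_log2fc=0.0):
    """Fraction of genes with the same fold-change direction on both assays.

    Assumed definition, not taken from the paper: a gene is concordant when
    the sign of its log2 fold-change agrees between platform and reference.
    Genes whose reference |log2FC| is at or below the threshold are skipped.
    """
    platform = np.asarray(log2fc_platform, dtype=float)
    reference = np.asarray(log2fc_reference, dtype=float)
    if platform.shape != reference.shape:
        raise ValueError("inputs must have the same shape")
    keep = np.abs(reference) > min_abs_log2fc
    if not keep.any():
        raise ValueError("no genes pass the reference threshold")
    agree = np.sign(platform[keep]) == np.sign(reference[keep])
    return float(agree.mean())

# Hypothetical log2 fold-changes for five genes (illustrative values only).
platform_fc = [1.2, -0.5, 2.0, -1.0, 0.3]
qpcr_fc = [0.9, -0.7, 1.5, 0.4, 0.2]
concordance = fold_change_concordance(platform_fc, qpcr_fc)
```

A sign-based measure is only one possible choice; a study could equally require agreement in magnitude within a tolerance, which would give a stricter figure.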
Affiliation(s)
- Jinsheng Yu
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Paul F Cliften
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Twyla I Juehne
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Toni M Sinnwell
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Chris S Sawyer
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Mala Sharma
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Andrew Lutz
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Eric Tycksen
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Mark R Johnson
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Matthew R Minton
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Elliott T Klotz
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Andrew E Schriefer
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Wei Yang
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Michael E Heinz
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Seth D Crosby
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA
| | - Richard D Head
- Genome Technology Access Center, Department of Genetics, Washington University in Saint Louis School of Medicine, 660 S. Euclid Ave. Campus Box 8232, Saint Louis, MO, 63110, USA.
| |
|
591
|
Hagel JM, Morris JS, Lee EJ, Desgagné-Penix I, Bross CD, Chang L, Chen X, Farrow SC, Zhang Y, Soh J, Sensen CW, Facchini PJ. Transcriptome analysis of 20 taxonomically related benzylisoquinoline alkaloid-producing plants. BMC PLANT BIOLOGY 2015; 15:227. [PMID: 26384972 PMCID: PMC4575454 DOI: 10.1186/s12870-015-0596-0] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 04/03/2015] [Accepted: 08/15/2015] [Indexed: 05/18/2023]
Abstract
BACKGROUND Benzylisoquinoline alkaloids (BIAs) represent a diverse class of plant specialized metabolites sharing a common biosynthetic origin beginning with tyrosine. Many BIAs have potent pharmacological activities, and plants accumulating them boast long histories of use in traditional medicine and cultural practices. The decades-long focus on a select number of plant species as model systems has allowed near or full elucidation of major BIA pathways, including those of morphine, sanguinarine and berberine. However, this focus has created a dearth of knowledge surrounding non-model species, which also are known to accumulate a wide range of BIAs but whose biosynthesis is thus far entirely unexplored. Further, these non-model species represent a rich source of catalyst diversity valuable to plant biochemists and emerging synthetic biology efforts. RESULTS In order to access the genetic diversity of non-model plants accumulating BIAs, we selected 20 species representing 4 families within the Ranunculales. RNA extracted from each species was processed for analysis by both 1) Roche GS-FLX Titanium and 2) Illumina GA/HiSeq platforms, generating a total of 40 deep-sequencing transcriptome libraries. De novo assembly, annotation and subsequent full-length coding sequence (CDS) predictions indicated greater success for most species using the Illumina-based platform. Assembled data for each transcriptome were deposited into an established web-based BLAST portal (www.phytometasyn.ca) to allow public access. Homology-based mining of libraries using BIA-biosynthetic enzymes as queries yielded ~850 gene candidates potentially involved in alkaloid biosynthesis. Expression analysis of these candidates was performed using inter-library FPKM normalization methods. These expression data provide a basis for the rational selection of gene candidates, and suggest possible metabolic bottlenecks within BIA metabolism. 
Phylogenetic analysis was performed for each of 15 different enzyme/protein groupings, highlighting many novel genes with potential involvement in the formation of one or more alkaloid types, including morphinan, aporphine, and phthalideisoquinoline alkaloids. Transcriptome resources were used to design and execute a case study of candidate N-methyltransferases (NMTs) from Glaucium flavum, which revealed predicted and novel enzyme activities. CONCLUSIONS This study establishes an essential resource for the isolation and discovery of 1) functional homologues and 2) entirely novel catalysts within BIA metabolism. Functional analysis of G. flavum NMTs demonstrated the utility of this resource and underscored the importance of empirical determination of proposed enzymatic function. Publicly accessible, fully annotated, BLAST-accessible transcriptomes were not previously available for most species included in this report, despite the rich repertoire of bioactive alkaloids found in these plants and their importance to traditional medicine. The results presented herein provide essential sequence information and inform experimental design for the continued elucidation of BIA metabolism.
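The FPKM normalization mentioned in this abstract follows a standard formula: fragments mapped to a transcript, scaled per kilobase of transcript length and per million mapped fragments in the library. The sketch below shows only that generic calculation; the abstract does not describe the authors' inter-library procedure, and the function name, transcript identifiers, and counts are hypothetical.

```python
def fpkm(fragment_counts, lengths_bp):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = count * 1e9 / (transcript_length_bp * total_mapped_fragments)
    Both arguments are dicts keyed by transcript identifier.
    """
    total = sum(fragment_counts.values())
    if total == 0:
        raise ValueError("library has no mapped fragments")
    return {
        tid: count * 1e9 / (lengths_bp[tid] * total)
        for tid, count in fragment_counts.items()
    }

# Hypothetical library with two transcripts (illustrative values only).
counts = {"transcript_a": 500, "transcript_b": 1500}
lengths = {"transcript_a": 1000, "transcript_b": 3000}
values = fpkm(counts, lengths)
```

In this toy library the second transcript has three times the fragments but is also three times as long, so both transcripts receive the same FPKM, which is the length correction the measure is designed to provide.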
Affiliation(s)
- Jillian M Hagel
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Jeremy S Morris
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Eun-Jeong Lee
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Isabel Desgagné-Penix
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
- Current address: Département de Chimie, Biochimie et Physique, Université du Québec à Trois-Rivières, Trois-Rivières, QC, G9A 5H7, Canada.
| | - Crystal D Bross
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Limei Chang
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Xue Chen
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Scott C Farrow
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| | - Ye Zhang
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, T2N 4N1, Canada.
| | - Jung Soh
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, T2N 4N1, Canada.
| | - Christoph W Sensen
- Department of Biochemistry and Molecular Biology, University of Calgary, Calgary, AB, T2N 4N1, Canada.
- Current address: Institute of Molecular Biotechnology, Graz University of Technology, Graz, A-8010, Austria.
| | - Peter J Facchini
- Department of Biological Sciences, University of Calgary, Calgary, AB, T2N 1N4, Canada.
| |
|
592
|
Parsons J, Munro S, Pine PS, McDaniel J, Mehaffey M, Salit M. Using mixtures of biological samples as process controls for RNA-sequencing experiments. BMC Genomics 2015; 16:708. [PMID: 26383878 PMCID: PMC4574543 DOI: 10.1186/s12864-015-1912-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/20/2015] [Accepted: 09/09/2015] [Indexed: 12/02/2022] Open
Abstract
Background Genome-scale “-omics” measurements are challenging to benchmark due to the enormous variety of unique biological molecules involved. Mixtures of previously-characterized samples can be used to benchmark repeatability and reproducibility using component proportions as truth for the measurement. We describe and evaluate experiments characterizing the performance of RNA-sequencing (RNA-Seq) measurements, and discuss cases where mixtures can serve as effective process controls. Results We apply a linear model to total RNA mixture samples in RNA-seq experiments. This model provides a context for performance benchmarking. The parameters of the model fit to experimental results can be evaluated to assess bias and variability of the measurement of a mixture. A linear model describes the behavior of mixture expression measures and provides a context for performance benchmarking. Residuals from fitting the model to experimental data can be used as a metric for evaluating the effect that an individual step in an experimental process has on the linear response function and precision of the underlying measurement while identifying signals affected by interference from other sources. Effective benchmarking requires well-defined mixtures, which for RNA-Seq requires knowledge of the post-enrichment ‘target RNA’ content of the individual total RNA components. We demonstrate and evaluate an experimental method suitable for use in genome-scale process control and lay out a method utilizing spike-in controls to determine enriched RNA content of total RNA in samples. Conclusions Genome-scale process controls can be derived from mixtures. These controls relate prior knowledge of individual components to a complex mixture, allowing assessment of measurement performance. The target RNA fraction accounts for differential selection of RNA out of variable total RNA samples. 
Spike-in controls can be utilized to measure this relationship between target RNA content and input total RNA. Our mixture analysis method also enables estimation of the proportions of an unknown mixture, even when component-specific markers are not previously known, whenever pure components are measured alongside the mixture.
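The linear mixture model described in this abstract can be sketched as follows, under stated assumptions: each gene's expression in the mixture is modeled as a proportion-weighted sum of its expression in the pure components, and the proportions are recovered by ordinary least squares. This is an illustration of the general idea, not the authors' implementation. It omits the target RNA fraction correction that the abstract says is needed for real total RNA mixtures, and the function name and expression values are hypothetical.

```python
import numpy as np

def estimate_mixture_proportions(pure, mixture):
    """Estimate component proportions of a mixture by least squares.

    pure:    array of shape (genes, components), expression in pure samples
    mixture: array of shape (genes,), expression in the mixed sample
    Returns (proportions, residuals). Residuals come from the unconstrained
    fit and can serve as a per-gene measure of departure from linearity.
    """
    pure = np.asarray(pure, dtype=float)
    mixture = np.asarray(mixture, dtype=float)
    coef, *_ = np.linalg.lstsq(pure, mixture, rcond=None)
    residuals = mixture - pure @ coef
    # Simplification: clip negative estimates and rescale to sum to one.
    clipped = np.clip(coef, 0.0, None)
    proportions = clipped / clipped.sum()
    return proportions, residuals

# Hypothetical expression values for four genes in two pure components.
pure_expr = np.array([[100.0, 10.0],
                      [20.0, 200.0],
                      [50.0, 50.0],
                      [5.0, 80.0]])
true_props = np.array([0.25, 0.75])
mixed_expr = pure_expr @ true_props  # noise-free mixture for illustration
props, resid = estimate_mixture_proportions(pure_expr, mixed_expr)
```

With noise-free input the fit recovers the proportions used to build the mixture; with real data the residuals would be the quantity of interest for process control.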
Affiliation(s)
- Jerod Parsons
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA; Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA.
| | - Sarah Munro
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA; Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA.
| | - P Scott Pine
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA; Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA.
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA.
| | - Michele Mehaffey
- Leidos Biomedical Research Inc., P.O. Box B Bldg 428, Frederick, MD, 21702, USA.
| | - Marc Salit
- Material Measurement Laboratory, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD, 20899, USA; Department of Bioengineering, Stanford University, 443 Via Ortega, Stanford, CA, 94305, USA.
| |
|
593
|
Eicher JD, Wakabayashi Y, Vitseva O, Esa N, Yang Y, Zhu J, Freedman JE, McManus DD, Johnson AD. Characterization of the platelet transcriptome by RNA sequencing in patients with acute myocardial infarction. Platelets 2015; 27:230-9. [PMID: 26367242 DOI: 10.3109/09537104.2015.1083543] [Citation(s) in RCA: 95] [Impact Index Per Article: 10.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/09/2023]
Abstract
Transcripts in platelets are largely produced in precursor megakaryocytes but remain physiologically active as platelets translate RNAs and regulate protein/RNA levels. Recent studies using transcriptome sequencing (RNA-seq) characterized the platelet transcriptome in a limited number of non-diseased individuals. Here, we expand upon these RNA-seq studies by completing RNA-seq in platelets from 32 patients with acute myocardial infarction (MI). Our goals were to characterize the platelet transcriptome using a population of patients with acute MI and relate gene expression to platelet aggregation measures and ST-segment elevation MI (STEMI) (n = 16) vs. non-STEMI (NSTEMI) (n = 16) subtypes. Similar to other studies, we detected 9565 expressed transcripts, including several known platelet-enriched markers (e.g. PPBP, OST4). Our RNA-seq data strongly correlated with independently ascertained platelet expression data and showed enrichment for platelet-related pathways (e.g. wound response, hemostasis, and platelet activation), as well as actin-related and post-transcriptional processes. Several transcripts displayed suggestively higher (FBXL4, ECHDC3, KCNE1, TAOK2, AURKB, ERG, and FKBP5) and lower (MIAT, PVRL3, and PZP) expression in STEMI platelets compared to NSTEMI. We also identified transcripts correlated with platelet aggregation to TRAP (ATP6V1G2, SLC2A3), collagen (CEACAM1, ITGA2), and ADP (PDGFB, PDGFC, ST3GAL6). Our study adds to current platelet gene expression resources by providing transcriptome-wide analyses in platelets isolated from patients with acute MI. In concert with prior studies, we identify various genes for further study with regard to platelet function and acute MI. Future platelet RNA-seq studies examining more diverse sets of healthy and diseased samples will add to our understanding of platelet thrombotic and non-thrombotic functions.
Affiliation(s)
- John D Eicher
- The Framingham Heart Study, Framingham, MA, USA; National Heart, Lung, and Blood Institute, Division of Intramural Research, Population Sciences Branch, Bethesda, MD, USA
| | - Yoshiyuki Wakabayashi
- National Heart, Lung, and Blood Institute, Division of Intramural Research, DNA Sequencing and Genomics Core Laboratory, Bethesda, MD, USA
| | - Olga Vitseva
- Department of Medicine, Division of Cardiovascular Medicine, University of Massachusetts Medical School, Worcester, MA, USA
| | - Nada Esa
- Memorial Heart and Vascular Center, University of Massachusetts, Worcester, MA, USA
| | - Yanqin Yang
- National Heart, Lung, and Blood Institute, Division of Intramural Research, DNA Sequencing and Genomics Core Laboratory, Bethesda, MD, USA
| | - Jun Zhu
- National Heart, Lung, and Blood Institute, Division of Intramural Research, DNA Sequencing and Genomics Core Laboratory, Bethesda, MD, USA
| | - Jane E Freedman
- Memorial Heart and Vascular Center, University of Massachusetts, Worcester, MA, USA
| | - David D McManus
- Memorial Heart and Vascular Center, University of Massachusetts, Worcester, MA, USA
| | - Andrew D Johnson
- The Framingham Heart Study, Framingham, MA, USA; National Heart, Lung, and Blood Institute, Division of Intramural Research, Population Sciences Branch, Bethesda, MD, USA
| |
|
594
|
Webster AF, Zumbo P, Fostel J, Gandara J, Hester SD, Recio L, Williams A, Wood CE, Yauk CL, Mason CE. Mining the Archives: A Cross-Platform Analysis of Gene Expression Profiles in Archival Formalin-Fixed Paraffin-Embedded Tissues. Toxicol Sci 2015; 148:460-72. [PMID: 26361796 PMCID: PMC4659533 DOI: 10.1093/toxsci/kfv195] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/21/2022] Open
Abstract
Formalin-fixed paraffin-embedded (FFPE) tissue samples represent a potentially invaluable resource for transcriptomic research. However, use of FFPE samples in genomic studies has been limited by technical challenges resulting from nucleic acid degradation. Here we evaluated gene expression profiles derived from fresh-frozen (FRO) and FFPE mouse liver tissues preserved in formalin for different amounts of time using 2 DNA microarray protocols and 2 whole-transcriptome sequencing (RNA-seq) library preparation methodologies. The ribo-depletion protocol outperformed the other methods by having the highest correlations of differentially expressed genes (DEGs), and best overlap of pathways, between FRO and FFPE groups. The effect of sample time in formalin (18 h or 3 weeks) on gene expression profiles indicated that test article treatment, not preservation method, was the main driver of gene expression profiles. Meta- and pathway analyses indicated that biological responses were generally consistent for 18 h and 3 week FFPE samples compared with FRO samples. However, clear erosion of signal intensity with time in formalin was evident, and DEG numbers differed by platform and preservation method. Lastly, we investigated the effect of time in paraffin on genomic profiles. Ribo-depletion RNA-seq analysis of 8-, 19-, and 26-year-old control blocks resulted in comparable quality metrics, including expected distributions of mapped reads to exonic, untranslated region, intronic, and ribosomal fractions of the transcriptome. Overall, our results indicate that FFPE samples are appropriate for use in genomic studies in which frozen samples are not available, and that ribo-depletion RNA-seq is the preferred method for this type of analysis in archival and long-aged FFPE samples.
Affiliation(s)
- A Francina Webster
- Environmental Health Science and Research Bureau, Health Canada, Ottawa K1A 0K9, Canada; Department of Biology, Carleton University, 1125 Colonel By Drive, Ottawa K1S 5B6, Canada
| | - Paul Zumbo
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065
| | - Jennifer Fostel
- National Institute of Environmental Health Sciences, Research Triangle Park, North Carolina 27709
| | - Jorge Gandara
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065
| | - Susan D Hester
- Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, North Carolina 27709
| | - Leslie Recio
- ILS, Inc., PO Box 13501, Research Triangle Park, North Carolina 27709
| | - Andrew Williams
- Environmental Health Science and Research Bureau, Health Canada, Ottawa K1A 0K9, Canada
| | - Charles E Wood
- Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, North Carolina 27709
| | - Carole L Yauk
- Environmental Health Science and Research Bureau, Health Canada, Ottawa K1A 0K9, Canada
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York 10065; The Feil Family Brain and Mind Research Institute (BMRI), 413 East 69th Street, New York, New York 10021; and The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, 1305 York Avenue, New York, New York 10065
| |
|
595
|
Kelly H, Downing T, Tuite NL, Smith TJ, Kerin MJ, Dwyer RM, Clancy E, Barry T, Reddington K. Cross Platform Standardisation of an Experimental Pipeline for Use in the Identification of Dysregulated Human Circulating MiRNAs. PLoS One 2015; 10:e0137389. [PMID: 26355751 PMCID: PMC4565682 DOI: 10.1371/journal.pone.0137389] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/10/2015] [Accepted: 08/17/2015] [Indexed: 12/21/2022] Open
Abstract
Introduction: Micro RNAs (miRNAs) are a class of highly conserved small non-coding RNAs that play an important part in the post-transcriptional regulation of gene expression. A substantial number of miRNAs have been proposed as biomarkers for diseases. While reverse transcriptase Real-time PCR (RT-qPCR) is considered the gold standard for the evaluation and validation of miRNA biomarkers, small RNA sequencing is now routinely being adopted for the identification of dysregulated miRNAs. However, in many cases where putative miRNA biomarkers are identified using small RNA sequencing, they are not substantiated when RT-qPCR is used for validation. To date, there is a lack of consensus regarding optimal methodologies for miRNA detection, quantification and standardisation when different platform technologies are used. Materials and Methods: In this study we present an experimental pipeline that takes into consideration sample collection, processing, enrichment, and the subsequent comparative analysis of circulating small ribonucleic acids using small RNA sequencing and RT-qPCR. Results, Discussion, Conclusions: Initially, a panel of miRNAs dysregulated in circulating blood from breast cancer patients compared to healthy women were identified using small RNA sequencing. MiR-320a was identified as the most dysregulated miRNA between the two female cohorts. Total RNA and enriched small RNA populations (<30 bp) isolated from peripheral blood from the same female cohort samples were then tested using a miR-320a RT-qPCR assay. When total RNA was analysed with this miR-320a RT-qPCR assay, a 2.3-fold decrease in expression levels was observed between blood samples from healthy controls and breast cancer patients. However, upon enrichment for the small RNA population and subsequent analysis of miR-320a using RT-qPCR, its dysregulation in breast cancer patients was more pronounced with an 8.89-fold decrease in miR-320a expression. We propose that the experimental pipeline outlined could serve as a robust approach for the identification and validation of small RNA biomarkers for disease.
Affiliation(s)
- Helena Kelly
- Nucleic Acid Diagnostics Research Laboratory (NADRL), Microbiology, School of Natural Sciences, National University of Ireland, Galway, Ireland
| | - Tim Downing
- School of Biotechnology, Dublin City University, Dublin, Ireland
| | - Nina L. Tuite
- Nucleic Acid Diagnostics Research Laboratory (NADRL), Microbiology, School of Natural Sciences, National University of Ireland, Galway, Ireland
| | - Terry J. Smith
- Molecular Diagnostics Research Group (MDRG), School of Natural Sciences, National University of Ireland, Galway, Ireland
- Biomedical Diagnostics Institute (BDI) Programme, National University of Ireland, Galway, Ireland
| | - Michael J. Kerin
- Discipline of Surgery, School of Medicine, National University of Ireland Galway, Galway, Ireland
| | - Róisín M. Dwyer
- Discipline of Surgery, School of Medicine, National University of Ireland Galway, Galway, Ireland
| | - Eoin Clancy
- Molecular Diagnostics Research Group (MDRG), School of Natural Sciences, National University of Ireland, Galway, Ireland
- Biomedical Diagnostics Institute (BDI) Programme, National University of Ireland, Galway, Ireland
| | - Thomas Barry
- Nucleic Acid Diagnostics Research Laboratory (NADRL), Microbiology, School of Natural Sciences, National University of Ireland, Galway, Ireland
| | - Kate Reddington
- Nucleic Acid Diagnostics Research Laboratory (NADRL), Microbiology, School of Natural Sciences, National University of Ireland, Galway, Ireland
|
596
|
Yang C, Wu PY, Tong L, Phan JH, Wang MD. The impact of RNA-seq aligners on gene expression estimation. ACM-BCB ... ... : THE ... ACM CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND BIOMEDICINE 2015; 2015:462-471. [PMID: 27583310 DOI: 10.1145/2808719.2808767] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Indexed: 02/01/2023]
Abstract
While numerous RNA-seq data analysis pipelines are available, research has shown that the choice of pipeline influences the results of differentially expressed gene detection and gene expression estimation. Gene expression estimation is a key step in RNA-seq data analysis, since the accuracy of gene expression estimates profoundly affects the subsequent analysis. Generally, gene expression estimation involves sequence alignment and quantification, and accurate gene expression estimation requires accurate alignment. However, the impact of aligners on gene expression estimation remains unclear. We address this need by constructing nine pipelines consisting of nine spliced aligners and one quantifier. We then use simulated data to investigate the impact of aligners on gene expression estimation. To evaluate alignment, we introduce three alignment performance metrics, (1) the percentage of reads aligned, (2) the percentage of reads aligned with zero mismatch (ZeroMismatchPercentage), and (3) the percentage of reads aligned with at most one mismatch (ZeroOneMismatchPercentage). We then evaluate the impact of alignment performance on gene expression estimation using three metrics, (1) gene detection accuracy, (2) the number of genes falsely quantified (FalseExpNum), and (3) the number of genes with falsely estimated fold changes (FalseFcNum). We found that among various pipelines, FalseExpNum and FalseFcNum are correlated. Moreover, FalseExpNum is linearly correlated with the percentage of reads aligned and ZeroMismatchPercentage, and FalseFcNum is linearly correlated with ZeroMismatchPercentage. Because of this correlation, the percentage of reads aligned and ZeroMismatchPercentage may be used to assess the performance of gene expression estimation for all RNA-seq datasets.
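The three alignment performance metrics named in this abstract can be illustrated with a minimal Python sketch. The `ReadAlignment` record and the `alignment_metrics` function are hypothetical names introduced here, not part of the cited work, and the sketch assumes that every percentage uses the total number of input reads as its denominator, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass
class ReadAlignment:
    """Hypothetical per-read record: mismatches is None when the read is unaligned."""
    read_id: str
    mismatches: Optional[int]


def alignment_metrics(reads: Sequence[ReadAlignment]) -> Dict[str, float]:
    """Compute the three alignment percentages over all input reads.

    Assumption: the denominator is the total number of reads, aligned or not.
    """
    total = len(reads)
    if total == 0:
        raise ValueError("at least one read is required")
    aligned = [r for r in reads if r.mismatches is not None]
    zero = sum(1 for r in aligned if r.mismatches == 0)
    zero_or_one = sum(1 for r in aligned if r.mismatches <= 1)
    return {
        "PercentAligned": 100.0 * len(aligned) / total,
        "ZeroMismatchPercentage": 100.0 * zero / total,
        "ZeroOneMismatchPercentage": 100.0 * zero_or_one / total,
    }


# Toy input: three aligned reads (0, 1 and 3 mismatches) and one unaligned read.
example = [
    ReadAlignment("r1", 0),
    ReadAlignment("r2", 1),
    ReadAlignment("r3", 3),
    ReadAlignment("r4", None),
]
metrics = alignment_metrics(example)
```

If the percentages were instead meant relative to aligned reads only, the denominator in the last two entries would change to `len(aligned)`.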
Affiliation(s)
- Cheng Yang
- Department of Biomedical Engineering, Georgia Institute of Technology, Emory University, and Peking University, Atlanta, GA 30332, USA
| | - Po-Yen Wu
- School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA, 30332, USA
| | - Li Tong
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - John H Phan
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
| | - May D Wang
- Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA 30332, USA
|
597
|
Jia Z, Zhang X, Guan N, Bo X, Barnes MR, Luo Z. Gene Ranking of RNA-Seq Data via Discriminant Non-Negative Matrix Factorization. PLoS One 2015; 10:e0137782. [PMID: 26348772 PMCID: PMC4562600 DOI: 10.1371/journal.pone.0137782] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 03/03/2015] [Accepted: 07/28/2015] [Indexed: 02/06/2023] Open
Abstract
RNA-sequencing is rapidly becoming the method of choice for studying the full complexity of transcriptomes; however, with increasing dimensionality, accurate gene ranking is becoming increasingly challenging. This paper proposes an accurate and sensitive gene ranking method that implements discriminant non-negative matrix factorization (DNMF) for RNA-seq data. To the best of our knowledge, this is the first work to explore the utility of DNMF for gene ranking. When incorporating Fisher’s discriminant criteria and setting the reduced dimension as two, DNMF learns two factors to approximate the original gene expression data, abstracting the up-regulated or down-regulated metagene by using the sample label information. The first factor denotes all the genes’ weights of two metagenes as the additive combination of all genes, while the second learned factor represents the expression values of two metagenes. In the gene ranking stage, all the genes are ranked in descending order according to the differential values of the metagene weights. Leveraging the nature of NMF and Fisher’s criterion, DNMF can robustly boost the gene ranking performance. The Area Under the Curve analysis of differential expression analysis on two benchmarking tests of four RNA-seq data sets with similar phenotypes showed that our proposed DNMF-based gene ranking method outperforms other widely used methods. Moreover, the Gene Set Enrichment Analysis also showed that DNMF outperforms the others. DNMF is also computationally efficient, substantially outperforming all other benchmarked methods. Consequently, we suggest DNMF is an effective method for the analysis of differential gene expression and gene ranking for RNA-seq data.
Affiliation(s)
- Zhilong Jia
- Department of Chemistry and Biology, College of Science, National University of Defense Technology, Changsha, Hunan, P.R. China
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
| | - Xiang Zhang
- Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China
| | - Naiyang Guan
- Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China
| | - Xiaochen Bo
- Beijing Institute of Radiation Medicine, Beijing, P.R. China
| | - Michael R. Barnes
- William Harvey Research Institute, Barts and The London School of Medicine and Dentistry, Queen Mary University of London, London, United Kingdom
- * E-mail: (MRB); (ZL)
| | - Zhigang Luo
- Science and Technology on Parallel and Distributed Processing Laboratory, College of Computer, National University of Defense Technology, Changsha, Hunan, P.R. China
- * E-mail: (MRB); (ZL)
|
598
|
Lee S, Seo CH, Alver BH, Lee S, Park PJ. EMSAR: estimation of transcript abundance from RNA-seq data by mappability-based segmentation and reclustering. BMC Bioinformatics 2015; 16:278. [PMID: 26335049 PMCID: PMC4559005 DOI: 10.1186/s12859-015-0704-z] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 07/02/2015] [Accepted: 08/13/2015] [Indexed: 11/10/2022] Open
Abstract
Background: RNA-seq has been widely used for genome-wide expression profiling. RNA-seq data typically consists of tens of millions of short sequenced reads from different transcripts. However, due to sequence similarity among genes and among isoforms, the source of a given read is often ambiguous. Existing approaches for estimating expression levels from RNA-seq reads tend to compromise between accuracy and computational cost. Results: We introduce a new approach for quantifying transcript abundance from RNA-seq data. EMSAR (Estimation by Mappability-based Segmentation And Reclustering) groups reads according to the set of transcripts to which they are mapped and finds maximum likelihood estimates using a joint Poisson model for each optimal set of segments of transcripts. The method uses nearly all mapped reads, including those mapped to multiple genes. With an efficient transcriptome indexing based on modified suffix arrays, EMSAR minimizes the use of CPU time and memory while achieving accuracy comparable to the best existing methods. Conclusions: EMSAR is a method for quantifying transcripts from RNA-seq data with high accuracy and low computational cost. EMSAR is available at https://github.com/parklab/emsar. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-015-0704-z) contains supplementary material, which is available to authorized users.
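The idea of grouping reads by the set of transcripts they map to, then estimating abundances by maximum likelihood, can be sketched in a few lines of Python. This is not EMSAR's algorithm: it swaps in a generic expectation-maximization loop over read groups, ignores transcript length and mappability, and does not use the joint Poisson model or segment construction described in the abstract. The function names `group_reads` and `em_abundance` are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set


def group_reads(read_to_transcripts: Iterable[Set[str]]) -> Counter:
    """Count reads per set of compatible transcripts (unmapped reads are skipped)."""
    return Counter(frozenset(ts) for ts in read_to_transcripts if ts)


def em_abundance(class_counts: Dict[FrozenSet[str], int], n_iter: int = 200) -> Dict[str, float]:
    """Generic EM for relative abundances; an illustrative stand-in, not EMSAR itself.

    Assumption: all transcripts have equal effective length.
    """
    transcripts = sorted({t for cls in class_counts for t in cls})
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        expected = dict.fromkeys(transcripts, 0.0)
        # E-step: split each group's reads in proportion to current abundances.
        for cls, count in class_counts.items():
            denom = sum(theta[t] for t in cls)
            for t in cls:
                expected[t] += count * theta[t] / denom
        # M-step: renormalize expected read counts into proportions.
        total = sum(expected.values())
        theta = {t: expected[t] / total for t in transcripts}
    return theta


# Toy data: 6 reads unique to A, 2 unique to B, 4 compatible with both.
reads = [{"A"}] * 6 + [{"B"}] * 2 + [{"A", "B"}] * 4
groups = group_reads(reads)
theta = em_abundance(groups)
```

Working on group counts rather than individual reads is what keeps the per-iteration cost proportional to the number of distinct transcript sets instead of the number of reads.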
Affiliation(s)
- Soohyun Lee
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Chae Hwa Seo
- Emerging Technology Center, DNA link, Seoul, South Korea
| | - Burak Han Alver
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Sanghyuk Lee
- Emerging Technology Center, DNA link, Seoul, South Korea; Ewha Womans University, Seoul, Korea
| | - Peter J Park
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA; Informatics Program, Boston Children's Hospital and Division of Genetics, Brigham and Women's Hospital, Boston, MA, USA
|
599
|
Uren Webster TM, Shears JA, Moore K, Santos EM. Identification of conserved hepatic transcriptomic responses to 17β-estradiol using high-throughput sequencing in brown trout. Physiol Genomics 2015; 47:420-31. [PMID: 26082144 PMCID: PMC4556936 DOI: 10.1152/physiolgenomics.00123.2014] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/02/2014] [Accepted: 06/08/2015] [Indexed: 01/11/2023] Open
Abstract
Estrogenic chemicals are major contaminants of surface waters and can threaten the sustainability of natural fish populations. Characterization of the global molecular mechanisms of toxicity of environmental contaminants has been conducted primarily in model species rather than species with limited existing transcriptomic or genomic sequence information. We aimed to investigate the global mechanisms of toxicity of an endocrine disrupting chemical of environmental concern [17β-estradiol (E2)] using high-throughput RNA sequencing (RNA-Seq) in an environmentally relevant species, brown trout (Salmo trutta). We exposed mature males to measured concentrations of 1.94, 18.06, and 34.38 ng E2/L for 4 days and sequenced three individual liver samples per treatment using an Illumina HiSeq 2500 platform. Exposure to 34.4 ng E2/L resulted in 2,113 differentially regulated transcripts (FDR < 0.05). Functional analysis revealed upregulation of processes associated with vitellogenesis, including lipid metabolism, cellular proliferation, and ribosome biogenesis, together with a downregulation of carbohydrate metabolism. Using real-time quantitative PCR, we validated the expression of eight target genes and identified significant differences in the regulation of several known estrogen-responsive transcripts in fish exposed to the lower treatment concentrations (including esr1 and zp2.5). We successfully used RNA-Seq to identify highly conserved responses to estrogen and also identified some estrogen-responsive transcripts that have been less well characterized, including nots and tgm2l. These results demonstrate the potential application of RNA-Seq as a valuable tool for assessing mechanistic effects of pollutants in ecologically relevant species for which little genomic information is available.
Affiliation(s)
- Tamsyn M Uren Webster
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter, United Kingdom
| | - Janice A Shears
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter, United Kingdom
| | - Karen Moore
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter, United Kingdom
| | - Eduarda M Santos
- Biosciences, College of Life & Environmental Sciences, University of Exeter, Exeter, United Kingdom
|
600
|
Hensman J, Papastamoulis P, Glaus P, Honkela A, Rattray M. Fast and accurate approximate inference of transcript expression from RNA-seq data. Bioinformatics 2015; 31:3881-9. [PMID: 26315907 PMCID: PMC4673974 DOI: 10.1093/bioinformatics/btv483] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.9] [Received: 01/23/2015] [Accepted: 08/07/2015] [Indexed: 11/25/2022] Open
Abstract
Motivation: Assigning RNA-seq reads to their transcript of origin is a fundamental task in transcript expression estimation. Where ambiguities in assignments exist due to transcripts sharing sequence, e.g. alternative isoforms or alleles, the problem can be solved through probabilistic inference. Bayesian methods have been shown to provide accurate transcript abundance estimates compared with competing methods. However, exact Bayesian inference is intractable and approximate methods such as Markov chain Monte Carlo and Variational Bayes (VB) are typically used. While providing a high degree of accuracy and modelling flexibility, standard implementations can be prohibitively slow for large datasets and complex transcriptome annotations. Results: We propose a novel approximate inference scheme based on VB and apply it to an existing model of transcript expression inference from RNA-seq data. Recent advances in VB algorithmics are used to improve the convergence of the algorithm beyond the standard Variational Bayes Expectation Maximization algorithm. We apply our algorithm to simulated and biological datasets, demonstrating a significant increase in speed with only very small loss in accuracy of expression level estimation. We carry out a comparative study against seven popular alternative methods and demonstrate that our new algorithm provides excellent accuracy and inter-replicate consistency while remaining competitive in computation time. Availability and implementation: The methods were implemented in R and C++, and are available as part of the BitSeq project at github.com/BitSeq. The method is also available through the BitSeq Bioconductor package. The source code to reproduce all simulation results can be accessed via github.com/BitSeq/BitSeqVB_benchmarking. Contact: james.hensman@sheffield.ac.uk or panagiotis.papastamoulis@manchester.ac.uk or Magnus.Rattray@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- James Hensman
- Sheffield Institute for Translational Neuroscience (SITraN), Sheffield, UK
| | | | - Peter Glaus
- School of Computer Science, The University of Manchester, Manchester, UK and
| | - Antti Honkela
- Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Helsinki, Finland
|