Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Borozan I, Watt SN, Ferretti V. Evaluation of alignment algorithms for discovery and identification of pathogens using RNA-Seq. PLoS One 2013;8:e76935. [PMID: 24204709 PMCID: PMC3813700 DOI: 10.1371/journal.pone.0076935] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 05/28/2013] [Accepted: 09/04/2013] [Indexed: 01/02/2023] Open

Cited by Other Article(s)

Li Y, Yang R. PxBLAT: an efficient python binding library for BLAT. BMC Bioinformatics 2024;25:219. [PMID: 38898394 DOI: 10.1186/s12859-024-05844-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2024] [Accepted: 06/13/2024] [Indexed: 06/21/2024] Open

Amano N, Narumi S, Aizu K, Miyazawa M, Okamura K, Ohashi H, Katsumata N, Ishii T, Hasegawa T. Single-Exon Deletions of ZNRF3 Exon 2 Cause Congenital Adrenal Hypoplasia. J Clin Endocrinol Metab 2024;109:641-648. [PMID: 37878959 DOI: 10.1210/clinem/dgad627] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2023] [Revised: 10/17/2023] [Accepted: 10/18/2023] [Indexed: 10/27/2023]

Abstract

CONTEXT

Primary adrenal insufficiency (PAI) is a life-threatening condition characterized by the inability of the adrenal cortex to produce sufficient steroid hormones. E3 ubiquitin protein ligase zinc and ring finger 3 (ZNRF3) is a negative regulator of Wnt/β-catenin signaling. R-spondin 1 (RSPO1) enhances Wnt/β-catenin signaling via binding and removal of ZNRF3 from the cell surface.

OBJECTIVE

This work aimed to explore a novel genetic form of PAI.

METHODS

We analyzed 9 patients with childhood-onset PAI of biochemically and genetically unknown etiology using array comparative genomic hybridization. To examine the functionality of the identified single-exon deletions of ZNRF3 exon 2, we performed three-dimensional (3D) structure modeling and in vitro functional studies.

RESULTS

We identified various-sized single-exon deletions encompassing ZNRF3 exon 2 in 3 patients who showed neonatal-onset adrenal hypoplasia with glucocorticoid and mineralocorticoid deficiencies. Reverse-transcriptase polymerase chain reaction (RT-PCR) analysis showed that the 3 distinct single-exon deletions were commonly transcribed into a 126-nucleotide deleted mRNA and translated into a 42-amino acid deleted protein (ΔEx2-ZNRF3). Based on 3D structure modeling, we predicted that interaction between ZNRF3 and RSPO1 would be disturbed in ΔEx2-ZNRF3, suggesting loss of RSPO1-dependent activation of Wnt/β-catenin signaling. Cell-based functional assays with the TCF-LEF reporter showed that RSPO1-dependent activation of Wnt/β-catenin signaling was attenuated in cells expressing ΔEx2-ZNRF3 as compared with those expressing wild-type ZNRF3.

CONCLUSION

We provided genetic evidence linking deletions encompassing ZNRF3 exon 2 and congenital adrenal hypoplasia, which might be related to constitutive inactivation of Wnt/β-catenin signaling by ΔEx2-ZNRF3.

Mazur FG, Morinisi LM, Martins JO, Guerra PPB, Freire CCM. Exploring Virome Diversity in Public Data in South America as an Approach for Detecting Viral Sources From Potentially Emerging Viruses. Front Genet 2022;12:722857. [PMID: 35126446 PMCID: PMC8814814 DOI: 10.3389/fgene.2021.722857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 06/09/2021] [Accepted: 11/29/2021] [Indexed: 11/13/2022] Open

Abstract

The South American continent presents a great diversity of biomes, whose ecosystems are constantly threatened by the expansion of human activity. The emergence and re-emergence of viral populations with impact on the human population and ecosystem have shown increases in the last decades. In view of the growing accumulation of genomic data, we explore the potential of South American-related public databases to detect signals that contribute to virosphere research. Therefore, our study aims to investigate public databases with emphasis on the surveillance of viruses with medical and ecological relevance. Herein, we profiled 120 "sequence read archives" metagenomes from 19 independent projects from the last decade. In a coarse view, our analyses identified only 0.38% of the total number of sequences from viruses, showing a higher proportion of RNA viruses. The metagenomes with the most important viral sequences in the analyzed environmental models were 1) aquatic samples from the Amazon River, 2) sewage from Brasilia, and 3) soil from the state of São Paulo, while the models of animal transmission were detected in mosquitoes from Rio de Janeiro and bats from Amazonia. Also, the classification of viral signals into operational taxonomic units (OTUs) (family) allowed us to infer from metadata a probable host range in the virome detected in each sample analyzed. Further, several motifs and viral sequences are related to specific viruses with emergence potential from Togaviridae, Arenaviridae, and Flaviviridae families. In this context, the exploration of public databases allowed us to evaluate the scope and informative capacity of sequences from third-party public databases and to detect signals related to viruses of clinical or environmental importance, which allowed us to infer traits associated with probable transmission routes or signals of ecological disequilibrium. The evaluation of our results showed that in most cases the size and type of the reference database, the percentage of guanine-cytosine (GC), and the length of the query sequences greatly influence the taxonomic classification of the sequences. In sum, our findings describe how the exploration of public genomic data can be exploited as an approach for epidemiological surveillance and the understanding of the virosphere.

Fabiańska I, Borutzki S, Richter B, Tran HQ, Neubert A, Mayer D. LABRADOR-A Computational Workflow for Virus Detection in High-Throughput Sequencing Data. Viruses 2021;13:v13122541. [PMID: 34960810 PMCID: PMC8704571 DOI: 10.3390/v13122541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/19/2021] [Revised: 12/13/2021] [Accepted: 12/16/2021] [Indexed: 11/16/2022] Open

Identifying proximal RNA interactions from cDNA-encoded crosslinks with ShapeJumper. PLoS Comput Biol 2021;17:e1009632. [PMID: 34905538 PMCID: PMC8670686 DOI: 10.1371/journal.pcbi.1009632] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/22/2021] [Accepted: 11/11/2021] [Indexed: 01/07/2023] Open

Truong Nguyen PT, Plyusnin I, Sironen T, Vapalahti O, Kant R, Smura T. HAVoC, a bioinformatic pipeline for reference-based consensus assembly and lineage assignment for SARS-CoV-2 sequences. BMC Bioinformatics 2021;22:373. [PMID: 34273961 PMCID: PMC8285700 DOI: 10.1186/s12859-021-04294-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 02/12/2021] [Accepted: 07/08/2021] [Indexed: 01/21/2023] Open

Zeng X, Zhao L, Shen C, Zhou Y, Li G, Sung WK. HIVID2: an accurate tool to detect virus integrations in the host genome. Bioinformatics 2021;37:1821-1827. [PMID: 33453108 DOI: 10.1093/bioinformatics/btab031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/20/2020] [Revised: 12/27/2020] [Accepted: 01/12/2021] [Indexed: 12/11/2022] Open

Abstract

MOTIVATION

Virus integration in the host genome is frequently reported to be closely associated with many human diseases, and the detection of virus integration is a critically challenging task. However, most existing tools show limited specificity and sensitivity. Therefore, the objective of this study is to develop a method for accurate detection of virus integration into host genomes.

RESULTS

Herein, we report a novel method termed HIVID2 that is a significant upgrade of HIVID. HIVID2 performs a paired-end combination (PE-combination) for potentially integrated reads. The resulting sequences are then remapped onto the reference genomes, and both split and discordant chimeric reads are used to identify accurate integration breakpoints with high confidence. HIVID2 represents a great improvement in specificity and sensitivity, and predicts breakpoints closer to the real integrations, compared with existing methods. The advantage of our method was demonstrated using both simulated and real data sets. HIVID2 uncovered novel integration breakpoints in well-known cervical cancer-related genes, including FHIT and LRP1B, which was verified using protein expression data. In addition, HIVID2 allows the user to decide whether to automatically perform advanced analysis using the identified virus integrations. By analyzing the simulated data and real data tests, we demonstrated that HIVID2 is not only more accurate than HIVID but also better than other existing programs with respect to both sensitivity and specificity. We believe that HIVID2 will help in enhancing future research associated with virus integration.

AVAILABILITY

HIVID2 can be accessed at https://github.com/zengxi-hada/HIVID2/.

CONTACT

Xi Zeng (zengxi@mail.hzau.edu.cn), Linghao Zhao (michael_yifan@126.com).

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Systematic comparison and assessment of RNA-seq procedures for gene expression quantitative analysis. Sci Rep 2020;10:19737. [PMID: 33184454 PMCID: PMC7665074 DOI: 10.1038/s41598-020-76881-x] [Citation(s) in RCA: 91] [Impact Index Per Article: 22.8] [Received: 05/15/2020] [Accepted: 11/03/2020] [Indexed: 01/16/2023] Open

Tong L, Wu PY, Phan JH, Hassazadeh HR, Tong W, Wang MD. Impact of RNA-seq data analysis algorithms on gene expression estimation and downstream prediction. Sci Rep 2020;10:17925. [PMID: 33087762 PMCID: PMC7578822 DOI: 10.1038/s41598-020-74567-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/22/2019] [Accepted: 08/27/2020] [Indexed: 11/23/2022] Open

Noguera-Julian M, Lee ER, Shafer RW, Kantor R, Ji H. Dry Panels Supporting External Quality Assessment Programs for Next Generation Sequencing-Based HIV Drug Resistance Testing. Viruses 2020;12:v12060666. [PMID: 32575676 PMCID: PMC7354622 DOI: 10.3390/v12060666] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/28/2020] [Revised: 06/18/2020] [Accepted: 06/18/2020] [Indexed: 12/18/2022] Open

Zapatka M, Borozan I, Brewer DS, Iskar M, Grundhoff A, Alawi M, Desai N, Sültmann H, Moch H, Cooper CS, Eils R, Ferretti V, Lichter P. The landscape of viral associations in human cancers. Nat Genet 2020;52:320-330. [PMID: 32025001 PMCID: PMC8076016 DOI: 10.1038/s41588-019-0558-9] [Citation(s) in RCA: 220] [Impact Index Per Article: 55.0] [Received: 11/30/2018] [Accepted: 11/22/2019] [Indexed: 12/30/2022]

Affiliation(s)

Marc Zapatka Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
Ivan Borozan Informatics and Bio-computing Program, Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Daniel S Brewer Norwich Medical School, University of East Anglia, Norwich, UK; Earlham Institute, Norwich, UK
Murat Iskar Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany
Adam Grundhoff Heinrich-Pette-Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany; German Center for Infection Research (DZIF), Partner Site Hamburg-Borstel-Lübeck-Riems, Hamburg, Germany
Malik Alawi Heinrich-Pette-Institute, Leibniz Institute for Experimental Virology, Hamburg, Germany; Bioinformatics Core, University Medical Center Hamburg-Eppendorf, Hamburg, Germany
Nikita Desai Bioinformatics Group, Department of Computer Science, University College London, London, UK; Biomedical Data Science Laboratory, Francis Crick Institute, London, UK
Holger Sültmann National Center for Tumor Diseases (NCT) Heidelberg, Heidelberg, Germany; Division of Cancer Genome Research, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany
Holger Moch Department of Pathology and Molecular Pathology, University and University Hospital Zürich, Zurich, Switzerland
Colin S Cooper Norwich Medical School, University of East Anglia, Norwich, UK; Earlham Institute, Norwich, UK; Institute of Cancer Research, London, UK; University of East Anglia, Norwich, UK
Roland Eils Division of Theoretical Bioinformatics, German Cancer Research Center (DKFZ), Heidelberg, Germany; Department of Bioinformatics and Functional Genomics, Institute of Pharmacy and Molecular Biotechnology, Heidelberg University and BioQuant Center, Heidelberg, Germany; Center for Digital Health, Berlin Institute of Health and Charité Universitätsmedizin Berlin, Berlin, Germany
Vincent Ferretti Ontario Institute for Cancer Research, MaRS Centre, Toronto, Ontario, Canada; Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Québec, Canada
Peter Lichter Division of Molecular Genetics, German Cancer Research Center (DKFZ), Heidelberg, Germany; German Cancer Consortium (DKTK), Heidelberg, Germany

Reich DP, Bass BL. Mapping the dsRNA World. Cold Spring Harb Perspect Biol 2019;11:11/3/a035352. [PMID: 30824577 DOI: 10.1101/cshperspect.a035352] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Indexed: 12/31/2022]

Wang M, Kong L. pblat: a multithread blat algorithm speeding up aligning sequences to genomes. BMC Bioinformatics 2019;20:28. [PMID: 30646844 PMCID: PMC6334396 DOI: 10.1186/s12859-019-2597-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 9.0] [Received: 03/27/2018] [Accepted: 01/03/2019] [Indexed: 11/17/2022] Open

Analysis of Epstein-Barr Virus Genomes and Expression Profiles in Gastric Adenocarcinoma. J Virol 2018;92:JVI.01239-17. [PMID: 29093097 DOI: 10.1128/jvi.01239-17] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 07/19/2017] [Accepted: 10/05/2017] [Indexed: 01/10/2023] Open

Abstract

Epstein-Barr virus (EBV) is a causative agent of a variety of lymphomas, nasopharyngeal carcinoma (NPC), and ∼9% of gastric carcinomas (GCs). An important question is whether particular EBV variants are more oncogenic than others, but conclusions are currently hampered by the lack of sequenced EBV genomes. Here, we contribute to this question by mining whole-genome sequences of 201 GCs to identify 13 EBV-positive GCs and by assembling 13 new EBV genome sequences, almost doubling the number of available GC-derived EBV genome sequences and providing the first non-Asian EBV genome sequences from GC. Whole-genome sequence comparisons of all EBV isolates sequenced to date (85 from tumors and 57 from healthy individuals) showed that most GC and NPC EBV isolates were closely related although American Caucasian GC samples were more distant, suggesting a geographical component. However, EBV GC isolates were found to contain some consistent changes in protein sequences regardless of geographical origin. In addition, transcriptome data available for eight of the EBV-positive GCs were analyzed to determine which EBV genes are expressed in GC. In addition to the expected latency proteins (EBNA1, LMP1, and LMP2A), specific subsets of lytic genes were consistently expressed that did not reflect a typical lytic or abortive lytic infection, suggesting a novel mechanism of EBV gene regulation in the context of GC. These results are consistent with a model in which a combination of specific latent and lytic EBV proteins promotes tumorigenesis.

IMPORTANCE

Epstein-Barr virus (EBV) is a widespread virus that causes cancer, including gastric carcinoma (GC), in a small subset of individuals. An important question is whether particular EBV variants are more cancer associated than others, but more EBV sequences are required to address this question. Here, we have generated 13 new EBV genome sequences from GC, almost doubling the number of EBV sequences from GC isolates and providing the first EBV sequences from non-Asian GC. We further identify sequence changes in some EBV proteins common to GC isolates. In addition, gene expression analysis of eight of the EBV-positive GCs showed consistent expression of both the expected latency proteins and a subset of lytic proteins that was not consistent with typical lytic or abortive lytic expression. These results suggest that novel mechanisms activate expression of some EBV lytic proteins and that their expression may contribute to oncogenesis.

Cox JW, Ballweg RA, Taft DH, Velayutham P, Haslam DB, Porollo A. A fast and robust protocol for metataxonomic analysis using RNAseq data. MICROBIOME 2017;5:7. [PMID: 28103917 PMCID: PMC5244565 DOI: 10.1186/s40168-016-0219-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/21/2016] [Accepted: 12/05/2016] [Indexed: 05/03/2023]

Abstract

BACKGROUND

Metagenomics is a rapidly emerging field aimed to analyze microbial diversity and dynamics by studying the genomic content of the microbiota. Metataxonomics tools analyze high-throughput sequencing data, primarily from 16S rRNA gene sequencing and DNAseq, to identify microorganisms and viruses within a complex mixture. With the growing demand for analysis of the functional microbiome, metatranscriptome studies attract more interest. To make metatranscriptomic data sufficient for metataxonomics, new analytical workflows are needed to deal with sparse and taxonomically less informative sequencing data.

RESULTS

We present a new protocol, IMSA+A, for accurate taxonomy classification based on metatranscriptome data of any read length that can efficiently and robustly identify bacteria, fungi, and viruses in the same sample. The new protocol improves accuracy by using a conservative reference database, employing a new counting scheme, and by assembling shotgun reads. Assembly also reduces analysis runtime. Simulated data were utilized to evaluate the protocol by permuting common experimental variables. When applied to the real metatranscriptome data for mouse intestines colonized by ASF, the protocol showed superior performance in detection of the microorganisms compared to the existing metataxonomics tools. IMSA+A is available at https://github.com/JeremyCoxBMI/IMSA-A .

CONCLUSIONS

The developed protocol addresses the need for taxonomy classification from RNAseq data. Previously not utilized, i.e., unmapped to a reference genome, RNAseq reads can now be used to gather taxonomic information about the microbiota present in a biological sample without conducting additional sequencing. Any metatranscriptome pipeline that includes assembly of reads can add this analysis with minimal additional cost of compute time. The new protocol also creates an opportunity to revisit old metatranscriptome data, where taxonomic content may be important but was not analyzed.

Hjelmsø MH, Hellmér M, Fernandez-Cassi X, Timoneda N, Lukjancenko O, Seidel M, Elsässer D, Aarestrup FM, Löfström C, Bofill-Mas S, Abril JF, Girones R, Schultz AC. Evaluation of Methods for the Concentration and Extraction of Viruses from Sewage in the Context of Metagenomic Sequencing. PLoS One 2017;12:e0170199. [PMID: 28099518 PMCID: PMC5242460 DOI: 10.1371/journal.pone.0170199] [Citation(s) in RCA: 92] [Impact Index Per Article: 13.1] [Received: 10/13/2016] [Accepted: 01/02/2017] [Indexed: 01/18/2023] Open

Abstract

Viral sewage metagenomics is a novel field of study used for surveillance, epidemiological studies, and evaluation of waste water treatment efficiency. In raw sewage, human waste is mixed with household, industrial and drainage water, and virus particles are, therefore, only found in low concentrations. This necessitates a step of sample concentration to allow for sensitive virus detection. Additionally, viruses harbor a large diversity of both surface and genome structures, which makes universal viral genomic extraction difficult. Current studies have tackled these challenges in many different ways employing a wide range of viral concentration and extraction procedures. However, there is limited knowledge of the efficacy and inherent biases associated with these methods with respect to viral sewage metagenomics, hampering the development of this field. By the use of next generation sequencing this study aimed to evaluate the efficiency of four commonly applied viral concentration techniques (precipitation with polyethylene glycol, organic flocculation with skim milk, monolithic adsorption filtration and glass wool filtration) and extraction methods (Nucleospin RNA XS, QIAamp Viral RNA Mini Kit, NucliSENS® miniMAG®, or PowerViral® Environmental RNA/DNA Isolation Kit) to determine the viriome in a sewage sample. We found a significant influence of concentration and extraction protocols on the detected viriome. The viral richness was largest in samples extracted with QIAamp Viral RNA Mini Kit or PowerViral® Environmental RNA/DNA Isolation Kit. The highest viral specificity was found in samples concentrated by precipitation with polyethylene glycol or extracted with Nucleospin RNA XS. Detection of viral pathogens depended on the method used. These results contribute to the understanding of method-associated biases within the field of viral sewage metagenomics, making evaluation of the current literature easier and helping with the design of future studies.

Affiliation(s)

Mathis Hjort Hjelmsø Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
Maria Hellmér Division of Microbiology and Production, The National Food Institute, Technical University of Denmark, Søborg, Denmark
Xavier Fernandez-Cassi Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology, and Statistics, University of Barcelona, Barcelona, Catalonia, Spain
Natàlia Timoneda Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology, and Statistics, University of Barcelona, Barcelona, Catalonia, Spain; Institute of Biomedicine of the University of Barcelona, University of Barcelona, Barcelona, Catalonia, Spain
Oksana Lukjancenko Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
Michael Seidel Institute of Hydrochemistry, Chair of Analytical Chemistry, Technical University of Munich, Munich, Germany
Dennis Elsässer Institute of Hydrochemistry, Chair of Analytical Chemistry, Technical University of Munich, Munich, Germany
Frank M. Aarestrup Research Group for Genomic Epidemiology, The National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
Charlotta Löfström Division of Microbiology and Production, The National Food Institute, Technical University of Denmark, Søborg, Denmark
Sílvia Bofill-Mas Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology, and Statistics, University of Barcelona, Barcelona, Catalonia, Spain
Josep F. Abril Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology, and Statistics, University of Barcelona, Barcelona, Catalonia, Spain; Institute of Biomedicine of the University of Barcelona, University of Barcelona, Barcelona, Catalonia, Spain
Rosina Girones Laboratory of Virus Contaminants of Water and Food, Department of Genetics, Microbiology, and Statistics, University of Barcelona, Barcelona, Catalonia, Spain
Anna Charlotte Schultz Division of Microbiology and Production, The National Food Institute, Technical University of Denmark, Søborg, Denmark

Brumme CJ, Poon AFY. Promises and pitfalls of Illumina sequencing for HIV resistance genotyping. Virus Res 2016;239:97-105. [PMID: 27993623 DOI: 10.1016/j.virusres.2016.12.008] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 09/20/2016] [Revised: 12/15/2016] [Accepted: 12/15/2016] [Indexed: 12/13/2022]

Ali R, Blackburn RM, Kozlakidis Z. Next-Generation Sequencing and Influenza Virus: A Short Review of the Published Implementation Attempts. HAYATI JOURNAL OF BIOSCIENCES 2016. [DOI: 10.1016/j.hjb.2016.12.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 12/09/2022] Open

Pinthong W, Muangruen P, Suriyaphol P, Mairiang D. A simple grid implementation with Berkeley Open Infrastructure for Network Computing using BLAST as a model. PeerJ 2016;4:e2248. [PMID: 27547555 PMCID: PMC4974928 DOI: 10.7717/peerj.2248] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 02/03/2016] [Accepted: 06/22/2016] [Indexed: 11/20/2022] Open

Flygare S, Simmon K, Miller C, Qiao Y, Kennedy B, Di Sera T, Graf EH, Tardif KD, Kapusta A, Rynearson S, Stockmann C, Queen K, Tong S, Voelkerding KV, Blaschke A, Byington CL, Jain S, Pavia A, Ampofo K, Eilbeck K, Marth G, Yandell M, Schlaberg R. Taxonomer: an interactive metagenomics analysis portal for universal pathogen detection and host mRNA expression profiling. Genome Biol 2016;17:111. [PMID: 27224977 PMCID: PMC4880956 DOI: 10.1186/s13059-016-0969-1] [Citation(s) in RCA: 110] [Impact Index Per Article: 13.8] [Received: 12/08/2015] [Accepted: 04/27/2016] [Indexed: 02/07/2023] Open

Affiliation(s)

Steven Flygare Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Keith Simmon Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA
Chase Miller Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Yi Qiao Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Brett Kennedy Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Tonya Di Sera Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Erin H Graf Department of Pathology, University of Utah, Salt Lake City, UT, USA
Keith D Tardif ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
Aurélie Kapusta Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Shawn Rynearson Department of Human Genetics, University of Utah, Salt Lake City, UT, USA
Chris Stockmann Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
Krista Queen Centers for Disease Control and Prevention, Atlanta, GA, USA
Suxiang Tong Centers for Disease Control and Prevention, Atlanta, GA, USA
Karl V Voelkerding Department of Pathology, University of Utah, Salt Lake City, UT, USA; ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA
Anne Blaschke Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
Carrie L Byington Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
Seema Jain Centers for Disease Control and Prevention, Atlanta, GA, USA
Andrew Pavia Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
Krow Ampofo Department of Pediatrics, University of Utah, Salt Lake City, UT, USA
Karen Eilbeck Department of Biomedical Informatics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
Gabor Marth Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
Mark Yandell Department of Human Genetics, University of Utah, Salt Lake City, UT, USA; USTAR Center for Genetic Discovery, Salt Lake City, UT, USA
Robert Schlaberg Department of Pathology, University of Utah, Salt Lake City, UT, USA; ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, UT, USA

Zhao S, Xi L, Quan J, Xi H, Zhang Y, von Schack D, Vincent M, Zhang B. QuickRNASeq lifts large-scale RNA-seq data analyses to the next level of automation and interactive visualization. BMC Genomics 2016;17:39. [PMID: 26747388 PMCID: PMC4706714 DOI: 10.1186/s12864-015-2356-9] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 11/16/2015] [Accepted: 12/23/2015] [Indexed: 12/11/2022] Open

Abstract

BACKGROUND

RNA sequencing (RNA-seq), a next-generation sequencing technique for transcriptome profiling, is being increasingly used, in part driven by the decreasing cost of sequencing. Nevertheless, the analysis of the massive amounts of data generated by large-scale RNA-seq remains a challenge. Multiple algorithms pertinent to basic analyses have been developed, and there is an increasing need to automate the use of these tools so as to obtain results in an efficient and user friendly manner. Increased automation and improved visualization of the results will help make the results and findings of the analyses readily available to experimental scientists.

RESULTS

By combining the best open source tools developed for RNA-seq data analyses and the most advanced web 2.0 technologies, we have implemented QuickRNASeq, a pipeline for large-scale RNA-seq data analyses and visualization. The QuickRNASeq workflow consists of three main steps. In Step #1, each individual sample is processed, including mapping RNA-seq reads to a reference genome, counting the numbers of mapped reads, quality control of the aligned reads, and SNP (single nucleotide polymorphism) calling. Step #1 is computationally intensive, and can be processed in parallel. In Step #2, the results from individual samples are merged, and an integrated and interactive project report is generated. All analysis results in the report are accessible via a single HTML entry webpage. Step #3 is the data interpretation and presentation step. The rich visualization features implemented here allow end users to interactively explore the results of RNA-seq data analyses, and to gain more insights into RNA-seq datasets. In addition, we used a real-world dataset to demonstrate the simplicity and efficiency of QuickRNASeq in RNA-seq data analyses and interactive visualizations. The seamless integration of automated capabilities with interactive visualizations in QuickRNASeq is not available in other published RNA-seq pipelines.

CONCLUSION

The high degree of automation and interactivity in QuickRNASeq leads to a substantial reduction in the time and effort required prior to further downstream analyses and interpretation of the analyses findings. QuickRNASeq advances primary RNA-seq data analyses to the next level of automation, and is mature for public release and adoption.

Zhao S, Xi L, Zhang B. Union Exon Based Approach for RNA-Seq Gene Quantification: To Be or Not to Be? PLoS One 2015;10:e0141910. [PMID: 26559532 PMCID: PMC4641603 DOI: 10.1371/journal.pone.0141910] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/26/2015] [Accepted: 10/14/2015] [Indexed: 11/24/2022] Open

Abstract

In recent years, RNA-seq has emerged as a powerful technology for estimating gene and/or transcript expression, and RPKM (Reads Per Kilobase per Million reads) is widely used to represent the relative abundance of mRNAs for a gene. In general, the methods for gene quantification can be divided into two categories: the transcript-based approach and the ‘union exon’-based approach. The transcript-based approach is intrinsically more difficult because different isoforms of a gene typically have a high proportion of genomic overlap. On the other hand, the ‘union exon’-based approach is much simpler and thus widely used in RNA-seq gene quantification. Biologically, a gene is expressed in one or more transcript isoforms. Therefore, the transcript-based approach is logically more meaningful than the ‘union exon’-based approach. Although gene quantification is a fundamental task in most RNA-seq studies, it remains unclear whether the ‘union exon’-based approach for RNA-seq gene quantification is good practice. In this paper, we carried out a side-by-side comparison of the ‘union exon’-based and transcript-based approaches in RNA-seq gene quantification. We found that gene expression levels are significantly underestimated by the ‘union exon’-based approach, and the average RPKM from the ‘union exon’-based method is less than 50% of the mean expression obtained from the transcript-based approach. The difference between the two approaches is primarily affected by the number of transcripts in a gene. We performed differential analysis at both the gene and transcript levels and found that isoform-level differential analysis yields additional insights, such as isoform switches. The accuracy of isoform quantification would improve if the read coverage pattern and exon-exon spanning reads were taken into account and incorporated into the EM (Expectation Maximization) algorithm. Our investigation discourages the use of the ‘union exon’-based approach in gene quantification despite its simplicity.
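
To make the RPKM comparison in this abstract concrete, the following is a minimal Python sketch on a made-up two-isoform gene. The exon coordinates, read count, and library size are illustrative assumptions rather than data from the paper, and the helper names `union_exon_length` and `rpkm` are hypothetical. It shows only one arithmetic reason the union-exon value can come out lower: the merged exon length is at least as long as any single isoform, so the same read count is divided by a larger length.

```python
def union_exon_length(transcripts):
    """Length in bp of the merged exon intervals across all isoforms.

    `transcripts` maps an isoform name to a list of half-open
    (start, end) exon intervals.
    """
    intervals = sorted(exon for exons in transcripts.values() for exon in exons)
    total = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Close the previous merged interval, if any, and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent exon: extend the current interval.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads Per Kilobase per Million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical gene with two isoforms that share their first exon.
gene = {
    "tx1": [(0, 1000), (2000, 3000)],
    "tx2": [(0, 1000), (5000, 6000)],
}
reads_on_gene = 500        # assumed: every read originates from tx1
total_mapped = 10_000_000  # assumed library size

tx1_length = sum(end - start for start, end in gene["tx1"])
union_length = union_exon_length(gene)

rpkm_transcript = rpkm(reads_on_gene, tx1_length, total_mapped)
rpkm_union = rpkm(reads_on_gene, union_length, total_mapped)
print(f"transcript-based: {rpkm_transcript:.2f}, union-exon: {rpkm_union:.2f}")
```

In this toy case the expressed isoform is 2,000 bp long while the union of exons is 3,000 bp, so the union-exon RPKM is two-thirds of the transcript-based value. Real quantification tools also have to assign reads to isoforms, which this sketch sidesteps by assumption.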

Shuda M, Guastafierro A, Geng X, Shuda Y, Ostrowski SM, Lukianov S, Jenkins FJ, Honda K, Maricich SM, Moore PS, Chang Y. Merkel Cell Polyomavirus Small T Antigen Induces Cancer and Embryonic Merkel Cell Proliferation in a Transgenic Mouse Model. PLoS One 2015;10:e0142329. [PMID: 26544690 PMCID: PMC4636375 DOI: 10.1371/journal.pone.0142329] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2015] [Accepted: 10/19/2015] [Indexed: 01/30/2023] Open

Haque MM, Bose T, Dutta A, Reddy CVSK, Mande SS. CS-SCORE: Rapid identification and removal of human genome contaminants from metagenomic datasets. Genomics 2015;106:116-21. [DOI: 10.1016/j.ygeno.2015.04.005] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/02/2015] [Revised: 04/09/2015] [Accepted: 04/26/2015] [Indexed: 02/01/2023]

Whipple JM, Youssef OA, Aruscavage PJ, Nix DA, Hong C, Johnson WE, Bass BL. Genome-wide profiling of the C. elegans dsRNAome. RNA (NEW YORK, N.Y.) 2015;21:786-800. [PMID: 25805852 PMCID: PMC4408787 DOI: 10.1261/rna.048801.114] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2014] [Accepted: 12/23/2014] [Indexed: 06/01/2023]

Zhao S, Zhang B. A comprehensive evaluation of ensembl, RefSeq, and UCSC annotations in the context of RNA-seq read mapping and gene quantification. BMC Genomics 2015;16:97. [PMID: 25765860 PMCID: PMC4339237 DOI: 10.1186/s12864-015-1308-8] [Citation(s) in RCA: 81] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2014] [Accepted: 01/30/2015] [Indexed: 01/09/2023] Open

Abstract

Background

RNA-Seq has become increasingly popular in transcriptome profiling. One aspect of transcriptome research is to quantify the expression levels of genomic elements, such as genes, their transcripts and exons. Acquiring a transcriptome expression profile requires genomic elements to be defined in the context of the genome. Multiple human genome annotation databases exist, including RefGene (RefSeq Gene), Ensembl, and the UCSC annotation database. The impact of the choice of an annotation on estimating gene expression remains insufficiently investigated.

Results

In this paper, we systematically characterized the impact of genome annotation choice on read mapping and transcriptome quantification by analyzing an RNA-Seq dataset generated by the Human Body Map 2.0 Project. The impact of a gene model on the mapping of non-junction reads differs from its impact on junction reads. For the RNA-Seq dataset with a read length of 75 bp, on average, 95% of non-junction reads were mapped to exactly the same genomic location regardless of which gene model was used. By contrast, this percentage dropped to 53% for junction reads. In addition, about 30% of junction reads failed to align without the assistance of a gene model, while 10–15% mapped alternatively. There are 21,958 common genes among the RefGene, Ensembl, and UCSC annotations. When we compared the gene quantification results in the RefGene and Ensembl annotations, 20% of genes were not expressed, and thus had a zero count in both annotations. Surprisingly, identical gene quantification results were obtained for only 16.3% (about one sixth) of genes. Approximately 28.1% of genes’ expression levels differed by 5% or higher, and of those, the relative expression levels for 9.3% of genes (equivalent to 2038) differed by 50% or greater. The case studies revealed that the gene definition differences in gene models frequently result in inconsistency in gene quantification.
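
The kind of per-gene comparison summarized above can be sketched roughly as follows. This is a hedged illustration and not the authors' code: the function name `compare_quantification`, the toy counts, and the definition of relative difference as |a - b| / max(a, b) are all assumptions, since the abstract does not state how the difference was computed. The bins here are disjoint, whereas the abstract reports the 50% group as a subset of the 5% group.

```python
from collections import Counter


def compare_quantification(counts_a, counts_b):
    """Bin genes shared by two annotations by how much their counts differ.

    Relative difference is defined here as |a - b| / max(a, b); this
    definition is an assumption for illustration, not taken from the paper.
    """
    bins = Counter()
    # Only genes present in both annotations are compared.
    for gene in counts_a.keys() & counts_b.keys():
        a, b = counts_a[gene], counts_b[gene]
        if a == 0 and b == 0:
            bins["zero_in_both"] += 1
        elif a == b:
            bins["identical"] += 1
        else:
            rel_diff = abs(a - b) / max(a, b)
            if rel_diff >= 0.5:
                bins["differ_50pct_or_more"] += 1
            elif rel_diff >= 0.05:
                bins["differ_5_to_50pct"] += 1
            else:
                bins["differ_under_5pct"] += 1
    return bins


# Toy read counts per gene under two hypothetical annotations.
refgene = {"G1": 0, "G2": 100, "G3": 100, "G4": 100, "G5": 100, "G6": 7}
ensembl = {"G1": 0, "G2": 100, "G3": 98, "G4": 80, "G5": 40, "G7": 3}

result = compare_quantification(refgene, ensembl)
for label, n in sorted(result.items()):
    print(label, n)
```

Genes G6 and G7 appear in only one of the toy annotations and are therefore excluded, mirroring the restriction to common genes described in the abstract.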

Conclusions

We demonstrated that the choice of a gene model has a dramatic effect on both gene quantification and differential analysis. Our research will help RNA-Seq data analysts to make an informed choice of gene model in practical RNA-Seq data analysis.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-015-1308-8) contains supplementary material, which is available to authorized users.

Brister JR, Ako-Adjei D, Bao Y, Blinkova O. NCBI viral genomes resource. Nucleic Acids Res 2014;43:D571-7. [PMID: 25428358 PMCID: PMC4383986 DOI: 10.1093/nar/gku1207] [Citation(s) in RCA: 374] [Impact Index Per Article: 37.4] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/02/2023] Open

Kuhn JH, Andersen KG, Bào Y, Bavari S, Becker S, Bennett RS, Bergman NH, Blinkova O, Bradfute S, Brister JR, Bukreyev A, Chandran K, Chepurnov AA, Davey RA, Dietzgen RG, Doggett NA, Dolnik O, Dye JM, Enterlein S, Fenimore PW, Formenty P, Freiberg AN, Garry RF, Garza NL, Gire SK, Gonzalez JP, Griffiths A, Happi CT, Hensley LE, Herbert AS, Hevey MC, Hoenen T, Honko AN, Ignatyev GM, Jahrling PB, Johnson JC, Johnson KM, Kindrachuk J, Klenk HD, Kobinger G, Kochel TJ, Lackemeyer MG, Lackner DF, Leroy EM, Lever MS, Mühlberger E, Netesov SV, Olinger GG, Omilabu SA, Palacios G, Panchal RG, Park DJ, Patterson JL, Paweska JT, Peters CJ, Pettitt J, Pitt L, Radoshitzky SR, Ryabchikova EI, Saphire EO, Sabeti PC, Sealfon R, Shestopalov AM, Smither SJ, Sullivan NJ, Swanepoel R, Takada A, Towner JS, van der Groen G, Volchkov VE, Volchkova VA, Wahl-Jensen V, Warren TK, Warfield KL, Weidmann M, Nichol ST. Filovirus RefSeq entries: evaluation and selection of filovirus type variants, type sequences, and names. Viruses 2014;6:3663-82. [PMID: 25256396 PMCID: PMC4189044 DOI: 10.3390/v6093663] [Citation(s) in RCA: 45] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/17/2014] [Accepted: 09/23/2014] [Indexed: 12/14/2022] Open

Affiliation(s)

Jens H Kuhn Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Kristian G Andersen FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Yīmíng Bào Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Sina Bavari United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Stephan Becker Institut für Virologie, Philipps-Universität Marburg, 35043 Marburg, Germany.
Richard S Bennett National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
Nicholas H Bergman National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
Olga Blinkova Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Steven Bradfute University of New Mexico, Albuquerque, NM 87131, USA.
J Rodney Brister Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Alexander Bukreyev Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, TX 77555, USA.
Kartik Chandran Department of Microbiology and Immunology, Albert Einstein College of Medicine, Bronx, NY 10461, USA.
Alexander A Chepurnov Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Robert A Davey Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Ralf G Dietzgen Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Norman A Doggett Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Olga Dolnik Institut für Virologie, Philipps-Universität Marburg, 35043 Marburg, Germany.
John M Dye United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Sven Enterlein Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Paul W Fenimore Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Pierre Formenty Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Alexander N Freiberg Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, TX 77555, USA.
Robert F Garry Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Nicole L Garza United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Stephen K Gire FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Jean-Paul Gonzalez Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Anthony Griffiths Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Christian T Happi Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Lisa E Hensley Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Andrew S Herbert United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Michael C Hevey National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
Thomas Hoenen Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Anna N Honko Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Georgy M Ignatyev FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Peter B Jahrling Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Joshua C Johnson Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Karl M Johnson FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Jason Kindrachuk Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Hans-Dieter Klenk Institut für Virologie, Philipps-Universität Marburg, 35043 Marburg, Germany.
Gary Kobinger FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Tadeusz J Kochel National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
Matthew G Lackemeyer Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Daniel F Lackner National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
Eric M Leroy FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Mark S Lever FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Elke Mühlberger FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Sergey V Netesov FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Gene G Olinger Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Sunday A Omilabu FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Gustavo Palacios United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Rekha G Panchal United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Daniel J Park FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Jean L Patterson Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Janusz T Paweska FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Clarence J Peters Department of Pathology and Galveston National Laboratory, University of Texas Medical Branch, Galveston, TX 77555, USA.
James Pettitt Integrated Research Facility at Fort Detrick, National Institute of Allergy and Infectious Diseases, National Institutes of Health, Fort Detrick, Frederick, MD 21702, USA.
Louise Pitt United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Sheli R Radoshitzky United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Elena I Ryabchikova Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Erica Ollmann Saphire Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Pardis C Sabeti FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Rachel Sealfon Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Aleksandr M Shestopalov Novosibirsk State University, Novosibirsk, Novosibirsk Region, Russia, 630090.
Sophie J Smither FAS Center for Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Nancy J Sullivan Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Robert Swanepoel Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Ayato Takada Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Jonathan S Towner Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Guido van der Groen Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Viktor E Volchkov Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Valentina A Volchkova Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Victoria Wahl-Jensen National Biodefense Analysis and Countermeasures Center, Fort Detrick, Frederick, MD 21702, USA.
Travis K Warren United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Kelly L Warfield Information Engineering Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
Manfred Weidmann United States Army Medical Research Institute of Infectious Diseases, Fort Detrick, Frederick, MD 21702, USA.
Stuart T Nichol Viral Special Pathogens Branch, Division of High-Consequence Pathogens Pathology, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, GA 30333, USA.

Zhao S. Assessment of the impact of using a reference transcriptome in mapping short RNA-Seq reads. PLoS One 2014;9:e101374. [PMID: 24992027 PMCID: PMC4081564 DOI: 10.1371/journal.pone.0101374] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/05/2014] [Accepted: 06/06/2014] [Indexed: 11/28/2022] Open

Abstract

RNA-Seq has become increasingly popular in transcriptome profiling. The major challenge in RNA-Seq data analysis is the accurate mapping of junction reads to their genomic origins. To detect splice sites in short reads, many RNA-Seq aligners use a reference transcriptome to inform the placement of junction reads. However, no systematic evaluation has been performed to assess or quantify the benefits of incorporating a reference transcriptome in mapping RNA-Seq reads. In this paper, we studied the impact of a reference transcriptome on mapping RNA-Seq reads, especially junction reads. The same dataset was analysed with and without the RefGene transcriptome, and a Perl script was developed to analyse and compare the mapping results. About 50–55% of junction reads mapped to the same genomic regions regardless of whether the RefGene model was used. More than one-third of reads failed to map without the help of a reference transcriptome. For “alternatively” mapped reads, i.e., reads mapped differently with and without the RefGene model, the mappings without the RefGene model are usually worse than the corresponding alignments with it. Junction reads that span more than two exons are less likely to align correctly without the assistance of a reference transcriptome. As sequencing technology evolves, reads are becoming longer. Longer reads are more likely to span multiple exons, so mapping long junction reads without a reference transcriptome is becoming more challenging. Therefore, the advantages of using a reference transcriptome in mapping demonstrated in this study are becoming more evident for longer reads. In addition, the effect of the completeness of the reference transcriptome on the mapping of RNA-Seq reads is discussed.
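
The comparison of mapping results with and without a gene model can be sketched as below. The abstract mentions a Perl script but does not show it, so this Python version is only an illustration under stated assumptions: each alignment is represented as a hypothetical `(chromosome, position, CIGAR)` tuple, the function name `classify_reads` and the category labels are invented here, and the toy reads are not from the study.

```python
from collections import Counter


def classify_reads(with_model, without_model):
    """Classify reads mapped with a gene model by their fate without it.

    Both arguments map a read ID to an alignment tuple
    (chromosome, position, CIGAR); a read absent from `without_model`
    is treated as unmapped in that run. This representation is an
    assumption made for illustration.
    """
    counts = Counter()
    for read_id, aln_with in with_model.items():
        aln_without = without_model.get(read_id)
        if aln_without is None:
            counts["unmapped_without_model"] += 1
        elif aln_without == aln_with:
            counts["identical"] += 1
        else:
            counts["alternative"] += 1
    return counts


# Toy 75 bp junction reads aligned with the help of a gene model.
with_refgene = {
    "r1": ("chr1", 100, "50M200N25M"),
    "r2": ("chr1", 900, "30M500N45M"),
    "r3": ("chr2", 400, "40M1000N35M"),
}
# The same reads aligned without a gene model: r2 is soft-clipped
# instead of spliced, and r3 fails to align.
without_refgene = {
    "r1": ("chr1", 100, "50M200N25M"),
    "r2": ("chr1", 900, "30M45S"),
}

summary = classify_reads(with_refgene, without_refgene)
for label, n in sorted(summary.items()):
    print(label, n)
```

Comparing whole alignment tuples is the strictest possible notion of "same mapping"; a looser comparison, for example on genomic region only, would move some reads from the alternative category into the identical one.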
