1
Hu X, Hurtado-Gonzales OP, Adhikari BN, French-Monar RD, Malapi M, Foster JA, McFarland CD. PhytoPipe: a phytosanitary pipeline for plant pathogen detection and diagnosis using RNA-seq data. BMC Bioinformatics 2023; 24:470. [PMID: 38093207 PMCID: PMC10717670 DOI: 10.1186/s12859-023-05589-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Accepted: 11/30/2023] [Indexed: 12/17/2023] Open
Abstract
BACKGROUND Detecting exotic plant pathogens and preventing their entry and establishment are critical for the protection of agricultural systems while securing the global trading of agricultural commodities. High-throughput sequencing (HTS) has been applied successfully for plant pathogen discovery, leading to its current application in routine pathogen detection. However, the analysis of massive amounts of HTS data has become one of the major challenges for the use of HTS more broadly as a rapid diagnostics tool. Several bioinformatics pipelines have been developed to handle HTS data with a focus on plant virus and viroid detection. However, there is a need for an integrative tool that can simultaneously detect a wider range of other plant pathogens in HTS data, such as bacteria (including phytoplasmas), fungi, and oomycetes, and this tool should also be capable of generating a comprehensive report on the phytosanitary status of the diagnosed specimen. RESULTS We have developed an open-source bioinformatics pipeline called PhytoPipe (Phytosanitary Pipeline) to provide the plant pathology diagnostician community with a user-friendly tool that integrates analysis and visualization of HTS RNA-seq data. PhytoPipe includes quality control of reads, read classification, assembly-based annotation, and reference-based mapping. The final product of the analysis is a comprehensive report for easy interpretation of not only viruses and viroids but also bacteria (including phytoplasmas), fungi, and oomycetes. PhytoPipe is implemented as a Snakemake workflow with Python 3 and bash scripts in a Linux environment. The source code for PhytoPipe is freely available and distributed under a BSD-3 license. CONCLUSIONS PhytoPipe provides an integrative bioinformatics pipeline that can be used for the analysis of HTS RNA-seq data. PhytoPipe is easily installed on a Linux or Mac system and can be conveniently used with a Docker image, which includes all dependent packages and software related to analyses. It is publicly available on GitHub at https://github.com/healthyPlant/PhytoPipe and on Docker Hub at https://hub.docker.com/r/healthyplant/phytopipe.
Affiliation(s)
- Xiaojun Hu
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), Beltsville, MD, USA
- Oscar P Hurtado-Gonzales
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), Beltsville, MD, USA
- Bishwo N Adhikari
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), Beltsville, MD, USA
- Ronald D French-Monar
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), Beltsville, MD, USA
- Martha Malapi
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), Beltsville, MD, USA
  - American Seed Trade Association (ASTA), Alexandria, VA, USA
- Joseph A Foster
  - United States Department of Agriculture (USDA), Animal and Plant Health Inspection Service (APHIS), Plant Protection and Quarantine (PPQ), Plant Germplasm Quarantine Program (PGQP), Beltsville, MD, USA
2
Shin D, Kim J, Lee JH, Kim JI, Oh YM. Profiling of Microbial Landscape in Lung of Chronic Obstructive Pulmonary Disease Patients Using RNA Sequencing. Int J Chron Obstruct Pulmon Dis 2023; 18:2531-2542. [PMID: 38022823 PMCID: PMC10644840 DOI: 10.2147/copd.s426260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 10/30/2023] [Indexed: 12/01/2023] Open
Abstract
Purpose The aim of the study was to use RNA sequencing (RNA-seq) data from the lungs of chronic obstructive pulmonary disease (COPD) patients to identify the most commonly detected bacteria. Additionally, the study sought to investigate the differences in these infections between normal lung tissues and those affected by COPD. Patients and Methods We re-analyzed lung RNA-seq data from 99 COPD patients and 93 non-COPD smokers to determine the extent to which the metagenomes differed between the two groups and to assess the reliability of the metagenomes. We used the unmapped reads in the RNA-seq data, that is, reads that did not align to the human reference genome, to identify infections that are more common in COPD patients. Results We identified 18 bacteria that exhibited significant differences between the COPD and non-COPD smoker groups. Among these, Yersinia enterocolitica was found to be more than 30% more abundant in COPD. Additionally, we observed a difference in detection rate based on smoking history. To ensure the accuracy of our findings and distinguish them from false positives, we double-checked the metagenomic profile using the Basic Local Alignment Search Tool (BLAST). We were able to identify and remove specific species that might have been misclassified as other species by Kraken2 but were actually Staphylococcus aureus, as identified by BLAST analysis. Conclusion This study highlighted the method of using unmapped reads, which are not typically used in sequencing data analysis, to identify microorganisms present in patients with lung diseases such as COPD. This method expanded our understanding of the microbial landscape in COPD and provided insights into the potential role of microorganisms in disease development and progression.
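The unmapped-read step described in this abstract can be illustrated with a minimal Python sketch. It assumes the alignments to the human reference are available as SAM text, where bit 0x4 of the FLAG field marks a read as unmapped. The function name `unmapped_records` and the toy records are hypothetical and are not taken from the study, whose exact tooling for this step is not described here.

```python
from typing import Iterable, Iterator

# SAM FLAG bit 0x4 means "segment unmapped" in the SAM specification.
UNMAPPED_FLAG = 0x4


def unmapped_records(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM alignment lines whose FLAG marks the read as unmapped."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        if int(fields[1]) & UNMAPPED_FLAG:
            yield line.rstrip("\n")


# Toy input (hypothetical): one mapped read and two unmapped reads.
toy_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTGACCAGTA\tFFFFFFFFFF",
    "read3\t77\t*\t0\t0\t*\t*\t0\t0\tGGCATTACGA\tFFFFFFFFFF",
]

unmapped_names = [rec.split("\t")[0] for rec in unmapped_records(toy_sam)]
print(unmapped_names)
```

The reads kept by such a filter would then be passed to a taxonomic classifier; the abstract names Kraken2 for classification and BLAST for confirmation.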
Affiliation(s)
- Dongjin Shin
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
- Juhyun Kim
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jang Ho Lee
  - Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
- Jong-Il Kim
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Department of Biochemistry and Molecular Biology, Seoul National University College of Medicine, Seoul, Republic of Korea
  - Genomic Medicine Institute, Seoul National University, Seoul, Republic of Korea
  - Seoul National University Cancer Research Institute, Seoul, Republic of Korea
- Yeon-Mok Oh
  - Department of Pulmonary and Critical Care Medicine, Asan Medical Center, University of Ulsan College of Medicine, Seoul, Republic of Korea
3
Deshpande D, Chhugani K, Chang Y, Karlsberg A, Loeffler C, Zhang J, Muszyńska A, Munteanu V, Yang H, Rotman J, Tao L, Balliu B, Tseng E, Eskin E, Zhao F, Mohammadi P, Łabaj PP, Mangul S. RNA-seq data science: From raw data to effective interpretation. Front Genet 2023; 14:997383. [PMID: 36999049 PMCID: PMC10043755 DOI: 10.3389/fgene.2023.997383] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 07/18/2022] [Accepted: 02/24/2023] [Indexed: 03/14/2023] Open
Abstract
RNA sequencing (RNA-seq) has become an exemplary technology in modern biology and clinical science. Its immense popularity is due in large part to the continuous efforts of the bioinformatics community to develop accurate and scalable computational tools to analyze the enormous amounts of transcriptomic data that it produces. RNA-seq analysis enables genes and their corresponding transcripts to be probed for a variety of purposes, such as detecting novel exons or whole transcripts, assessing expression of genes and alternative transcripts, and studying alternative splicing structure. It can be a challenge, however, to obtain meaningful biological signals from raw RNA-seq data because of the enormous scale of the data as well as the inherent limitations of different sequencing technologies, such as amplification bias or biases of library preparation. The need to overcome these technical challenges has pushed the rapid development of novel computational tools, which have evolved and diversified in accordance with technological advancements, leading to the current myriad of RNA-seq tools. These tools, combined with the diverse computational skill sets of biomedical researchers, help to unlock the full potential of RNA-seq. The purpose of this review is to explain basic concepts in the computational analysis of RNA-seq data and define discipline-specific jargon.
Affiliation(s)
- Dhrithi Deshpande
  - Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Karishma Chhugani
  - Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Yutong Chang
  - Department of Pharmacology and Pharmaceutical Sciences, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Aaron Karlsberg
  - Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Caitlin Loeffler
  - Department of Computer Science, University of California, Los Angeles, CA, United States
- Jinyang Zhang
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
- Agata Muszyńska
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Institute of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Gliwice, Poland
- Viorel Munteanu
  - Department of Computers, Informatics and Microelectronics, Technical University of Moldova, Chisinau, Moldova
- Harry Yang
  - Department of Microbiology, Immunology and Molecular Genetics, University of California Los Angeles, Los Angeles, CA, United States
- Jeremy Rotman
  - Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
- Laura Tao
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States
- Brunilda Balliu
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States
- Eleazar Eskin
  - Department of Computer Science, University of California, Los Angeles, CA, United States
  - Department of Computational Medicine, David Geffen School of Medicine at UCLA, CHS, Los Angeles, CA, United States
  - Department of Human Genetics, David Geffen School of Medicine at UCLA, Los Angeles, CA, United States
- Fangqing Zhao
  - Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing, China
  - Key Laboratory of Systems Biology, Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Hangzhou, China
- Pejman Mohammadi
  - Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, United States
- Paweł P. Łabaj
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Department of Biotechnology, Boku University Vienna, Vienna, Austria
- Serghei Mangul
  - Department of Clinical Pharmacy, USC Alfred E. Mann School of Pharmacy and Pharmaceutical Sciences, Los Angeles, CA, United States
  - Department of Quantitative and Computational Biology, USC Dornsife College of Letters, Arts and Sciences, Los Angeles, CA, United States
  - *Correspondence: Serghei Mangul,
4
Yin Q, Strong MJ, Zhuang Y, Flemington EK, Kaminski N, de Andrade JA, Lasky JA. Assessment of viral RNA in idiopathic pulmonary fibrosis using RNA-seq. BMC Pulm Med 2020; 20:81. [PMID: 32245461 PMCID: PMC7119082 DOI: 10.1186/s12890-020-1114-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/16/2019] [Accepted: 03/13/2020] [Indexed: 11/23/2022] Open
Abstract
Background Numerous publications suggest an association between herpes virus infection and idiopathic pulmonary fibrosis (IPF). These reports have employed immunohistochemistry, in situ hybridization and/or PCR, which are susceptible to specificity artifacts. Methods We investigated the possible association between IPF and viral RNA expression using next-generation sequencing, which has the potential to provide a high degree of both sensitivity and specificity. We quantified viral RNA expression for 740 viruses in 28 IPF patient lung biopsy samples and 20 controls. Key RNA-seq results were confirmed using real-time RT-PCR for select viruses (EBV, HCV, herpesvirus saimiri and HERV-K). Results We identified sporadic low-level evidence of viral infections in our lung tissue specimens, but did not find a statistical difference for expression of any virus, including EBV, herpesvirus saimiri and HERV-K, between IPF and control lungs. Conclusions To the best of our knowledge, this is the first publication that employs RNA-seq to assess whether viral infections are linked to the pathogenesis of IPF. Our results do not address the role of viral infection in acute exacerbations of IPF; however, this analysis patently did not support an association between herpes virus detection and IPF.
Affiliation(s)
- Qinyan Yin
  - Section of Pulmonary Diseases, Critical Care and Environmental Medicine, Department of Medicine, Tulane University School of Medicine, 1430 Tulane Avenue, New Orleans, LA, 70112, USA
- Michael J Strong
  - Department of Pathology and Laboratory Medicine, Tulane University School of Medicine, 1430 Tulane Avenue, New Orleans, LA, 70112, USA
- Yan Zhuang
  - Section of Pulmonary Diseases, Critical Care and Environmental Medicine, Department of Medicine, Tulane University School of Medicine, 1430 Tulane Avenue, New Orleans, LA, 70112, USA
- Erik K Flemington
  - Department of Pathology and Laboratory Medicine, Tulane University School of Medicine, 1430 Tulane Avenue, New Orleans, LA, 70112, USA
- Naftali Kaminski
  - Section of Pulmonary, Critical Care and Sleep Medicine, Yale University, 300 Cedar Street, Ste S441D, New Haven, CT, 06519, USA
- Joao A de Andrade
  - Division of Allergy, Pulmonary, Critical Care Medicine, Department of Medicine, Vanderbilt University, 1161 21st Avenue South, B1317 MCN, Nashville, TN, 37232-2650, USA
- Joseph A Lasky
  - Section of Pulmonary Diseases, Critical Care and Environmental Medicine, Department of Medicine, Tulane University School of Medicine, 1430 Tulane Avenue, New Orleans, LA, 70112, USA
5
Joppich M, Zimmer R. From command-line bioinformatics to bioGUI. PeerJ 2019; 7:e8111. [PMID: 31772845 PMCID: PMC6875409 DOI: 10.7717/peerj.8111] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/14/2019] [Accepted: 10/28/2019] [Indexed: 12/02/2022] Open
Abstract
Bioinformatics is a highly interdisciplinary field providing (bioinformatics) applications for scientists from many disciplines. Installing and starting applications on the command-line (CL) is inconvenient and/or inefficient for many scientists. Nonetheless, most methods are implemented with a command-line interface only. Providing a graphical user interface (GUI) for bioinformatics applications is one step toward routinely making CL-only applications available to more scientists and, thus, toward more effective interdisciplinary work. With our bioGUI framework we address two main problems of using CL bioinformatics applications: First, many tools work on UNIX systems only, while many scientists use Microsoft Windows. Second, scientists refrain from using CL tools which, however, could well support them in their research. With bioGUI install modules and templates, installing and using CL tools is made possible for most scientists, even on Windows, due to bioGUI's support for Windows Subsystem for Linux. In addition, bioGUI templates can easily be created, making the bioGUI framework highly rewarding for developers. From the bioGUI repository it is possible to download, install and use bioinformatics tools with just a few clicks.
Affiliation(s)
- Markus Joppich
  - Department of Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
- Ralf Zimmer
  - Department of Informatics, Ludwig-Maximilians-Universität München, Munich, Germany
6
Park SJ, Onizuka S, Seki M, Suzuki Y, Iwata T, Nakai K. A systematic sequencing-based approach for microbial contaminant detection and functional inference. BMC Biol 2019; 17:72. [PMID: 31519179 PMCID: PMC6743104 DOI: 10.1186/s12915-019-0690-0] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/25/2019] [Accepted: 08/20/2019] [Indexed: 12/16/2022] Open
Abstract
Background Microbial contamination poses a major difficulty for successful data analysis in biological and biomedical research. Computational approaches utilizing next-generation sequencing (NGS) data offer promising diagnostics to assess the presence of contaminants. However, as host cells are often contaminated by multiple microorganisms, these approaches require careful attention to intra- and interspecies sequence similarities, which have not yet been fully addressed. Results We present a computational approach that rigorously investigates the genomic origins of sequenced reads, including those mapped to multiple species that have been discarded in previous studies. Through the analysis of large-scale synthetic and public NGS samples, we estimate that 1000–100,000 contaminating microbial reads are detected per million host reads sequenced by RNA-seq. The microbe catalog we established included Cutibacterium as a prevalent contaminant, suggesting that contamination mostly originates from the laboratory environment. Importantly, by applying a systematic method to infer the functional impact of contamination, we revealed that host-contaminant interactions cause profound changes in the host molecular landscapes, as exemplified by changes in inflammatory and apoptotic pathways during Mycoplasma infection of lymphoma cells. Conclusions We provide a computational method for profiling microbial contamination on NGS data and suggest that sources of contamination in laboratory reagents and the experimental environment alter the molecular landscape of host cells leading to phenotypic changes. These findings reinforce the concept that precise determination of the origins and functional impacts of contamination is imperative for quality research and illustrate the usefulness of the proposed approach to comprehensively characterize contamination landscapes. 
Electronic supplementary material The online version of this article (10.1186/s12915-019-0690-0) contains supplementary material, which is available to authorized users.
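The "contaminating reads per million host reads" figure quoted in this abstract is a simple normalization, sketched below in Python. The function name and the read counts are hypothetical values chosen for illustration and are not data from the study.

```python
def contaminant_reads_per_million(contaminant_reads: int, host_reads: int) -> float:
    """Scale a contaminant read count to a rate per million host reads."""
    if host_reads <= 0:
        raise ValueError("host_reads must be positive")
    # Multiply before dividing so that whole-number rates stay exact.
    return contaminant_reads * 1_000_000 / host_reads


# Hypothetical sample: 45,000 microbial reads alongside 30 million host reads.
rate = contaminant_reads_per_million(45_000, 30_000_000)

# Check whether the rate falls inside the 1000 to 100,000 range quoted in the abstract.
within_reported_range = 1_000 <= rate <= 100_000
print(rate, within_reported_range)
```

Expressing contamination as a rate rather than a raw count makes samples sequenced at different depths comparable.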
Affiliation(s)
- Sung-Joon Park
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8693, Japan
- Satoru Onizuka
  - Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, Tokyo, 162-8666, Japan
  - Division of Periodontology, Department of Oral Function, Kyushu Dental University, Fukuoka, 803-8580, Japan
- Masahide Seki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, 277-8568, Japan
- Yutaka Suzuki
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, 277-8568, Japan
- Takanori Iwata
  - Institute of Advanced Biomedical Engineering and Science, Tokyo Women's Medical University, Tokyo, 162-8666, Japan
  - Department of Periodontology, Graduate School of Medical and Dental Sciences, Tokyo Medical and Dental University, Tokyo, 113-8549, Japan
- Kenta Nakai
  - Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo, 108-8693, Japan
  - Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, 277-8568, Japan
7
Detection of Epstein-Barr Virus Infection in Non-Small Cell Lung Cancer. Cancers (Basel) 2019; 11:cancers11060759. [PMID: 31159203 PMCID: PMC6627930 DOI: 10.3390/cancers11060759] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/28/2019] [Revised: 05/27/2019] [Accepted: 05/28/2019] [Indexed: 12/12/2022] Open
Abstract
Previous investigations proposed a link between the Epstein-Barr virus (EBV) and lung cancer (LC), but the results are highly controversial, largely due to insufficient sample sizes and the inherent limitations of traditional viral screening methods such as PCR. Unlike PCR, current next-generation sequencing (NGS) utilizes an unbiased method for the global assessment of all exogenous agents within a cancer sample with high sensitivity and specificity. In our current study, we aim to resolve this long-standing controversy by utilizing our unbiased NGS-based informatics approaches in conjunction with traditional molecular methods to investigate the role of EBV in a total of 1127 LC samples. In situ hybridization analysis of 110 LC and 10 normal lung samples detected EBV transcripts in 3 LC samples. Comprehensive virome analyses of RNA sequencing (RNA-seq) data sets from 1017 LC and 110 paired adjacent normal lung specimens revealed EBV transcripts in three lung squamous cell carcinoma samples and one lung adenocarcinoma sample. In the sample with the highest EBV coverage, transcripts from the BamHI A region accounted for the majority of EBV reads. Expression of EBNA-1, LMP-1 and LMP-2 was observed. A number of viral circular RNA candidates were also detected. Thus, we revealed for the first time a type II latency-like viral transcriptome in the setting of LC in vivo. The high-level expression of viral BamHI A transcripts in LC suggests a functional role of these transcripts, likely as long non-coding RNA. Analyses of cellular gene expression and stained tissue sections indicated an increased immune cell infiltration in the sample expressing high levels of EBV transcripts compared to samples expressing low EBV transcripts. An increased level of immune checkpoint blockade factors was also detected in the sample with higher levels of EBV transcripts, indicating an induced immune tolerance. Lastly, inhibition of immune pathways and activation of oncogenic pathways were detected in the sample with high EBV transcripts compared to the EBV-low LC, indicating the direct regulation of cancer pathways by EBV. Taken together, our data support the notion that EBV likely plays a pathological role in a subset of LC.
8
Sangiovanni M, Granata I, Thind AS, Guarracino MR. From trash to treasure: detecting unexpected contamination in unmapped NGS data. BMC Bioinformatics 2019; 20:168. [PMID: 30999839 PMCID: PMC6472186 DOI: 10.1186/s12859-019-2684-x] [Citation(s) in RCA: 38] [Impact Index Per Article: 7.6] [Indexed: 02/08/2023] Open
Abstract
Background Next Generation Sequencing (NGS) experiments produce millions of short sequences that, mapped to a reference genome, provide biological insights at the genomic, transcriptomic and epigenomic levels. Typically, the fraction of reads that map correctly to the reference genome ranges between 70% and 90%, leaving in some cases a considerable fraction of unmapped sequences. This 'misalignment' can be ascribed to low-quality bases or sequence differences between the sample reads and the reference genome. Investigating the source of the unmapped reads is important to better assess the quality of the whole experiment and to check for possible downstream or upstream 'contamination' from exogenous nucleic acids. Results Here we propose DecontaMiner, a tool to unravel the presence of contaminating sequences among the unmapped reads. It uses a subtraction approach to identify contamination from bacterial, fungal and viral genomes. DecontaMiner generates several output files to track all the processed reads, and to provide a complete report of their characteristics. The good-quality matches on microorganism genomes are counted and compared among samples. DecontaMiner builds an offline HTML page containing summary statistics and plots. The latter are obtained using the state-of-the-art D3 JavaScript libraries. DecontaMiner has been mainly used to detect contamination in human RNA-Seq data. The software is freely available at http://www-labgtp.na.icar.cnr.it/decontaminer. Conclusions DecontaMiner is a tool designed and developed to investigate the presence of contaminating sequences in unmapped NGS data. It can suggest the presence of contaminating organisms in sequenced samples, which might derive either from laboratory contamination or from their biological source, and in both cases can be considered worthy of further investigation and experimental validation. The novelty of DecontaMiner is mainly represented by its easy integration with the standard procedures of NGS data analysis, while providing a complete, reliable, and automatic pipeline. Electronic supplementary material The online version of this article (10.1186/s12859-019-2684-x) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Mara Sangiovanni
  - Stazione Zoologica Anton Dohrn, Villa Comunale, Napoli, 80121, Italy
- Ilaria Granata
  - High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy
- Amarinder Singh Thind
  - High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy
- Mario Rosario Guarracino
  - High Performance Computing and Networking Institute, National Research Council of Italy, Via P. Castellino, 111, Napoli, 80131, Italy
9
Simon LM, Karg S, Westermann AJ, Engel M, Elbehery AHA, Hense B, Heinig M, Deng L, Theis FJ. MetaMap: an atlas of metatranscriptomic reads in human disease-related RNA-seq data. Gigascience 2018; 7:5036539. [PMID: 29901703 PMCID: PMC6025204 DOI: 10.1093/gigascience/giy070] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Indexed: 01/12/2023] Open
Abstract
Background With the advent of the age of big data in bioinformatics, large volumes of data and high-performance computing power enable researchers to perform re-analyses of publicly available datasets at an unprecedented scale. Ever more studies implicate the microbiome in both normal human physiology and a wide range of diseases. RNA sequencing technology (RNA-seq) is commonly used to infer global eukaryotic gene expression patterns under defined conditions, including human disease-related contexts; however, its generic nature also enables the detection of microbial and viral transcripts. Findings We developed a bioinformatic pipeline to screen existing human RNA-seq datasets for the presence of microbial and viral reads by re-inspecting the non-human-mapping read fraction. We validated this approach by recapitulating outcomes from six independent, controlled infection experiments of cell line models and compared them with an alternative metatranscriptomic mapping strategy. We then applied the pipeline to close to 150 terabytes of publicly available raw RNA-seq data from more than 17,000 samples from more than 400 studies relevant to human disease using state-of-the-art high-performance computing systems. The resulting data from this large-scale re-analysis are made available in the presented MetaMap resource. Conclusions Our results demonstrate that common human RNA-seq data, including those archived in public repositories, might contain valuable information to correlate microbial and viral detection patterns with diverse diseases. The presented MetaMap database thus provides a rich resource for hypothesis generation toward the role of the microbiome in human disease. Additionally, code to process new datasets and perform statistical analyses is made available.
Affiliation(s)
- L M Simon
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- S Karg
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- A J Westermann
  - Institute of Molecular Infection Biology, University of Würzburg, Würzburg, Germany
  - Helmholtz Institute for RNA-Based Infection Research, Würzburg, Germany
- M Engel
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Scientific Computing Research Unit, Neuherberg, Germany
- A H A Elbehery
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Virology, Neuherberg, Germany
- B Hense
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- M Heinig
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
- L Deng
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Virology, Neuherberg, Germany
- F J Theis
  - Helmholtz Zentrum München, German Research Center for Environmental Health, Institute of Computational Biology, Neuherberg, Germany
  - Department of Mathematics, Technische Universität München, Munich, Germany
10
Tang KW, Larsson E. Tumour virology in the era of high-throughput genomics. Philos Trans R Soc Lond B Biol Sci 2018; 372:rstb.2016.0265. [PMID: 28893932 PMCID: PMC5597732 DOI: 10.1098/rstb.2016.0265] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Accepted: 06/09/2017] [Indexed: 12/12/2022] Open
Abstract
With the advent of massively parallel sequencing, oncogenic viruses in tumours can now be detected in an unbiased and comprehensive manner. Additionally, new viruses or strains can be discovered based on sequence similarity with known viruses. Using this approach, the causative agent for Merkel cell carcinoma was identified. Subsequent studies using data from large collections of tumours have confirmed models built during decades of hypothesis-driven and low-throughput research, and a more detailed and comprehensive description of virus-tumour associations has emerged. Notably, large cohorts and high sequencing depth, in combination with newly developed bioinformatical techniques, have made it possible to rule out several suggested virus-tumour associations with a high degree of confidence. In this review we discuss possibilities, limitations and insights gained from using massively parallel sequencing to characterize tumours with viral content, with emphasis on detection of viral sequences and genomic integration events. This article is part of the themed issue 'Human oncogenic viruses'.
Affiliation(s)
- Ka-Wei Tang
- Department of Infectious Diseases, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 9A, 405 30 Gothenburg, Sweden
- Erik Larsson
- Department of Medical Biochemistry and Cell Biology, Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Medicinaregatan 9A, 405 30 Gothenburg, Sweden
11
Wolf T, Kämmer P, Brunke S, Linde J. Two's company: studying interspecies relationships with dual RNA-seq. Curr Opin Microbiol 2017; 42:7-12. [PMID: 28957710 DOI: 10.1016/j.mib.2017.09.001] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 07/20/2017] [Revised: 08/24/2017] [Accepted: 09/01/2017] [Indexed: 01/03/2023]
Abstract
Organisms do not exist in isolation from each other, but constantly interact. Cells can sense the presence of interaction partners by a range of receptors and, via complex regulatory networks, specifically react by changing the expression of many of their genes. Technological advances in next-generation sequencing in recent years now allow us to apply RNA sequencing to two species at the same time (dual RNA-seq), and thus to directly study the gene expression of two interacting species without the need to physically separate cells or RNA. In this review, we give an overview of the latest studies on interspecies interactions made possible by dual RNA-seq, ranging from pathogenic to symbiotic relationships. We summarize state-of-the-art experimental techniques, bioinformatic data analysis and data interpretation, while also highlighting potential problems and pitfalls, from the selection of meaningful time points and number of reads to matters of rRNA depletion. A short outlook on new trends in the field of dual RNA-seq concludes this review, looking at sequencing of non-coding RNAs during host-pathogen interactions and the prediction of molecular interspecies interaction networks.
Affiliation(s)
- Thomas Wolf
- Research Group Systems Biology and Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany
- Philipp Kämmer
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany
- Sascha Brunke
- Department of Microbial Pathogenicity Mechanisms, Leibniz Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany
- Jörg Linde
- Research Group PiDOMICS, Leibniz Institute for Natural Product Research and Infection Biology, Hans-Knoell-Institute, Jena, Germany.
12
Peng FY, Yang RC. Prediction and analysis of three gene families related to leaf rust (Puccinia triticina) resistance in wheat (Triticum aestivum L.). BMC Plant Biology 2017; 17:108. [PMID: 28633642 PMCID: PMC5477749 DOI: 10.1186/s12870-017-1056-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/02/2017] [Accepted: 06/06/2017] [Indexed: 05/30/2023]
Abstract
BACKGROUND The resistance to leaf rust (Lr) caused by Puccinia triticina in wheat (Triticum aestivum L.) has been well studied over the past decades with over 70 Lr genes being mapped on different chromosomes and numerous QTLs (quantitative trait loci) being detected or mapped using DNA markers. Such resistance is often divided into race-specific and race-nonspecific resistance. The race-nonspecific resistance can be further divided into resistance to most or all races of the same pathogen and resistance to multiple pathogens. At the molecular level, these three types of resistance may cover across the whole spectrum of pathogen specificities that are controlled by genes encoding different protein families in wheat. The objective of this study is to predict and analyze genes in three such families: NBS-LRR (nucleotide-binding sites and leucine-rich repeats or NLR), START (Steroidogenic Acute Regulatory protein [STaR] related lipid-transfer) and ABC (ATP-Binding Cassette) transporter. The focus of the analysis is on the patterns of relationships between these protein-coding genes within the gene families and QTLs detected for leaf rust resistance. RESULTS We predicted 526 ABC, 1117 NLR and 144 START genes in the hexaploid wheat genome through a domain analysis of wheat proteome. Of the 1809 SNPs from leaf rust resistance QTLs in seedling and adult stages of wheat, 126 SNPs were found within coding regions of these genes or their neighborhood (5 Kb upstream from transcription start site [TSS] or downstream from transcription termination site [TTS] of the genes). Forty-three of these SNPs for adult resistance and 18 SNPs for seedling resistance reside within coding or neighboring regions of the ABC genes whereas 14 SNPs for adult resistance and 29 SNPs for seedling resistance reside within coding or neighboring regions of the NLR gene. 
Moreover, we found 17 nonsynonymous SNPs for adult resistance and five SNPs for seedling resistance in the ABC genes, and five nonsynonymous SNPs for adult resistance and six SNPs for seedling resistance in the NLR genes. Most of these coding SNPs were predicted to alter encoded amino acids and such information may serve as a starting point towards more thorough molecular and functional characterization of the designated Lr genes. Using the primer sequences of 99 known non-SNP markers from leaf rust resistance QTLs, we found candidate genes closely linked to these markers, including Lr34 with distances to its two gene-specific markers being 1212 bases (to cssfr1) and 2189 bases (to cssfr2). CONCLUSION This study represents a comprehensive analysis of ABC, NLR and START genes in the hexaploid wheat genome and their physical relationships with QTLs for leaf rust resistance at seedling and adult stages. Our analysis suggests that the ABC (and START) genes are more likely to be co-located with QTLs for race-nonspecific, adult resistance whereas the NLR genes are more likely to be co-located with QTLs for race-specific resistance that would be often expressed at the seedling stage. Though our analysis was hampered by inaccurate or unknown physical positions of numerous QTLs due to the incomplete assembly of the complex hexaploid wheat genome that is currently available, the observed associations between (i) QTLs for race-specific resistance and NLR genes and (ii) QTLs for nonspecific resistance and ABC genes will help discover SNP variants for leaf rust resistance at seedling and adult stages. The genes containing nonsynonymous SNPs are promising candidates that can be investigated in future studies as potential new sources of leaf rust resistance in wheat breeding.
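The neighbourhood criterion in the abstract above (a SNP counts when it lies within a gene or within 5 Kb upstream of the TSS or downstream of the TTS) can be illustrated with a minimal sketch. This is not the authors' code: the `Gene` record, the function name `snp_near_gene`, and all coordinates are hypothetical, and the sketch assumes each gene is described by a single start-to-end span rather than by its individual coding exons.

```python
from dataclasses import dataclass

FLANK_BP = 5_000  # 5 Kb neighbourhood, as stated in the abstract


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; start/end are 1-based inclusive and span TSS to TTS."""
    name: str
    chrom: str
    start: int
    end: int


def snp_near_gene(chrom: str, pos: int, gene: Gene, flank: int = FLANK_BP) -> bool:
    """Return True if the SNP lies in the gene span or within `flank` bp of either end."""
    if chrom != gene.chrom:
        return False
    return gene.start - flank <= pos <= gene.end + flank


# Illustrative coordinates only (not real wheat annotations).
gene = Gene(name="example_ABC", chrom="7D", start=100_000, end=112_000)
print(snp_near_gene("7D", 96_500, gene))   # inside the 5 Kb window before the gene start
print(snp_near_gene("7D", 120_000, gene))  # beyond the 5 Kb window after the gene end
```

Because the window is the same size on both sides, the check does not depend on which strand the gene is on; a strand-aware version would only matter if upstream and downstream distances differed.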
Affiliation(s)
- Fred Y Peng
- Department of Agricultural, Food and Nutritional Science, University of Alberta, 410 Agriculture/Forestry Centre, Edmonton, AB, T6G 2P5, Canada
- Rong-Cai Yang
- Department of Agricultural, Food and Nutritional Science, University of Alberta, 410 Agriculture/Forestry Centre, Edmonton, AB, T6G 2P5, Canada.
- Feed Crops Section, Alberta Agriculture and Forestry, 7000 - 113 Street, Edmonton, AB, T6H 5T6, Canada.
13
Doggett NA, Mukundan H, Lefkowitz EJ, Slezak TR, Chain PS, Morse S, Anderson K, Hodge DR, Pillai S. Culture-Independent Diagnostics for Health Security. Health Secur 2017; 14:122-42. [PMID: 27314653 DOI: 10.1089/hs.2015.0074] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Indexed: 12/20/2022] Open
Abstract
The past decade has seen considerable development in the diagnostic application of nonculture methods, including nucleic acid amplification-based methods and mass spectrometry, for the diagnosis of infectious diseases. The implications of these new culture-independent diagnostic tests (CIDTs) include bypassing the need to culture organisms, thus potentially affecting public health surveillance systems, which continue to use isolates as the basis of their surveillance programs and to assess phenotypic resistance to antimicrobial agents. CIDTs may also affect the way public health practitioners detect and respond to a bioterrorism event. In response to a request from the Department of Homeland Security, Los Alamos National Laboratory and the Centers for Disease Control and Prevention cosponsored a workshop to review the impact of CIDTs on the rapid detection and identification of biothreat agents. Four panel discussions were held that covered nucleic acid amplification-based diagnostics, mass spectrometry, antibody-based diagnostics, and next-generation sequencing. Exploiting the extensive expertise available at this workshop, we identified the key features, benefits, and limitations of the various CIDT methods for providing rapid pathogen identification that are critical to the response and mitigation of a bioterrorism event. After the workshop we conducted a thorough review of the literature, investigating the current state of these 4 culture-independent diagnostic methods. This article combines information from the literature review and the insights obtained at the workshop.
14
Gowtham YK, Saski CA, Harcum SW. Low glucose concentrations within typical industrial operating conditions have minimal effect on the transcriptome of recombinant CHO cells. Biotechnol Prog 2017; 33:771-785. [PMID: 28371311 DOI: 10.1002/btpr.2462] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 10/10/2016] [Revised: 01/07/2017] [Indexed: 12/16/2022]
Abstract
Typically, mammalian cell culture medium contains high glucose concentrations that are analogous to diabetic levels in humans, suggesting that mammalian cells are cultivated in excessive glucose. Using RNA-Seq, this study characterized the Chinese hamster ovary (CHO) cell transcriptome under two glucose concentrations to assess the genetic effects associated with metabolic pathways, in addition to other global responses. The initial extracellular glucose concentrations used represented high (30 mM) and low (10 mM) glucose conditions, where at the time the transcriptomes were compared, the glucose concentrations were approximately 24 and 4.4 mM for the mid-exponential cultures, where 4.4 mM represents a common target concentration in the biopharmaceutical industry for controlled fed-batch cultures. A recombinant CHO cell line producing a monoclonal antibody was used, such that the impact on glycosylation genes could be evaluated. Relatively few genes were identified as being significantly different (FDR ≤ 0.01) between the high and low glucose conditions, for example, only 575 genes, and only 40 of these genes had 2-fold or greater differences. Gene expression differences for glycolysis, TCA cycle, and glycosylation-related reactions were minimal and unlikely to have biological significance. This transcriptome study indicates that low glucose concentrations in the culture medium are unlikely to cause any biologically significant or detrimental changes to CHO cells at the transcriptome level. Furthermore, it is well-known that maintaining low glucose concentrations in fed-batch cultures can reduce lactate production, which in turn improves process outcomes. Taken together, the transcriptome data supports the continued development of low glucose-based processes to control lactate. © 2017 American Institute of Chemical Engineers Biotechnol. Prog., 33:771-785, 2017.
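The two-stage gene filter mentioned in the abstract above (FDR ≤ 0.01, then a 2-fold or greater difference) can be sketched as follows. The gene labels, fold changes, and FDR values are invented for illustration, the helper names are hypothetical, and the abstract does not say how fold changes were computed; the sketch assumes a simple high/low expression ratio and treats 2-fold increases and decreases alike.

```python
import math

FDR_CUTOFF = 0.01  # significance threshold quoted in the abstract
MIN_FOLD = 2.0     # "2-fold or greater", taken here in either direction


def is_significant(fdr: float) -> bool:
    """True when the adjusted p-value meets the FDR cutoff."""
    return fdr <= FDR_CUTOFF


def is_large_change(fold_change: float) -> bool:
    """fold_change is a positive expression ratio; 2-fold up or down both count."""
    return abs(math.log2(fold_change)) >= math.log2(MIN_FOLD)


# Hypothetical rows: (gene label, fold change, FDR)
rows = [
    ("geneA", 2.5, 0.001),
    ("geneB", 1.2, 0.005),
    ("geneC", 0.4, 0.009),
    ("geneD", 3.0, 0.2),
]

significant = [row for row in rows if is_significant(row[2])]
large = [row for row in significant if is_large_change(row[1])]
print(len(significant), len(large))
```

Working on the log2 scale keeps the threshold symmetric, so a ratio of 0.5 is treated the same as a ratio of 2.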
Affiliation(s)
- Christopher A Saski
- Inst. of Translational Genomics, Clemson University, Clemson, SC, 29634; Dept. of Genetics and Biochemistry, Clemson University, Clemson, SC, 29634
- Sarah W Harcum
- Dept. of Bioengineering, Clemson University, Clemson, SC, 29634
15
Fasterius E, Raso C, Kennedy S, Rauch N, Lundin P, Kolch W, Uhlén M, Al-Khalili Szigyarto C. A novel RNA sequencing data analysis method for cell line authentication. PLoS One 2017; 12:e0171435. [PMID: 28192450 PMCID: PMC5305277 DOI: 10.1371/journal.pone.0171435] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 09/01/2016] [Accepted: 01/20/2017] [Indexed: 11/19/2022] Open
Abstract
We have developed a novel analysis method that can interrogate the authenticity of biological samples used for generation of transcriptome profiles in public data repositories. The method uses RNA sequencing information to reveal mutations in expressed transcripts and subsequently confirms the identity of analysed cells by comparison with publicly available cell-specific mutational profiles. Cell lines constitute key model systems widely used within cancer research, but their identity needs to be confirmed in order to minimise the influence of cell contaminations and genetic drift on the analysis. Using both public and novel data, we demonstrate the use of RNA-sequencing data analysis for cell line authentication by examining the validity of COLO205, DLD1, HCT15, HCT116, HKE3, HT29 and RKO colorectal cancer cell lines. We successfully authenticate the studied cell lines and validate previous reports indicating that DLD1 and HCT15 are synonymous. We also show that the analysed HKE3 cells harbour an unexpected KRAS-G13D mutation and confirm that this cell line is a genuine KRAS dosage mutant, rather than a true isogenic derivative of HCT116 expressing only the wild type KRAS. This authentication method could be used to revisit the numerous cell line based RNA sequencing experiments available in public data repositories, analyse new experiments where whole genome sequencing is not available, as well as facilitate comparisons of data from different experiments, platforms and laboratories.
Affiliation(s)
- Erik Fasterius
- School of Biotechnology, Royal Institute of Technology, Stockholm, Sweden
- Cinzia Raso
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Susan Kennedy
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Nora Rauch
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Pär Lundin
- Science for Life Laboratory, Dept of Biochemistry and Biophysics, Stockholm University, Stockholm, Sweden
- Walter Kolch
- Systems Biology Ireland, University College Dublin, Belfield, Dublin 4, Ireland
- Conway Institute of Biomolecular & Biomedical Research, University College Dublin, Belfield, Dublin 4, Ireland
- School of Medicine, University College Dublin, Belfield, Dublin 4, Ireland
- Mathias Uhlén
- School of Biotechnology, Royal Institute of Technology, Stockholm, Sweden
- Science for Life Laboratory, Stockholm, Sweden
16
Cox JW, Ballweg RA, Taft DH, Velayutham P, Haslam DB, Porollo A. A fast and robust protocol for metataxonomic analysis using RNAseq data. Microbiome 2017; 5:7. [PMID: 28103917 PMCID: PMC5244565 DOI: 10.1186/s40168-016-0219-5] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/21/2016] [Accepted: 12/05/2016] [Indexed: 05/03/2023]
Abstract
BACKGROUND Metagenomics is a rapidly emerging field aimed to analyze microbial diversity and dynamics by studying the genomic content of the microbiota. Metataxonomics tools analyze high-throughput sequencing data, primarily from 16S rRNA gene sequencing and DNAseq, to identify microorganisms and viruses within a complex mixture. With the growing demand for analysis of the functional microbiome, metatranscriptome studies attract more interest. To make metatranscriptomic data sufficient for metataxonomics, new analytical workflows are needed to deal with sparse and taxonomically less informative sequencing data. RESULTS We present a new protocol, IMSA+A, for accurate taxonomy classification based on metatranscriptome data of any read length that can efficiently and robustly identify bacteria, fungi, and viruses in the same sample. The new protocol improves accuracy by using a conservative reference database, employing a new counting scheme, and by assembling shotgun reads. Assembly also reduces analysis runtime. Simulated data were utilized to evaluate the protocol by permuting common experimental variables. When applied to the real metatranscriptome data for mouse intestines colonized by ASF, the protocol showed superior performance in detection of the microorganisms compared to the existing metataxonomics tools. IMSA+A is available at https://github.com/JeremyCoxBMI/IMSA-A . CONCLUSIONS The developed protocol addresses the need for taxonomy classification from RNAseq data. Previously not utilized, i.e., unmapped to a reference genome, RNAseq reads can now be used to gather taxonomic information about the microbiota present in a biological sample without conducting additional sequencing. Any metatranscriptome pipeline that includes assembly of reads can add this analysis with minimal additional cost of compute time. The new protocol also creates an opportunity to revisit old metatranscriptome data, where taxonomic content may be important but was not analyzed.
Affiliation(s)
- Jeremy W Cox
- Department of Electrical Engineering and Computing Systems, University of Cincinnati, 2901 Woodside Drive, Cincinnati, OH, 45221, USA
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Richard A Ballweg
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Diana H Taft
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA
- Prakash Velayutham
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- David B Haslam
- Division of Infectious Diseases, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA
- Aleksey Porollo
- The Center for Autoimmune Genomics and Etiology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, MLC 15012, Cincinnati, OH, 45229-3039, USA.
- Division of Biomedical Informatics, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, OH, 45229, USA.
17
Strong MJ, Blanchard E, Lin Z, Morris CA, Baddoo M, Taylor CM, Ware ML, Flemington EK. A comprehensive next generation sequencing-based virome assessment in brain tissue suggests no major virus - tumor association. Acta Neuropathol Commun 2016; 4:71. [PMID: 27402152 PMCID: PMC4940872 DOI: 10.1186/s40478-016-0338-z] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 06/14/2016] [Accepted: 06/15/2016] [Indexed: 12/15/2022] Open
Abstract
Next generation sequencing (NGS) can globally interrogate the genetic composition of biological samples in an unbiased yet sensitive manner. The objective of this study was to utilize the capabilities of NGS to investigate the reported association between glioblastoma multiforme (GBM) and human cytomegalovirus (HCMV). A large-scale comprehensive virome assessment was performed on publicly available sequencing datasets from the Cancer Genome Atlas (TCGA), including RNA-seq datasets from primary GBM (n = 157), recurrent GBM (n = 13), low-grade gliomas (n = 514), recurrent low-grade gliomas (n = 17), and normal brain (n = 5), and whole genome sequencing (WGS) datasets from primary GBM (n = 51), recurrent GBM (n = 10), and normal matched blood samples (n = 20). In addition, RNA-seq datasets from MRI-guided biopsies (n = 92) and glioma stem-like cell cultures (n = 9) were analyzed. Sixty-four DNA-seq datasets from 11 meningiomas and their corresponding blood control samples were also analyzed. Finally, three primary GBM tissue samples were obtained, sequenced using RNA-seq, and analyzed. After in-depth analysis, the most robust virus findings were the detection of papillomavirus (HPV) and hepatitis B reads in the occasional LGG sample (4 samples and 1 sample, respectively). In addition, low numbers of virus reads were detected in several datasets but detailed investigation of these reads suggest that these findings likely represent artifacts or non-pathological infections. For example, all of the sporadic low level HCMV reads were found to map to the immediate early promoter intimating that they likely originated from laboratory expression vector contamination. Despite the detection of low numbers of Epstein-Barr virus reads in some samples, these likely originated from infiltrating B-cells. 
Finally, reads aligning to human herpesvirus 6 and 7 were identified in all DNA-seq and a few RNA-seq datasets, but detailed analysis demonstrated that these were likely derived from the homologous human telomeric-like repeats. Other low abundance viral reads were detected in some samples but for most viruses, the reads likely represent artifacts or incidental infections. This analysis argues against associations between most known viruses and GBM or meningiomas. Nevertheless, there may be a low percentage association between HPV and/or hepatitis B and LGGs.
18
Enguita FJ, Costa MC, Fusco-Almeida AM, Mendes-Giannini MJ, Leitão AL. Transcriptomic Crosstalk between Fungal Invasive Pathogens and Their Host Cells: Opportunities and Challenges for Next-Generation Sequencing Methods. J Fungi (Basel) 2016; 2:jof2010007. [PMID: 29376924 PMCID: PMC5753088 DOI: 10.3390/jof2010007] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 10/22/2015] [Revised: 12/12/2015] [Accepted: 12/12/2015] [Indexed: 12/22/2022] Open
Abstract
Fungal invasive infections are an increasing health problem. The intrinsic complexity of pathogenic fungi and the unmet clinical need for new and more effective treatments requires a detailed knowledge of the infection process. During infection, fungal pathogens are able to trigger a specific transcriptional program in their host cells. The detailed knowledge of this transcriptional program will allow for a better understanding of the infection process and consequently will help in the future design of more efficient therapeutic strategies. Simultaneous transcriptomic studies of pathogen and host by high-throughput sequencing (dual RNA-seq) is an unbiased protocol to understand the intricate regulatory networks underlying the infectious process. This protocol is starting to be applied to the study of the interactions between fungal pathogens and their hosts. To date, our knowledge of the molecular basis of infection for fungal pathogens is still very limited, and the putative role of regulatory players such as non-coding RNAs or epigenetic factors remains elusive. The wider application of high-throughput transcriptomics in the near future will help to understand the fungal mechanisms for colonization and survival, as well as to characterize the molecular responses of the host cell against a fungal infection.
Affiliation(s)
- Francisco J Enguita
- Faculdade de Medicina, Universidade de Lisboa, Av. Professor Egas Moniz, Lisboa 1649-028, Portugal.
- Marina C Costa
- Faculdade de Medicina, Universidade de Lisboa, Av. Professor Egas Moniz, Lisboa 1649-028, Portugal.
- Ana Marisa Fusco-Almeida
- Núcleo de Proteômica, Faculdade de Ciências Farmacêuticas, Universidade Estadual Paulista-UNESP, Rodovia Araraquara-Jaú Km 1, Araraquara 14801-902, São Paulo, Brazil.
- Maria José Mendes-Giannini
- Núcleo de Proteômica, Faculdade de Ciências Farmacêuticas, Universidade Estadual Paulista-UNESP, Rodovia Araraquara-Jaú Km 1, Araraquara 14801-902, São Paulo, Brazil.
- Ana Lúcia Leitão
- MEtRICs, Departamento de Ciências e Tecnologia da Biomassa, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, Campus de Caparica, Caparica 2829-516, Portugal.
19
Velmeshev D, Lally P, Magistri M, Faghihi MA. CANEapp: a user-friendly application for automated next generation transcriptomic data analysis. BMC Genomics 2016; 17:49. [PMID: 26758513 PMCID: PMC4710974 DOI: 10.1186/s12864-015-2346-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/16/2015] [Accepted: 12/22/2015] [Indexed: 01/13/2023] Open
Abstract
BACKGROUND Next generation sequencing (NGS) technologies are indispensable for molecular biology research, but data analysis represents the bottleneck in their application. Users need to be familiar with computer terminal commands, the Linux environment, and various software tools and scripts. Analysis workflows have to be optimized and experimentally validated to extract biologically meaningful data. Moreover, as larger datasets are being generated, their analysis requires use of high-performance servers. RESULTS To address these needs, we developed CANEapp (application for Comprehensive automated Analysis of Next-generation sequencing Experiments), a unique suite that combines a Graphical User Interface (GUI) and an automated server-side analysis pipeline that is platform-independent, making it suitable for any server architecture. The GUI runs on a PC or Mac and seamlessly connects to the server to provide full GUI control of RNA-sequencing (RNA-seq) project analysis. The server-side analysis pipeline contains a framework that is implemented on a Linux server through completely automated installation of software components and reference files. Analysis with CANEapp is also fully automated and performs differential gene expression analysis and novel noncoding RNA discovery through alternative workflows (Cuffdiff and R packages edgeR and DESeq2). We compared CANEapp to other similar tools, and it significantly improves on previous developments. We experimentally validated CANEapp's performance by applying it to data derived from different experimental paradigms and confirming the results with quantitative real-time PCR (qRT-PCR). CANEapp adapts to any server architecture by effectively using available resources and thus handles large amounts of data efficiently. CANEapp performance has been experimentally validated on various biological datasets. 
CANEapp is available free of charge at http://psychiatry.med.miami.edu/research/laboratory-of-translational-rna-genomics/CANE-app . CONCLUSIONS We believe that CANEapp will serve both biologists with no computational experience and bioinformaticians as a simple, timesaving but accurate and powerful tool to analyze large RNA-seq datasets and will provide foundations for future development of integrated and automated high-throughput genomics data analysis tools. Due to its inherently standardized pipeline and combination of automated analysis and platform-independence, CANEapp is ideal for large-scale collaborative RNA-seq projects between different institutions and research groups.
Affiliation(s)
- Dmitry Velmeshev
- Department of Psychiatry, University of Miami Miller School of Medicine, Miami, FL, 33136, USA; Department of Biochemistry & Molecular Biology, University of Miami Miller School of Medicine, Miami, FL, 33136, USA.
- Patrick Lally
- Department of Psychiatry, University of Miami Miller School of Medicine, Miami, FL, 33136, USA; Department of Biomedical Engineering, University of Miami, Coral Gables, FL, 33146, USA.
- Marco Magistri
- Department of Psychiatry, University of Miami Miller School of Medicine, Miami, FL, 33136, USA.
- Mohammad Ali Faghihi
- Department of Psychiatry, University of Miami Miller School of Medicine, Miami, FL, 33136, USA.
20
Han Y, Gao S, Muegge K, Zhang W, Zhou B. Advanced Applications of RNA Sequencing and Challenges. Bioinform Biol Insights 2015; 9:29-46. [PMID: 26609224 PMCID: PMC4648566 DOI: 10.4137/bbi.s28991] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 07/16/2015] [Revised: 09/30/2015] [Accepted: 10/02/2015] [Indexed: 12/18/2022] Open
Abstract
Next-generation sequencing technologies have revolutionized sequence-based research with the advantages of high throughput, high sensitivity, and high speed. RNA-seq is now widely used to uncover multiple facets of the transcriptome and to facilitate biological applications. However, the large-scale data analyses associated with RNA-seq pose challenges. In this study, we present a detailed overview of the applications of this technology and the challenges that need to be addressed, including data preprocessing, differential gene expression analysis, alternative splicing analysis, variant detection and allele-specific expression, pathway analysis, co-expression network analysis, and applications combining various experimental procedures beyond the achievements that have been made. Specifically, we discuss essential principles of computational methods that are required to meet the key challenges of RNA-seq data analysis, development of various bioinformatics tools, challenges associated with RNA-seq applications, and examples that represent the advances made so far in the characterization of the transcriptome.
Affiliation(s)
- Yixing Han
- Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA
- Shouguo Gao
- Bioinformatics and Systems Biology Core, National Heart Lung Blood Institute, National Institutes of Health, Rockville Pike, Bethesda, MD, USA
- Kathrin Muegge
- Mouse Cancer Genetics Program, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Frederick, MD, USA; Leidos Biomedical Research, Inc., Basic Science Program, Frederick National Laboratory, Frederick, MD, USA
- Wei Zhang
- Department of Medicine, University of California, San Diego, La Jolla, CA, USA
- Bing Zhou
- Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA
| |
Collapse
|
21
|
Encyclopedia of bacterial gene circuits whose presence or absence correlate with pathogenicity--a large-scale system analysis of decoded bacterial genomes. BMC Genomics 2015; 16:773. [PMID: 26459834 PMCID: PMC4603813 DOI: 10.1186/s12864-015-1957-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/08/2015] [Accepted: 09/28/2015] [Indexed: 12/26/2022] Open
Abstract
BACKGROUND Bacterial infections comprise a global health challenge as the incidence of antibiotic resistance increases. The pathogenic potential of bacteria has been shown to be context-dependent, varying in response to environment and even among strains of the same genus. RESULTS We used the KEGG repository and extensive literature searches to identify, among the 2527 bacterial genomes in the literature, those implicated as pathogenic to the host, including those which show pathogenicity in a context-dependent manner. Using data on the gene contents of these genomes, we identified sets of genes highly abundant in pathogenic but relatively absent in commensal strains and vice versa. In addition, we carried out genome comparison within a genus for the seventeen largest genera in our genome collection. We projected the resultant lists of ortholog genes onto KEGG bacterial pathways to identify clusters and circuits, which can be linked to either pathogenicity or synergy. Gene circuits relatively abundant in nonpathogenic bacteria often mediated biosynthesis of antibiotics. Other synergy-linked circuits reduced drug-induced toxicity. Pathogen-abundant gene circuits included modules in one-carbon folate, two-component system, type-3 secretion system, and peptidoglycan biosynthesis. Antibiotic-resistant bacterial strains possessed genes modulating phagocytosis, vesicle trafficking, cytoskeletal reorganization, and regulation of the inflammatory response. Our study also identified bacterial genera containing a circuit, elements of which were previously linked to Alzheimer's disease. CONCLUSIONS The present study produces, for the first time, a signature in the form of a robust list of gene circuitry whose presence or absence could potentially define the pathogenicity of a microbiome. An extensive literature search substantiated the large majority of the commensal and pathogenic circuitry in our predicted list. Scanning microbiome libraries for these circuitry motifs will provide further insights into the complex and context-dependent pathogenicity of bacteria.
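The presence/absence comparison described in this abstract can be illustrated with a toy sketch. It is not the authors' method: the abstract does not state how "highly abundant" or "relatively absent" were quantified, so the `min_gap` threshold, the function names, and the input layout (genome name mapped to a set of gene identifiers) are all assumptions made for illustration.

```python
from collections import Counter


def presence_fraction(genomes, names):
    """Fraction of the named genomes that contain each gene."""
    counts = Counter()
    for name in names:
        # Each genome is a set of gene IDs, so a gene counts once per genome.
        counts.update(genomes[name])
    n = len(names)
    return {gene: count / n for gene, count in counts.items()}


def differential_genes(genomes, labels, min_gap=0.5):
    """Split genes by the gap in presence between pathogenic and commensal genomes.

    min_gap is a hypothetical cutoff chosen for this example only.
    Returns (pathogen_enriched, commensal_enriched), each sorted by gene ID.
    """
    pathogenic = [n for n, lab in labels.items() if lab == "pathogenic"]
    commensal = [n for n, lab in labels.items() if lab == "commensal"]
    frac_p = presence_fraction(genomes, pathogenic)
    frac_c = presence_fraction(genomes, commensal)

    pathogen_enriched, commensal_enriched = [], []
    for gene in sorted(set(frac_p) | set(frac_c)):
        gap = frac_p.get(gene, 0.0) - frac_c.get(gene, 0.0)
        if gap >= min_gap:
            pathogen_enriched.append(gene)
        elif gap <= -min_gap:
            commensal_enriched.append(gene)
    return pathogen_enriched, commensal_enriched
```

A real analysis would likely replace the fixed gap with a statistical test and a multiple-testing correction before projecting the resulting gene lists onto pathways.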
|
22
|
Birol I, Behsaz B, Hammond SA, Kucuk E, Veldhoen N, Helbing CC. De novo Transcriptome Assemblies of Rana (Lithobates) catesbeiana and Xenopus laevis Tadpole Livers for Comparative Genomics without Reference Genomes. PLoS One 2015; 10:e0130720. [PMID: 26121473 PMCID: PMC4488148 DOI: 10.1371/journal.pone.0130720] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 01/23/2015] [Accepted: 05/23/2015] [Indexed: 12/04/2022] Open
Abstract
In this work we studied the liver transcriptomes of two frog species, the American bullfrog (Rana (Lithobates) catesbeiana) and the African clawed frog (Xenopus laevis). We used high throughput RNA sequencing (RNA-seq) data to assemble and annotate these transcriptomes, and compared how their baseline expression profiles change when tadpoles of the two species are exposed to thyroid hormone. We generated more than 1.5 billion RNA-seq reads in total for the two species under two conditions as treatment/control pairs. We de novo assembled these reads using Trans-ABySS to reconstruct reference transcriptomes, obtaining over 350,000 and 130,000 putative transcripts for R. catesbeiana and X. laevis, respectively. Using available genomics resources for X. laevis, we annotated over 97% of our X. laevis transcriptome contigs, demonstrating the utility and efficacy of our methodology. Leveraging this validated analysis pipeline, we also annotated the assembled R. catesbeiana transcriptome. We used the expression profiles of the annotated genes of the two species to examine the similarities and differences between the tadpole liver transcriptomes. We also compared the gene ontology terms of expressed genes to measure how the animals react to a challenge by thyroid hormone. Our study reports three main conclusions. First, de novo assembly of RNA-seq data is a powerful method for annotating and establishing transcriptomes of non-model organisms. Second, the liver transcriptomes of the two frog species, R. catesbeiana and X. laevis, show many common features, and the distributions of their gene ontology profiles are statistically indistinguishable. Third, although they broadly respond the same way to the presence of thyroid hormone in their environment, their receptor/signal transduction pathways display marked differences.
Affiliation(s)
- Inanc Birol: Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Bahar Behsaz: Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- S. Austin Hammond: Department of Biochemistry and Microbiology, University of Victoria, P.O. Box 1700, Stn CSC, Victoria, BC, V8W 2Y2, Canada
- Erdi Kucuk: Canada’s Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, V5Z 4S6, Canada
- Nik Veldhoen: Department of Biochemistry and Microbiology, University of Victoria, P.O. Box 1700, Stn CSC, Victoria, BC, V8W 2Y2, Canada
- Caren C. Helbing: Department of Biochemistry and Microbiology, University of Victoria, P.O. Box 1700, Stn CSC, Victoria, BC, V8W 2Y2, Canada
|
23
|
Poplawski A, Marini F, Hess M, Zeller T, Mazur J, Binder H. Systematically evaluating interfaces for RNA-seq analysis from a life scientist perspective. Brief Bioinform 2015; 17:213-23. [PMID: 26108229 DOI: 10.1093/bib/bbv036] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 03/09/2015] [Indexed: 11/13/2022] Open
Abstract
RNA-sequencing (RNA-seq) has become an established way for measuring gene expression in model organisms and humans. While methods development for refining the corresponding data processing and analysis pipeline is ongoing, protocols for typical steps have been proposed and are widely used. Several user interfaces have been developed for making such analysis steps accessible to life scientists without extensive knowledge of command line tools. We performed a systematic search and evaluation of such interfaces to investigate to what extent these can indeed facilitate RNA-seq data analysis. We found a total of 29 open source interfaces, and six of the more widely used interfaces were evaluated in detail. Central criteria for evaluation were ease of configuration, documentation, usability, computational demand and reporting. No interface scored best in all of these criteria, indicating that the final choice will depend on the specific perspective of users and the corresponding weighting of criteria. Considerable technical hurdles had to be overcome in our evaluation. For many users, this will diminish potential benefits compared with command line tools, leaving room for future improvement of interfaces.
|
24
|
Durmuş S, Çakır T, Özgür A, Guthke R. A review on computational systems biology of pathogen-host interactions. Front Microbiol 2015; 6:235. [PMID: 25914674 PMCID: PMC4391036 DOI: 10.3389/fmicb.2015.00235] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Received: 12/01/2014] [Accepted: 03/10/2015] [Indexed: 12/27/2022] Open
Abstract
Pathogens manipulate the cellular mechanisms of host organisms via pathogen-host interactions (PHIs) in order to take advantage of the capabilities of host cells, leading to infections. The crucial role of these interspecies molecular interactions in initiating and sustaining infections necessitates a thorough understanding of the corresponding mechanisms. Unlike the traditional approach of considering the host or pathogen separately, a systems-level approach that considers the PHI system as a whole is indispensable for elucidating the mechanisms of infection. Following the technological advances in the post-genomic era, PHI data have been produced at large scale within the last decade. Systems biology-based methods for the inference and analysis of PHI regulatory, metabolic, and protein-protein networks to shed light on infection mechanisms are in increasing demand thanks to the availability of omics data. The knowledge derived from the PHIs may largely contribute to the identification of new and more efficient therapeutics to prevent or cure infections. There are recent efforts for the detailed documentation of these experimentally verified PHI data through Web-based databases. Despite these advances in data archiving, there are still large amounts of PHI data in the biomedical literature yet to be discovered, and novel text mining methods are in development to unearth such hidden data. Here, we review a collection of recent studies on computational systems biology of PHIs with a special focus on the methods for the inference and analysis of PHI networks, covering also the Web-based databases and text-mining efforts to unravel the data hidden in the literature.
Affiliation(s)
- Saliha Durmuş: Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Tunahan Çakır: Computational Systems Biology Group, Department of Bioengineering, Gebze Technical University, Kocaeli, Turkey
- Arzucan Özgür: Department of Computer Engineering, Boğaziçi University, Istanbul, Turkey
- Reinhard Guthke: Leibniz Institute for Natural Product Research and Infection Biology – Hans-Knoell-Institute, Jena, Germany
|
25
|
Christley S, Cockrell C, An G. Computational Studies of the Intestinal Host-Microbiota Interactome. COMPUTATION (BASEL, SWITZERLAND) 2015; 3:2-28. [PMID: 34765258 PMCID: PMC8580329 DOI: 10.3390/computation3010002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
Abstract
A large and growing body of research implicates aberrant immune response and compositional shifts of the intestinal microbiota in the pathogenesis of many intestinal disorders. The molecular and physical interaction between the host and the microbiota, known as the host-microbiota interactome, is one of the key drivers in the pathophysiology of many of these disorders. This host-microbiota interactome is a set of dynamic and complex processes, and needs to be treated as a distinct entity and subject for study. Disentangling this complex web of interactions will require novel approaches, using a combination of data-driven bioinformatics with knowledge-driven computational modeling. This review describes the computational approaches for investigating the host-microbiota interactome, with emphasis on the human intestinal tract and innate immunity, and highlights open challenges and existing gaps in the computation methodology for advancing our knowledge about this important facet of human health.
Affiliation(s)
- Scott Christley: Department of Surgery, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA
- Chase Cockrell: Department of Surgery, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA
- Gary An: Department of Surgery, University of Chicago, 5841 South Maryland Avenue, Chicago, IL 60637, USA
|
26
|
Wang Q, Jia P, Zhao Z. VERSE: a novel approach to detect virus integration in host genomes through reference genome customization. Genome Med 2015; 7:2. [PMID: 25699093 PMCID: PMC4333248 DOI: 10.1186/s13073-015-0126-6] [Citation(s) in RCA: 46] [Impact Index Per Article: 5.1] [Received: 12/11/2014] [Accepted: 01/05/2015] [Indexed: 12/28/2022] Open
Abstract
Fueled by widespread applications of high-throughput next generation sequencing (NGS) technologies and urgent need to counter threats of pathogenic viruses, large-scale studies were conducted recently to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes, and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability as a result of virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection through customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2 that is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Affiliation(s)
- Qingguo Wang: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA
- Peilin Jia: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA
- Zhongming Zhao: Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203 USA; Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, TN 37232 USA; Department of Psychiatry, Vanderbilt University School of Medicine, Nashville, TN 37232 USA; Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, TN 37232 USA
|
27
|
Strong MJ, Xu G, Morici L, Splinter Bon-Durant S, Baddoo M, Lin Z, Fewell C, Taylor CM, Flemington EK. Microbial contamination in next generation sequencing: implications for sequence-based analysis of clinical samples. PLoS Pathog 2014; 10:e1004437. [PMID: 25412476 PMCID: PMC4239086 DOI: 10.1371/journal.ppat.1004437] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Indexed: 11/19/2022] Open
Abstract
The high level of accuracy and sensitivity of next generation sequencing for quantifying genetic material across organismal boundaries gives it tremendous potential for pathogen discovery and diagnosis in human disease. Despite this promise, substantial bacterial contamination is routinely found in existing human-derived RNA-seq datasets that likely arises from environmental sources. This raises the need for stringent sequencing and analysis protocols for studies investigating sequence-based microbial signatures in clinical samples.
Affiliation(s)
- Michael J. Strong: Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America; Tulane Cancer Center, Tulane University, New Orleans, Louisiana, United States of America
- Guorong Xu: Department of Genomic Medicine, University of California, San Diego, California, United States of America
- Lisa Morici: Department of Microbiology and Immunology, Tulane University, New Orleans, Louisiana, United States of America
- Sandra Splinter Bon-Durant: University of Wisconsin Biotechnology Center, University of Wisconsin, Madison, Wisconsin, United States of America
- Melody Baddoo: Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America; Tulane Cancer Center, Tulane University, New Orleans, Louisiana, United States of America
- Zhen Lin: Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America; Tulane Cancer Center, Tulane University, New Orleans, Louisiana, United States of America
- Claire Fewell: Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America; Tulane Cancer Center, Tulane University, New Orleans, Louisiana, United States of America
- Christopher M. Taylor: Department of Microbiology, Immunology & Parasitology, Louisiana State University School of Medicine, New Orleans, Louisiana, United States of America; Research Institute for Children, Children's Hospital of New Orleans, New Orleans, Louisiana, United States of America
- Erik K. Flemington: Department of Pathology, Tulane University, New Orleans, Louisiana, United States of America; Tulane Cancer Center, Tulane University, New Orleans, Louisiana, United States of America
|
28
|
Expanding the conversation on high-throughput virome sequencing standards to include consideration of microbial contamination sources. mBio 2014; 5:e01989. [PMID: 25352620 PMCID: PMC4217176 DOI: 10.1128/mbio.01989-14] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/29/2022] Open
|
29
|
The impact of "omic" and imaging technologies on assessing the host immune response to biodefence agents. J Immunol Res 2014; 2014:237043. [PMID: 25333059 PMCID: PMC4182007 DOI: 10.1155/2014/237043] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 04/15/2014] [Revised: 07/23/2014] [Accepted: 08/05/2014] [Indexed: 01/08/2023] Open
Abstract
Understanding the interactions between host and pathogen is important for the development and assessment of medical countermeasures to infectious agents, including potential biodefence pathogens such as Bacillus anthracis, Ebola virus, and Francisella tularensis. This review focuses on technological advances which allow this interaction to be studied in much greater detail, namely: the use of “omic” technologies (next generation sequencing, DNA and protein microarrays) for dissecting the underlying host response to infection at the molecular level; optical imaging techniques (flow cytometry and fluorescence microscopy) for assessing cellular responses to infection; and biophotonic imaging for visualising the infectious disease process. All of these technologies hold great promise for important breakthroughs in the rational development of vaccines and therapeutics for biodefence agents.
|
30
|
Chu J, Sadeghi S, Raymond A, Jackman SD, Nip KM, Mar R, Mohamadi H, Butterfield YS, Robertson AG, Birol I. BioBloom tools: fast, accurate and memory-efficient host species sequence screening using bloom filters. Bioinformatics 2014; 30:3402-4. [PMID: 25143290 PMCID: PMC4816029 DOI: 10.1093/bioinformatics/btu558] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.1] [Indexed: 12/31/2022]
Abstract
Large datasets can be screened for sequences from a specific organism, quickly and with low memory requirements, by a data structure that supports time- and memory-efficient set membership queries. Bloom filters offer such queries but require that false positives be controlled. We present BioBloom Tools, a Bloom filter-based sequence-screening tool that is faster than BWA, Bowtie 2 (popular alignment algorithms) and FACS (a membership query algorithm). It delivers accuracies comparable with these tools, controls false positives and has low memory requirements. Availability and implementation: www.bcgsc.ca/platform/bioinfo/software/biobloomtools Contact: cjustin@bcgsc.ca or ibirol@bcgsc.ca Supplementary information: Supplementary data are available at Bioinformatics online.
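The set-membership idea behind this abstract can be sketched in a few lines. This is a generic Bloom filter over k-mers, not BioBloom Tools' implementation: the class, the helper names (`build_filter`, `hit_fraction`), the salted BLAKE2b hashing scheme, and the default sizes are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter over strings (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash by salting BLAKE2b with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=i.to_bytes(2, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield every overlapping substring of length k."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_filter(reference, k, num_bits=1 << 16, num_hashes=3):
    """Insert all k-mers of a reference sequence into a new filter."""
    bf = BloomFilter(num_bits, num_hashes)
    for kmer in kmers(reference, k):
        bf.add(kmer)
    return bf


def hit_fraction(read, bf, k):
    """Fraction of the read's k-mers found in the filter (0.0 if the read is shorter than k)."""
    total = hits = 0
    for kmer in kmers(read, k):
        total += 1
        hits += kmer in bf
    return hits / total if total else 0.0
```

A read drawn from the reference always scores 1.0, because a Bloom filter never reports a false negative. Reads from other organisms can score above zero through false positives, which is why a screening tool needs a score threshold and a filter sized for the reference.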
Affiliation(s)
- Justin Chu: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Sara Sadeghi: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Anthony Raymond: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Shaun D Jackman: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Ka Ming Nip: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Richard Mar: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Hamid Mohamadi: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Yaron S Butterfield: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- A Gordon Robertson: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
- Inanç Birol: Canada's Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC V5Z 4S6, Canada
|
31
|
Comprehensive high-throughput RNA sequencing analysis reveals contamination of multiple nasopharyngeal carcinoma cell lines with HeLa cell genomes. J Virol 2014; 88:10696-704. [PMID: 24991015 DOI: 10.1128/jvi.01457-14] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.8] [Indexed: 12/12/2022] Open
Abstract
In an attempt to explore infectious agents associated with nasopharyngeal carcinomas (NPCs), we employed our high-throughput RNA sequencing (RNA-seq) analysis pipeline, RNA CoMPASS, to investigate the presence of ectopic organisms within a number of NPC cell lines commonly used by NPC and Epstein-Barr virus (EBV) researchers. Sequencing data sets from both CNE1 and HONE1 were found to contain reads for human papillomavirus 18 (HPV-18). Subsequent real-time reverse transcription-PCR (RT-PCR) analysis on a panel of NPC cell lines identified HPV-18 in CNE1 and HONE1 as well as three additional NPC cell lines (CNE2, AdAH, and NPC-KT). Further analysis of the chromosomal integration arrangement of HPV-18 in NPCs revealed patterns identical to those observed in HeLa cells. Clustering based on human single nucleotide variation (SNV) analysis of two separate HeLa cell lines and several NPC cell lines demonstrated two distinct clusters, with CNE1 and HONE1 clustering with the two HeLa cell lines. In addition, duplex-PCR-based genotyping showed that CNE1, CNE2, and HONE1 do not have a HeLa cell-specific L1 retrotransposon insertion, suggesting that these three HPV-18(+) NPC lines are likely products of a somatic hybridization with HeLa cells, which is also consistent with our RNA-seq-based gene level SNV analysis. Taking all of these findings together, we conclude that a widespread HeLa contamination may exist in many NPC cell lines, and authentication of these cell lines is recommended. Finally, we provide a proof of concept for the utility of an RNA-seq-based approach for cell authentication. IMPORTANCE Nasopharyngeal carcinoma (NPC) cell lines are important model systems for analyzing the complex life cycle and pathogenesis of Epstein-Barr virus (EBV). Using an RNA-seq-based approach, we found HeLa cell contamination in several NPC cell lines that are commonly used in the EBV and related fields. Our data support the notion that contamination resulted from somatic hybridization with HeLa cells, likely occurring at the point of cell line establishment. Given the rarity of NPCs, the long history of NPC cell lines, and the lack of rigorous cell line authentication, it is likely that the actual prevalence and impact of HeLa cell contamination on the EBV field might be greater. We therefore recommend cell line authentication prior to performing experiments using NPC cell lines to avoid inaccurate conclusions. The novel RNA-seq-based cell authentication approach reported here can serve as a comprehensive method for validating cell lines.
|