1
Luebbert L, Sullivan DK, Carilli M, Hjörleifsson KE, Winnett AV, Chari T, Pachter L. Efficient and accurate detection of viral sequences at single-cell resolution reveals putative novel viruses perturbing host gene expression. bioRxiv 2024:2023.12.11.571168. [PMID: 38168363 PMCID: PMC10760059 DOI: 10.1101/2023.12.11.571168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/05/2024]
Abstract
There are an estimated 300,000 mammalian viruses from which infectious diseases in humans may arise. They inhabit human tissues such as the lungs, blood, and brain and often remain undetected. Efficient and accurate detection of viral infection is vital to understanding its impact on human health and to making accurate predictions to limit adverse effects, such as future epidemics. The increasing use of high-throughput sequencing methods in research, agriculture, and healthcare provides an opportunity for the cost-effective surveillance of viral diversity and investigation of virus-disease correlation. However, existing methods for identifying viruses in sequencing data rely on and are limited to reference genomes or cannot retain single-cell resolution through cell barcode tracking. We introduce a method that accurately and rapidly detects viral sequences in bulk and single-cell transcriptomics data based on highly conserved amino acid domains, which enables the detection of RNA viruses covering up to 10^12 virus species. The analysis of viral presence and host gene expression in parallel at single-cell resolution allows for the characterization of host viromes and the identification of viral tropism and host responses. We applied our method to identify putative novel viruses in rhesus macaque PBMC data that display cell type specificity and whose presence correlates with altered host gene expression.
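The idea of detecting viral reads through conserved amino acid domains, rather than nucleotide reference genomes, can be illustrated with a minimal sketch. It assumes plain six-frame translation and exact substring matching, which is a simplification and not necessarily how the authors' method works. The function names and the short domain string `GDD` are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translations(read):
    """Translate a read in three forward and three reverse-complement frames."""
    frames = []
    for strand in (read, reverse_complement(read)):
        for offset in range(3):
            frames.append(translate(strand[offset:]))
    return frames


def contains_domain(read, domain):
    """True if the (hypothetical) amino acid domain occurs in any of the six frames."""
    return any(domain in frame for frame in six_frame_translations(read))


read = "ATGGGTGATGAT"                   # encodes M-G-D-D in forward frame 0
print(six_frame_translations(read)[0])  # MGDD
print(contains_domain(read, "GDD"))     # True
```

Because the comparison happens at the amino acid level, reads that differ from a reference only by synonymous nucleotide changes would still match, which is one plausible reason a protein-domain approach can cover divergent viruses.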
Affiliation(s)
- Laura Luebbert
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Delaney K. Sullivan
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Maria Carilli
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Alexander Viloria Winnett
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- UCLA-Caltech Medical Scientist Training Program, David Geffen School of Medicine, University of California, Los Angeles, California
- Tara Chari
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Lior Pachter
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California
- Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, California
2
Shankar R, Paithankar S, Gupta S, Chen B. Detection of viral infection in cell lines using ViralCellDetector. bioRxiv 2023:2023.07.21.550094. [PMID: 37546847 PMCID: PMC10401957 DOI: 10.1101/2023.07.21.550094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/08/2023]
Abstract
Cell lines are commonly used in research to study biology, including gene expression regulation, cancer progression, and drug responses. However, cross-contaminations with bacteria, mycoplasma, and viruses are common issues in cell line experiments. Detection of bacteria and mycoplasma infections in cell lines is relatively easy, but identifying viral infections in cell lines is difficult. Currently, there are no established methods or tools available for detecting viral infections in cell lines. To address this challenge, we developed a tool called ViralCellDetector that detects viruses by mapping RNA-seq data to a library of virus genomes. Using this tool, we observed that around 10% of experiments with the MCF7 cell line were likely infected with viruses. Furthermore, to facilitate the detection of samples with unknown sources of viral infection, we identified the differentially expressed genes involved in viral infection from two different cell lines and used these genes in a machine learning approach to classify infected samples based on the host response gene expression biomarkers. Our model reclassifies the infected and non-infected samples with an AUC of 0.91 and an accuracy of 0.93. Overall, our mapping- and marker-based approaches can detect viral infections in any cell line simply based on readily accessible RNA-seq data, allowing researchers to avoid the use of unintentionally infected cell lines in their studies.
Affiliation(s)
- Rama Shankar
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, Grand Rapids, MI 49503, USA
- Shreya Paithankar
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, Grand Rapids, MI 49503, USA
- Suchir Gupta
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, Grand Rapids, MI 49503, USA
- Bin Chen
- Department of Pediatrics and Human Development, College of Human Medicine, Michigan State University, Grand Rapids, MI 49503, USA
- Department of Pharmacology and Toxicology, College of Human Medicine, Michigan State University, Grand Rapids, Michigan, USA
- Department of Computer Science and Engineering, College of Engineering, Michigan State University, East Lansing, Michigan, USA
3
Chen Y, Wang Y, Zhou P, Huang H, Li R, Zeng Z, Cui Z, Tian R, Jin Z, Liu J, Huang Z, Li L, Huang Z, Tian X, Yu M, Hu Z. VIS Atlas: A Database of Virus Integration Sites in Human Genome from NGS Data to Explore Integration Patterns. Genomics, Proteomics & Bioinformatics 2023; 21:300-310. [PMID: 36804047 PMCID: PMC10626058 DOI: 10.1016/j.gpb.2023.02.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2022] [Revised: 01/08/2023] [Accepted: 02/10/2023] [Indexed: 02/17/2023]
Abstract
Integration of oncogenic DNA viruses into the human genome is a key step in most virus-induced carcinogenesis. Here, we constructed a virus integration site (VIS) Atlas database, an extensive collection of integration breakpoints for the three most prevalent oncoviruses (human papillomavirus, hepatitis B virus, and Epstein-Barr virus) based on next-generation sequencing (NGS) data, literature, and experimental data. There are 63,179 breakpoints and 47,411 junctional sequences with full annotations deposited in the VIS Atlas database, comprising 47 virus genotypes and 17 disease types. The VIS Atlas database provides (1) a genome browser for NGS breakpoint quality check, visualization of VISs, and the local genomic context; (2) a novel platform to discover integration patterns; and (3) a statistics interface for a comprehensive investigation of genotype-specific integration features. Data collected in the VIS Atlas help provide insights into virus pathogenic mechanisms and the development of novel antitumor drugs. The VIS Atlas database is available at https://www.vis-atlas.tech/.
Affiliation(s)
- Ye Chen
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Yuyan Wang
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Ping Zhou
- Department of Obstetrics and Gynecology, Dongguan Maternal and Child Health Care Hospital, Dongguan 523000, China
- Hao Huang
- Office of Scientific Research & Development, Sun Yat-sen University, Guangzhou 510000, China
- Rui Li
- Department of Obstetrics and Gynecology, Academician Expert Workstation, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430000, China
- Zhen Zeng
- Department of Obstetrics and Gynecology, Academician Expert Workstation, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430000, China
- Zifeng Cui
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Rui Tian
- Center for Translational Medicine, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Zhuang Jin
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Jiashuo Liu
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Zhaoyue Huang
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Lifang Li
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Zheying Huang
- Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Xun Tian
- Department of Obstetrics and Gynecology, Academician Expert Workstation, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430000, China
- Meiying Yu
- Department of Pathology, the Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi 445000, China
- Zheng Hu
- Department of Obstetrics and Gynecology, Zhongnan Hospital of Wuhan University, Wuhan 430062, China; Department of Obstetrics and Gynecology, the First Affiliated Hospital, Sun Yat-sen University, Guangzhou 510000, China
4
Privitera GF, Alaimo S, Ferro A, Pulvirenti A. Virus finding tools: current solutions and limitations. Brief Bioinform 2022; 23:6618234. [PMID: 35753694 DOI: 10.1093/bib/bbac235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 05/02/2022] [Accepted: 05/20/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION: The study of the Human Virome remains challenging nowadays. Viral metagenomics, through high-throughput sequencing data, is the best choice for virus discovery. The metagenomics approach is culture-independent and sequence-independent, helping search for either known or novel viruses. Though it is estimated that more than 40% of the viruses found in metagenomics analysis are not recognizable, we decided to analyze several tools to identify and discover viruses in RNA-seq samples.
RESULTS: We have analyzed eight Virus Tools for the identification of viruses in RNA-seq data. These tools were compared using a synthetic dataset of 30 viruses and a real one. Our analysis shows that no tool succeeds in recognizing all the viruses in the datasets. So we can conclude that each of these tools has pros and cons, and their choice depends on the application domain.
AVAILABILITY: Synthetic data used through the review and raw results of their analysis can be found at https://zenodo.org/record/6426147. FASTQ files of real data can be found in GEO (https://www.ncbi.nlm.nih.gov/gds) or ENA (https://www.ebi.ac.uk/ena/browser/home). Raw results of their analysis can be downloaded from https://zenodo.org/record/6425917.
Affiliation(s)
- Grete Francesca Privitera
- Department of Physics and Astronomy, University of Catania, Viale A. Doria, 6, 95125, Catania, Italy
- Salvatore Alaimo
- Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
- Alfredo Ferro
- Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
- Alfredo Pulvirenti
- Department of Clinical and Experimental Medicine, University of Catania, c/o Dept. of Math. and Comp. Science Viale A. Doria, 6, 95125, Catania, Italy
5
Walker K, Kalra D, Lowdon R, Chen G, Molik D, Soto DC, Dabbaghie F, Khleifat AA, Mahmoud M, Paulin LF, Raza MS, Pfeifer SP, Agustinho DP, Aliyev E, Avdeyev P, Barrozo ER, Behera S, Billingsley K, Chong LC, Choubey D, De Coster W, Fu Y, Gener AR, Hefferon T, Henke DM, Höps W, Illarionova A, Jochum MD, Jose M, Kesharwani RK, Kolora SRR, Kubica J, Lakra P, Lattimer D, Liew CS, Lo BW, Lo C, Lötter A, Majidian S, Mendem SK, Mondal R, Ohmiya H, Parvin N, Peralta C, Poon CL, Prabhakaran R, Saitou M, Sammi A, Sanio P, Sapoval N, Syed N, Treangen T, Wang G, Xu T, Yang J, Zhang S, Zhou W, Sedlazeck FJ, Busby B. The third international hackathon for applying insights into large-scale genomic composition to use cases in a wide range of organisms. F1000Res 2022; 11:530. [PMID: 36262335 PMCID: PMC9557141 DOI: 10.12688/f1000research.110194.1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/04/2022] [Indexed: 01/25/2023]
Abstract
In October 2021, 59 scientists from 14 countries and 13 U.S. states collaborated virtually in the Third Annual Baylor College of Medicine & DNANexus Structural Variation hackathon. The goal of the hackathon was to advance research on structural variants (SVs) by prototyping and iterating on open-source software. This led to nine hackathon projects focused on diverse genomics research interests, including various SV discovery and genotyping methods, SV sequence reconstruction, and clinically relevant structural variation, including SARS-CoV-2 variants. Repositories for the projects that participated in the hackathon are available at https://github.com/collaborativebioinformatics.
Affiliation(s)
- Kimberly Walker
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Divya Kalra
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Guangyi Chen
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany
- Center for Bioinformatics, Saarland University, Saarbrücken, Germany
- David Molik
- Tropical Crop and Commodity Protection Research Unit, Pacific Basin Agricultural Research Center, Hilo, HI, 96720, USA
- Daniela C. Soto
- Biochemistry & Molecular Medicine, Genome Center, MIND Institute, University of California, Davis, Davis, CA, 95616, USA
- Fawaz Dabbaghie
- Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Saarbrücken, Germany
- Institute for Medical Biometry and Bioinformatics, University hospital Düsseldorf, Düsseldorf, Germany
- Ahmad Al Khleifat
- Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK
- Medhat Mahmoud
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Luis F Paulin
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Muhammad Sohail Raza
- CAS Key Laboratory of Genomic and Precision Medicine, Beijing Institute of Genomics, Beijing, China
- Susanne P. Pfeifer
- Center for Evolution and Medicine, Arizona State University, Tempe, AZ, USA
- Daniel Paiva Agustinho
- Department of Molecular Microbiology, Washington University in St. Louis School of Medicine, St. Louis, MO, 63110, USA
- Elbay Aliyev
- Research Department, Sidra Medicine, Doha, Qatar
- Pavel Avdeyev
- Computational Biology Institute, The George Washington University, Washington, DC, 20052, USA
- Enrico R. Barrozo
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Sairam Behera
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Kimberley Billingsley
- Molecular Genetics Section, Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
- Li Chuin Chong
- Beykoz Institute of Life Sciences and Biotechnology, Bezmialem Vakif University, Beykoz, Istanbul, Turkey
- Deepak Choubey
- Department of Technology, Savitribai Phule Pune University, Pune, Maharashtra, India
- Wouter De Coster
- Applied and Translational Neurogenomics Group, VIB Center for Molecular Neurology, Antwerp, Belgium
- Applied and Translational Neurogenomics Group, Department of Biomedical Sciences, University of Antwerp, Antwerp, Belgium
- Yilei Fu
- Department of Computer Science, Rice University, Houston, TX, USA
- Alejandro R. Gener
- Association of Public Health Labs, Centers for Disease Control and Prevention, Downey, CA, USA
- Timothy Hefferon
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, 20892, USA
- David Morgan Henke
- Department Molecular Virology and Microbiology, Baylor College of Medicine, Houston, TX, 77030, USA
- Wolfram Höps
- EMBL Heidelberg, Genome Biology Unit, Heidelberg, Germany
- Michael D. Jochum
- Department of Obstetrics & Gynecology, Baylor College of Medicine, Houston, TX, 77030, USA
- Maria Jose
- Centre for Bioinformatics, Pondicherry University, Pondicherry, India
- Rupesh K. Kesharwani
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
- Priya Lakra
- Department of Zoology, University of Delhi, Delhi, India
- Damaris Lattimer
- University of Applied Sciences Upper Austria - FH Hagenberg, Mühlkreis, Austria
- Chia-Sin Liew
- Center for Biotechnology, University of Nebraska-Lincoln, Lincoln, Nebraska, 68588, USA
- Bai-Wei Lo
- Department of Biology, University of Konstanz, Konstanz, Germany
- Chunhsuan Lo
- Human Genetics Laboratory, National Institute of Genetics, Japan, Mishima City, Japan
- Anneri Lötter
- Department of Biochemistry, University of Pretoria, Pretoria, South Africa
- Sina Majidian
- Department of Computational Biology, University of Lausanne, Lausanne, Switzerland
- Rajarshi Mondal
- Department of Biotechnology, The University of Burdwan, West Bengal, India
- Hiroko Ohmiya
- Genetic Reagent Development Unit, Medical & Biological Laboratories Co., Ltd., Tokyo, Japan
- Nasrin Parvin
- Department of Biotechnology, The University of Burdwan, West Bengal, India
- Marie Saitou
- Center of Integrative Genetics (CIGENE), Faculty of Biosciences, Norwegian University of Life Sciences, As, Norway
- Aditi Sammi
- School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi, Uttar Pradesh, India
- Philippe Sanio
- University of Applied Sciences Upper Austria - FH Hagenberg, Hagenberg im Mühlkreis, Austria
- Nicolae Sapoval
- Department of Computer Science, Rice University, Houston, TX, USA
- Najeeb Syed
- Research Department, Sidra Medicine, Doha, Qatar
- Todd Treangen
- Department of Computer Science, Rice University, Houston, TX, USA
- Tiancheng Xu
- Department of Computer Science, Rice University, Houston, TX, USA
- Jianzhi Yang
- Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
- Shangzhe Zhang
- School of Biology, University of St Andrews, St Andrews, UK
- Weiyu Zhou
- Department of Statistical Science, George Mason University, Fairfax, Virginia, USA
- Fritz J Sedlazeck
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, 77030, USA
6
Stephens Z, O’Brien D, Dehankar M, Roberts LR, Iyer RK, Kocher JP. Exogene: A performant workflow for detecting viral integrations from paired-end next-generation sequencing data. PLoS One 2021; 16:e0250915. [PMID: 34550971 PMCID: PMC8457494 DOI: 10.1371/journal.pone.0250915] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/14/2021] [Accepted: 07/08/2021] [Indexed: 01/14/2023]
Abstract
The integration of viruses into the human genome is known to be associated with tumorigenesis in many cancers, but the accurate detection of integration breakpoints from short read sequencing data is made difficult by human-viral homologies, viral genome heterogeneity, coverage limitations, and other factors. To address this, we present Exogene, a sensitive and efficient workflow for detecting viral integrations from paired-end next generation sequencing data. Exogene's read filtering and breakpoint detection strategies yield integration coordinates that are highly concordant with long read validation. We demonstrate this concordance across 6 TCGA Hepatocellular carcinoma (HCC) tumor samples, identifying integrations of hepatitis B virus that are also supported by long reads. Additionally, we applied Exogene to targeted capture data from 426 previously studied HCC samples, achieving 98.9% concordance with existing methods and identifying 238 high-confidence integrations that were not previously reported. Exogene is applicable to multiple types of paired-end sequence data, including genome, exome, RNA-Seq and targeted capture.
Affiliation(s)
- Zachary Stephens
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States of America
- Daniel O’Brien
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
- Mrunal Dehankar
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
- Lewis R. Roberts
- Department of Internal Medicine, Mayo Clinic, Rochester, MN, United States of America
- Ravishankar K. Iyer
- Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, IL, United States of America
- Jean-Pierre Kocher
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, United States of America
7
Cameron DL, Jacobs N, Roepman P, Priestley P, Cuppen E, Papenfuss AT. VIRUSBreakend: Viral Integration Recognition Using Single Breakends. Bioinformatics 2021; 37:3115-3119. [PMID: 33973999 PMCID: PMC8504616 DOI: 10.1093/bioinformatics/btab343] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/28/2020] [Revised: 03/25/2021] [Accepted: 05/03/2021] [Indexed: 12/17/2022]
Abstract
Motivation: Integration of viruses into infected host cell DNA can cause DNA damage and disrupt genes. Recent cost reductions and growth of whole genome sequencing has produced a wealth of data in which viral presence and integration detection is possible. While key research and clinically relevant insights can be uncovered, existing software has not achieved widespread adoption, limited in part due to high computational costs, the inability to detect a wide range of viruses, as well as precision and sensitivity.
Results: Here, we describe VIRUSBreakend, a high-speed tool that identifies viral DNA presence and genomic integration. It utilizes single breakends, breakpoints in which only one side can be unambiguously placed, in a novel virus-centric variant calling and assembly approach to identify viral integrations with high sensitivity and a near-zero false discovery rate. VIRUSBreakend detects viral integrations anywhere in the host genome including regions such as centromeres and telomeres unable to be called by existing tools. Applying VIRUSBreakend to a large metastatic cancer cohort, we demonstrate that it can reliably detect clinically relevant viral presence and integration including HPV, HBV, MCPyV, EBV and HHV-8.
Availability and implementation: VIRUSBreakend is part of the Genomic Rearrangement IDentification Software Suite (GRIDSS). It is available under a GPLv3 license from https://github.com/PapenfussLab/VIRUSBreakend.
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Daniel L Cameron
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; Department of Medical Biology, University of Melbourne, Australia; Hartwig Medical Foundation Australia, Sydney, Australia
- Nina Jacobs
- Hartwig Medical Foundation, Amsterdam, The Netherlands
- Paul Roepman
- Hartwig Medical Foundation, Amsterdam, The Netherlands
- Edwin Cuppen
- Hartwig Medical Foundation, Amsterdam, The Netherlands; Center for Molecular Medicine and Oncode Institute, University Medical Center Utrecht, Utrecht, The Netherlands
- Anthony T Papenfuss
- Bioinformatics Division, Walter and Eliza Hall Institute of Medical Research, Parkville, Australia; Department of Medical Biology, University of Melbourne, Australia; Peter MacCallum Cancer Centre, Melbourne, Australia; Sir Peter MacCallum Department of Oncology, University of Melbourne, Australia
8
Shmookler Reis RJ, Atluri R, Balasubramaniam M, Johnson J, Ganne A, Ayyadevara S. "Protein aggregates" contain RNA and DNA, entrapped by misfolded proteins but largely rescued by slowing translational elongation. Aging Cell 2021; 20:e13326. [PMID: 33788386 PMCID: PMC8135009 DOI: 10.1111/acel.13326] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 09/21/2020] [Revised: 01/12/2021] [Accepted: 02/01/2021] [Indexed: 01/03/2023]
Abstract
All neurodegenerative diseases feature aggregates, which usually contain disease-specific diagnostic proteins; non-protein constituents, however, have rarely been explored. Aggregates from SY5Y-APPSw neuroblastoma, a cell model of familial Alzheimer's disease, were crosslinked and sequences of linked peptides identified. We constructed a normalized "contactome" comprising 11 subnetworks, centered on 24 high-connectivity hubs. Remarkably, all 24 are nucleic acid-binding proteins. This led us to isolate and sequence RNA and DNA from Alzheimer's and control aggregates. RNA fragments were mapped to the human genome by RNA-seq and DNA by ChIP-seq. Nearly all aggregate RNA sequences mapped to specific genes, whereas DNA fragments were predominantly intergenic. These nucleic acid mappings are all significantly nonrandom, making an artifactual origin extremely unlikely. RNA (mostly cytoplasmic) exceeded DNA (chiefly nuclear) by twofold to fivefold. RNA fragments recovered from AD tissue were ~1.5- to 2.5-fold more abundant than those recovered from control tissue, similar to the increase in protein. Aggregate abundances of specific RNA sequences were strikingly differential between cultured SY5Y-APPSw glioblastoma cells expressing APOE3 vs. APOE4, consistent with APOE4 competition for E-box/CLEAR motifs. We identified many G-quadruplex and viral sequences within RNA and DNA of aggregates, suggesting that sequestration of viral genomes may have driven the evolution of disordered nucleic acid-binding proteins. After RNA-interference knockdown of the translational-procession factor EEF2 to suppress translation in SY5Y-APPSw cells, the RNA content of aggregates declined by >90%, while reducing protein content by only 30% and altering DNA content by ≤10%. This implies that cotranslational misfolding of nascent proteins may ensnare polysomes into aggregates, accounting for most of their RNA content.
Affiliation(s)
- Robert J. Shmookler Reis
- Central Arkansas Veterans Healthcare System Little Rock AR USA
- Department of Geriatrics University of Arkansas for Medical Sciences Little Rock AR USA
- BioInformatics Program University of Arkansas for Medical Sciences and University of Arkansas at Little Rock Little Rock AR USA
- Ramani Atluri
- Department of Geriatrics University of Arkansas for Medical Sciences Little Rock AR USA
- Jay Johnson
- BioInformatics Program University of Arkansas for Medical Sciences and University of Arkansas at Little Rock Little Rock AR USA
- Akshatha Ganne
- BioInformatics Program University of Arkansas for Medical Sciences and University of Arkansas at Little Rock Little Rock AR USA
- Srinivas Ayyadevara
- Central Arkansas Veterans Healthcare System Little Rock AR USA
- Department of Geriatrics University of Arkansas for Medical Sciences Little Rock AR USA
9
Richmond PA, Kaye AM, Kounkou GJ, Av-Shalom TV, Wasserman WW. Demonstrating the utility of flexible sequence queries against indexed short reads with FlexTyper. PLoS Comput Biol 2021; 17:e1008815. [PMID: 33750951 PMCID: PMC8016220 DOI: 10.1371/journal.pcbi.1008815] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2020] [Revised: 04/01/2021] [Accepted: 02/17/2021] [Indexed: 11/26/2022]
Abstract
Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially obviating the need for processing, or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper which facilitates a “reverse mapping” approach in which high throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper’s utility for recovering depth of coverage, and accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample’s population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods, but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well-represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and biology-informed collections of reference queries. In the long-term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach. 
FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper.
In the past 15 years, next generation sequencing technology has revolutionized our capacity to process and analyze DNA sequencing data. From agriculture to medicine, this technology is enabling a deeper understanding of the blueprint of life. Next generation sequencing data is composed of short sequences of DNA, referred to as “reads”, which are often shorter than 200 base pairs, making them many orders of magnitude smaller than the entirety of a human genome. Gaining insights from this data has typically leveraged a reference-guided mapping approach, where the reads are aligned to a reference genome and then post-processed to gain actionable information such as presence or absence of genomic sequence, or variation between the reference genome and the sequenced sample. Many experts in the field of genomics have concluded that selecting a single, linear reference genome against which to map reads is limiting, and several current research endeavors are focused on exploring options for improved analysis methods to unlock the full utility of sequencing data. Among these improvements are the use of sex-matched genomes, population-specific reference genomes, and emergent graph-based reference pan-genomes. However, advanced methods that use raw DNA sequencing data to inform the choice of reference genome and guide the alignment of reads to enriched reference genomes are needed. Here we develop a method termed FlexTyper, which creates a searchable index of the short read data and enables flexible, user-guided queries to provide valuable insights without the need for reference-guided mapping.
We demonstrate the utility of our method by identifying sample ancestry and sex in human whole genome sequencing data, detecting viral pathogen reads in RNA-seq data, analyzing African-enriched genome regions absent from the global reference, and detecting killer-cell immune receptor alleles that are difficult to discern using standard read mapping. We anticipate early adoption of FlexTyper within analysis pipelines as a pre-mapping component, and further envision that the bioinformatics and genomics community will leverage the tool for creative uses of sequence queries from unmapped data.
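The “reverse mapping” idea described in this abstract (indexing the reads, then querying them with k-mers) can be illustrated with a minimal Python sketch. This is not FlexTyper's implementation: the abstract says FlexTyper uses an FM-index, whereas the sketch swaps in a plain hash table of k-mer counts, and the function names (`index_reads`, `count_kmer`, `genotype`), the toy reads, and the simple presence/absence genotype rule are all assumptions made for illustration.

```python
from collections import Counter

def index_reads(reads, k):
    """Count every k-mer in the reads (a hash index standing in for an FM-index)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_kmer(index, kmer):
    """Occurrences of a query k-mer in the reads, counting both strands."""
    rc = revcomp(kmer)
    if rc == kmer:  # palindromic k-mer: avoid counting it twice
        return index[kmer]
    return index[kmer] + index[rc]

def genotype(index, ref_kmer, alt_kmer):
    """Hypothetical presence/absence genotype call from two allele-specific k-mers."""
    ref, alt = count_kmer(index, ref_kmer), count_kmer(index, alt_kmer)
    if ref and alt:
        return "het"
    if alt:
        return "hom_alt"
    if ref:
        return "hom_ref"
    return "no_call"

# Toy data: one read carries the reference allele, one carries the
# alternate allele, and one carries the reference allele on the opposite strand.
reads = ["GGACGTACC", "TTACGAAGG", "GGTACGTCC"]
index = index_reads(reads, k=5)
call = genotype(index, ref_kmer="ACGTA", alt_kmer="ACGAA")
```

The reads are indexed once and each query is then a constant-time lookup, which is why many queries can be run against the same unmapped dataset. A real tool would also need to handle sequencing errors and coverage thresholds, which this sketch ignores.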
Affiliation(s)
- Phillip Andrew Richmond
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Alice Mary Kaye
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Godfrain Jacques Kounkou
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Tamar Vered Av-Shalom
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, Canada
- Wyeth W. Wasserman
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children’s Hospital Research Institute, University of British Columbia, Vancouver, Canada
|
10
|
Pischedda E, Crava C, Carlassara M, Zucca S, Gasmi L, Bonizzoni M. ViR: a tool to solve intrasample variability in the prediction of viral integration sites using whole genome sequencing data. BMC Bioinformatics 2021; 22:45. [PMID: 33541262 PMCID: PMC7863434 DOI: 10.1186/s12859-021-03980-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 07/21/2020] [Accepted: 01/27/2021] [Indexed: 12/16/2022]
Abstract
Background: Several bioinformatics pipelines have been developed to detect sequences from viruses that integrate into the human genome because of the health relevance of these integrations, such as in the persistence of viral infection and/or in generating genotoxic effects, often progressing into cancer. Recent genomics and metagenomics analyses have shown that viruses also integrate into the genomes of non-model organisms (e.g., arthropods, fish, plants, vertebrates). However, studies of endogenous viral elements (EVEs) in non-model organisms have rarely gone beyond their characterization from reference genome assemblies. In non-model organisms, we lack a thorough understanding of the widespread occurrence of EVEs and their biological relevance, apart from sporadic cases which nevertheless point to significant roles of EVEs in immunity and regulation of expression. The co-occurrence of repetitive DNA, duplications and/or assembly fragmentation in a genome sequence with intrasample variability in whole-genome sequencing (WGS) data can cause misalignments when mapping data to a genome assembly. This phenomenon hinders our ability to properly identify integration sites. Results: To fill this gap, we developed ViR, a pipeline which resolves the dispersion of reads due to intrasample variability in sequencing data from both single and pooled DNA samples, thus improving the detection of integration sites. We tested ViR on both in silico and real sequencing data from a non-model organism, the arboviral vector Aedes albopictus. Potential viral integrations predicted by ViR were molecularly validated, supporting the accuracy of ViR results. Conclusion: ViR will open new avenues to explore the biology of EVEs, especially in non-model organisms. Importantly, while we developed ViR with the identification of EVEs in mind, its application can be extended to detect any lateral transfer event, provided that an ad hoc sequence to interrogate is supplied.
Affiliation(s)
- Elisa Pischedda
- Department of Biology and Biotechnology, University of Pavia, 27100, Pavia, Italy
- Cristina Crava
- Department of Biology and Biotechnology, University of Pavia, 27100, Pavia, Italy; ERI BIOTECMED, Universitat de Valencia, 46010, Valencia, Spain
- Martina Carlassara
- Department of Biology and Biotechnology, University of Pavia, 27100, Pavia, Italy
- Leila Gasmi
- Department of Biology and Biotechnology, University of Pavia, 27100, Pavia, Italy
- Mariangela Bonizzoni
- Department of Biology and Biotechnology, University of Pavia, 27100, Pavia, Italy.
|
11
|
Zeng X, Zhao L, Shen C, Zhou Y, Li G, Sung WK. HIVID2: an accurate tool to detect virus integrations in the host genome. Bioinformatics 2021; 37:1821-1827. [PMID: 33453108 DOI: 10.1093/bioinformatics/btab031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/20/2020] [Revised: 12/27/2020] [Accepted: 01/12/2021] [Indexed: 12/11/2022]
Abstract
MOTIVATION: Virus integration in the host genome is frequently reported to be closely associated with many human diseases, and the detection of virus integration is a critically challenging task. However, most existing tools show limited specificity and sensitivity. Therefore, the objective of this study is to develop a method for accurate detection of virus integration into host genomes. RESULTS: Herein, we report a novel method termed HIVID2 that is a significant upgrade of HIVID. HIVID2 performs a paired-end combination (PE-combination) for potentially integrated reads. The resulting sequences are then remapped onto the reference genomes, and both split and discordant chimeric reads are used to identify accurate integration breakpoints with high confidence. HIVID2 represents a great improvement in specificity and sensitivity, and predicts breakpoints closer to the real integrations, compared with existing methods. The advantage of our method was demonstrated using both simulated and real data sets. HIVID2 uncovered novel integration breakpoints in well-known cervical cancer-related genes, including FHIT and LRP1B, which was verified using protein expression data. In addition, HIVID2 allows the user to decide whether to automatically perform advanced analysis using the identified virus integrations. By analyzing the simulated data and real data tests, we demonstrated that HIVID2 is not only more accurate than HIVID but also better than other existing programs with respect to both sensitivity and specificity. We believe that HIVID2 will help in enhancing future research associated with virus integration. AVAILABILITY: HIVID2 can be accessed at https://github.com/zengxi-hada/HIVID2/. CONTACT: Xi Zeng (zengxi@mail.hzau.edu.cn), Linghao Zhao (michael_yifan@126.com). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
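The distinction this abstract draws between split and discordant chimeric reads can be made concrete with a small Python sketch. It is a generic illustration, not HIVID2's code: the `Segment` and `ReadPair` types, the `classify` function, and the assumption that each mate's alignment is already given as a list of segments labelled "host" or "virus" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    genome: str  # "host" or "virus" (assumed labels)
    start: int
    end: int

@dataclass
class ReadPair:
    mate1: List[Segment]  # aligned segments of the first mate
    mate2: List[Segment]  # aligned segments of the second mate

def genomes(segments):
    """Set of genomes that a mate's segments align to."""
    return {s.genome for s in segments}

def classify(pair):
    """Label a read pair as 'split', 'discordant', or 'non_chimeric'."""
    g1, g2 = genomes(pair.mate1), genomes(pair.mate2)
    both = {"host", "virus"}
    # Split read: a single mate spans the junction, so its own
    # segments align partly to the host and partly to the virus.
    if both <= g1 or both <= g2:
        return "split"
    # Discordant pair: each mate aligns wholly to one genome,
    # but the two mates align to different genomes.
    if g1 | g2 == both and len(g1) == 1 and len(g2) == 1:
        return "discordant"
    return "non_chimeric"

split_pair = ReadPair(
    mate1=[Segment("host", 100, 160), Segment("virus", 5, 45)],
    mate2=[Segment("host", 300, 400)],
)
discordant_pair = ReadPair(
    mate1=[Segment("host", 100, 200)],
    mate2=[Segment("virus", 50, 150)],
)
labels = [classify(split_pair), classify(discordant_pair)]
```

A split read pins the junction to a base position, because the junction lies inside the read, while a discordant pair only brackets it between the two mates. That difference is one reason for using both kinds of evidence together.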
Affiliation(s)
- Xi Zeng
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Linghao Zhao
- Eastern Hepatobiliary Surgery Hospital, Second Military Medical University, Shanghai 200438, China
- Chenhang Shen
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Yi Zhou
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China
- Wing-Kin Sung
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Hubei Engineering Technology Research Center of Agricultural Big Data, College of Informatics, Huazhong Agricultural University, Wuhan 430070, China; Department of Computer Science, National University of Singapore, Singapore, 117417, Singapore; Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore, 138672, Singapore
|
12
|
Giannuzzi D, Aresu L. A First NGS Investigation Suggests No Association Between Viruses and Canine Cancers. Front Vet Sci 2020; 7:365. [PMID: 32766289 PMCID: PMC7380080 DOI: 10.3389/fvets.2020.00365] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/12/2020] [Accepted: 05/26/2020] [Indexed: 12/16/2022]
Abstract
Approximately 10–15% of human cancers worldwide are attributable to viral infection. When operating as carcinogenic elements, viruses may act through various mechanisms, the most important of which is viral integration into the host genome, causing chromosome instability, genomic mutations, and aberrations. In canine species, few reports have described an association between viral integration and canine cancers, and more comprehensive studies are needed. The advancement of next-generation sequencing and its falling cost have resulted in a progressive increase in sequencing data in veterinary oncology, offering an opportunity to study the virome in canine cancers. In this study, we performed viral detection and integration analyses using the VirusFinder2 software tool on available whole-genome and whole-exome sequencing data from different canine cancers. Several viral sequences were detected in lymphomas, hemangiosarcomas, melanomas, and osteosarcomas, but no reliable integration sites were identified. Despite some limitations, such as the depth and type of sequencing, the restricted number of software tools available for nonhuman genomes, and limited knowledge of endogenous retroviruses in the canine genome, the results are compelling. However, further experiments are needed and, as for feline species, dedicated analysis tools for the identification of viral integration sites in canine cancers are required.
Affiliation(s)
- Diana Giannuzzi
- Department of Comparative Biomedicine and Food Science, University of Padua, Legnaro, Italy
- Luca Aresu
- Department of Veterinary Science, University of Turin, Grugliasco, Italy
|
13
|
Zhi D, Zhao Z, Li F, Wu Z, Liu X, Wang K. The International Conference on Intelligent Biology and Medicine (ICIBM) 2018: genomics meets medicine. BMC Med Genomics 2019; 12:20. [PMID: 30704510 PMCID: PMC6357345 DOI: 10.1186/s12920-018-0448-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/14/2022]
Abstract
On June 10–12, 2018, the International Conference on Intelligent Biology and Medicine (ICIBM 2018) was held in Los Angeles, California, USA. The conference included 11 scientific sessions, four tutorials, one poster session, four keynote talks and four eminent scholar talks, covering topics ranging from 3D genome structure analysis and visualization, next generation sequencing analysis, computational drug discovery, medical informatics, and cancer genomics to systems biology. While medical genomics has always been a main theme of ICIBM, this year we organized, for the first time, the BMC Medical Genomics Supplement for ICIBM. Here, we describe 15 ICIBM papers selected for publication in BMC Medical Genomics.
Affiliation(s)
- Degui Zhi
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA.
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA
- Fuhai Li
- Department of Biomedical Informatics, Ohio State University, Columbus, OH, 43210, USA
- Zhijin Wu
- Department of Biostatistics, Brown University, Providence, RI, 02912, USA
- Xiaoming Liu
- Human Genetics Center, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, 77030, USA; Present address: College of Public Health, University of South Florida, Tampa, FL, 33612, USA
- Kai Wang
- Raymond G. Perelman Center for Cellular and Molecular Therapeutics, Children's Hospital of Philadelphia, Philadelphia, PA, 19104, USA; Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA, 19104, USA.
|