1. Gong B, Li D, Łabaj PP, Pan B, Novoradovskaya N, Thierry-Mieg D, Thierry-Mieg J, Chen G, Bergstrom Lucas A, LoCoco JS, Richmond TA, Tseng E, Kusko R, Happe S, Mercer TR, Pabón-Peña C, Salmans M, Tilgner HU, Xiao W, Johann DJ, Jones W, Tong W, Mason CE, Kreil DP, Xu J. Targeted DNA-seq and RNA-seq of Reference Samples with Short-read and Long-read Sequencing. Sci Data 2024; 11:892. [PMID: 39152166 PMCID: PMC11329654 DOI: 10.1038/s41597-024-03741-y] [Received: 03/05/2024] [Accepted: 08/05/2024] [Indexed: 08/19/2024]
Abstract
Next-generation sequencing (NGS) has revolutionized genomic research by enabling high-throughput, cost-effective genome and transcriptome sequencing, thereby accelerating personalized medicine for complex diseases, including cancer. Whole genome/transcriptome sequencing (WGS/WTS) provides comprehensive insights, while targeted sequencing is more cost-effective and sensitive. Short-read sequencing still dominates the field because of its high speed and cost-effectiveness; by comparison, long-read sequencing can overcome alignment limitations and better discriminate similar sequences arising from alternative transcripts or repetitive regions. Hybrid sequencing combines the strengths of different technologies to provide a more comprehensive view of genomic/transcriptomic variation. Understanding each technology's strengths and limitations is critical for translating cutting-edge technologies into clinical applications. In this study, we sequenced DNA and RNA libraries of reference samples using various targeted DNA and RNA panels, as well as the whole transcriptome, on both short-read and long-read platforms. This study design enables a comprehensive analysis of sequencing technologies, targeting protocols, and library preparation methods. Our expanded profiling landscape establishes a reference point for assessing current sequencing technologies, facilitating informed decision-making in genomic research and precision medicine.
Affiliation(s)
- Binsheng Gong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Dan Li
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Paweł P Łabaj
  - Małopolska Centre of Biotechnology, Jagiellonian University, Krakow, Poland
  - Bioinformatics Research, Institute of Molecular Biotechnology, Boku University Vienna, Vienna, Austria
- Bohu Pan
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Danielle Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Jean Thierry-Mieg
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20894, USA
- Guangchun Chen
  - Department of Immunology, Genomics and Microarray Core Facility, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd., Dallas, TX, 75390, USA
- Anne Bergstrom Lucas
  - Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
- Todd A Richmond
  - Market & Application Development Bioinformatics, Roche Sequencing Solutions Inc., 4300 Hacienda Dr., Pleasanton, CA, 94588, USA
- Rebecca Kusko
  - Cellino Bio, 750 Main Street, Cambridge, MA, 02143, USA
- Scott Happe
  - Agilent Technologies, Inc., 1834 State Hwy 71 West, Cedar Creek, TX, 78612, USA
- Timothy R Mercer
  - Australian Institute for Bioengineering and Nanotechnology, The University of Queensland, St Lucia, QLD, Australia
- Carlos Pabón-Peña
  - Agilent Technologies, Inc., 5301 Stevens Creek Blvd., Santa Clara, CA, 95051, USA
- Hagen U Tilgner
  - Brain and Mind Research Institute, Weill Cornell Medicine, New York, NY, USA
  - Center for Neurogenetics, Weill Cornell Medicine, New York, NY, USA
- Wenzhong Xiao
  - Stanford Genome Technology Center, Stanford University, Palo Alto, CA, 94304, USA
  - Massachusetts General Hospital, Harvard Medical School, Boston, MA, 02114, USA
- Donald J Johann
  - Winthrop P Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, 4301 W Markham St., Little Rock, AR, 72205, USA
- Wendell Jones
  - Q squared Solutions Genomics, 2400 Ellis Road, Durham, NC, 27703, USA
- Weida Tong
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
- Christopher E Mason
  - Department of Physiology and Biophysics, Weill Cornell Medicine, Cornell University, New York, NY, 10065, USA
  - The HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
  - The WorldQuant Initiative for Quantitative Prediction, Weill Cornell Medicine, New York, NY, USA
- David P Kreil
  - Bioinformatics Research, Institute of Molecular Biotechnology, Boku University Vienna, Vienna, Austria
- Joshua Xu
  - Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, Jefferson, AR, 72079, USA
2. Prazsák I, Tombácz D, Fülöp Á, Torma G, Gulyás G, Dörmő Á, Kakuk B, McKenzie Spires L, Toth Z, Boldogkői Z. KSHV 3.0: a state-of-the-art annotation of the Kaposi's sarcoma-associated herpesvirus transcriptome using cross-platform sequencing. mSystems 2024; 9:e0100723. [PMID: 38206015 PMCID: PMC10878076 DOI: 10.1128/msystems.01007-23] [Received: 09/19/2023] [Accepted: 12/11/2023] [Indexed: 01/12/2024]
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) is a large, oncogenic DNA virus belonging to the gammaherpesvirus subfamily. KSHV has been extensively studied with various high-throughput RNA-sequencing approaches to map the transcription start and end sites, the splice junctions, and the translation initiation sites. Despite these efforts, the comprehensive annotation of the viral transcriptome remains incomplete. In the present study, we generated a long-read sequencing data set of the lytic and latent KSHV transcriptome using native RNA and direct cDNA-sequencing methods. This was supplemented with Cap Analysis of Gene Expression (CAGE) sequencing on a short-read platform. We also utilized data sets from previous publications for our analysis. As a result of this combined approach, we identified a number of novel viral transcripts and RNA isoforms and either corroborated or improved the annotation of previously identified viral RNA molecules, thereby notably enhancing our understanding of the transcriptomic architecture of the KSHV genome. We also evaluated the coding capability of transcripts previously thought to be non-coding by integrating our data on the viral transcripts with translatomic information from other publications.

IMPORTANCE Deciphering the viral transcriptome of Kaposi's sarcoma-associated herpesvirus is of great importance because it gives insight into the molecular mechanisms of viral replication and pathogenesis, which can help identify potential targets for antiviral interventions. Specifically, the substantial transcriptional overlaps identified in this work suggest genome-wide interference between transcriptional machineries. This finding indicates the presence of a novel regulatory layer that may control the expression of viral genes.
Affiliation(s)
- István Prazsák
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Dóra Tombácz
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ádám Fülöp
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Torma
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Gulyás
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ákos Dörmő
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Balázs Kakuk
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Lauren McKenzie Spires
  - Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Toth
  - Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Boldogkői
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
3. Prazsák I, Tombácz D, Fülöp Á, Torma G, Gulyás G, Dörmő Á, Kakuk B, Spires LM, Toth Z, Boldogkői Z. KSHV 3.0: A State-of-the-Art Annotation of the Kaposi's Sarcoma-Associated Herpesvirus Transcriptome Using Cross-Platform Sequencing. bioRxiv: The Preprint Server for Biology 2023:2023.09.21.558842. [PMID: 37790386 PMCID: PMC10542539 DOI: 10.1101/2023.09.21.558842] [Indexed: 10/05/2023]
Abstract
Kaposi's sarcoma-associated herpesvirus (KSHV) is a large, oncogenic DNA virus belonging to the gammaherpesvirus subfamily. KSHV has been extensively studied with various high-throughput RNA-sequencing approaches to map the transcription start and end sites, the splice junctions, and the translation initiation sites. Despite these efforts, the comprehensive annotation of the viral transcriptome remains incomplete. In the present study, we generated a long-read sequencing dataset of the lytic and latent KSHV transcriptome using native RNA and direct cDNA sequencing methods. This was supplemented with Cap Analysis of Gene Expression (CAGE) sequencing on a short-read platform. We also utilized datasets from previous publications for our analysis. As a result of this combined approach, we identified a number of novel viral transcripts and RNA isoforms and either corroborated or improved the annotation of previously identified viral RNA molecules, thereby notably enhancing our understanding of the transcriptomic architecture of the KSHV genome. We also evaluated the coding capability of transcripts previously thought to be non-coding by integrating our data on the viral transcripts with translatomic information from other publications.
Affiliation(s)
- István Prazsák
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Dóra Tombácz
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ádám Fülöp
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Torma
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Gábor Gulyás
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Ákos Dörmő
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Balázs Kakuk
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
- Lauren McKenzie Spires
  - Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Toth
  - Department of Oral Biology, University of Florida College of Dentistry, Gainesville, Florida, USA
- Zsolt Boldogkői
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged, Hungary
4. Tombácz D, Dörmő Á, Gulyás G, Csabai Z, Prazsák I, Kakuk B, Harangozó Á, Jankovics I, Dénes B, Boldogkői Z. High temporal resolution Nanopore sequencing dataset of SARS-CoV-2 and host cell RNAs. Gigascience 2022; 11:giac094. [PMID: 36251275 PMCID: PMC9575581 DOI: 10.1093/gigascience/giac094] [Received: 03/03/2022] [Revised: 07/14/2022] [Accepted: 09/12/2022] [Indexed: 11/04/2022]
Abstract
BACKGROUND Recent studies have revealed the genome, transcriptome, and epigenetic composition of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and the effect of viral infection on host cell gene expression. It has been demonstrated that, besides the major canonical transcripts, the viral genome also encodes noncanonical RNA molecules. While structural characterizations have revealed a detailed transcriptomic architecture of the virus, kinetic studies have provided poor and often misleading results on the dynamics of both viral and host transcripts because of the low temporal resolution of the infection time course and the low virus/cell ratio (multiplicity of infection [MOI] = 0.1) used for infection. It had never been tested whether the alterations in host gene expression are caused by aging of the cells or by the viral infection.

FINDINGS In this study, we used Oxford Nanopore's direct cDNA and direct RNA sequencing methods to generate a high-coverage, high temporal resolution transcriptomic dataset of SARS-CoV-2 and of the primate host cells, using a high infection titer (MOI = 5). Sixteen sampling time points, ranging from 1 to 96 hours with varying time resolution, and 3 biological replicates were used in the experiment. In addition, a corresponding noninfected sample was included for each infected sample. The raw reads were mapped to the viral and host reference genomes, resulting in 49,661,499 mapped reads (54.62 Gb). The genome of the viral isolate was also sequenced and phylogenetically classified.

CONCLUSIONS This dataset can serve as a valuable resource for profiling SARS-CoV-2 transcriptome dynamics, virus-host interactions, and RNA base modifications. Comparing host gene expression profiles in virally infected and noninfected cells at different time points makes it possible to distinguish the effect of cell aging in culture from that of viral infection. These data can provide useful information for potential novel gene annotations and can also be used to evaluate currently available bioinformatics pipelines.
Affiliation(s)
- Dóra Tombácz
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- Ákos Dörmő
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- Gábor Gulyás
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- Zsolt Csabai
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- István Prazsák
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- Balázs Kakuk
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- Ákos Harangozó
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary
- Béla Dénes
  - Veterinary Diagnostic Directorate, National Food Chain Safety Office, Budapest 1143, Hungary
- Zsolt Boldogkői
  - Department of Medical Biology, Albert Szent-Györgyi Medical School, University of Szeged, Szeged 6720, Hungary